Overview
- Learning and classification with naive Bayes
- Basic method
- Training set $T = \{(x_1, y_1), (x_2, y_2), \cdots, (x_N, y_N)\}$
- drawn i.i.d. from the joint probability distribution $P(X, Y)$ of $X$ and $Y$
- Naive Bayes learns the joint distribution $P(X, Y)$ from the training set via:
- the prior distribution $P(Y = c_k), \quad k = 1, 2, \cdots, K$
- the conditional distribution $P(X = x \mid Y = c_k) = P(X^{(1)} = x^{(1)}, \cdots, X^{(n)} = x^{(n)} \mid Y = c_k), \quad k = 1, 2, \cdots, K$
- Conditional independence assumption: $P(X = x \mid Y = c_k) = P(X^{(1)} = x^{(1)}, \cdots, X^{(n)} = x^{(n)} \mid Y = c_k) = \prod_{j=1}^{n} P(X^{(j)} = x^{(j)} \mid Y = c_k)$
- Bayes' theorem: $P(Y = c_k \mid X = x) = \dfrac{P(X = x \mid Y = c_k) P(Y = c_k)}{\sum_k P(X = x \mid Y = c_k) P(Y = c_k)}$
- Substituting the independence assumption: $P(Y = c_k \mid X = x) = \dfrac{P(Y = c_k) \prod_j P(X^{(j)} = x^{(j)} \mid Y = c_k)}{\sum_k P(Y = c_k) \prod_j P(X^{(j)} = x^{(j)} \mid Y = c_k)}$
- Naive Bayes classifier: $y = f(x) = \arg\max_{c_k} \dfrac{P(Y = c_k) \prod_j P(X^{(j)} = x^{(j)} \mid Y = c_k)}{\sum_k P(Y = c_k) \prod_j P(X^{(j)} = x^{(j)} \mid Y = c_k)}$
- Since the denominator is the same for every $c_k$: $y = \arg\max_{c_k} P(Y = c_k) \prod_j P(X^{(j)} = x^{(j)} \mid Y = c_k)$
- Parameter estimation for naive Bayes
- Apply maximum likelihood estimation to the required probabilities
- The maximum likelihood estimate of the prior $P(Y = c_k)$ is $P(Y = c_k) = \dfrac{\sum_{i=1}^{N} I(y_i = c_k)}{N}, \quad k = 1, 2, \cdots, K$
- Let the set of possible values of the $j$-th feature $X^{(j)}$ be $\{a_{j1}, a_{j2}, \cdots, a_{jS_j}\}$
- Maximum likelihood estimate of the conditional probabilities: $P(X^{(j)} = a_{jl} \mid Y = c_k) = \dfrac{\sum_{i=1}^{N} I(x_i^{(j)} = a_{jl},\, y_i = c_k)}{\sum_{i=1}^{N} I(y_i = c_k)}$
- Algorithm steps
- Compute the prior and conditional probabilities:
- $P(Y = c_k) = \dfrac{\sum_{i=1}^{N} I(y_i = c_k)}{N}, \quad k = 1, 2, \cdots, K$
- $P(X^{(j)} = a_{jl} \mid Y = c_k) = \dfrac{\sum_{i=1}^{N} I(x_i^{(j)} = a_{jl},\, y_i = c_k)}{\sum_{i=1}^{N} I(y_i = c_k)}$
- $j = 1, 2, \cdots, n; \quad l = 1, 2, \cdots, S_j; \quad k = 1, 2, \cdots, K$
- For a given instance $x = (x^{(1)}, x^{(2)}, \cdots, x^{(n)})^T$
- compute $P(Y = c_k) \prod_{j=1}^{n} P(X^{(j)} = x^{(j)} \mid Y = c_k), \quad k = 1, 2, \cdots, K$
- Determine the class of $x$:
- $y = \arg\max_{c_k} P(Y = c_k) \prod_{j=1}^{n} P(X^{(j)} = x^{(j)} \mid Y = c_k)$
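The two estimation formulas and the final arg-max above can be sketched directly with counting. The toy dataset below is hypothetical, chosen only so the counts are easy to check by hand; with maximum likelihood (no smoothing), an unseen feature value gives probability 0 for that class.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: two discrete features per sample, binary label.
X = [(1, 'S'), (1, 'M'), (1, 'M'), (2, 'S'), (2, 'M'), (2, 'L'), (3, 'L'), (3, 'M')]
y = [-1, -1, 1, -1, 1, 1, 1, 1]

def fit(X, y):
    N = len(y)
    n_c = Counter(y)
    prior = {c: n / N for c, n in n_c.items()}          # P(Y = c_k)
    counts = defaultdict(Counter)                        # per-class counts of (j, a_jl)
    for xi, yi in zip(X, y):
        for j, v in enumerate(xi):
            counts[yi][(j, v)] += 1
    # P(X^(j) = a_jl | Y = c_k) = count(j, a_jl, c_k) / count(c_k)
    cond = {c: {jv: n / n_c[c] for jv, n in cs.items()} for c, cs in counts.items()}
    return prior, cond

def predict(x, prior, cond):
    def score(c):
        p = prior[c]
        for j, v in enumerate(x):
            p *= cond[c].get((j, v), 0.0)  # MLE: unseen value -> probability 0
        return p
    return max(prior, key=score)           # arg max over classes c_k

prior, cond = fit(X, y)
print(predict((2, 'S'), prior, cond))      # -> -1
```

For the query $(2, \mathrm{S})$: class $-1$ scores $\frac{3}{8} \cdot \frac{1}{3} \cdot \frac{2}{3} = \frac{1}{12}$, while class $1$ scores $0$ because $\mathrm{S}$ never occurs with label $1$, so the prediction is $-1$.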
Code exercise
A classification method based on Bayes' theorem and the conditional independence assumption on features.
Models:
- Gaussian model
- Multinomial model
- Bernoulli model
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from collections import Counter
import math
Create the data
# data
def create_data():
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['label'] = iris.target
df.columns = ['sepal length', 'sepal width', 'petal length', 'petal width', 'label']
data = np.array(df.iloc[:100, :])
# print(data)
return data[:,:-1], data[:,-1]
X, y = create_data()
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
Build the model
GaussianNB (Gaussian naive Bayes)
Each feature is assumed to be Gaussian given the class.
Probability density function:
$P(x_i \mid y_k) = \dfrac{1}{\sqrt{2\pi\sigma_{y_k}^2}} \exp\left(-\dfrac{(x_i - \mu_{y_k})^2}{2\sigma_{y_k}^2}\right)$
Mean: $\mu$; variance: $\sigma^2 = \dfrac{\sum (X - \mu)^2}{N}$
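Before wrapping it in a class, the density formula can be sanity-checked on its own: at the mean it should equal $1/(\sqrt{2\pi}\,\sigma)$, and it should integrate to roughly 1. A standalone sketch of the same function used below:

```python
import math

def gaussian_probability(x, mean, stdev):
    # Density of N(mean, stdev^2) at x, matching the formula above.
    exponent = math.exp(-((x - mean) ** 2) / (2 * stdev ** 2))
    return exponent / (math.sqrt(2 * math.pi) * stdev)

# At the mean, the density equals 1 / (sqrt(2*pi) * stdev).
peak = gaussian_probability(0.0, 0.0, 2.0)
assert abs(peak - 1 / (math.sqrt(2 * math.pi) * 2.0)) < 1e-12

# The density integrates to ~1 (crude Riemann sum over +-6 standard deviations).
step = 0.001
total = sum(gaussian_probability(k * step, 0.0, 1.0) * step for k in range(-6000, 6000))
print(round(total, 3))  # -> 1.0
```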
class NaiveBayes:
    def __init__(self):
        self.model = None

    # mean
    @staticmethod
    def mean(X):
        return sum(X) / float(len(X))

    # standard deviation
    def stdev(self, X):
        avg = self.mean(X)
        return math.sqrt(sum([pow(x - avg, 2) for x in X]) / float(len(X)))

    # summarize X_train: (mean, stdev) for each feature column
    def summarize(self, train_data):
        summaries = [(self.mean(i), self.stdev(i)) for i in zip(*train_data)]
        return summaries

    # Gaussian probability density function
    def gaussian_probability(self, x, mean, stdev):
        exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
        return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

    # compute mean and standard deviation per class
    def fit(self, X, y):
        labels = list(set(y))
        data = {label: [] for label in labels}
        for f, label in zip(X, y):
            data[label].append(f)
        # print("grouped by class", data)
        self.model = {label: self.summarize(value) for label, value in data.items()}
        # print("per-class mean and stdev", self.model)
        return 'gaussianNB train done!'

    # per-class likelihood of the input
    def calculate_probabilities(self, input_data):
        probabilities = {}
        for label, value in self.model.items():
            probabilities[label] = 1
            for i in range(len(value)):
                mean, stdev = value[i]
                probabilities[label] *= self.gaussian_probability(input_data[i], mean, stdev)
        return probabilities

    # predicted class: the label with the largest likelihood
    def predict(self, X_test):
        label = sorted(self.calculate_probabilities(X_test).items(), key=lambda x: x[-1])[-1][0]
        return label

    def score(self, X_test, y_test):
        right = 0
        for X, y in zip(X_test, y_test):
            label = self.predict(X)
            if label == y:
                right += 1
        return right / float(len(X_test))
model = NaiveBayes()
model.fit(X_train, y_train)
'gaussianNB train done!'
print(model.predict([4.4, 3.2, 1.3, 0.2]))
0.0
model.score(X_test, y_test)
1.0
scikit-learn example
from sklearn.naive_bayes import GaussianNB
clf = GaussianNB()
clf.fit(X_train, y_train)
GaussianNB(priors=None)
clf.score(X_test, y_test)
1.0
clf.predict([[4.4, 3.2, 1.3, 0.2]])
array([ 0.])
from sklearn.naive_bayes import BernoulliNB, MultinomialNB  # Bernoulli and multinomial models
clf2 = BernoulliNB()
clf2.fit(X_train,y_train)
BernoulliNB(alpha=1.0, binarize=0.0, class_prior=None, fit_prior=True)
clf2.score(X_test, y_test)
0.46666666666666667
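The low BernoulliNB score is expected rather than a bug: with the default `binarize=0.0`, every iris feature (all strictly positive) is binarized to 1, so every sample looks identical and the model can only fall back on the class priors. A quick check, reloading the same iris subset as `create_data` above:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import binarize

X = load_iris().data[:100]
Xb = binarize(X, threshold=0.0)  # what BernoulliNB(binarize=0.0) sees internally
print(np.unique(Xb))             # every feature value collapses to 1
```

A larger `binarize` threshold (or MultinomialNB, which uses the raw counts) keeps the features informative, which is why the scores differ so sharply here.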
clf3 = MultinomialNB()
clf3.fit(X_train,y_train)
MultinomialNB(alpha=1.0, class_prior=None, fit_prior=True)
clf3.score(X_test, y_test)
1.0
clf3.predict([[4.4, 3.2, 1.3, 0.2]])
array([ 0.])