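Today we look at how to build the trainer for a naive Bayes classifier. This post continues the previous one, Bayesian Classification (Part 1), http://blog.youkuaiyun.com/xueyunf/article/details/9243481. As before, the code comes first: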
from numpy import zeros

def trainNB0(trainMatrix, trainCategory):
    numTrainDocs = len(trainMatrix)        # number of training documents
    numWords = len(trainMatrix[0])         # vocabulary size
    # prior probability of class 1
    pAbusive = sum(trainCategory) / float(numTrainDocs)
    # per-word counts for each class; for smoothing, change to ones()
    p0Num = zeros(numWords)
    p1Num = zeros(numWords)
    # total word count for each class; for smoothing, change to 2.0
    p0Denom = 0.0
    p1Denom = 0.0
    for i in range(numTrainDocs):
        if trainCategory[i] == 1:
            p1Num += trainMatrix[i]
            p1Denom += sum(trainMatrix[i])
        else:
            p0Num += trainMatrix[i]
            p0Denom += sum(trainMatrix[i])
    # conditional probability of each word given the class;
    # change to log() of these ratios before passing them to classifyNB
    p1Vect = p1Num / p1Denom
    p0Vect = p0Num / p0Denom
    return p0Vect, p1Vect, pAbusive
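The input is the document-word matrix (one row per document, one column per vocabulary word) together with the class label of each document. The function estimates, for every word, its conditional probability given each class, and returns the two probability vectors along with the prior probability of class 1. This is fairly simple, so I won't explain it at length.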
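To make the estimate concrete, here is a small sketch of the same counting step applied to a toy matrix. The function name train_nb_counts and the toy data are made up for illustration, and the vectorized NumPy form is a rewrite of the loop, not the original code.

```python
import numpy as np

def train_nb_counts(train_matrix, train_category):
    """Unsmoothed estimate, equivalent in spirit to the loop version (hypothetical helper)."""
    X = np.asarray(train_matrix, dtype=float)
    y = np.asarray(train_category)
    p_class1 = y.mean()                      # prior P(class = 1)
    p1_num = X[y == 1].sum(axis=0)           # per-word counts in class 1
    p0_num = X[y == 0].sum(axis=0)           # per-word counts in class 0
    p1_vect = p1_num / p1_num.sum()          # P(word | class = 1)
    p0_vect = p0_num / p0_num.sum()          # P(word | class = 0)
    return p0_vect, p1_vect, p_class1

# Toy data: 4 documents over a 3-word vocabulary (illustrative only).
toy_matrix = [[1, 1, 0],
              [0, 1, 1],
              [1, 0, 0],
              [0, 0, 1]]
toy_labels = [1, 1, 0, 0]

p0_vect, p1_vect, p_class1 = train_nb_counts(toy_matrix, toy_labels)
print(p0_vect, p1_vect, p_class1)
# Class 1 word counts are [1, 2, 1] out of 4 words, so p1_vect is [0.25, 0.5, 0.25].
# Class 0 word counts are [1, 0, 1] out of 2 words, so p0_vect is [0.5, 0.0, 0.5].
```

The second word never appears in class 0, so its estimated probability is exactly zero. That zero is the reason the comments in trainNB0 suggest switching to ones() and later taking log().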
Next, let's write the classification code:
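from numpy import log

def classifyNB(vec2Classify, p0Vec, p1Vec, pClass1):
    # p0Vec and p1Vec are expected to hold log probabilities,
    # so summing them corresponds to multiplying the probabilities
    p1 = sum(vec2Classify * p1Vec) + log(pClass1)
    p0 = sum(vec2Classify * p0Vec) + log(1.0 - pClass1)
    if p1 > p0:
        return 1
    else:
        return 0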
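This is also very simple: it computes a match score for each class and returns the label of the class with the higher score. Note that the scores are sums of log probabilities plus the log prior, so p0Vec and p1Vec must hold log probabilities, that is, the output of trainNB0 after the log() change mentioned in its comments.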
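The sketch below puts training and classification together, assuming the smoothed, log-probability variant of the trainer (counts starting at 1, denominators starting at 2.0). The names train_nb_log and classify_nb and the toy data are hypothetical and only serve the illustration.

```python
import numpy as np

def train_nb_log(train_matrix, train_category):
    """Smoothed trainer returning log probabilities (assumed variant, not the code above)."""
    X = np.asarray(train_matrix, dtype=float)
    y = np.asarray(train_category)
    p_class1 = y.mean()                          # prior P(class = 1)
    p1_num = 1.0 + X[y == 1].sum(axis=0)         # counts start at 1 to avoid zeros
    p0_num = 1.0 + X[y == 0].sum(axis=0)
    p1_denom = 2.0 + X[y == 1].sum()             # denominators start at 2.0
    p0_denom = 2.0 + X[y == 0].sum()
    return np.log(p0_num / p0_denom), np.log(p1_num / p1_denom), p_class1

def classify_nb(vec, p0_vec, p1_vec, p_class1):
    """Return 1 if the class-1 log score is higher, else 0."""
    vec = np.asarray(vec, dtype=float)
    p1 = np.sum(vec * p1_vec) + np.log(p_class1)
    p0 = np.sum(vec * p0_vec) + np.log(1.0 - p_class1)
    return 1 if p1 > p0 else 0

# Toy data: 4 documents over a 3-word vocabulary (illustrative only).
toy_matrix = [[1, 1, 0],
              [0, 1, 1],
              [1, 0, 0],
              [0, 0, 1]]
toy_labels = [1, 1, 0, 0]

p0_vec, p1_vec, p_class1 = train_nb_log(toy_matrix, toy_labels)
print(classify_nb([0, 1, 0], p0_vec, p1_vec, p_class1))  # the middle word is more typical of class 1
print(classify_nb([1, 0, 1], p0_vec, p1_vec, p_class1))  # the outer words are more typical of class 0
```

Working in log space keeps the product of many small probabilities from underflowing to zero, and the smoothing keeps log() from ever seeing a zero count.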