想和数据挖掘沾点边,所以最近在复习一些算法,因为又学了点R,深感这是个统计分析挖掘的利器,所以想用R实现一些挖掘算法。
朴素贝叶斯法大概是最简单的一种挖掘算法了,《统计学习方法》在第四章做了很详细的叙述,无非是对于输入特征x,利用通过学习得到的模型计算后验概率分布,将后验概率最大的分类作为输出。
根据贝叶斯定理,后验概率P(Y=cx | X=x) = 条件概率P(X=x | Y=cx) * 先验概率P(Y = ck) / P(X=x),取P(X=x | Y=cx) * P(Y = ck)最大的分类作为输出。
下面是一个小数据集下使用R进行朴素贝叶斯分类的例子,参考自博客,代码如下:
#构造训练集
data <- matrix(c("sunny","hot","high","weak","no",
"sunny","hot","high","strong","no",
"overcast","hot","high","weak","yes",
"rain","mild","high","weak","yes",
"rain","cool","normal","weak","yes",
"rain","cool","normal","strong","no",
"overcast","cool","normal","strong","yes",
"sunny","mild","high","weak","no",