1 Bayes Trainer
Package: org.apache.mahout.classifier.bayes
Implementation
The implementation is divided into three parts:
- The Trainer: responsible for counting the words and the labels
- The Model: responsible for holding the training data in a useful way
- The Classifier: responsible for using the trainer's output to determine the category of previously unseen documents
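To make the division of labor concrete, here is a minimal sketch of the three roles in plain Python. It is not Mahout's code: the names `train`, `Model`, and `classify` are hypothetical, the data lives in memory rather than in Hadoop, and add-one (Laplace) smoothing is an assumption of this sketch rather than something the original states.

```python
import math
from collections import Counter, defaultdict


def train(documents):
    """Trainer role: count labels and per-label word occurrences."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    for label, words in documents:
        label_counts[label] += 1
        word_counts[label].update(words)
    return label_counts, word_counts


class Model:
    """Model role: hold the counts in a form the classifier can query."""

    def __init__(self, label_counts, word_counts):
        self.label_counts = label_counts
        self.word_counts = word_counts
        self.vocabulary = {w for c in word_counts.values() for w in c}

    def log_prior(self, label):
        return math.log(self.label_counts[label] / sum(self.label_counts.values()))

    def log_likelihood(self, label, word):
        # Add-one smoothing (an assumption of this sketch) keeps unseen
        # words from producing log(0).
        counts = self.word_counts[label]
        total = sum(counts.values()) + len(self.vocabulary)
        return math.log((counts[word] + 1) / total)


def classify(model, words):
    """Classifier role: return the label with the highest log score."""
    def score(label):
        return model.log_prior(label) + sum(
            model.log_likelihood(label, w) for w in words
        )
    return max(model.label_counts, key=score)


docs = [
    ("hockey", ["puck", "stick", "goalie", "ice", "helmet"]),
    ("football", ["field", "pigskin", "tackle", "turf", "helmet"]),
]
model = Model(*train(docs))
print(classify(model, ["ice", "puck", "referee"]))  # prints: hockey
```

The trainer only counts, the model only stores and exposes those counts, and the classifier only reads from the model, which mirrors the separation described above.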
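1 Trainer
The trainer is implemented across several classes:
- Creates the Hadoop Bayes job and outputs the model. This class wraps four map/reduce classes.
The trainer's input is in KeyValueTextInputFormat: the first token on each line is the class label and the remaining tokens are the features (words), as in the following example: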
hockey puck stick goalie forward defenseman referee ice checking slapshot helmet
football field football pigskin referee helmet turf tackle
Here hockey and football are the class labels, and the remaining words are the features.
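The line format can be illustrated with a short parsing sketch. The helper `parse_line` is hypothetical and not part of Mahout. It splits on any whitespace because the sample lines here show spaces, whereas Hadoop's KeyValueTextInputFormat separates the key from the value at the first tab by default.

```python
def parse_line(line):
    """Split one training line into (label, features).

    Assumes whitespace-separated tokens with the class label first.
    """
    tokens = line.split()
    if not tokens:
        raise ValueError("empty training line")
    return tokens[0], tokens[1:]


lines = [
    "hockey puck stick goalie forward defenseman referee ice checking slapshot helmet",
    "football field football pigskin referee helmet turf tackle",
]

for line in lines:
    label, features = parse_line(line)
    print(label, len(features))
```

Note that in the second line the word football appears twice: the first occurrence is the label, and the second is an ordinary feature.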



