Following Mahout in Action (《mahout实战》), the code for building and evaluating boolean preference data is as follows:
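import java.io.File;
import java.io.IOException;
import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.DataModelBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.common.FastByIDMap;
import org.apache.mahout.cf.taste.impl.eval.AverageAbsoluteDifferenceRecommenderEvaluator;
import org.apache.mahout.cf.taste.impl.model.GenericBooleanPrefDataModel;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.LogLikelihoodSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.model.PreferenceArray;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;

public static void booleanPrefEvaluator() throws IOException, TasteException {
    // Load the ratings file, then drop the rating values to get a boolean model.
    DataModel model = new GenericBooleanPrefDataModel(
            GenericBooleanPrefDataModel.toDataMap(
                    new FileDataModel(new File("F:\\mahout\\grouplens\\ml-100k\\ua.base"))));
    RecommenderEvaluator evaluator = new AverageAbsoluteDifferenceRecommenderEvaluator();
    RecommenderBuilder builder = new RecommenderBuilder() {
        @Override
        public Recommender buildRecommender(DataModel trainingModel) throws TasteException {
            // Use the training model handed in by the evaluator, not the full outer model.
            // UserSimilarity similarity = new PearsonCorrelationSimilarity(trainingModel); // throws on boolean data
            UserSimilarity similarity = new LogLikelihoodSimilarity(trainingModel);
            UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, trainingModel);
            return new GenericUserBasedRecommender(trainingModel, neighborhood, similarity);
        }
    };
    DataModelBuilder modelBuilder = new DataModelBuilder() {
        @Override
        public DataModel buildDataModel(FastByIDMap<PreferenceArray> trainingData) {
            // The constructor expects a FastByIDMap<FastIDSet>, which toDataMap produces.
            return new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(trainingData));
        }
    };
    // Train on 90% of each user's data, test on the rest, using 100% of the users.
    double score = evaluator.evaluate(builder, modelBuilder, model, 0.9, 1.0);
    System.out.println("Average difference: " + score);
}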
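To make the conversion step concrete, here is a minimal sketch in plain Java of what dropping the rating values amounts to. It does not use Mahout; the class name DropRatingsSketch, the method toBooleanData and the nested-map layout are made up for illustration, and the real toDataMap works on FastByIDMap and FastIDSet rather than on java.util collections.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not Mahout's implementation.
class DropRatingsSketch {
    /** Keeps only which items each user has a preference for; rating values are discarded. */
    static Map<Long, Set<Long>> toBooleanData(Map<Long, Map<Long, Double>> ratings) {
        Map<Long, Set<Long>> itemsByUser = new HashMap<>();
        for (Map.Entry<Long, Map<Long, Double>> entry : ratings.entrySet()) {
            // Copy the item IDs so the result does not share state with the input map.
            itemsByUser.put(entry.getKey(), new HashSet<>(entry.getValue().keySet()));
        }
        return itemsByUser;
    }

    public static void main(String[] args) {
        Map<Long, Double> user1 = new HashMap<>();
        user1.put(10L, 5.0); // user 1 rated item 10 with 5.0
        user1.put(20L, 2.0); // user 1 rated item 20 with 2.0
        Map<Long, Map<Long, Double>> ratings = new HashMap<>();
        ratings.put(1L, user1);

        // After conversion only the association survives, not the 5.0 or 2.0.
        System.out.println(toBooleanData(ratings).get(1L).contains(20L)); // prints true
    }
}
```

After this step a rating of 5.0 and a rating of 2.0 are indistinguishable: both are simply "user 1 has item X".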
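In a boolean model the preference values are not actually missing; every preference is reported as the same placeholder value, 1.0.
Pearson correlation is undefined when all values are identical, and PearsonCorrelationSimilarity rejects such a model in its constructor, so using it produces the following error: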
Exception in thread "main" java.lang.IllegalArgumentException: DataModel doesn't have preference values
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:125)
at org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity.<init>(PearsonCorrelationSimilarity.java:74)
at org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity.<init>(PearsonCorrelationSimilarity.java:66)
at test.mahout.recommendation.BooleanPreferenceRecommender$1.buildRecommender(BooleanPreferenceRecommender.java:43)
at org.apache.mahout.cf.taste.impl.eval.AbstractDifferenceRecommenderEvaluator.evaluate(AbstractDifferenceRecommenderEvaluator.java:125)
at test.mahout.recommendation.BooleanPreferenceRecommender.booleanPrefEvaluator(BooleanPreferenceRecommender.java:60)
at test.mahout.recommendation.BooleanPreferenceRecommender.main(BooleanPreferenceRecommender.java:138)
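The exception above comes from a precondition check, but the underlying reason Pearson correlation cannot work here can be shown with a few lines of plain Java. This is a textbook Pearson formula written for illustration, assuming two equal-length rating vectors; the class name PearsonSketch is hypothetical and this is not Mahout's code.

```java
// Hypothetical illustration only; a textbook Pearson correlation, not Mahout's implementation.
class PearsonSketch {
    static double pearson(double[] x, double[] y) {
        int n = x.length;
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double covariance = 0.0;
        double varianceX = 0.0;
        double varianceY = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            covariance += dx * dy;
            varianceX += dx * dx;
            varianceY += dy * dy;
        }
        // With constant inputs both variances are 0, so this is 0.0 / 0.0.
        return covariance / Math.sqrt(varianceX * varianceY);
    }

    public static void main(String[] args) {
        double[] allOnes = {1.0, 1.0, 1.0};
        System.out.println(pearson(allOnes, allOnes)); // prints NaN

        double[] a = {1.0, 2.0, 3.0};
        double[] b = {2.0, 4.0, 6.0};
        System.out.println(pearson(a, b)); // prints 1.0
    }
}
```

When every value is 1.0 the variance is zero and the correlation has no defined value, which is why a similarity measure that ignores the values, such as log-likelihood, is needed for boolean data.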
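After switching to LogLikelihoodSimilarity, the evaluation runs and returns 0.0.
This score is meaningless, however: every actual preference and every estimated preference is 1.0, so the difference between them is always zero.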
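The 0.0 score can be reproduced by hand. The sketch below averages the absolute differences between estimated and actual preferences, which is the idea behind an average-absolute-difference evaluation; the class name MeanAbsDiffSketch and the sample arrays are assumptions for illustration, not Mahout code or data from the run above.

```java
// Hypothetical illustration only; not Mahout's evaluator.
class MeanAbsDiffSketch {
    /** Mean of |estimated - actual| over all test preferences. */
    static double averageAbsoluteDifference(double[] estimated, double[] actual) {
        double total = 0.0;
        for (int i = 0; i < estimated.length; i++) {
            total += Math.abs(estimated[i] - actual[i]);
        }
        return total / estimated.length;
    }

    public static void main(String[] args) {
        // In a boolean model every actual preference is the placeholder 1.0,
        // and every estimate built from those values is 1.0 as well.
        double[] actual = {1.0, 1.0, 1.0, 1.0};
        double[] estimated = {1.0, 1.0, 1.0, 1.0};
        System.out.println(averageAbsoluteDifference(estimated, actual)); // prints 0.0
    }
}
```

A perfect-looking score of 0.0 therefore says nothing about recommendation quality; it only reflects that there is no rating value to get wrong.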
Precision and recall, on the other hand, can still be used for evaluation. The code is as follows:
// Additional imports for this method:
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;

public static void preRecallBooleanPrefEvaluator() throws TasteException, IOException {
    // The GenericBooleanPrefDataModel constructor that takes a DataModel is deprecated in 0.9;
    // the toDataMap-based constructor on the next line can be used instead.
    DataModel model_old = new GenericBooleanPrefDataModel(
            new FileDataModel(new File("F:\\mahout\\grouplens\\ml-100k\\ua.base")));
    DataModel model = new GenericBooleanPrefDataModel(
            GenericBooleanPrefDataModel.toDataMap(
                    new FileDataModel(new File("F:\\mahout\\grouplens\\ml-100k\\ua.base"))));
    // To evaluate boolean data, the model itself must be a boolean model.
    // Building it with the plain constructor below gives incorrect results:
    // DataModel model = new FileDataModel(new File("F:\\mahout\\grouplens\\ml-100k\\ua.base"));
    RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();
    RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
        @Override
        public Recommender buildRecommender(DataModel trainingModel) throws TasteException {
            UserSimilarity similarity = new LogLikelihoodSimilarity(trainingModel);
            UserNeighborhood neighborhood = new NearestNUserNeighborhood(10, similarity, trainingModel);
            return new GenericUserBasedRecommender(trainingModel, neighborhood, similarity);
            // return new GenericBooleanPrefUserBasedRecommender(trainingModel, neighborhood, similarity);
        }
    };
    DataModelBuilder modelBuilder = new DataModelBuilder() {
        @Override
        public DataModel buildDataModel(FastByIDMap<PreferenceArray> trainingData) {
            return new GenericBooleanPrefDataModel(GenericBooleanPrefDataModel.toDataMap(trainingData));
        }
    };
    // Evaluate precision and recall at 10 recommendations, over 100% of the users.
    IRStatistics stats = evaluator.evaluate(recommenderBuilder, modelBuilder, model_old, null, 10,
            GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);
    System.out.println("准确率:" + stats.getPrecision()); // label reads "precision"
    System.out.println("召回率" + stats.getRecall());      // label reads "recall"
}
The model_old on the first line uses the constructor from the book. Running the code with it prints (准确率 is precision, 召回率 is recall):
准确率:0.24496288441145259
召回率0.24496288441145259
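Because that constructor is deprecated in mahout-0.9, the constructor on the second line should be used instead. Note that the model must still be built as a boolean model.
The result is the same as above.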
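For readers unfamiliar with the two metrics, the sketch below computes precision and recall for a single user from a list of recommended item IDs and a set of relevant item IDs. It is a plain-Java illustration with hypothetical names (PrecisionRecallSketch, hits), not the logic of GenericRecommenderIRStatsEvaluator. It also shows one possible reading of why the two numbers printed above are identical: when the number of relevant items equals the number of recommendations, both ratios share the same numerator and denominator. Whether that is exactly what happens inside the Mahout evaluator is not confirmed here.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not Mahout's evaluator.
class PrecisionRecallSketch {
    /** Number of recommended items that are also relevant. */
    static int hits(List<Long> recommended, Set<Long> relevant) {
        int count = 0;
        for (Long itemId : recommended) {
            if (relevant.contains(itemId)) {
                count++;
            }
        }
        return count;
    }

    /** Fraction of the recommendations that are relevant. */
    static double precision(List<Long> recommended, Set<Long> relevant) {
        return recommended.isEmpty() ? Double.NaN
                : (double) hits(recommended, relevant) / recommended.size();
    }

    /** Fraction of the relevant items that were recommended. */
    static double recall(List<Long> recommended, Set<Long> relevant) {
        return relevant.isEmpty() ? Double.NaN
                : (double) hits(recommended, relevant) / relevant.size();
    }

    public static void main(String[] args) {
        List<Long> recommended = Arrays.asList(1L, 2L, 3L, 4L);
        Set<Long> relevant = new HashSet<>(Arrays.asList(2L, 4L, 6L, 8L));
        // 2 hits out of 4 recommendations and out of 4 relevant items.
        System.out.println(precision(recommended, relevant)); // prints 0.5
        System.out.println(recall(recommended, relevant));    // prints 0.5
    }
}
```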
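The code above returns new GenericUserBasedRecommender(model, neighborhood, similarity);
This recommender still ranks items by estimated preference, but every estimated preference is 1.0, so the ordering is effectively random. I therefore changed it to
return new GenericBooleanPrefUserBasedRecommender(model, neighborhood, similarity);
The result is: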
准确率:0.22926829268292725
召回率0.22926829268292725
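As the numbers show, the result did not improve; it is in fact slightly lower. The point of this example is to look at how boolean data can be handled effectively in Mahout.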
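The ranking problem described above can be illustrated without Mahout. In the sketch below, a similarity-weighted average of neighbors' preferences always comes out as 1.0 on boolean data, so it cannot order items, while summing the similarities of the neighbors who have the item does produce distinct scores. This follows the general idea behind a boolean-aware user-based recommender; the class name BooleanScoringSketch, the method names and the similarity values are all assumptions for illustration, not Mahout's code.

```java
// Hypothetical illustration only; not Mahout's implementation.
class BooleanScoringSketch {
    /** Similarity-weighted average of neighbor preferences, where every preference is 1.0. */
    static double weightedAverage(double[] similarities, boolean[] neighborHasItem) {
        double weightedSum = 0.0;
        double similaritySum = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            if (neighborHasItem[i]) {
                weightedSum += similarities[i] * 1.0; // boolean data: the preference is always 1.0
                similaritySum += similarities[i];
            }
        }
        return similaritySum == 0.0 ? Double.NaN : weightedSum / similaritySum;
    }

    /** Sum of similarities of the neighbors that have the item. */
    static double similaritySum(double[] similarities, boolean[] neighborHasItem) {
        double total = 0.0;
        for (int i = 0; i < similarities.length; i++) {
            if (neighborHasItem[i]) {
                total += similarities[i];
            }
        }
        return total;
    }

    public static void main(String[] args) {
        double[] similarities = {0.9, 0.5, 0.2};  // three neighbors of the target user
        boolean[] itemA = {true, true, false};    // the two closest neighbors have item A
        boolean[] itemB = {false, false, true};   // only the most distant neighbor has item B

        // The weighted average cannot tell the items apart: both print 1.0.
        System.out.println(weightedAverage(similarities, itemA));
        System.out.println(weightedAverage(similarities, itemB));

        // The similarity sum ranks item A above item B.
        System.out.println(similaritySum(similarities, itemA) > similaritySum(similarities, itemB)); // prints true
    }
}
```

A more meaningful ranking does not guarantee a better precision or recall on a particular data set, which is consistent with the numbers reported above.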