Recommender Systems: Learning Mahout (Part 6)

LL读书人

Published 2019-01-10 17:54:38

License: CC 4.0 BY-SA

Category: Recommendation. Tags: IR, Mahout, hadoop, recommender system evaluation, Precision, Recall

Article link: https://blog.youkuaiyun.com/jianggeye7485/article/details/86244920

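This article discusses how to evaluate a recommender system with information retrieval (IR) metrics, focusing on precision, recall, and the F1 measure. Using the IRStatistics API of the Mahout library, it explains the key parameters the evaluation needs, such as the number of recommendations (at) and the relevance threshold (relevanceThreshold), and shows how to build and apply a recommender.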

Evaluating a recommender system with information retrieval (IR) metrics

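IR metrics can be used to judge the quality of a recommender system. An IR evaluation yields the precision, the recall, and the F-measure, which is the weighted harmonic mean of precision and recall.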

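Precision: the fraction of recommended items that are relevant. Here TP is the number of relevant items that were recommended and FP is the number of irrelevant items that were recommended:

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

Recall: the fraction of relevant items that were recommended. Here FN is the number of relevant items that were not recommended:

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

F-measure: the weighted harmonic mean of precision and recall, where β controls how much weight recall receives relative to precision:

$$F_\beta = \frac{(1 + \beta^2) \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\beta^2 \cdot \mathrm{Precision} + \mathrm{Recall}}$$

When β = 1 this becomes the F1 measure:

$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$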
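To make the definitions of precision, recall, and the F-measure concrete, here is a minimal sketch in plain Java for a single user. The class IrMetricsSketch, its method names, and the item IDs are hypothetical and are not part of Mahout; the sketch assumes precision is computed over the first `at` recommendations and recall over all of the user's relevant items.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper (not part of Mahout) that computes IR metrics for one user. */
public class IrMetricsSketch {

    /** Counts how many of the first `at` recommended item IDs are relevant. */
    public static int hits(List<Long> recommended, Set<Long> relevant, int at) {
        int hits = 0;
        int limit = Math.min(at, recommended.size());
        for (int i = 0; i < limit; i++) {
            if (relevant.contains(recommended.get(i))) {
                hits++;
            }
        }
        return hits;
    }

    /** Precision at N: hits divided by the number of recommendations considered. */
    public static double precision(List<Long> recommended, Set<Long> relevant, int at) {
        int considered = Math.min(at, recommended.size());
        return considered == 0 ? 0.0 : (double) hits(recommended, relevant, at) / considered;
    }

    /** Recall at N: hits divided by the total number of relevant items. */
    public static double recall(List<Long> recommended, Set<Long> relevant, int at) {
        return relevant.isEmpty() ? 0.0 : (double) hits(recommended, relevant, at) / relevant.size();
    }

    /** Weighted harmonic mean of precision and recall; beta = 1 gives F1. */
    public static double fMeasure(double precision, double recall, double beta) {
        double betaSquared = beta * beta;
        double denominator = betaSquared * precision + recall;
        return denominator == 0.0 ? 0.0 : (1 + betaSquared) * precision * recall / denominator;
    }

    public static void main(String[] args) {
        // Five recommendations, three of which are among the six relevant items.
        List<Long> recommended = Arrays.asList(10L, 11L, 12L, 13L, 14L);
        Set<Long> relevant = new HashSet<>(Arrays.asList(10L, 12L, 14L, 20L, 21L, 22L));

        double p = precision(recommended, relevant, 5); // 3 hits out of 5 recommendations
        double r = recall(recommended, relevant, 5);    // 3 hits out of 6 relevant items
        System.out.println("precision@5 = " + p);
        System.out.println("recall@5    = " + r);
        System.out.println("F1          = " + fMeasure(p, r, 1.0));
    }
}
```

In this example precision is 3/5 and recall is 3/6, so a recommender can score well on one metric and poorly on the other, which is why the F-measure combines the two.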

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.common.RandomUtils;

import java.io.File;

public class IREvaluatorIntro {
    private IREvaluatorIntro(){
    }

    public static void main(String[] args) throws Exception{
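        // Fix the random seed so that repeated runs pick the same random data and produce the same metrics.
        // For testing only; do not use this in production code.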
        RandomUtils.useTestSeed();

        final DataModel model = new FileDataModel(new File("/root/data/a.base"));
        RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();

        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
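                // Pearson correlation: a statistic that measures how linearly correlated two users' preferences are
                UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                // Nearest-N neighborhood; the arguments are the number of neighbors, the user similarity, and the data model
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);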
                return new GenericUserBasedRecommender(model,neighborhood,similarity);
            }
        };
        // Evaluate at 5 recommendations, let the evaluator choose the relevance threshold, and use all users (1.0)
        IRStatistics stats = evaluator.evaluate(recommenderBuilder, null, model, null, 5,
                GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, 1.0);

        System.out.println(stats.getPrecision()); // precision
        System.out.println(stats.getRecall());    // recall
        System.out.println(stats.getF1Measure()); // F1 measure (see F-measure above)
    }
}

Output:

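The API documentation describes the evaluate method of RecommenderIRStatsEvaluator, which returns an IRStatistics object, as follows: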
IRStatistics evaluate(RecommenderBuilder recommenderBuilder,
                      DataModelBuilder dataModelBuilder,
                      DataModel dataModel,
                      IDRescorer rescorer,
                      int at,
                      double relevanceThreshold,
                      double evaluationPercentage)
    throws TasteException
Parameters:
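recommenderBuilder - object that can build a Recommender to test. It defines how the recommender is created, through public Recommender buildRecommender(DataModel model).
dataModelBuilder - DataModelBuilder to use, or if null, a default DataModel implementation will be used. It defines how the data model is created; when the data model has already been built, this argument is usually null.
dataModel - dataset to test on. This is the data model of the recommender system.
rescorer - if any, to use when computing recommendations. It adjusts how recommendations are ranked; when no rescoring is needed, it is usually null.
at - as in, “precision at 5”. The number of recommendations to consider when evaluating precision, etc. It is the number of top recommendations over which precision is measured, so choose it according to how many items you intend to recommend.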
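relevanceThreshold - items whose preference value is at least this value are considered “relevant” for the purposes of computations. Together with at, it determines which items count as relevant hits. Passing GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, as in the example, lets the evaluator choose a threshold for each user.
evaluationPercentage - not described in the quoted documentation. It is the fraction of users, between 0 and 1, that is used for the evaluation; the example passes 1.0 to use all users.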
Returns:
IRStatistics with resulting precision, recall, etc.
Throws:
TasteException - if an error occurs while accessing the DataModel
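The effect of relevanceThreshold can be illustrated with a small standalone sketch. It assumes that, when CHOOSE_THRESHOLD is passed, the evaluator picks a per-user threshold equal to the mean of that user's preference values plus one sample standard deviation. That is my understanding of GenericRecommenderIRStatsEvaluator, not something stated in the documentation quoted above, so treat it as an assumption. The class RelevanceThresholdSketch and its methods are hypothetical and are not Mahout code.

```java
/** Hypothetical sketch (not Mahout source) of a per-user relevance threshold. */
public class RelevanceThresholdSketch {

    /**
     * Returns mean + sample standard deviation of one user's preference values.
     * Assumption: this mirrors how the evaluator chooses a threshold on its own.
     */
    public static double chooseThreshold(double[] prefs) {
        if (prefs.length < 2) {
            // Too few preferences to estimate a spread: every item counts as relevant.
            return Double.NEGATIVE_INFINITY;
        }
        double sum = 0.0;
        for (double p : prefs) {
            sum += p;
        }
        double mean = sum / prefs.length;

        double squaredDeviations = 0.0;
        for (double p : prefs) {
            squaredDeviations += (p - mean) * (p - mean);
        }
        double stdDev = Math.sqrt(squaredDeviations / (prefs.length - 1));
        return mean + stdDev;
    }

    /** Counts the preferences that are at least the threshold, i.e. the "relevant" items. */
    public static int countRelevant(double[] prefs, double threshold) {
        int count = 0;
        for (double p : prefs) {
            if (p >= threshold) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        double[] prefs = {1.0, 2.0, 3.0, 4.0, 5.0}; // one user's ratings
        double chosen = chooseThreshold(prefs);
        System.out.println("chosen threshold = " + chosen);
        System.out.println("relevant with chosen threshold = " + countRelevant(prefs, chosen));
        System.out.println("relevant with fixed threshold 3.0 = " + countRelevant(prefs, 3.0));
    }
}
```

A fixed threshold treats all users alike, whereas a per-user threshold adapts to users who rate systematically high or low. Which one suits a given dataset is a judgment call.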