Recommender Systems: Learning Mahout (Part 6)

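This article discusses how to evaluate a recommender system with information retrieval (IR) metrics, focusing on precision, recall, and the F1 measure. Using the IRStatistics API of the Mahout library, it explains the key evaluation parameters, such as the number of recommendations (at) and the relevance threshold (relevanceThreshold), and shows how to build and apply a recommender.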

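Evaluating a recommender system with information retrieval (IR) metrics

IR-based evaluation yields three figures for a recommender: precision, recall, and the F-measure (the weighted harmonic mean of precision and recall).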

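Precision, the fraction of recommended items that are relevant:
$$\text{Precision} = \frac{|\{\text{relevant items}\} \cap \{\text{recommended items}\}|}{|\{\text{recommended items}\}|}$$

Recall, the fraction of relevant items that are recommended:
$$\text{Recall} = \frac{|\{\text{relevant items}\} \cap \{\text{recommended items}\}|}{|\{\text{relevant items}\}|}$$

F-measure, the weighted harmonic mean of precision (P) and recall (R):
$$F_\beta = \frac{(1 + \beta^2) \cdot P \cdot R}{\beta^2 \cdot P + R}$$

With β = 1 this becomes the F1 measure:
$$F_1 = \frac{2 \cdot P \cdot R}{P + R}$$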
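To make the three metrics concrete, here is a minimal sketch in plain Java that computes them for a single user. It does not use Mahout; the class name IrMetricsSketch, its methods, and the sample item IDs are all assumptions made for illustration, and the recommended list is assumed to contain no duplicates.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper class for illustration only; not part of Mahout.
class IrMetricsSketch {

    // Number of recommended items that are also relevant.
    static int hits(List<Long> recommended, Set<Long> relevant) {
        int count = 0;
        for (Long itemId : recommended) {
            if (relevant.contains(itemId)) {
                count++;
            }
        }
        return count;
    }

    // Precision: relevant recommended items / all recommended items.
    static double precision(List<Long> recommended, Set<Long> relevant) {
        if (recommended.isEmpty()) {
            return 0.0;
        }
        return (double) hits(recommended, relevant) / recommended.size();
    }

    // Recall: relevant recommended items / all relevant items.
    static double recall(List<Long> recommended, Set<Long> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        return (double) hits(recommended, relevant) / relevant.size();
    }

    // F-measure: weighted harmonic mean of precision and recall; beta = 1 gives F1.
    static double fMeasure(double precision, double recall, double beta) {
        double betaSquared = beta * beta;
        double denominator = betaSquared * precision + recall;
        if (denominator == 0.0) {
            return 0.0;
        }
        return (1 + betaSquared) * precision * recall / denominator;
    }

    public static void main(String[] args) {
        // Made-up data: 5 recommended items, 4 relevant items, 2 in common (items 2 and 5).
        List<Long> recommended = Arrays.asList(1L, 2L, 3L, 4L, 5L);
        Set<Long> relevant = new HashSet<>(Arrays.asList(2L, 5L, 9L, 10L));

        double p = precision(recommended, relevant); // 2 hits out of 5 recommended
        double r = recall(recommended, relevant);    // 2 hits out of 4 relevant
        System.out.println("precision = " + p);
        System.out.println("recall    = " + r);
        System.out.println("F1        = " + fMeasure(p, r, 1.0));
    }
}
```

The guards for empty inputs return 0.0 as a convention chosen for this sketch; an evaluator may instead skip such users.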

import org.apache.mahout.cf.taste.common.TasteException;
import org.apache.mahout.cf.taste.eval.IRStatistics;
import org.apache.mahout.cf.taste.eval.RecommenderBuilder;
import org.apache.mahout.cf.taste.eval.RecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.eval.GenericRecommenderIRStatsEvaluator;
import org.apache.mahout.cf.taste.impl.model.file.FileDataModel;
import org.apache.mahout.cf.taste.impl.neighborhood.NearestNUserNeighborhood;
import org.apache.mahout.cf.taste.impl.recommender.GenericUserBasedRecommender;
import org.apache.mahout.cf.taste.impl.similarity.PearsonCorrelationSimilarity;
import org.apache.mahout.cf.taste.model.DataModel;
import org.apache.mahout.cf.taste.neighborhood.UserNeighborhood;
import org.apache.mahout.cf.taste.recommender.Recommender;
import org.apache.mahout.cf.taste.similarity.UserSimilarity;
import org.apache.mahout.common.RandomUtils;

import java.io.File;

public class IREvaluatorIntro {
    private IREvaluatorIntro(){
    }

    public static void main(String[] args) throws Exception{
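        // Fix the random seed so that repeated runs use the same random split and give the same metrics.
        // For testing only; do not use this in production code.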
        RandomUtils.useTestSeed();

        final DataModel model = new FileDataModel(new File("/root/data/a.base"));
        RecommenderIRStatsEvaluator evaluator = new GenericRecommenderIRStatsEvaluator();

        RecommenderBuilder recommenderBuilder = new RecommenderBuilder() {
            @Override
            public Recommender buildRecommender(DataModel model) throws TasteException {
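                // Pearson correlation: a statistic measuring the degree of linear correlation between two variables
                UserSimilarity similarity = new PearsonCorrelationSimilarity(model);
                // Nearest-N neighborhood. The three arguments are: number of neighbors, user similarity, data model
                UserNeighborhood neighborhood = new NearestNUserNeighborhood(100, similarity, model);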
                return new GenericUserBasedRecommender(model,neighborhood,similarity);
            }
        };
        IRStatistics statis = evaluator.evaluate(recommenderBuilder,null,model,null,5,GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD,1.0);



        System.out.println(statis.getPrecision()); // precision
        System.out.println(statis.getRecall());    // recall
        System.out.println(statis.getF1Measure()); // F1 measure
    }
}

Output:
[Screenshot of the program output: precision, recall, and F1 measure]

The API documentation describes the evaluate method of RecommenderIRStatsEvaluator, which returns an IRStatistics, as follows:
IRStatistics evaluate(RecommenderBuilder recommenderBuilder,
DataModelBuilder dataModelBuilder,
DataModel dataModel,
IDRescorer rescorer,
int at,
double relevanceThreshold,
double evaluationPercentage)
throws TasteException
Parameters:
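recommenderBuilder - object that can build a Recommender to test. It defines how the recommender is created, through public Recommender buildRecommender(DataModel model).
dataModelBuilder - DataModelBuilder to use, or if null, a default DataModel implementation will be used. It defines how the data model used during evaluation is built; pass null to accept the default.
dataModel - dataset to test on. This is the recommender's data model.
rescorer - if any, to use when computing recommendations. It adjusts how recommendations are scored and ranked; pass null when no rescoring is needed.
at - as in, "precision at 5". The number of recommendations to consider when evaluating precision, etc. Set it to the number of items you intend to recommend; precision and recall are computed over that many recommendations.
relevanceThreshold - items whose preference value is at least this value are considered "relevant" for the purposes of computations. It works together with at: the threshold decides which items count as relevant, and at decides how many recommendations are checked against them. The example above passes GenericRecommenderIRStatsEvaluator.CHOOSE_THRESHOLD, which asks the evaluator to choose a threshold for each user.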
Returns:
IRStatistics with resulting precision, recall, etc.
Throws:
TasteException - if an error occurs while accessing the DataModel
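
The interplay between at and relevanceThreshold can be sketched in plain Java for a single user. This is only an illustration of the idea, not Mahout's implementation: the class PrecisionAtSketch, its methods, the preference values, and the ranked list are all hypothetical. The threshold selects the relevant items from a user's known preferences, and at limits how many ranked recommendations are compared against them.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration class; it does not call Mahout.
class PrecisionAtSketch {

    // Items whose preference value is at least the threshold count as relevant.
    static Set<Long> relevantItems(Map<Long, Double> prefs, double relevanceThreshold) {
        Set<Long> relevant = new LinkedHashSet<>();
        for (Map.Entry<Long, Double> entry : prefs.entrySet()) {
            if (entry.getValue() >= relevanceThreshold) {
                relevant.add(entry.getKey());
            }
        }
        return relevant;
    }

    // Number of relevant items among the first `at` entries of the ranked list.
    static int hitsAt(List<Long> ranked, Set<Long> relevant, int at) {
        int considered = Math.min(at, ranked.size());
        int hits = 0;
        for (Long itemId : ranked.subList(0, considered)) {
            if (relevant.contains(itemId)) {
                hits++;
            }
        }
        return hits;
    }

    // Precision over the first `at` recommendations.
    static double precisionAt(List<Long> ranked, Set<Long> relevant, int at) {
        int considered = Math.min(at, ranked.size());
        return considered == 0 ? 0.0 : (double) hitsAt(ranked, relevant, at) / considered;
    }

    // Recall over the first `at` recommendations.
    static double recallAt(List<Long> ranked, Set<Long> relevant, int at) {
        return relevant.isEmpty() ? 0.0 : (double) hitsAt(ranked, relevant, at) / relevant.size();
    }

    public static void main(String[] args) {
        // Made-up preferences of one user: item ID -> rating.
        Map<Long, Double> prefs = new LinkedHashMap<>();
        prefs.put(101L, 5.0);
        prefs.put(102L, 3.0);
        prefs.put(103L, 4.5);
        prefs.put(104L, 2.0);
        prefs.put(105L, 4.0);

        // With a threshold of 4.0, items 101, 103 and 105 are relevant.
        Set<Long> relevant = relevantItems(prefs, 4.0);

        // Made-up ranked recommendations, best first.
        List<Long> ranked = Arrays.asList(103L, 104L, 101L, 106L, 105L);

        for (int at : new int[] {3, 5}) {
            System.out.println("at=" + at
                    + " precision=" + precisionAt(ranked, relevant, at)
                    + " recall=" + recallAt(ranked, relevant, at));
        }
    }
}
```

Raising the threshold shrinks the relevant set, while raising at usually increases recall and may lower precision, so the two parameters should be chosen together.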
