I have started migrating my blog. I built and coded the whole blog architecture myself: http://www.cookqq.com/listBlog.action
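I wrote an article analyzing the principle behind the Pearson product-moment correlation coefficient. It contains too many images to upload conveniently here, so I put it on GitHub: https://github.com/tianbaoxing/hmahout/blob/master/doc/recommder/pearsonCorrelation-%E5%8E%9F%E7%90%86%E5%88%86%E6%9E%90.doc
Flow chart of the GenericUserBasedRecommender recommendation source code:
Source analysis of the recommend method in GenericUserBasedRecommender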
@Override
public List<RecommendedItem> recommend(long userID, int howMany, IDRescorer rescorer) throws TasteException {
  Preconditions.checkArgument(howMany >= 1, "howMany must be at least 1");

  log.debug("Recommending items for user ID '{}'", userID);

  long[] theNeighborhood = neighborhood.getUserNeighborhood(userID);

  if (theNeighborhood.length == 0) {
    return Collections.emptyList();
  }

  FastIDSet allItemIDs = getAllOtherItems(theNeighborhood, userID);

  TopItems.Estimator<Long> estimator = new Estimator(userID, theNeighborhood);

  List<RecommendedItem> topItems = TopItems
      .getTopItems(howMany, allItemIDs.iterator(), rescorer, estimator);

  log.debug("Recommendations are: {}", topItems);
  return topItems;
}
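Notes:
(1) long[] theNeighborhood = neighborhood.getUserNeighborhood(userID) looks up the neighbors of the given user ID. If the user has no neighbors, the method returns an empty list immediately.
(2) FastIDSet allItemIDs = getAllOtherItems(theNeighborhood, userID); collects the items that the user's neighbors have expressed a preference for, excluding the items the user already has a preference for.
(3) List<RecommendedItem> topItems = TopItems.getTopItems(howMany, allItemIDs.iterator(), rescorer, estimator); selects the highest-scoring items (at most howMany of them) and recommends them to the user.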
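Step (2) can be illustrated with a small standalone sketch. It does not use Mahout: the class name CandidateItemsSketch, the method allOtherItems, and the in-memory Map standing in for the data model are all hypothetical, and the real getAllOtherItems works on FastIDSet and a DataModel instead. The idea shown is only the set arithmetic described above: take the union of the neighbors' items, then remove the user's own items.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class CandidateItemsSketch {

    /**
     * Hypothetical stand-in for getAllOtherItems: returns every item that at
     * least one neighbor has a preference for, minus the items the user
     * already has a preference for.
     */
    static Set<Long> allOtherItems(long[] neighborhood, long userID,
                                   Map<Long, Set<Long>> itemsByUser) {
        Set<Long> candidates = new TreeSet<>();
        // Union of all items seen by the neighbors.
        for (long neighbor : neighborhood) {
            candidates.addAll(itemsByUser.getOrDefault(neighbor, Collections.emptySet()));
        }
        // Drop what the user already knows; those are not worth recommending.
        candidates.removeAll(itemsByUser.getOrDefault(userID, Collections.emptySet()));
        return candidates;
    }

    public static void main(String[] args) {
        // Made-up preference data, for illustration only.
        Map<Long, Set<Long>> itemsByUser = new HashMap<>();
        itemsByUser.put(1L, new HashSet<>(Arrays.asList(10L, 11L)));
        itemsByUser.put(2L, new HashSet<>(Arrays.asList(11L, 12L)));
        itemsByUser.put(3L, new HashSet<>(Arrays.asList(12L, 13L)));

        System.out.println(allOtherItems(new long[] {2L, 3L}, 1L, itemsByUser));
    }
}
```

In this made-up data set, user 1 already has items 10 and 11, so item 11 from neighbor 2 is dropped and only the neighbors' other items remain as candidates.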
Now let's look at the source of TopItems.getTopItems(howMany, allItemIDs.iterator(), rescorer, estimator):
public static List<RecommendedItem> getTopItems(int howMany,
                                                LongPrimitiveIterator possibleItemIDs,
                                                IDRescorer rescorer,
                                                Estimator<Long> estimator) throws TasteException {
  Preconditions.checkArgument(possibleItemIDs != null, "argument is null");
  Preconditions.checkArgument(estimator != null, "argument is null");

  Queue<RecommendedItem> topItems = new PriorityQueue<RecommendedItem>(howMany + 1,
      Collections.reverseOrder(ByValueRecommendedItemComparator.getInstance()));
  boolean full = false;
  double lowestTopValue = Double.NEGATIVE_INFINITY;
  while (possibleItemIDs.hasNext()) {
    long itemID = possibleItemIDs.next();
    if (rescorer == null || !rescorer.isFiltered(itemID)) {
      double preference;
      try {
        preference = estimator.estimate(itemID);
      } catch (NoSuchItemException nsie) {
        continue;
      }
      double rescoredPref = rescorer == null ? preference : rescorer.rescore(itemID, preference);
      if (!Double.isNaN(rescoredPref) && (!full || rescoredPref > lowestTopValue)) {
        topItems.add(new GenericRecommendedItem(itemID, (float) rescoredPref));
        if (full) {
          topItems.poll();
        } else if (topItems.size() > howMany) {
          full = true;
          topItems.poll();
        }
        lowestTopValue = topItems.peek().getValue();
      }
    }
  }
  int size = topItems.size();
  if (size == 0) {
    return Collections.emptyList();
  }
  List<RecommendedItem> result = Lists.newArrayListWithCapacity(size);
  result.addAll(topItems);
  Collections.sort(result, ByValueRecommendedItemComparator.getInstance());
  return result;
}
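Note: this is where the IDRescorer is invoked.
(1) if (rescorer == null || !rescorer.isFiltered(itemID)) checks whether the item should be filtered out. With no rescorer, nothing is filtered.
(2) preference = estimator.estimate(itemID) computes the estimated preference of this user for the item. If a rescorer is present, rescorer.rescore(itemID, preference) then adjusts that value.
(3) topItems.add(new GenericRecommendedItem(itemID, (float) rescoredPref)); adds the item to the priority queue. The queue keeps at most howMany items: once it overflows, poll() evicts the lowest-scoring one, and lowestTopValue tracks the score a new item has to beat.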
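The bounded-queue selection described in these notes can be sketched without Mahout as follows. This is a simplified illustration, not the library's code: TopNSketch, ScoredItem, and topN are hypothetical names, a LongPredicate stands in for IDRescorer.isFiltered, a LongToDoubleFunction stands in for the estimator, and the rescore step and NoSuchItemException handling are left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.function.LongPredicate;
import java.util.function.LongToDoubleFunction;

final class TopNSketch {

    /** Hypothetical stand-in for RecommendedItem: an item ID and its estimated score. */
    static final class ScoredItem {
        final long itemID;
        final double score;

        ScoredItem(long itemID, double score) {
            this.itemID = itemID;
            this.score = score;
        }
    }

    /** Returns at most howMany candidates with the highest estimates, best first. */
    static List<ScoredItem> topN(int howMany, long[] candidates,
                                 LongPredicate isFiltered, LongToDoubleFunction estimator) {
        if (howMany < 1) {
            throw new IllegalArgumentException("howMany must be at least 1");
        }
        Comparator<ScoredItem> byScore = Comparator.comparingDouble((ScoredItem s) -> s.score);
        // Min-heap on score: the head is always the weakest item kept so far.
        PriorityQueue<ScoredItem> heap = new PriorityQueue<>(howMany + 1, byScore);
        for (long itemID : candidates) {
            if (isFiltered.test(itemID)) {
                continue; // excluded by the filter, never estimated
            }
            double score = estimator.applyAsDouble(itemID);
            if (Double.isNaN(score)) {
                continue; // no usable estimate for this item
            }
            if (heap.size() < howMany) {
                heap.add(new ScoredItem(itemID, score));
            } else if (score > heap.peek().score) {
                heap.poll(); // evict the current weakest item
                heap.add(new ScoredItem(itemID, score));
            }
        }
        List<ScoredItem> result = new ArrayList<>(heap);
        result.sort(byScore.reversed()); // highest score first
        return result;
    }

    public static void main(String[] args) {
        // Made-up estimator: the score equals the item ID, item 3 has no estimate.
        List<ScoredItem> top = topN(2, new long[] {1, 2, 3, 4, 5},
            id -> id == 5,
            id -> id == 3 ? Double.NaN : (double) id);
        for (ScoredItem item : top) {
            System.out.println(item.itemID + " -> " + item.score);
        }
    }
}
```

Keeping only howMany items in the heap means each candidate costs at most O(log howMany) work, so the full candidate set never has to be sorted. Only the few surviving items are sorted at the end.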