Mahout: distributed item-based algorithm 1

Latest recommended article published on 2024-11-21 14:51:14

License: CC 4.0 BY-SA

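This article describes an algorithm that generates personalized recommendations from a co-occurrence matrix and user vectors. It counts how often each pair of items occurs together to build the co-occurrence matrix, then multiplies that matrix by a user's preference vector to obtain the recommendations. The approach suits recommendation over large numbers of items.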

Co-occurrence matrix

Instead of computing the similarity between every pair of items, the distributed algorithm counts the number of times each pair of items occurs together in some user’s list of preferences, and fills out a matrix with these counts.

Co-occurrence is like similarity; the more two items turn up together, the more related or similar they probably are. The co-occurrence matrix plays a role like that of ItemSimilarity in the nondistributed item-based algorithm.
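The counting step can be sketched in a few lines of Python. This is only an illustration on a single machine, not Mahout's distributed implementation; the function name `build_cooccurrence` and the toy preference data are made up for the example. The sketch also leaves out the diagonal (an item paired with itself), which is a simplification chosen here.

```python
from collections import defaultdict
from itertools import combinations

def build_cooccurrence(prefs_by_user):
    """Count, for each pair of items, how many users have both in their list."""
    counts = defaultdict(lambda: defaultdict(int))
    for items in prefs_by_user.values():
        unique = sorted(set(items))          # ignore duplicate entries per user
        for a, b in combinations(unique, 2):
            counts[a][b] += 1
            counts[b][a] += 1                # the matrix is symmetric
    return counts

# Hypothetical toy data: user ID -> items the user has a preference for.
prefs_by_user = {
    1: [101, 102, 103],
    2: [101, 103],
    3: [102, 103, 104],
}

cooc = build_cooccurrence(prefs_by_user)
print(cooc[101][103])  # 2, because users 1 and 2 both have items 101 and 103
```

Nested dictionaries are used instead of a dense array because most item pairs never co-occur, so most entries would be zero.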

User vectors

In a data model with n items, a user’s preferences can be viewed as a vector over n dimensions, with one dimension for each item. The user’s preference values for items are the values in the vector. Items that the user expresses no preference for map to a 0 value in the vector. Such a vector is typically quite sparse, consisting mostly of zeroes, because users typically express a preference for only a small subset of all items.
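As a rough illustration of this sparsity, the sketch below stores only the non-zero preferences in a dictionary and expands them into a full vector on demand. The helper `user_vector`, the item IDs, and the preference values are all hypothetical.

```python
def user_vector(prefs, item_ids):
    """Expand a sparse {item: value} mapping into a dense vector over item_ids."""
    return [prefs.get(item, 0.0) for item in item_ids]

# Hypothetical catalog of n = 7 items and one user's known preferences.
item_ids = [101, 102, 103, 104, 105, 106, 107]
sparse_prefs = {101: 2.0, 104: 4.0, 105: 4.5}

dense = user_vector(sparse_prefs, item_ids)
print(dense)  # [2.0, 0.0, 0.0, 4.0, 4.5, 0.0, 0.0]
print(len(sparse_prefs), "of", len(dense), "entries are non-zero")
```

With millions of items, only the sparse form is practical to store; the dense form is shown here just to make the n-dimensional picture concrete.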

Producing the recommendations

The product of the co-occurrence matrix and a user vector is itself a vector whose dimension is equal to the number of items. The values in this resulting vector, R, lead us directly to recommendations: the highest values in R correspond to the best recommendations.
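Written out, the product looks as follows. The symbols C (the n × n co-occurrence matrix) and u (the user vector) are names introduced here for illustration; only R comes from the text.

```latex
R = C\,u,
\qquad
R_i = \sum_{j=1}^{n} C_{ij}\, u_j \quad (i = 1, \dots, n)
```

Each entry R_i sums item i's co-occurrence counts with every item j, weighted by the user's preference for j. Items the user has no preference for have u_j = 0 and contribute nothing.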

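Consider the third row of the co-occurrence matrix, which in this example contains the co-occurrences between item 103 and all other items. The third entry of R is the dot product of that row with user 3’s vector. Intuitively, if item 103 co-occurs with many items that user 3 expresses a preference for, then it’s probably something that user 3 would like.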
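The row-by-row view can be sketched as follows. The matrix, the user vector, and the helpers `multiply` and `top_n` are hypothetical and chosen only to keep the arithmetic small. Skipping items the user has already rated is an assumption added for this sketch, not something stated above.

```python
def multiply(matrix, vector):
    """Matrix-vector product: entry i is the dot product of row i with the vector."""
    return [sum(c * u for c, u in zip(row, vector)) for row in matrix]

def top_n(scores, item_ids, already_rated, n):
    """Return the n highest-scoring items, skipping ones the user already rated."""
    candidates = [(s, i) for s, i in zip(scores, item_ids) if i not in already_rated]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in candidates[:n]]

# Hypothetical 4-item example; the third row belongs to item 103.
item_ids = [101, 102, 103, 104]
cooc = [
    [0, 1, 2, 0],   # item 101
    [1, 0, 2, 1],   # item 102
    [2, 2, 0, 1],   # item 103
    [0, 1, 1, 0],   # item 104
]
user = [5.0, 0.0, 0.0, 3.0]   # hypothetical user who rated items 101 and 104

R = multiply(cooc, user)
print(R)        # [0.0, 8.0, 13.0, 0.0]
print(R[2])     # 13.0 = 2 * 5.0 + 1 * 3.0, from the row for item 103
print(top_n(R, item_ids, already_rated={101, 104}, n=2))  # [103, 102]
```

Item 103 scores highest because it co-occurs with both items this user rated, and most often with the one rated 5.0.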