Collaborative Filtering Resources



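This article takes an in-depth look at the core concepts of collaborative filtering algorithms, how they are implemented, and their wide use in recommender systems, covering research software, data sets, and a literature survey. By analyzing the strengths and weaknesses of different algorithms, it offers readers a one-stop guide to learning collaborative filtering.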

Generally, collaborative filtering (CF) is any algorithm that filters information for a user based on a collection of user profiles. Users with similar profiles may share similar interests. For a given user, information can be filtered in or out according to the behavior of similar users.
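As a rough illustration of filtering by similar users, here is a minimal user-based sketch. The toy profiles, the function names, and the choice of cosine similarity with a similarity-weighted average are all assumptions made for this example; the papers listed under Memory-based below describe many other variants.

```python
from math import sqrt

# Hypothetical toy profiles: user -> {item: rating on a 1-5 scale}.
profiles = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 5, "B": 3, "C": 4, "D": 5},
    "carol": {"A": 1, "B": 5, "D": 2},
}

def cosine(p, q):
    """Cosine similarity between two sparse rating profiles."""
    common = set(p) & set(q)
    if not common:
        return 0.0
    dot = sum(p[i] * q[i] for i in common)
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q)

def predict(user, item, profiles):
    """Similarity-weighted average of the ratings other users gave to `item`.

    Returns None when no other user has rated the item.
    """
    num = den = 0.0
    for other, ratings in profiles.items():
        if other == user or item not in ratings:
            continue
        sim = cosine(profiles[user], ratings)
        num += sim * ratings[item]
        den += abs(sim)
    return num / den if den else None

print(predict("alice", "D", profiles))
```

In this toy data Alice's profile is closer to Bob's than to Carol's, so the predicted rating for item D lands between Carol's 2 and Bob's 5, nearer to Bob's.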

User profiles can be collected either explicitly or implicitly. One can explicitly ask users to rate what they have used or purchased; such a profile is filled in directly by the users' ratings. An implicit profile is based on passive observation and contains the users' historical interaction data.
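The two kinds of profile could be represented as follows. This is only a sketch: the event tuples, identifiers, and function names are hypothetical, and using raw interaction counts as the implicit signal is one assumption among several possible ones.

```python
from collections import Counter, defaultdict

# Hypothetical explicit feedback: (user, item, rating) records.
explicit_events = [("u1", "movie_a", 4), ("u1", "movie_b", 2), ("u2", "movie_a", 5)]

# Hypothetical implicit feedback: (user, item) interactions observed passively.
implicit_events = [("u1", "song_x"), ("u1", "song_x"), ("u1", "song_y"), ("u2", "song_x")]

def explicit_profiles(events):
    """Build user -> {item: rating}; a later rating overwrites an earlier one."""
    profiles = defaultdict(dict)
    for user, item, rating in events:
        profiles[user][item] = rating
    return dict(profiles)

def implicit_profiles(events):
    """Build user -> {item: interaction count} from an interaction log."""
    profiles = defaultdict(Counter)
    for user, item in events:
        profiles[user][item] += 1
    return {user: dict(counts) for user, counts in profiles.items()}

print(explicit_profiles(explicit_events))
print(implicit_profiles(implicit_events))
```

Note that an implicit profile has no negative ratings: a missing item means only that no interaction was observed, not that the user dislikes it.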

The most common use of CF is to make recommendations. That is why collaborative filtering is closely associated with recommender systems in the literature, although CF is only one of the methods used in recommender systems.

On this page, I have collected some useful online materials for collaborative filtering research.

Content

Research Software

CoFE: a Java-based collaborative filtering engine. http://eecs.oregonstate.edu/iis/CoFE/
Suggest Top-N recommendation engine: it implements the item-based and user-based collaborative filtering algorithms. Only library files are provided; no source code is included. http://www-users.cs.umn.edu/~karypis/suggest/
C/Matlab Toolkit: a Matlab implementation of some collaborative filtering algorithms, including memory-based (user-based) methods, the personality diagnosis method (see Pennock et al., 2000), etc. http://www-2.cs.cmu.edu/~lebanon/IR-lab.htm
Matlab code for Canny's factor analysis based collaborative filtering. www.cs.berkeley.edu/~jfc/'mender/.
Taste is a collaborative filtering engine for Java. http://taste.sourceforge.net/

Data Sets

Explicit Rating Data Sets:

Movielens Movie Rating Data Set. http://www.grouplens.org/
Jester Joke Rating Data Set. http://www.ieor.berkeley.edu/~goldberg/jester-data/
Book-Crossing Book Rating Data Set. http://www.informatik.uni-freiburg.de/~cziegler/BX/
Parliament Voting. http://ucdata.berkeley.edu:7101/new_web/VoteWorld/voteworld/datasets.html
Online Dating Data Set. http://www.ksi.ms.mff.cuni.cz/~petricek/data/ It contains user ratings from an online dating web site: libimseti.cz. Courtesy of Vaclav Petricek.

Implicit Rating Data Sets:

Audioscrobbler Music Playlist Data Sets. The Audioscrobbler data set collects the playlists of the users of an on-line community (http://www.audioscrobbler.com/) by means of a plug-in in the users' media players, such as Winamp, iTunes, XMMS, etc. The plug-ins send the title and artist of every song a user plays to the Audioscrobbler server, which updates the user's musical profile with the new songs. In the database, a user's profile is recorded as a set of co-occurrence pairs of the form {userID,itemID}. Each pair means that user {userID} has played song {itemID}. The data set can be obtained at http://www.audioscrobbler.com/data/
AOL Web search query: http://www.gregsadetsky.com/aol-data/
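Co-occurrence pairs such as the {userID,itemID} pairs described for the Audioscrobbler data can be loaded into lookup tables along the following lines. This is a sketch only: the one-pair-per-line, comma-separated layout and the sample identifiers are assumptions for illustration, not the documented file format of the data set.

```python
from collections import defaultdict

def load_pairs(lines):
    """Parse 'userID,itemID' lines into user -> items and item -> users sets."""
    user_items = defaultdict(set)
    item_users = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user_id, item_id = line.split(",", 1)
        user_items[user_id].add(item_id)  # sets drop repeated plays
        item_users[item_id].add(user_id)
    return dict(user_items), dict(item_users)

# Hypothetical sample in the assumed layout.
sample = ["u1,s10", "u1,s11", "u2,s10", "", "u1,s10"]
user_items, item_users = load_pairs(sample)
print(user_items)
print(item_users)
```

Keeping both directions of the mapping makes it cheap to look up either the songs a user has played or the users who have played a song, which is what user-based and item-based methods need respectively.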

Collaborative Filtering Bibliography

1. Pure Collaborative Filtering

Memory-based

Unifying User-based and Item-based Collaborative Filtering Approaches by Similarity Fusion (2006). Appeared in SIGIR 2006. http://ict.ewi.tudelft.nl/pub/jun/sigir06_similarityfuson.pdf
Scalable collaborative filtering using cluster-based smoothing (2005). http://doi.acm.org/10.1145/1076034.1076056
An automatic weighting scheme for collaborative filtering (2004). http://doi.acm.org/10.1145/1008992.1009051
Item-based Collaborative Filtering Recommendation Algorithms (2001). http://www10.org/cdrom/papers/519/
Evaluation of Item-Based Top-N Recommendation Algorithms (2001). http://www-users.cs.umn.edu/~karypis/publications/Papers/PDF/itemrs.pdf
A regression-based approach for scaling-up personalized recommender systems in e-commerce (2000). http://nas.cl.uh.edu/boetticher/ML_DataMining/vucetic.pdf
Collaborative Filtering by Personality Diagnosis: A Hybrid Memory- and Model-Based Approach (1999). http://research.microsoft.com/~horvitz/cfpd.htm
An algorithmic framework for performing collaborative filtering (1999).
Empirical Analysis of Predictive Algorithms for Collaborative Filtering (1998). http://research.microsoft.com/research/pubs/view.aspx?tr_id=166
Grouplens: Applying Collaborative Filtering to Usenet News (1997). http://www.ics.uci.edu/~pratt/courses/papers/p77-konstan.pdf
Social Information Filtering: Algorithms for Automating "Word of Mouth" (1995). http://citeseer.ist.psu.edu/195430.html
Grouplens: an open architecture for collaborative filtering of netnews (1994). http://doi.acm.org/10.1145/192844.192905
Using collaborative filtering to weave an information tapestry (1992). http://citeseer.ist.psu.edu/context/1727112/0

Relevance Models

A User-Item Relevance Model for Log-based Collaborative Filtering (2006). http://ict.ewi.tudelft.nl/pub/jun/ecir06.pdf
Relevance Feedback Models for Recommendation (2006). http://acl.ldc.upenn.edu/W/W06/W06-1653.pdf

Latent Class Models

A study of Mixture Models for Collaborative Filtering (2006). http://www.cs.cmu.edu/~lsi/Paper_JIR_Si.pdf
Two-way latent grouping model for user preference prediction (2005). http://eprints.pascal-network.org/archive/00001005/01/uai05.pdf
The Multiple Multiplicative Factor Model For Collaborative Filtering (2004). http://www.machinelearning.org/proceedings/icml2004/papers/363.pdf
Collaborative filtering: a machine learning perspective (2004). http://citeseer.ist.psu.edu/marlin04collaborative.html
Flexible mixture model for collaborative filtering (2003). http://www.hpl.hp.com/conferences/icml2003/papers/183.pdf
Latent class models for collaborative filtering (1999). http://portal.acm.org/citation.cfm?id=687583

Matrix Factorization

Fast Maximum Margin Matrix Factorization for Collaborative Prediction (2005). http://people.csail.mit.edu/jrennie/papers/icml05-mmmf.pdf
Eigentaste: A constant time collaborative filtering algorithm (2001). (Using PCA) http://www.ieor.berkeley.edu/~goldberg/pubs/eigentaste.pdf
Application of Dimensionality Reduction in Recommender System -- A Case Study (2000). http://citeseer.ist.psu.edu/sarwar00application.html
Collaborative filtering with privacy via factor analysis (2002). (Using factor analysis) http://www.cs.berkeley.edu/~jfc/papers/02/SIGIR02.pdf
Learning collaborative information filters (1998). (using SVD) http://www.ics.uci.edu/~pazzani/Publications/MLC98.pdf

Clustering

A maximum entropy approach to collaborative filtering in dynamic, sparse, high dimensional domains (2002). http://research.yahoo.com/publication/OR-2003-007.pdf
Clustering Methods for Collaborative Filtering (1998). http://citeseer.ist.psu.edu/ungar98clustering.html
A Formal Statistical Approach to Collaborative Filtering (1998). http://citeseer.ist.psu.edu/387035.html
A Scalable Collaborative Filtering Framework based on Co-clustering (2005). http://hercules.ece.utexas.edu/~srujana/papers/icdm05.pdf
Model-based Overlapping Co-Clustering. http://www.siam.org/meetings/sdm06/workproceed/Text%20Mining/shafiei16.pdf

Transitive Associations

Applying associative retrieval techniques to alleviate the sparsity problem in collaborative filtering (2004). http://doi.acm.org/10.1145/963770.963775

Trust Inference

Improving Collaborative Filtering with Trust-based Metrics (2006). http://doi.acm.org/10.1145/1141277.1141717
Alleviating the Sparsity Problem of Collaborative Filtering Using Trust Inferences. http://www.ics.forth.gr/isl/publications/paperlink/LNCS_Formatted_iTrust_34770228.pdf

Perceptron-based

Online ranking/collaborative filtering using the perceptron algorithm (2003).

2. Combining Content-based and Collaborative Filtering

A Unified Recommendation Framework Based on Probabilistic Relational Models (2005). http://www.stern.nyu.edu/ciio/WorkOnline/IS20042005/0217-01.pdf
Unifying Collaborative and Content-Based Filtering (2004). http://www.cs.brown.edu/people/th/publications.html
Collaborative Ensemble Learning: Combining Collaborative and Content-Based Information Filtering via Hierarchical Bayes (2003). http://www.dbs.informatik.uni-muenchen.de/~yu_k
Content-Boosted Collaborative Filtering (2001). http://citeseer.ist.psu.edu/507656.html

3. Distributed Collaborative Filtering

Personalization of a peer-to-peer television system (2006). http://ict.ewi.tudelft.nl/pub/jun/euroitv06.pdf
Distributed Collaborative Filtering for Peer-to-Peer File Sharing Systems (2006). http://ict.ewi.tudelft.nl/pub/jun/sac06.pdf
Pocketlens: Toward a Personal Recommender System (2004). http://doi.acm.org/10.1145/1010614.1010618

4. Other issues

Being Accurate is Not Enough: How Accuracy Metrics have hurt Recommender Systems (2006). http://www.grouplens.org/papers/pdf/mcnee-chi06-acc.pdf
A collaborative filtering algorithm and evaluation metric that accurately model the user experience (2004). http://doi.acm.org/10.1145/1008992.1009050
Evaluating collaborative filtering recommender systems (2004). http://doi.acm.org/10.1145/963770.963772

Related Information Retrieval Papers

In general, collaborative filtering is formulated as a self-contained problem, separate from classic approaches to text retrieval such as RSJ models and language models. However, the collaborative filtering problem can be treated as a prediction problem: predicting the relevance between a user and an item (see user-item relevance models). Under this view, collaborative filtering can benefit directly from current advances in these text retrieval models. We found the following papers interesting and related to the collaborative filtering problem.

Query Chains: Learning to Rank from Implicit Feedback (2005). http://www.cs.cornell.edu/%7Efilip/papers/Radlinski05QueryChains.pdf
On Event Spaces and Probabilistic Models in Information Retrieval (2005).
Probabilistic relevance models based on document and query generation (2003).
Novelty and redundancy detection in adaptive filtering (2002). http://doi.acm.org/10.1145/564376.564393
Term-specific smoothing for the language modeling approach to information retrieval: the importance of a query term (2002).
Exact Maximum Likelihood Estimation for Word Mixtures (2002). http://www-2.cs.cmu.edu/~yiz/research/paper/icml2002.ps
A Study of Smoothing Methods for Language Models Applied to Ad Hoc Information Retrieval (2001).
Document language models, query models, and risk minimization for information retrieval (2001).
Information Retrieval as Statistical Translation (1999). http://www.informedia.cs.cmu.edu/documents/irast-final.pdf
Naive (Bayes) at Forty: The Independence Assumption in Information Retrieval (1998).
Relevance weighting of search terms (1976).

Related Machine Learning Papers

On Combining Classifiers (1998). http://ieeexplore.ieee.org/iel4/34/14695/00667881.pdf
On the Choice of Smoothing Parameters for Parzen Estimators of Probability Density Functions (1976).
Spectral clustering for multi-type relational data (2006). http://portal.acm.org/citation.cfm?id=1143918
Hierarchical Bayesian Models for Applications in Information Retrieval (2003). http://www.cs.berkeley.edu/~jordan/papers/jordan-valencia.pdf
A Hierarchical Latent Variable Model for Data Visualization (1998). http://citeseer.ist.psu.edu/bishop98hierarchical.html
Combining Labeled and Unlabeled Data with Co-Training (1998). http://citeseer.ist.psu.edu/47625.html
Enhancing Supervised Learning with Unlabeled Data (2000). http://citeseer.ist.psu.edu/goldman00enhancing.html