Mahout
ylzhjlinux
Articles in this column
Item-based recommendation
User-based: Who is similar to the boy, and what do they like? Item-based: What is similar to what the boy likes? The algorithm. The difference between user-based and item-based: Slope-one... Original · 2014-04-30 11:33:14 · 250 views · 0 comments
Mahout: Topic modeling using latent Dirichlet allocation (LDA)
Introduction: To find these topics in a particular set of documents, we’d modify our clustering code to work with word vectors instead of the document vectors we’ve been using so far. A word vector i... Original · 2014-06-12 14:46:25 · 339 views · 0 comments
Mahout: Batch and online clustering
Online news clustering: Cluster one million articles, as shown below, and save the cluster centroids for all clusters. Periodically, for each new article, use canopy clustering to assign it t... Original · 2014-06-13 10:47:41 · 267 views · 0 comments
Mahout: CWSS
jcseg: http://www.oschina.net/p/jcseg http://technology.chtsai.org/mmseg/ scws: http://www.ftphp.com/scws/demo/v48.php http://www.ftphp.com/scws/docs.php#instscws http://www.350351.com/... Original · 2014-06-13 14:39:16 · 332 views · 0 comments
Mahout: Integrate jcseg with Mahout seq2sparse
Google global sites URL: https://github.com/justjavac/Google-IPs JCSEG: http://www.oschina.net/p/jcseg MMSEG: http://technology.chtsai.org/mmseg/ // convert Maven project to Eclipse project... Original · 2014-06-16 18:30:06 · 147 views · 0 comments
Mahout: CVB
When running cvb, there is an error: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.io.IntWritable. Solution: the new LDA requires SequenceFile<IntWritable, VectorWritable> as input... Original · 2014-06-19 18:32:25 · 217 views · 0 comments
Solr: Deploy Solr to Tomcat
Install tomcat7: # sudo apt-get update # sudo apt-get install tomcat7 # sudo apt-get install tomcat7-admin http://localhost:8080/ # sudo vi /etc/tomcat7/tomcat-users.xml <tomcat-users> ... Original · 2014-07-09 17:41:03 · 157 views · 0 comments
Machine Learning: Introduction 1
Supervised learning is tasked with learning a function from labeled training data in order to predict the value of any valid input. Common examples of supervised learning include classifying e-mail... Original · 2014-04-24 00:05:54 · 320 views · 0 comments
Recommendation System Introduction
Collaborative filtering: producing recommendations based on, and only based on, knowledge of users’ relationships to items. These techniques require no knowledge of the properties of the items thems... Original · 2014-04-24 23:04:08 · 197 views · 0 comments
Mahout: Build 0.9 from source code and Eclipse env setup
1. # svn co http://svn.apache.org/repos/asf/mahout/trunk or download mahout-distribution-0.9-src.tar.gz 2. mvn -DskipTests clean install package 3. Create a common Java project which uses mahou... Original · 2014-04-27 00:49:03 · 182 views · 0 comments
Exploring the user-based recommender 1
Recommending items to some user, denoted by u, as seen below. It would be terribly slow to examine every item. In reality, a neighborhood of most similar users is computed first, and only items k... Original · 2014-04-29 15:29:31 · 135 views · 0 comments
Mahout: Dirichlet clustering
Dirichlet clustering starts with a data set of points and a ModelDistribution. Think of ModelDistribution as a class that generates different models. You create an empty model and try to assign point... Original · 2014-06-12 14:08:43 · 137 views · 0 comments
Mahout: Fuzzy k-means clustering
As the name says, the fuzzy k-means clustering algorithm does a fuzzy form of k-means clustering. Instead of the exclusive clustering in k-means, fuzzy k-means tries to generate overlapping clusters ... Original · 2014-06-12 11:18:08 · 265 views · 0 comments
Mahout: An overview of clustering techniques
Different kinds of clustering problems. EXCLUSIVE CLUSTERING: In exclusive clustering, an item belongs exclusively to one cluster, not several. OVERLAPPING CLUSTERING: What if we wanted to do non-e... Original · 2014-06-12 10:57:40 · 229 views · 0 comments
New and experimental recommenders
Singular value decomposition–based recommenders: SVDRecommender; linear interpolation item–based recommendation: KnnItemBasedRecommender; cluster-based recommendation: TreeCluster... Original · 2014-04-30 14:31:50 · 192 views · 0 comments
Mahout: distributed item-based algorithm 1
Co-occurrence matrix: Instead of computing the similarity between every pair of items, it’ll compute the number of times each pair of items occurs together in some user’s list of preferences, ... Original · 2014-05-04 09:38:23 · 110 views · 0 comments
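The co-occurrence counting described in this excerpt can be sketched in a few lines. This is a minimal single-machine illustration in plain Python, not Mahout's distributed implementation; the function name `cooccurrence_counts` and the toy preference data are hypothetical and chosen only for the example.

```python
from collections import defaultdict
from itertools import combinations


def cooccurrence_counts(user_items):
    """Count, per unordered item pair, how many users' preference lists contain both items."""
    counts = defaultdict(int)
    for items in user_items.values():
        # Deduplicate and sort so each pair is counted once per user under a stable key.
        for a, b in combinations(sorted(set(items)), 2):
            counts[(a, b)] += 1
    return dict(counts)


# Hypothetical toy data (not from the article): user ID -> preferred item IDs.
prefs = {1: [101, 102, 103], 2: [101, 102], 3: [101, 104]}
counts = cooccurrence_counts(prefs)
```

Here items 101 and 102 appear together in two users' lists, so that pair gets the highest count; no similarity metric is evaluated at any point.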
Mahout: distributed item-based algorithm 2
Generating user vectors. Input format: userID: itemID1 itemID2 itemID3 .... Output format: a Vector from all item IDs for the user, and outputs the user ID mapped to the user’s preference ve... Original · 2014-05-04 11:28:42 · 156 views · 0 comments
Mahout: distributed item-based algorithm 3
Running recommendations with Hadoop: The glue that binds together the various Mapper and Reducer components is org.apache.mahout.cf.taste.hadoop.item.RecommenderJob. It configures and invokes the se... Original · 2014-05-04 13:55:35 · 154 views · 0 comments
Mahout: Build 0.9 with support for Hadoop 2.3.0
mvn clean package -Dhadoop2.version=2.3.0 -DskipTests ; mvn clean package -Dhadoop.version=2.3.0 -DskipTests ; mvn clean package -Dhadoop.profile=200 -DskipTests ; The above commands will not work... Original · 2014-05-05 00:24:20 · 130 views · 0 comments
Mahout: Introduction to clustering
Clustering a collection involves three things: an algorithm, a notion of both similarity and dissimilarity, and a stopping condition. Measuring the similarity of items: The most important issue... Original · 2014-05-05 12:01:47 · 161 views · 0 comments
Mahout: Quality blogs
http://blog.youkuaiyun.com/zwan0518/article/details/9100329 https://www.ibm.com/developerworks/library/j-mahout-scaling/ http://mail-archives.apache.org/mod_mbox/mahout-user/201202.mbox/%3C1328197877... Original · 2014-05-06 17:51:44 · 101 views · 0 comments
Mahout: Run ItemBasedRecommendation Job in Eclipse
1. Configure parameters via Run -> Run Configurations -> Java Applications -> Arguments: --input hdfs://192.168.122.1:2014/user/zhaohj/mahout/item.txt --output hdfs://192.168.122.1:2014/user/... Original · 2014-05-14 16:12:08 · 144 views · 0 comments
Mahout: Clustering - Representing data
Transforming data into vectors: In Mahout, vectors are implemented as three different classes. DenseVector can be thought of as an array of doubles, whose size is the number of features in the data.... Original · 2014-06-11 11:02:56 · 154 views · 0 comments
Mahout: K-means clustering
K-means algorithm: The k-means algorithm will start with an initial set of k centroid points. The algorithm does multiple rounds of processing and refines the centroid locations until the iteration ... Original · 2014-06-11 16:06:14 · 120 views · 0 comments
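The rounds of refinement described in this excerpt can be sketched as follows. This is a plain-Python illustration of the general k-means loop, not Mahout's implementation. The excerpt is cut off before it states the stopping rule, so the two rules used here (a cap of `max_iterations` rounds, or centroids that stop moving) are assumptions, as are all function and parameter names.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, centroids, max_iterations=10):
    """Refine the given initial centroids; the stopping rules are assumptions (see lead-in)."""
    centroids = list(centroids)
    for _ in range(max_iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if the cluster is empty).
        updated = [mean_point(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break  # centroids stopped moving
        centroids = updated
    return centroids
```

Calling `kmeans(points, initial_centroids)` with k initial centroids returns k refined centroids; the quality of the result depends on the initial choice, which this sketch leaves to the caller.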
Exploring the user-based recommender 2 (similarity metrics)
Sample data: 1,101,5.0 1,102,3.0 1,103,2.5 2,101,2.0 2,102,2.5 2,103,5.0 2,104,2.0 3,101,2.5 3,104,4.0 3,105,4.5 3,107,5.0 4,101,5.0 4,103,3.0 4,104,4.5 4,106,4.0 5,101,4.0 5,102,3.0... Original · 2014-04-29 17:25:01 · 202 views · 0 comments