1. Download the dataset and upload it to HDFS
wget http://archive.ics.uci.edu/ml/databases/synthetic_control/synthetic_control.data
hadoop fs -mkdir testdata
hadoop fs -put synthetic_control.data testdata
hadoop fs -lsr testdata
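Before uploading, it can help to confirm that the download is complete. The sketch below is not part of the original steps. It assumes the file has the shape given in the UCI description of this dataset (600 rows with 60 whitespace-separated values each), and `check_synthetic_control` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: verify that a local copy of synthetic_control.data
# has the assumed shape (600 rows, 60 whitespace-separated values per row).
check_synthetic_control() {
    file="$1"
    [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
    awk -v rows=600 -v cols=60 '
        # Report every row whose field count differs from the expected one.
        NF != cols { printf "line %d has %d fields\n", NR, NF; bad = 1 }
        END {
            if (NR != rows) { printf "expected %d rows, found %d\n", rows, NR; bad = 1 }
            exit bad   # 0 when the shape matches, 1 otherwise
        }
    ' "$file"
}
```

One possible use is to gate the upload on the check: check_synthetic_control synthetic_control.data && hadoop fs -put synthetic_control.data testdata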
2. Run the clustering algorithms on the Hadoop cluster
cd /usr/local/mahout
bin/mahout org.apache.mahout.clustering.syntheticcontrol.canopy.Job
bin/mahout org.apache.mahout.clustering.syntheticcontrol.kmeans.Job
bin/mahout org.apache.mahout.clustering.syntheticcontrol.fuzzykmeans.Job
bin/mahout org.apache.mahout.clustering.syntheticcontrol.dirichlet.Job
bin/mahout org.apache.mahout.clustering.syntheticcontrol.meanshift.Job
If the jobs succeed, the results should appear in HDFS under /user/dev/output.
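The five commands above differ only in the package segment of the class name, so they can be wrapped in a small script. This is a minimal sketch, not something from the original post: `job_class` and `run_job` are hypothetical helper names, and the default install path is assumed to be /usr/local/mahout as in the steps above.

```shell
#!/bin/sh
# Assumption: Mahout is installed under /usr/local/mahout unless MAHOUT_HOME is set.
MAHOUT_HOME="${MAHOUT_HOME:-/usr/local/mahout}"

# Hypothetical helper: map a short algorithm name to its example Job class.
job_class() {
    case "$1" in
        canopy|kmeans|fuzzykmeans|dirichlet|meanshift)
            echo "org.apache.mahout.clustering.syntheticcontrol.$1.Job" ;;
        *)
            echo "unknown algorithm: $1" >&2
            return 1 ;;
    esac
}

# Hypothetical helper: run one example job through bin/mahout.
run_job() {
    class=$(job_class "$1") || return 1
    # Subshell so the caller's working directory is left unchanged.
    (cd "$MAHOUT_HOME" && bin/mahout "$class")
}
```

For example, run_job kmeans starts the K-means example, and hadoop fs -lsr output then lists what the job wrote.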
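This article shows how to download the synthetic_control dataset from the UCI Machine Learning Repository with wget and upload it to the Hadoop file system, then use Apache Mahout to run several clustering algorithms on a Hadoop cluster: Canopy, K-means, Fuzzy K-means, Dirichlet, and Mean Shift.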




