clustering testing -- robin canopy

This post records in detail a run of Mahout's canopy clustering over a dataset, covering argument parsing, data conversion, parallel (MapReduce) job execution, cluster building, and conversion of the output files. It focuses on the distance measure, the input/output configuration, job progress, and the performance counters that Mahout reports.
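The log below was produced by a wrapper driver (the canopy.Canopy and conversion.SequenceClusters2PlainText classes are not part of stock Mahout), but the central clustering step corresponds to Mahout's CanopyDriver API. As a rough, non-authoritative sketch of what that call might look like with the parameters parsed in the log (SquaredEuclideanDistanceMeasure, t1=0.01, t2=0.03, MapReduce execution), assuming a Mahout 0.7-era CanopyDriver.run signature and hypothetical paths:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.clustering.canopy.CanopyDriver;
import org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure;

public class CanopyRunSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // The input must already be a SequenceFile of VectorWritable values;
    // the first MapReduce job in the log performs exactly that conversion.
    Path input = new Path("/DolphinData/UserTest/synthetic/synthetic_seq");      // hypothetical path
    Path output = new Path("/DolphinData/UserTest/synthetic/synthetic_output");  // hypothetical path

    CanopyDriver.run(conf,
        input,
        output,
        new SquaredEuclideanDistanceMeasure(),  // --distanceMeasure from the log
        0.01,    // t1, as parsed in the log
        0.03,    // t2, as parsed in the log
        true,    // runClustering: also assign the input points to the canopies
        0.0,     // clusterClassificationThreshold: keep every point
        false);  // runSequential=false -> MapReduce, matching --method=mapreduce
  }
}

One thing worth noting: Mahout's canopy implementation conventionally expects t1 > t2 (t1 is the loose threshold, t2 the tight one), while the logged run uses t1=0.01 and t2=0.03, i.e. the reverse.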


15/01/12 17:12:29 INFO canopy.Canopy: parsing the arguments
15/01/12 17:12:30 INFO common.AbstractJob: Command line arguments: {--distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure], --endPhase=[2147483647], --input=[/DolphinData/UserTest/synthetic/synthetic_input], --method=[mapreduce], --startPhase=[0], --t1=[0.01], --t2=[0.03], --tempDir=[temp]}
15/01/12 17:12:30 INFO canopy.Canopy: Converting plain text format to sequence format ...
15/01/12 17:12:40 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/01/12 17:12:44 INFO input.FileInputFormat: Total input paths to process : 1
15/01/12 17:12:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/01/12 17:12:44 WARN snappy.LoadSnappy: Snappy native library not loaded
15/01/12 17:12:44 INFO mapred.JobClient: Running job: job_201411071554_0248
15/01/12 17:12:45 INFO mapred.JobClient:  map 0% reduce 0%
15/01/12 17:12:58 INFO mapred.JobClient:  map 100% reduce 0%
15/01/12 17:13:00 INFO mapred.JobClient: Job complete: job_201411071554_0248
15/01/12 17:13:00 INFO mapred.JobClient: Counters: 19
15/01/12 17:13:00 INFO mapred.JobClient:   Job Counters
15/01/12 17:13:00 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=10295
15/01/12 17:13:00 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
15/01/12 17:13:00 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
15/01/12 17:13:00 INFO mapred.JobClient:     Launched map tasks=1
15/01/12 17:13:00 INFO mapred.JobClient:     Data-local map tasks=1
15/01/12 17:13:00 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
15/01/12 17:13:00 INFO mapred.JobClient:   File Output Format Counters
15/01/12 17:13:00 INFO mapred.JobClient:     Bytes Written=335470
15/01/12 17:13:00 INFO mapred.JobClient:   FileSystemCounters
15/01/12 17:13:00 INFO mapred.JobClient:     HDFS_BYTES_READ=288533
15/01/12 17:13:00 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=63342
15/01/12 17:13:00 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=335470
15/01/12 17:13:00 INFO mapred.JobClient:   File Input Format Counters
15/01/12 17:13:00 INFO mapred.JobClient:     Bytes Read=288374
15/01/12 17:13:00 INFO mapred.JobClient:   Map-Reduce Framework
15/01/12 17:13:00 INFO mapred.JobClient:     Map input records=600
15/01/12 17:13:00 INFO mapred.JobClient:     Physical memory (bytes) snapshot=74567680
15/01/12 17:13:00 INFO mapred.JobClient:     Spilled Records=0
15/01/12 17:13:00 INFO mapred.JobClient:     CPU time spent (ms)=1520
15/01/12 17:13:00 INFO mapred.JobClient:     Total committed heap usage (bytes)=62128128
15/01/12 17:13:00 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=1096867840
15/01/12 17:13:00 INFO mapred.JobClient:     Map output records=600
15/01/12 17:13:00 INFO mapred.JobClient:     SPLIT_RAW_BYTES=159
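Job job_201411071554_0248 above is the plain-text-to-sequence-format conversion announced at 17:12:30: a map-only job (one map task, no reduce work) that reads 600 text records and writes 600 records to a SequenceFile of Mahout VectorWritables. The converter itself is not shown in the log; the following is only a minimal sequential sketch of the same idea, assuming whitespace-separated numeric input and hypothetical file names:

import java.io.BufferedReader;
import java.io.InputStreamReader;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.mahout.math.DenseVector;
import org.apache.mahout.math.VectorWritable;

public class TextToSequenceSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path in = new Path("/DolphinData/UserTest/synthetic/synthetic_input/data.txt");       // hypothetical
    Path out = new Path("/DolphinData/UserTest/synthetic/synthetic_seq/part-m-00000");    // hypothetical

    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, out, Text.class, VectorWritable.class);
    BufferedReader reader = new BufferedReader(new InputStreamReader(fs.open(in)));
    try {
      String line;
      int key = 0;
      while ((line = reader.readLine()) != null) {
        String[] fields = line.trim().split("\\s+");   // assumes whitespace-separated numbers
        double[] values = new double[fields.length];
        for (int i = 0; i < fields.length; i++) {
          values[i] = Double.parseDouble(fields[i]);
        }
        // Key is an arbitrary record id; value is the point as a dense Mahout vector.
        writer.append(new Text(String.valueOf(key++)),
                      new VectorWritable(new DenseVector(values)));
      }
    } finally {
      reader.close();
      writer.close();
    }
  }
}

The real job does the same thing inside a mapper, which is why the counters show 600 map input records, 600 map output records, and SLOTS_MILLIS_REDUCES=0.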
15/01/12 17:13:00 INFO canopy.Canopy: Run Canopy algorithm ...
15/01/12 17:13:00 INFO canopy.CanopyDriver: Build Clusters Input: /DolphinData/UserTest/synthetic/synthetic_seq_89701f55-b1f1-408f-9bbb-166f855d1589 Out: /DolphinData/UserTest/synthetic/synthetic_output_fdeb35fa-ede9-4d73-a455-ece92f133cd4 Measure: org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure@25e5d007 t1: 0.01 t2: 0.03
15/01/12 17:13:00 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
15/01/12 17:13:04 INFO input.FileInputFormat: Total input paths to process : 1
15/01/12 17:13:04 INFO mapred.JobClient: Running job: job_201411071554_0249
15/01/12 17:13:05 INFO mapred.JobClient:  map 0% reduce 0%
15/01/12 17:13:17 INFO mapred.JobClient:  map 100% reduce 0%
15/01/12 17:13:25 INFO mapred.JobClient:  map 100% reduce 33%
15/01/12 17:13:30 INFO mapred.JobClient:  map 100% reduce 100%
15/01/12 17:13:31 INFO mapred.JobClient: Job complete: job_201411071554_0249
15/01/12 17:13:31 INFO mapred.JobClient: Counters: 29
15/01/12 17:13:31 INFO mapred.JobClient:   Job Counters
15/01/12 17:13:31 INFO mapred.JobClient:     Launched reduce tasks=1
15/01/12 17:13:31 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=9036
15/01/12 17:13:31 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
15/01/12 17:13:31 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
15/01/12 17:13:31 INFO mapred.JobClient:     Launched map tasks=1
15/01/12 17:13:31 INFO mapred.JobClient:     Data-local map tasks=1
15/01/12 17:13:31 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=11277
15/01/12 17:13:31 INFO mapred.JobClient:   File Output Format Counters
15/01/12 17:13:31 INFO mapred.JobClient:     Bytes Written=426976
15/01/12 17:13:31 INFO mapred.JobClient:   FileSystemCounters
15/01/12 17:13:31 INFO mapred.JobClient:     FILE_BYTES_READ=333606
15/01/12 17:13:31 INFO mapred.JobClient:     HDFS_BYTES_READ=335655
15/01/12 17:13:31 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=798832
15/01/12 17:13:31 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=426976
15/01/12 17:13:31 INFO mapred.JobClient:   File Input Format Counters
15/01/12 17:13:31 INFO mapred.JobClient:     Bytes Read=335470
15/01/12 17:13:31 INFO mapred.JobClient:   Map-Reduce Framework
15/01/12 17:13:31 INFO mapred.JobClient:     Map output materialized bytes=333606
15/01/12 17:13:31 INFO mapred.JobClient:     Map input records=600
15/01/12 17:13:31 INFO mapred.JobClient:     Reduce shuffle bytes=333606
15/01/12 17:13:31 INFO mapred.JobClient:     Spilled Records=1200
15/01/12 17:13:31 INFO mapred.JobClient:     Map output bytes=331200
15/01/12 17:13:31 INFO mapred.JobClient:     CPU time spent (ms)=5660
15/01/12 17:13:31 INFO mapred.JobClient:     Total committed heap usage (bytes)=237043712
15/01/12 17:13:31 INFO mapred.JobClient:     Combine input records=0
15/01/12 17:13:31 INFO mapred.JobClient:     SPLIT_RAW_BYTES=185
15/01/12 17:13:31 INFO mapred.JobClient:     Reduce input records=600
15/01/12 17:13:31 INFO mapred.JobClient:     Reduce input groups=1
15/01/12 17:13:31 INFO mapred.JobClient:     Combine output records=0
15/01/12 17:13:31 INFO mapred.JobClient:     Physical memory (bytes) snapshot=264077312
15/01/12 17:13:31 INFO mapred.JobClient:     Reduce output records=600
15/01/12 17:13:31 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2198999040
15/01/12 17:13:31 INFO mapred.JobClient:     Map output records=600
15/01/12 17:13:31 INFO canopy.Canopy: Converting the output file from sequence format into plain text format
15/01/12 17:13:31 INFO conversion.SequenceClusters2PlainText: Dumping out clusters from clusters: /DolphinData/UserTest/synthetic/synthetic_output_fdeb35fa-ede9-4d73-a455-ece92f133cd4/clusters-*-final and clusteredPoints: /DolphinData/UserTest/synthetic/synthetic_output_fdeb35fa-ede9-4d73-a455-ece92f133cd4/clusteredPoints
15/01/12 17:13:31 INFO common.AbstractJob: Command line arguments: {--dictionaryType=[text], --distanceMeasure=[org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure], --endPhase=[2147483647], --input=[/DolphinData/UserTest/synthetic/synthetic_output_fdeb35fa-ede9-4d73-a455-ece92f133cd4/clusters-*-final], --output=[/DolphinData/UserTest/synthetic/synthetic_finalresult_d9d493ae-d027-4fb0-83a7-c9b8d4cfb35b.txt], --outputFormat=[TEXT], --pointsDir=[/DolphinData/UserTest/synthetic/synthetic_output_fdeb35fa-ede9-4d73-a455-ece92f133cd4/clusteredPoints], --startPhase=[0], --tempDir=[temp]}
15/01/12 17:13:32 INFO clusterdumper.ClusterDumper: Wrote 600 clusters
15/01/12 17:13:32 INFO canopy.Canopy: Final result has been written to file /DolphinData/UserTest/synthetic/synthetic_finalresult_d9d493ae-d027-4fb0-83a7-c9b8d4cfb35b.txt
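
The final step uses a cluster dumper to turn the SequenceFile output (clusters-*-final plus clusteredPoints) back into plain text; the AbstractJob line above lists the exact options it was given. It is not clear from the log whether this is Mahout's own org.apache.mahout.utils.clustering.ClusterDumper or a local variant, but with the stock class the equivalent invocation might look roughly like this (long option names copied from the log, paths shortened to hypothetical placeholders):

import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.utils.clustering.ClusterDumper;

public class DumpClustersSketch {
  public static void main(String[] args) throws Exception {
    // Option names are taken verbatim from the AbstractJob line in the log;
    // the paths are hypothetical placeholders for the real output directories.
    String[] dumpArgs = {
        "--input", "/DolphinData/UserTest/synthetic/synthetic_output/clusters-*-final",
        "--pointsDir", "/DolphinData/UserTest/synthetic/synthetic_output/clusteredPoints",
        "--output", "/DolphinData/UserTest/synthetic/synthetic_finalresult.txt",
        "--outputFormat", "TEXT",
        "--dictionaryType", "text",
        "--distanceMeasure", "org.apache.mahout.common.distance.SquaredEuclideanDistanceMeasure"
    };
    ToolRunner.run(new ClusterDumper(), dumpArgs);
  }
}

Also worth noting: the dumper reports 600 clusters for 600 input records, i.e. essentially one canopy per point, which is consistent with the very tight t1/t2 thresholds used in this run.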
