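Twister WordCount Example Walkthrough

1. Start ActiveMQ and Twister. For the installation procedure, see the blog post "Building and Installing Twister [Multi-Node Mode]" (《Twister编译及安装 [多节点方式]》).

1.1 Start ActiveMQ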

[lucktroy@node03 apache-activemq-5.4.2]$ bin/activemq console

1.2 Start Twister

[lucktroy@node03 bin]$ ./start_twister.sh

2. Split the data

Usage:

./split_input_file.sh
Usage: [input file path][output dir][number of splits][partitioned file name pattern]

Example:

cd $TWISTER_HOME/samples/wordcount/bin
mkdir input
./split_input_file.sh $TWISTER_HOME/samples/wordcount/bin/input_data.txt $TWISTER_HOME/samples/wordcount/bin/input 8 wc

Result:

$ ls input
wc0.txt  wc1.txt  wc2.txt  wc3.txt  wc4.txt  wc5.txt  wc6.txt  wc7.txt
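
To make the naming scheme concrete, here is a minimal Python sketch that splits a text file into a fixed number of parts named <pattern><index>.txt, matching the wc0.txt ... wc7.txt listing above. It is only an illustration: this post does not show how split_input_file.sh actually divides the file, so the contiguous line-based chunks are an assumption, and the function name split_lines is hypothetical.

```python
import os
import tempfile


def split_lines(input_path, output_dir, num_splits, pattern):
    """Write the lines of input_path into num_splits files named <pattern><i>.txt.

    Assumption: contiguous, line-based chunks of near-equal size.
    Twister's own script may split differently.
    """
    with open(input_path, encoding="utf-8") as f:
        lines = f.readlines()
    os.makedirs(output_dir, exist_ok=True)

    # The first `extra` parts receive one additional line each.
    base, extra = divmod(len(lines), num_splits)
    paths = []
    start = 0
    for i in range(num_splits):
        size = base + (1 if i < extra else 0)
        path = os.path.join(output_dir, f"{pattern}{i}.txt")
        with open(path, "w", encoding="utf-8") as out:
            out.writelines(lines[start:start + size])
        paths.append(path)
        start += size
    return paths


if __name__ == "__main__":
    # Demo on a throwaway 20-line file, split 8 ways with the "wc" pattern.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input_data.txt")
        with open(src, "w", encoding="utf-8") as f:
            f.write("".join(f"line {n}\n" for n in range(20)))
        parts = split_lines(src, os.path.join(tmp, "input"), 8, "wc")
        print([os.path.basename(p) for p in parts])
```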

3. Create the WordCount input directory

cd $TWISTER_HOME/bin
./twister.sh mkdir WC

4. Distribute the split data

Usage:

$ ./twister.sh put
Usage: put [input data directory (local)][destination directory (remote)][file filter][num threads][num replications (optional)]
destination directory - relative to data_dir specified in twister.properties

Example:

$ ./twister.sh put $TWISTER_HOME/samples/wordcount/bin/input WC wc 8
Number of files to copy = 8
Number of nodes = 2
Destintion Directory =/tmp/data/WC
INPUT :Copying /home/lucktroy/twister-0.9/samples/wordcount/bin/input/wc5.txt to node03:/tmp/data/WC
INPUT :Copying /home/lucktroy/twister-0.9/samples/wordcount/bin/input/wc0.txt to node02:/tmp/data/WC
INPUT :Copying /home/lucktroy/twister-0.9/samples/wordcount/bin/input/wc1.txt to node02:/tmp/data/WC
INPUT :Copying /home/lucktroy/twister-0.9/samples/wordcount/bin/input/wc6.txt to node03:/tmp/data/WC
INPUT :Copying /home/lucktroy/twister-0.9/samples/wordcount/bin/input/wc4.txt to node03:/tmp/data/WC
INPUT :Copying /home/lucktroy/twister-0.9/samples/wordcount/bin/input/wc2.txt to node02:/tmp/data/WC
INPUT :Copying /home/lucktroy/twister-0.9/samples/wordcount/bin/input/wc7.txt to node03:/tmp/data/WC
INPUT :Copying /home/lucktroy/twister-0.9/samples/wordcount/bin/input/wc3.txt to node02:/tmp/data/WC

5. Generate the partition file

Usage:

$ ./create_partition_file.sh
Usage: [common directory][file filter][partition file]

Example:

$ ./create_partition_file.sh WC wc wc.pf
Apr 6, 2013 12:29:05 AM org.apache.activemq.transport.failover.FailoverTransport doReconnect
INFO: Successfully connected to tcp://node03:61616
Partition file created.

Result:

$ cat wc.pf
0,node03,1,/tmp/data/WC/wc5.txt
1,node03,1,/tmp/data/WC/wc4.txt
2,node03,1,/tmp/data/WC/wc7.txt
3,node03,1,/tmp/data/WC/wc6.txt
4,node02,0,/tmp/data/WC/wc2.txt
5,node02,0,/tmp/data/WC/wc0.txt
6,node02,0,/tmp/data/WC/wc3.txt
7,node02,0,/tmp/data/WC/wc1.txt
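
Each line of wc.pf has four comma-separated fields. Judging from the listing above, they look like a partition index, a host name, a per-host number (1 for node03, 0 for node02) and the file path on that host. The Python sketch below parses the file under that reading and groups the files by host. Interpreting the third field as a daemon number is an assumption, not something this post confirms, and the function names are hypothetical.

```python
from collections import defaultdict

# The partition file contents shown above.
SAMPLE = """\
0,node03,1,/tmp/data/WC/wc5.txt
1,node03,1,/tmp/data/WC/wc4.txt
2,node03,1,/tmp/data/WC/wc7.txt
3,node03,1,/tmp/data/WC/wc6.txt
4,node02,0,/tmp/data/WC/wc2.txt
5,node02,0,/tmp/data/WC/wc0.txt
6,node02,0,/tmp/data/WC/wc3.txt
7,node02,0,/tmp/data/WC/wc1.txt
"""


def parse_partition_file(text):
    """Parse 'index,host,daemon,path' lines into a list of dicts.

    Assumption: the third field is a daemon number; the post does not say.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the first three commas only, so the path stays intact.
        index, host, daemon, path = line.split(",", 3)
        entries.append(
            {"index": int(index), "host": host, "daemon": int(daemon), "path": path}
        )
    return entries


def files_by_host(entries):
    """Group file paths by host, preserving the order in the partition file."""
    grouped = defaultdict(list)
    for entry in entries:
        grouped[entry["host"]].append(entry["path"])
    return dict(grouped)


if __name__ == "__main__":
    for host, paths in files_by_host(parse_partition_file(SAMPLE)).items():
        print(host, len(paths), paths)
```

Grouping by host makes it easy to check that the 8 splits are spread evenly, 4 per node in this run.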

6. Run the WordCount program

Usage:

$ ./run_wc.sh
Usage: [partition File][output file][num maps][num reducers]

Example and output:

$ cd $TWISTER_HOME/samples/wordcount/bin
$ ./run_wc.sh ~/twister-0.9/bin/wc.pf wc2.out 8 1
JobID: word-count-map-reduce01fbcf0c-9e78-11e2-8071-87da32af18a2
Apr 6, 2013 12:08:24 AM org.apache.activemq.transport.failover.FailoverTransport doReconnect
INFO: Successfully connected to tcp://node03:61616
0    [main] INFO  cgl.imr.client.TwisterDriver  - Configure Mappers through the partition file, please wait....
37   [main] INFO  cgl.imr.client.TwisterDriver  - Configuring Mappers through the partition file is completed.
623  [main] INFO  cgl.imr.client.TwisterDriver  - MapReduce computation termintated gracefully.
pint, , 3
cried. , 87
long, , 21
...
...
...
weight , 3
kind , 42
high," , 3
------------------------------------------------------
Word Count took 1.175 seconds.
------------------------------------------------------
715  [Thread-0] DEBUG cgl.imr.client.ShutdownHook  - Shutting down completed.
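
The run above uses 8 map tasks and 1 reducer. As a rough picture of what such a job computes, the Python sketch below counts words per input split (the map step) and then merges the partial counts (the reduce step). It is not Twister's API: the real sample runs on the JVM (the log shows cgl.imr.client.TwisterDriver), and the function names here are hypothetical. Splitting on whitespace only is an assumption, chosen because the output above keeps punctuation attached to tokens such as "pint," and "cried.".

```python
from collections import Counter


def map_count(text):
    """Map step: count whitespace-separated tokens in one input split."""
    return Counter(text.split())


def reduce_counts(partials):
    """Reduce step: merge the per-split counts into a single total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


if __name__ == "__main__":
    # Two tiny stand-ins for input splits such as wc0.txt and wc1.txt.
    splits = ["a pint, of kind words", "kind words cried. a pint,"]
    totals = reduce_counts(map_count(s) for s in splits)
    for word, count in sorted(totals.items()):
        print(f"{word} , {count}")
```

Because each split is counted independently, the map step can run in parallel on the nodes that hold the files, and only the small per-split counts need to reach the reducer.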

Reference:

[1] http://salsahpc.indiana.edu/tutorial/twister_wordcount_user_guide.htm
