1. Start ActiveMQ and Twister. For the installation procedure, see the blog post "Compiling and Installing Twister [Multi-Node Mode]".
1.1 Start ActiveMQ
[lucktroy@node03 apache-activemq-5.4.2]$ bin/activemq console
1.2 Start Twister
[lucktroy@node03 bin]$ ./start_twister.sh
2. Split the data
Usage:
./split_input_file.sh
Usage: [input file path][output dir][number of splits][partitioned file name pattern]
Example:
cd $TWISTER_HOME/samples/wordcount/bin
mkdir input
./split_input_file.sh $TWISTER_HOME/samples/wordcount/bin/input_data.txt $TWISTER_HOME/samples/wordcount/bin/input 8 wc
Result:
$ ls input
wc0.txt wc1.txt wc2.txt wc3.txt wc4.txt wc5.txt wc6.txt wc7.txt
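To illustrate what this step produces, here is a minimal sketch that distributes the lines of a text file round-robin into N files named with the same pattern (wc0.txt, wc1.txt, ...). It is not the actual split_input_file.sh: the function name split_round_robin and the round-robin strategy are assumptions made for illustration, and the real script may cut the file differently.

```shell
# Hypothetical helper, NOT the Twister script: split a text file into
# N parts by sending line 1 to part 0, line 2 to part 1, and so on.
# Arguments: input file, output directory, number of parts, name prefix.
split_round_robin() {
    input=$1
    outdir=$2
    nparts=$3
    prefix=$4
    mkdir -p "$outdir"
    awk -v n="$nparts" -v dir="$outdir" -v pre="$prefix" '
        {
            # (NR - 1) % n selects the part index for the current line
            out = dir "/" pre ((NR - 1) % n) ".txt"
            print > out
        }
    ' "$input"
}

# Example call (paths are placeholders):
# split_round_robin input_data.txt input 8 wc
```

Each line stays intact, so a word count over the parts gives the same totals as a count over the original file.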
3. Create the WordCount input directory
cd $TWISTER_HOME/bin
./twister.sh mkdir WC
4. Distribute the split data
Usage:
$ ./twister.sh put
Usage: put [input data directory (local)][destination directory (remote)][file filter][num threads][num replications (optional)]
destination directory - relative to data_dir specified in twister.properties
Example:
$ ./twister.sh put $TWISTER_HOME/samples/wordcount/bin/input WC wc 8
Number of files to copy = 8
Number of nodes = 2
Destintion Directory =/tmp/data/WC
INPUT :Copying /home/lucktroy/twister-0.9/samples/wordcount/bin/input/wc5.txt to node03:/tmp/data/WC
INPUT :Copying /home/lucktroy/twister-0.9/samples/wordcount/bin/input/wc0.txt to node02:/tmp/data/WC
INPUT :Copying /home/lucktroy/twister-0.9/samples/wordcount/bin/input/wc1.txt to node02:/tmp/data/WC
INPUT :Copying /home/lucktroy/twister-0.9/samples/wordcount/bin/input/wc6.txt to node03:/tmp/data/WC
INPUT :Copying /home/lucktroy/twister-0.9/samples/wordcount/bin/input/wc4.txt to node03:/tmp/data/WC
INPUT :Copying /home/lucktroy/twister-0.9/samples/wordcount/bin/input/wc2.txt to node02:/tmp/data/WC
INPUT :Copying /home/lucktroy/twister-0.9/samples/wordcount/bin/input/wc7.txt to node03:/tmp/data/WC
INPUT :Copying /home/lucktroy/twister-0.9/samples/wordcount/bin/input/wc3.txt to node02:/tmp/data/WC
5. Create the partition file
Usage:
$ ./create_partition_file.sh
Usage: [common directory][file filter][partition file]
Example:
$ ./create_partition_file.sh WC wc wc.pf
Apr 6, 2013 12:29:05 AM org.apache.activemq.transport.failover.FailoverTransport doReconnect
INFO: Successfully connected to tcp://node03:61616
Partition file created.
Result:
$ cat wc.pf
0,node03,1,/tmp/data/WC/wc5.txt
1,node03,1,/tmp/data/WC/wc4.txt
2,node03,1,/tmp/data/WC/wc7.txt
3,node03,1,/tmp/data/WC/wc6.txt
4,node02,0,/tmp/data/WC/wc2.txt
5,node02,0,/tmp/data/WC/wc0.txt
6,node02,0,/tmp/data/WC/wc3.txt
7,node02,0,/tmp/data/WC/wc1.txt
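A quick sanity check on the partition file is to count how many entries each node holds. The sketch below assumes the layout seen in the listing above (comma-separated lines with the host name in the second field and the file path in the fourth); files_per_node is a hypothetical helper name, not part of Twister.

```shell
# Hypothetical helper: print "<host> <number of entries>" for each host
# found in a partition file. Assumes comma-separated lines with at least
# four fields and the host name in field 2, as in the wc.pf listing.
files_per_node() {
    awk -F, '
        NF >= 4 { count[$2]++ }
        END     { for (h in count) print h, count[h] }
    ' "$1" | sort
}

# Example call:
# files_per_node wc.pf
```

For the wc.pf shown above this reports 4 entries each for node02 and node03, which matches the 8 files copied in step 4.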
6. Run the WordCount program
Usage:
$ ./run_wc.sh
Usage: [partition File][output file][num maps][num reducers]
Example and output:
$ cd $TWISTER_HOME/samples/wordcount/bin
$ ./run_wc.sh ~/twister-0.9/bin/wc.pf wc2.out 8 1
JobID: word-count-map-reduce01fbcf0c-9e78-11e2-8071-87da32af18a2
Apr 6, 2013 12:08:24 AM org.apache.activemq.transport.failover.FailoverTransport doReconnect
INFO: Successfully connected to tcp://node03:61616
0 [main] INFO cgl.imr.client.TwisterDriver - Configure Mappers through the partition file, please wait....
37 [main] INFO cgl.imr.client.TwisterDriver - Configuring Mappers through the partition file is completed.
623 [main] INFO cgl.imr.client.TwisterDriver - MapReduce computation termintated gracefully.
pint, , 3
cried. , 87
long, , 21
...
...
...
weight , 3
kind , 42
high," , 3
------------------------------------------------------
Word Count took 1.175 seconds.
------------------------------------------------------
715 [Thread-0] DEBUG cgl.imr.client.ShutdownHook - Shutting down completed.
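If the console output is captured to a file, the most frequent tokens can be pulled out with standard tools. This is a rough sketch that assumes result lines look like "token , count" as in the run above; top_words is a hypothetical helper name, and lines that do not end in " , <number>" (log messages, the timing banner) are ignored.

```shell
# Hypothetical helper: list the N highest counts from captured run output.
# Arguments: file with captured console output, number of entries to show.
# Assumes each result line ends in " , <count>".
top_words() {
    grep -E ' , [0-9]+$' "$1" |
        awk '{
            n = $NF                    # last whitespace-separated field is the count
            sub(/ , [0-9]+$/, "")      # strip the trailing " , <count>" from the line
            print n "\t" $0            # emit "<count><TAB><token>"
        }' |
        sort -k1,1nr |
        head -n "$2"
}

# Example call (the log file name is a placeholder):
# top_words wc_console.log 10
```

Tokens keep their punctuation (for example "cried." and "long,") because they are printed exactly as they appear in the output.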
Reference:
[1] http://salsahpc.indiana.edu/tutorial/twister_wordcount_user_guide.htm