First, set up a Hadoop cluster; that step is skipped here.
Download Pig 0.9.1 and start it:
wget www.apache.org/dist//pig/pig-0.9.1/pig-0.9.1.tar.gz
chmod 777 pig-0.9.1.tar.gz
tar -zxvf pig-0.9.1.tar.gz
mv pig-0.9.1 pig
cd pig/bin
./pig
---------------------------------------------------------------------------------------------------------
Create a file named words.txt under /tmp with the following contents:
Hello World
Hello Hive
Hello Pig
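The input file can be created and sanity-checked from the shell before uploading (a sketch; the path matches the one used below):

```shell
# Create the sample input under /tmp
cat > /tmp/words.txt <<'EOF'
Hello World
Hello Hive
Hello Pig
EOF

# Sanity check: 3 lines, and "Hello" appears on every line
wc -l < /tmp/words.txt
grep -c Hello /tmp/words.txt
```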
Copy the file into HDFS using hadoop:
hadoop fs -copyFromLocal /tmp/words.txt /data/input
Run Pig and enter the following statements in the grunt shell:
grunt> a = load '/data/input/words.txt' as (text:chararray);
grunt> b = foreach a generate flatten(TOKENIZE(text)) as word;
grunt> c = group b by word;
grunt> d = foreach c generate group as word, COUNT(b) as count;
grunt> store d into '/data/output/pig';
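For intuition, the five statements above compute an ordinary word count: TOKENIZE splits each line on whitespace into a bag of words, flatten turns that bag into one row per word, and group/COUNT tallies the rows per word. A minimal Python sketch of the same logic, using the three lines of words.txt:

```python
from collections import Counter

# The contents of words.txt
lines = ["Hello World", "Hello Hive", "Hello Pig"]

# flatten(TOKENIZE(text)): one row per whitespace-separated word
words = [w for line in lines for w in line.split()]

# group by word, then COUNT each group
counts = Counter(words)

for word, n in counts.items():
    print(word, n)
```

The tallies agree with the part-r-00000 output shown further down: Hello 3, World 1, Hive 1, Pig 1.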
...(MapReduce job output omitted)
grunt> dump d;
...(output omitted)
grunt> ls /data/output/pig
hdfs://master:54310/data/output/pig/_logs <dir>
hdfs://master:54310/data/output/pig/part-r-00000<r 2> 29
grunt> cat /data/output/pig/part-r-00000
Pig 1
Hive 1
Hello 3
World 1
References:
http://www.ne.jp/asahi/hishidama/home/tech/apache/pig/wordcount.html
http://www.codelast.com/?p=3621