PIG_HOME=/usr/local/pig
PATH=$PATH:$PIG_HOME/bin
PIG_CLASSPATH=$HADOOP_HOME/conf
The Pig script further below finds the 20 IPs with the most hits in a web access log. Sample log records:
220.181.108.151 - - [31/Jan/2012:00:02:32 +0800] "GET /home.php?mod=space&uid=158&do=album&view=me&from=space HTTP/1.1" 200 8784 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
208.115.113.82 - - [31/Jan/2012:00:07:54 +0800] "GET /robots.txt HTTP/1.1" 200 582 "-" "Mozilla/5.0 (compatible; Ezooms/1.0; ezooms.bot@gmail.com)"
220.181.94.221 - - [31/Jan/2012:00:09:24 +0800] "GET /home.php?mod=spacecp&ac=pm&op=showmsg&handlekey=showmsg_3&touid=3&pmid=0&daterange=2&pid=398&tid=66 HTTP/1.1" 200 10070 "-" "Sogou web spider/4.0(+http://www.sogou.com/docs/help/webmasters.htm#07)"
112.97.24.243 - - [31/Jan/2012:00:14:48 +0800] "GET /data/cache/style_2_common.css?AZH HTTP/1.1" 200 57752 "http://f.dataguru.cn/forum-58-1.html" "Mozilla/5.0 (iPhone; CPU iPhone OS 5_0_1 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Mobile/9A406"
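The script that was run, which stores its result under /feixu/log/access_out2, is as follows:
-- Load the access log, splitting each line on spaces
logRecords = LOAD '/feixu/log/access_log' USING PigStorage(' ') AS (ip:chararray, link:chararray);
-- Group the records by client IP
groupRecords = GROUP logRecords BY ip;
-- Count the records in each group
countRecords = FOREACH groupRecords GENERATE group AS ip, COUNT(logRecords) AS count;
-- Sort by hit count, highest first, and keep the first 20 rows
sortRecords = ORDER countRecords BY count DESC;
row20 = LIMIT sortRecords 20;
-- Write the result as tab-separated text
STORE row20 INTO '/feixu/log/access_out2' USING PigStorage('\t');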
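
As a rough local cross-check of the Pig script above, the same count can be sketched with standard shell tools on a small copy of the log. The function name top_ips and its arguments are hypothetical and not part of the original setup. It assumes, as the Pig script does, that the client IP is the first space-separated field of each line.

```shell
# Hypothetical helper: print the N most frequent client IPs in an access log.
# Usage: top_ips FILE [N]   (N defaults to 20, like LIMIT 20 in the Pig script)
top_ips() {
    # Field 1 is the client IP: count its occurrences (GROUP BY ip + COUNT),
    # then sort by count descending (ORDER ... DESC) and keep the top N (LIMIT).
    awk '{ n[$1]++ } END { for (ip in n) printf "%s\t%d\n", ip, n[ip] }' "$1" |
        sort -t "$(printf '\t')" -k2,2nr -k1,1 |
        head -n "${2:-20}"
}
```

Calling top_ips on a local sample of the log prints one "ip<TAB>count" line per IP, the same column layout that PigStorage('\t') writes. Ties in the count are broken here by IP string, which is an arbitrary choice, so rows with equal counts may appear in a different order than in the Pig output.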

You can see that Pig invokes the MapReduce job it generated itself, as follows:
