Counting the total number of rows in a dataset:
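-- Load the input (replace the placeholder path with your own data path)
rows = LOAD '/your/data/path';
-- Put every row into a single group so it can be counted as one bag
g = GROUP rows ALL;
total_count = FOREACH g GENERATE COUNT(rows);
-- Print the result to the console
DUMP total_count;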
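One caveat with the script above: Pig's built-in COUNT skips tuples whose first field is null, so the result can be lower than the real row count when the data contains such rows. The following is a minimal sketch of a variant that uses the built-in COUNT_STAR, which counts those tuples too. It assumes the same placeholder path as above.

```pig
-- Assumption: same placeholder input path as the original script
rows = LOAD '/your/data/path';
g = GROUP rows ALL;
-- COUNT_STAR also counts tuples whose first field is null
total_count = FOREACH g GENERATE COUNT_STAR(rows);
DUMP total_count;
```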
Word count:
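-- Load each line of the text file as a single field
a = load '/user/hue/word_count_text.txt';
-- Split each line into words, one word per output row
b = foreach a generate flatten(TOKENIZE((chararray)$0)) as word;
-- Group identical words together
c = group b by word;
-- Emit (count, word) for each distinct word
d = foreach c generate COUNT(b), group;
store d into '/user/hue/pig_wordcount';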
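The stored output of the word count script is in no particular order. If the most frequent words should come first, the result can be sorted before storing. The following is a rough sketch under stated assumptions: the relation names and the output path '/user/hue/pig_wordcount_sorted' are hypothetical and not part of the original script.

```pig
-- Assumption: same input file as the original word count script
lines = LOAD '/user/hue/word_count_text.txt' AS (line:chararray);
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
grouped = GROUP words BY word;
-- Name both output fields so they can be referenced when sorting
counts = FOREACH grouped GENERATE group AS word, COUNT(words) AS n;
-- Most frequent words first
sorted = ORDER counts BY n DESC;
-- Hypothetical output path, chosen for illustration only
STORE sorted INTO '/user/hue/pig_wordcount_sorted';
```

Naming the fields with AS keeps the ORDER statement readable compared with positional references such as $0 and $1.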
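This article presents two short Pig scripts: one counts the total number of rows in a dataset, and the other implements a simple word-frequency count. The first loads the data file at a given path and counts its rows with GROUP ... ALL and COUNT; the second uses Pig Latin's TOKENIZE, FLATTEN, and GROUP operations to count how often each word occurs.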