select * from test_table order by income;
select * from test_table sort by income;
insert overwrite local directory '/home/hadoop/out' select * from test_table distribute by city_id;
SELECT col1, col2 FROM t1 CLUSTER BY col1;
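•order by: globally ordered output through a single reducer; with large inputs, adding a limit is recommended
•sort by: not a global sort; it runs on multiple reducers, and each reducer only guarantees that its own output is sorted.
A merge sort over all the output files then yields a total order; limit can also speed the query up considerably
•distribute by: partitions rows across reducers by the selected column(s); watch for data skew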
•cluster by col1 = distribute by col1 sort by col1; the sort is always ascending (ASC/DESC cannot be specified)
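The relationship between these clauses can be sketched with the `t1` table from the example above. This is a minimal illustration, not a tuned job: the reducer count of 3 is an arbitrary assumption chosen only so that more than one output file is produced, and the actual number Hive uses can depend on cluster settings.

```sql
-- Assumption: request 3 reducers so the per-reducer behaviour is visible.
SET mapreduce.job.reduces=3;

-- Rows with the same col1 go to the same reducer;
-- each reducer then sorts only its own rows by col1.
SELECT col1, col2 FROM t1 DISTRIBUTE BY col1 SORT BY col1;

-- Shorthand for the query above.
SELECT col1, col2 FROM t1 CLUSTER BY col1;

-- The long form lets the sort direction be chosen explicitly.
SELECT col1, col2 FROM t1 DISTRIBUTE BY col1 SORT BY col1 DESC;

-- A single globally ordered result; the limit keeps the one reducer's work small.
SELECT col1, col2 FROM t1 ORDER BY col1 LIMIT 100;
```

With the first three queries, each output file is sorted on its own, but concatenating the files does not necessarily give a globally sorted result; only the `ORDER BY` query guarantees that.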
Analytic window functions
•Aggregate functions: COUNT