在平时的Hive数仓开发工作中经常会用到排序,而Hive中支持的排序方式有四种,这里结合具体的案例详细介绍一下他们的使用与区别:
- order by
- sort by
- distribute by
- cluster by
准备工作:
新建一个测试用表employInfo:
create table employInfo(deptID int,employID int,employName string,employSalary double)
row format delimited fields terminated by ',';
向测试用表中导入测试数据:
load data local inpath '/home/hadoop/datas/employInfo.txt' into table employInfo;
以下为测试用的数据:
[hadoop@weekend110 datas]$ cat employInfo.txt
deptID,employID,employName,employSalary
1,1001,Jack01,5000
1,1002,Jack02,5001
1,1003,Jack03,5002
1,1004,Jack04,5003
1,1005,Jack05,5004
1,1006,Jack06,5005
1,1007,Jack07,5006
1,1008,Jack08,5007
1,1009,Jack09,5008
1,1010,Jack10,5009
1,1011,Jack11,5010
1,1012,Jack12,5011
2,1013,Maria01,7500
2,1014,Maria02,7501
2,1015,Maria03,7502
2,1016,Maria04,7503
2,1017,Maria05,7504
2,1018,Maria06,7505
2,1019,Maria07,7506
2,1020,Maria08,7507
2,1021,Maria09,7508
3,1022,Lucy01,8540
3,1023,Lucy02,8541
3,1024,Lucy03,8542
3,1025,Lucy04,8543
3,1026,Lucy05,8544
3,1027,Lucy06,8545
3,1028,Lucy07,8546
3,1029,Lucy08,8547
3,1030,Lucy09,8548
3,1031,Lucy10,8549
3,1032,Lucy11,8550
3,1033,Lucy12,8551
4,1034,Jimmy01,10000
4,1035,Jimmy02,10001
4,1036,Jimmy03,10002
4,1037,Jimmy04,10003
4,1038,Jimmy05,10004
4,1039,Jimmy06,10005
4