【博学谷学习记录】超强总结,用心分享| Hive表的查询

1.普通查询

1、聚合函数对null的态度
create table test11(
    id int,
    score int
);

insert into test11 values (1,50);
insert into test11 values (2,50);
insert into test11 values (3,null);
insert into test11 values (4,50);

select * from test11;

select sum(score) from test11; // 150
select avg(score) from test11; // 150  / 3
select avg(if(score is null, 0, score)) from test11; // 150  / 4
select avg(coalesce(score,0)) from test11; // 150  / 4


2、limit关键字
select * from student limit 3;
select * from student limit 2,3;  --从索引为2(从0开始)显示,显示3行


3、where条件查询

select * from score where sscore not in(80,90); -- 成绩不是80或者90
select * from score where not sscore  in(80,90);-- 成绩不是80或者90


4、like关键字
select * from student where sname like '赵%'; --  姓赵的
select * from student where sname like '%雷'; --  名字最后一个字是 雷
select * from student where sname like '%雷%'; -- 名字中包含 雷
select * from student where sname like '_雷%'; -- 名字第二个字是 雷\

5、分组-group by
-- 分组之后每一组只剩下一条数据,所以select后边只能跟分组字段和聚合函数
select  sid, sum(sscore) from score group by sid;

-- 分组之后的条件筛选是having,不是where
select  sid, sum(sscore) as total_score from score group by sid having  total_score > 450

2 join查询

-- 1、内连接:求交集
select * from teacher;
insert into teacher values ('04','赵六');

select * from  teacher inner join course c on teacher.tid = c.tid;
select * from  teacher  join course c on teacher.tid = c.tid;
select * from  teacher , course where teacher.tid = course.tid;


-- 2、左外连接
-- 左外是以左表为主,把左表的数据全部输出,右表有对应的数据就输出,没有对应的数据就输出NULL
select * from  teacher left  join course c on teacher.tid = c.tid;

3排序查询

select * from score order by sscore ;     -- 升序排序
select * from score order by sscore desc; -- 降序排序


1)设置reduce个数
 set mapreduce.job.reduces=3;
2)查询成绩按照成绩降序排列
 select * from score sort by sscore;
3)将查询结果导入到文件中(按照成绩降序排列)
 insert overwrite local directory '/export/data/exporthive/sort' select * from score sort by sscore;





1)设置reduce的个数,将我们对应的sid划分到对应的reduce当中去
set mapreduce.job.reduces=7;
2)通过distribute by进行数据的分区
insert overwrite local directory '/export/data/exporthive/distribute' select * from score distribute by sid sort by sscore; 



1、当distributed by 和 sort by字段相同时:cluster by 等价于  distributed by  + sort by
 cluster by id => distributed by id sort by id
 
2、当reduce个数 < id的个数时,排序有意义
 id有100个  rduce 100个
 

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值