数据查询
select [all | distinct] select_expr, ...
from table_reference
[where where_condition]
[group by col_list]
[cluster by col_list] | [distribute by col_list] [sort by col_list]
[limit number]
查找正则表达式的列:
select '(ds|dh)? +.+' from sales
多路group by
from pv_users
insert overwrite table pr_gender_sum
select pv_users.gender, count(distinct pv_users.userid)
group by pv_users.gender
insert overwrite directory '/路径'
select pv_users.age, count(distinct pv_users.userid)
group by pv_users.age
区别
order by全局排序
sort by 保证reducer输出有序
distribute by 将数据分到同个reducer
cluster by = distribute by+ sort by
JOIN
内关联:select a.* from a join b on (a.id=b.id)
外关联:select a.val, b.val from a left[right|full] outer join b on (a.key=b.key)
where写法:select * from table1 t1, talbe2 t2, table3 t3 where t1.id=t2.id and t2.id=t3.id and t1.zipcode='0000';
改写IN语句
select a.key, a.val from a left semi join b on (a.key=b.key)
注意:
/* + streamtable(a) */ 指定大表为a 防止这个表放入内存
/* + mapjoin(a) */ 指定小表,放入内存做mapjoin
union all
虚拟列:
input_file_name
block_offset_inside_file
子查询
subqueries in the from clause
select [all | distinct] select_expr, ...
from table_reference
[where where_condition]
[group by col_list]
[cluster by col_list] | [distribute by col_list] [sort by col_list]
[limit number]
查找正则表达式的列:
select '(ds|dh)? +.+' from sales
多路group by
from pv_users
insert overwrite table pr_gender_sum
select pv_users.gender, count(distinct pv_users.userid)
group by pv_users.gender
insert overwrite directory '/路径'
select pv_users.age, count(distinct pv_users.userid)
group by pv_users.age
区别
order by全局排序
sort by 保证reducer输出有序
distribute by 将数据分到同个reducer
cluster by = distribute by+ sort by
JOIN
内关联:select a.* from a join b on (a.id=b.id)
外关联:select a.val, b.val from a left[right|full] outer join b on (a.key=b.key)
where写法:select * from table1 t1, talbe2 t2, table3 t3 where t1.id=t2.id and t2.id=t3.id and t1.zipcode='0000';
改写IN语句
select a.key, a.val from a left semi join b on (a.key=b.key)
注意:
/* + streamtable(a) */ 指定大表为a 防止这个表放入内存
/* + mapjoin(a) */ 指定小表,放入内存做mapjoin
union all
虚拟列:
input_file_name
block_offset_inside_file
子查询
subqueries in the from clause
subqueries in the where clause
函数UDF,UDAF,UDTF
详见技术文档
附:
floor, round和ceil是会将double转换为bigint
时间处理函数要传入integer或string作为参数