3.查询某个库中的表
方法一:SELECT * FROM db_name.table_name;
方法二:USE db_name; SELECT * FROM table_name;
4.WHERE条件 SELECT * FROM sales WHERE amount > 10 AND region = 'CHINA'
5.基于分区的查询
SELECT page_views.*
FROM page_views
WHERE page_views.date >= '2018-01-01' AND page_views.date <= '2018-01-31';
如果一个表和另一个表join,那么分区条件可以放在ON中
SELECT page_views.*
FROM page_views JOIN dim_users
ON (page_views.user_id = dim_users.id AND page_views.date >= '2018-01-01' AND page_views.date <= '2018-01-31');
6.having子句
SELECT col1 FROM t1 GROUP BY col1 HAVING SUM(col2) > 10;
也可以这样写
SELECT col1 FROM (SELECT col1, SUM(col2) as col2sum FROM t1 GROUP BY col1) t2 WHERE t2.col2sum > 10;
7.LIMIT子句
返回前5行
SELECT * FROM customers LIMIT 5;
返回第三行到第七行
SELECT * FROM customers ORDER BY create_date LIMIT 2,5;
二.排序
1.全局排序(order by )
Order By: 全局排序,只有一个 reducer
1.1使用 order by 子句排序
asc(ascend): 升序(默认)
desc(descend): 降序
1.2.order by 子句在 select 语句的结尾
2.分组(Grouping)
GROUP BY子句与聚合函数配合使用,将数据集按照指定列进行分组,然后对每个组执行聚合操作。
-- 按部门分组并计算每个部门的平均工资
SELECT department, AVG(salary) as avg_salary
FROM employees
GROUP BY department;
3.基础聚合
max min count avg sum —— MMCAS
COUNT(NULL)=0 SUM(NULL)=NULL
COUNT(FIELD | * | 1) : 计算所有数据行,不管是否为NULL
– 集合类型聚合
collect_set (F|func|case…when) – 去重后的列表
collect_list (F|func|case…when) – 列表
格式为[“11599”,“265”,“5462”]
4.高级聚合
GROUPING SETS和GROUPING__ID
GROUPING SETS子句允许开发者自行组合GROUP BY子句中出现的字段作为分组字段,其实现效果等同于按照不同字段分组的SQL语句进行UNION操作。下面这段SQL将生成按照product_id,channel_id和channel_id,promotion_id分组汇总sale_amount的数据:
SELECT a.product_id
, a.channel_id
, a.promotion_id
, SUM(a.sale_amount) AS sale_amount
FROM dwd.dwd_sales a
GROUP BY a.product_id
, a.channel_id
, a.promotion_id
GROUPING SETS ((a.product_id, a.channel_id),(a.channel_id, a.promotion_id));