给出数据:姓名,日期,花费
jack,2020-01-01,10
tony,2020-01-02,15
jack,2020-02-03,23
tony,2020-01-04,29
jack,2020-01-05,46
jack,2020-04-06,42
tony,2020-01-07,50
jack,2020-01-08,55
mart,2020-04-08,62
mart,2020-04-09,68
neil,2020-05-10,12
mart,2020-04-11,75
neil,2020-06-12,80
mart,2020-04-13,94
数据存于/data/orders中
建表:
create table tb_orders(
name string ,
ctime string ,
money double
)
row format delimited fields terminated by ‘,’ ;
load data local inpath “/data/orders” into table tb_orders ;
select * from tb_orders ;
±----------------±-----------------±-----------------+
| tb_orders.name | tb_orders.ctime | tb_orders.money |
±----------------±-----------------±-----------------+
| jack | 2020-01-01 | 10.0 |
| tony | 2020-01-02 | 15.0 |
| jack | 2020-02-03 | 23.0 |
| tony | 2020-01-04 | 29.0 |
| jack | 2020-01-05 | 46.0 |
| jack | 2020-04-06 | 42.0 |
| tony | 2020-01-07 | 50.0 |
| jack | 2020-01-08 | 55.0 |
| mart | 2020-04-08 | 62.0 |
| mart | 2020-04-09 | 68.0 |
| neil | 2020-05-10 | 12.0 |
| mart | 2020-04-11 | 75.0 |
| neil | 2020-06-12 | 80.0 |
| mart | 2020-04-13 | 94.0 |
±----------------±-----------------±-----------------+
– 求每个人的订单明细和总量
思路:
①select * from tb_orders order by name;
②select
name ,
count(1)
from
tb_orders
group by name ;
±------±-----+
| name | _c1 |
±------±-----+
| jack | 5 |
| mart | 4 |
| neil | 2 |
| tony | 3 |
±------±-----+
整合:
with x as (
select * from tb_orders order by name
),
y as (
select
name ,
count(1) cnt ,
sum(money) total_money
from
tb_orders
group by name
)
select
x.*,
y.cnt,
y.total_money
from
x
join
y
on
x.name = y.name
order by x.name , x.ctime
;
结果:
±--------±------------±---------±-------±---------------+
| x.name | x.ctime | x.money | y.cnt | y.total_money |
±--------±------------±---------±-------±---------------+
| jack | 2020-01-01 | 10.0 | 5 | 176.0 |
| jack | 2020-01-05 | 46.0 | 5 | 176.0 |
| jack | 2020-01-08 | 55.0 | 5 | 176.0 |
| jack | 2020-02-03 | 23.0 | 5 | 176.0 |
| jack | 2020-04-06 | 42.0 | 5 | 176.0 |
| mart | 2020-04-08 | 62.0 | 4 | 299.0 |
| mart | 2020-04-09 | 68.0 | 4 | 299.0 |
| mart | 2020-04-11 | 75.0 | 4 | 299.0 |
| mart | 2020-04-13 | 94.0 | 4 | 299.0 |
| neil | 2020-05-10 | 12.0 | 2 | 92.0 |
| neil | 2020-06-12 | 80.0 | 2 | 92.0 |
| tony | 2020-01-02 | 15.0 | 3 | 94.0 |
| tony | 2020-01-04 | 29.0 | 3 | 94.0 |
| tony | 2020-01-07 | 50.0 | 3 | 94.0 |
±--------±------------±---------±-------±---------------+
指定窗口为整个表
select
- ,
count(1) over() – 整个表
from
tb_orders ;
±----------------±-----------------±-----------------±----------------+
| tb_orders.name | tb_orders.ctime | tb_orders.money | count_window_0 |
±----------------±-----------------±-----------------±----------------+
| mart | 2020-04-13 | 94.0 | 14 |
| neil | 2020-06-12 | 80.0 | 14 |
| mart | 2020-04-11 | 75.0 | 14 |
| neil | 2020-05-10 | 12.0 | 14 |
| mart | 2020-04-09 | 68.0 | 14 |
| mart | 2020-04-08 | 62.0 | 14 |
| jack | 2020-01-08 | 55.0 | 14 |
| tony | 2020-01-07 | 50.0 | 14 |
| jack | 2020-04-06 | 42.0 | 14 |
| jack | 2020-01-05 | 46.0 | 14 |
| tony | 2020-01-04 | 29.0 | 14 |
| jack | 2020-02-03 | 23.0 | 14 |
| tony | 2020-01-02 | 15.0 | 14 |
| jack | 2020-01-01 | 10.0 | 14 |
±----------------±-----------------±-----------------±----------------+
– 指定窗口大小是一个人
select
- ,
count(1) over(partition by name) ,
sum(money) over(partition by name)
from
tb_orders ;
±----------------±-----------------±-----------------+
| tb_orders.name | tb_orders.ctime | tb_orders.money |
±----------------±-----------------±-----------------+
| jack | 2020-01-01 | 10.0 |
| tony | 2020-01-02 | 15.0 |
| jack | 2020-02-03 | 23.0 |
| tony | 2020-01-04 | 29.0 |
| jack | 2020-01-05 | 46.0 |
| jack | 2020-04-06 | 42.0 |
| tony | 2020-01-07 | 50.0 |
| jack | 2020-01-08 | 55.0 |
| mart | 2020-04-08 | 62.0 |
| mart | 2020-04-09 | 68.0 |
| neil | 2020-05-10 | 12.0 |
| mart | 2020-04-11 | 75.0 |
| neil | 2020-06-12 | 80.0 |
| mart | 2020-04-13 | 94.0 |
±----------------±-----------------±-----------------+
限制窗口大小:
这里 order by ctime 和rows between unbounded preceding and current row 起始效果是一样的,写一个就可以
select
- ,
sum(money) over(partition by name order by ctime rows between unbounded preceding and current row )
from
tb_orders ;
分开写:
①
select
*,
sum(money) over (partition by name order by ctime)
from
tb_orders;
②
select
*,
sum(money) over (partition by name rows between unbounded preceding and current row)
from
tb_orders;
这三个结果都一样:
±----------------±-----------------±-----------------±--------------+
| tb_orders.name | tb_orders.ctime | tb_orders.money | sum_window_0 |
±----------------±-----------------±-----------------±--------------+
| jack | 2020-01-01 | 10.0 | 10.0 |
| jack | 2020-01-05 | 46.0 | 56.0 |
| jack | 2020-01-08 | 55.0 | 111.0 |
| jack | 2020-02-03 | 23.0 | 134.0 |
| jack | 2020-04-06 | 42.0 | 176.0 |
| mart | 2020-04-08 | 62.0 | 62.0 |
| mart | 2020-04-09 | 68.0 | 130.0 |
| mart | 2020-04-11 | 75.0 | 205.0 |
| mart | 2020-04-13 | 94.0 | 299.0 |
| neil | 2020-05-10 | 12.0 | 12.0 |
| neil | 2020-06-12 | 80.0 | 92.0 |
| tony | 2020-01-02 | 15.0 | 15.0 |
| tony | 2020-01-04 | 29.0 | 44.0 |
| tony | 2020-01-07 | 50.0 | 94.0 |
±----------------±-----------------±-----------------±--------------+
** 起始行 unbounded preceding
当前行 current row
前n行 n preceding
后n行 n following
结束行 unbounded following**
– 上一行和当前行 – order by ctime 起始行和当前行
select
- ,
sum(money) over(partition by name order by ctime rows between 1 preceding and current row )
from
tb_orders ;
– 上一行 当前行 和下一行
select
- ,
sum(money) over(partition by name order by ctime rows between 1 preceding and 1 following )
from
tb_orders ;
lag 向前n行的数据 :lag(字段,偏移量,默认值(可省略)) 没值的补NULL
select
- ,
lag(ctime ,1) over(partition by name order by ctime)
from
tb_orders ;
±----------------±-----------------±-----------------±--------------+
| tb_orders.name | tb_orders.ctime | tb_orders.money | lag_window_0 |
±----------------±-----------------±-----------------±--------------+
| jack | 2020-01-01 | 10.0 | NULL |
| jack | 2020-01-05 | 46.0 | 2020-01-01 |
| jack | 2020-01-08 | 55.0 | 2020-01-05 |
| jack | 2020-02-03 | 23.0 | 2020-01-08 |
| jack | 2020-04-06 | 42.0 | 2020-02-03 |
| mart | 2020-04-08 | 62.0 | NULL |
| mart | 2020-04-09 | 68.0 | 2020-04-08 |
| mart | 2020-04-11 | 75.0 | 2020-04-09 |
| mart | 2020-04-13 | 94.0 | 2020-04-11 |
| neil | 2020-05-10 | 12.0 | NULL |
| neil | 2020-06-12 | 80.0 | 2020-05-10 |
| tony | 2020-01-02 | 15.0 | NULL |
| tony | 2020-01-04 | 29.0 | 2020-01-02 |
| tony | 2020-01-07 | 50.0 | 2020-01-04 |
±----------------±-----------------±-----------------±--------------+
select
- ,
lag(ctime ,1 , ‘自定义默认值’) over(partition by name order by ctime)
from
tb_orders ;
±----------------±-----------------±-----------------±--------------+
| tb_orders.name | tb_orders.ctime | tb_orders.money | lag_window_0 |
±----------------±-----------------±-----------------±--------------+
| jack | 2020-01-01 | 10.0 | 自定义默认值 |
| jack | 2020-01-05 | 46.0 | 2020-01-01 |
| jack | 2020-01-08 | 55.0 | 2020-01-05 |
| jack | 2020-02-03 | 23.0 | 2020-01-08 |
| jack | 2020-04-06 | 42.0 | 2020-02-03 |
| mart | 2020-04-08 | 62.0 | 自定义默认值 |
| mart | 2020-04-09 | 68.0 | 2020-04-08 |
| mart | 2020-04-11 | 75.0 | 2020-04-09 |
| mart | 2020-04-13 | 94.0 | 2020-04-11 |
| neil | 2020-05-10 | 12.0 | 自定义默认值 |
| neil | 2020-06-12 | 80.0 | 2020-05-10 |
| tony | 2020-01-02 | 15.0 | 自定义默认值 |
| tony | 2020-01-04 | 29.0 | 2020-01-02 |
| tony | 2020-01-07 | 50.0 | 2020-01-04 |
±----------------±-----------------±-----------------±--------------+
lead 向后n行的数据 lead(字段,偏移量,默认值(可省略))没值的补NULL
select
*,
lead(ctime,1) over(partition by name order by ctime)
from
tb_orders;
±----------------±-----------------±-----------------±---------------+
| tb_orders.name | tb_orders.ctime | tb_orders.money | lead_window_0 |
±----------------±-----------------±-----------------±---------------+
| jack | 2020-01-01 | 10.0 | 2020-01-05 |
| jack | 2020-01-05 | 46.0 | 2020-01-08 |
| jack | 2020-01-08 | 55.0 | 2020-02-03 |
| jack | 2020-02-03 | 23.0 | 2020-04-06 |
| jack | 2020-04-06 | 42.0 | NULL |
| mart | 2020-04-08 | 62.0 | 2020-04-09 |
| mart | 2020-04-09 | 68.0 | 2020-04-11 |
| mart | 2020-04-11 | 75.0 | 2020-04-13 |
| mart | 2020-04-13 | 94.0 | NULL |
| neil | 2020-05-10 | 12.0 | 2020-06-12 |
| neil | 2020-06-12 | 80.0 | NULL |
| tony | 2020-01-02 | 15.0 | 2020-01-04 |
| tony | 2020-01-04 | 29.0 | 2020-01-07 |
| tony | 2020-01-07 | 50.0 | NULL |
±----------------±-----------------±-----------------±---------------+
select
*,
lead(ctime,1,‘向后自定义默认值’) over(partition by name order by ctime)
from
tb_orders;
±----------------±-----------------±-----------------±---------------+
| tb_orders.name | tb_orders.ctime | tb_orders.money | lead_window_0 |
±----------------±-----------------±-----------------±---------------+
| jack | 2020-01-01 | 10.0 | 2020-01-05 |
| jack | 2020-01-05 | 46.0 | 2020-01-08 |
| jack | 2020-01-08 | 55.0 | 2020-02-03 |
| jack | 2020-02-03 | 23.0 | 2020-04-06 |
| jack | 2020-04-06 | 42.0 | 向后自定义默认值 |
| mart | 2020-04-08 | 62.0 | 2020-04-09 |
| mart | 2020-04-09 | 68.0 | 2020-04-11 |
| mart | 2020-04-11 | 75.0 | 2020-04-13 |
| mart | 2020-04-13 | 94.0 | 向后自定义默认值 |
| neil | 2020-05-10 | 12.0 | 2020-06-12 |
| neil | 2020-06-12 | 80.0 | 向后自定义默认值 |
| tony | 2020-01-02 | 15.0 | 2020-01-04 |
| tony | 2020-01-04 | 29.0 | 2020-01-07 |
| tony | 2020-01-07 | 50.0 | 向后自定义默认值 |
±----------------±-----------------±-----------------±---------------+
(1) 查询前20%时间的订单信息
NTILE(n) 数据分成n分,可实现按百分比取出数据
例:取出20%,也就是总数的五分之一
select
name ,
ctime ,
money
from
(select
- ,
ntile(5) over(order by ctime) as num
from
tb_orders)t
where num =1 ;
±------±------------±-------+
| name | ctime | money |
±------±------------±-------+
| jack | 2020-01-01 | 10.0 |
| tony | 2020-01-02 | 15.0 |
| tony | 2020-01-04 | 29.0 |
±------±------------±-------+
(2)查询在2020年4月份购买过的顾客及总人数
select
*
from
tb_orders where substring(ctime , 0 , 7) = ‘2020-04’ ;
±----------------±-----------------±-----------------+
| tb_orders.name | tb_orders.ctime | tb_orders.money |
±----------------±-----------------±-----------------+
| jack | 2020-04-06 | 42.0 |
| mart | 2020-04-08 | 62.0 |
| mart | 2020-04-09 | 68.0 |
| mart | 2020-04-11 | 75.0 |
| mart | 2020-04-13 | 94.0 |
±----------------±-----------------±-----------------+
select
name,
count(1) over()
from
(
select
distinct name
from
tb_orders where substring(ctime , 0 , 7) = ‘2020-04’
)t ;
(3)查询顾客的购买明细及月购买总额
select
- ,
sum(money) over(partition by name , substring(ctime ,1 ,7))
from
tb_orders ;
±----------------±-----------------±-----------------±--------------+
| tb_orders.name | tb_orders.ctime | tb_orders.money | sum_window_0 |
±----------------±-----------------±-----------------±--------------+
| jack | 2020-01-05 | 46.0 | 111.0 |
| jack | 2020-01-08 | 55.0 | 111.0 |
| jack | 2020-01-01 | 10.0 | 111.0 |
| jack | 2020-02-03 | 23.0 | 23.0 |
| jack | 2020-04-06 | 42.0 | 42.0 |
| mart | 2020-04-13 | 94.0 | 299.0 |
| mart | 2020-04-11 | 75.0 | 299.0 |
| mart | 2020-04-09 | 68.0 | 299.0 |
| mart | 2020-04-08 | 62.0 | 299.0 |
| neil | 2020-05-10 | 12.0 | 12.0 |
| neil | 2020-06-12 | 80.0 | 80.0 |
| tony | 2020-01-04 | 29.0 | 94.0 |
| tony | 2020-01-02 | 15.0 | 94.0 |
| tony | 2020-01-07 | 50.0 | 94.0 |
±----------------±-----------------±-----------------±--------------+
month(ctime)month函数(对日期格式有要求):
±---------------------------------------------------+
| tab_name |
±---------------------------------------------------+
| month(param) - Returns the month component of the date/timestamp/interval |
要求格式为:日期,时间戳,定时器类型
select
month(ctime)
from
tb_orders;
±-----+
| _c0 |
±-----+
| 1 |
| 1 |
| 2 |
| 1 |
| 1 |
| 4 |
| 1 |
| 1 |
| 4 |
| 4 |
| 5 |
| 4 |
| 6 |
| 4 |
±-----+
select
- ,
sum(money) over(partition by name , month(ctime))
from
tb_orders ;
±----------------±-----------------±-----------------±--------------+
| tb_orders.name | tb_orders.ctime | tb_orders.money | sum_window_0 |
±----------------±-----------------±-----------------±--------------+
| jack | 2020-01-05 | 46.0 | 111.0 |
| jack | 2020-01-08 | 55.0 | 111.0 |
| jack | 2020-01-01 | 10.0 | 111.0 |
| jack | 2020-02-03 | 23.0 | 23.0 |
| jack | 2020-04-06 | 42.0 | 42.0 |
| mart | 2020-04-13 | 94.0 | 299.0 |
| mart | 2020-04-11 | 75.0 | 299.0 |
| mart | 2020-04-09 | 68.0 | 299.0 |
| mart | 2020-04-08 | 62.0 | 299.0 |
| neil | 2020-05-10 | 12.0 | 12.0 |
| neil | 2020-06-12 | 80.0 | 80.0 |
| tony | 2020-01-04 | 29.0 | 94.0 |
| tony | 2020-01-02 | 15.0 | 94.0 |
| tony | 2020-01-07 | 50.0 | 94.0 |
±----------------±-----------------±-----------------±--------------+