select
t2.j
from json_test t1
lateral view explode(split(regexp_replace(regexp_replace(json_str,'\\[|\\]',''),'\\},\\{','\\};\\{'),';')) t2 as j;
2.3.4、使用json_tuple解析json字符串
select
json_tuple(t3.j, 'website','name')
from (
select
t2.j
from json_test t1
lateral view explode(split(regexp_replace(regexp_replace(json_str,'\\[|\\]',''),'\\},\\{','\\};\\{'),';')) t2 as j) t3
3、开窗函数
3.1、开窗排序
3.1.1、row_number() over(partition by 分区字段 order by 排序字段)
在指定分区内,按照指定字段进行排序,分区内排序不重复(1,2,3,4...)
eg:
SELECT
identify,
age,
row_number() over(PARTITION BY identify ORDER BY age) rn
FROM person;
3.1.2、rank() over(partition by 分区字段 order by 排序字段)
在指定分区内,按照指定字段进行排序,分区内排序会重复,且会有跳行(1,2,2,4,5,5,5,8...)
eg:
SELECT
identify,
age,
rank() over(PARTITION BY identify ORDER BY age) rn
FROM person;
3.1.3、dense_rank() over(partition by 分区字段 order by 排序字段)
在指定分区内,按照指定字段进行排序,分区内排序会重复,且不会有跳行(1,2,2,3,4,5,5,5,6...)
eg:
SELECT
identify,
age,
dense_rank() over(PARTITION BY identify ORDER BY age) rn
FROM person;
3.2、窗口内向前、向后取值
3.2.1、rows between...and...
unbounded:无界限
preceding:向前
following:向后
current row:当前行
unbounded preceding:分区内首行
unbounded following:分区内尾行
eg:
over(partition by 分区字段 order by 排序字段 rows between unbounded preceding and current row)
解释:从分区内首行到当前行
eg:
over(partition by 分区字段 order by 排序字段 rows between 1 preceding and current row)
解释:在分区内,从当前行的前一行到当前行
eg:
over(partition by 分区字段 order by 排序字段 rows between current row and 2 following)
解释:在分区内,从当前行到当前行的后2行
3.2.2、lag(取值字段,向前取第几行,默认填充值):向前取值
eg:
SELECT
identify,
age,
lag(age,1,0) over(PARTITION BY identify ORDER BY age ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) pre_age,
DENSE_RANK() over(PARTITION BY identify ORDER BY age ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) rn
FROM person;
解释:按照身份进行分区,年龄进行排序,在分区内的第一行至当前行的范围内,取前一行的age值,如果前一行没有值,则使用默认值0进行填充
3.2.3、lead(取值字段,向后取第几行,默认填充值):向后取值
eg:
SELECT
identify,
age,
lead(age,1,0) over(PARTITION BY identify ORDER BY age ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING) pre_age,
DENSE_RANK() over(PARTITION BY identify ORDER BY age ROWS BETWEEN CURRENT ROW AND 3 FOLLOWING) rn
FROM person;
解释:按照身份进行分区,年龄进行排序,在分区内的当前行至后三行的范围内,取后一行的age值,如果后一行没有值,则使用默认值0进行填充
3.2.4、max()取值
eg:
SELECT
identify,
age,
max(age) over(PARTITION BY identify ORDER BY age ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) max_age,
DENSE_RANK() over(PARTITION BY identify ORDER BY age ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING) rn
FROM person;