collect_list:收集数据,返回是一个集合,集合中元素不去重
collect_set:收集数据,返回是一个集合,集合中元素去重
结合concat_ws函数返回行转列的结果
案例1 collect_list:
spark-sql> with test1 as (
> select '张三丰' as user_name,'向往的生活' as video_name,'2021-07-28' as dateline
> union all
> select '张三丰' as user_name,'极限挑战' as video_name,'2021-07-28' as dateline
> union all
> select '张三丰' as user_name,'今夜百乐门' as video_name,'2021-07-28' as dateline
> union all
> select '张三丰' as user_name,'向往的生活' as video_name,'2021-07-28' as dateline
> union all
> select '张三丰' as user_name,'极限挑战' as video_name,'2021-07-28' as dateline
> union all
> select '张无忌' as user_name,'王牌对王牌' as video_name,'2021-07-28' as dateline
> union all
> select '张无忌' as user_name,'觉醒年代' as video_name,'2021-07-28' as dateline
> union all
> select '张无忌' as user_name,'欢乐喜剧人' as video_name,'2021-07-28' as dateline
> union all
> select '张无忌' as user_name,'向往的生活' as video_name,'2021-07-28' as dateline)
>
> select
> user_name,
> dateline,
> concat_ws(',',collect_list(video_name)) as video_list
> from test1
> group by user_name,dateline;
张无忌 2021-07-28 王牌对王牌,向往的生活,觉醒年代,欢乐喜剧人
张三丰 2021-07-28 今夜百乐门,极限挑战,向往的生活,极限挑战,向往的生活
Time taken: 17.38 seconds, Fetched 2 row(s)
案例2 collect_set:
spark-sql> with test1 as (
> select '张三丰' as user_name,'向往的生活' as video_name,'2021-07-28' as dateline
> union all
> select '张三丰' as user_name,'极限挑战' as video_name,'2021-07-28' as dateline
> union all
> select '张三丰' as user_name,'今夜百乐门' as video_name,'2021-07-28' as dateline
> union all
> select '张三丰' as user_name,'向往的生活' as video_name,'2021-07-28' as dateline
> union all
> select '张三丰' as user_name,'极限挑战' as video_name,'2021-07-28' as dateline
> union all
> select '张无忌' as user_name,'王牌对王牌' as video_name,'2021-07-28' as dateline
> union all
> select '张无忌' as user_name,'觉醒年代' as video_name,'2021-07-28' as dateline
> union all
> select '张无忌' as user_name,'欢乐喜剧人' as video_name,'2021-07-28' as dateline
> union all
> select '张无忌' as user_name,'向往的生活' as video_name,'2021-07-28' as dateline)
>
> select
> user_name,
> dateline,
> concat_ws(',',collect_set(video_name)) as video_list
> from test1
> group by user_name,dateline;
张三丰 2021-07-28 今夜百乐门,极限挑战,向往的生活
张无忌 2021-07-28 欢乐喜剧人,向往的生活,王牌对王牌,觉醒年代