35. Workarounds for the missing array_join and array_sort functions in Spark versions below 2.4

元元的李树

Published 2019-12-24 10:04:52

Licensed under CC 4.0 BY-SA

Tags: Spark

Article link: https://blog.youkuaiyun.com/qq0719/article/details/103678103

First, you need to know how the array_join and array_sort functions are used. For details, see the following page:

https://www.iteblog.com/archives/2459.html
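
As a quick illustration of the two functions, here is a minimal sketch for Spark 2.4 or later. The literal array values are made up for the example and do not come from the original tables.

```sql
-- array_sort: sorts the array elements in ascending order
SELECT array_sort(array('b', 'c', 'a'));                   -- ["a","b","c"]

-- array_join: concatenates the array elements with the given delimiter
SELECT array_join(array_sort(array('b', 'c', 'a')), '|');  -- a|b|c
```

Combined with collect_set, this produces a deterministic, delimiter-separated string for each group, which is what makes the result suitable as input to md5.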

The demo code for Spark 2.4 is given below:

select 
  row_number()  OVER (PARTITION BY 1 ORDER BY 1) id,
  md5(array_join(array_sort(collect_set(f.holder_id)),'|')) association_id,
  current_timestamp() date_modified,
  first(f.date_id) date_id,
  array_join(array_sort(collect_set(f.holder_id)),'|') horder_ids_string,
  size(collect_set(f.holder_id)) holder_count,
  first(h.type) holder_type,
  first(h.type_name) holder_type_name,
  first(f.date_id) dt           
from XXXXX f, XXXX h
where f.holde
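
For Spark versions below 2.4, one commonly used substitute is to replace array_sort with sort_array and array_join with concat_ws, both of which are available from Spark 1.5. The sketch below is an assumption about how the workaround could look, not necessarily the author's exact solution. The inline VALUES table and its sample values are hypothetical stand-ins for the real source table, and the inline table syntax itself assumes Spark 2.0 or later.

```sql
-- Hypothetical sample data replaces the real table f.
SELECT
  -- sort_array + concat_ws stand in for array_sort + array_join
  md5(concat_ws('|', sort_array(collect_set(f.holder_id)))) AS association_id,
  concat_ws('|', sort_array(collect_set(f.holder_id)))      AS holder_ids_string,
  size(collect_set(f.holder_id))                            AS holder_count
FROM VALUES ('h3'), ('h1'), ('h2'), ('h1') AS f(holder_id);
-- holder_ids_string: h1|h2|h3, holder_count: 3
```

concat_ws expects strings or arrays of strings, so if holder_id is not a string column, an explicit cast such as cast(sort_array(collect_set(f.holder_id)) as array<string>) may be needed. sort_array also orders NULL elements differently from array_sort, but collect_set skips NULL values, so that difference should not affect this query.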
