Querying Array[Struct] Columns in spark.sql

License: CC 4.0 BY-SA

Tags: #spark #lateral view #StructType

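This article shows how the SQL LATERAL VIEW clause simplifies filtering nested data by a given key in big-data settings. Using examples, it explains the syntax and usage of LATERAL VIEW, including single and multiple LATERAL VIEW clauses, and how to access struct-typed data directly. The approach expands and queries the data in a single statement, where the conventional approach needs several steps.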

Background

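We want to query data in which each row has an id and a covers column, where covers is an array of structs with the fields key and type, and keep only the entries whose key matches a given value.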
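A hypothetical data set with this shape can be sketched as a temporary view. The view name tablename and the column and field names (id, covers, key, type) are taken from the query at the end of this article; the values themselves are invented for illustration.

```sql
-- Hypothetical sample data: covers is array<struct<key: string, type: string>>.
CREATE OR REPLACE TEMPORARY VIEW tablename AS
SELECT * FROM VALUES
  (1, array(named_struct('key', 'special', 'type', 'image'),
            named_struct('key', 'normal',  'type', 'video'))),
  (2, array(named_struct('key', 'normal',  'type', 'image')))
AS t(id, covers);
```

With this sample, only the first element of the row with id = 1 has key = 'special'.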

Solution

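The usual approach is to apply explode() to the covers array, producing a temporary table with one struct element of covers per row, and then expand the fields of that struct into separate columns. The result is a table with the columns id, key, and type, which can then be filtered by key. This takes three steps, which is cumbersome.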
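For comparison, the multi-step approach can be sketched as a chain of CTEs. This is only an illustration: tablename, id, covers, key, and type come from the article's own query, while the CTE names exploded and flattened and the filter value are assumptions made here.

```sql
-- Sketch of the multi-step approach (CTE names are hypothetical).
WITH exploded AS (
  -- Step 1: one output row per element of the covers array
  SELECT id, explode(covers) AS cover
  FROM tablename
),
flattened AS (
  -- Step 2: expand the struct fields into ordinary columns
  SELECT id, cover.key AS k, cover.type AS t
  FROM exploded
)
-- Step 3: filter on the key
SELECT id, k, t
FROM flattened
WHERE k = 'special';
```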

With LATERAL VIEW explode(covers) adTable AS cover, the same result can be obtained in a single statement.

Introduction to LATERAL VIEW

Syntax:

LATERAL VIEW [ OUTER ] generator_function ( expression [ , ... ] ) [ table_alias ] AS column_alias [ , ... ]
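The optional OUTER keyword matters when the input array is NULL or empty: without it the source row disappears from the result, and with it the row is kept with NULL in the generated column. A minimal sketch, assuming a made-up view named outerDemo:

```sql
-- Hypothetical sample: one page with an array, one with a NULL array.
CREATE OR REPLACE TEMPORARY VIEW outerDemo AS
SELECT * FROM VALUES
  ('front_page', array(1, 2)),
  ('empty_page', CAST(NULL AS ARRAY<INT>))
AS t(pageid, col1);

-- Without OUTER: empty_page produces no rows (2 rows in total).
SELECT pageid, col1_new
FROM outerDemo LATERAL VIEW explode(col1) adTable AS col1_new;

-- With OUTER: empty_page is kept with col1_new = NULL (3 rows in total).
SELECT pageid, col1_new
FROM outerDemo LATERAL VIEW OUTER explode(col1) adTable AS col1_new;
```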

Suppose we already have the following table, pageAds:

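+--------------+-----------+-----------------+
| pageid       | col1      | col2            |
+--------------+-----------+-----------------+
| front_page   | [1, 2, 3] | ["a", "b", "c"] |
| contact_page | [3, 4, 5] | ["d", "e", "f"] |
+--------------+-----------+-----------------+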
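To try the queries in this section, the table can be recreated as a temporary view. This is a sketch under the assumption that col1 holds integers and col2 holds strings; the view name pageAds matches the queries in this article.

```sql
-- Recreate the sample table as a temporary view.
CREATE OR REPLACE TEMPORARY VIEW pageAds AS
SELECT * FROM VALUES
  ('front_page',   array(1, 2, 3), array('a', 'b', 'c')),
  ('contact_page', array(3, 4, 5), array('d', 'e', 'f'))
AS t(pageid, col1, col2);
```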

A single LATERAL VIEW clause

select pageid, col1_new, col2 from pageAds lateral view explode(col1) adTable as col1_new;

+--------------+------------+---------------+
| pageid       | col1_new   | col2          |
+--------------+------------+---------------+
| front_page   | 1          | ["a","b","c"] |
| front_page   | 2          | ["a","b","c"] |
| front_page   | 3          | ["a","b","c"] |
| contact_page | 3          | ["d","e","f"] |
| contact_page | 4          | ["d","e","f"] |
| contact_page | 5          | ["d","e","f"] |
+--------------+------------+---------------+

Explode col1 and count the occurrences of each value:

select col1_new, count(1) as count from pageAds lateral view explode(col1) adTable as col1_new group by col1_new;

+------------+------------+
| col1_new   | count      |
+------------+------------+
| 1          | 1          |
| 2          | 1          |
| 3          | 2          |
| 4          | 1          |
| 5          | 1          |
+------------+------------+

Multiple LATERAL VIEW clauses

select pageid,mycol1, mycol2 from pageAds 
    lateral view explode(col1) myTable1 as mycol1 
    lateral view explode(col2) myTable2 as mycol2;
    
+--------------+----------+----------+
| pageid       | mycol1   | mycol2   |
+--------------+----------+----------+
| front_page   | 1        | a        |
| front_page   | 1        | b        |
| front_page   | 1        | c        |
| front_page   | 2        | a        |
| front_page   | 2        | b        |
| front_page   | 2        | c        |
| front_page   | 3        | a        |
| front_page   | 3        | b        |
| front_page   | 3        | c        |
| contact_page | 3        | d        |
| contact_page | 3        | e        |
| contact_page | 3        | f        |
| contact_page | 4        | d        |
| contact_page | 4        | e        |
| contact_page | 4        | f        |
| contact_page | 5        | d        |
| contact_page | 5        | e        |
| contact_page | 5        | f        |
+--------------+----------+----------+

For struct types, fields can be accessed directly with ".":

select id,cover.key as k, cover.type as t from tablename lateral view explode(covers) myTable1 as cover where cover.key = 'special'
