Hive/Spark/Presto/标准SQL实现行转列和列转行

本文介绍如何使用标准SQL、Presto和Hive进行表格数据的行转列和列转行操作,包括具体SQL语句及Presto的UNNEST功能详解。

     假设有以下两个表格,分别为vtable和htable,期望使用SQL实现相互转化,本文将展示如何分别使用标准SQL、Presto、和Hive实现。

htable
uidkeyvalue
101c111
101c212
101c313
102c121
102c222
102c323
vtable
uidc1c2c3
101111213
102212223

一、标准SQL实现

1.行转列(htable => vtable)

SELECT uid,
   max(CASE WHEN key = 'c1' THEN value END) AS c1,
   max(CASE WHEN key = 'c2' THEN value END) AS c2,
   max(CASE WHEN key = 'c3' THEN value END) AS c3
FROM vtable
GROUP BY uid

 2.列转行(vtable => htable)

SELECT uid, 'c1' AS key, c1 AS value FROM htable
UNION ALL
SELECT uid, 'c2' AS key, c2 AS value FROM htable
UNION ALL
SELECT uid, 'c3' AS key, c3 AS value FROM htable

二、Hive/Spark实现

1. 行转列(htable => vtable)

SELECT
  uid,
  kv['c1'] AS c1,
  kv['c2'] AS c2,
  kv['c3'] AS c3
FROM (
  ----一般情况下是用',' ':'分隔符生成{k1:v1,k2:v2}样式的map
  ----这里使用\002,\003作为分割符可以防止key value中的 , :影响map的正确分隔
  SELECT uid,
    str_to_map(concat_ws('\002', collect_list(concat(key, '\003', value))), '\002', '\003') kv
  FROM vtable
  GROUP BY uid
) t

2. 列转行(vtable => htable)

SELECT t1.uid, t2.key, t2.value
FROM htable t1
LATERAL VIEW explode (map(
  'c1', c1,
  'c2', c2,
  'c3', c3
)) t2 as key, value

三、Presto实现

 1. 行转列(htable => vtable)

SELECT
  uid,
  kv['c1'] AS c1,
  kv['c2'] AS c2,
  kv['c3'] AS c3
FROM (
  SELECT uid, map_agg(key, value) kv
  FROM vtable
  GROUP BY uid
) t

2. 列转行(vtable => htable)

SELECT t1.uid, t2.key, t2.value
FROM htable t1
CROSS JOIN unnest (
  array['c1', 'c2', 'c3'],
  array[c1, c2, c3]
) t2 (key, value)

关于PRESTO的UNNEST

UNNEST 用于将 ARRAY 或 MAP 扩展到关系表中。ARRAY中得元素会被放到一个单独的列中,而MAP中的元素会被放到两个列中(key, value)。 UNNEST可以选择性的添加WITH ORDINALITY语句,这样可以在关系表的后面增加一个序号列。

1. 单个数组列:

SELECT student, score
FROM (
  VALUES
    ('LiMing', ARRAY[20, 50]),
    ('WangGang', ARRAY[70,80,90])
) AS x (student, scores)
CROSS JOIN UNNEST(scores) AS t (score);
  student  | score
-----------+---
 LiMing    | 20 
 LiMing    | 50 
 WangGang  | 70 
 WangGang  | 80 
 WangGang  | 90
(5 rows)

2. 多个数组列:

SELECT numbers, animals, n, a
FROM (
  VALUES
    (ARRAY[2, 5], ARRAY['dog', 'cat', 'bird']),
    (ARRAY[7, 8, 9], ARRAY['cow', 'pig'])
) AS x (numbers, animals)
CROSS JOIN UNNEST(numbers, animals) AS t (n, a);
  numbers  |     animals      |  n   |  a
-----------+------------------+------+------
 [2, 5]    | [dog, cat, bird] |    2 | dog
 [2, 5]    | [dog, cat, bird] |    5 | cat
 [2, 5]    | [dog, cat, bird] | NULL | bird
 [7, 8, 9] | [cow, pig]       |    7 | cow
 [7, 8, 9] | [cow, pig]       |    8 | pig
 [7, 8, 9] | [cow, pig]       |    9 | NULL
(6 rows)

3. WITH ORDINALITY 语句:

SELECT numbers, n, a
FROM (
  VALUES
    (ARRAY[2, 5]),
    (ARRAY[7, 8, 9])
) AS x (numbers)
CROSS JOIN UNNEST(numbers) WITH ORDINALITY AS t (n, a);
  numbers  | n | a
-----------+---+---
 [2, 5]    | 2 | 1
 [2, 5]    | 5 | 2
 [7, 8, 9] | 7 | 1
 [7, 8, 9] | 8 | 2
 [7, 8, 9] | 9 | 3
(5 rows)

4. Map+ARRAY:

SELECT
    animals, a, n
FROM (
    VALUES
        (MAP(ARRAY['dog', 'cat', 'bird'], ARRAY[1, 2, 0])),
        (MAP(ARRAY['dog', 'cat'], ARRAY[4, 5]))
) AS x (animals)
CROSS JOIN UNNEST(animals) AS t (a, n);
           animals          |  a   | n
----------------------------+------+---
 {"cat":2,"bird":0,"dog":1} | dog  | 1
 {"cat":2,"bird":0,"dog":1} | cat  | 2
 {"cat":2,"bird":0,"dog":1} | bird | 0
 {"cat":5,"dog":4}          | dog  | 4
 {"cat":5,"dog":4}          | cat  | 5
(5 rows)

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值