Hive行转列与列转行操作详解

Hive 中的行转列和列转行操作

在Hive中,行转列和列转行是常见的数据转换操作,用于满足不同的数据分析需求。下面我将详细介绍这两种操作及其实现方式。

一、行转列(行转多列)

行转列是指将多行数据按照某个键值聚合,转换为一行中的多列。

常用方法:

  1. 使用 collect_listcollect_set 配合 case when

    SELECT 
      user_id,
      MAX(CASE WHEN attribute = 'age' THEN value END) AS age,
      MAX(CASE WHEN attribute = 'gender' THEN value END) AS gender,
      MAX(CASE WHEN attribute = 'city' THEN value END) AS city
    FROM user_attributes
    GROUP BY user_id;
    
  2. 使用 map 函数

    SELECT 
      user_id,
      map('age', age, 'gender', gender, 'city', city) AS user_info
    FROM users;
    
  3. 使用 str_to_map 函数(适用于字符串转换)

    SELECT 
      user_id,
      str_to_map(concat_ws(',', 
        concat_ws(':', 'age', age),
        concat_ws(':', 'gender', gender),
        concat_ws(':', 'city', city)
      )) AS user_info
    FROM users;
    

二、列转行(多列转行)

列转行是指将一行中的多列数据转换为多行数据。

常用方法:

  1. 使用 LATERAL VIEW explode

    SELECT 
      user_id,
      attr_info.key AS attribute,
      attr_info.value AS value
    FROM users
    LATERAL VIEW explode(map(
      'age', age,
      'gender', gender,
      'city', city
    )) attr_info AS key, value;
    
  2. 使用 UNION ALL

    SELECT user_id, 'age' AS attribute, age AS value FROM users
    UNION ALL
    SELECT user_id, 'gender' AS attribute, gender AS value FROM users
    UNION ALL
    SELECT user_id, 'city' AS attribute, city AS value FROM users;
    
  3. 使用 stack 函数

    SELECT 
      user_id,
      stacked_data.attribute,
      stacked_data.value
    FROM users
    LATERAL VIEW stack(3,
      'age', age,
      'gender', gender,
      'city', city
    ) stacked_data AS attribute, value;
    

三、特殊场景处理

1. JSON格式数据转换

如果数据是JSON格式,可以使用get_json_object或JSON函数:

-- 行转列
SELECT 
  user_id,
  get_json_object(user_json, '$.age') AS age,
  get_json_object(user_json, '$.gender') AS gender
FROM users_with_json;

-- 列转行
SELECT 
  user_id,
  attr.key AS attribute,
  attr.value AS value
FROM users_with_json
LATERAL VIEW json_tuple(user_json, 'age', 'gender', 'city') jt AS age, gender, city
LATERAL VIEW explode(map(
  'age', jt.age,
  'gender', jt.gender,
  'city', jt.city
)) attr AS key, value;

2. 数组数据转换

-- 数组列转行
SELECT 
  user_id,
  exploded_tag AS tag
FROM users_with_tags
LATERAL VIEW explode(tags) t AS exploded_tag;

-- 多行转数组(行转列)
SELECT 
  user_id,
  collect_list(tag) AS tags
FROM user_tags
GROUP BY user_id;

四、性能考虑

  1. 行转列通常比列转行性能更好,因为减少了数据量
  2. 对于大数据集,collect_listcollect_set更快,但后者去重
  3. LATERAL VIEW explode会产生多行数据,可能导致数据膨胀

五、实际应用示例

假设有一个订单表,需要统计每个用户的各类商品购买数量:

-- 原始表结构:user_id, product_type, quantity
-- 行转列:每个用户一行,各商品类型作为列
SELECT 
  user_id,
  SUM(CASE WHEN product_type = 'book' THEN quantity ELSE 0 END) AS book_qty,
  SUM(CASE WHEN product_type = 'food' THEN quantity ELSE 0 END) AS food_qty,
  SUM(CASE WHEN product_type = 'clothes' THEN quantity ELSE 0 END) AS clothes_qty
FROM orders
GROUP BY user_id;

-- 如果需要将上面的结果再转回多行形式
SELECT 
  user_id,
  stacked.product_type,
  stacked.quantity
FROM (
  SELECT 
    user_id,
    SUM(CASE WHEN product_type = 'book' THEN quantity ELSE 0 END) AS book_qty,
    SUM(CASE WHEN product_type = 'food' THEN quantity ELSE 0 END) AS food_qty,
    SUM(CASE WHEN product_type = 'clothes' THEN quantity ELSE 0 END) AS clothes_qty
  FROM orders
  GROUP BY user_id
) t
LATERAL VIEW stack(3,
  'book', book_qty,
  'food', food_qty,
  'clothes', clothes_qty
) stacked AS product_type, quantity;

通过灵活运用这些技巧,可以满足Hive中各种行列转换的需求。

评论
成就一亿技术人!
拼手气红包6.0元
还能输入1000个字符
 
红包 添加红包
表情包 插入表情
 条评论被折叠 查看
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值