Converting Hive query results to JSON

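This article explains how to handle JSON data in Hive, covering both parsing table data that contains JSON fields and converting Hive query results to JSON. For question 1 it presents a strategy for parsing JSON-formatted fields; for question 2 it details how to store the output of Hive ETL as JSON.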

Problem statement:
1. When a Hive table column holds JSON, how do we parse and process the data?
2. How do we store ETL-processed data in Hive as JSON?

1. Parsing a Hive table column that holds JSON data

Dimensions app_name, container, platform and get_json_object(biz,'$.desc') ===> metric pv

-- Conventional calculation with fixed parameter values:
SELECT
 count(distinct distinct_id) UV,   -- 2
 count(distinct_id) PV      -- 61
from ods.t_user_behavior 
WHERE get_json_object(biz,'$.desc') = '点击-新运营位'
and platform = "miniProgram"
and container = "WX_MiniApp"
and app_name = "xxxxx小程序"
and get_json_object(biz,'$.action') is not null;

--------------------
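-- Self-service calculation:
-- Given calculation dimensions:
-- 1. Standard dimensions: app_name,container,platform,
-- 2. Business dimension: get_json_object(biz,'$.desc')
-- 3. Metric: pv
-- Step 1: compute every combination of the chosen dimensions (T+1):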
SELECT
app_name,
platform,
container,
get_json_object(biz,'$.desc') desc,
count(distinct_id) pv
from ods.t_user_behavior 
WHERE get_json_object(biz,'$.action') is not null
group by app_name,container,platform,get_json_object(biz,'$.desc');

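-- Step 2: store the aggregated data in the self-service analysis table:
-- http://wiki.mwbyd.cn/display/MwBigData/user_action_self_help_d_di
-- (the SELECT that feeds this insert is written out in section 2)
insert overwrite table dm.user_action_self_help_d_di partition(dt)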

-- Step 3: self-service display for the user (chart the data for the chosen parameter values):
SELECT
t.*
from dm.user_action_self_help_d_di t
WHERE desc = '点击-新运营位'
and platform = "miniProgram"
and container = "WX_MiniApp"
and app_name = "xxxxx小程序"
and desc is not null;

--====================================
-- How the query works:
with a as
(
SELECT
app_name,
platform,
container,
get_json_object(biz,'$.desc') desc,
count(distinct_id) pv
from ods.t_user_behavior 
WHERE get_json_object(biz,'$.action') is not null
group by app_name,container,platform,get_json_object(biz,'$.desc')
)

SELECT
a.*
from a 
WHERE desc = '点击-新运营位'
and platform = "miniProgram"
and container = "WX_MiniApp"
and app_name = "xxxxx小程序"
and desc is not null;
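
Each get_json_object call in the queries above parses the biz string separately. When several top-level keys are needed from the same column, Hive's built-in json_tuple UDTF can pull them out in one pass through LATERAL VIEW. The following is a minimal sketch of step 1 rewritten that way, assuming desc and action are top-level keys of biz; the aliases j, biz_desc and biz_action are illustrative names, not part of the original tables.

```sql
-- Sketch: same aggregation as step 1, extracting both keys with one json_tuple call.
-- j, biz_desc and biz_action are illustrative aliases (assumptions).
SELECT
  t.app_name,
  t.platform,
  t.container,
  j.biz_desc,
  count(t.distinct_id) AS pv
FROM ods.t_user_behavior t
LATERAL VIEW json_tuple(t.biz, 'desc', 'action') j AS biz_desc, biz_action
WHERE j.biz_action IS NOT NULL
GROUP BY t.app_name, t.platform, t.container, j.biz_desc;
```

json_tuple only accepts top-level key names (no '$.' path syntax), so nested paths still need get_json_object.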

2. Ways to convert Hive query results to JSON:

-->>>>>>>>>>>>>>>>>>>>>>[Method 1]: build the JSON string with concat
concat('{\"date\":\"','$date','\",\"actions\":[',concat('{\"itemId\":\"',itemId,
'\",\"actionTime\":',actionTime,
',\"action\":\"',action,
'\",\"sceneId\":\"',sceneId,
'\",\"userId\":\"',userId,
'\",\"blogId\":\"',nvl(blogId, '-'),
'\",\"uuid_tt_dd\":\"',uuid_tt_dd,
'\"}'),']}') as value

concat('{\"itemId\":\"',itemId,
'\",\"actionTime\":',actionTime,
',\"action\":\"',action,
'\",\"sceneId\":\"',sceneId,
'\",\"userId\":\"',userId,
'\",\"blogId\":\"',nvl(blogId, '-'),
'\",\"uuid_tt_dd\":\"',uuid_tt_dd,
'\"}')
------------
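-- Note: partition(dt) is dynamic here, so Hive takes the dt value from the last column of the SELECT.
insert overwrite table dm.user_action_self_help_d_di partition(dt)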
SELECT
"xxx",
"xxx",
"xxx",
concat('{\"app_name\":\"',app_name,
'\",\"platform\":\"',platform,
'\",\"container\":\"',container,
'\"}') commen,
concat('{\"desc\":\"',get_json_object(biz,'$.desc'),
'\"}') biz,
concat('{\"pv\":\"',count(distinct_id),
'\"}') matric,
from_unixtime(unix_timestamp(current_timestamp), 'yyyy-MM-dd HH:mm:ss') as crt_time
from ods.t_user_behavior 
WHERE get_json_object(biz,'$.desc') is not null
group by app_name,container,platform,get_json_object(biz,'$.desc');
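
Two things can silently break JSON built with concat: concat returns NULL as soon as any argument is NULL (which is why blogId is wrapped in nvl above), and a value containing a double quote or a backslash produces invalid JSON. Below is a hedged sketch of one way to guard the desc field using only built-in functions. The names esc_desc and biz_json are illustrative, and the escaping covers only backslashes and double quotes, not control characters.

```sql
-- Sketch only: esc_desc and biz_json are illustrative names (assumptions).
SELECT
  concat('{\"desc\":\"', s.esc_desc, '\"}') AS biz_json
FROM (
  SELECT
    regexp_replace(
      -- first double every backslash; coalesce keeps concat from returning NULL
      regexp_replace(coalesce(get_json_object(biz, '$.desc'), ''), '\\\\', '\\\\\\\\'),
      -- then put a backslash in front of every double quote
      '\"', '\\\\\"'
    ) AS esc_desc
  FROM ods.t_user_behavior
) s;
```

Backslashes are escaped before quotes so that the backslashes added for the quotes are not doubled again.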

------------------


-->>>>>>>>>>>>>>>>>>>>>>[Method 2]: a custom-built UDF
-- Pros: simple to write; cons: every stored value ends up as type string
-- default.generate_json(key1,value1,key2,value2,key3,value3,...)

insert overwrite table dm.user_action_self_help_d_di partition(dt='${T1D}')
select
"xxx",
"xxx",
"xxx",
default.generate_json('app_name',app_name,'platform',platform,'container',container),
default.generate_json('desc',get_json_object(biz,'$.desc')),
default.generate_json('pv',count(distinct_id)),
from_unixtime(unix_timestamp(current_timestamp), 'yyyy-MM-dd HH:mm:ss')
from ods.t_user_behavior
WHERE get_json_object(biz,'$.desc') is not null
group by app_name,container,platform,get_json_object(biz,'$.desc');
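
As a further option beyond the two methods above, Hive can serialize rows to JSON itself when the target table uses a JSON SerDe, which also keeps numeric columns as JSON numbers rather than strings. This is a sketch under assumptions: the table dm.user_action_json_demo and its columns are hypothetical, and the class org.apache.hive.hcatalog.data.JsonSerDe must be available on the cluster (it ships in the hive-hcatalog-core jar).

```sql
-- Hypothetical demo table: each row is written to the table's files as one JSON object per line.
CREATE TABLE IF NOT EXISTS dm.user_action_json_demo (
  app_name  string,
  platform  string,
  container string,
  biz_desc  string,
  pv        bigint
)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'
STORED AS TEXTFILE;

-- Same aggregation as above; the SerDe handles quoting and escaping of the values.
INSERT OVERWRITE TABLE dm.user_action_json_demo
SELECT
  app_name,
  platform,
  container,
  get_json_object(biz, '$.desc'),
  count(distinct_id)
FROM ods.t_user_behavior
WHERE get_json_object(biz, '$.desc') IS NOT NULL
GROUP BY app_name, container, platform, get_json_object(biz, '$.desc');
```

Unlike the concat and generate_json approaches, this produces one flat JSON object per row rather than separate JSON strings per column, so it fits best when the whole row is the payload.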

 
