HIVE实战处理(十) hive函数json_tuple、get_json_object中遇到的json格式不符合规范问题

本文介绍了如何处理不符合标准格式的JSON数据,并提供了详细的SQL预处理及解析案例,包括单个JSON对象和JSON数组的处理。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

1、json解析格式的时候失败的时候检查格式问题。
注意: 后面的value如果是json的话,外层{}的外面不需要再有""

--处理前的代码
select line,get_json_object(line,'$.ext')
from (select '{"ext":"{"isNeedToMigu":"0"}"}' as line) a
;

因为格式问题,所以解析json失败,代码结果:
在这里插入图片描述

--处理后的代码
select line,line_new,get_json_object(line_new,'$.ext')
from (select line,regexp_replace(regexp_replace(line,'\\}"','\\}'),'\\"\\{','\\{') line_new
from (select '{"ext":"{"isNeedToMigu":"0"}"}' as line) a
) a

处理后的json解析代码执行结果:
在这里插入图片描述

在这里插入图片描述
2、复杂的原始json数据(json格式有误,导致正常解析json函数报错,需要做预处理)

  1. 准备原始json
--原始json
{"accountInfoList":[{"accountName":"17876994298","accountType":2,"appID":"108","ext":{"userNumber":"8617876994298"}}],"ext":{"isNeedToMigu":"0","userIp":"10.181.8.35","root-client-info":"{"clientIP":"10.181.8.35","clientId":"b23d2604-a03d-4251-9d62-c675810673cb","clientType":"MIDDLE_WARE","clientUA":"Apache-HttpClient/4.5.3 (Java/1.8.0_191)"}","disablemigu":"0","client_id":"b23d2604-a03d-4251-9d62-c675810673cb","usernum":"8617876994298","appType":"1","traceid":"b767abbfc95faf74d23328e3b1ab0b41","sourceId":"012300"},"registerMode":1,"userInfo":{"appID":"108","createTime":"20200610160423","identityID":"1305548101","identityType":0,"individualProfile":{},"profileType":0}}
  1. 错误不符合规范的已经标示出
    在这里插入图片描述
    3)预处理后的json解析代码
select
b.accountInfoList,b.userInfo,b.registermode
from (select line, regexp_replace(regexp_replace(line,'\\}"','\\}'),'\\"\\{','\\{') reqbodyparam --把"{和}"转化为{}
from 
(select '{"accountInfoList":[{"accountName":"17876994298","accountType":2,"appID":"108","ext":{"userNumber":"8617876994298"}}],"ext":{"isNeedToMigu":"0","userIp":"10.181.8.35","root-client-info":"{"clientIP":"10.181.8.35","clientId":"b23d2604-a03d-4251-9d62-c675810673cb","clientType":"MIDDLE_WARE","clientUA":"Apache-HttpClient/4.5.3 (Java/1.8.0_191)"}","disablemigu":"0","client_id":"b23d2604-a03d-4251-9d62-c675810673cb","usernum":"8617876994298","appType":"1","traceid":"b767abbfc95faf74d23328e3b1ab0b41","sourceId":"012300"},"registerMode":1,"userInfo":{"appID":"108","createTime":"20200610160423","identityID":"1305548101","identityType":0,"individualProfile":{},"profileType":0}}' as line) a
) a lateral view json_tuple(reqbodyparam, 'accountInfoList', 'userInfo','registerMode') b as accountinfolist, userinfo,registermode;

执行上面代码结果:
在这里插入图片描述
3、解析json数组
1)准备原始json数组

--原始json数组
{"accountInfoList":[{"accountName":"17876994298","accountType":2,"appID":"108","ext":{"userNumber":"8617876994298"}},{"userInfo":{"appID":"108","createTime":"20200610160423"}}]}

json格式正确,不用预处理
在这里插入图片描述
2)准备解析json代码

--[0]:json数组中拿出第一个
select
c.accountInfoList_pos,c.accountInfoList_json
from (select
'param0={"accountInfoList":[{"accountName":"17876994298","accountType":2,"appID":"108","ext":{"userNumber":"8617876994298"}},{"userInfo":{"appID":"108","createTime":"20200610160423"}}]}' line) a
lateral view posexplode(array(split(regexp_replace(regexp_replace(substr(a.line,8),'\\}\\,\\{','\\}\\|\\|\\{'),'\\[|\\]',''), '\\|\\|')[0])) c as accountInfoList_pos, accountInfoList_json

--json数组中的json全部取出来
select accountInfo_json_info
from (select
regexp_replace(substr(a.line,8),'\\},\\{\\"','\\}@@\\{\\"') as accountInfoList_json
from (select
'param0={"accountInfoList":[{"accountName":"17876994298","accountType":2,"appID":"108","ext":{"userNumber":"8617876994298"}},{"userInfo":{"appID":"108","createTime":"20200610160423"}}]}' line) a
) b
lateral view explode(split(accountInfoList_json,'@@')) accountInfo as accountInfo_json_info

代码执行结果如下:
在这里插入图片描述
在这里插入图片描述

4、不符合json格式和json数组组合在一起的json数据

select
b.accountInfoList,b.userInfo,b.registermode,accountInfoList_pos, accountInfoList_json
from (select line, regexp_replace(regexp_replace(line,'\\}"','\\}'),'\\"\\{','\\{') reqbodyparam --把"{和}"转化为{}
from 
(select '{"accountInfoList":[{"accountName":"17876994298","accountType":2,"appID":"108","ext":{"userNumber":"8617876994298"}}],"ext":{"isNeedToMigu":"0","userIp":"10.181.8.35","root-client-info":"{"clientIP":"10.181.8.35","clientId":"b23d2604-a03d-4251-9d62-c675810673cb","clientType":"MIDDLE_WARE","clientUA":"Apache-HttpClient/4.5.3 (Java/1.8.0_191)"}","disablemigu":"0","client_id":"b23d2604-a03d-4251-9d62-c675810673cb","usernum":"8617876994298","appType":"1","traceid":"b767abbfc95faf74d23328e3b1ab0b41","sourceId":"012300"},"registerMode":1,"userInfo":{"appID":"108","createTime":"20200610160423","identityID":"1305548101","identityType":0,"individualProfile":{},"profileType":0}}' as line) a
) a lateral view json_tuple(reqbodyparam, 'accountInfoList', 'userInfo','registerMode') b as accountinfolist, userinfo,registermode
lateral view posexplode(array(split(regexp_replace(regexp_replace(b.accountInfoList,'\\}\\,\\{','\\}\\|\\|\\{'),'\\[|\\]',''), '\\|\\|')[0])) c as accountInfoList_pos, accountInfoList_json  --[0]是从json数组中拿出第一个

;

代码执行结果:

在这里插入图片描述

评论 1
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值