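This example uses a custom function (UDF) to parse JSON strings so that the data is easier to work with in Hive.
First, here is what the data looks like.
There are several hundred thousand records like these: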
{"movie":"2081","rate":"5","timeStamp":"977536266","uid":"106"}
{"movie":"1357","rate":"3","timeStamp":"977536364","uid":"106"}
{"movie":"902","rate":"3","timeStamp":"977536244","uid":"106"}
{"movie":"1296","rate":"4","timeStamp":"977536022","uid":"106"}
{"movie":"908","rate":"4","timeStamp":"977535797","uid":"106"}
{"movie":"838","rate":"4","timeStamp":"977536195","uid":"106"}
{"movie":"3044","rate":"4","timeStamp":"977536195","uid":"106"}
{"movie":"2243","rate":"4","timeStamp":"977536106","uid":"106"}
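When analyzing this data with Hive, the built-in functions no longer meet our needs, so we implement a custom function.
Requirement: pass in a JSON string and an index, and get back the corresponding field. For example: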
select myjson(json,1),myjson(json,2),myjson(json,3),myjson(json,4) from xx;
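For the first record above, this should return the corresponding values: 2081, 5, 977536266, 106. Hive's own JSON parsing function can also do this; it is covered at the end of the article.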
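To do this we need to write a Java class.
Steps to implement a Hive custom function:
1. Write a Java program that implements the desired behavior: pass in a JSON string and an index, and return the corresponding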
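As a rough illustration of the core logic such a class needs, here is a minimal sketch in plain Java. It is not the article's implementation. The class name `MyJsonSketch` is hypothetical, and the code assumes every record is a flat JSON object whose values are all quoted strings without escaped quotes, as in the sample data. It deliberately has no Hive dependency so it can be compiled and tried on its own. In a real Hive UDF the same `evaluate` method would typically be an instance method on a class extending `org.apache.hadoop.hive.ql.exec.UDF`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: returns the index-th value (1-based) of a flat JSON record.
class MyJsonSketch {
    // Matches one "key":"value" pair. Assumes string values with no escaped quotes.
    private static final Pattern PAIR =
            Pattern.compile("\"([^\"]*)\"\\s*:\\s*\"([^\"]*)\"");

    static String evaluate(String json, int index) {
        if (json == null) {
            return null; // Hive passes NULL columns as null
        }
        // Collect the values in the order they appear in the record.
        List<String> values = new ArrayList<>();
        Matcher m = PAIR.matcher(json);
        while (m.find()) {
            values.add(m.group(2));
        }
        // An out-of-range index yields null rather than an exception.
        if (index < 1 || index > values.size()) {
            return null;
        }
        return values.get(index - 1);
    }
}
```

With the first sample record, `MyJsonSketch.evaluate(line, 1)` gives `2081` and `MyJsonSketch.evaluate(line, 4)` gives `106`. Note that this sketch depends on the field order inside each record. A production version would more likely use a real JSON library and look fields up by name.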