hive xmlserde: how to load XML files into Hive

I'm working with Hive tables and have the following problem. I have more than 1 billion XML files in HDFS. Each XML file has 4 different sections, and for every file I want to split out each section and load it into its own table.

Example :

1233222

// having a lot of XML tags

// having a lot of XML tags

// having a lot of XML tags

// having a lot of XML tags

And I have four tables:

section1Table

id section1 // fields

section2Table

id section2

section3Table

id section3

section4Table

id section4

Now I want to split the data and load it into each table.

How can I achieve this? Can anyone help me?

Thanks

UPDATE

I have tried the following

CREATE EXTERNAL TABLE test(name STRING) LOCATION '/user/sornalingam/zipped/output/Tagged/t1';

SELECT xpath (name, '//section1') FROM test LIMIT 1 ;

But I got the following error:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"name":"<?xml version='1.0' encoding='iso-8859-1'?>"}

Solution

You have several options:

Load the XML into a Hive table with a string column, one document per row (e.g. CREATE TABLE xmlfiles (id int, xmlfile string)). Then use an XPath UDF to work on the XML.

Since you know the XPaths of what you want (e.g. //section1), follow the instructions in the second half of this tutorial to ingest directly into Hive via XPath.

Map your XML to Avro as described here, because a SerDe exists for seamless Avro-to-Hive mapping.

Use XPath to store your data in a regular text file in HDFS and then ingest that into Hive.

It depends on your level of experience and comfort with these approaches.
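
As a rough illustration of the last option, the sketch below uses Python's standard xml.etree.ElementTree module to pull each section out of one document and build a tab-separated row per target table. The element names id and section1 to section4 are assumptions taken from the table columns in the question, split_sections is a hypothetical helper, and the sample document is invented, so adjust the paths to the real layout. Each row is kept on a single line because a plain-text Hive table treats every line as a separate row, which appears to be why the query in the question received only the XML declaration as its input.

```python
import xml.etree.ElementTree as ET

# Assumed element names, taken from the table columns in the question.
SECTIONS = ("section1", "section2", "section3", "section4")


def split_sections(xml_text):
    """Return {section_name: "id<TAB>section_xml"} for one XML document.

    xml_text may be str or bytes. Sections missing from the document are
    skipped. Runs of whitespace are collapsed to single spaces so that each
    row stays on one line (this also alters whitespace inside text values).
    """
    root = ET.fromstring(xml_text)
    # Assumption: the document id lives in an <id> element below the root.
    doc_id = root.findtext(".//id", default="").strip()

    rows = {}
    for name in SECTIONS:
        elem = root.find(".//" + name)
        if elem is None:
            continue
        elem.tail = None  # drop text that follows the closing tag
        fragment = ET.tostring(elem, encoding="unicode")
        rows[name] = doc_id + "\t" + " ".join(fragment.split())
    return rows


if __name__ == "__main__":
    # Invented sample document, for illustration only.
    sample = (
        b"<?xml version='1.0' encoding='iso-8859-1'?>"
        b"<doc><id>1233222</id>"
        b"<section1><a>x</a></section1>"
        b"<section2><b>y</b></section2></doc>"
    )
    for section, row in split_sections(sample).items():
        print(section, row, sep=" -> ")
```

The rows for each section could then be appended to one text file per table and loaded into tables declared with ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'. With a billion input files, a step like this would need to run in a distributed job rather than a single process.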
