完美解决hive external表中csv字段内容含“,“逗号的问题

为解决hive表中csv字段内容含","逗号的问题,网上几乎都是说要用org.apache.hadoop.hive.serde2.OpenCSVSerde。
使用方法为:

1、mysql导出时,加一个ENCLOSED BY ‘"’,
示例:

 mysql -h 10.16.0.10 -P 13306 -u root -p123 -e "SELECT * FROM loan.rm_rent_plan limit 100 INTO OUTFILE '/tmp/rm_rent_plan.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '\"' LINES TERMINATED BY '\n'"

这样出来的csv:

"000002219E7B4C819FC011D14B170B2F","2F4F17BC110C45FDA9076E8F8E3CE000","76CBA33718744533A3030E55FC55CA78","26","2015-06-16","2015-07-15","11474.00","11474.00","10364.55","1109.45","174544.31","11474.00","11474.00","0.00","0.00","0.00","0.00","0.00","0.00","0.00",\N,"7.200000","10",\N,"10",\N,"transfer","2015-07-15 00:00:00","transfer","2024-05-14 13:35:15"
"000008F2551D4F21BEB386521D2AA8ED","80385054F2784911908CACBFC9D65AE3","9122EE380B7643229EB9F1F5F39A834C","41","2017-09-16","2017-10-15","14668.00","14668.00","13890.48","777.52","0.00","14668.00","14668.00","0.00","0.00","0.00","0.00","0.00","0.00","0.00",\N,"8.200000","10",\N,"10",\N,"transfer","2017-10-15 00:00:00","transfer","2024-05-14 13:35:15"

2、hive外部表使用ROW FORMAT SERDE ‘org.apache.hadoop.hive.serde2.OpenCSVSerde’,
例如:

CREATE external TABLE acct.tmp_yecai_rm_rent_plan (  id STRING COMMENT 'id',  
 order_id STRING COMMENT '订单号:订单基本信息表id',  ...,  
 update_time TIMESTAMP COMMENT '修改时间'  ) COMMENT '租金计划表'
	ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
	WITH SERDEPROPERTIES ("separatorChar" = ",", "quoteChar" = "\"", "escapeChar" = "\\")
	STORED AS textfile 
	location '/dmp/acct/tmp/yecai/rm_rent_plan/test'
	

这个方案其它都完美,但唯一的问题是\N无法识别为null
网上翻了好久,结果发现OpenCSVSerde确实不支持null value的表示。
需求有人提出了,但还没解决:
https://github.com/ogrodnek/csv-serde/issues/15

本人又是追求完美型的性格,所以接着查找,最后终于找到了,就是hive原生的csv支持转义符自定义, ESCAPED BY ‘\’, 原来如此简单
示例:

CREATE external TABLE acct.tmp_yecai_rm_rent_plan (  id STRING COMMENT 'id',   order_id STRING COMMENT '订单号:订单基本信息表id',   equip_id STRING COMMENT '设备id:设备表表id',   periods INT COMMENT '租金期次',   start_date DATE COMMENT '本期计息开始日',   plan_date DATE COMMENT '计划还款日期',  updated_by STRING COMMENT '修改人',   update_time TIMESTAMP COMMENT '修改时间'  ) COMMENT '租金计划表'
		ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ESCAPED BY '\\'
	STORED AS textfile 
	location '/dmp/acct/tmp/yecai/rm_rent_plan/test'

这样几近完美了,转义,空值问题全部解决了。

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值