通过spark sql创建HIVE的分区表

本文介绍了一次使用Spark SQL向Hive数据库导入大数据的过程,并详细记录了遇到的问题及解决方法,特别是解决了分区表中出现的大小写敏感BUG。

摘要生成于 C知道 ,由 DeepSeek-R1 满血版支持, 前往体验 >

今天需要通过导入文本中的数据到HIVE数据库,而且因为预设该表的数据会比较大,所以采用分区表的设计方案。将表按地区和日期分区。在这个过程出现过一些BUG,记录以便后期查看。

 spark.sql("use oracledb")
 spark.sql("CREATE TABLE IF NOT EXISTS " + tablename + " (OBUID STRING, BUS_ID STRING,REVTIME STRING,OBUTIME STRING,LONGITUDE STRING,LATITUDE STRING,\
 GPSKEY STRING,DIRECTION STRING,SPEED STRING,RUNNING_NO STRING,DATA_SERIAL STRING,GPS_MILEAGE STRING,SATELLITE_COUNT STRING,ROUTE_CODE STRING,SERVICE STRING)\
  PARTITIONED BY(AREASTRING,OBUDATE STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ")
 
 spark.sql("set  hive.exec.dynamic.partition.mode = nonstrict")
 spark.sql("set  hive.exec.dynamic.partition = true")

# print("创建数据库完成")
 if addoroverwrite:
     # 追加
     spark.sql("INSERT INTO TABLE " + tablename + " PARTITION(AREA,OBUDATE) SELECT OBUID,BUS_ID, REVTIME, OBUTIME,LONGITUDE ,LATITUDE,GPSKEY,DIRECTION,SPEED,\
                 RUNNING_NO,DATA_SERIAL,GPS_MILEAGE, SATELLITE_COUNT ,ROUTE_CODE,SERVICE,'gz' AS AREA,SUBSTR(OBUTIME,1,10) AS OBUDATEFROM " + tablename + "_tmp")
z执行脚本后出现以下错误:

Partition spec {area=, obudate=, AREA=gz, OBUDATE=2017-01-} contains non-partition columns;

经过度娘,有提到分区表中大小写的BUG,于是修改脚本,将分区字段小写,执行成功。修改后的脚本:

 spark.sql("use oracledb")
 spark.sql("CREATE TABLE IF NOT EXISTS " + tablename + " (OBUID STRING, BUS_ID STRING,REVTIME STRING,OBUTIME STRING,LONGITUDE STRING,LATITUDE STRING,\
 GPSKEY STRING,DIRECTION STRING,SPEED STRING,RUNNING_NO STRING,DATA_SERIAL STRING,GPS_MILEAGE STRING,SATELLITE_COUNT STRING,ROUTE_CODE STRING,SERVICE STRING)\
  PARTITIONED BY(area STRING,obudate STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' ")
 # 设置参数
 # hive > set  hive.exec.dynamic.partition.mode = nonstrict;
 # hive > set  hive.exec.dynamic.partition = true;
 spark.sql("set  hive.exec.dynamic.partition.mode = nonstrict")
 spark.sql("set  hive.exec.dynamic.partition = true")

# print("创建数据库完成")
 if addoroverwrite:
     # 追加
     spark.sql("INSERT INTO TABLE " + tablename + " PARTITION(area,obudate) SELECT OBUID,BUS_ID, REVTIME, OBUTIME,LONGITUDE ,LATITUDE,GPSKEY,DIRECTION,SPEED,\
                 RUNNING_NO,DATA_SERIAL,GPS_MILEAGE, SATELLITE_COUNT ,ROUTE_CODE,SERVICE,'gz' AS area ,SUBSTR(OBUTIME,1,10) AS obudate FROM " + tablename + "_tmp")

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值