The pitfalls of the Flume Hive sink

This post records, step by step, the errors hit while integrating Flume with Hive, including NoClassDefFoundError, NullPointerException, and HiveWriter$ConnectException, and how each was resolved by adjusting the configuration, copying JAR files, creating a bucketed table, and switching to the ORC storage format.

======= Integrating Flume with Hive =========================
First pitfall: a missing class at agent startup.

18/03/19 20:08:09 ERROR node.PollingPropertiesFileConfigurationProvider: Failed to start agent because dependencies were not found in classpath. Error follows.
java.lang.NoClassDefFoundError: org/apache/hive/hcatalog/streaming/RecordWriter
    at org.apache.flume.sink.hive.HiveSink.createSerializer(HiveSink.java:219)

Fix: copy the hive-hcatalog-streaming jar into Flume's lib directory:
cp /apps/hive-1.2.2/hcatalog/share/hcatalog/hive-hcatalog-streaming-1.2.2.jar /apps/flume-1.8.0/lib
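To confirm the jar actually landed on Flume's classpath (a sanity check, not a step from the original post):
ls /apps/flume-1.8.0/lib/hive-hcatalog-streaming-1.2.2.jar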

Next pitfall: events carry no timestamp header, so partition escaping fails:

Caused by: java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
        at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:251)
        at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:460)
        at org.apache.flume.sink.hive.HiveSink.makeEndPoint(HiveSink.java:379)
        at org.apache.flume.sink.hive.HiveSink.drainOneBatch(HiveSink.java:290)
        at org.apache.flume.sink.hive.HiveSink.process(HiveSink.java:253)
        ... 3 more
Fix: have the sink stamp events with the local time:
ag1.sinks.sink1.useLocalTimeStamp=true
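For context, here is a minimal sketch of the Hive sink configuration in play. The agent/sink names and useLocalTimeStamp come from above; the metastore URI, database, table, and partition value come from the logs below; the channel name, partition pattern, and serializer fields are assumptions:

ag1.sinks.sink1.type = hive
ag1.sinks.sink1.channel = channel1
ag1.sinks.sink1.hive.metastore = thrift://hdp01:9083
ag1.sinks.sink1.hive.database = myhive
ag1.sinks.sink1.hive.table = ue_through
ag1.sinks.sink1.hive.partition = %y-%m-%d
ag1.sinks.sink1.serializer = DELIMITED
ag1.sinks.sink1.serializer.delimiter = "\t"
ag1.sinks.sink1.serializer.fieldnames = imsi,dlwan,ulwan
## without useLocalTimeStamp the sink expects a timestamp header on every event, hence the NPE above
ag1.sinks.sink1.useLocalTimeStamp = true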


The same class of problem again, with another hcatalog class:
java.lang.NoClassDefFoundError: org/apache/hive/hcatalog/common/HCatUtil

Fix: copy all the hcatalog jars this time:
cp /apps/hive-1.2.2/hcatalog/share/hcatalog/*.jar /apps/flume-1.8.0/lib


// Next: connecting to the Hive endpoint fails with an NPE inside the Hive driver
18/12/27 06:01:16 WARN hive.HiveSink: sink1 : Failed connecting to EndPoint {metaStoreUri='thrift://hdp01:9083', database='myhive', table='ue_through', partitionVals=[18-12-27] }
org.apache.flume.sink.hive.HiveWriter$ConnectException: Failed connecting to EndPoint {metaStoreUri='thrift://hdp01:9083', database='myhive', table='ue_through', partitionVals=[18-12-27] }
        at org.apache.flume.sink.hive.HiveWriter.<init>(HiveWriter.java:99)
        at org.apache.flume.sink.hive.HiveSink.getOrCreateWriter(HiveSink.java:344)
        at org.apache.flume.sink.hive.HiveSink.drainOneBatch(HiveSink.java:295)
        at org.apache.flume.sink.hive.HiveSink.process(HiveSink.java:253)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flume.sink.hive.HiveWriter$ConnectException: Failed connecting to EndPoint {metaStoreUri='thrift://hdp01:9083', database='myhive', table='ue_through', partitionVals=[18-12-27] }
        at org.apache.flume.sink.hive.HiveWriter.newConnection(HiveWriter.java:383)
        at org.apache.flume.sink.hive.HiveWriter.<init>(HiveWriter.java:86)
        ... 6 more
Caused by: java.lang.NullPointerException
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1149)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1193)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1082)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1072)
        at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.runDDL(HiveEndPoint.java:404)
        at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.createPartitionIfNotExists(HiveEndPoint.java:369)
        at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.<init>(HiveEndPoint.java:276)
        at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.<init>(HiveEndPoint.java:243)
        at org.apache.hive.hcatalog.streaming.HiveEndPoint.newConnectionImpl(HiveEndPoint.java:180)
        at org.apache.hive.hcatalog.streaming.HiveEndPoint.newConnection(HiveEndPoint.java:157)
        at org.apache.hive.hcatalog.streaming.HiveEndPoint.newConnection(HiveEndPoint.java:110)
        at org.apache.flume.sink.hive.HiveWriter$8.call(HiveWriter.java:379)
        at org.apache.flume.sink.hive.HiveWriter$8.call(HiveWriter.java:376)
        at org.apache.flume.sink.hive.HiveWriter$11.call(HiveWriter.java:428)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        ... 1 more
18/12/27 06:01:16 ERROR flume.SinkRunner: Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: org.apache.flume.sink.hive.HiveWriter$ConnectException: Failed connecting to EndPoint {metaStoreUri='thrift://hdp01:9083', database='myhive', table='ue_through', partitionVals=[18-12-27] }
        at org.apache.flume.sink.hive.HiveSink.process(HiveSink.java:267)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flume.sink.hive.HiveWriter$ConnectException: Failed connecting to EndPoint {metaStoreUri='thrift://hdp01:9083', database='myhive', table='ue_through', partitionVals=[18-12-27] }
        ... (same NullPointerException cause chain as in the WARN above)

// After removing the partition spec from the sink config (note partitionVals is now empty), the real problem surfaces: the table is not bucketed
18/12/27 09:56:49 WARN hive.HiveSink: sink1 : Failed connecting to EndPoint {metaStoreUri='thrift://hdp01:9083', database='myhive', table='ue_through', partitionVals=[] }
org.apache.flume.sink.hive.HiveWriter$ConnectException: Failed connecting to EndPoint {metaStoreUri='thrift://hdp01:9083', database='myhive', table='ue_through', partitionVals=[] }
        at org.apache.flume.sink.hive.HiveWriter.<init>(HiveWriter.java:99)
        at org.apache.flume.sink.hive.HiveSink.getOrCreateWriter(HiveSink.java:344)
        at org.apache.flume.sink.hive.HiveSink.drainOneBatch(HiveSink.java:295)
        at org.apache.flume.sink.hive.HiveSink.process(HiveSink.java:253)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hive.hcatalog.streaming.StreamingException: Cannot stream to table that has not been bucketed : {metaStoreUri='thrift://hdp01:9083', database='myhive', table='ue_through', partitionVals=[] }
        at org.apache.hive.hcatalog.streaming.AbstractRecordWriter.<init>(AbstractRecordWriter.java:71)
        at org.apache.hive.hcatalog.streaming.DelimitedInputWriter.<init>(DelimitedInputWriter.java:115)
        at org.apache.flume.sink.hive.HiveDelimitedTextSerializer.createRecordWriter(HiveDelimitedTextSerializer.java:66)
        at org.apache.flume.sink.hive.HiveWriter.<init>(HiveWriter.java:89)

// After recreating the table bucketed, as streaming requires:
create external table ue_through (imsi string, dlwan int, ulwan int)
partitioned by (time string)
CLUSTERED BY (imsi) INTO 4 BUCKETS
row format delimited fields terminated by '\t'
location '/omc/pm/ue';

18/12/27 10:12:06 INFO hive.HiveSink: sink1: Creating Writer to Hive end point : {metaStoreUri='thrift://hdp01:9083', database='myhive', table='ue_through', partitionVals=[] }
18/12/27 10:12:06 ERROR flume.SinkRunner: Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat cannot be cast to org.apache.hadoop.hive.ql.io.AcidOutputFormat
        at org.apache.flume.sink.hive.HiveSink.process(HiveSink.java:267)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat cannot be cast to org.apache.hadoop.hive.ql.io.AcidOutputFormat
        at org.apache.hive.hcatalog.streaming.AbstractRecordWriter.<init>(AbstractRecordWriter.java:75)
        at org.apache.hive.hcatalog.streaming.DelimitedInputWriter.<init>(DelimitedInputWriter.java:115)
        at org.apache.flume.sink.hive.HiveDelimitedTextSerializer.createRecordWriter(HiveDelimitedTextSerializer.java:66)
        at org.apache.flume.sink.hive.HiveWriter.<init>(HiveWriter.java:89)     
        
Found online: reading the source, the only class that implements AcidOutputFormat is OrcOutputFormat, so the answer is obvious: the Hive table needs to be STORED AS ORC.
create external table ue_through (imsi string, dlwan int, ulwan int)
partitioned by (time string)
CLUSTERED BY (imsi) INTO 4 BUCKETS
row format delimited fields terminated by '\t'
STORED AS ORC
location '/omc/pm/ue';
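A quick sanity check that the new layout took effect (not from the original post): DESCRIBE FORMATTED should now report OrcInputFormat/OrcOutputFormat and Num Buckets: 4.

DESCRIBE FORMATTED ue_through;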


// After ORC, one last error: the transaction batch cannot be acquired
18/12/27 10:32:01 WARN hive.HiveSink: sink1 : Failed connecting to EndPoint {metaStoreUri='thrift://hdp01:9083', database='myhive', table='ue_through', partitionVals=[] }
org.apache.flume.sink.hive.HiveWriter$ConnectException: Failed connecting to EndPoint {metaStoreUri='thrift://hdp01:9083', database='myhive', table='ue_through', partitionVals=[] }
        at org.apache.flume.sink.hive.HiveWriter.<init>(HiveWriter.java:99)
        at org.apache.flume.sink.hive.HiveSink.getOrCreateWriter(HiveSink.java:344)
        at org.apache.flume.sink.hive.HiveSink.drainOneBatch(HiveSink.java:295)
        at org.apache.flume.sink.hive.HiveSink.process(HiveSink.java:253)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
        at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flume.sink.hive.HiveWriter$TxnBatchException: Failed acquiring Transaction Batch from EndPoint: {metaStoreUri='thrift://hdp01:9083', database='myhive', table='ue_through', partitionVals=[] }
        at org.apache.flume.sink.hive.HiveWriter.nextTxnBatch(HiveWriter.java:400)
        at org.apache.flume.sink.hive.HiveWriter.<init>(HiveWriter.java:90)
        ... 6 more
// The cause turned out to be this comment having been appended to the end of the config line itself. Moving it onto its own line fixed it:
## capacity Flume transaction control needs: 600 events per transaction
ag1.channels.channel1.transactionCapacity = 600
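One related pitfall worth noting (an addition, not something hit in this post): the Hive Streaming API also requires ACID transactions, so a TxnBatchException that survives a clean config can also mean the transaction manager or table properties are missing. A sketch of the usual settings, placed in hive-site.xml (values are the standard ones for streaming, not taken from this cluster):

hive.support.concurrency = true
hive.txn.manager = org.apache.hadoop.hive.ql.lockmgr.DbTxnManager
hive.compactor.initiator.on = true
hive.compactor.worker.threads = 1

and the target table should be marked transactional:
ALTER TABLE ue_through SET TBLPROPERTIES ('transactional' = 'true');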
