=======Flume + Hive Integration=========================
First, a missing-class error on agent startup:
18/03/19 20:08:09 ERROR node.PollingPropertiesFileConfigurationProvider: Failed to start agent because dependencies were not found in classpath. Error follows.
java.lang.NoClassDefFoundError: org/apache/hive/hcatalog/streaming/RecordWriter
at org.apache.flume.sink.hive.HiveSink.createSerializer(HiveSink.java:219)
Fix: copy the HCatalog streaming jar into Flume's lib directory:
cp /apps/hive-1.2.2/hcatalog/share/hcatalog/hive-hcatalog-streaming-1.2.2.jar /apps/flume-1.8.0/lib
Next, the sink failed because events carried no timestamp header:
Caused by: java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:251)
at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:460)
at org.apache.flume.sink.hive.HiveSink.makeEndPoint(HiveSink.java:379)
at org.apache.flume.sink.hive.HiveSink.drainOneBatch(HiveSink.java:290)
at org.apache.flume.sink.hive.HiveSink.process(HiveSink.java:253)
... 3 more
Fix: have the sink use the local time so a timestamp header is no longer required:
ag1.sinks.sink1.useLocalTimeStamp=true
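For context, a minimal sketch of the Hive sink configuration this article works with. The agent/sink names (`ag1`, `sink1`), endpoint, and table match the logs; the field names come from the DDL shown later, and the partition escape is inferred from `partitionVals=[18-12-27]` in the logs, so treat this as an illustrative assumption rather than the exact original file:

```properties
# Hive sink writing to myhive.ue_through, partitioned by day
ag1.sinks.sink1.type = hive
ag1.sinks.sink1.hive.metastore = thrift://hdp01:9083
ag1.sinks.sink1.hive.database = myhive
ag1.sinks.sink1.hive.table = ue_through
ag1.sinks.sink1.hive.partition = %y-%m-%d
ag1.sinks.sink1.serializer = DELIMITED
ag1.sinks.sink1.serializer.delimiter = "\t"
ag1.sinks.sink1.serializer.fieldnames = imsi,dlwan,ulwan
# Without this, BucketPath throws the NPE above whenever the source
# does not stamp a "timestamp" header on each event
ag1.sinks.sink1.useLocalTimeStamp = true
```

The `%y-%m-%d` escapes are what require a timestamp; `useLocalTimeStamp` substitutes the sink host's clock when the header is absent.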
Another missing class followed:
java.lang.NoClassDefFoundError: org/apache/hive/hcatalog/common/HCatUtil
Fix: copy all of the HCatalog jars this time:
cp /apps/hive-1.2.2/hcatalog/share/hcatalog/*.jar /apps/flume-1.8.0/lib
//Next error: failed connecting to the Hive endpoint
18/12/27 06:01:16 WARN hive.HiveSink: sink1 : Failed connecting to EndPoint {metaStoreUri='thrift://hdp01:9083', database='myhive', table='ue_through', partitionVals=[18-12-27] }
org.apache.flume.sink.hive.HiveWriter$ConnectException: Failed connecting to EndPoint {metaStoreUri='thrift://hdp01:9083', database='myhive', table='ue_through', partitionVals=[18-12-27] }
at org.apache.flume.sink.hive.HiveWriter.<init>(HiveWriter.java:99)
at org.apache.flume.sink.hive.HiveSink.getOrCreateWriter(HiveSink.java:344)
at org.apache.flume.sink.hive.HiveSink.drainOneBatch(HiveSink.java:295)
at org.apache.flume.sink.hive.HiveSink.process(HiveSink.java:253)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flume.sink.hive.HiveWriter$ConnectException: Failed connecting to EndPoint {metaStoreUri='thrift://hdp01:9083', database='myhive', table='ue_through', partitionVals=[18-12-27] }
at org.apache.flume.sink.hive.HiveWriter.newConnection(HiveWriter.java:383)
at org.apache.flume.sink.hive.HiveWriter.<init>(HiveWriter.java:86)
... 6 more
Caused by: java.lang.NullPointerException
at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1149)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1193)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1082)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1072)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.runDDL(HiveEndPoint.java:404)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.createPartitionIfNotExists(HiveEndPoint.java:369)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.<init>(HiveEndPoint.java:276)
at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.<init>(HiveEndPoint.java:243)
at org.apache.hive.hcatalog.streaming.HiveEndPoint.newConnectionImpl(HiveEndPoint.java:180)
at org.apache.hive.hcatalog.streaming.HiveEndPoint.newConnection(HiveEndPoint.java:157)
at org.apache.hive.hcatalog.streaming.HiveEndPoint.newConnection(HiveEndPoint.java:110)
at org.apache.flume.sink.hive.HiveWriter$8.call(HiveWriter.java:379)
at org.apache.flume.sink.hive.HiveWriter$8.call(HiveWriter.java:376)
at org.apache.flume.sink.hive.HiveWriter$11.call(HiveWriter.java:428)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
... 1 more
18/12/27 06:01:16 ERROR flume.SinkRunner: Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: org.apache.flume.sink.hive.HiveWriter$ConnectException: Failed connecting to EndPoint {metaStoreUri='thrift://hdp01:9083', database='myhive', table='ue_through', partitionVals=[18-12-27] }
at org.apache.flume.sink.hive.HiveSink.process(HiveSink.java:267)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
at java.lang.Thread.run(Thread.java:748)
Caused by: (same ConnectException → NullPointerException chain as in the WARN above)
//After removing the partition setting
18/12/27 09:56:49 WARN hive.HiveSink: sink1 : Failed connecting to EndPoint {metaStoreUri='thrift://hdp01:9083', database='myhive', table='ue_through', partitionVals=[] }
org.apache.flume.sink.hive.HiveWriter$ConnectException: Failed connecting to EndPoint {metaStoreUri='thrift://hdp01:9083', database='myhive', table='ue_through', partitionVals=[] }
at org.apache.flume.sink.hive.HiveWriter.<init>(HiveWriter.java:99)
at org.apache.flume.sink.hive.HiveSink.getOrCreateWriter(HiveSink.java:344)
at org.apache.flume.sink.hive.HiveSink.drainOneBatch(HiveSink.java:295)
at org.apache.flume.sink.hive.HiveSink.process(HiveSink.java:253)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hive.hcatalog.streaming.StreamingException: Cannot stream to table that has not been bucketed : {metaStoreUri='thrift://hdp01:9083', database='myhive', table='ue_through', partitionVals=[] }
at org.apache.hive.hcatalog.streaming.AbstractRecordWriter.<init>(AbstractRecordWriter.java:71)
at org.apache.hive.hcatalog.streaming.DelimitedInputWriter.<init>(DelimitedInputWriter.java:115)
at org.apache.flume.sink.hive.HiveDelimitedTextSerializer.createRecordWriter(HiveDelimitedTextSerializer.java:66)
at org.apache.flume.sink.hive.HiveWriter.<init>(HiveWriter.java:89)
//After recreating the table with buckets:
create external table ue_through (imsi string,dlwan int,ulwan int) partitioned by (time string) CLUSTERED BY (imsi) INTO 4 BUCKETS row format delimited fields terminated by '\t' location '/omc/pm/ue'
18/12/27 10:12:06 INFO hive.HiveSink: sink1: Creating Writer to Hive end point : {metaStoreUri='thrift://hdp01:9083', database='myhive', table='ue_through', partitionVals=[] }
18/12/27 10:12:06 ERROR flume.SinkRunner: Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat cannot be cast to org.apache.hadoop.hive.ql.io.AcidOutputFormat
at org.apache.flume.sink.hive.HiveSink.process(HiveSink.java:267)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassCastException: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat cannot be cast to org.apache.hadoop.hive.ql.io.AcidOutputFormat
at org.apache.hive.hcatalog.streaming.AbstractRecordWriter.<init>(AbstractRecordWriter.java:75)
at org.apache.hive.hcatalog.streaming.DelimitedInputWriter.<init>(DelimitedInputWriter.java:115)
at org.apache.flume.sink.hive.HiveDelimitedTextSerializer.createRecordWriter(HiveDelimitedTextSerializer.java:66)
at org.apache.flume.sink.hive.HiveWriter.<init>(HiveWriter.java:89)
Found online: reading the source, the only class that implements AcidOutputFormat is OrcOutputFormat, so the answer is obvious: the Hive table must be STORED AS ORC.
create external table ue_through (imsi string,dlwan int,ulwan int) partitioned by (time string) CLUSTERED BY (imsi) INTO 4 BUCKETS row format delimited fields terminated by '\t' STORED AS ORC location '/omc/pm/ue' ;
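Note: besides ORC storage, Hive streaming ingest generally also requires ACID transactions to be enabled on the Hive side. A sketch of the hive-site.xml properties commonly needed (verify the exact set against your Hive version; these are standard Hive properties, not taken from this article's cluster):

```xml
<!-- hive-site.xml: common prerequisites for Hive streaming ingest (ACID) -->
<property>
  <name>hive.support.concurrency</name>
  <value>true</value>
</property>
<property>
  <name>hive.txn.manager</name>
  <value>org.apache.hadoop.hive.ql.lockmgr.DbTxnManager</value>
</property>
<property>
  <name>hive.compactor.initiator.on</name>
  <value>true</value>
</property>
<property>
  <name>hive.compactor.worker.threads</name>
  <value>1</value>
</property>
```

If these are missing, acquiring a transaction batch from the endpoint can fail even when the table itself is bucketed and stored as ORC.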
//After switching to ORC
18/12/27 10:32:01 WARN hive.HiveSink: sink1 : Failed connecting to EndPoint {metaStoreUri='thrift://hdp01:9083', database='myhive', table='ue_through', partitionVals=[] }
org.apache.flume.sink.hive.HiveWriter$ConnectException: Failed connecting to EndPoint {metaStoreUri='thrift://hdp01:9083', database='myhive', table='ue_through', partitionVals=[] }
at org.apache.flume.sink.hive.HiveWriter.<init>(HiveWriter.java:99)
at org.apache.flume.sink.hive.HiveSink.getOrCreateWriter(HiveSink.java:344)
at org.apache.flume.sink.hive.HiveSink.drainOneBatch(HiveSink.java:295)
at org.apache.flume.sink.hive.HiveSink.process(HiveSink.java:253)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.flume.sink.hive.HiveWriter$TxnBatchException: Failed acquiring Transaction Batch from EndPoint: {metaStoreUri='thrift://hdp01:9083', database='myhive', table='ue_through', partitionVals=[] }
at org.apache.flume.sink.hive.HiveWriter.nextTxnBatch(HiveWriter.java:400)
at org.apache.flume.sink.hive.HiveWriter.<init>(HiveWriter.java:90)
... 6 more
//Root cause: this comment had been written at the end of the property line itself, which corrupted the value
##Channel capacity for Flume transaction control: 600 events
ag1.channels.channel1.transactionCapacity = 600
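To spell out the fix: Java properties files have no inline comments, so everything after the `=` (including a trailing `##...`) becomes part of the value, and `transactionCapacity` fails to parse as a number. Broken vs. fixed (the broken line is a reconstruction of the mistake described above):

```properties
# Broken: the trailing "comment" becomes part of the value
# ag1.channels.channel1.transactionCapacity = 600    ##capacity for Flume transactions

# Fixed: comment on its own line
##Channel capacity for Flume transaction control: 600 events
ag1.channels.channel1.transactionCapacity = 600
```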
Summary: this article records the errors hit while integrating Flume with Hive and their fixes, including NoClassDefFoundError, NullPointerException, and HiveWriter$ConnectException, resolved by adjusting the Flume configuration, copying JARs into Flume's classpath, creating a bucketed table, and using the ORC storage format.