Flume Introduction and Installation

This article introduces the basic concepts of Apache Flume and its data flow model, walks through installing and configuring Flume, including downloading the latest release, extracting it, and configuring the environment, and discusses problems that may come up when starting Flume along with their solutions, such as the missing timestamp in event headers. Through hands-on practice, readers can gain a deeper understanding of how Flume works and how it is configured.


1. Flume overview

    Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic applications.

2. Data flow model

  A Flume event is defined as a unit of data flow having a byte payload and an optional set of string attributes. A Flume agent is a (JVM) process that hosts the components through which events flow from an external source to the next destination (hop).


Flume collects log data from application servers through a source and delivers it to a channel in the form of events; along the way the data can be lightly cleaned and transformed using Flume interceptors. Each source can send data to multiple channels. A sink consumes data from a specified channel and sends or stores it to the corresponding downstream component or file system: a sink can forward data to the source of the next Flume agent or to Kafka, or store it directly in HDFS, a database, and so on.
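The multi-hop chaining described above (a sink feeding the next agent's source) can be sketched as a config fragment. This is only an illustration; the agent names agent1/agent2, the host name host2, and port 4545 are assumptions, not part of this article's setup:

```
# Agent 1: its sink forwards events to the next hop over Avro RPC
# (host2 and port 4545 are illustrative)
agent1.sinks.k1.type = avro
agent1.sinks.k1.hostname = host2
agent1.sinks.k1.port = 4545

# Agent 2 (running on host2): its source receives events from agent1's Avro sink
agent2.sources.s1.type = avro
agent2.sources.s1.bind = 0.0.0.0
agent2.sources.s1.port = 4545
```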

3. Installation and configuration

 3.1 Download the latest release

Download page: http://flume.apache.org/download.html

The latest release at the time of writing: apache-flume-1.9.0-bin.tar.gz

3.2 Extract

[hadoop@master ~]$ tar -zxvf apache-flume-1.9.0-bin.tar.gz

3.3 Environment configuration

     Specify the Java path.

    Generate Flume's flume-env.sh environment file and the flume.conf component configuration file (the names can be changed) from the bundled templates:

[hadoop@master ~]$ cd /home/hadoop/apache-flume-1.9.0-bin/conf 
[hadoop@master conf]$ cp flume-env.sh.template flume-env.sh
[hadoop@master conf]$ cp flume-conf.properties.template flume.conf

   Set JAVA_HOME in flume-env.sh:

[hadoop@master conf]$ vi flume-env.sh
.......
# Enviroment variables can be set here.

# export JAVA_HOME=/usr/lib/jvm/java-8-oracle
 export JAVA_HOME=/usr/java/jdk1.8.0_131/
............

Edit the component configuration file:

[hadoop@master conf]$ vi flume.conf
[hadoop@master conf]$ cat flume.conf|grep -v ^#
#configure a source, channel, and sink for agent a1, named s1, c1, and k1
a1.sources = s1
a1.channels = c1
a1.sinks = k1

#source configuration for agent a1

#source type is netcat; other types include avro, seq, syslogtcp, http, etc.
a1.sources.s1.type = netcat
#s1 delivers events to channel c1 (multiple channels may be listed)
a1.sources.s1.channels = c1

#bind address and port for s1
a1.sources.s1.bind = 0.0.0.0
a1.sources.s1.port = 44444
#no file header needed (note: fileHeader is actually a spooldir source property and is ignored by netcat)
a1.sources.s1.fileHeader = false

#sink k1 configuration
#k1 takes its data from channel c1
a1.sinks.k1.channel = c1

#sink k1 writes to HDFS
a1.sinks.k1.type = hdfs
#HDFS output path; when partitioning by time, the event header must carry a timestamp
a1.sinks.k1.hdfs.path = hdfs://master:9000/flume-collection/%Y-%m-%d
a1.sinks.k1.hdfs.maxOpenFiles = 5000
a1.sinks.k1.hdfs.batchSize = 100

#add a timestamp to each event, using the local time
a1.sinks.k1.hdfs.useLocalTimeStamp = true
#write the output as plain text
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.writeFormat = Text
#roll to a new HDFS file when it reaches 100 KB (rollSize is in bytes) or 1,000,000 events (rollCount)
a1.sinks.k1.hdfs.rollSize = 102400
a1.sinks.k1.hdfs.rollCount = 1000000
#roll on a time interval (seconds); 0 disables time-based rolling. HDFS handles large numbers of small files poorly
a1.sinks.k1.hdfs.rollInterval = 30

#channel configuration
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100

Full component list: http://flume.apache.org/releases/content/1.9.0/FlumeUserGuide.html#component-summary

Component Interface | Type Alias | Implementation Class
org.apache.flume.Channel | memory | org.apache.flume.channel.MemoryChannel
org.apache.flume.Channel | jdbc | org.apache.flume.channel.jdbc.JdbcChannel
org.apache.flume.Channel | file | org.apache.flume.channel.file.FileChannel
org.apache.flume.Channel | – | org.apache.flume.channel.PseudoTxnMemoryChannel
org.apache.flume.Channel | – | org.example.MyChannel
org.apache.flume.Source | avro | org.apache.flume.source.AvroSource
org.apache.flume.Source | netcat | org.apache.flume.source.NetcatSource
org.apache.flume.Source | seq | org.apache.flume.source.SequenceGeneratorSource
org.apache.flume.Source | exec | org.apache.flume.source.ExecSource
org.apache.flume.Source | syslogtcp | org.apache.flume.source.SyslogTcpSource
org.apache.flume.Source | multiport_syslogtcp | org.apache.flume.source.MultiportSyslogTCPSource
org.apache.flume.Source | syslogudp | org.apache.flume.source.SyslogUDPSource
org.apache.flume.Source | spooldir | org.apache.flume.source.SpoolDirectorySource
org.apache.flume.Source | http | org.apache.flume.source.http.HTTPSource
org.apache.flume.Source | thrift | org.apache.flume.source.ThriftSource
org.apache.flume.Source | jms | org.apache.flume.source.jms.JMSSource
org.apache.flume.Source | – | org.apache.flume.source.avroLegacy.AvroLegacySource
org.apache.flume.Source | – | org.apache.flume.source.thriftLegacy.ThriftLegacySource
org.apache.flume.Source | – | org.example.MySource
org.apache.flume.Sink | null | org.apache.flume.sink.NullSink
org.apache.flume.Sink | logger | org.apache.flume.sink.LoggerSink
org.apache.flume.Sink | avro | org.apache.flume.sink.AvroSink
org.apache.flume.Sink | hdfs | org.apache.flume.sink.hdfs.HDFSEventSink
org.apache.flume.Sink | hbase | org.apache.flume.sink.hbase.HBaseSink
org.apache.flume.Sink | hbase2 | org.apache.flume.sink.hbase2.HBase2Sink
org.apache.flume.Sink | asynchbase | org.apache.flume.sink.hbase.AsyncHBaseSink
org.apache.flume.Sink | elasticsearch | org.apache.flume.sink.elasticsearch.ElasticSearchSink
org.apache.flume.Sink | file_roll | org.apache.flume.sink.RollingFileSink
org.apache.flume.Sink | irc | org.apache.flume.sink.irc.IRCSink
org.apache.flume.Sink | thrift | org.apache.flume.sink.ThriftSink
org.apache.flume.Sink | – | org.example.MySink
org.apache.flume.ChannelSelector | replicating | org.apache.flume.channel.ReplicatingChannelSelector
org.apache.flume.ChannelSelector | multiplexing | org.apache.flume.channel.MultiplexingChannelSelector
org.apache.flume.ChannelSelector | – | org.example.MyChannelSelector
org.apache.flume.SinkProcessor | default | org.apache.flume.sink.DefaultSinkProcessor
org.apache.flume.SinkProcessor | failover | org.apache.flume.sink.FailoverSinkProcessor
org.apache.flume.SinkProcessor | load_balance | org.apache.flume.sink.LoadBalancingSinkProcessor
org.apache.flume.SinkProcessor | – | org.example.MySinkProcessor
org.apache.flume.interceptor.Interceptor | timestamp | org.apache.flume.interceptor.TimestampInterceptor$Builder
org.apache.flume.interceptor.Interceptor | host | org.apache.flume.interceptor.HostInterceptor$Builder
org.apache.flume.interceptor.Interceptor | static | org.apache.flume.interceptor.StaticInterceptor$Builder
org.apache.flume.interceptor.Interceptor | regex_filter | org.apache.flume.interceptor.RegexFilteringInterceptor$Builder
org.apache.flume.interceptor.Interceptor | regex_extractor | org.apache.flume.interceptor.RegexExtractorInterceptor$Builder
org.apache.flume.channel.file.encryption.KeyProvider$Builder | jceksfile | org.apache.flume.channel.file.encryption.JCEFileKeyProvider
org.apache.flume.channel.file.encryption.KeyProvider$Builder | – | org.example.MyKeyProvider
org.apache.flume.channel.file.encryption.CipherProvider | aesctrnopadding | org.apache.flume.channel.file.encryption.AESCTRNoPaddingProvider
org.apache.flume.channel.file.encryption.CipherProvider | – | org.example.MyCipherProvider
org.apache.flume.serialization.EventSerializer$Builder | text | org.apache.flume.serialization.BodyTextEventSerializer$Builder
org.apache.flume.serialization.EventSerializer$Builder | avro_event | org.apache.flume.serialization.FlumeEventAvroEventSerializer$Builder
org.apache.flume.serialization.EventSerializer$Builder | – | org.example.MyEventSerializer$Builder

Variables (escape sequences) available in the sink configuration:

Alias | Description
%{host} | Substitute value of event header named "host". Arbitrary header names are supported.
%t | Unix time in milliseconds
%a | locale's short weekday name (Mon, Tue, ...)
%A | locale's full weekday name (Monday, Tuesday, ...)
%b | locale's short month name (Jan, Feb, ...)
%B | locale's long month name (January, February, ...)
%c | locale's date and time (Thu Mar 3 23:05:25 2005)
%d | day of month (01)
%e | day of month without padding (1)
%D | date; same as %m/%d/%y
%H | hour (00..23)
%I | hour (01..12)
%j | day of year (001..366)
%k | hour (0..23)
%m | month (01..12)
%n | month without padding (1..12)
%M | minute (00..59)
%p | locale's equivalent of am or pm
%s | seconds since 1970-01-01 00:00:00 UTC
%S | second (00..59)
%y | last two digits of year (00..99)
%Y | year (2010)
%z | +hhmm numeric timezone (for example, -0400)
%[localhost] | Substitute the hostname of the host where the agent is running
%[IP] | Substitute the IP address of the host where the agent is running
%[FQDN] | Substitute the canonical hostname of the host where the agent is running
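The date escapes follow strftime conventions, so the expansion of the hdfs.path used in this article can be previewed with GNU date (assuming it is installed). The epoch value below is the event timestamp 1564802412372 ms from the startup log later in this article, truncated to whole seconds:

```shell
# Expand /flume-collection/%Y-%m-%d the way the HDFS sink would,
# using the event timestamp (seconds since the epoch), in UTC.
date -u -d @1564802412 '+/flume-collection/%Y-%m-%d'
# prints /flume-collection/2019-08-03
```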

   When the sink writes to HDFS with a time-partitioned path, each event header must carry a timestamp; the troubleshooting section below shows the error raised when it is missing.

4. Starting Flume

4.1 Start

   Open session 1:

[hadoop@master apache-flume-1.9.0-bin]$ bin/flume-ng agent --conf conf --conf-file conf/flume.conf --name a1 -Dflume.root.logger=INFO,console
Info: Sourcing environment configuration script /home/hadoop/apache-flume-1.9.0-bin/conf/flume-env.sh
Info: Including Hadoop libraries found via (/home/hadoop/hadoop-2.8.1/bin/hadoop) for HDFS access
Info: Including HBASE libraries found via (/home/hadoop/hbase-1.2.6/bin/hbase) for HBASE access
Info: Including Hive libraries found via (/home/hadoop/apache-hive-2.1.1) for Hive access
+ exec /usr/java/jdk1.8.0_131//bin/java -Xmx20m -Dflume.root.logger=INFO,console -cp '/home/hadoop/apache-flume-1.9.0-bin/conf:/home/hadoop/apache-flume-1.9.0-bin/lib/*:/home/hadoop/hadoop-2.8.1/etc/hadoop:/home/hadoop/hadoop-2.8.1/share/hadoop/common/lib/*:/home/hadoop/hadoop-2.8.1/share/hadoop/common/*:/home/hadoop/hadoop-2.8.1/share/hadoop/hdfs:/home/hadoop/hadoop-2.8.1/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop-2.8.1/share/hadoop/hdfs/*:/home/hadoop/hadoop-2.8.1/share/hadoop/yarn/lib/*:/home/hadoop/hadoop-2.8.1/share/hadoop/yarn/*:/home/hadoop/hadoop-2.8.1/share/hadoop/mapreduce/lib/*:/home/hadoop/hadoop-2.8.1/share/hadoop/mapreduce/*:/home/hadoop/hadoop-2.8.1//contrib/capacity-scheduler/*.jar:/home/hadoop/hbase-1.2.6/conf:/usr/java/jdk1.8.0_131//lib/tools.jar:/home/hadoop/hbase-1.2.6:/home/hadoop/hbase-1.2.6/lib/activation-1.1.jar:/home/hadoop/hbase-1.2.6/lib/aopalliance-1.0.jar:/home/hadoop/hbase-1.2.6/lib/apacheds-i18n-2.0.0-M15.jar:/home/hadoop/hbase-1.2.6/lib/apacheds-kerberos-codec-2.0.0-M15.jar:/home/hadoop/hbase-1.2.6/lib/api-asn1-api-1.0.0-M20.jar:/home/hadoop/hbase-1.2.6/lib/api-util-1.0.0-M20.jar:/home/hadoop/hbase-1.2.6/lib/asm-3.1.jar:/home/hadoop/hbase-1.2.6/lib/avro-1.7.4.jar:/home/hadoop/hbase-1.2.6/lib/commons-beanutils-1.7.0.jar:/home/hadoop/hbase-1.2.6/lib/commons-beanutils-core-1.8.0.jar:/home/hadoop/hbase-1.2.6/lib/commons-cli-1.2.jar:/home/hadoop/hbase-1.2.6/lib/commons-codec-1.9.jar:/home/hadoop/hbase-1.2.6/lib/commons-collections-3.2.2.jar:/home/hadoop/hbase-1.2.6/lib/commons-compress-1.4.1.jar:/home/hadoop/hbase-1.2.6/lib/commons-configuration-1.6.jar:/home/hadoop/hbase-1.2.6/lib/commons-daemon-1.0.13.jar:/home/hadoop/hbase-1.2.6/lib/commons-digester-1.8.jar:/home/hadoop/hbase-1.2.6/lib/commons-el-1.0.jar:/home/hadoop/hbase-1.2.6/lib/commons-httpclient-3.1.jar:/home/hadoop/hbase-1.2.6/lib/commons-io-2.4.jar:/home/hadoop/hbase-1.2.6/lib/commons-lang-2.6.jar:/home/hadoop/hbase-1.2.6/lib/commons-logging-1.2.jar:/home/hadoop/hbase-1.2.6/lib
/commons-math-2.2.jar:/home/hadoop/hbase-1.2.6/lib/commons-math3-3.1.1.jar:/home/hadoop/hbase-1.2.6/lib/commons-net-3.1.jar:/home/hadoop/hbase-1.2.6/lib/disruptor-3.3.0.jar:/home/hadoop/hbase-1.2.6/lib/findbugs-annotations-1.3.9-1.jar:/home/hadoop/hbase-1.2.6/lib/guava-12.0.1.jar:/home/hadoop/hbase-1.2.6/lib/guice-3.0.jar:/home/hadoop/hbase-1.2.6/lib/guice-servlet-3.0.jar:/home/hadoop/hbase-1.2.6/lib/hadoop-annotations-2.5.1.jar:/home/hadoop/hbase-1.2.6/lib/hadoop-auth-2.5.1.jar:/home/hadoop/hbase-1.2.6/lib/hadoop-client-2.5.1.jar:/home/hadoop/hbase-1.2.6/lib/hadoop-common-2.5.1.jar:/home/hadoop/hbase-1.2.6/lib/hadoop-hdfs-2.5.1.jar:/home/hadoop/hbase-1.2.6/lib/hadoop-mapreduce-client-app-2.5.1.jar:/home/hadoop/hbase-1.2.6/lib/hadoop-mapreduce-client-common-2.5.1.jar:/home/hadoop/hbase-1.2.6/lib/hadoop-mapreduce-client-core-2.5.1.jar:/home/hadoop/hbase-1.2.6/lib/hadoop-mapreduce-client-jobclient-2.5.1.jar:/home/hadoop/hbase-1.2.6/lib/hadoop-mapreduce-client-shuffle-2.5.1.jar:/home/hadoop/hbase-1.2.6/lib/hadoop-yarn-api-2.5.1.jar:/home/hadoop/hbase-1.2.6/lib/hadoop-yarn-client-2.5.1.jar:/home/hadoop/hbase-1.2.6/lib/hadoop-yarn-common-2.5.1.jar:/home/hadoop/hbase-1.2.6/lib/hadoop-yarn-server-common-2.5.1.jar:/home/hadoop/hbase-1.2.6/lib/hbase-annotations-1.2.6.jar:/home/hadoop/hbase-1.2.6/lib/hbase-annotations-1.2.6-tests.jar:/home/hadoop/hbase-1.2.6/lib/hbase-client-1.2.6.jar:/home/hadoop/hbase-1.2.6/lib/hbase-common-1.2.6.jar:/home/hadoop/hbase-1.2.6/lib/hbase-common-1.2.6-tests.jar:/home/hadoop/hbase-1.2.6/lib/hbase-examples-1.2.6.jar:/home/hadoop/hbase-1.2.6/lib/hbase-external-blockcache-1.2.6.jar:/home/hadoop/hbase-1.2.6/lib/hbase-hadoop2-compat-1.2.6.jar:/home/hadoop/hbase-1.2.6/lib/hbase-hadoop-compat-1.2.6.jar:/home/hadoop/hbase-1.2.6/lib/hbase-it-1.2.6.jar:/home/hadoop/hbase-1.2.6/lib/hbase-it-1.2.6-tests.jar:/home/hadoop/hbase-1.2.6/lib/hbase-prefix-tree-1.2.6.jar:/home/hadoop/hbase-1.2.6/lib/hbase-procedure-1.2.6.jar:/home/hadoop/hbase-1.2.6/lib/hbase-proto
col-1.2.6.jar:/home/hadoop/hbase-1.2.6/lib/hbase-resource-bundle-1.2.6.jar:/home/hadoop/hbase-1.2.6/lib/hbase-rest-1.2.6.jar:/home/hadoop/hbase-1.2.6/lib/hbase-server-1.2.6.jar:/home/hadoop/hbase-1.2.6/lib/hbase-server-1.2.6-tests.jar:/home/hadoop/hbase-1.2.6/lib/hbase-shell-1.2.6.jar:/home/hadoop/hbase-1.2.6/lib/hbase-thrift-1.2.6.jar:/home/hadoop/hbase-1.2.6/lib/htrace-core-3.1.0-incubating.jar:/home/hadoop/hbase-1.2.6/lib/httpclient-4.2.5.jar:/home/hadoop/hbase-1.2.6/lib/httpcore-4.4.1.jar:/home/hadoop/hbase-1.2.6/lib/jackson-core-asl-1.9.13.jar:/home/hadoop/hbase-1.2.6/lib/jackson-jaxrs-1.9.13.jar:/home/hadoop/hbase-1.2.6/lib/jackson-mapper-asl-1.9.13.jar:/home/hadoop/hbase-1.2.6/lib/jackson-xc-1.9.13.jar:/home/hadoop/hbase-1.2.6/lib/jamon-runtime-2.4.1.jar:/home/hadoop/hbase-1.2.6/lib/jasper-compiler-5.5.23.jar:/home/hadoop/hbase-1.2.6/lib/jasper-runtime-5.5.23.jar:/home/hadoop/hbase-1.2.6/lib/javax.inject-1.jar:/home/hadoop/hbase-1.2.6/lib/java-xmlbuilder-0.4.jar:/home/hadoop/hbase-1.2.6/lib/jaxb-api-2.2.2.jar:/home/hadoop/hbase-1.2.6/lib/jaxb-impl-2.2.3-1.jar:/home/hadoop/hbase-1.2.6/lib/jcodings-1.0.8.jar:/home/hadoop/hbase-1.2.6/lib/jersey-client-1.9.jar:/home/hadoop/hbase-1.2.6/lib/jersey-core-1.9.jar:/home/hadoop/hbase-1.2.6/lib/jersey-guice-1.9.jar:/home/hadoop/hbase-1.2.6/lib/jersey-json-1.9.jar:/home/hadoop/hbase-1.2.6/lib/jersey-server-1.9.jar:/home/hadoop/hbase-1.2.6/lib/jets3t-0.9.0.jar:/home/hadoop/hbase-1.2.6/lib/jettison-1.3.3.jar:/home/hadoop/hbase-1.2.6/lib/jetty-6.1.26.jar:/home/hadoop/hbase-1.2.6/lib/jetty-sslengine-6.1.26.jar:/home/hadoop/hbase-1.2.6/lib/jetty-util-6.1.26.jar:/home/hadoop/hbase-1.2.6/lib/joni-2.1.2.jar:/home/hadoop/hbase-1.2.6/lib/jruby-complete-1.6.8.jar:/home/hadoop/hbase-1.2.6/lib/jsch-0.1.42.jar:/home/hadoop/hbase-1.2.6/lib/jsp-2.1-6.1.14.jar:/home/hadoop/hbase-1.2.6/lib/jsp-api-2.1-6.1.14.jar:/home/hadoop/hbase-1.2.6/lib/junit-4.12.jar:/home/hadoop/hbase-1.2.6/lib/leveldbjni-all-1.8.jar:/home/hadoop/hbase-1.2.6/lib/libt
hrift-0.9.3.jar:/home/hadoop/hbase-1.2.6/lib/log4j-1.2.17.jar:/home/hadoop/hbase-1.2.6/lib/metrics-core-2.2.0.jar:/home/hadoop/hbase-1.2.6/lib/netty-all-4.0.23.Final.jar:/home/hadoop/hbase-1.2.6/lib/paranamer-2.3.jar:/home/hadoop/hbase-1.2.6/lib/protobuf-java-2.5.0.jar:/home/hadoop/hbase-1.2.6/lib/servlet-api-2.5-6.1.14.jar:/home/hadoop/hbase-1.2.6/lib/servlet-api-2.5.jar:/home/hadoop/hbase-1.2.6/lib/slf4j-api-1.7.7.jar:/home/hadoop/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar:/home/hadoop/hbase-1.2.6/lib/snappy-java-1.0.4.1.jar:/home/hadoop/hbase-1.2.6/lib/spymemcached-2.11.6.jar:/home/hadoop/hbase-1.2.6/lib/xmlenc-0.52.jar:/home/hadoop/hbase-1.2.6/lib/xz-1.0.jar:/home/hadoop/hbase-1.2.6/lib/zookeeper-3.4.6.jar:/home/hadoop/hadoop-2.8.1/etc/hadoop:/home/hadoop/hadoop-2.8.1/share/hadoop/common/lib/*:/home/hadoop/hadoop-2.8.1/share/hadoop/common/*:/home/hadoop/hadoop-2.8.1/share/hadoop/hdfs:/home/hadoop/hadoop-2.8.1/share/hadoop/hdfs/lib/*:/home/hadoop/hadoop-2.8.1/share/hadoop/hdfs/*:/home/hadoop/hadoop-2.8.1/share/hadoop/yarn/lib/*:/home/hadoop/hadoop-2.8.1/share/hadoop/yarn/*:/home/hadoop/hadoop-2.8.1/share/hadoop/mapreduce/lib/*:/home/hadoop/hadoop-2.8.1/share/hadoop/mapreduce/*:/home/hadoop/hadoop-2.8.1//contrib/capacity-scheduler/*.jar:/home/hadoop/hbase-1.2.6/conf:/home/hadoop/apache-hive-2.1.1/lib/*' -Djava.library.path=:/home/hadoop/hadoop-2.8.1/lib/native:/home/hadoop/hadoop-2.8.1/lib/native org.apache.flume.node.Application --conf-file conf/flume.conf --name a1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/home/hadoop/apache-flume-1.9.0-bin/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hadoop-2.8.1/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/hbase-1.2.6/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/home/hadoop/apache-hive-2.1.1/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
2019-08-03 11:19:25,704 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider.start(PollingPropertiesFileConfigurationProvider.java:62)] Configuration provider starting
2019-08-03 11:19:25,710 (conf-file-poller-0) [INFO - org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:138)] Reloading configuration file:conf/flume.conf
2019-08-03 11:19:25,714 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:s1
2019-08-03 11:19:25,715 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:s1
2019-08-03 11:19:25,715 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1117)] Added sinks: k1 Agent: a1
2019-08-03 11:19:25,715 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:k1
2019-08-03 11:19:25,715 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:k1
2019-08-03 11:19:25,715 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:k1
2019-08-03 11:19:25,715 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:k1
2019-08-03 11:19:25,715 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:s1
2019-08-03 11:19:25,715 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:k1
2019-08-03 11:19:25,716 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:k1
2019-08-03 11:19:25,716 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:k1
2019-08-03 11:19:25,716 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:s1
2019-08-03 11:19:25,716 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:c1
2019-08-03 11:19:25,716 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:k1
2019-08-03 11:19:25,717 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:s1
2019-08-03 11:19:25,717 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:k1
2019-08-03 11:19:25,717 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:k1
2019-08-03 11:19:25,717 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:c1
2019-08-03 11:19:25,717 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addComponentConfig(FlumeConfiguration.java:1203)] Processing:k1
2019-08-03 11:19:25,718 (conf-file-poller-0) [WARN - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.validateConfigFilterSet(FlumeConfiguration.java:623)] Agent configuration for 'a1' has no configfilters.
2019-08-03 11:19:25,742 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:163)] Post-validation flume configuration contains configuration for agents: [a1]
2019-08-03 11:19:25,742 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:151)] Creating channels
2019-08-03 11:19:25,748 (conf-file-poller-0) [INFO - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:42)] Creating instance of channel c1 type memory
2019-08-03 11:19:25,751 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:205)] Created channel c1
2019-08-03 11:19:25,751 (conf-file-poller-0) [INFO - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:41)] Creating instance of source s1, type netcat
2019-08-03 11:19:25,755 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:42)] Creating instance of sink: k1, type: hdfs
2019-08-03 11:19:25,766 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:120)] Channel c1 connected to [s1, k1]
2019-08-03 11:19:25,770 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:162)] Starting new configuration:{ sourceRunners:{s1=EventDrivenSourceRunner: { source:org.apache.flume.source.NetcatSource{name:s1,state:IDLE} }} sinkRunners:{k1=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@1c14ff02 counterGroup:{ name:null counters:{} } }} channels:{c1=org.apache.flume.channel.MemoryChannel{name: c1}} }
2019-08-03 11:19:25,773 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:169)] Starting Channel c1
2019-08-03 11:19:25,778 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:184)] Waiting for channel: c1 to start. Sleeping for 500 ms
2019-08-03 11:19:25,824 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: CHANNEL, name: c1: Successfully registered new MBean.
2019-08-03 11:19:25,824 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: CHANNEL, name: c1 started
2019-08-03 11:19:26,279 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:196)] Starting Sink k1
2019-08-03 11:19:26,283 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:207)] Starting Source s1
2019-08-03 11:19:26,287 (lifecycleSupervisor-1-4) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:155)] Source starting
2019-08-03 11:19:26,301 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: SINK, name: k1: Successfully registered new MBean.
2019-08-03 11:19:26,301 (lifecycleSupervisor-1-1) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(MonitoredCounterGroup.java:95)] Component type: SINK, name: k1 started
2019-08-03 11:19:26,313 (lifecycleSupervisor-1-4) [INFO - org.apache.flume.source.NetcatSource.start(NetcatSource.java:166)] Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/0:0:0:0:0:0:0:0:44444]

Open another session (session 2) and write some data:

[root@master ~]# telnet master 44444
Trying 10.0.1.118...
Connected to master.
Escape character is '^]'.
this is a text 1  
OK
this is a text 2
OK
this is a text 3
OK

Wait a moment, then write lines 4 and 6; because the roll interval has passed, they are written to a new file.

Session 1's output shows the data has been written to HDFS:

2019-08-03 11:20:12,371 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.HDFSDataStream.configure(HDFSDataStream.java:57)] Serializer = TEXT, UseRawLocalFileSystem = false
2019-08-03 11:20:12,702 (SinkRunner-PollingRunner-DefaultSinkProcessor) [INFO - org.apache.flume.sink.hdfs.BucketWriter.open(BucketWriter.java:246)] Creating hdfs://master:9000/flume-collection/2019-08-03/FlumeData.1564802412372.tmp
2019-08-03 11:20:44,355 (hdfs-k1-roll-timer-0) [INFO - org.apache.flume.sink.hdfs.HDFSEventSink$1.run(HDFSEventSink.java:393)] Writer callback called.
2019-08-03 11:20:44,356 (hdfs-k1-roll-timer-0) [INFO - org.apache.flume.sink.hdfs.BucketWriter.doClose(BucketWriter.java:438)] Closing hdfs://master:9000/flume-collection/2019-08-03/FlumeData.1564802412372.tmp
2019-08-03 11:20:44,456 (hdfs-k1-call-runner-6) [INFO - org.apache.flume.sink.hdfs.BucketWriter$7.call(BucketWriter.java:681)] Renaming hdfs://master:9000/flume-collection/2019-08-03/FlumeData.1564802412372.tmp to hdfs://master:9000/flume-collection/2019-08-03/FlumeData.1564802412372

As shown, Flume first creates a temporary file in HDFS ending in .tmp; when the configured roll interval or file size is reached, it automatically renames the file.

Check the data files in HDFS:

[hadoop@master ~]$ hadoop fs -cat  /flume-collection/2019-08-03/FlumeData.1564802412372
this is a text 1
this is a text 2
this is a text 3
[hadoop@master ~]$ hadoop fs -cat  /flume-collection/2019-08-03/FlumeData.1564802609786
this is a text 4
this is a text 6
[hadoop@master ~]$ 

 

Problem: org.apache.flume.EventDeliveryException: java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null

 Cause: the HDFS path is partitioned by date (%Y-%m-%d), but the source did not add a timestamp to the event headers, so events cannot be assigned to a partition.

Solutions:

       1. Use the local timestamp on the sink: a1.sinks.k1.hdfs.useLocalTimeStamp=true

       2. Add a timestamp with an interceptor: a1.sources.s1.interceptors.t1.type=timestamp (t1 is the interceptor name)

      3. Add a timestamp to the event header when the source sends the event
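Option 2 can be sketched as a config fragment for this article's agent (the interceptor name t1 is arbitrary):

```
a1.sources.s1.interceptors = t1
a1.sources.s1.interceptors.t1.type = timestamp
```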

2019-08-03 11:09:27,595 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:158)] Unable to deliver event. Exception follows.
org.apache.flume.EventDeliveryException: java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:464)
        at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:67)
        at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:145)
        at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.NullPointerException: Expected timestamp in the Flume event headers, but it was null
        at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:204)
        at org.apache.flume.formatter.output.BucketPath.replaceShorthand(BucketPath.java:251)
        at org.apache.flume.formatter.output.BucketPath.escapeString(BucketPath.java:460)
        at org.apache.flume.sink.hdfs.HDFSEventSink.process(HDFSEventSink.java:379)
        ... 3 more

With that, the installation is complete!

Summary:

1. Understand what Flume is for and how it works

2. Use that understanding to configure and stand up a Flume service, hands-on

3. Become familiar with the relevant configuration options
