Introduction to Flume
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant with tunable reliability mechanisms and many failover and recovery mechanisms. It uses a simple extensible data model that allows for online analytic application.
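In short, Flume is a distributed log collection system that is easy to use and highly fault tolerant.
This walkthrough uses a spooldir source, a Kafka channel (KafkaChannel), and an HDFS sink. Following a production-style layout, it runs two Flume agents.
The first Flume agent pushes data into the Kafka channel.
The second Flume agent moves the data from the Kafka channel into HDFS.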
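The first agent's configuration below chains two interceptors on the source: a regex_filter that keeps only events containing "installed", and a regex_extractor that pulls a leading timestamp into a header. The following is a rough Python sketch of that behavior, not Flume code. The header name `timestamp`, the date pattern, the UTC time zone, and the sample log lines are all assumptions made for illustration, since the serializer's remaining settings are not visible in the configuration here.

```python
import re
from datetime import datetime, timezone

# Effective patterns once the properties loader has collapsed each "\\" to "\".
FILTER_RE = re.compile(r"(.*)installed(.*)")
TIMESTAMP_RE = re.compile(r"^(?:\n)?(\d\d\d\d-\d\d-\d\d\s\d\d:\d\d:\d\d)")


def apply_interceptors(line, header_name="timestamp", pattern="%Y-%m-%d %H:%M:%S"):
    """Return (headers, body) for a kept event, or None if the event is dropped.

    header_name and pattern are hypothetical defaults chosen for this sketch.
    """
    # i1 (regex_filter): with default settings, only matching events are kept.
    if FILTER_RE.search(line) is None:
        return None

    headers = {}
    # i2 (regex_extractor): capture group 1 is handed to the serializer.
    match = TIMESTAMP_RE.search(line)
    if match:
        # Millis serializer: parse the text, store epoch milliseconds as a string.
        # UTC is assumed here so the result does not depend on the local zone.
        parsed = datetime.strptime(match.group(1), pattern).replace(tzinfo=timezone.utc)
        headers[header_name] = str(int(parsed.timestamp() * 1000))
    return headers, line


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from real logs.
    for sample in ("2019-01-01 00:00:00 package foo installed",
                   "2019-01-01 00:00:00 package foo removed"):
        print(apply_interceptors(sample))
```

A millisecond timestamp header of this kind is what an HDFS sink typically relies on when its path uses time escape sequences, which is presumably why the extractor is placed on the first agent.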
flume1
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# spooldir source
a1.sources.r1.type = spooldir
a1.sources.r1.channels = c1
a1.sources.r1.spoolDir = /home/test10
#a1.sources.r1.fileHeader = true
# interceptors: extract the timestamp and apply simple filtering to the data
a1.sources.r1.interceptors=i1 i2
a1.sources.r1.interceptors.i1.type=regex_filter
a1.sources.r1.interceptors.i1.regex=(.*)installed(.*)
a1.sources.r1.interceptors.i2.type = regex_extractor
a1.sources.r1.interceptors.i2.regex = ^(?:\\n)?(\\d\\d\\d\\d-\\d\\d-\\d\\d\\s\\d\\d:\\d\\d:\\d\\d)
a1.sources.r1.interceptors.i2.serializers = s1
a1.sources.r1.interceptors.i2.serializers.s1.type = org.apache.flume.interceptor.RegexExtractorInterceptorMillisSerializer
a1.sources.r1.interceptors.i2.serializers<