Flume addresses log-transport problems such as high latency, fault tolerance, load balancing, and compression.
Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.
Configuration file
A) Configure the Source
B) Configure the Channel
C) Configure the Sink
D) Wire the three components together
a1: agent name
r1: source name
k1: sink name
c1: channel name
a1.sources=r1
a1.sinks=k1
a1.channels=c1
# required
a1.sources.r1.type=netcat
a1.sources.r1.bind=hadoop000
a1.sources.r1.port=44444
a1.sinks.k1.type=logger
a1.channels.c1.type=memory
a1.sources.r1.channels=c1  # note: a source can fan out to multiple channels (plural "channels")
a1.sinks.k1.channel=c1     # note: a sink reads from exactly one channel (singular "channel")
Start the agent:
flume-ng agent \
  --name a1 \
  --conf $FLUME_HOME/conf \
  --conf-file $FLUME_HOME/conf/example.conf \
  -Dflume.root.logger=INFO,console
Event: { headers:{} body: 68 65 6C 6C 6F 0D hello.}
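The body in the logger output is just raw bytes. Reconstructing them in the shell confirms the payload (a quick sanity check, not part of Flume itself):

```shell
# Rebuild the Event body from its hex bytes:
# 68 65 6C 6C 6F = "hello"; the trailing 0D is the carriage
# return that telnet appends to each line you type.
printf '\x68\x65\x6c\x6c\x6f\x0d'
```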
An Event is the basic unit of data transfer in Flume.
Event = optional headers + byte-array body
To collect logs on server A and deliver them to server B:
Server A (sender): exec source + memory channel + avro sink
Server B (receiver): avro source + memory channel + logger sink
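The A-to-B pipeline above could be written out roughly as follows (agent names, hostnames, the log path, and the port are all illustrative, not from the original notes):

```properties
# Server A: tail a log file and forward events to server B over Avro.
a-agent.sources=exec-source
a-agent.sinks=avro-sink
a-agent.channels=mem-channel

a-agent.sources.exec-source.type=exec
a-agent.sources.exec-source.command=tail -F /var/log/app.log

a-agent.sinks.avro-sink.type=avro
a-agent.sinks.avro-sink.hostname=serverB
a-agent.sinks.avro-sink.port=44444

a-agent.channels.mem-channel.type=memory

a-agent.sources.exec-source.channels=mem-channel
a-agent.sinks.avro-sink.channel=mem-channel

# Server B: receive Avro events from A and print them to the log.
b-agent.sources=avro-source
b-agent.sinks=log-sink
b-agent.channels=mem-channel

b-agent.sources.avro-source.type=avro
b-agent.sources.avro-source.bind=serverB
b-agent.sources.avro-source.port=44444

b-agent.sinks.log-sink.type=logger

b-agent.channels.mem-channel.type=memory

b-agent.sources.avro-source.channels=mem-channel
b-agent.sinks.log-sink.channel=mem-channel
```

Start the agent on server B first, so the avro sink on server A has a listener to connect to.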
Flume push approach (Flume pushes events via an avro sink to a Spark receiver)
Flume pull approach (the config below: Flume buffers events in a SparkSink, and Spark Streaming pulls from it)
simple-agent.sources=netcat-source
simple-agent.sinks=spark-sink
simple-agent.channels=memory-channel
simple-agent.sources.netcat-source.type=netcat
simple-agent.sources.netcat-source.bind=hadoop001
simple-agent.sources.netcat-source.port=44444
simple-agent.sinks.spark-sink.type=org.apache.spark.streaming.flume.sink.SparkSink
simple-agent.sinks.spark-sink.hostname=hadoop001
simple-agent.sinks.spark-sink.port=41414
simple-agent.channels.memory-channel.type=memory
simple-agent.sources.netcat-source.channels=memory-channel
simple-agent.sinks.spark-sink.channel=memory-channel
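A plausible way to launch this agent, following the same pattern as the earlier example (the config filename is illustrative; note that for the pull approach the Spark sink jar, e.g. spark-streaming-flume-sink, must already be on Flume's classpath for the custom sink type to load):

```shell
# Start the pull-mode agent; Spark Streaming then connects to
# hadoop001:41414 and pulls events out of the SparkSink.
flume-ng agent \
  --name simple-agent \
  --conf $FLUME_HOME/conf \
  --conf-file $FLUME_HOME/conf/flume-pull.conf \
  -Dflume.root.logger=INFO,console
```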