Flume official documentation:
http://flume.apache.org/releases/content/1.9.0/FlumeUserGuide.html
1. Install Flume
https://blog.youkuaiyun.com/starkpan/article/details/82765628
2. Requirement 1: move the files in one local directory to another directory
Configuration file name: file-file.conf
Configuration file contents:
# Name the components on this agent
a1.sources = src-1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source: watch all files dropped into a spooling directory
a1.sources.src-1.type = spooldir
a1.sources.src-1.spoolDir = /Users/panstark/Documents/data/job/spider
a1.sources.src-1.fileHeader = true
# Describe the sink: roll the collected events into files under a target directory
a1.sinks.k1.type = file_roll
a1.sinks.k1.sink.directory = /Users/panstark/Documents/data/job
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.src-1.channels = c1
a1.sinks.k1.channel = c1
Start command:
flume-ng agent -n a1 --conf $FLUME_HOME/conf -f $FLUME_HOME/conf/file-file.conf -Dflume.root.logger=INFO,console
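To test, copy a file into the spool directory and check the target directory; by default the spooldir source renames ingested files with a .COMPLETED suffix. The file name below is just a placeholder:
cp test.log /Users/panstark/Documents/data/job/spider/
ls /Users/panstark/Documents/data/job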
3. Tail a file and stream its contents to Kafka in real time
Create a new configuration file named file-kafka.conf
Configuration file contents:
# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1
# Describe/configure the source: tail the file liepin_2019-02-07.json
a2.sources.r1.type = exec
a2.sources.r1.command = tail -F /Users/panstark/Documents/data/job/spider/liepin_2019-02-07.json
# Describe the sink: write the events to Kafka
a2.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a2.sinks.k1.kafka.topic = liepinTopic
a2.sinks.k1.kafka.bootstrap.servers = panstark:9092
a2.sinks.k1.kafka.flumeBatchSize = 20
a2.sinks.k1.kafka.producer.acks = 1
a2.sinks.k1.kafka.producer.linger.ms = 1
a2.sinks.k1.kafka.producer.compression.type = snappy
# Use a channel which buffers events in memory
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a2.sources.r1.channels = c1
a2.sinks.k1.channel = c1
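Before starting the agent, make sure the topic exists. A minimal create command, assuming a Kafka version that still manages topics through ZooKeeper and that ZooKeeper runs on the same host (panstark:2181):
kafka-topics.sh --create --zookeeper panstark:2181 --replication-factor 1 --partitions 1 --topic liepinTopic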
Start command:
flume-ng agent -n a2 --conf $FLUME_HOME/conf -f $FLUME_HOME/conf/file-kafka.conf -Dflume.root.logger=INFO,console
Start a Kafka console consumer to check the messages:
kafka-console-consumer.sh --bootstrap-server panstark:9092 --topic liepinTopic --from-beginning
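Since the source runs tail -F, appending a line to the monitored file should show up in the consumer almost immediately; the JSON below is just a placeholder:
echo '{"test": 1}' >> /Users/panstark/Documents/data/job/spider/liepin_2019-02-07.json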
4. Send log4j output to Flume and print it to the console
Flume configuration file name: log4j-log.conf
Configuration file contents:
# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c1
# Describe/configure the source: listen on local port 4141
a3.sources.r1.type = avro
a3.sources.r1.bind = 0.0.0.0
a3.sources.r1.port = 4141
# Describe the sink: print the received events to the log/console
a3.sinks.k1.type = logger
# Use a channel which buffers events in memory
a3.channels.c1.type = memory
a3.channels.c1.capacity = 1000
a3.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a3.sources.r1.channels = c1
a3.sinks.k1.channel = c1
Start command:
flume-ng agent -n a3 --conf $FLUME_HOME/conf -f $FLUME_HOME/conf/log4j-log.conf -Dflume.root.logger=INFO,console
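Before wiring up log4j you can sanity-check the avro source with Flume's built-in avro-client, which sends the contents of a file to an avro source (the file path is just an example):
flume-ng avro-client --conf $FLUME_HOME/conf -H localhost -p 4141 -F /tmp/test.log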
log4j configuration file for the Java project. The project also needs Flume's log4j appender on its classpath (Maven artifact org.apache.flume.flume-ng-clients:flume-ng-log4jappender) together with its dependencies:
log4j.rootLogger=INFO,stdout,flume
log4j.appender.stdout = org.apache.log4j.ConsoleAppender
log4j.appender.stdout.target = System.out
log4j.appender.stdout.layout = org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss,SSS} [%t] [%c] [%p] - %m%n
log4j.appender.flume = org.apache.flume.clients.log4jappender.Log4jAppender
# Hostname must point at the Flume agent's avro source (a connect address, not a bind address like 0.0.0.0)
log4j.appender.flume.Hostname = localhost
log4j.appender.flume.Port = 4141
log4j.appender.flume.UnsafeMode = true
A Java example that generates log messages:
import org.apache.log4j.Logger;

/**
 * Simulates a log producer: writes one INFO message per second.
 */
public class LoggerGenerator {

    private static Logger logger = Logger.getLogger(LoggerGenerator.class.getName());

    public static void main(String[] args) throws InterruptedException {
        int index = 0;
        while (true) {
            Thread.sleep(1000);
            logger.info("value is " + index++);
        }
    }
}
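Run the class and the agent's console should print one event per second. One way to launch it, assuming a Maven project with the exec plugin available and the class in the default package:
mvn compile exec:java -Dexec.mainClass=LoggerGenerator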
5. Send log4j output to Flume, and from Flume on into Kafka
Configuration file name: log4j-flume-kafka.conf
Configuration file contents:
# Name the components on this agent
a4.sources = r1
a4.sinks = k1
a4.channels = c1
# Describe/configure the source: listen on local port 4141
a4.sources.r1.type = avro
a4.sources.r1.bind = 0.0.0.0
a4.sources.r1.port = 4141
# Describe the sink: write the events to Kafka
a4.sinks.k1.type = org.apache.flume.sink.kafka.KafkaSink
a4.sinks.k1.kafka.topic = liepinTopic
a4.sinks.k1.kafka.bootstrap.servers = localhost:9092
a4.sinks.k1.kafka.flumeBatchSize = 20
a4.sinks.k1.kafka.producer.acks = 1
a4.sinks.k1.kafka.producer.linger.ms = 1
a4.sinks.k1.kafka.producer.compression.type = snappy
# Use a channel which buffers events in memory
a4.channels.c1.type = memory
a4.channels.c1.capacity = 1000
a4.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a4.sources.r1.channels = c1
a4.sinks.k1.channel = c1
Start ZooKeeper:
zkServer.sh start
Start Kafka in the background:
kafka-server-start.sh -daemon $KAFKA_HOME/config/server.properties
Start Flume:
flume-ng agent -n a4 --conf $FLUME_HOME/conf -f $FLUME_HOME/conf/log4j-flume-kafka.conf -Dflume.root.logger=INFO,console
Start a Kafka console consumer to check the messages:
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic liepinTopic --from-beginning
Start the project to generate logs:
See the LoggerGenerator example in the previous section.