Flume Log Collection Configuration in Practice

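This article describes in detail how to configure a Flume log collection system, covering installation and configuration, failover, and load balancing, and how to combine Sources, Channels, and Sinks for efficient, stable data transfer. It provides concrete configuration examples using the exec source, syslogtcp source, memory channel, hdfs sink, and logger sink.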

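I. Installation

--- Log collection tool ---

Use case: a system for collecting, aggregating, and transporting massive volumes of log data.

Flume lets a logging system plug in custom data senders to collect data, applies simple processing to it, and writes it to a variety of data receivers.

--- Installation and configuration ---
1. Modify the environment variables. Open /etc/profile with vim:
sudo vim /etc/profile
Append: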
#flume config
export FLUME_HOME=/apps/flume
export FLUME_CONF_DIR=$FLUME_HOME/conf
export PATH=$FLUME_HOME/bin:$PATH
Apply the change:
source /etc/profile

2. Configure Flume. Switch to the /apps/flume/conf directory and rename the configuration template flume-env.sh.template to flume-env.sh. Edit it to set:
export JAVA_HOME=/apps/java
3. Test the installation:
flume-ng version

II. Failover

1. Switch to the /apps/flume/conf directory and create the file failover_sink.conf.
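
The post does not show the contents of failover_sink.conf. As a minimal sketch, assuming it mirrors the load_sink.conf layout from the load-balancing section (a syslogtcp source on port 5140 feeding two avro sinks on ports 44433 and 44455), the file could be written as below. The priority values and maxpenalty are illustrative assumptions, not values from the original; with the failover processor, the sink with the higher priority value is tried first.

```shell
# Sketch only: writes a hypothetical failover_sink.conf into the current
# directory (the post works in /apps/flume/conf). Priorities are assumed values.
cat > failover_sink.conf <<'EOF'
# Name the components on this agent
a1.sources = r1
a1.sinks = k1 k2
a1.channels = c1

# Group both sinks under a failover processor
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
a1.sinkgroups.g1.processor.maxpenalty = 10000

# Syslog TCP source; 5140 matches the nc test command
a1.sources.r1.type = syslogtcp
a1.sources.r1.port = 5140
a1.sources.r1.host = localhost
a1.sources.r1.channels = c1

# Avro sinks pointing at the a2 (44433) and a3 (44455) agents
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = localhost
a1.sinks.k1.port = 44433

a1.sinks.k2.type = avro
a1.sinks.k2.channel = c1
a1.sinks.k2.hostname = localhost
a1.sinks.k2.port = 44455

# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
EOF
```

With these assumed priorities, events should show up in the a2 console while a2 is running; stopping a2 with Ctrl+C should shift delivery to a3.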

2. Create failover_s1.conf:


# Name the components on this agent
a2.sources = r1
a2.sinks = k1
a2.channels = c1
 
# Describe/configure the source
a2.sources.r1.type = avro
a2.sources.r1.channels = c1
a2.sources.r1.bind = localhost
a2.sources.r1.port = 44433
 
# Describe the sink
a2.sinks.k1.type = logger
a2.sinks.k1.channel = c1

# Use a channel which buffers events in memory
a2.channels.c1.type = memory
a2.channels.c1.capacity = 1000
a2.channels.c1.transactionCapacity = 100

3. Create failover_s2.conf:


# Name the components on this agent
a3.sources = r1
a3.sinks = k1
a3.channels = c1
 
# Describe/configure the source
a3.sources.r1.type = avro
a3.sources.r1.channels = c1
a3.sources.r1.bind = localhost
a3.sources.r1.port = 44455
 
# Describe the sink
a3.sinks.k1.type = logger
a3.sinks.k1.channel = c1

# Use a channel which buffers events in memory
a3.channels.c1.type = memory
a3.channels.c1.capacity = 1000
a3.channels.c1.transactionCapacity = 100

4. In the current directory (/apps/flume/conf), run the following command to start the agent that listens on port 44433:

flume-ng agent -c . -f failover_s1.conf -n a2 -Dflume.root.logger=INFO,console

5. Open another terminal and run the following command to start the agent that listens on port 44455:

flume-ng agent -c . -f failover_s2.conf -n a3 -Dflume.root.logger=INFO,console

6. Open a third terminal and run the following command to start the agent defined in failover_sink.conf:

flume-ng agent -c . -f failover_sink.conf -n a1 -Dflume.root.logger=INFO,console

7. Open a fourth terminal and generate a test log event:

echo "<37>bingo1" | nc localhost 5140
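
The `<37>` prefix in the test message is the syslog priority (PRI) field, which encodes facility * 8 + severity. As a small worked example (the function name decode_pri is only illustrative), 37 splits into facility 4 and severity 5:

```shell
# decode_pri PRI -> prints the facility and severity encoded in a syslog PRI value
decode_pri() {
  pri=$1
  facility=$((pri / 8))   # integer division gives the facility
  severity=$((pri % 8))   # remainder gives the severity
  echo "facility=$facility severity=$severity"
}

decode_pri 37   # prints: facility=4 severity=5
```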

III. Load balancing (load_balance)

1. Switch to the /apps/flume/conf directory and create the file load_sink.conf:

a1.sources = r1  
a1.sinks = k1 k2  
a1.channels = c1  
   
a1.sinkgroups = g1  
a1.sinkgroups.g1.sinks = k1 k2  
a1.sinkgroups.g1.processor.type = load_balance  
a1.sinkgroups.g1.processor.backoff = true  
a1.sinkgroups.g1.processor.selector = round_robin  
   
# Describe/configure the source  
a1.sources.r1.type = syslogtcp  
a1.sources.r1.port = 5140  
a1.sources.r1.host = localhost  
a1.sources.r1.channels = c1  
   
# Describe the sink  
a1.sinks.k1.type = avro  
a1.sinks.k1.channel = c1  
a1.sinks.k1.hostname = localhost  
a1.sinks.k1.port = 44433  
   
a1.sinks.k2.type = avro  
a1.sinks.k2.channel = c1  
a1.sinks.k2.hostname = localhost  
a1.sinks.k2.port = 44455  
   
# Use a channel which buffers events in memory  
a1.channels.c1.type = memory  
a1.channels.c1.capacity = 1000  
a1.channels.c1.transactionCapacity = 100
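
With processor.selector = round_robin, the sink group hands work to k1 and k2 in turn. The rotation itself can be pictured with a few lines of shell. This is only an illustration of the ordering (it ignores backoff and batching, and pick_sink is a made-up helper), not Flume's implementation:

```shell
# pick_sink N -> sink chosen on the N-th turn (N starts at 1) of a two-sink rotation
pick_sink() {
  turn=$1
  if [ $((turn % 2)) -eq 1 ]; then
    echo k1
  else
    echo k2
  fi
}

# Show which sink each of the first five turns would go to
for turn in 1 2 3 4 5; do
  echo "turn $turn -> $(pick_sink "$turn")"
done
```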

2. Create load_s1.conf:

#Name the components on this agent  
a2.sources = r1  
a2.sinks = k1  
a2.channels = c1  
   
#Describe/configure the source  
a2.sources.r1.type = avro  
a2.sources.r1.channels = c1  
a2.sources.r1.bind = localhost  
a2.sources.r1.port = 44433  
   
#Describe the sink  
a2.sinks.k1.type = logger  
a2.sinks.k1.channel = c1  
 
#Use a channel which buffers events in memory  
a2.channels.c1.type = memory  
a2.channels.c1.capacity = 1000  
a2.channels.c1.transactionCapacity = 100

3. Create load_s2.conf:

#Name the components on this agent  
a3.sources = r1  
a3.sinks = k1  
a3.channels = c1  
   
#Describe/configure the source  
a3.sources.r1.type = avro  
a3.sources.r1.channels = c1  
a3.sources.r1.bind = localhost  
a3.sources.r1.port = 44455  
   
#Describe the sink  
a3.sinks.k1.type = logger  
a3.sinks.k1.channel = c1  
 
#Use a channel which buffers events in memory  
a3.channels.c1.type = memory  
a3.channels.c1.capacity = 1000  
a3.channels.c1.transactionCapacity = 100

4. In the current directory (/apps/flume/conf), run the following command to start the agent that listens on port 44433:

flume-ng agent -c . -f load_s1.conf -n a2 -Dflume.root.logger=INFO,console

5. Open another terminal and run the following command to start the agent that listens on port 44455:

flume-ng agent -c . -f load_s2.conf -n a3 -Dflume.root.logger=INFO,console

6. Open a third terminal and run the following command to start the agent defined in load_sink.conf:

flume-ng agent -c . -f load_sink.conf -n a1 -Dflume.root.logger=INFO,console

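7. Open a fourth terminal and generate test log events. Send them one line at a time; if they are entered too quickly, they tend to all land on the same agent.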

echo "<37>bingo1" | nc localhost 5140

echo "<37>bingo2" | nc localhost 5140

echo "<37>bingo3" | nc localhost 5140

echo "<37>bingo4" | nc localhost 5140

echo "<37>bingo5" | nc localhost 5140
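
To avoid typing the five commands by hand while still sending them one at a time, the loop below pauses between events. It is a sketch that assumes nc is installed and that agent a1 is listening on localhost:5140; make_event and send_events are illustrative names, and the one-second pause is an arbitrary choice.

```shell
# make_event PRI N -> prints one test line in the "<PRI>bingoN" form used above
make_event() {
  printf '<%s>bingo%s\n' "$1" "$2"
}

# send_events COUNT -> sends COUNT events, one connection each, pausing in between
send_events() {
  count=$1
  i=1
  while [ "$i" -le "$count" ]; do
    make_event 37 "$i" | nc localhost 5140
    sleep 1
    i=$((i + 1))
  done
}

# Usage (requires the a1 agent to be running):
#   send_events 5
```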

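Flume configuration: Source, Channel, Sink

Scenario 1: source: exec, channel: memory, sink: logger. The input data is the goods file in the /data/flume2/ directory.

Scenario 1 is the simplest Flume configuration. It is built in four steps: first declare the components; then set the Source type to exec with the command tail -n 20 /data/flume2/goods (which prints the last 20 lines of the goods file in /data/flume2); next set the Channel type to memory and the Sink type to logger; finally wire the components together by setting the channel of both the Source and the Sink to ch.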
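
Because the exec source turns each line printed by the command into one event, tail -n 20 yields at most 20 events and then exits. A quick way to see what the source will read is to build a sample file and run the same command by hand. The sketch below uses a temporary directory as a stand-in for /data/flume2, and the record text is made up:

```shell
# Stand-in for /data/flume2 so the demo does not need root (assumption)
demo_dir=$(mktemp -d)

# Write 50 made-up records, one per line
i=1
while [ "$i" -le 50 ]; do
  echo "goods-record-$i"
  i=$((i + 1))
done > "$demo_dir/goods"

# Same command the exec source runs, pointed at the demo file
tail -n 20 "$demo_dir/goods" | head -n 1   # first of the last 20 lines: goods-record-31
tail -n 20 "$demo_dir/goods" | wc -l       # number of lines, i.e. number of events
```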

1. Switch to the /apps/flume/conf directory and use vim to create a conf file named exec_mem_logger.conf:

# Declare the components
agent1.sources  = src
agent1.channels = ch
agent1.sinks    = des

# Configure the source
# The /data/flume2 directory and the goods file must already exist
agent1.sources.src.type = exec
agent1.sources.src.command = tail -n 20 /data/flume2/goods

# Configure the channel
agent1.channels.ch.type = memory

# Configure the sink
agent1.sinks.des.type = logger

## Wire the components defined above together (connect the dots)
agent1.sources.src.channels = ch
agent1.sinks.des.channel    = ch

2. Run the command:

flume-ng agent -c /apps/flume/conf -f /apps/flume/conf/exec_mem_logger.conf -n agent1 -Dflume.root.logger=DEBUG,console

Press Ctrl+C to stop Flume.

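Scenario 2: source: exec, channel: memory, sink: hdfs. Compared with scenario 1, the Sink type changes to hdfs. The component declarations and the Source configuration are unchanged. The Channel is configured with a maximum capacity of 1000000 events and a transactionCapacity (the maximum number of events per transaction) of 100. The Sink type becomes hdfs, with the path set to hdfs://localhost:9000/myflume2/exec_mem_hdfs/%Y%m%d/, where %Y%m%d stands for the year, month, and day; the file type is DataStream and the write format is Text. Several settings decide when the sink rolls over to a new HDFS file: rollInterval rolls based on elapsed time in seconds, and 0 disables the time check; rollSize rolls based on file size in bytes, and 0 disables the size check; rollCount rolls based on the number of events written, and 0 disables the count check; idleTimeout closes a file after it has been idle for the given number of seconds, and 0 disables the idle check. Finally, as in scenario 1, the Source, Channel, and Sink are wired together by setting the channel of both the Source and the Sink to ch.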
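
The three roll settings act as independent thresholds: whichever one is reached first causes a new file, and a value of 0 switches that check off. The sketch below models only that decision, using the values from exec_mem_hdfs.conf; should_roll is a hypothetical helper for illustration, not part of Flume.

```shell
# Thresholds copied from exec_mem_hdfs.conf; setting one to 0 would disable that check
ROLL_INTERVAL=30     # seconds
ROLL_SIZE=100000     # bytes
ROLL_COUNT=10000     # events

# should_roll AGE_SECONDS BYTES_WRITTEN EVENTS_WRITTEN
# Prints the name of the first threshold that has been reached, or "none".
should_roll() {
  age=$1; bytes=$2; events=$3
  if [ "$ROLL_INTERVAL" -gt 0 ] && [ "$age" -ge "$ROLL_INTERVAL" ]; then
    echo "rollInterval"
  elif [ "$ROLL_SIZE" -gt 0 ] && [ "$bytes" -ge "$ROLL_SIZE" ]; then
    echo "rollSize"
  elif [ "$ROLL_COUNT" -gt 0 ] && [ "$events" -ge "$ROLL_COUNT" ]; then
    echo "rollCount"
  else
    echo "none"
  fi
}

should_roll 5 2048 120      # prints: none
should_roll 5 150000 120    # prints: rollSize
```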

1. Use vim to create a conf file named exec_mem_hdfs.conf:

# Declare the components
agent1.sources  = src
agent1.channels = ch
agent1.sinks    = des

# Configure the source
# The file /data/flume2/goods must already exist
agent1.sources.src.type = exec
agent1.sources.src.command = tail -n 20 /data/flume2/goods

# Configure the channel
agent1.channels.ch.type = memory
agent1.channels.ch.keep-alive = 30
agent1.channels.ch.capacity = 1000000
agent1.channels.ch.transactionCapacity = 100

# Configure the sink
agent1.sinks.des.type = hdfs
agent1.sinks.des.hdfs.path = hdfs://localhost:9000/myflume2/exec_mem_hdfs/%Y%m%d/
agent1.sinks.des.hdfs.useLocalTimeStamp = true

# Prefix Flume's in-progress temporary files with . or _ so that Hive ignores them when loading
agent1.sinks.des.hdfs.inUsePrefix=_
# Prefix for the files Flume writes
agent1.sinks.des.hdfs.filePrefix = abc
agent1.sinks.des.hdfs.fileType = DataStream
agent1.sinks.des.hdfs.writeFormat = Text
# Roll to a new file after this many seconds; 0 disables time-based rolling
agent1.sinks.des.hdfs.rollInterval = 30
# Roll to a new file once the current file reaches this size in bytes; 0 disables size-based rolling
agent1.sinks.des.hdfs.rollSize = 100000
# Roll to a new file after this many events; 0 disables count-based rolling
agent1.sinks.des.hdfs.rollCount = 10000
# Close the file after it has been idle for this many seconds
agent1.sinks.des.hdfs.idleTimeout = 30
## Wire the components defined above together (connect the dots)
agent1.sources.src.channels = ch
agent1.sinks.des.channel    = ch

2. Run the command (Hadoop must already be running):

flume-ng agent -c /apps/flume/conf -f /apps/flume/conf/exec_mem_hdfs.conf -n agent1 -Dflume.root.logger=DEBUG,console
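
Every property in these files is keyed by the agent name, so a line whose key starts with a misspelled name does not configure agent1 and is easy to overlook. A quick pre-flight check, sketched below, lists property lines that do not start with the expected prefix. check_prefix is a hypothetical helper, and the sample file contains a deliberately misspelled key for demonstration.

```shell
# check_prefix FILE AGENT -> prints property lines whose key does not start with "AGENT."
check_prefix() {
  # Drop comments and blank lines, then drop lines with the expected prefix
  grep -vE '^[[:space:]]*(#|$)' "$1" | grep -v "^$2\."
}

# Demo on a throwaway file with one deliberately misspelled key
sample=$(mktemp)
printf '%s\n' \
  '# demo configuration' \
  'agent1.channels.ch.type = memory' \
  'agnet1.channels.ch.capacity = 1000000' > "$sample"

check_prefix "$sample" agent1   # prints the agnet1 line
```

Against a real file the call would look like check_prefix /apps/flume/conf/exec_mem_hdfs.conf agent1; no output means every property uses the expected agent name.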

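Scenario 3: source: exec, channel: file, sink: hdfs.

Compared with scenario 2, scenario 3 changes the Channel type from memory to file. The component declarations, the Source configuration, and the wiring between components are the same as in scenario 2. The Channel type is set to file, with checkpointDir set to /data/flume2/ckdir (where the file channel stores its checkpoint so that it can recover its state after a restart) and dataDirs set to /data/flume2/datadir (where the event data is stored). The Sink configuration sets useLocalTimeStamp, inUsePrefix, and filePrefix. useLocalTimeStamp controls whether the local timestamp is used, and true enables it. inUsePrefix sets the prefix for temporary files, here "_"; filePrefix sets the prefix for written files, here abc.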
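
The %Y%m%d escape in hdfs.path is filled in from a timestamp; with useLocalTimeStamp = true the sink uses the agent's local clock rather than a timestamp header on the event. The same pattern can be previewed with date, which happens to share these three format codes. This is only a preview of the directory name, not something Flume itself runs:

```shell
# Preview today's target directory for the exec_file_hdfs sink
base="hdfs://localhost:9000/myflume2/exec_file_hdfs"
target="$base/$(date +%Y%m%d)/"
echo "$target"
```

The printed path can then be passed to hadoop fs -ls when checking whether the directory was created.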

1. Use vim to create a conf file named exec_file_hdfs.conf:

# Declare the components
agent1.sources  = src
agent1.channels = ch
agent1.sinks    = des

# Configure the source
# The /data/flume2 directory and the goods file must already exist
agent1.sources.src.type = exec
agent1.sources.src.command = tail -n 20 /data/flume2/goods

# Configure the channel
agent1.channels.ch.type = file
agent1.channels.ch.checkpointDir = /data/flume2/ckdir
agent1.channels.ch.dataDirs = /data/flume2/datadir

# Configure the sink
agent1.sinks.des.type = hdfs
agent1.sinks.des.hdfs.path = hdfs://localhost:9000/myflume2/exec_file_hdfs/%Y%m%d/
agent1.sinks.des.hdfs.useLocalTimeStamp = true

# Prefix Flume's in-progress temporary files with . or _ so that Hive ignores them when loading
agent1.sinks.des.hdfs.inUsePrefix=_
# Prefix for the files Flume writes
agent1.sinks.des.hdfs.filePrefix = abc
agent1.sinks.des.hdfs.fileType = DataStream
agent1.sinks.des.hdfs.writeFormat = Text
# Roll to a new file after this many seconds; 0 disables time-based rolling
agent1.sinks.des.hdfs.rollInterval = 30
# Roll to a new file once the current file reaches this size in bytes; 0 disables size-based rolling
agent1.sinks.des.hdfs.rollSize = 100000
# Roll to a new file after this many events; 0 disables count-based rolling
agent1.sinks.des.hdfs.rollCount = 10000
# Close the file after it has been idle for this many seconds
agent1.sinks.des.hdfs.idleTimeout = 30
## Wire the components defined above together (connect the dots)
agent1.sources.src.channels = ch
agent1.sinks.des.channel    = ch

2. Run the command:

flume-ng agent -c /apps/flume/conf -f /apps/flume/conf/exec_file_hdfs.conf -n agent1 -Dflume.root.logger=DEBUG,console

3. Check whether the target directory was created on HDFS.

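Scenario 4: source: syslogtcp, channel: memory, sink: logger.

Scenario 4 is a fairly simple Flume configuration. It first declares the components, then configures the Source with type syslogtcp, listening port 6868, and host localhost. Next it sets the Channel type to memory and the Sink type to logger, and finally wires the Source, Channel, and Sink together by setting the channel of both the Source and the Sink to ch.

1. Use vim to create a conf file named syslog_mem_logger.conf: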

# Declare the components
agent1.sources  = src
agent1.channels = ch
agent1.sinks    = des

# Configure the source
agent1.sources.src.type = syslogtcp
agent1.sources.src.port = 6868
agent1.sources.src.host = localhost

# Configure the channel
agent1.channels.ch.type = memory

# Configure the sink
agent1.sinks.des.type = logger

## Wire the components defined above together (connect the dots)
agent1.sources.src.channels = ch
agent1.sinks.des.channel    = ch

2. Run the command:

flume-ng agent -c /apps/flume/conf -f /apps/flume/conf/syslog_mem_logger.conf -n agent1 -Dflume.root.logger=DEBUG,console

3. In another terminal, send data to port 6868:

echo "hello can you hear me?" | nc localhost 6868

4. Check the output in the window running the flume command.

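Building a multi-source, multi-sink Flume topology

The Agent is at the core of a running Flume deployment. It is a complete data collection tool with three core components: Source, Channel, and Sink. Through these components, an Event flows from one place to another.

A Source receives data sent by an external origin. Different Sources accept different data formats.

A Channel is a buffer that holds the Source's output until a Sink consumes it.

A Sink consumes data from the Channel and pushes it to an external destination or to the Source of another Agent. When a Sink fails to write, delivery is retried automatically and the events stay in the Channel, so the data is not lost, which makes delivery reliable.

In production, Flume lets multiple Agents be chained together into multi-hop flows, and it supports several topologies. For example, multiple Sources collecting data in different formats can feed a single Sink, or the data collected by one Source can be delivered to multiple Sinks.

Assume three machines, Hadoop1, Hadoop2, and Hadoop3, with Hadoop1 acting as the log aggregation node.

Example

1. First check whether the Hadoop processes are running. If they are not, switch to the /apps/hadoop/sbin directory and start Hadoop.

2. Switch to the /apps/flume/conf directory and create the Flume configuration file.

3. Open syslog_mem_hdfsandlogger.conf with vim: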

# Declare the components
agent1.sources  = src
agent1.channels = ch1 ch2
agent1.sinks    = des1 des2

# Configure the source
agent1.sources.src.type = syslogtcp
agent1.sources.src.host = localhost
agent1.sources.src.port = 6868

# Configure the channels
agent1.channels.ch1.type = memory
agent1.channels.ch2.type = memory

# Configure the hdfs sink
agent1.sinks.des1.type = hdfs
agent1.sinks.des1.hdfs.path = hdfs://localhost:9000/myflume4/syslog_mem_hdfsandlogger/
agent1.sinks.des1.hdfs.useLocalTimeStamp = true
# Prefix Flume's in-progress temporary files with . or _ so that Hive ignores them when loading
agent1.sinks.des1.hdfs.inUsePrefix=_
# Prefix for the files Flume writes
agent1.sinks.des1.hdfs.filePrefix = q7
agent1.sinks.des1.hdfs.fileType = DataStream
agent1.sinks.des1.hdfs.writeFormat = Text
# Roll to a new file after this many seconds; 0 disables time-based rolling
agent1.sinks.des1.hdfs.rollInterval = 20
# Roll to a new file once the current file reaches this size in bytes; 0 disables size-based rolling
agent1.sinks.des1.hdfs.rollSize = 10
# Roll to a new file after this many events; 0 disables count-based rolling
agent1.sinks.des1.hdfs.rollCount = 5
# Close the file after it has been idle for this many seconds
agent1.sinks.des1.hdfs.idleTimeout = 20


# Configure the logger sink
agent1.sinks.des2.type = logger


## Wire the components defined above together (connect the dots)
agent1.sources.src.channels = ch1 ch2
agent1.sinks.des1.channel   = ch1
agent1.sinks.des2.channel   = ch2
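
Listing both ch1 and ch2 on the source relies on Flume's default replicating channel selector, which places a copy of every event in each listed channel, so the HDFS sink and the logger sink both see the same data. As a loose shell analogy only (tee is not part of Flume, and the file names are made up), the fan-out looks like this:

```shell
# Scratch directory standing in for the two channels
work=$(mktemp -d)

# One incoming line is copied to two independent "channels"
echo "hello can you hear me?" | tee "$work/ch1.txt" > "$work/ch2.txt"

# Each "sink" then reads its own copy
cat "$work/ch1.txt"
cat "$work/ch2.txt"
```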

4. Start Flume to begin collecting:

flume-ng agent -c /apps/flume/conf -f /apps/flume/conf/syslog_mem_hdfsandlogger.conf -n agent1 -Dflume.root.logger=DEBUG,console

5. Open a new window and use nc to send a message to port 6868:

echo "hello can you hear me?" | nc localhost 6868

6. Check the output in the window running the flume command.

7. Then check the output on HDFS:

hadoop fs -ls -R /
hadoop fs -cat /myflume4/syslog_mem_hdfsandlogger/*
