Flume Basic Configuration in Detail

This article walks through the key components of a Flume configuration file, the agent's sources, channels, and sinks, and details how to set up the data source, channel capacity, transport, and the target stores (HDFS, Elasticsearch), giving a practical guide to building an efficient data collection and transport pipeline.


############################################

#  producer config

############################################

#agent section

producer.sources = s

producer.channels = c c1 c2

producer.sinks = r h es


#source section

producer.sources.s.type = exec

producer.sources.s.command = tail -f /usr/local/nginx/logs/test1.log
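The exec source simply runs the command and reads its stdout, so it loses data if the process dies. A hedged hardening sketch (optional exec-source properties; values illustrative): "tail -F" keeps following the nginx log across rotation, and restart re-runs the command if it exits.

#producer.sources.s.command = tail -F /usr/local/nginx/logs/test1.log
#producer.sources.s.restart = true
#producer.sources.s.restartThrottle = 10000
#producer.sources.s.batchSize = 20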

#Alternative source: read completed files from a spooling directory instead of tailing
#producer.sources.s.type = spooldir

#producer.sources.s.spoolDir = /usr/local/nginx/logs/

#producer.sources.s.fileHeader = true


producer.sources.s.channels = c c1 c2
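With no selector configured, the source's default replicating selector copies every event to all three channels; spelled out explicitly, that default would be:

#producer.sources.s.selector.type = replicating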


producer.sources.s.interceptors = i

#Case-insensitive matching is not supported

producer.sources.s.interceptors.i.regex = .*\.(css|js|jpg|jpeg|png|gif|ico).*

producer.sources.s.interceptors.i.type = org.apache.flume.interceptor.RegexFilteringInterceptor$Builder

#Drop (exclude) events that match the regex

producer.sources.s.interceptors.i.excludeEvents = true
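Since the interceptor offers no case-insensitivity option, one possible workaround, assuming it compiles the pattern with plain java.util.regex (so inline flags apply), is Java's (?i) flag:

#producer.sources.s.interceptors.i.regex = (?i).*\.(css|js|jpg|jpeg|png|gif|ico).*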


############################################

#   avro config

############################################

producer.channels.c.type = memory

#Timeout in seconds for adding or removing an event

producer.channels.c.keep-alive = 30

producer.channels.c.capacity = 10000

producer.channels.c.transactionCapacity = 10000

producer.channels.c.byteCapacityBufferPercentage = 20

producer.channels.c.byteCapacity = 800000
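byteCapacity counts event body bytes, while byteCapacityBufferPercentage reserves headroom for event headers, so bodies effectively get the remaining share of the limit. A worked example with the values above:

#usable body bytes ≈ byteCapacity * (1 - byteCapacityBufferPercentage/100)
#                  = 800000 * (1 - 0.20) = 640000 bytes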


producer.sinks.r.channel = c


producer.sinks.r.type = avro

producer.sinks.r.hostname  = 127.0.0.1

producer.sinks.r.port = 10101
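The avro sink only sends; a second agent must expose an avro source on the same host and port. A minimal sketch of such a downstream agent (the agent name "collector" and its component names are illustrative, not from the original config):

collector.sources = avroIn
collector.channels = mc
collector.sinks = out

collector.sources.avroIn.type = avro
collector.sources.avroIn.bind = 0.0.0.0
collector.sources.avroIn.port = 10101
collector.sources.avroIn.channels = mc

collector.channels.mc.type = memory

collector.sinks.out.type = logger
collector.sinks.out.channel = mc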

############################################

#   hdfs config

############################################

producer.channels.c1.type = memory

#Timeout in seconds for adding or removing an event

producer.channels.c1.keep-alive = 30

producer.channels.c1.capacity = 10000

producer.channels.c1.transactionCapacity = 10000

producer.channels.c1.byteCapacityBufferPercentage = 20

producer.channels.c1.byteCapacity = 800000


producer.sinks.h.channel = c1


producer.sinks.h.type = hdfs

#Target directory in HDFS (supports time escape sequences)

producer.sinks.h.hdfs.path = hdfs://127.0.0.1/tmp/flume/%Y/%m/%d

#File name prefix

producer.sinks.h.hdfs.filePrefix=nginx-%Y-%m-%d-%H

producer.sinks.h.hdfs.fileType = DataStream

#Required when using time escapes without a timestamp header, otherwise the sink throws an error

producer.sinks.h.hdfs.useLocalTimeStamp = true
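A hedged alternative to useLocalTimeStamp: stamp events at the source with the built-in timestamp interceptor (the interceptor name "t" is illustrative), which keeps the bucketing time close to when the event was read:

#producer.sources.s.interceptors = i t
#producer.sources.s.interceptors.t.type = timestamp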

producer.sinks.h.hdfs.writeFormat = Text

#Roll a new HDFS file after this many seconds; 0 = do not roll based on time

#Number of seconds to wait before rolling current file (0 = never roll based on time interval)

producer.sinks.h.hdfs.rollInterval=0

#Roll a new file once it reaches this size; 0 = do not roll based on file size

#File size to trigger roll, in bytes (0: never roll based on file size)

producer.sinks.h.hdfs.rollSize = 0

#Roll a new file after this many events; 0 = do not roll based on event count

#Number of events written to file before it rolled (0 = never roll based on number of events)

producer.sinks.h.hdfs.rollCount = 0
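With all three triggers set to 0 the sink never rolls on its own; new files appear only when the escaped path or prefix resolves to a new value (hourly here, via %H in the prefix). A hedged alternative that rolls on time or size (values illustrative; ~128 MB matches a common HDFS block size):

#producer.sinks.h.hdfs.rollInterval = 600
#producer.sinks.h.hdfs.rollSize = 134217728
#producer.sinks.h.hdfs.rollCount = 0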

#Number of events to batch up before flushing to HDFS

#number of events written to file before it is flushed to HDFS

producer.sinks.h.hdfs.batchSize=1000

#Number of threads this sink uses for HDFS operations (open, write, etc.)

#Number of threads per HDFS sink for HDFS IO ops (open, write, etc.)

producer.sinks.h.hdfs.threadsPoolSize=15

#Timeout in milliseconds for HDFS operations

#Number of milliseconds allowed for HDFS operations, such as open, write, flush, close. This number should be increased if many HDFS timeout operations are occurring.

producer.sinks.h.hdfs.callTimeout=30000




#Related rounding options (property = default, followed by its description):
#hdfs.round = false
#    Should the timestamp be rounded down (if true, affects all time based escape sequences except %t)
#hdfs.roundValue = 1
#    Rounded down to the highest multiple of this (in the unit configured using hdfs.roundUnit), less than current time
#hdfs.roundUnit = second
#    The unit of the round down value - second, minute or hour
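For example (a sketch mirroring the Flume user guide), rounding timestamps down to 10-minute buckets so the path escapes resolve per 10-minute window:

#producer.sinks.h.hdfs.round = true
#producer.sinks.h.hdfs.roundValue = 10
#producer.sinks.h.hdfs.roundUnit = minute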


############################################

#   elasticsearch config

############################################

producer.channels.c2.type = memory

#Timeout in seconds for adding or removing an event

producer.channels.c2.keep-alive = 30

producer.channels.c2.capacity = 10000

producer.channels.c2.transactionCapacity = 10000

producer.channels.c2.byteCapacityBufferPercentage = 20

producer.channels.c2.byteCapacity = 800000


producer.sinks.es.channel = c2


producer.sinks.es.type = org.apache.flume.sink.elasticsearch.ElasticSearchSink

producer.sinks.es.hostNames = 127.0.0.1:9300
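hostNames takes a comma-separated list of host:port pairs (9300 is the Elasticsearch transport port), so more nodes can be listed; a hedged sketch with a second, illustrative node:

#producer.sinks.es.hostNames = 127.0.0.1:9300,es-node2:9300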

#Name of the ElasticSearch cluster to connect to

producer.sinks.es.clusterName = sunxucool

#Number of events to be written per txn.

producer.sinks.es.batchSize = 1000

#The name of the index which the date will be appended to. Example ‘flume’ -> ‘flume-yyyy-MM-dd’

producer.sinks.es.indexName = flume_es

#The type to index the document to, defaults to ‘log’

producer.sinks.es.indexType = test

producer.sinks.es.serializer = org.apache.flume.sink.elasticsearch.ElasticSearchLogStashEventSerializer
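To run the whole pipeline, start the agent under its name "producer" (a sketch; the file name producer.conf and the conf directory are illustrative and depend on your install):

#flume-ng agent --conf conf --conf-file producer.conf --name producer -Dflume.root.logger=INFO,console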

Reposted from: https://my.oschina.net/u/559635/blog/475256
