# Name the components of this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Configure the Flume source (the directory to watch)
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /opt/bigdata/FtpDir1/
# Raise the maximum length of a single line read into one event (default: 2048)
a1.sources.r1.deserializer.maxLineLength = 2048000
# Configure the Flume sink
a1.sinks.k1.type = http
a1.sinks.k1.endpoint = http://localhost:8080/someuri
a1.sinks.k1.connectTimeout = 2000
a1.sinks.k1.requestTimeout = 2000
a1.sinks.k1.acceptHeader = application/json
a1.sinks.k1.contentTypeHeader = application/json
a1.sinks.k1.defaultBackoff = true
a1.sinks.k1.defaultRollback = true
a1.sinks.k1.defaultIncrementMetrics = false
a1.sinks.k1.backoff.4XX = false
a1.sinks.k1.rollback.4XX = false
a1.sinks.k1.incrementMetrics.4XX = true
a1.sinks.k1.backoff.200 = false
a1.sinks.k1.rollback.200 = false
a1.sinks.k1.incrementMetrics.200 = true
# Configure the Flume channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and the sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
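
The per-status settings on the sink (`backoff.*`, `rollback.*`, `incrementMetrics.*`) are easier to follow when worked through for a few response codes. The sketch below is not Flume's code. It is a small Python model that assumes an exact status code (such as `200`) takes precedence over a group (such as `4XX`), which in turn takes precedence over the `default*` value. The function name `resolve` and the dictionary layout are hypothetical and exist only for illustration.

```python
# Illustrative model only; this is not Flume's implementation.
# Assumption: exact code ("200") > group ("4XX") > default value.

# Mirrors the a1.sinks.k1.* settings from the configuration above.
SINK_SETTINGS = {
    "backoff": {"default": True, "4XX": False, "200": False},
    "rollback": {"default": True, "4XX": False, "200": False},
    "incrementMetrics": {"default": False, "4XX": True, "200": True},
}


def resolve(action, status_code, settings=SINK_SETTINGS):
    """Return the effective boolean for `action` at `status_code`."""
    rules = settings[action]
    exact = str(status_code)              # e.g. "404"
    group = f"{status_code // 100}XX"     # e.g. "4XX"
    if exact in rules:
        return rules[exact]
    if group in rules:
        return rules[group]
    return rules["default"]


if __name__ == "__main__":
    for code in (200, 404, 503):
        summary = {action: resolve(action, code) for action in SINK_SETTINGS}
        print(code, summary)
```

Under these assumptions, a 200 or any 4XX response is treated as handled (no backoff, no rollback), while any other response, such as a 503, falls back to the defaults and rolls the transaction back so that the events are sent again.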
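Note:
deserializer.maxLineLength defaults to 2048 characters. When a single line is longer than this, the line is cut at the limit and the rest is emitted as a separate event. The HTTP endpoint then receives incomplete payloads, the request fails, and the same data keeps being resent.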
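
To see why the limit matters for JSON payloads, here is a minimal Python sketch. It assumes the behaviour documented for Flume's line deserializer, where an over-long line is cut at `maxLineLength` and the remainder appears in a following event. The helpers `split_line` and `is_valid_json` and the sample record are hypothetical; they only imitate the splitting and are not part of Flume.

```python
import json


def split_line(line, max_line_length=2048):
    """Cut one line into chunks of at most max_line_length characters."""
    return [line[i:i + max_line_length]
            for i in range(0, len(line), max_line_length)]


def is_valid_json(text):
    """Return True if text parses as a complete JSON document."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


if __name__ == "__main__":
    # A single-line JSON record well over the 2048-character default.
    record = json.dumps({"id": 1, "payload": "x" * 5000})

    # With the default limit, the record is spread over several events,
    # none of which is valid JSON on its own.
    events = split_line(record)
    print(len(events), [is_valid_json(e) for e in events])

    # With the limit used in the configuration above, it stays whole.
    events = split_line(record, max_line_length=2048000)
    print(len(events), [is_valid_json(e) for e in events])
```

An endpoint that expects `application/json` would likely reject each fragment, which is consistent with the failed requests described in the note.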
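This article walks through a Flume data-transfer configuration: naming the agent's components, pointing the source at the directory to watch, raising the per-line read limit, defining the HTTP sink, configuring the channel, and binding the source and sink to it. It highlights the deserializer.maxLineLength parameter, which must be large enough to keep oversized lines from causing failed HTTP requests and repeated sends.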