Flume + Kafka configuration issues

Since version 0.9, Kafka uses a new consumer, which changes many behaviors:

The new Consumer API no longer distinguishes between high-level and low-level consumers; instead, the consumer maintains its offset itself. The benefit is that when the application hits an exception, it avoids the situation where data has not been consumed successfully but the position has already been committed, leaving messages effectively unconsumed. Looking at the API, the new Consumer API offers the following capabilities (a minimal usage sketch follows the list):

  1. Kafka can maintain the offset (the consumer's position) itself, or the developer can maintain the offset to implement application-specific requirements.
  2. When consuming, it is possible to consume only specific partitions.
  3. Offsets can be recorded in external storage, such as a database.
  4. The position from which the consumer reads messages can be controlled directly.
  5. Consumption can be done from multiple threads.
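
To make these points concrete, here is a minimal sketch of the new Consumer API (not from the original text). The broker address localhost:9092, the topic name demo, the partition number and the starting offset 42 are all illustrative assumptions.

```java
import java.util.Arrays;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class NewConsumerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // brokers, not ZooKeeper
        props.put("group.id", "flume");
        props.put("enable.auto.commit", "false");         // we commit offsets ourselves
        props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.ByteArrayDeserializer");

        KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(props);
        try {
            // Consume only one specific partition (capability 2)
            TopicPartition tp = new TopicPartition("demo", 0);
            consumer.assign(Arrays.asList(tp));
            // Start from a position we manage ourselves, e.g. loaded from a database (capabilities 3 and 4)
            consumer.seek(tp, 42L);

            while (true) {
                ConsumerRecords<String, byte[]> records = consumer.poll(1000);
                for (ConsumerRecord<String, byte[]> record : records) {
                    System.out.printf("partition=%d offset=%d%n",
                            record.partition(), record.offset());
                    // ...process the record, optionally save record.offset() + 1 externally...
                }
                consumer.commitSync(); // commit to Kafka, or rely only on the external store
            }
        } finally {
            consumer.close();
        }
    }
}
```

Flume's Kafka source uses this same client underneath, which is why the configuration discussed below talks to the brokers directly instead of ZooKeeper.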

The new Kafka configuration options can be found on the official site: https://kafka.apache.org/0102/documentation.html#configuration

The points that affect Flume are the following:

  1. Kafka now maintains offsets (the consumer's position) itself, storing them in an internal topic named __consumer_offsets, so Flume added kafka.bootstrap.servers to replace the old ZooKeeper connection.
  2. migrateZookeeperOffsets: when no Kafka-stored offset is found, look up the offsets in ZooKeeper and commit them to Kafka. This should be true to support seamless Kafka client migration from older versions of Flume. Once migrated it can be set to false, though that should generally not be required. If no ZooKeeper offset is found either, the Kafka configuration kafka.consumer.auto.offset.reset defines how offsets are handled; check the Kafka documentation for details. In short, this controls whether ZooKeeper is checked when Kafka has no stored offset.
  3. Other Kafka consumer properties: any consumer property supported by Kafka can be used; the only requirement is to prepend the property name with the prefix kafka.consumer., for example kafka.consumer.auto.offset.reset. It is recommended to configure the other Kafka parameters directly through the kafka.consumer. prefix so nothing goes wrong (see the configuration sketch after this list).
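
Putting the three points together, here is a minimal sketch of an agent configuration using the Kafka source (not from the original text). The names a1, r1, c1, k1, the broker address and the topic are illustrative assumptions; the property names themselves come from the table below.

```properties
# Minimal sketch of a Flume agent with a Kafka source.
# a1/r1/c1/k1, localhost:9092 and the topic "demo" are illustrative assumptions.
a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.channels.c1.type = memory

a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.channels = c1
# Point 1: brokers replace the old ZooKeeper connection string
a1.sources.r1.kafka.bootstrap.servers = localhost:9092
a1.sources.r1.kafka.topics = demo
a1.sources.r1.kafka.consumer.group.id = flume
# Point 2: only relevant while migrating offsets from an older ZooKeeper-based setup
a1.sources.r1.migrateZookeeperOffsets = true
# Point 3: any other Kafka consumer property goes through the kafka.consumer. prefix
a1.sources.r1.kafka.consumer.auto.offset.reset = earliest

a1.sinks.k1.type = logger
a1.sinks.k1.channel = c1
```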

The detailed Flume configuration parameters are shown below.

Official configuration reference: http://flume.apache.org/FlumeUserGuide.html#kafka-source


| Property Name | Default | Description |
| --- | --- | --- |
| channels | – | |
| type | – | The component type name, needs to be org.apache.flume.source.kafka.KafkaSource |
| kafka.bootstrap.servers | – | List of brokers in the Kafka cluster used by the source |
| kafka.consumer.group.id | flume | Unique identifier of the consumer group. Setting the same id in multiple sources or agents indicates that they are part of the same consumer group |
| kafka.topics | – | Comma-separated list of topics the Kafka consumer will read messages from. |
| kafka.topics.regex | – | Regex that defines the set of topics the source is subscribed on. This property has higher priority than kafka.topics and overrides kafka.topics if it exists. |
| batchSize | 1000 | Maximum number of messages written to the Channel in one batch |
| batchDurationMillis | 1000 | Maximum time (in ms) before a batch will be written to the Channel. The batch will be written whenever the first of size and time is reached. |
| backoffSleepIncrement | 1000 | Initial and incremental wait time that is triggered when a Kafka topic appears to be empty. The wait period reduces aggressive pinging of an empty Kafka topic. One second is ideal for ingestion use cases but a lower value may be required for low-latency operations with interceptors. |
| maxBackoffSleep | 5000 | Maximum wait time that is triggered when a Kafka topic appears to be empty. Five seconds is ideal for ingestion use cases but a lower value may be required for low-latency operations with interceptors. |
| useFlumeEventFormat | false | By default events are taken as bytes from the Kafka topic directly into the event body. Set to true to read events as the Flume Avro binary format. Used in conjunction with the same property on the KafkaSink or with the parseAsFlumeEvent property on the Kafka Channel, this will preserve any Flume headers sent on the producing side. |
| setTopicHeader | true | When set to true, stores the topic of the retrieved message into a header, defined by the topicHeader property. |
| topicHeader | topic | Defines the name of the header in which to store the name of the topic the message was received from, if the setTopicHeader property is set to true. Care should be taken if combining with the Kafka Sink topicHeader property so as to avoid sending the message back to the same topic in a loop. |
| migrateZookeeperOffsets | true | When no Kafka-stored offset is found, look up the offsets in ZooKeeper and commit them to Kafka. This should be true to support seamless Kafka client migration from older versions of Flume. Once migrated this can be set to false, though that should generally not be required. If no ZooKeeper offset is found, the Kafka configuration kafka.consumer.auto.offset.reset defines how offsets are handled. Check the Kafka documentation for details. |
| kafka.consumer.security.protocol | PLAINTEXT | Set to SASL_PLAINTEXT, SASL_SSL or SSL if writing to Kafka using some level of security. See below for additional info on secure setup. |
| more consumer security props | – | If using SASL_PLAINTEXT, SASL_SSL or SSL, refer to Kafka security for additional properties that need to be set on the consumer. |
| Other Kafka Consumer Properties | – | These properties are used to configure the Kafka Consumer. Any consumer property supported by Kafka can be used. The only requirement is to prepend the property name with the prefix kafka.consumer. For example: kafka.consumer.auto.offset.reset |

Note

The Kafka Source overrides two Kafka consumer parameters: auto.commit.enable is set to "false" by the source, and every batch is committed. The Kafka Source guarantees an at-least-once strategy of message retrieval; duplicates can be present when the source starts. The Kafka Source also provides defaults for key.deserializer (org.apache.kafka.common.serialization.StringDeserializer) and value.deserializer (org.apache.kafka.common.serialization.ByteArrayDeserializer). Modifying these parameters is not recommended.

Deprecated Properties

| Property Name | Default | Description |
| --- | --- | --- |
| topic | – | Use kafka.topics |
| groupId | flume | Use kafka.consumer.group.id |
| zookeeperConnect | – | No longer supported by the Kafka consumer client since 0.9.x. Use kafka.bootstrap.servers to establish a connection with the Kafka cluster |