1. Configure flume-conf.properties
```properties
buttery.sources = buttSource
buttery.channels = buttChannel
# source
buttery.sources.buttSource.type = spooldir
buttery.sources.buttSource.spoolDir = /home/flume/input
buttery.sources.buttSource.deserializer = LINE
buttery.sources.buttSource.deserializer.maxLineLength = 320000
buttery.sources.buttSource.includePattern = ^.*\.csv$
buttery.sources.buttSource.fileHeader = true
buttery.sources.buttSource.channels = buttChannel
# channel
buttery.channels.buttChannel.type = org.apache.flume.channel.kafka.KafkaChannel
# Kafka cluster brokers
buttery.channels.buttChannel.kafka.bootstrap.servers = IP1:9092,IP2:9092,IP3:9092
# Target Kafka topic
buttery.channels.buttChannel.kafka.topic = AAA
# Do not parse incoming data as Flume events, since the same Kafka topic
# may also receive data that is not in Flume event format
buttery.channels.buttChannel.parseAsFlumeEvent = false
# Consumer group, so each run resumes from the last committed offset
buttery.channels.buttChannel.kafka.consumer.group.id = flume-consumer
# Timeout (ms) for the channel's Kafka consumer poll()
buttery.channels.buttChannel.pollTimeout = 1000
```
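Note that this agent deliberately defines no sink: pairing a source directly with a Kafka channel is a supported Flume pattern in which every event committed to the channel is published straight to the Kafka topic, leaving downstream applications to consume from Kafka themselves. To smoke-test the pipeline, you can drop a CSV file into the spooling directory and read the topic back with the console consumer that ships with Kafka. A minimal sketch, assuming the Kafka CLI tools are on the PATH of one of the broker hosts and that `sample.csv` is a hypothetical test file:

```sh
# Copy a test file into the spooling directory; once Flume has ingested it,
# the file is renamed with a .COMPLETED suffix (the spooldir default).
cp sample.csv /home/flume/input/

# Read topic AAA from the beginning to confirm the CSV lines arrived.
kafka-console-consumer.sh --bootstrap-server IP1:9092 --topic AAA --from-beginning
```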
2. Configure docker-compose.yml
```yaml
version: '3.3'
services:
  flume:
    image: flume:1.9.0
    container_name: flume
    hostname: flume
    environment:
      - FLUME_CONF_DIR=/usr/flume/conf
      - FLUME_AGENT_NAME=buttery
    ports:
      - 5555:5555
      - 6666:6666
    volumes:
      - ./conf/core-site.xml:/usr/hadoop/etc/hadoop/core-site.xml
      - ./conf/hdfs-site.xml:/usr/hadoop/etc/hadoop/hdfs-site.xml
      - ./conf/mapred-site.xml:/usr/hadoop/etc/hadoop/mapred-site.xml
      - ./conf/yarn-site.xml:/usr/hadoop/etc/hadoop/yarn-site.xml
      - ./conf/workers:/usr/hadoop/etc/hadoop/workers
      - ./conf/flume-conf.properties:/usr/flume/conf/flume-conf.properties
      - ./input:/home/flume/input
      - ./output:/home/flume/output
      - ./run.sh:/run.sh
```
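With both files in place, `docker-compose up -d` brings the container up. The mounted `run.sh` is responsible for launching the agent inside the container; its actual contents are not shown in this post, but a minimal sketch of what such a script could contain, assuming Flume's `bin` directory is on the image's PATH:

```sh
#!/bin/sh
# Launch the Flume agent using the agent name and conf dir passed in via the
# FLUME_AGENT_NAME and FLUME_CONF_DIR environment variables defined above.
exec flume-ng agent \
    --conf "$FLUME_CONF_DIR" \
    --conf-file "$FLUME_CONF_DIR/flume-conf.properties" \
    --name "$FLUME_AGENT_NAME" \
    -Dflume.root.logger=INFO,console
```

Because of the volume mappings, files copied into `./input` on the host appear in the container's `/home/flume/input` spooling directory, which is what triggers ingestion.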
This post has shown how to configure Flume to watch a directory for files and stream their contents into a Kafka cluster. The `flume-conf.properties` file defines the source, channel, and consumer group so that data is read from .csv files under `/home/flume/input` and published through the Kafka channel to the topic named AAA, while `docker-compose.yml` deploys the Flume container, mapping in the configuration files, input and output directories, and ports the service needs to run.