Writing CSV Files to Kafka with Flume

Latest recommended article published on 2024-07-17 03:20:51

jalrs

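License: CC 4.0 BY-SA

Tags: flume, csv, kafka

Article link: https://blog.youkuaiyun.com/qq_40333693/article/details/112584043

This article describes how to load CSV files into Kafka with Flume. It first gives the link and extraction code for the source data files, then walks through creating a directory under Flume's conf directory and placing the CSV files in it. Next, it creates several Kafka topics and shows the Flume agent configuration files written for the different requirements. Finally, it demonstrates starting the Flume agents to transfer the CSV files to the corresponding Kafka topics, noting that file names, paths, and topic names can be adjusted as needed.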

Source data files: https://pan.baidu.com/s/1UiM8qmYY8MFKJaSLwIlPqQ
Extraction code: apk6

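1. Create the jobkb09 directory under Flume's conf directory: mkdir /opt/flume160/conf/jobkb09
2. Enter the jobkb09 directory, create a tmp directory inside it, and place all of the source data files there.
3. Create the Kafka topics:
events: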

kafka-topics.sh --create --zookeeper 192.168.134.104:2181 --topic events --partitions 1 --replication-factor 1

event_attendees:

kafka-topics.sh --create --zookeeper 192.168.134.104:2181 --topic event_attendees --partitions 1 --replication-factor 1

train:

kafka-topics.sh --create --zookeeper 192.168.134.104:2181 --topic train --partitions 1 --replication-factor 1

user_friends:

kafka-topics.sh --create --zookeeper 192.168.134.104:2181 --topic user_friends --partitions 1 --replication-factor 1

users:

kafka-topics.sh --create --zookeeper 192.168.134.104:2181 --topic users --partitions 1 --replication-factor 1
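The five commands above differ only in the topic name, so they can be generated in a loop. The following is a minimal sketch, not part of the original article: the helper names build_create_cmd and create_all_topics and the DRY_RUN variable are assumptions introduced for illustration. By default it only prints the commands it would run, so it can be checked on a machine without Kafka.

```shell
#!/bin/sh
# ZooKeeper address and topic names are taken from the commands above.
ZK="192.168.134.104:2181"
TOPICS="events event_attendees train user_friends users"

# Hypothetical helper: print the create command for one topic.
build_create_cmd() {
  printf '%s\n' "kafka-topics.sh --create --zookeeper $ZK --topic $1 --partitions 1 --replication-factor 1"
}

# Hypothetical helper: print (DRY_RUN=1, the default) or run (DRY_RUN=0)
# the create command for every topic in TOPICS.
create_all_topics() {
  for topic in $TOPICS; do
    if [ "${DRY_RUN:-1}" = "1" ]; then
      build_create_cmd "$topic"
    else
      kafka-topics.sh --create --zookeeper "$ZK" --topic "$topic" \
        --partitions 1 --replication-factor 1
    fi
  done
}

create_all_topics
```

Running the script with DRY_RUN=0 would execute the commands instead of printing them. On Kafka versions that still accept the --zookeeper option, kafka-topics.sh --list --zookeeper 192.168.134.104:2181 should then show the five topics.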

4. Write an agent configuration file for each requirement:
train-flume-kafka.conf:

train.sources=trainSource
train.channels=trainChannel
train.sinks=trainSink

train.sources.trainSource.type=spooldir
train.sources.trainSource.spoolDir=/opt/flume160/conf/jobkb09/dataSourceFile/train
train.sources.trainSource.deserializer=LINE
train.sources.trainSource.deserializer.maxLineLength=320000
train.sources.trainSource.includePattern=train_[0-9]{4}-[0-9]{2}-[0-9]{2}.csv
# Drop the CSV header row: any event whose body starts with "user" is excluded.
train.sources.trainSource.interceptors=head_filter
train.sources.trainSource.interceptors.head_filter.type=regex_filter
train.sources.trainSource.interceptors.head_filter.regex=^user.*
train.sources.trainSource.interceptors.head_filter.excludeEvents=true

train.channels.trainChannel.type=