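The main goal of this post is to implement the following requirement: use Flume to collect the lines appended to a log file and deliver them to a Kafka topic.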
Step 1: create the Flume configuration file. You can edit it directly in CM (Cloudera Manager).
a1.sources = r1
a1.channels = c1
a1.sinks = s1
# Source configuration
a1.sources.r1.type=exec
a1.sources.r1.command=tail -F /usr/local/soft/flume/flume_dir/kafka.log
a1.sources.r1.channels=c1
# Channel configuration
a1.channels.c1.type=memory
a1.channels.c1.capacity=10000
a1.channels.c1.transactionCapacity=100
# Kafka sink
a1.sinks.s1.type=org.apache.flume.sink.kafka.KafkaSink
# Kafka broker hosts and ports
a1.sinks.s1.brokerList=node1:9092,node2:9092,node3:9092
# Kafka topic
a1.sinks.s1.topic=realtime
# Serializer
a1.sinks.s1.serializer.class=kafka.serializer.StringEncoder
a1.sinks.s1.channel=c1
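Before pasting the configuration into CM, it can help to double-check the three values discussed next. The following is a minimal bash sketch that is not part of the original setup: the function name get_flume_prop and the file you pass to it are hypothetical, and it assumes a plain key=value properties file like the one above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of one key from a Flume properties file.
# Comment lines are skipped, spaces around "=" are trimmed, and if a key appears
# more than once the last occurrence wins. Returns 1 if the key is missing.
get_flume_prop() {
  local file="$1" key="$2"
  awk -v k="$key" '
    /^[ \t]*#/ { next }                      # ignore comment lines
    {
      eq = index($0, "=")
      if (eq == 0) next                      # not a key=value line
      name = substr($0, 1, eq - 1)
      val  = substr($0, eq + 1)              # value may itself contain "="
      gsub(/^[ \t]+|[ \t]+$/, "", name)
      gsub(/^[ \t]+|[ \t]+$/, "", val)
      if (name == k) result = val
    }
    END { if (result != "") print result; else exit 1 }
  ' "$file"
}
```

For example, running get_flume_prop flume-kafka.conf a1.sinks.s1.topic should print realtime if flume-kafka.conf (a hypothetical local copy of the configuration) matches the listing above.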
Regarding the configuration file, note the following:
1. The three key settings are:
a. a1.sources.r1.command=tail -F /usr/local/soft/flume/flume_dir/kafka.log
b. a1.sinks.s1.brokerList=node1:9092,node2:9092,node3:9092
c. a1.sinks.s1.topic=realtime
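2. From these settings it follows that:
a. We need to create a file named kafka.log under /usr/local/soft/flume/flume_dir/ and write content to it (described below);
b. Flume connects to Kafka at node1:9092,node2:9092,node3:9092, so make sure this list matches your actual brokers;
c. Flume writes everything it collects to the Kafka topic realtime, so after starting ZooKeeper and Kafka we need to open a terminal that consumes topic realtime. This lets us watch Flume and Kafka working together.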
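For point c, here is a sketch of the consumer side. It assumes the CDH wrapper kafka-console-consumer is on PATH and that your Kafka version's console consumer accepts --bootstrap-server (older console consumers took --zookeeper instead); the function name consume_realtime and the BROKERS and TOPIC variables are illustrative only.

```shell
#!/usr/bin/env bash
# Brokers and topic default to the values from the Flume sink configuration;
# override them by exporting BROKERS / TOPIC before sourcing this file.
BROKERS="${BROKERS:-node1:9092,node2:9092,node3:9092}"
TOPIC="${TOPIC:-realtime}"

# Hypothetical helper: tail the topic Flume writes to.
# --from-beginning also replays messages written before the consumer started;
# any extra arguments are passed straight through to the console consumer.
consume_realtime() {
  kafka-console-consumer --bootstrap-server "$BROKERS" --topic "$TOPIC" --from-beginning "$@"
}
```

Run consume_realtime in a separate terminal and leave it open while the test script runs.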
Next, we test the setup.
Write the test script kafka_output.sh
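Create an empty file kafka.log under /usr/local/soft/flume/flume_dir/. Then create the script kafka_output.sh in root's home directory (be sure to make it executable, e.g. chmod +x kafka_output.sh); it appends content to kafka.log. The script is as follows: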
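#!/bin/bash
# Append kafka_test-0 through kafka_test-1000 (1001 lines) to the file Flume is tailing
for ((i=0; i<=1000; i++)); do
  echo "kafka_test-$i" >> /usr/local/soft/flume/flume_dir/kafka.log
done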
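If you want to control how many lines are written, or slow the writes down so the messages can be watched arriving one at a time, the script can be generalised along these lines. This is a hedged sketch; the function name write_test_lines and its three positional arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical generalisation of kafka_output.sh:
#   $1 = number of lines (default 1001, i.e. kafka_test-0 .. kafka_test-1000)
#   $2 = target file (default: the file tailed by the Flume exec source)
#   $3 = delay between lines in seconds (default 0 = no delay)
write_test_lines() {
  local count="${1:-1001}"
  local target="${2:-/usr/local/soft/flume/flume_dir/kafka.log}"
  local delay="${3:-0}"
  local i
  for ((i = 0; i < count; i++)); do
    echo "kafka_test-$i" >> "$target"
    # only sleep when a non-zero delay was requested
    if [[ "$delay" != "0" ]]; then
      sleep "$delay"
    fi
  done
}
```

write_test_lines with no arguments reproduces the script above, while write_test_lines 100 /usr/local/soft/flume/flume_dir/kafka.log 0.5 writes 100 lines, one every half second (GNU sleep accepts fractional seconds).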
At this point, make sure ZooKeeper, Flume, and Kafka have all been started correctly in CM.
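A quick way to confirm from the shell that ZooKeeper and the brokers are at least listening is a TCP connect test. This sketch relies on bash's /dev/tcp redirection and the coreutils timeout command; the function names port_open and check_cluster are hypothetical, and an open port only shows that something is listening, not that the service is healthy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a TCP connection to host:port can be opened
# within 2 seconds (bash's /dev/tcp pseudo-path performs the connect).
port_open() {
  local host="$1" port="$2"
  timeout 2 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Check the ZooKeeper (2181) and Kafka (9092) ports used in this post.
check_cluster() {
  local hp
  for hp in node1:2181 node2:2181 node3:2181 node1:9092 node2:9092 node3:9092; do
    if port_open "${hp%%:*}" "${hp##*:}"; then
      echo "OK   $hp"
    else
      echo "DOWN $hp"
    fi
  done
}
```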
Note that on CDH 5.16 the Kafka command-line tools are installed in
/opt/cloudera/parcels/KAFKA-4.0.0-1.4.0.0.p0.1/bin
Create the realtime topic:
kafka-topics --create --zookeeper node1:2181,node2:2181,node3:2181 --replication-factor 3 --partitions 1 --topic realtime
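Running the create command a second time fails because the topic already exists. A sketch that only creates the topic when it is missing, and falls back to the parcel bin directory above if kafka-topics is not already on PATH (adjust KAFKA-4.0.0-1.4.0.0.p0.1 to your parcel version); ensure_topic is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Parcel bin directory quoted above and the ZooKeeper quorum used in this post.
KAFKA_BIN=/opt/cloudera/parcels/KAFKA-4.0.0-1.4.0.0.p0.1/bin
ZK="node1:2181,node2:2181,node3:2181"

# Hypothetical helper: create the topic only if --list does not already show it.
ensure_topic() {
  local topic="${1:-realtime}"
  # use the parcel tools if kafka-topics cannot be found on PATH
  command -v kafka-topics >/dev/null 2>&1 || PATH="$KAFKA_BIN:$PATH"
  # --list prints one topic per line; -F -x = exact, whole-line, literal match
  if kafka-topics --list --zookeeper "$ZK" | grep -Fqx "$topic"; then
    echo "topic $topic already exists"
  else
    kafka-topics --create --zookeeper "$ZK" --replication-factor 3 --partitions 1 --topic "$topic"
  fi
}
```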
The console log:
[root@node1 bin]# kafka-topics --create --zookeeper node1:2181,node2:2181,node3:2181 --replication-factor 3 --partitions 1 --topic realtime
-bash: kafka-topics : command not found
[root@node1 bin]# kafka-topics --create --zookeeper node1:2181,node2:2181,node3:2181 --replication-factor 3 --partitions 1 --topic realtime
19/05/20 13:33:03 INFO utils.Log4jControllerRegistration$: Registered kafka:type=kafka.Log4jController MBean
19/05/20 13:33:06 INFO zookeeper.ZooKeeperClient: [ZooKeeperClient] Initializing a new session to node1:2181,node2:2181,node3:2181.
19/05/20 13:33:06 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-cdh5.14.2--1, built on 03/27/2018 20:39 GMT
19/05/20 13:33:06 INFO zookeeper.ZooKeeper: Client environment:host.name=node1
19/05/20 13:33:06 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_212
19/05/20 13:33:06 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
19/05/20 13:33:06 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.212.b04-0.el7_6.x86_64/jre
19/05/20 13:33:06 INFO zookeeper.ZooKeeper: Client environment:java.class.path=.:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.212.b04-0.el7_6.x86_64/lib/dt.jar:/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.212.b04-0.el7_6.x86_64/lib/tools.jar:/opt/cloudera/parcels/KAFKA-4.0.0-1.4.0.0.p0.1/bin/../lib/kafka/bin/../libs/activation-1.1.1.jar:/opt/cloudera/parcels/KAFKA-4.0.0-1.4.0.0.p0.1/bin/../lib/kafka/bin/../libs/activation-1.1.jar:/opt/cloudera/parcels/KAFKA-4.0.0-1.4.0.0.p0.1/bin/../lib/kafka/bin/../libs/aopalliance-1.0.jar:/opt/cloudera/parcels/KAFKA-4.0.0-1.4.0.0.p0.1/bin/../lib/kafka/bin/../libs/aopalliance-repackaged-2.5.0-b42.jar:/opt/cloudera/parcels/KAFKA-4.0.0-1.4.0.0.p0.1/bin/../lib/kafka/bin/../libs/apacheds-i18n-2.0.0-M15.jar:/opt/cloudera/parcels/KAFKA-4.0.0-1.4.0.0.p0.1/bin/../lib/kafka/bin/../libs/apacheds-jdbm1-2.0.0-M2.jar:/opt/cloudera/parcels/KAFKA-4.0.0-1.4.0.0.p0.1/bin/../lib/kafka/bin/../libs/apacheds-kerberos-codec-2.0.0-M15.jar:/opt/cloudera/parcels/KAFKA-4.0.0-1.4.0.0.p0.1/bin/../lib/kafka/bin/../libs/api-asn1-api-1.0.0-M20.jar:/opt/cloudera/parcels/KAFKA-4.0.0-1.4.0.0.p0.1/bin/../lib/kafka/bin/../libs/api-util-1.0.0-M20.jar:/