CDH 5.16.1: Real-Time Processing with Flume, Kafka, and Spark Streaming

• The main goal of this article is to implement the following requirements:

  1. Collect logs with Flume;

  2. Forward the collected logs to Kafka;

  3. Process the logs pulled from Kafka with Spark Streaming;

  4. Store the processed results in a specified HDFS directory.

 

 

Step 1: create the Flume configuration file. You can edit it directly in Cloudera Manager (CM).

a1.sources = r1
a1.channels = c1
a1.sinks = s1

# Source: tail the log file
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /usr/local/soft/flume/flume_dir/kafka.log
a1.sources.r1.channels = c1

# Channel: in-memory buffer
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100

# Sink: Kafka
a1.sinks.s1.type = org.apache.flume.sink.kafka.KafkaSink

# Kafka broker addresses and ports
a1.sinks.s1.brokerList = node1:9092,node2:9092,node3:9092

# Kafka topic
a1.sinks.s1.topic = realtime

# Serialization
a1.sinks.s1.serializer.class = kafka.serializer.StringEncoder
a1.sinks.s1.channel = c1
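
The agent is normally deployed through CM, but if you want to smoke-test the configuration from a shell first, something along these lines should work (the --conf-file path here is a hypothetical location; the agent name must match the a1 used in the config):

flume-ng agent \
  --name a1 \
  --conf /etc/flume-ng/conf \
  --conf-file /tmp/a1.conf \
  -Dflume.root.logger=INFO,console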

 

Note the following about the configuration file:

1. The key lines:

  a. a1.sources.r1.command=tail -F /usr/local/soft/flume/flume_dir/kafka.log

  b. a1.sinks.s1.brokerList=node1:9092,node2:9092,node3:9092

  c. a1.sinks.s1.topic=realtime

2. From these lines it is clear that:

  a. We need to create a file kafka.log under /usr/local/soft/flume/flume_dir/ and write content into it (covered below);

  b. Flume connects to Kafka at node1:9092,node2:9092,node3:9092; make sure these match your actual broker hostnames, and be careful not to get this wrong;

  c. Flume pushes the collected content to the Kafka topic realtime, so after starting ZooKeeper and Kafka we need to open a terminal that consumes the topic realtime. That way we can watch Flume and Kafka talk to each other; a consumer command is sketched below.
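
As a minimal sketch (with the CDH Kafka parcel the console scripts are on the PATH without a .sh suffix; adjust the broker list to your cluster):

kafka-console-consumer \
  --bootstrap-server node1:9092,node2:9092,node3:9092 \
  --topic realtime \
  --from-beginning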
 

 

 

Next, we test the setup.

 

 

Write the test script kafka_output.sh
Create an empty file kafka.log under /usr/local/soft/flume/flume_dir/. In the root user's home directory, create a script kafka_output.sh (be sure to make it executable) that writes content into kafka.log. The script is as follows:

#!/bin/bash
# Append 1001 test lines to the file that Flume is tailing.
for ((i=0; i<=1000; i++)); do
  echo "kafka_test-$i" >> /usr/local/soft/flume/flume_dir/kafka.log
done
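
Make it executable and run it (assuming it sits in root's home directory as described):

chmod +x ~/kafka_output.sh
~/kafka_output.sh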
 

 

At this point, make sure that ZooKeeper, Flume, and Kafka have all been started correctly in CM.

 

Note that on CDH 5.16 the Kafka command-line scripts are installed under:

/opt/cloudera/parcels/KAFKA-4.0.0-1.4.0.0.p0.1/bin
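
If that directory is not already on your PATH (note the command not found in the console log below), you can add it for the current shell:

export PATH=$PATH:/opt/cloudera/parcels/KAFKA-4.0.0-1.4.0.0.p0.1/bin

Now create the topic realtime with three replicas and one partition: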

 

kafka-topics --create --zookeeper node1:2181,node2:2181,node3:2181 --replication-factor 3 --partitions 1 --topic realtime
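
To double-check that the topic exists (same ZooKeeper quorum as above):

kafka-topics --describe --zookeeper node1:2181,node2:2181,node3:2181 --topic realtime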

 

Here is the console log:

[root@node1 bin]# kafka-topics  --create --zookeeper node1:2181,node2:2181,node3:2181 --replication-factor 3 --partitions 1 --topic realtime
-bash: kafka-topics : command not found
[root@node1 bin]# kafka-topics --create --zookeeper node1:2181,node2:2181,node3:2181 --replication-factor 3 --partitions 1 --topic realtime
19/05/20 13:33:03 INFO utils.Log4jControllerRegistration$: Registered kafka:type=kafka.Log4jController MBean
19/05/20 13:33:06 INFO zookeeper.ZooKeeperClient: [ZooKeeperClient] Initializing a new session to node1:2181,node2:2181,node3:2181.
19/05/20 13:33:06 INFO zookeeper.ZooKeeper: Client environment:zookeeper.version=3.4.5-cdh5.14.2--1, built on 03/27/2018 20:39 GMT
19/05/20 13:33:06 INFO zookeeper.ZooKeeper: Client environment:host.name=node1
19/05/20 13:33:06 INFO zookeeper.ZooKeeper: Client environment:java.version=1.8.0_212
19/05/20 13:33:06 INFO zookeeper.ZooKeeper: Client environment:java.vendor=Oracle Corporation
19/05/20 13:33:06 INFO zookeeper.ZooKeeper: Client environment:java.home=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.212.b04-0.el7_6.x86_64/jre
19/05/20 13:33:06 INFO zookeeper.ZooKeeper: Client environment:java.class.path=... (long classpath omitted)
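
That covers steps 1 and 2 of the requirements: Flume is tailing the log and Kafka is receiving it. Steps 3 and 4 are handled by a Spark Streaming job that consumes the topic and writes results to HDFS. As a rough sketch of how such a job might be launched (the driver class, jar name, and output path below are hypothetical placeholders; on CDH with the Spark2 parcel the launcher is spark2-submit):

# Arguments: Kafka broker list, topic to consume, HDFS output directory.
spark2-submit \
  --master yarn \
  --class com.example.RealtimeLogJob \
  realtime-log-job.jar \
  node1:9092,node2:9092,node3:9092 realtime hdfs:///user/root/realtime_output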