Pick one machine (h15).
Unpack the Flume tarball.
Configure the environment variables.
Start HDFS: start-dfs.sh
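A minimal sketch of these setup steps, reusing the /home/apache-flume-1.6.0-bin path that appears later in this document; the tarball name and the FLUME_HOME variable are assumptions, so adjust to your environment:
#tar -zxvf apache-flume-1.6.0-bin.tar.gz -C /home
#vi /etc/profile
export FLUME_HOME=/home/apache-flume-1.6.0-bin
export PATH=$PATH:$FLUME_HOME/bin
#source /etc/profile
#start-dfs.sh
HDFS must be up before any hdfs sink can write.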
The steps below configure HDFS-based sinks; for the other sinks (e.g. the HBase sink), refer to Flume's index.html documentation.
Test 1: type data at the command line and press Enter
In the /home/test directory, create the config file:
#vi testflume1
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = netcat
a1.sources.r1.bind = h15
a1.sources.r1.port = 44444
# Describe the sink
a1.sinks.k1.type = logger
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Go to Flume's bin directory and run:
#flume-ng agent --conf /home/apache-flume-1.6.0-bin/conf --conf-file /home/test/testflume1 --name a1 -Dflume.root.logger=INFO,console &
Open a second session on h15 and run: #telnet h15 44444
Type some arbitrary data at the telnet prompt (e.g. yuisdufska); the logger sink prints the received event on the agent's console.
After exiting, run kill -9 on the agent's PID to stop the process.
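One way to locate that PID, sketched with standard tools (the grep pattern is only an assumption about what the process command line looks like):
#ps -ef | grep flume | grep -v grep
#jps -l
jps lists the agent as org.apache.flume.node.Application; then run kill -9 on the PID shown.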
Test 2: drop files into a watched directory so that Flume collects them into HDFS
1. First kill -9 the currently running Flume agent.
2. Then, in the /home/test directory, create the config file:
#vi testflume2
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = spooldir
# create /opt/flume manually on the local Linux filesystem, not on HDFS
a1.sources.r1.spoolDir = /opt/flume
# Describe the sink: write the collected data to HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.rollInterval=0
a1.sinks.k1.hdfs.rollSize=10240000
a1.sinks.k1.hdfs.rollCount=0
# destination path on HDFS; %Y-%m-%d is filled in from the event timestamp
a1.sinks.k1.hdfs.path = hdfs://yangjifei/flume/data/%Y-%m-%d
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.idleTimeout=3
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
3. Adjust the JVM heap and stack sizes:
#vi /home/apache-flume-1.6.0-bin/conf/flume-env.sh
Edit the JVM settings there as needed.
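For reference, a minimal flume-env.sh sketch; the sizes below are illustrative assumptions, not recommendations:
export JAVA_OPTS="-Xms512m -Xmx1024m -Xss256k"
The flume-ng launcher sources this file, so JAVA_OPTS takes effect on the agent's JVM.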
4. Put a test text file into the /opt/flume directory.
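A quick sketch of preparing the spool directory (the test file name is hypothetical):
#mkdir -p /opt/flume
#cp /root/testdata.txt /opt/flume/
Once the agent is running, each fully ingested file is renamed with a .COMPLETED suffix; files must not be modified after being dropped into the spool directory.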
5. Then go to Flume's bin directory and run:
#flume-ng agent --conf /home/apache-flume-1.6.0-bin/conf --conf-file /home/test/testflume2 --name a1 -Dflume.root.logger=INFO,console &
Check the collected file on HDFS:
#hadoop dfs -cat /flume/data/2016-04-25/FlumeData.1461571137420
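If the exact file name is unknown, list the date directory first (2016-04-25 is just this run's date); on Hadoop 2.x, hdfs dfs is the preferred spelling of the same command:
#hadoop dfs -ls /flume/data/2016-04-25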
Test 3: chain two Flume agents so that, as data is appended to a log file, Flume collects it into HDFS on the fly
1. First kill -9 the currently running Flume agent.
2. Then, in the /home/test directory, create two config files:
#vi testflume3
#vi testflume4
(1) testflume3
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command=tail -F /opt/test/flume.log
# Describe the sink
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = h15
# forward to the new port 55555
a1.sinks.k1.port = 55555
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
(2)testflume4
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind=h15
a1.sources.r1.port=55555
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.rollInterval=3
a1.sinks.k1.hdfs.rollSize=0
a1.sinks.k1.hdfs.rollCount=0
a1.sinks.k1.hdfs.path=hdfs://yangjifei/flume/data/%Y-%m-%d
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.idleTimeout=3
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
3. First create the file /opt/test/flume.log,
then start testflume4:
#flume-ng agent --conf /home/apache-flume-1.6.0-bin/conf --conf-file /home/test/testflume4 --name a1 -Dflume.root.logger=INFO,console &
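Before starting testflume3, it can help to confirm the avro source is actually listening on 55555 (a quick check, assuming netstat is installed):
#netstat -tlnp | grep 55555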
4. Finally, start testflume3:
#flume-ng agent --conf /home/apache-flume-1.6.0-bin/conf --conf-file /home/test/testflume3 --name a1 -Dflume.root.logger=INFO,console &
5. Run a command to append data to flume.log:
echo 'hello' >> /opt/test/flume.log
If you keep running the command:
echo 'hello' >> /opt/test/flume.log
and new files keep appearing on HDFS, the pipeline is listening correctly and the test has succeeded.
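A convenience loop for generating a steady stream of appends (the one-second interval is arbitrary):
#while true; do echo "hello $(date)" >> /opt/test/flume.log; sleep 1; done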
If the two agents of Test 3 run on different machines (h15 and h16), configure them as follows.
Contents of testflume_h16:
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind=h16
a1.sources.r1.port=55555
# Describe the sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.rollInterval=3
a1.sinks.k1.hdfs.rollSize=0
a1.sinks.k1.hdfs.rollCount=0
a1.sinks.k1.hdfs.path=hdfs://yangjifei/flume/data/%Y-%m-%d
a1.sinks.k1.hdfs.useLocalTimeStamp=true
a1.sinks.k1.hdfs.fileType=DataStream
a1.sinks.k1.hdfs.idleTimeout=3
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Contents of testflume_h15:
# example.conf: A single-node Flume configuration
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe/configure the source
a1.sources.r1.type = exec
a1.sources.r1.command=tail -F /opt/test/flume.log
# Describe the sink
a1.sinks.k1.type=avro
a1.sinks.k1.hostname=h16
a1.sinks.k1.port=55555
# Use a channel which buffers events in memory
a1.channels.c1.type = memory
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
Run the test:
Start the output (downstream) agent first: testflume_h16
#flume-ng agent --conf /home/apache-flume-1.6.0-bin/conf --conf-file /home/test/testflume_h16 --name a1 -Dflume.root.logger=INFO,console
Then start the input (upstream) agent: testflume_h15
#flume-ng agent --conf /home/apache-flume-1.6.0-bin/conf --conf-file /home/test/testflume_h15 --name a1 -Dflume.root.logger=INFO,console
Append data on h15 and check in the HDFS web UI whether new data files appear:
#echo 'heeeooolllll' >> /opt/test/flume.log
Note: the config that connects to HDFS must be placed on the active (NameNode) node!
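To check which NameNode is currently active, hdfs haadmin can be used; nn1 and nn2 below are assumed service IDs, so substitute the ones defined for the yangjifei nameservice in hdfs-site.xml:
#hdfs haadmin -getServiceState nn1
#hdfs haadmin -getServiceState nn2
Each command prints active or standby.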