Introduction to Flume interceptors
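A Flume interceptor is a plug-in, configured on a source, that can inspect and modify events as they travel from the source to the sink. Interceptors run before the events are written to the channel. Most interceptors either add metadata (event headers) to each event or drop events that match certain rules.
Flume interceptor example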
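The two typical behaviours can be made concrete with a small plain-Java model. It does not depend on the Flume jars. DemoEvent and DemoInterceptor are hypothetical stand-ins that only mimic the shape of Flume's Event (a header map plus a byte[] body) and of Interceptor.intercept(Event), where returning null drops the event. The "timestamp" header name matches the one TimestampInterceptor sets. The blank-line rule is made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InterceptorSketch {

    // Hypothetical stand-in for a Flume event: a header map plus a byte[] body.
    static final class DemoEvent {
        final Map<String, String> headers = new HashMap<>();
        final byte[] body;

        DemoEvent(String text) {
            this.body = text.getBytes(StandardCharsets.UTF_8);
        }

        String bodyAsString() {
            return new String(body, StandardCharsets.UTF_8);
        }
    }

    // Hypothetical stand-in for the interceptor contract:
    // return the (possibly modified) event, or null to drop it.
    interface DemoInterceptor {
        DemoEvent intercept(DemoEvent event);
    }

    // "Add metadata" style: sets a "timestamp" header in epoch millis, like TimestampInterceptor.
    // The clock value is injected so the demo is reproducible.
    static final class TimestampSketch implements DemoInterceptor {
        private final long nowMillis;

        TimestampSketch(long nowMillis) {
            this.nowMillis = nowMillis;
        }

        @Override
        public DemoEvent intercept(DemoEvent event) {
            event.headers.put("timestamp", Long.toString(nowMillis));
            return event;
        }
    }

    // "Drop by rule" style: a made-up rule that discards blank lines.
    static final class DropBlankSketch implements DemoInterceptor {
        @Override
        public DemoEvent intercept(DemoEvent event) {
            return event.bodyAsString().trim().isEmpty() ? null : event;
        }
    }

    // Passes every event through the chain in order; an event that becomes null is dropped.
    static List<DemoEvent> applyChain(List<DemoEvent> events, List<DemoInterceptor> chain) {
        List<DemoEvent> kept = new ArrayList<>();
        for (DemoEvent event : events) {
            DemoEvent current = event;
            for (DemoInterceptor interceptor : chain) {
                current = interceptor.intercept(current);
                if (current == null) {
                    break;
                }
            }
            if (current != null) {
                kept.add(current);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<DemoEvent> input = List.of(
                new DemoEvent("write to HDFS"), new DemoEvent("   "), new DemoEvent("add"));
        List<DemoInterceptor> chain = List.of(new DropBlankSketch(), new TimestampSketch(1426317131125L));
        for (DemoEvent e : applyChain(input, chain)) {
            System.out.println(e.headers + " " + e.bodyAsString());
        }
    }
}
```

Running main prints the two non-blank events, each carrying a timestamp header. The blank line is dropped by the first interceptor in the chain and never reaches the second.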
1. Create the agent configuration file. Save the following content as agent5.conf in Flume's working directory, /opt/flume/bin:
agent5.sources = netsource
agent5.sinks = hdfssink
agent5.channels = memorychannel
agent5.sources.netsource.type = netcat
agent5.sources.netsource.bind = localhost
agent5.sources.netsource.port = 3000
agent5.sources.netsource.interceptors = ts
agent5.sources.netsource.interceptors.ts.type = org.apache.flume.interceptor.TimestampInterceptor$Builder
agent5.sinks.hdfssink.type = hdfs
agent5.sinks.hdfssink.hdfs.path = /flume/ts-%Y-%m-%d
agent5.sinks.hdfssink.hdfs.filePrefix = log-ts-
agent5.sinks.hdfssink.hdfs.rollInterval = 0
agent5.sinks.hdfssink.hdfs.rollCount = 5
agent5.sinks.hdfssink.hdfs.fileType = DataStream
agent5.channels.memorychannel.type = memory
agent5.channels.memorychannel.capacity = 1000
agent5.channels.memorychannel.transactionCapacity = 100
agent5.sources.netsource.channels = memorychannel
agent5.sinks.hdfssink.channel = memorychannel
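The hdfs.path value /flume/ts-%Y-%m-%d is why the timestamp interceptor is configured. The HDFS sink fills in time escape sequences from the event's timestamp header, and without that header (or hdfs.useLocalTimeStamp = true) it cannot resolve the path. The Java sketch below imitates that substitution for the three escapes used here. It is a simplified, hypothetical re-implementation, not Flume's own code. resolve is an illustrative name. The time zone is passed in explicitly, whereas the sink uses the agent's local time zone unless hdfs.timeZone is set.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZonedDateTime;
import java.util.Map;

public class PathEscapeSketch {

    // Resolves %Y, %m and %d in an hdfs.path-style pattern from the event's "timestamp" header.
    // Simplified, hypothetical stand-in for the HDFS sink's escape handling: only these three escapes.
    static String resolve(String pattern, Map<String, String> headers, ZoneId zone) {
        String ts = headers.get("timestamp");
        if (ts == null) {
            // Without the header the time escapes cannot be resolved, so the sketch fails loudly.
            throw new IllegalStateException("event has no timestamp header");
        }
        ZonedDateTime t = Instant.ofEpochMilli(Long.parseLong(ts)).atZone(zone);
        return pattern
                .replace("%Y", String.format("%04d", t.getYear()))
                .replace("%m", String.format("%02d", t.getMonthValue()))
                .replace("%d", String.format("%02d", t.getDayOfMonth()));
    }

    public static void main(String[] args) {
        // Sample header value: 1426317131125 ms is 2015-03-14T07:12:11.125Z.
        Map<String, String> headers = Map.of("timestamp", "1426317131125");
        System.out.println(resolve("/flume/ts-%Y-%m-%d", headers, ZoneId.of("UTC")));
    }
}
```

For this sample value the pattern resolves to /flume/ts-2015-03-14, the same directory name that appears in the HDFS listing.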
2. Start the Flume agent
caiyong@caiyong:/opt/flume/bin$ flume-ng agent --conf conf --conf-file agent5.conf --name agent5
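3. In another terminal window, open a telnet connection to the agent and send a few events
Note: with the netcat source, each line of text you send becomes one Flume event.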
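As a rough model of that framing, the hypothetical Java sketch below splits received text into event bodies at each '\n'. This is why the seven lines typed into telnet correspond to seven events, each acknowledged with OK (the netcat source's ack-every-event option, true by default). toEvents is an illustrative name, not the netcat source's actual code. In this sketch a trailing fragment without '\n' is simply left unconsumed.

```java
import java.util.ArrayList;
import java.util.List;

public class LineFramingSketch {

    // Splits received text into event bodies, one per '\n'-terminated line.
    // A trailing fragment without '\n' is not turned into an event.
    static List<String> toEvents(String received) {
        List<String> events = new ArrayList<>();
        int start = 0;
        int nl;
        while ((nl = received.indexOf('\n', start)) >= 0) {
            events.add(received.substring(start, nl));
            start = nl + 1;
        }
        return events;
    }

    public static void main(String[] args) {
        String typed = "write to HDFS\nadd\ntimestamp\ninterceptor\nsorry\nbye\nbye\n";
        for (String body : toEvents(typed)) {
            System.out.println("event body: [" + body + "] -> reply: OK");
        }
    }
}
```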
caiyong@caiyong:~$ telnet localhost 3000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
write to HDFS
OK
add
OK
timestamp
OK
interceptor
OK
sorry
OK
bye
OK
bye
OK
4. Check the results
caiyong@caiyong:/opt/hadoop$ bin/hadoop fs -ls /flume/
Found 4 items
-rw-r--r-- 1 caiyong supergroup 20 2015-03-14 14:45 /flume/log.1426315528974
-rw-r--r-- 1 caiyong supergroup 17 2015-03-14 14:45 /flume/log.1426315528975
-rw-r--r-- 1 caiyong supergroup 6 2015-03-14 14:46 /flume/log.1426315528976
drwxr-xr-x - caiyong supergroup 0 2015-03-14 15:12 /flume/ts-2015-03-14
caiyong@caiyong:/opt/hadoop$ bin/hadoop fs -ls /flume/ts*
Found 2 items
-rw-r--r-- 1 caiyong supergroup 51 2015-03-14 15:12 /flume/ts-2015-03-14/log-ts-.1426317131125
-rw-r--r-- 1 caiyong supergroup 0 2015-03-14 15:12 /flume/ts-2015-03-14/log-ts-.1426317167866.tmp
caiyong@caiyong:/opt/hadoop$ bin/hadoop fs -cat /flume/ts*/log*25
write to HDFS
add
timestamp
interceptor
sorry
caiyong@caiyong:/opt/hadoop$ bin/hadoop fs -cat /flume/ts*/log*.tmp
bye
bye
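Flume marks a file that is still being written with a .tmp suffix (set by the HDFS sink's hdfs.inUseSuffix property), which makes completed files easy to tell apart from in-progress ones. Here the first file was rolled after five events because rollCount = 5, while the second holds only the two bye events and stays open with the .tmp suffix. MapReduce jobs should therefore process only the completed files.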
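The hypothetical Java sketch below replays the count-based rolling configured with rollCount = 5 and rollInterval = 0 for the seven events sent in step 3. It models only count-based rolling. Before each write, a file that already holds rollCount events is closed and a new one is started. write, OutFile and the numeric file suffix are illustrative names, not Flume's real implementation or naming scheme.

```java
import java.util.ArrayList;
import java.util.List;

public class RollCountSketch {

    // One output file: a base name plus the events written to it so far.
    static final class OutFile {
        final String name;
        final List<String> lines = new ArrayList<>();
        boolean open = true;

        OutFile(String name) {
            this.name = name;
        }

        // In-progress files are shown with the .tmp in-use suffix, completed ones without it.
        String visibleName() {
            return open ? name + ".tmp" : name;
        }
    }

    // Count-based rolling only: before each write, a full file is closed and a new one started.
    static List<OutFile> write(List<String> events, int rollCount, String prefix) {
        List<OutFile> files = new ArrayList<>();
        OutFile current = null;
        int counter = 0; // hypothetical counter, not Flume's real file-name scheme
        for (String event : events) {
            if (current != null && current.lines.size() >= rollCount) {
                current.open = false; // roll: the file is complete and loses its .tmp suffix
                current = null;
            }
            if (current == null) {
                current = new OutFile(prefix + "." + counter++);
                files.add(current);
            }
            current.lines.add(event);
        }
        return files;
    }

    public static void main(String[] args) {
        List<String> events = List.of(
                "write to HDFS", "add", "timestamp", "interceptor", "sorry", "bye", "bye");
        for (OutFile f : write(events, 5, "log-ts-")) {
            System.out.println(f.visibleName() + " " + f.lines);
        }
    }
}
```

Running main prints one completed file with the first five events and one .tmp file holding the two bye events, the same split as in the HDFS listing.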