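Flume Timestamp Interceptor

This article introduces the basics of Flume interceptors, focusing on the timestamp interceptor. A worked example shows how to configure and apply the timestamp interceptor in a Flume agent so that events can be routed by the time they were received.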

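Introduction to Flume Interceptors

A Flume interceptor is a plugin that can inspect and modify events in flight as they travel from source to sink. Most interceptors either add metadata to an event or drop events according to some rule.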
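
To make the two roles concrete, here is a minimal sketch in Python rather than Flume's own Java API. It assumes an event can be modeled as a dict with "headers" and "body"; the names run_chain and drop_blank_interceptor are hypothetical and exist only for illustration. The one detail taken from Flume itself is that the timestamp interceptor stores the time in a header named timestamp, as milliseconds since the epoch.

```python
import time


def timestamp_interceptor(event, now_ms=None):
    """Return a copy of the event with a 'timestamp' header in epoch milliseconds."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    headers = dict(event["headers"])
    headers["timestamp"] = str(now_ms)
    return {"headers": headers, "body": event["body"]}


def drop_blank_interceptor(event):
    """Hypothetical filtering interceptor: return None to drop blank events."""
    return event if event["body"].strip() else None


def run_chain(events, interceptors):
    """Pass each event through the interceptors in order, keeping the survivors."""
    kept = []
    for event in events:
        for intercept in interceptors:
            event = intercept(event)
            if event is None:
                break  # a filtering interceptor dropped this event
        else:
            kept.append(event)
    return kept


events = [
    {"headers": {}, "body": "write to HDFS"},
    {"headers": {}, "body": "   "},
]
result = run_chain(events, [drop_blank_interceptor, timestamp_interceptor])
print(len(result), sorted(result[0]["headers"]))
```

The first interceptor removes the blank event and the second decorates the surviving one, which mirrors the "drop" and "add metadata" behaviors described above.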

Flume Interceptor Example

1. Create the agent configuration file
Save the following as agent5.conf in Flume's working directory, /opt/flume/bin.

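# Name the components of agent5
agent5.sources = netsource
agent5.sinks = hdfssink
agent5.channels = memorychannel

# Netcat source: listens on localhost:3000 and turns each line of text into an event
agent5.sources.netsource.type = netcat
agent5.sources.netsource.bind = localhost
agent5.sources.netsource.port = 3000
agent5.sources.netsource.interceptors = ts

# Timestamp interceptor: adds a timestamp header (epoch milliseconds) to every event
agent5.sources.netsource.interceptors.ts.type = org.apache.flume.interceptor.TimestampInterceptor$Builder

# HDFS sink: %Y-%m-%d in the path is filled in from the timestamp header
agent5.sinks.hdfssink.type = hdfs
agent5.sinks.hdfssink.hdfs.path = /flume/ts-%Y-%m-%d
agent5.sinks.hdfssink.hdfs.filePrefix = log-ts-
# Disable time-based rolling and start a new file after every 5 events
agent5.sinks.hdfssink.hdfs.rollInterval = 0
agent5.sinks.hdfssink.hdfs.rollCount = 5
agent5.sinks.hdfssink.hdfs.fileType = DataStream

# In-memory channel between the source and the sink
agent5.channels.memorychannel.type = memory
agent5.channels.memorychannel.capacity = 1000
agent5.channels.memorychannel.transactionCapacity = 100

# Wire the source and the sink to the channel
agent5.sources.netsource.channels = memorychannel
agent5.sinks.hdfssink.channel = memorychannel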
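
The reason the interceptor matters here is that the HDFS sink fills in the %Y-%m-%d escapes of hdfs.path from each event's timestamp header. The sketch below imitates that substitution in Python for those three escapes only. The function name resolve_path is hypothetical, and it defaults to UTC so the result is reproducible, whereas a real agent would normally use its local time zone.

```python
from datetime import datetime, timezone


def resolve_path(template, headers, tz=timezone.utc):
    """Expand %Y, %m and %d in a path template from the 'timestamp' header."""
    if "timestamp" not in headers:
        # Without the header there is nothing to derive the date from.
        raise KeyError("event has no 'timestamp' header")
    seconds = int(headers["timestamp"]) / 1000  # header value is in milliseconds
    moment = datetime.fromtimestamp(seconds, tz)
    return (
        template.replace("%Y", f"{moment.year:04d}")
        .replace("%m", f"{moment.month:02d}")
        .replace("%d", f"{moment.day:02d}")
    )


# Example value chosen for illustration: a millisecond timestamp on 2015-03-14.
print(resolve_path("/flume/ts-%Y-%m-%d", {"timestamp": "1426317131125"}))
# prints /flume/ts-2015-03-14
```

An event that arrives without the header raises an error in this sketch, which is the situation the timestamp interceptor in the configuration is there to prevent.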

2. Start the Flume agent
caiyong@caiyong:/opt/flume/bin$ flume-ng agent --conf conf --conf-file agent5.conf --name agent5

3. In another window, open a remote connection and send a few events
Note: with this source, each line of text you send becomes one Flume event.
caiyong@caiyong:~$ telnet localhost 3000
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
write to HDFS
OK
add
OK
timestamp
OK
interceptor
OK
sorry
OK
bye
OK
bye
OK
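
The same session can be scripted instead of typed into telnet. The following is a rough sketch using Python's standard socket module; the function name send_events is hypothetical, and it assumes the source answers every line with a single reply line, as the OK responses in the session above suggest.

```python
import socket


def send_events(lines, host="localhost", port=3000, timeout=5.0):
    """Send each line as one event and collect the one-line reply to each."""
    replies = []
    with socket.create_connection((host, port), timeout=timeout) as conn:
        reader = conn.makefile("r", encoding="utf-8", newline="\n")
        for line in lines:
            conn.sendall((line + "\n").encode("utf-8"))
            # Wait for the acknowledgement before sending the next event.
            replies.append(reader.readline().strip())
    return replies
```

With the agent from step 2 running, send_events(["write to HDFS", "add"]) would be expected to return one reply per line sent.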


4. Check the results
caiyong@caiyong:/opt/hadoop$ bin/hadoop  fs -ls /flume/
Found 4 items
-rw-r--r--   1 caiyong supergroup         20 2015-03-14 14:45 /flume/log.1426315528974
-rw-r--r--   1 caiyong supergroup         17 2015-03-14 14:45 /flume/log.1426315528975
-rw-r--r--   1 caiyong supergroup          6 2015-03-14 14:46 /flume/log.1426315528976
drwxr-xr-x   - caiyong supergroup          0 2015-03-14 15:12 /flume/ts-2015-03-14
caiyong@caiyong:/opt/hadoop$ bin/hadoop  fs -ls /flume/ts*
Found 2 items
-rw-r--r--   1 caiyong supergroup         51 2015-03-14 15:12 /flume/ts-2015-03-14/log-ts-.1426317131125
-rw-r--r--   1 caiyong supergroup          0 2015-03-14 15:12 /flume/ts-2015-03-14/log-ts-.1426317167866.tmp

caiyong@caiyong:/opt/hadoop$ bin/hadoop  fs -cat /flume/ts*/log*25
write to HDFS
add
timestamp
interceptor
sorry

caiyong@caiyong:/opt/hadoop$ bin/hadoop  fs -cat /flume/ts*/log*.tmp
bye
bye



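Flume marks a file that is still being written with a .tmp suffix, which makes it easy to tell complete files from in-progress ones. Here hdfs.rollCount = 5 closed the first file after five events, and the two remaining events sit in the .tmp file. Downstream MapReduce jobs should process only the complete files.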
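
The arithmetic behind that split can be sketched in a few lines of Python. This is a simplified model rather than the sink's actual implementation: the function name roll_by_count is hypothetical, and it only considers count-based rolling, ignoring the size-based and time-based triggers the HDFS sink also supports.

```python
def roll_by_count(events, roll_count):
    """Split events into closed files of roll_count events plus one open remainder."""
    if roll_count <= 0:
        # Treated here as "never roll by count": everything stays in the open file.
        return [], list(events)
    closed = [
        events[start:start + roll_count]
        for start in range(0, len(events) - roll_count + 1, roll_count)
    ]
    still_open = events[len(closed) * roll_count:]
    return closed, still_open


sent = ["write to HDFS", "add", "timestamp", "interceptor", "sorry", "bye", "bye"]
closed, still_open = roll_by_count(sent, 5)
print(len(closed), still_open)
# prints 1 ['bye', 'bye']
```

Seven events with a roll count of 5 give one closed file of five events and an open file holding the last two, which matches the listing in step 4.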



