Delivering Data to HDFS with Flume

This article shows how to use Flume with netcat as the data source and HDFS as the sink. With the configuration below, Flume creates a new HDFS directory every ten seconds and rolls over to a new file once any of several thresholds is reached.


We use netcat (the networking "Swiss Army knife") as the input source and HDFS as the Flume output (sink).

The Flume configuration file, hdfs_k.conf, is as follows:

a1.sources = r1
a1.channels = c1
a1.sinks = k1

a1.sources.r1.type = netcat
a1.sources.r1.bind = localhost
a1.sources.r1.port = 8888

a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://mycluster/flume/events/%y-%m-%d/%H/%M/%S
a1.sinks.k1.hdfs.filePrefix = events-

# Round down the timestamp used in the directory path, creating a new
# directory every ten seconds. This controls directory granularity,
# e.g. bucketing by date only (2017-12-12) vs. by date and time
# (2017-12-12/%H%M%S).

a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = second

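# Use the agent's local time for the path escapes (without this, each event
# would need a "timestamp" header set upstream, e.g. by an interceptor).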
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# File rolling: start a new file every 10 seconds, or when the current file
# reaches 10 bytes, or after 3 events (whichever threshold is hit first).
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 10
a1.sinks.k1.hdfs.rollCount = 3

a1.channels.c1.type = memory

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
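
The memory channel above runs with its default sizing. As a minimal sketch, you could size it explicitly with the standard memory-channel properties (the values below are illustrative, not tuned):

# Max events buffered in the channel, and max events per put/take transaction:
a1.channels.c1.capacity = 1000
a1.channels.c1.transactionCapacity = 100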

Here hdfs://mycluster is the address of the HDFS cluster.
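
As a concrete illustration of how the rounding settings and the path escapes interact (using a timestamp from the run below):

# An event written at 2018-09-06 08:52:42 (local time, per useLocalTimeStamp)
# is rounded down to the nearest 10 seconds, 08:52:40, before the escapes
# in %y-%m-%d/%H/%M/%S are expanded:
#   hdfs://mycluster/flume/events/18-09-06/08/52/40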

Start the Flume agent with flume-ng agent -f hdfs_k.conf -n a1:

[hadoop@s202 /soft/flume/conf]$flume-ng agent  -f hdfs_k.conf  -n a1
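
Depending on your installation, you may also want to point flume-ng at its configuration directory and log to the console while testing (standard flume-ng options; adjust the paths to your environment):

flume-ng agent -c /soft/flume/conf -f hdfs_k.conf -n a1 -Dflume.root.logger=INFO,console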

Start a netcat client and type a few lines; the netcat source acknowledges each received event with "OK":

[hadoop@s202 ~]$nc localhost  8888
he ll sss adsd
OK
hellow ssdfvas fffg
OK

The Flume console then shows the directories and files being created:

18/09/06 08:52:12 INFO instrumentation.MonitoredCounterGroup: Component type: SINK, name: k1 started
18/09/06 08:52:12 INFO source.NetcatSource: Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/127.0.0.1:8888]
18/09/06 08:52:42 INFO hdfs.HDFSSequenceFile: writeFormat = Writable, UseRawLocalFileSystem = false
18/09/06 08:52:42 INFO hdfs.BucketWriter: Creating hdfs://mycluster/flume/events/18-09-06/08/52/40/events-.1536195162064.tmp
18/09/06 08:52:51 INFO hdfs.HDFSSequenceFile: writeFormat = Writable, UseRawLocalFileSystem = false
18/09/06 08:52:51 INFO hdfs.BucketWriter: Creating hdfs://mycluster/flume/events/18-09-06/08/52/50/events-.1536195171721.tmp
18/09/06 08:52:56 INFO hdfs.BucketWriter: Closing hdfs://mycluster/flume/events/18-09-06/08/52/40/events-.1536195162064.tmp
18/09/06 08:52:56 INFO hdfs.BucketWriter: Renaming hdfs://mycluster/flume/events/18-09-06/08/52/40/events-.1536195162064.tmp to hdfs://mycluster/flume/events/18-09-06/08/52/40/events-.1536195162064
18/09/06 08:52:56 INFO hdfs.HDFSEventSink: Writer callback called.
18/09/06 08:53:01 INFO hdfs.BucketWriter: Closing hdfs://mycluster/flume/events/18-09-06/08/52/50/events-.1536195171721.tmp
18/09/06 08:53:02 INFO hdfs.BucketWriter: Renaming hdfs://mycluster/flume/events/18-09-06/08/52/50/events-.1536195171721.tmp to hdfs://mycluster/flume/events/18-09-06/08/52/50/events-.1536195171721
18/09/06 08:53:02 INFO hdfs.HDFSEventSink: Writer callback called.

Inspect one of the collected files with the following command:

[hadoop@s202 ~]$hdfs dfs  -cat /flume/events/18-09-06/08/52/50/events-.1536195171721
SEQ!org.apache.hadoop.io.LongWritable"org.apache.hadoop.io.BytesWritable?.~?;...s.e..hellow ssdfvas fffg?.~?;...s.[hadoop@s202 ~]$

This is a Hadoop SequenceFile; the text "hellow ssdfvas fffg" is faintly visible in the binary output.
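
The SequenceFile format is the HDFS sink's default. If you would rather have plain text files, the sink also supports a DataStream file type; as a sketch, add this to the config above (the event body is then written as-is, one line per event):

a1.sinks.k1.hdfs.fileType = DataStream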

You can also view it with hdfs dfs -text, which deserializes the records:

[hadoop@s202 ~]$hdfs dfs  -text  /flume/events/18-09-06/08/52/50/events-.1536195171721
1536195171972	68 65 6c 6c 6f 77 20 73 73 64 66 76 61 73 20 66 66 66 67
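
The first column is the event timestamp in milliseconds (the SequenceFile's LongWritable key), and the hex bytes are the event body. You can confirm they decode back to the original line, for example with xxd:

[hadoop@s202 ~]$echo "68 65 6c 6c 6f 77 20 73 73 64 66 76 61 73 20 66 66 66 67" | xxd -r -p
hellow ssdfvas fffg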