Java HDFS real-time writes: how can I use Flume to write data to HDFS in real time?

I am using Flume to store sensor data in HDFS. The data arrives over MQTT, and a subscriber publishes it as JSON to a Flume HTTP listener. This currently works, except that Flume does not write to the HDFS file until I stop the agent (or until the file reaches 128 MB). I am using Hive to apply a schema on read, but unfortunately the resulting Hive table contains only one entry. That is expected, since Flume has not yet flushed the newly arrived data into the file that Hive loads.
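For context, here is a minimal sketch of how the subscriber side can post an event to the HTTP source configured below. Flume's HTTP source, with its default JSONHandler, accepts a JSON array of events, each carrying a "headers" map and a string "body"; the sensor name and payload here are made-up examples, and the "sensor" header is assumed to feed the %{sensor} placeholder used in the sink path:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class FlumeHttpPublisher {
    public static void main(String[] args) throws Exception {
        // Flume's default JSONHandler expects an array of events,
        // each with a "headers" map and a string "body".
        // The "sensor" header fills the %{sensor} escape in hdfs.path.
        String event = "[{\"headers\": {\"sensor\": \"temp-01\"},"
                     + " \"body\": \"{\\\"value\\\": 21.5}\"}]";

        URL url = new URL("http://localhost:41414");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(event.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Flume responded: " + conn.getResponseCode());
        conn.disconnect();
    }
}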

Is there any way to force Flume to write incoming data to HDFS in near real time, so that I don't have to restart the agent or fall back on small files?

Here is my Flume configuration:

# Name the components on this agent
emsFlumeAgent.sources = http_emsFlumeAgent
emsFlumeAgent.sinks = hdfs_sink
emsFlumeAgent.channels = channel_hdfs

# Describe/configure the source
emsFlumeAgent.sources.http_emsFlumeAgent.type = http
emsFlumeAgent.sources.http_emsFlumeAgent.bind = localhost
emsFlumeAgent.sources.http_emsFlumeAgent.port = 41414

# Describe the sink
emsFlumeAgent.sinks.hdfs_sink.type = hdfs
emsFlumeAgent.sinks.hdfs_sink.hdfs.path = hdfs://localhost:9000/EMS/%{sensor}
emsFlumeAgent.sinks.hdfs_sink.hdfs.rollInterval = 0
emsFlumeAgent.sinks.hdfs_sink.hdfs.rollSize = 134217728
emsFlumeAgent.sinks.hdfs_sink.hdfs.rollCount = 0
#emsFlumeAgent.sinks.hdfs_sink.hdfs.idleTimeout = 20

# Use a channel which buffers events in memory
emsFlumeAgent.channels.channel_hdfs.type = memory
emsFlumeAgent.channels.channel_hdfs.capacity = 10000
emsFlumeAgent.channels.channel_hdfs.transactionCapacity = 100

# Bind the source and sinks to the channel
emsFlumeAgent.sources.http_emsFlumeAgent.channels = channel_hdfs
emsFlumeAgent.sinks.hdfs_sink.channel = channel_hdfs
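Note that with the settings above, only rollSize can ever close a file (rollInterval and rollCount are both disabled), so events sit in an open .tmp file until the agent stops or 128 MB accumulates. A sketch of one way to get near-real-time visibility, using documented HDFS sink parameters; the concrete values are assumptions to tune for your throughput and latency needs:

# Flush smaller batches to HDFS, so data reaches the file more often
emsFlumeAgent.sinks.hdfs_sink.hdfs.batchSize = 10

# Roll (close) the current file periodically so Hive sees finalized files
emsFlumeAgent.sinks.hdfs_sink.hdfs.rollInterval = 60

# Also close a file that has received no events for a while
emsFlumeAgent.sinks.hdfs_sink.hdfs.idleTimeout = 20

# Keep the size-based roll only as an upper bound
emsFlumeAgent.sinks.hdfs_sink.hdfs.rollSize = 134217728
emsFlumeAgent.sinks.hdfs_sink.hdfs.rollCount = 0

# Write plain text instead of the default SequenceFile, which is easier
# for a text-format Hive table to read (an assumption about your table)
emsFlumeAgent.sinks.hdfs_sink.hdfs.fileType = DataStream

The trade-off is exactly the small-files problem the question wants to avoid: a shorter rollInterval means more, smaller files, so in practice you pick the longest interval your latency requirement tolerates, or compact the files downstream.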
