I am using Flume to store sensor data in HDFS. The data arrives over MQTT, and the subscriber publishes it in JSON format to Flume's HTTP listener. This currently works, but the problem is that Flume does not write to the HDFS file until I stop the agent (or until the file size reaches 128 MB). I am using Hive to apply a schema on read. Unfortunately, the resulting Hive table contains only one entry. This is expected, since Flume has not written the new data into the file that Hive loads.
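For context, the subscriber posts events in the format expected by the HTTP source's default JSONHandler (a JSON array of events, each with headers and a string body). A minimal sketch of such a POST, where the sensor ID and reading are made-up placeholders (the "sensor" header is what the sink's %{sensor} path escape picks up):

import json
import requests  # any HTTP client works; requests is just for illustration

# One Flume event: the headers drive the HDFS path (%{sensor}),
# the body is the raw JSON record that Hive will later read.
event = {
    "headers": {"sensor": "temp-01"},    # hypothetical sensor ID
    "body": json.dumps({"value": 21.5})  # hypothetical reading
}

# Flume's default JSONHandler expects a JSON array of events.
resp = requests.post("http://localhost:41414", json=[event])
print(resp.status_code)  # 200 on success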
Is there any way to force Flume to write incoming data to HDFS in near real time, so that I don't need to restart the agent or resort to small files?
Here is my Flume configuration:
# Name the components on this agent
emsFlumeAgent.sources = http_emsFlumeAgent
emsFlumeAgent.sinks = hdfs_sink
emsFlumeAgent.channels = channel_hdfs
# Describe/configure the source
emsFlumeAgent.sources.http_emsFlumeAgent.type = http
emsFlumeAgent.sources.http_emsFlumeAgent.bind = localhost
emsFlumeAgent.sources.http_emsFlumeAgent.port = 41414
# Describe the sink
emsFlumeAgent.sinks.hdfs_sink.type = hdfs
emsFlumeAgent.sinks.hdfs_sink.hdfs.path = hdfs://localhost:9000/EMS/%{sensor}
emsFlumeAgent.sinks.hdfs_sink.hdfs.rollInterval = 0
emsFlumeAgent.sinks.hdfs_sink.hdfs.rollSize = 134217728
emsFlumeAgent.sinks.hdfs_sink.hdfs.rollCount = 0
# emsFlumeAgent.sinks.hdfs_sink.hdfs.idleTimeout = 20
# Use a channel which buffers events in memory
emsFlumeAgent.channels.channel_hdfs.type = memory
emsFlumeAgent.channels.channel_hdfs.capacity = 10000
emsFlumeAgent.channels.channel_hdfs.transactionCapacity = 100
# Bind the source and sinks to the channel
emsFlumeAgent.sources.http_emsFlumeAgent.channels = channel_hdfs
emsFlumeAgent.sinks.hdfs_sink.channel = channel_hdfs
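In case it helps, these are the flush/roll parameters I have found so far: as I understand the docs, hdfs.rollInterval closes the file after N seconds, hdfs.idleTimeout closes it after N seconds of inactivity, and hdfs.batchSize controls how many events are buffered before a flush to HDFS (default 100). A sketch of what I could try (values are guesses, not verified), though I suspect this just trades latency for smaller files:

# Possible tuning (sketch): flush more often and close idle files,
# at the cost of producing smaller files.
emsFlumeAgent.sinks.hdfs_sink.hdfs.batchSize = 10       # flush every 10 events instead of 100
emsFlumeAgent.sinks.hdfs_sink.hdfs.rollInterval = 3600  # also roll once per hour
emsFlumeAgent.sinks.hdfs_sink.hdfs.idleTimeout = 60     # close files idle for 60 s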