Transactions are a critical concept in FlumeNG.
Sources put events into a Channel in batches, one transaction per batch.
Sinks take events from a Channel in batches, one transaction per batch.
This makes the transaction batch size the key setting for performance tuning.
For the agent:
ExecSource is used to collect events, and its default batch size is 20. For near-real-time collection, it is better to set it to 5 or 10, depending on how quickly the log itself is written.
AvroSink is used to transmit events to the collectors, so its batch size should be N times the ExecSource batch size.
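As a rough illustration of the agent settings above, a configuration might look like the sketch below. The agent and component names (agent1, tailSrc, fileCh, avroOut), the log path, the collector host and port, and the concrete values 10 and 50 are all assumptions chosen for the example, not values taken from a tested deployment. Note that ExecSource uses the property name batchSize while AvroSink uses batch-size.

```properties
# Hypothetical agent layout: one exec source, one file channel, one Avro sink.
agent1.sources = tailSrc
agent1.channels = fileCh
agent1.sinks = avroOut

# ExecSource: small batch (default is 20) so events reach the channel quickly.
agent1.sources.tailSrc.type = exec
agent1.sources.tailSrc.command = tail -F /var/log/app/app.log
agent1.sources.tailSrc.batchSize = 10
agent1.sources.tailSrc.channels = fileCh

# File channel: transactionCapacity must be at least as large as the
# biggest batch any attached source or sink uses.
agent1.channels.fileCh.type = file
agent1.channels.fileCh.transactionCapacity = 1000

# AvroSink: batch size chosen as N times the source batch size (here N = 5).
agent1.sinks.avroOut.type = avro
agent1.sinks.avroOut.hostname = collector.example.com
agent1.sinks.avroOut.port = 4545
agent1.sinks.avroOut.batch-size = 50
agent1.sinks.avroOut.channel = fileCh
```

Comments are kept on their own lines because a Java properties file treats a trailing # as part of the value.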
For the collector:
The suggested HbaseSink batch size is 100.
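A matching collector configuration could be sketched as follows. The names (collector1, avroIn, fileCh, hbaseOut), the port, and the HBase table and column family are hypothetical and only serve the example; the batch size of 100 is the suggestion from the text.

```properties
# Hypothetical collector layout: Avro source, file channel, HBase sink.
collector1.sources = avroIn
collector1.channels = fileCh
collector1.sinks = hbaseOut

# Avro source receiving events sent by the agents' AvroSink.
collector1.sources.avroIn.type = avro
collector1.sources.avroIn.bind = 0.0.0.0
collector1.sources.avroIn.port = 4545
collector1.sources.avroIn.channels = fileCh

collector1.channels.fileCh.type = file

# HBase sink writing 100 events per transaction.
collector1.sinks.hbaseOut.type = hbase
collector1.sinks.hbaseOut.table = flume_events
collector1.sinks.hbaseOut.columnFamily = cf
collector1.sinks.hbaseOut.batchSize = 100
collector1.sinks.hbaseOut.channel = fileCh
```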
These batch sizes can then be adjusted based on test results.
The larger the batch size, the better the file channel performs, but latency has to be considered.
The smaller the batch size, the sooner events are transmitted, but the CPU cost and the time spent on disk syncs have to be considered.
Please refer to the blog posts below for more detail.
http://blog.cloudera.com/blog/2013/01/how-to-do-apache-flume-performance-tuning-part-1/
http://blog.cloudera.com/blog/2012/09/about-apache-flume-filechannel/