-
Scenario: data is sent to a Kafka cluster. A Kafka spout feeds it from the cluster into a Storm topology, where Storm receives the messages and runs a sliding-window computation.
-
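Problem: We were developing sliding-window logic on Storm 1.0.1. Data received from Kafka went through the sliding-window computation, and then we stopped sending messages to the Kafka cluster. At that point the window computation should have paused until new messages arrived in Kafka. Instead, the sliding window replayed the messages at regular intervals and recomputed them, which produced incorrect results.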
-
Solution: A careful reading of http://storm.apache.org/releases/1.0.1/Windowing.html turns up the explanation:

“The windowing functionality in storm core currently provides at-least once guarentee. The values emitted from the bolts execute(TupleWindow inputWindow) method are automatically anchored to all the tuples in the inputWindow. The downstream bolts are expected to ack the received tuple (i.e the tuple emitted from the windowed bolt) to complete the tuple tree. If not the tuples will be replayed and the windowing computation will be re-evaluated. The tuples in the window are automatically acked when the expire, i.e. when they fall out of the window after windowLength + slidingInterval. Note that the configuration topology.message.timeout.secs should be sufficiently more than windowLength + slidingInterval for time based windows; otherwise the tuples will timeout and get replayed and can result in duplicate evaluations. For count based windows, the configuration should be adjusted such that windowLength + slidingInterval tuples can be received within the timeout period.”

In other words, a tuple held in a window is acked only when it expires, roughly windowLength + slidingInterval after it arrives. If topology.message.timeout.secs (30 seconds by default) is shorter than that, the tuple times out, the spout replays it, and the window is evaluated again. The fix was therefore to call conf.setMessageTimeoutSecs(sec) on the topology configuration, choosing sec well above windowLength + slidingInterval. With that change, the problem went away.
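The two rules in the quoted passage, one for time-based windows and one for count-based windows, can be turned into a small calculation. The following is a minimal, stdlib-only Java sketch. `WindowTimeout`, `forTimeWindow`, `forCountWindow`, the safety factor `margin` and the assumed lower bound on input rate `minTuplesPerSec` are hypothetical names introduced here for illustration; they are not Storm APIs. The only Storm call involved is the `conf.setMessageTimeoutSecs(...)` from the fix above. It appears in a comment so that the block compiles without the Storm jars.

```java
/**
 * Hypothetical helper (not part of Storm): picks a value for
 * topology.message.timeout.secs that stays well above the time a tuple
 * can spend in a window before it is acked.
 */
final class WindowTimeout {

    private WindowTimeout() {}

    /**
     * Time-based window. Both lengths are in seconds; margin is an
     * assumed safety factor that must be greater than 1.
     */
    public static int forTimeWindow(int windowLengthSecs, int slidingIntervalSecs, double margin) {
        if (windowLengthSecs <= 0 || slidingIntervalSecs <= 0 || margin <= 1.0) {
            throw new IllegalArgumentException("lengths must be positive and margin > 1");
        }
        // A windowed tuple is only acked once it falls out of the window,
        // i.e. about windowLength + slidingInterval after it arrives.
        long minLifetimeSecs = (long) windowLengthSecs + slidingIntervalSecs;
        return (int) Math.ceil(minLifetimeSecs * margin);
    }

    /**
     * Count-based window. Lengths are tuple counts; minTuplesPerSec is an
     * assumed lower bound on the input rate (not something Storm provides).
     */
    public static int forCountWindow(int windowLengthCount, int slidingIntervalCount,
                                     double minTuplesPerSec, double margin) {
        if (windowLengthCount <= 0 || slidingIntervalCount <= 0
                || minTuplesPerSec <= 0 || margin <= 1.0) {
            throw new IllegalArgumentException("counts and rate must be positive and margin > 1");
        }
        // Seconds needed to receive windowLength + slidingInterval tuples
        // at the slowest expected rate.
        double secondsToFill = (windowLengthCount + slidingIntervalCount) / minTuplesPerSec;
        return (int) Math.ceil(secondsToFill * margin);
    }

    public static void main(String[] args) {
        // Assumed example: a 60 s window sliding every 10 s, with a 2x margin.
        int timeout = forTimeWindow(60, 10, 2.0);
        System.out.println(timeout);
        // In the real topology this value would then be passed to:
        // conf.setMessageTimeoutSecs(timeout);
    }
}
```

With these assumed numbers, the time-based case gives 140 seconds, twice the 70-second lifetime of a windowed tuple. The margin is a judgment call. The documentation only asks for a value "sufficiently more than" windowLength + slidingInterval, so a factor of 2 is one reasonable choice, not a prescribed one.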
Reposted from: https://my.oschina.net/drl/blog/727754