Storm 1.0.1 sliding window with the Kafka spout [error log]

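This post analyzes the message replay and duplicate computation seen when running sliding-window calculations on Storm 1.0.1, and gives a concrete fix: adjust the topology configuration so that messages are not reprocessed because they timed out.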


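  • Scenario: Data is sent to a Kafka cluster. A Kafka spout feeds that data from the Kafka cluster into a Storm topology, and Storm runs a sliding-window computation over the received messages.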

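  • Problem: While developing a sliding window on Storm 1.0.1, we stopped sending messages to the Kafka cluster after the data already received from Kafka had gone through the window computation. At that point the window computation should pause until new messages are sent to Kafka. Instead, the sliding window replayed the messages at regular intervals and recomputed them, which produced incorrect results.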

  • Solution: A careful read of http://storm.apache.org/releases/1.0.1/Windowing.html explains the behavior: "The windowing functionality in storm core currently provides at-least once guarentee. The values emitted from the bolts execute(TupleWindow inputWindow) method are automatically anchored to all the tuples in the inputWindow. The downstream bolts are expected to ack the received tuple (i.e the tuple emitted from the windowed bolt) to complete the tuple tree. If not the tuples will be replayed and the windowing computation will be re-evaluated. The tuples in the window are automatically acked when the expire, i.e. when they fall out of the window after windowLength + slidingInterval. Note that the configuration topology.message.timeout.secs should be sufficiently more than windowLength + slidingInterval for time based windows; otherwise the tuples will timeout and get replayed and can result in duplicate evaluations. For count based windows, the configuration should be adjusted such that windowLength + slidingInterval tuples can be received within the timeout period." The fix is therefore to call conf.setMessageTimeoutSecs(sec) on the topology configuration, with sec well above windowLength + slidingInterval. After this change the problem went away.
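
The timeout rule quoted above can be sketched as a small calculation. This is only an illustration under stated assumptions: the class name WindowTimeoutSketch, the method messageTimeoutSecs, and the safety factor are hypothetical and not part of Storm, and the 30 s window with a 10 s slide is an example value rather than the original topology's setting. The Storm calls appear only in comments so that the snippet runs without Storm on the classpath.

```java
public class WindowTimeoutSketch {

    /**
     * Hypothetical helper: returns a message timeout in seconds that is a
     * multiple of (windowLength + slidingInterval), so that tuples are acked
     * on expiry from the window before they can time out and be replayed.
     */
    static int messageTimeoutSecs(int windowLengthSecs, int slidingIntervalSecs, int safetyFactor) {
        if (windowLengthSecs <= 0 || slidingIntervalSecs <= 0) {
            throw new IllegalArgumentException("window length and sliding interval must be positive");
        }
        if (safetyFactor < 2) {
            // A factor of 1 would only equal the sum, not exceed it.
            throw new IllegalArgumentException("safety factor must be at least 2");
        }
        // addExact/multiplyExact throw ArithmeticException instead of silently overflowing.
        return Math.multiplyExact(Math.addExact(windowLengthSecs, slidingIntervalSecs), safetyFactor);
    }

    public static void main(String[] args) {
        int windowLengthSecs = 30;    // assumed example value
        int slidingIntervalSecs = 10; // assumed example value
        int timeout = messageTimeoutSecs(windowLengthSecs, slidingIntervalSecs, 3);
        System.out.println("topology.message.timeout.secs = " + timeout);

        // In the real topology the value would be applied roughly like this:
        //   Config conf = new Config();
        //   conf.setMessageTimeoutSecs(timeout);
        // and the windowed bolt would be declared with matching durations:
        //   bolt.withWindow(new Duration(30, TimeUnit.SECONDS),
        //                   new Duration(10, TimeUnit.SECONDS));
    }
}
```

How large the safety factor should be is a judgment call. The Storm documentation only asks for a timeout "sufficiently more than" windowLength + slidingInterval, so the factor of 3 here is an arbitrary choice for the example.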

Reposted from: https://my.oschina.net/drl/blog/727754
