Last Task - concurrent tasks for retry

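This article describes how, after a Kafka message is received, its content is streamed out to a third-party system, covering how the message status is saved, how failures are handled, and how a retry function is implemented. When streaming fails, the system sets the status to "failed", and the user can try again through a retry API. The design uses a database write lock (or a Redis lock) to prevent concurrent retries, and indexes the status field to keep locking cheap.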

Task requirement

Background

The product has a feature that, after receiving a Kafka message, streams the message content out to a third-party system. For example, some merchants use this feature to stream their orders out to a warehouse after the customer has paid for the order.

However, not every order is streamed out successfully: delivery can fail because of network issues, authentication errors, or runtime exceptions. Merchants want a retry function that streams the failed orders out again.

Design

After receiving the Kafka message, save it with status "in-progress" and stream it out. If streaming succeeds, set the status to "succeed"; otherwise set it to "failed". When the user calls the retry API, load the failed message and stream it out again.
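The lifecycle above can be sketched in a few lines. This is a minimal in-memory sketch, not the product's code: the class `StreamingLifecycle`, the `StreamStatus` enum and the `Predicate<String>` standing in for the third-party client are hypothetical, and the two maps stand in for the persisted message table. Its `retry` is a plain check-then-set, so it does not yet protect against concurrent retries; that problem is addressed in the sections that follow.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

/**
 * Hypothetical in-memory model of the lifecycle:
 * save as IN_PROGRESS -> stream out -> SUCCEED or FAILED, and retry FAILED ones.
 */
class StreamingLifecycle {

    enum StreamStatus { IN_PROGRESS, SUCCEED, FAILED }

    // messageId -> status (stand-in for the persisted message table)
    final Map<String, StreamStatus> statuses = new ConcurrentHashMap<>();
    // messageId -> saved content, so a retry re-sends the same data
    final Map<String, String> payloads = new ConcurrentHashMap<>();

    // Hypothetical third-party client: true = delivered, false = rejected
    private final Predicate<String> thirdParty;

    StreamingLifecycle(Predicate<String> thirdParty) {
        this.thirdParty = thirdParty;
    }

    /** Kafka path: persist with IN_PROGRESS first, then stream out. */
    void onMessage(String messageId, String payload) {
        payloads.put(messageId, payload);
        statuses.put(messageId, StreamStatus.IN_PROGRESS);
        streamOut(messageId);
    }

    /** Retry API path: only FAILED messages are streamed out again. Not thread-safe. */
    boolean retry(String messageId) {
        if (statuses.get(messageId) != StreamStatus.FAILED) {
            return false;
        }
        statuses.put(messageId, StreamStatus.IN_PROGRESS);
        streamOut(messageId);
        return true;
    }

    private void streamOut(String messageId) {
        boolean delivered;
        try {
            delivered = thirdParty.test(payloads.get(messageId));
        } catch (RuntimeException e) {
            delivered = false; // network, auth and runtime errors all end as FAILED
        }
        statuses.put(messageId, delivered ? StreamStatus.SUCCEED : StreamStatus.FAILED);
    }
}
```

For example, with a client that rejects the first call and accepts the second, `onMessage("order-1", ...)` leaves the message FAILED, the first `retry("order-1")` moves it to SUCCEED, and any further retry returns `false`.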

Concurrency concerns

The existing implementation that consumes the Kafka message already has its own retry mechanism:

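```java
// Consumer-side retries come from the "retryable" container factory configured elsewhere;
// "concurrency" lets several consumer threads run this listener in parallel.
@Transactional
@KafkaListener(topics = "${xxx}", containerFactory = "retryableKafkaListenerContainerFactory", concurrency = "#{@kafkaProperties.getConcurrencyValueForRetryableConsumers({'order-status-changed'}, ${service.app.instances:2})}")
public void processOrderStatusStreamingMessage(@Payload final Message<OrderStatusChangedPayload> message) {
    // ... (streaming logic elided in the original)
}
```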

We also need to guard against the user calling the "retry" API multiple times: each message must be streamed out only once.
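Why a plain "is it failed? then stream it" check is not enough can be shown with a small deterministic demo. The `CountDownLatch` exists only to force the bad interleaving (both requests read the status before either one updates it); in production the same overlap happens by chance. `NaiveRetryRace` and its fields are hypothetical names.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Deterministic demo of the race the retry API must avoid: two requests both
 * check "failed" before either writes "in-progress", so both stream out.
 * All names are hypothetical; the latch only exists to force this interleaving.
 */
class NaiveRetryRace {
    // Status of one message (stand-in for its DB row); volatile for visibility
    static volatile String status;
    static final AtomicInteger streamCalls = new AtomicInteger();

    /** Runs two unsynchronized retries and returns how many times we streamed out. */
    static int run() {
        status = "failed";
        streamCalls.set(0);
        CountDownLatch bothChecked = new CountDownLatch(2);
        Runnable naiveRetry = () -> {
            boolean isFailed = "failed".equals(status);   // check...
            bothChecked.countDown();
            try {
                bothChecked.await();                      // wait until the other request has checked too
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            if (isFailed) {                               // ...then act on a stale read
                status = "in-progress";
                streamCalls.incrementAndGet();            // placeholder for the stream-out call
            }
        };
        Thread first = new Thread(naiveRetry);
        Thread second = new Thread(naiveRetry);
        first.start();
        second.start();
        try {
            first.join();
            second.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return streamCalls.get();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints 2: the same message is streamed out twice
    }
}
```

`run()` returns 2, meaning the message is streamed out twice, which is exactly what the requirement forbids.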

Solution

On receiving a retry request, read the message with status "failed" under a write lock (for example `SELECT ... FOR UPDATE` inside a transaction), set its status to "in-progress", and then run the stream-out function.

Any concurrent request has to wait for the lock. When the first thread releases it, the status has already been set to "in-progress", so the waiting request no longer finds a "failed" message and does not call the stream-out function.
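The write-lock flow can be simulated in memory to check the reasoning. In this sketch a per-message `ReentrantLock` stands in for the row lock a database would take (for example with `SELECT ... FOR UPDATE` inside a transaction), and releasing it corresponds to committing that transaction; `LockedRetry` and the string statuses are hypothetical. The status check and update happen under the lock, while the stream-out call runs after the lock is released, matching the steps above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/**
 * In-memory sketch of "claim under write lock, then stream out".
 * The per-message ReentrantLock stands in for a DB row lock; names are hypothetical.
 */
class LockedRetry {
    final Map<String, String> statuses = new ConcurrentHashMap<>();
    private final Map<String, ReentrantLock> rowLocks = new ConcurrentHashMap<>();
    final AtomicInteger streamCalls = new AtomicInteger();

    /** Returns true only for the request that actually claimed the failed message. */
    boolean retry(String messageId) {
        ReentrantLock lock = rowLocks.computeIfAbsent(messageId, id -> new ReentrantLock());
        lock.lock();                          // concurrent requests block here
        try {
            // Re-read the status while holding the lock
            if (!"failed".equals(statuses.get(messageId))) {
                return false;                 // another request already claimed it
            }
            statuses.put(messageId, "in-progress");
        } finally {
            lock.unlock();                    // ~ committing the transaction releases the row lock
        }
        streamOut(messageId);                 // runs outside the lock
        return true;
    }

    private void streamOut(String messageId) {
        streamCalls.incrementAndGet();        // placeholder for the real third-party call
        statuses.put(messageId, "succeed");
    }

    public static void main(String[] args) throws InterruptedException {
        LockedRetry r = new LockedRetry();
        r.statuses.put("order-1", "failed");
        Thread a = new Thread(() -> r.retry("order-1"));
        Thread b = new Thread(() -> r.retry("order-1"));
        a.start();
        b.start();
        a.join();
        b.join();
        System.out.println(r.streamCalls.get()); // 1
    }
}
```

Whichever thread gets the lock first claims the message; the other then sees "in-progress" (or "succeed") and returns `false`, so `streamCalls` ends at 1. Keeping the third-party call outside the lock also means a slow warehouse endpoint does not hold the row lock.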

A DB write lock is not the best choice from a performance point of view, but it is simple and easy to implement. A Redis lock should be considered for better performance.
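A Redis lock is usually built on `SET key value NX PX <ttl>` plus a compare-and-delete release (commonly a small Lua script that deletes the key only if it still holds the caller's token). To keep the sketch runnable without a Redis server, `RedisLike` below is a hypothetical in-memory stand-in for those two operations; the key prefix `retry-lock:` and the class names are also assumptions. The lock only serialises retry requests: after acquiring it, the API would still check that the status is "failed" and set it to "in-progress" before streaming.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Sketch of a Redis-style lock guarding the retry API (all names hypothetical).
 * RedisLike mimics two Redis operations in memory:
 *   acquire: SET key token NX PX ttl
 *   release: delete the key only if it still holds our token (a Lua script in real Redis)
 */
class RedisStyleRetryLock {

    static final class RedisLike {
        private static final class Entry {
            final String token;
            final long expiresAt;
            Entry(String token, long expiresAt) {
                this.token = token;
                this.expiresAt = expiresAt;
            }
        }

        private final Map<String, Entry> store = new ConcurrentHashMap<>();

        /** Like SET key token NX PX ttl: succeeds only if the key is absent or expired. */
        boolean setNxPx(String key, String token, long ttlMillis) {
            long now = System.currentTimeMillis();
            Entry fresh = new Entry(token, now + ttlMillis);
            Entry winner = store.compute(key,
                    (k, old) -> (old == null || old.expiresAt <= now) ? fresh : old);
            return winner == fresh;
        }

        /** Compare-and-delete: only the current owner may release the lock. */
        boolean releaseIfOwner(String key, String token) {
            Entry current = store.get(key);
            return current != null && current.token.equals(token) && store.remove(key, current);
        }
    }

    private final RedisLike redis = new RedisLike();

    /** Returns an owner token if acquired, or null if another request holds the lock. */
    String tryLock(String messageId, long ttlMillis) {
        String token = UUID.randomUUID().toString();
        return redis.setNxPx("retry-lock:" + messageId, token, ttlMillis) ? token : null;
    }

    boolean unlock(String messageId, String token) {
        return redis.releaseIfOwner("retry-lock:" + messageId, token);
    }
}
```

The TTL keeps a crashed holder from blocking retries forever, and the owner token stops one request from releasing a lock that has since expired and been taken by another.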

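Because the write lock is taken by a query that filters on message status, the status column should be indexed when designing the message table, so that only the matching rows are locked rather than the whole table. (In MySQL InnoDB, for example, a locking read that cannot use a suitable index has to scan the table and locks every row it scans.)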
