Loss is its own Reward: Self-Supervision for Reinforcement Learning

Supervised training with specific labels

The authors use action, reward, state, and similar signals as labels for supervised training.
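As a rough illustration of what "using action, reward, and state as labels" can look like, the sketch below turns one trajectory of transitions into three supervised datasets. The note does not spell out the exact tasks, so the function name `make_label_datasets`, the three-way reward binning, and the input/label pairings are all assumptions made for this example, not the authors' confirmed setup.

```python
from typing import Dict, List, Sequence, Tuple

Example = Tuple[list, object]  # (input features, label)


def make_label_datasets(
    states: Sequence[Sequence[float]],
    actions: Sequence[int],
    rewards: Sequence[float],
) -> Dict[str, List[Example]]:
    """Build (input, label) pairs from one trajectory.

    states holds s_0 .. s_T (T + 1 entries); actions and rewards hold
    a_t and r_t for t = 0 .. T - 1. All pairings below are illustrative
    assumptions, not a confirmed reproduction of the paper.
    """
    if len(states) != len(actions) + 1 or len(actions) != len(rewards):
        raise ValueError("expected T + 1 states, T actions and T rewards")

    datasets: Dict[str, List[Example]] = {"action": [], "reward": [], "state": []}
    for t, (a, r) in enumerate(zip(actions, rewards)):
        s, s_next = list(states[t]), list(states[t + 1])

        # Action as label: predict which action led from s_t to s_{t+1}.
        datasets["action"].append((s + s_next, a))

        # Reward as label: bin the reward into 0 (negative), 1 (zero), 2 (positive).
        reward_class = (r > 0) - (r < 0) + 1
        datasets["reward"].append((s, reward_class))

        # State as label: predict s_{t+1} from s_t and the action taken.
        datasets["state"].append((s + [a], s_next))
    return datasets


if __name__ == "__main__":
    data = make_label_datasets(
        states=[[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]],
        actions=[1, 0],
        rewards=[0.0, 2.5],
    )
    for task, examples in data.items():
        print(task, examples)
```

The point of the sketch is that no human annotation is needed: every label is already present in the transitions the agent collects, so these losses can be trained alongside the usual reinforcement learning objective.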

 

2025-08-28 08:34:14.948 | WARNING | PID:76327 | definition.reward_shaping:78 - aisrv Action count mismatch: 2 actions for 4 junctions. Truncating or padding actions.
2025-08-28 08:34:14.950 | WARNING | PID:76327 | definition.reward_shaping:171 - aisrv Converted list action to integer: 1
2025-08-28 08:34:14.951 | WARNING | PID:76327 | definition.reward_shaping:190 - aisrv Junction 0: No lanes for phase 1, using enter lanes
2025-08-28 08:34:14.952 | ERROR | PID:76327 | definition.reward_shaping:220 - aisrv Error processing junction 0: name 'travel_reward' is not defined
2025-08-28 08:34:14.952 | WARNING | PID:76327 | definition.reward_shaping:171 - aisrv Converted list action to integer: 2
2025-08-28 08:34:14.953 | WARNING | PID:76327 | definition.reward_shaping:190 - aisrv Junction 1: No lanes for phase 2, using enter lanes
2025-08-28 08:34:14.954 | ERROR | PID:76327 | definition.reward_shaping:220 - aisrv Error processing junction 1: name 'travel_reward' is not defined
2025-08-28 08:34:14.954 | WARNING | PID:76327 | definition.reward_shaping:190 - aisrv Junction 2: No lanes for phase 0, using enter lanes
2025-08-28 08:34:14.955 | ERROR | PID:76327 | definition.reward_shaping:220 - aisrv Error processing junction 2: name 'travel_reward' is not defined
2025-08-28 08:34:14.956 | WARNING | PID:76327 | definition.reward_shaping:190 - aisrv Junction 3: No lanes for phase 0, using enter lanes
2025-08-28 08:34:14.957 | ERROR | PID:76327 | definition.reward_shaping:220 - aisrv Error processing junction 3: name 'travel_reward' is not defined
learner is alive
aisrv is alive
Error found in file: /data/projects/intelligent_traffic_lights_v2/log/aisrv/aisrv_kaiwu_rl_helper_pid76327_log_2025-08-28-08.log
Error content (line 24): {"time": "2025-08-28 08:34:14.952136", "level": "ERROR", "message": "aisrv Error processing junction 0: name 'travel_reward' is not defined", "file": "definition.py", "line": "220", "module": "aisrv", "process": "definition", "function": "reward_shaping", "stack": "", "pid": 76327}
2025-08-28 08:34:18.131 | WARNING | PID:76327 | definition.reward_shaping:78 - aisrv Action count mismatch: 2 actions for 4 junctions. Truncating or padding actions.
2025-08-28 08:34:18.133 | WARNING | PID:76327 | definition.reward_shaping:171 - aisrv Converted list action to integer: 1
2025-08-28 08:34:18.133 | WARNING | PID:76327 | definition.reward_shaping:190 - aisrv Junction 0: No lanes for phase 1, using enter lanes
2025-08-28 08:34:18.134 | ERROR | PID:76327 | definition.reward_shaping:220 - aisrv Error processing junction 0: name 'travel_reward' is not defined
2025-08-28 08:34:18.134 | WARNING | PID:76327 | definition.reward_shaping:171 - aisrv Converted list action to integer: 2
2025-08-28 08:34:18.135 | WARNING | PID:76327 | definition.reward_shaping:190 - aisrv Junction 1: No lanes for phase 2, using enter lanes
2025-08-28 08:34:18.135 | ERROR | PID:76327 | definition.reward_shaping:220 - aisrv Error processing junction 1: name 'travel_reward' is not defined
2025-08-28 08:34:18.136 | WARNING | PID:76327 | definition.reward_shaping:190 - aisrv Junction 2: No lanes for phase 0, using enter lanes
2025-08-28 08:34:18.137 | ERROR | PID:76327 | definition.reward_shaping:220 - aisrv Error processing junction 2: name 'travel_reward' is not defined
2025-08-28 08:34:18.137 | WARNING | PID:76327 | definition.reward_shaping:190 - aisrv Junction 3: No lanes for phase 0, using enter lanes
2025-08-28 08:34:18.138 | ERROR | PID:76327 | definition.reward_shaping:220 - aisrv Error processing junction 3: name 'travel_reward' is not defined
2025-08-28 08:34:18.140 | INFO | PID:76327 | train_workflow.workflow:98 - aisrv Avg Step Reward: 0.00, Epoch: 0, Data Length: 2
08-29