It's unrealistic to treat the environment as static with only a single agent: it is almost always necessary to account for how multiple agents react to each other's moves. With multiple agents, the control (action) space grows exponentially in the number of agents. For example, with a binary action and 10 agents, there are 2^10 possible joint action vectors. Some techniques are needed to deal with this large space.
As discussed in Bertsekas's talk https://www.youtube.com/watch?v=nTPuL6iVuwU, the multi-agent rollout policy has the agents take their actions sequentially, one by one, which reduces the search over the action space from n^m to n·m (m agents, n actions each) without much loss in quality.
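As a minimal sketch of this idea (the value function, base policy, and problem data below are made-up stand-ins, not from the talk), compare exhaustive joint maximization with agent-by-agent rollout:

```python
import itertools

ACTIONS = (0, 1)   # binary action per agent
NUM_AGENTS = 10

def one_step_value(state, joint_action):
    """Hypothetical one-step lookahead value Q(state, joint_action).
    A real problem would simulate one step and add the base policy's
    cost-to-go from the resulting state; this is a toy stand-in."""
    return sum(a * ((i + state) % 3) for i, a in enumerate(joint_action))

def base_policy(state, agent):
    """Hypothetical base (heuristic) policy for a single agent."""
    return 0

def joint_maximization(state):
    """Exhaustive search over all n^m joint actions (2^10 = 1024 here)."""
    return max(itertools.product(ACTIONS, repeat=NUM_AGENTS),
               key=lambda ja: one_step_value(state, ja))

def multiagent_rollout(state):
    """Agents choose actions one at a time: agent i optimizes only its own
    action, keeping the actions already chosen by agents 0..i-1 and assuming
    agents i+1..m-1 follow the base policy.  Only n*m = 20 evaluations."""
    chosen = []
    for i in range(NUM_AGENTS):
        tail = [base_policy(state, j) for j in range(i + 1, NUM_AGENTS)]
        best = max(ACTIONS,
                   key=lambda a: one_step_value(state, chosen + [a] + tail))
        chosen.append(best)
    return chosen

print(joint_maximization(0), multiagent_rollout(0))
```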
Distributed RL studies how to decompose the problem and make approximations via asynchronous computing. Some basic ideas are:
- Q-parallelization: parallelize the argmax step of Q-learning, i.e. estimate the one-step lookahead value of each possible action in parallel (see the sketch after this list).
- Monte-Carlo parallelization: run the Monte-Carlo rollout simulations in parallel (also noted in the sketch below).
- Multi-agent parallelization: combine the multi-agent rollout method with the two parallelization schemes above.
- Multi-processor parallelization: partition the state space and build a separate policy/value approximation on each subset.
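To make the first two ideas concrete, here is a rough sketch using Python's standard concurrent.futures; the toy dynamics, reward, and parameter values are made up for illustration and are not taken from the talk:

```python
import random
from concurrent.futures import ProcessPoolExecutor

def rollout_return(args):
    """Hypothetical Monte-Carlo rollout: apply `action` in `state`, then
    follow a simple base policy for a fixed horizon; toy dynamics/reward."""
    state, action, seed = args
    rng = random.Random(seed)
    total, s = 0.0, (state + action) % 100
    for _ in range(20):
        total += -abs(s - 50) / 100.0
        s = (s + rng.choice([-1, 0, 1])) % 100
    return total

def q_estimate(state, action, num_sims=32):
    """Serial Monte-Carlo estimate of the one-step lookahead value.
    Monte-Carlo parallelization would instead split these `num_sims`
    independent rollouts across worker processes."""
    return sum(rollout_return((state, action, k)) for k in range(num_sims)) / num_sims

def parallel_argmax(state, actions=(0, 1, 2, 3)):
    """Q-parallelization: each candidate action's lookahead value is an
    independent computation, so the argmax is parallelized across actions."""
    with ProcessPoolExecutor() as pool:
        values = list(pool.map(q_estimate, [state] * len(actions), actions))
    return max(zip(actions, values), key=lambda av: av[1])[0]

if __name__ == "__main__":   # guard needed for process pools on some platforms
    print(parallel_argmax(state=10))
```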
Note that rollout also has many applications to discrete/combinatorial problems.
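For instance, a rollout can use a greedy heuristic as the base policy and improve on it. Here is a minimal sketch for a toy 0/1 knapsack instance (the greedy base heuristic, item data, and capacity are all made up for illustration):

```python
def greedy_complete(items, capacity, chosen, excluded):
    """Base heuristic: complete a partial solution by taking remaining items
    in order of value density while they still fit."""
    value = sum(items[i][1] for i in chosen)
    cap = capacity - sum(items[i][0] for i in chosen)
    rest = [i for i in range(len(items)) if i not in chosen and i not in excluded]
    for i in sorted(rest, key=lambda i: -items[i][1] / items[i][0]):
        if items[i][0] <= cap:
            cap -= items[i][0]
            value += items[i][1]
    return value

def rollout_knapsack(items, capacity):
    """Rollout: decide item by item (take / skip); each branch is scored by
    completing it with the greedy base heuristic, so the result is never
    worse than plain greedy."""
    chosen, excluded = [], set()
    for i in range(len(items)):
        fits = sum(items[j][0] for j in chosen) + items[i][0] <= capacity
        take = greedy_complete(items, capacity, chosen + [i], excluded) if fits else float("-inf")
        skip = greedy_complete(items, capacity, chosen, excluded | {i})
        if take >= skip:
            chosen.append(i)
        else:
            excluded.add(i)
    return chosen, sum(items[i][1] for i in chosen)

# items as (weight, value) pairs; toy data
print(rollout_knapsack([(3, 10), (4, 4), (2, 7), (5, 8)], capacity=7))
```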

This post looks at the challenges of applying reinforcement learning in multi-agent environments and how to address them, focusing on the rollout policy, which simplifies the action space by considering the agents' actions one at a time while preserving decision quality. It also covers basic ideas of distributed reinforcement learning, including Q-parallelization and Monte-Carlo parallelization, which aim to improve efficiency by decomposing the problem and computing approximations.
