Multi-agent and distributed RL [Topic under construction]

This post looks at the challenges of applying reinforcement learning in multi-agent settings and some solutions, focusing on the rollout policy, which simplifies the action space by considering agents' actions one at a time while preserving decision quality. It also covers basic ideas of distributed RL, including Q-parallelization and Monte Carlo parallelization, which improve efficiency by decomposing the problem and making approximate computations.

It is unrealistic to treat the environment as static with only one agent: it is almost always necessary to consider how multiple agents react to each other's moves. With multiple agents, the control (action) space grows exponentially in the number of agents: for a binary action and 10 agents, the joint action vector can take 2^10 = 1024 values. Some techniques are needed to deal with this large space.
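
As a minimal illustration of this blow-up, using the numbers from the example above:

```python
# 10 agents with a binary action each give 2**10 = 1024 joint actions
# to search over; adding one more agent doubles the count.
import itertools

n_agents = 10
joint_actions = list(itertools.product([0, 1], repeat=n_agents))
print(len(joint_actions))  # 1024
```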

As discussed in a talk by Bertsekas (https://www.youtube.com/watch?v=nTPuL6iVuwU), the multi-agent rollout policy has the agents take their actions sequentially, one by one, reducing the action space from n^m to n·m (for m agents with n actions each) without much loss of quality. A sketch follows below.
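
A minimal sketch of this agent-by-agent scheme, assuming two hypothetical stand-ins: `simulate(state, joint_action)`, which returns an estimated return for the joint action under the base policy (e.g. via Monte Carlo simulation), and `base_policy(state, i)`, which returns the base policy's action for agent i:

```python
# Multi-agent rollout sketch: agents fix their actions one at a time.
# `simulate` and `base_policy` are hypothetical stand-ins, not a real API.
def multiagent_rollout(state, n_agents, actions, simulate, base_policy):
    # Start from the base policy's joint action, then improve it
    # one agent at a time with the other components held fixed.
    joint = [base_policy(state, i) for i in range(n_agents)]
    for i in range(n_agents):
        best_a, best_val = joint[i], float('-inf')
        for a in actions:  # n evaluations for agent i
            candidate = joint[:i] + [a] + joint[i + 1:]
            val = simulate(state, candidate)
            if val > best_val:
                best_a, best_val = a, val
        joint[i] = best_a  # commit agent i's action and move on
    return joint  # n * m calls to simulate instead of n ** m
```

Each agent optimizes only its own component, so one pass costs n·m calls to `simulate` rather than the n^m needed to enumerate all joint actions.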

 

Distributed RL studies how to decompose the problem and make approximations via asynchronous computing. Some basic ideas are:

- Q-parallelization: parallelize the argmax part of Q-learning, i.e. the estimation of the one-step lookahead value after each possible action (see the sketch after this list).

- Monte Carlo parallelization: independent rollout simulations are embarrassingly parallel, so they can be run on separate workers and averaged (also sketched after this list).

- Multi-agent parallelization: use the multi-agent rollout method plus the two parallelizations above.

- Multi-processor parallelization: partition the state space and build a separate policy/value approximation on each subset.
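
Below is a minimal sketch of the first two ideas. `env_model(state, action) -> (next_state, reward)`, `value(state)`, `policy(state)`, and `env_step(state, action) -> (next_state, reward)` are hypothetical stand-ins, and a thread pool stands in for the asynchronous workers a real system would use:

```python
from concurrent.futures import ThreadPoolExecutor

def q_parallel_argmax(state, actions, env_model, value, gamma=0.99):
    # Q-parallelization: evaluate the one-step lookahead value
    # r + gamma * V(s') for every candidate action in parallel,
    # then take the argmax over the results.
    def lookahead(action):
        next_state, reward = env_model(state, action)
        return reward + gamma * value(next_state)

    with ThreadPoolExecutor() as pool:
        q_values = list(pool.map(lookahead, actions))
    best = max(range(len(actions)), key=lambda k: q_values[k])
    return actions[best]

def mc_value(state, policy, env_step, n_rollouts=32, horizon=50, gamma=0.99):
    # Monte Carlo parallelization: independent rollouts are
    # embarrassingly parallel, so dispatch them to workers and average.
    def one_rollout(_):
        s, ret, discount = state, 0.0, 1.0
        for _ in range(horizon):
            s, r = env_step(s, policy(s))
            ret += discount * r
            discount *= gamma
        return ret

    with ThreadPoolExecutor() as pool:
        returns = list(pool.map(one_rollout, range(n_rollouts)))
    return sum(returns) / n_rollouts
```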

Note that rollout also has many applications to discrete/combinatorial problems; a toy example follows below.
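
As one such toy example, here is a sketch of rollout on the traveling salesman problem with nearest-neighbor completion as the base heuristic; `dist` is assumed to be a symmetric distance matrix, and all names are illustrative:

```python
# Rollout for TSP: score each candidate next city by the length of the
# tour the nearest-neighbor base heuristic produces from there.
def nn_completion(dist, tour, remaining):
    # Greedily complete a partial tour with the nearest unvisited city.
    tour, remaining = list(tour), set(remaining)
    while remaining:
        last = tour[-1]
        nxt = min(remaining, key=lambda c: dist[last][c])
        tour.append(nxt)
        remaining.remove(nxt)
    return tour

def tour_length(dist, tour):
    # Total length of the closed tour (returns to the starting city).
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

def rollout_tsp(dist):
    n = len(dist)
    tour, remaining = [0], set(range(1, n))
    while remaining:
        # One-step lookahead: pick the city whose base-heuristic
        # completion yields the shortest full tour.
        best = min(remaining, key=lambda c: tour_length(
            dist, nn_completion(dist, tour + [c], remaining - {c})))
        tour.append(best)
        remaining.remove(best)
    return tour
```

Under mild conditions on the base heuristic, the rollout tour is never worse than the plain nearest-neighbor tour, which is the cost-improvement property that makes rollout attractive on combinatorial problems.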

Distributed Multi-Agent Path Finding (DMPF) addresses how multiple agents in a shared environment can each plan efficient, conflict-free paths.

### Solution approaches

- **Negotiation-based methods**: agents coordinate their paths by communicating. For example, agents exchange their planned paths and, when a conflict is detected, renegotiate and replan.
- **Priority-based methods**: each agent is assigned a priority; higher-priority agents plan first, and lower-priority agents must plan around the higher-priority agents' paths (a toy sketch follows at the end of this section).

### Algorithms

- **Distributed A\***: A* is a classic path-search algorithm, and distributed A* extends it to multi-agent settings. Each agent runs A* independently to search for its path while communicating with the other agents to avoid conflicts.

```python
# Minimal A* example on a 2D numpy grid (0 = free, 1 = obstacle).
import heapq

def heuristic(a, b):
    # Manhattan distance, admissible on a 4-connected grid
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def astar(array, start, goal):
    neighbors = [(0, 1), (0, -1), (1, 0), (-1, 0)]
    close_set = set()
    came_from = {}
    gscore = {start: 0}
    fscore = {start: heuristic(start, goal)}
    oheap = [(fscore[start], start)]
    while oheap:
        current = heapq.heappop(oheap)[1]
        if current == goal:
            # Walk came_from back to the start to reconstruct the path
            path = [current]
            while current in came_from:
                current = came_from[current]
                path.append(current)
            return path[::-1]
        if current in close_set:
            continue  # stale heap entry
        close_set.add(current)
        for i, j in neighbors:
            neighbor = current[0] + i, current[1] + j
            # Skip cells outside the grid or blocked by obstacles
            if not (0 <= neighbor[0] < array.shape[0]
                    and 0 <= neighbor[1] < array.shape[1]):
                continue
            if array[neighbor[0]][neighbor[1]] == 1:
                continue
            if neighbor in close_set:
                continue
            tentative_g_score = gscore[current] + 1  # unit step cost
            if tentative_g_score < gscore.get(neighbor, float('inf')):
                came_from[neighbor] = current
                gscore[neighbor] = tentative_g_score
                fscore[neighbor] = tentative_g_score + heuristic(neighbor, goal)
                heapq.heappush(oheap, (fscore[neighbor], neighbor))
    return None  # no path exists
```

- **Conflict-Based Search (CBS)**: identifies conflicts between agents' paths and decomposes them into subproblems, gradually arriving at conflict-free paths.

### Research results

- In robotics, distributed multi-agent path-finding algorithms are widely used in multi-robot cooperation tasks, such as moving goods in warehouse logistics and coordinating autonomous vehicles in intelligent transportation systems.
- New algorithms and theory keep being proposed to improve the efficiency of distributed multi-agent path finding, for example by introducing machine-learning techniques to optimize the planning process.
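
To connect back to the priority-based method above, here is a crude sketch reusing the `astar()` function: agents plan in priority order, and each lower-priority agent treats every cell on a higher-priority agent's path as a static obstacle. Real planners reserve (cell, timestep) pairs instead, so this timing-free version is only illustrative:

```python
# Toy priority-based planning: reserve whole paths as static obstacles.
import numpy as np

def prioritized_paths(grid, agents):
    # `agents` is a list of (start, goal) pairs, highest priority first.
    blocked = grid.copy()
    paths = []
    for start, goal in agents:
        path = astar(blocked, start, goal)
        if path is None:
            return None  # a real planner would renegotiate/replan here
        paths.append(path)
        for cell in path:
            blocked[cell] = 1  # reserve the path for lower priorities
    return paths

# Toy usage: two agents on an empty 5x5 grid, one corridor per agent.
grid = np.zeros((5, 5), dtype=int)
print(prioritized_paths(grid, [((0, 0), (0, 4)), ((4, 0), (4, 4))]))
```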