Celebrating Diversity in Shared Multi-Agent Reinforcement Learning

Motivation
- Significance:
MARL is useful for many real-world applications: sensor networks, traffic management, robot coordination, etc.
- Problems:
Effective policies are hard to learn in complex multi-agent scenarios, because the joint action-observation space grows rapidly with the number of agents. PDSP (policy decentralization with shared parameters) is commonly used to address this scalability problem.
The drawback of PDSP: tasks usually require diversified policies among agents, while shared parameters lead to similar behaviors (under similar observations); see the sketch after the keywords.
The tradeoff: share the parameters necessary to accelerate learning while still promoting diversity.
- Keywords:
MARL, PDSP, tradeoff
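
A minimal sketch (assumed names and shapes, not the paper's code) of why PDSP scales but homogenizes behavior: every agent queries the same network, so agents with similar observations act alike unless something agent-specific is injected.

```python
# Illustrative PDSP setup: one policy network reused by all agents.
import torch
import torch.nn as nn

class SharedPolicy(nn.Module):
    """A single policy network shared by every agent (the PDSP setup)."""
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # action logits

policy = SharedPolicy(obs_dim=8, n_actions=5)
obs = torch.randn(1, 8)
logits_agent_0 = policy(obs)  # agent 0
logits_agent_1 = policy(obs)  # agent 1, same observation
print(torch.equal(logits_agent_0, logits_agent_1))  # True: identical behavior
```

The common mitigation is appending a one-hot agent ID to the observation; CDS instead balances a shared component against learned, regularized agent-specific components (see the Model section below).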
Background:
Dec-POMDP (decentralized partially observable MDP), CTDE (centralized training with decentralized execution), IGM (individual-global-max); standard definitions are reproduced below.
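
These are standard definitions from the value-decomposition literature (QMIX/QTRAN/QPLEX), spelled out for reference:

```latex
% Dec-POMDP: n agents, states S, per-agent actions A_i, transition P,
% per-agent observations \Omega_i drawn via O, shared reward R, discount \gamma.
G = \langle \mathcal{N}, S, \{A_i\}_{i=1}^{n}, P, \{\Omega_i\}_{i=1}^{n}, O, R, \gamma \rangle

% IGM (individual-global-max): the greedy joint action of Q_tot must match
% the per-agent greedy actions, which is what enables decentralized execution (CTDE).
\arg\max_{\boldsymbol{a}} Q_{tot}(\boldsymbol{\tau}, \boldsymbol{a})
  = \Big( \arg\max_{a_1} Q_1(\tau_1, a_1), \; \dots, \; \arg\max_{a_n} Q_n(\tau_n, a_n) \Big)
```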
Model
- Structure:
diversity-driven MARL framework
- Theory:
Maximization of an information-theoretic objective (a variational lower bound is sketched after this list)
Action-Value Learning for Balancing Diversity and Sharing (a code sketch of the decomposition follows)
Overall Learning Objective
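
For the information-theoretic objective, a standard variational lower bound from diversity-driven RL (DIAYN-style identity discrimination); the exact CDS formulation may condition differently, so treat this as a sketch of the idea. Here $q_\xi$ is a learned posterior over agent identity $i$ given that agent's trajectory $\tau_i$:

```latex
% Mutual information between an agent's trajectory and its identity,
% lower-bounded via a learned posterior q_xi (non-negativity of the KL divergence):
I(\tau_i; i)
  = \mathcal{H}(i) - \mathcal{H}(i \mid \tau_i)
  \ge \mathcal{H}(i) + \mathbb{E}_{i,\,\tau_i}\!\left[ \log q_\xi(i \mid \tau_i) \right]
```

Maximizing the right-hand side makes each agent's trajectory identifiable, i.e., behaviorally distinct; in practice such a bound is typically converted into an intrinsic reward added to the task reward.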
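
And a minimal PyTorch sketch of the "balancing diversity and sharing" and "overall objective" items: each agent's Q-value is a shared term plus a small agent-specific correction, with an L1 penalty pushing the correction toward zero unless diversity pays off (the regularizer the Cons item below refers to). Module names, shapes, and the loss weighting are illustrative assumptions, not the authors' implementation:

```python
import torch
import torch.nn as nn

class SharedPlusLocalQ(nn.Module):
    """Per-agent Q = shared branch (one set of params) + local branch (per agent)."""
    def __init__(self, n_agents: int, obs_dim: int, n_actions: int, hidden: int = 64):
        super().__init__()
        # Shared branch: reused by all agents, drives sample-efficient learning.
        self.shared = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions))
        # Local branches: one small head per agent, regularized toward zero.
        self.local = nn.ModuleList(
            [nn.Linear(obs_dim, n_actions) for _ in range(n_agents)])

    def forward(self, agent_id: int, obs: torch.Tensor):
        q_shared = self.shared(obs)           # identical for every agent
        q_local = self.local[agent_id](obs)   # agent-specific deviation
        return q_shared + q_local, q_local

def td_plus_l1_loss(q_taken, td_target, q_local, l1_weight=1e-3):
    # Overall objective: TD error on the combined Q plus an L1 penalty on the
    # agent-specific part, so agents only diverge when it improves returns.
    td_loss = (q_taken - td_target.detach()).pow(2).mean()
    l1_loss = q_local.abs().mean()
    return td_loss + l1_weight * l1_loss
```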
Experiment
- Metrics:
Test win rate (not stated in this note, but the standard metric on both benchmarks)
- Benchmark tasks & Baselines:
Google Research Football (GRF), StarCraft II micromanagement (SMAC)
CDS (proposed), QPLEX, QMIX, MAVEN, EOI
- Design:
Demonstration of how the approach works.
- Conclusion:
State-of-the-art results.
Thinking
- Pros:
A novel mechanism for being diverse when necessary within shared multi-agent reinforcement learning
The balance between individual diversity and group coordination
- Cons:
No ablation studies are shown, and the L1 regularization term is not explained in detail
Links:
Video: https://sites.google.com/view/celebrate-diversity-shared
Code: https://github.com/lich14/CDS
This note discusses how the CDS framework addresses the behavioral similarity caused by parameter sharing in complex multi-agent environments, emphasizing the key point of promoting individual diversity while preserving learning efficiency. The background covers Dec-POMDP and CTDE; the method combines an information-theoretic objective with action-value learning. The experiments show state-of-the-art performance on tasks such as GRF and SMAC, comparing CDS against other MARL algorithms such as QPLEX and QMIX.