It's unrealistic to treat the environment as static with only a single agent: we almost always need to account for how multiple agents react to each other's moves. With multiple agents, however, the space of control (joint actions) grows exponentially in the number of agents. For example, with a binary action and 10 agents, there are 2^10 possible joint action vectors. Some techniques are needed to deal with this large space.
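As a quick sanity check on the numbers above, here is a minimal illustration (my own, not from the talk) that enumerates the joint action space of 10 binary-action agents:

```python
# With m agents that each choose among n actions, the joint action space has n**m elements.
from itertools import product

n_actions_per_agent = 2   # binary action
n_agents = 10

joint_actions = list(product(range(n_actions_per_agent), repeat=n_agents))
print(len(joint_actions))  # 2**10 = 1024 joint action vectors
```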
As discussed in Bertsekas's talk https://www.youtube.com/watch?v=nTPuL6iVuwU, the multi-agent rollout policy has the agents choose their actions sequentially, one at a time, which reduces the search over the action space from n^m to nm (for m agents with n actions each), without much loss in policy quality.
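Below is a minimal sketch of this one-agent-at-a-time rollout, under my own assumptions: `rollout_value(state, joint_action)` is a hypothetical helper that estimates the cost-to-go of applying `joint_action` and then following the base policy, and `base_policy(state, agent)` returns an agent's base-policy action. Neither appears in the original notes.

```python
# Sketch of multi-agent (one-agent-at-a-time) rollout, after Bertsekas's talk.
# Assumed/hypothetical pieces: rollout_value(state, joint_action) and
# base_policy(state, agent); lower value means better.

def multiagent_rollout_action(state, n_agents, actions, rollout_value, base_policy):
    """Choose a joint action by optimizing one agent at a time.

    Agent i picks its best action given the actions already fixed for agents
    0..i-1 and base-policy actions for agents i+1..m-1, so the search costs
    roughly n*m evaluations instead of n**m.
    """
    # Start from the base-policy joint action.
    chosen = [base_policy(state, i) for i in range(n_agents)]
    for i in range(n_agents):
        best_action, best_value = None, float("inf")
        for a in actions:                       # only n candidates for agent i
            candidate = list(chosen)
            candidate[i] = a                    # agents < i fixed, agents > i use base policy
            value = rollout_value(state, tuple(candidate))
            if value < best_value:
                best_action, best_value = a, value
        chosen[i] = best_action                 # commit agent i's action and move on
    return tuple(chosen)
```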
Distributed RL studies how to decompose the problem and make approximations via asynchronous computing. Some basic ideas are:
- Q-parallelization: parallelize the argmax part of Q-learning, i.e. estimating the one-step lookahead value after each possible action (see the sketch after this list).
- Monte-Carlo parallelization: run the rollout simulations used for value estimation in parallel.
- Multi-agent parallelization: use the multi-agent rollout method together with the two parallelizations above.
- Multi-processor parallelization: partition the state space and compute a separate policy/value approximation on each subset.
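To make the first two ideas concrete, here is a minimal sketch (my own illustration, not code from the talk) of parallelizing the argmax of a one-step lookahead: each candidate action's value is estimated by averaging Monte-Carlo rollouts, and the per-action evaluations run in parallel worker processes. `simulate_rollout` is a hypothetical stand-in for a real simulator plus base policy.

```python
# Sketch of Q-parallelization combined with Monte-Carlo rollout evaluation.
# Hypothetical piece: simulate_rollout(state, action, horizon) samples one
# trajectory that applies `action`, then follows a base policy, and returns
# its total cost (lower is better).
import random
from concurrent.futures import ProcessPoolExecutor
from statistics import mean

def simulate_rollout(state, action, horizon=20):
    # Placeholder dynamics: replace with a real simulator and base policy.
    random.seed(hash((state, action)) % (2**32))
    return sum(random.random() for _ in range(horizon))

def q_estimate(args):
    """Monte-Carlo estimate of the one-step lookahead cost of one action."""
    state, action, n_rollouts = args
    return action, mean(simulate_rollout(state, action) for _ in range(n_rollouts))

def parallel_lookahead(state, actions, n_rollouts=100):
    """Evaluate all candidate actions in parallel and return the lowest-cost one."""
    with ProcessPoolExecutor() as pool:
        results = pool.map(q_estimate, [(state, a, n_rollouts) for a in actions])
    return min(results, key=lambda pair: pair[1])[0]

if __name__ == "__main__":
    print(parallel_lookahead(state=0, actions=range(4)))
```

The same pattern extends to the multi-agent setting: inside each agent's one-at-a-time step of the rollout above, the n candidate actions can be evaluated by parallel Monte-Carlo simulations.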
Note that rollout also has many applications to discrete/combinatorial problems.