@TOC [Reinforcement Learning: Single-Agent] What Makes Exploration in RL Difficult?
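- Introduction: This article analyzes three main reasons why exploration is difficult in single-agent reinforcement learning: large state-action spaces, sparse and delayed rewards, and the white-noise problem.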
A. Large State–Action Space
- The difficulty of deep reinforcement learning (DRL) naturally increases as the state-action space grows. For example, real-world robots often have high-dimensional sensory inputs, such as images or high-frequency radar signals, and numerous degrees of freedom for delicate manipulation. Another practical example is the recommendation system, which has graph-structured data as states and a large number of discrete actions.
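To give a rough sense of how quickly these spaces grow, here is a minimal sketch that counts joint actions and image states. The specific numbers (10 bins per joint, up to 7 degrees of freedom, an 84x84 grayscale image) and the function names are illustrative assumptions, not values taken from the source.

```python
import math


def joint_action_count(degrees_of_freedom: int, bins_per_joint: int) -> int:
    """Number of joint actions when each degree of freedom is
    discretized independently into the same number of bins."""
    return bins_per_joint ** degrees_of_freedom


def log10_image_state_count(height: int, width: int, levels: int = 256) -> float:
    """log10 of the number of distinct grayscale images of a given size.

    The count itself (levels ** (height * width)) is far too large to
    print usefully, so we report its order of magnitude instead.
    """
    return height * width * math.log10(levels)


if __name__ == "__main__":
    # Hypothetical robot arm: 10 discrete settings per joint (an assumption).
    for dof in (1, 3, 7):
        print(f"{dof} DoF -> {joint_action_count(dof, bins_per_joint=10)} joint actions")

    # Hypothetical 84x84 8-bit grayscale observation (an assumption).
    magnitude = log10_image_state_count(84, 84)
    print(f"84x84 image -> about 10^{magnitude:.0f} possible observations")
```

The action count grows exponentially with the number of degrees of freedom, and the raw observation space is astronomically larger still, which is why exhaustive visitation of state-action pairs is not a realistic exploration strategy in such settings.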