Reinforcement Learning --by Richard Sutton

最新推荐文章于 2025-06-06 18:29:11 发布

原创最新推荐文章于 2025-06-06 18:29:11 发布 · 776 阅读

0 ·

CC 4.0 BY-SA版权

MachineLearning 专栏收录该内容

22 篇文章

订阅专栏

本文深入探讨了强化学习的核心概念，对比了监督学习和无监督学习，强调了试错搜索和延迟奖励的特点。文章还讨论了探索与利用的平衡，以及强化学习在目标导向代理与不确定环境交互中的关键作用。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

1.introduction

Reinforcement learning is learning what to do—how to map situations to actions—so as to maximize a numerical reward signal. The learner is not told which actions to take, but instead must discover which actions yield the most reward by trying them. In the most interesting and challenging cases, actions may a↵ect not only the immediate reward but also the next situation and, through that, all subsequent rewards. These two characteristics—trial-and-error search and delayed reward—are the two most important distinguishing features of reinforcement learning.

2.reinforcement learning & supervised learning &unsupervised learning

supervised learning：has label.
unsupervised learning :typically about finding structure hidden in collections of unlabeled data.
reinforcement learning ?* trying to maximize a reward signal instead of trying to find hidden structure. **

3.exploration and exploitation

To obtain a lot of reward, a reinforcement learning agent must prefer actions that it has tried in the past and found to be e↵ective in producing reward. But to discover such actions, it has to try actions that it has not selected before. The agent has to exploit what it has already experienced in order to obtain reward, but it also has to explore in order to make better action selections in the future.

❤️❤️❤️ Another key feature of reinforcement learning is that it explicitly considers the whole problem of a goal-directed agent interacting with an uncertain environment. This is in contrast to many approaches that consider subproblems without addressing how they might fit into a larger picture. For example, we have mentioned that much of machine learning research is concerned with supervised learning without explicitly specifying how such an ability would finally be useful. Other researchers have developed theories of planning with general goals, but without considering planning’s role in real-time decision making, or the question of where the predictive models necessary for planning would come from. Although these approaches have yielded many useful results, their focus on isolated subproblems is a significant limitation.
❤️❤️❤️All reinforcement learning agents have explicit goals。