Problem statement:
To use reinforcement learning successfully in situations approaching real-world complexity, however, agents are confronted with a difficult task: they must derive efficient representations of the environment from high-dimensional sensory inputs, and use these to generalize past experience to new situations.
Model:
a deep Q-network (DQN)
combines RL with deep neural networks, so the agent can learn concepts such as object categories directly from raw sensory data
uses a deep CNN to approximate the optimal action-value function Q*(s, a) (defined below)
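For reference, the optimal action-value function being approximated is the maximum expected return achievable by any behaviour policy π after observing state s and taking action a, with future rewards discounted by γ per step:

```latex
Q^{*}(s,a) \;=\; \max_{\pi}\ \mathbb{E}\!\left[\, r_t + \gamma r_{t+1} + \gamma^{2} r_{t+2} + \cdots \,\middle|\, s_t = s,\ a_t = a,\ \pi \,\right]
```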
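As a concrete illustration, here is a minimal PyTorch sketch of such a Q-network. PyTorch itself and the class/variable names are my assumptions (the paper prescribes no framework); the layer shapes follow the architecture it reports: four stacked 84x84 grayscale frames in, one Q-value per action out.

```python
import torch
import torch.nn as nn

class DQN(nn.Module):
    """Maps a stack of 4 grayscale 84x84 frames to one Q-value per action."""
    def __init__(self, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),  # 7x7 spatial map after the convs
            nn.Linear(512, n_actions),              # Q(s, a) for every action a
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x / 255.0)  # scale raw uint8 pixels into [0, 1]
```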
Why RL is known to be unstable (or even to diverge) when a neural network represents Q:
- correlations in the sequence of observations (the data are not independent)
- a small update to Q may significantly change the policy, and therefore the data distribution
- correlations between the action-values Q and the target values r + γ max_a' Q(s', a')
Solutions:
- Experience replay: randomizes over the data, removing correlations in the observation sequence and smoothing over changes in the data distribution (see the replay-buffer sketch after this list)
- An iterative update that adjusts the action-values (Q) towards target values that are only periodically updated, thereby reducing correlations with the target (see the target-network sketch after this list)
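A minimal sketch of the experience-replay mechanism, assuming a plain Python buffer of (s, a, r, s', done) transitions; the class and parameter names here are illustrative, not from the paper:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity store of transitions; uniform random sampling breaks
    the temporal correlations between consecutive observations."""
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are evicted first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        batch = random.sample(self.buffer, batch_size)  # uniform, without replacement
        return list(zip(*batch))  # (states, actions, rewards, next_states, dones)

    def __len__(self):
        return len(self.buffer)
```

Because each minibatch mixes transitions from many points in time (and potentially many episodes), consecutive gradient steps no longer train on highly correlated data.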
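And a sketch of the periodically updated target, reusing the hypothetical `DQN` class from above; `GAMMA`, `SYNC_EVERY` (the paper's update period C), and the function names are assumed values, not the paper's code:

```python
import copy
import torch

GAMMA = 0.99          # discount factor
SYNC_EVERY = 10_000   # C: steps between target-network refreshes (assumed value)

q_net = DQN(n_actions=4)           # online network (class from the sketch above)
target_net = copy.deepcopy(q_net)  # frozen copy holding the target parameters

def td_targets(rewards, next_states, dones):
    """y = r + GAMMA * max_a' Q(s', a'; theta^-), with theta^- held fixed
    between syncs so the targets do not chase every update to the online Q."""
    with torch.no_grad():                                    # no gradient through targets
        next_q = target_net(next_states).max(dim=1).values   # max over actions a'
    return rewards + GAMMA * (1.0 - dones.float()) * next_q  # no bootstrap at terminal s'

def maybe_sync(step: int):
    """Every SYNC_EVERY gradient steps, copy the online weights into the target."""
    if step % SYNC_EVERY == 0:
        target_net.load_state_dict(q_net.state_dict())
```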