Deep Reinforcement Learning (1): Deep Q Network (DQN)

DQN combines deep learning with reinforcement learning to solve control problems with high-dimensional inputs. It constructs the loss function from the Q-learning target value, uses experience replay to break correlations in the data, and uses two networks to stabilize the updates.


Original post: https://blog.youkuaiyun.com/LagrangeSK/article/details/80321265

1. Background

DeepMind's 2013 paper *Playing Atari with Deep Reinforcement Learning* points out that learning to control an agent directly from high-dimensional sensory inputs (such as vision or speech) is a major challenge for reinforcement learning (RL).

Many earlier RL algorithms relied on hand-crafted features combined with linear function approximation (of the value function or the policy), so the performance of these systems depended heavily on the quality of the chosen features.

Deep learning (DL), and convolutional neural networks (CNNs) in particular, can extract high-level features from images very well, so it is natural to ask whether it can be applied to reinforcement learning (RL).

2. Challenges in Combining DL and RL

Naturally, we first need to look at the challenges of combining DL with RL:

  • Most successful applications of deep learning rely on datasets with high-quality labels, whereas RL has no explicit labels and must learn from a reward signal that is delayed (and possibly noisy).
  • Deep learning usually assumes that samples are independent and identically distributed, but in RL we typically encounter sequences of highly correlated states, and the state distribution itself changes as the policy changes.
  • Earlier work showed that representing the value function with a nonlinear network leads to instability and convergence difficulties.

3. DQN's Solutions

DQN combines a convolutional neural network (CNN) with Q-learning and addresses the problems above as follows:

  • Use the Q-learning target value to construct the labels for DL, and thus the DL loss function;
  • Use experience replay to break the correlations in the data;
  • Use one CNN (MainNet) to produce the current Q-values and another CNN (TargetNet) to produce the target Q-values. (Introduced in the 2015 DeepMind paper *Human-level Control Through Deep Reinforcement Learning*, the updated version of DQN.)

3.1 Constructing the loss function

The basics of RL are not repeated here. The Q-learning update rule is

$$Q(s,a) \leftarrow Q(s,a) + \alpha\left[r + \gamma \max_{a'} Q(s',a') - Q(s,a)\right]$$

DQN uses the Q-learning target value as the label for the network,

$$\text{Target}Q = r + \gamma \max_{a'} Q(s',a';\theta)$$

and constructs the loss function as the mean squared error between this target and the current network output:

$$L(\theta) = \mathbb{E}\left[\left(\text{Target}Q - Q(s,a;\theta)\right)^{2}\right]$$

where $\theta$ denotes the network parameters. (In the 2015 version, the target Q-value is computed by a separate target network with its own parameters $\theta^{-}$; see Section 3.3.)
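
As an illustration only (not code from the original post), the following PyTorch sketch computes this loss for a minibatch of transitions. The names `q_net`, `target_net`, and the batch layout are assumptions of this example, and the target network corresponds to the 2015 variant described in Section 3.3:

```python
import torch
import torch.nn.functional as F

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """TD loss L(theta) = E[(r + gamma * max_a' Q(s',a';theta^-) - Q(s,a;theta))^2]."""
    states, actions, rewards, next_states, dones = batch

    # Q(s, a; theta): pick the Q-value of the action that was actually taken
    q_sa = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)

    # Target r + gamma * max_a' Q(s', a'; theta^-); no gradient flows through
    # the target network, and terminal transitions get no bootstrap term.
    with torch.no_grad():
        next_q = target_net(next_states).max(dim=1).values
        target_q = rewards + gamma * (1.0 - dones) * next_q

    return F.mse_loss(q_sa, target_q)
```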

3.2 Experience replay

To make RL data (correlated, non-i.i.d. data) closer to what DL expects (independent and identically distributed data), DQN maintains a "replay memory" during learning: the state, action, next state (state_), and reward observed over a period of time are stored in this memory, and each time the neural network is trained, a batch of transitions is drawn at random from the memory. This shuffles the original ordering of the data and weakens its correlations.
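
A minimal replay-memory sketch (my own illustration; the class and method names such as `ReplayBuffer.push`/`sample` are made up for this example, not taken from the original post):

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size memory; uniform random sampling breaks temporal correlation."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # old transitions are discarded first

    def push(self, state, action, reward, next_state, done):
        # Store one transition (s, a, r, s', done)
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniformly sample a minibatch, breaking the order of the original trajectory
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```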

3.3 Target network

To make the algorithm more stable, two neural networks with identical structure are maintained: one whose parameters are updated continuously (MainNet) and one used to compute the target Q-values (TargetNet).

  • Initially, MainNet's parameters are copied to TargetNet;
  • MainNet then keeps updating its parameters while TargetNet's parameters stay fixed;
  • after a while, MainNet's parameters are copied to TargetNet again, and the cycle repeats.

In this way, the target Q-values stay fixed for a period of time, which makes the updates more stable.
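
A minimal sketch of this periodic hard update, assuming PyTorch modules and an assumed sync interval `TARGET_UPDATE_EVERY` (the network architecture and the gradient step on MainNet are placeholders):

```python
import copy
import torch.nn as nn

# A stand-in Q-network; the original DQN uses a CNN over stacked Atari frames,
# but a small MLP keeps this example self-contained.
main_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

# Initially, TargetNet is an exact copy of MainNet.
target_net = copy.deepcopy(main_net)

TARGET_UPDATE_EVERY = 1000  # assumed sync interval C; a tunable hyperparameter

for step in range(1, 10_001):
    # ... MainNet's parameters are updated here every step (training omitted) ...

    # Every C steps, overwrite TargetNet with MainNet's current parameters;
    # between syncs TargetNet is frozen, so the target Q-values stay stable.
    if step % TARGET_UPDATE_EVERY == 0:
        target_net.load_state_dict(main_net.state_dict())
```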

4. Algorithm Pseudocode

2013 version:
(pseudocode figure from the 2013 paper, not reproduced here)
2015 version:
(pseudocode figure from the 2015 paper, not reproduced here)

5. Algorithm Flow

2015 version:

(algorithm flow diagram from the 2015 paper, not reproduced here)
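
Since the pseudocode and flow-chart figures are not reproduced above, here is a rough sketch of a 2015-style DQN training loop that ties the previous pieces together. It is only an illustration under several assumptions: a classic Gym-style `env` whose `reset()` returns an observation and whose `step()` returns `(next_state, reward, done, info)`, the `ReplayBuffer` and `dqn_loss` sketches from the earlier sections, and made-up hyperparameter names.

```python
import random
import numpy as np
import torch

def train_dqn(env, q_net, target_net, buffer, optimizer,
              episodes=500, batch_size=32, gamma=0.99,
              epsilon=0.1, sync_every=1000):
    """Act epsilon-greedily, store transitions, train MainNet on random
    minibatches from the replay memory, and periodically sync TargetNet."""
    step = 0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            # Epsilon-greedy action selection from MainNet.
            if random.random() < epsilon:
                action = env.action_space.sample()
            else:
                with torch.no_grad():
                    q = q_net(torch.as_tensor(state, dtype=torch.float32).unsqueeze(0))
                action = int(q.argmax(dim=1).item())

            next_state, reward, done, _ = env.step(action)
            buffer.push(state, action, reward, next_state, float(done))
            state = next_state
            step += 1

            # Train on a random minibatch drawn from the replay memory.
            if len(buffer) >= batch_size:
                s, a, r, s2, d = buffer.sample(batch_size)
                batch = (torch.as_tensor(np.stack(s), dtype=torch.float32),
                         torch.as_tensor(a, dtype=torch.long),
                         torch.as_tensor(r, dtype=torch.float32),
                         torch.as_tensor(np.stack(s2), dtype=torch.float32),
                         torch.as_tensor(d, dtype=torch.float32))
                loss = dqn_loss(q_net, target_net, batch, gamma)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()

            # Hard update of the target network every sync_every steps.
            if step % sync_every == 0:
                target_net.load_state_dict(q_net.state_dict())
```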


6. Evaluation of the Algorithm

Problems it solves:

  • It avoids the curse of dimensionality of the Q-table in Q-learning by replacing the table with a continuous, learned approximation of the Q-values.
  • It resolves the mismatch between the data DL expects and the data RL produces (replay memory, fixed target-value network).

Remaining problems:

  • Actions are still chosen by taking the maximum Q-value, so DQN cannot be applied to problems with continuous action spaces.
  • It can only handle problems that require short-term memory, not those requiring long-term memory.
  • The CNN does not necessarily converge and requires careful hyperparameter tuning.

References:
https://blog.youkuaiyun.com/u013236946/article/details/72871858
Mnih et al., Human-level Control Through Deep Reinforcement Learning, 2015
Mnih et al., Playing Atari with Deep Reinforcement Learning, 2013
