RL Papers

Latest recommended article published on 2025-10-13 18:56:20

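Copyright notice: This is an original article by the blogger, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.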

Article tags: #RL #DQN #papers

https://zhuanlan.zhihu.com/p/21378532?refer=intelligentunit

General-purpose framework, DQN:

DQN: Playing Atari with Deep Reinforcement Learning

Nature DQN: Human-level Control Through Deep Reinforcement Learning

Introductory reading:

RL: Reinforcement Learning: An Introduction

POMDP direction: Partially Observable Markov Decision Processes

Improvements on the data side:

Prioritized experience replay: Prioritized Experience Replay

Improvements to training:

Asynchronous training (A3C): Asynchronous Methods for Deep Reinforcement Learning

Improvements to network architecture:

Adding an RNN: Deep Recurrent Q-Learning for Partially Observable MDPs

Adding transfer learning (TL): Actor-Mimic: Deep Multitask and Transfer Reinforcement Learning

Estimating state value and per-action advantage separately: Dueling Network Architectures for Deep Reinforcement Learning

DRQN with an added LSTM: Deep Recurrent Q-Learning for Partially Observable MDPs

Improvements to how the optimal solution is computed:

Improved target Q: Deep Reinforcement Learning with Double Q-learning

Trust region policy optimization (TRPO): Trust Region Policy Optimization
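
As a rough illustration of what the target Q change in the Double Q-learning entry above amounts to, here is a minimal sketch in plain Python. The function names, the list-based Q-values, and the numbers are all hypothetical and chosen only for illustration; they are not taken from any of the listed papers' code. The sketch assumes a discrete action space and two Q-value estimates for the next state: one from the online network and one from the target network.

```python
def dqn_target(reward, next_q_target, gamma=0.99, done=False):
    """Standard DQN target: the target network both selects and evaluates the action."""
    if done:
        return reward  # no bootstrapping past a terminal state
    return reward + gamma * max(next_q_target)


def double_dqn_target(reward, next_q_online, next_q_target, gamma=0.99, done=False):
    """Double DQN target: the online network selects the action,
    the target network evaluates that selected action."""
    if done:
        return reward
    # argmax over actions using the online network's estimates
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best_action]


# Hypothetical Q-values for the next state (three actions).
next_q_online = [1.0, 2.0, 0.5]
next_q_target = [3.0, 1.5, 0.2]

y_dqn = dqn_target(1.0, next_q_target, gamma=0.9)
y_double = double_dqn_target(1.0, next_q_online, next_q_target, gamma=0.9)
print(y_dqn, y_double)
```

With these made-up numbers the Double DQN target is smaller than the DQN target, because the action picked by the online estimate is not the one the target network rates highest. Decoupling selection from evaluation in this way is what is meant to reduce overestimation.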

Actor-based policy gradient (PG) direction:

Foundations: Policy Gradient Methods for Reinforcement Learning with Function Approximation

Interpreting the log-likelihood term: Why we consider log likelihood instead of Likelihood in Gaussian Distribution

DPG algorithm: Deterministic Policy Gradient Algorithms

DDPG algorithm: Continuous Control with Deep Reinforcement Learning

Improvements that extend the application domain:

Solving very hard games: Unifying Count-Based Exploration and Intrinsic Motivation

Continuous control: Continuous Deep Q-Learning with Model-based Acceleration

Platforms:

SC2: StarCraft II: A New Challenge for Reinforcement Learning

ELF: ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games
