RL Policy-Based: Actor-Critic, A3C, DPG, DDPG, TRPO, PPO

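Copyright notice: This is an original article by the blog author, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.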

Policy-based RL: algorithms built on the policy gradient (PG):

PG basics: REINFORCE

PG extensions: Actor-Critic, A3C, DPG, DDPG, TRPO, PPO
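
As a rough illustration of the REINFORCE idea that the extensions build on, the sketch below trains a softmax policy on a toy multi-armed bandit using only the Python standard library. The bandit setup, the function names (`softmax`, `reinforce_bandit`), the running-average baseline, and all hyperparameters are assumptions made for this example; they are not taken from Williams (1992).

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into probabilities (numerically stable)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_means, steps=2000, lr=0.1, seed=0):
    """REINFORCE-style update on a toy bandit (hypothetical setup)."""
    rng = random.Random(seed)
    prefs = [0.0] * len(arm_means)  # policy parameters, one per arm
    baseline = 0.0                  # running average of observed rewards
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        action = rng.choices(range(len(prefs)), weights=probs)[0]
        reward = rng.gauss(arm_means[action], 1.0)
        for k in range(len(prefs)):
            # d/d prefs[k] of log pi(action) for a softmax policy
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            prefs[k] += lr * (reward - baseline) * grad_log_pi
        baseline += (reward - baseline) / t
    return softmax(prefs)


if __name__ == "__main__":
    print(reinforce_bandit([0.0, 1.0]))
```

Calling `reinforce_bandit([0.0, 1.0])` returns the final action probabilities. With these settings the policy would typically move most of its probability onto the second arm, though the exact values depend on the seed.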

=============

REINFORCE Algorithms, Machine Learning, 1992
Ronald J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3):229-256, 1992
https://people.cs.umass.edu/~barto/courses/cs687/williams92simple.pdf

Actor-Critic Algorithms, NIPS 1999
https://papers.nips.cc/paper/1999/file/6449f44a102fde848669bdd9eb6b76fa-Paper.pdf
https://www.mit.edu/~jnt/Papers/J094-03-kon-actors.pdf

Asynchronous Advantage Actor-Critic, A3C, ICML 2016
https://arxiv.org/abs/1602.01783
https://github.com/dennybritz/reinforcement-learning/tree/master/PolicyGradient/a3c

Advantage Actor-Critic, A2C: a synchronous, deterministic variant of Asynchronous Advantage Actor-Critic (A3C)
https://github.com/openai/baselines/blob/master/baselines/a2c/a2c.py
https://openai.com/blog/baselines-acktr-a2c/
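
A2C and A3C both weight the policy update by an advantage estimate. One common form is the bootstrapped n-step return minus the critic's value prediction, which might be computed roughly as follows. The function name `n_step_advantages`, its argument layout, and the default `gamma` are assumptions for this sketch and are not copied from the linked implementations.

```python
def n_step_advantages(rewards, values, bootstrap_value, gamma=0.99):
    """Return (returns, advantages) for one rollout segment.

    rewards[i] and values[i] belong to step i of the segment;
    bootstrap_value is the critic's estimate for the state after the
    last step (use 0.0 if the episode ended there).
    """
    returns = []
    running = bootstrap_value
    # Walk backwards so each return reuses the one computed after it.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    advantages = [g - v for g, v in zip(returns, values)]
    return returns, advantages


if __name__ == "__main__":
    print(n_step_advantages([1.0, 1.0, 1.0], [0.5, 0.5, 0.5], 0.0, gamma=0.5))
```

In a synchronous setup, a segment like this would be collected from every worker before a single batched update; in the asynchronous variant, each worker applies its own update independently.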

Deterministic Policy Gradient Algorithms, DPG, ICML 2014
https://hal.inria.fr/file/index/docid/938992/filename/dpg-icml2014.pdf

Continuous Control with Deep Reinforcement Learning, DDPG, ICLR 2016
https://arxiv.org/abs/1509.02971
https://github.com/openai/baselines/tree/master/baselines/ddpg
https://spinningup.openai.com/en/latest/algorithms/ddpg.html

Distributed Distributional Deterministic Policy Gradients, D4PG, ICLR 2018
https://arxiv.org/abs/1804.08617

Trust Region Policy Optimization, TRPO, ICML 2015
https://arxiv.org/abs/1502.05477

Proximal Policy Optimization Algorithms, PPO, 2017
https://arxiv.org/abs/1707.06347
https://github.com/openai/baselines/tree/master/baselines/ppo1
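
For orientation, the clipped surrogate objective from the PPO paper can be sketched in plain Python as below. The function name `ppo_clip_objective` and the list-based inputs are assumptions for illustration; an actual implementation such as the linked baselines code works on batched tensors and adds value-function and entropy terms.

```python
def ppo_clip_objective(ratios, advantages, clip_eps=0.2):
    """Mean clipped surrogate objective (to be maximized).

    ratios[i] is pi_new(a_i | s_i) / pi_old(a_i | s_i) and
    advantages[i] is the advantage estimate for the same sample.
    """
    terms = []
    for ratio, adv in zip(ratios, advantages):
        # Keep the probability ratio inside [1 - eps, 1 + eps].
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Take the more pessimistic of the unclipped and clipped terms.
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms)


if __name__ == "__main__":
    print(ppo_clip_objective([1.5, 0.5], [1.0, -1.0]))
```

Taking the minimum means a sample stops contributing extra objective once its ratio has moved past the clip range in the direction the advantage favors, which is what keeps each update close to the old policy.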

Policy Gradient Algorithms, Lilian Weng, OpenAI, 2018~
https://lilianweng.github.io/lil-log/2018/04/08/policy-gradient-algorithms.html

A short summary of Actor-Critic related algorithms
https://zhuanlan.zhihu.com/p/29486661
https://blog.youkuaiyun.com/WASEFADG/article/details/81042818
An informal discussion of deep reinforcement learning (DRL): from AC (Actor-Critic) to A3C (Asynchronous Advantage Actor-Critic)
https://jinzhuojun.blog.youkuaiyun.com/article/details/72851548

Ref:

Reinforcement Learning: A Survey, 1996
https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a-html/rl-survey.html
REINFORCE Algorithms
https://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a-html/node37.html
