Reinforcement Learning
-
Goal: learn how to take actions that maximize reward
-
agent and environment
-
environment -> state -> agent -> action -> environment -> reward & next state -> agent
-
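The loop above can be sketched in a few lines. This is only an illustration: `MatchEnv`, its `reset`/`step` methods, and `run_episode` are hypothetical names made up for this note, not the API of any particular library.

```python
import random


class MatchEnv:
    """Hypothetical toy environment: reward 1 when the action equals the state."""

    def __init__(self, episode_length=10, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.steps = 0
        self.state = 0

    def reset(self):
        """Start a new episode and return the initial state (0 or 1)."""
        self.steps = 0
        self.state = self.rng.randint(0, 1)
        return self.state

    def step(self, action):
        """Apply an action and return (next_state, reward, done)."""
        reward = 1.0 if action == self.state else 0.0
        self.state = self.rng.randint(0, 1)
        self.steps += 1
        done = self.steps >= self.episode_length
        return self.state, reward, done


def run_episode(env, policy):
    """Run one episode and return the total reward collected."""
    state = env.reset()                          # environment -> state
    total_reward, done = 0.0, False
    while not done:
        action = policy(state)                   # agent -> action
        state, reward, done = env.step(action)   # environment -> reward & next state
        total_reward += reward
    return total_reward


random_agent = random.Random(1)
print(run_episode(MatchEnv(), lambda state: random_agent.randint(0, 1)))
print(run_episode(MatchEnv(), lambda state: state))  # copies the state every step
```

The second policy always matches the state, so it collects the maximum reward for the episode; the random one usually does not.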
Examples (for each one, list the objective, state, action, and reward):
- cart-pole problem (inverted pendulum problem)
- robot locomotion
- Atari games
- Go
-
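Markov decision process (memoryless: the next state depends only on the current state and action)
defined by the tuple (S, A, R, P, γ): states, actions, rewards, transition probabilities, discount factor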
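As a rough sketch, the five components can be written down as a plain data structure. The field names, the two-state toy instance, and the helper functions below are assumptions for illustration only.

```python
from typing import Dict, List, NamedTuple, Tuple


class MDP(NamedTuple):
    states: List[str]                                      # S
    actions: List[str]                                     # A
    rewards: Dict[Tuple[str, str], float]                  # R(s, a)
    transitions: Dict[Tuple[str, str], Dict[str, float]]   # P(s' | s, a)
    gamma: float                                           # discount factor


def is_valid(mdp: MDP) -> bool:
    """Check that every P(. | s, a) sums to 1."""
    return all(abs(sum(dist.values()) - 1.0) < 1e-9
               for dist in mdp.transitions.values())


def discounted_return(rewards: List[float], gamma: float) -> float:
    """Sum of gamma**t * r_t over one trajectory of rewards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


# Hypothetical two-state example. Memoryless: P is keyed only by the
# current (state, action) pair, never by earlier history.
toy = MDP(
    states=["low", "high"],
    actions=["stay", "move"],
    rewards={("low", "stay"): 0.0, ("low", "move"): 0.0,
             ("high", "stay"): 1.0, ("high", "move"): 0.0},
    transitions={("low", "stay"): {"low": 1.0},
                 ("low", "move"): {"high": 0.8, "low": 0.2},
                 ("high", "stay"): {"high": 1.0},
                 ("high", "move"): {"low": 1.0}},
    gamma=0.9,
)
print(is_valid(toy))
print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))  # 1 + 0.5 + 0.25
```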
Definitions: value function and Q-value function
Value function: how good is a state? Q-value function: how good is a state-action pair?
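In the usual notation (an assumption here, since these notes do not write the formulas out), with a policy π, discount factor γ, and reward r_t at step t, both quantities are expected discounted sums of future rewards:

```latex
V^{\pi}(s) = \mathbb{E}\left[\sum_{t \ge 0} \gamma^{t} r_t \,\middle|\, s_0 = s,\ \pi\right]
\qquad
Q^{\pi}(s, a) = \mathbb{E}\left[\sum_{t \ge 0} \gamma^{t} r_t \,\middle|\, s_0 = s,\ a_0 = a,\ \pi\right]
```

The only difference is that Q also fixes the first action a before following π.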
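Bellman equation: if the best action is chosen at every state along the way, the overall return is optimal too
The optimal policy is the one that takes the best action, according to the optimal Q-function, at every step.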
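A minimal tabular sketch of this idea, assuming a small MDP whose rewards and transitions are known: apply the Bellman backup repeatedly until Q settles, then act greedily. The toy MDP and the function names are hypothetical, and this is Q-value iteration with a known model, not Q-learning from samples.

```python
def q_value_iteration(states, actions, reward, transition, gamma, sweeps=500):
    """Repeatedly apply the Bellman optimality backup:
    Q(s, a) <- R(s, a) + gamma * sum_s' P(s' | s, a) * max_a' Q(s', a')
    """
    q = {(s, a): 0.0 for s in states for a in actions}
    for _ in range(sweeps):
        new_q = {}
        for s in states:
            for a in actions:
                expected_next = sum(
                    prob * max(q[(s2, a2)] for a2 in actions)
                    for s2, prob in transition[(s, a)].items()
                )
                new_q[(s, a)] = reward[(s, a)] + gamma * expected_next
        q = new_q
    return q


def greedy_policy(q, states, actions):
    """Pick the action with the highest Q-value in each state."""
    return {s: max(actions, key=lambda a: q[(s, a)]) for s in states}


# Hypothetical deterministic toy MDP: staying in "high" pays 1 per step.
states = ["low", "high"]
actions = ["stay", "move"]
reward = {("low", "stay"): 0.0, ("low", "move"): 0.0,
          ("high", "stay"): 1.0, ("high", "move"): 0.0}
transition = {("low", "stay"): {"low": 1.0}, ("low", "move"): {"high": 1.0},
              ("high", "stay"): {"high": 1.0}, ("high", "move"): {"low": 1.0}}

q = q_value_iteration(states, actions, reward, transition, gamma=0.9)
policy = greedy_policy(q, states, actions)
print(policy)
```

With γ = 0.9 the value of staying in "high" works out to 1 / (1 - 0.9) = 10, and the greedy policy moves out of "low" and stays in "high".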
The Q-function is too complex to compute exactly when the state space is large, so we approximate it with a neural network (a Q-network).
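One common way to make this concrete, stated as an assumption because these notes do not give a loss, is to pick network parameters θ so that Q(s, a; θ) roughly satisfies the Bellman equation. The squared gap between its two sides is then the training loss; θ⁻ is notation introduced here for the parameters held fixed while the target y is computed.

```latex
y = r + \gamma \max_{a'} Q(s', a'; \theta^{-}),
\qquad
L(\theta) = \mathbb{E}_{(s, a, r, s')}\left[\big(y - Q(s, a; \theta)\big)^{2}\right]
```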
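Training the Q-network: experience replay
Store the transitions (state, action, reward, next state) in a set, the replay memory, then sample a batch from it at random and use that batch as the training set.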
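A minimal sketch of such a replay memory, using only the standard library. The class name, the field names, and the fixed capacity that evicts the oldest transitions are illustrative assumptions; the notes only say to collect transitions and sample a batch.

```python
import random
from collections import deque, namedtuple

Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayMemory:
    """Fixed-size store of transitions with uniform random sampling."""

    def __init__(self, capacity, seed=None):
        # deque(maxlen=...) silently drops the oldest item once full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        """Store one transition."""
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a batch without replacement, uniformly at random."""
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


memory = ReplayMemory(capacity=3, seed=0)
for t in range(5):
    memory.push(state=t, action=0, reward=1.0, next_state=t + 1, done=False)

batch = memory.sample(batch_size=2)   # this batch is the training set for one update
print(len(memory), [tr.state for tr in batch])
```

Sampling at random rather than training on consecutive steps breaks the correlation between neighbouring transitions, and each stored transition can be reused across several updates.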
Papers on Q-learning
-





