91. Generalization and Applications in Reinforcement Learning

An Analysis of Generalization and Applications in Reinforcement Learning

rust6ferris

Published 2025-09-12 16:22:54

License: CC 4.0 BY-SA

Column: Artificial Intelligence: A Modern Approach, Explained. Tags: reinforcement learning, Q-learning, SARSA

Article link: https://blog.youkuaiyun.com/rust6ferris/article/details/151887479

Generalization and Applications in Reinforcement Learning

1. Q-Learning versus SARSA

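Q-learning is an active reinforcement learning method that learns the value Q(s, a) of each action in each state. The pseudocode for a Q-learning agent is: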

function Q-LEARNING-AGENT(percept) returns an action
    inputs: percept, a percept indicating the current state s′ and reward signal r
    persistent: Q, a table of action values indexed by state and action, initially zero
              Nsa, a table of frequencies for state–action pairs, initially zero
              s, a, the previous state and action, initially null
    if s is not null then
        increment Nsa[s,a]
        Q[s,a] ← Q[s,a] + α(Nsa[s,a])(r + γ max_a′ Q[s′,a′] − Q[s,a])
    s, a ← s′, argmax_a′ f(Q[s′,a′], Nsa[s′,a′])
    return a
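
The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the learning-rate schedule `alpha`, the exploration function `f` with its parameters `r_plus` and `n_e`, the random tie-breaking, the explicit `terminal` flag, and the five-state corridor environment (`step`, `train`) are all assumptions introduced for the example. It also assumes that `r` is the reward for the transition that was just made.

```python
import random
from collections import defaultdict


class QLearningAgent:
    """Tabular exploratory Q-learning, following Q-LEARNING-AGENT above."""

    def __init__(self, actions, gamma=0.9, r_plus=2.0, n_e=5, seed=0):
        self.actions = list(actions)
        self.gamma = gamma
        self.r_plus = r_plus          # assumed optimistic value for under-tried pairs
        self.n_e = n_e                # assumed minimum number of tries per pair
        self.rng = random.Random(seed)
        self.Q = defaultdict(float)   # Q[s, a], initially zero
        self.Nsa = defaultdict(int)   # Nsa[s, a], initially zero
        self.s = self.a = None        # previous state and action

    def alpha(self, n):
        # Assumed learning-rate schedule; any rate that decays with n would do.
        return 60.0 / (59.0 + n)

    def f(self, u, n):
        # Assumed exploration function: stay optimistic until tried n_e times.
        return self.r_plus if n < self.n_e else u

    def reset(self):
        # Start a new episode; the learned tables are kept.
        self.s = self.a = None

    def __call__(self, s_next, r, terminal=False):
        if self.s is not None:
            key = (self.s, self.a)
            self.Nsa[key] += 1
            # Addition to the pseudocode: a terminal state has no future value.
            best_next = 0.0 if terminal else max(
                self.Q[s_next, a2] for a2 in self.actions)
            td_error = r + self.gamma * best_next - self.Q[key]
            self.Q[key] += self.alpha(self.Nsa[key]) * td_error
        if terminal:
            self.reset()
            return None
        # argmax over f(Q, Nsa); ties are broken at random (an assumption).
        scores = {a2: self.f(self.Q[s_next, a2], self.Nsa[s_next, a2])
                  for a2 in self.actions}
        best = max(scores.values())
        self.s = s_next
        self.a = self.rng.choice([a2 for a2 in self.actions if scores[a2] == best])
        return self.a


def step(state, action):
    """Hypothetical corridor: states 0..4, entering state 4 pays 1 and ends."""
    nxt = min(state + 1, 4) if action == "right" else max(state - 1, 0)
    return nxt, (1.0 if nxt == 4 else 0.0), nxt == 4


def train(agent, episodes=200, max_steps=100):
    for _ in range(episodes):
        agent.reset()
        state, reward, done = 0, 0.0, False
        for _ in range(max_steps):
            action = agent(state, reward, done)
            if done:
                break
            state, reward, done = step(state, action)
    return agent


def greedy_policy(agent, states):
    # Best action per state according to the learned Q table alone.
    return [max(agent.actions, key=lambda a: agent.Q[s, a]) for s in states]


if __name__ == "__main__":
    trained = train(QLearningAgent(["left", "right"]))
    print(greedy_policy(trained, range(4)))
```

Note that the update uses the maximum of Q over the next state's actions, regardless of which action the exploration function then selects. Ties are broken at random because, in this corridor, a fixed tie-break order can leave the agent repeating a zero-valued action once the optimistic bonus has expired.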

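Q-learning is more flexible than SARSA: a Q-learning agent can learn how to behave well while under the control of a wide variety of exploration policies. SARSA, by contrast, is appropriate when the overall policy is partly controlled by other agents or programs; in that case it is better to learn a Q-function for what will actually happen. However, Q-learning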