Actor:
language model
$$P(a_t \mid a_0, \dots, a_{t-1}) = P(a_t \mid c_t) \tag{1}$$
category
plan2vec
plan sequences classification
Observation sequence
$$\bar{o} = \sum_{i=1}^{n} \phi(o_i) \tag{2}$$
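Use $\bar{o}$ to find the most similar plan category, which gives the topic $T$ that the observation sequence belongs to; then select the $k$ most similar plans within that topic.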
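The two-stage lookup above can be sketched as follows. This is a minimal illustration, not the original implementation: the embedding table `phi`, the per-topic centroids, the plan vectors, and the use of cosine similarity are all assumptions made for the example, since the notes only say "most similar".

```python
import numpy as np

def embed_sequence(observations, phi):
    """Sum the embeddings phi(o_i) of the observations, as in Eq. (2)."""
    return np.sum([phi[o] for o in observations], axis=0)

def cosine(u, v):
    """Cosine similarity (assumed similarity measure for this sketch)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

def retrieve_plans(observations, phi, topic_centroids, plans_by_topic, k):
    """Pick the most similar topic T, then the k most similar plans in T."""
    o_bar = embed_sequence(observations, phi)
    # Stage 1: the topic whose (hypothetical) centroid is closest to o_bar.
    topic = max(topic_centroids, key=lambda t: cosine(o_bar, topic_centroids[t]))
    # Stage 2: rank only the plans inside that topic.
    ranked = sorted(plans_by_topic[topic],
                    key=lambda item: cosine(o_bar, item[1]),
                    reverse=True)
    return topic, [name for name, _ in ranked[:k]]

# Toy data, invented purely for illustration.
phi = {
    "open": np.array([1.0, 0.0]),
    "pour": np.array([0.8, 0.2]),
    "drive": np.array([0.0, 1.0]),
}
topic_centroids = {
    "cooking": np.array([1.0, 0.0]),
    "travel": np.array([0.0, 1.0]),
}
plans_by_topic = {
    "cooking": [
        ("make_tea", np.array([0.9, 0.1])),
        ("bake", np.array([0.5, 0.5])),
        ("boil", np.array([1.0, 0.0])),
    ],
    "travel": [("commute", np.array([0.0, 1.0]))],
}

topic, top_plans = retrieve_plans(["open", "pour"], phi,
                                  topic_centroids, plans_by_topic, k=2)
print(topic, top_plans)
```

Restricting the second stage to a single topic keeps the ranking step small, at the cost of missing a good plan whenever the topic is misclassified.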
$E[R_{1:\infty}]$
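$$
\begin{aligned}
\frac{\partial E[R_{1:\infty}]}{\partial \theta}
&= E\left[\frac{\partial}{\partial \theta} \log \pi(a \mid s)\,\bigl(Q^{\pi}(s,a) - V^{\pi}(s)\bigr)\right] \\
&= E\left[\frac{\partial}{\partial \theta} \log P(a \mid s)\,\Bigl(Q^{\pi}(s,a) - \sum_{a'} P(a' \mid s)\, Q^{\pi}(s,a')\Bigr)\right]
\end{aligned}
\tag{72}
$$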
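As a small numerical check of this gradient, the sketch below evaluates the expectation exactly for a single state with a softmax policy over its logits. The setup is an assumption for illustration only: the logits play the role of $\theta$, the Q-values are made-up numbers, and the baseline is taken to be the policy-weighted value $V(s) = \sum_a \pi(a \mid s)\,Q(s,a)$.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()

def expected_policy_gradient(logits, q_values, use_baseline=True):
    """Exact E_a[ d/dlogits log pi(a) * (Q(a) - V) ] for one state."""
    pi = softmax(logits)
    # Baseline V = sum_a pi(a) Q(a); set to 0 to drop the baseline.
    v = float(pi @ q_values) if use_baseline else 0.0
    grad = np.zeros_like(logits)
    for a in range(len(logits)):
        # For a softmax policy, d log pi(a) / d logits = onehot(a) - pi.
        grad_log_pi = -pi.copy()
        grad_log_pi[a] += 1.0
        grad += pi[a] * grad_log_pi * (q_values[a] - v)
    return grad

# Made-up numbers for illustration.
logits = np.array([0.5, -1.0, 2.0])
q_values = np.array([1.0, 0.0, 3.0])

g_with = expected_policy_gradient(logits, q_values, use_baseline=True)
g_without = expected_policy_gradient(logits, q_values, use_baseline=False)
print(g_with)
print(np.allclose(g_with, g_without))
```

In expectation the baseline leaves the gradient unchanged; its purpose is to reduce the variance of sampled estimates, which this exact computation does not show.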



