Original [NeurIPS 2020] Inverse Rational Control with Partially Observable Continuous Nonlinear Dynamics
This algorithm belongs to the POMDP family, but it replaces the state of the environment with the belief of the agent, turning the POMDP into a belief MDP. The belief is a sufficient statistic for the posterior. We can calculate the probability that the environment… (a minimal belief-update sketch follows this entry)
2023-06-26 14:47:36 · 77 views · 1 like
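The belief-MDP construction mentioned in the excerpt can be illustrated with one discrete Bayes-filter step. The paper itself handles continuous nonlinear dynamics, so this tabular Python sketch (with hypothetical arrays `b`, `T`, `O`) is only an illustration of the idea, not the paper's method:

```python
import numpy as np

def belief_update(b, a, o, T, O):
    """One Bayes-filter step: the posterior belief after taking
    action a and receiving observation o.

    b: current belief over states, shape (S,)
    T: transitions, T[a, s, s2] = p(s2 | s, a)
    O: observations, O[a, s2, o] = p(o | s2, a)
    """
    predicted = b @ T[a]                 # predict: sum_s b(s) p(s2 | s, a)
    posterior = predicted * O[a, :, o]   # correct: weight by p(o | s2, a)
    return posterior / posterior.sum()   # renormalize to a distribution
```

Because the updated belief summarizes the entire action-observation history, a planner can treat beliefs as states, which is exactly the POMDP-to-belief-MDP transfer the post describes.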
Original [ICLR 2023] Explaining RL Decisions with Trajectories
1.1.1 Take the average of the output tokens.
1.1.2 Use the X-means method to cluster similar trajectories (see the sketch after this entry).
1.1.3 Identify the least change in the original data that leads to a change in the behavior of the RL agent.
1.1.4 For each cluster, train an optimal p…
2023-06-26 10:04:47 · 91 views · 1 like
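Steps 1.1.1–1.1.2 can be sketched as follows. X-means (which selects the number of clusters automatically via a BIC-style criterion) is swapped here for plain k-means with a fixed K, and `trajectories` is a hypothetical list of per-trajectory token-embedding matrices; treat this as a rough sketch rather than the paper's pipeline:

```python
import numpy as np
from sklearn.cluster import KMeans

def embed_trajectory(token_embeddings):
    """Step 1.1.1: represent a trajectory by the mean of its
    encoder output tokens (token_embeddings: shape (T, D))."""
    return token_embeddings.mean(axis=0)

def cluster_trajectories(trajectories, n_clusters=8):
    """Step 1.1.2 stand-in: the paper uses X-means, which picks the
    number of clusters automatically; plain k-means with a fixed K
    is used here for simplicity."""
    X = np.stack([embed_trajectory(t) for t in trajectories])
    return KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X)
```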
Original [PLOS] Identification of animal behavioral strategies by inverse reinforcement learning
The optimal policy that maximizes the cumulative net reward is obtained in the same way as in an LMDP. The fixed value function is estimated by maximizing L(v(s)), the likelihood of the sequential state transitions; λ is a positive parameter that… (a hedged reconstruction of the standard LMDP equations follows this entry)
2023-06-26 09:51:03 · 63 views · 1 like
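The equations stripped from this excerpt presumably follow the standard linearly-solvable MDP (LMDP) form of Todorov (2009), in which the optimal policy reweights the passive dynamics p(s′|s) by the exponentiated value. The regularizer R(v) below is a placeholder, since the excerpt only says that λ is a positive weight:

```latex
% Assumed standard LMDP form; the excerpt's own equations are not shown.
\pi^{*}(s' \mid s)
  = \frac{p(s' \mid s)\, e^{v(s')}}{\sum_{s''} p(s'' \mid s)\, e^{v(s'')}},
\qquad
\hat{v} = \arg\max_{v}\Big[\log L(v) - \lambda\, R(v)\Big]
```

Here log L(v) = Σ_t log π*(s_{t+1} | s_t; v) is the log-likelihood of the observed state sequence under the value-induced policy.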
Original [AABI 2022 Review] Kernel Density Bayesian Inverse Reinforcement Learning
ArXiv.
2023-06-26 09:48:37 · 100 views · 1 like
Original [ICLR 2021 Review] Behavioral Cloning from Noisy Demonstrations
The algorithm uses behavioral cloning to learn a policy and then uses that policy as a reward signal to update the policy in a policy-gradient step. This article deals with the problem that the input trajectories include non-optimal state-action pairs. The main advan… (a schematic sketch of this BC-as-reward loop follows this entry)
2023-06-26 09:43:33 · 119 views · 1 like
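A minimal sketch of the loop described in the excerpt, assuming a hypothetical `policy` object with a `log_prob(states, actions)` method returning per-sample log π(a|s); the paper's actual weighting and iteration scheme may differ:

```python
import torch

def bc_loss(policy, states, actions):
    """Behavioral cloning: maximize the log-likelihood of the
    demonstrated actions under the policy."""
    return -policy.log_prob(states, actions).mean()

def pg_step(policy, bc_policy, states, actions, optimizer):
    """Policy-gradient step in which the 'reward' for (s, a) is the
    frozen BC policy's log-probability, per the excerpt's description."""
    with torch.no_grad():
        reward = bc_policy.log_prob(states, actions)  # score from noisy demos
    loss = -(policy.log_prob(states, actions) * reward).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```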
Original [NeurIPS 2022 Review] Dynamic Inverse Reinforcement Learning for Characterizing Animal Behavior
The goal of the article is to learn a time-varying reward function with a low-rank representation: the function defines the reward of a state at time t, K is the number of goal maps, and α_{k,t} is the weight of goal map k at time t (a reconstruction of this decomposition follows this entry).
2023-06-25 17:22:24 · 166 views · 1 like
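From the excerpt's own definitions (K goal maps, weights α_{k,t}), the stripped formula is presumably the following decomposition; the goal-map symbol g_k is my notation, not necessarily the paper's:

```latex
% Reconstructed from the excerpt's definitions; g_k denotes goal map k.
r_t(s) \;=\; \sum_{k=1}^{K} \alpha_{k,t}\, g_k(s)
```

Stacking times and states, the T × S reward matrix factors into a T × K weight matrix times a K × S goal-map matrix, which is what makes the representation low-rank (rank at most K).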