Original [NeurIPS 2020] Inverse Rational Control with Partially Observable Continuous Nonlinear Dynamics

This algorithm is based on the POMDP family, but it replaces the state of the environment with the agent's belief, turning the POMDP into a belief MDP. The belief is a sufficient statistic for the posterior. We can calculate the probability that the environment

2023-06-26 14:47:36 77 1
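The belief-MDP reduction in this excerpt comes down to a Bayesian filter: the belief is updated after every action and observation, and planning is done over beliefs instead of hidden states. A minimal sketch of one discrete belief update follows; the toy model, shapes, and names are illustrative assumptions, not the paper's setup.

```python
import numpy as np

def belief_update(belief, action, observation, T, O):
    """One Bayesian filter step. Because the belief is a sufficient
    statistic for the posterior over hidden states, the POMDP over
    states becomes an MDP over beliefs.

    belief: (S,)      current posterior over hidden states
    T:      (A, S, S) transition probabilities T[a, s, s']
    O:      (A, S, Z) observation likelihoods O[a, s', z]
    """
    predicted = belief @ T[action]                     # predict: p(s' | b, a)
    posterior = predicted * O[action][:, observation]  # weight by evidence p(z | s', a)
    return posterior / posterior.sum()                 # normalize

# Toy model: 2 hidden states, 2 actions, 2 observations (all hypothetical).
rng = np.random.default_rng(0)
T = rng.dirichlet(np.ones(2), size=(2, 2))  # each T[a, s] is a distribution over s'
O = rng.dirichlet(np.ones(2), size=(2, 2))  # each O[a, s'] is a distribution over z
b = np.array([0.5, 0.5])
b = belief_update(b, action=0, observation=1, T=T, O=O)
print(b)  # updated belief, sums to 1
```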

Original [ICLR 2023] Explaining RL Decisions with Trajectories

1.1.1 Take the average of the output tokens. 1.1.2 Use the X-means method to cluster similar trajectories. 1.1.3 Identify the least change in the original data that leads to a change in the behavior of the RL agent. 1.1.4 For each cluster, train optimal p

2023-06-26 10:04:47 91 1
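The first two steps in the excerpt (embed each trajectory as the average of its per-step outputs, then cluster the embeddings) can be sketched as below. X-means chooses the number of clusters on its own; scikit-learn ships no X-means, so plain KMeans with a fixed k stands in for it here, and all data shapes are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

def embed_trajectory(token_outputs):
    """Step 1.1.1: represent a trajectory by the average of its
    per-step output embeddings (token_outputs: (T, d) array)."""
    return token_outputs.mean(axis=0)

# Hypothetical data: 100 trajectories, 20 steps each, 8-dim embeddings.
rng = np.random.default_rng(0)
trajectories = [rng.normal(size=(20, 8)) for _ in range(100)]
embeddings = np.stack([embed_trajectory(t) for t in trajectories])

# Step 1.1.2: cluster similar trajectories. KMeans with a fixed k
# stands in for the X-means method the paper uses.
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(embeddings)
print(np.bincount(labels))  # cluster sizes
```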

Original [PLOS] Identification of animal behavioral strategies by inverse reinforcement learning

The optimal policy that maximizes the cumulative net reward is obtained in the same way as in an LMDP. The method to estimate the fixed value function is given by: L(v(s)) is the likelihood of the sequential state transitions. λ is a positive parameter that

2023-06-26 09:51:03 63 1
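For reference, the closed-form optimal policy in a linearly solvable MDP (the standard LMDP result the excerpt appeals to, written here under the reward-maximizing convention, with p the passive dynamics and v the value function; this is textbook LMDP theory, not a formula quoted from the paper) is:

```latex
\pi^*(s' \mid s) = \frac{p(s' \mid s)\, e^{v(s')}}{\sum_{s''} p(s'' \mid s)\, e^{v(s'')}}
```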

Original [AABI 2022 Review] Kernel Density Bayesian Inverse Reinforcement Learning

ArXiv.

2023-06-26 09:48:37 100 1

Original [ICLR 2021 Review] Behavioral Cloning from Noisy Demonstrations

The algorithm uses behavioral cloning to learn a policy and then uses that policy as the reward to update the policy in the policy-gradient step. This article deals with the problem that the input trajectories include non-optimal state-action pairs. The main advan

2023-06-26 09:43:33 119 1
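A toy tabular sketch of the loop the excerpt describes: fit a behavioral-cloning policy to (possibly noisy) demonstrations, then run a policy-gradient step that treats the cloned policy's log-probability as the reward. The tabular setting, smoothing, and learning rate are illustrative choices, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
S, A = 4, 3  # hypothetical tabular problem

# Step 1: behavioral cloning on (possibly noisy) demonstrations --
# here just Laplace-smoothed empirical action frequencies per state.
demos = [(rng.integers(S), rng.integers(A)) for _ in range(500)]
counts = np.ones((S, A))
for s, a in demos:
    counts[s, a] += 1
pi_bc = counts / counts.sum(axis=1, keepdims=True)

# Step 2: REINFORCE-style update using log pi_bc(a|s) as the reward,
# pushing the learned policy toward actions the cloned policy rates
# as likely.
theta = np.zeros((S, A))  # softmax policy parameters
lr = 0.1
for _ in range(200):
    s = rng.integers(S)
    probs = np.exp(theta[s] - theta[s].max())
    probs /= probs.sum()
    a = rng.choice(A, p=probs)
    reward = np.log(pi_bc[s, a])  # BC policy as the reward signal
    grad_logp = -probs
    grad_logp[a] += 1.0           # gradient of log softmax at action a
    theta[s] += lr * reward * grad_logp
```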

Original [NeurIPS 2022 Review] Dynamic Inverse Reinforcement Learning for Characterizing Animal Behavior

The goal of the article is to learn a time-varying reward function with a low-rank representation: the function defines the reward of a state at time t. The parameter K is the number of goal maps, and α_k,t represents the weight of goal map k at time t.

2023-06-25 17:22:24 166 1
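Reading the definitions off the excerpt, the low-rank time-varying reward plausibly takes the form below, where g_k denotes the k-th goal map (g_k is placeholder notation; the excerpt names only K and α_k,t):

```latex
r_t(s) = \sum_{k=1}^{K} \alpha_{k,t}\, g_k(s)
```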
