[Latest] A Compilation of 163 ICML 2021 Reinforcement Learning Papers (2021.06.07)

This post compiles 163 ICML 2021 papers on deep reinforcement learning, covering core areas such as knowledge transfer, uncertainty modeling, model robustness, and model-free learning, along with applications such as autonomous driving and financial trading. Together they present new methods, theoretical advances, and emerging techniques, offering a broad view of current directions in reinforcement learning.

Deep Reinforcement Learning Lab (深度强化学习实验室)

Website: http://www.neurondance.com/

Forum: http://deeprl.neurondance.com/

Author: Deep Reinforcement Learning Lab

Source: compiled from https://icml.cc/

ICML is one of the most important conferences in machine learning, and work published there draws considerable attention. Submission counts have kept climbing in recent years: ICML 2020 received 4,990 submissions and ICML 2021 received 5,513. The acceptance decisions were announced about a month ago: 1,184 papers were accepted, for an acceptance rate of 21.5%.

(Note: the accompanying statistics figure is adapted from AI科技评论.)
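For reference, the quoted acceptance rate is consistent with the submission and acceptance counts above:

$$\frac{1184}{5513} \approx 0.215 \approx 21.5\%$$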

The full list of accepted ICML 2021 papers was released just recently. This post collects roughly 163 papers from the reinforcement learning area; the list follows (you can also join the discussion on the lab forum):

[1]. Revisiting Rainbow: Promoting more insightful and inclusive deep reinforcement learning research

Authors: Johan Obando Ceron (UAO) · Pablo Samuel Castro (Google Brain)

[2]. First-Order Methods for Wasserstein Distributionally Robust MDP

Authors: Julien Grand-Clement (IEOR Department, Columbia University) · Christian Kroer (Columbia University)

[3]. REPAINT: Knowledge Transfer in Deep Reinforcement Learning

Authors: Yunzhe Tao (ByteDance) · Sahika Genc (Amazon AI) · Jonathan Chung (AWS) · Tao Sun (Amazon.com) · Sunil Mallya (Amazon AWS)

[4]. Uncertainty Weighted Actor-Critic for Offline Reinforcement Learning

Authors: Yue Wu (Carnegie Mellon University) · Shuangfei Zhai (Apple) · Nitish Srivastava (Apple) · Joshua Susskind (Apple, Inc.) · Jian Zhang (Apple Inc.) · Ruslan Salakhutdinov (Carnegie Mellon University) · Hanlin Goh (Apple)

[5]. Detecting Rewards Deterioration in Episodic Reinforcement Learning

Authors: Ido Greenberg (Technion) · Shie Mannor (Technion)

[6]. Model-Free Reinforcement Learning: from Clipped Pseudo-Regret to Sample Complexity

Authors: Zhang Zihan (Tsinghua University) · Yuan Zhou (UIUC) · Xiangyang Ji (Tsinghua University)

[7]. Near Optimal Reward-Free Reinforcement Learning

Authors: Zhang Zihan (Tsinghua University) · Simon Du (University of Washington) · Xiangyang Ji (Tsinghua University)

[8]. On Reinforcement Learning with Adversarial Corruption and Its Application to Block MDP

Authors: Tianhao Wu (Peking University) · Yunchang Yang (Center for Data Science, Peking University) · Simon Du (University of Washington) · Liwei Wang (Peking University)

[9]. Average-Reward Off-Policy Policy Evaluation with Function Approximation

Authors: Shangtong Zhang (University of Oxford) · Yi Wan (University of Alberta) · Richard Sutton (DeepMind / Univ Alberta) · Shimon Whiteson (University of Oxford)

[10]. Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL

Authors: Andrea Zanette (Stanford University)

[11]. Is Model-Free Learning Nearly Optimal for Non-Stationary RL?

Authors: Weichao Mao (University of Illinois at Urbana-Champaign) · Kaiqing Zhang (University of Illinois at Urbana-Champaign/MIT) · Ruihao Zhu (MIT) · David Simchi-Levi (MIT) · Tamer Basar (University of Illinois at Urbana-Champaign)

[12]. DouZero: Mastering DouDizhu with Self-Play Deep Reinforcement Learning

Authors: Daochen Zha (Texas A&M University) · Jingru Xie (Kwai Inc.) · Wenye Ma (Kuaishou) · Sheng Zhang (Georgia Institute of Technology) · Xiangru Lian (Kwai Inc.) · Xia Hu (Texas A&M University) · Ji Liu (Kwai Seattle AI Lab, University of Rochester)

[13]. Accelerating Safe Reinforcement Learning with Constraint-mismatched Baseline Policies

Authors: Jimmy (Tsung-Yen) Yang (Princeton University) · Justinian Rosca (Siemens Corp.) · Karthik Narasimhan (Princeton) · Peter Ramadge (Princeton)

[14]. Revisiting Peng's Q($\lambda$) for Modern Reinforcement Learning

Authors: Tadashi Kozuno (University of Alberta) · Yunhao Tang (Columbia University) · Mark Rowland (DeepMind) · Remi Munos (DeepMind) · Steven Kapturowski (DeepMind) · Will Dabney (DeepMind) · Michal Valko (DeepMind / Inria / ENS Paris-Saclay) · David Abel (DeepMind)

[15]. Ensemble Bootstrapping for Q-Learning

Authors: Oren Peer (Technion) · Chen Tessler (Technion) · Nadav Merlis (Technion) · Ron Meir (Technion - Israel Institute of Technology)

[16]. Phasic Policy Gradient

Authors: Karl Cobbe (OpenAI) · Jacob Hilton (OpenAI) · Oleg Klimov (OpenAI) · John Schulman (OpenAI)

[17]. Optimal Off-Policy Evaluation from Multiple Logging Policies

Authors: Nathan Kallus (Cornell University) · Yuta Saito (Tokyo Institute of Technology) · Masatoshi Uehara (Cornell University)

[18]. Risk Bounds and Rademacher Complexity in Batch Reinforcement Learning

Authors: Yaqi Duan (Princeton University) · Chi Jin (Princeton University) · Zhiyuan Li (Princeton University)

[19]. Finite-Sample Analysis of Off-Policy Natural Actor-Critic Algorithm

Authors: Sajad Khodadadian (Georgia Institute of Technology) · Zaiwei Chen (Georgia Institute of Technology) · Siva Maguluri (Georgia Tech)

[20]. SUNRISE: A Simple Unified Framework for Ensemble Learning in Deep Reinforcement Learning

Authors: Kimin Lee (UC Berkeley) · Michael Laskin (UC Berkeley) · Aravind Srinivas (UC Berkeley) · Pieter Abbeel (UC Berkeley & Covariant)

[21]. Reinforcement Learning with Prototypical Representations

Authors: Denis Yarats (New York University) · Rob Fergus (Facebook / NYU) · Alessandro Lazaric (Facebook AI Research) · Lerrel Pinto (NYU/Berkeley)

[22]. Evaluating the Implicit Midpoint Integrator for Riemannian Hamiltonian Monte Carlo

Authors: James Brofos (Yale University) · Roy Lederman (Yale University)

[23]. Deep Reinforcement Learning amidst Continual Structured Non-Stationarity

Authors: Annie Xie (Stanford University) · James Harrison (Stanford University) · Chelsea Finn (Stanford)

[24]. Off-Policy Confidence Sequences

Authors: Nikos Karampatziakis (Microsoft) · Paul Mineiro (Microsoft) · Aaditya Ramdas (Carnegie Mellon University)

[25]. Deeply-Debiased Off-Policy Interval Estimation

Authors: Chengchun Shi (London School of Economics and Political Science) · Runzhe Wan (North Carolina State University) · Victor Chernozhukov (MIT) · Rui Song (North Carolina State University)

[26]. Improving Lossless Compression Rates via Monte Carlo Bits-Back Coding

Authors: Yangjun Ruan (University of Toronto) · Karen Ullrich (FAIR) · Daniel Severo (University of Toronto) · James Townsend · Ashish Khisti (Univ. of Toronto) · Arnaud Doucet (Oxford University) · Alireza Makhzani (University of Toronto) · Chris Maddison (University of Toronto)

[27]. Logarithmic Regret for Reinforcement Learning with Linear Function Approximation

Authors: Jiafan He (University of California, Los Angeles) · Dongruo Zhou (UCLA) · Quanquan Gu (University of California, Los Angeles)

[28]. Randomized Entity-wise Factorization for Multi-Agent Reinforcement Learning

Authors: Shariq Iqbal (University of Southern California) · Christian Schroeder (University of Oxford) · Bei Peng (University of Oxford) · Wendelin Boehmer (Delft University of Technology) · Shimon Whiteson (University of Oxford) · Fei Sha (Google Research)

[29]. Monotonic Robust Policy Optimization with Model Discrepancy

Authors: Yuankun Jiang (Shanghai Jiao Tong University) · Chenglin Li (Shanghai Jiao Tong University) · Wenrui Dai (Shanghai Jiao Tong University) · Junni Zou (Shanghai Jiao Tong University) · Hongkai Xiong (Shanghai Jiao Tong University)

[30]. Guided Exploration with Proximal Policy Optimization using a Single Demonstration

Authors: Gabriele Libardi (Pompeu Fabra University) · Gianni De Fabritiis (Universitat Pompeu Fabra) · Sebastian Dittert (Universitat Pompeu Fabra)

[31]. Diversity Actor-Critic: Sample-Aware Entropy Regularization for Sample-Efficient Exploration

Authors: Seungyul Han (KAIST) · Youngchul Sung (KAIST)

[32]. On-Policy Reinforcement Learning for the Average-Reward Criterion

Authors: Yiming Zhang (New York University) · Keith Ross (New York University Shanghai)

[33]. UneVEn: Universal Value Exploration for Multi-Agent Reinforcement Learning

Authors: Tarun Gupta (University of Oxford) · Anuj Mahajan (Dept. of Computer Science, University of Oxford) · Bei Peng (University of Oxford) · Wendelin Boehmer (Delft University of Technology) · Shimon Whiteson (University of Oxford)

[34]. Demonstration-Conditioned Reinforcement Learning for Few-Shot Imitation

Authors: Christopher Dance (NAVER LABS Europe) · Perez Julien (Naver Labs Europe) · Théo Cachet (Naver Labs Europe)

[35]. Feature Clustering for Support Identification in Extreme Regions

Authors: Hamid Jalalzai (Inria) · Rémi Leluc (Télécom Paris)

[36]. Multi-Task Reinforcement Learning with Context-based Representations

Authors: Shagun Sodhani (Facebook AI Research) · Amy Zhang (FAIR / McGill) · Joelle Pineau (McGill, Facebook)

[37]. Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with √T Regret

Authors: Asaf Cassel (Tel Aviv University) · Tomer Koren (Tel Aviv University and Google)

[38]. Learning and Planning in Average-Reward Markov Decision Processes

Authors: Yi Wan (University of Alberta) · Abhishek Naik (University of Alberta) · Richard Sutton (DeepMind / Univ Alberta)

[39]. MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration

Authors: Jin Zhang (Tsinghua University) · Jianhao Wang (Tsinghua University) · Hao Hu (Tsinghua University) · Tong Chen (Tsinghua University) · Yingfeng Chen (NetEase Fuxi AI Lab) · Changjie Fan (NetEase Fuxi AI Lab) · Chongjie Zhang (Tsinghua University)

[40]. A Lower Bound for the Sample Complexity of Inverse Reinforcement Learning

Authors: Abi Komanduru (Purdue University) · Jean Honorio (Purdue University)

[41]. Safe Reinforcement Learning with Linear Function Approximation

Authors: Sanae Amani (University of California, Los Angeles) · Christos Thrampoulidis (University of British Columbia) · Lin Yang (UCLA)

[42]. Combining Pessimism with Optimism for Robust and Efficient Model-Based Deep Reinforcement Learning

Authors: Sebastian Curi (ETH) · Ilija Bogunovic (ETH Zurich) · Andreas Krause (ETH Zurich)

[43]. A Precise Performance Analysis of Support Vector Regression

Authors: Houssem Sifaou (King Abdullah University of Science and Technology (KAUST)) · Abla Kammoun (KAUST) · Mohamed-Slim Alouini (King Abdullah University of Science and Technology)

[44]. Generalizable Episodic Memory for Deep Reinforcement Learning

Authors: Hao Hu (Tsinghua University) · Jianing Ye (Peking University) · Guangxiang Zhu (Tsinghua University) · Zhizhou Ren (University of Illinois at Urbana-Champaign) · Chongjie Zhang (Tsinghua University)

[45]. Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping

Authors: Dongruo Zhou (UCLA) · Jiafan He (University of California, Los Angeles) · Quanquan Gu (University of California, Los Angeles)

[46]. Decentralized Single-Timescale Actor-Critic on Zero-Sum Two-Player Stochastic Games

Authors: Hongyi Guo (Northwestern University) · Zuyue Fu (Northwestern) · Zhuoran Yang (Princeton) · Zhaoran Wang (Northwestern U)

[47]. Adaptive Sampling for Best Policy Identification in Markov Decision Processes

Authors: Aymen Al Marjani (ENS Lyon) · Alexandre Proutiere (KTH Royal Institute of Technology)

[48]. Inverse Constrained Reinforcement Learning

Authors: Shehryar Malik (Information Technology University) · Usman Anwar (Information Technology University, Lahore) · Alireza Aghasi (Georgia State University) · Ali Ahmed (Information Technology University)

[49]. Self-Paced Context Evaluation for Contextual Reinforcement Learning

Authors: Theresa Eimer (Leibniz Universität Hannover) · André Biedenkapp (University of Freiburg) · Frank Hutter (University of Freiburg and Bosch Center for Artificial Intelligence) · Marius Lindauer (Leibniz University Hannover)

[50]. On the Convergence of Hamiltonian Monte Carlo with Stochastic Gradients

Authors: Difan Zou (UCLA) · Quanquan Gu (University of California, Los Angeles)

[51]. DG-LMC: A Turn-key and Scalable Synchronous Distributed MCMC Algorithm via Langevin Monte Carlo within Gibbs

Authors: Vincent Plassier (Huawei) · Maxime Vono (Lagrange Mathematics and Computing Research Center) · Alain Durmus (ENS Paris Saclay) · Eric Moulines (Ecole Polytechnique)

[52]. Meta Learning for Support Recovery in High-dimensional Precision Matrix Estimation

Authors: Qian Zhang (Purdue University) · Yilin Zheng (Purdue University) · Jean Honorio (Purdue University)

[53]. Optimal Thompson Sampling strategies for support-aware CVaR bandits

Authors: Dorian Baudry (CNRS/INRIA) · Romain Gautron (CIRAD - CGIAR) · Emilie Kaufmann (CNRS, Univ. Lille) · Odalric-Ambrym Maillard (Inria Lille - Nord Europe)

[54]. High Confidence Generalization for Reinforcement Learning

Authors: James Kostas (University of Massachusetts Amherst) · Yash Chandak (University of Massachusetts Amherst) · Scott M Jordan (University of Massachusetts) · Georgios Theocharous (Adobe Research) · Philip Thomas (University of Massachusetts Amherst)

[55]. Robust Asymmetric Learning in POMDPs

Authors: Andrew Warrington (University of Oxford) · Jonathan Lavington (University of British Columbia) · Adam Scibior (University of British Columbia) · Mark Schmidt (University of British Columbia) · Frank Wood (University of British Columbia)

[56]. Risk-Sensitive Reinforcement Learning with Function Approximation: A Debiasing Approach

Authors: Yingjie Fei (Cornell University) · Zhuoran Yang (Princeton University) · Zhaoran Wang (Northwestern U)

[57]. Decoupling Value and Policy for Generalization in Reinforcement Learning

Authors: Roberta Raileanu (NYU) · Rob Fergus (Facebook / NYU)

[58]. Learning Routines for Effective Off-Policy Reinforcement Learning

Authors: Edoardo Cetin (King's College London) · Oya Celiktutan (King's College London)

[59]. Emergent Social Learning via Multi-agent Reinforcement Learning

Authors: Kamal Ndousse (OpenAI) · Douglas Eck (Google Brain) · Sergey Levine (UC Berkeley) · Natasha Jaques (Google Brain, UC Berkeley)

[60]. DFAC Framework: Factorizing the Value Function via Quantile Mixture for Multi-Agent Distributional Q-Learning

Authors: Wei-Fang Sun (National Tsing Hua University) · Cheng-Kuang Lee (NVIDIA Corporation) · Chun-Yi Lee (National Tsing Hua University)

[61]. Shortest-Path Constrained Reinforcement Learning for Sparse Reward Tasks

Authors: Sungryull Sohn (University of Michigan) · Sungtae Lee (Yonsei University) · Jongwook Choi (University of Michigan) · Harm van Seijen (Microsoft Research) · Mehdi Fatemi (Microsoft Research) · Honglak Lee (Google / U. Michigan)

[62]. What Structural Conditions Permit
