[In-Depth Roundup] NeurIPS 2020: The Complete List of 180 Accepted Papers in Reinforcement Learning

Deep Reinforcement Learning Lab

By: DeepRL-Lab & AMiner.cn (joint release)

Source: https://neurips.cc/Conferences/2020/

Editor: DeepRL


The NeurIPS 2020 decisions are finally out. Submissions hit a record high of 9,454 papers, up 38% from last year, and 1,900 papers were accepted in total. Google leads the pack with 169 accepted papers, Tsinghua University has 63, and Professor Zhi-Hua Zhou's team at Nanjing University has 3. The acceptance rate of 20.09% is down from last year. The share of accepted papers by topic breaks down as follows (a quick sanity check of these figures follows the list):

  • Algorithms (29%)

  • Deep learning (19%)

  • Reinforcement learning (9%)
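
As a quick sanity check on the figures above, here is a minimal Python sketch that recomputes the acceptance rate and estimates per-topic paper counts from the stated percentages (the per-topic counts are rough estimates derived from this post, not official NeurIPS numbers):

```python
# Back-of-the-envelope check of the NeurIPS 2020 statistics quoted above.
# Totals come from this post; per-topic counts are rough estimates derived
# from the stated percentages, not official NeurIPS figures.

submissions = 9454
accepted = 1900

# -> 20.10% (the post truncates this to 20.09%)
print(f"Acceptance rate: {accepted / submissions:.2%}")

topic_shares = {
    "Algorithms": 0.29,
    "Deep learning": 0.19,
    "Reinforcement learning": 0.09,
}
for topic, share in topic_shares.items():
    # e.g. "Reinforcement learning: ~171 papers", roughly consistent
    # with the ~180 RL papers this post sets out to collect.
    print(f"{topic}: ~{round(accepted * share)} papers")
```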


Complete List of Reinforcement Learning Papers

[1]. Relabeling Experience with Inverse RL: Hindsight Inference for Policy Improvement

Authors: Ben Eysenbach (Carnegie Mellon University) · Xinyang Geng (UC Berkeley) · Sergey Levine (UC Berkeley) · Russ Salakhutdinov (Carnegie Mellon University)

[2]. Generalised Bayesian Filtering via Sequential Monte Carlo

Authors: Ayman Boustati (University of Warwick) · Omer Deniz Akyildiz (University of Warwick) · Theodoros Damoulas (University of Warwick & The Alan Turing Institute) · Adam Johansen (University of Warwick)

[3]. Softmax Deep Double Deterministic Policy Gradients

Authors: Ling Pan (Tsinghua University) · Qingpeng Cai (Alibaba Group) · Longbo Huang (IIIS, Tsinghua University)

[4]. Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model

Authors: Gen Li (Tsinghua University) · Yuting Wei (Carnegie Mellon University) · Yuejie Chi (CMU) · Yuantao Gu (Tsinghua University) · Yuxin Chen (Princeton University)

[5]. Learning Multi-Agent Coordination for Enhancing Target Coverage in Directional Sensor Networks

Authors: Jing Xu (Peking University) · Fangwei Zhong (Peking University) · Yizhou Wang (Peking University)

[6]. Off-Policy Imitation Learning from Observations

Authors: Zhuangdi Zhu (Michigan State University) · Kaixiang Lin (Michigan State University) · Bo Dai (Google Brain) · Jiayu Zhou (Michigan State University)

[7]. Can Q-Learning with Graph Networks Learn a Generalizable Branching Heuristic for a SAT Solver?

Authors: Vitaly Kurin (University of Oxford) · Saad Godil (NVIDIA) · Shimon Whiteson (University of Oxford) · Bryan Catanzaro (NVIDIA)

[8]. DISK: Learning local features with policy gradient

Authors: Michał Tyszkiewicz (EPFL) · Pascal Fua (EPFL, Switzerland) · Eduard Trulls (Google)

[9]. Learning Individually Inferred Communication for Multi-Agent Cooperation

Authors: Ziluo Ding (Peking University) · Tiejun Huang (Peking University) · Zongqing Lu (Peking University)

[10]. Lifelong Policy Gradient Learning of Factored Policies for Faster Training Without Forgetting

Authors: Jorge Mendez (University of Pennsylvania) · Boyu Wang (University of Western Ontario) · Eric Eaton (University of Pennsylvania)

[11]. Fixed-Support Wasserstein Barycenters: Computational Hardness and Fast Algorithm

Authors: Tianyi Lin (UC Berkeley) · Nhat Ho (University of Texas at Austin) · Xi Chen (New York University) · Marco Cuturi (Google Brain & CREST - ENSAE) · Michael Jordan (UC Berkeley)

[12]. Memory Based Trajectory-conditioned Policies for Learning from Sparse Rewards

Authors: Yijie Guo (University of Michigan) · Jongwook Choi (University of Michigan) · Marcin Moczulski (Google Brain) · Shengyu Feng (University of Illinois Urbana-Champaign) · Samy Bengio (Google Research, Brain Team) · Mohammad Norouzi (Google Brain) · Honglak Lee (Google / U. Michigan)

[13]. Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition

Authors: Zihan Zhang (Tsinghua University) · Yuan Zhou (UIUC) · Xiangyang Ji (Tsinghua University)

[14]. Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping

Authors: Yujing Hu (NetEase Fuxi AI Lab) · Weixun Wang (Tianjin University) · Hangtian Jia (NetEase Fuxi AI Lab) · Yixiang Wang (University of Science and Technology of China) · Yingfeng Chen (NetEase Fuxi AI Lab) · Jianye Hao (Tianjin University) · Feng Wu (University of Science and Technology of China) · Changjie Fan (NetEase Fuxi AI Lab)

[15]. Effective Diversity in Population Based Reinforcement Learning

Authors: Jack Parker-Holder (University of Oxford) · Aldo Pacchiano (UC Berkeley) · Krzysztof M Choromanski (Google Brain Robotics) · Stephen J Roberts (University of Oxford)

[16]. A Boolean Task Algebra for Reinforcement Learning

Authors: Geraud Nangue Tasse (University of the Witwatersrand) · Steven James (University of the Witwatersrand) · Benjamin Rosman (University of the Witwatersrand / CSIR)

[17]. A new convergent variant of Q-learning with linear function approximation

Authors: Diogo Carvalho (GAIPS, INESC-ID) · Francisco S. Melo (IST/INESC-ID) · Pedro A. Santos (Instituto Superior Técnico)

[18]. Knowledge Transfer in Multi-Task Deep Reinforcement Learning for Continuous Control

Authors: Zhiyuan Xu (Syracuse University) · Kun Wu (Syracuse University) · Zhengping Che (DiDi AI Labs, DiDi Chuxing) · Jian Tang (DiDi AI Labs, DiDi Chuxing) · Jieping Ye (DiDi Chuxing)

[19]. Multi-task Batch Reinforcement Learning with Metric Learning

Authors: Jiachen Li (University of California, San Diego) · Quan Vuong (University of California San Diego) · Shuang Liu (University of California, San Diego) · Minghua Liu (UCSD) · Kamil Ciosek (Microsoft) · Henrik Christensen (UC San Diego) · Hao Su (UCSD)

[20]. Demystifying Orthogonal Monte Carlo and Beyond

Authors: Han Lin (Columbia University) · Haoxian Chen (Columbia University) · Krzysztof M Choromanski (Google Brain Robotics) · Tianyi Zhang (Columbia University) · Clement Laroche (Columbia University)

[21]. On the Stability and Convergence of Robust Adversarial Reinforcement Learning: A Case Study on Linear Quadratic Systems

Authors: Kaiqing Zhang (University of Illinois at Urbana-Champaign (UIUC)) · Bin Hu (University of Illinois at Urbana-Champaign) · Tamer Basar (University of Illinois at Urbana-Champaign)

[22]. Towards Playing Full MOBA Games with Deep Reinforcement Learning

Authors: Deheng Ye (Tencent) · Guibin Chen (Tencent) · Wen Zhang (Tencent) · Sheng Chen (QQ) · Bo Yuan (Tencent) · Bo Liu (Tencent) · Jia Chen (Tencent) · Hongsheng Yu (Tencent) · Zhao Liu (Tencent) · Fuhao Qiu (Tencent AI Lab) · Liang Wang (Tencent) · Tengfei Shi (Tencent) · Yinyuting Yin (Tencent) · Bei Shi (Tencent AI Lab) · Lanxiao Huang (Tencent) · Qiang Fu (Tencent AI Lab) · Wei Yang (Tencent AI Lab) · Wei Liu (Tencent AI Lab)

[23]. How to Learn a Useful Critic? Model-based Action-Gradient-Estimator Policy Optimization

Authors: Pierluca D'Oro (MILA) · Wojciech Jaśkowski (NNAISENSE SA)

[24]. Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms and Tighter Regret Bounds for the Non-Episodic Setting

Authors: Ziping Xu (University of Michigan) · Ambuj Tewari (University of Michigan)

[25]. HiPPO: Recurrent Memory with Optimal Polynomial Projections

Authors: Albert Gu (Stanford) · Tri Dao (Stanford University) · Stefano Ermon (Stanford) · Atri Rudra (University at Buffalo, SUNY) · Christopher Ré (Stanford)

[26]. Promoting Coordination through Policy Regularization in Multi-Agent Deep Reinforcement Learning

Authors: Julien Roy (Mila) · Paul Barde (Quebec AI Institute - Ubisoft La Forge) · Félix G Harvey (Polytechnique Montréal) · Derek Nowrouzezahrai (McGill University) · Chris Pal (MILA, Polytechnique Montréal, Element AI)

[27]. Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs

Authors: Chung-Wei Lee (University of Southern California) · Haipeng Luo (University of Southern California) · Chen-Yu Wei (University of Southern California) · Mengxiao Zhang (University of Southern California)

[28]. Minimax Confidence Interval for Off-Policy Evaluation and Policy Optimization

Authors: Nan Jiang (University of Illinois at Urbana-Champaign) · Jiawei Huang (University of Illinois at Urbana-Champaign)

[29]. Confounding-Robust Policy Evaluation in Infinite-Horizon Reinforcement Learning

Authors: Nathan Kallus (Cornell University) · Angela Zhou (Cornell University)

[30]. Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition

Authors: Tiancheng Jin (University of Southern California) · Haipeng Luo (University of Southern California)

[31]. Learning Retrospective Knowledge with Reverse Reinforcement Learning

Authors: Shangtong Zhang (University of Oxford) · Vivek Veeriah (University of Michigan) · Shimon Whiteson (University of Oxford)

[32]. Combining Deep Reinforcement Learning and Search for Imperfect-Information Games

Authors: Noam Brown (Facebook AI Research) · Anton Bakhtin (Facebook AI Research) · Adam Lerer (Facebook AI Research) · Qucheng Gong (Facebook AI Research)

[33]. Variance reduction for Langevin Monte Carlo in high dimensional sampling problems

Authors: Zhiyan Ding (University of Wisconsin-Madison) · Qin Li (University of Wisconsin-Madison)

[34]. POMO: Policy Optimization with Multiple Optima for Reinforcement Learning

Authors: Yeong-Dae Kwon (Samsung SDS) · Jinho Choo (Samsung SDS) · Byoungjip Kim (Samsung SDS) · Iljoo Yoon (Samsung SDS) · Youngjune Gwon (Samsung SDS) · Seungjai Min (Samsung SDS)

[35]. Mixed Hamiltonian Monte Carlo for Mixed Discrete and Continuous Variables

Authors: Guangyao Zhou (Vicarious AI)

[36]. Self-Paced Deep Reinforcement Learning

Authors: Pascal Klink (TU Darmstadt) · Carlo D'Eramo (TU Darmstadt) · Jan Peters (TU Darmstadt & MPI Intelligent Systems) · Joni Pajarinen (TU Darmstadt)

[37]. Efficient Model-Based Reinforcement Learning through Optimistic Policy Search and Planning

Authors: Sebastian Curi (ETH Zürich) · Felix Berkenkamp (Bosch Center for Artificial Intelligence) · Andreas Krause (ETH Zurich)

[38]. Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies

Authors: Nathan Kallus (Cornell University) · Masatoshi Uehara (Cornell University)

[39]. Off-Policy Evaluation and Learning for External Validity under a Covariate Shift

Authors: Masatoshi Uehara (Cornell University) · Masahiro Kato (The University of Tokyo) · Shota Yasui (CyberAgent)

[40]. Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms

Authors: Tengyu Xu (The Ohio State University) · Zhe Wang (Ohio State University) · Yingbin Liang (The Ohio State University)

[41]. Fast Epigraphical Projection-based Incremental Algorithms for Wasserstein Distributionally Robust Support Vector Machine

Authors: Jiajin Li (The Chinese University of Hong Kong) · Caihua Chen (Nanjing University) · Anthony Man-Cho So (CUHK)

[42]. A maximum-entropy approach to off-policy evaluation in average-reward MDPs

Authors: Nevena Lazic (DeepMind) · Dong Yin (DeepMind) · Mehrdad Farajtabar (DeepMind) · Nir Levine (DeepMind) · Dilan Gorur · Chris Harris (Google) · Dale Schuurmans (Google Brain & University of Alberta)

[43]. Off-policy Policy Evaluation For Sequential Decisions Under Unobserved Confounding

Authors: Hongseok Namkoong (Stanford University) · Ramtin Keramati (Stanford University) · Steve Yadlowsky (Stanford University) · Emma Brunskill (Stanford University)

[44]. Self-Imitation Learning via Generalized Lower Bound Q-learning

Authors: Yunhao Tang (Columbia University)

[45]. Weakly-Supervised Reinforcement Learning for Controllable Behavior

Authors: Lisa Lee (CMU / Google Brain / Stanford) · Ben Eysenbach (Carnegie Mellon University) · Russ Salakhutdinov (Carnegie Mellon University) · Shixiang (Shane) Gu (Google Brain) · Chelsea Finn (Stanford)

[46]. An Improved Analysis of (Variance-Reduced) Policy Gradient and Natural Policy Gradient Methods

Authors: Yanli Liu (UCLA) · Kaiqing Zhang (University of Illinois at Urbana-Champaign (UIUC)) · Tamer Basar (University of Illinois at Urbana-Champaign) · Wotao Yin (Alibaba US, DAMO Academy)

[47]. MOReL: Model-Based Offline Reinforcement Learning

Authors: Rahul Kidambi (Cornell University) · Aravind Rajeswaran (University of Washington) · Praneeth Netrapalli (Microsoft Research) · Thorsten Joachims (Cornell)

[48]. Zap Q-Learning With Nonlinear Function Approximation

Authors: Shuhang Chen (University of Florida) · Adithya M Devraj (University of Florida) · Fan Lu (University of Florida) · Ana Busic (INRIA) · Sean Meyn (University of Florida)

[49]. Reinforcement Learning with General Value Function Approximation: Provably Efficient Approach via Bounded Eluder Dimension

Authors: Ruosong Wang (Carnegie Mellon University) · Russ Salakhutdinov (Carnegie Mellon University) · Lin Yang (UCLA)

[50]. Security Analysis of Safe and Seldonian Reinforcement Learning Algorithms

Authors: Pinar Ozisik (UMass Amherst) · Philip Thomas (University of Massachusetts Amherst)

[51]. RepPoints v2: Verification Meets Regression for Object Detection

Authors: Yihong Chen (Peking University) · Zheng Zhang (MSRA) · Yue Cao (Microsoft Research) · Liwei Wang (Peking University) · Stephen Lin (Microsoft Research) · Han Hu (Microsoft Research Asia)

[52]. Learning to Communicate in Multi-Agent Systems via Transformer-Guided Program Synthesis

Authors: Jeevana Priya Inala (MIT) · Yichen Yang (MIT) · James Paulos (University of Pennsylvania) · Yewen Pu (MIT) · Osbert Bastani (University of Pennsylvania) · Vijay Kumar (University of Pennsylvania) · Martin Rinard (MIT) · Armando Solar-Lezama (MIT)

[53]. Belief-Dependent Macro-Action Discovery in POMDPs using the Value of Information

Authors: Genevieve E Flaspohler (Massachusetts Institute of Technology) · Nicholas Roy (MIT) · John W Fisher III (MIT)

[54]. Bayesian Multi-type Mean Field Multi-agent Imitation Learning
