NEURAL ARCHITECTURE SEARCH WITH REINFORCEMENT LEARNING

This paper proposes a gradient-based neural architecture search (NAS) method: a recurrent network (the controller) generates strings that describe network architectures, the networks specified by those strings are trained on real data, and their accuracy on a validation set is used as a feedback signal to update the controller with policy gradients, so that in later iterations it generates better architectures with higher probability. Compared with hyperparameter optimization, evolutionary algorithms, program synthesis, and meta-learning, the approach is more flexible and general. In the experiments, the search space is enlarged with skip connections and branching layers, and the controller is trained with the REINFORCE algorithm from reinforcement learning to speed up its learning.
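For reference, the controller update mentioned above is the REINFORCE policy gradient. In the paper's notation, $a_{1:T}$ is the list of tokens the controller predicts, $R$ is the child network's validation accuracy, $m$ is the number of architectures sampled in one batch, and $b$ is a baseline (an exponential moving average of previous accuracies) used to reduce variance:

$$J(\theta_c) = \mathbb{E}_{P(a_{1:T};\,\theta_c)}[R]$$

$$\nabla_{\theta_c} J(\theta_c) \approx \frac{1}{m}\sum_{k=1}^{m}\sum_{t=1}^{T} \nabla_{\theta_c} \log P(a_t \mid a_{(t-1):1};\,\theta_c)\,(R_k - b)$$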

Background: Although deep neural networks have achieved success on highly challenging tasks in recent years, and increasingly diverse feature designs and architecture designs have appeared, designing a deep model architecture still requires substantial expertise and a great deal of time.

This paper: proposes a gradient-based NAS method (see the overview figure in the original paper). The work starts from the observation that the structure and connectivity of a neural network can be specified by a variable-length string. A recurrent network (the controller) is therefore used to generate such a string; the network defined by the string (the child network) is trained on real data and evaluated on a validation set. The resulting accuracy serves as the feedback signal from which a policy gradient is computed to update the controller. In the next iteration the controller then assigns higher probability to architectures that achieve better accuracy; in other words, through repeated iterations the controller learns to improve the accuracy of the architectures it searches for.
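A minimal runnable sketch of this loop in PyTorch (not the paper's implementation): the four-layer search space, the Controller class, and the train_child_network stub below are illustrative assumptions; in the paper the child network is a real convolutional network trained on the target dataset, and the accuracy it reaches on the validation set is the reward.

```python
import torch
import torch.nn as nn
from torch.distributions import Categorical

# Hypothetical discrete search space: for each of NUM_LAYERS layers the controller
# picks a filter count, so one architecture is a sequence of NUM_LAYERS tokens.
FILTER_CHOICES = [16, 32, 64, 128]
NUM_LAYERS = 4
HIDDEN = 64

class Controller(nn.Module):
    """Autoregressive controller: one LSTM step per architecture token."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(len(FILTER_CHOICES) + 1, HIDDEN)  # +1 for a start token
        self.cell = nn.LSTMCell(HIDDEN, HIDDEN)
        self.head = nn.Linear(HIDDEN, len(FILTER_CHOICES))

    def sample(self):
        h = torch.zeros(1, HIDDEN)
        c = torch.zeros(1, HIDDEN)
        token = torch.tensor([len(FILTER_CHOICES)])   # start token
        tokens, log_probs = [], []
        for _ in range(NUM_LAYERS):
            h, c = self.cell(self.embed(token), (h, c))
            dist = Categorical(logits=self.head(h))
            token = dist.sample()                      # next token, conditioned on the previous ones
            log_probs.append(dist.log_prob(token))
            tokens.append(token.item())
        return tokens, torch.stack(log_probs).sum()

def train_child_network(tokens):
    """Placeholder: build and train the child network described by `tokens`,
    then return its validation accuracy. A real implementation would train a
    CNN on the target dataset; here we return a toy score so the sketch runs."""
    return sum(FILTER_CHOICES[t] for t in tokens) / (max(FILTER_CHOICES) * NUM_LAYERS)

controller = Controller()
optimizer = torch.optim.Adam(controller.parameters(), lr=3e-4)
baseline = None  # exponential moving average of past rewards

for step in range(100):
    tokens, log_prob = controller.sample()
    reward = train_child_network(tokens)               # validation accuracy as reward
    baseline = reward if baseline is None else 0.95 * baseline + 0.05 * reward
    loss = -log_prob * (reward - baseline)              # REINFORCE with a moving-average baseline
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The key design point this sketch illustrates is that the reward (validation accuracy) is not differentiable with respect to the controller, so the controller is updated through the log-probabilities of its own sampling decisions rather than by backpropagating through the child network.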

Related work:

1. Hyperparameter optimization is widely used, but its drawback is that it can only search for models in a fixed-length space. Bayesian optimization methods do allow variable-length search, but they are less general and less flexible than the method in this paper.

2. Modern neuro-evolution algorithms are more flexible but less practical; their limitation is that, as search-based methods, they are slow or require many heuristics to work well.

3. NAS bears some similarity to program synthesis and inductive programming, which also search for programs from examples. In machine learning, probabilistic program induction has been successful in many applications.

4. The controller in NAS is autoregressive: it predicts one hyperparameter at a time, conditioned on its previous predictions. This idea comes from the decoder in end-to-end sequence-to-sequence learning; the difference is that NAS optimizes a non-differentiable metric, the validation accuracy of the child network, so the controller is trained with reinforcement learning rather than with supervised learning (the autoregressive factorization is written out below).
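Concretely, "one prediction at a time, conditioned on previous predictions" means the controller defines the distribution over architectures autoregressively, and this is exactly the factorization whose per-token log-probabilities appear in the policy-gradient estimate above:

$$P(a_{1:T};\,\theta_c) = \prod_{t=1}^{T} P(a_t \mid a_{(t-1):1};\,\theta_c)$$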
