Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence


Original paper: https://arxiv.org/pdf/2410.11163


Shangbin Feng¹* Zifeng Wang² Yike Wang¹ Sayna Ebrahimi³ Hamid Palangi² Lesly Miculicich² Achin Kulshrestha⁴ Nathalie Rauschmayr⁴ Yejin Choi¹ Yulia Tsvetkov¹ Chen-Yu Lee² Tomas Pfister²

¹University of Washington  ²Google Cloud AI Research  ³Google DeepMind  ⁴Google

ABSTRACT

We propose MODEL SWARMS, a collaborative search algorithm to adapt LLMs via swarm intelligence, the collective behavior guiding individual systems. Specifically, MODEL SWARMS starts with a pool of LLM experts and a utility function. Guided by the best-found checkpoints across models, diverse LLM experts collaboratively move in the weight space and optimize a utility function representing model adaptation objectives. Compared to existing model composition approaches, MODEL SWARMS offers tuning-free model adaptation, works in low-data regimes with as few as 200 examples, and does not require assumptions about specific experts in the swarm or how they should be composed. Extensive experiments demonstrate that MODEL SWARMS could flexibly adapt LLM experts to a single task, multi-task domains, reward models, as well as diverse human interests, improving over 12 model composition baselines by up to 21.0% across tasks and contexts. Further analysis reveals that LLM experts discover previously unseen capabilities in initial checkpoints and that MODEL SWARMS enables the weak-to-strong transition of experts through the collaborative search process.


1 INTRODUCTION


Advancing beyond efforts to train a single, universal large language model (LLM) (Brown et al., 2020; Gemini Team et al., 2023) that shares parameters across all languages and tasks, recent work has increasingly recognized the importance of modularity through multi-LLM collaboration, where diverse models interact and complement each other in various ways (Shen et al., 2024c; Feng et al., 2024a; Chan et al., 2024; Du et al., 2024). For example, mixture-of-experts (MoE) relies on the routing of queries to various neural sub-components, leveraging the specialized expertise of one model (Masoudnia & Ebrahimpour, 2014; Roller et al., 2021; Pfeiffer et al., 2022; Jiang et al., 2024). Routing to domain-specific experts demonstrates great potential, while no new model/expert is produced in the MoE process. However, challenging real-world tasks often require flexible composition and adaptation to new domains and/or capabilities that go beyond the scope of an existing expert.


Two lines of work aim to extend multi-LLM collaboration beyond routing to compose and produce new adapted models. 1) Learn-to-fuse designs trainable components to “glue” experts together into a merged model, then fine-tunes the model with supervised objectives to produce compositional experts (Jiang et al., 2023b; Wang et al., 2024b; Bansal et al., 2024). These approaches often rely on large training sets to tune the learnable parts from scratch and hardly offer the modularity of seamlessly adding/removing experts. 2) Model arithmetic composes LLM experts by conducting arithmetic operations on model weights and/or token probabilities (Ilharco et al., 2023; Yu et al., 2024; Yadav et al., 2024; Mavromatis et al., 2024; Liu et al., 2024). These approaches often come with strong assumptions about the available experts and how the desired adaptation should be decomposed (e.g., lion indoors = lion outdoors + (dog indoors - dog outdoors) (Ilharco et al., 2023)). As such, a flexible approach that does not rely on excessive tuning data or strong assumptions about existing models is crucial for adapting diverse LLM experts for wide-ranging purposes.
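To make the weight-arithmetic idea concrete, the following is a minimal sketch of task-vector arithmetic over model state dicts in the spirit of Ilharco et al. (2023); the helper names and the toy tensors are illustrative stand-ins, not taken from any released implementation.

```python
import torch

def task_vector(finetuned: dict, base: dict) -> dict:
    """Task vector: fine-tuned weights minus base weights."""
    return {k: finetuned[k] - base[k] for k in base}

def apply_task_vector(target: dict, vector: dict, scale: float = 1.0) -> dict:
    """Add a (scaled) task vector onto another checkpoint."""
    return {k: target[k] + scale * vector[k] for k in target}

# Toy analogue of "lion indoors = lion outdoors + (dog indoors - dog outdoors)",
# with small random tensors standing in for real checkpoints.
dog_outdoors = {"w": torch.randn(4)}
dog_indoors = {"w": dog_outdoors["w"] + 0.1 * torch.randn(4)}
lion_outdoors = {"w": torch.randn(4)}
lion_indoors = apply_task_vector(lion_outdoors, task_vector(dog_indoors, dog_outdoors))
```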


To this end, we propose MODEL SWARMS, where multiple LLM experts collaboratively search for new adapted models in the weight space. Inspired by Particle Swarm Optimization (PSO) (Kennedy & Eberhart, 1995), MODEL SWARMS views each LLM expert as a “particle” and defines LLM adaptation as the collaborative movement of particles governed by a utility function representing an adaptation objective.



*Work done as a student researcher at Google Cloud AI Research. Correspondence to: Shangbin Feng (shangbin@cs.washington.edu), Zifeng Wang (zifengw@google.com), and Chen-Yu Lee (chenyulee@google.com).



Figure 1: We propose MODEL SWARMS, a collaborative search algorithm to adapt LLM experts via swarm intelligence. Guided by personal best $\mathbf{p}_i$, global best $\mathbf{g}$, and global worst $\mathbf{g}_w$, LLM experts update their velocity $\mathbf{v}$ and location $\mathbf{x}$ to explore the weight space and optimize a utility function $f$. The best-found expert (global best $\mathbf{g}$) at the end is retained as the output.


Specifically, to model the proactive search of LLMs instead of passive merging, each expert particle starts with a location (model weights) and a velocity (direction in the weight space). The velocity is iteratively impacted by inertia (the tendency to keep current velocity), personal best (the best-found location of a given particle), and global best/worst (the best/worst-found location among all particles), while LLM particles then take a step towards the updated velocity direction. These velocity factors enable LLM particles to chart an independent search path and explore the personal/global best neighborhoods. Thanks to the flexible search methodology, MODEL SWARMS does not need any supervised fine-tuning data or pre-existing knowledge about the LLM experts or the utility function, adapting LLM experts solely through collaborative search and movement guided by any model-to-scalar utility function.
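To ground this description, here is a minimal sketch of one such velocity-and-location update over flattened weight vectors, reusing the coefficient names from Algorithm 1 (inertia $\phi_v$, cognitive $\phi_p$, social $\phi_g$, repel $\phi_w$, step length $\lambda$); the exact weighting, randomness, and scheduling in MODEL SWARMS follow the paper's full algorithm, which this snippet only approximates, and the default coefficient values are arbitrary placeholders.

```python
import numpy as np

def swarm_step(x, v, p_best, g_best, g_worst, rng,
               phi_v=0.3, phi_p=0.2, phi_g=0.4, phi_w=0.1, lam=0.5):
    """One approximate swarm update for a single expert particle.

    x, v, p_best, g_best, g_worst are flattened weight vectors (np.ndarray).
    """
    r_p, r_g, r_w = rng.random(3)            # stochastic scaling of each pull, PSO-style
    v_new = (phi_v * v                        # inertia: keep part of the current velocity
             + phi_p * r_p * (p_best - x)     # pull towards the personal best
             + phi_g * r_g * (g_best - x)     # pull towards the global best
             - phi_w * r_w * (g_worst - x))   # push away from the global worst
    x_new = x + lam * v_new                   # take a step of length lam along the new velocity
    return x_new, v_new

# toy usage with 10-dimensional stand-in "weights"
rng = np.random.default_rng(0)
x, v = rng.normal(size=10), np.zeros(10)
p_best, g_best, g_worst = x.copy(), rng.normal(size=10), rng.normal(size=10)
x, v = swarm_step(x, v, p_best, g_best, g_worst, rng)
```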


MODEL SWARMS achieves superior performance across four distinct LLM adaptation objectives:


  • Single task: Optimizing over as few as 200 instances, MODEL SWARMS outperforms 12 model composition baselines by 13.3% across 9 datasets spanning knowledge, reasoning, and safety.


  • Multi-task domain: Jointly optimizing multiple tasks in medical, legal, scientific, and cultural domains, MODEL SWARMS often produces experts that are Pareto-superior to those obtained by optimizing a single task.


  • Reward model: Optimizing reward model scores of general and conflicting preferences, MODEL SWARMS offers steerable experts that outperform baselines by up to 14.6% in controllability.


  • Human interest: On 16 topics evaluated by humans (e.g., electric vehicles and PhD applications), MODEL SWARMS produces experts on par with or better than existing models in 85% of cases.


Empirical analyses reveal that the diversity of starting experts is crucial, models display emerging capabilities not seen in initial checkpoints, and surprisingly, the best ending particle often did not start as the best. MODEL SWARMS could be accelerated with dropout-like strategies and seamlessly extended to token probability arithmetic for experts with different model architectures. We envision MODEL SWARMS as a versatile framework to reimagine the potential of diverse open models.


2 METHODOLOGY


We propose MODEL SWARMS, a collaborative search algorithm to adapt LLM experts via swarm intelligence. We present an overview of MODEL SWARMS in Figure 1 and Algorithm 1.


MODEL SWARMS assumes access to various LLM experts $\{\mathbf{x}_i\}_{i=1}^{n}$, which could be full models or LoRA adapters (Hu et al., 2022) fine-tuned on diverse tasks and domains, publicly available on model-sharing platforms (Wolf et al., 2019). It also requires a utility function $f: \mathbf{x} \rightarrow \mathcal{R}$, mapping each expert onto a scalar value that should be optimized for model adaptation. Utility functions could be dataset performance, reward model scores, or human preferences (Section 3).
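Because the utility function is just a model-to-scalar mapping, it can sit behind a very small interface. The sketch below is illustrative (the names `Expert` and `make_accuracy_utility` are ours, not from the paper's codebase) and shows a dataset-accuracy utility of the kind that could be used with a small held-out set of around 200 examples; a reward-model score or an aggregated human-preference rating would fit the same signature.

```python
from typing import Callable, List, Tuple

# An expert is anything that answers a prompt; a utility maps an expert to a float to maximize.
Expert = Callable[[str], str]
Utility = Callable[[Expert], float]

def make_accuracy_utility(dev_set: List[Tuple[str, str]]) -> Utility:
    """Build a utility f(expert) = exact-match accuracy on a small held-out dev set."""
    def utility(expert: Expert) -> float:
        correct = sum(expert(q).strip() == a.strip() for q, a in dev_set)
        return correct / len(dev_set)
    return utility

# toy usage: a two-item dev set and a lookup-table "expert"
dev_set = [("2 + 2 =", "4"), ("Capital of France?", "Paris")]
f = make_accuracy_utility(dev_set)
toy_expert: Expert = lambda q: {"2 + 2 =": "4"}.get(q, "unknown")
print(f(toy_expert))  # 0.5
```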


Inspired by Particle Swarm Optimization (Kennedy & Eberhart, 1995) and evolutionary algorithms in general (Bäck & Schwefel, 1993), MODEL SWARMS employs the following terminology:


  • Each LLM expert, or “particle” in the model swarm, has a location represented by model weights;


  • Each particle has a velocity, a direction in the model weight space that it should move towards next;


  • Personal best $\mathbf{p}_i$: the best-found location of $\mathbf{x}_i$ based on utility function $f$ in its search history;
  • Global best $\mathbf{g}$ and global worst $\mathbf{g}_w$: the best- and worst-found locations across all particles based on $f$.

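The terminology above maps naturally onto a small per-particle record. The following is an illustrative data-structure sketch (field and method names are ours), with every location and velocity stored as a flattened weight vector; the swarm additionally tracks the global best $\mathbf{g}$ and global worst $\mathbf{g}_w$ across all particles, updated whenever any particle is evaluated.

```python
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class Particle:
    """One LLM expert in the swarm, tracked in flattened weight space."""
    location: np.ndarray                        # current model weights x_i
    velocity: np.ndarray                        # current search direction v_i
    best_location: Optional[np.ndarray] = None  # personal best p_i
    best_utility: float = float("-inf")         # f(p_i)

    def record(self, utility: float) -> None:
        """Update the personal best if the current location improves utility."""
        if utility > self.best_utility:
            self.best_utility = utility
            self.best_location = self.location.copy()
```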

Algorithm 1: Model Swarms


Input: LLM experts $\{\mathbf{x}_i\}_{i=1}^{n}$, utility function $f: \mathbf{x} \rightarrow \mathcal{R}$;
Hyperparameters: swarm size $N$, step length $\lambda$, step length schedule $\phi_\lambda$, inertia $\phi_v$, cognitive coefficient $\phi_p$, social coefficient $\phi_g$, repel coefficient $\phi_w$, patience $c$, restart patience $c_r$, max iteration $\mathcal{K}$

// initialize search
pairwise interpolation to populate initial experts: $\{\mathbf{x}_i\}_{i=1}^{N} = \text{populate}(\{\mathbf{x}_i\}_{i=1}^{n})$, $N > n$
initialize global best checkpoint $\mathbf{g} \leftarrow \varnothing$, global worst checkpoint $\mathbf{g}_w \leftarrow \varnothing$
for $i = 1$ to $N$ do
    initialize personal best $\mathbf{p}_i \leftarrow \mathbf{x}_i$, velocity $\mathbf{v}_i \leftarrow \text{random}(\{\mathbf{x}_j\}_{j=1}^{N}) - \mathbf{x}_i$
    if $f(\mathbf{x}_i) > f(\mathbf{g})$: $\mathbf{g} \leftarrow \mathbf{x}_i$; if $f(\mathbf{x}_i) < f(\mathbf{g}_w)$: $\mathbf{g}_w \leftarrow \mathbf{x}_i$
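A hedged Python rendering of this initialization phase might look as follows; the interpolation coefficient inside `populate`, the toy utility, and all names here are illustrative stand-ins rather than the paper's released code, and experts are treated as flattened weight vectors.

```python
import itertools
import random
import numpy as np

def populate(experts, N):
    """Grow the pool from n seed experts to N via pairwise interpolation."""
    pool = [e.copy() for e in experts]
    pairs = itertools.cycle(itertools.combinations(range(len(experts)), 2))
    while len(pool) < N:
        i, j = next(pairs)
        t = random.random()  # interpolation coefficient; an illustrative choice
        pool.append(t * experts[i] + (1 - t) * experts[j])
    return pool

def initialize_swarm(experts, f, N):
    """Initialization phase of Algorithm 1: populate, velocities, g and g_w."""
    xs = populate(experts, N)
    utils = [f(x) for x in xs]
    g = xs[int(np.argmax(utils))]              # global best checkpoint
    g_w = xs[int(np.argmin(utils))]            # global worst checkpoint
    ps = [x.copy() for x in xs]                # personal bests start at x_i
    vs = [random.choice(xs) - x for x in xs]   # v_i <- random({x_j}) - x_i
    return xs, vs, ps, g, g_w

# toy usage: 3 seed "experts" as 8-dim weight vectors, utility = -||x||
seeds = [np.random.randn(8) for _ in range(3)]
f = lambda x: -np.linalg.norm(x)
xs, vs, ps, g, g_w = initialize_swarm(seeds, f, N=6)
```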
