GitHub_Trending/ai/aici FAQ: A Complete Walkthrough from Installation to Advanced Features

Project repository: https://gitcode.com/GitHub_Trending/ai/aici

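Project Overview

The Artificial Intelligence Controller Interface (AICI) lets you build Controllers that constrain and direct the output of a large language model (LLM) in real time. Controllers are flexible programs that can implement constrained decoding, dynamic editing of prompts and generated text, and coordinated execution across multiple parallel generations.

AICI's core strengths are:

Flexibility: controllers can be written in any language that compiles to WebAssembly (Wasm), or that can be interpreted inside Wasm
Security: controllers are sandboxed and cannot access the filesystem, the network, or any other resources
Performance: Wasm modules are compiled to native code and run in parallel with the LLM inference engine, adding only minimal overhead to the generation process

For the detailed architecture, see the architecture diagram in README.md.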

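Installation and Environment Setup

Preparing the Development Environment

To compile the AICI components, you need to set up a Rust development environment. This quick start also requires Python 3.11 or later to create a controller.

Windows WSL / Linux / macOS

Note:
Windows users: use WSL2 or the included devcontainer.
macOS users: make sure the Xcode command line tools are installed by running xcode-select -p; if they are not installed, run xcode-select --install.
CUDA: the CUDA build depends on a specific libtorch installation, so using the included devcontainer is strongly recommended.

Use your system package manager to install the tools needed to build the code, including git, cmake, and ccache.

For example, in WSL / Ubuntu using apt: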

sudo apt-get install --assume-yes --no-install-recommends \
    build-essential cmake ccache pkg-config libssl-dev libclang-dev clang llvm-dev git-lfs

Or on macOS using Homebrew:

brew install git cmake ccache

Then install Rust, Rustup, and Cargo by following the official installation instructions:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

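After installation, verify that rustup is accessible by running rustup --version from a terminal. If the command is not recognized, try opening a new terminal session.

Next, install the wasm32-wasi Rust component: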

rustup target add wasm32-wasi

If you already had Rust installed, or if Cargo complains about an outdated version, run:

rustup update

Finally, to work with the Python controllers and scripts (as in this tutorial), run this command to install the required packages:

pip install pytest pytest-forked ujson posix_ipc numpy requests

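Building and Starting the rLLM Server and AICI Runtime

The rLLM server has two backends: one based on libtorch and CUDA (rllm-cuda), and the other based on llama.cpp (rllm-llamacpp).

The rllm-cuda backend only works with NVIDIA GPUs with compute capability 8.0 or later (A100 and later; RTX 30x0 and later) and requires a fiddly libtorch setup, so using the included devcontainer is strongly recommended. This guide focuses on the rllm-llamacpp backend, but the build steps are the same for rllm-cuda apart from the folder name.

After completing the development environment setup above, clone the AICI repository and proceed with the following steps.

Use the following commands to build and run aicirt and rllm-llamacpp: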

cd rllm/rllm-llamacpp
./server.sh phi2

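You can pass other model names as the argument (run ./server.sh without arguments to see the available models). You can also use a HuggingFace URL pointing to a .gguf file, or a local path to a .gguf file. (For rllm-cuda, use a HuggingFace model ID or a path to a folder.)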

./server.sh orca

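More details about rllm-llamacpp are available in the project repository.

The rLLM server provides an HTTP interface for configuring tasks and processing requests. You can also use this interface to quickly check its status. For example, if you open http://127.0.0.1:4242/v1/models, you should see: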

{
  "object": "list",
  "data": [
    {
      "object": "model",
      "id": "TheBloke/phi-2-GGUF",
      "created": 946810800,
      "owned_by": "owner"
    }
  ]
}

This confirms that the selected model is loaded.
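The status check above can also be scripted. The following is a minimal sketch that only parses a response body of the shape shown above; the helper name loaded_model_ids is hypothetical and not part of AICI, and the sample string simply repeats the JSON from this section.

```python
import json

# Sample body copied from the /v1/models response shown in this article.
SAMPLE_MODELS_BODY = """
{
  "object": "list",
  "data": [
    {
      "object": "model",
      "id": "TheBloke/phi-2-GGUF",
      "created": 946810800,
      "owned_by": "owner"
    }
  ]
}
"""


def loaded_model_ids(models_body: str) -> list[str]:
    """Return the ids of all entries tagged as models in a /v1/models body.

    Hypothetical helper for illustration; it assumes the response has the
    shape shown in the article and tolerates a missing "data" field.
    """
    payload = json.loads(models_body)
    return [
        entry["id"]
        for entry in payload.get("data", [])
        if entry.get("object") == "model"
    ]


if __name__ == "__main__":
    print(loaded_model_ids(SAMPLE_MODELS_BODY))
```

Against a running server, the body could be fetched with the standard library (for example urllib.request.urlopen on the URL above) and passed to the same function.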

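Basic Usage

Controlling LLM Output with AICI

Suppose we want the model to generate a list that follows a specific format and contains exactly five items.

Normally, achieving this involves prompt engineering: crafting the prompt precisely and giving clear instructions, such as: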

What are the five most popular types of vehicles?
Return the result as a numbered list.
Do not add explanations, only the list.

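The prompt would also vary depending on the model in use, since each model tends to add explanations and interprets instructions differently.

With AICI, we shift control back to code, and we can simplify the prompt to:

What are the most popular types of vehicles?

And use code to:

Limit the list to 5 items
Prevent the model from adding an initial explanation
Format the output as a numbered list
Stop the model from adding text after the list

Let's create a list-of-five.py Python file with the following content: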

import pyaici.server as aici

# Force the model to generate a well formatted list of 5 items, e.g.
#   1. name 1
#   2. name 2
#   3. name 3
#   4. name 4
#   5. name 5
async def main():
    
    # This is the prompt we want to run.
    # Note how the prompt doesn't mention a number of vehicles or how to format the result.
    prompt = "What are the most popular types of vehicles?\n"

    # Tell the model to generate the prompt string, ie. let's start with the prompt "to complete"
    await aici.FixedTokens(prompt)

    # Store the current position in the token generation process
    marker = aici.Label()

    for i in range(1,6):
      # Tell the model to generate the list number
      await aici.FixedTokens(f"{i}.")

      # Wait for the model to generate a vehicle name and end with a new line
      await aici.gen_text(stop_at = "\n")

    await aici.FixedTokens("\n")

    # Store the tokens generated in a result variable
    aici.set_var("result", marker.text_since())

aici.start(main())

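Running the script is not too different from sending a prompt. In this case, we are sending the control logic and the instructions together.

To see the final result, run the following command:

./aici.sh run list-of-five.py

Result: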

Running with tagged AICI Controller: gh:microsoft/aici/pyctrl
[0]: FIXED 'What are the most popular types of vehicles?\n'
[0]: FIXED '1.'
[0]: GEN ' Cars\n'
[0]: FIXED '2.'
[0]: GEN ' Motorcycles\n'
[0]: FIXED '3.'
[0]: GEN ' Bicycles\n'
[0]: FIXED '4.'
[0]: GEN ' Trucks\n'
[0]: FIXED '5.'
[0]: GEN ' Boats\n'
[0]: FIXED '\n'
[DONE]
[Response] What are the most popular types of vehicles?
1. Cars
2. Motorcycles
3. Bicycles
4. Trucks
5. Boats

response saved to tmp/response.json
Usage: {'sampled_tokens': 16, 'ff_tokens': 37, 'cost': 69}
Timing: {'http_response': 0.05193686485290527, 'data0': 0.05199289321899414, 'first_token': 0.0658726692199707, 'last_token': 0.1784682273864746}
Tokens/sec: {'prompt': 861.0913072488067, 'sampling': 89.65181217019571}
Storage: {'result': '1. Cars\n2. Motorcycles\n3. Bicycles\n4. Trucks\n5. Boats\n\n'}
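The stored result is a plain string, so a client that wants the individual items still has to split it. The following is a small sketch of one way to do that on the client side; parse_numbered_list is a hypothetical helper, not part of pyaici, and the sample value is the Storage string from the output above.

```python
import re

# The 'result' value reported in the Storage line of the run above.
RESULT = "1. Cars\n2. Motorcycles\n3. Bicycles\n4. Trucks\n5. Boats\n\n"


def parse_numbered_list(result: str) -> list[str]:
    """Extract the item text from lines of the form '<number>. <item>'.

    Hypothetical client-side helper; lines that do not match (such as the
    trailing blank line) are skipped.
    """
    items = []
    for line in result.splitlines():
        match = re.fullmatch(r"\s*\d+\.\s*(.+?)\s*", line)
        if match:
            items.append(match.group(1))
    return items


if __name__ == "__main__":
    print(parse_numbered_list(RESULT))
```

Because the controller fixes the "N." prefixes itself, the parser can rely on that format rather than guessing what the model produced.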

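Controllers in Detail

PyCtrl Controller

PyCtrl implements the AI Controller Interface by embedding RustPython (a Python 3 language implementation) in a Wasm module, together with native primitives for specific kinds of output constraints: fixed token output, regular expressions, LR(1) grammars, substring constraints, and so on. Python code is typically only used to glue the primitives together, so it is not performance critical.

Several sample scripts are available. The scripts use the pyaici.server module to communicate with the AICI runtime and to use the native constraints.

This is quite similar to jsctrl, but uses Python instead of JavaScript.

Usage

You can build, upload, and tag the PyCtrl Wasm module using the aici.sh script (this assumes a running server):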

../../aici.sh build . --tag pyctrl-latest

Then you can run a PyCtrl sample:

../../aici.sh run --ctrl pyctrl-latest samples/test.py

You can also build and run in one step (without tagging):

../../aici.sh run --build . samples/test.py

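Either way, you will see the console output of the program.

By default, if you pass a .py file to aici.sh without --ctrl, it downloads and uses gh:microsoft/aici/pyctrl, which is the latest release of PyCtrl.

The Python interpreter does not include the full Python standard library, but parts of it are bundled. In addition to the regular Python modules, the pyaici.server module is also bundled. It defines the AiciCallbacks interface, which closely reflects the structure of the native AICI interface. AiciAsync takes an async method and turns it into an AiciCallbacks implementation; this is typically invoked via aici.start().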

import pyaici.server as aici

async def sample():
    # initialization code
    print("I'm going in the logs!")
    # ... more initialization code, it has long time limit
    prompt = await aici.GetPrompt()
    # here we're out of initialization code - the time limits are tight

    # This appends the exact string to the output; similar to adding it to prompt
    await aici.FixedTokens("The word 'hello' in French is")

    # generate text (tokens) matching the regex
    french = await aici.gen_text(regex=r' "[^"]+"', max_tokens=5)
    # set a shared variable (they are returned as JSON and are useful with aici.fork())
    aici.set_var("french", french)

    await aici.FixedTokens(" and in German")
    # shorthand for the above
    await aici.gen_text(regex=r' "[^"]+"', store_var="german")

    await aici.FixedTokens("\nFive")
    # generates one of the strings
    # aici.gen_tokens() and gen_text() are the same, except for return type
    await aici.gen_tokens(options=[" pounds", " euros", " dollars"])

aici.start(sample())

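Backtracking

In an LLM, tokens are generated one by one, and it can be cheap to remove a batch of the most recently generated tokens. It is also possible to append more than one token at a time (see aici.FixedTokens() above). aici.Label() marks a point in the generated sequence, and the following= argument of FixedTokens() is used to backtrack to it. For example: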

import pyaici.server as aici

async def backtracking():
    await aici.FixedTokens("The word 'hello' in")
    # mark the current position
    l = aici.Label()
    # append text at label (here the following= is superfluous)
    await aici.FixedTokens(" French is", following=l)
    await aici.gen_tokens(regex=r' "[^"]+"', store_var="french", max_tokens=5)
    # now again append text at label - here following= is required
    await aici.FixedTokens(" German is", following=l)
    await aici.gen_tokens(regex=r' "[^"]+"', store_var="german", max_tokens=5)

aici.start(backtracking())

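This generates a French word, stores it in a variable, makes the LLM forget it, and then generates a German word (and stores it).

Note that this happens sequentially; we can use aici.fork() to generate both words in parallel.

Forking

The generation process can be forked into multiple branches (possibly more than two). The branches can communicate through shared variables (aici.set_var() and aici.get_var()).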

import pyaici.server as aici

async def forking():
    await aici.FixedTokens("The word 'hello' in")
    # fork into three branches
    id = await aici.fork(3)
    # see which branch we're in
    if id == 0:
        # in first branch, we wait for the other two branches to finish
        french, german = await aici.wait_vars("french", "german")
        # append some text, based on what the other branches did
        await aici.FixedTokens(f"{french} is the same as {german}.")
        # and then generate some tokens
        await aici.gen_tokens(max_tokens=5)
    # the other two branches are similar to previous examples
    elif id == 1:
        await aici.FixedTokens(" German is")
        await aici.gen_tokens(regex=r' "[^"]+"', store_var="german", max_tokens=5)
    elif id == 2:
        await aici.FixedTokens(" French is")
        await aici.gen_tokens(regex=r' "[^"]+"', store_var="french", max_tokens=5)

aici.start(forking())

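Tokens, Bytes, and Strings

An LLM generates tokens. Each token is identified by a unique integer (we generally do not use string names for tokens). Different models have different token sets (vocabularies); for example, Llama has 32000 tokens. Each token corresponds to a sequence of bytes; these are often valid UTF-8 strings, but not always (for example, some emoji or rare Unicode characters are split across multiple tokens). Because of this:

aici.gen_tokens() returns list[int]
aici.gen_text() returns str, possibly with Unicode replacement characters (�); this may not work correctly in the presence of non-UTF-8 bytes
Shared variables are stored and returned as bytes (though when writing them you can use str)

In the future, re may need to be extended to support matching bytes and not only str.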
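To see why a str result can contain replacement characters, consider what happens when the bytes of one character are spread over several tokens. The sketch below uses plain Python and no AICI API; the split point is a made-up token boundary chosen only for illustration.

```python
def decode_pieces(pieces: list[bytes]) -> list[str]:
    """Decode each byte chunk on its own, replacing invalid UTF-8 with U+FFFD."""
    return [piece.decode("utf-8", errors="replace") for piece in pieces]


emoji = "😀"
raw = emoji.encode("utf-8")  # a single character encoded as 4 bytes

# Hypothetical token boundary in the middle of the character.
pieces = [raw[:2], raw[2:]]

# Decoding the chunks separately yields replacement characters...
per_piece = decode_pieces(pieces)

# ...while joining the bytes first recovers the original character.
joined = b"".join(pieces).decode("utf-8")

if __name__ == "__main__":
    print(per_piece, joined)
```

This is the reason the byte-level view (shared variables as bytes, token ids as integers) is the lossless one, and the str view is a convenience.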

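Limitations and Compatibility

No access to files or the network
Only parts of the standard library are included (though more modules can easily be added)
The re module is available; all str methods are available as well
You cannot pip install
No multithreading (but see aici.fork())

RustPython is generally compatible with Python 3.

JsCtrl Controller

JsCtrl is another commonly used controller, which lets you write the control logic in JavaScript. Usage is similar to PyCtrl, but with JavaScript syntax.

For example, a simple JavaScript Hello World: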

async function main() {
    await $`Hello, world!`;
    let name = await gen({ regex: /[A-Z][a-z]+/ });
    await $` My name is ${name}.`;
}

start(main);

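Using the REST API

The AICI server exposes a REST API for uploading and tagging controllers (.wasm files), and extends the "completion" REST API to allow running controllers.

Uploading a Controller

To upload a controller, POST it to /v1/controllers. Note that the body is the raw binary .wasm file, not JSON-encoded. The module_id is just the SHA256 hash of the .wasm file. The other fields in the response may or may not be returned: wasm_size is the input size in bytes, compiled_size is the size of the compiled Wasm file, and time is the time it took to compile the Wasm file, in milliseconds.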

// POST /v1/controllers
// ... binary of Wasm file ...
// 200 OK
{
  "module_id": "44f595216d8410335a4beb1cc530321beabe050817b41bf24855c4072c2dde2d",
  "wasm_size": 3324775,
  "compiled_size": 11310512,
  "time": 393
}
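Since module_id is described as the SHA256 hash of the .wasm file, a client can compute it locally, for example to check which module a tag points to before uploading. This is a minimal sketch using only the standard library; the function name module_id_for is hypothetical, and it assumes the id is the lowercase hex digest, as in the response above.

```python
import hashlib


def module_id_for(wasm_bytes: bytes) -> str:
    """Return the SHA256 hex digest of a Wasm binary.

    Assumption: the server's module_id is the lowercase hex digest of the
    uploaded bytes, matching the 64-character value shown in the response.
    """
    return hashlib.sha256(wasm_bytes).hexdigest()


if __name__ == "__main__":
    # Stand-in bytes; a real call would read the .wasm file in binary mode.
    print(module_id_for(b"\x00asm"))
```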

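Running a Controller

To run a controller, POST to /v1/run. The controller parameter specifies the module to run, either as a hex module_id or as a tag name (see below). controller_arg is the argument passed to the module; it can be a JSON object (which will be encoded as a string) or a JSON string (which will be passed as is). jsctrl expects the argument to be a string, namely the program to execute.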

// POST /v1/run
{
  "controller": "jsctrl-latest",
  "controller_arg": "async function main() {\n    await $`Ultimate answer is to the life, universe and everything is `\n    await gen({ regex: /\\d\\d/ })\n}\n\nstart(main)\n"
}
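Writing the escaped controller_arg by hand is error-prone, so the request body is easier to build programmatically. The sketch below only constructs the JSON body from the two fields described above; build_run_request is a hypothetical helper, and sending the request is left out.

```python
import json
from typing import Any

# A small jsctrl program, kept as a normal multi-line string.
PROGRAM = """async function main() {
    await $`The answer is `
    await gen({ regex: /\\d\\d/ })
}

start(main)
"""


def build_run_request(controller: str, controller_arg: Any) -> str:
    """Serialize the body for POST /v1/run.

    Hypothetical helper: controller is a hex module_id or a tag name, and
    controller_arg is either a string or a JSON-serializable object.
    json.dumps takes care of escaping newlines, quotes, and backslashes.
    """
    return json.dumps({"controller": controller, "controller_arg": controller_arg})


if __name__ == "__main__":
    print(build_run_request("jsctrl-latest", PROGRAM))
```

Round-tripping the body through json.loads returns the program text unchanged, which is a quick way to confirm the escaping is right.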

The response is a stream of events:

200 OK
data: {"id":"run-cfa3ed5b-7be1-4e57-a480-1873ad096817","object":"initial-run","cr...
data: {"object":"run","forks":[{"index":0,"text":"Ultimate answer is to the life,...
data: {"object":"run","forks":[{"index":0,"text":"2","error":"","logs":"GEN \"42....
data: {"object":"run","forks":[{"index":0,"finish_reason":"aici-stop","text":" ",...
data: {"object":"run","forks":[{"index":0,"finish_reason":"aici-stop","text":"","...
data: [DONE]

First comes the initial-run object, followed by zero or more run objects. The last entry is the string [DONE].
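A client can consume this stream line by line. The following sketch parses "data:" lines until the [DONE] marker; the sample events are shortened, made-up stand-ins (the real entries carry more fields than shown in the truncated output above), and the helper names are hypothetical.

```python
import json
from typing import Iterable, Iterator

# Illustrative stream only: minimal, invented events in the documented order.
SAMPLE_STREAM = [
    'data: {"object": "initial-run", "id": "run-example"}',
    'data: {"object": "run", "forks": [{"index": 0, "text": "4"}]}',
    'data: {"object": "run", "forks": [{"index": 0, "text": "2"}]}',
    "data: [DONE]",
]


def iter_run_events(lines: Iterable[str]) -> Iterator[dict]:
    """Yield each JSON event from 'data:' lines, stopping at [DONE]."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and anything that is not an event
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def fork_text(events: Iterable[dict], index: int = 0) -> str:
    """Concatenate the text chunks of one fork across all run events."""
    return "".join(
        fork.get("text", "")
        for event in events
        if event.get("object") == "run"
        for fork in event.get("forks", [])
        if fork.get("index") == index
    )


if __name__ == "__main__":
    events = list(iter_run_events(SAMPLE_STREAM))
    print(fork_text(events))
```

Simple concatenation like this is only meaningful when the controller does not backtrack.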

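Each run entry contains:

forks - the list of sequences in the request
usage - information about the number of tokens processed and generated

Each fork contains:

text - the result from the LLM; note that it becomes confusing if you use backtracking (AICI inserts additional ↩ characters to indicate backtracking)
logs - the console output of the controller
storage - a list of storage operations (this is one way of extracting the controller's results); the value in WriteVar is a hex-encoded byte string
error - set when an error occurs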
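Because WriteVar values arrive hex-encoded, a client needs one decoding step to get the bytes back. The sketch below assumes each storage operation looks like {"WriteVar": {"name": ..., "value": ...}}; only the hex-encoded value is stated in this article, so the exact field layout is an assumption, and decode_write_vars is a hypothetical helper.

```python
def decode_write_vars(storage_ops: list[dict]) -> dict[str, bytes]:
    """Collect variable writes from a fork's storage list.

    Assumed shape (not confirmed by the article): each operation is a dict
    such as {"WriteVar": {"name": "x", "value": "<hex>"}}. Later writes to
    the same name overwrite earlier ones; other operations are ignored.
    """
    variables: dict[str, bytes] = {}
    for op in storage_ops:
        write = op.get("WriteVar")
        if write is None:
            continue
        variables[write["name"]] = bytes.fromhex(write["value"])
    return variables


if __name__ == "__main__":
    # "3432" is the hex encoding of the two ASCII bytes b"42".
    sample_ops = [{"WriteVar": {"name": "answer", "value": "3432"}}]
    print(decode_write_vars(sample_ops))
```

The decoded values are bytes; decode them as UTF-8 only when the controller is known to have stored text.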

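The usage object contains:

sampled_tokens - the number of tokens generated
ff_tokens - the number of tokens processed (prompt, fast-forward, and generated tokens)
cost - the cost of the run (formula: 2*sampled_tokens + ff_tokens; to be refined!)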
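As a worked check of the cost formula, the Usage line from the list-of-five run earlier in this article reported 16 sampled tokens, 37 fast-forward tokens, and a cost of 69. The sketch below recomputes that value; run_cost is a hypothetical name, and the formula is the provisional one quoted above.

```python
def run_cost(sampled_tokens: int, ff_tokens: int) -> int:
    """Provisional cost formula from the article: 2*sampled_tokens + ff_tokens."""
    return 2 * sampled_tokens + ff_tokens


if __name__ == "__main__":
    # Values from the list-of-five run: 2*16 + 37 = 69.
    print(run_cost(sampled_tokens=16, ff_tokens=37))
```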

Tagging Controllers

To tag a module_id with one or more tags, POST to /v1/controllers/tags:

// POST /v1/controllers/tags
{
  "module_id": "44f595216d8410335a4beb1cc530321beabe050817b41bf24855c4072c2dde2d",
  "tags": ["jsctrl-test"]
}
// 200 OK
{
  "tags": [
    {
      "tag": "jsctrl-test",
      "module_id": "44f595216d8410335a4beb1cc530321beabe050817b41bf24855c4072c2dde2d",
      "updated_at": 1706140462,
      "updated_by": "mimoskal",
      "wasm_size": 3324775,
      "compiled_size": 11310512
    }
  ]
}

You can also list all existing tags:

// GET /v1/controllers/tags
// 200 OK
{
  "tags": [
    {
      "tag": "pyctrl-v0.0.3",
      "module_id": "41bc81f0ce56f2add9c18e914e30919e6b608c1eaec593585bcebd61cc1ba744",
      "updated_at": 1705629923,
      "updated_by": "mimoskal",
      "wasm_size": 13981950,
      "compiled_size": 42199432
    },
    {
      "tag": "pyctrl-latest",
      "module_id": "41bc81f0ce56f2add9c18e914e30919e6b608c1eaec593585bcebd61cc1ba744",
      "updated_at": 1705629923,
      "updated_by": "mimoskal",
      "wasm_size": 13981950,
      "compiled_size": 42199432
    },
    ...
  ]
}

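Advanced Features

Combining Prompt Engineering with AICI

Although AICI lets you control LLM output through code, combining it with good prompt engineering gives better results. For example, when using AICI to constrain list generation, you can still provide a concise prompt to guide the model:

What are the most popular types of vehicles?

Then use AICI code to make sure the output format and length meet the requirements.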

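Performance Optimization

Performance-critical code is implemented natively. This includes:

The TokenSet class
The RegexConstraint class
The SubstrConstraint class
The tokenizer/detokenizer

You should limit the amount of Python code that runs after each generated token to a few lines.

For example, computing the set of allowed tokens over the 32000-token vocabulary of the Llama model takes:

about 2.0ms for a Yacc grammar of the C programming language
about 0.3ms for a regular expression
about 0.2ms for a substring constraint, from a 4kB string

The numbers above are for a single sequence; however, each sequence is processed in a separate process, so they do not change as long as there are more cores than sequences (which is typical). They also include the overhead of calling into the Python interpreter implemented in Wasm, and then back into the Rust-generated Wasm code for the constraint itself. They are all well within the 20-50ms budget, so they do not affect generation time at all.

There is also some overhead in the critical path of sampling. It comes to about 0.3ms per generation step when executing 10 sequences in parallel (irrespective of the constraint used). The overhead goes up to around 0.7ms for 40 sequences (though it has not been fully optimized yet).

WebAssembly is designed to have minimal overhead compared to native code. In our experience, highly optimized Rust code is less than 2x slower when run in Wasmtime than native. This is 10-100x better than JavaScript or Python.

All measurements were done on an AMD EPYC 7V13 with an NVIDIA A100 GPU with 80GB of VRAM.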

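Frequently Asked Questions

How does AICI handle system prompts or chat mode?

AICI interacts with the model at the level of token sequences. The model itself has no distinct input for a "system prompt" or "chat message"; instead, these are wrapped in model-specific tokens. You need to look up the model's "instruction format", typically on the model's page on HuggingFace.

For example, the Orca-2-13b model has the following instruction format: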

<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{user_message}<|im_end|>
<|im_start|>assistant

The Mistral-Instruct and Mixtral-Instruct models, as well as the CodeLlama-Instruct models, use:

[INST]{instruction}[/INST]

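Interestingly, <|im_start|> and <|im_end|> are special tokens, while [INST] and [/INST] are regular strings.

The start token (typically denoted <s>) is always implicit in AICI!

For example, for the Orca model you can use the following code: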

import pyaici.server as aici

system_message = "You are a helpful assistant."

async def ask(user_message: str):
    prompt = f"<|im_start|>system\n{system_message}<|im_end|>\n"
    prompt += f"<|im_start|>user\n{user_message}<|im_end|>\n"
    prompt += "<|im_start|>assistant\n"
    await aici.FixedTokens(prompt)

async def main():
    await ask("What is the capital of Poland?")
    await aici.gen_tokens(max_tokens=10, store_var="capital")

aici.start(main())
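The same idea carries over to the [INST] format used by the Mistral-Instruct, Mixtral-Instruct, and CodeLlama-Instruct models. The sketch below only builds the prompt string, so it runs without pyaici; format_inst is a hypothetical helper, and in a controller its result would be passed to aici.FixedTokens() as in the Orca example.

```python
def format_inst(instruction: str) -> str:
    """Wrap an instruction in the [INST]...[/INST] format shown above.

    No start token is added, since the article notes that <s> is always
    implicit in AICI.
    """
    return f"[INST]{instruction}[/INST]"


if __name__ == "__main__":
    print(format_inst("What is the capital of Poland?"))
```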

How do I set up a client to access AICI?

Install the pyaici package, export your credentials, and check that the connection works:

pip uninstall pyaici
pip install -e "git+https://github.com/microsoft/aici#egg=pyaici&subdirectory=py"
export AICI_API_BASE="https://inference.example.com/v1/#key=wht_..."
aici infer --max-tokens=10 "Answer to the Ultimate Question of Life, the Universe, and Everything is"
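The AICI_API_BASE value above carries the API key in the URL fragment (the part after #). How pyaici parses it internally is not shown here, so the following is only a sketch of how such a value could be split with the standard library; split_api_base is a hypothetical helper and the key in the example is a placeholder.

```python
from typing import Optional
from urllib.parse import urlsplit, urlunsplit


def split_api_base(api_base: str) -> tuple[str, Optional[str]]:
    """Split 'https://host/v1/#key=...' into (base URL, key).

    Hypothetical helper: looks for a 'key=' pair in the URL fragment and
    returns the URL without the fragment. The key is None when absent.
    """
    parts = urlsplit(api_base)
    key = None
    for pair in parts.fragment.split("&"):
        name, _, value = pair.partition("=")
        if name == "key":
            key = value
    base = urlunsplit((parts.scheme, parts.netloc, parts.path, parts.query, ""))
    return base, key


if __name__ == "__main__":
    # Placeholder key, not a real credential.
    print(split_api_base("https://inference.example.com/v1/#key=example-key"))
```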

To test pyctrl, create an answer.py file:

import pyaici.server as aici

async def main():
    await aici.FixedTokens("The ultimate answer to the universe is ")
    await aici.gen_text(regex=r'\d\d', max_tokens=2)

aici.start(main())

You can run it with aici run answer.py. Try aici run --help to see the available options.

You can use aici --log-level=5 run answer.py to see the arguments of the REST requests, if you want to make them yourself.

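Summary and Outlook

AICI provides a flexible, secure, and high-performance way to control LLM output. By building on WebAssembly, AICI lets developers write controllers in many languages while preserving security and performance.

AICI currently supports several controllers, such as PyCtrl and JsCtrl, and several LLM backends, such as llama.cpp and the CUDA-accelerated rLLM. In the future, AICI plans to support more LLM backends and controller types to serve different application scenarios.

If you have any questions or suggestions, consult the official documentation or submit an issue.

We hope this FAQ helps you better understand and use AICI. If you have other questions, feel free to ask in the community or contribute to the documentation.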


Authoring statement: parts of this article were generated with AI assistance (AIGC) and are for reference only.