分布式大型模型推理框架 Cake 使用指南-优快云博客

分布式大型模型推理框架 Cake 使用指南

cake Distributed LLM and StableDiffusion inference for mobile, desktop and server. 项目地址: https://gitcode.com/gh_mirrors/cake3/cake

1. 项目介绍

Cake 是一个基于 Rust 的框架，用于分布式推理大规模模型，如 LLama3 和 Stable Diffusion。该项目旨在通过将消费硬件（如 iOS、Android、macOS、Linux 和 Windows 设备）转变为一个异构集群，来实现对大型（70B+）模型的运行。通过有效利用计划淘汰的硬件，该项目旨在使 AI 更易于获取和普及。

2. 项目快速启动

环境准备

确保你的系统中已经安装了 Rust。

编译核心库和 CLI 工具

不使用加速（将使用 CPU）：
```
cargo build --release
```
使用 Metal 加速（针对 Apple Silicon）：
```
cargo build --release --features metal
```
使用 CUDA 加速：
```
cargo build --release --features cuda
```

运行工作节点

运行以下命令来启动一个工作节点：

cake-cli --model /path/to/Meta-Llama-3-8B --mode worker --name worker0 --topology topology.yml --address 0.0.0.0:10128

运行主节点

运行以下命令来启动一个带有 OpenAI 兼容 REST API 的主节点：

cake-cli --model /path/to/Meta-Llama-3-8B --api 0.0.0.0:8080 --topology topology.yml

如果需要加载整个模型到单个实例中，可以省略拓扑文件：

cake-cli --model /path/to/Meta-Llama-3-8B --api 0.0.0.0:8080

3. 应用案例和最佳实践

定义模型部分

在 topology.yml 文件中定义模型部分，例如：

wsl2_on_windows:
  host: '192.168.1.2:10128'
  description: 'NVIDIA RTX 4090 24GB'
  layers:
    - 'unet'

macbook:
  host: '192.168.1.3:10128'
  description: 'Macbook M2'
  layers:
    - 'clip'
    - 'vae'

运行图像生成工作节点

cake-cli --model /path/to/hf/cache --mode worker --name wsl2_on_windows --model-type image-model --topology topology.yml --address 0.0.0.0:10128

使用 REST API 生成图像

curl http://master-ip:8080/api/v1/image -H "Content-Type: application/json" -d '{"image_args": {"sd-image-prompt": "An old man sitting on the chair at seaside", "sd-num-samples": 1, "sd-image-seed": 2439383}}'

4. 典型生态项目

目前，Cake 项目在 GitHub 上拥有超过 2.8k 的 Star，以及 162 个 Fork。它为分布式推理提供了可能，特别是在利用旧硬件进行 AI 推理方面有显著的优势。典型的生态项目包括但不限于在各类设备上部署推理任务，例如在移动设备、个人电脑以及服务器上分布模型负载，以实现高效的推理性能。

cake Distributed LLM and StableDiffusion inference for mobile, desktop and server. 项目地址: https://gitcode.com/gh_mirrors/cake3/cake

创作声明：本文部分内容由AI辅助生成（AIGC），仅供参考