PyTorch TorchChat 项目教程：本地大语言模型部署全攻略-优快云博客

PyTorch TorchChat 项目教程：本地大语言模型部署全攻略

【免费下载链接】torchchat Run PyTorch LLMs locally on servers, desktop and mobile 项目地址: https://gitcode.com/GitHub_Trending/to/torchchat

还在为云端LLM API调用费用高昂、网络延迟和隐私问题而烦恼吗？PyTorch TorchChat让你能够在本地服务器、桌面设备和移动端无缝运行大型语言模型（LLMs），彻底摆脱对外部服务的依赖！

通过本文，你将掌握：

🚀 零基础搭建 TorchChat 开发环境
📦 模型下载与管理 完整流程
💬 多种交互方式：CLI聊天、浏览器界面、REST API
⚡ 性能优化技巧：AOT编译、量化压缩
📱 移动端部署：iOS和Android实战指南
🔍 模型评估：准确性和性能测试方法

1. 环境准备与安装

系统要求

Python 3.10+
PyTorch 2.2+
支持的操作系统：Linux (x86)、macOS (Apple Silicon)、Android、iOS

安装步骤

# 克隆项目仓库
git clone https://gitcode.com/gh_mirrors/to/torchchat.git
cd torchchat

# 创建虚拟环境
python3 -m venv .venv
source .venv/bin/activate

# 安装依赖
./install/install_requirements.sh

# 创建模型导出目录
mkdir exportedModels

验证安装

python3 torchchat.py --help

2. 模型下载与管理

Hugging Face 账号配置

大多数模型通过 Hugging Face 分发，需要创建访问令牌：

huggingface-cli login

查看可用模型

python3 torchchat.py list

下载模型

# 下载 Llama 3.1 8B 模型
python3 torchchat.py download llama3.1

模型管理命令

mermaid

3. 基础使用：三种交互模式

3.1 CLI 聊天模式

python3 torchchat.py chat llama3.1

3.2 文本生成模式

python3 torchchat.py generate llama3.1 \
  --prompt "写一个关于人工智能的短故事"

3.3 浏览器界面

# 终端1：启动服务器
python3 torchchat.py server llama3.1

# 终端2：启动浏览器界面
streamlit run torchchat/usages/browser.py

4. 高级功能：模型导出与优化

4.1 AOT Inductor 编译（桌面/服务器）

# 导出编译模型
python3 torchchat.py export llama3.1 \
  --output-aoti-package-path exportedModels/llama3_1_artifacts.pt2

# Python环境运行
python3 torchchat.py generate llama3.1 \
  --aoti-package-path exportedModels/llama3_1_artifacts.pt2 \
  --prompt "Hello world"

# C++运行器（需要编译）
torchchat/utils/scripts/build_native.sh aoti
./cmake-out/aoti_run exportedModels/llama3_1_artifacts.pt2 \
  -z `python3 torchchat.py where llama3.1`/tokenizer.model \
  -i "Once upon a time"

4.2 ExecuTorch 移动端部署

# 安装 ExecuTorch
export TORCHCHAT_ROOT=${PWD}
./torchchat/utils/scripts/install_et.sh

# 导出移动端模型
python3 torchchat.py export llama3.1 \
  --quantize torchchat/quant_config/mobile.json \
  --output-pte-path llama3.1.pte

5. 量化优化：大幅减少模型大小

TorchChat 支持多种量化方案，显著降低内存占用：

量化配置示例

{
  "embedding": {"bitwidth": 4, "groupsize": 32},
  "linear:int4": {"groupsize": 32},
  "executor": {"device": "cuda"},
  "precision": {"dtype": "bf16"}
}

量化方案对比表

量化类型	位宽	组大小	激活量化	支持平台
Linear (非对称)	4/8	32-256	否	Eager/AOTI/ET
Linear + 动态激活	4	32-256	是	ExecuTorch
Embedding	4/8	32-256	否	全平台

使用量化模型

#  eager模式量化
python3 torchchat.py generate llama3.1 \
  --quantize '{"embedding": {"bitwidth": 4, "groupsize":32}}' \
  --prompt "量化测试"

# AOTI量化导出
python3 torchchat.py export llama3.1 \
  --quantize '{"embedding": {"bitwidth": 4, "groupsize":32}, "linear:int4": {"groupsize":32}}' \
  --output-dso-path llama3.so

6. 多模态模型支持

TorchChat 支持 Llama 3.2 11B Vision 多模态模型：

# 图像+文本生成
python3 torchchat.py generate llama3.2-11B \
  --prompt "这张图片里有什么？" \
  --image-prompt assets/dog.jpg

# 多模态服务器
python3 torchchat.py server llama3.2-11B

7. 移动端部署实战

7.1 Android 部署

# 准备模型文件
adb shell mkdir -p /data/local/tmp/llama
adb push llama3.1.pte /data/local/tmp/llama
adb push `python3 torchchat.py where llama3.1`/tokenizer.model /data/local/tmp/llama

# 使用Android Studio打开项目
open torchchat/edge/android/torchchat

7.2 iOS 部署

# 打开Xcode项目
open et-build/src/executorch/examples/demo-apps/apple_ios/LLaMA/LLaMA.xcodeproj

# 将模型文件拖放到模拟器或设备的iLLaMA文件夹

8. 模型评估与测试

使用 lm_evaluation_harness 进行模型评估：

# 基础评估
python3 torchchat.py eval llama3.1 --dtype fp32 --limit 5

# 量化模型评估
python3 torchchat.py eval llama3.1 --pte-path llama3.1.pte --limit 5

9. 故障排除与优化

常见问题解决

问题	解决方案
模型访问权限	通过Hugging Face申请模型访问
ExecuTorch安装失败	卸载其他PyTorch版本：`brew uninstall pytorch`
证书验证失败	`pip install --upgrade certifi`

性能优化建议

CUDA加速：导出时添加 --quantize torchchat/quant_config/cuda.json
内存优化：使用适当的量化配置减少内存占用
线程优化：设置 OMP_NUM_THREADS 环境变量

10. 项目架构与设计原则

mermaid

设计原则

原生PyTorch：核心功能基于PyTorch实现
简单可扩展：易于理解和扩展的模块化设计
正确性优先：经过充分测试的高质量组件

总结与展望

PyTorch TorchChat 为开发者提供了完整的本地LLM解决方案，从模型下载到移动端部署，覆盖了全链路需求。通过本教程，你已经掌握了：

✅ 环境搭建和模型管理
✅ 多种交互方式的使用
✅ 性能优化和量化技术
✅ 移动端部署实战
✅ 故障排除和性能调优

未来TorchChat将继续支持更多模型和执行模式，包括：

🚧 torch.compile JIT编译优化
🚧 更多多模态模型支持
🚧 增强的量化方案

开始你的本地LLM之旅吧！如果在使用过程中遇到任何问题，欢迎查阅项目文档或加入社区讨论。

支持我们：如果本文对你有帮助，请点赞、收藏、关注，我们下期将深入探讨TorchChat的高级特性和定制化开发！

【免费下载链接】torchchat Run PyTorch LLMs locally on servers, desktop and mobile 项目地址: https://gitcode.com/GitHub_Trending/to/torchchat

创作声明：本文部分内容由AI辅助生成（AIGC），仅供参考