Calling a local ModelScope ChatGLM3 API with FastAPI

RuiyChen

Last modified 2024-01-16 08:24:47


License: CC 4.0 BY-SA


First published 2023-12-11 18:03:32

Article link: https://blog.youkuaiyun.com/Aorg1/article/details/134932856


This post covers the CUDA initialization error and the DataParallel problem encountered while building an API with FastAPI, along with the fix for each.


1. Create a .py file (running this code in Jupyter raises errors)

Install whichever packages your environment is missing.
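A quick way to see which packages are missing before starting the server is to ask Python whether each one can be found. The sketch below is only an illustration: the helper name missing_packages and the REQUIRED list are assumptions based on the imports used in the server script in this post.

```python
import importlib.util

# Top-level modules imported by the server script in this post.
REQUIRED = ["fastapi", "pydantic", "uvicorn", "torch", "modelscope"]


def missing_packages(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Note that being importable does not guarantee a compatible version; it only rules out the simplest failure.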

First make sure the model loads without errors in your current environment.

import os

# Choose the visible GPUs before torch is imported, so CUDA never initializes with another setting.
os.environ['CUDA_VISIBLE_DEVICES'] = "1,0"

import torch
import uvicorn
from fastapi import FastAPI
from modelscope import AutoTokenizer, AutoModel, snapshot_download
from pydantic import BaseModel

app = FastAPI()


class Query(BaseModel):
    # Request body: {"text": "<prompt>"}
    text: str


# Download the model (or reuse the cached copy), then load tokenizer and model from the same directory.
model_dir = snapshot_download("ZhipuAI/chatglm3-6b-32k", revision="v1.0.0")
tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
model = AutoModel.from_pretrained(model_dir, trust_remote_code=True).half().cuda()
model = torch.nn.DataParallel(model, device_ids=[0, 1])
# DataParallel does not expose generate(), so unwrap it to get the original model back.
if isinstance(model, torch.nn.DataParallel):
    model = model.module


@app.post("/chat/")
async def chat(query: Query):
    input_ids = tokenizer([query.text]).input_ids
    output_ids = model.generate(
        torch.as_tensor(input_ids).cuda(),
        do_sample=False,  # greedy decoding
        temperature=0.1,  # has no effect while do_sample=False
        repetition_penalty=1.0,
        max_new_tokens=1024)
    # generate() returns the prompt tokens followed by the new ones; keep only the new ones.
    output_ids = output_ids[0][len(input_ids[0]):]
    outputs = tokenizer.decode(output_ids, skip_special_tokens=True, spaces_between_special_tokens=False)
    return {"result": outputs}


if __name__ == "__main__":
    # 0.0.0.0 listens on all interfaces, so other machines can reach port 8060.
    uvicorn.run(app, host="0.0.0.0", port=8060)

Run the server file above.
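One step in the handler is easy to miss: for a decoder-only model such as ChatGLM3, generate() returns the prompt token ids followed by the newly generated ones, so the handler slices the prompt off before decoding. A minimal sketch of that slicing with plain lists (the token ids are made up and strip_prompt is a hypothetical helper, not part of any library):

```python
def strip_prompt(output_ids, prompt_len):
    """Drop the first prompt_len token ids and keep only the newly generated ones."""
    return output_ids[prompt_len:]


# Made-up token ids, for illustration only.
prompt_ids = [101, 102, 103]
generated = prompt_ids + [7, 8, 9]  # prompt echoed back, then the new tokens
new_tokens = strip_prompt(generated, len(prompt_ids))
print(new_tokens)  # [7, 8, 9]
```

Without this step the decoded reply would start with a copy of the user's own prompt.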

2. Test the API: create a new .py file

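import requests

# The server binds to 0.0.0.0 (all interfaces); the client needs a concrete address.
# Replace 127.0.0.1 with the server's IP when calling from another machine.
url = "http://127.0.0.1:8060/chat/"
query = {'text': "hi"}
response = requests.post(url, json=query)
if response.status_code == 200:
    res = response.json()
    print("Chatglm3:", res["result"])
else:
    print("error:", response.status_code)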

Run it. If a reply comes back, the API call succeeded!
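Loading a 6B model just to check the client logic is slow, so it can help to try the request/response shape against a stub first. The sketch below is an assumption-laden illustration, not the author's setup: it uses only the standard library (urllib instead of requests), and StubChatHandler is a hypothetical stand-in that echoes the text back in the same {"result": ...} shape as the real endpoint.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class StubChatHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the /chat/ endpoint: echoes the text back."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"result": "echo: " + payload["text"]}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example output quiet


def post_chat(url, text, timeout=5.0):
    """POST {"text": ...} as JSON and return the "result" field of the reply."""
    data = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}, method="POST"
    )
    # An empty ProxyHandler bypasses any HTTP proxy configured in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=timeout) as response:
        return json.loads(response.read())["result"]


# Port 0 asks the OS for any free port, so this never collides with 8060.
server = HTTPServer(("127.0.0.1", 0), StubChatHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]
reply = post_chat(f"http://127.0.0.1:{port}/chat/", "hi")
print("Stub:", reply)
server.shutdown()
server.server_close()
```

Once this works, pointing post_chat at the real server on port 8060 exercises the same client code against the model.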

Check the firewall and make sure port 8060 is allowed, or switch to a port that is allowed on your machine.

Check the firewall status: ufw status
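The firewall rule list says what is allowed, but not whether the server is actually reachable. A minimal sketch of a reachability check from Python, assuming a plain TCP connect is a good enough probe (is_port_open is a hypothetical helper name):

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    print("port 8060 reachable:", is_port_open("127.0.0.1", 8060))
```

To test the firewall rather than just the server process, run this from the client machine with the server's IP in place of 127.0.0.1.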

Pitfalls:

Setting os.environ['CUDA_VISIBLE_DEVICES'] = "1,0" initially raised this error:

torch.cuda.DeferredCudaCallError: CUDA call failed lazily at initialization

Clearing the GPU selection in bash fixed it:

unset CUDA_VISIBLE_DEVICES
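The post does not say what triggered the error. A commonly reported trigger, and only an assumption here, is that CUDA_VISIBLE_DEVICES is set or changed after PyTorch has already queried the GPUs, or that it names an index the machine does not have. A minimal sketch of a helper (the name set_visible_devices is hypothetical) that validates the indices and sets the variable in one place, meant to be called before torch is imported:

```python
import os


def set_visible_devices(device_ids):
    """Set CUDA_VISIBLE_DEVICES from a list of GPU indices and return the value.

    Hypothetical helper: call it before the first `import torch`.
    It cannot check that the indices exist on the machine.
    """
    if not device_ids:
        raise ValueError("at least one device id is required")
    if any(i < 0 for i in device_ids) or len(set(device_ids)) != len(device_ids):
        raise ValueError(f"invalid device ids: {device_ids}")
    value = ",".join(str(i) for i in device_ids)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


set_visible_devices([1, 0])
# import torch  # import only after the variable is set
```

Inside the process the visible GPUs are renumbered from 0 in the order listed, so with "1,0" the name cuda:0 refers to physical GPU 1.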

Parallelism error:

AttributeError: 'DataParallel' object has no attribute 'xxxx'

Add the following right after wrapping the model in DataParallel:

if isinstance(model, torch.nn.DataParallel):
    model = model.module
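The reason this works is that DataParallel keeps the original model in its .module attribute and does not forward custom methods such as generate(). The sketch below mimics that with stand-in classes so it runs without torch; Inner, Wrapper, and unwrap are hypothetical names used only for illustration.

```python
class Inner:
    """Stand-in for the real model; only generate() matters here."""

    def generate(self, prompt):
        return prompt + " ..."


class Wrapper:
    """Stand-in for torch.nn.DataParallel: keeps the wrapped model in .module."""

    def __init__(self, module):
        self.module = module


def unwrap(model):
    """Return model.module if model is a Wrapper, else return model unchanged."""
    return model.module if isinstance(model, Wrapper) else model


wrapped = Wrapper(Inner())
print(hasattr(wrapped, "generate"))          # False: the wrapper does not forward it
print(hasattr(unwrap(wrapped), "generate"))  # True: the original model has it
```

Keep in mind that after unwrapping, calls go straight to the original model on the device it was moved to, so the DataParallel wrapper no longer splits work across GPUs.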
