1. Activate Alibaba Cloud PAI
https://pai.console.aliyun.com/#/quick-start/models
Question:
What is the difference between BladeLLM and vLLM?
Choose standard deployment.
Service script:
{
  "cloud": {
    "computing": {
      "instances": [
        {
          "type": "ecs.gn7i-c16g1.4xlarge"
        }
      ]
    }
  },
  "metadata": {
    "instance": 1,
    "rpc": {
      "keepalive": 9000000,
      "worker_threads": 1
    },
    "enable_webservice": true,
    "name": "quickstart_20250205_sfmn"
  },
  "containers": [
    {
      "image": "eas-registry-vpc.cn-hangzhou.cr.aliyuncs.com/pai-eas/chat-llm-webui:3.0.6",
      "port": 8000,
      "script": "python webui/webui_server.py --model-path=/model_dir/ --model-type=qwen2"
    }
  ],
  "storage": [
    {
      "mount_path": "/model_dir/",
      "properties": {
        "resource_type": "model",
        "resource_use": "base"
      },
      "oss": {
        "path": "oss://pai-quickstart-cn-hangzhou/modelscope/models/DeepSeek-R1-Distill-Qwen-7B/",
        "endpoint": "oss-cn-hangzhou-internal.aliyuncs.com"
      }
    }
  ],
  "SupportedInstanceTypes": [
    "ecs.gn7i-c16g1.4xlarge",
    "ecs.gn7i-c32g1.16xlarge",
    "ecs.gn7i-c32g1.32xlarge",
    "ecs.gn7i-c32g1.8xlarge",
    "ecs.gn7i-c8g1.2xlarge",
    "ecs.gn7i-c8g1.2xlarge.limit",
    "ecs.gn8is-2x.8xlarge",
    "ecs.gn8is-4x.16xlarge",
    "ecs.gn8is-8x.32xlarge",
    "ecs.gn8is.2xlarge",
    "ecs.gn8is.4xlarge",
    "ecs.gn8v.6xlarge",
    "ecs.gn8v-2x.12xlarge",
    "ecs.gn8v-4x.24xlarge",
    "ecs.gn8v-8x.48xlarge",
    "ml.gu7i.c128m752.4-gu30",
    "ml.gu7i.c16m60.1-gu30",
    "ml.gu7i.c32m188.1-gu30",
    "ml.gu7i.c64m376.2-gu30",
    "ml.gu7i.c8m30.1-gu30",
    "ml.gu8is.c128m1024.8-gu60",
    "ml.gu8is.c16m128.1-gu60",
    "ml.gu8is.c32m256.2-gu60",
    "ml.gu8is.c64m512.4-gu60",
    "ml.gu8v.c192m1024.8-gu120",
    "ml.gu8v.c24m128.1-gu120",
    "ml.gu8v.c48m256.2-gu120",
    "ml.gu8v.c96m512.4-gu120"
  ]
}
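Before changing the instance type in the service script, it may help to check locally that the new value appears in SupportedInstanceTypes. The sketch below is only a local sanity check written for these notes: the helper name check_instance_types is hypothetical, the JSON is a trimmed copy of the script above, and whether PAI itself enforces this list at deploy time is not confirmed here.

```python
import json


def check_instance_types(config: dict) -> list[str]:
    """Return requested instance types that are missing from SupportedInstanceTypes.

    Hypothetical helper for these notes; it is not part of any PAI SDK.
    """
    supported = set(config.get("SupportedInstanceTypes", []))
    instances = config.get("cloud", {}).get("computing", {}).get("instances", [])
    requested = [inst["type"] for inst in instances]
    return [t for t in requested if t not in supported]


# Trimmed copy of the service script (only the fields this check reads).
service_json = """
{
  "cloud": {"computing": {"instances": [{"type": "ecs.gn7i-c16g1.4xlarge"}]}},
  "SupportedInstanceTypes": ["ecs.gn7i-c16g1.4xlarge", "ecs.gn7i-c8g1.2xlarge"]
}
"""

config = json.loads(service_json)
unsupported = check_instance_types(config)
print("unsupported instance types:", unsupported)
```

An empty list means every requested type is in the supported list; any entries returned are the ones to change before deploying.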
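Monitoring:
Stream output: with streaming enabled, new content is displayed incrementally as it is generated, instead of making the end user wait ten-plus seconds for the whole response to appear at once.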
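The difference between streamed and buffered output can be illustrated with a small sketch. It does not call the deployed service: fake_token_stream is a hypothetical stand-in for a model that produces one token at a time, and the two render functions are illustrative names, not part of the chat-llm-webui image.

```python
import time
from typing import Iterator


def fake_token_stream(text: str, delay: float = 0.0) -> Iterator[str]:
    """Hypothetical stand-in for a model that yields one token at a time."""
    for token in text.split(" "):
        time.sleep(delay)  # simulated per-token generation time
        yield token + " "


def render_streaming(tokens: Iterator[str]) -> str:
    """Print each token as soon as it arrives and return the full text."""
    parts = []
    for token in tokens:
        print(token, end="", flush=True)
        parts.append(token)
    print()
    return "".join(parts)


def render_buffered(tokens: Iterator[str]) -> str:
    """Wait for every token, then print the whole answer at once."""
    full = "".join(tokens)
    print(full)
    return full


answer = "Streaming shows partial results early"
streamed = render_streaming(fake_token_stream(answer))
buffered = render_buffered(fake_token_stream(answer))
```

Both paths produce the same final text. With a non-zero delay, the streaming path shows the first word after one delay interval, while the buffered path shows nothing until all tokens have been generated.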