Setting Up NVIDIA Triton Inference Server

腾梦

Tags: python, git, pytorch

Article link: https://blog.youkuaiyun.com/weixin_43815091/article/details/130497400

Documentation: https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/index.html
Source code: https://github.com/triton-inference-server/server
API: https://docs.nvidia.com/deeplearning/triton-inference-server/user-guide/docs/protocol/README.html
Backends (the wrappers that execute models): https://github.com/triton-inference-server/backend
- PyTorch backend: https://github.com/triton-inference-server/pytorch_backend
Containers:
https://catalog.ngc.nvidia.com/orgs/nvidia/containers/pytorch

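Installation

Follow the installation documentation to set up either the CPU or the GPU platform. In the commands below, replace <xx.yy> with the Triton release you want to run (the QuickStart further down uses 22.12). The --gpus flag requires the NVIDIA Container Toolkit on the host.

GPU command: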
docker run --gpus=1 --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/full/path/to/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:<xx.yy>-py3 tritonserver --model-repository=/models

CPU command:
docker run --rm -p8000:8000 -p8001:8001 -p8002:8002 -v/full/path/to/docs/examples/model_repository:/models nvcr.io/nvidia/tritonserver:<xx.yy>-py3 tritonserver --model-repository=/models
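
The two commands above differ only in the `--gpus` flag. As a minimal sketch of that difference, the hypothetical helper below (`triton_docker_command` is not part of Triton or Docker, just an illustration) assembles the same argument list in Python. The repository path and the `22.12` tag passed in at the end are example values.

```python
from typing import List, Optional


def triton_docker_command(model_repo: str, tag: str,
                          gpus: Optional[int] = None) -> List[str]:
    """Build the `docker run` argument list for a Triton server container.

    model_repo: absolute host path of the model repository
    tag:        Triton release, e.g. "22.12"
    gpus:       number of GPUs to expose; None means a CPU-only launch
    """
    cmd = ["docker", "run", "--rm"]
    if gpus is not None:
        cmd.append(f"--gpus={gpus}")
    # 8000 = HTTP, 8001 = gRPC, 8002 = metrics
    for port in (8000, 8001, 8002):
        cmd += ["-p", f"{port}:{port}"]
    cmd += [
        "-v", f"{model_repo}:/models",
        f"nvcr.io/nvidia/tritonserver:{tag}-py3",
        "tritonserver", "--model-repository=/models",
    ]
    return cmd


# Print the GPU variant; pass the list to subprocess.run() to actually launch it.
print(" ".join(triton_docker_command(
    "/full/path/to/docs/examples/model_repository", "22.12", gpus=1)))
```

Building the command as a list rather than a single string avoids shell-quoting problems if the repository path contains spaces.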

QuickStart

CPU-only (the server container is launched without --gpus)

# Step 1: Create the example model repository 
git clone -b r22.12 https://ghproxy.com/https://github.com/triton-inference-server/server.git  # ghproxy.com prefix added as a download proxy; drop it if GitHub is directly reachable
cd server/docs/examples
./fetch_models.sh

# Step 2: Launch triton from the NGC Triton container
docker run --name triton -d -p8000:8000 -p8001:8001 -p8002:8002 -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:22.12-py3 tritonserver --model-repository=/models

# Step 3: Sending an Inference Request 
# In a separate console, launch the image_client example from the NGC Triton SDK container
docker run -it --name triton-client --rm --net=host nvcr.io/nvidia/tritonserver:22.12-py3-sdk

# Step 4: Inference
/workspace/install/bin/image_client -m densenet_onnx -c 3 -s INCEPTION /workspace/images/mug.jpg

# Inference should return the following
Image '/workspace/images/mug.jpg':
    15.346230 (504) = COFFEE MUG
    13.224326 (968) = CUP
    10.422965 (505) = COFFEEPOT
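
If the classification result needs to be consumed by a script, the text printed by image_client can be parsed. The following is a rough sketch that assumes the output keeps the `score (class id) = LABEL` layout shown above. `Prediction` and `parse_image_client_output` are illustrative names, not part of the Triton SDK.

```python
import re
from typing import List, NamedTuple


class Prediction(NamedTuple):
    score: float
    class_id: int
    label: str


# Matches lines such as "    15.346230 (504) = COFFEE MUG"
_LINE = re.compile(r"^\s*([-+]?\d+(?:\.\d+)?)\s+\((\d+)\)\s+=\s+(.+?)\s*$")


def parse_image_client_output(text: str) -> List[Prediction]:
    """Extract (score, class id, label) triples, skipping non-matching lines."""
    predictions = []
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            predictions.append(Prediction(float(match.group(1)),
                                          int(match.group(2)),
                                          match.group(3)))
    return predictions


sample = """Image '/workspace/images/mug.jpg':
    15.346230 (504) = COFFEE MUG
    13.224326 (968) = CUP
    10.422965 (505) = COFFEEPOT
"""
top = parse_image_client_output(sample)[0]
print(top.label)  # prints: COFFEE MUG
```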

Step 2:
![[Pasted image 20230203171629.png]]
Step 3:
![[Pasted image 20230203171643.png]]
![[Pasted image 20230203171702.png]]
Inference:
![[Pasted image 20230203171807.png]]

Notes

![[Pasted image 20230209202207.png]]
- A single-node inference service that can run on a heterogeneous Kubernetes cluster
- Supports models from multiple frameworks
![[Pasted image 20230209202255.png]]
- Supports concurrent inference
- Multi-threaded inference on a single model
  - ![[Pasted image 20230209203611.png]]
- Multi-threaded inference across multiple models
  - ![[Pasted image 20230209203624.png]]
![[Pasted image 20230209215358.png]]
![[Pasted image 20230210092031.png]]
Other model-serving frameworks worth comparing:
- TorchServe
- Ray Serve

https://zhuanlan.zhihu.com/p/598468847
https://github.com/triton-inference-server/client/tree/main/src/python/examples
KServe inference API: https://kserve.github.io/website/modelserving/inference_api/
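
Triton's HTTP endpoint follows the KServe v2 inference protocol linked above, so a request can be built with nothing but the standard library. The sketch below only constructs the request. The model name `my_model`, the tensor name `INPUT0`, and the shape and data are hypothetical and would have to match the model's actual configuration.

```python
import json
import urllib.request
from typing import Any, List


def build_infer_request(base_url: str, model: str, input_name: str,
                        datatype: str, shape: List[int],
                        data: List[Any]) -> urllib.request.Request:
    """Build a KServe v2 `POST /v2/models/<model>/infer` request."""
    body = {
        "inputs": [{
            "name": input_name,    # must match an input in the model config
            "shape": shape,
            "datatype": datatype,  # e.g. "FP32", "INT64"
            "data": data,          # flattened tensor values
        }]
    }
    return urllib.request.Request(
        url=f"{base_url}/v2/models/{model}/infer",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical model and tensor names, for illustration only.
req = build_infer_request("http://localhost:8000", "my_model",
                          "INPUT0", "FP32", [1, 4], [0.1, 0.2, 0.3, 0.4])
print(req.get_method(), req.full_url)

# Against a running server, the request could then be sent with:
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp)["outputs"])
```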
Port assignments
- gRPC: 8001
- HTTP: 8000
- Metrics: 8002
  ![[Pasted image 20230221151448.png]]
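
As a quick way to check that these ports are reachable, the sketch below maps them to the health and metrics URLs and probes the readiness endpoint with the standard library. It assumes the default port mapping from the docker commands in this article. `endpoint_urls` and `is_ready` are illustrative helper names, and the gRPC port is omitted because it does not speak plain HTTP.

```python
import urllib.error
import urllib.request
from typing import Dict

# Default Triton ports, as listed above.
PORTS = {"http": 8000, "grpc": 8001, "metrics": 8002}


def endpoint_urls(host: str = "localhost") -> Dict[str, str]:
    """Return the HTTP health URLs and the metrics URL for a host."""
    return {
        "live": f"http://{host}:{PORTS['http']}/v2/health/live",
        "ready": f"http://{host}:{PORTS['http']}/v2/health/ready",
        "metrics": f"http://{host}:{PORTS['metrics']}/metrics",
    }


def is_ready(host: str = "localhost", timeout: float = 2.0) -> bool:
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(endpoint_urls(host)["ready"],
                                    timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status.
        return False


for name, url in endpoint_urls().items():
    print(f"{name}: {url}")
```

Calling `is_ready()` after starting the container is a simple way to wait for the models to finish loading before sending inference requests.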

![[Pasted image 20230221152230.png]]
https://pypi.org/project/tritonclient/