TurboTransformers Tutorial

Project: TurboTransformers, a fast and user-friendly runtime for transformer inference (Bert, Albert, GPT2, Decoders, etc.) on CPU and GPU. Repository: https://gitcode.com/gh_mirrors/tu/TurboTransformers

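1. Project Overview

TurboTransformers is a fast and user-friendly Transformer inference runtime open-sourced by Tencent. It runs Transformer model inference efficiently on both CPU and GPU. Its main features are:

Support for multiple Transformer models: BERT, Albert, GPT2, Decoders, and others.
Variable-length input: no preprocessing is required, and batch size and sequence length can change at run time.
High performance: it performs well on both CPU and GPU.
Ease of use: it offers Python and C++ APIs and supports smart batching, which reduces the overhead of zero padding.
Use as a PyTorch plugin: adding a few lines of Python code is enough to get end-to-end acceleration.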
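
The zero-padding point in the list above is easiest to see with numbers. The sketch below is not TurboTransformers code. It is a plain-Python illustration, using hypothetical helper names (`padded_tokens`, `bucket_by_length`) and made-up request lengths, of how grouping requests of similar length shrinks the padding a batch needs.

```python
from typing import List


def padded_tokens(lengths: List[int]) -> int:
    """Token slots processed when every sequence is padded to the longest one."""
    return max(lengths) * len(lengths) if lengths else 0


def bucket_by_length(lengths: List[int], batch_size: int) -> List[List[int]]:
    """Sort by length, then cut into consecutive batches of at most batch_size."""
    ordered = sorted(lengths)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


# Hypothetical request lengths (in tokens) arriving at a service.
lengths = [5, 120, 7, 118, 6, 125]

real = sum(lengths)
one_batch = padded_tokens(lengths)
bucketed = sum(padded_tokens(b) for b in bucket_by_length(lengths, batch_size=3))

print(f"real tokens:         {real}")
print(f"single padded batch: {one_batch}")
print(f"two length buckets:  {bucketed}")
```

With these lengths, one padded batch processes 750 token slots for 381 real tokens, while two length-sorted buckets need only 396. The batching strategy TurboTransformers actually uses may differ; this only shows why padding overhead matters.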

2. Quick Start

2.1 Installation

2.1.1 Installing the CPU Version

git clone https://github.com/Tencent/TurboTransformers --recursive
cd TurboTransformers
sh tools/build_docker_cpu.sh
docker run -it --rm --name=turbort -v $PWD:/workspace your_image_name /bin/bash

2.1.2 Installing the GPU Version

git clone https://github.com/Tencent/TurboTransformers --recursive
cd TurboTransformers
sh tools/build_docker_gpu.sh
nvidia-docker run --gpus all --net=host --rm -it -v $PWD:/workspace -v /etc/passwd:/etc/passwd --name=your_container_name REPOSITORY:TAG
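
Once inside either container, it can help to confirm that the Python packages used in the rest of this tutorial are importable before running anything heavier. The following is a minimal sketch using only the standard library; `missing_packages` is a hypothetical helper name, not part of TurboTransformers.

```python
import importlib.util
from typing import Iterable, List


def missing_packages(names: Iterable[str]) -> List[str]:
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # The three packages imported by the usage example in this tutorial.
    missing = missing_packages(["torch", "transformers", "turbo_transformers"])
    print("missing:", missing or "none")
```

If `turbo_transformers` is reported missing, the library has probably not been built or installed inside the container yet; the repository README describes the build steps.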

2.2 Usage Example

The following Python example shows how to run BERT inference with TurboTransformers and, for comparison, with the original PyTorch model:

import torch
import transformers
import turbo_transformers

if __name__ == "__main__":
    # Use the same number of CPU threads for both runtimes.
    turbo_transformers.set_num_threads(4)
    torch.set_num_threads(4)

    # Load the pretrained Hugging Face model and switch it to inference mode.
    model_id = "bert-base-uncased"
    model = transformers.BertModel.from_pretrained(model_id)
    model.eval()

    # A batch of two sequences with four token ids each.
    input_ids = torch.tensor(([12166, 10699, 16752, 4454], [5342, 16471, 817, 16022]), dtype=torch.long)
    position_ids = torch.tensor(([1, 0, 0, 0], [1, 1, 1, 0]), dtype=torch.long)
    segment_ids = torch.tensor(([1, 1, 1, 0], [1, 0, 0, 0]), dtype=torch.long)

    # Reference result from PyTorch; gradients are not needed for inference.
    torch.set_grad_enabled(False)
    torch_res = model(input_ids, position_ids=position_ids, token_type_ids=segment_ids)
    torch_sequence_output = torch_res[0][:, 0, :]  # hidden state of the first token of each sequence

    # Convert the PyTorch model, then run the same inputs through TurboTransformers.
    tt_model = turbo_transformers.BertModel.from_torch(model)
    res = tt_model(input_ids, position_ids=position_ids, token_type_ids=segment_ids)
    tt_sequence_output = res[0]
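
The example above produces outputs from both runtimes but does not time them. One rough way to compare them is a small wall-clock helper like the sketch below. `mean_latency` is a hypothetical name, and the warm-up and iteration counts are arbitrary defaults rather than anything the project prescribes.

```python
import time
from typing import Callable


def mean_latency(fn: Callable[[], object], warmup: int = 3, iters: int = 10) -> float:
    """Average wall-clock seconds per call of fn, measured after warm-up calls."""
    for _ in range(warmup):
        fn()  # warm-up runs are not timed
    start = time.perf_counter()
    for _ in range(iters):
        fn()
    return (time.perf_counter() - start) / iters


# Inside the BERT example, with its variables in scope, one could then write:
#   kwargs = dict(position_ids=position_ids, token_type_ids=segment_ids)
#   torch_time = mean_latency(lambda: model(input_ids, **kwargs))
#   turbo_time = mean_latency(lambda: tt_model(input_ids, **kwargs))
#   print(f"speedup: {torch_time / turbo_time:.2f}x")
```

Results from such a micro-benchmark depend heavily on hardware, thread settings, and input shapes, so they are only indicative.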

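3. Use Cases and Best Practices

3.1 WeChat FAQ Service

In the WeChat FAQ service, TurboTransformers delivered a 1.88x speedup.

3.2 Public Cloud Sentiment Analysis Service

In a public cloud sentiment analysis service, TurboTransformers delivered a 2.11x speedup.

3.3 QQ Recommendation System

In the QQ recommendation system, TurboTransformers delivered a 13.6x speedup.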
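
To put these factors in perspective, a speedup can be converted into the share of time saved per request. The short sketch below assumes the reported figures are ratios of old latency to new latency, which the source does not state explicitly; `latency_reduction` is a hypothetical helper name.

```python
def latency_reduction(speedup: float) -> float:
    """Fraction of latency removed, assuming new_latency = old_latency / speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 - 1.0 / speedup


cases = [("WeChat FAQ", 1.88), ("sentiment analysis", 2.11), ("QQ recommendation", 13.6)]
for name, speedup in cases:
    print(f"{name}: {speedup}x -> {latency_reduction(speedup):.1%} less time per request")
```

Under that assumption, the three cases correspond to roughly 46.8%, 52.6%, and 92.6% less time per request.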

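4. Related Ecosystem Projects

4.1 Hugging Face Transformers

TurboTransformers integrates with the Hugging Face Transformers library: a model loaded with Transformers can be converted and then served through the faster runtime.

4.2 PyTorch

Used as a PyTorch plugin, TurboTransformers can significantly improve inference performance without changing the structure of existing code.

4.3 ONNX Runtime

TurboTransformers can be combined with ONNX Runtime for more efficient model inference on CPU and GPU.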
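
One way to keep existing code unchanged is to make the acceleration optional. The sketch below is an illustrative pattern, not an official API: `maybe_accelerate` and its `converter` parameter are hypothetical, and the only TurboTransformers call it relies on is `turbo_transformers.BertModel.from_torch`, which appears in the usage example in section 2.2.

```python
from typing import Any, Callable, Optional


def maybe_accelerate(model: Any, converter: Optional[Callable[[Any], Any]] = None) -> Any:
    """Return a converted model, or the original model if conversion is unavailable."""
    if converter is None:
        try:
            import turbo_transformers  # optional dependency
        except ImportError:
            return model  # fall back to plain PyTorch inference
        # Assumes model is a Hugging Face BertModel, as in the usage example.
        converter = turbo_transformers.BertModel.from_torch
    return converter(model)
```

Calling code can then use the returned object in place of the original model, so the same script runs whether or not TurboTransformers is installed.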
