WhisperX-FastAPI Open Source Project Best Practices

whisperX-FastAPI: a FastAPI service on top of WhisperX. Project repository: https://gitcode.com/gh_mirrors/wh/whisperX-FastAPI

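1. Project Overview

WhisperX-FastAPI is a REST API service built on WhisperX using the FastAPI framework. It exposes services for processing audio and video files, including transcription, alignment, speech recognition, and timestamp matching. The project supports multiple languages and Whisper models: users upload an audio or video file through the API and receive a text transcript along with other audio analysis results.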

2. Quick Start

Environment Setup

Make sure Python and pip are installed in your development environment. The project requires an NVIDIA GPU with CUDA 12.8+ support.
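
The prerequisites above can be checked before installing anything. The following is a minimal sketch, not part of the project: the helper name check_environment is hypothetical, and it only tests whether pip is importable and whether the nvidia-smi tool is on PATH. The latter hints at an installed NVIDIA driver but does not verify the CUDA version.

```python
import importlib.util
import shutil
import sys


def check_environment() -> dict:
    """Report which prerequisites appear to be present (hypothetical helper)."""
    return {
        # Version of the interpreter running this script, as "major.minor.micro".
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        # True if the pip package can be imported by this interpreter.
        "pip": importlib.util.find_spec("pip") is not None,
        # True if the NVIDIA driver utility is on PATH; this does not
        # confirm the CUDA version, which has to be checked separately.
        "nvidia_smi": shutil.which("nvidia-smi") is not None,
    }


for name, value in check_environment().items():
    print(f"{name}: {value}")
```

If nvidia_smi is reported as False, the GPU driver is probably missing or not on PATH, and the service is unlikely to start with GPU support.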

Clone the Project

git clone https://github.com/pavelzbornik/whisperX-FastAPI.git
cd whisperX-FastAPI

Create a Virtual Environment and Install Dependencies

python -m venv venv
source venv/bin/activate  # on Windows, use `venv\Scripts\activate`
pip install -r requirements/dev.txt

Configure Environment Variables

Create a .env file in the project root and set the following environment variables:

HF_TOKEN=<<YOUR HUGGINGFACE TOKEN>>
WHISPER_MODEL=<<WHISPER MODEL SIZE>>
LOG_LEVEL=<<LOG LEVEL>>
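
A forgotten placeholder in the .env file is an easy mistake to make. The sketch below shows one way to catch it before starting the server. It is an illustration only: the helper names (parse_env_text, missing_keys, check_env_file) are hypothetical, and the parser handles only simple KEY=VALUE lines rather than the full syntax that dotenv libraries accept.

```python
from pathlib import Path

# Variable names taken from the .env example above.
REQUIRED_KEYS = ("HF_TOKEN", "WHISPER_MODEL", "LOG_LEVEL")


def parse_env_text(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(values: dict) -> list:
    """Return required keys that are absent, empty, or still a <<...>> placeholder."""
    missing = []
    for key in REQUIRED_KEYS:
        value = values.get(key, "")
        if not value or value.startswith("<<"):
            missing.append(key)
    return missing


def check_env_file(path: str = ".env") -> list:
    """Read the file at `path` and list the keys that still need a value."""
    return missing_keys(parse_env_text(Path(path).read_text(encoding="utf-8")))
```

Calling check_env_file() from the project root returns an empty list when all three variables have been filled in.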

Run the FastAPI Application

uvicorn app.main:app --reload --log-config uvicorn_log_conf.yaml --log-level $LOG_LEVEL

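Once the application has started, the API is available at http://127.0.0.1:8000. Note that $LOG_LEVEL in the command above is expanded by the shell, so the variable must also be set in the current shell session; the shell does not read the .env file on its own.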
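
To confirm that the server is up and to see which routes it exposes, one option is to read the OpenAPI document that FastAPI serves by default. The sketch below assumes the application keeps FastAPI's default /openapi.json route (an app can disable or rename it) and that the server listens on the address above. The function names list_endpoints and fetch_spec are hypothetical.

```python
import json
import urllib.request


def list_endpoints(spec: dict) -> list:
    """Return sorted "METHOD path" strings from an OpenAPI document."""
    http_methods = {"get", "post", "put", "patch", "delete", "head", "options"}
    endpoints = []
    for path, operations in spec.get("paths", {}).items():
        for method in operations:
            # Path items may also hold non-method keys such as "parameters".
            if method.lower() in http_methods:
                endpoints.append(f"{method.upper()} {path}")
    return sorted(endpoints)


def fetch_spec(base_url: str = "http://127.0.0.1:8000", timeout: float = 5.0) -> dict:
    """Download the OpenAPI document; assumes the default /openapi.json route."""
    with urllib.request.urlopen(f"{base_url}/openapi.json", timeout=timeout) as resp:
        return json.load(resp)
```

With the server running, list_endpoints(fetch_spec()) returns the available routes. A connection error at this point usually means the server did not start or is bound to a different host or port.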

3. Use Cases and Best Practices

Transcribing an Audio File

Use the /speech-to-text endpoint to upload an audio file and get the transcribed text.

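import requests

url = "http://127.0.0.1:8000/speech-to-text"

# Open the file in a context manager so the handle is closed after the upload.
with open("path_to_your_audio_file", "rb") as audio:
    response = requests.post(url, files={"file": audio})

response.raise_for_status()  # fail loudly on a 4xx/5xx response
print(response.json())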
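
When uploading both audio and video files, it can help to send an explicit filename and content type with the multipart field. The sketch below builds the (filename, file object, content type) tuple that the requests library accepts as a value in its files argument. The helper names are hypothetical, and whether the service inspects the content type is an assumption that this article does not confirm.

```python
import mimetypes
from pathlib import Path


def guess_content_type(path: str) -> str:
    """Guess a MIME type from the file extension, with a generic fallback."""
    content_type, _encoding = mimetypes.guess_type(path)
    return content_type or "application/octet-stream"


def upload_field(path: str, handle) -> tuple:
    """Build the (filename, file object, content type) tuple that requests accepts."""
    return (Path(path).name, handle, guess_content_type(path))
```

Inside the with block of the upload example, the call would then become requests.post(url, files={"file": upload_field(path, audio)}), where path is the location of the file that was opened as audio.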