Deploy in 10 Minutes: A Hands-On Guide to Building a FunASR Speech Recognition System from Scratch

FunASR: A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models. Project repository: https://gitcode.com/gh_mirrors/fu/FunASR

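Tired of wrestling with complicated speech recognition deployments? Want an accurate speech-to-text tool up and running quickly? This guide walks you through four steps to build your own speech recognition system on FunASR in about 10 minutes. No deep technical background is required: copying and pasting the commands is enough.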

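By the end of this guide you will be able to:

Deploy a speech recognition service quickly with Docker
Transcribe audio files with the Python SDK
Choose a pretrained model that fits your scenario
Handle both real-time audio streams and offline audio files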

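About FunASR

FunASR (A Fundamental End-to-End Speech Recognition Toolkit) is an open-source, end-to-end speech recognition toolkit that ships pretrained models and deployment tools. It covers several speech processing tasks, including automatic speech recognition (ASR), voice activity detection (VAD), punctuation restoration, and speaker recognition.

Official documentation: docs/tutorial/Tables_zh.md. Project README: README_zh.md (both paths are relative to the repository root).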

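Step 1: Prepare the Environment

Install Docker (skip if already installed)

Docker is an open-source container platform for packaging, distributing, and running applications. This guide uses it to deploy the FunASR services. The commands below download and run an installation script: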

curl -O https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/shell/install_docker.sh;
sudo bash install_docker.sh

Clone the repository

git clone https://gitcode.com/gh_mirrors/fu/FunASR
cd FunASR

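Step 2: Choose a Model

FunASR provides pretrained models for different scenarios. Commonly used speech recognition models include:

Model name	Language	Parameters	Notes
speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx	Chinese and English	220M	High-accuracy, non-streaming recognition
speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx	Chinese and English	220M	High-accuracy, real-time streaming recognition
speech_UniASR_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-online	Chinese and English	100M	Unified streaming and offline model
speech_paraformer-tiny-commandword_asr_nat-zh-cn-16k-vocab544-pytorch	Chinese	5.2M	Lightweight command-word recognition

For a first deployment, speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx is the recommended choice: it performs well in general-purpose scenarios and has high recognition accuracy. The deployment commands in Step 3 refer to these models with the damo/ prefix.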
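The table can be encoded as a small lookup so that a deployment script picks the model ID for a given scenario. This is only a minimal sketch: the scenario keys (`offline`, `streaming`, `unified`, `command`) and the `pick_model` helper are hypothetical names invented for illustration, and only the model IDs themselves come from the table above.

```python
# Scenario keys are hypothetical labels; the model IDs are copied from the table.
MODELS = {
    "offline": "speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx",
    "streaming": "speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx",
    "unified": "speech_UniASR_asr_2pass-zh-cn-16k-common-vocab8358-tensorflow1-online",
    "command": "speech_paraformer-tiny-commandword_asr_nat-zh-cn-16k-vocab544-pytorch",
}


def pick_model(scenario: str = "offline") -> str:
    """Return the model ID for a scenario; the default is the recommended offline model."""
    try:
        return MODELS[scenario]
    except KeyError:
        raise ValueError(
            f"unknown scenario {scenario!r}; choose from {sorted(MODELS)}"
        ) from None


if __name__ == "__main__":
    # Step 3 passes model IDs with the "damo/" prefix.
    print("damo/" + pick_model("offline"))
```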

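Step 3: Deploy the Speech Recognition Service

Offline file transcription service

Offline file transcription handles pre-recorded audio such as meeting recordings and voice messages.

Pull and start the Docker image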

sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.4.7
mkdir -p ./funasr-runtime-resources/models
sudo docker run -p 10095:10095 -it --privileged=true \
  -v $PWD/funasr-runtime-resources/models:/workspace/models \
  registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.4.7

Start the server

Inside the Docker container, run:

cd FunASR/runtime
nohup bash run_server.sh \
  --download-model-dir /workspace/models \
  --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
  --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx  \
  --punc-dir damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
  --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
  --itn-dir thuduj12/fst_itn_zh \
  --hotword /workspace/models/hotwords.txt > log.txt 2>&1 &

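Real-time dictation service

Real-time dictation suits low-latency scenarios such as live meeting transcription and voice input methods. The docker run command below publishes this service on host port 10096 (mapped to container port 10095), so it can run alongside the offline service on host port 10095.

Pull and start the Docker image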

sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.13
mkdir -p ./funasr-runtime-resources/models
sudo docker run -p 10096:10095 -it --privileged=true \
  -v $PWD/funasr-runtime-resources/models:/workspace/models \
  registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.13

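Start the server

Inside the Docker container, run the following. Compared with the offline service, the 2pass script additionally takes a streaming model through --online-model-dir: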

cd FunASR/runtime
nohup bash run_server_2pass.sh \
  --download-model-dir /workspace/models \
  --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
  --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx  \
  --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx  \
  --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \
  --itn-dir thuduj12/fst_itn_zh \
  --hotword /workspace/models/hotwords.txt > log.txt 2>&1 &

Step 4: Test the Speech Recognition Service

Client test

A Python client script is provided for testing the deployed services.

Download the test samples

wget https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sample/funasr_samples.tar.gz
tar zxvf funasr_samples.tar.gz

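Test offline file transcription

Run the client from the directory that contains funasr_wss_client.py. The relative path ../audio/asr_example.wav assumes the audio folder sits next to that directory.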

python3 funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "../audio/asr_example.wav"

Test real-time dictation

Port 10096 is the host port published by the real-time container in Step 3.

python3 funasr_wss_client.py --host "127.0.0.1" --port 10096 --mode 2pass
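When both services are running, it can help to build the client command line in one place so the host, port, and mode stay consistent. The sketch below only assembles the argument list; `client_command` is a hypothetical helper, and it uses just the flags that appear in the two commands above (`--host`, `--port`, `--mode`, `--audio_in`).

```python
import shlex
from typing import List, Optional


def client_command(mode: str, port: int, host: str = "127.0.0.1",
                   audio_in: Optional[str] = None) -> List[str]:
    """Build the argv for funasr_wss_client.py from the flags used in this guide."""
    if mode not in ("offline", "2pass"):
        raise ValueError("this sketch only covers the two modes used in this guide")
    argv = ["python3", "funasr_wss_client.py",
            "--host", host, "--port", str(port), "--mode", mode]
    # --audio_in is optional here because the 2pass command above omits it.
    if audio_in is not None:
        argv += ["--audio_in", audio_in]
    return argv


if __name__ == "__main__":
    offline = client_command("offline", 10095, audio_in="../audio/asr_example.wav")
    realtime = client_command("2pass", 10096)
    # shlex.join renders a copy-pasteable shell command.
    print(shlex.join(offline))
    print(shlex.join(realtime))
```

To execute a command rather than print it, the list can be passed to `subprocess.run(offline, check=True)` from the directory that contains the client script.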

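Python API example

Besides calling the WebSocket service, you can run recognition directly through the Python API (this requires the funasr Python package to be installed):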

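from funasr import AutoModel
from funasr.utils.postprocess_utils import rich_transcription_postprocess

# Load SenseVoiceSmall together with the FSMN VAD model, which splits long audio into segments.
model = AutoModel(
    model="iic/SenseVoiceSmall",
    vad_model="fsmn-vad",
    vad_kwargs={"max_single_segment_time": 30000},  # cap each VAD segment at 30 s (value in ms)
    device="cpu",
)

res = model.generate(
    input="asr_example.wav",
    language="auto",  # or one of "zh", "en", "yue", "ja", "ko", "nospeech"
    use_itn=True,  # include punctuation and inverse text normalization in the output
    batch_size_s=60,  # dynamic batching: total audio duration per batch, in seconds
)
# SenseVoice output embeds special tokens (such as language and emotion markers);
# this helper cleans them up for display.
text = rich_transcription_postprocess(res[0]["text"])
print(text)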

API reference: funasr/auto/__init__.py
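To transcribe a whole folder rather than a single file, the call from the example above can be wrapped in a small loop. The following is a sketch under stated assumptions: `transcribe_dir` and `funasr_recognizer` are hypothetical helper names, and the result is assumed to be a list of dicts with a "text" key, as in the example above. The recognizer is passed in as a plain function so the loop itself does not depend on FunASR.

```python
from pathlib import Path
from typing import Callable, Dict


def transcribe_dir(folder: str, recognize: Callable[[str], str]) -> Dict[str, str]:
    """Run recognize() on every .wav file in folder, in file-name order."""
    results = {}
    for wav in sorted(Path(folder).glob("*.wav")):
        results[wav.name] = recognize(str(wav))
    return results


def funasr_recognizer(device: str = "cpu") -> Callable[[str], str]:
    """Wrap the AutoModel call from the example above as a path -> text function."""
    # Imported lazily so transcribe_dir can be used and tested without FunASR installed.
    from funasr import AutoModel

    model = AutoModel(
        model="iic/SenseVoiceSmall",
        vad_model="fsmn-vad",
        vad_kwargs={"max_single_segment_time": 30000},
        device=device,
    )

    def recognize(path: str) -> str:
        res = model.generate(input=path, language="auto", use_itn=True, batch_size_s=60)
        return res[0]["text"]

    return recognize
```

A typical call would be `transcribe_dir("recordings", funasr_recognizer())`, where `recordings` is a placeholder folder name. The model is loaded once and reused for every file.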

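Troubleshooting

The service fails to start

Check that Docker is installed correctly and that your user has permission to access the Docker daemon (the commands in this guide use sudo and do not require Docker Compose). The server commands redirect their output to log.txt, so look there for the actual error.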
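A quick first check is whether anything is listening on the published ports at all. The sketch below uses only the Python standard library; `port_is_open` is a hypothetical helper name. It tests TCP reachability only. Because Docker may accept connections on a published port as soon as the container is up, treat a positive result as necessary rather than sufficient, and still confirm with the client script and log.txt.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

For the setup in this guide, `port_is_open("127.0.0.1", 10095)` checks the offline service and `port_is_open("127.0.0.1", 10096)` checks the real-time service.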

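Model downloads are slow

Download the model files manually, then point --model-dir at the local model path.

Recognition accuracy is low

Try a larger model, or tailor the hotword file hotwords.txt to your scenario.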
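The hotword file can be generated from a script. The sketch below rests on an assumption that this guide does not confirm: that hotwords.txt holds one entry per line in the form "word weight". Check the FunASR runtime documentation for the exact format before relying on it. `write_hotwords` is a hypothetical helper. Since Step 3 mounts ./funasr-runtime-resources/models at /workspace/models, a file written to that host folder should appear at the path passed to --hotword.

```python
from pathlib import Path
from typing import Dict


def write_hotwords(weights: Dict[str, int], path: str) -> None:
    """Write one 'word weight' pair per line (assumed format), UTF-8 encoded."""
    lines = []
    for word, weight in weights.items():
        word = word.strip()
        if not word:
            raise ValueError("hotwords must not be empty")
        lines.append(f"{word} {int(weight)}\n")
    target = Path(path)
    # Create the models folder if it does not exist yet.
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("".join(lines), encoding="utf-8")


if __name__ == "__main__":
    # Example entries and weights are placeholders, not recommended values.
    write_hotwords({"FunASR": 20}, "./funasr-runtime-resources/models/hotwords.txt")
```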

Supported audio formats

WAV at a 16 kHz sample rate, mono, is supported by default. Convert other formats first.
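Before sending a file, you can verify that it matches this format. The following minimal sketch uses the standard-library `wave` module, which reads uncompressed PCM WAV files only; `check_wav` is a hypothetical helper name.

```python
import wave
from typing import List


def check_wav(path: str, rate: int = 16000, channels: int = 1) -> List[str]:
    """Return a list of mismatches between the file and the expected format."""
    problems = []
    # wave.open raises wave.Error if the file is not a PCM WAV file.
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != rate:
            problems.append(f"sample rate is {wf.getframerate()} Hz, expected {rate}")
        if wf.getnchannels() != channels:
            problems.append(f"{wf.getnchannels()} channel(s), expected {channels}")
    return problems
```

An empty list means the file already matches. Otherwise the messages indicate what to convert, for example with a tool such as ffmpeg.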

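Summary and Next Steps

This guide covered how to deploy FunASR quickly in two service modes: offline file transcription and real-time dictation. From here, you can:

Explore more models: model_zoo/modelscope_models_zh.md
Learn the advanced features: docs/SDK_advanced_guide_offline_zh.md
Try fine-tuning a model: examples/aishell/paraformer
Read about multilingual support: model_zoo/modelscope_models_zh.md

I hope this tutorial helps you get started with speech recognition and add voice interaction to your projects.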


Author's statement: parts of this article were generated with AI assistance (AIGC) and are provided for reference only.