【语音识别】搭建本地的语音转文字系统：FunASR（离线不联网即可使用）

张学徒

已于 2024-11-30 14:18:20 修改

阅读量1.3w

点赞数 10

分类专栏： AI 文章标签：语音识别人工智能

于 2024-04-24 21:34:46 首次发布

本文链接：https://blog.youkuaiyun.com/qq_37280924/article/details/138170217

版权

AI 专栏收录该内容

1 篇文章

订阅专栏

本文详细介绍了如何在阿里达摩院的FunASR平台上部署服务端，包括安装Docker、拉取镜像、配置模型和运行服务，以及客户端的下载、安装和使用方法。重点涉及Python脚本和不同环境下的FFmpeg集成。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

参考自：

参考配置：FunASR/runtime/docs/SDK_advanced_guide_offline_zh.md at main · alibaba-damo-academy/FunASR (github.com)
参考配置：FunASR/runtime/quick_start_zh.md at 861147c7308b91068ffa02724fdf74ee623a909e · alibaba-damo-academy/FunASR (github.com)
参考运行命令：FunASR/runtime/python/websocket/README.md at 861147c7308b91068ffa02724fdf74ee623a909e · alibaba-damo-academy/FunASR (github.com)
便捷部署教程：FunASR/blob/main/runtime/docs/SDK_tutorial_en_zh.md

阿里达摩院

服务端

安装 Docker

（过程省略）

下面步骤如果是在 Linux 需要以管理员方式执行命令，开头添加 sudo

docker 拉取镜像

docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.4.4

创建模型目录

mkdir -p ./funasr-runtime-resources/models

运行 docker 镜像

docker run -p 10095:10095 -it --privileged=true -v $PWD/funasr-runtime-resources/models:/workspace/models registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-cpu-0.4.4

启动服务

cd FunASR/runtime

nohup bash run_server.sh \
  --download-model-dir /workspace/models \
  --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
  --model-dir damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx  \
  --punc-dir damo/punc_ct-transformer_cn-en-common-vocab471067-large-onnx \
  --lm-dir damo/speech_ngram_lm_zh-cn-ai-wesp-fst \
  --itn-dir thuduj12/fst_itn_zh \
  --hotword /workspace/models/hotwords.txt > log.txt 2>&1 &

# 如果您想关闭ssl，增加参数：--certfile 0
# 如果您想使用时间戳或者nn热词模型进行部署，请设置--model-dir为对应模型：
#   damo/speech_paraformer-large-vad-punc_asr_nat-zh-cn-16k-common-vocab8404-onnx（时间戳）
#   damo/speech_paraformer-large-contextual_asr_nat-zh-cn-16k-common-vocab8404-onnx（nn热词）
# 如果您想在服务端加载热词，请在宿主机文件./funasr-runtime-resources/models/hotwords.txt配置热词（docker映射地址为/workspace/models/hotwords.txt）:
#   每行一个热词，格式(热词 权重)：阿里巴巴 20（注：热词理论上无限制，但为了兼顾性能和效果，建议热词长度不超过10，个数不超过1k，权重1~100）

客户端

下载客户端测试工具

wget https://isv-data.oss-cn-hangzhou.aliyuncs.com/ics/MaaS/ASR/sample/funasr_samples.tar.gz

解压上面链接下载的文件。比如我解压到目录 C:\Users\z\Documents\FunASR

解压所在目录下的 funasr_samples\samples 目录为不同类型的语言相关的使用文件

但是从使用上来讲，我直接 git clone https://github.com/alibaba/FunASR.git 下载源码到里面的 runtime 目录下，使用体验会更好

安装 FFMpeg

apt-get install -y ffmpeg  # ubuntu
# yum install -y ffmpeg    # centos
# brew install ffmpeg      # mac
# winget install ffmpeg    # wins

HTML

解压进入目录：C:\Users\z\Documents\FunASR\funasr_samples\samples\html\static

打开 index.html 使用网页的形式进行操作

文档：https://github.com/modelscope/FunASR/blob/861147c7308b91068ffa02724fdf74ee623a909e/runtime/html5/readme_zh.md

Python

下载 python

https://www.python.org/ftp/python/3.11.8/python-3.11.8-amd64.exe

pip 安装依赖库

pip install -U modelscope funasr -i https://mirror.sjtu.edu.cn/pypi/web/simple
pip install -U websockets pyaudio ffmpeg-python -i https://mirror.sjtu.edu.cn/pypi/web/simple

运行客户端

# 这个目录取决于上面你解压的文件所在的目录
cd C:\Users\z\Documents\FunASR\runtime\python\websocket

# 识别本地文件
python funasr_wss_client.py --host "127.0.0.1" --port 10095 --mode offline --audio_in "C:\Users\z\Videos\02d0b6703d9b5d6bc05a46548a938826_new.mp3"