[Paddle Notes] Setting Up a PaddleSpeech API Speech Server

This article walks through setting up a PaddleSpeech API speech server: installing the environment (Conda, PyTorch, TensorFlow, and the Paddle framework), installing and configuring PaddleSpeech, starting the server and wrapping it in a custom CLI script, calling the service from clients, and resolving the problems hit along the way. By the end of this guide, readers should have the PaddleSpeech speech service up and running.

1. Environment Setup

1.1 Runtime Environment

1.1.1 Conda Virtual Environment

Create the ppai environment:
conda create -n ppai python=3.7
Activate the environment:
conda activate ppai

1.1.2 PyTorch

Recommended versions for Python 3.7:
PyTorch 1.13.1 + torchvision 0.14.1 + torchaudio 0.13.1 + CUDA 11.7
conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia
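As an optional sanity check (not part of the PaddleSpeech setup itself), you can confirm that PyTorch is installed and can see the GPU with a one-liner:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"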

If you run into installation problems, see here.

1.1.3 TensorFlow

Recommended version for Python 3.7:
tensorflow-gpu 1.15
pip install tensorflow-gpu==1.15 -i https://pypi.tuna.tsinghua.edu.cn/simple
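Likewise, a quick way to confirm the TensorFlow install (assuming the tensorflow-gpu 1.15 API) is:

python -c "import tensorflow as tf; print(tf.__version__, tf.test.is_gpu_available())"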

If you run into installation problems, see here.

1.2 Paddle Core Framework

1.2.1 Installing the Paddle Framework

  • CPU version
    pip install paddlepaddle
  • GPU version
    See the PaddlePaddle official website here for the install command that matches your setup.
  • Install the build that matches your GPU and CUDA version:
    python -m pip install paddlepaddle-gpu==2.4.2.post117 -f https://www.paddlepaddle.org.cn/whl/linux/mkl/avx/stable.html

1.2.2 Verifying the Installation

python -c "import paddle; print(paddle.__version__)"

(ppai) xf@VP01:~/ai/tts$ python -c "import paddle; print(paddle.__version__)"
2.4.2

python -c "import paddle;paddle.utils.run_check()"

(ppai) xf@VP01:~/ai/tts$ python -c "import paddle;paddle.utils.run_check()"
Running verify PaddlePaddle program ...
W0502 07:43:23.693028  1278 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.9, Driver API Version: 12.1, Runtime API Version: 11.7
W0502 07:43:23.701891  1278 gpu_resources.cc:91] device: 0, cuDNN Version: 8.8.
PaddlePaddle works well on 1 GPU.
PaddlePaddle works well on 1 GPUs.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.

1.3 PaddleSpeech Speech Service

1.3.1 Installing paddlespeech

pip install paddlespeech
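Before moving on to the server, the offline paddlespeech CLI can serve as a quick sanity check of the installation; on the first run it downloads the default TTS models, and the input text and output file name below are just examples:

paddlespeech tts --input "你好,欢迎使用飞桨。" --output test_cli.wav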

1.3.2 Cloning the Source Code

git clone https://github.com/PaddlePaddle/PaddleSpeech

(ppai) xf@VP01:~/ai/tts$ git clone https://github.com/PaddlePaddle/PaddleSpeech
Cloning into 'PaddleSpeech'...
remote: Enumerating objects: 44486, done.
remote: Counting objects: 100% (968/968), done.
remote: Compressing objects: 100% (616/616), done.
remote: Total 44486 (delta 379), reused 789 (delta 295), pack-reused 43518
Receiving objects: 100% (44486/44486), 69.56 MiB | 16.05 MiB/s, done.
Resolving deltas: 100% (28721/28721), done.
Updating files: 100% (3558/3558), done.
(ppai) xf@VP01:~/ai/tts$ cd paddlespeech
(ppai) xf@VP01:~/ai/tts/paddlespeech$ ls
LICENSE      README.md     audio    demos   docs      paddlespeech  setup.cfg  tests        tools
MANIFEST.in  README_cn.md  dataset  docker  examples  runtime       setup.py   third_party  utils

2. Speech Server

2.1 Starting the Speech Service from the CLI

2.1.1 Server Configuration: application.yaml

The server configuration file application.yaml lives under the PaddleSpeech/paddlespeech/server/conf directory:

engine_list: ['asr_python', 'tts_python', 'cls_python', 'text_python', 'vector_python']

engine_list specifies which service engines to enable (see the config sketch after this list):

  • asr_python (speech recognition)
  • tts_python (text-to-speech)
  • cls_python (audio classification)
  • text_python (punctuation restoration)
  • vector_python (speaker embedding extraction)
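For reference, here is a minimal sketch of the top-level fields in application.yaml that matter for this article; the exact defaults may differ slightly in your copy, so check the file itself:

host: 127.0.0.1
port: 8090
engine_list: ['asr_python', 'tts_python', 'cls_python', 'text_python', 'vector_python']

With these values the server comes up on http://127.0.0.1:8090, which matches the startup log in the next step.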

2.1.2 Running the Server Command

paddlespeech_server start --config_file /home/xf/ai/tts/paddlespeech/paddlespeech/server/conf/application.yaml

(ppai) xf@VP01:~/ai/tts/paddlespeech$ paddlespeech_server start --config_file paddlespeech/server/conf/application.yaml
[2023-05-02 07:16:34,644] [    INFO] - start to init the engine
[2023-05-02 07:16:34,644] [    INFO] - asr : python engine.
W0502 07:16:37.497296  1187 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.9, Driver API Version: 12.1, Runtime API Version: 11.7
W0502 07:16:37.502528  1187 gpu_resources.cc:91] device: 0, cuDNN Version: 8.8.
2023-05-02 07:16:38.195 | INFO     | paddlespeech.s2t.modules.embedding:__init__:153 - max len: 5000
[2023-05-02 07:16:39,064] [    INFO] - Initialize ASR server engine successfully on device: gpu:0.
[2023-05-02 07:16:39,064] [    INFO] - tts : python engine.
...
[2023-05-02 07:16:55] [INFO] [on.py:61] Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit)
[2023-05-02 07:16:55] [INFO] [server.py:212] Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit)

The service is now running at http://127.0.0.1:8090.

2.2 A Custom Shell Script for Quick Startup

2.2.1 Goals of the Custom Script

Create a shell script in the home directory so the PaddleSpeech speech server can be started quickly, without descending through nested directories:

  • Activate the conda environment automatically
  • cd into the PaddleSpeech directory automatically
  • Issue the CLI command with the config_file automatically

cd ~
vim mypss
Adjust the file paths and the conda environment name to match your own setup:

#!/bin/bash
# Activate the conda environment
source /home/xf/anaconda3/bin/activate ppai
# Enter the PaddleSpeech repo
cd /home/xf/ai/tts/PaddleSpeech
# Start the server with the default config
paddlespeech_server start --config_file paddlespeech/server/conf/application.yaml

2.2.2 Quick Start

From the directory containing mypss, run
. mypss

3. Several Ways to Use the Speech Service from a Client

3.1 Command-Line (CLI) Client

(ppai) xf@VP01:~/ai/tts/paddlespeech$ paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "您好,这是命令行cli生成语音。" --output ./output/output_tts_cli.wav
[2023-05-02 08:53:42,265] [    INFO] - Save synthesized audio successfully on ./output/output_tts_cli.wav.
[2023-05-02 08:53:42,266] [    INFO] - Audio duration: 2.575000 s.
[2023-05-02 08:53:42,266] [    INFO] - Response time: 0.491946 s.

3.2 Calling the TTSExecutor Class Directly

Create xf_tts_class.py:

# Call the TTS executor directly on the client side
from paddlespeech.cli.tts.infer import TTSExecutor

tts = TTSExecutor()
res = tts(
    text="这是class生成语音。", 
    output="./output/output_tts_class.wav"
    )
print(res)
(ppai) xf@VP01:~/ai/tts/paddlespeech$ python xf_tts_class.py
/home/xf/anaconda3/envs/ppai/lib/python3.7/site-packages/librosa/core/constantq.py:1059: DeprecationWarning: `np.complex` is a deprecated alias for the builtin `complex`. To silence this warning, use `complex` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.complex128` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  dtype=np.complex,
/home/xf/anaconda3/envs/ppai/lib/python3.7/site-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
[2023-05-02 08:58:16,097] [    INFO] - Already cached /home/xf/.paddlenlp/models/bert-base-chinese/bert-base-chinese-vocab.txt
[2023-05-02 08:58:16,114] [    INFO] - tokenizer config file saved in /home/xf/.paddlenlp/models/bert-base-chinese/tokenizer_config.json
[2023-05-02 08:58:16,114] [    INFO] - Special tokens file saved in /home/xf/.paddlenlp/models/bert-base-chinese/special_tokens_map.json
W0502 08:58:16.274096  1806 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.9, Driver API Version: 12.1, Runtime API Version: 11.7
W0502 08:58:16.279990  1806 gpu_resources.cc:91] device: 0, cuDNN Version: 8.8.
Building prefix dict from the default dictionary ...
[2023-05-02 08:58:23] [DEBUG] [__init__.py:113] Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
[2023-05-02 08:58:23] [DEBUG] [__init__.py:133] Loading model from cache /tmp/jieba.cache
Loading model cost 0.282 seconds.
[2023-05-02 08:58:23] [DEBUG] [__init__.py:165] Loading model cost 0.282 seconds.
Prefix dict has been built successfully.
[2023-05-02 08:58:23] [DEBUG] [__init__.py:166] Prefix dict has been built successfully.
/mnt/d/develop/ubuntu/tts/paddlespeech/output/output_tts_class.wav

3.3 Client API Request via TTSClientExecutor

Create xf_tts_client.py:

# Client API request via TTSClientExecutor
from paddlespeech.server.bin.paddlespeech_client import TTSClientExecutor
import json

ttsclient_executor = TTSClientExecutor()
res = ttsclient_executor(
    input="这是API客户端生成的语音。 ",
    server_ip="127.0.0.1",
    port=8090,
    spk_id=0,
    speed=1.0,
    volume=1.0,
    sample_rate=0,
    output="./output/output_tts_client.wav")

response_dict = res.json()
print(response_dict["message"])
print("Save synthesized audio successfully on %s." % (response_dict['result']['save_path']))
print("Audio duration: %f s." %(response_dict['result']['duration']))
(ppai) xf@VP01:~/ai/tts/paddlespeech$ python xf_tts_client.py
{'description': 'success.'}
Save synthesized audio successfully on ./output/output_tts_client.wav.
Audio duration: 2.125000 s.

3.4 Building API Requests with the requests Library

3.4.1 Example

Create xf_tts_api.py:

# Build the API request with the requests library

import requests
import base64
import json
import soundfile
import io

# Define the paddlespeech_request helper
def paddlespeech_request(url, data):
    res = requests.post(
        url=url,
        data=json.dumps(data)
    )

    if res.status_code == 200:
        res = res.json()
    else:
        print("Request failed, status code:", res.status_code)
        res = None
    return res

# API request
tts_url = "http://127.0.0.1:8090/paddlespeech/tts"
## Request payload
data = {
    "text": "这个音频是由Requests构建 API 请求生成的!",
    "spk_id": 0,
    "speed": 1.0,
    "volume": 1.0,
    "sample_rate": 0,
    # Don't save a file on the server side here; below we demonstrate decoding the base64 result into a wav
    "save_path": None
}
## Send the request
res = paddlespeech_request(tts_url, data)
print(res['success'])

# Decode the base64 audio in the response back into wav samples
wav_base64 = res['result']['audio']
audio_data_byte = base64.b64decode(wav_base64)

## Read the samples from the decoded bytes
samples, sample_rate = soundfile.read(
            io.BytesIO(audio_data_byte), dtype='float32')
## Save as an audio file
outfile = "./output/output_tts_api.wav"
soundfile.write(outfile, samples, sample_rate)
print(outfile)
(ppai) xf@VP01:~/ai/tts/paddlespeech$ python xf_tts_api.py
True
./output/output_tts_api.wav

3.4.2 API Response Format

Under normal conditions the API returns an HTTP 200 status code with an application/json body.

Field | Type | Description
success | bool | Whether the request succeeded
code | int | Error code. When success=true, code is always 0; when success=false, code is the specific error code.
message | json | Describes the error message
requestId | string | Unique identifier of the request, used for tracing
result | json | The actual return payload, detailed per endpoint

3.4.3 TTS Parameters

  • url: POST /paddlespeech/tts
  • Request parameters
Field | Type | Description
text | string | Text to synthesize
spk_id | int | Speaker id; not currently used; default: 0
speed | float | Speech rate of the synthesized audio, range (0, 3]; default: 1.0; changing the speed is not supported on Windows
volume | float | Volume of the synthesized audio, range (0, 3]; default: 1.0; values that are too large may cause clipping
sample_rate | int | Sample rate of the synthesized audio; only downsampling is supported; choices: [0, 8000, 16000]; default: 0, meaning the model's own sample rate
save_path | string | If set, the synthesized audio is also saved to a local file after synthesis; default: None (do not save); wav and pcm formats are supported
  • Response parameters
Field | Type | Description
lang | string | Language of the synthesized text (zh or en)
spk_id | int | Speaker id
speed | float | Speech rate of the synthesized audio, range [0, 3]
volume | float | Volume of the synthesized audio, range [0, 3]
sample_rate | int | Sample rate of the synthesized audio
duration | float | Duration of the synthesized audio, in seconds
save_path | string | Path of the saved synthesized audio
audio | string | Base64-encoded synthesized audio
  • TTS response example
{
    "success": true,
    "code": 0,
    "message": {"global": "success" }
    "result": {
        "lang": "zh",
        "spk_id": 0,
        "speed": 1.0,
        "volume": 1.0,
        "sample_rate": 24000,
        "duration": 3.6125,
        "save_path": "./tts.wav",
        "audio": "LTI1OTIuNjI1OTUwMzQsOTk2OS41NDk4..."
    }
}

3.4.4 ASR Parameters

  • url: POST /paddlespeech/asr
  • Request parameters
Field | Type | Description
audio | string | Base64-encoded audio file
audio_format | string | Format of the audio file, pcm or wav; default: wav
sample_rate | int | Sample rate of the audio; choices: [8000, 16000]; default: the model's own sample rate
lang | string | Language: zh_cn (Mandarin Chinese), zh_tw (Taiwanese Mandarin), en_us (English)
punc | bool | Whether to add punctuation: true to enable, false (default) to disable
  • Response parameters
Field | Type | Description
transcription | string | ASR recognition result
  • ASR response example
{
    "success": true,
    "code": 0,
    "message": {"description": "success" }
    "result": {
		"transcription": "这个文字是由Requests构建 API 请求生成的!"
    }
}
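As a counterpart to the TTS example in 3.4.1, the sketch below sends an ASR request using the same requests pattern. The field names come from the request table above; the file name test_16k.wav is only a placeholder for any 16 kHz mono wav you have available:

# ASR request sketch: base64-encode a local wav and POST it to the ASR endpoint
import base64
import json
import requests

asr_url = "http://127.0.0.1:8090/paddlespeech/asr"

# Read a local wav file and encode it as base64, as the API expects
with open("test_16k.wav", "rb") as f:
    audio_base64 = base64.b64encode(f.read()).decode("utf-8")

data = {
    "audio": audio_base64,
    "audio_format": "wav",
    "sample_rate": 16000,
    "lang": "zh_cn",
    "punc": True  # enable punctuation restoration
}

res = requests.post(url=asr_url, data=json.dumps(data))
if res.status_code == 200:
    print(res.json()["result"]["transcription"])
else:
    print("Request failed, status code:", res.status_code)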

3.4.5 More RESTful APIs

The full PaddleSpeech RESTful API documentation is here.

4. Troubleshooting

4.1 Runtime Error: [ERROR] - run() got an unexpected keyword argument 'debug'

4.1.1 Error Output When Starting the PaddleSpeech Server

paddlespeech_server start --config_file ./paddlespeech/server/conf/application.yaml

paddlespeech_server start --config_file ./paddlespeech/server/conf/application.yaml
[2023-05-01 22:00:56,214] [    INFO] - start to init the engine
[2023-05-01 22:00:56,214] [    INFO] - asr : python engine.
W0501 22:00:59.081336 30848 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.9, Driver API Version: 12.1, Runtime API Version: 11.7
W0501 22:00:59.087821 30848 gpu_resources.cc:91] device: 0, cuDNN Version: 8.8.
2023-05-01 22:00:59.776 | INFO     | paddlespeech.s2t.modules.embedding:__init__:153 - max len: 5000
[2023-05-01 22:01:00,560] [    INFO] - Initialize ASR server engine successfully on device: gpu:0.
[2023-05-01 22:01:00,560] [    INFO] - tts : python engine.
[2023-05-01 22:01:04,561] [    INFO] - Already cached /home/xf/.paddlenlp/models/bert-base-chinese/bert-base-chinese-vocab.txt
[2023-05-01 22:01:04,579] [    INFO] - tokenizer config file saved in /home/xf/.paddlenlp/models/bert-base-chinese/tokenizer_config.json
[2023-05-01 22:01:04,580] [    INFO] - Special tokens file saved in /home/xf/.paddlenlp/models/bert-base-chinese/special_tokens_map.json
[2023-05-01 22:01:06,862] [    INFO] - Initialize TTS server engine successfully on device: gpu:0.
[2023-05-01 22:01:06,862] [    INFO] - cls : python engine.
[2023-05-01 22:01:11,269] [    INFO] - Initialize CLS server engine successfully on device: gpu:0.
[2023-05-01 22:01:11,269] [    INFO] - text : python engine.
[2023-05-01 22:01:13,329] [    INFO] - loading configuration file /home/xf/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/1.0/ernie_linear_p3_wudao-punc-zh.tar/ckpt/config.json
[2023-05-01 22:01:13,330] [    INFO] - Model config ErnieConfig {
  "architectures": [
    "ErnieForTokenClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "enable_recompute": false,
  "fuse": false,
  "hidden_act": "relu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 513,
  "model_type": "ernie",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "paddlenlp_version": null,
  "pool_act": "tanh",
  "task_id": 0,
  "task_type_vocab_size": 3,
  "type_vocab_size": 2,
  "use_task_id": true,
  "vocab_size": 18000
}

[2023-05-01 22:01:13,803] [    INFO] - All model checkpoint weights were used when initializing ErnieForTokenClassification.

[2023-05-01 22:01:13,803] [ WARNING] - Some weights of ErnieForTokenClassification were not initialized from the model checkpoint at /home/xf/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/1.0/ernie_linear_p3_wudao-punc-zh.tar/ckpt and are newly initialized: ['ernie.embeddings.task_type_embeddings.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[2023-05-01 22:01:13,804] [ WARNING] - Accessing `num_classes` through `model.num_classes` will be deprecated after v2.6.0. Instead, do `model.config.num_classes`
[2023-05-01 22:01:13,805] [    INFO] - Already cached /home/xf/.paddlenlp/models/ernie-1.0/vocab.txt
[2023-05-01 22:01:13,809] [    INFO] - tokenizer config file saved in /home/xf/.paddlenlp/models/ernie-1.0/tokenizer_config.json
[2023-05-01 22:01:13,809] [    INFO] - Special tokens file saved in /home/xf/.paddlenlp/models/ernie-1.0/special_tokens_map.json
[2023-05-01 22:01:13,810] [    INFO] - Initialize Text server engine successfully on device: gpu:0.
[2023-05-01 22:01:13,810] [    INFO] - vector : python engine.
[2023-05-01 22:01:14,187] [    INFO] - Initialize Vector server engine successfully on device: gpu:0.
Building prefix dict from the default dictionary ...
[2023-05-01 22:01:14] [DEBUG] [__init__.py:113] Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
[2023-05-01 22:01:14] [DEBUG] [__init__.py:133] Loading model from cache /tmp/jieba.cache
Loading model cost 0.281 seconds.
[2023-05-01 22:01:14] [DEBUG] [__init__.py:165] Loading model cost 0.281 seconds.
Prefix dict has been built successfully.
[2023-05-01 22:01:14] [DEBUG] [__init__.py:166] Prefix dict has been built successfully.
[2023-05-01 22:01:17,051] [   ERROR] - Failed to start server.
[2023-05-01 22:01:17,051] [   ERROR] - run() got an unexpected keyword argument 'debug'

4.1.2 Trying start_multi_progress_server.py to Troubleshoot

I went through all of the code under the PaddleSpeech/paddlespeech/server directory and could not find anything that passes a 'debug' argument.
I then noticed start_multi_progress_server.py under PaddleSpeech/demos/speech_server and decided to try running it first.

(ppai) xf@VP01:~/ai/tts/paddlespeech/demos/speech_server$ python start_multi_progress_server.py
[2023-05-01 22:13:13,711] [    INFO] - asr : python engine.
W0501 22:13:16.539659 30898 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.9, Driver API Version: 12.1, Runtime API Version: 11.7
W0501 22:13:16.544288 30898 gpu_resources.cc:91] device: 0, cuDNN Version: 8.8.
2023-05-01 22:13:17.220 | INFO     | paddlespeech.s2t.modules.embedding:__init__:153 - max len: 5000
[2023-05-01 22:13:18,000] [    INFO] - Initialize ASR server engine successfully on device: gpu:0.
[2023-05-01 22:13:18,000] [    INFO] - tts : python engine.
[2023-05-01 22:13:22,107] [    INFO] - Already cached /home/xf/.paddlenlp/models/bert-base-chinese/bert-base-chinese-vocab.txt
[2023-05-01 22:13:22,112] [    INFO] - tokenizer config file saved in /home/xf/.paddlenlp/models/bert-base-chinese/tokenizer_config.json
[2023-05-01 22:13:22,112] [    INFO] - Special tokens file saved in /home/xf/.paddlenlp/models/bert-base-chinese/special_tokens_map.json
[2023-05-01 22:13:24,104] [    INFO] - Initialize TTS server engine successfully on device: gpu:0.
[2023-05-01 22:13:24,104] [    INFO] - cls : python engine.
[2023-05-01 22:13:28,482] [    INFO] - Initialize CLS server engine successfully on device: gpu:0.
[2023-05-01 22:13:28,482] [    INFO] - text : python engine.
[2023-05-01 22:13:30,527] [    INFO] - loading configuration file /home/xf/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/1.0/ernie_linear_p3_wudao-punc-zh.tar/ckpt/config.json
[2023-05-01 22:13:30,527] [    INFO] - Model config ErnieConfig {
  "architectures": [
    "ErnieForTokenClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "enable_recompute": false,
  "fuse": false,
  "hidden_act": "relu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 513,
  "model_type": "ernie",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "paddlenlp_version": null,
  "pool_act": "tanh",
  "task_id": 0,
  "task_type_vocab_size": 3,
  "type_vocab_size": 2,
  "use_task_id": true,
  "vocab_size": 18000
}

[2023-05-01 22:13:31,136] [    INFO] - All model checkpoint weights were used when initializing ErnieForTokenClassification.

[2023-05-01 22:13:31,136] [ WARNING] - Some weights of ErnieForTokenClassification were not initialized from the model checkpoint at /home/xf/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/1.0/ernie_linear_p3_wudao-punc-zh.tar/ckpt and are newly initialized: ['ernie.embeddings.task_type_embeddings.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[2023-05-01 22:13:31,137] [ WARNING] - Accessing `num_classes` through `model.num_classes` will be deprecated after v2.6.0. Instead, do `model.config.num_classes`
[2023-05-01 22:13:31,138] [    INFO] - Already cached /home/xf/.paddlenlp/models/ernie-1.0/vocab.txt
[2023-05-01 22:13:31,161] [    INFO] - tokenizer config file saved in /home/xf/.paddlenlp/models/ernie-1.0/tokenizer_config.json
[2023-05-01 22:13:31,161] [    INFO] - Special tokens file saved in /home/xf/.paddlenlp/models/ernie-1.0/special_tokens_map.json
[2023-05-01 22:13:31,162] [    INFO] - Initialize Text server engine successfully on device: gpu:0.
[2023-05-01 22:13:31,162] [    INFO] - vector : python engine.
[2023-05-01 22:13:31,537] [    INFO] - Initialize Vector server engine successfully on device: gpu:0.
Traceback (most recent call last):
  File "start_multi_progress_server.py", line 69, in <module>
    workers=args.workers
TypeError: run() got an unexpected keyword argument 'debug'

The error is the same, but start_multi_progress_server.py does contain a debug argument:

if __name__ == "__main__":
    parser = argparse.ArgumentParser(add_help=True)
    parser.add_argument(
        "--workers", type=int, help="workers of server", default=1)
    args = parser.parse_args()
    uvicorn.run(
        "start_multi_progress_server:app",
        host=config.host,
        port=config.port,
        debug=False,
        workers=args.workers)

After commenting out the debug line, run it again.
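For reference, the edited call looks like this; only the debug line is commented out:

    uvicorn.run(
        "start_multi_progress_server:app",
        host=config.host,
        port=config.port,
        # debug=False,
        workers=args.workers)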

(ppai) xf@VP01:~/ai/tts/paddlespeech/demos/speech_server$ python start_multi_progress_server.py
[2023-05-01 22:15:55,525] [    INFO] - asr : python engine.
W0501 22:15:58.387439 30943 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.9, Driver API Version: 12.1, Runtime API Version: 11.7
W0501 22:15:58.393894 30943 gpu_resources.cc:91] device: 0, cuDNN Version: 8.8.
2023-05-01 22:15:59.083 | INFO     | paddlespeech.s2t.modules.embedding:__init__:153 - max len: 5000
[2023-05-01 22:15:59,860] [    INFO] - Initialize ASR server engine successfully on device: gpu:0.
[2023-05-01 22:15:59,860] [    INFO] - tts : python engine.
[2023-05-01 22:16:03,997] [    INFO] - Already cached /home/xf/.paddlenlp/models/bert-base-chinese/bert-base-chinese-vocab.txt
[2023-05-01 22:16:04,002] [    INFO] - tokenizer config file saved in /home/xf/.paddlenlp/models/bert-base-chinese/tokenizer_config.json
[2023-05-01 22:16:04,002] [    INFO] - Special tokens file saved in /home/xf/.paddlenlp/models/bert-base-chinese/special_tokens_map.json
[2023-05-01 22:16:06,045] [    INFO] - Initialize TTS server engine successfully on device: gpu:0.
[2023-05-01 22:16:06,045] [    INFO] - cls : python engine.
[2023-05-01 22:16:10,462] [    INFO] - Initialize CLS server engine successfully on device: gpu:0.
[2023-05-01 22:16:10,462] [    INFO] - text : python engine.
[2023-05-01 22:16:12,541] [    INFO] - loading configuration file /home/xf/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/1.0/ernie_linear_p3_wudao-punc-zh.tar/ckpt/config.json
[2023-05-01 22:16:12,541] [    INFO] - Model config ErnieConfig {
  "architectures": [
    "ErnieForTokenClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "enable_recompute": false,
  "fuse": false,
  "hidden_act": "relu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 513,
  "model_type": "ernie",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "paddlenlp_version": null,
  "pool_act": "tanh",
  "task_id": 0,
  "task_type_vocab_size": 3,
  "type_vocab_size": 2,
  "use_task_id": true,
  "vocab_size": 18000
}

[2023-05-01 22:16:13,196] [    INFO] - All model checkpoint weights were used when initializing ErnieForTokenClassification.

[2023-05-01 22:16:13,196] [ WARNING] - Some weights of ErnieForTokenClassification were not initialized from the model checkpoint at /home/xf/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/1.0/ernie_linear_p3_wudao-punc-zh.tar/ckpt and are newly initialized: ['ernie.embeddings.task_type_embeddings.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[2023-05-01 22:16:13,196] [ WARNING] - Accessing `num_classes` through `model.num_classes` will be deprecated after v2.6.0. Instead, do `model.config.num_classes`
[2023-05-01 22:16:13,197] [    INFO] - Already cached /home/xf/.paddlenlp/models/ernie-1.0/vocab.txt
[2023-05-01 22:16:13,202] [    INFO] - tokenizer config file saved in /home/xf/.paddlenlp/models/ernie-1.0/tokenizer_config.json
[2023-05-01 22:16:13,202] [    INFO] - Special tokens file saved in /home/xf/.paddlenlp/models/ernie-1.0/special_tokens_map.json
[2023-05-01 22:16:13,203] [    INFO] - Initialize Text server engine successfully on device: gpu:0.
[2023-05-01 22:16:13,203] [    INFO] - vector : python engine.
[2023-05-01 22:16:13,592] [    INFO] - Initialize Vector server engine successfully on device: gpu:0.
[2023-05-01 22:16:13,675] [    INFO] - asr : python engine.
2023-05-01 22:16:16.439 | INFO     | paddlespeech.s2t.modules.embedding:__init__:153 - max len: 5000
[2023-05-01 22:16:17,271] [    INFO] - Initialize ASR server engine successfully on device: gpu:0.
[2023-05-01 22:16:17,271] [    INFO] - tts : python engine.
[2023-05-01 22:16:19,472] [    INFO] - Already cached /home/xf/.paddlenlp/models/bert-base-chinese/bert-base-chinese-vocab.txt
[2023-05-01 22:16:19,477] [    INFO] - tokenizer config file saved in /home/xf/.paddlenlp/models/bert-base-chinese/tokenizer_config.json
[2023-05-01 22:16:19,477] [    INFO] - Special tokens file saved in /home/xf/.paddlenlp/models/bert-base-chinese/special_tokens_map.json
[2023-05-01 22:16:21,625] [    INFO] - Initialize TTS server engine successfully on device: gpu:0.
[2023-05-01 22:16:21,625] [    INFO] - cls : python engine.
[2023-05-01 22:16:26,017] [    INFO] - Initialize CLS server engine successfully on device: gpu:0.
[2023-05-01 22:16:26,017] [    INFO] - text : python engine.
[2023-05-01 22:16:28,102] [    INFO] - loading configuration file /home/xf/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/1.0/ernie_linear_p3_wudao-punc-zh.tar/ckpt/config.json
[2023-05-01 22:16:28,102] [    INFO] - Model config ErnieConfig {
  "architectures": [
    "ErnieForTokenClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "enable_recompute": false,
  "fuse": false,
  "hidden_act": "relu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 513,
  "model_type": "ernie",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "paddlenlp_version": null,
  "pool_act": "tanh",
  "task_id": 0,
  "task_type_vocab_size": 3,
  "type_vocab_size": 2,
  "use_task_id": true,
  "vocab_size": 18000
}

[2023-05-01 22:16:28,674] [    INFO] - All model checkpoint weights were used when initializing ErnieForTokenClassification.

[2023-05-01 22:16:28,675] [ WARNING] - Some weights of ErnieForTokenClassification were not initialized from the model checkpoint at /home/xf/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/1.0/ernie_linear_p3_wudao-punc-zh.tar/ckpt and are newly initialized: ['ernie.embeddings.task_type_embeddings.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[2023-05-01 22:16:28,675] [ WARNING] - Accessing `num_classes` through `model.num_classes` will be deprecated after v2.6.0. Instead, do `model.config.num_classes`
[2023-05-01 22:16:28,676] [    INFO] - Already cached /home/xf/.paddlenlp/models/ernie-1.0/vocab.txt
[2023-05-01 22:16:28,680] [    INFO] - tokenizer config file saved in /home/xf/.paddlenlp/models/ernie-1.0/tokenizer_config.json
[2023-05-01 22:16:28,680] [    INFO] - Special tokens file saved in /home/xf/.paddlenlp/models/ernie-1.0/special_tokens_map.json
[2023-05-01 22:16:28,681] [    INFO] - Initialize Text server engine successfully on device: gpu:0.
[2023-05-01 22:16:28,681] [    INFO] - vector : python engine.
[2023-05-01 22:16:29,063] [    INFO] - Initialize Vector server engine successfully on device: gpu:0.
INFO:     Started server process [30943]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)

The start_multi_progress_server service is now running.
Open another shell as a client and test a request:

(ppai) xf@VP01:~/ai/tts/PaddleSpeech$ paddlespeech_client tts --server_ip 127.0.0.1 --port 8090 --input "这是语音合成服务测试,听到声音了吗?" --output o
utput.wav
[2023-05-01 22:18:48,051] [    INFO] - Save synthesized audio successfully on output.wav.
[2023-05-01 22:18:48,051] [    INFO] - Audio duration: 3.587500 s.
[2023-05-01 22:18:48,051] [    INFO] - Response time: 3.246741 s.

The server log updates accordingly:

INFO:     Started server process [30943]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:8090 (Press CTRL+C to quit)
[2023-05-01 22:18:44,816] [    INFO] - request: text='这是语音合成服务测试,听到声音了吗?' spk_id=0 speed=1.0 volume=1.0 sample_rate=0 save_path='output.wav'
[2023-05-01 22:18:44,816] [    INFO] - Get tts engine successfully.
Building prefix dict from the default dictionary ...
[2023-05-01 22:18:44] [DEBUG] [__init__.py:113] Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
[2023-05-01 22:18:44] [DEBUG] [__init__.py:133] Loading model from cache /tmp/jieba.cache
Loading model cost 0.297 seconds.
[2023-05-01 22:18:45] [DEBUG] [__init__.py:165] Loading model cost 0.297 seconds.
Prefix dict has been built successfully.
[2023-05-01 22:18:45] [DEBUG] [__init__.py:166] Prefix dict has been built successfully.
[2023-05-01 22:18:47,983] [    INFO] - Save audio to output.wav successfully.
[2023-05-01 22:18:47,983] [    INFO] - tts engine type: python
[2023-05-01 22:18:47,983] [    INFO] - audio duration: 3.5875
[2023-05-01 22:18:47,983] [    INFO] - total inference time: 3.1081044673919678
[2023-05-01 22:18:47,983] [    INFO] - postprocess (change speed, volume, target sample rate) time: 0.058774709701538086
[2023-05-01 22:18:47,983] [    INFO] - total generate audio time: 3.166879177093506
[2023-05-01 22:18:47,983] [    INFO] - RTF: 0.8663705832451478
INFO:     127.0.0.1:42976 - "POST /paddlespeech/tts HTTP/1.1" 200 OK

Everything works as expected.

4.1.3 Studying the Server Main Program (and Hitting a Pitfall)

I assumed that paddlespeech_server.py under PaddleSpeech/paddlespeech/server/bin is the server's main program.

paddlespeech_server.py

    @stats_wrapper
    def __call__(self,
                 config_file: str="./conf/application.yaml",
                 log_file: str="./log/paddlespeech.log"):
        """
        Python API to call an executor.
        """
        config = get_config(config_file)
        if self.init(config):
            uvicorn.run(app, host=config.host, port=config.port)

This run() call has no debug=False argument, so the problem cannot be fixed by commenting out a debug line here.
Could the issue lie with the uvicorn command instead?

Uvicorn is a lightning-fast ASGI server.

(ppai) xf@VP01:~/ai/tts/paddlespeech$ uvicorn --version
Running uvicorn 0.22.0 with CPython 3.7.16 on Linux
(ppai) xf@VP01:~/ai/tts/paddlespeech$ uvicorn --help
Usage: uvicorn [OPTIONS] APP

Options:
  --host TEXT                     Bind socket to this host.  [default:
                                  127.0.0.1]
  --port INTEGER                  Bind socket to this port.  [default: 8000]
  --uds TEXT                      Bind to a UNIX domain socket.
  --fd INTEGER                    Bind to socket from this file descriptor.
  --reload                        Enable auto-reload.
  --reload-dir PATH               Set reload directories explicitly, instead
                                  of using the current working directory.
  --reload-include TEXT           Set glob patterns to include while watching
                                  for files. Includes '*.py' by default; these
                                  defaults can be overridden with `--reload-
                                  exclude`. This option has no effect unless
                                  watchfiles is installed.
  --reload-exclude TEXT           Set glob patterns to exclude while watching
                                  for files. Includes '.*, .py[cod], .sw.*,
                                  ~*' by default; these defaults can be
                                  overridden with `--reload-include`. This
                                  option has no effect unless watchfiles is
                                  installed.
  --reload-delay FLOAT            Delay between previous and next check if
                                  application needs to be. Defaults to 0.25s.
                                  [default: 0.25]
  --workers INTEGER               Number of worker processes. Defaults to the
                                  $WEB_CONCURRENCY environment variable if
                                  available, or 1. Not valid with --reload.
  --loop [auto|asyncio|uvloop]    Event loop implementation.  [default: auto]
  --http [auto|h11|httptools]     HTTP protocol implementation.  [default:
                                  auto]
  --ws [auto|none|websockets|wsproto]
                                  WebSocket protocol implementation.
                                  [default: auto]
  --ws-max-size INTEGER           WebSocket max size message in bytes
                                  [default: 16777216]
  --ws-ping-interval FLOAT        WebSocket ping interval  [default: 20.0]
  --ws-ping-timeout FLOAT         WebSocket ping timeout  [default: 20.0]
  --ws-per-message-deflate BOOLEAN
                                  WebSocket per-message-deflate compression
                                  [default: True]
  --lifespan [auto|on|off]        Lifespan implementation.  [default: auto]
  --interface [auto|asgi3|asgi2|wsgi]
                                  Select ASGI3, ASGI2, or WSGI as the
                                  application interface.  [default: auto]
  --env-file PATH                 Environment configuration file.
  --log-config PATH               Logging configuration file. Supported
                                  formats: .ini, .json, .yaml.
  --log-level [critical|error|warning|info|debug|trace]
                                  Log level. [default: info]
  --access-log / --no-access-log  Enable/Disable access log.
  --use-colors / --no-use-colors  Enable/Disable colorized logging.
  --proxy-headers / --no-proxy-headers
                                  Enable/Disable X-Forwarded-Proto,
                                  X-Forwarded-For, X-Forwarded-Port to
                                  populate remote address info.
  --server-header / --no-server-header
                                  Enable/Disable default Server header.
  --date-header / --no-date-header
                                  Enable/Disable default Date header.
  --forwarded-allow-ips TEXT      Comma separated list of IPs to trust with
                                  proxy headers. Defaults to the
                                  $FORWARDED_ALLOW_IPS environment variable if
                                  available, or '127.0.0.1'.
  --root-path TEXT                Set the ASGI 'root_path' for applications
                                  submounted below a given URL path.
  --limit-concurrency INTEGER     Maximum number of concurrent connections or
                                  tasks to allow, before issuing HTTP 503
                                  responses.
  --backlog INTEGER               Maximum number of connections to hold in
                                  backlog
  --limit-max-requests INTEGER    Maximum number of requests to service before
                                  terminating the process.
  --timeout-keep-alive INTEGER    Close Keep-Alive connections if no new data
                                  is received within this timeout.  [default:
                                  5]
  --timeout-graceful-shutdown INTEGER
                                  Maximum number of seconds to wait for
                                  graceful shutdown.
  --ssl-keyfile TEXT              SSL key file
  --ssl-certfile TEXT             SSL certificate file
  --ssl-keyfile-password TEXT     SSL keyfile password
  --ssl-version INTEGER           SSL version to use (see stdlib ssl module's)
                                  [default: 17]
  --ssl-cert-reqs INTEGER         Whether client certificate is required (see
                                  stdlib ssl module's)  [default: 0]
  --ssl-ca-certs TEXT             CA certificates file
  --ssl-ciphers TEXT              Ciphers to use (see stdlib ssl module's)
                                  [default: TLSv1]
  --header TEXT                   Specify custom default HTTP response headers
                                  as a Name:Value pair
  --version                       Display the uvicorn version and exit.
  --app-dir TEXT                  Look for APP in the specified directory, by
                                  adding this to the PYTHONPATH. Defaults to
                                  the current working directory.
  --h11-max-incomplete-event-size INTEGER
                                  For h11, the maximum number of bytes to
                                  buffer of an incomplete event.
  --factory                       Treat APP as an application factory, i.e. a
                                  () -> <ASGI app> callable.
  --help                          Show this message and exit.
(ppai) xf@VP01:~/ai/tts/paddlespeech$

Sure enough, there is no --debug option.
However, the uvicorn.run() call in paddlespeech_server.py does not pass a debug argument either, which makes the problem rather puzzling.
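A quick way to see whether the installed uvicorn still accepts a debug keyword at all is to print uvicorn.run's signature (just a diagnostic idea, not a step from the original troubleshooting):

python -c "import uvicorn, inspect; print(inspect.signature(uvicorn.run))"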

Time for a bolder experiment:

  • Copy PaddleSpeech/PaddleSpeech/server/conf/application.yaml to the directory above the paddlespeech project
  • Rename the PaddleSpeech directory to aaaa so the program cannot find any file from the PaddleSpeech repo, to see which data from the repo the server actually needs
(ppai) xf@VP01:~/ai/tts/paddlespeech$ cp PaddleSpeech/PaddleSpeech/server/conf/application.yaml /home/xf/ai/tts
cp: cannot stat 'PaddleSpeech/PaddleSpeech/server/conf/application.yaml': No such file or directory
(ppai) xf@VP01:~/ai/tts/paddlespeech$ cp PaddleSpeech/PaddleSpeech/server/conf/application.yaml ~/ai/tts
cp: cannot stat 'PaddleSpeech/PaddleSpeech/server/conf/application.yaml': No such file or directory
(ppai) xf@VP01:~/ai/tts/paddlespeech$ cp ~/ai/tts/PaddleSpeech/PaddleSpeech/server/conf/application.yaml ~/ai/tts
(ppai) xf@VP01:~/ai/tts/paddlespeech$ cd ..
(ppai) xf@VP01:~/ai/tts$ mv paddlespeech aaaa
(ppai) xf@VP01:~/ai/tts$

Run the server:
paddlespeech_server start --config_file /home/xf/ai/tts/application.yaml

(ppai) xf@VP01:~/ai/tts$ paddlespeech_server start --config_file /home/xf/ai/tts/application.yaml
[2023-05-02 05:44:14,790] [    INFO] - start to init the engine
[2023-05-02 05:44:14,790] [    INFO] - asr : python engine.
W0502 05:44:17.691005 32625 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.9, Driver API Version: 12.1, Runtime API Version: 11.7
W0502 05:44:17.697503 32625 gpu_resources.cc:91] device: 0, cuDNN Version: 8.8.
2023-05-02 05:44:18.394 | INFO     | paddlespeech.s2t.modules.embedding:__init__:153 - max len: 5000
[2023-05-02 05:44:19,218] [    INFO] - Initialize ASR server engine successfully on device: gpu:0.
[2023-05-02 05:44:19,218] [    INFO] - tts : python engine.
[2023-05-02 05:44:23,311] [    INFO] - Already cached /home/xf/.paddlenlp/models/bert-base-chinese/bert-base-chinese-vocab.txt
[2023-05-02 05:44:23,316] [    INFO] - tokenizer config file saved in /home/xf/.paddlenlp/models/bert-base-chinese/tokenizer_config.json
[2023-05-02 05:44:23,317] [    INFO] - Special tokens file saved in /home/xf/.paddlenlp/models/bert-base-chinese/special_tokens_map.json
[2023-05-02 05:44:25,701] [    INFO] - Initialize TTS server engine successfully on device: gpu:0.
[2023-05-02 05:44:25,701] [    INFO] - cls : python engine.
[2023-05-02 05:44:30,108] [    INFO] - Initialize CLS server engine successfully on device: gpu:0.
[2023-05-02 05:44:30,108] [    INFO] - text : python engine.
[2023-05-02 05:44:32,195] [    INFO] - loading configuration file /home/xf/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/1.0/ernie_linear_p3_wudao-punc-zh.tar/ckpt/config.json
[2023-05-02 05:44:32,196] [    INFO] - Model config ErnieConfig {
  "architectures": [
    "ErnieForTokenClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "enable_recompute": false,
  "fuse": false,
  "hidden_act": "relu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 513,
  "model_type": "ernie",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "paddlenlp_version": null,
  "pool_act": "tanh",
  "task_id": 0,
  "task_type_vocab_size": 3,
  "type_vocab_size": 2,
  "use_task_id": true,
  "vocab_size": 18000
}

[2023-05-02 05:44:32,867] [    INFO] - All model checkpoint weights were used when initializing ErnieForTokenClassification.

[2023-05-02 05:44:32,867] [ WARNING] - Some weights of ErnieForTokenClassification were not initialized from the model checkpoint at /home/xf/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/1.0/ernie_linear_p3_wudao-punc-zh.tar/ckpt and are newly initialized: ['ernie.embeddings.task_type_embeddings.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[2023-05-02 05:44:32,868] [ WARNING] - Accessing `num_classes` through `model.num_classes` will be deprecated after v2.6.0. Instead, do `model.config.num_classes`
[2023-05-02 05:44:32,869] [    INFO] - Already cached /home/xf/.paddlenlp/models/ernie-1.0/vocab.txt
[2023-05-02 05:44:32,873] [    INFO] - tokenizer config file saved in /home/xf/.paddlenlp/models/ernie-1.0/tokenizer_config.json
[2023-05-02 05:44:32,873] [    INFO] - Special tokens file saved in /home/xf/.paddlenlp/models/ernie-1.0/special_tokens_map.json
[2023-05-02 05:44:32,874] [    INFO] - Initialize Text server engine successfully on device: gpu:0.
[2023-05-02 05:44:32,874] [    INFO] - vector : python engine.
[2023-05-02 05:44:33,260] [    INFO] - Initialize Vector server engine successfully on device: gpu:0.
Building prefix dict from the default dictionary ...
[2023-05-02 05:44:33] [DEBUG] [__init__.py:113] Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
[2023-05-02 05:44:33] [DEBUG] [__init__.py:133] Loading model from cache /tmp/jieba.cache
Loading model cost 0.269 seconds.
[2023-05-02 05:44:33] [DEBUG] [__init__.py:165] Loading model cost 0.269 seconds.
Prefix dict has been built successfully.
[2023-05-02 05:44:33] [DEBUG] [__init__.py:166] Prefix dict has been built successfully.
[2023-05-02 05:44:36,191] [   ERROR] - Failed to start server.
[2023-05-02 05:44:36,191] [   ERROR] - run() got an unexpected keyword argument 'debug'

Suddenly it all makes sense.

The paddlespeech_server start command only uses application.yaml; nothing else in the cloned PaddleSpeech repo is involved in starting the server, because the server code evidently runs from the pip-installed paddlespeech package instead.
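A simple way to confirm this is to check where the imported paddlespeech package actually lives. Run it from a directory outside the cloned repo (otherwise the local source tree would shadow the installed package); the printed path should point into the conda environment's site-packages:

python -c "import paddlespeech; print(paddlespeech.__file__)"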

4.1.4 Switching the uvicorn Version

Patching the server program is a dead end, so the next option is to change the uvicorn version. A careful look at the official documentation shows it uses uvicorn==0.18.3 (newer uvicorn releases apparently dropped the debug keyword that the installed paddlespeech package still passes):
pip install uvicorn==0.18.3

(ppai) xf@VP01:~/ai/tts/paddlespeech$ pip install uvicorn==0.18.3
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple, https://pypi.ngc.nvidia.com
Collecting uvicorn==0.18.3
  Downloading https://pypi.tuna.tsinghua.edu.cn/packages/64/82/3fdff66fca901b30e42c88e0c37ada35e181074e0c4fd8d7d7525107329d/uvicorn-0.18.3-py3-none-any.whl (57 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 57.4/57.4 kB 1.1 MB/s eta 0:00:00
Requirement already satisfied: click>=7.0 in /home/xf/anaconda3/envs/ppai/lib/python3.7/site-packages (from uvicorn==0.18.3) (8.1.3)
Requirement already satisfied: h11>=0.8 in /home/xf/anaconda3/envs/ppai/lib/python3.7/site-packages (from uvicorn==0.18.3) (0.14.0)
Requirement already satisfied: typing-extensions in /home/xf/anaconda3/envs/ppai/lib/python3.7/site-packages (from uvicorn==0.18.3) (4.5.0)
Requirement already satisfied: importlib-metadata in /home/xf/anaconda3/envs/ppai/lib/python3.7/site-packages (from click>=7.0->uvicorn==0.18.3) (6.6.0)
Requirement already satisfied: zipp>=0.5 in /home/xf/anaconda3/envs/ppai/lib/python3.7/site-packages (from importlib-metadata->click>=7.0->uvicorn==0.18.3) (3.15.0)
Installing collected packages: uvicorn
  Attempting uninstall: uvicorn
    Found existing installation: uvicorn 0.22.0
    Uninstalling uvicorn-0.22.0:
      Successfully uninstalled uvicorn-0.22.0
Successfully installed uvicorn-0.18.3

Try running the server again:

(ppai) xf@VP01:~/ai/tts/paddlespeech$ paddlespeech_server start --config_file /home/xf/ai/tts/paddlespeech/paddlespeech/server/conf/application.yaml
[2023-05-02 07:16:34,644] [    INFO] - start to init the engine
[2023-05-02 07:16:34,644] [    INFO] - asr : python engine.
W0502 07:16:37.497296  1187 gpu_resources.cc:61] Please NOTE: device: 0, GPU Compute Capability: 8.9, Driver API Version: 12.1, Runtime API Version: 11.7
W0502 07:16:37.502528  1187 gpu_resources.cc:91] device: 0, cuDNN Version: 8.8.
2023-05-02 07:16:38.195 | INFO     | paddlespeech.s2t.modules.embedding:__init__:153 - max len: 5000
[2023-05-02 07:16:39,064] [    INFO] - Initialize ASR server engine successfully on device: gpu:0.
[2023-05-02 07:16:39,064] [    INFO] - tts : python engine.
[2023-05-02 07:16:43,148] [    INFO] - Already cached /home/xf/.paddlenlp/models/bert-base-chinese/bert-base-chinese-vocab.txt
[2023-05-02 07:16:43,153] [    INFO] - tokenizer config file saved in /home/xf/.paddlenlp/models/bert-base-chinese/tokenizer_config.json
[2023-05-02 07:16:43,153] [    INFO] - Special tokens file saved in /home/xf/.paddlenlp/models/bert-base-chinese/special_tokens_map.json
[2023-05-02 07:16:45,409] [    INFO] - Initialize TTS server engine successfully on device: gpu:0.
[2023-05-02 07:16:45,410] [    INFO] - cls : python engine.
[2023-05-02 07:16:49,831] [    INFO] - Initialize CLS server engine successfully on device: gpu:0.
[2023-05-02 07:16:49,831] [    INFO] - text : python engine.
[2023-05-02 07:16:51,895] [    INFO] - loading configuration file /home/xf/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/1.0/ernie_linear_p3_wudao-punc-zh.tar/ckpt/config.json
[2023-05-02 07:16:51,895] [    INFO] - Model config ErnieConfig {
  "architectures": [
    "ErnieForTokenClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "enable_recompute": false,
  "fuse": false,
  "hidden_act": "relu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "LABEL_0",
    "1": "LABEL_1",
    "2": "LABEL_2",
    "3": "LABEL_3"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "LABEL_0": 0,
    "LABEL_1": 1,
    "LABEL_2": 2,
    "LABEL_3": 3
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 513,
  "model_type": "ernie",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "paddlenlp_version": null,
  "pool_act": "tanh",
  "task_id": 0,
  "task_type_vocab_size": 3,
  "type_vocab_size": 2,
  "use_task_id": true,
  "vocab_size": 18000
}

[2023-05-02 07:16:52,265] [    INFO] - All model checkpoint weights were used when initializing ErnieForTokenClassification.

[2023-05-02 07:16:52,265] [ WARNING] - Some weights of ErnieForTokenClassification were not initialized from the model checkpoint at /home/xf/.paddlespeech/models/ernie_linear_p3_wudao-punc-zh/1.0/ernie_linear_p3_wudao-punc-zh.tar/ckpt and are newly initialized: ['ernie.embeddings.task_type_embeddings.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
[2023-05-02 07:16:52,265] [ WARNING] - Accessing `num_classes` through `model.num_classes` will be deprecated after v2.6.0. Instead, do `model.config.num_classes`
[2023-05-02 07:16:52,266] [    INFO] - Already cached /home/xf/.paddlenlp/models/ernie-1.0/vocab.txt
[2023-05-02 07:16:52,271] [    INFO] - tokenizer config file saved in /home/xf/.paddlenlp/models/ernie-1.0/tokenizer_config.json
[2023-05-02 07:16:52,271] [    INFO] - Special tokens file saved in /home/xf/.paddlenlp/models/ernie-1.0/special_tokens_map.json
[2023-05-02 07:16:52,271] [    INFO] - Initialize Text server engine successfully on device: gpu:0.
[2023-05-02 07:16:52,271] [    INFO] - vector : python engine.
[2023-05-02 07:16:52,680] [    INFO] - Initialize Vector server engine successfully on device: gpu:0.
Building prefix dict from the default dictionary ...
[2023-05-02 07:16:52] [DEBUG] [__init__.py:113] Building prefix dict from the default dictionary ...
Loading model from cache /tmp/jieba.cache
[2023-05-02 07:16:52] [DEBUG] [__init__.py:133] Loading model from cache /tmp/jieba.cache
Loading model cost 0.277 seconds.
[2023-05-02 07:16:52] [DEBUG] [__init__.py:165] Loading model cost 0.277 seconds.
Prefix dict has been built successfully.
[2023-05-02 07:16:52] [DEBUG] [__init__.py:166] Prefix dict has been built successfully.
INFO:     Started server process [1187]
[2023-05-02 07:16:55] [INFO] [server.py:75] Started server process [1187]
INFO:     Waiting for application startup.
[2023-05-02 07:16:55] [INFO] [on.py:47] Waiting for application startup.
INFO:     Application startup complete.
[2023-05-02 07:16:55] [INFO] [on.py:61] Application startup complete.
INFO:     Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit)
[2023-05-02 07:16:55] [INFO] [server.py:212] Uvicorn running on http://127.0.0.1:8090 (Press CTRL+C to quit)

What a pleasant surprise!!!

End of article.
