Common Python Code Snippets | Several Ways to Do Speech-to-Text in Python

Original article · Last modified 2024-06-04 19:50:29

License: CC 4.0 BY-SA

First published 2024-04-11 12:54:11

Several Ways to Do Speech-to-Text in Python

1. Converting mp3 to wav

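Some tools only accept wav files. You can convert by exporting directly from Jianying (CapCut), or with the code below.
This requires the pydub library: pip install pydub (pydub relies on ffmpeg to decode mp3, so ffmpeg must also be installed).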

from pydub import AudioSegment

fp = r'D:\python\pyapp\001\aa.MP3'  # path to your own input file
output = AudioSegment.from_mp3(fp).set_channels(1)  # 1 = mono, 2 = stereo
output.export(r'D:\python\pyapp\001\bb.wav', format='wav')
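
Before handing the exported file to another tool, it can help to confirm what was actually written. The following is a minimal sketch using only the standard-library wave module; the helper name wav_info is hypothetical and not part of pydub.

```python
import wave


def wav_info(path):
    """Return basic format facts for a PCM wav file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),            # 1 = mono, 2 = stereo
            "sample_width_bytes": wf.getsampwidth(),  # 2 means 16-bit samples
            "frame_rate": rate,                       # samples per second
            "duration_s": frames / rate if rate else 0.0,
        }
```

Calling wav_info on the exported bb.wav should report 1 channel if the conversion above worked as intended.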

2. OpenAI Whisper

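Model sizes (the figures come from an image found online; the actual downloads seem to differ slightly)
Download URLs for each model: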

"tiny.en": "https://openaipublic.azureedge.net/main/whisper/models/d3dd57d32accea0b295c96e26691aa14d8822fac7d9d27d5dc00b4ca2826dd03/tiny.en.pt",
"tiny": "https://openaipublic.azureedge.net/main/whisper/models/65147644a518d12f04e32d6f3b26facc3f8dd46e5390956a9424a650c0ce22b9/tiny.pt",
"base.en": "https://openaipublic.azureedge.net/main/whisper/models/25a8566e1d0c1e2231d1c762132cd20e0f96a85d16145c3a00adf5d1ac670ead/base.en.pt",
"base": "https://openaipublic.azureedge.net/main/whisper/models/ed3a0b6b1c0edf879ad9b11b1af5a0e6ab5db9205f891f668f8b0e6c6326e34e/base.pt",
"small.en": "https://openaipublic.azureedge.net/main/whisper/models/f953ad0fd29cacd07d5a9eda5624af0f6bcf2258be67c92b79389873d91e0872/small.en.pt",
"small": "https://openaipublic.azureedge.net/main/whisper/models/9ecf779972d90ba49c06d968637d720dd632c55bbf19d441fb42bf17a411e794/small.pt",
"medium.en": "https://openaipublic.azureedge.net/main/whisper/models/d7440d1dc186f76616474e0ff0b3b6b879abc9d1a4926b7adfa41db2d497ab4f/medium.en.pt",
"medium": "https://openaipublic.azureedge.net/main/whisper/models/345ae4da62f9b3d59415adc60127b97c714f32e89e936602e85993674d08dcb1/medium.pt",
"large-v1": "https://openaipublic.azureedge.net/main/whisper/models/e4b87e7e0bf463eb8e6956e646f1e277e901512310def2c24bf0e11bd3c28e9a/large-v1.pt",
"large-v2": "https://openaipublic.azureedge.net/main/whisper/models/81f7c96c852ee8fc832187b0132e569d6c3065a3252ed18e56effd0b6a73e524/large-v2.pt",
"large": "https://openaipublic.azureedge.net/main/whisper/models/81f7c96c852ee8fc832187b0132e569d6c3065a3252ed18e56effd0b6a73e524/large-v2.pt",

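Each URL above carries a 64-character hex string as its second-to-last path segment. As far as I know, the openai-whisper package treats that segment as the expected SHA-256 of the checkpoint when it downloads a model. Assuming that convention holds, a manually downloaded file could be checked roughly like this; all helper names here are hypothetical.

```python
import hashlib


def expected_sha256(url):
    """Second-to-last path segment of a model URL (assumed to be the SHA-256)."""
    return url.split("/")[-2]


def file_sha256(path, chunk_size=1 << 20):
    """Hash a file in chunks so a large checkpoint need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checkpoint_matches(path, url):
    """True if the file's hash equals the hash embedded in the URL."""
    return file_sha256(path) == expected_sha256(url)
```

A file that passes the check can be loaded by passing its path to whisper.load_model in place of a model name.
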
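Install Whisper with: pip install -U openai-whisper (the PyPI package named whisper is an unrelated project). ffmpeg must also be available on the system.
The code is simple and the results are acceptable; at the very least it does produce text.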

import whisper

# The model name can be any key from the URL list above, e.g. tiny.en, large-v1, large
model = whisper.load_model('base')
fp = r'D:\python\pyapp\001\audio\01\music.wav'
result = model.transcribe(fp)
print(result["text"])
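
Besides "text", the dictionary returned by transcribe also contains a "segments" list whose entries carry "start" and "end" (in seconds) and "text". Below is a minimal sketch of turning those into timestamped lines. The function names are hypothetical, and the sample dictionary is made up for illustration rather than real model output.

```python
def format_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS.mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def segments_to_lines(result):
    """Build one '[start --> end] text' line per segment."""
    lines = []
    for seg in result.get("segments", []):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"[{start} --> {end}] {seg['text'].strip()}")
    return lines


# Made-up data shaped like a whisper result, for illustration only
sample = {
    "text": "hello world",
    "segments": [
        {"start": 0.0, "end": 2.5, "text": " hello"},
        {"start": 2.5, "end": 65.25, "text": " world"},
    ],
}
for line in segments_to_lines(sample):
    print(line)
```

With a real model, segments_to_lines(model.transcribe(fp)) would be the equivalent call. For Chinese audio, passing language="zh" to transcribe skips automatic language detection.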

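The test audio was a song. The recognized text is mostly phonetically close to the lyrics, but it contains quite a few wrong characters:
[Figure: screenshot of the recognition result]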

3. Vosk

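This requires the vosk library: pip install vosk
Vosk models can be downloaded from the address below. Unzip the archive after downloading, then pass the path of the model folder when loading the model in code. The input should be a mono PCM wav file (see section 1 for converting to mono).
https://alphacephei.com/vosk/models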

import json
import wave
from vosk import Model, KaldiRecognizer, SetLogLevel

SetLogLevel(-1)  # silence Vosk's log output

# Path to the unpacked model folder
model = Model(r'D:\python\pyapp\001\audio\vosk-model-small-cn-0.22')

fp = r'D:\python\pyapp\001\audio\01\word.wav'
wf = wave.open(fp, 'rb')
print('getframerate', wf.getframerate())
print('getnframes', wf.getnframes())

rec = KaldiRecognizer(model, wf.getframerate())
rec.SetWords(True)  # include per-word timing in each result

str_ret = ''

while True:
    data = wf.readframes(4000)
    if len(data) == 0:
        break
    # AcceptWaveform returns True once an utterance is complete
    if rec.AcceptWaveform(data):
        result = json.loads(rec.Result())
        print(result)
        if 'text' in result:
            str_ret += result['text'] + ' '
            print(result['text'])

# Flush whatever is still buffered after the last chunk
final = json.loads(rec.FinalResult())
str_ret += final.get('text', '')
wf.close()

print('----')
print(str_ret)
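
Because SetWords(True) is enabled, each result also carries a "result" list with per-word entries ("word", "start", "end", "conf"). Here is a minimal sketch of pulling the word timings out of one result string. The helper name word_timings is hypothetical, and the sample JSON is made up to mirror that shape rather than taken from a real run.

```python
import json


def word_timings(result_json):
    """Extract (word, start, end) tuples from one Vosk result JSON string."""
    data = json.loads(result_json)
    return [(w["word"], w["start"], w["end"]) for w in data.get("result", [])]


# Made-up JSON shaped like a Vosk result with word timing; illustration only
sample = json.dumps(
    {
        "result": [
            {"conf": 1.0, "start": 0.12, "end": 0.45, "word": "你好"},
            {"conf": 0.87, "start": 0.45, "end": 0.9, "word": "世界"},
        ],
        "text": "你好 世界",
    },
    ensure_ascii=False,
)

for word, start, end in word_timings(sample):
    print(f"{start:6.2f}s - {end:6.2f}s  {word}")
```

In the loop above, the same helper could be applied to the string returned by rec.Result() before it is parsed for its text.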