Simple Speech Recognition with the AssemblyAI API

Original article, last modified 2022-08-16 12:52:43 · 1.7k views

License: CC 4.0 BY-SA

Article tags: #SpeechRecognition #ArtificialIntelligence

First published 2022-08-15 14:21:18

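This article walks through using the AssemblyAI speech recognition API. It covers preparation (parameter configuration and library imports), recording audio, uploading the local file and reading the parameter returned by the upload endpoint, sending a transcription request with a specified language, and polling the transcription status until the recognized text is available. The API is free to use but has a time limit, and its non-real-time recognition is fairly accurate.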

Official documentation: AssemblyAI | Overview

Preparation

Parameter configuration and library imports

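# Import the standard-library and third-party modules
import time
import wave

import pyaudio
import requests
from tqdm import tqdm

# Recording parameters
FRAME_PER_BUFFER = 4096  # number of frames per buffer
FORMAT = pyaudio.paInt32  # sample format (32-bit integer)
CHANNELS = 2  # number of channels (stereo)
RATE: int = 44100  # sampling rate in Hz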

Endpoints

upload_endpoint = "https://api.assemblyai.com/v2/upload"  # upload endpoint
transcript_endpoint = "https://api.assemblyai.com/v2/transcript"  # transcription endpoint

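Request headers:

Here API_KEY_ASSEMBLYAI is the secret key unique to each user; it can be found on the home page of the official website:

AssemblyAI Speech-to-Text API | Automatic Speech Recognition

API_KEY_ASSEMBLYAI = "YOUR-API-KEY-HERE"  # placeholder: replace with your own key
headers = {'authorization': API_KEY_ASSEMBLYAI}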

Recording audio

def record_audio(wave_out_path, record_second):

    # Create the PyAudio instance
    p = pyaudio.PyAudio()

    # Open an input stream with the recording parameters
    stream = p.open(
        channels=CHANNELS,
        rate=RATE,
        format=FORMAT,
        input=True,
        frames_per_buffer=FRAME_PER_BUFFER
    )

    print("Recording started:")

    # Read one buffer at a time until roughly record_second seconds are captured
    frames = []
    for _ in tqdm(range(0, int(RATE / FRAME_PER_BUFFER * record_second))):
        data = stream.read(FRAME_PER_BUFFER)
        frames.append(data)

    # Stop recording and release the audio resources
    stream.stop_stream()
    stream.close()
    p.terminate()

    # Write the recorded frames to a WAV file
    audio_obj = wave.open(wave_out_path, "wb")
    audio_obj.setnchannels(CHANNELS)
    audio_obj.setsampwidth(p.get_sample_size(FORMAT))
    audio_obj.setframerate(RATE)
    audio_obj.writeframes(b"".join(frames))  # join all chunks in frames into one bytes object
    audio_obj.close()
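
The loop bound in record_audio determines how much audio is actually captured. As a rough sanity check, the sketch below reproduces that calculation without touching the microphone. The helper name estimate_recording is hypothetical, and 4 bytes per sample is an assumption based on paInt32 being a 32-bit format.

```python
FRAME_PER_BUFFER = 4096
CHANNELS = 2
RATE = 44100
BYTES_PER_SAMPLE = 4  # assumption: paInt32 samples are 32 bits = 4 bytes


def estimate_recording(seconds):
    """Return (number of buffer reads, raw audio size in bytes)."""
    # Same expression as the loop bound in record_audio; int() truncates,
    # so the recording can be slightly shorter than requested.
    n_buffers = int(RATE / FRAME_PER_BUFFER * seconds)
    n_bytes = n_buffers * FRAME_PER_BUFFER * CHANNELS * BYTES_PER_SAMPLE
    return n_buffers, n_bytes


print(estimate_recording(5))  # (53, 1736704)
```

With these settings a 5-second recording performs 53 reads and produces about 1.7 MB of raw audio, which fits within a single 5 MB upload chunk.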

Uploading a local file

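# Upload a local file
def upload(filename):
    # Read the file in 5 MB chunks so it is streamed instead of loaded all at once
    def read_file(filename, chunk_size=5242880):
        with open(filename, 'rb') as _file:
            while True:
                data = _file.read(chunk_size)
                if not data:
                    break
                yield data
    try:
        upload_response = requests.post(upload_endpoint,
                                        headers=headers,
                                        data=read_file(filename))

        audio_url = upload_response.json()['upload_url']
        return audio_url
    except (OSError, requests.RequestException, KeyError, ValueError):
        # Unreadable file, network failure, or a response without upload_url
        print("Please try again")
        return None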

The response returned by the upload endpoint contains the following parameter:

{
  "upload_url": "https://bit.ly/3yxKEIY"
}
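
The read_file helper inside upload is a generator, so the file is handed to requests piece by piece. A minimal sketch of that behavior, using a small temporary file and a 4-byte chunk size chosen only for illustration:

```python
import os
import tempfile


def read_file(filename, chunk_size=5242880):
    """Yield the file's contents in pieces of at most chunk_size bytes."""
    with open(filename, "rb") as _file:
        while True:
            data = _file.read(chunk_size)
            if not data:
                break
            yield data


# Write a 10-byte temporary file and read it back in 4-byte chunks.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"0123456789")
    tmp_path = tmp.name

chunks = list(read_file(tmp_path, chunk_size=4))
os.remove(tmp_path)
print(chunks)  # [b'0123', b'4567', b'89']
```

The last chunk is shorter than the others, and the generator stops as soon as read returns an empty bytes object.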

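Transcription

Send a request to the transcription endpoint with two parameters: the upload_url obtained from the upload endpoint, and language_code, which specifies the language to transcribe (each language has its own language_code; see the official documentation for details). The response is JSON, and only its id is needed.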

# Transcribe
def transcribe(audio_url):
    transcript_request_json = {
        "audio_url": audio_url,
        "language_code": "es"  # Spanish; change this to the code of your language
    }
    transcript_response = requests.post(
        transcript_endpoint,
        json=transcript_request_json,
        headers=headers)

    id = transcript_response.json()['id']
    return id

Response JSON:

{
    "id": "5551722-f677-48a6-9287-39c0aafd9ac1",
    ...
}
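
The request body can also be built by a small helper so that the language is not hard-coded. This is only a sketch: build_transcript_request is a hypothetical name, and it assumes the API falls back to its default language when language_code is left out, which should be checked against the official documentation.

```python
def build_transcript_request(audio_url, language_code=None):
    """Build the JSON body for the transcription request."""
    payload = {"audio_url": audio_url}
    # Assumption: language_code is optional, so it is only added when given.
    if language_code is not None:
        payload["language_code"] = language_code
    return payload


print(build_transcript_request("https://bit.ly/3yxKEIY", "es"))
# {'audio_url': 'https://bit.ly/3yxKEIY', 'language_code': 'es'}
```

The returned dictionary can be passed as the json argument of requests.post in place of the literal used in transcribe.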

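Polling and retrieving the result

Polling means repeatedly querying the transcription to check for status updates. The "status" field moves through "queued", "processing", and "completed", or becomes "error" if the transcription fails.

Wait until the status changes to "completed", then read the transcription result text from the JSON.

The polling endpoint given in the official documentation: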

endpoint = "https://api.assemblyai.com/v2/transcript/YOUR-TRANSCRIPT-ID-HERE"

That is:

polling_endpoint = transcript_endpoint + '/' + id

Code:

# Poll the transcription status
def poll(id):
    polling_endpoint = transcript_endpoint + '/' + id
    polling_response = requests.get(polling_endpoint, headers=headers)
    return polling_response.json()


# Get the result
def get_transcription_result_url(audio_url):
    id = transcribe(audio_url)
    print("Transcribing...")
    time_start = time.time()
    while True:  # keep polling until the transcription completes or fails
        data = poll(id)

        if data['status'] == 'completed':
            time_end = time.time()
            return data['text'], None, time_end - time_start  # also return the elapsed transcription time
        elif data['status'] == 'error':
            return data['text'], data["error"], None

        time.sleep(3)  # pause between requests instead of polling in a tight loop
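
A polling loop like this can wait forever if the status never changes. The sketch below shows one way to add a timeout; it is not part of the AssemblyAI API. wait_for_completion and its interval and timeout parameters are hypothetical, and the status fetcher is passed in as a callable so the logic can be tried with simulated responses instead of real requests.

```python
import time


def wait_for_completion(fetch_status, interval=3.0, timeout=300.0):
    """Call fetch_status() until it reports 'completed' or 'error'.

    fetch_status is any zero-argument callable that returns the polling
    response as a dict, for example lambda: poll(id).
    """
    deadline = time.monotonic() + timeout
    while True:
        data = fetch_status()
        if data["status"] in ("completed", "error"):
            return data
        if time.monotonic() >= deadline:
            raise TimeoutError("transcription did not finish in time")
        time.sleep(interval)


# Simulated responses standing in for real API calls.
fake_responses = iter([
    {"status": "queued"},
    {"status": "processing"},
    {"status": "completed", "text": "hola"},
])
result = wait_for_completion(lambda: next(fake_responses), interval=0.01)
print(result["text"])  # hola
```

In real use the lambda would wrap the poll function above, and the caller would inspect result["status"] to tell success from failure.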

Response JSON from the polling endpoint; the text field is the transcription result:

{
    "acoustic_model": "assemblyai_default",
    "audio_duration": 12.0960090702948,
    "audio_url": "https://bit.ly/3yxKEIY",
    "confidence": 0.956,
    "dual_channel": null,
    "format_text": true,
    "id": "5551722-f677-48a6-9287-39c0aafd9ac1",
    "language_model": "assemblyai_default",
    "language_code": "es",
    "punctuate": true,
    "status": "completed",
    "text": "Ya sabes Demonios en la tele así y para que la gente se exponga a ser rechazada en la tele o humillada por el factor miedo o.",
    ...
}
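
Once a response like the one above has been received, the useful fields can be pulled out with a small helper. This is a minimal sketch: extract_result is a hypothetical name, and the dictionary is an abbreviated copy of the sample response.

```python
def extract_result(data):
    """Return (text, error) from a polling response dict."""
    status = data.get("status")
    if status == "completed":
        return data.get("text"), None
    if status == "error":
        return None, data.get("error")
    # "queued" or "processing": nothing to report yet
    return None, None


# Abbreviated version of the sample response shown above.
response = {
    "confidence": 0.956,
    "language_code": "es",
    "status": "completed",
    "text": "Ya sabes Demonios en la tele",
}
text, error = extract_result(response)
print(text)   # Ya sabes Demonios en la tele
print(error)  # None
```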