A Lightweight Solution: Wake Word Detection + Fixed Voice Command Recognition

Latest recommended article published 2025-11-19 22:02:39

License: CC 4.0 BY-SA

Tags: #SpeechRecognition #ArtificialIntelligence

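To run a lightweight pipeline on Linux that detects a wake word and then recognizes a fixed set of voice commands, the following tools and frameworks combine well: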

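1. Lightweight wake word detection

Snowboy

GitHub: https://github.com/Kitt-AI/snowboy
Features:
- Designed for embedded devices and low-power environments.
- Supports training custom wake words.
- Low resource usage, well suited to lightweight applications.
Installation (if the pip package fails to build on your system, compile the Python wrapper from the repository instead):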
```
pip install snowboy
```
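Usage:
- Train a custom wake word model. The hosted training service on the Snowboy website was shut down at the end of 2020, so a personal model now has to come from another source, such as a community training tool.
- Detect the wake word from a Python script.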

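Mycroft Precise

GitHub: https://github.com/MycroftAI/mycroft-precise
Features:
- Lightweight wake word detector based on a recurrent neural network (RNN).
- Supports training custom wake words.
- Targets Linux.
Installation (if the pip package is unavailable for your platform, the repository documents a source install):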
```
pip install mycroft-precise
```
Usage:
- Collect wake word recordings and train a model.
- Run wake word detection with the trained model.

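2. Fixed voice command recognition

SpeechRecognition + PocketSphinx

SpeechRecognition GitHub: https://github.com/Uberi/speech_recognition
PocketSphinx: https://github.com/cmusphinx/pocketsphinx
Features:
- PocketSphinx is a lightweight offline speech recognition engine that works well with a small, fixed command set.
- The SpeechRecognition library wraps it in a simple Python API.

Installation:

```
pip install SpeechRecognition
pip install pocketsphinx            # Python bindings that SpeechRecognition calls into
sudo apt-get install pocketsphinx   # command-line tools
```

Usage:
- Describe the fixed commands in a grammar file (.gram or .jsgf, both in JSGF format).
- Call PocketSphinx through SpeechRecognition to run recognition against that grammar.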
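
The grammar step can be sketched as follows. This is a minimal illustration rather than a definitive setup: `build_jsgf` and `recognize_with_grammar` are hypothetical helper names, the command phrases are placeholders, and the sketch assumes that SpeechRecognition's `recognize_sphinx` accepts a `grammar` file path. Some versions of the library appear to look up a public rule named after the grammar file's stem, so the sketch uses one name for the grammar, the rule, and (by convention) the file; check this against the version you have installed.

```python
def build_jsgf(commands, name="commands"):
    """Return JSGF text whose single public rule accepts exactly the given phrases.

    The same name is used for the grammar and its public rule; saving the
    result as "<name>.gram" keeps the file stem consistent with both.
    """
    # Lowercase and collapse whitespace. This sketch does not escape
    # characters that have a special meaning in JSGF.
    phrases = [" ".join(c.lower().split()) for c in commands if c.strip()]
    if not phrases:
        raise ValueError("at least one command phrase is required")
    alternatives = " | ".join(phrases)
    return (
        "#JSGF V1.0;\n"
        f"grammar {name};\n"
        f"public <{name}> = {alternatives};\n"
    )


def recognize_with_grammar(wav_path, grammar_path):
    """Decode a WAV file with PocketSphinx, constrained by a grammar file."""
    # Imported lazily so build_jsgf can be used without the library installed.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile(str(wav_path)) as source:
        audio = recognizer.record(source)
    return recognizer.recognize_sphinx(audio, grammar=str(grammar_path))


if __name__ == "__main__":
    # Placeholder commands; write the printed text to commands.gram before decoding.
    print(build_jsgf(["turn on the light", "turn off the light"]))
```

Restricting the decoder to a grammar generally makes it choose among the listed phrases instead of attempting open dictation, which is the behaviour a fixed command set calls for.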

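Vosk

GitHub: https://github.com/alphacep/vosk-api
Features:
- Lightweight, fully offline speech recognition toolkit.
- Supports many languages, and the recognizer can be restricted to a fixed list of phrases.
- Low resource usage, suitable for embedded devices.
Installation: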
```
pip install vosk
```
Usage:
- Download a small model (for example vosk-model-small-en-us-0.15) and unpack it.
- Run recognition from a Python script.
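
For a fixed command set, Vosk's `KaldiRecognizer` can take a third argument: a JSON array of allowed phrases, where the special `[unk]` entry absorbs everything else. The sketch below builds that string; `build_vosk_grammar` and `make_command_recognizer` are hypothetical helper names, and the approach assumes a model that supports runtime vocabulary restriction, which the small models are generally reported to do.

```python
import json


def build_vosk_grammar(commands, allow_unknown=True):
    """Return the JSON phrase list that KaldiRecognizer accepts as a grammar."""
    phrases = []
    for command in commands:
        phrase = " ".join(command.lower().split())  # normalize case and spacing
        if phrase and phrase not in phrases:        # skip blanks and duplicates
            phrases.append(phrase)
    if allow_unknown:
        # "[unk]" lets out-of-list speech be reported as unknown
        # instead of being forced onto one of the commands.
        phrases.append("[unk]")
    return json.dumps(phrases)


def make_command_recognizer(model_dir, commands, sample_rate=16000):
    """Create a recognizer limited to the given commands (requires vosk)."""
    from vosk import Model, KaldiRecognizer  # lazy import; needs `pip install vosk`

    model = Model(model_dir)
    return KaldiRecognizer(model, sample_rate, build_vosk_grammar(commands))


if __name__ == "__main__":
    # Placeholder commands for illustration.
    print(build_vosk_grammar(["turn on the light", "turn off the light"]))
```

The recognizer returned by `make_command_recognizer` is used like an unrestricted one: feed it audio with `AcceptWaveform` and read the text from `Result()`.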

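3. Semantic understanding

Rasa NLU

GitHub: https://github.com/RasaHQ/rasa
Features:
- Open-source natural language understanding (NLU) toolkit, though its dependencies are sizable compared with the other tools listed here.
- Supports intent classification and entity extraction.
- Suitable for parsing fixed commands.
Installation: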
```
pip install rasa
```
Usage:
- Define the intents and entities for the fixed commands.
- Train a model and use it to parse the speech recognition output.

Snips NLU

GitHub: https://github.com/snipsco/snips-nlu
Features:
- Lightweight NLU library designed for embedded devices.
- Supports multiple languages and fixed command parsing.
Installation:
```
pip install snips-nlu
```
Usage:
- Define the intents and entities for the commands.
- Train a model and use it to parse the speech recognition output.
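
To make "intents and entities" concrete for a fixed command set, here is a minimal rule-based sketch using only Python's `re` module. It is neither Rasa nor Snips NLU, and it does no learning; the intent names, patterns, and the `parse_fixed_command` helper are all hypothetical. The result loosely mirrors the intent-plus-entities shape that NLU libraries return.

```python
import re

# Hypothetical intents. Named groups in each pattern become entities.
INTENT_PATTERNS = {
    "turn_on": re.compile(r"^(?:turn|switch) on the (?P<device>\w+)$"),
    "turn_off": re.compile(r"^(?:turn|switch) off the (?P<device>\w+)$"),
    "set_volume": re.compile(r"^set (?:the )?volume to (?P<level>\w+)$"),
}


def parse_fixed_command(text):
    """Map a transcript to an intent name and a list of entities."""
    normalized = " ".join(text.lower().split())
    for intent, pattern in INTENT_PATTERNS.items():
        match = pattern.match(normalized)
        if match:
            entities = [
                {"entity": key, "value": value}
                for key, value in match.groupdict().items()
            ]
            return {"text": normalized, "intent": {"name": intent}, "entities": entities}
    # No pattern matched: report an empty intent rather than guessing.
    return {"text": normalized, "intent": {"name": None}, "entities": []}


if __name__ == "__main__":
    print(parse_fixed_command("Turn on the light"))
```

A trained NLU model earns its place once commands are phrased in many different ways; for a handful of rigid phrases, explicit patterns like these may already be enough.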

4. Complete solution example

The following example sketches a lightweight solution built on Snowboy + Vosk + Rasa NLU:

Step 1: Wake word detection (Snowboy)

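```python
import snowboydecoder  # Python wrapper that ships with the Snowboy repository

def detected_callback():
    print("Wake word detected!")
    # Start speech recognition here

# Path to the trained wake word model file
detector = snowboydecoder.HotwordDetector("wake_word_model.pmdl", sensitivity=0.5)
# Blocks, calling detected_callback each time the wake word is heard
detector.start(detected_callback=detected_callback)
detector.terminate()  # release audio resources once start() returns
```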

Step 2: Speech recognition (Vosk)

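```python
import json

import pyaudio
from vosk import Model, KaldiRecognizer

SAMPLE_RATE = 16000  # must match the rate passed to KaldiRecognizer

model = Model("vosk-model-small-en-us-0.15")  # path to the unpacked model directory
rec = KaldiRecognizer(model, SAMPLE_RATE)

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=SAMPLE_RATE,
                input=True, frames_per_buffer=8192)

print("Listening for command...")
try:
    while True:
        data = stream.read(4096, exception_on_overflow=False)
        if len(data) == 0:
            break
        # AcceptWaveform returns True once a complete utterance has been decoded
        if rec.AcceptWaveform(data):
            result = json.loads(rec.Result())
            command = result.get("text", "")
            if command:
                print("Command:", command)
                # Pass the command to the semantic understanding module
                break  # handle one command per wake-up
finally:
    # Always release the microphone, even on Ctrl+C
    stream.stop_stream()
    stream.close()
    p.terminate()
```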
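
Because the command set is fixed, a slightly misrecognized transcript can be snapped to the nearest known command before it reaches the semantic understanding step. The sketch below does this with the standard library's `difflib`; the command list, the `match_command` helper, and the 0.8 cutoff are illustrative assumptions, not values from any of the tools above.

```python
import difflib

# Hypothetical fixed command list.
COMMANDS = ["turn on the light", "turn off the light", "play music", "stop music"]


def match_command(transcript, commands=COMMANDS, cutoff=0.8):
    """Return the known command closest to the transcript, or None."""
    normalized = " ".join(transcript.lower().split())
    if not normalized:
        return None
    # get_close_matches ranks candidates by SequenceMatcher similarity
    # and drops anything scoring below the cutoff.
    matches = difflib.get_close_matches(normalized, commands, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    print(match_command("turn on the lights"))
```

Commands that differ by only a few characters, such as the on/off pair here, leave little margin, so the cutoff needs tuning against real transcripts.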

Step 3: Semantic understanding (Rasa NLU)

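```python
# Interpreter belongs to the Rasa 1.x/2.x API and is not available in Rasa 3.x,
# so this step needs an older Rasa release.
from rasa.nlu.model import Interpreter

# Path to the trained NLU model directory
interpreter = Interpreter.load("rasa_model")

def parse_command(command):
    result = interpreter.parse(command)
    intent = result["intent"]["name"]
    entities = result["entities"]
    print("Intent:", intent)
    print("Entities:", entities)
    # Act on the intent and entities here
    return intent, entities

# Example
parse_command("turn on the light")
```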
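
The three steps still need a loop that chains them: wait for the wake word, capture one command, parse it, act on it. The following is a minimal sketch of that glue, assuming each stage is wrapped in a plain callable. `run_voice_pipeline` and its parameter names are hypothetical, and the audio libraries are deliberately left out so the control flow can be exercised with stand-in functions.

```python
from typing import Callable, Optional


def run_voice_pipeline(
    wait_for_wake_word: Callable[[], bool],
    listen_for_command: Callable[[], str],
    parse_command: Callable[[str], dict],
    handle_intent: Callable[[dict], None],
    max_cycles: Optional[int] = None,
) -> int:
    """Run wake word -> recognition -> parsing cycles; return commands handled."""
    handled = 0
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        cycles += 1
        # A False return means the detector stopped (for example on interrupt).
        if not wait_for_wake_word():
            break
        command = listen_for_command().strip()
        if not command:
            continue  # nothing intelligible was recognized; wait for the next wake-up
        handle_intent(parse_command(command))
        handled += 1
    return handled


if __name__ == "__main__":
    # Stand-in stages: two wake-ups, one of them followed by silence.
    wakes = iter([True, True, False])
    commands = iter(["turn on the light", ""])
    count = run_voice_pipeline(
        lambda: next(wakes),
        lambda: next(commands),
        lambda text: {"intent": {"name": text}},
        print,
    )
    print("handled:", count)
```

In a real deployment, `wait_for_wake_word` would wrap the wake word detector, `listen_for_command` the recognition loop, and `parse_command` the NLU call. Keeping them as injected callables also makes it straightforward to swap one engine for another.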