Onyx Voice Interaction: A Complete Guide to Speech-to-Text and Text-to-Speech
Introduction: Breaking the Input Barrier, Rethinking Intelligent Interaction
Have you ever been unable to type during a meeting? Have you wished you could talk to your enterprise knowledge base directly? Onyx, a modern intelligent question-answering system, is reshaping human-machine collaboration through voice interaction. This article walks through implementing Speech-to-Text (STT) and Text-to-Speech (TTS) on top of the Onyx framework, building a full interaction loop of "voice input → semantic understanding → voice output".
After reading this article you will know how to:
- Design the Onyx voice-interaction architecture and its core components
- Integrate and performance-tune the speech-to-text module
- Deploy and customize the speech-synthesis service
- Implement the end-to-end voice-interaction flow in code
- Apply best practices and avoid common pitfalls in enterprise voice applications
1. Onyx Voice-Interaction Architecture
1.1 System Architecture Overview
The Onyx voice-interaction system uses a layered architecture whose loosely coupled components handle the full lifecycle of the audio signal.
Key technical characteristics:
- Modular design: each component is deployed independently and scales horizontally
- Multi-engine compatibility: pluggable third-party services such as Google Cloud Speech and Azure Speech
- Real-time processing: end-to-end latency kept under 500 ms
- Enterprise-grade security: encrypted audio transport and support for on-premises deployment
1.2 Core Functional Modules
| Module | Main Functions | Technology | Performance |
|---|---|---|---|
| Audio preprocessing | Noise reduction / format conversion / framing | WebRTC / FFmpeg | 44.1 kHz sample rate, 16-bit depth |
| Speech-to-text | Real-time recognition / punctuation prediction | Transformer-based end-to-end model | 98.5% accuracy for Chinese, 99.2% for English |
| Semantic understanding | Intent recognition / entity extraction / context management | Onyx native NLP engine | 92% intent accuracy, response time < 300 ms |
| Speech synthesis | Natural speech from text / emotion control | Neural TTS model | MOS 4.2, 10 emotion styles |
| Interaction management | Session state / multi-turn dialogue | Redis + state machine | 100k concurrent sessions |
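The interaction-management row above pairs Redis with a small state machine for multi-turn sessions. A minimal sketch of what such a session store could look like follows; the key prefix, state names, and TTL are illustrative assumptions rather than Onyx internals:
import json
from typing import Optional

import redis

# Hypothetical states a voice turn can move through
STATES = ("idle", "listening", "transcribing", "answering", "speaking")

class VoiceSessionStore:
    """Keep per-session dialogue state in Redis with a sliding TTL."""

    def __init__(self, url: str = "redis://localhost:6379/0", ttl_seconds: int = 1800):
        self.client = redis.Redis.from_url(url)
        self.ttl = ttl_seconds

    def _key(self, session_id: str) -> str:
        return f"onyx_voice:session:{session_id}"

    def get(self, session_id: str) -> dict:
        raw = self.client.get(self._key(session_id))
        return json.loads(raw) if raw else {"state": "idle", "history": []}

    def transition(self, session_id: str, new_state: str, utterance: Optional[str] = None) -> dict:
        """Move the session to a new state and optionally record an utterance."""
        if new_state not in STATES:
            raise ValueError(f"Unknown state: {new_state}")
        session = self.get(session_id)
        session["state"] = new_state
        if utterance:
            session["history"].append(utterance)  # multi-turn context
        self.client.set(self._key(session_id), json.dumps(session), ex=self.ttl)
        return session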
2. Implementing the Speech-to-Text (STT) Module
2.1 Comparing Technology Options
Enterprise STT solutions generally fall into three categories, each with its own sweet spot:
| Approach | Representative | Strengths | Weaknesses | Best Fit |
|---|---|---|---|---|
| Cloud API | Google Cloud Speech-to-Text | High accuracy, many languages | Network dependency, data-privacy risk | General scenarios, well-funded teams |
| Open-source model | Whisper and other open models | On-premises deployment, low cost | Needs GPUs, maintenance overhead | Data-sensitive scenarios, strong engineering teams |
| Hybrid | Onyx + MCP audio service | Balances privacy and performance | More complex architecture, needs middleware | Mid-to-large enterprises, hybrid-cloud environments |
Onyx favors the hybrid architecture, using the MCP (Multi-Cloud Processing) client to schedule STT services across clouds through a single interface:
from onyx.tools.tool_implementations.mcp.mcp_client import MCP_CLIENT, AudioType
def process_audio_stream(audio_bytes, format="wav"):
    # 1. Declare the audio payload
    audio_request = {
        "type": AudioType.AUDIO,
        "data": audio_bytes,
        "format": format,
        "language": "zh-CN",
        "model": "medium"  # tiny/base/medium/large models are supported
    }
    # 2. Call the MCP audio service
    response = MCP_CLIENT.process(audio_request)
    # 3. Parse the result
    return {
        "text": response["transcript"],
        "confidence": response["confidence"],
        "segments": response["segments"]  # time-stamped speech segments
    }
2.2 Real-Time Speech Processing Pipeline
Real-time speech-to-text has to deal with streaming, resumable uploads, and noise suppression. Onyx achieves low-latency recognition with the following pipeline.
Core optimization strategies (a sketch of the first two follows the list):
- Transmit audio in 100 ms slices to balance latency against accuracy
- Apply VAD (voice activity detection) to drop silent segments
- Decode the audio sequence with Connectionist Temporal Classification (CTC)
- Adapt the language model dynamically to improve recognition of domain terminology
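As a concrete illustration of the first two strategies, the sketch below splits a 16 kHz mono 16-bit PCM stream into 30 ms VAD frames with the webrtcvad package and groups voiced frames into roughly 100 ms chunks; the sample-rate and frame-size choices are assumptions for the example, not the exact parameters of Onyx's streaming pipeline:
import webrtcvad

SAMPLE_RATE = 16000                               # Hz; 16-bit mono PCM assumed
FRAME_MS = 30                                     # webrtcvad accepts only 10/20/30 ms frames
FRAME_BYTES = SAMPLE_RATE * FRAME_MS // 1000 * 2  # 2 bytes per 16-bit sample

def voiced_chunks(pcm: bytes, aggressiveness: int = 2, frames_per_chunk: int = 3):
    """Yield ~100 ms chunks (3 x 30 ms frames) that contain speech, dropping silence."""
    vad = webrtcvad.Vad(aggressiveness)
    buffered = []
    for offset in range(0, len(pcm) - FRAME_BYTES + 1, FRAME_BYTES):
        frame = pcm[offset:offset + FRAME_BYTES]
        if vad.is_speech(frame, SAMPLE_RATE):
            buffered.append(frame)
        if len(buffered) >= frames_per_chunk:
            yield b"".join(buffered)              # hand this chunk to the streaming STT client
            buffered = []
    if buffered:
        yield b"".join(buffered)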
2.3 Code and Configuration
Enabling STT in Onyx requires the following configuration steps:
- Modify backend/onyx/configs/constants.py to add audio support:
# Audio-related constants
AUDIO_SUPPORT = {
"enabled": True,
"default_format": "wav",
"supported_codecs": ["wav", "mp3", "flac"],
"max_duration_seconds": 60,
"stt_providers": ["google", "azure", "local"]
}
# Register the new service endpoints
SERVICE_ENDPOINTS = {
    # ... existing entries
"audio_processing": "/api/v1/audio/process",
"stt_stream": "/api/v1/stt/stream"
}
- Implement the STT service class:
from onyx.utils.logging import setup_logger
from onyx.httpx.httpx_pool import get_async_client
import base64
import asyncio
logger = setup_logger(__name__)
class STTService:
def __init__(self, provider="google"):
self.provider = provider
self.client = get_async_client()
self.endpoint = self._get_provider_endpoint()
def _get_provider_endpoint(self):
endpoints = {
"google": "https://speech.googleapis.com/v1/speech:recognize",
"azure": "https://{region}.cris.ai/rest/speech/v1.0/recognize",
"local": "http://model-server:8000/stt"
}
return endpoints[self.provider]
async def transcribe_audio(self, audio_path, language="zh-CN"):
"""转录本地音频文件"""
with open(audio_path, "rb") as f:
audio_data = base64.b64encode(f.read()).decode("utf-8")
payload = {
"config": {
"encoding": "LINEAR16",
"sampleRateHertz": 44100,
"languageCode": language,
"enableWordTimeOffsets": True
},
"audio": {"content": audio_data}
}
try:
response = await self.client.post(
self.endpoint,
json=payload,
timeout=30.0
)
response.raise_for_status()
return self._parse_response(response.json())
except Exception as e:
logger.error(f"STT transcription failed: {str(e)}")
raise
    def _parse_response(self, raw_response):
        """Parse the raw STT response into text, segments, and confidence."""
        if not raw_response.get("results"):
            return {"text": "", "segments": [], "confidence": 0.0}
        full_text = []
        segments = []
        confidences = []
        for result in raw_response["results"]:
            alternative = result["alternatives"][0]
            full_text.append(alternative["transcript"])
            confidences.append(alternative.get("confidence", 0.0))
            for word_info in alternative.get("words", []):
                segments.append({
                    "word": word_info["word"],
                    # Durations arrive as strings like "1.500s"; strip the trailing unit
                    "start_time": float(word_info["startTime"][:-1]),
                    "end_time": float(word_info["endTime"][:-1]),
                    "confidence": alternative.get("confidence", 0.0)
                })
        return {
            "text": " ".join(full_text),
            "segments": segments,
            "confidence": sum(confidences) / len(confidences)
        }
- Update the configuration file (configs/app_configs.py):
import os
import torch  # used to choose GPU vs. CPU for the local Whisper model

# Speech-to-text configuration
STT_CONFIG = {
"ENABLED": True,
"DEFAULT_PROVIDER": "local", # 优先使用本地模型
"PROVIDERS": {
"google": {
"api_key": os.environ.get("GOOGLE_STT_API_KEY"),
"timeout": 10.0
},
"azure": {
"endpoint": os.environ.get("AZURE_STT_ENDPOINT"),
"key": os.environ.get("AZURE_STT_KEY"),
"region": "eastus"
},
"local": {
"model_path": "/models/whisper-medium",
"device": "cuda" if torch.cuda.is_available() else "cpu"
}
},
    # Fallback strategy when the preferred provider fails
"FALLBACK_STRATEGY": "round_robin",
"MIN_CONFIDENCE_THRESHOLD": 0.7
}
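With the service class and configuration in place, a minimal usage sketch looks like the following; the audio file name and provider choice are illustrative:
import asyncio

async def main():
    stt = STTService(provider="local")
    result = await stt.transcribe_audio("meeting.wav", language="zh-CN")
    print(result["text"], f"(confidence: {result['confidence']:.2f})")
    for segment in result["segments"][:5]:
        print(segment["word"], segment["start_time"], segment["end_time"])

asyncio.run(main())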
3. Text-to-Speech (TTS) Service Integration
3.1 Core Technical Principles
Speech synthesis has evolved from concatenative synthesis through parametric synthesis to neural synthesis, and today's mainstream systems follow a similar pipeline.
The Onyx TTS module uses a three-stage "text analysis → acoustic model → vocoder" architecture and supports:
- Switching between multiple speaker voices
- Adjusting speaking rate, pitch, and volume
- Emotion synthesis (happy, serious, questioning, and so on)
- Custom pronunciation dictionaries
3.2 Service Deployment and Invocation
Onyx recommends exposing the TTS service through FastAPI endpoints; a full implementation follows:
from fastapi import APIRouter, HTTPException, BackgroundTasks, Response
from pydantic import BaseModel
from onyx.configs.app_configs import TTS_CONFIG
from onyx.utils.logging import setup_logger
import tempfile
import os
import time
import asyncio
from typing import Optional, Dict, List
router = APIRouter(prefix="/tts", tags=["audio"])
logger = setup_logger(__name__)
# Base class for TTS providers
class TTSProvider:
def __init__(self, config):
self.config = config
self.initialized = False
async def initialize(self):
"""初始化模型/连接服务"""
raise NotImplementedError
async def synthesize(self, text: str, **kwargs) -> bytes:
"""合成音频数据"""
raise NotImplementedError
# Concrete TTS provider implementations
class GoogleTTSProvider(TTSProvider):
async def initialize(self):
from google.cloud import texttospeech
        # GOOGLE_TTS_CREDENTIALS is expected to point at a service-account JSON file
        self.client = texttospeech.TextToSpeechClient.from_service_account_file(
            os.environ["GOOGLE_TTS_CREDENTIALS"]
        )
self.initialized = True
    async def synthesize(self, text: str, **kwargs):
        if not self.initialized:
            await self.initialize()
        from google.cloud import texttospeech  # module reference is needed in this scope as well
voice = texttospeech.VoiceSelectionParams(
language_code=kwargs.get("language", "zh-CN"),
name=kwargs.get("voice_name", "zh-CN-Standard-A"),
ssml_gender=texttospeech.SsmlVoiceGender.NEUTRAL
)
audio_config = texttospeech.AudioConfig(
audio_encoding=texttospeech.AudioEncoding.MP3,
speaking_rate=kwargs.get("speed", 1.0),
pitch=kwargs.get("pitch", 0),
volume_gain_db=kwargs.get("volume", 0)
)
response = self.client.synthesize_speech(
input=texttospeech.SynthesisInput(text=text),
voice=voice,
audio_config=audio_config
)
return response.audio_content
class LocalTTSProvider(TTSProvider):
async def initialize(self):
import torch
from TTS.api import TTS
self.model = TTS(
model_name=self.config["model_name"],
model_path=self.config["model_path"],
progress_bar=False,
gpu=torch.cuda.is_available()
)
self.initialized = True
async def synthesize(self, text: str, **kwargs):
if not self.initialized:
await self.initialize()
with tempfile.NamedTemporaryFile(suffix=".wav", delete=False) as f:
temp_path = f.name
        # The local TTS model is synchronous, so run it in a worker thread
loop = asyncio.get_event_loop()
await loop.run_in_executor(None, lambda: self.model.tts_to_file(
text=text,
file_path=temp_path,
speaker=kwargs.get("speaker", "p335"),
language=kwargs.get("language", "zh-CN"),
speed=kwargs.get("speed", 1.0)
))
with open(temp_path, "rb") as f:
audio_data = f.read()
os.unlink(temp_path)
return audio_data
# TTS factory
class TTSFactory:
_providers = {}
@classmethod
def register_provider(cls, name: str, provider_cls):
cls._providers[name] = provider_cls
@classmethod
async def get_provider(cls, name: Optional[str] = None):
provider_name = name or TTS_CONFIG["DEFAULT_PROVIDER"]
if provider_name not in cls._providers:
raise ValueError(f"Unknown TTS provider: {provider_name}")
provider_config = TTS_CONFIG["PROVIDERS"][provider_name]
return cls._providers[provider_name](provider_config)
# Register the providers (an AzureTTSProvider could be implemented analogously to the two classes above)
TTSFactory.register_provider("google", GoogleTTSProvider)
# TTSFactory.register_provider("azure", AzureTTSProvider)  # enable once an Azure provider exists
TTSFactory.register_provider("local", LocalTTSProvider)
# Request model and synthesis endpoint
class TTSRequest(BaseModel):
text: str
provider: Optional[str] = None
voice: Optional[str] = None
language: str = "zh-CN"
speed: float = 1.0
pitch: float = 0.0
volume: float = 0.0
format: str = "mp3"
@router.post("/synthesize", response_class=Response)
async def synthesize_speech(request: TTSRequest):
"""语音合成API端点"""
start_time = time.time()
try:
provider = await TTSFactory.get_provider(request.provider)
audio_data = await provider.synthesize(
text=request.text,
language=request.language,
voice_name=request.voice,
speed=request.speed,
pitch=request.pitch,
volume=request.volume
)
        # Log performance metrics
logger.info(f"TTS synthesis completed in {time.time()-start_time:.2f}s")
return Response(
content=audio_data,
media_type=f"audio/{request.format}",
headers={
"Content-Disposition": f"attachment; filename=onyx_tts.{request.format}"
}
)
except Exception as e:
logger.error(f"TTS synthesis failed: {str(e)}")
raise HTTPException(status_code=500, detail=str(e))
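Once the router is mounted in the FastAPI app, the endpoint can be exercised with a small client script; the base URL below is an assumption about your deployment:
import httpx

payload = {
    "text": "欢迎使用 Onyx 语音服务",
    "provider": "local",
    "language": "zh-CN",
    "speed": 1.1,
    "format": "mp3",
}

# Send the synthesis request and save the returned audio
resp = httpx.post("http://localhost:8080/tts/synthesize", json=payload, timeout=60.0)
resp.raise_for_status()
with open("reply.mp3", "wb") as f:
    f.write(resp.content)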
3.3 Performance Optimization Strategies
An enterprise TTS service has to balance synthesis quality, response time, and resource consumption; the following techniques are recommended.
- Pre-generation and caching:
from functools import lru_cache
import asyncio
import hashlib
import os

# Cache synthesis results for frequently requested text
@lru_cache(maxsize=1000)
def get_tts_cache_key(text: str, **kwargs):
    """Build a deterministic cache key for a text/parameter combination."""
key_str = text + str(sorted(kwargs.items()))
return hashlib.md5(key_str.encode()).hexdigest()
async def cached_tts_synthesis(provider: TTSProvider, text: str, **kwargs):
    """TTS synthesis backed by a file cache."""
    cache_key = get_tts_cache_key(text, **kwargs)
    cache_path = f"{TTS_CONFIG['CACHE_DIR']}/{cache_key}.mp3"
    # Serve from cache when a non-empty file already exists
    if os.path.exists(cache_path) and os.path.getsize(cache_path) > 0:
        with open(cache_path, "rb") as f:
            return f.read()
    # Cache miss: run the actual synthesis
    audio_data = await provider.synthesize(text, **kwargs)
    # Write the cache asynchronously so the response is not blocked
    loop = asyncio.get_event_loop()
loop.run_in_executor(None, lambda: _write_cache(cache_path, audio_data))
return audio_data
def _write_cache(path: str, data: bytes):
"""写入缓存文件"""
os.makedirs(os.path.dirname(path), exist_ok=True)
with open(path, "wb") as f:
f.write(data)
- Model optimization (a quantization sketch follows this item):
- Export models to ONNX to speed up inference
- Quantize weights to INT8 to reduce memory footprint
- Warm up models at startup and manage connection pools
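For the ONNX and INT8 points, one possible sketch uses onnxruntime's dynamic quantization; the model paths are placeholders, and whether a given acoustic model or vocoder exports cleanly to ONNX depends on its architecture:
from onnxruntime.quantization import QuantType, quantize_dynamic

# Convert FP32 ONNX weights to INT8 to cut memory use and speed up CPU inference
quantize_dynamic(
    model_input="models/tts_acoustic.onnx",
    model_output="models/tts_acoustic.int8.onnx",
    weight_type=QuantType.QInt8,
)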
- Batch processing:
async def batch_tts_synthesis(provider: TTSProvider, texts: List[str], **common_kwargs):
    """Batch text synthesis."""
    if TTS_CONFIG["DEFAULT_PROVIDER"] == "local":
        # Route the whole batch through the local model in one pass
        return await _local_batch_synthesis(texts, **common_kwargs)
    else:
        # Fan the requests out concurrently against the cloud API
        tasks = [
            provider.synthesize(text=text, **common_kwargs)
            for text in texts
        ]
        return await asyncio.gather(*tasks)
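The _local_batch_synthesis helper referenced above is not part of the listing; a minimal version could simply reuse one LocalTTSProvider instance so the model is loaded only once:
async def _local_batch_synthesis(texts: List[str], **common_kwargs) -> List[bytes]:
    """Synthesize a batch of texts on a single local provider instance."""
    provider = await TTSFactory.get_provider("local")
    results = []
    for text in texts:
        results.append(await provider.synthesize(text, **common_kwargs))
    return results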
4. End-to-End Voice Interaction
4.1 Front-End Integration
On the web front end, voice interaction combines the MediaRecorder API with WebSockets:
// Voice recording and playback component
class VoiceInteraction {
constructor() {
this.mediaRecorder = null;
this.audioChunks = [];
this.socket = null;
this.isRecording = false;
this.audioContext = null;
this.analyser = null;
}
    // Initialize the WebSocket connection
async initWebSocket() {
this.socket = new WebSocket(`wss://${window.location.host}/api/v1/voice/stream`);
this.socket.onopen = () => {
console.log('Voice WebSocket connected');
this.updateStatus('connected');
};
this.socket.onmessage = (event) => {
if (event.data instanceof Blob) {
                // Play the TTS response
this.playAudio(event.data);
} else {
                // Handle text messages
const data = JSON.parse(event.data);
if (data.type === 'transcript') {
this.updateTranscript(data.text);
} else if (data.type === 'status') {
this.updateStatus(data.message);
}
}
};
this.socket.onerror = (error) => {
console.error('WebSocket error:', error);
this.updateStatus('error');
};
this.socket.onclose = () => {
console.log('WebSocket disconnected, reconnecting...');
this.updateStatus('disconnected');
            // Reconnect automatically
setTimeout(() => this.initWebSocket(), 3000);
};
}
    // Start recording
async startRecording() {
if (this.isRecording) return;
try {
const stream = await navigator.mediaDevices.getUserMedia({
audio: {
sampleRate: 44100,
channelCount: 1,
echoCancellation: true,
noiseSuppression: true
}
});
this.audioContext = new AudioContext();
this.analyser = this.audioContext.createAnalyser();
const source = this.audioContext.createMediaStreamSource(stream);
source.connect(this.analyser);
this.mediaRecorder = new MediaRecorder(stream, {
mimeType: 'audio/webm; codecs=opus'
});
this.mediaRecorder.ondataavailable = (event) => {
if (event.data.size > 0) {
this.audioChunks.push(event.data);
                    // Stream the audio chunk in real time
if (this.socket && this.socket.readyState === WebSocket.OPEN) {
this.socket.send(event.data);
}
}
};
            this.mediaRecorder.start(100); // emit a chunk every 100 ms
this.isRecording = true;
this.audioChunks = [];
this.updateStatus('recording');
this.startVisualization();
} catch (error) {
console.error('Recording error:', error);
this.updateStatus('permission_denied');
}
}
    // Stop recording
stopRecording() {
if (!this.isRecording) return;
this.mediaRecorder.stop();
this.isRecording = false;
this.mediaRecorder.stream.getTracks().forEach(track => track.stop());
this.stopVisualization();
this.updateStatus('processing');
        // Send the end-of-stream marker
if (this.socket && this.socket.readyState === WebSocket.OPEN) {
this.socket.send(JSON.stringify({ type: 'end_of_stream' }));
}
}
    // Play the audio response
async playAudio(blob) {
const audioUrl = URL.createObjectURL(blob);
const audio = new Audio(audioUrl);
try {
await audio.play();
this.updateStatus('playing_response');
audio.onended = () => {
URL.revokeObjectURL(audioUrl);
this.updateStatus('ready');
};
} catch (error) {
console.error('Audio playback error:', error);
this.updateStatus('playback_error');
}
}
    // Audio visualization
startVisualization() {
const canvas = document.getElementById('audio-visualizer');
const canvasCtx = canvas.getContext('2d');
this.analyser.fftSize = 2048;
const bufferLength = this.analyser.frequencyBinCount;
const dataArray = new Uint8Array(bufferLength);
const draw = () => {
if (!this.isRecording) return;
requestAnimationFrame(draw);
this.analyser.getByteTimeDomainData(dataArray);
canvasCtx.fillStyle = 'rgb(240, 240, 240)';
canvasCtx.fillRect(0, 0, canvas.width, canvas.height);
canvasCtx.lineWidth = 2;
canvasCtx.strokeStyle = 'rgb(65, 105, 225)';
canvasCtx.beginPath();
const sliceWidth = canvas.width / bufferLength;
let x = 0;
for (let i = 0; i < bufferLength; i++) {
const v = dataArray[i] / 128.0;
const y = v * canvas.height / 2;
if (i === 0) {
canvasCtx.moveTo(x, y);
} else {
canvasCtx.lineTo(x, y);
}
x += sliceWidth;
}
canvasCtx.lineTo(canvas.width, canvas.height / 2);
canvasCtx.stroke();
};
draw();
}
    // Update the UI status
updateStatus(status) {
const statusElement = document.getElementById('voice-status');
const statuses = {
            'connected': { text: 'Connected', class: 'text-green-500' },
            'recording': { text: 'Recording...', class: 'text-red-500' },
            'processing': { text: 'Processing...', class: 'text-yellow-500' },
            'playing_response': { text: 'Playing response', class: 'text-blue-500' },
            'ready': { text: 'Ready', class: 'text-gray-500' },
            'error': { text: 'Connection error', class: 'text-red-700' },
            'disconnected': { text: 'Disconnected', class: 'text-gray-700' },
            'permission_denied': { text: 'Microphone permission denied', class: 'text-red-700' }
};
const statusInfo = statuses[status] || { text: status, class: 'text-gray-500' };
statusElement.textContent = statusInfo.text;
statusElement.className = `voice-status ${statusInfo.class}`;
}
    // Update the transcript text
updateTranscript(text) {
const transcriptElement = document.getElementById('voice-transcript');
transcriptElement.textContent = text;
}
}
// Initialize voice interaction on page load
document.addEventListener('DOMContentLoaded', () => {
const voiceInteraction = new VoiceInteraction();
voiceInteraction.initWebSocket();
    // Bind button events
document.getElementById('start-record').addEventListener('click', () => {
voiceInteraction.startRecording();
});
document.getElementById('stop-record').addEventListener('click', () => {
voiceInteraction.stopRecording();
});
});
4.2 Back-End Processing Flow
The backend handles the WebSocket voice stream as sketched below; the listing assumes an STTFactory analogous to the TTSFactory above, an onyx_answer_question entry point into the Onyx answering pipeline, and the AudioBuffer helper shown after the code:
import json

from fastapi import WebSocket, WebSocketDisconnect

# WebSocket endpoint for streaming voice
@router.websocket("/voice/stream")
async def voice_stream(websocket: WebSocket):
await websocket.accept()
    # Create STT and TTS service instances
stt_service = await STTFactory.get_provider()
tts_service = await TTSFactory.get_provider()
    # Create the audio buffer
audio_buffer = AudioBuffer()
try:
while True:
data = await websocket.receive()
if "text" in data:
                # Handle control messages
text_data = json.loads(data["text"])
if text_data.get("type") == "end_of_stream":
                    # Process the complete audio
full_audio = audio_buffer.get_all()
transcript = await stt_service.transcribe_audio(full_audio)
                    # Send the transcript back to the client
await websocket.send_json({
"type": "transcript",
"text": transcript["text"]
})
                    # Call the Onyx answering engine
answer = await onyx_answer_question(transcript["text"])
                    # Synthesize the spoken response
audio_response = await tts_service.synthesize(answer)
                    # Send the audio response
await websocket.send_bytes(audio_response)
                    # Reset the buffer
audio_buffer.clear()
elif "bytes" in data:
                # Receive an audio chunk
audio_buffer.add_chunk(data["bytes"])
                # Optional: incremental transcription
                if audio_buffer.length > 20480:  # 20 KB
                    partial_audio = audio_buffer.get_recent(16384)  # only the most recent 16 KB
partial_transcript = await stt_service.transcribe_audio(
partial_audio, partial=True
)
await websocket.send_json({
"type": "transcript",
"text": partial_transcript["text"]
})
except WebSocketDisconnect:
logger.info("Voice WebSocket disconnected")
except Exception as e:
logger.error(f"Voice stream error: {str(e)}")
await websocket.close(code=1011, reason=str(e))
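The AudioBuffer used above is assumed rather than shown; a minimal in-memory sketch could look like this:
class AudioBuffer:
    """Accumulate raw audio chunks for a single WebSocket session."""

    def __init__(self):
        self._chunks: list[bytes] = []

    def add_chunk(self, chunk: bytes) -> None:
        self._chunks.append(chunk)

    @property
    def length(self) -> int:
        return sum(len(chunk) for chunk in self._chunks)

    def get_all(self) -> bytes:
        return b"".join(self._chunks)

    def get_recent(self, num_bytes: int) -> bytes:
        return b"".join(self._chunks)[-num_bytes:]

    def clear(self) -> None:
        self._chunks.clear()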
5. Enterprise Deployment and Best Practices
5.1 Deployment Architecture
A multi-container deployment with Docker Compose is recommended:
# docker-compose.voice.yml
version: '3.8'
services:
  # STT service
stt-service:
build:
context: ./backend
dockerfile: Dockerfile.stt
volumes:
- ./models/whisper:/models/whisper
environment:
- MODEL_SIZE=medium
- DEVICE=cuda
- PORT=8001
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
networks:
- backend-network
  # TTS service
tts-service:
build:
context: ./backend
dockerfile: Dockerfile.tts
volumes:
- ./models/tts:/models/tts
- ./tts-cache:/cache
environment:
- MODEL_NAME=tts_models/zh-CN/baker/tacotron2-DDC-GST
- CACHE_DIR=/cache
- PORT=8002
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: 1
capabilities: [gpu]
networks:
- backend-network
  # Main application service
onyx-app:
build:
context: ./backend
depends_on:
- stt-service
- tts-service
- redis
- postgres
environment:
- STT_SERVICE_URL=http://stt-service:8001
- TTS_SERVICE_URL=http://tts-service:8002
- ENABLE_VOICE=true
networks:
- backend-network
- frontend-network
networks:
backend-network:
frontend-network:
5.2 Performance Monitoring and Tuning
Implement a monitoring dashboard for the voice services:
import asyncio
from collections import defaultdict

# Voice-service performance metrics collector
class VoiceMetricsCollector:
def __init__(self):
self.metrics = {
"stt_latency": [],
"tts_latency": [],
"transcript_accuracy": [],
"active_sessions": 0,
"audio_processed_mb": 0,
"errors": defaultdict(int)
}
self.lock = asyncio.Lock()
async def record_stt_latency(self, duration: float):
async with self.lock:
self.metrics["stt_latency"].append(duration)
            # Keep the list bounded
if len(self.metrics["stt_latency"]) > 1000:
self.metrics["stt_latency"].pop(0)
async def record_tts_latency(self, duration: float):
async with self.lock:
self.metrics["tts_latency"].append(duration)
if len(self.metrics["tts_latency"]) > 1000:
self.metrics["tts_latency"].pop(0)
async def record_accuracy(self, reference: str, transcript: str):
"""计算转录准确率(字错率)"""
if not reference or not transcript:
return
        # Character Error Rate via jiwer
import jiwer
cer = jiwer.cer(reference, transcript)
accuracy = 1.0 - cer
async with self.lock:
self.metrics["transcript_accuracy"].append(accuracy)
if len(self.metrics["transcript_accuracy"]) > 100:
self.metrics["transcript_accuracy"].pop(0)
async def increment_audio_processed(self, size_bytes: int):
async with self.lock:
self.metrics["audio_processed_mb"] += size_bytes / (1024 * 1024)
async def increment_error(self, error_type: str):
async with self.lock:
self.metrics["errors"][error_type] += 1
async def get_metrics(self):
"""获取聚合指标"""
async with self.lock:
stt_latency = self.metrics["stt_latency"]
tts_latency = self.metrics["tts_latency"]
accuracy = self.metrics["transcript_accuracy"]
return {
"stt": {
"count": len(stt_latency),
"avg_latency": sum(stt_latency)/len(stt_latency) if stt_latency else 0,
"p95_latency": percentile(stt_latency, 95) if stt_latency else 0
},
"tts": {
"count": len(tts_latency),
"avg_latency": sum(tts_latency)/len(tts_latency) if tts_latency else 0,
"p95_latency": percentile(tts_latency, 95) if tts_latency else 0
},
"accuracy": {
"avg": sum(accuracy)/len(accuracy) if accuracy else 0,
"count": len(accuracy)
},
"system": {
"active_sessions": self.metrics["active_sessions"],
"audio_processed_mb": round(self.metrics["audio_processed_mb"], 2),
"errors": dict(self.metrics["errors"])
}
}
# Module-level collector and JSON metrics endpoint
voice_metrics_collector = VoiceMetricsCollector()

@router.get("/metrics/voice")
async def voice_metrics():
    metrics = await voice_metrics_collector.get_metrics()
    return metrics
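get_metrics relies on a percentile helper that the listing does not define; a simple nearest-rank implementation is enough here:
def percentile(values, pct: float) -> float:
    """Nearest-rank percentile, e.g. percentile(latencies, 95) for p95 latency."""
    if not values:
        return 0.0
    ordered = sorted(values)
    rank = max(1, int(round(pct / 100 * len(ordered))))
    return ordered[min(rank, len(ordered)) - 1]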