FunASR Real-Time Speech Transcription: Building a Front-End Vue Component

[Free download] FunASR: A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Recognition, Voice Activity Detection, Text Post-processing etc. Project page: https://gitcode.com/GitHub_Trending/fun/FunASR

Introduction: Pain Points and Solutions for Real-Time Speech Transcription

Are you still struggling to build a low-latency speech transcription system? Traditional approaches face three major challenges: complex audio-stream processing, unstable WebSocket connections, and latency from coordinating multiple models. In this article we use the open-source FunASR toolkit and a complete Vue component example to implement production-grade real-time transcription from scratch. By the end you will know how to:

  • Capture and preprocess audio streams in the browser
  • Implement full-duplex WebSocket communication
  • Integrate the 2Pass real-time transcription architecture
  • Optimize front-end noise suppression and audio chunking

Technical Architecture: How FunASR Real-Time Transcription Works

Core Technology Stack

| Module | Technology | Advantage |
| --- | --- | --- |
| Audio capture | MediaRecorder API | Native browser audio capture |
| Real-time communication | WebSocket | Low-latency bidirectional transfer |
| Speech recognition | FunASR WebSocket service | Integrated VAD + ASR + punctuation prediction |
| Front-end framework | Vue 2.x + Composition API | Component-based state management |

2Pass Transcription Flow

(The original article embedded a Mermaid flowchart of the 2Pass pipeline here; the diagram content did not survive extraction.)
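
As an approximate substitute, the sketch below reconstructs the typical 2Pass data flow based on FunASR's documented design, not the original figure: a streaming model emits partial results while an offline model re-decodes each VAD-delimited segment to produce the final, punctuated text.

mermaid
flowchart LR
    A[Microphone audio] --> B[VAD segmentation]
    B --> C[Streaming ASR<br/>partial results]
    B --> D[Offline ASR<br/>segment re-decode]
    C --> E[Live partial text]
    D --> F[Punctuation prediction]
    F --> G[Final text]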

Component Development: Building the Transcription Component from Scratch

1. Project Initialization and Dependencies

# Create the Vue project
vue create funasr-vue-demo
cd funasr-vue-demo

# Install core dependencies
npm install webrtc-adapter # audio device compatibility shims
npm install crypto-js # audio data encryption (optional; not used in the snippets below)
npm install ant-design-vue # UI component library

2. Audio Capture Component

<template>
  <div class="audio-recorder">
    <a-button 
      :loading="isRecording" 
      @click="toggleRecording"
      type="primary"
      icon="sound"
    >
      {{ isRecording ? 'Stop Recording' : 'Start Recording' }}
    </a-button>
    <a-slider 
      v-model="volume" 
      :disabled="!isRecording" 
      class="volume-control"
    />
  </div>
</template>

<script>
import { ref, onMounted, onUnmounted } from 'vue'
import { message } from 'ant-design-vue'
import 'webrtc-adapter' // side-effect import: normalizes getUserMedia across browsers

export default {
  name: 'AudioRecorder',
  emits: ['audioChunk'],
  setup(_, { emit }) {
    const isRecording = ref(false)
    const volume = ref(0)
    const mediaRecorder = ref(null)
    const audioContext = ref(null)
    const analyser = ref(null)
    const stream = ref(null)

    // Initialize the audio context and volume analyser
    onMounted(() => {
      audioContext.value = new (window.AudioContext || window.webkitAudioContext)()
      analyser.value = audioContext.value.createAnalyser()
      analyser.value.fftSize = 256
    })

    // Start recording
    const startRecording = async () => {
      try {
        stream.value = await navigator.mediaDevices.getUserMedia({
          audio: {
            sampleRate: 16000,
            channelCount: 1,
            echoCancellation: true
          }
        })
        
        // Connect the stream to the volume analyser
        const source = audioContext.value.createMediaStreamSource(stream.value)
        source.connect(analyser.value)
        
        // Configure MediaRecorder
        const options = {
          mimeType: 'audio/webm;codecs=opus',
          audioBitsPerSecond: 16000
        }
        mediaRecorder.value = new MediaRecorder(stream.value, options)
        
        // Emit an audio chunk every 600 ms
        mediaRecorder.value.start(600)
        
        // Handle dataavailable events
        mediaRecorder.value.ondataavailable = (e) => {
          if (e.data.size > 0) {
            processAudioChunk(e.data)
            updateVolume()
          }
        }
        
        isRecording.value = true
      } catch (err) {
        console.error('Failed to initialize recording:', err)
        message.error('Please grant microphone access')
      }
    }

    // Process an audio chunk
    const processAudioChunk = async (chunk) => {
      // Decode the compressed chunk to raw PCM.
      // Caveat: some browsers can only decode the first MediaRecorder chunk
      // this way (later chunks lack the container header); an AudioWorklet-based
      // capture path is more robust for continuous streaming.
      const arrayBuffer = await chunk.arrayBuffer()
      const audioBuffer = await audioContext.value.decodeAudioData(arrayBuffer)
      
      // Downsample to 16 kHz mono
      const downsampled = downsampleBuffer(
        audioBuffer.getChannelData(0),
        audioBuffer.sampleRate,
        16000
      )
      
      // Emit the chunk to the parent
      emit('audioChunk', {
        data: downsampled,
        timestamp: Date.now()
      })
    }

    // Volume metering
    const updateVolume = () => {
      const dataArray = new Uint8Array(analyser.value.frequencyBinCount)
      analyser.value.getByteFrequencyData(dataArray)
      const avg = dataArray.reduce((a, b) => a + b, 0) / dataArray.length
      volume.value = Math.min(100, Math.max(0, avg * 2))
    }

    // Stop recording
    const stopRecording = () => {
      if (mediaRecorder.value && mediaRecorder.value.state !== 'inactive') {
        mediaRecorder.value.stop()
        stream.value.getTracks().forEach(track => track.stop())
        isRecording.value = false
      }
    }

    // Utility: downsample a Float32 buffer by block averaging
    const downsampleBuffer = (buffer, sampleRate, targetSampleRate) => {
      if (sampleRate === targetSampleRate) return buffer
      
      const ratio = sampleRate / targetSampleRate
      const newLength = Math.round(buffer.length / ratio)
      const result = new Float32Array(newLength)
      
      let offsetResult = 0
      let offsetBuffer = 0
      while (offsetResult < result.length) {
        const nextOffsetBuffer = Math.round((offsetResult + 1) * ratio)
        let sum = 0, count = 0
        for (let i = offsetBuffer; i < nextOffsetBuffer && i < buffer.length; i++) {
          sum += buffer[i]
          count++
        }
        result[offsetResult] = count > 0 ? sum / count : 0
        offsetResult++
        offsetBuffer = nextOffsetBuffer
      }
      return result
    }

    // Stop cleanly if the component unmounts while recording
    onUnmounted(() => stopRecording())

    return {
      isRecording,
      volume,
      toggleRecording: () => (isRecording.value ? stopRecording() : startRecording())
    }
  }
}
</script>

3. WebSocket Communication Module

<template>
  <div class="funasr-transcriber">
    <audio-recorder @audioChunk="handleAudioChunk" />
    <div class="transcriptBox">
      <div class="temp-result" v-if="tempResult">
        {{ tempResult }}
      </div>
      <div class="final-result" v-for="(item, index) in finalResults" :key="index">
        {{ item.text }}
      </div>
    </div>
  </div>
</template>

<script>
import { ref, onMounted, onUnmounted } from 'vue'
import AudioRecorder from './AudioRecorder.vue'

export default {
  components: { AudioRecorder },
  setup() {
    const ws = ref(null)
    const tempResult = ref('')
    const finalResults = ref([])
    const isConnected = ref(false)
    
    // Connect to the WebSocket service
    const connectWebSocket = () => {
      // Server address configuration
      const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:'
      const host = '127.0.0.1'
      const port = '10095'
      
      // Open the connection
      ws.value = new WebSocket(
        `${protocol}//${host}:${port}/ws?mode=2pass&chunk_size=5,10,5`
      )
      
      // Connection event handlers
      ws.value.onopen = () => {
        console.log('WebSocket connected')
        isConnected.value = true
      }
      
      ws.value.onmessage = (event) => {
        const data = JSON.parse(event.data)
        handleServerMessage(data)
      }
      
      ws.value.onerror = (error) => {
        console.error('WebSocket error:', error)
        reconnect()
      }
      
      ws.value.onclose = () => {
        console.log('WebSocket closed')
        isConnected.value = false
        if (process.env.NODE_ENV !== 'development') reconnect()
      }
    }
    
    // Reconnect after a 3 s delay
    const reconnect = () => {
      if (!isConnected.value) {
        setTimeout(connectWebSocket, 3000)
      }
    }
    
    // Handle messages from the server
    const handleServerMessage = (data) => {
      if (data.type === 'partial_result') {
        // Update the in-progress partial result
        tempResult.value = data.result
      } else if (data.type === 'final_result') {
        // Commit the confirmed final result
        finalResults.value.push({
          text: data.result,
          timestamp: data.timestamp
        })
        tempResult.value = ''
      } else if (data.type === 'error') {
        console.error('Transcription error:', data.message)
      }
    }
    
    // Send an audio chunk to the server
    const handleAudioChunk = (chunk) => {
      if (ws.value && ws.value.readyState === WebSocket.OPEN) {
        // Encode as Base64 for JSON transport (for large chunks, convert in
        // slices: spreading a big array into fromCharCode can overflow the stack)
        const base64Data = btoa(String.fromCharCode(
          ...new Uint8Array(chunk.data.buffer)
        ))
        
        ws.value.send(JSON.stringify({
          type: 'audio',
          data: base64Data,
          timestamp: chunk.timestamp
        }))
      }
    }
    
    onMounted(() => connectWebSocket())
    onUnmounted(() => {
      if (ws.value) ws.value.close()
    })
    
    return {
      tempResult,
      finalResults,
      handleAudioChunk
    }
  }
}
</script>
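
One caveat before going live: handleAudioChunk base64-encodes raw Float32 samples, and the message schema above (type / result fields) assumes a matching proxy on the server side; the stock FunASR runtime server generally expects binary 16-bit little-endian PCM and replies with a somewhat different JSON layout, so verify the contract against your deployment. A minimal Float32-to-Int16 conversion sketch:

// Convert Float32 samples in [-1, 1] to 16-bit little-endian PCM
function floatTo16BitPCM(float32Array) {
  const out = new Int16Array(float32Array.length)
  for (let i = 0; i < float32Array.length; i++) {
    const s = Math.max(-1, Math.min(1, float32Array[i])) // clamp to [-1, 1]
    out[i] = s < 0 ? s * 0x8000 : s * 0x7fff
  }
  return out
}

// The underlying buffer can then be sent as a binary frame:
// ws.value.send(floatTo16BitPCM(chunk.data).buffer)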

Server Deployment and Integration Testing

Quick Deployment with Docker

# Pull the image
sudo docker pull registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.10

# Start the container
mkdir -p ./models
sudo docker run -p 10095:10095 -it --privileged=true \
  -v $PWD/models:/workspace/models \
  registry.cn-hangzhou.aliyuncs.com/funasr_repo/funasr:funasr-runtime-sdk-online-cpu-0.1.10

# Launch the 2Pass service (run inside the container)
cd FunASR/runtime
nohup bash run_server_2pass.sh \
  --download-model-dir /workspace/models \
  --vad-dir damo/speech_fsmn_vad_zh-cn-16k-common-onnx \
  --model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-onnx  \
  --online-model-dir damo/speech_paraformer-large_asr_nat-zh-cn-16k-common-vocab8404-online-onnx  \
  --punc-dir damo/punc_ct-transformer_zh-cn-common-vad_realtime-vocab272727-onnx \
  --itn-dir thuduj12/fst_itn_zh > log.txt 2>&1 &
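
Before wiring up the front end, it helps to confirm the service is reachable. Below is a minimal probe, assuming Node.js with the ws package installed (npm install ws); note that some FunASR runtime builds start with SSL enabled, in which case you would use wss:// and, for a self-signed certificate, pass the ws option rejectUnauthorized: false:

// check-funasr.js: minimal connectivity probe (assumes `npm install ws`)
const WebSocket = require('ws')

const ws = new WebSocket('ws://127.0.0.1:10095/ws?mode=2pass&chunk_size=5,10,5')

ws.on('open', () => {
  console.log('FunASR WebSocket reachable')
  ws.close()
})
ws.on('error', (err) => console.error('Connection failed:', err.message))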

Performance Tuning Parameters

| Parameter | Suggested Value | Description |
| --- | --- | --- |
| chunk_size | "5,10,5" | Audio chunk sizes (left context / current / right context) |
| sample_rate | 16000 | Sampling rate (must match the model) |
| audio_buffer | 0.2 | Audio buffer length (seconds) |
| vad_silence_time | 600 | Silence-detection threshold (milliseconds) |
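
For convenience, the sketch below folds these values into the connection URL used by the component earlier; the query-parameter names follow this article's own examples, so double-check them against your server build:

// Hypothetical helper: build the FunASR WebSocket URL from tuning parameters
function buildFunasrUrl({ host = '127.0.0.1', port = 10095, mode = '2pass', chunkSize = [5, 10, 5] } = {}) {
  const protocol = window.location.protocol === 'https:' ? 'wss:' : 'ws:'
  return `${protocol}//${host}:${port}/ws?mode=${mode}&chunk_size=${chunkSize.join(',')}`
}

// Usage
const ws = new WebSocket(buildFunasrUrl({ chunkSize: [5, 10, 5] }))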

Common Issues and Solutions

Q1: Echo or noise in the captured audio

A: Enable the browser's built-in noise suppression

// getUserMedia audio constraints (these apply at capture time,
// not on the MediaRecorder itself)
const constraints = {
  audio: {
    echoCancellation: true,
    noiseSuppression: true,
    autoGainControl: true
  }
}
const stream = await navigator.mediaDevices.getUserMedia(constraints)

Q2: WebSocket connection drops frequently

A: Implement a heartbeat

// Periodic heartbeat to keep the connection alive
const heartbeatInterval = setInterval(() => {
  if (ws.value && ws.value.readyState === WebSocket.OPEN) {
    ws.value.send(JSON.stringify({ type: 'ping' }))
  }
}, 30000)

// Clear it on teardown, e.g. in onUnmounted: clearInterval(heartbeatInterval)

Q3: Transcription latency exceeds 300 ms

A: Combine several optimizations

  1. Reduce the audio chunk size to 300 ms
  2. Offload audio encoding to a Web Worker (see the sketch below)
  3. Use GPU acceleration on the server (requires the GPU Docker image)
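
A minimal sketch of point 2, assuming a hypothetical worker file named pcm-worker.js that downsamples Float32 samples off the main thread so the UI stays responsive:

// pcm-worker.js: runs off the main thread
self.onmessage = ({ data }) => {
  const { samples, sampleRate, targetRate } = data
  const ratio = sampleRate / targetRate
  const out = new Float32Array(Math.round(samples.length / ratio))
  for (let i = 0; i < out.length; i++) {
    // Nearest-neighbor pick; swap in block averaging for better quality
    out[i] = samples[Math.min(samples.length - 1, Math.round(i * ratio))]
  }
  self.postMessage({ samples: out }, [out.buffer]) // transfer, don't copy
}

// Main thread: samples is a Float32Array from the capture pipeline;
// sendToWebSocket stands in for your existing send routine
const worker = new Worker('pcm-worker.js')
worker.onmessage = ({ data }) => sendToWebSocket(data.samples)
worker.postMessage({ samples, sampleRate: 48000, targetRate: 16000 }, [samples.buffer])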

Summary and Extensions

This article walked through a complete front-end speech transcription component built on FunASR. Highlights include:

  1. Low-latency architecture: the 2Pass mode balances responsiveness and accuracy
  2. Component-based design: audio capture and WebSocket communication are decoupled
  3. Cross-browser support: works in Chrome, Firefox, Safari, and other modern browsers

Further Directions

  • Hotword customization (send a hotword parameter over the WebSocket; see the sketch after this list)
  • Multilingual recognition (switch the model-dir parameter)
  • Speech emotion analysis (integrate a FunASR emotion model)
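
For the first item, a minimal sketch: the official FunASR runtime documents a hotwords field carrying a JSON string of word-to-weight pairs in the initial handshake message, but the exact field name and format vary by server version, so verify against your deployment:

// Sketch: send hotwords once, right after the connection opens
ws.value.onopen = () => {
  ws.value.send(JSON.stringify({
    mode: '2pass',
    hotwords: JSON.stringify({ 'FunASR': 20, 'Paraformer': 20 }) // word -> boost weight
  }))
}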

If this tutorial helped, consider liking and bookmarking it, and watch for the follow-up series, "FunASR Model Fine-Tuning in Practice"!


Disclosure: parts of this article were produced with AI assistance (AIGC) and are for reference only.
