Some notes on mediastream.c

This article walks through mediastream.c from the mediastreamer2 library, explaining how the library's filters are used to capture, encode, send, receive, decode and play back audio and video, and it gives concrete build and run instructions.
/**********************************/
/*  Some notes on mediastream.c   */
/*                                */
/*  author: atom chan             */
/*  date:   2008.11.1             */
/**********************************/

This article is limited in depth, but if you repost it, please credit the source (http://eatdrinkmanwoman.spaces.live.com/blog/cns!97719476F5BAEDA4!1003.entry).

mediastream.c is one of the test programs shipped with the mediastreamer2 library, and the most complex one; studying it helps deepen your understanding of mediastreamer2.

A brief overview of what it does:
1. Using the filters wrapped by mediastreamer2: capture audio from the sound card, encode it, and send it to the remote host over RTP, while simultaneously receiving RTP packets from the remote host, decoding them, and playing them back through the sound card.
  The filter graph is as follows (a wiring sketch for both graphs follows this list):
  soundread -> ec -> encoder -> rtpsend
  rtprecv -> decode -> dtmfgen -> ec -> soundwrite

2. Using the filters wrapped by mediastreamer2: capture video from the camera, encode it, and send it to the remote host over RTP (with a local video preview), while simultaneously receiving RTP packets from the remote host, decoding them, and displaying the remote video.
  The filter graph is as follows:
  source -> pixconv -> tee -> encoder -> rtpsend
                       tee -> output
  rtprecv -> decoder -> output
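
To make the graphs above concrete, below is a minimal wiring sketch. ms_filter_link(), ms_ticker_new() and ms_ticker_attach() are the real mediastreamer2 graph API, but the helper names, the omitted echo canceller and the assumption that tee pin 0 feeds the encoder while pin 1 feeds the preview are mine; filter creation and error handling are left out, so read it as illustration, not as the code of mediastream.c.

#include "mediastreamer2/msfilter.h"
#include "mediastreamer2/msticker.h"

/* Sketch only: link the send side of the audio graph and start a ticker.
   The filters (sound card reader, encoder, RTP sender) are assumed to have
   been created already, e.g. with ms_filter_create_encoder(). */
static MSTicker *build_audio_send_graph(MSFilter *soundread, MSFilter *encoder,
                                        MSFilter *rtpsend)
{
    ms_filter_link(soundread, 0, encoder, 0);  /* soundread -> encoder */
    ms_filter_link(encoder, 0, rtpsend, 0);    /* encoder   -> rtpsend */

    MSTicker *ticker = ms_ticker_new();        /* the ticker drives the graph          */
    ms_ticker_attach(ticker, soundread);       /* attaching the source filter runs it  */
    return ticker;
}

/* Sketch only: the video send branch. MSTee copies its single input onto
   several outputs; here pin 0 goes to the encoder and pin 1 to the local
   preview window. */
static void build_video_send_graph(MSFilter *source, MSFilter *pixconv,
                                   MSFilter *tee, MSFilter *encoder,
                                   MSFilter *rtpsend, MSFilter *preview)
{
    ms_filter_link(source, 0, pixconv, 0);
    ms_filter_link(pixconv, 0, tee, 0);
    ms_filter_link(tee, 0, encoder, 0);
    ms_filter_link(encoder, 0, rtpsend, 0);
    ms_filter_link(tee, 1, preview, 0);
}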

What this program does NOT do is use two sessions to carry video and audio at the same time, so don't be misled.
What it actually does is use one full-duplex session to carry either video or audio. Both the local and the remote host run the same program, and only one payload type can be selected per run.
Keep in mind what RFC 3550 says on page 17: "Separate audio and video streams SHOULD NOT be carried in a single RTP session and demultiplexed based on the payload type or SSRC fields."

Inside audio_stream_new() and video_stream_new(), the program calls create_duplex_rtpsession() to set up the listening port.
Oddly, video_stream_start() does not attach rtprecv at the end, whereas audio_stream_start_full() does attach rtprecv.
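
For reference, here is a sketch of what a duplex session setup with oRTP looks like. It is not the body of create_duplex_rtpsession(); the calls are real oRTP functions, but their exact signatures have changed across oRTP versions (this follows the 0.14-era style), so take it only as an illustration.

#include <ortp/ortp.h>

/* Illustrative only: a single RtpSession in SENDRECV mode carries both
   directions, listening on local_port and sending to remote_ip:remote_port. */
static RtpSession *make_duplex_session(const char *local_ip, int local_port,
                                       const char *remote_ip, int remote_port,
                                       int payload_number)
{
    RtpSession *s = rtp_session_new(RTP_SESSION_SENDRECV);
    rtp_session_set_scheduling_mode(s, 0);                   /* no rtp scheduler   */
    rtp_session_set_blocking_mode(s, 0);                     /* non-blocking I/O   */
    rtp_session_set_local_addr(s, local_ip, local_port);     /* the listening port */
    rtp_session_set_remote_addr(s, remote_ip, remote_port);  /* set explicitly,
                                                                there is no signaling */
    rtp_session_set_payload_type(s, payload_number);
    return s;
}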

When compiling, don't forget to add -D VIDEO_ENABLED to enable video support.

Command-line parameters:
mediastream --local <port> --remote <ip:port> --payload <payload type number>
          [ --fmtp <fmtpline> ] [ --jitter <milliseconds> ]

Here fmtp and jitter are optional.
fmtp is described as follows: "Sets a send parameter (fmtp) for the PayloadType. This method is provided for applications using RTP with SDP, but actually the fmtp information is not used for RTP processing."
jitter sets the buffering time, i.e. the threshold of the jitter queue; see the RTP chapter of Comer's TCP/IP Volume 1 for details. The default is 80 ms (or is it 50 ms?) and there is normally no need to change it.
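
Roughly, these two options map to library calls like the ones below. payload_type_set_send_fmtp() and rtp_session_set_jitter_compensation() are real oRTP functions and av_profile is oRTP's default audio/video profile, but the helper itself and where exactly mediastream.c performs these calls are my assumption.

#include <ortp/ortp.h>

/* Hypothetical helper showing what --fmtp and --jitter roughly translate to. */
static void apply_fmtp_and_jitter(RtpSession *session, int payload_number,
                                  const char *fmtp_line, int jitter_ms)
{
    PayloadType *pt = rtp_profile_get_payload(&av_profile, payload_number);
    if (pt != NULL && fmtp_line != NULL)
        payload_type_set_send_fmtp(pt, fmtp_line);            /* e.g. "vbr=on" for speex */
    rtp_session_set_jitter_compensation(session, jitter_ms);  /* jitter buffer, in ms    */
}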

A usage example:
Host A, IP 10.10.104.198
Host B, IP 10.10.104.199
Host A runs: ./mediastream --local 5010 --remote 10.10.104.199:6014 --payload 110
Host B runs: ./mediastream --local 6014 --remote 10.10.104.198:5010 --payload 110

Here I transmitted audio only, using the speex_nb codec. I did not use video; I suspect SDL has some problem, because the video preview showed a green screen.

Note: the source code mentions audio codecs such as lpc1015, speex_nb, speex_wb and ilbc, and video codecs such as h263_1998, theora, mp4v, x_snow and h264, but you will not necessarily be able to use all of them. It depends on whether those codecs were enabled when ffmpeg was built.
If the corresponding libraries are not present on your machine but you specify one of their payload types anyway, mediastreamer2 will print an error at startup along the lines of "cannot find xxx.so"; switch to another payload type in that case.
Generally speaking, speex, theora and xvid (h264) are the three that are easiest to build.
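
If you want to check from code whether a codec is actually available before picking its payload number, one quick way is to ask mediastreamer2 for an encoder by mime type, roughly as sketched below (ms_filter_create_encoder() and ms_filter_destroy() are real mediastreamer2 functions; the helper is mine).

#include "mediastreamer2/msfilter.h"

/* Sketch: returns 1 if an encoder filter is registered for the given mime
   type (e.g. "speex", "theora"), 0 otherwise. */
static int encoder_available(const char *mime)
{
    MSFilter *enc = ms_filter_create_encoder((char *)mime);
    if (enc == NULL)
        return 0;
    ms_filter_destroy(enc);
    return 1;
}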


About RTP sessions: I wonder whether you share some of the misconceptions I used to have (especially about duplex sessions).
I find that a session is a concept hard to sum up in one sentence (otherwise RFC 3550 would not spend so much space on it on page 9). It can be condensed to "The distinguishing feature of an RTP session is that each maintains a full, separate space of SSRC identifiers", but on its own that does not help much and may even make it harder to understand.
Personally I think there is no need to dig for an exact definition of "session"; it is better to study RFC 3550 Section 11 (page 68) carefully. (Whenever I have a doubt, I go back and read that section once more.)
1.RTP relies on the underlying protocol(s) to provide demultiplexing of RTP data and RTCP control streams.  For UDP and similar protocols,RTP SHOULD use an even destination port number and the corresponding RTCP stream SHOULD use the next higher (odd) destination port number.
2.For applications in which the RTP and RTCP destination port numbers are specified via explicit, separate parameters (using a signaling protocol or other means), the application MAY disregard the restrictions that the port numbers be even/odd and consecutive although the use of an even/odd port pair is still encouraged. 
3.The RTP and RTCP port numbers MUST NOT be the same since RTP relies on the port numbers to demultiplex the RTP data and RTCP control streams.
4.In a unicast session, both participants need to identify a port pair for receiving RTP and RTCP packets.  Both participants MAY use the same port pair.  A participant MUST NOT assume that the source port of the incoming RTP or RTCP packet can be used as the destination port for outgoing RTP or RTCP packets.
5.RTP data packets contain no length field or other delineation,therefore RTP relies on the underlying protocol(s) to provide a length indication.  The maximum length of RTP packets is limited only by the underlying protocols.
6.(I could not find the exact wording.) RTP itself does not know which port on the remote host to send to; this has to be conveyed by non-RTP means (for example, learned through signaling with the remote host). This program has no signaling, so the remote address and port are specified explicitly.


Finally, here is the output from running it on host A:

[atom@localhost code]$ ./mediastream --local 5010 --remote 10.10.104.199:6014 --payload 110 > atom
ortp-message-Registering all filters...
ortp-message-Registering all soundcard handlers
ortp-message-Card ALSA: default device added
ortp-message-Card ALSA: Ensoniq AudioPCI added
ortp-message-Card OSS: /dev/dsp added
ortp-message-Card OSS: /dev/dsp added
ortp-message-Loading plugins
ortp-message-Cannot open directory /usr/lib/mediastreamer/plugins: No such file or directory
ortp-message-ms_init() done
ortp-message-Setting audio encoder network bitrate to 8000
ortp-message-ms_filter_link: MSAlsaRead:0x93f0db0,0-->MSSpeexEnc:0x93f0ea0,0
ortp-message-ms_filter_link: MSDtmfGen:0x93f0d30,0-->MSAlsaWrite:0x93f0e28,0
ortp-message-ms_filter_link: MSSpeexEnc:0x93f0ea0,0-->MSRtpSend:0x93f0c18,0
ortp-message-ms_filter_link: MSRtpRecv:0x93f0c88,0-->MSSpeexDec:0x93f0f60,0
ortp-message-ms_filter_link: MSSpeexDec:0x93f0f60,0-->MSDtmfGen:0x93f0d30,0
ortp-message-Using bitrate 2150 for speex encoder.
ortp-message-alsa_open_r: opening default at 8000Hz, bits=16, stereo=0
ortp-message-synchronizing timestamp, diff=960
ortp-message-synchronizing timestamp, diff=320
ortp-message-oRTP-stats:
   Global statistics :
 number of rtp packet sent=47
 number of rtp bytes sent=1636 bytes
 number of rtp packet received=49
 number of rtp bytes received=1927 bytes
 number of incoming rtp bytes successfully delivered to the application=1739
 number of times the application queried a packet that didn't exist=107
 number of rtp packet lost=0
 number of rtp packets received too late=0
 number of bad formatted rtp packets=0
 number of packet discarded because of queue overflow=0

ortp-message-oRTP-stats:
   Global statistics :
 number of rtp packet sent=99
 number of rtp bytes sent=3254 bytes
 number of rtp packet received=102
 number of rtp bytes received=3683 bytes
 number of incoming rtp bytes successfully delivered to the application=3629
 number of times the application queried a packet that didn't exist=213
 number of rtp packet lost=0
 number of rtp packets received too late=0
 number of bad formatted rtp packets=0
 number of packet discarded because of queue overflow=0

ortp-message-oRTP-stats:
   Global statistics :
 number of rtp packet sent=152
 number of rtp bytes sent=5554 bytes
 number of rtp packet received=154
 number of rtp bytes received=5077 bytes
 number of incoming rtp bytes successfully delivered to the application=5038
 number of times the application queried a packet that didn't exist=318
 number of rtp packet lost=0
 number of rtp packets received too late=0
 number of bad formatted rtp packets=0
 number of packet discarded because of queue overflow=0

ortp-message-oRTP-stats:
   Global statistics :
 number of rtp packet sent=205
 number of rtp bytes sent=6671 bytes
 number of rtp packet received=207
 number of rtp bytes received=5964 bytes
 number of incoming rtp bytes successfully delivered to the application=5925
 number of times the application queried a packet that didn't exist=424
 number of rtp packet lost=0
 number of rtp packets received too late=0
 number of bad formatted rtp packets=0
 number of packet discarded because of queue overflow=0

ortp-message-oRTP-stats:
   Global statistics :
 number of rtp packet sent=258
 number of rtp bytes sent=8632 bytes
 number of rtp packet received=260
 number of rtp bytes received=7422 bytes
 number of incoming rtp bytes successfully delivered to the application=7292
 number of times the application queried a packet that didn't exist=530
 number of rtp packet lost=0
 number of rtp packets received too late=0
 number of bad formatted rtp packets=0
 number of packet discarded because of queue overflow=0

ortp-message-Receiving RTCP SR
ortp-message-interarrival jitter=121
ortp-message-Receiving RTCP SDES
ortp-message-Found CNAME=unknown@unknown
ortp-message-Found TOOL=oRTP-0.14.2
ortp-message-Found NOTE=This is free sofware (LGPL) !
ortp-message-oRTP-stats:
   Global statistics :
 number of rtp packet sent=312
 number of rtp bytes sent=10280 bytes
 number of rtp packet received=314
 number of rtp bytes received=10202 bytes
 number of incoming rtp bytes successfully delivered to the application=9970
 number of times the application queried a packet that didn't exist=636
 number of rtp packet lost=0
 number of rtp packets received too late=0
 number of bad formatted rtp packets=0
 number of packet discarded because of queue overflow=0

ortp-message-oRTP-stats:
   Global statistics :
 number of rtp packet sent=364
 number of rtp bytes sent=12382 bytes
 number of rtp packet received=365
 number of rtp bytes received=11527 bytes
 number of incoming rtp bytes successfully delivered to the application=11501
 number of times the application queried a packet that didn't exist=742
 number of rtp packet lost=0
 number of rtp packets received too late=0
 number of bad formatted rtp packets=0
 number of packet discarded because of queue overflow=0

ortp-message-oRTP-stats:
   Global statistics :
 number of rtp packet sent=416
 number of rtp bytes sent=14337 bytes
 number of rtp packet received=419
 number of rtp bytes received=14520 bytes
 number of incoming rtp bytes successfully delivered to the application=14346
 number of times the application queried a packet that didn't exist=847
 number of rtp packet lost=0
 number of rtp packets received too late=0
 number of bad formatted rtp packets=0
 number of packet discarded because of queue overflow=0

ortp-message-oRTP-stats:
   Global statistics :
 number of rtp packet sent=470
 number of rtp bytes sent=16216 bytes
 number of rtp packet received=472
 number of rtp bytes received=15832 bytes
 number of incoming rtp bytes successfully delivered to the application=15752
 number of times the application queried a packet that didn't exist=953
 number of rtp packet lost=0
 number of rtp packets received too late=0
 number of bad formatted rtp packets=0
 number of packet discarded because of queue overflow=0

ortp-message-oRTP-stats:
   Global statistics :
 number of rtp packet sent=523
 number of rtp bytes sent=18399 bytes
 number of rtp packet received=524
 number of rtp bytes received=16604 bytes
 number of incoming rtp bytes successfully delivered to the application=16578
 number of times the application queried a packet that didn't exist=1059
 number of rtp packet lost=0
 number of rtp packets received too late=0
 number of bad formatted rtp packets=0
 number of packet discarded because of queue overflow=0

ortp-message-Receiving RTCP SR
ortp-message-interarrival jitter=133
ortp-message-Receiving RTCP SDES
ortp-message-Found CNAME=unknown@unknown
ortp-message-Found TOOL=oRTP-0.14.2
ortp-message-Found NOTE=This is free sofware (LGPL) !
ortp-message-oRTP-stats:
   Global statistics :
 number of rtp packet sent=575
 number of rtp bytes sent=20072 bytes
 number of rtp packet received=578
 number of rtp bytes received=17986 bytes
 number of incoming rtp bytes successfully delivered to the application=17754
 number of times the application queried a packet that didn't exist=1164
 number of rtp packet lost=0
 number of rtp packets received too late=0
 number of bad formatted rtp packets=0
 number of packet discarded because of queue overflow=0

ortp-message-oRTP-stats:
   Global statistics :
 number of rtp packet sent=629
 number of rtp bytes sent=22106 bytes
 number of rtp packet received=630
 number of rtp bytes received=20753 bytes
 number of incoming rtp bytes successfully delivered to the application=20637
 number of times the application queried a packet that didn't exist=1270
 number of rtp packet lost=0
 number of rtp packets received too late=0
 number of bad formatted rtp packets=0
 number of packet discarded because of queue overflow=0

ortp-message-oRTP-stats:
   Global statistics :
 number of rtp packet sent=681
 number of rtp bytes sent=24917 bytes
 number of rtp packet received=683
 number of rtp bytes received=21885 bytes
 number of incoming rtp bytes successfully delivered to the application=21859
 number of times the application queried a packet that didn't exist=1376
 number of rtp packet lost=0
 number of rtp packets received too late=0
 number of bad formatted rtp packets=0
 number of packet discarded because of queue overflow=0

ortp-message-oRTP-stats:
   Global statistics :
 number of rtp packet sent=734
 number of rtp bytes sent=27703 bytes
 number of rtp packet received=736
 number of rtp bytes received=22594 bytes
 number of incoming rtp bytes successfully delivered to the application=22555
 number of times the application queried a packet that didn't exist=1481
 number of rtp packet lost=0
 number of rtp packets received too late=0
 number of bad formatted rtp packets=0
 number of packet discarded because of queue overflow=0

ortp-message-oRTP-stats:
   Global statistics :
 number of rtp packet sent=788
 number of rtp bytes sent=28977 bytes
 number of rtp packet received=789
 number of rtp bytes received=23637 bytes
 number of incoming rtp bytes successfully delivered to the application=23483
 number of times the application queried a packet that didn't exist=1587
 number of rtp packet lost=0
 number of rtp packets received too late=0
 number of bad formatted rtp packets=0
 number of packet discarded because of queue overflow=0

ortp-message-Receiving RTCP SR
ortp-message-interarrival jitter=127
ortp-message-Receiving RTCP SDES
ortp-message-Found CNAME=unknown@unknown
ortp-message-Found TOOL=oRTP-0.14.2
ortp-message-Found NOTE=This is free sofware (LGPL) !
ortp-message-oRTP-stats:
   Global statistics :
 number of rtp packet sent=840
 number of rtp bytes sent=30072 bytes
 number of rtp packet received=837
 number of rtp bytes received=25564 bytes
 number of incoming rtp bytes successfully delivered to the application=25564
 number of times the application queried a packet that didn't exist=1692
 number of rtp packet lost=0
 number of rtp packets received too late=0
 number of bad formatted rtp packets=0
 number of packet discarded because of queue overflow=0

ortp-message-oRTP-stats:
   Audio session's RTP statistics :
 number of rtp packet sent=840
 number of rtp bytes sent=30072 bytes
 number of rtp packet received=837
 number of rtp bytes received=25564 bytes
 number of incoming rtp bytes successfully delivered to the application=25564
 number of times the application queried a packet that didn't exist=1693
 number of rtp packet lost=0
 number of rtp packets received too late=0
 number of bad formatted rtp packets=0
 number of packet discarded because of queue overflow=0

ortp-message-ms_filter_unlink: MSAlsaRead:0x93f0db0,0-->MSSpeexEnc:0x93f0ea0,0
ortp-message-ms_filter_unlink: MSDtmfGen:0x93f0d30,0-->MSAlsaWrite:0x93f0e28,0
ortp-message-ms_filter_unlink: MSSpeexEnc:0x93f0ea0,0-->MSRtpSend:0x93f0c18,0
ortp-message-ms_filter_unlink: MSRtpRecv:0x93f0c88,0-->MSSpeexDec:0x93f0f60,0
ortp-message-ms_filter_unlink: MSSpeexDec:0x93f0f60,0-->MSDtmfGen:0x93f0d30,0
ortp-message-MSTicker thread exiting

Remote addr: ip=10.10.104.199 port=6014
Starting audio stream.
Bandwidth usage: download=26.614873 kbits/sec, upload=24.850787 kbits/sec
Bandwidth usage: download=24.757891 kbits/sec, upload=23.288695 kbits/sec
Bandwidth usage: download=21.915512 kbits/sec, upload=28.845805 kbits/sec
Bandwidth usage: download=18.254627 kbits/sec, upload=19.849715 kbits/sec
Bandwidth usage: download=22.314123 kbits/sec, upload=26.907152 kbits/sec
Bandwidth usage: download=33.280316 kbits/sec, upload=24.266766 kbits/sec
Bandwidth usage: download=21.465322 kbits/sec, upload=27.740203 kbits/sec
Bandwidth usage: download=34.365668 kbits/sec, upload=26.550016 kbits/sec
Bandwidth usage: download=21.669883 kbits/sec, upload=26.035174 kbits/sec
Bandwidth usage: download=17.581689 kbits/sec, upload=28.368437 kbits/sec
Bandwidth usage: download=22.050596 kbits/sec, upload=23.840863 kbits/sec
Bandwidth usage: download=32.469631 kbits/sec, upload=27.263791 kbits/sec
Bandwidth usage: download=19.910096 kbits/sec, upload=32.475279 kbits/sec
Bandwidth usage: download=16.712168 kbits/sec, upload=32.902848 kbits/sec
Bandwidth usage: download=19.092574 kbits/sec, upload=21.455217 kbits/sec
Bandwidth usage: download=24.896035 kbits/sec, upload=19.984568 kbits/sec
stoping all...