Audio Warmup

Audio warmup is the time it takes for the audio amplifier circuit in your device to be fully powered and reach its normal operation state. The major contributors to audio warmup time are power management and any "de-pop" logic to stabilize the circuit.

This document describes how to measure audio warmup time and possible ways to decrease warmup time.

Measuring Output Warmup


AudioFlinger's FastMixer thread automatically measures output warmup and reports it as part of the output of the dumpsys media.audio_flinger command. FastMixer determines audio warmup by seeing how long a Hardware Abstraction Layer (HAL) write() takes to stabilize: at warmup, it calls write() repeatedly until the time between two write()s is the amount expected.
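
The measurement can be pictured as a timing loop. Below is a minimal, self-contained C sketch of the idea, not AudioFlinger's actual code: hal_write() is a simulated stand-in for a real HAL write, and the sample rate, frame count, warmup criterion, and tolerance are illustrative assumptions.

    /* Minimal sketch of the warmup-measurement idea (not AudioFlinger's
       actual code): write silence repeatedly and time each write() until
       the gap between two writes matches the expected buffer duration. */
    #include <stdint.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/types.h>
    #include <time.h>

    #define SAMPLE_RATE 44100
    #define FRAME_COUNT 256
    /* Expected time per buffer: frames / rate, in nanoseconds. */
    #define EXPECTED_NS ((int64_t)FRAME_COUNT * 1000000000LL / SAMPLE_RATE)
    #define TOLERANCE_NS (EXPECTED_NS / 4)  /* illustrative +/-25% window */

    static int64_t now_ns(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    }

    /* Stand-in for a real HAL write(): simulates a device whose first few
       writes return too quickly while the output path powers up. */
    static ssize_t hal_write(const void *buf, size_t bytes) {
        static int calls = 0;
        struct timespec ts = {
            .tv_sec = 0,
            .tv_nsec = (long)(calls++ < 3 ? EXPECTED_NS / 10 : EXPECTED_NS),
        };
        (void)buf;
        nanosleep(&ts, NULL);
        return (ssize_t)bytes;
    }

    int main(void) {
        int16_t silence[FRAME_COUNT] = {0};
        int cycles = 0;
        int64_t start = now_ns(), prev = start;

        for (;;) {
            hal_write(silence, sizeof(silence));
            int64_t t = now_ns();
            cycles++;
            /* Warm once two consecutive writes are the expected time apart. */
            if (llabs((t - prev) - EXPECTED_NS) < TOLERANCE_NS)
                break;
            prev = t;
        }
        printf("measuredWarmup=%lld ms, warmupCycles=%d\n",
               (long long)((now_ns() - start) / 1000000), cycles);
        return 0;
    }

On a real device the loop would drive the actual HAL output stream; the stub here only makes the sketch runnable on a desktop.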

To measure audio warmup, follow these steps for both the built-in speaker and wired headphones, and at different times after booting. Warmup times are usually different for each output device, and also differ right after the device boots:

  1. Ensure that FastMixer is enabled.
  2. Enable touch sounds by selecting Settings > Sound > Touch sounds on the device.
  3. Ensure that audio has been off for at least three seconds. Five seconds or more is better, because the hardware itself might have its own power logic beyond the three seconds that AudioFlinger has.
  4. Press Home, and you should hear a click sound.
  5. Run the following command to receive the measured warmup:
    adb shell dumpsys media.audio_flinger | grep measuredWarmup

    You should see output like this:

    sampleRate=44100 frameCount=256 measuredWarmup=X ms, warmupCycles=X

The measuredWarmup=X value is the number of milliseconds (X) it took for the first set of HAL write()s to complete.

    The warmupCycles=X value is the number of HAL write() requests it took until the execution time of write() matched what is expected.

  6. Take five measurements and record them all, as well as the mean (a host-side sketch that automates this appears after this list). If they are not all approximately the same, then a measurement is likely incorrect. For example, if you don't wait long enough after the audio has been off, you will see a lower warmup time than the mean.
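
The sketch below shows one hedged way steps 4 through 6 might be automated from a host machine. It assumes a POSIX host with adb on its PATH and a single connected device, and that pressing Home via input keyevent actually produces a touch sound (step 2); if dumpsys reports several measuredWarmup lines (one per output thread), only the first is used.

    /* Hedged sketch automating steps 4-6 from a host machine: press Home
       over adb to trigger the touch sound, let the hardware power down
       between runs, then read measuredWarmup back from dumpsys. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>

    int main(void) {
        double sum = 0;
        int n = 0;
        for (int i = 0; i < 5; i++) {
            sleep(5);  /* step 3: audio off for well over three seconds */
            system("adb shell input keyevent KEYCODE_HOME");  /* step 4 */
            sleep(1);  /* let the click play and warmup complete */
            FILE *p = popen("adb shell dumpsys media.audio_flinger"
                            " | grep measuredWarmup", "r");
            if (p == NULL)
                return 1;
            char line[512];
            while (fgets(line, sizeof(line), p) != NULL) {
                const char *s = strstr(line, "measuredWarmup=");
                double ms;
                if (s != NULL && sscanf(s, "measuredWarmup=%lf", &ms) == 1) {
                    printf("run %d: %.1f ms\n", i + 1, ms);
                    sum += ms;
                    n++;
                    break;  /* keep only the first reporting thread */
                }
            }
            pclose(p);
        }
        if (n > 0)
            printf("mean: %.1f ms over %d runs\n", sum / n, n);
        return 0;
    }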

Measuring Input Warmup


There are currently no tools provided for measuring audio input warmup. However, input warmup time can be estimated by observing the time required for startRecording() to return.
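
For a native app, one way to make this estimate concrete is sketched below using the NDK's AAudio C API as an analogue of timing AudioRecord.startRecording(): it measures from requestStart() until the stream leaves the STARTING state. This is an assumption-laden sketch rather than a provided tool; error handling is omitted, the one-second timeout is arbitrary, and the app needs the RECORD_AUDIO permission.

    /* Sketch: estimate input warmup with the NDK AAudio C API, as an
       analogue of timing AudioRecord.startRecording() in Java. Measures
       from requestStart() until the stream leaves the STARTING state. */
    #include <aaudio/AAudio.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <time.h>

    static int64_t now_ns(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
    }

    int64_t estimate_input_warmup_ms(void) {
        AAudioStreamBuilder *builder = NULL;
        AAudioStream *stream = NULL;
        AAudio_createStreamBuilder(&builder);
        AAudioStreamBuilder_setDirection(builder, AAUDIO_DIRECTION_INPUT);
        AAudioStreamBuilder_openStream(builder, &stream);

        int64_t t0 = now_ns();
        AAudioStream_requestStart(stream);
        /* Block until the stream moves past STARTING (1 s timeout). */
        aaudio_stream_state_t next = AAUDIO_STREAM_STATE_UNINITIALIZED;
        AAudioStream_waitForStateChange(stream, AAUDIO_STREAM_STATE_STARTING,
                                        &next, 1000000000LL);
        int64_t ms = (now_ns() - t0) / 1000000;
        printf("input start took ~%lld ms\n", (long long)ms);

        AAudioStream_requestStop(stream);
        AAudioStream_close(stream);
        AAudioStreamBuilder_delete(builder);
        return ms;
    }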

Reducing Warmup Time


Warmup time can usually be reduced by a combination of:

  • Good circuit design
  • Accurate time delays in the kernel device driver (see the driver sketch after this list)
  • Performing independent warmup operations concurrently rather than sequentially
  • Leaving circuits powered on or not reconfiguring clocks (increases idle power consumption)
  • Caching computed parameters
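
To illustrate the "accurate time delays" item, here is a hedged kernel-driver fragment; every name in it is hypothetical. The pattern it shows is sizing a delay to the datasheet with usleep_range() instead of rounding up with msleep(), which can oversleep by several milliseconds on a low-HZ kernel.

    /* Hypothetical codec power-up fragment (all names illustrative).
       The point: size delays to the datasheet instead of rounding up. */
    #include <linux/delay.h>

    static void my_codec_power_up(struct my_codec *codec) /* hypothetical */
    {
        my_codec_enable_supplies(codec);   /* hypothetical regulator helper */

        /* Datasheet: rail settles in 2 ms. msleep(2) may sleep 10-20 ms
           on a CONFIG_HZ=100 kernel; usleep_range() bounds the wait. */
        usleep_range(2000, 2500);

        my_codec_release_depop(codec);     /* hypothetical de-pop helper */
    }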

However, beware of excessive optimization. You may find that you need to trade off low warmup time against popping at power transitions.
