【剪映小助手 Source Code Deep Dive】12_Audio Processing and Mixing System

Chapter 12: Audio Processing and Mixing System

12.1 Audio Processing System Overview

The audio processing and mixing system is a critical part of any video editor: it handles a video's audio tracks, sound effects, background music, and real-time mixing. The audio system in 剪映小助手 uses a layered architecture and covers everything from basic audio playback to complex audio effect processing.

12.1.1 System Architecture Design

The architecture of the audio processing system is built on the following core design principles:

Layered architecture (see the sketch after this list)

  • Hardware abstraction layer: handles compatibility across audio devices
  • Audio engine layer: decodes, encodes, and processes audio
  • Effect processing layer: implements the audio effects and filters
  • Mixing layer: combines multi-track audio and produces the output
  • Application layer: exposes the user interface and the API
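
As a rough illustration of how these layers interact, the minimal sketch below traces a single render pass from the application layer down to the device. All of the class names here are illustrative and do not appear in the actual source:

# Minimal sketch of the layering: each layer only talks to the one below it.
# All names are illustrative; they are not from the actual source.
import numpy as np

class HardwareAbstractionLayer:
    """Hides device differences; writes PCM frames to whatever device is active."""
    def write(self, pcm: np.ndarray) -> None:
        pass  # would hand frames to ALSA/CoreAudio/WASAPI, etc.

class AudioEngineLayer:
    """Decodes source files into PCM frames."""
    def decode(self, path: str, start: float, duration: float) -> np.ndarray:
        return np.zeros(int(duration * 44100))  # placeholder PCM

class EffectLayer:
    """Applies the per-clip and per-track effect chains."""
    def apply(self, pcm: np.ndarray) -> np.ndarray:
        return pcm

class MixLayer:
    """Sums processed track buffers into one master buffer."""
    def mix(self, buffers: list) -> np.ndarray:
        return np.sum(buffers, axis=0) if buffers else np.zeros(0)

# Application layer: orchestrates one render pass through the stack.
engine, fx, mixer, hal = AudioEngineLayer(), EffectLayer(), MixLayer(), HardwareAbstractionLayer()
tracks = [engine.decode("a.mp3", 0.0, 1.0), engine.decode("b.wav", 0.0, 1.0)]
hal.write(mixer.mix([fx.apply(t) for t in tracks]))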

Modular design

  • Audio decoder module: supports multiple audio formats
  • Audio effects module: provides a rich set of processing effects
  • Mixer module: handles multi-track mixing
  • Audio analysis module: extracts audio features
  • Audio export module: supports multiple output formats

Performance optimizations

  • Real-time audio processing pipeline
  • Multi-threaded audio decoding
  • Memory pooling to reduce GC pressure
  • SIMD-optimized audio algorithms

12.1.2 Core Features

The audio processing system provides a comprehensive set of capabilities:

Audio format support

  • Common formats: MP3, AAC, WAV, FLAC, OGG
  • Lossless formats: ALAC, APE, WAV
  • Professional formats: PCM, IEEE float
  • Multi-channel support: stereo, 5.1 surround, 7.1 surround

Audio processing effects

  • Basic effects: volume adjustment, fade in/out, mute (a fade sketch follows this list)
  • Filter effects: low-pass, high-pass, band-pass, band-stop
  • Dynamics: compressor, limiter, expander, noise gate
  • Time-based: speed change, pitch shift, time stretching
  • Spatial effects: reverb, delay, chorus, flanger
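
To make the basic effects concrete, here is a minimal fade-in sketch operating on mono float PCM; the apply_fade_in helper is illustrative and not part of the actual source:

# A minimal fade-in sketch, assuming mono float PCM at a known sample rate.
# apply_fade_in is illustrative; it does not appear in the actual source.
import numpy as np

def apply_fade_in(pcm: np.ndarray, sample_rate: int, duration: float,
                  curve: str = "linear") -> np.ndarray:
    n = min(len(pcm), int(duration * sample_rate))
    ramp = np.linspace(0.0, 1.0, n)
    if curve == "exponential":          # slow start, fast finish
        ramp = ramp ** 2
    out = pcm.copy()
    out[:n] *= ramp                     # scale only the fade region
    return out

faded = apply_fade_in(np.ones(44100), 44100, duration=0.5)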

Real-time processing capabilities

  • Low-latency audio processing (see the block sketch after this list)
  • Real-time audio monitoring
  • Real-time effect preview
  • Real-time audio analysis
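
A quick sketch of why block size matters for latency: real-time pipelines process audio in small fixed-size blocks, and the block size sets the latency floor. The numbers and the callback shape below are illustrative:

# Block-based real-time processing sketch; values are illustrative.
import numpy as np

sample_rate = 48000
block_size = 256                                   # samples per callback
latency_ms = block_size / sample_rate * 1000
print(f"per-block latency: {latency_ms:.1f} ms")   # ~5.3 ms

def audio_callback(in_block: np.ndarray) -> np.ndarray:
    """Called once per block by the audio device; must return in time."""
    return in_block * 0.5                          # trivial per-block processing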

Mixing features

  • Multi-track mixing
  • Automated parameter control (see the envelope sketch after this list)
  • Sidechain compression
  • Mastering
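
As a taste of automated parameter control, the sketch below samples a breakpoint volume envelope with linear interpolation; the envelope_value helper and the breakpoint values are illustrative:

# A minimal sketch of parameter automation: a volume envelope defined by
# (time, value) breakpoints, sampled per audio block. Illustrative only.
import numpy as np

def envelope_value(points: list, t: float) -> float:
    """Linearly interpolate an automation curve at time t."""
    times = [p[0] for p in points]
    values = [p[1] for p in points]
    return float(np.interp(t, times, values))

# Fade the track from full volume down to 20% between t=2s and t=4s.
volume_curve = [(0.0, 1.0), (2.0, 1.0), (4.0, 0.2)]
block = np.ones(1024)
gain = envelope_value(volume_curve, t=3.0)   # 0.6, halfway down the ramp
block *= gain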

12.2 Audio Infrastructure

12.2.1 Audio Segment Model Design

An audio segment is the basic unit of the audio system; it manages a single audio file and its properties:

from typing import Dict, List, Optional, Tuple, Any, Union
from dataclasses import dataclass, field
from enum import Enum
import numpy as np
import time
import threading
from pathlib import Path

class AudioFormat(Enum):
    """音频格式枚举"""
    MP3 = "mp3"
    AAC = "aac"
    WAV = "wav"
    FLAC = "flac"
    OGG = "ogg"
    M4A = "m4a"
    PCM = "pcm"

class AudioChannelMode(Enum):
    """音频声道模式"""
    MONO = 1      # 单声道
    STEREO = 2    # 立体声
    SURROUND_51 = 6   # 5.1环绕声
    SURROUND_71 = 8   # 7.1环绕声

class AudioSampleRate(Enum):
    """音频采样率"""
    SR_8000 = 8000
    SR_16000 = 16000
    SR_22050 = 22050
    SR_44100 = 44100
    SR_48000 = 48000
    SR_96000 = 96000
    SR_192000 = 192000

class AudioBitDepth(Enum):
    """音频位深度"""
    BIT_8 = 8
    BIT_16 = 16
    BIT_24 = 24
    BIT_32 = 32
    BIT_64 = 64

@dataclass
class AudioMetadata:
    """音频元数据"""
    duration: float = 0.0
    sample_rate: int = 44100
    channels: int = 2
    bit_depth: int = 16
    bitrate: int = 0
    format: str = ""
    codec: str = ""
    file_size: int = 0
    
    # Audio characteristics
    loudness: float = 0.0  # LUFS
    peak_level: float = 0.0  # dB
    dynamic_range: float = 0.0  # dB
    
    # Tag information
    title: str = ""
    artist: str = ""
    album: str = ""
    genre: str = ""
    year: int = 0
    track_number: int = 0
    
    def __post_init__(self):
        """初始化后处理"""
        if self.duration < 0:
            self.duration = 0.0
        if self.sample_rate <= 0:
            self.sample_rate = 44100
        if self.channels <= 0:
            self.channels = 2
        if self.bit_depth <= 0:
            self.bit_depth = 16

@dataclass
class AudioSegment:
    """音频片段类"""
    
    # 基础属性
    id: str = ""
    file_path: str = ""
    name: str = ""
    start_time: float = 0.0  # 在时间轴上的开始时间
    duration: float = 0.0    # 持续时间
    original_duration: float = 0.0  # 原始文件持续时间
    
    # 音频属性
    volume: float = 1.0      # 音量 (0.0-2.0)
    pan: float = 0.0          # 声像 (-1.0左, 0.0中, 1.0右)
    pitch: float = 1.0        # 音调 (0.5-2.0)
    speed: float = 1.0        # 播放速度 (0.5-2.0)
    
    # 音频参数
    sample_rate: int = 44100
    channels: int = 2
    bit_depth: int = 16
    
    # 状态属性
    muted: bool = False
    solo: bool = False
    locked: bool = False
    visible: bool = True
    layer: int = 0
    
    # 效果链
    effects: List[str] = field(default_factory=list)
    effect_params: Dict[str, Any] = field(default_factory=dict)
    
    # 缓存和内部状态
    _audio_data: Optional[np.ndarray] = None
    _waveform_data: Optional[np.ndarray] = None
    _cached_samples: Optional[np.ndarray] = None
    _metadata: Optional[AudioMetadata] = None
    _last_access_time: float = 0.0
    _is_loaded: bool = False
    _load_lock: threading.Lock = field(default_factory=threading.Lock)
    
    def __post_init__(self):
        """初始化后处理"""
        if not self.id:
            self.id = f"audio_{int(time.time() * 1000)}"
        
        if not self.name and self.file_path:
            self.name = Path(self.file_path).stem
        
        self._validate_properties()
    
    def _validate_properties(self) -> None:
        """Validate and clamp property values."""
        # Clamp numeric ranges
        self.volume = max(0.0, min(2.0, self.volume))
        self.pan = max(-1.0, min(1.0, self.pan))
        self.pitch = max(0.5, min(2.0, self.pitch))
        self.speed = max(0.5, min(2.0, self.speed))
        
        # Validate audio parameters
        if self.sample_rate <= 0:
            self.sample_rate = 44100
        if self.channels <= 0:
            self.channels = 2
        if self.bit_depth <= 0:
            self.bit_depth = 16
        
        # Validate time parameters
        if self.duration < 0:
            self.duration = 0.0
        if self.start_time < 0:
            self.start_time = 0.0
    
    def load_audio_data(self, force_reload: bool = False) -> bool:
        """Load the audio data."""
        if self._is_loaded and not force_reload:
            return True
        
        with self._load_lock:
            try:
                # A real implementation would use an audio I/O library such as
                # librosa or soundfile; simulated data is used here instead.
                if not self.file_path or not Path(self.file_path).exists():
                    return False
                
                # Simulate loading the audio data
                total_samples = int(self.duration * self.sample_rate)
                if total_samples > 0:
                    # Generate simulated audio (a sine wave)
                    t = np.linspace(0, self.duration, total_samples)
                    frequency = 440.0  # the note A4
                    
                    if self.channels == 1:
                        self._audio_data = np.sin(2 * np.pi * frequency * t)
                    else:
                        # Stereo: slightly detune the right channel
                        left_channel = np.sin(2 * np.pi * frequency * t)
                        right_channel = np.sin(2 * np.pi * frequency * t * 1.01)
                        self._audio_data = np.column_stack([left_channel, right_channel])
                    
                    # Bake the clip volume into the samples (simplified)
                    self._audio_data *= self.volume
                    
                    self._is_loaded = True
                    self._last_access_time = time.time()
                    
                    return True
                
            except Exception as e:
                print(f"Failed to load audio data: {e}")
                return False
        
        return False
    
    def unload_audio_data(self) -> None:
        """卸载音频数据"""
        with self._load_lock:
            self._audio_data = None
            self._cached_samples = None
            self._is_loaded = False
    
    def get_audio_data(self, start_time: float = 0.0, 
                      duration: Optional[float] = None) -> Optional[np.ndarray]:
        """Get a slice of the audio data."""
        if not self.load_audio_data():
            return None
        
        if self._audio_data is None:
            return None
        
        # Compute the sample range
        start_sample = int(start_time * self.sample_rate)
        
        if duration is None:
            end_sample = len(self._audio_data)
        else:
            end_sample = int((start_time + duration) * self.sample_rate)
        
        # Clamp the range
        start_sample = max(0, min(start_sample, len(self._audio_data)))
        end_sample = max(start_sample, min(end_sample, len(self._audio_data)))
        
        if start_sample >= end_sample:
            return np.array([])
        
        # Copy the slice so the in-place effects below cannot
        # mutate the cached buffer
        audio_slice = self._audio_data[start_sample:end_sample].copy()
        
        # Apply the pitch shift (simplified implementation)
        if self.pitch != 1.0:
            audio_slice = self._apply_pitch_shift(audio_slice, self.pitch)
        
        # Apply the speed change (simplified implementation)
        if self.speed != 1.0:
            audio_slice = self._apply_speed_change(audio_slice, self.speed)
        
        # Apply panning
        if self.pan != 0.0 and self.channels == 2:
            audio_slice = self._apply_pan(audio_slice, self.pan)
        
        return audio_slice
    
    def _apply_pitch_shift(self, audio_data: np.ndarray, pitch_factor: float) -> np.ndarray:
        """Apply a pitch shift (simplified implementation)."""
        # A production implementation would use a proper pitch-shifting
        # algorithm such as PSOLA or a phase vocoder; plain resampling is
        # used here as an approximation (it also alters timbre).
        if pitch_factor == 1.0:
            return audio_data
        
        try:
            from scipy import signal
            
            # Resampling to len/pitch_factor samples and playing them back at
            # the original rate raises the pitch by pitch_factor.
            new_length = int(len(audio_data) / pitch_factor)
            
            if audio_data.ndim == 2:  # stereo
                resampled = np.column_stack([
                    signal.resample(audio_data[:, 0], new_length),
                    signal.resample(audio_data[:, 1], new_length)
                ])
            else:  # mono
                resampled = signal.resample(audio_data, new_length)
            
            # Pad or truncate so the clip keeps its original duration
            target_length = len(audio_data)
            if len(resampled) > target_length:
                return resampled[:target_length]
            else:
                if resampled.ndim == 2:
                    padding = np.zeros((target_length - len(resampled), resampled.shape[1]))
                    return np.vstack([resampled, padding])
                else:
                    padding = np.zeros(target_length - len(resampled))
                    return np.concatenate([resampled, padding])
        
        except ImportError:
            print("scipy is not installed; pitch shifting is unavailable")
            return audio_data
        
        except Exception as e:
            print(f"Pitch shift failed: {e}")
            return audio_data
    
    def _apply_speed_change(self, audio_data: np.ndarray, speed_factor: float) -> np.ndarray:
        """Apply a speed change (simplified implementation)."""
        if speed_factor == 1.0:
            return audio_data
        
        try:
            from scipy import signal
            
            # A faster clip has fewer samples: new length = old length / speed
            new_length = int(len(audio_data) / speed_factor)
            
            # Resample
            if audio_data.ndim == 2:  # stereo
                resampled = np.column_stack([
                    signal.resample(audio_data[:, 0], new_length),
                    signal.resample(audio_data[:, 1], new_length)
                ])
            else:  # mono
                resampled = signal.resample(audio_data, new_length)
            
            return resampled
        
        except ImportError:
            print("scipy is not installed; speed change is unavailable")
            return audio_data
        
        except Exception as e:
            print(f"Speed change failed: {e}")
            return audio_data
    
    def _apply_pan(self, audio_data: np.ndarray, pan_value: float) -> np.ndarray:
        """Apply panning (in place, on an already-copied slice)."""
        if audio_data.ndim != 2 or audio_data.shape[1] != 2:
            return audio_data
        
        # Compute the left/right channel gains
        if pan_value < 0:  # pan left
            left_gain = 1.0
            right_gain = 1.0 + pan_value  # pan_value is negative
        else:  # pan right
            left_gain = 1.0 - pan_value
            right_gain = 1.0
        
        # Apply the gains
        audio_data[:, 0] *= left_gain
        audio_data[:, 1] *= right_gain
        
        return audio_data
    
    def get_waveform_data(self, width: int = 1000, height: int = 100) -> np.ndarray:
        """Get peak waveform data, one (left, right) pair per horizontal pixel.
        
        `height` is accepted for API symmetry but unused in this simplified version.
        """
        if not self.load_audio_data():
            return np.zeros((width, 2))
        
        if self._audio_data is None:
            return np.zeros((width, 2))
        
        # Compute the waveform data
        samples_per_pixel = max(1, len(self._audio_data) // width)
        waveform = []
        
        for i in range(width):
            start_idx = i * samples_per_pixel
            end_idx = min((i + 1) * samples_per_pixel, len(self._audio_data))
            
            if start_idx < end_idx:
                # Peak amplitude within this pixel's window
                if self._audio_data.ndim == 2:  # stereo
                    left_max = np.max(np.abs(self._audio_data[start_idx:end_idx, 0]))
                    right_max = np.max(np.abs(self._audio_data[start_idx:end_idx, 1]))
                    waveform.append([left_max, right_max])
                else:  # mono
                    max_amp = np.max(np.abs(self._audio_data[start_idx:end_idx]))
                    waveform.append([max_amp, max_amp])
            else:
                waveform.append([0.0, 0.0])
        
        return np.array(waveform)
    
    def get_spectrum_data(self, fft_size: int = 2048) -> np.ndarray:
        """Get spectrum data in dB (fft_size // 2 bins)."""
        if not self.load_audio_data():
            return np.zeros(fft_size // 2)
        
        if self._audio_data is None or len(self._audio_data) == 0:
            return np.zeros(fft_size // 2)
        
        try:
            # Use the first channel
            if self._audio_data.ndim == 2:
                audio_data = self._audio_data[:, 0]
            else:
                audio_data = self._audio_data
            
            # Apply a Hann window
            window = np.hanning(min(fft_size, len(audio_data)))
            audio_windowed = audio_data[:len(window)] * window
            
            # Compute the FFT (zero-padded to fft_size so the output size is stable)
            fft_data = np.fft.fft(audio_windowed, n=fft_size)
            magnitude = np.abs(fft_data[:fft_size // 2])
            
            # Convert to dB
            magnitude_db = 20 * np.log10(np.maximum(magnitude, 1e-10))
            
            return magnitude_db
        
        except Exception as e:
            print(f"Spectrum computation failed: {e}")
            return np.zeros(fft_size // 2)
    
    def apply_effect(self, effect_name: str, params: Dict[str, Any]) -> bool:
        """Apply an audio effect."""
        if effect_name not in self.effects:
            self.effects.append(effect_name)
        
        self.effect_params[effect_name] = params
        
        # Mark the cached samples as stale
        self._cached_samples = None
        
        return True
    
    def remove_effect(self, effect_name: str) -> bool:
        """Remove an audio effect."""
        if effect_name in self.effects:
            self.effects.remove(effect_name)
            if effect_name in self.effect_params:
                del self.effect_params[effect_name]
            
            # Mark the cached samples as stale
            self._cached_samples = None
            return True
        
        return False
    
    def clear_effects(self) -> None:
        """Remove all effects."""
        self.effects.clear()
        self.effect_params.clear()
        self._cached_samples = None
    
    def get_effects(self) -> List[str]:
        """Get the list of effects."""
        return self.effects.copy()
    
    def is_active_at_time(self, current_time: float) -> bool:
        """Return True if the segment is audible at the given time."""
        return (self.visible and not self.muted and 
                self.start_time <= current_time <= self.start_time + self.duration)
    
    def set_volume(self, volume: float) -> None:
        """Set the volume."""
        self.volume = max(0.0, min(2.0, volume))
        self._cached_samples = None
    
    def set_pan(self, pan: float) -> None:
        """Set the pan position."""
        self.pan = max(-1.0, min(1.0, pan))
        self._cached_samples = None
    
    def set_pitch(self, pitch: float) -> None:
        """Set the pitch."""
        self.pitch = max(0.5, min(2.0, pitch))
        self._cached_samples = None
    
    def set_speed(self, speed: float) -> None:
        """Set the playback speed."""
        self.speed = max(0.5, min(2.0, speed))
        # Update the duration
        if self.original_duration > 0:
            self.duration = self.original_duration / speed
        self._cached_samples = None
    
    def fade_in(self, duration: float, curve: str = "linear") -> None:
        """Register a fade-in effect."""
        self.apply_effect("fade_in", {
            "duration": duration,
            "curve": curve
        })
    
    def fade_out(self, duration: float, curve: str = "linear") -> None:
        """Register a fade-out effect."""
        self.apply_effect("fade_out", {
            "duration": duration,
            "curve": curve
        })
    
    def normalize(self, target_level: float = -1.0) -> None:
        """Normalize the peak level to target_level dBFS."""
        if not self.load_audio_data():
            return
        
        if self._audio_data is None or len(self._audio_data) == 0:
            return
        
        # Find the peak
        peak = np.max(np.abs(self._audio_data))
        
        if peak > 0:
            # Compute the gain
            gain = (10 ** (target_level / 20)) / peak
            
            # Apply the gain
            self._audio_data *= gain
            
            # Keep the volume property in sync so a reload reproduces the level
            self.volume *= gain
    
    def trim_silence(self, threshold: float = -40.0, 
                    min_silence_duration: float = 0.1) -> Tuple[float, float]:
        """Trim silence; returns (start_time, end_time) of the non-silent region."""
        if not self.load_audio_data():
            return (0.0, self.duration)
        
        if self._audio_data is None or len(self._audio_data) == 0:
            return (0.0, self.duration)
        
        try:
            # Per-sample amplitude envelope (average the channels for stereo)
            if self._audio_data.ndim == 2:
                amplitude = np.mean(np.abs(self._audio_data), axis=1)
            else:
                amplitude = np.abs(self._audio_data)
            
            # Convert the dB threshold to linear amplitude
            threshold_linear = 10 ** (threshold / 20)
            is_silent = amplitude < threshold_linear
            
            # Simplified implementation: find the first and last non-silent
            # samples (a fuller version would use min_silence_duration to
            # ignore gaps shorter than that)
            non_silent_indices = np.where(~is_silent)[0]
            
            if len(non_silent_indices) == 0:
                return (0.0, self.duration)
            
            start_sample = non_silent_indices[0]
            end_sample = non_silent_indices[-1]
            
            # Convert to seconds
            start_time = start_sample / self.sample_rate
            end_time = end_sample / self.sample_rate
            
            return (start_time, end_time)
        
        except Exception as e:
            print(f"Silence trimming failed: {e}")
            return (0.0, self.duration)
    
    def to_dict(self) -> Dict[str, Any]:
        """转换为字典"""
        return {
            "id": self.id,
            "file_path": self.file_path,
            "name": self.name,
            "start_time": self.start_time,
            "duration": self.duration,
            "original_duration": self.original_duration,
            "volume": self.volume,
            "pan": self.pan,
            "pitch": self.pitch,
            "speed": self.speed,
            "sample_rate": self.sample_rate,
            "channels": self.channels,
            "bit_depth": self.bit_depth,
            "muted": self.muted,
            "solo": self.solo,
            "locked": self.locked,
            "visible": self.visible,
            "layer": self.layer,
            "effects": self.effects,
            "effect_params": self.effect_params
        }
    
    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> 'AudioSegment':
        """Create a segment from a dictionary."""
        # Copy first so the caller's dict is not mutated, then drop
        # any cache-related fields
        data = dict(data)
        data.pop("_audio_data", None)
        data.pop("_waveform_data", None)
        data.pop("_cached_samples", None)
        data.pop("_metadata", None)
        data.pop("_last_access_time", None)
        data.pop("_is_loaded", None)
        data.pop("_load_lock", None)
        
        return cls(**data)
    
    def copy(self) -> 'AudioSegment':
        """Create a copy."""
        data = self.to_dict()
        data["id"] = f"{self.id}_copy"
        return AudioSegment.from_dict(data)
    
    def __str__(self) -> str:
        """Short string representation."""
        return f"AudioSegment(id='{self.id}', name='{self.name}', " \
               f"time={self.start_time:.1f}-{self.start_time + self.duration:.1f}s, " \
               f"vol={self.volume:.2f}, pitch={self.pitch:.2f})"
    
    def __repr__(self) -> str:
        """Detailed string representation."""
        return f"AudioSegment(id='{self.id}', name='{self.name}', " \
               f"file='{self.file_path}', duration={self.duration:.2f}s, " \
               f"sr={self.sample_rate}, channels={self.channels})"

12.2.2 Audio Effect Processors

Audio effect processors are responsible for applying the various audio effects; together they provide a rich set of processing functions:

from abc import ABC, abstractmethod
import numpy as np
from typing import Dict, Any, Optional, List

class AudioEffect(ABC):
    """Base class for audio effects."""
    
    def __init__(self, name: str, parameters: Optional[Dict[str, Any]] = None):
        self.name = name
        self.parameters = parameters or {}
        self.enabled = True
        self.bypass = False
    
    @abstractmethod
    def process(self, audio_data: np.ndarray, sample_rate: int) -> np.ndarray:
        """Process audio data."""
        pass
    
    @abstractmethod
    def get_parameters_info(self) -> Dict[str, Dict[str, Any]]:
        """Get parameter metadata."""
        pass
    
    def set_parameter(self, name: str, value: Any) -> bool:
        """Set a parameter."""
        if name in self.parameters:
            self.parameters[name] = value
            return True
        return False
    
    def get_parameter(self, name: str) -> Any:
        """Get a parameter."""
        return self.parameters.get(name)
    
    def enable(self) -> None:
        """Enable the effect."""
        self.enabled = True
        self.bypass = False
    
    def disable(self) -> None:
        """Disable the effect."""
        self.enabled = False
    
    def bypass_effect(self) -> None:
        """Bypass the effect."""
        self.bypass = True
    
    def reset(self) -> None:
        """Reset the effect's internal state."""
        pass

class VolumeEffect(AudioEffect):
    """Volume (gain) effect."""
    
    def __init__(self, gain_db: float = 0.0):
        super().__init__("volume", {"gain_db": gain_db})
    
    def process(self, audio_data: np.ndarray, sample_rate: int) -> np.ndarray:
        """Process audio data."""
        if not self.enabled or self.bypass:
            return audio_data
        
        gain_linear = 10 ** (self.parameters["gain_db"] / 20)
        return audio_data * gain_linear
    
    def get_parameters_info(self) -> Dict[str, Dict[str, Any]]:
        """Get parameter metadata."""
        return {
            "gain_db": {
                "type": "float",
                "min": -60.0,
                "max": 20.0,
                "default": 0.0,
                "unit": "dB",
                "description": "Gain (dB)"
            }
        }

class PanEffect(AudioEffect):
    """Pan effect."""
    
    def __init__(self, pan: float = 0.0):
        super().__init__("pan", {"pan": pan})
    
    def process(self, audio_data: np.ndarray, sample_rate: int) -> np.ndarray:
        """Process audio data."""
        if not self.enabled or self.bypass:
            return audio_data
        
        if audio_data.ndim != 2 or audio_data.shape[1] != 2:
            return audio_data  # only meaningful for stereo
        
        pan = self.parameters["pan"]
        
        # Compute the left/right channel gains
        if pan < 0:  # pan left
            left_gain = 1.0
            right_gain = 1.0 + pan
        else:  # pan right
            left_gain = 1.0 - pan
            right_gain = 1.0
        
        # Square-root gains approximate a constant-power pan law
        left_gain = np.sqrt(left_gain)
        right_gain = np.sqrt(right_gain)
        
        # Copy so the caller's buffer is not mutated in place
        audio_data = audio_data.copy()
        audio_data[:, 0] *= left_gain
        audio_data[:, 1] *= right_gain
        
        return audio_data
    
    def get_parameters_info(self) -> Dict[str, Dict[str, Any]]:
        """Get parameter metadata."""
        return {
            "pan": {
                "type": "float",
                "min": -1.0,
                "max": 1.0,
                "default": 0.0,
                "unit": "",
                "description": "Pan position (-1 left, 0 center, 1 right)"
            }
        }

class LowPassFilterEffect(AudioEffect):
    """Low-pass filter."""
    
    def __init__(self, cutoff_freq: float = 1000.0, resonance: float = 1.0):
        super().__init__("lowpass", {"cutoff_freq": cutoff_freq, "resonance": resonance})
        self._state = None
    
    def process(self, audio_data: np.ndarray, sample_rate: int) -> np.ndarray:
        """Process audio data."""
        if not self.enabled or self.bypass:
            return audio_data
        
        try:
            from scipy import signal
            
            cutoff_freq = self.parameters["cutoff_freq"]
            resonance = self.parameters["resonance"]
            
            # Design a Butterworth low-pass filter
            nyquist = sample_rate / 2
            normalized_cutoff = min(cutoff_freq / nyquist, 0.99)
            
            # Map resonance to filter order (a simplification)
            order = int(resonance * 4) + 1
            order = min(order, 8)  # cap the order
            
            # Design the filter
            b, a = signal.butter(order, normalized_cutoff, btype='low')
            
            # Apply the filter
            if audio_data.ndim == 2:  # stereo
                filtered = np.column_stack([
                    signal.filtfilt(b, a, audio_data[:, 0]),
                    signal.filtfilt(b, a, audio_data[:, 1])
                ])
            else:  # mono
                filtered = signal.filtfilt(b, a, audio_data)
            
            return filtered
        
        except ImportError:
            print("scipy is not installed; filtering is unavailable")
            return audio_data
        
        except Exception as e:
            print(f"Low-pass filtering failed: {e}")
            return audio_data
    
    def get_parameters_info(self) -> Dict[str, Dict[str, Any]]:
        """Get parameter metadata."""
        return {
            "cutoff_freq": {
                "type": "float",
                "min": 20.0,
                "max": 20000.0,
                "default": 1000.0,
                "unit": "Hz",
                "description": "Cutoff frequency"
            },
            "resonance": {
                "type": "float",
                "min": 0.1,
                "max": 10.0,
                "default": 1.0,
                "unit": "",
                "description": "Resonance"
            }
        }

class HighPassFilterEffect(AudioEffect):
    """High-pass filter."""
    
    def __init__(self, cutoff_freq: float = 100.0, resonance: float = 1.0):
        super().__init__("highpass", {"cutoff_freq": cutoff_freq, "resonance": resonance})
    
    def process(self, audio_data: np.ndarray, sample_rate: int) -> np.ndarray:
        """Process audio data."""
        if not self.enabled or self.bypass:
            return audio_data
        
        try:
            from scipy import signal
            
            cutoff_freq = self.parameters["cutoff_freq"]
            resonance = self.parameters["resonance"]
            
            # Design a Butterworth high-pass filter
            nyquist = sample_rate / 2
            normalized_cutoff = min(cutoff_freq / nyquist, 0.99)
            
            # Map resonance to filter order (a simplification)
            order = int(resonance * 4) + 1
            order = min(order, 8)
            
            # Design the filter
            b, a = signal.butter(order, normalized_cutoff, btype='high')
            
            # Apply the filter
            if audio_data.ndim == 2:  # stereo
                filtered = np.column_stack([
                    signal.filtfilt(b, a, audio_data[:, 0]),
                    signal.filtfilt(b, a, audio_data[:, 1])
                ])
            else:  # mono
                filtered = signal.filtfilt(b, a, audio_data)
            
            return filtered
        
        except ImportError:
            print("scipy is not installed; filtering is unavailable")
            return audio_data
        
        except Exception as e:
            print(f"High-pass filtering failed: {e}")
            return audio_data
    
    def get_parameters_info(self) -> Dict[str, Dict[str, Any]]:
        """Get parameter metadata."""
        return {
            "cutoff_freq": {
                "type": "float",
                "min": 20.0,
                "max": 20000.0,
                "default": 100.0,
                "unit": "Hz",
                "description": "Cutoff frequency"
            },
            "resonance": {
                "type": "float",
                "min": 0.1,
                "max": 10.0,
                "default": 1.0,
                "unit": "",
                "description": "Resonance"
            }
        }

class CompressorEffect(AudioEffect):
    """Compressor effect."""
    
    def __init__(self, threshold_db: float = -18.0, ratio: float = 4.0, 
                 attack_ms: float = 10.0, release_ms: float = 100.0, 
                 makeup_gain_db: float = 0.0):
        super().__init__("compressor", {
            "threshold_db": threshold_db,
            "ratio": ratio,
            "attack_ms": attack_ms,
            "release_ms": release_ms,
            "makeup_gain_db": makeup_gain_db
        })
        self._envelope = 0.0
        self._attack_coeff = 0.0
        self._release_coeff = 0.0
        self._update_coefficients(44100)  # default sample rate
    
    def _update_coefficients(self, sample_rate: int) -> None:
        """Recompute the one-pole envelope coefficients."""
        attack_time = self.parameters["attack_ms"] / 1000.0
        release_time = self.parameters["release_ms"] / 1000.0
        
        self._attack_coeff = 1.0 - np.exp(-1.0 / (attack_time * sample_rate))
        self._release_coeff = 1.0 - np.exp(-1.0 / (release_time * sample_rate))
    
    def process(self, audio_data: np.ndarray, sample_rate: int) -> np.ndarray:
        """Process audio data."""
        if not self.enabled or self.bypass:
            return audio_data
        
        self._update_coefficients(sample_rate)
        
        threshold = 10 ** (self.parameters["threshold_db"] / 20)
        ratio = self.parameters["ratio"]
        makeup_gain = 10 ** (self.parameters["makeup_gain_db"] / 20)
        
        # Per-sample level (RMS across channels for stereo, |x| for mono)
        if audio_data.ndim == 2:
            rms = np.sqrt(np.mean(audio_data ** 2, axis=1))
        else:
            rms = np.sqrt(audio_data ** 2)
        
        # Apply compression
        processed = audio_data.copy()
        
        for i, level in enumerate(rms):
            # Update the envelope follower
            if level > self._envelope:
                self._envelope += self._attack_coeff * (level - self._envelope)
            else:
                self._envelope += self._release_coeff * (level - self._envelope)
            
            # Compute the gain reduction above the threshold
            if self._envelope > threshold:
                gain_reduction = threshold / self._envelope * (1.0 - 1.0 / ratio) + 1.0 / ratio
                if audio_data.ndim == 2:
                    processed[i, :] *= gain_reduction
                else:
                    processed[i] *= gain_reduction
        
        # Apply the makeup gain
        processed *= makeup_gain
        
        return processed
    
    def get_parameters_info(self) -> Dict[str, Dict[str, Any]]:
        """Get parameter metadata."""
        return {
            "threshold_db": {
                "type": "float",
                "min": -60.0,
                "max": 0.0,
                "default": -18.0,
                "unit": "dB",
                "description": "Threshold"
            },
            "ratio": {
                "type": "float",
                "min": 1.0,
                "max": 20.0,
                "default": 4.0,
                "unit": ":1",
                "description": "Compression ratio"
            },
            "attack_ms": {
                "type": "float",
                "min": 0.1,
                "max": 100.0,
                "default": 10.0,
                "unit": "ms",
                "description": "Attack time"
            },
            "release_ms": {
                "type": "float",
                "min": 10.0,
                "max": 1000.0,
                "default": 100.0,
                "unit": "ms",
                "description": "Release time"
            },
            "makeup_gain_db": {
                "type": "float",
                "min": 0.0,
                "max": 30.0,
                "default": 0.0,
                "unit": "dB",
                "description": "Makeup gain"
            }
        }

class ReverbEffect(AudioEffect):
    """Reverb effect."""
    
    def __init__(self, room_size: float = 0.5, damping: float = 0.5, 
                 wet_level: float = 0.3, dry_level: float = 0.7,
                 width: float = 1.0):
        super().__init__("reverb", {
            "room_size": room_size,
            "damping": damping,
            "wet_level": wet_level,
            "dry_level": dry_level,
            "width": width
        })
        self._delay_buffers = []
        self._delay_lengths = []
        self._delay_index = 0
        self._initialize_delay_buffers(44100)
    
    def _initialize_delay_buffers(self, sample_rate: int) -> None:
        """Initialize the delay buffers."""
        # Simplified reverb built from a few parallel delay lines
        delay_times = [0.0297, 0.0371, 0.0411, 0.0437]  # delay times in seconds
        self._delay_lengths = [int(t * sample_rate) for t in delay_times]
        self._delay_buffers = [np.zeros(length) for length in self._delay_lengths]
        self._delay_index = 0
    
    def process(self, audio_data: np.ndarray, sample_rate: int) -> np.ndarray:
        """Process audio data."""
        if not self.enabled or self.bypass:
            return audio_data
        
        # Reinitialize the delay buffers (note: this resets the reverb tail on
        # every call, which is only acceptable for offline block processing)
        self._initialize_delay_buffers(sample_rate)
        
        room_size = self.parameters["room_size"]
        damping = self.parameters["damping"]
        wet_level = self.parameters["wet_level"]
        dry_level = self.parameters["dry_level"]
        width = self.parameters["width"]
        
        # Simplified reverb processing
        processed = audio_data.copy()
        
        if audio_data.ndim == 2:  # stereo
            for i in range(len(audio_data)):
                # Dry signal
                dry_left = audio_data[i, 0]
                dry_right = audio_data[i, 1]
                
                # Compute the wet signal (simplified)
                wet_left = 0.0
                wet_right = 0.0
                
                for j, delay_buffer in enumerate(self._delay_buffers):
                    # Read from the delay line
                    delayed_sample = delay_buffer[self._delay_index]
                    
                    # Write the current sample back in
                    delay_buffer[self._delay_index] = (dry_left + dry_right) * 0.5 * room_size
                    
                    # Accumulate the wet signal, alternating lines left/right
                    if j % 2 == 0:
                        wet_left += delayed_sample * (1.0 - damping)
                    else:
                        wet_right += delayed_sample * (1.0 - damping)
                
                # Mix dry and wet
                processed[i, 0] = dry_left * dry_level + wet_left * wet_level * width
                processed[i, 1] = dry_right * dry_level + wet_right * wet_level * width
                
                # Advance the shared index (bounded by the shortest buffer)
                self._delay_index = (self._delay_index + 1) % len(self._delay_buffers[0])
        
        else:  # mono
            for i in range(len(audio_data)):
                dry = audio_data[i]
                wet = 0.0
                
                for delay_buffer in self._delay_buffers:
                    delayed_sample = delay_buffer[self._delay_index]
                    delay_buffer[self._delay_index] = dry * room_size
                    wet += delayed_sample * (1.0 - damping)
                
                processed[i] = dry * dry_level + wet * wet_level
                self._delay_index = (self._delay_index + 1) % len(self._delay_buffers[0])
        
        return processed
    
    def get_parameters_info(self) -> Dict[str, Dict[str, Any]]:
        """Get parameter metadata."""
        return {
            "room_size": {
                "type": "float",
                "min": 0.0,
                "max": 1.0,
                "default": 0.5,
                "unit": "",
                "description": "Room size"
            },
            "damping": {
                "type": "float",
                "min": 0.0,
                "max": 1.0,
                "default": 0.5,
                "unit": "",
                "description": "Damping"
            },
            "wet_level": {
                "type": "float",
                "min": 0.0,
                "max": 1.0,
                "default": 0.3,
                "unit": "",
                "description": "Wet level"
            },
            "dry_level": {
                "type": "float",
                "min": 0.0,
                "max": 1.0,
                "default": 0.7,
                "unit": "",
                "description": "Dry level"
            },
            "width": {
                "type": "float",
                "min": 0.0,
                "max": 1.0,
                "default": 1.0,
                "unit": "",
                "description": "Stereo width"
            }
        }

class DelayEffect(AudioEffect):
    """Delay effect."""
    
    def __init__(self, delay_time_ms: float = 500.0, feedback: float = 0.5,
                 wet_level: float = 0.5, dry_level: float = 0.5):
        super().__init__("delay", {
            "delay_time_ms": delay_time_ms,
            "feedback": feedback,
            "wet_level": wet_level,
            "dry_level": dry_level
        })
        self._delay_buffer = None
        self._delay_index = 0
        self._delay_length = 0
    
    def process(self, audio_data: np.ndarray, sample_rate: int) -> np.ndarray:
        """Process audio data."""
        if not self.enabled or self.bypass:
            return audio_data
        
        delay_time_ms = self.parameters["delay_time_ms"]
        feedback = self.parameters["feedback"]
        wet_level = self.parameters["wet_level"]
        dry_level = self.parameters["dry_level"]
        
        # (Re)allocate the delay buffer to match the delay time and the
        # channel count of the input
        delay_samples = max(1, int(delay_time_ms * sample_rate / 1000.0))
        stereo = audio_data.ndim == 2
        shape = (delay_samples, 2) if stereo else (delay_samples,)
        if self._delay_buffer is None or self._delay_buffer.shape != shape:
            self._delay_buffer = np.zeros(shape)
            self._delay_length = delay_samples
            self._delay_index = 0
        
        processed = audio_data.copy()
        
        if stereo:
            for i in range(len(audio_data)):
                dry = audio_data[i]                             # (left, right)
                delayed = self._delay_buffer[self._delay_index].copy()
                
                # Mix the dry and wet signals
                processed[i] = dry * dry_level + delayed * wet_level
                
                # Write back into the delay line with feedback
                self._delay_buffer[self._delay_index] = dry + delayed * feedback
                
                # Advance the circular buffer index
                self._delay_index = (self._delay_index + 1) % self._delay_length
        
        else:  # mono
            for i in range(len(audio_data)):
                dry = audio_data[i]
                delayed = self._delay_buffer[self._delay_index]
                
                processed[i] = dry * dry_level + delayed * wet_level
                self._delay_buffer[self._delay_index] = dry + delayed * feedback
                self._delay_index = (self._delay_index + 1) % self._delay_length
        
        return processed
    
    def get_parameters_info(self) -> Dict[str, Dict[str, Any]]:
        """Get parameter metadata."""
        return {
            "delay_time_ms": {
                "type": "float",
                "min": 1.0,
                "max": 2000.0,
                "default": 500.0,
                "unit": "ms",
                "description": "Delay time"
            },
            "feedback": {
                "type": "float",
                "min": 0.0,
                "max": 0.95,
                "default": 0.5,
                "unit": "",
                "description": "Feedback"
            },
            "wet_level": {
                "type": "float",
                "min": 0.0,
                "max": 1.0,
                "default": 0.5,
                "unit": "",
                "description": "Wet level"
            },
            "dry_level": {
                "type": "float",
                "min": 0.0,
                "max": 1.0,
                "default": 0.5,
                "unit": "",
                "description": "Dry level"
            }
        }

class AudioEffectChain:
    """Audio effect chain."""
    
    def __init__(self):
        self.effects: List[AudioEffect] = []
        self.enabled = True
        self.bypass = False
    
    def add_effect(self, effect: AudioEffect) -> None:
        """Add an effect."""
        self.effects.append(effect)
    
    def remove_effect(self, effect_name: str) -> bool:
        """Remove an effect by name."""
        for i, effect in enumerate(self.effects):
            if effect.name == effect_name:
                self.effects.pop(i)
                return True
        return False
    
    def get_effect(self, effect_name: str) -> Optional[AudioEffect]:
        """Get an effect by name."""
        for effect in self.effects:
            if effect.name == effect_name:
                return effect
        return None
    
    def process(self, audio_data: np.ndarray, sample_rate: int) -> np.ndarray:
        """Run the audio through every enabled effect in order."""
        if not self.enabled or self.bypass:
            return audio_data
        
        result = audio_data.copy()
        
        for effect in self.effects:
            if effect.enabled and not effect.bypass:
                result = effect.process(result, sample_rate)
        
        return result
    
    def clear_effects(self) -> None:
        """Remove all effects."""
        self.effects.clear()
    
    def get_effects(self) -> List[AudioEffect]:
        """Get all effects."""
        return self.effects.copy()
    
    def reorder_effects(self, new_order: List[int]) -> bool:
        """Reorder the effects; new_order must be a permutation of the indices."""
        if len(new_order) != len(self.effects):
            return False
        
        if set(new_order) != set(range(len(self.effects))):
            return False
        
        # Reorder
        new_effects = [self.effects[i] for i in new_order]
        self.effects = new_effects
        
        return True

class AudioEffectFactory:
    """Audio effect factory."""
    
    _effect_classes = {
        "volume": VolumeEffect,
        "pan": PanEffect,
        "lowpass": LowPassFilterEffect,
        "highpass": HighPassFilterEffect,
        "compressor": CompressorEffect,
        "reverb": ReverbEffect,
        "delay": DelayEffect
    }
    
    @classmethod
    def create_effect(cls, effect_name: str, **kwargs) -> Optional[AudioEffect]:
        """Create an effect by name."""
        if effect_name in cls._effect_classes:
            effect_class = cls._effect_classes[effect_name]
            return effect_class(**kwargs)
        return None
    
    @classmethod
    def get_available_effects(cls) -> List[str]:
        """List the registered effect names."""
        return list(cls._effect_classes.keys())
    
    @classmethod
    def register_effect(cls, effect_name: str, effect_class: type) -> None:
        """Register a new effect type."""
        cls._effect_classes[effect_name] = effect_class
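
To show how the pieces fit together, here is a hedged usage sketch that builds a chain from the factory and runs a test tone through it. The parameter values are illustrative, and the filter stage assumes scipy is available (otherwise it passes audio through unchanged):

# Illustrative usage: factory-built chain processing a one-second test tone
import numpy as np

chain = AudioEffectChain()
chain.add_effect(AudioEffectFactory.create_effect("highpass", cutoff_freq=80.0))
chain.add_effect(AudioEffectFactory.create_effect("compressor", threshold_db=-20.0, ratio=3.0))
chain.add_effect(AudioEffectFactory.create_effect("volume", gain_db=-3.0))

sample_rate = 44100
t = np.linspace(0, 1.0, sample_rate)
tone = 0.5 * np.sin(2 * np.pi * 220.0 * t)      # mono 220 Hz test signal

wet = chain.process(tone, sample_rate)
print(AudioEffectFactory.get_available_effects())
print(wet.shape)                                # same length as the input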

12.3 Audio Tracks and the Mixing System

12.3.1 Audio Track Management

An audio track manages a collection of audio segments and provides track-level control and processing:

from typing import List, Dict, Optional, Tuple, Any
from collections import defaultdict
import threading
import time

import numpy as np

class AudioTrack:
    """Audio track."""
    
    def __init__(self, name: str = "Audio Track", track_id: str = ""):
        self.track_id = track_id or f"audio_track_{int(time.time() * 1000)}"
        self.name = name
        self.segments: List[AudioSegment] = []
        
        # Track properties
        self.volume: float = 1.0  # track volume
        self.pan: float = 0.0      # track pan
        self.muted: bool = False   # muted
        self.solo: bool = False    # solo
        self.locked: bool = False  # locked
        self.visible: bool = True  # visible
        self.color: str = "#2196F3"  # track color
        
        # Effect chain
        self.effect_chain = AudioEffectChain()
        
        # Index structures for fast lookups
        self._time_index: Dict[float, List[int]] = defaultdict(list)
        self._id_index: Dict[str, int] = {}
        self._layer_index: Dict[int, List[int]] = defaultdict(list)
        
        # Caches
        self._sorted_segments: Optional[List[AudioSegment]] = None
        self._last_update_time: float = 0.0
        
        # Mix cache
        self._mix_cache: Dict[Tuple[float, float], np.ndarray] = {}
        self._cache_lock = threading.Lock()
        
        # Performance statistics
        self.render_stats = {
            "total_mixes": 0,
            "cache_hits": 0,
            "cache_misses": 0,
            "avg_mix_time": 0.0
        }
    
    def add_segment(self, segment: AudioSegment) -> bool:
        """Add an audio segment."""
        if self.locked:
            return False
        
        if segment.id in self._id_index:
            return False
        
        # Append to the list
        self.segments.append(segment)
        index = len(self.segments) - 1
        
        # Update the indices
        self._id_index[segment.id] = index
        self._time_index[segment.start_time].append(index)
        self._layer_index[segment.layer].append(index)
        
        # Invalidate the caches
        self._invalidate_cache()
        
        return True
    
    def remove_segment(self, segment_id: str) -> bool:
        """Remove an audio segment."""
        if self.locked or segment_id not in self._id_index:
            return False
        
        index = self._id_index[segment_id]
        self.segments.pop(index)
        
        # The indices store positional offsets, so removing an element
        # invalidates every entry after it; rebuild them wholesale.
        self._rebuild_indices()
        
        # Invalidate the caches
        self._invalidate_cache()
        
        return True
    
    def get_segment(self, segment_id: str) -> Optional[AudioSegment]:
        """Get a segment by id."""
        if segment_id not in self._id_index:
            return None
        
        index = self._id_index[segment_id]
        return self.segments[index]
    
    def get_segments_at_time(self, time_point: float) -> List[AudioSegment]:
        """Get the segments active at the given time."""
        # The start-time index only covers exact start times, so a linear
        # scan is needed for segments that merely span this point.
        result = [s for s in self.segments if s.is_active_at_time(time_point)]
        
        # Sort by layer
        result.sort(key=lambda s: s.layer)
        
        return result
    
    def get_segments_in_range(self, start_time: float, end_time: float) -> List[AudioSegment]:
        """Get the segments overlapping the given time range."""
        result = []
        
        for segment in self.segments:
            if (segment.start_time < end_time and 
                segment.start_time + segment.duration > start_time):
                result.append(segment)
        
        # Sort by start time, then layer
        result.sort(key=lambda s: (s.start_time, s.layer))
        
        return result
    
    def mix_audio(self, start_time: float, duration: float, 
                  sample_rate: int = 44100) -> np.ndarray:
        """Mix the given time range of this track into a mono buffer."""
        if self.muted or not self.visible:
            return np.zeros(int(duration * sample_rate))
        
        # Check the cache
        cache_key = (start_time, duration)
        with self._cache_lock:
            if cache_key in self._mix_cache:
                self.render_stats["cache_hits"] += 1
                return self._mix_cache[cache_key].copy()
        
        self.render_stats["cache_misses"] += 1
        
        # Collect the segments that overlap this range
        segments = self.get_segments_in_range(start_time, start_time + duration)
        
        if not segments:
            return np.zeros(int(duration * sample_rate))
        
        # Create the mix buffer
        mix_buffer = np.zeros(int(duration * sample_rate))
        
        # Mix in each segment
        for segment in segments:
            if not segment.muted:
                # Portion of the segment that falls inside the mix window
                seg_start_in_mix = max(0.0, segment.start_time - start_time)
                seg_end_in_mix = min(duration, segment.start_time + segment.duration - start_time)
                
                if seg_start_in_mix < seg_end_in_mix:
                    # Fetch the segment's audio for that portion
                    seg_start_time = max(0.0, start_time - segment.start_time)
                    seg_duration = seg_end_in_mix - seg_start_in_mix
                    
                    audio_data = segment.get_audio_data(seg_start_time, seg_duration)
                    
                    if audio_data is not None and len(audio_data) > 0:
                        # Match the target sample rate
                        if segment.sample_rate != sample_rate:
                            audio_data = self._resample_audio(audio_data, segment.sample_rate, sample_rate)
                        
                        # Where this segment lands in the mix buffer
                        mix_start_sample = int(seg_start_in_mix * sample_rate)
                        mix_end_sample = int(seg_end_in_mix * sample_rate)
                        seg_length = min(len(audio_data), mix_end_sample - mix_start_sample)
                        
                        if seg_length > 0:
                            # Sum into the buffer (stereo is downmixed to mono)
                            if audio_data.ndim == 2:
                                mono_data = np.mean(audio_data[:seg_length], axis=1)
                                mix_buffer[mix_start_sample:mix_start_sample + seg_length] += mono_data
                            else:
                                mix_buffer[mix_start_sample:mix_start_sample + seg_length] += audio_data[:seg_length]
        
        # Apply the track effect chain
        if self.effect_chain.enabled and not self.effect_chain.bypass:
            mix_buffer = self.effect_chain.process(mix_buffer, sample_rate)
        
        # Apply the track volume
        mix_buffer *= self.volume
        
        # Cache the result (unbounded here; a real implementation would evict)
        with self._cache_lock:
            self._mix_cache[cache_key] = mix_buffer.copy()
        
        # Update the statistics
        self.render_stats["total_mixes"] += 1
        
        return mix_buffer
    
    def _resample_audio(self, audio_data: np.ndarray, 
                       original_rate: int, target_rate: int) -> np.ndarray:
        """Resample audio data to the target rate."""
        try:
            from scipy import signal
            
            # Compute the resampling ratio
            resample_ratio = target_rate / original_rate
            new_length = int(len(audio_data) * resample_ratio)
            
            if audio_data.ndim == 2:  # stereo
                resampled = np.column_stack([
                    signal.resample(audio_data[:, 0], new_length),
                    signal.resample(audio_data[:, 1], new_length)
                ])
            else:  # mono
                resampled = signal.resample(audio_data, new_length)
            
            return resampled
        
        except ImportError:
            print("scipy is not installed; resampling is unavailable")
            return audio_data
        
        except Exception as e:
            print(f"Audio resampling failed: {e}")
            return audio_data
    
    def _rebuild_indices(self) -> None:
        """Rebuild the lookup indices."""
        self._id_index.clear()
        self._time_index.clear()
        self._layer_index.clear()
        
        for i, segment in enumerate(self.segments):
            self._id_index[segment.id] = i
            self._time_index[segment.start_time].append(i)
            self._layer_index[segment.layer].append(i)
    
    def _invalidate_cache(self) -> None:
        """Invalidate the caches."""
        self._sorted_segments = None
        self._last_update_time = time.time()
        with self._cache_lock:
            self._mix_cache.clear()
    
    def get_sorted_segments(self) -> List[AudioSegment]:
        """Get the segments sorted by start time."""
        if self._sorted_segments is None:
            self._sorted_segments = sorted(self.segments, key=lambda s: s.start_time)
        
        return self._sorted_segments
    
    def get_duration(self) -> float:
        """Get the track duration."""
        if not self.segments:
            return 0.0
        
        return max(segment.start_time + segment.duration for segment in self.segments)
    
    def get_active_segments(self, current_time: float) -> List[AudioSegment]:
        """Get the segments active at the current time."""
        return [s for s in self.segments if s.is_active_at_time(current_time)]
    
    def has_overlapping_segments(self) -> bool:
        """Check whether any segments overlap."""
        sorted_segments = self.get_sorted_segments()
        
        for i in range(len(sorted_segments) - 1):
            current = sorted_segments[i]
            next_segment = sorted_segments[i + 1]
            
            if current.start_time + current.duration > next_segment.start_time:
                return True
        
        return False
    
    def resolve_overlaps(self) -> None:
        """Resolve overlapping segments."""
        if self.locked:
            return
        
        sorted_segments = self.get_sorted_segments()
        
        for i in range(len(sorted_segments) - 1):
            current = sorted_segments[i]
            next_segment = sorted_segments[i + 1]
            
            overlap = current.start_time + current.duration - next_segment.start_time
            if overlap > 0:
                # Shorten the current segment
                current.duration = next_segment.start_time - current.start_time
        
        # Durations changed, so the mix cache is stale
        self._invalidate_cache()
    
    def duplicate_segment(self, segment_id: str, offset: float = 0.0) -> Optional[str]:
        """Duplicate an audio segment."""
        if self.locked:
            return None
        
        segment = self.get_segment(segment_id)
        if not segment:
            return None
        
        # Create the copy
        new_segment = segment.copy()
        new_segment.id = f"{segment_id}_dup_{int(time.time() * 1000)}"
        
        # Apply the time offset
        if offset != 0.0:
            new_segment.start_time += offset
        
        # Add the new segment
        if self.add_segment(new_segment):
            return new_segment.id
        
        return None
    
    def split_segment(self, segment_id: str, split_time: float) -> Optional[str]:
        """Split an audio segment at split_time."""
        if self.locked:
            return None
        
        segment = self.get_segment(segment_id)
        if not segment:
            return None
        
        # Position of the split point inside the segment
        split_time_in_segment = split_time - segment.start_time
        
        if split_time_in_segment <= 0 or split_time_in_segment >= segment.duration:
            return None
        
        # Create the second half (simplified: the model has no in-file source
        # offset, so both halves read from the start of the file)
        new_segment = segment.copy()
        new_segment.id = f"{segment_id}_split_{int(time.time() * 1000)}"
        
        # Adjust the times
        new_segment.start_time = split_time
        new_segment.duration = segment.duration - split_time_in_segment
        
        segment.duration = split_time_in_segment
        
        # Add the new segment
        if self.add_segment(new_segment):
            return new_segment.id
        
        return None
    
    def apply_effect_to_all(self, effect: AudioEffect) -> None:
        """Apply an effect to every segment."""
        if self.locked:
            return
        
        for segment in self.segments:
            segment.apply_effect(effect.name, effect.parameters)
    
    def clear_all_segments(self) -> None:
        """Remove all segments."""
        if self.locked:
            return
        
        self.segments.clear()
        self._rebuild_indices()
        self._invalidate_cache()
    
    def set_volume(self, volume: float) -> None:
        """Set the track volume."""
        self.volume = max(0.0, min(2.0, volume))
        self._invalidate_cache()
    
    def set_pan(self, pan: float) -> None:
        """Set the track pan."""
        self.pan = max(-1.0, min(1.0, pan))
        self._invalidate_cache()
    
    def to_dict(self) -> Dict[str, Any]:
        """转换为字典"""
        return {
            "track_id": self.track_id,
            "name": self.name,
            "volume": self.volume,
            "pan": self.pan,
            "muted": self.muted,
            "solo": self.solo,
            "locked": self.locked,
            "visible": self.visible,
            "color": self.color,
            "segments": [segment.to_dict() for segment in self.segments]
        }
    
    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> 'AudioTrack':
        """Create a track from a dictionary."""
        track = cls(data.get("name", "Audio Track"), data.get("track_id", ""))
        track.volume = data.get("volume", 1.0)
        track.pan = data.get("pan", 0.0)
        track.muted = data.get("muted", False)
        track.solo = data.get("solo", False)
        track.locked = data.get("locked", False)
        track.visible = data.get("visible", True)
        track.color = data.get("color", "#2196F3")
        
        # Load the segments
        for segment_data in data.get("segments", []):
            segment = AudioSegment.from_dict(segment_data)
            track.add_segment(segment)
        
        return track
    
    def __str__(self) -> str:
        """字符串表示"""
        return f"AudioTrack(id='{self.track_id}', name='{self.name}', segments={len(self.segments)})"
    
    def __repr__(self) -> str:
        """详细字符串表示"""
        return f"AudioTrack(id='{self.track_id}', name='{self.name}', volume={self.volume}, muted={self.muted})"
```

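在进入混音器之前,先给出一段轨道层面的最小用法示意(假设 AudioSegment、AudioTrack 已按前文定义可用;片段重叠时 add_segment 的具体行为以前文实现为准):

```python
# 构建一个轨道,加入片段后进行分割与复制
track = AudioTrack("背景音乐轨", "track_bgm")

segment = AudioSegment(id="seg_bgm", file_path="bgm.mp3", name="BGM",
                       start_time=0.0, duration=10.0)
track.add_segment(segment)

# 在时间轴 4.0 秒处分割,原片段保留左半部分,返回右半部分的新 id
right_id = track.split_segment("seg_bgm", 4.0)

# 将右半部分(4.0s 起、时长 6.0s)向后复制 6 秒,使副本恰好接在其后
if right_id:
    dup_id = track.duplicate_segment(right_id, offset=6.0)
    print(track)  # AudioTrack(id='track_bgm', name='背景音乐轨', segments=3)
```
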
12.3.2 音频混音器实现

音频混音器是音频系统的核心组件,负责将多个音频轨道混合成最终的音频输出:

```python
class AudioMixer:
    """音频混音器"""
    
    def __init__(self, sample_rate: int = 44100, channels: int = 2, bit_depth: int = 16):
        self.sample_rate = sample_rate
        self.channels = channels
        self.bit_depth = bit_depth
        
        # 轨道管理
        self.tracks: List[AudioTrack] = []
        self.track_id_index: Dict[str, int] = {}
        
        # 主输出控制
        self.master_volume = 1.0
        self.master_pan = 0.0
        self.master_muted = False
        
        # 主效果链
        self.master_effect_chain = AudioEffectChain()
        
        # 混音缓存
        self._mix_cache: Dict[Tuple[float, float], np.ndarray] = {}
        self._cache_lock = threading.Lock()
        
        # 性能统计
        self.mix_stats = {
            "total_mixes": 0,
            "cache_hits": 0,
            "cache_misses": 0,
            "avg_mix_time": 0.0,
            "peak_cpu_usage": 0.0
        }
        
        # 实时处理状态
        self._is_playing = False
        self._current_time = 0.0
        self._playback_thread = None
        self._playback_lock = threading.Lock()
        
        # 音频输出
        self._audio_output = None
        self._output_buffer_size = 1024
        self._output_latency = 0.1  # 100ms
    
    def add_track(self, track: AudioTrack) -> bool:
        """添加音频轨道"""
        if track.track_id in self.track_id_index:
            return False
        
        self.tracks.append(track)
        self.track_id_index[track.track_id] = len(self.tracks) - 1
        
        # 使缓存失效
        self._invalidate_cache()
        
        return True
    
    def remove_track(self, track_id: str) -> bool:
        """移除音频轨道"""
        if track_id not in self.track_id_index:
            return False
        
        index = self.track_id_index[track_id]
        self.tracks.pop(index)
        
        # 重建索引
        self.track_id_index.clear()
        for i, track in enumerate(self.tracks):
            self.track_id_index[track.track_id] = i
        
        # 使缓存失效
        self._invalidate_cache()
        
        return True
    
    def get_track(self, track_id: str) -> Optional[AudioTrack]:
        """获取音频轨道"""
        if track_id not in self.track_id_index:
            return None
        
        index = self.track_id_index[track_id]
        return self.tracks[index]
    
    def mix_tracks(self, start_time: float, duration: float) -> np.ndarray:
        """混音所有轨道"""
        if self.master_muted or not self.tracks:
            return np.zeros(int(duration * self.sample_rate))
        
        # 检查缓存
        cache_key = (start_time, duration)
        with self._cache_lock:
            if cache_key in self._mix_cache:
                self.mix_stats["cache_hits"] += 1
                return self._mix_cache[cache_key].copy()
        
        self.mix_stats["cache_misses"] += 1
        
        # 创建混音缓冲区
        mix_buffer = np.zeros(int(duration * self.sample_rate))
        
        # 检查是否有独奏轨道
        solo_tracks = [track for track in self.tracks if track.solo]
        tracks_to_mix = solo_tracks if solo_tracks else self.tracks
        
        # 混音每个轨道
        for track in tracks_to_mix:
            if not track.muted and track.visible:
                track_mix = track.mix_audio(start_time, duration, self.sample_rate)
                
                # 确保长度匹配
                if len(track_mix) > len(mix_buffer):
                    track_mix = track_mix[:len(mix_buffer)]
                elif len(track_mix) < len(mix_buffer):
                    # 填充到相同长度
                    padding = np.zeros(len(mix_buffer) - len(track_mix))
                    track_mix = np.concatenate([track_mix, padding])
                
                # 累加到混音缓冲区
                mix_buffer += track_mix
        
        # 应用主效果链
        if self.master_effect_chain.enabled and not self.master_effect_chain.bypass:
            mix_buffer = self.master_effect_chain.process(mix_buffer, self.sample_rate)
        
        # 应用主音量
        mix_buffer *= self.master_volume
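        # 注:master_pan 在此单声道示意路径中未被应用;
        # 立体声实现应在此处按声像值对左右声道分别加权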
        
        # 防止削波
        max_value = np.max(np.abs(mix_buffer))
        if max_value > 0.95:
            mix_buffer *= 0.95 / max_value
        
        # 缓存结果
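        # 注:示意实现未限制缓存条目数,长时间编辑会话中应配合 LRU 等淘汰策略控制内存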
        with self._cache_lock:
            self._mix_cache[cache_key] = mix_buffer.copy()
        
        # 更新统计
        self.mix_stats["total_mixes"] += 1
        
        return mix_buffer
    
    def mix_realtime(self, buffer_size: int = 1024) -> np.ndarray:
        """实时混音"""
        if self.master_muted or not self.tracks:
            return np.zeros(buffer_size)
        
        # 计算当前时间段的混音
        duration = buffer_size / self.sample_rate
        mix_buffer = self.mix_tracks(self._current_time, duration)
        
        # 更新时间
        self._current_time += duration
        
        return mix_buffer
    
    def start_playback(self) -> bool:
        """开始播放"""
        if self._is_playing:
            return False
        
        with self._playback_lock:
            self._is_playing = True
            self._current_time = 0.0
            
            # 启动播放线程
            self._playback_thread = threading.Thread(target=self._playback_loop)
            self._playback_thread.daemon = True
            self._playback_thread.start()
        
        return True
    
    def stop_playback(self) -> bool:
        """停止播放"""
        if not self._is_playing:
            return False
        
        with self._playback_lock:
            self._is_playing = False
            
            # 等待播放线程结束
            if self._playback_thread:
                self._playback_thread.join(timeout=1.0)
                self._playback_thread = None
        
        return True
    
    def _playback_loop(self) -> None:
        """播放循环"""
        while self._is_playing:
            start_time = time.time()
            
            # 混音当前缓冲区
            audio_buffer = self.mix_realtime(self._output_buffer_size)
            
            # 在模拟输出前测量处理耗时,否则等待时间会被误计入CPU占用
            buffer_duration = self._output_buffer_size / self.sample_rate
            process_time = time.time() - start_time
            cpu_usage = process_time / buffer_duration
            self.mix_stats["peak_cpu_usage"] = max(self.mix_stats["peak_cpu_usage"], cpu_usage)
            
            # 这里应将 audio_buffer 写入音频输出设备(如 sounddevice、PyAudio 等库)
            # 暂时用 sleep 模拟输出节奏
            time.sleep(max(0.0, buffer_duration - process_time))
    
    def seek(self, position: float) -> None:
        """跳转到指定时间(秒)"""
        with self._playback_lock:
            self._current_time = max(0.0, position)
            self._invalidate_cache()
    
    def get_current_time(self) -> float:
        """获取当前时间"""
        with self._playback_lock:
            return self._current_time
    
    def is_playing(self) -> bool:
        """是否正在播放"""
        return self._is_playing
    
    def export_audio(self, start_time: float, duration: float, 
                    output_path: str, format: str = "wav") -> bool:
        """导出音频文件"""
        try:
            # 混音指定时间段
            mixed_audio = self.mix_tracks(start_time, duration)
            
            # 转换为指定格式
            if format.lower() == "wav":
                # 这里应该使用实际的音频导出库
                # 暂时模拟导出过程
                print(f"导出音频到: {output_path}")
                print(f"格式: {format}, 时长: {duration}s, 采样率: {self.sample_rate}Hz")
                return True
            
            else:
                print(f"不支持的音频格式: {format}")
                return False
        
        except Exception as e:
            print(f"音频导出失败: {e}")
            return False
    
    def analyze_audio(self, start_time: float, duration: float) -> Dict[str, Any]:
        """分析音频"""
        mixed_audio = self.mix_tracks(start_time, duration)
        
        if len(mixed_audio) == 0:
            return {}
        
        # 计算音频特征
        rms = np.sqrt(np.mean(mixed_audio ** 2))
        peak = np.max(np.abs(mixed_audio))
        
        # 转换为dB
        rms_db = 20 * np.log10(max(rms, 1e-10))
        peak_db = 20 * np.log10(max(peak, 1e-10))
        
        # 计算动态范围
        dynamic_range = peak_db - rms_db
        
        return {
            "rms_db": rms_db,
            "peak_db": peak_db,
            "dynamic_range": dynamic_range,
            "duration": duration,
            "sample_rate": self.sample_rate,
            "channels": self.channels
        }
    
    def get_loudness(self, start_time: float, duration: float) -> float:
        """获取响度(LUFS)"""
        mixed_audio = self.mix_tracks(start_time, duration)
        
        if len(mixed_audio) == 0:
            return -70.0  # 静音
        
        # 简化的响度计算(应该使用ITU-R BS.1770标准)
        # 这里使用RMS作为近似
        rms = np.sqrt(np.mean(mixed_audio ** 2))
        lufs = 20 * np.log10(max(rms, 1e-10))
        
        return lufs
    
    def normalize_loudness(self, target_lufs: float = -16.0) -> bool:
        """标准化响度"""
        if not self.tracks:
            return False
        
        # 计算当前响度
        duration = self.get_total_duration()
        current_lufs = self.get_loudness(0.0, duration)
        
        # 计算需要的增益
        gain_db = target_lufs - current_lufs
        gain_linear = 10 ** (gain_db / 20)
        
        # 应用增益到主音量
        self.master_volume *= gain_linear
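        # 注:此处直接放大 master_volume,可能越过 set_master_volume 的 2.0 上限;
        # 更稳妥的做法是在混音输出阶段逐块应用该增益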
        
        return True
    
    def get_total_duration(self) -> float:
        """获取总持续时间"""
        if not self.tracks:
            return 0.0
        
        return max(track.get_duration() for track in self.tracks)
    
    def set_master_volume(self, volume: float) -> None:
        """设置主音量"""
        self.master_volume = max(0.0, min(2.0, volume))
        self._invalidate_cache()
    
    def set_master_pan(self, pan: float) -> None:
        """设置主声像"""
        self.master_pan = max(-1.0, min(1.0, pan))
        self._invalidate_cache()
    
    def _invalidate_cache(self) -> None:
        """使缓存失效"""
        with self._cache_lock:
            self._mix_cache.clear()
    
    def get_stats(self) -> Dict[str, Any]:
        """获取统计信息"""
        return {
            "total_tracks": len(self.tracks),
            "total_segments": sum(len(track.segments) for track in self.tracks),
            "mix_stats": self.mix_stats.copy(),
            "sample_rate": self.sample_rate,
            "channels": self.channels,
            "bit_depth": self.bit_depth,
            "total_duration": self.get_total_duration(),
            "is_playing": self.is_playing(),
            "current_time": self.get_current_time()
        }
    
    def to_dict(self) -> Dict[str, Any]:
        """转换为字典"""
        return {
            "sample_rate": self.sample_rate,
            "channels": self.channels,
            "bit_depth": self.bit_depth,
            "master_volume": self.master_volume,
            "master_pan": self.master_pan,
            "master_muted": self.master_muted,
            "tracks": [track.to_dict() for track in self.tracks]
        }
    
    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> 'AudioMixer':
        """从字典创建"""
        mixer = cls(
            data.get("sample_rate", 44100),
            data.get("channels", 2),
            data.get("bit_depth", 16)
        )
        
        mixer.master_volume = data.get("master_volume", 1.0)
        mixer.master_pan = data.get("master_pan", 0.0)
        mixer.master_muted = data.get("master_muted", False)
        
        # 加载轨道
        for track_data in data.get("tracks", []):
            track = AudioTrack.from_dict(track_data)
            mixer.add_track(track)
        
        return mixer
    
    def __str__(self) -> str:
        """字符串表示"""
        return f"AudioMixer(tracks={len(self.tracks)}, sr={self.sample_rate}Hz, ch={self.channels})"
    
    def __repr__(self) -> str:
        """详细字符串表示"""
        return f"AudioMixer(sample_rate={self.sample_rate}, channels={self.channels}, bit_depth={self.bit_depth})"
```

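类定义之后,下面给出一段最小的混音与落盘示意。mix_tracks 返回的是单声道浮点缓冲,这里用标准库 wave 代替示意性的 export_audio 真正写出 WAV 文件(假设 track 为上文构建的轨道对象):

```python
import wave
import numpy as np

mixer = AudioMixer(sample_rate=44100, channels=2)
mixer.add_track(track)          # 复用上文构建的 AudioTrack
mixer.set_master_volume(0.8)

# 混音前 5 秒并打印简要分析结果
mixed = mixer.mix_tracks(0.0, 5.0)
print(mixer.analyze_audio(0.0, 5.0))

# 浮点缓冲转 16-bit PCM 后用 wave 写盘
pcm = (np.clip(mixed, -1.0, 1.0) * 32767).astype(np.int16)
with wave.open("mix_preview.wav", "wb") as f:
    f.setnchannels(1)                   # mix_tracks 返回单声道
    f.setsampwidth(2)                   # 16 bit
    f.setframerate(mixer.sample_rate)
    f.writeframes(pcm.tobytes())
```
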
12.4 音频系统总结

音频处理与混音系统是剪映小助手中的核心组件,它提供了全面的音频处理功能,从基础的音频播放到复杂的音频特效处理。系统采用分层架构设计,确保了良好的可扩展性和维护性。

12.4.1 核心组件回顾

音频片段模型

- 提供了完整的音频数据管理功能
- 支持多种音频格式和参数设置
- 实现了音频效果链和缓存机制
- 提供了音频分析和可视化功能

音频效果处理器

- 基于插件架构的效果器系统
- 提供了丰富的音频处理效果
- 支持实时参数调整和预设管理
- 实现了效果器链和信号流控制

音频轨道管理

- 支持多轨道音频编辑
- 提供了轨道级别的控制和处理
- 实现了片段管理和时间线操作
- 支持轨道效果和自动化控制

音频混音器

- 实现了多轨道混音算法
- 提供了主输出控制和效果处理
- 支持实时播放和音频导出
- 实现了性能优化和缓存机制

12.4.2 技术特点

高性能音频处理

- 使用NumPy进行高效的数组运算(见下方示例)
- 实现了多线程音频处理
- 提供了智能缓存和预加载机制
- 支持实时低延迟音频处理
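
下面是一个与本系统无关的最小向量化示意,说明混音内核为何要用 NumPy 数组运算而非逐样本的 Python 循环:

```python
import numpy as np

sr = 44100
track_a = np.random.uniform(-1.0, 1.0, sr * 10).astype(np.float32)  # 10 秒示例数据
track_b = np.random.uniform(-1.0, 1.0, sr * 10).astype(np.float32)

# 向量化:一次性完成增益、叠加与限幅
mixed = np.clip(track_a * 0.8 + track_b * 0.5, -1.0, 1.0)

# 等价的逐样本循环通常慢上两个数量级,实时混音中应避免:
# for i in range(len(track_a)):
#     mixed[i] = max(-1.0, min(1.0, track_a[i] * 0.8 + track_b[i] * 0.5))
```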

灵活的架构设计

- 采用分层架构和模块化设计
- 支持插件式效果器扩展
- 提供了完整的API接口
- 实现了配置化和参数化控制

丰富的音频功能

- 支持多种音频格式和编解码器
- 提供了专业的音频处理效果
- 实现了音频分析和可视化(响度计算示意见下方示例)
- 支持音频导出和格式转换
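
正文中的 get_loudness 以 RMS 近似响度;若需更接近 ITU-R BS.1770 的结果,可先对信号做 K 加权再求均方。下面是一个省略了 400 ms 分块与门限处理的示意实现(非本系统源码,依赖 scipy.signal.lfilter,且滤波器系数仅对 48 kHz 采样率有效):

```python
import numpy as np
from scipy.signal import lfilter

def k_weighted_loudness(x: np.ndarray, sample_rate: int = 48000) -> float:
    """按 BS.1770 的 K 加权计算未门限响度(LUFS),输入为单声道浮点信号"""
    assert sample_rate == 48000, "标准系数仅适用于 48 kHz"
    # 第一级:高频搁架预滤波(BS.1770 标准系数)
    pre_b = [1.53512485958697, -2.69169618940638, 1.19839281085285]
    pre_a = [1.0, -1.69065929318241, 0.73248077421585]
    # 第二级:RLB 高通滤波
    rlb_b = [1.0, -2.0, 1.0]
    rlb_a = [1.0, -1.99004745483398, 0.99007225036621]
    y = lfilter(rlb_b, rlb_a, lfilter(pre_b, pre_a, x))
    mean_square = np.mean(y ** 2)
    return -0.691 + 10.0 * np.log10(max(mean_square, 1e-12))
```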

12.4.3 应用场景

音频处理与混音系统在剪映小助手中有着广泛的应用:

- 视频配音:为视频添加背景音乐、音效和旁白
- 音频编辑:剪辑、混合和处理音频素材
- 实时预览:提供低延迟的音频播放和效果预览
- 音频导出:生成高质量的音频输出文件
- 音频分析:提供音频特征分析和可视化展示

这个系统为剪映小助手提供了强大的音频处理能力,使用户能够创作出专业水准的音视频作品。通过模块化的设计和丰富的功能,系统能够适应不同的应用场景和用户需求。
