What Is Frame Buffer Pitch?


When writing display drivers for ARM platforms, you often run into the term "Frame Buffer Pitch".

The hardest part to understand is the word "pitch" itself. A dictionary defines it as:

n. pitch (the relative highness or lowness of a sound); tar, asphalt; a throw; degree, inclination; a vendor's trading spot; the number of characters per inch (computing)

None of these has any obvious connection to a frame buffer. A Baidu search suggests few people have written about it; either everyone already understands it or nobody has bothered to pin it down. A Google search, however, turns up an explanation in "How Video Cards Work" on http://www.x.org:
Buffers in video ram generally have a stride (also called pitch) associated with them. The stride is the width of the buffer in bytes.

So the pitch is simply the step, in bytes, from one row of the display panel to the next. The "How Video Cards Work" document is itself well worth reading in full; the original text follows:

Video Cards

So you want to know how modern video cards work. Here goes...

Modern video cards usually have several common features:

  • Video Ram
  • Display control
  • 2D engine
  • 3D engine
  • Overlay
  • HW sprites (cursor, icon, etc.)
  • AGP/PCI/PCIE
  • Apertures (registers, framebuffer)

Video Ram

Basically a large chunk of fast ram. This memory is used for all sorts of things:

  • Scan-out buffers (what you see on your monitor)
  • Offscreen rendering buffers
  • Cursor images
  • Command buffers
  • Vertex data
  • Textures

Buffers in video ram generally have a stride (also called pitch) associated with them. The stride is the width of the buffer in bytes.   For example, if you have a 1024x768 pixel buffer at 16 bits/pixel (2 bytes/pixel), your stride would be:

1024 pixels * 2 bytes/pixel = 2048 bytes

At 32 bits/pixel (4 bytes/pixel), your stride would be:

1024 pixels * 4 bytes/pixel = 4096 bytes

Stride is important as it delineates where each line of the buffer starts and ends. With a linear buffer format, each line of the buffer follows the previous linearly in video ram:

framebuffer address
0              2048            4096
|---------------|---------------|---------------| ... |---------------|
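
To make the address arithmetic concrete, here is a minimal C sketch (the helper name is illustrative, not from any driver) that locates a pixel in a linear buffer given its stride:

#include <stdint.h>

/* Byte offset of pixel (x, y) in a linear buffer. The stride is the
 * length of one scanline in bytes; it may be larger than
 * width * bytes_per_pixel if the hardware requires aligned scanlines. */
static inline uint32_t pixel_offset(uint32_t x, uint32_t y,
                                    uint32_t stride,
                                    uint32_t bytes_per_pixel)
{
    return y * stride + x * bytes_per_pixel;
}

/* Example: 1024x768 at 16 bpp gives stride 2048, so pixel (3, 2)
 * lives at 2 * 2048 + 3 * 2 = 4102 bytes past the buffer start. */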

Tiled framebuffers

The above layout is called "linear", because the layout of pixels in memory is like that on the screen: the pixel to the right of the current one on the screen is the one at the next highest address in memory. Tiling is a common variation where pixel layout in memory is not linear, but instead laid out in small squares. For example, a 4x4 tile would look like:

 0  1  2  3
 4  5  6  7
 8  9 10 11
12 13 14 15

In other words, the 4th (zero-based) pixel in memory would be at screen coordinate (0, 1), whereas in linear memory it would be at screen coordinate (4, 0). The pattern then continues: the 16th (zero-based) pixel is screen coordinate (4, 0) instead of (16, 0). The reason for this alternate layout is it makes pixels that are adjacent on the screen also adjacent in memory, which improves cache locality.
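
As an illustration, here is a C sketch of the address calculation for the 4x4 tiling above, assuming a single level of tiling and one byte per pixel for simplicity (real hardware layouts vary):

#include <stdint.h>

#define TILE_W 4
#define TILE_H 4

/* Offset (in pixels) of screen coordinate (x, y) in a 4x4-tiled
 * buffer that is width_in_tiles tiles wide. Pixels within a tile are
 * stored row by row, and the tiles themselves are stored row by row. */
static uint32_t tiled_offset(uint32_t x, uint32_t y, uint32_t width_in_tiles)
{
    uint32_t tile_x = x / TILE_W, tile_y = y / TILE_H;  /* which tile  */
    uint32_t in_x   = x % TILE_W, in_y   = y % TILE_H;  /* within tile */
    uint32_t tile_index = tile_y * width_in_tiles + tile_x;
    return tile_index * (TILE_W * TILE_H) + in_y * TILE_W + in_x;
}

/* tiled_offset(0, 1, w) == 4 and tiled_offset(4, 0, w) == 16,
 * matching the layout described above. */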

Some hardware has multiple levels of tiling. For example, Radeon hardware can have microtiles composed of pixels, and macrotiles composed of microtiles. Sometimes the GPU can hide tiling from the CPU (ie, make tiled regions appear linear to PCI bus accesses).

Display control

Overview

The display cell on most video cards controls the size, timing, and type of signal sent to the monitor. There are 3 elements involved in this:

  1. CRTC or Display Controller
  2. PLLs (pixel clock)
  3. Outputs

CRTCs

CRTC is a jargon term for "CRT controller", and CRTs are those big bulky glass things with pictures on them you see in old movies. Practically speaking, they define a region of pixels you can see.

The crtc controls the size and timing of the signal. This includes the vertical and horizontal sizes and blanking periods. Most cards have 2 or more crtcs. Each crtc can drive one or more outputs. Generally, each crtc can have its own set of timings. If a crtc is driving more than one output, each output is driven at the same timings. Crtcs can also scan out of different parts of the framebuffer. If you have more than one crtc pointing at the same framebuffer address, you have "clone" modes. Clone modes can also be achieved by driving more than one output with one crtc. If you point the crtcs to different parts of the framebuffer, you have dualhead.

On VGA-like signalling, this signal includes sync signals so the monitor can find the edges of the image. A modeline contains the timings (in pixels) where these sync signals are generated, relative to the active pixel times. (For the rest of this discussion we'll use "pixel" to mean "pixel interval" for brevity.) For example:

Modeline "1680x1050R"  119.00  1680 1728 1760 1840  1050 1053 1059 1080 +hsync -vsync

Here, 1680 of the 1840 total pixels in each horizontal interval contain actual pixel data, and the horizontal sync pulse runs from pixel 1728 to pixel 1760. 1050 of the 1080 total lines contain actual pixel data, and the vertical sync pulse runs from line 1053 to line 1059. The interval between the end of the active region and the beginning of the sync pulse is called the front porch; the interval between the end of the sync pulse and the end of a line or frame is called the back porch. Sync polarity is set by convention, so the monitor can know which timing formula is in use. Normal modes generated by the GTF or CVT timing formulas are -hsync +vsync. Modes generated by the CVT reduced-blanking formula or by GTF when using a secondary curve are +hsync -vsync. Other polarity combos are occasionally seen for various historical modes.
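
As a worked example (a small sketch, not driver code), the refresh rate implied by a modeline is just the pixel clock divided by the total pixel count per frame:

#include <stdio.h>

/* refresh = pixel_clock / (htotal * vtotal), using the totals from
 * the modeline quoted above. */
int main(void)
{
    double clock_hz = 119.00e6;        /* 119.00 MHz dot clock   */
    int htotal = 1840, vtotal = 1080;  /* total pixels and lines */
    printf("refresh = %.2f Hz\n", clock_hz / (htotal * vtotal));
    /* prints: refresh = 59.88 Hz */
    return 0;
}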

The stride of a crtc is set to the stride of the buffer it is scanning out of. The stride of the buffer does not have to correspond to the size of the crtc mode. This allows you to implement things like virtual desktops (a 1024x768 mode scanning out of a 2048x2048 pixel virtual desktop) or have multiple crtcs scan out of different parts of the same buffer (two 1024x768 crtcs scanning out of a 2048x768 pixel buffer).
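
A minimal sketch of the virtual-desktop case, using a hypothetical helper rather than any particular chip's registers: the crtc keeps the buffer's stride, and panning just moves the scanout base address within the larger buffer.

#include <stdint.h>

/* Address the crtc scans out of when the top-left corner of the
 * visible mode (say 1024x768) sits at (pan_x, pan_y) within a larger
 * virtual desktop. The crtc stride remains the buffer stride. */
static uint32_t scanout_base(uint32_t fb_offset, uint32_t pan_x,
                             uint32_t pan_y, uint32_t buffer_stride,
                             uint32_t bytes_per_pixel)
{
    return fb_offset + pan_y * buffer_stride + pan_x * bytes_per_pixel;
}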

PLLs

The PLLs control the pixel/video clock. This is the rate at which pixels are sent to the monitor. The higher the vertical refresh rate or resolution of your screen, the higher the pixel clock.

The pixel clock is usually generated using the following formula:

pixel clock = (ref freq) * (m/n) * (1/(1 + r))
 
ref freq = the base clock frequency provided by the hardware
m = clock multiplier
n = clock divider
r = clock post divider
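
A small C sketch that simply evaluates this formula; the reference frequency and divider values below are made-up examples, since real values are board-specific:

#include <stdio.h>

/* pixel clock = ref_freq * (m / n) * (1 / (1 + r)) */
static double pixel_clock(double ref_freq, int m, int n, int r)
{
    return ref_freq * ((double)m / n) / (1 + r);
}

int main(void)
{
    /* Hypothetical values: a 27 MHz reference with m=110, n=11, r=1
     * gives 27e6 * 10 / 2 = 135 MHz. */
    printf("%.1f MHz\n", pixel_clock(27.0e6, 110, 11, 1) / 1e6);
    return 0;
}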

Outputs

The outputs convert the data stream sent from the crtc into something the monitor understands. For example, a DAC (Digital-to-Analog Converter) converts the digital data stream into an analog signal for your monitor. Some other examples include TMDS (Transition Minimized Differential Signaling) transmitters (which convert to the digital format used by DVI and some other connectors), LVDS (Low Voltage Differential Signaling) transmitters (commonly used to connect local flat panels like LCDs on laptops), and TV encoders (which convert to an analog TV signal, often with image scaling). Outputs can be integrated into the graphics chip or provided as external components (usually connected via a standard interface like DVO (Digital Video Out) or SDVO (Serial Digital Video Out)).

Driver Examples

In most Xorg drivers there are 3 sets of functions (usually found in chipname_driver.c) associated with configuring the display controllers:

  • Save() - Saves the current hardware state of the output registers
  • Init() - Initializes the hardware register data structures for the requested output configuration
  • Restore()/Write() - Writes the initialized register values set up in the Init() functions to the hardware

Radeon

Save:

  • RADEONSaveMemMapRegisters() - saves memory map register state
  • RADEONSaveCommonRegisters() - saves common register state
  • RADEONSaveCrtcRegisters() - saves the registers for the primary crtc
  • RADEONSaveFPRegisters() - saves the registers for the panel outputs (RMX, TMDS, LVDS)
  • RADEONSaveCrtc2Registers() - saves the registers for the secondary crtc
  • RADEONSavePLLRegisters() - saves the registers for the primary (crtc1) pixel clock
  • RADEONSavePLL2Registers() - saves the registers for the secondary (crtc2) pixel clock
  • RADEONSavePalette() - saves the palette/CLUT registers
  • RADEONSaveMode() - calls the above functions

Init:

  • RADEONInitOutputRegisters() - Initializes registers for outputs and sets up the crtc to output mapping. Calls output init functions
  • RADEONInitCrtcRegisters() - Initializes registers for crtc1. Calls RADEONInitOutputRegisters() to initialize the outputs driven by crtc1 and RADEONInitPLLRegisters() to set up the pixel clock.
  • RADEONInitCrtc2Registers() - Initializes registers for crtc2. Calls RADEONInitOutputRegisters() to initialize the outputs driven by crtc2 and RADEONInitPLL2Registers() to set up the pixel clock.
  • RADEONInitPLLRegisters() - initialize the pixel clock for crtc1
  • RADEONInitPLL2Registers() - initialize the pixel clock for crtc2
  • RADEONInit2() - calls the above functions

Restore/Write:

  • RADEONRestoreMemMapRegisters() - restore memory map register state
  • RADEONRestoreCommonRegisters() - restore common register state
  • RADEONRestoreCrtcRegisters() - restore the registers for the primary crtc
  • RADEONRestoreFPRegisters() - restore the registers for the panel outputs (RMX, TMDS, LVDS)
  • RADEONRestoreCrtc2Registers() - restore the registers for the secondary crtc
  • RADEONRestorePLLRegisters() - restore the registers for the primary (crtc1) pixel clock
  • RADEONRestorePLL2Registers() - restore the registers for the secondary (crtc2) pixel clock
  • RADEONRestorePalette() - restore the palette/CLUT registers
  • RADEONEnableDisplay() - enables/disables outputs
  • RADEONRestoreMode() - calls the above functions

2D Engine

Overview

The 2D engine (often called a blitter) basically moves data around in video ram. There are generally 4 operations done by the 2D engine: blits (copying data from one place to another), fills (drawing a solid color), lines (drawing lines), and color expansion (converting mono data to color data; e.g., converting monochrome font glyphs to the depth of your screen, usually 16 or 24 bit color). Logical operations (rops -- raster operations) can also be performed on the data.

You have source and destination buffers (often called surfaces), and these operations use one or more surfaces. Some, like solid fills, only use a destination (where do I draw the red rectangle). Others, like blits, require a source and a destination (copy this rectangle from address A to address B). Surfaces can (and often do) overlap. Because of this, blitting also has the concept of direction: if you are copying data between overlapping source and destination regions, you need to make sure you copy the right data (e.g., top to bottom, right to left, etc.).

Data from system memory can also be the source of these operations. This is referred to as a hostdata blit. With hostdata blits, host data is copied into a special region of video ram or into the command queue, depending on the chip, and from there it is copied to the destination in the framebuffer via the blitter.

2D engines are usually controlled either via direct MMIO access to the relevant registers or via a command queue. With direct MMIO, the appropriate values are written to the relevant registers, and the command is usually executed when the last register in the series is written or when the command register is written (depending on the hardware). With a command queue, part of the framebuffer is reserved as a command queue (FIFO). Commands and associated data are written sequentially to the queue and processed by the drawing engine.

Solid example

Draw a solid red 200x400 pixel rectangle on the screen at (x,y) location (25, 75); a register-level sketch follows the steps below.

  1. Set the pitch of your destination surface to the pitch of the screen and set the offset to the offset in video ram where your screen buffer is located.
  2. Set the rop you want to use
  3. Set the color you want
  4. Set the destination rectangle width and height and (x,y) location relative to the surface
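
Here is what those steps might look like as code. This is a hedged sketch: the register names, offsets, and bit layouts below are invented for illustration and do not belong to any real chip.

#include <stdint.h>

/* Hypothetical MMIO registers for a blitter; real offsets and
 * semantics are chip-specific. */
#define DST_PITCH_OFFSET 0x00  /* destination pitch and vram offset   */
#define ROP              0x04  /* raster operation                    */
#define FG_COLOR         0x08  /* fill color                          */
#define DST_XY           0x0c  /* x in the high 16 bits, y in the low */
#define DST_WH           0x10  /* width/height; writing starts the op */

static volatile uint32_t *mmio;  /* register aperture, mapped elsewhere */

static void reg_write(uint32_t reg, uint32_t val)
{
    mmio[reg / 4] = val;
}

/* Red 200x400 rectangle at (25, 75), following the steps above. */
static void solid_fill_example(uint32_t screen_pitch, uint32_t screen_offset)
{
    reg_write(DST_PITCH_OFFSET, (screen_pitch << 16) | (screen_offset >> 4));
    reg_write(ROP, 0xCC);                  /* plain copy rop           */
    reg_write(FG_COLOR, 0x00FF0000);       /* red, xRGB                */
    reg_write(DST_XY, (25 << 16) | 75);    /* destination (x, y)       */
    reg_write(DST_WH, (200 << 16) | 400);  /* size; kicks off the fill */
}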

Blit Example

Copy a 200x400 pixel rectangle on the screen from (500, 400) to (25, 75); a software analogue of the direction handling follows the steps below.

  1. Set the pitch of your source surface to the pitch of the screen and set the offset to the offset in video ram where your screen buffer is located.
  2. Set the pitch of your destination surface to the pitch of the screen and set the offset to the offset in video ram where your screen buffer is located.
  3. Set the rop you want to use
  4. Set the source rectangle width and height and (x,y) location relative to the source surface
  5. Set the destination rectangle width and height and (x,y) location relative to the destination surface
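
The copy-direction concern described in the overview can be shown with a CPU-side analogue. A sketch, assuming a byte-addressable mapping of the frame buffer (a real blitter does the equivalent in hardware):

#include <stdint.h>
#include <string.h>

/* Software analogue of an overlapping screen-to-screen blit: choose
 * the copy direction so source data is not overwritten before it is
 * read, just as a blitter must. fb points at the mapped frame buffer. */
static void sw_blit(uint8_t *fb, uint32_t pitch, uint32_t bpp,
                    int sx, int sy, int dx, int dy, int w, int h)
{
    int y;
    if (dy > sy) {                        /* copy bottom-up */
        for (y = h - 1; y >= 0; y--)
            memmove(fb + (dy + y) * pitch + dx * bpp,
                    fb + (sy + y) * pitch + sx * bpp,
                    (size_t)w * bpp);
    } else {                              /* copy top-down */
        for (y = 0; y < h; y++)
            memmove(fb + (dy + y) * pitch + dx * bpp,
                    fb + (sy + y) * pitch + sx * bpp,
                    (size_t)w * bpp);
    }
}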

Xorg Acceleration Examples

  • Blits: XAA ScreenToScreenCopy; EXA Copy
  • Hostdata Blits: XAA ImageWrite, CPUToScreen functions; EXA UploadToScreen
  • Solid Fills: XAA SolidFillRect; EXA Solid
  • Lines: XAA SolidBresenhamLine, SolidTwoPointLine
  • Color Expansion: XAA CPUToScreenColorExpandFill

Driver Examples

Radeon

EXA Solid Fill:

  • RADEONPrepareSolid() - Sets up the hardware state for the solid fill
  • RADEONSolid() - Draws a solid rectangle of size w x h at location (x,y)

EXA Blit:

  • RADEONPrepareCopy() - Sets up the hardware state for the copy
  • RADEONCopy() - Performs a copy of a rectangle of size w x h from (x1,y1) to (x2,y2)

3D Engine

Overview

The 3D engine provides HW to build and rasterize a 3-dimensional scene. Most fixed-function hardware has the following layout:

  • Small set of 3D state registers. These control the state of the 3D scene: fog, mipmapping, texturing, blending, etc.
  • 3D engine offset registers. Controls where in the framebuffer the 3D engine renders to
  • Texture control and offset registers. Control texture format and size and where the textures are located
  • Depth buffer control and offset registers. Controls depth buffer layout and location
  • Vertex registers. Used to specify the location and format of the vertices which make up the 3D scene.

Buffers

Generally 3 buffers are required for 3D:

  1. Front buffer. This is usually the buffer that is scanned out for the user to see.
  2. Back buffer. This is the buffer that is rendered to while the front buffer is being scanned out.
  3. Depth buffer. Also called the z-buffer. This buffer is used to determine the relative depth of different objects in the 3D scene. This is used to determine which elements are visible and which are obscured (see the sketch after this list).
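
A minimal software rendition of the depth test, assuming a 16-bit z-buffer and the smaller-is-closer convention (hardware performs this per fragment):

#include <stdbool.h>
#include <stdint.h>

/* The core z-buffer test: a fragment at (x, y) with depth z is kept
 * only if it is closer than what is already stored there. */
static bool depth_test_and_write(uint16_t *zbuf, uint32_t zpitch_px,
                                 uint32_t x, uint32_t y, uint16_t z)
{
    uint16_t *zp = &zbuf[y * zpitch_px + x];
    if (z < *zp) {    /* smaller z == closer, by convention */
        *zp = z;
        return true;  /* caller writes the color to the back buffer */
    }
    return false;     /* fragment is obscured */
}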

ToDo: give driver examples

Overlay

Overview

The overlay provides a mechanism for mixing data from multiple framebuffers automatically. It is most often used for mixing YUV (video) and RGB data. Most overlays contain special filtering and scaling hardware along with a colorspace converter. The streams are mixed or blended in several ways (depending on the hardware):

  • Colorkey. Overlay data is overlaid on the primary data stream where the color of the primary stream matches the colorkey RGB color. Generally used to overlay YUV or RGB data on an RGB surface.
  • Chromakey. Same as colorkey but the key is a YUV value rather than RGB. Generally used to overlay RGB or YUV data on a YUV surface.
  • Position/Offset. Overlay data appears at a specified position in the scan out buffer.

When an overlay is enabled, data from the overlay framebuffer is automatically mixed into the output stream during the scanout of the visible framebuffer. For example, with colorkeying, the crtc scans out of the primary framebuffer until it hits a region with a color matching the colorkey. At this point, the hardware automatically scans the data out of the overlay buffer.
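
Conceptually, colorkeying reduces to a per-pixel select during scanout. A sketch in C, assuming packed RGB values (the hardware does this on the fly, together with any colorspace conversion for YUV overlays):

#include <stdint.h>

/* Wherever the primary stream matches the key color, the overlay
 * pixel is shown instead of the primary pixel. */
static uint32_t mix_colorkey(uint32_t primary_rgb, uint32_t overlay_rgb,
                             uint32_t key_rgb)
{
    return (primary_rgb == key_rgb) ? overlay_rgb : primary_rgb;
}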

Most hardware only has one overlay which is often tied to a crtc or can only be sourced to one crtc at a time.

Overlays are most commonly used for video playback and scaling. See Xv.

Driver Examples

Radeon

  • RADEONPutImage() - Prepares and copies overlay data to video ram, then calls RADEONDisplayVideo().
  • RADEONDisplayVideo() - Write the overlay configuration to hardware to display the overlay data.

HW sprites

Overview

HW sprites are small buffers that get blended with the output stream during scan out. The most common examples are HW cursors and HW icons. Sprites are usually limited to small sizes (64x64 or 128x128 pixels) and on older hardware they are limited to 2 colors (newer hardware supports 32 bit ARGB sprites). The cursor image is written to a location in video ram and that image is mixed into the output stream at a particular location during scan out.
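
For the ARGB case, the mixing is an ordinary "source over" blend. A sketch, assuming the cursor image uses premultiplied alpha (older 2-color hardware uses AND/XOR masks instead):

#include <stdint.h>

/* Blend one 32-bit ARGB cursor pixel over the scanout pixel behind
 * it, per channel, assuming premultiplied alpha:
 * out = cursor + behind * (255 - a) / 255. */
static uint32_t blend_argb(uint32_t cursor, uint32_t behind)
{
    uint32_t a = cursor >> 24;
    uint32_t out = 0xff000000u;  /* result treated as opaque */
    for (int shift = 0; shift <= 16; shift += 8) {
        uint32_t c = (cursor >> shift) & 0xff;
        uint32_t b = (behind >> shift) & 0xff;
        out |= ((c + b * (255 - a) / 255) & 0xff) << shift;
    }
    return out;
}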

ToDo: give driver examples

PCI

PCI is by now the standard bus for connecting video cards to computers. AGP and PCIE merely look like enhanced versions of PCI, as far as the host software is concerned.

PCI devices can present various resources to the host, along with a standardized way of discovering and accessing them. The important ones as far as video is concerned are BARs, or base address registers. Each device can present up to 6 BARs, which can function as video memory or register banks. BARs can be either memory or I/O ranges, but are usually memory. There is also an optional "7th BAR", the option ROM, which most video devices support. This is used to support multiple video cards, since the ROM contains the initialization code for the chip, and most system BIOSes will not attempt to initialize more than one card at boot time.
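
On Linux, you can inspect a memory BAR from user space through sysfs. A sketch, assuming a video device at the example address 0000:01:00.0 (check lspci for the real one); it must run as root:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    /* resource0 corresponds to BAR 0 of this device. */
    const char *bar = "/sys/bus/pci/devices/0000:01:00.0/resource0";
    int fd = open(bar, O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    size_t len = 4096;  /* map just the first page for illustration */
    volatile uint32_t *regs =
        mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (regs == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    printf("first dword: 0x%08x\n", (unsigned)regs[0]);
    munmap((void *)regs, len);
    close(fd);
    return 0;
}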

PCI also provides a mechanism for supporting the legacy VGA address space and I/O ports, by allowing the host software to route this space to individual PCI cards. Again, this is mostly used for multi-card setups.

AGP

ToDo: fill me in.

PCIE

ToDo: fill me in.

Apertures

ToDo: fill me in.

ToDo: indexed vs. direct access registers

-- AlexDeucher


