qt+gxdi+ffmpeg Remote Control (Part 9)

zdsji

Last modified 2024-09-05 16:23:38


Column: qt+gxdi+ffmpeg Remote Control. Tags: qt, ffmpeg, programming languages

First published 2024-09-05 16:21:13

Article link: https://blog.youkuaiyun.com/zdsji/article/details/141930262


It's been a while since the last update; school has kept me busy. Sorry about that.

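This part covers capturing audio with WASAPI, passing the captured data to FFmpeg to encode as AAC, and muxing it with the H.264 data produced by the screen capture and encoding from the earlier parts into an MP4 file.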

Previous part: qt+gxdi+ffmpeg Remote Control (Part 8)

WASAPI

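WASAPI (Windows Audio Session API) is the audio API provided by Windows. I won't go into detail here; look it up if you're interested. Note that CoInitialize() must be called on the same thread before using it. This conflicts with FFmpeg to some extent: CoUninitialize() has to be called before initializing the H.264 encoder, otherwise avcodec_open2 fails to open the encoder. I haven't found the exact cause.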

Enumerating audio devices

IMMDeviceEnumerator* enumerator = nullptr;
IMMDeviceCollection* device_collection = nullptr;

HRESULT result = CoCreateInstance(__uuidof(MMDeviceEnumerator), nullptr,
CLSCTX_ALL, IID_PPV_ARGS(&enumerator));

enumerator->EnumAudioEndpoints(eAll, eConsole | eMultimedia, &device_collection);

That is the key code: CoCreateInstance creates the enumerator object, and its EnumAudioEndpoints method then produces a collection of the audio endpoints that match the given conditions.

HRESULT EnumAudioEndpoints(
[in] EDataFlow dataFlow,
[in] DWORD dwStateMask,
[out] IMMDeviceCollection **ppDevices
);

dataFlow is the data-flow direction: eRender is output (speakers), eCapture is input (microphones), and eAll, used above, covers both.

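For dwStateMask, the code passes eConsole | eMultimedia, following examples found online and the OBS source, which does not match the Windows API documentation. According to the documentation, dwStateMask is a bitmask of DEVICE_STATE_XXX constants, not ERole values. The call still works because eConsole is 0 and eMultimedia is 1, so eConsole | eMultimedia equals 1, which is exactly DEVICE_STATE_ACTIVE: only active endpoints are enumerated. Writing DEVICE_STATE_ACTIVE directly states that intent. For reference, the ERole values describe device roles (used, for example, with GetDefaultAudioEndpoint), not device states: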

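ERole value | Device role | Rendering examples | Capture examples
eConsole | Interaction with the computer | Games and system notifications | Voice commands
eCommunications | Voice communication with another person | Chat and VoIP | Chat and VoIP
eMultimedia | Playing or recording audio content | Music and movies | Narration and live music recording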
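The coincidence can be checked with plain integers. The constants below are hypothetical mirrors of the SDK values (ERole from mmdeviceapi.h and the DEVICE_STATE_XXX flags), copied so the snippet compiles without Windows headers; real code should use the SDK names.

```cpp
#include <cstdint>

// Hypothetical mirrors of SDK constants (not the real Windows declarations).
// ERole: eConsole = 0, eMultimedia = 1, eCommunications = 2
enum RoleSketch : uint32_t { kConsole = 0, kMultimedia = 1, kCommunications = 2 };
// DEVICE_STATE_XXX flags and DEVICE_STATEMASK_ALL
constexpr uint32_t kDeviceStateActive = 0x1;
constexpr uint32_t kDeviceStateDisabled = 0x2;
constexpr uint32_t kDeviceStateNotPresent = 0x4;
constexpr uint32_t kDeviceStateUnplugged = 0x8;
constexpr uint32_t kDeviceStateMaskAll = 0xF;

// The value EnumAudioEndpoints actually receives as dwStateMask in the code above
constexpr uint32_t kConsoleOrMultimedia = kConsole | kMultimedia;
static_assert(kConsoleOrMultimedia == kDeviceStateActive,
              "eConsole | eMultimedia happens to equal DEVICE_STATE_ACTIVE");

// True if an endpoint in `state` would be included by `mask`
constexpr bool includedBy(uint32_t mask, uint32_t state) {
    return (mask & state) != 0;
}
```

So the original call filters out disabled, not-present and unplugged endpoints, exactly as DEVICE_STATE_ACTIVE would.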

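Finally, the function returns the collection through ppDevices. The collection object is created by the call, so nothing needs to be allocated beforehand; release it with Release() when done.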

An IPropertyStore object can be used to read an audio device's details; the code uses it to get the device name.

Microsoft::WRL::ComPtr<IPropertyStore> property_store;
PROPVARIANT name_var;
temp = new AudioCapture();
device_collection->Item(i, &temp->m_device);
temp->m_device->GetId(&device_id);
temp->m_device->OpenPropertyStore(STGM_READ, property_store.GetAddressOf());
PropVariantInit(&name_var);
property_store->GetValue(PKEY_Device_FriendlyName, &name_var);

The first argument of GetValue identifies which property to read; see the Windows API documentation for the other property keys.

Initializing the device

Taking the speakers as an example, capturing audio requires an IAudioCaptureClient object, obtained as IMMDevice → IAudioClient → IAudioCaptureClient.

	HRESULT hr = S_OK;
	hr = this->m_device->Activate(__uuidof(IAudioClient), CLSCTX_ALL, NULL, reinterpret_cast<void**>(&this->m_audio_client));

	WAVEFORMATEX* mic_format = nullptr;
	// Get the stream format the audio engine uses internally for shared-mode streams (the mix format)
	hr = this->m_audio_client->GetMixFormat(&mic_format);
	PWAVEFORMATEXTENSIBLE pEx = NULL;
	switch (mic_format->wFormatTag) {
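	case WAVE_FORMAT_IEEE_FLOAT:
		// Plain float mix format: request 16-bit PCM as well, like the EXTENSIBLE branch below
		mic_format->wFormatTag = WAVE_FORMAT_PCM;
		mic_format->wBitsPerSample = 16;
		mic_format->nBlockAlign = mic_format->nChannels * mic_format->wBitsPerSample / 8;
		mic_format->nAvgBytesPerSec = mic_format->nBlockAlign * mic_format->nSamplesPerSec;
		break;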
	case WAVE_FORMAT_EXTENSIBLE:
		pEx = reinterpret_cast<PWAVEFORMATEXTENSIBLE>(mic_format);
		if (IsEqualGUID(KSDATAFORMAT_SUBTYPE_IEEE_FLOAT, pEx->SubFormat)) {
			pEx->SubFormat = KSDATAFORMAT_SUBTYPE_PCM;
			pEx->Samples.wValidBitsPerSample = 16;

			mic_format->wBitsPerSample = 16;
			mic_format->nBlockAlign = mic_format->nChannels * mic_format->wBitsPerSample / 8;
			mic_format->nAvgBytesPerSec = mic_format->nBlockAlign * mic_format->nSamplesPerSec;
		}
		else {
			CoTaskMemFree(mic_format);
			return;
		}
		break;
	default:
		break;
	}
	this->m_sample_rate = mic_format->nSamplesPerSec;
	this->m_sample_size = (mic_format->wBitsPerSample / 8) * mic_format->nChannels;
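	/*
	*	AUDCLNT_STREAMFLAGS_EVENTCALLBACK enables event-driven buffering; SetEventHandle only takes effect with this flag.
	*	AUDCLNT_STREAMFLAGS_LOOPBACK captures what the render device (the speakers) is playing.
	*/
	hr = this->m_audio_client->Initialize(AUDCLNT_SHAREMODE_SHARED, AUDCLNT_STREAMFLAGS_EVENTCALLBACK | AUDCLNT_STREAMFLAGS_LOOPBACK, BUFFER_TIME_100NS, 0, mic_format, NULL);
	// GetMixFormat allocated mic_format with CoTaskMemAlloc; it is no longer needed after Initialize
	CoTaskMemFree(mic_format);
	this->m_audio_samples_ready_event = CreateEvent(NULL, FALSE, FALSE, NULL);

	// Register the event the audio engine signals when captured data is ready
	hr = this->m_audio_client->SetEventHandle(this->m_audio_samples_ready_event);

	// Obtain the capture service
	hr = this->m_audio_client->GetService(__uuidof(IAudioCaptureClient), (void**)&this->m_capture_client);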

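m_device comes from the enumeration step above. The key part is the handling of mic_format: it requests 16-bit integer PCM, which corresponds to FFmpeg's AV_SAMPLE_FMT_S16; the approach follows the Windows API documentation. For an extensible mix format in IEEE float, SubFormat is switched to PCM, and nBlockAlign (bytes per frame) and nAvgBytesPerSec (bytes per second) are recomputed for 16-bit samples.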
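The field recomputation from the EXTENSIBLE branch, isolated as a sketch. WaveFormatSketch is a hypothetical mirror of the relevant WAVEFORMATEX fields so the code compiles without Windows headers; a 32-bit float, stereo, 48000 Hz starting point is assumed as a typical mix format, and the real one depends on the device.

```cpp
#include <cstdint>

// Hypothetical mirror of the WAVEFORMATEX fields involved (not the Windows struct).
struct WaveFormatSketch {
    uint16_t nChannels;
    uint32_t nSamplesPerSec;
    uint16_t wBitsPerSample;
    uint16_t nBlockAlign;      // bytes per frame, all channels together
    uint32_t nAvgBytesPerSec;  // bytes per second
};

// Switch to 16-bit samples and derive the dependent fields; channel count and
// sample rate stay as the mix format reported them.
void toPcm16(WaveFormatSketch& f) {
    f.wBitsPerSample = 16;
    f.nBlockAlign = static_cast<uint16_t>(f.nChannels * f.wBitsPerSample / 8);
    f.nAvgBytesPerSec = f.nBlockAlign * f.nSamplesPerSec;
}
```

For stereo this gives nBlockAlign = 4, which is the same value the code stores in m_sample_size.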

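IAudioClient is then initialized with Initialize. Its second argument includes AUDCLNT_STREAMFLAGS_EVENTCALLBACK, which allows an event object to be registered; the audio engine signals that event when captured data is ready, so in loopback mode it also tells us whether the speakers are outputting sound.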

Audio capture

Starting capture

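Call Start on the IAudioClient obtained in the previous step (m_audio_client->Start() in the code) to begin capturing, then read the captured audio with IAudioCaptureClient::GetBuffer. GetBuffer has to be called actively by the client, so a separate thread is started to call it and fetch the audio data.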

// Start capturing
this->m_mutex = new std::shared_mutex();
HRESULT hr = this->m_audio_client->Start();
this->m_alive = true;
this->m_capture_thread = new std::thread(&AudioCapture::capture, this);

Capture thread

Below is the function the thread runs. It only proceeds when the event is signaled, i.e. when the speakers are producing output.

	HRESULT hr;
	BYTE* data = nullptr;
	uint32_t sample_count = 0, packet_length = 0;
	DWORD flag;
	HANDLE wait_array[3];
	wait_array[0] = this->m_audio_samples_ready_event;
	uint64_t ts = 0;
	while (this->m_alive) {
		// Finite timeout (100 ms) so m_alive is re-checked even when nothing is playing
		DWORD waitResult = WaitForMultipleObjects(1, wait_array, FALSE, 100);
		if (waitResult != WAIT_OBJECT_0 + 0) {
			continue;	// WAIT_TIMEOUT or failure: go back and re-check m_alive
		}
		// One event can cover several packets, so drain everything that is ready
		hr = this->m_capture_client->GetNextPacketSize(&packet_length);
		while (SUCCEEDED(hr) && packet_length > 0) {
			hr = this->m_capture_client->GetBuffer(&data, &sample_count, &flag, NULL, &ts);
			if (FAILED(hr)) {
				break;
			}
			if (flag & AUDCLNT_BUFFERFLAGS_SILENT) {
				// The packet should be treated as silence regardless of its contents
			}
			if (flag & AUDCLNT_BUFFERFLAGS_DATA_DISCONTINUITY) {
				// Not correlated with the previous packet (glitch or stream state change)
			}
			if (flag & AUDCLNT_BUFFERFLAGS_TIMESTAMP_ERROR) {
				// The recorded stream position time is unreliable
			}
			if (this->captureCallback != nullptr) {
				this->captureCallback(data, sample_count, this->m_sample_size);
			}
			// Every successful GetBuffer must be paired with ReleaseBuffer
			hr = this->m_capture_client->ReleaseBuffer(sample_count);
			if (FAILED(hr)) {
				break;
			}
			hr = this->m_capture_client->GetNextPacketSize(&packet_length);
		}
	}
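The same capture-thread pattern can be sketched portably, with std::condition_variable standing in for the Win32 event (an assumption for illustration; CaptureLoopSketch and its methods are hypothetical). The stop flag is checked under the same lock as the data wait, so stop() wakes the thread even when no audio is arriving, and each wake-up drains every packet that is ready.

```cpp
#include <condition_variable>
#include <cstdint>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// push() plays the role of the audio engine signalling the event;
// stop() plays the role of clearing m_alive.
class CaptureLoopSketch {
public:
    using Packet = std::vector<int16_t>;  // interleaved S16 samples
    using Callback = std::function<void(const Packet&)>;

    explicit CaptureLoopSketch(Callback cb) : m_cb(std::move(cb)) {
        m_thread = std::thread(&CaptureLoopSketch::run, this);  // members are ready now
    }
    ~CaptureLoopSketch() { stop(); }

    void push(Packet packet) {
        {
            std::lock_guard<std::mutex> lk(m_mutex);
            m_pending.push_back(std::move(packet));
        }
        m_cv.notify_one();
    }

    // Wakes the thread even if no data ever arrives, then waits for it to exit.
    void stop() {
        {
            std::lock_guard<std::mutex> lk(m_mutex);
            m_stop = true;
        }
        m_cv.notify_one();
        if (m_thread.joinable()) m_thread.join();
    }

private:
    void run() {
        for (;;) {
            std::vector<Packet> ready;
            bool stopping = false;
            {
                std::unique_lock<std::mutex> lk(m_mutex);
                m_cv.wait(lk, [this] { return m_stop || !m_pending.empty(); });
                ready.swap(m_pending);
                stopping = m_stop;
            }
            // Drain all ready packets, like looping GetBuffer/ReleaseBuffer
            // until GetNextPacketSize reports 0.
            for (const Packet& p : ready) m_cb(p);
            if (stopping) return;
        }
    }

    Callback m_cb;
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::vector<Packet> m_pending;
    bool m_stop = false;
    std::thread m_thread;
};
```

Packets pushed before stop() are still delivered, because they are swapped out in the same critical section that observes the stop flag.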

GetBuffer retrieves the audio data; its API definition is:

HRESULT GetBuffer(
[out] BYTE **ppData,
[out] UINT32 *pNumFramesToRead,
[out] DWORD *pdwFlags,
[out] UINT64 *pu64DevicePosition,
[out] UINT64 *pu64QPCPosition
);

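ppData receives the address of the data.

pNumFramesToRead receives the number of frames captured; on my machine it is 480. Note that this is a frame count: the size in bytes is the frame count times the frame size, i.e. m_sample_size from the previous step. For AV_SAMPLE_FMT_S16 stereo, m_sample_size is 2 bytes × 2 channels = 4, so 480 frames are 480 × 4 = 1920 bytes.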
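The size arithmetic above as a small sketch. packetBytes and packetMillis are hypothetical helpers; the 480-frame packet, 16-bit stereo format and 48000 Hz rate are the values observed on the author's machine, not guarantees.

```cpp
#include <cstddef>
#include <cstdint>

// Bytes in one capture packet: frames x channels x bytes per sample.
constexpr size_t packetBytes(uint32_t frames, uint16_t channels, uint16_t bits_per_sample) {
    return static_cast<size_t>(frames) * channels * (bits_per_sample / 8);
}

// Duration of one packet in milliseconds at the given sample rate.
constexpr double packetMillis(uint32_t frames, uint32_t sample_rate) {
    return 1000.0 * frames / sample_rate;
}
```

With those values a packet is 1920 bytes and covers 10 ms of audio.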

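pdwFlags receives the buffer-status flags:

AUDCLNT_BUFFERFLAGS_DATA_DISCONTINUITY
The data in the packet is not correlated with the previous packet's device position; this is possibly due to a stream state transition or a timing glitch.
AUDCLNT_BUFFERFLAGS_SILENT
Treat all of the data in the packet as silence and ignore the actual data values. For details on using this flag, see Rendering a Stream and Capturing a Stream.
AUDCLNT_BUFFERFLAGS_TIMESTAMP_ERROR
The time at which the device's stream position was recorded is uncertain. Thus, the client might be unable to accurately set the time stamp for the current data packet.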

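pu64DevicePosition receives the device position, expressed as the number of audio frames from the start of the stream. It isn't needed, so NULL is passed.

pu64QPCPosition receives the performance-counter (QPC) value for the packet; it isn't used yet.

The data is then handed to the encoder through a callback. Note that every successful GetBuffer must be followed by ReleaseBuffer.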

Encoding to AAC with FFmpeg

Initializing the encoder

	this->m_audio_encode_codec = const_cast<AVCodec*>(avcodec_find_encoder(AV_CODEC_ID_AAC));
	if (this->m_audio_encode_codec == nullptr) {
		// TODO log

		return false;
	}
	this->m_audio_encode_codec_ctx = avcodec_alloc_context3(this->m_audio_encode_codec);
	this->m_audio_encode_codec_ctx->sample_rate = _sample_rate;
	this->m_audio_encode_codec_ctx->strict_std_compliance = FF_COMPLIANCE_EXPERIMENTAL;
	this->m_audio_encode_codec_ctx->sample_fmt = AV_SAMPLE_FMT_FLTP;
	this->m_audio_encode_codec_ctx->bit_rate = _bit_rate;
	this->m_audio_encode_codec_ctx->ch_layout = AV_CHANNEL_LAYOUT_STEREO;
	if (avcodec_open2(this->m_audio_encode_codec_ctx, this->m_audio_encode_codec, NULL) != 0) {
		// TODO log

		return false;
	}

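AV_CODEC_ID_AAC selects the AAC encoder.

sample_rate is the sample rate; for simplicity it matches the capture rate, which is 48000 on my machine.

sample_fmt is the sample format. FFmpeg's native AAC encoder only accepts AV_SAMPLE_FMT_FLTP (planar float). The formats an encoder supports can be listed with: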

	// Audio encoders list sample formats in sample_fmts (pix_fmts is for video)
	const AVSampleFormat* sfmt = this->m_audio_encode_codec->sample_fmts;
	while (sfmt != nullptr && *sfmt != AV_SAMPLE_FMT_NONE) {
		std::cout << av_get_sample_fmt_name(*sfmt) << std::endl;
		sfmt++;
	}

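bit_rate corresponds to the audio bitrate in OBS's settings.

strict_std_compliance is set to FF_COMPLIANCE_EXPERIMENTAL so that the encoder can be opened even if it is flagged as experimental. FFmpeg's native AAC encoder was experimental in old releases but has not been since FFmpeg 3.0, so with FFmpeg versions that provide ch_layout this setting is harmless but most likely unnecessary.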

Initializing the frame and packet

    this->m_audio_encode_frame = av_frame_alloc();
	this->m_audio_encode_frame->pts = 0;
	this->m_audio_encode_frame->nb_samples = this->m_audio_encode_codec_ctx->frame_size;
	this->m_audio_encode_frame->format = this->m_audio_encode_codec_ctx->sample_fmt;
	this->m_audio_encode_frame->ch_layout = this->m_audio_encode_codec_ctx->ch_layout;
	this->m_audio_encode_frame->sample_rate = this->m_audio_encode_codec_ctx->sample_rate;
	auto _buff_size = av_samples_get_buffer_size(NULL, this->m_audio_encode_frame->ch_layout.nb_channels, this->m_audio_encode_codec_ctx->frame_size, this->m_audio_encode_codec_ctx->sample_fmt, 1);
	this->m_audio_frame_buf = (uint8_t*)av_malloc(_buff_size);
	int r = avcodec_fill_audio_frame(this->m_audio_encode_frame, this->m_audio_encode_frame->ch_layout.nb_channels, this->m_audio_encode_codec_ctx->sample_fmt, this->m_audio_frame_buf, _buff_size, 1);

	this->m_audio_encode_packet = av_packet_alloc();
	this->m_audio_encode_buffer = new uint8_t[10240];

This simply initializes the objects needed for encoding. m_audio_encode_buffer caches audio data, for reasons explained below.
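For the settings here (stereo, frame_size of 1024 for the AAC encoder, FLTP with 4-byte floats), _buff_size works out to 8192 bytes, and with align = 1 avcodec_fill_audio_frame points data[1] 4096 bytes into m_audio_frame_buf; that 4096 is the SENDFRAMESIZE used later. A sketch of the arithmetic, with hypothetical helper names and assuming no padding between planes:

```cpp
#include <cstddef>

// Bytes in one plane of a planar frame: one sample per entry for a single channel.
constexpr size_t planeBytes(size_t nb_samples, size_t bytes_per_sample) {
    return nb_samples * bytes_per_sample;
}

// Whole buffer for all planes (align = 1, so planes are packed back to back).
constexpr size_t frameBytes(size_t channels, size_t nb_samples, size_t bytes_per_sample) {
    return channels * planeBytes(nb_samples, bytes_per_sample);
}

// Byte offset of channel `ch`'s plane inside that single buffer.
constexpr size_t planeOffset(size_t ch, size_t nb_samples, size_t bytes_per_sample) {
    return ch * planeBytes(nb_samples, bytes_per_sample);
}
```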

Audio resampling

	if (swr_alloc_set_opts2(&this->m_audio_swr_ctx, &this->m_audio_encode_codec_ctx->ch_layout, this->m_audio_encode_codec_ctx->sample_fmt, this->m_audio_encode_codec_ctx->sample_rate, &this->m_audio_encode_codec_ctx->ch_layout, _input_format, this->m_audio_encode_codec_ctx->sample_rate, 0, nullptr) != 0) {
		// TODO log

		return false;
	}
	swr_init(this->m_audio_swr_ctx);
	this->m_audio_swr_buffer = new uint8_t[10240];

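This step can change the audio's sample rate and sample format (sample_rate and sample_fmt). Here both the input and output rates are the encoder's sample_rate, so in practice only the format changes, from the captured _input_format to the encoder's AV_SAMPLE_FMT_FLTP.

m_audio_swr_buffer receives the resampled data.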
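What the conversion does to each sample can be sketched in plain C++. This assumes _input_format is AV_SAMPLE_FMT_S16 (interleaved stereo, as produced by the capture step) and that libswresample scales int16 to float by 1/32768 with no dithering; s16InterleavedToFltp is a hypothetical name, and the real swr_convert also handles rate and layout changes that aren't needed here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// S16 interleaved stereo (L R L R ...) -> FLTP: one float plane per channel.
// Scaling by 1/32768 maps the int16 range onto [-1.0, 1.0).
void s16InterleavedToFltp(const int16_t* in, size_t frames,
                          std::vector<float>& left, std::vector<float>& right) {
    const float scale = 1.0f / 32768.0f;
    left.resize(frames);
    right.resize(frames);
    for (size_t i = 0; i < frames; ++i) {
        left[i] = in[2 * i] * scale;       // even positions: left channel
        right[i] = in[2 * i + 1] * scale;  // odd positions: right channel
    }
}
```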

Next is the capture thread's callback, which first resamples the audio data.

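	// Convert S16 interleaved -> FLTP (same sample rate)
	size_t size = _sample_count * _sample_size;
	uint8_t* in[2] = { _data, nullptr };
	// One FLTP plane is _sample_count floats = _sample_count * 4 bytes, which equals size for S16 stereo input
	uint8_t* out[2] = { this->m_audio_swr_buffer, this->m_audio_swr_buffer + size };
	int count = swr_convert(this->m_audio_swr_ctx, out, _sample_count, const_cast<const uint8_t**>(in), _sample_count);
	if (this->m_filter != nullptr) {
		this->m_filter->afterAudioSwr(out[0], _sample_count, this);
	}
	// Cache data so the encoder always receives 1024-sample frames (1024 * 4 bytes per plane), as AAC requires.
	// The destination size passed to memcpy_s is the space left in the 10240-byte m_audio_encode_buffer.
	memcpy_s(this->m_audio_encode_buffer + this->m_audio_encode_buffer_size, 10240 - this->m_audio_encode_buffer_size, out[0], size);
	this->m_audio_encode_buffer_size += size;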

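The AAC encoder requires frames of exactly 1024 samples, so m_audio_encode_buffer caches the resampled data. SENDFRAMESIZE = 1024 * 4: as in the capture section, frame count × bytes per frame. It is hardcoded because the AAC input format is AV_SAMPLE_FMT_FLTP, where each plane holds one 4-byte float per sample. Note that only the left plane (out[0]) is cached, so the loop below copies the same data into both planes and the encoded audio is effectively dual-mono; keeping true stereo would require caching out[1] as well.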

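	while (this->m_audio_encode_buffer_size >= SENDFRAMESIZE) {
		// Plane 0 (left channel) of the encoder frame
		memcpy_s(m_audio_frame_buf, SENDFRAMESIZE, this->m_audio_encode_buffer, SENDFRAMESIZE);
		// Plane 1 (right channel): only out[0] was cached, so the left channel is duplicated here
		memcpy_s(m_audio_frame_buf + SENDFRAMESIZE, SENDFRAMESIZE, this->m_audio_encode_buffer, SENDFRAMESIZE);
		avcodec_send_frame(this->m_audio_encode_codec_ctx, this->m_audio_encode_frame);
		// Advance pts by one frame (frame_size = 1024 samples) so timestamps keep increasing
		this->m_audio_encode_frame->pts += this->m_audio_encode_codec_ctx->frame_size;
		// Move whatever is left to the front of the cache; memmove because the ranges may overlap
		this->m_audio_encode_buffer_size -= SENDFRAMESIZE;
		memmove(this->m_audio_encode_buffer, this->m_audio_encode_buffer + SENDFRAMESIZE, this->m_audio_encode_buffer_size);
	}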
	while (avcodec_receive_packet(this->m_audio_encode_codec_ctx, this->m_audio_encode_packet) >= 0) {
		if (this->m_filter != nullptr) {
			this->m_filter->afterAudioEncode(this->m_audio_encode_packet);
		}
		av_packet_unref(this->m_audio_encode_packet);
	}
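Caching both planes instead of only out[0] would keep the right channel. A minimal sketch of such a planar FIFO, assuming FLTP stereo input; PlanarFifoSketch is a hypothetical class, not part of the project. FFmpeg also ships AVAudioFifo (av_audio_fifo_alloc, av_audio_fifo_write, av_audio_fifo_read in libavutil/audio_fifo.h), which does the same job for any sample format.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Caches both channels of FLTP audio and hands out fixed-size frames
// (1024 samples per channel for AAC).
class PlanarFifoSketch {
public:
    explicit PlanarFifoSketch(size_t frame_size) : m_frame_size(frame_size) {}

    // Append `count` samples per channel.
    void push(const float* left, const float* right, size_t count) {
        m_left.insert(m_left.end(), left, left + count);
        m_right.insert(m_right.end(), right, right + count);
    }

    // If a full frame is cached, copy it into the two output planes and drop it.
    bool pop(float* out_left, float* out_right) {
        if (m_left.size() < m_frame_size) return false;
        const auto n = static_cast<std::ptrdiff_t>(m_frame_size);
        std::copy(m_left.begin(), m_left.begin() + n, out_left);
        std::copy(m_right.begin(), m_right.begin() + n, out_right);
        m_left.erase(m_left.begin(), m_left.begin() + n);
        m_right.erase(m_right.begin(), m_right.begin() + n);
        return true;
    }

    // Samples per channel still waiting for a full frame.
    size_t buffered() const { return m_left.size(); }

private:
    size_t m_frame_size;
    std::vector<float> m_left, m_right;
};
```

In the callback, push would take out[0] and out[1] (cast to const float*), and each successful pop into the frame's data[0] and data[1] (cast to float*) would be followed by avcodec_send_frame.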

Below is a real-time volume meter for the audio input, modeled on OBS; _data is the resampled frame.data (FLTP, one float plane per channel):

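#include <xmmintrin.h>
#include <cmath>
#include <iomanip>
#include <iostream>

// Peak level of each channel in dBFS; _data holds two FLTP planes of _sample_count floats
void MemCodecAcc::calculateVolum(uint8_t** _data, size_t _sample_count)
{
	float r[2] = { 0.0f, 0.0f };
	const __m128 sign_mask = _mm_set1_ps(-0.f);
	for (int ch = 0; ch < 2; ++ch) {
		const float* samples = reinterpret_cast<const float*>(_data[ch]);
		// Per-lane running maximum of |sample|, starting from 0
		__m128 peak = _mm_setzero_ps();
		size_t i = 0;
		for (; i + 3 < _sample_count; i += 4) {
			// Unaligned load: the plane pointers are not guaranteed to be 16-byte aligned
			__m128 new_work = _mm_loadu_ps(&samples[i]);
			// andnot with -0.f clears the sign bit, i.e. takes the absolute value
			peak = _mm_max_ps(peak, _mm_andnot_ps(sign_mask, new_work));
		}
		float x4_mem[4];
		_mm_storeu_ps(x4_mem, peak);
		float p = std::fmax(std::fmax(x4_mem[0], x4_mem[1]), std::fmax(x4_mem[2], x4_mem[3]));
		// Scalar tail when _sample_count is not a multiple of 4
		for (; i < _sample_count; ++i) {
			p = std::fmax(p, std::fabs(samples[i]));
		}
		// 0 dBFS = full scale; pure silence gives -inf
		r[ch] = 20.0f * std::log10(p);
	}
	std::cout << "\r";
	std::cout << "L: " << std::setprecision(4) << r[0] << ", R: " << std::setprecision(4) << r[1];
}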
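A scalar version of the same peak meter, useful for checking the SSE result or on platforms without SSE. peakDbfs is a hypothetical helper; it reports the sample peak in dBFS, where a full-scale sample of magnitude 1.0 is 0 dB and halving the amplitude costs about 6.02 dB.

```cpp
#include <cmath>
#include <cstddef>

// Largest |sample| converted to dBFS; an all-zero buffer yields -infinity.
float peakDbfs(const float* samples, size_t count) {
    float peak = 0.0f;
    for (size_t i = 0; i < count; ++i) {
        peak = std::fmax(peak, std::fabs(samples[i]));
    }
    return 20.0f * std::log10(peak);
}
```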

Muxing H.264 and AAC into MP4

This is implemented as a filter; see the source on Gitee (under Encoder) for details.

Initialization

	if (avformat_alloc_output_context2(&this->m_save_ctx, nullptr, nullptr, _file_name) < 0) {
		// TODO log
		return false;
	}
	if (avio_open(&this->m_save_ctx->pb, _file_name, AVIO_FLAG_WRITE | AVIO_FLAG_READ) < 0) {
		// TODO log

		avformat_free_context(this->m_save_ctx);
		this->m_save_ctx = nullptr;
		return false;
	}
	
	if (_video_encoder_codec_ctx != nullptr) {
		AVCodecContext* video_encoder_codec_ctx = reinterpret_cast<AVCodecContext*>(_video_encoder_codec_ctx);
		this->m_save_video_stream = avformat_new_stream(this->m_save_ctx, nullptr);
		avcodec_parameters_from_context(this->m_save_video_stream->codecpar, video_encoder_codec_ctx);
		this->m_video_time_base = new AVRational;
		this->m_video_time_base->den = video_encoder_codec_ctx->time_base.den;
		this->m_video_time_base->num = video_encoder_codec_ctx->time_base.num;
		this->m_video_pts = 0;
	}

	if (_audio_encoder_codec_ctx != nullptr) {
		AVCodecContext* audio_encoder_codec_ctx = reinterpret_cast<AVCodecContext*>(_audio_encoder_codec_ctx);
		this->m_save_audio_stream = avformat_new_stream(this->m_save_ctx, nullptr);
		avcodec_parameters_from_context(this->m_save_audio_stream->codecpar, audio_encoder_codec_ctx);
		this->m_audio_pts = 0;
	}

	// av_dump_format(this->m_saveCtx, 0, _save_file_name, 1); TODO dump to log
	avformat_write_header(this->m_save_ctx, nullptr);
	this->m_ready = true;
	return true;

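avformat_alloc_output_context2 is an FFmpeg function that allocates an output AVFormatContext, picking the container format (MP4 here) from the file name.

avio_open opens the file and stores the resulting I/O context in AVFormatContext.pb, roughly analogous to fopen_s(); the third argument opens it for both reading and writing.

Next, avformat_new_stream creates an output stream, and avcodec_parameters_from_context fills in its codec parameters. In effect this records the stream's encoding format in the file so that a decoder can retrieve it later.

Muxing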

void SaveFileFilter::afterVideoEncode(AVPacket* _packet)
{
	if (this->m_ready) {
		_packet->stream_index = this->m_save_video_stream->index;
		_packet->dts = _packet->pts = this->m_video_pts;
		av_packet_rescale_ts(_packet, *this->m_video_time_base, this->m_save_video_stream->time_base);
		this->m_video_pts += 1;
		this->m_mutex->lock();
		av_interleaved_write_frame(this->m_save_ctx, _packet);
		this->m_mutex->unlock();
	}
	EncoderFilter::afterVideoEncode(_packet);
}

void SaveFileFilter::afterAudioEncode(AVPacket* _packet)
{
	if (this->m_ready) {
		_packet->stream_index = this->m_save_audio_stream->index;
		_packet->dts = _packet->pts = this->m_audio_pts;
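		// m_audio_pts counts samples, i.e. it is in units of 1/sample_rate; convert that to the stream's time base
		AVRational sample_tb = { 1, this->m_save_audio_stream->codecpar->sample_rate };
		av_packet_rescale_ts(_packet, sample_tb, this->m_save_audio_stream->time_base);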
		this->m_audio_pts += 1024;
		this->m_mutex->lock();
		av_interleaved_write_frame(this->m_save_ctx, _packet);
		this->m_mutex->unlock();
	}
	EncoderFilter::afterAudioEncode(_packet);
}

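These are the callbacks invoked after video and audio encoding; see the source for how they are called.

Two steps are key:

1. av_packet_rescale_ts converts timestamps between time bases by rewriting pts and dts. pts (presentation timestamp) says when a frame is shown and dts (decoding timestamp) says when it is decoded; both are positions on a tick scale whose unit is the time base. Take video as an example: its time_base is 1/60 (60 fps, 60 frames per second), while the stream's time_base is 1/90000 (90000 ticks per second). Without conversion, the timestamps would be read in the stream's time base: the frame with pts = 60, which should appear at 60/60 = 1 s, would appear at 60/90000 s instead, so 90000 frames would be squeezed into one second. After conversion, pts = 1 in the video time base becomes 1 × (90000/60) = 1500. Because dts is simply set equal to pts here, this assumes the H.264 encoder emits no B-frames (no frame reordering).

2. Both pts counters must start at 0 at initialization so that, when the file is decoded, each frame is decoded and presented at the right time. The video counter advances by 1 per frame, and the audio counter by 1024 per packet, since each AAC frame carries 1024 samples.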
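To make the arithmetic concrete, here is a plain-integer sketch of the conversion applied to each timestamp. RationalSketch and rescaleSketch are hypothetical stand-ins for AVRational and av_rescale_q; they round to nearest (halves away from zero) like FFmpeg's default, but unlike FFmpeg they do no overflow protection, so keep inputs small.

```cpp
#include <cstdint>

// Hypothetical stand-in for AVRational; den is assumed positive.
struct RationalSketch {
    int64_t num;
    int64_t den;
};

// a * from / to, rounded to nearest with halves away from zero.
int64_t rescaleSketch(int64_t a, RationalSketch from, RationalSketch to) {
    const int64_t num = a * from.num * to.den;
    const int64_t den = from.den * to.num;
    return (num >= 0 ? num + den / 2 : num - den / 2) / den;
}
```

With the values from the text, pts = 1 at 1/60 becomes 1500 at 1/90000, one second of video (pts = 60) becomes 90000, and an audio pts of 1024 at 1/48000 is unchanged when the stream's time base is also 1/48000.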