Deep Learning Model Trial Runs (15): Real-ESRGAN (TensorRT inference deployment with VS2019)

This post details the steps for running the Real-ESRGAN image super-resolution model with C++ in a Visual Studio 2019 environment. First, a Python script generates the .wts weight file; TensorRT is then used to build a C++ inference engine, with the network defined and optimized layer by layer through the builder API (the .onnx graph is consulted only as a reference for the node structure, not parsed). Next, the CMakeLists.txt configuration is shown in detail, including library paths and dependency settings. Finally, the post explains how to compile, build, and run the project in VS2019 and how to execute inference, covering input size, GPU configuration, and precision mode. The custom plugins used for image pre- and post-processing are also discussed.


Preface

Super-resolution restoration/reconstruction takes a low-resolution (LR) input and upscales it to a high-resolution output; for the underlying principles, see the PaddleGAN introduction. For accelerating this network with TensorRT on Linux, there is an existing article that documents the process in detail.
My environment:

  • Visual Studio 2019
  • CUDA 11.6, cuDNN 8.2
  • CMake 3.17.1
  • TensorRT 8.4.1.5

1. Model Walkthrough

The author made further improvements on top of ESRGAN (Real-ESRGAN is by the same author). There is a blog post that explains these changes in detail; readers interested in the theory can refer to that interpretation of the paper.
For the network design, the official ESRGAN code walkthrough is a good reference for the overall architecture; below is an excerpt of the ONNX node graph.
[Figure: RDB block]

Since the inference network (i.e., the generator) reuses the Generator from ESRGAN as-is, namely the Residual-in-Residual Dense Block (RRDB), the reimplementation is still based on ESRGAN; a sketch of the block follows.
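As a reading aid for the appendix code, here is a minimal sketch of how one Residual Dense Block could be assembled with the TensorRT API; an RRDB chains three of these blocks (with weight prefixes like body.0.rdb1) and wraps another 0.2-scaled residual around them. This is an illustrative reconstruction under those assumptions, not the actual RRDB helper, which ships in tensorrtx's common.hpp:

#include <NvInfer.h>
#include <map>
#include <string>
#include <vector>
using namespace nvinfer1;

// One Residual Dense Block: five 3x3 convs with dense connections; convs 1-4
// emit 32 growth channels followed by LeakyReLU(0.2), conv5 maps back to 64
// feature channels, and the block returns x + 0.2 * conv5_out.
ITensor* residual_dense_block(INetworkDefinition* net, std::map<std::string, Weights>& w,
	ITensor* x, const std::string& prefix) {
	std::vector<ITensor*> feats{ x };
	ITensor* out = nullptr;
	for (int i = 1; i <= 5; i++) {
		ITensor* in = feats[0];
		if (feats.size() > 1) {
			// conv_i sees the channel-wise concatenation of x and all earlier outputs
			IConcatenationLayer* cat = net->addConcatenation(feats.data(), (int)feats.size());
			in = cat->getOutput(0);
		}
		std::string name = prefix + ".conv" + std::to_string(i);
		IConvolutionLayer* conv = net->addConvolutionNd(*in, (i < 5) ? 32 : 64, DimsHW{ 3, 3 },
			w[name + ".weight"], w[name + ".bias"]);
		conv->setStrideNd(DimsHW{ 1, 1 });
		conv->setPaddingNd(DimsHW{ 1, 1 });
		if (i < 5) {
			IActivationLayer* lrelu = net->addActivation(*conv->getOutput(0), ActivationType::kLEAKY_RELU);
			lrelu->setAlpha(0.2);
			feats.push_back(lrelu->getOutput(0));
		} else {
			out = conv->getOutput(0);	// no activation after the last conv
		}
	}
	// Residual scaling: x + 0.2 * out (static storage keeps the Weights valid during build)
	static float scale_val = 0.2f, shift_val = 0.0f, power_val = 1.0f;
	Weights scale{ DataType::kFLOAT, &scale_val, 1 };
	Weights shift{ DataType::kFLOAT, &shift_val, 1 };
	Weights power{ DataType::kFLOAT, &power_val, 1 };
	IScaleLayer* scaled = net->addScale(*out, ScaleMode::kUNIFORM, shift, scale, power);
	IElementWiseLayer* sum = net->addElementWise(*x, *scaled->getOutput(0), ElementWiseOperation::kSUM);
	return sum->getOutput(0);
}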

2. Model Training

I skipped this step entirely; I gave it a try, but my hardware could not handle the training.

3. Running C++ Inference in VS2019

Main reference:
the tensorrtx code

  1. Generate the binary serialized weight file (.wts)
    1) First, get the official Real-ESRGAN PyTorch (Python) implementation; I simply downloaded its Python code.
    2) Then install the various dependencies. Both steps are covered in README_CN.md:
	pip install basicsr
	pip install facexlib
	pip install gfpgan
	pip install -r requirements.txt
	python setup.py develop

    3) Download the weight file, create an experiments/pretrained_models subdirectory inside the official Real-ESRGAN PyTorch repo, and copy the weight file into it. Then copy the gen_wts.py script from the tensorrtx/real-esrgan/ directory into the official Real-ESRGAN PyTorch repo and run it:

python gen_wts.py

On success, this produces a real-esrgan.wts file.
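For orientation, the .wts file is the plain-text format used across tensorrtx: the first line holds the tensor count, and each following line reads name count hex0 hex1 ..., where every hex word is the bit pattern of one float32. The loadWeights helper called in the appendix code (from tensorrtx's common.hpp) parses it roughly as in this sketch (illustrative, not the exact shipped code):

#include <NvInfer.h>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <fstream>
#include <map>
#include <string>
using namespace nvinfer1;

std::map<std::string, Weights> loadWeights(const std::string& file) {
	std::map<std::string, Weights> weightMap;
	std::ifstream input(file);
	assert(input.is_open() && "unable to open .wts file");
	int32_t count;
	input >> count;	// first line: number of weight tensors
	while (count--) {
		Weights wt{ DataType::kFLOAT, nullptr, 0 };
		std::string name;
		uint32_t size;
		input >> name >> std::dec >> size;
		// each value is stored as the hexadecimal bit pattern of a 32-bit float
		uint32_t* val = reinterpret_cast<uint32_t*>(malloc(sizeof(uint32_t) * size));
		for (uint32_t i = 0; i < size; ++i) input >> std::hex >> val[i];
		wt.values = val;	// freed by build_engine once the engine has been built
		wt.count = size;
		weightMap[name] = wt;
	}
	return weightMap;
}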

  2. Build and run the tensorrtx/real-esrgan project
    1. Go to the tensorrtx/real-esrgan/ directory.
    2. Edit CMakeLists.txt: mainly change lines 4-12 and 16 to the directories of the corresponding libraries (for TensorRT, a version starting with 8 is recommended). From what I have gathered, lines 31 and 67 can be left unchanged for now. (Modify the places marked with #.)
cmake_minimum_required(VERSION 3.0)

project(real-esrgan) #1
set(OpenCV_DIR "D:\\opencv\\build")  #2
set(OpenCV_INCLUDE_DIRS ${OpenCV_DIR}\\include) #3
set(OpenCV_LIB_DIRS ${OpenCV_DIR}\\x64\\vc15\\lib) #4
set(OpenCV_Debug_LIBS "opencv_world450d.lib") #5
set(OpenCV_Release_LIBS "opencv_world450.lib") #6
set(TRT_DIR "D:\\lbq\\TensorRT-7.2.3.4")  #7
set(TRT_INCLUDE_DIRS ${TRT_DIR}\\include) #8
set(TRT_LIB_DIRS ${TRT_DIR}\\lib) #9
set(Dirent_INCLUDE_DIRS "D:\\lbq\\dirent\\include") #10

add_definitions(-std=c++14)

set(CUDA_BIN_PATH "C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v11.5")
option(CUDA_USE_STATIC_CUDA_RUNTIME OFF)
set(CMAKE_CXX_STANDARD 14)
set(CMAKE_BUILD_TYPE Release)

set(THREADS_PREFER_PTHREAD_FLAG ON)
find_package(Threads)

# setup CUDA
find_package(CUDA REQUIRED)
message(STATUS "    libraries: ${CUDA_LIBRARIES}")
message(STATUS "    include path: ${CUDA_INCLUDE_DIRS}")

include_directories(${CUDA_INCLUDE_DIRS})

set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS};-std=c++14;-g;-G;-gencode;arch=compute_86,code=sm_86)
####
enable_language(CUDA)  # add this line, then no need to setup cuda path in vs
####
include_directories(${PROJECT_SOURCE_DIR}/include) #11
include_directories(${TRT_INCLUDE_DIRS}) #12
link_directories(${TRT_LIB_DIRS}) #13
include_directories(${OpenCV_INCLUDE_DIRS}) #14
link_directories(${OpenCV_LIB_DIRS}) #15
include_directories(${Dirent_INCLUDE_DIRS}) #16


# -D_MWAITXINTRIN_H_INCLUDED for solving error: identifier "__builtin_ia32_mwaitx" is undefined
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++14 -Wall -Ofast -D_MWAITXINTRIN_H_INCLUDED")

# setup opencv
find_package(OpenCV QUIET
    NO_MODULE
    NO_DEFAULT_PATH
    NO_CMAKE_PATH
    NO_CMAKE_ENVIRONMENT_PATH
    NO_SYSTEM_ENVIRONMENT_PATH
    NO_CMAKE_PACKAGE_REGISTRY
    NO_CMAKE_BUILDS_PATH
    NO_CMAKE_SYSTEM_PATH
    NO_CMAKE_SYSTEM_PACKAGE_REGISTRY
)

message(STATUS "OpenCV library status:")
message(STATUS "    version: ${OpenCV_VERSION}")
message(STATUS "    lib path: ${OpenCV_LIB_DIRS}")
message(STATUS "    Debug libraries: ${OpenCV_Debug_LIBS}")
message(STATUS "    Release libraries: ${OpenCV_Release_LIBS}")
message(STATUS "    include path: ${OpenCV_INCLUDE_DIRS}")

if(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)
set(CMAKE_CUDA_ARCHITECTURES 86)
endif(NOT DEFINED CMAKE_CUDA_ARCHITECTURES)

add_executable(real-esrgan ${PROJECT_SOURCE_DIR}/real-esrgan.cpp ${PROJECT_SOURCE_DIR}/common.hpp 
	${PROJECT_SOURCE_DIR}/preprocess.cu ${PROJECT_SOURCE_DIR}/preprocess.hpp
	${PROJECT_SOURCE_DIR}/postprocess.cu ${PROJECT_SOURCE_DIR}/postprocess.hpp
	)   #17

target_link_libraries(real-esrgan "nvinfer" "nvinfer_plugin") #18
target_link_libraries(real-esrgan debug ${OpenCV_Debug_LIBS}) #19
target_link_libraries(real-esrgan optimized ${OpenCV_Release_LIBS}) #20
target_link_libraries(real-esrgan ${CUDA_LIBRARIES}) #21
target_link_libraries(real-esrgan Threads::Threads)  

    3. Open the code with cmake-gui and adapt it to your own versions; the steps are similar to yolov5. In the first field enter the Real_ESRGAN_TRT directory, and in the third field enter the Real_ESRGAN_TRT/buildnew directory. Then click 'Configure', 'Generate', and 'Open Project' to configure, generate, and open the project.

    4. Right-click real-esrgan, choose "Set as Startup Project", and switch all build configurations to "Release".

    5. Then click "Build" -> "Build Solution" to produce the corresponding real-esrgan.exe.

    6. Copy the real-esrgan.wts from step 1 into this Release directory, and also copy (or symlink) the required TensorRT DLLs, such as nvinfer.dll, into the same directory.

    7. Open a command prompt and run real-esrgan.exe -s real-esrgan.wts real-esrgan_f32.engine to build the corresponding engine. This step can take about half an hour, depending on the GPU.

    8. Finally, put -d real-esrgan_f32.engine ../samples into the VS command arguments. Make sure both paths resolve correctly: the first is the directory containing the engine file, the second the directory containing the images. Then simply press F5 to run the program. The images are confidential, so I will not show them here.

Appendix 1. Inference parameters

  • Input size: INPUT_H, INPUT_W, INPUT_C; the first two can be adjusted to actual needs.
  • GPU id (DEVICE), at line 9; currently probably only a single GPU is supported.
  • BATCH_SIZE, at line 10: how many images to infer per batch.
  • PRECISION_MODE, at line 14: the inference precision; lower precision degrades quality but runs faster.
  • VISUALIZATION, at line 15: whether to display the inference results.

I found that for certain images a traditional image-restoration algorithm is still needed as preprocessing, so I wrapped four classic image-enhancement methods into the code; feel free to contact me if you are interested. A hedged sketch of the gamma variant follows the list below.
~~Choose one of the two modes below, at lines 257-273 of the code. Note that line 262 applies a traditional image-enhancement method; this step is also crucial for improving the cell images. Four common traditional enhancement methods are integrated, all in utils.h; some of their parameters need tuning, and the one currently used is the gamma transform (gamma_transform).

  • Static input: the input image size (C, W, H) is fixed, and images of other sizes cannot be used for inference.
  • Dynamic input: multiple input sizes are supported, but the output currently has ragged edges (not yet handled).~~
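gamma_transform itself lives in utils.h, which is not reproduced in this post. The following is a plausible minimal sketch of such a gamma transform using OpenCV's LUT; the default gamma value here is an assumption for illustration, and the actual parameters in utils.h may differ:

#include <opencv2/opencv.hpp>
#include <cmath>

// Hypothetical sketch of a gamma transform used as traditional pre-enhancement.
// gamma < 1 brightens dark regions; gamma > 1 darkens them.
cv::Mat gamma_transform(const cv::Mat& src, double gamma = 0.7) {
	cv::Mat lut(1, 256, CV_8U);
	for (int i = 0; i < 256; ++i) {
		lut.at<uchar>(i) = cv::saturate_cast<uchar>(std::pow(i / 255.0, gamma) * 255.0);
	}
	cv::Mat dst;
	cv::LUT(src, lut, dst);	// the single-channel table is applied to each BGR channel
	return dst;
}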

Appendix 2. real-esrgan code walkthrough

#include "cuda_utils.h"
#include "common.hpp"		
#include "preprocess.hpp"	// preprocess plugin 
#include "postprocess.hpp"	// postprocess plugin 
#include "logging.h"
#include "utils.h"
#include <windows.h>	//access()

#define DEVICE 0  // GPU id
#define BATCH_SIZE 1
#define MAX_IMAGE_INPUT_SIZE_THRESH 4096 * 4096 // ensure this exceeds the maximum size of the input images!

// stuff we know about the network and the input/output blobs
static const int PRECISION_MODE = 32; // fp32 : 32, fp16 : 16
static const bool VISUALIZATION = false;
static const int INPUT_H = 1024;
static const int INPUT_W = 1024;
static const int INPUT_C = 3;
static const int OUT_SCALE = 4;
static const int OUTPUT_SIZE = INPUT_C * INPUT_H * OUT_SCALE * INPUT_W * OUT_SCALE;
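// With the defaults (3 channels, 1024x1024 input, 4x scale) this is 3 * 4096 * 4096 = 50,331,648 uint8 values (~48 MiB) per image.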
const char* INPUT_BLOB_NAME = "data";
const char* OUTPUT_BLOB_NAME = "prob";
static Logger gLogger;

// Create the engine using only the TensorRT API, not any parser (the author hand-writes the network definition)
ICudaEngine* build_engine(unsigned int maxBatchSize, IBuilder* builder, IBuilderConfig* config, DataType dt, std::string& wts_name) {
	INetworkDefinition* network = builder->createNetworkV2(0U);	// define the network (implicit batch mode)

	// Create input tensor of shape {INPUT_H, INPUT_W, INPUT_C} with name INPUT_BLOB_NAME
	ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{ INPUT_H, INPUT_W, INPUT_C });
	assert(data);
	// generic tensorrtx helper that loads the .wts weight map
	std::map<std::string, Weights> weightMap = loadWeights(wts_name);

	// Preprocessing: custom plugin (NHWC->NCHW, BGR->RGB, [0, 255]->[0, 1] normalization)
	Preprocess preprocess{ maxBatchSize, INPUT_C, INPUT_H, INPUT_W };
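	// Preprocess is a plain struct carrying {maxBatchSize, C, H, W}; it parameterizes the
	// preprocessing plugin created below via the cast to PluginFieldCollection*.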

	// TensorRT plugins: https://zhuanlan.zhihu.com/p/448241566
	// look up the registered PluginCreator; many tensorrtx models reuse this preprocessing plugin
	IPluginCreator* preprocess_creator = getPluginRegistry()->getPluginCreator("preprocess", "1");
	// create the custom-layer object and return it
	IPluginV2* preprocess_plugin = preprocess_creator->createPlugin("preprocess_plugin", (PluginFieldCollection*)&preprocess);
	// Add a plugin layer to the network using the IPluginV2 interface
	IPluginV2Layer* preprocess_layer = network->addPluginV2(&data, 1, *preprocess_plugin);
	// https://docs.nvidia.com/deeplearning/tensorrt/api/c_api/classnvinfer1_1_1_i_network_definition.html#a0c670938a4aef867545f41b65c52cd93
	preprocess_layer->setName("preprocess_layer");
	ITensor* prep = preprocess_layer->getOutput(0);

	// What follows is the full RRDBNet generator/inference network; cross-check the wts/onnx files for the concrete nodes
	// conv_first: the first convolution (feature extraction); input tensor is *prep, 64 output channels, 3x3 kernel, followed by the weight and bias values
	IConvolutionLayer* conv_first = network->addConvolutionNd(*prep, 64, DimsHW{ 3, 3 }, weightMap["conv_first.weight"], weightMap["conv_first.bias"]);
	conv_first->setStrideNd(DimsHW{ 1, 1 });
	conv_first->setPaddingNd(DimsHW{ 1, 1 });
	conv_first->setName("conv_first");
	ITensor* feat = conv_first->getOutput(0);

	// conv_body, https://www.cnblogs.com/carsonzhu/p/10967369.html
	// inference_realesrgan.py <args.model_name == 'RealESRGAN_x4plus', line 48>
	ITensor* body_feat = RRDB(network, weightMap, feat, "body.0");
	// https://blog.youkuaiyun.com/qq_39751446/article/details/119970924
	for (int idx = 1; idx < 23; idx++) { // num_block = 23
		// the RRDB helper follows <rrdbnet_arch.py line 42>
		body_feat = RRDB(network, weightMap, body_feat, "body." + std::to_string(idx));
	}
	// **************** end of the RRDB trunk ****************
	IConvolutionLayer* conv_body = network->addConvolutionNd(*body_feat, 64, DimsHW{ 3, 3 }, weightMap["conv_body.weight"], weightMap["conv_body.bias"]);
	conv_body->setStrideNd(DimsHW{ 1, 1 });
	conv_body->setPaddingNd(DimsHW{ 1, 1 });
	IElementWiseLayer* ew1 = network->addElementWise(*feat, *conv_body->getOutput(0), ElementWiseOperation::kSUM);
	feat = ew1->getOutput(0);

	//	upsampling; in the onnx graph this starts at the Resize op after the last Add
	//	add a resize layer with nearest-neighbor interpolation that doubles the spatial size of its input
	IResizeLayer* interpolate_nearest = network->addResize(*feat);
	float scales1[] = { 1, 2, 2 };	// scale factors for the C, H and W dimensions
	interpolate_nearest->setScales(scales1, 3);
	interpolate_nearest->setResizeMode(ResizeMode::kNEAREST);

	IConvolutionLayer* conv_up1 = network->addConvolutionNd(*interpolate_nearest->getOutput(0), 64, DimsHW{ 3, 3 }, weightMap["conv_up1.weight"], weightMap["conv_up1.bias"]);
	conv_up1->setStrideNd(DimsHW{ 1, 1 });
	conv_up1->setPaddingNd(DimsHW{ 1, 1 });
	IActivationLayer* leaky_relu_1 = network->addActivation(*conv_up1->getOutput(0), ActivationType::kLEAKY_RELU);
	leaky_relu_1->setAlpha(0.2);

	IResizeLayer* interpolate_nearest2 = network->addResize(*leaky_relu_1->getOutput(0));
	float scales2[] = { 1, 2, 2 };
	interpolate_nearest2->setScales(scales2, 3);
	interpolate_nearest2->setResizeMode(ResizeMode::kNEAREST);
	IConvolutionLayer* conv_up2 = network->addConvolutionNd(*interpolate_nearest2->getOutput(0), 64, DimsHW{ 3, 3 }, weightMap["conv_up2.weight"], weightMap["conv_up2.bias"]);
	conv_up2->setStrideNd(DimsHW{ 1, 1 });
	conv_up2->setPaddingNd(DimsHW{ 1, 1 });
	IActivationLayer* leaky_relu_2 = network->addActivation(*conv_up2->getOutput(0), ActivationType::kLEAKY_RELU);
	leaky_relu_2->setAlpha(0.2);

	IConvolutionLayer* conv_hr = network->addConvolutionNd(*leaky_relu_2->getOutput(0), 64, DimsHW{ 3, 3 }, weightMap["conv_hr.weight"], weightMap["conv_hr.bias"]);
	conv_hr->setStrideNd(DimsHW{ 1, 1 });
	conv_hr->setPaddingNd(DimsHW{ 1, 1 });
	IActivationLayer* leaky_relu_hr = network->addActivation(*conv_hr->getOutput(0), ActivationType::kLEAKY_RELU);
	leaky_relu_hr->setAlpha(0.2);
	IConvolutionLayer* conv_last = network->addConvolutionNd(*leaky_relu_hr->getOutput(0), 3, DimsHW{ 3, 3 }, weightMap["conv_last.weight"], weightMap["conv_last.bias"]);
	conv_last->setStrideNd(DimsHW{ 1, 1 });
	conv_last->setPaddingNd(DimsHW{ 1, 1 });
	ITensor* out = conv_last->getOutput(0);

	//	Postprocessing: custom plugin (RGB -> BGR, NCHW->NHWC, *255, round, uint8)
	Postprocess postprocess{ maxBatchSize, out->getDimensions().d[0], out->getDimensions().d[1], out->getDimensions().d[2] };
	IPluginCreator* postprocess_creator = getPluginRegistry()->getPluginCreator("postprocess", "1");
	IPluginV2* postprocess_plugin = postprocess_creator->createPlugin("postprocess_plugin", (PluginFieldCollection*)&postprocess);
	IPluginV2Layer* postprocess_layer = network->addPluginV2(&out, 1, *postprocess_plugin);
	postprocess_layer->setName("postprocess_layer");

	ITensor* final_tensor = postprocess_layer->getOutput(0);
	final_tensor->setName(OUTPUT_BLOB_NAME);
	network->markOutput(*final_tensor); // network output

	// Build engine
	builder->setMaxBatchSize(maxBatchSize);
	config->setMaxWorkspaceSize(16 * (1 << 20));  // 16 MiB (1 << 20 bytes = 2^20 = 1 MiB)

	if (PRECISION_MODE == 16) {
		std::cout << "==== precision f16 ====" << std::endl << std::endl;
		config->setFlag(BuilderFlag::kFP16);
	}
	else {
		std::cout << "==== precision f32 ====" << std::endl << std::endl;
	}

	std::cout << "Building engine, please wait for a while..." << std::endl;
	ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
	std::cout << "Build engine successfully!" << std::endl;

	// Don't need the network any more
	delete network;

	// Release host memory
	for (auto& mem : weightMap)
	{
		free((void*)(mem.second.values));
	}

	return engine;
}

void APIToModel(unsigned int maxBatchSize, IHostMemory** modelStream, std::string& wts_name) {
	// Create builder: initializes the TensorRT library
	IBuilder* builder = createInferBuilder(gLogger);
	// Build config for the CudaEngine; IOptimizationProfile entries can be added, and max workspace size, max batch size, minimum acceptable precision, FP16 mode, etc. can be set
	IBuilderConfig* config = builder->createBuilderConfig();

	// Create model to populate the network, then set the outputs and create an engine
	ICudaEngine* engine = build_engine(maxBatchSize, builder, config, DataType::kFLOAT, wts_name);

	assert(engine != nullptr);

	// Serialize the engine
	(*modelStream) = engine->serialize();

	// Close everything down
	delete engine;
	delete builder;
	delete config;
}

void doInference(IExecutionContext& context, cudaStream_t& stream, void** buffers, uint8_t* output, int batchSize) {
	// infer on the batch asynchronously, and DMA output back to host
	context.enqueue(batchSize, buffers, stream, nullptr);
	CUDA_CHECK(cudaMemcpyAsync(output, buffers[1], batchSize * OUTPUT_SIZE * sizeof(uint8_t), cudaMemcpyDeviceToHost, stream));
	cudaStreamSynchronize(stream);
}

bool parse_args(int argc, char** argv, std::string& wts, std::string& engine, std::string& img_dir) {
	if (argc < 4) return false;
	if (std::string(argv[1]) == "-s" && argc == 4) {
		wts = std::string(argv[2]);
		engine = std::string(argv[3]);
	}
	else if (std::string(argv[1]) == "-d" && argc == 4) {
		engine = std::string(argv[2]);
		img_dir = std::string(argv[3]);
	}
	else {
		return false;
	}
	return true;
}

// ./real-esrgan -s ./real-esrgan.wts ./real-esrgan_f32.engine
// ./real-esrgan -d ./real-esrgan_f32.engine ../samples

int main(int argc, char** argv) {
	std::string wts_name = "";
	std::string engine_name = "";
	std::string img_dir;
	if (!parse_args(argc, argv, wts_name, engine_name, img_dir)) {
		std::cerr << "arguments not right!" << std::endl;
		std::cerr << "./real-esrgan -s [.wts] [.engine] // serialize model to plan file" << std::endl;
		std::cerr << "./real-esrgan -d [.engine] ../samples  // deserialize plan file and run inference" << std::endl;
		return -1;
	}

	// create a model using the API directly and serialize it to a stream
	if (!wts_name.empty()) {
		IHostMemory* modelStream{ nullptr };
		APIToModel(BATCH_SIZE, &modelStream, wts_name);
		assert(modelStream != nullptr);
		std::ofstream p(engine_name, std::ios::binary);
		if (!p) {
			std::cerr << "could not open plan output file" << std::endl;
			return -1;
		}
		p.write(reinterpret_cast<const char*>(modelStream->data()), modelStream->size());
		delete modelStream;
		return 0;
	}

	// deserialize the .engine and run inference
	std::ifstream file(engine_name, std::ios::binary);
	if (!file.good()) {
		std::cerr << "read " << engine_name << " error!" << std::endl;
		return -1;
	}
	char* trtModelStream = nullptr;
	size_t size = 0;
	file.seekg(0, file.end);
	size = file.tellg();
	file.seekg(0, file.beg);
	trtModelStream = new char[size];
	assert(trtModelStream);
	file.read(trtModelStream, size);
	file.close();

	std::vector<std::string> file_names;
	std::cout << "img_dir:" << img_dir.c_str() << std::endl;
	if (read_files_in_dir(img_dir.c_str(), file_names) < 0) {
		std::cerr << "read_files_in_dir failed." << std::endl;
		return -1;
	}

	IRuntime* runtime = createInferRuntime(gLogger);
	assert(runtime != nullptr);
	ICudaEngine* engine = runtime->deserializeCudaEngine(trtModelStream, size);
	assert(engine != nullptr);
	IExecutionContext* context = engine->createExecutionContext();
	assert(context != nullptr);
	delete[] trtModelStream;
	assert(engine->getNbBindings() == 2);
	void* buffers[2];
	// In order to bind the buffers, we need to know the names of the input and output tensors.
	// Note that indices are guaranteed to be less than IEngine::getNbBindings()
	const int inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME);
	const int outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME);
	assert(inputIndex == 0);
	assert(outputIndex == 1);

	// Create GPU buffers on device	
	CUDA_CHECK(cudaMalloc(&buffers[inputIndex], BATCH_SIZE * INPUT_C * INPUT_H * INPUT_W * sizeof(uint8_t)));
	CUDA_CHECK(cudaMalloc(&buffers[outputIndex], BATCH_SIZE * OUTPUT_SIZE * sizeof(uint8_t)));

	std::vector<uint8_t> input(BATCH_SIZE * INPUT_H * INPUT_W * INPUT_C);
	std::vector<uint8_t> outputs(BATCH_SIZE * OUTPUT_SIZE);

	// Create stream
	cudaStream_t stream;
	CUDA_CHECK(cudaStreamCreate(&stream));

	std::vector<cv::Mat> imgs_buffer(BATCH_SIZE);
	for (int f = 0; f < (int)file_names.size(); f++) {
		cv::Mat re_img;
		for (int b = 0; b < BATCH_SIZE; b++) {
			cv::Mat img = cv::imread(img_dir + "/" + file_names[f]);
			if (img.empty()) continue;

			//  Pick one of the two approaches below
			//  1. Static input only: the accepted C, W, H of the input image are fixed
			//  memcpy(input.data() + b * INPUT_H * INPUT_W * INPUT_C, img.data, INPUT_H * INPUT_W * INPUT_C);

			//  2. Multiple input sizes supported
			cv::Mat traditional_enhance_img = gamma_transform(img);
			cv::Mat pr_img;
			std::pair<cv::Mat, cv::Mat> preprocess_rst;
			if (img.cols != INPUT_W && img.cols != INPUT_H)
			{
				preprocess_rst = preprocess_img(traditional_enhance_img, INPUT_W, INPUT_H);	// resize keeping aspect ratio, then pad
				// std::cout << "img_dir:" << std::endl;
				pr_img = preprocess_rst.first;
				re_img = preprocess_rst.second;	// reference image, used below to strip the padding
			}
			else
			{
				pr_img = traditional_enhance_img;
			}

			memcpy(input.data() + b * INPUT_H * INPUT_W * INPUT_C, pr_img.data, INPUT_H * INPUT_W * INPUT_C);
		}

		CUDA_CHECK(cudaMemcpyAsync(buffers[inputIndex], input.data(), BATCH_SIZE * INPUT_C * INPUT_H * INPUT_W * sizeof(uint8_t), cudaMemcpyHostToDevice, stream));

		// Run inference
		auto start = std::chrono::system_clock::now();
		doInference(*context, stream, (void**)buffers, outputs.data(), BATCH_SIZE);
		auto end = std::chrono::system_clock::now();
		std::cout << "inference time: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms" << std::endl;
		cv::Mat frame = cv::Mat(INPUT_H * OUT_SCALE, INPUT_W * OUT_SCALE, CV_8UC3, outputs.data());

		//	remove the padding introduced by non-standard image sizes
		int dif_h = 0;
		int dif_w = 0;
		if (re_img.cols == INPUT_H && re_img.rows != INPUT_W)
		{
			dif_w = (INPUT_W - re_img.rows) * 2;
		}
		else if (re_img.cols != INPUT_H && re_img.rows == INPUT_W)
		{
			dif_h = (INPUT_H - re_img.cols) * 2;	// horizontal padding: the offset derives from the width
		}
		//	std::cout << dif_w << dif_h << std::endl;
		cv::Mat result = frame(cv::Rect(dif_h, dif_w, INPUT_H * OUT_SCALE - 2 * dif_h, INPUT_W * OUT_SCALE - 2 * dif_w));
		cv::imwrite("../_" + file_names[f], result);

		if (VISUALIZATION) {
			cv::imshow("result : " + file_names[f], frame);
			cv::waitKey(0);
		}
	}

	// Release stream and buffers
	cudaStreamDestroy(stream);
	CUDA_CHECK(cudaFree(buffers[inputIndex]));
	CUDA_CHECK(cudaFree(buffers[outputIndex]));
	// Destroy the engine
	delete context;
	delete engine;
	delete runtime;
}