Core Program Framework for Image Style Transfer

Latest recommended article published on 2025-04-09 17:07:16

讷逍遥

Tags: computer vision, deep learning, artificial intelligence

Article link: https://blog.youkuaiyun.com/m0_60218675/article/details/146038975

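Image Preprocessing

Image preprocessing and postprocessing are done with torchvision.transforms. Preprocessing takes a PIL image, resizes it, converts it to a tensor, normalizes each of the three RGB channels, and finally multiplies by 255. The per-channel normalization (here, subtracting the ImageNet channel means while leaving the standard deviation at 1) is a requirement of the VGG model. Postprocessing is the inverse of this pipeline.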



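import torch
from PIL import Image
from torchvision import transforms


class ImageCoder:
    def __init__(self, image_size, device):
        self.device = device

        # Preprocessing pipeline
        self.preproc = transforms.Compose([
            transforms.Resize(image_size),  # resize the image
            transforms.ToTensor(),  # PIL image -> float tensor in [0, 1]
            transforms.Normalize(mean=[0.485, 0.456, 0.406],  # subtract the per-channel mean
                                std=[1, 1, 1]),
            transforms.Lambda(lambda x: x.mul_(255))  # scale by 255 (values are mean-centered, so some are negative)
        ])

        # Postprocessing pipeline
        self.postproc = transforms.Compose([
            transforms.Lambda(lambda x: x.mul_(1./255)),  # undo the scaling by 255
            transforms.Normalize(mean=[-0.485, -0.456, -0.406], std=[1, 1, 1])  # add the mean back (inverse normalization)
        ])

        # Converts a tensor to a PIL image
        self.to_image = transforms.ToPILImage()

    def encode(self, image_path):
        """Load the image at image_path and preprocess it into a tensor."""
        image = Image.open(image_path).convert('RGB')  # load; force 3 channels so Normalize always applies
        image = self.preproc(image)  # preprocess
        image = image.unsqueeze(0)  # add the batch dimension
        return image.to(self.device, torch.float)  # move to the target device

    def decode(self, image):
        """Decode a tensor into a PIL image."""
        image = image.detach().cpu().clone()  # detached copy on the CPU
        image = image.squeeze(0)  # remove the batch dimension
        image = self.postproc(image)  # postprocess
        image = image.clamp(0, 1)  # limit pixel values to [0, 1]
        return self.to_image(image)  # convert to a PIL image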
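
To make the arithmetic concrete, the sketch below traces a single RGB pixel through the same steps in plain Python, without torchvision. The helper names preprocess_pixel and postprocess_pixel are made up for this illustration and are not part of ImageCoder.

```python
# ImageNet per-channel means, the same values used in ImageCoder
MEAN = (0.485, 0.456, 0.406)

def preprocess_pixel(rgb):
    """Mirror Normalize(mean, std=1) followed by multiplication by 255."""
    return tuple((c - m) * 255 for c, m in zip(rgb, MEAN))

def postprocess_pixel(rgb):
    """Mirror the inverse: divide by 255, then add the mean back."""
    return tuple(c / 255 + m for c, m in zip(rgb, MEAN))

pixel = (0.5, 0.5, 0.5)            # mid-gray pixel, as ToTensor() would produce
encoded = preprocess_pixel(pixel)  # roughly (3.825, 11.22, 23.97)
decoded = postprocess_pixel(encoded)
print(encoded)
print(decoded)                     # back to the original pixel, up to float rounding
```

Note that the encoded values are centered around zero rather than confined to [0, 255]: a black pixel, for example, maps to about -123.7 in the red channel.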

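Parameter Definitions

This part defines the parameters: the convolutional layers used by the content loss, the convolutional layers used by the style loss, the weight of each layer, and the number of optimization steps.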

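content_layers = ['conv_4_2']  # conv layers used by the content loss
style_layers = ['conv_1_1', 'conv_2_1', 'conv_3_1', 'conv_4_1', 'conv_5_1']  # conv layers used by the style loss
content_weights = [1]  # content loss weights, one per content layer
style_weights = [1e3, 1e3, 1e3, 1e3, 1e3]  # style loss weights, one per style layer
num_steps = 200  # number of optimization steps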
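
These lists are paired element by element: each entry of style_weights multiplies the loss of the layer at the same position in style_layers. A small sketch of that pairing follows. The loss values are invented numbers for illustration only, and weighted_total is a hypothetical helper that does not appear in the original code.

```python
style_layers = ['conv_1_1', 'conv_2_1', 'conv_3_1', 'conv_4_1', 'conv_5_1']
style_weights = [1e3, 1e3, 1e3, 1e3, 1e3]

def weighted_total(losses, weights):
    """Sum of loss * weight over paired entries; rejects mismatched lengths."""
    if len(losses) != len(weights):
        # zip() would silently drop the extra entries, so check explicitly
        raise ValueError("need exactly one weight per loss")
    return sum(loss * weight for loss, weight in zip(losses, weights))

# Made-up per-layer style losses, one per entry in style_layers
fake_style_losses = [0.002, 0.001, 0.004, 0.003, 0.0005]
total = weighted_total(fake_style_losses, style_weights)
print(total)  # about 10.5
```

The explicit length check is a design choice of this sketch: with a bare zip(), adding a style layer without adding a weight would quietly leave that layer out of the total.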

Model Initialization

This part loads the pretrained VGG model provided by torchvision.models.


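from copy import deepcopy

import torch.nn as nn  # used by _build() below
import torchvision


class Model:
    def __init__(self, device, image_size):
        # Load the feature-extraction part of the pretrained VGG19 model.
        # Note: recent torchvision releases deprecate pretrained=True in favor of the weights= argument.
        cnn = torchvision.models.vgg19(pretrained=True).features.to(device).eval()
        self.cnn = deepcopy(cnn)  # deep copy so the original model is not modified
        self.device = device

        # Initialize the lists of loss modules
        self.content_losses = []
        self.style_losses = []

        # Initialize the image encoder/decoder
        self.image_proc = ImageCoder(image_size, device)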

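Main Function for Running Style Transfer

The main function reads the images and preprocesses them, then builds the content and style loss functions from the feature maps extracted by VGG (the self._build() method), and finally runs the optimization to obtain the transferred image (the self._transfer() method). The implementations of these two methods are given below.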

def run(self, content_image_path, style_image_path):
    # Method of the Model class.
    content_image = self.image_proc.encode(content_image_path)
    style_image = self.image_proc.encode(style_image_path)

    self._build(content_image, style_image)  # build the loss functions
    output_image = self._transfer(content_image)  # run the optimization
    return self.image_proc.decode(output_image)

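Building the Loss Functions with the VGG Network

In this part, the program walks through the layers of VGG19 and numbers them, builds the content and style loss functions at the feature-map layers chosen earlier, and adds them to the model.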
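
The numbering scheme can be tried out without loading VGG19. The sketch below applies the same block and conv counters to a list of layer-type strings that mimics the layer order of torchvision's VGG19 feature extractor (blocks of 2, 2, 4, 4 and 4 conv+ReLU pairs, each block followed by a max-pool). The helpers vgg19_layer_types and name_layers and the string stand-ins are assumptions made for this illustration; as a choice of this sketch, each ReLU is given the same index as the conv it follows.

```python
def vgg19_layer_types():
    """Layer-type strings in the order of VGG19's feature extractor."""
    types = []
    for convs_in_block in (2, 2, 4, 4, 4):
        types += ['conv', 'relu'] * convs_in_block
        types.append('pool')
    return types

def name_layers(layer_types):
    """Assign conv_<block>_<index> style names; the index resets after each pool."""
    names = []
    block_idx, conv_idx = 1, 1
    for kind in layer_types:
        if kind == 'conv':
            names.append('conv_{}_{}'.format(block_idx, conv_idx))
            conv_idx += 1
        elif kind == 'relu':
            # conv_idx now points at the next conv, so subtract 1
            names.append('relu_{}_{}'.format(block_idx, conv_idx - 1))
        elif kind == 'pool':
            names.append('pool_{}'.format(block_idx))
            block_idx += 1   # the next layer starts a new block
            conv_idx = 1
    return names

names = name_layers(vgg19_layer_types())
print(len(names))                # 37 layers in total
print(names.index('conv_4_2'))   # 21: position of the content layer
```

Under this scheme the content layer conv_4_2 is the 22nd layer of the sequence, and the deepest style layer, conv_5_1, sits right after pool_4.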

def _build(self, content_image, style_image):
    # Method of the Model class.
    self.model = nn.Sequential()
    block_idx = 1  # index of the current convolutional block (conv and ReLU layers followed by a pooling layer)
    conv_idx = 1  # index of the next conv layer within the current block
    # Walk through VGG19 layer by layer and pick out the layers we need
    for layer in self.cnn.children():  # children() yields the sub-modules (layers) in order

        # Identify the layer type and assign it a numbered name
        if isinstance(layer, nn.Conv2d):
            name = 'conv_{}_{}'.format(block_idx, conv_idx)
            conv_idx += 1
        elif isinstance(layer, nn.ReLU):
            # conv_idx has already been incremented, so use conv_idx - 1 to match the preceding conv
            name = 'relu_{}_{}'.format(block_idx, conv_idx - 1)
            layer = nn.ReLU(inplace=False)  # out-of-place ReLU, so the features seen by the loss layers are not overwritten
        elif isinstance(layer, nn.MaxPool2d):
            name = 'pool_{}'.format(block_idx)
            block_idx += 1  # a pooling layer ends the block, so move on to the next one
            conv_idx = 1  # the next block starts again from its first conv layer
        elif isinstance(layer, nn.BatchNorm2d):
            name = 'bn_{}'.format(block_idx)
        else:
            raise RuntimeError("unrecognized layer: {}".format(layer.__class__.__name__))

        self.model.add_module(name, layer)  # add the layer to self.model under the generated name

        if name in content_layers:
            # Add a content loss
            target = self.model(content_image).detach()  # features of content_image up to this layer, detached from the graph
            content_loss = ContentLoss(target)
            self.model.add_module("content_loss_{}_{}".format(block_idx, conv_idx - 1),
                                  content_loss)  # insert the loss module right after the conv layer
            self.content_losses.append(content_loss)  # keep a reference for computing the total loss later

        if name in style_layers:
            # Add a style loss
            target_feature = self.model(style_image).detach()
            style_loss = StyleLoss(target_feature)
            self.model.add_module("style_loss_{}_{}".format(block_idx, conv_idx - 1),
                                  style_loss)
            self.style_losses.append(style_loss)

    # Keep only the layers up to the last loss module
    i = 0
    for i in range(len(self.model) - 1, -1, -1):
        if isinstance(self.model[i], ContentLoss) or isinstance(self.model[i], StyleLoss):
            break  # scanning backwards, stop at the last ContentLoss or StyleLoss

    self.features = self.model[:(i + 1)]  # layers after the last loss module are not needed
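
_build() relies on ContentLoss and StyleLoss, which are not defined in this article. The following is a minimal sketch of what such modules could look like, assuming the common design in which each loss is a pass-through layer that stores its value in a .loss attribute (which is what _transfer() reads): mean squared error between feature maps for content, and between Gram matrices for style. The gram_matrix helper and its normalization by the number of elements are assumptions, not the author's confirmed implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def gram_matrix(features):
    """Normalized Gram matrix of a (batch, channels, height, width) feature map."""
    b, c, h, w = features.size()
    flat = features.reshape(b * c, h * w)   # one row per channel
    gram = torch.mm(flat, flat.t())         # inner products between channels
    return gram / (b * c * h * w)           # assumed normalization


class ContentLoss(nn.Module):
    def __init__(self, target):
        super().__init__()
        self.target = target.detach()       # fixed target features
        self.loss = torch.tensor(0.0)

    def forward(self, x):
        self.loss = F.mse_loss(x, self.target)
        return x                            # pass the input through unchanged


class StyleLoss(nn.Module):
    def __init__(self, target_feature):
        super().__init__()
        self.target = gram_matrix(target_feature).detach()
        self.loss = torch.tensor(0.0)

    def forward(self, x):
        self.loss = F.mse_loss(gram_matrix(x), self.target)
        return x


# Quick check on a random "feature map"
feat = torch.randn(1, 8, 4, 4)
cl, sl = ContentLoss(feat), StyleLoss(feat)
out = sl(cl(feat))
print(cl.loss.item(), sl.loss.item())   # both 0.0, since the input equals the target
```

With this sketch, the content and style images need the same spatial size after preprocessing, because the style image also passes through the content loss layer while the deeper style targets are being computed.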

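Optimization Process for Style Transfer

This part uses the L-BFGS algorithm to minimize the loss defined above through backpropagation, gradually changing the image until the transferred result is obtained. The loop keeps calling optimizer.step(closure) until closure() has been evaluated more than num_steps times. Because L-BFGS may evaluate the closure several times within a single step, the final count can slightly exceed num_steps. closure() computes the current style and content losses, forms their weighted sum, and computes the gradients with loss.backward(); the optimizer then uses these gradients to update the synthesized image.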

def _transfer(self, content_image):
    # Method of the Model class.
    # Start from a blend of the content image and Gaussian noise
    output_image = content_image.clone()
    random_image = torch.randn(content_image.size(), device=self.device)
    output_image = 0.4 * output_image + 0.6 * random_image
    # The image itself is the only parameter being optimized
    optimizer = torch.optim.LBFGS([output_image.requires_grad_()])
    print('Optimizing..')
    run = [0]  # closure-evaluation counter; a list so that closure() can modify it
    while run[0] <= num_steps:

        def closure():
            optimizer.zero_grad()
            self.features(output_image)  # forward pass; each loss module stores its .loss
            style_score = 0
            content_score = 0
            for sl, sw in zip(self.style_losses, style_weights):
                style_score += sl.loss * sw

            for cl, cw in zip(self.content_losses, content_weights):
                content_score += cl.loss * cw

            loss = style_score + content_score  # weighted total loss
            loss.backward()  # gradients with respect to output_image
            run[0] += 1
            if run[0] % 50 == 0:
                print("iteration {}: Loss: {:.4f} Style Loss: {:.4f} Content Loss: {:.4f}".format(
                    run[0], loss.item(), style_score.item(), content_score.item()))

            return loss

        optimizer.step(closure)  # may evaluate closure() several times

    return output_image
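
The closure pattern is easier to see on a toy problem. The sketch below minimizes (x - 3)^2 with torch.optim.LBFGS and counts closure evaluations using the same one-element-list counter as in _transfer(). The objective and the variable names are illustrative only and have nothing to do with the style-transfer losses.

```python
import torch

x = torch.tensor([0.0], requires_grad=True)   # the variable being optimized
optimizer = torch.optim.LBFGS([x])
calls = [0]                                   # mutable counter shared with closure()

def closure():
    optimizer.zero_grad()                     # clear gradients from the previous evaluation
    loss = ((x - 3.0) ** 2).sum()             # toy objective with its minimum at x = 3
    loss.backward()                           # compute d(loss)/dx
    calls[0] += 1
    return loss                               # LBFGS needs the loss value returned

for _ in range(5):                            # five optimizer steps
    optimizer.step(closure)                   # each step may evaluate closure() more than once

print(x.item(), calls[0])                     # x ends up close to 3
```

The number of closure evaluations is at least the number of steps and usually larger, which is why a loop bounded by the closure counter can overshoot its limit by a few evaluations.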

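Running Style Transfer

This part first detects the type of compute hardware (CPU or GPU), then instantiates the Model class used for style transfer, passes in the paths of the style and content images, and calls model.run to perform the style transfer.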

import matplotlib.pyplot as plt
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
image_size = 256
model = Model(device, image_size)
style_image_path = './images/van_gogh.jpg'
content_image_path = './images/street.jpg'
out_image = model.run(content_image_path, style_image_path)  # returns a PIL image
plt.imshow(out_image)
plt.show()
