Extracting image features with the torchvision package

This article shows how to extract image features with the ResNet34 model in PyTorch, with a concrete implementation that removes the unneeded final layers of the network and extracts the features of a specific layer.


I used to work mostly with TensorFlow and (Lua) Torch; I recently moved to PyTorch and found it a joy to use. For more details, see the official documentation: Pytorch document.

This article focuses on torchvision, a very convenient package that bundles loaders for common datasets (ImageNet, COCO, CIFAR-10, etc.), common network architectures (VGG, ResNet, AlexNet, DenseNet, Inception v3, etc.), and the image transforms commonly used in computer vision.
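As a quick illustration of how the three pieces fit together (a minimal sketch I'm adding here; the dataset choice and transform values are standard ImageNet-style defaults, not something from the original post):

from torchvision import datasets, models, transforms

# Common ImageNet-style preprocessing for pretrained models.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# A bundled dataset; CIFAR-10 downloads itself on first use.
cifar10 = datasets.CIFAR10(root='./data', train=True, download=True,
                           transform=transforms.ToTensor())

# A bundled architecture with pretrained ImageNet weights.
vgg16 = models.vgg16(pretrained=True)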

Here I focus on using the bundled ResNet models to extract image features as input for a downstream network. Previously I used the Caffe version of VGG19 for this, e.g. taking the 4096-d output of the last fully connected layer, or the pool5 feature maps of size 7×7×512. To extract the features of a particular layer, you need to look at the extractor's network structure; the figure below lists the architectures of ResNet-18/34/50/101/152:
[Figure: architecture table for ResNet-18/34/50/101/152 (per-stage conv configurations)]
I want to use ResNet34 to extract 7×7×512 image features:
resnet34 = torchvision.models.resnet34(pretrained=True)  # load the pretrained ResNet34 model
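(A side note I'm adding, not from the original post: on torchvision 0.13 and later the pretrained= flag is deprecated in favor of an explicit weights enum; the equivalent call is:)

from torchvision.models import ResNet34_Weights
resnet34 = torchvision.models.resnet34(weights=ResNet34_Weights.IMAGENET1K_V1)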
Printing the network structure:

ResNet(
(conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
(maxpool): MaxPool2d(kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), dilation=(1, 1), ceil_mode=False)
(layer1): Sequential(
(0): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
)
(1): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
)
(2): BasicBlock(
(conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
(conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True)
)
)
(layer2): Sequential(
(0): BasicBlock(
(conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
(downsample): Sequential(
(0): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
)
(2): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
)
(3): BasicBlock(
(conv1): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
(conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True)
)
)
(layer3): Sequential(
(0): BasicBlock(
(conv1): Conv2d(128, 256, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
(downsample): Sequential(
(0): Conv2d(128, 256, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
)
(2): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
)
(3): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
)
(4): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
)
(5): BasicBlock(
(conv1): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
(conv2): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True)
)
)
(layer4): Sequential(
(0): BasicBlock(
(conv1): Conv2d(256, 512, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
(downsample): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2), bias=False)
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
)
)
(1): BasicBlock(
(conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
)
(2): BasicBlock(
(conv1): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
(relu): ReLU(inplace)
(conv2): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
(bn2): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True)
)
)
(avgpool): AvgPool2d(kernel_size=7, stride=1, padding=0, ceil_mode=False, count_include_pad=True)
(fc): Linear(in_features=512, out_features=1000, bias=True)
)

What we need are the 7×7×512 feature maps (a 224×224 input is downsampled 32× through conv1, maxpool, and the stride-2 first blocks of layer2–layer4, so the layer4 output is 224/32 = 7 pixels on a side with 512 channels), so the last two layers, avgpool and fc, are unnecessary. I remember that (Lua) Torch could return the output of a chosen layer directly; I don't know an equally direct way in PyTorch (though a forward-hook alternative is sketched after the full script below), so I simply delete the last two layers and keep everything before them unchanged.
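A quick shape sanity check for the truncated network (a minimal sketch I'm adding, not part of the original script; it assumes the standard 3×224×224 input):

import torch
import torch.nn as nn
from torchvision import models

# Drop avgpool and fc, then run a dummy batch through what remains.
body = nn.Sequential(*list(models.resnet34(pretrained=True).children())[:-2])
with torch.no_grad():
    out = body(torch.zeros(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 512, 7, 7])

The full extraction script over the Flickr30k image list: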

# -*- coding: utf-8 -*-

import torch
import torch.nn as nn
from torchvision import models, transforms
from torch.autograd import Variable

import numpy as np
from PIL import Image
import json

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

feature_path = 'Resnet34.json'  # output file for the extracted features

def extractor(img_path, net, use_gpu):
    transform = transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])

    img = Image.open(img_path).convert('RGB')  # some images are grayscale; Normalize expects 3 channels
    img = transform(img)

    # add a batch dimension and wrap without gradient tracking (pre-0.4 Variable idiom)
    x = Variable(torch.unsqueeze(img, dim=0).float(), requires_grad=False)
    if use_gpu:
        x = x.cuda()
        net = net.cuda()
    y = net(x).cpu()
    y = y.view(1, -1)               # flatten (1, 512, 7, 7) -> (1, 25088)
    y = y.data.numpy().tolist()
    return y


if __name__ == '__main__':

    files_list = []

    image_list = "Flickr30k_image_list.txt"
    Flickr30k_path = "flickr30k-images"
    resnet34 = models.resnet34(pretrained = True)
    modules = list(resnet34.children())[:-2]      # delete the last fc layer and avgpooling.
    resnet34 = nn.Sequential(*modules)  # rebuild a Module from the list of layers
    for param in resnet34.parameters():
        param.requires_grad = False

    use_gpu = True
    
    features = {}  # image name -> flattened feature vector
    with open(image_list) as f:
        for line in f:
            image_name = line.split()[0]
            image_path = os.path.join(Flickr30k_path,image_name)
            image_feature = extractor(image_path, resnet34, use_gpu)
            print(image_path + " successfully extracted!")
            features[image_name] = image_feature


    with open(feature_path, "w") as fw:
        # json.dump writes directly to the file handle and returns None,
        # so there is nothing extra to write; the with-block handles closing.
        json.dump(features, fw)
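As an alternative to slicing the model apart, PyTorch can also capture an intermediate layer's output with a forward hook, leaving the network intact. A minimal sketch I'm adding (names like `captured` and `save_layer4` are mine, not from the original):

import torch
from torchvision import models

resnet34 = models.resnet34(pretrained=True).eval()
captured = {}

def save_layer4(module, inputs, output):
    # layer4's output is the (N, 512, 7, 7) map we want.
    captured['layer4'] = output.detach()

handle = resnet34.layer4.register_forward_hook(save_layer4)
with torch.no_grad():
    resnet34(torch.zeros(1, 3, 224, 224))  # full forward pass, fc included
handle.remove()
print(captured['layer4'].shape)  # torch.Size([1, 512, 7, 7])

The hook fires on every forward pass until handle.remove() is called, so it would also work unchanged inside the extraction loop above.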

                                                                                                                                                                                                                          
