Semantic Segmentation -- DeconvNet -- Learning Deconvolution Network for Semantic Segmentation

The paper proposes a new deconvolution network to improve on FCN for semantic segmentation. The network has two parts, a convolution network and a deconvolution network: the convolution part extracts features, and the deconvolution part generates the segmentation result from them. Through unpooling and deconvolution operations, detail information is preserved much better.


Learning Deconvolution Network for Semantic Segmentation
ICCV 2015
http://cvlab.postech.ac.kr/research/deconvnet/
https://devhub.io/zh/repos/myungsub-DeconvNet

This paper proposes a deconvolution network for semantic segmentation, again designed as an improvement over FCN. During the pooling operation it records the locations of the maximum activations, and these locations are reused for unpooling.
This is the same idea used in SegNet.

First, what are the problems with FCN?
[Figure: FCN failure cases on objects much larger or smaller than the receptive field]

Limitations of FCN:
1) Because of its fixed-size receptive field, FCN can only handle semantics at a single scale within an image, so segmentation of objects that are much larger or smaller than the receptive field is likely to go wrong. In the paper's words: the network can handle only a single scale of semantics within an image due to the fixed-size receptive field; therefore, an object that is substantially larger or smaller than the receptive field may be fragmented or mislabeled.
2) FCN's deconvolution procedure is too coarse and too simple: the score map fed into it is only about 16 × 16, and it is enlarged to the input image size by a single bilinear interpolation, so much of the object's detail is lost (see the sketch below).
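
To see why this single-step upsampling is so lossy, here is a minimal sketch (not the authors' code) of bilinearly enlarging a coarse per-class score map to full resolution; the 21 classes and the 512 × 512 input size are hypothetical values.

```python
import torch
import torch.nn.functional as F

# Hypothetical coarse FCN output: one 16x16 score map per class.
num_classes = 21                          # e.g. PASCAL VOC
coarse_scores = torch.randn(1, num_classes, 16, 16)

# Single-step bilinear upsampling to the (assumed) input resolution;
# fine object boundaries cannot be recovered this way.
full_res = F.interpolate(coarse_scores, size=(512, 512),
                         mode='bilinear', align_corners=False)
print(full_res.shape)                     # torch.Size([1, 21, 512, 512])
```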

3 System Architecture
[Figure: overall architecture of the proposed network, a convolution network followed by a deconvolution network]
The network consists of two parts: a convolution network and a deconvolution network.

The convolution network is the feature extractor and is based on the VGG 16-layer net.

The deconvolution network is a shape generator that produces the object segmentation from the features extracted by the convolution network.
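
As a rough illustration, here is a minimal sketch of the convolution half, assuming torchvision's VGG16 in place of the authors' released Caffe model:

```python
from torchvision import models

# Minimal sketch, not the authors' released model: torchvision's VGG16 stands
# in for the convolution (feature-extraction) half of the network. The
# classifier head is dropped; the deconvolution half mirrors the convolutional
# stages with unpooling and deconvolution layers.
vgg16 = models.vgg16()        # pretrained ImageNet weights can be loaded here
conv_net = vgg16.features     # 13 conv layers interleaved with 5 max-pooling layers
```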

3.2. Deconvolution Network for Segmentation
The deconvolution network is built around two operations: unpooling and deconvolution.

3.2.1 Unpooling
Pooling downsamples the feature map, so some detail is lost: the spatial information within a receptive field is discarded during pooling, and this information matters for semantic segmentation. To address this, the deconvolution network uses unpooling layers: pooling records the locations of the maximum activations in switch variables, and unpooling uses these switches to place each activation back at its original pooled location.
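
A minimal sketch of this pooling/unpooling pair, assuming PyTorch's built-in ops rather than the paper's original Caffe layers:

```python
import torch
import torch.nn as nn

# Max-pooling that also returns the "switch" variables (locations of the maxima),
# and an unpooling layer that places each activation back at its recorded position.
pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)
unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)

x = torch.randn(1, 64, 32, 32)            # hypothetical feature map
pooled, switches = pool(x)                 # switches: where each maximum came from
restored = unpool(pooled, switches)        # enlarged but sparse activation map
print(pooled.shape, restored.shape)        # (1, 64, 16, 16) (1, 64, 32, 32)
```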

3.2.2 Deconvolution
The unpooling layers produce an enlarged but sparse activation map. The deconvolution layers then densify these sparse activations through convolution-like operations with multiple learned filters.

The learned filters in deconvolutional layers correspond to bases to reconstruct shape of an input object.
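
A minimal sketch of how a deconvolution (transposed convolution) layer densifies such a sparse map; the channel count and filter size here are illustrative, not the paper's exact configuration:

```python
import torch
import torch.nn as nn

# A transposed convolution with learned filters spreads every nonzero
# activation into a small neighborhood, turning the sparse unpooled map
# into a dense one at the same resolution.
deconv = nn.ConvTranspose2d(in_channels=64, out_channels=64,
                            kernel_size=3, stride=1, padding=1)

sparse = torch.zeros(1, 64, 32, 32)        # e.g. the output of an unpooling layer
sparse[:, :, ::2, ::2] = torch.randn(1, 64, 16, 16)
dense = deconv(sparse)                     # densified activation map
print(dense.shape)                         # torch.Size([1, 64, 32, 32])
```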

[Figure: visualization of activations in the deconvolution network, layer by layer]

Unpooling captures example-specific structures, while the learned filters in the deconvolution layers tend to capture class-specific shapes.

[Figure: comparison of segmentation results with FCN]

[Figure: results on the PASCAL VOC 2012 test set]

[Figure: the benefit of instance-wise prediction]

[Figure: the proposed method recovers finer details than FCN]

[Figure: FCN captures the global shape and context better]

Combining with FCN: since the two methods have complementary strengths, their outputs can be combined.
[Figure: results of combining DeconvNet with FCN]
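
In the paper, the two models are combined as an ensemble: their class-conditional probability maps are averaged and a CRF is applied afterwards. A minimal sketch of the averaging step with hypothetical score tensors:

```python
import torch
import torch.nn.functional as F

# Hypothetical per-pixel class scores from the two networks on the same image,
# both of shape (1, num_classes, H, W).
deconvnet_scores = torch.randn(1, 21, 512, 512)
fcn_scores = torch.randn(1, 21, 512, 512)

# Average the class-conditional probability maps; a CRF would then refine this.
mean_prob = (F.softmax(deconvnet_scores, dim=1) +
             F.softmax(fcn_scores, dim=1)) / 2
prediction = mean_prob.argmax(dim=1)       # per-pixel class labels
```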
