Classification with an edge: Improving semantic image segmentation with boundary detection

This post introduces a method that combines several convolutional neural networks (CNNs) to improve semantic image segmentation. By incorporating boundary detection and multi-scale CNN architectures, it improves segmentation accuracy. Specifically, it uses the SEG-H encoder-decoder network, the HED-H multi-scale CNN, and the FCN-N semantic segmentation network.

Networks:

SEG-H encoder-decoder network

It is a hybrid of an FCN and an encoder-decoder architecture, using a pyramid-bottleneck design. Compared with the SEG model, SEG-H also takes the elevation data into account. Besides the color channels (initialized from a PASCAL pre-trained model), it adds a second stream for the DSM and nDSM channels, initialized randomly with "Xavier" weight initialization, which keeps gradient magnitudes roughly the same across layers. The two streams are concatenated and fed through a 1×1 convolution, which linearly combines the vector of feature responses at each location into a score per class. Those scores are then converted to probabilities with a softmax layer.
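As a rough illustration, here is a minimal PyTorch sketch of that fusion step (not the authors' code; the module and argument names are hypothetical):

```python
import torch
import torch.nn as nn

class TwoStreamFusionHead(nn.Module):
    """Sketch of SEG-H's fusion: concatenate the color-stream and
    height-stream (DSM/nDSM) feature maps, score each location with a
    1x1 convolution, and convert the scores to probabilities."""
    def __init__(self, color_channels, height_channels, num_classes):
        super().__init__()
        self.classifier = nn.Conv2d(color_channels + height_channels,
                                    num_classes, kernel_size=1)
        # "Xavier" initialization keeps gradient magnitudes roughly the
        # same across layers (used for the randomly initialized height stream).
        nn.init.xavier_uniform_(self.classifier.weight)

    def forward(self, color_feats, height_feats):
        fused = torch.cat([color_feats, height_feats], dim=1)  # channel concat
        scores = self.classifier(fused)      # one score per class per pixel
        return torch.softmax(scores, dim=1)  # per-pixel class probabilities
```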

HED-H multi-scale CNN

A second branch is added for the DSM and trained with a regression loss w.r.t. the height, so HED-H mainly detects edges from height information. The color branch of HED-H is initialized from the original HED model, while the height branch is trained from scratch.
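A minimal sketch of how such a two-branch objective could be wired up, assuming a binary cross-entropy term for the color-branch edge maps and a mean-squared-error term for the height regression (the exact losses and weighting are placeholders, not taken from the paper):

```python
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # edge / non-edge classification term
mse = nn.MSELoss()            # regression term w.r.t. height

def hed_h_loss(edge_logits, edge_labels, height_pred, height_true, reg_weight=1.0):
    # Combine the color-branch edge loss with the DSM-branch regression loss.
    return bce(edge_logits, edge_labels) + reg_weight * mse(height_pred, height_true)
```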

FCN-N semantic segmentation network

It consists of two FCNs, initialized from VGG and PASCAL pre-trained weights respectively.
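For example, initializing a network from pre-trained VGG weights can be done as below (a generic torchvision sketch, not the paper's exact setup):

```python
import torchvision

# Reuse the convolutional layers of an ImageNet-pretrained VGG-16 as an
# FCN encoder; the PASCAL-pretrained counterpart would be loaded from a
# segmentation checkpoint instead.
vgg = torchvision.models.vgg16(weights=torchvision.models.VGG16_Weights.IMAGENET1K_V1)
encoder = vgg.features
```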

Conclusion

This paper mainly describes how to fuse several CNNs together.
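As the simplest illustration of such fusion, the per-pixel class-probability maps of the individual networks can be averaged (the paper studies more elaborate fusion schemes; this sketch only conveys the idea):

```python
import torch

def fuse_probability_maps(prob_maps):
    # prob_maps: list of (N, num_classes, H, W) softmax outputs,
    # one per network; returns their element-wise average.
    return torch.stack(prob_maps, dim=0).mean(dim=0)
```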
