Common Loss Functions for Semantic Segmentation in Practice

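1. nn.BCEWithLogitsLoss() and nn.CrossEntropyLoss()

      Reference project code: https://github.com/milesial/Pytorch-UNet.git

      (1) nn.BCEWithLogitsLoss() is generally used for two-class segmentation, where there is only foreground and background. The network outputs a single-channel logit map and the target is a float tensor of the same shape.

      (2) nn.CrossEntropyLoss() is generally used for multi-class segmentation. The network outputs one logit channel per class and the target holds integer class indices without a channel dimension.

        See the code and comments below for concrete usage: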

import torch
import torch.nn as nn


if __name__ == "__main__":
    # Test nn.BCEWithLogitsLoss()
    # img = np.expand_dims(img, axis=0) can be used to add a dimension
    loss = nn.BCEWithLogitsLoss()
    inputs = torch.randn((32, 1, 224, 224), requires_grad=True)
    targets = torch.empty((32, 1, 224,224)).random_(2)
    output = loss(inputs, targets)
    output.backward()


    # Test nn.CrossEntropyLoss()
    # Example with 20 classes (including background); the values in targets range from 0 to 19
    loss = nn.CrossEntropyLoss()
    inputs = torch.randn((32, 20, 224, 224), requires_grad=True)
    targets = torch.empty((32, 224, 224)).random_(20).long()
    output = loss(inputs, targets)
    output.backward()
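
As a sanity check on what the two losses compute, the following sketch (not taken from the referenced repository) verifies that nn.BCEWithLogitsLoss matches a sigmoid followed by nn.BCELoss, and that nn.CrossEntropyLoss matches log_softmax over the class dimension followed by nn.NLLLoss. It also shows one common way to turn logits into a predicted mask; the tensor sizes and the 0.5 threshold are assumed, illustrative values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Binary case: one output channel, float targets in {0, 1}
logits_bin = torch.randn(4, 1, 8, 8)
targets_bin = torch.randint(0, 2, (4, 1, 8, 8)).float()
bce_with_logits = nn.BCEWithLogitsLoss()(logits_bin, targets_bin)
bce_after_sigmoid = nn.BCELoss()(torch.sigmoid(logits_bin), targets_bin)

# Multi-class case: C output channels, integer targets in [0, C - 1]
logits_mc = torch.randn(4, 5, 8, 8)
targets_mc = torch.randint(0, 5, (4, 8, 8))
ce = nn.CrossEntropyLoss()(logits_mc, targets_mc)
nll_after_log_softmax = nn.NLLLoss()(F.log_softmax(logits_mc, dim=1), targets_mc)

# Turning logits into predicted masks at inference time
mask_bin = (torch.sigmoid(logits_bin) > 0.5).long()  # shape (4, 1, 8, 8)
mask_mc = logits_mc.argmax(dim=1)                    # shape (4, 8, 8)
```

Because both losses take raw logits, the network's last layer should not apply a sigmoid or softmax itself during training.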

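2. LovaszLossHinge() and LovaszLossSoftmax()

     Reference project code: https://github.com/zonasw/unet-nested-multiple-classification.git

     Usage is essentially the same as for nn.BCEWithLogitsLoss() and nn.CrossEntropyLoss() above.

     (1) LovaszLossHinge() is generally used for two-class segmentation, where there is only foreground and background.

     (2) LovaszLossSoftmax() is generally used for multi-class segmentation.

       See the code and comments below for concrete usage: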

# -*- coding: utf-8 -*-
# @Time    : 2020-02-26 17:46
# @Author  : Zonas
# @Email   : zonas.wang@gmail.com
# @File    : losses.py
"""

"""
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Function

import lovasz_losses as L


class LovaszLossSoftmax(nn.Module):
    def __init__(self):
        super(LovaszLossSoftmax, self).__init__()

    def forward(self, input, target):
        out = F.softmax(input, dim=1)
        loss = L.lovasz_softmax(out, target)
        return loss


class LovaszLossHinge(nn.Module):
    def __init__(self):
        super(LovaszLossHinge, self).__init__()

    def forward(self, input, target):
        loss = L.lovasz_hinge(input, target)
        return loss


class DiceCoeff(Function):
    """Dice coeff for individual examples"""

    # Recent PyTorch versions require static forward/backward with a ctx object
    @staticmethod
    def forward(ctx, input, target):
        eps = 0.0001
        inter = torch.dot(input.view(-1), target.view(-1))
        union = torch.sum(input) + torch.sum(target) + eps
        ctx.save_for_backward(input, target, inter, union)
        return (2 * inter.float() + eps) / union.float()

    # This function has only a single output, so it gets only one gradient
    @staticmethod
    def backward(ctx, grad_output):
        input, target, inter, union = ctx.saved_tensors
        grad_input = None

        if ctx.needs_input_grad[0]:
            grad_input = grad_output * 2 * (target * union - inter) \
                         / (union * union)

        # No gradient is propagated to the target
        return grad_input, None


def dice_coeff(input, target):
    """Dice coeff for batches"""
    s = torch.zeros(1, device=input.device)

    for i, c in enumerate(zip(input, target)):
        s = s + DiceCoeff.apply(c[0], c[1])

    return s / (i + 1)


if __name__ == "__main__":
    # Test multi-class segmentation
    loss = LovaszLossSoftmax()
    inputs = torch.randn((32, 20, 224, 224), requires_grad=True)
    targets = torch.empty((32, 224, 224)).random_(20).long()
    output = loss(inputs, targets)
    output.backward()


    # Test two-class segmentation
    loss = LovaszLossHinge()
    inputs = torch.randn((32, 1, 224, 224), requires_grad=True)
    targets = torch.empty((32, 1, 224,224)).random_(2)
    output = loss(inputs, targets)
    output.backward()

           The implementation of lovasz_losses.py is as follows:

"""
Lovasz-Softmax and Jaccard hinge loss in PyTorch
Maxim Berman 2018 ESAT-PSI KU Leuven (MIT License)
"""

from __future__ import print_function, division

import torch
from torch.autograd import Variable
import torch.nn.functional as F
import numpy as np
try:
    from itertools import  ifilterfalse
except ImportError: # py3k
    from itertools import  filterfalse as ifilterfalse


def lovasz_grad(gt_sorted):
    """
    Computes gradient of the Lovasz extension w.r.t sorted errors
    See Alg. 1 in paper
    """
    p = len(gt_sorted)
    gts = gt_sorted.sum()
    intersection = gts - gt_sorted.float().cumsum(0)
    union = gts + (1 - gt_sorted).float().cumsum(0)
    jaccard = 1. - intersection / union
    if p > 1: # cover 1-pixel case
        jaccard[1:p] = jaccard[1:p] - jaccard[0:-1]
    return jaccard


def iou_binary(preds, labels, EMPTY=1., ignore=None, per_image=True):
    """
    IoU for foreground class
    binary: 1 foreground, 0 background
    """
    if not per_image:
        preds, labels = (preds,), (labels,)
    ious = []
    for pred, label in zip(preds, labels):
        intersection = ((label == 1) & (pred == 1)).sum()
        union = ((label == 1) | ((pred == 1) & (label != ignore))).sum()
        if not union:
            iou = EMPTY
        else:
            iou = float(intersection) / float(union)
        ious.append(iou)
    iou = mean(ious)    # mean across images if per_image
    return 100 * iou


def iou(preds, labels, C, EMPTY=1., ignore=None, per_image=False):
    """
    Array of IoU for each (non ignored) class
    """
    if not per_image:
        preds, labels = (preds,), (labels,)
    ious = []
    for pred, label in zip(preds, labels):
        iou = []    
        for i in range(C):
            if i != ignore: # The ignored label is sometimes among predicted classes (ENet - CityScapes)
                intersection = ((label == i) & (pred == i)).sum()
                union = ((label == i) | ((pred == i) & (label != ignore))).sum()
                if not union:
                    iou.append(EMPTY)
                else:
                    iou.append(float(intersection) / float(union))
        ious.append(iou)
    ious = [mean(iou) for iou in zip(*ious)] # mean across images if per_image
    return 100 * np.array(ious)


# --------------------------- BINARY LOSSES ---------------------------


def lovasz_hinge(logits, labels, per_image=True, ignore=None):
    """
    Binary Lovasz hinge loss
      logits: [B, H, W] Variable, logits at each pixel (between -inf and +inf)
      labels: [B, H, W] Tensor, binary ground truth masks (0 or 1)
      per_image: compute the loss per image instead of per batch
      ignore: void class id
    """
    if per_image:
        loss = mean(lovasz_hinge_flat(*flatten_binary_scores(log.unsqueeze(0), lab.unsqueeze(0), ignore))
                          for log, lab in zip(logits, labels))
    else:
        loss = lovasz_hinge_flat(*flatten_binary_scores(logits, labels, ignore))
    return loss


def lovasz_hinge_flat(logits, labels):
    """
    Binary Lovasz hinge loss
      logits: [P] Variable, logits at each prediction (between -inf and +inf)
      labels: [P] Tensor, binary ground truth labels (0 or 1)
    """
    if len(labels) == 0:
        # only void pixels, the gradients should be 0
        return logits.sum() * 0.
    signs = 2. * labels.float() - 1.
    errors = (1. - logits * Variable(signs))
    errors_sorted, perm = torch.sort(errors, dim=0, descending=True)
    perm = perm.data
    gt_sorted = labels[perm]
    grad = lovasz_grad(gt_sorted)
    loss = torch.dot(F.relu(errors_sorted), Variable(grad))
    return loss


def flatten_binary_scores(scores, labels, ignore=None):
    """
    Flattens predictions in the batch (binary case)
    Remove labels equal to 'ignore'
    """
    scores = scores.view(-1)
    labels = labels.view(-1)
    if ignore is None:
        return scores, labels
    valid = (labels != ignore)
    vscores = scores[valid]
    vlabels = labels[valid]
    return vscores, vlabels


class StableBCELoss(torch.nn.modules.Module):
    def __init__(self):
         super(StableBCELoss, self).__init__()
    def forward(self, input, target):
         neg_abs = - input.abs()
         loss = input.clamp(min=0) - input * target + (1 + neg_abs.exp()).log()
         return loss.mean()


def binary_xloss(logits, labels, ignore=None):
    """
    Binary Cross entropy loss
      logits: [B, H, W] Variable, logits at each pixel (between -inf and +inf)
      labels: [B, H, W] Tensor, binary ground truth masks (0 or 1)
      ignore: void class id
    """
    logits, labels = flatten_binary_scores(logits, labels, ignore)
    loss = StableBCELoss()(logits, Variable(labels.float()))
    return loss


# --------------------------- MULTICLASS LOSSES ---------------------------


def lovasz_softmax(probas, labels, classes='present', per_image=False, ignore=None):
    """
    Multi-class Lovasz-Softmax loss
      probas: [B, C, H, W] Variable, class probabilities at each prediction (between 0 and 1).
              Interpreted as binary (sigmoid) output with outputs of size [B, H, W].
      labels: [B, H, W] Tensor, ground truth labels (between 0 and C - 1)
      classes: 'all' for all, 'present' for classes present in labels, or a list of classes to average.
      per_image: compute the loss per image instead of per batch
      ignore: void class labels
    """
    if per_image:
        loss = mean(lovasz_softmax_flat(*flatten_probas(prob.unsqueeze(0), lab.unsqueeze(0), ignore), classes=classes)
                          for prob, lab in zip(probas, labels))
    else:
        loss = lovasz_softmax_flat(*flatten_probas(probas, labels, ignore), classes=classes)
    return loss


def lovasz_softmax_flat(probas, labels, classes='present'):
    """
    Multi-class Lovasz-Softmax loss
      probas: [P, C] Variable, class probabilities at each prediction (between 0 and 1)
      labels: [P] Tensor, ground truth labels (between 0 and C - 1)
      classes: 'all' for all, 'present' for classes present in labels, or a list of classes to average.
    """
    if probas.numel() == 0:
        # only void pixels, the gradients should be 0
        return probas * 0.
    C = probas.size(1)
    losses = []
    class_to_sum = list(range(C)) if classes in ['all', 'present'] else classes
    for c in class_to_sum:
        fg = (labels == c).float() # foreground for class c
        if classes == 'present' and fg.sum() == 0:
            continue
        if C == 1:
            if len(classes) > 1:
                raise ValueError('Sigmoid output possible only with 1 class')
            class_pred = probas[:, 0]
        else:
            class_pred = probas[:, c]
        errors = (Variable(fg) - class_pred).abs()
        errors_sorted, perm = torch.sort(errors, 0, descending=True)
        perm = perm.data
        fg_sorted = fg[perm]
        losses.append(torch.dot(errors_sorted, Variable(lovasz_grad(fg_sorted))))
    return mean(losses)


def flatten_probas(probas, labels, ignore=None):
    """
    Flattens predictions in the batch
    """
    if probas.dim() == 3:
        # assumes output of a sigmoid layer
        B, H, W = probas.size()
        probas = probas.view(B, 1, H, W)
    B, C, H, W = probas.size()
    probas = probas.permute(0, 2, 3, 1).contiguous().view(-1, C)  # B * H * W, C = P, C
    labels = labels.view(-1)
    if ignore is None:
        return probas, labels
    valid = (labels != ignore)
    vprobas = probas[valid.nonzero().squeeze()]
    vlabels = labels[valid]
    return vprobas, vlabels

def xloss(logits, labels, ignore=None):
    """
    Cross entropy loss
    """
    return F.cross_entropy(logits, Variable(labels), ignore_index=255)


# --------------------------- HELPER FUNCTIONS ---------------------------
def isnan(x):
    return x != x
    
    
def mean(l, ignore_nan=False, empty=0):
    """
    nanmean compatible with generators.
    """
    l = iter(l)
    if ignore_nan:
        l = ifilterfalse(isnan, l)
    try:
        n = 1
        acc = next(l)
    except StopIteration:
        if empty == 'raise':
            raise ValueError('Empty mean')
        return empty
    for n, v in enumerate(l, 2):
        acc += v
    if n == 1:
        return acc
    return acc / n
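
To make the binary Lovasz hinge concrete, here is a small worked example that repeats the steps of lovasz_hinge_flat on four pixels. The logits and labels are made-up toy values, and lovasz_grad is copied from the listing above so the snippet runs on its own; treat it as an illustrative sketch rather than a replacement for the library code.

```python
import torch


def lovasz_grad(gt_sorted):
    """Gradient of the Lovasz extension w.r.t. sorted errors (same logic as the listing)."""
    p = len(gt_sorted)
    gts = gt_sorted.sum()
    intersection = gts - gt_sorted.float().cumsum(0)
    union = gts + (1 - gt_sorted).float().cumsum(0)
    jaccard = 1. - intersection / union
    if p > 1:
        jaccard[1:p] = jaccard[1:p] - jaccard[0:-1]
    return jaccard


# Toy data (illustrative values): 4 pixels with logits and binary labels
logits = torch.tensor([2.0, -1.0, -0.5, 0.3])
labels = torch.tensor([1.0, 0.0, 1.0, 0.0])

signs = 2.0 * labels - 1.0        # +1 for foreground, -1 for background
errors = 1.0 - logits * signs     # hinge errors: [-1.0, 0.0, 1.5, 1.3]
errors_sorted, perm = torch.sort(errors, descending=True)
gt_sorted = labels[perm]          # labels reordered by decreasing error
grad = lovasz_grad(gt_sorted)     # weights [1/2, 1/6, 1/12, 1/4]
loss = torch.dot(torch.relu(errors_sorted), grad)
```

The weights returned by lovasz_grad sum to 1 here, so the loss is a weighted average of the positive hinge errors in which the worst-classified pixels are weighted according to how much they change the Jaccard index.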

3. DiceLoss()

      Reference project code: https://github.com/ooooverflow/BiSeNet.git

      Works for both two-class and multi-class segmentation.

      See the code and comments below for concrete usage:

import torch.nn as nn
import torch
import torch.nn.functional as F

def flatten(tensor):
    """Flattens a given tensor such that the channel axis is first.
    The shapes are transformed as follows:
       (N, C, D, H, W) -> (C, N * D * H * W)
    """
    C = tensor.size(1)
    # new axis order
    axis_order = (1, 0) + tuple(range(2, tensor.dim()))
    # Transpose: (N, C, D, H, W) -> (C, N, D, H, W)
    transposed = tensor.permute(axis_order)
    # Flatten: (C, N, D, H, W) -> (C, N * D * H * W)
    return transposed.contiguous().view(C, -1)


class DiceLoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.epsilon = 1e-5

    def forward(self, output, target):
        assert output.size() == target.size(), "'input' and 'target' must have the same shape"
        output = F.softmax(output, dim=1)
        output = flatten(output)
        target = flatten(target)
        # Per-class Dice: 2 * |A ∩ B| / (|A| + |B|), with epsilon to avoid division by zero
        intersect = (output * target).sum(-1)
        denominator = (output + target).sum(-1)
        dice = (2. * intersect + self.epsilon) / (denominator + self.epsilon)
        # Average over classes and return the loss
        dice = torch.mean(dice)
        return 1 - dice


if __name__ == "__main__":
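    # Works for multi-class segmentation; two classes are used here as an example
    # Every pixel of target must be one-hot encoded along the channel dimension
    loss = DiceLoss()
    inputs = torch.randn((32, 2, 224, 224), requires_grad=True)
    labels = torch.randint(0, 2, (32, 224, 224))
    targets = F.one_hot(labels, num_classes=2).permute(0, 3, 1, 2).float()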
    output = loss(inputs, targets)
    output.backward()
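
For intuition about the value range, the Dice coefficient of two masks A and B is 2|A∩B| / (|A| + |B|): 1 for a perfect match and 0 for no overlap. The sketch below checks this on tiny hand-made masks; dice_score is a hypothetical helper written for this illustration, not part of the referenced project.

```python
import torch


def dice_score(pred, target, eps=1e-5):
    """Dice coefficient 2*|A∩B| / (|A| + |B|) for two tensors of the same shape."""
    intersect = (pred * target).sum()
    denominator = pred.sum() + target.sum()
    return (2.0 * intersect + eps) / (denominator + eps)


target = torch.tensor([[1., 1., 0., 0.]])
perfect = target.clone()                 # identical to the target
half = torch.tensor([[1., 0., 1., 0.]])  # one of two foreground pixels matched
disjoint = 1.0 - target                  # no overlap with the target

dice_perfect = dice_score(perfect, target)    # close to 1
dice_half = dice_score(half, target)          # close to 0.5
dice_disjoint = dice_score(disjoint, target)  # close to 0
```

A Dice loss defined as 1 minus this score therefore goes to 0 for a perfect prediction.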

      To convert a color-coded label image into one-hot form, see: blog post
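
One possible way to do this conversion is sketched below with NumPy. It assumes each class is drawn in exactly one RGB color; the function name color_mask_to_one_hot and the two-color palette are hypothetical and only serve the illustration.

```python
import numpy as np


def color_mask_to_one_hot(mask_rgb, palette):
    """Convert an (H, W, 3) color-coded label image to a (C, H, W) one-hot array.

    palette: list of (R, G, B) tuples; index i in the list is class i.
    """
    one_hot = np.zeros((len(palette),) + mask_rgb.shape[:2], dtype=np.float32)
    for class_index, color in enumerate(palette):
        # A pixel belongs to the class when all three channels match its color
        one_hot[class_index] = np.all(mask_rgb == np.array(color), axis=-1)
    return one_hot


# Hypothetical 2-class palette: black background, red foreground
palette = [(0, 0, 0), (255, 0, 0)]
mask = np.zeros((2, 3, 3), dtype=np.uint8)  # 2 x 3 image, all background
mask[0, 1] = (255, 0, 0)
mask[1, 2] = (255, 0, 0)
one_hot = color_mask_to_one_hot(mask, palette)  # shape (2, 2, 3)
```

The result can be wrapped with torch.from_numpy and given a batch dimension before being passed to DiceLoss as the target.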

 

If you have any questions, please leave a comment. Discussion is welcome.

### Image Semantic Segmentation: A Hands-On Tutorial

#### Image semantic segmentation with PyTorch

The following is a simple PyTorch-based workflow for image semantic segmentation, suitable for beginners to understand and practice:

1. **Environment setup**
   Install PyTorch and its dependencies first. The official documentation describes the installation process[^2].

2. **Dataset preparation**
   The choice of dataset is critical for semantic segmentation. Commonly used datasets include Cityscapes and PASCAL VOC. Standard datasets can be loaded through the `torchvision.datasets` module.

3. **Model construction**
   A model based on the U-Net architecture is a common choice. Below is a simple U-Net implementation snippet:

   ```python
   import torch
   import torch.nn as nn


   class DoubleConv(nn.Module):
       def __init__(self, in_channels, out_channels):
           super(DoubleConv, self).__init__()
           self.conv = nn.Sequential(
               nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
               nn.BatchNorm2d(out_channels),
               nn.ReLU(inplace=True),
               nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1),
               nn.BatchNorm2d(out_channels),
               nn.ReLU(inplace=True)
           )

       def forward(self, x):
           return self.conv(x)


   class UNet(nn.Module):
       def __init__(self, in_channels=3, out_channels=1):
           super(UNet, self).__init__()
           filters = [64, 128, 256, 512]

           # Downsampling path
           self.down_conv_1 = DoubleConv(in_channels, filters[0])
           self.pool_1 = nn.MaxPool2d(kernel_size=2, stride=2)
           self.down_conv_2 = DoubleConv(filters[0], filters[1])
           self.pool_2 = nn.MaxPool2d(kernel_size=2, stride=2)
           self.down_conv_3 = DoubleConv(filters[1], filters[2])
           self.pool_3 = nn.MaxPool2d(kernel_size=2, stride=2)

           self.bottleneck = DoubleConv(filters[2], filters[3])

           # Upsampling path
           self.up_transpose_1 = nn.ConvTranspose2d(filters[3], filters[2], kernel_size=2, stride=2)
           self.up_conv_1 = DoubleConv(filters[3], filters[2])
           self.up_transpose_2 = nn.ConvTranspose2d(filters[2], filters[1], kernel_size=2, stride=2)
           self.up_conv_2 = DoubleConv(filters[2], filters[1])

           self.final_layer = nn.Conv2d(filters[1], out_channels, kernel_size=1)

       def forward(self, x):
           down_x1 = self.down_conv_1(x)
           pool_x1 = self.pool_1(down_x1)
           down_x2 = self.down_conv_2(pool_x1)
           pool_x2 = self.pool_2(down_x2)
           down_x3 = self.down_conv_3(pool_x2)
           pool_x3 = self.pool_3(down_x3)

           bottleneck = self.bottleneck(pool_x3)

           # Each upsampling step is concatenated with the matching encoder feature map
           up_x1 = self.up_transpose_1(bottleneck)
           concat_x1 = torch.cat([up_x1, down_x3], dim=1)
           up_conv_x1 = self.up_conv_1(concat_x1)
           up_x2 = self.up_transpose_2(up_conv_x1)
           concat_x2 = torch.cat([up_x2, down_x2], dim=1)
           up_conv_x2 = self.up_conv_2(concat_x2)

           # Note: there are three pooling stages but only two upsampling stages,
           # so the output is at half the input resolution
           output = self.final_layer(up_conv_x2)
           return output
   ```

   The code above shows how to build a basic U-Net model from double-convolution layers and transposed-convolution layers[^4].

4. **Defining the loss function and optimizer**
   For semantic segmentation, cross-entropy loss (CrossEntropyLoss) is a common choice. The Adam optimizer is also widely used in deep learning projects for its efficiency and stability.

   ```python
   # out_channels should equal the number of classes (21 is an illustrative value)
   model = UNet(in_channels=3, out_channels=21)
   criterion = nn.CrossEntropyLoss()
   optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
   ```

5. **Training and validation**
   During training, the weights are updated iteratively while performance metrics such as IoU (Intersection over Union) are monitored. See the related literature for implementation details.

---

#### Image semantic segmentation with TensorFlow/Keras

If you prefer TensorFlow, you can use Keras as a high-level API to simplify the development workflow. Below is a simplified FCN implementation example:

```python
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, UpSampling2D, concatenate


def fcn_model(input_shape=(256, 256, 3), num_classes=2):
    inputs = Input(shape=input_shape)

    # Encoder
    conv1 = Conv2D(64, (3, 3), activation='relu', padding='same')(inputs)
    pool1 = MaxPooling2D((2, 2))(conv1)
    conv2 = Conv2D(128, (3, 3), activation='relu', padding='same')(pool1)
    pool2 = MaxPooling2D((2, 2))(conv2)
    conv3 = Conv2D(256, (3, 3), activation='relu', padding='same')(pool2)

    # Decoder with skip connections
    up1 = UpSampling2D((2, 2))(conv3)
    merge1 = concatenate([conv2, up1], axis=-1)
    up2 = UpSampling2D((2, 2))(merge1)
    merge2 = concatenate([conv1, up2], axis=-1)

    # Per-pixel class probabilities
    outputs = Conv2D(num_classes, (1, 1), activation='softmax')(merge2)
    model = Model(inputs=[inputs], outputs=[outputs])
    return model
```

This code implements a basic FCN-style structure and fuses features by channel-wise concatenation (`concatenate`) rather than element-wise addition[^3].

---

#### Summary

Whether you choose PyTorch or TensorFlow/Keras, resources are available to support research and development in image semantic segmentation. Each has its strengths; in practice the choice can be made according to the team's familiarity and personal preference.
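
The training and validation step from item 5 above can be sketched roughly as follows. This is a minimal illustration under stated assumptions: a tiny stand-in network replaces the U-Net, the data are random tensors, num_classes = 3 is arbitrary, and mean_iou is a hypothetical helper written for this example.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

num_classes = 3  # illustrative value
# Tiny stand-in network (hypothetical); any model producing (N, C, H, W) logits works
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(8, num_classes, kernel_size=1),
)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Random stand-in batch: 2 RGB images with per-pixel class labels
images = torch.randn(2, 3, 16, 16)
labels = torch.randint(0, num_classes, (2, 16, 16))


def mean_iou(logits, labels, num_classes):
    """Mean IoU over classes that appear in the prediction or the labels."""
    preds = logits.argmax(dim=1)
    ious = []
    for c in range(num_classes):
        inter = ((preds == c) & (labels == c)).sum().item()
        union = ((preds == c) | (labels == c)).sum().item()
        if union > 0:
            ious.append(inter / union)
    return sum(ious) / len(ious)


losses = []
for step in range(5):
    optimizer.zero_grad()             # clear gradients from the previous step
    logits = model(images)            # forward pass
    loss = criterion(logits, labels)  # per-pixel cross-entropy
    loss.backward()                   # backpropagate
    optimizer.step()                  # update weights
    losses.append(loss.item())

# Evaluate without tracking gradients
with torch.no_grad():
    miou = mean_iou(model(images), labels, num_classes)
```

In a real project the random tensors would be replaced by batches from a DataLoader, and the IoU would be computed on a held-out validation set.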