PyTorch crop images differentiably

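This article describes how to crop images in a fully differentiable way in PyTorch. It first explains the PyTorch image coordinate system, a left-handed Cartesian system with coordinates normalized to [-1, 1]. It then covers the affine transformation theory used to map from the coordinate system of the cropped image to that of the original image. Finally, it provides code that parameterizes the affine transformation as a matrix Θ, along with the functions that find the original-image coordinates from the cropped-image coordinates.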

Intro

PyTorch offers several ways to crop images. For example, torchvision.transforms provides several functions for cropping PIL images, and an answer on the PyTorch Forum shows how to crop an image in a differentiable way (differentiable with respect to the image). Sometimes, however, we need the cropping action itself to be fully differentiable, i.e. differentiable with respect to the crop region as well. How can we implement that?
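
To make the distinction concrete, here is a minimal sketch of the slicing style of crop. The box values are hypothetical integers chosen for illustration. Gradients reach the pixels, but the box is made of plain Python ints, so nothing can flow back to it.

```python
import torch

# Toy image batch; requires_grad lets us inspect gradients w.r.t. the pixels.
I = torch.rand(1, 3, 8, 8, requires_grad=True)

# Hypothetical integer crop box (top, left, bottom, right), for illustration only.
top, left, bottom, right = 2, 1, 6, 5
I_cropped = I[:, :, top:bottom, left:right]

# Backpropagate a scalar: every pixel inside the box receives gradient 1,
# every pixel outside receives 0.
I_cropped.sum().backward()
grad_total = I.grad.sum().item()  # number of cropped elements
# The box coordinates are ints, so there is no gradient path to them.
```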

Theory: Affine transformation

Before reaching the answer, we first need to understand the image coordinate system in PyTorch. It is a left-handed Cartesian system with its origin at the center of the image: the horizontal axis points right and the vertical axis points down. Coordinates are normalized to the range $[-1,1]$, where $(-1,-1)$ is the top-left corner and $(1,1)$ is the bottom-right corner, as the grid_sample documentation points out.
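
As a rough illustration of this normalization, the sketch below converts pixel indices to normalized coordinates. It assumes the `align_corners=False` convention, in which $-1$ and $1$ refer to the outer edges of the corner pixels. The helper names are made up for this example and are not part of PyTorch.

```python
def pixel_edge_to_norm(i: int, size: int) -> float:
    """Normalized coordinate of the left/top edge of pixel `i` on an axis of
    `size` pixels, assuming align_corners=False (hypothetical helper)."""
    return 2 * i / size - 1


def pixel_center_to_norm(i: int, size: int) -> float:
    """Normalized coordinate of the center of pixel `i` under the same
    assumption (hypothetical helper)."""
    return (2 * i + 1) / size - 1


W = 224
left_edge = pixel_edge_to_norm(0, W)     # left border of the image
right_edge = pixel_edge_to_norm(W, W)    # right border of the image
quarter = pixel_edge_to_norm(56, W)      # a quarter of the way across
first_center = pixel_center_to_norm(0, 4)
```

With these conventions, a crop whose left border sits at pixel column 56 of a 224-pixel-wide image has a left coordinate of -0.5.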

Let $(x,y)$ be the top-left corner of the cropped image in the coordinate system of the original image; likewise, let $(x',y')$ be the bottom-right corner of the cropped image. Clearly, $(x,y)$ corresponds to $(-1,-1)$ in the coordinate system of the cropped image, and $(x',y')$ corresponds to $(1,1)$. We'd like a function $f$ that maps every point in the cropped image from the cropped image system to the original image system. Since only scaling and translation are involved, $f$ can be parameterized by an affine transformation matrix $\Theta$ such that

$$
\Theta = \begin{pmatrix} \theta_{11} & 0 & \theta_{13}\\ 0 & \theta_{22} & \theta_{23}\\ 0 & 0 & 1\\ \end{pmatrix}
$$

where $\theta_{12}=\theta_{21}=0$ since no skewing is involved. Let $\mathbf{u}_H$ denote the homogeneous coordinate of $\mathbf{u}=\begin{pmatrix}u & v\end{pmatrix}^T$, i.e. $\mathbf{u}_H=\begin{pmatrix}\mathbf{u}^T&1\end{pmatrix}^T$. Then $\Theta$ maps $\mathbf{u}_H$ in the cropped image system to $\mathbf{x}_H$ in the original image system, i.e. $\mathbf{x}_H = \Theta \mathbf{u}_H$. Applying this to the two corners gives

$$
\begin{pmatrix} x & x'\\ y & y'\\ 1 & 1 \end{pmatrix} = \begin{pmatrix} \theta_{11} & 0 & \theta_{13}\\ 0 & \theta_{22} & \theta_{23}\\ 0 & 0 & 1\\ \end{pmatrix} \begin{pmatrix} -1 & 1\\ -1 & 1\\ 1 & 1\\ \end{pmatrix}
$$

Solving the equations,

$$
\Theta = \begin{pmatrix} \frac{x'-x}{2} & 0 & \frac{x'+x}{2}\\ 0 & \frac{y'-y}{2} & \frac{y'+y}{2}\\ 0 & 0 & 1\\ \end{pmatrix}
$$

where $x'\ge x$ and $y' \ge y$.
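
The solution can be checked numerically. The sketch below builds $\Theta$ for one example box and confirms that it sends the cropped-image corners $(-1,-1)$ and $(1,1)$ to $(x,y)$ and $(x',y')$. The function name `crop_theta` and the box values are illustrative assumptions.

```python
import torch


def crop_theta(x: float, y: float, x_: float, y_: float) -> torch.Tensor:
    """Full 3x3 homogeneous matrix Theta for the box (x, y)-(x_, y_).
    Hypothetical helper written for this check."""
    return torch.tensor([
        [(x_ - x) / 2, 0.0, (x_ + x) / 2],
        [0.0, (y_ - y) / 2, (y_ + y) / 2],
        [0.0, 0.0, 1.0],
    ])


x, y, x_, y_ = -0.5, -0.3, 0.7, 0.8
Theta = crop_theta(x, y, x_, y_)

# Columns are the homogeneous corners (-1, -1, 1) and (1, 1, 1) of the crop.
corners = torch.tensor([[-1.0, 1.0],
                        [-1.0, 1.0],
                        [1.0, 1.0]])
mapped = Theta @ corners  # columns should be (x, y, 1) and (x_, y_, 1)
expected = torch.tensor([[x, x_], [y, y_], [1.0, 1.0]])
```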

Coding time

We’ll need two functions:

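  1. torch.nn.functional.affine_grid to convert the $\Theta$ parameterization to $f$, i.e. a grid holding the original-image coordinate of every pixel of the cropped image (only the first two rows of $\Theta$ are passed in)
  2. torch.nn.functional.grid_sample to sample the original image at those coordinates, interpolating between neighboring pixels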
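
Before the actual crop, a quick sanity check of how the two functions fit together may help. This is a minimal sketch with toy sizes chosen for illustration: with the identity transform, the sampled output should reproduce the input up to floating-point error.

```python
import torch
import torch.nn.functional as F

I = torch.rand(2, 3, 8, 8)  # toy batch, sizes chosen for illustration

# Identity affine parameters: first two rows of the 3x3 identity, one per sample.
identity = torch.tensor([[1.0, 0.0, 0.0],
                         [0.0, 1.0, 0.0]]).unsqueeze(0).repeat(2, 1, 1)

# affine_grid returns one (x, y) coordinate per output pixel: shape (B, H, W, 2).
grid = F.affine_grid(identity, size=I.shape, align_corners=False)

# grid_sample reads the input at those coordinates (bilinear by default).
I_same = F.grid_sample(I, grid, align_corners=False)
max_err = (I_same - I).abs().max().item()
```

The crop below follows the same two-step pattern, with the $\Theta$ derived above in place of the identity.
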
import torch
import torch.nn.functional as F

B, C, H, W = 16, 3, 224, 224  # batch size, input channels
                              # original image height and width
# Let `I` be our original image
I = torch.rand(B, C, H, W)
# Set the (x,y) and (x',y') to define the rectangular region to crop
x, y = -0.5, -0.3  # some example coordinates;
x_, y_ = 0.7, 0.8  # in practice, (x, y, x_, y_) might be predicted
                   # as a tensor in the computation graph
# Set the affine parameters
theta = torch.tensor([
    [(x_-x)/2,       0, (x_+x)/2],
    [       0,(y_-y)/2, (y_+y)/2],
]).unsqueeze_(0).expand(B, -1, -1)
# Compute the flow field (sampling grid);
# `size` is the output size, here half the input resolution (scaling involved)
# The `align_corners` option must be the same throughout the code
f = F.affine_grid(theta, size=(B, C, H//2, W//2), align_corners=False)
I_cropped = F.grid_sample(I, f, align_corners=False)
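
When the box is predicted by a network, wrapping it in `torch.tensor([...])` as above would cut it out of the computation graph. One possible way to keep it in the graph, sketched here under the assumption that the box arrives as a `(B, 4)` tensor of normalized `(x, y, x_, y_)` values, is to assemble `theta` with `torch.stack`. The function name `differentiable_crop` is hypothetical.

```python
import torch
import torch.nn.functional as F


def differentiable_crop(I, box, out_size):
    """Crop `I` (B, C, H, W) to the normalized boxes in `box` (B, 4), resampled
    to `out_size` = (H_out, W_out). Hypothetical helper, not a PyTorch API."""
    x, y, x_, y_ = box.unbind(dim=1)
    zeros = torch.zeros_like(x)
    # Assemble theta with stack so gradients can flow back to `box`.
    theta = torch.stack([
        torch.stack([(x_ - x) / 2, zeros, (x_ + x) / 2], dim=1),
        torch.stack([zeros, (y_ - y) / 2, (y_ + y) / 2], dim=1),
    ], dim=1)  # shape (B, 2, 3)
    B, C = I.shape[:2]
    grid = F.affine_grid(theta, size=(B, C, *out_size), align_corners=False)
    return F.grid_sample(I, grid, align_corners=False)


I = torch.rand(4, 3, 32, 32)
# Stand-in for a predicted box; requires_grad makes it a leaf in the graph.
box = torch.tensor([[-0.5, -0.3, 0.7, 0.8]]).repeat(4, 1).requires_grad_()
out = differentiable_crop(I, box, (16, 16))
out.mean().backward()  # box.grad now holds d(mean)/d(x, y, x_, y_) per sample
```

After the backward pass, `box.grad` has one gradient per box coordinate, which is what makes the cropping action itself trainable.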
