Rasterized Edge Gradients: Handling Discontinuities Differentiably

Article link: https://blog.youkuaiyun.com/Doc2X/article/details/143978813

Original paper: https://arxiv.org/pdf/2405.02508

Stanislav Pidhorskyi, Tomas Simon, Gabriel Schwartz, He Wen, Yaser Sheikh, and Jason Saragih

Reality Labs, Meta, Pittsburgh, Pennsylvania, USA

Abstract. Computing the gradients of a rendering process is paramount for diverse applications in computer vision and graphics. However, accurate computation of these gradients is challenging due to discontinuities and rendering approximations, particularly for surface-based representations and rasterization-based rendering. We present a novel method for computing gradients at visibility discontinuities for rasterization-based differentiable renderers. Our method elegantly simplifies the traditionally complex problem through a carefully designed approximation strategy, allowing for a straightforward, effective, and performant solution. We introduce a novel concept of micro-edges, which allows us to treat the rasterized images as outcomes of a differentiable, continuous process aligned with the inherently non-differentiable, discrete-pixel rasterization. This technique eliminates the necessity for rendering approximations or other modifications to the forward pass, preserving the integrity of the rendered image, which makes it applicable to rasterized masks, depth, and normals images where filtering is prohibitive. Utilizing micro-edges simplifies gradient interpretation at discontinuities and enables handling of geometry intersections, offering an advantage over the prior art. We showcase our method in dynamic human head scene reconstruction, demonstrating effective handling of camera images and segmentation masks.

1 Introduction

Significant advances have been made in recent years in modeling real 3D objects from image measurements. Much of this advancement can be credited to improvements in inverse rendering, which enables the automatic inference of 3D scene parameters that best reconstruct the images. While volumetric methods like NeRF [31] simplify inverse rendering by eliminating the need for predefined scene topology, classical mesh-based representations remain widely used due to their efficiency in modeling opaque surfaces. These representations often rely on highly performant rasterization, which requires discrete operations to reason about ordering and coverage, which poses challenges for gradient computations. Our main contribution is a theoretical framework for approximating gradients of a rasterization process that simultaneously elucidates previous constructions, arrives at a simple, fast, and accurate formulation, and further improves gradient approximation accuracy over state-of-the-art methods.

In rasterization, a rendered pixel’s value is determined by the foremost triangle covering it. Pixels whose footprint lies entirely within a single triangle are straightforward to handle, as infinitesimal mesh motion does not change their triangle membership. However, pixels on triangle boundaries, whether between adjacent triangles, at occlusion boundaries, or at intersections between triangles, must account for how their triangle membership changes in response to infinitesimal changes in the mesh. Existing methods employ different approximations of soft membership to account for this, enabling gradient computation in those areas by effectively compositing contributions from member triangles via a weighted sum. For example, [20] uses an anti-aliasing approach, [14] averages Monte Carlo samples during ray casting, and [40] extends the influence of boundary triangles using a falloff function. In this work, we show that all these methods exhibit approximation errors that can lead to incorrect gradients in one case or another. In particular, none of the existing methods accurately computes the gradient at triangle intersections, which often occur during inverse rendering, especially when the current estimate is far from the desired solution.
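
To make the distinction between interior and boundary pixels concrete, the following toy sketch uses a 1D pixel spanning [0, 1] and a single edge at position `edge_x`. It is an illustration under simplifying assumptions, not the method of this paper: the function names, the box filter, and the foreground/background values are all hypothetical. Point sampling at the pixel center yields a zero derivative almost everywhere and a jump when the edge crosses the center, whereas the area-averaged value varies smoothly with the edge position.

```python
def point_sampled_pixel(edge_x, fg=1.0, bg=0.0, center=0.5):
    """Pixel value when visibility is tested only at the pixel center.

    The foreground is assumed to cover [0, edge_x) of the pixel.
    """
    return fg if center < edge_x else bg


def area_averaged_pixel(edge_x, fg=1.0, bg=0.0):
    """Pixel value under a box filter over [0, 1] (hypothetical filter choice)."""
    coverage = min(max(edge_x, 0.0), 1.0)  # fraction of the pixel covered by fg
    return coverage * fg + (1.0 - coverage) * bg


def finite_difference(fn, x, eps=1e-4):
    """Central finite difference of fn at x."""
    return (fn(x + eps) - fn(x - eps)) / (2.0 * eps)


if __name__ == "__main__":
    for edge_x in (0.2, 0.5, 0.7):
        print(
            edge_x,
            finite_difference(point_sampled_pixel, edge_x),
            finite_difference(area_averaged_pixel, edge_x),
        )
```

Away from the pixel center the point-sampled derivative is exactly zero, so no gradient signal reaches the edge position; the area-averaged derivative equals `fg - bg` wherever the edge lies inside the pixel.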

In contrast to existing works that directly tackle geometric discontinuities and rasterization’s discrete nature, we introduce a micro-edge formulation that allows us to interpret the rasterized image as the outcome of a continuous process that coincidentally aligns with discrete-pixel rasterization, simplifying gradient computation significantly. Unlike some other works [20, 40], our formulation achieves good gradient estimates without altering the rasterization forward pass, maintaining the rasterized image’s integrity. This is crucial for optimizing segmentation masks, depth maps, and normal maps, where filtering or smoothing is not feasible; for example, anisotropic filtering or soft rasterization would mix normals from different surfaces, misrepresenting the geometry. The simplifications offered by our micro-edge formulation allow us to seamlessly handle discontinuities caused by geometry intersections, offering an advantage over the prior art.

In summary, our paper introduces a straightforward, accurate, and efficient method for computing gradients in rasterization, comparable to existing techniques but with greater simplicity and the ability to handle self-intersections. We analyze our method’s errors from a theoretical standpoint and compare gradients qualitatively and quantitatively with finite differences and other methods across various test cases. We also assess runtime efficiency and accuracy as a function of image size and showcase qualitative and quantitative findings on a synthetic Blender dataset [32]. Our method excels in complex applications, such as detailed dynamic human head reconstructions, effectively managing the intricate details of the inner mouth region with significant occlusions and deformations.

2 Related work

Differentiable rendering is one of the critical building blocks for many existing computer vision problems, such as 3D object reconstruction [10, 17, 45, 50, 51], 3D object prediction [5, 8, 9, 12], pose estimation [1, 6, 11, 38], and novel view synthesis [4, 18, 32, 33, 54], as well as newly emerging applications such as text-to-3D generative models [22, 39, 43, 44]. Differentiable rendering employs a wide array of underlying 3D representations, including explicit surfaces like mesh-based representations [19, 20, 27], implicit surfaces [2, 16, 37, 46], point clouds [18, 42, 48], explicit or implicit volume representations [4, 23, 26, 32, 35, 50, 52], and hybrid representations [7, 33]. For a comprehensive introduction to differentiable rendering, we refer the reader to Zhao et al.'s excellent SIGGRAPH course notes [55].

Mesh optimization often poses more challenges than implicit and volumetric methods, leading to the latter’s prevalence in the field, as noted by Roessle et al. [41]. Despite this, meshes excel in high-performance rendering and imply registration with predefined topologies. Mesh rendering methods can be categorized into ray tracing-based and rasterization-based, with the former including frameworks like Mitsuba 3 [15, 36]. Differentiable ray tracing, aimed at direct illumination or path tracing, is computationally expensive but principled. Key advancements by Li et al. [21] and Zhang et al. [53] split this gradient into discontinuity integrals with Dirac delta functions and differentiable continuous parts. Li et al. replaced Dirac delta integration with boundary line integration, while Zhang et al. applied the Reynolds transport theorem for a similar decomposition. Both methods are intricate and necessitate silhouette edge sampling, a significant bottleneck in their application. Loubet et al. [28] proposed a change of variables to simplify discontinuity integration, further refined by Bangaru et al. [3] using warped area fields to correct biases, yet practical application challenges persist. Despite their flexibility, the computational expense of such ray tracing frameworks restricts their practicality in optimization-intensive tasks [20].

For rasterization-based methods, rendering meshes differentiably is hindered by non-differentiable visibility at boundaries. Unlike ray tracing, which uses boundary integrals for visibility gradients, rasterization’s fixed-grid and z-buffer approach lacks this feature. Solutions typically involve approximating derivatives or approximating rendering to achieve inherent differentiability. De La Gorce et al. [19] and Loper et al. [27] pioneered derivative approximation in rasterization. De La Gorce et al. distinguished between continuous regions and occlusion boundaries, introducing ‘occlusion forces’ for the latter. Loper et al. simplified this by detecting discontinuities post-rasterization and employing differential filters, like the Sobel filter, for boundary approximation.
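
As a rough illustration of what a differential filter such as the Sobel filter measures on a hard-edged rasterized image, the sketch below applies a 3x3 horizontal Sobel kernel to a small binary mask. This is not the implementation of Loper et al.; the function name `sobel_x`, the list-of-rows image layout, and the 1/8 normalization are assumptions made for this example.

```python
# Horizontal Sobel kernel; rows are weighted 1, 2, 1.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))


def sobel_x(image):
    """Approximate the horizontal image derivative dI/dx.

    `image` is a list of rows of floats. Border pixels are left at 0.
    The 1/8 factor normalizes the kernel so that a unit ramp has slope 1.
    """
    height, width = len(image), len(image[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            acc = 0.0
            for j in range(3):
                for i in range(3):
                    acc += SOBEL_X[j][i] * image[y + j - 1][x + i - 1]
            out[y][x] = acc / 8.0
    return out


# Hard-edged mask: columns 0..2 are foreground (1), columns 3..5 background (0).
mask = [[1.0 if x < 3 else 0.0 for x in range(6)] for _ in range(5)]
grad = sobel_x(mask)

if __name__ == "__main__":
    print(grad[2])
```

The response is non-zero only in the two pixel columns adjacent to the discontinuity, which is where an image-space filter of this kind would localize a boundary derivative; interior pixels of the constant regions receive zero.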

The ‘Nvdiffrast’ approach by Laine et al. [20] tackles point-sampled visibility issues with innovative differentiable analytic antialiasing, which transforms sharp discontinuities into smooth transitions, enabling gradient computation. Analytical antialiasing is achieved by estimating coverage using silhouette edges. A limitation of the method is that if triangles containing silhouette edges do not overlap with any pixel center, they will be overlooked during the antialiasing process. Moreover, analytical antialiasing is inherently approximate, so the obtained gradients are also approximate. Regrettably, the approach is still quite complex, modifies the rasterized image, and requires a specialized data structure for connectivity and detailed pixel coverage computations. On the other hand, methods like Rhodin et al. [40] and Liu et al.'s ‘Soft Rasterizer’ [24] opt for modifying the rendering model. Rhodin et al. introduce fuzzy edges to soften discontinuities, while ‘Soft Rasterizer’ further blurs boundaries and averages depth contributions, facilitating gradient propagation across occluded primitives.
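
The general idea of softening a discontinuity by modifying the rendering model can be sketched with a sigmoid falloff of the signed distance to an edge. This is a simplified illustration only, not the exact formulation of either cited method; the function names and the `sigma` sharpness parameter are hypothetical.

```python
import math


def hard_coverage(signed_distance):
    """Binary coverage: 1 inside the primitive (positive distance), else 0."""
    return 1.0 if signed_distance > 0.0 else 0.0


def soft_coverage(signed_distance, sigma=0.25):
    """Sigmoid falloff of coverage around the edge (assumed soft model)."""
    return 1.0 / (1.0 + math.exp(-signed_distance / sigma))


def soft_coverage_grad(signed_distance, sigma=0.25):
    """Analytic derivative of soft_coverage with respect to signed_distance."""
    s = soft_coverage(signed_distance, sigma)
    return s * (1.0 - s) / sigma


if __name__ == "__main__":
    for d in (-0.5, 0.0, 0.5):
        print(d, hard_coverage(d), soft_coverage(d), soft_coverage_grad(d))
```

The soft model has a non-zero derivative everywhere, but it also changes the forward value: a pixel half a unit inside the primitive no longer receives a coverage of exactly 1. This is the kind of alteration to the rendered image that the discussion above refers to.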

In this work, we deliberately bypass physically based rendering, global illumination, and lighting/material models, and instead focus exclusively on rasterization-based approaches for their speed. We draw inspiration from the core idea of the edge sampling method [21] and analytical antialiasing [20] and distill this into a much simpler yet effective method that also provides accurate gradients. Notably, our method’s simplicity allows direct handling of self-penetrating geometry, an aspect not addressed in previous research. Other methods to handle self-penetrating geometry would require complex and computationally intensive geometry preprocessing at each optimization step, involving intersection detection and differentiable splitting of the faces. In contrast, our approach explicitly manages interpenetrating geometry with virtually no additional overhead.

3 Preliminaries

Following Li et al. [21], given a 2D pixel filter $k$ and radiance $R$, a pixel’s color can be written:
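
Based on the definitions given here and the pixel-filter formulation of [21], the pixel integral presumably takes the form below, where $I$ denotes the rendered pixel value:

$$I = \iint k(x, y)\, R(x, y)\, dx\, dy$$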

For notational convenience, as in [21], we denote $f(x, y) = k(x, y) R(x, y)$. For simplicity, let us first consider the case of a single pixel. We are interested in the gradient $\frac{\partial L(I)}{\partial \Phi}$, where the scalar function $L(I)$ of the rendered pixel $I$ defines our loss, and $\Phi$ are the scene parameters that we aim to optimize. According to the chain rule:
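
For a scalar loss that depends on the scene parameters only through the single pixel value $I$, the standard chain rule gives the following, which is presumably the intended expression:

$$\frac{\partial L(I)}{\partial \Phi} = \frac{\partial L}{\partial I}\, \frac{\partial I}{\partial \Phi}$$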

where $\frac{\partial L}{\partial I}$ is the incoming gradient from the loss function and is typically computed by automatic differentiation (AD). So, our goal becomes to find how the pixel value changes with respect to the scene parameters:
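
Writing the pixel value as the integral of $f$ over the image plane, the quantity of interest is presumably:

$$\frac{\partial I}{\partial \Phi} = \frac{\partial}{\partial \Phi} \iint f(x, y)\, dx\, dy$$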

While the function $f$ may not in fact be differentiable, its integral remains continuous and thus differentiable. Let us assume that the domain of $f$ is partitioned into two half-spaces, on which $f$ is given by the continuous and differentiable functions ${f}_{a}$ and ${f}_{b}$, respectively, as follows:
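
Assuming a signed edge function $\alpha(x, y)$ that is positive on the side covered by ${f}_{a}$ and negative on the side covered by ${f}_{b}$ (the sign convention is an assumption here), this partition can be written as:

$$\iint f(x, y)\, dx\, dy = \iint \theta\big(\alpha(x, y)\big)\, f_a(x, y)\, dx\, dy + \iint \theta\big(-\alpha(x, y)\big)\, f_b(x, y)\, dx\, dy$$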

where $\alpha$ specifies the dividing edge, and $\theta$ is the Heaviside step function, which selects between ${f}_{a}$ and ${f}_{b}$. The differentiation of the integral can be broken into two parts by applying the product rule. Let us start with the integrand ${f}_{a}$:
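
Applying the product rule to the term $\iint \theta(\alpha) f_a\, dx\, dy$, and using the fact that the derivative of the Heaviside step is the Dirac delta $\delta$, the two parts are presumably a boundary term concentrated on the edge and an interior term:

$$\frac{\partial}{\partial \Phi} \iint \theta(\alpha)\, f_a\, dx\, dy = \iint \delta(\alpha)\, \frac{\partial \alpha}{\partial \Phi}\, f_a\, dx\, dy + \iint \theta(\alpha)\, \frac{\partial f_a}{\partial \Phi}\, dx\, dy$$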