Multi-View Mesh Reconstruction with Neural Deferred Shading

MarvelousJ

Article link: https://blog.youkuaiyun.com/qq_50003999/article/details/126547616

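This article introduces Neural Deferred Shading (NDS), a fast analysis-by-synthesis method that combines triangle meshes with neural rendering. Inspired by real-time graphics, it works in two steps: the mesh is first rasterized to obtain per-pixel geometry, and a neural shader then models the interaction of light and material, yielding efficient and general 3D reconstruction.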

1. Introduction

Point-NeRF, MobileNeRF, and this paper all jointly optimize the geometry (mesh) and an MLP.
Problem: While fully neural approaches are general, both in terms of geometry and appearance, current methods exhibit excessive runtime, making them impractical for domains that handle a large number of objects or multi-view video (e.g. of human performances).
We propose Neural Deferred Shading (NDS), a fast analysis-by-synthesis method that combines triangle meshes and neural rendering. The rendering pipeline is inspired by real-time graphics and implements a technique called deferred shading: a triangle mesh is first rasterized and the pixels are then processed by a neural shader that models the interaction of geometry, material, and light. Since the rendering pipeline, including rasterization and shading, is differentiable, we can optimize the neural shader and the surface mesh with gradient descent. The explicit geometry representation enables fast convergence while the neural shader maintains the generality of the modeled appearance.

2. Method

Given a set of images $I = \{I_1, \dots, I_n\}$ from calibrated cameras and corresponding masks $M = \{M_1, \dots, M_n\}$, we want to estimate the 3D surface of an object shown in the images. To this end, we follow an analysis-by-synthesis approach: we find a surface that reproduces the images when rendered from the camera views. In this work, the surface is represented by a triangle mesh $G = (V, E, F)$, consisting of vertex positions $V$, a set of edges $E$, and a set of faces $F$. We solve the optimization problem using gradient descent and gradually deform a mesh based on an objective function that compares renderings of the mesh to the input images.

2.1. Neural Deferred Shading

Given a camera $i$, the mesh is rasterized in a first pass, yielding a triangle index and barycentric coordinates per pixel. This information is used to interpolate both vertex positions and vertex normals, creating a geometry buffer (g-buffer) with per-pixel positions and normals. In a second pass, the g-buffer is processed by a learned shader with parameters $\theta$:

$$f_\theta(x, n, \omega_o) \in [0, 1]^3$$
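The g-buffer construction from the first pass can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `interpolate_gbuffer`, the array layouts, and the convention that background pixels carry triangle index `-1` are all assumptions made for the example.

```python
import numpy as np

def interpolate_gbuffer(vertices, normals, faces, tri_index, bary):
    """Build per-pixel positions and normals from rasterizer output.

    vertices, normals: (V, 3) float arrays of vertex attributes.
    faces:             (F, 3) int array of vertex indices per triangle.
    tri_index:         (H, W) int array, triangle id per pixel, -1 for background
                       (assumed convention).
    bary:              (H, W, 3) float array of barycentric coordinates.
    """
    covered = tri_index >= 0                   # pixels hit by the mesh
    corner_ids = faces[tri_index[covered]]     # (P, 3) vertex ids per covered pixel
    w = bary[covered][:, :, None]              # (P, 3, 1) interpolation weights

    positions = np.zeros(tri_index.shape + (3,))
    pixel_normals = np.zeros_like(positions)
    positions[covered] = (w * vertices[corner_ids]).sum(axis=1)
    n = (w * normals[corner_ids]).sum(axis=1)
    # Interpolated normals are generally not unit length, so renormalize.
    pixel_normals[covered] = n / np.linalg.norm(n, axis=1, keepdims=True)
    return positions, pixel_normals, covered
```

The returned `covered` array doubles as the rendered mask, since it marks exactly the pixels where rasterization found a triangle.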

The shader returns an RGB color value for a given position $x \in \mathbb{R}^3$, normal $n \in \mathbb{R}^3$, and view direction $\omega_o = \frac{c_i - x}{\|c_i - x\|}$, with $c_i \in \mathbb{R}^3$ the center of camera $i$. It is optimized together with the geometry.
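As a rough illustration of the shader inputs, the snippet below computes the view direction and feeds position, normal, and view direction to a tiny MLP. The two-layer network `toy_shader`, its layer sizes, and the random `params` are hypothetical stand-ins; the actual shader architecture is not described in these notes.

```python
import numpy as np

def view_directions(positions, camera_center):
    """Unit vectors from surface points (P, 3) toward the camera center (3,)."""
    d = camera_center - positions
    return d / np.linalg.norm(d, axis=-1, keepdims=True)

def toy_shader(params, x, n, w_o):
    """Hypothetical two-layer MLP standing in for the learned shader.

    Concatenates position, normal, and view direction into a (P, 9) input
    and maps it to RGB values in [0, 1] through a sigmoid.
    """
    W1, b1, W2, b2 = params
    features = np.concatenate([x, n, w_o], axis=-1)
    hidden = np.maximum(features @ W1 + b1, 0.0)        # ReLU
    return 1.0 / (1.0 + np.exp(-(hidden @ W2 + b2)))    # sigmoid

# Randomly initialized parameters; in training these would be optimized
# jointly with the vertex positions.
rng = np.random.default_rng(0)
params = (rng.normal(size=(9, 16)), np.zeros(16),
          rng.normal(size=(16, 3)), np.zeros(3))
```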

In addition to a color image, the renderer also produces a mask that indicates if a pixel is covered by the mesh.

2.2. Objective Function

Finding an estimate of shape and appearance formally corresponds to solving the following minimization problem in our framework:

$$\arg\min_{V, \theta} \; L_{appearance}(G, \theta; I, M) + L_{geometry}(G)$$

where $L_{appearance}$ compares the rendered appearance of the estimated surface to the camera images and $L_{geometry}$ regularizes the mesh to avoid undesired vertex configurations.

2.2.1. Appearance

The appearance objective is composed of two terms:

$$L_{appearance} = \lambda_{shading} L_{shading} + \lambda_{silhouette} L_{silhouette}$$

where the shading term

$$L_{shading} = \frac{1}{|I|} \sum_{i=1}^{|I|} \left\| I_i - \widetilde{I_i} \right\|_1$$

ensures that the color images produced by the shader $\widetilde{I_i}$ correspond to the input images, and the silhouette term

$$L_{silhouette} = \frac{1}{|I|} \sum_{i=1}^{|I|} \left\| M_i - \widetilde{M_i} \right\|_2^2$$

ensures that the rendered masks $\widetilde{M_i}$ match the input masks for all views.

Formally, the masks $\widetilde{M_i}$ are functions of the geometry $G$ and the parameters of camera i.

The shading and silhouette objectives are separated mainly for performance reasons. For a camera view $i$, the rasterization considers all pixels in the image, so computing the mask $\widetilde{M_i}$ is cheap. Shading is more involved: it requires invoking the neural shader for all pixels after rasterization, which is an expensive operation. In practice, we only shade a subset of pixels inside the intersection of input and rendered masks, while comparing the silhouette for all pixels.
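The sampling strategy can be sketched for a single view as below. This is only an illustration under stated assumptions: an L1 color difference and a squared mask difference are assumed for the two terms, and `shade_fn`, `n_samples`, and the default weights are hypothetical names and values, not taken from the paper.

```python
import numpy as np

def appearance_loss(image, mask, rendered_mask, shade_fn, n_samples, rng,
                    w_shading=1.0, w_silhouette=1.0):
    """One-view appearance loss: sampled L1 shading plus dense L2 silhouette.

    image:         (H, W, 3) input colors.
    mask:          (H, W) input mask with values in {0, 1}.
    rendered_mask: (H, W) mask produced by rasterization.
    shade_fn:      callable mapping (P, 2) pixel coordinates to (P, 3) colors
                   (hypothetical stand-in for the neural shader on the g-buffer).
    """
    # Silhouette: compare the masks at every pixel, which is cheap.
    silhouette = np.mean((mask - rendered_mask) ** 2)

    # Shading: only consider pixels covered by both the input and rendered mask.
    candidates = np.argwhere((mask > 0) & (rendered_mask > 0))
    if len(candidates) == 0:
        return w_silhouette * silhouette
    k = min(n_samples, len(candidates))
    picked = candidates[rng.choice(len(candidates), size=k, replace=False)]
    target = image[picked[:, 0], picked[:, 1]]
    shading = np.mean(np.abs(target - shade_fn(picked)))

    return w_shading * shading + w_silhouette * silhouette
```

Because the shader is only evaluated on `k` sampled pixels, the cost of the shading term is controlled by `n_samples` rather than by the image resolution.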

2.2.2. Geometry Regularization

Naively moving the vertices unconstrained in each iteration quickly leads to undesirable meshes with degenerate triangles and self-intersections. We use a geometry regularization term that favors smooth solutions and is inspired by Luan et al.:

$$L_{geometry} = \lambda_{laplacian} L_{laplacian} + \lambda_{normal} L_{normal}$$

Let $V \in \mathbb{R}^{n \times 3}$ be a matrix with vertex positions as rows. The Laplacian term is defined as:

$$L_{laplacian} = \frac{1}{n} \sum_{i=1}^{n} \left\| \delta_i \right\|_2^2$$

where $\delta_i = (LV)_i \in \mathbb{R}^3$ are the differential coordinates of vertex $i$ and $L \in \mathbb{R}^{n \times n}$ is the graph Laplacian of the mesh $G$. Intuitively, by minimizing the magnitude of the differential coordinates of a vertex, we minimize its distance to the average position of its neighbors.
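A small NumPy sketch makes the intuition concrete. It assumes a uniform (random-walk) graph Laplacian $L = I - D^{-1}A$, for which row $i$ of $LV$ is exactly the vertex minus the mean of its neighbors; the notes do not say which Laplacian variant the authors use, so this choice and the function names are assumptions.

```python
import numpy as np

def uniform_laplacian(n_vertices, faces):
    """Dense graph Laplacian L = I - D^{-1} A built from triangle faces."""
    A = np.zeros((n_vertices, n_vertices))
    for a, b, c in faces:
        for i, j in ((a, b), (b, c), (c, a)):
            A[i, j] = A[j, i] = 1.0            # undirected edge
    degree = A.sum(axis=1, keepdims=True)
    # Assumes every vertex belongs to at least one face (degree > 0).
    return np.eye(n_vertices) - A / degree

def laplacian_loss(vertices, L):
    """Mean squared norm of the differential coordinates delta_i = (L V)_i."""
    delta = L @ vertices                        # row i: v_i minus neighbor mean
    return np.mean(np.sum(delta ** 2, axis=1))
```

A dense matrix keeps the example short; for real meshes a sparse matrix would be the practical choice, since each row has only a handful of nonzero entries.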

The normal consistency term is defined as:

$$L_{normal} = \frac{1}{|\overline{F}|} \sum_{(i, j) \in \overline{F}} \left( 1 - n_i \cdot n_j \right)^2$$

where $\overline{F}$ is the set of triangle pairs that share an edge and $n_i ∈ R^3$ is the normal of triangle i (under an arbitrary ordering of the triangles). It computes the cosine similarity between neighboring face normals and enforces additional smoothness.
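The normal consistency term can be sketched as follows, assuming a squared $(1 - \cos)$ penalty averaged over all edge-sharing triangle pairs. The helper names are hypothetical, and the pair search assumes a manifold mesh in which each interior edge is shared by exactly two triangles.

```python
import numpy as np

def face_normals(vertices, faces):
    """Unit normal of each triangle, following its vertex winding order."""
    v0, v1, v2 = (vertices[faces[:, k]] for k in range(3))
    n = np.cross(v1 - v0, v2 - v0)
    return n / np.linalg.norm(n, axis=1, keepdims=True)

def edge_sharing_pairs(faces):
    """Index pairs (i, j) of triangles that share an edge."""
    owners = {}
    for f, tri in enumerate(faces):
        for k in range(3):
            edge = tuple(sorted((int(tri[k]), int(tri[(k + 1) % 3]))))
            owners.setdefault(edge, []).append(f)
    # Boundary edges (one owner) are skipped; assumes at least one shared edge.
    return np.array([fs for fs in owners.values() if len(fs) == 2], dtype=int)

def normal_consistency_loss(vertices, faces):
    """Mean of (1 - cosine similarity)^2 over neighboring face normals."""
    n = face_normals(vertices, faces)
    pairs = edge_sharing_pairs(faces)
    cos = np.sum(n[pairs[:, 0]] * n[pairs[:, 1]], axis=1)
    return np.mean((1.0 - cos) ** 2)
```

For a flat pair of triangles the loss is zero, and it grows as the dihedral angle between neighboring faces sharpens.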

2.3. Optimization

Our optimization starts from an initial mesh that is computed from the masks and resembles a visual hull. Alternatively, it can start from a custom mesh.
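One way to obtain such a visual-hull-like starting shape is voxel carving: keep only the voxels whose centers project inside every mask. The sketch below is an assumption about how this could be done, not the authors' procedure; the function name, the $3 \times 4$ projection matrix convention, and the grid parameters are all hypothetical, and turning the occupancy grid into a triangle mesh (for example with marching cubes) is left out.

```python
import numpy as np

def carve_visual_hull(masks, projections, grid_min, grid_max, resolution):
    """Keep voxels whose centers project inside every mask.

    masks:       list of (H, W) boolean arrays.
    projections: list of (3, 4) matrices mapping homogeneous world points
                 to homogeneous pixel coordinates (u, v, w).
    Returns a (resolution, resolution, resolution) boolean occupancy grid.
    """
    axes = [np.linspace(grid_min[k], grid_max[k], resolution) for k in range(3)]
    points = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3)
    homogeneous = np.concatenate([points, np.ones((len(points), 1))], axis=1)

    inside = np.ones(len(points), dtype=bool)
    for mask, P in zip(masks, projections):
        uvw = homogeneous @ P.T
        in_front = uvw[:, 2] > 0
        depth = np.where(in_front, uvw[:, 2], 1.0)    # avoid dividing by <= 0
        u = np.floor(uvw[:, 0] / depth).astype(int)   # pixel column
        v = np.floor(uvw[:, 1] / depth).astype(int)   # pixel row
        h, w = mask.shape
        visible = in_front & (u >= 0) & (u < w) & (v >= 0) & (v < h)
        hit = np.zeros(len(points), dtype=bool)
        hit[visible] = mask[v[visible], u[visible]]
        inside &= hit                                 # carve away misses
    return inside.reshape(resolution, resolution, resolution)
```

With few views the carved shape is a coarse over-estimate of the object, which is acceptable here because the subsequent gradient-based optimization deforms it toward the observed images.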