Gaussian Splatting Decoder for 3D-aware Generative Adversarial Networks
Florian Barthel 1,2, Arian Beckmann, Wieland Morgenstern, Anna Hilsmann, Peter Eisert 1,2
1 Fraunhofer Heinrich Hertz Institute, HHI
2 Humboldt University of Berlin
Abstract
NeRF-based 3D-aware Generative Adversarial Networks (GANs) such as EG3D or GIRAFFE have shown very high rendering quality with large representational variety. However, rendering with Neural Radiance Fields poses challenges for 3D applications: first, the significant computational demands of NeRF rendering preclude its use on low-power devices, such as mobile phones and VR/AR headsets; second, implicit representations based on neural networks are difficult to incorporate into explicit 3D scenes, such as VR environments or video games. 3D Gaussian Splatting (3DGS) overcomes these limitations by providing an explicit 3D representation that can be rendered efficiently at high frame rates. In this work, we present a novel approach that combines the high rendering quality of NeRF-based 3D-aware GANs with the flexibility and computational advantages of 3DGS. By training a decoder that maps implicit NeRF representations to explicit 3D Gaussian Splatting attributes, we integrate the representational diversity and quality of 3D GANs into the ecosystem of 3D Gaussian Splatting for the first time. Additionally, our approach allows for high-resolution GAN inversion and real-time GAN editing with 3D Gaussian Splatting scenes.
Project page: florian-barthel.github.io/gaussian_decoder
Figure 1: We propose a novel 3D Gaussian Splatting decoder that converts high-quality results from pre-trained 3D-aware GANs into Gaussian Splatting scenes in real time, for efficient and high-resolution rendering.
1 Introduction
Creating and editing realistic 3D assets is of vital importance for applications such as Virtual Reality (VR) or video games. Often, this process is very costly and requires a significant amount of manual editing. Over the last few years, there have been drastic improvements to 2D [14, 15, 16, 37] and 3D [8, 33, 7, 6, 39, 2, 42] image synthesis. These advancements increasingly narrow the gap between professionally created 3D assets and those that are automatically synthesized. One of the most notable recent methods is the Efficient Geometry-aware 3D GAN (EG3D) [8]. It successfully combines the strengths of StyleGAN [16], originally built for 2D image generation, with a 3D NeRF renderer [26, 3], achieving state-of-the-art 3D renderings synthesized from a latent space. Despite EG3D's significant contributions to 3D rendering quality, its integration into 3D modeling environments like Unity or Blender remains difficult. This challenge stems from its NeRF dependency: NeRF only generates 2D images of 3D scenes, without ever explicitly representing the 3D scene itself. As a result, EG3D scenes cannot be imported or manipulated in these computer graphics tools.
Recently introduced, 3D Gaussian Splatting (3DGS) [19] provides a novel explicit 3D scene representation, enabling high-quality renderings at high frame rates. Following its debut, numerous derivative techniques have already emerged, including the synthesis of controllable human heads [34, 43, 48], the rendering of full-body humans [20], and the compression of the storage size of Gaussian objects [30]. On the one hand, 3DGS provides a substantial improvement in terms of rendering speed and flexibility compared to NeRF: the explicit modelling enables simple exporting of scenes into 3D software environments. Furthermore, the novel and efficient renderer in 3DGS allows for high-resolution renderings and increases rendering speed by a factor of up to 1000× over state-of-the-art NeRF frameworks [18, 4, 31, 36]. On the other hand, NeRF's implicit scene representation allows for straightforward decoding of scene information from latent spaces, notably through the use of tri-planes [8], which store visual and geometric information of the scene to be rendered. This enables the integration of NeRF rendering into GAN frameworks, lifting the representational variety and visual fidelity of GANs up into three-dimensional space. Combining NeRFs and GANs is highly advantageous, as rendering from a latent space offers multiple benefits: firstly, it allows for rendering an unlimited number of unique appearances; secondly, a large variety of editing methods [12, 1, 29] can be applied; and thirdly, single 2D images can be inverted using 3D GAN inversion [15, 35, 5], allowing for full 3D reconstructions from a single image.
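As a minimal sketch of the tri-plane mechanism (our own illustration under assumed shapes, not code from EG3D), a lookup can be implemented by projecting each 3D point onto the three axis-aligned feature planes and aggregating the bilinearly sampled features:

```python
import torch
import torch.nn.functional as F

def sample_triplane(planes: torch.Tensor, xyz: torch.Tensor) -> torch.Tensor:
    """Query tri-plane features for a batch of 3D points.

    planes: (3, C, H, W) feature maps for the XY, XZ and YZ planes.
    xyz:    (N, 3) point coordinates, assumed normalized to [-1, 1].
    Returns (N, C) per-point features, aggregated by summation.
    """
    # Project every 3D point onto the three axis-aligned planes.
    coords = torch.stack([
        xyz[:, [0, 1]],   # XY plane
        xyz[:, [0, 2]],   # XZ plane
        xyz[:, [1, 2]],   # YZ plane
    ])                    # (3, N, 2)

    # grid_sample expects a (B, H_out, W_out, 2) sampling grid.
    grid = coords.unsqueeze(1)                        # (3, 1, N, 2)
    feats = F.grid_sample(planes, grid, mode="bilinear",
                          align_corners=False)        # (3, C, 1, N)
    return feats.squeeze(2).sum(dim=0).t()            # (N, C)
```

In NeRF-based pipelines, the resulting per-point features are decoded into density and color; in our setting, analogous features must instead be decoded into Gaussian splat attributes.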
Sampling visual information from latent spaces with large representational variety poses a challenge for rendering with 3DGS, as the framework requires the information describing the appearance of the scene to be encoded as attributes of individual splats, rather than in the latent space itself. This severely complicates the task of fitting 3D Gaussian splats to variable latent spaces, given that the splats would need to be repositioned for every new latent code, a challenge that is not addressed in the original 3DGS framework. Several approaches tackling the problem of rendering with 3DGS from latent tri-planes have been proposed [20, 49, 45], but to the best of our knowledge, no method exists to create 3D heads rendered with Gaussian Splatting from a latent space.
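To make this mismatch concrete, the sketch below shows the per-splat attribute layout that 3DGS expects (a simplified view with hypothetical field names): every geometric and appearance property lives on individual splats, so a decoder from a latent space must regress all of them for each new sample.

```python
from dataclasses import dataclass
import torch

@dataclass
class GaussianScene:
    """Explicit 3DGS scene: appearance and geometry are stored
    per splat rather than in a shared latent code."""
    positions: torch.Tensor  # (N, 3) splat centers
    scales:    torch.Tensor  # (N, 3) per-axis extents
    rotations: torch.Tensor  # (N, 4) unit quaternions
    opacities: torch.Tensor  # (N, 1) blending weights
    colors:    torch.Tensor  # (N, 3) RGB, or SH coefficients

def decode(w: torch.Tensor, positions: torch.Tensor) -> GaussianScene:
    """Hypothetical decoder interface: map a GAN latent code w and
    candidate splat positions to explicit per-splat attributes."""
    ...
```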
In this work, we propose a framework for the synthesis of explicit 3D scenes representing human heads from a latent space. This is done by combining the representational variety and fidelity of 3D-aware GANs with the explicit scenes and fast rendering speed of 3D Gaussian Splatting. Our main contributions can be summarized as follows:
1. A novel method that allows for GAN-based synthesis of explicit 3D Gaussian Splatting scenes, additionally avoiding the super-resolution modules used in the generation of implicit scene representations.
2. A novel sequential decoder architecture, a strategy for sampling Gaussian splat positions around human heads, and a generator-backbone fine-tuning technique to improve the decoder's capacity.
3. An open-source end-to-end pipeline for synthesizing state-of-the-art 3D assets to be used in 3D software.
2 Related Work
2.1 Neural Radiance Fields
In their foundational work on Neural Radiance Fields (NeRFs), Mildenhall et al. [