GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling

License: CC BY-SA 4.0

GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling

Bowen Zhang¹*, Yiji Cheng²*, Jiaolong Yang³, Chunyu Wang³, Feng Zhao¹, Yansong Tang², Dong Chen³, Baining Guo³
¹University of Science and Technology of China, ²Tsinghua University, ³Microsoft Research Asia

Abstract

3D Gaussian Splatting (GS) has achieved considerable improvement over Neural Radiance Fields in terms of 3D fitting fidelity and rendering speed. However, this unstructured representation with scattered Gaussians poses a significant challenge for generative modeling. To address the problem, we introduce GaussianCube, a structured GS representation that is both powerful and efficient for generative modeling. We achieve this by first proposing a modified densification-constrained GS fitting algorithm, which can yield high-quality fitting results using a fixed number of free Gaussians, and then rearranging the Gaussians into a predefined voxel grid via Optimal Transport. The structured grid representation allows us to use a standard 3D U-Net as the backbone in diffusion generative modeling without elaborate designs. Extensive experiments conducted on ShapeNet and OmniObject3D show that our model achieves state-of-the-art generation results both qualitatively and quantitatively, underscoring the potential of GaussianCube as a powerful and versatile 3D representation. Project page: GaussianCube: Structuring Gaussian Splatting using Optimal Transport for 3D Generative Modeling.

*Interns at Microsoft Research Asia.

1 Introduction

Recent advancements in generative modeling (Ho et al., 2020; Goodfellow et al., 2020; Nichol and Dhariwal, 2021; Dhariwal and Nichol, 2021; Zhang et al., 2022; Karras et al., 2019) have led to significant progress in 3D content creation (Wang et al., 2023; Müller et al., 2023; Cao et al., 2023; Tang et al., 2023c; Shue et al., 2023; Chan et al., 2022; Gao et al., 2022). Most prior works in this domain leverage variants of Neural Radiance Field (NeRF) (Mildenhall et al., 2021) as their underlying 3D representations (Chan et al., 2022; Tang et al., 2023c), which typically consist of an explicit and structured proxy representation and an implicit feature decoder. However, such hybrid NeRF variants have degraded representation power, particularly when used for generative modeling, where a single implicit feature decoder is shared across all objects. Furthermore, the high computational complexity of volumetric rendering leads to both slow rendering speed and extensive memory costs. Recently, the emergence of 3D Gaussian Splatting (GS) (Kerbl et al., 2023) has enabled high-quality reconstruction (Xu et al., 2023; Luiten et al., 2023; Wu et al., 2023a) along with real-time rendering speed. The fully explicit characteristic of 3DGS also eliminates the need for a shared implicit decoder. Although 3DGS has been widely studied in scene reconstruction tasks, its spatially unstructured nature presents a significant challenge when applying it to generative modeling.

In this work, we introduce GaussianCube, a novel representation crafted to address the unstructured nature of 3DGS and unleash its potential for 3D generative modeling (see Table 1 for comparisons with prior works). Converting 3D Gaussians into a structured format without sacrificing their expressiveness is not a trivial task. We propose to first perform high-quality fitting using a fixed number of Gaussians and then organize them in a spatially structured manner. To keep the number of Gaussians fixed during fitting, a naive solution might omit the densification and pruning steps in GS, which, however, would significantly degrade the fitting quality. In contrast, we propose a densification-constrained fitting strategy, which retains the original pruning process yet constrains the number of Gaussians that perform densification, ensuring the total does not exceed a predefined maximum $N^3$ (32,768 in this paper, i.e. $N = 32$). For the subsequent structuralization, we allocate the Gaussians across an $N \times N \times N$ voxel grid using Optimal Transport (OT). Consequently, our fitted Gaussians are systematically arranged within the voxel grid, with each grid cell containing the features of one Gaussian. The proposed OT-based structuralization process achieves maximal spatial coherence, characterized by minimal total transport distance, while preserving the high expressiveness of 3DGS.
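
The densification constraint can be pictured as a budget check at each densification step. The sketch below is only an illustration under stated assumptions, not the authors' implementation: the function name `select_for_densification`, the per-Gaussian score (for example a gradient magnitude), the threshold, and the assumption that each densified Gaussian adds exactly one new Gaussian are all hypothetical. Only the cap of 32,768 comes from the text.

```python
import numpy as np

N_MAX = 32 ** 3  # 32,768, the cap stated in the text


def select_for_densification(scores, num_current, threshold, n_max=N_MAX):
    """Return indices of Gaussians allowed to densify under a fixed budget.

    scores:      hypothetical per-Gaussian densification scores.
    num_current: number of Gaussians alive after pruning.
    Assumes each selected Gaussian adds exactly one new Gaussian.
    """
    scores = np.asarray(scores, dtype=float)
    # Candidates are the Gaussians an unconstrained fit would densify.
    candidates = np.flatnonzero(scores > threshold)
    # Remaining room before the total would exceed n_max.
    budget = max(n_max - num_current, 0)
    if len(candidates) <= budget:
        return candidates
    # Over budget: keep only the highest-scoring candidates.
    order = np.argsort(scores[candidates])[::-1]
    return np.sort(candidates[order[:budget]])


# Four Gaussians, room for two more: three candidates compete for two slots.
picked = select_for_densification([0.9, 0.1, 0.5, 0.7], num_current=4,
                                  threshold=0.3, n_max=6)
print(picked)  # [0 3]
```

Because pruning is left untouched, the count can still drop below the cap; the budget only limits growth.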
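
To make the structuralization step concrete, here is a minimal sketch that assigns N^3 Gaussians one-to-one to the cells of an N×N×N grid so that the total squared distance is minimal. It rests on several assumptions: the exact solver used by the authors is not given in this section, so SciPy's `linear_sum_assignment` stands in for it; the helper names `voxel_centers` and `structure_gaussians`, the cube [-1, 1]^3, and the 14 feature channels are illustrative choices only.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def voxel_centers(n):
    """Centers of an n x n x n grid covering [-1, 1]^3, shape (n^3, 3)."""
    axis = (np.arange(n) + 0.5) / n * 2.0 - 1.0
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    return np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)


def structure_gaussians(positions, features, n):
    """Place n^3 Gaussians into an n x n x n grid, one per cell.

    Returns the feature grid of shape (n, n, n, C) and the total cost.
    """
    assert positions.shape == (n ** 3, 3)
    centers = voxel_centers(n)
    # cost[i, j] = squared distance from Gaussian i to voxel center j.
    diff = positions[:, None, :] - centers[None, :, :]
    cost = (diff ** 2).sum(axis=-1)
    # One-to-one assignment with minimal total cost.
    gauss_idx, voxel_idx = linear_sum_assignment(cost)
    grid = np.empty((n ** 3, features.shape[1]), dtype=features.dtype)
    grid[voxel_idx] = features[gauss_idx]
    total_cost = cost[gauss_idx, voxel_idx].sum()
    return grid.reshape(n, n, n, -1), total_cost


# Small example: 4^3 = 64 Gaussians with 14 assumed feature channels each.
rng = np.random.default_rng(0)
n = 4
pos = rng.uniform(-1.0, 1.0, size=(n ** 3, 3))
feat = rng.normal(size=(n ** 3, 14))
cube, cost = structure_gaussians(pos, feat, n)
print(cube.shape)  # (4, 4, 4, 14)
```

The example uses a small n on purpose: at N = 32 the dense cost matrix has 32,768 × 32,768 entries, so a practical pipeline would likely need an approximate or more memory-efficient solver.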

Representation	Spatially-structured	Fully-explicit	High-quality Reconstruction	Efficient Rendering
Vanilla NeRF (Mildenhall et al., 2021)	✗	✗	✗	✗
Neural Voxels (Tang et al., 2023c)	✓	✗	✗	✗
Triplane (Chan et al., 2022)	✓	✗	✗	✗
Gaussian Splatting (Kerbl et al., 2023)	✗	✓	✓	✓
Our GaussianCube	✓	✓	✓	✓

Table 1: Comparison with prior 3D representations.

We perform 3D generative modeling with the proposed GaussianCube using diffusion models (Ho et al., 2020). The spatially coherent structure of the Gaussians in our representation facilitates efficient feature extraction and permits the use of standard 3D convolutions to capture the correlations among neighboring Gaussians effectively. Therefore, we construct our diffusion model with a standard 3D U-Net architecture without elaborate designs. It is worth noting that our diffusion model and the GaussianCube representation are generic, which facilitates both unconditional and conditional generation tasks.
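
Since a GaussianCube is a dense N×N×N grid of per-Gaussian features, an ordinary 3D convolution can mix information between spatially neighboring Gaussians. The following is a deliberately naive NumPy sketch of that idea, not the paper's network: the function name `conv3d_same`, the random weights, and the 14 input channels are assumptions, and a real model would use a deep learning framework's 3D convolution layers inside a U-Net.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv3d_same(x, weight):
    """Naive 3D cross-correlation with zero padding ("same" output size).

    x:      (C_in, N, N, N) feature grid.
    weight: (C_out, C_in, k, k, k) with odd k.
    Returns an array of shape (C_out, N, N, N).
    """
    k = weight.shape[-1]
    p = k // 2
    padded = np.pad(x, ((0, 0), (p, p), (p, p), (p, p)))
    # windows has shape (C_in, N, N, N, k, k, k): one k^3 patch per voxel.
    windows = sliding_window_view(padded, (k, k, k), axis=(1, 2, 3))
    # Sum over input channels and the k^3 neighborhood of each voxel.
    return np.einsum("cxyzijk,ocijk->oxyz", windows, weight)


rng = np.random.default_rng(0)
n, c_in, c_out = 8, 14, 4            # 14 channels is an assumed feature size
cube = rng.normal(size=(n, n, n, c_in))
x = np.moveaxis(cube, -1, 0)         # channels-first layout: (C_in, N, N, N)
w = rng.normal(size=(c_out, c_in, 3, 3, 3))
y = conv3d_same(x, w)
print(y.shape)  # (4, 8, 8, 8)
```

Each output voxel depends only on its 3×3×3 neighborhood, which is meaningful here precisely because neighboring grid cells hold spatially nearby Gaussians.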

We conduct comprehensive experiments to verify the efficacy of our proposed approach. The model's capability for unconditional generation is evaluated on the ShapeNet dataset (Chang et al., 2015). Both the quantitative and qualitative comparisons indicate that our model surpasses all previous methods. Additionally, we perform class-conditioned generation on the OmniObject3D dataset (Wu et al., 2023b), which is an extensive collection of real-world scanned objects with a broad vocabulary. Our model excels in producing semantically accurate 3D objects with complex geometries and realistic textures, outperforming the state-of-the-art methods. These experiments collectively demonstrate the strong capabilities of our GaussianCube and suggest its potential as a powerful and versatile 3D representation for a variety of applications. Some samples generated by our method are presented in Figure 1.
