Spec-Gaussian: Anisotropic View-Dependent Appearance for 3D Gaussian Splatting


License: CC 4.0 BY-SA


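3D Gaussian Splatting (3D-GS) achieves real-time rendering at high quality, but it struggles to model specular and anisotropic components. Spec-Gaussian addresses this by modeling appearance with an anisotropic spherical Gaussian (ASG) appearance field, together with a coarse-to-fine training strategy. Experiments show that the method improves rendering quality and extends the applicability of 3D-GS.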

Spec-Gaussian: Anisotropic View-Dependent Appearance for 3D Gaussian Splatting

Ziyi Yang1,3 Xinyu Gao1 Yangtian Sun2 Yihua Huang2 Xiaoyang Lyu2
Wen Zhou3 Shaohui Jiao3 Xiaojuan Qi2 Xiaogang Jin1
1Zhejiang University 2The University of Hong Kong 3ByteDance Inc

Abstract

The recent advancements in 3D Gaussian splatting (3D-GS) have not only facilitated real-time rendering through modern GPU rasterization pipelines but have also attained state-of-the-art rendering quality. Nevertheless, despite its exceptional rendering quality and performance on standard datasets, 3D-GS frequently encounters difficulties in accurately modeling specular and anisotropic components. This issue stems from the limited ability of spherical harmonics (SH) to represent high-frequency information. To overcome this challenge, we introduce Spec-Gaussian, an approach that utilizes an anisotropic spherical Gaussian (ASG) appearance field instead of SH for modeling the view-dependent appearance of each 3D Gaussian. Additionally, we have developed a coarse-to-fine training strategy to improve learning efficiency and eliminate floaters caused by overfitting in real-world scenes. Our experimental results demonstrate that our method surpasses existing approaches in terms of rendering quality. Thanks to ASG, we have significantly improved the ability of 3D-GS to model scenes with specular and anisotropic components without increasing the number of 3D Gaussians. This improvement extends the applicability of 3D-GS to handle intricate scenarios with specular and anisotropic surfaces. Our code and datasets will be released.


Figure 1: Our method not only achieves real-time rendering but also significantly enhances the capability of 3D-GS to model scenes with specular and anisotropic components. Key to this enhanced performance is our use of ASG appearance field to model the appearance of each 3D Gaussian, which results in substantial improvements in rendering quality for both complex and general scenes. Moreover, we employ anchor Gaussians to constrain the geometry of point-based representations, thereby improving the ability of 3D-GS to accurately model reflective parts and accelerating both training and rendering processes.

1 Introduction

High-quality reconstruction and photorealistic rendering from a collection of images are crucial for a variety of applications, such as augmented reality/virtual reality (AR/VR), 3D content production, and art creation. Classic methods employ primitive representations, like meshes [34] and points [4, 60], and take advantage of the rasterization pipeline optimized for contemporary GPUs to achieve real-time rendering. In contrast, neural radiance fields (NeRF) [32, 6, 33] utilize neural implicit representation to offer a continuous scene representation and employ volumetric rendering to produce rendering results. This approach allows for enhanced preservation of scene details and more effective reconstruction of scene geometries.

Recently, 3D Gaussian Splatting (3D-GS) [21] has emerged as a leading technique, delivering state-of-the-art quality and real-time speed. This method optimizes a set of 3D Gaussians that capture the appearance and geometry of a 3D scene simultaneously, offering a continuous representation that preserves details and produces high-quality results. Besides, the CUDA-customized differentiable rasterization pipeline for 3D Gaussians enables real-time rendering even at high resolution.

Despite its exceptional performance, 3D-GS struggles to model specular components within scenes (see Fig. 1). This issue primarily stems from the limited ability of low-order spherical harmonics (SH) to capture the high-frequency information required in these scenarios. Consequently, this poses a challenge for 3D-GS to model scenes with reflections and specular components, as illustrated in Fig. 1 and Fig. 7.
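The limitation of low-order SH can be illustrated with a small numerical experiment. This is a sketch under stated assumptions, not code from the paper: the highlight is modeled as a hypothetical axially symmetric lobe `exp(sharpness * (cos_theta - 1))`, for which a degree-limited SH expansion reduces to a Legendre series in `cos_theta`. The function names and the sharpness values are illustrative choices.

```python
import numpy as np
from numpy.polynomial import legendre


def lobe(cos_theta, sharpness):
    """Axially symmetric highlight that peaks at cos_theta = 1 (assumed test signal)."""
    return np.exp(sharpness * (cos_theta - 1.0))


def band_limited_error(sharpness, max_degree, n_samples=2001):
    """RMS error of the best Legendre (zonal SH) fit up to max_degree."""
    # x = cos(theta); uniform sampling in x is uniform in solid angle.
    x = np.linspace(-1.0, 1.0, n_samples)
    y = lobe(x, sharpness)
    coeffs = legendre.legfit(x, y, max_degree)      # least-squares projection
    residual = y - legendre.legval(x, coeffs)
    return float(np.sqrt(np.mean(residual ** 2)))


if __name__ == "__main__":
    for degree in (3, 8, 20):
        print(degree, band_limited_error(sharpness=50.0, max_degree=degree))
```

With a sharp lobe the degree-3 fit leaves a much larger residual than a degree-20 fit, while a broad lobe is already well approximated at degree 3. That is the sense in which low-order SH lacks the bandwidth for tight specular highlights.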

To address the issue, we introduce a novel approach called Spec-Gaussian, which combines anisotropic spherical Gaussian (ASG) [54] for modeling anisotropic and specular components, anchor-based geometry-aware 3D Gaussians for acceleration and storage reduction, and an effective training mechanism to eliminate floaters and improve learning efficiencies. Specifically, the method incorporates three key designs: 1) A new 3D Gaussian representation that utilizes an ASG appearance field instead of SH to model the appearance of each 3D Gaussian. ASG with a few orders can effectively model high-frequency information that low-order SH cannot. This new design enables 3D-GS to more effectively model anisotropic and specular components in static scenes. 2) A hybrid approach employing sparse anchor points to control the location and representation of its child Gaussians. This strategy results in a hierarchical and geometry-aware point-based scene representation and enables us to store only the anchor Gaussians, significantly reducing storage requirements and enhancing the geometry. 3) A coarse-to-fine training scheme specifically tailored for 3D-GS is designed to eliminate floaters and boost learning efficiency. This strategy effectively shortens learning time by optimizing low-resolution rendering in the initial stage, preventing the need to increase the number of 3D Gaussians and regularizing the learning process to avoid the generation of unnecessary geometric structures that lead to floaters.
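To make the first design more concrete, here is a minimal sketch of evaluating a single ASG lobe, assuming the commonly used definition from the ASG literature cited as [54]: ASG(v) = amplitude · max(v·z, 0) · exp(−λ(v·x)² − μ(v·y)²), where (x, y, z) is an orthonormal frame. The function name, argument names, and numbers are illustrative; how Spec-Gaussian combines several ASGs with its MLP is not shown here.

```python
import numpy as np


def asg(view_dir, x_axis, y_axis, z_axis, lam, mu, amplitude):
    """Evaluate one anisotropic spherical Gaussian at direction view_dir.

    (x_axis, y_axis, z_axis) is an orthonormal frame with z_axis as the lobe
    axis; lam and mu (> 0) control the sharpness along x_axis and y_axis.
    """
    v = np.asarray(view_dir, dtype=float)
    v = v / np.linalg.norm(v)
    smooth = max(float(np.dot(v, z_axis)), 0.0)   # zero on the back hemisphere
    falloff = np.exp(-lam * np.dot(v, x_axis) ** 2 - mu * np.dot(v, y_axis) ** 2)
    return float(amplitude * smooth * falloff)


if __name__ == "__main__":
    x, y, z = np.eye(3)
    tilt = np.deg2rad(20.0)
    toward_x = np.array([np.sin(tilt), 0.0, np.cos(tilt)])
    toward_y = np.array([0.0, np.sin(tilt), np.cos(tilt)])
    # Same tilt angle, different response: the lobe is anisotropic.
    print(asg(toward_x, x, y, z, lam=50.0, mu=2.0, amplitude=1.0))
    print(asg(toward_y, x, y, z, lam=50.0, mu=2.0, amplitude=1.0))
```

Because `lam` and `mu` are independent, the lobe can be narrow in one tangent direction and wide in the other, which an isotropic spherical Gaussian cannot express.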

By combining these advances, our approach can render high-quality results for specular highlights and anisotropy as shown in Fig. 4 while preserving the efficiency of Gaussians. Furthermore, comprehensive experiments reveal that our method not only endows 3D-GS with the ability to model specular highlights but also achieves state-of-the-art results in general benchmarks.

In summary, the major contributions of our work are as follows:

• A novel ASG appearance field to model the view-dependent appearance of each 3D Gaussian, which enables 3D-GS to effectively represent scenes with specular and anisotropic components without sacrificing rendering speed.
• An anchor-based hybrid model to reduce the computational and storage overhead brought by learning the ASG appearance field.
• A coarse-to-fine training scheme that effectively regularizes training to eliminate floaters and improve the learning efficiency of 3D-GS in real-world scenes.
• An anisotropic dataset has been made to assess the capability of our model in representing anisotropy. Extensive experiments show the effectiveness of our method in modeling scenes with specular highlights and anisotropy.
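The coarse-to-fine idea in the third contribution can be sketched as a resolution schedule. This is a hypothetical illustration only: the text states that training starts from low-resolution rendering, but the linear ramp, the starting factor of 4, and the 20,000-iteration warm-up below are assumptions, not values taken from the paper.

```python
def downscale_factor(iteration, warmup_iters=20_000, start_factor=4.0):
    """Hypothetical linear schedule from 1/start_factor resolution to full resolution."""
    if iteration >= warmup_iters:
        return 1.0
    t = iteration / warmup_iters
    return start_factor + t * (1.0 - start_factor)


def render_size(full_width, full_height, iteration):
    """Image size to render (and supervise) at a given training iteration."""
    f = downscale_factor(iteration)
    return max(1, round(full_width / f)), max(1, round(full_height / f))


if __name__ == "__main__":
    for it in (0, 10_000, 20_000):
        print(it, render_size(1600, 1200, it))
```

Early iterations see only low-frequency image content, which is one plausible way to discourage the small, view-specific structures that later show up as floaters.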

2 Related Work

2.1 Implicit Neural Radiance Fields

Neural rendering has attracted significant interest in the academic community for its unparalleled ability to generate photorealistic images. Methods like NeRF [32] utilize Multi-Layer Perceptrons (MLPs) to model the geometry and radiance fields of a scene. Leveraging the volumetric rendering equation and the inherent continuity and smoothness of MLPs, NeRF achieves high-quality scene reconstruction from a set of posed images, establishing itself as the state-of-the-art (SOTA) method for novel view synthesis. Subsequent research has extended the utility of NeRF to various applications, including mesh reconstruction [46, 25, 52], inverse rendering [42, 63, 29, 56], optimization of camera parameters [27, 48, 47, 36], few-shot learning [12, 55, 51], and anti-aliasing [2, 1, 3].

However, this stream of methods relies on ray casting rather than rasterization to determine the color of each pixel. Consequently, every sampling point along the ray necessitates querying the MLPs, leading to significantly slow rendering speed and prolonged training convergence. This limitation substantially impedes their application in large-scene modeling and real-time rendering.
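The per-sample cost described above follows from how a ray's color is composited. The sketch below uses the standard NeRF-style quadrature (alpha from density and sample spacing, weighted by accumulated transmittance); the function and argument names are illustrative, and in a real system `densities` and `colors` would come from one MLP query per sample.

```python
import numpy as np


def composite_ray(densities, colors, deltas):
    """Numerical volume rendering along one ray.

    densities: (N,) volume density at each sample
    colors:    (N, 3) RGB at each sample
    deltas:    (N,) distance between consecutive samples
    """
    alphas = 1.0 - np.exp(-densities * deltas)
    # Transmittance reaching sample i: product of (1 - alpha) over earlier samples.
    transmittance = np.concatenate(([1.0], np.cumprod(1.0 - alphas)[:-1]))
    weights = transmittance * alphas
    rgb = (weights[:, None] * colors).sum(axis=0)
    return rgb, weights


if __name__ == "__main__":
    densities = np.array([0.5, 2.0, 4.0])
    colors = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
    deltas = np.full(3, 0.1)
    print(composite_ray(densities, colors, deltas))
```

As a rough illustration with assumed numbers, an 800×800 image with 192 samples per ray needs 800 × 800 × 192 ≈ 123 million network queries per frame, which is why ray-casting methods are hard to run in real time.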

To reduce the training time of MLP-based NeRF methods and improve rendering speed, subsequent work has enhanced NeRF’s efficiency in various ways. Structure-based techniques [61, 13, 38, 16, 8] have sought to improve inference or training efficiency by caching or distilling the implicit neural representation into more efficient data structures. Hybrid methods [28, 43] increase efficiency by incorporating explicit voxel-based data structures. Factorization methods [5, 17, 9, 15] apply a low-rank tensor assumption to decompose the scene into low-dimensional planes or vectors, achieving better geometric consistency. Compared to continuous implicit representations, the convergence of individual voxels in the grid is independent, significantly reducing training time. Additionally, Instant-NGP [33] utilizes a hash grid with a corresponding CUDA implementation for faster feature querying, enabling rapid training and interactive rendering of neural radiance fields.
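For readers unfamiliar with hash grids, the following sketch shows the kind of spatial hash used to look up features for a voxel corner. It follows the XOR-of-primes form described for Instant-NGP as best recalled; the prime constants are quoted from memory and should be checked against the original source, and the function name and table size are assumptions for illustration.

```python
# Per-dimension multipliers (assumed values, quoted from memory).
PRIMES = (1, 2654435761, 805459861)


def hash_index(ix, iy, iz, table_size):
    """Map non-negative integer grid coordinates to a slot in a feature table."""
    h = (ix * PRIMES[0]) ^ (iy * PRIMES[1]) ^ (iz * PRIMES[2])
    return h % table_size


if __name__ == "__main__":
    table_size = 2 ** 19
    for corner in [(0, 0, 0), (1, 0, 0), (12, 7, 300)]:
        print(corner, hash_index(*corner, table_size))
```

A lookup is then a constant-time table read per grid level instead of a deep MLP evaluation, which is where much of the speed-up comes from; collisions are left for training to resolve.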

Despite achieving higher quality and faster rendering, these methods have not fundamentally overcome the substantial query overhead associated with ray casting. As a result, a notable gap remains before achieving real-time rendering. In this work, we build upon the recent 3D-GS [21], a point-based rendering method that leverages rasterization. Compared to ray casting-based methods, it significantly enhances both training and rendering speed.


Figure 2: Pipeline of our proposed Spec-Gaussian. The optimization process begins with SfM points derived from COLMAP or generated randomly, serving as the initial state for the anchor Gaussians. Within a view frustum, neural Gaussians are spawned from each visible anchor Gaussian, using the corresponding offsets. Their other attributes, such as opacity, rotation, and scaling, are decoded through the respective tiny MLPs. To address the limitations of low-order SH and pure MLP in modeling high-frequency information, we additionally employ ASG in conjunction with a feature decoupling MLP to model the view-dependent appearance of each neural Gaussian. Subsequently, neural Gaussians with opacity greater than zero are rendered through a differentiable Gaussian rasterization pipeline, effectively capturing specular highlights and anisotropy in the scene.
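The spawning step in the caption can be sketched as a small array operation. This is an illustration under assumptions rather than the authors' implementation: the per-anchor scale factor `anchor_scale` is hypothetical, the array shapes are chosen for the example, and opacity is passed in directly, whereas in the pipeline it is decoded by a tiny MLP.

```python
import numpy as np


def spawn_neural_gaussians(anchor_xyz, offsets, anchor_scale, opacity):
    """Place neural Gaussians around their anchors and drop those with opacity <= 0.

    anchor_xyz:   (A, 3) anchor positions
    offsets:      (A, K, 3) learned offsets, K per anchor
    anchor_scale: (A, 1) hypothetical per-anchor scale applied to the offsets
    opacity:      (A, K) opacity of each spawned Gaussian (assumed given)
    """
    positions = anchor_xyz[:, None, :] + offsets * anchor_scale[:, None, :]
    keep = opacity > 0.0                      # only these reach the rasterizer
    return positions[keep], opacity[keep]


if __name__ == "__main__":
    anchors = np.array([[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]])
    offsets = np.ones((2, 2, 3))
    scale = np.array([[0.5], [2.0]])
    opacity = np.array([[0.3, -0.1], [0.0, 0.9]])
    print(spawn_neural_gaussians(anchors, offsets, scale, opacity))
```

Only the anchors and their features need to be stored; the neural Gaussians are regenerated for each view, which is consistent with the storage reduction claimed in the introduction.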

2.2 Point-based Neural Radiance Fields

Point-based representations, similar to triangle mesh-based methods, can exploit the highly efficient rasterization pipeline of modern GPUs to achieve real-time rendering. Although these methods offer breakneck rendering speeds and are well-suited for editing tasks, they often suffer from holes and outliers, leading to artifacts in the rendered images. This issue arises from the discrete nature of point clouds, which can create gaps in the primitives and, consequently, in the rendered image.

To address these discontinuity issues, differentiable point-based rendering [