FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization
Jiahui Zhang1 Fangneng Zhan2 Muyu Xu1 Shijian Lu1 Zhiwei Xing3,4
1Nanyang Technological University 2Max Planck Institute for Informatics
3Carnegie Mellon University 4MBZUAI
Abstract
3D Gaussian splatting has achieved very impressive performance in real-time novel view synthesis. However, it often suffers from over-reconstruction during Gaussian densification where high-variance image regions are covered by a few large Gaussians only, leading to blur and artifacts in the rendered images. We design a progressive frequency regularization (FreGS) technique to tackle the over-reconstruction issue within the frequency space. Specifically, FreGS performs coarse-to-fine Gaussian densification by exploiting low-to-high frequency components that can be easily extracted with low-pass and high-pass filters in the Fourier space. By minimizing the discrepancy between the frequency spectrum of the rendered image and the corresponding ground truth, it achieves high-quality Gaussian densification and alleviates the over-reconstruction of Gaussian splatting effectively. Experiments over multiple widely adopted benchmarks (e.g., Mip-NeRF360, Tanks-and-Temples and Deep Blending) show that FreGS achieves superior novel view synthesis and outperforms the state-of-the-art consistently.
Figure 1: The proposed FreGS mitigates the over-reconstruction of Gaussian densification and renders images with much less blur and far fewer artifacts compared with 3D Gaussian splatting (3D-GS). For the two sample images from Mip-NeRF360 [2], (a) and (b) show the Rendered Image and the Gaussian Visualization of the highlighted regions, as well as the Spectra of over-reconstructed areas in the image rendered by 3D-GS and of the corresponding areas in FreGS. The Gaussian Visualization shows how the learnt rasterized 3D Gaussians compose images (all Gaussians are rasterized with full opacity). The Spectra are generated via image Fourier transformation, where the colour shifts from blue to green as the spectrum amplitude increases.
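The spectra shown in Fig. 1 are standard centered 2D Fourier amplitude spectra. A minimal sketch of how such a spectrum can be computed (function names are illustrative, not taken from the paper's code):

```python
import numpy as np

def amplitude_spectrum(image: np.ndarray) -> np.ndarray:
    """Centered 2D Fourier amplitude spectrum of a grayscale image."""
    freq = np.fft.fft2(image)       # 2D discrete Fourier transform
    freq = np.fft.fftshift(freq)    # move the zero-frequency term to the centre
    return np.abs(freq)             # amplitude (magnitude) of each frequency

def log_spectrum(image: np.ndarray) -> np.ndarray:
    """Log-scaled spectrum, commonly used for visualization since low frequencies dominate."""
    return np.log1p(amplitude_spectrum(image))

img = np.random.rand(64, 64)
spec = amplitude_spectrum(img)
print(spec.shape)  # (64, 64)
```

After `fftshift`, low frequencies sit at the centre of the spectrum and high frequencies toward the borders, which is the layout used in the figure.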
*Shijian Lu is the corresponding author.
1 Introduction
Novel View Synthesis (NVS) has been a pivotal task in the realm of 3D computer vision, holding immense significance in various applications such as virtual reality, image editing, etc. It aims to generate images from arbitrary viewpoints of a scene, often necessitating precise modelling of the scene from multiple scene images. Leveraging implicit scene representation and differentiable volume rendering, NeRF [21] and its extensions [1, 2] have recently achieved remarkable progress in novel view synthesis. However, NeRF is inherently plagued by long training and rendering times. Though several NeRF variants [22, 5, 9, 26, 7] speed up training and rendering greatly, they often sacrifice the quality of rendered images notably, especially when handling high-resolution rendering.
As a compelling alternative to NeRF, 3D Gaussian splatting (3D-GS) [16] has attracted increasing attention by offering superb training and inference speed while maintaining competitive rendering quality. By introducing anisotropic 3D Gaussians together with adaptive density control of Gaussian properties, 3D-GS can learn superb and explicit scene representations for novel view synthesis. It replaces the cumbersome volume rendering in NeRF with efficient splatting, which directly projects 3D Gaussians onto a 2D plane and ensures real-time rendering. However, 3D-GS often suffers from over-reconstruction [16] during Gaussian densification, where high-variance image regions are covered by only a few large Gaussians, leading to clear deficiencies in the learnt representations. The over-reconstruction manifests as blur and artifacts in the rendered 2D images, as well as a discrepancy between the frequency spectra of the rendered images (by 3D-GS) and the corresponding ground truth, as illustrated in Fig. 1.
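The splatting step referred to above linearizes the perspective projection to map each anisotropic 3D Gaussian onto the image plane, giving the 2D covariance Σ' = J W Σ Wᵀ Jᵀ, where W is the world-to-camera rotation and J the projection Jacobian. A minimal sketch under these standard EWA-splatting conventions (variable names are illustrative):

```python
import numpy as np

def project_covariance(cov3d, W, t, fx, fy):
    """Project a 3D Gaussian covariance to 2D image space (EWA-style splatting).

    cov3d : (3, 3) world-space covariance of the Gaussian
    W     : (3, 3) world-to-camera rotation
    t     : (3,)   camera-space mean of the Gaussian
    fx,fy : focal lengths in pixels
    """
    x, y, z = t
    # Jacobian of the perspective projection (u, v) = (fx*x/z, fy*y/z)
    J = np.array([
        [fx / z, 0.0, -fx * x / z**2],
        [0.0, fy / z, -fy * y / z**2],
    ])
    M = J @ W              # combined linearized transform
    return M @ cov3d @ M.T # (2, 2) image-space covariance

# Isotropic Gaussian 2 units in front of the camera:
cov2d = project_covariance(np.eye(3) * 0.01, np.eye(3),
                           np.array([0.0, 0.0, 2.0]), 500.0, 500.0)
```

The resulting 2×2 covariance defines the elliptical footprint that is rasterized and alpha-blended per pixel, which is what makes rendering real-time compared with NeRF's ray-marching.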
Based on the observation that over-reconstruction manifests clearly as a discrepancy in frequency spectra, we design FreGS, an innovative 3D Gaussian splatting technique that addresses over-reconstruction by regularizing the frequency signals in the Fourier space. FreGS introduces a novel frequency annealing technique to achieve progressive frequency regularization. Specifically, FreGS adopts a coarse-to-fine Gaussian densification process by annealing the regularization progressively from low-frequency to high-frequency signals, based on the rationale that low-frequency and high-frequency signals usually encode large-scale features (e.g., global patterns and structures, which are easier to model) and small-scale features (e.g., local details, which are harder to model), respectively. The progressive regularization strives to minimize the discrepancy between the frequency spectra of the rendered image and the corresponding ground truth, which provides faithful guidance in the frequency space and effectively complements the pixel-level L1 loss in the spatial space. Extensive experiments show that FreGS mitigates over-reconstruction and greatly improves Gaussian densification and novel view synthesis, as illustrated in Fig. 1.
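The progressive regularization described above can be sketched as a spectral loss whose pass-band widens over training. The following is a minimal illustration assuming a simple linear schedule for the pass-band radius and an amplitude-only discrepancy; the masking and loss details are illustrative rather than the paper's exact formulation:

```python
import numpy as np

def radial_mask(h, w, r_max):
    """Boolean low-pass mask keeping frequencies within radius r_max of the centre."""
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.sqrt((yy - h / 2) ** 2 + (xx - w / 2) ** 2)
    return dist <= r_max

def freq_loss(rendered, gt, step, total_steps):
    """L1 discrepancy between amplitude spectra, annealed from low to high frequencies."""
    h, w = rendered.shape
    # Pass-band radius grows with training progress: low frequencies are
    # supervised first (coarse structure), high frequencies later (fine detail).
    r = (step / total_steps) * (np.sqrt(h**2 + w**2) / 2)
    mask = radial_mask(h, w, r)
    spec_r = np.abs(np.fft.fftshift(np.fft.fft2(rendered)))
    spec_g = np.abs(np.fft.fftshift(np.fft.fft2(gt)))
    return np.abs(spec_r - spec_g)[mask].mean() if mask.any() else 0.0
```

In the full method this spectral term complements the pixel-level L1 loss during Gaussian densification; here it is shown stand-alone on grayscale NumPy arrays for clarity.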
The contributions of this work can be summarized in three aspects. First, we propose FreGS, an innovative 3D Gaussian splatting framework that addresses the over-reconstruction issue via regularization in the frequency space. To the best of our knowledge, this is the first effort that tackles the over-reconstruction issue of 3D Gaussian splatting from a spectral perspective. Second, we design a frequency annealing technique for progressive frequency regularization. The annealing performs regularization from low-to-high frequency signals progressively, achieving faithful coarse-to-fine Gaussian densification. Third, experiments over multiple benchmarks show that FreGS achieves superior novel view synthesis and outperforms 3D-GS consistently.
2 Related Work
2.1 Neural Rendering for Novel View Synthesis
Novel view synthesis aims to generate new, unseen views of a scene or object from a set of existing images or viewpoints. In the early days of deep learning, CNNs were explored for novel view synthesis [12, 27, 32], e.g., they were adopted to predict blending weights for image rendering in [12]. Later, researchers exploited CNNs for volumetric ray-marching [14, 29]. For example, Sitzmann et al. propose Deepvoxels [29], which builds a persistent 3D volumetric scene representation and then achieves rendering via volumetric ray-marching.