StylizedGS: Controllable Stylization for 3D Gaussian Splatting
Dingxi Zhang, Zhuoxun Chen, Yu-Jie Yuan, Fang-Lue Zhang, Zhenliang He, Shiguang Shan, Lin Gao
Corresponding author: Lin Gao (gaolin@ict.ac.cn). Dingxi Zhang and Zhuoxun Chen are with the University of Chinese Academy of Sciences, Beijing, China.
E-Mail: {zhangdingxi20a, zhuoxunchen20}@mails.ucas.ac.cn Yu-Jie Yuan and Lin Gao are with the Beijing Key Laboratory of Mobile Computing and Pervasive Device, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, and also with the University of Chinese Academy of Sciences, Beijing, China.
E-Mail: {yuanyujie, gaolin}@ict.ac.cn Fang-Lue Zhang is with Victoria University of Wellington, New Zealand. Zhenliang He and Shiguang Shan are with the Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, and also with the University of Chinese Academy of Sciences, Beijing, China.
E-mail: {hezhenliang, sgshan}@ict.ac.cn
Abstract
With the rapid development of XR, 3D generation and editing are becoming increasingly important, and stylization is a key tool for 3D appearance editing. Given a single reference style image, it can achieve consistent 3D artistic stylization, which makes it a user-friendly way to edit. However, recent NeRF-based 3D stylization methods face efficiency issues that affect the user experience, and their implicit nature limits their ability to transfer geometric pattern styles. In addition, artists highly value flexible control over stylized scenes, which supports creative exploration. In this paper, we introduce StylizedGS, a 3D neural style transfer framework based on the 3D Gaussian Splatting (3DGS) representation that offers adaptable control over perceptual factors. 3DGS brings the benefit of high efficiency. We propose a GS filter that, before stylization, eliminates floaters in the reconstruction that would otherwise degrade the stylization. A nearest-neighbor-based style loss is then introduced to achieve stylization by fine-tuning the geometry and color parameters of 3DGS, while a depth preservation loss with other regularizations is proposed to prevent the geometric content from being tampered with. Moreover, through specially designed losses, StylizedGS lets users control the color, the stylization scale, and the stylized regions during stylization, enabling customization. Our method attains high-quality stylization results characterized by faithful brushstrokes and geometric consistency, with flexible controls. Extensive experiments across various scenes and styles demonstrate the effectiveness and efficiency of our method in terms of both stylization quality and inference FPS.
Index Terms:
Gaussian Splatting, Style Transfer, Perceptual Control
Figure 1: Stylization Results. Given a 2D style image, the proposed StylizedGS method can stylize a pre-trained 3D Gaussian Splatting scene within a few minutes to match the desired style, with detailed geometric features and satisfactory visual quality. We also enable users to control several perceptual factors during stylization, such as color, the style pattern size (scale), and the stylized regions (spatial), to enhance customization.
1 Introduction
The once professionally dominated domain of artistic content creation has become increasingly accessible to novice users, thanks to recent advances in visual artistic stylization research. As a pivotal tool for generating artistic content and crafting visually engaging, memorable experiences, 3D scene stylization has attracted growing research effort. Previous methods have attempted style transfer on diverse explicit representations such as meshes [1, 2, 3], voxels [4, 5], and point clouds [6, 7, 8]. However, the quality of their results is limited by the quality of the geometric reconstructions. Recent 3D stylization methods benefit from emerging implicit neural representations [9, 10, 11], such as neural radiance fields (NeRF) [12, 13, 14, 15, 16], achieving more faithful and consistent stylization within 3D scenes. Nonetheless, NeRF-based methods are computationally intensive to optimize and suffer from geometry artifacts in the original radiance fields.
The recently introduced 3D Gaussian Splatting (3DGS) [17] represents a 3D scene as a set of colored, explicit 3D Gaussians and achieves remarkable 3D reconstruction quality from multi-view images with high efficiency. Given that practical 3D stylization applications often demand a prompt response, we propose using 3DGS as the representation of real-world scenes and conducting 3D stylization on it. Recent 3DGS scene manipulation methods [18, 19, 20, 21] explore the editing and control of 3D Gaussians using text instructions within designated regions of interest or semantic tracing. However, these approaches are constrained by text input and fall short of delivering detailed style transfer capabilities. Methods for 3D Gaussian stylization remain underexplored.
In this paper, we introduce StylizedGS, the first controllable scene stylization method based on 3DGS. With a single reference style image, our method can effectively transfer its style features to the entire 3D scene, represented by a set of 3D Gaussians. It facilitates the artistic creation of visually coherent novel views that exhibit transformed and detailed style features in a visually reasonable manner. More importantly, StylizedGS operates at a notable inference speed, ensuring the efficient synthesis of stylized scenes. To address users' desire to control perceptual factors such as color, scale, and spatial aspects, as in 2D image style transfer [22, 23, 24] and 3D scene style transfer [25], we enhance flexibility and introduce an advanced level of perceptual controllability in our approach to achieve personalized and diverse characteristics in 3DGS stylization.
The proposed 3DGS stylization process is formulated as an optimization problem that optimizes the geometry and color of 3D Gaussians to render images with faithful stylistic features while preserving the semantic content. A trivial solution is to fine-tune only the color of the 3D Gaussians by directly minimizing a style loss, which aligns the rendered images with the style image, and a content loss, which avoids overly strong stylization. However, this approach struggles to produce high-quality results with intricate style details, as shown in Fig. 10 (b), because learning only color alterations cannot effectively capture the overall style pattern. Instead, we propose to learn both the optimal geometry and color parameters of 3D Gaussians to capture the detailed style features and facilitate the stylization of the entire 3D scene. We propose a two-step stylization framework. The first step enhances fidelity in the stylized scene through color matching and reduces cloudy artifacts and geometric noise with the proposed GS filter. The second step optimizes for stylization with a nearest neighbor feature match (NNFM) loss. We also introduce a depth preservation loss that preserves the overall learned 3D scene geometry without the need for additional networks or regularization operations. Finally, we specially design a set of effective loss functions and optimization schemes to give users flexible perceptual control over color, scale, and spatial regions. Our contributions can be summarized as follows:
- We introduce StylizedGS, a novel controllable 3D Gaussian stylization method that organically integrates various modules with proper improvements to transfer detailed style features and produce faithful novel stylized views.
- We empower users with an efficient stylization process and flexible control through specially designed losses, enhancing their creative capabilities.
- Our approach achieves significantly reduced training and rendering times while generating high-quality stylized scenes compared with existing 3D stylization methods.
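To make the nearest neighbor feature match (NNFM) loss named in the introduction more concrete, the following is a minimal sketch of one common formulation: for every feature vector extracted from a rendered image, find the closest style feature under cosine distance, then average those minimum distances. This is an illustrative assumption rather than the authors' implementation. The function names `cosine_distance` and `nnfm_loss` and the toy 2D feature vectors are hypothetical; in practice the features would come from a pretrained network such as VGG and be computed with a differentiable tensor library.

```python
import math


def cosine_distance(a, b, eps=1e-8):
    """Cosine distance 1 - cos(a, b) between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # eps guards against division by zero for all-zero vectors.
    return 1.0 - dot / (norm_a * norm_b + eps)


def nnfm_loss(render_feats, style_feats):
    """Mean, over rendered features, of the distance to the nearest style feature."""
    total = 0.0
    for f in render_feats:
        # Nearest neighbor search in the style feature set (brute force).
        total += min(cosine_distance(f, s) for s in style_feats)
    return total / len(render_feats)


# Toy example with hypothetical 2D features (real features are high-dimensional).
render_feats = [[1.0, 0.0], [0.0, 2.0]]
style_feats = [[3.0, 0.0], [1.0, 1.0]]
loss = nnfm_loss(render_feats, style_feats)
print(loss)
```

Because each rendered feature is matched only to its own nearest style feature, the loss rewards reproducing local style patterns such as brushstrokes instead of matching global feature statistics.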
2 Related Work
Image Style Transfer. Style transfer aims to generate synthetic images with the artistic style of given images while preserving content. Initially proposed in the neural style transfer methods of Gatys et al. [26, 27], this process involves iteratively optimizing the output image using a Gram matrix loss and a content loss calculated from features extracted by VGG-Net [28]. Subsequent works [29, 30, 31, 32] have explored alternative style loss formulations to enhance semantic consistency and capture high-frequency style details such as brushstrokes. Feed-forward transfer methods [33, 34, 35] train neural networks to capture style information from the style image and transfer it to the input image in a single forward pass, ensuring faster stylization. Recent improvements in style loss [31, 32, 13] involve replacing the global Gram