Towards High-Fidelity 3D Portrait Generation with Rich Details by Cross-View Prior-Aware Diffusion
https://arxiv.org/html/2411.10369
1SKL-IOTSC, CIS, University of Macau, 2School of Computer Science, Wuhan University
{hr.wei1998, wenchenghan, xingping.dong}@gmail.com, jianbingshen@um.edu.mo
Abstract
Recent diffusion-based single-image 3D portrait generation methods typically employ 2D diffusion models to provide multi-view knowledge, which is then distilled into 3D representations. However, these methods usually struggle to produce high-fidelity 3D models, frequently yielding excessively blurred textures. We attribute this issue to the insufficient consideration of cross-view consistency during the diffusion process, which results in significant disparities between different views and ultimately leads to blurred 3D representations. In this paper, we address this issue by comprehensively exploiting multi-view priors in both the conditioning and diffusion procedures to produce consistent, detail-rich portraits. From the conditioning standpoint, we propose a Hybrid Priors Diffusion Model, which explicitly and implicitly incorporates multi-view priors as conditions to enhance the status consistency of the generated multi-view portraits. From the diffusion perspective, considering the significant impact of the diffusion noise distribution on detailed texture generation, we propose a Multi-View Noise Resampling Strategy integrated within the optimization process, which leverages cross-view priors to enhance representation consistency. Extensive experiments demonstrate that our method can produce 3D portraits with accurate geometry and rich details from a single image. The project page is at https://haoran-wei.github.io/Portrait-Diffusion.
Figure 1: Our proposed Portrait Diffusion framework can generate high-quality detail-rich 3D portraits from a single reference portrait image. In comparison to SOTA methods Wonder3D [19] and Portrait3D [36], our approach achieves clearer and more detailed textures.
∗ Equal contribution. † Corresponding author: Jianbing Shen.
1 Introduction
The generation of realistic 3D portraits from a single image [7, 39, 10, 35, 20] has become an important focus in computer vision and graphics, with broad applications in augmented reality, virtual reality, video conferencing, and gaming [14, 18, 45]. The most straightforward approach involves training GAN models [47, 1] on extensive portrait datasets to directly produce 3D representations. However, acquiring such training data can be costly and technically challenging, leading to failures in generating high-fidelity 360° full-head portraits [7, 8] and often resulting in a lack of diversity in the outputs.
To address these limitations, recent developments [24, 46, 26, 31, 30] leverage text-to-image diffusion priors [44, 41, 5], which exhibit stronger generalization capabilities and higher generation quality, to produce novel perspectives. Most approaches incorporate additional priors, such as reference image latents [42, 50], ID features [28, 13], and view embeddings [28], to enhance the consistency between new perspectives and the primary viewpoint. Subsequently, they commonly employ Score Distillation Sampling (SDS) loss [25] to distill these 2D priors into 3D representations, ensuring consistent 3D generation.
However, in single-image 3D portrait generation, these methods still face challenges: generated portraits often appear over-smoothed and fail to capture detailed textures like hair strands, as illustrated in Fig. 1, limiting their practical applications. We attribute this issue to the insufficient consideration of cross-view consistency during the diffusion process, resulting in significant disparities between different views. This 2D inconsistency leads to blurred 3D outputs after SDS optimization. Although these methods attempt to improve consistency by incorporating additional priors, they rely solely on diffusion attention to implicitly convey these priors. This reliance results in a lack of explicit constraints, leading to inconsistent status across different viewpoints. Moreover, the diffusion procedure is inherently stochastic; even under the same conditions, a diffusion model can generate varied representations due to randomly sampled noise. By using view-independent procedures with purely random noise in diffusion, these methods overlook the impact of stochasticity on representation consistency. Consequently, these inconsistencies in status and representation jointly result in over-smoothed 3D models when optimized under the SDS loss, which enforces 3D consistency and continuity at the cost of texture details.
To address these issues, we propose fully exploiting cross-view priors in both the conditioning and diffusion procedures to enhance multi-view consistency, thus yielding detail-rich 3D portraits, as showcased in Fig. 1. From a conditioning perspective, we propose the Hybrid Priors Diffusion Model (HPDM). Our approach seeks to transfer and utilize cross-view prior information in both explicit and implicit ways to control novel view generation. In an explicit manner, we begin by employing geometric priors to map pixels from the current view to the next, providing an explicit reference that dominates the generation process. Given that this reference covers only a limited overlapping region and contains artifacts introduced by perspective transformations, we further propose to utilize the robust modeling capabilities of attention mechanisms to mitigate these deficiencies. These mechanisms capture finer texture and geometry priors and implicitly transfer them into the control conditions, ensuring more comprehensive and precise guidance for the portrait status of the novel viewpoint.
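To make the explicit mapping step concrete, the following is a minimal sketch of depth-based cross-view warping under an assumed pinhole camera model; the function name and interface are our illustrative assumptions, not the paper's implementation.

```python
import torch

def warp_to_next_view(img_src, depth_src, K, R_rel, t_rel):
    """Explicitly map pixels from the current view to the next using a
    depth map and a relative camera pose (pinhole model, hypothetical
    interface). Returns the warped reference and a validity mask; regions
    outside the overlap remain unfilled, as the paper notes.

    Shapes: img_src (3,H,W), depth_src (H,W), K (3,3), R_rel (3,3), t_rel (3,).
    """
    H, W = depth_src.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=-1).float()  # (H,W,3)

    # Back-project to 3D in the source camera frame, move to the target frame.
    pts = (torch.linalg.inv(K) @ pix.reshape(-1, 3).T) * depth_src.reshape(1, -1)
    pts = R_rel @ pts + t_rel[:, None]

    # Project into the target view.
    proj = K @ pts
    uv = proj[:2] / proj[2].clamp(min=1e-6)  # (2, H*W)

    # Scatter source colors to the nearest target pixel (simple forward warp).
    u, v = uv.round().long()
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H) & (proj[2] > 0)
    warped = torch.zeros_like(img_src)
    mask = torch.zeros(H, W, dtype=torch.bool)
    warped[:, v[valid], u[valid]] = img_src.reshape(3, -1)[:, valid]
    mask[v[valid], u[valid]] = True
    return warped, mask
```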
From a diffusion procedure perspective, our goal is to manage randomness across adjacent viewpoints so that they can share detailed, consistent representations. To achieve this, we introduce a Multi-View Noise Resampling Strategy (MV-NRS) integrated into the SDS loss, which manages each view's noise distribution by passing cross-view priors. MV-NRS consists of two main components: first, a shared anchor noise initialization that leverages geometric priors to establish a preliminary representation; and second, an anchor noise optimization phase, in which we resample and update the anchor noise based on denoising-gradient consistency to progressively align the representations during SDS optimization.
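As a schematic illustration of the two MV-NRS components (the concrete agreement criterion below is our illustrative stand-in, not the paper's exact rule): a single anchor noise is shared across views via the geometric warp, and it is kept or resampled depending on how well the per-view denoising gradients agree.

```python
import torch
import torch.nn.functional as F

def init_anchor_noise(shape, warp_fns):
    """Shared anchor-noise initialization (schematic): sample one anchor
    noise map and propagate it to every view with the geometric warp
    `warp_fns[v]` (hypothetical), so overlapping regions of adjacent
    views start from the same noise realization."""
    anchor = torch.randn(shape)
    return anchor, [warp(anchor) for warp in warp_fns]

def update_anchor(anchor, view_grads, agree_thresh=0.5):
    """Anchor-noise optimization (schematic): measure pairwise cosine
    similarity of the per-view SDS denoising gradients; if they agree,
    keep the anchor so the shared representation keeps sharpening,
    otherwise resample and try a new realization."""
    flat = torch.stack([g.flatten() for g in view_grads])            # (V, D)
    sims = F.cosine_similarity(flat[:, None, :], flat[None, :, :], dim=-1)
    if sims.mean() < agree_thresh:   # gradients conflict across views
        anchor = torch.randn_like(anchor)
    return anchor
```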
To summarize, our main contributions are as follows:
- We developed a Portrait Diffusion pipeline consisting of GAN-prior Initialization, Portrait Geometry Restoration, and Multi-view Diffusion Refinement modules to generate detail-rich 3D portraits.
- We designed a Hybrid Priors Diffusion Model that integrates multi-view priors both explicitly and implicitly as conditions, aiming to enhance the consistency of multi-view status.
- We introduced a Multi-View Noise Resampling Strategy integrated within the SDS loss to manage randomness across different views through the transmission of cross-view priors, thereby achieving fine-grained consistent representations.
- Through extensive experiments, we show that our proposed pipeline successfully achieves high-fidelity 3D full-portrait generation with rich details.

Figure 2: The Portrait Diffusion framework, which comprises three integral modules. GAN-prior Portrait Initialization employs existing portrait GAN priors to derive initial tri-plane NeRF features from frontal-view portrait images. Portrait Geometry Restoration focuses on reconstructing the geometry using these initialized tri-planes. Multi-view Diffusion Texture Refinement transforms coarse textures into detailed representations.
2 Related Work
One-shot 3D Generation 3D GANs [48, 11, 27, 4, 43, 40] have made significant strides in advancing one-shot 3D object generation by enhancing both quality and efficiency. GRAM [6] enhanced efficiency through point sampling on 2D manifolds, and GET3D [12] integrated differentiable rendering with 2D GANs to efficiently generate detailed 3D meshes. For improving 3D consistency, Geometry-aware 3D GAN [3] used a hybrid architecture to maintain multi-view consistency, while GRAM-HD [37] employed super-resolution techniques to address inconsistency issues. Despite these advances, limited datasets constrain the prior distribution, and acquiring high-quality data remains costly.
Recently, methods leveraging 2D diffusion priors [22, 21, 23, 2, 34, 49, 33] for generating 3D objects have gained traction [9, 32, 16, 38, 46, 26]. Dreamfusion [24] introduces a loss mechanism based on probability density distillation for optimizing parametric image generators. DreamCraft3D [29] employs view-dependent diffusion models for coherent 3D generation, using Bootstrapped Score Distillation to enhance textures. Make-It-3D [30] uses 2D diffusion models as perceptual supervision in a two-stage process, enhancing textures with reference images. Make-it-Vivid [31] focuses on automatic texture generation from text instructions, achieving quality outputs in UV space. These advancements underscore the promise of diffusion priors in achieving multi-view consistency in 3D object generation.
One-shot 3D Portrait Generation In 3D portrait synthesis, Yin et al. [47] enhanced 3D GAN inversion using facial symmetry and depth-guided pseudo labels for better structural consistency and texture fidelity. PanoHead [1] creates 360° portraits with a two-stage registration process using tri-mesh neural volumetric representation.
Benefiting from diffusion priors, diffusion models significantly enhance 3D portrait synthesis by enabling detailed zero-shot full-head generation. Portrait3D [36] uses 3DPortraitGAN to produce 360° canonical portraits, addressing "grid-like" artifacts with a pyramidal tri-grid representation and improving details through diffusion-model fractional distillation sampling. DiffusionAvatars [17] combines a diffusion-based renderer with a neural head model, using cross-attention for consistent expressions across angles. Another Portrait3D framework [13], by Hao et al., emphasizes identity preservation in avatars across three phases: geometry initialization, sculpting, and texture generation, employing ID-aware techniques. While many of these methods utilize SDS and incorporate ID and normal information for enhanced representation, they often struggle to fully utilize multiple priors across viewpoints, leading to texture issues like over-smoothing or artifacts.
3 Methods
In this section, we first analyze the limitations of existing methods and present our motivations (Sec. 3.1). Next, we provide an overview of our pipeline, including the GAN-prior Initialization Module, Portrait Geometry Restoration Module, and Multi-view Diffusion Texture Refinement Module (Sec. 3.2). We then focus on the Multi-view Diffusion Texture Refinement Module, emphasizing both multi-view status consistency (Sec. 3.3) and multi-view representation consistency (Sec. 3.4) to achieve consistent multi-view generation with fine texture fidelity in the 3D portrait.
3.1 Preliminary
Existing diffusion-based methods for generating 3D objects predominantly utilize Score Distillation Sampling (SDS) loss [25] to distill 2D diffusion priors into 3D representations. This process can be formulated as follows:
$$\Phi^{*} = \arg\min_{\Phi}\left(\mathcal{L}_{\mathrm{SDS}}(\Phi;\theta) + \mathcal{L}_{\mathrm{ref}}(\Phi; I_{\mathrm{ref}})\right) \tag{1}$$
where $\Phi$ denotes the parameters of the 3D model, $\mathcal{L}_{\mathrm{SDS}}(\Phi;\theta)$ represents the SDS loss using a diffusion model parameterized by $\theta$, and $\mathcal{L}_{\mathrm{ref}}(\Phi; I_{\mathrm{ref}})$ is a loss computed from the reference image $I_{\mathrm{ref}}$. The SDS loss can be formulated as:
$$\nabla_{\Phi}\mathcal{L}_{\mathrm{SDS}} = \mathbb{E}_{t,v,\epsilon}\left[w_{t}\left(\epsilon_{\theta}(z_{t,v}, t, c) - \epsilon\right)\cdot \nabla_{\Phi}\mathcal{R}_{\Phi}(v)\right] \tag{2}$$
$$z_{t,v} = \sqrt{\alpha_{t}}\, z_{v}(\Phi) + \sqrt{1-\alpha_{t}}\,\epsilon,\qquad \epsilon \sim \mathcal{N}(0, I),$$
where $z_{t,v}$ is a noisy latent representation obtained by combining the image latents $z_{v}(\Phi)$, rendered from viewpoint $v$ by $\Phi$, with random noise $\epsilon$; $\epsilon_{\theta}(z_{t,v}, t, c)$ is a diffusion UNet that predicts the noise component at each time step $t$, conditioned on $c$; $w_{t}$ and $\alpha_{t}$ are weights, and $\mathcal{R}$ is the rendering function.
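As a minimal PyTorch sketch of Eq. (2), assuming latent-space SDS and the common weighting choice $w_{t} = 1 - \alpha_{t}$ (the interfaces are illustrative):

```python
import torch

def sds_loss(z_v, eps_model, cond, alphas_cumprod, t):
    """One SDS step per Eq. (2): noise the rendered latents z_v, let the
    frozen diffusion UNet predict that noise, and push the residual back
    through the differentiable renderer as the gradient w.r.t. Phi."""
    eps = torch.randn_like(z_v)                        # eps ~ N(0, I)
    a_t = alphas_cumprod[t]
    z_t = a_t.sqrt() * z_v + (1 - a_t).sqrt() * eps    # forward diffusion of Eq. (2)
    with torch.no_grad():                              # diffusion model stays frozen
        eps_pred = eps_model(z_t, t, cond)             # eps_theta(z_{t,v}, t, c)
    w_t = 1 - a_t                                      # one common weighting choice
    grad = w_t * (eps_pred - eps)
    # A surrogate loss whose gradient w.r.t. z_v is exactly `grad`;
    # backprop then carries it to Phi through the rendering graph.
    return (grad.detach() * z_v).sum()
```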
From (2), the SDS loss aggregates the denoising gradients from all viewpoints $v$ into the 3D model parameters $\Phi$. When the denoising distributions across viewpoints are inconsistent, the SDS loss produces over-smoothed representations that minimize the overall loss by averaging conflicting gradients, sacrificing the details of each perspective. The denoising function $\epsilon_{\theta}$ is influenced by both the condition $c$ and the distribution of the noise $\epsilon$ from each viewpoint, making them essential for the quality of the 3D representation.
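A toy example (ours, not from the paper) of how averaging conflicting per-view gradients washes out detail:

```python
import torch

# Two views agree on the low-frequency component of a texture but give
# opposite-sign gradients for a high-frequency detail. Averaging over
# viewpoints, as the expectation in Eq. (2) effectively does, cancels
# the detail term and keeps only the smooth one.
g_view1 = torch.tensor([0.5,  1.0])   # [low-freq, detail]
g_view2 = torch.tensor([0.5, -1.0])
print((g_view1 + g_view2) / 2)        # tensor([0.5000, 0.0000])
```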
However, previous methods did not fully leverage multi-view priors to effectively control both the condition $c$ and the noise $\epsilon$. This resulted in inconsistent multi-view denoising, making it impossible for them to generate detailed 3D textures. Although some of these methods incorporate additional priors, such as ID features, to enhance the condition, they rely solely on implicit prior transfer through embeddings and attention mechanisms within the diffusion model, lacking explicit guidance. Therefore, they are unable to effectively constrain the portrait status across different views. Additionally, these methods focus merely on enhancing the condition $c$ and overlook the significant influence of $\epsilon$ on detailed representations. As a result, the mutually independent multi-view noise-adding procedure leads to a lack of fine-grained alignment in the denoising gradients.

Figure 3: Illustrations of our proposed Hybrid Priors Diffusion Model (a) and Multi-View Noise Resampling Strategy (b). HPDM is designed to leverage various multi-view priors in a hybrid manner to condition the novel-view synthesis process for more consistent status. MV-NRS is designed to transfer cross-view priors to control the diffusion noise distribution for representation alignment.
3.2 Detail-Rich Portrait Diffusion
Our Portrait Diffusion framework for high-fidelity detail-rich 3D portrait generation is illustrated in Fig. 2. It consists of three major modules:
GAN-Prior Initialization Module utilizes GAN priors learnt from large-scale offline multi-view portrait images to initialize a tri-plane representation, as shown in Fig. 2 (a). This tri-plane offers preliminary geometry and texture, facilitating the subsequent training process.
In our method, we employ a NeRF, parameterized by $\Psi$, as our 3D model. Initialization plays a critical role in the quality of NeRF models. Standard initialization, such as using a central sphere, often produces excessively smooth geometries with insufficient details. Therefore, a high-quality initialization can significantly benefit the subsequent optimization process. Inspired by Portrait3D [13], we utilize GAN priors to initialize NeRF representations. Specifically, we utilize the SOTA GAN method [7] to generate tri-plane features from a frontal image. This process can be formalized as:
$$T_{\mathrm{init},I} = \mathcal{G}_{\psi_{\mathrm{enc}}}(I) \tag{3}$$
$$\Psi_{\mathrm{init},I} = \{T_{\mathrm{init},I},\ \psi_{\mathrm{dec}},\ \psi_{\mathrm{SR}}\}$$
where $\mathcal{G}$ denotes the GAN, parameterized by the encoder $\psi_{\mathrm{enc}}$, decoder $\psi_{\mathrm{dec}}$, and super-resolution $\psi_{\mathrm{SR}}$ parameters; $T$ represents the tri-plane features, and $\Psi_{\mathrm{init},I}$ represents the initialized NeRF parameters for image $I$.
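A schematic of Eq. (3) in PyTorch (the module names are our illustrative assumptions; the actual GAN is the one of [7]):

```python
import torch

def init_nerf_from_gan(frontal_image, gan_encoder, gan_decoder, gan_super_res):
    """GAN-prior initialization per Eq. (3): the frozen GAN encoder maps a
    frontal portrait to tri-plane features T_init; the initialized NeRF
    bundles these features with the GAN's decoder and super-resolution
    parameters, which are reused as-is."""
    with torch.no_grad():
        triplane_init = gan_encoder(frontal_image)   # T_init = G_psi_enc(I)
    return {
        "triplane": triplane_init,     # the optimizable 3D representation
        "decoder": gan_decoder,        # psi_dec, reused from the GAN
        "super_res": gan_super_res,    # psi_SR, reused from the GAN
    }
```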
While GAN-generated portraits effectively capture frontal details, they are constrained by the lack of 360° priors, leading to missing geometry at the back of the head and the Janus issue. Therefore, we devise a Portrait Geometry Restoration module to further repair the geometry of the 3D portrait.
Portrait Geometry Restoration Module is designed to fix the structure of the initialized tri-plane with diffusion priors. It employs diffusion models to deliver high-quality, generalized priors, and introduces a Detail Preservation Block that effectively preserves the details from the initialized priors, as shown in Fig. 2 (b).
Directly optimizing the tri-plane can lead to the erosion of the initialized details; therefore, we instead employ a Detail Preservation Block that uses a small UNet to transform the initialized tri-plane into the desired tri-plane. The core idea is to propagate gradients across the entire tri-plane through convolutional layers, thereby effectively leveraging the global priors. The block considers the overall distribution of the initialized priors and subtly adjusts it to maintain coherence by optimizing the UNet parameters.
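A minimal sketch of such a Detail Preservation Block (channel sizes and depth are our illustrative assumptions):

```python
import torch
import torch.nn as nn

class DetailPreservationBlock(nn.Module):
    """Small UNet-style refiner (schematic): instead of optimizing the
    tri-plane directly, we optimize this network's parameters, so every
    gradient update flows through shared conv kernels and adjusts the
    whole tri-plane coherently; the residual keeps the initialized details."""
    def __init__(self, channels=32):
        super().__init__()
        self.down = nn.Sequential(
            nn.Conv2d(channels, 2 * channels, 3, stride=2, padding=1), nn.SiLU())
        self.up = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False),
            nn.Conv2d(2 * channels, channels, 3, padding=1))

    def forward(self, triplane_init):        # one feature plane: (B, C, H, W)
        h = self.down(triplane_init)
        return triplane_init + self.up(h)    # residual update preserves init

# Only the block's parameters receive gradients; the init tri-plane is fixed:
# refined_plane = DetailPreservationBlock()(triplane_init.detach())
```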