[Paper Reading] SC-GS: Sparse-Controlled Gaussian Splatting for Editable Dynamic Scenes

1. What

What does this paper do? (From the abstract and conclusion, summarized in one sentence.)

Given a monocular dynamic video as input, this paper drives 3D Gaussians with a set of sparse control points, each carrying a time-varying 6-DoF transformation predicted by an MLP; the method enables dynamic novel view synthesis and motion editing, though it still has limitations under inaccurate camera poses or intense motions.

2. Why

Under what conditions or needs was this research proposed? What core problems or deficiencies does it address, what have others done, and what are the innovations? (From the Introduction and Related Work.)

Covering background, problem, prior work, and innovation:

NeRF-based methods struggle with low rendering quality, slow speed, and high memory usage, while the original 3D-GS only applies to static scenes. An intuitive approach [47] learns a flow vector for each 3D Gaussian, but it incurs significant training and inference cost (an author of [47] is a co-author of this paper).

Related work:

  • Dynamic NeRF

  • Dynamic Gaussian Splatting

  • 3D Deformation and Editing

    This part is relatively unfamiliar to me. It covers the traditional editing methods in graphics, which focus on preserving the geometric details of 3D objects during deformation, using tools such as Laplacian coordinates, the Poisson equation, and cage-based approaches.

    More recently, other approaches aim to edit scene geometry learned from 2D images; this paper belongs to that class.

3. How

3.1 Sparse Control Points

We will first introduce the definition of control points, which is the core concept used in this article.

The scene motion is governed by a set of sparse control points $\mathcal{P}=\{(p_i\in\mathbb{R}^3,\,o_i\in\mathbb{R}^+)\},\ i\in\{1,2,\cdots,N_p\}$, where $o_i$ is a learnable radius parameter that controls the impact of a control point on a Gaussian.

Meanwhile, for each control point $i$, we learn a time-varying 6-DoF transformation $[R_i^t \mid T_i^t]\in\mathbf{SE}(3)$, consisting of a local frame rotation matrix $R_i^t\in\mathbf{SO}(3)$ and a translation vector $T_i^t\in\mathbb{R}^3$. Instead of directly optimizing the transformation parameters, an MLP $\Psi$ is employed to learn a time-varying transformation field:

$$\Psi:(p_i, t)\rightarrow(R_i^t, T_i^t).$$
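To make the transformation field concrete, here is a minimal PyTorch sketch. The hidden width, the absence of positional encoding, and the unit-quaternion parameterization of $R_i^t$ are my own illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TransformField(nn.Module):
    """Sketch of Psi: (p_i, t) -> (R_i^t, T_i^t).
    Width and quaternion output are assumptions, not the paper's design."""
    def __init__(self, hidden=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + 1, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4 + 3),  # 4 quaternion + 3 translation params
        )

    def forward(self, p, t):
        # p: (N, 3) control-point positions, t: (N, 1) timestamps
        out = self.mlp(torch.cat([p, t], dim=-1))
        q = nn.functional.normalize(out[..., :4], dim=-1)  # unit quaternion, a rotation in SO(3)
        T = out[..., 4:]                                    # translation vector in R^3
        return q, T
```

A likely motivation for predicting transformations through a shared MLP, rather than optimizing per-point parameters directly, is that nearby control points share motion structure, which implicitly regularizes the motion field.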

3.2 Dynamic Scene Rendering

Having obtained the control points, we need to establish the connection between them and the Gaussians.

For each Gaussian $G_j$, we use k-nearest-neighbor (KNN) search to obtain its $K\,(=4)$ neighboring control points, denoted as $\{p_k \mid k\in\mathcal{N}_j\}$. Its interpolation weights are then defined as:

$$w_{jk}=\frac{\hat w_{jk}}{\sum_{k\in\mathcal{N}_j}\hat w_{jk}},\quad \text{where}\ \hat w_{jk}=\exp\!\left(-\frac{d_{jk}^2}{2o_k^2}\right),$$

where $d_{jk}$ is the distance between the center of Gaussian $G_j$ and the control point $p_k$.
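Below is a small sketch of this weighting in PyTorch, using `torch.cdist` plus `topk` as the KNN search; the function name and tensor shapes are hypothetical.

```python
import torch

def control_point_weights(gauss_centers, ctrl_pts, ctrl_radii, K=4):
    """Interpolation weights w_jk from the formula above.
    gauss_centers: (Ng, 3), ctrl_pts: (Np, 3), ctrl_radii: (Np,)."""
    d = torch.cdist(gauss_centers, ctrl_pts)         # (Ng, Np) pairwise distances d_jk
    d_k, idx = d.topk(K, dim=-1, largest=False)      # K nearest control points per Gaussian
    o_k = ctrl_radii[idx]                            # (Ng, K) learned radii of the neighbors
    w_hat = torch.exp(-d_k**2 / (2 * o_k**2))        # Gaussian falloff \hat{w}_jk
    w = w_hat / w_hat.sum(dim=-1, keepdim=True)      # normalize over the K neighbors
    return w, idx
```

These weights then blend the control points' 6-DoF transformations onto each Gaussian in a linear-blend-skinning fashion, which is how the deformed Gaussians are obtained for rendering.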
