[Paper Reading] Gaussian Grouping: Segment and Edit Anything in 3D Scenes


1. What

What does this paper do? (Based on the abstract and conclusion, summarized in one sentence.)

Gaussian Grouping is the first 3D Gaussian-based approach to jointly reconstruct and segment anything in open-world 3D scenes.
Each Gaussian carries a compact Identity Encoding, which is supervised by 2D masks from SAM together with a newly introduced 3D spatial consistency regularization. The resulting grouped Gaussians can also be used for scene editing.

  • Explanation of Open-world

    An open-world scenario refers to an uncertain, dynamic and complex environment that contains a variety of objects, scenes and tasks.

    Alternatively, “open-world scene understanding” refers to a model's ability to generalize to scenes or environments it has not been explicitly trained on. Here, “open-world” implies that the model must adapt to and understand a wide range of scenes, including ones that differ substantially from its training data.

2. Why

Under what conditions or needs was this research proposed, what core problems or deficiencies does it address, what have others done, and what are the innovations? (From the introduction and related work.)

This may cover Background, Question, Others, and Innovation:

  • Existing methods [8, 37] rely on manually-labeled datasets or require accurately scanned 3D point clouds [33, 42] as input.
  • Existing NeRF-based methods [14, 17, 25, 39] are computation-hungry and hard to adapt to downstream tasks, because the learned neural networks, such as MLPs, cannot easily decompose the 3D scene into individual parts or modules.
  • Radiance-based open-world scene understanding: unlike Gaussian Grouping, most of these methods are designed for in-domain scene modeling and cannot generalize to open-world scenarios.

3. How

The method is introduced in detail below, following the pipeline in Figure 2.

[Figure 2: pipeline overview, with panels (a), (b), and (c)]

3.1 Anything Mask Input and Consistency

As shown in Figure 2(a), the inputs are a set of multi-view captures, the 2D segmentations automatically generated by SAM for each view, and the corresponding cameras calibrated via SfM.

As shown in Figure 2(b), a well-trained zero-shot tracker [7] propagates and associates the masks across views, so that each 2D mask is assigned a unique ID in the 3D scene. In the figure, different colors represent different segmentation labels.
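To make the idea of view-consistent mask IDs concrete, here is a minimal sketch in Python. It is not the zero-shot tracker [7] used in the paper: a simple greedy IoU matcher between two consecutive views stands in for it, purely for illustration. The function names `iou` and `associate_masks` and the `iou_thresh` parameter are hypothetical.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boolean masks of the same shape."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union > 0 else 0.0

def associate_masks(prev_masks, cur_masks, next_id, iou_thresh=0.5):
    """Greedy stand-in for a tracker (an assumption, not the paper's method).

    prev_masks: dict {id: bool array (H, W)} from the previous view.
    cur_masks:  list of bool arrays (H, W) from the current view.
    next_id:    an integer larger than every ID used so far.
    Returns (dict {id: mask} for the current view, updated next_id).
    """
    assigned = {}
    for mask in cur_masks:
        best_id, best_iou = None, iou_thresh
        for pid, pmask in prev_masks.items():
            if pid in assigned:
                continue  # each previous ID is reused at most once
            score = iou(mask, pmask)
            if score >= best_iou:
                best_id, best_iou = pid, score
        if best_id is None:
            # No sufficiently overlapping mask: start a new object ID.
            best_id, next_id = next_id, next_id + 1
        assigned[best_id] = mask
    return assigned, next_id

# Toy example: one object persists, one new object appears.
prev = np.zeros((4, 4), dtype=bool); prev[:2, :2] = True
cur_same = np.zeros((4, 4), dtype=bool); cur_same[:2, :2] = True
cur_new = np.zeros((4, 4), dtype=bool); cur_new[2:, 2:] = True
assigned, next_id = associate_masks({0: prev}, [cur_same, cur_new], next_id=1)
```

In the toy example the first mask keeps ID 0 and the second receives a fresh ID. A real tracker also handles occlusion, appearance change, and large viewpoint shifts, which this overlap test does not.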

3.2 3D Gaussian Rendering and Grouping

As shown in Figure 2(c), this stage brings together the core concepts of the paper.

  1. Identity Encoding

    A new parameter, the Identity Encoding, is added to each Gaussian alongside its original parameters $S_{\Theta_{i}}=\{\mathbf{p}_{i},\mathbf{s}_{i},\mathbf{q}_{i},\alpha_{i},\mathbf{c}_{i}\}$. It is a compact vector of length 16 and, like the Spherical Harmonic (SH) coefficients that represent color, it is differentiable and learnable.

  2. Grouping via Rendering

    In the process of rendering labels, similar to $\alpha$
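The following Python sketch illustrates the two ideas in this section under stated assumptions. The dictionary layout and the names `init_gaussians` and `composite` are hypothetical, and colors are stored as plain RGB instead of SH coefficients for brevity. It also assumes that identity features are blended along a ray with the same front-to-back alpha compositing that 3D Gaussian Splatting uses for color. It is an illustration, not the authors' implementation.

```python
import numpy as np

ID_DIM = 16  # length of the Identity Encoding stated in the text

def init_gaussians(n, rng):
    """Hypothetical per-Gaussian parameter container, one row per Gaussian."""
    quats = rng.normal(size=(n, 4))
    quats /= np.linalg.norm(quats, axis=1, keepdims=True)  # unit quaternions
    return {
        "position": rng.normal(size=(n, 3)),           # p_i
        "scale": rng.uniform(0.01, 0.1, size=(n, 3)),  # s_i
        "rotation": quats,                             # q_i
        "opacity": rng.uniform(0.1, 0.9, size=(n,)),   # alpha_i
        "color": rng.uniform(size=(n, 3)),             # c_i (RGB here, not SH)
        "identity": np.zeros((n, ID_DIM)),             # new learnable encoding
    }

def composite(features, alphas):
    """Front-to-back alpha compositing of per-Gaussian features on one ray.

    features: (K, D) features of the K Gaussians hit by the ray, near to far.
    alphas:   (K,) effective opacity of each Gaussian at this pixel.
    Returns the (D,) blended feature: sum_i f_i * a_i * prod_{j<i} (1 - a_j).
    """
    transmittance = np.cumprod(np.concatenate(([1.0], 1.0 - alphas[:-1])))
    weights = alphas * transmittance
    return weights @ features

rng = np.random.default_rng(0)
gaussians = init_gaussians(5, rng)

# The same routine blends colors (D = 3) and identity encodings (D = 16).
pixel_color = composite(gaussians["color"], gaussians["opacity"])
pixel_identity = composite(gaussians["identity"], gaussians["opacity"])
```

Because the identity vectors go through the same differentiable blending as color, a loss on the rendered 2D identity feature (for example against the SAM mask IDs) can propagate gradients back to each Gaussian's encoding.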
