[Paper Reading] Gaussian Grouping: Segment and Edit Anything in 3D Scenes

Table of Contents

  • Gaussian Grouping: Segment and Edit Anything in 3D Scenes
    • 1. What
    • 2. Why
    • 3. How
      • 3.1 Anything Mask Input and Consistency
      • 3.2 3D Gaussian Rendering and Grouping
      • 3.3 Downstream: Local Gaussian Editing

1. What

What does the paper set out to do? (Drawing on the abstract and conclusion, try to summarize it in one sentence.)

Gaussian Grouping is the first 3D Gaussian-based approach to jointly reconstruct and segment anything in open-world 3D scenes.
Each Gaussian is augmented with a compact Identity Encoding, which is supervised by 2D masks from SAM together with a newly introduced 3D spatial consistency regularization. The grouped Gaussians can then also be used for editing.

  • Explanation of Open-world

    An open-world scenario is an uncertain, dynamic, and complex environment that contains a wide variety of objects, scenes, and tasks.

    Put differently, “open-world scene understanding” is the ability of a model to generalize to scenes or environments that it has not been explicitly trained on. Here, “open-world” means the model must adapt to and understand a wide range of scenes, including ones that may be very different from those in its training data.

2. Why

Under what conditions or needs was this research proposed (Intro)? What core problems or deficiencies does it address, what have others done, and what is novel? (Drawing on the Introduction and Related Work.)

This may cover Background, Question, Others, and Innovation:

  • Existing methods [8, 37] rely on manually labeled datasets or require accurately scanned 3D point clouds [33, 42] as input.
  • Existing NeRF-based methods [14, 17, 25, 39] are computationally expensive and hard to adapt to downstream tasks, because the learned neural networks, such as MLPs, cannot easily be decomposed into the individual parts or modules of the 3D scene.
  • Radiance-based open-world scene understanding: unlike Gaussian Grouping, most of these methods are designed for in-domain scene modeling and cannot generalize to open-world scenarios.

3. How

We walk through the method in detail, following the pipeline shown in Figure 2.

[Figure 2: the Gaussian Grouping pipeline, panels (a) to (c)]

3.1 Anything Mask Input and Consistency

As shown in Figure 2(a), the inputs are a set of multi-view captures, the 2D segmentations that SAM generates automatically for them, and the corresponding cameras calibrated via SfM.

As shown in Figure 2(b), a well-trained zero-shot tracker [7] propagates and associates the masks across views, so that each 2D mask receives a unique ID in the 3D scene. Figure 2(b) visualizes the result, with colors representing the different segmentation labels.

3.2 3D Gaussian Rendering and Grouping

As shown in Figure 2(c), this stage brings together all of the paper's core concepts.

  1. Identity Encoding

    A new parameter, the Identity Encoding, is added to each Gaussian alongside its original parameters $S_{\Theta_{i}}=\{\mathbf{p}_{i},\mathbf{s}_{i},\mathbf{q}_{i},\alpha_{i},\mathbf{c}_{i}\}$. It is a compact vector of length 16 that is differentiable and learnable, much like the Spherical Harmonic (SH) coefficients that represent color.

  2. Grouping via Rendering

    Labels are rendered in a manner similar to $\alpha$-blending:

    $$E_{\text{id}}=\sum_{i\in\mathcal{N}}e_i\alpha_i'\prod_{j=1}^{i-1}(1-\alpha_j'),$$

    but the terms differ from those used for color. $e_i$ is the length-16 Identity Encoding of each Gaussian, and $\alpha_i'$ is a new weight, computed by multiplying the opacity $\alpha_i$ with the value at that pixel of the 2D Gaussian whose covariance is $\Sigma^{2\mathrm{D}}$, where $\Sigma^{2\mathrm{D}}=JW\Sigma^{3\mathrm{D}}W^TJ^T$ according to [61].

  3. Grouping Loss

    • 2D Identity Loss: Given the rendered 2D feature $E_{\text{id}}$ from above as input, a linear layer $f$ first maps its feature dimension back to $K+1$, where $K$ is the total number of masks in the 3D scene. Then $\mathrm{softmax}(f(E_{\text{id}}))$ is used for identity classification, trained with a cross-entropy loss.

    • 3D Regularization Loss:

      The 3D Regularization Loss leverages 3D spatial consistency: it encourages the Identity Encodings of the $k$ nearest 3D Gaussians to be close in feature distance.

      $$\mathcal{L}_{\mathrm{3d}}=\frac{1}{m}\sum_{j=1}^{m}D_{\mathrm{kl}}(P\|Q)=\frac{1}{mk}\sum_{j=1}^{m}\sum_{i=1}^{k}F(e_{j})\log\left(\frac{F(e_{j})}{F(e_{i}^{\prime})}\right)$$

      where $m$ is the number of sampled Gaussians, $P$ contains the sampled Identity Encoding $e$ of a 3D Gaussian, and the set $Q=\{e_1^{\prime},e_2^{\prime},\dots,e_k^{\prime}\}$ consists of its $k$ nearest neighbors in 3D space. $F$ denotes the softmax applied after the linear layer $f$ shared with the 2D Identity Loss.
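
The three pieces of Section 3.2 can be tied together in a small sketch. This is a toy, pure-Python illustration under stated assumptions, not the authors' implementation: the function names, the plain-list data layout, and the brute-force neighbor search restricted to the sampled Gaussians are all hypothetical choices made for readability. It assumes the per-pixel weights $\alpha_i'$ have already been computed and that the Gaussians are sorted front to back.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [x / total for x in exps]

def linear(weights, bias, x):
    """Linear layer f: weights has one row per class (K+1 rows)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def render_identity(encodings, alphas):
    """E_id = sum_i e_i * a'_i * prod_{j<i} (1 - a'_j) for one pixel.

    encodings: Identity Encodings sorted front to back (assumed input).
    alphas:    matching influence weights a'_i in [0, 1] (assumed given).
    """
    out = [0.0] * len(encodings[0])
    transmittance = 1.0  # running product of (1 - a'_j)
    for e, a in zip(encodings, alphas):
        w = a * transmittance
        out = [o + w * v for o, v in zip(out, e)]
        transmittance *= 1.0 - a
    return out

def identity_loss_2d(e_id, weights, bias, target_id):
    """Cross-entropy of softmax(f(E_id)) against the mask ID of the pixel."""
    probs = softmax(linear(weights, bias, e_id))
    return -math.log(probs[target_id])

def regularization_loss_3d(encodings, positions, weights, bias, k):
    """Mean KL divergence between each sampled Gaussian and its k neighbors.

    Assumption: neighbors are found by brute force among the sampled
    Gaussians only, which is a simplification for this sketch.
    """
    m = len(encodings)
    probs = [softmax(linear(weights, bias, e)) for e in encodings]
    total = 0.0
    for j in range(m):
        neighbors = sorted(
            (i for i in range(m) if i != j),
            key=lambda i: math.dist(positions[j], positions[i]),
        )[:k]
        for i in neighbors:
            total += sum(p * math.log(p / q)
                         for p, q in zip(probs[j], probs[i]))
    return total / (m * k)
```

With two Gaussians of encodings `[1, 0]` and `[0, 1]` and weights `0.5` each, `render_identity` gives `[0.5, 0.25]`: the second Gaussian is attenuated by the transmittance left over after the first.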

3.3 Downstream: Local Gaussian Editing


The focus here is on inpainting: the relevant 3D Gaussians are first deleted, and a small number of new Gaussians are then added and supervised during rendering by the 2D inpainting results from LAMA [41].
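
The deletion step can be sketched as follows. This is a minimal illustration under assumptions, not the paper's code: it assumes the "relevant" Gaussians are picked by classifying each Identity Encoding with the linear layer $f$, and the names `predict_identity`, `remove_object`, and the `"identity"` dictionary key are hypothetical. Adding and optimizing the new Gaussians against the LAMA results is not shown.

```python
def predict_identity(encoding, weights, bias):
    """Return the most likely mask ID for one Identity Encoding.

    The argmax of the logits equals the argmax of their softmax,
    so the softmax itself is skipped here.
    """
    logits = [sum(w * x for w, x in zip(row, encoding)) + b
              for row, b in zip(weights, bias)]
    return max(range(len(logits)), key=logits.__getitem__)

def remove_object(gaussians, weights, bias, target_id):
    """Split Gaussians into (kept, removed) by predicted mask ID.

    gaussians: list of dicts with an "identity" vector (hypothetical layout).
    """
    kept, removed = [], []
    for g in gaussians:
        if predict_identity(g["identity"], weights, bias) == target_id:
            removed.append(g)
        else:
            kept.append(g)
    return kept, removed
```

The `kept` list would then be rendered as usual, while the region left empty by `removed` is where the new Gaussians for inpainting would be placed.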
