【读论文】Gaussian Grouping: Segment and Edit Anything in 3D Scenes

Gaussian Grouping: Segment and Edit Anything in 3D Scenes

文章目录

  • Gaussian Grouping: Segment and Edit Anything in 3D Scenes
    • 1. What
    • 2. Why
    • 3. How
      • 3.1 Anything Mask Input and Consistency
      • 3.2 3D Gaussian Rendering and Grouping
      • 3.3 Downstream: Local Gaussian Editing

1. What

What kind of thing is this article going to do (from the abstract and conclusion, try to summarize it in one sentence)

The first 3D Gaussian-based approach to jointly reconstruct and segment anything in the open-world 3D scene.
Each Gaussian with a compact Identity Encoding, supervised by 2D masks by SAM along with introduced 3D spatial consistency regularization, can also be further used for editing.

  • Explanation of Open-world

    An open-world scenario refers to an uncertain, dynamic and complex environment that contains a variety of objects, scenes and tasks.

    Or “open-world scene understanding” refers to the ability of a model to generalize to scenes or environments that it has not been explicitly trained on. In this context, the term “open-world” implies that the model needs to be able to adapt to and understand a wide range of scenes, including those that may be very different from the scenes in its training data.

2. Why

Under what conditions or needs this research plan was proposed (Intro), what problems/deficiencies should be solved at the core, what others have done, and what are the innovation points? (From Introduction and related work)

Maybe contain Background, Question, Others, Innovation:

  • Existing methods [8, 37] rely on manually-labeled datasets or require accurately scanned 3D point clouds [33, 42] as input.
  • Existing NeRFs-based methods [14, 17, 25, 39] are computation-hungry and hard to adjust for the downstream task because the learned neural networks, such as MLPs, cannot decompose each part or module in the 3D scene easily
  • As for Radiance-based Open World Scene Understanding: Unlike our approach, most of these methods are designed for in-domain scene modeling and cannot generalize to open-world scenarios.

3. How

Following this pipeline, we will introduce it in details.

在这里插入图片描述

3.1 Anything Mask Input and Consistency

Shown in Figure 2(a), a set of multi-view captures along with the automatically generated 2D segmentations by SAM, as well as the corresponding cameras calibrated via SfM are inputs.

Shown in Figure 2(b), to assign each 2D mask a unique ID in the 3D scene, a well-trained zero-shot tracker [7] was used to propagate and associate masks. Use colors to represent different segmentation labels, and the results are shown in Figure 2(b)

3.2 3D Gaussian Rendering and Grouping

Shown in Figure 2©, all of the core concepts of this paper were used.

  1. Identity Encoding

    A new parameter, i.e., Identity Encoding is introduced to each Gaussian with original S Θ i = { p i , s i , q i , α i , c i } S_{\Theta_{i}}=\{\mathbf{p}_{i},\mathbf{s}_{i},\mathbf{q}_{i},\alpha_{i},\mathbf{c}_{i}\} SΘi={pi,si,qi,αi,ci}. It is a compact vector of length 16 and similar to Spherical Harmonic (SH) coefficients in representing color, it is differentiable and learnable.

  2. Grouping via Rendering

    In the process of rendering labels, similar to α \alpha α-blending:

    E id = ∑ i ∈ N e i α i ′ ∏ j = 1 i − 1 ( 1 − α j ′ ) , E_{\text{id}}=\sum_{i\in\mathcal{N}}e_i\alpha_i'\prod_{j=1}^{i-1}(1-\alpha_j'), Eid=iNeiαij=1i1(1αj),

    but the denotations are different. e i e_i ei is the Identity Encoding of length 16 for each Gaussian and α i ′ \alpha_i' αi is a new weight, calculated by multiplying opacity α i \alpha_i αi and Σ 2 D \Sigma^{2\mathrm{D}} Σ2D, where Σ 2 D = J W Σ 3 D W T J T \Sigma^{2\mathrm{D}}=JW\Sigma^{3\mathrm{D}}W^TJ^T Σ2D=JWΣ3DWTJT according to [61].

  3. Grouping Loss

    • 2D Identity Loss: Given the rendered 2D features E i d E_{id} Eid before as input, first add a linear layer f f f to recover its feature dimension back to K+1 and then take s o f t m a x ( f ( E i d ) ) softmax (f(Eid)) softmax(f(Eid)) for identity classification. And cross-entropy loss was used.

    • 3D Regularization Loss:

      3D Regularization Loss leverages the 3D spatial consistency, which enforces the Identity Encodings of the top k-nearest 3D Gaussians to be close in their feature distance.

      L 3 d = 1 m ∑ j = 1 m D k l ( P ∥ Q ) = 1 m k ∑ j = 1 m ∑ i = 1 k F ( e j ) log ⁡ ( F ( e j ) F ( e i ′ ) ) \mathcal{L}_{\mathrm{3d}}=\frac{1}{m}\sum_{j=1}^{m}D_{\mathrm{kl}}(P\|Q)=\frac{1}{mk}\sum_{j=1}^{m}\sum_{i=1}^{k}F(e_{j})\log\left(\frac{F(e_{j})}{F(e_{i}^{\prime})}\right) L3d=m1j=1mDkl(PQ)=mk1j=1mi=1kF(ej)log(F(ei)F(ej))

      where P P P contains the sampled Identity Encoding e e e of a 3D Gaussian, while the set Q = { e 1 ′ , e 2 ′ , . . . , e k ′ } Q=\{e_1^{\prime},e_2^{\prime},...,e_k^{\prime}\} Q={e1,e2,...,ek} consists of its k k k nearest neighbors in 3D spatial space.

3.3 Downstream: Local Gaussian Editing

在这里插入图片描述

Pay more attention to inpainting, first, delete the relevant 3D Gaussians and then add a small number of new Gaussians to be supervised by the 2D inpainting results by LAMA [41] during rendering.

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/web/9319.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

10000 字详细讲解 Spring 中常用注解及其使用

如下图京东购物页面,当我们选择点击访问某一类商品时,就会向后端发起 HTTP 请求,当后端收到请求时,就会找到对应的代码逻辑对请求进行处理,那么,后端是如何来查找处理请求相对应的代码呢?答案就…

计算机考研|25人太难了,408会炸,还是自命题会炸?

自命题已经不是炸不炸的问题了,是有没有学上的问题。 我记得去年九月一些学校宣布改考408的时候,整个群里都炸了,同学一片哀嚎。要知道九月的时候要重新准备408肯定是不可能了,一来408复习的基础阶段已经过去了,二来英…

人工智能哪些大学比较好

人工智能领域的大学有很多,以下是一些国际上被广泛认可的一流大学: 1. **斯坦福大学(Stanford University)** - 位于美国加州的斯坦福大学拥有顶尖的人工智能研究中心,并在机器学习、自然语言处理等领域处于领先地位。…

phpstudy(MySQL启动又立马停止)问题的解决办法

方法一:查看本地安装的MySQL有没有启动 1.鼠标右击开始按钮选择计算机管理 2.点击服务和应用程序 3.找到服务双击 4.找到MySQL服务 5.双击查看是否启动,如启动则停止他,然后确定,重新打开phpstudy,启动Mysql. 方法二&#xff…

[鸟哥私房菜]4.首次登录与在线求助

第4章 首次登录与在线求助 4.1.3 X Window 与命令行模式的切换 通常我们称命令行界面为终端界面、Terminal 或 Console。Linux 默认的情况下会提供六个终端(Terminal)来让用户登录, 切换的方式为使用:[Ctrl] [Alt] [F1]~[F6] …

TypeScript学习日志-第十七天(泛型约束)

泛型约束 当我们使用泛型时非常方便,但是在使用的过程中也会遇到很多问题,如图: 这时候就会提示错误,因为返回的是相加的值,但是不是所有的类型都能相加的,例如来个undefined类型的 就不能进行相加了&…

回顾5款我非常喜欢的软件,希望大家也能喜欢

​ 我喜欢分享好软件,这就像与老友聊天一样让我感到快乐。在这个过程中,我可以回顾这些实用的小工具,也希望它们可以帮助到更多人。 1.备份工具——Cobian Backup ​ Cobian Backup是一款功能强大的备份软件,支持自动定时备份、增量备份、差异备份等多种备份方式。…

wePWNise:一款功能强大的红队Office宏VBA代码生成工具

关于wePWNise wePWNise是一款功能强大的Office宏VBA代码生成工具,该工具基于纯Python开发,可以帮助广大研究人员生成用于Office宏或模版的VBA代码,并以此来测试目标Office环境、应用程序控制和防护机制的安全性。 wePWNise的设计理念将自动化…

libcity 笔记:基本使用方法

介绍 — Bigscity-LibCity 文档 (bigscity-libcity-docs.readthedocs.io) 1 介绍 一个统一、全面、可扩展的代码库,为交通预测领域提供了一个可靠的实验工具和便捷的开发框架目前支持 交通状态预测 交通流量预测 交通速度预测 交通需求预测 起点-终点&#xff…

Baidu Comate智能编码助手:大学生的代码编写助手

Baidu Comate智能编码助手:大学生的代码编写助手 前言一、关于Baidu Comate智能编码助手1.1 Baidu Comate智能编码助手简介1.2 产品功能 二、安装使用(本文以pycharm为例)三、我的百度Comate之旅3.1智能推荐3.1.1 单行推荐3.1.2 多行推荐 3.2…

pg数据库学习知识要点分析-1

知识要点1 对象标识OID 在PostgreSQL内部,所有的数据库对象都通过相应的对象标识符(object identifier,oid)进行管理,这些标识符是无符号的4字节整型。数据库对象与相应oid 之间的关系存储在对应的系统目录中&#xf…

AI论文速读 |2024[IJCAI]TrajCL: 稳健轨迹表示:通过因果学习隔离环境混杂因素

题目: Towards Robust Trajectory Representations: Isolating Environmental Confounders with Causal Learning 作者:Kang Luo, Yuanshao Zhu, Wei Chen, Kun Wang(王琨), Zhengyang Zhou(周正阳), Sijie Ruan(阮思捷), Yuxuan Liang(梁宇轩) 机构&a…

SAP-PP-MM特殊库存的生产发料

如果有个物料是在特殊库存E,那么往生产订单发料是如何确定哪一个组件消耗这个特殊库存呢? 在生产订单中有哪些标记确定特殊库存?确定销售订单和行项目? 通过上图可以看到特殊库存标记1,也就是单独客户库存。 其他的特…

洗地机什么品牌好?洗地机怎么选?618洗地机选购指南

随着科技的飞速发展,洗地机以其高效的清洁能力、稳定的性能和用户友好的设计而闻名,不仅可以高效吸尘、拖地,还不用手动洗滚布,已经逐渐成为现代家庭不可或缺的清洁助手。然而,在众多品牌和型号中,如何选择…

C++语言·string类

1. 为什么有string类 C语言中,字符串是以\0结尾的一些字符的集合,为了操作方便,C标准库中提供了一些str系列的库函数(strcpy,strcat),但是这些库函数与字符串是分离开的,不太符合OOP(Object Oriented Programming面向对…

【深耕 Python】Quantum Computing 量子计算机(3)重要数学公式一览

写在前面 往期量子计算机博客: 【深耕 Python】Quantum Computing 量子计算机(1)图像绘制基础 【深耕 Python】Quantum Computing 量子计算机(2)绘制电子运动平面波 正文 偏微分: 交换关系&#xff…

GtkButton事件处理、事件的捕获、鼠标事件等

事件 事件处理 GTK 所提供的工具库与其应用程序都是基于事件触发机制来管理, 所有的应用程序都是基于事件驱动。 如果没有事件发生, 应用程序将处于等待状态, 不会执行任何操作, 一旦事件发生, 将根据不同的事件做出…

Offer必备算法37_记忆化搜索_五道力扣题详解(由易到难)

目录 记忆化搜索概念和使用场景 ①力扣509. 斐波那契数 解析代码1_循环 解析代码2_暴搜递归 解析代码3_记忆化搜索 解析代码4_动态规划 ②力扣62. 不同路径 解析代码1_暴搜递归(超时) 解析代码2_记忆化搜索 解析代码3_动态规划 ③力扣300. 最…

Java12基础(Package包 作用域 String字符串)

目录 一. Package包 import关键字 命名规范 二. 作用域 三. String字符串(进阶) 创建方式: 内存情况: 1. 字符串的搜索 2. trim()方法 3. 替换字符串 4. 分割字符串 5. 拼接字符串 6. 格式化字符串 7. 类型转换 8. 转换为char[ ]字符数组 9. 字符编码 10. Str…

Navicat导入sql报错[Err] 1046 - No database selected

Navicat导入sql报错[Err] 1046 - No database selected ​ 今天系统重装了,就很完蛋。所有东西都重新下载安装。向Navicat导入sql的时候导入失败: 报错[Err] 1046 - No database selected。我很疑惑地又导了几次。当然又全都失败. 错误造成原因&#x…