【读论文】【精读】3D Gaussian Splatting for Real-Time Radiance Field Rendering

文章目录

    • 1. What:
    • 2. Why:
    • 3. How:
      • 3.1 Real-time rendering
      • 3.2 Adaptive Control of Gaussians
      • 3.3 Differentiable 3D Gaussian splatting
    • 4. Self-thoughts

1. What:

What kind of thing is this article going to do (from the abstract and conclusion, try to summarize it in one sentence)

To simultaneously satisfy the requirements of efficiency and quality, this article begins by establishing a foundation with sparse points using 3D Gaussian distributions to preserve desirable space. It then progresses to optimizing anisotropic covariance to achieve an accurate representation. Lastly, it introduces a cutting-edge, visibility-aware rendering algorithm designed for rapid processing, thereby achieving state-of-the-art results in the field.

2. Why:

Under what conditions or needs this research plan was proposed (Intro), what problems/deficiencies should be solved at the core, what others have done, and what are the innovation points? (From Introduction and related work)

Maybe contain Background, Question, Others, Innovation:

Three aspects of related work can explain this question.

  1. Traditional reconstructions such as SfM and MVS need to re-project and
    blend the input images into the novel view camera, and use the
    geometry to guide this re-projection(From 2D to 3D).

    Sad: Cannot completely recover from unreconstructed regions, or from “over-reconstruction”, when MVS generates inexistent geometry.

  2. Neural Rendering and Radiance Fields

    Neural rendering represents a broader category of techniques that leverage deep learning for image synthesis, while radiance field is a specific technique within neural rendering focused on the scene representation of light and color in 3D spaces.

  • Deep Learning was mainly used on MVS-based geometry before, which is also its major drawback.

  • Nerf is along the way of volumetric representation, which introduced positional encoding and importance sampling.

  • Faster training methods focus on the use of spatial data structures to store (neural) features that are subsequently interpolated during volumetric ray-marching, different encodings, and MLP capacity.

  • Today, notable works include InstantNGP and Plenoxels both rely on Spherical Harmonics.

    Understand Spherical Harmonics as a set of basic functions to fit a geometry in a 3D spherical coordinate system.

    球谐函数介绍(Spherical Harmonics) - 知乎 (zhihu.com)

  1. Point-Based Rendering and Radiance Fields
  • The methods in human performance capture inspired the choice of 3D Gaussians as scene representation.
  • Point-based and spherical rendering is achieved before.

3. How:

请添加图片描述

Through the Gradient Flow in this paper’s pipeline, we are trying to connect Part4, 5, and 6 in this paper.

Firstly, start from the loss function, which is combined by a L 1 {\mathcal L}_{1} L1 loss and a S S I M SSIM SSIM index, just as shown below:

L = ( 1 − λ ) L 1 + λ L D − S S I M . (1) {\mathcal L}=(1-\lambda){\mathcal L}_{1}+\lambda{\mathcal L}_{\mathrm{D-SSIM}}.\tag{1} L=(1λ)L1+λLDSSIM.(1)

It found a relation between the actual image and the rendering image. So to finish the optimization, we need to dive into the process of rendering. From the chapter on related work, we know Point-based α \alpha α-blending and NeRF-style volumetric rendering share essentially the same image formation model. That is

C = ∑ i = 1 N T i ( 1 − exp ⁡ ( − σ i δ i ) ) c i w i t h T i = exp ⁡ ( − ∑ j = 1 i − 1 σ j δ j ) . (2) C=\sum_{i=1}^{N}T_{i}(1-\exp(-\sigma_{i}\delta_{i}))c_{i}\quad\mathrm{with}\quad T_{i}=\exp\left(-\sum_{j=1}^{i-1}\sigma_{j}\delta_{j}\right).\tag{2} C=i=1NTi(1exp(σiδi))ciwithTi=exp(j=1i1σjδj).(2)

And this paper actually uses a typical neural point-based approach just like (2), which can be represented as:

C = ∑ i ∈ N c i α i ∏ j = 1 i − 1 ( 1 − α j ) (3) C=\sum_{i\in N}c_{i}\alpha_{i}\prod_{j=1}^{i-1}(1-\alpha_{j}) \tag{3} C=iNciαij=1i1(1αj)(3)

From this formulation, we can know what the representation of volume should contain the information of color c c c and transparency α \alpha α. These are attached to the gaussian, where Spherical Harmonics was used to represent color, just like Plenoxels. The other attributes used are the position and covariance matrix. So, now we have introduced the four attributes to represent the scene, that is positions 𝑝, 𝛼, covariance Σ, and SH coefficients representing color 𝑐 of each Gaussian.
After knowing the basic elements we need to use, now let’s work backward, starting with rendering, which was addressed in the author’s previous paper.

3.1 Real-time rendering

This method is independent of the propagation of gradients but is critical for real-time performance, which was published in the author’s paper before.
在这里插入图片描述

In the previous game, someone had tried to model the world in ellipsoid and render it. This is the same as the render process of Gaussian splatting. But the latter uses lots of techniques in the utilization of threads and GPU.

  • Firstly, it starts by splitting the screen into 16×16 tiles and then proceeds to cull 3D Gaussians against the view frustum and each tile, only keeping Gaussians with a 99% confidence interval intersecting the view frustum.
  • Then instantiate each Gaussian according to the number of tiles they overlap and assign each instance a key that combines view space depth and tile ID.
  • Then sort Gaussians based on these keys using a single fast GPU Radix sort.
  • Finally, launching one thread block for each tile, for a given pixel, accumulate color and transparency values by traversing the lists front-to-back, until α \alpha α goes to one.

3.2 Adaptive Control of Gaussians

In the process of fitting gaussian to the scene, we should utilize the number and volume of gaussian to strengthen the representation of the scene. It contained two methods named clone and split, as shown below.

在这里插入图片描述

These were judged by the view-space positional gradients. Both under-reconstruction and over-construction have large view-space positional gradients. We will clone or split the gaussian according to different conditions.

3.3 Differentiable 3D Gaussian splatting

We have known the process of rendering and control of gaussian. Finally, we will talk about how to backward the gradients to where we can optimize. This is mainly about the processing of Gaussian function.

The basic simplified formulation of 3D Gaussain can be represented as:

G ( x ) = e − 1 2 ( x ) T Σ − 1 ( x ) . (4) G(x)=e^{-\frac{1}{2}(x)^{T}\Sigma^{-1}(x)}.\tag{4} G(x)=e21(x)TΣ1(x).(4)

We will use α \alpha α-blending to combine it to generate the rendering picture, so that we can calculate the loss function and finish the optimization. So now we need to know how to optimize and calculate the gradients of Gaussian.

When rasterizing, the three-dimensional scene needs to be transformed into a two-dimensional space. The author hopes that the 3D Gaussian will maintain its distribution during the transformation (otherwise, if the raster finish has nothing to do with Gaussian, all the efforts will be in vain). So we should choose a method to transfer the covariance matrix to camera coordinate without change the affine relation. That is

Σ ′ = J W Σ W T J T , (5) \Sigma'=JW\Sigma W^{T}J^{T},\tag{5} Σ=JWΣWTJT,(5)

where J J J is the Jacobian of the affine approximation of the projective transformation.

Another problem is that the covariance matrix must be semi-definite. So we use a scaling matrix 𝑆 and rotation matrix 𝑅 to assure it. That is

Σ = R S S T R T (6) \Sigma=RSS^{T}R^{T}\tag{6} Σ=RSSTRT(6)

And then we can use a 3D vector 𝑠 for scaling and a quaternion 𝑞 to represent rotation. The gradients will backward to them. These are the whole process of optimization.

4. Self-thoughts

  1. Summary of different representation
  • Explicit representation: Mesh, Point Cloud
  • Implicit representation
    • Volumetric representation: Nerf

      The density value returned by the sample points reflects whether there is geometric occupancy here.

    • Surface representation: SDF(Signed Distance Function)

      Outputs the distance to the nearest surface in the space from this point, where a positive value indicates outside the surface, and a negative value indicates inside the surface.

Refer:

[1]: 3D Gaussian Splatting:用于实时的辐射场渲染-CSDN博客

[2]: 【三维重建】3D Gaussian Splatting:实时的神经场渲染-CSDN博客

[3]: 3D Gaussian Splatting中的数学推导 - 知乎 (zhihu.com)

[4]: [NeRF坑浮沉记]3D Gaussian Splatting入门:如何表达几何 - 知乎 (zhihu.com)

本文来自互联网用户投稿,该文观点仅代表作者本人,不代表本站立场。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如若转载,请注明出处:http://www.mzph.cn/news/740549.shtml

如若内容造成侵权/违法违规/事实不符,请联系多彩编程网进行投诉反馈email:809451989@qq.com,一经查实,立即删除!

相关文章

图片和PDF 加水印去水印

图片和PDF 加水印去水印 前要1. 图片加水印1.1 方法11.2 方法2 2. 去水印3. pdf加水印4. pdf 去水印 前要 网上查了很多资料, 汇总了几个不错的代码, 顺便做个笔记 1. 图片加水印 1.1 方法1 简单方便, 后也好处理 # -*- coding:utf-8 -*- import os from PIL import Imag…

24-Java策略模式 ( Strategy Pattern )

Java策略模式 摘要实现范例 策略模式的重心不是如何实现算法,而是如何组织、调用这些算法,从而让程序结构更加灵活,具有更好的维护性和扩展性。 策略模式属于行为型模式 摘要 1. 意图 针对一组算法,将每一个算法封装到具有共…

微服务day01 -- SpringCloud01 -- (Eureka , Ribbon , Nacos)

介绍微服务 1.认识微服务(p1-p5) 随着互联网行业的发展,对服务的要求也越来越高,服务架构也从单体架构逐渐演变为现在流行的微服务架构。这些架构之间有怎样的差别呢? 1.0.学习目标 了解微服务架构的优缺点 1.1.单体架构 单体架构&#…

vscode使用remote-ssh免密连接服务器

你还在使用XShell、Hyper、FinalShell等等SSH客户端软件吗,作为前端的我们,一直在用的功能强大的开发工具vscode,早已实现SSH连接功能(借助官方提供的插件)。而且更加好用,可以直接打开服务器上的文件&…

蓝桥杯练习系统(算法训练)ALGO-975 P0802字符串表达式

资源限制 内存限制:256.0MB C/C时间限制:1.0s Java时间限制:3.0s Python时间限制:5.0s 编写一个字符串表达式求解函数int expression(char* s); 输入一个字符串表达式,返回它的结果。表达式长度不会超过100。表…

Qt 使用RAW INPUT获取HID触摸屏,笔设备,鼠标的原始数据,最低受支持的客户端:Windows XP [仅限桌面应用]

在开发绘图应用程序时,经常会需要读取笔设备的数据,通过对笔数据的解析,来判断笔的坐标,粗细。如果仅仅只是读取鼠标的坐标,就需要人为在应用程序端去修改笔的粗细,并且使用体验不好,如果可以实…

解析Perl爬虫代码:使用WWW__Mechanize__PhantomJS库爬取stackoverflow.com的详细步骤

在这篇文章中,我们将探讨如何使用Perl语言和WWW::Mechanize::PhantomJS库来爬取网站数据。我们的目标是爬取stackoverflow.com的内容,同时使用爬虫代理来和多线程技术以提高爬取效率,并将数据存储到本地。 Perl爬虫代码解析 首先&#xff0…

【Docker】 ubuntu18.04编译时内存不足需要使用临时交换分区解决“c++: internal compiler error“错误

【Docker】 ubuntu18.04编译时内存不足需要使用临时交换分区解决"c: internal compiler error"错误 问题描述 安装独立功能包时编译不成功,出现 “c: internal compiler error: Killed(program cciplus)” 错误。 解决方案 出现这个问题的原因大概率是…

模拟电子技术实验(二)

单选题 1. 本实验的实验目的中,输出电阻测量是第几个目的? A. 1个。 B. 2个。 C. 3个。 D. 4个。 答案:C 评语:10分 单选题 2.本实验电路有一个元件参数有问题,需要修改? A. …

[数据集][目标检测]变电站缺陷检测数据集VOC+YOLO格式8307张17类别

数据集格式:Pascal VOC格式YOLO格式(不包含分割路径的txt文件,仅仅包含jpg图片以及对应的VOC格式xml文件和yolo格式txt文件) 图片数量(jpg文件个数):8307 标注数量(xml文件个数):8307 标注数量(txt文件个数):8307 标注…

如何理解Redis中的缓存雪崩,缓存穿透,缓存击穿?

目录 一、缓存雪崩 1.1 解决缓存雪崩问题 二、缓存穿透 2.1 解决缓存穿透 三、缓存击穿 3.1 解决缓存击穿 3.2 如何保证数据一致性问题? 一、缓存雪崩 缓存雪崩是指短时间内,有大量缓存同时过期,导致大量的请求直接查询数据库&#xf…

三维卡通数字人解决方案,支持unity和Maya

企业对于品牌形象和宣传手段的创新需求日益迫切,美摄科技凭借其在三维卡通数字人领域的深厚积累,推出了面向企业的三维卡通数字人解决方案,帮助企业轻松打造独具特色的虚拟形象,提升品牌影响力和市场竞争力。 美摄科技的三维卡通…

Postman请求API接口测试步骤和说明

Postman请求API接口测试步骤 本文测试的接口是国内数智客(www.shuzike.com)的API接口手机三要素验证,验证个人的姓名,身份证号码,手机号码是否一致。 1、设置接口的Headers参数。 Content-Type:applicati…

Kafka消费者重平衡

「(重平衡)Rebalance本质上是一种协议,规定了一个Consumer Group下的所有Consumer如何达成一致,来分配订阅Topic的每个分区」。 比如某个Group下有20个Consumer实例,它订阅了一个具有100个分区的Topic。 正常情况下&…

【2024.03.12】定时执行专家 V7.2 发布 - TimingExecutor V7.2 Release

目录 ▉ 软件介绍 ▉ 新版本 V7.2 下载地址 ▉ V7.2 新功能 ▼2024-03-12 V7.2 - 更新日志 ▉ V7.x 新UI设计 ▉ 软件介绍 《定时执行专家》是一款制作精良、功能强大、毫秒精度、专业级的定时任务执行软件。软件具有 25 种【任务类型】、12 种【触发器】触发方式&#x…

【C++算法模板】图的存储-邻接矩阵

文章目录 邻接矩阵洛谷3643 图的存储 邻接矩阵 邻接矩阵相比于上一篇博客邻接表的讲解要简单得多 数据结构,如果将二维数组 g g g 定义为全局变量,那默认初始化应该为 0 0 0 ,如果题目中存在自环,可以做特判, m e …

因为manifest.json文件引起的 android-chrome-192x192.png 404 (Not Found)

H5项目打包之后,总是有这个报错,有时候还有别的icon也找不见 一通调查之后,发现是因为引入了一个vue插件 这个插件引入之后,webpack打包的时候就会自动在dist文件夹中产生一个manifest.json文件这个文件里面主要就是一些icon地址的…

制作图片马:二次渲染(upload-labs第17关)

代码分析 $im imagecreatefromjpeg($target_path);在本关的代码中这个imagecreatefromjpeg();函数起到了将上传的图片打乱并重新组合。这就意味着在制作图片马的时候要将木马插入到图片没有被改变的部分。 gif gif图的特点是无损,我们可以对比上传前后图片的内容…

【问题解决】VMWare虚拟机主IP地址:网络信息不可用

在今天想用man命令查看相关函数的帮助时,突然发现自己的XShell连接不上虚拟机,通过在终端使用ifconfig命令也只查看得到本地回环,虽然最近常遇到这个问题,但一般通过重启虚拟机就能得以解决。 不过这次重启也不奏效了&#xff0c…

Java输出流之BufferWriter类

咦咦咦,各位小可爱,我是你们的好伙伴——bug菌,今天又来给大家普及Java SE相关知识点了,别躲起来啊,听我讲干货还不快点赞,赞多了我就有动力讲得更嗨啦!所以呀,养成先点赞后阅读的好…