FreGS: 3D Gaussian Splatting with Progressive Frequency Regularization
Jiahui Zhang1 Fangneng Zhan2 Muyu Xu1 Shijian Lu1 Eric Xing3,4
1Nanyang Technological University 2Max Planck Institute for Informatics
3Carnegie Mellon University 4MBZUAI
Abstract
3D Gaussian splatting has achieved very impressive performance in real-time novel view synthesis. However, it often suffers from over-reconstruction during Gaussian densification where high-variance image regions are covered by a few large Gaussians only, leading to blur and artifacts in the rendered images. We design a progressive frequency regularization (FreGS) technique to tackle the over-reconstruction issue within the frequency space. Specifically, FreGS performs coarse-to-fine Gaussian densification by exploiting low-to-high frequency components that can be easily extracted with low-pass and high-pass filters in the Fourier space. By minimizing the discrepancy between the frequency spectrum of the rendered image and the corresponding ground truth, it achieves high-quality Gaussian densification and alleviates the over-reconstruction of Gaussian splatting effectively. Experiments over multiple widely adopted benchmarks (e.g., Mip-NeRF360, Tanks-and-Temples and Deep Blending) show that FreGS achieves superior novel view synthesis and outperforms the state-of-the-art consistently.
Figure 1: The proposed FreGS mitigates the over-reconstruction of Gaussian densification and renders images with much less blur and far fewer artifacts compared with 3D Gaussian splatting (3D-GS). For the two sample images from Mip-NeRF360 [2], (a) and (b) show the Rendered Image and the Gaussian Visualization of the highlighted regions, as well as the Spectra of over-reconstructed areas in the image rendered by 3D-GS and of the corresponding areas in FreGS. The Gaussian Visualization shows how the learnt rasterized 3D Gaussians compose images (all Gaussians are rasterized with full opacity). The Spectra are generated via the image Fourier transform, where the colour changes from blue to green as the spectrum amplitude changes from small to large.
*Shijian Lu is the corresponding author.
1 Introduction
Novel View Synthesis (NVS) has been a pivotal task in the realm of 3D computer vision and holds immense significance in various applications such as virtual reality and image editing. It aims to generate images from arbitrary viewpoints of a scene, often necessitating precise modelling of the scene from multiple scene images. Leveraging implicit scene representation and differentiable volume rendering, NeRF [21] and its extensions [1, 2] have recently achieved remarkable progress in novel view synthesis. However, NeRF is inherently plagued by long training and rendering times. Though several NeRF variants [22, 5, 9, 26, 7] speed up training and rendering greatly, they often sacrifice the quality of rendered images notably, especially while handling high-resolution rendering.
As a compelling alternative to NeRF, 3D Gaussian splatting (3D-GS) [16] has attracted increasing attention by offering superb training and inference speed while maintaining competitive rendering quality. By introducing anisotropic 3D Gaussians together with adaptive density control of Gaussian properties, 3D-GS can learn superb and explicit scene representations for novel view synthesis. It replaces the cumbersome volume rendering in NeRF with efficient splatting, which directly projects 3D Gaussians onto a 2D plane and ensures real-time rendering. However, 3D-GS often suffers from over-reconstruction [16] during Gaussian densification, where high-variance image regions are covered by a few large Gaussians only, which leads to a clear deficiency in the learnt representations. The over-reconstruction can be clearly observed in the blur and artifacts of the rendered 2D images, as well as in the discrepancy between the frequency spectra of the rendered images (by 3D-GS) and the corresponding ground truth, as illustrated in Fig. 1.
Based on the observation that the over-reconstruction manifests clearly as a discrepancy in frequency spectra, we design FreGS, an innovative 3D Gaussian splatting technique that addresses the over-reconstruction by regularizing the frequency signals in the Fourier space. FreGS introduces a novel frequency annealing technique to achieve progressive frequency regularization. Specifically, FreGS takes a coarse-to-fine Gaussian densification process by annealing the regularization progressively from low-frequency signals to high-frequency signals, based on the rationale that low-frequency and high-frequency signals usually encode large-scale features (e.g., global patterns and structures, which are easier to model) and small-scale features (e.g., local details, which are harder to model), respectively. The progressive regularization strives to minimize the discrepancy between the frequency spectra of the rendered image and the corresponding ground truth, which provides faithful guidance in the frequency space and complements the pixel-level L1 loss in the spatial space effectively. Extensive experiments show that FreGS mitigates the over-reconstruction and greatly improves Gaussian densification and novel view synthesis as illustrated in Fig. 1.
The contributions of this work can be summarized in three aspects. First, we propose FreGS, an innovative 3D Gaussian splatting framework that addresses the over-reconstruction issue via frequency regularization in the frequency space. To the best of our knowledge, this is the first effort that tackles the over-reconstruction issue of 3D Gaussian splatting from a spectral perspective. Second, we design a frequency annealing technique for progressive frequency regularization. The annealing performs regularization from low-to-high frequency signals progressively, achieving faithful coarse-to-fine Gaussian densification. Third, experiments over multiple benchmarks show that FreGS achieves superior novel view synthesis and outperforms the 3D-GS consistently.
2 Related Work
2.1 Neural Rendering for Novel View Synthesis
Novel view synthesis aims to generate new, unseen views of a scene or object from a set of existing images or viewpoints. In the early days of deep learning, CNNs were explored for novel view synthesis [12, 27, 32]; e.g., they were adopted to predict blending weights for image rendering in [12]. Later, researchers exploited CNNs for volumetric ray-marching [14, 29]. For example, Sitzmann et al. propose DeepVoxels [29], which builds a persistent 3D volumetric scene representation and then achieves rendering via volumetric ray-marching.
Recently, the neural radiance field (NeRF) [21] has been widely explored to achieve novel view synthesis via implicit scene representation and differentiable volume rendering. It exploits MLPs to model 3D scene representations from multi-view 2D images and can generate novel views with superb multi-view consistency. A number of NeRF variants have also been designed for handling various challenging conditions, such as dynamic scenes [24, 6, 8, 10, 33], free camera poses [20, 19, 38, 3, 39], few-shot settings [4, 23, 15, 36] and anti-aliasing [2]. However, novel view synthesis with NeRF often comes at the expense of extremely long training and rendering times. Several studies [22, 9, 26, 5, 7, 13] attempt to reduce the training and rendering times. For example, KiloNeRF [26] speeds up NeRF rendering by utilizing thousands of tiny MLPs instead of one single large MLP. Chen et al. [5] leverage a 4D tensor to represent the full volume field and factorize the tensor into several compact low-rank tensor components for efficient radiance field reconstruction. Müller et al. propose InstantNGP [22], which introduces multi-resolution hash tables to achieve fast training and real-time rendering. However, most aforementioned work tends to sacrifice the quality of synthesized images, especially while handling high-resolution rendering.
As a compelling alternative to NeRF, 3D Gaussian splatting [16] introduces anisotropic 3D Gaussians and efficient differentiable splatting which enables high-quality explicit scene representation while maintaining efficient training and real-time rendering. However, the over-reconstruction of 3D Gaussians during Gaussian densification often introduces blur and artifacts in the rendered images. Our FreGS introduces progressive frequency regularization with frequency annealing to mitigate the over-reconstruction issue, enabling superior Gaussian densification and high-quality novel view synthesis.
Figure 2: Overview of the proposed FreGS. 3D Gaussians are initialized by structure-from-motion. After splatting the 3D Gaussians, we obtain 2D Gaussians and then leverage standard $\alpha$-blending for rendering. Frequency spectra $\hat{F}$ and $F$ are generated by applying the Fourier transform to the rendered image $\hat{I}$ and the ground truth $I$, respectively. Frequency regularization is achieved by regularizing the discrepancies of amplitude $|F(u,v)|$ and phase $\angle F(u,v)$ in the Fourier space. A novel frequency annealing technique is designed to achieve progressive frequency regularization. With the low-pass filter $H_l$ and the dynamic high-pass filter $H_h$, low-to-high frequency components are progressively leveraged to perform coarse-to-fine Gaussian densification. Note, the progressive frequency regularization is complementary to the pixel-wise loss between $\hat{I}$ and $I$. The red dashed line highlights the regularization process for Gaussian densification.
2.2 Frequency in Neural Rendering
NeRF has been widely explored in the frequency space. For example, [21] exploits sinusoidal functions of varying frequencies to encode inputs, overcoming the constraint of neural networks which often struggle to learn high-frequency information from low-dimensional inputs [30, 31, 37]. Several NeRF variants [19, 24, 34, 36, 35] also demonstrate the importance of learning in the frequency space under various challenging scenarios. For example, BARF [19] gradually increases the frequency for learning NeRF without camera poses. WaveNeRF [35] introduces wavelet frequency decomposition into multi-view stereo for achieving generalizable NeRF. Differently, FreGS boosts 3D Gaussian splatting in the frequency space, demonstrating that progressive frequency regularization with frequency annealing can lead to more effective Gaussian densification and advanced novel view synthesis.
3 Proposed Method
We propose FreGS, a novel 3D Gaussian splatting framework with progressive frequency regularization, which is the first to alleviate the over-reconstruction issue of 3D Gaussian splatting from a frequency perspective. Fig. 2 shows the overview of FreGS. The original 3D Gaussian splatting [16] (3D-GS), including Gaussian densification, is briefly introduced in Sec. 3.1. In Sec. 3.2, we first reveal why frequency regularization is effective in addressing the over-reconstruction issue and improving Gaussian densification. Then, we describe the amplitude and phase discrepancies employed for frequency regularization within the Fourier space. To reduce the difficulty of Gaussian densification, we design a frequency annealing technique (Sec. 3.3) to achieve progressive frequency regularization, which gradually exploits low-to-high frequency components to perform coarse-to-fine Gaussian densification.
3.1 Preliminary
3D Gaussian Splatting.
3D-GS models scene representations explicitly with anisotropic 3D Gaussians and achieves real-time rendering by efficient differentiable splatting. Given a sparse point cloud generated by structure-from-motion [11, 28], a set of 3D Gaussians is created, each of which is represented by a covariance matrix $\Sigma$, center position $\mu$, opacity $\alpha$ and spherical harmonics coefficients representing color $c$, where the covariance matrix $\Sigma$ is factorized into a scaling matrix and a rotation matrix for differentiable optimization.
Gaussian densification aims to transform the initial sparse set of Gaussians into a more densely populated set, enhancing its ability to accurately represent the scene. It mainly targets two cases: regions with missing geometric features (under-reconstruction) and large high-variance regions covered by only a few large Gaussians (over-reconstruction). Both cases result in an inadequate representation of regions within scenes. For under-reconstruction, Gaussians are densified by cloning them, which increases both the total volume and the number of Gaussians. For over-reconstruction, Gaussian densification is achieved by splitting large Gaussians into multiple smaller ones, which keeps the total volume but increases the number of Gaussians.
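A simplified sketch of this clone/split decision, assuming per-Gaussian view-space positional gradients have already been accumulated; the thresholds, the offset sampling and the 1.6 downscaling factor are illustrative choices in the spirit of the public 3D-GS implementation rather than values taken from this paper:

```python
import torch

def densification_step(positions, scales, view_grad,
                       grad_thresh=0.0002, size_thresh=0.01):
    """Illustrative clone/split rule for adaptive density control.

    positions: (N, 3) Gaussian centers, scales: (N, 3) per-axis scales,
    view_grad: (N,) average magnitude of view-space positional gradients.
    """
    needs_densify = view_grad >= grad_thresh           # only large-gradient Gaussians
    is_large = scales.max(dim=1).values > size_thresh  # large relative to the scene extent

    clone_mask = needs_densify & ~is_large   # under-reconstruction: copy as-is
    split_mask = needs_densify & is_large    # over-reconstruction: split into smaller ones

    cloned_pos = positions[clone_mask]
    base = positions[split_mask].repeat(2, 1)              # two children per parent
    split_pos = base + 0.1 * torch.randn_like(base)        # sample near the parent
    split_scales = scales[split_mask].repeat(2, 1) / 1.6   # shrink the children

    return clone_mask, split_mask, cloned_pos, split_pos, split_scales
```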
For rendering, the 3D Gaussians are projected to a 2D plane by splatting. The rendering can then be achieved via $\alpha$-blending. Specifically, the color $C$ of a pixel can be computed by blending the $N$ ordered 2D Gaussians that overlap the pixel, which can be formulated as:
$$C = \sum_{i \in N} c_i \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j), \tag{1}$$
where the color $c_i$ and the opacity $\alpha_i$ are calculated by multiplying the covariance matrix of the $i$-th 2D Gaussian by the per-point spherical harmonics color coefficients and the opacity, respectively.
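A minimal per-pixel sketch of Eq. (1), assuming the overlapping 2D Gaussians have already been depth-sorted front to back and their colors $c_i$ and opacities $\alpha_i$ evaluated at the pixel:

```python
import torch

def alpha_blend(colors: torch.Tensor, alphas: torch.Tensor) -> torch.Tensor:
    """Front-to-back compositing: C = sum_i c_i * alpha_i * prod_{j<i} (1 - alpha_j).

    colors: (N, 3) colors of the depth-ordered 2D Gaussians at this pixel.
    alphas: (N,)  their opacities at this pixel.
    """
    # T_i = prod_{j<i} (1 - alpha_j), with T_0 = 1
    transmittance = torch.cumprod(
        torch.cat([torch.ones_like(alphas[:1]), 1.0 - alphas[:-1]]), dim=0)
    return (colors * (alphas * transmittance).unsqueeze(-1)).sum(dim=0)
```

In 3D-GS, this accumulation is performed tile-by-tile by the custom CUDA rasterizer mentioned in Sec. 4.1 rather than per pixel in Python.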
3.2 Frequency Regularization
Figure 3: Average pixel gradients within over-reconstruction regions and well-reconstruction regions in the scene 'Bicycle'. The curve with circles (w/o frequency regularization (FR)) represents the method equivalent to 3D-GS [16], which utilizes the pixel-wise L1 loss in the spatial domain only. As Gaussian densification is terminated after the 15000th iteration as in 3D-GS, we only show comparisons before the 15000th iteration. It can be observed that frequency regularization increases the pixel gradient within over-reconstruction regions significantly. Thus, compared with the L1 loss, frequency regularization shows superior capability in revealing over-reconstruction regions.
In this section, we first explore the reason why 3D-GS leads to over-reconstruction. We compute the average gradient of pixels within the over-reconstruction regions, tracking its changes as training progresses. As Fig. 3 shows, with a naive pixel-wise L1 loss, the average gradient can be quite small although the regions are not well reconstructed, which misleads the Gaussian densification. Specifically, the small pixel gradients are back-propagated to the 2D splats covering the pixel and to the corresponding 3D Gaussians. As Gaussian densification is not applied to Gaussians with small gradients [16], these Gaussians cannot be densified through splitting into smaller Gaussians, leading to over-reconstruction. The consequence of over-reconstruction is an insufficient representation of regions, marked by deficiencies in both the overall structure (low-frequency information) and the details (high-frequency information). Compared with the pixel space, over-reconstruction regions can be better revealed in the frequency space by explicitly disentangling different frequency components. Thus, it is intuitive to guide the Gaussian densification by explicitly applying regularization in the frequency domain. Fig. 3 shows that the average pixel gradient increases significantly with frequency regularization, demonstrating its effectiveness. We thus conclude that with frequency regularization, Gaussians can be adaptively densified in the over-reconstruction regions. In contrast, the L1 loss cannot differentiate over-reconstructed regions from well-reconstructed ones, leading to many redundant Gaussians created in well-reconstructed regions.
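The measurement behind Fig. 3 can be approximated with a sketch like the one below, which averages the magnitude of the loss gradient with respect to the rendered pixels inside a given binary region mask; the mask marking over- or well-reconstructed regions is assumed to be provided, and the loss here contains only the L1 term:

```python
import torch

def average_pixel_gradient(rendered: torch.Tensor, gt: torch.Tensor,
                           region_mask: torch.Tensor) -> float:
    """Mean |dL/d pixel| of the pixel-wise L1 loss inside a binary (H, W) mask."""
    rendered = rendered.detach().requires_grad_(True)   # treat pixels as leaf variables
    loss = (rendered - gt).abs().mean()                 # pixel-wise L1 loss
    loss.backward()
    grad_mag = rendered.grad.abs().mean(dim=0)          # (H, W), averaged over channels
    return (grad_mag * region_mask).sum().item() / region_mask.sum().item()
```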
Based on the above analysis, we design FreGS, which aims to boost 3D Gaussian splatting from the frequency perspective. Specifically, it alleviates over-reconstruction and improves Gaussian densification by minimizing the discrepancy between the frequency spectrum of rendered images and the corresponding ground truth. Amplitude and phase, as the two major elements of frequency, capture different information of the image. Therefore, we achieve the frequency regularization by regularizing the amplitude and phase discrepancies between the rendered image $\hat{I} \in \mathbb{R}^{H \times W \times 3}$ and the ground truth $I \in \mathbb{R}^{H \times W \times 3}$ within the Fourier space.
Here, we detail the amplitude and phase discrepancies. We first convert $\hat{I}$ and $I$ into the corresponding frequency representations $\hat{F}$ and $F$ by the 2D discrete Fourier transform. Take $I$ as an example:
$$F(u,v) = \sum_{x=0}^{H-1} \sum_{y=0}^{W-1} I(x,y) \cdot e^{-i 2\pi \left( \frac{u x}{H} + \frac{v y}{W} \right)}, \tag{2}$$
where $(x,y)$ and $(u,v)$ represent the coordinates in an image and in its frequency spectrum, respectively. $I(x,y)$ and $F(u,v)$ denote the pixel value and the complex frequency value, respectively. Then, $F(u,v)$ can be expressed in terms of the amplitude $|F(u,v)|$ and the phase $\angle F(u,v)$ as below:
$$|F(u,v)| = \sqrt{F_R(u,v)^2 + F_I(u,v)^2} \tag{3}$$
$$\angle F(u,v) = \arctan\!\left( \frac{F_I(u,v)}{F_R(u,v)} \right), \tag{4}$$
where $F_I(u,v)$ and $F_R(u,v)$ represent the imaginary and real components of $F(u,v)$, respectively.
The amplitude and phase discrepancies (denoted as $d_a$ and $d_p$) between the rendered image $\hat{I}$ and the ground truth $I$ can be obtained with the Euclidean metric. Besides, we compute the amplitude and phase of all frequency components to assess the disparities accurately, which are then averaged to derive the final discrepancies as follows:
$$d_a = \frac{1}{HW} \sum_{u=0}^{H-1} \sum_{v=0}^{W-1} \Big|\, |F(u,v)| - |\hat{F}(u,v)| \,\Big| \tag{5}$$
$$d_p = \frac{1}{HW} \sum_{u=0}^{H-1} \sum_{v=0}^{W-1} \Big|\, \angle F(u,v) - \angle \hat{F}(u,v) \,\Big|, \tag{6}$$
where $F(u,v)$ and $\hat{F}(u,v)$ denote the complex frequency values of $I$ and $\hat{I}$, respectively.
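A minimal PyTorch sketch of Eqs. (2)–(6), computing the amplitude and phase discrepancies between a rendered image and its ground truth (both assumed to be float tensors of shape (C, H, W)):

```python
import torch

def amplitude_phase_discrepancy(rendered: torch.Tensor, gt: torch.Tensor):
    """d_a and d_p of Eqs. (5)-(6): mean amplitude and phase differences over the spectrum."""
    F_hat = torch.fft.fft2(rendered)   # 2D DFT of the rendered image, per channel
    F = torch.fft.fft2(gt)             # 2D DFT of the ground truth

    d_a = (F.abs() - F_hat.abs()).abs().mean()       # | |F| - |F_hat| |, averaged over (u, v)
    d_p = (F.angle() - F_hat.angle()).abs().mean()   # | angle F - angle F_hat |
    return d_a, d_p
```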
3.3 Frequency Annealing
Figure 4: The comparison of different frequency regularizations. The naive frequency regularization directly employs amplitude and phase discrepancies without distinguishing between low and high frequency. The proposed progressive frequency regularization introduces frequency annealing technique to achieve low-to-high frequency regularization for coarse-to-fine Gaussian densification. It can be observed that the proposed progressive frequency regularization can achieve finer Gaussian densification and superior novel view synthesis. Zoom in for best view.
Though naively adopting the amplitude and phase discrepancies (without distinguishing between low and high frequency) as the frequency regularization can mitigate over-reconstruction to some extent, it still suffers from restricted Gaussian densification and significantly biases 3D Gaussian splatting towards undesirable artifacts (as shown in Fig. 4). As low and high frequencies relate to large-scale features (e.g., global patterns and structures) and small-scale features (e.g., local details), respectively, we design a frequency annealing technique to perform progressive frequency regularization, which gradually leverages low-to-high frequency to achieve coarse-to-fine Gaussian densification. With the frequency annealing technique, superior Gaussian densification can be achieved as shown in Fig. 4.
Specifically, to achieve frequency annealing, we incorporate a low-pass filter $H_l$ and a dynamic high-pass filter $H_h$ in the Fourier space to extract the low- and high-frequency components (denoted as $F_l(u,v)$ and $F_h(u,v)$), respectively:
$$F_l(u,v) = F(u,v)\, H_l(u,v) \tag{7}$$
$$F_h(u,v) = F(u,v)\, H_h(u,v). \tag{8}$$
The corresponding amplitude and phase discrepancies for low and high frequency can then be formulated as follows:
$$d_{la} = \frac{1}{HW} \sum_{u=0}^{H-1} \sum_{v=0}^{W-1} \Big|\, |F_l(u,v)| - |\hat{F}_l(u,v)| \,\Big| \tag{9}$$
$$d_{lp} = \frac{1}{HW} \sum_{u=0}^{H-1} \sum_{v=0}^{W-1} \Big|\, \angle F_l(u,v) - \angle \hat{F}_l(u,v) \,\Big| \tag{10}$$
$$d_{ha} = \frac{1}{HW} \sum_{u=0}^{H-1} \sum_{v=0}^{W-1} \Big|\, |F_h(u,v)| - |\hat{F}_h(u,v)| \,\Big| \tag{11}$$
$$d_{hp} = \frac{1}{HW} \sum_{u=0}^{H-1} \sum_{v=0}^{W-1} \Big|\, \angle F_h(u,v) - \angle \hat{F}_h(u,v) \,\Big|, \tag{12}$$
where $d_{la}$, $d_{lp}$, $d_{ha}$ and $d_{hp}$ represent the low-frequency amplitude discrepancy, the low-frequency phase discrepancy, the dynamic high-frequency amplitude discrepancy and the dynamic high-frequency phase discrepancy, respectively.
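The filters of Eqs. (7)–(8) and the filtered discrepancies of Eqs. (9)–(12) can be sketched as follows; the circular passband measured from the spectrum center $(H/2, W/2)$ and the radius arguments are assumptions for illustration:

```python
import torch

def centered_band_masks(H: int, W: int, d_low: float, d_high: float):
    """Binary low-pass and dynamic high-pass masks on an fftshift-ed spectrum.

    d_low : radius D_0 of the low-pass filter H_l.
    d_high: current outer radius D_t of the dynamic high-pass filter H_h,
            which passes frequencies with D_0 < distance <= D_t.
    """
    u = torch.arange(H, dtype=torch.float32).view(-1, 1) - H / 2
    v = torch.arange(W, dtype=torch.float32).view(1, -1) - W / 2
    dist = torch.sqrt(u ** 2 + v ** 2)       # distance to the spectrum center (H/2, W/2)
    low_mask = (dist <= d_low).float()
    high_mask = ((dist > d_low) & (dist <= d_high)).float()
    return low_mask, high_mask

def filtered_discrepancies(rendered, gt, low_mask, high_mask):
    """Low/high-frequency amplitude and phase discrepancies of Eqs. (9)-(12)."""
    F_hat = torch.fft.fftshift(torch.fft.fft2(rendered), dim=(-2, -1))
    F = torch.fft.fftshift(torch.fft.fft2(gt), dim=(-2, -1))
    F_l, F_l_hat = F * low_mask, F_hat * low_mask      # Eq. (7)
    F_h, F_h_hat = F * high_mask, F_hat * high_mask    # Eq. (8)
    d_la = (F_l.abs() - F_l_hat.abs()).abs().mean()
    d_lp = (F_l.angle() - F_l_hat.angle()).abs().mean()
    d_ha = (F_h.abs() - F_h_hat.abs()).abs().mean()
    d_hp = (F_h.angle() - F_h_hat.angle()).abs().mean()
    return d_la, d_lp, d_ha, d_hp
```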
Figure 5: Qualitative comparisons of FreGS with three state-of-the-art methods in novel view synthesis. Note that for fair comparison as well as a trade-off between synthesis quality and memory consumption, we train FreGS with a similar number of Gaussians as 3D-GS for these datasets (details in Sec. 4.2). The comparisons are conducted over multiple indoor and outdoor scenes including 'Garden' and 'Room' from Mip-NeRF360, 'Train' and 'Truck' from Tanks&Temples, and 'Drjohnson' from Deep Blending. 'GT' denotes the ground-truth images. FreGS achieves superior image rendering with far fewer artifacts and more fine details.
For the progressive frequency regularization with frequency annealing, we initiate it by regularizing the low-frequency discrepancies and then gradually integrate the high-frequency components as training progresses. The gradual incorporation of high frequency is achieved with the dynamic high-pass filter $H_h$, where the frequency band range $D_t$ allowed to pass at the $t$-th ($t \in [t_0, T]$) iteration can be expressed as:
$$D_0 < D_t < \frac{(D - D_0)(t - t_0)}{T - t_0} + D_0, \tag{13}$$
where $D_0$ and $D$ denote the maximum range allowed by the low-pass filter and the maximum range of the frequency spectrum, respectively. Note that we take the center point $(H/2, W/2)$ as the coordinate origin. $t$, $t_0$ and $T$ represent the current iteration and the starting and ending iterations of introducing high-frequency components, respectively. Regularization applied to low-to-high frequency results in coarse-to-fine Gaussian densification. The progressive frequency regularization $\mathcal{L}_f$ can be formulated as follows:
$$\mathcal{L}_f = \begin{cases} w_l \,(d_{la} + d_{lp}), & 0 < t \le t_0 \\ w_l \,(d_{la} + d_{lp}) + w_h \,(d_{ha} + d_{hp}), & t > t_0, \end{cases} \tag{14}$$
where $w_l$ and $w_h$ represent the training weights for the low-frequency and high-frequency terms, respectively.
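A sketch of the annealing schedule of Eq. (13) and the progressive loss of Eq. (14); the default weights $w_l$ and $w_h$ below are placeholders, not the paper's settings:

```python
def high_pass_range(t: int, t0: int, T: int, D0: float, D: float) -> float:
    """Upper bound of the dynamic high-pass band D_t in Eq. (13): grows linearly
    from D_0 at iteration t_0 to D at iteration T."""
    t = min(max(t, t0), T)
    return (D - D0) * (t - t0) / (T - t0) + D0

def frequency_loss(t, t0, d_la, d_lp, d_ha, d_hp, w_l=1.0, w_h=1.0):
    """Progressive frequency regularization L_f of Eq. (14)."""
    loss = w_l * (d_la + d_lp)
    if t > t0:                  # high-frequency terms are introduced only after t_0
        loss = loss + w_h * (d_ha + d_hp)
    return loss
```

In each iteration after $t_0$, the high-pass mask from the previous sketch would be rebuilt with its outer radius set to the current $D_t$.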
4 Experiments
4.1 Datasets and Implementation Details
Datasets
For training and testing, we follow the dataset setting of 3D-GS [16] and conduct experiments on images of a total of 11 real scenes. Specifically, we evaluate FreGS on all nine scenes of the Mip-NeRF360 dataset [2] and two scenes from the Tanks&Temples dataset [18]. The selected scenes exhibit diverse styles, ranging from bounded indoor environments to unbounded outdoor ones. To divide the datasets into training and test sets, we follow 3D-GS and allocate every 8th photo to the test set. The resolution of all involved images is the same as in 3D-GS as well.
| Datasets | Mip-NeRF360 | | | Tanks&Temples | | | Deep Blending | | |
|---|---|---|---|---|---|---|---|---|---|
| Methods | SSIM↑ | PSNR↑ | LPIPS↓ | SSIM↑ | PSNR↑ | LPIPS↓ | SSIM↑ | PSNR↑ | LPIPS↓ |
| Plenoxels | 0.626 | 23.08 | 0.463 | 0.719 | 21.08 | 0.379 | 0.795 | 23.06 | 0.510 |
| INGP-Base | 0.671 | 25.30 | 0.371 | 0.723 | 21.72 | 0.330 | 0.797 | 23.62 | 0.423 |
| INGP-Big | 0.699 | 25.59 | 0.331 | 0.745 | 21.92 | 0.305 | 0.817 | 24.96 | 0.390 |
| Mip-NeRF360 | 0.792 | 27.69 | 0.237 | 0.759 | 22.22 | 0.257 | 0.901 | 29.40 | 0.245 |
| 3D-GS | 0.815 | 27.21 | 0.214 | 0.841 | 23.14 | 0.183 | 0.903 | 29.41 | 0.243 |
| FreGS (Ours) | 0.826 | 27.85 | 0.209 | 0.849 | 23.96 | 0.178 | 0.904 | 29.93 | 0.240 |
Table 1: Quantitative comparisons on the Mip-NeRF360, Tanks&Temples and Deep Blending datasets. Note that for fair comparison as well as a trade-off between synthesis quality and memory consumption, we train FreGS with a similar number of Gaussians as 3D-GS for these datasets (details in Sec. 4.2). All methods are trained with the same training data. INGP-Base and INGP-Big refer to InstantNGP [22] with a basic configuration and a slightly larger network [22], respectively. Best, second-best and third-best scores are in red, orange and yellow, respectively.
| Datasets | Mip-NeRF360 | | | Tanks&Temples | | |
|---|---|---|---|---|---|---|
| Methods | PSNR↑ | SSIM↑ | LPIPS↓ | PSNR↑ | SSIM↑ | LPIPS↓ |
| Base | 27.21 | 0.815 | 0.214 | 23.14 | 0.841 | 0.183 |
| Base+FR | 27.63 | 0.818 | 0.213 | 23.76 | 0.844 | 0.181 |
| Base+FR+FA | 27.85 | 0.826 | 0.209 | 23.96 | 0.849 | 0.178 |
Table 2: Ablation studies of the proposed FreGS on the Mip-NeRF360 and Tanks&Temples datasets. The baseline Base adopts the pixel-level L1 loss and the D-SSIM term for 3D Gaussian splatting in the spatial space. Our Base+FR incorporates frequency regularization (FR) to address the over-reconstruction in the frequency space. Base+FR+FA (i.e., FreGS) further introduces our proposed frequency annealing technique (FA) to achieve progressive frequency regularization. Note that for fair comparison, we train Base+FR+FA with a similar number of Gaussians as Base by increasing the gradient threshold. Besides, Base+FR and Base+FR+FA use the same gradient threshold.
Implementation
For progressive frequency regularization, we initiate it with the low-frequency amplitude and phase discrepancies and then extend the regularization to progressively encompass the high-frequency amplitude and phase discrepancies for fine Gaussian densification. Note, the pixel-level L1 loss in the spatial space plus the D-SSIM term is used throughout the whole training process, which complements the proposed progressive frequency regularization in the frequency space. We stop the Gaussian densification after the 15000th iteration as in 3D-GS [16]. Note that the frequency regularization terminates once Gaussian densification ends. For stable optimization, we start the optimization with an image resolution that is four times smaller than the original images, as in 3D-GS. After 500 iterations, we increase the image resolution to the original size by upsampling. We adopt the Adam optimizer [17] to train FreGS and use the PyTorch framework [25] for implementation. For the rasterization, we keep the custom CUDA kernels used in 3D-GS.
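Putting the pieces together, the overall objective sketched below combines the pixel-level L1 loss and the D-SSIM term in the spatial space with the progressive frequency regularization while densification is active; `d_ssim_fn` is a caller-supplied D-SSIM implementation, `lambda_dssim=0.2` and the blending form follow the 3D-GS defaults rather than values stated here, and `filtered_discrepancies` and `frequency_loss` are the helper sketches from Sec. 3.2 and 3.3:

```python
def total_loss(rendered, gt, t, t0, low_mask, high_mask, d_ssim_fn,
               densify_until=15000, lambda_dssim=0.2):
    """Spatial-domain loss plus the progressive frequency loss L_f while
    Gaussian densification (and hence frequency regularization) is active."""
    l1 = (rendered - gt).abs().mean()
    loss = (1 - lambda_dssim) * l1 + lambda_dssim * d_ssim_fn(rendered, gt)
    if t <= densify_until:   # frequency regularization stops when densification ends
        d_la, d_lp, d_ha, d_hp = filtered_discrepancies(rendered, gt, low_mask, high_mask)
        loss = loss + frequency_loss(t, t0, d_la, d_lp, d_ha, d_hp)
    return loss
```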
4.2 Comparisons with the State-of-the-Art
We compare FreGS with 3D-GS [16] as well as four other NeRF-based methods [7, 22, 2] over various scenes from the Mip-NeRF360 and Tanks&Temples datasets. For fair comparison as well as the trade-off between memory and performance, we train FreGS with a similar number of Gaussians as 3D-GS for these datasets, which is achieved by introducing progressive frequency regularization while increasing the gradient threshold. All compared methods are trained with the same training data and hardware. Table 1 shows experimental results over the same test images as described in Section 4.1. We can observe that FreGS outperforms the state-of-the-art 3D-GS consistently in PSNR, SSIM and LPIPS across all real scenes. The superior performance is largely attributed to our proposed progressive frequency regularization, which alleviates the over-reconstruction issue of Gaussians and improves Gaussian densification effectively. In addition, FreGS surpasses Mip-NeRF360, INGP-Base, INGP-Big, and Plenoxels by significant margins in terms of image rendering quality. As Fig. 5 shows, FreGS achieves superior novel view synthesis with fewer artifacts and finer details.