Occam’s LGS: A simple approach for Language Gaussian Splatting

¹Johns Hopkins University, ²INSAIT Sofia University
*jcheng65@jh.edu

TL;DR:
🎯 We present a simple method to lift 2D language features to 3D Gaussian Splats without complex modules or training.
🚀 Our optimization-based approach is 100x faster, works with any feature dimension, and accurately models the rendering process.

Overview of our training-free language 3D Gaussian Splatting method, achieving SOTA performance with only 15s runtime.

Feature comparison with existing methods.

*Visualization: SAM+CLIP features, reduced to 3 dimensions via the LangSplat autoencoder for visualization purposes, then uplifted to 3D.

Abstract

In this work, we show that the sophisticated techniques for language-grounded 3D Gaussian Splatting are simply unnecessary. Instead, we apply Occam's razor to the task at hand and perform weighted multi-view feature aggregation using the weights derived from the standard rendering process, followed by a simple heuristic-based noisy Gaussian filtration. Doing so offers us state-of-the-art results with a speed-up of two orders of magnitude. We showcase our results in two commonly used benchmark datasets: LERF and 3D-OVS. Our simple approach allows us to perform reasoning directly in the language features, without any compression whatsoever. Such modeling in turn offers easy scene manipulation, unlike the existing methods -- which we illustrate using an application of object insertion in the scene. Furthermore, we provide a thorough discussion regarding the significance of our contributions within the context of the current literature. Our source code will be made publicly available.

Approach

Overview: Occam's LGS consists of three stages: (1) forward rendering with 3D Gaussian Splatting to obtain opacity α, projected positions xᵢ′ and pixels pᵢ, (2) weighted aggregation of multi-view semantic features via alpha blending, and (3) filtering of invisible Gaussians.
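The aggregation and filtering stages can be illustrated with a minimal sketch in plain Python. It assumes that the per-Gaussian weight is the usual front-to-back alpha-blending weight (opacity times accumulated transmittance) and that a Gaussian counts as invisible when its total weight over all views stays below a small threshold. The function names, the `observations` layout, and `min_weight` are illustrative assumptions, not taken from the released code.

```python
def blend_weights(alphas):
    """Alpha-blending weights for one pixel ray.

    `alphas` are the per-Gaussian opacities at this pixel, sorted front to back.
    Each weight is alpha_i times the transmittance left by the Gaussians in front.
    """
    weights = []
    transmittance = 1.0
    for alpha in alphas:
        weights.append(alpha * transmittance)
        transmittance *= 1.0 - alpha
    return weights


def aggregate_features(observations, num_gaussians, feat_dim, min_weight=1e-6):
    """Lift 2D features to Gaussians by a weighted average over all pixel rays.

    `observations` (assumed layout) is an iterable of
    (gaussian_ids, alphas, pixel_feature) tuples, one per pixel across all views.
    Returns {gaussian_id: feature}; Gaussians whose accumulated weight is at most
    `min_weight` are treated as invisible and left out.
    """
    feat_sum = [[0.0] * feat_dim for _ in range(num_gaussians)]
    weight_sum = [0.0] * num_gaussians

    for gaussian_ids, alphas, pixel_feature in observations:
        for gid, weight in zip(gaussian_ids, blend_weights(alphas)):
            weight_sum[gid] += weight
            for d in range(feat_dim):
                feat_sum[gid][d] += weight * pixel_feature[d]

    features = {}
    for gid in range(num_gaussians):
        if weight_sum[gid] > min_weight:  # stage (3): drop invisible Gaussians
            features[gid] = [s / weight_sum[gid] for s in feat_sum[gid]]
    return features
```

Because the result is a plain weighted average, the sketch works for any feature dimension and needs no training loop. A practical implementation would take the weights directly from the rasterizer instead of recomputing them per ray.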

More Visualizations

Visualization of LERF Queries

Comparison

We compare our results with those of other methods.

3D-OVS Dataset

We also visualize the relevancy maps for the 3D-OVS dataset.

Room Scene

Bench Scene

Lawn Scene
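This page does not state how the relevancy maps are computed. As a rough illustration, the sketch below uses the relevancy score popularized by LERF, which is an assumption here: the query is compared against each of a set of generic "canonical" text embeddings in a pairwise softmax, and the minimum is kept. All names are hypothetical, and embeddings are plain lists of floats.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def relevancy(feature, query, canonicals):
    """LERF-style relevancy of one language feature to a text query (assumed formula).

    For each canonical embedding, compute the pairwise softmax probability that
    the feature matches the query rather than the canonical phrase, then take
    the minimum over all canonical phrases.
    """
    query_term = math.exp(dot(feature, query))
    return min(
        query_term / (query_term + math.exp(dot(feature, canonical)))
        for canonical in canonicals
    )
```

Evaluating this score for the feature rendered at every pixel gives a per-pixel map. A score above 0.5 means the feature is closer to the query than to every canonical phrase.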

Application: Object insertion

We select and extract a "Porcelain hand" and "Waldo" (each represented by its Gaussians) from the Figurines scene of LERF. By simply copying each object's Gaussians together with their parameters and semantic features, the new objects integrate seamlessly into the Teatime scene while preserving their semantic features.
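Since every Gaussian carries its own uncompressed feature, insertion amounts to a list copy. The sketch below is a hypothetical illustration of that idea: the `Gaussian` record, the `insert_object` helper, and the optional placement `offset` are assumptions made for the example, not part of the released code.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Gaussian:
    """Hypothetical per-Gaussian record: geometry, appearance, language feature."""
    position: Tuple[float, float, float]
    scale: Tuple[float, float, float]
    rotation: Tuple[float, float, float, float]  # quaternion
    opacity: float
    color: Tuple[float, float, float]
    feature: Tuple[float, ...]  # language feature, any dimension


def insert_object(
    source: Sequence[Gaussian],
    object_ids: Sequence[int],
    target: Sequence[Gaussian],
    offset: Tuple[float, float, float] = (0.0, 0.0, 0.0),
) -> List[Gaussian]:
    """Copy the selected Gaussians from `source` into a new copy of `target`.

    Only the position is shifted by `offset`; all other parameters and the
    semantic feature are carried over unchanged.
    """
    moved = [
        replace(
            source[i],
            position=tuple(p + o for p, o in zip(source[i].position, offset)),
        )
        for i in object_ids
    ]
    return list(target) + moved
```

In this sketch `object_ids` is given; in practice it could come from thresholding a query's relevancy over the source scene's Gaussian features.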

BibTeX

@article{cheng2024occamslgssimpleapproach,
  title={Occam's LGS: A Simple Approach for Language Gaussian Splatting}, 
  author={Jiahuan Cheng and Jan-Nico Zaech and Luc Van Gool and Danda Pani Paudel},
  year={2024},
  eprint={2412.01807}
}