Occam’s LGS: An Efficient Approach for Language Gaussian Splatting

1Johns Hopkins University, 2INSAIT Sofia University
*jcheng65@jh.edu

TL;DR:
🎯 We present a simple method to lift 2D language features to 3D Gaussian Splats without complex modules or training.
🚀 Our optimization-based approach is 100x faster, works with any feature dimension, and accurately models the rendering process.

Overview of our training-free language 3D Gaussian Splatting method, achieving SOTA performance with only 15s runtime.

Feature comparison with existing methods.

*Visualization: SAM+CLIP features, reduced to three dimensions with the LangSplat autoencoder for visualization purposes and uplifted to 3D.

Abstract

Gaussian Splatting is a widely adopted approach for 3D scene representation, offering efficient, high-quality reconstruction and rendering. A key reason for its success is the simplicity of representing scenes with sets of Gaussians, making it interpretable and adaptable. To enhance understanding beyond visual representation, recent approaches extend Gaussian Splatting with semantic vision-language features, enabling open-set tasks. Typically, these language features are aggregated from multiple 2D views. However, existing methods rely on cumbersome techniques, resulting in high computational costs and longer training times.

In this work, we show that the complicated pipelines for language 3D Gaussian Splatting are simply unnecessary. Instead, we follow a probabilistic formulation of Language Gaussian Splatting and apply Occam’s razor to the task at hand, leading to a highly efficient weighted multi-view feature aggregation technique. Doing so offers us state-of-the-art results with a speed-up of two orders of magnitude without any compression, allowing for easy scene manipulation.

Approach

Overview: Occam's LGS consists of two main stages: (1) forward rendering with 3D Gaussian Splatting to obtain the alpha-blending weights w, projected positions x_i', and pixels p_i, followed by weighted aggregation of multi-view semantic features, and (2) filtering of noisy Gaussians.
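
The two stages can be illustrated with a minimal NumPy sketch. It assumes the renderer has already produced, for every view, the blending weight of each Gaussian and the pixel its center projects to. The function name `aggregate_features`, the array layouts, and the `min_weight` filtering threshold are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def aggregate_features(weights, pixels, feature_maps, min_weight=1e-3):
    """Weighted multi-view aggregation of 2D features onto Gaussians.

    weights:      (V, N) blending weight of Gaussian i in view v (0 if unseen).
    pixels:       (V, N, 2) integer (row, col) each Gaussian projects to.
    feature_maps: (V, H, W, D) per-view 2D language features.
    Returns (features, keep): (N, D) aggregated features and an (N,) mask.
    """
    num_views, num_gaussians = weights.shape
    feat_dim = feature_maps.shape[-1]
    numerator = np.zeros((num_gaussians, feat_dim))
    denominator = np.zeros(num_gaussians)

    for v in range(num_views):
        rows, cols = pixels[v, :, 0], pixels[v, :, 1]
        sampled = feature_maps[v][rows, cols]          # (N, D) feature per Gaussian
        numerator += weights[v][:, None] * sampled     # weight each view's feature
        denominator += weights[v]

    # Assumed filtering rule: drop Gaussians that were barely observed.
    keep = denominator > min_weight
    features = np.zeros_like(numerator)
    features[keep] = numerator[keep] / denominator[keep][:, None]
    return features, keep

# Toy example: 2 views, 2 Gaussians, 1x2 feature maps with D=2.
feature_maps = np.zeros((2, 1, 2, 2))
feature_maps[0, 0, 0] = [1.0, 0.0]
feature_maps[0, 0, 1] = [0.0, 1.0]
feature_maps[1, 0, 0] = [0.0, 1.0]
feature_maps[1, 0, 1] = [0.0, 1.0]
weights = np.array([[0.75, 0.0], [0.25, 0.0]])
pixels = np.array([[[0, 0], [0, 1]], [[0, 0], [0, 1]]])
features, keep = aggregate_features(weights, pixels, feature_maps)
```

In this toy setup the first Gaussian receives a weighted average of the two views' features, while the second, which has zero accumulated weight, is filtered out. Because the computation is a single weighted average, no per-scene training loop is involved.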

More Visualizations

Visualization of LERF Queries

Comparison

We show the comparison with other works.

3D-OVS Dataset

We also visualize the relevancy maps for the 3D-OVS dataset.

Room Scene

Bench Scene

Lawn Scene
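
For readers unfamiliar with relevancy maps, the sketch below shows one common way to compute them from a rendered feature map and a text query. It assumes a LERF-style relevancy score (a pairwise softmax between the query and a set of canonical phrases, taking the minimum over phrases); whether the maps on this page use exactly this score is an assumption. The function name `relevancy_map` and the toy embeddings are hypothetical stand-ins for real CLIP embeddings.

```python
import numpy as np

def relevancy_map(feature_map, query, canonical):
    """LERF-style relevancy of each pixel to a text query (assumed scoring).

    feature_map: (H, W, D) unit-normalized rendered language features.
    query:       (D,) unit-normalized text embedding of the query.
    canonical:   (K, D) unit-normalized embeddings of canonical phrases.
    Returns an (H, W) map with values in (0, 1).
    """
    query_sim = feature_map @ query            # (H, W)
    canon_sim = feature_map @ canonical.T      # (H, W, K)
    # Pairwise softmax exp(q) / (exp(q) + exp(c)) equals sigmoid(q - c).
    pairwise = 1.0 / (1.0 + np.exp(canon_sim - query_sim[..., None]))
    # Take the least favorable canonical phrase for each pixel.
    return pairwise.min(axis=-1)

# Toy example: a 1x2 "image" with 2-dimensional features.
feature_map = np.array([[[1.0, 0.0], [0.0, 1.0]]])
query = np.array([1.0, 0.0])
canonical = np.array([[0.0, 1.0]])
rel = relevancy_map(feature_map, query, canonical)
```

Here the first pixel matches the query and scores above 0.5, while the second matches the canonical phrase and scores below 0.5.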

Application: Object insertion

We select and extract a "Porcelain hand" and "Waldo" (each represented by its Gaussians) from the Figurines scene of LERF. By simply copying each object's Gaussians together with their parameters and semantic features, the new objects integrate seamlessly into the Teatime scene while preserving their semantic features.
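
The copy operation can be sketched as follows, assuming each scene is stored as a dictionary of per-Gaussian arrays. The field names (`positions`, `opacities`, `features`), the function `insert_object`, and the optional translation `offset` are hypothetical and chosen only for illustration; a real scene would carry more parameters (scales, rotations, colors), which would be copied the same way.

```python
import numpy as np

def insert_object(target, source, mask, offset=(0.0, 0.0, 0.0)):
    """Copy the Gaussians selected by `mask` from `source` into `target`.

    target, source: dicts mapping field name -> array with one row per Gaussian.
    mask:           (N_source,) boolean selection of the object's Gaussians.
    offset:         assumed translation applied to the copied positions.
    Returns a new dict; the inputs are left unchanged.
    """
    merged = {}
    for key, target_values in target.items():
        picked = source[key][mask]
        if key == "positions":
            picked = picked + np.asarray(offset)   # place the object in the new scene
        merged[key] = np.concatenate([target_values, picked], axis=0)
    return merged

# Toy example: copy 2 of 3 source Gaussians into a 2-Gaussian target scene.
target = {
    "positions": np.zeros((2, 3)),
    "opacities": np.ones((2, 1)),
    "features": np.zeros((2, 4)),
}
source = {
    "positions": np.arange(9, dtype=float).reshape(3, 3),
    "opacities": np.full((3, 1), 0.5),
    "features": np.arange(12, dtype=float).reshape(3, 4),
}
mask = np.array([True, False, True])
scene = insert_object(target, source, mask, offset=(1.0, 0.0, 0.0))
```

Since the semantic features are stored per Gaussian and are not compressed with a scene-specific model, they travel with the copied Gaussians unchanged.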

BibTeX

@article{cheng2024occamslgssimpleapproach,
  title={Occam's LGS: A Simple Approach for Language Gaussian Splatting}, 
  author={Jiahuan Cheng and Jan-Nico Zaech and Luc Van Gool and Danda Pani Paudel},
  year={2024},
  eprint={2412.01807}
}