Occam’s LGS: An Efficient Approach for Language Gaussian Splatting

TL;DR:
🎯 We present a simple method to lift 2D language features to 3D Gaussian Splats without complex modules or training.
🚀 Our optimization-based approach is 100x faster, works with any feature dimension, and accurately models the rendering process.

Overview of our training-free language 3D Gaussian Splatting method, achieving SOTA performance with only 15s runtime.

Feature comparison with existing methods.

Abstract

Gaussian Splatting is a widely adopted approach for 3D scene representation, offering efficient, high-quality reconstruction and rendering. A key reason for its success is the simplicity of representing scenes with sets of Gaussians, making it interpretable and adaptable. To enhance understanding beyond visual representation, recent approaches extend Gaussian Splatting with semantic vision-language features, enabling open-set tasks. Typically, these language features are aggregated from multiple 2D views. However, existing methods rely on cumbersome techniques, resulting in high computational costs and longer training times.

In this work, we show that the complicated pipelines for language 3D Gaussian Splatting are simply unnecessary. Instead, we follow a probabilistic formulation of Language Gaussian Splatting and apply Occam’s razor to the task at hand, leading to a highly efficient weighted multi-view feature aggregation technique. Doing so offers us state-of-the-art results with a speed-up of two orders of magnitude without any compression, allowing for easy scene manipulation.

Approach

Overview: Occam's LGS consists of two main stages: (1) forward rendering with 3D Gaussian Splatting to obtain alpha blending weights w, projected positions x_i', and pixels p_i, followed by weighted aggregation of multi-view semantic features, and (2) filtering of noisy Gaussians.
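The two stages above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that forward rendering has already produced, for each view, the indices of the contributing Gaussians, their alpha blending weights, and the 2D language feature at the pixel each one projects to. The function name `aggregate_features`, the input layout, and the `min_weight` filtering rule are hypothetical, chosen only for illustration. In particular, the page does not state the actual criterion for filtering noisy Gaussians.

```python
import numpy as np

def aggregate_features(num_gaussians, views, min_weight=1e-3):
    """Weighted multi-view aggregation of 2D features onto Gaussians.

    views: list of (gaussian_ids, weights, pixel_features) tuples, one per view
        gaussian_ids   (K,)   int   index of the contributing Gaussian
        weights        (K,)   float alpha blending weight w of that contribution
        pixel_features (K, D) float 2D language feature at the hit pixel
    min_weight: assumed filtering threshold (hypothetical, not from the paper).
    """
    feat_dim = views[0][2].shape[1]
    feat_sum = np.zeros((num_gaussians, feat_dim))
    weight_sum = np.zeros(num_gaussians)

    for gaussian_ids, weights, pixel_features in views:
        # np.add.at accumulates correctly even when an index repeats.
        np.add.at(feat_sum, gaussian_ids, weights[:, None] * pixel_features)
        np.add.at(weight_sum, gaussian_ids, weights)

    # Stage 2 (assumed rule): drop Gaussians that were barely observed.
    keep = weight_sum > min_weight
    features = np.zeros_like(feat_sum)
    features[keep] = feat_sum[keep] / weight_sum[keep, None]
    return features, keep

# Toy example: 3 Gaussians, 2 views, 2-dimensional features.
views = [
    (np.array([0, 1]), np.array([0.5, 0.5]), np.array([[1.0, 0.0], [0.0, 1.0]])),
    (np.array([0]), np.array([1.5]), np.array([[0.0, 1.0]])),
]
features, keep = aggregate_features(3, views)
```

In this toy example, Gaussian 0 receives the weighted mean of its two observations, Gaussian 1 keeps its single observed feature, and Gaussian 2 is never observed, so it is filtered out. The sketch uses no learned modules and does not depend on the feature dimension D, in line with the training-free, dimension-agnostic claims above.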

More Visualizations

Visualization of LERF Queries

Comparison

We compare our results with those of other works.

3D-OVS Dataset

We also show the visualization of the relevancy map for the 3D-OVS dataset.

Room Scene

Bench Scene

Lawn Scene

Application: Object insertion

We select and extract a "Porcelain hand" and "Waldo" (each represented by its Gaussians) from the Figurines scene of LERF. By simply copying each object's Gaussians together with their parameters and semantic features, the new objects seamlessly integrate into the Teatime scene while preserving their semantic features.
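The copy step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code. It assumes each scene is stored as a dictionary of per-Gaussian arrays with matching keys, and that a boolean mask selecting the object's Gaussians is already available. The function name `insert_object`, the dictionary keys, and the mask are all hypothetical.

```python
import numpy as np

def insert_object(source_scene, target_scene, object_mask):
    """Copy the masked Gaussians from source_scene into target_scene.

    Both scenes are dicts mapping attribute names to arrays whose first
    axis indexes Gaussians (hypothetical layout). Every attribute,
    including the language features, is copied unchanged.
    """
    merged = {}
    for key, target_values in target_scene.items():
        selected = source_scene[key][object_mask]
        merged[key] = np.concatenate([target_values, selected], axis=0)
    return merged

# Toy scenes: 4 source Gaussians, 2 target Gaussians, 2-D language features.
figurines = {
    "positions": np.zeros((4, 3)),
    "language_features": np.arange(8.0).reshape(4, 2),
}
teatime = {
    "positions": np.ones((2, 3)),
    "language_features": np.zeros((2, 2)),
}
object_mask = np.array([False, True, True, False])  # assumed object selection
merged = insert_object(figurines, teatime, object_mask)
```

Because the semantic features are stored per Gaussian without compression, they travel with the copied Gaussians and need no re-training in the target scene. Placing the object (for example, a rigid transform applied to positions and orientations) is omitted here.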

@article{cheng2024occamslgssimpleapproach,
  title={Occam's LGS: A Simple Approach for Language Gaussian Splatting},
  author={Jiahuan Cheng and Jan-Nico Zaech and Luc Van Gool and Danda Pani Paudel},
  year={2024},
  eprint={2412.01807}
}