GALA: Guided Attention With Language Alignment for Open Vocabulary Gaussian Splatting

Abstract

3D scene reconstruction and understanding have gained increasing popularity, yet existing methods still struggle to capture fine-grained, language-aware 3D representations from 2D images. In this paper, we present GALA, a novel framework for open-vocabulary 3D scene understanding with 3D Gaussian Splatting (3DGS). GALA distills a scene-specific 3D instance feature field via self-supervised contrastive learning. To extend to generalized language feature fields, we introduce the core contribution of GALA, a cross-attention module with two learnable codebooks that encode view-independent semantic embeddings. This design not only ensures intra-instance feature similarity but also supports seamless 2D and 3D open-vocabulary queries. It reduces memory consumption by avoiding per-Gaussian high-dimensional feature learning. Extensive experiments on real-world datasets demonstrate GALA's remarkable open-vocabulary performance on both 2D and 3D.
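As a rough illustration of the codebook idea described in the abstract, the following is a minimal NumPy sketch of cross-attention over two learnable codebooks. It is not the authors' implementation: the function name `codebook_cross_attention`, the split into a key codebook and a value codebook, the scaled dot-product form, and all sizes (16-dimensional instance features, 64 codebook entries, 512-dimensional language features) are assumptions made only for this example, and the random arrays stand in for learned parameters.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def codebook_cross_attention(instance_feats, key_codebook, value_codebook):
    """Map low-dimensional per-Gaussian features to language features.

    Hypothetical layout (an assumption, not taken from the paper):
      instance_feats: (num_gaussians, low_dim)  queries
      key_codebook:   (num_entries, low_dim)    keys
      value_codebook: (num_entries, lang_dim)   values
    """
    low_dim = key_codebook.shape[1]
    # Scaled dot-product scores between each Gaussian and each codebook entry.
    scores = instance_feats @ key_codebook.T / np.sqrt(low_dim)
    weights = softmax(scores, axis=-1)
    # Each Gaussian's language feature is a weighted mix of codebook values.
    return weights @ value_codebook, weights


# Illustrative sizes only; random values stand in for learned parameters.
rng = np.random.default_rng(0)
num_gaussians, low_dim, num_entries, lang_dim = 1000, 16, 64, 512
instance_feats = rng.normal(size=(num_gaussians, low_dim))
key_codebook = rng.normal(size=(num_entries, low_dim))
value_codebook = rng.normal(size=(num_entries, lang_dim))

lang_feats, weights = codebook_cross_attention(
    instance_feats, key_codebook, value_codebook
)

# Number of stored values: a high-dimensional feature per Gaussian versus
# low-dimensional per-Gaussian features plus the two shared codebooks.
dense_count = num_gaussians * lang_dim
compact_count = num_gaussians * low_dim + key_codebook.size + value_codebook.size
```

In this sketch, Gaussians with similar instance features attend to the same codebook entries and so receive similar language features, and only the low-dimensional features are stored per Gaussian, which is one way to read the abstract's points about intra-instance similarity and reduced memory consumption.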

Preprint

Aug. 2025

Authors

E. Alegret • K. Li • S. Wang • S. Liang • M. Niemeyer • S. Gasperini • N. Navab • F. Tombari

Research Area

 C1 | Medicine

BibTeXKey: ALW+25