GALA: Guided Attention With Language Alignment for Open Vocabulary Gaussian Splatting

MCML Authors

Kunyi Li

→ Group Nassir Navab
Computer Aided Medical Procedures & Augmented Reality

Sen Wang

→ Group Nassir Navab
Computer Aided Medical Procedures & Augmented Reality

Dr. Stefano Gasperini

→ Group Nassir Navab
Computer Aided Medical Procedures & Augmented Reality

Prof. Dr. Nassir Navab

Core PI

Computer Aided Medical Procedures & Augmented Reality

Abstract

3D scene reconstruction and understanding have gained increasing popularity, yet existing methods still struggle to capture fine-grained, language-aware 3D representations from 2D images. In this paper, we present GALA, a novel framework for open-vocabulary 3D scene understanding with 3D Gaussian Splatting (3DGS). GALA distills a scene-specific 3D instance feature field via self-supervised contrastive learning. To extend to generalized language feature fields, we introduce the core contribution of GALA, a cross-attention module with two learnable codebooks that encode view-independent semantic embeddings. This design not only ensures intra-instance feature similarity but also supports seamless 2D and 3D open-vocabulary queries. It reduces memory consumption by avoiding per-Gaussian high-dimensional feature learning. Extensive experiments on real-world datasets demonstrate GALA's remarkable open-vocabulary performance on both 2D and 3D.
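
To illustrate the codebook idea from the abstract, the following is a minimal sketch and not the authors' implementation. The abstract does not say how the two codebooks enter the cross-attention, so the sketch assumes one codebook acts as keys and the other as values. Each Gaussian keeps only a low-dimensional instance feature, which is used as the attention query. All names, dimensions, and the random initialization are hypothetical.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def codebook_attention(query, key_codebook, value_codebook):
    """Map a low-dimensional instance feature to a language-space feature.

    Assumption: the query attends over the key codebook, and the resulting
    weights mix the rows of the value codebook (scaled dot-product attention).
    """
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, key) * scale for key in key_codebook])
    out_dim = len(value_codebook[0])
    return [
        sum(w * value[j] for w, value in zip(weights, value_codebook))
        for j in range(out_dim)
    ]


def cosine(a, b):
    """Cosine similarity, used here as a stand-in for an open-vocabulary query."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


# Hypothetical sizes: 4 codebook entries, 3-D instance features, 8-D language features.
random.seed(0)
NUM_ENTRIES, INSTANCE_DIM, LANGUAGE_DIM = 4, 3, 8
key_codebook = [[random.gauss(0, 1) for _ in range(INSTANCE_DIM)]
                for _ in range(NUM_ENTRIES)]
value_codebook = [[random.gauss(0, 1) for _ in range(LANGUAGE_DIM)]
                  for _ in range(NUM_ENTRIES)]

# Two Gaussians of the same instance share the same instance feature...
gaussian_a = [0.9, 0.1, -0.2]
gaussian_b = [0.9, 0.1, -0.2]
feat_a = codebook_attention(gaussian_a, key_codebook, value_codebook)
feat_b = codebook_attention(gaussian_b, key_codebook, value_codebook)

# ...so they receive the same language feature and the same query score.
text_embedding = [random.gauss(0, 1) for _ in range(LANGUAGE_DIM)]  # placeholder for a text encoder output
score = cosine(feat_a, text_embedding)
```

Under these assumptions, each Gaussian stores only the `INSTANCE_DIM`-dimensional query, while the high-dimensional language vectors live in the shared codebooks. This is one way to read the abstract's claim about avoiding per-Gaussian high-dimensional feature learning.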

misc ALW+25

Preprint

Aug. 2025

Authors

E. Alegret • K. Li • S. Wang • S. Liang • M. Niemeyer • S. Gasperini • N. Navab • F. Tombari

Links

arXiv

In Collaboration

Google
Visualais

Research Area

C1 | Medicine

BibTeXKey: ALW+25