Location-Free Scene Graph Generation

Abstract

Scene Graph Generation (SGG) is a visual understanding task that describes a scene as a graph of entities and their relationships, traditionally relying on spatial labels such as bounding boxes or segmentation masks. These requirements increase annotation costs and complicate integration with other modalities where spatial synchronization may be unavailable. In this work, we investigate the feasibility and effectiveness of scene graphs without location information, offering an alternative paradigm for scenarios where spatial data is unavailable. To this end, we propose the first method to generate location-free scene graphs directly from images, evaluate their correctness, and show their usefulness in several downstream tasks. Our proposed method, Pix2SG, models scene graph generation as an autoregressive sequence modeling task, predicting all instances and their relations as one output sequence. To enable evaluation without location matching, we propose a heuristic tree search algorithm that matches predicted scene graphs with ground truth graphs, bypassing the need for location-based metrics. We demonstrate the effectiveness of location-free scene graphs on three benchmark datasets and two downstream tasks, image retrieval and visual question answering, showing that they can achieve competitive performance with significantly fewer annotations. Our findings suggest that scene graphs can be generated and utilized effectively without location information, thus opening new avenues for scalable, structured and efficient visual representations, such as multimodal scene understanding with reduced dependency on modality-specific annotations. The code will be made available upon acceptance.
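To illustrate what matching without locations involves, the following is a minimal sketch, not the algorithm from the paper. It assumes a scene graph is given as a dictionary of instance classes plus a list of (subject, predicate, object) triplets, and it runs a plain depth-first search over class-consistent instance assignments, visiting the most connected predicted instances first. The names `count_matched` and `match_graphs` and the data layout are hypothetical, and the search is exhaustive, so it only suits small graphs.

```python
from typing import Dict, List, Set, Tuple

# A triplet is (subject_instance, predicate, object_instance).
Triplet = Tuple[str, str, str]


def count_matched(pred_triplets: List[Triplet],
                  gt_triplets: List[Triplet],
                  mapping: Dict[str, str]) -> int:
    """Count predicted triplets that land on a ground-truth triplet
    once predicted instances are renamed through `mapping`."""
    gt_set = set(gt_triplets)
    hits = 0
    for s, p, o in set(pred_triplets):
        if s in mapping and o in mapping and (mapping[s], p, mapping[o]) in gt_set:
            hits += 1
    return hits


def match_graphs(pred_classes: Dict[str, str],
                 gt_classes: Dict[str, str],
                 pred_triplets: List[Triplet],
                 gt_triplets: List[Triplet]) -> Tuple[Dict[str, str], int]:
    """Search for the instance assignment that maximizes matched triplets.

    Assumption: instances may only be matched if their classes agree,
    and each ground-truth instance is used at most once.
    """
    # Heuristic ordering: assign the most connected predicted instances first.
    degree: Dict[str, int] = {}
    for s, _, o in pred_triplets:
        degree[s] = degree.get(s, 0) + 1
        degree[o] = degree.get(o, 0) + 1
    order = sorted(pred_classes, key=lambda i: -degree.get(i, 0))

    best = {"score": -1, "mapping": {}}

    def search(idx: int, mapping: Dict[str, str], used: Set[str]) -> None:
        if idx == len(order):
            score = count_matched(pred_triplets, gt_triplets, mapping)
            if score > best["score"]:
                best["score"] = score
                best["mapping"] = dict(mapping)
            return
        inst = order[idx]
        for gt_inst, gt_cls in gt_classes.items():
            if gt_cls == pred_classes[inst] and gt_inst not in used:
                mapping[inst] = gt_inst
                used.add(gt_inst)
                search(idx + 1, mapping, used)
                del mapping[inst]
                used.remove(gt_inst)
        # Also consider leaving this predicted instance unmatched.
        search(idx + 1, mapping, used)

    search(0, {}, set())
    return best["mapping"], best["score"]


# Toy example: two people and a table, with no bounding boxes anywhere.
pred_classes = {"p0": "person", "p1": "person", "t0": "table"}
gt_classes = {"person_a": "person", "person_b": "person", "table_a": "table"}
pred_triplets = [("p0", "next to", "t0"), ("p1", "holding", "p0")]
gt_triplets = [("person_b", "next to", "table_a"),
               ("person_a", "holding", "person_b")]

mapping, score = match_graphs(pred_classes, gt_classes, pred_triplets, gt_triplets)
print(mapping, score)
```

In the toy example the two people can only be told apart by the relations they take part in, which is the difficulty a location-free evaluation has to resolve. A practical matcher would need pruning or a stronger heuristic to stay tractable on larger graphs.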

inproceedings


MULA @CVPR 2025

8th Multimodal Learning and Applications Workshop at IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, TN, USA, Jun 11-15, 2025.

Authors

E. Özsoy • F. Holm • C. Pellegrini • T. Czempiel • M. Saleh • N. Navab • B. Busam

Research Areas

 B1 | Computer Vision

 C1 | Medicine

BibTeXKey: OHP+25
