While latent diffusion models (LDMs) excel at creating imaginative images, they often lack precision in semantic fidelity and spatial control over where objects are generated. To address these deficiencies, we introduce the Box-it-to-Bind-it (B2B) module, a novel, training-free approach for improving spatial control and semantic accuracy in text-to-image (T2I) diffusion models. B2B targets three key challenges in T2I: catastrophic neglect, attribute binding, and layout guidance. The method comprises two main steps: (i) object generation, which adjusts the latent encoding to ensure each object is generated and confined within its specified bounding box, and (ii) attribute binding, which ensures that generated objects carry the attributes specified in the prompt. B2B is designed as a plug-and-play module compatible with existing T2I models such as Stable Diffusion and GLIGEN, markedly enhancing their performance on these key challenges. We assess our technique on the well-established CompBench and TIFA score benchmarks and the HRS dataset, where B2B not only surpasses methods specialized in either attribute binding or layout guidance but also integrates both capabilities to deliver stronger overall performance.
article
BibTeXKey: TGB+25a
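The abstract's first step, steering generation into a bounding box without any training, can be illustrated with a toy guidance energy over a token's cross-attention map. This is a minimal NumPy sketch under stated assumptions: the function names (`box_mask`, `inside_ratio`, `box_energy`) and the exact energy (one minus the attention mass inside the box) are illustrative choices, not the paper's actual reward formulation, which operates inside the diffusion model's denoising loop.

```python
import numpy as np

def box_mask(h, w, box):
    """Binary mask for a bounding box (x0, y0, x1, y1) on an h x w attention grid."""
    x0, y0, x1, y1 = box
    m = np.zeros((h, w))
    m[y0:y1, x0:x1] = 1.0
    return m

def inside_ratio(attn, mask):
    """Fraction of a token's cross-attention mass that falls inside the box."""
    attn = attn / attn.sum()
    return float((attn * mask).sum())

def box_energy(attn, mask):
    """Hypothetical guidance energy: low when the object's attention
    concentrates inside its box; a guidance step would nudge the latent
    in the direction that lowers this energy at each denoising step."""
    return 1.0 - inside_ratio(attn, mask)

# Toy usage: attention fully inside the box gives zero energy,
# attention fully outside gives the maximum energy of 1.
mask = box_mask(8, 8, (2, 2, 4, 4))
attn_in = np.zeros((8, 8)); attn_in[2:4, 2:4] = 1.0
attn_out = np.zeros((8, 8)); attn_out[6:8, 6:8] = 1.0
print(box_energy(attn_in, mask))   # -> 0.0
print(box_energy(attn_out, mask))  # -> 1.0
```

In a real pipeline this energy would be differentiated with respect to the latent (e.g. via autograd) and subtracted from it between denoising steps; the sketch only shows the scoring side of that loop.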