Box It to Bind It: Unified Layout Control and Attribute Binding in Text-to-Image Diffusion Models

Abstract

While latent diffusion models (LDMs) excel at creating imaginative images, they often lack precision in semantic fidelity and spatial control over where objects are generated. To address these deficiencies, we introduce the Box-it-to-Bind-it (B2B) module, a novel, training-free approach for improving spatial control and semantic accuracy in text-to-image (T2I) diffusion models. B2B targets three key challenges in T2I: catastrophic neglect, attribute binding, and layout guidance. The process encompasses two main steps: (i) object generation, which adjusts the latent encoding to guarantee object generation and directs it within specified bounding boxes, and (ii) attribute binding, which ensures that generated objects adhere to the attributes specified for them in the prompt. B2B is designed as a compatible plug-and-play module for existing T2I models such as Stable Diffusion and GLIGEN, markedly enhancing their performance on these key challenges. We assess our technique on the well-established CompBench and TIFA score benchmarks and on the HRS dataset, where B2B not only surpasses methods specialized in either attribute binding or layout guidance but also uniquely excels by integrating these capabilities to deliver enhanced overall performance.
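
As a rough illustration of what steering generation toward a bounding box can mean in practice, the sketch below scores how much of an object's attention map falls inside a target box. This is not the B2B objective, which the abstract does not spell out; it is a generic layout-guidance style measure, and the names `box_mask`, `inside_box_fraction`, and `box_loss`, as well as the normalized (x0, y0, x1, y1) box format, are assumptions made for this example only.

```python
import numpy as np


def box_mask(height, width, box):
    """Binary mask for a box given as (x0, y0, x1, y1) in normalized [0, 1] coordinates."""
    x0, y0, x1, y1 = box
    mask = np.zeros((height, width), dtype=np.float64)
    r0, r1 = int(round(y0 * height)), int(round(y1 * height))
    c0, c1 = int(round(x0 * width)), int(round(x1 * width))
    mask[r0:r1, c0:c1] = 1.0
    return mask


def inside_box_fraction(attn, box):
    """Fraction of the total (non-negative) attention mass that lies inside the box."""
    attn = np.asarray(attn, dtype=np.float64)
    total = attn.sum()
    if total <= 0:
        return 0.0  # no attention mass at all: treat as fully outside
    return float((attn * box_mask(*attn.shape, box)).sum() / total)


def box_loss(attn, box):
    """Hypothetical guidance loss: 0 when all attention is inside the box, 1 when none is."""
    return 1.0 - inside_box_fraction(attn, box)


if __name__ == "__main__":
    uniform = np.ones((4, 4))             # attention spread evenly over a 4x4 grid
    top_left = (0.0, 0.0, 0.5, 0.5)       # box covering the top-left quadrant
    print(box_loss(uniform, top_left))    # three quarters of the mass is outside the box
```

In a training-free setting, a quantity like this would typically be differentiated with respect to the latent at each denoising step so that the latent can be nudged toward the box; that step depends on the diffusion framework in use and is left out here.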

article


IEEE Transactions on Multimedia

Early Access. Sep. 2025.
Top Journal

Authors

A. Taghipour • M. Ghahremani • M. Bennamoun • A. M. Rekavandi • H. Laga • F. Boussaid

Links

DOI

Research Area

 C1 | Medicine

BibTeXKey: TGB+25a
