
Continuous, Subject-Specific Attribute Control in T2I Models by Identifying Semantic Directions


Abstract

In recent years, advances in text-to-image (T2I) diffusion models have substantially elevated the quality of their generated images. However, achieving fine-grained control over attributes remains a challenge due to the limitations of natural language prompts (for example, there is no continuous set of intermediate descriptions between "person" and "old person"). Although many methods have been introduced that augment the model or the generation process to enable such control, those that do not require a fixed reference image offer either fine-grained global control over attribute expression or coarse attribute control localized to specific subjects, but not both simultaneously. We show that there exist directions in the commonly used token-level CLIP text embeddings that enable fine-grained, subject-specific control of high-level attributes in text-to-image models. Based on this observation, we introduce one efficient optimization-free method and one robust optimization-based method to identify these directions for specific attributes from contrastive text prompts. We demonstrate that these directions can be used to augment the prompt text input with fine-grained control over attributes of specific subjects in a compositional manner (control over multiple attributes of a single subject) without having to adapt the diffusion model.
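
The abstract describes identifying attribute directions from contrastive prompts and applying them to the token embedding of a specific subject. The sketch below illustrates this general idea under our own assumptions and is not the authors' code: the averaged embedding difference at the subject token stands in for an optimization-free estimate, and the example prompts, the attribute ("old"), and the scale parameter are purely illustrative.

# Hypothetical sketch: estimate an "age" direction in token-level CLIP text
# embeddings from contrastive prompt pairs, then shift only the chosen
# subject token's embedding by a continuous amount (not the authors' code).
import torch
from transformers import CLIPTokenizer, CLIPTextModel

model_id = "openai/clip-vit-large-patch14"
tokenizer = CLIPTokenizer.from_pretrained(model_id)
text_encoder = CLIPTextModel.from_pretrained(model_id).eval()


@torch.no_grad()
def token_embeddings(prompt: str) -> torch.Tensor:
    """Token-level CLIP text embeddings, shape (max_length, dim)."""
    inputs = tokenizer(prompt, padding="max_length",
                       max_length=tokenizer.model_max_length,
                       truncation=True, return_tensors="pt")
    return text_encoder(**inputs).last_hidden_state[0]


def subject_index(prompt: str, subject: str) -> int:
    """Position of the (single-token) subject word inside the tokenized prompt."""
    ids = tokenizer(prompt, padding="max_length",
                    max_length=tokenizer.model_max_length,
                    truncation=True).input_ids
    subj_id = tokenizer(subject, add_special_tokens=False).input_ids[0]
    return ids.index(subj_id)


# Contrastive prompt pairs that differ only in the attribute of interest
# (illustrative examples, not taken from the paper).
pairs = [
    ("a photo of a person", "a photo of an old person", "person"),
    ("a portrait of a man", "a portrait of an old man", "man"),
]

# Optimization-free estimate: average the embedding difference at the subject token.
direction = torch.stack([
    token_embeddings(edited)[subject_index(edited, subj)]
    - token_embeddings(neutral)[subject_index(neutral, subj)]
    for neutral, edited, subj in pairs
]).mean(dim=0)

# Continuous, subject-specific control: shift only the targeted subject's token
# embedding, then feed the edited embeddings to the T2I model (for example via
# a diffusers pipeline's prompt_embeds argument) without changing its weights.
prompt = "a photo of a person and a dog"
scale = 0.6  # 0 = unchanged; larger values strengthen the attribute expression
embeds = token_embeddings(prompt).clone()
embeds[subject_index(prompt, "person")] += scale * direction

Because only the subject token's embedding is shifted, other subjects in the prompt (the dog, in this assumed example) are left unaffected, and several directions could in principle be added to the same token for compositional control.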



CVPR 2025

IEEE/CVF Conference on Computer Vision and Pattern Recognition. Nashville, TN, USA, Jun 11-15, 2025.
A* Conference

Authors

S. A. Baumann • F. Krause • M. Neumayr • N. Stracke • M. Sevi • V. T. Hu • B. Ommer

Links

DOI • GitHub

Research Area

 B1 | Computer Vision

BibTeX Key: BKN+25
