
30.10.2025


Strength of Gender Biases in AI Images Varies Across Languages

Alexander Fraser Shows Text-to-Image Generators Reproduce and Magnify Role Stereotypes

The team of MCML PI Alexander Fraser, together with researchers at TU Darmstadt, has studied how text-to-image generators deal with gender stereotypes in various languages. The results show that the models not only reflect gender biases but also amplify them. The direction and strength of the distortion depend on the language in question.

In social media, web searches and on posters: AI-generated images can now be found everywhere. Large language models (LLMs) such as ChatGPT are capable of converting simple input into deceptively realistic images. Researchers have now demonstrated that the generation of such artificial images not only reproduces gender biases, but actually magnifies them.

Models in Different Languages Investigated

The study explored models across nine languages and compared the results. Previous studies had generally focused only on English-language models. As a benchmark, the team developed the Multilingual Assessment of Gender Bias in Image Generation (MAGBIG). It is based on carefully controlled occupational designations. The study investigated four different types of prompts: direct prompts that use the ‘generic masculine’ in languages in which the generic term for an occupation is grammatically masculine (‘doctor’), indirect descriptions (‘a person working as a doctor’), explicitly feminine prompts (‘female doctor’) and ‘gender star’ prompts (the German convention intended to create a gender-neutral designation by using an asterisk, e.g. ‘Ärzt*innen’ for doctors).
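The four prompt types can be illustrated with a minimal Python sketch. The function name `prompt_variants`, the dictionary keys and the English templates are assumptions made for illustration, modelled only on the examples quoted above; they are not the actual MAGBIG templates.

```python
from typing import Dict, Optional


def prompt_variants(occupation: str, gender_star_form: Optional[str] = None) -> Dict[str, str]:
    """Build illustrative prompt variants for one occupation (hypothetical templates)."""
    # Choose "a" or "an" so the indirect description reads naturally.
    article = "an" if occupation[0].lower() in "aeiou" else "a"
    variants = {
        # Direct prompt: the bare occupational term (generic masculine in gendered languages).
        "direct": occupation,
        # Indirect description that avoids using the term as the subject.
        "indirect": f"a person working as {article} {occupation}",
        # Explicitly feminine prompt.
        "feminine": f"female {occupation}",
    }
    # The gender star is a German convention, so it is only added when a form is supplied.
    if gender_star_form is not None:
        variants["gender_star"] = gender_star_form
    return variants


if __name__ == "__main__":
    for kind, prompt in prompt_variants("doctor", "Ärzt*innen").items():
        print(f"{kind}: {prompt}")
```

Keeping the occupation fixed and varying only the wording is what makes the resulting images comparable across prompt types.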

To make the results comparable, the researchers included languages in which the names of occupations are gendered, such as German, Spanish and French. In addition, the study incorporated languages such as English and Japanese that use only one grammatical gender but have gendered pronouns (‘her’, ‘his’). And finally, it included languages without grammatical gender: Korean and Chinese.

AI Images Perpetuate and Magnify Role Stereotypes

The study found that direct prompts using the generic masculine produce the strongest biases. For example, occupations such as ‘accountant’ yield mostly images of white males, while prompts referring to caregiving professions tend to generate female-presenting images. Gender-neutral or ‘gender star’ forms mitigated these stereotypes only slightly, while images resulting from explicitly feminine prompts showed almost exclusively women. Along with the gender distribution, the researchers also analyzed how well the models understood and executed the various prompts. While neutral formulations reduced gender stereotypes, they also led to poorer matches between the text input and the generated image.
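One simple way to quantify such a skew is to compare the share of female-presenting images for a prompt against an even split. The following Python sketch is an illustrative assumption, not the metric used in the study; the label strings and the function names `female_share` and `parity_gap` are hypothetical.

```python
from typing import Sequence


def female_share(labels: Sequence[str]) -> float:
    """Fraction of images labelled 'female' among all labelled images."""
    if not labels:
        raise ValueError("at least one label is required")
    return sum(1 for label in labels if label == "female") / len(labels)


def parity_gap(labels: Sequence[str]) -> float:
    """Absolute distance of the female share from an even 50/50 split.

    0.0 means a balanced set; 0.5 means every image shows the same gender.
    """
    return abs(female_share(labels) - 0.5)


if __name__ == "__main__":
    # Hypothetical labels for eight images generated from one prompt.
    labels = ["male"] * 6 + ["female"] * 2
    print(female_share(labels), parity_gap(labels))
```

Computing the gap per prompt type and per language would allow the kind of comparison described above, for instance between direct and indirect wordings of the same occupation.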

“Our results clearly show that the language structures have a considerable influence on the balance and bias of AI image generators,” says Alexander Fraser, Professor for Data Analytics & Statistics at TUM Campus in Heilbronn. “Anyone using AI systems should be aware that different wordings may result in entirely different images and may therefore magnify or mitigate societal role stereotypes.”

“AI image generators are not neutral: they illustrate our prejudices in high resolution, and this depends crucially on language. Especially in Europe, where many languages converge, this is a wake-up call: fair AI must be designed with language sensitivity in mind,” adds Kristian Kersting, co-director of hessian.AI and co-spokesperson for the “Reasonable AI” cluster of excellence at TU Darmstadt.

Remarkably, bias varies across languages without a clear link to grammatical structures. For example, switching from French to Spanish prompts leads to a substantial increase in gender bias, despite both languages distinguishing in the same way between male and female occupational terms.

A* Conference
F. Friedrich • K. Hämmerl • P. Schramowski • M. Brack • J. Libovicky • K. Kersting • A. Fraser
Multilingual Text-to-Image Generation Magnifies Gender Stereotypes and Prompt Engineering May Not Help You.
ACL 2025 - 63rd Annual Meeting of the Association for Computational Linguistics. Vienna, Austria, Jul 27-Aug 01, 2025.