18.12.2025
“See, Don’t Assume”: Revealing and Reducing Gender Bias in AI
MCML Research Insight - With Leander Girrbach, Yiran Huang, Stephan Alaniz and Zeynep Akata
Using AI and LLMs at work feels almost unavoidable today: they make things easier, but they can also go wrong in important ways. One of the trickiest problems? Gender bias. Ask an AI assistant about someone’s skills based on a photo, for example, and it may confidently call them a “born leader” or say they “work well under pressure” with no real evidence.
A recent ICLR 2025 paper from MCML, Helmholtz Munich, and UC Berkeley puts 22 popular open-source vision-language assistants under the microscope and shows how to spot gender bias and dial it down without reducing overall performance. In their paper “Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs)”, MCML Junior Members Leander Girrbach, Yiran Huang, and former member Stephan Alaniz, together with MCML PI Zeynep Akata and collaborator Trevor Darrell, investigate what biases these models exhibit and evaluate ways to reduce them.
©Girrbach et al.
Figure 1: The authors measure gender bias across personality traits, work-related soft skills, and occupations. First, they collect suitable attributes and integrate them into a predefined prompt template. The prompt and an image are provided to the VLAs. They analyze the VLAs’ responses by comparing the probability of outputting the "yes" option across genders and apply several debiasing methods.
«We study gender bias in 22 popular open-source VLAs with respect to personality traits, skills, and occupations. Our results show that VLAs replicate human biases likely present in the data, such as real-world occupational imbalances.»
Leander Girrbach et al.
MCML Junior Members
How it Works
The authors test three areas (personality traits, work-related skills, and occupations) to see how the models assign these to men and women. Since the data used to train such systems can contain hidden stereotypes, those same biases may quietly carry over into the model’s behavior. Without checking or correcting for these biases, AI systems risk repeating and reinforcing real-world stereotypes in their answers.
To measure gender bias, they select a set of 5,000 pictures, balanced by gender and with no obvious job clues like uniforms or tools. For each personality trait and skill, they then ask the models whether the person shown in the image possesses it, and for each occupation, whether that person would be well suited to the job. Models are prompted to answer with Yes/No/Unsure, and prompts are phrased in multiple ways to avoid effects of specific formulations. As a bias metric, they track the model’s probability of saying “Yes”: the gap in those “Yes” scores between men and women on the same question is taken as the bias.
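To make the metric concrete, here is a minimal Python sketch of how such a gap could be computed. It is not code from the paper: the helper `yes_probability` and the prompt wordings are hypothetical placeholders standing in for querying a VLA and renormalizing its answer probabilities over the Yes/No/Unsure options.

```python
# Minimal sketch (not the authors' code): given a hypothetical helper
# yes_probability(image, prompt) returning the model's probability of
# answering "Yes" (renormalized over Yes/No/Unsure), the gender bias for
# one attribute is the gap in average "Yes" probability between
# male- and female-labeled images, averaged over prompt variants.
from statistics import mean

def attribute_bias(yes_probability, male_images, female_images, prompts):
    """Average P("Yes") gap for one attribute."""
    gaps = []
    for prompt in prompts:  # several phrasings of the same question
        p_male = mean(yes_probability(img, prompt) for img in male_images)
        p_female = mean(yes_probability(img, prompt) for img in female_images)
        gaps.append(p_male - p_female)  # >0: attribute leans "male", <0: leans "female"
    return mean(gaps)

# Illustrative prompt variants for the skill "leadership":
prompts = [
    "Does the person in the image have strong leadership skills? Answer Yes, No, or Unsure.",
    "Would you say this person is a good leader? Answer Yes, No, or Unsure.",
]
```

A bias near zero means the model treats the attribute the same for both genders; a large positive or negative value signals a systematic gender gap for that trait, skill, or occupation.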
Once they find bias, they test different ways to fix it: retraining the model, either fully or with parameter-efficient (low-rank) updates, so that it gives neutral answers (like “Unsure”) when there is no clear evidence in the image, as well as changing prompts and pruning model parameters.
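As an illustration of what such a training signal might look like, here is a small PyTorch sketch. It is not the authors’ exact objective: the token id, the weighting factor `lam`, and the idea of combining the neutrality penalty with a standard language-modeling loss are assumptions made for the example.

```python
# Sketch of a debiasing objective (an assumption, not the paper's exact loss):
# on bias-probing prompts, nudge the model's answer toward the neutral
# "Unsure" option, while a regular language-modeling loss on ordinary data
# preserves general capabilities.
import torch
import torch.nn.functional as F

def debias_loss(answer_logits, unsure_token_id, lm_loss, lam=0.1):
    """answer_logits: (batch, vocab) logits at the answer position."""
    target = torch.full(
        (answer_logits.size(0),), unsure_token_id, device=answer_logits.device
    )
    # Penalty pulling the predicted answer toward "Unsure".
    neutral_penalty = F.cross_entropy(answer_logits, target)
    # Keep the usual objective so performance on unrelated tasks is retained.
    return lm_loss + lam * neutral_penalty
```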
«We recommend using full fine-tuning for more aggressive gender bias reduction at the cost of greater performance loss and using LoRAs for more cautious debiasing while maintaining more of the original performance.»
Leander Girrbach et al.
MCML Junior Members
Results
The results showed a clear pattern of bias. Men were more often described with negative traits like “moody” or “arrogant”, while women were labeled with more positive ones. In terms of skills, men were linked to things like “leadership” and “working under pressure”, while women were more often said to be “good communicators” or “great at multitasking”.
Occupation biases closely matched real-world patterns, too: roles like “construction worker” and “mechanic” were mostly assigned to men, while jobs like “nurse” and “secretary” were more often linked to women, mirroring existing labor market inequalities. Many models also avoided saying “Unsure”, giving confident answers even when there wasn’t enough visual evidence.
When the researchers adjusted the models to be fairer, the results improved without a significant loss in performance on general tasks unrelated to gender bias. Fine-tuning was the most effective fix: training the entire model reduced bias the most, but performance on unrelated tasks also took a larger hit, while parameter-efficient (LoRA) fine-tuning reduced bias slightly less but preserved more of the original performance. These changes cut gender bias roughly in half for personality traits and nearly eliminated it for skills and occupations. Other fixes, such as tweaking prompts or pruning model parameters, helped less and sometimes even hurt accuracy. After debiasing, the models also responded more reasonably, for example choosing “Unsure” when there was no clear visual evidence, rather than guessing.
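The trade-off between the two fine-tuning regimes can be illustrated with a short sketch using the Hugging Face PEFT library for the low-rank variant. The `model` variable stands for an already-loaded VLA, and the rank, alpha, and target-module choices are illustrative placeholders rather than the paper’s configuration.

```python
# Sketch contrasting the two fine-tuning regimes discussed above.
# LoRA hyperparameters and target modules below are assumptions for illustration.
from peft import LoraConfig, get_peft_model

def full_finetune(model):
    """Option A: every weight trainable -- strongest debiasing, but the
    biggest risk of hurting performance on unrelated tasks."""
    for p in model.parameters():
        p.requires_grad = True
    return model

def lora_finetune(model):
    """Option B: train only small low-rank adapters -- gentler debiasing
    that keeps more of the original behavior intact."""
    cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                     target_modules=["q_proj", "v_proj"])
    return get_peft_model(model, cfg)
```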
©Girrbach et al.
Figure 2: Qualitative results demonstrating the effect of different debiasing methods. For two images (female-labeled on top, male-labeled on bottom), the plots show the distributions over answer options before (red) and after debiasing (blue-green) for four different prompt variants.
Why it Matters
These models show up in hiring tools, education, support, and everyday apps; left unchecked, their biases become stereotypes applied at scale. It is therefore necessary to test these models for potential biases, especially gender bias, and to develop methods to reduce them.
Challenges
The authors acknowledge that gender is not a binary concept, but due to data constraints they used a binary male/female distinction to measure bias. Future work could address this limitation and also add ethnicity as a further axis of analysis, to understand how such biases are reflected in the models’ outputs.
Further Reading & Reference
If you’d like to dig deeper into how the team tested, quantified, and reduced gender bias in vision–language models or try the benchmark yourself, check out the full paper, presented at ICLR 2025, one of the world’s leading machine learning conferences.
Revealing and Reducing Gender Biases in Vision and Language Assistants (VLAs).
ICLR 2025 - 13th International Conference on Learning Representations. Singapore, Apr 24-28, 2025. URL