05.06.2022

MCML Researchers With Two Papers at NAACL 2022

Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL 2022). Seattle, WA, USA, 10.06.2022–15.06.2022

We are happy to announce that MCML researchers are represented with two papers at NAACL 2022. Congrats to our researchers!

Findings Track (2 papers)

V. Steinborn, P. Dufter, H. Jabbar and H. Schütze.
An Information-Theoretic Approach and Dataset for Probing Gender Stereotypes in Multilingual Masked Language Models.
NAACL 2022 - Findings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics. Seattle, WA, USA, Jun 10-15, 2022.
Abstract

Bias research in NLP is a rapidly growing and developing field. Similar to CrowS-Pairs (Nangia et al., 2020), we assess gender bias in masked-language models (MLMs) by studying pairs of sentences with gender swapped person references. Most bias research focuses on and often is specific to English. Using a novel methodology for creating sentence pairs that is applicable across languages, we create, based on CrowS-Pairs, a multilingual dataset for English, Finnish, German, Indonesian and Thai. Additionally, we propose SJSD, a new bias measure based on Jensen–Shannon divergence, which we argue retains more information from the model output probabilities than other previously proposed bias measures for MLMs. Using multilingual MLMs, we find that SJSD diagnoses the same systematic biased behavior for non-English that previous studies have found for monolingual English pre-trained MLMs. SJSD outperforms the CrowS-Pairs measure, which struggles to find such biases for smaller non-English datasets.
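For readers unfamiliar with the Jensen–Shannon divergence that SJSD builds on, here is a minimal sketch of the plain divergence between two discrete probability distributions. It is not the paper's SJSD measure, whose exact definition is not given in this announcement. The function names and the example probabilities are hypothetical and chosen only for illustration.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence KL(p || q) in bits.

    Terms with p_i == 0 contribute nothing and are skipped.
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions.

    With base-2 logarithms the result lies between 0 and 1 and is
    symmetric in p and q.
    """
    # Mixture distribution; m_i > 0 whenever p_i > 0 or q_i > 0,
    # so the KL terms below never divide by zero.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical model output probabilities over the same three candidate
# tokens for the two sentences of a gender-swapped pair (made-up numbers).
probs_sentence_a = [0.7, 0.2, 0.1]
probs_sentence_b = [0.4, 0.4, 0.2]

divergence = js_divergence(probs_sentence_a, probs_sentence_b)
```

A value near 0 means the two distributions are nearly identical, while larger values mean the model's predictions differ more between the two sentences.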

MCML Authors
Hinrich Schütze

Prof. Dr.

Computational Linguistics


M. Zhao, F. Mi, Y. Wang, M. Li, X. Jiang, Q. Liu and H. Schütze.
LMTurk: Few-Shot Learners as Crowdsourcing Workers in a Language-Model-as-a-Service Framework.
NAACL 2022 - Findings of the Annual Conference of the North American Chapter of the Association for Computational Linguistics. Seattle, WA, USA, Jun 10-15, 2022.
Abstract

Vast efforts have been devoted to creating high-performance few-shot learners, i.e., large-scale pretrained language models (PLMs) that perform well with little downstream task training data. Training PLMs has incurred significant cost, but utilizing the few-shot learners is still challenging due to their enormous size. This work focuses on a crucial question: How to make effective use of these few-shot learners? We propose LMTurk, a novel approach that treats few-shot learners as crowdsourcing workers. The rationale is that crowdsourcing workers are in fact few-shot learners: They are shown a few illustrative examples to learn about a task and then start annotating. LMTurk employs few-shot learners built upon PLMs as workers. We show that the resulting annotations can be utilized to train models that solve the task well and are small enough to be deployable in practical scenarios. Active learning is integrated into LMTurk to reduce the amount of queries made to PLMs, minimizing the computational cost of running PLM inference passes. Altogether, LMTurk is an important step towards making effective use of current PLMs.
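The abstract does not say which active learning strategy LMTurk uses. As a rough illustration of how active learning can limit the number of PLM queries, the sketch below applies entropy-based uncertainty sampling, a common strategy assumed here purely for illustration. The function names, the `budget` parameter and the example predictions are all hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of a predicted class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_for_annotation(predictions, budget):
    """Return the indices of the `budget` most uncertain examples.

    `predictions` holds one class-probability list per unlabeled example,
    as produced by the small model being trained. Only the selected
    examples would be sent to the PLM worker for annotation.
    """
    ranked = sorted(
        range(len(predictions)),
        key=lambda i: entropy(predictions[i]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:budget]


# Hypothetical small-model predictions for three unlabeled examples.
small_model_predictions = [
    [0.9, 0.1],  # confident
    [0.5, 0.5],  # maximally uncertain
    [0.6, 0.4],  # fairly uncertain
]

to_annotate = select_for_annotation(small_model_predictions, budget=2)
```

Under this assumed strategy, the confident first example is skipped, and only the two uncertain ones would trigger a PLM inference pass.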

MCML Authors
Hinrich Schütze

Prof. Dr.

Computational Linguistics

