#p_plank

MWG+25

BlackboxNLP-2025 MIB Shared Task: Exploring Ensemble Strategies for Circuit Localization Methods

BlackboxNLP @EMNLP 2025

#p_plank #p_schuetze

BMP+25

The Validation Gap: A Mechanistic Analysis of How Language Models Compute Arithmetic but Fail to Validate It

EMNLP 2025

#p_plank

CLK+25

Threading the Needle: Reweaving Chain-of-Thought Reasoning to Explain Human Label Variation

EMNLP 2025

#p_plank

DMS+25

Reason to Rote: Rethinking Memorization in Reasoning

EMNLP 2025

#p_plank

HCP+25

LiTEx: A Linguistic Taxonomy of Explanations for Understanding Within-Label Variation in Natural Language Inference

EMNLP 2025

#p_plank

LCB+25

PERSEVAL: A Framework for Perspectivist Classification Evaluation

EMNLP 2025

#p_plank

TPF25

RACQUET: Unveiling the Dangers of Overlooked Referential Ambiguity in Visual LLMs

EMNLP 2025

#p_plank

WML+25

M-ABSA: A Multilingual Dataset for Aspect-Based Sentiment Analysis

EMNLP 2025

#p_kreuter #p_plank #p_schuetze

LBB+25

Make Every Letter Count: Building Dialect Variation Dictionaries From Monolingual Corpora

Findings @EMNLP 2025

#p_plank

LWK+25a

Tracing Multilingual Factual Knowledge Acquisition in Pretraining

Findings @EMNLP 2025

#p_plank #p_schuetze

ZCP+25

MAKIEval: A Multilingual Automatic WiKidata-Based Framework for Cultural Awareness Evaluation for LLMs

Findings @EMNLP 2025

#p_hedderich #p_plank

ZPL+25

What Media Frames Reveal About Stance: A Dataset and Study About Memes in Climate Change Discourse

Findings @EMNLP 2025

#p_plank

ZHK+25a

Evaluating Large Language Models for Cross-Lingual Retrieval

Findings @EMNLP 2025

#p_plank

LCP+25a

LeWiDi-2025 at NLPerspectives: The Third Edition of the Learning With Disagreements Shared Task

LeWiDi @EMNLP 2025

#p_plank

EMK+25

Aligning NLP Models With Target Population Perspectives Using PAIR: Population-Aligned Instance Replication

NLPerspectives @EMNLP 2025

#p_kern #p_kreuter #p_plank

BWP+25

Preprint (Oct. 2025)

#p_plank

HCP+25a

Agree, Disagree, Explain: Decomposing Human Label Variation in NLI Through the Lens of Explanations

Preprint (Oct. 2025)

#p_plank

MCS+25

Too Open for Opinion? Embracing Open-Endedness in Large Language Models for Social Simulation

Preprint (Oct. 2025)

#p_kreuter #p_plank

OMP+25

If Probable, Then Acceptable? Understanding Conditional Acceptability Judgments in Large Language Models

Preprint (Oct. 2025)

#p_plank

RPB+25

BoN Appetit Team at LeWiDi-2025: Best-of-N Test-Time Scaling Can Not Stomach Annotation Disagreements (Yet)

Preprint (Oct. 2025)

#p_plank

WJP+25

Is It Thinking or Cheating? Detecting Implicit Reward Hacking by Measuring Reasoning Effort

Preprint (Oct. 2025)

#p_plank

XTP25

From Noise to Signal to Selbstzweck: Reframing Human Label Variation in the Era of Post-Training in NLP

Preprint (Oct. 2025)

#p_plank

BBB+25

LLMs Instead of Human Judges? A Large Scale Empirical Study Across 20 NLP Evaluation Tasks

ACL 2025

#p_plank

ELP+25

Probing LLMs for Multilingual Discourse Generalization Through a Unified Label Set

ACL 2025

#p_hedderich #p_plank

HWZ+25

What's the Difference? Supporting Users in Identifying the Effects of Prompt and Model Changes Through Token Patterns

ACL 2025

#p_hedderich #p_plank

MLZ+25

Pragmatics in the Era of Large Language Models: A Survey on Datasets, Evaluation, Opportunities and Challenges

ACL 2025

#p_kreuter #p_plank

MYH+25

Algorithmic Fidelity of Large Language Models in Generating Synthetic German Public Opinions: A Case Study

ACL 2025

#p_bischl #p_kreuter #p_plank

MWP25

Circuit Compositions: Exploring Modular Structures in Transformer-Based Language Models

ACL 2025

#p_plank

SFP25

Do LLMs Give Psychometrically Plausible Responses in Educational Assessments?

BEA @ACL 2025

#p_plank

BFT25

Analyzing the Effect of Linguistic Similarity on Cross-Lingual Transfer: Tasks and Experimental Setups Matter

Findings @ACL 2025

#p_plank

CPK+25

A Rose by Any Other Name: LLM-Generated Explanations Are Good Proxies for Human Explanations to Collect Label Distributions on NLI

Findings @ACL 2025

#p_plank

GAB+25

Revisiting Active Learning Under (Human) Label Variation

Preprint (Jul. 2025)

#p_bischl #p_kauermann #p_plank

BWF+25

A Multi-Dialectal Dataset for German Dialect ASR and Dialect-to-Standard Speech Translation

Preprint (Jun. 2025)

#p_plank

CLP+25

Evaluation Should Not Ignore Variation: On the Impact of Reference Set Choice on Summarization Metrics

Preprint (Jun. 2025)

#p_plank

JSH+25

MultiplEYE: Creating a Multilingual Eye-Tracking-While-Reading Corpus

ETRA 2025

#p_plank

EDM+25

Grokking ExPLAIND: Unifying Model, Data, and Training Attribution to Study Model Behavior

Preprint (May 2025)

#p_hedderich #p_plank

SDH+25

Languages in Multilingual Speech Foundation Models Align Both Phonetically and Semantically

Preprint (May 2025)

#p_plank

WWL+25

Refusal Direction Is Universal Across Safety-Aligned Languages

Preprint (May 2025)

#p_plank #p_schuetze

SP25

Dialetto, Ma Quanto Dialetto? Transcribing and Evaluating Dialects on a Continuum

Findings @NAACL 2025

#p_plank

MES+25

Lost in Inference: Rediscovering the Role of Natural Language Inference for Large Language Models

NAACL 2025

#p_plank

Bla25

Beyond 'Noisy' Text: How (And Why) to Process Dialect Data

W-NUT @NAACL 2025

#p_plank

WHR+25

Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation

ICLR 2025

#p_plank

MZR+25

Enabling Systematic Generalization in Abstract Spatial Reasoning Through Meta-Learning for Compositionality

Preprint (Apr. 2025)

#p_plank

SWZ+25

Think Before Refusal: Triggering Safety Reflection in LLMs to Mitigate False Refusal Behavior

Preprint (Mar. 2025)

#p_navab #p_plank

LFP25

Mind the Uncertainty in Human Disagreement: Evaluating Discrepancies Between Model Predictions and Human Responses in VQA

AAAI 2025

#p_plank #p_seidl

FMB+25

Using Natural Language Processing to Analyse Text Data in Behavioural Science

Nature Reviews Psychology 4. Feb. 2025

#p_feuerriegel #p_plank

XSE+25

Better Aligned With Survey Respondents or Training Data? Unveiling Political Leanings of LLMs on U.S. Supreme Court Cases

Preprint (Feb. 2025)

#p_plank

LKB+25

Cross-Dialect Information Retrieval: Information Access in Low-Resource and High-Variance Languages

COLING 2025

#p_plank

MBP25

Evaluating Pixel Language Models on Non-Standardized Languages

COLING 2025

#p_plank

BKP25

Add Noise, Tasks, or Layers? MaiNLP at the VarDial 2025 Shared Task on Norwegian Dialectal Slot and Intent Detection

VarDial @COLING 2025

#p_plank

KBP25

Improving Dialectal Slot and Intent Detection With Auxiliary Tasks: A Multi-Dialectal Bavarian Case Study

VarDial @COLING 2025

#p_plank

LPP+25

Neural Text Normalization for Luxembourgish Using Real-Life Variation Data

VarDial @COLING 2025

#p_plank

ZLW+24

FinerCut: Finer-Grained Interpretable Layer Pruning for Large Language Models

Compression Workshop @NeurIPS 2024

#p_bischl #p_plank

BCF+24a

PERSEID - Perspectivist Irony Detection: A CALAMITA Challenge

CLiC-It 2024

#p_plank

BCW+24

Data Augmentation Through Back-Translation for Stereotypes and Irony Detection

CLiC-It 2024

#p_plank

FPS+24

GFG - Gender-Fair Generation: A CALAMITA Challenge

CLiC-It 2024

#p_plank

LAS+24

GDTB: Genre Diverse Data for English Shallow Discourse Parsing Across Modalities, Text Types, and Domains

EMNLP 2024

#p_plank

MP24b

Liar, Liar, Logical Mire: A Benchmark for Suppositional Reasoning in Large Language Models

EMNLP 2024

#p_plank

BCL+24

I’m Sure You’re a Real Scholar Yourself: Exploring Ironic Content Generation by Large Language Models

Findings @EMNLP 2024

#p_plank

CWP+24

'Seeing the Big Through the Small': Can LLMs Approximate Human Judgment Distributions on NLI From a Few Explanations?

Findings @EMNLP 2024

#p_plank

MWH+24

The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models

Findings @EMNLP 2024

#p_hedderich #p_kreuter #p_plank

SLF+24

To Know or Not to Know? Analyzing Self-Consistency of Large Language Models Under Ambiguity

Findings @EMNLP 2024

#p_plank

WZP+24

MultiClimate: Multimodal Stance Detection on Climate Change Videos

NLP4PI @EMNLP 2024

#p_plank

MP24a

Beyond Accuracy: Evaluating the Reasoning Behavior of Large Language Models--A Survey

COLM 2024

#p_plank

WHM24

Look at the Text: Instruction-Tuned Language Models Are More Robust Multiple Choice Selectors Than You Think

COLM 2024

#p_kreuter #p_plank

BKP+24

MaiBaam Annotation Guidelines

Preprint (Oct. 2024)

#p_plank

CWM+24

Understanding When Tree of Thoughts Succeeds: Larger Models Excel in Generation, Not Discrimination

Preprint (Oct. 2024)

#p_hedderich #p_plank

BPS+24

What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects

ACL 2024

#p_plank #p_schuetze

MP24

Comparing Inferential Strategies of Humans and Large Language Models in Deductive Reasoning

ACL 2024

#p_plank

WPM+24

VariErr NLI: Separating Annotation Error From Human Label Variation

ACL 2024

#p_plank

XTI+24

ACL 2024

#p_plank

ZPB24

CLIMATELI: Evaluating Entity Linking on Climate Change Data

ClimateNLP @ACL 2024

#p_plank

WMH+24

My Answer Is C: First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models

Findings @ACL 2024

#p_kreuter #p_plank

EPK24

Position: Insights From Survey Methodology Can Improve Training Data

ICML 2024

#p_kreuter #p_plank

ZSP+24a

MaiNLP at SemEval-2024 Task 1: Analyzing Source Language Selection in Cross-Lingual Textual Relatedness

SemEval @NAACL 2024

#p_plank

BKB+24

MaiBaam: A Multi-Dialectal Bavarian Universal Dependency Treebank

LREC-COLING 2024

#p_plank #p_schuetze

MP24c

IndirectQA: Understanding Indirect Answers to Implicit Polar Questions in French and Spanish

LREC-COLING 2024

#p_plank

PSS+24

Sebastian, Basti, Wastl?! Recognizing Named Entities in Bavarian Dialectal Data

LREC-COLING 2024

#p_plank

WJG+24

Slot and Intent Detection Resources for Bavarian and Lithuanian: Assessing Translations vs Natural Queries to Digital Assistants

LREC-COLING 2024

#p_plank

ZWH+24

Constructions Are So Difficult That Even Large Language Models Get Them Right for the Wrong Reasons

LREC-COLING 2024

#p_plank #p_schuetze

GHA+24

More Labels or Cases? Assessing Label Variation in Natural Language Inference

UnImplicit 2024

#p_bischl #p_plank

PSL+24

Different Tastes of Entities: Investigating Human Label Variation in Named Entity Annotations

UnImplicit 2024

#p_plank

ABP24

Exploring the Robustness of Task-Oriented Dialogue Systems for Colloquial German Varieties

EACL 2024

#p_plank

BFP+24

Interpreting Predictive Probabilities: Model Confidence or Human Label Variation?

EACL 2024

#p_plank

ZGK+24

NNOSE: Nearest Neighbor Occupational Skill Extraction

EACL 2024

#p_plank

ZGP24

Entity Linking in the Job Market Domain

Findings @EACL 2024

#p_plank

SPP+24

EEVEE: An Easy Annotation Tool for Natural Language Processing

LAW @EACL 2024

#p_plank

WLA+24a

Donkii: Characterizing and Detecting Errors in Instruction-Tuning Datasets

LAW @EACL 2024

#p_plank

ZWS+23

LoHoRavens: A Long-Horizon Language-Conditioned Benchmark for Robotic Tabletop Manipulation

Robot Learning @NeurIPS 2023

#p_plank #p_schuetze

GBA+23

What Comes Next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability

EMNLP 2023

#p_plank

LMG+23

Establishing Trustworthiness: Rethinking Tasks and Model Evaluation

EMNLP 2023

#p_plank

WP23

ACTOR: Active Learning With Annotator-Specific Classification Heads to Embrace Human Label Variation

EMNLP 2023

#p_plank

XTI+23

From Dissonance to Insights: Dissecting Disagreements in Rationale Construction for Case Outcome Classification

EMNLP 2023

#p_plank

MGP+23

Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts During Language Model Training

Findings @EMNLP 2023

#p_plank

WP23a

ActiveAED: A Human in the Loop Improves Annotation Error Detection

Findings @ACL 2023

#p_plank

BDI+23

Uncertainty in Natural Language Generation: From Theory to Applications

Preprint (Jul. 2023)

#p_plank

BSP23b

A Survey of Corpora for Germanic Low-Resource Languages and Dialects

NoDaLiDa 2023

#p_plank #p_schuetze

WWS+23a

How to Distill Your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives

EACL 2023

#p_plank #p_schuetze

BSP23a

Does Manipulating Tokenization Aid Cross-Lingual Transfer? A Study on POS Tagging for Non-Standardized Languages

VarDial @EACL 2023

#p_plank #p_schuetze

BAP+22

Stop Measuring Calibration When Humans Disagree

EMNLP 2022

#p_plank

BMZ+22

Evidence > Intuition: Transferability Estimation for Encoder Selection

EMNLP 2022

#p_plank

MVP22

Spectral Probing

EMNLP 2022

#p_plank

Pla22

The 'Problem' of Human Label Variation: On Ground Truth in Data, Modeling and Evaluation

EMNLP 2022

#p_plank

BP22a

CrossRE: A Cross-Domain Dataset for Relation Extraction

Findings @EMNLP 2022

#p_plank

UBM+22

Experimental Standards for Deep Learning in Natural Language Processing Research

Findings @EMNLP 2022

#p_plank