Leveraging Sample Difficulty in Computer Audition Analysis

MCML Authors

Björn Schuller

Prof. Dr.

Core PI

Abstract

Despite the abundance and variety of computer audition (CA) tasks, only a few studies have investigated the delicate interplay between task data and deep learning model training. From a data perspective, the current literature lacks explanations of how CA-specific dataset characteristics influence model training, and why some samples are harder to learn than others. To bridge this gap, we leverage model-based estimates of sample difficulty as a tool to identify hard and easy samples in a dataset, allowing us to examine aspects of difficulty in three common but dissimilar CA tasks: acoustic scene classification, speech command recognition, and music genre recognition. Our results indicate that the difficulty of training data can provide a good estimate of test performance at the class level. We further identify distributional differences between hard and easy samples, which, in the case of the speech commands dataset, correspond to wrongly labelled or non-speech samples and an undesirable model focus on the edges of the input. Finally, we analyse how including and excluding the easiest and hardest samples within datasets impacts model training.
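
The abstract does not state which difficulty estimator the authors use. As a rough illustration of what a model-based difficulty estimate could look like, the sketch below assumes one common proxy: the mean per-sample training loss across epochs. The function names, the 10% default fraction, and the toy loss values are hypothetical and are not taken from the paper.

```python
import numpy as np


def sample_difficulty(losses):
    """Assumed proxy: mean training loss per sample over all epochs.

    losses has shape (n_epochs, n_samples); higher values mean harder samples.
    """
    return np.asarray(losses, dtype=float).mean(axis=0)


def split_easy_hard(difficulty, fraction=0.1):
    """Return the indices of the easiest and hardest `fraction` of samples."""
    k = max(1, int(round(fraction * len(difficulty))))
    order = np.argsort(difficulty, kind="stable")  # ascending: easiest first
    return order[:k], order[-k:]


def class_difficulty(difficulty, labels):
    """Average the per-sample difficulty within each class label."""
    labels = np.asarray(labels)
    return {
        c.item(): float(difficulty[labels == c].mean())
        for c in np.unique(labels)
    }


# Toy example: 2 epochs, 4 samples, 2 classes (made-up numbers).
losses = [[0.1, 2.0, 0.5, 1.0],
          [0.3, 1.0, 0.5, 3.0]]
labels = [0, 0, 1, 1]

difficulty = sample_difficulty(losses)
easy_idx, hard_idx = split_easy_hard(difficulty, fraction=0.25)
per_class = class_difficulty(difficulty, labels)
```

Under this assumption, the per-class averages are the quantity one would compare against class-level test performance, and the easy and hard index sets are the subsets one would include or exclude when retraining.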

article MTR+26a


IEEE Access

11.2023. May. 2026.
Top Journal

Authors

M. Milling • A. Triantafyllopoulos • S. Rampp • A. Akman • B. W. Schuller

Research Area

 B3 | Multimodal Perception

BibTeXKey: MTR+26a