Despite the abundance and variety of computer audition (CA) tasks, only a few studies have investigated the delicate interplay between task data and deep learning model training. From a data perspective, the current literature lacks explanations of how CA-specific dataset characteristics influence model training, and why some samples are harder to learn than others. To bridge this gap, we leverage model-based estimations of sample difficulty as a tool to identify hard and easy samples in a dataset, allowing us to examine aspects of difficulty in three common but dissimilar CA tasks: acoustic scene classification, speech command recognition, and music genre recognition. Our results indicate that the difficulty of training data can provide a good estimate of test performance at the class level. We further identify distributional differences between hard and easy samples, which, in the case of the speech commands dataset, correspond to wrongly labelled or non-speech samples and an undesirable model focus on the edges of the input. Finally, we analyse how the inclusion and exclusion of the easiest and hardest samples within datasets impacts model training.
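The abstract does not specify which difficulty estimator is used, so the following is only a minimal illustrative sketch of the general idea: one common model-based proxy scores each training sample by how often the model misclassifies it across training epochs, and the resulting scores can then be sorted to pull out the easiest and hardest fractions of a dataset. The function names and the epoch-wise correctness proxy are assumptions for illustration, not the paper's method.

```python
import numpy as np

def sample_difficulty(correct_matrix):
    """Proxy difficulty from per-epoch correctness records.

    correct_matrix: (n_epochs, n_samples) array; entry [e, i] is 1 if
    sample i was classified correctly at epoch e, else 0. Difficulty is
    the fraction of epochs a sample was misclassified (0 = easy, 1 = hard).
    """
    correct_matrix = np.asarray(correct_matrix, dtype=float)
    return 1.0 - correct_matrix.mean(axis=0)

def split_easy_hard(difficulty, frac=0.1):
    """Return indices of the easiest and hardest `frac` of samples."""
    n = max(1, int(len(difficulty) * frac))
    order = np.argsort(difficulty)  # ascending: easiest samples first
    return order[:n], order[-n:]

# Toy example: 3 epochs, 3 samples (rows = epochs, columns = samples).
records = np.array([[1, 0, 1],
                    [1, 0, 0],
                    [1, 1, 0]])
diff = sample_difficulty(records)          # per-sample difficulty scores
easy, hard = split_easy_hard(diff, frac=1/3)
```

Splitting on such scores is what enables the ablations the abstract describes, i.e. retraining with the easiest or hardest subsets included or excluded.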
article MTR+26a