We investigate the performance difference between generic and task-based systems trained for the automatic detection of patients with Amyotrophic Lateral Sclerosis (ALS) from speech. We exploit the paralinguistic information embedded in the patients' speech across five tasks: producing the sustained vowel /a:/, repeating the syllables /da/-/da/ and /da/-/ba/ (as two separate tasks), reading a text passage, and describing a picture. The generic system consists of a single model, whereas the task-based system is composed of five task-dedicated models, each in charge of processing the speech samples of one task. We also analyse the performance of each task-dedicated model individually. We conduct our experiments on the novel German-language AIMnd dataset. The results, assessed in terms of the Unweighted Average Recall (UAR), indicate that the task-based system outperforms the generic one in two of the four scenarios explored, whereas the generic system outperforms the task-based one in only one scenario. Among the task-dedicated models, the SVClinear-based classifier exploiting the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) extracted from the sustained vowel /a:/ task yields the best performance on the test set, with a UAR of 92%.
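The two system designs and the evaluation metric can be illustrated with a minimal sketch. It assumes the standard definition of UAR as the mean of the per-class recalls (so the ALS and non-ALS classes weigh equally even when they are unbalanced) and models the difference between the designs purely as routing: the generic system sends every sample to one model, while the task-based system dispatches each sample to the model dedicated to its task. This is not the authors' implementation: the sample representation, the task names (`vowel_a`, `reading`, `picture`), the label names (`ALS`, `control`) and the `threshold_model` stand-in for a trained classifier (such as an SVClinear model on eGeMAPS features) are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical representation: each sample is (task name, feature vector),
# e.g. a vector of acoustic functionals extracted from one recording.
Sample = Tuple[str, List[float]]
Classifier = Callable[[List[float]], str]


def unweighted_average_recall(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Mean of the per-class recalls; every class weighs equally regardless of its size."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        totals[truth] += 1
        if truth == pred:
            hits[truth] += 1
    # Recall per class present in the ground truth, then an unweighted mean.
    recalls = [hits[label] / totals[label] for label in totals]
    return sum(recalls) / len(recalls)


def generic_predict(samples: Sequence[Sample], model: Classifier) -> List[str]:
    """Generic system: one model scores every sample, whatever task produced it."""
    return [model(features) for _task, features in samples]


def task_based_predict(samples: Sequence[Sample], models: Dict[str, Classifier]) -> List[str]:
    """Task-based system: each sample is routed to the model dedicated to its task."""
    return [models[task](features) for task, features in samples]


def threshold_model(threshold: float) -> Classifier:
    """Hypothetical toy stand-in for a trained classifier: thresholds the first feature."""
    return lambda features: "ALS" if features[0] > threshold else "control"


if __name__ == "__main__":
    samples = [("vowel_a", [0.9]), ("reading", [0.4]), ("picture", [0.7]), ("vowel_a", [0.2])]
    labels = ["ALS", "ALS", "control", "control"]

    generic = generic_predict(samples, threshold_model(0.5))
    task_based = task_based_predict(
        samples,
        {
            "vowel_a": threshold_model(0.5),
            "reading": threshold_model(0.3),
            "picture": threshold_model(0.8),
        },
    )
    print("generic UAR:", unweighted_average_recall(labels, generic))        # 0.5
    print("task-based UAR:", unweighted_average_recall(labels, task_based))  # 1.0
```

The toy thresholds are arbitrary and say nothing about the reported results; they only show how the same samples flow through the two designs. As a check on the metric, predicting `["ALS", "ALS", "control", "control"]` for true labels `["ALS", "ALS", "ALS", "control"]` gives an accuracy of 3/4 but a UAR of (2/3 + 1)/2 = 5/6, because the minority class is not drowned out by the majority class.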
inproceedings MGH+25