Master's Thesis
Author of thesis: Bc. Tereza Beránková
Acad. year: 2025/2026
Supervisor: Ing. Kryštof Novotný
Reviewer: Ing. Richard Ladislav
This master’s thesis focuses on the detection of Hypokinetic Dysarthria (HD) using acoustic models based on Self-Supervised Learning (SSL). HD is a common manifestation of neurodegenerative diseases, and detecting it early is crucial for monitoring disease progression. Current clinical diagnostics, however, often rely on subjective assessment, so there is a need for objective, automated tools based on speech signal analysis. The main objective of this thesis is to evaluate whether acoustic embeddings extracted from pretrained models can be used to detect this pathology automatically. The thesis includes a literature review of current knowledge about the manifestations of HD, an analysis of encoder architectures, and a review of how acoustic embeddings are used for HD detection. Building on this theoretical background, the thesis formulates research questions and develops Python scripts for embedding extraction, statistical aggregation, and classification. The results are compared using classification metrics to determine the most suitable combination of classifier, aggregation method, and SSL model for the dataset used. The experimental evaluation used a small dataset of 53 healthy controls and 101 patients with Parkinson’s Disease (PD). The experiments employed Wav2Vec 2.0, Whisper, and HuBERT models in several size variants, together with eXtreme Gradient Boosting (XGB), Support Vector Machine (SVM), and Multi-Layer Perceptron (MLP) classifiers. The main outcome of the thesis is an evaluation of how effectively the selected models classify patients with HD. The best performance was achieved by the Whisper model, specifically Whisper Large with embeddings aggregated by mean and Standard Deviation (SD) and classified by the SVM. This configuration achieved the highest Area Under the Receiver Operating Characteristic Curve (ROC-AUC) among the evaluated configurations. The Whisper Tiny and HuBERT XLarge models also achieved very good results across the remaining evaluated metrics.
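The aggregation and evaluation steps named in the abstract can be illustrated with a short sketch. This is a minimal NumPy example, not the thesis's own scripts. It assumes each recording's encoder output is a `(frames, dim)` array, and the function names `aggregate_mean_sd`, `aggregate_median_iqr` and `roc_auc` are hypothetical. Mean + SD is the aggregation reported as best with Whisper Large and the SVM; median + IQR is a robust alternative that the defence record also refers to.

```python
import numpy as np


def aggregate_mean_sd(frames: np.ndarray) -> np.ndarray:
    """Collapse a (num_frames, dim) embedding matrix into one 2*dim vector:
    per-dimension mean followed by per-dimension (population) SD."""
    return np.concatenate([frames.mean(axis=0), frames.std(axis=0)])


def aggregate_median_iqr(frames: np.ndarray) -> np.ndarray:
    """Robust alternative: per-dimension median followed by the
    interquartile range (75th minus 25th percentile)."""
    q25, q50, q75 = np.percentile(frames, [25, 50, 75], axis=0)
    return np.concatenate([q50, q75 - q25])


def roc_auc(labels: np.ndarray, scores: np.ndarray) -> float:
    """ROC-AUC as the probability that a randomly chosen positive (patient)
    receives a higher score than a randomly chosen negative (healthy control).
    Ties count as one half (Mann-Whitney U formulation)."""
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    # Compare every positive score with every negative score.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical recording: 200 frames of 768-dimensional embeddings
    # (768 is the hidden size of the base Wav2Vec 2.0 / HuBERT models).
    frames = rng.normal(size=(200, 768))
    print(aggregate_mean_sd(frames).shape)     # (1536,)
    print(aggregate_median_iqr(frames).shape)  # (1536,)
    labels = np.array([0, 0, 1, 1])
    scores = np.array([0.1, 0.4, 0.35, 0.8])
    print(roc_auc(labels, scores))             # 0.75
```

Both aggregations turn a variable-length recording into a fixed-length vector, which is what classifiers such as SVM, XGB and MLP expect. ROC-AUC evaluates the classifier's continuous scores (e.g. SVM decision values) rather than hard labels. It does not depend on a decision threshold, which is useful for an unbalanced set such as 53 controls and 101 patients. In practice, `sklearn.metrics.roc_auc_score` computes the same quantity.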
Hypokinetic Dysarthria, speech analysis, Self-Supervised Learning, acoustic embeddings, Transformer, Wav2Vec 2.0, HuBERT, Whisper, encoder, embedding extraction, embedding aggregation, classification, eXtreme Gradient Boosting, Support Vector Machine, Multi-Layer Perceptron
Date of defence
11.06.2026
Result of the defence
Defended (thesis was successfully defended)
Grading
A
Process of defence
The student presented the results of her thesis, and the committee was acquainted with the reports. Questions from the reviewer and the committee: Did you address zero padding in any way when aggregating the vectors? If so, how? In the thesis, you report the combination of the interquartile range and the median as the best aggregation method on average. How would you interpret this? What was the highest classification accuracy achieved? The student defended her master's thesis and answered the questions of the committee members and the reviewer.
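The first question concerns zero padding. When recordings of different lengths are batched, the shorter ones are padded, and pooling over the padded frames biases the mean toward zero and distorts the SD, median and IQR. (Whisper, for instance, pads every input to 30 s.) The record does not say how the thesis handled this. The sketch below shows one common remedy: exclude padded frames using each recording's true frame count. It assumes embeddings are batched as a `(batch, frames, dim)` float array with known `lengths`; the function names are hypothetical.

```python
import numpy as np


def valid_mask(max_frames: int, lengths: np.ndarray) -> np.ndarray:
    """mask[b, t] is True for real frames and False for padding."""
    return np.arange(max_frames)[None, :] < lengths[:, None]


def masked_mean_sd(batch: np.ndarray, lengths: np.ndarray) -> np.ndarray:
    """Mean + SD pooling that ignores padded frames.

    batch:   (batch_size, max_frames, dim) embeddings, padded after each
             recording's true length
    lengths: (batch_size,) number of valid frames (assumed >= 1)
    returns: (batch_size, 2 * dim) pooled vectors
    """
    m = valid_mask(batch.shape[1], lengths)[:, :, None].astype(batch.dtype)
    counts = lengths[:, None].astype(batch.dtype)
    mean = (batch * m).sum(axis=1) / counts
    # Population variance computed over the valid frames only.
    var = (((batch - mean[:, None, :]) ** 2) * m).sum(axis=1) / counts
    return np.concatenate([mean, np.sqrt(var)], axis=1)


def masked_median_iqr(batch: np.ndarray, lengths: np.ndarray) -> np.ndarray:
    """Median + IQR pooling: padded frames become NaN and are skipped."""
    mask = valid_mask(batch.shape[1], lengths)
    filled = np.where(mask[:, :, None], batch, np.nan)
    q25, q50, q75 = np.nanpercentile(filled, [25, 50, 75], axis=1)
    return np.concatenate([q50, q75 - q25], axis=1)


if __name__ == "__main__":
    # Two hypothetical recordings with 2-dimensional embeddings: the first
    # has 5 real frames, the second 3 real frames plus 2 zero-padded ones.
    batch = np.ones((2, 5, 2))
    batch[1, 3:] = 0.0
    lengths = np.array([5, 3])
    # Masked pooling gives mean 1 and SD 0 for both recordings.
    print(masked_mean_sd(batch, lengths))
    # Naive pooling drags the second recording's mean down to 0.6.
    print(batch.mean(axis=1))
```

Using NaN with `np.nanpercentile` keeps the median/IQR variant as short as the mean/SD one. An equivalent approach is to slice each recording to its true length before aggregating.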
Language of thesis
Czech
Faculty
Faculty of Electrical Engineering and Communication
Department
Department of Telecommunications
Study programme
Audio Engineering (MPC-AUD)
Specialization
Audio Production and Recording (AUDM-ZVUK)
Composition of Committee
PhDr. Aleš Dvořák (member)
prof. Ing. Jiří Mekyska, Ph.D. (chair)
doc. Ing. MgA. Mgr. Dan Dlouhý, Ph.D. (vice-chair)
Ing. Miroslav Balík, Ph.D. (member)
Ing. Michal Švento (member)
Supervisor’s report: Ing. Kryštof Novotný
Grade proposed by supervisor: A
Reviewer’s report: Ing. Richard Ladislav
Grade proposed by reviewer: A
Responsibility: Mgr. et Mgr. Hana Odstrčilová