Master's Thesis

Automatic detection of hypokinetic dysarthria using x-vectors


Author of thesis: Bc. Josef Macek

Acad. year: 2025/2026

Abstract:

The aim of this master’s thesis is to evaluate the applicability of interpretable speech
biomarkers and deep speech representations for the automatic classification of individuals
with Parkinson’s disease and healthy controls. The thesis is based on the assumption
that Parkinson’s disease affects speech production, particularly through hypokinetic
dysarthria, and that these changes can be captured using acoustic and deep speech
features.
The experimental part is based on a database of spontaneous monologues. It uses adapted
interpretable speech biomarkers and x-vector representations extracted with a pretrained
ECAPA-TDNN model. Three approaches to combining both feature types
were designed and compared: late fusion, early fusion, and hybrid fusion. The biomarker-based models, as well as the early- and hybrid-fusion models, use the XGBoost algorithm
with hyperparameters tuned using Optuna, whereas the x-vector branch and
the late-fusion meta-model are based on logistic regression. The final evaluation was
performed using leave-one-out cross-validation.
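To make the difference between early and late fusion under leave-one-out evaluation concrete, here is a minimal stdlib-only Python sketch. It is not the thesis implementation and rests on several stated assumptions: every model is plain gradient-descent logistic regression (the thesis uses XGBoost tuned with Optuna for the biomarker and early-fusion models), late fusion averages the two branch probabilities instead of training a logistic-regression meta-model, hybrid fusion is not sketched because the abstract does not describe its structure, and all function names and toy data are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Model = Callable[[Vector], float]  # maps a feature vector to P(PD)


def sigmoid(z: float) -> float:
    """Numerically stable logistic function (no overflow in math.exp)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(X: Sequence[Vector], y: Sequence[int],
                 lr: float = 0.5, epochs: int = 1000) -> Model:
    """Unregularized logistic regression fitted by batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(dim):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return lambda x: sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def loo_early_fusion(bio: Sequence[Vector], xvec: Sequence[Vector],
                     y: Sequence[int]) -> List[float]:
    """Early fusion: concatenate both feature sets and train a single model."""
    fused = [list(bv) + list(xv) for bv, xv in zip(bio, xvec)]
    scores = []
    for i in range(len(y)):
        train = [j for j in range(len(y)) if j != i]  # leave sample i out
        model = train_logreg([fused[j] for j in train], [y[j] for j in train])
        scores.append(model(fused[i]))  # score only the held-out sample
    return scores


def loo_late_fusion(bio: Sequence[Vector], xvec: Sequence[Vector],
                    y: Sequence[int]) -> List[float]:
    """Late fusion: one model per feature set, branch scores combined afterwards.

    Simplification: the branch probabilities are averaged. A trained meta-model
    would additionally need out-of-fold branch scores inside each training split.
    """
    scores = []
    for i in range(len(y)):
        train = [j for j in range(len(y)) if j != i]
        y_train = [y[j] for j in train]
        bio_model = train_logreg([bio[j] for j in train], y_train)
        xvec_model = train_logreg([xvec[j] for j in train], y_train)
        scores.append(0.5 * (bio_model(bio[i]) + xvec_model(xvec[i])))
    return scores


if __name__ == "__main__":
    # Hypothetical toy data: 1 = PD, 0 = healthy control.
    y = [0, 0, 0, 1, 1, 1]
    bio = [[0.1], [0.2], [0.3], [0.8], [0.9], [1.0]]
    xvec = [[0.0, 1.0], [0.1, 0.9], [0.2, 1.0], [1.0, 0.1], [0.9, 0.0], [1.0, 0.2]]
    print("early:", [round(s, 3) for s in loo_early_fusion(bio, xvec, y)])
    print("late: ", [round(s, 3) for s in loo_late_fusion(bio, xvec, y)])
```

Each score comes from a model that never saw that sample during training, which is the property leave-one-out evaluation relies on; the pooled held-out scores are what classification metrics are then computed from.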
The results show that x-vector representations provide stronger discriminative information than standalone interpretable biomarkers. The standalone x-vector model based on
logistic regression achieved an accuracy of 0.7519 and an AUC value of 0.8045, while
the biomarker-based XGBoost model achieved an accuracy of 0.6589 and an AUC value
of 0.6878. Fusion approaches made it possible to combine the discriminative power of deep representations with the information contained in interpretable biomarkers; however, their
contribution depended on the specific feature combination strategy. Among the fusion
approaches, hybrid fusion achieved the best results, with an accuracy of 0.7597 and an
AUC value of 0.8117. The results confirm the potential of automatic speech analysis
for assessing speech manifestations of Parkinson’s disease; however, they must be interpreted in light of the limited dataset size, the validation protocol used, and
the possible influence of individual differences between speakers.
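Under leave-one-out evaluation each sample receives exactly one held-out score, so accuracy and AUC are computed over the pooled scores. The following is a minimal stdlib sketch of both metrics, assuming labels 1 = Parkinson’s disease and 0 = healthy control and a 0.5 decision threshold for accuracy (the threshold actually used is not stated in the abstract); the function names and example values are hypothetical.

```python
from typing import Sequence


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Share of samples whose thresholded score equals the label (1 = PD, 0 = HC)."""
    correct = sum(int(s >= threshold) == y for y, s in zip(labels, scores))
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random PD sample outscores a random HC sample.

    This is the Mann-Whitney formulation; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 1, 0, 0, 1, 0]                 # hypothetical pooled LOO labels
    scores = [0.9, 0.4, 0.35, 0.2, 0.7, 0.6]    # hypothetical held-out P(PD)
    print(round(accuracy(labels, scores), 4), round(roc_auc(labels, scores), 4))
    # prints: 0.6667 0.8889
```

Accuracy depends on the chosen threshold, whereas AUC depends only on how the scores rank PD samples against healthy controls, which is why the two metrics can order models differently.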

Keywords:

ECAPA-TDNN, feature fusion, hypokinetic dysarthria, Optuna, Parkinson’s disease, speech biomarker, XGBoost, x-vector

Date of defence

11.06.2026

Result of the defence

Defended (thesis was successfully defended)

Grading

A

Process of defence

The student presented the results of his thesis, and the committee was acquainted with the reports. The student defended the master’s thesis and answered questions from the committee members and the reviewer. Reviewer’s questions: How do you explain the improvement in results achieved with hybrid fusion compared with early and late fusion?

Language of thesis

Czech

Faculty

Faculty of Electrical Engineering and Communication

Department

Department of Telecommunications

Study programme

Audio Engineering (MPC-AUD)

Specialization

Audio Production and Recording (AUDM-ZVUK)

Composition of Committee

prof. Ing. Zdeněk Smékal, CSc. (chair)
Ing. MgA. Edgar Mojdl, Ph.D. (vice-chair)
Dr. Ing. Libor Husník (member)
Ing. Václav Mach, Ph.D. (member)
Ing. Matěj Ištvánek, Ph.D. (member)

Supervisor’s report
Ing. Daniel Kováč, Ph.D.

While working on the master’s thesis, the student was active, regularly consulted on his progress, and responded to feedback continuously. He demonstrated the ability to work independently with the technical literature and to master methods of automatic speech analysis and machine learning. The stated goals of the thesis were fully met. The student reviewed the literature on the topic, designed and implemented an experimental system, and carried out extensive experiments comparing different approaches to classifying speech recordings. I particularly appreciate the systematic evaluation of the results. The technical report is clearly structured and of a very good professional standard. The results are presented clearly and discussed appropriately. In places, the thesis contains imprecise wording, and some parts would deserve a more detailed theoretical justification; however, these shortcomings do not diminish the overall quality of the submitted work. The student demonstrated the ability to solve a technical problem independently and met the requirements for a master’s thesis. Overall, I rate the thesis as excellent.

Points proposed by supervisor: 94

Grade proposed by supervisor: A

Reviewer’s report
Ing. Richard Ladislav

The topic of the master’s thesis, focused on detecting hypokinetic dysarthria using one-dimensional deep representations of the speech signal, is highly topical. The text of the technical report itself is written to a high linguistic standard, draws on current literature, and is hard to fault even in terms of form. Methodologically, the student uses robust and valid procedures. I particularly appreciate that the problem of confounding factors is addressed. Another interesting step is the use of principal component analysis for subsequent dimensionality reduction of the x-vector latent space. For the reasons given above, I conclude that the student fully met the goals of the thesis.

Topics for thesis defence:

How do you explain the improvement in results achieved with hybrid fusion compared with early and late fusion?

Points proposed by reviewer: 93

Grade proposed by reviewer: A

