Doctoral Thesis
Author of thesis: Ing. Ladislav Mošner, Ph.D.
Acad. year: 2024/2025
Supervisor: prof. Dr. Ing. Jan Černocký
Reviewers: Marc Delcroix, Prof. Dr. Reinhold Häb-Umbach
Far-field speech processing has gained increasing attention in recent years with the advent of smart speakers, home assistants, and meeting transcription systems. To support these applications, robust far-field speech processing techniques are required. A key task enabling personalized interaction is speaker verification. Compared to close-talking conditions, far-field systems face additional challenges such as reverberation and background noise, which degrade the target speech. To mitigate these effects, far-field devices typically employ microphone arrays that provide spatial information. These challenges and opportunities motivate this thesis, which focuses on multi-channel speaker verification. Despite significant progress in related fields of speech processing, multi-channel speaker verification remains underexplored, hindered by a shortage of both data resources and specialized techniques. This thesis addresses both aspects. On the data side, we repurposed existing publicly available corpora and created the MultiSV dataset, which provides simulated multi-channel mixtures with speech/noise training targets and speaker labels. MultiSV also defines multiple evaluation protocols based on retransmitted recordings, supporting various scenarios, such as single clean versus multi-channel corrupted enrollment. To support the training of more data-demanding models, we further introduced an extended dataset, MultiSV2. On the modeling side, we first approached multi-channel speaker embedding extraction using a cascaded strategy, decomposing the problem into multi-channel preprocessing and single-channel embedding extraction. Motivated by advances in speech separation, we designed models ranging from signal-processing-based methods to hybrid front-ends that combine neural networks with beamforming. Notably, we proposed direct and indirect mask prediction for mask-based beamforming, and the reference-channel attention (RCA) combiner, which generalizes single-channel separation models to multi-channel inputs.
Recognizing the limitations of cascaded models, such as error propagation and mismatched training objectives across modules, we next explored unified architectures for multi-channel embedding extraction. Leveraging MultiSV2, we fine-tuned cascaded components jointly with the end-task loss, and subsequently introduced METRO, a general framework that extends self-supervised speech representation models to multi-channel settings. In this thesis, METRO yields multi-channel speaker embeddings, but the framework is general and potentially applicable to other speech processing tasks.
Keywords: multi-channel speaker verification, microphone arrays, beamforming, speech separation, speaker embedding extraction, MultiSV
Date of defence
14.01.2026
Result of the defence
Defended (thesis was successfully defended)
Process of defence
The student presented the goals of the dissertation and the results he achieved. He competently answered the questions of the committee members and reviewers. The discussion is recorded on the discussion sheets, which are attached to the protocol. Number of discussion sheets: 4. The committee agreed unanimously that the student has fulfilled the requirements for the award of the academic title Ph.D. The committee unanimously recommends, and the reviewers support, awarding the thesis the Dean's Award for an exceptionally high-quality dissertation. The candidate demonstrated excellent technical results, excellent presentation and pedagogical skills, and excellent publication activity, including a Google Scholar h-index of 14.
Language of thesis
English
Faculty
Faculty of Information Technology
Department
Department of Computer Graphics and Multimedia
Study programme
Information Technology (DIT)
Composition of Committee
doc. Ing. Zdeněk Vašíček, Ph.D. (chair)
prof. Ing. Zbyněk Koldovský, Ph.D. (member)
doc. Ing. Pavel Král, Ph.D. (member)
doc. Ing. Jiří Schimmel, Ph.D. (member)
doc. RNDr. Petr Sojka, Ph.D. (member)
Supervisor’s report: prof. Dr. Ing. Jan Černocký
Reviewer’s report: Marc Delcroix
Reviewer’s report: Prof. Dr. Reinhold Häb-Umbach
Responsibility: Mgr. et Mgr. Hana Odstrčilová