Doctoral Thesis

Improving Robustness of Speaker Recognition using Discriminative Techniques


Author of thesis: Ing. Ondřej Novotný, Ph.D.

Acad. year: 2021/2022

Abstract:

This work deals with discriminative techniques in speaker verification systems, with the aim of improving the robustness of these systems against factors that degrade their performance. These factors include noise, reverberation, and the transmission channel.

The thesis consists of two main parts. The first part gives a theoretical introduction to current state-of-the-art speaker verification systems. The steps of the recognition system are described in order: the extraction of acoustic features, the extraction of vector representations of recordings, and the final computation of the recognition score. Particular emphasis is placed on the techniques for extracting a vector representation of a recording, where we describe two different paradigms: i-vectors and x-vectors.
The second part of the work focuses on discriminative techniques for increasing robustness. Their description is organized to follow the passage of a recording through the verification system. First, attention is paid to signal pre-processing using a neural network for noise reduction and speech enhancement. This pre-processing is a universal technique, independent of the verification system.
The work then turns to the use of a discriminative approach in the extraction of features and in the extraction of vector representations of recordings.

Furthermore, this work sheds light on the transition from generative systems to discriminative systems.
To give a fuller context, the work also describes techniques that historically preceded this transition. All presented techniques are experimentally verified and their advantages evaluated.
We propose several techniques that have proved successful both in the generative approach, in the form of i-vectors, and in the discriminative approach, in the form of x-vectors, and that have yielded considerable improvement.
For completeness, the work also covers other robustness techniques, such as score normalization and multi-condition training.
Finally, the work deals with the robustness of discriminative systems with respect to the data used in their training.

Keywords:

Speaker verification, generative training, discriminative training, speech enhancement, i-vector, x-vector, robustness, noise, reverberation, neural networks.

Date of defence

03.12.2021

Result of the defence

Defended (thesis was successfully defended)


Process of defence

The student presented the goals and the results achieved in the course of the dissertation work. In the discussion, the student answered questions from the committee, the reviewers, and guests. The discussion is recorded on discussion sheets, which are attached to the protocol. Number of discussion sheets: 1. In conclusion, the committee unanimously resolved that the student has fulfilled the conditions for the award of the academic degree of doctor.

Language of thesis

English

Faculty

Faculty of Information Technology

Department

Department of Computer Graphics and Multimedia

Study programme

Computer Science and Engineering (CSE-PHD-4)

Field of study

Computer Science and Engineering (DVI4)

Composition of Committee

prof. Ing. Martin Drahanský, Ph.D. (chair)
prof. Ing. Adam Herout, Ph.D. (member)
doc. RNDr. Aleš Horák, Ph.D. (member)
doc. Ing. Radim Kolář, Ph.D. (member)
doc. Ing. Petr Pollák, CSc. (member)

Supervisor’s report
prof. Dr. Ing. Jan Černocký

File inserted by supervisor	Size
Supervisor's evaluation [.pdf]	72.27 kB

Reviewer’s report
Luciana Ferrer

File inserted by the reviewer	Size
Reviewer's report [.pdf]	75.35 kB

Reviewer’s report
Petr Pollák

File inserted by the reviewer	Size
Reviewer's report [.pdf]	48.35 kB

Responsibility: Mgr. et Mgr. Hana Odstrčilová
