Publication result detail
PÁLKA, P.; LANDINI, F.; KLEMENT, D.; DIEZ SÁNCHEZ, M.; SILNOVA, A.; DELCROIX, M.; BURGET, L.
Original Title
Joint Training of Speaker Embedding Extractor, Speech and Overlap Detection for Diarization
Type
Paper in proceedings outside WoS and Scopus
Original Abstract
In spite of the popularity of end-to-end diarization systems nowadays, modular systems comprised of voice activity detection (VAD), speaker embedding extraction plus clustering, and overlapped speech detection (OSD) plus handling still attain competitive performance in many conditions. However, one of the main drawbacks of modular systems is the need to run (and train) different modules independently. In this work, we propose an approach to jointly train a model to produce speaker embeddings, VAD and OSD simultaneously and reach competitive performance at a fraction of the inference time of a modular approach. Furthermore, the joint inference leads to a simplified overall pipeline which brings us one step closer to a unified clustering-based method that can be trained end-to-end towards a diarization-specific objective.
Keywords
speaker diarization, speaker embedding, voice activity detection, overlapped speech detection
Released
08.09.2025
Publisher
IEEE Signal Processing Society
Location
Palermo
ISBN
978-9-46-459362-4
Pages from
31
Pages to
35
Pages count
5
URL
https://www.fit.vut.cz/research/publication/13567/
BibTeX
@inproceedings{BUT198669,
  author="Petr {Pálka} and Federico Nicolás {Landini} and Dominik {Klement} and Mireia {Diez Sánchez} and Anna {Silnova} and Marc {Delcroix} and Lukáš {Burget}",
  title="Joint Training of Speaker Embedding Extractor, Speech and Overlap Detection for Diarization",
  year="2025",
  pages="31--35",
  publisher="IEEE Signal Processing Society",
  address="Palermo",
  isbn="978-9-46-459362-4",
  url="https://www.fit.vut.cz/research/publication/13567/"
}
Documents
palka_eusipco2025_final