Publication result detail
DELCROIX, M.; TAWARA, N.; DIEZ SÁNCHEZ, M.; LANDINI, F.; SILNOVA, A.; OGAWA, A.; NAKATANI, T.; BURGET, L.; ARAKI, S.
Original title
Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization
English title
Type
Paper in proceedings indexed in WoS or Scopus
Original abstract
Combining end-to-end neural speaker diarization (EEND) with vector clustering (VC), known as EEND-VC, has gained interest for leveraging the strengths of both methods. EEND-VC estimates activities and speaker embeddings for all speakers within an audio chunk and uses VC to associate these activities with speaker identities across different chunks. EEND-VC thus generates multiple streams of embeddings, one for each speaker in a chunk. We can cluster these embeddings using constrained agglomerative hierarchical clustering (cAHC), ensuring embeddings from the same chunk belong to different clusters. This paper introduces an alternative clustering approach, a multi-stream extension of the successful Bayesian HMM clustering of x-vectors (VBx), called MS-VBx. Experiments on three datasets demonstrate that MS-VBx outperforms cAHC in diarization and speaker counting performance.
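The cAHC baseline mentioned in the abstract can be illustrated with a toy sketch. This is not the authors' implementation and it is not MS-VBx; the function names, the cosine distance, the average linkage, and the stopping threshold are all assumptions chosen for illustration. The only property taken from the abstract is the constraint that embeddings from the same chunk may never end up in the same cluster.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def constrained_ahc(embeddings, chunk_ids, threshold):
    """Toy average-linkage AHC with cannot-link constraints.

    embeddings: list of vectors, one per (chunk, local speaker) pair.
    chunk_ids:  chunk index of each embedding.
    threshold:  hypothetical stopping distance; merging stops once no
                allowed pair of clusters is closer than this value.
    Returns one cluster label per embedding.
    """
    clusters = [[i] for i in range(len(embeddings))]
    while True:
        best = None
        for (p, cp), (q, cq) in combinations(enumerate(clusters), 2):
            # Cannot-link: two clusters sharing a chunk must stay apart,
            # because one chunk holds at most one embedding per speaker.
            if {chunk_ids[i] for i in cp} & {chunk_ids[j] for j in cq}:
                continue
            # Average linkage over all cross-cluster pairs.
            dist = sum(
                cosine_distance(embeddings[i], embeddings[j])
                for i in cp
                for j in cq
            ) / (len(cp) * len(cq))
            if dist < threshold and (best is None or dist < best[0]):
                best = (dist, p, q)
        if best is None:
            break
        _, p, q = best  # p < q, so deleting q keeps index p valid
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Two chunks with two local speakers each (made-up 2-D embeddings).
example_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
example_chunks = [0, 0, 1, 1]
example_labels = constrained_ahc(example_embeddings, example_chunks, 0.5)
```

In this made-up example the first embedding of each chunk is merged into one cluster and the second embedding of each chunk into another, while the two embeddings of any single chunk stay apart regardless of how similar they are.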
English abstract
Keywords
speaker diarization, end-to-end, VBx, clustering
Keywords in English
Authors
RIV year
2024
Published
20.08.2023
Publisher
International Speech Communication Association
Place
Dublin
Book
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
ISSN
1990-9772
Periodical
Proceedings of Interspeech
Volume
2023
Number
08
Country
French Republic
Pages from
3477
Pages to
3481
Number of pages
5
URL
https://www.isca-speech.org/archive/pdfs/interspeech_2023/delcroix23_interspeech.pdf
BibTeX
@inproceedings{BUT185573,
  author="DELCROIX, M. and TAWARA, N. and DIEZ SÁNCHEZ, M. and LANDINI, F. and SILNOVA, A. and OGAWA, A. and NAKATANI, T. and BURGET, L. and ARAKI, S.",
  title="Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
  year="2023",
  journal="Proceedings of Interspeech",
  volume="2023",
  number="08",
  pages="3477--3481",
  publisher="International Speech Communication Association",
  address="Dublin",
  doi="10.21437/Interspeech.2023-628",
  issn="1990-9772",
  url="https://www.isca-speech.org/archive/pdfs/interspeech_2023/delcroix23_interspeech.pdf"
}
Documents
delcroix23_interspeech2023_multi-stream