Publication detail

Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization

DELCROIX, M.; TAWARA, N.; DIEZ SÁNCHEZ, M.; LANDINI, F.; SILNOVA, A.; OGAWA, A.; NAKATANI, T.; BURGET, L.; ARAKI, S.

Original Title

Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization

Type

conference paper

Language

English

Original Abstract

Combining end-to-end neural speaker diarization (EEND) with vector clustering (VC), known as EEND-VC, has gained interest for leveraging the strengths of both methods. EEND-VC estimates activities and speaker embeddings for all speakers within an audio chunk and uses VC to associate these activities with speaker identities across different chunks. EEND-VC thus generates multiple streams of embeddings, one for each speaker in a chunk. We can cluster these embeddings using constrained agglomerative hierarchical clustering (cAHC), ensuring that embeddings from the same chunk belong to different clusters. This paper introduces an alternative clustering approach, a multi-stream extension of the successful Bayesian HMM clustering of x-vectors (VBx), called MS-VBx. Experiments on three datasets demonstrate that MS-VBx outperforms cAHC in diarization and speaker counting performance.
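
The cannot-link constraint that the abstract attributes to cAHC can be illustrated with a small sketch. This is not the authors' implementation and it is not MS-VBx; it is a minimal greedy average-linkage clustering over cosine similarity, written only to show how embeddings from the same chunk are kept in different clusters. The function name `constrained_ahc`, the `threshold` stopping value, and the toy embeddings are assumptions made for illustration.

```python
import numpy as np


def constrained_ahc(embeddings, chunk_ids, threshold=0.5):
    """Greedy average-linkage clustering with cannot-link constraints.

    Hypothetical helper: two clusters may merge only if they share no
    chunk id, so embeddings from the same chunk stay in different clusters.
    """
    x = np.asarray(embeddings, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)  # unit norm: dot product = cosine
    sim = x @ x.T
    clusters = [[i] for i in range(len(x))]   # member indices per cluster
    chunks = [{c} for c in chunk_ids]         # chunk ids covered by each cluster

    while True:
        best, pair = threshold, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                if chunks[a] & chunks[b]:
                    continue  # constraint: the clusters share a chunk
                s = sim[np.ix_(clusters[a], clusters[b])].mean()
                if s > best:
                    best, pair = s, (a, b)
        if pair is None:
            break  # no permitted merge exceeds the threshold
        a, b = pair
        clusters[a] += clusters[b]
        chunks[a] |= chunks[b]
        del clusters[b], chunks[b]

    labels = np.empty(len(x), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


# Toy example (assumed data): two chunks, each with two speaker embeddings.
toy_embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
toy_chunk_ids = [0, 0, 1, 1]
toy_labels = constrained_ahc(toy_embeddings, toy_chunk_ids)
```

In the toy example, the first and third embeddings end up with one label and the second and fourth with another, while the two embeddings of each chunk are never merged, however similar they are.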

Keywords

speaker diarization, end-to-end, VBx, clustering

Authors

DELCROIX, M.; TAWARA, N.; DIEZ SÁNCHEZ, M.; LANDINI, F.; SILNOVA, A.; OGAWA, A.; NAKATANI, T.; BURGET, L.; ARAKI, S.

Released

20. 8. 2023

Publisher

International Speech Communication Association

Location

Dublin

ISSN

1990-9772

Periodical

Proceedings of Interspeech

Year of study

2023

Number

08

State

French Republic

Pages from

3477

Pages to

3481

Pages count

5

URL

BibTeX

@inproceedings{BUT185573,
  author="DELCROIX, M. and TAWARA, N. and DIEZ SÁNCHEZ, M. and LANDINI, F. and SILNOVA, A. and OGAWA, A. and NAKATANI, T. and BURGET, L. and ARAKI, S.",
  title="Multi-Stream Extension of Variational Bayesian HMM Clustering (MS-VBx) for Combined End-to-End and Vector Clustering-based Diarization",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH",
  year="2023",
  journal="Proceedings of Interspeech",
  volume="2023",
  number="08",
  pages="3477--3481",
  publisher="International Speech Communication Association",
  address="Dublin",
  doi="10.21437/Interspeech.2023-628",
  issn="1990-9772",
  url="https://www.isca-speech.org/archive/pdfs/interspeech_2023/delcroix23_interspeech.pdf"
}