Publication result detail

Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?

ZHANG, L.; STAFYLAKIS, T.; LANDINI, F.; DIEZ SÁNCHEZ, M.; SILNOVA, A.; BURGET, L.

Original Title

Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?

English Title

Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?

Type

Paper in proceedings outside WoS and Scopus

Original Abstract

In this paper, we apply the variational information bottleneck approach to end-to-end neural diarization with encoder-decoder attractors (EEND-EDA). This allows us to investigate what information is essential for the model. EEND-EDA utilizes attractors, vector representations of speakers in a conversation. Our analysis shows that attractors do not necessarily have to contain speaker characteristic information. On the other hand, giving the attractors more freedom to encode some extra (possibly speaker-specific) information leads to small but consistent diarization performance improvements. Despite architectural differences in EEND systems, the notion of attractors and frame embeddings is common to most of them and not specific to EEND-EDA. We believe that the main conclusions of this work can apply to other variants of EEND. Thus, we hope this paper will be a valuable contribution to guide the community to make more informed decisions when designing new systems.

English abstract

In this paper, we apply the variational information bottleneck approach to end-to-end neural diarization with encoder-decoder attractors (EEND-EDA). This allows us to investigate what information is essential for the model. EEND-EDA utilizes attractors, vector representations of speakers in a conversation. Our analysis shows that attractors do not necessarily have to contain speaker characteristic information. On the other hand, giving the attractors more freedom to encode some extra (possibly speaker-specific) information leads to small but consistent diarization performance improvements. Despite architectural differences in EEND systems, the notion of attractors and frame embeddings is common to most of them and not specific to EEND-EDA. We believe that the main conclusions of this work can apply to other variants of EEND. Thus, we hope this paper will be a valuable contribution to guide the community to make more informed decisions when designing new systems.
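Illustration (not from the paper): the sketch below shows, in PyTorch, one generic way a variational information bottleneck can be placed on attractor vectors in an EEND-EDA-style model, so that a KL term limits how much information each attractor carries. The class name AttractorVIB, the dimensions, and the weight beta are assumptions made here for illustration, not the authors' implementation.

import torch
import torch.nn as nn

class AttractorVIB(nn.Module):
    # Maps each attractor to a stochastic bottleneck representation.
    # The KL term to a standard normal prior constrains the information
    # the attractor can encode; beta trades diarization loss against it.
    def __init__(self, attractor_dim: int = 256, bottleneck_dim: int = 64):
        super().__init__()
        self.to_mu = nn.Linear(attractor_dim, bottleneck_dim)
        self.to_logvar = nn.Linear(attractor_dim, bottleneck_dim)
        self.to_out = nn.Linear(bottleneck_dim, attractor_dim)

    def forward(self, attractors: torch.Tensor):
        # attractors: (batch, num_speakers, attractor_dim)
        mu = self.to_mu(attractors)
        logvar = self.to_logvar(attractors)
        # Reparameterization trick: sample z ~ N(mu, sigma^2)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        # KL divergence to N(0, I), summed over the bottleneck dimension,
        # averaged over batch and speakers
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1).mean()
        return self.to_out(z), kl

Usage would be along the lines of total_loss = diarization_loss + beta * kl, where increasing the assumed hyperparameter beta forces the attractors to discard more information.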

Keywords

End-to-End Neural Diarization, Speaker Characteristic Information

Key words in English

End-to-End Neural Diarization, Speaker Characteristic Information

Authors

ZHANG, L.; STAFYLAKIS, T.; LANDINI, F.; DIEZ SÁNCHEZ, M.; SILNOVA, A.; BURGET, L.

Released

18.06.2024

Publisher

International Speech Communication Association

Location

Québec City

Book

Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop

Pages from

123

Pages to

130

Pages count

8

URL

https://www.isca-archive.org/odyssey_2024/zhang24_odyssey.pdf

BibTex

@inproceedings{BUT193432,
  author="ZHANG, L. and STAFYLAKIS, T. and LANDINI, F. and DIEZ SÁNCHEZ, M. and SILNOVA, A. and BURGET, L.",
  title="Do End-to-End Neural Diarization Attractors Need to Encode Speaker Characteristic Information?",
  booktitle="Proceedings of Odyssey 2024: The Speaker and Language Recognition Workshop",
  year="2024",
  pages="123--130",
  publisher="International Speech Communication Association",
  address="Québec City",
  doi="10.21437/odyssey.2024-18",
  url="https://www.isca-archive.org/odyssey_2024/zhang24_odyssey.pdf"
}
