Publication result detail

DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors

LANDINI, F.; DIEZ SÁNCHEZ, M.; STAFYLAKIS, T.; BURGET, L.

Original title

DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors

Type

WoS article

Original abstract

Until recently, the field of speaker diarization was dominated by cascaded systems. Due to their limitations, mainly regarding overlapped speech and cumbersome pipelines, end-to-end models have gained great popularity lately. One of the most successful models is end-to-end neural diarization with encoder-decoder based attractors (EEND-EDA). In this work, we replace the EDA module with a Perceiver-based one and show its advantages over EEND-EDA; namely obtaining better performance on the largely studied Callhome dataset, finding the quantity of speakers in a conversation more accurately, and faster inference time. Furthermore, when exhaustively compared with other methods, our model, DiaPer, reaches remarkable performance with a very lightweight design. Besides, we perform comparisons with other works and a cascaded baseline across more than ten public wide-band datasets. Together with this publication, we release the code of DiaPer as well as models trained on public and free data.

Keywords

Attractor, DiaPer, end-to-end neural diarization, perceiver, speaker diarization.

Authors

LANDINI, F.; DIEZ SÁNCHEZ, M.; STAFYLAKIS, T.; BURGET, L.

RIV year

2025

Published

3 July 2024

ISSN

1558-7916

Journal

IEEE Transactions on Audio Speech and Language Processing

Volume

32

Issue

7

Country

United States of America

First page

3450

Last page

3465

Number of pages

16

URL

https://ieeexplore.ieee.org/document/10584294

BibTeX

@article{BUT189802,
  author="Federico Nicolás {Landini} and Mireia {Diez Sánchez} and Themos {Stafylakis} and Lukáš {Burget}",
  title="DiaPer: End-to-End Neural Diarization With Perceiver-Based Attractors",
  journal="IEEE Transactions on Audio Speech and Language Processing",
  year="2024",
  volume="32",
  number="7",
  pages="3450--3465",
  doi="10.1109/TASLP.2024.3422818",
  issn="1558-7916",
  url="https://ieeexplore.ieee.org/document/10584294"
}

Documents