Publication result detail
POLOK, A.; KLEMENT, D.; HAN, J.; SEDLÁČEK, Š.; YUSUF, B.; MACIEJEWSKI, M.; WIESNER, M.; BURGET, L.
Original Title
BUT/JHU System Description for CHiME-8 NOTSOFAR-1 Challenge
Type
Paper in proceedings outside WoS and Scopus
Original Abstract
This paper presents our method for tackling the CHiME-8 challenge's NOTSOFAR-1 task, which requires participants to perform multi-speaker automatic speech recognition (ASR) using audio from distant microphone arrays. We modify the Pyannote3 diarization pipeline, incorporating pre-trained WavLM as local EEND to adapt effectively to new domains, and we introduce two diarization-aware approaches to ASR by conditioning Whisper on diarization outputs for target-speaker ASR. The first method, which we refer to as Query-Key Biasing, modifies Whisper's attention mechanism and positional embeddings with a learnable attention mask to exclude non-target speaker segments in the audio. The second method, called Frame-Level Diarization-Dependent Transformations, applies affine, diarization-dependent transformations with trainable parameters to the inputs of one or more transformer blocks. We also extend both the ASR and diarization systems to a multichannel setup by incorporating cross-channel communication into our models. Finally, we report the performance of these approaches on the NOTSOFAR-1 dataset.
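As a rough illustration of the second method named in the abstract, the sketch below blends per-class affine transforms frame by frame, weighted by diarization outputs. It is not the authors' implementation: the function names `affine` and `fddt`, the two-class setup (target, non-target), the choice of a soft weighted sum, and the fixed example parameters are all assumptions made for illustration. In a real model the matrices and biases would be trainable and the result would feed a transformer block.

```python
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def affine(W: Matrix, b: Vector, x: Sequence[float]) -> Vector:
    """Return W @ x + b for a single frame."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def fddt(
    frames: List[Vector],
    class_probs: List[Vector],
    weights: List[Matrix],
    biases: List[Vector],
) -> List[Vector]:
    """Hypothetical frame-level diarization-dependent transformation.

    frames[t]      : feature vector of frame t
    class_probs[t] : diarization weights for frame t, one per class (assumed layout)
    weights[c], biases[c] : affine parameters of class c (trainable in a real model)
    """
    out = []
    for x, probs in zip(frames, class_probs):
        y = [0.0] * len(x)
        for p, W, b in zip(probs, weights, biases):
            z = affine(W, b, x)
            # Accumulate this class's transform, scaled by its diarization weight.
            y = [yi + p * zi for yi, zi in zip(y, z)]
        out.append(y)
    return out


# Assumed toy parameters: class 0 (target) passes the frame through unchanged,
# class 1 (non-target) maps it to zero.
IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
ZERO = [[0.0, 0.0], [0.0, 0.0]]
NO_BIAS = [0.0, 0.0]

frames = [[1.0, 2.0], [3.0, 4.0], [2.0, 4.0]]
class_probs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
result = fddt(frames, class_probs, [IDENTITY, ZERO], [NO_BIAS, NO_BIAS])
print(result)
```

With these toy parameters, a frame attributed entirely to the target speaker is kept, a frame attributed entirely to a non-target speaker is suppressed, and a frame split evenly between the two is halved.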
Keywords
multi-talker speech recognition, CHiME-8, NOTSOFAR-1, target-speaker
Released
06.09.2024
Publisher
International Speech Communication Association
Location
Kos Island
Book
Proceedings of CHiME 2024 Workshop
Pages from
18
Pages to
22
Pages count
5
URL
https://www.isca-archive.org/chime_2024/polok24_chime.pdf
BibTeX
@inproceedings{BUT194002,
  author="Alexander {Polok} and Dominik {Klement} and Jiangyu {Han} and Šimon {Sedláček} and Bolaji {Yusuf} and Matthew {Maciejewski} and Matthew {Wiesner} and Lukáš {Burget}",
  title="BUT/JHU System Description for CHiME-8 NOTSOFAR-1 Challenge",
  booktitle="Proceedings of CHiME 2024 Workshop",
  year="2024",
  pages="18--22",
  publisher="International Speech Communication Association",
  address="Kos Island",
  doi="10.21437/CHiME.2024-4",
  url="https://www.isca-archive.org/chime_2024/polok24_chime.pdf"
}