Applied result detail
Polok, A., Klement, D., Kocour, M.
Original title
DiCoW: Diarization-Conditioned Whisper for Target Speaker Automatic Speech Recognition
English title
Type
Software
Abstract
DiCoW (Diarization-Conditioned Whisper) is a Target Speaker Automatic Speech Recognition (TS-ASR) system that integrates speaker diarization cues into OpenAI's Whisper model. By conditioning on speaker identity, DiCoW enables accurate transcription of a target speaker's speech in complex, multi-speaker environments. At the time of publication, DiCoW achieved state-of-the-art performance on the Libri2Mix and AMI benchmarks. The system received the Jury Award in the CHiME-8 Task 2 (NOTSOFAR) challenge and the Best Reproducibility Award in the Challenge and Workshop on Multilingual Conversational Speech Language Model (MLC-SLM).
Abstract in English
Keywords
Diarization, Conditioned Whisper, Target Speaker, Automatic Speech Recognition
Keywords in English
License fee
Use of this result by another entity always requires obtaining a license
www
https://github.com/BUTSpeechFIT/TS-ASR-Whisper
https://github.com/BUTSpeechFIT/DiCoW
https://github.com/BUTSpeechFIT/SOT-DiCoW