R&D result detail

Original title

DiCoW: Diarization-Conditioned Whisper for Target Speaker Automatic Speech Recognition

Type

Software

Abstract

DiCoW (Diarization-Conditioned Whisper) is a Target Speaker Automatic Speech Recognition (TS-ASR) system that integrates speaker diarization cues into OpenAI's Whisper model. By conditioning on speaker identity, DiCoW enables highly accurate transcription of a target speaker's speech in complex, multi-speaker environments. At the time of publication, DiCoW achieved state-of-the-art performance on the Libri2Mix and AMI benchmarks. The system received the Jury Award in the CHiME-8 Task 2 (NOTSOFAR) challenge and the Best Reproducibility Award in the Challenge and Workshop on Multilingual Conversational Speech Language Model (MLC-SLM).
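To illustrate what "diarization cues" for a target speaker can look like, the sketch below turns a per-frame speaker activity matrix (the typical output of a diarization system) into four per-frame masks relative to one chosen speaker: silence, target only, non-target only, and overlap. The function name, the input layout, and the four-way split are assumptions made for illustration; this is not code from the DiCoW repositories, and the way such masks are fed into Whisper is not shown.

```python
from typing import Dict, List


def target_speaker_masks(activity: List[List[int]], target: int) -> Dict[str, List[int]]:
    """Build per-frame masks for one target speaker (hypothetical helper).

    activity[s][t] is 1 if speaker s is active in frame t, else 0.
    Every frame is assigned to exactly one of the four masks.
    """
    num_speakers = len(activity)
    num_frames = len(activity[target])
    masks: Dict[str, List[int]] = {
        "silence": [],
        "target": [],
        "non_target": [],
        "overlap": [],
    }
    for t in range(num_frames):
        target_on = activity[target][t] == 1
        others_on = any(
            activity[s][t] == 1 for s in range(num_speakers) if s != target
        )
        masks["silence"].append(int(not target_on and not others_on))
        masks["target"].append(int(target_on and not others_on))
        masks["non_target"].append(int(others_on and not target_on))
        masks["overlap"].append(int(target_on and others_on))
    return masks


# Toy diarization output: 2 speakers, 5 frames.
activity = [
    [1, 1, 0, 0, 1],  # speaker 0
    [0, 1, 1, 0, 0],  # speaker 1
]
masks = target_speaker_masks(activity, target=0)
print(masks)
```

Calling the function again with target=1 yields the masks for the other speaker, so the same recording can be transcribed once per speaker.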

Keywords

Diarization, Conditioned Whisper, Target Speaker, Automatic Speech Recognition

License fee

Use of the result by another entity always requires obtaining a license.

www

https://github.com/BUTSpeechFIT/TS-ASR-Whisper
https://github.com/BUTSpeechFIT/DiCoW
https://github.com/BUTSpeechFIT/SOT-DiCoW
