R&D result detail

Original title

Implementing contextual biasing in GPU decoder for online ASR

English title

Implementing contextual biasing in GPU decoder for online ASR

Type

Paper in proceedings indexed in WoS or Scopus

Original abstract

GPU decoding significantly accelerates the output of ASR predictions. While GPUs are already being used for online ASR decoding, post-processing and rescoring on GPUs have not been properly investigated yet. Rescoring with available contextual information can considerably improve ASR predictions. Previous studies have proven the viability of lattice rescoring in decoding and biasing language model (LM) weights in offline and online CPU scenarios. In real-time GPU decoding, partial recognition hypotheses are produced without lattice generation, which makes the implementation of biasing more complex. The paper proposes and describes an approach to integrate contextual biasing in real-time GPU decoding while exploiting the standard Kaldi GPU decoder. Besides the biasing of partial ASR predictions, our approach also permits dynamic context switching, allowing flexible rescoring for each speech segment directly on the GPU. The code is publicly released and tested with open-sourced test sets.

Keywords

real-time speech recognition, contextual adaptation, GPU decoding, finite-state transducers

Keywords in English

real-time speech recognition, contextual adaptation, GPU decoding, finite-state transducers

Authors

NIGMATULINA, I.; MADIKERI, S.; VILLATORO-TELLO, E.; MOTLÍČEK, P.; ZULUAGA-GOMEZ, J.; PANDIA, K.; GANAPATHIRAJU, A.

RIV year

2024

Published

20.08.2023

Publisher

International Speech Communication Association

Place

Dublin

Book

Proceedings of the Annual Conference of International Speech Communication Association, INTERSPEECH

ISSN

1990-9772

Periodical

Proceedings of Interspeech

Volume

2023

Issue

8

Country

France

Pages from

4494

Pages to

4498

Number of pages

5

URL

https://www.isca-archive.org/interspeech_2023/nigmatulina23_interspeech.html

Full text in the Digital Library

http://hdl.handle.net/

BibTeX

@inproceedings{BUT187754,
  author="NIGMATULINA, I. and MADIKERI, S. and VILLATORO-TELLO, E. and MOTLÍČEK, P. and ZULUAGA-GOMEZ, J. and PANDIA, K. and GANAPATHIRAJU, A.",
  title="Implementing contextual biasing in GPU decoder for online ASR",
  booktitle="Proceedings of the Annual Conference of International Speech Communication Association, INTERSPEECH",
  year="2023",
  journal="Proceedings of Interspeech",
  volume="2023",
  number="8",
  pages="4494--4498",
  publisher="International Speech Communication Association",
  address="Dublin",
  doi="10.21437/Interspeech.2023-2449",
  issn="1990-9772",
  url="https://www.isca-archive.org/interspeech_2023/nigmatulina23_interspeech.html"
}

Documents

nigmatulina23_interspeech
