Publication result detail

MGB-3 BUT system: Low-resource ASR on Egyptian YouTube data

VESELÝ, K.; BASKAR, M.; DIEZ SÁNCHEZ, M.; BENEŠ, K.

Original title

MGB-3 BUT system: Low-resource ASR on Egyptian YouTube data

Type

Conference proceedings paper indexed in WoS or Scopus

Original abstract

This paper presents a series of experiments we performed during our work on the MGB-3 evaluations. We describe both the submitted system and the post-evaluation analysis. Our initial BLSTM-HMM system was trained on 250 hours of MGB-2 data (Al-Jazeera) and then adapted with 5 hours of Egyptian data (YouTube). We included techniques such as diarization, n-gram language model adaptation, speed perturbation of the adaptation data, and the use of all 4 correct references. The 4 references were used either for supervision with a confusion network, or we included each sentence 4 times with the transcripts from all the annotators. It was also helpful to blend the augmented MGB-3 adaptation data with 15 hours of MGB-2 data. Although our single system did not rank among the best teams in the evaluations, we believe that our analysis will be of interest not only to the other MGB-3 challenge participants.

Keywords

MGB-3, ASR adaptation, low-resource ASR, Egyptian Arabic, diarization

Authors

VESELÝ, K.; BASKAR, M.; DIEZ SÁNCHEZ, M.; BENEŠ, K.

RIV year

2018

Published

16.12.2017

Publisher

IEEE Signal Processing Society

Place

Okinawa

ISBN

978-1-5090-4788-8

Book

Proceedings of ASRU 2017

Pages from

368

Pages to

373

Number of pages

6

BibTeX

@inproceedings{BUT144502,
  author="Karel {Veselý} and Murali Karthick {Baskar} and Mireia {Diez Sánchez} and Karel {Beneš}",
  title="MGB-3 BUT system: Low-resource ASR on Egyptian YouTube data",
  booktitle="Proceedings of ASRU 2017",
  year="2017",
  pages="368--373",
  publisher="IEEE Signal Processing Society",
  address="Okinawa",
  doi="10.1109/ASRU.2017.8268959",
  isbn="978-1-5090-4788-8",
  url="https://www.fit.vut.cz/research/publication/11595/"
}

Documents