BUT System for CHiME-6 Challenge

ŽMOLÍKOVÁ, K.; KOCOUR, M.; LANDINI, F.; BENEŠ, K.; KARAFIÁT, M.; VYDANA, H.; LOZANO DÍEZ, A.; PLCHOT, O.; BASKAR, M.; ŠVEC, J.; MOŠNER, L.; MALENOVSKÝ, V.; BURGET, L.; YUSUF, B.; NOVOTNÝ, O.; GRÉZL, F.; SZŐKE, I.; ČERNOCKÝ, J.

Type

Paper in proceedings outside WoS and Scopus

Original Abstract

This paper describes BUT's efforts in the development of the system for the CHiME-6 challenge with far-field dinner party recordings [1]. Our experiments cover both the diarization and speech recognition parts of the system. For diarization, we employ the VBx framework, which uses a Bayesian hidden Markov model with eigenvoice priors on x-vectors. For acoustic modeling, we explore using different subsets of data for training, different neural network architectures, discriminative training, more robust i-vectors, and semi-supervised training on VoxCeleb data. In addition, we perform experiments with a neural network-based language model, exploring how to overcome the small size of the text corpus and incorporate across-segment context. When fusing our best systems, we achieve 41.21% / 42.55% WER on Track 1, for development and evaluation respectively, and 55.15% / 69.04% on Track 2, for development and evaluation respectively.

Keywords

diarization, neural network, acoustic model, language model, enhancement

Released

04.05.2020

Publisher

University of Sheffield

Location

Barcelona

Book

Proceedings of CHiME 2020 Virtual Workshop

Pages from

1

Pages to

3

Pages count

3

BibTeX

@inproceedings{BUT164067,
  author="Kateřina {Žmolíková} and Martin {Kocour} and Federico Nicolás {Landini} and Karel {Beneš} and Martin {Karafiát} and Hari Krishna {Vydana} and Alicia {Lozano Díez} and Oldřich {Plchot} and Murali Karthick {Baskar} and Ján {Švec} and Ladislav {Mošner} and Vladimír {Malenovský} and Lukáš {Burget} and Bolaji {Yusuf} and Ondřej {Novotný} and František {Grézl} and Igor {Szőke} and Jan {Černocký}",
  title="BUT System for CHiME-6 Challenge",
  booktitle="Proceedings of CHiME 2020 Virtual Workshop",
  year="2020",
  pages="1--3",
  publisher="University of Sheffield",
  address="Barcelona",
  doi="10.21437/CHiME.2020-13",
  url="https://www.isca-speech.org/archive/CHiME_2020/pdfs/CHiME_2020_paper_zmolikova.pdf"
}
