Project detail

Funding sources

Non-public sector - Direct contracts - contract research, non-public sources

About the project

Speech processing in our proposal will be addressed by low-resource or language-agnostic technologies. Rather than concentrating on mining the content (for which standard resources such as acoustic models, language models or pronunciation dictionaries will obviously be lacking), we will handle the speech data with a multitude of "speech miners" that make minimal use of target-language resources.

The processing will begin with reliable voice activity detection (VAD) capable of segmenting the signal into useful and useless portions. Although often regarded as "not rocket science", a good VAD is crucial both for the correct functioning of the subsequent blocks and for human processing of the speech input. Our work will improve on an existing DNN-based VAD that proved its efficiency in the difficult RATS setting [Ng2012]. Several phone posterior estimators, using either monolingual or multilingual phoneme sets [Schwarz2009], will follow to provide the "miners" with a coherent low-dimensional representation.

The first real "miner" will be language identification (LID) covering a large set of target languages (more than 60). Even if the target language is not guaranteed to be in this set, LID will make it possible to detect segments in English, or possibly in other languages for which we have ASR technology. We will follow our recent development of LID based on features derived from phone posteriors [Plchot2013] as well as on DNNs. We will also work on enrolling a new language with very little data (down to a single utterance). Another "miner" will perform basic speaking-style recognition, separating read speech from spontaneous speech. Finally, speaker recognition (SRE) or clustering will gather information about speakers (if they were previously enrolled) or at least perform coarse speaker clustering, since for the analyst, knowing who is speaking can be as important as knowing what is said.
Here, we will build on our significant track record in i-vector-based SRE and will mainly work on automatic adaptation and calibration on unlabeled datasets [Brummer2014].
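Calibration in SRE means mapping raw verification scores to well-calibrated log-likelihood ratios (LLRs). The project aims to do this on unlabeled data [Brummer2014]; purely as a point of reference, the sketch below shows the conventional supervised baseline that such work starts from: an affine map llr = a * score + b fitted by logistic regression on labeled target and non-target trials. It is not the method of the cited work. The function names, learning rate, iteration count and the equal weighting of the two classes (an effective target prior of 0.5) are illustrative assumptions.

```python
import math
from typing import List, Tuple


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def train_affine_calibration(target_scores: List[float],
                             nontarget_scores: List[float],
                             lr: float = 0.1,
                             iters: int = 2000) -> Tuple[float, float]:
    """Fit llr = a * score + b by gradient descent on the logistic loss.

    Objective (hypothetical baseline, labeled trials required):
        mean_t log(1 + exp(-llr)) + mean_n log(1 + exp(llr)),
    i.e. both classes weighted equally (effective target prior 0.5).
    """
    a, b = 1.0, 0.0
    n_t, n_n = len(target_scores), len(nontarget_scores)
    for _ in range(iters):
        grad_a = grad_b = 0.0
        for s in target_scores:
            # d/dz log(1 + exp(-z)) = -sigmoid(-z)
            p = _sigmoid(-(a * s + b))
            grad_a -= p * s / n_t
            grad_b -= p / n_t
        for s in nontarget_scores:
            # d/dz log(1 + exp(z)) = sigmoid(z)
            p = _sigmoid(a * s + b)
            grad_a += p * s / n_n
            grad_b += p / n_n
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b


def calibrate(score: float, a: float, b: float) -> float:
    """Map a raw score to a calibrated log-likelihood ratio."""
    return a * score + b
```

With a target prior of 0.5 the calibrated LLR equals the log posterior odds, so thresholding it at 0 gives the minimum-expected-cost decision under equal error costs.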

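The voice activity detection step that opens the processing chain can be illustrated with a far simpler stand-in than the DNN-based VAD the project builds on [Ng2012]. The sketch below is a hypothetical example, not the project's method: it thresholds per-frame mean-square energy (in dB) and merges runs of speech frames into time segments. The function name `energy_vad`, the 25 ms frames with 10 ms hop, the -35 dB threshold and the minimum run length are illustrative assumptions, and the input is assumed to be mono samples scaled to [-1, 1].

```python
import math
from typing import List, Optional, Tuple


def energy_vad(samples: List[float],
               sample_rate: int = 8000,
               frame_ms: int = 25,
               hop_ms: int = 10,
               threshold_db: float = -35.0,
               min_speech_frames: int = 3) -> List[Tuple[float, float]]:
    """Return (start_s, end_s) speech segments found by frame-energy thresholding.

    Hypothetical illustration; all parameter values are assumptions.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)

    # 1) Per-frame speech/non-speech decision from mean-square energy in dB.
    decisions: List[bool] = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        decisions.append(10.0 * math.log10(energy + 1e-12) > threshold_db)

    # 2) Merge runs of consecutive speech frames into segments, dropping very short runs.
    segments: List[Tuple[float, float]] = []
    run_start: Optional[int] = None
    for i, is_speech in enumerate(decisions + [False]):  # sentinel closes a final run
        if is_speech and run_start is None:
            run_start = i
        elif not is_speech and run_start is not None:
            if i - run_start >= min_speech_frames:
                begin = run_start * hop / sample_rate
                end = ((i - 1) * hop + frame_len) / sample_rate
                segments.append((begin, end))
            run_start = None
    return segments
```

Energy thresholding is known to degrade badly in noise, which is one reason trained detectors are preferred for noisy data such as RATS.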
Description (Czech version)
Speech processing in our project proposal will be addressed by low-resource or language-agnostic technologies. Rather than on mining the content (for which standard resources such as an acoustic model, a language model or a pronunciation dictionary are evidently insufficient), the data will be processed by a number of speech-mining tools that make minimal use of target-language resources.

Keywords
Speech processing, language, speech mining

Keywords (Czech version)
speech processing, language, speech mining

Original language

English

Investigators

Burget Lukáš, doc. Ing., Ph.D. - principal investigator
Beneš Karel, Ing., Ph.D. - co-investigator
Fér Radek, Ing. - co-investigator
Glembek Ondřej, Ing., Ph.D. - co-investigator
Kocour Martin, Ing. - co-investigator
Ondel Lucas Antoine Francois, Mgr., Ph.D. - co-investigator
Skácel Miroslav, Ing. - co-investigator
Žmolíková Kateřina, Ing., Ph.D. - co-investigator

Departments

Department of Computer Graphics and Multimedia
- responsible department (16.12.2014 - not specified)
Speech Data Mining Research Group BUT Speech@FIT
- internal (16.12.2014 - 31.3.2020)
Department of Computer Graphics and Multimedia
- recipient (16.12.2014 - 31.3.2020)

Results

ONDEL YANG, L.; BURGET, L.; ČERNOCKÝ, J.; KESIRAJU, S. Bayesian phonotactic language model for Acoustic Unit Discovery. In Proceedings of ICASSP 2017. New Orleans: IEEE Signal Processing Society, 2017. p. 5750-5754. ISBN: 978-1-5090-4117-6.

GLEMBEK, O.; KESIRAJU, S.; ONDEL YANG, L. Summary report for project "ELISA" in Year 2015. Brno: University of Southern California, 2015. 2 p.

GLEMBEK, O. Summary report for project Exploiting Language Information for Situational Awareness (ELISA) For year 2016. Brno: University of Southern California, 2016. p. 1-2.

BASKAR, M.; WATANABE, S.; ASTUDILLO, R.; HORI, T.; BURGET, L.; ČERNOCKÝ, J. Semi-supervised Sequence-to-sequence ASR using Unpaired Speech and Text. In Proceedings of Interspeech 2019. Graz: International Speech Communication Association, 2019. no. 9, p. 3790-3794. ISSN: 1990-9772.

ALAM, J.; BHATTACHARYA, G.; BRUMMER, J.; BURGET, L.; DIEZ SÁNCHEZ, M.; GLEMBEK, O.; KENNY, P.; KLČO, M.; LANDINI, F.; LOZANO DÍEZ, A.; MATĚJKA, P.; MONTEIRO, J.; MOŠNER, L.; NOVOTNÝ, O.; PLCHOT, O.; PROFANT, J.; ROHDIN, J.; SILNOVA, A.; SLAVÍČEK, J.; STAFYLAKIS, T.; ZEINALI, H. ABC NIST SRE 2018 System Description. Proceedings of 2018 NIST SRE Workshop. Athens: National Institute of Standards and Technology, 2018. p. 1-10.

KESIRAJU, S.; BURGET, L.; SZŐKE, I.; ČERNOCKÝ, J. Learning document representations using subspace multinomial model. In Proceedings of Interspeech 2016. San Francisco: International Speech Communication Association, 2016. p. 700-704. ISBN: 978-1-5108-3313-5.

PAPADOPOULOS, P.; TRAVADI, R.; VAZ, C.; MALANDRAKIS, N.; HERMJAKOB, U.; POURDAMGHANI, N.; PUST, M.; ZHANG, B.; PAN, X.; LU, D.; LIN, Y.; GLEMBEK, O.; BASKAR, M.; KARAFIÁT, M.; BURGET, L.; HASEGAWA-JOHNSON, M.; JI, H.; MAY, J.; KNIGHT, K.; NARAYANAN, S. Team ELISA System for DARPA LORELEI Speech Evaluation 2016. In Proceedings of Interspeech 2017. Stockholm: International Speech Communication Association, 2017. no. 8, p. 2053-2057. ISSN: 1990-9772.

KESIRAJU, S.; PAPPAGARI, R.; ONDEL YANG, L.; BURGET, L.; DEHAK, N.; KHUDANPUR, S.; ČERNOCKÝ, J.; GANGASHETTY, S. Topic identification of spoken documents using unsupervised acoustic unit discovery. In Proceedings of ICASSP 2017. New Orleans: IEEE Signal Processing Society, 2017. p. 5745-5749. ISBN: 978-1-5090-4117-6.

GLEMBEK, O. Summary report for project Exploiting Language Information for Situational Awareness (ELISA) For year 2017. Brno: University of Southern California, 2017. p. 1-2.

MATĚJKA, P.; PLCHOT, O.; ZEINALI, H.; MOŠNER, L.; SILNOVA, A.; BURGET, L.; NOVOTNÝ, O.; GLEMBEK, O. Analysis of BUT Submission in Far-Field Scenarios of VOiCES 2019 Challenge. In Proceedings of Interspeech 2019. Graz: International Speech Communication Association, 2019. no. 9, p. 2448-2452. ISSN: 1990-9772.

WIESNER, M.; LIU, C.; ONDEL YANG, L.; HARMAN, C.; MANOHAR, V.; TRMAL, J.; HUANG, Z.; DEHAK, N.; KHUDANPUR, S. Automatic Speech Recognition and Topic Identification for Almost-Zero-Resource Languages. In Proceedings of Interspeech 2018. Hyderabad: International Speech Communication Association, 2018. no. 9, p. 2052-2056. ISSN: 1990-9772.

HANNEMANN, M.; TRMAL, J.; ONDEL YANG, L.; KESIRAJU, S.; BURGET, L. Bayesian joint-sequence models for grapheme-to-phoneme conversion. In Proceedings of ICASSP 2017. New Orleans: IEEE Signal Processing Society, 2017. p. 2836-2840. ISBN: 978-1-5090-4117-6.

BENEŠ, K.; KESIRAJU, S.; BURGET, L. i-vectors in language modeling: An efficient way of domain adaptation for feed-forward models. In Proceedings of Interspeech 2018. Hyderabad: International Speech Communication Association, 2018. no. 9, p. 3383-3387. ISSN: 1990-9772.

ALAM, J.; BOULIANNE, G.; GLEMBEK, O.; LOZANO DÍEZ, A.; MATĚJKA, P.; MIZERA, P.; MONTEIRO, J.; MOŠNER, L.; NOVOTNÝ, O.; PLCHOT, O.; ROHDIN, J.; SILNOVA, A.; SLAVÍČEK, J.; STAFYLAKIS, T.; WANG, S.; ZEINALI, H. ABC NIST SRE 2019 CTS System Description. Proceedings of NIST. Sentosa, Singapore: National Institute of Standards and Technology, 2019. p. 1-6.

LIU, C.; YANG, J.; SUN, M.; KESIRAJU, S.; ROTT, A.; ONDEL YANG, L.; GHAHREMANI, P.; DEHAK, N.; BURGET, L.; KHUDANPUR, S. An Empirical evaluation of zero resource acoustic unit discovery. In Proceedings of ICASSP 2017. New Orleans: IEEE Signal Processing Society, 2017. p. 5305-5309. ISBN: 978-1-5090-4117-6.

PULUGUNDLA, B.; BASKAR, M.; KESIRAJU, S.; EGOROVA, E.; KARAFIÁT, M.; BURGET, L.; ČERNOCKÝ, J. BUT system for low resource Indian language ASR. In Proceedings of Interspeech 2018. Hyderabad: International Speech Communication Association, 2018. no. 9, p. 3182-3186. ISSN: 1990-9772.

Responsible person: Burget Lukáš, doc. Ing., Ph.D.


DARPA Low Resource Languages for Emergent Incidents (LORELEI) - Exploiting Language Information for Situational Awareness (ELISA)