Project detail
Duration: 5.3.2012 — 4.11.2016
Funding resources
Non-public sector - Direct contracts - contract research, non-public sources
On the project
The Babel Program will develop agile and robust speech recognition technology that can be rapidly applied to any human language in order to provide effective search capability for analysts to efficiently process massive amounts of real-world recorded speech. Today's transcription systems are built on technology that was originally developed for English, with markedly lower performance on non-English languages. These systems have often taken years to develop and cover only a small subset of the languages of the world. Babel intends to demonstrate the ability to generate a speech transcription system for any new language within one week to support keyword search performance for effective triage of massive amounts of speech recorded in challenging real-world situations.
Description (translated from Czech): The aim of the Babel program is to develop agile and robust speech recognition technology that can be rapidly applied to any spoken language, giving analysts an effective search capability for efficiently processing very large volumes of recorded spontaneous speech.
Keywords: speech recognition, speaker recognition, language recognition, LVCSR, feature extraction, acoustic modelling, neural-network
Default language
English
People responsible
Matějka Pavel, Ing., Ph.D. - principal person responsible
Andrla Petr, Ing. - fellow researcher
Cipr Tomáš, Ing. - fellow researcher
Kesiraju Santosh, Ph.D. - fellow researcher
Novotný Ondřej, Ing., Ph.D. - fellow researcher
Ondel Lucas Antoine Francois, Mgr., Ph.D. - fellow researcher
Skála František, Ing. - fellow researcher
Veselý Karel, Ing., Ph.D. - fellow researcher
Units
Department of Computer Graphics and Multimedia - responsible department (20.5.2011 - not assigned)
Speech Data Mining Research Group BUT Speech@FIT - internal (20.5.2011 - 4.11.2016)
Raytheon BBN Technologies Corp - client (20.5.2011 - 4.11.2016)
Department of Computer Graphics and Multimedia - beneficiary (20.5.2011 - 4.11.2016)
Results
KARAFIÁT, M.; GRÉZL, F.; VESELÝ, K.; HANNEMANN, M.; SZŐKE, I.; ČERNOCKÝ, J. BUT 2014 Babel System: Analysis of adaptation in NN based systems. In Proceedings of Interspeech 2014. Singapore: International Speech Communication Association, 2014. p. 3002-3006. ISBN: 978-1-63439-435-2.
PEŠÁN, J.; BURGET, L.; HEŘMANSKÝ, H.; VESELÝ, K. DNN derived filters for processing of modulation spectrum of speech. In Proceedings of Interspeech 2015. Proceedings of Interspeech. Dresden: International Speech Communication Association, 2015. no. 09, p. 1908-1911. ISBN: 978-1-5108-1790-6. ISSN: 1990-9772.
GRÉZL, F.; KARAFIÁT, M. Combination of Multilingual and Semi-Supervised Training for Under-Resourced Languages. In Proceedings of Interspeech 2014. Singapore: International Speech Communication Association, 2014. p. 820-824. ISBN: 978-1-63439-435-2.
FÉR, R.; MATĚJKA, P.; GRÉZL, F.; PLCHOT, O.; ČERNOCKÝ, J. Multilingual Bottleneck Features for Language Recognition. In Proceedings of Interspeech 2015. Proceedings of Interspeech. Dresden: International Speech Communication Association, 2015. no. 09, p. 389-393. ISBN: 978-1-5108-1790-6. ISSN: 1990-9772.
GRÉZL, F.; KARAFIÁT, M. Adapting Multilingual Neural Network Hierarchy to a New Language. Proceedings of the 4th International Workshop on Spoken Language Technologies for Under-resourced Languages SLTU-2014. St. Petersburg, Russia, 2014. St. Petersburg: International Speech Communication Association, 2014. p. 39-45. ISBN: 978-5-8088-0908-6.
GRÉZL, F.; KARAFIÁT, M. Bottle-Neck Feature Extraction Structures for Multilingual Training and Porting. In Procedia Computer Science. Procedia Computer Science. Yogyakarta: Elsevier Science, 2016. no. 81, p. 144-151. ISSN: 1877-0509.
GRÉZL, F.; KARAFIÁT, M. Boosting Performance on Low-resource Languages by Standard Corpora: An Analysis. In Proceedings of SLT 2016. San Diego: IEEE Signal Processing Society, 2016. p. 629-636. ISBN: 978-1-5090-4903-5.
GRÉZL, F.; EGOROVA, E.; KARAFIÁT, M. Further Investigation into Multilingual Training and Adaptation of Stacked Bottle-neck Neural Network Structure. In Proceedings of 2014 Spoken Language Technology Workshop. South Lake Tahoe, Nevada: IEEE Signal Processing Society, 2014. p. 48-53. ISBN: 978-1-4799-7129-9.
MALLIDI, S.; OGAWA, T.; VESELÝ, K.; NIDADAVOLU, P.; HEŘMANSKÝ, H. Autoencoder based multi-stream combination for noise robust speech recognition. In Proceedings of Interspeech 2015. Proceedings of Interspeech. Dresden: International Speech Communication Association, 2015. no. 09, p. 3551-3555. ISBN: 978-1-5108-1790-6. ISSN: 1990-9772.
GRÉZL, F.; EGOROVA, E.; KARAFIÁT, M. Study of Large Data Resources for Multilingual Training and System Porting. In Procedia Computer Science. Procedia Computer Science. Yogyakarta: Elsevier Science, 2016. no. 81, p. 15-22. ISSN: 1877-0509.
HSIAO, R.; NG, T.; GRÉZL, F.; KARAKOS, D.; TSAKALIDIS, S.; NGUYEN, L.; SCHWARTZ, R. Discriminative Semi-supervised Training for Keyword Search in Low Resource Languages. Proceedings of ASRU 2013. Olomouc: IEEE Signal Processing Society, 2013. p. 440-445. ISBN: 978-1-4799-2755-5.
VESELÝ, K.; KARAFIÁT, M.; GRÉZL, F.; JANDA, M.; EGOROVA, E. The Language-Independent Bottleneck Features. Proceedings of IEEE 2012 Workshop on Spoken Language Technology. Miami: IEEE Signal Processing Society, 2012. p. 336-341. ISBN: 978-1-4673-5124-9.
KARAKOS, D.; SCHWARTZ, R.; TSAKALIDIS, S.; ZHANG, L.; RANJAN, S.; NG, T.; HSIAO, R.; NGUYEN, L.; GRÉZL, F.; HANNEMANN, M.; KARAFIÁT, M.; SZŐKE, I.; VESELÝ, K. Score Normalization and System Combination for Improved Keyword Spotting. In Proceedings of ASRU 2013. Olomouc: IEEE Signal Processing Society, 2013. p. 210-215. ISBN: 978-1-4799-2755-5.
GRÉZL, F.; KARAFIÁT, M. Semi-Supervised Bootstrapping Approach For Neural Network Feature Extractor Training. Proceedings of ASRU 2013. Olomouc: IEEE Signal Processing Society, 2013. p. 470-475. ISBN: 978-1-4799-2755-5.
VESELÝ, K.; HANNEMANN, M.; BURGET, L. Semi-supervised Training of Deep Neural Networks. Proceedings of ASRU 2013. Olomouc: IEEE Signal Processing Society, 2013. p. 267-272. ISBN: 978-1-4799-2755-5.
BURGET, L.; GLEMBEK, O.; MATĚJKA, P.; PLCHOT, O. 2012 Summary report of project "Processing and analysis of speech, automatic speaker identification". Cambridge: Raytheon BBN Technologies, 2012. 15 p.
KARAFIÁT, M.; GRÉZL, F.; HANNEMANN, M.; ČERNOCKÝ, J. BUT Neural Network Features for Spontaneous Vietnamese in BABEL. In Proceedings of ICASSP 2014. Florence: IEEE Signal Processing Society, 2014. p. 5659-5663. ISBN: 978-1-4799-2892-7.
VESELÝ, K.; GHOSHAL, A.; BURGET, L.; POVEY, D. Sequence-discriminative Training of Deep Neural Networks. Proceedings of Interspeech 2013. Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013). Lyon: International Speech Communication Association, 2013. no. 8, p. 2345-2349. ISBN: 978-1-62993-443-3. ISSN: 2308-457X.
MATĚJKA, P.; GLEMBEK, O.; NOVOTNÝ, O.; PLCHOT, O.; GRÉZL, F.; BURGET, L.; ČERNOCKÝ, J. Analysis Of DNN Approaches To Speaker Identification. In Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016), 2016. Shanghai: IEEE Signal Processing Society, 2016. p. 5100-5104. ISBN: 978-1-4799-9988-0.
KARAFIÁT, M.; BASKAR, M.; MATĚJKA, P.; VESELÝ, K.; GRÉZL, F.; ČERNOCKÝ, J. Multilingual BLSTM and Speaker-Specific Vector Adaptation in 2016 BUT BABEL SYSTEM. In Proceedings of SLT 2016. San Diego: IEEE Signal Processing Society, 2016. p. 637-643. ISBN: 978-1-5090-4903-5.
KARAFIÁT, M.; GRÉZL, F.; HANNEMANN, M.; VESELÝ, K. Summary report for project "Multilingual speech recognition" in Year 2015. Brno: Raytheon BBN Technologies, 2015. 36 p.
LEI, Y.; BURGET, L.; SCHEFFER, N. A Noise Robust I-Vector Extractor Using Vector Taylor Series For Speaker Recognition. Proceedings of ICASSP 2013. Vancouver: IEEE Signal Processing Society, 2013. p. 6788-6791. ISBN: 978-1-4799-0355-9.
HANNEMANN, M.; POVEY, D.; ZWEIG, G. Combining Forward and Backward Search in Decoding. Proceedings of ICASSP 2013. Vancouver: IEEE Signal Processing Society, 2013. p. 6739-6743. ISBN: 978-1-4799-0355-9.
KARAFIÁT, M.; VESELÝ, K.; SZŐKE, I.; BURGET, L.; GRÉZL, F.; HANNEMANN, M.; ČERNOCKÝ, J. BUT ASR System for BABEL Surprise Evaluation 2014. In Proceedings of 2014 Spoken Language Technology Workshop. South Lake Tahoe, Nevada: IEEE Signal Processing Society, 2014. p. 501-506. ISBN: 978-1-4799-7129-9.
KARAFIÁT, M.; GRÉZL, F.; HANNEMANN, M.; VESELÝ, K.; ČERNOCKÝ, J. BUT BABEL System for Spontaneous Cantonese. Proceedings of Interspeech 2013. Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013). Lyon: International Speech Communication Association, 2013. no. 8, p. 2589-2593. ISBN: 978-1-62993-443-3. ISSN: 2308-457X.
PLCHOT, O.; BURGET, L.; SZŐKE, I. 2013 Summary report of project "Processing and analysis of speech, automatic speaker identification". Brno: Raytheon BBN Technologies, 2013. 25 p.
KARAFIÁT, M.; BASKAR, M.; MATĚJKA, P.; VESELÝ, K.; GRÉZL, F.; BURGET, L.; ČERNOCKÝ, J. 2016 BUT Babel system: Multilingual BLSTM acoustic model with i-vector based adaptation. In Proceedings of Interspeech 2017. Proceedings of Interspeech. Stockholm: International Speech Communication Association, 2017. no. 08, p. 719-723. ISSN: 1990-9772.
HEŘMANSKÝ, H.; BURGET, L.; COHEN, J.; DUPOUX, E.; FELDMAN, N.; GODFREY, J.; KHUDANPUR, S.; MACIEJEWSKI, M.; MALLIDI, S.; MENON, A.; OGAWA, T.; PEDDINTI, V.; ROSE, R.; STERN, R.; WIESNER, M.; VESELÝ, K. Towards Machines That Know When They Do Not Know: Summary of Work Done at 2014 Frederick Jelinek Memorial Workshop. In Proceedings of 2015 IEEE International Conference on Acoustics, Speech and Signal Processing. South Brisbane, Queensland: IEEE Signal Processing Society, 2015. p. 5009-5013. ISBN: 978-1-4673-6997-8.
NOVOTNÝ, O.; MATĚJKA, P.; GLEMBEK, O.; PLCHOT, O.; GRÉZL, F.; BURGET, L.; ČERNOCKÝ, J. Analysis of the DNN-Based SRE Systems in Multi-language Conditions. In Proceedings of SLT 2016. San Diego: IEEE Signal Processing Society, 2016. p. 199-204. ISBN: 978-1-5090-4903-5.
BRUMMER, J.; SWART, A.; PRIETO, J.; GARCIA PERERA, L.; MATĚJKA, P.; PLCHOT, O.; DIEZ SÁNCHEZ, M.; SILNOVA, A.; JIANG, X.; NOVOTNÝ, O.; ROHDIN, J.; GLEMBEK, O.; GRÉZL, F.; BURGET, L.; ONDEL YANG, L.; PEŠÁN, J.; ČERNOCKÝ, J.; KENNY, P.; ALAM, J.; BHATTACHARYA, G.; ZEINALI, H. ABC NIST SRE 2016 System Description. San Diego: National Institute of Standards and Technology, 2016. p. 1-8.
KARAFIÁT, M.; BURGET, L.; GRÉZL, F.; VESELÝ, K.; ČERNOCKÝ, J. Multilingual Region-Dependent Transforms. In Proceedings of the 41st IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP 2016), 2016. Shanghai: IEEE Signal Processing Society, 2016. p. 5430-5434. ISBN: 978-1-4799-9988-0.
KARAFIÁT, M. Summary report for project "Multilingual speech recognition" in Year 2016. Brno: Raytheon BBN Technologies, 2016. 7 p.
KARAFIÁT, M. 2014 Summary report of project "Processing and analysis of speech, automatic speaker identification". Brno: Raytheon BBN Technologies, 2014. 35 p.
HSIAO, R.; MA, J.; HARTMANN, W.; KARAFIÁT, M.; GRÉZL, F.; BURGET, L.; SZŐKE, I.; ČERNOCKÝ, J.; WATANABE, S.; CHEN, Z.; MALLIDI, S.; HEŘMANSKÝ, H.; TSAKALIDIS, S.; SCHWARTZ, R. Robust Speech Recognition in Unknown Reverberant and Noisy Conditions. In Proceedings of 2015 IEEE Automatic Speech Recognition and Understanding Workshop. Scottsdale, Arizona: IEEE Signal Processing Society, 2015. p. 533-538. ISBN: 978-1-4799-7291-3.
GRÉZL, F.; KARAFIÁT, M.; VESELÝ, K. Adaptation of Multilingual Stacked Bottle-neck Neural Network Structure for New Language. In Proceedings of ICASSP 2014. Florence: IEEE Signal Processing Society, 2014. p. 7704-7708. ISBN: 978-1-4799-2892-7.
PLCHOT, O.; MATĚJKA, P.; FÉR, R.; GLEMBEK, O.; NOVOTNÝ, O.; PEŠÁN, J.; VESELÝ, K.; ONDEL YANG, L.; KARAFIÁT, M.; GRÉZL, F.; KESIRAJU, S.; BURGET, L.; BRUMMER, J.; SWART, A.; CUMANI, S.; MALLIDI, S.; LI, R. BAT System Description for NIST LRE 2015. In Proceedings of Odyssey 2016, The Speaker and Language Recognition Workshop. Proceedings of Odyssey: The Speaker and Language Recognition Workshop Odyssey 2014, Joensuu, Finland. Bilbao: International Speech Communication Association, 2016. no. 06, p. 166-173. ISSN: 2312-2846.
Link
http://www.iarpa.gov/solicitations_babel.html
Responsibility: Matějka Pavel, Ing., Ph.D.