Project detail
Duration: 1 Oct 2017 – 30 Sep 2018
Funding resources
Non-public sector - Direct contracts - contract research, non-public sources
On the project
The purpose of the joint research is to develop a speech enhancement front-end for robust automatic speech recognition with a large amount of training data, through the cooperation of NTT and BUT. The work relies on embeddings produced by neural networks at various places of the processing chain.
Description in Czech: The goal of the joint research is to develop feature-extraction technologies with speech enhancement for robust automatic speech recognition with a large volume of training data, within the cooperation between BUT and NTT. The work is based on low-dimensional data representations (embeddings) produced by neural networks at various places of the processing chain.
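The idea of conditioning a speech-enhancement front-end on a neural speaker embedding can be sketched as follows. This is purely an illustrative toy in numpy, not the project's actual models; the names (`speaker_embedding`, `mask_estimator`) and the one-layer "network" are hypothetical stand-ins for trained neural components:

```python
import numpy as np

rng = np.random.default_rng(0)

def speaker_embedding(frames):
    # Toy "embedding": average the enrolment frames and L2-normalize.
    # In practice this would be a trained speaker-embedding network.
    e = frames.mean(axis=0)
    return e / np.linalg.norm(e)

def mask_estimator(mixture, emb, W, V):
    # One-layer stand-in for a mask-estimation network: mixture features
    # plus the speaker embedding yield a per-bin sigmoid mask in (0, 1).
    z = mixture @ W + emb @ V          # shape: (frames, bins)
    return 1.0 / (1.0 + np.exp(-z))    # sigmoid

F, B = 20, 8                           # frames, feature bins
enrol = rng.normal(size=(30, B))       # enrolment utterance of the target speaker
mix = rng.normal(size=(F, B))          # mixture features

emb = speaker_embedding(enrol)
W = rng.normal(scale=0.1, size=(B, B))
V = rng.normal(scale=0.1, size=(B, B))

mask = mask_estimator(mix, emb, W, V)
enhanced = mask * mix                  # masked mixture, fed to the recognizer
print(mask.shape)                      # (20, 8)
```

The point of the sketch is only the data flow: the embedding summarizes who the target speaker is, and the enhancement network uses it to decide which parts of the mixture to keep before recognition.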
Keywords: speech recognition, robustness, large data, DNN embeddings
Keywords in Czech: rozpoznávání řeči, odolnost, velký objem dat
Default language
English
People responsible
Žmolíková Kateřina, Ing., Ph.D. - principal person responsible
Units
Department of Computer Graphics and Multimedia - responsible department (25.9.2017 - not assigned)
Speech Data Mining Research Group BUT Speech@FIT - internal (25.9.2017 - 30.9.2018)
NTT Corporation - client (25.9.2017 - 30.9.2018)
Research Centre of Information Technology - co-beneficiary (25.9.2017 - 30.9.2018)
Department of Computer Graphics and Multimedia - beneficiary (25.9.2017 - 30.9.2018)
Results
DELCROIX, M.; ŽMOLÍKOVÁ, K.; KINOSHITA, K.; ARAKI, S.; OGAWA, A.; NAKATANI, T. SpeakerBeam: A New Deep Learning Technology for Extracting Speech of a Target Speaker Based on the Speaker's Voice Characteristics. NTT Technical Review, 2018, vol. 16, no. 11, p. 19-24. ISSN: 1348-3447.
ROHDIN, J.; SILNOVA, A.; DIEZ SÁNCHEZ, M.; PLCHOT, O.; MATĚJKA, P.; BURGET, L. End-to-End DNN Based Speaker Recognition Inspired by i-Vector and PLDA. In Proceedings of ICASSP. Calgary: IEEE Signal Processing Society, 2018. p. 4874-4878. ISBN: 978-1-5386-4658-8.
ŽMOLÍKOVÁ, K.; DELCROIX, M.; KINOSHITA, K.; HIGUCHI, T.; OGAWA, A.; NAKATANI, T. Learning Speaker Representation for Neural Network Based Multichannel Speaker Extraction. In Proceedings of ASRU 2017. Okinawa: IEEE Signal Processing Society, 2017. p. 8-15. ISBN: 978-1-5090-4788-8.
ŽMOLÍKOVÁ, K. Summary report of project "Speech enhancement front-end for robust automatic speech recognition with large amount of training data" for Year 2017. Brno: NTT Corporation, 2017. 1 p.
DELCROIX, M.; ŽMOLÍKOVÁ, K.; KINOSHITA, K.; OGAWA, A.; NAKATANI, T. Single Channel Target Speaker Extraction and Recognition with Speaker Beam. In Proceedings of ICASSP 2018. Calgary: IEEE Signal Processing Society, 2018. p. 5554-5558. ISBN: 978-1-5386-4658-8.