Project details

Funding sources

Non-public sector - Direct contracts - contract research, non-public sources

About the project

Current situation in language recognition
Recent editions of the NIST Language Recognition Evaluation (LRE) have shown substantial improvement in the performance of LRE systems. Both acoustic and phonotactic approaches have reached a certain maturity, both in the modeling of target languages and in coping with the adverse effects of channel variability. There are several ways to further improve current LRE systems, and some of them were investigated in the Brno University of Technology (BUT) 2007 submission to this evaluation, for example:

discriminative training and channel compensation techniques for both acoustic and phonotactic modeling.
use of large vocabulary continuous speech recognition (LVCSR) followed by confidence measures.

However, with all this beautiful science, we still face the old problem of training and testing any recognizer: the lack of data. While it is easy to train and test an LRE system for languages with established speech and language resources, such as English or Mandarin, rare languages lack these standard resources. Consider the example of Thai: this language has 65 million speakers, but for the NIST 2007 LRE evaluation we had less than 2 hours of data at our disposal, distributed by NIST as part of the development package. We contacted several Thai speech processing labs, but a large spontaneous telephone database for this language simply does not exist.

The proposed solution
This proposal aims to fill this gap by using data acquired from public sources, namely radio broadcasts. This approach (which is fairly intuitive, and we do not claim that Speech@FIT is the only group to have had this idea) should provide us with plenty of data, which we believe will lead to:

improved performance for known languages.
ability to process languages that have so far been excluded because data were unavailable.

This approach is, however, far from "we record the data, push a button, and have a much better LRE system within a month". A significant amount of work remains, especially on data selection and channel normalization.

Description (translated from Czech)
The project aims to improve the ability of automatic language identification systems to detect less common languages by using radio broadcast data.

Keywords
language recognition, broadcast data

Original language

English

Investigators

Burget Lukáš, doc. Ing., Ph.D. - principal investigator

Units

Department of Computer Graphics and Multimedia
- responsible unit (1.1.1989 - not specified)
Speech Data Mining Research Group BUT Speech@FIT
- internal (16.10.2008 - 14.12.2010)
Department of Computer Graphics and Multimedia
- recipient (16.10.2008 - 14.12.2010)

Results

JANČÍK, Z.; PLCHOT, O.; BRÜMMER, N.; BURGET, L.; GLEMBEK, O.; HUBEIKA, V.; KARAFIÁT, M.; MATĚJKA, P.; MIKOLOV, T.; STRASHEIM, A.; ČERNOCKÝ, J. Data selection and calibration issues in automatic language recognition - investigation with BUT-AGNITIO NIST LRE 2009 system. In Proc. Odyssey 2010 - The Speaker and Language Recognition Workshop. Brno: International Speech Communication Association, 2010. p. 215-221. ISBN: 978-80-214-4114-9.

BRÜMMER, N.; BURGET, L.; GLEMBEK, O.; HUBEIKA, V.; JANČÍK, Z.; KARAFIÁT, M.; MATĚJKA, P.; MIKOLOV, T.; PLCHOT, O.; STRASHEIM, A. BUT-AGNITIO System Description for NIST Language Recognition Evaluation 2009. In Proc. NIST 2009 Language Recognition Evaluation Workshop. Baltimore, Maryland, USA: National Institute of Standards and Technology, 2009. p. 1-7.

MIKOLOV, T.; PLCHOT, O.; GLEMBEK, O.; MATĚJKA, P.; BURGET, L.; ČERNOCKÝ, J. PCA-based Feature Extraction for Phonotactic Language Recognition. In Proc. Odyssey 2010 - The Speaker and Language Recognition Workshop. Brno: International Speech Communication Association, 2010. p. 251-255. ISBN: 978-80-214-4114-9.

Responsibility: Burget Lukáš, doc. Ing., Ph.D.

EOARD - Improving the capacity of language recognition systems to handle rare languages using radio broadcast data