Course detail
Modern Methods of Speech Processing
FIT-MZD
Acad. year: 2025/2026
From simple systems to stochastic modelling. Hidden Markov models. Large vocabulary continuous speech recognition. Language models. Speech production, speech perception: time and frequency. Data-driven methods for feature extraction. Speech databases. Excitation in speech coding, CELP. Speaker identification.
Language of instruction
Czech, English
Mode of study
Not applicable.
Guarantor
Entry knowledge
Basic knowledge of digital signal processing; having attended a basic course on speech processing is an advantage.
Rules for evaluation and completion of the course
Attendance is not checked; the course is evaluated on the basis of the exam or a final report.
Aims
We will cover methods currently implemented in industrial applications (such as mobile phones or commercially available recognizers), but will not omit promising methods that so far exist only in laboratories. Attention will be paid to techniques derived from data and inspired by human audition and speech production.
This course allows students to implement simple speech processing applications, for example voice command of a process. Above all, however, it enables them to join the development of complex speech recognition and coding systems using modern methods, in academic and industrial environments.
Study aids
Not applicable.
Prerequisites and corequisites
Not applicable.
Basic literature
Not applicable.
Recommended reading
Dutoit, T.: An Introduction to Text-To-Speech Synthesis, Kluwer Academic Publishers, 1997
Gold, B., Morgan, N.: Speech and audio signal processing, John Wiley & Sons, 2000
Jelinek, F.: Statistical Methods for Speech Recognition, MIT Press, 1998
Psutka, J.: Komunikace s počítačem mluvenou řečí, Academia, Praha, 1995
Texts at http://www.fit.vutbr.cz/~cernocky/speech/
Vapnik, V. N.: Statistical Learning Theory, Wiley-Interscience, 1998
Classification of course in study plans
- Programme DIT Doctoral 0 year of study, winter semester, compulsory-optional
- Programme DIT-EN Doctoral 0 year of study, winter semester, compulsory-optional
Type of course unit
Lecture
39 hrs, optional
Teacher / Lecturer
Syllabus
- Review of notions: signal vectors and parameter matrices, basic statistics.
- Stochastic modeling of parameters, modeling of time by state sequences.
- Hidden Markov models: basic structure, training.
- Recognition of speech using HMM: Viterbi search, token passing.
- Pronunciation dictionaries and language models.
- Speech production and derived parameters: LPC, Log area ratios, line spectral pairs.
- Speech perception and derived parameters: Mel-frequency cepstral coefficients, Perceptual linear prediction.
- Temporal properties of hearing - RASTA filtering.
- Training the feature extractor on the data - linear discriminant analysis.
- Speech databases: standards, contents, speakers, annotations.
- Vocoders and modeling of the excitation: multi-pulse and stochastic excitations (GSM coding).
- CELP coding: long-term predictor, codebooks. Very low bit-rate coders.
- Current methods of speaker identification and verification.
Guided consultation in combined form of studies
26 hrs, optional
Teacher / Lecturer