Course detail

Modern Methods of Speech Processing

FIT-MZD

Acad. year: 2010/2011

Not applicable.

Language of instruction

Czech, English

Mode of study

Not applicable.

Learning outcomes of the course unit

Not applicable.

Prerequisites

Not applicable.

Co-requisites

Not applicable.

Planned learning activities and teaching methods

Not applicable.

Assessment methods and criteria linked to learning outcomes

Not applicable.

Course curriculum

Not applicable.

Work placements

Not applicable.

Aims

Not applicable.

Specification of controlled education, way of implementation and compensation for absences

Not applicable.

Recommended optional programme components

Not applicable.

Prerequisites and corequisites

Not applicable.

Basic literature

  • Psutka, J.: Komunikace s počítačem mluvenou řečí, Academia, Praha, 1995

  • Gold, B., Morgan, N.: Speech and audio signal processing, John Wiley & Sons, 2000

  • Texts from http://www.fit.vutbr.cz/~cernocky/speech/

Recommended reading

  • Moore, B.C.J.: An introduction to the psychology of hearing, Academic Press, 1989

  • Jelinek, F.: Statistical Methods for Speech Recognition, MIT Press, 1998

  • Fukunaga, K.: Introduction to Statistical Pattern Recognition, Academic Press, 1990

  • Vapnik, V. N.: Statistical Learning Theory, Wiley-Interscience, 1998

  • Dutoit, T.: An Introduction to Text-To-Speech Synthesis, Kluwer Academic Publishers, 1997

Classification of course in study plans

  • Programme CSE-PHD-4 Doctoral

    branch DVI4, 0 year of study, winter semester, elective

Type of course unit

Lecture

39 hours, optional

Teacher / Lecturer

Syllabus

  1. Review of notions: signal vectors and parameter matrices, basic statistics.
  2. Stochastic modeling of parameters, modeling of time by state sequences.
  3. Hidden Markov models: basic structure, training.
  4. Recognition of speech using HMM: Viterbi search, token passing.
  5. Pronunciation dictionaries and language models.
  6. Speech production and derived parameters: LPC, log area ratios, line spectral pairs.
  7. Speech perception and derived parameters: mel-frequency cepstral coefficients, perceptual linear prediction.
  8. Temporal properties of hearing: RASTA filtering.
  9. Training the feature extractor on the data: linear discriminant analysis.
  10. Speech databases: standards, contents, speakers, annotations.
  11. Vocoders and modeling of the excitation: multi-pulse and stochastic excitations (GSM coding).
  12. CELP coding: long-term predictor, codebooks. Very low bit-rate coders.
  13. Current methods of speaker identification and verification.