Course detail

Speech Signal Analysis and Synthesis

FEKT-LASS

Acad. year: 2011/2012

Phonetic description of the Czech language, signal windowing, preemphasis, pitch estimation. Representations of speech in the time and frequency domains, short-time analysis of the speech signal, selection of suitable features, word endpoint detection, linear and nonlinear time warping, isolated word recognition systems, connected word recognition, suitable speech units and features for speaker recognition, hidden Markov models, speaker identification, speaker verification, speech synthesis, vocoders, and some typical applications of speech and speaker recognition.
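Nonlinear time warping for isolated word recognition is commonly implemented as dynamic time warping (DTW): an unknown word is compared with stored reference templates, and the template with the smallest accumulated alignment cost is chosen. The following is a minimal sketch, assuming each word is already a list of feature vectors, a Euclidean local distance and the basic three-neighbour step pattern; the function names and the toy templates are hypothetical, not taken from the course materials.

```python
import math

def euclidean(u, v):
    """Local distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))

def dtw_distance(seq_a, seq_b, dist=euclidean):
    """Minimal accumulated cost of a nonlinear alignment of two feature sequences."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # D[i][j]: best accumulated cost of aligning seq_a[:i] with seq_b[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(seq_a[i - 1], seq_b[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # seq_b frame reused (b stretched)
                                 D[i][j - 1],      # seq_a frame reused (a stretched)
                                 D[i - 1][j - 1])  # both sequences advance
    return D[n][m]

def recognize(unknown, templates):
    """Return the label of the reference template with the smallest DTW distance."""
    return min(templates, key=lambda label: dtw_distance(unknown, templates[label]))

# Usage with hypothetical one-dimensional "feature vectors":
templates = {"yes": [[0.0], [1.0], [2.0]], "no": [[5.0], [5.0]]}
print(recognize([[0.0], [0.0], [1.0], [2.0], [2.0]], templates))  # → yes
```

The unconstrained step pattern keeps the sketch short; practical recognizers usually add slope constraints and a global search window, which this example omits.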

Language of instruction

Czech

Number of ECTS credits

Mode of study

Not applicable.

Guarantor

prof. Ing. Milan Sigmund, CSc.

Department

Department of Radio Electronics (UREL)

Learning outcomes of the course unit

The students become familiar with the phonetic description of the Czech language, speech signal features, the selection of suitable speech features, speech and speaker recognition systems, speech synthesis, vocoders, special integrated circuits for speech processing, and some typical applications.

Prerequisites

Knowledge of the subject at the Bachelor's degree level is required.

Co-requisites

Not applicable.

Planned learning activities and teaching methods

Teaching methods depend on the type of course unit, as specified in Article 7 of the BUT Rules for Studies and Examinations.

Assessment methods and criteria linked to learning outcomes

Requirements for completion of the course are specified by a regulation issued by the lecturer responsible for the course and updated for every academic year.

Course curriculum

Introduction, acoustic theory of speech production.
Vocal tract model, phonetic description of the Czech language.
Preprocessing of the speech signal: windowing, preemphasis.
Energy, zero-crossing rate and autocorrelation function.
Linear predictive coding (LPC) and derived coefficients.
Cepstral analysis of the speech signal.
Estimation of the fundamental frequency of speech.
Linear and nonlinear time alignment.
Deterministic and statistical classifiers, hidden Markov models.
Classifier training, error rate estimation.
Voice recognition, speaker verification and identification.
Speech synthesis methods.
Speech coding and transmission, basic types of vocoders.
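The preprocessing and short-time features listed above (preemphasis, windowing, energy and zero-crossing rate) can be illustrated with a short pure-Python sketch. The preemphasis coefficient 0.97, the 8 kHz sampling rate and the 200-sample frame with an 80-sample shift are common choices assumed for illustration, not values prescribed by the course, and all function names are hypothetical.

```python
import math

def preemphasis(x, alpha=0.97):
    """First-order preemphasis filter y[n] = x[n] - alpha * x[n - 1]."""
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]

def hamming(N):
    """Hamming window w[n] = 0.54 - 0.46 * cos(2 * pi * n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def frames(x, length, shift):
    """Split a signal into overlapping frames; an incomplete tail is dropped."""
    return [x[i:i + length] for i in range(0, len(x) - length + 1, shift)]

def short_time_energy(frame, window):
    """Energy of the windowed frame: sum of (x[n] * w[n])^2."""
    return sum((s * w) ** 2 for s, w in zip(frame, window))

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (zero counts as positive)."""
    crossings = sum(1 for s0, s1 in zip(frame, frame[1:]) if (s0 >= 0) != (s1 >= 0))
    return crossings / (len(frame) - 1)

# Usage on synthetic signals (assumed test data, 8 kHz sampling rate):
fs = 8000
low = [math.sin(2 * math.pi * 150 * n / fs) for n in range(fs // 10)]  # voiced-like tone
high = [0.1 * (-1) ** n for n in range(fs // 10)]                       # 4 kHz alternation
win = hamming(200)                                                      # 25 ms frames
for name, sig in (("150 Hz tone", low), ("4 kHz alternation", high)):
    first = frames(preemphasis(sig), 200, 80)[0]
    print(name, round(short_time_energy(first, win), 3), round(zero_crossing_rate(first), 3))
```

Voiced speech typically shows high energy and a low zero-crossing rate and unvoiced speech the opposite, which is why this pair of features is a simple basis for voiced/unvoiced and speech/pause decisions.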

Work placements

Not applicable.

Aims

The aim of the course is to familiarize students with the basic methods for automatic recognition of isolated spoken words, with approaches to speaker verification and identification based on the speaker's voice, and with speech synthesis methods.

Specification of controlled education, way of implementation and compensation for absences

The content and forms of instruction in the evaluated course are specified by a regulation issued by the lecturer responsible for the course and updated for every academic year.

Recommended optional programme components

Not applicable.

Prerequisites and corequisites

Not applicable.

Basic literature

SIGMUND, M. Analýza řečových signálů [Speech signal analysis]. Lecture notes, FEKT VUT, Brno, 2000.
PSUTKA, J. Komunikace s počítačem mluvenou řečí [Spoken-language communication with computers]. Academia, Praha, 1995.

Type of course unit

Lecture

39 hours, optional

Teacher / Lecturer

prof. Ing. Milan Sigmund, CSc.

Syllabus

01 Introduction, acoustic theory of speech production.
02 Vocal tract model, phonetic description of the Czech language.
03 Preprocessing of the speech signal: windowing, preemphasis.
04 Energy, zero-crossing rate and autocorrelation function.
05 Linear predictive coding (LPC) and derived coefficients.
06 Cepstral analysis of the speech signal.
07 Estimation of the fundamental frequency of speech.
08 Linear and nonlinear time alignment.
09 Deterministic and statistical classifiers, hidden Markov models.
10 Classifier training, error rate estimation.
11 Voice recognition, speaker verification and identification.
12 Speech synthesis methods.
13 Speech coding and transmission, basic types of vocoders.
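For lecture 05, the autocorrelation method is one standard way to compute linear prediction coefficients, and the LPC-derived cepstrum is a typical example of derived coefficients. The sketch below assumes a (preferably windowed) frame, the predictor convention x̂[n] = a_1 x[n-1] + ... + a_p x[n-p], and hypothetical function names; it solves the normal equations with the Levinson-Durbin recursion and converts the predictor coefficients to cepstral coefficients c_1 to c_N (the gain term c_0 is omitted).

```python
import random

def autocorrelation(frame, max_lag):
    """Biased autocorrelation r[k] = sum_n x[n] * x[n + k], k = 0..max_lag."""
    N = len(frame)
    return [sum(frame[n] * frame[n + k] for n in range(N - k))
            for k in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations by the Levinson-Durbin recursion.

    Returns (a, err, refl): a[j] holds a_{j+1} of the predictor
    x_hat[n] = sum_k a_k * x[n - k], err is the final prediction error
    energy and refl lists the reflection coefficients.
    """
    a, err, refl = [], r[0], []
    for i in range(1, order + 1):
        # Part of r[i] not yet explained by the order-(i-1) predictor.
        acc = r[i] - sum(a[j - 1] * r[i - j] for j in range(1, i))
        k = acc / err
        # Order update: a_j <- a_j - k * a_{i-j} for j < i, then a_i = k.
        a = [a[j - 1] - k * a[i - j - 1] for j in range(1, i)] + [k]
        err *= 1.0 - k * k
        refl.append(k)
    return a, err, refl

def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c_1..c_n_ceps of the all-pole model 1 / (1 - sum_k a_k z^-k)."""
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = a[n - 1] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c

# Usage: fit a 2nd-order predictor to a synthetic AR(2) signal (assumed test data).
random.seed(0)
x = [0.0, 0.0]
for _ in range(2000):
    x.append(1.3 * x[-1] - 0.6 * x[-2] + random.gauss(0.0, 1.0))
a, err, refl = levinson_durbin(autocorrelation(x, 2), 2)
print([round(v, 2) for v in a])  # the generating model used [1.3, -0.6]
print([round(c, 3) for c in lpc_to_cepstrum(a, 6)])
```

The recursion also yields the reflection (PARCOR) coefficients, up to the sign convention, which are another commonly used set of derived coefficients.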

Exercise in computer lab

52 hours, compulsory

Teacher / Lecturer

prof. Ing. Milan Sigmund, CSc.

Syllabus

Illustration of the speech waveform, details of phonemes.
Spectra of typical vowel sounds, formant frequencies.
Spectrum analysis using Hamming and rectangular windows.
Short-time energy and zero-crossing rate for voiced and unvoiced speech.
Detection of speech/pause and word boundaries.
Linear prediction of the speech waveform and derived spectra.
Transformations between speech features.
Correlations between various speech signal parameters.
Calculation of various distance measures between speech frames.
Automatic recognition of an unknown word.
Segmentation of a word string into phonetic units.
Measurement of the fundamental frequency by center clipping.
Cepstral analysis of voiced speech.
Identification of different speakers.
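The center-clipping pitch measurement from the lab syllabus can be sketched as follows: samples whose magnitude does not exceed a clipping level C are zeroed and the rest are shifted toward zero by C, which suppresses formant structure; the autocorrelation maximum within the admissible lag range then gives the pitch period. The clipping ratio of 30 % of the frame peak, the 60 to 400 Hz search range, the voicing threshold and all function names are assumptions made for illustration, not the lab's reference implementation.

```python
import math

def center_clip(frame, ratio=0.3):
    """Zero samples with |x| <= C, shift the rest toward zero by C.

    C = ratio * peak amplitude of the frame (ratio = 0.3 is an assumption).
    """
    c = ratio * max(abs(s) for s in frame)
    return [s - c if s > c else (s + c if s < -c else 0.0) for s in frame]

def estimate_f0(frame, fs, f_min=60.0, f_max=400.0, voicing=0.3):
    """Return F0 in Hz from the clipped autocorrelation, or None if judged unvoiced."""
    y = center_clip(frame)
    N = len(y)

    def r(k):
        # Biased autocorrelation of the clipped frame at lag k.
        return sum(y[n] * y[n + k] for n in range(N - k))

    r0 = r(0)
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), N - 1)
    if r0 == 0.0 or lag_max < lag_min:
        return None
    best = max(range(lag_min, lag_max + 1), key=r)
    # Hypothetical voicing decision: the peak must reach a fraction of r(0).
    if r(best) < voicing * r0:
        return None
    return fs / best

# Usage on a synthetic 200 Hz tone (assumed test data):
fs = 8000                                                          # Hz
tone = [math.sin(2 * math.pi * 200 * n / fs) for n in range(400)]  # 50 ms frame
print(estimate_f0(tone, fs))
```

A frame should span at least two pitch periods of the lowest expected F0 for the autocorrelation peak to be reliable; the 50 ms frame in the usage line satisfies this for a 200 Hz tone.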
