Publication detail

Temporal processing for feature extraction in speech recognition

ČERNOCKÝ, J.

Original title

Temporal processing for feature extraction in speech recognition

Type

book chapter

Language

English

Original abstract

Speech recognition is a booming research field with a large number of applications in telecommunications (especially mobile), the automobile industry, consumer electronics, military and security, etc. Speech recognition systems are classically built from three basic blocks: feature extraction, acoustic matching and language modeling. While the last two are trained on data (annotated databases for acoustics and large speech corpora for the LM), the feature extraction block is often neglected, and in most cases mel-frequency cepstral coefficients (MFCC) are used. This work concentrates on two techniques that should improve feature extraction. The first is temporal filtering of feature trajectories using filters designed on data with Linear Discriminant Analysis (LDA). This technique is shown to improve the recognition accuracy of isolated Czech words, confirming previous results on US English obtained by our colleagues from OGI Portland. The second part of the work concentrates on a more revolutionary approach to feature extraction using TRAPs (temporal patterns), whose fundamentals were also laid at OGI. Several experiments were conducted on three databases during the author's stay at OGI. Although we have shown that TRAPs are comparable to MFCCs only on a small-vocabulary recognition task, we believe that the combination of frequency-band processing and neural nets will become very important in the next decade, and that they will become standard blocks of feature extraction.

Keywords

automatic speech processing, speech recognition, features for speech recognition, temporal filtering, neural networks, data-driven techniques

Authors

ČERNOCKÝ, J.

RIV year

2004

Published

26 April 2003

Publisher

Publishing house of Brno University of Technology VUTIUM

Place

Brno

ISBN

80-214-2395-1

Book

Vědecké spisy VUT

Series

Edice Habilitační a inaugurační spisy, sv. 112

Number of pages

30

URL

BibTeX

@inbook{BUT55484,
  author="Jan {Černocký}",
  title="Temporal processing for feature extraction in speech recognition",
  booktitle="Vědecké spisy VUT",
  year="2003",
  publisher="Publishing house of Brno University of Technology VUTIUM",
  address="Brno",
  series="Edice Habilitační a inaugurační spisy, sv. 112",
  pages="30",
  isbn="80-214-2395-1",
  url="http://www.fit.vutbr.cz/~cernocky/publi/2003/vutium.pdf"
}