R&D Result Detail

Original Title

DNN Based Embeddings for Language Recognition

English Title

DNN Based Embeddings for Language Recognition

Type

Paper in proceedings (conference paper)

Original Abstract

In this work, we present a language identification (LID) system based on embeddings. In our case, an embedding is a fixed-length vector (similar to i-vector) that represents the whole utterance, but unlike i-vector it is designed to contain mostly information relevant to the target task (LID). In order to obtain these embeddings, we train a deep neural network (DNN) with sequence summarization layer to classify languages. In particular, we trained a DNN based on bidirectional long short-term memory (BLSTM) recurrent neural network (RNN) layers, whose frame-by-frame outputs are summarized into mean and standard deviation statistics. After this pooling layer, we add two fully connected layers whose outputs correspond to embeddings. Finally, we add a softmax output layer and train the whole network with multi-class cross-entropy objective to discriminate between languages. We report our results on NIST LRE 2015 and we compare the performance of embeddings and corresponding i-vectors both modeled by Gaussian Linear Classifier (GLC). Using only embeddings resulted in comparable performance to i-vectors and by performing score-level fusion we achieved 7.3% relative improvement over the baseline.

English abstract

In this work, we present a language identification (LID) system based on embeddings. In our case, an embedding is a fixed-length vector (similar to i-vector) that represents the whole utterance, but unlike i-vector it is designed to contain mostly information relevant to the target task (LID). In order to obtain these embeddings, we train a deep neural network (DNN) with sequence summarization layer to classify languages. In particular, we trained a DNN based on bidirectional long short-term memory (BLSTM) recurrent neural network (RNN) layers, whose frame-by-frame outputs are summarized into mean and standard deviation statistics. After this pooling layer, we add two fully connected layers whose outputs correspond to embeddings. Finally, we add a softmax output layer and train the whole network with multi-class cross-entropy objective to discriminate between languages. We report our results on NIST LRE 2015 and we compare the performance of embeddings and corresponding i-vectors both modeled by Gaussian Linear Classifier (GLC). Using only embeddings resulted in comparable performance to i-vectors and by performing score-level fusion we achieved 7.3% relative improvement over the baseline.
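
Illustrative note (not part of the published abstract): the sequence summarization step described above, in which frame-by-frame outputs are reduced to mean and standard deviation statistics, might be sketched roughly as follows. This is a minimal pure-Python illustration and not the authors' implementation. The function name `stats_pooling`, the use of the population standard deviation, and the toy two-dimensional frames are all assumptions made for the example; in the paper the frames would be BLSTM outputs.

```python
import math


def stats_pooling(frames):
    """Summarize a variable-length list of frame vectors into one
    fixed-length vector: per-dimension means followed by per-dimension
    standard deviations.

    Assumption: population standard deviation (divide by N); the
    abstract does not say which estimator the authors used.
    """
    if not frames:
        raise ValueError("need at least one frame")
    n = len(frames)
    dim = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dim)]
    stds = [
        math.sqrt(sum((f[d] - means[d]) ** 2 for f in frames) / n)
        for d in range(dim)
    ]
    return means + stds


# Hypothetical toy "utterances" of different lengths (2 and 4 frames).
short_utt = [[1.0, 2.0], [3.0, 6.0]]
long_utt = [[1.0, 2.0], [3.0, 6.0], [1.0, 2.0], [3.0, 6.0]]

# Both utterances map to vectors of the same length (2 * dim).
print(len(stats_pooling(short_utt)), len(stats_pooling(long_utt)))
```

Whatever the number of frames, the pooled vector has twice the frame dimension, which is what lets the following fully connected layers produce a fixed-length embedding for the whole utterance.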

Keywords

Embeddings, language recognition, LID, DNN

Key words in English

Embeddings, language recognition, LID, DNN

Authors

LOZANO DÍEZ, A.; PLCHOT, O.; MATĚJKA, P.; GONZALEZ-RODRIGUEZ, J.

RIV year

2019

Released

15.04.2018

Publisher

IEEE Signal Processing Society

Location

Calgary

ISBN

978-1-5386-4658-8

Book

Proceedings of ICASSP 2018

Pages from

5184

Pages to

5188

Pages count

5

URL

https://www.fit.vut.cz/research/publication/11723/

Full text in the Digital Library

http://hdl.handle.net/

BibTex

@inproceedings{BUT155045,
  author="Alicia {Lozano Díez} and Oldřich {Plchot} and Pavel {Matějka} and Joaquin {Gonzalez-Rodriguez}",
  title="DNN Based Embeddings for Language Recognition",
  booktitle="Proceedings of ICASSP 2018",
  year="2018",
  pages="5184--5188",
  publisher="IEEE Signal Processing Society",
  address="Calgary",
  doi="10.1109/ICASSP.2018.8462403",
  isbn="978-1-5386-4658-8",
  url="https://www.fit.vut.cz/research/publication/11723/"
}

Documents

lozano_icassp2018_0005184
