R&D Result Detail

Original Title

Speech Emotion Recognition with Deep Learning

Type

Paper in proceedings (conference paper)

Original Abstract

This paper describes a method for Speech Emotion Recognition (SER) using a Deep Neural Network (DNN) architecture with convolutional, pooling, and fully connected layers. We used a three-class subset (angry, neutral, sad) of the German corpus Berlin Database of Emotional Speech, containing 271 labeled recordings with a total length of 783 seconds. The raw audio data were standardized so that every audio file has zero mean and unit variance. Every file was split into 20-millisecond segments without overlap. We used a Voice Activity Detection (VAD) algorithm to eliminate silent segments and divided all data into training (80%), validation (10%), and testing (10%) sets. The DNN is optimized using Stochastic Gradient Descent. As input, we used the raw data without any feature selection. Our trained model achieved an overall test accuracy of 96.97% on whole-file classification.
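
The preprocessing steps named in the abstract (per-file standardization, 20 ms non-overlapping segments, removal of silent segments, and an 80/10/10 split) could be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the 16 kHz sample rate, the RMS-threshold stand-in for the VAD algorithm, the threshold value, the random shuffling before the split, and all function names are assumptions that do not come from the record.

```python
import numpy as np

SAMPLE_RATE = 16_000  # assumption: the sample rate is not stated in the abstract
SEGMENT_MS = 20       # segment length given in the abstract


def standardize(signal):
    """Scale one recording to zero mean and unit variance."""
    signal = np.asarray(signal, dtype=np.float64)
    std = signal.std()
    return (signal - signal.mean()) / (std if std > 0 else 1.0)


def split_segments(signal, sample_rate=SAMPLE_RATE, segment_ms=SEGMENT_MS):
    """Cut a recording into non-overlapping segments; a trailing remainder is dropped."""
    seg_len = sample_rate * segment_ms // 1000
    n_segments = len(signal) // seg_len
    return signal[: n_segments * seg_len].reshape(n_segments, seg_len)


def drop_silent(segments, threshold=0.1):
    """Hypothetical stand-in for VAD: keep segments whose RMS exceeds a threshold."""
    rms = np.sqrt((segments ** 2).mean(axis=1))
    return segments[rms > threshold]


def split_indices(n_items, rng, train=0.8, val=0.1):
    """Shuffle item indices and split them into train/validation/test parts."""
    order = rng.permutation(n_items)
    n_train = int(n_items * train)
    n_val = int(n_items * val)
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]
```

Under the assumed 16 kHz rate, each 20 ms segment holds 320 samples. The abstract does not say whether the split was applied to recordings or to segments, so `split_indices` works on a plain item count and leaves that choice open.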

Keywords

Emotion; Speech Recognition; Deep Learning; Classification

Authors

HARÁR, P.; BURGET, R.; DUTTA, M.; SINGH, A.

RIV year

2018

Released

02.02.2017

Location

Noida, India

ISBN

978-1-5090-2796-5

Book

2017 4th International Conference on Signal Processing and Integrated Networks (SPIN)

Pages from

137

Pages to

140

Pages count

4

URL

https://ieeexplore.ieee.org/document/8049931

Full text in the Digital Library

http://hdl.handle.net/

BibTex

@inproceedings{BUT133621,
  author="Pavol {Harár} and Radim {Burget} and Malay Kishore {Dutta} and Anushikha {Singh}",
  title="Speech Emotion Recognition with Deep Learning",
  booktitle="2017 4th International Conference on Signal Processing and Integrated Networks (SPIN)",
  year="2017",
  pages="137--140",
  address="Noida, India",
  doi="10.1109/SPIN.2017.8049931",
  isbn="978-1-5090-2796-5",
  url="https://ieeexplore.ieee.org/document/8049931"
}
