Publication details
HARÁR, P.; BURGET, R.; DUTTA, M.; SINGH, A.
Original title
Speech Emotion Recognition with Deep Learning
Type
Conference paper in proceedings indexed in WoS or Scopus
Original abstract
This paper describes a method for Speech Emotion Recognition (SER) using a Deep Neural Network (DNN) architecture with convolutional, pooling and fully connected layers. We used a 3-class subset (angry, neutral, sad) of a German corpus (the Berlin Database of Emotional Speech) containing 271 labeled recordings with a total length of 783 seconds. Raw audio data were standardized so that every audio file has zero mean and unit variance. Every file was split into 20-millisecond segments without overlap. We used a Voice Activity Detection (VAD) algorithm to eliminate silent segments and divided all data into TRAIN (80%), VALIDATION (10%) and TESTING (10%) sets. The DNN is optimized using Stochastic Gradient Descent. As input we used raw data without any feature selection. Our trained model achieved an overall test accuracy of 96.97% on whole-file classification.
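The preprocessing steps described in the abstract (per-file standardization, 20 ms non-overlapping segmentation, removal of silent segments, and an 80/10/10 split) can be sketched in plain Python as below. This is a minimal illustration, not the authors' code. Several details are assumptions not stated in the abstract: a 16 kHz sampling rate (so one 20 ms segment is 320 samples), a simple energy-threshold VAD standing in for the unnamed VAD algorithm, splitting at the file level rather than the segment level, and majority voting as the way segment predictions become a whole-file label. All function names are hypothetical.

```python
import math
import random
from collections import Counter

SAMPLE_RATE = 16_000  # assumption: 16 kHz audio
SEGMENT_MS = 20       # 20 ms segments, as stated in the abstract
SEGMENT_LEN = SAMPLE_RATE * SEGMENT_MS // 1000  # 320 samples at 16 kHz


def standardize(signal):
    """Rescale one file's raw samples to zero mean and unit variance."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((x - mean) ** 2 for x in signal) / n
    std = math.sqrt(var) or 1.0  # guard against a constant (all-equal) file
    return [(x - mean) / std for x in signal]


def segment(signal, seg_len=SEGMENT_LEN):
    """Split into non-overlapping segments; a trailing partial segment is dropped."""
    return [signal[i:i + seg_len] for i in range(0, len(signal) - seg_len + 1, seg_len)]


def energy_vad(segments, threshold=0.1):
    """Hypothetical stand-in VAD: keep segments whose mean energy exceeds threshold.

    After standardization the file's average energy is 1, so 0.1 drops
    segments quieter than 10% of the file average.
    """
    return [s for s in segments if sum(x * x for x in s) / len(s) > threshold]


def split_files(file_ids, seed=0):
    """Shuffle file IDs and split them 80/10/10 into train/validation/test."""
    ids = list(file_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(0.8 * len(ids))
    n_val = int(0.1 * len(ids))
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def whole_file_label(segment_predictions):
    """Hypothetical aggregation: majority vote over per-segment predicted labels."""
    return Counter(segment_predictions).most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(42)
    # Synthetic file (hypothetical data): 0.1 s near-silence, then 0.1 s of a 220 Hz tone.
    silence = [rng.gauss(0, 0.001) for _ in range(1600)]
    tone = [math.sin(2 * math.pi * 220 * t / SAMPLE_RATE) for t in range(1600)]
    segs = segment(standardize(silence + tone))
    voiced = energy_vad(segs)
    print(len(segs), len(voiced))  # prints "10 5": the five silent segments are dropped
```

Splitting by file rather than by segment keeps segments of the same recording out of both the training and test sets; with 271 files the split above yields 216, 27 and 28 files.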
Keywords
Emotion; Speech Recognition; Deep Learning; Classification
Authors
Pavol Harár, Radim Burget, Malay Kishore Dutta, Anushikha Singh
RIV year
2018
Published
02.02.2017
Location
Noida, India
ISBN
978-1-5090-2796-5
Proceedings
2017 4th International Conference on Signal Processing and Integrated Networks (SPIN)
First page
137
Last page
140
Number of pages
4
URL
https://ieeexplore.ieee.org/document/8049931
BibTeX
@inproceedings{BUT133621,
  author="Pavol {Harár} and Radim {Burget} and Malay Kishore {Dutta} and Anushikha {Singh}",
  title="Speech Emotion Recognition with Deep Learning",
  booktitle="2017 4th International Conference on Signal Processing and Integrated Networks (SPIN)",
  year="2017",
  pages="137--140",
  address="Noida, India",
  doi="10.1109/SPIN.2017.8049931",
  isbn="978-1-5090-2796-5",
  url="https://ieeexplore.ieee.org/document/8049931"
}