Publication result detail
ŽMOLÍKOVÁ, K.; DELCROIX, M.; KINOSHITA, K.; HIGUCHI, T.; OGAWA, A.; NAKATANI, T.
Original title
Learning Speaker Representation for Neural Network Based Multichannel Speaker Extraction
English title
Type
Paper in proceedings indexed in the WoS or Scopus database
Original abstract
Recently, schemes employing deep neural networks (DNNs) for extracting speech from noisy observation have demonstrated great potential for noise robust automatic speech recognition. However, these schemes are not well suited when the interfering noise is another speaker. To enable extracting a target speaker from a mixture of speakers, we have recently proposed to inform the neural network using speaker information extracted from an adaptation utterance from the same speaker. In our previous work, we explored ways how to inform the network about the speaker and found a speaker adaptive layer approach to be suitable for this task. In our experiments, we used speaker features designed for speaker recognition tasks as the additional speaker information, which may not be optimal for the speaker extraction task. In this paper, we propose a usage of a sequence summarizing scheme enabling to learn the speaker representation jointly with the network. Furthermore, we extend the previous experiments to demonstrate the potential of our proposed method as a front-end for speech recognition and explore the effect of additional noise on the performance of the method.
English abstract
Keywords
speaker extraction, speaker adaptive neural network, multi-speaker speech recognition, speaker representation learning, beamforming
Keywords in English
Authors
RIV year
2018
Published
16 December 2017
Publisher
IEEE Signal Processing Society
Place
Okinawa
ISBN
978-1-5090-4788-8
Book
Proceedings of ASRU 2017
Pages from
8
Pages to
15
Number of pages
URL
http://www.fit.vutbr.cz/research/groups/speech/publi/2017/zmolikova_asru2017.pdf
BibTeX
@inproceedings{BUT144503,
  author="Kateřina {Žmolíková} and Marc {Delcroix} and Keisuke {Kinoshita} and Takuya {Higuchi} and Atsunori {Ogawa} and Tomohiro {Nakatani}",
  title="Learning Speaker Representation for Neural Network Based Multichannel Speaker Extraction",
  booktitle="Proceedings of ASRU 2017",
  year="2017",
  pages="8--15",
  publisher="IEEE Signal Processing Society",
  address="Okinawa",
  doi="10.1109/ASRU.2017.8268910",
  isbn="978-1-5090-4788-8",
  url="http://www.fit.vutbr.cz/research/groups/speech/publi/2017/zmolikova_asru2017.pdf"
}