Publication result detail
MAI, F.; ZULUAGA-GOMEZ, J.; PARCOLLET, T.; MOTLÍČEK, P.
Original Title
HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition
Type
Paper in proceedings (conference paper)
Original Abstract
State-of-the-art ASR systems have achieved promising results by modeling local and global interactions separately. While the former can be computed efficiently, global interactions are usually modeled via attention mechanisms, which are expensive for long input sequences. Here, we address this by extending HyperMixer, an efficient alternative to attention exhibiting linear complexity, to the Conformer architecture for speech recognition, leading to HyperConformer. In particular, multi-head HyperConformer achieves comparable or higher recognition performance while being more efficient than Conformer in terms of inference speed, memory, parameter count, and available training data. HyperConformer achieves a word error rate of 2.9% on LibriSpeech test-clean with less than 8M neural parameters and a peak memory during training of 5.7GB, hence trainable with accessible hardware. Encoder speed is between 38% on mid-length speech and 56% on long speech faster than an equivalent Conformer.
Keywords
Hypernetworks, HyperMixer, Efficient Automatic Speech Recognition, LibriSpeech, SpeechBrain
RIV year
2024
Released
20.08.2023
Publisher
International Speech Communication Association
Location
Dublin
Book
Proceedings of the Annual Conference of International Speech Communication Association, INTERSPEECH
ISSN
1990-9772
Periodical
Proceedings of Interspeech
Volume
2023
Number
08
State
French Republic
Pages from
2213
Pages to
2217
Pages count
5
URL
https://www.isca-archive.org/interspeech_2023/mai23_interspeech.pdf
BibTex
@inproceedings{BUT187786,
  author="MAI, F. and ZULUAGA-GOMEZ, J. and PARCOLLET, T. and MOTLÍČEK, P.",
  title="HyperConformer: Multi-head HyperMixer for Efficient Speech Recognition",
  booktitle="Proceedings of the Annual Conference of International Speech Communication Association, INTERSPEECH",
  year="2023",
  journal="Proceedings of Interspeech",
  volume="2023",
  number="08",
  pages="2213--2217",
  publisher="International Speech Communication Association",
  address="Dublin",
  doi="10.21437/Interspeech.2023-1611",
  issn="1990-9772",
  url="https://www.isca-archive.org/interspeech_2023/mai23_interspeech.pdf"
}
Documents
mai23_interspeech