Publication result detail

Speech and Language Recognition with Low-rank Adaptation of Pretrained Models

PRASAD, A.; MADIKERI, S.; KHALIL, D.; MOTLÍČEK, P.; SCHUEPBACH, C.

Original Title

Speech and Language Recognition with Low-rank Adaptation of Pretrained Models

English Title

Speech and Language Recognition with Low-rank Adaptation of Pretrained Models

Type

Paper in proceedings (conference paper)

Original Abstract

Finetuning large pretrained models demands considerable computational resources, posing practical constraints. The majority of the parameters in these models are used by fully connected layers. In this work, we show that applying a semi-orthogonal constraint, followed by full finetuning, to the fully connected layers reduces model parameters significantly without sacrificing efficacy in downstream tasks. Specifically, we consider the wav2vec2.0 XLS-R and Whisper models for Automatic Speech Recognition and Language Recognition. Our results show that we can reduce the model size by approximately 24% during both training and inference, with a 0.7% absolute drop in performance for XLS-R and no drop in performance for Whisper for ASR. In combination with parameter-efficient training with low-rank adapters, the resource requirements for training can be further reduced by up to 90%.

English Abstract

Finetuning large pretrained models demands considerable computational resources, posing practical constraints. The majority of the parameters in these models are used by fully connected layers. In this work, we show that applying a semi-orthogonal constraint, followed by full finetuning, to the fully connected layers reduces model parameters significantly without sacrificing efficacy in downstream tasks. Specifically, we consider the wav2vec2.0 XLS-R and Whisper models for Automatic Speech Recognition and Language Recognition. Our results show that we can reduce the model size by approximately 24% during both training and inference, with a 0.7% absolute drop in performance for XLS-R and no drop in performance for Whisper for ASR. In combination with parameter-efficient training with low-rank adapters, the resource requirements for training can be further reduced by up to 90%.
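
Note (illustrative, not from the paper): the training-cost savings described above come from augmenting or replacing large fully connected weight matrices with low-rank factors. The short PyTorch sketch below is a hypothetical, minimal example of the low-rank adapter (LoRA-style) idea only; the class name LoRALinear, the rank, scaling, and layer sizes are assumptions chosen for illustration and are not taken from the authors' implementation.

# Hypothetical sketch (not the authors' code): a LoRA-style adapter that
# freezes a pretrained fully connected layer and trains only a low-rank
# update B @ A. The printed parameter counts show why the number of
# trainable parameters drops sharply for a small rank r.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # freeze the pretrained weights
            p.requires_grad = False
        d_out, d_in = base.weight.shape
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(d_out, rank))        # up-projection, zero-init
        self.scale = alpha / rank

    def forward(self, x):
        # y = base(x) + scale * x A^T B^T   (low-rank correction on top of the frozen layer)
        return self.base(x) + self.scale * (x @ self.A.t() @ self.B.t())

if __name__ == "__main__":
    base = nn.Linear(1024, 4096)                  # a typical Transformer FFN projection (assumed sizes)
    lora = LoRALinear(base, rank=8)
    full = sum(p.numel() for p in base.parameters())
    trainable = sum(p.numel() for p in lora.parameters() if p.requires_grad)
    print(f"full finetuning parameters: {full:,}")       # about 4.2 million
    print(f"LoRA trainable parameters:  {trainable:,}")  # about 41 thousand (~1%)

A semi-orthogonal factorisation of the fully connected weight itself, which the abstract credits for the roughly 24% reduction in model size at both training and inference time, would additionally replace the full weight matrix with two smaller factors; that step is omitted from this sketch.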

Keywords

parameter reduction, language identification, speech recognition, wav2vec2.0

Key words in English

parameter reduction, language identification, speech recognition, wav2vec2.0

Authors

PRASAD, A.; MADIKERI, S.; KHALIL, D.; MOTLÍČEK, P.; SCHUEPBACH, C.

RIV year

2025

Released

01.09.2024

Publisher

International Speech Communication Association

Location

Kos Island

Book

Proceedings of Interspeech

ISSN

1990-9772

Periodical

Proceedings of Interspeech

Volume

2024

Number

9

State

French Republic

Pages from

2825

Pages to

2829

Pages count

5

URL

https://www.isca-archive.org/interspeech_2024/prasad24_interspeech.html

BibTeX

@inproceedings{BUT193370,
  author="PRASAD, A. and MADIKERI, S. and KHALIL, D. and MOTLÍČEK, P. and SCHUEPBACH, C.",
  title="Speech and Language Recognition with Low-rank Adaptation of Pretrained Models",
  booktitle="Proceedings of Interspeech",
  year="2024",
  journal="Proceedings of Interspeech",
  volume="2024",
  number="9",
  pages="2825--2829",
  publisher="International Speech Communication Association",
  address="Kos Island",
  doi="10.21437/Interspeech.2024-2187",
  issn="1990-9772",
  url="https://www.isca-archive.org/interspeech_2024/prasad24_interspeech.html"
}