Publication result detail
PRASAD, A.; CAROFILIS, A.; VANDERREYDT, G.; KHALIL, D.; MADIKERI, S.; MOTLÍČEK, P.; SCHUEPBACH, C.
Original title
Fine-Tuning Self-Supervised Models for Language Identification Using Orthonormal Constraint
English title
Type
Paper in proceedings indexed in WoS or Scopus
Original abstract
Self-supervised models trained with high linguistic diversity, such as the XLS-R model, can be effectively fine-tuned for the language recognition task. Typically, a back-end classifier followed by statistics pooling layer are added during training. Commonly used back-end classifiers require a large number of parameters to be trained, which is not ideal in limited data conditions. In this work, we explore smaller parameter back-ends using factorized Time Delay Neural Network (TDNN-F). The TDNN-F architecture is also integrated into Emphasized Channel Attention, Propagation and Aggregation-TDNN (ECAPA-TDNN) models, termed ECAPA-TDNN-F, reducing the number of parameters by 30 to 50% absolute, with competitive accuracies and no change in minimum cost. The results show that the ECAPA-TDNN-F can be extended to tasks where ECAPA-TDNN is suitable. We also test the effectiveness of a linear classifier and a variant, the Orthonormal linear classifier, previously used in x-vector type systems. The models are trained with NIST LRE17 data and evaluated on NIST LRE17, LRE22 and the ATCO2 LID datasets. Both linear classifiers outperform conventional back-ends with improvements in accuracy between 0.9% and 9.1%.
English abstract
Keywords
Language Identification, Transformers, Wav2Vec2, fine-tuning, low-resource, out-of-domain
Keywords in English
Authors
RIV year
2025
Published
14.04.2024
Publisher
IEEE Signal Processing Society
Place
Seoul
ISBN
979-8-3503-4485-1
Book
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
Pages from
11921
Pages to
11925
Number of pages
5
URL
https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10446751
BibTeX
@inproceedings{BUT193354,
  author="PRASAD, A. and CAROFILIS, A. and VANDERREYDT, G. and KHALIL, D. and MADIKERI, S. and MOTLÍČEK, P. and SCHUEPBACH, C.",
  title="Fine-Tuning Self-Supervised Models for Language Identification Using Orthonormal Constraint",
  booktitle="ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings",
  year="2024",
  pages="11921--11925",
  publisher="IEEE Signal Processing Society",
  address="Seoul",
  doi="10.1109/ICASSP48485.2024.10446751",
  isbn="979-8-3503-4485-1",
  url="https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10446751"
}
Documents
prasad_icassp2024_fine-tuning