Publication result detail
BARAHONA, S.; SILNOVA, A.; MOŠNER, L.; PENG, J.; PLCHOT, O.; ROHDIN, J.; ZHANG, L.; HAN, J.; PALKA, P.; LANDINI, F.; BURGET, L.; STAFYLAKIS, T.; CUMANI, S.; BOBOŠ, D.; HLAVAČEK, M.; KODOVSKY, M.; PAVLIČEK, T.
Original title
Analysis of ABC Frontend Audio Systems for the NIST-SRE24
Type
Paper in proceedings indexed in WoS or Scopus
Original abstract
We present a comprehensive analysis of the embedding extractors (frontends) developed by the ABC team for the audio track of NIST SRE 2024. We follow the two scenarios imposed by NIST: using only a provided set of telephone recordings for training (fixed condition) or adding publicly available data (open condition). Under these constraints, we develop the best possible speaker embedding extractors for the predominant conversational telephone speech (CTS) domain. We explored architectures based on ResNet with different pooling mechanisms, the recently introduced ReDimNet architecture, and a system based on the XLS-R model, which represents the family of large pre-trained self-supervised models. In the open condition, we train on the VoxBlink2 dataset, which contains 110 thousand speakers across multiple languages. We observed good performance and robustness from the VoxBlink-trained models, and our experiments yield practical recipes for developing state-of-the-art frontends for speaker recognition.
Keywords
embedding extractors | NIST-SRE | speaker recognition | VoxBlink
RIV year
2026
Published
17.08.2025
Publisher
International Speech Communication Association
Place
Rotterdam
Book
Proceedings of the Annual Conference of the International Speech Communication Association Interspeech
Periodical
Interspeech
Country
Netherlands
Pages from
5763
Pages to
5767
Number of pages
5
URL
https://www.isca-archive.org/interspeech_2025/barahona25_interspeech.pdf
BibTeX
@inproceedings{BUT199934,
  author="S. {Barahona} and Anna {Silnova} and Ladislav {Mošner} and Junyi {Peng} and Oldřich {Plchot} and Johan Andréas {Rohdin} and Lin {Zhang} and Jiangyu {Han} and Petr {Pálka} and Federico Nicolás {Landini} and Lukáš {Burget} and T. {Stafylakis} and Sandro {Cumani} and Dominik {Boboš} and M. {Hlavaček} and M. {Kodovsky} and T. {Pavliček}",
  title="Analysis of ABC Frontend Audio Systems for the NIST-SRE24",
  booktitle="Proceedings of the Annual Conference of the International Speech Communication Association Interspeech",
  year="2025",
  journal="Interspeech",
  pages="5763--5767",
  publisher="International Speech Communication Association",
  address="Rotterdam",
  doi="10.21437/Interspeech.2025-2737",
  url="https://www.isca-archive.org/interspeech_2025/barahona25_interspeech.pdf"
}
Documents
barahona_2025_interspeech