Publication detail

Advancing speaker embedding learning: Wespeaker toolkit for research and production

WANG, S. CHEN, Z. HAN, B. WANG, H. XIANG, X. ROHDIN, J. SILNOVA, A. QIAN, Y. LI, H.

Original Title

Type

journal article in Web of Science

Language

English

Original Abstract

Speaker modeling plays a crucial role in various tasks, and fixed-dimensional vector representations, known as speaker embeddings, are the predominant modeling approach. These embeddings are typically evaluated within the framework of speaker verification, yet their utility extends to a broad scope of related tasks including speaker diarization, speech synthesis, voice conversion, and target speaker extraction. This paper presents Wespeaker, a user-friendly toolkit designed for both research and production purposes, dedicated to the learning of speaker embeddings. Wespeaker offers scalable data management, state-of-the-art speaker embedding models, and self-supervised learning training schemes with the potential to leverage large-scale unlabeled real-world data. The toolkit incorporates structured recipes that have been successfully adopted in winning systems across various speaker verification challenges, ensuring highly competitive results. For production-oriented development, Wespeaker integrates CPU- and GPU-compatible deployment and runtime codes, supporting mainstream platforms such as Windows, Linux, Mac and on-device chips such as horizon X3'PI. Wespeaker also provides off-the-shelf high-quality speaker embeddings by providing various pretrained models, which can be effortlessly applied to different tasks that require speaker modeling. The toolkit is publicly available at https://github.com/wenet-e2e/wespeaker.

Keywords

Wespeaker; Speaker embedding learning; SSL; Open-source

Authors

WANG, S.; CHEN, Z.; HAN, B.; WANG, H.; XIANG, X.; ROHDIN, J.; SILNOVA, A.; QIAN, Y.; LI, H.

Released

1. 7. 2024

ISBN

0167-6393

Periodical

Speech Communication

Year of study

162

Number

103104

State

Kingdom of the Netherlands

Pages from

Pages to

Pages count

URL

https://pdf.sciencedirectassets.com/271578/1-s2.0-S0167639324X00060/1-s2.0-S0167639324000761/main.pdf?X-Amz-Security-Token=IQoJb3JpZ2luX2VjEAsaCXVzLWVhc3QtMSJIMEYCIQC8Doe66%2Bu6V%2FODd2NY6EZwVTEeN05avzWi09%2FPx3ob%2FQIhAP%2BOyz3L2hXSsDYY4l3zSuz1pzOjFiaTh%

BibTex

@article{BUT193986,
  author="WANG, S. and CHEN, Z. and HAN, B. and WANG, H. and XIANG, X. and ROHDIN, J. and SILNOVA, A. and QIAN, Y. and LI, H.",
  title="Advancing speaker embedding learning: Wespeaker toolkit for research and production",
  journal="Speech Communication",
  year="2024",
  volume="162",
  number="103104",
  pages="1--12",
  doi="10.1016/j.specom.2024.103104",
  issn="0167-6393",
  url="https://pdf.sciencedirectassets.com/271578/1-s2.0-S0167639324X00060/1-s2.0-S0167639324000761/main.pdf?X-Amz-Security-Token=IQoJb3JpZ2luX2VjEAsaCXVzLWVhc3QtMSJIMEYCIQC8Doe66%2Bu6V%2FODd2NY6EZwVTEeN05avzWi09%2FPx3ob%2FQIhAP%2BOyz3L2hXSsDYY4l3zSuz1pzOjFiaTh%"
}

Documents

wang_speech communication_2024.pdf

VUT

Faculties

University Institutes

Parts

Advancing speaker embedding learning: Wespeaker toolkit for research and production