Publication result detail

Noise-robust speech triage

BARTOS, A.; CIPR, T.; NELSON, D.; SCHWARZ, P.; BANOWETZ, J.; JERABEK, L.

Original title

Noise-robust speech triage

Type

WoS article

Original abstract

A method is presented in which conventional speech algorithms are applied, with no modifications, to improve their performance in extremely noisy environments. It has been demonstrated that, for eigen-channel algorithms, pre-training multiple speaker identification (SID) models at a lattice of signal-to-noise-ratio (SNR) levels and then performing SID using the appropriate SNR dependent model was successful in mitigating noise at all SNR levels. In those tests, it was found that SID performance was optimized when the SNR of the testing and training data were close or identical. In this current effort multiple i-vector algorithms were used, greatly improving both processing throughput and equal error rate classification accuracy. Using identical approaches in the same noisy environment, performance of SID, language identification, gender identification, and diarization were significantly improved. A critical factor in this improvement is speech activity detection (SAD) that performs reliably in extremely noisy environments, where the speech itself is barely audible. To optimize SAD operation at all SNR levels, two algorithms were employed. The first maximized detection probability at low levels (−10 dB ≤ SNR < 10 dB) using just the voiced speech envelope, and the second exploited features extracted from the original speech to improve overall accuracy at higher quality levels (SNR ≥ 10 dB).

Keywords

speech algorithms, noisy environments, multiple speaker identification

Authors

BARTOS, A.; CIPR, T.; NELSON, D.; SCHWARZ, P.; BANOWETZ, J.; JERABEK, L.

RIV year

2019

Published

23 April 2018

ISSN

1520-8524

Journal

JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA

Volume

143

Issue

4

Country

United States of America

Pages from

2313

Pages to

2320

Page count

8

URL

BibTeX

@article{BUT147194,
  author="Anthony {Bartos} and Tomáš {Cipr} and Douglas {Nelson} and Petr {Schwarz} and John {Banowetz} and Ladislav {Jerabek}",
  title="Noise-robust speech triage",
  journal="JOURNAL OF THE ACOUSTICAL SOCIETY OF AMERICA",
  year="2018",
  volume="143",
  number="4",
  pages="2313--2320",
  doi="10.1121/1.5031029",
  issn="0001-4966",
  url="https://asa.scitation.org/doi/10.1121/1.5031029"
}
