R&D Result Detail

Original Title

Automatic Speech Analysis Framework for ATC Communication in HAAWAII

Type

Paper in proceedings (conference paper)

Original Abstract

Over the past years, several SESAR-funded exploratory projects focused on bringing speech and language technologies to the Air Traffic Management (ATM) domain and demonstrating their added value through successful applications. The recently ended HAAWAII project developed a generic architecture and framework, which was validated through several tasks such as callsign highlighting, pre-filling radar labels, and readback error detection. The primary goal was to support pilot and air traffic controller communication by deploying Automatic Speech Recognition (ASR) engines. Contextual information (if available) extracted from surveillance data, flight plan data, or previous communication can be exploited via entity boosting to further improve the recognition performance. HAAWAII proposed various design attributes to integrate the ASR engine into the ATM framework, often depending on concrete technical specifics of target air navigation service providers (ANSPs). This paper gives a brief overview and provides an objective assessment of speech processing components developed and integrated into the HAAWAII framework. Specifically, the following tasks are evaluated with respect to the application domain: (i) speech activity detection, (ii) speaker segmentation and speaker role classification, and (iii) ASR. To the best of our knowledge, the HAAWAII framework offers the best performing speech technologies for ATM, reaching high recognition accuracy (i.e., error correction done by exploiting additional contextual data), robustness (i.e., models developed using large training corpora) and support for rapid domain transfer (i.e., to a new ATM sector with minimum investment). Two scenarios provided by ANSPs were used for testing, achieving a callsign detection accuracy of about 96% and 95% for NATS and ISAVIA, respectively.

Keywords

HAAWAII project, Speech activity detection, Speaker segmentation, Speaker role classification, Automatic Speech Recognition.

Authors

MOTLÍČEK, P.; PRASAD, A.; NIGMATULINA, I.; HELMKE, H.; OHNEISER, O.; KLEINERT, M.

RIV year

2025

Released

27.11.2023

Publisher

SESAR Joint Undertaking

Location

Seville

Book

SESAR Innovation Days

ISSN

0770-1268

Periodical

SESAR Innovation Days

Volume

2023

Number

11

State

Kingdom of Belgium

Pages from

1

Pages to

9

Pages count

9

URL

https://www.sesarju.eu/sites/default/files/documents/sid/2023/Papers/SIDs_2023_paper_72%20final.pdf

BibTeX

@inproceedings{BUT187933,
  author="MOTLÍČEK, P. and PRASAD, A. and NIGMATULINA, I. and HELMKE, H. and OHNEISER, O. and KLEINERT, M.",
  title="Automatic Speech Analysis Framework for ATC Communication in HAAWAII",
  booktitle="SESAR Innovation Days",
  year="2023",
  journal="SESAR Innovation Days",
  volume="2023",
  number="11",
  pages="1--9",
  publisher="SESAR Joint Undertaking",
  address="Seville",
  issn="0770-1268",
  url="https://www.sesarju.eu/sites/default/files/documents/sid/2023/Papers/SIDs_2023_paper_72%20final.pdf"
}

Documents

motlicek_SIDs_2023_paper_72 final
