The HAAWAII Framework for Automatic Speech Understanding of Air Traffic Communication

Paper ID

SIDs-2023-06

Conference

SESAR Innovation Days

Year

2023

Theme

Speech Recognition

Project Name

SESAR 2020 ER4 project HAAWAII, SESAR 2020 IR Wave 2 project PJ10-W2 PROSA

Keywords

ABSR, air traffic control, ASRU, HAAWAII framework, Speech Recognition, Speech Understanding, Voice Recognition

Authors

Hartmut Helmke, Matthias Kleinert, Arthur Linß, Petr Motlicek, Hanno Wiese, Lucas Klamert, Julia Harfmann, Nuno Cebola, Hörður Arilíusson, Teodor Simiganoschi

DOI

https://doi.org/10.61009/SID.2023.1.04

Project Number

874464

Project Number

884287

Abstract

During the last decade, many successful applications combining Automatic Speech Recognition and Understanding (ASRU) for Air Traffic Management (ATM) have been proposed and demonstrated. The HAAWAII project developed a generic architecture and framework, which was validated for, e.g., callsign highlighting, pre-filling of radar labels, and readback error detection. It supports recognizing and understanding both pilot and air traffic controller (ATCo) transmissions. Contextual information extracted from available surveillance data, from flight plan data, and from previous transmissions can be exploited to significantly improve ASRU performance. Different design decisions have been taken, depending on the concrete scenario. This paper evaluates the effect of the design decisions integrated in the HAAWAII framework on overall speech understanding performance based on eight hypotheses, seven of which are validated. Using all framework elements enables ATCo command recognition rates of 90% for real-time applications and 93% for offline applications. The most significant impact is achieved when callsign information from surveillance data is available: the command recognition rate then improves by more than 20% absolute. Knowing a priori whether the ATCo or the pilot is speaking can improve the command recognition rate by up to an additional 16% absolute. The reported results are based on commands from apron, approach, and en-route control, recorded both in laboratory and in operations room environments.
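The abstract reports that callsign information from surveillance data gives the largest gain, but it does not describe how that context is applied. A minimal sketch of the general idea follows: a recognized callsign is mapped to the most similar callsign currently visible on the radar. It assumes callsigns are already converted to ICAO format and uses Python's standard `difflib` string similarity as a stand-in for whatever matching the HAAWAII framework actually performs. The function name `snap_callsign`, the cutoff value, and the sample callsigns are hypothetical illustrations, not part of the framework.

```python
from difflib import get_close_matches
from typing import Iterable, Optional


def snap_callsign(recognized: str,
                  surveillance_callsigns: Iterable[str],
                  cutoff: float = 0.6) -> Optional[str]:
    """Map a possibly misrecognized callsign to the closest callsign
    present in (hypothetical) surveillance data.

    Returns None if no candidate reaches the similarity cutoff, so a
    caller could fall back to the raw recognition result or flag it.
    """
    # Normalize case so "dlh4ab" and "DLH4AB" compare equal.
    candidates = [c.upper() for c in surveillance_callsigns]
    # difflib ranks candidates by SequenceMatcher ratio; keep the best one.
    matches = get_close_matches(recognized.upper(), candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else None


if __name__ == "__main__":
    # Callsigns assumed to be visible on the radar at the time of the transmission.
    in_sector = ["DLH4AB", "BAW123", "AFR55K"]
    # The recognizer confused "B" with "P"; the surveillance context corrects it.
    print(snap_callsign("DLH4AP", in_sector))  # → DLH4AB
    # No aircraft in the sector resembles this callsign.
    print(snap_callsign("XYZ999", in_sector))  # → None
```

Returning `None` instead of forcing the nearest match keeps the sketch conservative: a callsign that matches nothing in the surveillance picture is surfaced rather than silently replaced.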