Automatic Speech Recognition and its Contextual Enhancement for Singapore ATC Voice Communication

Paper ID

SIDs-2024-023

Conference

SESAR Innovation Days

Year

2024

Theme

Speech Recognition

Keywords

Air Traffic Control; Callsign Recognition; Air Traffic Controller; Automatic Speech Recognition; Contextual Speech Recognition

Authors

Jayakrishnan Melur Madhathil, Nguyen Ngoc Khanh, Lee Seounghoon, Tran Anh Dung, Luong Trung Tuan and Tran Huy Dat

DOI

https://doi.org/10.61009/SID.2024.1.12

Abstract

In a first for Singapore Air Traffic Control (ATC), we present a complete Automatic Speech Recognition (ASR) pipeline for voice communication between pilots and Air Traffic Controllers (ATCOs). The added complexity of multi-accented speech, cockpit noise, and speaker-dependent biases was overcome by training the models on a sufficiently large dataset collected across multiple domains, namely en-route, approach, and tower. We also carried out detailed benchmarking and analysis of ASR technologies ranging from hybrid HMM-DNN, to supervised End-to-End (E2E), to pre-trained semi-supervised models fine-tuned on ATC voice data. This benchmarking led us to conclude that the traditional hybrid HMM-DNN approach remains competitive for domain-specific applications such as ATC. We also significantly improved the Callsign Recognition Rate (CRR) from audio using a fast, efficient method. The preprocessing pipeline comprises our Voice Activity Detection (VAD), Speaker Turn Detection, and Speaker Role Detection (SRD) modules. We achieved a Word Error Rate (WER) of 5.48% and improved the CRR by 6.01%.
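WER, the headline metric above, is conventionally defined as (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions in a minimum-edit alignment of the hypothesis against the reference, and N is the number of reference words. The following is a minimal sketch of that standard definition, not the authors' scoring tool. It assumes plain whitespace tokenization with no text normalization, since the paper's normalization rules are not described here. The function names and the example phrase are hypothetical and chosen for illustration only.

```python
from typing import Iterable, List, Tuple


def word_edit_distance(ref: List[str], hyp: List[str]) -> int:
    """Minimum number of word substitutions, deletions, and insertions
    needed to turn `ref` into `hyp` (word-level Levenshtein distance)."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER for a single utterance: edits divided by reference length."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(ref, hypothesis.split()) / len(ref)


def corpus_wer(pairs: Iterable[Tuple[str, str]]) -> float:
    """Corpus WER: total edits over total reference words
    (not the mean of per-utterance WERs)."""
    total_edits = 0
    total_words = 0
    for reference, hypothesis in pairs:
        ref = reference.split()
        total_edits += word_edit_distance(ref, hypothesis.split())
        total_words += len(ref)
    if total_words == 0:
        raise ValueError("references must contain at least one word")
    return total_edits / total_words


if __name__ == "__main__":
    # Hypothetical transmission: one substitution and one deletion.
    ref = "singapore airlines three two one descend flight level one two zero"
    hyp = "singapore airline three two one descend level one two zero"
    print(f"WER = {word_error_rate(ref, hyp):.4f}")
```

In the example, "airlines" is recognized as "airline" (one substitution) and "flight" is dropped (one deletion), giving 2 edits over 11 reference words, or a WER of about 18.2%. Summing edits across the whole test set before dividing, as `corpus_wer` does, is the usual way to report a single corpus-level figure.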