Automatic Speech Recognition and its Contextual Enhancement for Singapore ATC Voice Communication
Abstract
We present the first complete Automatic Speech Recognition (ASR) pipeline for Singapore Air Traffic Control (ATC) voice communication between pilots and Air Traffic Controllers (ATCOs). Multi-accented speech, cockpit noise, and speaker-dependent biases add complexity. We overcame these challenges by training the models on a sufficiently large dataset collected across multiple domains, namely en-route, approach, and tower. We also carried out detailed benchmarking and analysis of ASR technologies ranging from hybrid HMM-DNN, to supervised End-to-End (E2E) models, to pre-trained semi-supervised models fine-tuned on ATC voice data. This benchmarking showed that the traditional hybrid HMM-DNN approach remains competitive for domain-specific applications such as ATC. In addition, we significantly improved the Callsign Recognition Rate (CRR) from audio using a fast and efficient method. The preprocessing pipeline comprises our Voice Activity Detection (VAD), Speaker Turn Detection, and Speaker Role Detection (SRD) modules. We achieved a Word Error Rate (WER) of 5.48% and improved the CRR by 6.01%.
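The WER reported above is conventionally defined as WER = (S + D + I) / N. Here S, D and I are the word substitutions, deletions and insertions in a minimum-edit alignment of the hypothesis against a reference transcript of N words. The sketch below shows that computation. It assumes whitespace tokenization and no text normalization, because the exact scoring setup used for the 5.48% figure is not stated here. The function name `word_error_rate` and the ATC-style phrases in the usage example are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (S + D + I) / N using word-level Levenshtein distance.

    Assumption: plain whitespace tokenization, no case or number normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match if cost == 0)
            )

    # Total edits S + D + I, normalized by reference length N
    return dp[len(ref)][len(hyp)] / len(ref)


# Hypothetical ATC-style example: one substitution ("three" -> "tree")
# and one deletion ("flight") against a 10-word reference.
reference = "singapore one two three descend flight level three two zero"
hypothesis = "singapore one two tree descend level three two zero"
print(word_error_rate(reference, hypothesis))  # 0.2
```

Because the edit count is divided by the reference length N, WER can exceed 100% when a hypothesis contains many insertions. For that reason, it is usually reported alongside a task-specific measure such as CRR.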