Using Relative State Transformer Models for Multi-Agent Reinforcement Learning in Air Traffic Control
Abstract
Deep Reinforcement Learning has seen increasing use in the field of Air Traffic Control in recent years. Because the number of aircraft in a given sector of airspace is not constant, methods are needed that are invariant to the number of agents in the system. This is often achieved by selecting a subset of aircraft to include in the state, which introduces human bias. Another option is to use Recurrent Neural Networks to process the entire sequence of aircraft present. These methods, however, are sequence-dependent and can yield different results depending on the order in which the aircraft are presented, which is undesirable. Methods that rely solely on attention, such as Transformers, can process sequential data in a sequence-invariant manner through multi-head attention. However, because traditional Transformers operate on individual tokens, relative state information cannot be encoded into the hidden state. This paper shows that, by applying a transformation to the key and value tokens, Transformers can operate on relative states, at the cost of a factor of (N-1) additional attention computations, where N is the number of agents in the system. This adaptation allows relative state Transformers to obtain significantly higher performance than standard Transformers. The results also show that using attention to construct the initial observation vector from a total of 20 agents yields similar, though slightly lower, performance compared to handcrafted observation vectors, without requiring manual selection of the important agents. Future research should investigate whether additional changes to the attention mechanisms and their training can result in higher performance.
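To make the key-and-value transformation concrete, the sketch below illustrates one plausible reading of the idea in PyTorch: keys and values are projected not from each agent's absolute state but from the relative state of every other agent with respect to the query agent, so each of the N agents attends over its own set of (N-1) relative tokens. The class name, the single attention head, and the use of a simple state difference as the relative transformation are all assumptions for illustration; the paper's exact formulation may differ.

import torch
import torch.nn as nn

class RelativeStateAttention(nn.Module):
    # Illustrative single-head attention over relative agent states.
    # Standard attention would project keys/values once from the N absolute
    # states; here each query agent i gets keys/values computed from the
    # relative states r[i, j] = x[j] - x[i], which is where the factor
    # (N-1) of additional attention computations comes from.
    # NOTE: subtraction as the relative transformation is an assumption.
    def __init__(self, state_dim: int, hidden_dim: int):
        super().__init__()
        self.q_proj = nn.Linear(state_dim, hidden_dim)
        self.k_proj = nn.Linear(state_dim, hidden_dim)
        self.v_proj = nn.Linear(state_dim, hidden_dim)
        self.scale = hidden_dim ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, state_dim), one absolute state per agent
        n = x.shape[0]
        # Relative states: rel[i, j] = x[j] - x[i], shape (N, N, state_dim)
        rel = x.unsqueeze(0) - x.unsqueeze(1)
        q = self.q_proj(x)    # (N, hidden_dim), queries from absolute states
        k = self.k_proj(rel)  # (N, N, hidden_dim), per-query keys
        v = self.v_proj(rel)  # (N, N, hidden_dim), per-query values
        # Scaled dot-product logits for each query agent over its tokens
        logits = torch.einsum('id,ijd->ij', q, k) * self.scale
        # Mask each agent's attention to itself (rel[i, i] is all zeros)
        logits = logits.masked_fill(torch.eye(n, dtype=torch.bool), float('-inf'))
        attn = logits.softmax(dim=-1)               # (N, N)
        return torch.einsum('ij,ijd->id', attn, v)  # (N, hidden_dim)

# Example: 5 aircraft, each described by a 4-dimensional state. The output
# is one hidden vector per agent and is invariant to the agent ordering.
layer = RelativeStateAttention(state_dim=4, hidden_dim=16)
hidden = layer(torch.randn(5, 4))  # -> shape (5, 16)

Because the relative tensor has shape (N, N, state_dim) rather than (N, state_dim), the key/value projections and the attention itself scale with N(N-1) pairwise terms instead of N tokens, matching the stated cost of a factor (N-1) additional attention computations.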