Complexity concepts in ATM
Contents
Complex networks and power laws
Complex systems are those composed of a collection of a high number of parts (elements, individuals, agents, etc.) that interact with, and adapt to, each other in a non-linear way. The main characteristics of such systems is the appearance of emergent behaviours, i.e. properties at the system-wide level that emerge from the combined actions of individuals, which cannot be understood only from information at the individual level. Important insights into the behaviour of complex networks may be gained by examining such non-linear relationships between the elements of the system, including through the representation thereof as the nodes (representing any element of a system; in this context, often airports) and links/edges/arcs (connecting pairs of nodes following some structural rule; in this context, often flights). We may create specific network representations and therefore concomitant definitions of nodes, based on the context of what we are analysing, and they could thus be more abstract, such as accumulations of airline delay costs. Links may even indicate a virtual relationship, such as the similarity with respect to some measure. For example, in a correlation-based network, two nodes characterized in terms of a certain time-series may have a link according to some unsupervised clustering procedure. The weight of the link may then depend on the value of the correlation coefficient. Complex networks are thus an abstraction of a system, in that both nodes and links may represent any kind of elements and relationships. In the last decade, many metrics have been developed to assess specific characteristics of networks. Throughout the paper, and especially in Section 4, we describe several examples. The interested reader may further refer to several reviews available in the literature (Boccaletti et al., 2006; Costa et al., 2007; Newman, 2003; Strogatz, 2001)[1]. A network is said to be ‘scale-free’ (Barabási and Albert, 1999; Barabási et al., 1999) when the nodes are not connected randomly or evenly. Often, ‘scale-free’ networks include highly connected nodes that may be described (not only in the air transport context) as ‘hubs’ of connectivity, that have a strong influence on the way the network functions (such as the internet). In a random (or exponential) network, most nodes would have a degree, k (number of connections), similar to the average degree, . As a randomly distributed network increases in size, the ratio of high-degree nodes to other nodes decreases, whereas in a scale-free network this ratio remains constant as a function of network size. The shape of the (cumulative) degree distribution can furnish important insights into the underlying structure of a network. Random networks (Erdös-Renyi graphs) show a Poisson (-like) (exponential) connectivity distribution of (the probability that a node has degree k) that peaks at and decays exponentially for large k. Here, no (or very few) nodes play any type of central role in the network. For scale-free networks decays as a power-law, , and may thus be described as being free of any single, characteristic scale (Albert et al., 2000). A power-law is an algebraic relationship between two quantities x and y, such as y = xa. (One of the reasons why power-laws are so important in critical phenomena and complex systems is that the variables that exhibit such behaviour are, indeed, scale-free). Power-laws can be observed in different ensembles and in time series, for example in the probability distribution or in auto-correlation function of some stochastic variables. When discussing the ‘decay’ of , we are referring to the heavy tail of the distribution for large k (e.g. due to the relative commonness of hubs, albeit small in absolute number).
Illustrating these effects, in Figure 0.2, the red nodes are the five nodes with the highest number of links; the green nodes are those directly connected to them. In the exponential network, only 27% of the nodes are reached by the five most connected nodes; in the scale-free network, it is more than 60%. Both networks contain 130 nodes and 215 links, with an average degree = 3.3. Scale-free networks are also characterised by a power-law decay of their cumulative distribution function (see, for example, Liljeros et al., 2001). The cumulative distribution function, expresses the proportion of nodes with degree . An (approximately) linear double-logarithmic plot is consistent with a power-law dependence in the tails of such distributions (although see also Section 4.3.3).
It is worth identifying an important common feature, and an important difference, between scale-free and randomly distributed networks. Both can behave as ‘small world’ networks (Watts and Strogatz, 1998), in that it may be possible to connect from one node to another through a (counterintuitively) small number of nodes (even in the absence of hubs). However, in the scale-free network, it is more likely that more of these connections will use a hub – this dependence is an important factor governing the behaviour of the two types of network under degradation, e.g. when and how any hubs fail. Thus, while certain scale-free networks may display high levels of resilience to degradation (due to failure or attack), others may be more sensitive than a random network, if the hubs are more likely to fail. Although the power-law distribution implies that nodes with smaller connectivities will be affected with much higher probabilities by random disruption, this cannot be assumed to be the case with air transport hubs (see Section 4.3.1).
Returning to small-worldness, this is a property displayed by many real-world networks, that is, they exhibit a low average geodesic distance, characteristic of random networks, while maintaining a high clustering coefficient (see Section 3.2.2.2). Amaral et al. (2000) have examined the statistical properties of various real-world networks. They suggest the occurrence of three classes of small-world networks:
Table 0.1 Classes of small-world network.
Network class Characteristic connectivity distribution scale-free decays as a power law broad-scale power law regime followed by a sharp cut-off single-scale fast decaying tail
For the ‘broad-scale’ and ‘single-scale’ networks, these authors suggest that there are constraints limiting the addition of new links, and that the nature of such constraints may be the determining factor for the emergence of different classes of networks. (See also Section 0.2.3 on growth and preferential attachment.)
Software resources for network analysis and representation
Open source software for network analysis using python:
- Networkx module: provides classes for (un)directed graphs and multigraph for building all sorts of networkx (Erdos-Renyi, Barabasi-Albert) with access to all common metrics (degree, strength, clustering). Includes tools for deeper analysis as well (shortest paths, connected components).
- Python implementation of the Yen algorithm: useful to find the k best shortest paths on a network instead of just one.
Open source software for network visualization using python:
- Basemap matplotlib toolkit: It is a Python Matplotlib toolkit that provides an easy and efficient way to draw plots over real world maps.
- Examples: Figures A, B and C & Evolution of clusters of congested airports video in Analysis of air transportation using complex networks
- More: Basemap examples
- SimpleKml: It is a python package for generating KML (or KMZ) files. Keyhole Markup Language (KML) is a file format used to display geographic data. It was first developed for use with Google Earth.
- Examples: Modelling delay propagation dynamics video in Analysis of air transportation using complex networks
- Shapely and Descartes modules: Shapely provides several classes to build two-dimensional objects (point, line, surface) and common operations on them (difference, union...). Descartes allows to easily display them. Useful for instance to display one layer of sectors.
Open source software for general network visualization:
- Gephi: Gephi is an interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs; runs on Windows, Linux and Mac OS X. Gephi can handle multiple data format as input. Several data analysis and visualization algorithms are available within the software and through third-party plugins.
Micro-, meso- and macroscales
Generically, three levels (or scales) of complex systems can be defined:
- Microscale - when the attention is devoted to the relationships and/or dynamics of a single element. For example: is a given element of the system highly connected with the others, or is it isolated? How does it evolve?
- Mesoscale - dealing with the structures created by the relationships between a small number of elements. Typically, one may ask if such relationships are organized in clusters or communities, creating groups of elements with strong interconnections between them.
- Macroscale - when the structure of the system as a whole is analysed.
The following figure helps to clarify these three concepts, when applied to a complex network.
Figure 0.3 Example of the three scales in the analysis of a system of relations.
The microscale analysis would focus on single elements, here represented by circles; for example, it would identify the presence of two highly connected elements. Furthermore, one may recognize that the elements are organized in two big communities, composed of blue and green nodes - a mesoscale approach. Finally, one may look at the whole structure, and identify one element (in red) responsible for connecting the two communities, and thus responsible for the propagation of information (or delays, or passengers, etc.) between them. We may more specifically illustrate these scales with air transport examples:
- Microscale – e.g. a single flight. At this smallest scale one must consider all the different stages of the flight (strategic, pre-departure, gate-to-gate, and post-arrival).
- Mesoscale – e.g. ATC sector. This is an intermediate scale that allows focusing on a given area that contains many individual aircraft that interact among themselves following a given set of rules. This scale still considers individual vehicles, but describes their activities and interactions based on aggregate relationships.
- Macroscale – e.g. the European air transport network. This scale integrates the state of the various ATM elements and allows focusing on the network properties, giving a high-level view of the system.
Growth and preferential attachment
Many mechanisms can give rise to scale-free networks. A popular mechanism is given by the following two-step algorithm:
- Growth: given a network with a small number, m0, of nodes at time t, at time t+1 we add a new node with m<m0 edges; such edges link the new node to already existing m nodes.
- Preferential attachment: how do we choose these already existing m nodes? Rather than choosing them randomly, we assume that the probability Π that a new node will be connected to node i is proportional to the degree ki of node i.
Here, preferential attachment, which is strictly related to the way a network builds up over time, is responsible for the existence of a power-law degree distribution. However, models of networks with power-law degree distributions can be developed in the context of so-called random graphs, i.e. graphs in which the edges are distributed randomly. A general approach to random graphs with a given degree distribution was developed in Newman et al. (2001) using a generating function formalism. By using such formalism, which does not take into account evolution, it is therefore possible to obtain networks characterized by a power-law degree distribution. However, these networks are intrinsically different from the ones generated from a preferential attachment mechanism, as the network construction is different in the two cases. Such differences may emerge at the level of the internal organisation of node into clusters and is captured by other network metrics such as the average path length or the clustering coefficient.
Definitions of Disturbance, Resilience and Robustness in the ATM Context
The following definitions are extracted from Gluchshenko (2012).
In order to evaluate the work of a system, primarily performance indicators describing a state of the system and its reference state have to be specified.
- Current state of a system is defined by the current values of its performance indicators.
- Reference state of a system is the specified set of its performance indicators values. A reference state can be established by single values of performance indicators as well as by intervals or domains where performance indicators can vary. A reference state relative to the current state of the system can be either
- an actual reference state, when the current values of the performance indicators are in the specified set of performance indicators values
- a potential reference state, when at least one of the current values of the performance indicators is not in the specified set of performance indicators values. A potential reference state may be realistic or nonrealistic with respect to the existing operational conditions.
Disturbance is considered as a cause of possible effects, namely stress and perturbation, either internal or external, which may cause a stress in a system. It is relative to the specified reference state and considered system and it is categorized and quantified by type, frequency, intensity and duration.
A system can be influenced by a disturbance and, therefore, can be stressed. Stress is defined as the state of a system caused by a disturbance which differs from the reference state and is characterized by deviation from this reference condition. Stress can be
- survival - if the system can respond by perturbation without modification to change the current state;
- lethal - if the system cannot or should not respond by perturbation to change the current state and has to be modified.
A perturbation is defined as the response of a system to the possible or current significant undesirable changes of the state of the system caused by a disturbance. Perturbation aims at preventing of the changes and/or at minimizing of deviation of the current values from the reference values of performance indicators.
In the case when stress is unavoidable, but survival, perturbation can be
- transient - if it enables temporary deviation which becomes zero over time with return to the reference state;
- permanent - if deviation becomes fixed over time leading to a state of the system different from the reference state.
This framework enables us to define robustness and resilience of a system. Robustness is described as the ability of a system to absorb a disturbance while resilience is given as the ability of a system to return back within a specified time horizon since a disturbance had occurred.
- Robustness - the ability of a system to experience no stress since a disturbance had occurred, i.e. the system is robust against the disturbance over the considered time horizon; is relative to the specified reference state of the system and to a particular disturbance (see figure).
- Resilience - the ability of a system to respond on a disturbance within a time horizon by transient perturbation, i.e. the system is resilient against the disturbance over the considered time horizon; is relative to the specified reference state of the system and to a particular disturbance (see figure).
These concepts are shown in the following figure.
Communities detection methods in ATM
An important characteristic of a complex network is its organization in communities. Communities are generically defined as sets of nodes that are more connected among themselves than with the rest of the network. It generalizes the concept of cliques. They are, therefore, an important element to understand and model the architecture of a network. Here we present three methods to detect this structure of communities.
Maximization of modularity
Modularity is a global property of a graph, defined as:
[math] Q= \frac{1}{2m} \sum_{C\in P}\sum_{i,j\in C} (A_{ij} - P_{ij}), [/math]
where [math]A_{ij}[/math] is the element of the (possibly weighted) adjacency matrix and [math]P_{ij}[/math] is the corresponding expected probability. The sum is carried over of the edges [math](i,j)[/math] belonging to a community [math]C[/math], and over all communities of the partition [math]P[/math]; it normalized by the total weight of the graph [math](2m)[/math]. The most popular choice for [math]P_{ij}[/math] is the one proposed by Newman and Girvan (Newman and Girvan 2003) (NG): [math]P_{ij}= k_i k_j /2m[/math], where [math]k_i[/math] is the strength of node [math]i[/math]. The null hypothesis corresponds to a randomization of the links preserving the strength of each node. Different choices for the null model in the modularity can also be done (see Expert 2011 for instance).
It is well known that modularity has a resolution limit, i.e. in large networks modularity fails to resolve small communities (Fortunato 2007).
Since the problem of finding the partition which maximizes this quantity is NP-hard, approximate algorithm are used in general, mainly the Louvain algorithm (a python implementation is available online).
Infomap
The idea behind the method is to consider a random walk over the network (Rosvall 2008). The more the nodes are connected one with each other, the more the walker will stay with them and thus form a community. The analysis of the flows over the network gives access to the underlying community structure. More precisely, the algorithms computes an optimized compressed description of information flows and, from the information theory point of view, the community detection algorithm searches the partition which minimizes the description length of an infinite random walk over the network.
Different versions of the code is available online.
OSLOM
The last method of community detection we use is called OSLOM – for Order Statistics Local Optimization Method (Lancichinetti 2011). Its general principle is the following. Using the NG null model presented in the standard definition of modularity, the authors use a fitness function – based on the probability that an external node to a community has a given number of neighbors within this community – to assess the statistical significance of each community. More precisely, each external nodes is ranked following this fitness function, and the algorithm tests the likelihood of the score, with the given rank of this node against the null model. This procedure has several benefits. Since this optimization is local, i.e. made independently for each community, the result can be a partition with overlapping clusters. Moreover, the method can be used as refinements to other methods (Infomap, modularity), because one can give to the algorithm an existing partition as input. The package is available online.
References
- Albert R., Jeong H. and Barabási A-L., 2000. Error and attack tolerance of complex networks, Nature (Letters), 406, 378–382.
- Amaral L. A. N., Scala A., Barthélémy M. and Stanley H. E., 2000. Classes of small-world networks, Proceedings of the National Academy of Sciences of the United States of America, 97 (21), 11149–11152.
- Barabási A. L. and Albert R., 1999. Emergence of scaling in random networks. Science, 286 (5439), 509–512.
- Barabási A.-L. , Albert R. and Jeong H., 1999. Mean-field theory for scale-free random networks, Physica A, 272, 173–187.
- Boccaletti, S., Latora, V., Moreno, Y., Chavez, M., Hwang, D.-U., 2006. Complexnetworks: Structure and dynamics. Physics Reports 424 (4–5), pp. 175–308.
- Costa, L. da F., Rodrigues, F. A., Travieso, G., Villas Boas, P. R., 2007. Characterization of complex networks: A survey of measurements. Advances in Physics 56 (1), pp. 167-242.
- Expert P., Evans T.S., Blondel V.D., Lambiotte R., 2011. Uncovering space-independent communities in spatial networks. PNAS 1018962108.
- Fortunato S., Barthelemy M., 2007. Resolution limit in community detection. PNAS 104 (1): 36 - 41.
- Gluchshenko O, 2012. Definitions of Disturbance, Resilience and Robustness in ATM Context. DLR Report IB 112-2012/28, release 0.07,
- Lancichinetti A., Radicchi F., Ramasco J.J., Fortunato S., 2011. Finding Statistically Significant Communities in Networks. PLoS ONE 6(4): e18961. doi:10.1371/journal.pone.0018961.
- Liljeros F., Edling C.R., Amaral L. A. N., Stanley H. E. and Åberg Y., 2001. The web of human sexual contacts, Nature (Brief Communications) 411, 907–908.
- Newman, M. E. J., 2003. The Structure and Function of Complex Networks. SIAM Review 45 (2), pp. 167-256.
- Newman M.E.J., Girvan M., 2003. Finding and evaluating community structure in networks. Phys Rev E 69: 026113.
- Newman M.E.J., Strogatz S. H., and Watts D. J., 2001. Random graphs with arbitrary degree distributions and their applications, Physical Review E 64, 026118.
- Rosvall M, Bergstrom CT, 2008. Maps of information flow reveal community structure in complex networks. PNAS 105: 1118.
- Strogatz S. H., 2001. Exploring complex networks, Nature (Insight review articles) 410, 268-276.
- Watts D. J. and Strogatz S. H., 1998. Collective dynamics of ‘small-world’ networks, Nature, 393, 440–442.