Air delay propagation patterns in Europe from 2015 to 2018: an information processing perspective

Luisina Pastorino; Massimiliano Zanin

doi:10.1088/2632-072X/ac4003

1. Introduction

The complexity of many, if not of most real-world systems that surround us is the result of some kind of transport process. One may for instance think of social networks, in which individuals interact by transporting and interchanging information and emotions [1–3]; power systems, transporting energy in its different forms [4]; or the brain, in which neurons interact by means of chemical and electrical signals [5]. The most natural analysis involves considering them as pure transportation systems: the researcher directly analyses the movements taking place within them, i.e. tracking the movement of items through time and space, in order to synthesise various metrics about their behaviour. On the other hand, a much more abstract and powerful approach can be envisioned: individual movements are discarded, to look instead at how information is processed at different places. In other words, the focus is on how information is distributed among, combined at, and modified by the different elements composing the system. This thus represents a shift from a transportation to an information processing approach.

The epitome of this paradigm change is neuroscience and the study of the human brain, i.e. the study of how the brain performs complex computations to respond to internal and external stimuli. While early neuroimaging and electrophysiological studies typically aimed at identifying patches of activation or local time-varying patterns of activity, there is now consensus that task-related brain activity has a more complex spatially and temporally extended character [6, 7]. As a consequence, complex network theory [8–11], a statistical mechanics understanding of graph theory [12], is progressively becoming standard in neuroscience. Brain activity is then represented through functional networks, constructed from association matrices of pair-wise measures of functional connectivity (e.g. correlation or coherence) estimated from electrophysiological recordings [13–15]. The result is an abstract and coarse-grained representation of how information is transmitted and processed in the brain, and of the functional structures dynamically created to support cognitive tasks.

A relatively less studied, yet socially relevant system in which this methodological shift can be observed is air transport. While complexity science and network theory have long been used in its characterisation [16–18], the air transport system has mostly been analysed from a transport perspective, consistently with the nature of its objectives. Such an approach has especially been applied in the analysis and characterisation of delay propagation, one of the most important research topics in air transport management. Delays have profound implications in the cost-efficiency [19] and safety of the system [20], and contribute to the negative impact of air transport on the environment [21]. To illustrate, the Federal Aviation Administration estimates that US flight delays cost $22bn yearly: getting rid of delays would thus allow to pay one third of the whole national health care system of Spain (€72.8bn in 2017). Additionally, 1 minute of ground delay implies between 1 to 4 kg of fuel consumption, one order of magnitude higher in the case of airborne delay [21]. Delays have thus extensively been studied, usually by constructing large-scale simulations, as for instance in [22–30], or by relying on statistical and machine learning techniques [31–35].

In spite of a substantial body of literature, the mechanisms underlying delay propagation are still poorly understood, and all mitigation policies are broad in scope—i.e. policies tend to penalise all delays, irrespective of their role in the global dynamics. The reasons for this can be traced back to the limitations inherent these simulation-based studies, including limited availability of real data (as e.g. on connecting passengers and on airline's operational policies), the intrinsic uncertainty of the system's dynamics [36], and the difficulty of validating synthetic models. A better understanding of air transport architectural interactions may come from the study of how the system processes information. When aircraft travel between two airports, they do not only transport passengers and goods, but also transmit information about the status of the departure airport (and of the whole crossed airspace) to the destination. One airport receiving (possibly delayed) flights and dispatching them to other airports is not just managing the movement of the aircraft, but is also receiving, processing and retransmitting information about the system. A parallelism can be envisioned between the human brain and air transport: airports in the latter have a similar function of neurons in the former, with neurotransmitters and aircraft being just the instrument for moving and processing information. In recent years, a limited number of attempts have been made to apply this approach to model the propagation of delays in air transport [37–42], although limited by data availability and technical caveats.

In this contribution we propose a large-scale analysis of the structure and evolution of the delay propagation network in Europe, by leveraging on the techniques developed in network neuroscience in the last decades, and on more recent instruments for data pre-processing and network analysis. More in details, we describe the delay propagation patterns across the 50 largest European airports during four years, from 2015 to 2018, using a modified version of the celebrated Granger causality to compensate for the presence of missing values [43]. The resulting functional networks are analysed using standard network science metrics, to describe how the topology has changed through time in response to external events. We further show how the causality relations can be simplified, to cluster airports according to their main role in the delay propagation. To the best of our knowledge, this is the most complete study on network delay propagation based on functional networks to date, both in terms of temporal extension and data processing and representation, and as such complements studies based on other data analysis and modelling techniques. We conclude by drawing some lessons on the limitations associated to, and challenges offered by this interpretation of delay propagation as information processing, and by sketching a roadmap for future studies.

2. Data and methods

2.1. Traffic and delay data

Air traffic data were obtained from the EUROCONTROL's R&D data archive [44], a large and freely available repository of information about the European airspace and all commercial flights crossing it. Of interest for this study were aircraft trajectories, which are created by merging flight plans with data from air navigation service providers' flight data systems, radar and datalink communications among others. The data set has a temporal coverage of four years, from 2015 to 2018; four months are available for each year (March, June, September and December), and therefore data are discontinuous. From the full set of flights, only those landing at the 50 largest European airports (according to their number of passengers in 2015) have further been considered.

Given a flight, its delay at landing has been estimated as the difference between the actual and the scheduled landing times. All flights that have landed at a given airport and in a given hour have then been aggregated, to obtain a time series of average hourly delays for each airport. An overview and the temporal evolution of the number of flights is reported in figure 1, while additional details about the 50 considered airports (including names, International Civil Aviation Organization (ICAO) codes and number of operations) can be found in table A1 in appendix A.

**Figure 1.** Evolution of the number of flights through time and according to the experienced landing delay. From black to light green, the four lines in the (left panel) and the four slices in the (right) pie chart indicate: the total number of landings in the data set; the number thereof corresponding to the 50 airports here considered; the number thereof delayed more than 10 minutes; the number thereof delayed more than one hour. The dashed vertical lines indicate the start of each year, while the dotted ones the start of the four months available in the data set (March, June, September and December).
Download figure:
Standard image High-resolution image

A close inspection of the time series reveals that they are non-stationarity, as delays are usually higher at noon and lower during weekends. In order to correct this, as required by the subsequent Granger causality test, the delays are normalised by applying a Z-score detrend procedure defined as:

$\begin{equation}{D}^{\prime }(d,h)=\frac{D(d,h)-\langle D(\cdot ,h)\rangle }{\sigma (D(\cdot ,h))},\end{equation} \tag{ 1 }$

where D'(d, h) is the normalised delay for day d and time h, D(d, h) the original delay, and $\langle D(\cdot ,h)\rangle$ , σ(D(⋅, h)) the average and standard deviation for the delays for all days at the specific time h. The result is a stationary time series of zero average. D'(d, h) also indicates how many standard deviations the observed delay is from the mean value, that is, how usual or unusual the delay is for that specific time of the day. In this sense, a positive value of the delay indicates that the system behaved worse than expected (it experienced a higher delay) and a negative value indicates that it did better than expected for that time of the day.

2.2. Detecting delay propagation: the Granger causality metric

The Granger causality test [45], developed by the economy Nobel Prize laureate Clive Granger on top of the prediction theory of Wiener [46], is one of the best well-known statistical tests for evaluating the presence of predictive causality [47] between pairs of time series. While this test has been analysed in numerous scientific works, for the sake of completeness the main elements of its mathematical formulation are discussed here below. Suppose an universe U, representing all elements (both observable by and hidden to the researcher) composing a system and relevant for a given problem. Within U we consider two elements A and B, respectively described by two time series a and b. Let us further suppose that these time series fulfil some basic conditions, including being stationary and regularly sampled. B is said to 'Granger-causes' A if:

$\begin{equation}{\sigma }^{2}(a\vert {U}^{-})< {\sigma }^{2}(b\vert {U}^{-}{\backslash}{b}^{-}),\end{equation} \tag{ 2 }$

where σ²(a|U⁻) stands for the error, in terms of the standard deviation of residuals, when forecasting the time series a using the past information of the entire universe U; and σ²(a|U⁻\b⁻) the error when the information about time series b is removed. In other words, B is causing A if including information about the past of B helps predict the future of A, being this the origin of the term predictive causality [47].

While the forecast of equation (2) can be performed through any algorithm [48–50], a common and simple solution is the autoregressive-moving-average model. In this case, two linear models are fitted on the data, respectively called the restricted and unrestricted regression models:

$\begin{equation}{a}_{t}=C\cdot {a}_{t-1}^{m}+{{\epsilon}}_{t},\end{equation} \tag{ 3 }$

$\begin{equation}{a}_{t}={C}^{\prime }\cdot \left({a}_{t-1}^{m}\oplus {b}_{t-1}^{m}\right)+{{\epsilon}}_{t}^{\prime },\end{equation} \tag{ 4 }$

m here refers to the model order, the symbol ⊕ denotes concatenation of column vectors, C and C' contain the model coefficients, and _t and ${{\epsilon}}_{t}^{\prime }$ are the residuals of the models. Equation (2) is then usually written as ${\sigma }^{2}({{\epsilon}}_{t}^{\prime })< {\sigma }^{2}({{\epsilon}}_{t})$ . In order to assess the statistical significance and obtain a p-value, an F-test is performed to check whether the coefficients C' associated to the time series b are different from zero—i.e. whether b is actually having an impact in the prediction.

One important limitation of the Granger test is its sensitivity to missing values [43, 51]. To illustrate, and considering the application at hand, many airports do not operate around the clock, such that some hours can have no operations associated to them, resulting in a zero average delay. This zero is not equivalent to having no delays, but instead represents a missing value: we cannot know what would be the expected delay at the airport, would a flight have landed at that time. Such spurious values can bias the regression model at the basis of the Granger test, and consequently reduce the number of detected causal relationships. In order to solve this issue, we here substitute equation (3) with linear models in which the weight of missing values is set to zero. This effectively implies that missing elements are excluded from the calculation of the autoregressive models, and that the Granger causality test is performed only on the values esteemed as correct. As discussed in [43], this improves the sensitivity of the Granger test even when a significant fraction of values are missing, and allows recovering most of the original causal relationships.

As a final note, two facts about the Granger causality are worth highlighting. First of all, the Granger test, as implemented in equation (3), is linear, and can therefore only detect the linear part of a non-linear relationship. This is not a major problem for the application here considered, as delay propagation is mostly a linear process [52]. Secondly, while the test name includes the word causality, it does not necessarily measure true causality—as highlighted by Granger himself [53]. To be more exact, this test assesses the directed lagged interactions between joint processes, or the quantification of information transfer across multiple time scales. In spite of this, and for the sake of simplicity, the relationships detected by this test will here be called causal.

2.3. Propagation networks reconstruction and analysis

The Granger test has been applied to the detrended time series D'(d, h) of section 2.1, to reconstruct 16 propagation networks, i.e. one for each available year-month. Each network is composed of 50 nodes, one for each airport; and is represented by an adjacency matrix A, of size 50 × 50, where the element a_ij has a value of 1 to indicate that there is a directed edge from node i to j (i.e. the delay at airport i 'Granger-causes' the delay at airport j), and 0 otherwise [9, 11].

As customary in functional network reconstruction, the elements of A are obtained by applying the Granger causality test over the time series of each pair of airports; the maximum lags is set to 6 values—i.e. we discard any propagation that would require more than 6 hours, a time interval longer than the duration of any intra-European flight. Additionally, it is worth noting that the output of the Granger test is a p-value. In order to avoid the increased probability of type I errors as a consequence of the multiple comparisons required by the reconstruction process, we applied a Bonferroni correction and rejected the null hypothesis of the test for an effective α = 0.01/(50 × 49) ≈ 4.08 × 10⁻⁶.

In order to characterise the obtained networks, the following topological metrics have been calculated:

(a)
Link density. Number of existing edges in the network, divided by the maximum number of edges that the network could have:
$\begin{equation}{l}_{\mathrm{d}}=\frac{L}{{N}^{2}},\end{equation} \tag{ 5 }$
where L is the number of edges and N the number of nodes of the network.
(b)
Diameter. Maximum distance between any pair of nodes in the network, such distance being defined as the number of edges in the shortest path connecting them. In the context of delay propagation it represents the maximum number of flights that are needed to disseminate the delays throughout the network.
(c)
Transitivity. Fraction of triplets of nodes that are included in triangles, also representing the tendency of nodes to form clusters [54]. It is mathematically defined as:
$\begin{equation}T=\frac{3{N}_{{\Delta}}}{{N}_{3}},\end{equation} \tag{ 6 }$
where
$\begin{equation}3{N}_{{\Delta}}=\sum\limits _{k > i > j}{a}_{i,j}{a}_{i,k}{a}_{j,k},\end{equation} \tag{ 7 }$

$\begin{equation}{N}_{3}=\sum\limits _{k > i > j}({a}_{i,j}{a}_{i,k}+{a}_{j,i}{a}_{j,k}+{a}_{k,i}{a}_{k,j}).\end{equation} \tag{ 8 }$
A high transitivity means that the network contains triplets of airport that are strongly connected, such that a delay originating in one of them is easily disseminated to the other airports of the group.
(d)
Assortativity. Tendency of edges to connect nodes of similar degrees [55]. Mathematically it is defined as:
$\begin{equation}A=\frac{1}{L}\sum\limits _{j > i}\frac{1}{2}({k}_{i}+{k}_{j}){a}_{i,j},\end{equation} \tag{ 9 }$
where k_i and k_j are the degrees of respectively nodes i and j. Positive (respectively, negative) values of assortativity indicate that nodes tend to connect with nodes of similar (different) degree. In the network of delay propagation, a positive assortativity indicates that airports propagating delays tend to connect with each other.
(e)
Efficiency. Measure of how efficiently the network can exchange information between nodes. The efficiency is defined as the inverse of the harmonic mean of the distances between pairs of nodes [56]:
$\begin{equation}E=\frac{1}{N(N-1)}\sum\limits _{i\ne j}\frac{1}{{d}_{ij}},\end{equation} \tag{ 10 }$
where d_ij is the distance between nodes i and j.
(f)
Information content (IC). Metric evaluating the existence of regular patterns in the adjacency matrix of the network, and calculated as the amount of information lost when pairs of nodes are iteratively merged [57]. The lower the IC, the more complex is the structure of the network, indicating the presence of some kind of meso-scale structure.

It is worth noting that some of the aforementioned metrics cannot be directly compared when the corresponding networks have different characteristics. For instance, the diameter depends not only on the structure of the network, but also on its link density; comparing two networks of different link densities can thus yield misleading results. In order to solve this, a set of random networks (here, 500) is generated, using the same number of nodes and edges as the network we are evaluating. The Z-score of the value m of the metric is then computed as:

$\begin{equation}{m}_{\text{Z}-\text{score}}=\frac{m-{\mu }^{\text{M}}}{{\sigma }^{\text{M}}},\end{equation} \tag{ 11 }$

where μ^M and σ^M are the average and standard deviation of the metric as obtained in the random networks.

Besides these network metrics, the following three measures of centrality are calculated for each node:

(a)
Out-degree: number of edges coming out from a node. Airports with high out-degree are those that distribute more delays to the rest of the network, i.e. they are responsible for initiating the propagation process.
(b)
In-degree: number of edges arriving to a node. Airports with the highest in-degree are those that receive more delays from other airports.
(c)
Betweenness centrality: metric defining the number of times a node is included in the shortest paths between pairs of nodes. The betweenness centrality c_B of a node w is defined as:
$\begin{equation}{c}_{\text{B}}(w)=\sum\limits _{s,t\in V}\frac{{P}_{w}(s,t)}{P(s,t)},\end{equation} \tag{ 12 }$
where V is the set of nodes, P(s, t) is the number of shortest paths connecting s and t, and P_w(s, t) the number of shortest paths connecting s and t containing w. The airports with the highest betweenness centrality are those that control the flow of information (delays) via connected paths.

2.4. Detecting propagation roles

While the previously described functional network, and specifically the associated adjacency matrix A, completely encodes all information about causal relationships in the system, it also presents the drawback of not being easily explainable. To illustrate, a single node can be at the receiving and sending ends of multiple links, each one of them with different strengths; it can thus be difficult to synthesise whether that airport is a net source or receiver of delays. In order to yield a clearer representation, we here resort to an approach based on clustering nodes according to their causality role [58]. Specifically, we here hypothesise that each node (i.e. each airport) can be assigned to one of two clusters, respectively representing the group of net sources of causality (denoted as C₁) and of net receivers (C₂). Given one assignation of all nodes to clusters, the system is then simplified by reducing it to a network composed by two nodes, C₁ and C₂; these are then represented by two time series, defined as the weighted mean of the time series of the airports composing them. Afterwards, a metric J is calculated as:

$\begin{equation}J=p{V}_{1,2}(1-p{V}_{2,1}),\end{equation} \tag{ 13 }$

with pV_1,2 and pV_2,1 representing the p-value of the Granger test performed respectively between C₁ → C₂ and C₂ → C₁. A low value of J indicates the presence of both that a strong causality C₁ → C₂, i.e. C₁ is forcing C₂; and of a non statistically significant causality C₂ → C₁, such that C₂ has a passive role. The best clustering of nodes is finally obtained as the one minimising J.

It is worth noting that the minimisation process can be extremely computational expensive, as a brute-force search would have to evaluate 2^N solutions (with N being the number of nodes, here N = 50). As proposed in [58], an approximation of the optimal solution is here obtained through a dual annealing (DA) optimisation, a stochastic approach combining the classical simulated annealing with a final local search [59, 60]. In order to further discard local minima, the DA optimisation has been executed 50 times using random initial conditions, and the solution associated to the minimal J has been retained.

In order to extract more complete information about airport roles, we further consider the case of three clusters, thus representing sources, receivers and brokers—the last ones being airport that propagate delays between sources and receivers. This requires updating the definition of J in equation (13) in order to account for all possible combinations of three clusters—see [58] for further details.

3. Structure and evolution of the delay propagation network

As customary in complex network theory, we start by analysing the evolution through time of the topological metrics defined in section 2.3, see figure 2. It can be appreciated that, in general, metrics display strong oscillations, reflecting important changes in the underlying network structure. Assortativity and diameter have values close to zero, thus indicating no strong topological structure; this is nevertheless not valid for transitivity, efficiency and IC, all having large absolute values. While the negative IC implies the presence of a non-specified mesoscale structure [57], the large transitivity and low efficiency suggest a network composed of a large number of triadic relations. In order to further analyse this aspect, figure 3 reports the evolution through time of the Z-score of the four most frequent motifs, all of them having a triangular structure. On average, these four motifs have a Z-score of 8.43, as opposed to the Z-score of 2.66 of the four motifs that do not include a triangle (designated as motifs 1 to 4 in [61]).

**Figure 3.** Evolution through time of the four motifs with largest average Z-score. The horizontal green dotted line indicate the average over the 16 networks, whose value is reported on the (right) side of each panel.
Download figure:
Standard image High-resolution image

March 2016 has a singular behaviour, with a large (in absolute value) density, transitivity, efficiency and IC—see figure 2; the same can also be observed in the evolution of the second motif of figure 3. This behaviour is due to two main factors. Firstly, a terrorist attack on March 22nd targeted the Brussels airport, disrupting the connectivity of the network in the following days, with most flights rescheduled to nearby airports; and further generating delays in the following months, as a consequence of more strict security checks. Secondly, the system experienced a 17% increase in reactionary delays (from an average 3.85 minutes in March 2015 to 4.52) due to a five-fold increase in en-route delays following French industrial actions [62]. The negative effects of these events are clearly represented in the network structure, also highlighting how two localised occurrences (one affecting one single airport, and one a single airspace) can have system-wide consequences.

We then move to studying the centrality of airports and how such centrality evolves through time. This is accomplished by creating a ranking of the airports in each time period, according to the three centrality measures explained in section 2.3; the average ranking position across the four years is then calculated for each airport. This metric thus represents how instrumental has been, on average, each airport in the propagation dynamics, with low values (i.e. top positions in the ranking) indicating a more important role. A list of the most central airports is reported in table A2 in appendix A. When plotted against the number of flights, a statistically significant negative relationship is detected between the out-degree centrality of an airport and its size (see figure 4, linear fit, ρ = −0.039 and r = −0.3781). Such relationship is nevertheless lost when considering the in-degree or the betweenness. This indicates that large airports have the tendency to generate and transmit delays; but, on the other hand, that the roles of delay broker, i.e. to act as a bridge to propagate delays between airports; and of delay receiver is something more distributed across the whole network. Additionally, the aforementioned correlations are not statistically significant when the degree of nodes is weighted according to the average delay of sources or destination airports, suggesting that delay propagation patterns are not directly linked to delay magnitudes.

**Figure 4.** Airport centrality vs size. From (left) to (right), the three panels depict the average ranking (respectively according to the out-degree, in-degree and betweenness centrality) as a function of the number of flights. The dashed black line represents the best linear fit, with the corresponding ρ and p-values reported within each panel.
Download figure:
Standard image High-resolution image

We further analyse the relationship between the average delay per airport, normalised according to the total number of operations, and the measures of centrality (out degree and betweenness). In order to obtain a complementary view, for each time period a linear model is fitted between the arrival delay and the centrality; the slope of such fit is then represented as a function of time in figure 5. The linear model is significant only in some time periods, marked with an *. There is nevertheless a general positive correlation, indicating that airports suffering from high levels of landing delays are also those spreading those delays throughout the network.

**Figure 5.** Evolution of the slope of a linear fit between the out-degree centrality and the landing delay (left panel), and the betweenness centrality and the landing delay (right panel). The grey bands indicate the confidence interval for the slope; and the blue dashed lines the average of the slope across the four years.
Download figure:
Standard image High-resolution image

4. Global delay propagation roles

As a complementary view to what presented in the previous section, we here apply the clustering analysis proposed in [58] and described in section 2.4. We start with a simplified situation with two clusters, respectively cluster 1 including all airports being net sources of delay propagation; and cluster 2 with airports being net receivers. This allows to strongly simplify the interpretation of the role of each airport in the network, as it is now described by a single binary value—as opposed to a combination of centralities, as seen in section 3.

The (left panel) of figure 6 reports the evolution of the assignation of airports to clusters. As previously seen, the structure of the network evolves substantially over time, and as a consequence, so does the assignation. Still, some airports display a mainly static role, with some being almost always delays generators and absorbers—see (top and central right panels) of figure 6. The (bottom right panel) of the same figure also illustrates the presence of a correlation between the size of each airport, measures through its number of flights, and the number of months it has been classified in the first cluster. In other words, and confirming the previous analyses, large airports are mostly responsible for the generation of delay propagation patterns.

While the previously shown classification into two clusters presents the advantage of maximally simplifying the network structure, this may come at the cost of an over-simplification; in other words, by forcing nodes to be assigned to two categories, complex propagation dynamics may be lost. We thus performed the previously shown analysis for three clusters, with results reported in figure 7. As introduced in [58], this allows to detect a new role located in the middle of sources and receivers of delays, i.e. the one of delay brokers: airports that do not generate nor absorb delays, but mostly transmit them from and to other airports. While this analysis reduces the risk of oversimplification, it has to be noted that it is still a simplification of a complex propagation dynamics; as such, it allows to better understand the main propagation patterns, at the price of deleting smaller details. As can be seen in figure 7, results are qualitatively similar, and airports responsible for generating delays are mostly the same as the ones in figure 6. Additionally, the correlation between generation of delays and traffic volume increases from ρ = 0.548 to ρ = 0.745, highlighting the role of major airports in the propagation dynamics.

**Figure 7.** Clustering of airports into three causality roles. (Left) Assignation of airports to roles as a function of time, with three roles here considered: net sources (red rectangles, cluster 1), brokers (yellow rectangles, cluster 2), and net receivers (green rectangles, cluster 3). (Right top and middle) Lists of the seven airports most frequently found in cluster 1 (net sources, top panel) and in cluster 3 (net receivers, central panel); black bars correspond to the fraction of time the airport was in the corresponding cluster in the case of two clusters, i.e. as in figure 6. (Right bottom) Scatter plot of the number of months each airport has been classified in cluster 1 as a function of its number of flights.
Download figure:
Standard image High-resolution image

We finally analyse the temporal evolution of the airport assignation to two clusters, by splitting the available data in two time windows, i.e. years 2015–2016 and 2017–2018, and by calculating the probability of finding a given airport in cluster 1 (i.e. net sources) in each window. The results, depicted in figure 8, indicate that the average role is stable, beyond the variability observed in each month. The exceptions are Warsaw Chopin Airport (EPWA), Lyon-Saint Exupéry Airport (LFLL), and Nice Côte d'Azur Airport (LFMN), all of them exhibiting an increase in the probability of belonging to cluster 1, i.e. they have generated more delay propagation during 2017–2018. Notably, all three airports have experience an increase in their traffic during the four years here considered, with a growth of the number of passengers between 2015 and 2018 of respectively 58.4%, 26.4% and 15.3%.

**Figure 8.** Evolution of the probability for each airport to be classified in cluster 1, i.e. net source of propagation, for 2015–2016 (blue bars) and 2017–2018 (grey bars). Airports are sorted in descending order of size, as in table A1. The * and ** symbols indicate statistically significant differences between the two time periods at respectively α = 0.1 and α = 0.05, according to a chi-square test; the ICAO code of the corresponding airport is reported on top.
Download figure:
Standard image High-resolution image

5. Results validation and stability

In order for this information-based analysis of delay propagation to have a real impact, one essential aspect is the validation of the results by it obtained, especially when there is yet no clear consensus on the techniques that ought to be used, nor on the way data have to be pre-processed. This is nevertheless not a simple endeavour, as a direct intervention in the system, e.g. to check the effects of a given solution, is usually not feasible. Given this state of affair, we here propose the use of two types of indirect validation.

As a first step, one can try to establish if the evolution of the reconstructed networks is related to events that happened in the system, such that the latter can explain (and somehow validate) the former. As already seen in section 3, the abnormal network structure observed on March 2016 can easily be explained by the extreme disruptions suffered by the system that month. Explaining the behaviours observed in other time windows is nevertheless complicated, as no additional events of such magnitude have happened. One can further correlate the strength of the network connectivity with some macroscopic metrics describing the behaviour of the system. To illustrate, a correlation between the average log₁₀ of the link p-values for each network, and the average amount of reactionary delays for the corresponding month (as reported in CODA's 'all-causes delay to air transport in Europe' monthly reports), is negative (ρ = −0.093, r² = 0.062, p-value = 0.391). Although not statistically significant, this result seems to confirm that months with more delay propagation are reflected in more strongly connected networks.

As a second way of validating the results, we here consider a null model in which causality links between any two airports A and B are calculated using two different months. As the air transport system mostly resets every night, delays seldom propagate between consecutive days; any causality test across different months should therefore yield large p-values, with few links being statistically significant. Figure 9 reports the histograms of the −log₁₀ of the p-values of links (left panel), and of the −log₁₀ J of the causality clustering analysis (right panel), for both the real data and the null model. It can be appreciated that results become less statistically significant when considering different months, as expected; and specifically, that the number of significant links drops tenfold (from 4935 to 583, i.e. to less than one link per airport per network) in the case of the null model.

**Figure 9.** Evaluation of results against a null model. (Left and right panels) Respectively report the histograms of the −log₁₀ of the p-values in the causality networks, and of the −log₁₀ J in the causality clustering analysis. Black lines correspond to the real results, while blue lines those obtained using a null model—see main text for details.
Download figure:
Standard image High-resolution image

We finally analyse how stable results are when considering different time series lengths. Specifically, figure 10 reports the evolution of the number of statistically significant links (left panel) and of the average log₁₀ J in the causality clustering analysis (right panel) as a function of the length of the time series used to assess the Granger causality, expressed in number of days. As expected, longer time series yield more links, as the causality test is able to detect weaker interactions. On the negative side, the two metrics do not stabilise with one month on data; in other words, more links could possibly be obtained, were more data available. This highlights the need of more complete data sets, beyond what currently offered by, for instance, the EUROCONTROL's R&D data archive [44]; but also raises a note of caution against studies focusing on very short time intervals, as e.g. analyses of the evolution of causality on a daily basis [40], as the majority of the causality links may be undetectable with current techniques.

6. Discussion

We here presented an analysis of the structure and evolution of the network created by delay propagation in the European air transport, covering from 2015 to 2018 and the 50 largest airports.

As seen in figure 2, the monthly propagation networks have a highly variable structure, reflecting the events that drive the creation and subsequent spreading of delays—as for instance the terrorist attack on Brussels airport on March 2016. Such variability is mainly affecting the global (or macro-scale) structure of the network; nevertheless, the micro-scale structure is notably much more consistent, especially when considering the role of individual airports. As shown in figure 6, many airports maintain their global cluster assignation throughout time; and only three of them changed cluster in a statistically significant way when comparing 2015–2016 with 2017–2018, see figure 8. This may points towards the presence of two opposing forces: a structural one, according to which some airports have a stable propagation role, resulting from their connectivity, traffic volume, procedures, equipment, etc; and the appearance of random events throughout the system. While the former pushes the propagation network towards a fixed state, the latter events can appear at any location and time, thus effectively acting like a random rewiring.

The delay propagation network is dominated by triangular structures, as shown by the high transitivity (see figure 2) and high Z-score of triangular motifs (figure 3), in agreement with what previously found in [39, 40]. The network is also dominated by large airports, which have a higher probability of starting a delay propagation—as seen by the negative correlation between the out-degree and the ranking, figure 4, and the positive correlation between size and time spent in cluster 1, figures 6 and 7. Such correlations are nevertheless lost in the case of the in-degree and of the betweenness, suggesting that the dissipation of delays is a process in which all airports contribute, independently on their size. This is at odd with what reported in other studies; specifically [39], identified a clear negative correlation between degree and size of airports, while [40] reported a positive correlation.

Such discrepancies can be due to many factors, including the use of different data sets with different number of airports; the different geographical area considered in [40], which also implies different prioritisation rules for flights and hence delay mitigation strategies; and the way data are pre-processed, as discussed in [43]. This raises questions about the reproducibility and validation of results; beyond the analyses presented in section 5, this topic will be discussed in the next section.

As a final note, it is worth discussing what operational knowledge can be extracted from the results here presented; or, in other words, how can these results guide the development of strategies aimed at reducing delay propagation? Given that large airports are mostly responsible for starting the propagation process (as reflected in their high out-degree), the simplest solution would be to deploy additional resources there. This is nevertheless not optimal for several reasons. First of all, large airports already operate close to their maximum capacity, and expanding this is usually not a simple process—as illustrated by the case of the construction of a third runway at London Heathrow Airport [63]. Beyond expanding capacities, another solution may involve increasing the efficiency of their usage. Yet, 30 of the 50 airports here considered are already included in the Airport collaborative decision making program, exchanging relevant information in real-time about aircraft turn-round and pre-departure processes [64–66]; expanding the program to all top-50 airports is expected to bring an increase in en-route capacity, but not necessarily a more efficient use of airport capacity [67, 68]. Many research initiatives have also highlighted the increase in efficiency that can be achieved using machine learning and other data analysis tools, see for instance [34, 69–72]; the implementation path for these approaches is nevertheless not clear. Additionally, this would represent an example of a policy broad in scope, i.e. targeting indiscriminately all large airports, as opposed to pinpointing specific cases and tackling them in a more focused and cost effective way.

Results here presented highlight two different potential solutions. The first one involves finding those airports that have a high centrality, e.g. in terms of out-degree, but also a reduced numbers of operations, and identifying what operational aspect is hindering their performance; this may include Düsseldorf Airport (EDDL) or Geneva Airport (LSGG), which are operating well below their theoretical capacity. The second one includes changing the priority rules of flights, such that aircraft departing from high out-degree airports would be prioritised when landing at a high betweenness airport, in order to break the propagation chain. Yet, the validation and the economic assessment of these strategies go beyond the scope of this work.

7. Conclusions: can functional networks disrupt delay propagation?

Will functional network representations of delay propagation ever bring about a revolution in air transport?

As seen in the analysis here presented, interpreting delays as a form of information being processed throughout the network brings a radical new way of understanding their propagation, and has the potential of suppose a conceptual quantum leap. Yet, and in spite of a promising start with a constant if modest flow of new research works in the area, the analysis of air transport delay propagation through functional networks has a long way ahead, and the need to solve several theoretical and practical problems before being able to generate a tangible impact. We below sketch a roadmap based on the lessons learnt in this work, making parallelisms with the solutions already developed in neuroscience in the last decades when dealing with functional representations of brain activity.

First of all, metrics used in the reconstruction of functional networks are based on different assumptions about the data and the relationships between the system's elements, and therefore yield different (and at times, conflicting) views on the interaction structure. While this is a problem well-known in neuroscience literature [73, 74], little is known about how delay propagation patterns are affected by the choice of the connectivity metric [37, 41]. As shown in [43], even a same causality metric can yield different results according to how missing values are dealt with. It is further possible that a correct evaluation of these propagation structures will require tailored, yet to be developed metrics, on the line of what proposed in [41]. It has further to be noted that the problem of functional network reconstruction from data is still an open one even from a theoretical point of view, and that researchers within statistical physics are still improving our understanding of the process [75–78]. The problem here considered is further strongly connected with other research topics in statistical physics, like the dynamics of coupled oscillators with and without delays [79–83]. A stronger connection between both fields will therefore be essential.

Secondly, once delay propagation networks are reconstructed, the next logical step is the extraction of topological metrics representing their structure. As only natural, metrics initially used were those standard in network theory, as e.g. transitivity or modularity—see e.g. [39, 40] and section 2.3. Specific topological metrics may nevertheless be needed to describe domain-specific structures, as e.g. is the case in neuroscience for the cost efficiency [84] or the leverage centrality [85].

As a third point, there is a need of validation of the obtained results, a process that can in turn be used to validate the selection of both the connectivity and topological metrics. In this regard, air transport is a complex domain. On one hand, a validation based on changing the state of the system and observing the resulting differences in the evolution is impractical, due to the high costs that this would entail, both economic and in terms of mobility. On the other hand, the system seldom experiences large-scale disruptive events that may be used as different conditions, akin to pathologies in neuroscience [13]. The remaining alternative is possibly the creation of synthetic models, able to generate time series representing realistic dynamics of airports and aircraft, and whose parameters can be tuned to recreate different propagation patterns. While models and toy models are common in air transport [25, 86, 87], one tailored to network analysis has not hitherto been proposed.

As a final step, delays propagation analyses cannot remain restrained within the academic world, but should instead be used to guide and evaluate policies aimed at improving the system. This will require the inclusion of complementary aspects, as e.g. safety and cost analyses, which are usually not considered in more theoretical works; and, in turn, the participation of multiple stakeholders, from network managers to airlines.

Acknowledgments

This project has received funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (Grant Agreement No. 851255). Authors acknowledge the Spanish State Research Agency, through the Severo Ochoa and María de Maeztu Program for Centers and Units of Excellence in R&D (MDM-2017-0711).

Data availability statement

The data that support the findings of this study are openly available at the following URL/DOI: https://eurocontrol.int/dashboard/rnd-data-archive.

Appendix A.: Airport data

See tables A1 and A2

Table A1. List of the 50 European airports considered in this study, ranked according to their number of passengers in 2015. The third column reports the airport ICAO code, here used to designate airports in figures. The three rightmost columns respectively report the total number of landings used in this study, and the maximum and minimum number of landings detected in one day.

Rank	Airport name	ICAO code	Total landings	Max. daily landings	Min. daily landings
1	London Heathrow Airport	EGLL	314 256	696	589
2	Paris Charles de Gaulle Airport	LFPG	317 427	768	608
3	Amsterdam Airport Schiphol	EHAM	322 436	788	574
4	Frankfurt Airport	EDDF	315 483	782	411
5	Adolfo Suárez Madrid-Barajas Airport	LEMD	255 238	632	455
6	Josep Tarradellas Barcelona-El Prat Airport	LEBL	207 449	554	323
7	Munich Airport	EDDM	261 189	656	358
8	Gatwick Airport	EGKK	187 035	477	357
9	Leonardo da Vinci-Fiumicino Airport	LIRF	204 317	527	352
10	Paris Orly Airport	LFPO	156 009	409	276
11	Dublin Airport	EIDW	142 772	389	252
12	Zurich Airport	LSZH	167 403	404	289
13	Copenhagen Airport, Kastrup	EKCH	173 942	457	160
14	Palma de Mallorca Airport	LEPA	128 784	539	118
15	Humberto Delgado Airport	LPPT	127 311	337	265
16	Oslo Airport	ENGM	163 165	429	143
17	Manchester Airport	EGCC	127 424	357	208
18	London Stansted Airport	EGSS	117 152	327	182
19	Vienna International Airport	LOWW	160290	422	247
20	Stockholm Arlanda Airport	ESSA	159 099	428	143
21	Brussels Airport	EBBR	148 967	392	216
22	Milan Malpensa Airport	LIMC	116 939	328	239
23	Düsseldorf Airport	EDDL	141 298	370	200
24	Athens International Airport Eleftherios Venizelos	LGAV	122 253	376	211
25	Berlin Tegel 'Otto Lilienthal' Airport	EDDT	119 595	314	204
26	Málaga Airport	LEMG	79 231	254	107
27	Warsaw Chopin Airport	EPWA	105 709	301	190
28	Geneva Airport	LSGG	114 486	281	195
29	Hamburg Airport	EDDH	97 619	256	136
30	Václav Havel Airport Prague	LKPR	91 072	273	145
31	London Luton Airport	EGGW	79 725	214	134
32	Budapest Ferenc Liszt International Airport	LHBP	65 909	187	112
33	Edinburgh Airport	EGPH	80 527	225	131
34	Alicante-Elche Miguel Hernández Airport	LEAL	57 158	179	70
35	Nice Côte d'Azur Airport	LFMN	85 184	275	126
36	Henri Coandă International Airport	LROP	72 308	206	99
37	Cologne Bonn Airport	EDDK	85 916	251	127
38	Orio al Serio International Airport	LIME	54 154	146	75
39	Boryspil International Airport	UKBB	54 361	184	101
40	Birmingham Airport	EGBB	71 835	209	91
41	Francisco Sá Carneiro Airport	LPPR	54 287	153	109
42	Stuttgart Airport	EDDS	75 102	221	94
43	Venice Marco Polo Airport	LIPZ	58 667	187	99
44	Lyon-Saint Exupéry Airport	LFLL	74 347	198	111
45	Catania-Fontanarossa Airport	LICC	42 176	137	70
46	Naples International Airport	LIRN	43 760	148	69
47	Glasgow Airport	EGPF	58 459	165	86
48	Toulouse-Blagnac Airport	LFBO	60 716	169	87
49	Marseille Provence Airport	LFML	61 314	179	102
50	Milan Linate Airport	LIML	74 150	196	115

Table A2. List of the ICAO code of the 10 most central airports, according to the average ranking yielded by the out-degree, in-degree and betweenness centrality.

Out-degree		In-degree		Betweenness
ICAO code	Ranking	ICAO code	Ranking	ICAO code	Ranking
LFPO	9.44	LEMG	12.38	LFPO	11.5
LFPG	10.75	LIRN	14.31	EDDM	14.25
EDDF	13.69	EDDT	14.38	EDDT	14.62
EGKK	13.75	EDDM	16.12	LEMG	15.06
EDDL	14.62	LFPO	17.44	EDDL	15.06
EDDM	14.88	LSZH	18.44	LSZH	15.94
LSGG	16.75	LIPZ	18.5	EDDF	17
EDDT	17.75	EGCC	19.56	LFBO	17.94
LSZH	17.81	EGGW	19.94	LIRN	18.81
EDDH	18.19	EDDH	19.94	LSGG	19.38

Air delay propagation patterns in Europe from 2015 to 2018: an information processing perspective

Article metrics

Submit

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

2. Data and methods

2.1. Traffic and delay data

2.2. Detecting delay propagation: the Granger causality metric

2.3. Propagation networks reconstruction and analysis

2.4. Detecting propagation roles

3. Structure and evolution of the delay propagation network

4. Global delay propagation roles

5. Results validation and stability

6. Discussion

7. Conclusions: can functional networks disrupt delay propagation?

Acknowledgments

Data availability statement

Appendix A.: Airport data

Air delay propagation patterns in Europe from 2015 to 2018: an information processing perspective

Article metrics

Submit

Share this article

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

2. Data and methods

2.1. Traffic and delay data

2.2. Detecting delay propagation: the Granger causality metric

2.3. Propagation networks reconstruction and analysis

2.4. Detecting propagation roles

3. Structure and evolution of the delay propagation network

4. Global delay propagation roles

5. Results validation and stability

6. Discussion

7. Conclusions: can functional networks disrupt delay propagation?

Acknowledgments

Data availability statement

Appendix A.: Airport data