A Large Deviation Theory-based Analysis of Heat Waves and Cold Spells in a Simplified Model of the General Circulation of the Atmosphere

We study temporally persistent and spatially extended extreme events of temperature anomalies, i.e. heat waves and cold spells, using large deviation theory. To this end, we consider a simplified yet Earth-like general circulation model of the atmosphere and numerically estimate large deviation rate functions of near-surface temperature in the mid-latitudes. We find that, after a re-normalisation based on the integrated auto-correlation, the rate function one obtains at a given latitude by looking, locally in space, at long time averages agrees with the one obtained, instead, by looking, locally in time, at large spatial averages along the latitude. This is a result of the scale symmetry of the spatio-temporal turbulence and of the fact that advection is primarily zonal. This agreement hints at the universality of large deviations of the temperature field. Furthermore, we discover that the obtained rate function is able to describe spatially extended and temporally persistent heat waves or cold spells, if we consider temporal averages of spatial averages over intermediate spatial scales. Finally, we find that large deviations are relatively more likely to occur when looking at these spatial averages performed over intermediate scales, thus pointing to the existence of weather patterns associated with the low-frequency variability of the atmosphere. Extreme value theory is used to benchmark our results.


Introduction and motivation
The typical way to formalise the analysis of extremes for a stochastic variable X revolves around looking at the tail of the probability distribution of X and identifying extremes as very large (or very small) events with long return times. This point of view-as discussed below-is mathematically very powerful, but in the usual setting is not well suited to capturing the temporal persistence of extreme events, such as stationary anomalous synoptic situations-so-called blocking events; see the recent review by Tibaldi and Molteni (2018).

A mathematical framework for climatic extremes
Extreme value theory. A robust mathematical framework for the study of extreme values is provided by extreme value theory. One way to construct a theory of extremes, according to the procedure of block maxima, can be summarised as follows. One considers a sequence of realisations of independent and identically distributed variables $X_1, X_2, \ldots$ and takes the maximum $M_m$ over m such variables (Fisher and Tippett 1928, Gnedenko 1943). Alternatively, extremes can be constructed according to the so-called peak over threshold method by considering the same sequence of variables as before, and selecting the values exceeding a given threshold u (Balkema and de Haan 1974, Pickands 1975). Both methods are formulated in the form of limit laws, and rely on the convergence in distribution of the selected extreme values to one limiting family of distributions for a vast class of parent distributions, as one considers more and more extreme levels, i.e. larger block sizes m and thresholds u, respectively. The condition of independence of the stochastic variables can be relaxed in order to include the case of weakly correlated variables (Leadbetter et al 1983), and can be formulated in such a way as to allow the establishment of extreme value laws for observables of chaotic dynamical systems.
The limiting family of distributions is the generalised extreme value distribution in the case of the block maxima method, and the generalised Pareto distribution in the case of the peak over threshold approach. The Pickands-Balkema-de Haan theorem guarantees that, in the limit, there is a one-to-one correspondence between the generalised extreme value and generalised Pareto limiting distributions for a given dataset, even if, when finite data are considered, the two approaches select different candidates for extremes (Balkema and de Haan 1974, Pickands 1975). If the limiting generalised extreme value or generalised Pareto distributions exist and can be estimated reliably, one can calculate the probability of occurrence of events that are more extreme than any observed event. In other words, reliable estimates of return periods for time ranges longer than what is actually observed can be constructed. This indicates that, if the limit law applies, predictive power (in a statistical sense) emerges for events with very low probability of occurrence. Note that, while universality emerges in the limit, the speed at which the asymptotic properties are realised is process-specific, i.e. not universal (Gálfi et al 2017).
As clearly emerges from the discussion above, a statistical analysis based on extreme value theory is in principle extremely useful exactly for those stakeholders that need to plan for a long time ahead, and has in fact long been applied in areas such as finance (Embrechts et al 1997), engineering (Castillo 1988), and hydrology (Katz et al 2002). Somewhat surprisingly, while examples of application can obviously be found, the methods of extreme value theory are still not the mainstream approach for studying very intense events in climate studies, where it is more popular to use empirical methods based on the analysis of high (or low) percentiles of the probability distribution of the variable of interest. In fact, it is usually assumed that the theory is too data-hungry to be effectively applied to most available climatic time series (IPCC 2012). Nonetheless, it has recently been shown that the theory of extreme values can be rigorously and robustly applied also in the case of relatively short time series of a few tens of years, see e.g. Zahid et al (2017).
We have mentioned above the problem of persistence. One can analyse persistent events in two general ways: first, by treating them as a concatenation of successive extreme events and studying the properties of clusters of extremes (Ferro and Segers 2003, Segers 2005), or, second, by looking at PDFs of time-averaged observables. In this study we follow the second route. Intuitively, if we look at the PDF of finite-size averages of an observable, we expect the tails of the distribution to be mainly populated by averages coming from persistent extremes. A rationale for this is that the averaging window acts like a low-pass filter on the length of the considered persistent events, leading to a connection between extremes of averages and persistent events of a certain length (greater than or approximately equal to the chosen averaging window). This will roughly be, in fact, the scenario we explore below. However, the link between persistence and extremes of finite-size averages does not always hold: in the case of heavy-tailed random variables, for example, the extremes of averages are dominated by a single very large extreme event within the averaging window (Mikosch and Nagaev 1998). We remark that, generally, the methods of extreme value theory can also be applied in the same way to study extremes of averaged observables. However, the averaging process reduces the amount of available data, so that these methods can become more difficult to apply as the averaging window length is increased.

Large deviation theory.
A powerful mathematical framework describing the properties of PDFs of averaged observables is provided by large deviation theory (LDT), introduced by Cramér (1938) and further developed by other mathematicians, like Donsker and Varadhan (1975a, 1975b, 1976, 1983), Gärtner (1977), and Ellis (1984). The central result of the theory concerns the probability of averaged random variables $A_n = \frac{1}{n}\sum_{i=1}^{n} X_i$: for $n \to \infty$ the probability of averages decays exponentially with n, $p(A_n = a) \approx e^{-nI(a)}$. This is called a large deviation principle. The speed of decay is described by the so-called rate function $I(a) \geq 0$. The probability $p(A_n = a)$ decays everywhere with increasing n, except where $I(a) = 0$, so that $e^{-nI(a)} = 1$. For independent and identically distributed random variables, one has $\mathbb{E}[A_n] = a^*$, where $a^*$ is such that $I(a^*) = 0$. If the rate function exists, one can estimate the probability of averages for every n. Similarly to extreme value theory, if the limit law applies, we gain predictive power, with the difference that in this case it is directed towards averages with increasing n. This means that one no longer has to deal with the problem of the decreasing amount of data as n increases. The theory of large deviations is used very extensively in physics, mostly in the context of thermodynamics and statistical mechanics; see the review by Touchette (2009).
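As a simple numerical illustration of this scaling (a minimal sketch of our own, not taken from the paper, using i.i.d. Gaussian draws, for which the exact rate function is $I(a) = a^2/2$), one can estimate $I_n(a) = -\frac{1}{n}\ln p(A_n = a)$ from histograms of block means and watch the curves collapse as n grows:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=10_000_000)  # i.i.d. N(0, 1) draws; exact I(a) = a**2 / 2

for n in (10, 50, 250):
    # block means A_n and the empirical rate function I_n(a) = -(1/n) log p(A_n = a)
    a_n = x[: (len(x) // n) * n].reshape(-1, n).mean(axis=1)
    hist, edges = np.histogram(a_n, bins=60, density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    keep = hist > 0
    i_n = -np.log(hist[keep]) / n
    i_n -= i_n.min()  # shift the minimum to 0, as done for the figures in the paper
    # comparing i_n against centres**2 / 2 shows the agreement improving with n
    print(n, np.round(centres[keep][np.argmin(i_n)], 3))  # the minimum sits near a* = 0
```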
While techniques of LDT have recently been applied in the context of geophysical flows (see e.g. Bouchet and Venaille (2012), Bouchet et al (2014), and Herbert (2015)), they have so far been used only sporadically in climate studies, despite the fact that they can be useful whenever the connection between macroscopic or long-term observables and microscopic or instantaneous observables is important, and one is interested in persistent and/or extended fluctuations of a climatic field.
One area of climate modelling where techniques based on large deviations are just beginning to be applied is the sampling of rare events. Rare event computation techniques based on elements of LDT have been developed with the aim of producing reliable statistics of specific rare events of a given model, as an alternative to long direct numerical simulations (Giardina et al 2016, Wouters and Bouchet 2016, Lestang et al 2018). Ragone et al (2017) describe how model trajectories can be selected, based on a rare event algorithm, by keeping an ensemble realisation of the system in states that are preferentially close to those leading to heat waves. Therefore, one can exponentially oversample events that have ultra-long return periods, and thus construct richer statistics of heat waves than one would obtain by standard Monte Carlo techniques. The described method also provides the possibility of investigating dynamical properties of the system state (like global circulation patterns and jet stream position) supporting the occurrence of the studied extremes (heat waves).

This paper
In this work we adopt LDT to analyse the properties of temporally and/or spatially persistent surface temperature extremes-heat waves or cold spells-generated through simulations performed with the Portable University Model of the Atmosphere (PUMA) (Lunkeit et al 1998, Fraedrich et al 2005b). We investigate temperature averages computed in time and/or in space, the spatial averaging being performed along the zonal direction for reasons of symmetry. We point out that this is the first study to analyse persistent climatic events based on this simple application of LDT. We perform nonequilibrium steady-state model simulations using idealised conditions, which cannot be directly related to realistic atmospheric states. Thus, the endeavour in this work is to test the introduced methodology, and to understand its potential, by applying these methods of statistical mechanics to the atmosphere, rather than analysing real atmospheric conditions. However, this work should also be seen as a first step in the direction of applying LDT to more realistic climate simulations and to observational data. At this stage, we do not investigate the dynamical processes leading to the heat waves and cold spells, but rather try to construct their asymptotic statistical properties.
PUMA-details given in section 3-describes with a good level of precision the dynamics of the three-dimensional atmosphere as an out-of-equilibrium forced-dissipative system. We analyse the properties of the steady state achieved as a result of time-independent forcing after transient dynamics have been discarded. For a wide range of parameter values, PUMA features high-dimensional chaotic dynamics (De Cruz et al 2018). By considering the connection between the averaged values and persistent events on suitably defined scales (as explained above), large deviations of temperature can possibly be related to persistent extreme events of temperature, i.e. heat waves or cold spells.
Following the discussion above on the phenomenology of synoptic disturbances, we expect to find a link between spatially extended and temporally persistent events. In order to achieve a large deviation when considering spatial averages in a turbulent system, we need the occurrence of a spatially extended structure of length, say, L. In a system possessing a characteristic velocity scale U, one expects such a structure to persist for a typical time of the order of τ = L/U (see the worked example at the end of this paragraph). In this work, we explore the connection between temporal and spatial large deviations, and we also analyse spatio-temporal large deviations. We seek answers to two main questions: 1. How well does LDT describe temperature fluctuations that are persistent in space and/or in time in PUMA? 2. What is the link between temporal, spatial, and spatio-temporal large deviations?
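As an order-of-magnitude illustration (the numbers below are our own typical synoptic-scale assumptions, not values from the paper), a structure of length $L \approx 1000$ km advected at $U \approx 10\ \mathrm{m\,s^{-1}}$ persists for

$$\tau = \frac{L}{U} \approx \frac{10^{6}\,\mathrm{m}}{10\,\mathrm{m\,s^{-1}}} = 10^{5}\,\mathrm{s} \approx 1.2\,\mathrm{d}.$$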
These questions are potentially relevant because, if we find numerical evidence that the large deviation limit does hold in the case of our simulations, there is a good chance of being able to calculate the probability of occurrence of heat waves and cold spells that are arbitrarily long in time and/or extended in space (within the limits allowed by the geometry of the Earth, as seen later). We point out that the possibility of establishing large deviation laws in geophysical systems is a non-trivial matter, as a result of the presence of temporal and spatial correlations on multiple scales. The strength of these correlations is crucial for the practical applicability of the theory given a finite amount of data. We remark that, when considering coupled atmospheric and oceanic dynamics, finding large deviation laws can become a difficult task. Examining dynamical indicators, De Cruz et al (2018) could not detect large deviation laws in the case of finite-time Lyapunov exponents in a quasi-geostrophic coupled ocean-atmosphere model. Vannitsem and Lucarini (2016) analysed the large deviations of finite-time Lyapunov exponents as well, in a low-order version of the above-mentioned coupled model, and found a large deviation principle only in the case of non-zero Lyapunov exponents, whereas the convergence was considerably slower or even absent in the case of near-zero Lyapunov exponents.
Our model does not feature the presence of slow oceanic time scales and, therefore, provides a simpler setting to test our ideas. If we find a link between temporal and spatial large deviations, we can deduce the probability of spatial (or spatio-temporal) averages from that of temporal averages and vice versa. This can be very useful in applications, when for example only temporal or only spatial series are available. In order to test the quality of the predictions of the return time based on LDT, we compare the results with what can be obtained using extreme value theory (we use the peak over threshold method).
The structure of the paper is as follows. In section 2, we provide the theoretical formulation of LDT and some elements of extreme value theory. In section 3 we give a description of the PUMA model and details of the numerical simulations performed for this paper. We present our results in section 4. Here, we first focus on the link between temporal and spatial large deviations, and then we additionally consider the case of spatio-temporal large deviations. We test the correctness and applicability of our results by computing return periods of extremes of temperature averages and comparing them with the empirical data and with return periods obtained based on the peak over threshold method. Additionally, in order to assess the robustness and applicability of the proposed approach, we test how our findings related to return periods change when considering shorter time series for estimating the large deviation rate functions. Section 5 concludes the paper, containing a summary and discussion of our results and ideas for future investigations.

Constructing the rate functions describing the large deviations
The large deviation theoretical framework can be formulated on three different levels, corresponding to the complexity of the statistical description of the dynamical system. These are, as described by Oono (1989), based on: sample means of observables (level-1), probability distributions on the state space of observables (level-2), and probability distributions on the path or history space, i.e. the entire set of possible orbits or histories of the system (level-3). The description below follows the level-1 approach, according to the scientific purpose of this paper, and is mostly based on the works of Touchette (2009) and Oono (1989). We do not pursue a rigorous mathematical formulation here; our aim is rather to recapitulate the main concepts and results, and to introduce our notation.
We say that the random variable $A_n = \frac{1}{n}\sum_{i=1}^{n} X_i$, where the $X_i$ are identically distributed random variables, satisfies a large deviation principle if the limit

$$I(a) = -\lim_{n\to\infty} \frac{1}{n} \ln p(A_n = a) \qquad (1)$$

exists. The probability density $p(A_n = a)$ decays exponentially with n for every value of a except the ones for which $I(a) = 0$, where $\lim_{n\to\infty} p(A_n = a^*) = 1$, with $a^* = \mathbb{E}[A_n]$. $I(a) \geq 0$ is the so-called rate function, representing the rate of this exponential decay of probabilities. Whenever the limit (1) holds and I(a) has a unique global minimum, $A_n$ converges in probability to its mean $a^*$ and obeys the law of large numbers. If, additionally, I(a) is quadratic (i.e. twice differentiable) around $a^*$, the central limit theorem holds, meaning that small fluctuations around the mean are normally distributed. The expression 'small fluctuations' is very important here, because large fluctuations around the mean are not necessarily normally distributed. Since the rate function describes both small and large deviations, LDT can be considered as a generalisation of the central limit theorem. Now let us consider, instead of random variables, observables produced by a deterministic dynamical system. If the system is Axiom A, all of its observables obey a large deviation principle (Eckmann and Ruelle 1985). If we consider a high-dimensional chaotic system, by invoking the chaotic hypothesis introduced by Gallavotti and Cohen (1995), one can expect to find large deviation laws even in systems which are not Axiom A.
The dynamical nature of out-of-equilibrium steady-state systems requires, however, a slight modification of our theoretical approach, which mainly implies that time has to be considered in the formulation of the large deviation principle, replacing the parameter n. Due to temporal correlations in these systems, the computation of the rate function requires level-2 or level-3 theory. This has been done for Markov chains and random variables with a specific form of dependence, and involves mostly the computation of transition matrices or joint PDFs (den Hollander 2000, Touchette 2009). In the case of non-Markovian processes and high-dimensional systems, the computation of analytical rate functions is a hopeless endeavour. Thus, in this work, we adopt another (very simple) strategy to deal with temporal correlations. In the case of weakly correlated observables (i.e. $X_j$ and $X_l$ have an exponentially decreasing correlation if $|j - l|$ is large enough), one can take advantage of the fact that for large enough n the averages $A_n$ become almost uncorrelated. This represents the basis for the block averaging method (Rohwer et al 2015). We transform the variables $X_i$ into block-averaged variables $Y_j = \frac{1}{b}\sum_{i=(j-1)b+1}^{jb} X_i$, where b represents the size of the averaging block, i.e. b = n/k with k the number of blocks. In the case that the $Y_j$ are almost independent and identically distributed (ergodic Markov chain), a large deviation principle can be obtained for $A_n = \frac{1}{k}\sum_{j=1}^{k} Y_j$. Intuitively, one can argue that b has to be at least so large that $X_i$ and $X_{i+b}$ are almost uncorrelated, i.e. $b \gtrsim \rho$, where ρ is a metric of persistence expressed in terms of the amount of successive correlated data. One usually quantifies persistence in terms of the auto-correlation function. Considering our scientific goal, which is the study of probabilities of averages, it makes sense to choose the integrated auto-correlation as a general measure of persistence in time and space, since this quantity plays a central role in the central limit theorem for Markov chains, as described below.
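The block averaging transform itself is straightforward; the following is a minimal sketch of it (our own construction, with an AR(1) surrogate series standing in for a weakly correlated observable):

```python
import numpy as np
from scipy.signal import lfilter

def block_averages(x: np.ndarray, b: int) -> np.ndarray:
    """Transform a series X_1..X_n into k = n // b block means Y_1..Y_k.
    For b well above the decorrelation scale rho, the Y_j are almost
    uncorrelated, which is the premise of the block averaging method."""
    k = len(x) // b
    return x[: k * b].reshape(k, b).mean(axis=1)

rng = np.random.default_rng(1)
phi = 0.8  # AR(1) coefficient; decorrelation scale rho ~ 1 / (1 - phi) = 5
x = lfilter([1.0], [1.0, -phi], rng.normal(size=1_000_000))  # x[i] = phi*x[i-1] + noise

y = block_averages(x, b=100)  # b >> rho
print(np.corrcoef(y[:-1], y[1:])[0, 1])  # lag-1 correlation of the Y_j: near 0
```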
According to a formulation of the central limit theorem in the case of dependent variables based on Billingsley (1995), suppose that $X_1, X_2, \ldots$ is a stationary Markov chain with $\mathbb{E}[X_n] = 0$ satisfying appropriate mixing conditions; then the variance of the sample mean $A_n$ satisfies

$$\lim_{n\to\infty} n \operatorname{Var}(A_n) = C(0)\left(1 + 2\sum_{k=1}^{\infty} c(k)\right), \qquad (3)$$

where $c(k) = C(k)/C(0)$ is the auto-correlation, and C(k) denotes the auto-covariance at lag k. Equation (3) shows that the rescaled variance of the sample mean of the Markov chain converges to the variance of $X_1$ times the integrated auto-correlation.
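A quick numerical check of this scaling (again a sketch of ours, using an AR(1) chain, for which the two-sided integrated auto-correlation $1 + 2\sum_k c(k)$ equals $(1+\phi)/(1-\phi)$):

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(2)
phi = 0.8
x = lfilter([1.0], [1.0, -phi], rng.normal(size=2_000_000))  # AR(1) chain

xc = x - x.mean()
c0 = np.dot(xc, xc) / len(xc)  # Var(X), close to 1 / (1 - phi**2)
tau = 1.0 + 2.0 * sum(np.dot(xc[:-k], xc[k:]) / len(xc) / c0 for k in range(1, 200))

n = 1000
a_n = xc[: (len(xc) // n) * n].reshape(-1, n).mean(axis=1)  # sample means A_n
# both numbers approach Var(X) * (1 + phi)/(1 - phi) = 25 for phi = 0.8
print(n * a_n.var(), c0 * tau)
```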
It is well known that the central limit theorem is violated when large extreme values dominate the fluctuations around the mean. In these cases, the probability of sums converges to a more general limit instead of the Gaussian distribution. This limit is represented by the class of infinitely divisible distributions, including Lévy alpha-stable distributions (West et al 2003). As a consequence of diverging second (or even first) moments of the distribution of the stochastic variable of reference, the probability of sums decays sub-exponentially and the rate function is trivially 0 (Touchette 2009). Large deviation results can still be obtained in many cases; however, they are dominated by the largest values in the sample instead of the mean, as already mentioned in the introduction (Mikosch and Nagaev 1998). These conditions are relevant for some variables of interest in (geo)physical fluid dynamics. It has been shown that for certain variables of turbulent flows the central limit theorem does not hold. For example, velocity differences (or gradients) between two points in space often develop long tails, as an effect of long-lived strong vortices near the dissipative range of scales (Biferale 1993, Jiménez 1996, Jiménez 2000). Several climatic variables have also been found to exhibit increasing variability at low frequencies: atmospheric surface variables in the tropics (due to the effect of pulse-like convective events), or sea surface temperature in some regions (Blender et al 2008, Fraedrich et al 2009). It is expected that in these cases the long-term memory prevents convergence to the behaviour predicted by the central limit theorem, at least on the relevant finite scales.

Extreme value theory: peak over threshold approach
A straightforward way to investigate the extremes of averaged quantities is, clearly, through the use of extreme value theory. Note that, despite such an approach being unfeasible or impractical in many applications-because the procedure of averaging dramatically reduces the size of the dataset-our numerical simulations are long enough to allow for a reliable implementation also in the case of averages. We briefly summarise below the main ideas of extreme value theory.
Let us consider $Z_m = \max\{X_1, \ldots, X_m\}$, where $X_1, \ldots, X_m$ is a sequence of independent and identically distributed random variables with common distribution function F(x). The Fisher-Tippett-Gnedenko theorem (Fisher and Tippett 1928, Gnedenko 1943) states that the distribution of properly normalised block maxima $Z_m$ converges under certain conditions, for $m \to \infty$, to the so-called generalised extreme value distribution $G(z; \mu, \sigma, \xi)$, with three parameters: the location parameter µ, the scale parameter σ, and the shape parameter ξ:

$$G(z; \mu, \sigma, \xi) = \exp\left\{ -\left[ 1 + \xi\left(\frac{z - \mu}{\sigma}\right) \right]^{-1/\xi} \right\},$$

where $-\infty < \mu < \infty$, $\sigma > 0$, and $1 + \xi(z - \mu)/\sigma > 0$ for $\xi \neq 0$, while for $\xi = 0$ one takes the limiting form $G(z) = \exp\{-\exp[-(z - \mu)/\sigma]\}$ with $-\infty < z < \infty$ (Coles 2001). Pickands (1975) reformulated the Fisher-Tippett-Gnedenko theorem based on the conditional probability of values exceeding a high threshold u and reaching the upper end point of the distribution of X, given that X > u. Under the same conditions such that the distribution of $Z_m$ converges to the generalised extreme value distribution, the exceedances $y = X - u$ are asymptotically distributed according to the generalised Pareto distribution family (Coles 2001)

$$H(y) = 1 - \left(1 + \frac{\tilde{\xi} y}{\tilde{\sigma}}\right)^{-1/\tilde{\xi}},$$

where $1 + \tilde{\xi} y/\tilde{\sigma} > 0$ for $\tilde{\xi} \neq 0$, $y > 0$, and $\tilde{\sigma} > 0$. H(y) has two parameters: the scale parameter $\tilde{\sigma}$ and the shape parameter $\tilde{\xi}$. The shape parameter $\tilde{\xi}$ describes the decay of probabilities in the tail of the distribution, and determines to which one of the three possible types of generalised Pareto distributions H(y) belongs. If $\tilde{\xi} = 0$, the tail decay is exponential; if $\tilde{\xi} > 0$, the tail decay is polynomial; and if $\tilde{\xi} < 0$ the distribution is bounded, i.e. the extremes are limited from above (Balkema and de Haan 1974, Pickands 1975). The parameters of the generalised extreme value and generalised Pareto distributions are related as follows: $\tilde{\xi} = \xi$ and $\tilde{\sigma} = \sigma + \xi(u - \mu)$ (Coles 2001). This implies that the two approaches (block maxima and peak over threshold) for investigating extremes are asymptotically equivalent. Classical extreme value theory has been extended to deal with weakly correlated random variables (Leadbetter et al 1983), and adapted to analyse extremes of observables of chaotic dynamical systems. A detailed overview of this research field is provided by Lucarini et al (2016), where it is shown that, if one considers an Axiom A system, extreme values of different classes of observables can be used to infer the properties of the stable and unstable manifolds, including the possibility of estimating the Kaplan-Yorke dimension (Eckmann and Ruelle 1985). Just as discussed above, adopting the chaotic hypothesis (Gallavotti and Cohen 1995), such findings can be expected to apply to more general systems possessing high-dimensional chaos; see a detailed analysis in Bódai (2017) and an accurate investigation in the case of a high-dimensional system (with $O(10^3)$ degrees of freedom) in Gálfi et al (2017). Extreme value theory combined with the analysis of recurrences has proved very useful for providing a new framework for identifying so-called weather patterns in actual climate data and in the outputs of climate models, and for interpreting their specific dynamical properties.
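In practice, the generalised Pareto parameters are estimated from threshold exceedances; a minimal sketch with scipy (our own toy example, with a Gaussian parent distribution standing in for an averaged observable):

```python
import numpy as np
from scipy.stats import genpareto

rng = np.random.default_rng(3)
x = rng.normal(size=1_000_000)   # toy stand-in for an averaged observable
u = np.quantile(x, 0.99)         # a high threshold
y = x[x > u] - u                 # exceedances y = X - u

# scipy's shape parameter c plays the role of xi; the location is fixed at 0
# because the exceedances start at the threshold by construction.
xi, _, sigma = genpareto.fit(y, floc=0)
print(xi, sigma)  # for a Gaussian parent, xi comes out near 0 (Gumbel domain)
```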

Return periods and return levels
We compare the two methods of analysing rare events on a practical level, i.e. based on return periods and return levels. In the case of LDT, we estimate the return period r of events exceeding the value a using the general formula $r = \frac{1}{P(A_n > a)} = \frac{1}{1 - P(A_n \leq a)}$, where $P(A_n \leq a)$ represents the cumulative distribution function of the large deviation law according to the data. In the case of the peak over threshold approach, the expected return levels can be written explicitly in terms of the generalised Pareto parameters, which can be inferred using the usual proven estimation methods, like maximum likelihood estimation (Coles 2001) or L-moments (Hosking 1990). The level $y_r$ that is exceeded on average once every r observations is called the r-observation return level and is the solution of $P(X > y_r) = 1/r$; consequently (Coles 2001)

$$y_r = u + \frac{\sigma}{\xi}\left[ (r \zeta_u)^{\xi} - 1 \right],$$

where $\zeta_u = P(X > u)$ is the probability of exceeding the threshold u. As an effect of serial correlations, the threshold exceedances can be organised in clusters. If an extreme value law does exist at all in this case, it is necessary to introduce the so-called extremal index-the inverse of the limiting mean cluster size-which has to be considered in the estimation of the generalised Pareto parameters, with the exception of the shape parameter (Coles 2001). A widely adopted method to deal with correlated threshold excesses is to apply declustering, which basically aims to identify the maximum excess within each cluster and then to fit the generalised Pareto distribution to the cluster maxima (Leadbetter et al 1989, Ferro and Segers 2003). Here, since the operation of averaging dramatically reduces the effect of serial correlation, the peak over threshold approach can be applied in a straightforward way, similarly to the case of independent and identically distributed random variables.
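Both recipes are easy to put side by side in code; a sketch under our own naming (the Coles (2001) return-level formula for the generalised Pareto fit, and the empirical counterpart of $r = 1/P(A_n > a)$ for the LDT route):

```python
import numpy as np

def gpd_return_level(u: float, sigma: float, xi: float, zeta_u: float, r: float) -> float:
    """r-observation return level y_r = u + (sigma/xi) * ((r * zeta_u)**xi - 1),
    with zeta_u the probability of exceeding the threshold u (Coles 2001)."""
    if abs(xi) < 1e-9:  # xi -> 0 limit: y_r = u + sigma * log(r * zeta_u)
        return u + sigma * np.log(r * zeta_u)
    return u + sigma / xi * ((r * zeta_u) ** xi - 1.0)

def empirical_return_period(a_n: np.ndarray, a: float) -> float:
    """Empirical version of r = 1 / P(A_n > a) from a sample of averages."""
    p = np.mean(a_n > a)
    return np.inf if p == 0 else 1.0 / p
```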

Model description and setup
We perform simulations with PUMA, which is a simplified spectral general circulation model developed at the University of Hamburg. PUMA has been used for the investigation of several atmospheric phenomena, like storm track dynamics or low-frequency variability (Lunkeit et al 1998, Fraedrich et al 2005b), and has even been adapted to extra-terrestrial atmospheres (Grieger et al 2004). A recent study investigates the properties of the Lyapunov spectrum in PUMA, including large deviations of finite-time Lyapunov exponents (De Cruz et al 2018). PUMA is the dry core of the Planet Simulator (PlaSim), which is a climate model of intermediate complexity (Fraedrich et al 2005a, Lucarini et al 2010). In the following, we summarise the model equations and the applied parameterisations. For a more detailed description of the model, please consult Fraedrich et al (2009). As commonly done in atmospheric modelling, the physics of the model are fundamentally described by the primitive equations for the atmosphere, which amount to a modification of the Navier-Stokes equations in a rotating frame of reference where the vertical acceleration of the fluid is constrained to be small compared to gravity (Klein 2010). These equations provide a good representation of the dynamics of the atmosphere for horizontal spatial scales larger than a few tens of kilometres (Holton 2004). Compared to a full atmospheric general circulation model, moist processes are omitted, and simple parameterisations are used to account for the effects of friction (Rayleigh friction), diabatic heating (Newtonian cooling), and diffusion. The Newtonian cooling and Rayleigh friction terms are analogous to those proposed by Held and Suarez (1994) for the comparison of dynamical cores of general circulation models. The model equations allow for the conservation of momentum, mass, and energy. The prognostic equations for the absolute vorticity, given by the sum of the relative vorticity ζ and the planetary vorticity f, the horizontal divergence of the velocity D, the temperature T, and the surface pressure $p_s$ are written using spherical coordinates and the vertical σ-system; the variables and parameters used in equations (7)-(10) are listed in table 1. The horizontal representation of the prognostic model variables is given by a series of spherical harmonics, which are integrated in time by a semi-implicit time-differencing scheme (Hoskins and Simmons 1975). The linear contributions in the prognostic equations are computed in spectral space, the non-linear contributions in grid point space. The horizontal resolution is defined by triangular truncation. The vertical discretisation is based on finite differences on equally spaced σ-levels. The vertical velocity is set to 0 at the upper (σ = 0) and lower (σ = 1) boundaries.
A Rayleigh damping of the horizontal velocities with time scale $\tau_F$ accounts for the effect of boundary layer friction in the lowest levels: $\tau_F = 0.6$ d at σ = 0.95 (the vertical level nearest to the surface), and $\tau_F = 1.65$ d at σ = 0.85. For higher levels no friction is considered, i.e. $\tau_F = \infty$. The effect of non-resolved processes on the energy and enstrophy cascades is represented by hyperdiffusion ($\sim \nabla^{2h}$). The hyperdiffusion coefficient K is chosen such that it provides maximal damping of the shortest waves, and has no effect on the mean state (wave number 0). The integer exponent h = 4 leads to an additional damping of short waves. The diffusion time scale for the shortest wave is 1/4 d. The diabatic heating (cooling) is parameterised by a Newtonian cooling term. This forces the relaxation of the model temperature towards a so-called radiative-convective equilibrium state specified by the restoration temperature $T_R$, which depends only on the vertical level and latitude.
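As an illustration of this kind of spectral damping (our own sketch of the standard construction, not PUMA source code; the normalisation is our assumption, chosen to be consistent with the stated 1/4 d damping of the shortest wave), the $\nabla^{2h}$ operator acts on the spherical-harmonic coefficients with a rate that grows steeply with the total wavenumber n:

```python
import numpy as np

h, n_max, tau_shortest = 4, 42, 0.25  # exponent, T42 truncation, damping time in days
n = np.arange(n_max + 1)              # total wavenumbers 0..42

# damping rate per wavenumber, in 1/day; proportional to [n(n+1)]**h,
# scaled so that the rate at n_max equals 1/tau_shortest
rate = (n * (n + 1) / (n_max * (n_max + 1.0))) ** h / tau_shortest

print(rate[0], rate[-1])  # 0.0 (no effect on the mean state) and 4.0 per day
```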
$T_R(\phi)$ describes the meridional form of the restoration temperature, whereas f(σ) accounts for the vertical changes in this meridional profile. Here, $(\Delta T_R)_{NS}$ is the temperature difference between the North and South poles, and $(\Delta T_R)_{EP}$ represents the equator-to-pole temperature difference. The meridional temperature gradient decreases with height in the troposphere, $f(\sigma) = \sin(0.5\,\pi(\sigma - \sigma_{tp})/(1 - \sigma_{tp}))$ for $\sigma \geq \sigma_{tp}$, and vanishes at the tropopause, f(σ) = 0 for $\sigma < \sigma_{tp}$, where $\sigma_{tp}$ is the height of the tropopause. $T_R(\sigma)$ describes the vertical profile of the restoration temperature, with: restoration temperature at the surface, $(T_R)_s = 288$ K; moist adiabatic lapse rate, L = 6.5 K km$^{-1}$; globally constant height of the tropopause, $z_{tp} = 12$ km; geometric height z. The parameter S allows for a smoothing of the temperature profile at the tropopause. In the case of ten vertical levels l, the time scale of the Newtonian cooling $\tau_R$ is 2.5 d in the lowest level at l = 10, and 7.5 d at l = 9. $\tau_R$ continues to increase monotonically with height until the upper three levels, where it is set to 30 d. We run the model in a simple symmetric setting (usually referred to as an aquaplanet), i.e. without orography. We remove the annual and diurnal cycles, and use a forcing that is symmetric with respect to the equator, $(\Delta T_R)_{NS} = 0$. We set the equator-to-pole temperature difference $(\Delta T_R)_{EP}$ to 90 K, thus creating a baroclinically more unstable atmospheric state than in the standard setting with $(\Delta T_R)_{EP} = 70$ K. We run the model with constant forcing in time using a time step of 30 min. The horizontal resolution is T42 (triangular spectral truncation with 42 zonal waves), and the vertical resolution consists of ten levels. The length of the simulations is $10^4$ years, excluding a transient of 5 years, which is discarded in order to consider steady-state conditions. We consider for our analysis the air temperature in the lowest vertical level at 960 hPa, with daily output. The spectral temperature variable is transformed during post-processing into grid point space, consisting of a 65 × 128 equidistant latitude-longitude grid.
Using the same model settings as above, but with a lower equator-to-pole temperature difference, $(\Delta T_R)_{EP} = 60$ K, De Cruz et al (2018) estimated a Kaplan-Yorke dimension $D_{KY}$ of 187 and a number of positive Lyapunov exponents of 68. In this study, $(\Delta T_R)_{EP} = 90$ K, so the model atmosphere is baroclinically substantially more unstable than in the mentioned study. Thus, as a rough estimate, $D_{KY} > 200$ and the number of positive Lyapunov exponents is >80 in our system. Consequently, we expect for this setup a very high-dimensional chaos, which fulfils the chaotic hypothesis, as shown also by the fast decay of the auto-correlations in figures 2(c) and (d) below. As a result, we expect that the outputs of our model can be analysed using extreme value theory and LDT, as discussed above. Nonetheless, it is a priori unclear whether the asymptotic results can be clearly detected at finite size given the length of our numerical integrations. Note that in De Cruz et al (2018) it was shown that the finite-time Lyapunov exponents obey a large deviation law.

Results
Before discussing our main results related to the large deviations of temperature, it is useful to have a general picture of the properties of the simulated temperature field at 960 hPa (i.e. close to the surface). For the analysis of temporal, zonal, and spatio-temporal large deviations, we select three latitudes: 60°, 46°, and 30°. We focus on the mid-latitudes because this is the region of the atmosphere with the strongest turbulence, so we expect that the corresponding observables should behave in agreement with the chaotic hypothesis; see the discussion in Gálfi et al (2017). We remark that the inclusion of moist processes, of more comprehensive parameterisations, and of less idealised boundary conditions would greatly enhance the chaoticity of the dynamics, and higher horizontal and vertical resolutions would lead to substantially stronger turbulence in the tropical belt.
In the considered setting, the two hemispheres have identical statistical properties; additionally, the two hemispheres are weakly coupled, broadly as a result of the chosen boundary conditions, of the lack of a seasonal cycle, and of the conservation law for potential vorticity. Therefore, we can treat the time series coming from the two hemispheres as separate realisations of the same dynamical process. In the following, we first provide a qualitative comparison of the temporal and spatial features of the temperature field, and then we quantify the persistence in time and space based on the integrated auto-correlation. We point out that we perform the analysis in a Eulerian framework, corresponding to our objective of studying persistent temperature extremes from a spatially fixed point of view: this provides the most relevant information for the specific problem-the investigation of persistent temperature anomalies-we have in mind.

Table 1. List of variables and parameters in PUMA, equations (7)-(10).

Figure 2(a) illustrates the temperature field T(x, y, t*) as a function of longitude x and latitude y at one selected time point t*, whereas figure 2(b) represents the temperature field T(x*, y, t) as a function of latitude y and of time t at one selected longitude x*. Qualitatively similar figures would be obtained for different values of t* and of x*, respectively. Note that, to facilitate the comparison between space and time, the x-axis in figure 2(b) is backward in time, in accordance with the prevailing eastward zonal winds (also referred to as westerlies) at mid-latitudes (Holton 2004). Additionally, the range of the x-axis in figure 2(b) is the same as in figure 2(a) once we rescale the time axis according to the scale velocity $U_\tau$ introduced below (computed for 46°), which weights the decay of the correlation in space at a fixed time and in time at a fixed location. Comparing these two figures, we realise that by cutting across time or across longitudes we obtain very similar wavy patterns, which is not surprising since the forcing is invariant in time and along a latitudinal band.
While this result would be trivial when observing a periodic or quasi-periodic signal, we need to consider here that the dynamics of the atmosphere feature a non-trivial mixture of waves, turbulence, and particles (Ghil and Robertson 2002), so that we need to look at this space-time similarity from a statistical point of view. Accordingly, at a given latitude y*, the temporal series T(x*, y*, t) and the zonal series T(x, y*, t*) are sampled from two similarly distributed random processes, given the condition of steady state and the discrete symmetry with respect to translation along latitudes.
The main dierence between T (x * , y * , t) and T (x, y * , t * ) is related to distinct temporal and spatial characteristic scales, i.e. to temporal or spatial correlations. At midlatitudes, cyclones have a typical temporal scale of ≈1 d and a characteristic spatial scale of about 1000 km (Holton 2004). Obviously, these scales are relevant when we try to obtain a large deviation principle, thus it is very important to find an adequate metric to describe them.
We quantify the typical temporal and zonal scales based on the integrated auto-correlation, as explained in section 2. We calculate the auto-correlations of the temporal and zonal series at a selected latitude y*, based on which we later obtain the integrated temporal and zonal auto-correlations. For this, we use 1000 years of our simulation out of a total of 10 000 years, as this proves to be more than enough to reach robust estimates. As described in section 2, the auto-correlation is defined as the ratio between the auto-covariance C(l) at lag l and the variance σ²: $c(l) = C(l)/C(0) = C(l)/\sigma^2$. To obtain better auto-correlation estimates, we calculate the spatio-temporal mean and variance at each y*,

$$\mu(y^*) = \frac{1}{N_t N_x} \sum_{t=1}^{N_t} \sum_{i_x=1}^{N_x} T(x_{i_x}, y^*, t), \qquad \sigma^2(y^*) = \frac{1}{N_t N_x} \sum_{t=1}^{N_t} \sum_{i_x=1}^{N_x} \left[ T(x_{i_x}, y^*, t) - \mu(y^*) \right]^2,$$

and use these estimates for the computation of both the temporal and the zonal auto-correlations, where $N_t = 3.6 \times 10^5$ is the number of considered points in time (daily data), and $N_x = 128$ is the number of grid points in the zonal direction. This is reasonable considering the symmetries in our system in time and along latitudinal circles. The subscripts t and x refer to time and to the zonal dimension, also in what follows.
In the case of the temporal series T(x*, y*, t), we calculate the auto-covariance at one selected longitude x*. This estimate is independent of x*, thus it is unimportant which longitude we choose. The length of the zonal series T(x, y*, t*), however, is too short to obtain reliable auto-correlation estimates: the number of grid points along the zonal dimension is only 128. Together with this restriction related to the size of the Earth, there is another one related to its shape. In fact, we have to reduce the maximum lag to $N_x/2 = 64$ because at larger lags the correlations start to increase again due to the periodicity along a latitudinal circle. To increase the robustness of our estimate, we first calculate the lagged zonal auto-correlation coefficients at each time point and then take the average over time. Figure 2(c) shows the temporal auto-correlation coefficient as a function of the temporal lag in units of days, whereas figure 2(d) illustrates the zonal auto-correlation coefficient as a function of the spatial lag expressed as longitude indexes $i_x = 0, 1, 2, \ldots$. Both the temporal and the spatial auto-correlations decay to zero, meaning that two temperature values which are far away from each other in time or in space are independent for all practical purposes. We finally estimate the integrated temporal and zonal auto-correlations by taking the sum of the auto-correlation coefficients up to the maximum lag $l_t = l_x = 64$:

$$\tau_t = \sum_{l=0}^{l_t} c_t(l), \qquad \tau_x = \sum_{l=0}^{l_x} c_x(l).$$

Note that we use the same temporal and zonal maximum lags for consistency reasons. The temporal integrated auto-correlation can be obtained also for larger maximum lags, but this changes the estimated value only negligibly because the decay to 0 is relatively fast. $\tau_t$ is 1.32 at 60°, 1.05 at 46°, and 1.61 at 30° (in units of time steps, which are equivalent to days), whereas $\tau_x$ is 3.26 at 60°, 3.54 at 46°, and 7.68 at 30° (in units of grid points, which correspond to 391 km at 60°, 732 km at 46°, and 1292 km at 30°). We define $\tau_t$ and $\tau_x$ in a non-dimensional form (i.e. as a number of time units or zonal data points) to facilitate the comparison of temporal and spatial persistence based on the resolution of our data, and because in this form we can use them directly for scaling the rate function, as we show below.
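A compact sketch of these two estimators (our own implementation of the procedure described above; `T_xt` is a hypothetical array of daily temperature at one latitude with shape (time, longitude)):

```python
import numpy as np

def tau_time(T_xt: np.ndarray, max_lag: int = 64) -> float:
    """Integrated temporal auto-correlation at one (arbitrary) longitude,
    using the spatio-temporal mean and variance, as described in the text."""
    mu, var = T_xt.mean(), T_xt.var()
    s = T_xt[:, 0] - mu                      # the estimate is independent of x*
    c = [np.dot(s[: len(s) - l], s[l:]) / len(s) / var for l in range(max_lag + 1)]
    return float(np.sum(c))

def tau_zonal(T_xt: np.ndarray, max_lag: int = 64) -> float:
    """Integrated zonal auto-correlation: lagged coefficients are computed at
    every time point (periodically in longitude, hence max_lag = N_x / 2)
    and then averaged over time."""
    mu, var = T_xt.mean(), T_xt.var()
    s = T_xt - mu
    c = [np.mean(s * np.roll(s, -l, axis=1)) / var for l in range(max_lag + 1)]
    return float(np.sum(c))
```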
We define the scale velocity $U_\tau := \frac{\tau_x \delta_x}{\tau_t \delta_t}$, where $\delta_t$ is the time step of 1 d and $\delta_x$ is the latitude-dependent grid spacing. From a statistical point of view, $U_\tau$ is the ratio between the spatial and the temporal persistence. From a geometrical/dynamical point of view, $U_\tau$ represents the ratio between the typical spatial and temporal scales. Thus, $U_\tau$ is a measure of the anisotropy between space and time. At 60°, $U_\tau = 4.25$ m s$^{-1}$, and at 46°, $U_\tau = 8.47$ m s$^{-1}$. For these latitudes, the scale velocity $U_\tau$ is in good agreement with the mean zonal velocity at 960 hPa, [U], which is 3.6 m s$^{-1}$ at 60° and 6 m s$^{-1}$ at 46°. This is hardly surprising as, to a first approximation, the turbulent structures are advected by the mean flow.
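These numbers are easy to reproduce from the quantities quoted above; a quick check at 46° (the Earth radius value is our assumption):

```python
import numpy as np

R = 6.371e6                                         # Earth radius in m (assumed)
dx = 2 * np.pi * R * np.cos(np.deg2rad(46)) / 128   # zonal grid spacing at 46 deg, m
dt = 86400.0                                        # time step of 1 d in s
U_tau = (3.54 * dx) / (1.05 * dt)                   # tau_x, tau_t from the text
print(U_tau)  # ~8.5 m/s, matching the 8.47 m/s quoted for 46 degrees
```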
The agreement is lost when looking at 30°, the boundary of the mid-latitude baroclinic zone to which the qualitative description given above applies. As we approach the equator, the atmospheric dynamics have a much lower degree of chaoticity with respect to the mid-latitudes, unless we look at convective scales, which are not resolved at all in this model. The spatial persistence is strongly enhanced (see also figures 2(a) and (b)), as a result of the dominance of larger structures associated with the downdraft of the Hadley cell rather than synoptic disturbances associated with mid-latitude weather systems advected by the prevailing westerlies. In this case we find $U_\tau = 14.95$ m s$^{-1}$, while [U] is $-3.4$ m s$^{-1}$, which indicates prevailing easterly flow, a clear signature of tropical dynamics. Figure 2(e) emphasises that the near-surface temperature experiences the largest variance near latitude 46°, as a result of the very strong baroclinic instability associated with mid-latitude weather patterns. This latitude also corresponds to the mean position of the jet, the localised region where the speed of the upper-level winds is at a maximum (Holton 2004).
Before continuing with the description of the temporal and spatial large deviations, we briefly discuss the connection between high values of coarse-grained temperatures and long individual events during which the temperature readings are persistently above the long-term average, discussed already in section 1. Figures 3(a)-(c) show three short temporal series at latitude 46° together with the corresponding series of the coarse-grained quantities, where the averages are computed using block lengths of $20\tau_t$, $10\tau_t$, and $5\tau_t$, respectively. The three short time series have been specifically chosen because they feature a large fluctuation in the coarse-grained quantity. Figures 3(d)-(f) show the same in the case of the zonal fields. The main finding is that, up to moderately long averaging windows of about $10\tau_t$ (or $5\tau_x$ for spatial averages), it is possible to link large fluctuations with individual persistent events. When a coarser graining is considered, using a window of $20\tau_t$ for time averages and $10\tau_x$ or $20\tau_x$ for spatial ones, thus going in the direction of the regime of the large deviations discussed below, we do not have such a one-to-one identification. Instead, large ultra-long fluctuations are related to the occurrence of subsequent moderately long persistent features.

The link between temporal and zonal large deviations
At this point, we turn our attention to the estimation of the temporal and zonal rate functions. For this, we first have to obtain sequences of temporal and zonal averages for increasing lengths of the averaging blocks $n_t$ and $n_x$, for which we use the total length of our simulation of 10 000 years.
The lengths of the temporal averaging blocks are chosen to be multiples of $\tau_t$: $n_t = 5\tau_t, 10\tau_t, \ldots, 40\tau_t$. Similarly, the lengths of the zonal averaging blocks are multiples of $\tau_x$, but in this case the largest possible multiple m is limited by the size and shape of the Earth, as mentioned above: $n_x = 5\tau_x, 10\tau_x, \ldots, m\tau_x$, with m = 20 in the case of latitudes 60° and 46°, and m = 10 in the case of latitude 30°. To increase the number of averaged values for the computation of the temporal rate functions, we lump together the temporal averages from every 25th longitude along a latitudinal circle. Since $\tau_x \ll 25$, these temporal sequences can be treated as independent realisations. In the case of zonal averaging, we take one averaged value in space from every tenth point along the time axis, which we consider to be independent realisations as well. Such an assumption is reasonable because the integrated temporal auto-correlation of the zonal averages is much lower than 10, even for the largest $n_x$ (as shown later in figure 5). We obtain for each value of $n_t$ and $n_x$ estimates of the rate functions, after applying the re-normalising factors given by $1/\tau_t$ or $1/\tau_x$, respectively:

$$\tilde{I}_{n_t}(a) = -\frac{\tau_t}{n_t} \ln p(A_{n_t} = a), \qquad \tilde{I}_{n_x}(a) = -\frac{\tau_x}{n_x} \ln p(A_{n_x} = a),$$

where $p(A_{n_t} = a)$ and $p(A_{n_x} = a)$ represent empirical estimates of the PDFs of the temporally and zonally averaged sequences. Due to the re-normalisation, the logarithm of the probabilities is scaled by $n_t/\tau_t$ or $n_x/\tau_x$, i.e. by the amount of uncorrelated data, instead of by the total amount of data, in an averaging block. Thus, we eliminate the effect of correlations. Figure 4 shows $\tilde{I}_{n_t}$ (panels (a)-(c)) and $\tilde{I}_{n_x}$ (panels (d)-(f)) for every $n_t$ and $n_x$. As a side note, we remark that in every figure below the shown re-normalised rate function estimates are shifted vertically so that their minimum is at 0. In the case of the temporal rate functions, it is clear that for $n_t \geq 20\tau_t$ the estimates $\tilde{I}_{n_t}$ do not change in shape as $n_t$ increases further, meaning that we obtain stable and reliable estimates, i.e. our data provide evidence for a large deviation principle in time. We also notice that the range of $A_{n_t}$ values becomes narrower as $n_t$ increases, as an effect of averaging, which reduces the amount of available data. Thus, we obtain our best estimate at an optimal averaging block length $n_t^*$ which is large enough to allow for the convergence of the rate function estimates, but is at the same time small enough so that the range of $A_{n_t}$ is not too narrow, i.e. $n_t^* = \min(n_t : \tilde{I}_{n_t} \approx I_{n_t})$. We choose the same optimal averaging length for all three latitudes, $n_t^* = 20\tau_t$, although in the case of latitudes 60° and 30°, $\tilde{I}_{n_t = 10\tau_t}$ already seems to be a good estimate of the asymptotic $I_{n_t}$. Comparing the re-normalised rate function estimates at the selected latitudes, we realise that the rate function has a smaller curvature at latitude 46° than at 60° and 30°. This is not a trivial consequence of the larger variability of the system in the middle of the considered domain, as mentioned above and shown in figure 2(e), because we are considering averages of fluctuations here.
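The estimation procedure itself reduces to a few lines; below is a sketch of it in our own notation (AR(1) surrogate data instead of model temperature; for AR(1) the two-sided integrated auto-correlation $(1+\phi)/(1-\phi)$ is known exactly):

```python
import numpy as np
from scipy.signal import lfilter

def renormalised_rate_function(x: np.ndarray, n: int, tau: float, bins: int = 50):
    """Estimate I~_n(a) = -(tau / n) log p(A_n = a): the log-probability is
    scaled by n / tau, the amount of uncorrelated data in an averaging block."""
    a_n = x[: (len(x) // n) * n].reshape(-1, n).mean(axis=1)
    hist, edges = np.histogram(a_n, bins=bins, density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    keep = hist > 0
    i_n = -(tau / n) * np.log(hist[keep])
    return centres[keep], i_n - i_n.min()  # minimum shifted to 0, as in the figures

phi = 0.8
tau = (1 + phi) / (1 - phi)  # integrated auto-correlation of the AR(1) surrogate
x = lfilter([1.0], [1.0, -phi], np.random.default_rng(4).normal(size=2_000_000))
for mult in (5, 10, 20, 40):  # averaging blocks n = 5*tau, ..., 40*tau
    a, i_n = renormalised_rate_function(x, n=int(mult * tau), tau=tau)
    print(mult, np.round(i_n.max(), 3))  # the curves collapse once n is large enough
```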
In the case of the zonal rate functions, we first notice that the largest $n_x$ seems to be too small for a clear-cut convergence. In other words, the latitudinal circle is not long enough to clearly obtain a large deviation limit. However, the dependence of $\tilde{I}_{n_x}$ on $n_x$ seems to decrease as $n_x$ increases, thus we choose the largest possible $n_x$ as the optimal zonal averaging length, $n_x^* = \max(n_x)$: $n_x^* = 20\tau_x$ in the case of latitudes 60° and 46°, whereas in the case of latitude 30°, $n_x^*$ is only $10\tau_x$ because of the stronger zonal auto-correlations.
The best estimates of the temporal and zonal re-normalised rate functions, $\tilde{I}^*_{n_t} = \tilde{I}_{n_t = n_t^*}$ and $\tilde{I}^*_{n_x} = \tilde{I}_{n_x = n_x^*}$, are shown again in figures 4(g)-(i). We notice that $\tilde{I}^*_{n_t} \approx \tilde{I}^*_{n_x}$. The equivalence is very good at latitude 60° and in the case of negative anomalies at latitudes 46° and 30°. We also notice some differences for positive anomalies at latitudes 46° and 30°, with larger differences at 30°. At this latter latitude, however, it has to be considered that the maximum possible zonal averaging length is $10\tau_x$, whereas in the other cases it is $20\tau_x$. We assume that the differences between the temporal and zonal re-normalised rate function estimates have to do with the fact that $n_x^*$ is not large enough to estimate the rate function properly. Larger values of $n_x$ would be needed to overcome the enhanced skewness in the distribution of the zonal averages arising as an effect of spatial correlations; however, this is impossible due to limitations coming from the size and shape of the Earth. These findings are consistent with the large value of $U_\tau$ at this latitude, quantifying the anisotropy between space and time. While the temporal rate function can be estimated reliably at a relatively small $n_t$, the estimation of the zonal one is a much more difficult task. However, the main message of figure 4 is that the temporal and zonal re-normalised rate functions seem to be equal, $I_{n_t} = I_{n_x}$, if the probability of averages is scaled by the amount of uncorrelated data in an averaging block, $n_t/\tau_t$ or $n_x/\tau_x$, as explained above. In other words, there is a link connecting temporal and spatial large deviations or averages, due to the existence of a universal function $I_n$; universal in the sense that it represents large deviations in both dimensions, time and space.
Obviously, based on the large deviation principle in time or in space, one cannot characterise persistent temporal or spatial events, because the limit law starts to act on large scales, where persistence is lost and universality emerges. However, one can capture persistent space-time events by averaging in both dimensions: space and time. To achieve this, it is important that the spatial averaging length is not too small but not too large either, as we show in the following.

Spatio-temporal large deviations
We consider temporal sequences of zonally averaged observables with averaging lengths $n_x = 1\tau_x, 5\tau_x, 10\tau_x, 20\tau_x$, and then average each sequence in time over increasing averaging lengths $n_t = 1\hat{\tau}_t, 5\hat{\tau}_t, 10\hat{\tau}_t, 15\hat{\tau}_t, \ldots, 40\hat{\tau}_t$. The hat notation is meant to indicate that we average in space and additionally in time; $\hat{\tau}_t$ is the decorrelation time of the spatially averaged observable. By considering several $n_x$ values, we choose the spatial scale at which we analyse the large deviations in time. The spatio-temporal averages are computed as

$$\hat{A}_{n_x, n_t} = \frac{1}{n_t} \sum_{t=1}^{n_t} A_{n_x}(t),$$

where $A_{n_x}(t)$ denotes the zonal average over $n_x$ grid points at time t. Similarly to the previous cases, also in the case of the spatio-temporal averages we have to take into account the strength of the auto-correlations if we want to compare the spatio-temporal rate functions with the temporal and zonal ones. We estimate the integrated temporal auto-correlation $\hat{\tau}_t$ of the spatio-temporal averages analogously to $\tau_t$ or $\tau_x$, but, in order to assure the stability of $\hat{\tau}_t$, we choose a higher maximum lag of 120 d, because the auto-correlation in time of the zonal averages decays more slowly than that of the unaveraged temporal or zonal observables. Figure 5 shows $\hat{\tau}_t$ as a function of the zonal averaging length $n_x$ and the temporal averaging length $n_t$. The temporal auto-correlations of the spatio-temporal observables increase with $n_x$ and decrease with $n_t$. The increase with $n_x$, on the one hand, can be explained by the connection between temporal and spatial scales: large events in space are long-lasting events in time, as discussed in section 1. The decrease of the temporal integrated auto-correlation with $n_t$, on the other hand, can be explained by the increase of the number of uncorrelated events with respect to the number of correlated events in an averaging block as a consequence of increasing the block length. This is automatically the case for large averaging blocks when correlations are finite, and is crucial for the applicability of the block averaging method. The different behaviour with $n_x$ and $n_t$, however, has to do mainly with the discrepancies in the temporal resolution of the newly obtained averages. While in the case of zonal averaging the temporal resolution remains one day, in the case of additional averaging in time the temporal resolution decreases with $n_t$, thus the temporal auto-correlation lag increases. However, this is not a problem for our analysis, since we are interested in the correlations of the averaged observables measured in the amount of averaged data. A stronger increase of $\hat{\tau}_t$ at the 'end' of the channel underlines the above-discussed effect of averaging along a latitudinal circle: at the zonal 'end' of the channel, the temperature values are strongly correlated with those at the 'beginning' of the channel.
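A sketch of how such spatio-temporal averages can be produced (our own construction; `T_xt` is again a hypothetical (time, longitude) array at one latitude):

```python
import numpy as np

def spatio_temporal_averages(T_xt: np.ndarray, n_x: int, n_t: int) -> np.ndarray:
    """Zonal running means over n_x grid points (periodic in longitude),
    followed by non-overlapping block means in time over n_t steps."""
    n_time, n_lon = T_xt.shape
    # periodic zonal averages of length n_x, one per longitude and time
    idx = (np.arange(n_lon)[:, None] + np.arange(n_x)[None, :]) % n_lon
    zonal = T_xt[:, idx].mean(axis=2)                  # shape (n_time, n_lon)
    # temporal block means over n_t consecutive days
    k = n_time // n_t
    return zonal[: k * n_t].reshape(k, n_t, n_lon).mean(axis=1)
```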
The dependence of $\hat{\tau}_t$ on the zonal and temporal averaging lengths is qualitatively similar at the chosen latitudes if one considers $n_x$ in units of $\tau_x$ (along the vertical lines in figure 5 with the same colours). As we proceed from South to North, the auto-correlations of the zonally averaged observables become stronger. This is, however, mostly due to the decreasing distance between the longitudes, leading to more strongly correlated temperature values at neighbouring longitudes.
Estimates of the spatio-temporal re-normalised rate functions are then computed for each $n_x$ and $n_t$ as

$$\tilde{I}_{n_x, n_t}(a) = -\frac{\hat{\tau}_t \tau_x}{n_x n_t} \ln p(\hat{A}_{n_x, n_t} = a). \qquad (22)$$

We remark that equation (22) accounts for both zonal and temporal auto-correlations by multiplication with $\hat{\tau}_t \tau_x$, similarly to the case of the temporal and zonal rate functions. The spatio-temporal re-normalised rate function estimates are displayed in figure 6 (coloured lines). For comparison, we also show the best temporal and zonal estimates $\tilde{I}^*_{n_t}$ (continuous black lines) and $\tilde{I}^*_{n_x}$ (short-dashed black lines), together with the estimate of the zonal re-normalised rate function at the selected zonal averaging length, $\tilde{I}_{n_x}$ (long-dashed black lines). The main messages here are the following. • The spatio-temporal re-normalised rate function seems to be equal to the universal function $I_n$ for small and large zonal averaging lengths.
• We suppose that in the case of small zonal averaging lengths n_x ≪ n_x^*, like n_x = 1τ_x, the zonally averaged observable is not significantly different from the spatially localised observable, so that Ĩ_{n_x,n_t} converges to the universal function I_n.
• In the case of large zonal averaging lengths n_x ≥ n_x^*, like n_x = 20τ_x, the zonal averages already exhibit universal characteristics, which are not altered by the additional temporal averaging; thus Ĩ_{n_x,n_t} corresponds again with the universal function I_n.
• At intermediate levels however, i.e. τ_x < n_x < n_x^*, due to the non-trivial zonal correlations one obtains after zonal averaging a totally different observable from the original one.
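The numerical sketch announced above: equation (22) evaluated on a grid of temperature levels, with the PDF estimated by a Gaussian kernel (a histogram estimate works similarly). All function and variable names are ours; `hat_A` are samples of the spatio-temporal average, e.g. from the earlier averaging sketch.

```python
# A numerical sketch of equation (22): kernel density estimate of the PDF of
# the spatio-temporal averages, re-normalised by the integrated
# auto-correlations tau_x and tau_t_hat.
import numpy as np
from scipy.stats import gaussian_kde

def renormalised_rate_function(hat_A, n_x, n_t, tau_x, tau_t_hat, levels):
    pdf = gaussian_kde(np.ravel(hat_A))(levels)
    pdf = np.clip(pdf, 1e-300, None)               # avoid log(0) in the tails
    rate = -(tau_t_hat * tau_x) / (n_t * n_x) * np.log(pdf)
    return rate - rate.min()   # shift so the minimum is at 0, as in figure 6
```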

The re-normalised spatio-temporal rate functions are different from the universal function I_n at n_x = 5τ_x in the case of the latitudes 60° and 46°, as well as at n_x = 10τ_x in the case of latitude 60°, although in this last case it is worth mentioning that the spatio-temporal rate function corresponds with the zonal rate function estimate at n_x = 10τ_x. In all these cases, the spatio-temporal rate functions are flatter than the universal function, pointing to a higher probability of large deviations, which favours the presence of organised structures in the form of persistent weather patterns. Indeed, this is a new way to assess the existence of specific dynamical mechanisms, resulting in distinct statistical properties of the temperature fields, associated with the low-frequency variability of the atmosphere discussed in the introduction. Figure 7 represents schematically the ranges of temporal and zonal averaging lengths at which universality emerges (blue) or is hindered (light blue) due to zonal correlations. Pre-asymptotic regions, where the large deviation law is not yet valid, are depicted in white.
As a side note, the horizontal shift of the rate function estimates at small averaging lengths (n_x or n_t) in figure 6 emphasises that these estimates are not reliable, because the averaging length is too small for the law of large numbers to hold. We also wish to remark that differences emerge when looking at temperature data from latitude 30°. Here, the spatio-temporal re-normalised rate function Ĩ_{n_x,n_t} at n_x = 1τ_x is not identical to the universal function I_n. One possible reason for this is that, when averaging over a length n_x = 1τ_x, the newly defined observable already has significantly different properties from the local (in-space), time-dependent observable. The universality of the spatio-temporal rate function cannot be checked properly due to the limit in zonal averaging length of 10τ_x. What we see, however, is that at n_x = 10τ_x the spatio-temporal re-normalised rate function is quite similar to, yet distinct from, the universal function.

Figure 6. The coloured lines represent spatio-temporal rate functions for different temporal averaging lengths n_t according to the legend. The black continuous line is the best temporal rate function estimate, the black short-dashed line is the best zonal rate function estimate, and the black long-dashed line is the zonal rate function estimate at the selected n_x. The rate function estimates are shifted vertically so that their minimum is at 0. T′ = T − µ represents temperature fluctuations around the mean.

Return levels of the large deviations
We briefly summarise our main findings presented until now:
1. When considering temporal averages, the estimates of the rate functions seem to converge to an asymptotic function, and we obtain the best estimate of the rate function at an optimal averaging block length n_t^*. We show that there is a large deviation principle, i.e. a universal law that allows us to estimate the probabilities of occurrence of averages over n_t ≥ n_t^*, without having to actually perform the averaging.
2. Spatial averages of the temperature field along latitudes obey the same large deviation law obtained for temporal averages. This means that we can deduce statistical properties of temporal averages from the ones of spatial averages and vice versa. Additionally, the same asymptotic law is obtained when performing long spatial and temporal (two-dimensional) averages.
3. The temporal averages of temperature fields averaged on intermediate spatial scales along latitudes obey different large deviation laws, which point at a relatively higher probability of occurrence of heat waves and cold spells. This indicates the preferential existence of organised spatial structures, which correspond to the well-known low-frequency variability of the atmosphere.

Now, the question is how we can use this information in a practical way. One possible application, which we present in this subsection using the example of latitude 60°, arises in the context of computing return periods of large events. Figure 8 shows return level plots, i.e. return levels as a function of return periods, obtained in three different ways: based on empirical data (circle markers), on the large deviation principle (continuous lines), and on the generalised Pareto distribution (dashed lines). For the estimation of return periods based on the large deviation limit, we first obtain kernel density estimates (function density of the R package stats, R Core Team (2016)) of the PDFs p(A_n = a) at fixed equidistant return levels A_{n,1}, ..., A_{n,256}, based on which we estimate the cumulative distribution function P(A_n ≤ a), and then compute the return periods for A_n ≥ a from the exceedance probability 1 − P(A_n ≤ a). We compute the generalised Pareto return levels based on (6), using the maximum likelihood estimates of the generalised Pareto parameters (functions gpd.fit and gpd.rl of the R package ismev, Stephenson (2016)). We analyse return levels of high-temperature values exceeding a threshold equal to the 99.9% quantile of the averaged series, as well as return levels of low-temperature values below the 0.1% quantile. To verify the applicability of the peak over threshold method, the stability of the return levels was checked also for a higher (lower) quantile of 99.99% (0.01%). The return level estimates seem to be stable even if the threshold is increased (not shown). Note that, although a very slow convergence of the generalised Pareto shape parameter is well known in some cases, the stability of the return level estimates still holds if the change in the shape parameter is relatively small as the threshold increases (Gálfi et al 2017). The shading around the dashed lines in figure 8 represents 95% maximum likelihood confidence intervals of the return level estimates. As a side note, in the case of the peak over threshold method the estimation concerns the return levels while the return periods are fixed, whereas we proceed the other way around in the case of the large deviations. This is necessary because we estimate the rate function I(a) at fixed equidistant values a.
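As a minimal illustration, the sketch below mirrors the two constructions in Python (the paper uses the R functions named above; the scipy equivalents, the function names, and the convention of expressing the large deviation return periods in units of one averaging window are our assumptions).

```python
# Sketch of the two return-level constructions compared in figure 8:
# (i) empirical-PDF-based return periods via a kernel density estimate, and
# (ii) generalised Pareto return levels from threshold exceedances.
import numpy as np
from scipy.stats import gaussian_kde, genpareto

def ld_return_periods(samples, n_levels=256):
    """Return periods of A_n >= a via a kernel density estimate of the PDF."""
    samples = np.asarray(samples, dtype=float)
    levels = np.linspace(samples.min(), samples.max(), n_levels)
    kde = gaussian_kde(samples)
    # CDF by integrating the kernel density up to each level
    cdf = np.array([kde.integrate_box_1d(-np.inf, a) for a in levels])
    exceedance = np.clip(1.0 - cdf, 1e-12, None)   # avoid division by zero
    return levels, 1.0 / exceedance                # in averaging-window units

def gpd_return_levels(samples, quantile=0.999, periods=(1e2, 1e3, 1e4)):
    """Return levels at fixed return periods from a maximum likelihood
    generalised Pareto fit to exceedances (shape != 0 assumed)."""
    samples = np.asarray(samples, dtype=float)
    u = np.quantile(samples, quantile)
    exceedances = samples[samples > u] - u
    shape, _, scale = genpareto.fit(exceedances, floc=0.0)
    zeta = exceedances.size / samples.size         # exceedance probability
    m = np.asarray(periods, dtype=float)           # periods in sample units
    return u + scale / shape * ((m * zeta) ** shape - 1.0)
```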
In figures 8(a) and (b), the return levels of temporal averages are shown for three different averaging windows: 20τ_t, 30τ_t and 40τ_t. Here we use point 1 from above, and obtain the return periods based on the large deviation principle for every averaging window from p(A_{n_t^* = 20τ_t} = a). We notice a very good agreement with the empirical data and the generalised Pareto return levels, not only for 20τ_t but also in the case of 30τ_t and 40τ_t, for both high (figure 8(a)) and low (figure 8(b)) extremes of averages. In the case of n_t = 20τ_t, the confidence intervals of the largest return periods based on large deviations become very unstable, and the lower limits even reach negative values; thus they cannot be displayed on this semi-logarithmic scale.
The return periods based on large deviations have an upper (or lower) limit because the estimation relies on empirical PDFs. This is not the case for the generalised Pareto return periods, since they can be extrapolated to even unobserved events. The large deviation principle, however, is a limit law that gives us return periods for every averaging length n > n^*, whereas the generalised Pareto return periods have to be computed separately for every n. This becomes more and more difficult with increasing n due to the decreasing amount of data as an effect of averaging. In other words, figure 8 points out the different dimensions in which the two limit laws act, as mentioned already in section 1. The predictive power of the peak over threshold method (and, in general, of extreme value theory) is directed towards larger and larger events, i.e. towards unobserved ones, whereas the predictive power of LDT is directed towards larger and larger averaging lengths, i.e. towards observables that, by construction, dramatically reduce the amount of data available for statistical analysis.
Point 2 presented above is illustrated by figures 8(c)-(f). In the first case, the return periods of temporal averages are computed based on the large deviation principle obtained for zonal averages (n_x^* = 20τ_x); in the second case, the return periods of spatio-temporal averages (with a spatial averaging length of 20τ_x) are obtained from the large deviation law for temporal averages (n_t^* = 20τ_t). In both cases, but especially for the spatio-temporal averages, the agreement with the empirical data and the generalised Pareto return levels is good. The differences between the return levels based on large deviations and the empirical data (as well as the generalised Pareto return levels) are related to discrepancies in the estimation of the temporal and zonal, as well as the temporal and spatio-temporal, re-normalised rate functions. For example, the underestimation of low extremes of temporal averages based on the zonal rate function has to do with higher re-normalised zonal rate function values compared to the temporal ones in their left tails (see figure 4(g)). We remark that the possibility of commuting between averages along different dimensions (time and space) is due to the fact that, once the effect of serial correlations is eliminated, the large deviations along these different dimensions follow a universal function.

How sensitive are our results to the length of the numerical simulations?
In typical data analysis exercises based on observational datasets or state-of-the-art climate simulations, the time span of the available data is substantially shorter than in the case of our idealised simulations, ranging from O(100) to O(1000) years. To test the applicability of the method to shorter time series, we divide our 10 000 year long simulations into 100 sections of 100 years. For each of them we estimate return levels and periods of temporal averages based on temporal large deviations. We also estimate the generalised Pareto return levels using the 95% quantile as a threshold. Afterwards, we increase the length of the sections, and repeat these steps for 10 sections of 1000 years. The obtained return levels and periods are illustrated in figure 9. Note that the empirical return levels are still obtained based on the whole 10 000 years of data, and thus represent a reliable basis for comparison. Figure 9(a) demonstrates that climatologically important events with return periods of several tens and hundreds of years can still be approximated reasonably well by large deviation estimates based on simulations of 100 years. The average of the estimates (grey dashed lines) slightly overestimates the empirical return levels; however, the agreement is good up to long return periods and improves with increasing averaging lengths. We additionally point out that the spread of the predictions is remarkably low, and decreases with increasing averaging length. This underlines the advantage of using large deviation estimates for return levels of averages over large averaging windows, and shows that the predictions of LDT remain very stable if much shorter datasets are considered. The agreement with the empirical return levels improves substantially when the simulation length is increased to 1000 years (figure 9(b)).
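The splitting itself is straightforward; a sketch under the assumption of a 360 d model year (PUMA has no annual cycle, so the 'year' is only a bookkeeping unit):

```python
# Split a long daily series into non-overlapping sections of fixed length,
# to study the sampling variability of the return-level estimates.
import numpy as np

def split_into_sections(daily, years_per_section, days_per_year=360):
    section_len = years_per_section * days_per_year
    n_sections = daily.size // section_len
    return daily[:n_sections * section_len].reshape(n_sections, section_len)

# e.g. 100 sections of 100 years from a 10 000 year series, each passed to
# the return-period estimator of the earlier sketch:
# sections = split_into_sections(daily, years_per_section=100)
# estimates = [ld_return_periods(s) for s in sections]
```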
In the case of the 100 year simulations, the generalised Pareto return levels agree with the empirical data slightly better than the large deviation estimates (compare figure 9(c) with (a)). However, the variance of the generalised Pareto estimates increases more strongly with the return period, also for large averaging lengths. Furthermore, the averaged generalised Pareto estimates (grey dashed lines in figure 9(c)) tend to underestimate the largest events for both the 100 and 1000 year simulations. In the case of realistic model simulations and observational data, however, one has to deal with additional complications besides the smaller amount of data, mainly as an effect of non-stationarity and strong correlations. We discuss these effects in the next section.

Summary and discussion
We have analysed the properties of temporal and spatial near-surface (960 hPa) temperature averages in the PUMA simplified global atmospheric circulation model based on LDT. Extremes of averages on specific scales are related to persistent extreme events, like heat waves or cold spells. We run the model for 10 000 years with a constant (only latitude-dependent) forcing, creating non-equilibrium (due to the forced-dissipative nature of the model) steady-state simulations without orography, annual or daily cycle. The forcing is symmetrical for the two hemispheres. The horizontal resolution is T42 with ten vertical levels, and the temperature values are recorded daily. We compute and compare the re-normalised rate functions based on the integrated auto-correlation for temporal and zonal temperature sequences at selected latitudes (60°, 46° and 30°), focusing on the mid-latitude region, where turbulence is best developed. The spatial averaging is performed only in the zonal direction, because this is the geometrical direction along which the system has a symmetry. We also analyse the two-dimensional case, i.e. spatio-temporal averaging. We verify the correctness of our results by comparing the return periods based on the rate functions with return periods from the empirical data and from the peak over threshold method. Before discussing them in detail, we first summarise our main findings:
1. The temperature averages in PUMA follow a large deviation principle.
2. The temporal and zonal re-normalised rate functions are equal if we compute them by eliminating the effect of temporal and zonal correlations. Thus, we can define a universal function, describing temporal as well as spatial large deviations.
3. The spatio-temporal re-normalised rate functions are equal to the universal function for small and large spatial averaging lengths. On intermediate levels, as an effect of non-trivial spatial correlations, the spatio-temporal re-normalised rate functions differ from the universal ones.
The estimated rate functions clearly converge in the case of temporal averages. We obtain reliable estimates at an optimal averaging length n_t^*, which is about 20τ_t, where τ_t represents the temporal integrated auto-correlation. The fact that we find a large deviation principle for temperature averages might seem unsurprising, but it actually has extremely important consequences on a practical level. Based on large deviations, we can estimate the probabilities of averages, and thus return periods, which are very important for practical use, for every averaging length n_t ≥ n_t^*. All we need to know is the probability of the averages A_{n_t^*}, which we can estimate empirically. In contrast to the temporal averages, in the case of zonal averaging the spatial averaging length n_x is substantially limited by the size and shape of a latitudinal circle. The temporal averaging is performed on a theoretically infinite (and practically very long) line, whereas the zonal averaging takes place on a circle. Thus, the convergence of the estimated rate functions is not as clear as for temporal averages. However, the comparison of the zonal results with the temporal re-normalised rate function estimates shows that the averaging length n_x^* = 20τ_x seems to provide a reasonable rate function estimate, and we therefore choose it as the optimal zonal averaging length. In the case of latitude 30°, 20τ_x cannot be reached due to the stronger zonal correlations. Here, the maximum averaging length is 10τ_x.
We find that the temporal and spatial re-normalised rate functions seem to be equal if we eliminate the effect of correlations according to equation (20), where we basically scale the rate functions by the amount of uncorrelated data instead of the whole amount of data in an averaging block. Based on this equivalence, one finds a universal function I_n = I_{n_t} = I_{n_x}, in the sense that it describes both temporal and spatial large deviations. From a practical point of view, this implies that one can commute between space and time: we can deduce statistical properties of spatial averages (including return level estimates) from a single time series, and this is, of course, true the other way round too.
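As an illustration of this commuting, the following sketch predicts the PDF of zonal averages from a rate function estimated on a single local time series. It presumes our reading of the re-normalisation convention of equations (20) and (22), i.e. p(a) ∝ exp(−(n_x/τ_x) I(a)); all names are ours.

```python
# Predict the (normalised) PDF of zonal averages over n_x grid points from
# the universal re-normalised rate function I, estimated on a grid `levels`.
import numpy as np

def predicted_zonal_pdf(levels, I, n_x, tau_x):
    weights = np.exp(-(n_x / tau_x) * I)        # large deviation scaling
    return weights / np.trapz(weights, levels)  # normalise to unit integral
```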
Obviously, based on a large deviation limit obtained in one dimension, time or space, we cannot describe persistent events, because the limit law is acting on very large scales, where spatial or temporal organisation is lost and universality emerges. However, as our results show, persistent space-time events can be studied based on LDT if one performs the averaging in both dimensions, time and space.
Therefore, we extend our analysis also to spatio-temporal large deviations. Here, we average first in the zonal direction, taking different averaging lengths n_x = 1τ_x, 5τ_x, 10τ_x, 20τ_x, and then we search for a large deviation principle in time for the zonally averaged observables. We find that the spatio-temporal re-normalised rate function, computed again by eliminating the correlations according to (22), is equal to the universal function I_n in two cases: (1) for small zonal averaging lengths n_x ≈ τ_x, and (2) for large ones, n_x ≥ n_x^*. We suppose that in the first case, due to the small n_x, the zonally averaged observable is not significantly different from the temporal observable, and thus the rate function converges to the universal function. In the second case, the zonal averages already exhibit universal characteristics, because the large n_x allows for enough mixing in the series of zonal averages. These universal characteristics are not altered by the additional temporal averaging. On intermediate scales however, i.e. τ_x < n_x < n_x^*, due to the non-trivial zonal correlations one obtains after zonal averaging a totally different observable, whose large deviations follow a clearly different rate function from the universal one. Consequently, by computing large deviations in time of zonal averages, we get rid of temporal persistence if the temporal averaging length is large enough, but we cannot eliminate the effect of zonal persistence on intermediate scales, which then leads to a non-universal re-normalised rate function. This also means that in this way we can study persistent extreme events based on LDT. These intermediate scales of about 5-10τ_x, or roughly 2000-4000 km, are approximately equal to the scale of persistent synoptic disturbances, like the ones causing severe heat waves. According to this point of view, long-lasting synoptic-scale disturbances are large deviations from the steady state, which allow for a higher degree of spatio-temporal organisation and, in a loose sense, a lower entropy compared to disturbances at any other scale. This is an interesting signature of the so-called low-frequency variability of the atmosphere, which manifests itself in a complex phenomenology, as in the case of blocking events (Tibaldi and Molteni 2018).
The advantage of applying LDT to analyse persistent climatic events is, besides the already discussed predictive power, the opportunity to learn something about the system under investigation:
• Our system is chaotic enough to allow for a large deviation principle. This means that correlations decay sufficiently fast and the system is mixing enough for the chaotic hypothesis to hold. A very important characteristic of this kind of system is that fluctuations are dominated by the mean instead of by the biggest events, and thus the central limit theorem holds.
• The rate functions are approximately symmetric, so that positive fluctuations and corresponding negative fluctuations of the same size have a similar probability of occurrence.
• We obtain an equivalence between temporal and spatial re-normalised rate functions, meaning that fluctuations in time are equivalent to fluctuations in space if one takes into consideration the different spatial and temporal scales. Thus our non-equilibrium steady-state system exhibits a symmetry between the temporal and spatial (zonal) dimensions. This suggests that, in the re-normalised temporal and spatial dimensions, the statistical properties of temperature can be considered as isotropic.
• We find that the spatio-temporal rate function related to intermediate spatial scales is substantially flatter and lower than the universal function.

Additionally, we compare the two frameworks for investigating rare events, i.e. LDT and the peak over threshold approach of extreme value theory, from a practical point of view, based on return level and return period estimates. Both methods are based on limit laws, but they differ in the way the limit is obtained, and thus also in the direction in which the limit acts. The peak over threshold approach deals with the conditional probabilities of averages exceeding a high threshold. The limit law is obtained as one considers larger and larger extremes; thus it is directed towards large, even unobserved events. In the case of LDT, we approach the limit as we consider averages with increasing averaging length n; thus the limit is directed towards n → ∞. Our results point out these differences. On the one hand, the return level estimates based on the theory of large deviations are limited from above at small averaging lengths, because they are obtained based on empirical distributions, whereas the estimates based on the peak over threshold approach can be extrapolated to unobserved events. On the other hand, the return levels based on large deviations can be obtained for every n ≥ n^* based on the probabilities of A_{n^*}, whereas in the case of the peak over threshold approach they have to be estimated for every n separately. We also have to remark that the convergence to the limit law seems to be easier to achieve in the case of large deviations than in the case of extreme values (Gálfi et al 2017).
As mentioned above, we eliminate the effect of correlations in the computation of the rate functions by multiplication with the integrated auto-correlation. We estimate both the temporal and the zonal integrated auto-correlations, τ_t and τ_x. By computing the ratio between spatial and temporal persistence, we define a scale velocity U_τ = τ_x/τ_t, which is a measure of the anisotropy between space and time. If the anisotropy between space and time is strong, it becomes more difficult to show the existence of a universal rate function, as in the case of latitude 30°. We remark that the scale velocity we find by such an asymptotic procedure could be viewed in connection with the research lines aiming at identifying the multifractal nature of weather and climate fields (Lovejoy and Schertzer 2013) and, in particular, of precipitation fields (Deidda 2000). Generally, the connection between spatial and temporal scales is given by some characteristic velocity. In the multifractal analysis of spatio-temporal precipitation fields, the temporal dimension is usually rescaled by the advection velocity to fit the spatial ones, as explained by Deidda (2000). If the rescaled temporal and the spatial dimensions are isotropic, the overall advective velocity is sufficient to describe the relationship between the spatial and temporal properties of the precipitation field. In the case of spatio-temporal anisotropy, however, the advective velocity is scale-dependent. In this work we are searching for the connection between time and space in terms of rate functions, and we find that this space-time connection is described very well by the ratio between the spatial and temporal integrated auto-correlations, which we indicate as U_τ. As mentioned in section 4, U_τ is comparable with the zonal mean velocity at latitudes 60° and 46°, which, indeed, advects turbulent structures at a first approximation. However, the agreement is worse at 30°. In this case, the dynamics have a mixed tropical/extratropical character, and long spatial correlations are due to the presence of the Hadley cell downdraft.
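Converting U_τ into a physical velocity only requires the grid spacing and the sampling interval; a trivial sketch, with unit conventions and names of our choosing:

```python
# Scale velocity U_tau = tau_x / tau_t in physical units; `dx_metres` is the
# zonal grid spacing at the chosen latitude, tau_x is measured in grid points
# and tau_t in days.
def scale_velocity(tau_x_points, tau_t_days, dx_metres):
    return (tau_x_points * dx_metres) / (tau_t_days * 86400.0)  # m s^-1
```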

While nature and society do not typically conform to the hypotheses of the theorems needed to establish universal laws, such asymptotic results can nonetheless be extremely useful for studying observational data, just as in the widely used case of extreme value theory. Therefore, this work should be seen as a first step towards the use of LDT for the analysis of actual climatic data and of the outputs of state-of-the-art climate models. The perspective is to find new ways to efficiently estimate the probability of occurrence of extremely rare events associated with persistent climatic conditions. In this work, we have focused on time scales which are long compared to those typical of the atmosphere, but one can adopt the same methods for studying persistent events on multi-annual scales, where oceanic variability is, instead, essential. This has, potentially, great relevance for addressing the problem of assessing human and environmental resilience to the low-frequency variability of the climate system.
In the case of applications to state-of-the-art model simulations or observational data, one has to deal with various complications which are absent in our idealised model simulations. These have to do mainly with the presence of non-stationarity in the time series, as a result, e.g., of the seasonal cycle and, on longer time scales, of climate change, as well as with the presence of multiple time scales in the climate system, which may lead to a slow decay of correlations for some variables. Note that, for example, the ocean surface temperature decorrelates more slowly than the temperature over land, as a result of the larger heat capacity of the active surface layer of the ocean. As discussed above in section 2, strong correlations can inhibit the convergence to the limit law, at least when finite-size datasets are considered. Pragmatic approaches for dealing with time-dependent systems can be adapted from what was done in the case of analyses based on extreme value theory. One can eliminate a long-term trend, and then look at the detrended data using LDT. In a similar manner, it is possible to eliminate the annual cycle from the time series, obtain a large deviation principle, and consider the annual modulation later, in the estimation of return levels. Another possibility would be to divide the time series according to seasons, and to obtain separate rate functions for separate seasons. Furthermore, it would also be more difficult to obtain universal properties of large deviations due to the high spatial heterogeneity caused by orography. However, we expect that this kind of universality should be found in regions with similar orographic and climatic characteristics.
Based on our idealised simulations, the estimated rate functions for the temperature fields are symmetric, suggesting that positive large deviations of temperature have the same probability as negative ones. In more realistic datasets, however, we expect to find asymmetric rate functions more frequently. An argument supporting such a conjecture is that positive large deviations of air temperature should differ from negative ones in the presence of moist processes, due to the different chemical and physical characteristics of warm air compared to cold air, which has a much lower water vapour content. It might in fact be interesting to compare the large deviation rate functions of the surface temperature with those of the wet-bulb temperature, which takes into account the presence of moisture and is relevant for assessing heat stress (Zahid et al 2017). An alternative way to combine information on temperature and moisture is to look at the so-called equivalent potential temperature, which is proportional to the logarithm of