Using power system modelling outputs to identify weather-induced extreme events in highly renewable systems

In highly renewable power systems the increased weather dependence can result in new resilience challenges, such as renewable energy droughts, or a lack of sufficient renewable generation at times of high demand. The weather conditions responsible for these challenges have been well-studied in the literature. However, in reality multi-day resilience challenges are triggered by complex interactions between high demand, low renewable availability, electricity transmission constraints and storage dynamics. We show these challenges cannot be rigorously understood from an exclusively power systems or exclusively meteorological perspective. We propose a new method that uses electricity shadow prices - obtained by a European power system model based on 40 years of reanalysis data - to identify the most difficult periods driving system investments. Such difficult periods are driven by large-scale weather conditions such as low wind and cold temperature periods of various lengths associated with stationary high pressure over Europe. However, purely meteorological approaches fail to identify which events lead to the largest system stress over the multi-decadal study period due to the influence of subtle transmission bottlenecks and storage issues across multiple regions. These extreme events also do not relate strongly to traditional weather patterns (such as Euro-Atlantic weather regimes or the North Atlantic Oscillation index). We therefore compile a new set of weather patterns to define energy system stress events which include the impacts of electricity storage and large-scale interconnection. Without interdisciplinary studies combining state-of-the-art energy meteorology and modelling, further efforts towards adequate renewable power systems will be hampered.


Introduction
As electricity grids reach ever higher levels of renewable penetration to meet net-zero emissions targets, their weather dependence increases. Weather and climate variability therefore become increasingly important for power system operations and planning [1,2]. However, traditional power system modelling has relied on a "typical meteorological year" which may only include a few hourly time slices to represent demand and renewable variability. There has been a large effort over recent years to incorporate the impacts of climate variability into power system modelling, and running multiyear hourly simulations is becoming more common [3,4,5,6,7,8,9,10,11], with climate scientists now producing demand, wind and solar inputs for national and continental-scale modelling [12,13,14,15,16]. Particularly in systems containing large amounts of wind power generation, the choice of simulation years can significantly impact the operational adequacy of a system [3,4,5], and not considering year-to-year climate variability can also lead to failure to meet long-term decarbonisation objectives [4].
*Contributed equally, order decided by coin toss.
Multi-decadal climate simulations are also important for characterising the most challenging days for power system operation (e.g. days that might lead to blackouts). These energy system stress events can be investigated without a full power system modelling approach by looking at time series of demand or demand-net-renewables ("net load") [17,18,19,20,21]. Although these events are commonly periods of peak demand, they may include times of wind droughts (prolonged low wind speeds) [22], solar droughts or dunkelflauten ("dark doldrums").
In a renewables-based power system both electricity demand and generation are driven by weather and cannot be considered independently; it is thus becoming common practice to consider times of energy system stress as compound events involving a combination of near-surface temperatures, wind speeds, irradiance and hydrological variables across large geographic and temporal scales [23,24,25]. For example, high pressure systems can cause compound events [17,26], affecting multiple countries simultaneously. While the basic mechanics of periods with energy scarcity in Europe revolve around extremely low near-surface temperatures (for demand) and low near-surface wind speeds (for wind power production), we still lack a detailed understanding of the power system dynamics during these weather-driven extremes, including electricity transmission and storage.
The complicating factors of transmission and storage motivate the use of a high-resolution power system optimisation model to identify periods of power system stress. Such models output shadow prices, a proxy for nodal electricity prices, which have been used successfully as a metric for strained supply situations in studies using dispatch optimisation models [25,27,28]. With the shift towards power systems dominated by variable renewable generation, where capital expenditure represents the majority of total system costs instead of operational and fuel costs, we propose using a capacity expansion model instead. Thus, we co-optimise infrastructure investments and dispatch decisions in order to generate cost-optimal, fully decarbonised power system designs for Europe. In this setting, high shadow prices primarily indicate system-defining events triggering large investments. For the present study, we use PyPSA-Eur [29,30], an open optimisation model for the European power system.
The central question we address is that of identifying energy system stress events for decarbonised systems, and classifying the weather regimes leading to such events. We investigate events using three different approaches over four decades of weather variability. Approach 1 is a baseline method rooted in energy meteorology and assesses the difficulty of a period by net load, as is commonly done [17,18,19]. The main novelty lies in Approach 2, where we filter system-defining events whose total electricity costs explain large investments, based on the shadow prices obtained by the capacity expansion model. Approach 3 is a validation using dispatch optimisations with out-of-sample weather years and lost load as an alternative metric to shadow prices.
Identifying the large-scale weather patterns leading to system-defining events is of central importance for systems planning, operations and forecasting. Whereas previous studies have compiled weather patterns leading to high net load or compound events [17,26,18], an analysis informed by the operation of power systems, taking transmission and storage into account, is missing. We show that this additional consideration can impact results significantly. While both Approaches 2 and 3 take power system dynamics into account, we find that Approach 2 is the more practical and computationally less demanding of the two (as Approach 3 requires many additional optimisations), while the outcomes of Approaches 2 and 3 are similar.
To summarise, the key aims of this paper are to:
• Filter out and delineate system-defining events using shadow price outputs from a power system optimisation model.
• Classify these events based on the prevailing weather conditions, and determine the main factors leading to continent-wide system stress.
• Construct a new set of weather patterns that define European power system stress from both a climate and power systems modelling perspective.
Section 2 describes the meteorological and modelling set-up and introduces the definition of system-defining events. In Section 3 we combine the insights from the power system model and meteorology to lay out weather patterns underlying power system stress. We put the results into the context of the expansion of renewables and conclude with Section 4.

Data and methods
In the spirit of Craig et al. [2] we apply a transdisciplinary approach to identifying challenging weather for power systems. First, we use outputs from a power system optimisation model to filter out system-defining events that drive investment in additional generator capacities. For these time periods, we cluster the meteorological conditions into groups such that we can identify weather patterns that drive weather stress events. Then we analyse the effects in the power system (model) during these time periods to determine which components lead to difficulties and are under stress.

Datasets and tools
The weather inputs to the meteorological analyses and to the power system optimisation model are based on ERA5 reanalysis data [31] and are described in the following section. We represent the European power system by using the open-source energy system optimisation model (ESOM) PyPSA-Eur (github.com/PyPSA/PyPSA-Eur) [32] (version 0.6.1) with small modifications; the modelling setup follows thereafter.

Meteorological inputs and energy variables
We use gridded weather variables from the ERA5 reanalysis [31] from 1980 until 2021. 2m temperature, 10m wind speed and surface air pressure over the region 34 […] Weather-dependent power systems time series are mainly generated using the open-source software Atlite [15]. In Atlite, 100m wind speeds from ERA5 are first extrapolated to turbine hub height using a logarithmic law and passed through a reference power curve to obtain capacity factors (the fraction of rated power output that can be produced at the given wind speed); we use the Vestas V112 3MW turbine for our calculations. PV capacity factors are computed from ERA5 direct and diffuse shortwave radiation influx data using a reference solar panel model, assuming no tracking and a fixed 35° panel slope. Weather-dependent electricity demand is generated based on historical ENTSO-E data and adjusted for heating or cooling demand using a heating/cooling degree days approach as in [33,9,23].
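The wind conversion chain described above (height extrapolation followed by a power curve) can be sketched in a few lines. This is only an illustration and not Atlite's implementation: the hub height, the roughness length and the idealised cubic power curve with its cut-in, rated and cut-out speeds are all hypothetical values chosen for the example.

```python
import math

def extrapolate_log_law(speed_ref, z_ref=100.0, z_hub=80.0, roughness=0.03):
    """Scale a wind speed from height z_ref to z_hub with a logarithmic profile.

    z_hub and roughness (both in metres) are hypothetical example values.
    """
    return speed_ref * math.log(z_hub / roughness) / math.log(z_ref / roughness)

def capacity_factor(speed, cut_in=3.0, rated=12.0, cut_out=25.0):
    """Idealised power curve (assumption): cubic ramp between cut-in and rated speed."""
    if speed < cut_in or speed >= cut_out:
        return 0.0
    if speed >= rated:
        return 1.0
    return (speed**3 - cut_in**3) / (rated**3 - cut_in**3)

# Made-up hourly 100 m wind speeds in m/s.
hourly_100m_speeds = [2.0, 6.0, 9.5, 14.0, 26.0]
cfs = [capacity_factor(extrapolate_log_law(v)) for v in hourly_100m_speeds]
```

A real turbine power curve is tabulated rather than analytic, so the ramp above should be read as a placeholder for the reference curve.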

Power system modelling set-up
PyPSA-Eur is configured with high spatial (181 generation and 90 network nodes [34]) and temporal resolution (1-hourly), making it well-suited to investigating a highly renewable European electricity network [35,36,37,38,39,40,9,41]. The model is solved for forty individual weather years (July 1980 to June 2020, preserving winters). Although PyPSA-Eur is capable of a sector-coupled representation of the European energy system (e.g. including the heat and transport sectors), we restrict it to the optimisation of the power sector alone for clarity. We minimise total system costs of the European power system by optimising investment and dispatch of electricity generation, storage, and transmission to meet prescribed hourly national demand over a year. The model performs a partial greenfield optimisation, i.e. with the existing transmission network (2019) and capacities of hydropower and nuclear power (2022), but without existing renewable capacities (see Fig. S1 for a break-down of total system costs for the forty different weather years).
Our cost assumptions are based on a modelling horizon of 2030 and we assume a fully decarbonised power system; the available generation technologies are thus nuclear and renewables: hydropower and biomass (non-expandable), solar, onshore and offshore wind power (all expandable). Transmission can be expanded (overnight) by 25% compared to current levels (Fig. 6 in Hörsch & Brown [29]), and electricity can be stored through hydro reservoirs (non-expandable), battery storage and hydrogen storage. This can be thought of as modelling an ambitious, early decarbonisation of the European electricity sector using current or near-future technologies. The focus on the power system enables a study of weather dependence providing more evidence on transmission and storage before the impacts of long-term climate change emerge.
We run capacity expansion optimisations for each of the 40 weather years (July-June) separately, arriving at 40 different cost-optimal system designs. The overall makeup of the resulting designs is similar for all weather years, with total system costs being dominated by wind, then solar investment expenditure. However, there are significant variations in the magnitudes of installed capacities, as well as in the investment in hydrogen and battery storage; see Fig. S1. Running separate optimisations allows for the identification of system-defining events in each weather year, as opposed to only a smaller number of events that are defining over the entire 40-year period. The single-year optimisations also allow for a high spatial and temporal resolution, whereas 40-year optimisations have only been accomplished at a moderate resolution [9]. While basing the results on 40 different system designs is a potential limitation (is a period identified as challenging for one design also challenging for other designs?), cross-validation using load shedding (Approach 3) shows that there is very good alignment between system-defining events in one year and load shedding events for other designs operated on the same year (see also Section 2.6).

Dual variables and shadow prices
PyPSA-Eur is formulated as a linear program in order to find investment and operational decisions which minimise the objective (total system costs), with linear constraints ensuring feasibility of the model result. An optimal solution to a linear program consists of an optimal value for each decision variable, as well as an optimal dual value for each linear constraint. These dual values indicate how much the objective function would decrease if the corresponding constraint were relaxed by one unit, quantifying the "difficulty" of satisfying the given constraint.
The dual variables corresponding to the constraints ensuring that a fixed demand is met at each network node $n$ and timestep $t$ are denoted $\lambda_{n,t}$, following [32]. These dual variables (also called shadow prices of electricity) can be interpreted as the modelled price of electricity (in EUR / MWh) at the given node and time (see e.g. [28,27] in the context of dispatch optimisation). Note, however, that despite this economic interpretation the shadow prices are not comparable to electricity prices in the current European market, as the shadow prices are largely driven by the need for renewable expansion in the model, not marginal operating costs.
Apart from these, other hourly and locational dual variables corresponding to constraints on transmission and storage can be used to reveal transmission congestion rents and values of stored energy in the model, respectively (see Supplementary Materials A.2). Since transmission expansion costs are recovered through congestion rents in the model, the congestion rent time series can reveal which times primarily triggered investment in transmission; the same goes for storage.
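To make the interpretation of a shadow price as the marginal cost of meeting demand concrete, the following sketch estimates it by perturbation in a toy single-node, single-hour dispatch problem. It is an illustration only: the merit-order solver, the fleet and all numbers are hypothetical, and it omits the investment variables, storage and transmission that drive shadow prices in the capacity expansion model.

```python
def dispatch_cost(demand, generators):
    """Least-cost dispatch at a single node: fill demand in merit order.

    generators: list of (marginal_cost_eur_per_mwh, available_mw) tuples.
    Returns the total cost in EUR for one hour.
    """
    remaining = demand
    cost = 0.0
    for marginal_cost, available in sorted(generators):
        used = min(available, remaining)
        cost += used * marginal_cost
        remaining -= used
        if remaining <= 0:
            return cost
    raise ValueError("demand exceeds available capacity")

def shadow_price(demand, generators, step=1.0):
    """Finite-difference estimate of the dual of the demand constraint (EUR / MWh)."""
    extra = dispatch_cost(demand + step, generators)
    return (extra - dispatch_cost(demand, generators)) / step

# Hypothetical fleet: free wind, a cheap unit and an expensive backup unit.
fleet = [(0.0, 50.0), (20.0, 30.0), (150.0, 40.0)]
low = shadow_price(40.0, fleet)   # wind alone covers demand
high = shadow_price(90.0, fleet)  # the expensive unit is marginal
```

In this toy setting the shadow price jumps from 0 to 150 EUR / MWh once the expensive unit becomes marginal. In a capacity expansion model the marginal unit is typically new capacity, which is why the modelled prices reflect investment needs rather than operating costs.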

Identifying system-defining events
In this paper a system-defining event is a period where the incurred electricity costs surpass a specified threshold within a limited time frame. We restrict the duration of a system-defining event to a maximum of two weeks, and set the minimum cost threshold to 100 bn EUR.
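An event starting at $t_0$ and lasting for $T$ hours is considered system-defining if
$$\sum_{t=t_0}^{t_0+T-1} \sum_{n} \lambda_{n,t} \, d_{n,t} > C$$
for $C = 100$ bn EUR and $T \leq 336$ (the number of hours in two weeks), where $d_{n,t}$ is the electricity demand (in MWh) and $\lambda_{n,t}$ the electricity shadow price at node $n$ and time step $t$. A priori, many overlapping events of various lengths meet the above criteria.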
For the purposes of this study, we thus filter out overlapping events until only a non-overlapping set of system-defining events remains; see the Supplementary Materials for an exact description of the filtering procedure.
By definition, relaxing either the length or cost threshold can only lead to additional events being classified as system-defining; we have chosen the threshold values used in this study so as to produce approximately one system-defining event per year. The relative values of the thresholds can affect the average duration of identified events; we chose the cost threshold so as to obtain events averaging around 7 days, the discharge duration of the hydrogen storage included in our model. See also Fig. S2 for an overview of the most costly periods of varying lengths across the studied weather years. It should be stressed that the thresholds can be freely adjusted in future studies to fit the research questions at hand.
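The event definition can be sketched as a window search over an hourly system-wide cost series (shadow price times demand, summed over nodes). The greedy rule used below to remove overlaps (shortest windows first, ties broken by higher cost) is an assumption made for illustration; the actual filtering procedure is the one described in the Supplementary Materials. The toy series and thresholds are made up.

```python
from itertools import accumulate

def system_defining_events(hourly_cost, threshold, max_hours):
    """Return non-overlapping (start, length, cost) windows with cost > threshold."""
    prefix = [0.0] + list(accumulate(hourly_cost))
    candidates = []
    for length in range(1, max_hours + 1):
        for start in range(len(hourly_cost) - length + 1):
            cost = prefix[start + length] - prefix[start]
            if cost > threshold:
                candidates.append((start, length, cost))
    # Assumed greedy rule (not from the source): shortest first, then costliest.
    candidates.sort(key=lambda c: (c[1], -c[2]))
    selected = []
    for start, length, cost in candidates:
        # Keep a window only if it is disjoint from every window already kept.
        if all(start + length <= s or s + l <= start for s, l, _ in selected):
            selected.append((start, length, cost))
    return sorted(selected)

# Toy example: 10 hours, threshold of 10 cost units, windows of at most 3 hours.
toy_cost = [1.0, 1.0, 8.0, 9.0, 1.0, 1.0, 1.0, 6.0, 6.0, 1.0]
events = system_defining_events(toy_cost, threshold=10.0, max_hours=3)
```

With the thresholds of this study, `threshold` would be 100 bn EUR and `max_hours` would be 336; for a full year of hourly data the brute-force candidate list becomes large, so a production version would prune candidates more carefully.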

Traditional meteorological weather regimes
To understand the weather conditions present during system-defining events we use a weather regimes approach. Weather regimes are recurring large-scale atmospheric circulation patterns that can be linked to surface weather and energy system impacts [14]. Previous work has shown weather regimes have predictability for energy applications out to a few weeks ahead [42], which is beneficial for energy system planning. Weather regimes are calculated from daily-mean October-March 500 hPa geopotential height (Z500) anomalies over the Euro-Atlantic region (90 […]), following the classification method of [43]. The first 14 Empirical Orthogonal Functions (EOFs) of the Z500 data are computed [44], which capture 89% of total data variance. The associated Principal Component time series (PCs) are used as inputs for the k-means clustering algorithm, with four clusters (which has previously been found to be the optimal number over the region [43]). Using the PCs of the Z500 data makes the problem significantly quicker to compute without losing useful information about the large-scale weather conditions. The four cluster centroids are: the positive and negative phases of the North Atlantic Oscillation, the Atlantic Ridge and Scandinavian Blocking (see Fig. 6(c)-(f) for visualisation of these). We then find the weather regime present during each system-defining event. Previous work has shown that although these patterns have some useful sub-seasonal predictability for energy applications, extreme events are not necessarily represented well by the cluster centroids [18]. Therefore, as well as finding the regime number during each extreme event, we also calculate the pattern correlation between each day's Z500 anomaly and the centroid of the cluster that day is assigned to.
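The pattern correlation used to judge how well a day is represented by its assigned regime can be sketched as a (optionally weighted) Pearson correlation between two flattened anomaly fields. Whether the original analysis applies area weighting is not stated here, so the `weights` argument is an assumption, and the six-point "fields" are made-up numbers.

```python
import math

def pattern_correlation(field_a, field_b, weights=None):
    """Weighted Pearson correlation between two flattened anomaly fields."""
    if weights is None:
        weights = [1.0] * len(field_a)  # assumption: unweighted grid points
    total = sum(weights)
    mean_a = sum(w * a for w, a in zip(weights, field_a)) / total
    mean_b = sum(w * b for w, b in zip(weights, field_b)) / total
    cov = sum(w * (a - mean_a) * (b - mean_b)
              for w, a, b in zip(weights, field_a, field_b))
    var_a = sum(w * (a - mean_a) ** 2 for w, a in zip(weights, field_a))
    var_b = sum(w * (b - mean_b) ** 2 for w, b in zip(weights, field_b))
    return cov / math.sqrt(var_a * var_b)

# Made-up Z500 anomalies (in metres) on a six-point "grid".
centroid = [-40.0, -10.0, 60.0, 90.0, 20.0, -30.0]
day_close = [-35.0, -5.0, 50.0, 100.0, 10.0, -20.0]
day_far = [30.0, -60.0, 10.0, -20.0, 50.0, -10.0]

r_close = pattern_correlation(day_close, centroid)
r_far = pattern_correlation(day_far, centroid)
```

A day such as `day_far` would still be assigned to some regime by the clustering, but its low pattern correlation signals that the centroid describes it poorly.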

K-means clustering of system-defining events
In addition to weather regimes defined in terms of 500 hPa geopotential height anomaly representing mid-troposphere dynamics, we also study near-surface weather data during extreme events. These near-surface data better represent the weather conditions present near the power system impacts. For each system-defining event, hourly gridded 2m temperature and 10m wind speeds are taken for the region described in Section 2.1.1. This gives 5615 hours (∼ 233 days) of data. We then perform another k-means clustering, similar to the method of [43] and applied above to Z500 data (see Section 2.4).
Temperatures and wind speeds are first normalised by their 1980-2021 daily climatologies (by both mean and standard deviation, to allow both fields to be comparable). The data are then converted into principal components (the first 14 are kept, explaining 56% of the total variance). These principal components are then grouped into four clusters using the k-means algorithm. Four was identified as the optimal number of clusters using the silhouette score (commonly used to determine the optimal cluster number for k-means algorithms). There was no obvious elbow present when using the elbow method (not shown). The cluster centroids can then be analysed and compared to more traditional methods as in Section 2.4.
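The choice of cluster number via the silhouette score can be sketched as follows. This is a minimal pure-Python illustration rather than the code used in the study: the normalisation and principal component steps are omitted, the two-dimensional toy points stand in for principal component scores, and the deterministic farthest-point initialisation is an assumption made for reproducibility.

```python
import math

def kmeans(points, k, n_iter=20):
    """Lloyd's algorithm with farthest-point initialisation (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def silhouette_score(points, labels):
    """Mean silhouette coefficient over all points (higher is better)."""
    scores = []
    for i, p in enumerate(points):
        mean_dist = {}
        for c in set(labels):
            others = [q for j, q in enumerate(points) if labels[j] == c and j != i]
            if others:
                mean_dist[c] = sum(math.dist(p, q) for q in others) / len(others)
        own = labels[i]
        if own not in mean_dist:  # singleton cluster: silhouette set to 0
            scores.append(0.0)
            continue
        a = mean_dist[own]
        b = min(d for c, d in mean_dist.items() if c != own)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

# Toy "principal component scores": three well-separated groups of four points.
points = [(0, 0), (1, 0), (0, 1), (1, 1),
          (10, 0), (11, 0), (10, 1), (11, 1),
          (0, 10), (1, 10), (0, 11), (1, 11)]
scores = {}
for k in (2, 3, 4):
    labels, _ = kmeans(points, k)
    scores[k] = silhouette_score(points, labels)
best_k = max(scores, key=scores.get)
```

For the toy data the score peaks at three clusters, mirroring how four clusters were selected for the real near-surface fields.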

Validation using load shedding as indicator for difficulty
An alternative approach to capture the adequacy of the power system is to measure load shedding (unmet demand) in a fixed power system design. In the context of net-zero scenarios, we can first obtain a power system design from a capacity expansion model, and then subject that design to a dispatch optimisation with different inputs in order to measure potential load shedding. In our case, we run a capacity expansion model with one weather year $y_1$, and perform a dispatch optimisation over a different weather year $y_2$. Periods of system stress in weather year $y_2$ can then be recognised by high load shedding in this dispatch optimisation. We perform this cross-year dispatch optimisation for all 1600 combinations of $y_1, y_2 \in \{1980/81, \dots, 2019/20\}$ and average the load shedding profiles for each weather year to obtain time series comparable to those derived from electricity shadow prices. Calculating the average load shedding based on the out-of-sample weather years relies on the optimal networks (or some other network assumptions) and is computationally more expensive than the approach of Section 2.3.
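The averaging step can be sketched as below, assuming the load shedding time series from each dispatch optimisation is already available. The dictionary layout keyed by (design year, operation year) and the three-hour toy series are hypothetical; the optimisations themselves are not shown.

```python
from collections import defaultdict

def average_load_shedding(shedding):
    """Average hourly load shedding per operation year over all design years.

    shedding maps (design_year, operation_year) to a list of hourly lost
    load in MWh (hypothetical layout).
    """
    by_operation_year = defaultdict(list)
    for (design_year, operation_year), series in shedding.items():
        by_operation_year[operation_year].append(series)
    return {
        year: [sum(hour) / len(runs) for hour in zip(*runs)]
        for year, runs in by_operation_year.items()
    }

# Toy example with two weather years and three hours each (made-up numbers).
toy = {
    ("1980/81", "1980/81"): [0.0, 0.0, 0.0],
    ("1981/82", "1980/81"): [0.0, 4.0, 2.0],
    ("1980/81", "1981/82"): [6.0, 0.0, 0.0],
    ("1981/82", "1981/82"): [0.0, 0.0, 0.0],
}
profiles = average_load_shedding(toy)
```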

Results
Traditionally, power grids and generation stock have been designed around fossil fuels which could act as dispatchable generators, especially during peak demand.
With increased reliance on variable renewables and balancing via transmission and energy storage, this paradigm breaks down. In particular, the most critical events to system design extend beyond a single hour or day, and identifying such periods no longer depends only on weather data but also power system parameters including storage and transmission [7,45,46,20].

Table 1: The three approaches compared in this study (columns: Approach, Underlying method, Description). Also see Fig. 1 for a visualisation of the workflow.
Figure 1: An overview of the workflow and the three approaches we compare in this study. For a definition of the approaches, see Table 1.
We propose a re-orientation to studying power system stress through system-defining weather events (see Table 1 and Fig. 1). Electricity shadow prices reveal which time periods cause additional infrastructure investments (Section 2.3) and determine an hourly total electricity cost (Fig. 2) whose yearly sum is the total annual value of electricity in the model. The total annual value of electricity is closely linked to the total system cost (differing only because of existing infrastructure), which is dominated in this model by investment costs (especially as renewables are optimised from scratch; see Fig. S1).

Characteristics of periods driving system design
We find that on average across 40 weather years, the single most expensive day in each year accounts for 12.4% (6.6-31.3%) of total yearly electricity cost, whereas 19 weather years contain a three-week period accruing more than 50% of total electricity cost (Fig. S2). This heterogeneity of events calls into question the use of representative periods or time slices in energy systems modelling. Moreover, we find large variations between different weather years, with the single most expensive week explaining between 18% and 77% of total respective electricity costs. For context, the total yearly electricity costs (that also include the value of existing infrastructure) range from 216 to 330 billion EUR depending on the weather year.
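Shares of this kind can be computed with a sliding window over the hourly cost series, as in the following sketch. It assumes that "most expensive day" or "week" means any contiguous window of the given length rather than a calendar day or week, which may differ from the definition used for Fig. S2; the eight-hour series is made up.

```python
def max_window_share(hourly_cost, window_hours):
    """Share of the total cost accrued in the most expensive contiguous window."""
    total = sum(hourly_cost)
    window = sum(hourly_cost[:window_hours])
    best = window
    for t in range(window_hours, len(hourly_cost)):
        # Slide the window forward by one hour.
        window += hourly_cost[t] - hourly_cost[t - window_hours]
        best = max(best, window)
    return best / total

# Toy series: a two-hour spike dominates the total cost.
toy_cost = [1.0, 1.0, 10.0, 20.0, 1.0, 1.0, 1.0, 1.0]
share = max_window_share(toy_cost, window_hours=2)
```

For a weather year of hourly data, `window_hours` would be 24 for a day, 168 for a week and 504 for three weeks.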
As introduced in Section 2.3, we define a system-defining event as accumulating costs exceeding 100 billion EUR in less than two weeks. We identify 32 such events which all happen between November and February (see Fig. 2 and Table S1). The events vary in length considerably (2-13 days), being 7 days long on average.
We find that meteorologically extreme single days [18,47,19] do not reliably identify system-defining events in individual weather years (Fig. S3). While such extreme days almost always lead to high shadow prices, these are not necessarily surrounded by a challenging enough period to have a large impact on system design (e.g. see the events in 1997/98, 2011/12 and 2012/13 from Bloomfield et al. [19], Figs. S3-4); the same also holds for week-long events (Fig. 2 and Figs. S5-8).
As opposed to methods considering only peak load or net load (i.e. peak mismatch between renewable generation and load) [17,18,19,20,23], using power system optimisation outputs to identify system-defining events takes the complex interactions between storage and transmission into account. Moreover, we need not make assumptions about the availability of storage and transmission in any particular region.

Origins of power systems stress events
In line with previous research, we find that power system stress occurs in the winter months when temperatures, wind and solar production are low in Europe [19,46,41].
Power systems based on renewables are primarily wind-dependent in the winter, especially in the northern latitudes [48], making them prone to "wind droughts". Using standard cost projections, we see annualised investments of 60.9 bn EUR in wind power (onshore and offshore), 28.4 bn EUR in solar power, 15.2 and 13.3 bn EUR in batteries and hydrogen storage respectively, and 18.4 bn EUR in transmission expansion (mean over 40 individual weather year optimisations; Fig. S1).
We find significant variations in the magnitude and location of stress triggers over Europe across the 32 system-defining events (e.g. Figs. S9-10). Still, all but one of the identified events are consistently driven by low wind power and high load anomalies (Fig. 3 (a)-(b)) when aggregating over the whole system. Moreover, we find that even though the low wind and high load anomalies during system-defining events are concentrated over certain regions, high shadow prices typically spread to the whole continent (Fig. 4). This is despite a modest maximum allowed transmission investment of 25% compared to the current-day grid value in the model. Only peripheral regions (northern Scandinavia and, to a lesser extent, the Iberian peninsula) have significantly lower shadow prices during some of the events; even then they are much higher than average.

Role of transmission and storage during system-defining events
While system-defining events can be caused by various meteorological conditions, the most severe events almost always impact the sizing of all power system components. Fig. 4 shows a representative example of a week-long system-defining event during December 2007. This period was caused by a high pressure system over central Europe causing a period of prolonged low wind as well as high heating load (Fig. 4 (a)-(b)). The event is identified as difficult by the spiking electricity shadow prices (shown by region in Fig. 4 (c) and over time in (d)).
To discern the roles of transmission and storage during this event, we consider the dual variables of the line capacity constraints and inter-hour storage energy level linking constraints respectively (see Section 2.2 and Supplementary Materials for details). While we see in Fig. 3 that the 40-year mean shadow price of congestion $\mu_{\ell,t}$ (for line $\ell$ and time step $t$) across the network is just below 2 EUR / MW, Fig. 4 (c) shows that $\mu_{\ell,t}$ reaches event-average values above 1000 EUR / MW for individual lines. This demonstrates that the event in question is a major factor in driving transmission expansion; in fact, some 39% of the total annual network congestion rent for the 2007/08 network was gained during the week in Fig. 4. There is significant congestion between continental Europe on one hand and Scandinavia and the British Isles on the other hand, with significant wind- and hydropower supplied from these regions. The transmission grid is well-connected enough to avoid extreme price spikes in the affected regions.
The value of stored hydrogen energy around the December 2007 event in Fig. 4 (d) reaches a maximum during the event, but as the marginal electricity prices are higher still, the entire hydrogen storage reserves in the network are discharged. This particular system-defining event was preceded by a week of already high prices and high values of stored energy, during which not all hydrogen storage was able to fill up in anticipation of the main event. Other weather years contain meteorologically distinct system-defining periods up to several weeks apart that are nonetheless connected by sustained high values of storage in the interim. This underlines the temporal interdependence of power system dynamics when storage is included, meaning that periods of system stress cannot be studied as isolated events.

Comparison to the traditional relationship between climate and power systems
Composites of the normalised surface weather conditions observed during each of the 32 events from Approach 2 (Table 1) are shown in Fig. 5 (a)-(b). The events are defined by high pressure systems over Central Europe and the North Sea region (where the capacity expansion model mainly builds wind power), resulting in cold temperatures and low wind speeds. This is similar to the synoptic situations [17,18,26] seen using Approach 1.
Figure 2: An overview of all identified system-defining events in the context of daily system cost. Additionally the week with the highest net load for each year is marked (Approach 1 in Table 1). Only winter months are shown as shadow prices are consistently low during the summer. All costs are in 2013 EUR, but derive from model shadow prices, not actual market prices.
Within Fig. 5 […] are present. Performing k-means clustering on the normalised hourly near-surface temperature and wind speed fields over the 32 events to isolate key weather patterns of interest (see Section 2.5) gives the four clusters shown in Fig. 5 (c)-(j). All include high pressure centres over parts of Europe and low winds over the North Sea. However, each cluster has very different spatial patterns of surface temperature anomalies, which are not seen in studies neglecting transmission and storage constraints [18,19]. Future work will investigate if these conditions are unique to system-defining events, or if it is possible to also have these anomalous weather conditions at times of low power system stress.
If instead each day is assigned to one of the more traditional Euro-Atlantic weather regimes from Cassou [43], we see a high frequency of Scandinavian blocking (54%), which is over double the 25% seen climatologically. We also see over four times fewer instances of NAO+ (Fig. 6). Generally the pattern correlation between each day's weather and the assigned cluster is low (Fig. 6), particularly when a day is assigned to NAO+ or the Atlantic ridge. Fig. 6 (g) shows the 500 hPa geopotential height composite for all of the system-defining events. This explains the higher prevalence of Scandinavian blocking events (Fig. 6 (d)) but, importantly, the system-defining events resemble a fusion between the high pressure centre from the Scandinavian blocking pattern and the low pressure region from the NAO− pattern. Fig. 7 shows the temporal evolution of the weather regime categorisation from Fig. 6 over each event. The figure is centred around the peak day of each event, which is the day containing the single most expensive hour of the event. It is interesting to note that the peak day can be at any point during the extreme event, and that the weather regime present during an extreme event is often quite persistent. Both of these are interesting points for future work. The results in this section motivate the need for more bespoke approaches to extreme energy days [49,50].
When considering seasonal extremes, previous studies have shown strong correlations between the North Atlantic Oscillation (NAO) and national demand and wind power generation [51,52,17,53]. Winters with a negative NAO index have weaker surface pressure gradients across Europe, leading to colder, stiller conditions and higher seasonal demands. Fig. 8 (a) shows positive correlation between the October-March NAO index and European mean wind capacity factor ($r = 0.52$), with similarly strong negative correlations seen for NAO index and European mean load (Fig. 8 (b)). Significant correlation is also found when costs of electricity (between October and March) are considered ($r = −0.42$). Winters with a negative NAO index generally exhibit higher costs (Fig. 8 (c)). However, there are times where a high cost can happen in a mild winter. For instance, January 1997 (Figs. S12-13) experienced a low-wind cold snap driving high system costs; a very anomalous event compared to the rest of the season.
Fully modelling transmission and storage constraints can lead to a different characterisation of the most challenging winters for power system operation than seen in studies entirely based on meteorological input variables. An overview of all events can be found in Table S1. This is particularly important when considering the subseasonal to seasonal prediction of extreme energy events.

Validation of system-defining events
We validate our approach through load shedding (or lost load), which is a commonly used measure of power system adequacy [54,55,50,9]. Load shedding can be measured in dispatch optimisations of fixed power system designs, whereas capacity expansion models avoid any load shedding by design.
To validate whether system-defining events align with periods of high load shedding, we calculate for each weather year $w_j$ the hourly average load shedding in the dispatch optimisations of the power system designs obtained from weather years $w_i$, $i \in \{1980/81, \dots, 2019/20\}$, operating over year $w_j$ (a total of 40 dispatch optimisations per weather year). See Section 2.6 and Supplementary Materials for details. We find that all but one of the system-defining events overlap with the week-long periods of highest load shedding in the weather year they occurred in.
In any year, system-defining events tend to be those with high load shedding; either method can be used to identify power system stress. Crucially, both shadow prices and load shedding agree on extreme events that are different from those of Approach 1 (Table 1) based only on net load (Figs. S14-S17). This highlights yet again the importance of detailed power systems modelling (also required for computing load shedding) in identifying weather stress events.
Arriving at load shedding data takes an additional step (possibly on top of Approach 2): first obtaining one or several system designs and then running them in dispatch mode to reveal load shedding. The latter approach also entails additional assumptions: one has to choose which input scenarios to use for capacity expansion steps and dispatch steps respectively.
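The validation logic described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's code: load shedding is assumed to be already summed over nodes for each design, the function names are hypothetical, and the toy series uses 12 "hours" with a 3-hour window standing in for a year and a week.

```python
def average_load_shedding(shedding_by_design):
    """Hourly load shedding averaged over all system designs.

    shedding_by_design: one list of hourly totals per design (equal lengths).
    """
    n_designs = len(shedding_by_design)
    n_hours = len(shedding_by_design[0])
    return [sum(design[t] for design in shedding_by_design) / n_designs
            for t in range(n_hours)]


def highest_window(series, window):
    """Start index of the contiguous window with the largest sum (first on ties)."""
    current = sum(series[:window])
    best_sum, best_start = current, 0
    for start in range(1, len(series) - window + 1):
        # Slide the window by one step: add the new hour, drop the old one.
        current += series[start + window - 1] - series[start - 1]
        if current > best_sum:
            best_sum, best_start = current, start
    return best_start


def overlaps(start_a, end_a, start_b, end_b):
    """True if the closed intervals [start_a, end_a] and [start_b, end_b] intersect."""
    return start_a <= end_b and start_b <= end_a


# Toy data: three hypothetical designs operated over the same 12 hours.
shedding = [
    [0, 0, 0, 0, 1, 2, 1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 3, 3, 0, 0, 0, 0, 0],
    [0] * 12,
]
mean_shedding = average_load_shedding(shedding)
window_start = highest_window(mean_shedding, window=3)
event = (5, 6)  # hypothetical system-defining event (start hour, end hour)
print(window_start, overlaps(window_start, window_start + 2, *event))
```

With real data the window would span 168 hours and the check would be repeated for every system-defining event in each weather year.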

Discussion & Conclusions
In this study we investigate difficult weather events for power systems through an integrated approach combining meteorology with power systems modelling. To improve resilience against weather extremes, we show that it is not enough to look at meteorological variables alone (Approach 1), but we also need to include a detailed representation of future, to-be-designed energy systems (Approaches 2 and 3). We propose identifying system-defining weather periods as those being the main drivers of investments; such periods are defined by high electricity shadow prices in a power systems model. As this approach builds directly on modelling outputs, it is free of assumptions on specific characteristics of extreme events.
We find that risk factors like persistent low temperatures and low wind align well with previous literature [56,22,21]; however, conventional meteorological analysis does not reliably identify the most severe periods for future power systems. In particular, challenging periods for the integrated European network vary in duration and are characterised by transmission and storage interactions over time, not only extreme weather. Isolated regional studies are not sufficient, as the vast majority of the continent experiences uniformly high shadow prices during all system-defining events. To reliably predict future energy system stress events, traditional meteorological classifications [43,56,18] are not enough, and more detailed knowledge of surface weather impacts on power systems is needed [14,50].

Figure 6: (a) Relative frequency of Euro-Atlantic weather regimes [43] during system-defining events (with the solid dot marking the overall 40-year relative frequency of each regime), (b) pattern correlation between the daily 500 hPa geopotential height anomaly from the 32 system-defining events and the four Euro-Atlantic weather regimes, (c)-(f) 500 hPa geopotential height (Z500) anomaly composites for the Euro-Atlantic weather regime cluster centroids, (g) Z500 anomaly composite during the 32 system-defining events.
Since our approach is based on single-year optimisations resulting in different system designs for different weather years, electricity shadow prices and thus severity of events are not directly comparable across weather years. This limitation can be addressed by using load shedding (Approach 3 in Table 1) instead of electricity shadow prices to identify extreme events. However, our validation shows that the load shedding and shadow price approaches agree on the most severe events in each individual weather year. Computing load shedding is also more computationally expensive and involves more assumptions, requiring a two-step process.
Restricting our analysis to events shorter than two weeks, we capture significant fractions of total electricity cost, but do not capture the full chain of cascading compound events. A complete understanding of how seasonal weather relates to total annual system cost (beyond the partial correlation with the NAO index) is still elusive. Perfect foresight also limits the ability of our model to react realistically to multi-week or longer events. On the other hand, our analysis also does not focus on very brief events. Further analysis over a variety of event lengths, both longer and shorter, would be beneficial.
An interesting extension of this study would be the inclusion of sector coupling: electrification of heating strengthens the impacts of heating load, and the inclusion of more sectors could lead to different dynamics than in the power sector alone. Still, low wind generation will be key in years to come due to higher penetration of renewable technologies. With ever-improving climate models, these methods could be applied to climate model projections, as system insights based on weather from the 1980s might not necessarily be transferable to mid-century systems under climate change.
The question of pinning down what makes certain weather years difficult (in terms of system costs) remains complicated and computationally expensive; the main part of investments throughout the years is driven by a few short-lived and severe events. Our classification can help meteorologists, transmission system operators and long-term system planners to develop early warning systems and resilience strategies for these events. It is worth remembering that current systems usually struggle with high load, but that these risks and coping mechanisms will shift towards supply issues when renewable production dominates. A good understanding of the anatomy of such events will help in risk assessments including frequency and severity under climate change, crucial for ensuring system adequacy.
Our flexible approach can be applied to other contexts beyond this European case study and shows that rigid assumption-based analyses within one discipline do not suffice for challenges the world is facing. Our approach exploits inherent information from existing models and unites perspectives from linear optimisation, energy modelling, and meteorology to enhance the understanding of how more resilient future energy systems can be planned. Without interdisciplinary studies with state-of-the-art power system models and meteorological data, progress in researching and implementing renewable energy systems cannot be made.

Code and data availability
The code to reproduce the results of the present study, as well as links to the data used, are available at https://github.com/koen-vg/stressful-weather/tree/v0. All code is open source (licensed under GPL v3.0 and MIT), and all data used are open (various licenses).

A Additional methodological details

A.1 Energy system optimisation models and dual variables
Many energy system optimisation models (such as PyPSA-Eur) are formulated as a linear program, which means they have a linear objective and linear constraints. This formulation gives rise to a dual problem. We are interested in the dual variables that stem from the nodal energy balance constraints for every time step $t$ and node $n$; see equation (12) in Brown et al. [32]. These constraints ensure that supply meets a given inelastic electricity demand at each hour and node, and following Brown et al. [32] we denote their respective dual variables $\lambda_{n,t}$ (also known as marginal or shadow prices). By definition, $\lambda_{n,t}$ is the rate of change of the objective function, here total system costs, with respect to demand at node $n$ and time $t$. More usefully, $\lambda_{n,t}$ (given in EUR/MWh) can be interpreted as the marginal electricity price at each node and time step. Letting $d_{n,t}$ be electricity demand, $\lambda_{n,t} \cdot d_{n,t}$ is the cost of satisfying electricity load at node $n$ and hour $t$. It follows that $\sum_{n,t} \lambda_{n,t} \cdot d_{n,t}$ is the total cost of electricity over the entire modelling horizon.
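To make the interpretation of a shadow price as "the rate of change of the objective with respect to demand" concrete, here is a small toy sketch. It is not the PyPSA-Eur formulation: it assumes a single node and hour, fixed capacities, and a merit-order dispatch, and it estimates the marginal price by a finite difference rather than reading a dual variable from a solver. All generator values are hypothetical.

```python
def dispatch_cost(demand, generators):
    """Least-cost dispatch for one node and one hour.

    generators: list of (capacity in MW, marginal cost in EUR/MWh).
    """
    remaining = demand
    cost = 0.0
    # Use the cheapest generators first (merit order).
    for capacity, marginal_cost in sorted(generators, key=lambda g: g[1]):
        used = min(capacity, remaining)
        cost += used * marginal_cost
        remaining -= used
        if remaining <= 0:
            break
    if remaining > 1e-9:
        raise ValueError("demand exceeds total capacity")
    return cost


def shadow_price(demand, generators, step=1e-3):
    """Finite-difference estimate of d(cost)/d(demand) in EUR/MWh."""
    higher = dispatch_cost(demand + step, generators)
    return (higher - dispatch_cost(demand, generators)) / step


# Hypothetical fleet: free renewables, a mid-cost plant, a very costly backup.
generators = [(50.0, 0.0), (30.0, 40.0), (20.0, 3000.0)]
for demand in (40.0, 60.0, 90.0):
    print(demand, round(shadow_price(demand, generators), 2))
```

The price jumps each time demand exhausts a cheaper resource, which mirrors on a small scale why shadow prices stay low most of the time and spike in a few hours. In a capacity expansion model the spikes additionally reflect investment costs, which this dispatch-only toy does not capture.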
It should be noted that the marginal prices $\lambda_{n,t}$ typically do not follow the same profile as real electricity market prices; this is due to the inclusion of capacity expansion in our model. As a result, $\lambda_{n,t}$ is driven not only by the marginal operating costs of power plants, as in free electricity markets, but mainly by conditions triggering investments. Thus, the shadow prices $\lambda_{n,t}$ typically stay very low most of the time, and increase drastically during periods necessitating additional investment in generation, storage and transmission capacity. Nonetheless, $\sum_{n,t} \lambda_{n,t} d_{n,t} / \sum_{n,t} d_{n,t}$ gives a good indication of the system-average electricity price resulting from the model.
In a simple greenfield capacity expansion model, with no included existing infrastructure, the total cost of electricity $\sum_{n,t} \lambda_{n,t} d_{n,t}$ (plus the shadow cost of emissions in case of a global emission constraint) is equal to the objective value of the optimisation problem; this follows from strong duality for linear programs. Since our model includes existing transmission, hydropower, nuclear and biomass generation infrastructure whose costs are not included in the objective function, the objective value is lower than the total electricity cost. Still, $\sum_{n,t} \lambda_{n,t} d_{n,t}$ is a good indicator for total system cost.
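The two aggregate quantities used here, total electricity cost and the demand-weighted system-average price, can be computed as in the following sketch. The dictionary layout keyed by (node, hour) and the numbers are assumptions for illustration only.

```python
def total_electricity_cost(prices, demand):
    """Sum of price * demand over all (node, hour) pairs, in EUR."""
    return sum(prices[key] * demand[key] for key in demand)


def system_average_price(prices, demand):
    """Demand-weighted mean electricity price in EUR/MWh."""
    return total_electricity_cost(prices, demand) / sum(demand.values())


# Hypothetical shadow prices (EUR/MWh) and demand (MWh) for two nodes, two hours.
prices = {("A", 0): 10.0, ("A", 1): 500.0, ("B", 0): 20.0, ("B", 1): 400.0}
demand = {("A", 0): 100.0, ("A", 1): 120.0, ("B", 0): 50.0, ("B", 1): 80.0}

print(total_electricity_cost(prices, demand))
print(round(system_average_price(prices, demand), 2))
```

In this toy case the second hour accounts for almost all of the cost, which is the kind of concentration that the identification of system-defining events relies on.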

A.2 Transmission congestion and value of stored energy
For each transmission line $\ell$, the electricity flow $f_{\ell,t}$ over that line at time $t$ is subject to the constraints $f_{\ell,t} \geq -F_\ell$ and $f_{\ell,t} \leq F_\ell$, where $F_\ell$ is the capacity of the line in MW and the sign determines the direction of the flow. The dual variables $\mu^{\text{lower}}_{\ell,t}$ and $\mu^{\text{upper}}_{\ell,t}$ to these constraints are called the shadow prices of congestion. The capacity-weighted sum $\sum_\ell (\mu^{\text{upper}}_{\ell,t} + \mu^{\text{lower}}_{\ell,t}) F_\ell$ is the congestion rent of the network, and equal to the surplus gained by the transmission grid at time $t$ [58]. This way we can judge whether certain periods are decisive for the transmission expansion decisions.
Similarly, constraints preserving the state of charge from one hour to the next give rise to dual variables which can be interpreted as the marginal value of stored energy, with each storage unit discharging if and only if its value of stored energy is below the marginal price of electricity at the network node it is connected to [59,60]. It should be noted that these considerations can be a useful indicator for locating crucial regions.
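A congestion rent for a single hour could be computed along the following lines. This is a sketch under assumptions: both shadow prices of each line are taken to be reported as non-negative numbers (sign conventions differ between solvers), the rent is taken as the capacity-weighted sum of the two shadow prices, and the line names and values are hypothetical.

```python
def congestion_rent(shadow_prices, capacities):
    """Capacity-weighted sum of congestion shadow prices for one hour.

    shadow_prices: {line: (mu_upper, mu_lower)}, assumed non-negative.
    capacities: {line: capacity in MW}.
    """
    return sum((mu_upper + mu_lower) * capacities[line]
               for line, (mu_upper, mu_lower) in shadow_prices.items())


# Hypothetical lines: one congested in each direction, one uncongested.
shadow_prices = {"L1": (30.0, 0.0), "L2": (0.0, 0.0), "L3": (0.0, 12.5)}
capacities = {"L1": 1000.0, "L2": 500.0, "L3": 200.0}

print(congestion_rent(shadow_prices, capacities))
```

Only congested lines contribute, since an uncongested line has zero shadow price in both directions.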

A.3 Selection of system-defining events
Recall that an event starting at $t_0$ and lasting for $T$ hours is considered system-defining if $\sum_{t=t_0}^{t_0+T-1} \sum_n \lambda_{n,t} d_{n,t} \geq C$ for $C = 100$ bn EUR and $T \leq 336$ (the number of hours in two weeks).
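The threshold criterion can be applied to an hourly cost series with prefix sums, as in this minimal sketch. The function name, the tuple layout and the toy numbers are assumptions; in the study the threshold would be 100 bn EUR and the maximum length 336 hours.

```python
def system_defining_candidates(hourly_cost, threshold, max_hours):
    """All periods of at most max_hours whose total cost reaches the threshold.

    Returns (start, end, cost) tuples with inclusive hour indices.
    """
    # prefix[k] holds the total cost of hours 0 .. k-1.
    prefix = [0.0]
    for cost in hourly_cost:
        prefix.append(prefix[-1] + cost)

    candidates = []
    for start in range(len(hourly_cost)):
        last_end = min(start + max_hours, len(hourly_cost))
        for end in range(start + 1, last_end + 1):
            total = prefix[end] - prefix[start]
            if total >= threshold:
                candidates.append((start, end - 1, total))
    return candidates


# Toy series: hourly system-wide cost with one expensive spell.
hourly_cost = [1, 1, 60, 50, 1, 1]
print(system_defining_candidates(hourly_cost, threshold=100, max_hours=3))
```

As the toy output shows, several overlapping periods usually qualify at once, which is why a further selection step is needed.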
A priori, many overlapping time periods of the same or different lengths can attain the above thresholds. For example, if the period $[t_0, t_1]$ is system-defining and strictly shorter than two weeks, then $[t_0, t_1 + 1]$ is also system-defining. For the purposes of this study, we select a disjoint subset of all system-defining events. In particular, we build up the subset iteratively by going through system-defining events from shorter to longer events (and in decreasing order of total electricity cost for events of the same length), and only adding each event to the selected subset if it does not overlap with previously selected events. This corresponds to imposing a partial order on all system-defining events by defining $E_1 < E_2$ if and only if $E_1$ and $E_2$ overlap and $E_1$ is shorter than $E_2$ or, if of the same length, is more expensive; our selected subset consists of the minimal elements of the resulting partially ordered set.
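The iterative selection described above can be sketched as a greedy pass over sorted candidates. The tuple layout (start, end, cost) with inclusive ends and the example values are assumptions for illustration.

```python
def select_disjoint(events):
    """Greedy selection of non-overlapping events.

    events: list of (start, end, cost) with inclusive hour indices.
    Shorter events are considered first; among events of equal length,
    the more expensive one is considered first.
    """
    ordered = sorted(events, key=lambda e: (e[1] - e[0], -e[2]))
    selected = []
    for start, end, cost in ordered:
        # Keep the event only if it lies entirely before or after each chosen one.
        if all(end < s or start > e for s, e, _ in selected):
            selected.append((start, end, cost))
    return sorted(selected)


# Toy candidates: three overlapping periods and one separate period.
events = [(1, 3, 111.0), (2, 3, 110.0), (2, 4, 111.0), (10, 12, 150.0)]
print(select_disjoint(events))
```

Here the shortest of the three overlapping candidates is kept and the other two are discarded, while the separate period is unaffected.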
As a final step, we extend the selected events on either side as long as this does not decrease event-average hourly electricity cost. Thus, for the left side of each event, we extend from $[t_0, t_1]$ to $[t_0 - 1, t_1]$ as long as $\sum_n \lambda_{n,t_0-1} d_{n,t_0-1} \geq \frac{1}{t_1 - t_0 + 1} \sum_{t=t_0}^{t_1} \sum_n \lambda_{n,t} d_{n,t}$. The right side of the events is extended similarly.
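The extension step can be illustrated as follows. This is a sketch under assumptions: the left side is extended first and the running average is updated after every added hour, an ordering the text does not specify; the cost series is a hypothetical system-wide hourly cost.

```python
def extend_event(hourly_cost, start, end):
    """Grow [start, end] (inclusive) while the event-average hourly cost does not fall."""
    total = sum(hourly_cost[start:end + 1])
    length = end - start + 1
    # Adding an hour keeps the average from falling exactly when its cost
    # is at least the current average, i.e. cost * length >= total.
    while start > 0 and hourly_cost[start - 1] * length >= total:
        start -= 1
        total += hourly_cost[start]
        length += 1
    while end < len(hourly_cost) - 1 and hourly_cost[end + 1] * length >= total:
        end += 1
        total += hourly_cost[end]
        length += 1
    return start, end


# Toy series: the selected event covers hours 2 to 3.
hourly_cost = [1, 70, 60, 50, 65, 1]
print(extend_event(hourly_cost, 2, 3))
```

The comparison is written with a multiplication rather than a division so that integer inputs are handled without rounding.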

A.4 Validation using load shedding
To compute load shedding profiles to compare to shadow prices, we fix system designs $N_i$, each obtained by a capacity expansion based on a weather year $w_i$, $i \in \{1980/81, \dots, 2019/20\}$ (running from July to June so that winters are preserved), and optimise the dispatch of $N_i$ year-by-year with all weather years $w_j$, $j \in \{1980/81, \dots, 2019/20\}$. The forty initial optimisations lead to different electricity networks with large discrepancies in total system costs (as in Grochowicz et al. [9]) and are often inadequate for weather conditions that are not represented in the inputs. Keeping the capacities of $N_i$ fixed, we add an artificial generator at each node $n$ which can supply electricity at very high variable (and no capital) costs if demand cannot be met any other way. The power supplied by this artificial generator, $l_{n,t}$, can be interpreted as load shedding and quantifies the extent and times during which the system fails to meet demand.
For each weather year $w_j$, we compute the average load shedding $\bar{l}$ across all 40 system designs $N_i$ (although $N_j$ cannot have any load shedding for $w_j$ by the model formulation), thus obtaining values for each time step between July 1980 and June 2020: $\bar{l}_t = \frac{1}{40} \sum_i \sum_n l^i_{n,t}$, where $l^i_{n,t}$ is the load shedding at node $n$ when the system design $N_i$ is operated at time $t$. One advantage of using load shedding over electricity shadow prices is that the latter may suffer from "overshadowing" effects. Since shadow prices indicate events triggering investment, one event might overshadow another in the same weather year if one is slightly more severe than the other but similar otherwise, thus triggering investments (leading to high shadow prices) that render the second event benign. We see limited evidence of this in Fig. S16 (comparing electricity shadow prices and load shedding), but shadow prices and load shedding match well for the most severe events (Figs. S12-15).

Figure S11: An overview over all identified system-defining events in the context of daily system cost. Apart from the costs we also plot average load shedding (as in Methods). The marked "difficult days" are from [19]. Only winter months are shown as shadow prices are consistently low during the summer. All costs are in 2013 EUR.
Figure S12: An overview over all identified system-defining events in the context of daily system cost. Apart from the costs we also plot average load shedding (as in Methods). Additionally the week with the highest net load for each year is marked (Approach 1 in Table 1). Only winter months are shown as shadow prices are consistently low during the summer. All costs are in 2013 EUR.

Table S2: Key metrics for all identified system-defining events. The anomalies (to the mean for 1980-2020) for wind power production, solar production, and load are hourly averages in GW, and the values for transmission and the different storage technologies are hourly averages for shadow prices of congestion (in EUR/MW) and value of stored energy (in EUR/MWh).

Figure 3: A summary of key metrics compared to 40-year means. Each dot represents the mean value of the metric in question over one system-defining event. From left to right: (a) renewable production deviation from 40-year mean at the time of each event, (b) load deviation from 40-year mean at the time of each event, (c) mean shadow price of transmission congestion during each event, (d) mean value of stored energy for each event. An overview over all events can be found in Table S1.

Figure 4: System-defining events are the result of an interplay of low renewable availability, high load, storage constraints and transmission congestion. Inputs in the top row, comparable to a usual meteorological approach (Approach 1). System variables in the bottom row. (a) Average weather in Europe over the example event. Note the wind speed anomalies over the North Sea region and the temperature anomalies in Central Europe in Fig. S11. (b) Time series of wind power production and electricity load around the highlighted event (smoothed with rolling averages of 24 hours). The dashed lines show seasonality deduced from the period 1980-2020. (c) Network map of the European power system with the edge widths showing shadow prices of congestion and the regions shaded with the average electricity price during the event. (d) Time series of electricity prices, value of hydrogen storage (with logarithmic scales), and the hydrogen storage level around the highlighted event (all network averages). All costs are in 2013 EUR.

Figure 7: Daily evolution of the predominant weather regime during each system-defining event (see Table S1). The events are centred around the peak day, which is the day containing the single most expensive hour of the event. If the association of a day to a weather regime is not statistically significant, it is shown with high transparency.

Figure 8: The relationship between October-March mean North Atlantic Oscillation (NAO) index and October-March (a) European mean onshore wind capacity factor, (b) total European net load, and (c) total costs of electricity (all between October and March). The year with the highest costs accrued between October and March (1996/97) is marked with green in (a)-(c). R values show the Pearson correlation coefficient between variables. Similar results are seen for individual countries (not shown).

Figure S10: Total electricity costs of most expensive contiguous periods as a function of period length across different weather years. For instance, the vertical slice of the graph at a period length of 1 week shows that the single most expensive week ranges in cost between about 40 and 200 bn EUR. The thick black line segment shows the cutoff that was used to identify system-defining events: periods of at most 2 weeks having a total electricity cost of at least 100 bn EUR. The weather years without system-defining events correspond to the curves that do not intersect the cutoff line. Note that the cutoff line on this graph cannot be used to identify multiple system-defining events in the same weather year.

Figure S13: Hourly and accumulated electricity costs across the weather years 1980-1990. System-defining events are marked along with two weather input-based filters: for each year the week with the highest electricity load ("Demand") and the week with the largest mismatch between electricity load and renewable production ("net load"). All values in bn EUR (2013).

Figure S14: Hourly and accumulated electricity costs across the weather years 1990-2000. System-defining events are marked along with two weather input-based filters: for each year the week with the highest electricity load ("Demand") and the week with the largest mismatch between electricity load and renewable production ("net load"). All values in bn EUR (2013).

Figure S15: Hourly and accumulated electricity costs across the weather years 2000-2010. System-defining events are marked along with two weather input-based filters: for each year the week with the highest electricity load ("Demand") and the week with the largest mismatch between electricity load and renewable production ("net load"). All values in bn EUR (2013).

Figure S16: Hourly and accumulated electricity costs across the weather years 2010-2020. System-defining events are marked along with two weather input-based filters: for each year the week with the highest electricity load ("Demand") and the week with the largest mismatch between electricity load and renewable production ("net load"). All values in bn EUR (2013).

Figure S21: Average weather in Europe over the example event in January 1997.

B.6 Load shedding provides an alternative method to shadow prices

Figure S23: Average load shedding (across all networks) for the weather years 1990-2000. System-defining events are marked.

Figure S24: Average load shedding (across all networks) for the weather years 2000-2010. System-defining events are marked.

Figure S25: Average load shedding (across all networks) for the weather years 2010-2020. System-defining events are marked.

Table 1: An overview of the three approaches we compare in this study. Approach 1 is commonly used in the literature. We introduce Approach 2 in this study (see also Sections 2.2 and 2.3) and validate it with Approach 3 (see Section 2.6).