Compound drought and hot events assessment in Australia using copula functions

The occurrence of compound drought and hot events has been shown to cause stronger socio-economic, environmental and health impacts than the isolated events. Moreover, the frequency of these compound events has increased unevenly throughout the world and is expected to keep increasing in several regions. In this work, an assessment of compound drought and hot events in the summer months in Australia was made, using copula functions. Drought and hot conditions were identified by the Standardized Precipitation Evapotranspiration Index (SPEI) and the indices Number of Hot Days (NHD) and Number of Hot Nights (NHN) for the summer months, respectively. We analysed drought conditions in the current and the previous 1 to 3 months and the periods 1950–1978 and 1979–2020. The results show that the conditional probability of the occurrence of hot events given drought conditions is very high for the concurrent month in most of the study area, reaching 0.9 in some cases. Considering previous drought conditions, the higher probabilities are obtained in the southeastern region in December and in the north in February but, in most of the study area, these values are higher than for the case of non-drought conditions, pointing to an effect of previous drought conditions on hot events of up to 3 months. Moreover, an increased frequency of compound drought and hot events from the first to the second period was identified in more than half of the study area for lags of 1 and 2 months. We show that, although the conditional probabilities are mostly higher when computed with NHD, NHN is also affected by drought conditions, and should also be considered in this analysis, since nights can have a relieving contribution when impacts in health and wildfires are being analysed.

A relationship between drought conditions and hot events has been identified in several regions of the world [26,27], such as China [28,29], the Mediterranean basin [30,31], Brazil [32], the U.S.A. [33], and Australia [34][35][36][37].Moreover, an increase in the frequency of these compound events has also been identified [31-33, 38, 39], driven mainly by the increase in hot events [29,31,33,40].The frequency of these compound events is expected to continue increasing in the future, as pointed out by future projections for either low or highemission scenarios [41,42].Moreover, the extreme complexity of drought assessment is related to the difficulties associated with drought spatial assessment, namely identification, forecasting, mitigation, input parameter modelling, and effects of climate change [43].Additionally, drought conditions have been reported to contribute to enhanced hot events in some regions [39,44], such as during the heatwave of 2009 in southeastern Australia [45].Antecedent drought conditions have also been shown to be related to an increase in the number of hot days [26,46], and in some cases, the drought conditions occurred several months before the hot event, such as in Europe [30,46,47] and in Australia [3,37,48].Nonetheless, for the case of antecedent dry conditions, weather regimes have also been shown to play a role in the occurrence of hot days, namely in the amplification or inhibition of the soil moisture-temperature feedback [46,49,50], whereas for the case of antecedent wet conditions, no feedbacks are present [46].
In particular, in water-limiting climates, soil moisture and air temperature are related through latent and sensible heat.The decrease in soil moisture decreases evapotranspiration and, therefore, latent heat, which then increases sensible heat and consequentially an increase in air temperature [49].In turn, the increase in air temperature can increase evapotranspiration, further depleting soil moisture [49].Moreover, it has been shown that soil desiccation promotes temperature increases downwind, via heat advection [51].Therefore, heatwaves in Australia have been shown to be associated with large-scale circulation patterns and land surface conditions [52].The relationship between soil moisture and drought conditions makes drought indices good proxies for soil moisture, and these indices have been used in studies of the relationship between drought and hot events [26,30,35,37,39,47].
One of the challenges of studying extreme events is the fact that these events are, by definition, very rare, and therefore the number of observational samples is usually very small to allow the counting of threshold exceedances [53].Nonetheless, the use of copula functions allows to overcome this issue and further derive probabilities of one extreme event given the occurrence of another extreme event.Copulas can also be used to generate random data via sampling, preserving the dependence structure of the observations, thus increasing the number of samples of rare events [53,54].This method allows to perform a joint probability analysis of two variables, and to subsequently compute conditional probabilities of exceedance via empirical counting with a robust number of tail occurrences.For this matter, the ability of copulas to characterize the tail behaviour of a joint distribution (the amount of dependence in the corners of upper-right and lower-left quadrants) is of great importance when studying extreme events such as heatwaves and droughts.In particular, copula functions were used to evaluate compound dry and hot extremes [39,53] and to show that hot summers are more connected to drought than to non-drought conditions [47,55].
Although copulas have proved to be a very efficient and flexible tool to disclose important features of the dependence structure between two types of extreme events (e.g., non-linear relationships and tail dependence), other methods have been applied to explain the relationship between heatwaves and droughts.Namely, the study of the relationship between drought conditions and hot events in Australia has been made using a correlation analysis [26,[34][35][36][37], although copula functions have also been applied [39,40].Hot events occurring during the summer months were shown to be related to drought conditions occurring concurrently [34-37, 39, 48], but also in the previous months [26,34,37,48].Nonetheless, most of these studies considered the aggregated summer months [34][35][36]48] or only the hottest month [26,27].Recently, [37] showed that the correlation between drought conditions and hot events varies during the months of December to February, either for concurrent or for previous drought conditions, but no work has considered a monthly temporal resolution when using copula functions.Moreover, [40] identified an increased frequency of occurrence of compound drought and hot events in the period 1990-2019, when compared to previous years, but annual or seasonal events were considered, and not events in the individual summer months.
The main goal of this work was to assess the joint probability of drought and hot extremes in Australia, during the months of December to February, using copula functions.The indices NHD and NHN were used to identify hot events, whereas drought conditions were identified with the drought index SPEI at several time scales.The probability of occurrence of hot events was assessed, given drought conditions in the concurrent and in the three previous months, and was compared to non-drought conditions.Two time-periods were analysed, 1950-1978 and 1979-2020, in order to assess if the relationship between these extreme events had changed.The results obtained with NHD and NHN were compared, to assess which was more affected.

Study area
The climate in Australia includes tropical, temperate, and arid classes, as defined by Koppen-Geiger classification [37], and the latter classes cover 77% of Australia.Following [37], the region classified as arid desert, which is around 51% of the country, was removed from the study, using the climate maps computed by [56].

Indices of extreme temperature
Several indices were proposed by the Expert Team on Climate Change Detection and Indices that allow to identify extreme events, such as extreme heat events [50,57].Among which are the indices Number of Hot Days (NHD) and Number of Hot Nights (NHN), which correspond to the number of days, per month, that exceeds the 90th percentile of daily maximum and minimum temperature, respectively [58,59].NHD and NHN were previously used in similar analyses in Australia and in other regions [26,30,36,37,47], and are considered suitable to this type of study.These indices were computed using temperature data from the ERA5 dataset [60] covering the period 1950-2020.This includes the ERA5 preliminary version of the data from 1950 to 1978.[61] compared heatwave trends computed with the station-based ACORN-SAT [62] dataset with two gridded products and found that the results are very different.Nonetheless, NHD and NHN were previously computed with both the ERA5 and ACORN-SAT datasets and correlated with SPEI by [37], and the results indicate a good agreement in terms of the correlation value.Hourly 2 m temperature data were retrieved, with a spatial resolution of 0.25°, and daily maximum and minimum temperature were calculated.NHD and NHN were computed with the Climpact package for R for the base period of 1950-2020.This package includes a resampling procedure that removes the inhomogeneities caused by the use of a base period [59].

Drought index
The Standardized Precipitation Evaporation Index (SPEI) [63] computed with the time scale of 1 month was found to be highly correlated with soil moisture in the study area [64] and was used as a proxy for soil moisture, following previous studies [30,37,47,55].Moreover, when correlated to NHD and NHN in the Mediterranean region, SPEI yielded similar spatial patterns and correlation values to the results of the Standardized Precipitation Index.It was computed using monthly precipitation and temperature data from the CRU TS4.05 dataset [65], which covers the period 1901-2020, with a spatial resolution of 0.5˚.This dataset has been widely used in similar studies [10,27,[30][31][32]66] and the correlation between precipitation and temperature data from this dataset with data from stations in the study area has been shown to be very high [65].The atmospheric evaporative demand needed to compute SPEI was estimated using the modified Hargreaves method [67].The water deficit was modelled with a log-logistic distribution [68].SPEI was computed for the period 1950-2020, with time scales of 1, 3, and 6 months, which include the effect of atmospheric water demand from the current month to the previous spring and winter.SPEI was resampled to match the resolution of ERA5, using a bilinear interpolation, since SPEI presents a low spatial variability, which will likely not be affected by this interpolation [69].

Copula functions
In this work we make use of copula functions [54,70,71] to estimate the joint distribution between the drought and extreme temperature indices and subsequently derive the probability of occurrence of extremely hot days and nights preceded by dry conditions.Sklar's theorem states that a joint probability distribution can be decomposed into two parts: the univariate marginal distribution functions and a copula which describes the dependence between the margins [70].Given two correlated variables, X and Y, uniform on the interval [0,1], a copula function C links the marginal distributions F X (x) and F Y (y) to their joint probability F XY (x,y), as follows: Among a suite of available parametric copula families, here we have fitted a selection of elliptical (Gaussian and student's t-copula) and Archimedean (Frank, Gumbel, and Joe copulas) models.The preference for the use of these five copulas relates with their well-documented use and their properties that allow the modelling of different bivariate linear and nonlinear dependencies.For example, Gaussian dependencies are mostly concentrated in the main diagonal, maximizing the linear correlation, and featuring a vanishing tail dependence.In contrast, the Gumbel and Joe copulas enhance nonlinear correlations with much more influence of the clustering of bivariate extremes.While the Gaussian and tcopulas are used to describe symmetric correlations, Gumbel and Joe models are able to characterize asymmetric tail dependencies in the upper right corner.With the aim of assessing changes in the conditional probability of the compound drought events and hot extremes, the analysis was performed on two periods, namely 1950-1978 and 1979-2020.These coincide with the two periods of the ERA5 dataset, from which temperature data was retrieved to compute NHD and NHN.Since the correlation between SPEI and NHD/NHN is mostly negative in the study area [35][36][37], we mirrored the values of SPEI in the analysis to model only positive correlations and thus facilitate the copula modelling [47].To fit the copula functions, first the data is non-parametrically transformed to uniform variables via simple ranking.As the variables NHD and NHN are noncontinuous, the presence of ties in the rank-scaled samples can introduce biases in the estimation of the copula parameters [72].For this reason, the ties observed in the ranking of the variables were replaced by their mean value, since this method was shown to be more appropriate than randomly untying when using the maximum likelihood estimator [72].Afterwards, the copula parameters were estimated using the maximum likelihood and the Bayesian information criterion (BIC) was used to select the most appropriate copula function following [47].The goodness of fit of the copula selected was then tested with the Cramér-von Mises distance [47,73], with a level of significance of 0.1.The fitting of the copula functions was performed separately for the time periods 1950-1978 and 1979-2020.

Likelihood of hot and dry extremes
Following [47], a robust number of occurrences of hot and dry events were generated via sampling of 10,000 values from the selected copula functions preserving the bivariate dependence structure of SPEI and NHD/ NHN.The conditional probabilities of exceedance (CPE) of hot extremes given dry conditions are then empirically estimated by counting the relative number of occurrences of SPEI and NHD/NHN exceeding critical values.The uncertainty associated with the sampling was estimated via bootstrapping by resampling 1000 times samples with size equal to the number of observations (29 and 42, respectively for the first and second periods) [47].This procedure was applied at the pixel level, for the cases where the correlation between the uniform variables was positive, as obtained by Kendall's correlation coefficient, estimated by means of tau b, which is adjusted for ties [72].Confidence intervals (95%) were then computed for the copula parameter and compared to that of the observations.Drought conditions can be identified when SPEI is lower than −0.84 [74], or when the mirrored SPEI is higher than 0.84, which corresponds to a probability of occurrence of 20%, i.e., the quantile 0.8.Nonetheless, this is only true for the case of the complete time series.Therefore, for each time period and for each pixel, the quantile corresponding to mirrored SPEI higher than 0.84 was assessed.Considering that the values of the generated samples are uniform and belong to the interval [0,1], drought conditions were identified by values higher than the obtained quantile, whereas non-drought conditions were identified with values lower than the obtained quantile.To identify extreme values of NHD/NHN, we chose the quantile 0.8, and first assessed the corresponding threshold on each time period and pixel.We then used the same method as for SPEI to assess the quantile to be applied on the generated samples.The conditional probability of NHD/NHN exceeding the obtained quantile, given drought and non-drought conditions, were computed for all months, SPEI time scales, and considering drought conditions on the current month and on the previous 1 to 3 months (lags 0 to 3).These lags were chosen because it has been shown that in some regions it is possible to forecast strong heatwaves up to more than 1 month ahead [75] and also because precipitation in the previous spring and winter has been shown to affect temperature extremes in the summer in several regions [30,34,37,[46][47][48].For each month and lag, the maximum value of CPE obtained for each SPEI time scale was chosen, for the cases of drought and non-drought conditions.The difference between the two was computed, in order to estimate the effect of drought conditions on the occurrence of hot events.The difference in the results obtained for the two periods was also computed, as well as the difference between the two indices.

Results
The Kendall correlation between NHD/NHN and SPEI is presented in figures S1 and S2, for the time scale that yielded a higher CPE.Negative correlations occupy an area that is generally larger in the first period for both indices, and it reaches 43% of the study area for the case of NHD and 40% for the case of NHN.Nonetheless, the correlation is positive in almost all cases.In most of the study area, the dependence of the variables is not high, especially for lags larger than 0. For the case of the uncertainties associated with the sampling method, no pixels fall outside the 95% confidence interval.CPE was computed for the SPEI time scales of 1, 3 and 6 months, but the difference between the values of CPE obtained was lower than 0.05 (0.1) on 45% (76%) of the cases (results not shown).Here we present only the maximum value obtained, for each of the lags (0 to 3 months), and we present the SPEI time scale corresponding to the maximum values shown in figures 1 and 2 in figures S3 and S4.When analysing results for NHD and NHN (figures 1 and 2), CPE is higher for lag 0 and for NHD, compared to NHN.For this lag and for both indices, some regions in the north present CPE higher than 0.8 in February in the second period, whereas for NHD most of the study area presents values higher than 0.7.For the remaining lags, January presents very low values of CPE in most of the territory, and in December the values are higher in the  southeast, particularly in the first period, and are generally low on the remaining area.In the first period, for both indices, CPE is higher in the southeast in December for lag 3 than for lag 2, and it reaches values higher than 0.6, which are comparable to the values obtained for lag 0. In February in the first period, lag 1 presents CPE values higher than 0.6 in the north, much higher than lags 2 and 3.
An increase in CPE is shown in many regions from the first to the second period, as shown in figure 3. Table 1 presents the percentage area of increase for each month, lag, and index.The area of increase is larger than 50% for almost all cases, and the larger areas occur for lag 0 and for NHD.This shows that there was an increase not only in areas where CPE was lower, at lags 1, 2, and 3, but also for cases where CPE was already high, at lag 0. On the other hand, decreases in CPE occurred in many cases in the southeast (figure 3), which presented high values in the first period.An increase in CPE for the case of non-drought conditions is also shown in most of the study area (figure S5), although it was much smaller than for the case of drought conditions.
Figures 4 and 5 show the difference between CPE given drought conditions (shown in figures 1 and 2) and non-drought conditions, for NHD and NHN, respectively.The difference is positive for all cases, indicating that hot extremes are more likely to occur following drought conditions than following non-drought conditions.On a very small number of pixels, the difference is negative, but these cases do not exceed 0.3% of the study area (not shown) and occur only in the first period.Like before, lag 0 presents larger values.The patterns obtained are very similar to those of maximum CPE (figures 1 and 2), showing that the higher the CPE, the higher the difference between drought and non-drought conditions.Moreover, despite a general increase from the first to the second    period for the case of lag 0, for the larger lags many regions present a decrease in the difference between drought and non-drought conditions.The difference between the CPE obtained with NHD and NHN is shown in figure 6, and table 2 presents the area showing a positive difference, which indicates a higher value obtained for NHD.These results show that, in most cases, CPE is higher for NHD, and the difference is generally larger when it is also positive (figure 6).The area showing a positive value increased in the second period in almost all cases, except for some cases in January and February (table 2).Nonetheless, for different months/lags, CPE is higher for NHN in some regions, although no clear pattern exists.

Discussion
In this work, negative correlations between SPEI and NHD/NHN occur mostly in the first period.The change in the sign of the correlation indicates a change in the dependence of the variables, which can be explained by changes in the occurrence of extreme events, as happened in Northeastern Australia [39].On the other hand, there are several climate drivers of heat extremes [76], such as large-scale circulation [52,77], and soil moisture [52,77], and it is possible that these drivers changed from the first to the second period.We have fitted copula functions to model the dependence of non-continuous variables, namely NHN and NHD, with SPEI, which can  result in the presence of ties following the ranking of these variables and in biases in the parameters estimated [72].Nonetheless, the biases are lower when variables present a medium dependence [72], which is the case in most of the study area.The results obtained here with copula functions are in agreement with the results previously obtained using a correlation analysis [34][35][36][37], namely the high CPE obtained here for lag 0 in most of the study area, and for larger lags, different regions showing higher CPE for different months.Although all indices used here include temperature in their computation, in many regions the maximum conditional probability of exceedance for lag 0 is found with a time scale larger than 1 month, which means that the temperature considered corresponds to different periods, and therefore the obtained CPE is not due to the use of temperature values from the same month.Nonetheless, the use of copula functions allows to model the bivariate distribution of SPEI and NHD/NHN and capture linear and non-linear relationships between them, unlike a correlation analysis.The higher CPE obtained for the case of drought conditions, compared to non-drought conditions, point to a relevant effect of the drought conditions in the occurrence of hot events [26].Also computed this difference for Australia, using quantile regressions, but the small number of samples only allowed to assess the difference in a small area of the country.This is also overcome by using copulas, namely by simulating a large enough number of samples with the same dependence structure as the observations.The high CPE values obtained with NHD for lag 0 and for larger lags in December in the eastern coast, and for lag 1 in February in the north, are in agreement with the strong correlations obtained previously in a similar period (1979-2019) [37].These regions have also been identified as having a strong correlation between drought conditions and extreme temperatures [26,35,36], but these authors did not study the southern hemisphere summer months separately, and therefore it was not possible to see that the months of December, January and February do not present the same patterns at the same time.Particularly, for lag 1, in January CPE is much higher in the southeast and in February CPE is much higher in the north.These results reinforce the advantage of studying the different months separately, already identified by [37], and allow to identify regions with higher and lower probability of compound events at different moments.Some regions where a significant correlation was identified by [37] now present relatively low values of CPE, as is the case of the month of January for lags larger than 0. Nonetheless, the difference between CPE for drought and non-drought conditions is positive, pointing to an effect of the drought conditions on the occurrence of extreme temperatures.The case of the larger CPE obtained in the southeast in December for lag 3 in the first period, when compared to lag 2, is similar to the results of [37] for the period 1950-2019.A similar result was also found by [30] in the Mediterranean area, namely a higher correlation was found between SPEI in May and NHD/NHN in the summer, and also between soil moisture in May and NHD/NHN in the summer, than for the case of SPEI and soil moisture in June.Moreover, a higher correlation was found in some regions between soil moisture in May and NHN in the summer than between the concurrent soil moisture and NHN in the summer.
The CPE values obtained point to a very high probability of temperature extremes given drought conditions in Australia and reach values higher than in other regions where copula functions were applied, such as the Iberian Peninsula [47], and the United States [55].Very high CPE values in Australia were also obtained by [39], for concurrent drought and hot events.Moreover, the use of a probabilistic approach corroborates that for lags up to 3 months, in some regions, drought conditions increase the likelihood of hot events, as obtained in some previous works [26,34,37,48] making Australia a hotspot for (contemporaneous and non-contemporaneous) compound drought and hot events.
The results show an increase in CPE from 1950-1978 to 1979-2020 in most of the study area, which indicates a higher probability of occurrence of hot events given drought conditions.[37] have already shown a stronger correlation between SPEI and NHD/NHN in 1979-2019, compared to 1950-2019.An increase in compound drought and hot events has also been identified in Australia by [40] for the period 1990-2019, compared to the two previous 30 year-periods, and by [78], for the period 1989-2020 compared to the period 1958-1988, but these authors only analysed concurrent events occurring on an annual or seasonal time scale.Other works that identified increasing frequencies of compound drought and hot events only considered the concurrent events, either on a monthly or yearly time scale [31-33, 38, 39], and here we show that the effect of antecedent drought conditions on hot events at monthly scale also increased in most of the study area.Nonetheless, some regions present a decrease in CPE, which is consistent with results obtained by [77], which used the time periods 1951-1984 and 1985-2018, that are very similar to the ones used in this work.More specifically, [77] showed regions with a very small increase or even a small decrease in eastern Australia, and we obtained a decrease in CPE in the same area in February.It is likely that the use of a higher time resolution in our analysis allowed to detect the decreases in CPE that were otherwise masked by the seasonal or annual time scales, such as the summer season and annual time scales used by [40], and the extended summer season (November to March) used by [78].
Many authors use only NHD or maximum temperatures when performing analysis similar to the one performed here [26,32,35,36,47,78].Nonetheless, night-time temperatures have been associated with increased health risk [79,80], with the severe 2007 wildfire season in Greece [18], and with an increased night-time fire intensity, and therefore understanding the dependence of NHN with drought conditions is also relevant.NHD and NHN were used in correlation analysis with SPEI in the Mediterranean [30], where in some cases the correlations obtained with NHN were stronger compared to NHD, and in Australia [37], where the results obtained with NHN did not outperform those obtained with NHD.In here, we show that there is added value in using NHN with copula functions in Australia, since the results show that this index is also influenced by drought conditions, and in some regions, CPE is higher when computed with NHN.

Conclusions
The probability of the occurrence of compound drought and hot events in Australia was computed, using copula functions and the indices SPEI, NHD, and NHN, in the periods 1950-1978 and 1979-2020.The CPE was computed at the concurrent and previous 1 to 3 months, for the months of December to February.The main findings of this work can be summarized as follows: • CPE is very high for lag 0 throughout Australia, reaching values that are higher than in other regions of the world.
• CPE is higher at lag 0, for NHD and NHN, than for the remaining lags.In some cases, CPE is relatively high at lags larger than 0, pointing to a large effect of previous drought conditions on hot extremes.
• Although both increases and decreases in CPE occurred on all months and lags, CPE increased on most of the study area from 1950-1978 to 1979-2020 on almost all months and lags, whereas CPE decreased more frequently at larger lags.
• There is a higher probability of occurrence of hot events following drought conditions, when compared to non-drought conditions, showing the influence of drought on the occurrence of hot extremes.
• Although usually only maximum day-time temperatures are considered, there is added value in using both NHD and NHD, since they are both affected by drought conditions, and extreme night-time temperatures can also have adverse impacts, such as in health and in wildfire risk.
The results obtained reinforce that there are different spatial patterns of the probability of compound hot and dry events on different months.These results, combined with the likely increase in the frequency of compound dry and hot events, highlight the need to understand the effects of these extreme events in important sectors, such as the aforementioned health and wildfires, but also in agriculture.

Figure 1 .
Figure 1.Maximum CPE, given drought conditions.Regions classified as desert are shown in grey.Pixels with a negative correlation between mirrored values of SPEI and NHD are shown in white.

Figure 3 .
Figure 3. Difference between the maximum CPE on the periods 1979-2020 and 1950-1978.Regions classified as desert are shown in grey.Pixels with a negative correlation between SPEI and NHD are shown in white.

Figure 4 .
Figure 4. Difference between maximum CPE given drought (SPEI>80th percentile) and non-drought conditions (SPEI<80th percentile).Regions classified as desert are shown in grey.Pixels with a negative correlation between SPEI and NHD are shown in white.

Figure 6 .
Figure 6.Difference between maximum CPE of NHD and NHN, given drought conditions.Positive values indicate a higher CPE value obtained with NHD.Regions classified as desert are shown in grey.Pixels with a negative correlation between SPEI and NHD are shown in white.

Table 1 .
Area (%) of increase of CPE, given drought conditions.

Table 2 .
Area (%) showing a positive difference between maximum CPE of NHD and NHN, given drought conditions.