Optimal COVID-19 infection spread under low temperature, dry air, and low UV radiation

The COVID-19 pandemic, caused by the novel coronavirus SARS-CoV-2, is spreading rapidly throughout the world, causing many deaths and severe economic damage. It is believed that hot and humid conditions do not favor the novel coronavirus, yet this is still under debate due to many uncertainties associated with the COVID-19 data. Here we propose surrogate data tests to examine the preference of this virus to spread under different climate conditions. We find, by mainly studying the relative number of COVID-19 deaths, that the disease is significantly (above the 95% confidence level) more common when the temperature is ∼10 °C, the relative humidity is ∼60%, the specific humidity is ∼5 g kg−1, and the ultraviolet radiation is less than ∼50 kJ m−2 (per hour). We also find, but less significantly, that the relative number of COVID-19 deaths is high when the wind is weak and low when the wind is strong. The results are supported based on global and regional data, spanning the time period from January to December 2020. The COVID-19 data includes the daily reported new cases and the daily deaths; for both, the population size is either taken into account or ignored.


Introduction
The COVID-19 pandemic has quickly spread throughout the entire world, with a high cost in lives and severe economic damage. As of December 21, 2020, the novel coronavirus, SARS-CoV-2, has infected over 77 million people (confirmed cases) and caused 1.7 million deaths globally. Since the disease mainly spreads by human breath through air droplets, and with the current initial stage of the vaccine distribution and lack of other effective treatments, the main practices to prevent the spread of COVID-19 include social distancing, sanitation, masks, and reducing human mobility by adopting strict quarantine policies. However, full mobility limitations in an entire country can only be made for short periods, mainly due to high economic costs [1,2].
The patterns of human mobility and their effect on the spread of COVID-19 have been studied for the initial stage of the pandemic [3]. In the early stage of the virus's spread in China, it seemed as if the spread could be explained by the mobility patterns; i.e. Lévy flight [4] patterns. These patterns are characterized by many short movements, together with a few very long range movements according to a power law distribution. Yet, human mobility patterns alone cannot explain the virus's continued spread to other regions around the globe. This is mainly because the spread of the virus is not evenly distributed in different countries and does not follow the distribution of population around the globe. This mismatch can be easily seen by comparing the global population distribution to the distribution of the COVID-19 (infected and death) cases (figure 1(a)). This can also be seen by plotting the number of COVID-19 cases versus population (figure 1(b)). While there is a general increase in COVID-19 cases as a function of population size, the range of (infected and death) cases spans several orders of magnitude, indicating a mismatch between the population size and COVID-19 spread. Moreover, the number of confirmed COVID-19 cases is not closely related to the number of COVID-19 deaths, as can be clearly seen from the spread of points between the two dashed lines in figure 1(c), covering more than one order of magnitude. This large spread is probably due to biases associated, for example, with the number of COVID-19 tests, which varies with time and from one country to another.
Figure 1, panels (d)-(i). In the NH (panels (e), (g) and (i)), the number of new COVID-19 cases usually decreases as the temperature increases, while the opposite occurs in the SH (panels (d), (f) and (h)). Yet, there are counterexamples in both hemispheres (e.g. Australia in the SH and India in the NH), indicating a non-trivial relation between the spread of COVID-19 and temperature. The temperature data is obtained from the ERA5 reanalysis [29,30].
One possible explanation for the mismatch between COVID-19 cases and population distribution is related to atmospheric conditions [5][6][7][8]. It is apparent that most of the northern hemisphere (NH) countries passed the first wave of the disease towards the NH summer of 2020 while, currently, towards the NH winter of 2021 (figure 1(i)), they are experiencing a second wave of COVID-19 spread (figures 1(e) and (g)). At the same time, southern hemisphere (SH) countries experienced a delayed burst of COVID-19 (figures 1(d) and (f)), with an increasing number of COVID-19 cases during the SH winter and a reduced number of COVID-19 cases towards the SH summer. This delay may also be related to the larger effective distance of SH countries from Mainland China [9]. This suggests a possible link between temperature (and other climate variables) and the spread of COVID-19. Yet, there are counterexamples in both hemispheres (e.g. India in the NH and Australia in the SH).
It has been shown [10] that high temperatures (>38 °C) and high relative humidity (>95%) disrupt SARS-CoV-1 viability and activation; SARS-CoV-2 is the coronavirus genetic variant that stands behind COVID-19. We note, however, that the above temperature and relative humidity values are not necessarily the same for the SARS-CoV-2 variant. It is also known that other types of viruses, such as influenza, have higher activation levels in colder weather; the reason for this, however, is still under debate [11,12]. Currently, after almost 10 months of the pandemic, it seems that the seasonal cycle does not affect COVID-19 in the way it affects influenza, although it does appear to affect the spread of the virus globally.
The effect of the atmospheric conditions on COVID-19 spread is still under debate. The origin of this debate is related to many biases in the COVID-19 data, which make it very difficult to compare one place to another. The biases include different healthcare capabilities in different countries, different numbers of COVID-19 tests administered in different countries and at different times (i.e. a variable number of COVID-19 tests over time), partial or possibly intentionally incorrect information published by some countries, different age pyramids, and different countermeasures and human mobility restrictions. Moreover, some studies reported a high COVID-19 replication rate under colder conditions [5], while other studies claimed that infection rates increase only with temperature and are negatively correlated with humidity [6]. Similarly, a study of 429 cities in China [8] found an increased risk of spread in a narrow temperature range and that both high and low humidity rates are associated with higher reproduction rates [7].
Most of the studies on the effect of climate conditions on the spread of COVID-19 have concentrated on temperature and humidity and, more importantly, are limited to a regional scale. UV radiation, however, has received much less attention. First, we note that the virus causing COVID-19 can remain active for almost three days on surfaces [13] and also that other coronaviruses are highly sensitive to UV radiation [14,15]. Artificial disinfection by UV radiation takes about 15 min, and UV radiation is commonly used as a germicidal disinfectant, both directly and indirectly; we note, however, that the wavelength of germicidal UV is primarily ∼254 nm and that this wavelength is absent at the surface of the Earth [16]. For example, experiments investigating the effectiveness of non-direct artificial UV radiation sources that were installed in hospital rooms reported a reduction in tuberculosis of almost 80% [17]. While most artificial UV disinfection lights range from 250-305 nm [18][19][20][21][22] and 290 nm wavelengths barely reach ground level [23], the effectiveness of virus disinfection by longer wavelength UV radiation (which does penetrate the atmosphere and reach ground level) is proven by SOlar water DISinfection (SODIS) [24][25][26][27]. This method of water disinfection is effectively used to disinfect water against the rhinovirus (common cold), polio virus, and norovirus. It is used daily in the developing world, for water purification, in more than 2 million houses. In contrast to the 15 min disinfection period, SODIS disinfects water by exposing it to UV from the Sun for over 12 h. Indeed, a recent modeling study suggested reduced COVID-19 infection during the summer due to the relatively strong UV radiation from the Sun at ground level [28].
Following the above summary, the goal of this study is to quantify the effects and significance of climate variables (temperature, specific and relative humidity, UV radiation, and wind speed) on the spread of COVID-19, using surrogate data tests. The proposed tests are based on a random shuffling of climate records from different locations on the globe and a comparison of the shuffled data to the original data (e.g. relative number of COVID-19 deaths). We applied the tests to both global data and data from individual regions.

Data and methods
We extracted climate variables from the ERA5 reanalysis (ECMWF reanalysis 5th generation) database [29,30]; ERA5 is a high spatial (1/4 of a degree) and temporal (hourly) resolution database that includes many multi-level climate variables. We focus on surface level data of 2 m temperature, (unweighted) downward UV radiation (in the range of 200-440 nm) at the surface, 1000 hPa relative and specific humidity, and surface (10 m high) wind speed. The downward UV radiation is the accumulated radiation per hour and it is given in units of kJ m−2. We tested whether these climate variables can be associated with the spread of COVID-19. We used the hourly data to extract the daily mean and daily maximum values of the different climate variables.
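As an illustration of the last step, reducing an hourly record to daily mean and daily maximum values could be sketched as follows. This is a minimal sketch and not the code used in the study; the function name `daily_mean_and_max` and the synthetic temperature series are assumptions introduced here for illustration.

```python
import numpy as np
import pandas as pd


def daily_mean_and_max(hourly: pd.Series) -> pd.DataFrame:
    """Reduce an hourly series to its daily mean and daily maximum."""
    grouped = hourly.resample("D")
    return pd.DataFrame({"daily_mean": grouped.mean(), "daily_max": grouped.max()})


# Synthetic hourly 2 m temperature (deg C) at one location: three days with an
# idealized diurnal cycle of amplitude 5 around 10 deg C (illustrative values only).
hours = pd.date_range("2020-03-01", periods=72, freq=pd.Timedelta(hours=1))
hour_of_day = np.arange(72) % 24
t2m = pd.Series(10.0 + 5.0 * np.sin(2 * np.pi * hour_of_day / 24), index=hours)

daily = daily_mean_and_max(t2m)
print(daily)
```

The same reduction would apply to the hourly accumulated UV radiation, humidity, and wind speed series.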
The COVID-19 data is obtained from the Johns Hopkins COVID-19 GitHub repository [31], and the demographic data is obtained from SEDAC of the US National Aeronautics and Space Administration [32]. The COVID-19 data includes the number of confirmed cases, the number of active cases, the number of severe cases, the number of deaths, and more. Here, we focused on the daily deaths and the number of daily confirmed new cases. We used the daily deaths, along with the daily confirmed cases, since presumably there are infected people that are not tested, but this happens less often for deaths. Furthermore, the number of tests differs between developed and developing countries, affecting the reported number of infections, while the number of deaths is not subject to this bias. The data is mostly provided on a resolution of an entire country (e.g. Germany and Italy), and in some cases, for different provinces and regions within a country (e.g. UK, Canada, China, and Australia), and for cities within a country (such as the US). We used the regional COVID-19 data when possible.
Figure 2. Columns (from left to right) present the pdf versus temperature, relative humidity, specific humidity, and UV radiation. The blue curves depict pdfs of COVID-19 relative deaths recorded over the period of 23/1/20-12/12/20 as a function of 14 days backward mean climate variables. The figure also depicts the corresponding distributions of the shuffled (location) surrogate data where the median (solid black line) and the 5%-95% confidence interval (shaded gray area) are plotted. Importantly, the peaks for temperature around 10 °C, specific humidity around 5 g kg−1, and UV radiation below 50 kJ m−2 are significant, falling well above the 95% surrogate level, suggesting that the COVID-19 virus has a tendency to be more effective at these temperature, specific humidity, and UV values. Moreover, the COVID-19 virus is significantly less common when temperature and specific humidity are high. The title in each panel indicates the values of the separation measures between the original data pdf and the surrogate data pdf: M: maximum probability difference between the pdfs; A: the area between the original pdf (blue curve) and the confidence interval of the surrogate data pdfs (gray area); and O: 1 minus the overlap between the original pdf and the mean pdf of the surrogate data. The larger the separation value (of all measures), the greater the separation.
We mainly analyzed the normalized number of deaths by considering the number of cases per 1000 inhabitants. We concentrated on countries/states/provinces whose population is larger than half a million. The population normalization was performed in order to filter out the population size effect, and we found results that are similar to the results without population normalization; see figure 2 and figure S1 (https://stacks.iop.org/NJP/23/033044/mmedia). In addition to the analysis of daily new COVID-19 deaths, we also analyzed the number of daily confirmed COVID-19 cases.
The different climate variables were interpolated to the reported locations of COVID-19 cases. Then, for each location and date, we calculated the past d-days mean (either of the daily mean or daily maximum) and the d-days lag values of the climate variables. The rationale behind the d-days mean operation is that new COVID-19 deaths may occur after a varying number of days, somewhere between a few days (from the infection time) and more than 1 month in some cases; the mean operation crudely reflects this temporal spread. The d-days lag operation aims to examine the other extreme, unrealistic alternative in which new COVID-19 deaths occur a fixed number of days after the infection. The daily-mean and daily-max procedures aim to test whether the virus spread is affected by the extreme values of temperature/humidity/UV or by the accumulated daily value (which is reflected by the mean operation). We examined different time lags and temporal mean periods and found a typical span time of d = 14 days (not shown). The typical span time for the number of confirmed cases is d = 7 days. We note, however, that other choices of time lags yielded similar results. Generally speaking, the daily mean d-days mean procedure yielded better results in comparison to the d-days lag procedure.
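The two temporal operations can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and we assume here that the d-days backward mean includes the current day, which the text does not specify.

```python
import numpy as np
import pandas as pd


def backward_mean(daily: pd.Series, d: int) -> pd.Series:
    """Mean over the current day and the d-1 preceding days.

    Assumption: the window includes the current day. Entries are NaN until
    d days of data are available.
    """
    return daily.rolling(window=d, min_periods=d).mean()


def lagged(daily: pd.Series, d: int) -> pd.Series:
    """Value of the climate variable exactly d days earlier."""
    return daily.shift(d)


# Synthetic daily series 0, 1, ..., 19 so that the results are easy to verify by hand.
dates = pd.date_range("2020-03-01", periods=20, freq="D")
series = pd.Series(np.arange(20, dtype=float), index=dates)

mean14 = backward_mean(series, 14)  # d = 14 days, the span used for deaths
lag14 = lagged(series, 14)
print(mean14.iloc[13], lag14.iloc[14])
```

With d = 14, the first defined backward mean is the mean of days 0 to 13, while the lagged series simply repeats the original values 14 days later.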
We developed surrogate data tests (inspired by reference [33]) to study whether COVID-19 spread favors a certain range of climate attributes. The common practice is to assume a NULL hypothesis and to design statistical tests that will either falsify or confirm this hypothesis. In our case, the NULL hypothesis is that the climate attributes are not related to COVID-19 spread. If this is indeed the case, the spread should not be affected by the climate conditions of a certain location. Thus, we shuffled the locations of the reported COVID-19 cases, keeping the time series of the cases unaffected but using the climatic time series of other random locations; i.e. the COVID-19 time series were analyzed with respect to climatic time series from other, randomly selected, COVID-19 locations. The shuffling operation can be repeated many times. If the NULL hypothesis is valid, the resulting distribution of the shuffled data should be similar to the distribution of the original data. If it is significantly different from the original distribution, the NULL hypothesis is rejected, and the climate variable is found (to a certain confidence level) to affect the spread of COVID-19.
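A minimal sketch of the location-shuffling test is given below, on synthetic data. All names, the toy data, and the choice of a deaths-weighted histogram as the pdf estimator are assumptions made for illustration; they are not taken from the study's code.

```python
import numpy as np


def weighted_pdf(climate, deaths, bin_edges):
    """Probability per bin of the climate variable, weighted by relative deaths."""
    hist, _ = np.histogram(climate.ravel(), bins=bin_edges, weights=deaths.ravel())
    return hist / hist.sum()


def location_shuffle_surrogates(climate, deaths, bin_edges, n_surrogates=200, seed=0):
    """Pair each death series with the climate series of a randomly permuted
    location, leaving the time axis of both series untouched."""
    rng = np.random.default_rng(seed)
    n_locations = climate.shape[0]
    pdfs = np.empty((n_surrogates, len(bin_edges) - 1))
    for k in range(n_surrogates):
        shuffled = rng.permutation(n_locations)
        pdfs[k] = weighted_pdf(climate[shuffled], deaths, bin_edges)
    return pdfs


# Toy data (illustrative only): 60 locations x 100 days. Each location has its own
# mean temperature, and relative deaths are constructed to be highest near 10 deg C.
rng = np.random.default_rng(1)
n_locations, n_days = 60, 100
site_mean = np.linspace(0.0, 30.0, n_locations)
climate = site_mean[:, None] + rng.normal(0.0, 1.0, (n_locations, n_days))
weights = np.exp(-(((site_mean - 10.0) / 4.0) ** 2))
deaths = np.repeat(weights[:, None], n_days, axis=1)

bin_edges = np.arange(-6.0, 39.0, 3.0)
centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])

original = weighted_pdf(climate, deaths, bin_edges)
surrogates = location_shuffle_surrogates(climate, deaths, bin_edges)
low, median, high = np.percentile(surrogates, [5, 50, 95], axis=0)

peak = int(np.argmax(original))
print(centers[peak], original[peak], high[peak], original[peak] > high[peak])
```

In this toy setting, the peak of the original pdf lies above the 95% surrogate level, which is the situation in which the NULL hypothesis would be rejected for that range of the climate variable.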
We proposed and implemented three methods to generate random locations of COVID-19 cases. In the first method discussed above, the reported locations of the virus cases were shuffled, i.e. we analyzed the number of COVID-19 cases that were recorded in a particular location, with climatic records from a different, randomly selected location, from the list of the locations of reported cases; see figure 2. In the second method (see figure S6 of the SI), we randomly chose the longitude of the reported COVID-19 locations but kept the latitude unchanged; the random longitudes were restricted to be over land. In this way, we kept the original seasonality which is based on latitude coordinates, yet studied the sensitivity of the results to the climate variability along the original latitude. In the third method (see figure S7 of the SI), we randomly chose both the longitude and latitude and used the climate records of the random locations instead of the climate records of the original locations. The new random locations were chosen to be evenly distributed over land. We note that the proposed analysis is unbiased and the null hypothesis can be either confirmed or falsified.
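The second and third methods could be sketched with simple rejection sampling, as below. The land mask `toy_is_land` is a hypothetical stand-in for a real land-sea mask, and interpreting "evenly distributed over land" as uniform by area on the sphere is our assumption for this sketch.

```python
import numpy as np


def random_longitude_same_latitude(lats, is_land, rng, max_tries=10_000):
    """Method 2: keep each latitude, draw a random longitude that falls over land."""
    lons = np.empty(len(lats))
    for i, lat in enumerate(lats):
        for _ in range(max_tries):
            lon = rng.uniform(-180.0, 180.0)
            if is_land(lat, lon):
                lons[i] = lon
                break
        else:
            raise RuntimeError(f"no land found at latitude {lat}")
    return lons


def random_land_points(n, is_land, rng):
    """Method 3: draw n (lat, lon) points uniformly by area, keeping those over land."""
    points = []
    while len(points) < n:
        # arcsin of a uniform variate gives latitudes that are uniform by area.
        lat = np.degrees(np.arcsin(rng.uniform(-1.0, 1.0)))
        lon = rng.uniform(-180.0, 180.0)
        if is_land(lat, lon):
            points.append((lat, lon))
    return np.array(points)


def toy_is_land(lat, lon):
    """Hypothetical land mask: a single 'continent' spanning 20W-60E and 70S-70N."""
    return -20.0 <= lon <= 60.0 and abs(lat) <= 70.0


rng = np.random.default_rng(0)
lats = np.array([52.5, 41.9, -33.9])  # latitudes of three reported locations
new_lons = random_longitude_same_latitude(lats, toy_is_land, rng)
points = random_land_points(5, toy_is_land, rng)
print(new_lons)
print(points)
```

The climate records at the sampled coordinates would then replace the records of the original locations, exactly as in the location-shuffling method.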
We constructed the probability density function (pdf) of the COVID-19 cases using the various climate variables; see figure 2. Then, the pdf of the original data was compared to the pdf of the surrogate data. We generated 200 surrogate time series for each reported COVID-19 location and then estimated the 5% and 95% confidence levels. High separation between the pdf of the real data and the pdf of the surrogate data implies a higher dependency of COVID-19 on the climatic variables. We quantified the dissimilarity between the pdfs of the real data and the surrogate data in three ways: (a) the maximum probability difference (i.e. the difference in probability density times the bin size) between the real data pdf and the 95% confidence level of the surrogate data pdf; we use the letter 'M' to represent this separation measure. (b) 1 minus the overlapping area between the original data pdf and the mean pdf of the surrogate data; we use the letter 'O' to represent this separation measure. And (c) the sum of the areas that are bounded between the original data pdf and the 5% and 95% confidence level of the surrogate data pdfs (i.e. the area between the blue curve in figure 2 and the gray shaded area); we use the letter 'A' to represent this separation measure.
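One possible reading of the three separation measures (M, O, and A) is sketched below for pdfs expressed as probability per bin. The function name and the toy numbers are hypothetical, and the exact definitions used in the study may differ in detail.

```python
import numpy as np


def separation_measures(original_pdf, surrogate_pdfs):
    """Separation between an original pdf and an ensemble of surrogate pdfs.

    original_pdf: probability per bin, shape (n_bins,)
    surrogate_pdfs: probability per bin for each surrogate, shape (n_surrogates, n_bins)
    """
    low, high = np.percentile(surrogate_pdfs, [5, 95], axis=0)
    mean_surrogate = surrogate_pdfs.mean(axis=0)
    # M: largest excess of the original pdf over the 95% surrogate level.
    m = float(np.max(original_pdf - high))
    # O: 1 minus the overlap between the original pdf and the mean surrogate pdf.
    o = float(1.0 - np.minimum(original_pdf, mean_surrogate).sum())
    # A: area of the original pdf lying outside the 5%-95% surrogate band.
    above = np.clip(original_pdf - high, 0.0, None).sum()
    below = np.clip(low - original_pdf, 0.0, None).sum()
    return {"M": m, "O": o, "A": float(above + below)}


# Toy example with identical surrogates, so the 5%, 95%, and mean levels coincide.
original = np.array([0.1, 0.6, 0.3])
surrogates = np.tile(np.array([0.3, 0.4, 0.3]), (200, 1))
measures = separation_measures(original, surrogates)
print(measures)
```

In this toy case the original pdf exceeds the surrogate band by 0.2 in the middle bin and falls below it by 0.2 in the first bin, so A is the sum of both contributions.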

Results
We summarize the main results in figure 2, where we plot the pdf of the relative number of COVID-19 deaths (i.e. number of deaths per thousand) as a function of temperature (first column), relative humidity (second column), specific humidity (third column), and UV radiation (fourth column). The different geographic regions include the entire globe (first row), the globe excluding the US (second row), and North America (third row). We used the daily mean climate data of a 14 days backward mean. The results are consistent over the different regions and suggest a preference for spread in a relatively narrow range of temperature and specific humidity. The maximum of the probability density is well above the 95% confidence level for a relatively low temperature (around 10 °C) and low humidity (around 60% relative humidity and 5 g kg−1 specific humidity). Thus, the NULL hypothesis that temperature and humidity are not related to the occurrence of COVID-19 is rejected with high confidence. In some cases, the probability density is below the 5% confidence level for higher temperature and higher (relative and specific) humidity. One possible explanation for the cold and dry weather preference may be the virus's poor viability in higher temperatures and humidity. As shown in reference [10], SARS-CoV-1's survival and activation levels in high temperatures and high humidity are also poor; this is probably also valid for the SARS-CoV-2 virus. Another explanation for the cold/winter weather preference may be the relatively lower levels of UV radiation from the Sun during this season.
The analysis of UV radiation is shown in the fourth column of figure 2 and indicates that the number of COVID-19 relative deaths is above the 95% confidence level when the UV radiation is less than ∼50 kJ m−2 (per hour); thus, the NULL hypothesis that the UV radiation is not related to COVID-19 is rejected. Since the virus lives on steel and plastic surfaces for several days [13,34,35], it is possible that sites with lower UV radiation levels will suffer from a longer survival time of the virus on surfaces, leading to higher infection rates. Also, vitamin D, which is needed for the activation of the lungs' immune system, requires UV radiation for its formation; thus, lower exposures to UV radiation [36] may reduce its production. Our results support the modeling study of reference [28].
Figure 3. Upper panels: the number of cases is not taken into account here, so that the results reflect the changes in temporal atmospheric variables, mainly resembling the seasonal cycle of the NH (as most of the locations of the COVID-19 cases are from the NH). This seasonal trend is seen in (a), (c), and (d) but is most clearly reflected in the UV radiation (panel (d)), which is high during the NH summer. The relative humidity does not exhibit a seasonal trend. Lower panels: pdfs (colors) of COVID-19 relative deaths versus time and temperature, relative humidity, specific humidity, and UV radiation. Here we use the daily mean and 14 days backward mean of the climate variables.
The saturated water vapor pressure can be determined by using the Clausius-Clapeyron relation [37]. This relation yields an exponential relation between the air temperature and the saturated water vapor pressure. The specific humidity is a measure of the amount of water vapor in the air (e.g. in grams of water vapor per kg of air), while the relative humidity is the ratio between the measured specific humidity and the maximum specific humidity. It turns out that the relative humidity exhibits weak (if any) seasonality, such that the specific humidity is expected to have a seasonal cycle with higher values during the hot season. Figure 3 shows the evolution with time of the pdfs of the temperature, relative and specific humidity, and UV radiation. The daily mean values with the 14 days backward mean are used. The preference of COVID-19 for a specific temperature and humidity can also be seen in figure 3. The upper panels present the (weekly) pdfs of temperature, humidity, and UV radiation in the locations of the reported COVID-19 cases, where the relative number of deaths is not taken into account. The seasonal trend toward warmer, more humid, and higher UV radiation levels during the NH summer is clearly seen. This is expected, as the majority of the COVID-19 cases were reported in the NH such that the seasonal trend reflects the NH seasonal cycle. In comparison, the lower panels of figure 3 present the weekly pdfs of COVID-19 relative deaths as a function of temperature, relative and specific humidity, and UV radiation. In contrast to the top panels, the COVID-19 pdfs are more skewed, with a single narrow maximum that follows, to some extent, the seasonal trends shown in the upper panels. We note that the relative humidity does not exhibit a clear seasonal trend.
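The link between temperature, relative humidity, and specific humidity can be illustrated with a short calculation. The sketch below uses the empirical Bolton (1980) fit to the Clausius-Clapeyron relation, which is our choice for illustration and is not necessarily the formulation of reference [37]; it also defines relative humidity through the vapor pressure ratio, which is numerically very close to the specific humidity ratio used in the text. The 1000 hPa pressure matches the level of the humidity data.

```python
import math

EPSILON = 0.622  # ratio of the molar masses of water vapor and dry air


def saturation_vapor_pressure_hpa(temp_c):
    """Saturation vapor pressure (hPa) from the Bolton (1980) empirical fit."""
    return 6.112 * math.exp(17.67 * temp_c / (temp_c + 243.5))


def specific_humidity_g_per_kg(temp_c, rel_humidity_pct, pressure_hpa=1000.0):
    """Specific humidity (g of water vapor per kg of moist air)."""
    vapor_pressure = rel_humidity_pct / 100.0 * saturation_vapor_pressure_hpa(temp_c)
    q = EPSILON * vapor_pressure / (pressure_hpa - (1.0 - EPSILON) * vapor_pressure)
    return 1000.0 * q


q_cool = specific_humidity_g_per_kg(10.0, 60.0)  # 10 deg C, 60% relative humidity
q_warm = specific_humidity_g_per_kg(30.0, 60.0)  # 30 deg C, same relative humidity
print(round(q_cool, 2), round(q_warm, 2))
```

Under these assumptions, 10 °C and 60% relative humidity correspond to about 4.6 g kg−1, consistent with the ∼5 g kg−1 value reported above, while the same relative humidity at 30 °C corresponds to about 16 g kg−1. This illustrates why the specific humidity follows the seasonal cycle of temperature even when the relative humidity does not.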
The pdf of COVID-19 relative deaths as a function of temperature also peaks around 10 °C in the first part of the pandemic and then switches to ∼25 °C toward the NH summer, mainly as a result of the late COVID-19 burst in India and Brazil. The situation is somewhat similar for the specific humidity, which peaks around 5 g kg−1 in the first part of the pandemic. The transition from the first part of the pandemic to the last part can also be seen by comparing the pdfs of the first three months of the COVID-19 pandemic (23/1/2020 to 22/4/2020) to the last three months (13/9/2020 to 12/12/2020), where the preferred temperature of 10 °C and the preferred specific humidity of 5 g kg−1 are clear in both the first and last periods (figures S3 and S4); these preferred values are less clear during the NH summer.
In addition, we analyzed the relative number of COVID-19 deaths as a function of surface (10 m) wind speed; see figure S5. We find a high probability of relative number of deaths for weak winds (∼4 m s−1) and a low probability for strong winds (∼7 m s−1). This preference is valid for different regions (figures S5(a)-(c)) and seems to be stable across seasons (figure S5(d)). This observation may be due to the ability of strong winds to disperse the air, and hence the virus, away from centers of population. This is in accordance with the observation that atmospheric pollution is usually low under strong winds; see, e.g. reference [38].
To further confirm the results reported above of enhanced COVID-19 spread for a preferred temperature, relative humidity, specific humidity and UV radiation, we also analyzed the relative number of COVID-19 confirmed cases. The results are shown in figure 4 and are similar to those for the relative number of COVID-19 deaths seen in figure 2. Yet, the confirmed cases results are less significant, probably due to biases associated with the number of tests, which became more available with time.
The results described above are based on the relative number of COVID-19 deaths and confirmed cases, i.e. the number of COVID-19 cases divided by the size of population. The normalization by population aims to take into account, very roughly, the population density. Still, it is also interesting to check the results without the normalization by the population size, as clearly in most countries the situation is far from the herd immunity ratio (see figure 1(a)). The results are shown in figure S1 and are similar to the results when the normalization by population is performed (figure 2). This strengthens our conclusion regarding the preference of the COVID-19 virus to develop under cold and dry conditions. We also analyzed the number of confirmed COVID-19 cases without normalization by population, and here there is no significant preference for specific values of temperature and humidity (figure S2). This is probably due to the biases associated with the number of COVID-19 tests versus time.
In addition to the surrogate method presented here, we performed two other surrogate data tests to support the effect of climate variables on the spread of COVID-19. In the first surrogate test, we randomly chose the longitudes (over land) but kept the original latitudes of the COVID-19 cases. This aims to find a possible preference of COVID-19 for specific climate conditions, while keeping the seasonality (as the original latitude is maintained). In the second, additional, surrogate approach, both the longitudes and latitudes were chosen randomly where the random locations are evenly distributed over land. This aims to test the spread of COVID-19 with respect to global (continental) mean climate conditions. The results of these methods are shown in the SI (figures S6 and S7) and indicate that COVID-19 spread is affected by temperature, relative and specific humidity, and UV radiation, supporting the results reported above.

Discussion and conclusions
The fast spread of COVID-19 has resulted in a global turbulence of fear, uncertainty, and social distress. In some countries, severe quarantine restrictions were imposed, changing people's lives entirely. Then, the restrictions were slowly removed and imposed again, depending on the severity of the pandemic. Yet, even the most infected places in the world are far from reaching the herd immunity percentage [39,40]; see the dotted line of figure 1(b). While it is clear that the most effective strategy (besides the vaccine) to fight the spread of COVID-19 is quarantine restrictions, our results, based on the rejection of the NULL hypothesis that COVID-19 is not related to climate conditions, indicate that high temperature, high specific humidity, high UV radiation, and strong winds can slow down the spread of the disease. Our results might explain the delayed first burst of COVID-19 in some warm climate countries and SH countries which entered the winter when the NH entered the summer. We conjecture that the situation could have been much worse if the climate in countries such as India and Brazil, which experienced late bursts of COVID-19, were colder and drier. Moreover, it is plausible that the climate, which is warmer and more humid during the NH summer, helped to cope with the pandemic in many places. The current second wave of COVID-19 in many NH countries also supports our conjectures.
Generally speaking, the different climate variables we consider here are interconnected and may influence each other. Yet, the correlations between them are not trivial. For example, at the high latitudes during the summer, the shortwave radiation is higher than the radiation at the low latitudes, yet the temperature is lower. Another example is radiation and temperature at low elevations in comparison to the radiation and temperature at high elevations; in the former case the temperature is usually higher despite the lower radiation. Because of this, we analyzed several climate variables, and the different variables yielded different levels of significance for different ranges of values.
The implications of our results demand attention since the most important factor regarding the spread of the COVID-19 pandemic is social distancing. Yet, in the current period, in the month of December 2020, it seems as though most NH countries are experiencing a second wave of the pandemic; the most infected country, the USA, is still in the midst of its fight. In contrast, SH countries (excluding, e.g. New Zealand) are manifesting a plateau in their infection curve (see figure 1), probably due to strict social distancing policies. Based on our results, we conjecture that in some marginal cases when the growth rate of the pandemic is close to 1, dry and cold climate conditions can accelerate the spread of the pandemic while warm and humid conditions can slow it down.

Data availability statement
No new data were created or analysed in this study.