Prediction of the number of heatstroke patients transported by ambulance in Japan’s 47 prefectures: proposal of heat acclimatization consideration

The incidence of heatstroke is affected by various meteorological variables. However, previous studies in Japan have mainly investigated and adopted a single temperature metric or composite index for their analyses. Herein, we conducted a time series study using a multivariate analysis that considers different weather conditions simultaneously, in order to analyze the relative importance of meteorological variables in determining the number of heatstroke patients transported by ambulance in all of Japan’s 47 prefectures. We proposed a method that considers heat acclimatization, which has been found to affect heatstroke incidence, by manipulating certain meteorological variables. For the heatstroke data, we utilized the secondary data provided by the Fire and Disaster Management Agency, Japan. The time period considered was from May 2015 to September 2019. All calculations were performed using R 3.5.1. For the analysis, the machine learning method of random forest (RF) was applied. The results showed that the relative temperature (RelTemp), which represents heat acclimatization, had the highest ranking among all the meteorological variables studied. Then, we developed an exponential model and an RF model to predict the number of heatstroke patients transported by ambulance by adopting the highly ranked meteorological variables, including RelTemp, as explanatory variables. To confirm the effectiveness of considering heat acclimatization, we also developed an exponential model and an RF model without RelTemp (using maximum temperature instead). According to the results, the R² values of the exponential and RF models including RelTemp were 0.76 and 0.74, respectively, and those of the exponential and RF models excluding RelTemp were 0.68 and 0.67, respectively. We confirmed the effectiveness of considering heat acclimatization via RelTemp and found that the exponential model with RelTemp provided the highest accuracy. Better predictions by the exponential model with RelTemp would contribute to better preemptive allocation of ambulances and medical staff in medical facilities.


Introduction
Globally, the air temperature has been increasing at a rate of 0.74°C/100 years, whereas, in Japan, it has been rising at a rate of 1.24°C/100 years, as a result of climate change [1]. Higher air temperatures have been observed in urban and metropolitan areas as a result of urban heat islands [2]. In Tokyo, the most populated city in Japan, the air temperature has been increasing by 3.2°C/100 years owing to the combined influence of climate change and the urban heat island effect [1]. These temperature increases cause several impacts on society and nature [3], including increased health risk from heatstroke.
The Fire and Disaster Management Agency, Japan (FDMA), in cooperation with prefectures, has been collecting data on the number of heatstroke patients transported by ambulance since 2008 to monitor the incidence of heatstroke [4]. The following is the definition of heatstroke: 'a general term for a disorder that occurs when the body's water and salt (e.g., sodium) levels become unbalanced due to a breakdown in the body's ability to regulate body temperature in a hot environment, and includes phenomenon, such as sunstroke, heat cramps and heat exhaustion' [5]. As these heatstroke data are publicly available on the FDMA website, they are mainly used in heatstroke-related studies in Japan. The number of heatstroke patients transported by ambulance in 2019 was 71,317; 126 deaths were recorded [6]. The number of heatstroke patients is projected to rise with future temperature increases caused by climate change [7,8].
To prepare countermeasures against heatstroke, such as better allocation of ambulances and medical staff working in emergency medical facilities [9,10], it is crucial to know the number of heatstroke patients in advance. Several studies have analyzed the relationship between temperature and ambulance dispatches related to heat-illness morbidity in different regions [11–19]. One study found that the number of heatstroke patients transported by ambulance in Japan has a significant positive correlation with the daily mean or maximum air temperature [20]. Hoshi et al [21] showed that the incidence rate has a significant positive correlation with the daily maximum temperature (MaxTemp) and the Wet Bulb Globe Temperature (WBGT [22]). A model that calculates the incidence rate from the daily mean temperature, with correction terms for the daily minimum and maximum temperatures, has been developed [9], as has a model that incorporates the WBGT components [23].
The incidence of heatstroke can be attributed not only to the meteorological factors on the incidence day, but also to those on the preceding days (i.e., the lag effects or cumulative effects of the heat load) [22–24]. Furthermore, the incidence is larger in the early summer than in the late summer, even at the same daily maximum temperature. One of the factors contributing to this difference is heat acclimatization [24–26]. A model developed in an earlier study considered the cumulative effect of heat load by adopting not only the ambient temperature or WBGT of the incidence day but also those of successive days [27]. This model also considered heat acclimatization by counting the number of days from the end of the rainy season, as the number of patients transported by ambulance increases after the end of the rainy season in Japan [28]. These studies, however, mainly investigated and employed a single temperature metric (e.g., daily mean/maximum temperature) or a composite index (i.e., WBGT) in their calculations. Thus, a multivariate analysis that considers different weather conditions simultaneously has not been conducted. Furthermore, very few studies have considered the heat acclimatization effect in Japan in relation to meteorological factors [27].
Zhang et al [29] performed a multivariate analysis of different weather conditions and heat-related mortality, applying the robust statistical learning method of random forest (RF) to rank the relative importance of 45 meteorological variables in US cities simultaneously. In another study, the RF was used to identify the importance ranks of model parameters, including 11 meteorological variables and socioeconomic status variables, for calculating the heatstroke incidence in Chinese cities [30]. However, in these studies, the effect of heat acclimatization was not considered.
A multivariate analysis that considers different weather conditions simultaneously has not been sufficiently conducted in Japan. Therefore, we used the RF method to analyze the relative importance of meteorological variables in determining the number of heatstroke patients transported by ambulance. This is the first study to use the RF for this purpose in Japan and to conduct a multivariate analysis with a large number of meteorological variables. Furthermore, because few studies have considered the effect of heat acclimatization, we propose a method to account for it. Then, to analyze the effectiveness of considering heat acclimatization, we developed an exponential model and an RF model to predict the number of heatstroke patients transported by ambulance by adopting the highly ranked meteorological variables as explanatory variables.
In addition to meteorological factors, socio-economic conditions, geographical conditions, and human-related factors also affect the incidence of heatstroke [30]. In this study, however, we focused only on meteorological factors, as our objectives were to determine the following: (1) Which meteorological variable (among various meteorological variables) is the most important causative factor of heatstroke? (2) How accurately can the observations be predicted when only meteorological variables are considered?

Material and methods
In this time series study, we analyzed the correlation between heatstroke data and meteorological variables. The data and method adopted in this study are described in sections 2.1 and 2.2, respectively.
The analysis flow of this study is summarized in figure 1. All calculations were performed using R 3.5.1 (R Development Core Team 2018).

Heatstroke data
For the daily number of heatstroke patients transported by ambulance in each prefecture, we used the secondary data collected and provided by the FDMA [4]. The FDMA data for 2019 show that patients aged 64 years or older comprised 52% of all heatstroke patients [4]. This could be a result of their physical vulnerability to hot temperatures [31–34], a decline in thermal sensitivity [35,36], a tendency to hydrate infrequently, and less frequent use of air conditioning [26,32]. Japan is an aging society [37], and therefore the share of the population older than 64 years is expected to increase in the future. Thus, countermeasures, especially for those older than 64 years, are crucial. Considering this situation, we selected the age group older than 64 years as the main target, although all age groups are considered in the Supplementary material to confirm the consistency of our methods. In this study, the number of heatstroke patients over the age of 64 in a given prefecture was divided by the total population over the age of 64 in that prefecture. The obtained value was converted to the number of heatstroke patients per 10^9 population before being introduced into the RF analysis.

Meteorological data
For the meteorological data, we used data from the Automated Meteorological Data Acquisition System (AMeDAS) [38]. AMeDAS is a surface observation network operated by the Japan Meteorological Agency (JMA) for gathering regional weather data. For each prefecture's meteorological variables, the AMeDAS stations located in each prefectural capital were used. Seven meteorological variables (daily maximum/minimum/mean temperatures, precipitation, total solar radiation, wind speed, and relative humidity) were obtained from the AMeDAS stations. Of the 47 prefectures, total solar radiation and relative humidity were not observed at 15 and 2 AMeDAS locations, respectively. For these locations, data from the model-coupled crop-meteorological database (MeteoCrop DB) Ver. 2 [39], developed by the National Institute for Agro-Environmental Science (NIAES), were used as an alternative. For the daily maximum/mean WBGTs, we used the secondary data from the Heat Illness Prevention System developed by the Ministry of the Environment, Japan [40].
In addition to the meteorological factors on the incidence day, those on the previous days (i.e., the lag effects or cumulative effects of the heat load) may contribute to the incidence of heatstroke. To consider this effect, we examined the meteorological variables on the incidence day as well as on the three preceding days. This method has been used in previous studies [25,29,30].
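The lagged-variable construction described above can be sketched as follows. This is a minimal Python illustration rather than the R code used in the study; the function name `add_lags`, the column naming scheme, and the toy values are assumptions made here for illustration.

```python
from typing import Dict, List

def add_lags(series: Dict[str, List[float]], max_lag: int = 3) -> List[Dict[str, float]]:
    """Build one feature row per day from daily meteorological series.

    Each row holds every variable on the incidence day (lag 0) and on the
    max_lag preceding days. The first max_lag days are dropped because
    their history is incomplete.
    """
    n_days = len(next(iter(series.values())))
    rows = []
    for day in range(max_lag, n_days):
        row = {}
        for name, values in series.items():
            for lag in range(max_lag + 1):
                row[f"{name}_lag{lag}"] = values[day - lag]
        rows.append(row)
    return rows

# Hypothetical five-day record for two variables (illustrative values only).
daily = {
    "MaxTemp": [28.0, 30.5, 33.1, 34.2, 31.0],
    "Rad": [18.0, 21.5, 24.0, 25.2, 12.3],
}
rows = add_lags(daily)
```

With five days of input and three lags, only the last two days have a complete history, so `rows` contains two feature rows of eight columns each.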
Furthermore, we proposed a method to consider heat acclimatization by introducing a 'relative temperature' parameter. The relative temperature reflects not only the temperature of the incidence day but also the cumulative temperatures from May 1 to the incidence day. The proposed equation is as follows:

$$T_{\mathrm{rel}}^{n} = T_{\mathrm{obs}}^{n} \times \frac{T_{\mathrm{obs}}^{n}}{\sum_{i=1}^{n} T_{\mathrm{obs}}^{i} / n} \qquad (1)$$

where $T_{\mathrm{rel}}^{n}$ is the relative temperature of the incidence day, $T_{\mathrm{obs}}^{n}$ is the daily mean temperature of the incidence day, and $n$ is the total number of days from May 1 to the incidence day. The ratio $T_{\mathrm{obs}}^{n} / (\sum_{i=1}^{n} T_{\mathrm{obs}}^{i} / n)$ shows how high the daily mean temperature of the incidence day is compared with the cumulative temperatures since May 1. In Japan, temperatures increase from May to August, and then fall again in September [41]. In July, for instance, $T_{\mathrm{obs}}^{n}$ tends to be higher than $\sum_{i=1}^{n} T_{\mathrm{obs}}^{i} / n$, so the ratio exceeds unity (i.e., $T_{\mathrm{rel}}^{n} > T_{\mathrm{obs}}^{n}$). In this case, we assume that people are insufficiently adapted to the heat. However, in September, $T_{\mathrm{obs}}^{n}$ tends to be closer to $\sum_{i=1}^{n} T_{\mathrm{obs}}^{i} / n$, which brings the ratio closer to unity (i.e., $T_{\mathrm{rel}}^{n} \sim T_{\mathrm{obs}}^{n}$). In this case, we assume that people are already sufficiently adapted to heat. This is consistent with the fact that in Japan, the number of heatstroke patients increases after the end of the rainy season (usually mid-July to early August in the main island of Japan [42]), when the human body has not yet experienced hotter temperatures and consequently has not adapted to such heat.
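As a worked illustration of the relative temperature, the following Python sketch computes it from a series of daily mean temperatures that starts on May 1. It assumes that equation (1) multiplies the daily mean temperature of the incidence day by its ratio to the mean of all daily mean temperatures since May 1, which is how the definitions above read. It also assumes temperatures in °C with a positive cumulative mean. The function name and the toy values are hypothetical.

```python
from typing import List

def relative_temperature(mean_temps: List[float]) -> List[float]:
    """Return the relative temperature for each day.

    mean_temps[0] is the daily mean temperature on May 1. For day n,
    T_rel = T_obs * T_obs / (mean of T_obs over days 1..n).
    Assumes the cumulative mean is positive (temperatures in deg C).
    """
    rel = []
    running_sum = 0.0
    for n, t_obs in enumerate(mean_temps, start=1):
        running_sum += t_obs
        cumulative_mean = running_sum / n
        rel.append(t_obs * t_obs / cumulative_mean)
    return rel

# Toy series: two mild days followed by a sudden hot day.
rel = relative_temperature([20.0, 20.0, 26.0])
```

On the third day the cumulative mean is 22 °C, so the ratio is 26/22 and the relative temperature (about 30.7 °C) exceeds the observed 26 °C. This is the behaviour the text associates with insufficient adaptation to heat.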
In addition to the above-mentioned merits, our proposed method represents heat acclimatization as a meteorological factor using a relatively simple definition, and allows heat acclimatization to be expressed seamlessly over the summer period (May-September) based on a single meteorological factor.

Evaluation using the RF model
In this study, the RF [43] was used to analyze the relative importance of the meteorological variables in determining the potential number of heatstroke patients for each prefecture. The RF can handle a large number of highly correlated variables [29]. We used 10-fold cross-validation to evaluate the calculation model, randomly choosing 90% of the data as training data and the remaining 10% as testing data. Regarding the parameters of the RF, the number of decision trees (ntree) was set to 500, as this was where the RMSE converged, and the number of variables tried at each split (mtry) was grid-tuned; its median value across the 47 prefectures was 15. The fitness of this model was calculated using the coefficient of determination (R²) and the root mean squared error (RMSE) between the predictions for the testing data and the observation data. We refer to the above calculation as Case 0. These analyses were performed in R 3.5.1 using the caret package for grid-tuning and cforest in the party package for the random forest [44,45]. The permimp package [46] was used to compute the variable importance in the cforest; permimp implements a conditional permutation approach and can be used in cases with correlated variables.
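The evaluation procedure (10-fold cross-validation scored by R² and RMSE) can be sketched independently of the model being evaluated. The Python below is a minimal illustration, not the caret workflow used in the study. It assumes R² is computed as 1 − SS_res/SS_tot and that scores are averaged over folds; the text specifies neither. All function names are hypothetical, and the toy model is a least-squares line through the origin rather than a random forest.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean squared error between observations and predictions."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))

def r_squared(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot

def k_fold_splits(n: int, k: int = 10, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle indices 0..n-1 and return k (train, test) index pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    splits = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        splits.append((train, test))
    return splits

def cross_validate(fit: Callable, X: Sequence, y: Sequence[float], k: int = 10) -> Tuple[float, float]:
    """Return the mean R^2 and mean RMSE over k folds.

    fit(X_train, y_train) must return a function mapping X_test to predictions.
    """
    scores = []
    for train, test in k_fold_splits(len(y), k):
        predict = fit([X[i] for i in train], [y[i] for i in train])
        obs = [y[i] for i in test]
        pred = predict([X[i] for i in test])
        scores.append((r_squared(obs, pred), rmse(obs, pred)))
    return sum(s[0] for s in scores) / k, sum(s[1] for s in scores) / k

def fit_through_origin(X_train, y_train):
    """Toy stand-in for a real model: least-squares slope through the origin."""
    slope = sum(x * t for x, t in zip(X_train, y_train)) / sum(x * x for x in X_train)
    return lambda X_test: [slope * x for x in X_test]

# Noise-free toy data, so the toy model fits every fold exactly.
X = [float(i) for i in range(1, 101)]
y = [2.0 * x for x in X]
splits = k_fold_splits(len(y))
mean_r2, mean_rmse = cross_validate(fit_through_origin, X, y)
```

Each of the ten splits holds out 10% of the days for testing and trains on the remaining 90%, matching the proportions described above.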

Comparison of the exponential model and the RF model
Regarding the number of heatstroke patients transported by ambulance, previous studies [9,20,23,27] mainly investigated and adopted a single metric or a few metrics related to temperature. These studies formed exponential-type calculation models (equation (2)), where $P$ is the number of heatstroke patients, $T_i$ is the $i$-th meteorological explanatory variable on the incidence day, $a_i$ and $b_i$ are the regression coefficients, and $n$ is the number of meteorological explanatory variables. From the 37 meteorological variables evaluated by the RF, we selected the highly ranked ones as explanatory variables for equation (2). To estimate the regression coefficients in equation (2), a generalized linear model (GLM) with a Poisson distribution was adopted. The fitness of equation (2) was evaluated using R² and RMSE. For this evaluation, we utilized the same training and testing data as in the RF analysis in section 2.2.1 and applied the same 10-fold cross-validation.
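To illustrate how a Poisson GLM yields an exponential-type model, the Python sketch below fits ln E[P] = β0 + Σ βj Tj by Newton's method. The log link is an assumption (it is the usual default for a Poisson GLM). The exact parameterization of equation (2) is not reproduced here, and the hand-written solver is only a stand-in for the GLM routine used in the study. Function names and toy data are hypothetical.

```python
import math
from typing import List, Sequence

def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def fit_poisson_glm(X: Sequence[Sequence[float]], y: Sequence[float], n_iter: int = 50) -> List[float]:
    """Fit log(E[y]) = beta0 + sum_j beta_j * x_j by Newton's method.

    Returns [beta0, beta1, ...]. Assumes a log link and a positive mean of y.
    """
    rows = [[1.0] + list(x) for x in X]            # prepend the intercept column
    p = len(rows[0])
    beta = [0.0] * p
    beta[0] = math.log(sum(y) / len(y))            # start from the overall mean
    for _ in range(n_iter):
        mu = [math.exp(sum(b * v for b, v in zip(beta, r))) for r in rows]
        # Score vector X^T (y - mu) and information matrix X^T diag(mu) X.
        grad = [sum(r[j] * (yi - mi) for r, yi, mi in zip(rows, y, mu)) for j in range(p)]
        hess = [[sum(r[j] * r[k] * mi for r, mi in zip(rows, mu)) for k in range(p)]
                for j in range(p)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < 1e-10:
            break
    return beta

def predict(beta: Sequence[float], x: Sequence[float]) -> float:
    """Exponential-type prediction exp(beta0 + sum_j beta_j * x_j)."""
    return math.exp(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))

# Noise-free toy data generated from P = exp(-2 + 0.15 * T).
temps = [float(t) for t in range(20, 36)]
counts = [math.exp(-2.0 + 0.15 * t) for t in temps]
beta = fit_poisson_glm([[t] for t in temps], counts)
```

Because the toy data follow the exponential form exactly, the fit recovers the generating coefficients. Real daily counts are integers with noise, so the fitted coefficients would only approximate any underlying relationship.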

Results
The study period was limited to 2015-2019 because the number of heatstroke patients transported by ambulance for the month of May was unavailable prior to 2014 and in 2020 on the FDMA website. We considered all the 47 prefectures of Japan separately. Table 1 lists the characteristics of the meteorological variables utilized in this study. In total, 37 meteorological variables were considered. Table 2 summarizes the descriptive statistics for the population, number of heatstroke patients, and daily meteorological means from May to September in 2015-2019 for all 47 prefectures in Japan.

Importance rank of meteorological variables
The importance rankings of the meteorological variables derived from the testing data for Tokyo are shown in figure 2. In the case of Tokyo, the relative temperature (RelTemp) has the highest score of all the meteorological variables. The importance ranks of the meteorological variables derived by the RF for Japan's 47 prefectures are illustrated in figure 3. The numbers in the cells denote the rank for each prefecture (each column). Here, the meteorological variables of the incidence day are highly ranked compared with those of the previous day(s). However, even for the incidence day, precipitation (Prec) was found to be less important than the other variables. Moreover, the average relative humidity (AveRelHum) was not as highly ranked as might be expected from the relatively high contribution of humidity to the WBGT, which is defined as $\mathrm{WBGT} = 0.7T_w + 0.2T_g + 0.1T_d$, where $T_w$ is the natural wet bulb temperature (indicating humidity), $T_g$ is the globe thermometer temperature (indicating radiant heat), and $T_d$ is the natural dry bulb temperature (indicating ambient air temperature) [22].
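The weighting in the WBGT definition quoted above can be made concrete with a short calculation. The Python sketch below applies the stated weights of 0.7, 0.2, and 0.1; the function name and the input temperatures are illustrative assumptions, not values from the study.

```python
def wbgt(t_wet: float, t_globe: float, t_dry: float) -> float:
    """WBGT (deg C) from wet bulb, globe, and dry bulb temperatures."""
    return 0.7 * t_wet + 0.2 * t_globe + 0.1 * t_dry

# Illustrative inputs in deg C (not taken from the study).
base = wbgt(27.0, 45.0, 33.0)
humid = wbgt(28.0, 45.0, 33.0)    # wet bulb temperature raised by 1 deg C
hot_air = wbgt(27.0, 45.0, 34.0)  # dry bulb temperature raised by 1 deg C
```

A 1 °C rise in the wet bulb temperature raises the WBGT by 0.7 °C, whereas the same rise in the dry bulb temperature raises it by only 0.1 °C. This is why a high rank for a humidity-related variable might have been expected.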
The importance rank of the meteorological variables and the appearance rank obtained by integrating the results of the 47 prefectures are summarized in table 3. The numbers in the brackets denote the number of appearances at the corresponding importance rank. RelTemp has the highest importance rank among all meteorological variables in 42 prefectures. Meteorological variables such as total solar radiation (Rad), maximum WBGT (MaxWBGT), mean WBGT (MeanWBGT), mean temperature (MeanTemp), maximum temperature (MaxTemp), and average wind speed (AveVel) are also ranked in table 3. Based on these results, RelTemp is the most highly ranked variable among all the meteorological variables across the 47 prefectures.
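The integration of per-prefecture rankings into appearance counts, as summarized in table 3, can be sketched as a simple tally. The Python below is an assumed reading of that procedure; the function name, the three placeholder prefectures, and their rankings are hypothetical and do not reproduce the study's results.

```python
from collections import Counter
from typing import Dict, List, Tuple

def appearances_by_rank(rankings: Dict[str, List[str]], top: int = 5) -> List[List[Tuple[str, int]]]:
    """For each importance rank 1..top, count how often each variable appears.

    rankings maps a prefecture to its variables ordered from most to least
    important. Returns, per rank, (variable, count) pairs sorted by
    descending count.
    """
    result = []
    for rank in range(top):
        counts = Counter(order[rank] for order in rankings.values() if len(order) > rank)
        result.append(counts.most_common())
    return result

# Hypothetical rankings for three placeholder prefectures.
rankings = {
    "A": ["RelTemp", "Rad", "MaxWBGT"],
    "B": ["RelTemp", "MaxWBGT", "Rad"],
    "C": ["MaxWBGT", "RelTemp", "Rad"],
}
table = appearances_by_rank(rankings, top=3)
```

Here `table[0]` lists the variables that appear at the first importance rank together with the number of prefectures in which they do so, which corresponds to the bracketed numbers described for table 3.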
Regarding all age groups, Fig. S1, available online at stacks.iop.org/ERC/3/125002/mmedia (Supplementary material), shows the importance rank of meteorological variables derived using the RF, and Table S1 summarizes the importance rank of the meteorological variables as well as the appearance rank obtained by integrating the results of the 47 prefectures. Again, RelTemp was confirmed to be the most highly ranked variable among the meteorological variables. Therefore, the results obtained in the case of all generations are consistent with those obtained in the case of the generation aged 64 years or older.

Fitness comparison between models
We predicted the number of heatstroke patients transported by ambulance using the exponential models, adopting the highest ranked variables as metrics in equation (2). The meteorological variables adopted were extracted from the results in table 3 as follows. First, the top three variables under both 'Importance rank in RF' and 'Rank of appearances in the corresponding importance rank' in table 3 were considered, and the resulting extracted meteorological variables were: RelTemp, Rad, MaxWBGT, MeanWBGT, MaxTemp, MeanTemp, and AveVel. RelTemp, being the highest ranked variable, was adopted in equation (2) with priority. MaxTemp and MeanTemp were excluded considering their possible correlations with RelTemp. MeanWBGT and MaxWBGT were also excluded, as the WBGT includes the ambient air temperature component ($T_d$) in its definition ($0.7T_w + 0.2T_g + 0.1T_d$) and may therefore be correlated with RelTemp. Finally, Rad and AveVel were additionally adopted in equation (2).
MaxWBGT and MeanWBGT are highly ranked in table 3; excluding them from equation (2) would neglect a potentially important component of the WBGT ($T_w$, the humidity component), whereas $T_g$ and $T_d$ are represented by Rad and RelTemp, respectively. To avoid this, the humidity component was also adopted in equation (2) through AveRelHum. In the end, we considered four metrics (RelTemp, Rad, AveVel, and AveRelHum) in equation (2) (Case 1). For comparison with the result of Case 1, we also set a case where the same four metrics were adopted in the RF (Case 2). All calculation cases adopted in this study are summarized in table 4. Although Case 0 considers all meteorological variables, its prediction accuracy is not significantly improved compared to Case 1.

Table 3. Importance rank of meteorological variables using the random forest (RF) method, and their appearance rank (more than 64 years old). Columns: importance rank using the RF method (1st to 5th). Rows: rank of appearances in the corresponding importance rank. Numbers in brackets show the number of appearances at the corresponding importance rank. Relative temperature (RelTemp) is written in bold.

Confirmation of effectiveness of heat acclimatization
Previous studies [9,20,23,27] adopted MaxTemp as the metric related to temperature in equation (2). In this study, instead of MaxTemp, RelTemp, representing heat acclimatization, was adopted as the temperature metric in equation (2).
The time series of the number of heatstroke patients transported by ambulance from Case 1 to Case 4 for Tokyo are shown in figure 6. In Tokyo, the number of heatstroke patients increases at the end of the rainy season (usually in July) [28]. This suggests that heat acclimatization had not yet occurred in July and early August. Compared to observational data, the results of Cases 3 and 4 underestimated heatstroke incidences in July and early August and overestimated them in late August and September. This was because heat acclimatization was not considered in these cases. Conversely, the results of Cases 1 and 2 were more consistent with the observational data.
The R² and RMSE values for all age groups from Case 1 to Case 4 are shown in Fig. S2 and Fig. S3, respectively. The results obtained for all age groups were consistent with those obtained for the generation older than 64 years. Fig. S4 shows the time series of heatstroke patients transported by ambulance from Case 1 to Case 4 for all age groups in the case of Tokyo. Furthermore, Fig. S5 and Fig. S6 show the time series results for the generation older than 64 years and for all age groups, respectively, in Osaka, the second largest metropolitan area in Japan. The results shown in Fig. S4-Fig. S6 are consistent with those obtained for the generation older than 64 years for Tokyo (figure 6).
In the present study, we found that RelTemp, representing heat acclimatization, had the highest importance rank and that its consideration improved the fit to the observations. In particular, by considering heat acclimatization, both the underestimates of heatstroke incidences in July and early August and the overestimates in late August and September were corrected. We found that the exponential model with RelTemp (Case 1) provided the best accuracy.

Discussion
Previously, the effect of heat acclimatization on heatstroke patients was not sufficiently studied in Japan [24–26]. Herein, we found that RelTemp, as a proxy for heat acclimatization, has the highest ranked importance among all the meteorological variables in determining the number of heatstroke patients transported by ambulance (figure 3 or table 3). Daily MaxTemp and MaxWBGT have been considered the most important metrics when determining heatstroke risk in Japan. Based on these metrics, a 'High temperature warning' is issued by the JMA to warn the public about heatstroke when the MaxTemp is expected to be 35°C or higher [48]. Furthermore, the JMA and the Ministry of the Environment, Japan (MOEJ) issue a 'Heatstroke Warning Alert' to warn people when the MaxWBGT is expected to be 33°C or higher [49]. Considering RelTemp in the 'High temperature warning' and 'Heatstroke Warning Alert' as a supplementary metric to MaxTemp/MaxWBGT would be effective in determining the heatstroke risk in Japan. In fact, the MOEJ has been considering more useful metrics [50]. The results of this study have the potential to effectively meet this requirement.
We also found that the underestimates of the number of heatstroke patients transported by ambulance in July and early August and the overestimates in late August and September produced by the models that do not consider heat acclimatization (Case 3 and Case 4) were improved by considering heat acclimatization (Case 1 and Case 2), as shown in figure 6. We expect that improving the overestimates or underestimates would contribute to preparing countermeasures, such as better allocation of ambulances and medical staff to work in emergency medical facilities, with more precision [7,8]. The exponential model with heat acclimatization (i.e., Case 1) was found to provide the best accuracy. We believe that this model has the advantage of being relatively simple and easy to utilize.

Figure 5. Root mean squared error (RMSE) values of Case 0 using all meteorological variables by the random forest (RF) method; Case 1 using four metrics RelTemp, Rad, AveVel, and AveRelHum by the exponential model; Case 2 using same metrics as in Case 1 by the RF; Case 3 using four metrics MaxTemp, Rad, AveVel, and AveRelHum by the exponential model; and Case 4 using same metrics as in Case 3 by the RF for all 47 prefectures (more than 64 years old).
Our study has several limitations. First, although our best model (i.e., Case 1) has good accuracy, deviations from the observation data were still present (figure 6). For some years the calculations fit well with the observation data, but in others the calculations deviated. In addition to the aforementioned socio-economic conditions, geographical conditions, and human-related factors, these deviations might reflect effects that stem from temporal factors [51–53]. For instance, the timing and frequencies of outdoor events in a certain region could affect heatstroke incidences [26,54]. Another possibility is that the more heatstroke risk information there is in the media, the higher the awareness of heatstroke, which might increase the likelihood of calling an ambulance when one feels in danger of heatstroke. As an associated phenomenon, a correlation between heatstroke deaths/incidences and the corresponding search index was found in China [55]; notably, this correlation was higher than that with maximum temperature. Second, although RelTemp, which herein represents heat acclimatization, was found to be the most highly ranked variable among all meteorological variables, the relationship between RelTemp and heat acclimatization processes in the human body was not investigated in this study. These limitations must be addressed in future work.

Conclusions
We conducted a multivariate analysis that considers different weather conditions simultaneously to analyze the relative importance of meteorological variables in determining the number of heatstroke patients transported by ambulance in all of Japan's 47 prefectures. In this study, we considered the effect of heat acclimatization. The results showed that RelTemp, which represents heat acclimatization, has the highest importance rank among all the meteorological variables. We developed an exponential model and an RF model to predict the number of heatstroke patients transported by ambulance by adopting the highly ranked meteorological variables. We also considered the cases with/without RelTemp. The results showed that the R² values of the exponential and RF models with RelTemp were 0.76 and 0.74, respectively, and those of the exponential and RF models without RelTemp were 0.68 and 0.67, respectively. We found that the exponential model with RelTemp provided the best accuracy. Additionally, we confirmed the effectiveness of considering heat acclimatization via RelTemp.

Figure 6. Time series of the number of heatstroke patients transported by ambulance: Case 1 using four metrics RelTemp, Rad, AveVel, and AveRelHum by exponential model; Case 2 using same metrics as in Case 1 by the RF; Case 3 using four metrics MaxTemp, Rad, AveVel, and AveRelHum by exponential model; and Case 4 using same metrics as in Case 3 by the RF for Tokyo (more than 64 years old). The end dates of the rainy season in the Kanto area, which covers Tokyo, were 10 July in 2015, 29 July in 2016, 6 July in 2017, 29 June in 2018, and 24 July in 2019 [47].