Development of world famine database for 1840–2019 and risk assessment of its occurrence and severity

Famine still exists in this world, hindering the achievement of Sustainable Development Goal 2: zero hunger. The history and the mechanisms of famine have been broadly studied; however, few studies have focused on quantitative and long-term analyses of the general characteristics of famine. This study analyzes the factors influencing famine and estimates its risks. We developed a historical famine database and estimated the probability and intensity of famine through regression analysis. Herein, we identified that less food production and underdevelopment are related to famine occurrence, and famine tends to be more intense in less urbanized and dryer areas. By extrapolating the regression with future scenarios, we revealed that famine would be less likely to happen in the future mainly due to GDP growth, although conflict would be the key factor of future famine risk. This study is one of the first steps in the quantitative analysis of long-term famine risk.


Introduction
In recent years, the number of undernourished people has increased; it is estimated that between 720 and 811 million people went hungry in 2020 [1], with famine being one of the leading causes. The food security and nutritional status of the most vulnerable population groups are likely to be affected further by the health and socio-economic impacts of the COVID-19 pandemic [2]. With rising concerns about food insecurity driven by population growth, climate and/or environmental changes, and the pandemic, the elimination of famine will remain a challenge even in the 21st century, even though Goal 2 of the UN Sustainable Development Goals (SDGs) [3] aims to achieve 'Zero Hunger.' Assessing historical famine will provide essential information for future famine risk mitigation. The causes of famine and its processes have been widely examined [4]; however, most previous studies are descriptive and focus on the details of individual events rather than being quantitative and comprehensive. Although some studies conducted quantitative analyses of historical famines, most of them revealed facts and insights about famines, but not their risks. The Famine Early Warning System [5] takes a scenario-based approach to assess the risk of food insecurity; however, it is designed to support administrators' decisions in urgent food-insecure situations and does not offer prospects beyond six months. Alfani and Grada [6] conducted a temporal scan analysis of famines in Europe from the 1250s to the present; they identified the timing of famines through a literature review and analyzed each famine using a population dataset. They also carried out a simple regression analysis of the probability of famine using population and wheat price data, concluding that famine before 1710 was mainly caused by production failure, not by human-made causes. Their primary focus was on famines before the 1800s, and they did not consider the various factors related to recent famines.
The purpose of this study is to identify the major factors affecting historical famines and their changes, and to assess the risk of famine in the future. This study consists of two parts. In the first part, data on major famines since 1840 were assembled into a famine database based on a literature review, and their causes and processes were analyzed. In the second part, we assess the probability and intensity of famine using regression analyses with the variables associated with the famine database. The regression results were validated against the historical famine data. We also assessed the future risk of famine in each country by applying a logistic regression model under a few socioeconomic scenarios and concentration pathways.

Development of famine database
The term 'famine' has several definitions. In this study, to clarify the criteria for inclusion in the database, 'famine' was defined using a mortality-based definition [7], which defines famine as an event in which more than 10 000 people died. We developed a famine database targeting famines since the 1840s, which we listed based on a Google Scholar search for the word 'famine' and the literature found through that search. As basic information, we included the year range, country, and number of deaths of each famine. The starting and ending years differ slightly among sources, so we selected the years most frequently cited in the literature. Each famine was attributed to a country rather than a region, because statistical data are available at the country scale, although many famines occurred in only part of a country. The reported number of deaths also differed among sources; our database adopted the average of the reported numbers of casualties.
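The inclusion rule and the way conflicting sources are reconciled can be sketched in a few lines. This is a minimal illustration only: the names `FamineRecord`, `most_cited`, and `build_record` are hypothetical, and the input figures are invented rather than taken from the database.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import mean

DEATH_THRESHOLD = 10_000  # mortality-based definition: more than 10 000 deaths


@dataclass
class FamineRecord:
    country: str
    start_year: int
    end_year: int
    deaths: float


def most_cited(years):
    """Return the year cited most often across the sources."""
    return Counter(years).most_common(1)[0][0]


def build_record(country, start_years, end_years, death_estimates):
    """Merge several sources into one record; return None if below the threshold."""
    deaths = mean(death_estimates)  # average of the reported casualties
    if deaths <= DEATH_THRESHOLD:
        return None
    return FamineRecord(country, most_cited(start_years), most_cited(end_years), deaths)


# Hypothetical inputs from three sources, for illustration only.
record = build_record(
    "Country A",
    start_years=[1972, 1973, 1973],
    end_years=[1974, 1974, 1975],
    death_estimates=[40_000, 60_000, 80_000],
)
```

Averaging is simple but sensitive to a single extreme estimate; the sketch follows the averaging rule stated in the text rather than proposing an alternative.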
The explanation of famine in this database consists of two parts: factor and description. In the factor section, major factors that triggered famine, such as drought or conflict, were attributed to each famine. The categorization of factors, shown in supplementary table 1, was defined according to the chapter division of 'Theories of Famine' [8], which provides a comprehensive and organized overview of famine factors. DeRose's perspective [9] was also used to organize the categories. In the description section, auxiliary information is provided: the pre-famine situation, the shock, and the social response to the famine are summarized.
A limitation of this database is that it cannot cover all famines, in particular those that left no record. However, for the famines that have been recorded, we sourced information from multiple publications and extracted the commonly mentioned information to make the descriptions in the database more accurate. Hence, one of the main benefits of this database is the ability to list and compare the background, factors, and subsequent responses of famines, supported by multiple sources.
Figure 1(a) shows the average number of factors that contributed to each famine in every 20 year period since 1860. Overall, recent famines have been caused by a combination of various factors, which is consistent with Devereux's insight [10] that famine causality is becoming more complex than ever. The trends in causes can be broadly divided into three periods based on the combination of causes. In the first period, from 1860 to 1899, environmental triggers, such as droughts, were the leading causes of famines. In the second period, from 1900 to 1959, the number of social triggers and domestic responses, including war or development failure, increased owing to colonization, revolution, and the two World Wars. In the third period, after 1960, the factors became more diverse: environmental factors further increased, international responses increased, and social triggers decreased. According to the breakdown of the factors in each category (figures 1(b)-(e)), famines in the third period seem to have been triggered by diverse factors; however, the number of social triggers, domestic responses, and international responses decreased from the 1960s to the 2000s, while there was consistently more than one environmental factor in each famine. Environmental factors primarily consist of drought, and in the other categories, the proportion of each element varies depending on the period. Among the social triggers, factors such as insufficient infrastructure and market failure increased, while the share of war or conflict was relatively constant after 1900.
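The quantity plotted in figure 1 can be sketched as a per-period average of factor counts. The snippet below is an illustration under stated assumptions: the category keys, the function names, and the choice to assign each famine to a period by its starting year are hypothetical, and the sample famines are invented.

```python
from collections import defaultdict

# Hypothetical keys for the four categories named in the text.
CATEGORIES = (
    "environmental",
    "social_trigger",
    "domestic_response",
    "international_response",
)


def period_start(year, first=1860, width=20):
    """Map a year to the first year of its 20 year period."""
    return first + ((year - first) // width) * width


def average_factors(famines, first=1860):
    """famines: iterable of (start_year, {category: number of factors}).

    Returns {period_start: {category: mean number of factors per famine}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    counts = defaultdict(int)
    for year, factors in famines:
        if year < first:  # 1840-1859 is omitted, as in figure 1
            continue
        period = period_start(year, first)
        counts[period] += 1
        for category in CATEGORIES:
            totals[period][category] += factors.get(category, 0)
    return {
        period: {c: totals[period][c] / counts[period] for c in CATEGORIES}
        for period in counts
    }


# Invented sample, for illustration only.
sample = [
    (1845, {"environmental": 1}),
    (1865, {"environmental": 1}),
    (1870, {"environmental": 1, "social_trigger": 2}),
    (1905, {"social_trigger": 1, "domestic_response": 1}),
]
averages = average_factors(sample)
```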

Risk assessment of famine
Herein, we hypothesized that the intensity of famine could be assessed, as in research on undernourishment, by introducing proxies for each factor. We assembled the dataset shown in supplementary table 2 based on the classification of the famine database and conducted statistical analyses with the dataset. Due to data availability, we selected as the target of this research 162 countries with a population larger than 10 million in 2010. The target period of the statistical analyses was set to 1961-2019, with the aim of understanding the trend of famine in the third period (1961-now) proposed in the previous section. This time range also reflects data availability; standardized data for some explanatory variables, such as the GDP of each country, were inaccessible before 1960.
The datasets used in the regression analysis are presented in supplementary tables 2 and 5. The global hydrological model H08 [11,12], forced with the ISIMIP dataset [13], was used to calculate soil moisture. A coupled simulation of the land surface process, river, crop growth, and reservoir operation was conducted with a spatial resolution of 0.5° and a time resolution of one day. The annual mean value of soil moisture in each country was calculated from the model output using a national mask with a spatial resolution of 0.5°. For the future projection of soil moisture, the values of shortwave radiation, longwave radiation, precipitation, temperature, wind speed, humidity, and air pressure projected by four climate models were used as forcing data for H08. Precipitation and temperature were bias-corrected with historical data and regridded to a resolution of 0.5°.
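Aggregating a gridded field to country values with a national mask can be sketched as follows. This is a simplified illustration, not the processing chain used with H08: the function name `country_means` is hypothetical, the grid is a toy 2 × 3 array, and the mean is unweighted, whereas a real 0.5° grid may call for weighting by cell area (the text does not state which was used).

```python
def country_means(field, mask):
    """Average a gridded field over each country.

    field: 2-D list of values (e.g. annual mean soil moisture per cell).
    mask:  2-D list of the same shape holding a country code per cell,
           or None for cells outside any country (e.g. ocean).
    Assumption: simple unweighted mean over the cells of each country.
    """
    sums, counts = {}, {}
    for field_row, mask_row in zip(field, mask):
        for value, code in zip(field_row, mask_row):
            if code is None:
                continue
            sums[code] = sums.get(code, 0.0) + value
            counts[code] = counts.get(code, 0) + 1
    return {code: sums[code] / counts[code] for code in sums}


# Toy grid with two hypothetical countries "A" and "B".
soil_moisture = [[0.2, 0.4, 0.0],
                 [0.6, 0.8, 0.0]]
national_mask = [["A", "A", None],
                 ["B", "B", None]]
means = country_means(soil_moisture, national_mask)
```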
Future GDP data provided by the National Institute for Environmental Studies, Japan (NIES) [14] were converted into 2019 USD to align with historical data. Gridded GDP per capita in SSP1, SSP2, and SSP3 was calculated from future GDP and population data and converted into country values using the national mask. The conflict frequency was calculated based on the conflict dataset provided by the Uppsala Conflict Data Program [15,16]; the ratio of years in which conflict occurred during the target period was taken as the conflict frequency. For all data, missing values were filled in by interpolation.
In this research, the analysis was divided into two steps to assess famine risk using datasets related to famine causes. Conducting a linear regression on all cases (one country in one year) to estimate famine intensity was found to be challenging because, in most cases, famine did not occur; there were only 87 cases with famine among the 9558 possible cases (162 countries over 59 years). For this reason, a two-step analysis was conducted, as illustrated in supplementary figure 1: logistic regression to infer whether famine happened, and panel data analysis to infer the intensity of the famines that did happen. Logistic regression analysis (see supplementary information) is a binary classification method, adopted here to estimate the probability of famine occurrence, that is, how likely famine is to happen. Focusing on the probability alone at first avoids the difficulty of directly estimating the intensity; panel data analysis (see supplementary information) was then conducted on the famine cases alone to assess the mortality of the famines. The intensity of famine (i.e. famine mortality, calculated by dividing the number of famine casualties by the population at that time) was estimated using the random-effects model. Note that this analysis covers the famine cases alone, which means that the result shows the intensity of famine if it happened.
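The case count, the conflict frequency, and the intensity measure described above reduce to simple arithmetic, sketched here. The function names are hypothetical and the conflict years and casualty figures passed to them are invented; only the counts 162, 59, 87, and 9558 come from the text.

```python
N_COUNTRIES = 162
YEARS = range(1961, 2020)  # 1961-2019 inclusive, i.e. 59 years

total_cases = N_COUNTRIES * len(YEARS)  # one case = one country in one year
famine_cases = 87
famine_share = famine_cases / total_cases  # under 1% of cases: a rare event


def conflict_frequency(conflict_years, years=YEARS):
    """Share of years in the target period in which a conflict occurred."""
    return sum(1 for year in years if year in conflict_years) / len(years)


def famine_intensity(deaths, population):
    """Famine mortality: casualties divided by the population at that time."""
    return deaths / population


# Invented inputs, for illustration only.
freq = conflict_frequency({1961, 1962, 2019, 2025})  # 2025 is outside the period
intensity = famine_intensity(50_000, 10_000_000)
```

The small share of positive cases is the reason a single regression on all cases is impractical and the occurrence and intensity steps are separated.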

Results from the logistic regression analysis
The results of the logistic regression analysis after stepwise selection are shown in table 1. GDP per capita, access to safe water, crop production per capita, precipitation, and cereal import rate have negative coefficients, which means that social development and production capacity are associated with the absence of famine. Conflict has a positive coefficient and can therefore be assumed to increase the probability of famine. In terms of the absolute value of the coefficients, crop production per capita is the largest, GDP per capita is the second largest, and the rest are minor, suggesting that these two have especially strong influences on the probability. Supplementary table 3 shows the results with all explanatory variables before applying the stepwise selection.
The logistic regression model with the selected variables in table 1 was applied to all countries. The estimated famine occurrence is shown in figure 2. Most of the historical famines, indicated by red dots, are associated with a high probability of famine occurrence. However, some countries are estimated to have a high probability of famine occurrence with no actual reported famine, while others have low estimated values even though famine actually occurred. This issue is examined in the discussion section. The four world maps in figure 2 illustrate the changes in the estimated famine probability in each 15 year period. The probability has decreased in recent years in most countries, but sub-Saharan countries retain high values, which is consistent with recent trends; in particular, Somalia still has high values in the 21st century.
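How the sign of a coefficient translates into a higher or lower estimated probability can be illustrated with the logistic function itself. The intercept, coefficients, and variable names below are hypothetical placeholders on standardized variables; they are not the fitted values in table 1.

```python
import math


def famine_probability(intercept, coefficients, values):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum_i b_i * x_i)))."""
    z = intercept + sum(coefficients[name] * values[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical coefficients, NOT the values reported in table 1.
INTERCEPT = -3.0
coef = {
    "gdp_per_capita": -1.0,              # negative: lowers the probability
    "crop_production_per_capita": -1.5,  # negative: lowers the probability
    "conflict": 0.8,                     # positive: raises the probability
}

baseline = {"gdp_per_capita": 0.0, "crop_production_per_capita": 0.0, "conflict": 0.0}
p_base = famine_probability(INTERCEPT, coef, baseline)
p_conflict = famine_probability(INTERCEPT, coef, {**baseline, "conflict": 1.0})
p_richer = famine_probability(INTERCEPT, coef, {**baseline, "gdp_per_capita": 1.0})
```

With these placeholder numbers, raising GDP per capita lowers the estimate and the presence of conflict raises it, mirroring the qualitative reading of the signs given above.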

Results from the panel data analysis
Table 2 shows the results of the panel data regression. Stepwise selection was conducted, as in the logistic regression analysis. The urban population rate and soil moisture were used as explanatory variables. The result indicates that more severe famine is associated with a low urban population rate and low soil moisture. The results of panel data regression after stepwise selection are presented in supplementary table 4. Figure 3 shows the results of the model application. Note that this model estimates the famine intensity when famine occurs, not the risk of the famine itself. Some large famines, such as the famine in Cambodia in the 1970s, are well captured. However, two cases, the famine in China in 1961 and the famine in North Korea in the late 1990s, are estimated to be less intense than reported. These two cases are likely related to the political regime and policy, and such factors might not be well captured by the variables we used.

Future projection
The probability of famine occurrence and the intensity of famine were assessed in the previous sections. Herein, we applied the same model to two future scenarios, SSP1-RCP2.6 and SSP5-RCP8.5, to grasp the future famine risk and the differences between the scenarios. In this simulation, GDP and population data from NIES [14], precipitation data from ISIMIP2a simulation data [17], and crop production data simulated by the global hydrological model H08 were used. Only the first model, the logistic regression assessing famine occurrence, is applied to the future scenarios here, since the second estimates famine intensity in famine-experienced countries alone.
Supplementary figure 2 shows the estimation results. Overall, the estimated probability of famine decreases in the future, mainly due to GDP growth. The difference between the two scenarios is small, though the SSP1-RCP2.6 scenario yields a slightly higher estimate. GDP would be higher in the SSP5-RCP8.5 scenario, but crop production would be slightly smaller. Since these two have opposite effects on the probability of famine, the two results are almost the same. The effect of conflict is larger; in both scenarios, the estimated probability would soar if a war occurred.

Discussion
This research revealed that the causes of famine have been changing; environmental factors were the leading cause in the 19th century, and conflicts became significant after 1900. In particular, the causes have become more diverse since the 1960s; however, it should be noted that this difference may reflect the difference in the amount of information and studies available for the past and for recent years. In the risk assessment, GDP per capita, conflict, access to safe water, crop production per capita, and cereal import rate were found to be significant explanatory variables of famine occurrence. As for the estimation of intensity, urban population and precipitation are significant, but a more detailed estimation of the intensity would require careful quality control of the famine data, both on its magnitude and on its ratio to the 'entire' population. Some exceptions to this result were examined. Underestimation is found for famines that happened outside Africa, and it is inferred that flooding was one of the causes. This suggests the importance of monitoring vulnerability to floods when estimating famine risk. Lastly, the future projection of the probability of famine illustrated that the risk of famine would decrease regardless of socioeconomic pathway; however, if a conflict occurs, the probability of famine would become much higher. In particular, countries in sub-Saharan Africa would still have a high risk of famine in the future. The logistic regression model and the random-effects model revealed that some parameters are dominant in explaining famine occurrence and intensity, but both models produce overestimations and underestimations. Herein, the outputs of both models are discussed in detail because the number of countries in the panel data analysis is limited.
In figure 2, it can be seen that two famines, Bangladesh in 1974 and North Korea in the late 1990s, had an especially low estimated probability of famine occurrence. The variables in these cases are shown in supplementary table 6: both countries had higher access to water and production per capita than the average of all famine cases, although GDP per capita and cereal import rate were low, and no conflict was recorded in either case, which decreased the estimated probability. From the data, the low estimated probability of famine can be attributed to the high access to water, high cereal production, and the absence of war. However, it is notable that floods, which damage cropland, were one of the factors in both famines. As a proxy for flood, we adopted the maximum 30 day precipitation sum, which has a positive correlation coefficient but is not a dominant variable. This might be because the risk of flooding depends on the situation of each basin, which sometimes covers an area larger than one country or only a small area in a certain district. From this point of view, further research focusing on the relationship between floods and famine risk is required.
There are some cases in which the probability of famine occurrence is estimated to be high, but famine did not occur in reality. The countries that had, at least once during this period, an estimated probability greater than 0.15 despite the absence of famine are Burundi, China, Chad, Cambodia, Nigeria, Niger, Burkina Faso, Somalia, Gambia, Mali, Uganda, and Ethiopia; most of them are famine-experienced countries. In Burundi and Gambia, no famine was reported during this period. However, these countries are reported to be among the poorest countries; WFP [18] describes Burundi as the country with the highest hunger score and as the 9th food security crisis in the world. The Gambia has been reported to have food insecurity and malnutrition issues that have remained unchanged or have worsened in the last 10 years [19]. It is assumed that famine, which we defined as an event with more than 10 000 deaths from hunger, did not occur or was not reported in these countries, but the situation may have been dangerous, and they were vulnerable to famine.
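Listing the countries with a high estimate but no recorded famine amounts to a simple filter over country-year estimates. The sketch below assumes a hypothetical data layout (a dictionary keyed by country and year) and invented probabilities; only the 0.15 cut-off comes from the text.

```python
THRESHOLD = 0.15  # probability cut-off quoted in the text


def false_alarm_countries(estimates, famine_cases, threshold=THRESHOLD):
    """Countries with at least one country-year above the threshold and no famine.

    estimates:    {(country, year): estimated probability of famine}
    famine_cases: set of (country, year) pairs with a recorded famine
    """
    flagged = {
        country
        for (country, year), prob in estimates.items()
        if prob > threshold and (country, year) not in famine_cases
    }
    return sorted(flagged)


# Invented estimates for three hypothetical countries.
estimates = {
    ("X", 1985): 0.30,  # high estimate, famine recorded: not a false alarm
    ("X", 1990): 0.05,
    ("Y", 1985): 0.20,  # high estimate, no famine recorded: flagged
    ("Z", 1985): 0.10,
}
recorded = {("X", 1985)}
flagged = false_alarm_countries(estimates, recorded)
```

A flagged country is not necessarily a model error: as noted above, a famine may have gone unreported or fallen short of the mortality threshold while the underlying vulnerability was real.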
As for famine intensity, there are many overestimations and underestimations. It would be difficult to say that the estimation is reasonable because the only dominant parameters are the urban population and precipitation. The intensity of famine is determined by a complex combination of factors that are challenging to quantify, such as policy or regime failure, and the stepwise regression result shows that it is challenging to represent the intensity with the available quantitative variables. Defining famine intensity is also difficult; we defined the ratio of casualties to population as the measure of intensity, but the population of the whole country might not be a proper denominator for this ratio, since famine tends to be a local event, and wealthier parts of the population would not be exposed to famine at all even if famine happened in a poorer area.
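The sensitivity of the intensity measure to the choice of denominator can be made concrete with a small calculation. All figures below are invented for illustration, and `mortality_ratio` is a hypothetical helper name.

```python
def mortality_ratio(deaths, population):
    """Ratio of famine casualties to the chosen reference population."""
    return deaths / population


# Invented figures: a famine confined to one region of a larger country.
deaths = 50_000
national_population = 50_000_000
affected_region_population = 2_000_000

national_intensity = mortality_ratio(deaths, national_population)
regional_intensity = mortality_ratio(deaths, affected_region_population)
ratio = regional_intensity / national_intensity
```

In this example the same event looks 25 times more intense when measured against the affected region than against the whole country, which is why a country-scale denominator can understate localized famines.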
There is room for further improvement. First, considering the time-series processes of famine, the relationships among the factors are also critical. In this study, owing to the lack of detailed data at monthly or weekly scales, we evaluated the risk of famine mainly based on the annual value of each factor; however, the magnitude of the risk could differ depending on the seasonality of the parameters during famine. Second, analysis with higher resolution data would be preferable; our study dealt with country-scale data, but the results suggest that flooding can be a major risk factor for famine and that a low urban population rate is associated with intense famine, indicating the importance of examining famines on a local scale, although obtaining data in areas where governance is weak is difficult, as mentioned by Maxwell and Hailey [20]. It is also important to assess the causal relationships among the factors, including through process-based analysis, in the estimation of long-term global famine risk, to obtain a clearer view of famine in the future. Finer quantification is required for some data. For example, for conflicts, it would be better to use quantitative data that show the intensity of the conflict rather than only whether a conflict occurred. Lastly, for future risk assessment, more sophisticated estimations and hypotheses are needed. As Choularton and Krishnamurthy [21] and Krishnamurthy et al [22] pointed out, collecting more accurate data and estimating weather phenomena would be critical in future risk assessment. In this research, owing to data limitations, we assumed that only GDP and crop yield would change; however, other parameters related to resilience or tolerance against famine, such as infrastructure and trade, would also change in the future and affect famine risks. Global economic models that account for these changes should be applied. Even if a model framework is developed that can reasonably explain the occurrence and the magnitude of past famines, future famines may occur through different mechanisms, or the relationship between vulnerability and socioeconomic factors may change. Therefore, it should be kept in mind that the model framework may not apply directly to future famines.
Although there is room for a more refined assessment, our study shows the comprehensive characteristics of famine and quantifies its general risk. This will be one of the first steps toward long-term analyses of future famine risk in the world.

Figure 1. Average number of four categories in each famine. The average number of causes contributing to each famine was counted according to four categories every 20 years from 1860 onwards. (a) shows the sum of each category, and (b)-(e) show the breakdown of each category. Note that the results from 1840 to 1859 were omitted because there was only one famine in that period.

Figure 2. Transition of estimated probability of famine occurrence and its geographical distribution. In the top-line graph, famine-experienced countries are represented by a red line, while the others are gray, and historical famines are represented as red circles. The size of the red circles indicates the ratio of famine casualties to the whole population. The geographical distribution of the values is shown in the four figures below.

Figure 3. Transition of estimated famine intensity, expressed as the ratio of casualties to total population (%), and its geographical distribution.

Table 1. Result of logistic regression analysis.

Table 2. Results of panel data analysis.