High resolution prediction and explanation of groundwater depletion across India

Food production in much of the world relies on groundwater resources. In many regions, groundwater levels are declining due to a combination of anthropogenic extraction, localized meteorological and geological characteristics, and climate change. Groundwater in India is characteristic of this global trend, with an agricultural sector that is highly dependent on groundwater and increasingly threatened by extraction far in excess of recharge. The complexity of inputs makes groundwater depletion highly heterogeneous across space and time. However, modeling this heterogeneity has thus far proven difficult. Using two ensemble tree-based regression models, we predict district level seasonal groundwater dynamics to an accuracy of R 2 = 0.4–0.6 and Pearson correlations between 0.6 and 0.8. Further using two high-resolution feature importance methods, we demonstrate that atmospheric humidity, groundwater groundwater-based irrigation, and crop cultivation are the most important predictors of seasonal groundwater dynamics at the district level in India. We further demonstrate a shift in the predictors of groundwater depletion over 1998–2014 that is robustly found between the two feature importance methods, namely increasing importance of deep-well irrigation in Central and Eastern India. These areas coincide with districts where groundwater depletion is most severe. Further analysis shows decreases in crop yields per unit of irrigation over those regions, suggesting decreasing marginal returns for largely increasing quantities of groundwater irrigation used. This analysis demonstrates the public policy value of machine learning models for providing high spatiotemporal accuracy in predicting groundwater depletion, while also highlighting how anthropogenic activity impacts groundwater in India, with consequent implications for productivity and well-being.


Introduction
Pressures from climate change, population growth, and demographic change are placing increasing stress on the world's water resources [1][2][3].Groundwater reserves are crucial to global agriculture but are being depleted at rates far in excess of recharge [4][5][6].This depletion in groundwater poses a direct risk to the livelihood and food security of several hundred millions of people [7][8][9].Few nations face more acute challenges than India, where irrigationdependent agriculture depends on groundwater resources that have been steeply declining [10,11,43].Understanding the drivers of depletion is clearly necessary, but inherently difficult: decline is the result of complex interactions between natural and human variables and policy inputs, and exhibits temporal and spatial heterogeneity in the timing and incidence of groundwater depletion [12][13][14][15][16].
Understanding and predicting this heterogeneity is of crucial scientific and policy importance.Modeling variation in groundwater decline can shed light on interdependencies and non-linearities between human-controlled policy inputs and natural systems at the local, national, and global scale.Additionally, more granular models of groundwater depletion rates can inform difficult policy decisions that require weighting competing demands for equity across space, time, and groups.Despite the importance of understanding groundwater decline, modeling change, especially at decision-relevant scales, is stymied by difficulties in obtaining high-resolution measures of groundwater change over time and by challenges inherent to modeling complicated interdependencies between human and natural systems.Meanwhile, a number of previous works have demonstrated the ability of machine learning models to predict groundwater recharge [17][18][19][20].Yet these studies do not focus on India and do not incorporate survey data of agricultural outputs and instead rely on simulations or well-depth time series alone.Likewise, these studies focus on smaller geographical and hydrological areas-such as one or two aquifers-whereas our study draws on a far greater number of well observations, over an entire country, with greatly varied hydrological and sociological settings.
We thus build on these studies, drawing from important research work at the nexus of machine learning and hydrology [19,[21][22][23] and water scarcity [24,25].This research contributes to scientific and policy debates with analysis of highly granular data on groundwater decline in India over the period from 1998-2014.We overcome previous difficulties with new data and approaches (see Methods and supplementary information for details).Our analysis shows that the rates and severity of groundwater depletion in India are more spatially and temporally heterogeneous than previously understood.We also show that the relative importance of anthropogenic drivers is becoming more important over time, and possibly greater than climatic factors in the same time period , in some areas, contra findings from previous analysis using linear models [26].
These findings point to the value of new approaches, combining massive data sets and machine-learning to understand the highly variable and localized manifestations of change in complex, linked human-natural systems.The research also points to the importance of locally-adapted, geographically and meteorologically attuned public policy in managing groundwater and other forms of natural resource depletion in the 21st century.

Data description
Our variable of interest (target variable in the regressions) is groundwater level, collected from over 20 000 monitoring wells from 1998-2014 across India (figure S1).Groundwater level is measured through observation wells which come from the Indian Central Water Commission, Ministry of Jal Shakti, River Development and Ganga Rejuvenation.This data includes over 2.8 million observations of groundwater levels from monitoring wells over the entire time period.We aggregate these wells to the district level by taking the median groundwater level of all wells in a district and season to avoid the undue weight of occasional large outliers.
Our predictors of groundwater level include a range of variables about irrigation, crop production, weather, climate [41,42], and geology of groundwater aquifers.These predictors are listed in table S1.They are chosen because of relevance to groundwater dynamics and practical data availability.Note that for each predictor that vary over time, we further test lags of one through four seasons on each predictor.The predictors are all processed to season-by-district levels.Supplementary information section S1 give additional explanations on the data sources and preprocessing of the predictors.

Modeling architecture
To model the relationship between the predictors and groundwater level changes over time, we mainly use two regression methods, XGBoost [27] and Random Forest [28].Supplementary information section S3 provide details on down-selection of the predictors, model hyperparameter optimization, and performance validation using a sequence of six validation periods (2000-2002, 2003-2005, 2005-2007, 2007-2009, 2009-2011, 2012-2014).Both XGBoost and Random Forests are tree-based ensemble methods-that is, they are made up of many individual regression trees that are summed up (for XGBoost) or averaged together (for Random Forest) to improve the goodness-of-fit and robustness of the model.XGBoost differs from the Random Forest most significantly in that XGBoost trains each successive round of trees on the residuals of fit of all the previous trees, while Random Forest trains the trees in parallel on random subsamples of the training data.XGBoost also introduces a number of new tuning parameters intended to prevent overfitting, and have shown great success in delivering state-of-theart results when properly tuned [27].In our findings, the XGBoost model performs slightly better than the Random Forest model on three standard metrics (root mean squared error, Pearson correlation, and mean bias) (table S3).
In order to understand impacts from high correlation between predictors, we perform XGBoost and Random Forest regressions using a full set and a reduced, less inter-correlated set of predictors (supplementary information S3).
In order to understand prediction uncertainty of the regression models, we consider two probabilistic modeling methods, XGBoostLSS and Quantile Regression Forests [29,30].We consider XGBoostLSS because it is the first extension of XGBoost that we are aware of that models the entire distribution, as opposed to just the mean, of the target variable [30].We consider Quantile Regression Forests because it is an extension of Random Forest that can model the entire distribution of the target variable and has been used to quantify uncertainty in groundwater data [31].Each method uses the same tree-based ensemble principles as its corresponding mean-only regression method but a different loss function to accommodate the uncertainty prediction [29,30].In testing, the Quantile Regression Forests turns out to exhibit much more reasonable performance than XGBoostLSS (see supplementary information section S3).Therefore, we choose to only examine prediction uncertainty using the 90% prediction intervals estimated by Quantile Regression Forests.

Predictors' importance
The complex structure of machine learning models make it often difficult to interpret the predictions they give in human understandable terms.Explanability methods improve the transparency, robustness, and trustworthiness of machine learning models, aiding model validation, error analysis, and generation of new scientific insights [32].Permutation importance [33] and Shapley additive explanations (SHAP) values [34] belong to a branch of explaniability methods that help us interpret which variables are the most important to, or used by, the machine learning model.In this study, we are interested in how the importance of the predictors of groundwater level changes compare to each other, and how they vary in time and space.
To calculate a predictor's permutation importance, we randomly permute the predictor's values to unpair them from the values of the other predictors.We measure how much the model's predictive performance, measured by R 2 , is perturbed by the change, and then average this result over multiple random permutations.Permutation importance has been shown to be less biased than other popular explanation methods used for trees [33].It has benefits over training models without the variable or filling in the value of the variable with an altogether different value over each observation, such as 0 or a large negative number [35].However, use of permutation importance also merits some important caveats, which we detail in supplementary information section S5.In this study, we calculate permutation importance with the better-performing XGBoost model.We examine the temporal evolution of permutation importance across validation periods in two ways.The first is over the whole study region for each individual predictor.The second is at district level for categories of predictors (Irrigation, Crop, Weather, Geology; [table S1]) included in the final fitted model.For the latter, we calculate the changes in R 2 over the data points in individual districts for each predictor, and then average over predictors in the same categories to decrease noise.
The computation of SHAP values is based on the concept of Shapley values in cooperative game theory [34].The computation of SHAP values requires subsampling the original set of predictors-ranging from having none of them to all of them-to determine the average marginal contribution of each predictor across all possible subsamples [21,34,36].The benefits of SHAP values include that the SHAP value of a predictor does not vary with the level at which it is placed in a decision tree, making it suitable for application on tree-based ensemble methods, and that SHAP values are additive [34].That is, each predictor has a SHAP value at each data point, and those SHAP values can be added up across the data points to derive global importance of the predictor [21,34,36].Also, at each data point, the sum of the SHAP values over all the predictors is equal to the predicted value at this data point [21,34,36].For the purpose of this study, the additive feature of SHAP values creates the drawback that the SHAP values cannot be directly compared across the six validation periods.For example, if the magnitudes of the predictand increases over time because of groundwater depletion, the SHAP values of all the predictors will tend to increase, but this does not mean they are all becoming more important.We therefore further normalize the SHAP values into SHAP-based importance as follows.For the temporal evolution of SHAPbased importance over the whole study region, we first convert the SHAP values of each predictor to absolute values, then average them over all the data points in the validation period, and finally divide the result by the average absolute magnitude of predicted groundwater levels in this validation period.For the temporal evolution of SHAP-based importance in individual districts, we calculate them for categories of predictors to be consistent with permutation importance.For each predictor, we first calculate its district-level importance by summing its SHAP values over all data points in the district in the validation period.We then sum up the results within categories of predictors and take the absolute value.This is the un-normalized district-level importance for each category of predictors.We finally normalize this importance district-wise via dividing it with the magnitude of average predicted groundwater level in the district.SHAP values are, however, much more computationally expensive to calculate than permutation importance.As a result, we only implement the SHAP values with the Random Forest model, which is more amenable to parallelization than XGBoost.We present the results in the supplementary information section S4 to complement the permutation importance results from XGBoost.

Results
Below, we summarize our findings.We show the predictive modeling performance, discuss permutation importance and SHAP value-based importance, and present evidence on the increasing importance of water withdrawal factors over time.

Predictive accuracy and heterogeneity
Early research done in 2009 with the GRACE satellite [10] brought to the forefront of public discourse the fact groundwater in India appeared to be declining at an alarming rates.This work focused over a six-year window and had native spatial resolutions in the tens of thousands of square kilometers, providing opportunities to track changes in aquifers at scales that had previously been impossible.In our approach, we use observation wells from over 20 000 monitoring wells across almost 20 years.We find groundwater depletion to be spatially varied at the district scale and that water levels are substantially declining in many areas.
We find that groundwater depletion was present in the majority of districts (58%), with wells generally decreasing the fastest in Northern India.We also find substantial decreases in districts in regions of Southern, as well as Eastern India where groundwater depletion is less frequently examined.The geographic patterns of depletion highlight the substantial heterogeneity in groundwater depletion's drivers and trajectories (figure S2).
Between the two regression modeling approaches for groundwater level changes, the performance of the XGBoost model is superior, with Pearson correlations ranging between 0.65 and 0.80, root mean squared errors between 3.39 and 5.92, and R 2 between 0.38 and 0.61 over the six validation periods that span the period 2000-2014 (table S3).For the Random Forest model, the Pearson correlations are between 0.59 and 0.80, root mean squared errors between 3.62 and 5.65, and R 2 between 0.35 and 0.59 (table S3).Therefore, we focus on the results of the XGBoost model in the main text, and use the Random Forest model to derive supplemental figures.
Figure 1 shows the predicted versus actual seasonal time series of groundwater levels at the district level for the XGBoost model, for eight randomly selected districts in states across India and a single validation period 2012-2014.One can see substantial heterogeneity in predictive accuracy across districts.We find that predictive accuracy is generally worse in the districts with relatively greater groundwater depletion over the course of our study (e.g.districts in Punjab and Karnataka in figure 1; compare to S2).We further support this conjecture using the prediction intervals from Quantile Regression Forests (figure S4).The sizes of the prediction intervals are negatively correlated with groundwater levels, meaning the predicted values are more uncertain when groundwater levels are deeper.The districts with greater groundwater depletion may have dynamics that are more affected by human water withdrawal than natural physical processes like precipitation, infiltration, and groundwater movement, which we explore further using the importance values in the next section.Because human activities depend on many policy and economic conditions that are locally heterogeneous and hard to measure, the resulting impacts on groundwater levels may be less predictable than natural physical drivers.This conjecture may be supported by the relatively strong positive correlations between deep and dug well irrigation and prediction uncertainty (figure S4).The prediction uncertainty does not show clear trend over time (figure S4).

Predictors' importance changes over time
When measuring the changes in R 2 accuracy using permutation importance with the XGBoost model (see methods 2.3) across our entire dataset, irrigation, weather, and crop variables are the most important categories of predictors (table S1).The top four most important predictors are: deep irrigation, humidity, shallow irrigation, and total crop production, all at one-season lag (i.e. one season before) to the target predictor, i.e. groundwater levels.If replaced with a column of randomly permuted values, withholding these predictors lowers the average R 2 of the six validation periods by 0.2, 0.13, 0.08, 0.08 respectively.Looking across time, we find large changes in the importances as seen in figure 2. We observe increases in the importance of shallow and deep irrigation over time, with increases of 2× and 2.5× respectively from the first validation period to the last.Conversely, the total crop production and cultivated area show large absolute and relative decreases in importance over time, with their final scores only roughly a third of the first.
Compared to the permutation importance, the irrigation, weather, and crop variables are still the most important according to the SHAP importance with the Random Forest model (figure S5).Furthermore, the partial dependence relationships between those most important predictors and groundwater level changes are consistent between the XGBoost and Random Forest models, lending confidence to the robustness of the captured relationships (supplementary information section S6).The top four most important predictors in the Random Forest model are: humidity at four-and one-season lags, deep irrigation at one-season lag, and precipitation at  At the district level, the permutation importances of irrigation still become more important over time (figure 3).The figure summarizes across the predictor categories by only displaying the most important category in each district in each validation time period.Irrigation variables (deep and shallow) become the most important in the majority of districts as time goes on.The number of districts where weather is the most important variable shrinks from the first period to the last two periods, and the role of irrigation seems to be quickly increasing across the country.One can also observe a spatial pattern that irrigation becomes the most important in Central and Eastern India.This coincides with where model predictive performance is relatively low (section 3.1) and groundwater depletion is severe (figure S2), supporting our previous conjecture that model predictive performance is lower where groundwater dynamics is more driven by human activities.
Consistent with the SHAP importances over the whole study region (figure S5), the districtlevel SHAP importances are most important for the weather variables.If we present a map like figure 3, weather will show up as the most important predictor in most of the districts in all validation periods.We therefore present the time evolution of the SHAP importance of each predictor category separately using trends (figure S6).We also display deep irrigation as a separate category because its dynamics diverge from the aggregate dynamics of deep and shallow irrigation.One can see that weather, crop, and geology variables become more important in Western India.Irrigation as a whole shows fewer instances of increasing importance than of decreasing importance, and the increasing trends are concentrated in the northern half of the country.In contrast, deep irrigation becomes more important in most of Central and Eastern India.Therefore, despite some differences in the other variables, both permutation and SHAP importances suggest increasing influences of deep irrigation on groundwater dynamics in those parts of the country where groundwater depletion is severe (figure S2).This is an important explanatory consistency across two methods, SHAP and permutation importance, with distinct assumptions and approaches.

Discussion
Using 17 years of monitored groundwater levels at over 20 000 locations across India, we verified that the often-reported groundwater decline based on satellites information is seen on the ground (figure S2) [37,38].We find that around 40%-60% of the variability in seasonal, district-level groundwater dynamics in India can be explained by weather, crops, irrigation, and geology factors (table S3).Among those factors, humidity, irrigation, and crop cultivation are consistently ranked as the most important predictors by two feature importance methods.Those most important factors suggest evapotranspiration and agricultural water withdrawal-driven groundwater dynamics.
Using two different feature importance methods, we demonstrate that deep irrigation is becoming a more important predictor of groundwater levels especially in Central and Eastern India, where groundwater depletion is the most severe (figures S2, 3 and S6).Importantly, deep irrigation wells are much more water intensive structures than shallow irrigation wells, withdrawing ∼15× the water per unit time (supplementary information section S1).Therefore, their increasing importance suggests an increasing reliance on groundwater.The literature has shown that groundwater-fed irrigation is a critical component of water stress in India, and our model supports that conclusion.The relative geographical distribution of these relationships, however, was previously unknown [26].The fact that shallow irrigation did not replicate the increasing importance of deep irrigation may be a function of the differential scale of water withdrawal across these two well types (see supplementary information for further discussion), or may be because increasing well depth is needed to access groundwater in many parts of the country.
We also see the trend of increasing importance of irrigation or deep irrigation (figures 3 and S6) present even in many parts of the country where median well depths are level or slightly increasing (figure S2).One potential explanation is that investment in groundwater extraction and irrigation is being expanded in these areas as well.While groundwater depletion is still not largely an issue in many of these regions, with recharge frequently meeting or exceeding levels of extraction, it is still possible that groundwater extraction is intensifying in these areas.To explore whether such intensification is accompanied by a resulting gain in agricultural productivity, we estimate the relationship between crop production and deep irrigation in figure 4.This figure shows that the majority of districts have declining efficiency of irrigation, suggesting that increasing intensity of deep irrigation is not generally accompanied by increased agricultural productivity.This is especially the case in many of India's eastern states, where we observe a coinciding increase in the importance of deep irrigation in predicting well-level water depth.While we note the scientific and policy importance of explaining this trend of decreasing irrigation efficiency in agriculture, we note that the data collection and methods needed to rigorously document and explain it are outside of the scope of our manuscript, and would require separate design, data collection, and execution.It is likely that a combination of environmental changes, as well as expanding cultivation of rice, a highly water-intensive crop, in regions of the country that are heavily reliant on groundwater [39], are leading to this decrease in yield per unit of irrigation.Additionally, slowly increasing rural electrification has increased the feasibility of tubewells, increasing farmers' access to often under or unpriced groundwater.
We do note that the permutation and SHAP importance methods have some differences in their findings.The SHAP importance method suggests decreasing importance of shallow irrigation over time, whereas the permutation importance suggests increasing importance.SHAP importance ranks humidity as the most important driver and deep irrigation as the second most important, whereas permutation importance ranks deep irrigation as being far more important than the other predictors.Those differences may be because SHAP and permutation importance methods handle variable interrelations and interdependencies in different ways [27,33].Permutation importance has the drawback of potentially creating unrealistic data points, for which the validity of the regression model would be degraded because of the well-known out-of-sample problem (see supplementary information section S5).In general, due to the considerably different calculation procedures of SHAP and permutation importance, their results should not be expected to match exactly, and the agreement between the two methods on the increasing importance of deep irrigation lends robustness to that finding.Feature importance methods are still an area under active development and we expect more rigorous and computationally feasible methods to become accessible in the future [40].
Our results are subject to several sources of uncertainty.One source is inherent limitations in the available data.For example, the irrigation well surveys only report the hours of operation but not the actual water withdrawals (supplementary information S1).Human errors may exist in the surveys but the survey data does not contain uncertainty intervals.The geologic maps of rock types undoubtedly contain considerable uncertainty due to the general difficulty in obtaining below-ground data.The uncertainty in groundwater level measurements should generally be small due to the large number of monitoring wells across India and the considerable administrative efforts put into their collection, but may be larger in some parts of India where the monitoring wells are sparser (figure S1).Detailed quantification of the impacts of such sources of uncertainty are beyond the scope of this study and we recommend future studies to better treat them.Another source of uncertainty is in predictors' selection.Some of the selected predictors are highly correlated, which may decrease the robustness of the results.To understand this source of uncertainty, we perform XGBoost and Random Forest regressions using a reduced set of predictors that remove some of the highest correlations (supplementary information S3).The validation performances using the reduced set of predictors are lower compared to the full set of predictors (table S4).The permutation importance of XGBoost shows increasing importance of deep irrigation over time, albeit with greater noise than in figure 2. The SHAP-based importance of Random Forest show increasing importance of deep irrigation over time with less noise than in figure S5.Those results demonstrate the robustness of that finding to predictor changes.The final source of uncertainty is in model fitting, which we address using Quantile Regression Forests.The results show that the uncertainty intervals in the predicted groundwater levels are mainly around 5 to 10 meters and grow larger with deeper groundwater levels S4.
Groundwater recharge is a function of precipitation and transpiration, modulated by topography and geologic conditions, net of extraction.As climate change accelerates this century, in some cases adversely impacting the meteorological drivers of groundwater recharge, there will be an increasing need to predict and highlight areas of particular concern, in India and elsewhere.State and federal policy makers may wish to forecast poor or oversupplying seasons to better apportion resources, seasons or years ahead.The framework presented in this paper can help inform such water management decisions through its predictive framework, and help identify local drivers to suggest (locally and temporally tailored) maximally-impactful policy levers.Especially with improved data quality, the model we present helps identify areas most at risk in the immediate future and gives an indication about what might be done to reverse the trend.

Conclusion
Groundwater depletion is a significant problem in many parts of the world, and has been a major concern in India for much of the last two decades.Accelerated depletion is expected in many areas due to climate change and excessive water withdrawals.Other parts of the country, however, have shown net recharge of their fairly shallow water tables.
Groundwater in India is highly diverse, thus necessitating relevant policy decisions to be made at local levels.In this paper, we presented predictive frameworks-created and tailored to a high degree of accuracy at the district level, using the XGBoost and Random Forest models.We demonstrate the models' successes in predicting up to three years in advance across a sequence of validation periods, with out of sample R 2 of 0.4-0.6 for XGBoost and 0.35-0.6 for Random Forest; we also document heterogeneity in these models' performances.
Beyond the predictive performance of our models, we demonstrate, through permutation and SHAP feature importance methods, that human-driven groundwater depletion appears to be accelerating in importance in much of the country.Irrigation using deep tube wells, especially, became consistently more important over 1998-2014 in the Central and Eastern parts of India according to both importance methods.Lastly, we provide suggestive evidence that countrywide trends are accounted for by an intensification of irrigation-fed agriculture in the past couple of decades, as aquifers have become severely overdrawn in many areas.This work thus shows the complexity and nuance of water depletion across India, and demonstrates the usage of modern machine learning methods in helping predict losses and understand its drivers.

Figure 1 .
Figure 1.Actual (blue) and XGBoost predicted (orange) seasonal time series of groundwater depth (unit: m).The subplots show eight randomly selected districts over India-subplot titles are formatted as 'state name, district name' .This figure shows the results from a model trained from 1998 to 2011 with predictions made out of sample from spring 2012 to 2014; x-axis shows seasons starting with season 1 in Spring 2012.Districts show a wide degree of heterogeneity in the amount of cyclical behavior demonstrated due to the relative impacts of the monsoon and irrigation pressures on groundwater.

Figure 2 .
Figure 2. Time-varying permutation importances across all districts.Training was performed on all years prior to the first date of the testing window.For example, the 2005-2007 testing window are the results of the model on data from 1998 to 2004.Only the eight most important predictors are shown.

Figure 3 .
Figure 3.Most important features over time in each district.Calculated with permutation importance.Categories include multiple variables and were aggregated to the district level.The results demonstrate an increase in the importance of irrigation across the country, and especially in Central and Eastern India.

Figure 4 .
Figure 4. Linear trend of the ratio of crop production to deep irrigation over the years in each district (unit: ton per well×hour per year).Negative numbers indicate that more deep irrigation is being used per ton produced over time.