Data-driven approaches demonstrate legacy N accumulation in Upper Mississippi River Basin groundwater

Increases in nitrogen (N) fertilizer application, livestock densities, and human population over the last century have led to substantial increases in nitrate contamination. While increases in riverine N loads are well-documented, the total magnitude of N accumulation in groundwater remains unknown. Here we provide a first data-driven estimate of N mass accumulation in groundwater within the Upper Mississippi River Basin (UMRB), an area of intensive row-crop agriculture and the primary contributor to Gulf of Mexico hypoxia. Using approximately 49 000 groundwater nitrate well concentration values and a suite of geospatial predictors, we developed a Random Forest model to produce gridded predictions of depth-varying nitrate concentrations. Our results suggest that approximately 15 Tg of N (328 ± 167 kg-N ha−1) is currently stored in UMRB groundwater recharged over the last 50 years. For context, we compare these predictions to those from a lumped statistical model, which predicts accumulation of 387 ± 133 kg-N ha−1, as well as to a simple N mass balance model of the UMRB, which puts an upper bound on accumulation of approximately 1000 kg-N ha−1 (1967–2017). These findings highlight the importance of considering legacy N when forecasting future water quality, as N in the subsurface will continue to impair drinking water quality and elevate surface water N concentrations for decades to come.

Nitrogen fertilizer use and intensive livestock production systems have for close to a century been increasing concentrations of nitrate in both groundwater and surface water (Schlesinger 2009, Murphy and Sprague 2019, Van Meter et al 2023. In the Upper Mississippi River Basin (UMRB) (figure 1), the largest source of nitrate to the Gulf of Mexico, excess N not only drives development of a large, low-oxygen hypoxic zone in the Gulf each summer, but also leads to groundwater nitrate accumulation, which elevates baseflow nitrate concentrations within the river network and impairs rural drinking water quality (Tesoriero et al 2013, Jha et al 2015, Wheeler et al 2015.
Data-driven approaches are increasingly being used to improve our understanding of spatially varying patterns of nitrate accumulation in groundwater. A range of methodologies, from multiple linear regression to random forest (RF), boosted regression trees, artificial neural networks, and Bayesian networks have been used to predict groundwater well nitrate concentrations in a variety of locations, both within and outside the U.S (Rodriguez-Galiano et al 2014, Nolan et al 2015, Wheeler et al 2015, Ransom et al 2017, Knoll et al 2019. In 2021, a boosted regression tree approach was used to predict the distribution of groundwater nitrate concentrations across the contiguous U.S . In this study, gridded predictions of groundwater nitrate concentrations were merged with gridded population data, allowing the authors to estimate that the drinking water of nearly 1.5 million people across the U.S. is at risk of having nitrate concentrations above the 10 mg-N l −1 drinking water standard (USEPA 2013.
While this previous work has increased our ability to predict groundwater nitrate concentrations and to better understand drivers of high concentration values, it has done little to quantify actual magnitudes of N accumulation. Although concentrations may be the most important indicator of nitrate contamination when considering human health effects, quantification of total N mass accumulation is also of vital importance when attempting to simulate long-term transport of groundwater N to surface water. In particular, it is increasingly understood that taking into account watershed N legacies is crucial to predicting time lags between implementation of conservation measures and measurable changes in water quality , Ilampooranan et al 2022.
Our goal in the present study was to develop spatially varying estimates of legacy N accumulation within the UMRB that can be used to better our understanding of long-term water quality trajectories for the Gulf of Mexico. To meet this goal, we use multiple data-driven approaches, including a simple lumped analysis of groundwater nitrate concentration data across the UMRB, a boosted RF regression model, and a watershed-scale mass balance to provide spatially varying and depth-stratified estimates of nitrate-N accumulation. To allow for the progression from estimates of concentration to estimates of mass, the modeled concentration predictions are combined with gridded estimates of aquifer porosity, water-table depth, and depth of modern groundwater (Gleeson et al 2015) to develop, for the first time, spatially varying and depth-stratified estimates of groundwater nitrate-N mass across a large agroecosystem.

Study area
The UMRB covers approximately 491 700 km 2 of the Midwestern United States, including sections of Minnesota, Wisconsin, Iowa, Illinois, Indiana, Missouri, and South Dakota (figure 1) (Arnold et al 2000). Although agriculture dominates the land cover of the basin, large areas of forested land can be found in southern Missouri and in northern portions of Wisconsin and Minnesota (figure 1(a)) (Multi-Resolution Land Characteristics Consortium 2016).
The principal shallow aquifer sections of the UMRB include Cambrian-Ordovician, Mississippian, Silurian-Devonian, and glacial (i.e. Pleistocene) sediments ( figure 1(d)). The large portion of the watershed listed as 'Other' in figure 1(b) (tan) represents extensive surficial aquifers of glacial and alluvial deposits of sand and gravel. The next largest Cambrian-Ordovician aquifer system includes sandstone and dolomite (dark brown). Other aquifer systems in the watershed are composed primarily of sandstone and carbonate rocks (Olcott 1992).
Water quality in the UMRB is threatened by a range of contaminants, ranging from excess nutrients (nitrogen, phosphorus) to insecticides, herbicides, and chloride from road salt (Stark et al 2000, Secchi et al 2011, Dugan et al 2020. In many areas of the watershed, groundwater nitrate concentrations in drinking water wells exceed the U.S. EPA drinking water standard of 10 mg-N l −1 . These water quality concerns not only affect the ecosystems and residents who rely on surface/groundwater for drinking and recharge of surface water, but also downstream water bodies. In particular, the hypoxic zone in the Gulf of Mexico has grown due to eutrophication from transported nutrients from the UMRB (Van Meter et al 2016).

Groundwater nitrate concentration data
Groundwater data was obtained for more than 100 000 wells across the UMRB (SI, table S1). Approximately 5% of the final dataset was obtained from the USGS National Water Information System (United States Geological Survey n.d.). These data were then supplemented with state-level data for Illinois, Iowa, Minnesota, Missouri, South Dakota, and Wisconsin (Illinois State Water Survey 10/2019, Missouri Department of Natural Resources 10/2019, Minnesota Pollution Control Agency 10/2019, South Dakota Department of Agriculture & Natural Resources 10/2019, Wisconsin Department of Natural Resources 10/2019). Groundwater well data records without location information, date information, nitrate concentration values, or well depths were removed from the dataset. Concentration data provided as nitrate (NO 3 ) were converted to nitrate-N values. Records with concentration values more than 16 times the standard deviation of the entire dataset were considered to be outliers and were removed from the analysis. Due to very sparse data availability prior to 1970, all samples collected before this date were also removed. After data cleaning, a total of 49 031 groundwater well concentration data points were included in the analysis, representing 44 162 unique locations across the UMRB for the period 1970-2021.

Predictor dataset creation
Numerous datasets were compiled to develop an initial set of 74 predictor variables for the RF modeling framework. Broad categories of predictors included well-specific data (year/month of measurement, well sampling depth) obtained from the same datasets as those used for the nitrate concentration data, climatic variables such as temperature and precipitation (PRISM Climate Group, Oregon State University 2004, U.S. Geological Survey 2019); geomorphological variables such as aquifer rock type, soil type, and other soil properties (Bayles et al 2017, Horton 2017; gridded components of the N mass balance, as derived from the gTREND-Nitrogen dataset (Byrnes et al 2020a(Byrnes et al , 2020b; and surface variables such as elevation, land cover, and slope (Wieczorek 2014, Multi-Resolution Land Characteristics Consortium 2016. With the estimates of N inputs from the TREND dataset, we used year-specific input values to correspond to the year in which nitrate concentrations for individual wells were obtained. For variables with significant spatial heterogeneity at the scale of hundreds of meters, 500 m buffers were created around each well point, and a mean value was calculated. All vector data were aggregated to a grid scale with a resolution of 250 m. For a summary of all 74 predictor datasets as well as relevant data sources and notes on processing, see SI table S2. For development of the RF regression model, the gridded predictor data was extracted for point locations corresponding to each modeled well. The gridded data was then used with the developed to predict depth-varying N concentrations (0-200 m), at a 250 m grid scale across the UMRB. For more details regarding model development are provided in the following section.

RF model development and training
Two RF models were developed using the MATLAB Treebagger function: the first, a standard RF model, and the second a one-step boosted RF (OSBF) (Ghosal and Hooker 2018, The MathWorks, Inc 2021. The RF method is a supervised, ensemble learning algorithm now commonly used for classification and regression problems in hydrology and other fields (Belgiu and Drȃguţ 2016, Tyralis et al 2019, Shen et al 2021. With this method, each decision tree divides the dataset into optimized groups based on a subset of variables randomly determined by the algorithm, and final model predictions are based on the averaging of results from a specified number of decision trees. Because RF models are known to be prone to systematic bias, with both underestimates of large values and overestimates of small values, we used a boosted OSBF approach, which involves creating a second RF model to estimate the residuals from the original RF, and then adding the predicted residuals to the results of the first model run to minimize prediction error (Zhang and Lu 2012, Ghosal and Hooker 2018. For the model runs, the nitrate well concentration data was first randomly split between testing data (10%) and training data (90%), with the training data being used for calibration and the test data being used for model evaluation. Hyper parameters for the RF model were tuned during the calibration process, including (1) the number of decision trees used in the forest; (2) the number of records collected in the end node or 'leaf ' of each decision tree; and (3) the number of predictors to sample at random for each decision split. For each of the hyper parameters, we tested a range of values and selected the highest-performing parameters by comparing the average out-of-bag mean squared error (OOBMSE) (SI figure S1) (Grömping 2009).
While all model predictors were used in the initial training model, the number of predictors was reduced based on considerations of both parsimony and model performance. Of the original 74 predictors, the number of predictors for the predictive model was reduced to 31 based on quantification of the predictor importance and on the model performance obtained with various combinations of predictors. The predictor importance was quantified based on the OOBMSE using the method of permutation-based mean-squared error (MSE) reduction (Grömping 2009). Using this method, the out-of-bag MSE is calculated for each tree as the average of the squared deviations of out-of-bag responses from their respective predictions. If a predictor does not have predictive value for the response, permutation of that predictor will make no appreciable difference in model results. Therefore, for each variable in each tree t, the difference between OOBMSE t and OOBMSE t,permuted is calculated as well as the average change in MSE across all trees. Predictors with the largest change are evaluated to have the largest relative importance. To account for model uncertainty, ten initial training runs of the model were carried out, and the predictor importance values for each predictor were stored across all ten model runs. The preliminary rankings of predictor importance were based on the median variable importance across these ten runs.
To develop the final model, model performance of both the initial RF model and the second OSBF was evaluated based on depth-specific prediction of groundwater nitrate-N concentrations at the monitoring well sites using the following metrics: the Nash Sutcliffe efficiency coefficient (NSE), MSE, the rootmean-square error (RMSE), percent bias (PBIAS), and R-squared values for the relationship between observed and predicted values. Equations associated with each performance metric are provided in supplementary table S3. The final model grew 200 decision trees with a terminal node size of five records, and with ten predictors (∼one-third of the total predictors in the final model) being randomly sampled at each decision split. The final rankings of predictor importance were based on the median scored across all runs. Two-factor analysis of variance (ANOVA) was used to test for significant differences between the predictor importance values of all predictors (Helsel et al 2020). The final rankings of predictor importance were based on the median predictor importance values scored across 200 runs with the 31 selected predictors. Two-factor ANOVA was used to test for significant differences between the predictor importance values of all 31 predictors (Helsel et al 2020). For additional information regarding evaluation of predictor importance and development of the final model, see the supplemental information (SI text S1).

Gridded estimates of concentration and nitrate-N mass
The developed RF model was used to predict depthstratified groundwater nitrate-N concentrations (mg l −1 ) at a 250 m grid scale across the UMRB for the year 2021. Concentrations were predicted at 10 m depth increments to a maximum depth of 200 m, with the model being run 200 times for each specified depth. Concentrations values were converted to depth-stratified estimates of N mass (kg ha −1 ) within each grid cell, as follows: where M represents the area-normalized mass of nitrate-N (kg ha −1 ), c the NO 3 -N concentration (kg m −3 ), d the layer depth (m), n the estimated porosity, and 10 000 the conversion factor to convert from m 2 to hectares (ha). Spatially varying porosity estimates were obtained from Gleeson et al (2015). The mass estimates for each 10 m depth interval were integrated to produce estimates of N mass for three depth ranges: (1) 0-50 m, (2) 50-100 m, and (3) the depth of modern groundwater. For the latter, we consider modern groundwater to be groundwater recharged within the past 50 years (Kazemi et al 2006, Bethke and Johnson 2008, Gleeson et al 2015. To determine the spatially varying depths of modern groundwater across the UMRB, we used the dataset developed by Gleeson et al (2015), which provides d equivalent , defined as the height of young groundwater if extracted and pooled evenly at the land surface (m), at the scale of HydroSHEDS Level 00 basins (Lehner et al 2008). Watershed-scale data was converted to a 250 m grid scale using QGIS software. To convert from d equivalent depths to the depth to which modern groundwater would be present underground, we used the following equation: where D modern is the depth of modern groundwater (m) and D wt is the depth to the water table.
Water table depths were obtained from the National Hydrography Dataset (USGS n.d).

Development of the lumped statistical model
A statistical model was used to provide a second estimate of NO 3 -N accumulation in UMRB groundwater. For this approach, we included all 49 031 groundwater well measurements that were also used to develop the RF models. The concentration data points were separated into groups by depth, based on 20 m depth increments from 0 to 200 m. For each depth range, we calculated the median NO 3 -N concentration and the corresponding interquartile range (IQR). To quantify the predicted NO 3 -N mass at each depth range, we used the same approach used for the gridded mass estimates (equation (1)), except that instead of using spatially varying porosity values, we used the median porosity value for the watershed. (Gleeson et al 2015) As with the RF model results, mass estimates were aggregated across three depth ranges: (1) 0-50 m, (2) 50-100 m, and (3) the depth of modern groundwater. To provide a lumped estimate of the depth of modern groundwater, we took the median value of the spatially varying, 250 m grid-scale depth estimates used in the RF methodology (equation (2)).

Development of the mass balance model
A simple mass balance model was used to estimate the maximum possible NO 3 -N accumulation magnitude for the UMRB. In this model, we calculate the cumulative net N inputs, N cum (kg ha −1 ), for the watershed for the period 1967-2017: where SURP ann (kg ha −1 ) is the annual watershedscale N surplus and RIV ann (kg ha −1 ) is riverine N export. Here, the N surplus is defined as the difference between N inputs to the watershed (fertilizer, livestock manure, biological N fixation, human waste, atmospheric deposition) and N uptake in the form of crop production (Byrnes et al 2020a). All N surplus data was taken from the 250 m grid scale gTREND-Nitrogen dataset and was aggregated to the scale of the UMRB. Area-normalized estimates of annual riverine export are provided here based on calculations of annual NO 3 -N fluxes for the Mississippi River below Grafton, Illinois (Murphy et al 2013). To predict riverine N in years for which no riverine N data are available, we calculated annual export ratios, EXP ann , for the years with data availability, and then used the median export ratio to predict values for the missing years: EXP ann = RIV ann SURP ann . (4)

Results and discussion
The final groundwater nitrate dataset for the UMRB consisted of 49 031 sample measurements from 37 655 unique well locations. Groundwater well densities varied across the UMRB, with the highest number of wells per HUC8 watershed (>750 wells per HUC8) in Wisconsin and Iowa (figure 2(a)) and the lowest (<5 wells per HUC8) in Minnesota. The median well depth for the dataset is 39.6 m (IQR 24.4-70.0) ( figure 2(b)), and the median nitrate-N concentration is 1.8 mg NO 3 -N/L (IQR 0.03-4.10) (figure 2(c)). Across all depth ranges, 5958 wells have concentrations greater than the drinking water standard of 10 mg NO 3 -N/L and more than 300 have concentrations greater than 40 mg NO 3 -N/L. Concentrations were found to decrease with depth ( figure 2(d) The ability of the developed RF models to accurately predict depth-varying groundwater NO 3 -N concentrations for the groundwater well dataset was assessed across the entire dataset and across a range of depth and concentration increments. For the final 31predictor boosted model (OSBF), the model demonstrated RMSE of 2.4 (training data) and 5.1 (test data). The Nash-Sutcliffe efficiency (NSE) values were 0.91 (training) and 0.47 (test). The PBIAS values were 0.47% (training) and 6.1% (test). The final boosted model had an R 2 of 0.93 for all training data and 0.50 for test data (SI table S4, figure S1(a)).
Based on the above results, we conclude that the predictive ability of the model is strong, but also note that performance varies across both depth and concentration ranges (SI figure S1, tables S5 and S6). In particular, the model performance is poorest for wells with the highest nitrate concentrations (>95th percentile, >15.8 mg l −1 ), where there is a strong tendency toward underprediction of concentration magnitudes (SI figure S1(c)). These findings are not unexpected, as ensemble-tree machine learning models like the RF models used here are known to be prone to systematic bias, with an overestimation of small values and an underestimation of large values . We have reduced this bias to some extent in the current methods through use of a second, boosted RF model (OSBF), which provides overall better performance. In all cases, the OSBF boosted model provided superior results to the basic RF model, and therefore results from the OSBF are used in all subsequent analyses.
Because RF models provide built-in measures of predictor importance, they are frequently used to identify drivers of the target variable being measured. Accordingly, we discuss herein the most important predictors of groundwater nitrate-N concentrations in our developed model. However, before discussing in detail these identified predictors, it is important to point out that an RF model is an empirical model based on the correlation between predictors and the target variable rather than on any establishment of causation (Grömping 2009). In addition, the quantified importance of predictors in a particular model is highly context-dependent, and different relative degrees of importance might be found with a different set of input data or when modeling similar data over a different spatial domain (Vowels 2022). These caveats must be kept in mind in any discussion of relative variable importance.
That being said, for our RF model, the most important predictors of groundwater nitrate-N concentrations in the UMRB were found to be permeability, hydraulic conductivity, well depth, soil pH, and depth to the water table (SI figure S1(d)). Higher permeability and hydraulic conductivity values were found to be positively associated with higher groundwater nitrate concentrations (figures 3(a) and (b)). These positive correlations with higher nitrate concentrations are expected, as higher soil permeability and conductivity typically suggest that soils have larger pore spaces and are well-drained. Accordingly, water can infiltrate the soil more easily and move through it quickly (Harter and Walker 2008). While soils with high permeability tend to be beneficial for agriculture due to reduced waterlogging and a facilitation of root growth, they also play a significant role in the leaching of solutes to groundwater  due to a faster transport of nitrate-laden water in areas of higher permeability and conductivity (figures 3(a) and (b)). Groundwater well depth is negatively correlated with concentration ( figure 3(c)), particularly in the range from 0 to 50 m, similar to the relationship seen for the raw well data. These results are typical of other studies of groundwater nitrate, which have found depth to be a strong predictor of concentration (Wheeler et al 2015). Soil pH is positively correlated with concentration, with an especially strong jump in concentration between pH values from 5.5 to 6 ( figure 3(d)). This relationship can largely be explained by forested areas of the UMRB having more acidic soils and therefore low levels of agricultural N inputs (SI figure S2(i)). There is also a positive correlation between depth to the water table and predicted groundwater nitrate concentrations (figure S3(e)), likely due to the higher likelihood of higher denitrification rates and thus reduced transport of nitrate to shallow groundwater where water tables are highest (Kalita and Kanwar 1993).
The model also predictably shows positive correlations between agricultural N inputs and groundwater nitrate concentrations. Notably, the final RF model shows strong relationships between the predicted nitrate concentrations, agricultural fertilizer inputs, and the overall N surplus. For fertilizer, the steepest increases in concentration occur at fertilizer input magnitudes above 60 kg-N ha −1 yr −1 (figure 3(f)). We also see a threshold effect for the overall N surplus magnitudes, with the sharp increases in groundwater nitrate concentrations being seen in areas with an N surplus above 60 kg-N ha −1 yr −1 (figure 3(h)). In the UMRB, areas with fertilizer application rates less than 60 kg ha −1 yr −1 generally represent portions of the landscapes where land use is mixed, with fertilized crops such as corn occurring adjacent to forested or residential areas (Byrnes et al 2020a). In contrast, higher fertilizer application rates generally correspond to areas dominated by row-crop agriculture, with either corn production alone or corn mixed with soybeans. Accordingly, these results indicate a relationship between higher groundwater nitrate concentrations in areas with more intensive row-crop agriculture. We also see strong positive correlations between increased livestock manure magnitudes and groundwater nitrate concentrations, with modeled concentrations being particularly sensitive to magnitudes of both poultry and hog manure (figure 3(g), SI figure S2(d)).

Predictions of gridded, spatially varying UMRB nitrate-N concentrations
The results from the developed RF model show substantial spatial and depth-related variability in nitrate-N concentrations across the UMRB. In figure 4, we see concentrations decreasing with depth, ranging from a median concentration of 7.6 (IQR 6.1-11.7) mg NO 3 -N/L in the shallowest groundwater (0-10 m) down to 3.4 (IQR 2.6-4.7) mg NO 3 -N/L at depths greater than 50 m. In the shallowest layer, we see the highest concentrations in Iowa, southern Minnesota, and in northern Indiana, at the easternmost reach of the watershed. All of these areas have high levels of agricultural land use and, in the case of Iowa and southern Minnesota, high livestock densities, especially hog production (SI figures S2(a)-(c)). Below 20 m, the locations of the predicted nitrate hot spots change, specifically to the area of Eastern Iowa's Silurian-Devonian aquifer system, which is composed chiefly of porous dolomites and limestone (Prior et al 2003). In these karst aquifers, it is common to see high levels of nitrate enrichment throughout the aquifer and high contributions of groundwater nitrate to river baseflow (Kalvāns et al 2021, Musgrove et al 2014.

Predictions of groundwater nitrate-N accumulation magnitudes
Results from our watershed-scale analysis of observed groundwater nitrate-N concentrations suggest that accumulation magnitudes, like nitrate concentrations themselves, decrease with depth, with the highest accumulation magnitudes, ∼130 kg NO 3 -N ha −1 at the shallowest depths (0-20 m) ( figure 5(a)). To place those findings in context, keep in mind that the mean application rate for fertilizer N across the UMRB is approximately 39 kg-N ha −1 and that the mean annual N surplus magnitude for the watershed is approximately 49 kg-N ha (IQR 14-72) ( figure 5(b)). Accordingly, our statistical analysis leads us to conclude that nitrate-N accumulation in the shallowest 20 m is equivalent in magnitude to approximately 2-3 years of the total watershed N surplus.  Further, our results from the RF modeling demonstrate a clear spatial variability in accumulation magnitudes, with accumulation being a function of not only concentration but also aquifer porosity (Gleeson et al 2015). In figure 6, we show spatially varying N accumulation magnitudes across three different depth ranges: (6a) shallow groundwater (0-50 m); (6b) deep groundwater; and (6c) modern groundwater, which we define, based on the work of Gleeson et al (2015), as groundwater recharged within the last 50 years. Note that the depth of modern groundwater (median 75 m, IQR 34-102) varies across the watershed (figure 6(d), SI figure S2(g)), based on variations in porosity and groundwater recharge rates (Gleeson et al 2015). For modern groundwater , the predicted median N accumulation magnitude is 314 kg-N ha −1 (IQR 217-416 kg-N ha −1 ), with accumulation magnitudes reaching higher than 600 kg-N ha −1 in some areas of the UMRB where N inputs to the watershed are high and where aquifer porosities are also high (figure 6(e)). Figure 6. Total predicted N mass (kg ha −1 ) in groundwater across the UMRB. Panels (a) and (b) show the range of N accumulation magnitudes across shallow (water table-50 m) and deep (50-100 m) depth ranges, and panel (c) shows legacy N accumulation in 'modern groundwater' across the UMRB. Here, modern groundwater is defined as groundwater recharged within the past 50 years (Kazemi et al 2006, Bethke and Johnson 2008, Gleeson et al 2015. In (d), we show the distribution of 'modern groundwater' depths across the UMRB, and in (e) we show the estimated distribution of legacy N mass in UMRB modern groundwater.
Based on the RF results, we estimate total mean accumulation of NO 3 -N in modern UMRB groundwater to be 328 ± 167 kg-N ha −1 , a value approximately ∼16% less than our prediction based on the simple statistical analysis of well data alone (∼389 kg-N ha −1 ) ( figure 5(b)). The estimates are similar, of course, due to both models being based on the same underlying dataset. Note, however, that the RF model allows us to take into account spatial heterogeneity in various landscape and management drivers of groundwater N accumulation, and thus to make predictions for areas with limited monitoring data. Accordingly, the RF approach may help to eliminate any bias in the available groundwater data that might be caused by more frequent and/or spatially targeted monitoring of nitrate-N concentrations in areas at higher risk of high concentration magnitudes. In other words, the lower N accumulation estimates from the RF model may be justified.
These two estimates of N accumulation in UMRB groundwater also lie well within the upper bounds of potential N accumulation that we have estimated based on calculation of N surplus magnitudes and nitrate-N outflows from the watershed at the watershed outlet. As shown in figure 5(c), annual watershed N surplus magnitudes have increased by approximately 60% over the 50 year study period, from 25 kg-N ha yr −1 in 1967 to 39 kg-N ha yr −1 in 2017. Over this period, riverine N outflows from the watershed have also increased, from approximately 5.6 kg NO 3 -N ha yr −1 in 1967 to 9.0 kg NO 3 -N ha yr −1 in 2017. Based on these values, we estimate that, over the 50 year period from 1967 to 2017, approximately 22% of surplus N has been exported via riverine flows, an estimate that is in line with other estimates for US rivers (Boyer et al 2002, Chang et al 2021. Accordingly, we estimate cumulative net N inputs over this time period (N surplus-riverine N export) to be approximately 1400 kg-N ha −1 ( figure 5(d)), a magnitude that represents the upper bound on possible N accumulation. Our estimates of modern groundwater N accumulation over the period 1967-2017, then, constitute on the order of 21% (RF Results) to 28% (lumped statistical model) of net N inputs, suggesting that the other approximately 72%-79% of net N inputs have undergone denitrification, are being stored in soils and sediments, or have been taken up by watershed vegetation (Van Meter et al 2017, Chang et al 2021, Liu et al 2021, Batool et al 2022. These rough estimates of denitrification and storage magnitudes are in line with those of a study of legacy N accumulation in agricultural soils of the Mississippi River Basin, in which it was shown that approximately 69% of surplus N remaining after riverine export is stored in soils and that the other 31% is either denitrified or stored in groundwater or other landscape compartments (Van Meter et al 2016).
The current work is, to our knowledge, the first attempt at using data-driven approaches, with actual groundwater well concentration data, to provide spatially varying predictions of the mass of nitrate-N in the groundwater of a large agroecosystem. The broad value of such predictions lies in the importance of groundwater N to both current and future riverine N loads. In the last decade, it has increasingly been shown that landscape N accumulation within soils, sediments, and groundwater of intensively farmed landscapes can lead to multi-decadal time lags in achieving water quality goals , Ilampooranan et al 2019, Chang et al 2021, Liu et al 2021. With this understanding, there has been a call for new modeling approaches that can better capture the time lags associated with subsurface legacy N accumulation (Vero et al 2018, Lutz et al 2022, Golden et al 2023. However, without robust data-driven estimates of groundwater N accumulation, we are unable to reliably calibrate watershed legacy N models and to refine our predictions of watershed time lags. Our spatially varying estimates of groundwater N accumulation in the UMRB, the largest contributor to Gulf of Mexico hypoxia, not only provide us with a better understanding of hot spots for N accumulation and threats to rural drinking water quality in the upper Midwestern US, but also provide a framework for improving our modeled predictions of riverine nitrate trajectories.

Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: https:// 10.6084/m9.figshare.22578796.