Spatial aggregation effects on the performance of machine learning metamodels for predicting transit time to baseflow

Water transit time is the duration between the entry and exit of a parcel of water across a hydrologic system. It is a fundamental characteristic that links hydrologic transport, biogeochemical processing, and water quality, and it has broad implications for resource vulnerability and sustainability. Physically based models can accurately describe transit time distributions but require significant computational resources when applied to large regions at high resolutions. In this study, we evaluate the potential of machine learning metamodels to emulate physically based models for computationally efficient prediction of key metrics from transit time distributions. Transit times are computed from a continental scale, integrated hydrologic model coupled with particle tracking. The metamodeling approach is illustrated in the 280,000-sq km Upper Colorado River Basin, USA, a principal headwater basin that is under multiple stresses, including resource overallocation, water quality threats, and climate change impacts. We evaluate the effects of using different types of spatial aggregation in the metamodels, including regular grids, hydrologic units, and upstream watersheds. We found that metamodels using upstream watershed aggregation exhibited the best overall performance across our target predictions. Errors were more pronounced in metamodels that employed smaller spatial aggregation units compared to larger units, suggesting that additional predictors that capture the heterogeneity of topographic, climatic, and geologic properties are needed at these scales. We also found that predictor importance and input-output relations were remarkably consistent across spatial aggregation type and agree with previous findings documented from physically based models and tracer-based studies. Our results show the feasibility of developing machine learning metamodels for predicting transit times and demonstrate the necessity of multiscale analyses to probe the robustness of the findings.


Introduction
Transit time, or the amount of time between entry and exit of a parcel of water through a system, is a key concept that integrates hydrological transport and biogeochemical processes to determine water quality across surface water and groundwater domains at multiple scales (Hrachowitz et al 2016, Li et al 2021, Benettin et al 2022).It has also been used to describe the intrinsic vulnerability or susceptibility of water bodies to pollution from various anthropogenic activities and to aid in understanding resource renewability as well as climate change impacts (Molson and Frind 2012, Jasechko et al 2017, Clow et al 2018, Gleeson et al 2020, Hosen et al 2021, Jurgens et al 2022).Transit time distributions, which reflect the heterogeneity of flow paths and overall system complexity, can be inferred from environmental tracers using lumped parameter models and storage selection functions that describe the system through different mixing or sampling scenarios, such as piston flow or complete mixing (McGuire andMcDonnell 2006, Sprenger et al 2019).These approaches require spatially and temporally resolved observations of tracer concentrations in both the inflows and outflows to the system of interest.Such observations can be obtained in highly instrumented watersheds but are typically not available at large regional scales (Stockinger et al 2016, Jasechko 2019).Moreover, multiple tracers are required to generate full transit time distributions that accurately reflect contributions from old groundwater (Stewart et al 2010, Frisbee et al 2013, Thiros et al 2023).
Alternatively, transit time distributions can be estimated from physically based hydrological models, which describe states and fluxes in the system through a set of partial differential equations, boundary conditions and forcings, and spatially heterogeneous parameters (Engdahl and Maxwell 2015, Danesh-Yazdi et al 2018, Remondi et al 2018, Maxwell et al 2019, Jing et al 2021, Zeyrek et al 2023).Results from these models can be calibrated to available observations of groundwater level or streamflow, thus alleviating the need to obtain high frequency tracer data to gain insights on transit times.Continental scale transit time assessment using high resolution, integrated hydrological modeling paired with particle tracking has been demonstrated in recent years, but such applications remain relatively rare due to their substantial computational requirements (Maxwell et al 2016, Yang et al 2023).
Machine learning is increasingly used alongside physically based hydrological modeling to reduce some of these computational demands (Konapala et al 2020, Tran et al 2021, Razavi et al 2022).Metamodeling, also referred to as model emulation or surrogate modeling, involves training machine learning models on inputs and outputs from physically based models (Asher et al 2015).Owing to their reduced complexity and fast run times, metamodels provide the opportunity to explore questions that would otherwise be computationally intractable via physically based models.Metamodels also provide physically meaningful insights in regions beyond the geographic domain of physically based models as they emulate the driving principles behind such models (Soriano et al 2021, Starn et al 2021).
Metamodels have been constructed to emulate physically based models at various levels.Some metamodels emulate physically based models at the level of the grid resolution, i.e., the inputs and outputs of the metamodel are defined using the same computational topology as the source physically based model (Maxwell et al 2021, Ren et al 2021, Starn et al 2021, Tran et al 2021, Bjerre et al 2022, Soriano et al 2022).Meanwhile, other metamodels have been constructed with target outputs computed for individual locations (e.g., groundwater wells) with input predictors aggregated within a surrounding region of influence (e.g., circular zones with multiple defined radii or windows encompassing n adjacent cells) (Fienen et al 2018, Nolan et al 2018).Another approach previously applied to metamodeling was to aggregate target responses and input predictors to entire model domains, e.g., polygons defined by the hydrologic unit code (huc) framework (Starn and Belitz 2018).These various approaches have also been applied in data-driven machine learning models for predicting groundwater and surface water quality (Green et  The effects of aggregation and scale on transit time distributions have been previously documented in tracerbased studies (Hrachowitz et al 2010, Speed et al 2010, Hale and McDonnell 2016, Kirchner 2016a, 2016b).However, their effect on machine learning metamodels is less understood.In this study, we systematically evaluate the effect of different types of aggregation on metamodels trained to emulate an integrated surface water and groundwater model.We assess the effect of aggregation type on the prediction of three metrics describing transit time distributions: the median transit time, interquartile range, and the fraction of transit times shorter than 65 years.Metamodel predictive performance and consistency are assessed.Our analysis highlights considerations for the implementation of machine learning approaches to successfully emulate physically based models and serve as scientifically robust decision support tools.

Study area and physically based model for transit time computations
The Upper Colorado River Basin (UCRB) is a snowmelt-dominated basin that covers 280,000-sq km across the western US states of Wyoming, Colorado, Utah, New Mexico, and Arizona (figure 1).Elevation ranges from 900 m at Lee's Ferry to 4,300 m at the peaks of the Southern Rocky Mountains along the eastern portion of the UCRB.The climate is characterized by an annual average temperature of 6°C and annual average precipitation of less than 250 mm at low elevations to more than 1000 mm at high elevations.Hydrogeology is characterized by moderate to well-consolidated sedimentary aquifers, and groundwater discharge (baseflow) is estimated to contribute 56% of total streamflow on average (M P Miller et al 2016, Tran et al 2020).The UCRB is one of the principal headwater basins of the United States, providing most of the flow in the Colorado River, which supplies drinking water for over 40 million people and irrigation water for 2.2 million hectares of agricultural land (Tillman et al 2022).The Colorado River is one of the most overallocated basins in the world and is experiencing hydrological impacts from resource overexploitation as well as climate change (Ficklin et al 2013, O L Tillman et al 2016, Miller et al 2021, Tran et al 2022).Salinity is an important water quality concern, with estimated economic damages of 300 to 400 million USD per year (Rumsey et al 2021).Dissolved solids in the UCRB waters are generated by natural processes, such as dissolution of sedimentary and carbonate rocks, and anthropogenic activities, such as irrigation of agricultural land (Rumsey et al 2017, Tillman et al 2018).Dissolution rates are related to transit time through water-rock interactions, such that a better understanding of transit times in the UCRB can help shed light on regional salinity concerns (Maher 2010).
Transit times in the UCRB were computed from a continental scale, integrated hydrological model, ParFlow, coupled with Lagrangian particle tracking, as described by Maxwell et al (2016).ParFlow uses a terrainfollowing grid to solve the three-dimensional Richards' equation for subsurface flow and the kinematic wave equation for surface water flow, employing a free surface boundary condition at the interface of the two domains (Maxwell et al 2015).Once the pressure and velocity fields were computed, particle tracking was used to simulate advective transport of a conservative solute representing parcels of water.Particles were initialized at stream cells (i.e., fully saturated cells corresponding to the streambed) and tracked backwards towards their infiltration locations at the land surface.The computed transit time excludes the time spent within surface water (e.g., stream channels), as these were orders of magnitude smaller than time spent in the unsaturated and saturated subsurface (Maxwell et al 2016, Hosen et al 2021).This transit time thus corresponds to the transit time of baseflow.The ParFlow model was set-up using a 1-sq km grid cell resolution and a total depth of 102 m discretized into five layers.The top 2 m of the domain was assigned properties based on soil texture, while the bottom 100 m was based on lithology and an empirical e-folding relationship that reflects changes in topography.This physically based model was run at steady state using long-term average precipitation minus evapotranspiration (PME) as climatic forcing, representative of conditions from 1950-2000.The continental scale simulation required 2.5 M core hours.Simulations were validated using average groundwater levels, streamflow, and water age inferred from tritium concentrations.Within the UCRB, simulated water table depths exhibit good agreement with observations, while simulated flows are slightly overpredicted (figures 7 and

Machine learning metamodel set-up
For our metamodeling, we set as our targets three metrics that describe the particle tracking-derived transit time distributions (figure 2): median transit time (MTT), the interquartile range (IQR), and the fraction of transit times shorter than 65 years (f65y).We selected f65y to designate a fast fraction of flow that is analogous to the fraction of young or recent groundwater, following definitions based on the atmospheric tritium peak from bomb testing adopted in previous groundwater age studies (Fienen et al 2018, Starn et al 2021).These transit time metrics were computed from the aggregation of particles based on their infiltration point at the land surface (i.e., the ending locations of the particle tracking simulation).Thus, the metrics describe the forward transit time from infiltration to baseflow or groundwater discharge to streams.
We used four methods to spatially aggregate the particles into zones to calculate these metrics from the particle tracking-derived transit time distributions: upstream watershed (wshed), 12-digit and 10-digit hydrologic unit code polygons (huc12 and huc10, respectively), and a regular Cartesian grid with 10-sq km cells and 20-sq km cells (grid10 and grid20, respectively) (figure 1).These aggregation types were selected as they have previously been used for defining inputs and outputs in machine learning applications as described in the Introduction.Upstream watersheds were delineated from the hydrologically conditioned digital elevation model that was used to define the top boundary of the parent ParFlow model.huc polygons were extracted from the US Geological Survey's Watershed Boundary Dataset (USGS 2022).Within this framework, huc10's are sometimes referred to as watersheds and huc12's are referred to as sub-watersheds.However, these polygons do not necessarily encompass the full recharge area contributing water and material fluxes to a location, and they are often truncated portions of real topographic watersheds (Griffith et al 1999).The grid based aggregation units are not dictated by topographic recharge areas; rather, they represent the practice of aggregating distributed point data into regular grids of selected predictors (e.g., raster data) to produce corresponding grids of the target prediction (e.g., Podgorski and Berg 2020).Transit times were log-transformed prior to spatial aggregation to account for heavy tailed distributions.Predictor variables describe the topography, climate, and geology of the study area, based on prior studies examining dominant controls on transit time distributions The predictors were defined to match the corresponding spatial aggregation of the target variable.Recursive feature selection was performed until we arrived at a parsimonious set of predictors, which were then kept uniform across all subsequent analyses (Table S1).Processing of the spatial data was conducted in ArcGIS Pro 3.0 and arcpy.
We trained metamodels using random forest (RF) regression, an ensemble machine learning method that uses decision trees and bootstrap aggregation (bagging) to recursively split random samples of the training data into more homogeneous sets using random subsets of predictors, with the final prediction calculated as the mean of predictions across all trees (Breiman 2001).RF has been used in a wide variety of hydrological applications and has been shown to minimize variance and overfitting (Tyralis et al 2019).We tested 1,000, 3,000, and 10,000 trees and subsequently fixed 3,000 trees as it provided a balance between stability of results and computation time.We used grid search to determine optimal values for other hyperparameters (Table S2).We used a 70-30 train-test split for each of the spatial aggregation set-ups.Metamodel predictive performance was assessed through the Pearson correlation coefficient (r), root mean square error (rmse), and median percent bias As a further check on the robustness of our results, we repeated our analysis using the extreme gradient boosting (XGB) regression algorithm as implemented in the xgboost R package (Chen et al 2023).

Metamodel performance
The physically based model predicted transit times ranging from less than 1 year to greater than 1,000 years.The spatial distributions of the MTT, IQR, and f65y were consistent across aggregation types, with a distinct cluster of longer transit times predicted in the north and southeastern portions of the UCRB.
Metamodel performance varied with respect to the target prediction and aggregation type (table 1).Performance in the training data is comparable to performance in the held out testing data, suggesting robustness to overfitting and the generalizability of the metamodels.
Across the different target predictions, the lowest rmse's were obtained by metamodels using the huc10 aggregation type.However, the overall best performing metamodels were those using the wshed aggregation type, as shown by examining rmse, r, and pbias together.Metamodels using a larger aggregation unit (huc10 and grid20) outperformed their corresponding metamodels that used a smaller aggregation unit (huc12 and grid10, respectively), which could indicate a smoothing effect at the larger aggregation units.This result also suggests a need for additional predictors that can capture the appropriate variability at the huc12 and grid10 levels.The XGB metamodels are comparable to the RF metamodels, with slight improvement in performance for training data but lower generalizability to testing data.The performance of the best metamodels is comparable to the performance previously reported for machine learning models for predicting water quality parameters at large scales.For example, Shen et al (2020) reported testing set r values ranging from 0.56 to 0.81 for RF models predicting concentrations of nitrogen and phosphorus in streams within the continental US.Similarly, Ransom

Metamodel interpretations
The relative influence of predictors was evaluated using permutation feature importance and SHAP values (figure 3).Permutation importance is the decrease in model performance when a single predictor is randomly shuffled, thus showing how much the overall prediction depends on a given predictor (Breiman 2001).SHAP is a measure of the local importance of each predictor for each prediction based on Shapley values from game theory, and the mean SHAP magnitude across all predictions is a measure of the global importance of a given predictor (Lundberg et al 2020).Relative predictor influence varied with respect to the target prediction and aggregation type.Metamodels using the wshed aggregation identified area as an important predictor for all three target predictions.Across the different aggregation types, predictors describing topography and geology were the most important for predicting MTT, while predictors describing topography and climate were the most important for predicting IQR.Predictors describing topography, geology, and climate were important for predicting f65y.Results for the XGB metamodels are similar (figure S3).These findings agree with Maxwell et al (2016), who showed that mean hydraulic conductivity (a function of geology) was a dominant control on peak residence times, while aridity (a function of climate) controlled on the variance of the residence time distributions.
The influence of the top predictors is also visualized through plots of the accumulated local effects, which summarize the nonlinear behavior of the metamodels (figure 4).ALE plots for the XGB metamodels are shown in figure S4.A positive ALE indicates that the prediction is higher than the average within the local predictor window, while a negative ALE indicates a prediction that is lower than the average.An ALE of zero indicates that the predicted value does not differ from the average prediction.Metamodel behavior across different aggregation types is generally consistent, as shown in the ALE plots for ave_slp and sd_curvpl.For instance, as average slope increases (i.e., as the spatial unit becomes steeper), MTT tends to decrease (faster transit time), IQR tends to decrease (less variability in transit times), and f65y tends to increase (larger proportion of shorter transit times).ALE plots for sd_curvpl are more complex but remain consistent across the different spatial aggregations.Planform curvature describes the shape of a surface in terms of convergence or divergence of flow.It has been identified as an important variable for predicting groundwater potential, landslide susceptibility, and flood risk (Goetz et al 2015, Kalantar et al 2019, Mahdizadeh and Perez 2023).A higher variability of plan curvature may indicate, on average, shorter flow paths to discharging cells, which would be consistent with faster transit times (lower MTT and higher f65y).ALE plots for area indicate that as watershed area increases, predicted MTT and IQR also tend to increase while predicted f65y tends to decrease.These trends are consistent with an increasing heterogeneity in flow paths encompassed by these larger spatial units.The break in slope in the ALE plots may indicate a stabilization of the area effect.Examining plots of the transit time metrics against watershed area, we see pronounced overall trends of increasing MTT with area up to ∼10^8.5 sq m, increasing IQR with area up to ∼10^9.5 sq m, and decreasing f65y with area up to ∼10^7.5 sq m, with these thresholds bracketing the breaks in the ALE plots (figure S5).The breaks may also reflect fewer data points within the computation windows for larger areas.
Similar tradeoffs in the relative importance of landscape controls on transit time distributions have been previously demonstrated.For instance, the relationships may depend on the specific transit time characteristic being examined.Xiao et al (2021) found that mean transit time of stream water was more influenced by permeability variations, while the fraction of younger water was more influenced by hillslope shape.Relationships may also vary depending on the co-occurrence of specific landscape conditions.For example, Hale and McDonnell (2016) found that baseflow mean transit times were predominantly controlled by catchment area in locations with permeable bedrock, while topography was the dominant control in sites with poor bedrock permeability.Finally, the interaction between landscape and climatic factors also dictates their relationships with transit time.Carroll et al (2020) showed that median baseflow ages increased with reductions in precipitation (i.e., longer transit times with lower precipitation) in systems with low values of recharge to hydraulic conductivity (R/K ) ratios but were insensitive to precipitation changes in systems with high R/K ratios.

Implications
The analysis shows that RF metamodels can be trained to predict different characteristics of transit time distributions.Metamodel performance using larger spatial aggregation units (huc10, grid20) was generally better than performance using smaller units (huc12, grid10), but the best overall performance was attained by metamodels specified by spatial aggregation to upstream watersheds (wshed).This result indicates that machine learning metamodels specified as such may be more successful at emulating the relationships represented by physically based models.
Previous studies comparing the effect of spatial aggregation on the interpretation of water quality measurements have noted that differences in water quality that could be correctly predicted by differences in upstream watershed characteristics can sometimes be obscured by comparisons within encompassing huc units, as these boundaries may not integrate the full area contributing to water and material fluxes (Omernik et al 2017, Kuhn et al 2018).Our analysis supports the use of the upstream watershed for training metamodels to predict transit time, which is interlinked with water quality, as this aggregation type led to the best overall predictive performance.
Our work focuses on scale effects arising from the spatial aggregation of particle transit times and input predictors using different zonal units (i.e., polygons defined by watershed boundaries, huc units, or a regular grid).Scale effects also arise from variable spatial resolution (Zhang et al 2019).This type of scale effect is illustrated in our comparison of grid10 and grid20 metamodels.Previous studies documenting scale effects from spatial resolution have focused on physically based models that discretize space into a regular grid.Comparison of a subset of results from the 1-km resolution model used in our analysis and a 20-m resolution model found that the simulated transit time distributions are in good agreement (Engdahl andMaxwell 2015, Maxwell et al 2016).However, other studies that examined the impact of spatial resolution on physically based models have suggested potentially large scale effects depending on the hydrologic output of interest (Ichiba et al 2018, Komolafe et al 2018).Subsequent work should examine how scale effects from both spatial aggregation and spatial resolution impact the performance of metamodels and their parent physically based models.
The analysis presented herein may also be regarded as an inquiry into the potential effect of the modifiable areal unit problem (MAUP).The MAUP is a phenomenon that arises from spatially distributed data where statistical relationships between environmental predictors and target predictions are contingent on the level of spatial aggregation (Lloyd 2014, Buzzelli 2020).In this case, caution should be used in the physical interpretation of the models, particularly in terms of quantitative inferences (Salmivaara et al 2015).Multiscale and multilevel analyses, such as demonstrated in the current study, are necessary to establish the robustness of results (Friedrichs-Manthey et al 2020).As the ALE plots demonstrate, the metamodels learned robust input-output relationships that exhibit consistent trends across the different aggregation types we studied.In addition, novel machine learning techniques tailored to the properties of spatial data are being actively developed (Nikparvar and Thill 2021).The application of these emerging techniques to hydrological problems potentially affected by spatial aggregation needs further study.

Conclusions
We developed metamodels for predicting three characteristics of transit time distributions-the median transit time, interquartile range, and fraction of transit times shorter than 65 years-from a continental scale, integrated hydrologic model using readily mappable topographic, climatic, and geologic predictors.We evaluated the effect of using various types of spatial aggregation and found that metamodels employing upstream watershed aggregation exhibited the best performance.We also found that metamodels employing aggregation to larger spatial units outperformed their counterparts employing smaller units, suggesting a potential smoothing effect within larger areas and a need for more predictors that preserve the appropriate level of heterogeneity at smaller units.Metamodel input-output relations were remarkably robust across aggregation types and were consistent with physical understanding of transit time behavior.Given the spatial nature of hydrological data, we recommend implementing similar multiscale analyses as a component of hydrological applications of machine learning to ensure robustness of reported findings.
al 2021, Ransom et al 2022, Zhu et al 2022, Haggerty et al 2023, Mallya et al 2023).Data-driven applications in surface water systems have alternatively employed predictors aggregated within the upstream watershed of target output locations (Tillman et al 2018, Shen et al 2020, Virro et al 2022).

Figure 1 .
Figure 1.Median transit time (MTT, years) in the Upper Colorado River Basin (UCRB) for different types of spatial aggregation: a.) upstream watershed (wshed, n = 18,281), b.) 12-digit hydrologic unit code (huc12, n = 3,051), c.) 10-digit hydrologic unit code (huc10, n = 523), d.) regular 10-km grid (grid10, n = 2,786), and e.) regular 20-km grid (grid20, n = 668).MTT for wshed represents the median transit time for all particles located within the upstream watershed of any given stream cell.MTT for the other aggregation types is the median transit time for all particles contained within the corresponding polygons.Maps for the interquartile range of the transit time distribution (IQR) and the fraction of transit times shorter than 65 years (f65y) are shown in the Supporting Information (Fig. S1 and S2).The inset in a.) shows the location of the UCRB within the conterminous USA.

Figure 2 .
Figure 2. Representative cumulative distribution function (CDF) of particle tracking-derived transit times within a spatial unit.The dotted lines show the relevant percentiles in the distribution, while the vertical blue line shows the transit time corresponding to 65 years.In this example, the numerical values for the three target predictions are as follows: MTT = 155 years, IQR = 390 years, and f65y = 0.31.
McGuire K J, McDonnell J J, Weiler M, Kendall C, McGlynn B L, Welker J M and Seibert J 2005 The role of topography on catchment-scale water residence time Water Resour.Res.41 W05002 Miller M P, Buto S G, Susong D D and Rumsey C A 2016 The importance of base flow in sustaining surface water flow in the Upper Colorado River Basin Water Resour.Res.52 3547-62 Miller O L, Miller M P, Longley P C, Alder J R, Bearup L A, Pruitt T, Jones D K, Putman A L, Rumsey C A and McKinney T 2021 How will baseflow respond to climate change in the upper Colorado River Basin?Geophys.Res.Lett.48 e2021GL095085 Molson J W and Frind E O 2012 On the use of mean groundwater age, life expectancy and capture probability for defining aquifer vulnerability and time-of-travel zones for source water protection J. Contam.Hydrol.127 76-87 Nikparvar B and Thill J-C 2021 Machine learning of spatial data ISPRS International Journal of Geo-Information 10 600 Nolan B T, Green C T, Juckem P F, Liao L and Reddy J E 2018 Metamodeling and mapping of nitrate flux in the unsaturated zone and groundwater, Wisconsin, USA J. Hydrol.559 428-41 Omernik J M, Griffith G E, Hughes R M, Glover J B and Weber M H 2017 How misapplication of the hydrologic unit framework diminishes the meaning of watersheds Environmental Management 60 1-11 Podgorski J and Berg M 2020 Global threat of arsenic in groundwater Science 368 845-50 R Core Team 2023 R: A Language and Environment for Statistical Computing (4.2.1) (R Foundation for Statistical Computing) https://rproject.org/Ransom K M, Nolan B T, Stackelberg P E, Belitz K and Fram M S 2022 Machine learning predictions of nitrate in groundwater used for drinking supply in the conterminous United States Sci.Total Environ.807 151065 Razavi S, Hannah D M, Elshorbagy A, Kumar S, Marshall L, Solomatine D P, Dezfuli A, Sadegh M and Famiglietti J 2022 Coevolution of machine learning and process-based modelling to revolutionize Earth and environmental sciences: a perspective Hydrol.Processes 36 e14596 Remondi F, Botter M, Burlando P and Fatichi S 2019 Variability of transit time distributions with climate and topography: a modelling approach J. Hydrol.569 37-50 Remondi F, Kirchner J W, Burlando P and Fatichi S 2018 Water flux tracking with a distributed hydrological model to quantify controls on the spatio-temporal variability of transit time distributions Water Resour.Res.54 3081-99 Ren H, Song X, Fang Y, Hou Z J and Scheibe T D 2021 Machine learning analysis of hydrologic exchange flows and transit time distributions in a large regulated river Frontiers in Artificial Intelligence 4 648071 Rumsey C A, Miller M P, Schwarz G E, Hirsch R M and Susong D D 2017 The role of baseflow in dissolved solids delivery to streams in the Upper Colorado River Basin Hydrol.Processes 31 4705-18 Rumsey C A, Miller O, Hirsch R M, Marston T M and Susong D D 2021 Substantial declines in salinity observed across the Upper Colorado River basin during the 20th Century, 1929-2019 Water Resour.Res.57 e2020WR028581 Salmivaara A, Porkka M, Kummu M, Keskinen M, Guillaume J H A and Varis O 2015 Exploring the modifiable areal unit problem in spatial water assessments: a case of water shortage in monsoon Asia Water 7 898-917 Shen L Q, Amatulli G, Sethi T, Raymond P and Domisch S 2020 Estimating nitrogen and phosphorus concentrations in streams and rivers, within a machine learning framework Scientific Data 7 161 Soriano M A Jr, Deziel N C and Saiers J E 2022 Regional scale assessment of shallow groundwater vulnerability to contamination from unconventional hydrocarbon extraction Environmental Science & Technology 56 12126-36 Soriano M A Jr, Siegel H G, Johnson N P, Gutchess K M, Xiong B, Li Y, Clark C J, Plata D L, Deziel N C and Saiers J E 2021 Assessment of groundwater well vulnerability to contamination through physics-informed machine learning Environ.Res.Lett.16 084013 Speed M, Tetzlaff D, Soulsby C, Hrachowitz M and Waldron S 2010 Isotopic and geochemical tracers reveal similarities in transit times in contrasting mesoscale catchments Hydrol.Processes 24 1211-24 Sprenger M et al 2019 The Demographics of water: a review of water ages in the critical zone Rev. Geophys.57 800-34 Starn J J and Belitz K 2018 Regionalization of groundwater residence time using metamodeling Water Resour.Res.54 6357-73 Starn J J, Kauffman L J, Carlson C S, Reddy J E and Fienen M N 2021 Three-dimensional distribution of groundwater residence time metrics in the glaciated United States Using metamodels trained on general numerical simulation models Water Resour.Res.57 e2020WR027335 Stewart M K, Morgenstern U and McDonnell J J 2010 Truncation of stream residence time: how the use of stable isotopes has skewed our concept of streamwater age and origin Hydrol.Processes 24 1646-59 Stockinger M P, Bogena H R, Lücke A, Diekkrüger B, Cornelissen T and Vereecken H 2016 Tracer sampling frequency influences estimates of young water fraction and streamwater transit time distribution J. Hydrol.541 952-64 Tetzlaff D, Seibert J, McGuire K J, Laudon H, Burns D A, Dunn S M and Soulsby C 2009 How does landscape structure influence catchment transit time across different geomorphic provinces?Hydrol.Processes 23 945-53 Thiros N E, Siirila-Woodburn E R, Dennedy-Frank P J, Williams K H and Gardner W P 2023 Constraining bedrock groundwater residence times in a mountain system with environmental tracer observations and Bayesian uncertainty quantification Water Resour.Res.59 e2022WR033282 Tillman F D, Anning D W, Heilman J A, Buto S G and Miller M P 2018 Managing salinity in Upper Colorado River basin streams: selecting catchments for sediment control efforts using watershed characteristics and random forests models Water 10 676 Tillman F D, Day N K, Miller M P, Miller O L, Rumsey C A, Wise D R, Longley P C and McDonnell M C 2022 A review of current capabilities and science gaps in water supply data, modeling, and trends for water availability assessments in the Upper Colorado River Basin Water 14 3813 Tillman F D, Gangopadhyay S and Pruitt T 2016 Changes in groundwater recharge under projected climate in the upper Colorado River basin Geophys.Res.Lett.43 6968-74 Tran H, Leonarduzzi E, De la Fuente L, Hull R B, Bansal V, Chennault C, Gentine P, Melchior P, Condon L E and Maxwell R M 2021 Development of a deep learning emulator for a distributed groundwater-surface water model: ParFlow-ML Water 13 3393 Tran H, Zhang J, Cohard J-M, Condon L E and Maxwell R M 2020 Simulating groundwater-streamflow connections in the Upper Colorado River Basin Groundwater 58 392-405 Tran H, Zhang J, O'Neill M M, Ryken A, Condon L E and Maxwell R M 2022 A hydrological simulation dataset of the Upper Colorado River Basin from 1983 to 2019 Scientific Data 9 16 Tuszynski J 2021 caTools: Tools: Moving Window Statistics, GIF, Base64, ROC AUC, etc (1.18.2)https://cran.r-project.org/web/packages/caTools/index.htmlTyralis H, Papacharalampous G and Langousis A 2019 A brief review of random forests for water scientists and practitioners and their recent history in water resources Water 11 910 USGS 2022 Watershed Boundary Dataset [Data set]https://usgs.gov/national-hydrography/watershed-boundary-dataset
et al (2022) had hold-out r values of 0.7 for a national scale XGB model for predicting nitrate concentrations in groundwater.