Uncertainty in streamflow measurements significantly impacts estimates of downstream nitrate export

Across watershed science, two key variables emerge–streamflow and solute concentration–which serve as the basis for efforts ranging from basic watershed biogeochemistry research to policy decisions surrounding watershed management. However, we rarely account for how error in discharge (Q) impacts estimates of downstream nutrient loading. Here, we examined the impact of uncertainty in streamflow measurements on estimates of downstream nitrate export using publicly available data from the U.S. Geological Survey (USGS). We characterized how uncertainty in stage-discharge relationships impacts annual flux estimates across 70 USGS gages. Our results indicate the interquartile range of relative error in Q was 33% across these USGS sites. We documented a wide range in mean error in annual nitrate loads; some sites were underestimated (−105%), while predicted loads at other sites vastly overestimated (500%). Overall, any error in estimating Q leads to significant unpredictability of annual nutrient loads, which are often used as critical success benchmarks for governmental nutrient reduction strategies. Moreover, error in annual nitrate loads (as mass, kg) increases with mean Q; thus, as high flows become more unpredictable and intense under future climate change, error in estimates of downstream nutrient loading may also increase. Together, this indicates that error in Q may drastically influence our measures of water quality success and decrease our ability to accurately quantify progress towards algal bloom and ‘dead zone’ reduction.


Introduction
Discharge (Q), the amount of water flowing through a stream reach (measured as volume per time), is the master variable of lotic ecosystems and is critical to answering both basic and applied research questions.For example, Q estimates are used to parameterize hydrologic models [1], make flood predictions [2], efficiently manage water resources [3], and prescribe environmental flow recommendations [4].Moreover, accurate Q measurements are essential for estimating downstream loss of material (e.g.sediments and particulates; [5,6]) and reactive nutrients (e.g.nitrogen and phosphorus; [7,8]).The variability of daily stream Q, material and dissolved solute concentrations, and the resulting loads (i.e.where load = concentration x Q) exported from catchments are the combined result of hydrologic and biogeochemical variability, and the uncertainty in both measurements could potentially propagate into large error distributions.Thus, capturing precise load estimates has significant implications for monitoring downstream water quality [9], improving our understanding of critical watershed processes [2], and making informed management decisions [10], particularly in the face of global hydrologic intensification [11].
In applications across research, management, and regulatory communities, Q is often required at daily to sub-hourly timescales.Typically, high resolution Q time series are estimates derived from gaugespecific rating curves, which are static relationships that predict Q based on measures of river height that are often developed by pairing point measurements of Q and river stage over time at a single monitoring station.When the rating curve is developed across a wide range of stages and flows, continuous measurements of stage can be converted to Q by applying the sitespecific equation (i.e.Q = f(stage); [12]).Thus, any uncertainty in the empirical measurements of Q or the resulting stage-Q relationship will cascade to the Q record and to subsequent estimates of flood risk, water yield, or downstream transport of materials.
Uncertainty in Q estimates can be derived from observational uncertainty that results in random measurement error [2] and epistemic uncertainty, which include non-random errors associated with limitations of the stage-Q approach [13].Common point measurements of flow include dilution gauging [14], the velocity area method [15], and calibrated weirs [16].Each of these methods impart contextspecific error, especially at low [17] and high flows [18].Moreover, developing high-quality continuous streamflow records is often challenging for individual investigators given limited funding for infrastructure, like permanent weirs, and limited personnel time and expertise to develop adequate rating curves [19].
Fundamental assumptions of stage-Q curves are often violated, especially at high flows [2].These common violations include unsteady flow, seasonal changes in channel roughness, and changes in channel geometry.Unsteady flow, or hysteresis in stage-Q measurements at the event-scale, can occur due to changes in water storage, such as in dynamic floodplain zones [20].Moreover, geomorphological changes in the cross-sectional area of a river channel are common throughout most gauge networks [21], and stage records are often corrected to account for channel aggregation or degradation [22].Finally, seasonal changes in vegetative cover can impact roughness and downstream flow [23,24], causing further uncertainty in stage-Q relationships developed across different seasons and years.
Given the significant solute export implications of error in Q, we quantified how uncertainty in stage-Q measurements impacts annual nitrate (NO 3 --N) load estimates.We focus on NO 3 --N as it is a reactive solute that is of primary concern for its role in driving eutrophication and harmful algal blooms when present in excess [25,26].We queried publicly available data from the U.S. Geological Survey (USGS) to ask: How does uncertainty in Q propagate to annual NO 3 --N load estimates across the continental United States (CONUS)?To our knowledge, this is the first study evaluating how error in the stage-Q relationship cascades to estimates of downstream nutrient loss across the USA.

Site selection
To explore how uncertainty in Q measurements impacts estimates of annual NO 3 --N loads, we used a combination of Q and NO 3 --N data from the USGS stream gauge network.The USGS gauge network both provides publicly available continuous NO 3 --N and flow data and represents a high standard to which other research and governmental agencies compare their gauging efforts.Here, we selected gauges that (i) recorded real time flow and NO 3 --N data and (ii) recorded at least three years of both streamflow and NO 3 --N data during the 2009-2023 water years.Note, here we defined the water year as 1 October through 30 September, and we only utilized water years with complete records in this analysis.We identified 70 USGS gauges which met these criteria.See supplementary material for more information on specific gauges (table S1).

Simulating uncertainty in Q measurements
We used a bootstrapping based resampling approach similar to methods presented in Scott et al [27] to quantify the impact rating curve uncertainty has on annual flux estimates (see figure 1 for workflow).Our approach included several steps: we first quantified the error between measured and modeled Q from each gauge's rating curve; resampled that error distribution to create a sample of 1000 relative error values; created 1000 simulated streamflow records for each gauge; and finally characterized relative error using the 1000 simulated streamflow records.Importantly, here we define error as the difference between streamflow reported by USGS and the streamflow simulated through our bootstrapping approach, while the relative error represents the average error across the 1000 simulations for each gauge.Note, we initially conducted all analyses on data aggregated to the daily timestep (i.e.Q represents mean daily Q and nitrate concentrations represent mean daily nitrate concentrations).However, we report relative error at the annual timestep for clarity.
For each gauge, we developed an error distribution based on the difference between measured and modeled Q (i.e. e relative = [Q measured -Q modeled ]/Q measured ) from the gauge-specific rating curve.Here, measured Q values are taken from paired stage-Q measurements, and modeled Q values are taken from Q predicted by the rating curve at the measured stage.This resulted in a distribution of observed errors between the measured and modeled Q for each gage.
In the next step, we conducted a bootstrap resampling approach [27].For each gauge, we resampled its error distribution to develop a sample of 1000 relative error values.We note this resampling was completed with replacement.
We then generated 1000 simulation stream flow records for each gauge.To develop a simulated flow record, we utilized the relative error values from the bootstrap resampling.We multiplied individual error values by the Q (i.e.Q simulated = Q• [1 + e relative ]), which resulted in 1000 realizations of each gauge's record.
To summarize variability in annual flow estimates, we estimated mean relative error across each gauge's 1000 simulations.For each simulated flow record, we calculated mean annual flow.Then, we estimated the relative error by subtracting measured and simulated mean annual flows (i.e.[Q annual,measured -Q annual,simulated ]/Q annual,measured ).Notably, here the observed flow is defined as the flow predicted by the rating curve that is reported by the USGS.To summarize the variability across the simulated flow records for each gauge, we estimated the mean of the annual error values calculated for each simulated flow record.For simplicity, we have used the absolute value of relative error when reporting our Q results.

Cascading uncertainty in stage-Q measurements to annual nitrate load estimates
For each gauge, we used the 1000 simulated flow records to estimate the range of possible annual NO 3 --N loads.To do this, we took the summation of the product of daily Q and daily NO 3 --N concentrations for each year in the simulated flow record (i.e.W annual = ∑ (Q daily • C daily )), where we derived Q from the simulated Q and NO 3 --N concentrations were taken from the gauge's record.
To characterize uncertainty across the simulated NO 3 --N loads for each gauge, we calculated relative error for W annual .Using the same approach to determine relative error of mean annual flow, we calculated relative error of mean annual NO 3 --N loads (i.e.[W annual,measured -W annual,simulated ]/ W annual,measured ) across the simulated records for each gauge.To estimate the relative error for each gauge, we simply calculated the mean of the relative errors across the 1000 simulations.

Error in stage-Q relationships is high and variable across sites
Our results suggest the error in stage-Q relationships results in significant uncertainty in daily flow estimates, and this error is not consistent across the sites used in this study.Comparing the absolute relative error across all 70 gauges, we observed a wide range of error in Q estimates (figure 2).Moreover, we did not find spatially explicit patterns in relative error in Q estimates across CONUS (figure 2(a)).The absolute relative error varied from 3% to 161% error in Q (figure 2(b)).Annually, the median absolute relative error in our investigation was 33% (figure 2(b)), which is within the range of error demonstrated by other studies [9,18].One review found the propagation of error from stage-Q relationships to continuous Q estimates to range from ±10%-100% depending on the flow conditions (e.g.low flow to out-ofbank; [9]).Additionally, a recent study from Australia found anywhere from a −22% to a +29% uncertainty for individual gauges [18].
The lack of spatially explicit patterns in error in Q suggests there are additional drivers of Q error.We hypothesized one driver of Q variability was watershed size; however, we did not document a significant correlation between relative error in Q (as a %) and increasing watershed size (figure 2(c)).However, the correlation between NO 3 --N load uncertainty and drainage area was significant (figure S1; Spearman's r = − 0.34, p = 0.01), suggesting that increasing drainage area may decrease error in load estimates.We hypothesized another potential driver of Q variability is the length of the gauge streamflow record.A longer streamflow record provides more opportunities to measure flow extremes that occur less frequently and can have significant leverage on the stage-Q relationship [28].We note we did not find a significant correlation between gauge record length and relative error in Q or the number of stage-discharge measurements and relative error in Q for the gauges included in this study (figure S2).Finally, extrapolating beyond the maximum measured Q in the stage-Q relationship may also bias annual maximum Q estimates, resulting in misinterpretations of high flows [29] and the resultant downstream loads, leading to results that are not physically plausible [30].However, it is worth noting this was not a significant issue in the gauge records analysed in this study and only applied to one gauge for a duration of two days.Given the potential implications of error in Q estimates on nutrient loading, it is important for future work to parse apart the drivers of uncertainty to reduce their impact on load estimates.

Uncertainty in Q yields large errors in nitrate loads with significant implications for load reduction targets
While various approaches exist to estimate uncertainty in Q time series [31,32], this uncertainty is rarely propagated to estimates of downstream nutrient loads.To our knowledge, our findings represent the first analysis to document this propagation of error to nutrient loads across CONUS.Importantly, ).Note that we were not able to obtain a drainage area for all gauges included in the study.our results have notable implications for accurately quantifying downstream nutrient loading, particularly when considering nutrient load reduction goals and setting accurate total maximum daily loads (TMDLs).
We estimated the possible range of annual NO 3 --N loads for a given site using the 1000 simulated flow records from the Monte-Carlo simulation and NO 3 --N concentrations obtained from each gauging site.Here, we documented a large range in percent difference in NO 3 --N loads, where error in Q translated to anywhere from a 105% underestimate to a 500% overestimate in NO 3 --N loads (figure 3(a)).Thus, without accounting for uncertainty in Q, we may be inaccurately estimating NO 3 --N loads by over an order of magnitude.Moreover, we also note that the majority of error in NO 3 --N loads ranges from approximately −17%-30% (figure 3(a)), which is in the same range as the annual reductions in NO 3 --N loads being called for by government agencies to decrease downstream nutrient loss by 20% to the Gulf of Mexico [33] and by 40% to Lake Erie [34].Notably, these estimates are simply based on error in Q and do not account for potential error in NO 3 --N concentration measurements (e.g.analytical or measurement errors).Together, this indicates that the resultant error in NO 3 --N loads is of a 'practically significant' magnitude (i.e.significantly influences management goals), with the potential to drastically influence our measures of water quality success.Moreover, the nutrient load threshold at which harmful algal blooms occur almost certainly lies within the error documented here [35], making it challenging to accurately quantify progress towards harmful algal bloom and 'dead zone' reduction.
Our results also indicate that the magnitude of error in NO 3 --N loads (in kg) increases with mean total annual flow (figure 3(b); Spearman's r = 0.72, p < 0.001).This is important as high flows result in higher nutrient loads that drive downstream nutrient export [36,37], and high flow events (i.e.storms) are expected to increase in both frequency and intensity in the future [38,39].For example, the Mississippi River Basin and Chesapeake Bay area, where many of our study watersheds are located, may experience an increase in storm frequency with future climate change [40,41].The predicted increase in high flow events may increase total annual flow, and, in turn, drive up uncertainty in annual NO 3 --N loads (figure 3(b)).Given these areas are also the focus of many large-scale nutrient reduction strategies, this uncertainty must be further decomposed given the critical impact inaccurate estimates of nutrient loads may have for the development of TMDLs, the prevention of harmful algal blooms, and the efficacy of nutrient credit trading programs (as is common in the Chesapeake Bay area).We emphasize that further exploration into changing uncertainty across the stage-Q relationship is warranted to fully explore this issue.
We also note that error in NO 3 --N loads was typically greatest in small watersheds (figure 4).Our observation is of particular concern given smaller watersheds contribute most to downstream drinking water quality [42] and are often the targets of conservation efforts that are intended to reduce downstream nutrient loading [42,43].However, our results indicate estimates may be off by greater than an order of magnitude in smaller watersheds, making it nearly impossible to determine if we are meeting local water quality goals.Further, this uncertainty in NO 3 --N loads has significant implications for meeting TMDLs.For example, the Wabash River (Indiana) has an annual NO 3 --N TMDL of 0.3 kg ha −1 yr −1 [44], while we have documented error in NO 3 --N loads up to 1.8 kg ha −1 yr −1 in the Wabash River Basin (figure 4).In this case, the error resulting from the stage-Q relationship represents a consequent uncertainty in NO 3 --N loads that is ∼6x the TMDL in the Wabash River Basin.Errors in nutrient load estimates of this magnitude may significantly thwart progress towards achieving the aggressive targets set by government agencies.As freshwater is a limiting resource [45], it is essential that error in nutrient loading be accounted for to maintain access to clean drinking water and support the myriad ecosystem services provided by streams and rivers [46][47][48].) across gauges, where cool colors represent smaller watersheds and warm colors represent larger watersheds.We have indicated several watersheds that have been identified as important contributors to inland and coastal nutrient loading.Note that we were not able to obtain a drainage area for all gauges included in the study to calculate uncertainty normalized by area.Full gauge information can be found in table S1.

Conclusions and future investigations
While our study provides the first documentation of the propagation of error in stage-Q relationships to NO 3 --N loads for CONUS, additional inquiries are required to fully understand the magnitude of this issue.First, we suggest that future analyses incorporate an error structure that allows for the error in the stage-Q relationship to vary with the magnitude of Q.An initial analysis suggests that differences between Q measurements and rating curves decrease as flow increases (see figure S3).Understanding systematic errors and bias in flow data is especially important for low flows.Low flow periods represent critical times for both species of concern (e.g.habitat and refugia; [49,50]) and for the management of environmental flows [51,52], while high flows are critical periods of downstream nutrient loss [36,37].Therefore, understanding how error in nutrient loads changes over the full range of Q may be important for accurately managing for both low and high flow conditions.Additionally, the distribution of error over the year (i.e. during what periods of the year are errors greatest) may be of concern for managers, conservationists, and other stakeholders.For example, if error is greatest during the summer low flow period, this may have implications for maintaining aquatic biodiversity [4,53].We also suggest further investigation into the drivers of error in the stage-Q relationship, such as stream order and variation over the growing season.This will help inform future data collection and method development to minimize the potential effects of error in Q on estimates of downstream nutrient loads.

Figure 1 .
Figure 1.Diagram demonstrating the workflow for quantifying discharge (Q) and nitrate load uncertainty estimates.Example data in steps 2-5 is derived from the Mississippi River gauge at Baton Rouge, LA (USGS gauge no.07374000).

Figure 2 .
Figure 2. (a) Map of CONUS depicting the location of USGS gauges included in the current study.Point sizes reflect the absolute relative error in discharge (Q) estimates for each gauge, where a larger point size is indicative of a larger relative error.We have indicated several watersheds that have been identified as important contributors to inland and coastal nutrient loading.(b) Distribution of absolute relative error across gauges included in the study (interquartile range = 33%).(c) Error in Q was not correlated to drainage area (in square miles, mi 2).Note that we were not able to obtain a drainage area for all gauges included in the study.

Figure 3 .
Figure 3. (a) Distribution of uncertainty in annual NO3 --N loads (%) as a result of error in the stage-Q relationship.(b) Uncertainty in annual NO3 --N loads (as mass, in kg) was significantly positively correlated to mean total annual flow (in cubic feet, ft 3 ); the orange, dashed line depicts the directionality of the correlation.

Figure 4 .
Figure 4. Comparison of uncertainty in NO3 --N loads (normalized by watershed area; in kg ha −1 yr −1) across gauges, where cool colors represent smaller watersheds and warm colors represent larger watersheds.We have indicated several watersheds that have been identified as important contributors to inland and coastal nutrient loading.Note that we were not able to obtain a drainage area for all gauges included in the study to calculate uncertainty normalized by area.Full gauge information can be found in tableS1.