Abstract
Crop productivity is potentially affected by several air pollutants, although these are usually studied in isolation. A significant challenge to understanding the effects of multiple pollutants in many regions is the dearth of air quality data near agricultural fields. Here we empirically estimate the effect of four key pollutants (ozone (O3), particulate matter (PM), sulfur dioxide (SO2), and nitrogen dioxide (NO2)) on maize and soybean yields in the United States using a combination of administrative data and satellite-derived yield estimates. We identify clear negative effects of exposure to O3, PM, and SO2 in both crops, using yields measured in the vicinity of monitoring stations. We also show that while stations measuring NO2 are too sparse to reliably estimate a yield effect, the strong gradient of NO2 concentrations near power plants allows us to more precisely estimate NO2 effects using satellite measured yield gradients. The presence of some powerplants that turn on and others that shut down during the study period are particularly useful for attributing yield gradients to pollution. We estimate that total yield losses from these pollutants averaged roughly 5% for both maize and soybean over the past two decades. While all four pollutants have statistically significant effects, PM and NO2 appear more damaging to crops at current levels than O3 and SO2. Finally, we find that the significant improvement in air quality since 1999 has halved the impact of poor air quality on major crops and contributed to yield increases that represent roughly 20% of overall yield gains over that period.
Export citation and abstract BibTeX RIS

Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
1. Introduction
A growing literature has demonstrated negative effects of poor air quality on agricultural outcomes, using a variety of experimental [1, 2], observational [3, 4], and modeling approaches [5, 6]. These effects are deemed largest in countries such as India and China that have persistent poor air quality [4, 7, 8], although yield losses have also been reported in other regions including Europe and the United States [3, 9].
Significant uncertainties remain regarding both the effect of individual pollutants as well as the overall effect of pollution on agriculture. These uncertainties limit our ability to identify the potential benefits for agriculture from air quality improvements associated with technology or policy changes [10], as well as which pollutants deserve the most scrutiny. Numerous studies have focused on effects of O3, reflecting the well-documented effects in greenhouse experiments as well as the availability of data on O3 levels from air monitoring stations. While other pollutants such as NO2 and SO2 are also known to be phytotoxic [11, 12], they have been subject to less empirical study [13], presumably because of a relative paucity of data. Study of these two compounds is also complicated by the fact that they react chemically in the atmosphere and contribute to formation of both O3 and aerosol particulate matter (PM). PM, which is often monitored because of well-known human health effects, may also impact agricultural productivity through physical and socioeconomic pathways, including dimming and scattering of solar radiation [7] and reduction in farm worker productivity [14].
Some studies have attempted to estimate the net impact of multiple pollutants, either by comparing yields directly to emission sources [8, 15] or by combining estimates from different studies [16]. These analyses indicate a significant role for multiple pollutants, for instance with Fischer [16] estimating 21% yield loss from O3 and 9% from dimming related to PM in India wheat.
Here we consider effects of multiple pollutants on maize and soybean in the central United States, a region that is a major global producer of these crops and where air pollution has declined in recent decades subsequent to the Clean Air Amendments in 1990 (figure S1 (available online at stacks.iop.org/ERL/16/074049/mmedia)). The main objectives of the current study are to empirically estimate the sensitivity of maize and soybean yields to four key pollutants (O3, PM, SO2, and NO2), and to assess the role that pollution reductions have played in yield progress over the past two decades. To do this, we exploit a few useful features of the study region. First, comprehensive public datasets exist for air quality monitors, power plant emissions, and county-level crop yields (figure 1). Second, despite overlap in emissions sources (e.g. vehicles, power plants) we observe relatively low correlation between pollutants at specific locations (table S1), which allows us to estimate the effects of each. Third, recent progress in satellite analysis has resulted in fine-resolution (30 m) yield maps for both maize and soybean for a nine-state region comprising the largest rainfed maize and soybean producing states since 1999 [17, 18] (figure 1). These data enable us to tailor our data sample to cropped areas near monitors (versus, for example, interpolating pollution data) and enable detailed inspection of yields across pollution gradients. In particular, we find that yield gradients near power plants—stationary sources where emissions of SO2 and NO2 are continuously monitored and vary dramatically by feedstock type and unit technology—provide a very useful constraint on the overall sensitivity of yields to pollution.
Figure 1. Overview of data used in this study. Average 1999–2018 (a) maize and (b) soybean yields for individual 30 m × 30 m pixels estimated based on satellite data. (c) Location of power plant facilities and EPA monitoring stations in the study region. Numbers in parentheses indicate the number of unique locations with at least one unit powered by the indicated fuel type. (d) Locations of specific types of pollution monitors. Not all power plant facilities or pollution monitors were active during the entire study period. The total number of unique monitors for each pollutant since 1999 are 387 for Ozone, 391 for PM10, 428 for PM2.5, 290 for SO2, and 118 for NO2.
Download figure:
Standard image High-resolution image2. Data and methods
2.1. Study region and period
The United States is a major producer of both maize and soybean, constituting roughly one-third of global production for each crop [19]. In the current study, we focus on a nine-state region (Illinois, Indiana, Iowa, Michigan, Minnesota, Missouri, Ohio, South Dakota and Wisconsin) that produces roughly two-thirds of national output and for which detailed satellite-based yield estimates exist (see section 2.5). We consider the 1999–2019 period, again based on the availability of satellite-based yield records for both crops.
2.2. Ground-based air pollution data
We obtained hourly O3 and daily PM, NO2, and SO2 measurements for 1999–2019 from https://aqs.epa.gov/aqsweb/airdata/download_files.html. For O3, we computed the accumulated ozone exposure over a threshold (AOT) during the main growing season (June–August for both maize and soybean) for different thresholds (thresh) from 40 to 80 ppb in 10 ppb increments:

where Oh is hourly ozone values, h goes from 0 to H (the total number of hours with measurements in June–August) in increments of one hour. Although a threshold of 40 ppb is commonly used in the literature (i.e. AOT40) [5, 6, 20], some authors have argued for a higher threshold [21], and therefore we examined a range of possible values. Most stations have measurements for all hours, although exceedances generally occur during daytime. A small number of station-years (∼2%) had more than 20% of days missing and were excluded from analysis. The remaining site-years were missing an average of less than one day of measurement during the 92 d June–August period.
For all pollutants other than O3 (PM10, PM2.5, NO2, and SO2) we calculated the average of daily maximum observations for June–August as reported in the daily EPA files. PM2.5 refers to all PM below 2.5 mm in size, whereas PM10 includes particulates below 10 mm. Slight changes in the growing season definition, for instance adding April–May, did not qualitatively affect results.
2.3. Power plant data
We used emissions data from US electric power generation facilities provided in the EPA's Air Markets Program Database (https://ampd.epa.gov/). The AMPD includes most generation facilities in the country, and provides facility location, technology, operation, and emissions information. We aggregated emissions and fuel information to the facility level (one facility might contain many generation units), and coded facilities according to their primary fuel type with 'coal' including any coal or coke use, 'natural gas' including all variants of natural gas, and 'other' encompassing most residual oil, biomass, and diesel fuel facilities.
2.4. Satellite-based pollution data
For evaluating NO2 gradients near power plants and identifying facilities with high gradients, we utilized tropospheric column NO2 data from the Sentinel 5 TROPOMI sensor. TROPOMI NO2 estimates are available at 0.01° × 0.01° (∼1 km) resolution starting in late June 2018. Here we use the July–August 2018 average of daily tropospheric column vertical density from the L3 OFFL product, obtained through Google Earth Engine [22].
2.5. Crop yield data
Annual maize and soybean yields at the county level were obtained from the USDA National Agricultural Statistics Service (https://quickstats.nass.usda.gov/). NASS yield estimates are based on phone surveys of farmers throughout the region, and generally cover all counties producing maize and soybean. The number of counties with data varies slightly by year because of varying survey response rates, with a notable decline in recent years [23].
Yields at 30 m spatial resolution were obtained for each year in the study period from the scalable crop yield mapper (SCYM) [17] applied to Landsat satellite imagery. Briefly, SCYM uses crop simulations to develop regressions relating final maize (or soybean) yield to two measures of canopy greenness (greenness at the peak of season and 30 d after the peak) and four measures of weather relating to temperature, rainfall, and radiation. Multiple Landsat observations are then used to estimate the greenness measures for each pixel classified as maize or soybean in the USDA's Cropland Data Layer [24]. SCYM estimates have been validated for both crops against county and field level data in the region [17, 18]. These studies document strong agreement between SCYM and ground-measured yields, as well as the ability of SCYM to accurately estimate the yield response to management or soil factors at the field level [25]. For the current study, SCYM yields were averaged for a 20 km radius around each EPA station in each year. In addition, a sample of pixel-level estimates were taken for 20 000 random pixels in each state to examine yield variation with distance from power plant locations.
2.6. Weather data
Daily values of minimum temperature, maximum temperature, and precipitation were obtained from the daily PRISM 2.5 arc minutes (∼5 km) resolution weather dataset [26] for each location with yield estimates. Growing season weather conditions were then summarized by calculating four common measures known to explain variation in maize and soybean yields [27]: growing degree days between 8 °C and 30 °C during the growing season (GDD), extreme degree days above 30 °C (EDD), total precipitation between April 1 and August 31 (Pr), and the square of precipitation (Prsq).
In addition, for testing whether EPA monitoring stations (and, by inference, agricultural fields) upwind and downwind of power plants experienced significantly different pollution levels, we obtained daily wind speeds for each power plant facility from the University of Idaho Gridded Surface Meteorological Dataset [28]. Average wind direction over the June–August period was calculated for each facility in each year, and the angle between wind direction and EPA monitor direction was calculated for every monitor that was closest to that facility. Monitors were considered downwind if the cosine of the angle was greater than zero, and upwind if the cosine was less than zero, although results were not sensitive to using stricter definitions of downwind (e.g. cosine greater than 0.5).
2.7. Yield regressions
To relate yields to pollution, we estimated a panel model with site and year fixed effects:

where yieldi,t is yield in site i in year t, W is a vector of controls for weather variation, P is the measure of pollution exposure, γi is a site intercept or fixed effect, and γt is a year intercept. We alternatively define yield as the average of all SCYM estimates within a 20 km radius around the EPA station location, or the reported NASS yield for the county in which the station is located. When using NASS yields, stations located in the same county were assigned identical yields. Weather controls were the aforementioned four weather variables (GDD, EDD, Pr, and Prsq) averaged over the same 20 km radius. Equation (2) was estimated using a fixed effect ordinary least squares regression in the fixest R package [29], with standard errors clustered at the site level. The regression analysis was conducted separately for maize and soybean. We did not attempt to estimate interaction terms between weather and pollution given the limited number of monitoring sites (figure 1); hence the effects of pollutants were assumed to be constant across different weather conditions.
The fixed effects for site and year indicate that each location and year is given its own unique intercept, which is a common practice to control for unobserved factors (e.g. fertilizer use, soil quality) that might both affect the outcome of interest (i.e. yield) and be correlated with spatial or temporal variations in the causal factor of interest (i.e. pollution), leading to bias in the estimated coefficients. By using fixed effects, identification of τW and τP comes only from deviations from the expected value for a given site and year, reducing the chance for bias from omitted variables. In addition, the use of fixed effects helps reduce the influence of collinearity between pollutants, since pollutants are less correlated after accounting for fixed effects (table S1). Figure S3 presents a simple simulation to illustrate the value of fixed effects when two variables are highly correlated across locations. Similarly, removing fixed effects helps to reduce the collinearity between pollutants and weather (table S1).
2.8. Yield gradients near power plants
In addition to the panel analysis that uses yields in the vicinity of monitoring stations, we also examine yield gradients near power plants. As described in the Results, these yield gradients are particularly useful for constraining the effect of NO2, given the relatively weak constraints provided by the panel analysis (owing to a low number of NO2 monitors), and the strong gradients of NO2 near power plants.
We define Pi (d) as the expected value of a pollutant i at distance d from a power plant, estimated from a LOWESS (locally weighted scatterplot smoothing) regression of pollutant values vs distance (figure 3). Similarly, we define Y(d) as the expected value of the yield residual at distance d, where the residual is from a regression of yields against weather and soil. For the yield regression we use the same weather variables as in equation (2), while for soil we use the National Commodity Crop Productivity Index (NCCPI) produced by the USDA [30].
For a given distance, we then define


Which we refer to as the gradients in pollutant and yield levels, respectively, compared to values at 1 km distance from the power plants. We do not divide these differences by the distance, so the units of pollution gradients are in the original units (e.g. ppb) and the units of yield gradients are t ha−1.
We then utilize the estimated coefficients from the panel analysis (equation (2)) to estimate the expected net yield impact of gradients in O3, PM, and SO2 near the power plants:

The effect of NO2 on yields is then estimated as:

Thus, the amount of yield gradient not explained by the other three pollutants, or by local weather and soil gradients, is attributed to the gradient in NO2. To assess the uncertainty of τNO2 estimated in equation (6), we re-estimate the equation 100 times after resampling the τs for O3, PM, and SO2 based on standard errors from equation (2), and resampling estimates of Pi (d) for each pollutant based on resampling of the sites used in the LOWESS fit (figure 3). We calculate equation (6) for distances of 10 km–50 km in 10 km increments, finding that estimates are most stable for 30 km–50 km (see section 3). We use d = 30 km for estimates reported in the main text, as the pollution gradients are less reliable beyond 30 km because of sparse station coverage at these distances (figure S2).
2.9. Estimation of total pollution effects and contribution to yield trends
For each year in the dataset, we multiply the inferred yield sensitivity to pollutants from equations (2) and (6) by the estimated average level of each pollutant at crop locations. The latter is estimated as:

where Pi,c,t is the average level of pollutant i over croplands at time t, Pi,s,t is the average level at EPA station locations in year t, and Si is a scaling factor that relates the average value at croplands to the average value at stations. We utilize scaling factors, rather than the simple average of observed concentrations at monitoring stations, since monitor locations are not representative of agricultural regions (figure S2). These scaling factors were derived in two ways. The first was to calculate a weighted average of the curve describing the dependence of Pi on distance from the nearest power plants (e.g. figure 3(b)) using weights that correspond to the distribution of crop distances to power plants (figure S2(a)). These weighted averages were calculated for each year and then divided by the simple average of station values for that year. Table S3 (in SI appendix) reports the average value of this ratio for each pollutant. In a second approach, we averaged satellite-based measures of pollutants over crop and EPA station locations, and calculate the ratio of these two averages. These satellite-based scaling factors were only possible for pollutants with accurate satellite measures of near surface concentrations, namely PM and NO2, and served mainly to corroborate the estimates derived from the station data (table S3). Main results reported in the paper correspond to the station-based scaling factors.
The total effect of all pollutants in each year was calculated by summing across the individual estimates for the four pollutants. This assumes that the effects of each are additive, and for instance the effect of PM does not depend on the level of O3. Other studies have made similar assumptions [12, 16, 31]. It is possible that effects could be multiplicative, for instance with damage from NO2 exacerbated at high levels of SO2 [32], in which case our total estimates of damage would be conservative.
3. Results
3.1. Crop yields are significantly inhibited by all four pollutants
We first statistically relate crop yields measured near air quality monitors to the levels of pollutants measured at those monitors during the growing season using panel regression models. These regressions provide average estimates, across the study region and period, of the associations between variations in pollutant concentrations and crop yields while controlling for weather, common time variation (through inclusion of linear time trends or more flexible year fixed effects), and time-invariant site-specific drivers of yields such as soil differences (through inclusion of location fixed-effects; see Methods). Because separate monitoring networks exist for each pollutant, their locations do not typically overlap (table S2). For example, less than one-fifth of the O3 observations and less than one-quarter of PM10 observations over the study period were coincident with SO2 observations. We therefore conduct regressions separately by pollutant, with sensitivity tests used to gauge the potential effect of omitted pollution measures. We estimate these relationships using both the high-resolution satellite-based yields (SCYM) and coarser county-level crop yield data from the USDA (NASS) [33].
We find broadly negative effects of O3, PM, and SO2 on both maize and soybean yields (figure 2). For maize, effects are significantly negative (p < 0.05) whether using satellite-based estimates of yield averaged within 20 km of the EPA station or using NASS reported yields for the entire county containing the station. For soybean, effects are significantly negative except for the estimate for PM10 when using NASS yields, and the estimate for SO2 using satellite yields.
Figure 2. The estimated effect of each pollutant on maize and soybean yields from panel regressions. Plot shows panel regression coefficients for each pollutant, expressed as the percentage change in yield from a 1 standard deviation increase in each pollutant. The numbers in parentheses indicate the total number of site-years used in the regression. NASS refers to regressions using county-level yields reported by the USDA NASS, and SCYM refers to regressions using the average of satellite-based 30 m resolution yields within 20 km of the monitoring station. Error bars indicate ±2 cluster-robust standard errors, based on clustering at site and year level. These results correspond to the model using AOT70 to measure ozone exposure and PM10 to measure PM exposure. Yield changes are expressed as percent of mean yields, which for NASS are 9.24 t ha−1 for maize and 2.96 t ha−1 for soy. SCYM yields average 10.39 t ha−1 for maize and 3.31 t ha−1 for soy. Standard deviations of pollutant levels are 1055 ppb-hours for AOT70, 9.7 mg m−2 for PM10, 11.3 ppb for SO2, and 13.2 ppb for NO2.
Download figure:
Standard image High-resolution imageFigure 3. Pollution and yield gradients help to constrain the response of yields to NO2. Top panels show the dependence of average (a) maize yield residuals (from a regression controlling for weather, soil characteristics, and year) and (b) pollutant levels on distance from the closest power plant facility, based on local polynomial regressions (LOWESS). (c) The average pollutant levels at 30 km, expressed as the percentage change relative to average levels at 1 km. Averages are shown both for all facilities and for the subset of facilities with and without active coal units. Error bars show 5%–95% confidence interval based on bootstrap resampling of facilities. (d) Yield gradients (in t/ha) observed near facilities (gray bars) compared to the yield gradients predicted by the panel regressions for the pollution gradients in O3, PM10, and SO2 shown in panel (c). Error bars show 5%–95% confidence interval based on uncertainty in both pollution gradients and yield sensitivities. Yellow bar indicates the residual yield gradient not explained by the three pollutants, which is then used to estimate the NO2 effect by dividing the residual yield gradient by the NO2 gradient.
Download figure:
Standard image High-resolution imageTo assess whether collinearity with other pollutants would significantly alter results, we repeat the regressions above for the subset of site-years with multiple pollution measures (figure S4). As expected, the much lower sample sizes lead to substantially larger error bars. However, we find that inclusion of other pollutants has relatively small effects on the estimated coefficients. This is consistent with the low correlations after removing site and year fixed effects (table S1), which indicates that while levels of pollutants may be correlated when examined in cross-section (e.g. cities versus rural areas), the fluctuations of individual pollutants within a given location are largely uncorrelated. Similarly, a recent study of O3 effects in China found that O3 coefficients were only slightly altered when accounting for PM [34].
The magnitudes of uncertainties in figure 2 reflect differences in the density of monitoring networks for each pollutant. Ozone has been monitored at 387 unique stations in the region, with a median coverage of 15 growing seasons of data for 1999–2018, resulting in over 4000 site-year observations. Roughly half this number was available for PM and SO2, resulting in slightly broader confidence intervals for the estimates. In contrast, fewer than 600 site-year observations were available for NO2, and as a result the panel estimates for effects of this pollutant were much less precise, and not significantly different than zero.
To further constrain the yield effects of NO2, we turn to investigation of yields in the vicinity of power plant facilities. We observe a clear increase in average satellite-based yields as distance increases from the nearest power plant facility (figure 3). The yield increases are most rapid within the first 30 km but continue to exhibit gradual increases for both maize and soybean up to at least 50 km.
Ideally, one would be able to compare yield and pollution gradients for both down- and up-wind locations, as wind is a common source of exogenous variation to aid identification [35]. However, given relatively low wind speeds and variable wind directions in the region, we found very little dependence of pollution or yields on direction (figure S5). Nonetheless, other lines of evidence suggest that these yield gradients near power plants arise primarily from air pollution, rather than other variables correlated with power plant locations such as soil quality or farmer behavior. First and foremost, we observe that the gradients increase significantly after power plants turn on, and decrease significantly after they shut down. That is, using the subset of facilities that either turn on (n = 111) or shut down (n = 44) during the study period, we perform a regression to estimate the effect of a facility turning on or off on the local yield gradient (the average yield at 29 km–31 km distance minus yield at 0 km–2 km distance), using fixed effects for year and location to isolate the effect of whether the facility is active (table 1). We find a significant increase in the yield gradient after facilities turn on, as well as a significant decline in yield gradients after facilities turn off. Second, we observe only small gradients in observed soil or weather variables near powerplants, with yields predicted based on observed weather and soil (described in section 2.8) exhibiting gradients of less than 0.1 t ha−1. While unmeasured aspects of soil quality could be important, we note that the NCCPI variable we use is a common summary of soil quality that captures many physical and chemical aspects of soil believed to influence yield.
Table 1. Coefficients for regression models that estimate the effect of power plant facilities (1) starting or (2) stopping operation during the study period. The dependent variable in both cases is the yield gradient near the facility, defined as the difference in average yield 29 km–31 km away from the facility and 0 km–2 km away. Values in parentheses show the standard error of the estimate, clustered by year. Asterisks indicate that coefficients are significant at p < 0.05.
| (1) Start-up model | (2) Shut-down model | |
| PoststartTRUE | 0.125* (0.060) | |
| PostshutTRUE | −0.339* (0.136) | |
| Fixed-Effects: | — | — |
| Year | Yes | Yes |
| Facility ID | Yes | Yes |
| Observations | 1928 | 1326 |
| R2 | 0.58 | 0.60 |
Moreover, several lines of evidence suggest that among the pollutants leading to these yield gradients, NO2 is likely the most important factor: (a) yield gradients are significantly larger near facilities where gradients in NO2, as measured in recent years by the Sentinel 5 TROPOMI sensor, are larger (figure S6); (b) yield gradients are also higher near facilities with higher overall NOx emissions (figure S6); (c) most importantly, while all pollutants generally decline with distance from power plant, NO2 exhibits a stronger gradient than the others, likely related to its short atmospheric lifetime (figures 3(b) and (c)). The lack of a rapid decline in O3 reflects the complex chemistry of O3 formation and destruction, and is consistent with studies documenting that peaks in O3 often occur far from the primary source of NO2 [13, 36, 37]. Similarly, the gradual PM decline reflects the variety of PM sources and their broad-scale transport outward from facilities. On average, SO2 initially exhibits an increase with distance from power plants (figure 3(b)), a pattern that is driven entirely by facilities using coal as feedstock (as coal contains much more sulfur and creates much larger quantities of SO2 during combustion than natural gas).
By combining our estimates of O3, PM, and SO2 effects from the panel estimates (figure 2) with observed relationships between their concentrations and distance to power plant (figure 3(c)), we estimate the expected yield change associated with these pollutants as one moves away from power plants (figure 3(d)). We then estimate the effect of NO2 based on the residual yield gradient that is not explained by the other pollutants, finding that these gradients provide a much better constraint on NO2 effects than the panel approach (figure S7). While we use the average gradient for all facilities to estimate NO2 effects, we also find notable differences when limiting the analysis to only facilities with (or without) active coal units (figures 3(c)–(d)). On average, larger yield gradients are observed near non-coal facilities. This finding is consistent with the larger declines in PM and SO2 as one moves away from these as compared to facilities with coal units. In both cases, the amount of yield gradient attributed to NO2 (i.e. not explained by the combination of ozone, PM, and SO2) is similar, as is the average NO2 gradient (figure 3(d)).
As an additional check on our primary NO2 estimate we separately perform a cross-sectional regression of yields on satellite-derived NO2 levels in 2018. These regressions, which control for county fixed-effects as well as weather and soil covariates, also find significant negative associations between yields and NO2 (figure S8).
3.2. Total pollution impacts on yields are significant but declining
Using the inferred sensitivity of yield to each pollutant, we find the total yield burden in the region for each year by multiplying the derived coefficients by the average exposure across the cropped region. For our baseline specification, we estimate that the total yield losses from pollutants averaged 5.8% for maize and 3.8% for soybean for the 1999–2019 period in the nine-state region (figure 4). Across all specifications, the median estimate was of slightly smaller magnitude than the baseline for maize (5.0%) and larger magnitude for soybean (6.6%). These losses were considerably larger at the start of the period, and gradually declined as air quality improved. On average, we estimate that the reduction in pollutants caused an increase of 4.0% for maize and 3.0% for soybean over the 1999–2019 period for the baseline specification, with a median of 5.0% increase for maize and 6.2% increase for soybean across all specifications. As a percentage of the overall yield gains for each crop over the time period, these values represent 19% for maize and 23% for soybean. Thus, we estimate that the benefits of cleaner air comprised roughly one-fifth of yield gains since 1999 for both crops.
Figure 4. Estimated impacts of pollution over time. (a) Temporal evolution of different pollution measures, relative to their value in 1999, and the effects of pollutant levels in each year on (b) maize and (c) soybean yields. Solid lines in (b) and (c) indicate the mean estimate across 100 bootstrap estimates that account for uncertainty in yield sensitivities to pollutants as well as pollution gradients near power plants, which are used to estimate NO2 effects. Shaded areas indicate 5%–95% confidence intervals.
Download figure:
Standard image High-resolution image3.3. Results are robust across many plausible model specifications
The results in figure 4 correspond to our baseline specification which uses high resolution satellite-based crop yield estimates, characterizes O3 exposure as cumulative hours spent above 70 ppb and PM exposure as average daily maximum value of PM10, and uses year fixed effects that gives each year its own intercept. These baseline measures for O3 and PM were based on analysis of out-of-sample prediction error across the various measures (figure S9). Nonetheless, we find that while each of these assumptions has some effect on results, the overall impact of pollutants is fairly robust to each of these (figures S10–S13).
For most specifications, we find that O3 and SO2 have smaller impacts in a typical year than PM and NO2, with the relative importance of the latter two varying across models. In particular, effects of PM are sensitive to whether effects are estimated using PM10 or PM2.5, and whether using averaging satellite-based yields in a 20 km radius around the monitoring site or using county level yields from NASS. For maize, PM effects are smaller in magnitude, and sometimes even positive in sign, when using county level data. Results for soybean are more consistent across specifications than for maize, with PM again being the most variable.
The consistency in estimates of overall impacts of pollution despite the uncertainty in PM effects results from the use of yield gradients in the vicinity of power plants to constrain estimates of NO2 effects. Since all pollutant levels are lower at 30 km than 1 km distance from power plants (figure 3), a larger estimated effect for O3, PM, or SO2 implies a smaller effect for NO2. Specifically, in cases where the PM effects from the panel regression were strongly negative, a large fraction of the yield gradient near power plants can be explained by PM, and a smaller effect is thus estimated for NO2. In contrast, specifications that result in small effects of PM result in a much smaller fraction of the yield gradient explained by pollutants other than NO2. Thus, while PM uncertainty imparts significant uncertainty in NO2 estimates, their combined effect is constrained by the observed yield gradient across known gradients of pollution.
A similar story emerges for the trend in pollution impacts over time. Total benefits of pollution reductions are fairly similar across all specifications, with the baseline specification ranking near the middle of the various plausible alternatives. In some specifications, including the baseline one, the biggest benefits are estimated to have accrued from reductions in NO2, whereas in others the biggest benefits are from reductions in PM or O3. O3 reductions are particularly important in specifications that use higher thresholds for O3 exposure (i.e. >60 ppb), because these high exposures have been declining faster than exposure to more moderate levels of O3 (i.e. 40 ppb) or levels of other pollutants such as NO2 (figure 4(a)).
4. Discussion and conclusions
Given the emphasis of much of the literature on O3 impacts [3, 5, 6, 34], it is somewhat surprising that both NO2 and PM emerge as more important than O3 in most of our specifications. We view three factors as potentially important in explaining this apparent discrepancy. First is the relatively low levels of high-O3 exposure for the study period and region compared to much of the empirical literature, which includes Asia [7, 16] as well as earlier decades and more eastern counties in the United States [3]. Second, much of the empirical literature does not adequately control for the effects of weather when estimating O3 sensitivity. Among the four major pollutants studied here, O3 levels are most strongly correlated with the occurrence of extreme heat (table S1), which is known to cause significant yield loss in both maize and soybean. In fact, if we omit extreme heat exposure (days over 30 °C, or EDD) as a control variable, our estimate of O3 sensitivity increases by 56%–157%, depending on the crop and source of yield data. Similar increases were observed if we replace EDD with a cruder measure of average daytime maximum temperature, as done in [3]. Thus, while [3] estimate ∼10% and ∼5% yield losses in the United States circa 2000 for maize and soybean, respectively, whereas we estimate a roughly 3% loss in 2000 (with a range of roughly 0%–6% loss for both crops across different specifications), some of this difference likely results from their incomplete accounting for extreme heat. Although one could argue that some of the effects attributed to extreme heat are in fact from O3, several lines of evidence indicate a strong impact of extreme heat even in the absence of O3: extreme heat has similar effects in recent decades with much lower O3 than prior decades [38]; crop model simulations can reproduce the empirical sensitivity to extreme heat even though they do not include O3 effects [39]; and the model's explanatory power is substantially improved by including extreme heat compared to including only O3.
Third, some empirical studies rely on O3 exposure estimates based on values interpolated between stations. In a sensitivity test, we find that if these interpolations are overly smooth, such as by using inverse distance weighting of all stations [21], estimates of yield sensitivity to O3 are inflated by as much as a factor of 10 (table S4). This suggests that empirical strategies that use high resolution outcome data and samples near monitors, or that leverage new sources of ground station-based monitoring, may be preferable to interpolation. These findings also suggest the need for research to untangle interpolation and aggregation effects in spatial analyses.
We are not aware of other empirical studies that have estimated effects of NO2 levels on crop yields, although our results are consistent with many fumigation or observational studies that document phytotoxic effects at levels found in our study region. For example, a transect study for a wild plant species found significant reductions in plant dry weight above 18 ppb NO2 [11], a level that was frequently exceeded in our region (figure S1). These prior studies also help to elucidate mechanisms, including reduction of photosynthetic rates and increased dark respiration [40, 41], in a way that our empirical approach cannot. Similarly, our study is able to assess impacts in commercial farmers' fields in a way that experimental approaches cannot. Altogether we find consistent evidence that ambient NO2 causes yield loss for both maize and soybean, although the national aggregate impact is currently modest (∼1%), perhaps not surprising given that NO2 is short-lived and declines rapidly as one moves away from major sources.
We conclude that air pollution has had a significant negative impact on both maize and soybean yields in the US Corn Belt. In contrast to prior work, we estimate the effects of multiple pollutants, and in doing so identify a relatively large role for PM and NO2. We also find that, because of considerable progress in air quality, the yield impact of air pollution is only half of what it was two decades ago. The estimated yield gains associated with the reduction in air pollution has been roughly 20% of total observed gains, indicating that yield growth would have been 20% slower without air quality improvements. Since US maize and soy production is currently valued at ∼$100B/year, in very coarse terms (i.e. not accounting for other factors that might have changed in response to reduced US yields in a counterfactual scenario without reduced pollution) these air quality-driven yield gains correspond to ∼$5B/year.
Finally, our results also highlight the considerable power of using satellite-based yield estimates to understand pollution impacts, as the fine spatial resolution of satellite estimates enables us to exploit pollution gradients near power plants in a manner not possible with county-level administrative yield data. These gradients provide a useful constraint on pollution effects, particularly for NO2 given the sparse coverage of ground monitoring stations. In addition, the ability to restrict regressions using satellite-based yields to areas near monitoring stations provides an important validation for county-level results. The value of satellite estimates could be even higher in other countries, which typically have both fewer pollution monitors and less administrative yield data than the United States.
Acknowledgments
We thank Stefania Di Tommaso for help extracting data from Google Earth Engine, Ariel Ortiz-Bobea for providing the scripts to generate specification charts, and Lisa Ainsworth, Marshall Burke, and Tony Fischer for helpful comments. This work was supported by the NASA Harvest Consortium (NASA Applied 787 Sciences Grant No. 80NSSC17K0652, sub-award 54308-Z6059203 to DBL) and NSF #1715557 and NSF/USDA NIFA #1639318.
Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.



