Intraurban NO2 hotspot detection across multiple air quality products

High-resolution air quality data products have the potential to help quantify inequitable environmental exposures over space and across time by enabling the identification of hotspots, or areas that consistently experience elevated pollution levels relative to their surroundings. However, when different high-resolution data products identify different hotspots, the spatial sparsity of ‘gold-standard’ regulatory observations leaves researchers, regulators, and concerned citizens without a means to differentiate signal from noise. This study compares NO2 hotspots detected within the city of Chicago, IL, USA using three distinct high-resolution (1.3 km) air quality products: (1) an interpolated product from Microsoft Research’s Project Eclipse, a dense network of over 100 low-cost sensors; (2) a two-way coupled WRF-CMAQ simulation; and (3) a down-sampled product using TropOMI satellite instrument observations. We use the Getis-Ord Gi* statistic to identify hotspots of NO2 and stratify results into high-, medium-, and low-agreement hotspots, including one consensus hotspot detected in all three datasets. Interrogating medium- and low-agreement hotspots offers insights into dataset discrepancies, such as sensor placement and model physics considerations, data retrieval caveats, and the potential for missing emission inventories. Our work demonstrates that novel air quality products, when treated as complements rather than substitutes, can enable researchers to address discrepancies in data products and can help regulators evaluate confidence in policy-relevant insights.


Introduction
Disparate exposures to ambient air pollution contribute to racial and economic inequities in disease burdens (Hajat et al 2015, Tessum et al 2021). Despite the considerable progress of the U.S. Clean Air Act in contributing to population-wide reductions in air pollution exposure (Currie and Walker 2019), neighborhoods with the highest historic pollution exposure remain subject to relatively higher pollution levels (Colmer et al 2020). Although both regional and intraurban gradients contribute to observed differences in exposures across population subgroups (Chambliss et al 2021), localized hotspots (areas that consistently experience elevated air pollution levels relative to their surroundings) can identify important and often modifiable local emissions sources (Clark et al 2014, Hajat et al 2015, Chambliss et al 2021). Contemporary U.S. zoning policies ensure that land use and emissions sources are clustered in space (Hirt 2015), and local regulations like cumulative impacts ordinances (Lee 2020) or federal policies like JUSTICE40 (The White House 2021) increasingly seek to target areas where marginalized communities experience disproportionate pollution burdens. Yet sparse regulatory monitoring systems, such as the U.S. EPA AQS network, are designed for regional monitoring and cannot provide insights on neighborhood-scale variations in pollutants or exposure. As such, better ways of identifying hotspots are needed to ensure that mitigation efforts serve the most affected communities.
There are many pollutants in the urban air; among these is nitrogen dioxide (NO 2 ), which is implicated in premature mortality (Song et al 2023) and morbidities including asthma (Hansel et al 2008, Liu et al 2019), impaired prenatal development (van den Hooven et al 2012, Huang et al 2015), and cardiovascular disease (Stieb et al 2020).NO 2 also contributes to the formation of other health-hazardous pollutants and can exhibit steep intraurban spatial gradients (Chambliss et al 2021).NO 2 concentrations are highest near emissions sources like power plants (Liu et al 2016) and roadways (Karner et al 2010), with concentrations accumulating during stagnant meteorological conditions (Goldberg et al 2020) or during high-emitting periods like morning rush hour (Zhang and Batterman 2013).Due to the pollutant's short lifetime, NO 2 hotspots can be difficult to observe unless monitoring instruments are collecting data when and where NO 2 is emitted.
To address the need for routine monitoring of NO2 at intraurban scales, researchers have taken a variety of approaches, including using high-density, low-cost sensor networks (Weissert et al 2020, Jain et al 2021), conducting high-resolution chemical transport modeling (Di et al 2019), and oversampling satellite imagery (Goldberg et al 2021, Dressel et al 2022). The resulting data products have the potential to improve the characterization of NO2 in urban environments, but they are also subject to important limitations with respect to spatial and temporal data availability or data validity.
Dense citywide sensing networks provide insights in real time, capturing important diurnal and spatial variation in NO2 concentrations. To enable sampling at the scales needed for adequate coverage, sensors must be affordable and easily replaceable, but ensuring low cost requires important tradeoffs with respect to accuracy (Larkin et al 2017, Morawska et al 2018). Even with dense coverage, gaps remain, and thus researchers must apply interpolation schemes that contribute additional uncertainty (Schneider et al 2017, Gressent et al 2020).
Chemical transport models (CTMs) exploit scientific understandings of chemical and physical processes to provide spatially and temporally fine-scale estimates of air quality, so calibration and validation of these models can lead to new insights, in addition to reducing errors.However, outside of large-scale airborne monitoring campaigns (e.g.Abdi-Oskouei et al 2020, Torres-Vazquez et al 2022), researchers are limited in their ability to adequately validate and calibrate CTMs at fine spatial scales.The most commonly available validation data are spatially-limited regulatory networks or temporally-limited satellite observations (Wong et al 2012, Kuhlmann et al 2015), and thus researchers face challenges when attempting to routinely calibrate CTMs at fine spatiotemporal scales.Reduced complexity models can also produce high spatial resolution air quality products by simplifying or forgoing chemistry and physics, instead leveraging statistical or machine learning relationships between emitting activities and pollutants (Burke et al 2021, Tessum et al 2021, Wang et al 2023).However, these products are likewise challenged to validate their output on the scales at which they resolve pollutants.
Like CTMs, satellite observations can produce air pollution estimates across geographies. To create intraurban estimates of NO2, satellite data products are oversampled to provide high-resolution insights (Goldberg et al 2021, Dressel et al 2022). A limitation of the product is that observations correspond to the entire vertical column, rather than to the 'nose-level' pollution that affects human health. Moreover, existing observing systems provide just one or two daily snapshots, which excludes important diurnal variation (Penn and Holloway 2020), as well as large swaths of data due to cloud cover and other meteorological conditions (Van Geffen et al 2020). However, new geostationary satellites promise to ameliorate some of these issues, indicating these new data sources are likely to play a critical role in diversified intraurban observatories.
In this paper, we apply a hotspot detection algorithm to identify clusters of elevated NO2 pollution in each of three state-of-the-science datasets. We refer to post-processed data as a product, i.e. (1) a machine learning-interpolated product built from a dense, citywide low-cost sensing network, (2) a high-resolution CTM-simulated product, and (3) an oversampled product derived from satellite observations. We compare the location, extent, and temporal persistence of hotspots across data products, and identify intraurban areas of high, medium, and low agreement. We find one high-agreement consensus hotspot and interrogate disagreements between datasets for additional insights. Through this work, we offer a method to amplify signal and better understand noise across multiple novel high-resolution air quality data products.

Study domain
Our study focuses on Chicago, Illinois, which is located on the southwest coast of Lake Michigan in the central United States (figure 1). Our motivation to focus on Chicago is two-fold: (1) Chicago is a large source of NOx emissions in the Great Lakes Region, contributing to elevated NO2 and secondary pollutant concentrations such as O3, both locally and regionally. (2) […]
Given seasonal differences in emissions and meteorology, we analyzed data from one warm month and one cool month (August 2021 and February 2022). Chicago's August 2021 mean temperature was 1.7 °C warmer than average (NWS 2023), and the Eclipse sensing network, which was deployed in July 2021, had largely stabilized, with relatively few devices needing to be relocated or replaced after initial adjustments during the first month of the deployment (Daepp et al 2022). In winter, we selected February 2022 due to its low average temperatures (−0.8 °C relative to the climate normal; NWS 2023) as well as evidence of similar or fewer Eclipse sensors with missing data (n = 89) compared to other cool months (n = 92 for December, 82 for January).

Eclipse low-cost sensors
To create the Eclipse Network product, we obtain ground-level measurements of NO2 from the Microsoft Research Eclipse sensor network. The Eclipse network comprises over 100 low-cost air quality sensors around Chicago, deployed through a collaboration between Microsoft Research and JCDecaux Chicago (the local subsidiary of the world's largest outdoor advertising agency, JCDecaux SA), which maintains over 1000 bus shelters across all geographic sectors of the city. The sensing hardware and network design are described in Daepp et al (2022). Devices were allocated to 80 sites using a stratified random sampling approach following Matte et al (2013) and to 26 additional sites recommended by local and community partner organizations. Because low-cost sensors are subject to error and noise, the network additionally included co-locations with EPA regulatory monitors (Clements et al 2022). We used the ongoing co-location data to develop a calibration algorithm that improved accuracy relative to gold-standard EPA regulatory monitoring data (table A1). To create a 1.3 × 1.3 km gridded, high-resolution daily and monthly product for this study, we tested several machine-learning and geostatistical methods for spatial data interpolation and selected the best-performing approach, a random forest (RF) model (table A2). Further details on the calibration approach and the interpolation methods can be found in appendix A.

TropOMI satellite instrument
To create the TropOMI product, we obtained geospatially continuous NO2 observations from the Tropospheric Monitoring Instrument (TropOMI) aboard the Sentinel-5p satellite. Sentinel-5p is a sun-synchronous, polar-orbiting satellite, so the TropOMI instrument provides daily retrievals of atmospheric species at approximately 13:00 local time with a nadir resolution of 5.5 × 3.5 km for each grid cell. We used the L2 NO2 product, processed by the Royal Netherlands Meteorological Institute (KNMI), which applies the DOMINO algorithm to convert Level-1b irradiance measurements in the 405-465 nm range into NO2 vertical column density (Van Geffen et al 2020). To create a product that is cross-comparable with our other air quality datasets, we used point-based oversampling to regrid the L2 daily observations of NO2 from TropOMI to the 1.3 km × 1.3 km WRF-CMAQ grid, then averaged across the daily retrievals to create a monthly product. Full satellite processing information, a comparison to ground-based observations, and a comparison to the WRF-CMAQ-simulated column are available in appendix C.

Land-use and social characteristics of hotspots
To provide additional context and discussion for the detected hotspots, we analyze land use characteristics and socioeconomic variables that prior research has shown are commonly associated with NO2 pollution (Larkin et al 2017). We assess hotspot relationships with zoning footprints, highway locations, and traffic speeds using data from the Chicago Data Portal (https://data.cityofchicago.org/); greenspace using MODIS normalized difference vegetation index (NDVI) obtained from the Planetary Computer (https://planetarycomputer.microsoft.com/); and census tract-level income and demographic data from the 2016-2020 American Community Survey (Manson et al 2022). We regrid datasets using area-weighted averages (zoning, traffic, and socioeconomic data) or bilinear interpolation (NDVI). To assess the relationship of each characteristic to each hotspot, we compare the distribution of characteristics within hotspots to the Chicago average. Relationships are considered statistically robust at the 95% confidence level (p < 0.05) when their t-test with Bonferroni adjustment for multiple testing yields p_Bonferroni < 0.0167.

Normalization
In addition to the monthly post-processed products described in sections 3.1.1-3.1.3, we normalize the monthly Eclipse, WRF-CMAQ, and TropOMI NO2 products to allow for a direct comparison of spatial patterns (as shown in figure 2). To create normalized products, we compute:
$$x_{\mathrm{norm}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
where x is the raw NO2 concentration in a given grid cell in a dataset, x_min is the minimum concentration across the dataset, and x_max is the maximum concentration across the dataset. Normalized values range from 0 to 1 and are used herein to facilitate descriptions of relative spatial difference.
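The min-max normalization described above can be illustrated with a minimal sketch. The function name, the tiny grid, and the use of `None` for missing cells are assumptions made for illustration; they are not taken from the study's processing code.

```python
def min_max_normalize(grid):
    """Scale a 2-D grid of concentrations to [0, 1]; None marks a missing cell."""
    values = [v for row in grid for v in row if v is not None]
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        raise ValueError("all cells are equal; normalization is undefined")
    return [[None if v is None else (v - x_min) / span for v in row]
            for row in grid]


# Hypothetical monthly-mean NO2 values (ppb) on a 2 x 3 grid.
no2_ppb = [[8.0, 10.0, 12.0],
           [9.0, None, 16.0]]
normalized = min_max_normalize(no2_ppb)
```

Because each product is scaled by its own minimum and maximum, the normalized fields describe relative spatial patterns only and should not be read as comparable absolute concentrations.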

Hotspot detection
We define NO2 hotspots using the Getis-Ord Gi* statistic (hereafter, Gi*), which is used to identify areas where significantly high or low values are spatially clustered (Getis and Ord 1992, Ord and Getis 1995).
The Gi* statistic computes a Z-score for each grid cell and identifies areas where a grid cell and its neighbors' values are significantly higher or lower than would be expected if values were distributed randomly across space. In this study, we focus on areas of high NO2 concentrations and assign hotspot status when the Z-score significance testing exceeds the 95% confidence level (p < 0.05). Because we cross-compare three data products, we alter our significance screen with a Bonferroni adjustment for multiple testing such that p_Bonferroni < 0.0167. As such, the criteria for a grid cell to be classified as a hotspot are that the Gi* Z-score must be positive and the confidence level must exceed the Bonferroni threshold. If the Z-score is negative or the confidence threshold is not met, the grid cell is not classified as a hotspot. We conduct our Gi* analysis using the Python package ESDA (Rey and Anselin 2007). To compute the Gi* statistic for a given grid cell i:
$$G_i^* = \frac{\sum_{j=1}^{n} w_{ij} x_j - \bar{X} \sum_{j=1}^{n} w_{ij}}{S \sqrt{\dfrac{n \sum_{j=1}^{n} w_{ij}^2 - \left(\sum_{j=1}^{n} w_{ij}\right)^2}{n - 1}}}$$
where x_j is the NO2 concentration for grid cell j and n is the total number of grid cells. We calculate the spatial weight between grid cells i and j using the Queen's Contiguity method (w_ij = 1 if two cells are adjacent and 0 otherwise). Moreover,
$$\bar{X} = \frac{\sum_{j=1}^{n} x_j}{n}, \qquad S = \sqrt{\frac{\sum_{j=1}^{n} x_j^2}{n} - \bar{X}^2}$$
We compute the Gi* statistic on each high-resolution NO2 product to identify hotspots for each month considered (figure S1). We then assess agreement across data products (figure S2). We define high-agreement or consensus hotspots as areas whose grid cells meet Gi* hotspot criteria in all three datasets. Medium-agreement hotspots are areas whose grid cells meet hotspot criteria in two of the three products, and low-agreement hotspots meet criteria in just one. We conduct this agreement analysis for each product-pairing and each month (figure 4). As an additional meteorological robustness screen, we test the effect of wind direction on identified hotspots by determining daily average wind direction and then binning days according to quadrant, i.e. northeast, southeast, southwest, and northwest. For each data product, we then average daily NO2 concentrations and apply the Gi* statistic to determine if wind direction substantially influences a hotspot's location or spatial extent (figures S5 and S6).
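The hotspot screen described in this section can be sketched in plain Python under a few stated assumptions. This is not the ESDA implementation used in the study: the sketch assumes binary queen-contiguity weights that include the focal cell (the usual convention for the starred statistic), derives a two-sided p-value from the normal approximation rather than from permutations, and uses a hypothetical 6 x 6 grid. The names `gi_star` and `hotspot_mask` are illustrative.

```python
import math


def gi_star(grid):
    """Gi* Z-scores for a complete 2-D grid (queen neighbors plus the cell itself)."""
    n_rows, n_cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    z = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            # Values with weight 1: the focal cell and its up-to-8 queen neighbors.
            neigh = [grid[rr][cc]
                     for rr in range(max(0, r - 1), min(n_rows, r + 2))
                     for cc in range(max(0, c - 1), min(n_cols, c + 2))]
            w_sum = len(neigh)  # sum of w_ij; equals sum of w_ij^2 for 0/1 weights
            numerator = sum(neigh) - mean * w_sum
            denominator = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
            # Degenerate cases (constant field, or a window covering the grid) get 0.
            z[r][c] = numerator / denominator if denominator > 0 else 0.0
    return z


def hotspot_mask(z_scores, n_products=3, alpha=0.05):
    """Flag cells with a positive Z-score and p below the Bonferroni threshold."""
    threshold = alpha / n_products  # 0.05 / 3, i.e. roughly 0.0167
    return [[zv > 0 and math.erfc(abs(zv) / math.sqrt(2)) < threshold
             for zv in row] for row in z_scores]


# Hypothetical field: 10 ppb background with a 30 ppb block in the top-left corner.
grid = [[30.0 if r < 3 and c < 3 else 10.0 for c in range(6)] for r in range(6)]
z = gi_star(grid)
mask = hotspot_mask(z)
```

In this toy field the interior of the elevated block is flagged, while background cells, whose neighborhood sums fall below the grid mean, have negative Z-scores and are excluded by the sign condition.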

Comparison of high-resolution NO 2 products
To demonstrate the relative abundance of NO 2 concentrations across the city and differences therein, we begin by normalizing each NO 2 data product independently (figure 2).We also compute the weighted area average of the normalized products (µ) to demonstrate the relative difference in concentration magnitudes between products.Each product provides a distinct spatial pattern of normalized NO 2 pollution over Chicago, although pollutant patterns are not necessarily congruous.All data products show relatively high normalized NO 2 on the western edge of the city.However, over the remainder of the city elevated normalized NO 2 concentration footprints do not consistently overlap across the three datasets.Our Eclipse product has the greatest normalized city-wide mean concentration, with distinct areas of elevated normalized NO 2 whose locations differ across months (figures 2(a) and (d)).In contrast, the WRF-CMAQ normalized NO 2 product has lesser month-to-month variation, with elevated concentrations largely coincident with highways and the city center, a.k.a., 'The Loop' (figures 2(b) and (e)).Lastly, the TropOMI product indicates elevated normalized NO 2 concentrations on the west side of the city during both months, but few elevated areas nearer Lake Michigan (figures 2(c) and (f)).
To quantitatively assess dataset similarity, we conduct pairwise grid cell-to-grid cell comparisons and compute Pearson correlation coefficients (r) and mean biases (mb) between the monthly-averaged NO 2 products using the raw data.The data show positive linear relationships across products, with correlation coefficients ranging from 0.1 to 0.7 (figure S1).Eclipse-derived and WRF-CMAQ products are consistently positively correlated (r ⩾ 0.5), though concentration estimates from Eclipse are on average 0.7 ppb higher in February and 4.8 ppb lower in August (figure S1(c)).Consistency between the TropOMI product and other products varies by month (figures S1(a) and (b)).The TropOMI-derived product is strongly positively correlated with the other data products in August (r = 0.6), but in

Identification of hotspots
Given the comparisons above, we conclude that while each high-resolution product shows distinct intraurban variation, concentration estimates have varying levels of consistency at a grid-cell level. To identify high-impact areas of agreement despite dataset differences, we apply the Gi* hotspot identification statistic to each data product (figure S2) and assess levels of agreement between datasets according to overlapping hotspot footprints (figure S3). We primarily focus our results on high- to medium-agreement hotspots (i.e. 3/3 and 2/3 datasets in agreement), but discuss low-agreement hotspots (1/3) where appropriate.
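The agreement tiers (3/3, 2/3, and 1/3 datasets) amount to counting, cell by cell, how many products flag a hotspot. A minimal sketch follows; the three boolean masks are invented for illustration and do not correspond to any figure in this study.

```python
# Agreement label for each possible count of products flagging a cell.
LABELS = {0: "none", 1: "low", 2: "medium", 3: "high"}


def agreement_counts(masks):
    """Count, per grid cell, how many product masks mark the cell as a hotspot."""
    n_rows, n_cols = len(masks[0]), len(masks[0][0])
    return [[sum(bool(m[r][c]) for m in masks) for c in range(n_cols)]
            for r in range(n_rows)]


# Hypothetical hotspot masks on a shared 2 x 3 grid.
eclipse = [[True, True, False], [False, False, False]]
wrf_cmaq = [[True, False, False], [False, True, False]]
tropomi = [[True, True, False], [False, False, False]]

counts = agreement_counts([eclipse, wrf_cmaq, tropomi])
labels = [[LABELS[k] for k in row] for row in counts]
```

The same counting applies to any pair of products by passing two masks instead of three, in which case the maximum count is 2.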

Consensus hotspot
We identify one high-agreement region of the city, a corridor on the west-central margin of the city, in which all three data products indicate statistically significant high NO2 (figure 3(a)). This consensus hotspot is found in both months considered, but the spatial footprint of the August hotspot is twice the area of the February hotspot due to a contraction of the TropOMI February hotspot (figure S2). For each dataset, the average NO2 concentrations that comprise the identified hotspot range from 16% to 31% higher than the Chicago area average in August and 18%-22% higher in February (tables S1 and S2). We find that this consensus hotspot is robust to wind direction, as Gi* criteria are met in all three datasets regardless of wind direction or month of consideration (figures S4 and S5).

Medium-agreement hotspots
While there is high agreement amongst datasets on the consensus hotspot depicted in figure 3(a), the datasets differ on its spatial extent. In the Eclipse and TropOMI datasets, we find an adjoining medium-agreement hotspot that extends further north and west of the consensus hotspot (figure 3(b); yellow grid cells). Given the two underlying data sources, we refer to this area as an observational hotspot. We note that the observational hotspot is only found in August. In the August observational datasets, we observe NO2 concentrations that are 14%-27% higher than the city-wide average (table S1). While NO2 concentrations in this area of our August WRF-CMAQ simulation do not meet Gi* hotspot criteria, they are modestly (4%), but insignificantly, higher than the city average (tables S1 and S2). Due to missing data in the daily TropOMI coverage, we are unable to assess the persistence of this hotspot across wind directions.
A second medium-agreement hotspot is found in February and is identified in both the WRF-CMAQ and Eclipse datasets (figure 3(b), green hatched grid cells).As this area straddles DuSable Lake Shore Drive (LSD) and Interstates-94 and -290 (consult figure 1 for roadway labels), we refer to it as the highway hotspot.Grid cells in the highway hotspot have NO 2 concentrations that are 26% greater than the Chicago average (table S1).The February Highway hotspot meets G i * criteria regardless of wind direction (figures S4 and S5).In addition, we note that in August, WRF-CMAQ simulates an NO 2 hotspot with a similar spatial footprint, with concentrations that are 35% higher than the city-wide average (figure 3(c); purple grid cells).However, G i * hotspot criteria are not met in either observational dataset in August, thus the area only attains low-agreement hotspot status in our analysis.We note that this area includes Eclipse and EPA monitors fortuitously located in close proximity, allowing us to explore potential reasons for this model-observation disagreement below.

Highway hotspot interrogation
To identify factors that contribute to the highway hotspot disagreement between WRF-CMAQ and Eclipse, we compare diurnal WRF-CMAQ and Eclipse variability to observations from an on-road EPA monitor on the hotspot's northern edge (figures 4(a) and (b)). The EPA monitor is located on an elevated stretch of the I-90/94 Kennedy Expressway and is fortuitously located less than 150 m from an Eclipse monitor (figure 4(c)). Hourly time-series data at this location in February 2022 reveal relatively strong agreement between the EPA sensor observations and both WRF-CMAQ (NMB = −1.1%) and Eclipse data (NMB = −8.1%) (table S2). However, product agreement with the EPA sensor is weaker in August 2021. In August, both Eclipse and WRF-CMAQ have higher-magnitude biases compared to February, with Eclipse bias higher and negative (NMB = −32.3%) and WRF-CMAQ bias higher but positive (NMB = 39.6%) (figure 4(a), table S3). We note that WRF-CMAQ's bias is largely attributable to the model's nighttime bias (NMB = 65.1% v. 16.0% during the day; table S3), a finding discussed in Montgomery et al (2023) and Zhao et al (2019) related to low model-simulated nighttime titration due to weak vertical mixing in our chosen planetary boundary layer scheme. In contrast, Eclipse's biases are similar at night (NMB = −30.8%) and during the day (NMB = −34.3%) (table S3). We explore Eclipse's consistent bias below. One potential explanation for the difference in NO2 hotspot classifications at this location is related to the placement of the Eclipse sensor relative to the primary pollution source, i.e. highway traffic.
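The normalized mean bias (NMB) values quoted above are not defined in this section. The sketch below assumes the common definition, NMB = 100% × Σ(M − O) / Σ O, where M is the product estimate and O is the reference observation. The hourly values and the 07:00-18:59 daytime window are hypothetical and chosen only to illustrate a day/night split.

```python
def normalized_mean_bias(estimates, observations):
    """NMB in percent, assuming NMB = 100 * sum(M - O) / sum(O) over paired hours."""
    pairs = [(m, o) for m, o in zip(estimates, observations)
             if m is not None and o is not None]
    return 100.0 * sum(m - o for m, o in pairs) / sum(o for _, o in pairs)


def nmb_by_period(hours, estimates, observations, day_hours=range(7, 19)):
    """Split paired hourly data into day and night (assumed window) and compute NMB."""
    day = [(m, o) for h, m, o in zip(hours, estimates, observations) if h in day_hours]
    night = [(m, o) for h, m, o in zip(hours, estimates, observations) if h not in day_hours]
    return {
        "day": normalized_mean_bias(*zip(*day)),
        "night": normalized_mean_bias(*zip(*night)),
    }


# Hypothetical hourly NO2 (ppb): hour of day, product estimate, reference monitor.
hours = [2, 9, 14, 23]
product = [30.0, 9.0, 27.0, 60.0]
monitor = [20.0, 10.0, 30.0, 40.0]

overall = normalized_mean_bias(product, monitor)
split = nmb_by_period(hours, product, monitor)
```

In this invented example the overall positive bias comes entirely from the nighttime hours, which mirrors the kind of day/night decomposition reported for WRF-CMAQ in table S3.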
Previous work has demonstrated that pollution measurements near highways can be substantially influenced by both the distance of the sensor from the highway and the sensor height (Gilbert et al 2003, Salmond et al 2013). At this location, the EPA sensor is on an elevated highway 7 m above ground level, while the Eclipse sensor is 150 m distant and 2.4 m above ground level on a nearby bus station (figure 4(c)). Given previous reports of sensor distance/height impacts on pollutant concentration measurements, we explore distance/height relationships between highways and Eclipse sensors across our city-wide network. In Chicago, uncovered Class-1 roadways, i.e. highways with heavy-duty vehicle traffic, exist at three elevations: ground level (g.l.), below-grade open cut (a.k.a. recessed, 3 m below g.l.), and elevated (4 m above g.l.) (City of Chicago, 2023). Only one Class-1 highway in the city, DuSable Lake Shore Drive, is mostly (>90% of its length) at ground level; however, heavy-duty vehicle and commercial traffic are restricted on it. We therefore exclude the ground-level highway from this analysis. We first examine the relationship between Eclipse sensors within 2.5 km of Class-1 highways and monthly average NO2 concentrations. Within 2.5 km of Class-1 roads, there are 20 sensors near recessed highways and 15 near elevated highways (figure S6(a)). We find no robust relationships (r < 0.05) between an Eclipse sensor's distance from a highway and its average reported monthly NO2 concentration, regardless of highway elevation or month (figures S6(b) and (c)). Next, we assess the relationship between highway elevation and Eclipse-reported NO2 concentrations. We find that in August, mean NO2 concentrations at Eclipse sensors near elevated highways are 12% lower (−1.2 ppb) than concentrations reported by sensors near recessed highways (figure 3(d)), a difference that is small but statistically robust (p = 0.03, table S4). A similar, although not significant, pattern holds in February (13%, −1.9 ppb, p = 0.19, table S4). We hypothesize that systematic differences in NO2 concentrations reported by Eclipse sensors, such as differences driven by highway elevation, contribute to the high negative Eclipse bias we find at the co-located EPA and Eclipse sensors near the I-90/94 Kennedy Expressway (figure 3(c)), and speculate that this systematic bias may be the reason that Eclipse data near this location do not meet Gi* criteria in August.
Highway elevations are not explicitly modeled in WRF-CMAQ. We note that this model limitation likely contributes to the NO2 concentration differences noted between the Eclipse sensors and WRF-CMAQ simulations. We perform a highway elevation analysis with WRF-CMAQ (figure S7), like the Eclipse analysis reported in figure 3(d). That is, we bin model grid cells that contain the Eclipse sensors assigned to the recessed and elevated highway categories used above. We find small (3%-5%) and insignificant (p > 0.05, table S4) differences in mean NO2 concentrations between the two highway elevations in WRF-CMAQ. Given the lack of differentiated highway elevations in WRF-CMAQ, the lack of simulated NO2 concentration differences near highways is expected. However, this WRF-CMAQ-Eclipse contrast may help explain some of the model-sensor mismatch inherent to dataset comparisons and hyperlocal topographies.

Land-use and social characteristics of hotspots
In the high-agreement consensus hotspot, we find that, compared to the city-wide average, this area is characterized by significantly higher industrial zoning (2.5 times higher), lower greenness (NDVI, −23%), and a significantly higher proportion of Hispanic or Latino residents (two times higher). No other attributes pass our robustness screening. The population of the consensus hotspot consists of 332 000-501 000 people, depending on the month. We do note that the consensus hotspot has a higher population density (+12%), lower income (−9%), and lower public assistance (−5%) per capita, though none of these differ robustly from the city average (tables S1 and S2).
In the medium-agreement observational hotspot, we find no significant relationships with land use characteristics. In contrast with the high-confidence hotspot, the observational hotspot has more residential zoning (+17%), but similar industrial zoning to the city-wide average. Given that emissions data in WRF-CMAQ are partially determined by land-use characteristics, it is perhaps not surprising that NO2 concentrations are not higher in this area in the model output; if the observational evidence reflects a true signal, there may be an additional emissions source not currently captured in the emissions data. Notably, the medium-agreement observational hotspot has a significantly higher Black population relative to the other hotspots and to the Chicago average (i.e. 3.2 times higher). The population of the observational hotspot is approximately 58 000 people. Other non-significant attributes include a high population density (+34%), lower income (−20%) and higher public assistance (−13%) per capita, though not significantly different than the city on average (tables S1 and S2).
In the medium-agreement highway hotspot, the grid cells share just one robust land use characteristic with the consensus hotspot, low NDVI (−25%).The highway hotspot does not contain significant industrial zoning; instead, compared to the city-wide average it has significantly higher population density (+63%), more highway coverage (three times higher), higher commercial zoning (+25%) and lower residential zoning (−50%) (tables S1 and S2).Due to the high population density, this area also has a high population (244 000-1.1 million).Importantly and somewhat obviously, the highway hotspot has significantly more traffic than the city-wide average, with 61% more arterials and 3.4 times the average bus speeds.The income and assistance in this hotspot are significant, as the highway hotspot corresponds to an affluent part of the city (with income per capita nearly three times higher than the city average), though there is also high public assistance in this area (+50% public assistance, tables S1 and S2).

Discussion
Each of our identified hotspots can offer insights for researchers and regulators.To provide additional context and discussion for the detected hotspots, in the discussion we include land use characteristics and socioeconomic variables that prior research has shown are commonly associated with NO 2 pollution (Larkin et al 2017).
First, the high-agreement consensus hotspot identifies a large, contiguous region with high NO2 concentrations relative to the city-wide average. The consensus hotspot affects between 332 000 and 501 000 people (tables S1 and S2). The consensus hotspot is evident across different wind directions, and the affected area has significantly higher industrial zoning and lower greenness, factors commonly associated with higher NOx emissions. Taken together, this evidence suggests that local sources, rather than regional transport, contribute to the elevated NO2 concentrations. Further, the consensus hotspot comprises an area that is majority Hispanic or Latino (54%; tables S1 and S2). Although the estimates from each data product, separately, are subject to concerns regarding potential sources of bias and noise, the consistency of results across data sources, as well as the urgency of environmental justice-related health inequities, suggests that this area should be prioritized for clean air interventions (Camilleri et al 2023, Visa et al 2023).
Second, the medium-agreement observational hotspot may identify an area where model simulations could be improved, whether through emissions inputs or model physics.In the Observational hotspot, land-use characteristics do not indicate a source of high-emissions.However, since both observational datasets support its presence, an emission source may be missing in the underlying emissions data.Previous studies have used observational datasets to constrain NO x emissions (Goldberg et al 2022) and to identify specific NO 2 emission sources (Georgoulias et al 2020, Zhang et al 2023).However, low data coverage from TropOMI could impact the identification of the observational hotspot, particularly in the winter when meteorological conditions are not conducive to TropOMI observations.This data scarcity highlights the utility of upcoming remote sensing technologies like TEMPO (Naeger et al 2021) that will provide higher spatiotemporal coverage, which in turn could better identify hotspots and help constrain NO x emission sources.Beyond emission uncertainties, the choice of model physics and parameterizations in WRF-CMAQ can bias simulated NO 2 concentrations due to poorly simulated meteorological processes and/or challenges associated with urban settings (Pleim et al 2014, Gilliam et al 2015, Montgomery et al 2023).The observational hotspot identifies an intra-urban area wherein the causes of model-observation mismatch should be thoroughly investigated, as to determine whether a NO x emission source is missing or if a modification of model physics better captures the observed build-up of NO 2 .
Since the observational hotspot has a Black population twice the city average (tables S1 and S2) and sits at the northern edge of the high-agreement consensus hotspot, determining the validity of the observational hotspot is an important question with environmental justice and regulatory implications. The affected area has relatively more Black residents, lower incomes, and more residents receiving direct public assistance than the city average or the other identified hotspots, and thus may again indicate an area where excess emissions constitute an environmental justice burden. Given only medium agreement among air quality data products, this area would be well served by additional routine monitoring specifically for NO2 or by a mobile monitoring campaign that could better evaluate the hotspot's 'true' bounds under a variety of ambient conditions (Chambliss et al 2021, Peters et al 2022).
Third, the medium-agreement highway hotspot highlights an area where both the model and Eclipse data indicate an NO2 hotspot associated with significantly higher traffic for one of the two months considered (tables S1 and S2). In August, the Eclipse sensor network does not identify this hotspot. We show that in August, sensors placed below elevated highways report less NO2 than their counterparts near recessed highways (figure 4). While the low-cost sensor network has consistent placement with respect to height and location at ground level, the placement of sensors near high-emitting sources like highways was not standardized, which complicates the model-sensor comparison. This finding has not been discussed in the literature on high-density, intra-urban sensor-model comparisons, and this study highlights the importance of this hyperlocal interface for model-sensor comparison. In February, however, we identify a hotspot in the Eclipse data in this area. We note that in February, differences between elevated and recessed Eclipse NO2 concentrations are less pronounced than in August, potentially related to the lower boundary layer and reduced dilution in the cool season, a phenomenon that is apparent in both Eclipse and CMAQ data (figures 2(d) and (e)).
Whether the observed difference in NO2 between sensors placed near elevated and recessed highways reflects an actual difference in exposure depends on the research question. For regulators seeking to quantify traffic emissions, our findings highlight the importance of placing sensors at the elevation at which emissions occur. For public health researchers and practitioners who are concerned with pollution levels where people breathe, however, ground-level sensing captures a meaningful difference in the adverse effects associated with highway heights and shows that CTM output may need to be adjusted for use in hyperlocal pollutant exposure and health impact quantification. This finding further shows the benefit of comparatively evaluating multiple air quality data products for producing new and valuable insights.
The work reported here is subject to several important limitations. First, we examine just two months of data due to the computationally intensive requirements of running WRF-CMAQ; although we chose these months to be representative of one summer and one winter month based on meteorological conditions, further investigation is needed to evaluate the persistence of observed hotspots over other periods. Our ability to identify hotspots in either observational dataset is further limited by missing data, particularly in the winter. Eclipse sensors, which are solar-powered, reported proportionally fewer days at fewer locations in February compared to August. Likewise, the February TropOMI retrievals had significantly fewer valid grid cells (−60%) than the August retrievals. This highlights the utility of using CTMs to fill gaps in observational networks. However, new geostationary satellites will mitigate some data coverage issues by increasing the number of retrievals per day. As shown in appendix C, the oversampled monthly TropOMI product is more representative of surface and column NO2 than the daily TropOMI product when compared to WRF-CMAQ NO2. Increasing the number of successful retrievals increases the spatial representativeness of the satellite product at finer resolutions, so future technologies are poised to be more successful at analyzing daily hotspots. As such, our hotspot identification research could be used as a foundation for future work preparing for new data sources and the role they could play in diversified intra-urban air quality characterization efforts. Second, the boundaries of the hotspots in each of the different affected areas are sensitive to the interpolation scheme. As shown in appendices B and C, the interpolation of TropOMI and Eclipse to the 1.3 km grid does not integrate additional meteorological or land-use information that influences the spatial heterogeneity of NO2. Future work could further enhance the spatial representativeness of the observational data products by explicitly considering these characteristics (Jain et al 2021, Yu and Liu 2021). Given that the interpolation schemes affect the representation of the data on the grid, changing interpolation schemes would affect the resulting clustering output. Third, the magnitude of the difference between the hotspots and the city overall is small in comparison with the precision and error observed for the sensors; nevertheless, the consistency of the hotspots across different data products bolsters our confidence that these measurements represent true differences in pollution levels. Finally, we examined data for just one city; however, our method would easily generalize to other cities as dense, urban-scale sensing networks continue to proliferate.

Conclusion
In this paper, we construct high-resolution products estimating NO2 pollution across a major U.S. city from three different datasets. Although all three datasets exhibit positive correlations, associations are subject to noise. However, when we apply a hotspot detection algorithm to each of these three products, we identify a region where all three data products show significantly elevated NO2 concentrations in both summer and winter months, suggesting with high confidence the presence of a large contiguous area with elevated NO2 pollution. We estimate that this hotspot affects as many as 501 000 people, who are exposed to NO2 concentrations that are 16%–32% higher than the city-wide average. Moreover, this consensus hotspot is evident regardless of wind direction, suggesting a need to interrogate contributions from local sources.
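As a rough illustration of the hotspot detection step, the sketch below computes the standard Getis-Ord Gi* z-score for every cell of a small synthetic grid. The 3 × 3 binary neighbourhood (which includes the focal cell), the 1.96 cutoff, the toy values, and all function and variable names are illustrative assumptions, not the configuration used in this study.

```python
import math


def getis_ord_gi_star(grid):
    """Return Gi* z-scores for a 2D list of values.

    Assumption: binary weights over each cell's 3x3 neighbourhood,
    including the cell itself (the "star" in Gi*).
    """
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    # Global standard deviation as defined for the Gi* statistic.
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)

    z = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            local_sum = 0.0
            w_sum = 0  # with binary weights, sum(w) equals sum(w^2)
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols:
                        local_sum += grid[rr][cc]
                        w_sum += 1
            denom = s * math.sqrt((n * w_sum - w_sum ** 2) / (n - 1))
            # A zero denominator (constant field, or a neighbourhood that
            # covers the whole grid) carries no local signal.
            z[r][c] = (local_sum - mean * w_sum) / denom if denom > 0 else 0.0
    return z


def hotspot_mask(z_scores, threshold=1.96):
    """Flag cells whose Gi* z-score exceeds an assumed significance cutoff."""
    return [[z > threshold for z in row] for row in z_scores]


# Toy 5x5 field: uniform background with an elevated 2x2 block in one corner.
field = [[1.0] * 5 for _ in range(5)]
for r in (0, 1):
    for c in (0, 1):
        field[r][c] = 5.0

z_scores = getis_ord_gi_star(field)
mask = hotspot_mask(z_scores)
```

In practice, an established spatial statistics library and a correction for multiple testing would likely be preferable to this hand-rolled version.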
We also identify two regions where hotspots are detected in either the model-derived or the remote sensing and ground sensor-derived products. While disagreement across data products limits our confidence in the use of these regions for targeted interventions, further interrogation of product differences suggests clear strategies to improve our confidence in each dataset. Future work could adapt our approach to detect areas of concern systematically and automatically, either with high confidence (indicating the need for targeted intervention) or with lower confidence (indicating priority areas for expanded monitoring and evaluation). Through this work, we show how multiple novel high-resolution data products can act as complementary components of a diversified urban monitoring and modeling framework.
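One way such a systematic workflow might begin is sketched below: given a boolean hotspot mask per product on a shared grid, each cell is labelled by the number of products that flag it, following the 3/3 = high, 2/3 = medium, 1/3 = low convention of figure 3. Treating agreement cell by cell is a simplifying assumption (the study compares overlapping hotspot footprints), and the function name and toy masks are hypothetical.

```python
AGREEMENT_LABELS = {3: "high", 2: "medium", 1: "low"}


def stratify_agreement(eclipse, cmaq, tropomi):
    """Label each grid cell by how many of the three hotspot masks flag it.

    Each argument is a 2D list of booleans on the same grid. Cells flagged
    by no product are labelled None.
    """
    labels = []
    for row_e, row_c, row_t in zip(eclipse, cmaq, tropomi):
        labels.append([
            AGREEMENT_LABELS.get(int(e) + int(c) + int(t))
            for e, c, t in zip(row_e, row_c, row_t)
        ])
    return labels


# Toy 2x2 masks (hypothetical values, not study results).
eclipse_mask = [[True, True], [False, False]]
cmaq_mask = [[True, False], [True, False]]
tropomi_mask = [[True, True], [False, False]]

agreement = stratify_agreement(eclipse_mask, cmaq_mask, tropomi_mask)
```

Cells labelled "high" would correspond to candidates for targeted intervention, while "medium" and "low" cells would mark candidates for expanded monitoring and evaluation.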
for their feedback on early versions of this work. We are also grateful to C. Needham Jr, L Story, D Gehring, G Jancke, T Werner, R Mansour, and S Mudd for Eclipse sensor network design and support. In addition, we are grateful to Chicago's community-based organizations for their insights and expertise, as well as P Banerjee, M Grazioli, G Brussel, N Clochard-Bossuet, P Rehus, and the JCDecaux leadership and maintenance team for their support.
Chicago also has several innovative datasets for neighborhood-scale air pollution research, including a dense, citywide low-cost sensing configuration (Daepp et al 2022) and a high-resolution and recently validated WRF-CMAQ simulation configuration (Montgomery et al 2023).

Figure 1. The city of Chicago, IL with major geographic features such as airports, highways (navy), and neighborhood boundaries (black) delineated. The locations of the NO2 Eclipse sensors used in this study are marked as black circles. In the lower left corner, Chicago is marked as a red star within the map of the United States.

Figure 2. Normalized NO2 concentrations for August 2021 and February 2022 from the (a), (d) interpolated Eclipse sensor network, (b), (e) WRF-CMAQ simulations, and (c), (f) TropOMI satellite observations. In the lower left, we provide the weighted city-wide area average (µ) of normalized NO2 concentrations for each data product.

Figure 3. High-, medium-, and low-agreement NO2 concentration hotspots. Agreement is based on the number of datasets with overlapping footprints that meet Gi* hotspot criteria, i.e. 3/3 = high, 2/3 = medium, and 1/3 = low. Datasets include the Eclipse sensor network, WRF-CMAQ model, and TropOMI satellite. The black dot in panels (b), (c) is the location of the I-90/94 Kennedy Expressway EPA monitor discussed in figure 4.

Figure 4. Highway hotspot discrepancy investigation. (a) August 2021 and (b) February 2022 average diurnal NO2 concentration variations from an Eclipse sensor (blue), an EPA sensor (orange), and the co-located WRF-CMAQ grid cell (green). (c) Precise locations of the EPA and Eclipse sensors relative to the I-90/94 Kennedy Expressway. The Eclipse sensor is at ground level (i.e. mounted on a bus stop enclosure) while the EPA sensor is on an elevated highway 7 m above ground level. In (d) we compare normalized NO2 concentrations from all near-highway (<2.5 km) Eclipse sensors (n = 35), grouped by highway heights, i.e. recessed (n = 20) or elevated (n = 15). The asterisk indicates statistically robust distribution differences while diamonds show outliers. Co-located CMAQ grid cell data are presented in figures S1 and S2.