Tropospheric ozone data assimilation in the NASA GEOS Composition Forecast modeling system (GEOS-CF v2.0) using satellite data for ozone vertical profiles (MLS), total ozone columns (OMI), and thermal infrared radiances (AIRS, IASI)

The NASA Goddard Earth Observing System Composition Forecast system (GEOS-CF) provides global near-real-time analyses and forecasts of atmospheric composition. The current version of GEOS-CF builds on the GEOS general circulation model with Forward Processing assimilation of meteorological data (GEOS-FP) and includes detailed GEOS-Chem tropospheric and stratospheric chemistry. Here we add 3D variational data assimilation in GEOS-CF to assimilate satellite observations of ozone including MLS vertical profiles, OMI total columns, and AIRS and IASI hyperspectral 9.6 μm radiances. We focus our evaluations on the troposphere. We find that the detailed tropospheric chemistry in GEOS-CF significantly improves the simulated background ozone fields relative to previous versions of the GEOS model, allowing for specification of smaller background errors in assimilation and resulting in smaller assimilation increments to correct the simulated ozone. Assimilation increments are largest in the upper troposphere and are consistent between satellite data sets. The OMI and MLS ozone data generally provide more information than the AIRS and IASI radiances except at high latitudes where the radiances provide more information. Comparisons to independent ozonesonde and aircraft (ATom-4) observations for 2018 show significant GEOS-CF improvement from the assimilation, particularly in the extratropical upper troposphere.


Introduction
Ozone is produced in the troposphere by photochemical oxidation of emitted carbon monoxide (CO) and volatile organic compounds (VOCs) in the presence of nitrogen oxides (NOx ≡ NO + NO2). It is an atmospheric pollutant that irritates human respiratory tracts and damages vegetation. Ozone is also a major greenhouse gas and has absorption features in both the ultraviolet/visible (UV/Vis) and the thermal infrared (TIR) that allow for sensing by satellites.
Tropospheric ozone is the most measured of all atmospheric trace gases with thousands of surface network sites around the world, regular sonde and aircraft profiles, multiple observing satellites from space, and a long history of observational records dating back to the 19th century (Schultz et al 2017, Archibald et al 2020, Brönnimann 2022). Yet there are large uncertainties in the interpretation of ozone trends, and consistency between different observational data sets is often unclear (Christiansen et al 2022). Here, we show the benefits of ozone chemical data assimilation (CDA) to improve model ozone forecasts in the next generation of the NASA Goddard Earth Observing System Composition Forecast system, GEOS-CF, and to offer a consistent framework for the interpretation of ozone satellite observations.
Large uncertainties exist in model estimates of tropospheric ozone and its trends. Observations from ground stations, ozonesondes, and satellites indicate that tropospheric ozone has generally increased in recent decades (Ziemke et al 2019, Cooper et al 2020), yet current models underestimate that trend (Christiansen et al 2022, Wang et al 2022). Potential sources driving this model bias include errors in tropical emissions (Zhang et al 2021), NOx-VOC chemistry (Shah et al 2023), stratosphere-troposphere exchange (Neu et al 2014, Lu et al 2019), halogen chemistry (Wang et al 2021), and deposition (Clifton et al 2020). Global continuous observations of tropospheric ozone from satellites in the UV/Vis and the TIR could provide an important resource for testing and improving models (Colombi et al 2021, Mettig et al 2022), but are difficult to interpret and often inconsistent between satellite instruments, including in their reported overall (Schultz et al 2017) and seasonal (Thompson et al 2021) trends. A common framework is needed to combine models with observational data to produce a more realistic representation of the state of tropospheric ozone.
CDA is a tool that uses numerical models' ability to accurately propagate information on relatively short time scales to construct global distributions of chemical species based on assimilated observations (Brasseur and Jacob 2017). Data assimilation has been used for many decades in meteorology to obtain accurate initial conditions for numerical weather forecasts and to construct re-analyses of past weather (Kalnay 2002, Navon 2009). For atmospheric chemistry applications, CDA may be used to optimize 3D fields of atmospheric concentrations (Bocquet et al 2015) and in inverse modeling approaches to optimize emissions (Yumimoto and Takemura 2013, Miyazaki et al 2015, Qu et al 2019). Many studies have used CDA to improve the representation of stratospheric ozone in atmospheric models (Fierli et al 2002, Lahoz et al 2007, Pierce et al 2007, Wargan et al 2020b), with many of these CDA advancements produced by Copernicus and the European Centre for Medium-Range Weather Forecasts (ECMWF; Benedetti et al 2009, Engelen and Bauer 2014, Inness et al 2015, Flemming et al 2017, Peuch et al 2022), and NASA's Global Modeling and Assimilation Office (GMAO; Miyazaki et al 2015, Weir et al 2021, Wargan et al 2023).
GEOS-CF is being developed at NASA to leverage meteorological data assimilation in the GEOS Earth System Model for application to chemical analyses and forecasts (Keller et al 2021). It presently offers global 5-day operational forecasts of atmospheric composition powered by the GEOS-Chem atmospheric chemistry module (www.geos-chem.org) at 25 km resolution (Knowland et al 2022). The current generation of GEOS-CF does not directly assimilate any trace gases: tropospheric ozone observations are not assimilated, and stratospheric ozone is only nudged toward precomputed reanalysis fields. Forecast initial conditions are provided by a 1-day replay simulation ('hindcast') constrained by pre-computed meteorological analysis fields. Parallel activities at GMAO have focused on assimilating stratospheric ozone in the GEOS atmospheric general circulation model (GCM) with satellite observations from the Microwave Limb Sounder (MLS; Waters et al 2006) and Ozone Monitoring Instrument (OMI; Levelt et al 2006, 2018) but with precomputed mean chemical production and loss frequencies (Ziemke et al 2014, Wargan et al 2015). This assimilation does not benefit the model representation of tropospheric ozone, especially in the lower troposphere (Wargan et al 2015, 2017). The use of a simple chemistry parameterization is acceptable in the lower and middle stratosphere because of the long ozone lifetime and the simple dynamics, but not in the troposphere where the ozone lifetime is only a few weeks, chemistry induces fast changes in concentrations, and available satellite observations do not contain sufficient vertical information. To date, no CDA system has been applied to tropospheric ozone in GEOS-CF.
The hyperspectral structure of the 9.6 µm band has proven to be particularly useful for satellite retrievals of tropospheric ozone (Worden et al 2007, Nassar et al 2008). The retrieval requires local vertical profile information on atmospheric temperature and water vapor (Bowman et al 2006). Assimilation of the retrieved ozone concentrations would incur biases if these profiles were different in the GCM than assumed in the retrieval. A solution is direct assimilation of the radiances (Han and McNally 2010, Dragani and McNally 2013, Karpowicz et al 2022). CDA of 9.6 µm hyperspectral radiances has been implemented in the GEOS GCM but again without a full chemistry model (Karpowicz et al 2022). The addition of these radiances has been found to complement measurements from the MLS, OMI, and OMPS-NM instruments, which do not provide a constraint for ozone in the upper troposphere and lower stratosphere (UTLS), while also reducing the model bias against ozonesondes in the Western Pacific (Karpowicz et al 2022). Weather forecasting agencies such as ECMWF have long used hyperspectral radiances to assimilate temperature and water vapor (McNally et al 2006, Collard and McNally 2009), and extension to ozone is straightforward (Han and McNally 2010, Dragani and McNally 2013).
Here, we describe the implementation of a three-dimensional variational data assimilation (3D-Var) system for ozone in GEOS-CF with support from GEOS-Chem tropospheric and stratospheric ozone chemistry. In addition to MLS and OMI ozone, we assimilate 9.6 µm hyperspectral radiances from the AIRS and IASI satellite instruments. We evaluate the results by comparison to independent ozone observations from ozonesondes and from the ATom-4 aircraft campaign (Bourgeois et al 2020). Characterizing the performance of a 3D-Var system for ozone provides a foundation for improving ozone forecasts in future versions of GEOS-CF (v2.0) and represents a step towards developing chemical reanalysis products.

Model description
Experiments were conducted with a modified version of GEOS-CF v1.0 (Keller et al 2021). The system is based on version 10.23.0 of the GEOS GCM, as described in Rienecker et al (2008) with updates from Molod et al (2015) and including Forward Processing (GEOS-FP) meteorological data assimilation increments. The model uses a finite-volume dynamical core with a cubed sphere grid discretization to avoid grid singularities (Putman and Lin 2007). The physics package includes parameterizations for moist processes, radiation, turbulence, land-surface interactions, and gravity wave drag, as described in Keller et al (2021). The moist module includes an updated Grell-Freitas convection scheme (Freitas et al 2018) and a single-moment parameterization for large-scale precipitation and cloud cover (Bacmeister et al 2006). Shallow convection is based on Park and Bretherton (2009), with boundary layer turbulent mixing following Lock et al (2000) and Louis and Geleyn (1982). The model is run here on a cubed-sphere horizontal grid at c90 resolution (1° × 1°) and on 72 GEOS hybrid-eta model layers from the surface to 0.01 hPa, and for a 1-year period from January to December 2018.
GEOS-CF uses the GEOS-Chem chemistry module to simulate coupled aerosol-oxidant chemistry in the troposphere and stratosphere. This module is a stand-alone component of the standard offline version of the GEOS-Chem chemical transport model (CTM) driven by archived GEOS winds (Long et al 2015, Nielsen et al 2017). It contains all the local operations of GEOS-Chem (emissions, chemistry, deposition) decoupled from transport, and is implemented into GEOS as an Earth System Modeling Framework gridded component. In this way, updates to the GEOS-Chem CTM contributed by its user community can be seamlessly passed to GEOS-CF (Hu et al 2018).
Here we use GEOS-Chem version 13.4.0 (https://zenodo.org/record/7254268). Halogen chemistry updates in this version cause an underestimate of tropospheric ozone (Wang et al 2021). We use the CEDS inventory for anthropogenic emissions (Hoesly et al 2018) while all other emission inventories are the same as in Keller et al (2021). Lightning NOx emissions are calculated online following Murray et al (2012) without applying time-dependent redistribution factors (Keller et al 2021). Annual total lightning NOx emissions amount to 5.9 Tg N for 2018, which is in good agreement with other studies and the GEOS-Chem CTM (Keller et al 2021). The absence of redistribution factors leads to an underestimation of simulated lightning flash rates over the Northern Hemisphere extratropics, which may be one of the primary reasons for the model's underestimation of ozone in the Northern Hemisphere upper troposphere.

Ozone assimilation system
Here we introduce assimilation of ozone data into the GEOS-CF system. Temperature, surface pressure, winds, and water vapor are constrained by the MERRA-2 reanalysis using a replay method as described in Orbe et al (2017) and Wargan et al (2020a). Ozone is not constrained by the reanalysis but, instead, is assimilated using a configuration of the GEOS Data Assimilation System (GEOS-DAS) similar to that used in MERRA-2 (Wargan et al 2017) but with ozone background states (see section 2.4) calculated using GEOS-Chem and with the added capability of ingesting 9.6 µm radiance observations as described in section 2.4 below. The GEOS-DAS uses the Gridpoint Statistical Interpolation system (GSI; Purser et al 2003a, 2003b, Todling and El 2018) run in a 3D-Var mode. The system infers three-dimensional constituent mixing ratios on a 0.5° × 0.5° latitude-longitude grid and 72 terrain-following vertical levels every 6 hours. Retrieval and background uncertainties are taken from the observational data products as outlined in the following sections.

Background error covariances for ozone assimilation
Hereafter, the term 'background' refers to the ozone concentrations simulated by GEOS-CF prior to the assimilation step, and to be corrected by the assimilation. In CDA, background and observational errors (uncertainties) provide weighting used to update a short-term (in our case, six-hourly) background state with information from the observations. By including the background errors in the analysis, the assimilation system is able to estimate the true state of ozone concentrations in the atmosphere by balancing the information from the observations with the information from the model predictions. However, if the background errors are overestimated, this can lead to a bias in the analysis, which can propagate through the forecast and lead to inaccurate predictions or large errors. We will present results using different approaches for calculating ozone background error fields. The background error covariances are calculated using the recursive filters algorithm described in Wu et al (2002). The correlation length scales are calculated from 48-h and 24-h forecast differences and applied in the same way as for meteorological fields (Wu et al 2002). The vertical length scales span approximately five model levels (about 1 km in the troposphere). The horizontal length scales are 1200-1500 km in the troposphere and increase to up to 3000 km in the stratosphere (Wargan et al 2015).
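The role of the background and observation errors as weights can be illustrated with the textbook linear analysis update, x_a = x_b + K (y - H x_b) with K = B H^T (H B H^T + R)^-1. The following is a minimal sketch of that standard form only; it is not the GSI implementation, and the function name `analysis_update` and all numbers are hypothetical values chosen for illustration.

```python
import numpy as np


def analysis_update(x_b, B, y, H, R):
    """Textbook linear analysis update: x_a = x_b + K (y - H x_b)."""
    innovation = y - H @ x_b                 # observation minus background
    S = H @ B @ H.T + R                      # innovation covariance
    increment = B @ H.T @ np.linalg.solve(S, innovation)
    return x_b + increment, increment


# Illustrative numbers only (not taken from the experiments in this paper).
x_b = np.array([40.0])          # background ozone, ppb
y = np.array([50.0])            # observed ozone, ppb
H = np.eye(1)                   # observation operator: direct measurement
R = np.array([[4.0 ** 2]])      # observation error variance, ppb^2

B_small = np.array([[8.0 ** 2]])    # background error std of 8 ppb
B_large = np.array([[32.0 ** 2]])   # same std inflated by a factor of 4

x_a_small, inc_small = analysis_update(x_b, B_small, y, H, R)
x_a_large, inc_large = analysis_update(x_b, B_large, y, H, R)
print(x_a_small[0], x_a_large[0])
```

With the smaller background error the analysis moves 80% of the way from the background to the observation; with the inflated error it moves about 98% of the way, so any error in the observation passes almost entirely into the analysis.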
Previous GEOS assimilation studies assumed that the background error standard deviation for ozone is proportional to the forecast ozone mixing ratio at each grid point, such that errors are largest where ozone mixing ratios are the highest (Wargan et al 2015, Karpowicz et al 2022). These studies inflated ozone background errors by a factor of 4 in the troposphere to make up for the low concentrations and the absence of a tropospheric chemistry simulation, unlike in GEOS-CF where ozone chemistry is included. We will show that including chemistry avoids the need to inflate the relative background error in the troposphere.
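A proportional background error with optional tropospheric inflation can be sketched as follows. This is an illustration under stated assumptions rather than the GSI code: the function name, the use of a single tropopause pressure to separate troposphere from stratosphere, and the example profile are all hypothetical. The 20% relative error and the factor of 4 are the values quoted in this paper.

```python
import numpy as np


def background_error_std(ozone_ppb, pressure_hpa, tropopause_hpa,
                         relative_error=0.2, tropospheric_inflation=1.0):
    """Background error std proportional to the forecast ozone mixing ratio.

    Levels with pressure greater than `tropopause_hpa` are treated as
    tropospheric (an assumption of this sketch) and optionally inflated.
    """
    ozone_ppb = np.asarray(ozone_ppb, dtype=float)
    pressure_hpa = np.asarray(pressure_hpa, dtype=float)
    sigma = relative_error * ozone_ppb
    in_troposphere = pressure_hpa > tropopause_hpa
    return np.where(in_troposphere, tropospheric_inflation * sigma, sigma)


# Hypothetical three-level profile: two tropospheric levels, one stratospheric.
ozone = [50.0, 80.0, 500.0]        # ppb
pressure = [800.0, 400.0, 100.0]   # hPa

sigma_uniform = background_error_std(ozone, pressure, tropopause_hpa=200.0)
sigma_inflated = background_error_std(ozone, pressure, tropopause_hpa=200.0,
                                      tropospheric_inflation=4.0)
print(sigma_uniform, sigma_inflated)
```

In this sketch the inflation changes only the tropospheric levels, which is why it increases the weight given to observations in the troposphere relative to the stratosphere.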

Assimilated concentrations and radiances
We assimilate MLS ozone mixing ratio profiles and OMI total column ozone retrievals in combination with AIRS and IASI ozone-sensitive 9.6 µm radiances. MLS provides a strong constraint on stratospheric ozone, and OMI adds column information with indirect information for the troposphere. The radiances are included in this assimilation system to verify if they increase the information content on tropospheric ozone as demonstrated by Karpowicz et al (2022) in the case of parametrized ozone chemistry.
MLS and OMI are onboard NASA's Aura satellite. We use ozone profiles from version 4.2 of the MLS retrieval algorithm (Livesey et al 2020) in which ozone information is derived from 25 spectral channels in a spectral band centered at 240 GHz. We use the MLS observations on 38 vertical layers between 216 hPa and 0.1 hPa following Wargan et al (2017, 2023). The instrument makes day and night observations between 82° S and 82° N along 15 Sun-synchronous orbits per day. OMI provides total column ozone from UV sensing using averaging kernels on 11 levels (1000-500 hPa, 500-250, 250-125, etc) (Levelt et al 2006, 2018). The assimilation methods and configuration we employ are the same as in previous published work (Wargan et al 2015, 2017) but in the context of the GEOS-CF system with chemistry. A more detailed discussion of OMI and MLS observation errors is found in Wargan et al (2015).
We also employ TIR spectra from the AIRS and IASI satellite instruments to assimilate 9.6 µm ozone-sensitive radiances. The AIRS instrument is a grating spectrometer with 2378 channels ranging from 3.7 to 15.4 µm (Aumann et al 2003). A subset of 281 channels is available for GEOS operational systems. GEOS-FP, which provides meteorological increments, uses 115 channels from this subset, selected for temperature and moisture sounding. In this work, four AIRS channels in the 9.6 µm band are assimilated in the CDA system following Karpowicz et al (2022). The IASI instrument is a Fourier Transform Spectrometer with 8461 channels ranging from 3.62 to 15.5 µm (Blumstein et al 2004). Similar to AIRS, a subset of 616 IASI channels is available, of which GEOS-FP uses 134 temperature and moisture sounding channels for meteorological increments. Following Karpowicz et al (2022), seven IASI channels in the ozone-sensitive 9.6 µm band are assimilated in the CDA system. We use the radiance assimilation module within the GSI to provide ozone increments using 9.6 µm radiance observations from AIRS and IASI. Further discussion of channel selection criteria and observation errors is found in Karpowicz et al (2022). Table 1 shows a summary of the satellite information and the average number of assimilated observations per 6-h CDA window used in this work. The observations extend from January through December 2018.

Assimilation simulations
Here we outline the different assimilation configurations that we use in this work for a one-year simulation in 2018, all starting from the same initial conditions. We conduct a free-running GEOS-CF simulation labeled 'Control' without assimilating ozone. In addition, we conduct three simulations that assimilate (1) OMI and MLS ('OMI + MLS'), (2) radiances from AIRS and IASI ('Radiances'), and (3) OMI, MLS, and radiances from AIRS and IASI ('Full Assimilation'). We will also show comparisons to the GEOS-FP ozone reanalysis product which assimilates stratospheric ozone observations from MLS (Knowland et al 2022). GEOS-FP ozone in the troposphere is transported and affected by precomputed chemical production and loss frequencies, which leads to large inaccuracies due to the lack of active chemistry (Wargan et al 2015). To summarize, the following simulations are compared:
1. Control: Free-running simulation with no assimilation of ozone
2. GEOS-FP: Reanalysis product which assimilates MLS stratospheric ozone observations
3. OMI + MLS: Assimilate ozone observations from OMI and MLS
4. Radiances: Assimilate 9.6 µm radiances from AIRS and IASI
5. Full Assimilation: Assimilate ozone observations from OMI and MLS, and 9.6 µm radiances from AIRS and IASI.

Independent evaluation data
We evaluate the GEOS-CF assimilation system's ability to provide realistic representations of ozone concentrations by comparing to independent ozonesonde and aircraft campaign observations. For the one-year period from January 1, 2018 through December 31, 2018, we compare the system's 6-hourly average forecast fields to observations from the World Ozone and Ultraviolet Data Center (WOUDC; www.woudc.org) and the NOAA Earth System Research Laboratory's Global Monitoring Division (ftp://ftp.cmdl.noaa.gov/ozww/Ozonesonde/). In addition, we use ozone data from the NASA ATom-4 field campaign (Bourgeois et al 2020), which sampled the remote troposphere with the DC-8 aircraft over the Pacific Ocean from approximately 200 m to 12 km altitude during spring 2018 (April 26 to May 21). We sample the model along the flight tracks, and the observations are averaged to the model grid and time steps.
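Averaging flight-track observations to a model grid and time step can be sketched as below. This is only an illustration of the general binning idea, not the processing used here: it assumes a regular latitude-longitude grid (the model itself runs on a cubed sphere), omits vertical binning for brevity, and the function name, grid spacing, window length, and sample values are hypothetical.

```python
from collections import defaultdict
import math


def average_to_grid(observations, grid_deg=1.0, window_hours=6.0):
    """Average point observations within each (time window, lat, lon) cell.

    `observations` is an iterable of (hours_since_start, lat, lon, ozone_ppb).
    Returns a dict mapping (window index, lat index, lon index) to mean ozone.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for hours, lat, lon, ozone in observations:
        key = (math.floor(hours / window_hours),
               math.floor((lat + 90.0) / grid_deg),
               math.floor((lon % 360.0) / grid_deg))
        sums[key][0] += ozone
        sums[key][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Hypothetical samples: two fall in the same cell and window, one in the next window.
samples = [
    (1.0, 10.2, 150.3, 40.0),
    (2.0, 10.8, 150.9, 50.0),
    (7.0, 10.5, 150.5, 60.0),
]
gridded = average_to_grid(samples)
print(gridded)
```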

Sensitivity to the choice of background error covariance
The proper choice of background error is a critical element of a data assimilation system. Its definition depends on the underlying modeling system and is thus sensitive to model changes. The GSI ozone background error variance in the troposphere is defined such that the standard deviation of the error is 20% of the background ozone concentration. The default system is designed for the stratosphere and inflates the background error standard deviation in the troposphere by a factor of 4 to account for the lack of adequate tropospheric ozone chemistry in GEOS-FP and MERRA-2. This can lead to unrealistic ozone fields in the troposphere in a GEOS-CF system with full chemistry as errors in the OMI-MLS residual would exert excessive tropospheric corrections. In other words, overestimating the tropospheric background errors relative to the stratosphere in GEOS-CF can result in disproportionate systematic errors and noise being introduced into the troposphere. This is shown in figure 1, where we compare ozonesonde observations for March, April, and May (MAM) 2018 against ozone profiles derived from the Full Assimilation runs with and without tropospheric error inflation. These simulations show that a uniform relative background error in the stratosphere and troposphere is more appropriate for GEOS-CF. Thus, this approach is used for subsequent simulations.

Ozone analysis increments
Assimilated concentrations at pressures less than 216 hPa are strongly constrained by MLS, while attributing increments at pressures greater than 216 hPa is more challenging. Ozone analysis increments below this vertical level are attributable to OMI and/or the 9.6 µm radiances. Here, we investigate these differences. The left column of figure 2 shows the monthly-averaged 6-h ozone increments applied over the course of the 2018 simulation with the Full Assimilation. Corrections are largest at high altitudes, where the observations are most sensitive. The largest increments are at high northern latitudes where GEOS-CF (Keller et al 2021) and GEOS-Chem (Shah et al 2023) have the poorest fit to observations. The upward (positive) correction is particularly large in spring, when recent GEOS-Chem developments for NOx cloud chemistry (Holmes et al 2019) and halogen chemistry (Wang et al 2021) cause substantial ozone underestimates at high northern latitudes. By contrast, the model tends to overestimate ozone in the tropics and summertime high latitudes, leading to downward (negative) corrections. The corrections are nevertheless small compared to previous CDA in GEOS-FP with no chemistry, where the background tropospheric ozone concentrations were considerably biased (Wargan et al 2015, Karpowicz et al 2022). Also shown in figure 2 are the 6-h increments from the 9.6 µm Radiances assimilation, in which we only assimilate radiances from AIRS and IASI. The increments are generally much smaller than in the Full Assimilation, implying that the radiances provide little information relative to OMI + MLS. An exception is the high northern and southern latitudes, where the 9.6 µm radiances contribute most of the increment in the Full Assimilation. This is consistent with the lack of tropospheric ozone information from the OMI column UV measurements at high latitudes, whereas the 9.6 µm band is still sensitive to ozone at those latitudes. Assimilating the 9.6 µm radiances thus shows significant added value at high latitudes.
Figure 3 compares the time-averaged ozone concentrations from the GEOS-FP, Control, and Full Assimilation simulations to ozonesonde observations over 2018. Figure 4 compares the standard deviation of the residual between ozonesonde observations and the GEOS-FP, Control, and Full Assimilation. GEOS-FP (where the treatment of ozone chemistry is simplified and inadequate in the troposphere) generally overestimates ozone throughout the vertical column and particularly in the lower/middle troposphere where the real-world chemistry leads to net ozone loss. The Control captures ozone below 500 hPa in the tropics better than GEOS-FP but tends to substantially underestimate UTLS ozone in the extratropical Northern and Southern Hemispheres. The Full Assimilation generally corrects that bias, and the 9.6 µm radiances play a major role in that correction. The smaller standard deviations of the residuals for the Full Assimilation than for the Control suggest that the Full Assimilation not only reduces biases but also generally captures more of the observed variability. The correction is usually small below 500 hPa because of the relative lack of sensitivity from the observations (figure 2). Information from the assimilation can be transported from the UTLS down to lower altitudes, as seen in figure 3 for southern high latitudes, but in the tropics and Northern Hemisphere the effect of this transport is overwhelmed by active chemistry.
Figure 5 shows the vertical distributions of tropospheric ozone in the Control and Full Assimilation simulations in comparison to the ATom-4 field campaign over the Pacific and Atlantic Oceans. We sample the GEOS-CF model using 6-hourly average forecast fields and map them to the nearest times and locations of each ATom-4 observation. The Control simulation with no assimilation simulates the ATom-4 observations with an R² of 0.39 and a root-mean-square error (RMSE) of 25 ppb. The Full Assimilation simulation has an R² of 0.77 and RMSE of 15 ppb. The improvement is greatest at northern high latitudes, consistent with the ozonesonde results, but there is also significant improvement in the tropical middle troposphere.
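The two summary statistics quoted above can be computed from co-located model and observation values as in the following sketch. It assumes that R² is the squared Pearson correlation coefficient, which the text does not specify, and the function name and sample values are hypothetical, chosen only to show the calculation.

```python
import numpy as np


def evaluate(model, observed):
    """Return (R^2, RMSE) for co-located model and observed values.

    R^2 is taken here as the squared Pearson correlation (an assumption).
    """
    model = np.asarray(model, dtype=float)
    observed = np.asarray(observed, dtype=float)
    rmse = np.sqrt(np.mean((model - observed) ** 2))
    r = np.corrcoef(model, observed)[0, 1]
    return r ** 2, rmse


# Hypothetical co-located ozone values in ppb, for illustration only.
observed_ppb = [30.0, 40.0, 50.0, 60.0]
model_ppb = [32.0, 38.0, 52.0, 58.0]

r_squared, rmse_ppb = evaluate(model_ppb, observed_ppb)
print(r_squared, rmse_ppb)
```

Reporting both statistics is useful because RMSE is sensitive to a mean bias while the correlation is not, so a simulation can improve in one without improving in the other.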

Conclusions
This work described the implementation of a 3D-Var CDA system for tropospheric ozone in the NASA Goddard Earth Observing System Composition Forecast (GEOS-CF) system including GEOS-Chem stratospheric and tropospheric chemistry. The motivation was to improve the capability of GEOS-CF to provide analyses and forecasts of tropospheric ozone. The results here will provide the basis for the next version of the GEOS-CF system (v2.0) which will include such a CDA system. We assimilated satellite observations including MLS vertical profiles of ozone down to 216 hPa, OMI total ozone columns, and AIRS and IASI 9.6 µm radiances. We find that employing GEOS-Chem chemistry in GEOS-CF provides an improved ozone model background field for assimilation as compared to previous GEOS model versions, resulting in smaller assimilation increments in the troposphere. CDA corrections are mostly in the upper troposphere and are generally dominated by information from MLS and OMI, but information from the 9.6 µm radiances can dominate the correction at high latitudes where the OMI UV retrieval has little sensitivity to the troposphere. Comparison to independent ozonesonde and ATom-4 aircraft observations shows that CDA significantly improves the representation of tropospheric ozone in GEOS-CF, with most of that improvement taking place in the upper troposphere.

Data availability statement
GEOS-CF and GEOS-FP model output is centrally stored at the NASA Center for Climate Simulation (NCCS). Public access to these archives is provided by the GMAO at https://gmao.gsfc.nasa.gov/GMAO_products/. Output from the assimilation experiments described in this paper is available upon request from CAK.