Earth as an Exoplanet. III. Using Empirical Thermal Emission Spectra as an Input for Atmospheric Retrieval of an Earth-twin Exoplanet

In this study, we treat Earth as an exoplanet and investigate our home planet by means of a potential future mid-infrared space mission called the Large Interferometer For Exoplanets (LIFE). We combine thermal spectra from an empirical data set of disk-integrated Earth observations with a noise model for LIFE to create mock observations. We apply a state-of-the-art atmospheric retrieval framework to characterize the planet, assess the potential for detecting the known bioindicators, and investigate the impact of viewing geometry and seasonality on the characterization. Our key findings reveal that we are observing a temperate habitable planet with significant abundances of CO2, H2O, O3, and CH4. Seasonal variations in the surface and equilibrium temperature, as well as in the Bond albedo, are detectable. Furthermore, the viewing geometry and the spatially and temporally unresolved nature of our observations only have a minor impact on the characterization. Additionally, Earth’s variable abundance profiles and patchy cloud coverage can bias retrieval results for the atmospheric structure and trace-gas abundances. Lastly, the limited extent of Earth’s seasonal variations in biosignature abundances makes the direct detection of its biosphere through atmospheric seasonality unlikely. Our results suggest that LIFE could correctly identify Earth as a planet where life could thrive, with detectable levels of bioindicators, a temperate climate, and surface conditions allowing liquid surface water. Even if atmospheric seasonality is not easily observed, our study demonstrates that next generation space missions can assess whether nearby temperate terrestrial exoplanets are habitable or even inhabited.

Corresponding authors: Jean-Noël Mettler (jmettler@phys.ethz.ch), Björn S. Konrad (konradb@student.ethz.ch). Both authors contributed equally to this publication.

1. INTRODUCTION
The atmospheric characterization of terrestrial exoplanets in the habitable zone (HZ; Kasting et al. 1993; Kopparapu et al. 2013) and the search for life are key endeavors in exoplanet science (e.g., Astrobiology Strategy and Astro 2020 Decadal Survey in the United States: Hays et al. 2017; National Academies of Sciences, Engineering, and Medicine 2021). Constraining the composition, structure, and dynamics of exoplanet atmospheres yields valuable insights into planetary habitability and could lead to the detection of life beyond our solar system.
Terrestrial HZ exoplanets are detectable with current observatories (see, e.g., Hill et al. 2023, for a catalogue). Exoplanet transit surveys such as the Kepler mission (Borucki et al. 2010) and the Transiting Exoplanet Survey Satellite (TESS; Ricker et al. 2015) as well as current long-term radial velocity (RV) surveys have revealed that HZ planets with Earth-like radii and masses are abundant in the galaxy (e.g., Bryson et al. 2021). Such exoplanets have already been detected within 20 pc of the sun with both the transit (e.g., Berta-Thompson et al. 2015; Gillon et al. 2017; Vanderspek et al. 2019) and the RV (e.g., Anglada-Escudé et al. 2016; Ribas et al. 2016; Zechmeister et al. 2019) methods. Ongoing observations with the James Webb Space Telescope (JWST) are revealing whether terrestrial HZ exoplanets transiting nearby M dwarfs have significant atmospheres (e.g., Koll et al. 2019; Greene et al. 2023; Zieba et al. 2023; Lustig-Yaeger et al. 2023b; Ih et al. 2023; Lincowski et al. 2023; Madhusudhan et al. 2023; Lim et al. 2023). However, performing a detailed atmospheric characterization for such planets with JWST is challenging (e.g., Morley et al. 2017; Krissansen-Totton et al. 2018). Observations with the future 40 m ground-based extremely large telescopes (ELTs) will reach unprecedented spatial resolution and sensitivity. The ELTs will directly detect HZ exoplanets around the nearest stars via their thermal emission (e.g., Quanz et al. 2015; Bowens et al. 2021) and the reflected stellar light (e.g., Kasper et al. 2021). However, none of the current or approved future ground- or space-based instruments is capable of performing an in-depth atmosphere characterization for a statistically meaningful sample (dozens) of such exoplanets.
Therefore, the exoplanet community is working toward more capable observatories. LUVOIR (The LUVOIR Team 2019) and HabEx (Gaudi et al. 2020) were designed to directly detect the stellar light reflected by terrestrial exoplanets at ultraviolet, optical, and near-infrared (UV/O/NIR) wavelengths. Following the evaluation of both concepts in the Astro 2020 Decadal Survey in the United States (National Academies of Sciences, Engineering, and Medicine 2021), the space-based UV/O/NIR Habitable Worlds Observatory (HWO) was recommended. However, the mid-infrared (MIR) thermal emission of exoplanets (and its time variability) also contains a wealth of unique information about the planetary atmosphere and surface conditions (e.g., Des Marais et al. 2002; Hearty et al. 2009; Catling et al. 2018; Schwieterman et al. 2018; Mettler et al. 2020; Mettler et al. 2023). The Large Interferometer For Exoplanets (LIFE), a space-based MIR nulling interferometer concept, aims to directly measure the MIR spectrum of terrestrial HZ exoplanets (Kammerer & Quanz 2018; Quanz et al. 2021, 2022).
One key challenge in exoplanet characterization is the correct interpretation of their spectra. Measured exoplanet spectra are global averages (due to the large exoplanet-observer separation). Hence, local variations in the atmospheric composition, pressure-temperature (P−T) structure, and clouds are unresolved. Further, since signals from terrestrial exoplanets are faint, the temporal and spectral resolution of observations is limited. Such temporally and spatially unresolved observations can lead to degeneracies, making it hard to interpret the observations. Finally, the inference of planetary characteristics from such spectra is model-dependent (e.g., Paradise et al. 2021; Mettler et al. 2023). Without thorough exploration and validation of our characterization methods, it will not be possible to accurately infer the wide range of climate states expected for habitable planets. Currently, in-situ data, which are necessary for the validation of our methods, can only be acquired for solar system objects. While the spectral libraries and the knowledge about the formation, composition, and atmospheric properties of solar system planets and their moons are continuously growing, Earth remains the most extensively studied planet and the sole known globally habitable planet harboring life. Therefore, Earth and its unique characteristics remain the key reference point to study the factors required for habitability and (the origin of) life (e.g., Meadows & Barnes 2018; Robinson & Reinhard 2018).

Disk-Integrated Earth Spectra Characteristics
From space, Earth's appearance is dominated by oceans, deserts, vegetation, ice, and clouds. Earth's surface is dominated by oceans (≈ 70% of surface), and the land-to-ocean ratio differs between the hemispheres (Northern Hemisphere ≈ 2/3, Southern Hemisphere ≈ 1/4; Pidwirny 2006). The contribution of different surface types and climate zones to a disk-integrated Earth spectrum (and its seasonal variability) depends on their thermal properties, their fractional contributions, and positions on the observed hemisphere.
In general, in the thermal emission spectrum of Earth, land-dominated views show not only higher flux readings but also larger flux variations over one full orbit than ocean-dominated views (e.g., Hearty et al. 2009; Gómez-Leal et al. 2012; Mettler et al. 2023). Specifically, from Table 2 in Mettler et al. (2023), we see that at Earth's peak emission wavelength (≈ 10.2 µm) the disk-integrated Northern Hemisphere pole-on view (NP) and the Africa-centered equatorial view (EqA) show annual flux variations of 33% and 22%, respectively. In contrast, the ocean-dominated Southern Hemisphere pole-on view (SP) and the Pacific-centered equatorial view (EqP) show smaller annual variations (≈ 11%) due to the large thermal inertia of oceans.
Another distinctive characteristic of Earth is its patchy cloud cover (see also Appendix A). Earth's patchy cloud coverage is unique among the three terrestrial planets with significant atmospheres in the Solar System (Venus is completely covered in clouds; Mars has negligible cloud coverage). Using nearly a decade of satellite data, King et al. (2013) show that roughly 67% of Earth's surface is covered by clouds at all times. The cloud fraction over land is approximately 55% and shows a distinct seasonal cycle. Over oceans, cloudiness is significantly higher (≈ 72%) and shows smaller seasonal variations. In addition, the cloud fraction is nearly identical during day and night, with only modest diurnal variation. Clouds are particularly abundant in the mid-latitudes (latitudes of ≈ ±60°), and infrequent at latitudes from ±15° to ±30° (often characterized by arid desert conditions). Thus, there are three bands with a high cloud fraction in Earth's atmosphere: a narrow band at the equator and two wider mid-latitude bands.
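The quoted cloud fractions can be cross-checked with a short area-weighted average. The sketch below is only a consistency check: it assumes an ocean share of 0.70 (the approximate value quoted earlier in this section) and treats the rounded percentages as exact; the function and variable names are illustrative.

```python
# Area-weighted consistency check of the quoted cloud fractions.
OCEAN_AREA_FRACTION = 0.70  # approximate ocean share of Earth's surface (assumed exact here)
CLOUD_OVER_LAND = 0.55      # approximate cloud fraction over land
CLOUD_OVER_OCEAN = 0.72     # approximate cloud fraction over oceans


def global_cloud_fraction(ocean_fraction, cloud_land, cloud_ocean):
    """Combine land and ocean cloud fractions, weighted by surface area."""
    land_fraction = 1.0 - ocean_fraction
    return land_fraction * cloud_land + ocean_fraction * cloud_ocean


global_fraction = global_cloud_fraction(
    OCEAN_AREA_FRACTION, CLOUD_OVER_LAND, CLOUD_OVER_OCEAN
)
print(f"Area-weighted global cloud fraction: {global_fraction:.3f}")
```

The weighted mean lands at roughly 0.67, in line with the global value of about 67% cited above.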
Atmospheric clouds can significantly impact both the reflected light and thermal emission spectrum of a planet and can reduce or eliminate spectral features (particularly in the UV/O/NIR; e.g., Des Marais et al. 2002; Lu 2023). Parameters such as cloud fraction, composition, particle size, and altitude as well as multi-layered cloud coverage and cloud seasonality all affect the resulting spectrum significantly (e.g., Des Marais et al. 2002; Tinetti et al. 2006a,b; Hearty et al. 2009; Kitzmann et al. 2011; Rugheimer et al. 2013; Vasquez et al. 2013; Komacek et al. 2020). Konrad et al. (2023) ran retrievals on simulated MIR thermal emission spectra of a Venus-twin exoplanet. They showed that the presence of clouds can be inferred and requires a minimal spectral resolution of 50 and a signal-to-noise ratio of 20. Further, clouds inhibit the accurate retrieval of surface conditions, and inadequate cloud treatment in retrievals (i.e., choosing a cloud model that is too complex or too simple given the quality of the input spectrum) can bias the estimates for important planetary parameters (e.g., planet radius, equilibrium temperature, and Bond albedo). However, despite recent efforts to understand how patchy clouds could alter the spectra of terrestrial exoplanets (e.g., May et al. 2021; Windsor et al. 2023), it remains unclear how they affect the characterization of terrestrial HZ exoplanets through MIR retrievals.

MIR Observables of Habitable and Inhabited Worlds
Habitability refers to the degree to which a global environment can support life, and depends on a myriad of factors (Meadows & Barnes 2018). The characteristics of a planet and its atmosphere, the architecture of the planetary system, the host star, and the galactic environment all affect habitability (for an extensive list, see, e.g., Meadows & Barnes 2018). For exoplanets, which can only be observed via remote sensing, we require observable characteristics to assess their habitability.
Analyzing MIR thermal emission spectra of exoplanets with atmospheric retrievals (see, e.g., Section 3; Madhusudhan 2018) and/or climate models yields constraints on the planet's atmospheric structure and composition. Such constraints yield valuable insights into a planet's habitability and could be used to infer the presence of a biosphere. In the following, we list observable signatures of habitability and biospheres in ascending order of difficulty to observe:
• Planetary energy budget: A planet's effective temperature and Bond albedo can be calculated from its thermal emission spectrum.
• Water and other molecules: Important atmospheric species, such as water (H2O), carbon dioxide (CO2), or ozone (O3), have strong spectral MIR features.
• Atmospheric P−T structure: The P−T structure can be constrained in retrievals and provides vital information about the atmospheric state.
• Surface conditions: If not fully obscured by clouds, thermal emission spectra contain information about a planet's surface temperature and pressure.
• Molecular biosignatures: Important biogenic gases, such as methane (CH4) or nitrous oxide (N2O), have MIR features. The presence of a biosphere can be inferred if abiotic sources can be ruled out.
• Atmospheric seasonality: Seasonal periodicities in molecular abundances that are attributable to life are small for Earth (Mettler et al. 2023) and thus challenging to detect in the MIR.However, they could be strong indicators for biological activity (Olson et al. 2018).
For an in-depth review about evaluating planetary habitability and detectable signs of life, we refer to Schwieterman et al. (2018) and references therein.
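To illustrate the planetary energy budget item in the list above, the following minimal sketch uses the standard energy-balance relation T_eq = T_star * sqrt(R_star / (2 a)) * (1 - A_B)^(1/4) for a planet with full heat redistribution. It is a textbook relation and not necessarily the procedure used in this study. The stellar constants and the Bond albedo of 0.3 are illustrative assumptions, and the function names are hypothetical.

```python
import math

# Illustrative constants (assumed values, not taken from the text).
T_SUN_K = 5772.0       # nominal solar effective temperature in K
R_SUN_M = 6.957e8      # nominal solar radius in m
AU_M = 1.495978707e11  # astronomical unit in m


def equilibrium_temperature(bond_albedo, t_star=T_SUN_K, r_star=R_SUN_M, distance=AU_M):
    """Equilibrium temperature assuming full heat redistribution."""
    return t_star * math.sqrt(r_star / (2.0 * distance)) * (1.0 - bond_albedo) ** 0.25


def bond_albedo_from_temperature(t_eq, t_star=T_SUN_K, r_star=R_SUN_M, distance=AU_M):
    """Invert the energy balance: Bond albedo implied by an equilibrium temperature."""
    t_zero_albedo = equilibrium_temperature(0.0, t_star, r_star, distance)
    return 1.0 - (t_eq / t_zero_albedo) ** 4


t_eq = equilibrium_temperature(0.3)  # assumed Earth-like Bond albedo
recovered_albedo = bond_albedo_from_temperature(t_eq)
print(f"T_eq = {t_eq:.1f} K, recovered A_B = {recovered_albedo:.2f}")
```

For these assumed inputs the equilibrium temperature comes out close to 255 K. The inverse function shows how a temperature estimate from the thermal emission translates into a Bond albedo once the stellar properties and orbital distance are known.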

Context of and Goals for this Study
In a previous study (Mettler et al. 2020), we analyzed 15 years of thermal emission Earth observation data for five spatially resolved locations. We investigated flux levels and variations as a function of wavelength range and surface type (i.e., climate zone and surface thermal properties) and looked for periodic signals. From the spatially resolved single-surface-type measurements, we found that typically strong absorption bands from CO2 (15 µm) and O3 (9.65 µm) are significantly less pronounced and partially absent in polar regions. This implies that abundance estimates for these molecules might not be representative of the bulk abundances in these viewing geometries. Additionally, the time-resolved thermal emission spectrum provided insights into seasons/planetary obliquity, but its significance depended on viewing geometry and spectral band.
In a follow-up study (Mettler et al. 2023), we expanded our analyses from spatially resolved locations to disk-integrated Earth views. We presented an exclusive dataset consisting of 2690 disk-integrated mid-infrared (MIR) thermal emission spectra (3.75 − 15.4 µm, resolution R ≈ 1200). The spectra were derived from remote sensing observations for four different viewing geometries at a high temporal resolution. Using this dataset, we investigated how Earth's MIR spectral appearance changes as a function of viewing geometry, seasons, and phase angles and quantified the atmospheric seasonality of different bioindicators. We found that a representative, disk-integrated thermal emission spectrum of Earth does not exist. Instead, both the thermal emission spectrum and the strength of biosignature absorption features show seasonal variability and depend strongly on viewing geometry.
In this paper, we treat Earth as a directly imaged exoplanet to assess the detectability of its characteristics from MIR observations with LIFE. For the first time, we perform a systematic retrieval analysis of real disk- and time-averaged Earth spectra. We investigate how the retrieval characterization depends on the viewing geometry and the season. Uniquely, in this study we do not only have access to the real Earth spectra, but also to ground truth data from remote sensing satellites. Hence, for the first time, we can compare retrieval results from real spectra to ground truth values. This allows us to evaluate the accuracy of the retrieved constraints and thereby validate our retrieval approach. Despite providing a unique opportunity to validate retrieval frameworks and their underlying assumptions, comparable retrieval studies on solar system observations are rare (e.g., Tinetti et al. 2006a; Robinson & Salvador 2023; Lustig-Yaeger et al. 2023a). However, such studies are indispensable to obtain a correct characterization of terrestrial exoplanets in the future.
In Section 2, we introduce the disk-integrated MIR thermal emission dataset and the level 3 satellite products used to derive the ground truths. We introduce our atmospheric retrieval routine and the atmospheric model used in Section 3. In Sections 4 and 5, we present and discuss our retrieval results. We contextualize these results by discussing implications for characterizing terrestrial HZ exoplanets in Section 6. Finally, in Section 7, we summarize our findings and draw conclusions for future observations.

DATASETS AND METHODOLOGY
In order to compile our ground truth and spectral radiance datasets, we make use of Earth remote sensing climate data obtained from NASA's Atmospheric Infrared Sounder (AIRS; Chahine et al. 2006) aboard the Aqua satellite. For comparison and validation, we have also analyzed data from the Infrared Atmospheric Sounding Interferometer (IASI; Blumstein et al. 2004) instrument aboard the MetOp satellite. The details of the datasets and the data reduction are discussed in Sections 2.2 and 2.3. Although we briefly cover the methodology behind our calculation of disk-averaged spectra and the dataset, we refer to Mettler et al. (2023) for a more comprehensive description.

Using Earth Observation Data to Study Earth as an Exoplanet
While there are several methods to study Earth from afar, such as Earthshine measurements or spacecraft flybys (for a recent review see, e.g., Robinson & Reinhard 2018, and references therein), we chose a remote sensing approach. This approach offers the extensive temporal, spatial, and spectral coverage needed to investigate the effect of observing geometries on disk-integrated thermal emission spectra and time-varying signals. However, Earth-orbiting spacecraft cannot view the full disk of Earth, so the spatially resolved satellite datasets have to be combined into a spatially resolved, global map of Earth, which can then be disk-integrated (e.g., Tinetti et al. 2006a; Hearty et al. 2009; Gómez-Leal et al. 2012). Furthermore, due to the swath geometry of satellites, daily remote sensing data contain gores, which are regions with no data points, between orbit passes near the equator. In the case of Aqua/AIRS these regions are filled within 48 hours as the satellite continues scanning Earth while orbiting it.
For our analysis we defined four specific Earth observing geometries as shown in Figure 1: North (NP) and South Pole (SP), as well as Africa- (EqA) and Pacific-centered (EqP) equatorial views. For each viewing geometry, we mapped, calibrated, and geolocated radiances onto the globe and calculated the disk-integrated MIR thermal emission spectra. The spectra cover the 3.75 − 15.4 µm wavelength range (with a gap between 4.6 − 6.2 µm) at a nominal resolution of R ≈ 1200 and comprise 2378 spectral channels. The radiances originate from an AIRS Infrared (IR) level 1C product (V6.7) called AIRICRAD and are given in physical units of W m^-2 µm^-1 sr^-1 (Manning et al. 2019). The total dataset contains 2690 disk-integrated thermal emission spectra for four consecutive years (2016-2019) at a high temporal resolution for the four full-disk observing geometries (for an overview, see Table 1 in Mettler et al. 2023).
The viewing geometries as portrayed in Figure 1 evolve throughout the year for a distant observer due to Earth's nonzero obliquity. Whereas the equatorial view blends seasons and has a diurnal cycle, the polar views show one season but blend day and night. Over the expected integration time of future direct imaging missions, the spectral appearance and characteristics of a planet change as it rotates around its spin axis and as spatial differences from clear and cloudy regions, contributions from different surface types as well as from different hemispheres vary with time. In accordance with the preliminary minimum LIFE requirements motivated in Konrad et al. (2022), we adopt a typical integration time of 30 days, which is significantly longer than Earth's rotation period. Hence, we average over the EqA and EqP views and denote the resulting dataset EqC.
To capture the largest variability between observations, our analyses focus on observing Earth at its extremes in January and July. This choice is motivated by the measured relative flux change for these months at Earth's peak emission wavelength in the disk-integrated thermal emission signal (Mettler et al. 2023). Although a Pacific-dominated view shows comparable variability to the South Pole view, Earth's rotation causes Africa and the Pacific to rotate in and out of the field of view.

Compiling and Processing the MIR Spectra
The disk-integrated thermal emission spectra for this study are derived from our previously published dataset. Since Earth's MIR spectrum exhibited negligible differences between consecutive years for a fixed viewing geometry (e.g., Mettler et al. 2020; Mettler et al. 2023), we randomly chose the year 2017 and used the data of that year to calculate the monthly averages for January and July for the three viewing geometries: NP, SP, and EqC. For the polar views, blending day and night data to simulate the phase of Earth at its orbital position was unnecessary: owing to Earth's obliquity, these views naturally include data of both types. However, in the case of the EqC view, we blended day and night data to simulate a rotating Earth at quadrature. This orbital position is preferred for the direct imaging of exoplanets due to the large apparent angular separation between the exoplanet and its host star.
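The averaging steps described above (monthly means per viewing geometry, followed by the combination of the EqA and EqP views into EqC) can be sketched as a channel-wise mean. This is a minimal illustration with made-up flux values; the equal weighting of all spectra and of the two equatorial views is an assumption, and the day and night blending is not modeled here.

```python
def mean_spectrum(spectra):
    """Channel-wise mean of spectra sampled on the same wavelength grid."""
    if not spectra:
        raise ValueError("need at least one spectrum")
    n_channels = len(spectra[0])
    if any(len(spectrum) != n_channels for spectrum in spectra):
        raise ValueError("all spectra must share the same channels")
    return [
        sum(spectrum[i] for spectrum in spectra) / len(spectra)
        for i in range(n_channels)
    ]


# Toy disk-integrated spectra (three channels, arbitrary flux units).
eqa_january = [[8.0, 9.0, 7.0], [8.2, 9.4, 7.2]]
eqp_january = [[7.6, 8.6, 6.8], [7.8, 8.8, 7.0]]

eqa_monthly = mean_spectrum(eqa_january)  # monthly mean, Africa-centered view
eqp_monthly = mean_spectrum(eqp_january)  # monthly mean, Pacific-centered view
eqc_monthly = mean_spectrum([eqa_monthly, eqp_monthly])  # combined equatorial view
print(eqc_monthly)
```

Averaging the two equatorial views with equal weights mimics an observation that integrates over many planetary rotations, which is the motivation for the EqC dataset given above.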
AIRS spectra exhibit a gap between 4.6 − 6.2 µm due to dead instrument channels. This gap lies in a H2O absorption feature centered at 6.2 µm (e.g., Catling et al. 2018). Due to concerns that the partially missing H2O feature might deteriorate our retrieval results, we sourced level 1C data for the year 2017 from the IASI instrument aboard the MetOp satellite. We applied the same data reduction steps as for the AIRS dataset described in Section 2.1 and Section 2 of Mettler et al. (2023). Covering the 3.62 − 15.50 µm wavelength regime with 8461 channels, IASI delivers a continuous spectrum comparable to that of AIRS, which makes it a suitable alternative instrument (see Figure 2). However, test retrievals showed no significant discrepancies between the retrieval results obtained for the gapped AIRS and continuous IASI spectra. The lack of discrepancies can be attributed to LIFE's noise level at these lower MIR wavelengths (e.g., Figure 4). Thus, since no significant differences were observed and our ground truth data introduced in Section 2.3 are based on Aqua/AIRS level 3 monthly standard physical retrievals, we opted to use AIRS spectra for this study for consistency.

Compiling and Processing the Ground Truths
In Section 4, we compare the retrieval outputs to a level 3 (L3) satellite product comprising the P−T profile and the trace-gas abundances. Specifically, we have used the Aqua/AIRS L3 Monthly Standard Physical Retrieval (AIRS-only) 1 degree x 1 degree V7.0 (AIRS3STM) product (AIRS Project 2020), from which we extracted the surface temperature (land and sea surface) as well as the P−T profile. From the trace-gas parameters we extracted the total integrated column burdens and vertical profiles (mass mixing ratios) of H2O, CO, CH4, and O3. Both the P−T profile and trace-gas abundances are reported on 24 standard pressure levels ranging from 1000 to 1.0 hPa, which are roughly matched to the instrument's vertical resolution (Tian et al. 2020). The H2O profile is an exception, as it is only provided at twelve layers ranging from 1000 to 100 hPa, spanning from the surface to the tropopause.
Since the AIRS3STM product did not contain any CO2 abundances, we sourced the corresponding ground truth from a gridded monthly CO2 assimilated dataset (see footnote 3) based on observations from the Orbiting Carbon Observatory 2 (OCO-2). The OCO-2 mission provides the highest quality space-based XCO2 retrievals to date, where the level 3 data are produced by ingesting OCO-2 L2 retrievals every 6 hours with GEOS CoDAS, a modeling and data assimilation system maintained by NASA's Global Modeling and Assimilation Office (GMAO; NASA/GSFC/GMAO Carbon Group 2021). The data assimilation (or 'state estimation') technique is employed in order to estimate missing values based on the scientific understanding of Earth's carbon cycle and atmospheric transport. The missing values are mainly the result of the instrument's narrow 10 km ground track and limited ability to penetrate through clouds and dense aerosols.
Following the data reduction of the radiances in Section 2.1, the P−T profile and trace-gas abundances were mapped onto the globe for the different viewing geometries and then disk-integrated at each pressure level. For consistency, we also applied the empirical limb/weighting function to the ground truths. The uncertainties of the retrieved parameters from the AIRS L3 standard product were error propagated, and the resulting error bars are displayed for each data point. The results obtained for July and January are shown in Figure 3 and Appendix B, respectively.
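The disk integration of a gridded quantity at one pressure level can be sketched as a weighted average over the visible hemisphere. The example below is a simplified stand-in and not the weighting used in this study: it assumes a regular latitude-longitude grid and weights each visible cell by its area (proportional to cos(latitude)) times the cosine of its angular distance from the sub-observer point. The empirical limb/weighting function mentioned above is not reproduced, and all names and values are hypothetical.

```python
import math


def disk_average(values, lats_deg, lons_deg, sub_lat_deg, sub_lon_deg):
    """Projected-area-weighted mean of a lat-lon map over the visible hemisphere."""
    lat0 = math.radians(sub_lat_deg)
    lon0 = math.radians(sub_lon_deg)
    weighted_sum = 0.0
    weight_total = 0.0
    for i, lat_deg in enumerate(lats_deg):
        lat = math.radians(lat_deg)
        for j, lon_deg in enumerate(lons_deg):
            lon = math.radians(lon_deg)
            # Cosine of the angle between the cell and the sub-observer point.
            cos_gamma = (math.sin(lat) * math.sin(lat0)
                         + math.cos(lat) * math.cos(lat0) * math.cos(lon - lon0))
            if cos_gamma <= 0.0:
                continue  # cell lies on the far side of the planet
            weight = math.cos(lat) * cos_gamma  # cell area times projection factor
            weighted_sum += weight * values[i][j]
            weight_total += weight
    return weighted_sum / weight_total


# Toy temperature map: 300 K in the north, 250 K in the south (10 degree cells).
lats = [-85.0 + 10.0 * i for i in range(18)]
lons = [5.0 + 10.0 * j for j in range(36)]
temperature_map = [[300.0 if lat > 0 else 250.0 for _ in lons] for lat in lats]

north_pole_mean = disk_average(temperature_map, lats, lons, 90.0, 0.0)
equator_mean = disk_average(temperature_map, lats, lons, 0.0, 0.0)
print(north_pole_mean, equator_mean)
```

In this toy case the pole-on view only sees the warm hemisphere, whereas the equatorial view blends both hemispheres equally. The same routine would be applied level by level to obtain a disk-integrated vertical profile.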

ATMOSPHERIC RETRIEVALS
First, we introduce the disk-integrated Earth spectra and the LIFEsim noise model used as input for our retrievals (Section 3.1). In Section 3.2, we briefly describe our Bayesian atmospheric retrieval routine. Then, in Section 3.3, we focus on the 1D plane-parallel atmosphere model used as retrieval forward model. Last, we motivate our choice of prior distributions (Section 3.4).
Footnote 3: OCO-2 GEOS Level 3 monthly, 0.5x0.625 assimilated CO2 V10r (OCO2_GEOS_L3CO2_MONTH) at GES DISC (NASA/GSFC/GMAO Carbon Group 2021).

Input Spectra for the Retrievals
As input for our atmospheric retrievals, we use reduced-resolution versions of the disk-integrated AIRS spectra from Section 2.1 (NP, SP, and EqC viewing geometries for January and July). All spectra cover the 3.8 − 15.3 µm wavelength range, with a gap between 4.6 µm and 6.2 µm.
Based on the preliminary minimal LIFE requirements presented in Konrad et al. (2022, 2023) and Alei et al. (2022) (R = 50, S/N = 10), we consider two resolution cases (R = 50, 100) and two signal-to-noise ratios (S/N = 10, 20) for each of the six disk-integrated spectra. We define R as λ/∆λ, with the width of a wavelength bin ∆λ and the wavelength at the bin center λ. Further, the S/N value corresponds to the S/N in the 11.2 µm wavelength bin. We choose the 11.2 µm bin because it does not coincide with any strong spectral features. In Figure 4, we show the six R = 50 input spectra together with the two different noise levels.
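The definition R = λ/∆λ fixes the ratio between consecutive bin edges, which gives one simple way to build a constant-resolution wavelength grid. The sketch below is an illustrative construction, not necessarily the binning used for the input spectra: consecutive edges differ by the factor (2R + 1)/(2R − 1), so that each bin center divided by its width equals R, and bins overlapping the 4.6 to 6.2 µm gap are dropped. The function names are hypothetical.

```python
def constant_resolution_bins(lam_min, lam_max, resolution):
    """Return (lower, upper) bin edges with center / width equal to `resolution`."""
    ratio = (2.0 * resolution + 1.0) / (2.0 * resolution - 1.0)
    bins = []
    lower = lam_min
    while lower * ratio <= lam_max:
        upper = lower * ratio
        bins.append((lower, upper))
        lower = upper
    return bins


def drop_gap(bins, gap_min, gap_max):
    """Keep only bins that lie entirely outside the wavelength gap."""
    return [(lo, hi) for lo, hi in bins if hi <= gap_min or lo >= gap_max]


# Wavelength range and gap as quoted in the text (in micrometers).
bins_r50 = drop_gap(constant_resolution_bins(3.8, 15.3, 50), 4.6, 6.2)
bins_r100 = drop_gap(constant_resolution_bins(3.8, 15.3, 100), 4.6, 6.2)
centers_r50 = [0.5 * (lo + hi) for lo, hi in bins_r50]
print(len(bins_r50), len(bins_r100))
```

Doubling R roughly doubles the number of spectral points over the same wavelength range, which is why the R = 100 case constrains narrow features better at a fixed S/N per bin.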
We model the wavelength-dependent S/N expected for LIFE with LIFEsim (Dannert et al. 2022), which accounts for astrophysical noise sources (photon noise of planet emission, stellar leakage, and local- as well as exozodiacal dust emission). To estimate the LIFEsim noise, we put Earth on a 1 AU orbit around a G2V star located 10 pc from the observer. The exozodiacal dust emission of the system was assumed to reach three times the local zodiacal level.
In our retrievals, we interpret the noise as uncertainty on the points of the disk-integrated spectra. Thus, the spectral points correspond to the true flux values and are not randomized according to the LIFEsim S/N. While randomized spectra would provide a more accurate simulated observation, a retrieval study based on a single noise realization will yield biased parameter estimates. Ideally, we would run retrievals for multiple (≳ 10) noise realizations of each spectrum. However, the number of retrievals required makes such a study computationally infeasible. Yet, Konrad et al. (2022) motivate that results from retrievals on unrandomized spectra provide reliable estimates for the average expected retrieval performance on randomized spectra.
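The treatment of noise described above can be sketched in two steps: a wavelength-dependent noise curve is scaled so that the S/N in the 11.2 µm bin matches the target value, and the unrandomized spectrum is then compared to a model with a Gaussian log-likelihood. This is a minimal sketch under stated assumptions: the noise shape below is a made-up placeholder for a LIFEsim output, the Gaussian likelihood with independent bins is assumed, and the function names are hypothetical.

```python
import math


def scale_noise_to_snr(wavelengths, flux, noise_shape, target_snr, reference_wavelength=11.2):
    """Scale a relative noise curve so that flux / noise equals target_snr at the reference bin."""
    reference_index = min(
        range(len(wavelengths)),
        key=lambda i: abs(wavelengths[i] - reference_wavelength),
    )
    scale = flux[reference_index] / (target_snr * noise_shape[reference_index])
    return [scale * value for value in noise_shape]


def gaussian_log_likelihood(model_flux, data_flux, sigma):
    """Log-likelihood for independent Gaussian uncertainties in each bin."""
    total = 0.0
    for model, data, error in zip(model_flux, data_flux, sigma):
        total += -0.5 * ((data - model) / error) ** 2
        total += -0.5 * math.log(2.0 * math.pi * error ** 2)
    return total


# Toy spectrum (arbitrary flux units) and placeholder noise shape.
wavelengths = [8.0, 9.6, 11.2, 13.0, 15.0]
true_flux = [3.0, 2.2, 4.0, 3.4, 1.8]
noise_shape = [1.0, 1.1, 1.3, 1.8, 2.6]

sigma = scale_noise_to_snr(wavelengths, true_flux, noise_shape, target_snr=10.0)

# The data points are the true fluxes (no random scatter is added).
perfect_fit = gaussian_log_likelihood(true_flux, true_flux, sigma)
offset_fit = gaussian_log_likelihood([1.1 * f for f in true_flux], true_flux, sigma)
print(perfect_fit > offset_fit)
```

Because the data points are not scattered, the likelihood peaks exactly at the true spectrum, and the noise only sets the width of the resulting parameter constraints.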

Bayesian Retrieval Routine
For this study, we utilized the Bayesian retrieval routine introduced in Konrad et al. (2022). The initial routine was improved and modified in Alei et al. (2022) and Konrad et al. (2023). We provide a brief summary of the routine here, and refer to the original publications for an in-depth description.
Our retrieval framework uses the radiative transfer code petitRADTRANS (Mollière et al. 2019, 2020; Alei et al. 2022) to calculate the theoretical emission spectrum of a 1D plane-parallel atmosphere model. petitRADTRANS assumes a black-body spectrum at the surface and models the interaction of each atmospheric layer with the radiation to calculate the spectrum at the top of the atmosphere. The model atmosphere is defined via a set of forward model parameters (see Section 3.3 for our forward model). In a retrieval, we search the space spanned by the prior probability distributions (or "priors") of the forward model parameters for the parameter combination that best reproduces the input spectrum. To efficiently search the prior volume, we use the pyMultiNest (Buchner et al. 2014) package, which uses the MultiNest (Feroz et al. 2009) implementation of the Nested Sampling algorithm (Skilling 2006). Here, we ran all retrievals using 700 live points and a sampling efficiency of 0.3.
The retrieval yields the posterior probability distribution (or "posterior") for the model parameters. The posterior estimates how likely a certain combination of model parameter values is given the observed spectrum. Further, our routine estimates the Bayesian evidence Z, which is a measure for how well the used forward model fits the input spectrum and can be used for model comparison (see Appendix C).
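Model comparison via the Bayesian evidence is commonly expressed through the Bayes factor K = Z_A / Z_B of two competing models. The sketch below converts two log-evidences into log10(K); the evidence values are invented for illustration, and the specific comparison criteria used in this study are not reproduced here.

```python
import math


def log10_bayes_factor(ln_z_a, ln_z_b):
    """log10 of the Bayes factor K = Z_A / Z_B from natural-log evidences."""
    return (ln_z_a - ln_z_b) / math.log(10.0)


# Hypothetical log-evidences of two forward models fitted to the same spectrum.
ln_z_model_a = -52.3
ln_z_model_b = -61.5

log10_k = log10_bayes_factor(ln_z_model_a, ln_z_model_b)
print(f"log10(K) = {log10_k:.2f}")
```

A positive log10(K) indicates that the data favor model A over model B. Working with log-evidences avoids numerical underflow, since the evidences themselves are typically extremely small numbers.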

Atmospheric Model in the Retrievals
As in Konrad et al. (2022, 2023) and Alei et al. (2022), we characterize each layer of the model atmosphere by its temperature, pressure, and the opacity sources present. We provide a list of all model parameters in Table 2. A comparison between different forward models to justify our choice is provided in Appendix C.
In our forward model we parameterized the atmospheric P−T profile using a fourth order polynomial:
T(P) = \sum_{i=0}^{4} a_i P^i
Here, P is the pressure, T the corresponding temperature, and the a_i are the parameters of the P−T model. As shown in Konrad et al. (2022), a polynomial P−T model allows us to minimize the number of P−T parameters and thereby minimize the retrieval's computational complexity. Learning-based P−T models require fewer parameters, but their accuracy for terrestrial planets is currently limited by the availability of sufficient training data (e.g., Gebhard et al. 2023).
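A fourth-order polynomial P−T model has five coefficients and can be evaluated efficiently with Horner's scheme. The sketch below is illustrative only: the coefficient values are arbitrary, and the polynomial variable x stands for the pressure coordinate of the model (whether a linear or logarithmic pressure coordinate is used in practice is not specified in this section, so that choice is left to the caller).

```python
def polynomial_temperature(x, coefficients):
    """Evaluate T(x) = a_0 + a_1 x + ... + a_n x^n with Horner's scheme.

    `coefficients` is ordered [a_0, a_1, ..., a_n]; for a fourth-order
    polynomial it holds five entries.
    """
    temperature = 0.0
    for coefficient in reversed(coefficients):
        temperature = temperature * x + coefficient
    return temperature


# Arbitrary example coefficients a_0 ... a_4 (not retrieved values).
example_coefficients = [1.0, 2.0, 3.0, 4.0, 5.0]
value_at_two = polynomial_temperature(2.0, example_coefficients)
print(value_at_two)  # 1 + 4 + 12 + 32 + 80 = 129.0
```

With only five free parameters, such a model keeps the dimensionality of the retrieval low, at the cost of unphysical behavior when the polynomial is extrapolated to layers that the spectrum does not constrain.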
We consider various opacities in our forward model. First, we account for the MIR absorption and emission by CO2, H2O, O3, and CH4 (see Table 3 for line lists, broadening coefficients, and cutoffs). We assume constant vertical abundance profiles for all molecules and discuss potential effects of this simplification in Section 5. Second, we model collision-induced absorption (CIA) and Rayleigh scattering features (CIA-pairs and Rayleigh-species are listed in Table 3).
We neglect scattering and absorption by clouds. The patchy clouds in Earth's atmosphere partially block contributions from high-pressure atmosphere layers and thereby impede the characterization thereof. Konrad et al. (2023) show that neglecting clouds in retrievals can lead to systematic errors in the retrieved surface temperature, surface pressure, and the planet radius. We provide a detailed discussion on potential effects of this simplification in Section 5.
Note (Table 2): The third column lists the priors assumed in the retrievals. We denote a boxcar prior with lower threshold x and upper threshold y as U(x, y); for a Gaussian prior with mean µ and standard deviation σ, we write G(µ, σ).

Prior Distributions
We list the priors assumed for all retrievals in Table 2. The priors on the P−T parameters a_i and the surface pressure P_0 cover a wide range of atmospheric structures (from tenuous Mars-like to thick Venus-like atmospheres). For N2, O2, CO2, H2O, O3, and CH4, we select broad uniform priors that extend significantly below the minimal detectable abundances estimated in Konrad et al. (2022) (≈ 10^-7 in mass fraction for our R and S/N cases).
As in Konrad et al. (2022, 2023) and Alei et al. (2022), we use Gaussian priors for the planet radius R_pl and mass M_pl. The R_pl prior is based on Dannert et al. (2022), who suggest that a planet detection with LIFE yields a constraint on R_pl. The statistical mass-radius relation Forecaster (Chen & Kipping 2016) is then used to infer the prior on log10(M_pl) from the R_pl prior.
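Nested sampling implementations typically draw samples from the unit hypercube and rely on a prior transform to map them onto the physical parameters. The sketch below shows such a transform for a boxcar prior U(x, y) and a Gaussian prior G(µ, σ). The prior ranges used here are hypothetical placeholders and are not the values of Table 2; the function names are illustrative.

```python
from statistics import NormalDist


def uniform_prior(u, lower, upper):
    """Map u in [0, 1] onto the boxcar prior U(lower, upper)."""
    return lower + u * (upper - lower)


def gaussian_prior(u, mean, std):
    """Map u in (0, 1) onto the Gaussian prior G(mean, std) via the inverse CDF."""
    return NormalDist(mean, std).inv_cdf(u)


def prior_transform(cube):
    """Transform a point of the unit square into [log10 abundance, planet radius]."""
    log_abundance = uniform_prior(cube[0], -10.0, 0.0)  # hypothetical U(-10, 0)
    radius = gaussian_prior(cube[1], 1.0, 0.2)          # hypothetical G(1.0, 0.2)
    return [log_abundance, radius]


center = prior_transform([0.5, 0.5])
print(center)
```

The midpoint of the unit square maps onto the center of the boxcar prior and the mean of the Gaussian prior. A sampler explores the unit cube uniformly, so the transform is what encodes the prior knowledge about each parameter.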

RETRIEVAL RESULTS
Here, we present the retrieval results obtained with the forward model from Section 3.3. In Figure 5, we summarize the results from the retrieval on the R = 100, S/N = 20 EqC Jul spectrum, which are representative of all retrieval results. We show the retrieved P−T structure, the posteriors of the atmospheric trace gases and radius R pl, and estimates for the equilibrium temperature T eq and the Bond albedo A B (derived from the posteriors using the method outlined in Appendix D). The N2 and O2 posteriors are not shown since we did not constrain either abundance. We further plot the ground truths for all parameters. The true atmospheric abundances of H2O, O3, and CH4 depend on the atmospheric pressure (see Figure 3). To indicate the range of these ground truth profiles, we plot the ground truths at four different pressures (1 bar, 10^−1 bar, 10^−2 bar, and 10^−3 bar).
We provide the P−T profile results from all other retrievals in Appendix E. The posteriors for all retrievals (excluding the P−T parameters a i) along with T eq and A B estimates are shown in Figure 6. We list the corresponding numeric values in Appendix E.
From the results shown in Figure 5, we would rightly conclude that we are observing a potentially habitable planet. We find temperate surface conditions that would allow for liquid water to exist and easily detect the highly relevant atmospheric gases CO2, H2O, and O3. Importantly, we also detect the potential biosignature CH4. These findings hold for all considered viewing geometries, seasons, R, and S/N. In the following, we address systematic differences between our retrieval results and the ground truths.
From the retrieved P−T profiles in Figure 5 and Appendix E, we see that our retrieved estimates for the surface conditions and the overall atmospheric P−T structure are partly inaccurate. While T 0 is well retrieved (roughly centered on the ground truth, uncertainty ≤ ±10 K), P 0 is underestimated by up to an order of magnitude (uncertainty ≤ ±0.5 dex). This offset affects not only the surface conditions but the entire P−T profile. While the shape of the temperature structure is accurately retrieved, it is shifted to lower pressures relative to the ground truth. This effect is observable for all spectra and becomes smaller for the higher R and S/N cases. Further, constraints on the P−T structure in the upper atmosphere (≲ 10^−3 bar) are weaker, which we expect due to negligible signatures from these layers in MIR emission spectra. The constraints obtained there result from extrapolation of the polynomial P−T model and are thus not physical.
Considering the parameter posteriors in Figure 6, we observe that most parameters are well retrieved (i.e., at least one of the disk-integrated ground truths lies within the 16%−84% percentile range of the posterior). Further, as expected, the posterior constraints get stronger as we consider higher R and S/N spectra, since these spectra contain more information. However, several parameter posteriors are biased relative to the ground truths.
First, R pl is underestimated for all considered R and S/N cases. This bias is strongest for the S/N = 20 results. The retrieved R pl biases are directly linked to the too low T eq and A B estimates, since both parameters are derived from the R pl posterior (see Appendix D).
Second, the aforementioned systematic underestimation of P 0 (and the P−T structure) is accompanied by a systematic overestimation of the trace-gas abundances. This is most apparent for CO2 and CH4, since their ground truths do not vary strongly throughout the atmosphere. The shifts in the retrieved P 0 and P−T structure translate to overestimated CO2 and CH4 abundances. This correlation is caused by a well-known degeneracy between the trace-gas abundances and the pressure-induced line-broadening by the bulk atmosphere (see, e.g., Misra et al. 2014; Schwieterman et al. 2015). This degeneracy also affects the H2O and O3 posteriors. However, due to the strong dependence of their ground truths on the atmospheric pressure, biases are not directly visible (the posteriors lie within the ground-truth range). Yet, lower retrieved P 0 values lead to higher H2O and O3 estimates, implying a degeneracy.
In Appendix F, we provide a detailed analysis of the biases discussed above. We show that if correct estimates of P 0 or R pl are available, the biases on the remaining parameters can be largely eliminated.

Reducing Abundance Biases by Considering Ratios
As discussed above, our estimates for the trace-gas abundances are strongly affected by a degeneracy with P 0 and the P−T structure. Further, we expect the trace-gas posteriors to be impacted by a physical degeneracy with the planet's surface gravity g pl and thus M pl (see, e.g., Mollière et al. 2015; Feng et al. 2018; Madhusudhan 2018; Konrad et al. 2022; Alei et al. 2022; Konrad et al. 2023). If the retrieved abundance posteriors of two different trace gases are affected by these degeneracies in the same way, the biases in our retrieval results can be largely eliminated by considering their pointwise ratio (i.e., dividing one posterior by the other). Despite not providing information on the absolute trace-gas abundances, such ratios are of interest since they can help identify states of atmospheric chemical disequilibrium, which can indicate biological activity (see Section 6.2 for an extended discussion; Lovelock 1965, 1975).
We present the relative abundance posteriors for all trace-gas combinations in Figure 7 (numerical values in Tables E1 to E3). The uncertainties on the relative trace-gas abundances are significantly smaller than on the absolute abundances due to the elimination of the aforementioned M pl degeneracy. Further, in contrast to the absolute abundance posteriors (Figure 6), all ratios lie within the range of the relative ground truths, indicating that the biases induced by the P 0 degeneracy are mostly eliminated.
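The effect of taking pointwise ratios can be illustrated with a small numerical sketch. It assumes, purely for illustration, that a degeneracy shifts the log10 abundances of two gases by the same amount in every posterior sample. The samples below are synthetic and hypothetical (they are not retrieval output), and the offsets and scatter values are arbitrary choices.

```python
import random
import statistics

random.seed(0)

# Synthetic, hypothetical log10 posterior samples (not retrieval output).
# A shared offset mimics a degeneracy that shifts both abundances together.
n = 5000
shared = [random.gauss(0.0, 0.5) for _ in range(n)]
log_ch4 = [-6.0 + s + random.gauss(0.0, 0.1) for s in shared]
log_o3 = [-7.0 + s + random.gauss(0.0, 0.1) for s in shared]

# Pointwise ratio: divide sample by sample, i.e. subtract in log10 space.
log_ratio = [c - o for c, o in zip(log_ch4, log_o3)]

spread_ch4 = statistics.stdev(log_ch4)
spread_ratio = statistics.stdev(log_ratio)
print(f"CH4 spread:   {spread_ch4:.2f} dex")
print(f"ratio spread: {spread_ratio:.2f} dex")
```

Because the shared offset cancels sample by sample, the spread of the ratio is set only by the independent scatter of the two gases. The cancellation is only as good as the assumption that both gases respond to the degeneracy in the same way.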

DISCUSSION OF RETRIEVAL RESULTS
The present study is, to our knowledge, the first to systematically run retrievals on real disk- and time-averaged MIR Earth spectra. By comparing our retrieval results to known ground truths, we can draw robust conclusions about the characterization performance of LIFE for Earth-like exoplanets. Further, by comparing our findings with other studies, we can identify potential causes for the biases discussed in Section 4.

Comparing the LIFE Performance to Previous Studies
Previous studies have evaluated how well LIFE could characterize terrestrial HZ exoplanets. Konrad et al. (2022) find preliminary estimates for LIFE's minimal R and S/N requirements by running retrievals on simulated Earth spectra. Alei et al. (2022) run retrievals on simulated spectra from Rugheimer & Kaltenegger (2018), which represent different stages in Earth's temporal evolution. Konrad et al. (2023) investigate how well LIFE can characterize the atmosphere and clouds of Venus. All of these studies analyze spectra that were calculated with simplified, temporally constant, 1D atmosphere models. In contrast, we run retrievals on real disk- and time-averaged MIR Earth spectra. Despite large differences in the complexity of the considered spectra between our study and the previous ones, we confirm the previous findings for the detectability of different trace gases. Crucially, CH4, the main driver for the minimum LIFE requirements from Konrad et al. (2022) (R = 50, S/N = 10), remains detectable here. Further, the strength of the parameter constraints we retrieve here is equivalent to that of the prior studies, which demonstrates their robustness. The retrieved 1σ parameter uncertainties in all studies are < ±0.5 dex for pressures, < ±0.1 R ⊕ for radii, < ±20 K for temperatures, and < ±1.0 dex for trace-gas abundances.

Main Source for Radius Bias
In Section 4, we state that our R pl estimates underestimate Earth's true radius. This leads to biased estimates for T eq and A B, which are calculated from R pl (see Appendix D).
We mainly attribute the underestimation of R pl to neglecting Earth's patchy cloud coverage in our forward model (see Section 3.3). Clouds reduce the total MIR emission at the top of the atmosphere by partially absorbing the thermal emission from the warm, high-pressure atmosphere layers below them. By using a first-order approximation (see Appendix G for details), we can demonstrate that the magnitude of the bias on our R pl estimate can be fully attributed to the missing cloud treatment in our forward model.
Further evidence for links between clouds and biased R pl estimates is provided by other thermal emission retrieval studies. Konrad et al. (2022), who run retrievals on simulated cloud-free Earth spectra, retrieve bias-free R pl estimates. In contrast, Alei et al. (2022) run cloud-free retrievals on simulated cloudy Earth spectra and also underestimate R pl.

Main Source for Pressure Bias
As stated in Section 4, the retrieved P 0 and P−T structure are offset to lower pressures relative to the ground truth. These biases are linked to offsets in the retrieved trace-gas abundances, which are degenerate with the pressure-induced line-broadening (see, e.g., Misra et al. 2014; Schwieterman et al. 2015).
We attribute these biases to our assumption of vertically constant trace-gas abundances in our forward model (see Section 3.3). This claim is motivated by comparison with previous LIFE retrieval studies. Konrad et al. (2022) assume constant abundance profiles both to generate their 1D Earth spectra and in their forward model, and retrieve unbiased P 0, P−T, and abundance estimates. In contrast, Alei et al. (2022) assume constant abundance profiles to run retrievals on 1D Earth spectra from Rugheimer & Kaltenegger (2018), which were generated using non-constant abundance profiles. Their results show offsets in P 0, the P−T structure, and the trace-gas abundances, which are comparable in magnitude to our offsets.
Further, we argue that H2O is the cause of the observed biases. First, in contrast to CO2, O3, and CH4, H2O has multiple strong absorption features in Earth's MIR spectrum (see, e.g., Figure 3 in Konrad et al. 2022). Second, the ground truths in Figure 3 show that the variances for H2O are more than two orders of magnitude greater than for the other species. Third, the main H2O variance occurs in the lowermost atmosphere layers, where H2O condensation occurs. These layers contribute most strongly to Earth's MIR thermal emission.

Implications for Retrievals on Exoplanet Spectra
In the present study, ground truth measurements of Earth's atmosphere have allowed us to validate our results. We found important biases in the posteriors, which we attribute to simplifying assumptions made by our forward model. A proposed remedy is to derive quantities that are less affected, such as abundance ratios (see Section 4.1). Also, in a future study, we aim to reduce biases by adding a parametrization for patchy clouds and a vertically non-constant H2O profile (motivated by H2O condensation) to our forward model.
Independent of the success of this future effort, intercomparison efforts (e.g., Barstow et al. 2020) have shown that retrieval results also strongly depend on framework specificities (e.g., parameter estimation algorithms, radiative transfer implementations, and line lists). To ensure the correct characterization of exoplanets, robust and bias-free retrieval frameworks are required. Thus, community efforts, such as the CUISINES Working Group, that benchmark, compare, and validate different frameworks on real and simulated spectra with known ground truths are indispensable.

IMPLICATIONS FOR CHARACTERIZING TERRESTRIAL HZ EXOPLANETS

Effects of Viewing Geometries and Seasons
As described in Section 1.1, Earth exhibits an uneven distribution of land and ocean regions. Further, different surface types have different spectral and thermal characteristics (e.g., Hearty et al. 2009; Gómez-Leal et al. 2012; Madden & Kaltenegger 2020). Also, the distribution of life on Earth is non-uniform with a measurable gradient in the abundance and diversity of life, both spatially (e.g., from deserts to rain forests) and temporally (e.g., from seasonal to geological timescales) (Méndez et al. 2021). In Mettler et al. (2023), we find that a representative, disk-integrated thermal emission spectrum of Earth does not exist. Instead, the MIR spectrum and the strength of the absorption features show seasonal variations and depend on the viewing geometry. For future observations of HZ terrestrial exoplanets, the viewing geometry will be unknown. Thus, we must understand how the viewing geometry impacts exoplanet characterization, observable habitability markers, and signatures of life.
As we see from Figure 6, most parameter posteriors show no significant dependence on either the viewing geometry or the season (exceptions: T 0, T eq, and A B). For the R and S/N levels studied here, both the retrieved R pl and the trace-gas abundance estimates show no measurable variations with the exoplanet's orientation relative to the observer. Thus, we conclude that their characterization depends on neither the viewing geometry nor the season for an Earth-like exoplanet.
For T 0 and T eq, the variations in the posteriors are largest for the NP view, where the differences between January and July are robustly detected in all R and S/N scenarios. For the SP and EqC views, variations in T 0 and T eq between January and July are much smaller and not confidently detected. This is in agreement with Mettler et al. (2023), who find the seasonal disk-integrated thermal emission flux differences for the landmass-dominated NP view to be 33%, as opposed to only 11% for the ocean-dominated SP and EqP (Pacific-centered equatorial) views. The increased T 0 variance observed for the NP view can be attributed to the large landmass fraction. Also, our results for T 0 and T eq indicate that the NP Jul, SP, and EqC views cannot be differentiated from one another despite vastly different characteristics like climate zones and landmass fractions. This highlights the strong spectral degeneracy with respect to seasons and viewing geometries, and agrees with other studies (e.g., Gómez-Leal et al. 2012; Mettler et al. 2023).
For A B, our retrieval results show small differences between the viewing angles and seasons. As for the temperatures, the variations are largest for the NP view (NP: 46%; SP: 17%; EqC: 16%). For the NP and SP views, the retrieved A B tends to be higher during winter, which agrees with the lower retrieved T 0 and T eq values. However, due to the uncertainties (±0.05 to ±0.10) and biases on the posteriors, a confident detection of these A B differences is not possible. Thus, our A B characterization is independent of viewing geometry and season.
However, as we discuss in Section 4, the accuracy and strength of our constraints for T 0, T eq, and A B are limited by our R pl estimates. As we demonstrate in Appendix F, an accurate and strong R pl constraint would yield detectable differences in T 0, T eq, and A B. In this case, Earth-like seasonal T 0, T eq, and A B changes are easily detectable for the NP view with a LIFE-like observatory for all R and S/N cases. Detections of seasonal variations are also possible for the SP view (except for the R = 50, S/N = 10 case). For the EqC view, which blends the two hemispheres, the seasonal variations remain undetected.

Detectability of Bioindicators
Earth's MIR spectrum contains features from numerous bioindicator gases. Examples are O3 (photochemical product of the bioindicator O2), CH4, and N2O (see, e.g., Schwieterman et al. 2018, for an extensive list). While N2O is not detectable at the R and S/N considered, O3 and CH4 are (biases ≤ +1.0 dex, uncertainties ≤ ±1.0 dex; see Appendix C). However, the detection of a single bioindicator gas is not sufficient to infer the presence of life, since such gases can be produced abiotically (see, e.g., Catling et al. 2018; Schwieterman et al. 2018; Harman & Domagal-Goldman 2018). The simultaneous detection of multiple bioindicator gases provides a more robust marker for biological activity.
Another promising multiple bioindicator is the simultaneous detection of reducing and oxidizing species in an atmosphere (i.e., a strong chemical disequilibrium). Since the two species will react rapidly with each other, simultaneous presence over large timescales is only possible if both are continually replenished at a high rate by life (Lederberg 1965). One example that we confidently detect is the simultaneous presence of O2 (or its photochemical product O3) and CH4 (Lovelock 1965; Lippincott et al. 1967). For all but the R = 50, S/N = 10 retrievals, we accurately constrain the log10(CH4/O3) abundance ratio to 1.1 dex (uncertainty ≤ ±0.5 dex). Especially in the context of an Earth-like planet orbiting a Sun-like star, the detection of such an O2/O3-CH4 disequilibrium would represent a strong potential biosignature.

Detectability of Seasonal Variations in Bioindicators
Research on the detectability of exoplanet biosignatures has predominantly focused on static evidence for life (e.g., the coexistence of O2 and CH4). However, the anticipated range of terrestrial planet atmospheres and the potential for both "false positives" and "false negatives" in conventional biosignatures (e.g., Selsis 2002; Meadows 2006; Reinhard et al. 2017; Catling et al. 2018; Krissansen-Totton et al. 2022) underscore the necessity to explore additional life detection strategies. Time-varying signals, such as seasonal variations in atmospheric composition, have been proposed to be strong biosignatures (e.g., Olson et al. 2018), since they are biologically modulated phenomena that arise naturally on Earth and likely also occur on other planets with non-zero obliquity and eccentricity. Olson et al. (2018) suggest that atmospheric seasonality as a biosignature avoids many assumptions about specificities of metabolisms. Further, it offers a direct means to quantify biological fluxes, which would allow us to characterize, rather than simply identify, exoplanet biospheres.
To assess the detectability of such time-dependent atmospheric modulations in exoplanets, we consider the retrieved abundance ratios in Figure 7. Abundance ratios are less affected by parameter degeneracies and thus exhibit smaller uncertainties and biases (≤ ±0.4 dex for the R = 50, S/N = 20 and R = 100 cases). Independent of the viewing geometry, we see no significant differences between the trace-gas ratios retrieved for January and July. Since these months represent Earth's two extreme states, we do not expect differences in the trace-gas ratios to be observable for any two other months. Consequently, detecting the atmospheric seasonality of trace-gas abundances as a biosignature is not feasible for the studied R and S/N cases.
This agrees with our findings in Mettler et al. (2023), where we studied disk-integrated Earth spectra and quantified the amplitudes of the seasonal variations in absorption strength by measuring the equivalent widths of the biosignature-related absorption features. We detected small seasonal variations for O3, CO2, CH4, and N2O. For CO2 and CH4, the seasonal abundance variations of 1% to 3% are significantly smaller than the uncertainties on our retrieved abundance estimates (≈ ±0.5 dex), which makes a detection infeasible.
Significantly higher R or S/N MIR spectra are required to be sensitive to the spectral variations evoked by Earth-like seasonal fluctuations in bioindicator gas abundances. Such observations require either a more sensitive instrument or an integration time greater than the assumed 30 days (see Section 2.1). However, while the magnitude of such spectral variations is unchanged for shorter integration times (e.g., 10 days), it will decrease for extended integration times (e.g., 90 days). For Earth, significant seasonal changes occur during such extended observations, whereas the measured spectrum represents the average state of the observed atmosphere. Thus, the magnitude of the spectral variations evoked by seasonal fluctuations is diminished, which counteracts the sensitivity gain attained via an increase in observation time.
However, terrestrial exoplanets could display seasonality patterns that are very different from those of Earth or other Solar System planets. Given the extensive diversity among exoplanets (e.g., in terms of mass, size, host star type, and orbit), it is likely that some exhibit detectable seasonal variations. Seasonal signals could be amplified by several factors (see, e.g., Section 4.3 in Mettler et al. 2023), such as: shorter photochemical lifetimes and/or non-saturated spectral bands; increased orbital obliquity (leading to greater seasonal contrast due to varying ice and vegetation cover); biological activity promoted by moderately high obliquity (e.g., photosynthetic activity), consequently leading to heightened variations in biosignature gases; and the absence of competing effects from admixed hemispheres (particularly relevant for eccentric planets). The detectability of seasonality depends on both the magnitude of the biogenic signal and the degree to which the observation conditions mute that signal, and is likely maximized for an intermediate obliquity.
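The dilution of a seasonal signal by long integration times can be illustrated with a toy calculation. The sketch below assumes, as a simplification that is not part of this study, that the seasonal signal is a pure sinusoid with a one-year period and that an observation returns the boxcar average over the integration window. Under these assumptions, the fraction of the amplitude that survives the averaging is sin(x)/x with x = π × window / period. The function name is hypothetical.

```python
import math


def averaged_amplitude_fraction(window_days, period_days=365.25):
    """Fraction of a sinusoid's amplitude that survives a boxcar average."""
    x = math.pi * window_days / period_days
    return math.sin(x) / x


for window in (10, 30, 90):
    fraction = averaged_amplitude_fraction(window)
    print(f"{window:3d} d window keeps {100 * fraction:.1f}% of the amplitude")
```

In this toy model, windows that are short compared to the year leave the amplitude almost unchanged, while the loss grows as the window approaches a sizable fraction of the seasonal period. Real seasonal signals are not sinusoidal, so the numbers should be read as indicative only.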

SUMMARY AND CONCLUSION
In this study, we treated Earth as an exoplanet to examine how well it can be characterized from its MIR thermal emission spectrum. This is the first study that systematically ran atmospheric retrievals on simulated LIFE observations of real disk-integrated MIR Earth spectra for different viewing angles and seasons. By comparing the results to ground truths, we assessed the accuracy and robustness of the retrieved constraints and explored the applicability of simple 1D atmosphere models for characterizing the atmosphere of a real habitable planet with a global biosphere. Further, we investigated whether the viewing geometry and season have a measurable impact on the characterization of an Earth-like exoplanet and searched for signs of atmospheric seasonality, indicative of a biosphere.
Our results at the minimal LIFE requirements (R = 50, S/N = 10) show Earth to be a temperate habitable planet with detectable levels of CO2, H2O, O3, and CH4. We find that viewing geometry and the observed season do not affect the detectability of molecules, the retrieved relative abundances, and thus the characterization of Earth's atmospheric composition. However, the seasonal flux difference of 33% for the North Pole view causes variations in the retrieved surface temperature T 0, equilibrium temperature T eq, and Bond albedo A B, which are detectable with LIFE for all tested R and S/N cases. If strong and unbiased estimates for the planet radius R pl are available, temporal variations in T 0, T eq, and A B are also observable for the South Pole and mixed equatorial views (for R = 50, S/N = 20 and R = 100 retrievals). Finally, we find that Earth-like seasonal variations in biosignature gas abundances are not detectable with LIFE for any of the R and S/N cases considered.
In summary, of the six MIR observables of habitable and inhabited worlds listed in Section 1.2, we are able to constrain four (planetary energy budget, the presence of water and other molecules, the P−T structure, and the molecular biosignatures). Regarding the surface conditions, we are able to accurately constrain T 0 despite Earth's patchy cloud nature. In contrast, all retrieved P 0 estimates are biased. In order to obtain a set of possible planetary surface condition solutions, climate models are required, which is beyond the scope of this work. Finally, we do not manage to detect atmospheric seasonality in biosignature gases, which is the last listed observable of habitable and inhabited worlds.
Further, by comparing our retrieval results for disk-integrated Earth spectra to the ground truths, we learn that biased parameter estimates will likely be obtained from retrievals on real exoplanet spectra. Importantly, we find that the commonly used simplifying assumptions of cloud-free atmospheres and vertically constant abundance profiles do bias retrieval results. Due to such biases, care needs to be taken when drawing conclusions from retrieval results. Derived quantities, such as abundance ratios, can be less affected by biases while retaining valuable information about the atmospheric state. However, community-wide efforts are required to develop robust and reliable frameworks for exoplanet characterization.
Nevertheless, from investigating Earth from afar, we learn that LIFE would correctly identify Earth as a planet where life could thrive, with detectable levels of bioindicators, a temperate climate, and surface conditions that allow for liquid surface water. The journey to characterize Earth-like planets and detect potentially habitable worlds has only started. Our work demonstrates that next generation, optimized space missions can assess whether nearby temperate terrestrial exoplanets are habitable or even inhabited. This provides a promising step forward in our quest to understand distant worlds.
We thank an anonymous referee for the valuable comments. This work has been carried out within the framework of the National Center of Competence in Research PlanetS supported by the Swiss National Science Foundation under grants 51NF40_182901 and 51NF40_205606. J.N.M., S.P.Q., and R.H. acknowledge the financial support of the SNSF. B.S.K. acknowledges the support of an ETH Zurich Doc.Mobility Fellowship.

Author contributions. J.N.M. and B.S.K. contributed equally to this work. Both carried out analyses, created figures, and wrote essential parts of the manuscript. S.P.Q. initiated the project. S.P.Q. and R.H. guided the project. All authors discussed the results and commented on the manuscript.

In order to compile Figure A1, we have sourced daily level 3 satellite data for the year 2017 from the CERES-Flight Model 3 (FM3) and FM4 instruments on the Aqua platform. Specifically, we have used the CERES Time-Interpolated TOA Fluxes, Clouds and Aerosols Daily Aqua Edition4A (CER_SSF1deg-Day_Aqua-MODIS_Edition4A) data product (NASA/LARC/SD/ASDC 2015). The provided cloud properties are averaged for both day and night (24-hour) and day-only time periods. Furthermore, they are stratified into four atmospheric layers (surface to 700 hPa, 700 hPa to 500 hPa, 500 hPa to 300 hPa, and 300 hPa to 100 hPa) and a total of all layers. For our analysis, we have used the latter, mapped the total cloud fractions onto the globe, and calculated the disk-averaged value for each viewing geometry per day.

[Figure A1. Total Cloud Fraction, Year: 2017]

We performed a Bayesian model comparison to justify our choice of atmospheric forward model used for the retrieval analysis in this work (see Section 3.3 and Table 2). In our analysis, we ran atmospheric retrievals (using the routine introduced in Section 3) assuming the following six atmospheric forward models M i of increasing complexity (see Table C1 for the full parameter configuration of each model and the assumed priors):
M 1 : (11 parameters) − In addition to the five polynomial P−T parameters a i (see Eq. 1 in Section 3.3), we retrieve the planet's radius R pl, mass M pl, and surface pressure P 0. The model atmosphere only contains N2, O2, and CO2.
M 2 : (12 parameters) − In addition to the M 1 parameters, we add H2O to the species present in the model atmosphere.
M 3 : (13 parameters) − In addition to the M 2 parameters, we add O3 to the species present in the model atmosphere.
M 4 : (14 parameters) − In addition to the M 3 parameters, we add CH4 to the species present in the model atmosphere.
M 5 : (15 parameters) − In addition to the M 4 parameters, we add CO to the species present in the model atmosphere.
M 6 : (15 parameters) − In addition to the M 4 parameters, we add N2O to the species present in the model atmosphere.
Let us consider two retrievals assuming different atmospheric forward models A and B on the same disk-integrated Earth spectrum. Both results are characterized by their respective log-evidences ln(Z A) and ln(Z B). The Bayes factor K can be calculated from the evidences as follows:

$$\log_{10}(K) = \frac{\ln(Z_A) - \ln(Z_B)}{\ln(10)}. \tag{C1}$$

The Bayes factor K provides a metric that quantifies which of the two models A and B performs better for a given spectrum. The Jeffreys scale (Jeffreys 1998, Table C2) provides a possible interpretation for the value of the Bayes factor K. A log10(K) value above zero marks a preference for model A, whereas values below zero indicate a preference for model B.

We observe that M 3 is generally preferred over M 2 and M 1 for all considered spectra. Thus, H2O and O3 are confidently detectable with LIFE. Further, M 4 is preferred over M 3 for all but the R = 50, S/N = 10 cases, suggesting that CH4 is also detectable. In contrast, the log10(K) value of roughly 0 indicates that models M 5 and M 6 perform similarly to model M 4. However, since M 5 and M 6 each require one additional parameter (abundance of CO or N2O, respectively), we prefer model M 4. This indicates that neither CO nor N2O is detectable in Earth's atmosphere at the R and S/N considered here, which is in agreement with the findings in Konrad et al. (2022). In conclusion, M 4 shows the best performance of all models considered. Therefore, we used M 4 as the forward model in the retrieval analyses presented in the main part of this manuscript.
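The Bayes factor comparison described above amounts to a single conversion from natural-log evidences to a base-10 logarithm. The following minimal sketch implements Eq. C1; the function name and the log-evidence values are hypothetical and serve only to illustrate the sign convention.

```python
import math


def log10_bayes_factor(ln_z_a, ln_z_b):
    """Return log10(K) for models A and B from their log-evidences (Eq. C1)."""
    return (ln_z_a - ln_z_b) / math.log(10.0)


# Hypothetical log-evidences, for illustration only.
log10_k = log10_bayes_factor(-120.0, -131.5)

# Positive values favor model A, negative values favor model B.
preferred = "A" if log10_k > 0 else "B"
print(f"log10(K) = {log10_k:.2f}, preferred model: {preferred}")
```

How strongly a given log10(K) value favors one model is then read off a scale such as the Jeffreys scale referenced in Table C2.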

[Viewing geometry and month labels: NP Jan, NP Jul, SP Jan, SP Jul, EqC Jan, EqC Jul]

Note to Table C1: In the third column, we specify the priors assumed in the retrievals. We denote a boxcar prior with lower threshold x and upper threshold y as U(x, y); for a Gaussian prior with mean µ and standard deviation σ, we write G(µ, σ). The last nine columns summarize the model parameters used by each of the different forward models tested in the retrievals (✓ = used, × = unused).

D. CALCULATION OF THE EQUILIBRIUM TEMPERATURE AND BOND ALBEDO
The equilibrium temperature T eq and the Bond albedo A B are not directly determined in our atmospheric retrievals. However, both parameters provide important information about the energy budget of Earth. In the following, we summarize how we derive estimates for T eq and A B from the retrieved parameter posteriors.
To determine T eq, we first calculate the MIR spectra corresponding to the retrieved parameter posteriors over a wide wavelength range. For each spectrum, we then integrate the flux to estimate the total emitted flux and use the Stefan-Boltzmann law to compute the effective temperature T eff of a black body with the same flux, which corresponds to the T eq of the planet. From the resulting T eq distribution, we can deduce the planetary A B distribution using:

$$A_B = 1 - \frac{16 \pi \sigma a_P^2 T_{eq}^4}{L_*}.$$

Here, σ is the Stefan-Boltzmann constant, a P is the semi-major axis of the planet orbit around its star, and L * is the luminosity of the star. To calculate A B, we assume a P and L * to be known with an accuracy of ±1% (i.e., for an exo-Earth, a P = 1.00 ± 0.01 AU and L * = 1.00 ± 0.01 L ⊙, with the solar luminosity L ⊙). For each value in the T eq distribution, we randomly draw an a P and L * value from two uncorrelated normal distributions and calculate the corresponding A B value. This yields the distribution for the planetary Bond albedo A B.

In Tables E1 to E3, we provide the numerical values corresponding to Figures 6, 7, and F2 for the different viewing angles:
• Table E1 − NP viewing angle posteriors,
• Table E2 − SP viewing angle posteriors,
• Table E3 − EqC viewing angle posteriors.

The white square marker shows the true surface pressure P 0 and temperature T 0, the white circular markers show the true P−T structure, and the gray shaded area indicates the uncertainty thereon. In the bottom right of each panel, we plot the 2D P 0 -T 0 posterior to visualize the constraints on the retrieved surface conditions. Each panel shows the result for one viewing angle. From top-left to bottom-right: NP Jan, SP Jan, EqC Jan, NP Jul, SP Jul, and EqC Jul.

As we motivated in Section 4, the underestimation of P 0 and R pl can be directly linked to biased retrieved estimates of T eq, A B, and the atmospheric trace-gas abundances. Here, we provide further evidence for these correlations between parameter biases by reducing the retrieved posterior distribution to Earth's true P 0 and R pl.

F.1. Posterior Reduction Method
In Section F, we reduced the retrieved posterior distributions to fixed values of $P_0 = 1$ bar and $R_{\rm pl} = 1\,R_\oplus$ by assuming a linear correlation between $P_0$ or $R_{\rm pl}$ and the remaining posteriors. Here, we outline the method used to reduce the posterior distributions. A schematic illustration showing both the true and the reduced posteriors can be found in Figure F1.
In the following, let us consider one point in the retrieved posterior distribution. We denote the value of $P_0$ or $R_{\rm pl}$ (i.e., the parameter we want to reduce over) as $\theta_{\rm red,true}$ and the values of the other model parameters as $\theta_{\rm param,true}$. If we assume a linear correlation between $\theta_{\rm red,true}$ and $\theta_{\rm param,true}$, we can make a prediction $\theta_{\rm param,pred}$ for $\theta_{\rm param,true}$ using $\theta_{\rm red,true}$ as follows:

$$\theta_{\rm param,pred} = m \cdot \theta_{\rm red,true} + q. \tag{F1}$$

Here, $m$ is the slope and $q$ the offset with respect to the origin of the linear model. $\theta_{\rm param,pred}$ is the parameter value predicted by the linear model. We search for the best-fit linear model by minimizing the squared difference $\Delta$ between $\theta_{\rm param,pred}$ and $\theta_{\rm param,true}$, summed over all points in the retrieved posterior:

$$\Delta = \sum \left(\theta_{\rm param,pred} - \theta_{\rm param,true}\right)^2. \tag{F2}$$

The best-fit linear models are indicated in Figure F1 as black dash-dotted lines. From the figure, we see that the correlations between the parameters considered here are well described by our linear model. In the next step, we fix the value of $\theta_{\rm red,true}$ to $\theta_{\rm red,fix}$ and calculate the corresponding reduced posterior values $\theta_{\rm param,red}$ of the other parameters as follows:

$$\theta_{\rm param,red} = \theta_{\rm param,true} + m \cdot \left(\theta_{\rm red,fix} - \theta_{\rm red,true}\right). \tag{F3}$$

This yields the reduced posterior distribution of a parameter, which we plot in Figure F1. This reduction method allows us to remove the effect of one parameter on the posterior distribution and to identify the origin of biases in the retrieval results.
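The reduction procedure can be illustrated with a short Python sketch. It assumes equally weighted posterior samples stored as one-dimensional arrays and uses an ordinary least-squares line fit (`numpy.polyfit`) as the minimizer of the squared difference; the function name `reduce_posterior` and its arguments are hypothetical and not taken from the retrieval framework.

```python
import numpy as np


def reduce_posterior(theta_red, theta_param, theta_red_fix):
    """Reduce the posterior of one parameter to a fixed value of another.

    theta_red     : samples of the parameter reduced over (e.g. P_0 or R_pl)
    theta_param   : samples of the parameter whose posterior is reduced
    theta_red_fix : value at which theta_red is fixed (e.g. the true value)

    Assumption: samples are equally weighted and linearly correlated.
    """
    theta_red = np.asarray(theta_red, dtype=float)
    theta_param = np.asarray(theta_param, dtype=float)

    # Best-fit line theta_param ~ m * theta_red + q (least squares).
    m, q = np.polyfit(theta_red, theta_param, deg=1)

    # Shift every sample along the fitted slope to theta_red_fix.
    theta_param_red = theta_param + m * (theta_red_fix - theta_red)
    return theta_param_red, m, q
```

By construction, the reduced samples keep the scatter of the original samples around the fitted line but no longer show a linear trend with the parameter that was reduced over.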

F.2. Reduction Relative to $P_0$
To demonstrate the effects of underestimating $P_0$, we reduce the abundance posteriors to Earth's true $P_0$ of 1 bar. The resulting reduced posteriors are shown in the left panel of Figure F2 (numerical values in Tables E1 to E3). The posterior reduction to the true $P_0$ leads to significantly better estimates for CO$_2$ and CH$_4$. For CO$_2$, the reduced posteriors are centered on the true value, while the CH$_4$ abundances are significantly less overestimated. This demonstrates that the shifts in the CO$_2$ and CH$_4$ posteriors in Figure 6 are directly linked to the inaccurately retrieved $P_0$. For O$_3$ and H$_2$O, the reduced posteriors are shifted to lower abundances and show a smaller variance between the individual retrievals. These findings suggest that the $P_0$ reduction also yields improved estimates for the atmospheric O$_3$ and H$_2$O abundances.

F.3. Reduction Relative to $R_{\rm pl}$
To investigate how our underestimation of $R_{\rm pl}$ affects the other posteriors, we reduce the $R_{\rm pl}$ posterior to $1\,R_\oplus$. We plot the reduced posteriors of $P_0$, $T_0$, $T_{\rm eq}$, and $A_{\rm B}$ in the right panel of Figure F2 (numerical values in Tables E1 to E3). First, we observe no significant differences between the reduced and the true $P_0$ posteriors from Figure 6. Thus, no direct correlation between the $R_{\rm pl}$ and the $P_0$ posterior exists. Second, for $T_0$, which is accurately estimated in all retrievals, the reduced posteriors underestimate the truth by ≥ 5 K. This finding indicates that Earth's disk-integrated flux is smaller than what is expected for a cloud-free $1\,R_\oplus$ planet with surface temperature $T_0$. This suggests that patchy clouds, which partially block the emission from the high-pressure atmospheric layers and thereby reduce the total planet flux, are the likely cause of the $R_{\rm pl}$ biases (see also Appendix G for further evidence). Finally, the reduced $T_{\rm eq}$ and $A_{\rm B}$ posteriors are unbiased and provide accurate estimates of the truth (uncertainties: ≤ ±2 K for $T_{\rm eq}$; ≤ ±0.1 for $A_{\rm B}$), which demonstrates that the biases in these parameters are correlated with $R_{\rm pl}$.

G. QUANTIFYING THE EFFECT OF NEGLECTING CLOUDS ON RETRIEVED PLANET RADIUS ESTIMATES
Here, we use a simplified model for Earth's thermal emission to show that the magnitude of the biases in the retrieved $R_{\rm pl}$ estimates (see Section 4) can be explained by an Earth-like patchy cloud coverage. In our cloud-free retrievals, we model Earth as a spherical black body (BB) with radius $R_{\rm pl,ret}$ and surface temperature $T_{0,\rm ret}$. Neglecting absorption and emission by Earth's atmosphere, the total power emitted ($P_{\text{cloud-free}}$) is equivalent to the power emitted by a spherical BB (where $\sigma$ is the Stefan-Boltzmann constant):

$$P_{\text{cloud-free}} = 4\pi R_{\rm pl,ret}^2\, \sigma T_{0,\rm ret}^4. \tag{G4}$$

However, we know that clouds are present in Earth's atmosphere. To obtain a first-order approximation of the total power emitted by a partially cloudy Earth ($P_{\rm cloudy}$), we assume opaque clouds (i.e., the clouds block all thermal radiation from lower atmospheric layers) that emit BB radiation of temperature $T_{\text{cloud-top}}$ at the cloud top. Using the cloud-coverage fraction $f_{\rm cov}$ (i.e., the fraction of Earth's surface covered by clouds), we can approximate $P_{\rm cloudy}$ as a weighted sum of the BB emission from cloudy and cloud-free regions:

$$P_{\rm cloudy} = 4\pi R_{\rm pl,true}^2\, \sigma \left[ (1 - f_{\rm cov})\, T_{0,\rm true}^4 + f_{\rm cov}\, T_{\text{cloud-top}}^4 \right]. \tag{G5}$$

Here, $R_{\rm pl,true}$ and $T_{0,\rm true}$ are Earth's true radius and average surface temperature, respectively. The power emitted by Earth via its thermal emission is measurable and independent of the selected model. We thus set Equations G4 and G5 equal to each other:

$$R_{\rm pl,ret}^2\, T_{0,\rm ret}^4 = R_{\rm pl,true}^2 \left[ (1 - f_{\rm cov})\, T_{0,\rm true}^4 + f_{\rm cov}\, T_{\text{cloud-top}}^4 \right]. \tag{G6}$$

The results from Section 4 suggest that $T_0$ is accurately estimated by our retrievals, despite not accounting for Earth's partial cloud coverage. Motivated by this finding, we substitute $T_{0,\rm ret}$ and $T_{0,\rm true}$ by $T_0$ in Equation G6. Subsequent rearranging yields:

$$\left( \frac{R_{\rm pl,ret}}{R_{\rm pl,true}} \right)^2 = (1 - f_{\rm cov}) + f_{\rm cov} \left( \frac{T_{\text{cloud-top}}}{T_0} \right)^4. \tag{G7}$$

Next, we assume that the temperature difference between $T_{\text{cloud-top}}$ and $T_0$ is $\Delta T$. We implement this assumption by replacing $T_{\text{cloud-top}}$ with $T_0 - \Delta T$ in Equation G7. Further, for Earth, $\Delta T$ is significantly smaller than $T_0$. Thus, we can approximate as follows:

$$\left( \frac{T_0 - \Delta T}{T_0} \right)^4 \approx 1 - \frac{4\, \Delta T}{T_0}. \tag{G8}$$

By inserting Equation G8 into Equation G7 and simplifying the resulting expression, we obtain:

$$\Delta T \approx \frac{T_0}{4 f_{\rm cov}} \left[ 1 - \left( \frac{R_{\rm pl,ret}}{R_{\rm pl,true}} \right)^2 \right]. \tag{G9}$$
By inserting numeric values into Equation G9, we assess whether the retrieved $R_{\rm pl}$ biases are consistent with an Earth-like patchy cloud coverage. Motivated by our retrieval results (see Section 4 and Appendix E), we assume $R_{\rm pl,ret}/R_{\rm pl,true} = 0.90 \pm 0.03$. For $T_0$, we select the lowest and the highest retrieved values to cover the full $T_0$ range (272 ± 6 K for NP Jan; 287 ± 6 K for NP Jul). Last, we assume an Earth-like $f_{\rm cov}$ of 0.67 (see Appendix A). Inserting these values into Equation G9 yields:

$$\Delta T = \begin{cases} 19 \pm 5~{\rm K} & \text{for the NP Jan view,} \\ 20 \pm 5~{\rm K} & \text{for the NP Jul view.} \end{cases} \tag{G10}$$
This implies that $T_{\text{cloud-top}}$ must lie roughly 20 K below $T_0$ if the retrieved bias on $R_{\rm pl}$ is caused by Earth's patchy cloud coverage. Assuming a lower limit of 4 K/km for Earth's moist adiabatic lapse rate, the $\Delta T$ requirement translates to an upper limit of 5.0 ± 1.2 km for the cloud-top altitude. Similarly, from Earth's dry adiabatic lapse rate (≈ 10 K/km), we obtain a lower limit of 2.0 ± 0.5 km for the cloud-top position. Both altitudes lie well below the tropopause (≈ 9 km at the poles to ≈ 17 km at the equator) and span the atmospheric layers where Earth's abundant low- to mid-level clouds form (Houze 2014). This first-order approximation demonstrates that the magnitude of the retrieved $R_{\rm pl}$ biases can be explained by the missing cloud treatment in our retrieval study.
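The arithmetic behind these estimates can be reproduced with a few lines of Python. The sketch assumes the small-$\Delta T$ relation $\Delta T \approx T_0 / (4 f_{\rm cov}) \cdot [1 - (R_{\rm pl,ret}/R_{\rm pl,true})^2]$, treats $f_{\rm cov}$ as exact, and uses first-order Gaussian error propagation, which may differ from the propagation used for Equation G10; all function names are hypothetical.

```python
import math


def cloud_top_delta_t(t0, r_ratio, f_cov):
    """Temperature offset [K] between surface and cloud top (small-dT limit)."""
    return t0 / (4.0 * f_cov) * (1.0 - r_ratio**2)


def cloud_top_delta_t_err(t0, t0_err, r_ratio, r_ratio_err, f_cov):
    """First-order uncertainty on delta T; f_cov is assumed to be exact."""
    d_dt0 = (1.0 - r_ratio**2) / (4.0 * f_cov)  # partial derivative w.r.t. T_0
    d_dr = -t0 * r_ratio / (2.0 * f_cov)        # partial derivative w.r.t. ratio
    return math.hypot(d_dt0 * t0_err, d_dr * r_ratio_err)


def cloud_top_altitude_km(delta_t, lapse_rate_k_per_km):
    """Altitude [km] at which the temperature has dropped by delta_t."""
    return delta_t / lapse_rate_k_per_km


for label, t0 in (("NP Jan", 272.0), ("NP Jul", 287.0)):
    dt = cloud_top_delta_t(t0, 0.90, 0.67)
    err = cloud_top_delta_t_err(t0, 6.0, 0.90, 0.03, 0.67)
    z_max = cloud_top_altitude_km(dt, 4.0)   # moist adiabatic lower limit
    z_min = cloud_top_altitude_km(dt, 10.0)  # dry adiabatic lapse rate
    print(f"{label}: dT = {dt:.1f} +/- {err:.1f} K, "
          f"cloud top between {z_min:.1f} and {z_max:.1f} km")
```

For the values quoted in the text, this gives $\Delta T$ of about 19 K (NP Jan) and 20 K (NP Jul), with an uncertainty between 5 and 6 K under this simple propagation, and cloud-top altitudes of roughly 2 to 5 km.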

Figure 1. The four observing geometries studied (taken from Mettler et al. 2023). From left to right: North Pole (NP), South Pole (SP), Africa-centered equatorial view (EqA), and Pacific-centered equatorial view (EqP). Due to the continuously evolving view of low-latitude viewing geometries as the planet rotates, the two equatorial views EqA and EqP were combined into one observing geometry, EqC.

Figure 2. Comparison between a single-day disk-integrated AIRS (black) and IASI (purple) spectrum for the NP view. The gap between 4.6 and 6.2 µm is clearly visible in the AIRS spectrum.

Figure 3. Disk-integrated atmospheric profiles for July. From left to right: $P$-$T$ profile followed by O$_3$, CH$_4$, CO, and H$_2$O atmospheric profiles. The error bars are the error-propagated uncertainties of the retrieved parameters from the AIRS L3 standard product. The different colors correspond to the viewing geometries: NP (blue), SP (turquoise), EqC (green). The insets display the profiles on a linear scale instead of a logarithmic one.

Figure 4. Disk-integrated $R = 50$ Earth spectra considered in our retrieval study. We indicate the $S/N = 10$ and $S/N = 20$ LIFEsim noise levels as shaded areas. Spectra from the top left to the bottom right: NP Jan, NP Jul, SP Jan, SP Jul, EqC Jan, EqC Jul.

Figure 5. Retrieval results for the $R = 100$, $S/N = 20$ EqC Jul Earth spectrum. The leftmost panel shows the retrieved $P$-$T$ structure. Green-shaded areas indicate percentiles of the retrieved $P$-$T$ profiles. The white square marks the true surface conditions ($P_0$, $T_0$). The white circles and the gray area show the true $P$-$T$ structure and the uncertainty thereon. In the bottom right of the $P$-$T$ panel, we show the retrieved constraints on the surface conditions. The remaining panels show the posteriors of the trace-gas abundances and other parameters. Green lines indicate posterior percentiles (thick: 16%–84%; thin: 2%–98%). Thick black lines indicate pressure-independent ground truths. Thin gray lines show the true abundance at different atmospheric pressures (solid: 1 bar; dashed: $10^{-1}$ bar; dashed-dotted: $10^{-2}$ bar; dotted: $10^{-3}$ bar).

Figure 7. Posteriors of atmospheric trace-gas abundances relative to each other. The figure structure is equivalent to Figure 6.

Figure A1. Total cloud fractions for the year 2017 across the viewing geometries investigated in this study. The data are derived from a level 3 satellite product (NASA/LARC/SD/ASDC 2015). The scattered points represent daily measurements, while the solid line depicts their rolling average with a window size of 8 days. The central points in the error-bar scatter plot represent the monthly mean cloud coverage, while the accompanying error bars indicate the corresponding standard deviation. The shaded areas highlight the months January and July, which were investigated in this study. The annotated cloud coverage values give the monthly cloud fractions for these specific months.
Figure C1 summarizes the results from our model comparison efforts for all considered disk-integrated Earth spectra (viewing geometries, $R$, and $S/N$). The $M_i$ correspond to the models listed above, while the $S_i$ represent different combinations of $R$ and $S/N$ of the input spectra ($S_1$: $R = 50$, $S/N = 10$; $S_2$: $R = 50$, $S/N = 20$; $S_3$: $R = 100$, $S/N = 10$; $S_4$: $R = 100$, $S/N = 20$). Green squares indicate positive $\log_{10}(K)$ values and a preference for the model $M_i$ with the higher $i$, while red squares represent negative $\log_{10}(K)$ values and a preference for the model $M_i$ with the lower $i$. The color shading indicates the strength of the preference.

Figure C1. Bayes' factor $\log_{10}(K)$ for the comparison of the different models in Appendix C. Positive values of $\log_{10}(K)$ (green) indicate a preference for the model $M_i$ with the higher $i$ value, while negative values (red) indicate the opposite. The color shading indicates the strength of the preference. The $S_i$ represent different combinations of $R$ and $S/N$ of the input spectra ($S_1$: $R = 50$, $S/N = 10$; $S_2$: $R = 50$, $S/N = 20$; $S_3$: $R = 100$, $S/N = 10$; $S_4$: $R = 100$, $S/N = 20$). Columns summarize the results obtained for the viewing geometries. From left to right: NP Jan, NP Jul, SP Jan, SP Jul, EqC Jan, and EqC Jul.
for interpretation of the Bayes' factor $K$ for two models A and B. The scale is symmetrical, i.e., negative values of $\log_{10}(K)$ correspond to very weak, substantial, strong, or decisive support for model B.

Figure E1. $P$-$T$ profiles retrieved for the six disk-integrated, $R = 50$ and LIFEsim $S/N = 10$ Earth spectra. The color-shaded areas indicate percentiles of the retrieved $P$-$T$ profiles. The white square marker shows the true surface pressure $P_0$ and temperature $T_0$, the white circular markers show the true $P$-$T$ structure, and the gray shaded area indicates the uncertainty thereon. In the bottom right of each panel, we plot the 2D $P_0$-$T_0$ posterior to visualize the constraints on the retrieved surface conditions. Each panel shows the result for one viewing angle. From top-left to bottom-right: NP Jan, SP Jan, EqC Jan, NP Jul, SP Jul, and EqC Jul.

Figure E4. As for Figure E1, but for the $R = 100$ and LIFEsim $S/N = 20$ Earth spectra.

Table 1. Data and observation details, and spectral information.

Table 2. Parameters of the retrieval forward model.

Table 3. Line and continuum opacities used in the retrievals.

Table C1. Parameter configurations of the nine tested retrieval forward models.