Aurora: A Generalised Retrieval Framework for Exoplanetary Transmission Spectra

Atmospheric retrievals of exoplanetary transmission spectra provide important constraints on various properties such as chemical abundances, cloud/haze properties, and characteristic temperatures, at the day-night atmospheric terminator. To date, most spectra have been observed for giant exoplanets due to which retrievals typically assume H-rich atmospheres. However, recent observations of mini-Neptunes/super-Earths, and the promise of upcoming facilities including JWST, call for a new generation of retrievals that can address a wide range of atmospheric compositions and related complexities. Here we report Aurora, a next-generation atmospheric retrieval framework that builds upon state-of-the-art architectures and incorporates the following key advancements: a) a generalised compositional retrieval allowing for H-rich and H-poor atmospheres, b) a generalised prescription for inhomogeneous clouds/hazes, c) multiple Bayesian inference algorithms for high-dimensional retrievals, d) modular considerations for refraction, forward scattering, and Mie-scattering, and e) noise modeling functionalities. We demonstrate Aurora on current and/or synthetic observations of hot Jupiter HD209458b, mini-Neptune K218b, and rocky exoplanet TRAPPIST1d. Using current HD209458b spectra, we demonstrate the robustness of our framework and cloud/haze prescription against assumptions of H-rich/H-poor atmospheres, improving on previous treatments. Using real and synthetic spectra of K218b, we demonstrate the agnostic approach to confidently constrain its bulk atmospheric composition and obtain precise abundance estimates. For TRAPPIST1d, 10 JWST NIRSpec transits can enable identification of the main atmospheric component for cloud-free CO$_2$-rich and N$_2$-rich atmospheres, and abundance constraints on trace gases including initial indications of O$_3$ if present at enhanced levels ($\sim$10-100x Earth levels).


INTRODUCTION
The last decade witnessed a revolution in our understanding of exoplanets and the nature of their atmospheres. Since the detection of the first atmosphere of a transiting exoplanet (Charbonneau et al. 2002), spectroscopic observations of exoplanets have provided wide ranging insights into the compositions, temperature structures, and physical processes in their atmo- luis.welbanks@ast.cam.ac.uk nmadhu@ast.cam.ac.uk spheres (see e.g., Seager & Deming 2010; Madhusudhan 2019; Zhang 2020, for a review). Most of the atmospheric observations have been made using transmission spectroscopy which is conducted when an exoplanet transits in front of its host star and light from the star passes through the planet's atmosphere before reaching the observer (Seager & Sasselov 2000). A transmission spectrum, through a wavelength-dependent change in the apparent size of the planet, encodes information about the atmosphere at the day-night terminator of the planet. Particularly, transmission spectroscopy has been key in detecting and quantifying the abundances of multiple molecules and atoms (e.g., Charbon-neau et al. 2002;Deming et al. 2013;Madhusudhan et al. 2014;Kreidberg et al. 2014a;Wyttenbach et al. 2015;Sedaghati et al. 2017;Nikolov et al. 2018;Chen et al. 2018;Wakeford et al. 2018), as well as providing important insight into clouds/hazes in exoplanetary atmospheres (e.g., Pont et al. 2008;Sing et al. 2016;Nikolov et al. 2018;Benneke et al. 2019a).
The extensive availability of computational methods and packages for statistical inference has made it possible for retrieval codes to update their capabilities and include multiple optimisation algorithms. For instance CHIMERA has used OE, Bootstrap Monte Carlo (BMC), MCMC, as well as MultiNest nested sampling (e.g., Line et al. 2013;Colón et al. 2020). NEMESIS has been adapted, beyond OE, to use MultiNest nested sampling (e.g., Krissansen-Totton et al. 2018). Similarly, T -REx through different updates has used MCMC and diverse nested sampling algorithms (e.g., Waldmann et al. 2015a;Al-Refaie et al. 2019).
Parallel efforts are being made towards exploring the viability of machine learning algorithms as a replacement or aid to traditional Bayesian optimisation algorithms. Some studies have used the Random Forest algorithm (e.g., Breiman et al. 1984) to train estimators and predict the parameters that better explain an observed spectrum (e.g., Márquez-Neila et al. 2018;Fisher et al. 2020;Guzmán-Mesa et al. 2020;. Other studies have used Deep Belief Neural Networks (albeit in studies of emission spectroscopy, e.g., Waldmann 2016), Generative Adversarial Networks (Zingales & Waldmann 2018), Deep Neural Networks (Soboczenski et al. 2018), and Bayesian Neural Networks (Cobb et al. 2019) in efforts to predict atmospheric prop-erties of exoplanets. A complementary approach has been to use machine learning to help inform the priors in a traditional retrieval (e.g., Hayes et al. 2020). These advancements in retrievals are an active area of research and future work may elucidate on the synergies between traditional retrievals and these novel machine learning techniques.
There have also been developments in model considerations for atmospheric retrievals of transmission spectra. Recent works have investigated the impact of cloud and hazes in atmospheric retrievals (e.g., Line & Parmentier 2016;MacDonald & Madhusudhan 2017;Pinhas et al. 2018;Mai & Line 2019;Barstow 2020). Similarly, studies have investigated the relative importance of various model and data considerations, including temperature structures, clouds, and optical data (e.g., ) over simpler isobaric, isothermal, semi-analytic model assumptions. Other works have looked into the effect of uncertainties in the system parameters (e.g., de Wit & Seager 2013;Batalha et al. 2019; or temperature and abundance inhomogeneities (Caldas et al. 2019;Changeat et al. 2019;MacDonald et al. 2020) in the retrieved properties of the atmosphere. Further efforts have investigated the impact of stellar contamination in transmission spectra (e.g., Pinhas et al. 2018;Iyer & Line 2020;Bruno et al. 2020).
While the studies above have focused primarily on retrievals for giant planets with H-rich atmospheres, some studies have also developed retrieval frameworks for smaller planets where the atmosphere may not be assumed to be H-rich a priori. Benneke & Seager (2012) investigated an agnostic retrieval framework for super-Earths, which could have a wide range of atmospheric compositions. They highlight that assuming log-uniform priors for the mixing ratios of the chemical species sampled in a retrieval can lead to a highly asymmetric prior for the last species derived using the unit sum constraint, which is unfavourable in the absence of a priori knowledge of the dominant species in the atmosphere, e.g., for super-Earths. To overcome this limitation, Benneke & Seager (2012) suggest a reparameterization of the chemical compositions that is applicable to both H-rich and non H-rich atmospheres. The parameterization, based on centered-log-ratio transformations (e.g., Aitchison 1986), allows for equal prior probability distributions for all the chemical species considered; we discuss this in depth in section 2.2. In subsequent work, Benneke & Seager (2013) demonstrate the potential of using Bayesian model comparisons along with high-precision transmission spectra of super-Earths/mini-Neptunes to differentiate between cloudy H-rich atmospheres and those of high mean-molecular weight, e.g., H 2 O-rich.
After this decade of revolutionary work on retrievals, the next generation of retrieval codes is upon us. Such retrievals must incorporate the lessons learned from atmospheric studies of giant exoplanets and also be adaptable to low-mass planets. In preparation for upcoming observations of temperate mini-Neptunes and super-Earths, the methods for non H-dominated atmospheres must be implemented and updated to be compatible with the latest optimisation algorithms. Upcoming codes should be able to expand their modelling functionalities motivated by data requirements. Lastly, with the increasing model complexity and data quality, new retrieval codes must be prepared for assessing multidimensional, highly degenerate problems.
We introduce Aurora, a next-generation retrieval framework for the atmospheric characterisation of Hrich and non H-rich planets. Our code incorporates new key features on the previous retrieval code AURA (Pinhas et al. 2018). First, we reparameterise the volume mixing ratios in the atmosphere expanding the previous scope beyond H-rich atmospheres, adapting methods previously used for super-Earths (e.g., Benneke & Seager 2012) and other areas of compositional data analysis (e.g., Aitchison 1986;Aitchison & J. Egozcue 2005;Pawlowsky-Glahn & Buccianti 2011). Second, Aurora incorporates the next-generation nested sampling algorithms PolyChord and Dynesty, as well as maintaining compatibility with MultiNest. Third, Aurora includes a new generalised parametric treatment for inhomogeneous clouds and hazes. Compared to previous prescriptions, our new treatment of clouds/hazes is robust against assumptions of whether the atmosphere is H-rich or not.
Lastly, Aurora incorporates different modular capabilities that enhance the study of transmission spectra using retrievals and forward models. These include assessing stellar heterogeneity (e.g., Rackham et al. 2017;Pinhas et al. 2018), allowing for underestimated variances in the data (e.g., Foreman-Mackey et al. 2013;Colón et al. 2020), and considering correlated noise using Gaussian processes (e.g., Rasmussen & Williams 2006). Additionally, our forward modelling capabilities can account for light refraction and forward scattering , as well as Mie-scattering due to a variety of condensate species (Pinhas & Madhusudhan 2017). Aurora's modular capabilities can be incorporated in retrievals should observations require it.
In what follows, we present our retrieval framework in section 2. We benchmark the results of Aurora on current and synthetic observations in section 3, and present case studies for characterising atmospheres of hot Jupiters, mini-Neptunes, and rocky exoplanets with JWST. We summarise our conclusions in section 4 and discuss the implications of our findings and possible avenues for future developments of Aurora.

AURORA RETRIEVAL FRAMEWORK
Aurora builds upon the AURA retrieval framework (Pinhas et al. 2018) developed in our group and, among other features, expands the retrieval capabilities to rocky exoplanets without the assumption of H-rich atmospheres. The core retrieval methodology for H-rich atmospheres is explained in Pinhas et al. (2018), with its implementation previously explained in  and employed in different retrieval studies (e.g., Chen et al. 2018;von Essen et al. 2019;Colón et al. 2020). Here, we reintroduce the basic retrieval methodology of the AURA code and discuss the new enhancements made in Aurora. Figure 1 shows the schematic diagram of the Aurora framework.
Like any contemporary retrieval framework (Madhusudhan 2018), AURA and Aurora comprise of a forward model that is interfaced with a Bayesian inference and parameter estimation scheme. The forward model computes a transmission spectrum given a set of parameters for the temperature structure, chemical composition, and presence of clouds/hazes on the planet's atmosphere. The parameter estimation scheme explores the model's parameter space in search of regions of highlikelihood that can explain a set of observations. The Bayesian inference scheme estimates the model evidence and posterior probability distributions of the model parameters, and is performed using the nested sampling algorithm. In what follows, we describe each of these components for our retrieval framework. We highlight the following key advancements introduced in Aurora.
• Generalised inhomogeneous cloud/haze parameterisation We first discuss the standard features that we retain from the retrieval framework of Pinhas et al. (2018), followed by description of the new features in the Aurora framework built in this work.

Forward Model
Aurora computes the transmission spectrum of an exoplanet in transit assuming plane-parallel geometry. Our forward model is comprised of a parametric pressuretemperature (P-T) profile; parametric chemical abundances and consideration for multiple sources of opacity including atomic and molecular line opacity, Rayleigh scattering and collision-induced absorption; a treatment for inhomogeneous clouds and hazes; and a line-by-line radiative transfer solver under hydrostatic equilibrium. The forward model can consider light refraction, forward scattering, Mie-scattering and stellar heterogeneity (see section 2.4).

Pressure-Temperature Profile
The temperature in the atmosphere of an exoplanet as a function of pressure is determined by the pressuretemperature (P-T) profile. We follow the parametric prescription of Madhusudhan & Seager (2009). We choose this profile as it is motivated by the profiles observed in the solar system and has been successfully applied to exoplanet studies (e.g., Madhusudhan & Seager 2009Madhusudhan et al. 2014;Blecic et al. 2017). The equations for temperature in this parameterization divide the model atmosphere into three distinct regions: where we maintain the empirical choice of Madhusudhan & Seager (2009) to set their parameters β 1 = β 2 = 0.5. Here, T 0 is the temperature at the top of the model atmosphere P 0 (e.g., 10 −6 bar in this work), P 1 is the boundary between the first and second regions, P 3 is the boundary between the second and third regions, P 2 is the pressure in the parameterization which can capture possible thermal inversions if P 2 > P 1 , and α 1 and α 2 are the values that determine the curvature of the profile in the different layers. We restrict our temperature profiles to those with P 2 ≤ P 1 , for observations of the day-night terminator where thermal inversions are not expected. Aurora has the option of considering an

Model spectrum
Binned spectrum Figure 1. Schematic of Aurora's retrieval framework. The retrieval framework combines an atmospheric forward model with a Bayesian inference and parameter estimation algorithm to derive the model evidence and posterior probability distributions of the model parameters given an observed spectrum. Aurora also has modular capabilities for including additional effects in the atmospheric model if required, e.g., light refraction, forward scattering and Mie-scattering.
isothermal profile in which case the free parameter is T 0 . Then, the temperature at all points in the model atmosphere is assumed to be T 0 .

Sources of Opacity
The presence of different chemical species in the atmosphere of an exoplanet is retrieved by considering their contribution to the star light's extinction. The extinction coefficient κ(λ, P, T ) of the atmosphere is a pressure, temperature, and wavelength dependent quantity that contributes to the differential optical depth dτ (λ, P, T ) = κ(λ, P, T )ds (4) along the line of sight s. The extinction coefficient for each species is given by κ i (λ, P, T ) = n i σ i (λ, P, T ), where n i is the number density and σ i the absorption cross-section of the species. The number density, n i , is parameterised through the volume mixing ratio, X i = n i /n tot , where n tot is the total number density. The volume mixing ratio of each species is a free parameter and assumed to be uniform in the atmosphere. For H-rich atmospheres, Aurora calculates the volume mixing ratio of H 2 and He by assuming a particular He/H 2 ratio (X He /X H2 ) and the following relations where we adopt a solar value of X He /X H2 = 0.17 (Asplund et al. 2009) and consider a total of n species in the model atmosphere. The treatment of the volume mixing ratios when a H-rich atmosphere is not assumed a priori is described in section 2.2. Aurora in general considers the opacity sources expected in the atmospheres of hot Jupiters, mini-Neptunes and temperate rocky planets (e.g., Madhusudhan 2012; Moses et al. 2013;  The opacities for the chemical species are computed following the methods of Gandhi & Madhusudhan (2017), with the updated values of Gandhi & Madhusudhan (2018), and with H 2 -broadened Na and K cross sections as explained in .
Aurora also incorporates a continually updated library of cross-sections of various other atomic and molecular species (Gandhi et al. 2020). Figure 2 shows the cross section for most of the molecular opacity sources considered in this work for a pressure of 1 bar and a temperature of 300 K, from 0.4-30 µm covering the wavelength range expected to be observable by JWST. The Na and K profiles can be seen in Figure 1 of .
The resulting extinction coefficient is where σ H2-H2 and σ H2-He are the H 2 -H 2 and H 2 -He CIA cross-sections. The extinction coefficient can be amended to remove H 2 -He and H 2 -H 2 CIA and/or include CIA due to other species. Furthermore, the total extinction coefficient can include H 2 -Rayleigh scattering κ H2-Rayleigh (λ, P, T ) = X H2 n tot (P, T )σ H2 scat (λ) (7) where the wavelength dependent cross-section in cgs is given analytically by Dalgarno & Williams (1962) as σ H2 scat (λ) = 8.14 × 10 −45 λ 4 + 1.28 × 10 −54 λ 6 and is incorporated up to its third term in Aurora. Aurora also includes opacity sources relevant for modelling H-poor atmospheres of rocky planets. This library contains collision induced absorption (CIA) cross- man et al. 2019). These additional CIA cross-sections are generated within the temperature and wavelength limits available in the HITRAN data. The cross-sections for temperatures beyond those limits are set to values at the boundaries. We assume no opacity for wavelengths beyond the database range, as these values are not known. Future efforts, both experimental and theoretical, on extending and revising opacity databases would help obtain cross-sections over the full range of wavelengths and temperatures applicable for such planets. Aurora can also include Rayleigh scattering due to a variety of species including O 2 , N 2 , Ar, Ne, CO 2 , CH 4 , H 2 O, CO, and N 2 O (Shardanand & Rao 1977;Sneep & Ubachs 2005;Thalman et al. 2014). Rayleigh scattering due to species i is κ i-Rayleigh (λ, P, T ) = X i n tot (P, T )σ i scat (λ). We include collision induced absorption due to CO 2 -CO 2 , N 2 -N 2 , as well as Rayleigh scattering due to N 2 , H 2 O, and CO 2 in the H-poor models presented in section 3.4.

A New Cloud and Haze Parameterization
We introduce a new cloud and haze treatment for inhomogeneous cover that considers a total of four distinct spatial areas (sectors) covering the planet. These four areas are (1) a clear, cloud-free and haze-free, area affected only by Rayleigh scattering, (2) an area covered by hazes only, (3) an area covered by a gray cloud deck with Rayleigh scattering above the cloud deck, and (4) an area covered by a gray cloud deck and hazes above it. The total transit depth is a linear superposition of the transit depths of each sector ℎ +ℎ Figure 3. Schematic of the four-sector generalised parameterization of inhomogeneous clouds and hazes introduced in this work. The planet is enveloped by its atmosphere which is divided into four regions. These are (1) a clear, cloudfree and haze-free, sector, (2) a sector with hazes only, (3) a sector with clouds only, and (4) a sector with clouds and hazes.
Hazes, e.g., small size particles resulting from photochemical processes, are implemented into our model atmosphere by parameterising their effect on the spectrum as deviations from H 2 -Rayleigh scattering (Lecavelier Des Etangs et al. 2008). The parameterization provides a cross-section σ hazes = aσ 0 (λ/λ 0 ) γ , where γ is the scattering slope, a is the Rayleigh-enhancement factor, and σ 0 is the H 2 -Rayleigh scattering cross-section at the reference wavelength λ 0 . We adopt values of σ 0 = 5.31 × 10 −31 m 2 and λ 0 = 350 nm for consistency with previous works (e.g., MacDonald & Madhusudhan 2017;. Future observations of non H-rich planets could motivate the use of scattering cross-sections for different species. The extinction due to hazes is κ haze (λ, P, T ) = X H2 n tot (P, T )σ hazes (λ).
The regions of the atmosphere covered by a gray cloud deck are included by adopting a parameter for the cloud top pressure P cloud . The optical depth for all pressures higher than P cloud is considered to be infinite. The ex-tinction coefficient due to the cloud deck κ clouds (P ) is infinite for P > P cloud or zero for P < P cloud .
Previous studies have considered the effects of patchy clouds in transmission spectra (e.g., Line & Parmentier 2016;MacDonald & Madhusudhan 2017;Barstow 2020). Our model here generalises the approach of previous studies while being able to reduce to previous treatments under specific conditions. If the model prefers to consider the presence of clouds and hazes together, the fractions φ hazes and φ clouds approach zero and we obtain previous treatments for inhomogeneous cover (e.g., MacDonald & Madhusudhan 2017; Welbanks & Madhusudhan 2019). On the other hand, if the combined fraction is zero (e.g., φ clouds+hazes = 0), our approach allows us to consider the effect of clouds and hazes separately and distinguish whether the contribution to the spectrum is mostly due to deviations from H 2 -Rayleigh scattering produced by the hazes, or muted features due to a gray cloud deck. Lastly, if the combined fraction is zero and so is the haze only fraction (e.g., φ hazes = φ clouds+hazes = 0) we recover the expression for patchy clouds of Line & Parmentier (2016). By following this approach, we obtain a more robust and flexible treatment compared to our previous prescription that combines the effects of clouds and hazes into one sector (e.g., MacDonald & Madhusudhan 2017; Welbanks & Madhusudhan 2019). We present a schematic of our cloud and haze treatment in Figure 3.
We find the generalised treatment of clouds and hazes introduced in this work leads to consistent abundance estimates regardless of whether a H-rich atmosphere is assumed or not. In other words, the existing degeneracies between clouds/hazes and composition are treated equally irrespective of the assumption of the bulk atmospheric composition of the planet. On the other hand, combining clouds and hazes into one individual sector as previously performed (e.g., MacDonald & Madhusudhan 2017; Pinhas et al. 2018;) can lead to biases and an incomplete exploration of the parameter space that results in distinct solutions when assuming a H-rich atmosphere or not on the same data set. This is mitigated by our new cloud and haze prescription. We discuss these aspects further on the case study of the hot Jupiter HD 209458 b in section 3.1.

Radiative Transfer
Our model solves line-by-line radiative transfer in transmission geometry in a plane parallel atmosphere. The model atmosphere is divided into a predetermined number of pressure layers equally spaced logarithmically. The number of layers and their span in pressure space can be arbitrarily established by the user depending on the application. For this work, and based on the empirical results of , we use 100 pressure layers uniformly spaced in log 10 (P) from 10 −6 to 10 2 bar under hydrostatic equilibrium. Our calculation of hydrostatic equilibrium is performed considering the retrieved composition through the atmospheric mean molecular weight (e.g., µ = X i m i , where X i and m i are the volume mixing ratio and the atomic/molecular mass of species i, respectively), the retrieved pressure-temperature profile, and altitudedependent gravity.
We solve numerically for the transit depth of the planet where R star is the stellar radius, H A is the maximum height of the planetary atmosphere, τ λ is the total slant optical depth and integral of equation 4, b is the impact parameter, and R p is the radius of the planet. We present equation 10 as a three part expression to highlight the fact that the chosen R p may not correspond to an optically thick surface. If R p corresponds to an optically thick surface, the last integral in equation 10 evaluates to zero. Otherwise, the integral considers the contribution of the non-optically thick parts of the atmosphere, below the arbitrarily chosen position in the planet, to the transit depth. The selected value of R p , at a given reference pressure P ref , is used to construct a radial distance grid. The distance and pressure grids follow a one-to-one correspondence determined by hydrostatic equilibrium. It is possible in a retrieval to choose a value of R p for which the P ref parameter will be retrieved, to choose a value of P ref for which the associated radius R p will be retrieved, or leave both R p and P ref as free parameters.  showed that the retrieval results remain mostly unchanged regardless of the choice of free parameter (R p and/or P ref ). In this work we choose to keep R p as our independent variable for which we retrieve P ref .

Considerations for Non H-rich Atmospheres
A core assumption present in most atmospheric retrieval codes for hot Jupiters is that the atmosphere is H-rich. Such assumption can be appropriate for massive planets which, from a formation perspective, captured a gas mixture of predominantly H and He in cosmic proportions from their protoplanetary nebula (Seager & Deming 2010). However, when characterising the atmospheres of less massive planets or when pursuing an agnostic approach applicable to atmospheres of general composition, such assumption may need to be relaxed. Instead of assuming a H-rich atmosphere, studies could attempt to retrieve the main gas component of the atmosphere. Such approach would aim to explore a wider range of atmospheric compositions like N 2 -rich or CO 2rich atmospheres, and not be constrained to H-rich atmospheres only.
However, when pursuing this agnostic approach, the unit-sum constraint, i.e., the requirement that all the volume mixing rations in the atmosphere must add up to one, must be incorporated into the statistical modelling appropriately. Incorporating such constraint is nontrivial and has been the subject of study in a sub-field of statistical analysis called compositional data analysis (e.g., Pearson 1897;Tanner 1949;Chayes 1960;Aitchison 1986). The tools developed by this subfield of statistics have been implemented in a number of different disciplines like medicine, chemistry, economy, geophysics, amongst many others (see Aitchison & J. Egozcue 2005, for a review of the history of compositional data). The concepts of compositional data analysis were introduced to the exoplanet retrieval literature through the work of Benneke & Seager (2012).
Implementing the same methods used for the retrieval of H-rich atmospheres to retrievals in which the main atmospheric constituent is not known can result in biased results that do not explore all compositions equally. The traditional method would sample the volume mixing ratios of n-1 species (i.e., minor species) and assign the volume mixing ratio of the nth species (i.e., H 2 in the case of a H-rich atmosphere) following the unit-sum constraint. Benneke & Seager (2012) highlight that following this approach will result in a highly asymmetric prior (see Fig. 1 of Benneke & Seager 2012) for the nth species. Under these circumstances, the retrieval is not truly agnostic and the resulting atmospheric composition will be dependent on which molecule was chosen to be the nth species.
To circumvent this problem one must allow for all species to have the same prior probability density in a permutation-invariant prescription. If the prior probability for all species is identical, it is safe for the retrieval to sample over the parameter space of all n species. The solution is the centered-log-ratio transformation, defined as where g(X) is the geometric mean g(X) = (X 1 ...X N ) 1/N (Aitchison 1986). The transformed z i values, also called compositional parameters, treat all part of the gas symmetrically.
In Aurora, when not assuming a H-rich atmosphere we reparameterise the volume mixing ratios (X i ) using the centered-log-ratio transformation and obtain the compositional parameters (z i ). We assume that the combination of H 2 and He is one single part (z (H2+He) ) which we then use to determine the separate H 2 and He volume mixing ratios using a He/H 2 ratio. Then, we sample over the entire transformed space for all n gas components with the assumption that one of those is a mixture of H 2 and He in solar proportion.
Once sampling is performed in the space of the centered-log-ratio transformation, and to maintain the descriptions above about the treatment of different opacity sources, the inverse transformation (Pawlowsky-Glahn & Buccianti 2011) is calculated and the volume mixing ratios X i 's are used in our calculations. It is important to highlight that the compositional parameters (z i ) have slightly different properties than their counterparts, the volume mixing ratios (X i ). While the typical prior range for X i is 10 −12 < X i < 1, the limits for z i is −∞ < z i < ∞ where −∞ is the limit of a species not being present and ∞ means the species is the only one in the atmosphere. While a straightforward expression for the scenario in which all volume mixing ratios are equal is not available, the compositional parameters are present in equal parts when all z i = 0. Lastly, the unit-sum constraint for the volume mixing ratios is X i = 1, and transforms to z i = 0 for the compositional parameters.

Multialgorithmic Statistical Inferences
The strength in the retrieval approach when assessing the properties of an exoplanet's atmosphere resides in its ability to provide robust statistical estimates of the parameters and models used to explain the observations. As explained in the section 1, many statistical approaches exist in exoplanetary atmospheric retrievals: grid-based searches (e.g., Madhusudhan & Seager 2009 others (see Madhusudhan 2018, for a review). Of the different approaches available, Bayesian inference tools ease the comparison of models while providing estimates of the posterior distributions of the model parameters. One of these methods, nested sampling (Skilling 2006) has been successfully incorporated into exoplanetary retrieval literature (e.g., Benneke & Seager 2013;Line et al. 2013;Waldmann et al. 2015b;MacDonald & Madhusudhan 2017;Gandhi & Madhusudhan 2018;Pinhas et al. 2018;Krissansen-Totton et al. 2018;Mollière et al. 2019;Zhang et al. 2020) due to its ability to handle high dimensionality problems, sample the complete parameter space of the model, and use prior information on the model parameters. An overview of the Bayesian approach to inference problems is available in Sivia & Skilling (2006); Trotta (2008Trotta ( , 2017. The likelihood of observing the data (D) given a specific set of model parameters (θ M ) for a model (M) is Considering the Bayesian approach, where the degree of belief on the model assumptions must be accounted for, one must incorporate the prior distribution (π) on the model parameters π = P (θ M |M). The marginalised likelihood, also known as evidence, is obtained by integrating the likelihood over the full parameter space The model evidence is the quantity we are interested in evaluating when comparing different models. This is also the quantity different nested sampling algorithms aim to provide. Furthermore, using Bayes theorem it is possible to obtain the posterior probability distribution for each parameter given the data Aurora uses a likelihood function for data with independently distributed Gaussian errors 16) for a data set of length N and computed for each model realisation M i . Aurora follows the same binning strategy as AURA (see section 2.1.6 in Pinhas et al. 2018) where a model spectrum at a much higher resolution than the data is convolved with the point spread function (PSF) of the instrument with which the observations were obtained and then binned down to the spectral resolution of the data.
The prior distributions employed in this study are shown in Table A1 in the appendix. The priors for the parameters are mostly standard prescriptions adopted from previous studies (e.g., Pinhas et al. 2019;. The priors for the molecular abundances generally span the complete detectable range unless stated otherwise, with the prior distribution either log-uniform for the volume mixing ratios for H-rich retrievals or uniform in the corresponding compositional parameters (z i ), discussed in section 2.2, for non H-rich retrievals. The priors for the parameters associated with other physical properties, e.g., pressure-temperature profile and cloud/haze parameters, are also uniform or log-uniform and span the corresponding physically plausible ranges.

Next-generation Bayesian Inference Algorithms
The main functionality of a nested sampling algorithm is to obtain the model evidence (Z) while also deriving the posterior probability distributions of the model parameters as a by-product. A full description of the nested sampling algorithm is available in Skilling (2004Skilling ( , 2006; Feroz et al. (2009). In Aurora we implement three different algorithms, MultiNest  through its implementation PyMultiNest (Buchner et al. 2014), PolyChord (Handley et al. 2015a,b) through its implementation PyPolyChord, and Dynesty . Each nested sampling algorithm is different and the in-depth description for each implementation is available in their release papers listed above.
Generally, a nested sampling algorithm generates a number of live points drawn from the prior distribution, which sample the parameter space ). In each iteration, the point with lowest likelihood is replaced by a new one which ought to have a larger likelihood. This means that the live points sample the prior volume using continuously shrinking iso-likelihood contours, which with every iteration converge to the highest likelihood regions of the parameter space. At each step, every sampled value creates a model realisation that results in an evaluation of the likelihood function. The process finishes when a termination condition, like a pre-set fractional change in the the model likelihood, is met. Upon completion, the combination of all sampled points can be used to estimate the model evidence. The procedure to generate new live points can vary between different implementations of the nested sampling algorithm which are briefly discussed below.
MultiNest has been previously implemented in exoplanet retrievals (e.g., Benneke & Seager 2013;Line et al. 2013;Waldmann et al. 2015b;MacDonald & Madhusudhan 2017;Gandhi & Madhusudhan 2018;Pinhas et al. 2018;Krissansen-Totton et al. 2018;Mollière et al. 2019;Zhang et al. 2020). To draw unbiased samples from the likelihood-constrained prior, MultiNest uses what is called an ellipsoidal rejection sampling scheme. The basis for this scheme is that the replacement point is sought from within the set of ellipsoids described by the full set of live points at any iteration (Feroz et al. 2019). With each iteration the ellipsoids described by the isolikelihood contours shrink. This procedure is optimal for a small number of parameters but has an exponential scaling with dimensionality.
PolyChord, on the other hand, uses what is called slice-based sampling. In this procedure, the algorithm samples uniformly within the parameter space for which the posterior probability is higher than a given probability level or 'slice'. Unlike the exponential scaling problem with MultiNest at higher dimensions, PolyChord's scaling is ∼ O(D 3 ) (Handley et al. 2015b). This makes MultiNest preferred for low dimensionality problems, while PolyChord is preferred at higher dimensionalities (see Figure 7 in Handley et al. 2015b).
Lastly, Dynesty (Speagle 2020) uses a generalisation of nested sampling, in which the number of live points is variable, called dynamic nested sampling (Higson et al. 2019). In dynamic nested sampling, an initial run with a constant number of live points is used by the algorithm to approximate areas in prior space of the highest likelihood. Then, the algorithm proceeds to iteratively calculate the range of likelihoods where a larger number of live points will have the greatest result in accuracy. In dynamic nested sampling the number of live points is dynamically allocated to control the resolution at which the prior space is sampled. This would allow for runs that focus on sampling the posterior distribution or better estimate the model evidence. Dynesty allows for both dynamic and static nested sampling. Furthermore, Dynesty has four main approaches to generating samples: uniform sampling (including from ellipsoids like MultiNest), random walks, multivariate slice sampling (similar to PolyChord), and Hamiltonian slice sampling. Each approach has its benefits and impediments, and can be better suited for different problem dimensionalities. Speagle (2020) offers an extensive overview of each feature available in Dynesty.
Every algorithm for nested sampling offers different capabilities. While Dynesty is able to handle both static and dynamic sampling, it comes at the cost of multiple tuning parameters that can affect the behaviour of a given run. PolyChord is able to handle problems of higher dimensionality more efficiently than MultiNest, but MultiNest still outperforms PolyChord in the number of likelihood evaluations required for problems in low dimensions ( 80 dimensions, Handley et al. 2015b). Aurora offers the tools to perform retrievals optimising for evaluation of the model evidence, parameter posterior distributions, or both. The user has the freedom to choose the correct sampling algorithm for their needs depending on the complexity of the problem and its dimensionality.

Model Comparison and Detection Significance
The difference in evidence (Z) between models can be used to derive an equivalent detection significance (DS), a figure of merit traditionally used to compare different models. The detection significance is traditionally expressed in units of 'sigma' (σ) and corresponds to the number of standard deviations away from the mean of a normal distribution (Trotta 2008). Expressing a result in 'sigmas' does not necessarily mean the detection of new physics or a species in the spectrum of a planet. Instead, it is a useful way to translate the odds in favour of a more complex model into a frequentist metric. The relevance of a model preference can be somewhat arbitrary and different authors suggest different categories for expressing them. For instance, Trotta (2008) suggests that a difference of 2.0 to 2.6σ is weak at best, while Kass & Raftery (1995) suggests that the equivalent of ∼ 2.1σ is positive evidence. A way to transform the difference in model evidences to a detection significance was proposed by Benneke & Seager (2013). We perform the comparison of our models by solving equation 11 in Benneke & Seager (2013), and obtaining a detection significance as where erfc is the complementary error function, W −1 is the Lambert W function in its lower branch (i.e., k = −1 branch), e is Euler's number, and B is the Bayes factor defined as B = Z 1 /Z 2 , with the set requirement of Z 1 ≥ Z 2 so the Bayes factor is greater than or equal to unity.

Modular Capabilities
Aurora's design is modular ensuring that future capabilities can be easily incorporated into the existing retrieval framework. As part of this modular structure, we include in Aurora preexisting features in AURA (Pinhas et al. 2018) such as the functionality to retrieve stel-lar properties from a transmission spectrum. Furthermore, we introduce new modular capabilities that aid in the analysis of transmission spectra in the context of retrievals and forward models. These key additions include new considerations for noise modelling and forward models considering light refraction, forward scattering, and Mie-scattering.

Stellar Heterogeneity
One of the main features of AURA (Pinhas et al. 2018) was to retrieve stellar properties embedded in the transmission spectrum as well as the planetary properties. Inhomogeneities in the stellar photosphere were modelled by retrieving the areal fraction of the projected stellar disc covered by heterogeneities (δ), hot faculae or cool spots, the heterogeneity temperature (T het ), and the photospheric temperature (T phot ). Aurora inherits this capability but we do not include it in the present study.

New Noise Modelling Modules
Aurora has the capability to treat noise models different from the traditionally assumed white noise. Aurora can consider the possibility of underestimated variance in the data by retrieving an error-bar inflation free parameter ). This approach assumes that the variance is underestimated by a fractional amount f . The variance of the data is where σ obs is the error in the observations and ∆ mod is the model's transit depth. This term replaces the variance term in equation 16. This functionality has been tested on the recent spectroscopic observations of KELT-11b (Colón et al. 2020). Aurora also has the capability to consider correlated noise in the data being analysed. To do so, we have incorporated Celerite (Foreman-Mackey et al. 2017) and George (Ambikasaran et al. 2015) to model the covariance function and compute the likelihood of a Gaussian Process (GP) model (Rasmussen & Williams 2006). The effects of a GP in transmission spectra fall beyond the scope of this paper and we reserve it to a future study.

Refraction and Forward Scattering
In Aurora we have incorporated the analytic descriptions for forward scattering and refraction of transit spectra proposed by Robinson et al. (2017). These prescriptions have been incorporated in the context of producing forward models and synthetic observations. Refraction effects are calculated using the prescription for the maximum pressure at which the effect of refraction is large enough to cause a light ray from one side of  the planet to come from the far limb (i.e., opposite side) of the host star . We incorporate the wavelength-dependent refractivity , and use them to calculate the maximum pressure probed (P max ) at each wavelength following equation 15 of Robinson et al. (2017). The optical depth for pressures higher than P max is set to infinity. Figure 4 shows the effect of considering refraction in forwards models of K2-18b. For these forward models we consider refraction due to H 2 , H 2 O, or N 2 . The forward models are determined by the median retrieved parameters in section 3.2. Figure 4 shows that the effects of refraction are almost negligible, ∼ 4 ppm. Additional models considering the effect of refraction for a rocky exoplanet are shown in appendix B.
The standard forward model in Aurora combines the absorption and scattering optical depths into the total optical depth as seen in equation 4. However, it is possible that a portion of the scattered light in the planet's atmosphere will be directed towards the observer. This portion of light is said to be forward scattered. The additional fraction of light reaching the observer results in an attenuation to the transit depth. In Aurora, we can model this by correcting the effective optical depth for the effects of forward scattering. The modified optical where f is the forward scattering correction factor andω o is the single scattering albedo . We calculate the correction factor f using the analytic correction expressed in equation 6 of Robinson et al. (2017) for the Henyey-Greenstein phase function (Henyey & Greenstein 1941). The correction proposed by Robinson et al. (2017) is a function of the stellar radius, the planet-star physical separation, and the asymmetry parameter g. Figure  5 shows the decrease in transit depth due to considering forward scattering, in the same H 2 -rich forward model for K2-18b described above, assuming g = 0.95 andω o = 1. The effect of incorporating forward scattering in the model of K2-18b results in a difference of less than 1 ppm. Models considering forward scattering for a rocky exoplanet are shown in appendix B. The detectability of these secondary effects remains to be confirmed. Current observations using HST, and ground based observatories do not posses the precision necessary to identify them. In the meantime Aurora possesses the capabilities to model these effects in transmission spectra of exoplanets in the context of forward models. The implementation of these models in the context of retrievals remains a possibility for future studies should the data require so.

Mie-Scattering Forward Models
Aurora includes Mie-scattering in the forward models due to condensates with different particle sizes and compositions adopted from Pinhas & Madhusudhan (2017). The effective cross-section for these species is calculated using their extinction and scattering coefficients, along with the corresponding asymmetry parameter g following equation 11 of Pinhas & Madhusudhan (2017), obtained using Mie theory. Figure 6 shows the spectroscopic features of Miescattering for different compositions in the H 2 -rich atmosphere of K2-18b. The models assume the retrieved chemical abundances from the results in section 3.2. The models shown include H 2 -Rayleigh scattering and H 2 -H 2 and H 2 -He CIA. In black we show the H 2 -rich atmosphere only, while in purple, orange, and green the effects of Mie-scattering due to MgSiO 3 , NaCl, and KCl are shown respectively. The assumed abundances for the condensate species is 10 −16 , similar to expectations for NaCl and KCl from equilibrium chemistry calculations (e.g., Woitke et al. 2018) for the equilibrium temperature of the planet of T eq. ∼ 290 K (e.g., , with a particle size of 4.89×10 −2 µm (< 0.1µm, e.g., Adams et al. 2019;Lavvas et al. 2019). As shown in the bottom panel of Figure 6, the maximum difference between the clear H 2 -rich model and the models considering Mie-scattering is ∼ 25 ppm, within the precision limits of current observations. Future observatories with high-precision measurements in the optical wavelengths may be able to distinguish the effects of these condensate species in the atmospheres of exoplanets.

RESULTS
We validate Aurora's new retrieval features on real and synthetic spectro-photometric observations. First we validate our H-rich and non H-rich approaches as well as the new prescription for inhomogeneous cloud and haze cover on the prototypical hot Jupiter HD 209458 b (Henry et al. 2000;Charbonneau et al. 2000) using observations from Sing et al. (2016). Next we test the different nested sampling algorithms included in Aurora using the most recent observations of K2-18b (Foreman-Mackey et al. 2015) from Benneke et al. (2019b), and investigate the robustness of the retrieved abundance estimates comparing them to previous works (e.g., Benneke et al. 2019b;Madhusudhan et al. 2020). Lastly, we investigate future atmospheric constraints of mini-Neptunes and rocky exoplanets using synthetic observations.

Validation of Aurora on hot Jupiter HD 209458 b.
We perform a series of retrievals on the transmission spectrum of HD 209458 b from Sing et al. (2016), composed of spectro-photometric observations with HST-STIS, HST-WFC3, and Spitzer . We use the standard model set-up described in Pinhas et al. 2019;). Our sources of opacity include H 2 -H 2 and H 2 -He CIA, H 2 -Rayleigh scattering, and line opacity due to H 2 O, Na, K, CH 4 , NH 3 , HCN, CO, and CO 2 . We conduct a range of retrievals with different cloud and haze prescriptions, and assumptions of whether the atmosphere is H-rich or not.
We perform retrievals using four models with different considerations for clouds and hazes allowed by our generalised prescription explained in section 2.1.3. Model 0 considers a clear atmosphere (i.e., φ clouds = φ hazes = φ clouds+hazes = 0). Model 1 considers one sector for a clear atmosphere and one sector for the combined effects of clouds and hazes (i.e., φ clouds = φ hazes = 0). Model 2 considers one sector for a clear atmosphere, one sector for the presence of clouds only, and one sector for the presence of hazes only (i.e., φ clouds+hazes = 0). Model 3 considers one sector for a clear atmosphere, one sector for clouds only, one sector for hazes only, and one sector for the combined presence of clouds and hazes (i.e., the full inhomogeneous cloud and haze prescription introduced in this work). For each cloud and haze model above, we perform a retrieval assuming a H-rich  Table C2). Horizontal lines at the bottom of the figure show the wavelength coverage of HST-STIS, HST-WFC3, and Spitzer. Yellow, orange, and blue horizontal lines show the approximate wavelength regions where Na, K, and H2O spectral features are expected, respectively.
atmosphere and a retrieval relaxing such assumption. In summary we perform eight retrievals in this section with the models above, four assuming a H-rich atmosphere and four not assuming a H-rich atmosphere.

A Generalised Cloud and Haze Prescription
We consider a generalised cloud and haze prescription in order to explore a larger parameter space than available when restricting the presence of clouds and hazes to one sector only (i.e., model 1). We find that assuming a H-rich atmosphere or not can result in different solutions when restricting the clouds and hazes to the same region, as in model 1 (e.g., MacDonald & Madhusudhan 2017). This is not the case for any of the other models in this section (i.e., model 0, model 2, or model 3). When assuming a H-rich atmosphere we find that, using any of the models for inhomogeneous cloud/haze cover, the spectrum of HD 209458 b can be explained by two possible scenarios. The first is the known solution with median values of sub-solar 1 H 2 O of log 10 (X H2O ) ∼ −4.5, 1 We clarify that in this context we refer to abundances of H 2 O as sub-solar by assessing them relative to expectations from thermochemical equilibrium for solar elemental abundances (Asplund et al. 2009). For a solar composition, the expectation is a H 2 O abundance of log 10 (X H ; the second is a physically implausible solution with high Na abundances that make up for ∼20% of the atmosphere's composition and an atmosphere fully covered by clouds and hazes (e.g., log 10 (X H2O ) ∼ −2.5, log 10 (X Na ) ∼ −0.7, log 10 (X K ) ∼ −2.4). Both modes are simultaneously retrieved by model 2 and model 3, regardless of whether or not a H-rich composition is assumed. However, when treating clouds and hazes together (i.e., model 1), while assuming a H-rich atmosphere results in the two modes as discussed above, relaxing the H-rich assumption results in only the high Na abundance solution. Therefore, model 1 may be susceptible to potential biases in retrieved solutions when the dominant atmospheric composition may not be assumed to be H-rich a priori. On the other hand, models 2 and 3 provide more generalised parameterisations that do not depend strongly on the Hrich assumption. In what follows we restrict the prior space of the log 10 abundances of Na, K and CO to an upper limit of -1.5, consistent with assumptions in previous studies (

Effect of Cloud and Haze Prescriptions
We re-run all eight cases with the new constraints on the abundances of Na, K and CO. We present the complete set of retrieved parameters for the 4 cloud and haze models assuming a H-rich atmosphere and not assuming a H-rich atmosphere in Table C2 included in the appendix. Figure 7 shows the median retrieved spectrum for model 3 which results in the highest model evidence, while figure 8 shows the H 2 O, Na, and K posterior distributions for all 4 cloud and haze models.
Considering a cloud-free atmosphere (model 0) results in tight H 2 O abundance constraints with precisions smaller than 0.5 dex. Regardless of the treatment for the main gas constituent in the atmosphere, both cloud free retrievals result in sub-solar H 2 O abundances with abundance estimates smaller than the models considering clouds and hazes. These low abundances are the consequence of having a larger observable atmosphere (i.e., larger atmospheric column), unocculted by clouds, in which a small abundance of H 2 O can contribute enough to explain the observations (see e.g., .
Contrary to the cloud-free solutions, the cloudy and hazy scenarios (i.e., models 1, 2 and 3) result in higher H 2 O abundances although with still generally sub-solar values. Models 1, 2 and 3 are consistent in their retrieved parameters when assuming a H-rich atmosphere and when relaxing this assumption. The retrieved H 2 O abundances are consistent with each other and within 1σ between all three cloud and haze models. The same is true for the Na and K abundances. Model 1, consisting of one fraction combining clouds and hazes as in MacDonald & Madhusudhan (2017), results in tighter constraints relative to models 2 and 3. These tighter constraints indicate that part of the parameter space explored by the other two prescriptions was not considered in model 1. The increase in model evidence for model 2 and model 3 relative to model 1 indicate that the increased parameter space contains previously unsampled regions of high likelihood. The retrieved P-T profiles are consistent between models 1, 2, and 3, with a retrieved temperature near the photosphere for model 3 assuming a H-rich atmosphere of T 100mbar = 1308 +345 −278 K, consistent with previous studies (e.g., Welbanks & Madhusudhan 2019; ). On the other hand, the retrieved P-T profile for model 0 is tightly constrained at colder temperatures (e.g., T 100mbar = 852 +20 −12 K for the retrieval assuming a H-rich atmosphere) and inconsistent with the planet's equilibrium temperature (T eq. ∼1450 K, e.g., . When comparing the retrievals assuming H-rich atmospheres, model 3 has the highest model evidence with a value of log(Z) = 958.40. Using model 3 as our reference, model 0 is strongly disfavoured at 4.6σ; model 1 is disfavoured at 1.8σ; and model 2 is weakly disfavoured at 1.4σ. A similar interpretation is available when comparing the non H-rich retrievals amongst themselves.

H-rich vs. Non H-rich Assumptions
We also compare retrievals assuming a H-rich atmosphere against retrievals not assuming a H-rich atmosphere. Relaxing the assumption of a H-rich atmosphere requires an additional parameter to retrieve the volume mixing ratio of a mixture of H 2 and He in solar proportion. This additional parameter results in a decrease in model evidence relative to retrievals assuming a Hrich atmosphere. Retrievals using model 3 favour assuming a H-rich atmosphere at 2.82σ over not assuming a H-rich atmosphere. Despite this decrease in evidence, retrievals not assuming a H-rich atmosphere find that 99.9% of the atmosphere is made up of H 2 and He. By not assuming a H-rich atmosphere a priori, our models are able to robustly confirm that the data corresponds to the atmosphere of a H-rich planet. Our results indicate that assuming a H-rich atmosphere is appropriate for the spectrum of HD 209485b as expected. These results demonstrate for gas giants that both approaches, assuming a H-rich atmosphere or not, are consistent and that the retrieved parameter estimates are robust against either methodology.

Assessing the Highest Evidence Model
The highest evidence model (i.e., model 3, H-rich assumption) results in retrieved abundance estimates for H 2 O, Na, and K that are consistent with previous results (e.g., −1.01 for the H-rich case). This may indicate a tendency to over estimate the Rayleigh-enhancement factor in the hazes when using model 1 (e.g., MacDonald & Madhusudhan 2017). If true, this possibility must be accounted for when studying the nature of super-Rayleigh slopes as performed in recent studies (e.g., Ohno & Kawashima 2020). Similarly, although consistent with each other, the retrieved median value for the scattering slope γ is higher for model 1 than for models 2 and 3. The constraints from the H-rich retrievals are γ = −14.04 +4.53 −2.60 for model 3. We note that the interpretation of the Rayleigh-enhancement factor (a) should be done in conjunction with the value for the scattering slope (γ) as these parameters are correlated. Lastly, the retrieved cloud-top pressure for the gray cloud deck is consistent within 1σ between all approaches with a retrieved value of log 10 (P cloud )=−4.72 +0.99 −0.84 for the model with the highest evidence.
Finally, we compare model 2 to model 3. The retrieved parameters are consistent between the two approaches and have similar precisions. Due to their similar performance and relatively small difference in model evidence, we consider both approaches interchangeable for the effects of this work. In what follows we consider model 2 (i.e., one sector for a clear atmosphere, one sector for clouds only and one sector for hazes only) as our preferred model due to its smaller number of parameters and similar performance to model 3 (i.e., full inhomogeneous cloud and haze prescription). We utilise model 2 as our approach for inhomogeneous cloud and hazes in the remaining of the results section unless otherwise stated.

Testing Multiple Nested Sampling Algorithms
We validate the different nested sampling algorithms in Aurora by performing retrievals using the same model and the same data. We discuss three nested sampling algorithms in section 2.3.1. Four retrievals are performed, one using MultiNest, one using PolyChord, one using Dynesty in its static nested sampling mode, and one using Dynesty in its dynamic nested sampling mode. We use the observed transmission spectrum of K2-18b from Benneke et al. (2019b) including K2 band photometry, HST-WFC3 G141 grism spectra, and Spitzer IRAC photometric observations. The model considers an isothermal and clear atmosphere. We assume a H-rich atmosphere and consider the absorption due to H 2 -H 2 and H 2 -He CIA, H 2 O, CH 4 , NH 3 , CO and CO 2 . In total, the model has 7 free parameters: 5 molecular species, 1 parameter for the temperature of the isotherm, and 1 parameter for the reference pressure. The retrieved parameters are used to produce the forward models in sections 2.4.3 and 2.4.4.
When initialising the nested sampling algorithms, different parameters responsible for the algorithm's settings can be modified. Examples of such parameters are the maximum number of iterations in the sampling algorithm (PyMultiNest), parameters to increase the number of posterior samples produced (PyPolyChord), or the maximum number of likelihood evaluations before terminating (Dynesty). We keep most settings for the nested sampling algorithms to their default values. We only modify parameters needed for a direct comparison, e.g., the number of live points used to sample the prior distributions.
MultiNest was setup with 2000 live points. PolyChord was also setup with 2000 live points and 7 repeats. The number of repeats is specific to PolyChord's settings and it corresponds to the length of the sampling chain used to generate a new live point. The longer the chain, the less correlated the live points and the more reliable the evidence inference is, however, the run takes longer to be completed. The default value for the number of repeats used by PolyChord is 5× the number of dimensions in the problem; that is 5× the number of model parameters. Since we do not need an estimate for the model evidence in this exercise, as we are not comparing the model evidence between samplers, we do not need to choose a significantly larger number of repeats. We find that for our atmospheric model with 7 free parameters (i.e., 7 dimensions), our choice of 7 repeats (i.e., 1× the number of dimensions) is sufficient. For Dynesty, two separate runs were performed: one using static nested sampling, and the other using dynamic nested sampling. For the static Dynesty run we used 2000 live points. Similarly, the dynamic Dynesty run had an initial number of 2000 live points. The Dynesty runs were set up to generate the new live points using multi-ellipsoidal decomposition with uniform sampling, this is so that their sampling methods were similar to MultiNest. Figure 9 presents the retrieved spectra for the data of K2-18b when using each of the different nested sampling algorithms in Aurora, along with the posterior distributions for the parameters of interest when comparing to previous works (e.g., Benneke et al. 2019b;Madhusudhan et al. 2020). The complete list of retrieved parameters is shown in Table 1.
All four nested sampling algorithms included in Aurora retrieve values consistent with each-other and with previous works. The retrieved H 2 O abundance for MultiNest is log 10 (X H2O ) = −2.28 +1.16 −1.15 , for PolyChord it is log 10 (X H2O ) = −2.21 +1.24 −1.20 , for Dynesty in its static run is log 10 (X H2O ) = −2.29 +1.20 −1.15 , and for Dynesty in its dynamic run is log 10 (X H2O ) = −2.32 +1.20 Comparing between the four sampler setups, the posterior distributions obtained for the parameters studied are largely consistent with each other. The retrieved precision on the model parameters is consistent between samplers, with a precision on the H 2 O abundance of ∼ 1.2 dex.
Our comparison shows that despite the differences in sampling algorithms, the parameter posterior distributions are mostly independent of the method employed for a problem of this dimensionality (i.e., 7 model parameters) and with current data quality. We assess the different run times for each of the nested sampling algorithms by performing these retrievals on one thread of a 4 core Intel Core i7-4700MQ CPU at 2.40GHz. The fastest Nested algorithm under these conditions was MultiNest, followed by the static run of Dynesty (∼ 3× longer), the dynamic run of Dynesty (∼ 5× longer) and Polychord (∼ 6× longer). These run times are not necessarily representative of the full potential of each sampler. Different parameters in the setup of each algorithm can result in different run times.
In general, we have shown that Aurora includes the capabilities to retrieve the atmospheric properties of an exoplanet spectrum using different nested sampling algorithms. For current data and models, MultiNest is an optimal tool that retrieves the posterior distribution of our parameters efficiently. As data increases and possible degeneracies between the parameters in our models are exacerbated, or as the dimensionality of our problems and models increases, PolyChord and Dynesty are tools that offer alternative ways to perform retrievals. The user in Aurora has the freedom to choose the optimal tool for the problem at hand.

Application to Mini-Neptune K2-18b
We validate Aurora's retrieval framework on current observations of K2-18b (Benneke et al. 2019b) including K2, HST-WFC3, and Spitzer, spectro-photometric data. Unlike the previous section, we perform retrievals not assuming a H-rich atmosphere. Using a full Bayesian approach, we test the validity of previous assumptions of a H-rich atmosphere when analysing the most recent transmission spectrum of this mini-Neptune. Then, we perform retrievals on HST-STIS and JWST-NIRSpec synthetic observations of the same planet and assess the constraints on the chemical abundances and cloud/haze properties.

Case Study: Current Observations of K2-18b
While previous studies have assumed that K2-18b has a H-rich atmosphere (e.g., Tsiaras et al. 2019;Benneke et al. 2019b;Madhusudhan et al. 2020), the robustness of this assumption has remained untested. Here, we apply the new functionality of Au-rora, to retrieve atmospheric properties of an exoplanet without assuming a H-rich atmosphere to the broadband transmission spectrum of K2-18b. As discussed in sections 3.2 and 3.3, we include spectroscopic observations from HST-WFC3 G141 grism (1.05-1.7 µm), as well as photometric data in the Spitzer IRAC 3.6 and 4.5µm bands, and K2 optical bands (0.43-0.89µm) from Benneke et al. (2019b). We also redo an analysis assuming the planet has a H-rich atmosphere and with more model considerations than the results in section 3.2. By performing both sets of retrievals we can compare their model evidences and assess if the assumption of a H-rich atmosphere is preferred by our retrievals. Furthermore, we expand on previous studies and consider the possibility of O 2 and O 3 absorption for illustration.
The retrievals on the full broadband spectrum of K2-18b consider absorption due to H 2 O, N 2 , CH 4 , HCN, NH 3 , CO, CO 2 and H 2 -H 2 and H 2 -He CIA. A second set of retrievals expands the number of absorbers Note-All retrievals were computed using MultiNest. N/A means that the parameter in question was not considered in the model by construction.
included by considering O 2 and O 3 absorption. Our models employ a full parametric P-T profile, include the presence of H 2 -Rayleigh scattering, and follow our new inhomogeneous cloud and haze treatment using two distinct cloud/haze sectors (i.e., model 2 in section 3.1). We perform retrievals by assuming the atmosphere is Hrich as well as relaxing this assumption. The retrieved parameters are shown in Table 2. Figure 10 shows the retrieved spectra and a subset of the retrieved posterior distributions for the highest evidence models assuming a H-rich atmosphere and not assuming a H-rich atmosphere. We first assess the retrievals when assuming a H-rich atmosphere. . These cases do not consider O2/O3 as these molecules were not preferred by the data (see Table 2). Bottom row shows the posterior distributions for H2+He, H2O, CH4, and NH3 for the two retrievals.
The retrieved parameters when not assuming a Hrich atmosphere are consistent, within 1σ, with the retrieved parameters when assuming a H-rich atmosphere discussed above. Although consistent, this second approach results in wider and higher abundance estimates for all the chemical species considered. The retrieved H 2 O abundances have median values almost 1 dex higher than those obtained when assuming a H-rich atmosphere. These retrieved abundances are log 10 (X H2O ) = −1.20 +1.15 −1.81 (not considering O 2 or O 3 ) and log 10 (X H2O ) = −1.22 +1.18 −2.03 (considering O 2 and O 3 ). Despite the higher H 2 O abundance estimates, the retrievals indicate that the main component of the atmosphere is H 2 and He with retrieved abundances of log 10 (X H2+He )=−0.09 +0.09 −5.71 (not considering O 2 or O 3 ) and log 10 (X H2+He )=−0.14 +0.13 −6.64 (considering O 2 and O 3 ). The retrieved H 2 +He abundance estimates correspond to a median of ∼72-81%, and allow for H 2 +He abundances of less than 1% within 1σ as shown in Figure  10. Assuming a solar He/H 2 ratio of 0.17 (Asplund et al. 2009), the retrieved median H 2 +He abundance estimate of log 10 (X H2+He )=−0.09 (H 2 and He volume mixing ratio of ∼81%) indicates a log 10 (X H2 )=−0.16 (H 2 volume mixing ratio of ∼69%) and log 10 (X He )=−0.93 (He volume mixing ratio of ∼12%).
All other chemical abundances are poorly constrained with most uncertainties greater than 3 dex. Similarly, the cloud and haze parameters remain unconstrained. Overall, retrieving the main gas constituent in the atmosphere of K2-18b using current observations results in a H 2 and He-rich atmosphere (∼72-81% median volume mixing ratio) with strong H 2 O absorption (∼6% median volume mixing ratio), consistent with previous retrieval studies (e.g., Benneke et al. 2019b;Madhusudhan et al. 2020).
The highest model evidence corresponds to the retrieval assuming a H-rich atmosphere and not considering absorption due to O 2 or O 3 . Neither approach, assuming a H-rich atmosphere or not, favours the presence of O 2 and O 3 absorption in the atmosphere of K2-18b. In the H-rich approach, the additional parameter space due to considering the presence of these extra two absorbers dilutes the model evidence to a 1.17σ equivalent level. Likewise, the non H-rich approach results in a decrease in model evidence equivalent to a 1.54σ level when considering absorption due to O 2 or O 3 .
Increasing the parameter space to retrieve the abundance of H 2 and He results in a decrease in model evidence. When not considering O 2 and O 3 absorption, the model evidence for the H-rich assumed retrieval is higher at a 2.68σ level compared to the retrieval without a priori assumptions on the bulk composition of the atmosphere. Similarly, the H-rich assumption is preferred at a 2.79σ level over the non H-rich assumed model when considering absorption due to O 2 and O 3 . This preference of almost 3σ for the H-rich model should not be interpreted as a detection of a H-rich atmosphere on K2-18b but instead must be understood as an indication that the additional parameter space explored by the non H-rich approach is of lower likelihood. On the other hand, the fact that both retrieval approaches infer a H-rich atmosphere can be interpreted as a demonstration that current data favours a H-rich atmosphere on K2-18b.

Future Spectroscopic Observations: K2-18b
In order to investigate the abundance constraints that may be possible with future observations, we generate synthetic HST-STIS and JWST-NIRSpec observations of K2-18b based on the retrieved median H 2 O abundance for our highest evidence model in section 3.3.1. We choose abundances for CH 4 and NH 3 that are ∼ 1× solar (log 10 (X CH4 ) = −3.3, log 10 (X NH3 ) = −3.9, e.g., Woitke et al. 2018;Madhusudhan et al. 2020), consistent with their apparent depletion relative to the retrieved ∼ 10× solar H 2 O abundance (see section 3.3.1, e.g., Madhusudhan et al. 2020). The input model also includes absorption due to HCN, CO, and CO 2 , with an input nominal abundance of 1 ppm. We generate a model spectrum at a constant spectral resolution of R=5000 between 0.3 and 5.5µm. Given that current observations of K2-18b do not place strong constraints on the presence of clouds and hazes, we use input values for the cloud and haze prescription that fall within 1σ of the retrieved parameters in section 3.3.1. These input parameters are a Rayleigh enhancement factor a = 10, a slope γ = −10, a gray cloud deck with a top pressure in bar of log 10 (P cloud )=-1.6, and a 25% cover due to the hazes and 30% cover due to clouds. The input P-T profile is set by the retrieved parameters for the highest evidence model in section 3.3.1.
The synthetic JWST observations are generated using PANDEXO . We generate observations for a transmission spectrum of K2-18b observed with JWST-NIRSpec using its three highresolution gratings (G140H/F100LP, G235H/F170LP, and G395H/F290LP) in the subarray SUB2048 mode, i.e., a total of 3 transits. Further details about the model inputs to PANDEXO are described in appendix D.1. We also model synthetic HST-STIS observations covering the optical wavelengths from ∼ 0.3−1.0µm. Comparing an observed HST-WFC3 transmission spectrum of K2-18b (Benneke et al. 2019b) with that of HD 209458 b (Deming et al. 2013) it is seen that 9 transits of K2-18b provide data of comparable quality, in terms of precision per spectral bin, to 1 transit of HD 209458 b. Since there are no HST-STIS observations of K2-18b available, we derive a synthetic HST-STIS spectrum of K2-18b by scaling the uncertainties and resolution from an observed HST-STIS spectrum of HD 209458 b (Sing et al. 2016) in the same proportion as that of the HST-WFC3 spectra between the two planets. We note that the resulting synthetic observations of K2-18b would require a significantly larger number of transits with HST-STIS than the 9 observed with HST-WFC3. Nevertheless, we consider this optimistic scenario as a test case to demonstrate our retrievals. Our synthetic observations have Gaussian-distributed uncertainties with a mean precision of ∼72 ppm in the STIS G430L band and ∼71 ppm in the STIS G750L band.
The resulting synthetic observations are shown in Figure 11. The synthetic HST-STIS and JWST-NIRSpec observations provide a spectral range of ∼ 0.3-5.0µm, encoding information about the presence of clouds, hazes and absorption due to different species like H 2 O, CH 4 , and NH 3 . We perform a retrieval on these observations considering absorption due to H 2 O, CH 4 , HCN, NH 3 , CO, CO 2 , N 2 , O 2 , O 3 , and H 2 -H 2 and H 2 -He CIA. We employ the same cloud and haze prescription employed in section 3.3.1. We do not assume the bulk composition of the atmosphere and retrieve it instead. Figure 11 shows the retrieved posterior probability distributions for H 2 +He, the detected species H 2 O, CH 4 , and NH 3 , and some relevant cloud/haze parameters. The full posterior distribution is shown in the appendix E. The bottom half in Figure 11 also shows (in gray) the probability distributions obtained with current K2, HST-WFC3, and Spitzer spectro-photometric observations (first column in Table 2). Comparing both gray and green probability distributions it is possible to appreciate that abundance estimates will be largely improved using JWST observations. For the assumed synthetic model and data considerations, we retrieve the abundances to be consistent with the input values at: log 10 (X H2O ) = −1.66 +0.50 −0.55 , log 10 (X CH4 ) = −2.94 +0.35 −0.37 and log 10 (X NH3 ) = −3.79 +0.36 −0.40 . The corresponding detection significances of the molecules are ∼5σ, ∼7σ, and ∼3σ for H 2 O, CH 4 and NH 3 respectively. In principle, even better abundance precisions and detection significances may be attained by combining with other observations (e.g., JWST-MIRI, JWST-NIRISS, HST-WFC3), or considering data of higher resolution. We also note that these precisions and detection significances are dependent on the input model assumptions: ∼10×solar H 2 O and ∼1×solar CH 4 and NH 3 . Never-theless, these results demonstrate the capability of Aurora to precisely retrieve the true input values of a mini-Neptune like K2-18b.
Furthermore, the retrieval on synthetic data demonstrates Aurora's ability to retrieve the bulk atmospheric composition of a mini-Neptune like K2-18b. With a retrieved abundance of log 10 (X H2+He ) = −0.013 +0.010 −0.024 , Aurora correctly identifies H 2 and He as the bulk composition of the atmosphere as shown in Figure 11. The retrieved median abundance indicates that H 2 and He account for more than 97% of the atmosphere's composition, consistent with the input model. Future observations with JWST will make it possible to unequivocally retrieve the bulk gas composition in the atmosphere of K2-18b improving on present day estimates derived using current K2, HST-WFC3, and Spitzer observations.
The cloud and haze parameters in the input model are motivated by current constraints on K2-18b using existing data, as discussed above, which indicate a relatively cloud/haze free atmosphere. Under these cloud/haze assumptions, the retrieved abundance estimates and their detection significances are not strongly affected when only JWST-NIRSpec observations are considered in our retrievals. The retrieved cloud and haze parameters are mostly unconstrained, consistent with the cloud/haze free input model, and similar to constrains obtained with current data (e.g., gray posterior distributions in Figure  11). Even when both HST-STIS and JWST-NIRSpec observations are considered, the cloud/haze constraints are only marginally improved, as shown in Figure 11 and appendix E, as expected considering the low cloud/haze cover in the input model. Regardless of the cloud/haze constraints, however, the chemical abundances are still derived precisely as discussed above.
In principle, further spectroscopic observations, including those with other JWST instruments like NIRISS and MIRI, could help obtain better constraints than those reported here. At the same time, it could also be important to revisit the model assumptions in present retrievals (e.g., by considering higher dimensional models, temporal variability, etc.) when confronted with observations of higher quality (e.g., higher-resolution, better signal-to-noise, broader wavelength coverage) expected in the near future. We discuss these implications and the prospect for future works in section 4.

Application to Rocky Exoplanets.
We demonstrate Aurora's capabilities to identify the composition of atmospheres which are not H-rich. We use synthetic JWST-NIRSpec observations of the rocky exoplanet (i.e., terrestrial-size exoplanet) TRAPPIST- Agol et al. 2020) to validate Aurora's retrieval capability for H-poor atmospheres. Of all seven TRAPPIST-1 planets, TRAPPIST-1 d is the closest to Earth in terms of insolation (see Figure 12). This makes TRAPPIST-1 d an appealing candidate for characterisation with JWST, especially in the context of planets residing in their habitable zone. This opportunity has been recognised by the JWST Guaranteed Time Observations (GTO) program by planning to observe 2 transits of the planet using NIRSpec prism (GTO 1201, PI: David Lafreniere). Figure 12 shows the planet radius versus stellar insolation for the planets in the TRAPPIST-1 system. When compared to planets in the solar system, TRAPPIST-1 d falls between Venus and Earth in terms of insolation. As such, we consider two possible model configurations for our application of Aurora: a CO 2 -rich atmosphere (e.g., loosely similar to Venus' atmosphere), and an N 2rich atmosphere (e.g., loosely similar to Earth's atmosphere with enhanced O 3 ). The CO 2 -rich atmosphere is composed of 98% CO 2 , 1% H 2 O, 11.7 ppm H 2 +He, and N 2 in the remaining percentage (∼ 0.99%). The N 2 -rich atmosphere is composed of 77.51% N 2 , 21.38% O 2 , 1% H 2 O, 0.1% CO 2 , and 0.01% O 3 . We note that  this O 3 abundance is ∼ 10 − 100× higher than present day Earth's atmospheric abundance in the stratosphere (e.g., Anderson 1987; ). The atmospheres are modelled to follow an isotherm at 250 K. We considered three cases for each composition: (1) a clear atmosphere, (2) an atmosphere with a gray cloud deck covering the entire planet at a cloud top pressure P cloud =0.1 bar, and (3) a gray cloud deck covering the entire planet at a cloud top pressure P cloud =1 mbar. The model spectra are generated at a constant resolution of R=5000 between 0.3-5.5µm. Besides demonstrating Aurora's capabilities, we explore the number of JWST transits required for the spectroscopic observations to provide chemical detections and abundance constraints of TRAPPIST-1 d's atmosphere. We consider observations from 1, 2, 3, 5 and 10 JWST transits. We consider 2 transits motivated by the upcoming GTO 1201 program, and an upper limit at 10 transits based on estimates for characterizing the TRAPPIST-1 system from Batalha et al. (2018) (see section 4.2). We empirically find that 10 transits with JWST-NIRSpec prism can provide chemical constraints of ∼ 1 dex or better and robust detections ( 3σ) for multiple chemical species in both the CO 2 -rich and N 2 -rich model atmospheres. The synthetic observations are generated using PANDEXO ) for the NIRSpec prism using subarray SUB512. The synthetic observations include Gaussian white noise. Additional details are described in appendix D.2.
The synthetic observations are generated using models that consider CO 2 -CO 2 and N 2 -N 2 collision induced absorption, as well as Rayleigh-scattering due to N 2 , CO 2 , and H 2 O. The models in the retrievals do not assume a H-rich atmosphere and consider absorption due to H 2 O, CO 2 , N 2 , O 2 , and O 3 . Including the effects of H 2 -H 2 and H 2 -He CIA, and following the description for non H-rich atmospheres in section 2.2, the retrieval considers a total of 6 chemical components: H 2 +He, H 2 O, CO 2 , N 2 , O 2 , and O 3 . The model also considers one parameter for an isothermal temperature profile, and one parameter for P ref . We do not consider the full cloud and haze prescriptions presented in this work due to the lack of observations in the optical required to robustly constrain the presence of hazes. However, we consider the presence of inhomogeneous clouds using two parameters: one for the cloud cover (φ clouds ) and one for the cloud top pressure (log 10 (P cloud )). In summary, the retrieval model has a total of 10 parameters: 6 parameters for the chemical components, 1 parameter for the isothermal temperature, 1 parameter for the reference pressure, and 2 parameters for the presence of inhomogeneous clouds. We begin by analysing the results from our exploration of the number of transits required to characterise TRAPPIST-1 d's atmosphere. Figure 13 shows a summary of the chemical detections for the various numbers of transits considered. We perform this exploration following a conservative approach in which any model preference < 2σ does not constitute a detection (red squares), model preferences > 2σ and < 3σ are suggestive of a chemical detection (yellow squares), and model preferences > 3σ may be considered detections (green squares). Our search suggests that for a CO 2 -rich clear atmosphere, 10 transits with JWST-NIRSpec will be able to provide detections of CO 2 and H 2 O. Likewise, for an N 2 -rich clear atmosphere 10 transits would provide detections of H 2 O, and possibly O 3 if present at enhanced levels (10-100× Earth levels) as assumed in the input model described above. For the N 2 -rich atmosphere, although N 2 is found to be the main atmospheric component, its lack of spectral features makes its robust detection difficult.
However, considering clear atmospheres only results in optimistic estimates that may be revised when considering the presence of clouds and hazes. We consider observing the cloudy cases described above using the same number of transits (right columns of Figure 13). As expected, the presence of a cloud deck mutes several of the spectral features resulting in weaker chemical detections or non-detections.
Next, we present the retrieved constraints using Aurora for the cases with the strongest chemical detections, i.e., 10 transits for a cloud-free model, starting with a clear CO 2 -rich atmosphere. The retrieved chemical abundances of interest and retrieved spectrum are shown in Figure 14 along with the synthetic observations. The retrieved abundances for the species included in the true input model are log 10 (X H2+He ) = −5.53 +3.01 −3.02 , log 10 (X H2O ) = −1.64 +0.63 −0.92 , log 10 (X CO2 ) = −0.07 +0.06 −0.78 , and log 10 (X N2 ) = −4.27 +3.35 −3.76 . The retrieved values are consistent within ∼ 1σ of the true model input values. Aurora is capable of accurately distinguishing the main constituent of the modelled CO 2 -rich atmosphere, with the posterior distribution of CO 2 corresponding to high abundances. Furthermore, the precisions on the retrieved H 2 O abundance is 1 dex, comparable to current chemical constraints for gas giants (e.g., . Although the input model corresponds to a cloud free atmosphere, we consider in our retrieval model the possibility of inhomogenous cloud cover. The cloud parameterization retrieves a cloud cover of φ clouds = 0.37 +0.39 −0.25 and log 10 (P cloud )= 0.09 +1.22 −3.08 , consistent with a clear atmosphere (e.g., relatively small cloud fraction cover) with the gray cloud deck placed below the expected pho-tosphere (e.g., not muting spectral features) in agreement with the input model. Overall, these results indicate that the chemical characterisation of a CO 2rich cloud-free atmosphere is possible with 10 JWST-NIRSpec transits. Under these conditions, Aurora is capable of detecting the main component of the atmosphere (CO 2 ) at 4.7σ and the trace gas (H 2 O) at 4.3σ.
We briefly mention the results from considering the more challenging scenario of a cloudy CO 2 -rich atmosphere. As described above, we consider scenarios with 100% cloudy atmospheres with cloud top pressures of 0.1 bar and 1 mbar. While both cases indicate a CO 2 rich atmosphere, the muted spectral features result in weaker detections of the main atmospheric constituent (e.g., 4.3σ for 0.1 bar and ∼ 2σ for 1 mbar, see Figure  13). Although not resulting in strong detections, Aurora can correctly identify that the observations correspond to a cloudy atmosphere, with the 1 mbar case retrieving φ clouds = 0.82 +0.13 −0.25 and log 10 (P cloud )= −2.80 +1.25 −1.24 (i.e., relatively high cloud cover fraction and a top cloud deck pressure above the expected photosphere). In agreement with other studies (see section 4.2), our results suggest that robustly characterising the atmosphere of a cloudy rocky exoplanet will need more than 10 JWST-NIRSpec transits. Next, we present the results from retrieving the clear N 2 -rich atmosphere with 10 JWST-NIRSpec transits.
Aurora retrieves abundances of log 10 (X N2 ) = −0.0037 +0.0035 −2.2195 , log 10 (X H2O ) = −3.06 +2.27 −0.81 , log 10 (X CO2 ) = −4.18 +1.97 −1.23 , and log 10 (X O3 ) = −4.12 +1.56 −0.68 , for the species with detection significances 2σ. The retrieved values are consistent within 1σ of the input parameter despite the white noise in the observations. The retrieved cloud parameters are φ clouds = 0.51 +0.31 −0.34 and log 10 (P cloud )= −0.34 +1.12 −1.86 , consistent with a cloud-free atmosphere due to the retrieved cloud deck top pressure being below the expected photosphere. Figure 15 presents the synthetic observations as well as the retrieved spectrum and the posterior distributions for the chemical species of interest.
Although the retrieval indicates that N 2 is the main component of the atmosphere, the lack of strong spectral features results in a moderate detection of N 2 with 10 JWST-NIRSpec transits. Besides identifying the main component of the atmosphere, Aurora is able to detect H 2 O (6.1σ) and O 3 ( 2σ) with 10 transits and provide constraints in their abundances to a precision of ∼1.5 dex. Although optimistic, these estimates present a tantalising prospect for the detection of possible biosignatures in habitable zone rocky exoplanets. If present in ∼10-100× higher abundance than presentday Earth stratospheric levels (e.g., Anderson 1987, see section 4.2), 10 NIRSpec prism transits could provide an initial indication of O 3 in TRAPPIST-1 d.
As performed with the CO 2 -rich case, we investigate the effect of a cloud deck at 0.1 bar and 1 mbar on the estimates above. As shown in Figure 13, the presence of a cloud deck results in weaker or no chemical detections. Nonetheless, the retrieved cloud parameters are mostly consistent with the input cloudy models, with the 1 mbar case retrieving φ clouds = 0.89 +0.07 −0.12 and log 10 (P cloud )= −4.62 +1.07 −0.87 . In agreement with the CO 2rich case, and as expected, this N 2 -rich case suggests that a cloudy atmosphere is more difficult to characterise than a clear-atmosphere.
The number of transits with JWST required for the chemical characterisation of rocky exoplanets can vary depending on the system parameters, instrument of choice, and the desired precision on the retrieved atmospheric properties. As such, our result of 10 JWST-NIRSpec transits is specific to the cases considered here.
We discuss additional considerations that could revise these results as well as future considerations in section 4.2.

SUMMARY AND DISCUSSION
In this work we introduce Aurora, a next-generation atmospheric retrieval framework for transmission spectra of a wide range of exoplanets: from gas giants with H-rich atmospheres to rocky exoplanets with secondary non H-rich atmospheres. Aurora retains the capabilities from previous retrieval codes and presents advancements for the analysis of future observations. These key advancements are: • Aurora can retrieve the bulk composition of any exoplanet atmosphere without the assumptions of a H-rich atmosphere. The retrieved parameter estimates are robust for H-rich and H-poor atmospheres. We demonstrate this on current and synthetic observations of hot Jupiters, mini Neptunes and rocky exoplanets.
• We introduce a new generalised treatment for inhomogeneous clouds and hazes. The new cloud and haze prescription explores a larger parameter space compared to previous treatments. Our new approach mitigates some biases and limitations in previous prescriptions and is robust when assuming a H-rich atmosphere or not.
• Aurora incorporates in its retrieval framework the next generation of nested sampling algorithms. These are highly adaptable and designed for handling highly degenerate problems and problems of higher dimensionality. This advancement is key in the development of multidimensional retrieval techniques and alleviates some of their computational needs.
• Aurora has a modular structure designed to evolve with the needs of future spectroscopic observations. The new modular capabilities include: -Noise modelling capabilities beyond the traditional assumed independently distributed Gaussian errors. These include the ability to retrieve an inflation for the standard deviation of observations and consider correlated noise using Gaussian processes. -Forward models considering the effects of light refraction, forward scattering, and Miescattering due to condensates.
In this work we have validated Aurora's Bayesian retrieval framework using up-to-date existing observations of the hot Jupiter HD 209458 b and the mini-Neptune K2-18b. We further validate Aurora's retrieval framework using HST-STIS and JWST-NIRSpec synthetic observations for K2-18b, and JWST-NIRSpec synthetic observations for the rocky exoplanet TRAPPIST-1 d. Our results highlight 4 findings: • For hot Jupiters, the retrieved parameter estimates are robust against assumptions of a Hrich atmosphere or not. The cloud and haze prescription introduced in this work results in a higher model evidence than previous inhomogeneous cloud and haze prescriptions when applied to current observations of the well-studied hot Jupiter HD 209458 b.
• For current observations of mini Neptunes, we have demonstrated that the atmosphere of K2-18b is H-rich. Furthermore, the nested sampling algorithms included in Aurora retrieve almost identical parameter estimates. The retrieved properties of K2-18b are consistent with previous results.
• For future observations of mini Neptunes with JWST, abundance estimates could result in precisions of ∼ 0.5 dex or better. Abundance estimates obtained with JWST observations can be robust even in the absence of observations in the optical wavelengths, for relatively low cloud/haze covers as for the case of K2-18b.
• For future observations of rocky exoplanets, Aurora can robustly identify their dominant atmospheric composition as well as reliably detect and constrain the abundance of trace gases. For example, 10 JWST transits of TRAPPIST-1 d could enable clear detections and abundance constraints of H 2 O in a cloud-free N 2 -rich or CO 2 -rich atmosphere. Furthermore, 10 JWST transits could enable initial indications of O 3 if present at enhanced levels (∼ 10 − 100× present day Earth's stratospheric abundances) in a cloud-free N 2 -rich atmosphere.
We discuss the implications of our results for the analysis of current and future observations of hot Jupiters, mini Neptunes and rocky exoplanets.

Constraining the Composition of Mini-Neptunes
Recent spectroscopic observations (e.g., Benneke et al. 2019a,b;Tsiaras et al. 2019) have demonstrated that mini-Neptunes are advantageous targets in the search of H 2 O vapour and other possible molecular features in low-mass exoplanets. The lack of an analogue for this type of planet in our solar system, represents a unique opportunity to learn about the diversity of planet configurations and compositions. Straddling the gap between terrestrial planets and ice giants, it is not always clear whether the atmospheres of some of these planets are H-rich or not. Similar to our approach in section 3.3.1, we suggest that the interpretation of future observations of mini-Neptunes and possible super-Earths should begin by not assuming a bulk atmospheric composition. One should perform an agnostic retrieval first, and retrieve the main atmospheric constituent. Then, and if the data suggests the planet's atmosphere is H-rich, a second retrieval assuming an H-rich atmospheric composition could be informative and should be performed. In this context the two retrieval approaches should be seen as complementary and informative.
Considerations about the presence of clouds and hazes in these mini-Neptunes remain to be explored. The possible absence of clouds in the observable atmosphere of temperate planets like K2-18b, as presented here and in agreement with previous studies Madhusudhan et al. 2020), represents a surprise when compared to hotter cloudy planets like GJ 1214b (Kreidberg et al. 2014b). On the other hand, the possibility of Mie scattering clouds as recently argued in the atmosphere of GJ 3470b (Benneke et al. 2019a) could indicate a complex diversity in the presence of condensates in this type of planets.
Future studies could investigate the need for more complex cloud and haze models in retrievals when interpreting observations of mini-Neptunes. For instance, while Benneke et al. (2019a) develop and implement a new Mie-scattering cloud parameterization for atmospheric retrievals in order to explain the observations of GJ 3470b,  explain the same observations without invoking Mie-scattering and implementing previous prescriptions for inhomogenous clouds and hazes. Investigating the performance of those cloud prescriptions and the one introduced here could elucidate whether the apparent drop in transit depth in the spectro-photometric observations of GJ 3470b indeed requires invoking Mie-scattering particles. Furthermore, incorporating the Mie-scattering module available in Aurora into its retrieval framework, could provide further insights into the atmospheric nature of this and other planets.
Likewise, future studies may investigate how the limitations of previous inhomogenous cloud and haze prescriptions unveiled in this work affect recent studies that investigate the influence of cloud model choices on retrieval solutions (e.g., Barstow 2020). A full discussion of the different degeneracies and biases in different pre-scriptions for the clouds and hazes is beyond the scope of this work and reserved for future investigations.
Nonetheless, in order to robustly constrain the presence and properties of clouds and hazes, spectroscopic observations in the optical wavelengths are essential (e.g., Line & Parmentier 2016;. While the results in section 3.3.2 suggest that state-of-the-art spectroscopic observations in the optical with HST-STIS may not be precise enough to robustly constraint the properties of cloud and hazes for the marginal cloud/haze cover in our K2-18b test case, achievable constrains with HST-STIS for instances with enhanced cloud and haze cover and for other mini-Neptunes remain to be explored. Future studies could also explore constraints on the properties of clouds and hazes using observations from multiple instruments on JWST, HST, and ground based facilities.

Constraining the Composition of Rocky Exoplanets
Constraining the chemical composition of rocky exoplanet with heavy mean molecular weight atmospheres, needs dedicated observational efforts. Exploratory studies using synthetic observations, as performed here and other studies discussed below, can inform the requirements for future observational campaigns. On the number of transits required for the characterisation of a TRAPPIST-1 d like planet, our results are broadly consistent with previous studies of the TRAPPIST-1 planets (e.g., Morley et al. 2017;Krissansen-Totton et al. 2018;Batalha et al. 2018;Lustig-Yaeger et al. 2019;Wunderlich et al. 2019).
For instance Batalha et al. (2018), using models for TRAPPIST-1 f that consider the presence of a gray cloud deck and an information content based approach (Batalha & Line 2017), find that the NIRSpec prism could detect and constrain the dominant atmospheric absorber in H 2 O-rich and CO 2 -rich atmospheres of rocky exoplanets by the 10th transit to uncertainties smaller than 0.5 dex. Batalha et al. (2018) argue that if the dominant absorber has not been observed by the 10th transit it is unlikely that more transits could provide more information. Naturally, our suggestion of characterising a rocky exoplanet using 10 JWST transits is valid only if a featureless spectrum has been ruled out using the first few transits.
If not pursuing a robust chemical characterisation, a fewer number of transits could help identify spectroscopic features and reject a featureless spectrum. Our results broadly agree with the study of Lustig-Yaeger et al. (2019) who find that two NIRSpec prism transits are enough to rule out a featureless spectrum for TRAPPIST-1 d, although they use a signal to noise metric and we use a Bayesian detection significance metric. Additionally, Lustig-Yaeger et al. (2019) employ atmospheric models from Lincowski et al. (2018) considering self-consistent atmospheric compositions, and find that CO 2 could be weakly detected in TRAPPIST-1 d using one transit of JWST-NIRSpec prism in a variety of O 2 and CO 2 rich atmospheres. While based on simpler atmospheric models and limited to the CO 2 -rich composition, our results suggest that for a cloud free atmosphere two transits of JWST-NIRSpec prism may suffice to provide initial detections (∼ 2σ) of the main atmospheric component in TRAPPIST-1 d.
Nevertheless, our results are limited to the specific model considerations we have investigated. Modifying our model assumptions of a cloud-free atmosphere, limited number of absorbers, lack of stellar contamination, isothermal temperature profile, amongst others, could result in a larger number of transits required for the desired atmospheric constraints. For instance, using more complex general circulation models (GCM), Komacek et al. (2020) demonstrate that ∼10 NIRSpec prism transits would be required to detect H 2 O vapour in the atmosphere of a terrestrial-size exoplanet orbiting a latetype M dwarf when ignoring the effect of clouds and using a similar signal to noise metric to Lustig-Yaeger et al. (2019). This result is broadly consistent with our estimate and that of other studies (e.g., Batalha et al. 2018). However, when the effect of clouds is considered, Komacek et al. (2020) find that 63 or more transits are required to detect water. Their results, and our exploration of cloudy models in CO 2 -rich and N 2 -rich atmospheres, suggest that the presence of clouds may significantly increase the number of transits required to detect water features.
Similarly, considering the effect of stellar contamination could significantly affect our interpretations. Recently, Rackham et al. (2018) argue that the stellar contamination impact in the transmission spectra of the TRAPPIST-1 planets can be comparable to or larger than the signal produced by an atmospheric feature. In that case, not accounting for stellar contamination could result in a false positive and be a limiting factor in obtaining reliable abundance constraints. Future studies could investigate the effect of stellar contamination in retrievals (e.g., as in Pinhas et al. 2018) for super-Earths/mini-Neptunes using the stellar heterogeneity module included in Aurora and revisit our reported estimates.
Furthermore, considering more complex atmospheric compositions with multiple absorbers as investigated by Morley et al. (2017) could better inform our esti-mates. In their study, Morley et al. (2017) use radiativeconvective models of the TRAPPIST-1 planets assuming Earth-like, Venus-like, and Titan-like atmospheres to determine the number of NIRSpec/NIRISS transit observations required to rule out a flat spectrum at ∼ 5σ confidence. Their results suggest that as few as 13 transits could rule out a flat spectrum for a Venus-like atmospheric composition on TRAPPIST-1 d. While our results seem to be broadly consistent with those of Morley et al. (2017), albeit more optimistic, the impact of non-isothermal profiles and other chemical compositions on our results remains to be investigated.
Lastly, the number of transits required for characterising H 2 O and CO 2 may not be representative of the requirements for detecting and characterising other chemical species, including possible biosignatures. For instance,  investigate the number of transits required to detect O 3 in the atmosphere of TRAPPIST-1 d. Their study assumes an atmospheric chemistry identical to Earth's present day atmosphere and employs an optimal estimation retrieval algorithm with isothermal models with clouds deep in the atmosphere where they do not have a significant effect on the spectrum. Their results suggest that present-day Earth levels of O 3 would be detectable with 30 transits of NIRSpec prism and MIRI Low-Resolution Spectrometer. Our results suggest that 10 transits of TRAPPIST-1 d with NIRSpec prism could provide initial indications of O 3 in a N 2 -rich cloud-free atmosphere if present at enhanced abundances (∼ 10−100× present-day stratospheric Earth levels, e.g., Anderson 1987; Barstow & Irwin 2016; . Future studies using Aurora could further investigate the requirements for the detection and robust characterisation of O 3 and other possible biosignatures (e.g., Krissansen-Totton et al. 2018;Wunderlich et al. 2019).
The characterisation of rocky exoplanets with JWST remains an attractive avenue in the search for atmospheric features in habitable zone planets and the search for possible bio-signatures. Although our results indicate that precise abundance constraints will be possible with the upcoming generation of telescopes, several outstanding considerations mentioned above need to be explored. Particularly the presence of clouds, hazes, and stellar contamination may present a significant hindrance in the characterisation of rocky exoplanets. If true, temperate cloud-free sub-Neptunes like K2-18b may be the best targets for atmospheric characterisation of low-mass exoplanets.

On Multidimensional Effects
Modelling the presence of inhomogeneities in the atmospheric properties of a planet requires models beyond 1-dimensional considerations. Depending on their degree of inhomogeneity, these irregularities can potentially affect the retrieved atmospheric properties. For instance, in this work we have explored how inhomogeneities in cloud and haze cover affect the retrieved chemical abundances when assuming or not a H-rich atmosphere. Aurora currently employs a combination of 1-dimensional models to capture the multidimensional effect of inhomogeneous cloud and haze cover. Nonetheless, other heterogeneities and their multidimensional nature can also affect the retrieved atmospheric properties. Recent studies have explored possible limitations of 1-dimensional retrievals in the context of transmission spectroscopy (e.g., Changeat et al. 2019;MacDonald et al. 2020;Pluriel et al. 2020).
For instance, compositional differences in the atmospheric chemistry of exoplanets is an effect largely expected for ultra hot Jupiters (UHJs) with day-side temperatures 2200 K (e.g., Arcangeli et al. 2018;Parmentier et al. 2018). These highly irradiated, tidally locked planets, can exhibit large contrasts between the atmospheric temperature of their day-side and their nightside, which in turn can result in strong variations in their atmospheric composition. Retrievals of UHJs should not assume a homogeneous chemical composition or temperature structure. Aurora's retrieval framework is currently designed for planets without strong temperature inhomogeneities across the terminator (e.g., low or moderately irradiated planets). However, we have designed Aurora with the implementation of multidimensional effects in mind.
Our current retrieval framework can be readily generalised to incorporate multi-dimensional P-T profiles and non-uniform mixing ratios, just as we have done for clouds and hazes. The inclusion of new nested sampling algorithms, optimised for the treatment of highdimensional parameter spaces and highly degenerate solutions, aids our retrieval framework in these future developments. Future works can expand the current retrieval framework to consider the effects necessary for the appropriate study of UHJs. Nevertheless, these considerations are not imperative for the planets considered in this study where large compositional/temperature gradients between the day-side and night-side of a planet are not expected.

Concluding Remarks
Currently over 50 transiting exoplanets have been observed with transmission spectroscopy and nearly 20 chemical species have been detected in exoplanetary atmospheres (e.g., Madhusudhan 2019). While observations of hot gas giants with H-rich atmospheres have been the most abundant, advancements in observing facilities (e.g., Gillon et al. 2011Gillon et al. , 2017, large observing campaigns (e.g., Kreidberg et al. 2014b;Benneke et al. 2019b), as well as the so called M-dwarf opportunity (e.g., Scalo et al. 2007;Charbonneau & Deming 2007) have allowed for tantalising transmission spectra of mini-Neptunes and super-Earths. The imminent launch of JWST and the continuous observing efforts with HST and ground facilities promise to reveal many more spectra of transiting exoplanets, including the prospect of spectral features in non H-rich atmospheres. From hot Jupiters to temperate low-mass exoplanets, their spectra could provide further insights into their formation paths, possible trends in compositions and maybe even their prospects for habitability. It is in this context that retrieval capabilities like Aurora could play an important role in accurate interpretation of spectroscopic observations. APPENDIX A. PRIORS USED IN THIS WORK Table A1. Parameters and priors used in this work.

B. ADDITIONAL FORWARD SCATTERING AND REFRACTION MODELS FOR A ROCKY EXOPLANET.
For completion, we consider the effects of wavelength-dependent refraction and forward scattering presented in section 2.4.3 on the rocky exoplanet TRAPPIST-1 d. We consider the same N 2 -rich atmospheric model used in section 3.4. Namely, we consider an atmosphere composed of 77.51% N 2 , 21.38% O 2 , 1% H 2 O, 0.1% CO 2 , and 0.01% O 3 . The model spectra are generated at a constant resolution of R=5000 between 0.3-5.5µm, and smoothed for the purposes of Figure B1.
When considering the effects of wavelength-dependent refraction, we specifically include N 2 refraction as our atmospheric model is N 2 -rich. The forward model considering the effects of forward scattering uses the same assumptions as in section 2.4.3 (i.e., g = 0.95 andω o = 1) and the planet-star orbital separation reported by Agol et al. (2020). Figure B1 shows these forward models and their respective model residuals.
These model considerations have a stronger effect in this rocky exoplanet test case than in the mini-Neptune test case presented in section 2.4.3. The impact of N 2 refraction results in a model difference of ∼ 12 ppm. On the other hand, forward scattering results in a model difference 2 ppm. The impact of these considerations is ∼ 3× stronger in this TRAPPIST-1 d case than in the K2-18b case presented in the main text. Nonetheless, these results are dependent on the assumed synthetic model considerations and should not be generalised to other test cases beyond the ones considered here.  The assumed stellar spectrum for K2-18 is interpolated from the Phoenix Stellar Atlas (Husser et al. 2013) by PANDEXO assuming T eff = 3457 (Benneke et al. 2019b), [Fe/H]=0.12 (Sarkis et al. 2018), log 10 (g)=4.858 cgs (Crossfield et al. 2016), R star = 0.4445R (Benneke et al. 2019b), and a K-band magnitude of 8.899. We assume a transit duration of 2.663 hours (Benneke et al. 2017). The in transit and out of transit fluxes are computed using this stellar spectrum and a constant transit depth model assuming a planet radius of 2.61 R ⊕ (Benneke et al. 2019b).
Our simulated observations assume a saturation limit of 80% full well and an extremely optimistic noise floor of 5 ppm. Our assumed noise floor is much smaller than expectations from Greene et al. (2016) and Beichman & Greene (2018). The upcoming JWST Cycle 1 will better inform the existence of a possible systematic noise floor. Nonetheless, given that the noise in our synthetic observations at the native resolution of the gratings is much higher than noise floor expectations, our choice of 5 ppm has little impact on our results. We generate these synthetic observations for 1 transit and the number of groups per integration suggested by PANDEXO following the optimize option. The number of groups per integration is 13, 14, and 25 for G140H, G235H, and G395H respectively. Once the observations are simulated using PANDEXO, we proceed bin every two pixels to obtain individual spectral resolution elements. Subsequently, we remove any resolution elements that fall within the gaps in the spectral configurations chosen; namely 1.31-1.35 µm for G140H/F100LP, 2.20-2.27 µm for G235H/F170LP and 3.72-3.82 µm for G395H/F290LP. After this, we remove any outlier elements with noise levels above 500 ppm for G140H and G235H, and 400 ppm for G395H. We proceed to bin the observations every 40 resolution elements, remove any binned elements that fall within the gaps in the spectral configurations chosen, and obtain an average resolution of R∼70. The synthetic observations have a mean precision of ∼ 26 ppm, ∼ 30 ppm, and ∼ 39 ppm, for G140H, G235H, and G395H, respectively.

D.2. PANDEXO Input for TRAPPIST-1 d
The assumed stellar spectrum is interpolated from the Phoenix Stellar Atlas assuming T eff = 2566 K (Agol et al. 2020), [Fe/H]=0.04, R star = 0.1170R (Gillon et al. 2017), log 10 (g)=5.2396 cgs (Agol et al. 2020) and a K-band magnitude of 10.296. We assume a transit duration of 48.87 minutes (Agol et al. 2020). The in transit and out of transit fluxes are computed using this stellar spectrum and a a constant transit depth model assuming a planet radius of 0.788 R ⊕ (Agol et al. 2020).
We maintain the assumption of a saturation limit of 80% full well and an optimistic noise floor of 5 ppm. We generate the observations for 10 NIRSpec prism transits and 2 groups per integration. The instrument configuration for PANDEXO uses aperture S1600A1 and subarray SUB512. We bin every two pixels to obtain individual resolution elements. We remove any elements with precisions larger than 300 ppm, leaving elements 0.7µm and 5.3µm.