Alfnoor: assessing the information content of Ariel's low resolution spectra with planetary population studies

The ARIEL Space Telescope will provide a large and diverse sample of exoplanet spectra, performing spectroscopic observations of about 1000 exoplanets in the wavelength range $0.5 \to 7.8 \; \mu m$. In this paper, we investigate the information content of the low-resolution transmission spectra of ARIEL's Reconnaissance Survey. One of the goals of the Reconnaissance Survey is to identify planets without molecular features in their atmospheres. In this work, (1) we present a strategy to select candidate planets to be re-observed in one of ARIEL's higher-resolution Tiers; (2) we propose a metric to preliminarily classify exoplanets by their atmospheric composition without performing an atmospheric retrieval; (3) we explore other methods, such as machine learning, to better exploit the scientific content of the data.


INTRODUCTION
In the last decade the number of known exoplanets has increased tenfold: at the end of 2009 around 400 exoplanets were known, while at the end of 2019 the confirmed discoveries reached more than 4000. This rapid increase in the exoplanetary science yield is expected to continue and it will affect not only the number of discovered planets, but also our knowledge of planetary formation and evolution. While the discoveries will increase thanks to space missions such as TESS (Ricker et al. 2016), CHEOPS (Cessa et al. 2017), PLATO (Rauer et al. 2014) and GAIA (Gaia Collaboration et al. 2016), and to ground instrumentation such as HARPS (Mayor et al. 2003), HATnet (Bakos 2018), WASP (Pollacco et al. 2006), KELT (Pepper et al. 2018), OGLE (Udalski et al. 2015), NGTS (Wheatley et al. 2013) and many others, our understanding of planets' histories can only grow through planetary composition analysis.
The most effective strategy used today to reveal the atmospheric chemistry and thermodynamics of transiting exoplanets is multi-band photometry and spectroscopy (e.g. Seager & Sasselov 2000; Charbonneau et al. 2005; Tinetti et al. 2007; Sing et al. 2016; Madhusudhan et al. 2012; Huitson et al. 2012; Kreidberg et al. 2014; Edwards et al. 2020; Pluriel et al. 2020b; Guilluy et al. 2021; Mugnai et al. 2021). Current instrumentation has enabled this kind of atmospheric characterisation for a few tens of exoplanets over a limited wavelength range (e.g. Sing et al. 2016; Tsiaras et al. 2018). To interpret the observed spectra, spectral retrieval techniques, often developed for the study of the Earth and Solar System planets, have flourished and have been adapted to this new field of investigation (e.g. Irwin et al. 2008; Line et al. 2013; Waldmann et al. 2015b; Gandhi & Madhusudhan 2017; Al-Refaie et al. 2021). Most recently, an intense effort has been made to compare and validate models developed by different teams and to assess potential discrepancies among them (Barstow et al. 2020), demonstrating the robustness and consistency of those models.
Tier 1 was created to deliver a reconnaissance survey in which all planets are first observed at low spectral resolution; only a subset of Tier 1 planets will be further observed to reach SNR ≥ 7 at higher spectral resolution (Tiers 2 and 3). Tier 1 observations reach SNR ≥ 7 when the raw spectra are binned into a single spectral point in NIRSpec, two in AIRS-CH0 and one in AIRS-CH1, for a total of 4 spectral and 3 photometric data points. For ∼50% of all observed planets, Ariel will provide spectra at Tier 2 resolution: in this Tier, the raw spectra are binned at R = 10, 50 and 15 in NIRSpec, AIRS-CH0 and AIRS-CH1, respectively, with SNR ≥ 7. Tier 3 is meant to provide spectra with SNR ∼ 7 for 5 to 10% of all observed targets; in this Tier the raw spectral data are binned at R = 20, 100 and 30 in NIRSpec, AIRS-CH0 and AIRS-CH1, respectively. Finally, Tier 4 is conceived for bespoke and phase-curve observations. One of the main goals of Tier 1 observations is to identify planetary spectra that show no molecular absorption features, and to select the targets to be re-observed in the subsequent Tiers.
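The Tier resolutions above are quoted as resolving powers, $R = \lambda/\Delta\lambda$. As a minimal sketch, assuming a simple geometric (constant-R) grid rather than the actual Ariel binning scheme, bin edges for a channel can be generated as below. The function name `constant_r_edges` is hypothetical, and the channel limits are those used later in the text for the flatness test.

```python
import numpy as np


def constant_r_edges(wl_min: float, wl_max: float, resolving_power: float) -> np.ndarray:
    """Return wavelength bin edges (micron) at constant resolving power.

    Assumption (illustrative, not the Ariel pipeline): each bin satisfies
    lambda_left / delta_lambda = R, i.e. edges grow by a factor (1 + 1/R).
    The last edge is moved to wl_max, so the final bin absorbs the remainder.
    """
    step = 1.0 + 1.0 / resolving_power
    n_bins = max(1, int(np.floor(np.log(wl_max / wl_min) / np.log(step))))
    edges = wl_min * step ** np.arange(n_bins + 1)
    edges[-1] = wl_max  # close the grid exactly at the channel boundary
    return edges


# Hypothetical usage: Tier 2 resolving powers with the channel limits of the text.
tier2 = {"NIRSpec": (1.1, 1.95, 10), "AIRS-CH0": (1.95, 3.9, 50), "AIRS-CH1": (3.9, 7.8, 15)}
for name, (lo, hi, r) in tier2.items():
    print(name, len(constant_r_edges(lo, hi, r)) - 1, "bins")
```

A geometric grid keeps $\lambda/\Delta\lambda$ fixed across the channel. Letting the last bin absorb the remainder avoids a vanishingly narrow final bin.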
The aim of this paper is threefold: 1. to show that planets with featureless spectra, which may not be observed again in subsequent Tiers, can be selected without resorting to retrieval techniques; 2. to introduce a metric, and show its main applications, as a tool to classify Tier 1 planets by their molecular content, to aid in the selection of targets to be re-observed in subsequent Tiers; 3. to show that other strategies to exploit Ariel Tier 1 data, such as those based on machine learning, are possible.
In Sec. 2 we present our strategy to address these three goals. Our new software, Alfnoor, which builds entire planetary populations, is presented in Sec. 2.1. In Sec. 2.2 we discuss the targets chosen to build the populations and the atmospheric properties used. We then describe a method to identify the flat spectra in the sample (Sec. 2.3), which is the first goal of this paper; the results of this method are described in Sec. 3.1. Next, we describe the metric developed for the second goal of this paper (Sec. 2.4) and we introduce a classification algorithm against which to compare the metric (Sec. 2.4.1). We present in detail the results obtained by our algorithm (Sec. 3.2), we show the relation between the metric and the input molecular abundances of the planets, and we discuss biases and limitations. Finally, we provide a preliminary assessment of the application of Machine and Deep Learning techniques to the problem of spectra classification in Sec. 2.5, discussing their performance in Sec. 3.3 but leaving a more thorough investigation to future work. In Sec. 4 we discuss and compare the results in detail.

Ariel will provide a sample of hundreds of planetary spectra. To simulate this data set we developed a new algorithm, Alfnoor, "the thousand lights simulator", which was also used for Tier 2 data in Changeat et al. (2020). Alfnoor is a wrapper of TauREx 3 (Al-Refaie et al. 2021) and ArielRad. TauREx 3 is a complete rewrite of the atmospheric retrieval code TauREx (Waldmann et al. 2015a,b). ArielRad is the Ariel radiometric model: given the Ariel payload and mission strategy descriptions, it simulates the signal propagating from a candidate target through the instruments and returns the expected instrument noise. ArielRad can therefore compute the number of observations needed to meet each of the Ariel Tier requirements (i.e. to reach a minimum SNR of 7 at the Tier spectral resolution).
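As a back-of-the-envelope illustration of the last point, assume uncorrelated (white) noise, so that the SNR of stacked transits grows as $\sqrt{N}$. This is a simplification, not ArielRad's actual noise model. Under that assumption, the number of transits needed to reach the Tier requirement, and the corresponding rescaled noise, follow directly. `transits_needed` and `stacked_noise` are hypothetical helper names.

```python
import math


def transits_needed(snr_single: float, snr_target: float = 7.0) -> int:
    """Transits needed for the stacked SNR to reach snr_target.

    Assumes white noise: stacking N transits improves the SNR by sqrt(N).
    """
    if snr_single <= 0:
        raise ValueError("snr_single must be positive")
    return max(1, math.ceil((snr_target / snr_single) ** 2))


def stacked_noise(sigma_single: float, n_transits: int) -> float:
    """Noise per spectral bin after averaging n_transits (white-noise assumption)."""
    return sigma_single / math.sqrt(n_transits)


# Example: a bin with SNR = 2 in one transit needs (7 / 2)**2 = 12.25 -> 13 transits.
n = transits_needed(2.0)
print(n, stacked_noise(100e-6, n))
```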
By combining the two codes, Alfnoor produces the high-resolution atmospheric forward model of a planet with TauREx 3, bins the spectrum down to the Ariel Tier wavelength grid and adds the expected noise estimated by ArielRad. Alfnoor therefore returns a simulation of the planet spectrum as observed in each of the Ariel mission Tiers. By iterating this procedure over different planets or compositions, Alfnoor automates the construction of entire planetary populations, and hence of a data set representative of the one Ariel will provide.
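The binning step can be sketched as a plain average of the high-resolution forward model inside each Tier bin. This is a minimal illustration that assumes simple unweighted averaging; Alfnoor's actual binning, which relies on TauREx 3, may differ. The name `bin_spectrum` and the toy forward model are hypothetical.

```python
import numpy as np


def bin_spectrum(wl_hr: np.ndarray, depth_hr: np.ndarray, edges: np.ndarray) -> np.ndarray:
    """Average a high-resolution spectrum inside each bin [edges[i], edges[i+1]).

    Bins that contain no high-resolution sample are returned as NaN.
    """
    idx = np.digitize(wl_hr, edges) - 1          # bin index of each sample
    n_bins = len(edges) - 1
    inside = (idx >= 0) & (idx < n_bins)         # drop samples outside the grid
    sums = np.bincount(idx[inside], weights=depth_hr[inside], minlength=n_bins)
    counts = np.bincount(idx[inside], minlength=n_bins)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts


# Hypothetical forward model: flat transit depth plus one Gaussian absorption feature.
wl = np.linspace(0.5, 7.8, 20000)
depth = 0.01 + 2e-4 * np.exp(-0.5 * ((wl - 3.3) / 0.1) ** 2)
binned = bin_spectrum(wl, depth, np.array([1.1, 1.95, 3.9, 7.8]))
```

Here the bin edges are the three spectrometer channels, so `binned` holds one value per channel. The AIRS-CH0 value sits above the NIRSpec one because it contains the feature.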
Alfnoor and ArielRad are currently not publicly available. However, both TauREx 3 (https://github.com/ucl-exoplanets/TauREx3_public; https://pypi.org/project/taurex/) and a generic radiometric simulator called ExoRad 2.0 (https://github.com/ExObsSim/ExoRad2-public; https://pypi.org/project/exorad/) are publicly available on GitHub and PyPI. ArielRad is ExoRad 2.0 configured for the Ariel payload.

Planetary populations
To build a diverse sample of planets in terms of masses, radii and temperatures, we use the Ariel candidate list of Edwards et al. (2019). This list contains 1000 planets, selected from both NASA's Exoplanet Archive and TESS predicted discoveries, and covers a wide range of planetary radii (from ∼0.4 to ∼27 R⊕), masses (from ∼0.01 to ∼3000 M⊕) and equilibrium temperatures (from ∼200 K to ∼3900 K). From that list, we extract the parameters listed in Tab. 1. Our goal is not to reproduce accurately the composition of the planets in that list, but to test a diverse sample; we therefore build a random atmosphere for each of the listed targets. We produce three planetary populations for this work, which we call POP-I, POP-II and POP-III.
POP-I: For each planet we randomise the equilibrium temperature, choosing a value between $0.7\,T_p$ and $1.05\,T_p$, where $T_p$ is the planet equilibrium temperature given in Edwards et al. (2019). This randomisation is biased toward lower temperatures because we probe the terminator region, where the spectral features are affected by both the day-side and the night-side temperatures (Caldas et al. 2019; Pluriel et al. 2020a; Skaf et al. 2020). The temperature randomisation range is consistent with the work presented in Changeat et al. (2020).
Then, for each planet we consider an isothermal temperature-pressure profile and add a constant vertical chemical profile (Moses et al. 2011) for every molecule from a list of selected molecules, with abundances randomised within defined boundaries. Finally, we add randomly generated grey opaque clouds. We use the plane-parallel approximation, with 100 layers uniformly sampling in log-space the pressure range $10^{-4} \to 10^{6}$ Pa. Every atmosphere is built with relative abundances of CH$_4$, H$_2$O, CO$_2$ and NH$_3$ randomised on a uniform logarithmic scale between $10^{-7}$ and $10^{-2}$. Such a large range allows us to explore the sensitivity of our method to very different abundances. We also randomise the cloud surface pressure between $5 \times 10^{2}$ and $10^{6}$ Pa, similarly to Changeat et al. (2020), to explore the whole range from overcast to cloud-free atmospheres. With these boundaries, ∼40% of the atmospheres in the populations contain clouds extending up to at least the $10^{4}$ Pa level (i.e. a cloud surface pressure ≤ $10^{4}$ Pa), consistent with Tsiaras et al. (2018) and Iyer et al. (2016). Every planet has an H$_2$- and He-dominated atmosphere with a He/H$_2$ ratio of 0.17. A list of the opacities used in this work is reported in Tab. 2.
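The random draws described above can be sketched as follows. The sketch assumes uniform sampling of the temperature and log-uniform sampling of the cloud surface pressure. The log-uniform choice is consistent with the ∼40% fraction quoted above, since $\log(10^4/500)/\log(10^6/500) \approx 0.39$. The dictionary keys and the function names are hypothetical; the actual forward model is built with TauREx 3.

```python
import numpy as np

MOLECULES = ("CH4", "H2O", "CO2", "NH3")


def log_uniform(rng: np.random.Generator, low: float, high: float) -> float:
    """Draw a value uniformly distributed in log10 between low and high."""
    return float(10 ** rng.uniform(np.log10(low), np.log10(high)))


def random_atmosphere(t_p: float, rng: np.random.Generator,
                      ab_bounds=(1e-7, 1e-2), p_cloud_bounds=(5e2, 1e6)) -> dict:
    """Randomised POP-I-like atmosphere parameters (illustrative only)."""
    return {
        "temperature": float(rng.uniform(0.7 * t_p, 1.05 * t_p)),   # K, isothermal
        "abundances": {m: log_uniform(rng, *ab_bounds) for m in MOLECULES},
        "cloud_pressure": log_uniform(rng, *p_cloud_bounds),        # Pa, grey cloud top
        "he_h2_ratio": 0.17,
        "pressure_grid": np.logspace(-4, 6, 100),                   # Pa, 100 layers
    }


rng = np.random.default_rng(42)
atm = random_atmosphere(1450.0, rng)
# POP-II would keep only CH4 and H2O; POP-III would use ab_bounds=(1e-9, 1e-2)
# with four independent draws per planet.
```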
As already mentioned, in line with the aims of this paper, we do not focus on the physical consistency of the atmospheric models used to build the populations. The generated spectra are only used as "transmission spectral shapes" to test our methods against, and no information other than the planet transmission spectrum is used in this work. Each planetary spectrum generated by Alfnoor is binned at Ariel's Tier 3 spectral resolution; these spectra make up the "noiseless spectra" data set. ArielRad then predicts the noise in each spectral bin at this resolution. To reproduce a Tier 1 observation, we scatter each data point according to a normal distribution whose mean is the simulated spectrum and whose standard deviation is the noise estimated by ArielRad in that spectral bin. This noise is the Tier 3 noise rescaled to the number of transit observations needed to meet the Tier 1 SNR requirement. These scattered spectra make up the "observed spectra" data set. Examples of the resulting spectra are shown in Fig. 1.
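The scattering step amounts to one Gaussian draw per spectral bin. The sketch below assumes independent (uncorrelated) errors between bins and takes the per-bin noise as given, since ArielRad itself is not public. `observe` is a hypothetical name, and the flat spectrum and noise values are toy inputs. Drawing several realisations at once is convenient for the Monte Carlo estimates discussed later.

```python
import numpy as np


def observe(noiseless: np.ndarray, sigma: np.ndarray, rng: np.random.Generator,
            n_realisations: int = 1) -> np.ndarray:
    """Scatter a noiseless spectrum with independent Gaussian noise per bin.

    Returns an array of shape (n_realisations, n_bins).
    """
    noiseless = np.asarray(noiseless, dtype=float)
    return rng.normal(loc=noiseless, scale=sigma,
                      size=(n_realisations, noiseless.size))


rng = np.random.default_rng(0)
spectrum = np.full(50, 0.01)          # hypothetical flat transit depth
sigma = np.full(50, 1e-4)             # hypothetical Tier 1 noise per bin
observed = observe(spectrum, sigma, rng, n_realisations=2000)
```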
We generate POP-I using the full 1000-planet candidate list, producing one realisation for each planet. A similar approach was used by Changeat et al. (2020) in their investigation of Ariel Tier 2 observations. We use POP-I to test the strategies described later in the text.
POP-II: We produce another data set keeping the same 1000 planets from the target list and the randomisation rules of POP-I, but this time we modify the chemical composition to include only H$_2$O and CH$_4$. We use POP-II to test our methods against a simpler population, as detailed later in the text.
POP-III: To build the last population, we use the same list of 1000 planets, but each planet is repeated 4 times, so that there are 4 randomised atmospheres for each unique set of stellar and planetary properties. While the temperature and cloud conditions are the same as those of POP-I, for each molecule we widen the abundance boundaries to $10^{-9} \to 10^{-2}$ on a uniform logarithmic scale. We use POP-III to train our machine learning algorithms.

Flat planet detection
The first goal of this work, as listed in Sec. 1, is to identify featureless spectra. This will help in the selection of targets to be re-observed in Ariel's higher Tiers. Given the properties of the Ariel payload, we divide the spectral wavelength range into four bands:
• from 0.5 to 1.1 µm, sampled by three photometers;
• from 1.1 to 1.95 µm, corresponding to the NIRSpec wavelength range;
• from 1.95 to 3.9 µm, corresponding to the AIRS-CH0 wavelength range;
• from 3.9 to 7.8 µm, corresponding to the AIRS-CH1 wavelength range.
For every planet and every band, we compute a $\chi^2$ using all measurements in the band to assess their compatibility with a flat, zero-gradient line: each planet thus has four $\chi^2$ estimates, one per band. We reject the hypothesis of spectral flatness in a given band at 3σ confidence if the reduced $\chi^2$ exceeds $1 + 3\sqrt{2/\nu}$, where $\nu$ is the number of degrees of freedom (the reduced $\chi^2$ has expected value 1 and standard deviation $\sqrt{2/\nu}$). Conversely, any band whose reduced $\chi^2$ is smaller than this threshold is marked as flat. If all four bands of a planetary spectrum are marked as flat, the spectrum is classified as flat. This strategy is similar to that presented in Zellem et al. (2019); however, while those authors focused only on the Ariel FGS channels, here we consider the full Ariel spectral coverage.
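The test can be sketched as below. The sketch makes three assumptions: the flat line is the inverse-variance weighted mean of the band, which is one fitted parameter and hence gives $\nu = n - 1$; errors are independent; and the function names are hypothetical. These are illustrative choices rather than the exact implementation used in this work.

```python
import numpy as np


def is_flat_band(depth, sigma, n_sigma: float = 3.0) -> bool:
    """True if a band is compatible with a constant at n_sigma confidence."""
    depth = np.asarray(depth, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    nu = depth.size - 1                       # one fitted parameter: the level
    if nu < 1:
        raise ValueError("need at least two points per band")
    weights = 1.0 / sigma**2
    level = np.sum(weights * depth) / np.sum(weights)   # best-fitting constant
    chi2_red = np.sum(((depth - level) / sigma) ** 2) / nu
    # the reduced chi^2 has mean 1 and standard deviation sqrt(2 / nu)
    return bool(chi2_red <= 1.0 + n_sigma * np.sqrt(2.0 / nu))


def is_flat_spectrum(bands) -> bool:
    """A spectrum is flat only if every (depth, sigma) band is flat."""
    return all(is_flat_band(d, s) for d, s in bands)
```

For example, a band that alternates by ±1σ around a constant has a reduced $\chi^2$ close to 1 and is marked flat. A band with a 10σ linear trend is rejected.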

An optimised molecular metric
The second goal listed in Sec. 1 is to develop a metric, $M_{mol}$, to assess the presence of a molecule, mol, in a planet's atmosphere. We want this metric to work in such a way that, when comparing two molecules, it produces a diagram similar to that in Fig. 2. In the diagram we can distinguish four regions: two regions where the atmospheres are rich in a single molecule and therefore only show its characteristic features; a third region where the atmospheres show features from both molecules; and a fourth region where features are absent, either because the planets have flat spectra or because the features of both molecules do not emerge above a thick cloud layer.
To compare different planets and constrain their atmospheric molecular content, the metric should be (i) sensitive to the spectral signature of molecules, (ii) independent of the planet size, and (iii) independent of the scale height. Here we present a metric that fulfils these 3 conditions and we show its current limitations.
For each molecule, we select N bands within the Ariel wavelength range where the molecular features in the transmission spectrum are strong. Then, for each planet, we compute the average, $S_{band_i}$, and the dispersion, $\sigma_{band_i}$, of the transmission spectrum values $S_j$ over the M spectral bins ($j = 1, \dots, M$) of each band. We do the same with a control band, called the "normalisation band", where we know there are no major molecular features from the molecule considered, obtaining $S_{norm}$ and $\sigma_{norm}$. We select a different normalisation band for each molecule (Tab. 3).

Figure 2: Illustration of the diagram we expect to build with our metric. Here, the metric is used to compare two molecules, mol1 and mol2. By plotting $M_{mol1}$ versus $M_{mol2}$, we aim to separate four different regions: a region rich in the first molecule at the top left (green), where $M_{mol1}$ is high and $M_{mol2}$ is low; a similar region at the bottom right (blue), where the planet atmosphere is rich in the second molecule, because $M_{mol2}$ is high and $M_{mol1}$ is low; a region where molecule-poor planets, or those with no features in the considered bands, are located (grey), where both $M_{mol1}$ and $M_{mol2}$ are low; and a region for mixed atmospheres (yellow) in the central portion of the diagram.
Thus, for each molecule, mol, we define the metric $M_{mol}$ (eq. 3). Defined in this way, $M_{mol}$ is similar to a signal-to-noise ratio, where the signals are the molecular features arising above the normalisation band, and the noise is the dispersion in the band. By averaging the contributions of N different bands, corresponding to N different features of the same molecule, the metric reduces the chance of being misled by overlapping features in one of the bands considered. As Ariel's Tier 1 is optimised for low-resolution spectroscopy, spectral binning increases the SNR. This metric is also (i) sensitive to the presence of molecules, (ii) independent of the planet size, and (iii) independent of the scale height (see Appendix A for details), at the cost of introducing a bias: eq. 2 provides an estimate of the spectral dispersion when applied to noiseless spectra, but yields larger values for observed spectra because of the measurement noise. Therefore, the absolute value of $M_{mol}$ in eq. 3 is systematically smaller for observed spectra than for noiseless spectra of the same planet. The bias effects are further discussed in Sec. 4.1; we note here that a detailed characterisation of the instrumental noise would allow the metric to be de-biased, but we leave this investigation to future work and focus on the performance of the metric in extracting information from Tier 1 observations. To maximise the metric efficiency, the challenge is to identify the best-performing wavelength ranges: large enough to reduce the uncertainty introduced by the observational noise, but small enough to distinguish the molecular features of interest.
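The sketch below assumes one plausible SNR-like form of the metric consistent with the description above: $M_{mol} = \frac{1}{N}\sum_i (S_{band_i} - S_{norm})/\sqrt{\sigma_{band_i}^2 + \sigma_{norm}^2}$, with each dispersion taken as the standard deviation within the band. It is a hypothetical stand-in rather than the exact eq. 3, and the bands are toy choices, not those of Tab. 3. It does, however, illustrate why such a ratio is insensitive to planet size and to scale height. A constant offset in transit depth cancels in the difference, and a common rescaling of the features cancels in the ratio.

```python
import numpy as np


def band_stats(wl, depth, band):
    """Mean and standard deviation of the spectrum inside band = (lo, hi)."""
    lo, hi = band
    sel = (wl >= lo) & (wl < hi)
    return depth[sel].mean(), depth[sel].std()


def molecular_metric(wl, depth, feature_bands, norm_band) -> float:
    """Hypothetical SNR-like metric averaged over N feature bands (assumed form)."""
    s_norm, sig_norm = band_stats(wl, depth, norm_band)
    terms = []
    for band in feature_bands:
        s_i, sig_i = band_stats(wl, depth, band)
        terms.append((s_i - s_norm) / np.hypot(sig_i, sig_norm))
    return float(np.mean(terms))


# Toy spectrum with one CH4-like feature near 3.3 micron (illustrative values).
wl = np.linspace(0.5, 7.8, 2000)
depth = 0.01 + 2e-4 * np.exp(-0.5 * ((wl - 3.3) / 0.15) ** 2)
feature_bands, norm_band = [(3.1, 3.5)], (5.0, 6.0)   # hypothetical bands, not Tab. 3
m = molecular_metric(wl, depth, feature_bands, norm_band)
# An offset (planet size) plus a rescaling (scale height) leaves M unchanged.
m_scaled = molecular_metric(wl, 0.02 + 3.0 * depth, feature_bands, norm_band)
```

Adding measurement noise inflates the band dispersions and therefore shrinks $|M_{mol}|$ on average, which is the bias mentioned above.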
In this work, we consider only H$_2$O, CH$_4$ and CO$_2$; the bands used are listed in Tab. 3. Although NH$_3$ is present in our sample, it is only used as a nuisance to challenge our metric, because NH$_3$ has features overlapping with those of water. We use 3 feature bands for CH$_4$ and CO$_2$ and 5 for H$_2$O. Examples of the bands used for CH$_4$ and H$_2$O are shown in Fig. 3 where, for the same planetary template, HD 209458 b, we simulate different atmospheres (overcast, CH$_4$-rich and H$_2$O-rich) to show how the metric captures the relevant spectroscopic features.
In the next section, we show how we intend to use this metric to build a diagram similar to that of Fig. 2.

Planets classification
The metric needs to be calibrated to assess its capability to indicate the presence of a molecule. The final product is a diagram similar to Fig. 2 that can be used as a look-up table: given an observed spectrum, its corresponding $M_{mol}$ values can be located on the diagram and its possible composition inferred.

Figure 3: For the same planetary template, HD 209458 b, we present three different realisations: a flat atmosphere (first column), a methane-rich atmosphere (second column) and a water-rich atmosphere (third column); panel (c), for instance, shows the water-rich HD 209458 b-like planet with the $M_{CH_4}$ data bands highlighted. Each column shows the same planetary spectrum in both rows. Grey solid lines are the original binned spectral data (Tier 3 spectral resolution), the filled grey areas are the 1σ uncertainties (Tier 1), and blue dots are the simulated observations used in this work. The top row highlights the $M_{CH_4}$ feature bands from Tab. 3, while the bottom row shows the $M_{H_2O}$ bands. The molecular feature band values and their dispersions are shown in green, while the normalisation band values are shown in red. Comparing the rows shows how the selected bands match the relevant molecular spectral features.
To assess the ability of the metric to separate the atmospheres in the sample, we use the k-nearest neighbours (KNN) algorithm, a non-parametric pattern recognition algorithm (Hastie et al. 2009). After training, this algorithm assigns a class to an element based on the classes of its neighbours. The goal is to classify observed spectra by their molecular content according to their $M_{mol}$. Considering two molecules at a time, we first define four classes of planets: molecular poor, mol1 rich, mol2 rich and mixture, as defined in Tab. 4.
The KNN algorithm classifies each planet according to its 20 (k = 20) nearest planets in the $M_{mol1}$ vs $M_{mol2}$ space, within the same data set. We choose 20 neighbours (2% of the full data set) to minimise the number of misclassified planets. The closest neighbours are uniformly weighted, and we verified that weighting the neighbours by their Euclidean distance in the metric space does not significantly affect the results. The analysis involves three separate steps, summarised in Fig. 4, applied to POP-I.

Table 4: Planet class definitions.
Class | Condition
molecular poor | $Ab_{mol1} < 10^{-5}$ and $Ab_{mol2} < 10^{-5}$
mol1 rich | $Ab_{mol1} > 10^{-4}$ and $Ab_{mol1} > 10 \times Ab_{mol2}$
mol2 rich | $Ab_{mol2} > 10^{-4}$ and $Ab_{mol2} > 10 \times Ab_{mol1}$
mixture | everything else

Figure 4: Planet classification summary. The figure reports the steps implemented to build the diagram in Fig. 2. Starting from POP-I, for each planet we compute ($M_{mol1}$, $M_{mol2}$) for the considered molecules, for both observed and noiseless data. Following the top branch, classes are assigned to the observed spectra (step 1 in the text). Following the middle branch, a KNN classification is performed on noiseless spectra to calibrate the metric space (step 2 in the text). Following the bottom branch, the distribution of noiseless metric data points is convolved with a 2D Gaussian of varying widths to generate a unit-normalised volume. The intersection between this volume and the calibration of step 2 selects the best-sampled (i.e. calibrated) region of the metric space (step 3 in the text). The combination of these three steps is shown in the rightmost diagram, to be compared with Fig. 2.
Step 1. We estimate ($M_{mol1}$, $M_{mol2}$) on the POP-I observed spectra. We assign a class to each POP-I planet using its input molecular abundances, $Ab_{mol}$, which are stored during the population production. This process is described in the top branch of Fig. 4.

Step 2. To calibrate the metric, we map the metric space by training the KNN algorithm on the ($M_{mol1}$, $M_{mol2}$) values estimated from the noiseless POP-I planetary spectra. We again assign a class to each planet using its input molecular abundances, $Ab_{mol}$; the training is performed on a randomly chosen 70% of the data set, while the remaining 30% is used to test the success of the training. Finally, we classify each point ($M_{mol1}$, $M_{mol2}$) of a grid covering the metric space, sampled with a step of 0.2 in $M_{mol}$, obtaining a map comparable to Fig. 2. This part of the procedure corresponds to the central branch of Fig. 4.
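Steps 1 and 2 can be sketched with scikit-learn as below. The class assignment follows Tab. 4, and the KNN set-up follows the text: k = 20, uniform weights, a 70/30 split and a grid step of 0.2. The synthetic metric values are hypothetical placeholders for the POP-I noiseless metrics; they simply grow with the logarithm of the abundances, plus scatter. The helper names are also hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier


def assign_class(ab1: float, ab2: float) -> str:
    """Class definitions of Tab. 4 for a pair of molecules."""
    if ab1 < 1e-5 and ab2 < 1e-5:
        return "molecular poor"
    if ab1 > 1e-4 and ab1 > 10 * ab2:
        return "mol1 rich"
    if ab2 > 1e-4 and ab2 > 10 * ab1:
        return "mol2 rich"
    return "mixture"


def calibrate(metrics, abundances, k=20, step=0.2, seed=0):
    """Train a KNN on (M_mol1, M_mol2) and classify a regular grid of the metric space."""
    labels = np.array([assign_class(a1, a2) for a1, a2 in abundances])
    x_tr, x_te, y_tr, y_te = train_test_split(metrics, labels, test_size=0.3,
                                              random_state=seed)
    knn = KNeighborsClassifier(n_neighbors=k, weights="uniform").fit(x_tr, y_tr)
    lo, hi = metrics.min(axis=0), metrics.max(axis=0)
    g1, g2 = np.meshgrid(np.arange(lo[0], hi[0] + step, step),
                         np.arange(lo[1], hi[1] + step, step))
    grid_labels = knn.predict(np.column_stack([g1.ravel(), g2.ravel()]))
    return knn.score(x_te, y_te), g1, g2, grid_labels.reshape(g1.shape)


# Synthetic stand-in for POP-I: log-uniform abundances, metric ~ log-abundance + scatter.
rng = np.random.default_rng(0)
abundances = 10 ** rng.uniform(-7, -2, size=(1000, 2))
metrics = 0.5 * (np.log10(abundances) + 7) + rng.normal(0, 0.1, size=(1000, 2))
accuracy, g1, g2, grid_labels = calibrate(metrics, abundances)
```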
Step 3. Since the noiseless planetary spectra are not expected to sample the metric space uniformly, we build a mask to select the region of the ($M_{mol1}$, $M_{mol2}$) space that is sampled well enough to achieve a reliable classification. To do so, we replace each ($M_{mol1}$, $M_{mol2}$) point representing a noiseless planetary spectrum with a two-dimensional Gaussian distribution whose widths are the metric dispersions in the two directions. We sum the Gaussians over the metric space grid and, after normalising the total volume to unity, obtain a statistical distribution of our data points over the grid. We then select the region of the metric space that contains 95% of the total volume, thereby removing all under-sampled areas from the grid. This last step is represented in the bottom branch of Fig. 4.
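A minimal sketch of this mask is shown below. It assumes that the retained 95% region is built from the highest-density grid cells first, a highest-density-region choice that the text does not spell out. The function name and the toy inputs are hypothetical.

```python
import numpy as np


def sampling_mask(points, sigmas, g1, g2, level=0.95):
    """Boolean mask of the densest grid cells holding `level` of the total volume.

    points, sigmas: arrays of shape (n, 2) with metric values and dispersions.
    g1, g2: 1D grid coordinates along M_mol1 and M_mol2.
    """
    x, y = np.meshgrid(g1, g2)                    # shape (len(g2), len(g1))
    density = np.zeros_like(x)
    for (px, py), (sx, sy) in zip(points, sigmas):
        # 2D Gaussian per point; the constant 2*pi cancels after normalisation
        density += np.exp(-0.5 * (((x - px) / sx) ** 2 + ((y - py) / sy) ** 2)) / (sx * sy)
    density /= density.sum()                      # unit-normalised volume
    flat = density.ravel()
    order = np.argsort(flat)[::-1]                # densest cells first
    n_keep = np.searchsorted(np.cumsum(flat[order]), level) + 1
    mask = np.zeros(flat.size, dtype=bool)
    mask[order[:n_keep]] = True
    return mask.reshape(density.shape), density


rng = np.random.default_rng(1)
points = rng.normal(0.5, 0.3, size=(200, 2))      # toy noiseless metric values
sigmas = np.full_like(points, 0.2)                # toy metric dispersions
grid = np.arange(-2.0, 4.0 + 1e-9, 0.2)
mask, density = sampling_mask(points, sigmas, grid, grid)
```

Intersecting `mask` with the KNN grid labels of step 2 gives the calibrated, fully coloured regions of the diagram.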
The combination of the three steps is shown in the rightmost panel of Fig. 4 and it is the equivalent of Fig. 2 calibrated for the metric on the investigated population.

Deep and Machine Learning
The metric presented in Sec. 2.4 is based on binning the spectra, and is therefore equivalent to using Ariel as a multi-band photometer. This strategy is in line with the Tier 1 definition of Tinetti et al. (2018). However, we are also investigating different strategies to classify spectra by their molecular content (the third goal listed in Sec. 1). Deep Learning and Machine Learning (ML) techniques are promising because these algorithms can learn to classify planets from their spectral shape over the whole wavelength range sampled by Ariel. Another advantage over the metric is that ML techniques are not necessarily biased by the instrumental noise, or at least they can learn to deal with the bias, provided that a sufficiently large and representative set of training examples is available. To train the algorithms, we use the POP-III observed spectra and their known abundances as a training sample. Each example spectrum is normalised to zero mean and unit dispersion. This normalisation facilitates the training process but might introduce a bias very similar to that affecting the metric; a detailed investigation of these aspects of ML is left to future work. Knowing the input abundance of each planet, $Ab_{mol}$, we define a threshold and flag a planet as bearing a certain molecule if $Ab_{mol}$ exceeds the threshold. For each molecule, the algorithm thus learns to flag planets as bearing that molecule by looking at characteristic spectral shapes. We then measure the ability of the algorithms to "learn" by how well they generalise their predictions to unseen shapes, testing them on the POP-I observed spectra, used as a test data set. Comparing the ML classification with the known input abundance of each POP-I planet provides an estimate of the success rate.
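The per-spectrum normalisation and the threshold labelling can be sketched as follows. The threshold value is a free parameter (Table 6 reports results for several minimum abundances). The function names and the toy spectra are hypothetical.

```python
import numpy as np


def standardise(spectra) -> np.ndarray:
    """Normalise each spectrum (row) to zero mean and unit dispersion."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=1, keepdims=True)
    std = spectra.std(axis=1, keepdims=True)
    return (spectra - mean) / std


def bearing_labels(abundances, threshold: float) -> np.ndarray:
    """1 if the input abundance of the molecule exceeds the threshold, else 0."""
    return (np.asarray(abundances) > threshold).astype(int)


# Two toy spectra: the second is the first scaled by 3 (hypothetical values).
spectra = np.array([[0.010, 0.011, 0.012, 0.010],
                    [0.030, 0.033, 0.036, 0.030]])
x = standardise(spectra)
y = bearing_labels([3e-6, 2e-3], threshold=1e-4)
```

Because a spectrum and a rescaled copy of it map to the same input, this normalisation discards the overall depth and feature amplitude, in a way reminiscent of the metric's independence from planet size and scale height.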
A detailed investigation of the use of these algorithms and their limitations will be presented in future work: here we report only an example of how these tools might be used, and we compare some preliminary results with the outcomes of the metric of Sec. 2.4. We implemented all algorithms in Python using the scikit-learn package (Pedregosa et al. 2011).
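The four classifiers described next can be set up with scikit-learn as in the sketch below: a KNN with k = 5 and uniform weights; an MLP with one hidden layer of 100 units; a random forest whose splits consider the square root of the number of input features; and an SVC with a one-vs-one decision function. The toy spectra are a hypothetical stand-in for POP-III (training) and POP-I (test): a white-noise continuum with a Gaussian absorption feature in roughly half of them. For simplicity, the toy skips the per-spectrum normalisation described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC


def toy_spectra(n: int, n_bins: int, rng: np.random.Generator):
    """Hypothetical labelled spectra: white noise, plus a feature when label = 1."""
    x = np.arange(n_bins)
    feature = 5.0 * np.exp(-0.5 * ((x - n_bins / 2) / 3.0) ** 2)
    labels = rng.integers(0, 2, size=n)
    spectra = rng.normal(0.0, 1.0, size=(n, n_bins)) + labels[:, None] * feature
    return spectra, labels


classifiers = {
    "KNN": KNeighborsClassifier(),                   # k = 5, uniform weights
    "MLP": MLPClassifier(random_state=0),            # one hidden layer, 100 units
    "RFC": RandomForestClassifier(random_state=0),   # sqrt(n_features) per split
    "SVC": SVC(decision_function_shape="ovo"),       # one-vs-one decision function
}

rng = np.random.default_rng(1)
x_train, y_train = toy_spectra(600, 50, rng)   # stand-in for POP-III
x_test, y_test = toy_spectra(300, 50, rng)     # stand-in for POP-I
scores = {name: clf.fit(x_train, y_train).score(x_test, y_test)
          for name, clf in classifiers.items()}
```

With default settings the MLP may emit a convergence warning on such small data sets; the fitted model is still usable for this illustration.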
The first ML algorithm we use is the KNN algorithm described above. This time we simply want to classify the planets, not to produce a map as in Sec. 2.4.1. For this exercise, we use the scikit-learn default KNN setting: k = 5 and uniform weights for the neighbours. Other Machine Learning algorithms can also be used to classify planets; here we also present our preliminary results using a Multilayer Perceptron (MLP) classifier, a Random Forest Classifier (RFC) and a Support Vector Classifier (SVC) (e.g. Goodfellow et al. 2016; Sturrock et al. 2019). The MLP is a feed-forward neural network composed of multiple layers of perceptrons, widely used in classification problems. To produce the results shown later in the text, we use an MLP network with the scikit-learn default settings (a single hidden layer of 100 units), and we classify the spectra with the same procedure used for the KNN. The RFC is an ensemble of decision trees used for classification, where each decision tree is a directed graph and each vertex is a binary test. In this work, we use an RFC set-up commonly used in binary decision problems, in which each split considers a number of features equal to the square root of the number of input spectral points; again, this is the scikit-learn default configuration. The SVC is a Support Vector Machine method, belonging to a family of non-probabilistic linear classifiers that construct hyperplanes to separate the data points. For the aim of this paper, we implemented a simple SVC using the "one-vs-one" scheme, as implemented by default in scikit-learn at the time of writing.

Fig. 5 shows the frequency of observed planets in the POP-I population that have a given number of flat bands. In this population, 16% of the planets are classified as "flat", as all four of their spectral bands are flat.
From the figure, we notice that around 46% of the planets in the population have three or more flat bands, which is consistent with the known properties of POP-I and with the expectations from Tsiaras et al. (2018) and Iyer et al. (2016), as mentioned in Sec. 2.2. The same figure shows this statistic for the 100 POP-I planets most covered in clouds (cloud surface pressure roughly < $10^{3}$ Pa) and for the 100 POP-I planets with the fewest clouds (cloud surface pressure roughly > $10^{5.5}$ Pa). This comparison shows that overcast planets on average present more flat bands than cloud-free planets, demonstrating that this approach is sensitive to the presence of clouds.

This result shows that Tier 1 observations are effective in identifying atmospheres with no detectable molecular absorption features.

Spectra classification
The $M_{mol}$ values (Sec. 2.4) estimated for the observed POP-I planets are shown in Fig. 6 for different pairs of molecules: CH$_4$-CO$_2$ and CH$_4$-H$_2$O. Comparing the top left and right panels of Fig. 6, the colour scale shows that our metric can separate planets bearing more or less methane (dark and light green dots, respectively) or carbon dioxide (dark and light orange dots, respectively). The bottom panels, and the bottom-right panel in particular, show that it is harder to separate planets bearing more or less water (dark and light blue dots, respectively). The water data points appear more clustered around the origin of the axes than in the top row, and they are not as clearly separated according to their colour gradient as the methane or carbon dioxide data points are. A possible explanation is that CH$_4$ and CO$_2$ have strong spectral features, with isolated transmission features in the ranges 3 → 4 µm and 4 → 5 µm, respectively, while H$_2$O features are less distinct and frequently overlap with those of NH$_3$, which is present in the population (Tinetti et al. 2013). An alternative explanation involves a bias in the metric that affects the water bands more strongly.

Figure 5: We consider four bands: one for the photometers (VisPhot, FGS1, FGS2) and one for each spectrometer (NIRSpec, AIRS CH0 and AIRS CH1). Each band is compared with a constant value using a $\chi^2$ test to determine its compatibility with flatness. The light blue histogram shows the frequency of planets in the POP-I population with a given number of flat bands. The red dashed histogram shows the same statistic for a selection of the 100 most overcast POP-I planets. The green dotted histogram shows the opposite situation, for a selection of the 100 POP-I planets with the fewest clouds, i.e. the highest cloud surface pressure (see text for details). Overcast planets show more flat bands than planets with fewer clouds.
The diagrams of Fig. 6 are reproduced in Fig. 7, where the data points are now colour-coded according to the assigned classes (step 1, Sec. 2.4.1), and the background colours, constructed by training the KNN on noiseless spectra (steps 2 and 3, Sec. 2.4.1), serve as reference, calibrated regions in the metric space. The similarity between the reference regions in Fig. 7 and those in Fig. 2, with a clear separation in the metric space, shows that the metric has the desired response. The data points cluster towards the origin of the grid more strongly than the reference regions; this is the effect of the bias further discussed in Sec. 4.1. Fig. 8 shows the relation between the metric, $M_{\rm mol}$, estimated on POP-I observed spectra, and the input abundances, $Ab_{\rm mol}$. The coefficients of the linear trends of $M_{\rm mol}$ versus the logarithm of $Ab_{\rm mol}$ are listed in Tab. 5.

Figure 7: … The superimposed dots are from the POP-I observed spectra, and the error bars represent the metric dispersion. Colours correspond to the classes described in Tab. 4. Grey dots: planets that contain less than $10^{-5}$ in mixing ratio of the considered molecules; green dots: planets that contain 10 times more CH$_4$ than the other molecule and $Ab_{\rm CH_4} > 10^{-4}$; red dots: planets that hold 10 times more CO$_2$ than CH$_4$ and $Ab_{\rm CO_2} > 10^{-4}$; blue dots: planets with 10 times more H$_2$O than CH$_4$ and $Ab_{\rm H_2O} > 10^{-4}$; yellow dots: all other configurations. The same colour scheme applies to the painted regions of the diagram, built from the noiseless spectral data. Grey area: planets with low quantities of water and methane; green area: expected methane-rich planets; blue area: water-rich planets; yellow area: mixed atmospheres. The regions best sampled by the noiseless data, as described in Sec. 2.4.1, are fully coloured, while other regions are transparent.

An appreciable trend is detected with the log abundances of CO$_2$ and CH$_4$, while the H$_2$O metric shows only a weak trend with the input log abundance. Anti-correlations, e.g. between $M_{\rm CH_4}$ and $\log(Ab_{\rm CO_2})$, or between $M_{\rm H_2O}$ and $\log(Ab_{\rm CH_4})$, are present because we use juxtaposed bands to size these molecules, as listed in Tab. 3. The logarithmic abundances of H$_2$O and NH$_3$ show similar correlations with $M_{\rm H_2O}$. While this is expected, as the two molecules have similar spectral shapes, the sensitivity of the water metric to the abundance may also be limited by the noise, by a bias squeezing the metric towards small values, or both, and further investigation is required in future work. In any case, the metric is an estimator for classifying atmospheres on the basis of their molecular content, and it would be misleading to expect it to provide robust estimates of abundances, for which spectral retrieval techniques are more appropriate. These aspects are further discussed in Sec. 4.2 and in Sec. 4.4, where we show with an example how a retrieval is effective in constraining the input abundances of the molecules considered, water included.

Table 5: $C_0$ (top table) and $C_1$ (bottom table) coefficients of $M_{\rm mol} = C_0 \cdot \log(Ab_{\rm mol}) + C_1$ for all possible combinations of the considered molecules. The bands used for $M_{\rm mol}$ are reported in Tab. 3.

We can use Fig. 8 to estimate the probability that a molecule mol has an abundance in excess of $10^{-4}$, conditioned on the metric being larger than some value $M_{{\rm mol},*}$, i.e. $P(Ab_{\rm mol} > 10^{-4} \mid M_{\rm mol} > M_{{\rm mol},*})$. For this, we use the definition of conditional probability, $P(A|B) = P(A \cap B)/P(B)$, where A and B are two events. We estimate it as the number of data points found in the region of the diagrams of Fig. 8 where both conditions are satisfied (favourable outcomes), divided by the number of data points for which the condition $M_{\rm mol} > M_{{\rm mol},*}$ is satisfied (total outcomes). The POP-I observed spectra provide a single realisation of $P$. We therefore simulate 1000 realisations of the POP-I observed spectra, using the same input noiseless POP-I spectra and randomising the noise. From the resulting 1000 realisations of $P$, we compute medians and 1$\sigma$ confidence levels.

Figure 9: Probability that a molecule mol has an abundance in excess of $10^{-4}$, conditioned on the metric being larger than some value $M_{{\rm mol},*}$, i.e. $P(Ab_{\rm mol} > 10^{-4} \mid M_{\rm mol} > M_{{\rm mol},*})$. The CH$_4$, H$_2$O and CO$_2$ cases are shown by the green, blue and orange lines, respectively. The lines are the medians of the probability estimates from 1000 different realisations of the POP-I observed population. The shaded regions are the 1$\sigma$ confidence levels associated with the median probability. Vertical dotted lines mark the metric values, $M_{{\rm mol},*}$, corresponding to a probability of 68%.

Table 6: Percentages of correct identifications for the considered molecules with different thresholds. Each column reports a different minimum $Ab_{\rm mol}$ and each row a different molecule. The percentages represent how many of the atmospheres have been correctly identified by the algorithm as having at least the specified minimum amount of that molecule, and therefore represent the algorithm accuracy. Each ML algorithm has been trained on POP-III and tested on POP-I.

Fig. 9 suggests that the metric can be used to classify primary planetary atmospheres for the presence of CH$_4$ and CO$_2$, and to a lesser extent H$_2$O, as well as atmospheres that are likely missing these molecular contributions. For example, when $M_{\rm CH_4} \geq 0.5$, only 20% of the planets classified as having $Ab_{\rm CH_4} > 10^{-4}$ are misclassified, i.e. 1 out of 5 is a false positive. As expected, however, the case of water is different, and our metric is not as effective in detecting the presence of water as it is for the other molecules: even for large values of $M_{\rm H_2O}$, the false-positive rate is close to 40%.
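The counting estimate of $P(Ab_{\rm mol} > 10^{-4} \mid M_{\rm mol} > M_{{\rm mol},*})$ and its spread over noise realisations can be sketched as below. The sketch is a simplification: it perturbs pre-computed metric values with Gaussian noise of an assumed per-planet width, whereas the analysis described above re-draws the noise on the spectra and recomputes the metric. The function names and the noise model are hypothetical.

```python
import math
import random


def conditional_probability(metric, abundance, m_star, ab_min=1e-4):
    """Fraction of planets with M > m_star that also have Ab > ab_min."""
    selected = [ab for m, ab in zip(metric, abundance) if m > m_star]
    if not selected:
        return math.nan  # no planet passes the metric cut
    favourable = sum(1 for ab in selected if ab > ab_min)
    return favourable / len(selected)


def _percentile(sorted_values, q):
    """Linear-interpolation percentile of a sorted list, q in [0, 1]."""
    pos = q * (len(sorted_values) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def probability_with_errors(metric, noise_sigma, abundance, m_star,
                            n_realisations=1000, seed=0):
    """Median and 1-sigma (15.87th, 84.13th percentile) levels of P.

    Simplification: Gaussian noise of per-planet width `noise_sigma` is added
    to pre-computed metric values instead of re-simulating the spectra.
    """
    rng = random.Random(seed)
    probs = []
    for _ in range(n_realisations):
        noisy = [m + rng.gauss(0.0, s) for m, s in zip(metric, noise_sigma)]
        p = conditional_probability(noisy, abundance, m_star)
        if not math.isnan(p):  # skip realisations with an empty selection
            probs.append(p)
    if not probs:
        raise ValueError("no realisation passed the metric cut")
    probs.sort()
    return (_percentile(probs, 0.5), _percentile(probs, 0.1587),
            _percentile(probs, 0.8413))
```

Scanning `m_star` over a grid and keeping the smallest value whose median probability reaches 0.68 would mimic the vertical dotted lines of Fig. 9.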

Deep and Machine Learning
The percentages of correct classifications for all considered molecules and for different minimum input abundances are reported in Tab. 6a for KNN, Tab. 6b for MLP, Tab. 6c for RFC and Tab. 6d for SVC.

Table 6a: KNN percentages of success in identifying spectra bearing different minimum amounts of molecules.

Tab. 6 shows that, for all Deep and Machine Learning algorithms, the percentage of successful identifications of molecules in the atmosphere grows with the minimum molecular abundance set as the classification threshold. While this is expected, it may come as a surprise that these algorithms generally appear effective in detecting the presence of all individual molecules with a relatively small fraction of false positives (about 30% or smaller), even at low abundances. This is perhaps because ML algorithms learn to classify atmospheres by recognising spectral shapes. Their performance can therefore be, to some extent, independent of the molecules considered, as long as the training set contains sufficiently diverse spectra to allow a secure identification, including of water in the presence of ammonia or of biases, which is where our metric shows its most severe weaknesses. We also notice from Tab. 6 that KNN, MLP, RFC and SVC show comparable overall performance, and that CH$_4$ and CO$_2$ are the most straightforward molecules to identify in Tier 1 planetary spectra.
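The evaluation protocol described above (train on one population, test on another, and score the accuracy for several minimum abundances) can be sketched with a plain k-nearest-neighbour classifier. This is a minimal, dependency-free stand-in that assumes each spectrum is a list of binned transit depths; it is not the authors' implementation, and in practice library estimators (for instance scikit-learn's `KNeighborsClassifier`, `MLPClassifier`, `RandomForestClassifier` or `SVC`) would normally replace the hand-written `knn_predict`. The function names are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, x, k=5):
    """Majority label among the k training spectra closest to x (Euclidean)."""
    nearest = sorted(zip((math.dist(x, t) for t in train_x), train_y),
                     key=lambda pair: pair[0])[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def accuracy_per_threshold(train_x, train_ab, test_x, test_ab, thresholds, k=5):
    """Fraction of test planets whose presence/absence label is predicted correctly."""
    scores = {}
    for thr in thresholds:
        # Label: does the planet carry at least `thr` of the molecule?
        train_y = [ab >= thr for ab in train_ab]
        test_y = [ab >= thr for ab in test_ab]
        predictions = [knn_predict(train_x, train_y, x, k) for x in test_x]
        correct = sum(p == t for p, t in zip(predictions, test_y))
        scores[thr] = correct / len(test_y)
    return scores
```

Keeping the threshold loop outside the classifier mirrors the structure of Tab. 6, where each column is a separate binary classification problem.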
A comparison between these results and our metric is presented in Sec. 4.5.

DISCUSSION
In this section, we discuss the metric results shown in Sec. 3.2. We first discuss the bias (Sec. 4.1), then we focus on the metric characteristics, such as the relation between the metric estimates and the input molecular abundances (Sec. 4.2) and the detection limits (Sec. 4.3). Then we compare the metric performance with a spectral retrieval (Sec. 4.4), and with Deep and Machine Learning algorithms (Sec. 4.5).

Metric bias
The KNN analysis discussed earlier and shown in Fig. 7 is trained on POP-I noiseless spectra, and the data points shown in that figure are obtained by estimating the metric on POP-I observed spectra, as described in Sec. 2.4.1. To verify whether the metric is biased, the KNN analysis is repeated with data points obtained by estimating the metric on POP-I noiseless spectra. This is shown in Fig. 10, which should be compared with Fig. 7. The background colours are very similar in both cases, with small variations due to the training process, which randomly selects 70% of the POP-I noiseless examples. In the absence of biases, we expect the distribution of the observed data points to be that of the noiseless data points convolved with the distribution of the noise. However, the comparison of the two figures shows that the observations are more clustered towards the origin of the coordinate axes than the noiseless data points. This is a consequence of the bias introduced by the metric normalisation discussed in Sec. 2.4: normalisation is required so that the metric response is insensitive to the atmospheric scale height and sensitive only to the presence of molecular signatures, at the cost of biasing the estimator. The results of Fig. 9 are also affected by the bias. The observing noise reduces the average $M_{\rm mol}$ estimates; therefore, for smaller observing noise, the three coloured lines in the figure shift to the right, and the 68% success probability corresponds to higher $M_{\rm mol}$ values.
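The mechanism behind this clustering can be illustrated with a toy Monte Carlo. The sketch below assumes a single feature band and a single normalisation band, each described by a handful of per-bin values, and a metric defined as the difference of the band means divided by the combined within-band dispersion; the bin values, noise levels and function names are invented for illustration. Zero-mean noise leaves the numerator unchanged on average but inflates the dispersion in the denominator, so the mean metric drops as the noise grows.

```python
import math
import random
import statistics


def band_metric(feature_bins, norm_bins):
    """Difference of band means over the combined within-band dispersion."""
    numerator = statistics.fmean(feature_bins) - statistics.fmean(norm_bins)
    denominator = math.sqrt(statistics.pvariance(feature_bins)
                            + statistics.pvariance(norm_bins))
    return numerator / denominator


def mean_noisy_metric(feature_bins, norm_bins, noise_sigma,
                      n_realisations=2000, seed=1):
    """Average metric over Gaussian noise realisations of width noise_sigma."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_realisations):
        f = [z + rng.gauss(0.0, noise_sigma) for z in feature_bins]
        n = [z + rng.gauss(0.0, noise_sigma) for z in norm_bins]
        values.append(band_metric(f, n))
    return statistics.fmean(values)


# Hypothetical noiseless per-bin values: a feature band and a reference band.
FEATURE = [1.0, 1.5, 2.0, 1.5]
NORM = [0.0, 0.2, 0.0, 0.2]
```

Comparing `band_metric(FEATURE, NORM)` with `mean_noisy_metric(FEATURE, NORM, sigma)` for increasing `sigma` shows the estimator shrinking towards zero, which in a two-metric diagram appears as a drift towards the origin.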
The work presented here demonstrates that the metric we have designed is a powerful tool capable of revealing the presence of a molecule in an atmosphere and that the prediction is independent of the type of the planet and its basic parameters (such as temperature, radius, and pressure) within the limits explored here. However, this comes at the cost of biasing the estimator by a quantity that depends on the instrumental noise as discussed in Sec. 2.4. Provided that the metric can be de-biased, it can be used in a predictive way where an observation (along with its dispersion estimate) can be compared to the calibrated (trained) metric space to infer the possible molecular content of the target. Because instrumental noise can be well characterised, it would be possible to de-bias the metric estimator. This requires a detailed noise analysis, taking into account the uncertainties on the noise estimates, which is beyond the scope of this paper. In the rest of this section we focus on what we can learn from this kind of analysis provided that the metric can be de-biased, and we leave to future work a detailed study on how this de-biasing can be secured.

Relation with the input abundances
We see in Fig. 8 that the correlation between $M_{\rm mol}$ and $\log(Ab_{\rm mol})$ is in general not strong enough to quantify the input molecular abundances. This is because atmospheric spectra result from complex, non-linear contributions from all the molecules. Therefore, a method based only on spectral shapes, such as this metric, is inadequate to quantify molecular abundances. However, the goal of this metric, provided that the bias can be removed, is not to assess the abundance of a given species in the planet's atmosphere but only its possible presence, avoiding the use of spectral retrieval techniques, which may not be suited to Tier 1 data.
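The coefficients of Tab. 5, and the strength of the correlation, can be obtained with an ordinary least-squares fit of $M_{\rm mol}$ against $\log(Ab_{\rm mol})$. The sketch below assumes a base-10 logarithm and unweighted points, and the function name is hypothetical; it returns the slope $C_0$, the intercept $C_1$ and the Pearson correlation coefficient, the last being a convenient single number for how weak a trend is.

```python
import math


def fit_metric_trend(metric, abundance):
    """Least-squares fit M = C0 * log10(Ab) + C1; returns (C0, C1, Pearson r)."""
    x = [math.log10(a) for a in abundance]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(metric) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in metric)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, metric))
    c0 = sxy / sxx                # slope
    c1 = mean_y - c0 * mean_x     # intercept
    r = sxy / math.sqrt(sxx * syy)
    return c0, c1, r
```

A small $|r|$ from a fit like this is what "not strong enough to quantify the input abundances" looks like numerically, even when the slope itself is non-zero.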
Focusing on Tab. 5 and on the coefficients fitted for $M_{\rm H_2O}$ against $\log(Ab_{\rm H_2O})$ and against $\log(Ab_{\rm NH_3})$, we may infer that the metric may not be effective at distinguishing between water and ammonia. However, the degeneracy can be broken by performing a spectral retrieval if the target is observed at Ariel Tier 2 SNR, as shown by the example in Sec. 4.4. This population analysis is based on the study of spectral shapes only, and it does not make use of parameters such as planetary mass, radius and temperature. Although it has proven difficult to distinguish between water and ammonia with this metric, some knowledge of planetary properties may help to disentangle the two molecules in future work; for example, while a Neptune can hold ammonia, a Hot Jupiter is not expected to. One of the goals of Tier 1 is to identify targets with interesting spectra to be re-observed in higher-SNR Tiers. From this point of view, even if the metric cannot clearly separate water and ammonia, it can suggest the presence of interesting molecules in the spectrum, which can in turn inform the selection of targets for further studies.

Metric detection limit
To explore the molecular detection limit of the metric, we examine the molecular-poor/spectrally flat region of Fig. 2. A planet spectrum would be found in that region because of i) clouds, ii) a low temperature (i.e. a small scale height), iii) low molecular abundances, or a combination of the three. In all cases, the spectrum is expected to be featureless, i.e. flat. Case iii) is defined by input abundances smaller than $10^{-5}$ (Tab. 4). The metric detection limit can then be investigated by removing flat spectra before training the KNN, by raising the molecular-poor threshold above $10^{-5}$ before training, and by monitoring the KNN classification results. As the threshold increases, we expect the KNN to begin failing the molecular-poor/flat classification when spectra can no longer be considered flat.
We perform the KNN training on the noiseless spectra of both POP-I and POP-II, the latter containing only CH$_4$ and H$_2$O, the former all the molecules considered in this work. Each noiseless spectrum has an associated observed spectrum. Flat spectra are identified on the observed spectra, and the corresponding noiseless spectra are excluded from the KNN training.
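The labelling step of this experiment, with flat spectra dropped and an adjustable molecular-poor threshold, can be sketched as follows. The class rules are transcribed from the description of the Tab. 4 classes in the Fig. 7 caption (below $10^{-5}$ for both molecules: molecular poor; ten times more of one molecule and above $10^{-4}$: rich in that molecule; otherwise mixed). The dictionary layout of a planet, the function names, and whether "ten times more" is inclusive are assumptions.

```python
def classify_pair(ab_a, ab_b, name_a, name_b, poor_threshold=1e-5):
    """Tab. 4-style class for one pair of molecules (rules as read from Fig. 7)."""
    if ab_a < poor_threshold and ab_b < poor_threshold:
        return "molecular poor"
    if ab_a >= 10 * ab_b and ab_a > 1e-4:
        return f"{name_a} rich"
    if ab_b >= 10 * ab_a and ab_b > 1e-4:
        return f"{name_b} rich"
    return "mixed"


def training_labels(planets, name_a, name_b, poor_threshold=1e-5):
    """Labels for the KNN training set, skipping planets flagged as flat.

    Each planet is a hypothetical dict such as
    {"CH4": 1e-4, "H2O": 1e-6, "flat": False}, where "flat" comes from the
    chi-square test applied to the observed spectrum.
    """
    return [classify_pair(p[name_a], p[name_b], name_a, name_b, poor_threshold)
            for p in planets if not p["flat"]]
```

Raising `poor_threshold` from `1e-5` to `1e-4` moves planets with intermediate abundances from "mixed" to "molecular poor"; if such planets survive the flat-spectrum cut, their class remains populated, which is the signature used above to locate the detection limit.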
The motivation for using POP-II is as follows. If a population contains only CH$_4$ and H$_2$O and we properly remove all planets with a flat spectrum, no targets with non-detectable molecular features should remain. In the case of POP-I, however, we do not expect all the planets with $Ab_{\rm CH_4}$ and $Ab_{\rm H_2O} < 10^{-5}$ to be flat, because other molecules (CO$_2$ and NH$_3$) can show features. Therefore, the flat-spectrum removal procedure will not empty the molecular-poor class in this population. Using POP-II instead, we expect that, after removing all flat planets, no molecular-poor atmospheres remain. The procedure is summarised in Fig. 11.

Figure 10: As in Fig. 7, but the superimposed dots are now from the POP-I noiseless spectra, and the error bars represent the metric dispersion on the spectra before the application of Ariel's observing noise. The parameter-space area best sampled by the noiseless data is now well filled with the dots.

The outcome of this analysis is shown for POP-II and POP-I in Fig. 12 and Fig. 13, respectively. Only the calibrated regions are shown, and data points have been omitted for clarity. Fig. 12a shows the POP-II KNN analysis with all planets and the planetary classes of Tab. 4; in Fig. 12b the KNN is trained after removing flat spectra from the training set; and in Fig. 12c flat spectra are removed first and the molecular-poor threshold is raised from $Ab_{\rm mol} < 10^{-5}$ to $Ab_{\rm mol} < 10^{-4}$. Fig. 12b shows no molecular-poor atmospheres after excluding spectrally flat cases. This confirms that our metric is able to separate the more complex atmospheres from the flat ones in the simple case of only two molecules. By contrast, Fig. 12c still shows a grey area, indicating that atmospheres with $10^{-5} < Ab_{\rm mol} < 10^{-4}$ cannot be considered flat. This can be interpreted as a molecular detection limit. The figure also shows that these spectra populate the bottom-left corner of the best-sampled area of the diagram, meaning that they are classified as having the smallest spectral features in the sample. This confirms the relation between the metric and the molecular abundance. The detection limit is expected to improve with Tier 2 observations: Changeat et al. (2020) find that the detection limit of spectral retrieval techniques applied to Tier 2 data is about two orders of magnitude lower than that of the metric.
In Fig. 13 we remove all flat spectra from the planetary population POP-I and report the results of the KNN analysis. As expected, while removing all flat spectra from POP-II also removes all molecular-poor instances, the same does not occur in POP-I. In this case, spectra that are molecular-poor in a given pair of molecules, such as CH$_4$-CO$_2$ or CH$_4$-H$_2$O, may appear non-flat because of the presence of the other two molecules, i.e. NH$_3$-H$_2$O or NH$_3$-CO$_2$, respectively.
Figure 11: Strategy adopted to identify the molecular detection limit of the metric. Starting from POP-I, we classify the planets as described in Sec. 2.4.1. Without removing the flat spectra from the population, we would obtain the same results described in Fig. 4; if instead we remove the flat spectra, we obtain similar results with fewer molecular-poor planets, because even after removing flat spectra there will be planets bearing molecules other than the pair shown in the plot. The case of POP-II is different: here the population contains only two molecules, so if we remove the planets with flat spectra, no molecular-poor atmospheres remain.

Input abundances retrieval
We compare here two atmospheric retrievals of the same planet observed both in Tier 1 and in Tier 2. This exercise has two goals: 1. to confirm that a spectral retrieval is capable of disentangling water and ammonia, and of constraining the atmospheric composition of POP-I targets observed in Tier 2 with Ariel; 2. to show that, even though it is possible to perform a spectral retrieval on Tier 1 data for some selected planets, its performance is comparable with that of the metric.
From the POP-I planets, we select one with high water and ammonia abundances, low cloud coverage, a high temperature and a diameter larger than Jupiter's. This selection allows us to investigate the capability of Tier 2 observed data (simulated as described in Sec. 2.2) to break the water-ammonia degeneracy, as well as to estimate the uncertainties of a retrieval using Tier 1 observed data only.

Table 7: Retrieval parameters, showing fit boundaries, true inputs, and retrieved parameters with uncertainties for Tier 1 and Tier 2 observations. As in Fig. 14, the notation log(X), where X is one of CH$_4$, CO$_2$, H$_2$O or NH$_3$, represents the retrieved logarithm of the molecular abundance of the given species and should be compared to the input log($Ab_X$).
To perform the retrieval, we use TauREx. For the selected planet, we notice that in Tier 2 the abundances of the molecules considered are well constrained and that, as expected, low-level (high-pressure) clouds are undetected in both cases.

… Fig. 4: we used the noiseless planetary spectra to classify the parameter space and to select the best sampled areas. Comparing this figure with Fig. 7, we notice that the "molecular poor" area is still present because, even if there is no CO$_2$ or CH$_4$ in the planet atmosphere, NH$_3$ and H$_2$O can produce features (left case), and if there is no H$_2$O or CH$_4$, NH$_3$ and CO$_2$ can (right case).
The Tier 1 results can be linked to our previous analysis of molecular input abundance detection (Sec. 4.2). We compute the probability of molecular abundances greater than $10^{-4}$ from the retrieval posteriors and compare it with the probability obtained with our metric (Fig. 9). In this case, the measured metric values are $M_{\rm CH_4} = -0.47$, $M_{\rm CO_2} = 0.54$ and $M_{\rm H_2O} = 0.29$. The results are listed in Tab. 8. Tier 2 observations provide a confident detection of methane, carbon dioxide and water, while Tier 1 retrievals are broadly comparable to our metric approach in detecting the presence of these molecules.
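On the retrieval side, the probability of an abundance above $10^{-4}$ is the posterior mass above that threshold. A minimal sketch, assuming the posterior is available as samples of the base-10 log abundance, optionally with per-sample weights as produced by weighted samplers; both the `weights` argument and the function name are hypothetical.

```python
def posterior_probability_above(log_samples, log_threshold=-4.0, weights=None):
    """Posterior probability that log10(Ab) exceeds log_threshold."""
    if weights is None:
        weights = [1.0] * len(log_samples)  # equally weighted samples
    total = sum(weights)
    above = sum(w for s, w in zip(log_samples, weights) if s > log_threshold)
    return above / total
```

Applied to the Tier 1 and Tier 2 posteriors of each molecule, this gives numbers directly comparable with the metric-based probabilities read from Fig. 9.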
These results appear to confirm that spectral retrievals may not be the best-suited tool, or at least not a necessary one, for analysing Tier 1 data. Retrievals are model-dependent: one needs to define planet parameters, as well as cross-sections, pressure-temperature profiles, etc., and priors might need to be imposed to ensure convergence. Retrievals are also computationally expensive, making it non-trivial to conduct the analysis on hundreds of targets. A photometric metric, instead, is model-independent, which may be an advantage when assessing a planet observation for the first time, and the full analysis of 1000 observations takes only minutes on a desktop computer.

Figure 14: Retrieved spectra and posteriors. The corner plot shows the posteriors for each retrieved parameter using Tier 1 (blue) and Tier 2 (orange) observed data. Input values are shown by the black lines. The panel in the top right corner shows the retrieved spectra from Tier 1 (blue) and Tier 2 (orange) data, with shaded bands for the 1 and 2$\sigma$ uncertainties, and the input spectrum (black solid line). The notation log(X), where X is one of CH$_4$, CO$_2$, H$_2$O or NH$_3$, represents the logarithm of the molecular abundance of the given species and should be compared to log($Ab_X$).

ᵃ These percentages arise from a discrete distribution of data, and therefore the 68.3% level cannot be identified exactly; 69% is the closest possible value.

Comparison with Deep and Machine Learning
ML techniques are difficult to interpret, so comparing their performance with that of our metric can help us gain confidence in the outcomes of ML classifiers. For this purpose, we consider a planet as bearing a molecule if $Ab_{\rm mol} > 10^{-4}$. With our metric, we then select all planets with $M_{\rm CH_4} \geq 0.22$, which according to Fig. 9 corresponds to a probability of $\sim 68.3\%$ of having $Ab_{\rm CH_4} > 10^{-4}$. We repeat the same procedure with $M_{\rm CO_2} \geq 0.26$ for CO$_2$ and $M_{\rm H_2O} \geq 0.80$ for H$_2$O. In each sample, we check how many of the selected planets have molecular abundances in excess of $10^{-4}$, obtaining a success rate (precision) for our metric. In the same way, we check how many of the planets flagged by each Deep and Machine Learning algorithm in the full sample actually bear the molecules, so that the precision of all methods can be compared in Tab. 9.
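The precision comparison reduces to counting, among the planets flagged as bearing a molecule, those whose input abundance exceeds $10^{-4}$. A minimal sketch, in which the flags come either from the metric cut or from any classifier; the dictionary of cuts simply restates the thresholds quoted above, and the function names are hypothetical.

```python
METRIC_CUTS = {"CH4": 0.22, "CO2": 0.26, "H2O": 0.80}  # thresholds quoted in the text


def precision(flags, abundance, ab_min=1e-4):
    """Fraction of flagged planets that truly have Ab > ab_min."""
    flagged = [ab for flag, ab in zip(flags, abundance) if flag]
    if not flagged:
        raise ValueError("no planet was flagged")
    return sum(ab > ab_min for ab in flagged) / len(flagged)


def metric_flags(metric, molecule):
    """Flag planets whose metric reaches the cut for the given molecule."""
    cut = METRIC_CUTS[molecule]
    return [m >= cut for m in metric]
```

Because `precision` only needs boolean flags, the same function scores the metric selection and each classifier's predictions, which is what makes the entries of Tab. 9 directly comparable.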
We notice a marginally better success rate than the metric for the KNN and MLP algorithms, while RFC and SVC outperform the metric. Better performance is expected because, while our metric considers only specific bins in the spectrum, the classification algorithms use information from all the spectral data points. The comparable performance of the metric and of KNN and MLP suggests that the molecular bands chosen for the metric are not far from ideal, while the better performance of RFC and SVC indicates that there may be room for improvement.
While more work along this path is required, which is beyond the scope of this work, Deep and ML techniques appear very promising for this classification problem. We leave to dedicated works, such as Hou Yip et al. (2020), a more exhaustive investigation of these techniques, their comparison with more physically motivated strategies similar to the metric, and a thorough investigation of the biases that may affect all of them.

CONCLUSION
This work presents data analysis methods to extract atmospheric information from Ariel Tier 1 observations of a large and diverse sample of exoplanets. Ariel's Tier 1 has been optimised as a reconnaissance survey of exoplanets, with SNR larger than 7 after binning the observed spectra into about 7 photometric data points over the $0.5 \to 7.8 \; \mu m$ wavelength range. With only about 7 effective data points per spectrum, Tier 1 data may not be ideally suited for detailed spectral retrievals and for constraining chemical abundances, for which Tier 2 or 3 observations are needed. However, Tier 1 data contain a wealth of information, such as the spectral signatures of important molecules, whose presence can in principle be detected, enabling targets to be classified; they can also be used to identify planets with featureless spectra.
In this work we simulate the entire population of exoplanets using Alfnoor, assigning a randomised atmosphere to each planet in the Ariel Mission Reference Sample that comprises a diverse population of 1000 exoplanetary targets. We consider primary atmospheres with contributions from clouds, methane, water, carbon dioxide and ammonia. This simulated data set is expected to be representative of the Ariel Tier 1 reconnaissance survey.
The aim of this paper is threefold: (1) to show the capability of Tier 1 to detect featureless spectra, (2) to define a metric to classify and select planets to be re-observed in higher resolution Tiers and (3) to introduce other strategies that can be used to maximise the science exploitation of Ariel's Tier 1 data, for consideration in future studies.
(1) We presented a reliable method to identify flat spectra. By dividing the Ariel wavelength range into 4 bands, we classify as flat those planets whose response in the 4 spectral bands is compatible with a flat line according to a $\chi^2$ test.
(2) We developed a model-independent metric that bins the observed spectra over selected bands bearing the signatures of the molecules under investigation. From the observed spectrum alone, this method is capable of indicating the presence of an atmosphere and its possible composition, independently of planetary parameters such as mass, size and temperature. Applying the metric to Tier 1 observed spectra, we find that the conditions $M_{\rm CH_4} \geq 0.22$, $M_{\rm CO_2} \geq 0.26$ and $M_{\rm H_2O} \geq 0.80$ identify, at the 1$\sigma$ confidence level, atmospheres in which CH$_4$, CO$_2$ or H$_2$O, respectively, has a mixing ratio in excess of $10^{-4}$, demonstrating how the metric may be used in a statistically quantitative way. However, we find that the metric is biased, and the bias depends on the magnitude of the instrumental noise. De-biasing the metric is required for its predictions to be quantitative. De-biasing is expected to be possible following a detailed characterisation of the instrumental uncertainties, and we defer the investigation of these aspects to a future study.
The metric struggles to separate H$_2$O and NH$_3$. This may be partially due to the bias or, more likely, to the partially overlapping features of the two molecules. However, the metric is successful in classifying these targets as having an atmosphere. Should these targets be selected for Tier 2 observations, a spectral retrieval analysis can constrain all abundances to high significance.
(3) We have performed a preliminary comparison of four different Deep and Machine Learning algorithms for the chemical classification of Tier 1 atmospheres. We find that their performance in identifying the presence of a certain molecule in the spectrum is marginally better than that of the metric in the case of KNN and MLP, but RFC and SVC outperform the metric, justifying a detailed follow-up study in future work.
By combining the previous equations as done in eq. 3, we obtain an expression that no longer depends on the planet and star radii. Similarly to what has been done in Désert et al. (2009), the subtraction between $Z_{{\rm band}\,i}$ and $Z_{\rm norm}$ then removes the scale-height dependence:

$$Z_{{\rm band}\,i} - Z_{\rm norm} = \ln\left(\frac{\xi_{{\rm abs},{\rm band}\,i}\,\sigma_{{\rm abs},{\rm band}\,i}}{\xi_{{\rm abs},{\rm norm}}\,\sigma_{{\rm abs},{\rm norm}}}\right),$$

where $\xi_{{\rm abs},{\rm band}\,i}\,\sigma_{{\rm abs},{\rm band}\,i}$ is the band equivalent of $\xi_{\rm abs}\,\sigma_{\rm abs}(\lambda)$. This factor identifies the contribution of the main absorber in the band. Therefore, if we compare a band where a certain molecule has a strong feature with one where it is not expected to contribute to the spectrum, we can identify the presence of the molecule relative to what is present in the second band. Finally, $M_{\rm mol}$ becomes

$$M_{\rm mol} = \frac{1}{N}\sum_{i=1}^{N} \frac{\ln\left(\frac{\xi_{{\rm abs},{\rm band}\,i}\,\sigma_{{\rm abs},{\rm band}\,i}}{\xi_{{\rm abs},{\rm norm}}\,\sigma_{{\rm abs},{\rm norm}}}\right)}{\sqrt{\sigma^2_{Z_{{\rm band}\,i}} + \sigma^2_{Z_{\rm norm}}}}.$$

So, as promised, the metric is also sensitive to the molecular content.
To summarise, we removed the star, planet and atmosphere size dependencies by subtracting a normalisation band from the bands containing the features of interest and dividing the results by the combined dispersion. This yields a metric that is sensitive to the molecules contained in the atmosphere, but introduces a bias: the spectral dispersion $\sigma_{Z_{{\rm band}\,i}}$ depends on both the atmospheric feature dispersion and the observational noise.
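The estimator can be sketched directly from the expression for $M_{\rm mol}$ above. This sketch assumes that $Z_{{\rm band}\,i}$ is the mean and $\sigma_{Z_{{\rm band}\,i}}$ the population standard deviation of the per-bin values within each band, which the text does not spell out; the band contents and function names are invented. The check below shows numerically that the estimator is unchanged when every per-bin value is multiplied by a positive factor and shifted by a constant, which is the sense in which subtracting a normalisation band and dividing by the combined dispersion remove overall size dependencies.

```python
import math
import statistics


def metric_m_mol(feature_bands, norm_band):
    """M_mol = (1/N) sum_i (Zbar_i - Zbar_norm) / sqrt(sigma_i^2 + sigma_norm^2)."""
    z_norm = statistics.fmean(norm_band)
    var_norm = statistics.pvariance(norm_band)
    terms = [
        (statistics.fmean(band) - z_norm)
        / math.sqrt(statistics.pvariance(band) + var_norm)
        for band in feature_bands
    ]
    return statistics.fmean(terms)


def rescale(values, scale, offset):
    """Positive affine transformation applied to every per-bin value."""
    return [scale * v + offset for v in values]


# Hypothetical per-bin values: two feature bands and one normalisation band.
BANDS = [[2.0, 2.4, 2.2], [1.1, 1.5, 1.3]]
NORM = [0.9, 1.0, 1.1]
```

The same invariance is what makes the denominator the source of the bias: any extra dispersion, such as observational noise, enters there without a matching change in the numerator.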