
APERO: A PipelinE to Reduce Observations—Demonstration with SPIRou


Published 2022 November 29 © 2022. The Author(s). Published by IOP Publishing Ltd on behalf of the Astronomical Society of the Pacific (ASP). All rights reserved
Citation: Neil James Cook et al. 2022 PASP 134 114509. DOI: 10.1088/1538-3873/ac9e74


Abstract

With the maturation of near-infrared high-resolution spectroscopy, especially when used for precision radial velocity, data reduction has faced unprecedented challenges in terms of how one goes from raw data to calibrated, extracted, and corrected data with required precisions of thousandths of a pixel. Here we present A PipelinE to Reduce Observations (apero), specifically focused on Spectro Polarimètre Infra ROUge (SPIRou), the near-infrared spectropolarimeter on the Canada–France–Hawaii Telescope (CFHT). In this paper, we give an overview of apero and detail the reduction procedure for SPIRou. apero delivers telluric-corrected 2D and 1D spectra as well as polarimetry products. apero enables precise, stable radial velocity measurements on the sky (via the LBL algorithm), good to ∼2 m s−1 or better over the current 5 yr lifetime of SPIRou.


Original content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Astrophysical data, like the echelle data taken with Spectro Polarimètre Infra ROUge (SPIRou, Donati et al. 2018), require data reduction pipelines and data analysis software to produce scientific results. A data pipeline is a software that takes data from the origin (in our case, the raw data provided by a specific telescope at an observatory) to a destination (in our case, servers hosting raw, intermediate, and output files accessible by principal investigators).

What exactly is part of the data reduction pipeline and what is part of the data analysis pipeline can be vague. In most cases, there are complex connections and feedback required from data reduction to the observatory (influencing the telescope, the instrument, and thus the raw data) and between the reduction and the analysis software. For the purposes of this paper we define the reduction pipeline as the software that processes raw data in an autonomous way, and specifically per scientific observation, to a point where it can be used by the wider scientific community without prior expertise in the workings of the telescope, instrument or observational procedure (e.g., calibrations, combining of frames, etc.). We define a data analysis pipeline as any steps after the reduction pipeline (i.e., after apero) where one, many, or all scientific observations (of a specific astrophysical object of scientific interest or, indeed, multiple such objects) may be required to further gain scientific insight.

In the latter half of the 20th century, astronomy made the transition from analog to digital (McCray 2004, 2014; Borgman & Wofford 2021). One could argue that with this digital revolution, data reduction software became a necessity, allowing processing in a more autonomous, uniform manner not possible with a more manual approach. Most early reduction pipelines focused on image processing, but tools such as the Image Reduction and Analysis Facility (IRAF, Tody 1986) became general-purpose software allowing developers to produce reduction pipelines for imaging and spectroscopy. More recently, tools such as AstroImageJ (AIJ, Collins et al. 2017, an astronomical image analysis software package based on ImageJ, Rasband 2011) have provided astronomers with general-purpose tool kits for reducing data in a uniform manner. However, although generic software is suitable for data sets that have most or all of their characteristics in common (such as optical imaging time series for photometry, which rely on a common set of calibrations: flats, darks, biases), more complex observations, such as those from multi-object and/or cross-dispersed spectrographs, cannot simply use a one-size-fits-all tool that will work on every instrument. As a result, many reduction pipelines exist today (see Table 1 for examples from some recent echelle spectrographs 13 ) and some generalized tools have been developed (e.g., PyReduce, Piskunov et al. 2021; specutils, Earl et al. 2022; and HiFLEx, Errmann et al. 2020).

Table 1. Some Recent Echelle Spectrographs and their Pipelines

Instrument    Facility        Pipeline              References
CAFE          Calar Alto      CAFExtractor          Aceituno et al. (2013), Lillo-Box et al. (2020)
CARMENES      Calar Alto      DRS*                  Quirrenbach et al. (2014), Caballero et al. (2016)
CRIRES+       ESO             ESO CPL               Follert et al. (2014), Seemann et al. (2014)
ESPRESSO      ESO             ESO CPL               Pepe et al. (2021)
ESPaDOnS      CFHT            LIBRE-ESPRIT          Donati (2003), Donati et al. (1997)
GIANO-B       La Palma        IRAF                  Oliva et al. (2006)
GRACES        Gemini North    OPERA or DRAGraces    Martioli et al. (2012), Chené et al. (2021)
HARPS         ESO             DRS*                  Mayor et al. (2003), Rupprecht et al. (2004)
HARPS-North   La Palma        DRS*                  Cosentino et al. (2012)
HIRES         Keck            MAKEE, HIRES Redux    Vogt et al. (1994), Tran et al. (2016)
HPF           McDonald        HPF Pipeline          Mahadevan et al. (2010, 2012)
iSHELL        IRTF            Spextools (V5)        Cushing et al. (2004)
iRD           Subaru          DRS*                  Kotani et al. (2018)
MAROON-X      Gemini North    DRAGONS               Seifahrt et al. (2020), Labrie et al. (2019)
NIRPS         ESO             ESO CPL and APERO     Wildi et al. (2017)
SPIRou        CFHT            APERO                 Donati et al. (2018, 2020)
SOPHIE        OHP             DRS*                  Bouchy et al. (2009, 2011)
X-SHOOTER     ESO             ESO CPL               Vernet et al. (2011), Goldoni et al. (2006), Modigliani et al. (2010)

Note. The asterisk denotes unnamed data reduction software. The ESO Common Pipeline Library (CPL) is a collection of pipelines for ESO instruments where each instrument has specific scripts and some shared functionality (McKay et al. 2004).


The inner workings of a data pipeline should be accessible to the user, meaning it should be open-source (where possible). Open-source software is important as it allows inclusiveness with international partners and collaboration from a wide range of people. Using a free, commonly used language like Python gives developers and users quick access to the code and science algorithms. This enables deep exploration when problems arise and results are not understood. Likewise, one wants very specific scripts that handle all the complex details of the data, but also the ability to crunch tens of thousands of individual observations in a reasonably short amount of time, without human intervention, in a way that is uniform through time.

This paper presents apero, 14 an open-source Python pipeline to reduce observations, demonstrated with SPIRou. Section 2 gives an overview of the pipeline, Section 3 introduces apero as the official pipeline for SPIRou, Sections 4–11 detail the different parts of our pipeline (from the raw data through to science-ready products), and Section 12 summarizes our work.

2. Overview

2.1. The Users of APERO

apero is primarily designed to be used at data centers, either at the observatories providing the raw data to be reduced or in collaboration with them. In its current form, any user wanting to reduce data with apero needs access to all calibration data (to fully calibrate the data: Sections 5–7) and all hot star data (to fully telluric correct the data: Section 8). Thus, in general, individual PIs are not expected to use apero directly but will, of course, be users of apero data (supplied by the observatory or the large collaborations involved with the instrument). For SPIRou, the SPIRou Legacy Survey (SLS, Donati et al. 2018, 2020) collaboration was responsible for producing the data reduction pipeline, and thus data centers at the observatory (CFHT) and in Canada and France have been responsible for reducing the data. To allow individual PIs to reduce a single file or a single night's worth of data, a release of the full calibration and hot star data sets would be required; however, there is currently no time frame for such a release.

2.2. Design

Although SPIRou has been the main driver during the development of apero, we took every opportunity to keep science algorithms separate from both core-functionality algorithms (such as logging, reading and writing files, and database interface and management) and instrument-specific functionality (such as hard-coded values, the number of fibers, and FITS-header keywords). As an example, see Appendix H for details of the changes required for use with the Near Infra Red Planet Searcher (NIRPS; Wildi et al. 2017).

apero is an open-source, publicly available (see footnote 13) Python 3 package. The purpose of apero is to take raw data from the telescope, calibrate and correct instrumental and systematic effects where possible, extract and output spectra (2D and 1D) both before and after telluric correction, as well as provide an estimate of the radial velocity and calculate the polarimetry data where required. In this section, we will detail the generic features of apero and throughout the rest of the paper concentrate on its use with SPIRou.

The package is split into base, core, input-output, language, plotting, science, and tool sub-packages (see Figure 1). The scripts that are run are hereafter referred to as "recipes," as each contains only the isolated steps of a specific reduction stage, not dissimilar to a cookery recipe (i.e., step 1: do this, step 2: do that). Algorithmic complexity is kept to a minimum in these recipes and is instead stored in one of the above-mentioned sub-packages. Recipes are defined either as reduction recipes or tool recipes. The former usually take one or a small set of observations in order to reduce some part of the overall data set; the latter aid the user or developer in a specific task, be it processing a large amount of data, resetting output directories and databases, maintaining and updating various data structures and databases within apero, or obtaining logs and statistics.

Figure 1. The apero Python package and basic modular design.

The only installation requirement for apero is a separate Python environment (i.e., with conda 15 or venv 16 ). All Python modules in the environment are strictly controlled by apero via pip (see footnote 15); thus installing in a general Python environment with the user's own modules is not recommended. Installation instructions can be found in the apero documentation. 17 Running the installation script provides a full walk-through of all options required to set up apero. One specific feature is the use of apero "profiles," designed to accommodate multiple setups (i.e., differing constants, configurations, data directories, database setups, etc.). The installation process sets up and checks paths to the data directories, copies all default files and assets, and configures the various databases required for operation. The database installed uses either MySQL, which requires additional installation by the user, or, by default, SQLite3, which is supported within Python without any additional setup required by the user.

The input, working, and output directories are also designed to work in a flexible way: they can consist of sub-directories within one primary directory, they can be at unrelated locations on the computer system, and all files can be symbolically linked from elsewhere, for maximum system compatibility. The most basic design for the data directories is shown in Figure 2. For a single instrument, it can often be useful to have shared raw data directories for different apero profiles (which can have different assets, preprocessed and reduced directories, database setups, etc.).

Figure 2. An example basic data structure for apero reductions.

In addition to the apero Python package directory (Figure 1) and the data directories (Figure 2), there is also a settings directory (Figure 3), where each apero profile is stored, containing information on specific instrument and setup parameters. The user has complete control over all installation and database parameters, as well as the ability to manipulate almost all default parameters given for a particular instrument. Explanations of each parameter can be found in the full apero documentation (see footnote 16).

Figure 3. An example of the basic apero settings directory.

2.3. Notable Features

Due to apero's modular design, it can be run in various ways. The most basic way is to run recipes individually, giving all the required arguments (all recipes come with a --help argument to display all required and optional arguments). All input parameters are saved in the FITS outputs as an additional extension.

In addition to running recipes separately, there is the apero tool called apero_processing.py which is designed to automate the reduction based on an input run.ini file (saved in the runs directory; see Figure 2). The apero processing script scans the raw directory and reduces all data requested by the run.ini file in an automated way, optimizing any steps that can be run in parallel and automatically waiting at the end of steps that require elements of a previous step to be completed. We define sequences of recipes that can be used to reduce specific steps of the full reduction process.

Another notable feature, as mentioned in Section 2.2, is that apero can be used for multiple instruments (or the same instrument with different settings) without conflict or duplication of the software—this is managed by having different apero profiles (see Section 2.2). apero currently works for both SPIRou (Donati et al. 2018, 2020) and NIRPS (Wildi et al. 2017, see Appendix H) data but there are plans to extend compatibility with other instruments, such as SPIP (Donati et al. 2018, J. F. Donati et al. 2022, in preparation) and ANDES (formerly HIRES, Marconi et al. 2021). Any new instrument can use all, some, or none of the scientific algorithms available from other instruments while always benefiting from the apero architecture and core functionality.

3. APERO—The Official CFHT Pipeline for SPIRou

For the rest of this paper we will discuss apero only in terms of SPIRou; future publications will discuss apero's implementation with other instruments. We briefly discuss the NIRPS implementation in Appendix H. Here we present the aspects of SPIRou relevant to this paper; for further details we encourage the reader to consult Donati et al. (2018, 2020). apero was first adapted from the HARPS DRS (Mayor et al. 2003; Rupprecht et al. 2004) with the philosophy of keeping the code accessible and general enough that adding new instruments later would be possible. Note that in addition to apero, the Libre-Esprit pipeline from Donati et al. (1997) has also been adapted to handle SPIRou data (but is not publicly available).

SPIRou is a near-infrared (0.98–2.5 μm) spectropolarimeter that saw first light at the Canada–France–Hawaii Telescope (CFHT) in 2018 April (Donati et al. 2018, 2020). SPIRou was designed to have a spectral resolving power better than 70,000 and to achieve precision radial-velocity (pRV) stability better than 1 m s−1. SPIRou is composed of three units: the Cassegrain unit (i.e., the polarimeter attached to the Cassegrain focus of the telescope), the calibration unit, and the cryostat containing the spectrograph. The cryostat is in a vacuum and temperature controlled at the millikelvin level, offering state-of-the-art stability. The detector is an H4RG-15 HgCdTe 18 array (Hodapp et al. 1996, 2019; Zandian et al. 2016; Artigau et al. 2018) with 4096 × 4096 pixels; a 4-pixel-wide border at the top, bottom, left, and right is reserved as reference pixels, which are not light-sensitive and are used only for common-mode readout noise rejection. The H4RG is read through 32 amplifiers that are each 128 × 4096 pixels in size. The Cassegrain module has two Fresnel rhombs (an ensemble of prisms used to rotate polarization states) coupled to a Wollaston prism. The Wollaston prism allows the incoming beam (either from the telescope or the calibration unit) to be split into two orthogonally polarized beams. The light of these two beams is carried to the cryostat by two fluoride fibers (i.e., the science fibers, hereafter fibers A and B, or when combined, AB). In addition to the two science fibers, a third fluoride fiber directly connects the calibration unit to the spectrograph (hereafter fiber C), providing light from various calibration lamps: 19

  • 1.  
    a flat field exposure (via a halogen lamp), referred to hereafter as a FLAT.
  • 2.  
    a uranium neon hollow cathode lamp for arc spectra, referred to hereafter as an HC.
  • 3.  
    a Fabry–Pérot etalon with tens of thousands of lines, referred to hereafter as an FP.

as well as providing an option for an unilluminated dark signal, hereafter referred to as a DARK, where the fiber sees a cold source inside the calibration unit. More details about the calibration unit can be found in Boisse et al. (2016). The light from all three fibers is passed through a slicer to increase the spectral resolution for a given fiber size (Micheau et al. 2018), leading to four closely packed slices per fiber.

The spectrograph itself is cross-dispersed in the perpendicular direction using an R2 echelle grating; this allows the entire spectral range of SPIRou to be captured on the H4RG detector with no wavelength gaps, but does lead to curved echelle orders with some overlap in wavelength between consecutive orders. We extract 49 orders, 20 with each order spread along the 4088 pixels (grating diffraction orders #79 to #31). The apero input spectrum at various stages of the pipeline can be seen in Figure 4, and the layout of the three fibers (two science and one calibration) can be seen in Figure 5 in the raw and preprocessed rotations (see Section 4 for details).

Figure 4. SPIRou data at different points in the pipeline. All panels show a hot star observation of data product type (dprtype) obj_dark (with the OBJ, in this case, being the hot star observation in the science fibers and a DARK in the reference fiber). Top left: the raw obj_dark (raw input to apero). Top right: the obj_dark after pre-processing (flipped and resized, Appendix B.2). Middle: the extracted e2ds obj_dark with combined flux from the A and B fibers (see Section 7). Bottom: the one-dimensional spectrum of the hot star observation with the telluric correction in red. In the 2D images, nan values are shown in green. Note that due to the number of pixels in these images we have down-sampled them, leading to apparent aliased structures that are not present in the real data.
Figure 5. The layout of the three SPIRou fibers. Left: An example raw obj_fp image (an M dwarf star observation in the science fibers and an FP in the reference fiber). Right: The same obj_fp image but pre-processed. A and B fibers are the science fibers and C is the calibration (or reference) fiber. Pre-processed images are rotated 90° clockwise compared to the raw images (see Section 4.6).

The SPIRou detector control software reads the detector continuously every 5.57 s and produces a 2D image (4096 × 4096) constructed from the linear fit of the pixel value versus time (henceforth the slope), as well as an intercept, error, and number of frames used for quality checks (see Section 4.5). This is the raw 2D "ramp" image used by apero as an initial input. An overview of this can be found in Appendix A; this software is not provided as part of apero, but the raw cubes are stored by CFHT. The "ramp" images are supplied by CFHT (via CADC 21 ) and are thus referred to as the raw images for input into apero.
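The ramp fitting happens upstream of apero, in the detector control software, so the following is purely illustrative: a minimal numpy sketch of a per-pixel linear fit of pixel value versus time, taking only the 5.57 s frame time and the slope/intercept/error outputs from the text (everything else, including the function name, is an assumption).

```python
import numpy as np

FRAME_TIME = 5.57  # seconds between consecutive non-destructive readouts

def fit_ramp(cube):
    """Illustrative per-pixel ramp fit for a (n_reads, ny, nx) cube.

    Returns the slope (ADU/s), the intercept (ADU, the extrapolated
    detector state before the first readout), and a formal slope error.
    """
    n_reads = cube.shape[0]
    t = np.arange(n_reads) * FRAME_TIME
    design = np.vander(t, 2)                 # columns [t, 1], shared by all pixels
    flat = cube.reshape(n_reads, -1)
    coeffs, residuals, _, _ = np.linalg.lstsq(design, flat, rcond=None)
    slope = coeffs[0].reshape(cube.shape[1:])
    intercept = coeffs[1].reshape(cube.shape[1:])
    # formal 1-sigma slope uncertainty from the fit residuals
    dof = max(n_reads - 2, 1)
    chi2 = (residuals.reshape(cube.shape[1:])
            if residuals.size else np.zeros(cube.shape[1:]))
    slope_err = np.sqrt(chi2 / dof / np.sum((t - t.mean()) ** 2))
    return slope, intercept, slope_err
```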

3.1. Definitions of APERO Input Files

One of the first actions of apero is to identify data product types (dprtype) for each possible input file taken from the telescope. For this purpose, we use the notation 22 {AB}_{C} where fibers A and B are the science fibers and C is the reference fiber. The values of each correspond to what light was present in the fiber; a science signal (OBJ or POLAR, see below for details) or a calibration signal (i.e., DARK, FLAT, HC or FP). Note fibers A and B can have a calibration signal but fiber C cannot have a science signal.

Some of the most frequently used dprtype are thus:

  • 1.  
    dark_dark_int, where the INT added indicates that the DARK signal in the science AB channel comes from the calibration unit (at parking position) and is fed through the Cassegrain unit (i.e., the thermal contribution of the calibration unit, the polarimeter, and the fiber feedthrough into the cryostat; this is the dark used to calibrate calibration exposures).
  • 2.  
    dark_dark_tel, where the TEL added indicates that the DARK signal in the science AB channel sees the sky (in fact the mirror covers) which includes the thermal contributions from the Cassegrain module and the fiber feedthrough into the cryostat. This is the dark used for on-sky exposure calibration.
  • 3.  
    flat_dark and dark_flat, where we have either a FLAT (halogen lamp) in the A and B fibers and a DARK (INT) in the C fiber, or a DARK (INT) in the A and B fibers and a FLAT in the C fiber respectively.
  • 4.  
    flat_flat, where all three fibers have a FLAT signal.
  • 5.  
    fp_fp, where all three fibers have the FP (Fabry–Perot etalon) signal.
  • 6.  
    dark_fp, where fibers A and B are DARK and fiber C has the FP.
  • 7.  
    hcone_hcone, where all three fibers have the HC (UNe hollow cathode) signal. The HCONE here distinguishes the UNe hollow cathode from the ThAr hollow cathode lamp (HCTWO).
  • 8.  
    obj_dark and polar_dark, where fibers A and B have a science signal (either in spectroscopic or polarimetric configuration) and the C fiber has a DARK (INT) signal.
  • 9.  
    obj_fp and polar_fp, where fibers A and B have a science signal (either in spectroscopic or polarimetric configuration) and the C fiber has an FP signal.

We list all usable combinations of fibers A, B, and C and how they are identified from SPIRou header values in Appendix F.

On a standard night of operations, daily calibrations are taken both before and after observations. The standard set of observations for each calibration set is as follows:

  • 1.  
    dark_dark_int (×2)
  • 2.  
    dark_dark_tel (×2)
  • 3.  
    dark_flat (×5)
  • 4.  
    flat_dark (×5)
  • 5.  
    flat_flat (×5)
  • 6.  
    dark_fp (×2)
  • 7.  
    fp_fp (×5)
  • 8.  
    hcone_hcone (×2)

These calibrations are tuned to be optimal for the extraction of all objects, avoiding saturation while taking a minimal amount of time to obtain.

SPIRou has two science signal modes: spectroscopy mode and polarimetry mode. In apero this is distinguished via the definition of the reduction mode drsmode (spectroscopy or polar) and is seen in dprtype, leading to the distinction between obj_dark and polar_dark, and between obj_fp and polar_fp. For drsmode set to spectroscopy, the rhomb position must be P16 for fiber A and P16 for fiber B (which means no polarization in either fiber). For drsmode set to polar, any other combination of rhomb positions (P2, P4, P14, P16) is deemed a polarimetric setup; however, only certain combinations of rhomb positions are used and are valid for calculating the polarimetric products (these are dealt with in the polarimetry code, see Section 10).

3.2. Overview of the Reduction Process

The reduction process for SPIRou is separated into eight main steps: the pre-processing (Section 4), the reference calibrations (Section 5), the nightly calibrations (Section 6), the extraction (Section 7) for science observations and hot stars, the telluric absorption correction (Section 8), the RV analysis (Section 9), the polarimetry calculations (Section 10), and the post-processing (Section 11). These steps are summarized in Figure 6.

Figure 6. Overall flow chart for the apero reduction process for SPIRou. Raw files are shown in gray, preprocessed (pp) files are shown in green, extracted products are in yellow, telluric products are in blue, and radial velocity and polarimetry products are shown in purple. The stacked inputs show that these inputs are used from every night of observation, but run on a per-file or per-night basis. Note that other inputs are required (e.g., calibration files) and detailed in the individual sections.

Although each of these eight steps can be run individually, and each varies from a single recipe to a set of recipes, the primary reduction method is automated to provide efficiency and to be reproducible wherever the data are being reduced. Note that apero is also compatible with the CFHT automation scripts, which were developed to handle the constant flow of incoming raw data and to send on the outputs once apero has produced the final products. As such, we define "sequences" which can contain multiple or all steps. Example sequences are a "full sequence" having all steps, a "limited sequence" having all steps for a particular subset of astrophysical objects, a "calibration sequence" designed just to reduce calibration observations, a "telluric sequence" to create all files required for telluric correction, a "science sequence" designed to be run once all calibration and telluric steps have been run (either on one, multiple, or all astrophysical objects), or an "engineering sequence" to do special steps required only for engineering and test purposes. This is all controlled via a user's run file (a run.ini file). The apero_processing recipe reads the run file, which contains information such as the following (a hypothetical sketch is given after this list):

  • 1.  
    which observation directories (e.g., which nights) to reduce or skip.
  • 2.  
    which science targets to reduce.
  • 3.  
    which sequence or sequences to use.
  • 4.  
    which individual recipes of a sequence should be run.
  • 5.  
    which recipes should be skipped or repeated.
  • 6.  
    the number of physical cores to use.
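As a concrete illustration, a run file covering the items above might look like the following sketch; the key names here are hypothetical and simplified, and the actual run.ini keys shipped with apero may differ:

```ini
; hypothetical run.ini sketch -- key names are illustrative only
INCLUDE_OBS_DIRS = 2020-08-31, 2020-09-01 ; observation directories to reduce
EXCLUDE_OBS_DIRS =                        ; observation directories to skip
SCIENCE_TARGETS = GL699, TRAPPIST_1       ; which science targets to reduce
CORES = 10                                ; number of physical cores to use

RUN_SEQUENCE = science_seq                ; which sequence(s) to use

; per-recipe switches within the sequence
RUN_EXTRACT = True                        ; run this recipe of the sequence
SKIP_EXTRACT = True                       ; skip files already reduced
```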

Based on this run file and the raw data on disk, apero works out all recipes that should be run and the order in which these should be done, and most importantly which of these recipes can be run at the same time in parallel and which cannot—ultimately making the most efficient use of any machine resources that apero is being run on.

For a standard run (where no data have been previously processed), the apero_processing recipe will process the sequences in the order mentioned above without any human intervention (assuming nominal input data). It works both for a single night of data and for a complete re-reduction of all data since first light, and has been shown to produce reproducible results at multiple data centers.

4. Pre-processing

The raw images (those retrieved from the telescope after the ramp fitting algorithm has been run) require some preliminary processing to remove detector artifacts. These artifacts are documented in this section. All frames independent of dprtype are preprocessed in the same manner before any other step of apero is run.

4.1. Header Fixes and Object Resolution

The SPIRou header provides the required information to process files. However, to facilitate data reduction a few header keys are added or updated.

The first header key we add is the apero object name (drsobjn); this is the object name used throughout apero. In general, it is the object name taken from the raw input file, but all punctuation and white space are removed and replaced with underscores (_) and all characters are capitalized, while "+" and "−" are replaced with "P" and "M" respectively. This avoids object names with slightly different formats being considered as different objects (e.g., TRAPPIST-1 versus Trappist 1) and allows for use in filenames. Next, the target type (TRG_TYPE), with a value of either TARGET, SKY, or a blank string, is added. This key exists in the raw file header of newer files (2020 and later) but has been found to be incorrect or missing for older files, especially when dealing with some sky frames (older sky frames can usually be identified by a suffix or prefix "sky" in the object name if not already identified as a sky by the target type header key). In addition, a mid-exposure time (mjdmid) is added, equal to the time recorded at the end of the exposure minus half the exposure time (mjdend − exptime/2). The mjdmid time is used throughout apero and is the recommended time to use, as opposed to other header keys such as mjdate, which is not strictly the start-of-observation time but the time the observation request is sent. The last two keys added, as mentioned in Section 3.2, are the drsmode and dprtype.
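A minimal sketch of these two header fixes, assuming the raw header provides an object name, an end-of-exposure MJD, and an exposure time in seconds (the helper names are ours, not apero's, and the precedence between coordinate signs and ordinary hyphens is our assumption):

```python
import re

def apero_object_name(raw_name: str) -> str:
    """Sketch of the drsobjn normalization described above."""
    name = raw_name.strip().upper()
    # assumption: "+" and the Unicode minus sign become letters, while
    # ordinary hyphens are treated as punctuation below
    name = name.replace('+', 'P').replace('\u2212', 'M')
    # remaining punctuation and white space become underscores
    return re.sub(r'[^A-Z0-9]+', '_', name).strip('_')

def mid_exposure_mjd(mjdend: float, exptime: float) -> float:
    """mjdmid = end-of-exposure time minus half the exposure time."""
    return mjdend - 0.5 * exptime / 86400.0  # exptime assumed in seconds

# both spellings map to the same drsobjn
assert apero_object_name('TRAPPIST-1') == apero_object_name('Trappist 1')
```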

Once the headers are fixed with the above additions and corrections, if the raw files are of dprtype obj_fp, obj_dark, polar_fp, or polar_dark, we cross-match the drsobjn with an object database of object names, positions, motions, parallaxes, known radial velocity estimates, temperatures, and aliases. These are mostly sourced directly from SIMBAD (Wenger et al. 2000) and cross-matched with the most up-to-date proper motion and parallax catalogs (based on an id cross-match from SIMBAD with Gaia EDR3; Gaia Collaboration et al. 2021, DR2; Gaia Collaboration et al. 2018, DR1; Gaia Collaboration et al. 2016, UCAC4; Zacharias et al. 2013 or Hipparcos; Perryman et al. 1997). This ensures the given object is not already known under another name, and that all astrometric parameters are consistent even for observations from differing PIs. This is important for steps in the telluric process where we combine all objects of the same drsobjn where possible (see Section 8). This local database of object names can be updated and is maintained in such a way as to keep consistency and inform users when updates have been made. All reductions of a single drsobjn should always be done with a single set of astrometric parameters.

4.2. File Corruption Check

Not every raw file contains usable data. For example, a rare occurrence where the detector acquisition system has a synchronization issue in retrieving the pixel stream leads to a 1 pixel offset of the readout. Therefore, as part of the pre-processing, we check for corrupt files. We do this by comparing images to a list of known hot pixels and verifying that the hot pixels are at the expected positions. If they are not, this is corrected by registering the pixel grid to the nominal pixel position. Missed lines or columns at the edge of the array are replaced by nan values. This does not lead to a loss of science pixels, as the 4 pixel edge of the array consists of non-light-sensitive reference pixels.

4.3. Top and Bottom Pixel Correction

The first part of the correlated noise filtering accounts for gradients along the long axis of the amplifier readout by removing, for each amplifier, the slope between the first and last read reference pixels. We take a median of each amplifier's "bottom" and "top" reference pixels and subtract the slope between these two regions. This accounts for fluctuations in the detector electronics on timescales comparable to or longer than the readout time. Higher-frequency noise is handled as a common mode between amplifiers in the following step (Section 4.4). High-frequency readout noise that is not correlated between amplifiers cannot be corrected, as it overlaps with scientific data and cannot be measured independently; it represents the limiting factor for the fainter targets observed with SPIRou. This correction is represented by the correction of the "amplifier signal" in Figure 7.
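A hedged numpy sketch of this amplifier correction, assuming the 32 amplifiers of 128 columns and the 4-pixel reference borders described in Section 3 (the linear ramp model is our reading of "the slope between these regions"):

```python
import numpy as np

def correct_amplifier_gradients(image, n_amps=32, n_ref=4):
    """Remove, per amplifier, the bottom-to-top reference-pixel slope."""
    ny, nx = image.shape
    amp_width = nx // n_amps
    rows = np.arange(ny)
    corrected = image.copy()
    for amp in range(n_amps):
        cols = slice(amp * amp_width, (amp + 1) * amp_width)
        bottom = np.nanmedian(image[:n_ref, cols])   # bottom reference rows
        top = np.nanmedian(image[-n_ref:, cols])     # top reference rows
        ramp = bottom + (top - bottom) * rows / (ny - 1)
        corrected[:, cols] -= ramp[:, np.newaxis]
    return corrected
```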

Figure 7. Schematic image to illustrate the median filter dark and 1/f noise correction. The left image is equivalent to a raw data image with common-mode noise between amplifiers and 1/f noise present. The middle image shows the common-mode noise between the amplifiers fixed. The right image shows the image corrected for the common-mode noise and the 1/f noise. The reference pixels are those outside the white-dashed line. This image has been exaggerated, as the full image has 4088 non-reference pixels, making it impossible to view the 4-wide reference pixels. Illustrative spectral traces have been added to guide the eye.

4.4. 1/f Noise Correction

The 1/f noise component arises from the detector readout electronics that induce structures that are common to all amplifiers and sampled by the reference pixels. The 1/f noise manifests itself as stripes perpendicular to the amplifiers.

This noise has power at all frequencies and affects the entire array (i.e., both light-sensitive and reference pixels). Ideally, we would fully correct it with the 8 reference pixels, but the high-frequency components have an SNR that is too low to be measured robustly within these pixels. We correct the low-frequency components of the 1/f noise (>32 pixels) with the reference pixels and then measure the high-frequency component with the unilluminated part of the array (the large, ∼800 pixel wide region beyond the K-band orders; the "dark" region in Figure 8). Once measured, this common-mode 1/f noise is subtracted from all columns of the science array. A cartoon of the 1/f signal is shown in Figure 7.
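Schematically (and only schematically; the apero implementation differs in detail, and the dark-region bounds below are placeholders), the split between low- and high-frequency components could look like this, with the 32 pixel filter scale taken from the text:

```python
import numpy as np
from scipy.ndimage import median_filter

def correct_one_over_f(image, dark_cols=slice(0, 800), n_ref=4):
    """Estimate and subtract the common-mode 1/f signal per row."""
    # low frequencies (>32 pixels) from the side reference pixels
    ref = 0.5 * (np.nanmedian(image[:, :n_ref], axis=1) +
                 np.nanmedian(image[:, -n_ref:], axis=1))
    low_f = median_filter(ref, size=32, mode='nearest')
    # high frequencies from the unilluminated ("dark") part of the array
    dark = np.nanmedian(image[:, dark_cols], axis=1)
    high_f = dark - median_filter(dark, size=32, mode='nearest')
    # subtract the common mode from every column
    return image - (low_f + high_f)[:, np.newaxis]
```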

Figure 8. Features and characteristics of the SPIRou raw images (obj_fp). Panel (A): A zoom-in of some of the reddest orders (showing the striping in the across-order direction). Panel (B): A zoom-in of the bluest orders (also showing the striping in the across-order direction as well as a vertical band in the along-order direction). Panel (C): A zoom-in on the individual slices for fibers A, B, and C (showing the shape of the individual slices within a fiber). Panel (D): The unilluminated region (showing examples of the detector effects to be removed during pre-processing, as well as one of the large-scale defects present on the detector). Panel (E): two of the large detector holes. The detector artifacts highlighted here are corrected in the preprocessing step (Section 4) and this figure is reproduced after preprocessing in Figure 9. Color bars have units of ADU s−1.

4.5. Cosmic Ray Rejection

Cosmic ray hits are easier to flag with infrared arrays than with CCD data sets due to the acquisition through multiple readouts. Pixels without a cosmic ray hit are expected to see an accumulation of electrons in their well that is linear with time, while a cosmic ray hit induces a glitch in that accumulation that can easily be flagged. One could attempt to reconstruct a ramp while including a discontinuity at the moment of the hit (e.g., Giardino et al. 2019); considering that cosmic rays are rare and that this would add a significant burden in terms of data processing, we opt to simply flag pixels hit by a cosmic ray as invalid (nan values). The flagging of cosmic rays is done in two steps.

First, we check for the consistency between the total number of photons received over the entire ramp and the formal ramp error statistics from the linear fit. Discrepant points, even if they remain within the unsaturated regime of the pixel dynamic range, are flagged as invalid. Second, the ramp fitting of the pixel value provides both a slope and an intercept. The slope is the signal used for scientific analysis, and the intercept is discarded. This intercept value corresponds to the state of the detector prior to the first readout, which, for HxRG arrays, is a structured signal. The intercept values have a typical dispersion of ∼1000 ADUs, and discrepant values indicate that photons within a given pixel do not follow a linear accumulation with time. The consistency of the intercept value with expected statistics is used to further flag invalid pixels within a ramp. The flagged cosmic ray pixels can be seen when comparing Figures 8 and 9.
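As an illustration of the second step only, a robust sigma-clipping of the intercept map might be sketched as follows (the 10σ threshold is a placeholder, not the value used by apero):

```python
import numpy as np

def flag_cosmic_rays_from_intercept(intercept, nsig=10.0):
    """Flag pixels whose ramp intercept is discrepant (returns a mask)."""
    med = np.nanmedian(intercept)
    # robust sigma via the median absolute deviation (~1000 ADU expected)
    sigma = 1.4826 * np.nanmedian(np.abs(intercept - med))
    bad = np.abs(intercept - med) > nsig * sigma
    # flagged pixels are then set to NaN in the slope image
    return bad
```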

Figure 9. Same as Figure 8 but after pre-processing (without the apero rotation applied, to match the orientation of the same raw obj_fp image). Panel (A): A zoom-in of some of the reddest orders (showing that the striping in the across-order direction has been removed). Panel (B): A zoom-in of the bluest orders (showing that the striping in the across-order direction has been removed, as well as the vertical band in the along-order direction). Panel (C): A zoom-in on the individual slices for fibers A, B, and C (showing some flagged hot pixels, given a nan value, in green). Panel (D): The unilluminated region (showing how well the pre-processing has cleaned the unilluminated region, as well as one of the scratches present on the detector). Panel (E): Two of the large-scale defects, also with many of the pixels flagged as unusable (given a nan value, in green). Color bars have units of ADU s−1.

4.6. Rotation of Image

The pre-processed images are then rotated to match the HARPS orientation. This is a legacy change left over from when some algorithms shared a common ancestry with the HARPS DRS pipeline (Mayor et al. 2003; Rupprecht et al. 2004). For SPIRou data this is equivalent to a 90° clockwise rotation (shown in the top left and right of Figure 4).
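In numpy terms this is a one-line operation; a sketch:

```python
import numpy as np

def to_harps_orientation(image):
    """Rotate a pre-processed SPIRou image 90 degrees clockwise."""
    return np.rot90(image, k=-1)  # k=-1 is a clockwise quarter turn
```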

5. Reference Calibrations

While many calibrations are taken on a nightly basis, as one expects slight drifts in the instrument that need to be calibrated, a number of instrument parameters are expected to be fixed on a timescale of years in the absence of a major upgrade (e.g., changing optical elements or the science array). Furthermore, a number of calibrations, in particular those linked to the wavelength solution, require an approximate solution to derive an accurate one more reliably. In addition, having a reference calibration is very useful when performing quality assessments of nightly calibrations. We therefore define a night that has been vetted for spurious calibrations as the "reference calibration night." In the future, we may have reason to split the data between reference nights (i.e., before and after a thermal cycle). Currently, the reference night uses calibration data from 2020 August 31, but we have experimented with different reference nights. Using two or more reference nights would split the data, and data reduced against different reference nights should not be combined until the last possible moment (i.e., when comparing radial velocities, for pRV work).

The reference recipes rely on certain nightly recipes to be run for the reference night; we thus include some nightly calibrations from Section 6 in the reference sequence of recipes. Note that the nightly calibration recipes are used as part of the reference sequence and are not equivalent to running a nightly calibration sequence before running the reference recipes (this also means a nightly calibration sequence on the reference night must be run again as part of the nightly calibrations). The order of the reference sequence is as follows:

  • 1.  
    apero_dark_ref: a high-pass reference dark from a large subset of preprocessed dark_dark files on disk (Section 5.1).
  • 2.  
    apero_badpix: a nightly bad pixel map on the reference night (Section 6.1).
  • 3.  
    apero_loc_spirou: a nightly measurement of the order position on the reference night (Section 6.2).
  • 4.  
    apero_shape_ref: a reference map of each order's spectral and spatial shape using a large subset of preprocessed fp_fp files on disk and the reference night hcone_hcone preprocessed files (Section 5.2).
  • 5.  
    apero_shape: a nightly snapshot of each order's shape on the reference night (Section 6.3).
  • 6.  
    apero_flat: a nightly measurement of the blaze and flat profile on the reference night (Section 6.4).
  • 7.  
    apero_thermal: a nightly extracted internal dark for determining thermal correction on the reference night (Section 6.5).
  • 8.  
    apero_leak_ref: a reference map of FP leakage from the calibration to science fibers using all preprocessed dark_dark files on the reference night (Section 5.3).
  • 9.  
    apero_wave_ref: a reference wavelength solution using the reference night fp_fp and hcone_hcone preprocessed files (Section 5.4).
  • 10.  
    apero_thermal: a nightly extracted telescope dark for determining thermal correction on the reference night (Section 6.5).

The overall reference calibration sequence flow can be seen in Figure 10.

Figure 10. Reference calibration sequence: The input of each step is a pre-processed file. All outputs of reference calibration steps are checked using quality control and then added to the calibration database for use later in apero. Only one night for the full SPIRou data set is used as a reference observation night. To run some steps of the reference calibration sequence, nightly calibrations must be generated for the reference observation night, thus nightly calibrations are shown in the diagram in the position they are run, between reference recipes. Reference calibrations include the reference dark, shape reference, leak correction reference, and reference wavelength solution. The nightly steps required are bad pixel correction, localization, flat and blaze correction, and thermal correction recipes.

5.1. Generating the Reference Dark Calibration File

As SPIRou has no moving internal parts for increased stability, one cannot move the fiber out of view and independently measure the detector's dark current. Thus dark frames are non-trivial to construct, as there are two independent contributions to the "dark" image, one arising from the dark current of the science arrays and the other from thermal emission. This problem is mainly seen in the K band and is shared with any pRV spectrograph for which the fiber thermal emission is commensurate with the per-pixel dark current.

The thermal background manifests itself as a very low-level contribution (typically 0.015 e− s−1 pixel−1), well below the typical target flux, but has a "high flux" tail of much brighter pixels. As the SPIRou science array has an extremely stable temperature (sub-millikelvin), one expects the pixel dark current to be very stable. From all preprocessed dark_dark files, across all nights, we select a subset of 100 dark_dark files, distributed as uniformly in time as possible using a sorting function (if there are fewer than 100 dark_dark files across all available nights, we use all of them); these files form our reference dark.

One could use this as the single step for dark correction, but a significant challenge arises. The fiber train is always connected, and the science array always sees the thermal emission from the fibers and the hermetic feedthrough connecting the fibers to the cryostat. This thermal emission changes with the temperature of the fiber train and moves, at the pixel level, on timescales of months to years following thermal cycles and maintenance of the instrument. Applying a simple scaling of the dark current, including the thermal background from the fiber, would lead to erroneous subtraction in science data, sometimes over-subtracting the flux at ∼2.4 μm and leading to negative flux. We opt for a decoupling of the two contributions in the data calibration. We construct a median dark current containing both the pixel-to-pixel detector contributions and the low-frequency components from the thermal background of the fiber train; the high-frequency component can be scaled with integration time, while the low-frequency one needs to be adjusted (see Section 6.5). This high-pass reference dark image is then saved to the calibration database for use throughout apero.
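A minimal sketch of this decoupling, assuming a stack of preprocessed dark_dark images; the 64 pixel filter size is a placeholder for the actual low-frequency scale, and NaN handling is omitted:

```python
import numpy as np
from scipy.ndimage import median_filter

def high_pass_reference_dark(dark_stack):
    """Median-combine darks, then remove the low-frequency component."""
    ref_dark = np.nanmedian(dark_stack, axis=0)            # median of ~100 darks
    thermal_background = median_filter(ref_dark, size=64)  # low-frequency part
    return ref_dark - thermal_background                   # pixel-to-pixel dark
```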

5.2. Generating the Reference Shape Calibration Files

In pRV measurements, constraining the exact position of orders on the science array, both in the spectral and spatial dimensions, is key as the position of our spectra on this science array encodes the sought-after velocity of the star. The diffraction orders of SPIRou, and nearly all pRV spectrographs, follow curved lines, and the image slicer has a 4-point structure (see Figures 8 and 9, panel (C)) that is not parallel to the pixel grid.

Within the apero framework, we decided to split the problem into two parts: a reference shape calibration (this section) and a nightly shape calibration (Section 6.3). For the reference step, we constrain the bulk motion, as defined through an affine transformation, and register all frames to a common pixel grid to well below the equivalent of 1 m s−1. We perform the order localization and subsequent steps on a nightly basis as it has the significant advantage that registered frames have all orders at the same position to a very small fraction of a pixel. Furthermore, having registered frames allows for better error handling within apero; one does not expect pixel-level motions between calibrations after this step.

The reference shape recipe takes preprocessed fp_fp and hcone_hcone files (as many as given by the user or as many as occur on the nights being used via apero_processing). The reference shape recipe combines the fp_fp files into a single fp_fp file and the hcone_hcone files into a single hcone_hcone file (via a median combination of the images). After combining, the fp_fp and hcone_hcone images are calibrated using our standard image calibration technique (see Appendix B). In addition to the combined fp_fp and hcone_hcone, we create a reference FP image. This reference FP image is created by selecting a subset of 100 fp_fp files (uniformly distributed across nights) and combining these with a median. This reference FP image is then saved to the calibration database for use throughout apero.

The registration through affine transformations is done using the fp_fp calibrations. We take the combined fp_fp files, localize each FP peak in the 2D frame, and measure the position of the peak maxima. Considering the 3 SPIRou fibers and 4 slices (i.e., 12 2D peaks per FP line), there are >100,000 peaks on the science array. These are taken as "reference" positions. For each calibration sequence, we then find the affine transformation that minimizes the rms between the measured FP peak positions and those of the reference FP calibration image. The resulting affine transformation consists of a bulk shift in dx, dy, and a 2 × 2 matrix that encodes rotation, scale, and shear. These values are kept and can be useful for identifying shifts in the optics (e.g., after earthquakes or thermal cycles) as well as very slight changes in the plate scale and angular position of the array, which can be of interest in understanding the impact of engineering work on the science data products. For example, we can readily measure a 10−5 fractional change in the SPIRou plate scale following a maintenance thermal cycle of the instrument; the ratio of the point-to-point rms to the median of the plate scale value is at the 1.7 × 10−7 level. The interpolations between pixel grids are done with a third-order spline. We note that changes in the FP cavity length arise for a number of reasons, such as gas leakage and temperature changes, and will lead to a motion of FP peaks on the array that is not due to a physical motion of the array or optical elements within the cryostat. Considering that typical drifts are at the ∼0.3 m s−1 day−1 level, to first order this leads to a typical 10−9 day−1 fractional increase in the plate scale along the dispersion direction. This effectively leads to a minute change in the effective dispersion of the extracted file wavelength solution. As this change is common to the FP, the HC, and the science data, it is accounted for when computing the wavelength solution and cavity length change (see Sections 5.4 and 6.6).
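The affine fit itself reduces to a linear least-squares problem; a sketch in our own formulation, taking measured and reference FP peak positions as inputs:

```python
import numpy as np

def fit_affine(measured, reference):
    """Affine transform mapping measured (x, y) peaks onto reference.

    measured, reference: (n_peaks, 2) arrays. Returns the 2x2 matrix
    (rotation, scale, shear) and the bulk offset (dx, dy) such that
    reference ~= measured @ matrix.T + offset.
    """
    design = np.hstack([measured, np.ones((len(measured), 1))])
    coeffs, *_ = np.linalg.lstsq(design, reference, rcond=None)
    matrix = coeffs[:2].T    # 2x2 linear part
    offset = coeffs[2]       # bulk shift dx, dy
    return matrix, offset
```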

Once the affine transformation has been applied, images are registered to a common grid (that of the reference FP image). We then construct a transform that makes the orders straight and corrects for the slicer structure in the dispersion direction. This leads to the construction of two maps, corresponding to the x and y offsets that must be applied to an image to transform it into a rectified image; a trace extraction can then be performed directly through a 1D collapse, in the direction perpendicular to the dispersion, of a rectangular box around the order. The y direction map (see Figure 14 for orientation) is computed from the order-localization polynomials (Section 6.2). The x direction map is determined by first collapsing the straightened orders of an fp_fp calibration and cross-correlating each of the spectral-direction pixel rows to find its offset relative to the collapsed-extracted spectrum. The x and y offsets are then saved to the calibration database for use throughout apero.

5.3. Generating the Reference Leak Calibration File

For pRV observations, the observational setup is most often one with a science object in the A and B fibers and an FP illumination in the C fiber (i.e., obj_fp or polar_fp). Considering that the SPIRou slicer has sharp edges in its pupil, there is a diffraction pattern that leads to a spike in the cross-fiber direction and a modest cross-fiber component in the leakage. The leakage of the FP spectrum onto the science spectrum is constant through time as it is solely due to pupil geometry, and can therefore be calibrated and subtracted. The reference leak recipe finds all dark_fp files in the raw directory (from the reference night). Each dark_fp file is then extracted (see the extraction process in Section 7). Once all dark_fp files are extracted they are combined for each fiber: AB, A, B, and C (via a median across all extracted e2ds files) creating one image (49 × 4088) per fiber. Conceptually, the leak correction is straightforward: take the combined dark_fp, normalize each C fiber FP to unity (using the 5th percentile of FP flux within the order) and measure the recovered spectrum in the A and B fibers. For any given obj_fp or polar_fp observation, one simply measures the C fiber FP flux and scales the leakage in A and B accordingly. An example of this is shown in Figure 11 for a DARK in the science fibers and an FP in the calibration fiber. The method has been tested over the lifetime of SPIRou and subtracts the high-frequency component of the leakage at a level better than 1 in 100 in the most contaminated orders. The reference leak calibration file (ref_leak) is then saved to the calibration database for use throughout apero.
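Conceptually the correction can be sketched in a few lines (a simplification: the real recipe works per order on the extracted 49 × 4088 e2ds arrays, with the 5th-percentile normalization quoted above; the function names are ours):

```python
import numpy as np

def make_leak_model(dark_fp_science, dark_fp_c):
    """Leakage per unit of C-fiber FP flux, per order (49 x 4088 arrays)."""
    norm = np.nanpercentile(dark_fp_c, 5, axis=1)   # 5th percentile per order
    return dark_fp_science / norm[:, np.newaxis]

def correct_leak(science_e2ds, science_c, leak_model):
    """Scale the leakage model by the observed C-fiber flux and subtract."""
    scale = np.nanpercentile(science_c, 5, axis=1)
    return science_e2ds - leak_model * scale[:, np.newaxis]
```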

Figure 11. Example of the leakage from the calibration fiber to the science fibers, when the science fiber is unilluminated (DARK). Left: The pre-processed dark_fp image with a logarithmic stretch in flux (flux measured in electrons). Pixels with a nan value are shown in green. Right top: The extracted dark_fp for order #44 before correction (black, input spectrum) and after (red, corrected spectrum). Right bottom: A zoom-in on the extracted dark_fp for order #44.

5.4. Generating the Reference Wavelength Calibration Files

The wavelength solution generation follows the general idea of Hobson et al. (2021); however, since publication there has been an overall reshuffling of the logic. As such, we present an overview of the process here but refer the reader to Hobson et al. (2021) for further specific details.

The reference wavelength solution recipe takes preprocessed fp_fp and hcone_hcone files (as many as given by the user or as many as occur on the nights being used via apero_processing) from the reference night. It combines the fp_fp and hcone_hcone files into a single fp_fp and a single hcone_hcone file (via a median combination of the images). These combined fp_fp and hcone_hcone files are then extracted (see the extraction process in Section 7).

We first consider the combined flux in fibers A and B (the AB fiber). We locate the hcone_hcone lines, starting with a line list generated as in Hobson et al. (2021), fitting each peak with a Gaussian, measuring the position of the peak, and inferring the peak wavelength from an initial guess at the wavelength solution from physical models. The first time this HC finding is performed, we allow for a global offset between the current hcone_hcone file and the initial guess at the wavelength solution (this is important when our reference night is far in time from when our initial wavelength solution data were taken).

For the fp_fp AB fiber, a similar process is followed. However, instead of a single Gaussian, an Airy function is used (to account for the previous and following FP peak in the fitting process):

\[
F(x) = \frac{A}{1 + \beta \sin^{2}\left(\pi\,\frac{x - x_{0}}{w}\right)} + DC
\tag{1}
\]

where F is the modeled flux of the FP, A is the amplitude of the FP peak, x0 is the central position of the FP peak, w is the period of the FP in pixel space, β is the shape factor of the FP peak and DC is a constant offset. Once we have found all HC and FP lines in the AB fiber we calculate the wavelength solution.
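For illustration, fitting Equation (1) to a single extracted FP peak could be sketched with scipy (the initial guesses for w and β below are placeholders, not apero's values):

```python
import numpy as np
from scipy.optimize import curve_fit

def fp_peak_model(x, amp, x0, w, beta, dc):
    """Airy-like FP profile of Equation (1)."""
    return amp / (1.0 + beta * np.sin(np.pi * (x - x0) / w) ** 2) + dc

def fit_fp_peak(x, flux):
    """Return the sub-pixel center x0 of one FP peak."""
    p0 = [np.ptp(flux), x[np.argmax(flux)], 9.0, 50.0, np.min(flux)]
    popt, _ = curve_fit(fp_peak_model, x, flux, p0=p0)
    return popt[1]
```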

The accurate wavelength solution for the reference night is then found through the following steps:

  • 1.  
    From FP peak spacing within each order, derive an effective cavity length per order.
  • 2.  
    Fit the chromatic dependency of the cavity with a 5th order polynomial and keep that cavity in a reference file; through the life of the instrument, we will assume that cavity changes are achromatic relative to this polynomial.
  • 3.  
    From the chromatic cavity solution, we find the FP order value of each peak, typically numbering from ∼9600 to ∼24,500 respectively at long and short wavelength ends of the SPIRou domain.
  • 4.  
    From the peak numbering, which is known to be an integer, we can refine the wavelength solution within each order. This solution is kept as a "reference" wavelength solution.
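Steps 3 and 4 exploit the standard normal-incidence FP condition mλ = 2L (our statement of the physics, not a formula quoted from the text); a worked sketch, assuming peak wavelengths from the current solution and the fitted cavity length evaluated at each peak:

```python
import numpy as np

def refine_fp_wavelengths(wave_guess, cavity):
    """Integer FP order numbering and refined peak wavelengths.

    wave_guess: FP peak wavelengths from the current solution (same
    units as cavity); cavity: effective cavity length L at each peak.
    """
    m = np.round(2.0 * cavity / wave_guess)  # ~9600 to ~24,500 for SPIRou
    return m, 2.0 * cavity / m               # refined wavelengths
```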

The finding of the fiber AB HC and FP lines and the calculation of the wavelength solution is repeated multiple times (in an iterative process). We essentially "forget" the locations of the HC and FP lines and re-find them as if we had not found them before, only this time instead of the initial guess wavelength solution we use the previous iteration's calculated solution and the previous iteration's calculated cavity width fit as a starting point.

Finally, after three iterations, which is sufficient to converge to floating-point accuracy, we re-find the HC and FP lines for the AB fiber one last time using the final reference wavelength solution and final cavity width fit. We also make an estimate of the resolution, splitting the detector into a 3 × 3 grid and using all HC lines in each sector to estimate the line profile, and thus the resolution, of each sector. We then process each fiber (A, B, and C) in a similar manner to the AB fiber (finding HC and FP lines from the extracted images and calculating the wavelength solution), the only difference being that we fit neither the cavity width nor the chromatic term; we force the coefficients to be the ones found with the AB fiber.

For quality control purposes we calculate an FP binary mask using the cavity width fit and use this to perform a cross-correlation between the mask and the extracted FP for all fibers (AB, A, B, and C). We use the cross-correlation function to measure the shift of the wavelength solution measured in fiber AB compared to fibers A, B, and C and confirm that this is less than 2 m s−1. As a second quality control, we match FP lines (found previously) between the fibers and directly calculate the difference in velocity between these lines as a second metric of the radial velocity shift between the fibers' wavelength solutions. Note that typically for the reference night the value of these quality control metrics is around 10–20 cm s−1 between fibers (i.e., AB−A, AB−B, AB−C).

The reference wavelength solution file (ref_wave) for each fiber, a cavity fit file, and a table of all HC and FP lines found are then saved to the calibration database for use throughout apero. A resolution map is also saved. The hcone_hcone and fp_fp extracted files have their headers updated with the reference wavelength solution.

6. Nightly Calibrations

SPIRou calibrations are taken twice a day: once before the start of the science observations and once after all science observations are completed (i.e., once in the evening and once in the morning). When processing, each step (i.e., each recipe type) is run on all applicable data for all nights before moving on to the next recipe, unless a user is running one night at a time (e.g., processing new data from one night or one run only). For an individual night, the calibrations from before and after the science observations are combined.

The order of the night calibration sequence is as follows:

  • 1.  
    apero_badpix—a bad pixel map for each night (Section 6.1).
  • 2.  
    apero_loc_spirou—a measurement of the order position on each night (Section 6.2).
  • 3.  
    apero_shape—a snapshot of each order's shape on each night (Section 6.3).
  • 4.  
    apero_flat—a measurement of the blaze and flat profile on each night (Section 6.4).
  • 5.  
    apero_thermal—an extracted internal dark for determining thermal correction of calibrations on each night (Section 6.5).
  • 6.  
    apero_wave_night—a wavelength solution measurement on each night (Section 6.6).
  • 7.  
    apero_thermal—an extracted telescope dark for determining thermal correction of on-sky observations on each night (Section 6.5).

The overall nightly calibration sequence flow can be seen in Figure 12.

Figure 12. Nightly Calibration sequence: The input of each step is a pre-processed file. All outputs of nightly calibration steps are checked using quality control and then added to the calibration database for use later in apero. Steps include bad pixel correction, localization, shape correction, flat and blaze correction, thermal correction, and wavelength solution calibration files.


6.1. Generating Bad Pixel Calibration Files

The bad pixel recipe takes preprocessed dark_dark and flat_flat files (as many as given by the user or as many as occur on the nights being used via apero_processing). It combines all dark_dark files and all flat_flat files into a single dark_dark and a single flat_flat (via a median combination of the images). Bad pixels are then identified in the flat_flat by using Equation (2)

Equation (2)

$${M}^{\mathrm{flat}}_{i,j}=\left(\frac{{\mathrm{FLAT}}_{i,j}}{{\mathrm{FLAT}}^{\mathrm{med}}_{i,j}}\lt \mathrm{cut\_ratio}\right)\vee \left(\frac{{\mathrm{FLAT}}^{\mathrm{med}}_{i,j}}{{\mathrm{FLAT}}_{i,j}}\lt \mathrm{cut\_ratio}\right)\vee \left({\mathrm{FLAT}}^{\mathrm{med}}_{i,j}\lt \mathrm{illum\_cut}\right)$$

where ${\mathrm{FLAT}}_{i,j}$ is the flux in the ith row and jth column of the flat_flat image, ${\mathrm{FLAT}}^{\mathrm{med}}$ is the median-filtered flat image (using a filtering width of 7 pixels), and ${M}^{\mathrm{flat}}_{i,j}$ is 1 to flag a bad pixel or 0 otherwise; cut_ratio is 0.5 (flagging pixels with a response less than 50% of their neighbors or unphysically brighter than their neighbors) and illum_cut is 0.05 (flagging pixels at the edge of the blaze response). FLAT and ${\mathrm{FLAT}}^{\mathrm{med}}$ have first been normalized by the 90th percentile of flux in the median-filtered flat image. Thus ${M}^{\mathrm{flat}}$ is a Boolean flag map of bad pixels on the flat image. For the dark_dark image, bad pixels are identified using Equation (3)

Equation (3)

$${M}^{\mathrm{dark}}_{i,j}={\mathrm{DARK}}_{i,j}\gt 5.0\ \mathrm{ADU}\ {{\rm{s}}}^{-1}$$

where ${\mathrm{DARK}}_{i,j}$ is the flux in the ith row and jth column of the dark image. Thus ${M}^{\mathrm{dark}}$ is a Boolean flag map of bad pixels on the dark image. We choose a value of 5.0 ADU s−1 as it is representative of the pixel flux of a typical science target. Including pixels with a brighter level of dark current than this leads to a loss in SNR rather than a gain. We note that this threshold could be target-dependent, but for simplicity we use a single value.

In addition to this, bad pixels in a full detector engineering flat (FULLFLAT, taken during commissioning) are also identified using Equation (4)

Equation (4)

$${M}^{\mathrm{full}-\mathrm{flat}}_{i,j}=\left|{\mathrm{FULLFLAT}}_{i,j}-1\right|\gt 0.3$$

where ${\mathrm{FULLFLAT}}_{i,j}$ is the flux in the ith row and jth column of the full detector engineering flat, normalized such that its typical response is unity. Thus ${M}^{\mathrm{full}-\mathrm{flat}}$ is a Boolean flag map of bad pixels on the full detector engineering flat image. We chose 0.3 as this flagged the defective regions identified manually on the detector; the 1σ dispersion of the full detector engineering flat image is 2%.

These three bad pixel maps are then combined into a single bad pixel map (Equation (5)) that is shown in Figure 13

Equation (5)

$${M}_{i,j}={M}^{\mathrm{flat}}_{i,j}\vee {M}^{\mathrm{dark}}_{i,j}\vee {M}^{\mathrm{full}-\mathrm{flat}}_{i,j}$$

where ${M}_{i,j}$ is the Boolean flag value for the ith row and jth column used as the final bad pixel map.

Figure 13. Left: a typical dark_dark image; right: an example combined bad pixel map (the final bad pixel map used for a night calibration). For this bad pixel map example, we identify ∼50,000 bad pixels (0.4% of the detector, the majority located in two large clusters). In both panels the red square represents the resized area (see Appendix B.2); bad pixels are only kept in this region.


The final step with the bad pixel maps is to dilate large clumps of bad pixels, flagging additional pixels at the edges of these clumps. This is done using the binary_erosion and binary_dilation functions in scipy (Virtanen et al. 2020) with circular apertures of 5 and 8 pixels, respectively, for the erosion and dilation. The erosion is used to remove small bad pixel clumps and isolated pixels from a copy of the bad pixel map; this then allows the remaining larger bad pixel clumps to be identified. The dilation then increases the size of these large clumps, flagging pixels around their edges as bad. This copy of the bad pixel map is then merged back into the original bad pixel map, as sketched below.
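A minimal sketch of this clump-dilation step is given below, using the aperture sizes quoted above; the function names and the disk construction are illustrative, not the apero implementation.

```python
import numpy as np
from scipy.ndimage import binary_erosion, binary_dilation

def circular_aperture(radius):
    """Boolean disk footprint with the given radius in pixels."""
    y, x = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    return (x ** 2 + y ** 2) <= radius ** 2

def dilate_large_clumps(badpix):
    """Flag the edges of large bad-pixel clumps as bad.

    Erosion removes small clumps and isolated pixels from a copy of the
    map, leaving only the large clumps; dilation then grows those clumps
    so that pixels around their edges are also flagged.
    """
    large_clumps = binary_erosion(badpix, structure=circular_aperture(5))
    grown = binary_dilation(large_clumps, structure=circular_aperture(8))
    return badpix | grown  # merge back into the original map
```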

In addition to this, we use the bad pixel map along with the flat_flat image to define a mask of the out-of-order regions. The image is sliced into ribbons 128 pixels wide (in the along-order direction); this width is chosen to be small enough that the orders do not show significant curvature within each ribbon (essentially splitting the image into 32 rectangles of 4088 × 128 pixels). We take a median of each ribbon in the along-order direction (creating 32 vectors of length 4088), producing a profile of the orders for every 128 pixels across the order. We split each of these 32 order profiles into 128-pixel regions and estimate the background of each as the 5th percentile of the flux in that sub-ribbon. We then set this background estimate for all pixels in that 128 × 128 box and thus produce BACK_EST, a 4096 × 4096 image where each 128 × 128 sub-region is set to the 5th-percentile value. The out-of-order region mask is then set by Equation (6). The ribbon regions, background estimate (BACK_EST), and out-of-order region mask are shown in Figure 14, where bad pixels (nan values) are shown in green

Equation (6)

$${M}^{\mathrm{back}}_{i,j}={\mathrm{FLAT}}_{i,j}\lt {f}_{\mathrm{back}}\times {\mathrm{BACKEST}}_{i,j}$$

where ${M}^{\mathrm{back}}_{i,j}$ is the out-of-order region mask, ${\mathrm{FLAT}}_{i,j}$ is the flux in the ith row and jth column of the flat_flat image, ${\mathrm{BACKEST}}_{i,j}$ is the crude 5th-percentile background estimate for the ith row and jth column (from the 128 × 128 sub-regions), and ${f}_{\mathrm{back}}$ is a fixed threshold factor separating in-order from out-of-order flux.
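A sketch of the BACK_EST construction is given below, assuming the along-order direction runs along the columns of the array; the axis convention and names are illustrative.

```python
import numpy as np

def background_estimate(flat, box=128):
    """Crude low-frequency background (BACK_EST).

    Median each 128-pixel-wide ribbon in the along-order direction to get
    an order profile, then paint the 5th percentile of each 128-pixel
    chunk of that profile onto the matching 128 x 128 sub-region.
    """
    ny, nx = flat.shape
    back_est = np.zeros_like(flat)
    for x0 in range(0, nx, box):  # one ribbon of columns at a time
        profile = np.nanmedian(flat[:, x0:x0 + box], axis=1)
        for y0 in range(0, ny, box):
            back_est[y0:y0 + box, x0:x0 + box] = np.nanpercentile(
                profile[y0:y0 + box], 5)
    return back_est
```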

Figure 14. Left: the combined and preprocessed flat_flat input image to the bad pixel calibration recipe; overplotted in red are the ribbons defined at 128 pixel steps in the across-order direction. Middle: the low spatial frequency background estimate (BACK_EST) based on the 5th percentile of 128 × 128 chunks. Right: a zoom-in on the out-of-order region mask (Mback) calculated from Equation (6); white areas are identified as in-order pixels, black as out-of-order pixels.


Both the bad pixel map (badpix) and out-of-order region mask (backmap) are saved to the calibration database for use throughout apero. The bad pixel map and background map are used in our standard image calibration (see Appendix B).

6.2. Generating Localization Calibration Files

The localization recipe takes preprocessed dark_flat or flat_dark files (as many as given by the user or as many as occur on the nights being used via apero_processing). It is run twice, once for the C fiber localization (with a set of dark_flat) and once for the AB fiber localization (with a set of flat_dark). It combines the dark_flat files or the flat_dark files into a single dark_flat or flat_dark (via a median combination of the images). After combining, the images are calibrated using our standard image calibration technique (see Appendix B).

The first step in the localization code is to take the combined and calibrated dark_flat or flat_dark and apply a weighted box median (shown in Equation (7))

Equation (7)

$${\mathrm{IM}}_{\mathrm{orderp}\ k}=\mathrm{med}\left({\mathrm{IM}}_{j=k-w\,:\,j=k+w}\right)$$

where ${\mathrm{IM}}_{\mathrm{orderp}\ k}$ is the order profile flux for all rows in the kth column, ${\mathrm{IM}}_{j=k-w:j=k+w}$ is the combined, calibrated dark_flat or flat_dark restricted to a box of columns centered on column k (with half-width w), and the column index k ranges from 0 to 4088.

This produces the order profile image of the dark_flat or flat_dark which is used for the optimal extraction (see Section 7.1) and to locate the orders.

To locate the orders we use the scikit-image measure.label algorithm (Fiorio & Gustedt 1996; Wu et al. 2005), which labels connected regions of pixels sharing the same value. We use a connectivity of 2, meaning that any of the 8 surrounding pixels count as neighbors if they share the same value.

To facilitate the labeling, we first apply a 95th-percentile filter with a box of 25 × 25 pixels across the whole image, as the flux of truly illuminated pixels is location-dependent. We set a threshold at half the local 95th-percentile value, labeling all pixels above this threshold as one and all pixels below it as zero. We then run measure.label on this Boolean map (referred to from this point on as Maskorders). This is just a first guess of the order positions and usually returns many labeled regions that are not true orders.

To remove bad labels, we first remove any labeled region with fewer than 500 pixels. We then remove from Maskorders any pixel within a labeled region whose flux is less than 0.05 times the 95th percentile of all pixels in that labeled region. We then median filter each row of Maskorders to clean up the labeled edges and apply a binary dilation (scipy ndimage.binary_dilation); this dilation essentially merges labeled regions that are close to each other by expanding the edges of regions marked with ones. After Maskorders has been updated, we re-run the labeling algorithm. As a final filtering step, we remove any region whose center does not overlap with the central part of the image in the along-order direction (i.e., the center ±half the width of the detector, 2044 ± 1022 pixels). A sketch of the first-guess labeling is given below.
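The sketch below illustrates the first-guess labeling described above; the filter sizes and thresholds are those quoted in the text, while the function and variable names are illustrative.

```python
import numpy as np
from scipy.ndimage import percentile_filter
from skimage import measure

def first_guess_orders(image):
    """First guess of the order positions (Maskorders)."""
    # Local 95th percentile in a 25 x 25 box: the flux of truly
    # illuminated pixels is location-dependent, so the threshold is local.
    local95 = percentile_filter(image, 95, size=25)
    mask_orders = image > 0.5 * local95
    labels = measure.label(mask_orders, connectivity=2)
    # Remove spurious labeled regions with fewer than 500 pixels.
    for region in measure.regionprops(labels):
        if region.area < 500:
            mask_orders[labels == region.label] = False
    return mask_orders
```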

Once we have the final set of labeled regions, we use Maskorders to fit a polynomial (of degree 3) to the pixel positions in each labeled region, forcing continuity between orders by fitting each coefficient across the orders. We also use the Maskorders pixel positions to linearly fit the width of each order.

For a dark_flat, this produces polynomial fits and coefficients for 49 orders for the C fiber. For a flat_dark input, this produces polynomial fits and coefficients for 98 orders (49 orders for A and 49 orders for B). These polynomial coefficients for the positions of the orders and the widths of the orders are then converted into values as a function of position across each order. These fits can be seen in Figures 15 and 16.

Figure 15. The localization polynomial fits for position and width in the four corners of the dark_flat image (for fiber C). In total we fit 49 orders for fiber C (see footnote 19). The y-axis shows the across-order direction and the x-axis the along-order direction. The color bar is flux in electrons.

Figure 16. The localization polynomial fits for position and width across all orders of the flat_dark image (for fibers A and B). In total, we fit 49 orders for fiber A and 49 orders for fiber B (see footnote 19). The y-axis shows the across-order direction and the x-axis the along-order direction. The color bar is flux measured in electrons.


As part of quality control we check that:

  • 1.  
    the number of orders is consistent with the required number of orders (49 for fiber C, 98 for fibers A+B).
  • 2.  
the across-order value at the center of the detector is always larger than the value of the previous order.

The order profile (order_profile), locations of the orders (loco), and widths of the orders are saved to the calibration database (if both quality control criteria are met) for use throughout apero.

6.3. Generating Nightly Shape Calibration Files

Before extracting the spectrum, we need to transform the image into a format that is amenable to a simple one-dimensional collapse. Given our reference FP grid and the x and y displacement maps from Section 5.2, on a given night, we only need to find the affine transform that registers FP peaks onto the reference FP image and update the x and y transform maps with the affine contribution. This assumes that the order curvature is constant through the life of the instrument and that the slicer shape is stable. We note that, as the order profiles are determined in each nightly calibration, a slight (sub-pixel) modification of the position of the orders would have no impact on the extracted spectra, which are extracted with the profile measured for the corresponding night.

The nightly shape recipe takes preprocessed fp_fp files (as many as given by the user or as many as occur on each of the nights being used via apero_processing). It combines the fp_fp files into a single fp_fp per night (via a median combination of the images). After combining, the fp_fp images are calibrated using our standard image calibration technique (see Appendix B). We take the ref_fp, shape_x, and shape_y calibrations from the calibration database (created in Section 5.2); if multiple exist we use the closest in time (using the header key mjdmid). To find the linear transform parameters (dx, dy, A, B, C, and D) between the reference fp_fp and this night's fp_fp, we find all the FP peaks in the reference fp_fp image and in the nightly fp_fp image. Once we have the linear transform parameters, we shift and transform the combined and calibrated nightly fp_fp via our shape transform algorithm (see Appendix C, using the linear transform parameters, shape_x, and shape_y) and save the transformed and un-transformed images to disk (for manual comparison to the input fp_fp image).

As part of quality control, we check that the rms of the residuals in both directions (across order and along the order) is less than 0.1 pixel, which has been found to be optimal for flagging pathological cases. The transformation parameters (dx, dy, A, B, C, and D, henceforth shape_local) are then saved to the calibration database (if both quality control criteria are met) for use throughout apero.

6.4. Generating Flat and Blaze Calibration Files

An essential part of the extraction process is calibrating the flat field response (removing the effect of the pixel-to-pixel sensitivity variations) and calculating the blaze function. The blaze can be seen visually in the raw and preprocessed images (e.g., region B in Figures 8 and 9) as a darkening of the orders, especially at the blue end, toward the sides of the detector (in the along-order direction).

The nightly flat recipe takes preprocessed flat_flat files (as many as given by the user or as many as occur on each night being used via apero_processing). It combines the flat_flat files into a single flat_flat per night (via a median combination of the images). After combining, the flat_flat images are calibrated using our standard image calibration technique (see Appendix B). The combined, calibrated flat_flat file is then extracted (using the same extraction algorithms presented in Section 7). The rest of the flat and blaze recipe is handled per order. Once extracted, the e2ds (49 × 4088) is median filtered (with a width of 25 pixels) and all pixels with flux less than 0.05 times, or greater than two times, the 95th-percentile flux value are removed. Each flat_flat e2ds order is then fit with a sinc function (Equation (8))

Equation (8)

$${B}_{i}=A\ {\mathrm{sinc}}^{2}\left(\frac{\pi \left({x}_{i}-L\right)}{P}\right)\left[1+s\left({x}_{i}-L\right)+Q{\left({x}_{i}-L\right)}^{2}+C{\left({x}_{i}-L\right)}^{3}\right]$$

where ${B}_{i}$ is the blaze model for the ith e2ds order, A is the amplitude of the sinc function, P is the period of the sinc function, s is the slope of the sinc function, ${x}_{i}$ is the pixel position vector along the ith e2ds order, L is the linear center of the sinc function, Q is a quadratic scale term, and C is a cubic scale term. The fitted parameters are A, P, L, Q, C, and s, with the model evaluated as a function of ${x}_{i}$.

Once we have a set of parameters, the blaze function for this order is Bi evaluated at every pixel of that order. The original e2ds order is then divided by the blaze function and this is used as the flat profile. A standard deviation of the flat is also calculated for quality control purposes. This process is repeated for each order, producing a full blaze and flat profile (49 × 4088) for the input flat_flat files. An example blaze fit and the resulting flat can be seen in Figure 17, and a sketch of the per-order fit is given below. To avoid erroneous contributions to the flat, any outlier pixels (beyond 10σ or deviating by more than ±0.2 from unity) are set to nan. Note that the multiplication of the blaze and the flat is equivalent to the full response function of the detector. For some orders (#34 and #74), there is a large residual at one edge of the blaze falloff. This is due to the mismatch between the analytical function used and the actual profile; the flat-field correction (Figure 17, bottom panel) accounts for this mismatch.
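A minimal sketch of the per-order blaze fit is shown below, assuming a sinc²-times-polynomial form consistent with Equation (8); the initial guesses and names are illustrative, not the apero implementation.

```python
import numpy as np
from scipy.optimize import curve_fit

def sinc_blaze(x, amp, period, lin_center, quad, cubic, slope):
    """Sinc-squared blaze model with low-order polynomial modulation."""
    xp = x - lin_center
    core = np.sinc(xp / period) ** 2  # np.sinc(u) = sin(pi*u)/(pi*u)
    return amp * core * (1 + slope * xp + quad * xp ** 2 + cubic * xp ** 3)

def fit_blaze(flux):
    """Fit one median-filtered, outlier-clipped e2ds order."""
    x = np.arange(flux.size, dtype=float)
    p0 = [np.nanmax(flux), flux.size, flux.size / 2, 0.0, 0.0, 0.0]
    good = np.isfinite(flux)
    popt, _ = curve_fit(sinc_blaze, x[good], flux[good], p0=p0)
    blaze = sinc_blaze(x, *popt)
    return blaze, flux / blaze  # blaze model and flat profile
```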

Figure 17. Top: the blaze fit to several of the SPIRou flat_flat échelle orders (fiber AB). Bottom: the ratio of the blaze and input spectrum is used as the flat calibration file; a perfect detector would have a value of one.


For quality control, we check that the standard deviation of the flat for each order is less than 0.05. The flat (flat) and blaze (blaze) profiles are then saved to the calibration database (if the quality control criteria are met) for use throughout apero.

6.5. Generating Thermal Calibration Files

The nightly thermal recipe takes preprocessed dark_dark_int files or dark_dark_tel files (as many as given by the user or as many as occur on each of the nights being used via apero_processing). It combines the dark_dark_int or dark_dark_tel files into a single dark_dark_int or dark_dark_tel respectively (via a median combination of the images). These combined dark_dark_int or dark_dark_tel files are then extracted (see the extraction process in Section 7).

The thermal background seen by SPIRou in a science exposure is the sum of the blackbody contributions of the sky, the Cassegrain unit (at the temperature of the telescope), the calibration unit (for the reference channel), and the thermal emission of the hermetic feedthroughs that connect the fibers into the cryostat. A small contribution also arises from the Earth's atmosphere itself. This emissivity is proportional to one minus the telluric transmission at the corresponding wavelength and, if left unaccounted for in the thermal model, would lead to emission-like features in the thermal-corrected spectrum in the strongest absorption lines. From a series of sky-dark frames, we measured that the median additional emissivity in saturated absorption lines is at the 4% level of the blackbody envelope. We account for this additional contribution by using a median sky absorption spectrum and adding a small contribution proportional to the excess emissivity due to the Earth's atmosphere in strong absorption lines. Note this contribution is only added for the dark_dark_tel files (as the dark_dark_int images do not see the sky). For this reason, we split the generation of the thermal calibration files into two steps: we generate the dark_dark_int thermal calibration files first; then, after a wavelength solution has been generated, we generate the dark_dark_tel thermal calibration files (which require a nightly wavelength solution to add the contribution due to the emission-like features).

Considering that the telescope and front-end temperatures change through the night, one needs to apply a thermal correction that is adjusted per frame (this is done as part of the extraction recipe in Section 7). While the slope of the blackbody contribution changes very little over the 2.1–2.5 μm domain, within which the thermal background is significant, the amplitude of the contribution varies by a factor of >2 between nights (typically a factor of 2 for every 8°C) and needs to be adjusted for individual observations. While we have no external measurement of the thermal background, there are a number of completely saturated telluric water absorption features between 2.4 and 2.5 μm that provide a measure of the total thermal emission seen by SPIRou. These regions are used to scale the thermal background model such that, after correction, they have a median flux of zero.

The thermal calibration files (thermal_int and thermal_tel) are then saved to the calibration database for use throughout apero. The thermal_int calibrations are used for correcting internal lamp spectra (i.e., other calibrations) and thermal_tel calibrations are used to correct all science spectra (Section 7).

6.6. Generating the Nightly Wavelength Solution Files

Considering that the wavelength solution is central to anchoring pRV measurements and that the instrument drifts through time, one needs to obtain a wavelength solution as close as possible in time to the science exposures, ideally on a nightly basis. The nightly wavelength solution captures sub-μm level motions within the optical train and high-order changes in the focal plane that are not captured by the affine transform used to register frames as described in Sections 5.2 and 6.3. The nightly wavelength solution recipe takes preprocessed fp_fp files and hcone_hcone files (as many as given by the user or as many as occur on each of the nights being used via apero_processing). It combines the fp_fp and hcone_hcone files into a single fp_fp and a single hcone_hcone file (via a median combination of the images). These combined fp_fp and hcone_hcone files are then extracted (see the extraction process in Section 7).

The rest of the process is similar to the reference wavelength solution (Section 5.4). The wavelength solution is determined as follows:

  • 1.  
    Under the assumption that the reference wavelength solution is correct at the pixel level, identify HC lines (catalog wavelength) and FP peaks (FP order).
  • 2.  
    By combining the reference chromatic FP cavity length and position of FP peaks of known FP order, fit a per-order wavelength solution.
  • 3.  
Using that wavelength solution, measure the velocity offset in the position of HC lines (ΔvHC) and derive an achromatic increment to be applied to the FP cavity.
  • 4.  
    Scale the 0th order term of the Nth order cavity polynomial by $1-\tfrac{{\rm{\Delta }}{v}_{\mathrm{HC}}}{c}$, where c is the speed of light in the units of ΔvHC.
  • 5.  
    Iterate the last two steps until ΔvHC is consistent with zero.

The main difference with the reference wavelength solution for fiber AB is that, while we start the calculation with the cavity fit and wavelength solution from the reference wavelength solution calibration, we only allow for changes in the achromatic term (the update loop is sketched below). This is because the chromatic dependence of the cavity width is related to the coating of the FP etalon, and is therefore not expected to change rapidly. An achromatic shift, on the other hand, corresponds to a change in the cavity length of the FP, due in part to pressure or temperature variations, which may happen between nights. Meanwhile, for fibers A, B, and C we fit nothing and use the fiber AB wavelength solution and cavity coefficients. The FP mask for quality control is also not re-generated; therefore all cross-correlations between fibers AB and A, B, and C are done relative to the reference night wavelength solution (however, we only check quality control on AB-A, AB-B, and AB-C). As with the reference wavelength solution recipe, a wavelength solution for each fiber, and the FP and HC lines found during the process, are saved to the calibration database for use throughout apero.
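Steps 3–5 of the list above amount to the simple fixed-point loop sketched below; the convergence tolerance and names are illustrative, not the apero implementation.

```python
C_LIGHT = 299_792_458.0  # speed of light in m/s

def update_achromatic_cavity(cavity_coeffs, measure_dv_hc, tol=0.01):
    """Scale the 0th-order cavity term until the HC offset is ~zero.

    cavity_coeffs: cavity polynomial, constant term last (numpy order);
    measure_dv_hc: callable returning the HC velocity offset (m/s) for a
    wavelength solution built from the current cavity.
    """
    dv_hc = measure_dv_hc(cavity_coeffs)
    while abs(dv_hc) > tol:
        cavity_coeffs[-1] *= 1.0 - dv_hc / C_LIGHT  # step 4
        dv_hc = measure_dv_hc(cavity_coeffs)        # steps 2-3 redone
    return cavity_coeffs
```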

7. Extraction

The extraction recipe takes any preprocessed file (as many as given by the user but in general just one single file). The files are combined (if requested) and are calibrated using our standard image calibration technique (see Appendix B). Once calibrated, the correct (closest in time) order profile (order_profile), positions of the orders (loco), shape_local, shape reference (x and y maps), and wavelength solution are loaded for each fiber (AB, A, B, and C). The order profiles and input image are transformed to the reference FP grid using the affine transformation mentioned in Sections 5.2, 6.3 and Appendix C, and using the shape x and y maps the image is corrected for the slicer geometry, the tilt and the bending due to the echelle orders.

The extraction recipe then extracts the flux (using optimal extraction, Section 7.1), calculates the barycentric correction (Section 7.2), corrects contamination from the reference fiber (if an FP is present in the reference fiber, Section 7.3), corrects for the flat (Section 7.4), corrects for the thermal contribution (Section 7.5) and generates the 1D spectrum (Section 7.6). An overview of the extraction procedure can be seen in Figure 18.

Figure 18. Extraction sequence: The input is a pre-processed file, that is then calibrated and extracted. The outputs are the 2D (per order) and 1D spectra for fibers AB, A, B, and C as well as some debugging outputs.


7.1. Optimal Extraction

Once the image and the order profile (from localization) have been corrected for the slicer geometry and the curvature of the echelle orders, we extract the combined flux in the science channels (fibers A and B) to create a fiber AB, as well as extracting the flux in A and B (for polarization work) and in C (for the reference fiber calibrations) separately. As the orders are already straightened, we use just the localization coefficient value at the center of the image to extract vertically along each order. We then divide the image by the order profile to provide a weighting across the order (i.e., an optimal extraction, Horne 1986). This produces the image in Figure 19 (the e2dsll, top middle and bottom middle panels), where we show the image for fiber AB (A+B). The final step of the optimal extraction is to sum vertically across the columns, accounting for cosmic rays by using a sigma clip (∣flux∣ > 10σ away from the median value for that column); a schematic of this step is sketched below. This creates our e2ds (extracted 2D spectrum); for SPIRou, this leads to images with 49 orders and 4088 pixels along the orders.
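The sketch below is a simplified reading of this weighting-and-summing step for a single straightened order; the array names and clipping statistics are illustrative, not the apero implementation.

```python
import numpy as np

def extract_order(straight_im, profile, nsig=10):
    """Collapse one straightened order into a 1D spectrum.

    straight_im, profile: 2D slices (across-order x 4088) of the
    straightened image and of the order profile for one order.
    """
    weighted = straight_im / profile  # optimal-extraction weighting
    med = np.nanmedian(weighted, axis=0)
    sig = np.nanstd(weighted, axis=0)
    # Reject cosmic rays: |flux - median| > nsig * sigma for that column.
    clipped = np.where(np.abs(weighted - med) > nsig * sig, np.nan, weighted)
    return np.nansum(clipped * profile, axis=0)  # sum across the order
```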

Figure 19. Top left: a full 2D image of a star after apero_preprocess, with a science target in the A and B fibers and an FP in the C fiber. Top middle: the full image corrected for the slicer shape and tilt and straightened to remove the curvature due to the echelle orders—this is directly prior to summing in the extraction process (i.e., the optimal weights have been applied; this image is known as the e2dsll); here we see the AB extraction, i.e., before summing we see both the A and B fibers. Top right: the e2ds—the extracted flux summed along pixel columns across the A and B fibers (for an AB extraction), covering the 4 slices in A and the 4 slices in B. Bottom left, middle, and right: the corresponding zoom-ins at the same points in the extraction recipe as at the top (bottom left shows 4 orders for fibers A, B, and C uncorrected; bottom middle shows 3 orders for fibers A and B corrected; bottom right shows 5 orders of the combined A and B flux). The tilt and curvature due to the echelle orders can be clearly seen on the left and are clearly corrected in the middle. Hot pixels visible in the full 2D image were removed during the calibration process and are absent from the middle panels (see Appendix B). In all panels, nan values are shown in green.


7.2. BERV Correction

Ideally, any stellar spectrum observed would be measured from a point stationary with respect to the barycenter of the solar system (Wright & Eastman 2014). However, ground-based observations are subject to: the orbit of the Earth, the rotation of the Earth, precession and other Earth motions, and to a lesser extent gravitational time dilation, leap-second offsets, and factors affecting the star itself (i.e., parallax, proper motion, etc.). We use the term Barycentric Earth Radial Velocity (BERV) hereinafter to collect all these terms into a single measurement which can be used to correct a specific spectrum at a specific point in time. We calculate the BERV using the barycorrpy package (Kanodia & Wright 2018; Wright & Kanodia 2020), which uses the astrometric parameters fed in at the preprocessing level (Section 4.1); a sketch of such a call is given below. The calculation from barycorrpy includes the estimate of the BERV itself and the corrected, or barycentric, Julian Date (BJD) at the mid-exposure time. barycorrpy has a precision better than the cm s−1 level. We also estimate the maximum BERV value for this object across the year. If for any reason the BERV calculation with barycorrpy fails (see footnote 23), we calculate an estimate of the BERV (precise to ∼10 m s−1, modified from PyAstronomy.pyasl.baryvel, a python implementation of helcorr; Piskunov & Valenti 2002) and flag that an estimated BERV correction was calculated. This estimated BERV is not precise enough for pRV work but is sufficient to allow for acceptable telluric correction.
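The sketch below assumes barycorrpy's get_BC_vel interface; the astrometric values shown are placeholders, not those of any real target, and the exact keywords apero passes may differ.

```python
from barycorrpy import get_BC_vel

# Placeholder astrometry; apero feeds in the values resolved at the
# preprocessing level (Section 4.1).
berv, warnings, status = get_BC_vel(
    JDUTC=2459000.5,             # mid-exposure time (JD, UTC)
    ra=269.45, dec=4.69,         # ICRS coordinates in degrees
    pmra=-801.0, pmdec=10362.0,  # proper motions in mas/yr
    px=546.0,                    # parallax in mas
    rv=-110_000.0,               # systemic radial velocity in m/s
    obsname='CFHT',              # observatory site, resolved by astropy
    zmeas=0.0)
# berv is in m/s and is good to better than the cm/s level.
```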

7.3. Leak Correction

For scientific observations, the reference fiber either has a DARK or an FP illuminating the pixels in this fiber. For pRV, an FP allows a simultaneous RV measurement of an FP alongside the measurement of the stellar RV; this allows precise tracking of the instrumental drift when the simultaneous FP is compared to the fp_fp from the nightly wavelength solution calibration (see Section 9). However, as mentioned in Section 5.3, light from the FP has been shown to slightly contaminate the science fibers and thus we provide a correction for this contamination.

During the reference sequence (Section 5), many dark_fp are combined (and extracted) to form a model of the light seen in the science fibers when no light (other than the DARK contribution) was present, as well as an extracted reference fiber measurement of the FP flux that caused this contamination in the science fibers. Using these models, the contamination measured in the science channels by the reference leak recipe is scaled to the flux of the simultaneous FP of the observation (using the extracted flux from the scientific observation we are trying to correct). This model is then subtracted from the original science observation for each of the science fibers (AB or A or B), order-by-order, as in Equation (9)

Equation (9)

$$S{\left[{AB},A,B\right]}_{\mathrm{corr},\,i}=S{\left[{AB},A,B\right]}_{i}-L{\left[{AB},A,B\right]}_{i}\times \frac{S{\left[C\right]}_{i}}{L{\left[C\right]}_{i}}$$

where L[C] is the model of the FP from the leak reference recipe, S[C] is the 2D extracted spectrum in the reference fiber (fiber C), L[AB, A, B] is the model of the contamination from the FP from the leak reference recipe in the science fibers (either AB or A or B), S[AB, A, B] is the 2D extracted flux in the science fibers (either AB or A or B), $S{\left[{AB},A,B\right]}_{\mathrm{corr}}$ denotes the leak-corrected 2D extracted spectrum in the science fibers, and i denotes that this is done order-by-order.
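Read literally, Equation (9) is the per-order operation sketched below. One ambiguity not fixed by the equation is whether the S[C]/L[C] scaling is applied per pixel or as a per-order flux ratio; the sketch assumes the latter, and the names are illustrative.

```python
import numpy as np

def correct_leak(sci, leak_sci, ref_fp, leak_ref):
    """Subtract the scaled FP contamination model, order by order.

    sci: S[AB, A, B]; leak_sci: L[AB, A, B]; ref_fp: S[C];
    leak_ref: L[C] -- all (49, 4088) e2ds arrays.
    """
    corrected = np.array(sci, dtype=float)
    for i in range(sci.shape[0]):
        scale = np.nansum(ref_fp[i]) / np.nansum(leak_ref[i])
        corrected[i] = sci[i] - leak_sci[i] * scale
    return corrected
```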

7.4. Flat Correction

Whether or not the reference fiber was corrected for contamination, the next step is to correct for the flat. From the nightly calibrations (Section 6) we expect a flat file for each night of observations (and one for each fiber: AB, A, B, and C) to be present in the calibration database. To correct for the flat we simply divide the extracted spectrum (one for each fiber) by the corresponding flat spectrum, as in Equation (10)

Equation (10)

$$S{\left[{AB},A,B,C\right]}_{\mathrm{corr}}=\frac{S\left[{AB},A,B,C\right]}{\mathrm{FLAT}\left[{AB},A,B,C\right]}$$

where $S{\left[{AB},A,B,C\right]}_{\mathrm{corr}}$ denotes the flat-corrected 2D extracted spectrum, S[AB, A, B, C] denotes the 2D extracted spectrum prior to correction, and FLAT[AB, A, B, C] denotes the flat applied from the calibration database created using the method described in Section 6.4.

As both the extracted spectrum and the flat are 2D and equivalent in shape, there is no need to do this per order. Note that it is highly recommended to use a flat from the same night as the observation, and where possible this is the default option; however, nights near in time to the observation are used when no closer calibration is available. The flat-fielded extracted spectrum is denoted e2dsff (to distinguish it from the non-flat-fielded extracted spectrum e2ds). Note also that no 2D spectrum is ever saved with the blaze correction applied; however, we provide blaze spectra that can be used to correct any spectrum for the blaze. As with the flat, using the blaze for that night is highly recommended, and where possible this is the default blaze used by apero when a blaze correction is required.

7.5. Thermal Correction

The reference dark, applied during the standard image calibration phase (see Appendix B), removes the high-frequency components of the dark; however, the thermal contribution still remains (and varies on a night-by-night basis). For this reason, we use nightly extracted dark_dark files to model the thermal contribution present in an observation during the night. The thermal correction model comes in two flavors, one for science observations where we assume there is some sort of continuum to the spectrum and telluric contamination as well as a small contribution arising from the Earth's atmosphere itself, described in Section 7.5.1, and one for HC or FP extractions where these assumptions are not true, as detailed in Section 7.5.2.

7.5.1. Thermal Correction of a Science Observation

In the case where we have a scientific observation, a dark_dark_tel (where the calibration fiber sees the cold source and the science fibers see the mirror covers) is used. The extracted dark_dark_tel (from Section 6.5) is then median-filtered with a width of 101 pixels (on a per-order basis). This width was chosen to be big enough to capture large-scale structures in the dark and not be significantly affected by readout noise. A fit is then made to the reddest orders (>2450 nm) using only pixels where a transmission spectrum from the Transmissions of the AtmosPhere for AStronomical data tool (TAPAS; Bertaux et al. 2014) is lower than 0.01—i.e., a domain where the transmission is basically zero. We assume that we can safely use any flux with a transmission of order zero to scale the thermal background to this zero-transmission value (see Equation (11) and Figure 20)

Equation (11)

$$S{\left[{AB},A,B,C\right]}_{\mathrm{corr}}=S\left[{AB},A,B,C\right]-TT\left[{AB},A,B,C\right]\times \frac{\mathrm{med}\left(S\left[\mathrm{TAPAS}\lt 0.01\right]\right)}{\mathrm{med}\left(TT\left[\mathrm{TAPAS}\lt 0.01\right]\right)}$$

where TAPAS is the TAPAS transmission spectrum, the medians are taken over the reddest-order pixels where the TAPAS transmission is below 0.01, TT[AB, A, B, C] is a nightly extracted dark_dark_tel spectrum, S[AB, A, B, C] denotes the 2D extracted spectrum prior to correction, and $S{\left[{AB},A,B,C\right]}_{\mathrm{corr}}$ denotes the thermally corrected 2D extracted spectrum.
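A sketch of this scaling, following Equation (11), is shown below; the arrays are per-fiber e2ds spectra on a common wavelength grid, and the names are illustrative.

```python
import numpy as np

def thermal_correct_science(sci, thermal_tel, tapas, wave, red_cut=2450.0):
    """Scale the dark_dark_tel spectrum onto the science frame.

    The background sample is the set of pixels in the reddest orders
    (wave > red_cut, in nm) where the TAPAS transmission is below 0.01.
    """
    sample = (tapas < 0.01) & (wave > red_cut)
    scale = (np.nanmedian(sci[sample]) /
             np.nanmedian(thermal_tel[sample]))
    return sci - thermal_tel * scale
```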

Figure 20. Thermal correction of a science observation. The thermal correction is scaled to a science spectrum using regions where an input TAPAS transmission spectrum is below 0.01 (background sample region, blue dots on the plot). In black is the extracted last order of Gl 699 and in red is the scaled thermal background taken from the dark_dark_tel thermal calibration file. The gaps in the spectrum are due to the blaze cut-off at the edge of orders and the lack of order overlap in this domain. Some deep lines between 2.48 and 2.50 μm will be over-corrected (they lie below the scaled thermal contribution). This arises from the finite resolution of the spectrograph, which tends to make narrow absorption features shallower than wider features, such as doublets. This problem will be addressed in the next version of apero.


7.5.2. Thermal Correction of a Calibration

In the case where we have an HC or an FP observation, a dark_dark_int (where all three fibers see only the cold source, not the sky nor the mirror covers) is used. The extracted dark_dark_int is then median filtered (again with a width of 101 pixels on a per-order basis) and a fit is made using an envelope to measure the thermal background in the reddest orders (>2450 nm). The envelope is constructed by using the flux below the 10th percentile (i.e., not in the HC or FP peaks). This is then converted into a ratio and scaled to the observation we are correcting (see Equation (12))

Equation (12)

$$S{\left[{AB},A,B,C\right]}_{\mathrm{corr}}=S\left[{AB},A,B,C\right]-TI\left[{AB},A,B,C\right]\times \frac{{P}_{10}\left(S\right)}{{P}_{10}\left(TI\right)}$$

where ${P}_{10}$ is the 10th-percentile value (measured in the reddest orders, >2450 nm), TI[AB, A, B, C] is a nightly extracted dark_dark_int spectrum (median filtered with a width of 101 pixels), S[AB, A, B, C] denotes the 2D extracted spectrum prior to correction, and $S{\left[{AB},A,B,C\right]}_{\mathrm{corr}}$ denotes the thermally corrected 2D extracted spectrum.

7.6.  s1d Generation

The e2ds and e2dsff formats are not necessarily the most convenient for science analysis, having duplicated wavelength coverage at order overlap and slightly varying velocity sampling within and between orders. We therefore transform the e2dsff file into the s1d format. The s1d is sampled on a constant grid for all objects. We have two differing s1d formats, one with a uniform step in wavelength (0.05 nm pixel−1) and one with a constant step in velocity (1 km s−1 pixel−1), both being sampled between 965 and 2500 nm. Numerically, to construct the s1d, we use the e2dsff file prior to blaze correction and the blaze file (Section 6.4) as inputs. We create two s1d vectors, one corresponding to the total flux and one corresponding to the total blaze on the destination wavelength grid. We use a 5th-order spline to project the flux of a given order onto the flux grid and perform the same operation with the blaze onto the weight vector. We do not consider the blaze below 20% of the peak blaze value, and values on the destination wavelength grid that are out of the order's range are set to zero. We loop through orders and sum the contribution of each order onto the respective destination grids for the e2dsff science flux and blaze; a sketch of this accumulation is given below. The s1d generation is summarized in Figure 21. Note that the s1d generation only depends on the blaze calibration. As such, any spectrum (regardless of emission lines, low flux, or strong bands) can be converted to s1d format, and we generate s1d files for hcone_hcone and fp_fp as well as for science targets.
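The sketch below illustrates this accumulation; it assumes finite, strictly increasing wavelength samples per order (scipy's InterpolatedUnivariateSpline requires this), and the names are illustrative.

```python
import numpy as np
from scipy.interpolate import InterpolatedUnivariateSpline

def make_s1d(wave, flux, blaze, grid):
    """Sum splined per-order flux and blaze onto a common grid.

    wave, flux, blaze: (49, 4088) e2dsff arrays; grid: destination
    wavelength vector (e.g., 965-2500 nm in 0.05 nm steps).
    """
    total_flux = np.zeros_like(grid)
    total_blaze = np.zeros_like(grid)
    for i in range(wave.shape[0]):
        keep = np.isfinite(flux[i]) & np.isfinite(blaze[i])
        keep &= blaze[i] > 0.2 * np.nanmax(blaze[i])  # drop blaze < 20%
        spl_f = InterpolatedUnivariateSpline(wave[i][keep], flux[i][keep],
                                             k=5, ext=1)  # zero outside order
        spl_b = InterpolatedUnivariateSpline(wave[i][keep], blaze[i][keep],
                                             k=5, ext=1)
        total_flux += spl_f(grid)
        total_blaze += spl_b(grid)
    return total_flux / total_blaze  # seamless stitching of orders
```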

Figure 21. Construction of the s1d data product. In constructing the s1d from the e2dsff frame, one consecutively splines individual orders from the science and blaze files onto the destination grid. The top panel shows a sample region within the H band with 3 consecutive orders and the sum of all fluxes. The middle panel shows the same operation performed for the blaze. The bottom panel shows the ratio of the summed science flux and blaze, providing seamless stitching of orders.


8. Telluric Correction

The detailed performances of SPIRou's telluric correction will be presented in a forthcoming publication (É. Artigau 2022, in preparation); we summarize the main steps here. Most telluric-correction schemes are either model-based (e.g., Molecfit, Smette et al. 2015 or Allart et al. 2022) or purely empirical (Artigau et al. 2014; Bedell et al. 2019). Here we adopt a hybrid method with both a simplistic model-based correction and a further empirical correction of residuals.

The first part of the telluric correction uses the average TAPAS absorption spectrum (Bertaux et al. 2014) for Maunakea for 6 chemical species (H2O, O2, O3, CO2, CH4, and N2O); we combine these into two absorption spectra, one for water and one for all other components, dubbed the "dry" spectrum, at full resolution. We correct the absorption of the water and dry components by adjusting the optical depth of each telluric spectrum to null a cross-correlation function (CCF) of unsaturated absorption lines of the corresponding absorber. To account for the overlapping stellar and telluric features, prior to the CCF, the science spectrum is divided by a stellar template of the object of interest. The "dry" optical depth has been found to be remarkably consistent with the geometric airmass, at the 0.03 airmass rms level. The "water" optical depth cannot be compared to a measurement of precipitable water vapor comparable in accuracy to the geometric airmass, but is expected to be similar in accuracy as it is measured through the same method and includes a larger number of lines in its CCF.

In apero the telluric correction is an automated iterative process and an overview schematic of the process can be seen in Figures 22 and 23.

Figure 22. Telluric hot star sequence.

Figure 23. Telluric science sequence.


We need only a few specific hot stars (currently around 30) chosen with the following criteria:

  • 1.  
    fast rotators with a published $v\sin i\gt 200$ km s−1 (to easily identify telluric lines).
  • 2.  
    B or A spectral type, avoiding Be stars that have Balmer-series emission lines.
  • 3.  
    bright to minimize overheads, with an H-band brightness of 3–6 mag.
  • 4.  
    observable from Maunakea.
  • 5.  
for stars close in sky position (<10°), the brighter one is preferred.

The database of hot stars is constructed as follows:

  • 1.  
    model residual transmission of hot stars (apero_mk_tellu, Section 8.1).
  • 2.  
    model the absorption contributions from water and from "dry" components (essentially all contributions that are not water, apero_mk_model, Section 8.2).
  • 3.  
    correct the telluric absorption in these hot stars (apero_fit_tellu, Section 8.3) using the models of residual transmission.
  • 4.  
    construct a template of these hot stars (using all available observations that pass quality control, apero_mk_template, Section 8.4).
  • 5.  
    re-model the residual transmission (again using apero_mk_tellu, Section 8.1, but this time with the addition of a template).
  • 6.  
    re-make the model of the absorption contributions from water and from dry components (using apero_mk_model, Section 8.2).

Once this database of hot stars has been constructed scientific observations can be corrected. This is also an automated iterative process, and is done in the following order:

  • 1.  
    correct the telluric absorption for each science observation (apero_fit_tellu, Section 8.3) using the models of residual transmission from the hot stars.
  • 2.  
    construct a template for each science object (using all available observations that pass quality control, apero_mk_template, Section 8.4).
  • 3.  
    re-correct the telluric absorption for each science observation (again using apero_fit_tellu, Section 8.3, but this time with the addition of a template for that object).

This process is beneficial as it does not require hot star observations every night but relies on a well-organized observing strategy such that hot star observations cover a wide range of airmass and water vapor conditions. The SPIRou observation coverage in airmass and water vapor can be seen in Figure 24. The requirement that these hot stars are bright also means that no more than a few minutes are required to get a sufficient SNR, without taking away too much valuable telescope time for science observations.

Figure 24. Coverage of the hot stars (used for telluric correction) in the water and dry components of absorption (fractional throughput). These observations are used to construct a library of telluric residuals after telluric pre-cleaning.


One complication of this method is that more data are always available to add to these models and templates. In ideal circumstances, one would regenerate all hot star and science target templates with all the data for those objects, and subsequently redo the later steps of the hot star and science observation corrections (steps 4 to 6 and 2 to 3, respectively). However, once a sufficient number of observations of a specific object (either a hot star or a science target) is reached, we find that it is not necessary to re-make templates for normal processing. Whenever possible, we do re-reduce all data, including these telluric steps, to provide the most optimal template possible. The exact number of telluric stars needed for "sufficient" correction has been the subject of much debate without a clear-cut answer. One way to settle it is to require that the telluric correction be a minor contribution to the SNR of any given observation. Considering that observations of the brightest science targets have an SNR of 300, if we want telluric stars to have a <5% contribution to this value, then the effective photon noise of the telluric stars should correspond to an SNR of >1000. Considering that typical hot star observations have an SNR of ∼200, and that the effective SNR of N stacked observations scales as √N × 200, this implies at least N = (1000/200)² = 25 observations. The full calculation is somewhat more complex, as these uncertainties propagate in a linear model and the effective noise will depend on where any given observation falls within the τ[dry] versus τ[water] diagram. As an order of magnitude, we recommend having >100 hot star observations spanning the range of expected observing conditions. There are also criteria on BERV coverage as to whether a template is worth constructing in the first place (see Section 8.4).

8.1. Residual Transmission of Hot Stars (apero_mk_tellu)

The residual transmission recipe (apero_mk_tellu) takes a single hot star observation (an extracted, flat-fielded 2D spectrum). The first step is a pre-cleaning correction which essentially removes the bulk of the telluric absorption, producing a corrected 2D spectrum as well as an absorption spectrum, sky model, and an estimate of the water and dry components of the absorption (É. Artigau 2022, in preparation). The pre-cleaning uses a stellar template, if available, to better measure the water and dry components. The corrected 2D spectrum is then normalized by the 95th percentile of the blaze per order and the residual transmission map is created by using a low-pass filter (per order) on the hot star (and dividing by a template if present).

We make sure the pre-cleaning was successful (i.e., the water component exponent is between 0.1 and 15 and the dry component exponent is between 0.8 and 3.0) and check that the SNR for each order is above 100; subsequently, the hot star residual transmission maps are added to the telluric database.

8.2. Water and Dry Component Models (apero_mk_model)

During the pre-cleaning process (É. Artigau 2022, in preparation) for the hot stars (done as part of apero_mk_tellu, Section 8.1), we calculate the water and dry exponents of absorption. Once we have observed a sufficiently large library of telluric hot stars (see Figure 24), typically a few tens under varying airmass and water column conditions, we take all of the residual transmission maps that passed quality control and calculate a linear minimization of the parameters. The linear minimization is done per pixel per order, across all transmission maps (removing outliers with a sigma-clipping approach), against a three-vector sample: the bias level of the residual, the water absorption exponent, and the dry absorption exponent. The output is three maps, each the same size as the input 2D spectrum (49 × 4088), one for each element of the sample; a sketch of this step is given below. These maps are used in every apero_fit_tellu recipe run (Section 8.3) to correct the telluric residuals after telluric cleaning. The three maps are saved and added to the telluric database.
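A sketch of this per-pixel linear minimization is shown below, omitting the sigma-clipping of outliers; the names are illustrative, not the apero implementation.

```python
import numpy as np

def fit_telluric_model(trans_maps, expo_water, expo_dry):
    """Regress residual-transmission maps onto a three-vector sample.

    trans_maps: (n_obs, 49, 4088) maps from apero_mk_tellu;
    expo_water, expo_dry: (n_obs,) absorption exponents per observation.
    Returns three (49, 4088) maps: bias, water, and dry components.
    """
    n_obs = trans_maps.shape[0]
    sample = np.column_stack([np.ones(n_obs), expo_water, expo_dry])
    flat = trans_maps.reshape(n_obs, -1)
    coeffs, *_ = np.linalg.lstsq(sample, flat, rcond=None)
    return coeffs.reshape(3, *trans_maps.shape[1:])
```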

8.3. Correcting Telluric Absorption (apero_fit_tellu)

All hot stars and science targets are corrected for telluric absorption. The first step, as with apero_mk_tellu, is the pre-cleaning correction. Then, we correct the residuals of the pre-cleaning at any given wavelength by fitting a linear combination of the water and dry components. If any given absorption line in the TAPAS absorption spectrum has a strength that is over- or underestimated relative to reality, the residuals after correction will scale, to first order, with the absorption of the chemical species. The same is true of line profiles: if the wings of a line are over- or underestimated, the residuals will scale with absorption. An example corrected telluric spectrum can be seen in Figure 25. We correct the telluric absorption on the combined AB extracted spectrum and subsequently use the same reconstructed absorption (for fiber AB) to correct the extracted spectra for fibers A and B individually.

Figure 25. Example telluric correction. The input data (e2dsff) is in black, the calculated reconstructed absorption is in blue, and the final telluric corrected spectrum is in red. Each wavelength domain shown here has a subplot showing the modeled sky emission from OH lines (also corrected for) with arbitrary units.


8.4. Template Generation (apero_mk_template)

Templates for each astrophysical object are created simply by shifting all observations (in BERV) from their nightly wavelength solution to the reference wavelength solution. This effectively creates a cube of observations (see footnote 24) for a specific astrophysical object, which is then normalized (per observation) by the median of each order. We pass a low-pass filter over this cube, and the cube is then reduced to a single 2D (extracted and telluric-corrected) spectrum by taking a median in the time dimension (across observations). The same process is done for the 1D spectrum. The 2D templates are copied to the telluric database for use in the rest of the telluric cleaning process (the second iterations of apero_mk_tellu and apero_fit_tellu), except if the BERV change throughout all epochs is below 8 km s−1. The 1D spectrum is saved as a useful output of apero.

9. RV Analysis

One major requirement for SPIRou is precision radial velocity measurements. apero provides a per-observation estimate of the radial velocity using a standard CCF method, cross-correlating an observation with an appropriate set of lines (a mask). This radial velocity estimate is precise enough for quality checks (i.e., at the several tens of m s−1 level) but not at the precision required for pRV. For pRV in the near-infrared, access to multiple observations of a single object is required. As of the publication of this paper, the newly developed line-by-line (LBL) recipes (Artigau et al. 2022) are self-contained and publicly available (see footnote 25). However, currently, the LBL recipes cannot be called inside apero; thus LBL RVs are not automatically produced as part of the fully automated apero framework. Adding the call to the LBL recipes in the apero automated function is one of the main goals for the next version of apero (version 0.8).

Below we describe the current CCF radial velocity estimate provided by apero and briefly describe two alternate methods (developed alongside apero), the LBL method (Artigau et al. 2022) and an external CCF method (Martioli et al. 2022), both giving precision far closer to that required for science. The internal CCF method and LBL are shown in Figure 26.

Figure 26. RV sequence.


9.1. The apero CCF Recipe

The CCF method is very often used for pRV work, particularly in the optical domain. In the early apero effort, it was the main tool to derive precise RV values. When implementing a near-infrared version of the CCF, a number of challenges appeared. The near-infrared domain is plagued with telluric absorption, and even after telluric correction, some wavelength domains are expected to have significant excess noise levels. Deep or saturated telluric lines cannot be corrected and are better left as gaps (represented as nan) in the spectrum that are fixed for the entire time series considered. When computing a CCF, how does one account for gaps in the data? The star's yearly line-of-sight velocity variation will cause these gaps to shift against the stellar spectrum by up to ±32 km s−1, depending on ecliptic latitude. In the optical, one can simply reject the entire domain affected by a gap (64 km s−1 plus the gap width); however, at optical wavelengths, deep absorption lines are sufficiently sparse that the overall loss in wavelength domain due to telluric absorption is small, which is not the case in the near-infrared.

To further complicate the issue, telluric absorption varies between nights, so if one went down this path of masking, it would end with the masking of a large window affected by any line that gets deeper than a given threshold, even if only once in a time series that may include hundreds of visits. The combination of varying conditions and yearly BERV excursions leads to a loss of domain that is simply unacceptable, especially considering that the parts of the near-infrared richest in sharp spectroscopic features (see Figure 4 in Artigau et al. 2022) are at the blue and red edges of the H band, which are affected by telluric water absorption.

We opted for a CCF that correlates weighted delta functions against the spectrum but sets the weight to zero wherever the telluric transmission falls below 0.5 (where unity is no telluric absorption); a sketch is given below. This is done on a spectrum-to-spectrum basis, to minimize the effective throughput losses. This CCF measurement is performed per spectrum using one of the 3 standard masks available in apero, depending on the star's temperature (Gl 846, Gl 699, and Gl 905, respectively, for Teff > 3500 K, 3000–3500 K, and <3000 K). We derive per-order as well as global CCFs. These data products are useful to confirm the systemic velocity of the star, avoiding eventual target misidentification, as well as for flagging spectroscopic binaries. For time-series analysis, this can be significantly improved upon by using all observations to perform a spectral cleaning (see Section 9.3) to obtain a much cleaner CCF, or through completely different methods, such as the line-by-line algorithm (Section 9.2).
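The sketch below illustrates this weighted-delta-function CCF; the mask handling is simplified (one delta function per line, linear interpolation of the spectrum) and the names are illustrative.

```python
import numpy as np

C_KMS = 299_792.458  # speed of light in km/s

def weighted_ccf(wave, flux, transmission, mask_wave, mask_weight,
                 velocities):
    """CCF of weighted delta functions against one extracted spectrum.

    Lines falling where the telluric transmission is below 0.5 get zero
    weight, on a spectrum-to-spectrum basis.
    """
    line_trans = np.interp(mask_wave, wave, transmission)
    weights = np.where(line_trans < 0.5, 0.0, mask_weight)
    ccf = np.zeros_like(velocities, dtype=float)
    for k, v in enumerate(velocities):
        shifted = mask_wave * (1.0 + v / C_KMS)  # Doppler-shift the mask
        ccf[k] = np.nansum(weights * np.interp(shifted, wave, flux))
    return ccf
```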

9.2. LBL Analysis

The most accurate RV measurements with SPIRou have been obtained within the line-by-line (LBL; see footnote 24; Artigau et al. 2022) framework. The LBL recipes exist externally to apero and the LBL outputs are currently not produced as part of apero. However, the apero outputs are directly used with the LBL scripts (apero data were the first data to work with LBL, and SPIRou apero data were the main driver for the development of the LBL algorithms). In short, the LBL algorithm builds on the work of Bouchy et al. (2001), which defines an RV measurement as the projection of a spectrum-to-template residual onto the derivative of the template (see the sketch below). Instead of performing a spectrum-wide RV measurement, the LBL algorithm subdivides the spectrum into thousands of "lines" and derives a per-line velocity. It then uses a mixture model to derive an average velocity, allowing for the presence of velocity outliers. This allows for the filtering of outlying values in the spectra that arise, among other things, from imperfect telluric correction, OH line residuals, and detector defects. The framework is strongly inspired by the work of Dumusque (2018), which used a per-line analysis to mitigate the impact of activity in pRV time series of Sun-like stars. The LBL algorithm also provides a measurement of higher-order derivatives that can be used as tracers of activity. The second derivative of the spectrum notably provides an accurate measurement of changes in the effective FWHM of lines, even in the presence of dense molecular bands.
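For reference, the core of the Bouchy et al. (2001) projection, applied by LBL per line rather than over the whole spectrum, can be written as below; this is a sketch of the estimator only, without the mixture model that handles outlying lines.

```latex
% Per-line velocity in the Bouchy et al. (2001) formalism: S_i is the
% observed spectrum, T_i the template, dT_i/dv its derivative with respect
% to velocity, and sigma_i the flux uncertainty; the sums run over the
% pixels i belonging to one line l.
\begin{equation}
  \delta v_{\ell} =
    \frac{\sum_{i \in \ell} \left( S_i - T_i \right)
          \frac{\mathrm{d}T_i}{\mathrm{d}v}\,\sigma_i^{-2}}
         {\sum_{i \in \ell} \left( \frac{\mathrm{d}T_i}{\mathrm{d}v} \right)^{2}
          \sigma_i^{-2}}
\end{equation}
```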

As detailed in Section 5 of Artigau et al. (2022), the analysis of a long time series of Barnard's star (from SPIRou data reduced with apero) produces an RV dispersion of 2.59 m s−1, consistent with the 2.57 m s−1 uncertainty propagated from noise estimates. This shows that any systematic error from the apero framework, unaccounted for by zero-point corrections (T. Vandal et al. 2022, in preparation), is at the sub-m s−1 level.

apero data analyzed with the LBL algorithm has been presented in Gan et al. (2022b), Martioli et al. (2022), and Cadieux et al. (2022), setting constraints at, or below, the m s−1-level on phase-folded RV curves of sub-Neptune TESS discoveries. A major goal of the next version of apero (version 0.8) is to call LBL recipes inside apero to allow the LBL products to be automatically produced and integrated into the post-processed results (see Section 11.4).

9.3. Comparison with an External CCF Routine

In addition to the apero CCF recipe and the LBL analysis, we also have an external CCF recipe (spirou-ccf). The external CCF package spirou-ccf (see footnote 26) has been developed to run an independent CCF analysis on the SPIRou data, where one can obtain radial velocity measurements with meter-per-second precision, comparable to the results obtained with the LBL method. As mentioned in Section 9.1, a direct application of the method introduced by Pepe et al. (2002) to calculate CCFs for the SPIRou spectra will not provide very precise velocity measurements, because the line depths do not reflect the true statistical weight of each line, being polluted by residual noise left by telluric absorption. In addition, the telluric features move throughout the time series and the signal-to-noise ratio varies with time.

The methods implemented in this package are described in Martioli et al. (2022). In summary, all spectra of the time series are first loaded and a low-order flux correction is applied so that all spectra match a median template. The time dispersion of each spectral bin is calculated to estimate an empirical statistical weight for each region of the spectrum, which is then used to update the weights in the CCF calculation. An iterative sigma-clip filter is also applied, and spectral regions with large gaps are removed. The gaps are caused by masking done in earlier stages of the pipeline, for example, by detector defects or by regions of deep telluric absorption where the telluric correction has failed. A global continuum is also calculated and used to normalize all spectra uniformly. Finally, a CCF is calculated for each spectrum and a CCF template-matching algorithm is applied to the entire time series to obtain the final relative velocity shifts for all spectra.
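The time-dispersion weighting step can be sketched as follows; this is a minimal illustration of the idea under our own naming, not the spirou-ccf implementation.

```python
import numpy as np

def empirical_ccf_weights(flux_series, min_valid=5):
    """Per-bin statistical weights from the temporal dispersion of a
    (n_spectra, n_bins) array of continuum-matched spectra: bins that
    vary strongly in time (telluric residuals, defects) are down-weighted."""
    med = np.nanmedian(flux_series, axis=0)
    mad = np.nanmedian(np.abs(flux_series - med), axis=0)
    scatter = 1.4826 * mad  # MAD -> Gaussian-equivalent sigma
    nvalid = np.sum(np.isfinite(flux_series), axis=0)
    with np.errstate(divide="ignore"):
        weights = 1.0 / scatter**2
    # Mask bins with too few valid measurements or zero/undefined scatter
    weights[~np.isfinite(weights) | (nvalid < min_valid)] = 0.0
    return weights
```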

Artigau et al. (2022) also compare this external CCF approach with the LBL method; we encourage the reader to consult that article for a more in-depth comparison.

10. Polarimetry

The polarimetry module for apero was adapted from the spirou-polarimetry module. 27 Figure 27 shows a schematic flowchart of the polarimetry processing in apero. Table 2 shows the APERO input parameters required by the polarimetry module.

Figure 27. Polarimetric sequence.

Table 2. APERO Pipeline Polarimetric Parameters

Description of Parameter | Default Value | Accepted Values
Use telluric corrected flux | True | True, False
Use BERV corrected wavelength | True | True, False
Use source RV corrected wavelength | False | True, False
Interpolate flux values | True | True, False
Polarimetry method | "Ratio" | "Ratio", "Difference"
Apply polarimetric sigma-clip | True | True, False
Clipping threshold in units of sigma | 4 | positive real
Polarimetric continuum detection method | "IRAF" | "MOVING_MEDIAN", "IRAF"
Stokes I continuum detection method | "IRAF" | "MOVING_MEDIAN", "IRAF"
Continuum bin size [number of points] | 900 | positive integer
Continuum fit function | "spline3" | "polynomial", "spline3"
Order of continuum polynomial | 5 | positive integer
Stokes I continuum fit number of knots | 50 | positive integer
Remove continuum polarization | True | True, False
Normalize Stokes I by the continuum | False | True, False
Initial velocity of LSD profile [km s−1] | −150 | negative real
Final velocity of LSD profile [km s−1] | 150 | positive real
Number of points in LSD profile | 151 | positive integer
LSD mask lines are in air-wavelength | False | True, False
Minimum line depth for LSD analysis | 0.03 | (0.0, 1.0)
Min. Landé for LSD analysis | 0 | (0.0, ∞)
Max. Landé for LSD analysis | 10 | (0.0, ∞)


SPIRou as a polarimeter can measure either circular (Stokes V) or linear (Stokes Q or U) polarization in the line profiles. Each polarimetric measurement is performed by 4 exposures obtained with the Fresnel rhombs set at different orientations (see Section 3.1 of Donati et al. 2020). In Table 3 we provide the index position of each Fresnel rhomb, as they appear in the FITS header, for each exposure in the corresponding polarimetric mode. These indices are used by apero to recognize exposures within a polarimetric sequence, and then correctly apply the method introduced by Donati et al. (1997) to calculate polarimetric spectra.

Table 3. Index Positions of the Fresnel Rhombs (RHB1 and RHB2) for Exposures taken in each Observing Mode of SPIRou

Observing Mode | Exp 1 RHB1 | Exp 1 RHB2 | Exp 2 RHB1 | Exp 2 RHB2 | Exp 3 RHB1 | Exp 3 RHB2 | Exp 4 RHB1 | Exp 4 RHB2
Stokes IU | P16 | P2 | P16 | P14 | P4 | P2 | P4 | P14
Stokes IQ | P2 | P14 | P2 | P2 | P14 | P14 | P14 | P2
Stokes IV | P14 | P16 | P2 | P16 | P2 | P4 | P14 | P4


10.1. Polarimetry Calculations

The polarization spectra of SPIRou are calculated using the technique introduced by Donati et al. (1997), which is summarized as follows. Let ${f}_{A,i}$ and ${f}_{B,i}$ be the extracted fluxes in a given spectral element of the fiber A and fiber B channels, where i = {1, 2, 3, 4} gives the exposure number in the polarimetric sequence. Note that the extracted flux can be either the extracted spectrum or the extracted telluric corrected spectrum; by default in apero, we use the telluric corrected spectrum. The total flux of unpolarized light (Stokes I) is calculated by the sum of fluxes in the two channels and in all exposures, i.e.,

$I={\sum }_{i=1}^{4}\left({f}_{A,i}+{f}_{B,i}\right)\qquad(13)$

Let us define the ratio of polarized fluxes as

${r}_{i}=\frac{{f}_{A,i}}{{f}_{B,i}}\qquad(14)$

which gives a relative measurement of the flux between the two orthogonal polarization states. In an ideal system, r = 1 means completely unpolarized light, and other values provide the amount (or the degree) of polarization that can be calculated as in Equation (1) of Donati et al. (1997), i.e.,

$\frac{P}{I}=\frac{r-1}{r+1}\qquad(15)$

Therefore, in principle, one could obtain the amount of polarization with a single exposure. However, this measurement is not optimal, since it records the two states of polarization at the same time but not at the same pixel. To obtain a measurement that records the same state of polarization at the same pixel, it suffices to take a second exposure with one of the quarter-wave analyzers rotated by 90° with respect to the first exposure, constituting the 2-exposure mode. One can also use the 4-exposure (2 pairs) mode, where the polarization state in the two channels is swapped between pairs; this better corrects for slight deviations of the retarders from their nominal characteristics (retardance and orientation) and also corrects for differences in transmission between the two channels caused, for example, by different throughput of the two fibers or by a small optical misalignment. For this reason, SPIRou only operates in the 4-exposure mode, which is accomplished by rotating the analyzers accordingly in each exposure, as detailed in Table 3. The equation to calculate the degree of polarization for the 4-exposure mode can be obtained in two different ways: by the "Difference" method or by the "Ratio" method, as defined in Sections 3.3 and 3.4 of Bagnulo et al. (2009) and also in Equation (3) of Donati et al. (1997). The degree of polarization for a given Stokes parameter X = {U, Q, V} in the Difference method is calculated by

$\frac{X}{I}=\frac{1}{4}\left[\left(\frac{{f}_{A,1}-{f}_{B,1}}{{f}_{A,1}+{f}_{B,1}}\right)-\left(\frac{{f}_{A,2}-{f}_{B,2}}{{f}_{A,2}+{f}_{B,2}}\right)-\left(\frac{{f}_{A,3}-{f}_{B,3}}{{f}_{A,3}+{f}_{B,3}}\right)+\left(\frac{{f}_{A,4}-{f}_{B,4}}{{f}_{A,4}+{f}_{B,4}}\right)\right]\qquad(16)$

and for the Ratio method the degree of polarization is given by

$\frac{X}{I}=\frac{R-1}{R+1},\quad R={\left(\frac{{r}_{1}\,{r}_{4}}{{r}_{2}\,{r}_{3}}\right)}^{1/4}\qquad(17)$

Another advantage of using two pairs of exposures is that one can calculate the null polarization (NULL1 and NULL2) as in Equations (20) and (26) of Bagnulo et al. (2009), which provides a way to quantify the amount of spurious polarization. The null polarization for the Difference method is given by

$\frac{{N}_{1}}{I}=\frac{1}{4}\left[\left(\frac{{f}_{A,1}-{f}_{B,1}}{{f}_{A,1}+{f}_{B,1}}\right)-\left(\frac{{f}_{A,2}-{f}_{B,2}}{{f}_{A,2}+{f}_{B,2}}\right)+\left(\frac{{f}_{A,3}-{f}_{B,3}}{{f}_{A,3}+{f}_{B,3}}\right)-\left(\frac{{f}_{A,4}-{f}_{B,4}}{{f}_{A,4}+{f}_{B,4}}\right)\right]\qquad(18)$

and for the Ratio method the null polarization is given by

$\frac{{N}_{1}}{I}=\frac{{R}_{N}-1}{{R}_{N}+1},\quad {R}_{N}={\left(\frac{{r}_{1}\,{r}_{2}}{{r}_{3}\,{r}_{4}}\right)}^{1/4}\qquad(19)$

Finally, the uncertainties of polarimetric measurements can be calculated from the extracted fluxes and their uncertainties (denoted here by σ) by Equations (A3) and (A10) of Bagnulo et al. (2009). In the Difference method, the variance for each spectral element is given by

${\sigma }_{X}^{2}=\frac{1}{4}{\sum }_{i=1}^{4}\frac{{f}_{B,i}^{2}\,{\sigma }_{A,i}^{2}+{f}_{A,i}^{2}\,{\sigma }_{B,i}^{2}}{{\left({f}_{A,i}+{f}_{B,i}\right)}^{4}}\qquad(20)$

and in the Ratio method the variance is given in terms of the flux ratio ri as defined in Equation (14), i.e.,

${\sigma }_{X}^{2}={\left[\frac{R}{2{\left(R+1\right)}^{2}}\right]}^{2}{\sum }_{i=1}^{4}\left(\frac{{\sigma }_{A,i}^{2}}{{f}_{A,i}^{2}}+\frac{{\sigma }_{B,i}^{2}}{{f}_{B,i}^{2}}\right)\qquad(21)$
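To make the two formulations concrete, the following sketch computes the polarization and null spectra for a 4-exposure sequence, following the equation forms reconstructed above (the pairing of exposures, and all names, are our own illustrative assumptions rather than the apero implementation):

```python
import numpy as np

def polarization_4exp(fA, fB, method="Ratio"):
    """Degree of polarization X/I and null spectrum from a 4-exposure
    sequence; fA and fB are lists of four arrays for the two beams."""
    if method == "Difference":
        G = [(a - b) / (a + b) for a, b in zip(fA, fB)]
        pol = (G[0] - G[1] - G[2] + G[3]) / 4.0
        null = (G[0] - G[1] + G[2] - G[3]) / 4.0
    elif method == "Ratio":
        r = [a / b for a, b in zip(fA, fB)]
        R = (r[0] * r[3] / (r[1] * r[2])) ** 0.25
        RN = (r[0] * r[1] / (r[2] * r[3])) ** 0.25
        pol = (R - 1.0) / (R + 1.0)
        null = (RN - 1.0) / (RN + 1.0)
    else:
        raise ValueError(f"unknown method: {method}")
    return pol, null
```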

Applying this formalism to SPIRou spectra, we obtain values that vary continuously throughout the spectrum and are systematically above or below zero for each spectrum, which we refer to here as the "continuum polarization." For general scientific applications with SPIRou, this continuum polarization is actually spurious as it reflects small differences in the injection between beams, and must therefore be fitted and removed. This step is mandatory before performing measurements in spectral lines. apero applies an iterative sigma-clip algorithm to fit either a polynomial or a spline to model the continuum polarization.
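A minimal sketch of such an iterative sigma-clipped continuum fit is given below, here with a polynomial model; apero can equally use a spline, and the clipping threshold and fit parameters are configurable (Table 2). The function is illustrative only:

```python
import numpy as np

def fit_continuum_polarization(wave, pol, degree=5, nsig=4.0, niter=5):
    """Fit the spurious continuum polarization with an iteratively
    sigma-clipped polynomial and return the continuum-subtracted
    polarization together with the continuum model."""
    keep = np.isfinite(pol)
    for _ in range(niter):
        coeffs = np.polyfit(wave[keep], pol[keep], degree)
        model = np.polyval(coeffs, wave)
        resid = pol - model
        sigma = np.nanstd(resid[keep])
        new_keep = keep & (np.abs(resid) < nsig * sigma)
        if new_keep.sum() == keep.sum():
            break  # converged: no new points clipped
        keep = new_keep
    return pol - model, model
```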

10.2. Least-squares Deconvolution

The least-squares deconvolution (LSD) method is an efficient technique that combines the signal from thousands of spectral lines sharing the same line profile information to obtain a mean velocity profile for the intensity, polarization, and null spectra. A common application of this technique is the measurement of Zeeman splitting in Stokes V (circularly polarized) profiles. Zeeman splitting is a physical process whereby electronic transitions occurring in the presence of a magnetic field have their energy levels split, forming multiple line components in the intensity spectrum. An interesting feature of these components is that they are circularly polarized and their polarizations have opposite signs. Therefore, by observing the circularly polarized spectrum one can obtain a characteristic Stokes V profile that provides a way to detect and characterize the magnetism of stellar photospheres with great sensitivity.

apero implements the LSD calculations using the formalism introduced by Donati et al. (1997), summarized as follows. Let us first consider the weight of a given spectral line i, ${w}_{i}={g}_{i}\,{\lambda }_{i}\,{d}_{i}$, where g is the Landé factor (magnetic sensitivity), λ is the central wavelength, and d is the line depth. Then one can construct the line pattern function,

$M(v)={\sum }_{i=1}^{{N}_{l}}{w}_{i}\,\delta \left(v-{v}_{i}\right)\qquad(22)$

where ${N}_{l}$ is the number of spectral lines considered in the analysis, δ is the Dirac delta function, ${v}_{i}$ is the position of line i in velocity space, and v is the velocity. The transformation from wavelength (λ) to velocity space is performed via the relation ${dv}/d\lambda =c/\lambda $, where c is the speed of light. The LSD profile is calculated by the following matrix equation:

$Z={\left({M}^{T}\,{S}^{2}\,M\right)}^{-1}{M}^{T}\,{S}^{2}\,P\qquad(23)$

where P is the polarimetric spectrum calculated from Equation (16) or (17), M is the line pattern matrix obtained by evaluating Equation (22) on the velocity grid, and S is a diagonal weight matrix where each element on the diagonal is given by ${S}_{jj}=1/{\sigma }_{j}$, with ${\sigma }_{j}$ being the uncertainty in the polarimetric spectrum calculated from Equation (20) or (21).

Note that one can also calculate the null polarization LSD profile by substituting the polarimetric spectrum P with the null spectrum N in Equation (23). The intensity LSD is also possible, using the flux spectrum F, but in this case the line weight in Equation (22) is simply given by the line depth, i.e., ${w}_{i}={d}_{i}$.
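The following sketch shows Equation (23) implemented with numpy under simplifying assumptions of ours (a uniform velocity grid, delta functions assigned to the nearest grid point); the apero implementation handles the projection onto the grid more carefully:

```python
import numpy as np

C_KMS = 299792.458

def lsd_profile(wave, pol, sigma, line_wave, line_weight, vgrid):
    """Solve Z = (M^T S^2 M)^-1 M^T S^2 P for the LSD profile Z on a
    uniform velocity grid `vgrid` (km/s); assumes every velocity bin
    is covered by at least one line so the system is well conditioned."""
    nspec, nv = wave.size, vgrid.size
    M = np.zeros((nspec, nv))
    dv = vgrid[1] - vgrid[0]
    for lw, w in zip(line_wave, line_weight):
        # velocity of every spectral point relative to this line center
        v = C_KMS * (wave - lw) / lw
        inside = (v >= vgrid[0]) & (v <= vgrid[-1])
        k = np.clip(np.round((v[inside] - vgrid[0]) / dv).astype(int),
                    0, nv - 1)
        M[np.where(inside)[0], k] += w
    S2 = 1.0 / sigma**2   # diagonal of S^2, with S_jj = 1/sigma_j
    A = (M.T * S2) @ M    # M^T S^2 M
    b = (M.T * S2) @ pol  # M^T S^2 P
    return np.linalg.solve(A, b)
```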

In practice, LSD requires a few important steps to be executed by apero. First, each individual spectrum is cleaned using a sigma-clip rejection algorithm to minimize the impact of outliers on the LSD profile. Then we set a grid of velocities on which to calculate the LSD profile, where the grid is defined by the following parameters: an initial velocity, ${v}_{0}$, a final velocity, ${v}_{f}$, and the total number of points in the grid, ${N}_{v}$. Next, a fast and accurate method is needed to project the spectral values onto the velocity grid. Finally, an appropriate catalog of spectral lines (line mask) needs to be adopted for the LSD calculations. apero selects the line mask from a repository of masks, where the selection is based on proximity to the effective temperature of the observed star. The apero masks are computed using the VALD catalog (Piskunov et al. 1995) and a MARCS model atmosphere (Gustafsson et al. 2008), with effective temperatures ranging from 2500 to 5000 K in steps of 500 K and the same surface gravity of $\log g=5.0$ dex. The lines effectively used in the LSD analysis are selected to have line depths above a given threshold (set to 3% by default) and a Landé factor ${g}_{\mathrm{eff}}\gt 0$, resulting in a total of approximately 2500 atomic lines covering the full spectral range of SPIRou. Figure 28 shows an example of an LSD analysis performed on a 4-exposure Stokes V sequence of the bright Ap star Gamma Equulei, which has a strong magnetic field (e.g., Bychkov et al. 2006) and therefore shows an obvious Zeeman feature in the SPIRou data.

Figure 28. LSD analysis performed on the polarimetric data reduced with APERO and obtained from a 4-exposure Stokes V sequence of the bright Ap star Gamma Equulei. Panels from top to bottom show Stokes I, Stokes V, and null profiles.

In practice, the LSD analysis is not computed in a standard automated run of apero, but the module is supplied and can be activated with a single keyword in the apero profiles or run after processing.

11. Post Processing

The final data products that go to PIs are composite files of many of the outputs of apero. For SPIRou, these are sent to the Canadian Astronomy Data Centre (CADC) 28 but are only produced for science targets and hot stars (i.e., obj_fp, obj_dark, polar_fp, and polar_dark) and not for calibrations by default. There are currently five post-processing files, each linked to a single odometer code. These are the 2D extracted output (e.fits, Section 11.1), the 2D telluric corrected output (t.fits, Section 11.2), the 1D output (s.fits, Section 11.3), the velocity output (v.fits, Section 11.4), and the polarimetric output (p.fits, Section 11.5). A summary of the CADC output files is available in Table 4 and the post-process sequence is shown in Figure 29.

Figure 29. Post-process sequence.

Table 4. Science Ready Outputs Sent to the Canadian Astronomy Data Centre (CADC)

File | Description
(odometer)e.fits | 2D extracted spectrum for fibers AB, A, B, C, wavelength solution, and blaze
(odometer)s.fits | 1D extracted spectrum for fibers AB, A, B, C, and telluric corrected spectrum if available
(odometer)t.fits | 2D telluric corrected spectrum for fibers AB, A, B, wavelength solution, blaze, and reconstructed atmospheric transmission
(odometer)v.fits | Combined and per-order CCFs for fitting the radial velocity of the star
(odometer)p.fits | Polarimetric products (polarimetric flux, Stokes I, null vectors, wavelength solution, and blaze)


11.1. 2D Extraction Product (e.fits)

These are the combined extracted products. All extensions are two-dimensional spectra of size 4088 × 49. The "e.fits" file contains the extracted spectrum for each order and each fiber and the matching wavelength and blaze solution for each order and each fiber. The files are identified with a single odometer code generated at the time of observation, followed by an "e.fits" suffix.

11.2. 2D Telluric Corrected Product (t.fits)

These are the combined telluric-corrected products. All extensions are two-dimensional spectra of size 4088 × 49. The "t.fits" file contains the telluric corrected spectrum for each order and each fiber and the matching wavelength and blaze solution for each order and each fiber. The files are identified with a single odometer code at the time of observation followed by a "t.fits" suffix.

11.3. 1D Extraction and 1D Telluric Corrected Product (s.fits)

These are the combined 1D spectrum products and consist of two tables: the 1D spectrum in (1) velocity units and (2) wavelength units. Each table consists of the following columns: the wavelength solution, the extracted flux in fibers AB, A, B, and C, the telluric corrected flux in fibers AB, A, and B (if available), and the associated uncertainties for each flux column. The files are identified with a single odometer code generated at the time of observation, followed by an "s.fits" suffix.
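As a usage illustration, a 1D product can be inspected with astropy; note that the filename, extension index, and column names below are assumptions for the sake of the example, and the exact layout should be checked against the apero documentation for the reduction version in use.

```python
from astropy.io import fits

# "1234567s.fits" is a placeholder odometer-based filename
with fits.open("1234567s.fits") as hdul:
    hdul.info()               # list the extensions (the two 1D tables)
    tbl = hdul[1].data        # assumed: first table extension
    wave = tbl["wavelength"]  # assumed column name: wavelength solution
    flux = tbl["flux_AB"]     # assumed column name: flux in fiber AB
```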

11.4. Velocity Product (v.fits)

The velocity products are packaged into the "v.fits" file. Currently, only the CCF values (Section 9.1) are added as an extension, as the LBL products are computed separately. The CCF file consists of the CCF generated for each radial velocity element (by default between ±300 km s−1 in steps of 0.5 km s−1) for each order, and a combined CCF for the same radial velocity elements. The files are identified with a single odometer code generated at the time of observation, followed by a "v.fits" suffix. Once the LBL module can be run from within apero, it will add an extension to the "v.fits" file (the "rdb" extension described in the LBL documentation (see footnote 24)).

11.5. Polarimetric Product (p.fits)

These are the combined polarimetric products. The "p.fits" file consists of eight image extensions and three table extensions. The first two tables are the 1D representations of the 2D polarimetric products (stored in the image extensions) in (1) velocity units and (2) wavelength units. Each consists of the following columns: the wavelength solution, the polarimetric flux, the Stokes I flux, the Null 1 and Null 2 fluxes, and the associated uncertainties on each flux column. The third table lists the configuration parameters used to run apero. Although polarimetric products are the combination of at least 4 odometer codes, the files are associated with a single odometer code (the first in the sequence at the time of observation) followed by a "p.fits" suffix.

Table 5. Example Full Run Reducing all SPIRou Legacy Survey (and some PI) Data from 2018 April to 2022 April

Sequence | Recipes | Number | Time Taken | Efficiency
Pre-processing | apero_preprocess | 75761 files | 34.5 hr | 0.89
Reference Calibrations | apero_dark_ref, apero_badpix, apero_loc_spirou, apero_shape_ref, apero_shape, apero_flat, apero_thermal, apero_leak_ref, apero_wave_ref | 1 night | 1.5 hr |
Nightly Calibrations | apero_badpix, apero_loc_spirou, apero_shape, apero_flat, apero_thermal, apero_wave_night | 432 nights | 7.4 hr | 0.87
Extraction | apero_extract | 46836 files | 63.3 hr | 0.63
Telluric (hot star) | apero_mk_tellu, apero_mk_model, apero_fit_tellu, apero_mk_template | 1043 files | 1.1 hr | 0.70
Telluric (science) | apero_fit_tellu, apero_mk_template | 45524 files | 34.6 hr | 0.75
CCF | apero_ccf | 45524 files | 7.0 hr | 0.89
Polarimetry | apero_polar | 9880 groups | 2.1 hr | 0.90
Post-process | apero_processing | 48472 files | 18.4 hr | 0.93
Total | | | 170.1 hr (7.1 days) | 0.76

Note. This was processed on one machine using 35 cores. Note that preprocessing is done on both science and calibration observations, reference calibrations are only run on a single night, and each nightly calibration depends on the availability of specific calibrations, thus leading to a range of nights (from 366 to 401). Polarimetric groups consist of 4 individual exposures in different rhomb positions and are only processed for polar_fp and polar_dark files. The number of files in specific steps may depend on previous steps (i.e., quality control failures, engineering nights that were excluded, and odometer codes that were present in a do-not-process list). The efficiency is defined in Equation (24).


12. Discussion

apero has been an ongoing effort since its conception in 2017 October (see Appendix D and Table D1). With the first light of SPIRou in 2018 April, it took nearly 2 yr for apero to start producing precise science results; it has been used in publications since 2020 (see Appendix E and Table E1). Here we discuss current performance and limitations, and planned future work.

Table D1. History of the Major Versions of apero

First Version | Last Version | Date of First Version | Main Improvements
0.0.000 | 0.0.048 | 2017-10-12 | First python version of apero
0.1.000 | 0.1.037 | 2018-01-10 | First version to run on SPIRou engineering data (H2RG detector)
0.2.000 | 0.2.128 | 2018-04-17 | First version for SPIRou commissioning (H4RG upgrade)
0.3.000 | 0.3.077 | 2018-09-06 | First implementation of telluric correction
0.4.000 | 0.4.123 | 2018-12-08 | Re-work of wave solution and BERV calculation
0.5.000 | 0.5.124 | 2019-05-10 | Implementation of reference calibrations/recipes
0.6.000 | 0.6.132 | 2019-12-06 | Complete re-ordering of apero file structure, first use on NIRPS
0.7.000 | 0.7.255 | 2020-10-16 | Implementation of SQL databases, telluric pre-cleaning, upgraded calibration recipes, integration of spirou-polar
0.8.000 | active | 2022-09-11 | Currently in development: uncertainty propagation, NaN pixel quality flags, optimal extraction weights, database architecture
1.0.000 | | | Planned after 0.8: full documentation, including adding new instruments


Table E1. List of Some Publications Using apero for Science

Title | Citation
Spin–orbit alignment and magnetic activity in the young planetary system AU Mic | Martioli et al. (2020)
Early science with SPIRou: near-infrared radial velocity and spectropolarimetry of the planet-hosting star HD 189733 | Moutou et al. (2020)
SPIRou: NIR velocimetry and spectropolarimetry at the CFHT | Donati et al. (2020)
Star-disk interaction in the T Tauri star V2129 Ophiuchi: An evolving accretion-ejection structure | Sousa et al. (2021)
Where Is the Water? Jupiter-like C/H Ratio but Strong H2O Depletion Found on Tau Bootis b Using SPIRou | Pelletier et al. (2021)
TOI-1278 B: SPIRou Unveils a Rare Brown Dwarf Companion in Close-in Orbit around an M Dwarf | Artigau et al. (2021)
Characterizing Exoplanetary Atmospheres at High Resolution with SPIRou: Detection of Water on HD 189733 b | Boucher et al. (2021)
TOI-530b: a giant planet transiting an M-dwarf detected by TESS | Gan et al. (2022a)
TOI-1759 b: A transiting sub-Neptune around a low mass star characterized with SPIRou and TESS | Martioli et al. (2022)
TESS discovery of a sub-Neptune orbiting a mid-M dwarf TOI-2136 | Gan et al. (2022b)
Estimating fundamental parameters of nearby M dwarfs from SPIRou spectra | Cristofari et al. (2022a)
Line-by-line velocity measurements, an outlier-resistant method for precision velocimetry | Artigau et al. (2022)
TOI-1452 b: SPIRou and TESS reveal a super-Earth in a temperate orbit transiting an M4 dwarf | Cadieux et al. (2022)
Estimating the atmospheric properties of 44 M dwarfs from SPIRou spectra | Cristofari et al. (2022b)
CO or no CO? Narrowing the CO abundance constraint and recovering the H2O detection in the atmosphere of WASP-127b using SPIRou | Boucher et al. (2022)
Near-IR and optical radial velocities of the active M-dwarf star Gl 388 (AD Leo) with SPIRou at CFHT and SOPHIE at OHP | Carmona et al. (2022)
New insights on the near-infrared veiling of young stars using CFHT/SPIRou data | Sousa et al. (2022)
A sub-Neptune planet around TOI-1695 discovered and characterized with TESS and SPIRou | Kiefer et al. (2022)
The rotation period of 43 quiet M dwarfs from spectropolarimetry in the near-infrared: I. The SPIRou APERO analysis | P. Fouqué et al. (2022, in preparation)
Optical and near-infrared stellar activity characterization of the early M dwarf Gl 205 with SOPHIE and SPIRou | P. Cortés-Zuleta et al. (2022, in preparation)
High-resolution Chemical Spectroscopy of Barnard's Star with SPIRou | F. Jahandar et al. (2022, in preparation)
New methods to correct for systematics in near-infrared radial velocity measurements: Application to GL725B with SPIRou data | M. Ould-Elhkim et al. (2022, in preparation)
Characterizing planetary systems with SPIRou: the M-dwarf Planet Search survey and the system of GJ 251 | C. Moutou et al. (2022, in preparation)


12.1. Performance and Limitations

Throughout development, we have tried to optimize the speed of all recipes (e.g., through the use of SQL databases, numba, bottleneck, etc.). With the most current version (v0.7.256), using 35 CPU cores (on a single node), we can reduce all available data (all SPIRou legacy survey data and all PI data to which we have access, covering ∼430 nights and ∼45,000 science observations) in 7 days; Table 5 shows a breakdown of the various steps using a 35-core machine. These timings are for all data we have access to, equivalent to ∼90% of all data taken with SPIRou between 2018 April and 2022 June.

We find that with 35 CPU cores we reach a point where we start to see input-output bottlenecks, most probably caused by writing to disk and/or writing to the various databases. This manifests as an individual slowdown of each recipe run, which limits the efficiency of using more CPU cores (i.e., more recipes running at the same time means more files being written to disk and more writing to the various databases at the same time, causing queuing to occur). We define efficiency in Table 5 as the total CPU time taken divided by the product of the total wall-clock time and the number of cores (Equation (24)); a perfectly efficient code would give a value of 1:

$\mathrm{efficiency}=\frac{{t}_{\mathrm{CPU}}}{{t}_{\mathrm{total}}\times {N}_{\mathrm{cores}}}\qquad(24)$

where ${t}_{\mathrm{CPU}}$ is the total CPU time summed over all recipe runs, ${t}_{\mathrm{total}}$ is the total wall-clock time, and ${N}_{\mathrm{cores}}$ is the number of cores used.

We see that on average we have an efficiency between 0.7 and 0.93. We find that recipes that run quickly but save several files have lower efficiency, i.e., a bottleneck of writing to disk may be occurring; however, these recipes also write frequently to the various databases, and as such it is hard to distinguish between disk and database bottlenecks. Also, recipes that run slowly are rated as more efficient due to the amount of time spent using the CPUs (i.e., on science algorithms) relative to reading/writing to and from disk/databases, thus our metric is far from perfect. Another factor is other processes using the machine at the same time, which cannot easily be taken into account when measuring efficiency. We will continue to review the performance, speed up the science algorithms, and find ways to make the individual recipes faster.
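As a worked example of Equation (24) using the extraction row of Table 5 (variable names are ours):

```python
def efficiency(cpu_time_hr, wall_time_hr, n_cores):
    """Equation (24): total CPU time over wall time times core count."""
    return cpu_time_hr / (wall_time_hr * n_cores)

# Extraction: 63.3 hr of wall time on 35 cores at efficiency 0.63
# implies roughly 0.63 * 63.3 * 35 ~ 1396 core-hours of actual compute.
print(round(efficiency(1396.0, 63.3, 35), 2))  # -> 0.63
```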

Currently, apero is optimized to run on a single node (i.e., a single machine) with access to many CPU cores. It is possible to run batches of apero manually using multiple runs of apero_processing and controlling (manually) when to run the next step (i.e., making sure all pre-processing is done before reference calibrations, that all reference calibrations are done before any nightly calibrations, etc.). One can also run recipes one-by-one, bypassing the apero_processing recipe completely. apero is optimized to reduce a full data set; this implies many terabytes of raw and reduced data. Currently, to reduce a single night of data one requires at the very least a full reference calibration night (i.e., the calibration data from 2020 August 31), the full set of calibrations for the night to be reduced (and preferably a few nights surrounding it in case any calibrations fail quality control), and the full telluric database of hot stars (from every night processed). This obviously means a large amount of data is required for even a single night or single observation. We plan to release a full calibration database and a full telluric database of hot stars, with a way to download and install these data, in order to allow any user to reduce a small set of data. We do, however, always recommend using data reduced from a full data set in a uniform way. Such reductions are currently available at data centers at CFHT, Université de Montréal, and the Laboratoire d'Astrophysique de Marseille.

12.2. Future Work

As with most pipelines, improvements are always ongoing. With the LBL RV analysis code, we have seen RV precision down to ∼2 m s−1 with SPIRou, which indicates that apero is at least this precise. However, there are several features we plan to add:

  • 1.  
    The apero recipes do not currently propagate uncertainties throughout the data reduction process, which can be problematic when trying to understand the limiting factors in the data analysis. Full error propagation is important as feedback to the engineering team; for example, quantifying the impact of the thermal background from the optical train on the measurement of K-band spectroscopic features, such as the 2.29 μm CO band-head, could justify (or not) efforts to cool parts of the optical train.
  • 2.  
    In parallel to the propagation of uncertainties, we plan to propagate quality flags on pixel values, for example, whether a given invalid pixel is due to a cosmic-ray hit or a hot pixel. In the current framework, pixels are either deemed valid or invalid and flagged as NaN, which does not allow one to back-trace the origin of missing data. This is done for JWST data products 29 with pixel-level data quality flags encoded as 32-bit integers.
  • 3.  
    As the SPIRou fibers are multi-mode, one expects a certain level of noise to arise from the time-varying weight of modes injected in each of the science fibers. This is minimized through the use of a pupil slicer and fiber scrambler at the injection. As the pupil slicer image provides more information on the flux distribution at the fiber exit than a simple fiber (e.g., the bottom-center panel in Figure 19), one could decorrelate this spatially resolved information against the modal noise.
  • 4.  
    Persistence in infrared arrays is a non-trivial problem for faint-target observations (Artigau et al. 2018). In any given image, one sees a decaying remnant of all previous observations, with an amplitude that is roughly proportional to the inverse of the delay since the last illumination. One notable feature of persistence in infrared arrays is that it is not only the previous frame that matters but the entire history of illumination over the last few hours. A bright target observed for a long time at the beginning of a night (a common example being a multi-hour sequence to monitor a transiting planet) will affect all fainter targets observed later during the night. To add further complexity to the matter, the persistence response varies at the pixel-to-pixel level. Work has begun to construct a persistence model for SPIRou. Furthermore, the algorithms need to be run at the observatory level, as data obtained earlier in the night may be proprietary and not accessible to all apero users.
  • 5.  
    The main limitation for faint-object observations with SPIRou, or any near-infrared pRV spectrograph, particularly blueward of the K band, is detector readout noise. As demonstrated by Payeur et al. (2022), machine-learning algorithms can reduce readout noise in long sequences (100 readouts over ∼10 minutes) from ∼6 e− to <2 e− (see Table 2 therein). This needs to be performed before the current apero steps, as it requires handling the data cubes rather than the 2D images used here. As long as the output format is maintained, the machine-learning-corrected images can be used as inputs to apero.
  • 6.  
    Energetic particles regularly hit infrared arrays and deposit electrons in pixels, leading to spurious signals. These hits happen essentially instantaneously (on the timescales relevant to the readout) and manifest themselves as discontinuities in the time series of non-destructive readouts. Efficient algorithms have been proposed to handle cosmic-ray hits in ramp-fitting frameworks (Anderson & Gordon 2011) but have yet to be implemented for SPIRou.
  • 7.  
    The LBL recipes have been designed to use apero byproducts, but they have not yet been implemented within the automated apero framework. Steps that are currently done manually, such as the association of an appropriate stellar template, will be included within apero in the near future.

There are also various other planned improvements: better characterization of optimal extraction weights; reworking the database architecture (which currently throttles the maximum number of connections when using a large number of cores); fixing some minor memory leaks when parallel processing; handling the thermal contribution at bluer wavelength domains; and completing all documentation (see footnote 16).

13. Conclusion

We present A PipelinE to Reduce Observations (apero) and highlight its use as the official pipeline for SPIRou. We walk through the steps going from raw data to science-ready products. We detail the pre-processing of raw data to correct detector issues, the production of reference calibrations and nightly calibrations, and the use of these calibrations to correct and extract hot stars and science observations in a consistent, controlled manner. We summarize telluric correction (which will be detailed in a future publication, É. Artigau et al. 2022, in preparation), RV analysis, polarimetric analysis, and our post-processing recipes delivering telluric corrected 2D and 1D spectra as well as polarimetry products and enabling precise stable radial velocity calculations (via the LBL algorithm, Artigau et al. 2022), good to at least ∼2 m s−1 over the timescale of the current lifetime of SPIRou (5 yr).

We would like to thank the anonymous referee for the valuable comments that improved the quality of the paper. The authors wish to thank everyone involved with writing, maintaining, and updating all Python packages. Specifically, apero has made extensive use of: (a) astropy (Astropy Collaboration et al. 2013, 2018), (b) astroquery (Ginsburg et al. 2019), (c) barycorrpy (Kanodia & Wright 2018; Wright & Kanodia 2020), (d) matplotlib (Hunter 2007), (e) numpy (Harris et al. 2020), (f) pandas (McKinney 2010, 2011), and (g) scipy (Virtanen et al. 2020).

As well as the python packages bottleneck (Goodman 2019), gitchangelog (Lab 2018), ipdb (Chapelle 2021), IPython package (Pérez & Granger 2007), mysql-connector-python (Mariz 2021), numba (Lam et al. 2015), pandastable (Farrell 2016), Pillow (Murray 2021), pyyaml (Simonov 2021), sphinx (Komiya & Brandl 2021), sqlalchemy (Bayer 2021), Scikit-learn (Pedregosa et al. 2011), tqdm (da Costa-Luis et al. 2021), yagmail (van Kooten 2021), and xlrd (Withers 2021).

This research made use of ds9, a tool for data visualization supported by the Chandra X-ray Science Center (CXC) and the High Energy Astrophysics Science Archive Center (HEASARC) with support from the JWST Mission office at the Space Telescope Science Institute for 3D visualization.

This research made use of TOPCAT, an interactive graphical viewer and editor for tabular data (Taylor 2005).

apero would have been impossible without the use of PyCharm, Git, and GitHub.

This work has made use of data from the European Space Agency (ESA) mission Gaia (https://www.cosmos.esa.int/gaia), processed by the Gaia Data Processing and Analysis Consortium (DPAC, https://www.cosmos.esa.int/web/gaia/dpac/consortium). Funding for the DPAC has been provided by national institutions, in particular, the institutions participating in the Gaia Multilateral Agreement. apero has made use of the SIMBAD database, operated at CDS, Strasbourg, France. This research has made use of NASA's Astrophysics Data System. apero has made use of the VizieR catalog access tool, CDS, Strasbourg, France. The acknowledgments were compiled using the Astronomy Acknowledgment Generator.

The authors wish to recognize and acknowledge the very significant cultural role and reverence that the summit of Maunakea has always had within the indigenous Hawaiian community. We are most fortunate to have the opportunity to conduct observations from this mountain.

This work was financially supported by the Natural Sciences and Engineering Research Council of Canada and the Fonds Québécois de Recherche—Nature et Technologies. Observatoire du Mont-Mégantic and the Institute for Research on Exoplanets acknowledge funding from Développement Économique Canada, Quebec's Ministère de l'Éducation et de l'Innovation, the Trottier Family Foundation and the Canadian Space Agency. M.H. and I.B. acknowledge support from ANID—Millennium Science Initiative—ICN12_009. C.M., A.C., P.F., X.D., I.B., and J.F.D. acknowledges funding from the French ANR under contract number ANR18CE310019 (SPlaSH) and the Programme National de Planétologie (PNP). This work is supported by the French National Research Agency in the framework of the Investissements d'Avenir program (ANR-15-IDEX-02), through the funding of the "Origin of Life" project of the Grenoble-Alpes University. J.F.D. acknowledges funding from the European Research Council (ERC) under the H2020 research & innovation program (grant agreement #740651 NewWorlds). T.V. would like to acknowledge funding from the Fonds de Recherche du Québec—Nature et Technologies (FRQNT, scholarship number 320056), and the Institute for Research on Exoplanets (iREx).

Facility: CFHT (SPIRou).

Software: python3: astropy, astroquery, barycorrpy, bottleneck, ipdb, ipython, matplotlib, numba, numpy, pandas, pandastable, Pillow, pyyaml, scikit-image, scipy, sphinx, sqlalchemy, tqdm, xlrd.

Appendix A: Creating a Raw SPIRou Ramp Image

The SPIRou detector control software reads the detector continuously every 5.57 s and produces a 2D image (4096 × 4096) constructed from the linear fit of pixel value versus time (saving the slope, intercept, error, and number of frames used for quality checks). The construction of the 2D image from individual readouts is handled at the acquisition step; it is not included as part of apero but is software maintained and used at CFHT. The construction of the 2D frame is performed through the following steps.

Individual detector frames are obtained from the detector control software every 5.57 s (at time j, t[j]). A flagging of pixel saturation is performed, and pixels with a nonlinearity larger than ∼10% are considered unreliable and rejected for all subsequent readouts (binary mask m[i, j] for pixel i at time j). A nonlinearity correction is applied to the pixel fluxes (the flux at pixel i and time j is f[i, j]). As individual readouts arrive in computer memory, intermediate quantities necessary for the computation of the total pixel-level slope are accumulated. The advantage of preserving these quantities in memory is that one can perform a pixel-level ramp fit over an arbitrarily large number of frames without being required to have all pixel values in memory at the same time and without having to access files multiple times (here x denotes the time axis and y the flux axis, consistent with the fit below):

  • 1.  
    ${\sigma }_{x}[i]={\sum }_{j}m[i,j]\,t[j]$
  • 2.  
    ${\sigma }_{y}[i]={\sum }_{j}m[i,j]\,f[i,j]$
  • 3.  
    ${\sigma }_{{xy}}[i]={\sum }_{j}m[i,j]\,f[i,j]\,t[j]$
  • 4.  
    ${\sigma }_{{x}^{2}}[i]={\sum }_{j}m[i,j]\,t{[j]}^{2}$
  • 5.  
    $n[i]={\sum }_{j}m[i,j]$.

Among these intermediate quantities, the only one that has a clear physical meaning is n[i]: it corresponds to the number of valid (i.e., below the predefined flux level for saturation) readouts obtained over the entire sequence. For normal scientific exposures, n[i] is equal to the total number of readouts for the vast majority of pixels, pixels with a very large dark current being the exception.

From simple linear algebra, one can show that the per-pixel intercept is

$b[i]=\frac{{\sigma }_{y}[i]\,{\sigma }_{{x}^{2}}[i]-{\sigma }_{x}[i]\,{\sigma }_{{xy}}[i]}{n[i]\,{\sigma }_{{x}^{2}}[i]-{\sigma }_{x}{[i]}^{2}}\qquad(\mathrm{A1})$

and correspondingly, the per-pixel slope is:

$a[i]=\frac{n[i]\,{\sigma }_{{xy}}[i]-{\sigma }_{x}[i]\,{\sigma }_{y}[i]}{n[i]\,{\sigma }_{{x}^{2}}[i]-{\sigma }_{x}{[i]}^{2}}\qquad(\mathrm{A2})$
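A minimal sketch of this running-sum scheme (our own illustrative code, not the CFHT acquisition software) accumulates the five quantities one readout at a time and then solves Equations (A1) and (A2):

```python
import numpy as np

def ramp_fit(frames, times):
    """Per-pixel slope a and intercept b from running sums over
    non-destructive readouts; `frames` yields (flux, valid_mask)
    array pairs, one per readout, so only one frame is ever in memory."""
    sx = sy = sxy = sx2 = n = 0.0
    for (f, m), t in zip(frames, times):
        sx += m * t        # sum of valid times
        sy += m * f        # sum of valid fluxes
        sxy += m * f * t   # flux-time cross term
        sx2 += m * t**2    # sum of squared times
        n += m             # number of valid readouts per pixel
    denom = n * sx2 - sx**2               # invalid where n < 2
    a = (n * sxy - sx * sy) / denom       # slope, Equation (A2)
    b = (sy * sx2 - sx * sxy) / denom     # intercept, Equation (A1)
    return a, b
```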

Once the slope image a[i] has been computed, it is corrected for correlated amplifier noise using the side reference pixels (along the fast readout axis), and amplifier offset is corrected using the top and bottom reference pixels (both extremities of the slow readout axis).

As a pixel-level quality check, we further compute the error on the slope to identify pixels having a suspiciously large dispersion of their values around the fit. This is used to flag pixels that have a large slope inconsistent with their frame-to-frame accumulation rate. To compute the slope error, one has to re-read all frames once (though it is not required to keep them all in memory at once) and compute the following values:

  • 1.  
    ${x}_{p}[i]={\sigma }_{x}[i]/n[i]$
  • 2.  
    ${y}_{p}[i,j]=b[i]+a[i]\,t[j]$
  • 3.  
    ${\varrho }_{{x}^{2}}[i]={\sum }_{j}{\left(t[j]-{x}_{p}[i]\right)}^{2}m[i,j]$
  • 4.  
    ${\varrho }_{{y}^{2}}[i]={\sum }_{j}{\left(f[i,j]-{y}_{p}[i,j]\right)}^{2}m[i,j].$

From which the slope error is

${\sigma }_{a}[i]=\sqrt{\frac{{\varrho }_{{y}^{2}}[i]}{{\varrho }_{{x}^{2}}[i]\,\left(n[i]-2\right)}}\qquad(\mathrm{A3})$

Appendix B: Standard Image Calibration

After pre-processing (Section 4), the reference dark calculation (Section 5.1) and the bad pixel correction (Section 6.1), all images that are used in apero need to be calibrated in a standard way (using both the dark reference and bad pixel recipe outputs). This is not a separate recipe but a set of functions that are used in all recipes that use pre-processed files as inputs.

The standard calibration is ordered as follows:

  • 1.  
    dark reference correction (Appendix B.1).
  • 2.  
    flip, resize and re-scale the image (Appendix B.2).
  • 3.  
    flag bad pixels (Appendix B.3).
  • 4.  
    correct background flux (Appendix B.4).
  • 5.  
    clean hot pixels (Appendix B.5).
  • 6.  
    flag pixels that are out of bounds (Appendix B.6).

B.1. Dark Reference Correction

The first step of the standard calibration of pre-processed files is to correct the input image for the dark signal.

${{IM}}_{\mathrm{corr}\ i,j}={{IM}}_{\mathrm{uncorr}\ i,j}-N\times {{DARK}}_{i,j}\qquad(\mathrm{B1})$

where ${{IM}}_{\mathrm{corr}\ i,j}$ and ${{IM}}_{\mathrm{uncorr}\ i,j}$ are the fluxes in the ith row and jth column of the corrected and uncorrected images respectively, N is the number of raw images that went into IM, and ${{DARK}}_{i,j}$ is the flux in the ith row and jth column of the reference dark (see Section 5.1).

The dark reference is taken from the calibration database. If more than one dark reference exists, the one closest in time to IM is used (using the header key mjdmid).

B.2. Flipping, Resizing, and Re-scaling the Image

For legacy reasons, the image is flipped in the vertical and horizontal directions (see Figure 4). After this the image is converted from ADU s−1 to electrons using Equation (B2)

${{IM}}_{\mathrm{electrons}\ i,j}={{IM}}_{\mathrm{ADU}/{\rm{s}}\ i,j}\times \mathrm{gain}\times \mathrm{exptime}\qquad(\mathrm{B2})$

where ${{IM}}_{\mathrm{electrons}\ i,j}$ is the flux in electrons for the ith row and jth column, ${{IM}}_{\mathrm{ADU}/{\rm{s}}\ i,j}$ is the flux in ADU s−1 for the ith row and jth column, the gain is taken from the header key gain (although it has remained constant over the lifetime of SPIRou), and exptime is the exposure time in seconds (taken from the header key exptime).

Once the image is in electrons it is then resized. The image is cut in the cross-order direction to start from pixel 250 and end at pixel 3350 (removing a partial blue order and the whole unilluminated dark amplifier region) and in the along-order direction to start from pixel 4 and end at pixel 4092 (removing just the H4RG reference pixels). Thus after this resizing the image is of size 3100 × 4088 (see Figure 4).

B.3. Flagging the Bad Pixels

The bad pixel map (badpix, see Section 6.1) closest in time to the image (using the header key mjdmid) is loaded from the calibration database and all pixels flagged as bad are set to NaN, as shown in Equation (B3):

${{IM}}_{\mathrm{corr}\ i,j}=\left\{\begin{array}{ll}\mathrm{NaN} & \mathrm{if}\ {{BADPIX}}_{i,j}=1\\ {{IM}}_{i,j} & \mathrm{otherwise}\end{array}\right.\qquad(\mathrm{B3})$

where ${{IM}}_{\mathrm{corr}\ i,j}$ and ${{IM}}_{i,j}$ are the fluxes in the ith row and jth column of the corrected image and the input image respectively, and ${{BADPIX}}_{i,j}$ is the bad pixel flag (1 or 0) in the ith row and jth column of the bad pixel map.

B.4. Correcting the Background Flux

Within each science image, we take the median of "background" pixels (identified using the backmap, see Section 6.1) within a region and create a map of large-scale background features (middle panel, Figure 14). This map is then splined into a 4088 × 4088 image and subtracted from the science frame.

B.5. Additional Cleaning of Hot Pixels

Hot pixels are flagged by finding pixels that are 10σ (positive or negative) outliers compared to their immediate neighbors. This is in addition to the cosmic ray rejection applied in Section 4.5 and the bad pixel flagging (Section 6.1) which removes most of the hot pixels. In this additional cleaning of hot pixels, we first construct a flattened image and perform a low-pass filter in the along-order direction, filtering the image so that only pixel-to-pixel structures remain. We then apply median filtering, which removes these big outliers, and then we smooth the image to avoid big regions filled with zeros. We apply a 5 pixel median boxcar and a 5 pixel smoothing in the along-order direction, which blurs along the dispersion over a scale of ∼7 pixels. Bad pixels are interpolated with a 2D surface fit by using valid pixels within a 3 × 3 pixel box centered on the bad pixel.
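The neighbor-comparison step can be sketched as follows; this simplified version (our own, assuming a NaN-free input) compares each pixel to a 3 × 3 local median rather than reproducing the full filtering sequence described above:

```python
import numpy as np
from scipy.ndimage import median_filter

def clean_hot_pixels(image, nsig=10.0):
    """Flag pixels deviating by more than `nsig` sigma from their
    immediate neighbors and replace them with the local median."""
    local = median_filter(image, size=3)  # 3x3 neighborhood median
    resid = image - local
    # Robust sigma of the pixel-to-pixel residuals (MAD estimator)
    sigma = 1.4826 * np.median(np.abs(resid - np.median(resid)))
    hot = np.abs(resid) > nsig * sigma
    cleaned = image.copy()
    cleaned[hot] = local[hot]
    return cleaned, hot
```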

B.6. Flagging Out of Bound Pixels

Pixel values need to be within reasonable bounds considering the physics of the H4RG detector; if they are not, we set them to NaN. The upper bound is the saturation per frame time: as the flux is expressed as a slope (computed in fits2ramp.py), a pixel value greater than the saturation point can still be recorded by the detector but is nonphysical. The lower bound is set to minus ten times the readout noise. These bounds are shown in Equation (B4):

${{IM}}_{\mathrm{corr}\ i,j}=\left\{\begin{array}{ll}\mathrm{NaN} & \mathrm{if}\ {{IM}}_{i,j}\gt \mathrm{saturation}\times \mathrm{exptime}/{t}_{\mathrm{frame}}\\ \mathrm{NaN} & \mathrm{if}\ {{IM}}_{i,j}\lt -10\times \mathrm{rdnoise}\\ {{IM}}_{i,j} & \mathrm{otherwise}\end{array}\right.\qquad(\mathrm{B4})$

where ${{IM}}_{\mathrm{corr}\ i,j}$ and ${{IM}}_{i,j}$ are the fluxes in the ith row and jth column of the corrected image and the input image respectively, saturation is taken from the header keyword saturate and is converted to electrons via Equation (B2), ${t}_{\mathrm{frame}}$ is the individual frame time (from the header keyword frmtime), and the readout noise is taken from the header keyword rdnoise.

Appendix C: Shape Transformation

The shape transform algorithm allows three different transformations, which may or may not all be used. Here we define x as the direction along the order and y as the direction across the order.

  • 1.  
    a linear transform: defined by dx, dy, A, B, C, D, where dx and dy are shifts in x and y respectively, and A, B, C, D form the transform matrix:
    $\left(\begin{array}{cc}A & B\\ C & D\end{array}\right)\qquad(\mathrm{C1})$
    This combines with dx and dy to form a 3 × 3 matrix:
    $\left(\begin{array}{ccc}A & B & {dx}\\ C & D & {dy}\\ 0 & 0 & 1\end{array}\right)\qquad(\mathrm{C2})$
    This 3 × 3 linear transformation matrix allows for scaling, rotation, reflection (not used in our case), and shearing (Gonzalez & Woods 2008); see the sketch after this list.
  • 2.  
    a shift in x position, where a shift is defined for each pixel.
  • 3.  
    a shift in y position, where a shift is defined for each pixel.
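As an illustration of applying such a transform (our own sketch using scipy, glossing over the (row, column) versus (x, y) axis ordering; not the apero implementation):

```python
import numpy as np
from scipy.ndimage import affine_transform

def build_transform(dx, dy, A, B, C, D):
    """Assemble the 3x3 homogeneous matrix of Equation (C2)."""
    return np.array([[A, B, dx],
                     [C, D, dy],
                     [0.0, 0.0, 1.0]])

def apply_transform(image, matrix):
    """Warp a 2D image by the linear part plus shift of `matrix`."""
    lin = matrix[:2, :2]
    shift = matrix[:2, 2]
    # scipy maps output coordinates back to input coordinates,
    # so the inverse of the forward transform is supplied
    inv = np.linalg.inv(lin)
    return affine_transform(image, inv, offset=-inv @ shift, order=3)
```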

Appendix D: Version History of APERO

apero has been in development since 2017 October. Here we list a few of the major versions to give the reader an idea of how long the development of a full pipeline can take.

Appendix E: Current Science Publications using APERO

In Table E1 we list some science publications using apero for science. This list is not complete but gives an idea of the range of science enabled by apero with SPIRou.

Appendix F: Inputs

The currently allowed raw file inputs are listed in Table F1. The name becomes the apero header key dprtype. All other columns are header keys found in the raw input files or are added/modified when first processed (in apero_preprocess and apero_processing). Although all SPIRou raw files have suffixes:

  • 1.  
    a.fits (obstype = align)
  • 2.  
    c.fits (obstype = comparison)
  • 3.  
    d.fits (obstype = dark)
  • 4.  
    f.fits (obstype = flat)
  • 5.  
    o.fits (obstype = object)

apero does not rely on the filenames to assign a dprtype to a raw input file. Instead, we use header keys to identify file types (Table F1).
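Conceptually, the identification amounts to a lookup keyed on header values; the helper below is hypothetical and heavily abridged (the real definitions, including the TRG_TYPE and DRSMODE distinctions between, e.g., obj_fp and polar_fp, live in apero.core.instruments):

```python
# Hypothetical sketch of header-based dprtype identification
DPRTYPE_TABLE = {
    ("FLAT", "pos_wl", "pos_wl"): "flat_flat",
    ("ALIGN", "pos_fp", "pos_fp"): "fp_fp",
    ("OBJECT", "pos_pk", "pos_fp"): "obj_fp",
    # ... one entry per row of Table F1
}

def identify_dprtype(hdr):
    key = (hdr["OBSTYPE"], hdr["SBCCAS_P"], hdr["SBCREF_P"])
    return DPRTYPE_TABLE.get(key, "unknown")
```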

Table F1. All Possible Inputs Currently Accepted by apero

Name | OBSTYPE (HDR) | SBCCAS_P (HDR) | SBCREF_P (HDR) | SBCALI_P (HDR) | INSTRUME (HDR) | TRG_TYPE* | DRSMODE*
dark_dark_int | DARK | pos_pk | pos_pk | P4 | SPIRou | |
dark_dark_tel | DARK | pos_pk | pos_pk | P5 | SPIRou | |
dark_dark_sky | OBJECT | pos_pk | pos_pk | | SPIRou | SKY |
dark_fp_sky | OBJECT | pos_pk | pos_fp | | SPIRou | SKY |
dark_flat | FLAT | pos_pk | pos_wl | | SPIRou | |
flat_dark | FLAT | pos_wl | pos_pk | | SPIRou | |
flat_flat | FLAT | pos_wl | pos_wl | | SPIRou | |
flat_fp | FLAT | pos_wl | pos_fp | | SPIRou | |
dark_fp | ALIGN | pos_pk | pos_fp | | SPIRou | |
fp_dark | ALIGN | pos_fp | pos_pk | | SPIRou | |
fp_flat | ALIGN | pos_fp | pos_wl | | SPIRou | |
fp_fp | ALIGN | pos_fp | pos_fp | | SPIRou | |
lfc_lfc | ALIGN | pos_rs | pos_rs | | SPIRou | |
lfc_fp | ALIGN | pos_rs | pos_fp | | SPIRou | |
fp_lfc | ALIGN | pos_fp | pos_rs | | SPIRou | |
obj_dark | OBJECT | pos_pk | pos_pk | | SPIRou | TARGET | SPECTROSCOPY
obj_fp | OBJECT | pos_pk | pos_fp | | SPIRou | TARGET | SPECTROSCOPY
obj_hcone | OBJECT | pos_pk | pos_hc1 | | SPIRou | TARGET |
obj_hctwo | OBJECT | pos_pk | pos_hc2 | | SPIRou | TARGET |
polar_dark | OBJECT | pos_pk | pos_pk | | SPIRou | TARGET | POLAR
polar_fp | OBJECT | pos_pk | pos_fp | | SPIRou | TARGET | POLAR
dark_hcone | COMPARISON | pos_pk | pos_hc1 | | SPIRou | |
dark_hctwo | COMPARISON | pos_pk | pos_hc2 | | SPIRou | |
fp_hcone | COMPARISON | pos_fp | pos_hc1 | | SPIRou | |
fp_hctwo | COMPARISON | pos_fp | pos_hc2 | | SPIRou | |
hcone_fp | COMPARISON | pos_hc1 | pos_fp | | SPIRou | |
hctwo_fp | COMPARISON | pos_hc2 | pos_fp | | SPIRou | |
hcone_hcone | COMPARISON | pos_hc1 | pos_hc1 | | SPIRou | |
hctwo_hctwo | COMPARISON | pos_hc2 | pos_hc2 | | SPIRou | |
hcone_dark | COMPARISON | pos_hc1 | pos_pk | | SPIRou | |
hctwo_dark | COMPARISON | pos_hc2 | pos_pk | | SPIRou | |

Note. (HDR) denotes that a keyword is required from an input file header. * denotes a header key that is added or modified by apero before internal use. Empty cells indicate the keyword is not used for that file type.


Appendix G: APERO Products

apero produces outputs after every recipe. These are saved to the reduced directory (except for the apero_preprocess recipe, which saves outputs into the working directory). These are intermediary products, used to create the CADC outputs in post-processing (see Section 11), and are shown in Tables G1 and G2.

Table G1. Main apero Products, where id for SPIRou is the Odometer Code and the HASH is a Checksum Created by Multiple Inputs (of the same Data Type) to the given Recipe

File | Recipe | Frequency | Description
(id)_pp.fits | apero_preprocess | every file | preprocessed file
(HASH)_pp_dark_ref.fits | apero_dark_ref | ref night | reference dark file
(HASH)_pp_badpixel.fits | apero_badpix | every night | bad pixel map file
(HASH)_pp_order_profile_{AB,C}.fits | apero_loc_spirou | every night | order profile file
(HASH)_pp_loco_{AB,C}.fits | apero_loc_spirou | every night | localization center map file
(HASH)_pp_shapex.fits | apero_shape_ref | ref night | dx shape map file
(HASH)_pp_shapey.fits | apero_shape_ref | ref night | dy shape map file
(HASH)_pp_fpref.fits | apero_shape_ref | ref night | FP reference file
(HASH)_pp_shapel.fits | apero_shape | every night | local shape map file
(HASH)_pp_blaze_{AB,A,B,C}.fits | apero_flat | every night | blaze correction file
(HASH)_pp_flat_{AB,A,B,C}.fits | apero_flat | every night | flat correction file
(HASH)_pp_e2ds_{AB,A,B,C}.fits | apero_thermal | every night | 2D extracted dark_dark_int and/or dark_dark_tel file [49 × 4088]
(HASH)_pp_e2dsff_{AB,A,B,C}.fits | apero_thermal | every night | 2D extracted + flat fielded dark_dark_int and/or dark_dark_tel file [49 × 4088]
(HASH)_pp_s1d_v_{AB,A,B,C}.fits | apero_thermal | every night | 1D extracted + flat fielded dark_dark_int and/or dark_dark_tel file with constant velocity bins
(HASH)_pp_s1d_w_{AB,A,B,C}.fits | apero_thermal | every night | 1D extracted + flat fielded dark_dark_int and/or dark_dark_tel file with constant wavelength bins
(HASH)_pp_thermal_e2ds_int_{AB,A,B,C}.fits | apero_thermal | every night | extracted thermal internal dark calibration file
(HASH)_pp_thermal_e2ds_tel_{AB,A,B,C}.fits | apero_thermal | every night | extracted thermal telescope dark calibration file
(id)_pp_e2ds_{AB,A,B,C}.fits | apero_wave_ref | ref night | 2D extracted dark_fp file [49 × 4088]
(id)_pp_e2dsff_{AB,A,B,C}.fits | apero_wave_ref | ref night | 2D extracted + flat fielded dark_dark_int and/or dark_dark_tel file [49 × 4088]
(id)_pp_s1d_v_{AB,A,B,C}.fits | apero_wave_ref | ref night | 1D extracted + flat fielded dark_dark_int and/or dark_dark_tel file with constant velocity bins
(id)_pp_s1d_w_{AB,A,B,C}.fits | apero_wave_ref | ref night | 1D extracted + flat fielded dark_dark_int and/or dark_dark_tel file with constant wavelength bins
(HASH)_pp_leak_ref_{AB,A,B,C}.fits | apero_leak_ref | ref night | leak correction reference file
(HASH)_pp_e2ds_{AB,A,B,C}.fits | apero_wave_ref | ref night | 2D extracted fp_fp and hcone_hcone file [49 × 4088]
(HASH)_pp_e2dsff_{AB,A,B,C}.fits | apero_wave_ref | ref night | 2D extracted + flat fielded fp_fp and hcone_hcone file [49 × 4088]
(HASH)_pp_s1d_v_{AB,A,B,C}.fits | apero_wave_ref | ref night | 1D extracted + flat fielded fp_fp and hcone_hcone file with constant velocity bins
(HASH)_pp_s1d_w_{AB,A,B,C}.fits | apero_wave_ref | ref night | 1D extracted + flat fielded fp_fp and hcone_hcone file with constant wavelength bins
(HASH)_pp_wavesol_ref_{AB,A,B,C}.fits | apero_wave_ref | ref night | reference 2D wavelength solution [49 × 4088]
(HASH)_pp_wavecav_AB.fits | apero_wave_ref | ref night | cavity width measurement file
(HASH)_pp_e2ds_{AB,A,B,C}.fits | apero_wave_night | every night | 2D extracted fp_fp and hcone_hcone file [49 × 4088]
(HASH)_pp_e2dsff_{AB,A,B,C}.fits | apero_wave_night | every night | 2D extracted + flat fielded fp_fp and hcone_hcone file [49 × 4088]
(HASH)_pp_s1d_v_{AB,A,B,C}.fits | apero_wave_night | every night | 1D extracted + flat fielded fp_fp and hcone_hcone file with constant velocity bins
(HASH)_pp_s1d_w_{AB,A,B,C}.fits | apero_wave_night | every night | 1D extracted + flat fielded fp_fp and hcone_hcone file with constant wavelength bins
(HASH)_pp_wave_night_ref_{AB,A,B,C}.fits | apero_wave_night | every night | nightly 2D wavelength solution [49 × 4088]

Note. A full list of data products for each recipe can be found in the documentation. Continued in Table G2.


Table G2. Main apero Products, where id for SPIRou is the Odometer Code and the HASH is a Checksum Created by Multiple Inputs (of the same Data Type) to the given Recipe

File | Recipe | Frequency | Description
(id)_pp_e2ds_{AB,A,B,C}.fits | apero_extract | every file | 2D extracted science/hot star file [49 × 4088]
(id)_pp_e2dsff_{AB,A,B,C}.fits | apero_extract | every file | 2D extracted + flat fielded science/hot star file [49 × 4088]
(id)_pp_s1d_v_{AB,A,B,C}.fits | apero_extract | every file | 1D extracted + flat fielded science/hot star file with constant velocity bins
(id)_pp_s1d_w_{AB,A,B,C}.fits | apero_extract | every file | 1D extracted + flat fielded science/hot star file with constant wavelength bins
(id)_pp_tellu_trans{AB,A,B}.fits | apero_mk_tellu | every hot star file | measured telluric transmission file [49 × 4088]
(id)_pp_tellu_pclean{AB,A,B}.fits | apero_mk_tellu | every hot star file | telluric pre-cleaning (corrected, transmission mask, measured absorption, sky model) [49 × 4088]
trans_model_AB.fits | apero_mk_model | once | model of all telluric transmission files (residuals in water, dry components, and a DC level)
(id)_pp_e2dsff_tcorr{AB,A,B}.fits | apero_fit_tellu | every hot star/science file | 2D telluric corrected extracted flat fielded file [49 × 4088]
(id)_pp_s1d_w_tcorr_{AB,A,B}.fits | apero_fit_tellu | every hot star/science file | 1D telluric corrected extracted flat fielded file with constant wavelength bins
(id)_pp_s1d_v_tcorr_{AB,A,B}.fits | apero_fit_tellu | every hot star/science file | 1D telluric corrected extracted flat fielded file with constant velocity bins
(id)_pp_e2dsff_recon{AB,A,B}.fits | apero_fit_tellu | every hot star/science file | 2D telluric reconstructed absorption file [49 × 4088]
(id)_pp_s1d_w_recon_{AB,A,B}.fits | apero_fit_tellu | every hot star/science file | 1D telluric reconstructed absorption file with constant wavelength bins
(id)_pp_s1d_v_recon_{AB,A,B}.fits | apero_fit_tellu | every hot star/science file | 1D telluric reconstructed absorption file with constant velocity bins
(id)_pp_tellu_pclean{AB,A,B}.fits | apero_mk_tellu | every hot star/science file | telluric pre-cleaning (corrected, transmission mask, measured absorption, sky model) [49 × 4088]
Template_{object}_tellu_obj_AB.fits | apero_mk_template | once per object | 2D telluric corrected template of a hot star or science object [49 × 4088]
Template_s1d_{object}_sc1d_w_file_AB.fits | apero_mk_template | once per object | 1D telluric corrected template of a hot star or science object with constant wavelength bins
Template_s1d_{object}_sc1d_v_file_AB.fits | apero_mk_template | once per object | 1D telluric corrected template of a hot star or science object with constant velocity bins
(id)_pp_e2dsff_tcorr{AB,A,B}_ccf_{mask}_{AB,A,B,C}.fits | apero_ccf | every hot star/science file | the CCF output file (CCFs per order and fitted parameters)
(HASH)_pp_e2dsff_tcorr_pol_deg.fits | apero_polar | every polarimetric group | 2D polar file [49 × 4088]
(HASH)_pp_e2dsff_tcorr_null1_pol.fits | apero_polar | every polarimetric group | 2D Null 1 file [49 × 4088]
(HASH)_pp_e2dsff_tcorr_null2_pol.fits | apero_polar | every polarimetric group | 2D Null 2 file [49 × 4088]
(HASH)_pp_e2dsff_tcorr_StokesI.fits | apero_polar | every polarimetric group | 2D Stokes I file [49 × 4088]
(HASH)_pp_e2dsff_tcorr_s1d_w_pol.fits | apero_polar | every polarimetric group | 1D polarimetry, Null 1, Null 2, and Stokes I file with constant wavelength bins
(HASH)_pp_e2dsff_tcorr_s1d_v_pol.fits | apero_polar | every polarimetric group | 1D polarimetry, Null 1, Null 2, and Stokes I file with constant velocity bins

Note. A full list of data products for each recipe can be found in the documentation. Continued from Table G1.


Appendix H: Preliminary Usage with NIRPS

One of our main goals with apero was to keep the code generic enough that adding new instruments is possible. To document this, we detail here the changes required to add the NIRPS_HE and NIRPS_HA modes to apero. This work is preliminary, as commissioning of NIRPS is currently underway and we expect additional changes will be required once larger data sets over longer periods of time exist (including long science sequences); however, apero with NIRPS has already demonstrated precision equivalent to SPIRou. The specific details of all changes are beyond the scope of this paper and will be part of a future publication. Currently, after extraction, there are no code differences between the NIRPS and SPIRou reductions.

H.1.  NIRPS: A Comparison with SPIRou

NIRPS is very similar to SPIRou but differs in several ways. We list the key differences that apero must handle:

  1. There are two modes: high efficiency (NIRPS_HE) and high accuracy (NIRPS_HA).
  2. There is only one science fiber and one calibration fiber.
  3. The wavelength domain is 980–1800 nm (negligible thermal emission).
  4. There are missing order(s) around 1400 nm.
  5. The resolution is higher: ∼100,000 and ∼80,000 for the NIRPS_HA and NIRPS_HE modes, respectively.
  6. There is no slicer, and NIRPS_HE and NIRPS_HA have differing fiber geometries.
  7. There are 73 echelle orders extracted by apero.
  8. There are no dark unilluminated amplifiers.

H.2. APERO Changes for NIRPS

We use Figure 1 as our reference for changes within the apero sub-packages. It is worth noting that adapting apero for use with NIRPS did change some code used for SPIRou, as having a second instrument with some unique characteristics revealed code that could be improved for both instruments; these changes are not covered in this section. apero is designed to host all supported instruments in the same code base, so no separate installation or download of additional code is required to use apero with NIRPS.

No code was changed in the following apero sub-packages: apero.core (with the exception of apero.core.instruments), apero.documentation, apero.io, apero.lang, apero.plotting, apero.setup, and apero.tools. Minimal code changes (three or fewer) were made in the following apero sub-package: apero.base, with the addition of NIRPS_HE and NIRPS_HA to the supported instruments list. Some code changes were made in the following apero sub-packages:

  1. apero.data: data files were copied from SPIRou and updated for NIRPS_HE and NIRPS_HA. A few FITS files had to be updated using external scripts; we plan to ingest these scripts into apero as documented tools that can be used directly on other instruments.
  2. apero.recipes: recipe scripts were copied from SPIRou (mostly unchanged other than the filename). One recipe that did not previously exist was added (a reference flat, run before the preprocessing recipe) and inserted into the preprocessing sequence for NIRPS_HE and NIRPS_HA.
  3. apero.science: some science algorithms were added (and called from the recipes), either in addition to or as replacements for the SPIRou science algorithms. For example, as NIRPS does not have an unilluminated region similar to SPIRou's, the detector corrections in preprocessing are handled slightly differently, building a background image from the between-order regions (see the sketch after this list).
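The sketch below illustrates the between-order background estimation mentioned in item 3. The function name, tile size, and the median-based scheme are assumptions made for illustration; apero's actual preprocessing may use a different masking, filtering, or fitting approach.

import numpy as np

def between_order_background(image: np.ndarray,
                             order_mask: np.ndarray,
                             box: int = 64) -> np.ndarray:
    """Estimate a smooth 2D background from inter-order pixels.

    image      : 2D detector frame
    order_mask : boolean array, True where a pixel is illuminated by an order
    box        : tile size in pixels for the coarse background grid

    Illustrative sketch only, not apero's actual algorithm.
    """
    ny, nx = image.shape
    nyc, nxc = -(-ny // box), -(-nx // box)   # ceiling division
    coarse = np.zeros((nyc, nxc))
    for j in range(nyc):
        for i in range(nxc):
            sl = np.s_[j * box:(j + 1) * box, i * box:(i + 1) * box]
            free = ~order_mask[sl]            # pixels between the orders
            if free.any():
                coarse[j, i] = np.median(image[sl][free])
            # tiles fully covered by orders are left at zero in this sketch
    # expand the coarse grid back to the full frame (nearest-neighbour)
    full = np.repeat(np.repeat(coarse, box, axis=0), box, axis=1)
    return full[:ny, :nx]

The background map returned here would then be subtracted from the frame; a real implementation would likely also smooth or interpolate across tiles that contain no inter-order pixels.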

Substantial code was added to the following apero sub-package: apero.core.instruments, where the configuration, constants, keywords, file definitions, pseudo constants, and recipe definitions were copied from SPIRou and updated for NIRPS_HE and NIRPS_HA. We also removed the polarimetry recipes, as there is no polarimetry mode for NIRPS.

Note that pseudo constants are constants and variables that cannot be described by a single number or string of characters, such as the choice between science and reference fibers for a specific step of apero, or a specific fix to a FITS header key for a specific instrument. These pseudo constants are Python functions designed to keep instrument-specific options separate from the rest of the code.
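As a schematic illustration of this design, consider the following sketch; the class and method names are invented for this example and are not apero's actual API:

# Schematic sketch: instrument-specific behaviour isolated in one place.
# Class and method names are invented for illustration, not apero's API.

class PseudoConstants:
    """Base class: shared defaults for all instruments."""

    def reference_fiber(self, step: str) -> str:
        return "C"  # calibration fiber by default

    def fix_header(self, header: dict) -> dict:
        return header  # no fix needed by default


class SpirouPseudoConstants(PseudoConstants):
    def reference_fiber(self, step: str) -> str:
        # hypothetical example: a step that uses the science fibers instead
        return "AB" if step == "wave_night" else "C"


class NirpsHaPseudoConstants(PseudoConstants):
    def fix_header(self, header: dict) -> dict:
        # hypothetical example: patch a header key written differently by NIRPS
        header.setdefault("DPRTYPE", header.get("HIERARCH ESO DPR TYPE", ""))
        return header

The rest of the code base can then call, e.g., pconst.reference_fiber("wave_night") without branching on the instrument name, which is what allows the recipes and science modules to remain shared between SPIRou and NIRPS.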

Appendix I: Glossary

In Table I1 we present a list of terms used throughout this paper.

Table I1. Glossary of Terms

Term | Description
apero | A PipelinE to Reduce Observations.
apero profile | A specific setup of apero (i.e., with a certain set of constants, reduction directories, database setups, etc.).
amplifier | Independent electronic readout circuits operating in parallel, used to minimize the total readout time.
CCF | Cross-correlation function.
CADC | Canadian Astronomy Data Centre.
CFHT | The Canada–France–Hawaii Telescope, situated on Maunakea, Hawaii, US.
dprtype | Data product type; describes what is in the science and reference fibers and distinguishes different calibrations and observations from each other.
drsmode | Data product mode; for SPIRou this is spectroscopy data or polarimetry data.
e2ds | An extracted order-by-order spectrum.
e2dsff | An extracted order-by-order spectrum that has been flat-fielded.
fast-axis (long-axis) | Axis parallel to the amplifier direction on the detector; for SPIRou this is 4096 pixels per amplifier.
FITS file | A Flexible Image Transport System file holding images, tables, and metadata in the form of a FITS header.
FITS header | A Flexible Image Transport System metadata holder, consisting of 8-character "keys," each with a value and a comment.
FP | A Fabry–Perot etalon used for calibration.
hash | A short unique hexadecimal string of characters generated from a long string of characters.
HC | A hollow cathode lamp used for calibration.
hot star | Bright, fast rotators of B or A spectral type whose spectra show no features narrower than ∼100 km s−1.
LBL | The line-by-line method for measuring radial velocity, presented in Artigau et al. (2022).
NIRPS | Near Infra Red Planet Searcher, a spectrograph on the 3.6 m telescope at La Silla, Chile.
odometer | A unique sequential number used by CFHT to identify individual observations.
order | A domain on the detector at specific wavelengths, generated by light passing through a diffraction grating.
PI | Principal investigator.
pipeline | Software that takes data from an origin to a destination.
post-processed data | Data delivered to PIs after apero has completed.
pre-processed data | Data that have been corrected for detector issues; the first step in handling raw data.
pRV | Precision radial velocity; measurements at the order of m s−1 accuracy.
raw data | For the purposes of apero, the ramp images provided by CFHT.
recipe | A top-level script, similar to a cookbook recipe, in which simple steps follow one another; most calculations and algorithms are hidden from these recipes.
reduced data | Data created from the raw data using a pipeline.
rhomb | An ensemble of prisms used to rotate polarization states.
run.ini file | A configuration file used for a specific reduction sequence, i.e., a science sequence or a calibration sequence.
reduction sequence | A set of recipes run in a certain order, with specific filters on which files to reduce, for a specific purpose.
reference calibration | A calibration done once (not on a nightly basis).
science observation | Any observation taken by the telescope specifically for science purposes, i.e., SLS data and PI data.
slicer | A device used to split an image into narrower images to increase spectral resolution.
slow-axis (short-axis) | Axis perpendicular to the amplifier direction on the detector; for SPIRou this is 128 pixels per amplifier.
SPIRou | Spectro Polarimètre Infra ROUge, a spectropolarimeter for CFHT.
SLS | The SPIRou Legacy Survey.
Wollaston prism | A device that splits the incoming beam (either from the telescope or the calibration unit) into two orthogonally polarized beams.
1/f noise | A noise component arising from the detector readout electronics, with a low-frequency component that is common to all amplifiers and sampled by the reference pixels.

