Characterization and Absolute Calibration of the Far Infrared Field Integral Line Spectrometer for SOFIA

We present the characterization and definitive flux calibration of the Far-Infrared Field Integral Line Spectrometer (FIFI-LS) on board SOFIA. The work is based on laboratory measurements made with an internal calibrator and on observations of planets, moons, and asteroids used as absolute flux calibrators during the entire lifetime of the instrument. We describe the techniques used to derive flat-fields, water vapor column estimates, detector linearity, spectral and spatial resolutions, and the absolute flux calibration. Two sets of responses are presented, before and after the entrance filter window was changed in 2018 to improve the sensitivity at 52 µm, a wavelength range previously not covered by PACS on Herschel. The relative spectral response of each detector and the illumination pattern of the FIFI-LS arrays are derived using the internal calibrator before each observational series. The linearity of the array response is estimated from observations of bright sources. We find that the deviation from linearity of the FIFI-LS arrays affects the flux estimates by less than 1%. The flux calibration accuracy is estimated to be 15% or better across the entire wavelength range of the instrument. The limited availability of sky calibrators during each observational series is the major factor limiting the flux calibration accuracy.

1. INTRODUCTION

FIFI-LS (Colditz et al. 2018; Fischer et al. 2018) was the far-infrared field integral line spectrometer for SOFIA (Krabbe 2000) and performed field spectroscopy in the wavelength range 50-200 µm. The instrument was originally developed at the Max Planck Institute in Garching (Germany, P.I.: A. Poglitsch) in parallel with the PACS instrument (Poglitsch et al. 2010) and offered as a facility instrument to SOFIA (Klein et al. 2010). In 2012, the development was transferred to the University of Stuttgart under a new P.I. (A. Krabbe). FIFI-LS was commissioned in 2014 (Klein et al. 2014) and used for science observations thereafter. In its last years of operation, the instrument was improved by activating the internal calibrator for laboratory measurements (2017) and by changing the entrance filter window to improve its sensitivity at 52 µm (2018), a wavelength regime that was not accessible to the PACS instrument on Herschel. The last observations with FIFI-LS were made in September 2022, the month in which the SOFIA observatory was decommissioned by NASA.

Figure 1. Spaxel positions for the blue and red arrays. The plot axes denote the offset in arcseconds from the center of the red array (spaxel 13). The blue and red fields of view are shifted by approximately 6 arcsec in the horizontal direction of the array. The numbers correspond to the way the image was sliced: the rows (1-5, 6-10, and so on) were rearranged into pseudo-slits, which were dispersed by gratings onto the blue and red detectors.
In this paper, we present the characterization of the spatial and spectral resolution of FIFI-LS and study the response of the detectors. Several measurements were made in the laboratory using the internal calibrator. However, because the emission of the calibrator is poorly known, the absolute calibration was carried out via observations of celestial calibrators such as asteroids, moons, and planets, as was the case for PACS. The techniques used to analyze these data, especially to correct them for the effects of the atmosphere, are discussed here. Several objects observed with PACS on Herschel were re-observed with FIFI-LS, giving us the possibility of comparing the flux calibrations of the two instruments. We present the definitive absolute flux calibration for FIFI-LS, computed for the two epochs of the instrument as defined by the change of the entrance filter window in 2018, as well as updated estimates of the instrumental sensitivity and of the spectral and spatial resolution with respect to those presented in previous publications (Colditz et al. 2018; Fischer et al. 2018).
To guide FIFI-LS data users and allow comparisons between recent and previously published results, the differences from the previous calibration release are discussed in detail in the section on the absolute flux calibration.
2. DATA

2.1. Detectors

FIFI-LS was an integral field spectrograph able to simultaneously obtain spectra in two different spectral channels over a square field of view. To account for the increasing width of the point spread function (PSF) with wavelength, the blue and red arrays had different pixel sizes and consequently different fields of view. In the following, we refer to pixels on the sky as spaxels, short for spatial pixels. The spaxels are approximately square, with sizes on the sky of 12.2 × 12.5 arcsec² and 6.14 × 6.25 arcsec² for the red and blue arrays, respectively. As shown in Figure 1, the fields of view of the two arrays were also slightly shifted with respect to each other, by approximately 6 arcsec in the horizontal direction. Using the spaxel numbering shown in Fig. 1, when the target is centered on the red array (spaxel 13), it falls on spaxel 12 of the blue array. The spaxels were not arranged on a perfect grid because of the complexity of aligning the mirrors in each of the integral field units. Moreover, the spaxels of the last column (#5, 10, 15, 20, and 25), which are shown in Fig. 1 as partially overlapping the adjacent spaxels, are only partially illuminated and require the largest flat correction (see Sec. 3.3).
As elegantly described in Looney et al. (2003), the light was first split into two bands by a dichroic and then cut into 5 slices, which were rearranged along a one-dimensional pseudo-slit by a system of mirrors. This pseudo-slit comprised the 25 spaxels and served as the entrance of the grating spectrometer. Two different dichroics were used, splitting the light at 105 µm and 130 µm, respectively. Since the red and blue arrays each had an independent grating, the dichroics allowed one to observe two different lines of the same object simultaneously. Depending on the choice of dichroic, a line in the overlapping region 105-130 µm could be observed with either the red or the blue array. Separate blue and red gratings then dispersed the light from each of the pseudo-slit spaxels across 18 spectral pixels. Therefore, the detector arrays consisted of 18 spectral by 25 spatial pixels. However, the two ends of each spectral pixel row were (by design) not illuminated by the sky. In particular, the first one was an open pixel, which can be considered analogous to a bias pixel in an optical CCD, while the last one was a resistor pixel, which carries an additional signal. The open pixel can be used in the data reduction to remove correlated noise, which otherwise increases the noise in the ramps (see Sec. 3.8.1). Thus, a total of 16 × 25 pixels were illuminated by the sky.

Ramps
Several observing modes were used by FIFI-LS. The most commonly used was the so-called symmetric chop-nod mode. To remove the background emission, the secondary mirror was tilted to point alternately at the source and at a sky position (chopping), and the fluxes of the two beam positions were subtracted. However, since the two positions saw different parts of the primary mirror and had slightly different optical paths, the telescope was periodically moved (nodded) to swap the two beams, placing the target in the other beam position (the former sky beam); this residual effect was then removed by averaging the two measurements. This is the technique used to obtain all of the calibration data discussed in this paper.
The charges on the detectors were sampled with a series of non-destructive readouts followed by a reset. Since the charge on the detector increases at each readout, such a sequence is usually referred to as a ramp, and the incident flux can be derived from the slope of the ramp (charge versus time), an approach called up-the-ramp fitting. In particular, a ramp consisted of 32 readouts taken in 1/8 s. The minimal integration consisted of two ramps in one chop position and another two in the second chop position, for a total of 0.5 s, i.e., a chop frequency of 2 Hz. During the last readout the chopper moved to the next chop position, so this readout is usually discarded. The first readout after the detector reset is highly nonlinear and is also discarded. The linearity and saturation of the ramps of the different pixels are discussed in Section 3.2. Each set of four ramps was repeated several times, depending on the requested integration time. Then, the chop positions were inverted for the next nod observation. Part of one observation of Mars is shown in Fig. 2. The left panel shows a group of four ramps, which constitutes a minimal observation composed of 2 on- and 2 off-ramps. The right panel shows the slopes between consecutive readouts for all the ramps corresponding to one grating position. This clearly shows that the first readout has to be discarded, as well as the last one, which is affected by the chopper movement. The difference in flux (measured as a slope in V/s) between the two groups of ramps is due to the flux of Mars on the detector during the on-ramps, while the off-ramps only register background flux. Even in the case of a strong source such as Mars, the signal is dominated by the background emission from the sky and telescope. In this specific case, Mars contributes less than 3% of the total signal.
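A minimal sketch of up-the-ramp fitting and of the chop difference described above, assuming evenly spaced readouts (32 per 1/8 s ramp) held in NumPy arrays. The helper names (`ramp_slope`, `chop_difference`, `fake_ramp`) and the synthetic rates are hypothetical; this is an illustration, not the FIFI-LS pipeline code.

```python
import numpy as np

# Timing taken from the text: 32 readouts per ramp, one ramp every 1/8 s.
N_READOUTS = 32
DT_S = (1.0 / 8.0) / N_READOUTS  # time between consecutive readouts


def ramp_slope(ramp):
    """Least-squares slope of one ramp (signal units per second).

    The first readout (non-linear after the reset) and the last readout
    (taken while the chopper moves) are excluded from the fit.
    """
    ramp = np.asarray(ramp, dtype=float)
    t = np.arange(ramp.size) * DT_S
    slope, _intercept = np.polyfit(t[1:-1], ramp[1:-1], 1)
    return slope


def chop_difference(on_ramps, off_ramps):
    """Mean on-source slope minus mean off-source (sky) slope."""
    on = np.mean([ramp_slope(r) for r in on_ramps])
    off = np.mean([ramp_slope(r) for r in off_ramps])
    return on - off


# Hypothetical example: background rate 100, source adds 3 (arbitrary units/s).
t = np.arange(N_READOUTS) * DT_S


def fake_ramp(rate):
    r = rate * t
    r[0] += 5.0   # corrupted first readout after the reset
    r[-1] -= 5.0  # corrupted last readout during chopper motion
    return r


signal = chop_difference([fake_ramp(103.0)] * 2, [fake_ramp(100.0)] * 2)
print(f"{signal:.6f}")  # 3.000000
```

Because the corrupted end readouts are excluded from the fit, the recovered source signal is unaffected by them, and the small source contribution sits on top of a much larger background slope, as in the Mars example.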
The ramps are stored in FITS files as signed integers. To convert the ramp values into volts, the following formula should be used: where r is the value of the ramp stored in the FITS files of the observation. The FIFI-LS pipeline (Vacca et al. 2020) does not implement this conversion and computes the slope directly from the data in instrumental units. We call ADU (analog-digital units) the units of the slope as computed by the pipeline. Then, to obtain a flux density, the slope is divided by the width of the frequency channel corresponding to a single pixel, giving values in ADU/Hz. For this reason, we express the response in units of ADU Hz⁻¹ Jy⁻¹.
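The unit chain above (slope, then slope per frequency channel, then flux density through the response) can be sketched as follows. The sketch assumes the pixel width is known in wavelength and converts it with |Δν| = cΔλ/λ²; the function names and numbers are illustrative only.

```python
C_UM_PER_S = 2.99792458e14  # speed of light in µm/s


def channel_width_hz(wavelength_um, dlambda_um):
    """Frequency width of a spectral pixel: |dnu| = c * dlambda / lambda**2."""
    return C_UM_PER_S * dlambda_um / wavelength_um**2


def flux_density_jy(slope_adu, wavelength_um, dlambda_um, response):
    """Convert a slope to Jy, given a response in ADU Hz^-1 Jy^-1 (assumed inputs)."""
    slope_per_hz = slope_adu / channel_width_hz(wavelength_um, dlambda_um)  # ADU/Hz
    return slope_per_hz / response


# A hypothetical 0.01 µm wide pixel at 100 µm spans about 3.0e8 Hz.
print(f"{channel_width_hz(100.0, 0.01):.3e} Hz")  # 2.998e+08 Hz
```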

Change of filter window
In 2018 the entrance filter window was changed in order to improve the sensitivity of FIFI-LS at 52 µm and observe the [O III] 51.81 µm line, which was not observable with PACS on Herschel. The change was successful and enabled important observations to estimate the metallicity of galaxies from far-infrared lines (see, e.g., Chen et al. 2023; Chartab et al. 2022). Since this change affected the responses of the two arrays and orders, we present two sets of responses for each array/order/dichroic combination. The last flight with the old entrance window is Flight 424 in flight series OC5I, while the second set of responses is valid starting from Flight 524 in OC6M.

Pre-flight calibration
SOFIA was able to observe with only one instrument at a time mounted on the telescope.For this reason, FIFI-LS was mounted and used for a series of flights which we define as an observational flight series.Before being mounted on the telescope, the FIFI-LS instrument was inspected, vacuum pumped, and cooled down.For each cooldown of the instrument, several calibration measurements were needed since tiny mechanical changes resulted in slightly modified optical paths.In particular, three series of measurements were performed: (i) wavelength calibration, (ii) alignment of the beam rotator (K-mirror) (Colditz et al. 2014), and (iii) flats.
The wavelength calibration was performed by observing several water vapor lines with gas cells at different pressures. The procedure used is detailed in Colditz et al. (2018). After the change of the filter window, observations of two additional lines at 47.9732 and 51.0711 µm were added to the procedure to improve the calibration at the blue end of the 2nd order. These two lines are sufficiently narrow to appear unresolved even when the gas cell is filled with water vapor. For this reason, these lines have also been used in the study of the spectral resolution of FIFI-LS (see Sec. 3.4).
The first part of the calibration of the beam rotator (called K-mirror because it is composed of three mirrors arranged in a way that resembles the letter K) was done following the technique detailed in Colditz et al. (2014). The rotation vector, i.e., the vector connecting the center of rotation to the center of the array, was measured using the telescope simulator, a system mounted on the instrument in the laboratory that was able to simulate a point source. The position of the point source was measured at different image rotation angles to infer the parameters of the rotation vector.
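One way to infer the rotation vector from such measurements, sketched here as an assumption rather than as the procedure of Colditz et al. (2014), uses the fact that the image of a fixed point source traces a circle around the center of rotation as the image is rotated; an algebraic least-squares circle fit then yields that center. The positions, angles, and array-center coordinates below are hypothetical.

```python
import numpy as np


def fit_circle(x, y):
    """Algebraic (Kasa) least-squares circle fit.

    Solves x**2 + y**2 + D*x + E*y + F = 0 for D, E, F and returns
    the circle center (cx, cy) and radius r.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([x, y, np.ones_like(x)])
    b = -(x**2 + y**2)
    (D, E, F), *_ = np.linalg.lstsq(A, b, rcond=None)
    cx, cy = -D / 2.0, -E / 2.0
    return cx, cy, np.sqrt(cx**2 + cy**2 - F)


# Hypothetical point-source positions (array coordinates) at six image
# rotation angles around an unknown center of rotation.
angles = np.deg2rad(np.arange(0, 360, 60))
true_center = np.array([0.4, -0.3])
x = true_center[0] + 1.2 * np.cos(angles)
y = true_center[1] + 1.2 * np.sin(angles)

cx, cy, radius = fit_circle(x, y)
array_center = np.array([0.0, 0.0])  # assumed array-center coordinates
rotation_vector = array_center - np.array([cx, cy])  # rotation center -> array center
print(np.round(rotation_vector, 6), round(float(radius), 6))
```

The algebraic fit is linear and needs no starting guess; with noisy positions a geometric (orthogonal-distance) fit would be a reasonable refinement.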
Finally, several spectral scans of the internal calibrator at 150 K were taken, covering all the bands (red, blue 1st order, and blue 2nd order) with the two dichroics, in order to compute the relative response of the detectors. The computation of the flats is described at length in Section 3.3.

In-flight calibration
During each observational series, several observations were dedicated to calibrating the instrument.
At the beginning of each series, the rotational parameters of the K-mirror obtained in the laboratory were verified by repeating the measurements of a point source while rotating the beam. An accurate knowledge of the rotational parameters was essential, since the K-mirror was used not only to orient the field at a certain angle but also to counteract the rotation of the sky during the so-called rewinds of the telescope. Although the telescope was alt-azimuth mounted, it tracked inertially, following the sky like an equatorially mounted telescope by rotating around its line-of-sight axis. However, the range over which it could rotate about the cross-elevation (i.e., "azimuth") and line-of-sight (i.e., "position angle") axes was limited to ±2.8°. When the telescope reached its limit in line-of-sight rotation, it was rotated back. This rewind was counteracted by a rotation of the K-mirror to keep the position angle of the array on the sky fixed. Since the K-mirror was not perfectly aligned, such a rotation introduced a small boresight change, which was compensated by the telescope.
At the beginning of each flight leg (legs usually corresponded to different targets), the first few minutes were devoted to observing a few telluric lines to estimate the zenith precipitable water vapor. The procedure employed for these measurements is discussed in Section 3.6.
In each series we observed several calibrators to estimate the absolute flux calibration.The best data were obtained by observing Mars, the brightest source small enough to fit within the FIFI-LS detector footprint.However, Mars was not available during all flight series and secondary calibrators were used (Neptune, the Galilean moons, and asteroids).These calibrators were observed in all the array/order/dichroic combinations covering the whole spectral range.The analysis of the flux calibration is presented in Section 3.7.
Finally, we observed point-source calibrators at a few key wavelengths to directly measure the spatial point spread function of FIFI-LS. The results of this analysis are presented in Section 3.5.

Bad pixels
A few pixels of the two detectors with very low responsivity are masked by the FIFI-LS pipeline. The lists of these pixels were compiled before each observational series by inspecting the laboratory calibration data. Figure 3 shows the percentage of bad pixels as a function of array and date. The percentage of bad pixels is around 5% for the blue array and slightly higher (8%) for the red array. The number of bad pixels slowly increased with time, although some pixels that did not respond in one series could work again after a cooling cycle in a subsequent series.
In 2021 all of the pixels corresponding to spaxels #3 and #21 of the red array became unresponsive. Later, in 2022, spaxel #10 became unresponsive. The cold readout electronics for these spaxels were identified as the most likely cause of the issue. A new electronic board was prepared and the repair was scheduled for 2022. However, because of the abrupt end of the SOFIA mission, it was decided to observe as much as possible with FIFI-LS before the decommissioning date, and this prevented the upgrade of the electronics.

Linearity
In an ideal detector, the relationship between the incident flux and the measured voltage is linear along a ramp. A real detector, however, behaves linearly only over a limited flux range. Beyond a certain flux it becomes non-linear, until it saturates when it reaches the maximum amount of charge it can collect; at that point the measured voltage no longer relates to the incident flux.
Figure 5. Flux measured for each pixel of the blue and red arrays (using the 105 µm dichroic), after scaling each pixel flux to the median value of its spaxel. These data were taken in March 2018, before an observational series. For visibility, the fluxes from the different spaxels are offset by 0.5 × 10⁻¹⁰ and 10⁻¹⁰ ADU/Hz for the blue and red arrays, respectively. While the different pixels in one spaxel detect a very similar flux in the blue array, the red array is less uniform, especially in the reddest part of the spectrum.

To estimate the linearity of the FIFI-LS detectors, we considered a quadratic term in the ramps as a first approximation of the observed ramps. Assuming a single non-linearity term, we can in principle linearize the ramps once the non-linearity coefficient is known. Let us then assume, as done in the Spitzer IRS handbook, that the observed signal can be written as the sum of the ideal linear signal and a quadratic term that takes into account the non-linearity of the detector: where α is the non-linearity coefficient. Once α is estimated for each pixel, the equation can be easily inverted to linearize the ramps: The top panel of Fig. 4 shows the technique used to estimate the optimal value of the quadratic term α from the ramps. The example shows that, as charges accumulate on the detector, the slope between readouts changes, revealing a non-linear behavior. Computing the slope of the ramps yields a value (marked with a blue line) which is lower than the actual flux. To evaluate the optimal α needed for the linearity correction, we computed the χ² of the residuals of the corrected slopes from the median value of the consecutive slopes. The optimal value, with the minimum χ² shown in the inset, gives the best correction. The slopes of the corrected ramps are shown with orange dots and are now in much better agreement with the slope computed from the entire ramp (orange line). In the computation, the first and last slopes are discarded since they are affected by the reset and the chop change.
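The α search can be sketched as a grid search. The sketch assumes the quadratic model v_obs = v + αv² (our reading of the verbal description above; the exact form used for FIFI-LS may differ), which gives an observed slope lower than the true one for α < 0. The inversion takes the root that tends to v_obs as α → 0, and the cost is a uniform-weight χ² of the readout-to-readout differences. The ramp is synthetic and the function names are hypothetical.

```python
import numpy as np


def linearize(v_obs, alpha):
    """Invert the assumed model v_obs = v + alpha * v**2 for the ideal signal v."""
    v_obs = np.asarray(v_obs, dtype=float)
    if abs(alpha) < 1e-12:
        return v_obs.copy()
    # Root that tends to v_obs as alpha -> 0.
    return (-1.0 + np.sqrt(1.0 + 4.0 * alpha * v_obs)) / (2.0 * alpha)


def best_alpha(ramp, alphas):
    """Grid search for the alpha that makes consecutive slopes most uniform.

    The cost is the sum of squared residuals of the readout-to-readout
    differences from their median (a chi-square with uniform weights).
    The first and last differences (reset and chop change) are dropped.
    """
    costs = []
    for a in alphas:
        d = np.diff(linearize(ramp, a))[1:-1]
        costs.append(np.sum((d - np.median(d)) ** 2))
    return alphas[int(np.argmin(costs))]


# Hypothetical ramp: 32 readouts in 1/8 s, ideal slope 16 V/s, alpha = -0.03.
t = np.arange(32) * (1.0 / 8.0) / 32
v_true = 16.0 * t
ramp = v_true - 0.03 * v_true**2

alpha = best_alpha(ramp, np.linspace(-0.1, 0.05, 301))
raw_slope = np.polyfit(t[1:-1], ramp[1:-1], 1)[0]
lin_slope = np.polyfit(t[1:-1], linearize(ramp, alpha)[1:-1], 1)[0]
print(f"alpha={alpha:.4f} raw={raw_slope:.3f} V/s linearized={lin_slope:.3f} V/s")
```

In this toy case the uncorrected fit underestimates the slope, while the linearized ramp recovers the input 16 V/s.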
The middle panel of Fig. 4 shows the values of the linearity correction coefficient α and the saturation level of the ramp for each pixel of the red array. Similar results are obtained for the blue array. Ramps saturate at voltages greater than 2.4 V, while α is typically around −0.03, except for a few pixels with higher saturation values.
Finally, the bottom panel of Fig. 4 shows the effect of linearizing the ramps on the estimate of their slopes for the red array. The comparison includes all the observations of Mars taken with the red array during Flight 312. There is a systematic effect of less than 3% of the slope, while the dispersion is less than 1% for slopes higher than 0.1 mV/s. Since the systematic effect is absorbed by the flux calibration (whose response is also systematically lower), the effect of linearizing the ramps is in general around 1%. For this reason, the linearization of the ramps has not been introduced in the FIFI-LS pipeline.

Flats
Spectral scans of the internal calibrator were used to estimate the flats for the different arrays, orders, and dichroics. The internal calibrator was heated to 150 K to produce a signal close to the background radiation observed in flight. These measurements were repeated before each flight series to take into account the effect of small mechanical changes in the instrument produced by different cooldowns.
The biggest effect of these changes was a variation in the illumination of the array: slight differences in the optical path produced changes in the illumination pattern. To estimate such variations, we considered the median flux detected by all the pixels of the same spaxel and computed the ratio of this value to the median flux of all the spaxels. Figure 5 shows the curves measured in March 2018 using the dichroic splitting the light at 105 µm. For each spaxel, the curves of the 16 pixels have been normalized to a median value. In the blue array (the two orders on the left side of the figure), the different pixels have a very similar response. The situation is different for the red array, where the curves differ much more, especially at the longest wavelengths. This behavior can be explained by the fact that the detector elements in the red array are compressed in their mounting structures in order to shift the range of good spectral sensitivity to 120-210 µm (Rosenthal et al. 2000). Since the applied pressure is not perfectly uniform, the shapes of the response curves vary more among pixels than in the blue array, and this is more pronounced at the far-IR end, where the detector sensitivity is lowest.
The spatial flats were derived from the ratio of the median spaxel curves in Figure 5 to the median global curve for each observational series. The spatial flats for the two arrays are plotted in the left panel of Fig. 6. The spaxels of the lateral column (5, 10, 15, 20, and 25) usually have the lowest fluxes and the largest dispersion between observational series, because they were only partially illuminated. For this reason, we did not consider them when obtaining the median signal from the various pixels and when obtaining the response curve in Section 3.7.3.
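A minimal sketch of the spatial flat described above, assuming the calibrator scan is held in a NumPy array of shape (25 spaxels, 16 pixels, n_scan) and that each spaxel's flat is summarized by the median of its ratio curve (this summary step is our assumption). The partially illuminated column is left out of the global median curve.

```python
import numpy as np

PARTIAL_COLUMN = np.array([5, 10, 15, 20, 25]) - 1  # 0-based spaxel indices


def spatial_flat(cube):
    """Relative illumination of each spaxel from an internal-calibrator scan.

    cube: array of shape (25 spaxels, 16 spectral pixels, n_scan).
    """
    cube = np.asarray(cube, dtype=float)
    spaxel_curves = np.median(cube, axis=1)                  # (25, n_scan)
    good = np.setdiff1d(np.arange(cube.shape[0]), PARTIAL_COLUMN)
    global_curve = np.median(spaxel_curves[good], axis=0)    # (n_scan,)
    # Ratio curve of each spaxel, summarized by its median (our assumption).
    return np.median(spaxel_curves / global_curve, axis=1)   # (25,)


# Hypothetical scan: one brighter spaxel, a dim partially illuminated column,
# and one deviant pixel that the per-spaxel median ignores.
base = np.linspace(1.0, 2.0, 50)
gains = np.ones(25)
gains[7] = 1.2
gains[PARTIAL_COLUMN] = 0.5
cube = gains[:, None, None] * np.ones((25, 16, 1)) * base[None, None, :]
cube[7, 3, :] *= 10.0
flat = spatial_flat(cube)
```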
Finally, to obtain the spectral flat of each pixel, we normalized the data from each series with the aforementioned spatial flat field and coadded the ratios of the pixel curves to the median curve of the entire array. As an example, the curves from different series for pixel 6 of the red array are shown in blue in Fig. 6. The spectral flat, computed as the mean behavior using a Chebyshev polynomial fit, is shown with a black line.
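The spectral flat of one pixel can be sketched with NumPy's `Chebyshev.fit`. The array shapes, the use of a plain mean for the coaddition, and the polynomial degree are assumptions for illustration, and the input curves below are synthetic.

```python
import numpy as np
from numpy.polynomial import Chebyshev


def spectral_flat(wave, pixel_curves, spatial_flats, array_medians, deg=3):
    """Chebyshev fit to the coadded ratio of one pixel to the array median.

    pixel_curves, array_medians: shape (n_series, n_wave), one row per series.
    spatial_flats: shape (n_series,), the spatial flat of this pixel's spaxel.
    """
    pixel_curves = np.asarray(pixel_curves, dtype=float)
    spatial_flats = np.asarray(spatial_flats, dtype=float)
    ratios = pixel_curves / spatial_flats[:, None] / np.asarray(array_medians, dtype=float)
    return Chebyshev.fit(wave, ratios.mean(axis=0), deg)


# Hypothetical pixel seen in two series with different array medians and flats.
wave = np.linspace(120.0, 200.0, 60)
truth = 1.0 + 0.002 * (wave - 160.0) - 1e-5 * (wave - 160.0) ** 2
array_medians = np.vstack([np.linspace(1.0, 2.0, 60), np.linspace(1.5, 0.8, 60)])
spatial_flats = np.array([0.9, 1.1])
pixel_curves = truth * array_medians * spatial_flats[:, None]
flat = spectral_flat(wave, pixel_curves, spatial_flats, array_medians, deg=3)
```

The returned object is callable, so `flat(wave)` evaluates the smooth spectral flat on any wavelength grid.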

Line Profile and Spectral Resolution
Several lines were measured in the laboratory before each observational run to calibrate the instrument in wavelength. To obtain a high signal-to-noise ratio, these measurements were made with pure H₂O at a pressure of 5 mbar. However, some of these lines are broader than the spectral resolution of FIFI-LS. To better study their profile and the spectral resolution at different wavelengths, a few lines were also observed with air cells, filled with air at a pressure of 10 mbar. The list of the lines used is reported in Colditz et al. (2018), with the exception of two new lines (47.9732 and 51.0711 µm), which were added in 2018 when the filter window was changed to extend the range of FIFI-LS down to 47 µm.
To study the line profiles, we fitted the lines with several functions: Gaussian, Voigt, pseudo-Voigt, and asymmetric functions such as a skewed Gaussian, a skewed Voigt, and an asymmetric pseudo-Voigt. For all these functions we used the implementations in the Python package lmfit, with the exception of the asymmetric pseudo-Voigt, for which we followed the article where it was proposed (Stancik & Brauns 2008). The pseudo-Voigt profile is a very good approximation of the Voigt profile, obtained as a sum of fractional contributions of the Gaussian and Lorentzian profiles: with f between 0 and 1. By modifying the width as a function of the distance from the center of the line: with a the asymmetry parameter and λ0 the center of the line, one obtains the asymmetric pseudo-Voigt profile.
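As a sketch of these profiles, the code below writes the pseudo-Voigt as f·L + (1 − f)·G, with an area-normalized Lorentzian L and Gaussian G sharing one FWHM (treating f as the Lorentzian fraction is an assumption), and applies the Stancik & Brauns width modulation w(λ) = 2w₀ / (1 + exp[a(λ − λ₀)]). Such a function could be wrapped with `lmfit.Model` for fitting; the parameter values below are hypothetical.

```python
import numpy as np

LN2 = np.log(2.0)


def pseudo_voigt(x, amp, x0, fwhm, f):
    """Area-normalized pseudo-Voigt: f * Lorentzian + (1 - f) * Gaussian.

    Here f is taken as the Lorentzian fraction (an assumption); both
    components share the same FWHM, which may be an array.
    """
    dx = np.asarray(x, dtype=float) - x0
    gauss = np.sqrt(4.0 * LN2 / np.pi) / fwhm * np.exp(-4.0 * LN2 * dx**2 / fwhm**2)
    lorentz = 2.0 / (np.pi * fwhm) / (1.0 + 4.0 * dx**2 / fwhm**2)
    return amp * (f * lorentz + (1.0 - f) * gauss)


def asym_pseudo_voigt(x, amp, x0, fwhm0, f, a):
    """Asymmetric pseudo-Voigt with a sigmoidally varying width.

    fwhm(x) = 2 * fwhm0 / (1 + exp(a * (x - x0))); a = 0 gives the symmetric
    profile, a > 0 makes the long-wavelength wing narrower.
    """
    x = np.asarray(x, dtype=float)
    fwhm = 2.0 * fwhm0 / (1.0 + np.exp(a * (x - x0)))
    return pseudo_voigt(x, amp, x0, fwhm, f)


# Hypothetical line near 51.07 µm with a 0.02 µm core width.
wave = np.linspace(50.95, 51.20, 501)
profile = asym_pseudo_voigt(wave, 1.0, 51.0711, 0.02, 0.5, 20.0)
```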
In general, lines in the blue 2nd order are slightly asymmetric. Although the best fits are obtained with a skewed Gaussian or an asymmetric pseudo-Voigt, a pseudo-Voigt is still a good approximation. The case of the blue 1st order is different. As already remarked by Colditz et al. (2018) (see their Fig. 3), the line profile changes across pixels. The combination of all the pixel signals in Fig. 7 results in an asymmetric profile with a bump on the long-wavelength side. The best shapes to reproduce this unusual profile are either the skewed Gaussian or the asymmetric pseudo-Voigt. Finally, the line profiles in the red array are much more regular. They can be fitted very well with a simple pseudo-Voigt; the profile is well behaved and has the smallest residuals among the different bands.
To measure the full width at half maximum (FWHM) of the lines, we smoothed the combination of all the measurements, normalized to 1 as shown in the top panel of Fig. 7, with the non-parametric regression technique LOWESS, which is available in the Python library statsmodels. Then, we directly measured the width at 0.5.
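Measuring the width at 0.5 of a smoothed, peak-normalized profile can be sketched as below. For brevity the smoothing step is not repeated; in the procedure above the input would be the LOWESS curve. Locating each half-maximum crossing by linear interpolation is our choice, and the Gaussian test profile is synthetic.

```python
import numpy as np


def fwhm_from_profile(x, y):
    """Width at half maximum of a single-peaked profile sampled on increasing x.

    The profile is normalized to its peak; each half-maximum crossing is
    located by linear interpolation between the two bracketing samples.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    y = y / y.max()
    ipk = int(np.argmax(y))
    left = np.nonzero(y[:ipk] < 0.5)[0]
    right = np.nonzero(y[ipk:] < 0.5)[0]
    if left.size == 0 or right.size == 0:
        raise ValueError("profile does not fall below half maximum on both sides")
    i = left[-1]          # last sample below 0.5 on the left of the peak
    j = ipk + right[0]    # first sample below 0.5 on the right of the peak
    x_left = np.interp(0.5, [y[i], y[i + 1]], [x[i], x[i + 1]])
    x_right = np.interp(0.5, [y[j], y[j - 1]], [x[j], x[j - 1]])
    return x_right - x_left


# Check on a Gaussian with sigma = 1 (analytical FWHM: 2*sqrt(2 ln 2) ≈ 2.3548).
# In the procedure described above, y would instead be the LOWESS-smoothed
# combination of all measurements, e.g. the second column returned by
# statsmodels.nonparametric.smoothers_lowess.lowess(y_raw, x_raw, frac=...).
x = np.linspace(-5.0, 5.0, 1001)
y = np.exp(-0.5 * x**2)
print(f"{fwhm_from_profile(x, y):.4f}")
```

With x in µm, the resolving power at the line center then follows as λ/FWHM.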
The bottom panel of Fig. 7 shows the FWHM values measured for lines at different wavelengths. The red points were obtained with water vapor cells, while the blue points correspond to measurements with ambient air cells. Pure water vapor gives a stronger emission but also a wider FWHM. For this reason, lines with a FWHM close to the spectral resolution have been observed with air cells.
In the fit used to obtain a relationship for the spectral resolution, we did not consider lines measured with H₂O cells that are broader than the spectral resolution. The plot reports some of these measurements from different series to show their spread, which is due to the variable amount of H₂O in the cell. They are represented as lower limits, with an upward arrow instead of an error bar.
The reported values are median values of the measurements in all the spaxels, while the dispersion values correspond to the spread of the values among the different spaxels. A polynomial has been fitted to the points to obtain the dependence of the resolution on wavelength. A second-degree polynomial is used for the blue array, while for the red array a first-degree polynomial is sufficient to fit the relationship.

Figure 8. Measurements of the FWHM of the spatial PSF at several key wavelengths. The values are higher than the diffraction limit of the telescope (solid grey line) because of a small instrumental contribution (dashed line). However, the instrumental part dominates in the 2nd order of the blue array.

The instrumental point spread function (PSF) has a FWHM usually smaller than the diffraction limit of the telescope, with the exception of the 2nd order of the blue array, where it is larger. To better study the instrumental effects on the size of the PSF, we observed several point sources at different key wavelengths in the red array and in the two orders of the blue array. For these observations we used a fine dithering pattern in order to better recover the shape of the PSF. We then fitted a Moffat function (Moffat 1969) to the distribution of the fluxes as a function of the distance from the center of the target. Table 1 reports the objects and measurements used to obtain the spatial resolution of FIFI-LS as a function of wavelength. These values are plotted in Figure 8 together with the diffraction limit of the telescope (continuous grey line), the instrumental contribution to the PSF, and the resulting FWHM obtained by combining these two components in quadrature. The diffraction limit is computed using the formula of the point spread function for obstructed mirrors (Born & Wolf 1970):
Since the entrance pupil diameter is 2.5 m and the aperture stop diameter is 0.352 m, the obstruction factor is ε = 0.352/2.5 ≈ 0.14 (Krabbe 2000). Finally, J₁ is the Bessel function of the first kind of order one. The FWHM is twice the angular distance from the center at which I(u) drops to half of its central value, which leads to a theoretical diffraction limit of: a value smaller than that of an unobstructed mirror.
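A numerical cross-check of the obstructed-aperture diffraction limit can be sketched as follows. It assumes the standard annular-aperture intensity I(u)/I(0) = [2J₁(u)/u − ε²·2J₁(εu)/(εu)]² / (1 − ε²)², with u = πDθ/λ, and uses SciPy's `j1` and `brentq` to find the half-intensity point, reporting the FWHM in units of λ/D. It is meant as an illustration, not as a substitute for the value adopted in the paper.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.special import j1

RAD_TO_ARCSEC = 180.0 / np.pi * 3600.0


def annular_psf(u, eps):
    """Normalized intensity I(u)/I(0) of a centrally obstructed circular aperture."""
    def airy_amplitude(z):
        return 2.0 * j1(z) / z  # tends to 1 as z -> 0

    amplitude = airy_amplitude(u)
    if eps > 0.0:
        amplitude = amplitude - eps**2 * airy_amplitude(eps * u)
    return (amplitude / (1.0 - eps**2)) ** 2


def fwhm_lambda_over_d(eps):
    """FWHM in units of lambda/D, using u = pi * D * theta / lambda."""
    u_half = brentq(lambda u: annular_psf(u, eps) - 0.5, 0.5, 2.5)
    return 2.0 * u_half / np.pi


def diffraction_fwhm_arcsec(wavelength_um, diameter_m=2.5, eps=0.14):
    """Diffraction-limited FWHM on the sky for the values quoted above."""
    return fwhm_lambda_over_d(eps) * wavelength_um * 1e-6 / diameter_m * RAD_TO_ARCSEC


print(f"unobstructed: {fwhm_lambda_over_d(0.0):.3f} lambda/D")  # 1.029
print(f"eps = 0.14:   {fwhm_lambda_over_d(0.14):.3f} lambda/D")
```

The central obstruction moves energy from the core into the rings, which is why the obstructed FWHM comes out slightly narrower than the unobstructed 1.029 λ/D.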
To compute the instrumental contributions, we took the geometric mean of the parameters in Table 3 of Colditz et al. (2018) in the two dimensions (along and perpendicular to the slits) and used the conversion factor of 3.55 from mm to arcsec on the sky estimated for the telescope simulator. We then shifted the values by subtracting 2.6, 0.8, and 5.0 arcsec for the blue 2nd order, the blue 1st order, and the red array, respectively, to better fit the measured points. We recall that the parameters for the instrumental PSF were obtained with a simulated source that was not perfectly point-like, so the instrumental contribution was overestimated.
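The Moffat width, the quadrature combination used for Figure 8, and the instrumental-term recipe above can be sketched as follows. The Moffat FWHM relation 2α√(2^(1/β) − 1) is standard (α here is the Moffat core width, unrelated to the linearity coefficient); the 3.55 arcsec/mm factor and the shifts are the values quoted above, while the input widths in mm are hypothetical.

```python
import numpy as np


def moffat(r, amplitude, alpha, beta):
    """Moffat (1969) radial profile: amplitude * (1 + (r/alpha)**2) ** (-beta)."""
    return amplitude * (1.0 + (np.asarray(r, dtype=float) / alpha) ** 2) ** (-beta)


def moffat_fwhm(alpha, beta):
    """FWHM of a Moffat profile: 2 * alpha * sqrt(2**(1/beta) - 1)."""
    return 2.0 * alpha * np.sqrt(2.0 ** (1.0 / beta) - 1.0)


def instrumental_fwhm_arcsec(width_along_mm, width_across_mm, shift_arcsec, mm_to_arcsec=3.55):
    """Geometric mean of the two widths, converted to arcsec, minus the empirical shift."""
    return np.sqrt(width_along_mm * width_across_mm) * mm_to_arcsec - shift_arcsec


def combined_fwhm(diffraction_fwhm, instrumental_fwhm):
    """Quadrature sum of the telescope and instrumental contributions."""
    return np.hypot(diffraction_fwhm, instrumental_fwhm)


# Hypothetical blue 1st-order case: 2 mm x 2 mm simulator widths, 0.8 arcsec shift.
inst = instrumental_fwhm_arcsec(2.0, 2.0, 0.8)
```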
Finally, for reference, we overplotted with a dotted line the diffraction limit for an unobstructed mirror of the same size as the SOFIA telescope, using the standard formula 1.22λ/D, with D the diameter of the telescope mirror. Except for wavelengths shorter than 70 µm, this is a good approximation of the FWHM of the FIFI-LS point spread function. Simple linear relationships can be used to estimate the FIFI-LS spatial resolution at different wavelengths:

Precipitable water vapor
Although flying above more than 99% of the water vapor in the atmosphere, the SOFIA observatory was still affected by the atmosphere. An important part of the data reduction consists of selecting the best atmospheric model to correct for atmospheric transmission and telluric absorption. The most prominent telluric absorptions are due to water vapor. For this reason, the main parameters considered in atmospheric models are the altitude of the observation, the elevation angle of the telescope, and the water vapor present in the atmosphere. The first two quantities are known, while the zenith precipitable water vapor (PWV) has to be measured. A water vapor monitor was developed for SOFIA to continuously measure the zenith PWV (Roellig et al. 2010). Unfortunately, the implementation of this instrument was problematic and its measurements were extremely unreliable. For this reason, the FIFI-LS team devised a technique to monitor the water vapor between flight legs. As described in Fischer et al. (2021), a short spectral scan was performed several times during a flight to observe a few strong telluric water vapor features in the blue and red arrays. These features were then fitted with a grid of ATRAN models (Lord 1992) with different values of precipitable water vapor in order to select the value minimizing the residuals (see top panel of Figure 9).
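The selection of the best-fitting PWV from a model grid can be sketched as a minimum-residual search. ATRAN itself is not called here: a toy single-line transmission model, with a hypothetical line position and opacity, stands in for the model grid, and the observed spectrum is assumed to be continuum-normalized.

```python
import numpy as np


def best_pwv(observed, model_grid):
    """Return the PWV whose model transmission gives the smallest residuals.

    observed: continuum-normalized spectrum; model_grid: {pwv: transmission}
    with every transmission sampled on the same wavelengths as observed.
    """
    pwvs = sorted(model_grid)
    rss = [np.sum((observed - model_grid[w]) ** 2) for w in pwvs]
    return pwvs[int(np.argmin(rss))]


# Toy stand-in for ATRAN: one Gaussian-shaped water line whose optical depth
# scales linearly with PWV (hypothetical line position and strength).
wave = np.linspace(99.9, 100.1, 201)
opacity_per_um = 0.05 * np.exp(-(((wave - 100.0) / 0.02) ** 2))
models = {float(w): np.exp(-w * opacity_per_um) for w in range(1, 21)}
observed = models[7.0]
print(best_pwv(observed, models))  # 7.0
```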
Then, following the procedure outlined by Iserlohe et al. (2021), the PWV values from the ERA5 reanalysis (Hersbach et al. 2020), which is based on satellite observations, were scaled to the values measured in flight to estimate the zenith PWV at each instant of the flight (see Figure 9, middle panel). Figure 9 shows the importance of having an estimate of the water vapor during the entire flight. Between the second and fourth hours of the flight, the values vary by more than 5 µm, while the sparse in-flight measurements show some variation but cannot quantify it accurately. Moreover, the variation between the fifth and sixth hours of the flight would have gone completely unnoticed. The satellite data, scaled to the direct in-flight measurements, allow one to accurately reconstruct the variation of the water vapor along the entire observation. The scaling factors found for the different flights vary between 0.55 and 0.65, with a median value of 0.6. This median scaling factor was adopted for all the flights before flight 524, the first flight for which in-flight PWV measurements were performed. Iserlohe et al. (2022) extended the comparison to another reanalysis of atmospheric satellite data (MERRA-2), finding a linear relationship with the FIFI-LS measurements, albeit with scale factors different from those obtained with the ERA5 data. The dispersion of this relationship is, however, larger than that obtained with the ERA5 data. Zenith PWV values are saved in the header of the raw archived files for each FIFI-LS observation and can be read and used by the FIFI-LS pipeline. When reducing data taken before flight 524, it is advisable to experiment with rescaling the water vapor values in the header to better remove the residuals of telluric absorption.
The bottom panel of Fig. 9 shows the distributions of the zenith PWV values measured in all the flights in the Northern Hemisphere as a function of the barometric altitude. The plot is divided according to the period of observation, since the height of the tropopause depends on the temperature, and therefore flying at the same barometric altitude corresponds to different amounts of precipitable water vapor in different seasons. In particular, we split the year into a summer season (April to September) and a winter season (October to March). The values in the warm months clearly have a larger spread and a higher median at any altitude than those in the cold months. While good conditions are reached at 39,000-40,000 feet during the cold months, one has to reach an altitude of 42,000 feet to obtain good water vapor values in the summer. An analogous study is not possible for the Southern Hemisphere, since usable water vapor data were collected during only seven flights in Chile.
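The rescaling of the satellite PWV time series to the sparse in-flight measurements can be sketched as follows. The estimator is an assumption made for illustration: the scale factor is taken as the least-squares slope of a line through the origin, fitted to the ERA5 values interpolated at the times of the in-flight measurements, and measurements taken before a cutoff (while the telescope is still cooling down, as in Fig. 9) are ignored. Iserlohe et al. (2021) describe the procedure actually used; the function and variable names here are hypothetical.

```python
import numpy as np

def scale_era5(era5_time_h, era5_pwv, meas_time_h, meas_pwv, t_min_h=0.0):
    """Scale a satellite PWV time series to sparse in-flight measurements.

    Returns the scale factor and the rescaled PWV curve on the ERA5 time grid.
    """
    meas_time_h = np.asarray(meas_time_h, dtype=float)
    meas_pwv = np.asarray(meas_pwv, dtype=float)
    keep = meas_time_h >= t_min_h  # drop measurements taken during cool-down
    # ERA5 values at the times of the retained in-flight measurements
    era5_at_meas = np.interp(meas_time_h[keep], era5_time_h, era5_pwv)
    # Least-squares slope through the origin: minimizes sum((m - s * e)**2)
    scale = np.sum(meas_pwv[keep] * era5_at_meas) / np.sum(era5_at_meas ** 2)
    return scale, scale * np.asarray(era5_pwv, dtype=float)

# Synthetic example in which the "true" factor is 0.6
era5_t = np.linspace(0.0, 10.0, 101)
era5_pwv = 10.0 + 3.0 * np.sin(era5_t)
meas_t = np.array([0.5, 2.0, 4.0, 6.5, 9.0])
meas_pwv = 0.6 * np.interp(meas_t, era5_t, era5_pwv)
meas_pwv[0] = 20.0  # corrupted early value, excluded by t_min_h
factor, pwv_curve = scale_era5(era5_t, era5_pwv, meas_t, meas_pwv, t_min_h=1.0)
print(round(factor, 3))
```

A fit through the origin reflects the purely multiplicative relation quoted in the text (factors between 0.55 and 0.65); if an offset were suspected, an ordinary linear fit could be used instead.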

Flux calibration
As discussed in Section 3.3, FIFI-LS has an internal calibrator. In principle, if the spectral distribution of the internal grey body had been known and stable, it would have been possible to compute the absolute calibration factor directly in the laboratory. Since neither condition was met, we used sky calibrators to derive the absolute calibration. The observations consisted of spectral scans of a calibrator in each order and with each choice of dichroic, with the target centered on the first four columns of the detectors (see Figure 1) to avoid the partially illuminated column. This strategy also minimized the effect of the spatial ghost described in Section 3.3 of Colditz et al. (2018). The technique was perfected only in 2015. Before then, only very small portions of the spectral energy distribution of the sky calibrators were observed. These early observations are not used in the current work, since it is difficult to remove flat-field residuals and to perform a satisfactory atmospheric correction in such short scans. The total flux of the calibrator was then computed by summing the flux of all the spaxels except those in the partially illuminated column, which was also affected by the ghost source.
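Summing the calibrator flux over the array while excluding the partially illuminated column can be sketched as below. The sketch assumes that spaxels are numbered 1 to 25 row by row, as in Fig. 1, so that the partially illuminated column consists of spaxels 5, 10, 15, 20, and 25 (the list given in the caption of Fig. 6). The flat 25-element input layout is a hypothetical choice, not the pipeline's data format.

```python
import numpy as np

# Spaxels of the partially illuminated column (1-based numbering, rows of five)
PARTIAL_COLUMN = {5, 10, 15, 20, 25}

def total_flux(spaxel_flux):
    """Sum the flux of all spaxels except the partially illuminated column.

    spaxel_flux: 25 values, where spaxel_flux[i] belongs to spaxel i + 1.
    """
    spaxel_flux = np.asarray(spaxel_flux, dtype=float)
    if spaxel_flux.shape != (25,):
        raise ValueError("expected one value per spaxel (25 values)")
    keep = [i for i in range(25) if (i + 1) not in PARTIAL_COLUMN]
    return spaxel_flux[keep].sum()

print(total_flux(np.ones(25)))  # 20 of the 25 spaxels contribute
```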

Calibrators
Several sky calibrators were observed during the lifetime of FIFI-LS. The primary calibrator was Mars, since it is very bright at all the wavelengths observed with FIFI-LS. Although Mars does not appear as a point source during certain periods of the year, it is compact enough to be completely covered by the FIFI-LS blue and red arrays. The secondary calibrators were Uranus, the Jovian moons, and bright asteroids. In the present work we used them only when absolutely necessary, since the signal from these sources is much fainter. In particular, we did not use them to define the response of the red array. The models used for the absolute flux of Mars are those developed for Herschel by Lellouch and Moreno³. Model fluxes were computed at a number of discrete wavelengths, and these were fitted with a blackbody to provide fluxes across the entire wavelength range of FIFI-LS. For Uranus we used the ESA2 model developed by Orton et al. (2014), which was also used for the calibration of PACS on Herschel. For the Jovian moons, we used the ESA2 models developed for Herschel by Moreno (see Müller et al. 2016). All these models are distributed with the SOFIA pipelines⁴. The list of the objects used to obtain the absolute calibration of FIFI-LS is reported in Table 2.
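The blackbody interpolation of the discrete Mars model fluxes can be sketched with NumPy alone. As an assumption for this illustration, the fit is done in log-flux space, where the solid angle Ω of the source enters additively: for each trial temperature the best log Ω is simply the mean log-ratio, so only the temperature has to be searched on a grid. The "model" fluxes below are synthetic values from a 210 K body of 1.8×10⁻⁹ sr, not the Lellouch and Moreno model values.

```python
import numpy as np

H = 6.62607015e-34   # Planck constant [J s]
K_B = 1.380649e-23   # Boltzmann constant [J/K]
C = 2.99792458e8     # speed of light [m/s]

def planck_jy_per_sr(wave_um, temp_k):
    """Blackbody specific intensity B_nu(T) in Jy/sr at wavelengths in micron."""
    nu = C / (np.asarray(wave_um, dtype=float) * 1e-6)
    b_nu = 2.0 * H * nu**3 / C**2 / np.expm1(H * nu / (K_B * temp_k))  # W m^-2 Hz^-1 sr^-1
    return 1e26 * b_nu

def fit_blackbody(wave_um, flux_jy, temps_k):
    """Least-squares fit of log(F) = log(Omega) + log(B_nu(T)) over a temperature grid.

    Returns (temperature [K], solid angle [sr]).
    """
    log_f = np.log(flux_jy)
    best = None
    for temp in temps_k:
        diff = log_f - np.log(planck_jy_per_sr(wave_um, temp))
        log_omega = diff.mean()               # optimal log(Omega) for this T
        chi2 = np.sum((diff - log_omega) ** 2)
        if best is None or chi2 < best[0]:
            best = (chi2, temp, np.exp(log_omega))
    return best[1], best[2]

# Synthetic stand-ins for model fluxes at a few discrete wavelengths
model_wave = np.array([50.0, 70.0, 100.0, 130.0, 160.0, 200.0])
model_flux = 1.8e-9 * planck_jy_per_sr(model_wave, 210.0)
temp_fit, omega_fit = fit_blackbody(model_wave, model_flux, np.arange(150.0, 300.0, 0.1))
fine_wave = np.linspace(50.0, 205.0, 500)
fine_flux = omega_fit * planck_jy_per_sr(fine_wave, temp_fit)  # fluxes at all wavelengths
```

Fitting in log space gives equal weight to the relative deviations at all wavelengths, which suits a curve spanning more than an order of magnitude in flux; a fit in linear flux (for example with a non-linear least-squares solver) would weight the bright short-wavelength points more.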

Atmospheric models
The correction for atmospheric absorption and telluric lines is essential to obtain accurate response curves. If the telluric lines are not accurately corrected, they produce bumps in the response curve which can significantly change the response over a wide wavelength range. This was the case for the response curves obtained in 2019, which did not benefit from knowledge of the amount of precipitable water vapor during the observations (see Section 3.6). In this work we make use of this information, and we also use an atmospheric model more recent than ATRAN, since we found that ATRAN contains several lines which are not detected in our observations (see top panel of Fig. 10). The ATRAN models distributed with the pipeline have been empirically corrected by manually removing these extra features. For the analysis of the spectral scans, however, we computed a separate model for each frame of the scan, with the corresponding values of altitude, telescope elevation, and precipitable water vapor, to be more accurate. We used the AM model (version 12.2)⁵, which better reproduces the atmospheric transmission with parameters close to those used by ATRAN. We used a two-layer model assuming concentrations of 450, 0.28, 0.075, 1.895, and 105×10³ µmol/mol (mole fraction in dry air) for CO₂, N₂O, CO, CH₄, and O₂ (coupled and uncoupled), respectively. The default O₃ column was assumed to be 320 DU (one Dobson unit corresponds to 2.687×10¹⁶ molecules cm⁻²). To compute the column densities of the gases and water vapor in the different layers, we used the mixing-ratio profiles of O₃, H₂O, and the gas mix adopted by the ATRAN models. We therefore generated atmospheric transmission curves very similar to those computed with ATRAN, but without the features which are not found in our data, as shown in the top panel of Fig. 10.
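As a quick check of the unit conversion quoted above, the ozone columns used in this section can be converted from Dobson units to molecules per cm². The helper name is hypothetical; the conversion factor is the one given in the text.

```python
DOBSON_UNIT_CM2 = 2.687e16  # molecules per cm^2 in one Dobson unit (value quoted in the text)

def ozone_column_cm2(column_du):
    """Convert an ozone column from Dobson units to molecules per cm^2."""
    return column_du * DOBSON_UNIT_CM2

for du in (320.0, 200.0):  # standard value and the lower value used for flight 312
    print(f"{du:.0f} DU = {ozone_column_cm2(du):.3e} molecules/cm^2")
```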
In the wavelength range 50-52 µm, which is crucial to calibrate the response since it contains the important [OIII] 51.81 µm line, the AM and ATRAN models do not correctly reproduce the profiles of two telluric lines. To obtain a better match to the data, we reduced the absorption to 60% of the value predicted by the AM model. Finally, to obtain smoother responses, we discarded data for which the atmospheric transmission was lower than 50%. Such wavelength intervals are left blank in the response figures (see, e.g., bottom panel of Fig. 10).
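The two adjustments described above can be sketched as follows, under stated assumptions: the absorption (1 − T) is scaled by 0.6 only inside a hypothetical wavelength window around the problematic lines, and the response at each sample is taken as the instrumental signal divided by the product of the calibrator model flux and the transmission, with samples below 50% transmission left blank (NaN). The order of the two steps, the window, and all numbers are illustrative only.

```python
import numpy as np

def soften_absorption(transmission, region, factor=0.6):
    """Keep only `factor` of the model absorption (1 - T) inside the boolean mask `region`."""
    out = np.array(transmission, dtype=float)  # copy, so the input is not modified
    out[region] = 1.0 - factor * (1.0 - out[region])
    return out

def response_samples(instrumental, model_flux_jy, transmission, min_transmission=0.5):
    """Response = instrumental signal / (model flux x transmission).

    Samples with transmission below min_transmission are set to NaN (left blank).
    """
    instrumental = np.asarray(instrumental, dtype=float)
    model_flux_jy = np.asarray(model_flux_jy, dtype=float)
    transmission = np.asarray(transmission, dtype=float)
    response = np.full(instrumental.shape, np.nan)
    good = transmission >= min_transmission
    response[good] = instrumental[good] / (model_flux_jy[good] * transmission[good])
    return response

wave = np.array([50.5, 51.0, 51.5, 52.0])
trans = np.array([0.40, 0.70, 0.45, 0.90])
trans = soften_absorption(trans, region=(wave > 50.8) & (wave < 51.8))  # hypothetical window
resp = response_samples([2.0, 3.0, 2.5, 4.0], [10.0, 10.0, 10.0, 10.0], trans)
print(trans, resp)
```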
Another advantage of the AM code is the ability to vary the amount of ozone used to compute the atmospheric models. In previous derivations, when ATRAN was used for the atmospheric correction, the standard value of 320 DU was assumed. However, the ozone column can differ between SOFIA observations, since the flights covered different parts of the globe in different seasons. When computing the response, we selected the ozone column that best corrected the response curves. As shown in the bottom panel of Fig. 10, assuming too high an ozone column can significantly increase the noise of the response. In the case shown, a much better correction is obtained by lowering the O₃ column to 200 DU. Unfortunately, the O₃ column was not monitored during the flights as the water vapor was. This correction can therefore be made only for spectral scans, where the effect is easily quantifiable. This is a potential problem for observations of the [CII] 158 µm line, since an ozone feature very close to this wavelength can be overcorrected, generating a spurious emission line which, at certain redshifts, could be mistaken for [CII] emission.
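One way to automate the choice of the ozone column is to correct the raw response with each candidate column and keep the one giving the smoothest curve. This is a minimal sketch, not the procedure used in the paper (where the values were selected by inspecting the corrected responses): the ozone feature is a hypothetical Gaussian optical-depth profile at an invented wavelength, and the noise proxy (standard deviation of point-to-point differences) is an assumption.

```python
import numpy as np

def toy_ozone_transmission(wave_um, ozone_du, line_center=155.0, sigma=0.3, tau_per_du=1e-3):
    """Hypothetical ozone feature: a Gaussian optical-depth profile scaled by the column."""
    tau = tau_per_du * ozone_du * np.exp(-0.5 * ((wave_um - line_center) / sigma) ** 2)
    return np.exp(-tau)

def roughness(curve):
    """Noise proxy: standard deviation of the point-to-point differences."""
    return np.std(np.diff(curve))

def best_ozone(wave_um, raw_response, ozone_grid):
    """Pick the ozone column whose correction gives the smoothest response curve."""
    scores = [roughness(raw_response / toy_ozone_transmission(wave_um, du)) for du in ozone_grid]
    return ozone_grid[int(np.argmin(scores))]

wave = np.linspace(150.0, 160.0, 500)
true_response = 1.0 + 0.01 * (wave - 150.0)          # smooth underlying response
raw = true_response * toy_ozone_transmission(wave, 200.0)
ozone_grid = np.arange(100.0, 400.0, 20.0)
best = best_ozone(wave, raw, ozone_grid)
print(best)
```

Because the difference operator removes a constant slope, a smooth response corrected with the right column scores near zero, while both over- and under-correction leave a residual bump that raises the score.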

Response curves
Two sets of response curves have been derived using the observations listed in Table 2. Fig. 11 shows the response curves derived for the red channel with data from 2018-2022, i.e., after the filter window change. Each color corresponds to a response curve obtained in a different flight, as specified in the legend. The top panel shows the original response curves, while the middle panel shows the same curves after scaling them to a common median curve. The response curves obtained in different flights are in fact very similar and differ only by a multiplicative factor. The scaling factors used to obtain the response curves in Fig. 12 are listed in the 'Scaling' column of Table 2. That curves obtained in different flights coincide so well after scaling to a median curve shows that the flight-to-flight scatter is the limiting factor for the accuracy of the flux calibration. The calibration error could have been halved if calibration curves had been obtained for each flight (see example in Fig. 11). Restrictions in the availability of calibrators and the need to use the available science time efficiently limited the number of calibrators observed. Moreover, the internal calibrator was usable only in the laboratory, since it was never approved for flight. There was, therefore, no way to calibrate the flux on every night of observation. Accordingly, the accuracy of the FIFI-LS flux calibration is evaluated from the distribution of the scaling factors used to match each flight to a common behavior.
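The scaling of the individual response curves to a common median curve can be sketched as below. As assumptions for this illustration, each flight's scaling factor is the median (over wavelength) of its ratio to the median curve, and the scatter is measured as the median relative standard deviation across flights; the flight-to-flight factors are invented, not those of Table 2.

```python
import numpy as np

def scale_to_median(responses):
    """Scale each flight's response curve to the median curve.

    responses: array of shape (n_flights, n_wavelengths).
    Returns (scaling factors, scaled curves, median curve).
    """
    responses = np.asarray(responses, dtype=float)
    median_curve = np.median(responses, axis=0)
    # One multiplicative factor per flight: median ratio to the common curve
    factors = np.median(responses / median_curve, axis=1)
    scaled = responses / factors[:, None]
    return factors, scaled, median_curve

def fractional_scatter(curves):
    """Median over wavelength of the relative standard deviation between curves."""
    curves = np.asarray(curves, dtype=float)
    return np.median(np.std(curves, axis=0) / np.mean(curves, axis=0))

wave = np.linspace(120.0, 200.0, 50)
shape = 1.0 + 0.002 * (wave - 120.0)
flight_factors = np.array([0.9, 1.0, 1.1, 1.05, 0.95])  # invented flight-to-flight offsets
responses = flight_factors[:, None] * shape
factors, scaled, median_curve = scale_to_median(responses)
print(fractional_scatter(responses), fractional_scatter(scaled))
```

In this idealized case the scatter drops to zero after scaling; with real data the residual scatter corresponds to the lower error quoted in the caption of Fig. 11.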
The response curves for all the array/order/dichroic combinations in the two epochs (before and after the change of the filter window) are shown in Figure 12. Dots of different colors correspond to data taken in different flights. The response curves for the red array and the blue 1st and 2nd orders, shown in red, blue, and purple, respectively, are obtained by fitting Chebyshev polynomials to the data. In the case of the blue 1st order with the 130 µm dichroic, the data have been fitted with two polynomials to better match the cusp around 90 µm, which is due to a dip in the transmission of the 130 µm dichroic. For the old filter window, the 1st order shows more scatter, since calibrators fainter than Mars (Callisto and Uranus) were used to obtain the response.
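A Chebyshev fit of a response curve, optionally split into two independent polynomials at a break wavelength such as the 90 µm cusp, can be sketched with `numpy.polynomial.Chebyshev.fit`. The polynomial degree, the break handling, and the synthetic response are illustrative assumptions; NaN samples stand for the blanked low-transmission intervals.

```python
import numpy as np
from numpy.polynomial import Chebyshev

def fit_response(wave_um, response, deg=5, break_um=None):
    """Fit a response curve with a Chebyshev series, optionally split at break_um.

    Returns a callable evaluating the fitted response; NaN samples are ignored.
    """
    wave_um = np.asarray(wave_um, dtype=float)
    response = np.asarray(response, dtype=float)
    good = np.isfinite(response)
    if break_um is None:
        return Chebyshev.fit(wave_um[good], response[good], deg)
    low_mask = good & (wave_um < break_um)
    high_mask = good & (wave_um >= break_um)
    low = Chebyshev.fit(wave_um[low_mask], response[low_mask], deg)
    high = Chebyshev.fit(wave_um[high_mask], response[high_mask], deg)
    return lambda w: np.where(np.asarray(w) < break_um, low(w), high(w))

wave = np.linspace(70.0, 120.0, 200)
resp = np.where(wave < 90.0, 1.0 + 0.01 * (wave - 90.0), 1.0 - 0.02 * (wave - 90.0))  # cusp at 90 um
resp[50:55] = np.nan  # a blanked low-transmission interval
model = fit_response(wave, resp, deg=3, break_um=90.0)
print(float(model(80.0)), float(model(100.0)))
```

`Chebyshev.fit` maps the data range onto [-1, 1] internally, which keeps the fit well conditioned even at moderately high degree.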

Differences with previous release
The response curves used before this paper were derived in 2019 and were used to process data until 2022. Since many FIFI-LS papers used them in their analyses, we discuss here the main differences between the previous and this final release for the main lines observed in the blue and red arrays. We recall that, since the instrumental fluxes are divided by the response to calibrate them in physical units, a lower response corresponds to a higher flux and vice versa.
Changes are negligible for observations in the red array made after the filter window change (2018-2022, corresponding to flight 524 and later). For the old filter window, the new response is approximately 9% lower than the previous one with the 130 µm dichroic and 6% lower with the 105 µm dichroic.
For the blue array, there are differences even with the new filter window. For the old filter window (data before 2018), the new 1st-order response is 10% and 12% lower than before with the 105 and 130 µm dichroics, respectively. For the 2nd order the difference is smaller: the new response is 5% lower with both dichroics. The effect is similar for the new filter window (data between 2018 and 2022): the new 1st-order response is 12% lower than the previous one with both dichroics, while for the 2nd order the new response is 5% and 3% lower than the previous one with the 130 and 105 µm dichroics, respectively.
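Because calibrated fluxes are the instrumental fluxes divided by the response, a response that is lower by a fraction p raises the fluxes by a factor 1/(1 − p), i.e. by slightly more than p. A minimal sketch of this conversion, applied to some of the changes quoted above (the function name is hypothetical):

```python
def flux_change(response_change):
    """Fractional change of calibrated fluxes for a fractional change of the response.

    Fluxes are instrumental signal / response, so a response lower by 12%
    (response_change = -0.12) multiplies fluxes by 1 / 0.88.
    """
    return 1.0 / (1.0 + response_change) - 1.0

for label, change in [("blue 1st order, new window", -0.12),
                      ("red, old window, 130 um dichroic", -0.09),
                      ("blue 2nd order, old window", -0.05)]:
    print(f"{label}: fluxes {100 * flux_change(change):+.1f}%")
```

For example, fluxes calibrated with the previous blue 1st-order response and the new filter window increase by about 13.6%, not 12%, when the new response is applied.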

Flux cross-correlation with PACS/Herschel
A few cross-correlations of [CII] 158 µm line fluxes measured with FIFI-LS and PACS available in the literature show good agreement between the two instruments (Reach et al. 2020; Sutter & Fadda 2022a). In this section we extend this comparison to several other objects observed with Herschel/PACS and SOFIA/FIFI-LS. Most of the PACS observations considered were made with unchopped spectroscopy, since chopping was not possible on extended emission regions with Herschel. When comparing these observations with FIFI-LS, we took care to reprocess the PACS data with the transient-correction pipeline (Fadda et al. 2016) and its recent updates described in Sutter & Fadda (2022b). As already shown by Sutter & Fadda (2022a) for NGC 7331, the flux calibration of the PACS archival data can be significantly off, since it is based on the internal calibrators rather than on the more stable telescope background, as is the case for normal chop-nod observations. The response to these calibrators depends on memory effects of the detectors and can be off by 20% to 90% in an unpredictable way. In one of the cases we examined (NGC 2146), the nucleus of the galaxy was observed at 158 µm with PACS in both modes: chop-nod and unchopped. A direct comparison of the flux from the central region in the two observations shows that the unchopped flux is 80% higher than the chop-nod flux. The difference virtually disappears when the transient-correction pipeline is used. In this case, the measured flux agrees very well with that measured by SOFIA/FIFI-LS, as shown in Fig. 13. We also noticed that the astrometry of a few PACS observations was off by a few arcseconds. Before measuring the fluxes of NGC 2146 and M100, we corrected the astrometry by using the WISE channel 4 image to recenter the central peak of the [CII] emission on the nucleus of each galaxy. To obtain Fig. 13, we first degraded the PACS cube to the spatial resolution of the FIFI-LS cube. Then, we measured the flux in several apertures by fitting the lines with a pseudo-Voigt profile with the interactive software sospex⁶ (Fadda & Chambers 2018). For the FIFI-LS blue array, we used a Gaussian, which better approximates the line profile (see Section 3.4). The measurements from the two instruments agree very well within the errors, except for the nucleus of M82, which has several FIFI-LS measurements with discordant fluxes, probably taken in flights where the deviation from the median response was largest. The two PACS measurements, the first of which was made during the verification phase, are in very good agreement. However, the FIFI-LS flux reported in Fig. 13, measured on the coaddition of the different flights, does not differ significantly from the one measured with PACS. The comparison between PACS and FIFI-LS for the blue array is more difficult, since the field of view of FIFI-LS is much smaller and many observations taken in parallel with [CII] maps have shallower and sometimes incomplete coverage. Of the two most intense oxygen lines, we chose to focus on the [OIII] line at 88.3 µm, since the [OI] line at 63.2 µm is contaminated by a telluric line. Since the quality of the FIFI-LS data is not optimal in the 1st order because of the slightly skewed line profile, the scatter of these measurements is larger than that of the [CII] comparison in the red array. Moreover, few regions were observed by both instruments with a high signal-to-noise ratio. In fact, although many of the extended galaxies observed with PACS in [CII] have parallel data in [OIII], the parallel data are shallower (due to the different field of view) and usually only the nucleus is detected. Only for NGC 6946 were a few deep observations of star-forming regions obtained. Generally, the PACS and FIFI-LS measurements agree well, with one exception: NGC 253. NGC 253 was observed twice with FIFI-LS with similar results, while the Herschel/PACS observation was made in chopping mode. We suspect that the flux discrepancy with respect to the FIFI-LS measurement is due to contamination at the sky position of the PACS observation.
The set of data considered in the comparison, as well as the apertures and the flux measurements, are reported in Table 4 in Appendix C. The errors reported in Table 4 are computed by fitting the spectra and their errors with the line profiles. Since the dispersion of the ratios is less than 10%, and considering the spread in response found from the response curves (see Sec. 3.7.3), it is reasonable to assume that the calibration is accurate to 15% or better across the entire wavelength range of the instrument.
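The median FIFI-LS/PACS flux ratio and its dispersion can be computed as sketched below. The text does not state which dispersion estimator was used; as an assumption, this sketch uses the normalized median absolute deviation (1.4826 × MAD, a robust stand-in for the standard deviation), divided by the median ratio. The aperture fluxes are invented, not those of Table 4.

```python
import numpy as np

def ratio_statistics(fifi_flux, pacs_flux):
    """Median FIFI-LS/PACS flux ratio and its fractional dispersion.

    Dispersion = 1.4826 * MAD / median (an assumed, outlier-robust estimator).
    """
    ratio = np.asarray(fifi_flux, dtype=float) / np.asarray(pacs_flux, dtype=float)
    median = np.median(ratio)
    mad = np.median(np.abs(ratio - median))
    return median, 1.4826 * mad / median

# Invented aperture fluxes (same units for both instruments)
fifi = [10.2, 5.1, 7.9, 3.0, 12.5]
pacs = [10.0, 5.0, 8.0, 3.1, 12.0]
median_ratio, dispersion = ratio_statistics(fifi, pacs)
print(f"median ratio {median_ratio:.3f}, dispersion {100 * dispersion:.1f}%")
```

A robust estimator keeps a single discordant aperture, such as the M82 nucleus discussed above, from dominating the dispersion.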

Bias subtraction
The sensitivity estimates are based on the noise of the integration ramps. When studying the noise in the ramps, we noticed some correlation between the noise of different pixels in the same spatial module. The cause of this noise is unknown, but the analysis can be dramatically improved by subtracting the signal of the open pixel, which does not see the sky. This procedure is similar to the bias subtraction performed for optical CCDs using a strip of the CCD which is not exposed to the sky. Subtracting the open pixel drastically reduces the noise in the ramps. Figure 14 shows the effect of the subtraction on the ramps of the different pixels of the central spaxel (number 12). By analyzing the noise in the final spectra, we found that the bias subtraction reduced the noise by at least 10%. For this reason, the bias subtraction option was added to the FIFI-LS pipeline in 2020 and is now used by default.
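The effect of the open-pixel subtraction on correlated ramp noise can be illustrated with synthetic ramps. This is a minimal sketch under stated assumptions: 16 pixels and one open pixel share a common noise term, each has independent noise, the open pixel has no sky signal, and the ramp noise is measured as the scatter around a straight-line fit. All amplitudes are invented.

```python
import numpy as np

def subtract_open_pixel(ramps, open_ramp):
    """Subtract the open-pixel ramp from each pixel ramp of the same spatial module.

    ramps: array (n_pixels, n_readouts); open_ramp: array (n_readouts,).
    """
    return np.asarray(ramps, dtype=float) - np.asarray(open_ramp, dtype=float)[None, :]

def ramp_residual_std(ramps):
    """Scatter of each pixel's ramp around its own straight-line fit."""
    ramps = np.asarray(ramps, dtype=float)
    t = np.arange(ramps.shape[1])
    coeffs = np.polyfit(t, ramps.T, 1)  # one (slope, intercept) pair per pixel
    fit = np.outer(coeffs[0], t) + coeffs[1][:, None]
    return np.std(ramps - fit, axis=1)

rng = np.random.default_rng(0)
n_pix, n_read = 16, 32
t = np.arange(n_read)
common = rng.normal(0.0, 5.0, n_read)          # correlated noise shared by the module
slopes = rng.uniform(10.0, 20.0, n_pix)
ramps = slopes[:, None] * t + common + rng.normal(0.0, 1.0, (n_pix, n_read))
open_ramp = common + rng.normal(0.0, 1.0, n_read)  # open pixel: no sky signal
before = ramp_residual_std(ramps)
after = ramp_residual_std(subtract_open_pixel(ramps, open_ramp))
print(np.median(before), np.median(after))
```

The subtraction removes the shared term but adds the open pixel's own independent noise, so it pays off only when the correlated component dominates, as in Fig. 14.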

Continuum and line sensitivities
The sensitivity of FIFI-LS in the different arrays/orders has been computed by estimating the errors of the ramp fitting with observations of the internal calibrator and applying the response curves to these limiting fluxes. The mathematical details of the computations used to obtain these curves are presented in Appendix A. Figure 15 shows the curves for the three bands covered by FIFI-LS (red, and blue in 1st and 2nd order) and the two possible dichroics. The solid curves correspond to the 105 µm dichroic, while the dashed ones correspond to the 130 µm dichroic. The shaded bands show the uncertainties on the estimated curves.
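The first step of this computation, fitting a ramp slope with its uncertainty and converting the slope noise into a limiting flux density through the response, can be sketched as follows. This is only an illustration, not the derivation of Appendix A: the readout interval is hypothetical, the first and last readouts are dropped as in the ramp analysis of Fig. 2, the response value is invented, and any scaling with exposure time is omitted. The response is assumed to be expressed so that slope divided by response gives a flux density.

```python
import numpy as np

def ramp_slope_and_error(readouts, dt_s):
    """Least-squares slope (per second) of one ramp and its 1-sigma uncertainty.

    The first and last readouts are discarded; dt_s is a hypothetical readout interval.
    """
    y = np.asarray(readouts, dtype=float)[1:-1]
    t = dt_s * np.arange(y.size)
    coeffs, cov = np.polyfit(t, y, 1, cov=True)
    return coeffs[0], np.sqrt(cov[0, 0])

def limiting_flux(slope_error, response, n_sigma=4.0):
    """n-sigma limiting flux density: slope noise divided by the response."""
    return n_sigma * slope_error / response

rng = np.random.default_rng(1)
readouts = 3.0 * np.arange(32) + rng.normal(0.0, 0.5, 32)  # synthetic ramp
slope, slope_err = ramp_slope_and_error(readouts, dt_s=0.5)
print(slope, limiting_flux(slope_err, response=2.0))
```

The factor of four mirrors the 4σ level used in Fig. 15.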
In the figure, the green bands mark the regions of the spectrum which were not observable with PACS on Herschel. In particular, observations of the important [OIII] line at 51.81 µm were made possible by the upgrade of the filter window in 2018. Depending on the redshift of the object, this line was observed with the dichroic offering the best sensitivity (130 µm at low redshift, 105 µm for targets with cz > 4500 km/s).
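The dichroic rule quoted above can be made concrete by computing where [OIII] 51.81 µm falls for a given recession velocity. This sketch uses the low-velocity approximation z ≈ cz/c and encodes the 4500 km/s threshold from the text; the function names are hypothetical.

```python
C_KM_S = 299792.458   # speed of light [km/s]
OIII_REST_UM = 51.81  # rest wavelength of [OIII] used in the text [micron]

def observed_wavelength(rest_um, cz_km_s):
    """Observed wavelength for a recession velocity cz, with z approximated by cz / c."""
    return rest_um * (1.0 + cz_km_s / C_KM_S)

def oiii_dichroic(cz_km_s, threshold_km_s=4500.0):
    """Dichroic choice quoted in the text: 130 um at low redshift, 105 um above 4500 km/s."""
    return 105 if cz_km_s > threshold_km_s else 130

for cz in (1000.0, 4500.0, 8000.0):
    print(f"cz = {cz:.0f} km/s: [OIII] at {observed_wavelength(OIII_REST_UM, cz):.2f} um, "
          f"dichroic {oiii_dichroic(cz)} um")
```

At the threshold velocity the line is shifted to about 52.59 µm.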
For comparison, we plot with grey lines the sensitivity of the PACS instrument on Herschel, which was very similar to FIFI-LS but operated from space with a larger telescope. The main difference between the two instruments is in the blue array, since PACS used the 2nd order of its grating, while FIFI-LS used the 1st order of a grating optimized for its wavelength range. Moreover, PACS had the same pixel size for the two arrays, while FIFI-LS had pixels of 6" and 12" for the blue and red arrays, respectively. For these reasons, the line sensitivity in the blue is only slightly worse for FIFI-LS, while PACS is more sensitive in the red. We stress that, although the sensitivities are similar, FIFI-LS observations are affected by the atmospheric transmission and telluric features, which sometimes make the observations rather challenging and the real sensitivity worse than shown in Figure 15.

SUMMARY
We presented the characterization and absolute flux calibration of FIFI-LS using measurements made in the laboratory and in flight. We analyzed the non-linearity of the detectors and concluded that the effect is minimal. Most of the systematic variations are, in fact, absorbed by the absolute flux calibration, while the residual noise is around 1%, a value much lower than the flux calibration error. We presented the procedure used to estimate the spatial and spectral flats of the detectors. We analyzed laboratory data to study the profile of unresolved lines for the red array and for the two orders of the blue array. Lines are slightly asymmetric in the blue. In particular, the blue 1st order has a bump on the long-wavelength side and is best fitted with a skewed Gaussian. Pseudo-Voigt profiles are a good approximation for the red channel and the blue 2nd order. Formulae to compute the spectral and spatial resolution of FIFI-LS were derived. In particular, the spatial resolution was studied by observing bright point sources. The technique used to estimate the precipitable water vapor in flight was discussed, and the results of the measurements were shown as a function of altitude and season. They clearly show a seasonal effect on the precipitable water vapor. We presented the study of the instrument response based on sky calibrators. New response curves were derived with a much improved correction of telluric lines, which makes use of the knowledge of the precipitable water vapor and of a variable ozone column. The response was shown to vary by a multiplicative factor from flight to flight, which contributes to the uncertainty in the flux calibration. Because of this, the calibration uncertainty is around 15%. The availability of calibrators on each flight would have reduced the uncertainty by a factor of two. A cross-correlation between fluxes estimated from PACS and FIFI-LS observations of the same sky regions shows very good agreement between the flux calibrations of the two instruments, provided that the transient-correction pipeline is used to reprocess PACS unchopped data rather than using the archival products. We showed, in fact, that the flux of the archival products of PACS unchopped observations can be off by up to a factor of two and that the difference with the FIFI-LS fluxes disappears when the flux is calibrated using the telescope background as reference.
Finally, we presented updated sensitivity curves for FIFI-LS based on the noise in the ramps after subtracting the signal from the open (bias) pixel.
C. ADDITIONAL TABLES
Table 3 reports the values of spatial and spectral resolution for the most important lines observed with FIFI-LS at the reference wavelengths, as well as the instantaneous coverage of the 16 spectral pixels.

Figure 1. Sky position and size of each spatial pixel (spaxel) for the blue and red arrays. The plot axes denote the offset in arcseconds from the center of the red array (spaxel 13). The blue and red fields of view are shifted by approximately 6 arcsec along the horizontal direction of the array. The numbers indicate how the image was sliced: the rows (1-5, 6-10, and so on) were rearranged into pseudo-slits, which were dispersed by gratings onto the blue and red detectors.

Figure 2. Left: group of 4 ramps (2 on- and 2 off-target) during an observation of Mars. Right: slope between consecutive readouts for the on- and off-target ramps. The median for each readout is marked with a darker color. The difference in the average slope between the on- and off-target observations is due to the flux of Mars on top of the dominant background flux. The first and last readouts are discarded in the analysis.

Figure 3. Percentage of bad pixels in the two arrays as a function of the date of the observational series.

Figure 4. Top: linearization of ramps for a pixel in the red array. Middle: linearization coefficient α versus saturation level for all the pixels in the red array. For most of the pixels α is around −0.03. Bottom: percentage correction of ramp slopes after linearization for the red array for an entire Mars observation. The median correction is marked with a cyan line.
… particular pixel. The slopes between consecutive readouts for several ramps of a pixel observing the same object are plotted in blue. A linear behavior would result in a constant value for the slope between consecutive …

Figure 6. Left: spatial flats color-coded by observational series. The partially illuminated column (spaxels 5, 10, 15, 20, and 25) usually has lower and more dispersed values than the other spaxels. The effect is particularly evident in the blue channel. Right: spectral flats for all the spaxels of the red array, dichroic 105 µm, pixel #6. The blue lines correspond to the measurements from all the FIFI-LS series after applying the spatial flats of each series. The black lines are the accepted flats computed via Chebyshev polynomial smoothing.

Figure 7. Top: from left to right, examples of line profiles in the blue 2nd order, blue 1st order, and red. The residuals from fits with different functions are shown at the bottom. In general, lines are slightly asymmetric, especially in the blue. Bottom: spectral resolution for the three bands of FIFI-LS: blue in two diffraction orders and the red array.
$$
\begin{aligned}
R &= \ldots - 550.28 && \text{(red)}\\
R &= 0.1934\,\lambda^{2} - 28.89\,\lambda + 1664 && \text{(blue, 1st order)}\\
R &= 1.937\,\lambda^{2} - 113.7\,\lambda + 2932 && \text{(blue, 2nd order)}
\end{aligned}
\tag{6}
$$

3.5. Spatial Resolution
As shown in Colditz et al. (2018) (Figure …

Figure 8. Measurements of the FWHM of the spatial PSF at several key wavelengths. The values are higher than the diffraction limit of the telescope (solid grey line) because of a small instrumental contribution (dashed line). However, the instrumental part dominates in the 2nd order of the blue array.

Figure 9. Top: telluric features used to estimate the zenith precipitable water vapor (PWV). The left panel shows the value of water vapor at the zenith (WVZ) which minimizes the residuals between model and data. The corresponding model is shown in the right panels, plotted in yellow over the features observed with the blue and red arrays. Middle: satellite values scaled to the observed PWV values for one of the last flights of FIFI-LS (flight 907). The data at the beginning of the flight, when the telescope is still cooling down, are discarded when computing the scaling factor. Bottom: precipitable water vapor (PWV) values as a function of altitude and season (October to March in green, April to September in orange) for the flights in the Northern Hemisphere.

Figure 10. Top: three examples of lines in the ATRAN model which are not seen in the data and are not present in the AM models. The data have been renormalized to the maximum of the transmission models. Bottom: effects of changing the O3 level in the atmospheric correction. The standard value of 320 DU overcorrects the ozone features in some of the observations. A lower O3 value dramatically reduces the noise in the data of this 2016 observation of Mars (flight 312). The curve with the higher O3 correction has been shifted by adding 0.5×10⁻¹¹ ADU/Hz/Jy to better show the difference in the correction.

Figure 11. Response curves for the observation of Mars in the red array (dichroic 105 µm) with the new filter window (years 2018-2022). The color code refers to the flight number in the legend. The top panel shows the response curves, which are scaled to a common curve in the middle panel. Most of the uncertainty in the flux calibration comes from the flight-to-flight multiplicative variation of the response. The bottom panel shows the residuals of the fit with a Chebyshev polynomial (black line). The calibration error decreases from 12% to 6% after scaling the curves.

Figure 12. Response curves for all the arrays/orders/dichroics and for the periods before and after the filter window change.

Figure 13. Cross-correlation of FIFI-LS and PACS fluxes in several apertures of common targets, showing the excellent agreement between the two instruments in the red (top) and blue (bottom) arrays. The dashed blue horizontal line corresponds to the median of the values. The data in this figure are reported in Table 4.

Figure 14.Effect of the subtraction of the open pixel on two consecutive ramps in the blue channel.The ramps of the 16 pixels in the central spaxel of a flat observation in the blue channel show some correlated noise (top panel).After subtracting the signal from the open pixel, plotted in black, most of the correlated noise disappears (see bottom panel).

Figure 15.Continuum and line sensitivities for the FIFI-LS instrument in the three different bands and two dichroics with the latest filter window at 4σ with an exposure time of 900 s.The sensitivity of PACS on Herschel is shown for comparison as a grey line.The green bands mark the wavelength ranges which were not observable with PACS on Herschel.

Table 1. Measurements of spatial resolution

Table 2. List of flux calibrators

Table 3. Spatial and spectral resolution for the most important lines observable with FIFI-LS.
Table 4 contains the line fluxes measured inside circular apertures from sources observed both by PACS and FIFI-LS.