GLEAM: Galaxy Line Emission & Absorption Modeling

Andra Stroe and Victor-Nicolae Savu

Published 2021 March 2 © 2021. The American Astronomical Society. All rights reserved.
Citation: Andra Stroe and Victor-Nicolae Savu 2021 AJ 161 158. DOI: 10.3847/1538-3881/abe12a

Abstract

We present Galaxy Line Emission & Absorption Modeling (gleam), a Python tool for fitting Gaussian models to emission and absorption lines in large samples of 1D extragalactic spectra. gleam is tailored to work well in batch mode without much human interaction. With gleam, users can uniformly process a variety of spectra, including galaxies and active galactic nuclei, in a wide range of instrument setups and signal-to-noise regimes. gleam also takes advantage of multiprocessing capabilities to process spectra in parallel. With the goal of enabling reproducible workflows for its users, gleam employs a small number of input files, including a central, user-friendly configuration in which fitting constraints can be defined for groups of spectra and overrides can be specified for edge cases. For each spectrum, gleam produces a table containing measurements and error bars for the detected spectral lines and continuum and upper limits for nondetections. For visual inspection and publishing, gleam can also produce plots of the data with fitted lines overlaid. In the present paper, we describe gleam's main features, the necessary inputs, expected outputs, and some example applications, including thorough tests on a large sample of optical/infrared multi-object spectroscopic observations and integral field spectroscopic data. gleam is developed as an open-source project hosted at https://github.com/multiwavelength/gleam and welcomes community contributions.

1. Introduction

One of the main goals of extragalactic astronomy is to understand the cosmic evolution of galaxies and black holes in the context of large-scale structure. To obtain a comprehensive view of the physical processes driving their evolution and unveil their spatial distribution, spectroscopic observations of large samples of galaxies and active galactic nuclei (AGNs) at increasingly high redshift are required. The most efficient way to obtain large samples (>100 to hundreds of thousands of sources) covering large volumes is through simultaneous observations of many objects. As a consequence, multi-object spectroscopy (MOS) and integral field unit (IFU) spectroscopy have experienced significant growth since the 1980s.

An important role is played by MOS in repositioning midsize telescopes, with instruments dedicated exclusively to completing large surveys of galaxies and quasars, such as LAMOST (Cui et al. 2012), WHT/WEAVE (Dalton et al. 2012), SDSS-IV (Blanton et al. 2017), and DESI (DESI Collaboration et al. 2016). The instrument suite of 6–10 m class optical/infrared telescopes usually contains MOS and IFU capabilities, e.g., Keck/DEIMOS (Faber et al. 2003), Keck/MOSFIRE (McLean et al. 2012), VLT/VIMOS (Le Fèvre et al. 2003), VLT/KMOS (Sharples et al. 2006), Gemini/GMOS (Hook et al. 2004), Subaru/FMOS (Kimura et al. 2010), MMT/Hectospec (Fabricant et al. 2005), and Magellan/IMACS (Dressler et al. 2011). In the near future, new wide-field (>1°), high-multiplex (>1000 targets) MOS instruments will be mounted, such as VLT/MOONS (Cirasuolo et al. 2012) and VISTA/4MOST (de Jong et al. 2012). All new-generation ground-based optical/infrared telescopes have MOS and IFU instruments planned, e.g., ELT/MOSAIC (Jagourel et al. 2018), TMT/WFOS (Pazder et al. 2006), and GMT/GMACS (Pak et al. 2020). Instruments on the flagship James Webb Space Telescope, including NIRSpec and MIRI, will also have MOS/IFU capabilities. The MOS/IFU techniques have also been routinely used in the radio and submillimeter regime (e.g., VLA and ALMA; Thompson et al. 1980; Wootten & Thompson 2009).

As a result of transformational large-scale public surveys and concerted guaranteed time efforts completed over the past 3 decades, a growing body of spectroscopic observations has been made available to the community. Complementing guaranteed time observations and large-scale public surveys, individual investigators have added MOS and IFU observations tailored to specific extragalactic science goals. Obtaining statistically robust samples and particular science cases that involve targets distributed across the sky (e.g., galaxy population studies in galaxy clusters, quasar surveys, high-redshift galaxy surveys, or intra-/circumgalactic medium absorption line surveys) requires the combination of data coming from different telescopes and instruments. Further, the advent of online databases has made access to fully or partially reduced observations easier (Ginsburg et al. 2019), enabling individual authors to make use of existing spectroscopic observations for new science goals, possibly combining data from different telescopes. The sheer volume of data warrants automated analysis pipelines with minimal human interaction.

Striving for reproducible results, many authors in the field provide machine-readable data plus scripts used to obtain the results with the publication, e.g., the Jupyter (Kluyver et al. 2016) notebook used for creating figures or the CASA script used to reduce and make images from ALMA data. However, publications mainly focus on the originally intended science case, leaving by-products and intermediate results largely unreported (e.g., unreported line fluxes when the goal was measuring redshifts). In order to incorporate archival spectroscopic observations into new projects, researchers need to partially reproduce and build upon the efforts of the original authors.

The first fundamental property encoded by a spectrum is the source's redshift. A number of powerful, modern tools assist astronomers in obtaining accurate redshifts for large samples in an automatic and unsupervised way while also ensuring the reliability of the results (e.g., EZ; Garilli et al. 2010). Apart from providing redshifts, the scientific potential of MOS and IFU observations is realized in extracting the (resolved) physics and chemistry of extragalactic objects from emission and absorption lines. With a growing body of literature with tailored science goals, each publication uses heterogeneous data and methods to measure emission and absorption lines. As the astronomy community further adopts the Python programming language (e.g., Astropy; Astropy Collaboration et al. 2013), various interfaces for fitting functions exist. However, the low-level function fitting packages require individual authors to write their own bindings to interface between the reduced astronomical data and the fitting software. With a high level of duplicated effort in the community to write tailored code to fit spectral lines and the high costs associated with sharing and maintaining it, access to data analysis software entails a great deal of overhead and represents a barrier to entry for the field of spectroscopy.

Galaxy Line Emission & Absorption Modeling (gleam; Stroe & Savu 2020) is a software tool for fitting Gaussian models to emission and absorption lines in large samples of galaxy and AGN spectra. gleam has versatile science applications involving large samples of 1D spectra or IFU observations. For example, gleam is ideally suited for unveiling the detailed physics and chemistry of galaxies, as derived from interstellar medium line ratios and stellar absorption lines, for a variety of samples spanning both cosmic time and environment. gleam can also aid in the exploration of IFU cubes through spatially resolved physics, kinematics, and chemistry. Requiring only the source redshifts and with little to no interaction, the user can analyze large numbers of spectra in a uniform manner, even with data taken in different conditions, with different instrument setups, on different telescopes, at a range of signal-to-noise (S/N) regimes, and for a wide variety of sources. We tested gleam mainly on optical and infrared spectra; however, we expect it to also work well on radio and submillimeter spectra.

In this paper, we provide an introduction to the gleam software, focusing on features contained in the v1.0 release. Section 2 describes the basic functionality, while Section 3 discusses the necessary input files and expected outputs. Section 4 covers some example applications and uses. In Section 5, we present the open-source development model adopted for gleam, while in Section 6, we discuss possible extensions to the code in the near future.

2. gleam: Galaxy Line Emission and Absorption Modeling

With gleam, the user can process large numbers of sources in batch mode, taking advantage of the multiprocessing capabilities of modern CPUs. Optionally, gleam also provides an interactive interface to inspect individual line fits on a spectral line and source-by-source basis. gleam fits emission and absorption lines in fully reduced 1D spectra using per-source spectroscopic redshift information and fitting constraints from a central configuration. The central configuration encourages users to define common fitting constraints for broad groups of spectra and be deliberate when defining overrides, which helps prevent user errors and facilitates an easier review of the methods and results by collaborators, referees, and readers. gleam fits all lines listed in a central line list and can jointly fit lines located close together. gleam also reports upper limits and identifies lines without spectral coverage. If required, the user can provide a file containing sky bands, sky lines, and/or OH lines to be masked and disregarded during line fitting. At its core, gleam uses the popular LMFIT Python package (Newville et al. 2019) to perform the line fitting and calculate and report errors on fit parameters. gleam is also well integrated with Astropy (Astropy Collaboration et al. 2013), which enables the use of units and FITS tables.

As output, gleam creates a FITS table with Gaussian line measurements and upper limits (as the case may be), including central wavelength, width, height, and amplitude, as well as estimates for the continuum under the line, the line flux, luminosity, equivalent width, and velocity width. gleam can also make plots of the entire spectrum with fitted lines overlaid, as well as plots for each individual line fitted, using Matplotlib (Hunter 2007).

gleam follows open-source practices, with planned features to be added to the living codebase published online on GitHub at https://github.com/multiwavelength/gleam. The latest release of gleam can be installed easily with Python pip.

3. The Software

Here we introduce gleam's main functionality, features, required inputs, and outputs. For full documentation of the code, we encourage the reader to consult gleam's GitHub page at https://github.com/multiwavelength/gleam.

3.1. Model Fitting

In fitting the spectrum, gleam groups neighboring spectral lines. For each spectral line group, a user-defined window of the spectrum around the group is considered for fitting. gleam models each group as the sum of a constant for the continuum and one Gaussian for each spectral line. The assumption that the continuum is locally constant might fail if the window is too wide, while a too narrow window will not have enough line-free spectrum to properly constrain the continuum. Sections of the spectrum suffering from contamination, such as areas with sky lines, can also be masked.
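
As a rough illustration of the model described above, the following Python sketch evaluates a constant continuum plus one Gaussian per spectral line. The function names, parameter names, and numerical values are ours and are not taken from gleam's code base. The Gaussian is parametrized by its integrated area (amplitude), center, and standard deviation, which is the convention used by LMFIT's Gaussian model.

```python
import math


def gaussian(x, amplitude, center, sigma):
    """Gaussian whose integrated area equals `amplitude`."""
    norm = amplitude / (sigma * math.sqrt(2.0 * math.pi))
    return norm * math.exp(-0.5 * ((x - center) / sigma) ** 2)


def group_model(x, continuum, components):
    """Constant continuum plus one Gaussian per (amplitude, center, sigma) tuple."""
    return continuum + sum(gaussian(x, *component) for component in components)


# Hypothetical two-line group (illustrative values, wavelengths in Angstrom).
# An absorption line would simply carry a negative amplitude.
lines = [(50.0, 6564.6, 2.0), (15.0, 6585.3, 2.0)]
peak = group_model(6564.6, continuum=1.0, components=lines)
far_from_lines = group_model(6000.0, continuum=1.0, components=lines)
```

Far from any line the model reduces to the continuum level, which is why the fitting window needs enough line-free spectrum to constrain the constant term.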

The centers of all Gaussian components can be fixed, constrained to user-defined intervals, or left as free parameters. An initial guess for the central wavelength of each Gaussian is used to identify the component. This initial guess is calculated from the user-provided redshift for each spectrum and the global list of lines at rest-frame wavelengths. To offer the flexibility to fit emission and absorption lines in a range of galaxy and AGN spectra, gleam relies on a single prior, the source redshift, for initializing the line fitting solution. In all but the brightest sources with the highest-S/N spectral lines, a spectroscopic-quality redshift is required.
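
The initial guess for each center follows from the standard redshift relation, observed wavelength = rest wavelength × (1 + z). A minimal sketch is given below; the dictionary of rest-frame wavelengths is an illustrative stand-in for the line table, with approximate vacuum values, and the function name is hypothetical.

```python
def observed_wavelength(rest_wavelength, redshift):
    """Shift a rest-frame wavelength to the observed frame."""
    return rest_wavelength * (1.0 + redshift)


# Illustrative stand-in for a line table (approximate vacuum wavelengths, Angstrom).
rest_frame = {"Hbeta": 4862.7, "Halpha": 6564.6}

# Initial guesses for a source at a hypothetical redshift z = 0.2.
guesses = {name: observed_wavelength(w, 0.2) for name, w in rest_frame.items()}
```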

A fit is accepted when every Gaussian component passes the user-specified S/N. When, due to noise and the overlap with sky lines, there is insufficient information in the data to fit the entire model, gleam iteratively removes Gaussian components in search of an acceptable fit. Any removed Gaussian components are treated as nondetections, and upper limits are computed for them.
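
The acceptance logic can be sketched as follows. This is only one possible reading of the procedure: the text does not specify the order in which components are removed, so the sketch simply tries progressively smaller subsets of lines. The `fit` callable, the function names, and the default S/N threshold of 3 are all assumptions made for illustration.

```python
from itertools import combinations


def passes_snr(amplitude, error, min_snr):
    """True when |amplitude| / error reaches the threshold (works for absorption too)."""
    return error > 0 and abs(amplitude) / error >= min_snr


def fit_with_fallback(line_names, fit, min_snr=3.0):
    """Try the full model first, then progressively smaller subsets of lines.

    `fit` is a hypothetical callable that takes a tuple of line names and returns
    {name: (amplitude, amplitude_error)}, or None if the fit did not converge.
    Returns (accepted fit results, names of lines treated as nondetections).
    """
    for size in range(len(line_names), 0, -1):
        for subset in combinations(line_names, size):
            result = fit(subset)
            if result and all(passes_snr(a, e, min_snr) for a, e in result.values()):
                dropped = [name for name in line_names if name not in subset]
                return result, dropped
    return {}, list(line_names)


def toy_fit(subset):
    """Stand-in for a real fitter: returns fixed amplitudes and errors."""
    table = {"Halpha": (50.0, 2.0), "NII_6585": (1.0, 2.0)}
    return {name: table[name] for name in subset}


detected, nondetections = fit_with_fallback(["Halpha", "NII_6585"], toy_fit)
```

In this toy case the joint model is rejected because the weaker line fails the S/N cut, so the line is dropped and would be reported as an upper limit.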

gleam employs LMFIT to perform the fitting (Newville et al. 2019). Through a nonlinear least-squares minimization using the Levenberg–Marquardt method, LMFIT enables the robust estimation of both model parameters and their errors.

3.2. Naming Convention

When handling large numbers of spectra coming from different observations, sometimes from different telescopes, it is important to adopt a consistent naming convention. gleam helps with this by prescribing a four-part hierarchical naming convention that allows for easy identification and grouping of spectra.

Each measured spectrum is uniquely identified by the combination of the following four properties.

  • Sample: a label for the parent sample for the source, e.g., the name of the parent galaxy cluster or famous field.
  • Setup: a label for the telescope, instrument, or mode used for the observation.
  • Pointing: an identifier for the individual pointing, fiber configuration, or slit configuration/mask the observation is part of.
  • SourceNumber: a source number to distinguish a target within a sample, setup, and pointing combination.
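
One way to picture the four-part identifier is as an immutable record that can be used as a dictionary key or grouped by any of its parts. The class name, field names, and sample labels below are illustrative and do not reflect how gleam represents identifiers internally.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SpectrumId:
    """Hypothetical container for the four-part naming convention."""

    sample: str
    setup: str
    pointing: str
    source_number: int


spectra = [
    SpectrumId("ClusterA", "DEIMOS", "mask1", 1),
    SpectrumId("ClusterA", "DEIMOS", "mask1", 2),
    SpectrumId("ClusterA", "Hectospec", "config3", 1),
]

# Group spectra by setup, e.g., to apply one spectral resolution per instrument.
by_setup = defaultdict(list)
for spectrum in spectra:
    by_setup[spectrum.setup].append(spectrum)
```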

3.3. Inputs

gleam uses five kinds of input files to gather information about the spectra and compute the properties of their spectral lines:

  1. a set of 1D spectra;
  2. metadata files that provide a reference redshift for each spectrum;
  3. a configuration file, which specifies the choices of spectral lines, fitting parameters, cosmological parameters, and sky masking;
  4. a line table with the rest-frame wavelengths of the spectral lines of interest; and
  5. (optional) a sky band catalog, with details of any wavelengths contaminated by sky absorption/emission.

In a single run, gleam can process spectra that originate from different data sets and might have different units for wavelength or flux. It is therefore highly recommended to include units in the headers of all spectra files. gleam propagates the units to the results. Line files and sky band files should also specify units in the file headers to ensure alignment with the spectra.

The metadata file contains information about individual spectra in the project, such as the setup and pointing they were observed with, a numeric identifier, and their redshift. The user may add custom columns to the metadata file to store other information about the spectra, such as the sky coordinates, quality flags, source types, etc. The project can have a single metadata file or multiple ones, as long as the spectra are uniquely labeled. Because gleam does not process the sky coordinates for sources, it cannot detect when two spectra pertain to the same source and, therefore, will produce separate independent fits for each input spectrum. It is incumbent upon the user to reason about which spectrum best fits their science requirements. When it is appropriate, another approach would be to combine/stack the relevant observations into a single spectrum before running gleam.

gleam can uniformly process large numbers of spectra, even with data taken in different conditions, with different instruments on different telescopes, and for a wide variety of sources. The configuration file is used to concisely describe how the different spectra should be processed, so they can be analyzed together. For easy editing and review, the configuration file for gleam uses the YAML format. Taking advantage of the naming convention and the many reasonable defaults, the user can tailor the analysis at three levels. The global-level parameters override the default configuration for all spectra. The setup level offers a way to apply configuration overrides to groups of spectra (named setups). This level can be used to capture differences between telescopes or instruments, such as the spectral resolution. At the most granular level, the user can customize parameters for individual sources. While per-source overrides can help account for some particular cases (e.g., a small percentage of sources with both narrow and broad emission lines), they should be used sporadically due to the associated typing burden and in the spirit of keeping the results comparable. The model parameters for each spectrum are computed by stacking the applicable overrides on top of the default in order: first, the global overrides, then any applicable per-setup overrides, and, finally, any applicable per-source overrides.
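
The stacking order described above can be sketched with nested dictionaries. All key names (`globals`, `setups`, `sources`, `mask_sky`, `fitting`, `resolution`), the source label format, and the recursive merging of nested sections are assumptions made for illustration; gleam's actual configuration schema is documented on its project page.

```python
from copy import deepcopy


def merge(base, override):
    """Recursively overlay `override` on `base` without mutating either input."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve(defaults, config, setup, source):
    """Stack overrides in order: defaults, then global, per-setup, per-source."""
    params = merge(defaults, config.get("globals", {}))
    params = merge(params, config.get("setups", {}).get(setup, {}))
    params = merge(params, config.get("sources", {}).get(source, {}))
    return params


# Hypothetical defaults and overrides.
defaults = {"mask_sky": False, "fitting": {"snr": 3.0, "center": "constrained"}}
config = {
    "globals": {"mask_sky": True},
    "setups": {"DEIMOS": {"resolution": 1.5}},
    "sources": {"ClusterA.DEIMOS.mask1.7": {"fitting": {"center": "free"}}},
}
params = resolve(defaults, config, "DEIMOS", "ClusterA.DEIMOS.mask1.7")
```

Because later levels are applied last, a per-source value wins over a per-setup value, which in turn wins over a global one, while untouched defaults carry through unchanged.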

gleam cannot specify any default for the line table and instrumental resolution, so this information needs to appear at some level in the configuration file. With these two fields, we present a minimal working gleam configuration example:

gleam fits all lines listed in the line table and iteratively eliminates model components when the data do not yield satisfactory fits for all lines. An S/N parameter, which defines the minimum accepted ratio between the estimated amplitude of a component and its error, separates detections from upper limits for each spectral line. In some setups, the user may select a starting subset of the lines and avoid unnecessary trials (e.g., excluding faint lines the user does not expect to be detected).

By default, there is no sky masking, and the entire spectrum is used. However, the user may control whether the model should ignore portions of the spectrum where sky bands may not have been reliably subtracted. These bands are masked and disregarded for fitting and treated as if no spectral coverage is available. For example, for a data set where sky subtraction only failed for a few of the spectra, the configuration may specify the sky band catalog at the global level but only turn masking on for individual sources (or setups) for which the sky subtraction is inadequate.
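
Conceptually, masking amounts to discarding every sample whose wavelength falls inside a listed band before fitting. The sketch below uses plain lists and a hypothetical band; it does not reproduce the format of gleam's sky band catalog.

```python
def mask_sky(wavelengths, fluxes, bands):
    """Drop samples whose wavelength falls inside any (low, high) band."""
    kept = [
        (w, f)
        for w, f in zip(wavelengths, fluxes)
        if not any(low <= w <= high for low, high in bands)
    ]
    return [w for w, _ in kept], [f for _, f in kept]


# Illustrative band (Angstrom), loosely placed near the telluric A band.
bands = [(7590.0, 7650.0)]
wave = [7500.0, 7600.0, 7640.0, 7700.0]
flux = [1.0, 0.2, 0.3, 1.1]
clean_wave, clean_flux = mask_sky(wave, flux, bands)
```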

gleam offers users a lot of flexibility in choosing the way line models are fit to the data. Neighboring Gaussian components can be fit jointly to account for nearby or blended lines. The user can also define the amount of continuum to be fitted on either side of a group, which should be large enough to encompass enough line-free continuum. Over the range specified, the continuum should be well approximated by a constant. Any unrelated lines that fall within the selected continuum are automatically masked. The center of each Gaussian component is first estimated based on the rest-frame wavelength in the line catalog and on the redshift estimate of the spectrum (listed in the metadata file). The Gaussian center can be fixed to the initial guess, it can be allowed to vary within a small range around it, or it can vary freely within the spectral range of data considered when fitting. This final option can lead to lines being mislabeled or cross-labeled, so it should only be used when the redshift estimate for the spectrum is so poor that neither of the two other options is feasible.
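
The three options for the Gaussian center map naturally onto parameter bounds in a least-squares fitter. In the sketch below, the mode labels (`fixed`, `constrained`, `free`), the function name, and the numerical values are hypothetical and chosen only to mirror the description above.

```python
def center_bounds(guess, mode, half_width, window):
    """Return (minimum, maximum, vary) for the center of one Gaussian component.

    guess      -- initial center from the rest wavelength and the redshift
    mode       -- "fixed", "constrained", or "free" (hypothetical labels)
    half_width -- allowed excursion around the guess in "constrained" mode
    window     -- (low, high) wavelength range of the data considered in the fit
    """
    if mode == "fixed":
        return guess, guess, False
    if mode == "constrained":
        return guess - half_width, guess + half_width, True
    if mode == "free":
        low, high = window
        return low, high, True
    raise ValueError(f"unknown center mode: {mode!r}")


bounds = center_bounds(7877.5, "constrained", half_width=5.0, window=(7800.0, 7950.0))
```

Keeping the interval narrow is what prevents neighboring components from swapping places, which is the mislabeling risk noted for the free option.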

To report luminosities based on the fitted models, gleam uses a set of cosmological parameters: Hubble constant (H0), Ω0, and the cosmic microwave background temperature. While the default values for these parameters are reasonably accurate and up to date, some projects may require slightly different values. The cosmology section overrides one or more of these parameters, and the resulting cosmology is then used consistently across all spectra within a project. This is the only set of overrides that cannot be made on a per-setup or per-source basis, since doing so could produce results that are not comparable between sources.
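
To show why the cosmology matters for the reported luminosities, the sketch below converts a line flux into a luminosity through L = 4π d_L² F. It is a simplified stand-in, not gleam's implementation: it assumes a flat ΛCDM universe, ignores radiation (and therefore the cosmic microwave background temperature), and uses illustrative parameter values (H0 = 70 km/s/Mpc, matter density 0.3) that are not claimed to be gleam's defaults.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s
MPC_IN_CM = 3.0856775814913673e24  # one megaparsec in centimeters


def luminosity_distance_mpc(z, h0=70.0, omega_m=0.3, steps=1000):
    """Flat LambdaCDM luminosity distance in Mpc (trapezoidal integration)."""

    def inv_e(zp):
        return 1.0 / math.sqrt(omega_m * (1.0 + zp) ** 3 + (1.0 - omega_m))

    dz = z / steps
    integral = sum(
        0.5 * (inv_e(i * dz) + inv_e((i + 1) * dz)) * dz for i in range(steps)
    )
    return (1.0 + z) * C_KM_S / h0 * integral


def line_luminosity(flux_cgs, z, **cosmology):
    """Luminosity in erg/s from a flux in erg/s/cm^2 at redshift z."""
    distance_cm = luminosity_distance_mpc(z, **cosmology) * MPC_IN_CM
    return 4.0 * math.pi * distance_cm**2 * flux_cgs


distance = luminosity_distance_mpc(0.2)
luminosity = line_luminosity(1e-16, 0.2)
```

Changing H0 rescales every luminosity by the same factor, which is why a single cosmology per project keeps sources comparable.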

3.4. Outputs

For each of the sources in the sample, gleam produces a FITS table with all of the line fits and upper limits (with units derived from the input data, if available). Each line fitted is represented in a separate row, with all corresponding line fit details contained in different columns. The output table contains fit parameters and their associated errors (such as the continuum estimation, central line wavelength, Gaussian height, standard deviation, and amplitude), line fluxes, luminosities, and equivalent widths. Each row also contains a flag for spectral coverage (i.e., whether the line is covered by the input spectrum) and another to indicate detection (whether the line is detected above the required S/N). Fit values and errors are omitted when the spectrum does not cover the spectral line. If a line is not detected, gleam only reports an upper limit in the amplitude column and omits all other Gaussian fit parameters. The deconvolved and velocity FWHMs are only reported if the line is spectrally resolved.
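
Several of the tabulated quantities follow from the Gaussian parameters through standard relations: the peak height is the area divided by σ√(2π), the FWHM is 2√(2 ln 2) σ, and a resolved line can be deconvolved by subtracting the instrumental FWHM in quadrature. The sketch below applies these textbook formulas; the function name, the sign convention for the equivalent width, and the choice to base the velocity width on the deconvolved FWHM are our assumptions rather than a description of gleam's code.

```python
import math

C_KM_S = 299792.458  # speed of light in km/s
FWHM_FACTOR = 2.0 * math.sqrt(2.0 * math.log(2.0))


def derived_quantities(amplitude, center, sigma, continuum, resolution_fwhm):
    """Derive line properties from a Gaussian with integrated area `amplitude`.

    `continuum` must be nonzero for the equivalent width to be defined.
    """
    fwhm = FWHM_FACTOR * sigma
    out = {
        "height": amplitude / (sigma * math.sqrt(2.0 * math.pi)),
        "flux": amplitude,
        "fwhm": fwhm,
        "equivalent_width": amplitude / continuum,
    }
    # Deconvolved and velocity widths are only meaningful for resolved lines.
    if fwhm > resolution_fwhm:
        deconvolved = math.sqrt(fwhm**2 - resolution_fwhm**2)
        out["deconvolved_fwhm"] = deconvolved
        out["velocity_fwhm"] = C_KM_S * deconvolved / center
    return out


resolved = derived_quantities(50.0, 7877.5, 2.0, 1.0, resolution_fwhm=4.0)
unresolved = derived_quantities(50.0, 7877.5, 1.0, 1.0, resolution_fwhm=4.0)
```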

If plotting is enabled, gleam produces two types of figures. The first type of figure shows the entire spectrum with zoom-ins on the emission and absorption line fits (see Figure 1). The second type of plot is focused on each line fit. Masked sky areas are shaded gray for clarity.

Figure 1.  gleam emission and absorption line fits, highlighting different source types and origin telescopes.

4. Example Applications

We demonstrate gleam's capabilities by showcasing two natural applications that also served as test beds during the code development.

4.1. A Large, Heterogeneous Sample of Extragalactic Spectra

gleam is well suited for measuring emission and absorption lines in large, heterogeneous samples of extragalactic 1D spectra. In Stroe & Sobral (2021), we thoroughly tested gleam on all of the spectroscopy available to us, which included about 4200 passive galaxies, star-forming galaxies, AGNs, and quasars at redshifts from zero to ∼1. The data were taken with four different instruments (VLT/VIMOS, WHT/AF2, Keck/DEIMOS, and MMT/Hectospec) that employ different techniques to achieve MOS capabilities (slits, fibers) under a range of sky, weather, and seeing conditions. In this project, the focus was on measuring optical emission lines, such as [O ii] (3728 Å), Hβ, [O iii] (4960, 5007 Å), Hα, [N ii] (6550, 6585 Å), and [S ii] (6718, 6733 Å), with the spectral resolution for all instruments being sufficient for separating nearby narrow emission lines.

For the entire sample of 4200 sources, a simple, short configuration file was sufficient, as illustrated below. The fitting constraints, the set of spectral lines to be fit, and the sky lines to be masked were the same for the bulk of the sources. With gleam, it was easy to set a different resolution for each instrumental setup and, when necessary, add, for example, a different continuum width (which resulted in more stable fits), turn off sky masking (when data reduction adequately corrected for the sky absorption), or use a different line table (when an air versus a vacuum wavelength calibration was applied to the data). We also set overrides for several individual sources. For example, we fit the full line list when [N ii] was also present in the data or when lines were blended. The YAML configuration file for this example application can be found below and demonstrates all of the customization options that gleam provides. Examples of line fits can be found in Figure 1.

Without plotting, the fitting for 10 spectral lines (with Hα + [N ii] and the [S ii] doublet fit jointly) for the sample of >4000 sources could be completed in less than 20 minutes, using six threads on a modern laptop with a 2.9 GHz 6-Core Intel Core i9 processor. With plotting, the process can take up to 3 hr.

4.2. IFU Observations

Very powerful Python wrappers tailored for the analysis of IFU data exist in the literature (e.g., GIST; Bittner et al. 2019). Their design is tailored to accomplish complex applications, such as the detailed modeling of absorption lines in passive galaxies, which require input spectra with good continuum detections. gleam does not make assumptions on the underlying physics of the sources, making it a complementary tool for emission line–dominated sources that do not benefit from high-S/N continuum detections.

In Stroe et al. (2020), gleam was used to measure spectral lines in Gemini/GMOS IFU observations of five emission line–dominated cluster galaxies at z ∼ 0.2. For analysis with gleam, the IFU cube for each of the sources was split into individual spaxels. The nature of the project required slight reinterpretations of the naming convention. The Sample component was set to the name of the parent galaxy cluster, Pointing was used to specify the galaxy, while SourceNumber was used to label each spaxel in the IFU. Coordinates for each spaxel were added to the metadata file to track the connection between the spaxels and their sky positions with respect to each galaxy. This is a good example of using the metadata file for storing more than just the redshift information. With this interpretation of the naming convention, the YAML configuration for the project could be specified in just a few lines:

5. Development Model

gleam is developed as an open-source project hosted at https://github.com/multiwavelength/gleam and published under the permissive BSD-3-Clause License. The authors welcome community contributions in the form of bug reports and feature suggestions, as well as code contributions under the same license via the GitHub pull-request system. gleam is written in the Python programming language, which is a popular choice both within and outside the astronomical community, with the hope that interested contributors would find it easy to get started. The project aims to offer an inclusive and welcoming place for collaboration and has adopted the Astropy Community Code of Conduct.

6. Future Developments

In the near future, we will explore a number of natural extensions to gleam.

In its first iteration, gleam was envisioned to work out-of-the-box for most extragalactic science cases, hence the choice of Gaussian models for the fitting. In the future, we plan to expand the choices for component types with other models, such as Voigt (e.g., absorption lines toward quasars) or asymmetric (e.g., Lyα emission) profiles and more complex continuum models.

We aim to also provide better support for multiple component fits, such as when both broad and narrow lines are present in an AGN spectrum. In its iterative refinement of the model, gleam removes Gaussian components in a sequential fashion. As such, the most complex model that converges is passed through the S/N criterion to identify detected spectral lines. As evidenced in Section 4, gleam robustly fits well-separated spectral lines at nonredundant wavelength separations. For scenarios in which many (>3) blended or nearby lines at lower S/N are jointly fit, sometimes lines are cross-identified. Incorrect matching/labeling of spectral lines can be avoided by making use of constraints on the center of each Gaussian component. In the future, a number of new additions to gleam will ensure successful and correct fits to a wider variety of science cases. At the moment, line fitting in gleam does not take into account line ratio predictions from radiative modeling and, as such, allows for any ratio between spectral lines. The option for a tighter coupling between line ratios could be desirable for specific science cases or, for example, low-S/N regimes.

Another direction for development would be to investigate other back ends for performing the line fitting, e.g., astropy.modeling from Astropy, which was not available at the time the main code development was occurring for gleam. This approach would enable a closer integration with the Astropy suite of packages.

Further, as mentioned in Section 4, we tested gleam on a variety of optical and infrared observations. In the near future, we will test its robustness when applied to data at other wavelengths, such as radio (e.g., focusing on H i observations) and submillimeter observations (e.g., molecular and atomic lines, especially at high redshift).

The authors are grateful to the referee for constructive suggestions that improved the paper. A.S. gratefully acknowledges the support of a Clay Fellowship. gleam heavily relies on a number of scientific Python dependencies, including Astropy, LMFIT, Matplotlib, and NumPy. Its development makes use of other packages, tools, and services, including git, Poetry, Mypy, Black, Colorama, pydantic, Click, PyYAML, GitHub, and VSCode. In testing the software, we made use of observations obtained with the International Gemini Observatory, the ESO Telescopes at the La Silla Paranal Observatory, the William Herschel Telescope, the W. M. Keck Observatory, and the MMT Observatory.

Software: gleam (Stroe & Savu 2020), Matplotlib (Hunter 2007), Astropy (Astropy Collaboration et al. 2013), LMFIT (Newville et al. 2019), NumPy (Harris et al. 2020).
