The EDGE-CALIFA Survey: An Extragalactic Database for Galaxy Evolution Studies

The EDGE-CALIFA survey provides spatially resolved optical integral-field unit and CO spectroscopy for 125 galaxies selected from the Calar Alto Legacy Integral Field Area Survey (CALIFA) Data Release 3 sample. The Extragalactic Database for Galaxy Evolution (EDGE) presents the spatially resolved products of the survey as pixel tables that reduce the oversampling in the original images and facilitate comparison of pixels from different images. By joining these pixel tables to lower-dimensional tables that provide radial profiles, integrated spectra, or global properties, it is possible to investigate the dependence of local conditions on large-scale properties. The database is freely accessible and has been utilized in several publications. We illustrate the use of this database and highlight the effects of CO upper limits on the inferred slopes of the local scaling relations between the stellar mass, star formation rate (SFR), and H2 surface densities. We find that the correlation between H2 and SFR surface density is the tightest among the three relations.


INTRODUCTION
Over the past two decades, observational surveys have established a number of key scaling relations among Wong et al.
H I and CO emission, tracing the atomic and molecular gas respectively.At the same time, studies which combine SFR and gas measurements have been critical to understanding the cycling of gas in galaxies, and form the basis of another important scaling relation, the "star formation law," linking the SFR with gas content (Kennicutt & Evans 2012).Studies of the integrated gas content of galaxies, e.g.xCOLD GASS (Saintonge et al. 2017), have established that the molecular gas mass M H2 is the main determinant of a galaxy's offset in SFR relative to the main sequence (Saintonge & Catinella 2022).Thus, for main-sequence galaxies, all three of the quantities M H2 , M * , and SFR are tightly correlated.Spatially resolved studies of main-sequence galaxies, although limited to smaller samples, have established that these correlations are also observed among the mass surface densities of H 2 , stars, and SFR within galaxies (Wong & Blitz 2002;Bigiel et al. 2008;Wong et al. 2013;Cano-Díaz et al. 2016;Lin et al. 2019).
The Extragalactic Database for Galaxy Evolution (EDGE) survey (Bolatto et al. 2017) was the first galaxy survey to couple CO interferometry, from the Combined Array for Research in Millimeter-wave Astronomy (CARMA), with a large optical program of integral-field spectroscopy (IFS), the Calar Alto Legacy Integral Field Area Survey (CALIFA) (Sánchez et al. 2016a).CAL-IFA provides a full suite of spatially-resolved spectroscopic indicators for abundances, ionization, extinctioncorrected star formation rate, stellar population ages, star formation histories, and kinematics (Sánchez et al. 2016b) for a well-defined sample that is designed to be representative of the z=0 universe (Sánchez et al. 2012).EDGE provides the key molecular gas information for a sub-sample of infrared-selected CALIFA galaxies.Here we briefly summarize some of the main scientific highlights of EDGE.Globally, and consistent with previous work (Saintonge et al. 2011), the EDGE-CALIFA galaxies show a positive correlation between the molecular gas depletion time (τ dep,mol ) and M * (Bolatto et al. 2017), implying that more massive galaxies are less efficient at turning their observed molecular gas into stars.On resolved (∼1-2 kpc) scales, the depletion time is also longer in early-type and massive galaxies (Bolatto et al. 2017;Colombo et al. 2018;Villanueva et al. 2021), but shows no clear relationship with stellar mass surface density (Σ * ) or the Toomre Q instability parameter for stars and gas, and indeed both increases and decreases in τ dep,mol with radius are observed (Utomo et al. 2017).These results underscore the influence of both local and global galaxy properties for the star formation efficiency.The EDGE-CALIFA data also enable detailed studies of CO kinematics, which have been used to validate common approaches to deriving the circular velocity from stellar kinematics (Leung et al. 2018), and demonstrate the influence of thick ionized gas disks on Hα rotation curves (Levy et al. 2018(Levy et al. , 2019)).More recently, the EDGE-CALIFA collaboration has obtained additional CO observations of CALIFA galaxies with the Atacama Pathfinder Experiment (APEX) and the Atacama Compact Array (ACA), in Colombo et al. (2020), Sánchez et al. (2021), Garay-Solis et al. (2023), andVillanueva et al. (2023), but these data sets are not discussed in the present paper.
The need to efficiently manage both spatially resolved and global quantities spurred the design and development of a custom database for use within the EDGE collaboration.The key requirement was to provide, for each galaxy in the sample, a set of global values, 1-D profiles and 2-D images, in separate tables that were easily joined for analysis.Over the course of the EDGE project, a number of intermediate databases have been generated, the most important being the SQL database used in Bolatto et al. (2017), Dey et al. (2019), and Cao et al. (2023).While SQL is a powerful and wellestablished database query language, the modest size of the EDGE repository led us to ultimately adopt a simpler approach based on Astropy tables (Astropy Collaboration et al. 2013Collaboration et al. , 2018Collaboration et al. , 2022)).A pure Python implementation simplifies both the generation of and interaction with the database.The Python-based EDGE-CALIFA database (edge pydb) presented in this paper is the successor to the SQL database, retaining essentially the same information aside from the use of a square rather than hexagonal sampling grid and important differences in handling of the CALIFA data described in §4.Preliminary versions of the Python-based database have been used to produce results presented in Sánchez et al. (2021), Barrera-Ballesteros et al. (2021), Villanueva et al. (2021), and Ellison et al. (2021).An important aspect of our approach is that the format of the tables is largely secondary; we have found the Enhanced CSV (ECSV) format suitable for the global and 1-D profiles and the Hierarchical Data Format version 5 (HDF5) format suitable for 2-D images, but any format that supports preservation of metadata (including FITS) would also suffice.While HDF5 offers improvements relative to FITS in terms of input/output speed and compression, this is of secondary importance for our purposes compared to offering a clear distinction between the original image data and the derived data tables.
In this paper we provide a detailed description of the construction of the database ( §2), discuss some scientific applications ( §3), and provide a comparison with the previous-generation EDGE database ( §4).We summarize our results in §5.

DATABASE CONSTRUCTION
2.1.Input data products Bolatto et al. (2017) presented the original EDGE-CALIFA sample (hereafter simply the "EDGE sample"), consisting of 126 galaxies from Data Release 3 (DR3) of the CALIFA IFU survey (Sánchez et al. 2016a) that were observed in CO and 13 CO by CARMA.To briefly summarize, the CALIFA main sample of 667 galaxies was randomly selected for observation from a 937galaxy "mother sample" derived from diameter (r band isophotal major axis between 45 ′′ and 80 ′′ ) and redshift (0.005 < z < 0.03) cuts applied to the seventh data release (DR7; Abazajian et al. 2009) of the Sloan Digital Sky Survey (SDSS).The EDGE sub-sample was additionally selected to have high infrared flux (as measured by the WISE 22µm survey, Wright et al. 2010) and a possible CO detection in CARMA's E configuration (used to initially survey 177 galaxies) was a prerequisite for continuing with CARMA observations in the D configuration.As a result, the EDGE galaxies tend to have higher gas content and star formation activity compared to the overall CALIFA main sample.More details about the EDGE sample and the CARMA observations are provided in Bolatto et al. (2017).
For each galaxy we obtain three data cubes, an optical cube covering 4200-7100 Å (unvignetted) with a channel width of 2 Å and a FWHM resolution of 6 Å (CALIFA 'V500' setup), and CO and 13 CO cubes covering a typical velocity range of 860 km s −1 (centered on the galaxy's systemic velocity) with a channel spacing of 20 km s −1 derived from downsampling the original 3.4 km s −1 (CO) and 14.3 ( 13 CO) km s −1 spectral channels.The field of view for CARMA EDGE extends to offsets of ∼50 ′′ (50% sensitivity contour) while the hexagonal field of view for CALIFA extends to ∼35 ′′ .CALIFA data for one galaxy, NGC 2486, were excluded from DR3 due to poor quality, and the corresponding CARMA data show only a marginal detection of CO.We therefore drop this galaxy from the final database, leaving a sample of 125 galaxies.The 13 CO cubes have much lower signal-to-noise (owing to the faintness of the line) and are analyzed in a separate paper (Cao et al. 2023).While we focus on the optical and CO data in this paper, similar data products are available for 13 CO.
We note that the native spectral resolution of the CO data is ∼5 km s −1 , and that the EDGE collaboration has also generated cubes with 10 km s −1 channel spacing.These have been used for particular applications where velocity resolution is critical (e.g., studies of CO kinematics), but for most applications a 20 km s −1 channel spacing is preferred because of improved sensitivity per channel.The edge pydb database is therefore based on cubes that employ a 20 km s −1 channel spacing.Similarly, higher spectral resolution CALIFA products are also available using the 'V1200' spectral setup, but these are over a more limited velocity range (excluding the important Hα line), so these are not included in the general-purpose database.
While the CO cubes have a typical synthesized beam FWHM of 4. ′′ 5 (Bolatto et al. 2017), given the ellipticity of the beams we convolve to a common 7 ′′ curcular beam to produce the database.At this resolution, the typical 3σ sensitivity limit of the CO data is 3 K km/s, corresponding to 13 M ⊙ pc −2 of molecular gas.Matching to 7 ′′ resolution requires degrading the CALIFA images from their native resolution of approximately 2. ′′ 5 (FWHM), so we also construct a CALIFA-only database at the native resolution, as discussed further below.

Overall database architecture
In constructing a database for legacy reuse, it became clear that the data can span a dimensionality from 0 (single value for a whole galaxy) to 3 (values that depend on sky position and radial velocity).To minimize redundancy, it is best to store the data in its lowest dimensional form, and replicate it along additional axes only as needed.Given our modest sample size, we found that tables of dimensionality 0 and 1 could effectively be stored as plain text, in the self-documenting Enhanced Character Separated Values (ECSV) format.For the EDGE sample, a 0-dimensional table contains only ∼100 rows, and a 1-dimensional table (e.g.radial profiles, rotation curves, or integrated spectra) will increase the number of rows by a factor of 10-40, which is still manageable in textual format.Higher dimensional tables are stored in a binary compressed format (HDF5).
Even for data of the same dimensionality, it can be advantageous to create separate data tables, corresponding to different data sets or analyses which may be updated over time.For example, a table of global photometric parameters from SDSS can be maintained separately from similar parameters derived from CALIFA or the NASA Extragalactic Database (NED).Alternatively, one may wish to create separate versions of a table corresponding to different resolutions of the same data, or different ways of masking a spectral data cube.Tables can then be joined as needed to avoid handling unnecessarily large tables.To facilitating the joining process, we use standard names for the index columns that are used to match different tables, and we try to avoid duplication of column names for tables that are likely to be joined.Finally, in addition to the tables themselves, we have created a Python EdgeTable class which is a subclass of the astropy.table.Table class.This allows an experienced Python user to immediately get to work joining and plotting the tables.We have added some basic plotting capabilities that are demonstrated in a set of accompanying Jupyter notebooks.The class is made available through the Python package edge pydb1 which can be installed via the standard Python Package Index (PyPI) package manager (using the pip command).The package includes the full Python source code needed to regenerate the database from FITS images.
We discuss the components of the database in further detail below.We first describe the global parameter (0-dimensional) tables, followed by the 2-and 3dimensional image tables, and finally the 1-dimensional profiles (which are generally based on analysis of the higher dimensional data).

Global parameter tables
The global parameter tables are organized into four groups, as described below.
Characteristics of radio observations.-These include the characteristics of the CO data such as beam size, integration time, channel noise, systemic velocity adopted for the observation, and integrated line fluxes for H I, CO and 13 CO emission.H I fluxes in units of Jy km s −1 are derived from spectra obtained in GBT program 15B-287 (PI: D. Utomo) and from archival spectra published by van Driel et al. (2001), Springob et al. (2005), and Masters et al. (2014).Details about the GBT observations will be presented in Wen et al. (2024, in preparation).CO and 13 CO fluxes are provided in the same units and are measured from cubes at both the native resolution (edge coflux natv.csv) and a common resolution of 7 ′′ (edge coflux smo7.csv).Multiple CO tables with similar file names (e.g., edge coobs D.csv, edge coobs E.csv, and edge coobs DE.csv) are present because CARMA observations were conducted in the E configuration for 177 galaxies and in the D configuration for 126 galaxies, yielding a set of lowresolution cubes derived from E array data only, in addition to the principal set containing both D and E array data.Furthermore, the D+E cubes were generated at two velocity resolutions (10 km s −1 and 20 km s −1 ) and at two angular resolutions (native elliptical beam and a common 7 ′′ circular beam).While characteristics of these different cubes are provided in separate ECSV files or table columns, the user should note that only the combined (D+E) products with a 7 ′′ beam and 20 km s −1 channels are provided as 2-D and 3-D image tables.
External databases.-These tables collect galaxy properties summarized in the LEDA2 (Makarov et al. 2014) and NED3 databases, including coordinates, morphology, orientation parameters, diameter, redshift, and distance.Tables of WISE photometry from Bitsakis et al. (2019), corrected following the discussion in Levy et al. (2019, their §5.2), and Galactic extinction from Schlegel et al. (1998) are also provided.
The LEDA table is of particular importance because it provides the primary source for the coordinates of the galaxy centers, as well as the major axis position angles and inclinations, the latter derived from the apparent axis ratio using Equations (1) and (2) from Bottinelli et al. (1983).These orientation parameters are used to determine the galactocentric polar coordinates that are provided in the 2-D table (e.g., Table 2).A Python function (edge pydb.conversion.gcpolr) is provided  Figure 1, derived from the CALIFA global table, shows the location of our sample galaxies in the conventional "main sequence" diagram of SFR vs. stellar mass (M * ), and the strong observed correlation between a galaxy's offset from the main sequence and the observed molecular gas to stellar mass ratio.This figure demonstrates that molecular gas content has a strong effect on the location of a galaxy relative to the main sequence.
Derived parameters.-These include galaxy properties derived from analyses conducted by the EDGE team.Examples include orientation parameters and systemic velocities from rotation curve fitting (Leung et al. 2018;Levy et al. 2018) and half-light radii derived from integrating flux in elliptical rings (Bolatto et al. 2017).

CO image tables
The image tables (saved in HDF5 format) are obtained by sampling FITS images.For matching of the CARMA and CALIFA data, the coordinate grid is established by the original CO cubes, which were imaged with 1 ′′ pixels in an orthographic projection in J2000 coordinates.The spatial extent of the cubes has been truncated to 128 × 128 pixels; this is sufficient to cover both the CARMA and CALIFA fields of view.In addition to the CO cubes themselves, we sample moment images that are derived from the cubes following a masking process.Three different masking approaches are employed, with the results from each approach saved in a separate table ('path') in the HDF5 file.The paths are labeled as follows: • comom str: This is a "straight" moment calculation (no masking employed).Only the 0th moment (integrated intensity) is included, as the higher moments are largely unusable due to noise in the spectrum.Even the 0th moment is typically dominated by noise and is useful primarily as a rough guide to what signal would be clearly detected in the absence of any masking.
• comom dil: Data are first masked using a "dilated" approach (expanding from a 3.5σ threshold over 2 consecutive channels out to a 2σ threshold, with a minimum area of 2 synthesized beams for each mask region) before moments are calculated.
For 13 CO the dilated mask from the CO cube is applied.This table also includes peak signalto-noise ratio images as snrpk 12 and snrpk 13.
Although these are constructed without masking (and should thus logically be included in the comom str table), they are included in this table as most users will adopt the dilated masks for initial data inspection.
• comom smo: This is a "pre-smoothing" approach where the cube is first convolved to a resolution of 14 ′′ before generating a dilated mask as described above.This generally yields a more permissive mask than comom dil, which is then applied to the unsmoothed data before the moments are calculated.For 13 CO the corresponding mask from the CO cube is applied.
For both "dilated" and "smoothed" approaches, an additional mask expansion of 1 pixel in all 3 dimensions (2 pixels for the smoothed approach) is introduced to pick up additional weak emission.Since the masks are 3-D they are included with the cube data rather than the image tables.The overall approach to moment map generation remains as described by Bolatto et al. (2017), but using a Python implementation (maskmoment4 ) to remove a previous dependency on IDL.
Figure 2 shows how the masking methods compare for NGC 4047.The tables above also include estimates of molecular gas surface density (sigmol) derived from the CO moment-0 values by applying a constant COto-H 2 conversion factor (see §2.6.3).Note that tabulated intensities and surface densities have not been deprojected to face-on values, although a cosi column (representing the cosine of the inclination) is provided which can be multiplied by any column to yield a deprojected value based on the LEDA derived inclination ( §2.3).A summary of the columns currently provided in the comom dil table is provided in Table 2.For the CALIFA data, we sample a subset of the Pipe3D analysis products rather than the IFU cubes directly.Separate tables ("paths") for the Pipe3D products labeled as ELINES, flux elines, SFH, SSP, and indices (see Sánchez et al. 2016b) are provided.Each of these products is ingested as a FITS file (or as an extension in a multi-extension FITS file) which is essentially a concatenation of individual images into a pseudo-cube.Tables 3 and 4 summarize the columns that are provided in the SSP and flux elines tables.Pipe3D was run on two sets of cubes, the native-resolution DR3 cubes and a set of cubes obtained by convolving the DR3 cubes to a Gaussian PSF (7 ′′ FWHM) to match the CARMA data.To perform the convolution, we model the CAL-IFA PSF as a Moffat profile with the characteristics provided in the FITS header; in cases where this was not available, we assume a Moffat profile with FWHM 2. ′′ 5 and β M =2.15, which represent the mean and weighted median of the CALIFA DR3 sample as measured by Sánchez et al. (2016a).We store the native and convolved data in separate HDF5 files as detailed below.The images that are matched to the CARMA resolution have been interpolated onto the CARMA astrometric grid, using nearest-neighbor sampling in the reproject Python package, before pixel extraction.

Full pixel databases
For users that wish to recover all pixels from the FITS images, at the cost of significant oversampling, we provide large HDF5 files that contain the complete set of 1 arcsec 2 pixel values in tabular format.The edge carma allpix.pipe3d.hdf5file contains all pixels in the native-resolution Pipe3D images, without regridding to match the CARMA astrometry.The edge carma allpix.2dsmo7.hdf5file, which was used to generate Figure 2, contains all pixels in the matched (7 ′′ FWHM) resolution CO and Pipe3D images, with CARMA and CALIFA data placed in separate "paths" (tables) in the HDF5 file.

Square grid downsampling
Given that a Gaussian beam with FWHM 7 ′′ spans an area of 55.5 arcsec 2 , we are heavily oversampling the beam with our 1 ′′ pixels.We therefore generate smaller tables which sample one of every three pixels in R.A. and Dec. (3 ′′ spacing), reducing the number of pixels per galaxy from 128 2 to 43 2 (a factor of 8.9).We still cover each beam area with slightly more than 6 sampling pixels.Combining the 125 galaxies, the image tables contain 43 2 × 125 = 231 125 rows of sampled image data.The edge carma.2dsmo7.hdf5file contains these extracted pixels from the matched-resolution CO and Pipe3D images.The edge carma.cocubesmo7.hdf5file contains the corresponding CO spectra and CO mask information at the extracted pixels.

Hexagonal grid sampling
Wong et al. factor to deproject to face-on using ledaAxIncl 0 snrpk 13 13 CO peak signal to noise ratio 648 mom0 13 K km s −1 13 CO integrated intensity using dilated mask 1657 e mom0 13 K km s −1 13 CO error in mom0 assuming dilated mask 1657 mom1 13 km s −1 13 CO intensity weighted mean velocity using dilated mask 1673 e mom1 13 km s −1 13 CO error in mom1 assuming dilated mask 1670 mom2 13 km s −1 13 CO intensity weighted vel disp using dilated mask 1731 e mom2 13 km s −1 13 CO error in mom2 assuming dilated mask 1731 a Number of undefined pixels for NGC 4047, out of a maximum of 43 2 = 1849.This will differ among galaxies and is only listed for illustrative purposes.
In addition to the square grid, we have also implemented an option to sample the CARMA and CALIFA data on a hexagonal grid (file edge carma hex.2d smo7.hdf5).For a uniform hexagonal grid, the minimum spacing between grid points can be increased by a factor of 2/ √ 3 compared to a square grid while still achieving the same level of oversampling.Multiplying this factor by 3 ′′ yields a 3.5 ′′ spacing between adjacent hexagonal cells.The reference pixel of the CO image is placed at the (0,0) position of the grid.The grid is constructed using two vectors as shown in Figure 3. Hexagonal cells are created first along the diagonal line and then created in the vertical directions.
Interpolation of the original square pixel values onto the grid is done by calculating the distance from the center of the hexagon to the centers of the four nearest pixels, then weighting the pixel values by inverse distance.This approach differs slightly from the hexagonally sampled SQL database, which also uses a 3. ′′ 5 nearest-neighbor spacing but does not apply interpolation, instead adopting the value of the 1 ′′ pixel closest to each hexagonal grid point.A comparison of the three different sampling approaches (all pixels, square grid sampling, and hexagonal grid sampling) is provided in Figure 4.The comparison shows that the latter two sampling methods provide largely equivalent information to full-pixel sampling in a much more compact data format.

Derived quantities and uncertainties
In addition to the CO intensity data and the Pipe3D output, we include additional derived columns in the database as described below.The star formation surface density Σ SFR is derived from Hα surface brightness using the conversion factor from Equation 5 where we adopt K Hα = 2.53 and K Hβ = 3.61 based on the extinction curve of Cardelli et al. (1989).The uncertainty in this quantity is determined by propagating the errors of the Hα and Hβ intensities assuming these errors are uncorrelated.The third estimate (sigsfr adopt) applies spatial smoothing (by a 2D Gaussian kernel with a standard deviation of 3 ′′ ) to the Hα and Hβ images before their ratio is taken to obtain the Balmer decrement.Smoothing reduces noise in the resulting estimate of A(Hα), increasing the fraction of inferred A(Hα) values between 0 and 2 mag from 54% to 60%; for spaxels Wong et al. classified as star-forming, this fraction increases from 78% to 85%.This comes at the cost of some inconsistency between the coarser resolution A(Hα) images and the 7 ′′ resolution Hα images that are being corrected.We use the unsmoothed error maps for Hα and Hβ when propagating the uncertainty for sigsfr adopt.
None of the error estimates for Σ SFR take into account uncertainties in the extinction curve, IMF, or other factors needed to convert from recombination line flux to star formation rate.When the inferred A(Hα) < 0, we do not make an extinction correction, instead falling back on sigsfr0 and its uncertainty.Values are blanked (set to NaN) when the inferred A(Hα) > 6.

Stellar surface density
The SSP table provided by Pipe3D provides a column mass ssp that provides the stellar mass derived by Pipe3D for each 1 arcsec 2 CALIFA spaxel.We add an additional sigstar column that converts this value to M ⊙ pc −2 using the galaxy distance adopted L Figure 3.The construction of the hexagonal grid.L is the side length of the hexagon and can be set by the user.Blue and green vectors form the basis used to construct the grid.Hexagons are first are created along the diagonal, following the blue vector P .Then, for each hexagon along P , additional hexagons are created following the green vertical vector Q. by Pipe3D.Corresponding dust-corrected estimates mass AVcor ssp and sigstar Avcor are also provided; these make use of A V derived from the stellar population analysis, which can differ significantly from the Balmer decrement derived extinction.Although Pipe3D does not provide formal uncertainties for mass ssp or mass AVcor ssp, the fractional error in the continuum flux (fe medflux) can be used as a rough estimate.
The resolved star formation history provided by the SFH table (in bins of age and metallicity) is also used to provide two estimates of Σ SFR that are included in the SSP table.These estimates are computed by taking the mass fraction at ages <32 Myr (summed over metallicity), scaling by either sigstar or sigstar Avcor, and dividing the result by 32 Myr.The choice of summing time bins up to 32 Myr to measure the "young" stellar population provides a compromise between time resolution and smoothing over the large uncertainties involved in stellar population modeling (see González Delgado et al. 2016 for further discussion).

Molecular gas surface density
The molecular gas surface density Σ mol , including H 2 and associated helium, is derived by multiplying the CO surface brightness with a Galactic value for the CO-to-H 2 conversion factor of (e.g., Bolatto et al. 2013): This column is provided as sigmol in the comom xxx (e.g., comom dil) table.As noted in §2.4,an additional column cosi contains the multiplicative factor needed to deproject the surface density to face-on assuming a plane-parallel disk.If used for comparative analysis, the cos i factor should be used to deproject other surface densities as well.
The use of a constant CO-to-H 2 conversion factor is obviously a simplification, and a number of studies have inferred a decrease of the conversion factor at stellar surface densities of ≳100 M ⊙ pc −2 (e.g., Chiang et al. 2023) and an increase in the conversion factor at low metallicities (e.g., Accurso et al. 2017).We have elected not to implement a variable conversion factor in the current database, for two reasons.First, the fraction of Σ * > 100 M ⊙ pc −2 points is relatively small (∼20%) and even above this limit, the inferred dependence on Σ * is relatively weak for the CO(1-0) line (∝ Σ −0.3 * ).Second, the metallicity range for star-forming spaxels in the EDGE-CALIFA sample is relatively limited (spanning about 0.2 dex or a factor of 1.6) and comparable in size to the systematics in oxygen metallicity calibrations (Marino et al. 2013).We therefore defer the implementation of a variable conversion factor to a future work that can examine a wider range of conditions.

Oxygen abundance
We determine the gas-phase metallicity by taking a ratio of two emission line ratios, [O III]λ5007/Hβ and [N II]λ6583/Hα (i.e., the O3N2 method; Alloin et al. 1979;Pettini & Pagel 2004).We use the following prescription from Marino et al. (2013): (4) which has been derived for the logarithm ranging from −1 to 1.7, encompassing the range of our data.Since the metallicity calibration is reliant upon the emission lines arising from H II regions, we also require a "starforming" BPT classification (SF BPT flag in §2.6.5 below), otherwise the value is blanked.Error propagation is performed using a standard approach taking into account only the uncertainties in the emission line fluxes and assuming they are independent.The resulting values are saved as the ZOH column in the flux elines table (Table 4).

BPT Classification of pixels
The same line ratios used to estimate metallicity can also provide useful insight into the nature of ionized regions in galaxies.This is the basis of the most widely used "BPT" diagram (Baldwin et al. 1981) 2010) line as Seyfert (BPT code 2).The BPT code for each sample pixel is included as a column in the flux elines table, and an additional SF BPT column is set to True when the BPT code is −1 and the Hα equivalent width exceeds 6 Å; we consider such regions to be very likely to be star-forming (e.g., Barrera-Ballesteros et al. 2018).
To assist with interpreting the distribution of points in the diagram, we implemented a probability calculation algorithm for the BPT types.The flux uncertainties for [N II], [O III], Hα, and Hβ are used to calculate the probability that a datapoint actually lies within the region of the BPT diagram that it is assigned to.This calculation is done by constructing a bivariate Gaussian error distribution assuming independent errors, and numerically estimating the fractional area of the Gaussian that lies in the same BPT region as the center of the Gaussian.In practice, this is done by sampling points in the distribution using a 5 × 5 grid spanning ±1σ around the nominal value.Panels (b) and (c) of Figure 5 show how a simple signal-to-noise ratio (S/N) cut applied to the four emission lines compares with imposing the condition that the BPT probability exceed 0.9.A probability criterion allows more uncertain points to be considered as long as they lie safely within their designated BPT region.On the other hand, a probability criterion does reject points that lie close to the demarcation lines or where the BPT regions themselves span a narrow extent, since the resulting classification is deemed more uncertain.

Azimuthally aggregated data
The 1-D radial profile data are obtained from analysis of the original FITS images and provided as plain text tables.These include radial CO profiles derived from the "smoothed" mask moment-0 maps, as well as circular velocity curves derived from the analysis of Leung et al. (2018) for CO and Levy et al. (2018) for CO and Hα.The CO radial profiles and rotation curves are plotted for the subset of 54 EDGE galaxies studied by Leung et al. (2018) in Figures 6 and 7 respectively.
Radial CO brightness profiles (deprojected to face-on) are calculated for galaxies with inclinations < 85 • by averaging in concentric rings after setting masked pixels to zero; only rings with 10% or greater valid pixels are tabulated.The less restrictive "smoothed" moment masks are used in order to include more pixels in the averaging.The ring widths used are 2 ′′ for the native resolution cubes and 3 ′′ for the common resolution (7 ′′ ) cubes.Error bars on the points represent the r.m.s.deprojected value within the ring for unmasked pixels.A 3σ detection limit profile, derived from collapsing and azimuthally averaging the r.m.s.noise cube over a 200 km/s velocity width (or using the actual mask width if larger) is shown as a black dashed line in Fig. 6; this is higher near the galaxy center where fewer independent beams contribute to the azimuthal average.Note that the radial profiles may underestimate the true brightness profiles because faint CO emission may still be present at the masked pixels.Simple alternatives, such as not employing any masking, yield poorer results due to negative residuals in the CO cubes.A more sensitive approach is to use the Hα velocity field to align and stack CO spectra, as presented by Villanueva et al. (2021).To derive a radial intensity profile, the stacked spectra are integrated over a velocity window that becomes narrower with increasing galactocentric radius; this is equivalent to using the Hα velocity to define the centroid of a signal mask and the galactocentric radius to specify the mask's velocity width.
Rotation curves are also deprojected to represent circular velocities v c , although we plot v c sin i in Fig. 7 to allow more straightforward comparison of analyses where different inclination angles have been assumed.In addition to providing the curves previously published by Leung et al. (2018) and Levy et al. (2018), we have derived new CO rotation curves for 79 galaxies using the 3D-Barolo software (Di Teodoro & Fraternali 2015) applied to 10 km s −1 channel cubes with the primary beam pattern normalized out (so the noise does not rise toward the edge of the field).The fits were performed with the center position and disk inclination fixed to their LEDA values and in two passes where the first pass determines an average position angle and systemic velocity across all available rings and the second pass determines the circular speed and gas velocity dispersion (assumed isotropic) of each ring.The latter quantity, σ v (R), is not typically available from analysis of velocity images alone.The database provides fit results for both passes as separate ECSV files, and provides global orientation parameters for each galaxy as an additional table edge bbpars smo7.csv.While 3D-Barolo implicitly takes beam smearing into account by convolving its model with the synthesized beam before comparison with the data, it can have difficulty distinguishing between a steeply rising rotation curve near the nucleus and an increase in velocity dispersion there, so the sharp rise in circular speed inferred near the centers of some galaxies may be spurious.Overall, Fig. 7 suggests that the 3D-Barolo results generally agree with the circular velocity estimates derived by Leung et al. (2018) and Levy et al. (2018).

Star formation scaling relations
A straightforward application of the database is to make scatter plots comparing two flux-derived quantities.Recently there has been much attention given to the relative tightness of observed relations between  .Inclination-corrected radial CO profiles for the sample of 54 galaxies studied by Leung et al. (2018).The dashed curve in each panel represents a 3σ detection limit adopting a default velocity width of 200 km s −1 but using the actual mask width where it exceeds 200 km s −1 in width.Gray points denote radii where the azimuthal average is measured (i.e.there are >10% unmasked CO pixels at these radii) but falls below the detection limit and is thus unreliable.  .CO rotation curves derived from the 3D-Barolo package (blue circles) compared to those previously published by Leung et al. (2018) and Levy et al. (2018), for the sample of 54 galaxies studied by Leung et al. (2018).Circular velocities have been multiplied by a sin i factor to compensate for different adopted inclinations; the plotted curves should thus approximate the observed velocity widths.Differences are most apparent in the central rings, where varying treatment of beam smearing and velocity dispersion can have significant effects.
Figure 9 provides a more quantitative characterization of the three relations, making use of the LEO-Py code (Feldmann 2019) to deduce the best-fit power law relations taking into account censoring due to nondetections.Only the points from Figure 8 that are classified as star-forming (SF BPT=1) are used in the analysis.LEO-Py derives the fit parameters using likelihood maximization while assuming that the errors in both axes are uncorrelated and each follow a normal distribution.Non-detections are provided as 3σ upper limits.The best-fit slope, intercept, and vertical scatter are included in the legend of each sub-panel.The slopes of the three relations range from 0.8 to 1, thus being close to or slightly sub-linear.For comparison, the results of an ordinary least squares (OLS) fit to the detections only is shown as a dashed magenta line.The LEO-Py and OLS fits are most discrepant in panel (b), where the large number of CO non-detections leads to a bias towards high values of Σ mol at low Σ * and a resultant flattening of the OLS slope.This is less of an issue in panel (a) because the distribution of Σ SFR values at a given Σ mol are generally well-sampled and not strongly biased by non-detections.
Figure 10 summarizes the tightness of the three relations by comparing the Spearman partial correlation coefficients between each pair of variables.For each pair of variables X and Y , the partial correlation is the correlation between the residuals θ X and θ Y after removing the best-fit power laws relating X with Z and Y with Z respectively.It therefore signifies how well X and Y are expected to be correlated when Z is held constant.The partial correlation coefficients confirm that the rKS relation is the tightest correlation between any pair of variables.Although less tight than the rKS relation, the rMGMS shows a notably stronger partial correlation compared to the rSFMS, in agreement with the conclusion by Lin et al. (2019) from the ALMaQUEST survey that the rSFMS is largely a consequence of the other two relations.On the other hand, we find that the partial correlation coefficient between Σ SFR and Σ * is still positive, whereas Baker et al. (2022) find a value consistent with zero for the ALMaQUEST sample, which includes a substantial fraction (∼50%) of galaxies below the main sequence.Whether sample selection effects, in particular the bias towards higher SFRs in EDGE, is responsible for this difference will be an important avenue for future investigation, since it has implications for whether Σ * serves as an important secondary variable in the star formation law (e.g., Shi et al. 2018).

Spatially resolved star formation histories
Pipe3D models the stellar light as a linear combination of simple stellar populations taken from the GSD156 library (Cid Fernandes et al. 2013), which comprises 156 templates for 39 stellar ages (ranging from 1 Myr to 14.1 Gyr) and four metallicities (Z/Z ⊙ = 0.2, 0.4, 1, and 1.5).For each spaxel, the light fraction contributed by each template is recorded in the SFH table.In order to reconstruct the star formation history, we scale the light fraction by each template's mass-to-light ratio, based on a Salpeter (1955) IMF, then multiply the derived mass fraction (summed over metallicity) by the total stellar mass in the spaxel.We display cumulative star formation histories for two EDGE galaxies, NGC 3687 (which has the prominent Hα ring shown in Figure 4), and NGC 5614 (an interacting galaxy in the Arp 178 triplet) in Figure 11, along with the corresponding radial profiles of specific SFR (sSFR=Σ SFR /Σ * ) based on extinction-corrected Hα emission.Despite the assumptions involved in recovering the SFH, the radial As discussed in §1, a previous implementation of the EDGE database using the SQLite language was used for results presented in Bolatto et al. (2017) and Dey et al. (2019).In Figure 12 we compare the results obtained by Dey et al. (2019), studying the variables that most influence the spatially resolved SFR, with a re-analysis conducted using edge pydb.Displayed is the standardized slope, which represents the coefficient (slope) of the multi-linear fit for a given variable, scaled by its standard deviation to compensate for the fact that variables spanning larger dynamic range can still exert a strong influence even with a relatively shallow power-law dependence.As expected, Σ * and Σ mol remain the most important factors determining Σ SFR , consistent with the results of Dey et al. (2019).The greater importance of Σ * compared to Σ mol may come as a surprise given the relative tightness of the resolved K-S relation discussed in §3.1, but note that the limited dynamic range of the CO observations results in the Σ SFR -Σ * relation spanning a greater logarithmic range in terms of significant detections (cf. Figure 9).Analyses such as those in Figure 12 that do not take the distribution of non-detections into account will therefore favor quantities with larger dynamic range as explanatory variables.
On the other hand, many of the quantities derived from the stellar analysis (metallicity Z * , age τ * , extinction A V ) show noticeably different standardized slopes compared to the previous work.An important distinction is that the SQL database was derived from a Pipe3D analysis of the native resolution data cubes from CALIFA.Smoothing to match the CARMA resolution occurred after the extraction of the stellar population properties.On the other hand, for edge pydb we have degraded the spatial resolution of the CALIFA data to 7 ′′ before running Pipe3D.Especially age and extinction are likely to vary spatially on scales comparable to the native CALIFA resolution of 2. ′′ 5 (∼1 kpc), so smoothing an age or extinction map may yield different results  compared with sampling a map of the same quantity derived from a smoothed IFU cube.For both databases, the gas metallicity is derived from smoothed emissionline fluxes, so better consistency would be expected; however, Dey et al. (2019) use the N2 estimator for 12 + log(O/H) (Marino et al. 2013) whereas the present database employs the O3N2 estimator.The open triangle in Figure 12 shows the edge pydb result if one applies the N2 estimator, which is more consistent with the earlier results.Furthermore, the additional smoothing when calculating the Balmer decrement described in §2.6.1 was not used in the SQL database-this has the potential to introduce additional shifts in all of the relations involving Σ SFR .

SUMMARY AND FUTURE WORK
In this paper we have presented our basic approach to compiling a database of galaxy properties derived from spatially resolved CO and IFU observations.A strong emphasis on usability is essential to our approach, as the database is produced in order to conduct the sci-ence rather than as a benefit to the community after the science has been conducted.We take advantage of self-documenting formats now provided by Astropy tables, and provide sample notebooks that demonstrate how scientifically interesting plots can be generated from the data.
While the EDGE-CALIFA project began with a CAL-IFA subsample observed with CARMA, it has expanded as additional time has been secured to observe CAL-IFA galaxies in CO with other millimeter facilities, including the Atacama Pathfinder Experiment (APEX; see Colombo et al. 2020), the Atacama Compact Array (ACA; see Villanueva et al. 2023), the Large Millimeter Telescope (LMT/GTM), and the Green Bank Telescope (GBT).The vast majority of CALIFA galaxies still lack spatially resolved CO observations, so the database will continue to grow over time.In addition, spatially resolved H I observations have recently been obtained for CALIFA galaxies using the Jansky Very Large Array and the Giant Metrewave Radio Telescope (PIs V. Kalinova & D. Colombo).The use of the edge carma prefix in many of the HDF5 file names implicitly recognizes that the current database is tailored to the initial CARMA sample.Readers wishing to reproduce the exact results of this paper should access the appropriate version of the database as stored in the Github version history.The EDGE database architecture can easily be adapted to other IFU surveys, particularly when Pipe3D has been used for spectral fitting, and will continue to be updated as Pipe3D itself undergoes further development (e.g., Sánchez et al. 2023).
The current version of edge pydb at the time of this publication is indexed on Zenodo as zenodo.org/doi/10.5281/zenodo.10431797.The full pixel tables are also available to download from Zenodo at zenodo.org/doi/10.5281/zenodo.10256731.

Figure 1 .
Figure 1.(a) Location of EDGE-CALIFA sample in a plot of SFR vs. M * .The dashed magenta line shows the approximate locus of the star-forming main sequence from Cano-Díaz et al. (2016).CO non-detections are indicated by red squares.Small gray dots are used to represent the full CAL-IFA DR3 sample.(b) Vertical offset from the star-forming main sequence as a function of molecular to stellar mass ratio.The two quantities are correlated, underscoring the importance of gas content for continued star formation.CO non-detections (2σ upper limits) are indicated by horizontal arrows.

Figure 2 .
Figure 2. Integrated CO intensity (moment-0) maps of NGC 4047, obtained with (a) no masking; (b) a dilated mask expanding from a 3.5σ threshold over 2 consecutive channels out to a 2σ threshold, and (c) a smoothed mask where the cube is convolved to 14 ′′ FWHM before a dilated mask is generated.Each panel is approximately 147 ′′ square.The colormap range is shown at the bottom of each panel.
of Calzetti et al. (2010), but with the numerical coefficient scaled by 1.51 to convert from aKroupa et al. (1993) IMF to theSalpeter (1955) IMF employed by Pipe3D:SFR[M ⊙ yr −1 ] = 8.23 × 10 −42 L(Hα) [erg s −1 ] (1)Three separate estimates are calculated.The first estimate (sigsfr0) assumes no extinction and simply rescales the Hα surface brightness, converting it to a luminosity surface density assuming each patch of the galaxy radiates isotropically.The fractional error in this quantity is just the fractional error in the Hα surface brightness.The second estimate (sigsfr corr) corrects the SFR for extinction, A(Hα), as obtained from the Balmer decrement (e.g.,Catalán-Torrecilla et al. 2015):

Figure 4 .
Figure 4. Comparison of the three sampling methods in the edge pydb, using the Hα image of NGC 3687 as an example.Counts of unblanked pixels are given in parentheses.

Figure 5 .
Figure 5.Comparison between BPT plots with different selection cuts applied.(a) All EDGE samples plotted where each of the four emission line fluxes exceeds its 1σ uncertainty.(b) Points from (a) where each line's flux exceeds 3σ.(c) Points from (a) where the BPT probability estimate exceeds 0.9.Gray contours indicate the density of points.Demarcation lines defined by Kauffmann et al. (2003), Kewley et al. (2001), and Cid Fernandes et al. (2010) are shown as dashed, dotted, and solid curves respectively.gions must lie below a curve defined by the equation log [O III] Hβ = 1.19 + 0.61 log([N II]/Hα) − 0.47 .(5) Based on spectra from the Sloan Digital Sky Survey (SDSS), Kauffmann et al. (2003) introduced a more restrictive condition for star-forming regions, which they argued typically lie below the curve defined by log [O III] Hβ = 1.3 + 0.61 log([N II]/Hα) − 0.05 .(6) Finally, in the upper right part of the BPT diagram, signifying harder radiation fields, Cid Fernandes et al. (2010) find that Seyfert nuclei tend to lie above the line log [O III] Hβ = 0.48 + 1.01 log [N II] Hα ,(7)

Figure 6
Figure6.Inclination-corrected radial CO profiles for the sample of 54 galaxies studied byLeung et al. (2018).The dashed curve in each panel represents a 3σ detection limit adopting a default velocity width of 200 km s −1 but using the actual mask width where it exceeds 200 km s −1 in width.Gray points denote radii where the azimuthal average is measured (i.e.there are >10% unmasked CO pixels at these radii) but falls below the detection limit and is thus unreliable.

Figure 7
Figure7.CO rotation curves derived from the 3D-Barolo package (blue circles) compared to those previously published byLeung et al. (2018) andLevy et al. (2018), for the sample of 54 galaxies studied byLeung et al. (2018).Circular velocities have been multiplied by a sin i factor to compensate for different adopted inclinations; the plotted curves should thus approximate the observed velocity widths.Differences are most apparent in the central rings, where varying treatment of beam smearing and velocity dispersion can have significant effects.

Figure 8 .
Figure8.Mutual correlations between molecular gas surface density (Σ mol ), star formation rate surface density (based on extinction-corrected Hα emission, ΣSFR), and stellar surface density (Σ * ) for the full set of square grid downsampled pixels.The three panels correspond to the Kennicutt-Schmidt, molecular gas main sequence, and star-forming main sequence diagrams respectively.Points in each panel are color coded and binned by the value of the third parameter.A horizontal line in the middle panel denotes the approximate sensitivity limit for the CARMA CO data.

Figure 12 .
Figure12.Comparison of SQL and edge pydb databases, following the analysis ofDey et al. (2019).The "standardized slope" on the horizontal axis is the coefficient of each quantity considered in the multi-linear fit for log ΣSFR, scaled by the standard deviation of the same quantity in order to reflect its contribution to the fit.In both analyses Σ * and Σ mol are the dominant contributors to the SFR.The blue open triangle for O/H shows the edge pydb result if the N2 estimator used in the SQL database were to be used.

Table 1 .
Description of columns in the CALIFA global table.
cazgas Redshift for gas lines from z gas in get proc elines table cazstars Redshift for stars from z stars in get proc elines table caAge dex(Gyr) Mean stellar age from log age mean LW in get proc elines caeAge dex(Gyr) Error in mean stellar age from s log age mean LW in get proc elines caFHa dex(10 −16 erg cm −2 s −1 ) Log of Hα flux from log F Ha in get proc elines caFHacorr dex(10 −16 erg cm −2 s −1 ) Log of extinction corrected Hα flux from log F Ha cor in get proc elines caLHacorr dex(erg s −1 ) Log of extinction corrected Hα luminosity from log L Ha cor in get proc elines caMstars dex(M⊙) Log of stellar mass from log Mass in get proc elines caeMstars dex(M⊙) Error in log of stellar mass from error Mass in get proc elines caSFR dex(M⊙ yr −1 ) SFR from lSFR in get proc elines caeSFR dex(M⊙ yr −1 ) Error in SFR from e lSFR in get proc elines caOH dex Oxygen abundance as 12+log(O/H) from OH O3N2 in get proc elines caeOH dex Error in oxygen abundance from e OH O3N2 in get proc elines caAvgas mag Nebular extinction as Av from Av gas LW Re in get proc elines caeAvgas mag Error in nebular extinction from e Av gas LW Re in get proc elines caAvstars mag Stellar extinction as Av from Av ssp stats mean in get proc elines caeAvstars mag Error in stellar extinction Av ssp stats stddev in get proc elines caDistP3d Mpc Luminosity distance in Mpc used by Pipe3D, from DL in get proc elines caDistMpc Mpc Luminosity distance in Mpc from cazgas assuming H0=70, Ωm=0.
Fernandes et al. (2013)te these coordinates based on different assumed orientation parameters.CALIFA parameters.-Theseincludegalaxypropertiesderivedfromanalysesconducted by the CALIFA consortium.Examples include global measures of extinction, star formation rate, and stellar mass, effective radii from SDSS photometry, age and metallicity obtained from the Pipe3D analysis pipeline(Sánchez et al. 2016b), distances obtained from redshift, and CALIFA data quality flags from the Third Data Release(Sánchez et al. 2016a).The contents of this table are summarized in Table1.As noted inSánchez et al. (2016b), Pipe3D uses the gsd156 stellar library from CidFernandes et al. (2013), adopting a Salpeter (1955) stellar initial mass function (IMF), which we adopt for consistency throughout the database.

Table 2 .
Description of columns in the comom dil table.

Table 3 .
Description of columns in the SSP table.

Table 4 .
Description of columns in the flux elines table.
Cano-Díaz et al. 2016)z et al. 2Kennicutt 1998) to pixels classified as star-forming [below theKauffmann et al. 2003BPT curve and with EW(Hα) > 6 Å].Upper limits are shown as pink triangles for the vertical axis and blue triangles for the horizontal axis.Overlaid are best-fit power law relations based on ordinary least-squares (magenta dotted line) and LEO-Py, which takes into account non-detections (green solid line with vertical scatter indicated by shading).Figure 10.Spearman partial correlation coefficients calculated based on the LEO-Py fits.The coefficients indicate that the relation between ΣSFR and Σ mol is the tightest among the three correlation pairs.thestellarsurfacedensityΣ*, the star formation surface density Σ SFR , and the molecular gas density Σ mol (e.g.,Lin et al. 2019;Sánchez et al. 2021).Figure8shows the mutual correlations between these three variables in EDGE.In each subpanel, the scatter points are colored and vertically binned by the value of the third variable.While each pair of variables is well correlated, the correlation between Σ SFR and Σ mol , i.e. the resolvedKennicutt-Schmidt relation (rKS;Kennicutt 1998)appears to be tightest and most independent of the third variable, as evidenced by the vertical color gradient in panels (b) and (c) of Fig.8.The resolved molecular gas main sequence (rMGMS;Lin et al. 2019) and resolved star-forming main sequence (rSFMS;Cano-Díaz et al. 2016)show a much clearer dependence on the third variable.This result is in agreement with studies based on the ALMaQUEST survey that have found the Σ SFR -Σ mol relation to be the strongest among the three relations