Automated Mining of the ALMA Archive in the COSMOS Field (A3COSMOS). I. Robust ALMA Continuum Photometry Catalogs and Stellar Mass and Star Formation Properties for ∼700 Galaxies at z = 0.5–6

Daizhong Liu; P. Lang; B. Magnelli; E. Schinnerer; S. Leslie; Y. Fudamoto; M. Bondi; B. Groves; E. Jiménez-Andrade; K. Harrington; A. Karim; P. A. Oesch; M. Sargent; E. Vardoulaki; T. Bǎdescu; L. Moser; F. Bertoldi; A. Battisti; E. da Cunha; J. Zavala; M. Vaccari; I. Davidzon; D. Riechers; M. Aravena

doi:10.3847/1538-4365/ab42da

1. Introduction

The interstellar medium (ISM) is the raw material in galaxies out of which stars form. It plays a fundamental role when reconstructing the universe through cosmological simulations. In galaxies harboring intensive star formation, cold neutral gas is the main component of the ISM dominating its mass, and a significant fraction of this cold gas is in the molecular phase (e.g., Bigiel et al. 2008; Leroy et al. 2008; Walter et al. 2008). Over the past four decades, molecular gas has been observed mainly via the carbon monoxide (CO) rotational transition lines in the rest-frame millimeter wavelengths (which are the most feasible observable tracers of molecular gas; e.g., see review by Solomon & Vanden Bout 2005; Carilli & Walter 2013). However, CO observations at high redshift (z > 1) mostly target the brightest submillimeter galaxies (e.g., review by Blain et al. 2002) and quasi-stellar objects. These objects are the most extreme cases and not representative of the more numerous, less starbursty galaxies, i.e., the star-forming galaxies that follow a tight main sequence (MS) in the stellar mass–star formation rate (SFR) plane (e.g., Brinchmann et al. 2004; Daddi et al. 2007; Elbaz et al. 2007; Noeske et al. 2007). Observing CO in a large number (e.g., a few hundred) of MS galaxies at $z\gt 1$ (hence probing the ISM evolution) is in practice very time-consuming even with the most advanced facility, the Atacama Large Millimeter/submillimeter Array (ALMA), as (1) all galaxies are required to have a spectroscopic redshift in advance, (2) sufficient sensitivity is needed to detect the line within a small spectral bandwidth (typically ∼300–500 km s⁻¹), and (3) the galaxy sample should cover enough parameter space in the MS plane.

In recent years, a much more efficient approach—carrying out broadband dust continuum observations to infer the ISM—has been established. With the use of the entire bandwidth of the receiver, usually much short integration times are needed for high-redshift galaxies than observing (sub)millimeter emission lines (see also Carpenter et al. 2019). Then, the cold gas mass can be inferred either using the gas-to-dust mass ratio, ${\delta }_{\mathrm{GDR}}$ , which has been reasonably characterized as a function of gas phase metallicity (e.g., Santini et al. 2010; Leroy et al. 2011; Magdis et al. 2011, 2012; Magnelli et al. 2012; Bolatto et al. 2013; Rémy-Ruyer et al. 2014; Tan et al. 2014; Coogan et al. 2019), or with the ratio between gas mass and dust continuum luminosity at Rayleigh–Jeans tail wavelengths (e.g., rest-frame 250–850 μm), which has been calibrated with rich observations (e.g., Scoville et al. 2014; Groves et al. 2015; Hughes et al. 2017; Bertemes et al. 2018; Saintonge et al. 2018). The use of dust continuum observations to systematically survey the ISM content in hundreds of high-redshift galaxies is already proved to be fruitful (e.g., Schinnerer et al. 2016; Scoville et al. 2016, 2017).

Meanwhile, the continuously growing ALMA public archive offers a great opportunity of studying very large samples of high-redshift galaxies. The ALMA public archive consists of thousands of observations of high-redshift galaxies within deep fields led by individual Principle Investigator (PI) programs. Although ALMA has a small field of view, e.g., ∼0 farcm 5 in diameter (primary beam FWHM) in Band 6, accumulating archival data compensates for this shortcoming and leads to a several hundred arcmin² area. Comparing to contiguous deep field surveys with ALMA (e.g., Hatsukade et al. 2011, 2016, 2018; Carniani et al. 2015; Aravena et al. 2016; Walter et al. 2016; Dunlop et al. 2017; Umehata et al. 2017, 2018; Franco et al. 2018), the discreteness of field of views makes the sample selection bias and cosmic comoving volume very unpredictable, but it also leads to a sample with large varieties in galaxy properties, which thus provides crucial constraints on galaxy ISM and star formation scaling relations and analytic evolution prescriptions (e.g., Scoville et al. 2017; Tacconi et al. 2018). Moreover, the ALMA archive also serves as a powerful test bed for automated pipelines as in this work and for future large facilities.

Several recent studies have already been exploring the full ALMA archive (e.g., Fujimoto et al. 2017; Scoville et al. 2017; Zavala et al. 2018); however, the photometric methods (including aperture photometry, peak pixel analysis, uv-plane fitting, etc.) and sample selection can significantly differ between different authors. Furthermore, none of these studies have statistically evaluated the effects of applying different photometric methods to ALMA images, especially for large numbers of ALMA images with widely varying sensitivity and synthesized beam properties. Consequently, noticeable discrepancies on the cosmic evolution of the ISM are present among the aforementioned studies. In order to understand how potential biases of the photometric methods and gas mass calibration affect the outcome of ISM evolution studies, more dedicated efforts are required to exploit the public ALMA archive.

In this work, we present automated pipelines for "mining" the public ALMA archive in the COSMOS field (Scoville et al. 2007), hereafter referred to as the "A³COSMOS" (Automated mining of the ALMA Archive in COSMOS) project.¹² This work provides the foundation for a systematic exploitation of the (sub)millimeter continuum as a proxy for cold dust and gas for a diverse and large sample of high-redshift galaxies. The resulting catalog of galaxies with (sub)millimeter continuum detections can be used to, e.g., study the cosmic evolution of the gas fraction and gas depletion time (Liu et al. 2019, hereafter Paper II).

We present our workflow from the raw public ALMA data to the two robust photometric catalogs in Figure 1, which corresponds to Sections 2–3 of the paper. We first describe our ALMA data reduction and continuum imaging procedures in Section 2.1, and then we present the blind source extraction in Section 2.2, prior catalog compilation in Section 2.3, and prior source fitting in Section 2.4. Section 3 includes our extensive Monte Carlo (MC) simulations and analyses to verify our photometry. Section 4 describes how we combine the two photometry catalogs, remove spurious sources, and build a final well-characterized galaxy catalog (a workflow for these substeps is presented at the beginning of Section 4). Finally, the resulting catalogs are described in Section 5, and we summarize the paper in Section 6.

Throughout the paper, we adopt a flat ΛCDM cosmology with H₀ = 70 km s⁻¹ Mpc⁻¹, Ω_M = 0.3, Ω₀ = 0.7, and a Chabrier (2003) initial mass function (IMF).

2. Data and Photometry

The public ALMA archive is growing rapidly through PI-led observations. These observations mainly focus on targeted scientific objectives (sources), which are usually at the phase center of each ALMA pointing. However, with ALMA's unprecedented sensitivity, and benefiting from the negative-K-correction at millimeter wavelengths (e.g., review by Blain et al. 2002; Casey et al. 2014), further (sub)millimeter galaxies can serendipitously appear in any ALMA pointing. Such sources have a sizable chance for detection when their position falls within about twice the primary beam area¹³ of the corresponding ALMA pointing (i.e., with a primary beam attenuation (PBA) ≥0.2).

Here, we conduct a systematic effort to exploit these observational data. We limit our selection to within the COSMOS field (R.A. = 10:00:28.6, decl. = +02:12:21.0, J2000; Scoville et al. 2007) because it is one of the deep fields with the richest, deepest multiwavelength data sets, and there are numerous PI-led ALMA observations within its large area of 2 deg² (compared to the Great Observatories Origins Deep Survey (GOODS) North and South fields, with only 160 arcmin² (0.044 deg²) each). We include all the available ALMA data in COSMOS regardless of the ALMA bands used (but excluded very long baseline data with a synthesized beam <0 farcs 1; see Section 2.1; and the only one mosaic project on the AzTEC-3 protocluster).

COSMOS has extensive imaging data sets covering all accessible wavelength ranges: X-ray (Elvis et al. 2009; Civano et al. 2012, 2016; Marchesi et al. 2016), UV (Zamojski et al. 2007), optical (Capak et al. 2007; Leauthaud et al. 2007; Taniguchi et al. 2007, 2015), near-IR (McCracken et al. 2010, 2012), mid-IR (Sanders et al. 2007, Le Floc'h et al. 2009), far-IR (FIR; Lutz et al. 2011; Oliver et al. 2012), submillimeter (Geach et al. 2017), millimeter (Bertoldi et al. 2007; Aretxaga et al. 2011), and radio (Schinnerer et al. 2010; Smolčić et al. 2017). The depths of the X-ray, UV, optical, and near-IR data sets are listed in Laigle et al. (2016), and the depths of mid- to FIR, (sub)millimeter, and radio data sets are summarized in Jin et al. (2018).

Photometric redshifts have been obtained for ∼1.1 × 10⁶ galaxies through optical to near-IR spectral energy distribution (SED) fitting by Muzzin et al. (2013), Ilbert et al. (2013), Laigle et al. (2016), Davidzon et al. (2017), and Delvecchio et al. (2017) and through optical to millimeter/radio SED fitting by Jin et al. (2018).

Spectroscopic redshifts also exist for ∼7.1 × 10⁴ galaxies, from the latest compilation by M. Salvato et al. (version 2017 September 1; available internally in the COSMOS collaboration), which includes almost all spectroscopic observations in the COSMOS field: (Lilly et al. 2007, 2009, zCOSMOS Survey; with VLT/VIMOS); (Fu et al. 2010, with Spitzer/IRS); (Casey et al. 2012, 2017, with Keck II/DEIMOS); (Comparat et al. 2015, with VLT/FORS2); Le Fèvre et al. (2015) and Tasca et al. (2017) (VUDS Survey; with VLT/VMOS); (Hasinger et al. 2018, with Keck II/DEIMOS); (Kriek et al. 2015, MOSDEF Survey; with Keck I/MOSFIRE); (Marsan et al. 2017, with Keck II/NIRSPEC); (Masters et al. 2017, with Keck II/DEIMOS); (Nanayakkara et al. 2016, with Keck I/MOSFIRE); (Silverman et al. 2015b, FMOS-COSMOS Survey; with Subaru/FMOS); (van der Wel et al. 2016, LEGA-C Survey; with VLT/VMOS); (Yun et al. 2015, with LMT/RSR) (listed only references whose spectroscopic redshifts are used in this work).

We show the pointings of all public ALMA data for the COSMOS field as of 2018 January 2 in Figure 2, overlaid on the Herschel Space Observatory (Pilbratt et al. 2010) Photodetector Array Camera and Spectrometer (PACS; Poglitsch et al. 2010) 100 μm image. All the pointings processed for catalogs presented here are shown in green, and data that will be processed in our next release are shown in magenta, which includes ALMA data becoming public before 2018 August 1. Circle size represents the primary beam.¹⁴ The sum of the primary beam area of these observations reaches 164 arcmin² as of 2018 January 2 and will reach 280 arcmin² in our next release. Some pointings overlap because they are observed at different frequencies or with different spatial resolution. The overlapped area of all pointings is about 12%. Thus, even considering the non-overlapped primary beam area, the current data already reach a spatial coverage similar to the area of the GOODS fields and are much larger than any existing contiguous ALMA deep field survey (e.g., Dunlop et al. 2017, 4.5 arcmin² with 1σ ∼ 35 μJy beam^–1; Franco et al. 2018, 69 arcmin² with 1σ ∼ 0.18 mJy beam^–1).

In Figure 3, we compare the depth and areal coverage of the ALMA archival data in COSMOS at ALMA Band 6 and 7 to the selected existing contiguous ALMA continuum deep fields: Aravena et al. (2016; see also Walter et al. 2016), Dunlop et al. (2017), and Franco et al. (2018; PI: D. Elbaz). Other ALMA deep fields (e.g., Hatsukade et al. 2016; Umehata et al. 2017) have similar properties and are therefore not shown. We compute the depth of each ALMA image by converting its rms noise to an equivalent flux at observed-frame 1.1 mm assuming a modified blackbody with β = 1.8. The green (orange) curve represents the cumulative area of version 20180102 (version 20180801) ALMA images reaching a given sensitivity. Given that the ALMA archival data in the COSMOS field alone cover a larger cumulative area at all sensitivities, a systematic mining of the ALMA archive is strongly motivated. Given the inhomogeneous science goals of the individual PI-led projects, the resulting catalogs will not have a well-characterized selection function and are not complete per se (see Section 4.7 for a discussion of the properties and completeness of the final galaxy catalog).

In the following sections, we describe the reduction and processing of ALMA raw data into image products (in Section 2.1) and the photometric methods used (in Sections 2.2–2.4). We employ two complementary photometric methods, blind source extraction and prior source fitting, to obtain source flux densities and sizes from the ALMA continuum images. A comparison of the two methods and further technical assessments are presented in Sections 2.5–2.7.

2.1. ALMA Continuum Images

We start by querying the ALMA archive with the Python package astroquery (Ginsburg et al. 2019), retrieving all projects publicly available within a search radius of 2° centered on the COSMOS field. These data sets are calibrated with the Common Astronomy Software Applications (CASA; McMullin et al. 2007) using the scriptForPI.py scripts provided by the Joint ALMA Observatory together with the archived raw data.

Calibrated visibilities are imaged and "cleaned"—i.e., deconvolved with the "dirty" beam—with the CASA imaging pipeline version 4.7.2. With this systematic approach, we aim at obtaining data products as homogeneous as possible and also with maximized sensitivities. The pipeline is operated in "continuum" + "automatic" mode, leaving all but the weight parameters (set to "Briggs" with robust=2) to their default values. In this mode, the spectral windows (SpWs) of each target are aggregated into a single continuum image calculated at the central frequencies of these SpWs using the multifrequency synthesis (MFS) algorithm with nterms = 2. The parameters controlling the deconvolution process (i.e., masked pixels, maximum number of iterations, and stopping threshold) are automatically and homogeneously set by the pipeline based on the noise properties and dynamic range of the "dirty" images (i.e., before deconvolution). The output images sample the synthesized beam with 5 pixels and are masked where the PBA is <0.2. In case of obvious image artifacts in the cleaned images (as found by visual inspection, <10%), we rerun the CASA imaging pipeline flagging corrupted baselines and/or adopting robust = 0.5.

The imaging pipeline uses masks to identify regions of bright emission prior to cleaning, and the stopping criterion is set to a signal-to-noise ratio (S/N) = 4. Given this approach, combined with the sparseness of high S/N > 15 submillimeter sources in our catalog, we do not expect any overcleaning resulting in artificially low rms noise. Given the large number of antennas in the 12 m array, the instantaneous dirty beam for a short integration (∼30 s) as used for many programs has very low sidelobes; the presence of imaging artifacts is also minimized. The robustness of our cleaning process is also supported by our comparison of image-plane to uv-plane photometry (Section 2.6).

The CASA imaging pipeline unfortunately could not be run for a few of our projects (mostly from Cycle 0) owing to backward compatibility issues. These projects are thus imaged with the CASA task clean with input parameters manually set using a similar imaging and "cleaning" strategy to the CASA imaging pipeline, e.g., "Briggs" weighting with robust=2, sampling of the synthesized beam by ∼5 pixels, and masking based on the noise of the dirty image.

As a test, we measure the 1σ sensitivities (rms noise) of our images ( ${\sigma }_{{{\rm{A}}}^{3}}$ ) and compare with those measured in the continuum images available in the ALMA archive, which are produced during phase 2 of the ALMA Quality Assessment (QA2; ${\sigma }_{\mathrm{QA}2}$ ). Approximately 60% of our images have QA2-based continuum images, while the remaining ∼40% were not imaged during the QA2 mostly because they are part of the scheduling blocks (SBs) with multiple targets, and the quality assessment was performed by imaging only a few of them. Consequently, for most projects, at least one QA2-based continuum image is available to perform our test. The $({\sigma }_{{{\rm{A}}}^{3}}-{\sigma }_{\mathrm{QA}2})/{\sigma }_{{{\rm{A}}}^{3}}$ follows a Gaussian distribution centered at ∼−0.1 with a dispersion of ∼0.17 (Figure 4). Our images have ∼10% better sensitivities than those from the ALMA archive because they are produced with robust=2—i.e., favoring sensitivity over spatial resolution—while most QA2 analyses are performed with robust=0.5. We find no outliers with large positive values (e.g., $({\sigma }_{{{\rm{A}}}^{3}}-{\sigma }_{\mathrm{QA}2})/{\sigma }_{{{\rm{A}}}^{3}}\gt 0.6$ ), as images with obvious artifacts were spotted by our visual inspection and already re-imaged. Finally, we find few outliers with $({\sigma }_{{{\rm{A}}}^{3}\mathrm{COSMOS}}-{\sigma }_{\mathrm{QA}2})/{\sigma }_{\mathrm{QA}2}\lt -0.6$ . We systematically checked these images and found that all of them correspond to projects in which the QA2-based analysis was performed with low robust values (i.e., <0) and/or using only a fraction of the SpWs available. All these comparisons demonstrate the reliability of the ALMA imaging pipeline and thus of our image products.

**Figure 4.** Comparison of the 1σ sensitivities of our images (i.e., ${\sigma }_{{{\rm{A}}}^{3}}$ ) to those measured in continuum maps from the ALMA archive (i.e., ${\sigma }_{\mathrm{QA}2}$ ). The dark histogram shows the $({\sigma }_{{{\rm{A}}}^{3}}-{\sigma }_{\mathrm{QA}2})/{\sigma }_{{{\rm{A}}}^{3}}$ distribution, while the vertical solid and dashed lines represent its mean (∼−0.1) and dispersion (∼0.17), respectively. A Gaussian distribution with similar characteristics is shown as a dotted line.
Download figure:
Standard image High-resolution image

The data products released here include all "clean" continuum images corrected and uncorrected for PBA.¹⁵ Note that although the aggregation of all SpWs available for a given target optimizes the sensitivities of our continuum images, it does not consider any possible line contamination. In Section 4.5 we will further describe an effective approach to addressing potential line contamination.

Furthermore, in Figure 5 we show the distribution of the angular resolution of the ALMA data, as represented by the major-axis FWHM of ALMA data's synthesized beam ${\theta }_{\mathrm{Maj},\mathrm{beam}}$ . Of current images, ≲1% have a very high angular resolution, i.e., ${\theta }_{\mathrm{Maj},\mathrm{beam}}\lt 0\buildrel{\prime\prime}\over{.} 1$ . These images represent a more challenging case for our source extraction because our blind and prior source extraction methods are all optimized for only marginally resolved sources, while sources in the very high resolution images usually are significantly resolved (e.g., with a ratio of source to beam area ≳10). Also note that the large number of independent beams within these images ( $\propto \mathrm{FoV}/{\theta }_{\mathrm{Maj},\mathrm{beam}}^{2}$ ) statistically translates into a significant contamination of "spurious" sources to our photometry catalog (even using a conservative ${\rm{S}}/{\rm{N}}\sim 5$ cut; see Section 2.8). Therefore, these ≲1% very high resolution ( ${\theta }_{\mathrm{Maj},\mathrm{beam}}\lt 0\buildrel{\prime\prime}\over{.} 1$ ) images are excluded from our analysis.

**Figure 5.** Beam size distributions of two versions of A³COSMOS data sets. The beam size is defined as the FWHM along the major axis of ALMA data's synthesized beam ( ${\theta }_{\mathrm{Maj}.,\mathrm{beam}}$ ) after the interferometric "cleaning" process. The gray shaded area indicates ${\theta }_{\mathrm{Maj}.,\mathrm{beam}}\lt 0\buildrel{\prime\prime}\over{.} 1$ , for which data were discarded owing to too high spatial resolution as discussed at the end of Section 2.1.
Download figure:
Standard image High-resolution image

Currently, data for the same source taken at the same frequency arising from different projects are not combined.

The breakdown of the number of objects detected, expected number of false objects, area, median depth, and resolution as a function of observing band are provided in Table 1.

Table 1. Information per ALMA Band

Info Type	Band 3	Band 4	Band 5	Band 6	Band 7	Band 8	Band 9
Number of images	34	6	2	633	857	1	1
Sum beam area (arcmin²)^a	26.639	2.294	0.329	79.511	54.729	0.044	0.016
Mean beam size (arcsec)	2.164	1.098	1.548	1.202	0.772	0.526	0.305
Mean rms noise (mJy beam^–1)	0.039	0.025	0.090	0.077	0.160	0.034	1.757
PYBDSF ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\gt 5.40$	24	5	3	371	524	1	2
GALFIT ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\gt 4.35$ ^b	20 (7)	10 (7)	2 (2)	452 (342)	553 (461)	1 (1)	1 (1)

Notes.

^aThe areas are the sum of primary beam circular area only. ^bThe number in parentheses corresponds to the sources that passed our quality assessments from Sections 4.1 to 4.4. Based on the spurious fraction analysis in Section 2.8, we expect about 8% spurious sources in total for the PYBDSF selection and ∼12% for the GALFIT selection.

Download table as: ASCII Typeset image

2.2. Blind Source Extraction

We perform the blind source extraction on our "cleaned" ALMA continuum images. We use the primary-beam-attenuation-uncorrected images because they have the advantage of a constant noise across the field of view, and thus source extraction can be run with uniform parameters across them. PBA corrections are applied after the photometry steps.

We use the Python Blob Detector and Source Finder (aka PyBDSM or PyBDSF; hereafter PyBDSF; Mohan & Rafferty 2015)¹⁶ to find sources blindly and extract their flux and size information. First, the code identifies "islands" of emission, i.e., with the peak pixel emission above 4 times the rms noise (thresh_pix = 4) and surrounded by contiguous pixels with values all greater than 3 times the rms noise (thresh_isl = 3). These thresholds are obtained from a series of tests by introducing mock sources into the ALMA images and recovering them with PyBDSF. The best performance was evaluated based on completeness and contamination (see Sections 2.8 and 3.2). Next, PyBDSF fits multiple 2D Gaussians to each "island" depending on the number of peaks identified within it. Third, all Gaussians of the same "island" are grouped into one source, with the summed flux being the integrated source flux and the flux-weighted averaged position being the source position. The total intrinsic source size is obtained via a moment analysis¹⁷ on each individual Gaussian component's intrinsic size.¹⁸ Finally, the errors of each fitted parameter (peak flux, total flux, and each size parameter) are computed using the formulae of Gaussian fitting errors calibrated by Condon (1997).

About 6% of our "islands" are fitted with multiple Gaussians, while the vast majority (94%) have a single Gaussian component. Most of these multi-Gaussian sources are isolated sources but exhibit nonsmooth morphologies, due to resolved spatial components and/or noise in the image. In a few cases, these multi-Gaussian sources might indeed be interacting galaxies. Utilizing information from the prior source catalog, we are able to reliably flag these sources in a later step of our analysis. Therefore, they are kept as a single source in the blind source catalog.

Our final blind source catalog is obtained by correcting for flux bias and re-estimating flux errors (see Section 3), and then applying a PBA correction to the photometry of each source (i.e., to the peak flux, total flux, and associated uncertainties). Given that each source represents a high-redshift galaxy with a typical size of 0 farcs 5–2'', much smaller than the primary beam, using a single PBA correction factor at the source's central position is reasonable.

2.3. Prior Source Master Catalog

In addition to the blind source extraction, we utilize known source positions as a prior for the source fitting. This technique allows for deeper detection limits and lowers the spurious source fraction. Before starting the prior fitting, we compiled a "COSMOS master catalog" from a number of multiwavelength catalogs for sources in the COSMOS field as listed in Table 2. The aim is to be as complete as possible in prior sources while ensuring that source duplication is solved among the various catalogs. Thus, we loop over the prior catalogs in the order listed in Table 2. Their respective areal coverage is indicated in Figure 6. To ensure that a given galaxy (which might be detected in multiple prior catalogs) has only one unique entry in the master catalog, we find out each uniquely matched group among the prior catalogs (with matching radius 1'') and add into the master catalog only the source coming from the highest-quality (empirically sorted by angular resolution and relative depth) catalog, i.e., listed closest to the top in Table 2.

**Figure 6.** Coverage of our prior catalogs for the COSMOS field as listed in Table 2. The background image is the *Herschel* PACS 100 μm image, same as in Figure 2. The colored lines encompass the area searched for sources in the respective prior catalog (see inset for catalog information).
Download figure:
Standard image High-resolution image

Table 2. Prior Catalogs Used for Constructing the COSMOS Master Catalog^a

Catalog Name (and Reference)	Area (deg²)	N_Catalog^b	N_Master^c	N_Unique^d	Detection Map	Depth^e	Res.^f
COSMOS2015 catalog (Laigle et al. 2016)	1.8	1182108	1182108	738420	VISTA z Y J H K_s	24.0 (3σ, 3'', K_s)	∼1''
K_s-band catalog (Muzzin et al. 2013)^g	1.6^g	263229	18536	10799	VISTA K_s	24.35 (3σ, 21)	∼1''
i-band catalog (Capak et al. 2007)	1.8	386125	31159	30146	CFHT i* + Subaru i⁺	26.2 (5σ, 3'')	∼05
SPLASH IRAC supplementary catalog^h	1.6	5390	4685	3690	Spitzer/IRAC 3.6 + 4.5 μm	25.5 (3σ)	∼16
VLA catalog (Smolčić et al. 2017)	2	10922	1042	644	VLA 3 GHz	2.3 μJy (1σ)	∼075
IRAC catalog (Sanders et al. 2007)	2.3	347332	55346	54642	Spitzer/IRAC 3.6–8.0 μm	24.01 (5σ, 3.6 μm)	∼16

Notes.

^aThe master catalog used in this paper's work has a version code of 20170426. ^bTotal number of sources in each prior catalog. ^cThe number of sources in each prior catalog that are not in higher-order catalogs (the order is as listed from top to bottom), which is the number of sources in our "COSMOS master catalog" that originated from the current prior catalog. ^dThe number of unique sources in each prior catalog, which means that these sources have no counterpart in any other prior catalog. ^eThe depth is in AB magnitude and is in an aperture if indicated in parentheses. ^fSpatial resolution, or point-spread function size of the detection map. ^gThe Muzzin et al. (2013) catalog contains sources in the masked area of the COSMOS2015 catalog that are close to bright, saturated stars. ^hBased on the source extraction in the Spitzer Large Area Survey with Hyper-Suprime-Cam (SPLASH; PI: P. Capak) survey data after fitting and removing all COSMOS2015 catalog sources (I. Davidzon 2019, private communication).

Download table as: ASCII Typeset image

Our 1'' matching radius corresponds to a worst false-match probability of 13.3% for other catalogs cross-matched to the Laigle et al. (2016) catalog based on Equation (1) of Pope et al. (2006).¹⁹ We emphasize that the false-match rate does not affect our photometric work because if a galaxy from another catalog in Table 2 is falsely matched to the Laigle et al. (2016) catalog, we just use the prior position in the Laigle et al. (2016) catalog for our prior photometry. The source position is not forced to be at the exact prior position, as our photometry code will find the best fitting for position and flux (see next section). It may affect our galaxy property analysis via SED fitting in a later step because we use the redshift information from the literature as the prior. However, the influence is minimized by (a) collecting all possible prior redshift information in the literature, (b) verifying via photo-z SED fitting (see Section 4.4), and, in later steps, (c) only considering a source a robust galaxy if it passes all our quality assessments (Section 4). A falsely matched source with a wrong prior redshift is unlikely to pass them as detailed in Section 4. Yet we cannot totally avoid false matches (which should be only a few out of a thousand in our final products), especially when the astrometry in optical/near-IR image data also affects our work (see next section and Appendix A).

The combination of these prior catalogs results in the "COSMOS master catalog" with unique source IDs. In our current master catalog (version 20170426), because the COSMOS2015 catalog is our primary catalog, all 1,182,108 COSMOS2015 sources are in our "COSMOS master catalog" with the same IDs. A total of 443,688 (37.5%) of them have counterparts in other catalogs. The remaining five catalogs contribute 110,768 new sources that are not in the COSMOS2015 catalog. The Muzzin et al. (2013) catalog contributes 18,536 sources, a fraction of which are from the COSMOS2015 masked regions close to bright stars. The K_s data used for the Muzzin et al. (2013) catalog are shallower than those of the COSMOS2015 catalog (UltraVISTA DR2), so the reason for some new sources should be the different source extraction methods used: the COSMOS2015 catalog uses a z, Y, J, H, K_s combined χ² detection image, while the Muzzin et al. (2013) catalog directly uses the K_s image and therefore favors redder sources. The i-band-selected catalog contributes 31,159 sources, probably benefiting from its higher angular resolution detection image (see discussion in Section 4 of Capak et al. 2007). The IRAC catalog contributes another 4685 sources. The radio catalog contributes 1042 sources. Finally, the Sanders et al. (2007) catalog contributes 55,346 sources, but most of them are in an area outside the COSMOS2015 coverage (e.g., Figure 6), while only 8893 sources are new in the area covered by both catalogs.

The total number of unique priors that fall in PBA > 0.2 areas of our data set version 20180102 (20180801) is 41,161 (73,387).

2.4. Prior Source Fitting

Utilizing source positions from the COSMOS master catalog, we obtain the (sub)millimeter photometry via prior source fitting of the ALMA continuum images. We implement two steps below to optimize the robustness of the fitting. Potential small astrometric inconsistencies between the prior source positions and the ALMA data are taken into account as follows before the full prior source fitting procedure is applied: we calculate the offsets between pre-run ALMA positions and the prior source positions directly from Laigle et al. (2016) and other prior catalogs, and then we derive a mean offset for each prior catalog and update all prior positions in our master catalog. Details of the astrometry analysis are given in Appendix A.

As a first step, we identify potential candidate sources based on the S/N of their peak (sub)millimeter flux density ( ${S}_{\mathrm{peak}}$ ) or integrated flux density ( ${S}_{\mathrm{total}}$ ). Following Scoville et al. (2014, 2016, 2017), we measure both flux densities in a series of apertures with radii from 0 farcs 25 to 2'' in steps of 0 farcs 25. We follow exactly the Scoville et al. (2016, 2017) method so as to allow for a direct comparison. Using the pixel rms noise calculated from Gaussian fitting to the pixel value distribution of each image, we obtain the S/N ratio for the peak flux density via ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\equiv {S}_{\mathrm{peak}}/(\mathrm{rms}\,\mathrm{noise})$ and the one for integrated flux density ${\rm{S}}/{{\rm{N}}}_{\mathrm{total}}$ by dividing ${S}_{\mathrm{total}}$ by the integrated noise in each aperture (i.e., $\mathrm{rms}\,\mathrm{noise}$ times the square root of pixel number in each aperture). We refer to this aperture photometry as the getpix method hereafter (and compare its results with those from our other photometry methods in Section 2.7).

This first getpix step also provides guidance for the prior source fitting using galfit (Peng et al. 2002, 2010) in the next step. We select ${\rm{S}}/{{\rm{N}}}_{\mathrm{total}}\gt 2$ or ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\gt 3.6$ sources (same as Scoville et al. 2017) as valid detections. This pre-selection of prior sources is important for applying galfit, as it significantly reduces the required computational time for galfit by avoiding the fitting of sources that mostly correspond to noise in the image. We have confirmed that this approach is sensible with our MC simulations (see Section 3). Also, our final catalogs are not sensitive to small changes of these thresholds, because in the end we apply a relatively high ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ cut according to our MC simulation statistics. Note that in most ALMA images our priors do not have blending issues.

To optimize the galfit fitting for source fluxes as well as sizes, an iterative approach is adopted: After the first-pass fitting with point-source models to all galfit priors fixed at their original positions, we select sources with a fitted magnitude error of <0.25²⁰ or ${S}_{\mathrm{total}}\gt 3\sigma$ (σ being the pixel $\mathrm{rms}\,\mathrm{noise}$ ) and allow their positions to vary by at most 0 farcs 7²¹ in the second-pass fitting. Then, in order to identify possible extended sources, we allow sources with fitted magnitude error <0.20 or ${S}_{\mathrm{total}}$ above 3 times the rms noise to be fitted with circular Gaussian models (and in a next step Sérsic profiles) in the third-pass fitting. We note that our thresholds are very loose, and 98% of the sources in our final prior photometry catalog (with a relatively high selection threshold, ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\gtrsim 5$ , according to our MC simulation statistics; see Section 4.1) are fitted with extended shapes.

For each fit, we ensure that the image background is zero (as already verified by the close-to-zero means of the distributions of the pixel values from the ALMA images). If a given galfit iteration yields bad fits and/or nonconvergence, the fitting is repeated with a higher limit for galfit iteration.²²

As the galfit errors only consider the covariance matrix of the fitting, they do not reflect observational noise or correlated noise. Therefore, we estimate the error in ${S}_{\mathrm{total}}$ ( ${\sigma }_{{S}_{\mathrm{total}}}$ ) for Gaussian-fitted sources following Condon (1997). This error estimation determines ${\sigma }_{{S}_{\mathrm{total}}}$ purely from the rms noise, beam major- and minor-axis FWHM sizes (θ_bmaj. and θ_bmin., respectively), source major- and minor-axis FWHM sizes (θ_maj. and θ_min., respectively; fitted values and convolved with the beam), and source ${S}_{\mathrm{peak}}$ and ${S}_{\mathrm{total}}$ . We further verify that this is in general consistent with our own MC simulations (see Section 3).

2.5. Comparing Blind Extraction and Prior Fitting Results

As a quality check to both blind source extraction and prior source fitting, and to identify potential problem cases, we compare the total fluxes from PyBDSF to those from galfit for common sources (within 1 farcs 0 and using the same ALMA images) in Figure 7. Ninety-six percent of sources have fluxes agreeing within 3σ. Outliers with flux differences of >5σ are labeled in the figure. Their PyBDSF and galfit fitting models and residuals are further shown and discussed in Appendix B. The three outliers with a galfit flux much larger than the PyBDSF flux are caused by poor fits of PyBDSF to their irregular morphologies. The one outlier with a much larger PyBDSF flux than the galfit flux is due to a blending of prior sources, and given the complex morphology, both galfit and PyBDSF could not provide an ideal fit.

**Figure 7.** Top: comparison of total fluxes derived from the PyBDSF blind source extraction and the prior-based galfit source fitting. Data points show sources matched within 1'' and measured on the same image. Color indicates their ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ , i.e., the ratio of source peak flux to rms noise of the image. The solid line shows the one-to-one correspondence, and the two dashed lines indicate the 5σ range, where σ is the scatter measured from the bottom panel. Outliers above 5σ are labeled and discussed in detail in Appendix B. Bottom: histogram of the flux difference on a logarithmic scale, ${\mathrm{log}}_{10}({S}_{\mathrm{PyBDSF}}/{S}_{\mathrm{GALFIT}})$ . The mean is −0.013 with a standard deviation of 0.10 dex. Dashed vertical lines indicate the same 5σ range as in the top panel.
Download figure:
Standard image High-resolution image

**Figure 7.** Top: comparison of total fluxes derived from the PyBDSF blind source extraction and the prior-based galfit source fitting. Data points show sources matched within 1'' and measured on the same image. Color indicates their ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ , i.e., the ratio of source peak flux to rms noise of the image. The solid line shows the one-to-one correspondence, and the two dashed lines indicate the 5σ range, where σ is the scatter measured from the bottom panel. Outliers above 5σ are labeled and discussed in detail in Appendix B. Bottom: histogram of the flux difference on a logarithmic scale, ${\mathrm{log}}_{10}({S}_{\mathrm{PyBDSF}}/{S}_{\mathrm{GALFIT}})$ . The mean is −0.013 with a standard deviation of 0.10 dex. Dashed vertical lines indicate the same 5σ range as in the top panel.
Download figure:
Standard image High-resolution image

With both PyBDSF and prior-based galfit photometry, we not only obtain accurate independent fluxes that agree very well but also identify those few (0.5%²³ ) sources that suffer from source multiplicity/blending issues. These sources need careful visual inspections and multiwavelength diagnostics (e.g., SEDs) in order to fully deblend their ALMA flux, and thus they will be analyzed in a future work.

In our released two photometry catalogs, we flag sources for which the total fluxes from the two methods disagree by more than a factor of ∼3.12 (5σ, where σ is the scatter between PyBDSF and galfit total fluxes; see Figure 7) with a column Flag_inconsistent_flux and exclude them in subsequent steps. In the next sections, we use the prior photometry flux for the SED fitting. But measurements from both photometry methods will be made public together with the final galaxy SED and property catalog (see Section 5).

2.6. Comparison to uv-plane Source Fitting Results

Instead of measuring the source flux density in the image plane, it can also be directly measured in the uv-plane by fitting source models to the visibilities. We use the GILDAS²⁴ uv_fit task to fit Gaussian and/or point-source models and then compare the total flux with those measured from the image-plane galfit and PyBDSF fitting. We verified that GILDAS uv_fit gives similar results to the CASA uvmodelfit task for high- ${\rm{S}}/{\rm{N}}$ sources (e.g., total flux ${\rm{S}}/{\rm{N}}\gt 10$ ).

We run GILDAS uv_fit in an iterative approach: first we fit point-source models, and next for high- ${\rm{S}}/{\rm{N}}$ sources we fit extended Gaussian source models. We fit only for one source at the phase center and allow its position to vary freely by uv_fit. In total we ran the uv-fitting for 301 pointings from four representative ALMA projects: 2015.1.00137.S, 2013.1.00151.S, 2015.1.00379.S, and 2016.1.01208.S (these projects target the dust continuum for hundreds of galaxies from redshift 1 to 3; the PI of the first project is N. Scoville, and the PI for the other three is E. Schinnerer). The uv_fit flux densities and the prior-based and blind (sub)millimeter photometries agree very well. The difference between blind photometry and uv_fit flux densities (on a logarithmic scale) has a median of 0.015 dex and scatter of 0.08 dex. The difference between prior photometry and uv_fit flux densities has a median of −0.005 dex and scatter of 0.13 dex, showing a few more outliers (caused by blended priors, same as in Figure 7).

2.7. Comparison to Aperture Photometry Results

We further compare the fluxes from our PyBDSF and prior-based galfit fitting with those derived from aperture photometry (Scoville et al. 2016, 2017) (i.e., the getpix method described in Section 2.4). For sources with galfit ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\geqslant 3$ , the getpix method provides flux densities consistent with the ones from galfit (the mean of getpix-to-galfit flux ratio on a logarithmic scale is 0.004 dex, and the scatter is 0.18 dex). Sources with ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\gt 10$ are on average biased toward higher getpix flux densities, but no more than 10% (the mean value increases to 0.03 dex and scatter 0.06 dex, likely due to bright outlier sources that have non-Gaussian shapes).

The comparison between getpix and PyBDSF flux densities yields similar results: for PyBDSF ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\geqslant 3$ sources the mean of log₁₀S_GETPIX/S_PyBDSF is 0.003 dex with a scatter of 0.15 dex; when considering only ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\geqslant 10$ sources, the mean is still <0.01 dex. For about 10 sources, we directly compared our flux densities to measurements from Scoville et al. (2016, 2017, 2019, private communication), finding similar results to those mentioned above.

2.8. Inverted-image Fitting and the Fraction of Spurious Detection

We run our photometry tools (based on PyBDSF and galfit) on the inverted images (i.e., the sign of each pixel value is inverted) to estimate the fraction (and probability) of spurious detections by comparing the number of sources detected in inverted images to that in original images. We define the spurious fraction as the number of sources detected in inverted images compared to the corresponding number in the original images as a function of ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ (defined as ${S}_{\mathrm{peak}}/\mathrm{rms}\,\mathrm{noise}$ in Section 2.4), since this quantity does not depend on any fitted source size.

Since prior fitting needs a prior catalog to proceed with, and because our prior catalog has a very high number density (∼700 per arcmin²) that acts like a random sampling in the image, we directly use our COSMOS master catalog as the prior catalog for the inverted-image galfit photometry. The procedure is the same as described in Section 2.4; we first run the getpix step and then iteratively run galfit source fitting. In addition, we checked that the spurious detection curve remains the same when shifting the positions of the entire prior catalog by ±2'' in R.A. and/or decl. to avoid overlap with real galaxies.

Figure 8 shows the derived spurious fraction curves as a function of ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ for both PyBDSF (top) and galfit (bottom) photometry. The differential curve (solid line) indicates the spurious fraction at each ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ . The cumulative curve (dotted line) provides the spurious fraction summed over all bins with ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ greater than or equal to the current bin. As expected, spurious fractions are lower for the prior-based photometry compared to the blind source extraction owing to the availability of information on the presence of a galaxy. Thus, the prior-based photometry achieves deeper detection limits.

To investigate whether the PBA is affecting the false-positive detection, we have done two tests: one is dividing the spurious fraction curve in bins of PBA (pb_attenu) as shown in Figure 9, and the other is plotting the radial distribution of all spurious detections from the inverted images in Figure 10. In the former test, we choose only three bins because of the low number of sources away from the phase center (low pb_attenu). We bin in equal ln(pb_attenu) intervals that correspond to the same sky area, because pb_attenu ∝ exp(dist.²/pb²), where dist. is the distance of the source to the phase center and pb is the FWHM of the primary beam. The spurious fraction decreases when pb_attenu becomes closer to 1.0, which is as expected. But we also caution that there is a strong bias in the statistics because the number of sources dramatically differs (see Figure 9 caption).

**Figure 9.** Spurious fraction for three bins of `pb`_`attenu`: (0.2–0.34], (0.34–0.58] and (0.58–1.0], which are equally distributed in logarithm. The solid and dashed lines represent differential and cumulative curves, respectively (see Figure 8 caption). We caution that the trend seen here suffers from a strong bias in statistics, because the lowest `pb`_`attenu` (farthest away from phase center) bin has only about 62 ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\gt 5$ detections in original images and 27 in inverted images, while the numbers are 10 times larger in the innermost bin with `pb`_`attenu` ∼ 0.58–1.0 (although they are equal in area).
Download figure:
Standard image High-resolution image

**Figure 9.** Spurious fraction for three bins of `pb`_`attenu`: (0.2–0.34], (0.34–0.58] and (0.58–1.0], which are equally distributed in logarithm. The solid and dashed lines represent differential and cumulative curves, respectively (see Figure 8 caption). We caution that the trend seen here suffers from a strong bias in statistics, because the lowest `pb`_`attenu` (farthest away from phase center) bin has only about 62 ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\gt 5$ detections in original images and 27 in inverted images, while the numbers are 10 times larger in the innermost bin with `pb`_`attenu` ∼ 0.58–1.0 (although they are equal in area).
Download figure:
Standard image High-resolution image

In Figure 10, we show the radial distribution of all sources detected in the inverted images with galfit ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\gt 2.5$ or found by PyBDSF. Since the spurious fraction curve is slightly higher at larger radii, we might expect the spurious source density to be higher; however, the distribution remains fairly constant out to a pb_attenu of ∼0.3. We attribute the slight drop below ∼0.3 to the fact that instrumental systematics are likely becoming more prominent, namely, (a) the approximation of the primary beam by a Gaussian might no longer be correct,²⁵ and (b) the frequency dependence of the primary beam across the frequency range sampled by the continuum (i.e., 16 GHz between the upper and lower boundary of the spectral sidebands) will be more evident at large distances from the phase center. A more detailed investigation is beyond the scope of this paper.

In this work, we provide a photometry catalog out to a PBA of 0.2 (i.e., covering the full area of the images that are made available) and provide the pb_attenu for each source in our catalog. Note that 91% of our final selected sources lie within a PBA of 0.5 and only 2% beyond 0.3. Special care should be applied, e.g., considering a higher ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ threshold as shown in Figure 9 when studying sources below a pb_attenu of ∼0.5.

3. Monte Carlo Simulations

We run extensive MC simulations to verify our two main photometry methods: PyBDSF and galfit. The principle idea is to simulate model galaxies and recover them with the same analysis used to create our catalogs. The aims are (1) to test whether the recovered flux densities have a systematic offset to the simulated flux densities, which is hereafter referred to as "flux bias" and to understand its source and quantify it if it exists; (2) to quantify the overall uncertainty on the extracted flux densities and verify whether the aforementioned Condon (1997) error estimates can statistically describe the uncertainty; (3) to quantify the fraction of sources being recovered from all sources simulated, which is hereafter referred to as "completeness"; and (4) to verify whether the prior information used in the simulations will alter the output statistics or not.

In our simulations, we create artificial sources (of Gaussian shape), insert them into residual images (after blind extraction photometry), and recover them with our photometry pipelines. These steps are repeated several tens to hundreds of times for a large number of images with different properties (details are given in Appendix C). Our artificial sources are created within a grid of input values of both flux density and size. We create two sets of simulations with quite different input distributions defining this grid: (1) We start with a full parameter-space simulation (hereafter "FULL" simulation) in which the full parameter space of flux density and size is uniformly sampled: ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ ranges from 2.5 to 100 in logarithmic intervals, and the ratio of source major-axis size to beam major-axis size ranges from 0.1 to 6. Each grid point with a given flux density and size contains the same number of simulated sources. (2) We create another physically motivated MC simulation, hereafter "PHYS" simulation, where we simulate sources mimicking observed galaxy stellar mass functions (SMFs; e.g., Davidzon et al. 2017), star-forming MS relation (e.g., Sargent et al. 2014), and starburst/MS classification (i.e., following the two-star formation model (2SFM) of Sargent et al. 2012, 2014 and Béthermin et al. 2012a), as well as galaxies' size evolution (e.g., van der Wel et al. 2014; Fujimoto et al. 2017). The motivation for performing our "PHYS" simulation is that galaxies have nonuniform luminosity functions (or number counts) and size distributions. Fainter galaxies are much more numerous than brighter ones, and lower-redshift galaxies are in general larger than higher-redshift ones. Our comparison of the "FULL" and "PHYS" simulations tests whether the input distribution of the simulations influences the derived recovery statistics.

Due to the large number (1500+) of individual ALMA imaging data, we select a subset (150+) of representative images for each Scheduling Block of each Science Goal in each ALMA project. In this way we make sure that all different observing scenarios (frequencies, spatial resolutions, integration times, etc.) are covered.

For each selected image, we perform the "FULL" and "PHYS" type simulations 4225 and 273 times, respectively, depending on the grid of simulation (see Appendix C), resulting in 4225 and ∼3000–25,000 simulated objects, respectively. The number of sources in the "PHYS" simulation varies with the image field of view and the observing wavelength and dominates with fainter sources owing to the assumed galaxy SMFs and MS correlation, as well as the SEDs. Details of the two simulations are presented in Appendix C.

We then recover the simulated objects with our PyBDSF and galfit photometry pipelines, respectively, using the identical settings as for the real ALMA data. Therefore, we have four sets of simulated and recovered data to analyze and compare: FULL-PyBDSF, FULL-galfit, PHYS-PyBDSF, and PHYS-galfit.

In the next sections, we discuss the flux bias and flux errors for each simulation set and characterize them by two normalized parameters: the fitted source peak flux density normalized by the rms noise,

$\begin{eqnarray}&&{\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\equiv {S}_{\mathrm{peak}}/\mathrm{rms}\,\mathrm{noise},\end{eqnarray} \tag{ 1 }$

and the fitted source area (convolved with the beam) normalized by the beam area,

$\begin{eqnarray}&&{{\rm{\Theta }}}_{\mathrm{beam}}\equiv \sqrt{{\mathrm{Area}}_{\mathrm{source},\mathrm{convol}.}/{\mathrm{Area}}_{\mathrm{beam}}}.\end{eqnarray} \tag{ 2 }$

Note that the different types of simulations yield clear differences in the parameters of interest, especially the flux bias correction, as we will show in the following when comparing the results from all four simulated data sets.

3.1. Analyses of the "FULL" and "PHYS" Simulations

Although the simulated total source flux density, ${S}_{\mathrm{sim}.}$ , overall agrees well with the recovered total source flux density, ${S}_{\mathrm{rec}.}$ , a substantial bias between ${S}_{\mathrm{sim}.}$ and ${S}_{\mathrm{rec}.}$ becomes obvious when looking at the dependency on the flux ${\rm{S}}/{\rm{N}}$ . When normalizing the difference between ${S}_{\mathrm{sim}.}$ and ${S}_{\mathrm{rec}.}$ by the measured flux error, the histogram distribution of $({S}_{\mathrm{sim}.}-{S}_{\mathrm{rec}.})/{\sigma }_{{S}_{\mathrm{rec}.}}$ exhibits a nonzero mean and nonunity scatter (such histograms are illustrated later in Appendices C.1.2, C.2.3 and C.3). This indicates that the measured fluxes need to be corrected for flux biases, and the errors in the measured fluxes need to be re-estimated.

To analyze the flux bias and errors from our simulations, we bin all simulated and recovered sources in the 2D parameter space of ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ and ${{\rm{\Theta }}}_{\mathrm{beam}}$ and consider flux bias and error to be functions of these two parameters (Condon 1997; Bondi et al. 2003, 2008; Schinnerer et al. 2010; Jiménez-Andrade et al. 2019). Because ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ and ${{\rm{\Theta }}}_{\mathrm{beam}}$ are both normalized quantities, sources from different ALMA projects can be combined.

For each ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ and ${{\rm{\Theta }}}_{\mathrm{beam}}$ bin, we compute the mean and median of the relative flux density difference ( $({S}_{\mathrm{sim}.}-{S}_{\mathrm{rec}.})/{S}_{\mathrm{sim}.}$ ). The flux bias is then defined as

$\begin{eqnarray}&&{\eta }_{{\rm{bias}}}\equiv \langle ({S}_{\mathrm{sim}.}-{S}_{\mathrm{rec}.})/{S}_{\mathrm{sim}.}\rangle ,\end{eqnarray} \tag{ 3 }$

which represents how the recovered flux density is biased relative to the simulated flux density. The corrected flux density can then be calculated as

$\begin{eqnarray}&&{S}_{\mathrm{rec}.}^{\mathrm{corr}.}={S}_{\mathrm{rec}.}/(1.0-{\eta }_{{\rm{bias}}}).\end{eqnarray} \tag{ 4 }$

We note that computing the flux bias using the noise-normalized flux density difference ( $({S}_{\mathrm{sim}.}-{S}_{\mathrm{rec}.})/\mathrm{rms}\,\mathrm{noise}$ ) leads to no obvious difference.

Then, we also compute the scatter of $({S}_{\mathrm{sim}.}-{S}_{\mathrm{rec}.})/\mathrm{rms}\,\mathrm{noise}$ (we computed the standard deviation and the lower and higher 68th percentiles; see Section 3.1.3) and denote it as

$\begin{eqnarray}&&{\eta }_{{\rm{error}}}\equiv {\sigma }_{\left[({S}_{\mathrm{sim}.}-{S}_{\mathrm{rec}.}^{\mathrm{corr}.})/\mathrm{rms}\mathrm{noise}\right]}.\end{eqnarray} \tag{ 5 }$

We do not use the relative difference ( $({S}_{\mathrm{sim}.}-{S}_{\mathrm{rec}.})/{S}_{\mathrm{sim}.}$ ) because its scatter has an asymmetric distribution. The corrected flux density error can then be computed as

$\begin{eqnarray}&&{\sigma }_{{S}_{\mathrm{rec}.}^{\mathrm{corr}.}}={\eta }_{{\rm{error}}}\times \mathrm{rms}\,\mathrm{noise}.\end{eqnarray} \tag{ 6 }$

Combining all bins, we can measure ${\eta }_{{\rm{bias}}}$ and η_error as functions of ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ and ${{\rm{\Theta }}}_{\mathrm{beam}}$ , which are illustrated in Figure 11 for the "FULL" simulation with PyBDSF recovery as the example (the other three simulation–recovery pairs are analyzed similarly, and the ${{\rm{\Theta }}}_{\mathrm{beam}}$ -collapsed figures can be seen in Figures 12 and 13). This figure demonstrates that the flux bias and error do strongly correlate with ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ and ${{\rm{\Theta }}}_{\mathrm{beam}}$ .

**Figure 11.** PyBDSF flux bias and flux error as functions of measured ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ (Equation (1)) and ${{\rm{\Theta }}}_{\mathrm{beam}}$ (Equation (2)) in the left and right panels, respectively, from the "FULL" simulation. Subpanels (from top to bottom) are bins with increasing ${{\rm{\Theta }}}_{\mathrm{beam}}$ (as labeled). In the left panels, red and blue circles correspond to the mean and median of $({S}_{\mathrm{sim}.}-{S}_{\mathrm{rec}.})/{S}_{\mathrm{sim}.}$ , respectively. Solid red lines are function fitting (with the form $a\,{\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}^{\,m}+b\,{\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}^{\,n}$ ) to the flux bias data points (but no feasible function form could be fitted for flux error), and solid green lines are interpolations or extrapolations (visible in the right panels). The vertical dashed line corresponds to our sample selection threshold as will be detailed in Section 4.1. In the right panels, red and blue circles represent the scatter (standard deviation) and (the minimum of upper and lower) 68th percentile of $({S}_{\mathrm{sim}.}-{S}_{\mathrm{rec}.})/\mathrm{rms}\,\mathrm{noise}$ , respectively. See text in Section 3.1.
Download figure:
Standard image High-resolution image

**Figure 12.** Flux bias (as defined in Equation (3)) vs. the measured ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ (as defined in Equation (1)) statistics for PyBDSF (top panels) and galfit (bottom panels) photometry, each based on our two types of MC simulations (left panels are "FULL" simulation, and right panels are "PHYS" simulation; see the label in each panel). Color represents the geometric mean source-to-beam size ratio ( ${{\rm{\Theta }}}_{\mathrm{beam}}$ , as defined in Equation (2) in Section 2.2) and is the same in all four panels. Vertical lines are our ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ thresholds for the sample selection in Section 4.1.
Download figure:
Standard image High-resolution image

**Figure 13.** Similar to Figure 12, but showing the flux error factor ${\eta }_{\mathrm{error}}$ (as defined in Equation (5)) vs. the measured ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ (as defined in Equation (1) in Section 2.2) for our simulated sources. Statistics for the two photometry methods (PyBDSF: top; galfit: bottom) based on our two types of simulation ("FULL": left; "PHYS": right) are shown. Color represents the source-to-beam area ratio ( ${{\rm{\Theta }}}_{\mathrm{beam}}$ , as defined in Equation (2)) and is the same in all panels. The horizontal colored lines show the expected flux errors using the Condon (1997) prescription for ${{\rm{\Theta }}}_{\mathrm{beam}}=1$ , 2, and 5 (same color-coding as the data points). See text for further details.
Download figure:
Standard image High-resolution image

**Figure 13.** Similar to Figure 12, but showing the flux error factor ${\eta }_{\mathrm{error}}$ (as defined in Equation (5)) vs. the measured ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ (as defined in Equation (1) in Section 2.2) for our simulated sources. Statistics for the two photometry methods (PyBDSF: top; galfit: bottom) based on our two types of simulation ("FULL": left; "PHYS": right) are shown. Color represents the source-to-beam area ratio ( ${{\rm{\Theta }}}_{\mathrm{beam}}$ , as defined in Equation (2)) and is the same in all panels. The horizontal colored lines show the expected flux errors using the Condon (1997) prescription for ${{\rm{\Theta }}}_{\mathrm{beam}}=1$ , 2, and 5 (same color-coding as the data points). See text for further details.
Download figure:
Standard image High-resolution image

3.1.1. Flux Bias in the PyBDSF Photometry

The PyBDSF photometry measurement of a source always includes the intrinsic source flux plus a contribution from noise; thus, it always fits positive source fluxes, and the measured fluxes are statistically boosted by a certain amount that we define as the flux bias.

Based on our simulations, we characterize the flux bias correction factor (η_bias, Equation (3)) by the two measurable parameters ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ and ${{\rm{\Theta }}}_{\mathrm{beam}}$ and apply the flux bias correction to the measured/recovered flux with Equation (4). We find these two parameters to much more strongly affect the flux bias than other parameters, e.g., absolute source size or beam size. After the flux bias correction, the extracted total fluxes for simulated sources in maps of different spatial resolutions exhibit no obvious further bias from their simulated total fluxes.

Here, we also found that the flux bias parameterization strongly depends on the input mock source populations of the MC simulation as demonstrated below.

In Figure 12, we compare the flux bias of the PyBDSF photometry characterized from our "FULL" and "PHYS" simulations. ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ is on the x-axis, and ${{\rm{\Theta }}}_{\mathrm{beam}}$ is indicated by the color. The flux bias is a strong function of both ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ and ${{\rm{\Theta }}}_{\mathrm{beam}}$ . It rapidly becomes significant with decreasing ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ . For example, $| {S}_{\mathrm{sim}.}-{S}_{\mathrm{rec}.}|$ can be >10% of ${S}_{\mathrm{sim}.}$ when ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\lesssim 10$ . Second, sources with larger sizes suffer a stronger flux bias: a source with a measured size 4 times the beam size can be boosted by ∼80% of ${S}_{\mathrm{sim}.}$ at an ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\,=5.77$ (where the spurious fraction at this ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ is ∼40%; see Figure 8), while an unresolved source is only boosted by ∼20% at the same ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ .

The flux bias functions derived from the two simulations are fully consistent at the bright end, e.g., ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\gtrsim 10\mbox{--}20$ . Discrepancies between the simulations become only obvious at the faint end of ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ for sources with small ${{\rm{\Theta }}}_{\mathrm{beam}}$ . The flux bias in the "FULL" simulation is much smaller than compared to the "PHYS" simulation. This is due to the difference of the input populations of the two simulations. The effect of "resolution bias" is likely the main reason—such a bias causes sources with low ${\rm{S}}/{\rm{N}}$ and large simulated sizes to have much smaller recovered sizes or even be unresolved (or undetected) and also causes their fluxes to be underestimated instead of boosted by noise. This is common in radio photometry, where the spatial resolution is comparable to and even smaller than the sizes of galaxies at high redshift, e.g., as discussed in Bondi et al. (2003, 2008). The resolution bias is much more evident at the faint end of the "FULL" simulation than the "PHYS" simulation because of the higher number of large sources simulated in the former case. More discussion is presented in Appendix C.1.3.

In reality, the physical sizes of galaxies increase with cosmic time and scale with stellar masses (van der Wel et al. 2014), and their angular sizes (stellar component) increase quickly from z ∼ 1 to the present. This means that lower-redshift galaxies with high stellar masses tend to be largest. These galaxies can be bright at radio wavelengths but are in general much fainter and even undetectable at (sub)millimeter wavelengths (due to the K-correction and the general drop in star formation activity). Therefore, in our ALMA (sub)millimeter data, the real galaxy angular size distribution should be dominated by small sources, i.e., it is better described by the "PHYS" simulation rather than the "FULL" simulation. And thus, we use "PHYS" simulation-based flux bias functions for the final correction of the photometry.

3.1.2. Flux Bias in the galfit Photometry

In Figure 12, we show the flux bias parameterizations derived for the galfit photometry based on both simulations. Similar to the PyBDSF photometry, the galfit photometry also shows both flux boosting due to noise and flux underestimation due to the "resolution bias."

The galfit photometry has a smaller flux bias, which is likely due to the use of known prior position information for the photometry and the optimized iterative photometry approach (Section 2.4). It even achieves a better accuracy for sources with largest measured sizes ( ${{\rm{\Theta }}}_{\mathrm{beam}}\sim 5$ ) than those with slightly smaller measured sizes ( ${{\rm{\Theta }}}_{\mathrm{beam}}\sim 3$ ), if their ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ are above 20 or so.

3.1.3. Flux Error Estimation for PyBDSF Photometry

With Equations (5) and (6), we estimate the flux error factor (η_error) from our simulation bins and parameterize it by ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ and ${{\rm{\Theta }}}_{\mathrm{beam}}$ (after the correction for flux bias). We compute η_error in a given bin by computing both the standard deviation and the upper and lower 68th percentiles. Because the data do not usually follow a normal distribution in $({S}_{\mathrm{sim}.}-{S}_{\mathrm{rec}.})$ , both of these error estimates do not always agree with each other. This can be seen in the right panels of Figure 11, especially for low- ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ data points, where the standard deviation is usually larger than the one derived from the percentiles. And we find that the minor value of the upper and lower 68th percentiles can better represent the underlying scatter (which are shown in later figures).

Condon (1997) proposed a mathematical recipe for estimating the errors of a six-parameter Gaussian fit with correlated noise. As shown by their Equations (32), (41), and (42), the total flux error can be characterized by the following parameters: the convolved source size parameters (major- and minor-axis FWHM sizes, denoted as θ_maj. and θ_min., respectively, corresponding to θ_M and θ_m, respectively, in Condon 1997), the beam size parameters (major- and minor-axis FWHM sizes, denoted as θ_bmaj. and θ_bmin., respectively, corresponding to θ_N and θ_n, respectively, in Condon 1997), and the measured total flux ( ${S}_{\mathrm{total}}$ ). Such a recipe has later been adopted in Bondi et al. (2003, 2008), Schinnerer et al. (2010), and Smolčić et al. (2017) for the VLA source fitting photometry.

In this work, because our ALMA data have different beam sizes, we express these size quantities in the normalized form: the geometric mean of the source size normalized by the beam size, ${{\rm{\Theta }}}_{\mathrm{beam}}$ , as defined in Equation (2), which equals ${{\rm{\Theta }}}_{\mathrm{beam}}^{2}\equiv ({\theta }_{\mathrm{maj}.}{\theta }_{\min .})/({\theta }_{\mathrm{bmaj}.}{\theta }_{\mathrm{bmin}.});$ the size of the source major axis normalized by the beam, Θ_maj. ≡ θ_maj./θ_bmaj.; and the size of the source minor axis normalized by the beam, Θ_min. ≡ θ_min./θ_bmin..

Because the total flux is the product of peak flux and source area, we can write

$\begin{eqnarray}&&{S}_{\mathrm{total}}={S}_{\mathrm{peak}}\times \displaystyle \frac{({\theta }_{\mathrm{maj}.}{\theta }_{\min .})}{({\theta }_{\mathrm{bmaj}.}{\theta }_{\mathrm{bmin}.})}\equiv {S}_{\mathrm{peak}}\times {{\rm{\Theta }}}_{\mathrm{beam}}^{2}.\end{eqnarray} \tag{ 7 }$

Therefore, the Condon (1997, C97) recipe can be rewritten as

$\begin{eqnarray}\begin{array}{rcl}\displaystyle \frac{{\sigma }_{{S}_{\mathrm{total}},{\rm{C}}97}^{2}}{{[\mathrm{rms}\mathrm{noise}]}^{2}} & = & \left({{\rm{\Theta }}}_{\mathrm{beam}}^{2}\times \displaystyle \frac{8}{{\left[1+\tfrac{1}{{{\rm{\Theta }}}_{\mathrm{maj}.}^{2}}\right]}^{1.5}{\left[1+\tfrac{1}{{{\rm{\Theta }}}_{\min .}^{2}}\right]}^{1.5}}\right)\\ & & +\,\displaystyle \frac{8}{{\left[1+\tfrac{1}{{{\rm{\Theta }}}_{\mathrm{maj}.}^{2}}\right]}^{2.5}{\left[1+\tfrac{1}{{{\rm{\Theta }}}_{\min .}^{2}}\right]}^{0.5}}\\ & & +\,\displaystyle \frac{8}{{\left[1+\tfrac{1}{{{\rm{\Theta }}}_{\mathrm{maj}.}^{2}}\right]}^{0.5}{\left[1+\tfrac{1}{{{\rm{\Theta }}}_{\min .}^{2}}\right]}^{2.5}}.\end{array}\end{eqnarray} \tag{ 8 }$

Condon (1997) validated the coefficients/indices in their equations using ∼3000 simulations. Because our ALMA photometry is more diverse than their simulations in both data complexity (the variety of beam size, rms noise) and photometry method (e.g., involving iterations), we need to verify that the Condon (1997) recipe is still appropriate for our analysis.

In Figure 13, we present how our estimated η_error changes with ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ and ${{\rm{\Theta }}}_{\mathrm{beam}}$ , and compared with the Condon (1997) errors (horizontal lines). The four panels show the same diagram for our two photometry methods and the two simulations.

According to Equation (8), the flux error normalized by the $\mathrm{rms}\,\mathrm{noise}$ should be independent of ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ but strongly dependent on ${{\rm{\Theta }}}_{\mathrm{beam}}$ . Figure 13 indeed shows a strong dependency on ${{\rm{\Theta }}}_{\mathrm{beam}}$ but also indicates a weak dependency on ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ . For sources with small sizes (relative to the beam), the flux error becomes larger for larger ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ (by about 15% within the range indicated in the figure). However, for sources with large sizes (relative to the beam), it becomes smaller for larger ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ (by about 40% within the range of the figure).

The expected Condon (1997) errors for ${{\rm{\Theta }}}_{\mathrm{beam}}=1$ , 2, and 5 cases are shown as horizontal lines in Figure 13, computed using Equation (8) and assuming a minor/major-axis ratio of 1. Note that a smaller axis ratio will lead to a smaller Condon (1997) error value (by about 15% for ${{\rm{\Theta }}}_{\mathrm{beam}}=5$ when reducing the axis ratio from 1 to 0.1). Our simulation-derived errors (colored data points) are consistent with Condon (1997) errors (colored lines) at the low- ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ end and at smallest and largest sizes (represented by the colors). However, the "FULL" simulation panel indicates that Condon (1997) errors are overestimated by about 40% for large, high- ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ sources, while the "PHYS" simulation panel indicates that Condon (1997) errors are underestimated by about 15% for small, high- ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ sources. Both simulations show that the Condon (1997) errors are slightly overestimated at ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\sim 5\mbox{--}10$ for small and intermediate-sized sources.

In our final catalog, we provide both our simulation-derived total flux errors and those given by our photometry pipelines, which are based on Condon (1997).²⁶

3.1.4. Flux Error Estimation for galfit Photometry

The flux errors are analyzed in a similar way for galfit photometry. The same diagnostic plots are shown in the bottom panels of Figure 13. The trends for the galfit photometry are very similar to those for PyBDSF. The galfit photometry has even smaller flux errors for large size sources than PyBDSF photometry. Both methods involve multiple iteration or multisource fitting (rather than one-time simple 2D Gaussian fits), and thus the reason for these trends is not very clear. Yet the different inputs for the two types of simulations do not have a sizable impact here.

3.1.5. Final Corrections

We finally correct flux biases and re-estimate flux errors for both the simulation catalogs and the real data's blind extraction and prior fitting catalogs, based on our aforementioned recipes (as functions of ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ and ${{\rm{\Theta }}}_{\mathrm{beam}};$ Equations (4) and (6), respectively). We choose the "PHYS" simulation for the final correction, considering the discussion in the previous sections, i.e., "PHYS" simulation is more representative of our real data. Note that using "FULL" simulation would underestimate the flux bias correction and hence lead to larger fluxes, especially for large sources.

The comparison of corrected and uncorrected fluxes and errors for real catalogs is shown in Figure 14. Based on this, we find that our corrected fluxes and errors follow well-behaved statistics (see details in Appendix C.3), which means that flux biases (e.g., flux boosting) are fully removed and flux errors can fully reflect the scatters of photometry measurements introduced by the noise in the data.

Further, in Figures 15 and 16 we present the distributions of primary-beam-corrected total flux and fitted intrinsic size versus source peak-to-rms noise ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ (see Equation (1)), beam-normalized source size ${{\rm{\Theta }}}_{\mathrm{beam}}$ (see Equation (2)), and the rms noise and beam major-axis FWHM of the ALMA data. These figures show that our detections span a large range in flux and size. Note that the continuum wavelengths of the ALMA detections also vary: about 44% of the data are at ∼870 μm, about 49% at ∼1.0–1.5 mm (mostly 1.25 mm), ∼1% at ∼1.9–2.3 mm, and ∼6% at ∼2.5–3.4 mm. Thus, the sensitivity shown cannot straightforwardly be compared to single-band ALMA continuum surveys. From these figures, good consistency between the two photometry catalogs is also evident. The prior catalog extends to a slightly fainter regime, and only a minor fraction of sources are fitted with smaller sizes. As the aim here is to obtain good continuum photometry catalogs, the study of the uncertainty on source sizes is the topic of future work.

**Figure 15.** Total flux (primary beam corrected) vs. source peak flux to rms noise ratio ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ (see Equation (1)) (left panel) and the fitted intrinsic source major-axis FWHM vs. the beam-normalized source size ${{\rm{\Theta }}}_{\mathrm{beam}}$ (see Equation (2)) (right panel) for the ALMA detections with ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\gt 4.35$ and 5.40 in our prior photometry and blind photometry catalogs, respectively (the ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ thresholds are determined in Section 4.1). Contours are the density of the data points. The size of a data point scales inversely with the density for illustration purposes.
Download figure:
Standard image High-resolution image

**Figure 16.** Total flux (primary beam corrected) vs. rms noise (left panel) and the fitted intrinsic source major-axis FWHM vs. beam major-axis FWHM (right panel). Contours and data points are in the same style as in Figure 15.
Download figure:
Standard image High-resolution image

3.2. Completeness

In this section, we analyze the completeness of our photometry by examining the fraction of simulated sources that are successfully recovered to the total simulated number. The photometry is incomplete for several reasons: (1) some faint sources are undetected owing to noise fluctuation, (2) PyBDSF groups blend multiple sources into one source, (3) galfit might give wrong best-fit results in case of severely clustered priors, and (4) PyBDSF has certain flagging criteria to filter out nonphysical sources.²⁷ To assess the contribution of these effects, we calculate the completeness curves as a function of ${\rm{S}}/{\rm{N}}$ and source sizes (normalized by the beams).

We use both PHYS and FULL simulations to verify the completeness. Note that the two simulations have very different source flux and spatial distributions. Sources are isolated and have flat flux distribution in the FULL simulation, whereas in the PHYS simulations sources have instead realistic spatial distribution, as well as a flux distribution that fully agrees with the observed millimeter number counts (see Appendix C.2).

We cross-match the PyBDSF source recovery catalog to the simulated catalog for each image by coordinate using a search radius of 1 farcs 5,²⁸ and we match by ID for the galfit recovery catalog. We measure the completeness as the ratio of the number of sources in the cross-matched catalog to those in the simulated catalog for each bin of ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ and ${{\rm{\Theta }}}_{\mathrm{beam}}$ . We confirmed that the wide range in rms noise and beam size does not affect the completeness estimates by splitting our simulations in random half. Using a smaller search radius has a very minor effect, as only 4% (10%) of sources have recovered position shifted by more than 1 farcs 0 (0 farcs 6) from the simulated position.

Moreover, the completeness is associated with certain detection criteria. Within PyBDSF, the detection is defined as an extracted source that passes thresh_pix, thresh_isl, and other flagging criteria. Therefore, the remaining discussion within this section is focused on the PyBDSF setups (Section 2.2). In galfit, a detection is slightly more complex to define, as galfit always fits a positive flux density for each prior. Thus, we apply an ${\rm{S}}/{\rm{N}}$ cut to the galfit catalog before computing the completeness (without such an ${\rm{S}}/{\rm{N}}$ cut, the recovery rate would be 100%, as every prior is fitted with a flux density).

In the left panels of Figure 17, we show the completeness curves for the PyBDSF photometry as a function of ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ . As sources tend to be small relative to the beam size (with a median (mean) observed size of ${{\rm{\Theta }}}_{\mathrm{beam},\mathrm{sim}.\mathrm{convol}.}\sim 1.2$ (1.6)) in the "PHYS" simulation (top left panel), we do not distinguish between source sizes. The "FULL" simulations (bottom left panel) have sufficient statistics to study the effect of source sizes; thus, we show completeness curves for different simulated source sizes in the bottom left panel. Here, we consider simulated size instead of recovered size, as the latter is unavailable for undetected sources. Large sources are slightly more complete than small sources at very low ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\sim 2\mbox{--}4$ . This trend reverses at a higher ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ (up to ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\sim 20$ ), above which the completeness for sources reaches ∼100%. In principle, at a given ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ , sources with larger recovered size should have higher completeness. We speculate that the previously discussed resolution bias, spatial noise fluctuation, and "island" feature of PyBDSF all play a role in the low- to intermediate- ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ regime—a larger simulated source is easier to detect owing to a higher number of pixels above the threshold, but at the same time it has a chance of being recovered with a smaller size or even as an (or multiple) unresolved source(s) by PyBDSF (especially for the largest simulated sizes). Thus, these effects lead to a lower completeness for the largest simulated sources even at ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\sim 10\mbox{--}20$ . While fine-tuning the PyBDSF parameters can achieve better detection for large sources, this would require more dedicated effort beyond our systematic approach, which is tailored to the bulk of source properties expected. Moreover, our prior photometry is fitting well large sources (<3''); thus, such cases will be identified when we cross-match the prior photometry and blind photometry catalogs (see Section 4.1), and currently no such source is found in our data set, as we excluded beam < 0 farcs 1 ALMA data.

The shaded areas in Figure 17 indicate an uncertainty of a factor of two in the estimated incompleteness in "PHYS"–PyBDSF and are the same in all other panels. Comparison between the completeness for the smallest sources in the "FULL" simulation and the one from the "PHYS" simulation gives a ∼3% lower completeness at ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\sim 20\mbox{--}40$ . This difference is caused by source blending and exactly corresponds to the 3.5% multi-Gaussian sources detected in our data set. As described in Section 2.2, when several sources are blended, PyBDSF fits multiple Gaussians and groups them as one island, which is then output as a single source.

In the right panels of Figure 17, we show how different ALMA beam sizes (absolute values in units of arcseconds) would impact the completeness. We find that as long as the ALMA beam is between 0 farcs 2 and 1'', the completeness is not obviously affected. For ALMA beams larger than 1'', completeness drops by ∼5%–10% even for a high ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\sim 20\mbox{--}50$ source in our PHYS simulation, which is likely because sources are clustered and the large ALMA beam starts to cause a blending effect, and also because PyBDSF has the "island"-grouping feature (Section 2.2).

In addition, in Figure 34 in Appendix C, we show the completeness as 2D functions of both ${S}_{\mathrm{peak}}/\mathrm{rms}\,\mathrm{noise}$ and ${S}_{\mathrm{total}}/\mathrm{rms}\,\mathrm{noise}$ and ${{\rm{\Theta }}}_{\mathrm{beam}}$ . We find a good agreement between our completeness analysis and similar work by Jiménez-Andrade et al. (2019) for PyBDSF photometry in their COSMOS VLA data, as well as by Franco et al. (2018) for Blobcat (Hales et al. 2012) photometry on their ALMA deep field data. Further, we discuss the comparisons of our completeness to other (sub)millimeter/radio photometry works (Karim et al. 2013; Ono et al. 2014; Aravena et al. 2016; Hatsukade et al. 2016; Franco et al. 2018), which confirm that more realistic simulations are required to better recover the statistics.

Given our finding that completeness shows an obvious dependency on source sizes, if selecting a sample with a total flux ${\rm{S}}/{\rm{N}}$ threshold, the sample will have different completeness for different sizes. But when selecting with a constant ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ threshold, the sample will have a homogeneous completeness. Thus, we use ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ to select our final sample (see the next section). Furthermore, we confirm that the spurious fractions derived from the simulations are consistent with those based on inverted-image fitting in Section 2.8. The robust estimates of the fractions of completeness and spurious sources provide us with a good handle of the performance of our photometry methods. For a given ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ selection threshold, we know how many real sources are missed and how many could be spurious. While there is no way to improve on the nondetections, there are a number of automated examinations that can significantly reduce the number of spurious sources in our final galaxy catalog (see next sections).

4. Galaxy Catalog and Properties

In this section, we discuss the selection of reliable ALMA detections from the two photometry catalogs and the construction of our "galaxy catalog." Given the extensive information on galaxies in the COSMOS field that is available in the literature, we have devised rigorous inspections to ensure that our galaxy sample and its SEDs are reliable. These inspections include the identification of spurious sources and galaxies with inconsistent photometric and/or spectroscopic redshifts in the literature. We further discuss how galaxy properties are obtained via multiple SED fitting techniques, including consistency and reliability checks. The workflow of this analysis step (including Sections 4.1–4.6) is illustrated in Figure 18.

**Figure 18.** Workflow for the selection of a reliable galaxy sample and the determination of its properties (see Section 4). We first apply an ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ cut to our two photometry catalogs and then apply a counterpart association code (based on machine learning) to construct our galaxy multiwavelength catalog. Next, we run SED fitting to identify outliers that are due to either spurious ALMA sources or "suspicious" (inconsistent) redshifts in the literature. Finally, after discarding spurious sources and refinement of inconsistent prior redshifts, SED fitting is repeated to obtain physical properties of our galaxies. The corresponding subsections in the text are provided in parentheses.
Download figure:
Standard image High-resolution image

4.1. Combining the Two Photometric Catalogs

We apply an ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ cut at 5.40 to our blind source extraction catalog (Section 2.2) and an ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ cut at 4.35 to our prior source fitting catalog (Section 2.4). These thresholds are selected such that the differential spurious fractions are both 50% at the applied ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ cut level, and the cumulative spurious fractions are <8% and <12% for the blind- and prior-selected samples, respectively (see Section 2.8 and Figure 8). The corresponding differential completenesses at those thresholds are 57% and 98%, and the cumulative ones are as high as >92% and >99%, respectively (see Section 3.2 and Figures 17, 34). In Figure 19, we show the ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ histograms of the blind and prior catalogs and the applied thresholds.

**Figure 19.** ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ histograms of our blind extraction and prior fitting catalogs. The blue and red vertical dashed lines indicate the ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ thresholds we applied to select our sample from the prior and blind catalogs, respectively.
Download figure:
Standard image High-resolution image

To merge the two photometric catalogs, we spatially cross-match their sources with a radius of 1 farcs 0 (false-match probability 0.5% applying Equation (1) of Pope et al. 2006; see also further discussion of the counterparts association in the next section), and we find 820 sources in common. Another 326 sources are only present in one catalog (207 sources in the prior catalog and 119 sources in the blind catalog). The ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ histograms of those sources (Figure 20) show that the sources only present in the prior catalog (prior-only sources) mostly lie at the lowest- ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ end, where the spurious fraction is 50%. The few prior-only sources at high ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ are blends with nearby prior sources, such that only one source is cross-matched to the corresponding PyBDSF counterpart. The sources only present in the blind catalog could be spurious (if at low ALMA ${\rm{S}}/{\rm{N}}$ ) or, if at high ALMA ${\rm{S}}/{\rm{N}}$ , real dusty, high-redshift galaxies whose optical/near-IR/radio emission is too faint to be detected in the prior catalogs. However, as there is currently no optical/near-IR information available for these blind-only sources, we exclude them from the analysis in the rest of this paper.

**Figure 20.** Vertically stacked ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ histograms of our selected sample from the blind extraction and prior fitting catalogs. Sources in both catalogs are indicated by green bars and shown with the prior catalog ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ , while those in the blind extraction (prior fitting) catalog with ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ above the labeled threshold are shown with red (blue) bars. The height of each stacked bar indicates the relative number, and the total height of the histogram represents our selected sample size. (Comparing to Figure 19, the difference in the third-highest bin is due to different ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ between prior photometry and blind photometry as detailed in Section 2.5.)
Download figure:
Standard image High-resolution image

After accounting for 25% of galaxies having more than one ALMA observation, due to either different wavelengths or spatial resolutions, we have 823 unique galaxies (with data set version 20180201). The ALMA flux densities and their errors are then corrected for the PBA. As 26% of these galaxies do not have sufficient optical/near-IR data, i.e., not in the Laigle et al. (2016) catalog, it is not possible to obtain reliable stellar masses for them. While some of these sources emit weakly in the deeper IRAC 3.6 and 4.5 μm data from the Spitzer Large Area Survey with Hyper-Suprime-Cam (SPLASH) survey (PI: P. Capak; I. Davidzon 2019, private communication) and are also present in the IRAC catalogs from the Spitzer Matching Survey of the UltraVISTA Ultra-deep Stripes (SMUVS; Ashby et al. 2018), their stellar masses and photometric redshifts have large uncertainties owing to the lack of shorter-wavelength information. We therefore omit these sources from our galaxy catalog (see the "no optical/near-IR prior redshift galaxies" entry in Figure 18; they are kept in the ALMA photometry catalogs, e.g., those with IRAC/radio priors). We plan to update our galaxy catalog when deeper optical to K-band data become available, e.g., from the UltraVISTA Data Release 4.

In the next sections, we further exclude some outliers from the ALMA photometry catalogs to construct our final galaxy catalog. We list the numbers and fractions of sources excluded at each step in Table 3.

Table 3. Number of Sources in A³COSMOS Catalogs and Excluded at Each Step in Section 4 (Version 20180201; See Also Workflow in Figure 18)

Catalog/Step	Number	Fraction
Prior photometry catalog	1027	⋯
Blind photometry catalog	939	⋯
Combined ALMA detections	1146	⋯

Galaxies having more than one ALMA data point (Section 4.1)^a	204	25%
Galaxies having no optical/near-IR counterpart/prior redshift (Sections 4.1, 4.2, and 4.3)^b^,^c	215	26%
Inconsistent-flux outliers (Section 2.5; `Flag`_`inconsistent`_`flux`)^c	4	0.5%
Unreliable counterpart outliers (Section 4.2; `Flag`_`outlier`_`CPA`)^c	36	4%
SED-excess outliers (Section 4.4; `Flag`_`outlier`_`SED`)^c	21	3%

Final galaxy catalog (Section 5)^d	547	⋯

Notes.

^aIn this step, we sorted 1027 ALMA prior detections into 823 unique galaxies, while we discarded 119 blind-only sources (see discussion in Section 4.8). The fractions in the third column are of the 823 unique galaxies. ^bThis includes the 119 blind-only sources, 43 galaxies that have no redshift from the literature as prior information, and 53 galaxies that only have an FIR/millimeter photo-z from Jin et al. (2018). They are excluded from the further quality assessments owing to too poor constraints on galaxy properties. ^c10 sources are duplicated among these flags. ^dOur approach aims at keeping only galaxies with the most reliable properties (redshift, stellar mass, and dust-obscured SFR); therefore, the number of galaxies is significantly reduced compared to the number of ALMA detections. The exclusion of galaxies does not mean that they are all not real, but just their properties could not be reliably estimated with current data. Future follow-ups will be needed to explore their properties.

Download table as: ASCII Typeset image

4.2. Examining Counterpart Association

Our ALMA data set has excellent spatial resolution (∼1'') compared to data from single-dish (sub)millimeter telescopes (>10''), and for most sources a unique counterpart at optical/near-IR/radio wavelengths can be easily identified by examining the spatial separation. However, a small number of ambiguous cases remain for both prior fitting and blind extraction photometry. Note that we have already corrected for the known astrometry offsets between prior and ALMA positions before our final run of prior fitting (for more details on astrometry, see Appendix A).

During our prior fitting photometry, we allow the source position to vary if the source has high ${\rm{S}}/{\rm{N}}$ (see Section 2.4). This implies that any ALMA source not in our prior master catalog close to a prior position will be wrongly attributed to that prior. In these cases, they are more likely to have a certain spatial offset. But this scenario needs to be distinguished from the case where the prior source is an extended galaxy and its dust emission peak is offset from its optical position (e.g., Hodge et al. 2016; Chen et al. 2017).

Besides, spurious sources caused by noise boosting (∼10% spurious sources are expected from our statistical analysis with our selection thresholds in Section 4.1) can also exhibit larger offsets, as the signal boosted by noise is randomly spatially distributed. Thus, by examining the counterpart association, we can identify most of these outliers (∼4% in this step, or in total ∼8.4% including the steps in the next sections) and reduce the number of spurious sources in our final catalog.²⁹

In order to correctly identify such ambiguous cases in an automated fashion, we quantify the counterpart association process by several measurable parameters as follows:

1.
The projected separation between the positions of the ALMA and counterpart source, normalized by the projected ALMA source radius (denoted as Sep.).
2.
The ALMA total flux ${\rm{S}}/{\rm{N}}$ (denoted as ${\rm{S}}/{{\rm{N}}}_{\mathrm{ALMA}}$ ).
3.
The ${\rm{S}}/{\rm{N}}$ of the aperture-integrated flux in optical/near-IR/radio images, measured with an aperture centered at the ALMA source position ( ${\rm{S}}/{{\rm{N}}}_{{\rm{S}}.}$ ) and at the reference counterpart positions ( ${\rm{S}}/{{\rm{N}}}_{\mathrm{Ref}.}$ ), as well as their respective ratio (denoted as ${\rm{S}}./\mathrm{Ref}.$ ). The aperture size is determined via measurements with a series of concentric apertures where the aperture with the maximum ${\rm{S}}/{\rm{N}}$ is taken.
4.
An extension parameter $\mathrm{Ext}.$ that traces the amount of extended optical/near-IR/radio emission within the location between the ALMA and counterpart positions. This is quantified by deriving the optical/near-IR/radio surface brightness level within a series of fixed-size apertures (equal to the fitted ALMA source size) centered along the connecting line between the ALMA position and the reference counterpart position. The linear slope of the relation between surface brightness and increasing (linear) distance from the ALMA position is adopted as $\mathrm{Ext}.$ : if the source is an extended galaxy and the optical emission is attenuated by dust at the ALMA position, then $\mathrm{Ext}.$ is around or slightly larger than 1. However, if the ALMA source is a dusty galaxy with nondetectable optical emission and is wrongly associated with a counterpart in the optical catalog at some distance away, $\mathrm{Ext}.$ will be very large or even not measurable in the counterpart optical image (as we require ${\rm{S}}/{{\rm{N}}}_{{\rm{S}}.}\gt 3$ in the apertures to measure the $\mathrm{Ext}.$ parameter).

These parameters have been defined to best describe the counterpart association process and are best suited to distinguish between those considered true by visual classification and those cases where the visual classification suggests that the ALMA source is unrelated to the counterpart source. These parameters are then measured for each ALMA detection (Section 4.1) and its master catalog counterpart (Section 2.3) in four counterpart images: Hubble Space Telescope (HST) ACS i-band image from Capak et al. (2007), UltraVISTA K_s-band image from McCracken et al. (2010, 2012), Spitzer IRAC 3.6 μm image from the SPLASH survey (PI: P. Capak), and VLA 3 GHz image from Smolčić et al. (2017). Other images have worse spatial resolution and/or sensitivity and therefore are less helpful in distinguishing the quality of counterpart associations.

Empirically, we find that counterparts with larger $\mathrm{Sep}.$ and lower ${\rm{S}}/{{\rm{N}}}_{\mathrm{ALMA}}$ are less reliable (i.e., less confident to say that the ALMA emission belongs to the counterpart galaxy, based on our visual identification). However, those could be more reliable if we see extended emission between the ALMA and counterpart position (i.e., Ext. ∼ 1), which could be the aforementioned case where the galaxy's dust emission is offset from its optical emission and has a smooth transition in between. We show an example of our counterpart association diagnostic in Appendix D.

With these parameters, we proceed with machine-learning techniques to establish the linkage between these parameters and the confidence of a counterpart association. To build up a training data set, three team members visually classified all the 1000+ ALMA detections individually. We visually inspected ALMA contours overlaid on ACS i band, UltraVISTA K_s band, IRAC 3.6 μm, and 3 GHz images and assigned each source a classification of 1 (robust) or 0 (spurious or incorrect association). We adopt the median classification from the three sets as truth. In order to automate this classification for future data releases, we use the results from visual inspection to train an algorithm that takes as input the parameters described above ( $\mathrm{Sep}.$ , ${\rm{S}}/{{\rm{N}}}_{\mathrm{ALMA}}$ , ${\rm{S}}/{{\rm{N}}}_{\mathrm{Ref}.}$ , ${\rm{S}}./\mathrm{Ref}.$ , and $\mathrm{Ext}.$ ) calculated for the ACS, Ks, IRAC 3.6 μm, and 3 GHz cutouts. In addition, we include a flag for $\mathrm{crowdedness}$ (defined as the density of master catalog sources weighted by a 2D Gaussian with an FWHM of PSF size; see Liu et al. 2018 Equation (1)) and $\mathrm{clean}$ parameter (defined as the number of master catalog sources within 3'' radius; Elbaz et al. 2011), as they are helpful in identifying extremely blended cases.

For this supervised machine-learning task, we use the Python scikit-learn package (Pedregosa et al. 2012). For sources with missing parameters, we replace the missing values with the mean of that parameter from the entire sample. Then, we randomly select 60% of the sample with visual classifications for training, leaving the final 40% for model validation. After testing a number of different classifiers available in scikit-learn, we decide to use the Gaussian process (GP) classifier, which implements Gaussian processes for probabilistic classification. Running our trained model on the validation sample gave an accuracy of ∼96.5%. For the total sample of 1027 analyzed sources, we find that 94% (965) of sources are classified as robust by both the visual and GP classifications. Three percent of the sources (32) are classified as not robust/spurious by both visual and GP classifications (bringing the overall accuracy to 97%). Only 1% of the sources (7) are classified as robust visually but missed by the GP classification. Two percent of the sources were classified as not robust visually but assigned a robust classification by the GP classifier. Reassuringly, the cases where the visual and GP classifications disagree are all borderline cases where the three visual inspectors are also not in full agreement. The model was saved and can be reused to predict the robustness of counterpart associations for future A³COSMOS runs without the need for visual classification, provided that our current training sample is representative of future data sets.

After this automated counterpart association step, 36 sources are flagged as spurious sources (they could potentially be noise boosted or a co-aligned real dusty galaxy). We flag them by the Flag_outliers_CPA column in our final galaxy catalog and discard them for our further analysis in this paper.

4.3. Combining Multiwavelength Photometry and Prior Redshifts in the Literature

To combine the multiwavelength photometric and spectroscopic information for our prior catalog, we adopt the optical/near-IR photometry from the Laigle et al. (2016) catalog and use the 3'' diameter aperture fluxes to be consistent with Laigle et al. (2016).³⁰

Further, we adopt the FIR/(sub)millimeter/radio photometry from Jin et al. (2018). The authors use detailed "super-deblended" procedures following Liu et al. (2018) to overcome the severe source confusion in their FIR/(sub)millimeter data, which is due to the large beam sizes of the Herschel and ground-based single-dish FIR/(sub)millimeter telescopes. Their photometry is prior based, with the prior catalog constructed by combining the Laigle et al. (2016), Muzzin et al. (2013), and Smolčić et al. (2017) catalogs, all of which are also in our master catalog. The "super-deblended" photometry uses the prior information of galaxies' photometric redshifts and SEDs to "freeze" low-redshift sources and includes the step of blindly extracting sources in the residual images and refitting together with initial priors. Therefore, sources not in the prior catalog or even co-aligned sources at a significantly higher redshift than the prior source have already been reasonably well accounted for (e.g., if prior redshift < 1, its SED will predict a too low FIR flux and it gets "frozen" during fitting; see details in Liu et al. 2018; Jin et al. 2018). More complex situations arise if an unknown FIR source is blending with a prior source whose SED is not constrained well. However, the ALMA data have typically the spatial resolution and sensitivity to distinguish them. In this work, we do find about 100 ALMA sources not in the prior catalog used by Jin et al. (2018), of which only about 10 are blended with a Jin et al. (2018) prior source (within 1''), and their ALMA ∼1 mm fluxes (<1 mJy) indicate that they are undetectable by Herschel and SCUBA2. Therefore, using the Jin et al. (2018) catalog for FIR photometry seems appropriate, especially for those with common priors.

For the SED fitting in this work, we first consider a prior spectroscopic or optical/near-IR photometric redshift if available in the literature. Using photometric redshift is motivated by the sufficiently good agreement between photometric and spectroscopic redshifts as demonstrated by Laigle et al. (2016).

In this work, we examine all the spectroscopic and photometric redshifts in the literature listed in Section 2. We show the comparison of these redshifts (hereafter prior redshift, or "prior-z") in Figure 21, where each data point represents a galaxy in our galaxy catalog and has prior-z from both the Laigle et al. (2016) catalog and other catalogs³¹ : the M. Salvato et al. spectroscopic redshift catalog, the Davidzon et al. (2017) photometric catalog for the same UltraVISTA galaxies as Laigle et al. (2016) but with optimized SED fitting for z > 2.5 sources, the Delvecchio et al. (2017) photometric catalog for radio-detected galaxies, and the Salvato et al. (2011) photometric catalog for X-ray-detected active galactic nuclei (AGNs).

**Figure 21.** Comparison between literature photometric and spectroscopic redshifts available for our galaxy sample. Redshifts from various studies in the literature: Davidzon et al. (2017), Delvecchio et al. (2017), Salvato et al. (2011), and the M. Salvato et al. compilation catalog of spectroscopic redshifts are plotted against the photometric redshifts from Laigle et al. (2016). Each data point represents a master catalog source that has a counterpart in the second respective catalog (see Section 4.3, footnote 31 for the cross-matching). Filled orange circles indicate sources with robust spectroscopic redshifts (≥2 spectral features); low-quality spectroscopic redshifts with only one spectral feature are shown as open orange circles. From the set of sources with photometric redshifts in Davidzon et al. (2017), we highlight those that have a consistent second probability peak in redshift in the Laigle et al. (2016) catalog by a white cross inside the black circle (mostly around redshift 4 in Laigle et al. 2016 catalog). The solid line presents the one-to-one relation, and the dashed gray lines indicate ±0.15 × (1 + z) catastrophic errors (e.g., Laigle et al. 2016, Section 4.3).
Download figure:
Standard image High-resolution image

The majority of our sample galaxies show good consistency among all available prior redshifts. However, we do find several types of outliers: (1) About 14 X-ray-detected AGNs have higher redshifts in the Salvato et al. (2011) than in the Laigle et al. (2016) catalog (see open squares in Figure 21), but about half (6) of them have spectroscopic redshifts in good agreement with the Salvato et al. (2011) values (see overlap between open squares and yellow circles in Figure 21). (2) About 25 z > 3 galaxies have lower redshifts in Davidzon et al. (2017) than in Laigle et al. (2016), as indicated by the black filled circles in Figure 21, but about half (14) of them have consistent second redshift peaks in Laigle et al. (2016) (see the black filled circles with white cross in Figure 21). (3) About 10 low-quality spectroscopic redshifts (i.e., with two or fewer detected spectral features to determine the respective redshift) disagree with Laigle et al. (2016), yet both could have large uncertainties (see yellow circles outside the area enclosed by dashed lines in Figure 21).

In our next step, we will run SED fitting to obtain galaxies' stellar mass and SFR properties, but with redshift fixed to a prior-z.³² For galaxies with consistent prior-z or a single prior-z from the above catalogs, we directly use it for the SED fitting. But for galaxies with inconsistent prior-z (Δz > 0.15 × (1 + z)) from the above catalogs, we run SED fitting for each inconsistent prior-z and take the one with minimum-χ² at the ALMA bands as our best fit. The details are presented in the next section.

4.4. SED Fitting

We use MAGPHYS (da Cunha et al. 2008, 2015)³³ for the SED fitting, as it has rich stellar SED libraries and has been widely tested on local and high-redshift galaxies (e.g., Smith et al. 2012; Berta et al. 2013; Rowlands et al. 2014a, 2014b; Hayward & Smith 2015; Smith & Hayward 2015; Smolčić et al. 2015; Delvecchio et al. 2017; Miettinen et al. 2017a, 2017b; Hunt et al. 2019). It assumes an energy balance between the energy attenuated by dust in the UV/optical and that radiated by dust at IR/millimeter wavelengths. As it is debated whether this energy balance is still robust for very dusty galaxies (e.g., Casey et al. 2017; Simpson et al. 2017), we provide some supporting evidence for the assumption of energy balance in our whole galaxy sample (see below in this section).

Due to the large number of templates being fitted, MAGPHYS per default fits the SED at a fixed prior-z (which can be either photo-z or spec-z from the literature). A wrong prior-z can easily lead to a poor fit with a large residual at the wavelengths of the ALMA bands, which is measured by the reduced chi-square:

$\begin{eqnarray}&&{\chi }_{\mathrm{ALMA}}^{2}\equiv \displaystyle \sum _{\mathrm{ALMA}}\displaystyle \frac{{\left(| {S}_{\mathrm{SED}}-{S}_{\mathrm{OBS}}| \right)}^{2}}{{\sigma }_{{S}_{\mathrm{OBS}}}^{2}}/{N}_{\mathrm{ALMA}},\end{eqnarray} \tag{ 9 }$

where ${\sigma }_{{S}_{\mathrm{OBS}}}$ is the flux error and N_ALMA is the number of ALMA data points. Therefore, we consider all possible prior-z's for a given galaxy and fit each of them before choosing the fit with the lowest ${\chi }_{\mathrm{ALMA}}^{2}$ as the final best fit.³⁴

The final values of ${\chi }_{\mathrm{ALMA}}^{2}$ are generally well behaved. In Figure 22, we compare the difference between S_SED and S_OBS at all available ALMA bands for each galaxy. The median of S_SED − S_OBS for all total flux ${\rm{S}}/{{\rm{N}}}_{\mathrm{total}}\gt 3$ ALMA photometry is consistent with being zero, suggesting that MAGPHYS fitting has no obvious systematic over- or underestimation of the flux. There are about 25% of data points with ${\rm{S}}/{{\rm{N}}}_{\mathrm{total}}\lt 3$ (but ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ meets our sample selection criterion), which are shown as 3σ upper limits, and 60% of them are consistent with the SED flux (being above the one-to-one line). The histogram of log₁₀(S_OBS/S_SED) in the bottom panel is fitted with a 1D Gaussian with μ = −0.01 and σ = 0.05. Its upper 5σ envelope corresponds to S_OBS/S_SED = 1.77, above which we do find 3% outliers. Most of these "SED-excess" outliers have low total flux ${\rm{S}}/{\rm{N}}$ (i.e., ${\rm{S}}/{{\rm{N}}}_{\mathrm{total}}\lt 4\mbox{--}5$ as indicated by the color-coding in Figure 22).

**Figure 22.** Top panel: comparison of MAGPHYS SED-predicted (S_SED) and observed fluxes (S_OBS; already corrected for flux bias and error based on our simulation in Section 3.1.5) at all available ALMA photometric bands for each galaxy in our sample (Section 4.1; removed spurious sources/outliers in Section 4.2). Color indicates the ${\rm{S}}/{\rm{N}}$ of the measured total ALMA flux. Arrows are the 3σ upper limits for sources with total flux ${\rm{S}}/{\rm{N}}\lt 3$ . The dashed line shows the one-to-one relation, and the dotted line indicates the 5σ threshold derived from the histogram in the bottom panel. Bottom panel: histogram of log₁₀(S_OBS/S_SED) for sources with a total flux of ${\rm{S}}/{\rm{N}}\geqslant 3$ (i.e., excluding upper limits). The dashed curve shows the best-fit 1D Gaussian. The vertical dotted lines indicate the 5σ range. We identify sources outside the 5σ range (i.e., S_OBS/S_SED > 1.77) as "SED-excess" outliers (see text for details).
Download figure:
Standard image High-resolution image

We speculate that the outliers are most likely spurious sources boosted by noise that by chance align with their optical/near-IR counterparts and are thus not removed by our earlier counterpart association step. Since their ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ pass our previous sample selection criterion, they tend to be large in angular size. And this number is actually supported by the statistics: we expect ≲12% (≲140) spurious sources owing to our ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ selection in Section 4.1, which is then reduced by ∼4% by our counterpart association examination in Section 4.2. Meanwhile, we have ∼130,000 master catalog sources within the current data set totaling 946 arcmin² regardless of primary beam areas (∼23,000 within primary beam areas, which sum up to 164 arcmin²); hence, we expect a false-match probability of 3% with a matching radius of 0 farcs 5 (Equation (1) of Pope et al. 2006), i.e., only ∼4 spurious sources to coincide with some prior sources by chance alignment.

However, we note that there is also a chance that there is an unidentified ALMA source at the same line of sight as the foreground prior source, thereby boosting the ALMA flux to much higher than what the SED could fit. These SED-excess outliers are rare but do exist, e.g., the z ∼ 5.7 background ALMA source "CRLE" found by Pavesi et al. (2018), which is not in any optical/near-IR/radio catalog but is at the same line of sight with a foreground z ∼ 0.3 galaxy in the Laigle et al. (2016) catalog.

Similar to the counterpart association flagging, we flag 21 sources as SED-excess outliers. They are indicated by the Flag_outlier_SED column in our final galaxy catalog and will no longer be considered in our further scientific analysis.

Furthermore, in order to verify whether doing a completely blind photometric redshift scan could lead to better fits (smaller χ²) or not, we adopt the recently developed photo-z version of the MAGPHYS code (MAGPHYS+photo-z; A. Battisti et al. 2019, in preparation). It considers redshift as a free parameter between z = 0 and 8 and generates identical libraries to the original version of MAGPHYS for each redshift. The output of this step is a probability distribution function (pdf) of the photometric redshift. We perform this photo-z fitting for all our sources and compare the best-fit redshifts (derived as the median of the pdf) to available spectroscopic redshifts, finding no obvious systematic offset (a 1D Gaussian fitting to the distribution of (z_photo. − z_spec.)/(1 + z_spec.) gives μ = −0.015 and σ = 0.045). The comparison with all prior-z also shows no obvious systematic offset (a 1D Gaussian fitting to the distribution of (z_photo. − z_prior.)/(1 + z_prior.) gives μ = 0.000 and σ = 0.076).

Comparing the physical properties obtained from the two SED fittings for common sources, we find a median difference (scatter) of 0.0 dex (0.05 dex) and 0.0 dex (0.04 dex) for $\mathrm{log}{M}_{* }$ and $\mathrm{log}\,\mathrm{SFR}$ , respectively. However, we do note that the uncertainties in $\mathrm{log}{M}_{* }$ and $\mathrm{log}\,\mathrm{SFR}$ are systematically larger in photo-z SED fitting when the uncertainties in redshift are included. (The histogram of the difference in uncertainty has a median of 0.0 dex but has a second peak at 0.2 dex and extends to 0.4 dex.) Therefore, for $\mathrm{log}{M}_{* }$ and $\mathrm{log}\,\mathrm{SFR}$ in our final galaxy catalog, we take the uncertainties from the photo-z SED fitting, which includes the redshift uncertainty, while keeping the best-fit values still from the best prior-z fit.

In this photo-z experiment, we also tested the photo-z of the SED-excess outliers, finding that for seven of them the photo-z are in the range of z = 2–4, whereas the prior-z's are below z = 1, while the remaining 14 have photo-z and prior-z consistent with z = 0–2. Note that the MAGPHYS photo-z fitting places more weight on the stellar SED when the optical/near-IR bands have more data points than the FIR/millimeter bands. Thus, these SED-excess outliers will still show an excess in their observed ALMA fluxes relative to the SED-predicted flux. Given their unreliable photo-z's, such sources will benefit from a better FIR/millimeter coverage as will be available from future submillimeter/millimeter surveys like JCMT/SCUBA2 S2COSMOS (at 850 μm; PI: I. Smail), STUDIES (at 450 μm; PI: W. Wang), the IRAM 30 m/NIKA2 Cosmology Legacy Survey (N2CLS; at 1 and 2 mm; PI: G. Lagache), and the LMT/TolTEC Ultra-Deep Galaxy Survey (at 1 and 2 mm).

4.5. Correcting Significant Contribution from Emission Lines

In sensitive (sub)millimeter observations like the majority of the ALMA observations in the COSMOS field, strong (sub)millimeter spectral lines like [C ii], [N ii], and high-J CO emission from high-redshift galaxies can strongly bias the dust continuum measurement if they are bright enough and fall in the bandwidth of the spectral setup.³⁵ In special cases, these lines will dominate the emission from the whole bandwidth, e.g., mostly ∼8 GHz of the current ALMA receiver. This is more significant in the lower-frequency 3mm observations and will become more critical in the future with even deeper observations from ALMA and for the large surveys mentioned in the previous section. It is therefore necessary to consider strong submillimeter/millimeter line emission together with the dust continuum in photometry pipelines. When the observation is not intended for line detection, the chance of a strong emission line being in the bandwidth is very low (e.g., ∼1.6%, from the blind [C ii] line search work by Cooke et al. 2018, who found 10 line emitters out of 695 ALMA continuum sources), but when the number of sources becomes large as in this and future works (with automated pipelines), the line emitters must be systematically corrected for.

As our continuum images are obtained by directly collapsing all channels of all SpWs, ignoring whether the PI intended a line detection or not, a strong (sub)millimeter emission line could potentially "contaminate" the measured continuum flux. Therefore, we developed a pipeline to automatically identify such cases and to apply a first rough correction for these lines. Direct blanking of channels affected by line emission before construction of the continuum image would require either a good a priori knowledge of the redshift or dedicated line searches (that are not part of this project), as well as special treatment of each source present in a single pointing. Both aspects result not only in a significant increase in data volume and analysis time required but also in an inhomogeneous data set. Given the small fraction of potentially affected sources of 7% (see below), our adopted approach is sufficient for our purpose.

Our pipeline uses the redshift and SFR (and IR color, e.g., rest-frame ${S}_{70\mu {\rm{m}}}/{S}_{160\mu {\rm{m}}}$ from SEDs, when necessary) to predict for each source the low- to high-J CO (upper level quantum number 2 ≤ J_upper ≤ 10), [C i] ${}^{3}{P}_{2}\to {}^{3}{P}_{1}$ and ${}^{3}{P}_{1}\to {}^{3}{P}_{0}$ (at rest-frame 370 and 609 μm, respectively), [N ii] ${}^{3}{P}_{2}\to {}^{3}{P}_{1}$ and ${}^{3}{P}_{1}\to {}^{3}{P}_{0}$ (at rest-frame 122 and 205 μm, respectively), and [C ii] ${}^{2}{P}_{3/2}\to {}^{2}{P}_{1/2}$ (at rest-frame 158 μm). We do not account for other lines in this work because those are predicted to fall outside the frequency range or are generally much weaker. The line prediction follows empirical luminosity–luminosity correlations: [C ii]–L_IR correlation from De Looze et al. (2011), with a [C ii] deficit roughly proportional to ${L}_{\mathrm{IR}}^{-0.335}$ when L_IR > 10¹⁰ L_⊙, which fits the data best; [N ii]–L_IR correlation from Zhao et al. (2013, 2016); CO (1−0)–L_IR correlation from Sargent et al. (2014); high-J (J_upper ≥ 4) CO–L_IR correlation from Liu et al. (2015); and [C i]–L_IR correlation based on the data sets in Liu et al. (2015) and Valentino et al. (2018). For CO 2 ≤ J_upper ≤ 3 lines, we interpolate the line luminosity using the CO (1−0)–L_IR and CO (4−3)–L_IR correlations.

Meanwhile, we obtain the exact frequency setups for each ALMA observation from the ALMA archive and identify the predicted strong (sub)millimeter lines within the frequency setups. We estimate the line contribution to the measured continuum by dividing the predicted line flux by the total bandwidth and compare that to the measured continuum. Our prediction suggests that ∼50 (∼7%) of sources have (sub)millimeter lines contributing more than 20% to the measured continuum. We looked into their data cubes and found that most of them do have line emission as predicted, as all except four have accurate redshift from the M. Salvato spectroscopic redshift compilation. A strong emission line is predicted but not found to be present for only three sources with spectroscopic redshift (A3COSMOS master catalog IDs 1236908, 350733, and 418763) and two with photometric redshift (IDs 339509 and 1236904). Interestingly, two sources (IDs 990180 and 861198) without spectroscopic redshifts from the M. Salvato compilation do show a line detection, and their spectroscopic redshifts are also reported in the literature (Lee et al. 2017; Cassata et al. 2019). More details of the A3COSMOS line search work will be presented in future papers. Here we have measured those (sub)millimeter lines³⁶ to verify our prediction, and the comparison is presented in Figure 23, where filled symbols are these A³COSMOS sources. Their measured line luminosity (x-axis) and predicted line luminosity (y-axis) show good agreement (the dashed lines indicate a factor of 2 range). The pipeline also predicts <20% line contributions for more sources, but as these lines could not be measured at sufficient ${\rm{S}}/{\rm{N}}$ in the data cube, they are omitted from the figure.

In Figure 23, we added 234 line detections with ${\rm{S}}/{\rm{N}}\gt 3$ for CO, [C i], [C ii], or [N ii] from the literature as follows: Albrecht et al. (2007), Baan et al. (2008), Bauermeister et al. (2013), Bertemes et al. (2018), Capak et al. (2015), Carilli & Walter (2013), Daddi et al. (2015), Lee et al. (2017), Magdis et al. (2017), Magnelli et al. (2012), Pavesi et al. (2018), Saintonge et al. (2017), Silverman et al. (2015a), Spilker et al. (2018), Tacconi et al. (2013), Tan et al. (2014), and Yao et al. (2003). SFRs from these works and also from Sanders et al. (2003) and Brinchmann et al. (2004) are used for our line prediction. The distribution of ${\mathrm{log}}_{10}({L}_{\mathrm{line},\mathrm{observed}}^{{\prime} }/{L}_{\mathrm{line},\mathrm{predicted}}^{{\prime} })$ has a mean of 0.07 and scatter of 0.27. Some disagreement can be found at the lowest end, where line luminosity ${L}_{\mathrm{line},\mathrm{observed}}^{{\prime} }\sim {10}^{8}\,{\rm{K}}\,\mathrm{km}\,{{\rm{s}}}^{-1}\,{\mathrm{pc}}^{2}$ . As our current data do not cover this faint regime, improvement is postponed to a future work.

After the correction for strong (sub)millimeter line "contamination," we reiterate over the SED fitting step. Note that in Figure 22 the data points represent already the final continuum fluxes corrected for line contamination.

4.6. Obtaining Galaxy Properties from SED Fitting

From MAGPHYS SED fitting, we obtain the following galaxy properties: stellar mass ( ${M}_{* }$ ), mass-weighted stellar age, V-band attenuation A_V, star formation history (SFH) integrated SFR_SFH, and total IR luminosity L_IR (integrated over 8–1000 μm). For each property, MAGPHYS gives a minimum-χ² (i.e., best-fit) value, as well as the median and the lower and upper 68th percentiles of the pdf.

Our final SFRs are computed from the IR luminosity with the Kennicutt (1998) calibration and assuming a Chabrier (2003) IMF:

$\begin{eqnarray}&&\displaystyle \frac{{\mathrm{SFR}}_{\mathrm{IR}}}{({M}_{\odot }\,{\mathrm{yr}}^{-1})}=\displaystyle \frac{{L}_{\mathrm{IR},8\mbox{--}1000\mu {\rm{m}}}}{({L}_{\odot })}\times {10}^{-10}.\end{eqnarray} \tag{ 10 }$

By comparing SFR_SFH and SFR_IR, we find that the distribution of ${\mathrm{log}}_{10}({\mathrm{SFR}}_{\mathrm{IR}}/{\mathrm{SFR}}_{\mathrm{SFH}})$ has more pronounced wings than a 1D Gaussian, with a mean of 0.14 and a standard deviation of 0.15. As mentioned in Kennicutt (1998), the calibration of SFR_IR is based on the starburst synthesis models of Leitherer & Heckman (1995) assuming a constant SFH with a young age of 10–100 Myr (in which time the bolometric luminosity-to-SFR ratio is relatively constant), and assuming that dust re-radiates all the bolometric luminosity. The difference between SFR_IR and SFR_SFH could thus come from the actual fitted SFHs, the fraction of bolometric luminosity re-radiated by dust, the variation of bolometric luminosity-to-SFR ratio with stellar population ages, or other additional effects. In the following analysis, we will use SFR_IR (and hereafter SFR) because the simple Kennicutt (1998) calibration is widely used in studies focused on the dusty galaxy population at high redshift and given the fact that our sample is biased toward massive, dusty galaxies at high redshift.

Through a detailed simulation and recovery study, Hayward & Smith (2015) tested the accuracy of MAGPHYS in recovering galaxies' physical properties. They found that for isolated disk galaxies MAGPHYS recovers well the physical properties mentioned above. However, for galaxy mergers, there might be some bias in the determined dust masses (Hayward & Smith 2015 found that MAGPHYS underestimates by 0.1–0.2 dex (and up to 0.6 dex) the dust mass during the post-starburst phase of a galaxy merger). Therefore, we do not provide dust masses in our final catalog and defer this to our Paper II.

Hayward & Smith (2015) also found that for AGN host galaxies, when the AGN does not significantly contribute to the UV–millimeter luminosity (e.g., <25%), the absence of a mid-IR AGN component in MAGPHYS is not significantly affecting the best-fit results. However, stronger mid-IR AGNs can lead to an overestimation of stellar mass and SFR. In our final sample (after removing outliers in Sections 4.2 and 4.4), 34 galaxies are AGN hosts in the Salvato et al. (2011) XMM-Newton catalog and 48 are in the Salvato et al. (2011) Chandra catalog. Meanwhile, 112 are classified as AGNs via SED fitting with an AGN component using SED3FIT (Berta et al. 2013) by Delvecchio et al. (2017). These catalogs have overlaps; thus, the final number of AGNs is 158 (∼23%).

We try to assess the mid-IR AGN problem by running MAGPHYS twice, one time including and the other time excluding the mid-IR 24 μm flux information. Then, we adopt the fit with the smaller χ² as our final best fit. The fitting excluding the 24 μm data usually leads to a better χ². The overall difference from the derived IR luminosity is very small: the distribution of the difference in log₁₀ L_IR between the two SED fitting results has a median of 0.0 dex and σ of 0.17 dex. This distribution is slightly broadened to a σ of 0.25 dex for the AGN subsample, but the median is still close to zero. About 20 sources are 3σ outliers, but for most cases the difference is caused by low- ${\rm{S}}/{\rm{N}}$ data at the FIR/millimeter wavelengths. Only four of them are AGNs according to the Delvecchio et al. (2017) classification.

In Figure 24 we further compare our final ${M}_{* }$ to the Delvecchio et al. (2017) SED3FIT-fitted ${M}_{* }$ for 396 sources in common (with consistent redshifts and coordinates). AGNs are highlighted in red. This demonstrates a good agreement (within 3σ). We find five outliers (labeled with 1-4 if our ${M}_{* }$ is larger and with a if our ${M}_{* }$ is smaller) exceeding the 3σ envelope of the distribution. Their corresponding A³COSMOS master catalog IDs and Delvecchio et al. (2017) IDs are listed in the figure. Through detailed inspection, we find that the difference is mainly caused by including the ALMA data in the SED fitting, which leads to a higher dust attenuation and thus higher stellar mass.

**Figure 24.** Comparison of our final ${M}_{* }$ from the multirun MAGPHYS SED fitting and those of Delvecchio et al. (2017), who used SED3FIT to account for the mid-IR AGN component. AGNs classified by Delvecchio et al. (2017) are highlighted in red. Four sources have highly overestimated ${M}_{* }$ by our method (labeled `1`, `2`, `3`, and `4`), while one source exhibits a significantly underestimated ${M}_{* }$ (labeled `a`). Their corresponding A³COSMOS master catalog IDs (and Delvecchio et al. 2017 `ID`_`VLA3` in parentheses) are listed. They are discussed in Section 4.6. The dashed line is the one-to-one relation, and the dotted lines indicate the ±3σ range (with σ being the standard deviation).
Download figure:
Standard image High-resolution image

In addition, the source shown with the highest stellar mass of ∼10¹² ${M}_{\odot }$ (ID 223951) in Figure 24 is the strong AGN XID2028 at z = 1.593 studied by Brusa et al. (2015, 2018), Cresci et al. (2015), and Perna et al. (2015). Brusa et al. (2018) estimated a stellar mass of ${\mathrm{log}}_{10}({M}_{* }/{M}_{\odot })={11.65}_{-0.35}^{+0.35}$ via optical to millimeter SED fitting including an AGN component. For comparison, we obtain ${\mathrm{log}}_{10}({M}_{* }/{M}_{\odot })=12.28\pm 0.07$ , almost consistent with their upper boundary. Interestingly, the reduced-χ² at the stellar wavelengths of our MAGPHYS SED fitting is as poor as for the outliers 2 and 3 with a rchi2_star ∼ 6.8 (top ∼10% of the worst fits). Delvecchio et al. (2017), accounting for mid-IR AGN contamination, obtain ${\mathrm{log}}_{10}({M}_{* }/{M}_{\odot })=12.12$ (with an uncertainty of the order of 0.1 dex; see their Section 6.1). This indicates that our estimate is still acceptable for such an extreme case (although they should be treated with caution in individual studies).

To summarize our detailed comparison of the robustness of the derived parameters for AGNs, we find the following:

(1)
Our current multirun, iterative MAGPHYS SED fitting, although without an AGN component, achieves in general good agreement with SED fitting that includes an AGN component. The agreement is valid even for the AGN population identified in Delvecchio et al. (2017) and is within the uncertainties even for the most extreme AGNs, e.g., reported in Brusa et al. (2018).
(2)
For very few (5 out of 396) sources our stellar masses lie outside the 3σ range when comparing to the Delvecchio et al. (2017) stellar masses. Three of them exhibit strong mid-IR AGN emission contaminating near-IR IRAC and even optical bands. Thus, their stellar SEDs are poorly fitted, with rchi2_star ≳ 10. These extreme outliers are further discussed in Appendix E. Their stellar mass estimates in this work should be treated with caution when used in individual studies.
(3)
The inclusion of ALMA (and FIR/(sub)millimeter) data points is crucial for codes like MAGPHYS, which assumes energy balance. If the energy balance is valid for these dusty, ALMA-detected sources studied here, our stellar masses and IR luminosities are more reliable than optical-only estimates.

4.7. Final Galaxy Catalog and Properties

Our final "robust galaxy catalog" contains 547 galaxies with reliable stellar mass and SFR properties, from a parent sample of 823 galaxies with at least one ALMA detection in 1534 ALMA archive images (from 142 ALMA projects) available for the COSMOS field (version v20180201). In this catalog, 56% of the galaxies have a primary beam correction factor of <1.01 (corresponding to a 2'' offset from the phase center at 230 GHz), i.e., they are the primary targets of the PI-led observations. We caution that due to the selection functions of the PI-led ALMA observations in the archive, our sample is not complete in any quantity, e.g., cosmic comoving volume, stellar mass, and SFR. This bias exists even for sources away from the phase center because galaxies suffer from clustering effects, and also the input coordinates for single-dish-selected submillimeter galaxies are uncertain (possibly resulting in a few-arcsecond offset from the phase center). Bearing these limitations in mind, we show the redshift, stellar mass, and SFR properties of our galaxy catalog in this section and compare with known galaxy correlations and population properties in the literature.

In Figure 25 we show the distributions of their SFRs and specific SFRs (hereafter sSFR, which is defined as $\mathrm{sSFR}\,\equiv \mathrm{SFR}/{M}_{* }$ , in units of Gyr⁻¹) versus redshift. Our sample spans a large range of SFR from ∼1 to ∼2000, but the main portion of the sample is SFR limited at z > 1 with SFR ∼ 100–1000.

**Figure 25.** Distribution of SFR (left) and sSFR (right) vs.redshift. Empirical evolution curves of the star-forming MS at different stellar masses ( ${\mathrm{log}}_{10}({M}_{* }/{M}_{\odot })=9.5$ , 10.5, and 11.5, respectively) are shown as blue, green, and red dashed lines, which are computed as a function of redshift and ${M}_{* }$ following Speagle et al. (2014) (the #49 model in their Table 7). The color bar indicates ${\mathrm{log}}_{10}{M}_{* }$ in both panels.
Download figure:
Standard image High-resolution image

The low number of z < 1 galaxies is mainly due to the selection function, the quick drop of flux density at the Rayleigh–Jeans tail in a galaxy's redshifted SED, and the rapid decline of the cosmic SFR density at z < 2 (e.g., Madau & Dickinson 2014; Liu et al. 2018). Furthermore, given the smaller volume sampled at low redshift, lower source density and cosmic variance could play a role as well. Therefore, at z < 1, our sample is very different from FIR-selected samples (e.g., see a Herschel sample in Liu et al. 2018, Figure 23; see also Béthermin et al. 2015, to name a few). Several z < 1 galaxies in our sample are strongly biased toward less massive systems (e.g., ${\mathrm{log}}_{10}{M}_{* }/{M}_{\odot }\lt 10.5$ ) but also relatively low SFR, and they all have low ALMA ${\rm{S}}/\mathrm{Ns}$ (total flux ${\rm{S}}/{\rm{N}}\sim 4\mbox{--}5$ ). Although they passed our rigorous spurious source examinations, they could statistically still be spurious. Should they be real, they are of interest in their own right. However, given these uncertainties, we recommend treating these galaxies with caution, especially for individual studies.

In Figure 26, we show for our A³COSMOS galaxies the distribution of their stellar masses and their SFR offsets to the SFR_MS expected for star-forming MS galaxies ( ${\rm{\Delta }}\mathrm{MS}\,\equiv {\mathrm{log}}_{10}(\mathrm{SFR}/{\mathrm{SFR}}_{\mathrm{MS}})$ ), where the MS SFR_MS is defined as a function of redshift and ${M}_{* }$ and is empirically measured by a number of works from z ∼ 0 (e.g., Brinchmann et al. 2004; Chang et al. 2015) to ∼4 (e.g., Sargent et al. 2014; Speagle et al. 2014; Béthermin et al. 2015; Schreiber et al. 2015; Pearson et al. 2018). Here we adopt the Speagle et al. (2014) MS (the #49 model in their Table 7).

**Figure 26.** Stellar mass ${M}_{* }$ (left) and MS offset ${\rm{\Delta }}\mathrm{MS}\equiv {\mathrm{log}}_{10}(\mathrm{SFR}/{\mathrm{SFR}}_{\mathrm{MS}})$ (right) vs. redshift, where the MS SFR SFR_MS is computed as a function of redshift and ${M}_{* }$ for each source following Speagle et al. (2014) (the #49 model in their Table 7). The color bars indicate ${\rm{\Delta }}\mathrm{MS}$ and ${\mathrm{log}}_{10}{M}_{* }$ in the left and right panels, respectively. The dashed blue line represents ΔMS = 0 in the right panel.
Download figure:
Standard image High-resolution image

The majority of our sample lies on the MS (i.e., their sSFRs are within a factor of 4, or equivalently ±0.6 dex, of the sSFR_MS; Rodighiero et al. 2011). However, less massive galaxies tend to be above the MS. This strong anticorrelation between ${M}_{* }$ and ${\rm{\Delta }}\mathrm{MS}$ is primarily an effect of detection limit/sample selection and is more evident in Figure 27, where the ${M}_{* }$ versus SFR are plotted for A³COSMOS galaxies in nine redshift bins ranging from z = 0.35 to 5.5, overlaid with four empirical MS parameterizations from Speagle et al. (2014), Sargent et al. (2014), Schreiber et al. (2015), and Béthermin et al. (2015). The differences between these MSs are relatively small (but see S. Leslie et al. 2019 for a detailed comparison).

**Figure 27.** SFR vs. ${M}_{* }$ , i.e., the star-forming MS diagram, in nine redshift bins from z ∼ 0.35 to 5.5. Galaxies on the MS and starbursts whose SFR is 0.6 dex above the empirical MS of Speagle et al. (2014) are shown as blue and red crosses, respectively. Several empirical MSs from Speagle et al. (2014), Sargent et al. (2014), Schreiber et al. (2015), and Béthermin et al. (2015) are shown for reference (see bottom right panel for information on the color-coding).
Download figure:
Standard image High-resolution image

**Figure 27.** SFR vs. ${M}_{* }$ , i.e., the star-forming MS diagram, in nine redshift bins from z ∼ 0.35 to 5.5. Galaxies on the MS and starbursts whose SFR is 0.6 dex above the empirical MS of Speagle et al. (2014) are shown as blue and red crosses, respectively. Several empirical MSs from Speagle et al. (2014), Sargent et al. (2014), Schreiber et al. (2015), and Béthermin et al. (2015) are shown for reference (see bottom right panel for information on the color-coding).
Download figure:
Standard image High-resolution image

The fraction of sources classified as starbursts indicates that our ALMA catalog is biased toward starbursts. It is roughly constant at ∼20% in our catalog at each redshift in Figure 27. But this is a factor of 2–5 higher than that from a Herschel-selected sample, e.g., Liu et al. (2018), and much higher than that from a mass-complete sample. For example, Rodighiero et al. (2011) find a starburst fraction of 2%–3% for a sample complete down to ${M}_{* }={10}^{10}\,{M}_{\odot }$ at 1.5 < z < 2.5, and Schreiber et al. (2015) report 2%–4% for a sample complete down to ${M}_{* }=2\times {10}^{10}\,{M}_{\odot }$ and constant up to z = 4. Figure 27 further shows that it is mainly the less massive range within which our catalog is dominated by starbursts (e.g., ${M}_{* }\lt {10}^{10.5}\,{M}_{\odot }$ ).

Finally, in Figure 28 we compare the ${M}_{* }$ histogram of our sample to the SMFs of star-forming galaxies (Ilbert et al. 2013; Muzzin et al. 2013; Grazian et al. 2015; Song et al. 2016; Davidzon et al. 2017), corresponding to the area of the full 2 deg² COSMOS field. The completeness of our sample to the full star-forming galaxy population, which can be considered as the fraction of the ${M}_{* }$ histogram to the SMFs, strongly depends on redshift and stellar mass. Although the area covered by all the ALMA archival pointings is only about 164 arcmin² (or only 4.2% of the full 2 deg² area of COSMOS), our sample at z ∼ 1–3 probes a significant fraction (∼10%–100% depending on the used SMF and redshift bin) of all very massive ( ${\mathrm{log}}_{10}{M}_{* }\gt 11.5$ ) star-forming galaxies present in the full 2 deg² area.

**Figure 28.** Stellar mass histograms of our A³COSMOS sources compared to expected stellar mass distributions in nine redshift bins from z ∼ 0.35 to 5.5. The A³COSMOS sources are selected within a 164 arcmin² area covered by 1534 discrete ALMA pointings. A³COSMOS star-forming MS galaxies and starbursts as distinguished in Figure 27 are shown by blue and red histograms, respectively. The colored lines (with line shading) are the stellar mass distributions of all star-forming galaxies within the COSMOS 2 deg² area for each redshift bin, computed with empirical SMFs from Ilbert et al. (2013; magenta), Muzzin et al. (2013; yellow), Davidzon et al. (2017; dark green), Grazian et al. (2015; cyan), and Song et al. (2016; light green). These SMFs are derived for star-forming galaxies; they have been expressed for a Chabrier IMF when necessary and convolved with typical stellar mass uncertainties ( ${\sigma }_{{M}_{* }}$ ) as indicated by the labels in each panel, following Appendix A of Ilbert et al. (2013; except for Song et al. 2016, which is the directly observed SMF). The SMFs of Ilbert et al. (2013) and Muzzin et al. (2013) are measured up to z ∼ 4, while the SMFs of Grazian et al. (2015) and Song et al. (2016) are measured at z ∼ 4–6. The Davidzon et al. (2017) SMF probes z ∼ 0.2–5.5 but does not fully cover the highest-redshift bin, so we applied a linear extrapolation.
Download figure:
Standard image High-resolution image

Due to the large variety of ALMA programs contributing to out data set, we find no obvious differences between sources at the phase center and in the outer area, even out to a PBA of 0.2. The sample selection bias is dominated by the range of sensitivities of the ALMA data rather than the PIs' targeted sources.

4.8. Properties of the Source Not Included in the Final Robust Galaxy Catalog

As listed in Table 3, a significant number (∼26%) of ALMA detections are not included in our final galaxy catalog (see Table 3, footnote b). Half of them come from the prior photometry catalog, with most having only IRAC 3.6 and 4.5 μm and/or VLA 3 GHz priors without optical/near-IR (up to K_s-band) counterparts (hence they do not have a photo-z as their prior-z). These "K_s-dropouts" are potential very dusty z ∼ 3–4 galaxies or less dusty sources at even higher redshifts, i.e., similar to the sample of Wang et al. (2016) and the HST-dark sample of Franco et al. (2018). This is in particular true for the sources with significant detections well above our threshold. The remaining half comes from the blind photometry catalog and has typically low significance, implying that they could be spurious, as the differential spurious fraction strongly depends on the actual ${\rm{S}}/{\rm{N}}$ (see Figure 8).

As we do not have high spatial resolution optical/near-IR imaging or accurate photometric redshifts for these sources, it is not possible to do similar counterpart association or SED fitting quality assessments to better identify spurious ones. If we assume that the fraction of spurious source is the same (about 10% based on our quality assessment) for the sources we have done the quality assessment (74% of the total ALMA detections) and for those unable to perform a quality assessment (26% of the total ALMA detections), then we also expect a small number of spurious sources from the latter sources. Adding the two together gives a total spurious fraction of ∼10.1%, which is in good agreement with the statistics (∼8%–12%).

Further discussion of these interesting sources is not the focus of this work. Future deeper optical/near-IR (up to K_s-band) observations, e.g., the new data release of the UltraVISTA survey, will enable an analysis similar to the one done here, so that they could be included in the robust galaxy catalog in the future.

4.9. The Effect of Galaxy–Galaxy Gravitational Lensing

The galaxy–galaxy gravitational lensing has been found to be significant in several ALMA follow-up studies of brightest submillimeter galaxies over large areas, e.g., Negrello et al. (2010), Bussmann et al. (2013, 2015), and Spilker et al. (2016). The strong-lensing cases (magnification μ > 2) therein exhibit the following common features: (1) very bright observed submillimeter flux, e.g., ${S}_{870\mu {\rm{m}}}\gtrsim 15\,\mathrm{mJy}$ for all the μ > 2 galaxies in Bussmann et al. (2013, 2015) and Spilker et al. (2016) (although we note that lensing is not just limited to the very brightest submillimeter objects but happens at all flux levels; see also below); (2) bright optical emission within 1''–2'', which belongs to a low-redshift (usually z < 1) massive galaxy; (3) usually two or more submillimeter components at each side of the optical emission or roughly distributed as an Einstein ring with ∼2'' size.

We estimate the number of strongly lensed (μ > 2) cases among our submillimeter galaxies to be very low as follows.

First, given the flux distribution of our photometry catalogs, we only find 0.2% of sources with equivalent ${S}_{870\mu {\rm{m}}}\gtrsim 15\,\mathrm{mJy}$ , i.e., four sources in current data set (v20180102). Three of them have only very weak or no optical emission in their 1''–2'' vicinity, while the fourth one, ID 180903, has a low-redshift (z = 0.347) optically bright galaxy within 0 farcs 5 and has already been studied in detail by Pavesi et al. (2018). ID 180903 does not exhibit multiple images as expected for strong lenses, fully consistent with its magnification factor of only 1.09 (Pavesi et al. 2018).

Second, considering the second feature of a close distance to a low-z galaxy, our prior source fitting and SED-excess assessment can test for this: if the ALMA flux coming from our prior catalog is originating from a lensed higher-redshift galaxy, the SED fitting with a much lower redshift as the prior-z will not be able to fit the ALMA data and therefore be classified as SED-excess outliers (Section 4.4). Among the 21 SED-excess outliers listed in Table 3, we searched for multiple submillimeter images or distorted features but found no obvious lensed candidates, except for one case, ID 650923, where there are three optical components (z = 0.568 in Laigle et al. 2016) surrounding the eastern and southern sides of the ALMA emission at a distance of ∼1'' (although the ALMA data have a beam of 1 farcs 5 × 1 farcs 0).

Third, there are no multiple submillimeter sources within 1''–2'' or sources being part of an Einstein ring. This is based on visual identification. In addition, this is confirmed through the comparison between prior photometry and blind extraction photometry, which can in principle identify sources with irregular multicomponent morphology.

Lastly, our low number of strongly lensed sources is consistent with the analytic galaxy modeling of Béthermin et al. (2017; see also Béthermin et al. 2012b). In their modeling, 1.5 million galaxies are simulated from redshift 0 to 10 within a light cone of 2 deg², the same area as the COSMOS field. Their modeled galaxies follow the clustering effect matched to dark matter halos, and strong- and weak-lensing effects are modeled following Hezaveh & Holder (2011) and Hilbert et al. (2007), respectively. According to our galaxy sample properties, we down-selected 3176 of their galaxies with the criteria 1 < z < 6, ${M}_{* }\gt 2\,\times {10}^{10}\,{M}_{\odot }$ , and $\mathrm{SFR}\gt 200\,{M}_{\odot }\,{\mathrm{yr}}^{-1}$ over the full 2 deg². Among this subsample, only 16 have μ > 2. Scaling to our galaxy catalog source number of 823, only three strongly (μ > 2) lensed sources are expected. Note that as discussed in Hezaveh & Holder (2011), there remains significant uncertainty in the estimation of the probability of lensed sources, e.g., the assumed mass model for the lensing halos, the ellipticity of lenses, etc.

Therefore, we conclude that strong lensing is not affecting the properties of most of our galaxies.

5. Data Products

As the result of this work, we produce three public catalogs: two photometric ones (blind extraction and prior fitting) and one galaxy catalog (with SED-derived properties). We describe the columns in the first two catalogs in Table 4 and those in the third catalog in Table 5.

Table 4. Columns in the Two Photometry Catalogs

Column Name	Units	Description
`ID`^a	⋯	A3COSMOS master catalog ID (version 20170426), which equals Laigle et al. (2016) COSMOS2015 catalog ID when ID ≤ 1182108
`ID`_`PriorCat`^a	⋯	The original ID in the `Ref`_`ID`_`PriorCat`-th prior catalog
`Ref`_`ID`_`PriorCat`^a	⋯	The reference number of the prior catalog in which the source is first included (see Table 2)
`RA`	deg	The fitted R.A. coordinate of the ALMA emission with Gaussian source models, in equatorial coordinate in the epoch of J2000
`Dec`	deg	Same as above but the decl. coordinate, in equatorial coordinate in the epoch of J2000
`Total`_`flux`_`pbcor`	mJy	The fitted total flux with Gaussian source models, corrected for flux bias and PBA
`E`_`Total`_`flux`_`pbcor`	mJy	Error in `Total`_`flux`_`pbcor`, provided by photometry pipelines based on Condon (1997) simulation statistics and equations
`E`_`Total`_`flux`_`sim`_`pbcor`	mJy	Error in `Total`_`flux`_`pbcor`, but estimated from our own simulation statistics
`Pbcor`	⋯	PBA factor
`Primary`_`beam`	arcsec	ALMA 12 m antenna's primary beam FWHM size at the observing frequency
`Peak`_`flux`	mJy beam^–1	Fitted ALMA continuum emission's peak flux, uncorrected for PBA
`rms`_`noise`	mJy beam^–1	Pixel rms noise in the continuum image
`Obs`_`frequency`	GHz	Observing frequency, i.e., the center frequency of all collapsed SpWs
`Obs`_`wavelength`	μm	Observing wavelength, =(2.99792458 × 10⁵)/`Obs`_`frequency`
`Maj`_`beam`	arcsec	Synthesized beam's major-axis FWHM size
`Min`_`beam`	arcsec	Synthesized beam's minor axis FWHM size
`PA`_`beam`	deg	Synthesized beam's position angle; zero means to the north
`Image`_`file`	⋯	Image file name
`Flag`_`multi`^b	⋯	`Flag` = `S` (or `M`) means that the source is fitted with single (or multiple) Gaussian component(s)
`Galfit`_`reduced`_`chi`_`square`^a	⋯	The reduced χ² of galfit prior source fitting, measured from the residual image for each source with an aperture of 10 in diameter
`Flag`_`size`_`upper`_`boundary`^a	⋯	`Flag` = 1 means that the fitted source major-axis FWHM size reaches the upper boundary of 30 and should be used with caution
`Flag`_`inconsistent`_`flux`	⋯	`Flag` = 1 means that the source has >5σ inconsistent total fluxes from our prior and blind photometry

Notes.

^aOnly in prior source fitting catalog. ^bOnly in blind source extraction catalog. (This table is available in its entirety in FITS format.)

Download table as: ASCII Typeset image

Table 5. Columns in the Final Galaxy Property Catalog

Column Name	Units	Description
`ID`	⋯	A3COSMOS master catalog ID (version 20170426), which equals Laigle et al. (2016) COSMOS2015 catalog ID when ID ≤ 1182108
`RA`	deg	Fitted ALMA continuum emission's R.A. with Gaussian source models, in the equatorial coordinate in the epoch of J2000
`Dec`	deg	Same as above but for decl., in the equatorial coordinate in the epoch of J2000
`z`	⋯	SED best-fit redshift from the list of prior redshifts in `z`_`prior`
`z`_`prior`	⋯	Prior redshifts (prior-z) in the literature; multiple values are separated by white spaces
`Ref`_`z`_`prior`	⋯	References of `z`_`prior`^a
`Flag`_`outlier`_`CPA`	⋯	`Flag` = 1 means that the source is flagged as an outlier in our counterpart association analysis (Section 4.2)
`Flag`_`outlier`_`SED`	⋯	`Flag` = 1 means that the source is flagged as an outlier in our SED fitting analysis (Section 4.4)
`M`_`star`	M_⊙	Stellar mass from our SED fitting at redshift `z`; assumed Chabrier (2003) IMF
`L`_`dust`	L_⊙	Infrared 8–1000 μm luminosity from dust from the same SED fitting as above
`SFR`	M_⊙ yr⁻¹	SFR integrated from SFH from the same SED fitting as above; same IMF as above
`sSFR`	Gyr⁻¹	sSFR from SFH, =`SFR`/`M`_`star` × 10⁹

Note.

^aA3COSMOS_specz means that the source has the spectroscopic redshift (spec-z) confirmed in our A3COSMOS data cube analysis with at least one ${\rm{S}}/{\rm{N}}\gt 6$ ${\rm{S}}/{\rm{N}}\gt 6$ spectral line (Section 4.5; D. Liu et al. 2019, in preparation). Salvato2017_specz means that the source has spec-z in the COSMOS spec-z catalog compiled by M. Salvato et al. (available in the COSMOS collaboration; version 07SEP2017 with 103,964 rows). Salvato2011_Chandra_photoz means that the source has photometric redshift (photo-z) (optimized for AGNs) in the Salvato et al. (2011) Chandra source catalog and the photo-z is inconsistent with any previous redshift (by >0.15 × (1 + z) difference, same condition afterward). Salvato2011_XMM_photoz means that the source has photo-z (optimized for AGNs) in the Salvato et al. (2011) XMM-Newton source catalog and the photo-z is inconsistent with any previous redshift. Laigle2016_photoz means that the source has photo-z in the COSMOS2015 catalog provided by Laigle et al. (2016) and the photo-z is inconsistent with any previous redshift. Davidzon2017_photoz means that the source has photo-z (optimized for z > 2.5 sources) in the Davidzon et al. (2017) catalog and the photo-z is inconsistent with any previous redshift. Delvecchio2017_photoz means that the source has photo-z (considered mid-IR AGN component) in the Delvecchio et al. (2017) catalog and the photo-z is inconsistent with any previous redshift. Jin2018_photoz means that the source has photo-z (with FIR/millimeter photometry) in the Jin et al. (2018) catalog and the photo-z is inconsistent with any previous redshift. (This table is available in its entirety in FITS format.)

Download table as: ASCII Typeset image

The two photometric catalogs have most columns in common, except that the prior photometry catalog has information on the prior source (ID, ID_PriorCat, and Ref_ID_PriorCat columns), and some Flag_* columns differ.

The ID column lists the IDs in our A3COSMOS master catalog, which is a combination of six-plus prior catalogs after solving source cross-matching (Section 2.3). The ID equals the COSMOS2015 (Laigle et al. 2016) catalog ID when ID ≤1,182,108. The ID_PriorCat column lists the original IDs in those prior catalogs, so that users of our prior photometric catalog can trace back into the prior catalogs. The Ref_ID_PriorCat column lists the reference rank number of the prior catalog in Table 2, e.g., the COSMOS2015 (Laigle et al. 2016) catalog has Ref_ID_PriorCat = 1, the Smolčić et al. (2017) catalog has Ref_ID_PriorCat = 5, etc. Note that this reference number indicates in which catalog the source is first included, i.e., has no counterpart in all previous catalogs with smaller Ref_ID_PriorCat. Thus, our catalog does not contain the information of whether a source with Ref_ID_PriorCat = 1 has a counterpart in Ref_ID_PriorCat > 1 catalogs (but this information is in our master catalog upon request). Also note that our master catalog will be updated in the future with more deeper prior catalogs; thus, we caution that the source ID will be different when a future updated master catalog is used.

The Flag_∗ columns carry important information for quality assessment and should be taken into account when using the catalog for specific science applications. For the blind photometry catalog, Flag_multi indicates whether the source is fitted with multiple Gaussian components or a single-Gaussian model by PyBDSF. Flag_inconsistent_flux indicates whether the source has inconsistent fluxes between the prior fitting and blind extraction photometry catalogs (see Section 2.5 and Figure 7). When Flag_multi = 1 and Flag_inconsistent_flux =1, it is likely that the source is a merger system or has a close companion in the ALMA image, as shown in Appendix B. For the prior photometry catalog, Galfit_reduced_chi_square indicates the quality of the final galfit source fitting. Flag_size_upper_boundary indicates whether the fitted source size reaches the upper boundary of 3 farcs 0 we set in the photometry, in which case the source is either blended or dominated by noise and should be used with caution.

The final galaxy property catalog also has two important flags: Flag_outlier_CPA, which indicates the outliers from our counterpart association examination (Section 4.2), and Flag_outlier_SED, which indicates the outliers with SED excess from our SED fitting (Section 4.4). We recommend to only use galaxies with both Flag = 0 for scientific analysis.

The catalogs are available from the COSMOS archive at IPAC/IRSA and in electronic form from the journal. Further, the ALMA continuum images are also provided via the COSMOS archive.

6. Summary

The growing information in the ALMA archive is ideal for systematic exploitations of specific scientific questions, such as the number and properties of high-redshift galaxies detected in their (sub)millimeter continuum emission in selected cosmological deep fields. Given the large number of observations already available in the archive—e.g., for the COSMOS field 1534 pointings covering an area of 164 arcmin² have been publicly available since 2018 January 2—we have developed a highly automatic approach toward mining these data (A³COSMOS—Automated ALMA Archive mining in the COSMOS field). Here we summarize our workflows (Figures 1 and 18) implemented to obtain quality controlled (sub)millimeter source catalogs based on two different identification approaches, as well as a catalog of galaxies with (sub)millimeter detections and reliable properties.

We present two (sub)millimeter continuum source catalogs from public ALMA archival data. For the source identification, the calibrated archival data were homogenously imaged to provide a single continuum image with best sensitivity (i.e., using all available bandwidth and natural weighting). The first catalog is based on a blind extraction using PyBDSF on the continuum images, and the second catalog used prior positions from a master catalog that combines sources detected in the optical, IR, and radio. Extensive simulations using two mock samples with highly different distributions in (sub)millimeter source properties provide robust information on the completeness limits, spurious source fraction, and flux-boosting factors, as well as uncertainties on the measured quantities in both catalogs. In particular, we used these simulations to refine the widely used Condon (1997) prescription for error estimation of radio continuum sources. After further quality control steps, the final catalogs (version v20180201) contain 939 sources above a peak flux ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}=5.40$ for the blindly detected sources (with a cumulative spurious fraction of ∼8%) and 1027 sources above a peak flux ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}=4.35$ for the prior-selected sources (with a cumulative spurious fraction of ∼12%).

We combine the two (sub)millimeter continuum source catalogs to remove inconsistent-flux outliers and use the prior catalog to produce a single sample of high-redshift galaxies with robust (sub)millimeter detections by ALMA (25% having more than one ALMA photometric measurement usually at different wavelengths) and mostly homogeneously determined galaxy properties (stellar mass, SFR). The construction included the development of a sophisticated method to automatically qualify counterpart associations with the (sub)millimeter continuum sources taking into account astrometric uncertainties (both absolute and relative), as well as complex, differing source structure across wavelength. Further steps were applied to remove spurious (sub)millimeter continuum detections and/or sources with highly uncertain redshift information based on SED fitting results. The final galaxy catalog (version 20180201) contains 547 galaxies in the range of z = 0.25–5.67, ${M}_{* }=3\times {10}^{7}\mbox{--}1.9\times {10}^{12}\,{M}_{\odot }$ , and $\mathrm{SFR}=0.02\mbox{--}4000\,{M}_{\odot }\,{\mathrm{yr}}^{-1}$ . (Despite the vast number of star-forming galaxies presented in this work, we caution that this catalog is not complete in cosmic comoving volume, stellar mass, or SFR.)

The latest versions of our catalogs are available from the COSMOS archive at IPAC/IRSA³⁷ and in electronic form.

D.L., P.L., and E.S. acknowledge support and funding from the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation program (grant agreement No. 694343). S.L. acknowledges funding from Deutsche Forschungsgemeinschaft (DFG) grant SCH 536/9-1. B.G. acknowledges the support of the Australian Research Council as the recipient of a Future Fellowship (FT140101202). Part of this research was carried out within the Collaborative Research Centre 956, subproject A1, funded by the Deutsche Forschungsgemeinschaft (DFG)—project ID 184018867. We thank Annalisa Pillepich and the Max Planck Computing & Data Facility for very helpful computing cluster resources.

This paper makes use of the following ALMA data: ADS/JAO.ALMA#2011.0.00064.S, ADS/JAO.ALMA#2011.0.00097.S, ADS/JAO.ALMA#2011.0.00539.S, ADS/JAO.ALMA#2011.0.00742.S, ADS/JAO.ALMA#2012.1.00076.S, ADS/JAO.ALMA#2012.1.00323.S, ADS/JAO.ALMA#2012.1.00523.S, ADS/JAO.ALMA#2012.1.00536.S, ADS/JAO.ALMA#2012.1.00919.S, ADS/JAO.ALMA#2012.1.00952.S, ADS/JAO.ALMA#2012.1.00978.S, ADS/JAO.ALMA#2013.1.00034.S, ADS/JAO.ALMA#2013.1.00092.S, ADS/JAO.ALMA#2013.1.00118.S, ADS/JAO.ALMA#2013.1.00151.S, ADS/JAO.ALMA#2013.1.00171.S, ADS/JAO.ALMA#2013.1.00208.S, ADS/JAO.ALMA#2013.1.00276.S, ADS/JAO.ALMA#2013.1.00668.S, ADS/JAO.ALMA#2013.1.00815.S, ADS/JAO.ALMA#2013.1.00884.S, ADS/JAO.ALMA#2013.1.00914.S, ADS/JAO.ALMA#2013.1.01258.S, ADS/JAO.ALMA#2013.1.01292.S, ADS/JAO.ALMA#2015.1.00026.S, ADS/JAO.ALMA#2015.1.00055.S, ADS/JAO.ALMA#2015.1.00122.S, ADS/JAO.ALMA#2015.1.00137.S, ADS/JAO.ALMA#2015.1.00260.S, ADS/JAO.ALMA#2015.1.00299.S, ADS/JAO.ALMA#2015.1.00379.S, ADS/JAO.ALMA#2015.1.00388.S, ADS/JAO.ALMA#2015.1.00540.S, ADS/JAO.ALMA#2015.1.00568.S, ADS/JAO.ALMA#2015.1.00664.S, ADS/JAO.ALMA#2015.1.00704.S, ADS/JAO.ALMA#2015.1.00853.S, ADS/JAO.ALMA#2015.1.00861.S, ADS/JAO.ALMA#2015.1.00862.S, ADS/JAO.ALMA#2015.1.00928.S, ADS/JAO.ALMA#2015.1.01074.S, ADS/JAO.ALMA#2015.1.01105.S, ADS/JAO.ALMA#2015.1.01111.S, ADS/JAO.ALMA#2015.1.01171.S, ADS/JAO.ALMA#2015.1.01212.S, ADS/JAO.ALMA#2015.1.01495.S, ADS/JAO.ALMA#2015.1.01590.S, ADS/JAO.ALMA#2015.A.00026.S, ADS/JAO.ALMA#2016.1.00478.S, ADS/JAO.ALMA#2016.1.00624.S, ADS/JAO.ALMA#2016.1.00735.S. ALMA is a partnership of ESO (representing its member states), NSF (USA), and NINS (Japan), together with NRC (Canada), MOST and ASIAA (Taiwan), and KASI (Republic of Korea), in cooperation with the Republic of Chile. The Joint ALMA Observatory is operated by ESO, AUI/NRAO, and NAOJ.

Facility: ALMA. -

Appendix A: Astrometry Accuracy between Prior Catalogs

Variations in the absolute astrometric calibration between different catalogs can cause small but noticeable offsets between source positions at different wavelengths. As we use prior positions from sources selected from catalogs covering the optical to radio regime, it is important to verify that potential offsets are small. Here we report astrometric offsets between the prior catalogs used from the literature (Section 2.3) and our ALMA prior fitting photometry catalog. These astrometric offsets between the prior positions and the fitted ALMA positions are small (<0 farcs 1).

In Figure 29, we plot the offsets in R.A. and decl. for sources common in two catalogs using the UltraVISTA/COSMOS2015 catalog (Laigle et al. 2016), the VLA-COSMOS 3 GHz catalog (Smolčić et al. 2017), the HST/ACS i band (Capak et al. 2007), and the fitted positions of the (sub)millimeter sources in our prior-based ALMA catalog (see Section 2.4). First, we confirm that the ALMA astrometry is indeed excellent (as expected for a (sub)millimeter interferometer at the angular resolutions and frequencies of our observations) by comparison to the positions of 699 VLA-COSMOS 3 GHz sources (top right panel). Comparison between the UltraVISTA and our ALMA positions for 827 sources (top left panel) yields a relatively large offset of +0 farcs 10 in R.A., but the offset in decl. is very small (+0 farcs 01). We confirm this astrometric offset of the UltraVISTA catalog by examining positions for 9373 sources in common with the VLA-COSMOS 3 GHz catalog (bottom left panel). Given the order-of-magnitude-larger number of sources, the offset of +0 farcs 088 in R.A. is statistically meaningful and consistent with the offset seen between UltraVISTA and ALMA source positions. Finally, comparison between positions of 7369 sources in common in the HST/ACS i-band and VLA catalogs (bottom right panel) yields a significantly lower offset in R.A. but a more substantial offset in decl.

Since we allow the source position to vary by a relatively large amount (∼0 farcs 7; see Section 2.4) during the prior-based detection of the (sub)millimeter continuum source, it is not necessary to repeat the initial detection step. However, we have applied a correction to take the small offsets into account during our counterpart association process (Section 4.2). We note that Smolčić et al. (2017) report astrometric offsets of similar size between VLA-COSMOS 3 GHz and UltraVISTA source positions using a more complex analysis identifying variations in the astrometry across the full COSMOS field (see their Appendix A.1 and Figure A.1). As the numbers of sources analyzed per R.A. and decl. bin are only a few hundred, we prefer to apply only a single value when correcting for the astrometric offset of the UltraVISTA sources.

We note that a new COSMOS photometry catalog is under construction using the UltraVISTA DR4 data, which are astrometrically corrected using GAIA data, providing a much better astrometry of a few milliarcseconds (see https://calet.org/). Our next A³COSMOS updates will use it when available.

Appendix B: Sources with Inconsistent (Sub)millimeter Continuum Photometry in Our Two Catalogs

Here we present the ALMA images of the (sub)millimeter continuum sources that have inconsistent total fluxes in the prior-based and blindly extracted catalogs and are labeled in Figure 7. Three outliers with labels 1 to 3 have >5σ higher galfit fluxes than PyBDSF fluxes, and one outlier with label a has the opposite situation. Their ALMA images, prior fitting, and blind extraction model images and residual images are shown in Figure 30 (each outlier has six subpanels; see caption for details).

In general, the galfit source models provide better fits to the original ALMA images, with less residual emission in the residual images, except for outlier 3, which seems to be composed of two ALMA sources, while our COSMOS master catalog contains only one prior source.

For the fourth outlier with label a in Figure 7 and shown in the bottom right of Figure 30, the galfit model is more complex than the simple Gaussian-shaped PyBDSF model because multiple prior sources are fitted. Therefore, as long as we have a good knowledge of prior sources in the ALMA field of view, i.e., from our compiled COSMOS master catalog, the galfit fitting typically provides very good photometry results (with a small enough reduced-χ² in the residual image).

Appendix C: Detailed Description of Our Monte Carlo Simulations

We briefly introduced our two sets of MC simulations in Section 3—the full parameter-space ("FULL") simulation and the physically motivated ("PHYS") simulation. Below we provide in-depth details of how we model the artificial sources (Appendices C.1.1 and C.2.1), inject them into ALMA residual images (after blind extraction photometry) (Appendices C.1.1 and C.2.2), recover the sources with our two types of photometry pipelines (Appendices C.1.2 and C.2.3), and analyze the statistics (Appendix C.3). We also discuss the limitations of each simulation in Appendices C.1.3 and C.2.4. Note that both simulations have limitations that could bias our final flux and error estimations. Only by doing both simulations and comparing them with each other, as done here, can these limitations be understood and can the least biased way to implement corrections to obtain final photometry results be identified.

C.1. Full Parameter Space ("FULL") MC Simulation

C.1.1. Source Simulation and Injection

In the "FULL" simulation, we simulate one source at a time for each of the ∼150 representative ALMA continuum images (Section 3), with source peak flux density ranging from 3.0 to 100 times the rms noise (the ratio is denoted as ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}};$ see Equation (1)) and size (Gaussian major-axis FWHM, convolved with the beam) ranging from 0.1 to 6.0 times the synthesized beam size (clean beam, Gaussian major-axis FWHM) (the ratio is denoted as ${{\rm{\Theta }}}_{\mathrm{beam}};$ see Equation (2)).

There are 13 grid points in the first parameter ( ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ ) and also 13 in the second parameter ( ${{\rm{\Theta }}}_{\mathrm{beam}}$ ). For each grid point, we generate 25 mock sources by randomizing the injecting position.

The simulated source is assumed to be of Gaussian shape (the minor-axis FWHM is generated with an axis ratio randomly picked between 0.2 and 1.0). Then, the source is convolved with the clean beam and injected into the residual image derived from PyBDSF, where sources were already blindly extracted and removed. We randomly cut a box area around the source with a size of 8 times the intrinsic source size to ensure enough empty sky area for source extraction.

In total, we have ∼4225 "FULL" simulations per ALMA image, and repeating this for ∼150 representative ALMA images (one for each independent ALMA scheduling block), we have ∼3750 sources per grid point in the 2D parameter space.

C.1.2. Source Recovery

We run our PyBDSF and galfit photometry tools to recover those simulated sources one by one. For PyBDSF, we keep the exact same conditions as for the real catalog, i.e., setting the background to zero and the rms noise to the values we measured from the previous photometry run (from fitting the pixel histograms; Section 2.2), and using the same thresholds as for the original ALMA images (Section 2.2). For galfit, the only difference is the input prior catalog. We assume no source blending issue and only fit the simulated source.

In Figure 31 we show the comparisons of the simulated and recovered fluxes for PyBDSF (top panels) and galfit (bottom panels). The left panels show the simulated versus recovered fluxes, colored by ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ . Here we only shows sources that have ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\gt 3$ , because lower- ${\rm{S}}/{\rm{N}}$ detections are mostly spurious, according to the spurious fraction analysis in Section 2.8, and eventually we select our ALMA detections with a much higher ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\sim 5$ (Section 4.1).

The middle panels of Figure 31 show the difference between the simulated and recovered fluxes $({S}_{\mathrm{sim}.}-{S}_{\mathrm{rec}.})$ normalized by ${S}_{\mathrm{sim}.}$ as a function of ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ (ratio of source peak flux to rms noise; Equation (1)), colored by ${{\rm{\Theta }}}_{\mathrm{beam}}$ (ratio of source area to beam area; Equation (2)). In general, at a low ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ , ${S}_{\mathrm{sim}.}$ is always smaller than ${S}_{\mathrm{rec}.}$ , indicating that fluxes are boosted by noise. Such a flux boosting is much smaller for a higher ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ . Therefore, based on these, we quantify the flux bias by the two parameters ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ and ${{\rm{\Theta }}}_{\mathrm{beam}}$ in the main text (Section 3.1). Meanwhile, the scatter of $({S}_{\mathrm{sim}.}-{S}_{\mathrm{rec}.})$ reflects the uncertainty of the photometry, i.e., flux errors, which can also be quantified by the two parameters (Section 3.1.3). Note that the flux errors that came along with our two photometry pipelines are based on the equations in Condon (1997), where the author used about 3000 MC simulations to calibrate these equations. Our MC simulations offer the possibility for alternative assessments that show a broad consistency but also evidence for a second-order trend with ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ (Sections 3.1.3 and 3.1.4).

The right panels of Figure 31 show the histogram of $({S}_{\mathrm{sim}.}-{S}_{\mathrm{rec}.})$ normalized by the flux errors ${\sigma }_{{S}_{\mathrm{rec}.}}$ . Sources are grouped into subsamples according to their ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ . A 1D Gaussian fit ( ${e}^{-{(x-\mu )}^{2}/2{\sigma }^{2}}$ ) to the histogram indicates whether the flux errors can statistically represent the uncertainty of the photometry. If the fitted Gaussian is too wide (i.e., σ > 1), then the flux errors are underestimated, and vice versa. We overlay the σ = 1, μ = 0 1D Gaussian curve for comparison. Note that both flux bias and error affect these histograms. We demonstrate in Appendix C.3 that after correcting flux biases and re-estimating flux errors, these histograms become much closer to σ = 1, μ = 0 1D Gaussian shapes, i.e., we can say that they follow a well-behaved Gaussian statistics.

C.1.3. Limitations

We discuss several limitations related to the use of the "FULL" simulation to indicate flux bias, error, and completeness in this section. First is the assumed source property distribution. Using uniform ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ and ${{\rm{\Theta }}}_{\mathrm{beam}}$ distributions is indeed a strong assumption, although it is perhaps the most commonly adopted way in IR/millimeter/radio photometry studies. We have to consider the following situation, which we refer to as the "resolution bias." A large, low- ${\rm{S}}/{\rm{N}}$ source can usually break up into several smaller clumps owing to noise fluctuations. Our photometry code will then usually only detect a smaller, low- ${\rm{S}}/{\rm{N}}$ clump; therefore, the source's total flux is only partially recovered. This acts opposite to the effect of flux boosting, where our photometry code detects a low- ${\rm{S}}/{\rm{N}}$ source, which is actually the peak of a noise fluctuation instead of a real galaxy. In reality, what we know about the detected sources are only the recovered fluxes and sizes; therefore, we cannot distinguish the two effects. As we parameterize the flux biases and errors by the recovered fluxes and sizes (as will be described in detail in Section 3.1), simulating more large size sources will lead to less flux boosting (and hence smaller flux biases) and larger flux errors that can be significant particularly at low ${\rm{S}}/{\rm{N}}$ .

A second limitation is that sources are simulated and then recovered individually in our "FULL" simulation procedure. Thus, there is no source blending or clustering effect. In reality, sources can be blended even at the arcsecond resolution of the ALMA data, although this situation occurs at low probability, e.g., it depends on the galaxy merger fraction. This limitation could affect our estimation of the completeness of the PyBDSF photometry because PyBDSF is a blind extraction tool and sometimes will extract two blended sources as a single source. The galfit photometry should not be affected, if the prior source catalog has a high resolution (subarcsecond) and is complete (not missing sources in ALMA bands).

Another limitation is the input map we used for injecting source models. We use the PyBDSF residual maps for all of our analysis presented here because the residual maps ideally should contain the exact noise as in the observations. However, imperfect source subtraction (by PyBDSF) could potentially increase the noise at certain positions. However, this is likely a very minor problem, as we randomize the injection positions. Injecting source models in the uv-plane (pure interferometry noise) instead of the image plane (PyBDSF residual image) could, in principle, help to assess the additional uncertainty introduced by the imaging/cleaning process. But the difference should be small because we have verified with our real data that the rms noise and source fluxes measured from the uv-plane and image plane are fully consistent.

C.2. Physically Motivated ("PHYS") MC Simulation

C.2.1. Source Simulation

In order to disentangle the major limitations from the "FULL" simulation, we have done another physically motivated simulation ("PHYS" simulation) where we try to reproduce the real physical properties of galaxies across cosmic time.

In detail, we follow the 2SFM recipe (Béthermin et al. 2012a; Sargent et al. 2012, 2014), which assumes that all star-forming galaxies are in two populations, with the population of starbursts being enhanced in their sSFRs by a range of factors under a normal distribution with a mean of 5 (here we adopt 5 because we find that this better fits the millimeter number counts; Sargent et al. 2014 suggest a value of 4; see their Figure 10). We generate these star-forming galaxies within the cosmic volume of the 2 deg² COSMOS field with the following procedures:

(1)
Defining 25 redshift bins from z = 9.75 to 0.
(2)
Computing the number of star-forming galaxies using the SMF (e.g., Davidzon et al. 2017) at each redshift and starting from ${M}_{* }={10}^{8.0}\,{M}_{\odot }.$
(3)
Assuming that a small fraction of these star-forming galaxies are in starburst (SB) mode while the rest are in MS mode. The fraction is set to be consistent with the merger fraction extrapolated from Conselice (2014).
(4)
Computing SFRs for MS and SB galaxies following the MS correlation (including the scatter) and SB boost function in Sargent et al. (2014).
(5)
Then, we generate an IR to radio SED for each model galaxy according to the redshift, stellar mass, and SFR, following the SED modeling in Liu et al. (2018), which is based on Magdis et al. (2012), assuming Draine & Li (2007) dust models and simplifying the dust SEDs by associating them only with redshift and the interstellar radiation field ( $\left\langle U\right\rangle ;$ see Magdis et al. 2012, Béthermin et al. 2015 and Liu et al. 2018, for more details).
(6)
Estimating MS galaxies' sizes depending on their redshifts and stellar masses following (extrapolating from) van der Wel et al. (2014), as well as considering that dust sizes are a factor of about 2 smaller (Fujimoto et al. 2017).
(7)
Random minor/major-axis ratio from 0.2 to 1.

In total about 0.7 million model galaxies are generated in 2 deg². As a validation, their number counts are also estimated at each IR/millimeter/radio wavelength; these simulated counts are found to agree well with real (sub)millimeter measurements at 500, 850, 1.1, and 1.3 μm (within the error bars), e.g., Béthermin et al. (2012b), Karim et al. (2013), Carniani et al. (2015), Hatsukade et al. (2016), and Geach et al. (2017).

These model galaxies are then randomly injected into the 2 deg² COSMOS field and recovered, which is described in the following.

C.2.2. Source Placement

We assign a random position within the 2 deg² COSMOS field for each mock galaxy. Then, we create artificial ALMA maps by inserting our mock galaxies into the ALMA residual images of the 150 representative programs we have selected in Appendix C.1. For each of the residual maps, we create 273 of such artificial maps. Within each iteration, we select a mock galaxy at specific redshift and stellar mass out of the full mock galaxy catalog and place it in the center of the residual map. We apply a small random offset (1''–6'') to avoid imperfect source extraction at the center of the residual image. A subset of other remaining galaxies from the full mock catalog may fall within the same residual map according to their position within the full simulated 2 deg² map and are inserted as well. In this way, we account for possible clustering of (sub)millimeter sources in real observations. We loop for each simulated galaxy over a redshift grid ranging from 1.0 to 6.0 in steps of 0.25, with log stellar mass from $\mathrm{log}({M}_{* }({M}_{\odot }))=9.0$ to 12.0 in steps of 0.25, making a total of 273 iterations. Then, we extract all the sources simultaneously in the next section (unlike for the "FULL" simulation, where we extract each simulated source individually; see Appendix C.1).

C.2.3. Source Recovery

The simulated images are then treated by our PyBDSF and galfit pipelines in the same manner as the original ALMA images. The simulated and original ALMA images have the same size; therefore, the impact of simultaneous multisource fitting is also considered. Moreover, for the prior fitting with galfit, we use the catalog of simulated galaxies as the prior source list.

Similar to Figure 31, we present the comparison of the simulated and recovered fluxes from the "PHYS" simulation in Figure 32. The "PHYS" simulation contains many more faint sources (due to the realistic SMF and MS correlation). The histograms in the right panels of Figure 32 are narrower than those of the "FULL" simulation as shown in Figure 31, especially for the low- ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ sources (shown as the red histograms in both figures). This means that the flux bias and error are different if we adopt different simulation methods (Figures 12 and 13), even when each simulation is repeated sufficiently to yield robust statistics. Here we emphasize that the prior information assumed in the simulations is important for analyzing the flux bias and error statistics, and making the simulation as close to a real galaxy population distribution as possible will lead to more realistic results (Section 3.1).

C.2.4. Limitations

The physically motivated simulation has the advantage of resembling closer the real situation for galaxy photometry in ALMA images. However, it also has limitations, both in the assumed galaxy evolution models and when comparing to the full parameter-space simulation.

First, our model galaxies are built on the star-forming galaxy's SMF at each redshift. These mass functions are not well constrained at redshift of z ∼ 4 and unconstrained at higher redshifts (e.g., Grazian et al. 2015; Song et al. 2016; Davidzon et al. 2017). Then, we assume a starburst fraction associated with the merger fraction, which is highly unconstrained at redshift z ∼ 2 and beyond. Moreover, because we aim to reproduce the majority of star-forming galaxies, we choose simplified galaxy SED models (Magdis et al. 2012; Liu et al. 2018), which can represent the bulk of star-forming and starburst galaxies (see details in Appendix C.2.1 and references there), but these SED models do not include extreme cases, e.g., galaxies with very high sSFR, very low or very high dust temperature, etc. The "PHYS" simulation also does not include a full implementation of the clustering effect as in Béthermin et al. (2017). However, we emphasize that the aim of the "PHYS" simulation is to provide very different inputs from the "FULL" simulation, to see whether they can lead to different statistical results, and they do. Further, the "PHYS" simulation we adopt here is sufficiently complex for testing our ALMA (sub)millimeter photometry under all possible, physical situations that are not covered by the "FULL" simulation.

C.3. Final Statistical Behavior of Corrected Fluxes and Errors

We provide details on the statistical behavior of flux errors here as an extension to the discussion in Appendix 3.1.5. We examined the histograms of $({S}_{\mathrm{sim}.}-{S}_{\mathrm{rec}.}^{\mathrm{corr}.})/{\sigma }_{{S}_{\mathrm{rec}.}^{\mathrm{corr}.}}$ as shown in Figure 33 for final corrected PyBDSF and galfit fluxes and errors. Such a histogram indicates how well our final flux errors ${\sigma }_{{S}_{\mathrm{rec}.}^{\mathrm{corr}.}}$ can reflect the true scatter between ${S}_{\mathrm{sim}.}$ and ${S}_{\mathrm{rec}.}^{\mathrm{corr}.}$ . Ideally, if the flux error well represents the uncertainty in the photometry, the histogram should have the shape of a 1D Gaussian with mean = 0 and σ = 1 (which is overlaid in the figure). Comparing these histograms to those before correction (Figures 31 and 32), we do find significant improvement in the shape of the histogram. In Figure 33, we show the histogram after each step of correction: (1) flux bias correction (top row) and (2) both flux bias and error correction (bottom row). The final histograms nicely agree with the mean = 0, σ = 1 1D Gaussian. There are some outliers, however, but they only contribute a few percent in number. The outlier fraction is lower in the galfit photometry compared to the PyBDSF photometry, probably because prior fitting uses known positional information and reduces the chance of recovering noise peaks as sources. Both of these effects are also related to the features in each photometry method. For example, at the high-value end of the histogram, outliers have highly underestimated fluxes likely due to the aforementioned resolution bias (large sources are broken down, and only partial fluxes are recovered). And at the low-value end, outliers have extra-boosted fluxes mostly because the recovered source sizes are significantly larger than their simulated sizes. For these outliers, we can tentatively identify them by checking their PyBDSF multicomponent flag (Flag_multi) and also comparing their PyBDSF and galfit fluxes and sizes. For example, there are 5% of sources with Flag_multi=='M' in our blind photometry catalog. Therefore, if excluding them, for the bulk of sources done without photometry, the errors are quite well behaved in statistics.

**Figure 33.** Left panels are based on the "PHYS" simulation with PyBDSF photometry, and right panels are based on the "PHYS" simulation with galfit photometry. In each left/right panel, the top panel is the histogram of $({S}_{\mathrm{sim}.}-{S}_{\mathrm{rec}.}^{\mathrm{corr}.})/{\sigma }_{{S}_{\mathrm{rec}.},\mathrm{Condon}1997}$ , where ${S}_{\mathrm{rec}.}^{\mathrm{corr}.}$ is the measured total flux after flux bias correction (Section 3.1.1) and ${\sigma }_{{S}_{\mathrm{rec}.},\mathrm{Condon}1997}$ is the error in total flux shipped with PyBDSF based on Condon (1997). The bottom panel is the histogram of $({S}_{\mathrm{sim}.}-{S}_{\mathrm{rec}.}^{\mathrm{corr}.})/{\sigma }_{{S}_{\mathrm{rec}.}^{\mathrm{corr}.}}$ , where ${\sigma }_{{S}_{\mathrm{rec}.}^{\mathrm{corr}.}}$ is the simulation-based error in total flux (Section 3.1.3). A 1D Gaussian with mean = 0 and σ = 1 is overlaid in each panel.
Download figure:
Standard image High-resolution image

**Figure 33.** Left panels are based on the "PHYS" simulation with PyBDSF photometry, and right panels are based on the "PHYS" simulation with galfit photometry. In each left/right panel, the top panel is the histogram of $({S}_{\mathrm{sim}.}-{S}_{\mathrm{rec}.}^{\mathrm{corr}.})/{\sigma }_{{S}_{\mathrm{rec}.},\mathrm{Condon}1997}$ , where ${S}_{\mathrm{rec}.}^{\mathrm{corr}.}$ is the measured total flux after flux bias correction (Section 3.1.1) and ${\sigma }_{{S}_{\mathrm{rec}.},\mathrm{Condon}1997}$ is the error in total flux shipped with PyBDSF based on Condon (1997). The bottom panel is the histogram of $({S}_{\mathrm{sim}.}-{S}_{\mathrm{rec}.}^{\mathrm{corr}.})/{\sigma }_{{S}_{\mathrm{rec}.}^{\mathrm{corr}.}}$ , where ${\sigma }_{{S}_{\mathrm{rec}.}^{\mathrm{corr}.}}$ is the simulation-based error in total flux (Section 3.1.3). A 1D Gaussian with mean = 0 and σ = 1 is overlaid in each panel.
Download figure:
Standard image High-resolution image

C.4. Discussion on the Completeness of (Sub)millimeter/Radio Photometry

We present the 2D diagnostic diagram of completeness in Figure 34. Such a diagram is also used in similar works (e.g., Franco et al. 2018; Jiménez-Andrade et al. 2019). The left panel shows the dependency on ${S}_{\mathrm{peak}}$ , and the right panel shows that on ${S}_{\mathrm{total}}$ . Because ${S}_{\mathrm{total}}\propto {S}_{\mathrm{peak}}\times {{\rm{\Theta }}}_{\mathrm{beam},\mathrm{convol}.}^{2}$ , the completeness has a more complicated dependency on ${S}_{\mathrm{total}}/\mathrm{rms}\,\mathrm{noise}$ than ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ . Thus, using ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ to select the sample results in a more uniform completeness for various source sizes.

**Figure 34.** Completeness of the PyBDSF source extraction as a 2D function of the simulated ${S}_{\mathrm{peak}}/\mathrm{rms}\,\mathrm{noise}$ (i.e., ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}};$ left panel) or ${S}_{\mathrm{total}}/\mathrm{rms}\,\mathrm{noise}$ (right panel) and ${{\rm{\Theta }}}_{\mathrm{beam}}$ , based on the "FULL" simulation (Appendix C.1). Color indicates the completeness fraction.
Download figure:
Standard image High-resolution image

We compare our findings with other ALMA photometry work with completeness assessments in the literature: Hatsukade et al. (2011, 2016), Karim et al. (2013), Ono et al. (2014), Simpson et al. (2015), Aravena et al. (2016), Umehata et al. (2017), Dunlop et al. (2017), and Franco et al. (2018). Our differential completeness at low ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ is either lower than or consistent with the values in the literature. For example, Karim et al. (2013) estimated ∼70% completeness at ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\sim 3$ for their Gaussian fitting photometry down to 2.5σ, while we derive a value of ∼20%. Aravena et al. (2016) estimated ∼50% completeness at ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\sim 3$ for their SExtractor (Bertin & Arnouts 1996) source extraction down to 3.5σ. Hatsukade et al. (2016) found ∼25% completeness at ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\sim 3$ for their AEGEAN (Hancock et al. 2012) Gaussian fitting photometry down to 4σ. And Franco et al. (2018) report <20% completeness at the same ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}$ for their Blobcat Gaussian fitting photometry, which is consistent with ours. Note that all these studies mentioned are based on ALMA maps coming from a single program with similar beam size and rms noise. However, in our work these parameters vary significantly across the archival programs. Based on archival data, Ono et al. (2014) found a completeness of ∼85% at ${\rm{S}}/{{\rm{N}}}_{\mathrm{peak}}\sim 3$ for their SExtractor extraction down to as low as 1.8σ.

Our comparison above shows that both detection criterion and photometry method are important when deriving completeness fractions. A lower detection criterion leads to a higher completeness. However, we have also to consider the spurious detection fraction (Figure 8), which rapidly increases by lowering the detection criterion. Our choice of PyBDSF parameters as described in Section 2.2 is thus a compromise between the completeness and spurious fractions.

Appendix D: An Example of Our Automated Counterpart Association Examination

In Figure 35 we show an example for our automated examination of the counterpart association between each ALMA source and its counterpart source in prior catalogs. For each counterpart image (HST ACS i band, UltraVISTA K_s band, SPLASH IRAC ch1, and VLA 3 GHz), we measure several parameters as described in Section 4.2 and then link them to the likelihood of the counterpart association based on our visual inspection. For example, in Figure 35, the bold green circle indicates the fitted ALMA source position and size (convolved with the ALMA beam), and the red crosses are sources in the prior catalog (Section 2.3). The bold red cross indicates the counterpart of the ALMA source (or just the prior source used in prior fitting photometry).

**Figure 35.** Example of our automated counterpart association (Section 4.2) for the ALMA source with COSMOS2015 ID 888774 (Laigle et al. 2016). It is selected as an example because of its relatively large offset (Sep. = 0.46; see definition in Section 4.2) between the ALMA and optical positions (corrected for astrometry). The background image is the *HST* ACS i band in the left panel and the UltraVISTA K_s band in the right panel. Other symbols are the same in two panels: the thick green ellipse shows the position and size of the ALMA source, and the thin green ellipses show the aperture used for measuring the $\mathrm{Ext}.$ parameter as described in Section 4.2. The largest red cross represents the counterpart position, and smaller red crosses are other master catalog sources within this image. This prior source is from the Laigle et al. (2016) catalog; therefore, its prior position is based on an optical to near-IR combined S/N map and centered at the K_s emission. There is a faint blue source with ID 1732293 in the Capak et al. (2007) i-band catalog (but not in the Laigle et al. 2016 catalog) at the northeast of the prior position with a small offset of 04. Current information (including astrometry) is not sufficient to distinguish whether this faint blue source is another galaxy or physically associated with the K_s source. The latter situation (where UV stellar light offsets from dust emission) has been observed in many high-redshift dusty galaxies (e.g., Hodge et al. 2016, 2019; Lang et al. 2019; Rujopakarn et al. 2019). Due to the fact that the i-band source is quite faint, the K_s galaxy's SED is mostly unaffected by the i-band and shorter-wavelength photometry, and the ALMA photometry can be reasonably fitted for the K_s galaxy at its redshift.
Download figure:
Standard image High-resolution image

Appendix E: Problematic SED Fitting Cases

In Figure 36 we show the four problematic SED fitting cases that are labeled in Figure 24 and discussed in Section 4.6. Their stellar masses derived from our optimized iterative MAGPHYS fitting (Section 4.4) without an AGN component are higher by a factor of 10–100 compared to the masses reported by Delvecchio et al. (2017), who used SED3FIT to account for an AGN SED component (and both redshifts are consistent). However, we have the following reasons to believe that these are just rare cases that do not indicate an obvious bias in our SED fitting affected by mid-IR AGN contamination. First, these sources are very rare, with only 4 out of a total of 396 sources cross-matched with the Delvecchio et al. (2017) catalog. All other AGNs have no such large difference between this work and Delvecchio et al. (2017). Second, the advantage of this work is that we have the ALMA and FIR/(sub)millimeter constraint, and MAGPHYS has the energy balance assumption that links optical dust attenuation to dust luminosity in IR; therefore, the degeneracy between age and attenuation can be reduced (if the energy balance is valid, which could be true for our ALMA-selected dusty sample). Third, the optical photometry itself has uncertainties from either noise or galaxy–galaxy blending. For example, for outlier 3, its HST i-band and Subaru Suprime-Cam z'-band data could not be fit well, but its IR SED looks reasonable and counterpart association shows no problem. It has no obvious blended optical source within 3'', but we could not rule out the chance of line-of-sight blending of two sources, i.e., like the case of the galaxy CRLE reported in Pavesi et al. (2018).

**Figure 36.** Example of SEDs for sources `1`, `2`, `3`, and `4` listed in Figure 24 that exhibit strong mid-IR AGN emission and show disagreement in the stellar masses between this work and Delvecchio et al. (2017). See discussion of each source in Section 4.6. For each source, the best-fit full SED (black line) and unattenuated stellar SED (cyan line), as well as the photometric data points, are shown in the top, large subpanel. The ALMA data point from this work is highlighted in red. Optical to near-IR K-band data are from the "COSMOS2015" catalog (Laigle et al. 2016) with 3'' aperture photometry; near-IR IRAC to millimeter and radio data are from the "super-deblended" FIR/millimeter catalog (Jin et al. 2018). The SED fitting residuals at each band (log₁₀S_OBS/S_SED) are shown in each middle subpanel. The probability distribution histograms of each fitted physical parameter ( ${\mathrm{log}}_{10}{M}_{* }$ , log₁₀ SFR, IR luminosity log₁₀L_dust, dust mass log₁₀M_dust, and dust temperature T_dust) are shown in the subpanels at the bottom.
Download figure:
Standard image High-resolution image

To verify the degeneracy between age and attenuation, we ran some additional SED fitting with our multicomponent χ² fitting code under development,³⁸ similar to Liu et al. (2018). We fit two components for test purposes: one uses Bruzual & Charlot (2003) stellar SED models at solar metallicity, with constant SFH, with various ages from 0.1 to 1 Gyr, and with a varied dust attenuation (E(B − V) = 0 to 1.2) following the Calzetti et al. (2000) attenuation law; and the other uses the Mullaney et al. (2011) AGN SED models. We fit only the optical to mid-IR part (λ < 20 μm) of the SED. We loop each combination of the two component models and find a best χ² fit, and then we merge all combinations together to analyze the χ² distributions for ${M}_{* }$ and age (to obtain a median value and error for each parameter, similar to Liu et al. 2018). We find that such a fitting without constraints from FIR dust emission leads to very large uncertainties in age and stellar mass. For example, for outlier 2 (ID 951838), the best-fit ${\mathrm{log}}_{10}({M}_{* }/{M}_{\odot })$ dramatically varies from 9.5 ± 0.5 to 11.4 ± 0.6 with an age that varies from 200 Myr to 1 Gyr (with best-fit E(B − V) = 0.5 and 0.7, respectively). For comparison, the Delvecchio et al. (2017) ${\mathrm{log}}_{10}{M}_{* }$ of 10.24 and our MAGPHYS value of 12.15 agreed within the errors. For our outlier 3 (ID 813955), a fixed age of 200 Myr fitting gives E(B − V) =0.5 and ${\mathrm{log}}_{10}{M}_{* }=11.0\pm 0.1$ , while a free-age fitting gives E(B − V) = 1.2 and ${\mathrm{log}}_{10}{M}_{* }=12.6\pm 0.3$ with an age of 450 Myr. In comparison, Delvecchio et al. (2017) report a ${\mathrm{log}}_{10}{M}_{* }$ of 10.34, and our MAGPHYS value is 11.63, also in agreement within the range of uncertainties. Outlier 4 (ID 842140) presents a similar situation. For outlier 1 (ID 422662), our experimental stellar+AGN SED fitting ( ${\mathrm{log}}_{10}{M}_{* }\sim 11.2\mbox{--}12.6\pm 0.6$ ) cannot recover the low ${\mathrm{log}}_{10}{M}_{* }=9.45$ reported in the Delvecchio et al. (2017) catalog (their ID_VLA3 4136). Therefore, as mentioned in Section 4.6, we think that its stellar mass is more reliable from the MAGPHYS fitting because of the inclusion of ALMA data here. Finally, we conclude that these sources are just low-probability outliers suffering from the large uncertainty in stellar mass estimation and possibly also optical line-of-sight blending. The latter needs further follow-up observations, which is beyond the scope of this paper.