

Galaxy Zoo: Clump Scout: Surveying the Local Universe for Giant Star-forming Clumps


Published 2022 May 19 © 2022. The Author(s). Published by the American Astronomical Society.
Citation: Dominic Adams et al 2022 ApJ 931 16, DOI 10.3847/1538-4357/ac6512


0004-637X/931/1/16

Abstract

Massive, star-forming clumps are a common feature of high-redshift star-forming galaxies. How they formed, and why they are so rare at low redshift, remains unclear. In this paper we identify the largest sample yet of clumpy galaxies at low redshift (7050 galaxies) using data from the citizen science project Galaxy Zoo: Clump Scout, in which volunteers classified 58,550 Sloan Digital Sky Survey (SDSS) galaxies spanning redshift 0.02 < z < 0.15. We apply a robust completeness correction by comparing with simulated clumps identified by the same method. Requiring that the ratio of clump to galaxy flux in the SDSS u band be greater than 8% (similar to clump definitions used by other works), we estimate the fraction of local star-forming galaxies hosting at least one clump ($f_{\mathrm{clumpy}}$) to be $3.22^{+0.38}_{-0.34}\%$. We also compute the same fraction with a less stringent relative flux cut of 3% ($12.68^{+1.38}_{-0.88}\%$), since the higher number counts and lower statistical noise of this fraction permit finer comparison with future low-redshift clumpy galaxy studies. Our results reveal a sharp decline in $f_{\mathrm{clumpy}}$ over 0 < z < 0.5. Because the minor merger rate remains roughly constant over the same span, we suggest that minor mergers are unlikely to be the primary driver of clump formation. Instead, the rate of galaxy turbulence is a better tracer of $f_{\mathrm{clumpy}}$ over 0 < z < 1.5 for galaxies of all masses, which supports the idea that clump formation is primarily driven by violent disk instability in all galaxy populations during this period.


1. Introduction

The morphologies of low-redshift galaxies can generally be classified by their location on the Hubble sequence (Hubble 1926). However, in the past few decades, observations of star-forming galaxies at the peak of cosmic star formation (z ∼ 2) have revealed that these galaxies typically exhibit irregular, clumpy morphologies that do not fit this scheme (Cowie et al. 1995; Elmegreen et al. 2004a, 2004b). Clumpy galaxies are named for the "giant star-forming clumps" they host. Elmegreen (2007) estimated these clumps to have characteristic masses between $10^{7}$ and $10^{9}\,M_{\odot}$ in galaxies in the Hubble Ultra Deep Field, much more massive than typical local star-forming regions (though more recent papers have suggested lower characteristic masses; e.g., Fisher et al. 2017; Dessauges-Zavadsky & Adamo 2018). It is now established that clumpy morphology in star-forming galaxies peaks near z ∼ 2, when ≳50% of these galaxies exhibit clumps, and declines significantly with cosmic time thereafter (Murata et al. 2014; Guo et al. 2015; Shibuya et al. 2016; Guo et al. 2018).

It is still not clear why clumps are so dominant at z ∼ 2, nor why they are so much rarer in low-redshift galaxies. Two major modes of clump formation have been discussed at length in the literature. First, clumps may form "in situ" owing to "violent disk instability" (VDI) within their host galaxy, i.e., in disks where the Toomre stability parameter is ∼1 (Bournaud et al. 2014; Mandelker et al. 2014; Fisher et al. 2017). Such disks are called "marginally stable." This state can occur in galaxies that are continuously fed by smooth cold accretion of intergalactic gas (Genzel et al. 2008; Dekel et al. 2009), though observational confirmation of this process has been limited (Scarlata et al. 2009; Bouché et al. 2013). Second, clumps may form owing to interactions, i.e., major or minor mergers. The merging galaxies can themselves become clumps, known as "ex situ" clumps, or they can generate local instabilities within the merging galaxies that collapse to form in situ clumps. Ex situ clumps are expected to be more massive, larger, older, and less actively star-forming than their in situ counterparts; simulations and observations both suggest that only a minority of high-redshift clumps form this way (Mandelker et al. 2014; Zanella et al. 2019). Different formation mechanisms may also be at work in different populations of clumpy galaxies. Guo et al. (2015) suggest that the clump formation mode depends strongly on galaxy mass, with high-mass galaxies ($M_{*} \gtrsim 10^{10.5}\,M_{\odot}$) dominated by the VDI-driven "in situ" mode, while low-mass galaxies ($M_{*} \lesssim 10^{10}\,M_{\odot}$) are dominated by minor-merger-driven "ex situ" formation.
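For reference, the Toomre parameter of a gas disk is conventionally written as below. This is a standard textbook definition, not an equation taken from this paper. Here σ is the gas velocity dispersion, κ the epicyclic frequency, G the gravitational constant, and Σ the gas surface density. A disk with Q ≈ 1 is marginally stable, while Q < 1 indicates a disk that is unstable to local gravitational collapse.

```latex
Q = \frac{\sigma \, \kappa}{\pi \, G \, \Sigma}
```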

Clumps have been difficult to study largely because they are hard to resolve with existing instruments. Although clumps were originally thought to be kiloparsec-scale objects, high-resolution observations of clumpy galaxies in the local universe (e.g., Overzier et al. 2009; Fisher et al. 2014, 2017; Messa et al. 2019) and in strongly lensed fields (e.g., Wuyts et al. 2014; Dessauges-Zavadsky et al. 2017; Dessauges-Zavadsky & Adamo 2018; Cava et al. 2018) have revealed that many clumps have much smaller characteristic sizes, from tens to hundreds of parsecs. It has therefore been proposed that kiloparsec-scale clumps at high redshift may be rare, and that the sizes and masses of observed clumps are an order of magnitude lower than originally estimated. Several papers invoke the "blending" of multiple small clumps at low resolution to explain high-redshift observations (Dessauges-Zavadsky et al. 2017; Fisher et al. 2017; Dessauges-Zavadsky & Adamo 2018). The characteristic size, mass, and properties of giant star-forming clumps remain a topic of debate.

To date, studies of local clumpy galaxies have focused on high-resolution imaging of small galaxy samples (n < 50); no study has yet assembled a large catalog of local clumpy galaxies. Local clumpy galaxies, while rare, can be observed in much greater detail than their distant counterparts and can act as analogs to them. To this end, we launched the citizen science project Galaxy Zoo: Clump Scout (hereafter Clump Scout). This project, which was active between 2019 and 2021, recruited volunteers to visually identify clumps in galaxies in the local universe. Each subject was examined by many volunteers, whose annotations were aggregated into consensus locations. The resulting catalog of low-redshift clumps and clumpy galaxies is the largest of its kind and, by comparison with high-redshift populations, can help constrain models of clump formation and galaxy evolution.

In this paper we present the clump catalog assembled by Clump Scout and use it to estimate the fraction of clumpy galaxies ($f_{\mathrm{clumpy}}$) in the local universe. Guo et al. (2015), Shibuya et al. (2016), and others have used the evolution of $f_{\mathrm{clumpy}}$ with redshift to evaluate the likelihood of different clump formation mechanisms (i.e., internal VDI or galaxy interactions). We extend their analysis using our own $f_{\mathrm{clumpy}}$ measurement.

This paper is structured as follows. Section 2 describes the galaxy sample, the Clump Scout citizen science project, and our methods for aggregating the annotations provided by Clump Scout volunteers. Section 3 describes our methods for recovering clump properties and correcting for incompleteness and describes the best practices for using the Clump Scout clump catalog. Section 4 details our method of computing the clumpy fraction and correcting it for incompleteness and compares this fraction to other results. In Section 5 we discuss the physical significance of our clumpy fraction result. Finally, Section 6 presents a summary and conclusions.

2. Sample Selection and Preparation

2.1. The Galaxy Zoo: Clump Scout Project

Galaxy Zoo: Clump Scout was a citizen science project that was active from 2019 September 19 to 2021 February 11 on the Zooniverse platform. It presented volunteers with image cutouts of galaxies from the Sloan Digital Sky Survey (SDSS; York et al. 2000); for each galaxy image (referred to as a subject), the volunteer was asked to identify its central bulge, followed by all of its off-center clumps. The central bulge location was requested both to discourage volunteers from marking the galaxy center as a clump and to allow us to remove any identified "clumps" coinciding with the galaxy center. To improve classification quality, new volunteers were presented with a classification tutorial; we also provided a field guide, as well as help pages for each task, which showcased many examples of high-quality clump annotations. In addition, the majority of volunteers were shown a small sample of expert-classified "training" images when they began classifying (≲10 randomly interspersed throughout their first 20 classifications). After classifying a training image, the volunteer was given feedback on how their classification compared to that of the experts.

For the clump-marking task, volunteers were provided with a "normal clump marker" and an "unusual clump marker" (see Figure 1). The "unusual" marker was intended to identify foreground star contaminants, i.e., Milky Way stars that overlap with the angular area of the target galaxy. Like clumps, foreground stars appear as point sources in SDSS images, and their colors in the i, r, and g bands can be very similar to those of clumps; this makes them very difficult to distinguish from clumps by any simple rule. We therefore instructed volunteers to mark clumps as "unusual" if they were particularly bright, differently colored, or especially offset from their host galaxy. Examples of "unusual clumps" (which were generally foreground stars) were provided in the tutorial and field guide. A total of 11,089 distinct volunteers provided classifications while logged into the Zooniverse website, while volunteers who were not logged in undertook 10,200 distinct classification sessions. In total, we obtained 1,707,485 classifications of our galaxy sample.


Figure 1. The user interface for Galaxy Zoo: Clump Scout, showing a partially completed classification. The volunteer has marked the central bulge (red crosshairs) in the previous step and is now identifying clumps.


2.2. Galaxy Selection Criteria

In this study, our goal was to identify as thorough a sample of clumpy galaxies in the local universe as possible, where we take "local" to mean a volume near enough that little to no cosmological evolution takes place (i.e., z ≲ 0.15). To select potentially clumpy galaxies, we relied on data from the Galaxy Zoo 2 (GZ2) citizen science project, which provides morphological classifications for over 300,000 local galaxies from the SDSS Legacy survey. GZ2 consists of roughly 25% of the galaxies in the SDSS main galaxy sample and comprises "the nearest, brightest, and largest systems for which fine morphological features can be resolved and classified" (Willett et al. 2013). From 243,500 GZ2 galaxies with spectroscopic redshifts, we selected galaxies with detectable features by requiring a weighted vote fraction of >50% for the presence of "features or disk" (hereafter $f_{\mathrm{featured}}$). Major mergers were removed by requiring that the debiased vote fraction for the presence of a "merger" (hereafter $f_{\mathrm{merger}}$) be <50%. (For an overview of GZ2 vote fractions, see Willett et al. 2013.) Galaxies with z < 0.02 were removed to ensure that the vast majority of clumps would be unresolved and appear as point sources; this was required so that realistic-looking simulated clumps could be added to these galaxies for comparison. Finally, we limited our sample to galaxies whose masses had been estimated in the SDSS DR7 MPA-JHU value-added catalog (Kauffmann et al. 2003; Brinchmann et al. 2004). We did not make any cuts on galaxy size, as GZ2 already selects its galaxies to be resolvable (petroR90_r > 3 arcsec, where petroR90_r is the radius containing 90% of the galaxy's Petrosian flux in the r band). The resulting sample contained 53,613 galaxies.

In addition to this main sample, we included 4937 additional GZ2 galaxies with 0.02 < z < 0.075 and $f_{\mathrm{featured}} \le 0.5$. The redshift limit of z < 0.075 was applied to include a larger proportion of low-mass galaxies in our sample, since SDSS is dominated by high-mass galaxies at higher redshifts. In total, 168,603 galaxies in the GZ2 spectroscopic sample had $f_{\mathrm{featured}} \le 0.5$, and 74,519 of these also satisfied z < 0.075. We studied only a random subsample of these galaxies because, having been classified as lacking features or a disk, they were unlikely to host clumps. We refer to these 4937 galaxies as the "extra" sample; we included them so that their clumpy statistics could be extrapolated to the full population of such galaxies. With the extra sample included, a total of 58,550 galaxies were studied. Unless otherwise specified, photometry for galaxies in our sample was obtained from the SDSS DR15 PhotoPrimary table (Aguado et al. 2019). Table 1 summarizes our galaxy selection criteria and number counts.
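The cuts above, and those listed in Table 1, can be sketched as a simple filter. This is an illustrative sketch, not the authors' pipeline. The field names (`z`, `log_mass`, `f_merger`, `f_featured`) and the random seed are hypothetical. Following Table 1, the merger cut is applied only to the regular sample.

```python
import random

def select_sample(galaxies, n_extra=4937, seed=0):
    """Sketch of the regular/extra galaxy cuts (hypothetical field names)."""
    # Parent sample: spectroscopic redshift, MPA-JHU mass estimate, and z > 0.02
    parent = [g for g in galaxies
              if g["z"] is not None and g["log_mass"] is not None and g["z"] > 0.02]
    # Regular sample: not a major merger and featured/disk-like
    regular = [g for g in parent if g["f_merger"] < 0.5 and g["f_featured"] > 0.5]
    # Extra sample: smooth-looking galaxies at z < 0.075, randomly subsampled
    extra_pool = [g for g in parent if g["f_featured"] <= 0.5 and g["z"] < 0.075]
    rng = random.Random(seed)
    extra = rng.sample(extra_pool, min(n_extra, len(extra_pool)))
    return regular, extra
```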

Table 1. The Sequence of Cuts Performed on the Galaxy Sample

Selection                                          Galaxy Count
Parent Sample
    GZ2 with spec. redshift                             243,500
    With MPA-JHU mass estimates                         239,950
    With z > 0.02                                       225,085
Regular Sample
    With $f_{\mathrm{merger}} < 0.5$                    211,236
    With $f_{\mathrm{featured}} > 0.5$                   53,613
Extra Sample
    With $f_{\mathrm{featured}} \le 0.5$                168,603
    With z < 0.075                                       74,519
    Randomly selected sample                              4,937

Note. $f_{\mathrm{merger}}$ refers to the debiased GZ2 vote fraction for "merger," while $f_{\mathrm{featured}}$ refers to the weighted GZ2 vote fraction for "features or disk." The final sample consisted of all 53,613 galaxies with $f_{\mathrm{featured}} > 0.5$ and $f_{\mathrm{merger}} < 0.5$ (the "regular" sample), in addition to 4937 of the 168,603 galaxies with $f_{\mathrm{featured}} \le 0.5$ (the "extra" sample).


Given this sample, we created image cutouts of each galaxy from SDSS DR15 Legacy survey imaging; these cutouts were presented to volunteers. We mapped the SDSS i-, r-, and g-band images to the red, green, and blue channels of each cutout using asinh color scaling (Lupton et al. 2004) and scaled galaxies to be approximately the same size in each cutout. Cutout creation is described more fully in Appendix A.
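A minimal sketch of Lupton-style asinh color mapping, assuming aligned i, r, and g images in linear flux units. The `stretch` and `Q` values are hypothetical, and the cutout settings actually used are given in Appendix A. A single per-pixel intensity scales all three bands, so colors are preserved while bright cores are compressed.

```python
import numpy as np

def asinh_rgb(i_img, r_img, g_img, stretch=0.5, Q=8.0):
    """Lupton et al. (2004)-style asinh colour image from i, r, g arrays.

    stretch and Q are illustrative values, not the Clump Scout settings.
    Returns an (ny, nx, 3) float array in [0, 1] with (R, G, B) = (i, r, g).
    """
    rgb = np.stack([i_img, r_img, g_img], axis=-1).astype(float)
    rgb = np.clip(rgb, 0.0, None)              # drop negative sky-noise values
    intensity = rgb.mean(axis=-1)              # one intensity per pixel, shared by all bands
    safe = np.where(intensity > 0, intensity, 1.0)
    # asinh(Q I / stretch) / Q is roughly linear for faint pixels, logarithmic for bright ones
    scale = np.where(intensity > 0, np.arcsinh(Q * intensity / stretch) / (Q * safe), 0.0)
    out = rgb * scale[..., None]               # same factor for every band keeps colours intact
    # If a channel exceeds 1, divide the whole pixel by its peak so the hue is preserved
    peak = out.max(axis=-1, keepdims=True)
    return out / np.maximum(peak, 1.0)
```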

2.3. Generating Images with Simulated Clumps

To estimate the completeness of our sample, we created additional cutouts of galaxies from the target sample with artificial clumps added. These will be called "simulated" subjects, whereas cutouts with no added clumps will be called "real." The simulated sample consists of 84,565 simulated clumps in 26,736 galaxies, approximately half the galaxy count in the real sample. During Clump Scout, each volunteer was shown a random sequence of subjects drawn uniformly from the pool of all subjects, i.e., both real and simulated. A subject was retired (no longer shown to volunteers) after receiving 20 independent classifications.

Galaxies were selected for the simulated sample so that their mass distribution matched that of the clumpy galaxies in the Guo et al. (2018) sample (1132 galaxies spanning 0.5 < z < 3). This lowered the characteristic mass of galaxies in the simulated sample: the Guo et al. (2018) sample has a median mass of $10^{9.7}\,M_{\odot}$, compared to $10^{10.6}\,M_{\odot}$ for galaxies in the real sample. This was appropriate, as massive galaxies tend to be redder and smoother; visual inspection confirmed that a large fraction of star-forming clumps inserted into high-mass galaxies were easily identifiable as simulations. We allowed galaxies to appear in the simulated sample multiple times, applying a different image transformation (i.e., combination of rotations and reflections) to each cutout of the galaxy to ensure that every subject was unique.

To determine clump luminosity and color, we simulated clump spectra using templates generated by the Flexible Stellar Population Synthesis software (Conroy et al. 2009; Conroy & Gunn 2010). We treated each clump as a single stellar population (with a delta-function star formation history), assigning it an age and a V-band total extinction $A_V$. Clump ages spanned $10^{-2.5}$ Gyr (i.e., ∼3 Myr) to $10^{1}$ Gyr. Dust extinction followed a Calzetti et al. (2000) attenuation curve, and the clump's $A_V$ value varied narrowly about the SDSS-estimated dust content of the host galaxy ($\sigma = 0.04$ in $\ln(A_V)$). Metallicity was fixed to the solar value. The resulting spectrum was redshifted to match the host galaxy and integrated to determine the clump's broadband flux in each SDSS band.
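A short sketch of the extinction scatter described above. It assumes that $\ln(A_V)$ for each clump is drawn from a normal distribution centered on the host galaxy's $\ln(A_V)$ with σ = 0.04. The centering is our reading of "varied narrowly about"; the paper does not state it explicitly. The age bounds are included only to make the unit conversion concrete.

```python
import numpy as np

# Age range from the text, in log10(Gyr): 10^-2.5 Gyr (about 3 Myr) to 10 Gyr
LOG_AGE_GYR_RANGE = (-2.5, 1.0)

def draw_clump_av(av_host, n, sigma_ln=0.04, rng=None):
    """Log-normal scatter of clump A_V about the host value (sigma = 0.04 in ln A_V).

    Assumes the distribution is centred on ln(av_host), so av_host is its median.
    """
    rng = np.random.default_rng() if rng is None else rng
    return av_host * np.exp(rng.normal(0.0, sigma_ln, size=n))
```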

Clump mass was drawn from a range that depended on the clump age: clump properties were sampled uniformly (in log10 relative mass and age) within the region outlined in red in Figure 2. Qualitatively, we excluded low-mass, old clumps to avoid extremely faint ("known invisible") objects. Similarly, we excluded the brightest clumps (high mass, young age), as these were "known visible" objects. The resulting sample probes the region where we are least certain of volunteers' ability to recover clumps. We also discarded and regenerated any clump whose apparent magnitude was >22.8 in each of the g, r, i, and z bands and >24.8 in the u band, since such clumps were considered "known invisible." Figure 2 shows the resulting distribution of simulated clump ages and masses.
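The draw-and-discard procedure can be sketched as rejection sampling. This sketch uses two hypothetical stand-ins: `in_region` replaces the red region of Figure 2, whose exact boundary is not given in the text, and `magnitudes` replaces the FSPS photometry. Only the magnitude limits are taken from the text.

```python
import numpy as np

# Limits from the text: "known invisible" means fainter than 22.8 in each of
# g, r, i, z AND fainter than 24.8 in u.
BAND_LIMITS = {"u": 24.8, "g": 22.8, "r": 22.8, "i": 22.8, "z": 22.8}

def known_invisible(mags):
    """True if the clump is fainter than the limit in every SDSS band."""
    return all(mags[band] > limit for band, limit in BAND_LIMITS.items())

def draw_clump(in_region, magnitudes, log_age_range, log_mass_range, rng, max_tries=10_000):
    """Rejection-sample (log age, log mass) uniformly inside an allowed region.

    in_region(log_age, log_mass) -> bool and magnitudes(log_age, log_mass) -> {band: mag}
    are hypothetical placeholders for Figure 2's region and the FSPS photometry.
    """
    for _ in range(max_tries):
        log_age = rng.uniform(*log_age_range)
        log_mass = rng.uniform(*log_mass_range)
        if not in_region(log_age, log_mass):
            continue                      # outside the allowed age-mass region
        if known_invisible(magnitudes(log_age, log_mass)):
            continue                      # too faint in every band: discard and regenerate
        return log_age, log_mass
    raise RuntimeError("no acceptable clump found; check the region and magnitude model")
```

Because rejected draws are simply repeated, accepted clumps remain uniformly distributed over the part of the region that passes the magnitude test.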


Figure 2. A 2D histogram of simulated clump counts in log10 clump relative mass and age space. Clumps' properties were drawn uniformly from within the area plotted in red and then discarded if they did not meet the magnitude limit described in Section 2.3.


Each galaxy in the simulated sample was assigned a number of clumps drawn from a Poisson distribution with mean 3; draws of zero were rejected and resampled, so the actual mean number of clumps per galaxy was 3.16. This distribution was chosen to maximize the clump density per galaxy without appearing unrealistic to volunteers and is not meant to reflect the true distribution of clumps per galaxy. To assign locations to clumps, we performed image segmentation with Source Extractor (Bertin & Arnouts 1996) on the r-band imaging field containing the host galaxy. For each field, a centered cutout was made, smoothed with a boxcar filter, and then segmented by Source Extractor to determine which pixels belonged to the host galaxy. A pixel was chosen uniformly at random from the host galaxy's segment as the clump's location within the image. To probe the low surface brightness regions of a galaxy (without letting our sample be dominated by clumps in these regions), we randomly assigned 25% of clumps to a "wide" segment that encompassed a larger area than the standard galaxy segment. Figure 3 illustrates the placement process. Each clump was then simulated as a point source with a γ = 2.5 Moffat profile; of the profiles we tested, visual inspection by the authors found that the Moffat profile looked the most realistic and was the most difficult to distinguish from real clumps. For most clumps, the FWHM of this profile was equal to the FWHM of the r-band point-spread function (PSF) in the image. We also allowed some clumps to be "extended": each clump was assigned an effective physical radius drawn from a uniform distribution over [10 pc, 500 pc] (based on the size limits observed by Messa et al. 2019), and if this radius exceeded that of the PSF, it was used as the effective radius of the clump's profile. Approximately 19% of simulated clumps (16,342 of 84,565) exceeded the seeing PSF size, by a median of 27%. This distribution of simulated clump sizes lets us probe the effect of clump size on recovery statistics and is not intended to match the true distribution.
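The three sampling steps above (clump count, location, and profile) can be sketched as follows. This is an illustrative sketch under stated assumptions. The pixel lists are hypothetical `(y, x)` arrays, for example from `np.argwhere` on a segmentation map, and the wide list should include the narrow pixels. The text's γ = 2.5 is taken to be the Moffat power-law index (often written β). The zero-truncated mean λ/(1 − e^{−λ}) ≈ 3.157 reproduces the quoted 3.16.

```python
import numpy as np

def n_clumps(rng, lam=3.0):
    """Zero-truncated Poisson draw: resample until the result is nonzero."""
    while True:
        k = rng.poisson(lam)
        if k > 0:
            return int(k)

def place_clump(narrow_pixels, wide_pixels, rng, p_wide=0.25):
    """Pick one (y, x) pixel uniformly from the wide segment (25%) or the narrow one (75%)."""
    pixels = wide_pixels if rng.random() < p_wide else narrow_pixels
    return tuple(pixels[rng.integers(len(pixels))])

def moffat(r, fwhm, beta=2.5, flux=1.0):
    """Circular Moffat profile with the given FWHM, normalised to total `flux`.

    Assumes the paper's gamma = 2.5 is the power-law index (beta here).
    """
    alpha = fwhm / (2.0 * np.sqrt(2.0 ** (1.0 / beta) - 1.0))   # core width from FWHM
    peak = flux * (beta - 1.0) / (np.pi * alpha**2)             # integral over the plane = flux
    return peak * (1.0 + (r / alpha) ** 2) ** (-beta)
```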


Figure 3. An illustration of the process of placing simulated clumps in an image. (a) The initial galaxy image, g band only. (b) The galaxy segmentation map. The dark-gray inner region is the "narrow" region where clumps are placed with 75% probability, while the dark- and light-gray regions together make up the "wide" region where clumps are placed with 25% probability. Selected locations for simulated clumps are plotted as crosses. (c) The final galaxy image with simulated clumps added (some are too faint to be detected). Dashed red lines are drawn around the simulations to highlight them. The final galaxy image shown to volunteers was rotated and/or reflected with respect to this one to distinguish its appearance from the original.


2.4. Aggregation Method

After subjects had been examined by Clump Scout volunteers, the locations of clumps needed to be determined from the collected annotations on each subject. To this end, we developed an aggregation algorithm with which all volunteer annotations on each subject were transformed into consensus clump locations. Broadly, our aggregation process consisted of two steps: first, clump candidates were identified via a clustering algorithm; second, clumps marked "unusual" by a sufficient fraction of volunteers (>0.35) were discarded, since these are likely to be foreground star contaminants.

The clustering algorithm we employed is adapted from Branson et al. (2017) and is specialized for citizen science applications. A complete description of the algorithm can be found in H. Dickinson et al. (2022, in preparation). Briefly, we assign each volunteer a "false-positive probability" (i.e., the chance that an annotation by this volunteer is not associated with any clump candidate), a "false-negative probability" (i.e., the chance that the volunteer failed to mark a given clump candidate), and a "scatter" value that estimates the distance between an annotation and its intended target. These values inform a clustering algorithm that identifies clump candidates; in turn, the identified clump candidates update the volunteer statistics, and the process iterates. To determine when each image has been fully classified, a "risk" value is computed for it; once the risk falls below a threshold value, clustering is no longer performed for that image and its clump candidates are finalized. Each clump candidate is assigned a "false-positive probability" based on the number and properties of the volunteers who marked it. To improve the purity of our sample, we discard clumps with false-positive probability >0.6: by comparison with a sample of classifications by the authors, we found that the majority of volunteer-identified clumps above this threshold did not correspond to any expert-identified clump, while the majority of those below it did. We additionally remove any clumps that coincide with the volunteer-identified central bulge of a galaxy. To locate the central bulge, we take the median x and y coordinates of the central bulge annotations from all volunteers; we then prune outliers by removing any central bulge annotation that falls more than 20 pixels from this location (in the 400 × 400 pixel cutout image) and recalculate the central bulge location without them. We then remove any clumps within 1 PSF-FWHM of the central bulge.
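The post-clustering steps can be sketched as follows: the median bulge position with 20-pixel pruning, then the false-positive, "unusual," and bulge-proximity cuts. The clustering itself is not reproduced here. The dictionary keys are hypothetical, and the thresholds are the ones quoted in the text.

```python
import numpy as np

def bulge_location(annotations, prune_radius=20.0):
    """Median bulge position after pruning marks far from the first-pass median.

    annotations: sequence of volunteer (x, y) bulge marks in cutout pixels.
    """
    pts = np.asarray(annotations, dtype=float).reshape(-1, 2)
    centre = np.median(pts, axis=0)                       # first-pass median (x, y)
    keep = np.linalg.norm(pts - centre, axis=1) <= prune_radius
    if not keep.any():
        return centre                                     # nothing within 20 px: keep first pass
    return np.median(pts[keep], axis=0)                   # recomputed without outliers

def filter_clumps(clumps, bulge_xy, psf_fwhm_px, max_fp=0.6, max_unusual=0.35):
    """Apply the text's thresholds to aggregated clump candidates.

    Each clump is a dict with hypothetical keys 'xy', 'false_positive_prob',
    and 'unusual_fraction'.
    """
    kept = []
    for clump in clumps:
        if clump["false_positive_prob"] > max_fp:
            continue                                      # probably not a real clump
        if clump["unusual_fraction"] > max_unusual:
            continue                                      # probable foreground star
        if np.linalg.norm(np.subtract(clump["xy"], bulge_xy)) < psf_fwhm_px:
            continue                                      # coincides with the central bulge
        kept.append(clump)
    return kept
```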

In total (excluding unusual clumps), we identify 10,738 clumps over 7050 galaxies in our sample. An additional 3858 unusual clumps were found; although these are most likely foreground contaminants, we include them in our public catalog for transparency.

3. Clump Properties and Completeness

In this section, we describe our methods for estimating the flux and background of each clump identified by Clump Scout, and for estimating each clump's completeness.

We estimated the flux of and galactic background for each clump in each of the SDSS ugriz bands using a method similar to Guo et al. (2015). First, flux was measured in an aperture of diameter 2.25 PSF-FWHM centered on the clump location. Next, the background-per-pixel value (where "background" refers to diffuse galaxy light) was estimated by taking the median pixel value in an annulus spanning diameters 3–5 PSF-FWHM and used to estimate the background flux in the central aperture. Figure 4 provides a visual example of the aperture and annulus sizes used. This background is subtracted to obtain a clump flux estimate within the aperture. A random sample of model PSFs from 1000 SDSS fields revealed that 84% ± 2% of flux from a point source falls within an aperture of diameter 2.25 PSF-FWHM. We therefore multiplied the background-subtracted aperture flux by 1.191 to counteract this systematic.
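A minimal sketch of the aperture photometry described above, assuming a background-free image array in counts and a clump position in pixel coordinates. For simplicity, pixels are assigned wholly to the aperture or annulus by their centers, with no sub-pixel weighting. The default correction of 1.191 is the value quoted in the text.

```python
import numpy as np

def aperture_flux(image, x0, y0, fwhm, aperture_correction=1.191):
    """Background-subtracted, aperture-corrected flux of a point-like clump.

    Aperture: diameter 2.25 FWHM. Background: median pixel value in an annulus
    spanning diameters 3-5 FWHM, scaled by the number of aperture pixels.
    Returns (corrected flux, background per pixel).
    """
    yy, xx = np.indices(image.shape)
    r = np.hypot(xx - x0, yy - y0)                 # distance of each pixel centre from the clump
    in_aperture = r <= 0.5 * 2.25 * fwhm
    in_annulus = (r >= 0.5 * 3.0 * fwhm) & (r <= 0.5 * 5.0 * fwhm)
    bkg_per_pixel = np.median(image[in_annulus])   # diffuse galaxy light per pixel
    net = image[in_aperture].sum() - bkg_per_pixel * in_aperture.sum()
    return aperture_correction * net, bkg_per_pixel
```

In a quick check on a synthetic Gaussian source over a flat background (with the correction set to 1), the annulus median recovers the background and the aperture captures most of the source flux. A real SDSS PSF has broader wings than a Gaussian, which is why the text's 84% enclosed fraction, and hence the 1.191 factor, differs from the Gaussian case.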


Figure 4. An illustration of the parameters used to estimate the flux from each clump. The left panel is a galaxy image shown to volunteers from the "real" sample (no simulated clumps), zoomed in to show detail. The right panel is an image that marks the locations of four recovered clumps. Solid outlines are drawn around the central apertures, and dashed lines are drawn around the background regions.


There are a few known sources of systematic error in our flux estimation process for simulated clumps. A small percentage of flux (≲5%) is lost owing to a combination of pixelation effects, contamination of the background region by the clump, and offsets between the recovered and true clump locations. In addition, the background annulus slightly underestimates the diffuse background flux (median: ∼85% of the true value), which leads to overestimated clump fluxes, particularly for dimmer clumps. While these systematics cannot be fully removed with the existing method, we have minimized them so that they are smaller than the scatter. Because these systematic effects may differ between the real and simulated clump samples, we do not apply a correction to counteract them. The bottom panel of Figure 5 quantifies the effectiveness of this flux recovery method by comparing the recovered and input magnitudes of the simulated clumps, and demonstrates that the systematic effects are smaller than the scatter.


Figure 5. Here we illustrate the effectiveness of the aggregation and flux estimation process. For all simulated clumps that volunteers recovered, we compare the "input" properties of clumps (their original simulated values) with their "recovered" properties, following the aggregation process laid out in Section 2.4 and the flux estimation method in Section 3. Top: histogram of the offset between the recovered and true locations of simulated clumps recovered by volunteers. The median offset is 0.15 PSF-FWHM, as marked by the dashed line. Bottom: the flux recovery fraction (recovered flux/input flux) versus input magnitude for simulated clumps. Red points mark the binned medians in magnitude difference, while vertical bars span the 16th to 84th percentile flux recovery fractions. The median flux recovery fraction was 1.03 for clumps with simulated magnitude <22.5. There are a few sources of systematic error in the flux recovery process that could not be removed completely (described in Section 3).


To estimate the error on clump fluxes, we first estimated the per-pixel uncertainties in each SDSS field using the gain and dark variance values provided for each CCD, as well as the image calibration and sky image maps provided with each field. We then fit an uncertainty model to all the pixels in each field of the form ${\sigma }_{f}^{2}={mf}+b$, where ${\sigma }_{f}^{2}$ is the variance on a pixel's flux, f is the pixel's flux, and m and b are model parameters. We sum the variances within a clump's aperture to estimate the aperture flux uncertainty, and we take the median pixel variance in the background annulus and multiply this by the aperture area to estimate the background uncertainty within the aperture. Finally, to obtain the variance on the background-subtracted aperture flux, we sum the aperture and annulus variance estimates; we multiply this value by the same aperture correction (1.191) to estimate the uncertainty on the final, background-subtracted clump flux.
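A sketch of the uncertainty model, assuming per-pixel flux and variance arrays have already been derived from the gain, dark variance, calibration, and sky maps. The linear model $\sigma_f^2 = mf + b$ is fitted by ordinary least squares, which is our assumption since the fitting method is not stated. Scaling a flux by a constant c scales its standard deviation by c (and its variance by c²), so the sketch applies the 1.191 factor to the standard deviation.

```python
import numpy as np

def fit_variance_model(pixel_flux, pixel_var):
    """Least-squares fit of sigma_f^2 = m * f + b over a field's pixels; returns (m, b)."""
    m, b = np.polyfit(np.ravel(pixel_flux), np.ravel(pixel_var), deg=1)
    return m, b

def clump_flux_sigma(aperture_pixels, annulus_pixels, m, b, aperture_correction=1.191):
    """1-sigma uncertainty on the background-subtracted, aperture-corrected clump flux."""
    aperture_pixels = np.asarray(aperture_pixels, dtype=float)
    annulus_pixels = np.asarray(annulus_pixels, dtype=float)
    var_aperture = np.sum(m * aperture_pixels + b)              # summed aperture variance
    # median annulus variance times the aperture area gives the background variance
    var_background = np.median(m * annulus_pixels + b) * aperture_pixels.size
    sigma = np.sqrt(var_aperture + var_background)
    return aperture_correction * sigma
```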

We also obtained a completeness estimate for each clump, defined as the estimated fraction of similar clumps that Clump Scout recovered. To determine this, we relied on the recovery fractions of simulated clumps, where a simulated clump was considered "recovered" if Clump Scout volunteers located a clump within 0.75 PSF-FWHM of its location. We characterized each simulated clump by three properties: its brightness (g-band magnitude), its color (g-minus-r magnitude), and its contrast against the diffuse background (clump-minus-background g-band magnitude). The simulated clumps were binned in these three properties, and the recovery fraction was calculated for each bin. We then calculated the same three properties for each real clump and assigned it the recovery fraction of the corresponding bin as its completeness estimate. These three properties (brightness, color, and contrast) effectively predict the completeness of each clump: adding further properties, such as clump galactocentric radius, galaxy redshift, galaxy size, or image resolution, had only a very small effect on our completeness estimates. The details of this process are described in Appendix B. Figure 6 shows the estimated recovery fraction statistics for real clumps in our sample.
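The matching and binning steps can be sketched with NumPy. The bin edges here are hypothetical, and the binning scheme actually used is described in Appendix B. Real clumps whose properties fall outside the edges are clipped into the nearest edge bin, and empty simulation bins return NaN.

```python
import numpy as np

def is_recovered(sim_xy, found_xy, fwhm, max_sep=0.75):
    """True for each simulated clump with an identified clump within max_sep * FWHM."""
    sim_xy = np.asarray(sim_xy, dtype=float).reshape(-1, 2)
    found_xy = np.asarray(found_xy, dtype=float).reshape(-1, 2)
    if len(found_xy) == 0:
        return np.zeros(len(sim_xy), dtype=bool)
    sep = np.linalg.norm(sim_xy[:, None, :] - found_xy[None, :, :], axis=-1)
    return (sep <= max_sep * fwhm).any(axis=1)

def recovery_table(sim_props, recovered, edges):
    """Recovery fraction per bin of (g mag, g - r colour, clump-minus-background g mag)."""
    n_all, _ = np.histogramdd(sim_props, bins=edges)
    n_rec, _ = np.histogramdd(sim_props[recovered], bins=edges)
    with np.errstate(divide="ignore", invalid="ignore"):
        return n_rec / n_all                     # NaN where a bin holds no simulated clumps

def completeness(table, edges, props):
    """Look up each real clump's completeness from its (clipped) bin."""
    idx = tuple(np.clip(np.digitize(props[:, k], edges[k]) - 1, 0, len(edges[k]) - 2)
                for k in range(len(edges)))
    return table[idx]
```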


Figure 6. The estimated recovery fraction for clumps as a function of the three clump properties ${m}_{g}^{\mathrm{clump}}$, ${m}_{g}^{\mathrm{clump}}-{m}_{r}^{\mathrm{clump}}$, and ${m}_{g}^{\mathrm{clump}}-{m}_{g}^{\mathrm{background}}$. Plots at the top of each column show the median estimated recovery fraction for clumps as a function of each property (red line), with 68% scatter (red shaded region). Completeness estimates are not shown for bins with <50 clumps; the dashed black lines trace the number density of clumps as a function of each property. Plots below the diagonal show the 2D dependence of estimated recovery fraction vs. pairs of clump properties, where bins with <5 clumps are not shown.


We use these completeness estimates to correct the fraction of clumpy galaxies ($f_{\mathrm{clumpy}}$) for incompleteness (Section 4.3). Future studies relying on this clump catalog should generally incorporate these completeness estimates to accurately model the local population of clumps, rather than relying solely on observed clump number counts.
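One simple way to incorporate per-clump completeness is inverse-completeness weighting, in which each detected clump stands in for 1/c similar clumps. This is a common generic technique, not necessarily the correction applied in Section 4.3. The `floor` parameter is a hypothetical safeguard against very small completeness values; NaN completeness values (from empty simulation bins) should be handled before calling.

```python
import numpy as np

def completeness_corrected_count(completeness, floor=0.05):
    """Estimated true number of clumps: sum of 1 / completeness over detected clumps.

    Completeness values are clipped to [floor, 1] so that a few poorly sampled,
    low-completeness clumps cannot dominate the total.
    """
    c = np.clip(np.asarray(completeness, dtype=float), floor, 1.0)
    return float(np.sum(1.0 / c))
```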

3.1. Catalog Release and Use

Along with the electronic release of this paper, we release the catalog of all of the clumps identified by the Clump Scout aggregator and their estimated properties. The columns are fully described by Table 3 at the end of this paper. Clumps with a high fraction (≥35%) of "unusual" annotations are included but are marked by the flag unusual_flag=1.

To use this catalog for scientific purposes, we make a few "best practices" suggestions on how to filter this catalog:

  1. Selecting a clean sample: We strongly recommend rejecting clumps with unusual_flag=1, as these are probable nonclump contaminants (i.e., foreground stars, background galaxies, or other point-like sources).
  2. Mass completeness: This galaxy catalog is not mass complete, and the lower-limit mass on galaxies evolves significantly with redshift. We estimate that the catalog is mass complete down to $10^{9}\,M_{\odot}$ for z < 0.035. If a mass-complete catalog is needed, care must be taken to limit the sample's redshift.

4. The Local Clumpy Fraction of Galaxies

A particularly important observable in the study of clumpy galaxies is the fraction of star-forming galaxies with at least one clump, known as the "clumpy fraction" or fclumpy. The clumpy fraction is most simply defined by

$$f_{\mathrm{clumpy}}=\frac{N_{\mathrm{clumpy\ SFGs}}}{N_{\mathrm{SFGs}}},\qquad(1)$$

where SFGs refer to star-forming galaxies (specific star formation rate (sSFR) > 0.1 Gyr$^{-1}$). This is an easily compared observable between different galaxy populations that can significantly constrain models of galaxy evolution. In this section, we establish a clump definition that makes estimating fclumpy straightforward and precise, and then we present our fclumpy estimate.

4.1. Selecting a Clump Definition for fclumpy

A major difficulty in clumpy galaxy literature is that the definition of a "clump" is highly inconsistent. Past works have defined clumps as those objects identified by visual investigation (e.g., Elmegreen et al. 2007; Puech 2010; Overzier et al. 2009) or by applying detection algorithms that are robust to changes in resolution and depth, including the clumpfind algorithm from Williams et al. (1994) and others (e.g., Livermore et al. 2012; Guo et al. 2012; Tadaki et al. 2014; Zanella et al. 2019). However, comparisons between these different methods are not straightforward.

Guo et al. (2015) proposed an empirically motivated definition that the ratio of clump to galaxy UV luminosity (fLUV) must exceed 8%. The 8% cutoff was chosen to select for star-forming regions at high redshift while excluding common star-forming regions locally; specifically, it includes many star-forming regions in HST-imaged galaxies spanning 0.5 < z < 3 but excludes >99% of star-forming regions identified in the galaxy M101 (blurred to match the resolution of the high-redshift sample). Local clumps exceeding the fLUV > 8% threshold are therefore expected to be rare, exceptional objects. Several other recent works use this or a similar criterion (e.g., Shibuya et al. 2016; Mandelker et al. 2017; Fisher et al. 2017). However, it is not universally applicable: Dessauges-Zavadsky & Adamo (2018) rejected it to allow for study of the mass function of high-redshift clumps down to much lower masses, while Huertas-Company et al. (2020) used a clump mass cut of $M_{\mathrm{clump}} > 10^{7}\,M_{\odot}$ instead to facilitate comparison with simulations.

In this paper, we calculate fclumpy by specifying a relative flux cutoff in the SDSS u band, i.e., the ratio of clump to galaxy flux in the u band (fLu ) is greater than some specified fraction. In particular, we use the relative flux cuts fLu > 8% and fLu > 3%, and call the clumpy fractions under these criteria fclumpy,8% and fclumpy,3%, respectively. The 8% fraction was selected to be comparable to existing works with a similar criterion, while the 3% fraction allows for larger number statistics and is easier and more accurate to estimate for the low-redshift universe, where clumps are less common.
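As a sketch of how the two cuts translate into an observed (not yet completeness-corrected) clumpy fraction, the hypothetical helpers below take, for each star-forming galaxy, its u-band flux and the u-band fluxes of its detected clumps; a galaxy counts as clumpy if any clump has fLu above the threshold. The input layout and function names are assumptions for illustration only.

```python
def is_clumpy(galaxy_flux_u, clump_fluxes_u, threshold):
    """True if any clump's clump-to-galaxy u-band flux ratio exceeds threshold."""
    return any(f / galaxy_flux_u > threshold for f in clump_fluxes_u)


def observed_clumpy_fraction(galaxies, threshold):
    """Observed fraction of star-forming galaxies hosting at least one clump.

    galaxies: list of (galaxy_flux_u, [clump_flux_u, ...]) tuples (hypothetical
    layout), restricted beforehand to star-forming galaxies.
    """
    n_clumpy = sum(is_clumpy(gal, clumps, threshold) for gal, clumps in galaxies)
    return n_clumpy / len(galaxies)


# Toy input: four galaxies with u-band fluxes in the same (arbitrary) units
sample = [(100.0, [10.0]), (100.0, [5.0, 2.0]), (100.0, []), (100.0, [1.0])]
f8 = observed_clumpy_fraction(sample, 0.08)  # only the first galaxy passes
f3 = observed_clumpy_fraction(sample, 0.03)  # the first two galaxies pass
```

Because every galaxy passing the 8% cut also passes the 3% cut, fclumpy,3% can never be smaller than fclumpy,8% for the same sample.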

Regarding the relationship between fLu and fLUV, it is worth noting that several past studies have used a relative flux threshold in the UV (∼2500 Å) to define clumps (e.g., Guo et al. 2015; Shibuya et al. 2016). Because UV data are not available in SDSS, we instead use the shortest-wavelength band available, the u band at ∼3500 Å. Therefore, in order to compare our fclumpy value with past studies, we must demonstrate that clumps selected under our fLu cut are comparable to those selected under the more commonly used fLUV cut. To do so, we examined clumps from Guo et al. (2018), requiring fLUV > 5% to obtain a bright, highly complete sample, herein called the Guo+2018 sample. Guo+2018 was chosen because it includes a large sample of bright clumps and because it spans a similar physical resolution to Clump Scout: a typical SDSS g-band PSF-FWHM at z ∼ 0.05 is ∼1.2 kpc, compared with ∼1.1–1.3 kpc for HST sources spanning 1 < z < 2.5 at similar wavelengths.

For each clump in the Guo+2018 sample, we examined its flux fraction in the CANDELS filter bands that correspond most closely to the near-UV and the SDSS u band in the rest frame. We find a strong correlation between fLu and fLUV using these filters, with the median clump having fLu = 0.86fLUV. A total of 1170 clumps from the Guo+2018 sample meet the fLUV > 8% criterion, compared with 961 that meet fLu > 8%, a reduction of ∼18%.
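The comparison statistics in this paragraph (median fLu/fLUV ratio, rank correlation, and the number of clumps passing each 8% cut) can be computed along the following lines. This is an illustrative NumPy sketch; the function names are hypothetical, and the hand-rolled Spearman coefficient assumes there are no tied values.

```python
import numpy as np


def ranks(x):
    """Ranks 0..n-1; assumes no ties, reasonable for continuous flux ratios."""
    x = np.asarray(x, dtype=float)
    r = np.empty(len(x))
    r[np.argsort(x)] = np.arange(len(x))
    return r


def spearman_rs(x, y):
    """Spearman rank correlation computed as the Pearson correlation of ranks."""
    return np.corrcoef(ranks(x), ranks(y))[0, 1]


def compare_flux_fractions(f_lu, f_luv, cut=0.08):
    """Median f_Lu/f_LUV, rank correlation, and counts passing each cut."""
    f_lu = np.asarray(f_lu, dtype=float)
    f_luv = np.asarray(f_luv, dtype=float)
    return {
        "median_ratio": float(np.median(f_lu / f_luv)),
        "spearman_rs": float(spearman_rs(f_lu, f_luv)),
        "n_pass_uv": int(np.sum(f_luv > cut)),
        "n_pass_u": int(np.sum(f_lu > cut)),
    }
```

With the counts quoted above, the fractional reduction is 1 − 961/1170 ≈ 0.18.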

We performed a similar experiment on clumps in Clump Scout. Although we could not calculate fLUV directly, since SDSS does not provide near-UV data (∼2500 Å), we obtained near-UV fluxes for local galaxies from the GALEX survey (Martin et al. 2005), using the cross-matched GALEX-SDSS catalog created by Bianchi & Shiao (2020). We then assumed that local clumps matched the SED distribution of clumps from the Guo+2018 sample and multiplied each clump's u-band flux by the UV-to-u ratio of a randomly selected Guo+2018 clump. The resulting values for fLu and fLUV are plotted alongside the Guo+2018 values in Figure 7. For both groups, fLu is a reasonably strong predictor of fLUV. Performing the same experiment using the g band rather than the u band revealed that fLg values are less well correlated with fLUV and are typically much smaller (the median clump had fLg = 0.73fLUV).
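The resampling step for Clump Scout clumps might look like the sketch below. It assumes all fluxes share the same units (e.g., μJy), that the reference UV-to-u ratios are drawn with replacement, and that each host galaxy's near-UV flux comes from the GALEX cross-match; the function name is hypothetical.

```python
import numpy as np


def estimate_clump_f_luv(clump_flux_u, galaxy_flux_nuv, ref_uv_to_u, rng):
    """Estimated clump-to-galaxy near-UV flux ratio for local clumps.

    clump_flux_u    : clump u-band fluxes from SDSS photometry
    galaxy_flux_nuv : near-UV flux of each clump's host galaxy
    ref_uv_to_u     : UV-to-u flux ratios of a reference (high-z) clump sample
    """
    clump_flux_u = np.asarray(clump_flux_u, dtype=float)
    # Assign each local clump the UV-to-u ratio of a randomly chosen reference clump
    ratios = rng.choice(np.asarray(ref_uv_to_u, dtype=float), size=clump_flux_u.size)
    clump_flux_nuv = clump_flux_u * ratios
    return clump_flux_nuv / np.asarray(galaxy_flux_nuv, dtype=float)


rng = np.random.default_rng(42)
f_luv_est = estimate_clump_f_luv([1.0, 2.0], [10.0, 20.0], [2.0, 2.0, 2.0], rng)
```

Repeating the draw with different seeds gives a sense of how much of the scatter in Figure 7 comes from the random assignment of reference SEDs.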


Figure 7. Comparison of fLUV (the clump-to-galaxy flux ratio in the near-UV) and fLu (the same in the SDSS u band) in the Guo+2018 sample and the Clump Scout sample. The y-axis traces fLu, while the x-axis traces fLUV. Red points are used for Clump Scout clumps, while the gray contour lines are drawn to visualize their scatter. For Guo+2018 clumps, fLUV and fLu have a strong linear correlation (Spearman rank correlation rs ≈ 0.84). We take the near-UV fluxes for a sample of Clump Scout galaxies from the GALEX survey. While we cannot measure the near-UV fluxes of Clump Scout clumps directly, we estimate them by assuming that local clumps have similar SEDs to clumps in the Guo+2018 sample and multiplying each clump's u-band flux by the UV-to-u ratio for a randomly selected Guo+2018 clump. The resulting distributions of fLUV and fLu fall nearly along a 1-to-1 line for both Clump Scout and Guo+2018. fLu is strongly correlated with fLUV, though there is a large degree of scatter in the relation.


Based on these experiments, we conclude that fLu is the best available analog of fLUV for SDSS data. We choose not to apply a conversion factor to convert fLu values to fLUV, since no statistically large sample of UV-measured local clumps exists that could verify such a conversion factor. Further, our results in this section suggest that fLu- and fLUV-defined values of fclumpy are closely related and can be meaningfully compared, with the caveat that u-band-estimated values of fclumpy are likely to be slightly smaller than UV-estimated values.

4.2. Calculating fclumpy

The calculation of fclumpy from the Clump Scout catalog has several steps. We calculate fclumpy in the full sample as well as in three distinct bins of galaxy mass; the selection cuts and galaxy counts for each mass bin are detailed in Table 2. Here, we detail the steps for calculating fclumpy in the broadest mass bin ($M > 10^{9}\,M_{\odot}$), though the same process applies to every bin.

Table 2. The Cuts Applied on the Galaxy Sample Used for Calculating fclumpy

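| Sample | $z_{\max}$ | Regular | Extra (Examined) | Total |
| --- | --- | --- | --- | --- |
| Parent sample | | 53,613 | 168,603 (4,937) | 222,216 |
| With sSFR > 0.1 Gyr$^{-1}$ | | 12,671 | 20,969 (1,015) | 33,640 |
| With b/a ratio > 0.3 | | 12,142 | 19,858 (954) | 32,000 |
| All masses ($10^{9} < M/M_{\odot}$) | 0.035 | 2640 | 3203 (230) | 5843 |
| Low-mass galaxies ($10^{9} < M/M_{\odot} < 10^{9.8}$) | 0.035 | 2042 | 2898 (118) | 4940 |
| Medium-mass galaxies ($10^{9.8} < M/M_{\odot} < 10^{10.6}$) | 0.05 | 1741 | 1444 (149) | 3185 |
| High-mass galaxies ($10^{10.6} < M/M_{\odot} < 10^{11.4}$) | 0.09 | 601 | 720 (203) | 1321 |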

Note. "Regular" galaxies were fully examined by Clump Scout volunteers, while "extra" galaxies (with ffeatured ≤ 0.5) were only partially examined (the number of examined galaxies is given in parentheses); the "total" column sums the regular and extra columns. Row 1 presents the parent sample of galaxies from Table 1. (The number of galaxies here is slightly smaller than that in Table 1, as galaxies with ffeatured > 0.5 and fmerger ≥ 0.5 were not examined and are left out of the total.) Rows 2 and 3 enumerate the sample removing quiescent galaxies and edge-on galaxies, respectively. The final four rows detail the four mass-binned samples used for calculating fclumpy.


First, we isolate a star-forming, mass-complete sample of galaxies. SDSS is complete for galaxies down to $10^{9}\,M_{\odot}$ at redshifts z < 0.035. Therefore, beginning with the Clump Scout parent sample defined in Section 2.2, we limit the sample to galaxies with sSFR > 0.1 Gyr$^{-1}$, $M > 10^{9}\,M_{\odot}$, and z < 0.035. In addition, because clumps may be more difficult to detect in edge-on galaxies than in face-on galaxies, we remove all galaxies for which the ratio of the galaxy's minor to major axis (b/a) is less than 0.3, where this axis ratio is estimated by the SDSS exponential fit in the r band (expAB_r in the PhotoPrimary table). Our sample contains Ntot = 5843 galaxies passing these cuts.
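These cuts amount to a single boolean mask. A minimal NumPy sketch, assuming the catalog quantities are already loaded as arrays; the function name and defaults are illustrative, and the upper mass bound and redshift limit can be changed to reproduce the mass-binned samples of Table 2.

```python
import numpy as np


def select_fclumpy_sample(ssfr, log_mass, z, axis_ratio,
                          log_mass_min=9.0, log_mass_max=np.inf, z_max=0.035):
    """Boolean mask for star-forming, mass-complete, not-edge-on galaxies.

    ssfr       : specific star formation rate in Gyr^-1
    log_mass   : log10 stellar mass in solar masses
    z          : spectroscopic redshift
    axis_ratio : minor-to-major axis ratio b/a (e.g., SDSS expAB_r)
    """
    ssfr, log_mass, z, axis_ratio = (np.asarray(a, dtype=float)
                                     for a in (ssfr, log_mass, z, axis_ratio))
    return ((ssfr > 0.1)
            & (log_mass > log_mass_min) & (log_mass < log_mass_max)
            & (z < z_max)
            & (axis_ratio > 0.3))
```

For example, the medium-mass sample would use log_mass_min=9.8, log_mass_max=10.6, and z_max=0.05.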

We then combine the contributions from the "regular" Clump Scout sample of 53,613 galaxies and the "extra" sample of 4937 galaxies used to extrapolate over all galaxies in SDSS that Clump Scout did not directly examine. In total, Nreg = 2640 of 5843 galaxies passing all cuts for fclumpy were examined in the regular sample.

Of the remaining 3203 galaxies, a sample of 230 were examined by volunteers as part of the "extra" sample. We use Nextra,samp to refer to the size of the sample of these galaxies that volunteers examined directly, i.e., Nextra,samp = 230, and Nextra,tot to refer to the size of the total population from which these galaxies were drawn, i.e., Nextra,tot = 3203.

For each group, the observed clumpy fraction fclumpy,obs is calculated and the completeness correction from Section 4.3 is applied. The corrected fraction ${f}_{\mathrm{clumpy},\mathrm{extra}}^{\mathrm{corr}}$ over the "extra" sample is then extrapolated over all SDSS galaxies not examined by Clump Scout. This yields the total clumpy fraction:

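$$f_{\mathrm{clumpy}}=\frac{N_{\mathrm{reg}}\,f_{\mathrm{clumpy,reg}}+N_{\mathrm{extra,tot}}\,f_{\mathrm{clumpy,extra}}}{N_{\mathrm{tot}}},\qquad N_{\mathrm{tot}}=N_{\mathrm{reg}}+N_{\mathrm{extra,tot}}.\qquad(2)$$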

Here fclumpy,reg and fclumpy,extra are the fraction of galaxies out of Nreg and Nextra,samp, respectively, that were estimated to be clumpy.

The sampling error is estimated separately on the "regular" and "extra" samples using the standard error formula on a proportion, taking the sample size N to be the number of examined galaxies in the "regular" or "extra" group:

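$$\sigma_{\mathrm{reg}}=\sqrt{\frac{f_{\mathrm{clumpy,reg}}\,(1-f_{\mathrm{clumpy,reg}})}{N_{\mathrm{reg}}}},\qquad\sigma_{\mathrm{extra}}=\sqrt{\frac{f_{\mathrm{clumpy,extra}}\,(1-f_{\mathrm{clumpy,extra}})}{N_{\mathrm{extra,samp}}}}.\qquad(3)$$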

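We then scale the sampling errors of the two samples, σreg and σextra, by their contributions to the total value of fclumpy and add them in quadrature to obtain

$$\sigma_{\mathrm{clumpy}}=\sqrt{\left(\frac{N_{\mathrm{reg}}}{N_{\mathrm{tot}}}\,\sigma_{\mathrm{reg}}\right)^{2}+\left(\frac{N_{\mathrm{extra,tot}}}{N_{\mathrm{tot}}}\,\sigma_{\mathrm{extra}}\right)^{2}}.$$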
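The weighted combination of the two samples, their standard errors on a proportion, and the quadrature sum fit into one helper. This is an illustrative sketch: the function and argument names are hypothetical, and it assumes the input fractions have already been completeness-corrected.

```python
import math


def combine_fractions(f_reg, n_reg, f_extra, n_extra_samp, n_extra_tot):
    """Total clumpy fraction and its sampling error from the two samples.

    f_reg, f_extra : completeness-corrected clumpy fractions of each sample
    n_reg          : galaxies examined in the regular sample
    n_extra_samp   : extra-sample galaxies actually examined by volunteers
    n_extra_tot    : size of the extra population the fraction is extrapolated to
    """
    n_tot = n_reg + n_extra_tot
    # Weighted combination: the extra fraction stands in for all unexamined galaxies
    f_tot = (n_reg * f_reg + n_extra_tot * f_extra) / n_tot
    # Standard error on a proportion, using the number of examined galaxies
    sigma_reg = math.sqrt(f_reg * (1 - f_reg) / n_reg)
    sigma_extra = math.sqrt(f_extra * (1 - f_extra) / n_extra_samp)
    # Scale each error by its sample's weight and add in quadrature
    sigma_tot = math.hypot(n_reg / n_tot * sigma_reg,
                           n_extra_tot / n_tot * sigma_extra)
    return f_tot, sigma_tot
```

Because the extra sample is small relative to the population it represents, its error term is scaled by a large weight, which is why the sampling error dominates the total error budget.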

To estimate the total error on fclumpy, we use a Monte Carlo method to include contributions from the uncertainty on clump fluxes and clump incompleteness, as well as from sampling error. Over 100 trials, clump fluxes are allowed to randomly vary within a normal distribution defined by their estimated error values (see Section 3), while the clump completeness map is recalculated on each trial by the method described in Appendix B. On each Monte Carlo trial, we include sampling error by calculating an initial value of fclumpy and then reassigning it to a random value selected from N(fclumpy, σclumpy). Of these sources of error, sampling error is by far the most significant: in a trial run where error contributions from clump flux error and the completeness correction were ignored, the error bars were typically within 10% of their original values.
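The Monte Carlo loop can be sketched as below. Here `estimate_fclumpy` stands for the whole pipeline (recomputing the completeness map and applying the completeness correction) and is a hypothetical callable; reporting the 16th, 50th, and 84th percentiles as the central value and asymmetric errors is likewise an assumption of the sketch.

```python
import numpy as np


def monte_carlo_fclumpy(estimate_fclumpy, fluxes, flux_errs, sigma_sampling,
                        n_trials=100, seed=0):
    """Median and asymmetric 68% errors on f_clumpy from Monte Carlo trials."""
    rng = np.random.default_rng(seed)
    fluxes = np.asarray(fluxes, dtype=float)
    flux_errs = np.asarray(flux_errs, dtype=float)
    draws = np.empty(n_trials)
    for k in range(n_trials):
        # Perturb clump fluxes within their (Gaussian) measurement errors
        perturbed = rng.normal(fluxes, flux_errs)
        f_trial = estimate_fclumpy(perturbed)
        # Add sampling error: redraw from N(f_trial, sigma_sampling)
        draws[k] = rng.normal(f_trial, sigma_sampling)
    lo, med, hi = np.percentile(draws, [16, 50, 84])
    return med, med - lo, hi - med
```

Setting `sigma_sampling` to zero isolates the contribution of flux errors and the completeness map, which mirrors the test run described in the paragraph above.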

4.3. Correcting fclumpy for Incompleteness

Our estimate of fclumpy must also take into account the incompleteness of our clump catalog. We therefore use the following method for completeness-correcting the clumpy fraction of galaxies.

The end goal of the fclumpy completeness correction is to calculate PFN, the probability that a galaxy is a "false negative" for clumps—in other words, the probability that the galaxy contains one or more clumps, but that none of its clumps were detected. To calculate PFN, we work in steps from the completeness estimates on individual clumps in the sample. We define Prec,i to be the recovery fraction of clump i. (Refer to Appendix B for a full overview of how Prec,i is determined for each clump.)

We begin by calculating the recovery probability of a randomly selected clump within the sample, Prec:

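$$P_{\mathrm{rec}}=\frac{N_{\mathrm{clumps}}}{\sum_{i=1}^{N_{\mathrm{clumps}}}1/P_{\mathrm{rec},i}},\qquad(4)$$

where $N_{\mathrm{clumps}}$ is the number of detected clumps in the sample and, for each clump i, 1/Prec,i approximates the true number of clumps with similar properties.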
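Following the description above, Prec is the harmonic mean of the per-clump recovery fractions. A one-function NumPy sketch (the name is hypothetical), assuming every Prec,i is strictly positive:

```python
import numpy as np


def sample_recovery_probability(p_rec_i):
    """Detected clump count divided by the estimated true clump count.

    Each detected clump i stands in for 1/P_rec,i clumps with similar
    properties, so the result is the harmonic mean of the P_rec,i values.
    """
    p = np.asarray(p_rec_i, dtype=float)
    return p.size / np.sum(1.0 / p)
```

The harmonic mean weights poorly recovered clumps heavily, as it should: a clump with Prec,i = 0.25 implies roughly three undetected counterparts.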

Next, we estimate the true distribution of clumps per clumpy galaxy, Pcount(nc). To do so, we begin with a proposed distribution Pcount(nc) and simulate 10,000 clumpy galaxies with this distribution. We then remove clumps at random, retaining each with probability Prec (i.e., removing a fraction 1 − Prec of them), to simulate the observed distribution, discarding galaxies with no observed clumps. The proposed distribution is adjusted until the mean number of clumps per galaxy in the simulated distribution closely matches the observed mean. For mathematical expediency, we used an exponential distribution with coefficient λ to model Pcount(nc), i.e., Pcount(nc) ∝ ${e}^{-\lambda {n}_{c}}$. (It should be noted, though, that the observed distributions are also well fit by Poisson distributions, and the exponential model does not necessarily have physical significance.) In Figure 8, we plot the observed distributions of fLu > 3% clumps per galaxy for each galaxy mass bin we examine, along with the best-fit exponential model.
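The forward-modeling step can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: each clump is taken to be recovered independently with probability Prec, the exponential Pcount is truncated at a hypothetical n_max, and λ is found by a simple grid search rather than whatever optimizer was actually used.

```python
import numpy as np


def simulate_observed_mean(lam, p_rec, n_gal=10_000, n_max=50, seed=0):
    """Mean observed clumps per galaxy for P_count(n) ∝ exp(-lam * n), n >= 1."""
    rng = np.random.default_rng(seed)
    n = np.arange(1, n_max + 1)
    pmf = np.exp(-lam * n)
    pmf /= pmf.sum()
    true_counts = rng.choice(n, size=n_gal, p=pmf)
    # Each clump is recovered independently with probability p_rec
    observed = rng.binomial(true_counts, p_rec)
    # Galaxies with no detected clumps never enter the observed sample
    observed = observed[observed > 0]
    return float(observed.mean())


def fit_lambda(observed_mean, p_rec, grid=np.linspace(0.1, 3.0, 291)):
    """Grid search for the lam whose simulated mean best matches the observed mean."""
    means = np.array([simulate_observed_mean(lam, p_rec) for lam in grid])
    return float(grid[np.argmin(np.abs(means - observed_mean))])
```

Reusing the same seed for every grid point (common random numbers) keeps the simulated mean a smooth function of λ, which makes the grid search stable.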


Figure 8. The distribution of clumps per galaxy in each of our four mass bins (the bins are described more fully in Table 4). Dashed lines represent the best-fit exponential model to the data of the form $N_{G}=A\,e^{-\lambda n_{c}}$ for galaxy count NG and clumps per galaxy nc. Error bars on this fit were estimated with Markov Chain Monte Carlo sampling, and the shaded region around each line represents the 68% confidence interval for the fit. The best-fit λ value (with 68% confidence interval) is also provided for each fit.


Given Prec and Pcount(nc ), PFN is given by

$$P_{\mathrm{FN}}=\sum_{n_c=1}^{\infty}P_{\mathrm{count}}(n_c)\,(1-P_{\mathrm{rec}})^{n_c}.\qquad(5)$$

The PFN sum is dominated by the first few terms. For fLu > 8% clumps in our broadest galaxy mass bin ($M > 10^{9}\,M_{\odot}$ with z < 0.035), we estimate Prec ≈ 49.3% and λ ≈ 0.85; given these values, the first three terms of the PFN sum account for approximately 78%, 92%, and 99% of missed clumpy galaxies, respectively.

Given the galaxy false-negative probability PFN, the completeness of fclumpy is given by (1 – PFN). It is then straightforward to correct the clumpy fraction:

$$f_{\mathrm{clumpy}}^{\mathrm{corr}}=\frac{f_{\mathrm{clumpy,obs}}}{1-P_{\mathrm{FN}}}.\qquad(6)$$

The corrected fraction $f_{\mathrm{clumpy}}^{\mathrm{corr}}$ should be taken as our estimate of the "true" value, as it accounts for galaxies whose clumps were not detected; however, we present both the observed and corrected values in our results.
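Under the exponential model, the false-negative probability and the corrected fraction reduce to a few lines. A hedged NumPy sketch (names hypothetical) that truncates the infinite sum at n_max, which is harmless because the terms fall off geometrically. For this model, the sum also has the closed form (1 − q)(1 − Prec)/[1 − q(1 − Prec)] with q = e^{−λ}, which the truncated sum should reproduce.

```python
import numpy as np


def false_negative_probability(lam, p_rec, n_max=200):
    """P_FN for P_count(n) proportional to exp(-lam * n), n >= 1.

    Returns P_FN and the individual terms of the (truncated) sum.
    """
    n = np.arange(1, n_max + 1)
    p_count = np.exp(-lam * n)
    p_count /= p_count.sum()                  # normalize over n = 1..n_max
    terms = p_count * (1.0 - p_rec) ** n      # P(n clumps) * P(none detected)
    return terms.sum(), terms


def corrected_fraction(f_obs, p_fn):
    """Divide the observed clumpy fraction by the completeness 1 - P_FN."""
    return f_obs / (1.0 - p_fn)


# Cumulative share of missed clumpy galaxies carried by the first few terms
p_fn, terms = false_negative_probability(lam=0.85, p_rec=0.493)
shares = np.cumsum(terms)[:3] / p_fn
```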

4.4. Results

Here we present our results for fclumpy,8% and fclumpy,3% (the clumpy fraction using the thresholds fLu > 8% and fLu > 3%, respectively), both overall and within several different galaxy mass bins. All of these numbers are collected in Table 4.

Table 3. Clump Scout Catalog Description

| Column | Name | Note | Reference |
| --- | --- | --- | --- |
| 1 | ID | SDSS DR8 object ID | SDSS |
| 2 | Clump index | One-indexed | |
| 3 | Galaxy sample | "regular" or "extra" | |
| | Galaxy properties: | | |
| 4 | Redshift | | SDSS |
| 5–6 | Galaxy R.A., decl. | J2000, deg | SDSS |
| 7 | Image r-band PSF-FWHM | arcsec | SDSS |
| 8–9 | Galaxy u-band flux and error | μJy | SDSS |
| 10–11 | Galaxy g-band flux and error | μJy | SDSS |
| 12–13 | Galaxy r-band flux and error | μJy | SDSS |
| 14–15 | Galaxy i-band flux and error | μJy | SDSS |
| 16–17 | Galaxy z-band flux and error | μJy | SDSS |
| 18–19 | Central bulge R.A., decl. | J2000, deg | Section 2.4 |
| 20 | Galaxy r-band $r_{\mathrm{eff}}$ | arcsec | SDSS |
| 21 | Galaxy log($M_{*}$) | log($M_{\odot}$) | MPA-JHU |
| 22 | Galaxy log(SFR) | log($M_{\odot}$ Gyr$^{-1}$) | MPA-JHU |
| 23 | Galaxy log(sSFR) | log(Gyr$^{-1}$) | MPA-JHU |
| | Clump properties: | | |
| 24–25 | Clump R.A., decl. | J2000, deg | Section 2.4 |
| 26 | Clump offset | arcsec | Section 2.4 |
| 27–28 | Clump u-band flux and error | μJy | Section 3 |
| 29–30 | Clump g-band flux and error | μJy | Section 3 |
| 31–32 | Clump r-band flux and error | μJy | Section 3 |
| 33–34 | Clump i-band flux and error | μJy | Section 3 |
| 35–36 | Clump z-band flux and error | μJy | Section 3 |
| 37–38 | Background u-band flux and error | μJy arcsec$^{-2}$ | Section 3 |
| 39–40 | Background g-band flux and error | μJy arcsec$^{-2}$ | Section 3 |
| 41–42 | Background r-band flux and error | μJy arcsec$^{-2}$ | Section 3 |
| 43–44 | Background i-band flux and error | μJy arcsec$^{-2}$ | Section 3 |
| 45–46 | Background z-band flux and error | μJy arcsec$^{-2}$ | Section 3 |
| 47 | Est. clump/galaxy near-UV flux ratio | | Section 4.1 |
| 48 | Volunteer unusual fraction | | Section 2.4 |
| 49 | Unusual flag | | Section 3.1 |
| 50 | Completeness estimate | | Section 3 |

Note. References abbreviated as "SDSS" refer to the SDSS DR15 catalog (York et al. 2000), while those abbreviated as "MPA-JHU" refer to the value-added catalog released with SDSS, based on work by Kauffmann et al. (2003) and Brinchmann et al. (2004).

Only a portion of this table is shown here to demonstrate its form and content. A machine-readable version of the full table is available.


Within the regular Clump Scout sample (Nreg = 2640), we detected 104 galaxies with clumps passing fLu > 8%; corrected for incompleteness, we estimate the true number to be ∼156. Within the extra sample (Nextra,samp = 230), we observed just two galaxies with clumps passing fLu > 8%, and we estimate the true number to be ∼3 after correcting for incompleteness. Extrapolating the extra-sample fraction over all Nextra,tot = 3203 galaxies not examined by Clump Scout, we estimate that in total 188 of Ntot = 5843 galaxies have clumps (corrected for incompleteness), leading to an estimate ${f}_{\mathrm{clumpy},8 \% }^{\mathrm{corr}}={3.22}_{-0.34}^{+0.38} \% $.

We apply the same procedure to galaxies using the fLu > 3% cut and observe 410 clumpy galaxies in the regular sample (∼641 corrected for incompleteness) and 5 clumpy galaxies in the extra sample (∼9 corrected for incompleteness). This yields a total estimate ${f}_{\mathrm{clumpy},3 \% }^{\mathrm{corr}}={12.68}_{-0.88}^{+1.38} \% $.

To characterize the local distribution of clumps, we also examine fclumpy in three different bins of galaxy mass: $9\lt {\mathrm{log}}_{10}(M/{M}_{\odot })\lt 9.8$, $9.8\lt {\mathrm{log}}_{10}(M/{M}_{\odot })\lt 10.6$, and $10.6\lt {\mathrm{log}}_{10}(M/{M}_{\odot })\lt 11.4$; we refer to the galaxies in these bins as low-mass, medium-mass, and high-mass galaxies, respectively. These match the mass bins over which Guo et al. (2015) estimated the clumpy fraction. To obtain a complete galaxy sample, different redshift limits were applied to each bin: z < 0.035 for low-mass galaxies, z < 0.05 for medium-mass galaxies, and z < 0.09 for high-mass galaxies (see Figure 9).


Figure 9. Determining survey completeness limits for our mass bins. Here we plot log mass vs. redshift for all galaxies examined by Galaxy Zoo 2 with spectroscopic redshifts (i.e., the parent sample of Clump Scout). Dashed horizontal lines demarcate the three primary mass bins used in our analysis. The green shaded regions represent the relative surface density of galaxies, while the red line traces the same with some added smoothing; assuming no significant cosmological evolution over this redshift range, the surface density should remain constant if there is no loss due to survey incompleteness. We draw a vertical dashed black line at the redshift where the surface density of galaxies first falls below 60% of maximum, and we use this redshift to approximate the limit of a "mass-complete sample" in each mass bin. Redshift limits are therefore drawn at 0.035 for the lowest-mass bin, 0.05 for the intermediate bin, and 0.09 for the highest-mass bin.
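The redshift-limit criterion in this caption can be sketched as below. The sketch assumes the relative density per redshift bin has already been computed (for example, counts divided by comoving shell volume) and smoothed; starting the search at the density peak is an assumption about what "first falls below" means, and the function name is hypothetical.

```python
import numpy as np


def completeness_limit(z_centers, density, frac=0.6):
    """First redshift past the peak where density drops below frac * maximum."""
    z_centers = np.asarray(z_centers, dtype=float)
    density = np.asarray(density, dtype=float)
    peak = int(np.argmax(density))
    below = np.nonzero(density[peak:] < frac * density[peak])[0]
    if below.size == 0:
        return None  # density never falls below the threshold in this range
    return z_centers[peak + below[0]]
```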


Following the same procedure as for the overall clumpy fraction, we estimate that ${f}_{\mathrm{clumpy},8 \% }^{\mathrm{corr}}$ is ${3.12}_{-0.26}^{+0.26} \% $ for low-mass galaxies, ${2.48}_{-0.83}^{+0.50} \% $ for medium-mass galaxies, and ${2.57}_{-0.47}^{+0.56} \% $ for high-mass galaxies. A more complete list of statistics can be found in Table 4; they are also plotted in Figure 10. Example images of galaxies in each of the mass bins are shown in Figure 11.


Figure 10. The value of fclumpy in three galaxy mass bins, under the criteria fLu > 8% (left) and fLu > 3% (right). The hatched bar represents the observed value, while the solid bar represents the completeness-corrected estimate.


Figure 11. A sample of galaxies, each containing at least one clump with fLu > 8%. Ten galaxies are presented from each of the low-, medium-, and high-mass bins; the SDSS specobjid of each galaxy is provided at the top of each image, while the R.A. and decl. values are provided in parentheses at the bottom. Each identified clump is circled in yellow, while the central bulge locations identified by volunteers are identified by red crosses. Dotted circles indicate clumps with fLu < 3%, dashed circles indicate those with 3% < fLu < 8%, and solid circles those with fLu > 8%.


4.5. Comparisons to Other Studies

To place our estimates of fclumpy,8% at z < 0.1 in context, we compare them with high-redshift (z > 0.5) results from other works, in particular Shibuya et al. (2016) for galaxies of all masses ($M > 10^{9}\,M_{\odot}$) and Guo et al. (2015) for galaxies in bins of low, medium, and high mass (matching the mass bins used in this paper). We plot these values in Figure 12.


Figure 12. The value of fclumpy,8% vs. redshift, as estimated by our study at z ∼ 0 and others at z > 0.5. Top: fclumpy for galaxies of all masses ($M > 10^{9}\,M_{\odot}$). Our result is in line with the model by Shibuya et al. (2016) that was originally fit to their high-redshift fclumpy results. Bottom: fclumpy divided into mass bins. The mass bins and methods used in this work match closely with those used by Guo et al. (2015) to estimate fclumpy at z > 0.5, but our clumpy fractions at z ∼ 0 are significantly lower than those at higher redshift. For comparison, we have plotted estimates of the minor merger fraction and observations of the "turbulent fraction" of galaxies between 0 < z < 1.5. The minor merger fraction is modeled by Lotz et al. (2011) and plotted here with observability timescales of 0.5, 1.25, and 2 Gyr (with gray error regions containing the best-fit range). The turbulent fraction comes from kinematic observations by Kassin et al. (2012) over three galaxy mass bins. We find that the turbulent fraction qualitatively matches the patterns observed in fclumpy, i.e., that it declines significantly over 0 < z < 1.5 and that high-mass galaxies begin this decline the soonest. By comparison, we find that the minor merger fraction remains approximately constant over the same time period and is a poor tracer of fclumpy for any mass bin when including our results at z ∼ 0.


Shibuya et al. (2016) found that the clumpy fraction peaks between redshifts 1 and 2 at a value of >50%, before declining over z < 1. To model this trend, they use a fit function with the same form as is commonly used to model the trend in the cosmic SFR density with redshift (Madau et al. 1996; Lilly et al. 1996). To compare with their results, we take their best-fit model for z versus fclumpy and extend it beyond their data to z ∼ 0; this is plotted in Figure 12. Their model predicts a value of fclumpy,8% ∼ 4% at z ∼ 0, which is in line with our result of fclumpy,8% ≈ 3.22%.

Guo et al. (2015) observed a nearly constant fclumpy,8% ∼ 60% for low-mass galaxies over 0.5 < z < 3, while our results indicate a value of ∼3% at z ∼ 0 for galaxies of the same mass. As explained in Section 4.1, these results can be meaningfully compared because the fLu > 8% cut used in this work is similar to the fLUV > 8% cut used by Guo et al. (2015). We note that the differences between our z < 0.1 results for fclumpy and the z > 0.5 results from Guo et al. (2015) are an order of magnitude or more, which is much larger than the difference we expect between fclumpy computed with a UV versus u-band clump definition (as discussed in Section 4.1). These two works also probe a similar physical resolution: the physical resolution of CANDELS images is ∼1 kpc at all redshifts, compared to ∼0.5–1.7 kpc over the range 0.02 < z < 0.09 for SDSS. We therefore expect that this drop in the clumpy fraction between 0 < z < 0.5 is real and unlikely to be an artifact of different identification methods.

There are few other studies that estimate fclumpy in the local universe. Murata et al. (2014) studied clumpy galaxies selected from HST/ACS F814W imaging from the COSMOS field spanning redshifts 0.2 < z < 1. The F814W filter approximately corresponds to the rest-frame SDSS r and g bands over this redshift range, and the nearest galaxies in this sample (z ∼ 0.2) are likely similar to Clump Scout galaxies (z < 0.1). The fraction of optically bright galaxies with multiple star-forming clumps was found to decrease from 0.35 at z ∼ 1 to 0.05 at z ∼ 0.2. While this decrease is qualitatively in line with our results, the definition of fclumpy used by Murata et al. (2014) is significantly different from that used here: rather than apply a relative flux criterion, clumpy galaxies were selected to have multiple star-forming clumps of comparable brightness. The low completeness of our sample prevents us from applying the Murata et al. (2014) condition, as it requires the detection of at least three clumps per clumpy galaxy. Overzier et al. (2009) also examined clumpy galaxies at z < 0.3, but only in a sample of 30 "Lyman break analog" galaxies with extremely high UV fluxes; the clumpy fraction obtained from this sample is not comparable to that of our broader sample.

5. Discussion

5.1. Interpretation of fclumpy

Given this paper's focus on the fraction of clumpy galaxies, it is worth discussing exactly how this quantity is defined and how it should be used. The clumpy fraction is a particularly good probe of trends in clumpiness across cosmic time: by controlling for galaxy mass and SFR, we ensure that fclumpy is computed between groups of similar galaxies even at different redshifts. However, using a relative flux criterion (e.g., fLUV > 8%) to define clumps may select for very different sets of physical objects depending on galaxy mass. Naively, assuming a linear relation between mass and u-band luminosity in our galaxy sample, our lowest galaxy mass bin ($10^{9}$–$10^{9.8}\,M_{\odot}$) includes clumps that are ∼1.6 dex less massive than those in our highest-mass bin ($10^{10.6}$–$10^{11.4}\,M_{\odot}$). It is therefore not straightforward to compare fclumpy between bins of different galaxy mass. The validity of this comparison depends on the clump luminosity function: for example, if the clump luminosity function has an exponential cutoff (e.g., as proposed by Livermore et al. 2012), the relation between fclumpy and galaxy mass would depend on how the location of this cutoff varies with galaxy mass. Therefore, while fclumpy can be compared for galaxies of similar mass across different redshifts, it is not straightforward to compare fclumpy between galaxies of different mass. The remainder of this discussion focuses on trends in fclumpy with redshift for this reason.

5.2. Physical Implications of fclumpy

A major motivation for determining fclumpy over large redshift ranges is to distinguish between different proposed modes of clump formation. There are two primary modes by which clumps are thought to form. In the in situ mode, clumps form through gas collapse within the host galaxy driven by turbulent disk dynamics (i.e., violent disk instability, VDI). VDI is expected in galaxies that are actively accreting gas via "cold-mode" accretion, in which gas flows into the galaxy via smooth, cold streams. This accretion feeds the disk with gas and kinetic energy, and the high gas surface density it maintains can drive the Toomre parameter below unity, making the gas unstable to collapse (Dekel et al. 2009). Alternatively, in the ex situ mode of formation, clumps originate in minor mergers: the clump forms as a satellite galaxy with its own dark matter component and only later merges with its host. Note that clumps are short-lived structures on a cosmological scale: simulations find that massive clumps in disk galaxies have a maximum lifetime of ≲500 Myr, by which time they have been slowed by dynamical friction and have merged with their host's central bulge (Bournaud et al. 2014; Mandelker et al. 2014). Therefore, the presence of clumps indicates that the clump formation process is ongoing or recent, and fclumpy can effectively act as a tracer of galaxy behavior. It remains unclear which formation process dominates, as different processes may dominate in different galaxy populations.
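For reference, the local stability criterion invoked here is the Toomre parameter. For a thin gas disk it is conventionally written as follows (stated as textbook background, not as the specific formulation used by Dekel et al. 2009):

$$Q_{\mathrm{gas}}=\frac{\sigma_{\mathrm{gas}}\,\kappa}{\pi G\,\Sigma_{\mathrm{gas}}},$$

where σgas is the gas velocity dispersion, κ the epicyclic frequency, and Σgas the gas surface density. Regions with Q < 1 are unstable to axisymmetric gravitational collapse, so a high gas surface density pushes Q down, while a large velocity dispersion pushes it up.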

To determine the primary formation process of clumps (i.e., via in situ or ex situ formation), we can examine trends in the rate of VDI and the minor merger rate over cosmic time and compare these with trends in fclumpy. In Figure 12, we have plotted our estimate of fclumpy,8%, along with comparable estimates at higher redshift (Guo et al. 2015; Shibuya et al. 2016) and estimates of the fraction of galaxies experiencing VDI and with observable signatures of minor mergers.

We use the minor merger rate estimate from Lotz et al. (2011), which was obtained by subtracting the number of galaxies with close pairs (major mergers) from the number with disturbed, uneven morphologies (major and minor mergers). The best-fit model to the minor merger fraction takes the form $f_{\mathrm{merg,minor}}\propto T_{\mathrm{obs}}(1+z)^{\alpha}$, with best-fit exponent α = −0.2 ± 0.2. The "observability timescale" Tobs refers to the time during which the host galaxy's morphology is measurably disturbed, which depends on the detection method (and is distinct from the lifetime of an ex situ clump formed during a merger). To represent the uncertainties in these parameters, we plot the best-fit model for Tobs values of 0.5, 1.25, and 2 Gyr over the fit range (0 < z < 1.5). In all cases, because the best-fit exponent α is negative or consistent with zero, the minor merger fraction rises toward low redshift or remains roughly constant over the full redshift range.
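Because the normalization and Tobs cancel in a ratio, the redshift dependence of this model can be checked directly. A minimal sketch, taking the best-fit exponent α = −0.2 quoted above as its default (the function name is hypothetical):

```python
import numpy as np


def relative_minor_merger_fraction(z, alpha=-0.2):
    """f_merg,minor(z) / f_merg,minor(0) for f ∝ T_obs (1 + z)^alpha.

    T_obs and the overall normalization cancel in the ratio.
    """
    return (1.0 + np.asarray(z, dtype=float)) ** alpha
```

With α = −0.2, the model fraction at z = 1.5 is 2.5^−0.2 ≈ 0.83 of its z = 0 value, a change of less than 20% across the full fit range, in contrast to the order-of-magnitude drop in fclumpy.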

To determine the fraction of galaxies experiencing turbulence, we use measurements of galaxy kinematics from Kassin et al. (2012). These measurements reveal that galaxies spanning a wide range of masses ($10^{8}$–$10^{10.7}\,M_{\odot}$) tend to "settle" and become rotationally dominated over the period 0 < z < 1.2. Moreover, they find that the highest-mass galaxies have the lowest fraction of turbulence at any epoch. The "turbulent fraction," defined as the fraction of galaxies for which Vcirc/σgas < 3 (i.e., the fraction of galaxies experiencing VDI), is plotted in Figure 12 for three of the mass bins examined by Kassin et al. (2012), spanning $10^{9}$–$10^{10.7}\,M_{\odot}$. All turbulent fractions decline over 0 < z < 1.2, with higher-mass galaxies declining more quickly.

We then turn to trends in fclumpy,8% and compare them to the trends in the two clump formation mechanisms (VDI and minor mergers) described above. Ignoring our low-redshift data for a moment, the data from Guo et al. (2015) suggested that two different clump formation mechanisms may be dominant in high-mass galaxies and low-mass galaxies. The clumpy fraction for high-mass galaxies declines significantly with time over the span 0.5 < z < 3 from ∼55% to ∼15%, while for low-mass galaxies it remains constant at fclumpy ∼ 60% over the same time span. To explain this difference, it was suggested that the primary formation mechanism for clumps in high-mass galaxies may be VDI (in situ) and trace the turbulent fraction over this time span, while those in low-mass galaxies form owing to minor mergers (ex situ) that are roughly stable over the same time span.

However, our low-redshift estimates of fclumpy challenge this two-mechanism formation model. For galaxies of all masses, we now observe a significant decline in fclumpy,8% to <5% at z ∼ 0. Even assuming that our z ∼ 0 estimates of fclumpy,8% are too small by a factor of several, the observed fraction would still be far lower at z ∼ 0 than at z > 0.5 in every mass bin. This result matches the conventional wisdom about clumps, i.e., that giant star-forming clumps are common at high redshift and rare locally. However, the roughly constant minor merger rate over the period 0 < z < 0.5 is inconsistent with the significant low-redshift decline that we observe.

Instead, we suggest that in situ, VDI-driven formation is the primary mode of clump formation in galaxies of all masses, at least over the redshift range 0 < z < 1.5. The trends in galaxy turbulence over this span closely match the trends in fclumpy: galaxies of all masses show evidence of a decline in turbulence, with low-mass galaxies remaining turbulent the longest. The VDI-driven model also offers a natural explanation for the decline in fclumpy: the decline in the cosmological rate of gas accretion onto galaxies. As the supply of intergalactic gas decreases, so do the rates of star formation, turbulent dynamics, and clump formation (Dekel et al. 2009).

Adding to this picture, smaller case studies have already provided limited evidence directly linking VDI to clumpiness. Kinematic studies of high-redshift clumpy galaxies find that they are turbulent (Elmegreen et al. 2009; Genzel et al. 2011), and Genzel et al. (2011) in particular note that clumps appear in regions of the galaxy where the Toomre instability parameter is below unity. However, these studies examined spirals with stellar masses $\gtrsim 10^{10.6}\,M_\odot$, corresponding to the highest-mass bin in our work; similar kinematic studies of lower-mass galaxies remain to be done. Overall, the current body of evidence points to a picture in which clump formation is dominated by in situ formation driven by turbulent disk dynamics, though more direct evidence is needed to confirm this.
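For reference, the simplest single-component (gas-only, thin-disk) form of the Toomre parameter is Q = σκ/(πGΣ), with Q < 1 indicating local gravitational instability. The sketch below evaluates it assuming a flat rotation curve, for which the epicyclic frequency is κ = √2 V/R. The disk parameters are hypothetical, and the multi-component and thickness corrections used in detailed kinematic studies are omitted.

```python
import math

G_PC = 4.30091e-3  # gravitational constant in pc (km/s)^2 / M_sun

def toomre_q_gas(sigma_kms, v_circ_kms, radius_pc, surface_density_msun_pc2):
    """Gas-only Toomre parameter Q = sigma * kappa / (pi * G * Sigma).

    Assumes a flat rotation curve, so kappa = sqrt(2) * V / R (km/s/pc here).
    """
    kappa = math.sqrt(2.0) * v_circ_kms / radius_pc
    return sigma_kms * kappa / (math.pi * G_PC * surface_density_msun_pc2)

# Hypothetical disk: V = 200 km/s at R = 5 kpc with sigma = 50 km/s
for surface_density in (100.0, 300.0):  # gas surface density in M_sun / pc^2
    q = toomre_q_gas(50.0, 200.0, 5000.0, surface_density)
    label = "unstable" if q < 1 else "stable"
    print(f"Sigma = {surface_density:.0f} M_sun/pc^2 -> Q = {q:.2f} ({label})")
# Sigma = 100 M_sun/pc^2 -> Q = 2.09 (stable)
# Sigma = 300 M_sun/pc^2 -> Q = 0.70 (unstable)
```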

6. Summary and Conclusions

In this work we present the largest catalog yet of local star-forming clumps (z ≲ 0.1), consisting of 10,738 clumps in 7050 galaxies. The clumps were identified via the citizen science project Galaxy Zoo: Clump Scout, which asked volunteers to identify star-forming clumps in a sample of 58,550 galaxies selected from the parent sample Galaxy Zoo 2. Consensus locations for these clumps are determined via an aggregation technique adapted from Branson et al. (2017). We estimate the completeness of our clump sample by comparing with a sample of simulated clumps identified via the same process.

The clump catalog generated by this work is versatile and can be used for many purposes. While this paper focused on estimating the clumpy fraction of galaxies, the catalog can also be used to answer other questions about clumps. In follow-up work, we intend to investigate the mass and age functions of these clumps using photometric SED fitting. These statistics will permit comparison between the properties of clumps at low and high redshift, and they can also be used to directly test theories of clump formation and evolution that make predictions about the mass or age distribution of clumps.

We define two measures of the clumpy fraction of galaxies, fclumpy,8% and fclumpy,3%: the fraction of galaxies with at least one clump emitting at least 8% or 3%, respectively, of the galaxy's u-band flux (fLu ≥ 8% and fLu ≥ 3%). We found fLu to be the closest analog to fLUV (the fraction of galaxy flux emitted in the near-UV) available in SDSS data. We present fclumpy,8% because it has been used in previous high-redshift studies of clumpy galaxies, and fclumpy,3% for comparison with future local studies. Both fractions are corrected for incompleteness. We find ${f}_{\mathrm{clumpy},8 \% }={3.22}_{-0.34}^{+0.38} \% $ and ${f}_{\mathrm{clumpy},3 \% }={12.68}_{-0.88}^{+1.38} \% $, with considerable variation across mass bins.
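The threshold definition can be written as a short sketch. It computes only the observed (uncorrected) fraction from hypothetical per-galaxy lists of clump u-band flux fractions; the incompleteness correction described in Section 4.3 is not reproduced, and the function name is ours.

```python
def clumpy_fraction(clump_flux_fractions, threshold):
    """Observed fraction of galaxies with at least one clump at or above `threshold`.

    clump_flux_fractions: one list per galaxy of clump u-band flux fractions (f_Lu);
    an empty list means no clumps were identified in that galaxy.
    """
    n_gal = len(clump_flux_fractions)
    n_clumpy = sum(1 for clumps in clump_flux_fractions
                   if any(f >= threshold for f in clumps))
    return n_clumpy / n_gal

# Hypothetical toy sample of five galaxies
galaxies = [[], [0.02], [0.05, 0.01], [0.09], []]
print(clumpy_fraction(galaxies, 0.08))  # 0.2 (one galaxy qualifies)
print(clumpy_fraction(galaxies, 0.03))  # 0.4 (two galaxies qualify)
```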

Our low value of fclumpy,8% is qualitatively consistent with the few other low-redshift surveys available (e.g., Murata et al. 2014). It is, however, much lower than the values of fclumpy,8% estimated at high redshift (Guo et al. 2015; Shibuya et al. 2016). We suggest that this steep decline in clumpy morphology is inconsistent with minor-merger-driven clump formation (as suggested by Guo et al. 2015 for low-mass galaxies), because the minor merger rate shows no similar change over this period (Lotz et al. 2011). Instead, we suggest that the turbulent fraction of galaxies is a better tracer of fclumpy,8%. Kassin et al. (2012) observed a decline in turbulence for galaxies of all masses ($10^{8}$–$10^{10.7}\,M_\odot$), and in particular noted that more massive galaxies settle quickly after z ∼ 1.2 while less massive galaxies remain turbulent for longer, mirroring the trends in fclumpy. Overall, the current body of evidence supports a picture in which clumps form primarily in situ owing to disk instability, though more observations are needed.

The data in this paper are the result of the efforts of the Galaxy Zoo volunteers, without whom none of this work would be possible. Their efforts are individually acknowledged at http://authors.galaxyzoo.org. We would like to thank the anonymous referee, whose suggestions were insightful and led to substantial improvements to this paper.

This research is partially supported by the National Science Foundation under grant AST 1716602.

This material is based on work supported by the National Aeronautics and Space Administration (NASA) under grant No. HST-AR-15792.002-A.

This research made use of Montage. It is funded by the National Science Foundation under grant No. ACI-1440620 and was previously funded by the National Aeronautics and Space Administration's Earth Science Technology Office, Computation Technologies Project, under Cooperative Agreement No. NCC5-626 between NASA and the California Institute of Technology.

This publication uses data generated via the Zooniverse.org platform, development of which is funded by generous support, including a Global Impact Award from Google, and by a grant from the Alfred P. Sloan Foundation.

Appendix A: Galaxy Cutout Creation

Here we detail how we created the galaxy image cutouts. For each galaxy, we began by downloading the i-, r-, and g-band fields listed in the SDSS PhotoPrimary table for the galaxy. From each field, we extracted a cutout centered on the target galaxy with a side length of 6 times the r-band 90% Petrosian radius. We then projected the images in our three bands onto a common coordinate system using the Montage library (Jacob et al. 2010) to avoid subpixel offsets in the RGB composite image. This coordinate system had a scale of 0.396″ pixel⁻¹ to match SDSS's native resolution. To create a color-composite image, the i-, r-, and g-band cutouts were mapped to red, green, and blue, respectively, and color scaling was performed via the "Lupton scaling" technique described in Lupton et al. (2004). The particular scaling obeyed the equation

Equation (7)

where I is the input intensity in a given image band (in counts) and x is the output intensity for display (ranging from 0 to 1). "Q," "minimum," and "stretch" were tunable parameters set to 7, 0, and 0.2, respectively, while "band_scaling" is a per-band parameter set to 1.818, 1.176, and 0.7 for the i, r, and g bands, respectively. The result resembles SDSS's standard color balance, with significant emphasis on the g band to highlight star-forming regions. Finally, images were resized to a standard size of 400 × 400 pixels.
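The sketch below shows one way to build a Lupton et al. (2004)-style asinh composite with the parameter values quoted above. It is not a transcription of Equation (7): the order in which band_scaling and minimum are applied, the shared-intensity stretch, and the rounding of the cutout size are assumptions, and the function names are hypothetical. Resizing to 400 × 400 pixels is left out. Astropy's `make_lupton_rgb` provides a maintained implementation of the same general scheme.

```python
import numpy as np

def cutout_side_pixels(petro_r90_r_arcsec, pixel_scale=0.396, factor=6.0):
    """Cutout side in pixels: `factor` x r-band 90% Petrosian radius (rounded up; an assumption)."""
    return int(np.ceil(factor * petro_r90_r_arcsec / pixel_scale))

def lupton_style_rgb(i_img, r_img, g_img, Q=7.0, minimum=0.0, stretch=0.2,
                     band_scaling=(1.818, 1.176, 0.7)):
    """Generic asinh color composite mapping i -> R, r -> G, g -> B.

    Assumption: each band is scaled as band_scaling * (I - minimum), and all
    channels share one asinh stretch of their mean intensity, preserving colors.
    """
    bands = [s * (np.asarray(img, dtype=float) - minimum)
             for img, s in zip((i_img, r_img, g_img), band_scaling)]
    rgb = np.clip(np.stack(bands, axis=-1), 0.0, None)  # negative values -> 0
    mean_intensity = rgb.mean(axis=-1, keepdims=True)
    # Shared stretch F(I) = asinh(Q * I / stretch) / Q, applied to each channel as F(I) / I
    with np.errstate(divide="ignore", invalid="ignore"):
        factor = np.where(mean_intensity > 0,
                          np.arcsinh(Q * mean_intensity / stretch) / (Q * mean_intensity),
                          0.0)
    return np.clip(rgb * factor, 0.0, 1.0)  # display range 0..1

# Usage on a tiny synthetic image (hypothetical values)
img = np.ones((2, 2))
print(cutout_side_pixels(10.0))            # 152 pixels for a 10-arcsec radius
print(lupton_style_rgb(img, img, img)[0, 0])
```

Because all three channels are multiplied by the same factor F(I)/I, the ratios between bands (and hence the hue) are preserved while the overall brightness is compressed.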

Appendix B: Calculating the Completeness Estimate on Clumps

In this appendix we describe the mathematical process used to calculate the completeness estimate of each clump identified by Clump Scout volunteers.

We estimate the completeness of each recovered clump as a function of three measured properties: the clump's brightness (traced by ${m}_{g}^{\mathrm{clump}}$), the clump's color (traced by ${m}_{g}^{\mathrm{clump}}-{m}_{r}^{\mathrm{clump}}$), and the clump's brightness relative to the diffuse background (traced by ${m}_{g}^{\mathrm{clump}}-{m}_{g}^{\mathrm{background}}$). The background magnitude ${m}_{g}^{\mathrm{background}}$ measures the estimated background in a circular aperture with a diameter of 1 PSF FWHM. The g band was selected as the primary band because it is the bluest of the filters used in our image cutouts (i, r, and g) and correlates most strongly with star formation. Other properties, such as galactocentric distance, galaxy mass, redshift, or image resolution, were found to have minimal impact, possibly because they are correlated with the three properties already used.

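We then partitioned real clumps into bins based on their features x = {x1, x2, x3}, where

$${x}_{1}={m}_{g}^{\mathrm{clump}},\quad {x}_{2}={m}_{g}^{\mathrm{clump}}-{m}_{r}^{\mathrm{clump}},\quad {x}_{3}={m}_{g}^{\mathrm{clump}}-{m}_{g}^{\mathrm{background}}.$$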

Partition edges were selected such that each feature i was divided into partitions ki(xi), each containing an equal number of real clumps. Partitions over all three features were then given by k(x) = {k1(x1), k2(x2), k3(x3)}. (Note that because clump properties are not independent, the three-dimensional partitions k do not generally contain equal numbers of clumps.)
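Equal-count partitioning along each feature amounts to placing bin edges at quantiles of the real-clump distribution. The sketch below assumes three bins per feature (a hypothetical choice, not necessarily the number used in this work) and assigns each clump a 3D partition index k(x); the feature values are synthetic.

```python
import numpy as np

def equal_count_edges(values, n_bins):
    """Bin edges along one feature such that each 1D bin holds ~equal numbers of real clumps."""
    return np.quantile(values, np.linspace(0.0, 1.0, n_bins + 1))

def partition_index(x, edges_per_feature):
    """Map feature vectors x (shape N x 3) to integer partitions k(x) = (k1, k2, k3)."""
    x = np.atleast_2d(np.asarray(x, dtype=float))
    cols = []
    for j, edges in enumerate(edges_per_feature):
        # Use interior edges only, so values beyond the outer edges land in the end bins
        cols.append(np.searchsorted(edges[1:-1], x[:, j], side="right"))
    return np.stack(cols, axis=1)

# Synthetic "real clump" features: brightness, color, contrast
rng = np.random.default_rng(0)
real = rng.normal(size=(300, 3))
edges = [equal_count_edges(real[:, j], 3) for j in range(3)]
k = partition_index(real, edges)
print(np.bincount(k[:, 0], minlength=3))  # [100 100 100]
```

Each 1D partition holds the same number of clumps by construction, but the joint 3D cells generally do not, because the features are correlated.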

Next, we used simulated clumps to estimate the recovery fraction frecov(k) in each partition. A naive estimate would equate frecov(k) with the fraction of simulated clumps in bin k that volunteers recovered. However, this estimate is imprecise for bins with few clumps and offers no simple way to estimate its uncertainty. Instead, we used a Bayesian method: we assumed that the probability distribution of frecov(k) is a beta distribution defined by the numbers of recovered and missed clumps within k. That is,

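$$P\left({f}_{\mathrm{recov}}(\mathbf{k})\right)=\frac{({n}_{\mathrm{recov}}+{n}_{\mathrm{missed}}+1)!}{{n}_{\mathrm{recov}}!\,{n}_{\mathrm{missed}}!}\,{f}_{\mathrm{recov}}{(\mathbf{k})}^{{n}_{\mathrm{recov}}}{\left[1-{f}_{\mathrm{recov}}(\mathbf{k})\right]}^{{n}_{\mathrm{missed}}},\qquad (8)$$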

where nrecov is the number of clumps in bin k that volunteers recovered and nmissed is the number that they missed. We then took frecov(k) to be the median of the P(frecov(k)) distribution. (This method is described in Cameron 2011 and provides robust uncertainties on estimated fractions even when number counts are low.) Figure 13 displays the number of (total and recovered) simulated clumps in each bin, as well as the uncorrected distribution of recovered real clumps over these bins.
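A minimal sketch of this estimate, assuming a uniform prior so that P(frecov) follows Beta(nrecov + 1, nmissed + 1), as in Cameron (2011). For brevity it summarizes the distribution with random draws; exact quantiles could instead be taken from `scipy.stats.beta.ppf`. The function name and the bin counts in the usage line are hypothetical.

```python
import numpy as np

def recovery_fraction_posterior(n_recov, n_missed, n_samples=200_000, seed=0):
    """Summarize P(f_recov) ~ Beta(n_recov + 1, n_missed + 1) (uniform prior).

    Returns (median, 16th percentile, 84th percentile), estimated from random draws.
    """
    rng = np.random.default_rng(seed)
    draws = rng.beta(n_recov + 1, n_missed + 1, size=n_samples)
    p16, p50, p84 = np.percentile(draws, [16, 50, 84])
    return p50, p16, p84

# A sparsely populated bin: 3 of 4 simulated clumps recovered (naive estimate 0.75)
med, p16, p84 = recovery_fraction_posterior(3, 1)
print(f"median {med:.2f}, 16th-84th percentile {p16:.2f}-{p84:.2f}")  # median ~0.69
```

For small bins the median is pulled toward 0.5 relative to the naive ratio, and the percentile range gives an uncertainty that the naive estimate lacks.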


Figure 13. Demonstration of the simulated clump binning procedure. In each panel, each shaded square represents a single bin in 3D space; the panels divide clumps by their brightness, while the x- and y-axes trace the other two properties (contrast and color) that are included in the binning procedure. Top: the recovery fraction of simulated clumps is printed in each bin, formatted as "recovered/total." Bottom: the number of real clumps with fLu > 3% is printed in each bin.


Finally, we interpolated the discrete map frecov(k) to a continuous map over all clump properties, frecov(x). We began with a grid of values for frecov(x) by assuming that frecov(k) = frecov(x50(k)), where x50(k) is the median value of x for simulated clumps in partition k. We then linearly interpolated over this grid to obtain the continuous map. Because the interpolation grid was defined at the median clump locations in each bin (x50), some clumps in the outermost bins fell outside the interpolation grid; these clumps were assigned frecov(k(x)), i.e., the value for their bin. A total of 0.4% of real clumps fell in bins with fewer than five simulated clumps, and these clumps were excluded from further analysis.
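The interpolation with its out-of-grid fallback could be sketched with SciPy's `LinearNDInterpolator`, which performs piecewise-linear interpolation on a Delaunay triangulation of scattered points and returns a fill value outside their convex hull. The paper does not state which routine was used; the helper names and the toy grid below are hypothetical.

```python
import itertools

import numpy as np
from scipy.interpolate import LinearNDInterpolator

def build_completeness_map(x50, f_recov_bins):
    """Linear interpolant of f_recov over the bin-median locations x50(k) (shape M x 3)."""
    return LinearNDInterpolator(np.asarray(x50, dtype=float),
                                np.asarray(f_recov_bins, dtype=float),
                                fill_value=np.nan)

def completeness(interp, x, f_recov_own_bin):
    """P_rec for clumps at x; clumps outside the interpolation grid get their bin's value."""
    values = interp(np.atleast_2d(np.asarray(x, dtype=float)))
    return np.where(np.isnan(values), f_recov_own_bin, values)

# Toy grid: bin medians at the corners and center of a unit cube, f_recov = x1
grid = np.array(list(itertools.product([0.0, 1.0], repeat=3)) + [(0.5, 0.5, 0.5)])
interp = build_completeness_map(grid, grid[:, 0])
p_rec = completeness(interp, [[0.3, 0.6, 0.4], [2.0, 2.0, 2.0]], np.array([0.8, 0.9]))
print(p_rec)  # ~[0.3, 0.9]: first value interpolated, second falls back to its bin value
```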

Once the map frecov(x) is established, we can assign a completeness estimate to each clump in our sample: for clump i with measured properties x(i), we define its estimated completeness as Prec,i = frecov(x(i)). These Prec,i values can then be used to correct fclumpy for incompleteness, as explained in Section 4.3.

This method of estimating completeness also allows us to estimate its uncertainty easily with a Monte Carlo method. Over 100 trials, we allowed the recovery fraction in each bin, frecov(k), to vary randomly according to its estimated distribution P(frecov(k)), rather than fixing it at the 50th percentile value. The repeated trials yielded an approximate distribution of frecov(x).
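A sketch of the Monte Carlo step, assuming the same Beta(nrecov + 1, nmissed + 1) distribution in each bin. To keep it short, each clump takes its bin's drawn value directly instead of being re-interpolated on every trial, so it approximates rather than reproduces the procedure described above; the bin counts and clump assignments are hypothetical.

```python
import numpy as np

def monte_carlo_completeness(n_recov, n_missed, clump_bin, n_trials=100, seed=0):
    """Per-clump completeness values drawn over repeated trials.

    n_recov, n_missed: counts per bin (3D partitions flattened to one index).
    clump_bin: bin index of each real clump.
    Simplification: no interpolation between bins.
    Returns an array of shape (n_trials, n_clumps).
    """
    rng = np.random.default_rng(seed)
    n_recov = np.asarray(n_recov)
    n_missed = np.asarray(n_missed)
    # One draw of f_recov(k) per bin per trial
    f_draws = rng.beta(n_recov + 1, n_missed + 1, size=(n_trials, n_recov.size))
    return f_draws[:, np.asarray(clump_bin)]

# Three hypothetical bins and four real clumps (two in bin 0)
draws = monte_carlo_completeness([18, 5, 1], [2, 5, 9], [0, 0, 1, 2])
p16, p50, p84 = np.percentile(draws, [16, 50, 84], axis=0)
print(p50)  # per-clump median completeness
```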

Table 4. Observed and Corrected Values for the Clumpy Fraction Using the fLUV ≥ 8% and fLUV ≥ 3% Criteria, Divided by Mass Bin

| Galaxies (sSFR > 0.1 Gyr⁻¹) | zmax | Ntot | fclumpy,8% observed | fclumpy,8% corrected | fclumpy,3% observed | fclumpy,3% corrected |
|---|---|---|---|---|---|---|
| All masses ($10^{9} < M/M_\odot$) | 0.035 | 5843 | ${2.18}_{-0.39}^{+0.39} \% $ $\left({127}_{-23}^{+23}\right)$ | ${3.22}_{-0.34}^{+0.38} \% $ $\left({188}_{-20}^{+23}\right)$ | ${8.16}_{-0.54}^{+0.49} \% $ $\left({477}_{-31}^{+29}\right)$ | ${12.68}_{-0.88}^{+1.38} \% $ $\left({741}_{-51}^{+81}\right)$ |
| Low mass ($10^{9} < M/M_\odot < 10^{9.8}$) | 0.035 | 4940 | ${2.06}_{-0.18}^{+0.16} \% $ $\left({102}_{-9}^{+8}\right)$ | ${3.12}_{-0.26}^{+0.26} \% $ $\left({154}_{-13}^{+13}\right)$ | ${7.85}_{-0.61}^{+0.85} \% $ $\left({388}_{-30}^{+42}\right)$ | ${12.96}_{-1.40}^{+1.84} \% $ $\left({640}_{-69}^{+91}\right)$ |
| Medium mass ($10^{9.8} < M/M_\odot < 10^{10.6}$) | 0.05 | 3185 | ${1.42}_{-0.40}^{+0.60} \% $ $\left({45}_{-12}^{+19}\right)$ | ${2.48}_{-0.83}^{+0.50} \% $ $\left({79}_{-27}^{+16}\right)$ | ${6.13}_{-0.88}^{+0.77} \% $ $\left({195}_{-28}^{+25}\right)$ | ${10.90}_{-0.90}^{+1.20} \% $ $\left({347}_{-29}^{+38}\right)$ |
| High mass ($10^{10.6} < M/M_\odot < 10^{11.4}$) | 0.09 | 1321 | ${1.54}_{-0.38}^{+0.43} \% $ $\left({20}_{-5}^{+6}\right)$ | ${2.57}_{-0.47}^{+0.56} \% $ $\left({34}_{-6}^{+7}\right)$ | ${3.37}_{-0.51}^{+0.71} \% $ $\left({44}_{-6}^{+10}\right)$ | ${6.40}_{-0.76}^{+0.78} \% $ $\left({85}_{-11}^{+10}\right)$ |

Note. Each cell reporting an estimate of fclumpy also reports the number of galaxies corresponding to this fraction, out of a possible total of Ntot. Note that the observed numbers refer to the sum of the clumpy galaxy count observed in the "regular" sample with the extrapolated count from the "extra" sample, so not all of these galaxies were directly observed.


Footnotes

  • 8  

    This feature could only be made available for volunteers who were logged into the platform. Roughly 83% of classifications were provided by logged-in volunteers.

  • 9  

    The fraction of volunteers who voted for a particular Galaxy Zoo 2 classification, weighted according to the estimated consistency of each volunteer.

  • 10  

    Similar to the weighted vote fraction, but additionally corrected to remove systematic biases in galaxy properties due to their redshift.

  • 12  

    For comparison, the SDSS point-source 95% completeness magnitudes in the u, g, r, i, and z bands are 22.0, 22.2, 22.2, 21.3, and 20.5, respectively.

  • 13  

    For both the narrow and wide segments, the Source Extractor settings DEBLEND_NTHRESH = 16 and DEBLEND_MINCONT = 0.01 were used; the only parameter changed was the detection threshold DETECT_THRESH, which was set to 3 for the wide segment and 5 for the narrow segment. In addition, the galaxy image was convolved with a boxcar filter of size 7 pixels to generate the wide segment and with a boxcar filter of size 3 pixels to generate the narrow segment.

  • 14  

    The filters selected for UV were F435W for 0.5 < z < 1, F606W for 1 < z < 2, and F775W for 2 < z < 3, matching the scheme in Guo et al. (2015). We selected SDSS u-band analog filters to be closest to 3550 Å in the rest frame: F606W for 0.5 < z < 0.9, F775W for 0.9 < z < 1.2, F814W for 1.2 < z < 1.6, F105W for 1.6 < z < 2.2, and F125W for 2.2 < z < 3.

10.3847/1538-4357/ac6512