Galaxy Zoo: Clump Scout: Surveying the Local Universe for Giant Star-forming Clumps

Massive, star-forming clumps are a common feature of high-redshift star-forming galaxies. How they formed, and why they are so rare at low redshift, remains unclear. In this paper we identify the largest sample yet of clumpy galaxies (7050) at low redshift using data from the citizen science project Galaxy Zoo: Clump Scout, in which volunteers classified 58,550 Sloan Digital Sky Survey (SDSS) galaxies spanning redshift 0.02 < z < 0.15. We apply a robust completeness correction by comparing with simulated clumps identified by the same method. Requiring that the ratio of clump to galaxy flux in the SDSS u band be greater than 8% (similar to clump definitions used by other works), we estimate the fraction of local star-forming galaxies hosting at least one clump (f clumpy) to be $3.22^{+0.38}_{-0.34}\%$. We also compute the same fraction with a less stringent relative flux cut of 3% ($12.68^{+1.38}_{-0.88}\%$), as the higher number count and lower statistical noise of this fraction permit finer comparison with future low-redshift clumpy galaxy studies. Our results reveal a sharp decline in f clumpy over 0 < z < 0.5. Because the minor merger rate remains roughly constant over the same span, we suggest that minor mergers are unlikely to be the primary driver of clump formation. Instead, the rate of galaxy turbulence better traces f clumpy over 0 < z < 1.5 for galaxies of all masses, which supports the idea that clump formation is primarily driven by violent disk instability in all galaxy populations during this period.


INTRODUCTION
The morphologies of low-redshift galaxies can generally be classified by their location on the Hubble sequence (Hubble 1926). However, in the last two decades, observations of star-forming galaxies at the peak of cosmic star formation (z ∼ 2) have revealed that they typically exhibit irregular, clumpy morphologies that do not fit these classifications (Cowie et al. 1995; Elmegreen et al. 2004a,b). Clumpy galaxies take their name from the "giant star-forming clumps" they host. Elmegreen (2007) estimated these clumps, in galaxies in the Hubble Ultra Deep Field, to have characteristic masses of 10^7–10^9 M⊙, much more massive than typical star-forming regions locally (though more recent papers have suggested lower characteristic masses, e.g. Fisher et al. 2017; Dessauges-Zavadsky & Adamo 2018). It has now been established that clumpy morphology in star-forming galaxies peaks near z ∼ 2, with 50% exhibiting clumpy behavior, before declining significantly with cosmic time (Murata et al. 2014; Guo et al. 2015; Shibuya et al. 2016; Guo et al. 2018).
It is still not clear why clumps are so dominant at z ∼ 2, nor why they are so much rarer in low-redshift galaxies. Two major modes of clump formation have been discussed at length in the literature. First, clumps may form "in-situ" through "violent disk instability" (VDI) within their host galaxy, i.e. in disks where the Toomre stability parameter is ∼1 (Bournaud et al. 2014; Mandelker et al. 2014; Fisher et al. 2017). Such disks are called "marginally stable". This state can occur in galaxies that are continuously fed by smooth cold accretion of intergalactic gas (Genzel et al. 2008; Dekel et al. 2009), though observational confirmation of this process has been limited (Scarlata et al. 2009; Bouché et al. 2013). Second, clumps may form through interactions, i.e. major or minor mergers. The merging galaxies can themselves become clumps, known as "ex-situ" clumps, or they can generate local instabilities within the merging galaxies that collapse to form in-situ clumps. Ex-situ clumps are expected to be more massive, larger, older and less actively star-forming than their in-situ counterparts; both simulations and observations suggest that only a minority of high-redshift clumps form this way (Mandelker et al. 2014; Zanella et al. 2019). Different formation mechanisms may operate in different populations of clumpy galaxies: Guo et al. (2015) suggest that the clump formation mode depends strongly on galaxy mass, with high-mass galaxies (M ≳ 10^10.5 M⊙) dominated by the VDI-driven "in-situ" mode and low-mass galaxies (M ≲ 10^10 M⊙) dominated by minor-merger-driven "ex-situ" formation.
Clumps have been difficult to study largely because they are hard to resolve with existing instruments. While clumps were originally thought to be kiloparsec-scale objects, high-resolution observations of clumpy galaxies in the local universe (e.g. Overzier et al. 2009; Fisher et al. 2014, 2017; Messa et al. 2019) and in strongly lensed fields (Wuyts et al. 2014; Dessauges-Zavadsky et al. 2017; Dessauges-Zavadsky & Adamo 2018; Cava et al. 2018) have revealed that many clumps have much smaller characteristic sizes, ranging from tens to hundreds of parsecs. It has therefore been suggested that kiloparsec-scale clumps may be rare at high redshift, and that the true sizes and masses of observed clumps are an order of magnitude lower than originally estimated. Several papers attribute high-redshift observations to the "blending" of multiple small clumps at low resolution (Dessauges-Zavadsky et al. 2017; Fisher et al. 2017; Dessauges-Zavadsky & Adamo 2018). The characteristic sizes, masses and other properties of giant star-forming clumps are still debated.
To date, studies of local clumpy galaxies have focused on high-resolution imaging of small galaxy samples (n < 50); no study has yet assembled a large catalog of local clumpy galaxies. Local clumpy galaxies, while rare, can be observed in much greater detail than their distant counterparts and can act as analogs for them. To this end, we launched the citizen science project Galaxy Zoo: Clump Scout (hereafter Clump Scout). This project, active between 2019 and 2021, recruited volunteers to visually identify clumps in galaxies in the local universe. Each subject was examined by many volunteers, whose annotations were aggregated into consensus locations. The resulting catalog of low-redshift clumps and clumpy galaxies is the largest of its kind, and comparing it with high-redshift populations can help constrain models of clump formation and galaxy evolution.
In this paper we will present the clump catalog assembled by Clump Scout and use it to estimate the fraction of clumpy galaxies (f clumpy ) in the local universe.Guo et al. (2015), Shibuya et al. (2016) and others have used the evolution of f clumpy with redshift to evaluate the likelihood of different clump formation mechanisms (i.e. by internal VDI or by galaxy interactions).We will extend their analysis using our own f clumpy result.
This paper is structured as follows. Section 2 describes the galaxy sample, the Clump Scout citizen science project, and our methods for aggregating the annotations provided by Clump Scout volunteers. Section 3 describes our methods for recovering clump properties and correcting for incompleteness. Section 4 describes the criteria applied to define clumps, and also details the clump catalog accompanying this paper. Section 5 estimates the local fraction of clumpy galaxies and compares it to other works. In Section 6 we discuss this result, its physical significance, and potential issues. Section 7 presents a summary and conclusions.

SAMPLE SELECTION AND PREPARATION
The Galaxy Zoo: Clump Scout project
Galaxy Zoo: Clump Scout was a citizen science project that ran from September 19, 2019 to February 11, 2021 on the Zooniverse platform. It presented volunteers with image cutouts of galaxies from the Sloan Digital Sky Survey (SDSS; York et al. 2000); for each galaxy image (referred to as a subject), the volunteer was asked to identify its central bulge, followed by all of its off-center clumps. The central bulge location was requested both to discourage volunteers from marking the galaxy center as a clump and to allow us to remove any identified "clumps" coinciding with the galaxy center. To improve classification quality, new volunteers were presented with a classification tutorial, and many examples of correct classifications were provided in a field guide and task-level help menus. In addition, the majority of volunteers were shown a small sample of expert-classified "training" images when they began classifying (< 10, randomly interspersed throughout their first 20 classifications). After classifying a training image, the volunteer was given feedback on how their classification compared to that of the experts.
For the clump-marking task, volunteers were provided with a "normal clump marker" and an "unusual clump marker" (see Figure 1). The "unusual" marker was intended to identify foreground star contaminants, i.e. Milky Way stars that overlap with the angular area of the target galaxy. Like clumps, foreground stars appear as point sources in SDSS images, and their colors in the i, r and g bands can be very similar to those of clumps; this makes them very difficult to distinguish from clumps by any simple rule. We therefore instructed volunteers to mark clumps as "unusual" if they were particularly bright, differently colored, or especially offset from their host galaxy. Examples of "unusual clumps" (which were generally foreground stars) were provided in the tutorial and field guide.
Table 1. The sequence of cuts performed on the galaxy sample. f merger refers to the debiased GZ2 vote fraction for "merger", while f featured refers to the weighted GZ2 vote fraction for "features or disk". The final sample consisted of all 53,613 galaxies with f featured > 0.5 (the "regular" sample), in addition to a sample of 4,937 of the 171,472 galaxies with f merger ≥ 0.5 or f featured ≤ 0.5 (the "extra" sample).

Galaxy selection criteria
In this study, our goal was to identify as thorough a sample of clumpy galaxies in the local Universe as possible, where we take "local" to mean a volume near enough that little to no cosmological evolution takes place (i.e. z ≲ 0.15). To select potentially clumpy galaxies, we relied on data from the Galaxy Zoo 2 (GZ2) citizen science project, which provides morphological classifications for over 300,000 local galaxies from the SDSS Legacy survey. GZ2 itself consists of roughly 25% of galaxies identified in the SDSS main galaxy sample, and comprises "the nearest, brightest, and largest systems for which fine morphological features can be resolved and classified" (Willett et al. 2013). From 243,500 GZ2 galaxies with spectroscopic redshifts, we selected galaxies with detectable features by requiring a weighted vote fraction of at least 50% for the presence of "features or disk" (f featured). Major mergers were removed by requiring that the debiased vote fraction for the presence of "merger" (f merger) be below 50%. (For an overview of GZ2 vote fractions, see Willett et al. 2013.) Galaxies with z < 0.02 were removed to ensure that the vast majority of clumps would be unresolved and appear as point sources; this was required so that realistic-looking simulated clumps could be added to these galaxies for comparison. Finally, we limited our sample to galaxies whose masses had been estimated by the SDSS DR7 MPA-JHU value-added catalog (Kauffmann et al. 2003; Brinchmann et al. 2004). Because we use GZ2 as our parent catalog, we do not make any cuts on galaxy size: GZ2 already selects its galaxies to be resolvable (petroR90_r > 3 arcsec, where petroR90_r is the radius containing 90% of the galaxy's r-band flux in the Petrosian aperture). The resulting sample contained 53,613 galaxies.
In addition to this main sample, we included a sample of 4,937 additional galaxies from GZ2 with 0.02 < z < 0.075 for which the GZ2 weighted vote fraction for "features or disk" was at most 50%. The redshift limit of z < 0.075 was applied to include more low-mass galaxies in our sample, since SDSS is dominated by high-mass galaxies at higher redshifts. In total, 157,623 galaxies in the GZ2 spectroscopic sample failed the "features or disk" vote fraction cut (f featured ≤ 0.5), and 74,057 of these also met the z < 0.075 redshift cut. Because these galaxies were not classified as having features or a disk, and were therefore unlikely to host clumps, we examined only a subsample of them. The 4,937 galaxies chosen will be referred to as the "extra" sample; we included this sample so that we could extrapolate its clumpy statistics over the full population. With the "extra" sample included, a total of 58,550 galaxies were studied. Unless otherwise specified, photometry for galaxies in our sample was obtained from the SDSS DR15 PhotoPrimary table (Aguado et al. 2019). Table 1 summarizes our galaxy selection criteria and number counts.
Given this sample, we created image cutouts of all galaxies from SDSS DR15 Legacy survey imaging to present to volunteers. We mapped the SDSS i, r and g band images to the red, green and blue channels of the cutouts using asinh color scaling (Lupton et al. 2004), and scaled each galaxy to be approximately the same size in its cutout. Cutout creation is described more fully in Appendix A.
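The asinh color scaling of Lupton et al. (2004) can be sketched as follows, assuming background-subtracted i, r, g images of equal shape. The softening parameters Q and stretch below are hypothetical placeholders; the values actually used for Clump Scout cutouts are given in Appendix A and are not reproduced here. (astropy provides a maintained implementation as astropy.visualization.make_lupton_rgb.)

```python
import numpy as np


def lupton_asinh_rgb(i_band, r_band, g_band, Q=8.0, stretch=0.5):
    """Map i, r, g images to R, G, B with a Lupton et al. (2004) asinh stretch.

    Q and stretch are hypothetical values chosen for illustration only.
    """
    # Channel 0 = red (i), 1 = green (r), 2 = blue (g).
    img = np.stack([i_band, r_band, g_band], axis=-1).astype(float)
    img = np.clip(img, 0.0, None)  # ignore negative sky residuals
    intensity = img.mean(axis=-1)  # I = (i + r + g) / 3
    # Per-pixel factor asinh(Q I / stretch) / (Q I); its limit as I -> 0 is 1/stretch.
    with np.errstate(divide="ignore", invalid="ignore"):
        factor = np.where(intensity > 0,
                          np.arcsinh(Q * intensity / stretch) / (Q * intensity),
                          1.0 / stretch)
    rgb = img * factor[..., None]  # same factor for all bands preserves color
    # If any channel saturates, rescale all three together so hue is kept.
    peak = rgb.max(axis=-1, keepdims=True)
    rgb = np.where(peak > 1.0, rgb / np.where(peak > 0, peak, 1.0), rgb)
    return np.clip(rgb, 0.0, 1.0)
```

Because one scale factor multiplies all three bands, the ratio between channels (the color) is unchanged by the stretch; only the overall intensity is compressed.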

Generating images with simulated clumps
To estimate the completeness of our sample, we created additional cutouts of galaxies from the target sample with artificial clumps added. These will be called "simulated" subjects, whereas cutouts with no added clumps will be called "real". The simulated sample consists of 84,565 simulated clumps in 26,736 galaxies, approximately half the galaxy count of the real sample. During Clump Scout, each volunteer was shown a random sample of subjects drawn uniformly from the pool of all subjects (both real and simulated). A subject was retired (no longer shown to volunteers) after receiving 20 classifications.
Figure 2. Age and mass distribution of simulated clumps. Clumps' properties were drawn uniformly from within the area plotted in red, then discarded if they did not meet the magnitude limit described in Section 2.3.
Galaxies were selected for the simulated sample such that their mass distribution matched that of clumpy galaxies in the Guo et al. (2018) sample (1,132 galaxies spanning 0.5 < z < 3; hereafter called Guo+2018). This lowered the characteristic mass of galaxies in the simulated sample: the Guo+2018 sample has a median mass of 10^9.7 M⊙, compared with 10^10.6 M⊙ for galaxies in the real sample. This was appropriate, as massive galaxies tend to be redder and smoother; visual inspection confirmed that a large fraction of star-forming clumps inserted into high-mass galaxies were easily identified as simulations. We allowed galaxies to appear in the simulated sample multiple times, applying a different image transformation (i.e. a combination of rotations and reflections) to each cutout of the galaxy to ensure that every subject was unique.
To determine clump luminosity and color, we simulated clump spectra using spectral templates generated by the software Flexible Stellar Population Synthesis (Conroy et al. 2009; Conroy & Gunn 2010). We treated each clump as a single stellar population (with a delta-function SFH), assigning it an age and a total V band extinction A_V. Clump ages spanned 10^−2.5 Gyr (i.e. ∼3 Myr) to 10^1 Gyr. Dust extinction followed a Calzetti et al. (2000) attenuation curve, and the clump's A_V value varied narrowly about the SDSS-estimated dust content of the host galaxy (σ = 0.04 in ln(A_V)). Metallicity was fixed to its solar value. The resulting spectrum was redshifted to match the host galaxy and integrated to determine the clump's broadband flux in each SDSS band.
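The last step (redshifting a rest-frame spectrum and integrating it through a band) can be sketched with standard synthetic photometry. This is a minimal sketch assuming the spectrum is a rest-frame luminosity density L_λ in erg/s/Å, the luminosity distance is supplied by the caller, and the filter curve is a generic transmission array standing in for the real SDSS responses; it omits the FSPS and dust steps entirely.

```python
import numpy as np

C_ANGSTROM_PER_S = 2.99792458e18  # speed of light in Angstrom per second


def _trapezoid(y, x):
    """Trapezoidal integral of y(x), kept local to avoid NumPy version differences."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))


def redshift_spectrum(wave_rest, lum_lambda, z, lum_dist_cm):
    """Shift a rest-frame L_lambda (erg/s/A) to observed f_lambda (erg/s/cm^2/A).

    lum_dist_cm is an input: computing it from a cosmology is left to the caller.
    """
    wave_obs = np.asarray(wave_rest, float) * (1.0 + z)
    # f_lambda(lambda_obs) = L_lambda(lambda_rest) / (4 pi d_L^2 (1 + z))
    f_lambda = np.asarray(lum_lambda, float) / (4.0 * np.pi * lum_dist_cm**2 * (1.0 + z))
    return wave_obs, f_lambda


def ab_magnitude(wave, f_lambda, filt_wave, filt_trans):
    """Photon-counting AB magnitude of f_lambda through a filter curve."""
    trans = np.interp(wave, filt_wave, filt_trans, left=0.0, right=0.0)
    num = _trapezoid(f_lambda * trans * wave, wave)
    den = _trapezoid(trans * C_ANGSTROM_PER_S / wave, wave)
    f_nu = num / den  # band-averaged f_nu in erg/s/cm^2/Hz
    return -2.5 * np.log10(f_nu) - 48.6
```

As a sanity check, a source with constant f_ν = 3631 Jy has an AB magnitude of ≈ 0 in any band, which the function reproduces.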
Clump mass was drawn uniformly from an age-dependent range, shown in Figure 2. Qualitatively, we excluded low-mass, high-age clumps to avoid extremely faint ("known invisible") objects. Similarly, we excluded the brightest clumps (high-mass, low-age), as these were "known visible" objects. The resulting sample probes the region where we are least certain of volunteers' recovery capability. We discarded and regenerated a clump if its apparent magnitude was > 22.8 in each of the g, r, i and z bands as well as > 24.8 in the u band, since such clumps were considered "known invisible". The resulting distribution in the age and mass of simulated clumps is shown in Figure 2.
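The discard-and-regenerate rule amounts to rejection sampling against a "known invisible" test. A minimal sketch follows; draw_clump is a hypothetical callable standing in for the age/mass draw plus photometry described above, returning the clump parameters and a dict of apparent magnitudes keyed by band.

```python
import numpy as np

# Limits from the text: a clump is "known invisible" only if it is fainter
# than 22.8 in every one of g, r, i, z AND fainter than 24.8 in u.
GRIZ_LIMIT = 22.8
U_LIMIT = 24.8


def is_known_invisible(mags):
    """mags: dict mapping band name ('u', 'g', 'r', 'i', 'z') -> apparent magnitude."""
    griz_faint = all(mags[b] > GRIZ_LIMIT for b in "griz")
    return griz_faint and mags["u"] > U_LIMIT


def draw_visible_clump(draw_clump, rng, max_tries=10_000):
    """Redraw until the clump is not known-invisible.

    draw_clump is hypothetical: rng -> (params, mags).
    """
    for _ in range(max_tries):
        params, mags = draw_clump(rng)
        if not is_known_invisible(mags):
            return params, mags
    raise RuntimeError("no visible clump drawn; check the sampling region")
```

A clump bright enough in any single band survives, so faint-but-blue clumps detectable only in u are kept.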
Each galaxy in the simulated sample was assigned a number of clumps selected from a Poisson distribution with mean 3, where 0 results were rejected and resampled.(Since 0's were rejected, the actual mean number of clumps per galaxy was 3.16.)This distribution was chosen to maximize the clump density per galaxy without appearing unrealistic to volunteers, and is not meant to reflect the true distribution of clumps per galaxy.To assign locations to clumps, image segmentation was performed on the r band imaging field containing the host galaxy using Source Extractor (Bertin & Arnouts 1996).For each field, a centered cutout was made, smoothed with a boxcar filter, then segmented by Source Extractor to determine which pixels belonged to the host galaxy.A pixel was chosen uniformly at random from the host galaxy's segment to be the clump's location within the image.In order to probe the low surface brightness regions of a galaxy (without letting our sample be dominated by clumps in these regions), we randomly assigned 25% of clumps to a "wide" segment which encompassed a larger area than the standard galaxy segment. 
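The clump-count and placement draws can be sketched as below. Rejecting zeros from a Poisson distribution with mean λ gives a zero-truncated Poisson with mean λ / (1 − e^−λ); for λ = 3 this is 3 / (1 − e^−3) ≈ 3.157, matching the quoted 3.16. The boolean masks stand in for the "narrow" and "wide" Source Extractor segments and are assumed inputs here.

```python
import numpy as np


def n_clumps(rng, mean=3.0):
    """Poisson(mean) draw with zeros rejected and redrawn."""
    while True:
        n = rng.poisson(mean)
        if n > 0:
            return n


def expected_n_clumps(mean=3.0):
    """Mean of the zero-truncated Poisson: mean / (1 - exp(-mean))."""
    return mean / (1.0 - np.exp(-mean))


def place_clump(rng, narrow_mask, wide_mask, p_wide=0.25):
    """Pick a pixel uniformly from the wide segment (prob. p_wide) or the
    narrow segment (prob. 1 - p_wide); masks are boolean arrays."""
    mask = wide_mask if rng.random() < p_wide else narrow_mask
    ys, xs = np.nonzero(mask)
    k = rng.integers(len(ys))
    return int(ys[k]), int(xs[k])
```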
Figure 3. An example of the simulated clump placement process. (b) The galaxy segmentation map. The dark-gray inner region is the "narrow" region where clumps are placed with 75% probability, while the dark- and light-gray regions together make up the "wide" region where clumps are placed with 25% probability. Selected locations for simulated clumps are plotted as x's. (c) The final galaxy image with simulated clumps added (some are too faint to be detected). Dashed red lines are drawn around the simulations to highlight them. The final galaxy image shown to volunteers was rotated and/or reflected with respect to this one to distinguish its appearance from the original.
Figure 3 contains an example of the simulation placement process. Each clump was then simulated as a point source with a γ = 2.5 Moffat profile; of the profiles we tried, visual inspection by the authors found that the Moffat profile looked the most realistic and was the most difficult to distinguish from real clumps. For most clumps, the full width at half maximum (FWHM) of this profile was equal to the FWHM of the r band point spread function (PSF) in the image. We also allowed a small number of clumps to be "extended" by assigning each clump an effective physical radius, selected from a uniform distribution over the range [10 pc, 500 pc] (based on size limits observed by Messa et al. 2019). If this physical radius exceeded that of the PSF, it was used as the effective radius of the clump's profile. Approximately 19% of simulated clumps (16,342 of 84,565) exceeded the seeing PSF size, by a median of 27%. The distribution of simulated clump sizes allows us to probe the effect of clump size on recovery statistics, and is not intended to match the true distribution.
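A point-source Moffat stamp with a prescribed FWHM can be sketched as below. The text quotes γ = 2.5 without naming the parameter; since the FWHM is fixed to the PSF, we assume 2.5 is the power-law exponent (conventions differ: astropy's Moffat2D, for example, calls the exponent alpha and the core width gamma). The stamp size is a hypothetical choice, and normalizing the stamp sum to the total flux ignores wing flux beyond the stamp edge.

```python
import numpy as np


def moffat_alpha(fwhm, beta=2.5):
    """Core width for a Moffat profile (1 + (r/alpha)^2)^(-beta) with the given FWHM."""
    return fwhm / (2.0 * np.sqrt(2.0 ** (1.0 / beta) - 1.0))


def moffat_stamp(flux, fwhm, size=41, beta=2.5):
    """Square stamp of a Moffat point source centred on the middle pixel,
    normalised so that the stamp sums to `flux`."""
    alpha = moffat_alpha(fwhm, beta)
    c = (size - 1) / 2.0
    y, x = np.mgrid[0:size, 0:size]
    r2 = (x - c) ** 2 + (y - c) ** 2
    prof = (1.0 + r2 / alpha**2) ** (-beta)
    return flux * prof / prof.sum()
```

By construction the profile falls to half its peak at r = FWHM/2, so matching the clump to the r-band PSF only requires passing the PSF FWHM in pixels.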

Aggregation method
After subjects had been examined by Clump Scout volunteers, the locations of clumps needed to be determined from the collected annotations on each subject.To this end, we developed an aggregation algorithm with which all volunteer annotations on each subject were transformed into consensus clump locations.Broadly, our aggregation process consisted of two steps: First, clump candidates were identified via a clustering algorithm; second, clumps marked "unusual" by a sufficient fraction of volunteers (> 0.35) were discarded, since these are likely to be foreground star contaminants.
The clustering algorithm we employed is adapted from Branson et al. (2017) and is specialized for citizen science applications. A complete description of the algorithm can be found in Dickinson et al. (in prep.). Briefly, we assign each volunteer a "false positive probability" (i.e. the chance that an annotation by this volunteer is not associated with any clump candidate), a "false negative probability" (i.e. the chance that the volunteer failed to mark a given clump candidate), and a "scatter" value which estimates the distance between an annotation and its intended target. These values inform a clustering algorithm which identifies clump candidates; in turn, the identified clump candidates update the volunteer statistics, and so on. To determine when each image has been fully classified, a "risk" value is computed for each one. Once the risk falls below a threshold value, clustering is no longer performed for that image and its clump candidates are finalized. Each clump candidate is assigned a "false positive probability" based on the number and properties of the volunteers who marked it. To improve the purity of our sample, we discard clumps with false positive probability > 0.6. (By comparison with a sample of classifications by the authors, we found that the majority of volunteer-identified clumps with false positive probabilities larger than 0.6 did not correspond to any expert-identified clumps, while the majority of those below the threshold did.) We additionally remove any clumps that coincide with the volunteer-identified central bulge of a galaxy. To locate a galaxy's central bulge, we take the median x and y coordinates of the central bulge annotations from all volunteers; we then prune outlying annotations by removing any central bulge annotation that falls more than 20 pixels from this location (in the 400×400 pixel cutout image), and recalculate the central bulge location. We then remove any clumps within 1 PSF-FWHM of the central bulge.
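The post-clustering cuts (bulge location, false positive threshold, bulge proximity and the "unusual" fraction) can be sketched as below. This does not reproduce the Branson et al. (2017) clustering itself; it assumes each candidate already carries a false positive probability and an unusual-annotation fraction, and the function names are hypothetical.

```python
import numpy as np


def bulge_location(bulge_xy, prune_px=20.0):
    """Median of volunteers' bulge marks, recomputed after dropping marks
    more than prune_px pixels from the first median."""
    pts = np.asarray(bulge_xy, dtype=float)
    centre = np.median(pts, axis=0)
    keep = np.hypot(*(pts - centre).T) <= prune_px
    return np.median(pts[keep], axis=0)


def classify_candidate(xy, bulge, fwhm_px, fp_prob, unusual_frac,
                       fp_max=0.6, unusual_max=0.35):
    """Return 'rejected', 'unusual' or 'clump' for one consensus candidate."""
    if fp_prob > fp_max:
        return "rejected"  # likely false positive
    if np.hypot(*(np.asarray(xy, float) - bulge)) <= fwhm_px:
        return "rejected"  # within 1 PSF-FWHM of the central bulge
    if unusual_frac > unusual_max:
        return "unusual"  # probable foreground star; kept only as flagged
    return "clump"
```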
In total (excluding unusual clumps), we identify 10,739 clumps over 7,052 galaxies in our sample.An additional 3,861 unusual clumps were identified; these are included in the final catalog, though it is likely that a majority are contaminating foreground stars.

CLUMP PROPERTIES AND COMPLETENESS
In this section, we discuss our methods for estimating the flux, background and completeness of each clump identified by Clump Scout.
We estimated the flux of, and galactic background for, each clump in each of the SDSS ugriz bands, using a method similar to that of Guo et al. (2015). First, flux was measured in an aperture of diameter 2.25 PSF-FWHM centered on the clump location. Next, the background per pixel (where "background" refers to diffuse galaxy light) was estimated as the median pixel value in an annulus spanning diameters of 3–5 PSF-FWHM, and used to estimate the background flux in the central aperture. Figure 4 provides a visual example of the aperture and annulus sizes used. This background is subtracted to obtain a clump flux estimate within the aperture. A random sample of model PSFs from 1,000 SDSS fields revealed that 84 ± 2% of the flux from a point source falls within an aperture of diameter 2.25 PSF-FWHM. We therefore multiplied the background-subtracted aperture flux by 1.191 to obtain the total flux of the clump.
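The aperture measurement can be sketched in a few lines, assuming a single-band image array, a clump position in pixels and the PSF FWHM in pixels. Pixels are included by their centres (no sub-pixel weighting), which is a simplification and not necessarily how the catalog fluxes were computed.

```python
import numpy as np

APERTURE_CORRECTION = 1.191  # from the text; ~1 / 0.84, the PSF fraction inside the aperture


def clump_flux(image, x0, y0, fwhm_px):
    """Background-subtracted, aperture-corrected clump flux.

    Aperture: diameter 2.25 FWHM. Background: median pixel value in an
    annulus spanning diameters of 3-5 FWHM.
    """
    y, x = np.indices(image.shape)
    r = np.hypot(x - x0, y - y0)
    ap = r <= 1.125 * fwhm_px  # radius = 2.25 / 2 FWHM
    ann = (r >= 1.5 * fwhm_px) & (r <= 2.5 * fwhm_px)  # radii 1.5-2.5 FWHM
    bkg_per_pix = np.median(image[ann])  # diffuse galaxy light per pixel
    raw = image[ap].sum() - bkg_per_pix * ap.sum()
    return APERTURE_CORRECTION * raw
```

For a delta-function source of flux 100 on a flat background, the sketch returns 1.191 × 100 = 119.1, since the median annulus value removes the background exactly.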
There are a few known sources of systematic error in our flux estimation process for simulated clumps. A small percentage of flux (≲ 5%) is lost due to a combination of pixelation effects, contamination of the background region by the clump, and offsets between the recovered and true clump locations. In addition, the diffuse background flux is slightly underestimated by the background annulus (median: ∼85% of the true value). This results in overestimation of clump fluxes, particularly for dimmer clumps. While these systematics cannot be fully removed with the existing method, we have minimized them so that they are smaller than the scatter. Because the systematic effects in the real clump sample may not exactly match those measured for our simulations, we only minimize the systematics and do not apply a correction to counteract them. The bottom panel of Figure 5 quantifies the effectiveness of this flux recovery method by displaying the recovered vs. input magnitudes of the simulated clumps, and demonstrates that the systematic effects are smaller than the scatter.
To estimate the error on clump fluxes, we first estimated the per-pixel uncertainties in each SDSS field using the gain and dark variance values provided for each CCD, as well as the image calibration and sky image maps provided with each field. We then fit an uncertainty model of the form $\sigma_f^2 = m f + b$ to all the pixels in each field, where $\sigma_f^2$ is the variance on a pixel's flux, $f$ is the pixel's flux, and $m$ and $b$ are model parameters. We sum the variances within a clump's aperture to estimate the aperture flux uncertainty, and we take the median pixel variance in the background annulus and multiply it by the aperture area to estimate the background uncertainty within the aperture. Finally, to obtain the variance on the background-subtracted aperture flux, we sum the aperture and annulus variance estimates; we multiply this value by the same aperture correction (1.191) to estimate the uncertainty on the final, background-subtracted clump flux.
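A minimal sketch of the error model follows, assuming per-pixel fluxes and variances are already available as arrays. Because scaling a flux by a constant scales its standard deviation by that constant, this sketch assumes the 1.191 correction is applied to the standard deviation (equivalently, 1.191² to the variance); the text does not state this explicitly.

```python
import numpy as np


def fit_variance_model(pix_flux, pix_var):
    """Least-squares fit of sigma_f^2 = m * f + b over a field's pixels."""
    m, b = np.polyfit(pix_flux, pix_var, 1)
    return m, b


def clump_flux_sigma(ap_var, ann_var, n_ap, aperture_correction=1.191):
    """Uncertainty on the background-subtracted, aperture-corrected flux.

    ap_var: per-pixel variances inside the aperture;
    ann_var: per-pixel variances in the background annulus;
    n_ap: number of pixels in the aperture.
    """
    var_ap = np.sum(ap_var)  # variance of the summed aperture flux
    var_bkg = np.median(ann_var) * n_ap  # variance of the background estimate
    return aperture_correction * np.sqrt(var_ap + var_bkg)
```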
We also obtained a completeness estimate for each clump: the estimated fraction of similar clumps that Clump Scout recovered. To determine this we relied on the recovery fractions of simulated clumps, where a simulated clump was considered "recovered" if Clump Scout volunteers located a clump within 0.75 PSF-FWHM of its location. We characterized the simulated clumps by three properties: brightness (g-band magnitude), color (g-minus-r), and contrast against the diffuse background (clump-minus-background g-band magnitude). The simulated clumps were binned in these properties, and the overall recovery fraction was calculated for each bin. We then calculated the same three properties for each real clump and compared them with the recovery fractions of the simulated sample to obtain each clump's completeness estimate. The three properties selected (brightness, color and contrast) capture the relevant properties of each clump well; we found that including additional properties, such as clump galactocentric radius, galaxy redshift, galaxy size, or image resolution, had only a very small effect on our completeness estimates by comparison. The details of this process are described in Appendix B. Figure 6 shows the estimated recovery fraction statistics for real clumps in our sample.
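The binned-recovery idea can be sketched with NumPy histograms. The bin edges here are hypothetical (Appendix B gives the actual scheme), and empty bins are left as NaN rather than interpolated.

```python
import numpy as np


def recovered(true_xy, found_xy, fwhm_px):
    """A simulated clump counts as recovered if any consensus clump lies
    within 0.75 PSF-FWHM of its true position."""
    if len(found_xy) == 0:
        return False
    d = np.hypot(*(np.asarray(found_xy, float) - np.asarray(true_xy, float)).T)
    return bool(np.any(d <= 0.75 * fwhm_px))


def completeness_table(props, rec, edges):
    """props: (N, 3) array of [g mag, g - r, clump-minus-background g mag]
    for simulated clumps; rec: boolean recovery flags; edges: three bin-edge arrays."""
    total, _ = np.histogramdd(props, bins=edges)
    found, _ = np.histogramdd(props[rec], bins=edges)
    with np.errstate(invalid="ignore", divide="ignore"):
        return found / total  # NaN where a bin holds no simulated clumps


def lookup(table, edges, real_props):
    """Completeness for real clumps: the recovery fraction of their bin."""
    idx = [np.clip(np.digitize(real_props[:, k], edges[k]) - 1, 0, len(edges[k]) - 2)
           for k in range(3)]
    return table[idx[0], idx[1], idx[2]]
```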
We use these completeness estimates in this work to correct the fraction of clumpy galaxies (f clumpy ) for incompleteness (Section 4.3).Future studies relying on this clump catalog should generally incorporate these completeness estimates to accurately model the local population of clumps, and should not rely solely on the observed number counts of clumps.

Catalog release and use
Along with the electronic release of this paper, we release the catalog of all of the clumps identified by the Clump Scout aggregator and their estimated properties.The columns are fully described by Table 4 at the end of this paper.Clumps with a high fraction (≥ 35%) of "unusual" annotations are included, but are marked by the flag unusual flag = 1.
To use this catalog for scientific purposes, we make a few "best practices" suggestions on how to filter it:
• Selecting a clean sample: We strongly recommend that clumps with unusual flag = 1 be rejected, as these are probable non-clump contaminants (i.e. foreground stars, background galaxies, or other point-like sources).
• Mass completeness: This galaxy catalog is not mass complete, and the lower mass limit on galaxies evolves significantly with redshift. We estimate that the catalog is mass-complete down to 10^9 M⊙ for z < 0.035. If a mass-complete catalog is needed, care must be taken to limit the sample's redshift.
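The two recommendations can be applied together as in this sketch, which assumes the catalog has been loaded as a dict of NumPy arrays. The column names 'unusual_flag', 'z' and 'log_mass' are hypothetical; use the names listed in Table 4.

```python
import numpy as np


def clean_mass_complete(cat, z_max=0.035, logm_min=9.0):
    """Drop flagged contaminants and keep the mass-complete volume.

    cat: dict of equal-length numpy arrays with hypothetical column names.
    """
    keep = cat["unusual_flag"] != 1  # reject probable non-clump contaminants
    keep &= (cat["z"] < z_max) & (cat["log_mass"] >= logm_min)
    return {k: v[keep] for k, v in cat.items()}
```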

THE LOCAL CLUMPY FRACTION OF GALAXIES
A particularly important observable in the study of clumpy galaxies is the fraction of star-forming galaxies with at least one clump, known as the "clumpy fraction" or f clumpy. The clumpy fraction is most simply defined as $f_{\rm clumpy} = N_{\rm clumpy\ SFGs} / N_{\rm SFGs}$, where SFGs refer to star-forming galaxies (sSFR > 0.1 Gyr^−1) and $N_{\rm clumpy\ SFGs}$ counts those hosting at least one clump. This observable is easily compared between different galaxy populations and can significantly constrain models of galaxy evolution. In this section, we establish a clump definition that makes estimating f clumpy straightforward and precise, then present our f clumpy estimate.

Selecting a clump definition for f clumpy
A major difficulty in the clumpy galaxy literature is that the definition of a "clump" is highly inconsistent. Past works have defined clumps as objects identified by visual inspection (e.g. Elmegreen et al. 2007; Puech 2010; Overzier et al. 2009) or by detection algorithms that are robust to changes in resolution and depth, including the clumpfind algorithm of Williams et al. (1994) and others (e.g. Livermore et al. 2012; Guo et al. 2012; Tadaki et al. 2014; Zanella et al. 2019). However, comparisons between these different methods are not straightforward. Guo et al. (2015) proposed an empirically motivated definition: the ratio of clump to galaxy UV luminosity (f LUV) must exceed 8%. The 8% cutoff was chosen to select star-forming regions at high redshift while excluding common star-forming regions locally; specifically, it includes many star-forming regions in HST-imaged galaxies spanning 0.5 < z < 3, but excludes > 99% of star-forming regions identified in the galaxy M101 (blurred to match the resolution of the high-redshift sample). Local clumps exceeding the f LUV > 8% threshold are therefore expected to be rare, exceptional objects. Several other recent works use this or a similar criterion (e.g. Shibuya et al. 2016; Mandelker et al. 2017; Fisher et al. 2017). However, it is not univer-
In this paper, we calculate f clumpy by specifying a relative flux cutoff in the SDSS u band, i.e. the ratio of clump to galaxy flux in the u band (f Lu) must be greater than some specified fraction. In particular we use the relative flux cuts f Lu > 8% and f Lu > 3%, and call the clumpy fractions under these criteria f clumpy,8% and f clumpy,3% respectively. The 8% cut was selected to be comparable to existing works with a similar criterion, while the 3% cut yields larger number statistics and is easier and more accurate to estimate in the low-redshift universe, where clumps are less common.
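The relative flux criterion and the simplest (uncorrected) clumpy fraction can be sketched as follows, assuming clump and galaxy u-band fluxes in the same units and a list that already contains only star-forming galaxies. No completeness correction or extrapolation over unexamined galaxies is applied here, so this is not the full estimator used in this paper.

```python
import numpy as np


def is_clumpy(clump_u_fluxes, galaxy_u_flux, threshold=0.08):
    """True if any clump has f_Lu = F_u(clump) / F_u(galaxy) above threshold."""
    f_lu = np.asarray(clump_u_fluxes, float) / galaxy_u_flux
    return bool(np.any(f_lu > threshold))


def naive_clumpy_fraction(galaxies, threshold=0.08):
    """Uncorrected f_clumpy = N_clumpy / N_SFG.

    galaxies: list of (clump_u_fluxes, galaxy_u_flux) pairs for star-forming galaxies.
    """
    flags = [is_clumpy(c, g, threshold) for c, g in galaxies]
    return sum(flags) / len(flags)
```

For example, with four galaxies whose brightest clumps have f Lu of 10%, 5%, 4% and none, the 8% cut gives 1/4 and the 3% cut gives 3/4, showing why the looser cut yields better number statistics.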
The relationship between f Lu and f LUV: It is worth noting that past studies have defined clumps by a relative flux threshold in the UV at ∼2,500 Å (e.g. Guo et al. 2015; Shibuya et al. 2016). UV data are not available from SDSS, whose bluest band (u) probes ∼3,500 Å. However, there is a strong relationship between a clump's flux fraction in the SDSS u band (f Lu) and in the near UV (f LUV). To demonstrate this, we examined a highly complete subsample of clumps (f LUV > 5%) from the Guo+2018 sample. Guo+2018 was chosen because it includes a large sample (523 clumps) with f LUV ≥ 5%, and because it spans a similar physical resolution to Clump Scout: a typical SDSS g-band PSF-FWHM at z ∼ 0.05 corresponds to ∼1.2 kpc, compared with ∼1.1–1.3 kpc for HST sources spanning 1 < z < 2.5 at similar wavelengths.
For each clump in the Guo+2018 sample, we examined its flux fraction in the CANDELS filter bands analogous to the near UV and the SDSS u band in the rest frame. We find a strong correlation between f Lu and f LUV using these filters, with the median clump having f Lu = 0.86 f LUV. A total of 1,170 clumps from the Guo+2018 sample meet the f LUV > 8% criterion, compared with 961 that meet f Lu > 8%, a reduction of ∼18%.
We performed a similar experiment on clumps in Clump Scout. Though we could not calculate f LUV directly, since SDSS does not provide near UV data (∼2,500 Å), we obtained near UV fluxes for local galaxies from the GALEX survey (Martin et al. 2005), using the cross-matched GALEX-SDSS catalog created by Bianchi & Shiao (2020). We then assumed that local clumps match the SED distribution of clumps from the Guo+2018 sample, and multiplied each clump's u band flux by the UV-to-u flux ratio of a randomly selected Guo+2018 clump. The resulting values of f Lu and f LUV are plotted alongside the Guo+2018 values in Figure 7. For both groups, f Lu is a reasonably strong predictor of f LUV. Performing the same experiment with the g band rather than the u band revealed that f Lg values are less well correlated with f LUV and are typically much smaller (the median clump had f Lg = 0.73 f LUV).
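The ratio-borrowing step can be sketched as below, assuming all fluxes share the same units (conversion between GALEX and SDSS units is not shown). Each local clump draws, with replacement, the UV-to-u ratio of one reference clump; the function and argument names are hypothetical.

```python
import numpy as np


def simulate_f_luv(clump_u, galaxy_uv, ref_uv_to_u, rng):
    """Estimate f_LUV for local clumps by borrowing reference SED ratios.

    clump_u: clump u-band fluxes; galaxy_uv: host near-UV fluxes (e.g. GALEX);
    ref_uv_to_u: UV-to-u flux ratios of reference (Guo+2018-like) clumps.
    """
    ratios = rng.choice(np.asarray(ref_uv_to_u, float), size=len(clump_u), replace=True)
    clump_uv = np.asarray(clump_u, float) * ratios  # inferred clump near-UV flux
    return clump_uv / np.asarray(galaxy_uv, float)
```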
Based on these experiments, we conclude that f_Lu is the best available analog of f_LUV for SDSS data. Because f_Lu is typically somewhat lower than f_LUV, using it results in a slight underestimate of f_clumpy.

Calculating f_clumpy
The calculation of f_clumpy from the Clump Scout catalog has several steps. We calculate f_clumpy in the full sample as well as in 3 distinct bins of galaxy mass; the selection cuts and galaxy counts for each mass bin are detailed in Table 2. Here, we describe the steps for calculating f_clumpy in the broadest mass bin (M > 10^9 M_⊙), though the same process applies to every bin. First, we isolate a star-forming, mass-complete sample of galaxies. SDSS is complete for galaxies down to 10^9 M_⊙ at redshifts z < 0.035. Therefore, beginning with the Clump Scout parent sample defined in Section 2.2, we limit the sample to galaxies with specific star formation rate sSFR > 10^{-1}, M > 10^9 M_⊙, and z < 0.035. In addition, because clumps may be more difficult to detect in edge-on galaxies than in face-on galaxies, we remove all galaxies whose minor-to-major axis ratio is less than 0.3, where this axis ratio is estimated from the SDSS exponential fit in the r band (expAB_r in the PhotoPrimary table). SDSS contains N_tot = 5,930 galaxies passing these cuts.

Table 2. The cuts applied to the galaxy sample used for calculating f_clumpy. "Regular" galaxies were fully examined by Clump Scout volunteers, while "extra" galaxies (with f_merger ≥ 0.5 or f_featured ≤ 0.5) were only partially examined (the number of examined galaxies is given in parentheses); the "total" column sums the regular and extra columns. Row 1 presents the parent sample of galaxies from Table 1, while rows 2 and 3 enumerate the sample after removing quiescent galaxies and edge-on galaxies, respectively. The final four rows detail the four mass-binned samples used for calculating f_clumpy.
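The cuts above can be expressed as a boolean mask over catalog columns. In the hypothetical sketch below, the argument names stand in for catalog stellar mass, redshift, sSFR, and SDSS `expAB_r` (the minor-to-major axis ratio); the default sSFR threshold simply copies the value quoted in the text, in whatever units the catalog uses. The optional upper mass limit lets the same function describe the narrower mass bins.

```python
import numpy as np


def star_forming_mass_complete_mask(log_mass, redshift, ssfr, axis_ratio_ba,
                                    log_mass_min=9.0, log_mass_max=np.inf,
                                    z_max=0.035, ssfr_min=1e-1, ba_min=0.3):
    """Boolean mask selecting star-forming, mass-complete, not-edge-on galaxies.

    axis_ratio_ba is the minor-to-major axis ratio, so edge-on galaxies
    have small values and are removed when b/a < ba_min.
    """
    log_mass = np.asarray(log_mass, dtype=float)
    redshift = np.asarray(redshift, dtype=float)
    ssfr = np.asarray(ssfr, dtype=float)
    axis_ratio_ba = np.asarray(axis_ratio_ba, dtype=float)
    star_forming = ssfr > ssfr_min
    in_mass_bin = (log_mass > log_mass_min) & (log_mass < log_mass_max)
    mass_complete = in_mass_bin & (redshift < z_max)
    not_edge_on = axis_ratio_ba >= ba_min
    return star_forming & mass_complete & not_edge_on


# Hypothetical usage for the medium-mass bin (9.8 < log M < 10.6, z < 0.05):
# mask = star_forming_mass_complete_mask(logm, z, ssfr, ba,
#                                        log_mass_min=9.8, log_mass_max=10.6, z_max=0.05)
# n_tot = int(mask.sum())
```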
We then combine the contributions from the "regular" Clump Scout sample of 53,613 galaxies and the "extra" sample of 4,937 galaxies, which is used to extrapolate over all galaxies in SDSS that Clump Scout did not directly examine. In total, N_reg = 2,640 of the 5,930 galaxies passing all cuts for f_clumpy were examined in the regular sample.
Of the remaining 3,290 galaxies, a sample of 230 were examined by volunteers as part of the "extra" sample. We use N_extra,samp to refer to the number of these galaxies that volunteers examined directly (i.e. N_extra,samp = 230), and N_extra,tot to refer to the size of the total population from which they were drawn (i.e. N_extra,tot = 3,290).
For each group, the observed clumpy fraction f_clumpy,obs is calculated and the completeness correction from Section 4.3 is applied. The corrected fraction over the "extra" sample, f^corr_clumpy,extra, is then extrapolated over all SDSS galaxies not examined by Clump Scout. This yields the total clumpy fraction:

$$ f_{\rm clumpy} = \frac{N_{\rm reg}\, f_{\rm clumpy,reg} + N_{\rm extra,tot}\, f_{\rm clumpy,extra}}{N_{\rm tot}} $$

Here, f_clumpy,reg and f_clumpy,extra are the fractions of galaxies out of N_reg and N_extra,samp, respectively, that were estimated to be clumpy. The sampling error is estimated separately for the "regular" and "extra" samples using the standard error of a proportion, taking the sample size N to be the number of examined galaxies in the "regular" or "extra" group:

$$ \sigma_{\rm reg} = \sqrt{\frac{f_{\rm clumpy,reg}\,(1 - f_{\rm clumpy,reg})}{N_{\rm reg}}}, \qquad \sigma_{\rm extra} = \sqrt{\frac{f_{\rm clumpy,extra}\,(1 - f_{\rm clumpy,extra})}{N_{\rm extra,samp}}} $$

We then scale these by their contributions to the total value of f_clumpy and add them in quadrature to obtain

$$ \sigma_{\rm clumpy} = \sqrt{\left(\frac{N_{\rm reg}}{N_{\rm tot}}\,\sigma_{\rm reg}\right)^2 + \left(\frac{N_{\rm extra,tot}}{N_{\rm tot}}\,\sigma_{\rm extra}\right)^2} $$

To estimate the total error on f_clumpy, we use a Monte Carlo method that includes contributions from the uncertainty on clump fluxes and on clump completeness as well as from sampling error. Over 100 trials, clump fluxes are allowed to vary randomly within a normal distribution defined by their estimated errors (see Section 3), while the clump completeness map is recalculated on each trial by the method described in Appendix B. On each Monte Carlo trial, we include sampling error by calculating an initial value of f_clumpy, then reassigning it to a random value drawn from N(f_clumpy, σ_clumpy). Of these sources of error, sampling error is by far the most significant: in a trial run where the contributions from clump flux error and the completeness correction were ignored, the error bars were typically within 10% of their original values. The exception is f^corr_clumpy,3%, whose error bars were ∼40% smaller when the completeness and clump flux error contributions were ignored.
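The combination of the two samples and the quadrature sum of their sampling errors can be sketched as follows. This is a minimal illustration of the equations above using only Python's standard library; the function name is hypothetical, and the Monte Carlo variation of clump fluxes and completeness maps is not included.

```python
import math


def combined_clumpy_fraction(f_reg, n_reg, f_extra, n_extra_samp, n_extra_tot):
    """Total f_clumpy and its sampling error from the regular and extra samples.

    f_reg, f_extra : (completeness-corrected) clumpy fractions in each examined group
    n_reg          : galaxies examined in the regular sample
    n_extra_samp   : extra galaxies examined by volunteers
    n_extra_tot    : full population that the extra sample stands in for
    """
    n_tot = n_reg + n_extra_tot
    # Extrapolate the extra-sample fraction over its whole population
    f_clumpy = (n_reg * f_reg + n_extra_tot * f_extra) / n_tot
    # Standard error of a proportion, using the number of galaxies actually examined
    sigma_reg = math.sqrt(f_reg * (1.0 - f_reg) / n_reg)
    sigma_extra = math.sqrt(f_extra * (1.0 - f_extra) / n_extra_samp)
    # Weight each by its group's share of N_tot and add in quadrature
    sigma = math.hypot(n_reg / n_tot * sigma_reg, n_extra_tot / n_tot * sigma_extra)
    return f_clumpy, sigma


f, sigma = combined_clumpy_fraction(136 / 2640, 2640, 2 / 230, 230, 3290)
```

With the rounded corrected counts quoted in the Results (about 136 of 2,640 regular galaxies and about 2 of 230 extra galaxies), this gives f_clumpy ≈ 2.8% with σ ≈ 0.4%. That is close to, but not identical with, the quoted 2.68%; the inputs here are rounded and the Monte Carlo is omitted, so exact agreement is not expected.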

Correcting f_clumpy for incompleteness
Our estimate of f_clumpy must also take into account the incompleteness of our clump catalog. We therefore use the following method to correct the clumpy fraction of galaxies for completeness.
The end goal of the f_clumpy completeness correction is to calculate P_FN, the probability that a galaxy is a false negative for clumps. In other words, P_FN is the probability that a given galaxy contains one or more clumps, but that none of its clumps were detected. To calculate this probability, we work up in steps from the completeness estimates of individual clumps in the sample. We define P_rec,i to be the recovery fraction of clump i (refer to Appendix B for a full overview of how P_rec,i is determined for each clump). We begin by calculating P_rec, the recovery probability of a randomly selected clump within the sample, from the individual P_rec,i values. Next, we estimate the true distribution of clumps per clumpy galaxy, P_count(n_c). To do so, we begin with a proposed distribution P_count(n_c), then simulate 10,000 clumpy galaxies with this distribution. We then remove a fraction (1 − P_rec) of the clumps at random, so that a fraction P_rec is recovered, to simulate the observed distribution, discarding galaxies with no observed clumps. The proposed distribution is adjusted until the mean number of clumps per galaxy in the simulated distribution closely matches the observed mean. For mathematical expediency, we model P_count(n_c) with an exponential distribution, P_count(n_c) ∝ e^{−λ n_c}. (Note, though, that the observed distributions are also well fit by Poisson distributions, so the exponential model does not necessarily have physical significance.) In Figure 8, we plot the observed distributions of f_Lu > 3% clumps per galaxy for each galaxy mass bin we examine, along with the best-fit exponential model.
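The forward-modelling loop above can be sketched with NumPy. This minimal version assumes a discrete exponential P_count(n_c) ∝ e^{−λ n_c} for n_c ≥ 1 (drawn as a geometric distribution with success probability 1 − e^{−λ}), treats each clump as independently recovered with probability P_rec, and adjusts λ by bisection. The function names and bisection bounds are hypothetical.

```python
import numpy as np


def simulated_observed_mean(lam, p_rec, n_gal=10_000, seed=0):
    """Mean observed clumps per detected galaxy for a trial exponential P_count."""
    rng = np.random.default_rng(seed)
    # True clump counts n_c >= 1 with P(n_c) proportional to exp(-lam * n_c)
    true_counts = rng.geometric(1.0 - np.exp(-lam), size=n_gal)
    # Each clump is independently recovered with probability p_rec
    observed = rng.binomial(true_counts, p_rec)
    # Galaxies with no recovered clumps are not seen as clumpy at all
    observed = observed[observed > 0]
    return observed.mean()


def fit_lambda(observed_mean, p_rec, lo=0.05, hi=5.0, n_iter=40):
    """Bisect on lam until the simulated observed mean matches observed_mean (> 1)."""
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if simulated_observed_mean(mid, p_rec) > observed_mean:
            lo = mid  # too many clumps per galaxy: steepen the exponential
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

Bisection is a reasonable choice here because a steeper exponential (larger λ) yields fewer clumps per detected galaxy. Reusing one seed for every evaluation reduces, but does not remove, Monte Carlo noise between trial values of λ.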
Given P_rec and P_count(n_c), P_FN is the probability that every one of a clumpy galaxy's clumps is missed, summed over the distribution of clump counts:

$$ P_{\rm FN} = \sum_{n_c = 1}^{\infty} P_{\rm count}(n_c)\,(1 - P_{\rm rec})^{n_c} $$

The P_FN sum is dominated by the first few terms. For f_Lu > 8% clumps in our broadest galaxy mass bin (M > 10^9 M_⊙ with z < 0.035), we estimate P_rec ≈ 49.3% and λ ≈ 0.85; given these values, the first term, the first two terms, and the first three terms of the P_FN sum account for approximately 78%, 95%, and 99% of missed clumpy galaxies, respectively. Given the galaxy false-negative probability P_FN, the completeness of f_clumpy is (1 − P_FN). It is then straightforward to correct the clumpy fraction:

$$ f^{\rm corr}_{\rm clumpy} = \frac{f_{\rm clumpy,obs}}{1 - P_{\rm FN}} $$

f^corr_clumpy should be taken as our estimate of the "true" value, as it accounts for galaxies whose clumps were not detected; however, we present both the observed and corrected values in our results.
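Under the same exponential model, the P_FN sum and the correction to f_clumpy can be evaluated directly. The sketch below truncates the sum at a hypothetical `n_max` (the terms fall off geometrically, so the truncation error is negligible) and uses the values quoted above, P_rec ≈ 0.493 and λ ≈ 0.85. Function names are illustrative.

```python
import numpy as np


def false_negative_probability(p_rec, lam, n_max=200):
    """P_FN = sum_{n>=1} P_count(n) * (1 - p_rec)**n, with P_count(n) ∝ exp(-lam * n)."""
    n = np.arange(1, n_max + 1)
    # Normalized discrete exponential over n >= 1
    p_count = (1.0 - np.exp(-lam)) * np.exp(-lam * (n - 1))
    # Probability that all n clumps of a galaxy are missed, weighted by P_count(n)
    terms = p_count * (1.0 - p_rec) ** n
    return terms.sum(), terms


def corrected_fraction(f_obs, p_fn):
    """Completeness-corrected clumpy fraction (works equally on galaxy counts)."""
    return f_obs / (1.0 - p_fn)


p_fn, terms = false_negative_probability(p_rec=0.493, lam=0.85)
# Share of missed clumpy galaxies accounted for by the first k terms
cumulative_share = np.cumsum(terms) / p_fn
```

With these inputs P_FN ≈ 0.37, so the completeness 1 − P_FN is ≈ 0.63, and the cumulative shares of the first one, two, and three terms come out near 78%, 95%, and 99%. Applied to the 85 galaxies observed with f_Lu > 8% clumps in the regular sample, the correction gives ≈ 135, roughly in line with the ∼136 reported in the Results.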

Results
Here, we present our results for f_clumpy,8% and f_clumpy,3% (the clumpy fractions using the thresholds f_Lu > 8% and f_Lu > 3%, respectively), both overall and within several galaxy mass bins. All of these numbers are collected in Table 3.
Within the regular Clump Scout sample (N_reg = 2,640), we detected 85 galaxies with clumps passing f_Lu > 8%; corrected for incompleteness, we estimate the true number to be ∼136. Within the extra sample (N_extra,samp = 230), we observed just 1 galaxy with clumps passing f_Lu > 8%, and estimate the true number to be ∼2 after correcting for incompleteness. In total, we estimate that 157 of N_tot = 5,930 galaxies have clumps (corrected for incompleteness), leading to an estimate f^corr_clumpy,8% = 2.68^{+0.33}_{−0.30}%. We apply the same procedure using the f_Lu > 3% cut, and observe 344 clumpy galaxies in the regular sample (∼556 corrected for incompleteness) and 4 clumpy galaxies in the extra sample (∼9 corrected for incompleteness). This yields a total estimate f^corr_clumpy,3% = 11.33^{+0.89}_{−1.16}%.
To characterize the local distribution of clumps, we also examine f_clumpy in three bins of galaxy mass: 9 < log10(M/M_⊙) < 9.8, 9.8 < log10(M/M_⊙) < 10.6, and 10.6 < log10(M/M_⊙) < 11.4; we refer to the galaxies in these bins as low-mass, medium-mass, and high-mass galaxies, respectively. These match the mass bins over which Guo et al. (2015) estimated the clumpy fraction. To obtain a mass-complete galaxy sample, different redshift limits were applied to each bin: z < 0.035 for low-mass galaxies, z < 0.05 for medium-mass galaxies, and z < 0.09 for high-mass galaxies (see Figure 9).
Following the same procedure as for the overall clumpy fraction, we estimate that f^corr_clumpy,8% is 2.53^{+0.21}_{−0.23}% for low-mass galaxies, 2.43^{+0.57}_{−0.50}% for medium-mass galaxies, and 1.95^{+0.49}_{−0.40}% for high-mass galaxies. A more complete list of statistics can be found in Table 3; these values are also plotted in Figure 10. Example images of galaxies in each of the mass bins are shown in Figure 13.

Comparisons to other studies
To place our estimates of f_clumpy,8% at z < 0.1 in context, we compare them with high-redshift (z > 0.5) results from other works, in particular Shibuya et al. (2016) for galaxies of all masses (M > 10^9 M_⊙) and Guo et al. (2015) for galaxies in bins of low, medium, and high mass (matching the mass bins used in this paper). We plot these values in Figure 11.

Table 3. Observed and corrected values for the clumpy fraction using the f_Lu ≥ 8% and f_Lu ≥ 3% criteria, divided by mass bin. Each cell reporting an estimate of f_clumpy also reports the number of galaxies corresponding to this fraction, out of a possible total of N_tot. Note that the observed numbers are the sum of the clumpy galaxy count observed in the "regular" sample and the extrapolated count from the "extra" sample, so not all of these galaxies were directly observed.
Figure 9. Determining survey completeness limits for our mass bins. Here we plot log mass vs. redshift for all galaxies examined by Galaxy Zoo 2 with spectroscopic redshifts (i.e. the parent sample of Clump Scout). Dashed horizontal lines demarcate the three primary mass bins used in our analysis. The green shaded regions represent the relative surface density of galaxies, while the red line traces the same with some added smoothing; assuming no significant cosmological evolution over this redshift range, the surface density should remain constant if there is no loss due to survey incompleteness. We draw a vertical dashed black line at the redshift where the surface density of galaxies first falls below 60% of its maximum, and we use this redshift to approximate the limit of a mass-complete sample in each mass bin. Redshift limits are therefore drawn at ∼0.035 for the lowest-mass bin, ∼0.05 for the intermediate bin, and ∼0.09 for the highest-mass bin.

Shibuya et al. (2016) found that the clumpy fraction peaks between redshifts 1 and 2 at a value of >50% before declining over z < 1. To model this trend, they use a fit function of the same form as is commonly used to model the evolution of the cosmic star formation rate density with redshift (Madau et al. 1996; Lilly et al. 1996). To compare with their results, we take their best-fit model of f_clumpy as a function of z and extend it beyond their data to z ∼ 0; this is plotted in Figure 11. Their model predicts f_clumpy,8% ∼ 4% at z ∼ 0, which is reasonably close to our result of f^corr_clumpy,8% = 2.68^{+0.33}_{−0.30}%. Guo et al. (2015) observed a nearly constant f_clumpy,8% ∼ 60% for low-mass galaxies over 0.5 < z < 3, while our results indicate a value of ∼3% at z ∼ 0 for galaxies of the same mass. These results can be meaningfully compared, because the Clump Scout results define f_clumpy,8% with an f_Lu > 8% cut that is similar to the f_LUV > 8% cut used by Guo et al.
(2015) (see Section 4.1). In addition, both probe a similar physical resolution: the physical resolution of CANDELS images is ∼1 kpc at all redshifts, compared with ∼0.5 to ∼1.7 kpc over the range 0.02 < z < 0.09 for SDSS. We therefore expect that this drop in the clumpy fraction over 0 < z < 0.5 is real and not merely the result of different identification methods.

Figure 11. The value of f_clumpy,8% vs. redshift, as estimated by our study at z ∼ 0 and by others at z > 0.5. TOP: f_clumpy for galaxies of all masses (> 10^9 M_⊙). Our result is reasonably close to the model by Shibuya et al. (2016) that was originally fit to their high-redshift f_clumpy results. BOTTOM: f_clumpy divided into mass bins. The mass bins and methods used in this work match closely with those used by Guo et al. (2015) to estimate f_clumpy at z > 0.5, but our clumpy fractions at z ∼ 0 are significantly lower than those at higher redshift. For comparison, we have plotted estimates of the minor merger fraction and observations of the "turbulent fraction" of galaxies over 0 < z < 1.5. The minor merger fraction is modeled by Lotz et al. (2011) and plotted here with observability timescales of 0.5, 1.25, and 2 Gyr (with gray error regions containing the best-fit range). The turbulent fraction comes from kinematic observations by Kassin et al. (2012) over three galaxy mass bins. We find that the turbulent fraction qualitatively matches the patterns observed in f_clumpy, i.e. it declines significantly from z ∼ 1.5 to z ∼ 0, and high-mass galaxies begin this decline the soonest. By comparison, the minor merger fraction remains approximately constant over the same period and is a poor tracer of f_clumpy for any mass bin once our results at z ∼ 0 are included.
There are few other studies that estimate f_clumpy in the local universe. Murata et al. (2014) studied clumpy galaxies selected from HST/ACS F814W imaging of the COSMOS field spanning redshifts 0.2 < z < 1. Over this redshift range the F814W filter approximately corresponds to the rest-frame SDSS r and g bands, and the nearest galaxies in this sample (z ∼ 0.2) are likely similar to Clump Scout galaxies (z < 0.1). The fraction of optically bright galaxies with multiple star-forming clumps was found to decrease from 0.35 at z ∼ 1 to 0.05 at z ∼ 0.2. While this decrease is qualitatively in line with our results, the definition of f_clumpy used by Murata et al. (2014) is significantly different from ours: rather than applying a relative flux criterion, they selected clumpy galaxies as those with multiple star-forming clumps of comparable brightness. The low completeness of our sample prevents us from applying the Murata et al. (2014) condition, as it requires the detection of at least 3 clumps per clumpy galaxy. Overzier et al. (2009) also studied clumpy galaxies at z < 0.3, but only in a sample of 30 "Lyman break analog" galaxies with extremely high UV fluxes; the clumpy fraction obtained from this sample is not comparable to that of our broader sample.

Interpretation of f_clumpy
Given this paper's focus on the fraction of clumpy galaxies, it is worth discussing exactly how this quantity is defined and how it should be used. f_clumpy is a particularly good probe of trends in clumpiness across cosmic time: by controlling for galaxy mass and star formation rate, we ensure that f_clumpy is compared between groups of similar galaxies even at different redshifts. However, using a relative flux criterion (f_Lu > 8%) to define clumps may select very different sets of physical objects depending on galaxy mass. Naively, assuming a linear relation between mass and u-band luminosity in our galaxy sample, our lowest galaxy mass bin (10^9–10^9.8 M_⊙) includes clumps that are ∼2 dex less massive than those in our highest mass bin (10^10.6–10^11.4 M_⊙). It is therefore not straightforward to compare f_clumpy between bins of different galaxy mass. The validity of such a comparison depends on the clump luminosity function: for example, if the clump luminosity function has an exponential cutoff (e.g. as proposed by Livermore et al. 2012), the relation between f_clumpy and galaxy mass would depend on how the location of this cutoff varies with galaxy mass. Therefore, while f_clumpy can be compared for galaxies of similar mass across different redshifts, it is not straightforward to compare f_clumpy between galaxies of different mass. For this reason, the remainder of this discussion focuses on trends in f_clumpy with redshift.

Physical implications of f_clumpy
A major motivation for determining f_clumpy over large redshift ranges is to distinguish between proposed modes of clump formation. There are two primary modes by which clumps are thought to form. In the in-situ mode, clumps form through gas collapse within the host galaxy driven by turbulent disk dynamics (i.e. VDI). VDI is expected in galaxies that are actively accreting gas via "cold-mode" accretion, in which gas flows into the galaxy along smooth, cold streams. This accretion process adds kinetic energy to the disk and can drive the Toomre parameter below unity, making the gas unstable to collapse (Dekel et al. 2009). Alternatively, in the ex-situ mode of formation, clumps originate in minor mergers: the clump forms as a satellite galaxy with its own dark matter component, and only later merges with its host. It should be noted that clumps are short-lived structures on cosmological timescales: simulations find that massive clumps in disk galaxies have a maximum lifetime of 500 Myr, by which time they have been slowed by dynamical friction and have merged with their host's central bulge (Bournaud et al. 2014; Mandelker et al. 2014). Therefore, the presence of clumps indicates that the clump formation process is ongoing or recent, and f_clumpy can effectively act as a tracer of galaxy behavior. It remains unclear which formation process is dominant, as different processes may dominate in different galaxy populations.
To determine the primary formation process of clumps (i.e. in-situ or ex-situ formation), we can examine trends in the rate of VDI and the minor merger rate over cosmic time and compare these with trends in f_clumpy. In Figure 11, we have plotted our estimate of f_clumpy,8% along with comparable estimates at higher redshift (Guo et al. 2015; Shibuya et al. 2016) and estimates of the fraction of galaxies experiencing VDI and of those with observable signatures of minor mergers.
We use the minor merger rate estimate from Lotz et al. (2011), which was obtained by subtracting the number of galaxies in close pairs (major mergers) from the number with disturbed, uneven morphologies (major and minor mergers). The best-fit model to the minor merger fraction takes the form f_merg,minor ∝ T_obs (1 + z)^α, with best-fit exponent α = −0.2 ± 0.2. The "observability timescale" T_obs refers to the time during which the host galaxy's morphology is measurably disturbed, which depends on the detection method (and is distinct from the lifetime of an ex-situ clump formed during a merger). To represent the uncertainties in these parameters, we plot the best-fit model for T_obs values of 0.5, 1.25, and 2 Gyr over the fit range (0 < z < 1.5). In all cases, the minor merger rate rises toward low redshift or remains constant over the full redshift range, because the best-fit α is negative or consistent with zero.
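To make the size of this trend concrete: for f_merg,minor ∝ T_obs (1 + z)^α, T_obs cancels in the ratio to the z = 0 value, leaving (1 + z)^α. A short Python check with the best-fit α = −0.2 (the function name is hypothetical):

```python
def relative_minor_merger_fraction(z, alpha=-0.2):
    """f_merg,minor(z) / f_merg,minor(0) for f ∝ T_obs * (1 + z)**alpha; T_obs cancels."""
    return (1.0 + z) ** alpha


for z in (0.5, 1.0, 1.5):
    print(f"z = {z}: {relative_minor_merger_fraction(z):.2f} x the z = 0 value")
```

The model therefore changes by less than about 20% over 0 < z < 1.5 (and by about 8% over 0 < z < 0.5), far smaller than the drop in f_clumpy,8% for low-mass galaxies from ∼60% at z ≳ 0.5 (Guo et al. 2015) to ∼3% at z ∼ 0.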
To determine the fraction of galaxies experiencing turbulence, we use measurements of galaxy kinematics from Kassin et al. (2012). These measurements reveal that galaxies over a wide range of masses (10^8–10^10.7 M_⊙) tend to "settle" and become rotationally dominated over the period 0 < z < 1.2. Moreover, they find that the highest-mass galaxies have the lowest turbulent fraction at any epoch. The "turbulent fraction", defined as the fraction of galaxies for which V_circ/σ_gas < 3 (i.e. the fraction of galaxies experiencing VDI), is plotted in Figure 11 for 3 of the mass bins examined by Kassin et al. (2012), spanning 10^9–10^10.7 M_⊙. All turbulent fractions decline over 0 < z < 1.2, with higher-mass galaxies declining more quickly.
We then turn to trends in f_clumpy,8% and compare them with the trends in the two clump formation mechanisms (VDI and minor mergers) described above. Setting aside our low-redshift data for a moment, the data from Guo et al. (2015) suggested that different clump formation mechanisms may dominate in high-mass and low-mass galaxies. The clumpy fraction for high-mass galaxies declines significantly with time over 0.5 < z < 3, from ∼55% to ∼15%, while for low-mass galaxies it remains constant at f_clumpy ∼ 60% over the same span. To explain this difference, it was suggested that clumps in high-mass galaxies form primarily through VDI (in situ), tracing the turbulent fraction over this span, while those in low-mass galaxies form through minor mergers (ex situ), whose rate is roughly stable over the same span.
However, our low-redshift estimates of f_clumpy challenge this two-mechanism model. For galaxies of all masses, we now observe a significant decline in f_clumpy,8%, to <5% at z ∼ 0. Even if our z ∼ 0 estimates of f_clumpy,8% were too small by a factor of several, the fraction would still be far lower at z ∼ 0 than at z > 0.5 in every mass bin. This result matches the conventional wisdom about clumps, i.e. that giant star-forming clumps are common at high redshift and rare locally. However, the roughly constant minor merger rate over 0 < z < 0.5 is inconsistent with the significant low-redshift decline that we observe.
Instead, we suggest that in-situ, VDI-driven formation is the primary mode of clump formation in galaxies of all masses, at least over the redshift range 0 < z < 1.5. The trends in galaxy turbulence over this span closely match the trends in f_clumpy: all galaxies show evidence of a decline in turbulence, with low-mass galaxies remaining turbulent the longest. The VDI-driven formation model provides a natural mechanism for the decline in f_clumpy, namely the decline in the cosmological rate of gas accretion onto galaxies: as the supply of intergalactic gas decreases, so too do the rates of star formation, turbulent dynamics, and clump formation (Dekel et al. 2009).
Adding to this picture, smaller case studies have already provided limited evidence linking VDI to clumpiness directly. Studies of the kinematics of high-redshift clumpy galaxies find that they are turbulent (Elmegreen et al. 2009; Genzel et al. 2011), with Genzel et al. (2011) in particular noting that clumps appear in regions of the galaxy where the Toomre instability parameter is below unity. However, these studies examined spirals with stellar masses ≳10^10.6 M_⊙, corresponding to the highest mass bin in our work; a similar kinematic examination of lower-mass galaxies remains to be done. In total, the current body of evidence points to a picture of clump formation dominated by in-situ formation due to turbulent disk dynamics, though more concrete evidence is needed to confirm this.

SUMMARY AND CONCLUSIONS
In this work we present the largest catalog yet of local star-forming clumps (z ≲ 0.1), consisting of 14,341 clumps in 7,050 galaxies. The clumps were identified via the citizen science project Galaxy Zoo: Clump Scout, which asked volunteers to identify star-forming clumps in a sample of 58,550 galaxies selected from the Galaxy Zoo 2 parent sample. Consensus locations for these clumps are determined via an aggregation technique adapted from Branson et al. (2017). We estimate the completeness of our clump sample by comparing with a sample of simulated clumps identified via the same process.
The clump catalog generated in this work is versatile. While this paper focuses on estimating the clumpy fraction of galaxies, the catalog can also be used to answer other questions about clumps. In follow-up work, we intend to investigate the mass and age functions of these clumps using photometric SED fitting. These statistics will permit comparison between the properties of clumps at low and high redshift, and can also be used to directly test theories of clump formation and evolution that make predictions about the mass or age distribution of clumps.
We define two measures of the clumpy fraction of galaxies, f_clumpy,8% and f_clumpy,3%, which measure the fraction of galaxies with at least one clump emitting at least 8% and 3%, respectively, of the galaxy's u-band flux (f_Lu ≥ 8% and f_Lu ≥ 3%). f_Lu is found to be the closest analog to f_LUV (the fraction of galaxy flux emitted in the near UV) available in SDSS data. f_clumpy,8% is presented because it has been used in past high-redshift studies of clumpy galaxies, while f_clumpy,3% is presented for comparison with future local studies. Both fractions are corrected for incompleteness. We find f_clumpy,8% ∼ 2.7% and f_clumpy,3% ∼ 11.3%, with considerable variation over different mass bins.
Our low value of f_clumpy,8% is qualitatively in line with other low-redshift surveys (e.g. Murata et al. 2014), though few are available. It is, however, much lower than the values of f_clumpy,8% estimated at high redshift (Guo et al. 2015; Shibuya et al. 2016). We suggest that this extreme decrease in clumpy morphology is inconsistent with minor-merger-driven clump formation (as suggested by Guo et al. (2015) for low-mass galaxies), because the minor merger rate does not show a similar change over this period (Lotz et al. 2011). Instead, we suggest that the turbulent fraction of galaxies is a better tracer of f_clumpy,8%. Kassin et al. (2012) observed a decline in turbulence for galaxies of all masses (10^8–10^10.7 M_⊙), and in particular noted that more massive galaxies settle quickly after z ∼ 1.2 while less massive galaxies remain turbulent for longer, mimicking the trends in f_clumpy. In total, the current body of evidence supports a picture in which clumps primarily form in situ due to disk instability, though more observations are needed.

the number of (total and recovered) simulated clumps in each bin, as well as the uncorrected distribution of recovered real clumps over these bins.
Finally, we interpolated the discrete map f_recov(k) to a continuous map over all clump properties, f_recov(x). We began with a grid of values for f_recov(x) by assuming that f_recov(k) = f_recov(x_50(k)), where x_50(k) is the median value of x for simulated clumps in partition k. We then linearly interpolated over this grid to obtain the continuous map. Clumps falling outside the interpolation grid were assigned f_recov(k(x)), i.e. the value for their bin. A small fraction (0.4%) of real clumps fell in bins with fewer than 5 simulated clumps; these clumps were discarded from further analysis.
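A one-dimensional version of this interpolation scheme can be sketched as below. The published map is built over three clump properties; here a single hypothetical property `x` stands in for them so that NumPy's `np.interp` can perform the linear interpolation. Bin medians x_50(k) form the grid, clumps outside the grid fall back to their bin's value, and clumps in bins with fewer than `min_sim` simulated clumps are returned as NaN (i.e. discarded). The function names are illustrative only.

```python
import numpy as np


def bin_recovery(sim_x, sim_recovered, edges):
    """Per-bin recovered fraction f_recov(k), median x_50(k), and simulated count."""
    sim_x = np.asarray(sim_x, dtype=float)
    sim_recovered = np.asarray(sim_recovered, dtype=bool)
    k = np.digitize(sim_x, edges) - 1
    n_bins = len(edges) - 1
    f_k = np.full(n_bins, np.nan)
    x50 = np.full(n_bins, np.nan)
    n_k = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        in_bin = k == b
        n_k[b] = in_bin.sum()
        if n_k[b] > 0:
            f_k[b] = sim_recovered[in_bin].mean()
            x50[b] = np.median(sim_x[in_bin])
    return f_k, x50, n_k


def clump_completeness(x, edges, f_k, x50, n_k, min_sim=5):
    """P_rec for each real clump: interpolate between bin medians, else use the bin value."""
    x = np.asarray(x, dtype=float)
    k = np.clip(np.digitize(x, edges) - 1, 0, len(f_k) - 1)
    populated = n_k > 0
    # Linear interpolation on the grid of bin medians; NaN outside the grid
    p = np.interp(x, x50[populated], f_k[populated], left=np.nan, right=np.nan)
    off_grid = np.isnan(p)
    p[off_grid] = f_k[k[off_grid]]
    # Clumps in sparsely simulated bins are discarded (returned as NaN)
    p[n_k[k] < min_sim] = np.nan
    return p
```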
Once the map f_recov(x) is established, we can estimate the completeness of each clump in our sample: for clump i with estimated properties x(i), we define its estimated completeness as P_rec,i = f_recov(x(i)). These P_rec,i values can then be used to correct f_clumpy for incompleteness, as explained in Section 4.3.
This method of estimating completeness also allows us to estimate its uncertainty easily using a Monte Carlo method. Over 100 trials, we allowed the fraction of recovered clumps in each bin, f_recov(k), to vary randomly over its estimated distribution P(f_recov(k)), rather than taking the 50th-percentile value. Repeated trials yielded an approximate distribution of f_recov(x).

Table 4. Clump Scout catalog description. References abbreviated as "SDSS" refer to the SDSS DR15 catalog (York et al. 2000); those abbreviated as "GALEX" refer to the GALEX All-sky Imaging Survey (Martin et al. 2005), with SDSS cross-matching performed by Bianchi & Shiao (2020); those abbreviated as "MPA-JHU" refer to the value-added catalog released with SDSS, based on work by Kauffmann et al. (2003) and Brinchmann et al. (2004).
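The per-bin Monte Carlo over f_recov(k) described above can be sketched as follows. The text does not specify the form of P(f_recov(k)); this sketch assumes, purely for illustration, a Beta(recovered + 1, total − recovered + 1) distribution in each bin, i.e. the posterior of a binomial recovery probability under a uniform prior. Each row of the output is one trial's set of bin fractions, which would then be interpolated into f_recov(x). The function name is hypothetical.

```python
import numpy as np


def sample_recovery_fractions(recovered, total, n_trials=100, rng=None):
    """Draw f_recov(k) for every bin on each Monte Carlo trial.

    Assumes (for illustration only) that f_recov(k) follows
    Beta(recovered + 1, total - recovered + 1) in each bin.
    Returns an array of shape (n_trials, n_bins).
    """
    rng = np.random.default_rng() if rng is None else rng
    recovered = np.asarray(recovered)
    total = np.asarray(total)
    a = recovered + 1
    b = total - recovered + 1
    # The (n_bins,) shape parameters broadcast across the trial axis
    return rng.beta(a, b, size=(n_trials, recovered.size))


draws = sample_recovery_fractions([3, 1], [5, 5], rng=np.random.default_rng(1))
median_per_bin = np.median(draws, axis=0)
spread_per_bin = np.percentile(draws, [16, 84], axis=0)
```

Taking the median over trials gives a central value per bin, and percentiles over trials give its spread.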

Figure 1. The user interface for Galaxy Zoo: Clump Scout, showing a partially completed classification. The volunteer has marked the central bulge (red crosshairs) in the previous step and is now identifying clumps.

Figure 2. A 2D histogram of simulated clump counts in log10 clump relative mass and age space. Clump properties were drawn uniformly from within the area plotted in red, then discarded if they did not meet the magnitude limit described in Section 2.3.

Figure 3. An illustration of the process of placing simulated clumps in an image. (a) The initial galaxy image, g band only. (b) The galaxy segmentation map. The dark-gray inner region is the "narrow" region where clumps are placed with 75% probability, while the dark- and light-gray regions together make up the "wide" region where clumps are placed with 25% probability. Selected locations for simulated clumps are plotted as x's. (c) The final galaxy image with simulated clumps added (some are too faint to be detected). Dashed red lines are drawn around the simulated clumps to highlight them. The final galaxy image shown to volunteers was rotated and/or reflected with respect to this one to distinguish its appearance from the original.

Figure 4. An illustration of the parameters used to estimate the flux from each clump. Left: a galaxy image shown to volunteers from the "real" sample (no simulated clumps), zoomed in to show detail. Right: the locations of 4 recovered clumps. Solid outlines are drawn around the central apertures and dashed lines around the background regions.

Figure 5. Here we illustrate the effectiveness of the aggregation and flux estimation process. For all simulated clumps that volunteers recovered, we compare the "input" properties of clumps (their original simulated values) with their "recovered" properties, following the aggregation process laid out in Section 2.4 and the flux estimation method in Section 3. Top: histogram of the offset between the recovered and true locations of simulated clumps recovered by volunteers. The median offset is 0.15 PSF FWHM, as marked by the dashed line. Bottom: the flux recovery fraction (recovered flux / input flux) vs. input magnitude for simulated clumps. Red points mark the binned medians in magnitude difference, while vertical bars span the 16th to 84th percentile flux recovery fractions. The median flux recovery fraction was 1.03 for clumps with simulated magnitude < 22.5. There are a few sources of systematic error in the flux recovery process that could not be removed completely (described in Section 3).

sally applicable: Dessauges-Zavadsky & Adamo (2018) rejected it to allow for study of the mass function of high-redshift clumps down to much lower masses, while Huertas-Company et al. (2020) instead used a clump mass cut of M_clump > 10^7 M_⊙ to facilitate comparison with simulations. In this paper, we calculate f_clumpy by specifying a relative flux cutoff in the SDSS u band, i.e. the ratio of clump to galaxy flux in the u band (f_Lu) must exceed some specified fraction. In particular, we use the relative flux cuts f_Lu > 8% and f_Lu > 3%, and call the clumpy fractions under these criteria f_clumpy,8% and f_clumpy,3%, respectively. The 8% fraction was selected to be comparable to existing works with a similar criterion, while the 3% fraction allows for larger number statistics and is easier and more accurate to estimate in the low-redshift universe, where clumps are less common.

Figure 6. The estimated recovery fraction for clumps as a function of the three clump properties m_g^clump

Figure 7. Comparison of f_LUV (the clump-to-galaxy flux ratio in the near UV) and f_Lu (the same in the SDSS u band) in the Guo+2018 sample and the Clump Scout sample. The y-axis traces f_Lu, while the x-axis traces f_LUV. Red points are used for Clump Scout clumps, while the gray contour lines are drawn to visualize their scatter. For Guo+2018 clumps, f_LUV and f_Lu are strongly correlated (Spearman rank correlation r_s ≈ 0.84). We take the near-UV fluxes for a sample of Clump Scout galaxies from the GALEX survey. While we cannot measure the near-UV fluxes of Clump Scout clumps directly, we estimate them by assuming that local clumps have SEDs similar to those of clumps in the Guo+2018 sample, and multiplying each clump's u-band flux by the UV-to-u ratio of a randomly selected Guo+2018 clump. The resulting distributions of f_LUV and f_Lu fall nearly along a 1:1 line for both Clump Scout and Guo+2018. f_Lu is strongly correlated with f_LUV, though there is a large degree of scatter in the relation.

Figure 8. The distribution of clumps per galaxy in each of our 4 mass bins (the bins are described more fully in Table 3). Dashed lines represent the best-fit exponential model to the data, of the form N_G = A e^{−λ n_c} for galaxy count N_G and clumps per galaxy n_c. Error bars on this fit were estimated with Markov chain Monte Carlo sampling, and the shaded region around each line represents the 68% confidence interval for the fit. The best-fit λ value (with 68% confidence interval) is also provided for each fit.

Figure 10. The value of f_clumpy in three galaxy mass bins, under the criterion f_Lu > 8% (left) and f_Lu > 3% (right). The hatched bar represents the observed value, while the solid bar represents the completeness-corrected estimate.

Figure 12. Demonstration of the simulated clump binning procedure. In each subplot, each shaded square represents a single bin in 3D space; the subplots divide clumps by their brightness, while the x- and y-axes trace the other two properties (contrast and color) that are included in the binning procedure. TOP: the recovery fraction of simulated clumps is printed in each bin, formatted as "recovered/total". BOTTOM: the number of real clumps with f_Lu > 3% is printed in each bin.

Figure 13. A sample of galaxies, each containing at least one clump with f_Lu > 8%. Ten galaxies are presented from each of the low-, medium-, and high-mass bins; the SDSS specobjid of each galaxy is provided at the top of each image, while the RA and Dec values are provided in parentheses at the bottom. Each identified clump is circled in yellow, while the central bulge locations identified by volunteers are marked by red crosses. Dotted circles indicate clumps with f_Lu < 3%, dashed circles those with 3% < f_Lu < 8%, and solid circles those with f_Lu > 8%.