GALAXY ZOO MORPHOLOGY AND PHOTOMETRIC REDSHIFTS IN THE SLOAN DIGITAL SKY SURVEY

Published 2011 May 18. Copyright is not claimed for this article.
Citation: M. J. Way 2011 ApJL 734 L9, DOI 10.1088/2041-8205/734/1/L9

ABSTRACT

It has recently been demonstrated that one can accurately derive galaxy morphology from particular primary and secondary isophotal shape estimates in the Sloan Digital Sky Survey (SDSS) imaging catalog. This was accomplished by applying Machine Learning techniques to the Galaxy Zoo morphology catalog. Using the broad bandpass photometry of the SDSS in combination with precise knowledge of galaxy morphology should help in estimating more accurate photometric redshifts for galaxies. Using the Galaxy Zoo separation for spirals and ellipticals in combination with SDSS photometry we attempt to calculate photometric redshifts. In the best case we find that the root-mean-square error for luminous red galaxies classified as ellipticals is as low as 0.0118. Given these promising results we believe better photometric redshift estimates for all galaxies in the SDSS (∼350 million) will be feasible if researchers can also leverage their derived morphologies via Machine Learning. These initial results look promising for those interested in weak lensing, baryonic acoustic oscillations, and other fields dependent upon accurate photometric redshifts.

1. INTRODUCTION

It is commonly believed that adding information about the morphology of galaxies may help in the estimation of photometric redshifts (Photo-z's) when using training set methods. Most of this work in recent years has utilized the Sloan Digital Sky Survey (SDSS; York et al. 2000). For example, as discussed in Way et al. (2009, hereafter Paper II), many groups have attempted to use a number of derived primary and secondary isophotal shape estimates in the SDSS imaging catalog to help in estimating Photo-z's. Some examples include using the radius containing 50% and/or 90% of the Petrosian (1976) flux in the SDSS r band (denoted as petroR50_r and petroR90_r in the SDSS catalog), concentration index (CI = petroR90_r/petroR50_r), surface brightness, axial ratios, and radial profile (e.g., Collister & Lahav 2004; Ball et al. 2004; Vanzella et al. 2004; Wadadekar 2005; Kurtz et al. 2007; Wray & Gunn 2008).

More recently, Singal et al. (2011) have attempted to use Galaxy Shape parameters derived from Hubble Space Telescope/Advanced Camera for Surveys imaging data using a principal component approach and then feeding this information into their neural network code to predict Photo-z's, but for samples much deeper than the SDSS. Unfortunately, they find marginal improvement when using their morphology estimators.

Another promising approach focuses on the reddening and inclination of galaxies. Yip et al. (2011) have attempted to quantify these effects on a galaxy's spectral energy distribution (SED). The idea is to use this information to correct the overestimation of Photo-z's of disk galaxies.

On the other hand, attempts to morphologically classify large numbers of galaxies in the universe have gained in accuracy over the past 15 years as better and larger training samples from eye classification have become available. For example, Lahav et al. (1995) were among the first to use an artificial neural network trained on 830 galaxies classified by the eyes of six different professional astronomers. In more recent years, Ball et al. (2004) have attempted to classify galaxies by morphological type using a neural network approach based on a sample of 1399 galaxies (from the catalog of Nakamura et al. 2003). Cheng et al. (2011) have used a sample of 984 non-star-forming SDSS early-type galaxies to distinguish between E, S0, and Sa galaxies. In the past year two new attempts at morphological classification using Machine Learning techniques on a Galaxy Zoo (Lintott et al. 2008, 2011) training sample have been published (Banerji et al. 2010; Huertas-Company et al. 2011). The Banerji et al. (2010) results were impressive in that they claim to obtain classification to better than 90% for three different morphological classes (spiral, elliptical, and point sources/artifacts).

These works are in contrast to previous work like that of Bernardi et al. (2003) who used a classification scheme based on SDSS spectra. However, this classification certainly missed some early-type galaxies from their desired sample due to the presence of star formation.

In this Letter, we will continue our use of Gaussian process regression to calculate Photo-z's using a variety of inputs. This method has been discussed extensively in two previous papers (Way & Srivastava 2006; Way et al. 2009).
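For readers unfamiliar with the technique, the core of Gaussian process regression can be sketched in a few lines. The example below is only an illustration under stated assumptions: a squared-exponential kernel, fixed hyperparameters (`length`, `noise`), a zero-mean prior, and a made-up one-dimensional input. None of these choices, nor the function names, are taken from Way & Srivastava (2006) or Paper II, which should be consulted for the kernels and training procedure actually used.

```python
import math

def rbf(x1, x2, length=1.0):
    """Squared-exponential kernel between two input vectors (assumed kernel)."""
    d2 = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-0.5 * d2 / length ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(A[i][i] - s)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_cholesky(L, b):
    """Solve (L L^T) x = b by forward, then backward, substitution."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_predict(X_train, y_train, X_test, length=1.0, noise=1e-3):
    """Posterior mean of a zero-mean Gaussian process at the test inputs."""
    n = len(X_train)
    # Kernel matrix of the training inputs, with noise variance on the diagonal.
    K = [[rbf(X_train[i], X_train[j], length) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    alpha = solve_cholesky(cholesky(K), y_train)
    return [sum(rbf(xs, X_train[i], length) * alpha[i] for i in range(n))
            for xs in X_test]

# Toy data: a smooth, made-up "color -> redshift" relation, not SDSS data.
X_train = [[0.1 * i] for i in range(11)]
y_train = [0.05 + 0.2 * x[0] for x in X_train]
z_pred = gp_predict(X_train, y_train, [[0.55]], length=0.5)
```

In a Photo-z setting each row of `X_train` would hold the photometric (and optionally morphological) inputs for one galaxy and `y_train` the corresponding spectroscopic redshifts.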

We utilize the SDSS Main Galaxy Sample (MGS; Strauss et al. 2002) and the Luminous Red Galaxy Sample (LRG; Eisenstein et al. 2001) from the SDSS Data Release Seven (DR7; Abazajian et al. 2009). We also utilize the Galaxy Zoo 1 survey results (GZ1; Lintott et al. 2011). The Galaxy Zoo project (Lintott et al. 2008) contains a total of 900,000 SDSS galaxies with morphological classifications (Lintott et al. 2011).

While this study does not focus exclusively on the LRG sample, it should be noted that if it is possible to improve the Photo-z estimates for these objects as shown herein it could also improve the estimation of cosmological parameters (e.g., Blake & Bridle 2005; Padmanabhan et al. 2007; Percival et al. 2010; Reid et al. 2010; Zunckel et al. 2011) using the SDSS as well as upcoming surveys such as BOSS (Cuesta-Vazquez et al. 2011; Eisenstein et al. 2011), BigBOSS (Schlegel et al. 2009), and possibly Euclid (Sorba & Sawicki 2011), not to mention LSST (Ivezic et al. 2008). It could also contribute to more reliable Photo-z errors, as required for weak-lensing surveys (Bernstein & Huterer 2010; Kitching et al. 2011) and baryonic acoustic oscillation measurements, which are also dependent upon accurate Photo-z estimation of LRGs (Roig et al. 2008).

2. DATA

All of the data used herein have been obtained via the SDSS casjobs server. In order to obtain results consistent with Paper II for both the MGS and LRG samples we use the same photometric quality flags (!BRIGHT and !BLENDED and !SATURATED) and redshift quality (zConf > 0.95 and zWarning = 0), but with the SDSS DR7 instead of earlier SDSS releases. These data are cross-matched in casjobs with Columns 14–16 in Table 2 of Lintott et al. (2011), extracting the galaxies flagged as "spiral," "elliptical," or "uncertain." The galaxies flagged as "elliptical" or "spiral" require 80% of the vote in that category after the debiasing procedure has been applied; all other galaxies are flagged "uncertain" (Lintott et al. 2011). Debiasing is the process of correcting for small biases in spin direction and color. See Section 3.1 in Lintott et al. (2011) for more details on debiasing.
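The selection logic described above can be sketched as follows. This is an illustration only: the function and argument names are hypothetical, the debiased vote fractions are assumed to be supplied as numbers between 0 and 1, and treating the 80% requirement as "greater than or equal to" is an assumption, since the text does not say whether the boundary is inclusive.

```python
def passes_redshift_cuts(z_conf, z_warning):
    """Redshift quality cut quoted in the text: zConf > 0.95 and zWarning = 0."""
    return z_conf > 0.95 and z_warning == 0

def gz1_flag(p_elliptical, p_spiral, threshold=0.8):
    """Morphology flag from debiased vote fractions (hypothetical helper).

    A galaxy is flagged "elliptical" or "spiral" only if that category holds
    at least `threshold` of the debiased vote; otherwise it is "uncertain".
    The inclusive comparison (>=) is an assumption.
    """
    if p_elliptical >= threshold:
        return "elliptical"
    if p_spiral >= threshold:
        return "spiral"
    return "uncertain"

# Made-up example rows: (zConf, zWarning, debiased P_el, debiased P_sp).
rows = [
    (0.99, 0, 0.91, 0.05),
    (0.99, 0, 0.10, 0.85),
    (0.99, 0, 0.55, 0.40),
    (0.80, 0, 0.95, 0.02),  # fails the zConf cut, so it is dropped
]
flags = [gz1_flag(p_el, p_sp) for zc, zw, p_el, p_sp in rows
         if passes_redshift_cuts(zc, zw)]
```

The photometric flags (!BRIGHT, !BLENDED, !SATURATED) are bit masks in the SDSS catalog and are left out of this sketch.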

Note that the GZ1 sample is based upon the MGS, but the MGS contains LRGs as well. This is why we can analyze both of these samples. However, although the full LRG survey goes fainter than the MGS, our LRG sample contains no galaxies fainter than the MGS limit of r_petrosian ≲ 17.77. See Strauss et al. (2002) and Eisenstein et al. (2001) for details on the MGS and LRG samples.

A number of points from both the LRG and MGS were eliminated because of either bad values (e.g., −9999) or because they were considered outliers from the main distribution of points. The former offenders included petroR90_i (13 points in the MGS sample, 1 point in the LRG), mE1_i (43 points, 5 points), petroR90Err_i (7177 points, 1262 points), and mRrCcErr_i (22 points, 12 points). The reason for eliminating bad mE1_i points is that we use it for calculating aE_i from Table 2 of Banerji et al. (2010). A small number of outliers were also removed from the MGS sample, but totaled only 27 points. No such outlier points were removed in the LRG sample. This leaves us with a total of 437,273 MGS and 68,996 LRG objects. Using the GZ1 classifications in the MGS there are 45,249 ellipticals, 119,369 spirals, and 272,655 uncertain (∼62%). For the LRG sample there are 27,227 ellipticals and 13,495 spirals leaving 28,274 uncertain (∼41%).
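The sample sizes and "uncertain" fractions quoted above follow directly from the per-class counts. The short check below reproduces that arithmetic; the dictionary layout and function name are illustrative only.

```python
# Per-class counts quoted in the text for each sample.
mgs = {"elliptical": 45249, "spiral": 119369, "uncertain": 272655}
lrg = {"elliptical": 27227, "spiral": 13495, "uncertain": 28274}

def uncertain_fraction(counts):
    """Return (total number of objects, fraction flagged 'uncertain')."""
    total = sum(counts.values())
    return total, counts["uncertain"] / total

mgs_total, mgs_unc = uncertain_fraction(mgs)
lrg_total, lrg_unc = uncertain_fraction(lrg)

print(f"MGS: {mgs_total} objects, {100 * mgs_unc:.0f}% uncertain")
print(f"LRG: {lrg_total} objects, {100 * lrg_unc:.0f}% uncertain")
```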

3. DISCUSSION

Using the morphological classifications from the Galaxy Zoo project first data release (Lintott et al. 2011) we attempt to calculate Photo-z's for four different samples and four combinations of primary and secondary isophotal shape estimates from the SDSS as seen in Table 1. A larger variety of input combinations were tried including those in Table 1 of Banerji et al. (2010). However, we only report those found with the lowest root-mean-square error (rmse) in Table 1 of this Letter.
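As a reminder of how an rmse with bootstrapped confidence levels can be obtained, a minimal sketch follows. It resamples, with replacement, the residuals of a fixed set of predictions and reads off the 10%, 50%, and 90% levels of the resulting rmse distribution. This is an assumption made for illustration: the bootstrap procedure of Paper II may differ (for example, by resampling the training set), and the function names, the number of resamples, and the synthetic data are all hypothetical.

```python
import math
import random

def rmse(z_spec, z_phot):
    """Root-mean-square error between spectroscopic and photometric redshifts."""
    return math.sqrt(sum((s - p) ** 2 for s, p in zip(z_spec, z_phot)) / len(z_spec))

def bootstrap_rmse(z_spec, z_phot, n_resamples=200, seed=0):
    """Return the (10%, 50%, 90%) levels of the bootstrapped rmse distribution."""
    rng = random.Random(seed)
    n = len(z_spec)
    values = []
    for _ in range(n_resamples):
        # Draw n galaxies with replacement and recompute the rmse.
        idx = [rng.randrange(n) for _ in range(n)]
        values.append(rmse([z_spec[i] for i in idx], [z_phot[i] for i in idx]))
    values.sort()

    def level(q):
        return values[min(len(values) - 1, int(q * len(values)))]

    return level(0.10), level(0.50), level(0.90)

# Synthetic data: uniform redshifts with Gaussian scatter of 0.015 (not SDSS data).
rng = random.Random(1)
z_spec = [rng.uniform(0.0, 0.25) for _ in range(500)]
z_phot = [z + rng.gauss(0.0, 0.015) for z in z_spec]
p10, p50, p90 = bootstrap_rmse(z_spec, z_phot)
```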

Table 1. Results

Data (a)   Inputs (b)           σ_rmse (c)
                                50%       10%       90%
MGS–ELL    ugriz+Q+U            0.01561   0.01532   0.01620
 ⋅⋅⋅       ugriz+P50+CI         0.01407   0.01400   0.01475
 ⋅⋅⋅       ugriz+P50+CI+Q+U     0.01641   0.01560   0.01801
 ⋅⋅⋅       ugriz+B              0.01679   0.01668   0.01683
MGS–SP     ugriz+Q+U            0.01889   0.01864   0.01913
 ⋅⋅⋅       ugriz+P50+CI         0.01938   0.01927   0.01947
 ⋅⋅⋅       ugriz+P50+CI+Q+U     0.01751   0.01747   0.01777
 ⋅⋅⋅       ugriz+B              0.02092   0.02089   0.02101
LRG–ELL    ugriz+Q+U            0.01345   0.01291   0.01420
 ⋅⋅⋅       ugriz+P50+CI         0.01334   0.01278   0.01426
 ⋅⋅⋅       ugriz+P50+CI+Q+U     0.01584   0.01439   0.01693
 ⋅⋅⋅       ugriz+B              0.01180   0.01175   0.01184
LRG–SP     ugriz+Q+U            0.01520   0.01404   0.01910
 ⋅⋅⋅       ugriz+P50+CI         0.01514   0.01474   0.01679
 ⋅⋅⋅       ugriz+P50+CI+Q+U     0.01957   0.01870   0.02285
 ⋅⋅⋅       ugriz+B              0.01737   0.01728   0.01765

Notes.
(a) MGS: Main Galaxy Sample (Strauss et al. 2002); LRG: luminous red galaxies (Eisenstein et al. 2001); SP: classified as spiral by Galaxy Zoo; ELL: classified as elliptical by Galaxy Zoo.
(b) u-g-r-i-z: 5 SDSS dereddened magnitudes; P50: Petrosian 50% light radius in SDSS i band; CI: concentration index (P90/P50); Q: Stokes Q value in i band; U: Stokes U value in i band; B: inputs from Table 2 of Banerji et al. (2010): CI, mRrCc_i, aE_i, mCr4_i, and texture_i.
(c) We quote the bootstrapped 50%, 10%, and 90% confidence levels as in Paper II for the root-mean-square error (rmse).
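To make the comparison in Table 1 easier to read, the snippet below transcribes the 50% rmse values and picks out the input set with the lowest value for each sample. The dictionary layout and variable names are illustrative; only the numbers come from the table.

```python
# 50% bootstrapped rmse values transcribed from Table 1.
TABLE1 = {
    "MGS-ELL": {"ugriz+Q+U": 0.01561, "ugriz+P50+CI": 0.01407,
                "ugriz+P50+CI+Q+U": 0.01641, "ugriz+B": 0.01679},
    "MGS-SP":  {"ugriz+Q+U": 0.01889, "ugriz+P50+CI": 0.01938,
                "ugriz+P50+CI+Q+U": 0.01751, "ugriz+B": 0.02092},
    "LRG-ELL": {"ugriz+Q+U": 0.01345, "ugriz+P50+CI": 0.01334,
                "ugriz+P50+CI+Q+U": 0.01584, "ugriz+B": 0.01180},
    "LRG-SP":  {"ugriz+Q+U": 0.01520, "ugriz+P50+CI": 0.01514,
                "ugriz+P50+CI+Q+U": 0.01957, "ugriz+B": 0.01737},
}

# For each sample, the input combination with the smallest 50% rmse.
best = {sample: min(rows, key=rows.get) for sample, rows in TABLE1.items()}

for sample, inputs in best.items():
    print(f"{sample}: {inputs} (rmse = {TABLE1[sample][inputs]:.5f})")
```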

The results using the isophotal shape estimates suggested by Banerji et al. (2010), as well as others tested in Paper II, are found in Figure 1 and Table 1. In Figure 2 we also show plots of the spectroscopic redshift versus the predicted photometric redshift for the inputs that predict the lowest rmse for each of the four data sets listed in Table 1. These are more impressive than one might initially guess. In Paper II we showed how adding additional bandpasses in the ultraviolet via the Galaxy Evolution Explorer (GALEX; Martin et al. 2005) could naively improve Photo-z estimation. The same was shown when using additional infrared bandpasses from the Two Micron All Sky Survey (2MASS; Skrutskie et al. 2006). However, the results were biased because neither GALEX nor 2MASS reaches the same magnitude or redshift depth as the full SDSS MGS or LRG samples. It is easier to get lower rmse estimates of Photo-z when you have a smaller range of lower redshifts to fit. For the MGS it is clear from the top two panels in Figure 3 that the Galaxy Zoo objects span a similar range of redshifts and r-band magnitudes. On the other hand the situation for the luminous red galaxies is not as straightforward. Looking at the bottom two panels of Figure 3, the large second bump at a redshift of z ∼ 0.35 and r ∼ 18 does not exist. The latter is logical because the Galaxy Zoo catalog was drawn from the MGS and hence there are no galaxies beyond r_petrosian = 17.77 (see Petrosian 1976 for details on Petrosian magnitudes) according to their selection criteria (Strauss et al. 2002).

Figure 1. Plots of root-mean-square error for a given number of galaxies per 50% bootstrap level with representative errors (10% and 90%). Top two panels: Main Galaxy Sample (elliptical and spiral); bottom two panels: luminous red galaxies (elliptical and spiral).

Figure 2. Plots of spectroscopic redshift vs. predicted photometric redshift for the input with the lowest rmse for each of the four given data sets shown in Table 1.

Our lowest rmse values come from galaxies categorized as ellipticals in the Luminous Red Galaxy Sample using the SDSS u-g-r-i-z bandpass filters and the isophotal shape estimates from Table 2 of Banerji et al. (2010): CI, mRrCc_i, aE_i, mCr4_i, and texture_i. These yield an rmse of only 0.01180, which we believe is the lowest calculated to date for such a large sample of galaxies measured in the bandpasses of the SDSS while also retaining a fairly large range of redshifts (0 ≲ z ≲ 0.25) and dereddened magnitudes (12 ≲ r_petrosian ≲ 17.77).

Taking a closer look at the kinds of inputs that improve the results by galaxy type can be interesting. It is clear from Table 1 that the Stokes parameters appear to work better for spiral than elliptical galaxies. The Stokes parameters measure the axis ratio and position angle of galaxies as projected on the sky. In detail they are flux-weighted second moments of a particular isophote:

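$$M_{xx} \equiv \left\langle \frac{x^2}{r^2} \right\rangle, \quad M_{yy} \equiv \left\langle \frac{y^2}{r^2} \right\rangle, \quad M_{xy} \equiv \left\langle \frac{xy}{r^2} \right\rangle \qquad (1)$$

When the isophotes are self-similar ellipses one finds (Stoughton et al. 2002):

$$Q \equiv M_{xx} - M_{yy} = \frac{a-b}{a+b}\cos(2\phi), \quad U \equiv M_{xy} = \frac{a-b}{a+b}\sin(2\phi) \qquad (2)$$

Figure 3. Redshift and r-band dereddened model magnitudes for the Main Galaxy Sample (top two panels) and luminous red galaxies (bottom two panels).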

The semimajor and semiminor axes are a and b while ϕ is the position angle. Masters et al. (2010) demonstrate the efficacy of using SDSS derived axis ratios in characterizing the inclinations of spiral galaxies. This is seen in Table 1, where the Stokes parameters offer the second best set of inputs when determining photometric redshifts for spirals. Both Stokes Q and U parameters also display a larger range of values in the spirals than in the ellipticals. The standard deviations in Stokes Q and U for spirals are 0.1877 and 0.1500 while for ellipticals they are 0.0596 and 0.0459. Hence they clearly offer more room for possible improvement in the former than in the latter.
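The link between the Stokes parameters and the projected shape can be made concrete with a short sketch. It assumes the self-similar-ellipse relations of Stoughton et al. (2002), Q = [(a − b)/(a + b)] cos(2ϕ) and U = [(a − b)/(a + b)] sin(2ϕ), with ϕ in radians; the function names are hypothetical and the inversion is only valid under that assumption.

```python
import math

def stokes_from_ellipse(a, b, phi):
    """Stokes Q and U for self-similar elliptical isophotes (phi in radians)."""
    e = (a - b) / (a + b)
    return e * math.cos(2 * phi), e * math.sin(2 * phi)

def ellipse_from_stokes(q, u):
    """Recover the axis ratio b/a and the position angle from Q and U."""
    e = math.hypot(q, u)               # e = (a - b) / (a + b)
    axis_ratio = (1 - e) / (1 + e)     # follows from solving e for b/a
    phi = 0.5 * math.atan2(u, q)       # position angle in (-pi/2, pi/2]
    return axis_ratio, phi

# Example: an ellipse with b/a = 0.5 at a position angle of 0.3 rad.
q, u = stokes_from_ellipse(a=2.0, b=1.0, phi=0.3)
ratio, angle = ellipse_from_stokes(q, u)
```

A face-on (circular) isophote gives Q = U = 0, which is consistent with the narrower spread of Q and U reported for ellipticals than for inclined spirals.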

One of the more surprising results is the difference in using the B inputs for the MGS versus LRG ellipticals. In the latter case these inputs give the lowest rmse results, while in the MGS elliptical case they give the worst. This could be because the surface brightness of the LRGs is more easily modeled by the B inputs than that of the MGS galaxies. The MGS ellipticals may still have clumps of star formation that can make the surface brightness more difficult to model than that of the more passive LRG ellipticals.

When comparing the MGS and LRG spirals, one stark difference is clear when utilizing the P50 (Petrosian 50% light radius in SDSS i band) and CI (Concentration Index = P90/P50) inputs shown in Table 1. In the MGS spiral case these additional inputs yield worse fits, whereas they are among the most useful in the LRG spiral case. This may indicate that MGS spirals are more diverse morphologically than LRG spirals. The P50 and CI inputs are incapable of helping to model the MGS spiral diversity and simply add noise rather than signal to the fits. Masters et al. (2010) point out that red spirals (read LRG type) will "be dominated by inclined dust reddened spirals, and spirals with large bulges." Note that this does not mean that LRG bulge dominated spirals are necessarily S0 galaxies (which would add to their diversity both morphologically and spectroscopically). Lintott et al. (2008) and Bamford et al. (2009) have both shown that contamination of S0s into spirals is only about 3% in the best case scenario. So again, perhaps P50 and CI can do a better job of modeling LRG spirals because they are less diverse than MGS spirals.

There are several outstanding issues with using this approach for studies that may utilize large samples of SDSS LRG derived Photo-z's (e.g., baryonic acoustic oscillations). The first is that the GZ1 catalog has only been able to classify ∼59% of the LRG galaxies as spiral or elliptical. This means that 41% of our sample cannot benefit from morphology knowledge when estimating Photo-z's. Second, the LRGs used herein do not go to the same depth (in redshift or magnitude) as the full LRG (r ≲ 19) catalog since the GZ1 is based on the MGS (r ≲ 17.77). Note also that the GZ1 morphology estimates get worse as one reaches the fainter end of the sample (Lintott et al. 2008). Third, the Machine Learning derived morphologies of Banerji et al. (2010) can only classify up to 90% as accurately as their "by eye" GZ1 training set. These constraints will have to be taken into account for any studies that attempt to utilize morphology in Photo-z calculations.

The Photo-z code used to generate the results in this Letter is available on the NASA Ames Dashlink Web site (https://dashlink.arc.nasa.gov/algorithm/stablegp) and is described in Foster et al. (2009).

Thanks to Jim Gray, Ani Thakar, Maria SanSebastien, and Alex Szalay for their help with the SDSS casjobs server and Jeffrey Scargle for reading an early draft. Thanks go to the Galaxy Group in the Astronomy Department at Uppsala University in Sweden for their generous hospitality where part of this work was discussed and completed. We acknowledge funding received from the NASA Applied Information Systems Research Program and from the NASA Ames Research Center Director's Discretionary Fund.

This publication has been made possible by the participation of more than 160,000 volunteers in the GZ project. Their contributions are individually acknowledged at http://www.galaxyzoo.org/Volunteers.aspx. Funding for the SDSS has been provided by the Alfred P. Sloan Foundation, the Participating Institutions, the National Aeronautics and Space Administration, the National Science Foundation, the U.S. Department of Energy, the Japanese Monbukagakusho, and the Max Planck Society. The SDSS Web site is http://www.sdss.org/.

The SDSS is managed by the Astrophysical Research Consortium for the Participating Institutions. The Participating Institutions are the University of Chicago, Fermilab, the Institute for Advanced Study, the Japan Participation Group, The Johns Hopkins University, Los Alamos National Laboratory, the Max-Planck-Institute for Astronomy, the Max-Planck-Institute for Astrophysics, New Mexico State University, University of Pittsburgh, Princeton University, the United States Naval Observatory, and the University of Washington.

This research has made use of NASA's Astrophysics Data System Bibliographic Services.

This research has also utilized the viewpoints (Gazis et al. 2010) software package.
