Steven A. Rodney and John L. Tonry 2009 ApJ 707 1064
doi:10.1088/0004-637X/707/2/1064
© 2009. The American Astronomical Society. All rights reserved.
Received 14 April 2009,
accepted for publication 20 October 2009
Published 2 December 2009
View usage and citation metrics for this article Get permission to re-use this article
Modern supernova (SN) surveys are now uncovering stellar explosions at rates that far surpass what the world's spectroscopic resources can handle. In order to make full use of these SN data sets, it is necessary to use analysis methods that depend only on the survey photometry. This paper presents two methods for utilizing a set of SN light-curve templates to classify SN objects. In the first case, we present an updated version of the Bayesian Adaptive Template Matching program (BATM). To address some shortcomings of that strictly Bayesian approach, we introduce a method for Supernova Ontology with Fuzzy Templates (SOFT), which utilizes fuzzy set theory for the definition and combination of SN light-curve models. For well-sampled light curves with a modest signal-to-noise ratio (S/N >10), the SOFT method can correctly separate thermonuclear (Type Ia) SNe from core collapse SNe with ≥98% accuracy. In addition, the SOFT method has the potential to classify SNe into sub-types, providing photometric identification of very rare or peculiar explosions. The accuracy and precision of the SOFT method are verified using Monte Carlo simulations as well as real SN light curves from the Sloan Digital Sky Survey and the SuperNova Legacy Survey. In a subsequent paper, the SOFT method is extended to address the problem of parameter estimation, providing estimates of redshift, distance, and host galaxy extinction without any spectroscopy.
Just prior to the turn of the millennium, Riess et al. (1998) and Perlmutter et al. (1999) used multi-epoch imaging surveys to show that the expansion of our universe appears to be accelerating. This extraordinary result was revealed with the careful analysis of just a few dozen Type Ia Supernovae (SNeIa). In recent years, supernova (SN) surveys such as ESSENCE (Miknaitis et al. 2007; Wood-Vasey et al. 2007) and SuperNova Legacy Survey (SNLS; Howell et al. 2005; Bronder et al. 2008) have begun to probe the nature of this dark energy with samples of a few hundred SNeIa. The next generation of SN surveys—such as Pan-STARRS,1 LSST,2 and JDEM3—will increase the survey yield by another order of magnitude, producing thousands of new SN detections each year. With this dramatic step forward, we will move beyond the saturation point of the astronomical community: all the time on all the telescopes of the world will not be enough for spectroscopic follow-up of the newly discovered SNe.
The success of these modern SN surveys is strongly dependent on our ability to accurately and precisely evaluate a SN based on photometric information alone. Poznanski et al. (2002) and Johnson & Crotts (2006) have demonstrated that SN colors can be used to effectively segregate SNe into the broad categories of core collapse (CC) and thermonuclear (TN) SNe. Once an object has been classified as a SNIa, the shape of the light curve can be used to determine an accurate luminosity distance (Phillips 1993). Extensions of this correlation to more refined parameterizations of the light-curve shape (Hamuy et al. 1996; Goldhaber et al. 2001) and to multi-color light curves (Jha et al. 2007; Guy et al. 2007) have been increasingly successful at measuring accurate SNIa luminosity distances. This combination of photometric classification and parameter estimation can be executed for rolling SN searches even with only early photometric data, as in the SNLS (Sullivan et al. 2006).
The use of exclusively photometric methods for analyzing large SN data sets raises a number of concerns. Sample purity is an important issue, especially because SNe of Type Ib and Ic are difficult to distinguish from SNIa. This can lead to significant biases in SNIa-derived cosmological parameter estimates (Homeier 2005). Also, without a spectroscopic redshift, one must be aware of the covariance between estimates of redshift and distance. When the parameters of interest are poorly constrained by the SN alone, one would like to have a mechanism for smoothly integrating supplementary data, such as host galaxy colors and morphology.
To address these concerns, one branch of light-curve evaluation techniques has turned to the use of Bayesian probability theory. Tonry et al. (2003) utilized the Bayesian Adaptive Template Matching (BATM) technique (the precursor to the programs presented in this paper) to determine luminosity distances by comparing observed SNIa light curves against a library of templates. Kuznetsova & Connolly (2007) introduced a Bayesian classification scheme designed to accurately determine SN types even with poorly sampled light curves, and Poznanski et al. (2007) presented the automatic Bayesian classifier, which uses a similar approach to classify SNe with as little as one epoch of data.
Barris & Tonry (2004) outlined one of the major advantages of the Bayesian approach to SN light-curve analysis. The Bayesian formalism allows the user to marginalize over any model parameter that is not relevant to the scientific question being addressed. For cosmological studies, one generally wants to compare redshift to luminosity distance, but for a SN rate study, the redshift may be a "nuisance parameter" that can be integrated away. In addition, the Bayesian formalism can provide a complete and accurate treatment of the uncertainties inherent in an imperfect classification program.
In this paper, we present two related frameworks for classifying SN light curves using templates with a fixed light-curve shape. The first of these is a probabilistic, Bayesian approach, and the second is based on fuzzy set theory. This paper is divided into eight sections as follows. Section 2 considers the necessary elements of a SN light-curve model, and describes how the template-based methods presented here are distinguished from other light-curve fitters by avoiding a parameterization of the light-curve shape. Section 3 explains the construction of a library of template SNe using photometric and spectroscopic data. Section 4 presents BATM, the Bayesian Adaptive Template Matching method. Section 5 discusses limitations of the Bayesian approach, and introduces the Supernova Ontology with Fuzzy Templates (SOFT) program. Section 6 describes the use of Monte Carlo simulations and archival SN light curves to verify the accuracy of these methods. Section 7 considers how additional models can be introduced to account for non-SN contaminants and to quantify classification uncertainty. Section 8 contains a final summary and conclusions.
↑ CloseWhen we observe a SN light curve, we are collecting information about the flux as a function of wavelength and time, fobs(λ, t). If we were to make a list of all the relevant parameters that influence the observed flux, we could divide them up into a set of "physical parameters"
and a set of "location parameters" θ. For TN SNe the physical parameter set would include things such as the total mass of ejecta, the mass of 56Ni, the metallicity of the progenitor, the type of companion star, the asphericity of the explosion, etc.:

The
vector could be broadened to include CC SNe as well by including characteristics such as the type of stellar explosion, the presence of hydrogen or helium envelopes, etc. If we had a complete model for all SN progenitor systems and a perfect SN explosion simulator, then we could define a translation function T that takes as input a parameter vector
and produces a function describing the intrinsic SN luminosity for any wavelength and time:

For both TN and CC SNe, the object's location in time and space distorts this intrinsic luminosity in a predictable way, and we observe the flux. This distortion is predictable because we have reasonably good models describing the way SN light is affected by cosmological distances, dust, and time, so we can write the observable flux as

where z is the redshift, dL is the luminosity distance, Aλ/(1+z) describes the host galaxy extinction, and tpk is the time of the light-curve peak. Hereafter, we will drop the λ dependence when discussing fluxes, under the requirement that flux comparisons are always to be done with fluxes from the same broadband filter. Throughout this paper, we use a simplified formulation of the four "location parameters:"

The first parameter is the redshift of the object (z), which determines what portion of the spectral energy distribution (SED) is observed in each bandpass, as well as setting the timescale of the light curve due to cosmological time dilation. For the second parameter, we have recast the luminosity distance using the parameter μe. We define μe as the difference between the true distance modulus at a given redshift and the value expected for an empty universe at that z:4

The third parameter is the time of the light-curve peak (tpk), which simply shifts the light curve along the time axis. Our final parameter represents the host galaxy extinction (AV), and is discussed in detail in Section 3.3. The parameter "AV" here is a proxy for a potentially complex extinction function that may vary with time and wavelength. Throughout this paper, we treat θ as a set of nuisance parameters, to be marginalized over for the purpose of getting a broad classification into one of the established SN categories. The task of extracting estimates of some parameters of interest (i.e., measuring z and μe for cosmology) is addressed in the companion paper.
The core function of any SN light-curve evaluation program is to predict the observable flux from various SN types, over an (infinite) range of possible location parameters θ. By comparing the model to observations we can make statements of probability regarding a candidate SN's class and location. So how does one define the model for predicting light-curve shape?
The fundamental problem is that there is a hidden space of physical parameters
that determine SN light-curve shape, and we do not have a theoretical model that can provide the translation of
into luminosity as in Equation (1). In parameterized SN light-curve fitters such as MLCS, SALT, and SiFTO, this problem is addressed by making the assumption that a set of shape parameters Δ can be used as a proxy for the unobservable
vector.5 Parametric fitters thus produce a single model M(Δ), which they assert is capable of representing any possible physical parameter set
by varying the parameter(s) Δ to modify the shape of the light curve. In general, the M(Δ) model is restricted to TN SNe, although in principle a model for CC SNe could be applied, although it would be more complex and less informative.
For BATM and SOFT, we take an alternative approach, in which we do not parameterize the light-curve shape in order to represent all (TN)SNe with a single model. Instead, we use a finite set of discrete models, each one representing the shape of a single well-observed SN from the local universe. Whereas the parametric approach essentially tries to fit an empirical function across the hidden
space, with BATM we are sampling the
space at discrete locations.
To build up our library of SN light-curve models for comparison against SN candidates there are three principal stages. The first step is the collection of a suitable range of light curves for all SN types. Second, we must fit a smooth interpolating function through the observed data. Finally, we use a spectral model, constrained by the photometry, to predict how each template SNe would look if they appeared at a different location θ.
We begin by collecting light curves of low redshift (z
0.1) SNe that are well sampled with data in all five of the standard Johnson–Cousins bandpasses, UBVRI. We place particular importance on U and B-band data, because these bands are redshifted into optical and near-infrared wavelengths at z
0.5. We also prefer templates with early-time photometry to constrain the rise time of the light curve. To avoid introducing biases due to uncertain distances, wherever possible we restrict our library to SNe that are in the Hubble flow (cz>2500 km s−1) or have an independent distance determination from Cepheids or surface brightness fluctuations. The complete set of light-curve templates are listed in Table 1.
Table 1. Template Light Curves
| Thermonuclear | Core Collapse | ||||
| Type | SN | References | Type | SN | References |
| Ia+ | 1990N | 22, 28 | Ib/c | 1994I | 40, 50, 54 |
| 1991T | 28, 1 | 1998bw | 44, 30, 12 | ||
| 1999aa | 17, 18, 1 | 1999ex | 45, 14 | ||
| 1999dq | 17 | 2002ap | 10, 11, 55 | ||
| 1999gp | 17, 19, | 2004aw | 48 | ||
| 2000cx | 6, 26, 17, 1 | IIb | 1993J | 25, 38, 3 | |
| Ia | 1981B | 5, 49 | 1996cb | 37 | |
| 1989B | 53 | 2008ax | 34 | ||
| 1994D | 39, 35, 31, 1 | IIn | 1999el | 7 | |
| 1996X | 41, 43 | 1994Y | 15 | ||
| 1998ab | 17 | 1998S | 29, 9 | ||
| 1998aq | 42 | IIL | 1979C | 56 | |
| 1998bu | 16, 46 | 1980K | 4 | ||
| 1999ee | 45 | IIP | 1999em | 24, 8 | |
| 2000ca | 20 | 1999gi | 23 | ||
| 2000cn | 17 | 2004et | 32 | ||
| 2000dk | 17 | ||||
| 2000fa | 17 | ||||
| 2001V | 51, 21 | ||||
| 2002bo | 2, 47, 20 | ||||
| 2002er | 36 | ||||
| 2005cf | 52 | ||||
| Ia− | 1998bp | 17 | |||
| 1998de | 17, 33 | ||||
| 1999by | 13 | ||||
| 2002cx | 27 | ||||
Because of their utility for cosmological studies, the literature is replete with examples of well-sampled TN SN light curves. Most of the TN SN light curves in Table 1 were acquired from the collection of Jha et al. (2007), and their original references are listed in the table. For host galaxy extinctions and distance estimates, we have used the compilation of Reindl et al. (2005), or the listed reference. We have divided the TN SN into three subclasses. The so-called "Branch-normal" Type Ia SNe (Branch et al. 1993) are listed as simply "Ia." Overluminous SNe are grouped into the "Ia+" subclass. This includes objects that have been labeled as SN 1991T-like and SN 1999aa-like, as well as the unique SN 2000cx. The "Ia−" subclass includes underluminous objects of the SN 1991bg-like variety, as well as the very peculiar SN 2002cx. One of the strengths of the template-based approach presented here is the ability to seamlessly integrate very peculiar objects such as SN 00cx and 02cx into the light-curve fitting procedure.
For CC SNe, the pool of quality templates is distinctly shallower. Table 1 lists the light-curve templates for five CC SN subclasses. Each CC type is a very heterogeneous set in terms of peak brightness, light-curve shape, and colors. With this in mind, we do not pretend that just a handful of CC SN light-curve templates can be truly representative of an entire class. Nevertheless, the collection listed in Table 1 is sufficient for the modest goal of distinguishing between the CC and TN classes. Type Ib and Ic objects are most likely to be misclassified, because of their photometric similarity to Type Ia SNe. Our template library includes three examples of typical Ib/c SNe, as well as two of the very high-energy "hypernova" type (SN 1998bw and SN 2002ap).
With the observed multi-color light-curve points in hand, we do not yet have a functional template, because the discrete observation points leave large gaps in time where the template flux is as yet undefined. To convert the observed data points into a continuous function we perform a least-squares fit with a natural cubic spline curve in each bandpass. These UBVRI spline curve fits provide excellent interpolation of the true light curve, as shown in Figure 1 for the SN1998aq template. The residuals of the observations around the spline fits are consistent with the observational errors. For any point in time, the template spline fits provide UBVRI magnitudes with uncertainties on the order of 0.02 mag or less.

Figure 1. Template spline fits to UBVRI light curves for SN1998aq are shown on the left side. Light-curve points are shown as crosses, and the spline curve is given as a solid line. The light curves for each filter have been shifted in magnitude for visibility. Residual plots in the panels on the right side show the magnitude difference between the observed light-curve points and the spline fit. The very small and largely uncorrelated residuals demonstrate that interpolation with this fit can accurately reproduce the true SN colors at times where there are no observational data.
Standard (69 KB)High-resolution (80 KB)
We would now like to adapt these low-z UBVRI light curves to a grid of points in the four dimensions of location parameter space θ = (z, μe, tpk, AV), while simultaneously translating them into the filters used for observing the candidate object (for this paper we have used griz filters from the Pan-STARRS survey for our Monte Carlo simulations, plus the g'r'i'z' filter set from Sloan Digital Sky Survey (SDSS)). To carry out this process perfectly, we would need to have a complete spectrum of the template object at all points in time (Figure 2). This ideal situation is unattainable, so we must approximate it by collecting spectra from many similar objects and piecing them together. For the TN SNe we use the composite spectra of Nugent et al. (2002)6 and Hsiao et al. (2007).7 For CC SNe we use the collection of spectra employed by Blondin & Tonry (2007).8

Figure 2. In the ideal scenario for a template library, each template would have complete spectrophotometric information, providing the relative flux value at any wavelength and any time. The thin light gray ribbon (orange in the electronic version) traces a slice across the time axis, which provides a single spectrum from t = 5 days past peak. The broad dark gray (blue) slice indicates the V bandpass. Integrating over the region where the t = 5 ribbon overlaps the broad V band would provide a single data point for a template light curve.
Standard (46 KB)High-resolution (53 KB)
These spectral compilations provide a complete SED at a set of discrete points in time, but we want to be able to provide a template magnitude at any point in time. To do this, we can use our temporally complete photometric model (i.e., Figure 1) to constrain the time evolution of our sparsely sampled spectral model in four steps:
| 1. | Correct the SED for extinction. |
| 2. | Iteratively warp the SED until the integrated broadband fluxes match the UBVRI flux values drawn from the template spline at the time the SED was taken. |
| 3. | Redshift the warped SED to the desired z. |
| 4. | Integrate the SED over the bandpass of interest to get a new broadband magnitude. |
This process is repeated for every available SED, producing a new set of multicolor light-curve points that represents what the input template would look like if it appeared at the given redshift z and were observed in the given bandpass. At the end of this process, we have effectively translated the UBVRI input data from our templates in Table 1 into griz points at any redshift, using the SED's as the bridge between.
Now each of these high-redshift griz light-curve models has a discrete point in each bandpass for every one of the 60+ spectral templates. The dates of observation for the candidate light curves will not coincide with the times of these discrete points, so we once again turn to spline interpolation to fill in the gaps. The same process of least-squares spline fitting that was used for the low-z UBVRI light curves is now carried out for each redshifted griz light curve. Figure 3 shows the set of spline curves created from warping the SN1998aq template to 15 redshifts. These resulting spline curve templates are "adaptive" in that they can be shifted up or down to accommodate a different value of μe, reddened to account for host AV, and shifted left or right for different values of tpk.

Figure 3. Subset of the family of redshifted light-curve templates derived from the SN1998aq parent light curves shown in Figure 1 and the SEDs shown in Figure 2. All light curves are in the PS1 i band with no shifts applied for extinction. The light curves are spaced in redshift from z = 0.1–1.5 with steps of 0.1, with highlighted (red) curves at z = 0.5, 1.0, and 1.5. Each light curve has been dimmed appropriately for its redshift, as if it were in an empty universe.
Standard (57 KB)High-resolution (66 KB)
Note that when applying a redshift in step 3 we avoid making any assumptions of cosmological parameters by applying the redshift as if the template SN were in an empty universe (ΩM = Ωλ = 0). Because we have also defined the distance modulus μe relative to an empty universe (Equation (3)), it becomes a trivial addition operation to adjust any redshifted template to a new luminosity distance μe.
↑ CloseDust extinction in a SN host galaxy can mimic the dimming of a light curve due to distance. This adds additional scatter—or possibly a systematic bias—to any relations derived from SN distances. Given that one primary goal of our SN light-curve analysis program is to enable cosmology with photometry, BATM and SOFT must be able to disentangle these very similar effects from analysis of the light-curve shape alone.
For an arbitrary bandpass X, the host galaxy extinction is given by AX = (AX/AV)*AV magnitudes. The visual extinction AV is a free parameter in our models, so we need to determine the ratio AX/AV for each bandpass of interest in order to synthetically redden our template spline curves to match observations. As described above, each template light curve is constructed from a collection of spectra that have been appropriately redshifted and warped to match the template's photometric light curve. As the template library is being built, we integrate each of these redshifted template spectra over the bandpass X to get an unextinguished magnitude m0. We then apply the RV = 3.1 extinction law of Cardelli et al. (1989) to the spectrum and repeat the spectral integration to get a reddened magnitude mred. The reddening for our bandpass X is now defined as AX ≡ (mred − m0). This procedure is repeated for several input values of AV to get a sampling of AX(AV), from which we can compute the slope AX/AV. Each light curve in the template library then carries with it a measurement of the AX/AV ratio, allowing for AX to be computed very efficiently for any physically appropriate value of AV.
Nugent et al. (2002) and Jha et al. (2007) have measured the time dependence of extinction in Type Ia SN light curves, demonstrating that AX can vary by a few percent over the first few months after explosion (reaching as much as 5% in the I band at early times). This low-level reddening variation should be included in our handling of the SN templates, but further work is needed to quantify this effect on CC SN light curves. For the present work, we use the median AX value at all times.
An additional concern is whether the ratio of total to selective extinction, RV, may have substantial variability from galaxy to galaxy or SN to SN. When the extinction due to dust in the host galaxy of a SN is low, we can safely ignore the effects of a non-standard extinction law. However, any highly reddened SNe with a different host RV will produce systematically biased parameter estimates. This can easily be addressed within the BATM/SOFT framework by allowing RV to be a free parameter (which can be marginalized over). To simplify the Monte Carlo testing presented in this paper, all of our simulated light curves are generated with a fixed extinction law that is set to the Milky Way value of RV = 3.1.
Furthermore, the extinction law for SNe may change at higher redshift due to increased rates of star formation, and it is possible to account for this effect in the choice of prior (Holwerda 2008). Additionally, selection effects lead to lower detection rates of highly extinguished SNe at high z. Accordingly, the AV prior should change with z, although in our verification tests in Section 6 we employ a single AV prior at all redshifts. For testing with synthetic SNe, the AV distribution can be dictated in the data simulation so the use of an AV prior that is constant with z is appropriate. A more complex AV prior would be preferable when applying these methods to high-redshift SN surveys.
↑ CloseTo develop the probabilistic framework of the BATM method, let us start with a candidate object called "SN X." Suppose we have N data points in the light curve of SN X for which we have measured the time t, flux f, and flux uncertainty σ, so that the ith data point is given by Di = (ti, fi, σi). Each model Mj at each location θ gives a single deterministic prediction for the observable flux as a function of time:
. To begin with, we assume that there is no uncertainty inherent to the model (we will reexamine this assumption in Section 5). Thus, if the physical model Mj and location parameters θ are correct, then the observed fluxes of our candidate SN X should deviate from the model only by random observational errors. We assume those errors have a Gaussian distribution, so the probability of observing the ith data point is just the probability of the error term:

If the errors are independent, then we can compute the likelihood9 of a match between SN X and model Mj at location θ as:

To determine the net likelihood of model Mj, we multiply Equation (5) by p(θ|Mj) – the prior probability distribution for location parameters (see Section 4.1)—and integrate over all possible vectors θ:

If we have only one model available, or if we assume that the model Mj is correct, then we can use Bayes' theorem to define the posterior probability as a function of location θ:

In our case, however, we are considering a list of many applicable models. To evaluate the success of each model in reproducing the data, we must repeat the calculation of Equation (6) for all available models Mj. We can then apply Bayes' theorem to define the Posterior Model Probability (PMP), or the "model weight," as

Each PMP is a scalar value between 0 and 1 that indicates the degree to which a given model Mj is preferred by the data D, relative to all other models used. The PMPs from all models can be combined with posterior probabilities from Equation (7) to get model-weighted parameter estimates, using the method of Bayesian Model Averaging (Hoeting et al. 1999). The problem of parameter estimation is addressed in more detail in a companion paper, while for the remainder of this discussion we restrict ourselves to the task of model selection.
As in all Bayesian applications, the choice of prior probabilities must be undertaken with some care, to avoid introducing unintended biases. For Equations (6) and (8) we need to define two types of priors: (1) a scalar prior for each model Mj, based on the expected rate of SNe in each class; and (2) a prior probability distribution over θ, reflecting our initial understanding of the allocation of SNe throughout the universe.
The model prior p(Mj) should describe the a priori probability that a SN matching the model Mj could be discovered by our survey, regardless of location θ. We have set the model priors for each SN subclass in Table 1 according to the following formula:

The first factor,
, is the rate of SNe from the subclass containing model Mj, as derived from past surveys (we have used the compilations of Blanc & Greggio 2008 and Cappellaro et al. 1999). This intrinsic rate is then scaled by Fdisc, the fraction of objects from this class that would be detected by our survey, which we derive using Monte Carlo simulations of survey operations. Finally, we divide by a dilution factor, Nclass, to account for the presence of multiple discrete models representing the same subclass (George 1999). Suppose the prior probability for the II-P class as a whole is
, and we have three templates in that subclass. If we assign each of the II-P templates a model prior of p(Mj) = 0.2, then the total prior weight in the II-P class is artificially inflated to 0.6. Thus, to avoid introducing a bias due to having different numbers of models, we divide by the number of templates within each class, Nclass.
If we assume that the four location parameters θ = (z, μe, AV, tpk) are uncorrelated, then we can recast the location prior as a product of constituent parts:

Note that the tpk prior is independent of the model choice, because the calendar date of a SN explosion is not influenced by the type of SN event. All other location priors could be dependent on the model or class of models under consideration (e.g., the prior for host galaxy extinction might be different for CC versus TN SN). Whenever possible, these priors should be informed by spectroscopy and host galaxy photometry. In the absence of such information, a uniform prior may be applied. In all cases, it is important to scale each prior probability distribution so that it integrates to unity over the range of values that BATM has access to.
In the verification tests presented in Section 6 we have employed a uniform prior for the redshift p(z|Mj), the cosmological dimming p(μe|Mj), and the time of light-curve peak p(tpk). For the host galaxy extinction prior p(AV|Mj) we have used the "galactic line-of-sight" (glos) prior employed by the ESSENCE team (Wood-Vasey et al. 2007), which is based on the host galaxy extinction models of Hatano et al. (1998), Commins (2004), and Riello & Patat (2005). The form of this prior is an exponential plus a truncated Gaussian, expressing the preference for matches that select a low AV value.
↑ CloseTo extract a classification probability with BATM is now straightforward. We simply compute the PMPs for all models, then sum together the PMPs from all models belonging to the class of interest. For example, to determine the probability that an object is of Type Ia:

where the sum is over just those templates in the Type Ia subset, {Ia}. The classification probability for any other class or subclass can be found in the same way, and these normalized classification probabilities are related through simple additive statements:



In the preceding section, we laid out the BATM framework for probabilistic comparison between SN light-curve data and template-based models. The alternative fuzzy set-based method described here (SOFT) is motivated by a careful consideration of the assumptions required by BATM. In order to apply the tools of probability theory in BATM, we invoked three key assumptions.
| 1. | Each SN model Mj provides an exact deterministic prediction for the flux as a function of time. |
| 2. | The set of all models {M} is complete: all possible light-curve shapes are represented. |
| 3. | The set {M} is non-redundant: a candidate SN cannot be a true match to more than one Mj. |
These assumptions would be justifiable if we were using a parameterized SN model that ostensibly covers the entire hidden range of physical parameters
. However, the premise breaks down when we use a discrete set of templates with fixed light-curve shapes.
In the language of set theory, we can define a universal set U that contains all possible SNe with any physical parameters
and any possible location θ. Each of our discrete set of models Mj defines a subset Sj ⊂ U containing all possible SNe that have precisely the physical parameters
j, observed at any location θ. These subsets are classical or "crisp" sets; in that membership in the set is a binary condition: either you are in or you are out. This is in keeping with the determinism of assumption (1) above, which was used to derive Equations (4) and (5). From this we can immediately see that any discrete and finite set of models cannot satisfy both assumptions (1) and (2): if the models are presumed to be deterministic, then there will always be some subset SX ⊂ U that is not represented.
How does this contradiction affect our classification estimates? Suppose we have an extremely sparse template library, consisting of only two template models, M1 and M2, which have underlying physical parameters
1 and
2.10 Now let us observe a candidate object, SN X, which has physical parameters
X that are intermediate between
1 and
2 (e.g., perhaps the candidate has a 56Ni mass that is bracketed by the two templates: MNi,1 < MNi,x < MNi,2). SN X is not a member of either of the crisp subsets S1 and S2 defined by our two models, so the classification likelihood of Equation (6) will be very small for both M1 and M2. If the data quality is high (σi → 0), then the computed likelihood of a match can become vanishingly small, and we will be unable to compute a meaningful classification probability. This is precisely the opposite of what we expect from a respectable SN classification algorithm: high signal-to-noise observations should lead to more precise classifications, not more classification failures.
The problem we are exposing is a fundamental limitation of a template-based Bayesian classifier: one cannot reliably classify SNe that fall into the gaps between discrete templates when constrained by our three core assumptions.
One possible means of addressing this problem of discrete models is presented by the field of logical analysis known as "fuzzy set theory" (Zadeh 1965). To use this mathematical framework, we must first adjust our models so that they define "fuzzy sets," whose membership is not crisply defined. Some examples of fuzzy sets in the realm of astronomy are categories such as "high-mass stars" or "rich clusters." The semantic meaning of these groups is clear enough, but determining whether a certain object belongs within the set is somewhat ambiguous. A 100 M☉ star certainly belongs within the "high-mass" group, and a 0.5 M☉ star does not, but does the membership line get drawn at 2, 5 or 10 M☉? For our purposes, we want to explicitly introduce some fuzziness into the definition of our models, so as to ensure that any candidate SN X has some degree of membership in at least one of the subsets Sj. As discussed below, we will need a new method for combining the evidence from multiple fuzzy models when deriving a composite parameter estimate.
Membership in a fuzzy set A is defined by a membership function g(x|A) which takes any possible set member x and assigns it a "membership grade," which is a real number in the range [0,1]. For crisp sets, the membership function is binary, always providing g(x|A) = 1 for set members, or g(x|A) = 0 for non-members. For fuzzy sets, a membership grade close to unity indicates a strong association with the set, while g(x|A) ~ 0 suggests the opposite.
To see how we might define a membership function for fuzzy SN sets, let us consider the family of template light curves derived from SN 1998aq, as shown in Figure 3. The actual SN event 1998aq had some unknown set of particular physical parameters
98aq. The model we constructed from the 1998aq light-curve template allows us to produce the precise light curves of Figure 3, which predict how a SN with identical physical parameters
98aq would appear if it were observed at a different location θ. Suppose we introduce a small perturbation of the physical parameters,
, perhaps by adding a little more 56Ni or allowing for a bit more mixing in the ejecta. This makes a slightly different explosion, with physical characteristics now given by

Recalling the underlying assumption that changes in the physical parameters will be reflected in changes of the light-curve shape, we can assume that the small physical perturbation δΦ results in a small change to the observable flux:

This new explosion would fall outside of the crisp set defined by the SN 98aq model, but we can include it within an expanded fuzzy set that is centered on SN 98aq.
What form should the membership function for this fuzzy set take? In words, the membership function will define the fuzzy set of "All SNe with light curves similar to Mj at location θ." To quantify similarity in this statement, we introduce σ'j as a "model fuzziness" term. The σ'j term has units of flux, and it defines the width of a Gaussian distribution around the model Mj that indicates how much flux deviation from the core model is allowed for members of this fuzzy set.11 Consider our candidate SN X, and for the moment let us ignore observational errors (σi ~ 0 for all N data points). If we assume the fuzzy model Mj and location θ to be correct, then the probability of getting the observed flux discrepancies between the model and the data due to model uncertainty alone is

This mimics Equation (5), which provided the likelihood of observing a light curve D based on observational errors only. To combine these two probability estimates we convolve Equation (5) with Equation (12) (e.g., Gregory 2005, Section 4.8) to get

We can then modify this likelihood to account for our priors on θ and Mj, which produces a final likelihood estimate suitable for use as the membership function of a fuzzy set:

This, then, is the membership function for our candidate SN X in the fuzzy set defined by (θ, Mj). This fuzzy set contains all possible SNe that are similar to the fuzzy model Mj and are also at the precise location θ.
These membership functions are very similar to the probabilities used in Section 4, but there are some important distinctions. First, consider that in classical probability theory when one assumes a given model Mj, it is then required that the posterior probability over θ must integrate to unity:

This requirement is enforced by the presence of the normalization factor in Equation (7). It is not necessarily appropriate to apply the same normalization to these fuzzy membership functions.
Furthermore, when we apply Bayes' theorem to compute the PMP in Equation (8), the denominator ensures that the sum of all PMPs is equal to unity:

This normalization is not a feature of the fuzzy set membership functions:

The reason for this difference is that fuzzy sets can overlap, and therefore they do not satisfy the Kolmogorov axioms.12 In our fuzzy sets, a candidate SN can be similar to template M1 while also being similar to M2. Thus, any given candidate can have nonzero membership grades in multiple fuzzy sets, and the sum of membership grades can be greater than or less than unity.
↑ CloseBy comparing the light curve of our candidate SN X against all models Mj, we can derive membership grades g(DX|θ, Mj) for all possible values of θ. We would now like to combine the results from all the TN SN model comparisons to get a composite membership grade for the fuzzy set containing all TN SNe. After also computing a composite fuzzy membership grade for the CC SN class, we could compare the relative strengths of membership to determine a classification. How do we go about combining multiple fuzzy set membership functions from Equation (14)?
Zadeh (1965) in his seminal paper on fuzzy set theory proposed the following rules defining the "fuzzy union" and "fuzzy intersection" of two fuzzy sets:

Zadeh's combination rules are illustrated graphically in the left two panels of Figure 4. Bellman & Giertz (1973) provided an axiomatic framework for fuzzy set operators that justifies Zadeh's min/max rules as the purest possible operators. That is, the max function provides the most inclusive fuzzy OR, while the min function is the most exclusive AND operator. However, the min/max functions are not the only allowable operators. Several classes of functions can satisfy the required axioms while also allowing some variation in the level of inclusiveness or exclusiveness (e.g., Hamacher 1978; Yager 1980; Dubois & Prade 1980b). These operators are called interactive because rather than choosing just one of the membership grades g(X|A) or g(X|B) as the min/max operators do, these functions allow both the A and B functions to contribute simultaneously. For further discussion of the axiomatic requirements of fuzzy set operators and comparisons of some interactive classes, see Section II.1 of Dubois & Prade (1980a) and Chapter 2 of Klir & Folger (1988).

Figure 4. Demonstration of combination rules for fuzzy sets. In the upper left panel are shown three hypothetical fuzzy set membership functions g1, g2, and g3. The remaining three panels show different options for the combination of these fuzzy sets. In each case, the aggregate fuzzy set created by applying the conjunction operator (OR) is shown as a (green) dashed line: g1
g2
g3. The disjunction operator (AND) appears as a filled curve (blue), showing g1
g2
g3. These combined sets are displayed with arbitrary normalization for clarity. Upper right: if the membership functions were treated as classical probability functions, then one would use the sum operator for OR, with the product operator for AND. Lower left: applying the Dombi functions for fuzzy set operators with q = ∞, we recover a special case where OR is the max function, and the min function serves as the AND operator. Lower right: a small value of q = 0.2 emphasizes agreement between fuzzy sets. The resulting union and intersection sets are less sharply peaked, and less strongly influenced by outliers.
Standard (48 KB)High-resolution (56 KB)
Dombi (1982) derived a family of fuzzy set operators that is general enough to encompass the operators of Zadeh (1965), Hamacher (1978), and Yager (1980) as special cases. Let a and b represent the membership grades of our candidate object X for the fuzzy sets A and B, respectively. Then the Dombi membership function for X in the fuzzy union set A
B is

and for the fuzzy intersection A
B Dombi has

where the parameter q is in the range (0, ∞). Figure 4 shows the shapes of union and intersection sets resulting from a combination of three fuzzy set membership functions. The q parameter governs the strength of interaction between the two membership function arguments A and B. In the limit where q →∞, the Dombi union function reduces to Zadeh's max operator, while the Dombi intersection becomes Zadeh's min function. As the value of q decreases, these fuzzy operators become progressively "softer," allowing more and more interaction between the sets.
↑ CloseWith the fuzzy SN models and combination operators described above, we now have a complete mathematical framework for combining fuzzy sets. To get here, we have introduced two new parameters: σ'j describes the amount of fuzziness in our SN models, and q controls the amount of interaction allowed when combining sets. These parameters are actually characteristics of the template library, and may be estimated empirically by inter-comparison of templates.
If our template library consisted of just two or three models with substantially different physical characteristics
, then we would need to make them very fuzzy (a very large σ'j) in order to fill the space between models. With a few hundred templates, the need for fuzziness would be reduced, so σ'j could be very small. To calibrate the amount of fuzziness required for our template library, we measured the flux difference between pairs of templates belonging to the same SN subclass, at the same location θ. Taking a median value from all the possible pairwise comparisons, we determined that setting σ'j to be 10% of the template flux (approximately 0.1 mag) is an appropriate degree of fuzziness for the set of TN SN templates listed in Table 1. For the set of CC SNe, with fewer templates and greater heterogeneity, we found that slightly more fuzziness is necessary, at the level of
.
As shown in Figure 4, setting q = 0.2 results in a fuzzy union (dashed line) that emphasizes the overlap between sets. In this sense, it is a sort of hybrid between the strict interpretations of the classical OR and AND operators. Thus, the q = 0.2 fuzzy union achieves the goal stated at the top of Section 5.2 in that it indicates how much a candidate SN X is similar to any or all of the available models.
A more rigorous empirical calibration of q could be done using a training set of SNe with known locations θ. After comparing the training set against the template library, one would adjust the q parameter so that fuzzy model combination provides the most accurate estimate of the true location. Such a detailed calibration of q is beyond the scope of this work, but we have performed preliminary tests using TN SN light curves from the SDSS (Holtzman et al. 2008). This rough, empirical calibration showed that q values in the range (0.1,1.0) provide acceptable results with this template library.
↑ CloseWe now have all the necessary tools for executing the SOFT method on SN light curves. The procedure is as follows.
| 1. | For each template Mj, at every location θ, subtract the model flux from the observed flux of the candidate fi(t). |
| 2. | Using Equations (13) and (14), determine the fuzzy set membership grade of this candidate over all available parameter space. |
| 3. | Collect membership functions into groups according to SN subclass, excluding any that are zero everywhere. |
| 4. | Combine the membership grades within each subclass using the fuzzy union operator of Equation (16). |
The result is a final composite membership function for each SN subclass, g(D|θ, Y), giving the degree of membership for SN X in subclass Y as a function of location θ. The peak of the distribution g(D|θ, Y) provides an estimate of the true location θ, and the width of g(D|θ, Y) indicates the uncertainty of that estimate (the forthcoming companion paper will present a complete discussion of parameter estimation with fuzzy templates). The integral of g(D|θ, Y) over θ gives a classification grade for that particular subclass.
Given the similarity between fuzzy set classification grades and Bayesian posterior model probabilities, one might be tempted to normalize each subclass grade by the sum over all subclasses, in a manner similar to Equation (8). In general, this is not appropriate, because fuzzy subclasses are not mutually exclusive, and therefore do not follow the Kolmogorov axiom of additivity: PA
B
C = PA + PB + PC. Any subclasses that might have some membership interaction should be combined using Dombi's fuzzy intersection or union operators. The one useful exception arises when comparing relative classification grades for the TN and CC SN classes. In that case, we can assert that these two classes represent fully isolated sets, because there is a significant physical difference between any Type Ia and any CC SN. Thus, the TN and CC fuzzy sets can be treated as crisp sets and combined additively.
The normalized membership grade for the TN SN is analogous to the PMP of Equation (8). We call this quantity the Posterior Membership Grade (PMG), and calculate it as follows:

where
indicates the fuzzy union of membership grades from all templates in the TN SN class. The corresponding classification grade for CC SNe, PMGCC, is defined in the same manner.
To evaluate the utility of the SOFT template matching technique, we first generated a Monte Carlo simulation containing 5000 synthetic light curves for each of the TN SN and CC SN classes, evenly distributed over the subclasses. Each synthetic light curves was created using a template that was prepared in the way described in Section 3. To begin, the template was warped and shifted to a randomly selected point in the bounded four-dimensional parameter space of θ = (z, μe, AV, tpk). We then generated an "observed" griz light curve for each synthetic object, using a simulation modeling 120 nights of the Pan-STARRS 1 Medium Deep survey. In terms of wavelength coverage, cadence, and depth, this simulation is roughly intermediate between the SDSS-II SN Survey (Sako et al. 2008) and planned surveys from projects such as the Large Synoptic Survey Telescope (Tyson 2002).
The location parameters θ = (z, μe, AV, tpk) were drawn from a uniform distribution over a broad range of parameter space:

By choosing a uniform distribution in z, μe, and AV, these Monte Carlo simulations are clearly not depicting a realistic survey yield, where the rate of detection depends on cosmology and the underlying SN explosion rate. The purpose here is to determine whether there are regions of parameter space where the SOFT methods are poor classifiers, which could lead to sample selection biases.
Our second data set for testing classification accuracies is a collection of 222 published light curves from the SDSS-II and SNLS surveys. The SDSS light curves, from Holtzman et al. (2008) consist of 146 objects that have spectroscopic redshifts ranging from 0.01 to 0.45. All of these objects were classified as Type Ia, although 16 were given as "SNIa?" classification, indicating some uncertainty in their designation as TN SNe. These SNe are well sampled in the Sloan u'g'r'i'z' bands, with observations done in all five filters on a single night, a typical cadence of less than five nights, and sensitivities of around 22 mag.
From the SNLS we include 71 objects from Astier et al. (2006) with spectroscopic redshifts ranging from 0.25 to 1.0. Fifteen objects in this set were given the classification "Ia*" to indicate that the best spectral match is a SN Ia model, but that an alternative classification cannot be ruled out. The griz light curves of these 71 SNe have a similar sampling frequency to the SDSS set, with typically 8–15 observational epochs over 100 days in three or more filters. Finally, we include an additional five SNLS light curves from Nugent et al. (2006) that are classified as Type II-P SNe. These objects have redshifts between 0.1 and 0.3.
In Figure 5, we show two examples of the real SN light curves used in these verification tests. The first is SN 2005jb from the SDSS survey, which has a spectroscopic redshift measurement of z = 0.258. The second example is SNLS-03D4cz, at a redshift of 0.695, from the SNLS. Over-plotted lines in this figure show the SOFT template match that produces the maximum likelihood for each of these light curves.

Figure 5. Two examples of the maximum likelihood SOFT template matches for real SN light curves from the verification data. Left: SN 2005jb from the SDSS survey, with a measured redshift of z = 0.258. The best-fitting light curve from the SOFT library is based on the normal Type Ia SN 1996X. Note that the u' band is not shown. Right: SNLS-03D4cz from the SNLS. The spectroscopic redshift of this object is z = 0.695, and the maximum likelihood match from SOFT is based on SN 2000cn.
Standard (42 KB)High-resolution (108 KB)
The classification results from our Monte Carlo simulations are shown in Figures 6 and 7 for the TN SN and CC SN model templates, respectively. The left-hand panel of each figure shows the z–μe plane divided into a regular grid. In each box on the grid, we collected all the synthetic light curves whose input parameters (z, μe) fell within that box. There are 400 boxes, so each box contains roughly a dozen of the 5000 synthetic light curves, and each of those dozen light curves has a slightly different μe and z, and perhaps a very different set of input values AV and tpk. Thus, each contributing synthetic SN light curve yields a different set of classification grades PMGTN and PMGTN. Combining all contributing light curves for a given box, we can compute the average classification grade for each of the two principal classes. The resulting classification grade image shown in Figures 6 and 7 is black in regions where most synthetic SNe are classified as TN SNe, and turns white in regions where most are found to be CC SNe. Another representation of these data is given in Figure 8, which shows the average classification grade as a function of redshift for both the TN and CC model sets.

Figure 6. Results from the synthetic light-curve classification tests. Using TN SNe in Table 1 as models, we constructed 5000 synthetic light curves spread uniformly over the parameter space. For each light curve we used the SOFT method to calculate PMGTN and PMGCC, the normalized composite membership grades for the TN SN and CC SN fuzzy sets, respectively. We then divide the parameter space of μe and z into bins of size Δμe = 0.05 mag and Δz = 0.05. For each bin we collect all the synthetic light curves that had input values of μe and z drawn from that region of parameter space, and then compute the average membership grade returned from that bin. We then set a grayscale value for each bin, with black (red in the electronic version) indicating that the average PMGTN for a bin is 100%, and white (blue) indicating PMGCC = 100%. In the right panel, we have done the same binning and averaging, this time with the y-axis marking AV.
Standard (33 KB)High-resolution (68 KB)

Figure 7. Classification probabilities, as in Figure 6. In this case, we generated 5000 CC SN light curves with random parameter vectors using the CC templates from Table 1 and once again computed the normalized fuzzy set classification grades using SOFT. As before, black (red in the electronic version) indicates PMGTN = 100% and white (blue) indicates PMGCC = 100%.
Standard (31 KB)High-resolution (63 KB)

Figure 8. Mean classification probabilities as a function of redshift. Here we have summed over all μe and AV values, collapsing the grids from Figures 6 and 7. The left panel shows classification results for 5000 synthetic light curves based on TN SN models. The right panel shows the CC SN results. In both panels, the (orange) circles show the average PMGTN value as a function of z, and the (blue) triangles indicate the average PMGCC. In each case, the dotted lines show mean classification grades from all 5000 synthetic SNe. Dashed lines are from a subset of SNe that have a peak S/N above 3. The solid lines are constructed from SNe with S/N above 10.
Standard (68 KB)High-resolution (68 KB)
In Figure 6, the input light curves are based on the TN model subset, so a perfectly accurate classifier would produce a completely black image, by returning a 100% probability that every synthetic light curve is a TN SN. The SOFT classifier is not perfect, so most of the boxes in the left panel have average PMGTN values that are less than 100%, particularly in the upper right corner, for high values of redshift, distance, and extinction. In Figure 7, the regions where the input CC models are imperfectly classified with some non-zero value for PMGTN appear gray instead of white. Almost all of this contamination arises from light curves based on Type Ib/c models.
Of the 5000 light curves that contribute to Figure 6, SOFT calculates a PMGTN classification grade that is greater than the PMGCC grade for 76% of them. To limit the false positives and false negatives from the classification results, we can apply a simple cut to remove the most problematic objects by rejecting light curves with a peak S/N that falls below some threshold. Figure 8 shows the mean classification grades as a function of redshift for all SNe (dotted lines) as well as the classification grades from subsets with S/N >3 (dashed) and S/N >10 (solid). After applying the still very modest S/N >10 cut, the fraction of TN SNe that are correctly identified by SOFT increases to 98%. For this S/N >10 subset, the mean TN SN classification grade is >90% across all redshifts.
For the synthetic CC SNe, the S/N >10 cut ensures an average PMGCC classification grade above 80% across all redshifts. The primary source for misclassifications of synthetic CC SN light curves is the Type Ib/c objects. In particular, the SOFT classifier is often confused by Type Ib/c SNe at low redshift that have a very low S/N (i.e., those with heavy host galaxy extinction or a very large synthetic distance modulus μe).
The results from our verification test of the SOFT classifier using 222 real SNe are shown in Figure 9. As was done in the Monte Carlo simulations, we applied the SOFT classification program to all of these SNe using a uniform redshift prior (i.e., we ignore any host galaxy or SN spectroscopy), as well as a uniform prior on μe (i.e., we do not assume any cosmology). For the combined set of 217 Type Ia SNe from SDSS and SNLS, we found that 204 of these objects (94%) returned classification grades that favor a TN SN classification. The 13 objects that SOFT misclassified were all from the SDSS survey, and in most cases we can deduce the reason for the error.

Figure 9. Classification results for 222 SNe from the SDSS and SNLS surveys. The y-axis carries the normalized SOFT classification grade, from Equation (18). The x-axis marks the published spectroscopic redshift, from either the SN itself or its host galaxy. 146 Type Ia SNe from the SDSS survey are plotted as (orange) circles, 71 Type Ia SNe from SNLS are shown as (magenta) diamonds, and the (blue) triangles indicate 5 Type IIP objects from SNLS (note that two of these IIP SNe have almost identical redshift and classification grades, so their triangles are indistinguishable).
Standard (45 KB)High-resolution (45 KB)
Six of the 13 misclassified objects were given a spectroscopic classification of "SNIa?" by Holtzman et al. (2008), indicating some classification ambiguity even when their spectroscopic information is considered. Two more have incomplete light curves, with only one or two observation epochs beyond the light-curve peak. Two of the misclassifications are objects with extremely broad light curves. SN 2005mo and the SDSS object SN7017 (no IAU designation available) both exhibit a decline rate substantially slower than any of the objects in our template library. Using the SALT2 light-curve fitter (Guy et al. 2007) as an independent check, we measured values for the X1 (stretch) parameter of 3.0 for 2005mo and 5.0 for 7017. For comparison, these values are more than two times larger than the slowest-declining Ia+ objects in our template library (SN 1991T,90N,99gp). Furthermore, these SALT2 X1 values are larger than 98% of the objects in the CfA3 "constitution set" of Hicken et al. (2009). Unlike light-curve fitters that parameterize the light-curve shape (such as SALT2), the SOFT classifier is unable to extrapolate to light curves with shapes well beyond the limits of its template library. In order to correctly classify all of these extremely broad Type Ia SNe, we would need to include some very broad Type Ia template light curves in the library.
A final outlier is the very peculiar SN 2005gj, which SOFT classifies as a Type IIn object. This object is an example of the rare subclass of hybrid Type Ia/Type IIn objects, exhibiting spectral indications of a substantial circumstellar medium (Aldering et al. 2006). Although we included templates of some peculiar Type Ia SNe in the SOFT library (SN 2000cx and 2002cx), we do not have any representatives of this hybrid "IIa" class.
↑ CloseThe use of fuzzy set theory in the SOFT method allows us to escape from the assumption that our set of SN templates describes all possible light-curve shapes. There is, however, an additional assumption that we have not addressed: the SOFT method still makes the implicit assumption that all objects being classified are already known to be SNe. In our verification tests, this assumption is perfectly valid, because the Monte Carlo simulation only generates synthetic SN light curves, and the published light curves have all been spectroscopically confirmed as real SNe. When selecting SN candidates from a real survey this assumption is often still justified because the set of all detected transients is passed through a series of preliminary filters to remove objects that are not likely to be SNe. For example, variable stars can mimic a SN light curve, but these can be removed by rejecting transients that are coincident with a stellar source in static sky images, as well as any transients that show clearly periodic brightness fluctuations. However, even with aggressive rejection thresholds, some non-SN objects may slip through to a candidate list. Furthermore, if potential SN candidates are vigorously rejected before attempting classification, then some actual SNe are likely to be unintentionally discarded, too. If this possibility is not accounted for, then any task which requires a complete SN sample (such as a SN rate calculation) will give misleading results.
The most thorough way to address this problem in the BATM/SOFT framework would be to add a new model for every possible class of SN impostor. In principle, one could construct light-curve templates for RR Lyrae, cataclysmic variables, active galactic nuclei, stellar flares, etc. Each light-curve template could be adapted with the same location parameters θ that were used for our SN templates, and the mechanics of computing probability distributions would remain the same from there. This solution is computationally complex, and is hindered by the great diversity of possible transient sources. We do not pursue this template-based approach here, but instead we consider how two non-SN models may be introduced to catch two specific categories of transient contaminants.
A first category of non-SN interlopers that are often found in SN candidate sets is objects with a very low S/N across the light curve. These objects may have light curves that are intrinsically very different from SN light curves, but they have managed to slip through the pre-classification filters because the data quality is poor enough to allow some ambiguity. When we encounter a low-flux light curve, if we assume that it is a SN in order to apply a SOFT classification, then we are artificially inflating the SN classification probabilities by ignoring the possibility that it is some other type of object.
To address these contaminants within the BATM/SOFT framework, we can introduce a "zero flux model" (ZFM), which asserts that there is no SN observed, so the true flux of the light curve is zero everywhere. The Bayesian likelihood of a candidate match using this model is similar to Equation (5), but without any location parameters θ, and without the model flux term
:

To include this likelihood in the set of PMPs for Equation (8), we must assign a prior p(D|ZFM). Ideally, this prior should represent an initial guess at the fraction of low signal-to-noise candidates that are false positives. A carefully chosen ZFM prior would reflect the details of the data reduction and candidate selection pipelines. In practice, it will generally be sufficient to apply a rough prior that is of the same order as the SN model priors, letting the data drive the ZFM probability.
The ZFM is also useful in providing a measure of the confidence that should be given to SN classifications. The observed light curve of a very faint SN can often be matched equally well by many light-curve templates because the observational errors are so large. BATM and SOFT would then classify the object based primarily on the model priors defined in Section 4.1.1. In that case, our classification grades should reflect the fact that the data are providing little or no new information. The ZFM achieves this, by inflating the normalization factor of Equation (8) whenever the light-curve signal gets buried by observational noise.
This effect of the ZFM on classification grades is illustrated in Figure 10, where we present a synthetic light curve that is entirely made up of random noise. There is no real signal for SOFT to fit to, but if the only models available to it are SN templates, then SOFT will necessarily classify the "object" into one of the available SN classes. When the ZFM is included, however, we find a significant likelihood that there is no object there at all, and the renormalized classification grades reflect the ambiguity of this very low-signal event.

Figure 10. Synthetic light curve example demonstrating the importance of including a ZFM to weed out over-confidence. This "light curve" was generated as pure random noise, and was then passed through the SOFT program to measure normalized membership grades for classification. If SOFT only considers the SN templates then the classification probabilities would be p(CC|D) = 0.91 and p(TN|D) = 0.09. The maximum likelihood SN template match in this case comes from the Type Ia SN 1996X, shown here as the solid curves. When the ZFM is included, the renormalized classification grades become p(CC|D) = 0.67, p(TN|D) = 0.06, and p(ZFM|D) = 0.27.
Standard (59 KB)High-resolution (71 KB)
The recent discovery of the peculiar optical transient SCP 06F6 with the Hubble Space Telescope (HST, Barbary et al. 2009) provides a prime example of another category of non-SN contaminants. This object has a bright, clear light curve that is superficially rather similar to a SN, but it is unlike any known SN both photometrically and spectroscopically (Gänsicke et al. 2009). If a transient similar to SCP 06F6 were detected by a survey like Pan-STARRS or LSST, its light curve would not be initially reviewed by human eyes (with thousands of new candidates discovered per night, there just are not enough grad students available). After the SCP 06F6-like candidate passes through preliminary shifting it would probably be labeled as "SN-like" and get passed off to an automated classification algorithm such as BATM or SOFT. If the classifier only compares this object against a set of SN templates, it will necessarily misclassify the object, no matter how broad or deep the template library is.
To account for the possibility that a candidate light curve could be something that resembles a SN but is not one of the known SN types, we need a flexible model that can fit any conceivable light curve reasonably well. For this purpose, we introduce the "something else model" (SEM), a simple non-physical model designed to replicate a non-periodic light curve. The SEM consists of a set of spline curves that are required to pass through a set of 5 mag points positioned at regular intervals in time. The outer two points are positioned at the endpoints of the observed light curve, and the other three are spaced evenly over the interior. The magnitude of each knot mk is a free parameter that is allowed to vary over a range Δmk from 0 to 30 mag, with a uniform prior across that region. This span comfortably encompasses the range of magnitudes accessible to any SN survey. Since this model is not physically motivated, we model each bandpass completely independently. The posterior likelihood for a given vector of spline knots mK is calculated in the same manner as Equation (5):

To determine the net likelihood for the SEM, we must integrate this over all possible spline knot vectors mk:

To simplify the computation of this integral, we can use a rough maximum likelihood approximation. By design, there will always be some set of knots for which this model can provide a reasonably good fit to any light curve, so the likelihood distribution over the knot space will be sharply peaked. In such a regime, we can estimate the integral of Equation (21) as the area of that single peak. Let us denote the height of the peak as the maximum likelihood
, and the width of the peak in the plane of parameter mk as δmk. A first-order approximation for the peak area in that plane is
. To apply a uniform prior over the range Δmk we multiply by a factor 1/Δmk, and taking the product over all knots in all bandpasses, we get
. This estimate can then be included as the numerator of Equation (8) to compute the SEM PMP, p(SEM|D). The components of this approximation are illustrated in Figure 11. The product
, sometimes referred to as the "Occam factor," decreases rapidly as the number of knots increases, and therefore it effectively punishes the SEM model for using many more parameters than the SN template models (for a derivation of this maximum likelihood approximation and further discussion of the Occam factor, see Section 3.5 of Gregory 2005).

Figure 11. Estimating the area of the SEM likelihood distribution along one dimension of parameter space. The curve represents the posterior probability p(D|mk, SEM). The rectangle shows the uniform prior p(mk) = 1/Δmk spanning a wide range of possible spline knot values. A downhill simplex minimization determines the value of the spline parameter mk that yields the maximum likelihood value
. Through numerical sampling of the likelihood distribution around that point, we can estimate the width of the likelihood peak, δmk, and get a rough approximation for the area under the curve as
. The prior times the posterior probability is then estimated as the product of
with the Occam factor, δmk/Δmk. This figure is adapted from Gregory (2005).
Standard (24 KB)High-resolution (31 KB)
To get
we find the maximum likelihood spline fit for each bandpass using a downhill simplex minimization and defining each likelihood value with Equation (6). We then estimate δmk by sampling the likelihood distribution in the vicinity of the peak to find the full width at half maximum (see Figure 11).
The final step to include this SEM in our total probability calculation is to define an appropriate model prior. As with the ZFM, the p(SEM) prior should reflect the data processing pipeline by providing a low probability when the upstream filtering of candidates is more stringent. In general, the occurrence of true "something else" objects is expected to be very rare, so a value of p(SEM) that is three or more orders of magnitude less than the SN model priors would be appropriate.
In Figure 12, we demonstrate the application of this SEM using the two-color SCP 06F6 light curve of Barbary et al. (2009). If we try to determine the probability that this object is a CC or TN SN, then BATM and SOFT measure low values for both p(CC) and p(TN). If we leave out the SEM comparison, then when these low likelihood values are normalized against themselves using Equation (8), the result is a pair of grossly exaggerated probabilities. By including the SEM case, BATM and SOFT can report with high confidence that this object is unlike both the CC and the TN classes.

Figure 12. Light curve fits for the unusual transient object SCP 06F6. The data points from Barbary et al. (2009) are shown as circles for the HST F775W (i') band, and as squares for F850LP (z'). Left: the best-fitting template is the Type IIn SN 1994Y, at a redshift of 0.2, but even this model provides a very poor fit. Right: the SEM is capable of providing a very good fit to these data. The diamond symbols mark the locations of the spline knots that define the SEM fit. In this case, the likelihood of a match for the SEM is larger than that from any of the SN templates.
Standard (28 KB)High-resolution (72 KB)
The structure of the SEM outlined here is necessarily very loose. Due to the relatively large number of parameters and the wide open parameter space, this model will usually reach a reasonably high maximum likelihood
. Those same flexibility features, however, will then substantially reduce the posterior probability through the Occam factor, because the characteristic width δmk is generally much smaller than the parameter range Δmk. The size of the SEM prior, the allowed parameter range Δmk, the number of spline knots per filter, and their spacing in time are all important components in determining the posterior probability p(SEM|D). These values should be tuned for any particular survey by fitting the SEM to known SN transients as well as examples of possible non-SN contaminants. A suitably optimized SEM will yield a relatively low p(SEM|D) for real SNe, while still providing a non-zero p(SEM|D) for unusual interlopers.
We have presented in detail the updated BATM program, and showed that a strictly probabilistic framework is insufficient for SN classification using fixed-shape templates. To overcome the practical and theoretical shortcomings of the BATM approach, we have introduced SOFT, a method based on fuzzy set theory. This technique allows us to classify SNe into the broad categories of TN and CC SNe using photometry alone.
The SOFT method is demonstrated on simulated observations of synthetic SNe modeled after the Pan-STARRS 1 (PS1) survey. Sampling a wide range of redshift, distance, and extinction we find that SOFT can produce average classification accuracies above 90% for TN SNe and above 80% for CC SNe. Additional verification of the SOFT classification accuracy is presented with 146 TN SN light curves from the SDSS-II survey, which again yield better than 90% classification accuracy. Almost all of the SDSS SN misclassifications could be influenced by peculiarities and ambiguities in the light curves themselves.
We suggest two additional non-SN models that can be helpful in identifying anomalous objects within a SN candidate set. The ZFM catches SN impostors that have a very low S/N, and the SEM provides a flexible, non-physical model to assist in the identification of unusual objects like the peculiar transient SCP 06F6.
A number of further refinements to the SOFT program could improve its functionality and accuracy for automated SN classification. The strength of any template matching software is limited by the size and quality of the template library it draws from. The set of templates used for this work has been much enhanced in this regard since the earliest incarnation of BATM, but there is much room for further development. In particular, collecting template light curves with more complete UBVRI data—especially in the rise time of the light curve—would allow for more accurate spline fits to define the primary templates. This is particularly important for the CC SNe, for which the available collection of well-sampled light curves and early-time spectra is notably thin. Adding to the template library some light curves observed at high redshift could provide a useful supplement to the scarce UV data from the low-z sample.
The SOFT method introduces two new parameters that characterize the template library itself: σ'j quantifies the amount of "fuzziness" and the q parameter controls the strength of the fuzzy combination operator. These parameters reflect the number and diversity of templates in the template library, and appropriate values can be estimated from a comparison of templates against each other. We performed a preliminary calibration of these parameters and settled on σTN = 0.1 for TN SN templates, σCC = 0.15 for CC SN templates, and q = 0.2 for all combinations. A more precise determination of σ'j and q using a large training set of well-observed SNe would be a useful next step.
In addition to classifying SNe, the SOFT method can be used to determine parameter estimates for candidate SNe, based on the comparison of light curves. This extension of the SOFT technique is presented in a companion paper, where we focus on the use of SOFT to derive redshift and distance estimates from TN SNe for constraining cosmological models.
↑ CloseWe thank the anonymous referee for constructive comments that led to significant improvements in this work. We thank Stephane Blondin for providing access and assistance with the collection of SNID spectra and Saurabh Jha for sharing low-z light curves. We thank Brian Connolly for helpful discussion and review of an early draft, and Kathleen Robertson for assistance with reference materials. This work made use of the SUSPECT database of SN light curves and spectra, maintained by Jerod Parrent (http://bruford.nhn.ou.edu/~suspect).
Note that this formulation for the luminosity distance does not mean that BATM and SOFT are dependent on an assumed cosmology. In Equation (3), the distance parameter, μe, quantifies the dimming (or brightening) of a SN relative to how it would appear in an empty universe. The only cosmological parameter appearing here is H0, which factors out as a constant term, independent of redshift z. Therefore, when comparing the BATM/SOFT distance estimates of two SNe at different redshifts, there is no bias introduced by assuming a cosmology, so long as we restrict ourselves to relative distance estimates.
Here Δ represents a vector including a width parameter such as Δm15 or "stretch," and possibly also a color parameter and other higher-order terms.
Note that this is in fact a likelihood and not yet a true probability, in the sense that the integral of Equation (5) over all models Mj and all locations θ will not be equal to unity. Thus, p(D|Mj, θ) is merely a dimensionless function that quantifies the likelihood that the data set D could be observed under the assumption that Mj and θ are a perfect model for SN X.
This reduction is only used for convenience. The conclusions drawn are equally valid for any finite set of models.
We have chosen the Gaussian function for convenience. Another form (such as a Lorentzian or a Voigt Profile) could be equally appropriate.
Specifically, the second and third Kolmogorov axioms, stating that ∑iP(ei) = 1 for a complete set of elementary events ei, and that P(A
B
C) = P(A) + P(B) + P(C).