This site uses cookies. By continuing to use this site you agree to our use of cookies. To find out more, see our Privacy and Cookies policy.

QUANTIFYING QUASAR VARIABILITY AS PART OF A GENERAL APPROACH TO CLASSIFYING CONTINUOUSLY VARYING SOURCES

, , , , , , , , , , , and

Published 2009 December 16 © 2010. The American Astronomical Society. All rights reserved.
, , Citation Szymon Kozłowski et al 2010 ApJ 708 927 DOI 10.1088/0004-637X/708/2/927

0004-637X/708/2/927

ABSTRACT

Robust fast methods to classify variable light curves in large sky surveys are becoming increasingly important. While it is relatively straightforward to identify common periodic stars and particular transient events (supernovae, novae, microlensing events), there is no equivalent for non-periodic continuously varying sources (quasars, aperiodic stellar variability). In this paper, we present a fast method for modeling and classifying such sources. We demonstrate the method using ∼86, 000 variable sources from the OGLE-II survey of the LMC and ∼2700 mid-IR-selected quasar candidates from the OGLE-III survey of the LMC and SMC. We discuss the location of common variability classes in the parameter space of the model. In particular, we show that quasars occupy a distinct region of variability space, providing a simple quantitative approach to the variability selection of quasars.

Export citation and abstract BibTeX RIS

1. INTRODUCTION

The focus of astronomy on the transient universe is steadily increasing because of the wealth of astrophysical information provided by time variability. Examples of past and current variability surveys are the Galactic microlensing surveys (e.g., OGLE, Udalski et al. 1997; MOA, Sumi et al. 2003; MACHO, Alcock et al. 2000; EROS, Beaulieu et al. 1995; SuperMACHO, Becker et al. 2005), searches for both local (LOSS; Filippenko et al. 2001) and cosmologically distant (e.g., SDSS, Sako et al. 2008; CFHTLS, Cabanac et al. 2007; Essence, Miknaitis et al. 2007) supernovae, all-sky variability surveys (e.g., ASAS, Pojmański 1997; QUEST, Vivas et al. 2004; NSVS, Woźniak et al. 2004; Catalina, Drake et al. 2009), and prompt γ-ray burst monitors (e.g., BATSE, Meegan et al. 1992; RAPTOR, Vestrand et al. 2002; ROTSE, Rykoff et al. 2005; SWIFT, Burrows et al. 2005). A complete, or even partial, listing of more directed monitoring projects would be impossible, and the scale of variability surveys continues to grow rapidly with projects such as OGLE-IV (A. Udalski et al. 2010, in preparation), Pan-STARRS (Hodapp et al. 2004), and LSST (Ivezic et al. 2008).

A common problem for variability surveys is the classification of light curves given the enormous range of physical processes leading to variability (e.g., Sterken & Jaschek 1996). Classification is well developed for simple periodic sources (e.g., Cepheids, RR Lyrae, eclipsing binaries, etc.) and impulsive (e.g., microlensing) or explosive sources (e.g., supernovae, novae), but less well developed for sources with non-periodic, continuous, or very long timescale variability such as quasars. Most efforts at classification have focused on the broad spectrum of periodic sources, using the Fourier amplitudes of the light curves to recognize different variable classes using a broad range of methods (e.g., Eyer 2002; Pojmański 2002; Eyer & Blake 2005; Soszyński et al. 2008b). More sophisticated methods for classification of variable objects include neural networks (e.g., Belokurov et al. 2004) and self-organizing maps (e.g., Wyrzykowski & Belokurov 2008).

Attempts at classifying non-periodic, continuously variable sources have largely focused on identifying quasars. Sesar et al. (2007) found that essentially all quasars vary at some level on long timescales, with 90% of Sloan Digital Sky Survey (SDSS) quasars having rms variability ⩾0.03 mag on a six year timescale. The average variability of quasars can be described by a structure function with greater variability amplitudes on longer time baselines (e.g., Vanden Berk et al. 2004; de Vries et al. 2005), and this has also been observed for smaller numbers of individually monitored quasars (e.g., Curti et al. 1985; Nelson 1996; Collin et al. 2002; Sergeev et al. 2005). Thus, variability searches for quasars (e.g., Hawkins 1983; Veron & Hawkins 1995, or Dobrzycki et al. 2003; Geha et al. 2003; Rengstorf et al. 2004a; Dobrzycki et al. 2005; Sumi et al. 2005 more recently) generally look for non-periodic sources with long timescale variability. The most formal approach to date is that used by Eyer (2002) and Sumi et al. (2005), where the selection criterion was based on the slope of the structure function of the light curves. These searches have generally been more successful in relatively empty extragalactic regions than in those with high stellar densities.

A significant problem in these attempts is that there was no simple method to reduce a non-periodic, continuously variable light curve to a small set of numbers, such as the periods and amplitudes of a Fourier series, which can be used for classification. Recently, Kelly et al. (2009) found that quasar light curves could be well modeled as a stochastic process with an exponential covariance matrix characterized by an amplitude and a timescale based on a small sample (∼100) of quasars with multi-year light curves. Physically, the model is a damped random walk, and it has a broken power law structure function consistent with studies of quasar structure functions. Unfortunately, there are few large samples of quasars with extensive monitoring data. To our knowledge, there are the quasars in the SDSS equatorial strip, with roughly 60 epochs over ∼six years (Sesar et al. 2007; Bramich et al. 2008), the QUEST survey (Rengstorf et al. 2004b), whose light curves are not publicly available, and any quasars lying in the microlensing survey regions. Recently, Kozłowski & Kochanek (2009) used mid-IR color selection (e.g., Stern et al. 2005; also see Lacy et al. 2004) to identify a large sample of quasar candidates in the Magellanic Clouds. While they have yet to be spectroscopically confirmed, the nature of the selection method means that the purity of the main sample of candidates should be high. By combining these candidates with the OGLE-II and OGLE-III (Udalski et al. 1997, 2008) light curves, we can verify that quasar light curves are well described by a stochastic process based on a much larger sample of objects than used by Kelly et al. (2009) and then explore using this approach to light-curve modeling for classifying light curves in general, including periodic sources. Obviously, a damped random walk is not an optimal model for periodic sources, but there is no technical difficulty in modeling them using the stochastic process.

This paper is organized as follows. In Section 2, we describe the light-curve data and the existing catalogs of variable sources we can use for classifying light curves. In Section 3, we describe the mathematical model. We do not use the particular forecasting methodology of Kelly et al. (2009), but a more statistically optimal approach based on Press et al. (1992a) and Rybicki & Press (1992, 1994) that we detail in the Appendix. In Section 4, we examine how well the model works for the known quasars from Kelly et al. (2009) and the Kozłowski & Kochanek (2009) quasar candidates, the distribution of quasars in the parameter space of the model, and the completeness of variability selection. In Section 5, we examine the problem of contamination by determining the distribution of the general population of variable LMC sources in the model parameters. We also examine how the stochastic process models treat true periodic variables. In Section 6, we combine these results and discuss the problem of variability selecting quasars in dense stellar fields like the LMC. We summarize our results and discuss future uses in Section 7.

2. DATA

The data used in this paper were collected with the 1.3 m Warsaw telescope at the Las Campanas Observatory, Chile during the second (1996–2000) and third (2001–2008) phases of the OGLE project (Udalski et al. 1997, 2008). The majority of the OGLE data were collected in the I-band (∼370 points per light curve in OGLE-III) with a smaller number of points in the V-band (∼40 points per light curve). The OGLE-III light curves for many common variable stars are available on the internet.6 The OGLE-II database of light curves (I-band only) can be accessed through several Web interfaces7 (Szymański 2005; Udalski et al. 1997). We make use of published OGLE samples of classical Cepheids (Cep; Soszyński et al. 2008b), double mode Cepheids (dCep; Soszyński et al. 2008a), RR Lyrae stars (RRLyr; Soszyński et al. 2009), OGLE Small Amplitude Variable Red Giants (OSARGs; Soszyński et al. 2004a), eclipsing binaries8 (ECL; Wyrzykowski et al. 2003), ellipsoidal variable red giants9 (ELL; Soszyński et al. 2004b), long secondary period variables (LSPs; Soszyński 2007), and long period variables (Miras, LPVs, and other semiregular variables; Soszyński et al. 2005) in the LMC. We separately extracted the OGLE-II light curves of Be stars from Keller et al. (2002), and ∼300 OGLE-II and ∼2700 OGLE-III light curves of the mid-IR-selected quasar candidates from Kozłowski & Kochanek (2009). We also reanalyzed the 109 quasars from Kelly et al. (2009).

In addition to these specific samples, we downloaded ∼150,000 "variable objects" from six OGLE-II LMC "inner" fields (SC1–SC6) covering about 1.3 deg2 and centered on the highest stellar density regions, and ∼38,000 "variable objects" from the three lowest stellar density "outer" fields (SC15, SC19, and SC20). The core of this paper focuses on the analysis of the highest density inner fields as a worst case scenario for identifying quasars, with the lower stellar density outer LMC fields as a simpler comparison sample. An object was defined to be variable if the standard deviation of its light curve from a constant flux was at least twice the median photometric error (hereafter called the variability criterion). A large fraction of these "variables" are flat light curves with small numbers of outliers, and these are removed by the following cleaning procedure. First, we correct the photometric errors following the procedure and values from Wyrzykowski et al. (2009). Then we removed up to five significant (>5σ) outliers (relative to their adjacent epochs) from each light curve, where a typical OGLE-II light curve has 360 epochs. After removing the sources no longer satisfying the variability criterion, we are left with 86,301 light curves in the inner LMC fields and 25,956 in the outer ones. The inner fields include 584 RR Lyrae (499 "ab" and 85 "c"), 133 classical Cep, 7 dCep, 6745 OSARGs, 374 ELL, 46 ECL, 699 LSPs, 1442 LPVs, and 15 Be stars. We also use the 58 OGLE-II and 984 OGLE-III quasar candidates meeting the variability criterion.

On further inspection many of these "variable" stars are created by problems in the wings of the point-spread functions (PSFs) of bright (variable) stars. This has been noted in earlier studies (e.g., Eyer 2002; Sumi et al. 2005). In particular, faint stars near bright variable stars tend to vary in phase with the bright source. We apply a three step procedure to remove these "ghost" variable objects. First, we mask regions near bright stars. Stars brighter than I ≈ 12.5 mag are saturated in the OGLE data. These stars have PSF wings extending to 60 arcsec but they are not formally "detected" in the OGLE catalogs. In order to mask these stars, we used stellar objects from the 2mass catalog (Cutri et al. 2003) brighter than J = 14 mag. The masks are defined by rectangular regions with $\Delta \rm R.A.=2\delta$, $\Delta \rm decl.=\delta$ around the Two Micron All Sky Survey (2MASS) position of each star, where δ = 120(1 − J/14) arcsec. If a variable object is inside one of these regions, it is removed from the list of variables. We also masked visually identified regions that showed clear photometric problems (a dense clump of variable objects, variables detected on a "bloomed" CCD line, etc.). Our second criterion is that no variable could have any other variable object within 6 arcsec. Finally, the third criterion is that no variable can have a >2 mag brighter variable within 15 arcsec. These steps could be carried out with greater precision, but cleaning the OGLE catalog of spurious variables is not our primary focus even if it is ultimately a limitation and is one reason for using our rather conservative variability criterion. One additional problem with the OGLE-II photometry is a result of the realuminization of the telescope's mirror. A small number of light curves show a step-like change in magnitude between seasons 2 and 3. These light curves are identified during the visual inspection we carried out for the final selection of quasar candidates. A summary of these photometric problems is also given in Eyer (2002) and Sumi et al. (2005).

3. METHODOLOGY

We are interested in modeling random processes leading to variability. Many such processes can be described by the covariance matrix S of the signal, where we will be considering the particular case of an exponential covariance

Equation (1)

between epochs at ti and tj used by Kelly et al. (2009) to model quasar light curves based on a standard method from the forecasting literature (see, e.g., Brockwell & Davis 2002). As explained in detail by Kelly et al. (2009), the physics of this process is a random walk driving term of amplitude σ balanced by a damping timescale τ for a return to the mean. It is useful to define $\sigma ^2 = \tau \hat{\sigma }^2/2$ and use τ and $\hat{\sigma }$ as our model parameters—on short timescales |Δt| ≪ τ the dispersion between two points is $\hat{\sigma }|\Delta t|^{1/2}$ while on long timescales it asymptotes to σ. Computationally, there is less covariance between τ and $\hat{\sigma }$ than τ and σ. The power spectrum of the process is

Equation (2)

corresponding to white noise on long timescales (f → 0) and then a ∝f−2 fall on short timescales (f>1/τ). Kelly et al. (2009) suggest that τ may be related to the thermal timescale of accretion disks. We estimate these parameters using the approach of Press et al. (1992a), its generalization in Rybicki & Press (1992), and the fast computational implementation in Rybicki & Press (1994). We will collectively refer to these as the PRH method. In the Appendix, we summarize this approach, show how it can be used to derive the forecasting model of Kelly et al. (2009), and demonstrate that it is more statistically powerful than the forecasting approach while still requiring only O(N) operations for a light curve with N points.

Operationally, we fit a light curve by maximizing the likelihood (Equation (A8)) that the light curve can be fitted by the process model to determine the parameters τ and $\hat{\sigma }$ (as well as their uncertainties, if desired). We include the light curve mean as a simultaneously optimized linear parameter (Equations (A4) and (A7)). Given the estimates for τ and $\hat{\sigma }$, we have optimal estimates for the mean light curve at both the observed points (Equation (A3)) and at any other time (Equation (A11)), as well as the variances about these means (Equations (A9) and (A12), respectively). Where we show these reconstructions and the "error snakes" defined by the variances, there are two important points to consider when comparing them to the data points. First, these variances are the variances in the mean light curve and not the variance of the data relative to the mean light curve. The latter quantity, given in Equation (A10), is defined only where there is data and so is ill-suited to showing a continuous light curve. Data points will be scattered relative to the mean light curve by the combination of the variance in the mean light curve and the uncertainties in the individual data points. Second, the reconstructed light curve is not an example of an individual random walk defined by the parameters (τ and $\hat{\sigma }$), but rather the average of all such random walks that are consistent with the observed light curve given its uncertainties. The variance in the reconstructed light curve is then the variance of these individual random walks about the mean. If we generated individual random walks constrained by the data (see PRH), they would track the mean light curve and (statistically) stay inside the "error snake" defined by the variances, but they would show more structure on short timescales and excursions outside the "error snake" consistent with the estimated variances.

When we fit the model, we obtain a maximum likelihood estimate of τ and $\hat{\sigma }$ with log likelihood ln Lbest. There are two limiting cases for this model. First, as the timescale τ → 0, the covariance matrix becomes a diagonal matrix Sij → σ2δij equivalent to simply broadening the photometric errors. We refer to this limit as the "noise" limit with log likelihood ln Lnoise. We use a cut of ln Lbest>ln Lnoise + 2 to select objects that are modeled by the exponential process. Given the variability criterion from Section 2, this cut mainly eliminates variable objects with τ smaller than the mean epoch separation. The other limit, τ → , with likelihood ln L, is the limit where we cannot determine τ given the overall time span of the survey. This has not proved to be major issue with the OGLE data. We cannot distinguish τ from τ → for only 2% of the ∼86, 000 OGLE-II variables and 4.5% of the OGLE-III quasar candidates. For these objects the timescale τ was set to log10(τ) ≃ 5.

We also check for periodicity in each light curve using the Lomb–Scargle periodogram method as implemented by Press et al. (1992b). For each light curve we note the three most likely periods Pi (i = 1, 2, 3) and their probabilities. We used narrow notches at 1, 1/2, 1/3, and 1/4 of a day to minimize diurnal aliasing problems, while still being able to examine RR Lyrae periods. These aliasing solutions could be improved, but this is not the focus of this paper. We classify a source as periodic if the estimated period P is shorter than 200 days and the likelihood for the peak in the periodogram being observed at random is log10(pperiodic) < −3. We include a limit that periods must be shorter than 200 days because many quasars with large τ have a "period" satisfying log10(pperiodic) < −3 with P>200 day. These solutions corresponds to fitting a sine wave to the data, and as the number of timescales τ covered by the light curves shrinks, there is an increasing likelihood that a sine wave of some period will enormously improve the goodness of fit over no variability, leading to a false positive in the periodogram. For periodic variables we should see a strong correlation between P and τ.

4. QUASARS

For our quasar samples we use the 109 known quasars from Kelly et al. (2009) and ∼2700 mid-IR-selected quasar candidates detected in the OGLE-III LMC and SMC fields from Kozłowski & Kochanek (2009). The Kelly et al. (2009) sample is a heterogeneous mixture of MACHO variability-selected quasars (Geha et al. 2003) and quasars monitored at least in part for reverberation mapping (Giveon et al. 1999; Peterson et al. 2004). They have the advantage of being confirmed quasars, but they are a non-random sample of quasars, and many of the reverberation mapping quasars have poorly sampled light curves compared to the MACHO or OGLE light curves. The mid-IR-selected candidates have the disadvantage that they are not confirmed quasars. Kozłowski & Kochanek (2009) classified the candidates based on their mid-IR colors (A/B for being inconsistent/consistent with a black body), location in the mid-IR color–magnitude diagram (CMD; YSO/QSO for whether the object is/is not in the region contaminated with young stellar objects, YSOs), and optical to mid-IR color (a/b for whether the optical to mid-IR colors are/are not consistent with spectroscopically selected quasars). Here we only use the "a" sources. Based on the extragalactic sources from the AGN and Galaxy Evolution Survey (AGES; C. S. Kochanek et al. 2010, in preparation), the QSO-Aa candidates, which are the majority, should be almost entirely comprised of quasars, while the other classes are likely dominated by contaminating sources.

Ideally, we should work with a quasar light curve that is free from contamination by the host galaxy. This can be a problem because we fit the light curves in magnitudes rather than flux (as did Kelly et al. 2009). In magnitude space, the effects of contamination on the process parameters are nonlinear. Extra contaminating flux reduces amplitudes and increases timescales, while oversubtracting any contamination has the reverse effects. For the bright, distant quasars from Geha et al. (2003), Kozłowski & Kochanek (2009) and, to a large extent, Giveon et al. (1999), this will be a limited problem. Not only are these very luminous quasars, but the observations are largely in the rest-frame ultraviolet, where host galaxies have little flux. The Kozłowski & Kochanek (2009) sample will also have few sources with significant host contamination because such sources are lost as part of the mid-IR selection (see Gorjian et al. 2008; Assef et al. 2009). It is emphatically not true of the local reverberation mapping targets. If we take the continuum light curves for MRK 279 (Santos-Lleó et al. 2001), MRK 509 (Carone et al. 1996), NGC 3783 (Stirpe et al. 1994), NGC 4051 (Peterson et al. 2000), and NGC 5548 (Peterson et al. 2002, and references therein), and then either subtract the remaining host flux in the spectroscopic apertures or add in the host flux from outside the spectroscopic apertures based on Bentz et al. (2009), we see significant changes in the model parameters. On average, adding the rest of the host flux (subtracting the remaining host flux in the aperture) reduces (increases) the variability amplitude by an average of −0.73 dex (+0.38 dex) and increases (decreases) the characteristic timescale by an average of +0.77 dex (−0.42 dex) for five sources. The effects are larger when including the remainder of the host because the spectroscopic apertures used to obtain these light curves are already excluding most of the host galaxy flux—the host flux outside the aperture is 1–10 times larger than the host flux in the aperture. Thus, as with reverberation mapping, correlations of the process parameters with other quasar properties will be sensitive to the treatment of the host galaxy. We will not consider this problem further since we are focusing on the observed properties of quasar variability rather than their detailed interpretation.

Figure 1 shows four examples of QSO light curves modeled by the stochastic process. These fits are typical, and like Kelly et al. (2009), we find that most quasars are well modeled by the process, as illustrated by the χ2/dof distribution shown in Figure 2. Some care is required to interpret this distribution. All objects we consider are likely to be poorly described by no variability, so the process model will always produce a reduced χ2. Also keep in mind that one limit of the model is simply to broaden the error bars (the "noise" limit), which can always allow χ2/dof → 1 up to the effects of the priors on $\hat{\sigma }$ and τ. Nonetheless, the final χ2/dof distribution is far narrower than the distribution from fitting the linear model and few objects are doing so in the "noise limit" corresponding to simply broadening the uncertainties. The distribution is somewhat broader than it should be for correctly estimated Gaussian uncertainties. For a typical OGLE-III light curve with N = 360 points, we expect $\chi ^2/N\simeq 1\pm \sqrt{2/N}=1\pm 0.07$. That the distribution is broader, and relatively symmetric, suggests that some of the difference lies in the accuracy of the photometric errors, since a 3% misestimation of the uncertainties is enough to cause such a shift. There is some skewness to larger χ2/dof, but this could simply be due our pruning of outliers being too conservative—it only takes two 4σ outliers to produce a 0.07 shift in χ2/N. Nonetheless, we cannot rule out the possibility that the process model is a poor or incomplete model for the variability physics in some objects.

Figure 1.

Figure 1. Examples of light-curve models for four quasars. The twelve-year-long light curves are from the OGLE-II and OGLE-III surveys (years 1997–2008, separated by the seasonal gaps). The top two light curves are spectroscopically confirmed quasars from Geha et al. (2003; top) and Dobrzycki et al. (2005). The bottom two light curves are mid-IR-selected quasars from the Kozłowski & Kochanek (2009) sample (awaiting spectroscopic confirmation). The solid lines represent the best-fit mean model light curves from the exponential covariance method (see the Appendix). The area between the dotted lines represent the 1σ range of possible stochastic models. These "error snakes" bound the reconstructed light curve and are thinner than the data points because of the additional measurement error on the data (see the discussion of this point in Section 3). We also give the best model parameter values along with the goodness of the fit defined by the χ2 per degree of freedom. The average error bars on the model parameters are Δlog10(τ) = 0.37 and $\Delta \log _{10}(\hat{\sigma })=0.044$.

Standard image High-resolution image
Figure 2.

Figure 2. Distributions of the goodness of fit for the stochastic model (linear trend) are shown by the solid (dashed) line for the ∼2, 700 OGLE-III light curves of the mid-IR-selected quasars from Kozłowski & Kochanek (2009). The improvement over fitting the linear trend from using the stochastic model is clearly visible. The dotted line is the expected distribution of χ2/dof based on the number of degrees of freedom for each light curve. The differences between the stochastic model and the expected distribution are some combination of errors in the estimated photometric errors, small numbers of outliers that have not been eliminated from the light curves, and any poorly modeled physics.

Standard image High-resolution image

The detection of variability is strongly magnitude-dependent because fainter sources require larger amplitudes for variability to be detected. Figure 3 shows the fraction of mid-IR quasar candidates that pass the two variability criteria based on either OGLE-II or OGLE-III data. One advantage of the process model is that we can explicitly calculate the completeness of a survey as a function of the parameters using Monte Carlo simulations. At each point on a grid of the process parameters ($\hat{\sigma }$ and τ), we randomly generate N light curves using the temporal sampling of a randomly selected real light curve. Generating simulated light curves of zero mean is trivial, since the next point

Equation (3)

where Δt is the time interval and G(x2) is a Gaussian deviate of dispersion x. The observed light curve is then yi = si + G(n2i) where ni is the observational noise. The chain is initiated by s1 = G2). We use the mean measurement error at a given magnitude, but one could use the error estimates from a light curve matched in magnitude to the trial. Next, we apply our variability criterion, that the rms of the light curve must be twice the measurement errors, and that ln Lbest>ln Lnoise + 2, to each trial light curve. The completeness is simply the fraction of the trials that pass both selection criteria. For speed, we estimate the second criterion by using the likelihood at the input parameters for Lbest and the likelihood for the model with the same asymptotic variance but τ → 0 for Lnoise.

Figure 3.

Figure 3. Fraction of variable QSO and YSO candidates as a function of magnitude for OGLE-III (solid line) and OGLE-II (dashed line). A source is counted as variable if it passes the variability criterion of Section 2 and has ln Lbest>ln Lnoise + 2.

Standard image High-resolution image

Figure 4 shows the resulting completeness estimates for OGLE-II and OGLE-III light curves at I = 16.5, 17.5, 18.5, and 19.5 mag, where the typical photometric uncertainties (including the rescalings of Section 2) are σphot = 0.010, 0.022, 0.050, and 0.12 mag for OGLE-II and σphot = 0.008, 0.017, 0.037, and 0.087 mag for OGLE-III. The general structure of the completeness limit is easily understood. The variability criterion can be approximated by noting that in the continuum limit, where we integrate over a continuous light curve rather examining discreet samples, the variance of a light curve described by an exponential covariance matrix relative to its mean is

Equation (4)

where x = Δt/τ is the ratio of the survey duration to the timescale τ. When the survey duration is long compared to the timescale (x ≫ 1), the signal variance is simply σ2, and light curves above the diagonal line with σ2 + σ2phot>4σ2phot will satisfy the variability criterion. When the survey is short (x ≪ 1), the observed variance is limited by the survey duration to $\sigma ^2 x/3=\hat{\sigma }^2\Delta t/6$ and the completeness is high if $\hat{\sigma }$ is sufficiently large. The criterion that the process is distinguishable from simply increasing the error bars eliminates sources in the corner with short timescales and high amplitudes. Note that the variability criterion is very conservative because its selection limits do not improve with additional data. Using only the second criterion, based on whether the process model fits better than simply expanding the measurement errors, would lead to higher completeness for longer τ and smaller $\hat{\sigma }$ but at the expense of higher false positive rates due to systematic errors in the data. Compared to the OGLE-II survey, the OGLE-III survey is more sensitive and covers a longer temporal baseline but with a larger epoch spacing, so it is more complete for large τ and smaller σ, while less complete for small τ. The completeness limit is largely determined by the variability criterion except for the corner with high amplitudes and short timescales. While not shown, the distribution of variable sources as a function of magnitude follows these completeness limits closely.

Figure 4.

Figure 4. Monte Carlo simulations of completeness for source magnitudes of $\hbox{\mbox{\it I}}=16.5$ (top left), 17.5 (top right), 18.5 (lower left), and 19.5 mag (lower right). The OGLE-II (solid line) and OLGE-III (dotted line) surveys have different cadences, durations, and depths, leading to differences in their completeness limits. While not plotted, the distributions of variable sources at a given magnitude closely track these completeness limits in the sense that variable sources passing the selection criteria are not found in the regions where the completeness calculations say they should not be detected. The three lines show completeness levels of 10%, 50%, and 90% (from left to right). The gray shaded region shows the region occupied by quasars (Cut 2 of Section 6).

Standard image High-resolution image

Figure 5 shows the distribution of quasars in an optical CMD, $\hat{\sigma }_I{-}\tau$ parameter space (shown in detail in Figure 6), $\hat{\sigma }_I-$magnitude space and, where available, the relationship between $\hat{\sigma }_I$ and $\hat{\sigma }_V$ at fixed τ. Because the V-band light curves are poorly sampled compared to the I-band, we determine $\hat{\sigma }_V$ using the τ derived for the I-band light curve. To better understand the contamination problems we discuss later, we superpose a CMD of LMC stars based on the Hubble Space Telescope (HST) Local Group Stellar Photometry Archive10 (Holtzman et al. 2006). On the CMD, one can quickly identify the red clump (RC), the red giant branch (RGB), and the top of the main sequence. Since these are derived from small HST fields, the sequences of higher luminosity stars are not significantly populated.

Figure 5.

Figure 5. Locations of the ∼1000 variable mid-IR-selected quasar candidates from Kozłowski & Kochanek (2009) are shown in the CMD (top left), in $\hat{\sigma }_I{-}I$ space (top right), in $\hat{\sigma }{-}\tau$ space (bottom left) and in ($\hat{\sigma }_V/\hat{\sigma }_I){-}\hat{\sigma }_I$ space (bottom right). In the top-left panel, the contours show the CMD of LMC stars, where the contours are for 1, 5, 10, and 20 stars per 0.05 × 0.05 bin, counting from the outer contour. The RC is at (VI, I) = (1.05, 18.3) mag. In the lower left panel, we show the lines of constant asymptotic variability σ for 0.01, 0.1, and 1.0 mag. The Kozłowski & Kochanek (2009) candidates are coded by color, where the high purity QSO-Aa objects are black, and the high contamination QSO-Ba and YSO-(AB)a are shown in red and green, respectively. The gray shaded regions indicate the regions occupied by quasars, and are defined by Cuts of Section 6.

Standard image High-resolution image
Figure 6.

Figure 6. Locations of the ∼1000 variable mid-IR-selected quasar candidates from Kozłowski & Kochanek (2009) and ∼100 quasars from Kelly et al. (2009) are shown in $\hat{\sigma }{-}\tau$ space. The QSO-Aa objects are presented in the top-left panel, QSO-Ba in the top-right panel, YSO-(AB)a in the bottom-right panel, and the Kelly et al. (2009) quasars in the bottom-left panel. The latter sample is split into Palomar Green (square), AGN Watch (star) and MACHO (triangle) quasars. We also show the lines of constant asymptotic variability σ for 0.01, 0.1, and 1.0 mag (from left to right) and the region occupied by quasars (Cut 2 of Section 6, gray area). The thick solid lines in the bottom-left panel are for five AGN Watch quasars, and show the change in the model parameters after adding (subtracting) the host-galaxy flux outside (inside) the spectroscopic aperture used for the light curve. Adding more host moves the source to longer timescales and lower amplitudes.

Standard image High-resolution image
Figure 7.

Figure 7. Our cleaned sample of ∼10, 000 variable objects from the six central OGLE-II LMC fields (SC1–SC6) satisfying the basic variability criteria from Section 2. Panels are the same as in Figure 5 up to changes in the scales. The shaded regions show the quasar selection Cuts 2–4 from Section 6.

Standard image High-resolution image
Figure 8.

Figure 8. CMDs by variable type. In four panels, for clarity, we show distinct classes of variable objects. For a detailed description and discussion see text (Section 5). Objects in the two masked areas in the left panels were moved from the bottom panel to the upper one. These objects were previously not cataloged by OGLE, but they are known variable classes. In the bottom-left panel, we show both periodic and non-periodic objects of unknown origin (or previously overlooked) and in the text we discuss the fraction that are quasars. The confirmed quasars and the OGLE-II quasar candidates have a similar distribution to the OGLE-III mid-IR-selected quasar candidates (blue). For clarity, we do not include these smaller quasar samples here or in the subsequent figures.

Standard image High-resolution image

We see that quasars occupy a well defined locus in the $\hat{\sigma }{-}\tau$ space, lying in a band with $-2\log _{10}(\hat{\sigma })+0.3 \le \log _{10}(\tau) \le -2\log _{10}(\hat{\sigma })+1.6$ that roughly corresponds to an asymptotic variability of σ ≃ 0.1 mag. Variability timescales longer than a few 1000 days are rare, although τ is indeterminate but long for about 5% of the sample. The estimate of τ becomes very uncertain as it approaches the survey length. At very short τ, systems are lost because they fail the ln Lbest>ln Lnoise + 2 criterion. If we focus on the QSO-A sources, which are overwhelmingly going to be real quasars, we can define a shaded region in Figure 5 (Cut 2 in Section 6) that encompasses most of these sources (69% of those identified as variable in OGLE-III). The more contaminated QSO-B and YSO candidates are less likely to fall into this region (see Figure 6, and Tables 1 and 2), providing added evidence of their higher levels of contamination by LMC sources. The mid-IR quasar candidates tend to lie in the Hertzsprung Gap of the CMD, consistent with the typical color 0.4 mag ⩽ (VI) ⩽ 1.0 mag of z < 3 quasars. They also show relatively achromatic variability between the V and I bands, with more chromaticity for small $\hat{\sigma }$ and larger τ sources. Tables 1 and 2 summarize the properties of the mid-IR-selected quasars for both OGLE-II and OGLE-III light curves. If we compare the locations of the quasars in the space of the variability parameters (Figure 6) to the completeness estimation of Figure 4, it is easy to understand the magnitude dependence of the fraction of variable quasars in Figure 3.

Table 1. Sequential Cuts on Variable Objects

Survey Matched Object Variable Not Noise Cut 1 Cut 2 Cut 3 Cut 4
  to OGLE Type (>2σ)   pperiodic $\hat{\sigma }{-}\tau$ $\hat{\sigma }{-}I$ $\hat{\sigma }_V/\hat{\sigma }_I$
OGLE-III—Quasar Candidates
I>16 mag
OGLE-III 2156 QSO-Aa 721 613 598 413 401 373
OGLE-III 215 QSO-Ba 75 63 62 35 35 33
OGLE-III 198 YSO-Aa 147 140 116 47 28 27
OGLE-III 24 YSO-Ba 15 15 14 9 5 5
OGLE-III—Quasar Candidates
16 mag ⩽ I ⩽ 19.5 mag
OGLE-III 2156 QSO-Aa 721 613 598 413 280 267
OGLE-III 215 QSO-Ba 75 63 62 35 23 23
OGLE-III 198 YSO-Aa 147 140 116 47 26 25
OGLE-III 24 YSO-Ba 15 15 14 9 5 5
OGLE-II—Quasar Candidates
16 mag ⩽ I ⩽ 19.5 mag
OGLE-II 209 QSO-Aa 23 23 19 8 6 4
OGLE-II 17 QSO-Ba 1 1 1 0 0 0
OGLE-II 54 YSO-Aa 31 28 15 7 5 4
OGLE-II 3 YSO-Ba 1 1 1 0 0 0
OGLE-II LMC Inner Fields—All Sources
16 mag ⩽ I ⩽ 19.5 mag
OGLE-II I only No masks 86301 64834 37599 8469 1939 731
OGLE-II I only With masks 13658 10406 3461 933 121 58
OGLE-II V and I No masks 71247 53942 30798 7033 1643 731
OGLE-II V and I With masks 12729 9972 3219 848 105 58
OGLE-II LMC Outer Fields—All sources
16 mag ⩽ I ⩽ 19.5 mag
OGLE-II I only No masks 25956 19251 16173 2044 355 100
OGLE-II I only With masks 2375 584 444 117 11 2
OGLE-II V and I No masks 16404 12152 9820 1401 336 100
OGLE-II V and I With masks 1630 443 312 73 10 2

Download table as:  ASCIITypeset image

Table 2. Individual Cuts on Variable Objects

Survey Matched Object Variable Not noise Cut 1 Cut 2 Cut 3 Cut 4
  to OGLE Type (>2σ)   pperiodic $\hat{\sigma }{-}\tau$ $\hat{\sigma }$-I $\hat{\sigma }_V/\hat{\sigma }_I$
OGLE-III—Quasar Candidates
I>16 mag
OGLE-III 2156 QSO-Aa 721 1303 2131 1043 1560 1328
OGLE-III 215 QSO-Ba 75 146 231 110 165 140
OGLE-III 198 YSO-Aa 147 175 169 77 84 137
OGLE-III 24 YSO-Ba 15 21 23 13 14 19
OGLE-III—Quasar Candidates
16 mag ⩽ I ⩽ 19.5 mag
OGLE-III 2156 QSO-Aa 721 1303 2131 1043 640 1328
OGLE-III 215 QSO-Ba 75 146 231 110 60 140
OGLE-III 198 YSO-Aa 147 175 169 77 63 137
OGLE-III 24 YSO-Ba 15 21 23 13 10 19
OGLE-II—Quasar Candidates
16 mag ⩽ I ⩽ 19.5 mag
OGLE-II 209 QSO-Aa 23 148 199 50 138 69
OGLE-II 17 QSO-Ba 1 10 17 0 7 4
OGLE-II 54 YSO-Aa 31 48 37 15 21 19
OGLE-II 3 YSO-Ba 1 3 3 0 2 1
OGLE-II LMC Inner Fields—All Sources
16 mag ⩽ I ⩽ 19.5 mag
OGLE-II I only No masks 86301 64834 37599 14357 6314 14312
OGLE-II I only With masks 13658 10406 3461 1980 800 5433
OGLE-II V and I No masks 71247 53942 30798 12182 5333 14312
OGLE-II V and I With masks 12729 9973 3219 1853 706 5433
OGLE-II LMC Outer Fields—All Sources
16 mag ⩽ I ⩽ 19.5 mag
OGLE-II I only No masks 25956 19251 16173 2508 968 2056
OGLE-II I only With masks 3446 1272 658 171 167 298
OGLE-II V and I No masks 16404 12152 9820 1833 859 2056
OGLE-II V and I With masks 1630 443 312 98 142 70

Download table as:  ASCIITypeset image

The distribution of the quasars is a combination of intrinsic properties (variability at rest wavelength λ for a black hole of some mass and luminosity), distance (magnitude and scalings between rest-frame and observed-frame properties), and selection effects (Figure 4). The observed magnitude of the quasar depends on distance and K-corrections, and the variability parameters have redshift corrections (τrest = τ/(1 + z) and $\hat{\sigma }_{\rm rest} = \hat{\sigma }(1+z)^{1/2}$; Kelly et al. 2009) and potentially K-corrections for any scalings of the parameters with wavelength, such as those seen in average structure functions (e.g., Vanden Berk et al. 2004). Our incompleteness depends on observed rather than intrinsic properties, and this makes it highly likely that our estimate of the region occupied by quasars is largely correct despite the increasing problems with completeness at fainter magnitudes. In essence, the majority of quasars that we lose due to incompleteness correspond to quasars with similar intrinsic properties at lower redshifts where they are bright enough to be included, rather than quasars with intrinsically different properties. The numbers of quasars observed to have a given set of parameters are, however, strongly dependent on the selection function, and reconstructing the true distribution will require redshifts and (essentially) a V/Vmax method applied to the variability selection procedure. A significant advantage of our approach is that such problems can be quantitatively addressed using the variability model. The LMC sources we consider in the next section are an illustration of the opposite limit from the quasars. Since all LMC sources are at the same distance, we either detect them in this variability space given their magnitude or not.

We can also examine the distribution of the Kelly et al. (2009) quasars relative to the mid-infrared sample in Figure 6. They generally lie in the same region (79% are in the selection box), but their distribution in that region is very different. Most of the MACHO and PG quasars lie at the large τ end of the distribution, with a few of the reverberation mapping quasars dribbling toward short τ. The MACHO (Geha et al. 2003) quasars are variability selected, but the selection criterion may be a biased one compared to a (roughly) magnitude limited sampling of quasars. Figure 6 also shows the effects of adding or subtracting host galaxy flux discussed above.

5. GENERAL VARIABILITY OF SOURCES IN THE LMC

In Section 4, we demonstrated that quasars occupy a limited region of the variability parameter space. In order to select quasars using variability, we need to examine how other variable sources populate these parameter spaces. In Figure 7, we show the location of the general variable populations from the inner LMC fields for the same combinations of variables used for quasars in Figure 5. The distributions of outer field variables are similar but of lower density. Where possible we have identified the variable type using the catalogs discussed in Section 2. Tables 1 and 2 summarize the statistics of selecting sources in these fields. It is critical to clean the input catalogs, since the variability criterion identifies ∼86, 000 inner field variables before we apply the bright star masks, defect masks, and the isolation criteria described in Section 2, while there are only ∼13, 000 left afterward. Of these, ∼65, 000 (no masks) and ∼10, 000 (with masks) pass the restriction that the process model fits the data significantly better than simply expanding the photometric uncertainties (ln Lbest>ln Lnoise + 2). While the tables show the statistics for both cases, we only discuss the results for the sources including the masking and isolation criteria.

In Figure 7, we can quickly identify a number of sequences or overdensities of variable objects. We separate them, for clarity, in the four panels of Figures 812. In these figures, we plot rare samples last so that they are not masked by the common ones. Quasars are shown in multiple panels to maximize their visibility compared to other populations, and Figures 5 and 6 are always available for reference. In the top-left panel of each figure we show two samples of OSARGs, where "a" corresponds to the asymptotic giant branch objects and "b" to the RGB objects (Soszyński et al. 2007). They are approximately 3 mag brighter than the RC and also redder. In this panel, we also show slowly pulsating B stars (SPB; Kołaczkowski et al. 2006), active giants (AG), active subgiants (ASG), and RS CVn binaries (Drake 2006). There is no formal OGLE catalog of these variables, so we defined them based on the selection boxes shown in Figure 8. The OSARGs share the same area of the CMD with the LSPs, ELLs (top-right panel), and LPVs (bottom-right panel). The OGLE ECL samples are located just above the RC, and, slightly higher, one finds ELL (top right). Along the instability strip with an approximate color of (VI) = 0.5, one can find Cep, ∼3 mag brighter than RC, and RRLyr, which are slightly fainter then RCGs. The fundamental-mode Cep (RRLyr), denoted as "FU" ("ab"), and first-overtone ones, denoted as "FO" ("c"), form distinct sequences. Geha et al. (2003) in their attempt to select QSOs based on variability found that Be/Ae stars are the main source of contamination. Be stars can be found in the bottom-right panel just above the main sequence. In the bottom-left panel, we present all the remaining variable objects. These should be quasars, other unclassified variables, and sources with unidentified systematic problems. For comparison, we also show the OGLE-III quasar candidates in three of the panels, excluding the panel of unclassified sources.

The top-right panels of Figures 5 and 7 show the positions of objects in the "modified amplitude"–magnitude ($\hat{\sigma }-I$) space. Once again, a number of source groupings are easily noticed, where the most prominent ones are OSARGs, LSPs, LPVs, RR Lyrae, and Cepheids (color coded in Figure 10). OSARGs, ELLs, LSPs, and LPVs overlap, although they can be partially distinguished based on their periods (Figure 12). In $\hat{\sigma }-I$ space Cepheids and RR Lyrae occupy distinct areas.

We show the positions of all the variable stars and quasars in the space of the stochastic model parameters in Figures 5 and 7 (bottom-left panel). In this space of modified "characteristic time" and "variability amplitude" ($\tau -\hat{\sigma }$), we color coded all variability classes and present them in Figure 9. A majority of OSARGs and a substantial fraction of ELLs, LSPs, and LPVs can be easily distinguished from the quasar locus, although there is heavy contamination for long τ. Note also that this is also the region occupied by most MACHO quasars, and this may be one cause of low yield of variability-selected quasars in Geha et al. (2003). Cep, RRLyr, some LPVs, Be stars, ECLs, SPBs, and β Cep, AGs and ASGs also overlap the quasar locus area. Note that fundamental mode and first overtone Cep and RRLyr can be distinguished from each other in $\hat{\sigma }-\tau$ space.

Figure 9.

Figure 9. Characteristic time as a function of modified amplitude ($\hat{\sigma }{-}\tau$). The four panels show the same variability classes as in Figure 8. The gray band is the quasar locus (Cut 2 in Section 6) used to separate quasars from the unknown non-periodic objects shown in the bottom-left panel.

Standard image High-resolution image
Figure 10.

Figure 10. Modified amplitude vs. magnitude ($\hat{\sigma }{-}I$). The four panels show the same variability classes as in Figure 8. The gray area is the region defined by a high density of OGLE-III quasars, which separates them from other variability classes (Cut 3 in Section 6). The heaviest contamination comes from the active giants and subgiants (AG/ASG, top left), and hot blue pulsators (SPB/β Cep, top left) but these can be ruled out based on their periods (see Figure 12). The gray wedge can in principle extend to arbitrarily faint magnitudes, but must be restricted based on the survey depth. For example, OGLE-II objects fainter than I>19.5 mag are mostly false variables (below bottom horizontal dashed line).

Standard image High-resolution image

The bottom-right panels of Figures 5 and 7 show the variability ratio between the V and I bands as a function of I-band variability. The majority of LMC objects show higher variability amplitudes in the V-band than in the I-band, as we would expect since stellar variability is dominated by temperature changes. Quasar variability is less chromatic, so the amplitude ratio is another tool for separating quasars from variable stars. For these LMC fields, this is not critical because of the large magnitude differences between many of the variable stars and quasars. However, in a field dominated by Galactic sources spread over a wide range of distances, amplitude ratios have the advantage of being distance independent. The color-coded variability classes are shown in Figure 11.

Figure 11.

Figure 11. Ratio of variability amplitudes in the V and I bands. It is clear that stellar objects generally vary more in the V band than in the I band, while QSOs show smaller differences. The gray area is the region described by Cut 4 of Section 6.

Standard image High-resolution image

Figure 12 shows all the variable sources and quasars (blue dots) in the space of the characteristic timescale τ and the most likely period P derived from a standard periodogram. As noted in Section 3, quasars never have significant probabilities of being periodic sources unless the "period" is long, P ≳ 200 days. On inspection, these light curves have fluctuations on long timescales that are quasi-periodic over the extent of the light curve (particularly with yearly or half-yearly aliases), so a sine wave fit to the data is a significantly better fit to the data than a constant. Even the quasars with falsely probable periods show no correlation between the period and τ. This leads us to define a periodic source as one with P < 200 days and log10(pperiodic) < −3. On short timescales (τ or P ≲ 1 day) we see no real correlations because the periodograms have diurnal aliasing problems and many sources fail to pass the ln Lbest>ln Lnoise + 2 variability selection criterion. For periods of days to tens of days, many sources follow a scaling with τ ∝ P2, which is surprising since we expect τ ∝ P by dimensional analysis.

Figure 12.

Figure 12. Relation between Fourier periods P and timescales τ. These are not fundamental and depend on the light-curve sampling, particularly on the mean epoch spacing. Except for the bottom-left panel, the panels show objects with a high probability of being periodic (log10(pperiodic) < −3) and non-periodic quasar candidates (blue dots, here selected with log10(pperiodic)> − 3). Many periodic variable objects show a clear correlation of the characteristic timescale τ with period P. These are the Cepheids, OSARGs, and long period variables. Quasars clearly show no such dependence. The bifurcation in period for periodic objects at τ ⩾ 20 days (top panels) is probably due to the multi-mode pulsations of OSARGs and LSPs. The solid line corresponds to a τ-period relation described by log10(τ) = 2log10(P) − 1.8. The objects in the bottom-left panel are previously unrecognized or overlooked non-periodic objects (black, log10(pperiodic)> − 3 or P>200 days) and periodic variables (green, log10(pperiodic) < −3 and P < 200 days). Spikes at 1, 1/2, 1/3, and 1/4 days are due to aliasing problems in the periodograms that are not fully excluded by the narrow period notches we use to suppress most diurnal aliasing problems.

Standard image High-resolution image

Our one hypothesis for this scaling is that it can be produced if the exponential process model is dominated by how well it fits the peak of the correlation function. Near the peak, the auto-correlation function of a sine wave is proportional to Δt2/P2 + const, while that of the exponential covariance matrix is Δt/τ + const, potentially driving the τ ∝ P2 scaling. There is a regime where the longer period Cep have τ ∝ P, and some of the multiperiodic sources (OSARGs) appear in multiple groupings. Monte Carlo experiments fitting sine wave light curves show that the relation between τ and P is strongly affected by the temporal sampling, with the τ ∝ P2 region shifting to longer periods as the mean epoch spacing increases. Despite some effort we could not derive an analytical model for the scaling. It is clear, however, that the relationship between τ and P depends on sampling rather than being universal.

6. VARIABILITY SELECTION OF QUASARS

We can now combine the results from Sections 4 and 5 to define selection criteria for isolating quasars from other variable sources. Presently this is limited to searching for candidate quasars in the 1.3 deg2 (0.65 deg2) covered by the OGLE-II inner (outer) fields in order to explore how well we can find quasars while avoiding contamination. We limit the analysis to sources with I < 19.5 mag based on the completeness estimates in Section 4. There should be very few quasars in this region, so this is really a test of contamination levels in a field with very high densities of stellar variables. Based on Richards et al. (2006) we expect approximately 1, 6, and 20 quasars per square degree with I < 17.5, 18.5, and 19.5 mag respectively, which agrees well with the statistics of the mid-IR-selected quasars (Kozłowski & Kochanek 2009). We will simultaneously apply the selection method to the broader sample of OGLE-II and OGLE-III quasar light curves to examine their effects on completeness. We select quasars in four steps, starting with two that will be generic to any set of light curves from any location (Cut 1—non-periodicity, Cut 2—variability properties of quasars), and then applying those that depend on the specific field (Cut 3) and the availability of light curves in multiple bands (Cut 4). We will not apply a color selection criterion, since one goal of any wide application of this method would be to find quasars at redshifts where their colors are crossing the stellar color distribution near z ∼ 2.6 (Richards et al. 2006).

Cut 1. We start by eliminating periodic sources. A periodic source is defined to have a maximum likelihood period P < 200 days with ln(pperiodic)> − 3. This removes 76% of the sources, including 99% of RRLyr, 100% of Cep, 91% of ELLs, and 100% of ECLs. It does not remove 20% of OSARGs, 69% of LSPs and 36% of LPVs.

Cut 2. Next we isolate sources with the variability properties of quasars, defined by the region in $\hat{\sigma }{-}\tau$ space bounded by $-2\log _{10}(\hat{\sigma })+0.3<\log _{10}(\tau)<-2\log _{10}(\hat{\sigma })+1.6$, $\log _{10}(\hat{\sigma })>-1.1$ and τ>2 days. This quasar region is shown in gray in Figures 4567, and 9. The lower limits are designed to remove most Cep, RRLyr, OSARGs, ELL, LSPs, SPB, β Cephei stars, AG and ASG, and YSO-(AB) objects. The upper limits remove many of the LPVs and LSPs. This criterion will still leave significant contamination from several of the variable star populations along the left (low $\hat{\sigma }$) and bottom edges. These are blue variables, AG and ASG and RRLyr. This cut removes 97% of ELLs, 92% of OSARGs, 45% of LSPs, and 41% of LPVs passing Cut 1.

Cut 3. The third cut takes advantage of the fact that almost all the contaminating sources are at a common distance and largely consist of relatively bright giant stars. It is defined by a wedge in $\hat{\sigma }$-magnitude space that isolates the OGLE-III quasars. We first require quasar candidates to be fainter than I = 16 mag since such bright quasars are very rare, while the OSARGs, classical Cep, ECL, ELL, and LPVs are all brighter than I = 16 mag given the distance to the LMC. A second limit, $I > 3 \log _{10}(\hat{\sigma }) + 18.5$, separates the RRLyr, dCep, ECLs, some ELLs, SPBs, β Cep, AGs and ASGs, which are to the right of the quasar region. As in Cut 1, we use the same lower limit on the scaled amplitude of $\log _{10}(\hat{\sigma }) >-1.1$, which partially eliminates Be stars. This cut removes 100% of the ELLs, LPVs, LSPs, and OSARGs passing Cut 2. Cut 3 would need to be adjusted for other background environments, such as the Galactic bulge, where similar variable stars are ∼4 mag brighter than in the LMC (depending on the amount of extinction).

Cut 4. For the objects with well-determined V-band amplitudes $\hat{\sigma }_V$ we can also separate quasars using the ratio of the V- and I-band amplitudes. We define an area in $(\hat{\sigma }_V/\hat{\sigma }_I)-\hat{\sigma }_I$ space with $-1.1<\log _{10}(\hat{\sigma }_I)<0.6$ and $\log _{10}(\hat{\sigma }_V/\hat{\sigma }_I)<-0.2\log _{10}(\hat{\sigma }_I)+0.2$, where we select objects with a smaller ratio of the V- to I-band variability amplitudes than is typical for stars (Figure 11, gray area). This cut removes most Cep and RRLyr and a substantial fraction of other common variable stars.

The effects of the selection functions on both the LMC variable population and these quasar samples are summarized in Tables 1 and 2 both for their effects when applied sequentially (Table 1) and individually (Table 2). We also summarize the effects on the sub-categories of the mid-IR-selected samples, where we expect to eliminate more of the heavily contaminated QSO-B and YSO classes than the relatively high purity QSO-A class. Recall from Section 4 that we lose a large fraction of the quasars from the two requirements for being considered a variable because most quasars are relatively faint compared to the depths of the OGLE data.

The first test of a selection method is its completeness. We evaluate this based on the ∼2700 (∼300) mid-IR-selected quasar candidates from Kozłowski & Kochanek (2009) with OGLE-III (OGLE-II) light curves as well as the smaller Kelly et al. (2009) sample. The biggest problem for variability-selected quasars from these samples is that most quasars will be faint. While we can identify 2716 (310) of our mid-IR quasars in the OGLE-III (OGLE-II) data, only 1139 (215) are bright enough (I < 19.5 mag) to plausibly detect quasar variability (see Figure 3). For the full OGLE-III (OGLE-II) samples, 984 (58) pass the basic variability criterion of Section 2, while 635 (58) of the I < 19.5 mag candidates pass this criterion. Adding the noise criterion, ln Lbest>ln Lnoise + 2, we find 619 (848) for OGLE-III and 55 (55) for OGLE-II, brighter than I < 19.5 mag (no magnitude limit). The difference in the fraction of variable objects in the two phases of the OGLE survey is explained by the added depth (0.3 mag) and duration (four versus eight years) of the OGLE-III survey. By design, the remaining cuts have little effect on the quasar candidates, although we lose a larger fraction of the non-QSO-A sub-samples, as we would expect given their higher levels of contamination. After applying the first three cuts to the variable objects, we are left with 65% of the QSO-A sample. We can apply only Cuts 1 and 2 to the Kelly et al. (2009) sample, and 65% of the sources pass the cuts. Cut 4 removes an additional 7% of the OGLE-III QSO-Aa objects. Finally, all four cuts are met by 61% (63%) of the OGLE-III QSO-Aa quasar candidates with no faint magnitude limit (within 16 mag < I < 19.5 mag) and 17% of the OGLE-II candidates. Thus, following this approach can yield variability-selected samples of quasars with moderate completeness.

Next we examine how well we reject other variable sources, focusing on our analysis of the central region of the LMC. We only discuss the sample including masks and isolation criteria (see Section 2), but report the results for the full, unpruned sample as well in Tables 1 and 2. The selection cuts are very effective at reducing the numbers of candidate sources, going from ∼10, 000 (∼600) sources passing the variability criterion in the inner (outer) fields to only 58 (2) candidate quasars. In the outer fields with fewer stars, the contamination is remarkably low.

Of the 58 remaining OGLE-II candidates in the inner fields, 3 are also mid-IR-selected quasars. There are 63 (41 QSO-A) OGLE-II sources in the same fields, 24 (12) passing the two initial variability criteria, showing that the biggest problem for quasars is recognizing variability given the depth of the data. The mid-IR quasar selection method is not perfect, in particular it tends to miss lower redshift quasars with significant mid-IR emission from the host galaxy (Gorjian et al. 2008; Assef et al. 2009), so some of the remaining, unidentified candidates may be real quasars. For the Richards et al. (2006) quasar number counts, we expect of order 1, 8, and 25 quasars brighter than I < 17.5, 18.5, and 19.5 mag in this 1.3 deg2 region, compared to 6, 16, and 41 QSO-Aa candidates in OGLE-II. We examined the light curves of all 58 objects that passed the four criteria. On visual inspection, the majority seem to be created by photometric problems in the wings of other stars, LPV stars just above the RCG, stars in the wings of high proper motion stars (see Eyer & Woźniak 2001), blue and Be stars, eruptive variables, novae/supernovae, nova-like variable, unidentified types of variable stars, and quasar candidates. Of the non-mid-IR sources, only one seems likely to be a quasar, the OGLE-II (OGLE-III) object LMC_SC5 253536 (LMC 167.2 36755, at J2000 05:24:24.17–69:55:56.9). It has a brightness of I = 18.53 (18.37) mag in OGLE-II (OGLE-III). Its color of (VI) = 1.38 (1.06) mag is somewhat redder than expected for z < 3 quasars. We show both the OGLE-II and OGLE-III light curves in Figure 13.

Figure 13.

Figure 13. OGLE-II and OGLE-III light curve of the new variability-selected quasar candidate LMC_SC5 253536. The variability behavior of the light curve closely resembles that of quasars, and it has met the basic variability criteria of Section 2 and all four cuts from Section 6. The solid line represents the best-fit model. As in Figure 1, the area between the dotted lines represent the 1σ range of possible stochastic models. We also give the process parameters and goodness of fit based on either the OGLE-II data or the combined data. The error bars on the model parameters are Δlog10(τ) = 0.49 and $\Delta \log _{10}(\hat{\sigma })=0.089$. The vertical dashed line separates two phases of the OGLE survey.

Standard image High-resolution image

In the outer OGLE-II fields we are left with only 2 candidates. On inspection we find no plausible QSO candidates. None of the OGLE-II light curves of the 43 mid-IR-selected candidates (29 QSO-Aa) from the outer fields passes all the criteria. In fact, only two QSO-Aa and two YSO-Aa objects with OGLE-II light curves pass the two initial variability criteria.

7. SUMMARY

Periodograms to recognize periodically varying sources and begin their classification are a standard astronomical tool. Here we introduce an equally simple approach to classifying continuously varying sources that are not periodic and explored the properties of large samples of variable sources in the LMC and SMC using data from the OGLE-II and OGLE-III microlensing projects. We modeled the sources as a damped random walk, a stochastic process with an exponential covariance matrix characterized by a scaled amplitude $\hat{\sigma }$, a timescale τ, and an asymptotic variance on long timescales of $\sigma = \hat{\sigma }(\tau /2)^{1/2}$. This model was introduced by Kelly et al. (2009), who used a forecasting approach to estimate the parameters that is not as statistically powerful as the PRH method used here. Extracting the damped random walk parameters is no more complex than obtaining a periodogram, so it can be easily used to characterize any light curve.

We first applied this method to a much larger sample of quasars than Kelly et al. (2009) by using the mid-IR-selected quasars from Kozłowski & Kochanek (2009) that also have OGLE-II or OGLE-III light curves. We have verified the suggestion by Kelly et al. (2009) that quasar light curves are well modeled by the damped random walk process. Moreover, we find that quasars occupy a well-defined region of the parameter space of the model, allowing us to design a simple quantitative method for the variability selection of quasars. The details of the approach will likely require some modifications as larger samples of quasars are analyzed and the method is applied in regions with different sources of stellar contamination, but the broad outline should survive. We also note that the model allows quantitative estimation of completeness in variability-selected quasar samples and can estimate the detection ranges needed for computing luminosity functions or other population statistics.

While Kelly et al. (2009) are certainly correct that a damped random walk corresponding to an exponential covariance matrix models quasar variability well, it is less clear that the correlations between the model parameters and the luminosity or black hole masses of quasars are robust. Since we lack redshifts for our quasar sample we cannot explore this in detail, but we note two potential systematic problems in the Kelly et al. (2009) analysis. First, the distribution of the Kelly et al. (2009) quasar sample may be a biased sampling of the quasar distribution. It appears to be biased toward sources with long time variability timescales compared to the mid-IR quasar sample. Second, most of the leverage for determining the slopes in the relations between τ and $\hat{\sigma }$ and the quasar mass and luminosity in Kelly et al. (2009) come from the relatively low black hole mass and luminosity reverberation mapping quasars. Host-galaxy contamination is a major issue for these systems (Bentz et al. 2009), and we find that this also holds for estimates of the variability process parameters because light curves in magnitudes are nonlinearly distorted by contamination. Thus, the slopes of the Kelly et al. (2009) correlations may strongly depend on the treatment of these hosts. Just as for the reverberation mapping studies, we should really measure the variability parameters of the quasar with no diluting contribution from the host galaxy, but this is virtually impossible to do for large samples. This problem could be avoided by modeling the time variable flux rather than modeling the time varying magnitudes, but the meaning of $\hat{\sigma }$ then becomes strongly distance dependent and fractional variability, as represented by variability in magnitudes, probably has greater physical meaning. These issues are not a concern for simply classifying light curves.

We developed a set of selection cuts to try to identify quasars based on their variability, and 63% (17%) of the variable quasars in the QSO-A sources in the OGLE-III (OGLE-II) surveys pass the selection criteria. In a lower stellar density field this can be improved by using more liberal selection criteria. If we apply the same cuts to the 10,000 (600) OGLE-II variable sources in these high density inner (lower density outer) fields, only 58 (2) candidates survive, of which 3 (0) are mid-IR-selected quasar candidates. After visual inspection, only 1 (0) of the other sources is a quasar candidate. Most of the rest are artifacts that could be avoided by better pruning the input catalogs. The remaining sources are blue and eruptive variables, variable giants and supergiants. While 85%–93% contamination (depending on how we count remaining artifacts) seems poor, it was achieved when searching for rare extragalactic sources in an extremely high density stellar field (∼106 stars and ∼8000 variable stars/deg2 compared to ∼20 quasars/deg2 with 16 mag ⩽ I ⩽ 19.5 mag), and the bulk of the false positives were easy to recognize by visual inspection. If we add a color cut for z < 3 quasars (0.4 mag ⩽ (VI) ⩽ 1.0 mag) we can also greatly reduce the residual contamination. In the outer LMC fields (∼105 stars and ∼103 variable stars/deg2) or an extragalactic field (>103 stars and a small number of variable stars/deg2) there will be little difficulty with contamination. Moreover, it is clear that the OGLE-II data are too shallow to search for quasars in small areas compared to OGLE-III.

Clearly the next steps are to apply this approach systematically. Here we carried out a limited analysis of OGLE-II and OGLE-III data pending the general availability of the full sample of OGLE-III light curves. We are also in the process of obtaining redshifts for many of the mid-IR quasar candidates. An obvious next step is to characterize the entire OGLE-III survey. In C. L. MacLeod et al. (2010, in preparation), we analyze the nearly 9000 known quasars with light curves from the SDSS to explore the correlations found by Kelly et al. (2009). These SDSS data can also be used to explore the problems of background sources and their classification for typical extragalactic fields rather than the center of the LMC.

We thank Brandon Kelly for answering questions and generously supplying both his original IDL code and the light curves used in his study. We also thank George Rybicki for consulting on aspects of the fast implementation of his method. We thank R. J. Assef for providing us with the colors of quasars at various redshifts, and the anonymous referee, whose comments helped us to improve the manuscript. This work has been supported by NSF grant AST-0708082. The OGLE project is partly supported by the Polish MNiSW grant N20303032/4275.

APPENDIX: THE PRH METHOD

The basic idea (Press et al. 1992a; Rybicki & Press 1992, 1994) is to suppose that we have data y that is due to an underlying, true, signal s, measurement noise at each point n, and a general trend defined by a response matrix L and a set of linear coefficients q, thus, y = s + n + Lq. For example, we will use the linear coefficients to optimally remove the light curve mean, so we have one linear coefficient q1 for the mean, and the response matrix is simply a column vector Li1 = 1 with an entry for each data point, i = 1 ⋅ ⋅ ⋅ N.11 The intrinsic variability has covariance matrix S = 〈ss〉 and the noise has covariance matrix N = 〈nn〉. We will be explicitly relying on Gaussian statistics in order to add normalizing pre-factors to the squared difference statistics focused on by PRH. By definition, we know that

Equation (A1)

Thus, the probability of the data given the linear coefficients q, the intrinsic light curve s, and any other parameters of the model p (e.g. τ and $\hat{\sigma }$) is

Equation (A2)

After evaluating the Dirac delta function, we complete the squares in the exponential with respect to both the unknown intrinsic source variability s and the linear coefficients q. This exercise determines our best estimate for the mean light curve,

Equation (A3)

and the linear coefficients,

Equation (A4)

where C = S + N is the overall covariance matrix of the data and Cq = (LTC−1L)−1. With these definitions we can factor the argument of the exponential into

Equation (A5)

where

Equation (A6)

is the component of C that is orthogonal to the fitted linear functions, the variances in the linear parameters are

Equation (A7)

${\bDelta s}={\bf s}-{\bf \hat{s}}$ and ${\bDelta q}={\bf q}-{\bf \hat{q}}$. We can marginalize the probability over the light curve s and the linear parameters q under the assumption of uniform priors for these variables to find that

Equation (A8)

where for the exponential model the remaining parameters p are τ and $\hat{\sigma }$. This is the likelihood we optimize to determine τ and $\hat{\sigma }$. We also know the variances in the estimate for the mean light curve

Equation (A9)

and the variance between the data and the estimated light curve is

Equation (A10)

Our only addition here compared to PRH is keeping track of the normalizing prefactor of the exponential.

The relation to Kelly et al. (2009) comes from using this formalism to predict the value of the time series at an unmeasured time. The simplest means of doing so is simply to pad the data vector yd with additional fake points yf that have infinite measurement uncertainties in the sense that N−1 → 0 for these points. For simplicity we consider the case without additional linear parameters. We partition the signal and noise matrices as Sdd, Sff, Sdf, and Sdf and N−1dd, N−1ff = 0, N−1df = 0, and N−1df = 0 for the data–data, fake–fake, data–fake, and fake–data blocks of the matrices. Substituting into Equation (A3), we find that the estimate of the true light curve for the measured data points is the result we would obtain without having padded the data vector, and that the estimate of the light curve at the additional points is

Equation (A11)

with variance relative to the mean light curve of

Equation (A12)

These two expressions define the mean light curves and the "error snakes" shown in Figures 1 and 13. Mathematically, the mean light curve is the weighted average of all process light curves described by parameters p that are statistically consistent with the data, and the variance is the scatter of these light curves about this mean.

We can use these results to "forecast" the light curve by noting that we can express the expected value at epoch i + 1 in terms of the measurement at i and the forecast for i based on the i − 1 earlier data points. In component language,

Equation (A13)

where j, k = 1 ⋅ ⋅ ⋅ i − 1, and

Equation (A14)

where j, k = 1 ⋅ ⋅ ⋅ i. For an exponential covariance matrix, Si+1j = αSij where α = exp(−|ti+1ti|/τ). If you partition the S + N matrix of Equation (A14) into the i − 1 × i − 1 part used to estimate $\hat{s}_i$ and the remaining column vectors and scalar and then use the standard method for inverting a partitioned matrix you find that

Equation (A15)

where Γi = ST(S +  N)−1S and σi is the measurement error for yi. To complete the identification with Kelly et al. (2009), we note that the variance between the data point yi and the prediction $\hat{s}_i$ is

Equation (A16)

so Kelly et al. (2009)'s Ωi = Cii − Γi. Note that Equations (A10) and (A16) differ because $\hat{s}_i$ in Equation (A13) does not depend on the data point yi, but it does depend on that data point in Equation (A10). Making this substitution, we recover the forecasting expression from Kelly et al. (2009) where

Equation (A17)

Similarly one can work out the variance in $\hat{s}_{i+1}$ to find that

Equation (A18)

which then gives the second of the forecasting equations in Kelly et al. (2009),

Equation (A19)

where Ω0 = Cii is the diagonal element of the covariance matrix. Kelly et al. (2009) initialized their predicted time series at the process mean, while the PRH approach would initialize it to the first measurement. For this forecasting approach, the likelihood of the data given any parameters is then

Equation (A20)

instead of Equation (A8).

Having demonstrated that the Kelly et al. (2009) forecasting method can be derived from the PRH approach, we next address the relative virtues of the two methods. An apparent advantage of the forecasting approach is that it is computationally O(Ndata) to compute Equation (A20), while at first glance it is O(N3data) to compute Equation (A8) because of the matrix inversion. This is not the case for the exponential correlation function (Equation (1)) because its inverse is tridiagonal (Rybicki & Press 1994). If we rewrite the matrix (S + N)−1 in Equation (A20) as S−1(S−1 + N−1)N−1, then for a diagonal (or even tridiagonal) noise matrix N, all the matrix calculations, including the determinant, in Equation (A8), can be carried out in O(Ndata) operations. The total number of operations and the complexity of the implementation is somewhat higher if one includes the steps needed to automatically marginalize over the light curve mean using the linear parameters, but this then avoids having it as a parameter which must be optimized as part of the fit. Using the fast method from Rybicki & Press (1994) requires the data to be time ordered and non-overlapping (t1 < t2 < t3 ⋅ ⋅ ⋅ <tN) since S is singular if any titj for i < j—data points at identical (or nearly identical, in the sense that |titj|/τ ≪ 1) times should be combined before applying the analysis, and some care is required to stably compute the determinants.

Balancing the computational complexity is that the PRH method is more statistically powerful because it uses all the information in the light curve simultaneously. In essence, for parameter estimation we should use all available data to predict $\hat{s}_{i+1}$ not just the preceding data. We can demonstrate this by generating Monte Carlo light curves using the exponential process, estimating the parameters using both methods and examining the relative scatters in the recovered values. We use priors of P(τ) = 1/τ and $P(\hat{\sigma })=1/\hat{\sigma }$, as logarithmic priors are generally standard for positive definite but otherwise scale free variables. This differs from Kelly et al. (2009), who assumed a uniform prior for $\hat{\sigma }$ and α = exp(−Δt/τ) where Δt is the typical spacing of the data points.

We took the same set of 109 quasars that Kelly et al. (2009) considered and fitted them with both approaches, finding results in reasonable agreement. We then took the parameters of these best fits and generated Monte Carlo realizations of the light curves (Equation (3)) with the same time sampling and measurement uncertainties of the original light curves. We then re-fitted the Monte Carlo light curves using both methods and examined the differences between the input and output parameters. The results, as shown in Figure 14, unambiguously show that the PRH method produces superior results, as would be expected from using all rather than only part of the correlation information in the data.

Figure 14.

Figure 14. Monte Carlo comparisons of parameter estimation using the PRH and forecasting methods. For each of the 109 quasars in Kelly et al. (2009) we generated four Monte Carlo realizations of each light curve using the either the best forecasting solution (open points) or PRH solution (dots) and then refit all eight of these light-curve realizations using the forecasting (right) or PRH (left) method. The greater statistical power of the PRH approach is obvious.

Standard image High-resolution image

Footnotes

Please wait… references are loading.
10.1088/0004-637X/708/2/927