CLASSIFYING X-RAY BINARIES: A PROBABILISTIC APPROACH

Giri Gopalan; Saeqa Dil Vrtilek; Luke Bornn

doi:10.1088/0004-637X/809/1/40

1. INTRODUCTION

As our ability to acquire and archive data in all fields rapidly grows, the tools for searching these data for pattern, order, and, ultimately, meaning, need to grow commensurately. A critical issue in this ongoing paradigm shift is that of multivariate data with complex, hidden geometric structure. Color–color, or CC, diagrams (which provide spectral information over different energy ranges) and color-intensity, or CI, diagrams (which show brightness variations for a given color) are common, easily obtained measurements that have long been used to classify X-ray binary types. White & Marshall (1984) plotted all of the X-ray binaries (XRBs) observed by the HEAO-1 satellite on one CC plot; they found that systems containing black holes clustered in one corner of their diagram and pulsars clustered in an opposing corner. While they found significant overlap of several classes of object in the center, they were able to use this clustering to identify new BHC candidates. In Vrtilek & Boroson (2013; hereafter VB13) we show that when CC and CI are combined into a three-dimensional CCI plot, different types of XRBs separate into complex but geometrically distinct volumes. VB13 model the volumes crudely by computing a centroid and constructing an ellipsoid around the centroid that contains 50% of all points while minimizing the volume of the ellipsoid. We suggest that these diagrams provide an easily used, model-independent way to separate classes of systems, in particular systems containing black holes from those containing neutron stars or systems that can produce jets from those that cannot. As a next step toward understanding the physical mechanisms behind this separation of compact object types, we have developed a probabilistic (Bayesian) model which provides a supervised learning approach: unknown classifications of XRBs are predicted given known classifications. We provide the astronomer with the R code which takes as input CCI data and outputs the estimated probabilities of a system being a black hole, pulsar, or non-pulsing X-ray binary system, in addition to standard errors for these estimates. This software provides the astronomer with more information than an off-the-shelf machine learning solution because such a method typically produces point estimates for the classes of the observations, as opposed to an entire distribution for the classes of the observations.

In Section 2, we describe the data used. In Section 3, we specify the models we have used for estimating the probabilities that the compact object type of an X-ray binary system is a black hole, non-pulsing neutron star, or pulsar. In Section 4, we present our results and their implications and we conclude with a summary and future directions for this work in Section 5.

2. DATA

Data on XRBs were obtained by the All Sky Monitor (ASM; Levine et al. 1996) on board NASA's Rossi X-ray Timing Explorer (RXTE) which operated continuously for nearly 15 years. Due to as-yet uncalibrated gain changes in the instrument over the last two years of its life, we only use data obtained within the first 13 years. The MIT ASM team provides data in three energy bins (1.3–3.0; 3.0–5.0; 5.0–12.0 keV.) sampled 4–8 times a day. We take one day averages and define our colors as ratios of the mid- to low-energy range (C1) and high- to low-energy range (C2). The sum of the three energy bands is used to represent the intensity of the source; this value is normalized by dividing the total counts by the average of the top 1% of the data for any given source. These form a three-tuple consisting of the features, or background covariates in statistical terms, for each of the observations. Note that since these features are defined in terms of ratios of counts, they are unitless. We also restrict ourselves to detections that have a signal to noise of at least five, where signal to noise is defined as the ratio of the number of counts in a particular bin to the error on the number of counts. Figure 1 shows an example of a CCI diagram constructed with three types of XRBs.

**Figure 1.** Visualization of *RXTE* ASM data for 24 XRBs over 13 years including only the $5\sigma$ threshold values. Each individual point is the one day average of the CCI data from one of the 24 systems. Black points are observations from black hole XRB systems, red points are observations from non-pulsing neutron star XRB systems, and blue points are observations from pulsing neutron star XRB systems. The general pattern is that observations from different system types separate geometrically in this CCI coordinate system. (Note that since CCI coordinates are defined in terms of ratios of counts, they are unitless.)
Download figure:
Standard image High-resolution image

$5\sigma $ — **Figure 1.** Visualization of *RXTE* ASM data for 24 XRBs over 13 years including only the $5\sigma$ threshold values. Each individual point is the one day average of the CCI data from one of the 24 systems. Black points are observations from black hole XRB systems, red points are observations from non-pulsing neutron star XRB systems, and blue points are observations from pulsing neutron star XRB systems. The general pattern is that observations from different system types separate geometrically in this CCI coordinate system. (Note that since CCI coordinates are defined in terms of ratios of counts, they are unitless.)
Download figure:
Standard image High-resolution image

The classificaton of XRBs is not simple and different authors tend to use different criteria. For our training set, we used 24 systems whose classifications are robust, that is, they are consistent for numerous authors. In particular, we first considered the classifications from the catalogs of Liu et al. (2001, 2006). We then used Remillard & McClintock (2006) identifications of confirmed black hole systems, Homan et al. (2010) identifications of Z and Atoll sources, and Bildsten et al. (1997) identifications of accreting pulsars. We excluded any of the above types that were also identified as bursters as these vary from author to author. This left us with nine systems containing confirmed black holes (Cyg X−1, LMC X−1, J1118+480, J1550−564, J1650−500, J1655−40, GX 339−4, J1859+226, GRS 1915+105), nine confirmed pulsars (J0352+309, J1901+03, J1947+300, J2030+375, J1538−522, Cen X−3, Her X−1, SMC X−1, Vela X−1), and six non-pulsing neutron star systems (Sco X−1, Cyg X−2, GX 17+2, GX 349+2, GX 9+1, GX 9+9).

We test our model by predicting the compact object type of three groups of systems. The first contains six systems whose type is unambiguously classified: one confirmed black hole (LMC X−3); one non-pulsing neutron star system (GX 5−1); and four pulsing neutron star systems (1744−28, 0656−072, 0535+262, 0115+634). The second group of systems contain stars that are classified as both burster and atoll sources (Ser X−1, Aql X−1, 1916−053, 1608−522, 1254−69, and 0614+091), and hence possibly have ambiguous classifications. The third group of systems are sources that are either unclassified or have multiple classifications across various authors (1900−245, GX 3+1,1701−462, 1636−53, and 1700−37).

The training data set consists of 40,857 observations after the preprocessing steps delineated above, of which 13098 come from black hole systems, 25,366 come from non-pulsing neutron star systems, and 2393 come from pulsing star systems. The imbalance in observations between different compact star types is due to the fact that brighter sources are more likely to be detected than weak sources, yet the brightness of the sources is not uniform among star systems. See Figure 2 comparing brightness for various systems.

**Figure 2.** Visualization of intensities for observations by each of the 24 systems within the training data set illustrating substantial variability. Systems 1–9 are black hole systems, 10–15 are non-pulsing neutron star systems, and 16–24 are pulsing neutron star systems. This may explain the wide variability for the number of observations of each system above the signal-to-noise threshold, since fainter measurements tend to be noisier.
Download figure:
Standard image High-resolution image

Because observations that are less bright are inherently more likely to be below the signal-to-noise threshold which we utilized in preprocessing, the data are not missing at random (Little & Rubin 2002). The imbalance in observation types by system is problematic because visual inspection of systems of the same compact object type indicates substantial variability between systems; hence, it is prudent to ensure that the true variation of CCI values for each compact object is accurately reflected in the training set. For instance, see Figure 3 for a visualization of the variance between systems that are black holes. Additionally, the model we employ for generating multiple imputations, discussed in the next section, involves a Gaussian process, and hence can be quite computationally expensive to work with, generally scaling with computations that take $O({N}^{3})$ time, where N is the number of data points in the data set. One solution to mitigate both of these issues is to subsample the training set where the probability that a particular observation is selected for inclusion in the smaller training set is inversely proportional to the total number of observations of its system in the entire training set. We sample 10% of the training data in this manner, without replacement, to achieve a final training set consisting of 4085 observations, of which 1486 come from black hole systems, 1465 come from non-pulsing neutron star systems, and 1134 come from pulsing neutron star systems. The histograms in Figure 4 show the balance between various systems before and after subsampling.

**Figure 3.** Illustration of the wide variability in CCI data between different systems that contain black holes, where each color represents data from a different system.
Download figure:
Standard image High-resolution image

**Figure 4.** Comparison of the distribution of the number of observations by system in the training set before and after subsampling.
Download figure:
Standard image High-resolution image

3. MODELS AND ALGORITHM

Our approach estimates the probabilities that the compact object type of an XRB is a black hole, non-pulsing neutron star, or pulsar from posterior predictions of the compact object type associated with CCI observations within the system. This approach is similar to the multiple imputations methodology developed in the context of survey analysis where the proper handling of missing data is crucial for obtaining valid statistical inferences, which is discussed in further detail by Rubin (1996). In summary, the multiple draws from the predictive distribution of compact object type allow the astronomer to make a judgement about what the compact object type of a given system is, and provide more information than a point prediction. The salient property of this approach is that it takes into account the inherent uncertainty in the prediction for each individual observation, and a concrete illustration of the output of this methodology is displayed in the Results section. For more on applied Bayesian data analysis, see Gelman et al. (2013).

The astronomer's probability model for the compact object of observations from a particular system is a trinomial model where the relevant estimands are the probabilities that the compact object type is a black hole, non-pulsing neutron star, or pulsar. The astronomer's objective is to estimate these probabilities given the compact object type for all observations within the system. Note that by employing a model which is not a constant, the astronomer is implicitly modeling "imputer noise": in principle, the compact object type of a system should be constant for all observations from that system, and so the probability model employed by the astronomer is not physically correct, but instead a pragmatic solution to allow for the (inevitable) mistakes made by an imputer. Indeed, it would be unrealistic to assume any probabilistic model to be accurate all of the time.

Next, we describe the probabilistic (Bayesian) model used to generate predictions for the compact object type of CCI observations. We denote the training set as the 2-tuple $({X}_{\mathrm{train}},{Y}_{\mathrm{train}})$ , where X_train is an N_train by three matrix consisting of the three CCI values of the training points, and Y_train is a length N_train vector with the labels 1, 2, or 3 corresponding to the compact object type of the system each individual data point comes from: more precisely, 1 represents a black hole system, 2 represents a non-pulsing neutron star system, and 3 represents a pulsing system. The test set is denoted by X_pred, which is an N_pred by three matrix that contains the CCI values of observations from a single system for which we would like to predict the compact object type, and we write Y_pred to indicate the labels for the observations from the system. The model aims to predict the unknown vector Y_pred, and from this estimate the probability of the compact object type being a black hole, pulsar or non-pulsar. From this point of view, Y_pred is the inferential object of interest and the remaining parameters discussed are nuisance parameters, meaning that they exist for the mathematics of the probabilistic model employed but are not of ultimate inferential interest.

We introduce three independent latent variables for each compact object type (black hole, non-pulsar, and pulsar) whose marginal distribution is a Gaussian process with mean 0 and covariance matrix Σ which has a squared exponential kernel, a standard choice in the computer experiments and machine learning literature, which is discussed in detail by Rasmussen & Williams (2005), i.e.,

$\begin{eqnarray}&&{{\rm{\Sigma }}}_{{ij}}={\sigma }^{2}\mathrm{exp}(-| | {X}_{i,.}-{X}_{j,.}| {| }_{2}^{2}/\varphi )\end{eqnarray} \tag{ 1 }$

where ${X}_{i,.}$ denotes the i_th row of the data matrix X for which the first N_train rows are the rows of X_train and the rows from ${N}_{\mathrm{train}}+1$ to ${N}_{\mathrm{train}}+{N}_{\mathrm{pred}}$ contain the rows of X_pred. Each latent variable is tied to the compact object type through a multinomial logistic link function: the probability of a particular observation k to be of a class l is proportional to $\mathrm{exp}[{\alpha }_{l}+{\beta }_{l}{Z}_{{kl}}]$ , where ${\alpha }_{l}$ and ${\beta }_{l}$ have independent unit normal marginal distributions, and ${Z}_{.,l}$ is the latent variable drawn corresponding to type l. Note that ${Z}_{.,l}\in {{\mathbb{R}}}^{{N}_{\mathrm{train}}+{N}_{\mathrm{pred}}}$ , where the first N_train elements correspond to the training points, ${Z}_{.,t}$ for short, and the remaining elements correspond to the prediction test points, ${Z}_{.,p}$ for short. The parameter ${\alpha }_{l}$ is the mean offset for the latent Gaussian process corresponding to the compact object type l, and the parameter ${\beta }_{l}$ is indicative of the marginal effect that each latent variable has on the probability of a data point to be of type l. It is important to remember that since we are interested in predicting the vector of compact object types Y_pred, these additional parameters introduced ( ${\alpha }_{l}$ and ${\beta }_{l}$ ) into our model are ultimately marginalized out (i.e., forgotten) as nuisance parameters. The likelihood of the observed labels given the latent variables and remaining nuisance parameters takes a multinomial form:

$\begin{eqnarray}L(\alpha ,\beta ,{Z}_{t};{Y}_{\mathrm{train}}) & = & \displaystyle \prod _{k=1}^{{N}_{\mathrm{train}}}\mathrm{exp}[{\alpha }_{{Y}_{{{train}}_{k}}}\\ & & +{\beta }_{{Y}_{{\mathrm{train}}_{k}}}{Z}_{{t}_{i,{Y}_{{\mathrm{train}}_{k}}}}]\;/{N}_{k}\end{eqnarray} \tag{ 2 }$

where the normalizing constant N_k is given by

$\begin{eqnarray}&&{N}_{k}=\displaystyle \sum _{l=1}^{3}\mathrm{exp}[{\alpha }_{l}+{\beta }_{l}{Z}_{{t}_{k,l}}].\end{eqnarray} \tag{ 3 }$

For notational brevity, we denote the posterior distribution of the vector Y_pred, ${p}^{*}({Y}_{\mathrm{pred}}| {Y}_{\mathrm{train}},{X}_{\mathrm{train}},{X}_{\mathrm{pred}})$ as ${p}^{*}({Y}_{\mathrm{pred}})$ . Then, by the definition of conditional probability and marginalization, we have the following integral representation for ${p}^{*}({Y}_{\mathrm{pred}})$ :

$\begin{eqnarray}{p}^{*}({Y}_{\mathrm{pred}}) & = & {\displaystyle \int }_{{Z}_{p},{Z}_{t},\alpha ,\beta }p({Y}_{p},{Z}_{p},{Z}_{t},\alpha ,\beta | {Y}_{t},{X}_{t},{X}_{p})\\ & & \times {{dZ}}_{p}{Z}_{t}d\alpha d\beta \end{eqnarray} \tag{ 4 }$

$\begin{eqnarray} & = & {\displaystyle \int }_{{Z}_{p},{Z}_{t},\alpha ,\beta }p({Y}_{p}| {Z}_{p},\alpha ,\beta ,-)p\\ & & \times ({Z}_{p}| {Z}_{t},-)p({Z}_{t},\alpha ,\beta ,| {Y}_{t},{X}_{t},{X}_{p}){{dZ}}_{p}{{dZ}}_{t}d\alpha d\beta .\end{eqnarray} \tag{ 5 }$

Note that we purposely overload the p(.) notation and use the dash symbol to represent "all other variables," for less clutter. This decomposition suggests the iterative algorithm described in the appendix for sampling from the posterior distribution of Y_pred.

4. RESULTS

Here, we present our predictions for the compact object types of those systems in the test set discussed in Section 2 using the model and algorithm discussed in Section 3. The predicted class of an XRB is that which presents the maximum estimated probability, an approach which can be justified from a theoretical point of view because the posterior mode is a Bayes' estimator under a 0–1 loss function, which is a reasonable loss function for a classification problem. The class with the maximum estimated probability is mathematically equivalent to the class with the maximum number of posterior predictive draws for the class labels: i.e., the posterior predictive mode.

Table 1 lists predictions for the six systems with known classifications as well as probability estimates and associated standard errors. Using this scheme, there are no misclassifications for this group of X-ray binary systems. Additionally, in Table 2, we include the predictions and probability estimates for "burster" non-pulsing systems, for which there are a number of wrong predictions: four out of six systems. In all cases, these systems are mistakenly thought to contain black holes. Visual inspection of the data is consistent with this result because the region these systems occupy interferes with the region defined by systems containing black holes: for instance, consider Figure 5 which compares a burster system misclassified as a black hole system with a burster system that is not misclassified, where there appears to be significantly more overlap with the black hole system training data for the misclassified system. It is also possible that some of these systems are misclassified as non-pulsing neutron star systems in the literature, yet significantly more scientific investigation must be performed in order to verify this possibility. Finally, in Table 3, we include predictions and probability estimates for unclassified or ambiguously classified XRB systems, of which notably GX 3+1 has a reasonably high estimated probability of being a non-pulsing neutron star system: 0.7674 with a standard error of 0.0326.

**Figure 5.** Left: example of a burster system in blue, Aql X−1, improperly classified as a black hole system by the algorithm with a comparison to black hole training data in red. Right: example of a burster system in blue, Ser X−1, properly classified as a non-pulsing system by the algorithm with a comparison to the black hole training data in red.
Download figure:
Standard image High-resolution image

Table 1. Probability Estimates and Predictions for Compact Object Type of Previously Classified XRBs

System	$P(\mathrm{BH})$	SE	P (Nonpulsar)	SE	P(Pulsar)	SE	Prediction	Actual Class
GX 5−1	0.0853	0.0519	0.8285	0.0756	0.0863	0.0259	NP	NP
1744−28	0.2626	0.0978	0.1244	0.0570	0.6129	0.1072	P	P
0656−072	0.0782	0.0565	0.0659	0.0472	0.856	0.0842	P	P
0535+262	0.2164	0.0728	0.1312	0.0562	0.6525	0.0883	P	P
0115+634	0.2179	0.0780	0.1425	0.0539	0.6396	0.1057	P	P
LMC X−3	0.8585	0.0648	0.0762	0.0347	0.0654	0.0379	BH	BH

Note. We demonstrate the estimated probabilities and predictions for six XRBs whose classifications have been established in the literature, along with their standard errors. All of these systems are properly classified. The predicted class is that with the maximum estimated probability, which is equivalent to the mode of the posterior predictive draws; see Section 4 for further discussion of the justification of this choice.

Download table as: ASCII Typeset image

Table 2. Probability Estimates and Predictions for Compact Object Type of "Burster" XRBs

System	$P(\mathrm{BH})$	SE	P (Nonpulsar)	SE	P(Pulsar)	SE	Prediction	Actual Class
Ser X−1	0.0441	0.0081	0.9341	0.0133	0.0218	0.0077	NP	NP
Aql X−1	0.5869	0.0692	0.3093	0.0507	0.1038	0.0411	BH	NP
1916−053	0.7683	0.1328	0.1010	0.0810	0.1307	0.0765	BH	NP
1608−522	0.4919	0.0678	0.3535	0.0492	0.1545	0.0310	BH	NP
1254−69	0.2472	0.0231	0.6330	0.0246	0.1200	0.0181	NP	NP
0614+091	0.7062	0.0560	0.1720	0.0441	0.1217	0.0392	BH	NP

Note. We demonstrate the estimated probabilities and predictions for six XRBs classified as "bursters" along with their standard errors. Four of these six "bursters" are improperly classified as black holes while the rest are correctly classified. The predicted class is that with the maximum estimated probability, which is equivalent to the mode of the posterior predictive draws; see Section 4 for further discussion of the justification of this choice.

Download table as: ASCII Typeset image

Table 3. Probability Estimates and Predictions for Compact Object Type of Unclassified Or Ambiguously Classified XRBs

System	$P(\mathrm{BH})$	SE	P (Nonpulsar)	SE	P(Pulsar)	SE	Prediction
1900−245	0.4858	0.0989	0.2334	0.0736	0.2808	0.0846	BH
GX 3+1	0.0757	0.0284	0.7674	0.0326	0.1569	0.0210	NP
1701−462	0.5025	0.1107	0.2179	0.0642	0.2796	0.0698	BH
1700−37	0.1574	0.0650	0.1497	0.0503	0.6926	0.0749	P
1636−53	0.3619	0.0304	0.5411	0.0235	0.0969	0.0263	NP

Note. We demonstrate the estimated probabilities and predictions for five XRBs whose classifications are unknown or ambiguous, along with their standard errors. The predicted class is that with the maximum estimated probability, which is equivalent to the mode of the posterior predictive draws; see Section 4 for further discussion of the justification of this choice.

Download table as: ASCII Typeset image

It is important to note that the entire distribution of posterior predictive draws provides significantly more spatial information than a point estimate for a compact system type. For instance, consider the systems Ser X−1 and Aql X−1, the first of which is a properly classified non-pulsing system, and the second of which is one that is improperly classified as a black hole. From Figure 6, there seems to be no question that Ser X−1 is indeed a non-pulsing neutron star system since the proportion of posterior predictive draws is 0.9341 with a standard error of 0.0133. On the other hand, while Aql X−1 is not properly classified, as is evident in Figure 7, we do see some evidence for a non-pulsar signal from the 0.3093 estimated probability of being a non-pulsing system, with a standard error of 0.0507.

**Figure 6.** Example of a non-pulsing neutron star system Ser X−1 that is properly classified by the classifier. The left panel indicates all of the observations from Ser X−1 and the associated predictions of each observation, with the mode taken to be the prediction. The histogram on the right illustrate the estimated probabilities of Ser X−1 being of each of the three classes.
Download figure:
Standard image High-resolution image

**Figure 7.** Example of a non-pulsing neutron star system Aql X−1 that is improperly classified by the classifier and mistaken for a black hole system. There appears to be some signal for a non-pulsar system, however.
Download figure:
Standard image High-resolution image

5. SUMMARY AND FUTURE DIRECTIONS

The main objective of this work has been to develop a probabilistic model for predicting the compact object type of CCI observations from an XRB and to use the predictions generated from this model to estimate the probabilities that the compact object is a black hole, non-pulsing neutron star, or pulsar. We have shown that the model we developed works reasonably well for this purpose based on the accurate classification of well known XRBs, but note that the model seems to make mistakes when classifying bursters that are close to the boundary between black hole systems and non-pulsars in the CCI coordinate system. This suggests the benefit of further investigation of these systems as well as a refinement of our approach, including the data sampling, models, and algorithms employed. It is also possible that some of these "burster" systems are inappropriately classified, but further scientific investigation is required before such a claim can be vigorously asserted.

In order to improve the predictive accuracy of our classification scheme, we can extend the imputation model by imposing a distribution on the Gaussian process parameters or using a cross validation approach to fine tune these parameters. There is a growing body of literature on Gaussian process prediction which attempts to bypass the associated computational impediments, and so we would like to investigate these methods and their potential application to this problem further, e.g., with the INLA method introduced by Rue et al. (2009). Additionally, the RXTE ASM data contain a large fraction of missing data due to the signal-to-noise threshold we employ, and so we may consider applying the framework of Little & Rubin (2002) to model this missing data. The primary advantages of this approach are that it utilizes a Bayesian model for generating imputations, which is consistent with our model for compact object prediction, and it may lead to more plausible predictions for the unknown compact object types. Additionally, we may want to consider different subsampling schemes besides the one we employ to ensure that they do not corrupt the inherent structure in the data set. Finally, in addition to predictive accuracy, so as to make the model more scientifically relevant, we may want to include physically meaningful parameters; the inference of such parameters may explain the scientific reasons for the separation of observations into different regions by compact object type.

The CCI method, by definition, uses measurements of X-ray intensity and color in two X-ray bands. This information will generally not only reflect on the properties of the source but also on the absorption of the intrinsic spectrum by the interstellar medium. ISM absorption will clearly affect the lowest energy band the most, and thus the soft color. However, at higher column densities, the hard color will be affected as well. We are developing a general method for the correction of CCI plots given the sensitivity curve of an X-ray monitoring telescope and likely models of the spectral shape (B. S. Boroson et al. 2015, in preparation). The eROSITA telescope developed at the Max Planck Institute for Extraterrestrial Physics, due to be launched in 2016, has 20 times the sensitivity of the ROSAT/ASM in the low-energy band and will be particularly beneficial for studying the ISM (Merloni et al. 2012).

We can extend our long-range study using data from past and present large field of view X-ray instruments such as MAXI (Matsouka et al. 2009), the HETE-WXM (Yoshida et al. 1995), and BeppSAX-WFC (Boella et al. 1997). Current and planned X-ray telescopes with high sensitivity, such as Chandra (Weisskopf et al. 2002), XMM (Mason et al. 1995), and eROSITA (Merloni et al. 2012), will enable us to apply our methodology to XRBs of much lower luminosity.

Finally, we reiterate that the R code we have written to make predictions for this analysis ought to be be applicable to other CCI data sets quite easily, and so we have provided it for public use.

We acknowledge the Harvard ICHASC for their helpful feedback. Additionally, G.G. and S.D.V. acknowledge partial support through a Smithsonian Institution CGPS grant to SDV.

Facility: RXTE - Rossi X-ray Timing Explorer.

APPENDIX: ALGORITHM DESCRIPTION

As noted in detail by Gelman et al. (2013), prediction in the Bayesian paradigm essentially follows the following iterative scheme: first, draw from the posterior distribution of model parameters and latent variables through a Monte Carlo simulation, and then draw from the predictive distribution of interest, which in our case is that of Y_pred, conditional on these draws. Adapting this general strategy for our problem, Equations (4) and (5) from Section 3 suggest the following iterative algorithm for sampling from the posterior predictive distribution for the compact object type.

1.
Sample from the posterior distribution $p({Z}_{\mathrm{train}},\alpha ,\beta ,| {X}_{\mathrm{train}},{X}_{\mathrm{pred}},{Y}_{\mathrm{train}})$ using elliptical slice sampling due to the joint multivariate normal distribution of $(\alpha ,\beta ,{Z}_{\mathrm{train}})$ . The method of elliptical slice sampling was introduced by Murray et al. (2010).
2.
Sample the posterior latent variables at the prediction points, Z_pred, using the conditional multivariate normal distribution of $p({Z}_{\mathrm{pred}}| {Z}_{\mathrm{train}},-)$ , which has mean ${{\rm{\Sigma }}}_{\mathrm{pred},\mathrm{train}}{{\rm{\Sigma }}}_{\mathrm{train},\mathrm{train}}^{-1}{Z}_{\mathrm{train}}$ and covariance matrix ${{\rm{\Sigma }}}_{\mathrm{pred},\mathrm{pred}}-{{\rm{\Sigma }}}_{\mathrm{pred},\mathrm{train}}{{\rm{\Sigma }}}_{\mathrm{train},\mathrm{train}}^{-1}{{\rm{\Sigma }}}_{\mathrm{train},\mathrm{pred}}$ due to fundamental properties of conditional MVN distributions.³
3.
Sample from Y_pred from a multinomial distribution conditional on the posterior latent draw of ${Z}_{\mathrm{pred}},\alpha ,\beta$ where, as mentioned previously, the probability of ${Y}_{{\mathrm{pred}}_{k}}$ being of type l is proportional to $\mathrm{exp}[{\alpha }_{l}+{\beta }_{l}{Z}_{{\mathrm{pred}}_{k,l}}]$ .

Additionally, we set ${\sigma }^{2}=1$ and φ = 0.1.⁴

Elliptical slice sampling is a Monte Carlo algorithm developed to simulate a posterior probability distribution where the prior distribution is jointly multivariate normal, a condition that holds in our model, as discussed in Section 3. As explained by Murray et al. (2010), this is a scenario where traditional Monte Carlo methods applied within a Bayesian context, such as Gibbs sampling or Metropolis–Hastings, perform poorly. Routines to implement elliptical slice sampling and draw from the posterior distribution of Z_pred and Y_pred were written in the R programming language using the Rcpp, RcppEigen, and RcppArmadillo packages for the efficient inline implementations of linear algebraic routines in C++ (Bates & Eddelbuettel 2013; Eddelbuettel 2013; Eddelbuettel & Sanderson 2014; R Core Team 2015; Sklyar et al. 2015). Additional packages used in the testing and development of this code were mvtnorm and MASS (Venables & Ripley 2002; Genz et al. 2014). As discussed by Murray et al. (2010), the computational impediments of elliptical slice sampling stem primarily from determining the Cholesky decomposition of and inverting a multivariate normal covariance matrix. RcppEigen and RcppArmadillo provide efficient implementations for determining the Cholesky decomposition and performing matrix inversion that can be conveniently included directly within R code. This code, along with the RXTE ASM data, is freely available at https://github.com/ggopalan/XRay-Binary-Classification.

CLASSIFYING X-RAY BINARIES: A PROBABILISTIC APPROACH

Article metrics

Permissions

Author affiliations

Dates

ABSTRACT

1. INTRODUCTION

2. DATA

3. MODELS AND ALGORITHM

4. RESULTS

5. SUMMARY AND FUTURE DIRECTIONS

APPENDIX: ALGORITHM DESCRIPTION

Footnotes

CLASSIFYING X-RAY BINARIES: A PROBABILISTIC APPROACH

Article metrics

Permissions

Share this article

Author affiliations

Dates

ABSTRACT

1. INTRODUCTION

2. DATA

3. MODELS AND ALGORITHM

4. RESULTS

5. SUMMARY AND FUTURE DIRECTIONS

APPENDIX: ALGORITHM DESCRIPTION

Footnotes