Parametric Model Based On Imputations Techniques for Partly Interval Censored Data

The term ‘survival analysis’ is used in a broad sense to describe a collection of statistical procedures for analyzing data in which the outcome variable of interest is the time until an event occurs. The time to failure of a given experimental unit may be censored, and the censoring can be right, left, interval, or partly interval censoring (PIC). In this paper, PIC data were analyzed with a parametric Cox model. Several imputation techniques were used: midpoint, left point, right point, random, mean, and median. Maximum likelihood estimation was used to obtain the estimated survival function. These estimates were then compared with those of existing approaches, namely the Turnbull and Cox models, on clinical trial data (breast cancer data), which supported the validity of the proposed model. The results for this data set indicated that the parametric Cox model was superior in terms of the estimated survival functions, the likelihood ratio tests, and their P-values. Moreover, among the imputation techniques, midpoint, random, mean, and median imputation gave better estimates of the survival function.


Introduction
One of the main objectives of survival studies is the comparison of survival functions, for example when two different treatments for one disease need to be compared. In this paper, the comparison problem for several imputation techniques is discussed for PIC failure time data. PIC data are common in numerous applications in medicine, engineering, biology, and economics. One of the most common occurrences is in medical or health studies with periodic follow-up hospital visits, where the event, usually a patient displaying certain disease symptoms or recovery signs, may occur between visits but may also occur in the hospital, hence partly interval censoring. For some subjects, the exact failure times are observed. For the remaining subjects, the survival time of interest is known only to lie within a certain time interval [1] [2] [3]. [4] provided an example of this kind of data, the Framingham Heart Disease Study. Another example is a study that follows a group of women diagnosed with breast cancer and separates them into two groups based on the type of treatment they receive. A further example is fatigue failure (crack size) data, which record the number of cycles before a crack appears in the subjects. Using imputation greatly simplifies the calculations, especially for larger data sets. Applying several imputation techniques to our data was relatively easy. Using R software, we developed code to apply the different imputation methods to the data sets and proceed with the parametric analysis. In this study, several imputation techniques were used to estimate the survival function, and the estimates were compared with the one obtained by the Turnbull method for interval censored and PIC failure time data.
In the next two sections, the parametric model and the imputation techniques are discussed.

Parametric Model
While nonparametric methods have been the standard approach for analyzing simple homogeneous survival data without covariate information, because they make fewer assumptions, parametric survival models are sometimes used for inference. Let $X_1, X_2, \ldots, X_n$ be the true survival times, assumed to be independent and identically distributed with a common survival function. If we assume independent censoring, then the likelihood function for the parameter is given by Equation (1), where $\beta$ is a parameter that needs to be estimated.
The value of $\beta$ can be estimated by maximizing the likelihood, as shown in Equation (2). (The estimation of the $\beta$'s is not addressed in this paper; the reader is referred to [6].) The risk set at time $X_i$ is the set of subjects who are still alive and uncensored at that time. The Breslow estimator can be used to estimate the cumulative baseline hazard. The Nelson-Aalen estimator could be used if $\beta$ is known and equal to its maximum likelihood estimate $\hat{\beta}$. The estimators based on Equations (2) and (3) have a parametric maximum likelihood interpretation. Equation (2) can be used to estimate $\beta$ in large samples, in which case $\hat{\beta}$ is approximately normally distributed with the proper mean and with a covariance that can be estimated from the information matrix.
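As an illustration of the estimation steps described above, the following minimal Python sketch maximizes the textbook Cox log partial likelihood for a single covariate by Newton-Raphson and then computes the Breslow estimate of the cumulative baseline hazard. It is only a sketch under stated assumptions: the function names, the single binary covariate, and the toy data are hypothetical, and the formulas are the standard ones, which may differ in detail from Equations (1) to (3) of this paper.

```python
import math


def score_and_information(beta, times, events, z):
    """Score and observed information of the Cox log partial likelihood
    for one covariate (ties handled with Breslow's approximation)."""
    score, info = 0.0, 0.0
    for t_i, d_i, z_i in zip(times, events, z):
        if not d_i:
            continue  # censored subjects enter only through the risk sets
        s0 = s1 = s2 = 0.0
        for t_j, z_j in zip(times, z):
            if t_j >= t_i:  # subject j is still at risk at time t_i
                w = math.exp(beta * z_j)
                s0 += w
                s1 += w * z_j
                s2 += w * z_j * z_j
        mean_z = s1 / s0
        score += z_i - mean_z
        info += s2 / s0 - mean_z ** 2
    return score, info


def fit_beta(times, events, z, max_iter=50, tol=1e-10):
    """Newton-Raphson iteration for the maximum partial likelihood estimate."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, times, events, z)
        if info <= 0.0:
            break  # no curvature: the estimate is not identifiable
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta


def breslow_cumulative_hazard(beta, times, events, z):
    """Breslow estimate of the cumulative baseline hazard at each event time."""
    out, total = [], 0.0
    for t_i, d_i in sorted(zip(times, events)):
        if not d_i:
            continue
        risk_sum = sum(math.exp(beta * z_j)
                       for t_j, z_j in zip(times, z) if t_j >= t_i)
        total += 1.0 / risk_sum
        out.append((t_i, total))
    return out


# Hypothetical toy data (NOT the breast cancer data): event indicator 1 means
# the failure time is exact (or imputed), 0 means right censored.
times = [2, 3, 5, 7, 8, 11, 13, 14]
events = [1, 1, 0, 1, 1, 0, 1, 1]
z = [1, 0, 1, 0, 1, 1, 0, 0]

beta_hat = fit_beta(times, events, z)
H0 = breslow_cumulative_hazard(beta_hat, times, events, z)
```

In this sketch, the inverse of the information returned by `score_and_information` at `beta_hat` gives the usual large-sample variance estimate. Setting `beta` to zero in `breslow_cumulative_hazard` reduces the Breslow estimator to the Nelson-Aalen estimator.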

Probability-based imputation techniques
These techniques usually require estimating the distribution of the partly interval censored data from the observed intervals and then using that estimated distribution to impute the missing data. The most common probability-based imputation methods are conditional mean imputation, conditional median imputation, and random imputation.
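The imputation rules named in this paper can be sketched in a few lines of Python. This is a minimal illustration and not the R code used for the analysis: the function name `impute_interval` is hypothetical, random imputation is assumed to draw uniformly within the observed interval, and the conditional mean and median are computed under an assumed exponential distribution with a user-supplied `rate`, which in practice would be replaced by a distribution estimated from the data.

```python
import math
import random


def impute_interval(left, right, method="midpoint", rate=None, rng=None):
    """Replace an interval-censored time in (left, right] with a single value."""
    if not (0 <= left < right < math.inf):
        raise ValueError("need a finite interval with 0 <= left < right")
    if method == "left":
        return left
    if method == "right":
        return right
    if method == "midpoint":
        return (left + right) / 2.0
    if method == "random":
        # Assumption: uniform draw within the observed interval.
        rng = rng or random
        return rng.uniform(left, right)
    if method in ("mean", "median"):
        if rate is None or rate <= 0:
            raise ValueError("mean/median imputation needs a positive rate")
        # Assumption: T ~ Exponential(rate), so S(t) = exp(-rate * t).
        s_left = math.exp(-rate * left)
        s_right = math.exp(-rate * right)
        if method == "median":
            # Solve S(m) = (S(left) + S(right)) / 2 for m.
            return -math.log((s_left + s_right) / 2.0) / rate
        # Conditional mean E[T | left < T <= right].
        return (left * s_left - right * s_right) / (s_left - s_right) + 1.0 / rate
    raise ValueError(f"unknown method: {method}")


# Hypothetical interval (4, 10] in months and a hypothetical rate of 0.1.
for name in ("left", "right", "midpoint", "random", "mean", "median"):
    print(name, impute_interval(4, 10, name, rate=0.1, rng=random.Random(1)))
```

After imputation, every interval-censored observation is treated as an exact failure time, so the data reduce to exact and right-censored observations that standard survival methods can handle.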

An Example
The proposed method was applied to the modified breast cancer data found in earlier studies [6] [7]. The data set consists of 46 patients who were treated with radiation therapy only (R) and 48 patients who were treated with radiation plus adjuvant chemotherapy (R+C). The study aimed to compare the cosmetic effects of radiation therapy alone (R) against the combination of radiation therapy and chemotherapy (R+C) in women with early-stage breast cancer. The event of interest is the time at which breast retraction is first observed. The patients were examined in the clinic once every 4 to 6 months, and the actual date of the event was recorded exactly when available. When the date was not available, the interval in which the event occurred was noted. To set up the data as partly interval censored data, for radiation, 25 observations were set up as right censored, 21 as interval censored, and 20 as exact. Likewise, for R+C, 13 observations were set up as right censored, 35 as interval censored, and 20 as exact; the reader is referred to [6]. The data set was analyzed as PIC data, as presented next. Based on parametric analysis for cancer PIC data, Figures 1, 2, 3
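The three observation types used in this data setup can be encoded as (left, right) pairs. The sketch below is a hypothetical illustration in Python: the `classify` helper and the toy values are assumptions made for the example and are not taken from the breast cancer data.

```python
import math
from collections import Counter


def classify(left, right):
    """Label one observation encoded as a (left, right) pair."""
    if left > right:
        raise ValueError("left endpoint must not exceed right endpoint")
    if left == right:
        return "exact"      # event time observed exactly
    if math.isinf(right):
        return "right"      # event not yet observed at the last visit
    return "interval"       # event known to lie between two visits


# Hypothetical toy observations in months.
toy = [(8.0, 8.0), (6.0, 10.0), (37.0, math.inf), (0.0, 7.0), (15.0, 15.0)]
counts = Counter(classify(left, right) for left, right in toy)
print(dict(counts))
```

With this encoding, an imputation rule only needs to be applied to the pairs labelled "interval"; exact and right-censored observations are left unchanged.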

Conclusion
In this paper, a simple modification for estimating the survival function for PIC data, using a parametric model based on imputation techniques, was proposed. A modified breast cancer data set was used, and R software was used to generate the results. The results obtained are the same as those obtained by the Turnbull method. However, for partly interval censored data, random imputation and mean and median imputation show better results than the other imputation techniques, as well as Turnbull, with respect to the smallest P-value (Table 1). The parametric Cox model based on imputation techniques gives essentially better estimates of the survival function than the Turnbull method, and is also much easier to use for prediction.