Empirical performance of interpolation techniques in risk-neutral density (RND) estimation

The objective of this study is to evaluate the empirical performance of interpolation techniques in risk-neutral density (RND) estimation. Firstly, the empirical performance is evaluated by using statistical analysis based on the implied mean and the implied variance of RND. Secondly, the interpolation performance is measured based on pricing error. We propose using the leave-one-out cross-validation (LOOCV) pricing error for interpolation selection purposes. The statistical analyses indicate that there are statistical differences between the interpolation techniques:second-order polynomial, fourth-order polynomial and smoothing spline. The results of LOOCV pricing error shows that interpolation by using fourth-order polynomial provides the best fitting to option prices in which it has the lowest value error.


Introduction
The volatility function technique is considered as a nonparametric approach to extract the RNDs [1] . This method is based on the volatility smile function. In this approach, the estimated RNDs are calculated using second derivative of option prices with a condition of continuum option prices [2]. The condition of continuum option prices is impossible to exist in the real market. Thus, previous studies proposed various interpolation techniques to overcome this condition of continuum option prices. Interpolation techniques play a major part in deriving the true density from option prices.
In beginning, the second-order polynomial interpolation was introduced to smooth a volatility smile [3]. It has been proposed to fitting the implied volatility instead of using the option prices directly due to the non-linearity problem [4]. Similar to [3], a second-order polynomial is used by [5] but in an implied volatility-delta space to interpolate the implied volatility function. The delta can be defined as the rate of change of the option prices with respect to the price of the underlying asset [6]. In a different study, a cubic spline interpolation is used to interpolate the implied volatility in an implied volatility-strike space [7]. The cubic spline is preferred because of the flexibility in controlling the shape of a volatility smile. A smoothing spline intepolation was developed by [1] that combining [5,7] approached with certain modifications. This approach introduced a vega of options as the weight of each distribution together with a smoothing parameter [1]. Recently, drawback of the cubic spline in RNDs estimation has been pointed out by [8] and proposed a fourth-order polynomial interpolation on an implied volatility -strike price space to generate the continuum of option prices.
To be used in RNDs estimation. Thus, our study is largely motivated to know if there are any differences among the interpolation techniques with regards to RNDs estimation. To the best of our knowledge, this study is the first to analyze the statistical differences between the interpolation techniques namely second-order polynomial, fourth order polynomial and smoothing spline which are used to extract the RNDs. This paper contributes to the empirical finance literature by applying the cross-validation technique to determine the performance of each interpolation technique. The leave-one-out cross validation (LOOCV) of option prices error is chosen to select the appropriate interpolation which gives the lowest value error. This paper is organized in the following ways. The second section presents the formula of Black-Scholes-Merton model, and the third section presents the theory of risk-neutral density. Fourth section lists the procedures to estimate the risk-neutral density using interpolation techniques. Section five describes the model evaluation and section six presents the data. Seventh section discusses the results and the final section presents the concluding remarks.

Black-Scholes-Merton model and implied volatility
This study uses the Black-Scholes-Merton (BSM) model to price the option. The formula of BSM model is as follows: The function   N  is the cumulative probability distribution function for a standardised normal distribution. The variables c and p are the European call and European put prices, respectively; 0 S is the price of the underlying asset; K is the strike price; r is the continuously compounded risk-free rate;  is the underlying asset price volatility; and T is the time to maturity of the option.
The volatility of underlying asset price in the BSM formula cannot be derived directly. Thus, this study implements the bisection method algorithm to get the implied volatility by using the inverse formula of the BSM model.

Risk-neutral density from option prices
Price of a call option is given by the discounted value of expected payoff on the maturity date, T with respect to the risk-neutral probability: where C is the European call price, T S is the price of the underlying asset; K is the strike price; is the continuously compounded risk-free rate and   T fS is the RND function. The RND can be extracted from the second derivative of an option price with continuous strike prices [9]. From equation (2) the partial derivative with respect to the strike price, K yields: Solving the risk neutral distribution, () FK , gives Numerically, equations (3) and (5) can be approximated using the following equations:

Estimation of risk-neutral density procedures
This subsection explains the procedure for the RNDs estimation by using smoothing spline and polynomial interpolation.

Extract RNDs using a smoothing spline interpolation
Firstly, the call prices are translated into implied volatilities by using the inverse of the BSM model. Then, the strike prices are converted into option delta. The delta estimation is based on the at-themoney implied volatility in order to maintain the same ordering with the strike prices [1,9,10]. According to [6], the delta of call options can be calculated as follows : where  is the standard normal cumulative distribution function. At this stage, the call prices-strike prices spaces are fully converted into implied volatilitydelta spaces. Then, the smoothing spline is used to interpolate the implied volatility by using the following equation: The parameter of equation (9) can be estimated by minimizing the objective function as stated in equation (10) In line with [1], vega of options is calculated and is to represent the weight of parameter ( i w ) and the value of  is set to 0.99. Once it is fitted, extract the 1000 points of implied volatility from the implied volatility function. Last but not least, convert the 1000 point of implied volatility into the call prices using the BSM model and the option deltas are converted into strike prices. Finally, the RNDs are obtained by using the numerical calculation based on equation (7).

Extract RNDs using a polynomial interpolation
The procedures of the second and fourth order polynomials in RNDs estimation are basically referred to [3,8] . The polynomial interpolation takes place in the implied volatility-strike prices spaces. Thus, the call prices are converted into implied volatilities. The implied volatility-strike space is fitted using the second-order or fourth order polynomials. The implied volatility functions can be expressed as follows: i.
Second-order polynomial : Fourth-order polynomial : The 1000 points of implied volatility are extracted from the fitted curve and are transformed back to call prices by using the BSM model. Finally, the RNDs are calculated based on equation (7).

Model Evaluation
This subsection describes the interpolation performance measurement. Firstly, statistical analysis is applied to examine if the interpolation techniques provide any significant differences in RND estimation. Secondly, the leave-one-out cross-validation (LOOCV) is used to examine the best fitting among the interpolation approaches.

Statistical analyses
The performance of the interpolation approaches in RNDs estimation is tested based on the following hypotheses: Hypothesis 1 (h1) : The interpolation by using second and fourth order polynomials do not provide any statistical difference in RND estimation.

Hypothesis 2 (h2) : The interpolation by using fourth order polynomial and smoothing spline do not provide any statistical difference in RND estimation
Hypothesis 3 (h3) : The interpolation by using second order polynomial and smoothing spline do not provide any statistical difference in RND estimation The implied mean and the implied variance of RNDs are used as an input for statistical analyses. The implied mean of RNDs give an expectation the price of the underlying asset price in future. The implied variance of RNDs indicates the deviation or uncertainty of the evolvement of the underlying asset price. The hypothesis for the implied mean is tested using standard t-test and the implied variance is tested using F-test.
The implied mean and the implied variance can be calculated as follows: i.
Implied mean : where K is the strike price and   fKis the RND corresponding to the strike price.

Cross-validation
Basically the whole data set is fitted using the interpolation approaches. Then, the interpolation approach that provides the lowest error is chosen when compared between the data set and the fitted values. However, this approach may lead to over fitting the dataset and gives a false conclusion of a particular interpolation that provides the best fitting with the lowest error. In addition, it does not exhibit the true performance of the interpolation approach when involves a prediction of a new value. Thus, cross-validation is used to detect and prevent the overfitting case. [11] is the first to propose a model evaluation by using a cross-valuation method. Briefly, the cross-validation deleted some of the data before an interpolation starts. The deleted data is then used to get the predicted value using an interpolation equation. The cross-validation can be divided into three categories namely the simple cross-validation with holdout method, the K-fold cross-validation and the leave-one-out crossvalidation (LOOCV). LOOCV is highly recommended as an appropriate model to use as it gives an unbiased estimator to represent the performance of a model [12].
However, the LOOCV method is not widely used in finance literature. This study is inspired by the seminal work of [13] that uses the LOOCV method to examine the performance of a spline in a volatility surface. Therefore, this study adopts the concept of LOOCV in order to determine the ability of each interpolation in RNDs estimation.
There are several steps in LOOCV estimation. Let N be the number of observation data. Firstly, remove the i-th observation data and then data is fitted corresponding to the particular interpolation. Re-evaluate the i-th data by using the model estimated on Ni  data. This method requires intensive computation since it is fitted N times. The LOOCV estimation can be calculated as follows: i i e y y  . Note that the main objective of this study is to generate a continuum of option prices by using the fitted implied volatility. The difference between the market prices for options and the option prices based on the fitted model is considered as an approximation error. Following [13], the error is calculated as where i C is the market price of call options and ˆi C is the price of call options based on the fitted model.

Data
Our study uses dataset on Dow Jones Industrial Average (DJIA) index options and is obtained from OptionData.net. The dataset covers the period from January 2009 until December 2015. The interest rate used is based on the London Interbank Offer Rate (LIBOR) for 1-month. The price of the underlying asset, interest rate and the dividend yield are retrieved from the DataStream. Only out-ofthe-money (OTM) of call and put options is considered in this study due to liquidity reasons [8,10,[14][15][16][17][18][19][20][21][22]. In line with [8,15,23], the average of bid-ask quotes is used to represent the closing price of options. This study only considers options with a one-month maturity according to the Chicago Board Options Exchange (CBOE) calendar.
There are several restrictions applied on the dataset before the final dataset is obtained. Firstly options with bid or ask quotes equal to zero are eliminated and to ensure that the ask quotes are greater than the bid quotes. Secondly, option prices that violate the arbitrage condition are excluded. Lastly, only options with the lowest delta value less than or equals to 0.25 and the highest delta value equals or greater than 0.75, are used. This condition is applied to ensure the RNDs extracted exhibits the true density. The final sample consists of 83 set of options with a one-month maturity.

Results and Discussions
This subsection explains the result of the analyses. and is divided into two parts. The first part is the explanation of statistical analyses based on the implied mean and the implied variance. The second part explains the LOOCV option price error.
Options prices that is available on 22 February 2010 is used to illustrate the computational work explained in Section 4. Figure 1 illustrates the steps in RNDs estimation for the three types of interpolation. The interpolation of volatility smile is presented in the first row, followed by the graph of call prices based on the fitted implied volatility and in the last row is the RNDs estimation. These computations are applied to the whole 83 set of options. The polynomial interpolation is taken place in an implied volatility-strike prices space. On other hand, the interpolation using smoothing spline is taken place in an implied volatility-delta space. The graph for each step corresponding to each interpolation does not display an outstanding differences and it leads us to further analyze these interpolation techniques using statistical analyses.

Statistical Moments
Implied mean and implied variance for the DJIA index options with a one-month maturity are presented in Figure 2 and Figure 3, respectively. All interpolation method do not clearly exhibit any differences. Then, the performances of the interpolations are tested using statistical analysis. Table 1 reports the statistical differences results of the implied mean and the implied variance according to the hypotheses mentioned in Section 4.1. Majority of the cases clearly reject the null hypotheses of equality of the implied mean and the implied variance 1 . Therefore, it can be concluded that all the interpolation approaches have statistical significances. It implies that the interpolation approach affect the RNDs estimations.

LOOCV pricing error
The previous results only convey that each interpolation has different effect towards the RNDs estimation without pointing out which of the interpolation approach is the most appropriate. This study further analyzes which interpolation provides the best fitting so that the call prices generated has the lowest approximation error. Thus, the LOOCV approach is used to examine the performance of the interpolations. The ability of interpolation approach in general is measured based on various error metrics namely root mean square error (RMSE), mean absolute percentage option pricing error (MAPE), and mean absolute option pricing error (MAE). The results are presented in Table 2.
Overall, the interpolation of fourth-order polynomial has the lowest value compared to the two interpolation approaches. This finding is in agreement with [8] in which a higher order polynomial is better than a lower order polynomial. Even though the smoothing spline is widely used in the previous literature [1,10,24], however it shows the highest error of option pricing based on LOOCV techniques.

Conclusions
The interpolation approaches play a crucial role in the volatility function techniques for the RNDs estimation. This study statistically and empirically analyze the performance of the interpolation approaches namely second-order polynomial, fourth-order polynomial and smoothing spline. Results show that each interpolation has significance difference among each other. Each interpolation provides different value of RNDs estimation. Further, the leave-out-one cross-validation (LOOCV) was conducted to determine which interpolation is appropriate to be used. The finding shows that the interpolation of fourth order polynomial outperformed the second-order polynomial and smoothing spline in which it gives the lowest value of LOOCV pricing error.