A Rapid Ultrasound Vascular Disease Screening Method using PauTa Criterion

Pulsed Wave Doppler (PW) is a traditional ultrasound technique for diagnosing vascular diseases. The conventional diagnostic method is based mainly on hemodynamic parameters obtained from the PW spectrum. However, it relies on clinical observation and on medical experience accumulated by investigating and analyzing large amounts of patient data. The collected patient data vary with the ultrasound equipment, the detection region and the operating technique, which produces different image styles and limits the applicability and generality of the conventional method. The method also depends strongly on patient data, especially on negative samples. This paper therefore proposes a rapid disease screening method based on the PauTa criterion, which uses the statistical distribution characteristics of the data to screen out anomalous targets. The proposed method uses multiple hemodynamic parameters to detect outliers that differ from healthy samples. Unlike conventional methods, it does not rely on a fixed or single ultrasound system and has low sensitivity to system noise. The experimental results show that the proposed method reaches an accuracy of 93.14%, which is at least 20% higher than the existing clustering methods K-Means and Support Vector Machine (SVM). Its high accuracy and fast convergence make the proposed method a promising candidate for rapid disease screening.


Introduction
Vascular ultrasound is now widely used in disease diagnosis, for example in arterial stenosis [1,2], fetal anomalies [3], and diabetes and atherosclerosis [4]. Hemodynamic parameters obtained with the Doppler ultrasound technique [5] can be used to evaluate vascular and cardiac function. When the angle between the ultrasound beam and the blood flow direction is fixed, the Doppler frequency shift is positively correlated with the blood flow velocity, so the velocity can be calculated from the spectrum [5]. The PW spectrum is used to study blood flow in vessels and thereby assess the location, extent and severity of vascular lesions. In the clinical application of hemodynamic parameters, arterial stenosis is mainly diagnosed by Peak Systolic Velocity (PSV) [6][7][8] and End-diastolic Velocity (EDV) [6][7][8][9]. Fetal anomalies are characterized by an abnormal Pulsatility Index (PI) [10], Resistance Index (RI) [11], and Systolic/Diastolic Ratio (S/D) [12,13]. Current clinical diagnosis is based mainly on statistical analysis, which is widely used because it is convenient. However, it depends heavily on negative samples, and each indicator helps diagnose only one type of disease, such as PSV for carotid artery stenosis. Early detection of vascular anomalies is especially important because it lets patients seek treatment in time and prevents diseases from becoming serious.
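For reference, the Doppler relation behind this correlation and the standard definitions of RI, PI and S/D can be written as a short Python sketch. The sketch assumes a speed of sound of 1540 m/s in soft tissue. The helper names `doppler_velocity` and `spectral_indices` are hypothetical. The example is illustrative and is not the processing chain of any particular scanner.

```python
import math


def doppler_velocity(f_shift_hz, f0_hz, angle_deg, c=1540.0):
    """Blood velocity (m/s) from the Doppler shift: v = c * f_d / (2 * f0 * cos(theta)).

    ASSUMPTION: c = 1540 m/s, the conventional speed of sound in soft tissue.
    """
    return c * f_shift_hz / (2.0 * f0_hz * math.cos(math.radians(angle_deg)))


def spectral_indices(psv, edv, v_mean):
    """Common PW spectrum indices from PSV, EDV and the time-averaged velocity."""
    return {
        "RI": (psv - edv) / psv,     # resistance index
        "PI": (psv - edv) / v_mean,  # pulsatility index
        "S/D": psv / edv,            # systolic/diastolic ratio
    }
```

For example, a 3 kHz shift at a 5 MHz transmit frequency and a 60° insonation angle gives about 0.924 m/s. Because velocity scales with 1/cos θ, a fixed angle keeps the shift and the velocity proportional.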
Some researchers proposed diagnostic standards based on single or fixed machines. Examples are the Hewlett-Packard Sonos 1000 Color Duplex System used by Carpenter J P [6], the Acuson 128 colour duplex scanner used by Moeta G L [7], and the ATL Ultramark 9 HDI scanner used by Suwanwela N [8]. Measured hemodynamic parameters differ among ultrasound system vendors even for the same patient. Because intrinsic spectral broadening (ISB) biases the values measured by different instruments [14], these standards do not generalize. Even with a fixed standard, the sonographer's technique can affect diagnostic accuracy [14]. Moreover, the number and specificity of the negative samples strongly influence the accuracy of the diagnostic results. In short, the results depend on the type of ultrasound equipment, the sample size and the operating method, which makes it difficult and inefficient for clinicians to diagnose and quantify disease.
To solve these problems, this paper proposes a rapid disease screening method based on the PauTa criterion. The method can detect vascular anomalies early and assess the health status of individuals. Using hypothesis testing and the Kullback-Leibler divergence, we find the probability distribution most similar to the original data [15]. The PauTa criterion states that random events are least likely to occur in the tails of a normal distribution, so events that fall there are considered abnormal data. Because the PauTa criterion focuses on the distribution of the data, it is less affected by system noise and by the availability of negative samples. Traditional methods require a large number of negative samples, whereas this method requires only random sampling, which makes data acquisition easier. The PauTa criterion is not intended to establish a specific correspondence between indicators and diseases. Instead, it helps identify possible disease from latent correlations between parameters and rapidly screens for outliers that differ from healthy samples. Our method gives the same results with different ultrasound instruments and can be applied to rapid preliminary screening for a wide range of diseases. In this paper, we compare the performance of the proposed PauTa criterion with the traditional clustering methods SVM [16] and K-means [17], which are common statistical analysis methods. Experimental results show that the proposed method reaches an accuracy of 93.14%, at least 20% higher than SVM and K-means. A comparison with the traditional diagnostic criteria for carotid stenosis shows that the proposed method also has the highest accuracy. Our method offers high accuracy, fast convergence and excellent scalability. It does not rely on individual data but on the overall statistical characteristics of the data distribution. This paper is organized as follows.
The Methodology section explains the theory of the proposed PauTa criterion in detail. The Experimental Results section presents the data collection, the algorithm implementation, and the experimental results compared with traditional statistical analysis methods. Finally, the Conclusions section summarizes the paper.

Methodology
Traditional methods rely on a large number of negative samples, which are hard to obtain. We therefore use a large amount of health data as the reference for identifying abnormal individuals. The data of each individual can be considered independent of the others, so the pooled data can be assumed to follow a common distribution, which is identified by hypothesis testing. The following example calculates the ACCL (Acceleration) indicator, where x is a vector containing the ACCL values of N individuals.

Histogram Calculation
In the histogram calculation, too many bins make each bar too short, while too few bins make the bars too tall. Therefore, the number of bins N is selected from the amount of raw data, as shown in Eq. 1.
Here x is a vector containing the ACCL values of n individuals, max(x) is the maximum value in x and min(x) is the minimum value in x. W is an estimate of the bin width, W′ is a correction of W rounded down to its highest digit, and E_L and E_R are the left and right boundaries of the histogram.
Figure 1. (a) Scatterplot of sample data for ACCL values; (b) Histogram distribution of ACCL values.
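A minimal sketch of the binning step follows. The bin-count formula of Eq. 1 is not reproduced in this text, so the sketch substitutes Sturges' rule. This substitution is an assumption and is not the paper's formula. The width correction (rounding W down to its highest digit) and the boundaries E_L and E_R follow the description above. The function names are hypothetical.

```python
import math

import numpy as np


def floor_to_leading_digit(w: float) -> float:
    """Round a positive width down to its most significant digit, e.g. 0.0372 -> 0.03."""
    if w <= 0:
        raise ValueError("bin width must be positive")
    scale = 10.0 ** math.floor(math.log10(w))
    # The small tolerance guards against float artefacts such as 0.3 / 0.1 = 2.9999...
    return math.floor(w / scale + 1e-9) * scale


def histogram_bins(x, n_bins=None):
    """Return the corrected width W', the bin edges (E_L ... E_R) and the bin counts."""
    x = np.asarray(x, dtype=float)
    if n_bins is None:
        # ASSUMPTION: Sturges' rule stands in for the paper's Eq. 1.
        n_bins = math.ceil(math.log2(x.size)) + 1
    w = (x.max() - x.min()) / n_bins            # initial width estimate W
    w_corr = floor_to_leading_digit(w)          # corrected width W'
    e_left = math.floor(x.min() / w_corr) * w_corr
    if e_left > x.min():                        # guard against float rounding
        e_left -= w_corr
    n = max(1, math.ceil((x.max() - e_left) / w_corr))
    if e_left + n * w_corr < x.max():           # make sure the last edge covers max(x)
        n += 1
    edges = e_left + w_corr * np.arange(n + 1)  # edges[0] = E_L, edges[-1] = E_R
    counts, _ = np.histogram(x, bins=edges)
    return w_corr, edges, counts
```

Snapping the edges to multiples of W′ keeps bin boundaries at readable values, which makes histograms from different batches easier to compare.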

Curve Fit
A curve is fitted to the histogram, and the parameters μ and σ determine the normal distribution.
If the peaks of the histogram bars were used as fitting points, the fitted curve would depend on the choice of bins (the more bins, the lower each bar). We therefore fit the probability density function (PDF), whose total area is 1, so the probability of an interval equals the density multiplied by the interval width. All original data points are used as input.
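Let $x = [x_1, x_2, \dots, x_N]$ be the vector containing the ACCL values of N individuals, where $x_i$ indicates a single measurement. The mean μ is estimated as $\bar{X}$, with 95% confidence interval $CI_\mu$. The standard deviation σ is estimated as $S$, with 95% confidence interval $CI_\sigma$. Eq. 8 and Eq. 9 give these parameter estimates and confidence intervals, where α is the significance level, set to 0.05 (a 95% confidence interval) [15]. The probability density function (PDF) is denoted $f(x)$ and the cumulative distribution function (CDF) is denoted $F(x)$. The PDF and CDF of the normal distribution and the Rayleigh distribution are defined as:
$$f_{\mathrm{N}}(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad F_{\mathrm{N}}(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x-\mu}{\sigma\sqrt{2}}\right)\right]$$
$$f_{\mathrm{R}}(x) = \frac{x}{\sigma^2} \exp\left(-\frac{x^2}{2\sigma^2}\right), \qquad F_{\mathrm{R}}(x) = 1 - \exp\left(-\frac{x^2}{2\sigma^2}\right), \quad x \ge 0$$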
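The estimation step can be sketched with NumPy and SciPy as follows. The sketch assumes the textbook t-interval for μ and the chi-square interval for σ, since Eq. 8 and Eq. 9 are not reproduced here. It also offers one possible reading of the distribution-selection step mentioned in the Introduction: it compares normal and Rayleigh fits by the discrete Kullback-Leibler divergence over histogram bins. The helper names and the 20-bin default are hypothetical.

```python
import numpy as np
from scipy import stats


def estimate_with_ci(x, alpha=0.05):
    """Estimate mu and sigma with (1 - alpha) confidence intervals.

    ASSUMPTION: t-interval for mu, chi-square interval for sigma (normal model).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    x_bar = x.mean()
    s = x.std(ddof=1)                                   # S, the sample standard deviation
    half = stats.t.ppf(1 - alpha / 2, df=n - 1) * s / np.sqrt(n)
    ci_mu = (x_bar - half, x_bar + half)
    chi2_upper = stats.chi2.ppf(1 - alpha / 2, df=n - 1)
    chi2_lower = stats.chi2.ppf(alpha / 2, df=n - 1)
    ci_sigma = (np.sqrt((n - 1) * s**2 / chi2_upper),
                np.sqrt((n - 1) * s**2 / chi2_lower))
    return x_bar, s, ci_mu, ci_sigma


def binned_kl(x, edges, dist):
    """Discrete KL divergence D(P || Q): P = histogram of x, Q = bin masses of a frozen SciPy distribution."""
    counts, _ = np.histogram(x, bins=edges)
    p = counts / counts.sum()
    q = np.clip(np.diff(dist.cdf(edges)), 1e-12, None)  # model probability of each bin
    q = q / q.sum()                                      # renormalise to the histogram range
    mask = p > 0                                         # empty bins contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


def compare_normal_rayleigh(x, n_bins=20):
    """KL divergence of the histogram from a fitted normal and a fitted Rayleigh model."""
    x = np.asarray(x, dtype=float)
    edges = np.linspace(x.min(), x.max(), n_bins + 1)
    normal = stats.norm(loc=x.mean(), scale=x.std(ddof=1))
    rayleigh = stats.rayleigh(scale=np.sqrt(np.mean(x**2) / 2))  # Rayleigh MLE with loc = 0
    return {"normal": binned_kl(x, edges, normal),
            "rayleigh": binned_kl(x, edges, rayleigh)}
```

The candidate with the smaller divergence is the one whose shape is closer to the histogram. Using the CDF difference per bin avoids the bin-height dependence noted above.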

PauTa Criterion
The PauTa criterion (3σ criterion) [18] assumes that the sample data follow a normal distribution, so random values fall within μ ± 3σ and values beyond 3σ are regarded as abnormal. As shown in Figure 2, the interval μ ± 3σ covers 99.73% of the area under the curve, so a sample lying outside this interval is considered anomalous.
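A minimal sketch of the PauTa rule applied to several hemodynamic parameters follows. Here μ and σ are estimated per parameter from healthy reference data. An individual is flagged if any parameter falls outside its μ ± kσ band. This any-parameter rule mirrors the later observation that patients show abnormalities "in one or more parameters", but the function names and the per-parameter treatment are assumptions.

```python
import math

import numpy as np


def pauta_bounds(mu, sigma, k=3.0):
    """Lower and upper PauTa boundaries mu - k*sigma and mu + k*sigma."""
    return mu - k * sigma, mu + k * sigma


def screen_individuals(healthy, candidates, k=3.0):
    """Flag each candidate row if any parameter lies outside its k-sigma band.

    healthy:    (n_healthy, n_params) reference data used to estimate mu and sigma
    candidates: (n_candidates, n_params) individuals to screen
    """
    healthy = np.asarray(healthy, dtype=float)
    candidates = np.asarray(candidates, dtype=float)
    lower, upper = pauta_bounds(healthy.mean(axis=0), healthy.std(axis=0, ddof=1), k)
    outside = (candidates < lower) | (candidates > upper)
    return outside.any(axis=1)


# Share of a normal distribution inside mu +/- 3 sigma: erf(3 / sqrt(2)), about 0.9973
INSIDE_3SIGMA = math.erf(3 / math.sqrt(2))
```

The constant reproduces the 99.73% figure quoted above. About 0.27% of truly healthy values are therefore expected to be flagged per parameter.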

Error Analysis
The evaluation indicator RMSE (Root Mean Square Error) [19] measures the accuracy of the fit by computing the difference between the predicted and the actual values. The smaller the RMSE, the better the fit.
Eq. 16 gives the convergence condition $|R' - R| < \varepsilon$, where ε is a small positive threshold, $R'$ is the RMSE of the current iteration, and $R$ is the RMSE of the previous iteration. The evaluation indicator MAE (Mean Absolute Error) [20], [21] measures the difference between the current 3σ boundary positions and the previously estimated ones, and it is used to evaluate the degree of convergence of the results. The convergence condition is that the difference between the new boundary values $X'_L, X'_R$ and the old boundary values $X_L, X_R$ is less than ε. The training results are considered representative once the convergence conditions are met. As shown in Algorithm 1, training stops as soon as a training subset satisfying these conditions is found, with ε set to 0.01 and k set to 3.
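Algorithm 1 itself is not reproduced in this text, so the following is a hypothetical sketch of the loop it describes. The assumptions are as follows. The data are min-max normalized so that ε = 0.01 is scale-free. Bins come from NumPy's built-in `bins="sturges"` estimator. MAE is taken as the mean absolute change of the two 3σ boundaries. Ten samples are added per iteration, as in the Experimental Results section. The function names are illustrative.

```python
import numpy as np
from scipy import stats


def rmse_hist_vs_normal(x):
    """RMSE between the histogram density of x and a fitted normal PDF at the bin centres."""
    density, edges = np.histogram(x, bins="sturges", density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    fitted = stats.norm.pdf(centres, loc=x.mean(), scale=x.std(ddof=1))
    return float(np.sqrt(np.mean((density - fitted) ** 2)))


def train_until_converged(x, batch=10, eps=0.01, k=3.0):
    """Grow the training subset by `batch` samples per iteration until
    |R' - R| < eps and the mean absolute change of the k-sigma boundaries < eps.

    Returns (samples_used, (lower, upper)) in normalised units, or None if data run out.
    """
    x = np.asarray(x, dtype=float)
    x = (x - x.min()) / (x.max() - x.min())  # ASSUMPTION: min-max normalisation
    prev_rmse, prev_bounds = None, None
    for n in range(batch, x.size + 1, batch):
        subset = x[:n]
        mu, sigma = subset.mean(), subset.std(ddof=1)
        rmse = rmse_hist_vs_normal(subset)
        bounds = (mu - k * sigma, mu + k * sigma)
        if prev_rmse is not None:
            mae = 0.5 * (abs(bounds[0] - prev_bounds[0]) + abs(bounds[1] - prev_bounds[1]))
            if abs(rmse - prev_rmse) < eps and mae < eps:
                return n, bounds
        prev_rmse, prev_bounds = rmse, bounds
    return None
```

The order in which samples are added affects when the loop stops, so shuffling the data before training is a sensible default.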

Experimental Results
Thirteen hemodynamic parameters are calculated from the PW spectrum, and each parameter is the mean of the per-cycle measurements. These items include ACCL (Acceleration), ACCT … Our database consists of 229 common carotid artery blood flow samples from individuals recruited randomly from a university, a community and Hi-tech South China Hospital. To ensure the randomness of the sample, subjects of different ages and sexes were included. The sampling location was 2 to 3 cm from the bifurcation of the common carotid artery, the angle of incidence was controlled at 60°, the sampling gate was placed in the center of the vessel, and each recording covered ten cardiac cycles. To verify the validity of the algorithm, we selected blood flow parameters from 50 patients with vascular disease at the hospital as negative samples.
First, we calculate the probability density function and then fit the histogram using the formulas in the previous section, as shown in Figure 4. To verify the validity of our algorithm, the data are added in batches during training, and the algorithm is considered valid if RMSE and MAE converge gradually. We record RMSE and MAE at each iteration to obtain how the two metrics change with the number of iterations.
As the amount of data increases, the bins and the corresponding RMSE and MAE are recalculated for every additional ten samples. Figure 5 shows how RMSE changes with the amount of raw data. Note that the results are normalized. RMSE decreases as data are added, meaning that the fitted curves match the histograms more closely and that the data distribution is consistent with our hypothesis. Figure 6 (a) shows that the change in RMSE tends to zero as the number of iterations increases, indicating that training has converged. Figure 6 (b) shows that MAE fluctuates within a very small range as data are added, indicating that the training results are not random and are reliable.
Figure 6. (a) Absolute difference between the RMSE values of two consecutive iterations; (b) MAE versus the number of iterations, with ten samples added per iteration.

Comparison of Methods
Our method is compared with two traditional statistical methods, one-class SVM and K-means. Both one-class SVM [22,23] and K-means [17] are unsupervised methods. We use LIBSVM, developed by Prof. Chih-Jen Lin and colleagues at National Taiwan University, to implement the one-class SVM. A Gaussian kernel is chosen, and the predetermined negative sample ratio is set to 0.01. The number of K-means clusters is set to 2. The K-means result depends on the randomly selected initial centroids, so we run each calculation ten times and take the average.
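The two baselines can be approximated with scikit-learn, whose `OneClassSVM` is built on LIBSVM. Mapping the "predetermined negative sample ratio" to the `nu` parameter is an assumption, and so are the helper names. For two clusters, accuracy is computed under both possible cluster-to-label assignments because k-means cluster ids are arbitrary.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.svm import OneClassSVM


def one_class_svm_flags(train, test, nu=0.01):
    """Fit a Gaussian-kernel one-class SVM on healthy data; True marks predicted outliers."""
    model = OneClassSVM(kernel="rbf", nu=nu, gamma="scale").fit(train)
    return model.predict(test) == -1  # predict() returns +1 (inlier) or -1 (outlier)


def kmeans_mean_accuracy(data, y_true, n_runs=10):
    """Average 2-cluster k-means accuracy over several random initialisations."""
    y_true = np.asarray(y_true)
    accs = []
    for seed in range(n_runs):
        labels = KMeans(n_clusters=2, n_init=1, random_state=seed).fit_predict(data)
        acc = np.mean(labels == y_true)
        accs.append(max(acc, 1.0 - acc))  # cluster ids are arbitrary: try both mappings
    return float(np.mean(accs))
```

Setting `n_init=1` and varying `random_state` reproduces the "run ten times and average" protocol rather than letting scikit-learn keep only the best of several initialisations.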
After obtaining the training model, we tested whether our method can correctly screen out abnormal individuals. We collected hemodynamic parameters from 50 patients with carotid disease in the hospital and used our method to determine whether they were abnormal. Of these patients, 49 show abnormalities in one or more parameters. The original dataset consists of samples from healthy people, each containing 26 parameters, and 10-fold cross-validation is used to select the optimal parameters. After obtaining the optimal training model, we compare the test accuracy of the three methods. The test set is made … The overall accuracy, true positive rate (sensitivity), and true negative rate (specificity) [24] of the test set are computed as $\text{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}$, $\text{Sensitivity} = \frac{TP}{TP+FN}$ and $\text{Specificity} = \frac{TN}{TN+FP}$. As Table 1 shows, the overall accuracy of our method is 93.14%, 20% higher than the other two clustering methods, K-means and one-class SVM. The true positive rate is 98%, and the true negative rate reaches 89.79%. All indicators are higher than those of the other two methods, and the comparison shows that our method classifies more accurately and performs better.
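The three metrics can be computed from true and predicted labels as in the sketch below. The sketch assumes abnormal samples are coded as 1 and treated as the positive class. The helper name `screening_metrics` is hypothetical.

```python
import numpy as np


def screening_metrics(y_true, y_pred):
    """Accuracy, sensitivity (true positive rate) and specificity (true negative rate).

    ASSUMPTION: 1 = abnormal (positive class), 0 = normal (negative class).
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    return {
        "accuracy": float((tp + tn) / y_true.size),
        "sensitivity": float(tp / (tp + fn)),
        "specificity": float(tn / (tn + fp)),
    }
```

For instance, with 3 abnormal and 4 normal samples, of which 2 abnormal and 3 normal are classified correctly, the function returns an accuracy of 5/7, a sensitivity of 2/3 and a specificity of 3/4.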
This paper also compares our method with the clinical methods currently used to diagnose the degree of stenosis, focusing on the accuracy, sensitivity and specificity of PSV diagnostic thresholds. The compared papers proposed different diagnostic criteria, such as 170 cm/s [6], 260 cm/s [25], 200 cm/s [26], and 130 cm/s [27]. We calculated the accuracy, sensitivity and specificity over all samples according to these criteria. Table 2 shows the evaluation of PSV abnormalities for all collected samples, both normal and abnormal. The PSV thresholds differ between papers because the collected samples vary with the ultrasound equipment, the detection region and the operating technique. Over-dependence on a fixed ultrasound system can greatly reduce diagnostic accuracy and even lead to misdiagnosis. In contrast, the proposed method focuses on the data distribution, identifies possible disease through rapid screening, and is not tied to a fixed or single system. These advantages make the proposed method suitable for rapid preliminary screening of a variety of diseases.
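The following sketch shows how a single PSV cut-off can be scored against labelled samples. The cut-off values come from the text. The strict ">" comparison and the function name are assumptions. The example data are illustrative and are not the study's samples.

```python
import numpy as np

# PSV cut-offs in cm/s quoted above; the original papers' exact rules may differ.
PSV_CRITERIA_CM_S = {"[6]": 170.0, "[25]": 260.0, "[26]": 200.0, "[27]": 130.0}


def evaluate_psv_cutoff(psv_cm_s, is_abnormal, cutoff):
    """Flag PSV above the cut-off as abnormal and score the rule against the labels."""
    pred = np.asarray(psv_cm_s, dtype=float) > cutoff  # ASSUMPTION: strict comparison
    truth = np.asarray(is_abnormal, dtype=bool)
    tp = np.sum(pred & truth)
    tn = np.sum(~pred & ~truth)
    return {
        "accuracy": float((tp + tn) / truth.size),
        "sensitivity": float(tp / truth.sum()),
        "specificity": float(tn / (~truth).sum()),
    }
```

Looping over `PSV_CRITERIA_CM_S` with the same samples shows how strongly a fixed threshold trades sensitivity against specificity. This is the dependence on the originating system that the distribution-based method tries to avoid.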

Conclusions
Hemodynamic parameters can be used to diagnose vascular disease, but current methods adapt poorly because they are influenced by the detection region, the operating technique and the ultrasound equipment. This paper presents a method for rapid screening of vascular diseases based on hemodynamic parameters, which identifies possible disease from the data distribution. The experimental results show that the proposed method is effective for rapid screening of vascular diseases, highly accurate, and more adaptable than traditional methods. Because it reflects the characteristics of the data itself, it yields a low misdiagnosis rate. First, it is insensitive to system noise, so different instruments give the same results. Second, training does not require a large number of negative samples. Consequently, data are easy to collect, training converges quickly, and the results become more accurate as more data are added. Such a method could therefore help clinicians with rapid disease screening. In the clinical treatment of cardiovascular disease, the earlier a disease is detected, the more likely it is to be cured, and our rapid screening method can prompt patients to seek further medical examination and consultation.