A machine-learning approach to setting optimal thresholds and its application in rolling bearing fault diagnosis

Bearings are critical components of mechanical equipment. They are a leading cause of equipment faults, and their health status directly affects overall equipment performance. Effective bearing fault diagnosis is therefore essential: it helps maintain equipment stability and increases economic benefits by enabling timely maintenance. Most current studies focus on extracting fault features, and little attention has been paid to establishing fault thresholds. As a result, these features are difficult to use for automatic monitoring and diagnosis on intelligent devices. This study used generalized fractal dimensions to extract features from the time-domain vibration signals of bearings. An optimal fault threshold model, which serves as the baseline for anomaly judgment, was developed using the receiver operating characteristic curve. The fault threshold model was verified using two bearing run-to-failure experiments. Although the damaged positions and components differed between the two experiments, the proposed method produced the same fault threshold model in both and effectively detected the abnormal states in the signals. This finding confirms the effectiveness of the proposed diagnostic method.


Introduction
Bearings connect components and withstand external forces in mechanical equipment. They are among the most widely used core parts and components. In high-speed, heavy-load working environments, the outer and inner races and the ball surfaces of bearings commonly suffer corrosion, spalling, and failure. Statistically, a significant portion of equipment malfunctions, and of the substantial losses that follow, is attributable to faults in bearing assemblies [1]. Researchers and enterprises in related fields therefore continue to search for new, cost-efficient solutions to these challenges.
Bearing fault diagnosis revolves around two key points: (1) extracting the optimal qualitative fault features from time-domain vibration signals and (2) developing a better diagnostic model for fault identification. Many studies have addressed feature extraction for bearing fault diagnosis [2–7]. Time-frequency features are among the most mature analysis methods. However, traditional time-frequency analysis has limitations when processing the non-stationary or nonlinear signals produced by bearing faults, which limits its ability to reveal defects. Fractal theory is widely applied across many domains [8–15]. Non-stationary vibration signals exhibit fractal features, and these features reflect the dynamic mechanisms associated with different states of nonlinear systems. They are regarded as an effective tool for mapping the complexity of the spatial distribution of defects and for extracting fault features [16, 17]. Fractal analysis of signal features applies regardless of whether the signals are stationary and linear, whereas traditional feature extraction methods impose multiple prerequisites on signal processing, such as a consistent sampling rate, minimal background noise, linear behavior within a specified frequency range, and the absence of abrupt transient changes. Traditional single-fractal analysis captures only overall features and cannot describe local features. In contrast, multifractal analysis describes intrinsic fractal features at different local scales and has therefore seen increasing use among researchers [18]. Several scholars have applied fractal theory to fault diagnosis [19–21], and the multifractal spectrum and multiple dimensions (box-counting, information, and correlation dimensions) have proven effective in obtaining valuable qualitative features. However, most existing studies focus on coarse fault classification, and no published study has yet addressed quantitative diagnosis based on fractal dimensions.
The fault process over a bearing's life cycle can be roughly divided into three stages: the normal operation stage, the initial damage stage, and the serious fault stage. During use, bearing performance deteriorates because of friction between components, chemical corrosion, and fatigue induced by periodic stress. When damage occurs, the feature index value increases or decreases over time. Distinguishing the normal state from the initial damage state can be challenging, so the most critical part of diagnosis is analyzing the feature index to determine when initial damage occurs [22, 23]. Detecting bearing failure early provides significant economic benefits because it allows the production line to schedule maintenance in advance. However, research on early bearing failure diagnosis is still at an early stage, and most existing fault diagnosis methods are based on machine learning. Wang et al [24] combined time-frequency charts, original time-domain signals, and frequency-domain signals obtained with the short-time Fourier transform and the discrete wavelet transform (DWT) with multiple feature parameters, and performed fault diagnosis with convolutional neural networks. Cao et al [25] proposed an enhanced algorithm based on local binary patterns that extracted features at different scales using the improved algorithm and DWT; these features were combined as the input of a broad learning system, which diagnosed the faults. Wang et al [26] used generalized composite multiscale weighted permutation entropy to extract fault features, a method that requires four parameters to be set manually to obtain effective features. The dimensionality of the resulting complex features was reduced with supervised Isomap (S-Iso), and the reduced features were fed to a support vector machine for diagnosis. Successful fault diagnosis depends on preliminary signal processing and feature extraction methods. Most existing fault diagnosis methods are based on machine learning and deep learning. Although diagnostic techniques built on these foundations are effective, their precision depends on strict adherence to specific parameters during feature extraction. Many methods combine several feature indices, which often masks subtle deviations in the primary feature and complicates early damage detection. A consistent gap in these approaches is the absence of a definitive statistical threshold, which is crucial for condition-based maintenance. Moreover, although machine learning models offer advanced diagnostic capabilities, their heavy computational requirements and long processing times often confine them to research rather than practical, real-world applications. Because the smooth operation of mechanical systems relies heavily on the condition of the bearings, there is a pressing need for a method that can promptly identify early signs of bearing wear or damage.
This study proposes using generalized fractal dimensions (GFDs) to extract qualitative fault features and a receiver operating characteristic (ROC) curve to set thresholds, which reduces computational cost and facilitates on-site real-time diagnosis. The GFD algorithm can extract the fractal features hidden in nonlinear vibration signals. GFDs quantify signal complexity: the more complex the signal, the higher the GFD value. The health status of a bearing can therefore be assessed by analyzing the GFD values of its signals. When a bearing is damaged, its vibration signal changes and its complexity differs from that of the normal state, so the qualitative features of normal and fault states can be identified using GFDs. The ROC curve method establishes a fault threshold that separates the GFD values of normal and abnormal states. This threshold is the key reference for detecting damage and for identifying the optimal fault threshold model. The methodology thus contributes to the quantitative diagnosis of initial bearing failures.
The rest of this paper is organized as follows. Section 2 introduces the use of GFDs for the qualitative analysis of bearing fault features and the use of ROC analysis to set thresholds on these features, and discusses how the two methods are combined for bearing fault diagnosis. Sections 3 and 4 verify the feasibility of the proposed method using signals from two groups of actual experiments. Section 5 presents the conclusions.

Method
This section describes the proposed method for diagnosing faults in rolling bearings. Section 2.1 explains how the GFDs are calculated. Section 2.2 describes how the ROC curve is used to quantify fault criteria based on the GFD results. The last subsection describes the overall procedure of the proposed method.

Fractal theory
Fractal theory is a popular mathematical theory that provides a relatively novel approach to analyzing non-stationary and nonlinear signals. It describes the features of objects mathematically through fractal dimensions and, compared with traditional theories, describes complexity more precisely. This is particularly valuable for the vibration signals of bearings in complex mechanical equipment, which are influenced by many factors and are often non-stationary or nonlinear. Fractal theory is highly effective at discerning the transition from normal to abnormal states, which makes it well suited to analyzing bearing vibration signals.

Definition of fractal dimensions
Fractal theory introduces fractal dimensions as measures of complexity; they represent the degree to which an object fills space. The German mathematician Felix Hausdorff introduced the concept of non-integer dimension, which underlies the most familiar definition of fractal dimension. Its fundamental equation is

$$N = r^{D}, \qquad D = \frac{\ln N}{\ln r} \tag{1}$$

where r is the scaling factor, N is the total number of pieces at scale r needed to cover the set, and D is the fractal dimension. This equation applies to regular, self-similar shapes. For irregular shapes such as vibration signals, the box-counting dimension is widely used. The method works on a two-dimensional plane: the plane is subdivided into square grids of equal size, and the number of grids occupied by the pattern is counted. With grid side length ϵ, the number of grids needed to cover the pattern is denoted N(ϵ), and the area of the pattern is approximated as N(ϵ) × ϵ². When ϵ is large, information finer than the grid size may be lost; conversely, as ϵ becomes small, the measured dimension becomes more accurate. The box-counting dimension is therefore defined as

$$D_0 = \lim_{\epsilon \to 0} \frac{\ln N(\epsilon)}{\ln (1/\epsilon)} \tag{2}$$

A single fractal dimension cannot fully capture the fractal features of a complex object, which limits feature extraction. The fractal dimensions should therefore be calculated using the GFDs of multifractal theory. Building on the box-counting dimension, the multifractal method counts how the signal is distributed among the grids, giving a comprehensive view of the signal distribution. In a simple prototype, the signal is placed in the grids and a weight (probability) is assigned to each grid. The scaling of the probability in each grid is described by α, the Hölder exponent [27, 28]. By analogy with the definition N = r^D, the probability of the i-th grid can be written as

$$p_i(\epsilon) \propto \epsilon^{\alpha_i}$$

A single α describes only the average (homogeneous) case; for a heterogeneous distribution, different Hölder exponents are required for an accurate description. To facilitate this description, this study defines the function X(q, ϵ) in equations (3) and (4):

$$X(q,\epsilon) = \sum_{i=1}^{N(\epsilon)} p_i(\epsilon)^{q} \propto \epsilon^{\alpha(q)} \tag{3}$$

$$p_i(\epsilon) = \frac{m_i(\epsilon)}{M(\epsilon)}, \qquad M(\epsilon) = \sum_{i=1}^{N(\epsilon)} m_i(\epsilon) \tag{4}$$

where the signal set is subdivided into N(ϵ) grids of side length ϵ, m_i(ϵ) is the mass of the i-th grid, M(ϵ) is the total mass of all grids, and q is an arbitrary real number (−∞ < q < ∞). In 1959, Rényi defined the dimension associated with the generalized entropy and the scale index q [29].
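As an illustration of equation (2), the minimal Python sketch below (NumPy assumed) estimates a box-counting dimension by counting occupied boxes at the scales ϵ = 3^−j and fitting the slope of ln N(ϵ) against ln(1/ϵ). The middle-thirds Cantor set is used only as a hypothetical test object with a known dimension, ln 2/ln 3 ≈ 0.6309; it is not data from this study, and the function names are illustrative.

```python
import numpy as np

def cantor_points(level: int) -> np.ndarray:
    """Integer left endpoints of the middle-thirds Cantor intervals at a given level,
    in units of 3**-level (base-3 digits restricted to 0 and 2)."""
    pts = np.array([0], dtype=np.int64)
    for _ in range(level):
        # each interval keeps its left third (digit 0) and its right third (digit 2)
        pts = np.concatenate([3 * pts, 3 * pts + 2])
    return pts

def box_counting_dimension(points: np.ndarray, level: int, scales) -> float:
    """Least-squares slope of ln N(eps) versus ln(1/eps) with eps = 3**-j.
    A point at integer position k lies in box k // 3**(level - j)."""
    log_inv_eps, log_n = [], []
    for j in scales:
        boxes = np.unique(points // 3 ** (level - j))   # occupied boxes of side 3**-j
        log_inv_eps.append(j * np.log(3.0))              # ln(1/eps)
        log_n.append(np.log(len(boxes)))                 # ln N(eps)
    slope, _ = np.polyfit(log_inv_eps, log_n, 1)
    return float(slope)

level = 10
d = box_counting_dimension(cantor_points(level), level, range(1, level + 1))
print(round(d, 4))  # 0.6309
```

Fitting a slope over several scales is a common practical stand-in for the limit ϵ → 0, which cannot be taken on finite data.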
Following Rényi's definition, the GFDs are given by [30]

$$D_q = \lim_{\epsilon \to 0} \frac{K(q)}{\ln(1/\epsilon)}, \qquad K(q) = \frac{1}{1-q} \ln \sum_{i=1}^{N(\epsilon)} p_i(\epsilon)^{q}$$

where D_q is the GFD of order q and K(q) is the generalized entropy (for q = 1, K(1) = −Σ p_i ln p_i is taken as the limit). The meaning of the dimension varies with q. When q = 0, D_q is the box-counting dimension, which reflects only self-similarity.
In this case, no weighting is involved. When q = 1, D_q is the information dimension, and when q = 2, it is the correlation dimension. Larger values of q require higher computational cost, and as the scale index increases, the change in the computational cost of diagnosis becomes apparent [30]. This study therefore used q = 3, which yielded prominent diagnostic features.
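To show how D_q changes with q, the sketch below (NumPy assumed) evaluates the Rényi form of D_q from box probabilities at several scales. It uses a binomial multiplicative cascade, a standard multifractal test measure chosen here purely for illustration because its D_q has the closed form log2(p^q + (1 − p)^q)/(1 − q); the function names are hypothetical and the measure is not bearing data.

```python
import numpy as np

def binomial_cascade(p: float, levels: int) -> np.ndarray:
    """Box probabilities of a binomial cascade on 2**levels boxes (illustrative measure).
    Each box gives a fraction p of its mass to its left child and 1 - p to its right child."""
    probs = np.array([1.0])
    for _ in range(levels):
        probs = np.stack([probs * p, probs * (1.0 - p)], axis=1).ravel()  # children stay adjacent
    return probs

def generalized_dimension(fine: np.ndarray, q: float, scales) -> float:
    """D_q = lim K(q) / ln(1/eps), estimated by a least-squares slope over eps = 2**-j."""
    log_eps, log_sum = [], []
    for j in scales:
        p = fine.reshape(2 ** j, -1).sum(axis=1)   # coarse-grain to 2**j boxes of side 2**-j
        p = p[p > 0]
        log_eps.append(-j * np.log(2.0))
        if np.isclose(q, 1.0):
            log_sum.append(np.sum(p * np.log(p)))    # q -> 1 limit (information dimension)
        else:
            log_sum.append(np.log(np.sum(p ** q)))   # ln X(q, eps)
    slope, _ = np.polyfit(log_eps, log_sum, 1)
    return float(slope if np.isclose(q, 1.0) else slope / (q - 1.0))

def exact_dq(p: float, q: float) -> float:
    """Closed-form D_q of the binomial cascade, used to check the estimate."""
    if np.isclose(q, 1.0):
        return float(-(p * np.log2(p) + (1 - p) * np.log2(1 - p)))
    return float(np.log2(p ** q + (1 - p) ** q) / (1.0 - q))

levels, p = 12, 0.3
mu = binomial_cascade(p, levels)
for q in (0, 1, 2, 3):
    print(q, round(generalized_dimension(mu, q, range(1, levels + 1)), 4), round(exact_dq(p, q), 4))
```

For this measure D_0 = 1 and D_q decreases as q grows, which is the kind of q-dependence that makes the choice of q (q = 3 in this study) matter.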

Method using vibration signals to calculate GFDs
Consider a signal {X(i), i = 1, 2, 3, . . ., M} distributed along a time series, as shown in figure 1. The horizontal axis represents time, the vertical axis represents signal magnitude, and ϵ is the mesh width. The mesh in the m-th row and n-th column is denoted mesh mn; its coordinates are shown in figure 2. The calculation considers the distribution of signal magnitude, that is, the number of mesh rows occupied by the signal in mesh mn. For each interval i ∈ [mϵ, (m + 1)ϵ], max[X(i)] and min[X(i)] are taken, their difference is divided by the mesh width ϵ, and the integer part gives the number of meshes covered by the signal. Summing over all intervals gives N(ϵ) [30]:

$$N(\epsilon) = \sum_{m} \operatorname{int}\!\left[\frac{\max_{i \in [m\epsilon,(m+1)\epsilon]} X(i) - \min_{i \in [m\epsilon,(m+1)\epsilon]} X(i)}{\epsilon}\right]$$

The number of signal points covering mesh mn is denoted f_mn, and the probability that the signal is covered by mesh mn is

$$p_{mn} = \frac{f_{mn}}{\sum_{m}\sum_{n} f_{mn}}$$

The GFD value can then be obtained using equations (6) and (7).
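The mesh procedure can be sketched in Python as follows. This is one possible reading of the description, not the authors' implementation: the rescaling of the amplitude to [0, M], the set of mesh widths, and taking f_mn as the number of samples inside mesh mn are all assumptions, and signal_gfd is a hypothetical name. D_q is taken from the slope of ln Σ p_mn^q against ln ϵ, consistent with the Rényi definition.

```python
import numpy as np

def signal_gfd(x, q=3.0, widths=(2, 4, 8, 16, 32)):
    """Hypothetical mesh-based D_q of a 1-D signal (a sketch, not the authors' code).

    Assumptions: amplitudes are rescaled to [0, M] with M = len(x) so that an
    eps-by-eps mesh is square, eps is measured in samples, and f_mn is the number
    of samples that fall in mesh (m, n).
    """
    x = np.asarray(x, dtype=float)
    m_total = len(x)
    span = x.max() - x.min()
    if span == 0:
        raise ValueError("constant signal: amplitude span is zero")
    y = (x - x.min()) / span * m_total           # amplitude on the same scale as time
    t = np.arange(m_total)
    log_eps, log_sum = [], []
    for eps in widths:
        n_rows = int(np.ceil(m_total / eps))
        col = t // eps                                         # time column m
        row = np.minimum((y // eps).astype(int), n_rows - 1)   # amplitude row n (top edge folded in)
        f = np.bincount(col * n_rows + row)                    # f_mn for every mesh
        p = f[f > 0] / f.sum()                                 # p_mn = f_mn / sum of f_mn
        log_eps.append(np.log(eps))
        if np.isclose(q, 1.0):
            log_sum.append(np.sum(p * np.log(p)))              # q -> 1 limit
        else:
            log_sum.append(np.log(np.sum(p ** q)))
    slope, _ = np.polyfit(log_eps, log_sum, 1)
    return float(slope if np.isclose(q, 1.0) else slope / (q - 1.0))

ramp = np.arange(1024, dtype=float)
print(round(signal_gfd(ramp), 6))  # 1.0
```

A linear ramp fills exactly one mesh per column at every scale, so its estimate is 1. On real vibration signals the result depends on the amplitude normalization and the range of ϵ, which would need to be matched to the original method.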

ROC
The ROC curve is a graphical tool for representing classification performance. This quantitative method supports precise assessments and decisions in binary classification problems where the two classes may be confused. It also provides objective, neutral recommendations that do not depend on the costs or benefits assigned to each outcome. The ROC curve plays a crucial role in identifying the optimal fault threshold model for detecting bearing defects because it shows classification performance across a range of threshold values. It quantifies sensitivity and specificity, giving a balanced evaluation, particularly for imbalanced datasets. Compared with alternative diagnostic methods, ROC analysis is distinguished by remaining consistent when the class distribution shifts. It also offers an intuitive visual depiction of trade-offs, which simplifies identifying the threshold that meets specific operational objectives.

Binary classification model
ROC curve analysis is based on a binary classification model with only two possible output classes. In this study, bearing diagnosis uses a normal/damaged classification as the decision criterion for the ROC analysis. A threshold separates normal from damaged bearings: for instance, if the output of a test signal group exceeds the threshold, the group is identified as 'damaged.' A binary classification problem therefore has four possible outcomes: (1) true damage (TD), diagnosed as damaged and actually damaged; (2) false damage (FD), diagnosed as damaged but actually normal; (3) true normal (TN), diagnosed as normal and actually normal; and (4) false normal (FN), diagnosed as normal but actually damaged. For any given threshold, each signal group in the signal set falls into one of these four outcomes, which gives the binary confusion matrix of the model at that threshold, as shown in table 1.
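A minimal Python sketch of the confusion matrix in table 1, assuming each signal group has already been reduced to a single score and that 'damaged' means score > threshold. The scores are toy values, not measurements, and Confusion and confusion_at are hypothetical names. The two calls reproduce the effect of lowering the threshold from 0.8 to 0.7 described in the ROC space subsection.

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class Confusion:
    td: int  # true damage: diagnosed damaged, actually damaged
    fd: int  # false damage: diagnosed damaged, actually normal
    tn: int  # true normal: diagnosed normal, actually normal
    fn: int  # false normal: diagnosed normal, actually damaged

def confusion_at(scores, is_damaged, threshold: float) -> Confusion:
    """Binary confusion matrix: a group is diagnosed 'damaged' when its score exceeds the threshold."""
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(is_damaged, dtype=bool)
    pred = scores > threshold
    return Confusion(
        td=int(np.sum(pred & truth)),
        fd=int(np.sum(pred & ~truth)),
        tn=int(np.sum(~pred & ~truth)),
        fn=int(np.sum(~pred & truth)),
    )

scores = [0.2, 0.75, 0.9, 0.72, 0.85]          # toy group scores
truth = [False, False, True, True, True]       # actual condition of each group
print(confusion_at(scores, truth, 0.8))  # Confusion(td=2, fd=0, tn=2, fn=1)
print(confusion_at(scores, truth, 0.7))  # Confusion(td=3, fd=1, tn=1, fn=0)
```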

ROC space
During quantification, detection indexes are used to determine the threshold, that is, the cut-off point that separates the normal signal groups from the abnormal signal groups, as shown in figure 3. The true damage rate (TDR) is the proportion of actually damaged signal groups that are correctly diagnosed as damaged at a given cut-off point:

$$\mathrm{TDR} = \frac{TD}{TD + FN}$$

The false damage rate (FDR) is the proportion of actually normal signal groups that are wrongly diagnosed as damaged at a given cut-off point:

$$\mathrm{FDR} = \frac{FD}{FD + TN}$$

The ROC curve describes the relationship between TDR and FDR, as shown in figure 4. When building a diagnostic model, multiple thresholds are set over a continuous variable, and the TDR and FDR are computed for each. The TDR for each threshold is plotted as the vertical coordinate and the FDR as the horizontal coordinate, and connecting all the points forms the ROC curve. Another index often used in diagnosis is the true normal rate (TNR), the proportion of actually normal signal groups that are correctly diagnosed as normal at a given cut-off point. Its relationship with FDR is given in equation (12):

$$\mathrm{TNR} = \frac{TN}{FD + TN} = 1 - \mathrm{FDR} \tag{12}$$

For example, with a threshold of 0.8, signal groups with values above 0.8 are diagnosed as 'damaged' and those below it as 'normal.' Lowering the threshold to 0.7 identifies more 'damaged' signal groups and thus increases the TDR; however, the FDR also increases because more 'normal' signal groups are misdiagnosed as damaged. The ROC curve is a valuable way to visualize these changes when assessing classification problems. The location and shape of the ROC curve in the coordinate system determine the inspection accuracy. In figure 4, if the points fall on the diagonal AE from (0, 0) to (1, 1), the accuracy is around 50%: normal and damaged signal groups are then distinguished purely by chance, so the classification is a random prediction with no diagnostic value and its judgments are meaningless. The diagonal AE is the no-discrimination line, dividing the ROC space into an upper-left region, which represents good classification, and a lower-right region, which represents the opposite. When the points fall on curve ADE, the detection signals overlap to some extent, resulting in a certain error rate. Along the straight line AC the FDR is 0, and along the straight line CE the TDR is 1; when the ROC curve follows the path ACE, the normal and damaged signal groups do not overlap, giving the highest possible detection accuracy. Point C, at the upper-left corner with coordinates (0, 1), represents perfect prediction: an FDR of 0 means there are no false damages, and a TDR of 1 means there are no false normals. In other words, whether the model outputs 'normal' or 'damaged,' the accuracy is 100%. The closer a point is to the upper-left corner of the ROC space, the smaller the overlap between the signal groups and the higher the detection accuracy. The ROC curve can therefore be used to determine the optimal diagnostic threshold, and under this threshold the overall accuracy (ACC) serves as the reference for assessing accuracy.
The ACC is the ratio of correctly diagnosed cases to the total sample count:

$$\mathrm{ACC} = \frac{TD + TN}{TD + FD + TN + FN} \tag{13}$$

where TD + FD + TN + FN is the total number of predicted samples. The performance of different classification models can be compared based on the locations of their points in the ROC space and their ACC values.
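The rate definitions and equations (12) and (13) can be checked with a few lines of Python. The helper name rates is hypothetical, and both classes must be present in the data so that neither denominator is zero.

```python
def rates(td: int, fd: int, tn: int, fn: int) -> dict:
    """TDR, FDR, TNR and ACC from confusion-matrix counts (both classes must be present)."""
    if td + fn == 0 or fd + tn == 0:
        raise ValueError("need at least one damaged and one normal group")
    tdr = td / (td + fn)                     # true damage rate
    fdr = fd / (fd + tn)                     # false damage rate
    tnr = tn / (fd + tn)                     # true normal rate, equal to 1 - FDR (eq. 12)
    acc = (td + tn) / (td + fd + tn + fn)    # overall accuracy (eq. 13)
    return {"TDR": tdr, "FDR": fdr, "TNR": tnr, "ACC": acc}

print(rates(td=3, fd=1, tn=1, fn=0))
# {'TDR': 1.0, 'FDR': 0.5, 'TNR': 0.5, 'ACC': 0.8}
```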

Optimum threshold computation
As outlined in the preceding sections, identifying the optimal classification model and threshold with the ROC curve involves four key steps: (1) determining the range of the index under the two classes, which covers all possible diagnostic thresholds; (2) establishing the decision criteria and labeling the data; (3) calculating the TDR and FDR for every possible threshold and using them as the vertical and horizontal coordinates of the ROC curve; and (4) evaluating the indexes comprehensively to choose the optimal model and threshold. Figure 5 shows the implementation, and the key steps are detailed as follows.

(1) Upper and lower limits of the threshold and numeric interval. The upper and lower limits of the threshold are set to the maximum and minimum values of the GFD damage feature index, respectively. The threshold interval is set to 0.01 to locate the optimal threshold precisely.

(2) Establishment of decision criteria and data labeling. Precise judgment criteria must be established before ROC model training. This is particularly challenging because the experimental data trace the life cycle of a bearing from normal operation to eventual damage. This study uses the GFD damage feature index to define the criteria for damage assessment. Theoretically, GFDs indicate system complexity: an increase in GFD values reflects a corresponding rise in complexity, suggesting that the vibration signals after initial bearing damage are considerably more intricate than those during normal operation. Damage progression can generally be divided into three stages: (1) a completely normal stage, (2) an initial damage stage, and (3) a transition from the damaged state to complete malfunction. Because the ROC method resolves binary classification dilemmas with a diagnostic threshold grounded in statistical decision theory, this study places the decision criteria at the boundary between normal operation in Stage 1 and initial damage in Stage 2. To this end, every data point within Stage 2 was considered as a candidate benchmark for damage assessment: if the selected point is at position N within this stage, all data points following N are tentatively labeled 'damaged' relative to that reference point. ROC analysis is then used to identify the optimal damage criteria. To enlarge the training dataset, the data were divided into subsets of 12 points each, with consecutive subsets overlapping by 50%; for example, the first subset contains points 1 to 12 and the second contains points 7 to 18. If more than Y% of the GFD values within a subset exceed threshold X, the subset is classified as 'damaged,' where X is the damage threshold and Y is the classification benchmark. This decision process is shown in figure 6. This study applies three diagnostic criteria within the ROC model, as listed in table 2: the 30%, 60%, and 90% classification models. Each criterion is paired with a threshold X, and a subset is diagnosed as 'damaged' if the number of values exceeding X is more than the specified percentage (30%, 60%, or 90%) of its data points.

(3) Calculation of TDR and FDR. Each threshold corresponds to one point in the ROC space, with FDR on the horizontal axis and TDR on the vertical axis; connecting all the points forms an ROC curve. Each criterion yields its own ROC curve.

(4) Comprehensive evaluation of the indexes and selection of the optimal classification model and threshold.
ROC analysis offers several indexes that can serve as reference criteria for selecting the optimal classification model. This study uses the point closest to the upper-left corner, that is, the point with the minimum distance to (0, 1), as the main reference indicator for the optimal classification model.
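Steps (1) to (4) can be sketched in Python as below (NumPy assumed). The function names, the synthetic GFD sequence, and the rule that a subset counts as truly 'damaged' when it starts at or after the reference point N are illustrative assumptions. The 12-point subsets with 50% overlap, the 0.01 threshold interval between the minimum and maximum GFD, the Y% rule, and the minimum distance to (0, 1) follow the text.

```python
import numpy as np

SIZE, STEP = 12, 6   # 12-point subsets with 50% overlap (points 1-12, 7-18, ...)

def roc_sweep(gfd, n_ref, y, interval=0.01):
    """Return (threshold, FDR, TDR) for every threshold between the min and max GFD.
    y is the classification benchmark as a fraction (0.3, 0.6 or 0.9)."""
    gfd = np.asarray(gfd, dtype=float)
    starts = np.arange(0, len(gfd) - SIZE + 1, STEP)
    groups = np.stack([gfd[s:s + SIZE] for s in starts])
    truth = starts >= n_ref   # assumption: a subset is 'damaged' if it starts at or after N
    if truth.all() or not truth.any():
        raise ValueError("n_ref must leave both normal and damaged subsets")
    points = []
    for x in np.arange(gfd.min(), gfd.max() + interval / 2, interval):   # step (1)
        pred = (groups > x).mean(axis=1) > y                             # step (2): Y% rule
        td, fn = np.sum(pred & truth), np.sum(~pred & truth)
        fd, tn = np.sum(pred & ~truth), np.sum(~pred & ~truth)
        points.append((float(x), fd / (fd + tn), td / (td + fn)))       # step (3)
    return points

def distance_to_corner(point):
    """Euclidean distance from an ROC point (threshold, FDR, TDR) to (0, 1)."""
    _, fdr, tdr = point
    return float(np.hypot(fdr, 1.0 - tdr))

def search(gfd, stage2_positions, y):
    """Step (4): try each candidate reference point N in Stage 2 and keep the ROC point
    closest to (0, 1). Returns (distance, N, threshold)."""
    best = None
    for n_ref in stage2_positions:
        point = min(roc_sweep(gfd, n_ref, y), key=distance_to_corner)
        d = distance_to_corner(point)
        if best is None or d < best[0]:
            best = (d, n_ref, point[0])
    return best

# Synthetic GFD sequence: 60 'normal' values followed by 60 'damaged' values
gfd = np.concatenate([np.tile([8.40, 8.60], 30), np.tile([9.10, 9.30], 30)])
for y in (0.3, 0.6, 0.9):
    print(y, search(gfd, range(50, 71), y))
```

Because subsets start every 6 points, the reference point N can only be resolved to that granularity; on real data the Stage 2 positions would come from the training region rather than a fixed range.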

Table 2. Diagnostic criteria of the ROC classification models.

Criterion | Diagnosed as damaged when
30% classification model | the number of values higher than threshold X accounts for more than 30% of the total points
60% classification model | the number of values higher than threshold X accounts for more than 60% of the total points
90% classification model | the number of values higher than threshold X accounts for more than 90% of the total points

… optimum threshold computation procedure of the ROC was performed. The quantitative classification model and threshold for diagnosing faults were then established. GFDs are then calculated for the measurements to be tested, and the resulting quantitative classification model determines whether the bearing is in its normal working state.
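Once X and Y have been fixed by training, diagnosing new measurements reduces to applying the same subset rule to the incoming GFD values. The sketch below is hypothetical (first_alarm is not from the source): the example parameters X = 8.96 and Y = 30% are the values reported later for Bearing 4 of Experiment 1, and the input sequence is synthetic.

```python
import numpy as np

def first_alarm(gfd_stream, x, y, size=12, step=6):
    """Start index (0-based) of the first subset diagnosed 'damaged', or None.
    A subset is 'damaged' when more than a fraction y of its GFD values exceed x."""
    values = np.asarray(gfd_stream, dtype=float)
    for start in range(0, len(values) - size + 1, step):
        if np.mean(values[start:start + size] > x) > y:
            return start
    return None

stream = np.concatenate([np.full(30, 8.50), np.full(30, 9.10)])   # synthetic GFD values
print(first_alarm(stream, x=8.96, y=0.30))   # 24
print(first_alarm(stream, x=8.96, y=0.90))   # 30
```

A lower Y raises the alarm on the first subset that straddles the change, while a higher Y waits until the subset lies almost entirely in the damaged region.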

Description of experimental data source
The data used in this study were obtained directly from NASA's prognostics data repository [31]; the experimental setup is described here only to provide context. The experiments were conducted by the NSF I/UCR Center for Intelligent Maintenance Systems (IMS) with support from Rexnord Corporation in Milwaukee, Wisconsin, U.S. Figure 8 shows the layout of the bearing test platform [32]. According to the description files included with the data set, four Rexnord ZA-2115 double-row bearings were mounted on a shaft. An AC motor drove the shaft through a belt, and the rotation speed was held constant at 2000 RPM. A radial load of 6000 lb was applied to the shaft and bearings through a spring mechanism, and all bearings were force-lubricated during load application and shaft rotation. This study analyzed the data from two of these experiments. In Experiment 1, two uniaxial accelerometers (X-axis and Y-axis) were mounted on the housing of each of the four bearings. In Experiment 2, only one X-axis accelerometer was mounted on each bearing housing, so no Y-axis data were available; this study therefore analyzed only the X-axis data. The accelerometers were PCB 353B33 high-sensitivity ICP accelerometers, and the data were collected with an NI 6062E data acquisition card. Each data set started from the signal acquisition to bearing failure, containing

Experimental analysis result and discussion
This section presents the results of the GFD and ROC analyses. Because a bearing contains multiple balls, the vibration signals generated by a damaged bearing during operation are likely to lie in the high-frequency band. Frequencies below 3 kHz were therefore removed to reduce interference from low-frequency components: all vibration signals were passed through a 3 kHz high-pass filter and then converted to dB values using the reference value of the ISO 1683 standard.
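A minimal sketch of this preprocessing, assuming SciPy. The 4th-order zero-phase Butterworth design, the 20 480 Hz sampling rate, signals expressed in m/s², and reporting an RMS level are all assumptions, since the text specifies only the 3 kHz cut-off and the ISO 1683 reference. ISO 1683 gives 1 µm/s² as the reference value for vibration acceleration.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS = 20_480.0   # assumed sampling rate in Hz (not given in this section)
A_REF = 1e-6    # ISO 1683 reference value for vibration acceleration: 1 um/s^2

def highpass_3k(x, fs=FS, order=4):
    """Zero-phase Butterworth high-pass filter with a 3 kHz cut-off (type and order assumed)."""
    sos = butter(order, 3000.0, btype="highpass", fs=fs, output="sos")
    return sosfiltfilt(sos, x)

def level_db(x, a_ref=A_REF):
    """RMS level in dB re a_ref; assumes x is acceleration in m/s^2."""
    rms = np.sqrt(np.mean(np.square(x)))
    return 20.0 * np.log10(rms / a_ref)

t = np.arange(int(FS)) / FS                                   # one second of samples
low, high = np.sin(2 * np.pi * 100 * t), np.sin(2 * np.pi * 5000 * t)
print(level_db(highpass_3k(low)) - level_db(low))    # large negative: 100 Hz is removed
print(level_db(highpass_3k(high)) - level_db(high))  # close to 0 dB: 5 kHz passes
```

Zero-phase filtering (forward and backward) avoids shifting impulsive fault events in time, which is why it is used in this sketch; a causal filter would also satisfy the text.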

GFDs calculation results
Damage progression can generally be divided into three stages: (1) a completely normal stage, (2) an initial damage stage, and (3) a transition from the damaged state to complete malfunction. Figure 9 arranges the evolution of bearing health into a 3 × 3 grid to illustrate these stages; each plot shows the vibration behavior of the bearings at a given damage stage. The vibration signals in this study exhibited fractal properties, and the GFDs captured their complexity: as signal complexity increased, the GFD values rose correspondingly. Analyzing the GFD values of the signals therefore provides a way to assess bearing condition, because the vibration characteristics and complexity of a faulty bearing differ from those of a normal bearing. GFDs can detect subtle changes as the vibration signals progress from the normal state to a clearly defined fault. This section computes the GFDs of the vibration signals from the two experiments, laying the groundwork for defining the optimal criteria in the ROC curve analysis.

Experiment 1
The GFDs were calculated with the proposed method and plotted as time-varying distributions. Figure 10 shows the distributions for Bearings 1–4 in Experiment 1; the horizontal axis is the data file number in time order, and the vertical axis is the GFD. According to the experiment records, Experiment 1 was stopped because of inner race damage in Bearing 3 and ball damage in Bearing 4. The last several GFD values of Bearing 3 rose before the end of the experiment, indicating the initial damage stage. The GFD trend of Bearing 4 fluctuated after 1700 h. Analyses of Experiment 1 with the different methods in [33–35] indicate unexplained anomalies during its first 400 h of operation: suspicious fault indicators appeared early and then disappeared as the signals returned to their initial normal state. The GFD results of this study showed the same uncertainty as those of the other studies. Because the published files give no specific information about data acquisition problems in this experiment, this data segment was removed for all bearings on the shaft in the subsequent model training and verification.

Experiment 2
Figure 11 shows the time-varying GFD distributions of Bearings 1–4 in Experiment 2; the horizontal axis is the data file number in time order, and the vertical axis is the GFD. Experiment 2 was eventually stopped because of outer race damage in Bearing 1. Bearing 1 showed a clear upward trend from about file No. 550. The GFD curves of the normal Bearings 2, 3, and 4 were smoother than that of Bearing 1, although the last several values of Bearing 4 rose for unknown reasons.

ROC feature thresholding result
GFDs serve as an index of complexity in theoretical analysis: higher GFDs correspond to higher system complexity, which can be interpreted as bearing failure signals being more complex than normal operating signals. The conventional approach to fault diagnosis with fractal dimensions calculates the fractal dimension of bearing vibration signals in different stages or states and divides them roughly to distinguish normal from abnormal states. However, this approach lacks scientific objectivity and performs poorly during the initial damage stage, when the fractal dimension is expected to fall within a fuzzy range between the normal and abnormal states. Effectively distinguishing the normal and initial damage states within this fuzzy interval remains a challenge in current fault diagnosis.
This study combined the ROC curve with GFD features to distinguish the normal and initial damage states of the bearings. Balancing the GFD values of the normal and abnormal states is crucial for obtaining the optimal fault threshold and classification model, and the effectiveness of training depends heavily on defining an appropriate training region. The GFD trend over the bearing life cycle can be divided into three stages: the normal operation stage, the initial damage stage, and the serious fault stage. This study selected Stage 2 (the initial damage stage) as the primary region for the ROC curve analysis. An arbitrary criterion was adopted, and an ROC curve was constructed by connecting the coordinate points obtained with different thresholds. Finally, the optimal threshold for initial damage was identified with the established criteria and classification model, and it serves as the bearing fault diagnosis criterion. Figure 12 shows how the life cycles of the failed bearings in the two experiments were divided into stages. As shown in figure 12(a), the fault in Bearing 3 of Experiment 1 was located on the inner race, and the corresponding vibration features were relatively weak. Because the suspected fault was at a very early stage, the GFD features of the normal and fault signals were not clearly separated. In addition, the damage occurred near the end of the experiment, leaving few fault training samples. The training results for Bearing 3 in Experiment 1 were therefore considered unreliable and are not discussed in this study.
To evaluate the accuracy of the model, part of the signals must be held out for validation, and the training and validation signals must not overlap. In this study, the signals were divided by systematic sampling into two groups: odd-numbered and even-numbered files. The odd-numbered group was used to train the model and establish the decision criteria, and the optimal model was determined using the ROC curve. The even-numbered group was then used to test the accuracy and performance of the resulting optimal classification models.
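The odd/even systematic split and the validation indexes can be sketched as below. The definitions are assumptions for illustration: TDR is taken as the fraction of truly faulty files flagged as faulty (true detection rate), TNR as the fraction of truly normal files kept as normal, and ACC as the overall fraction classified correctly; file numbers are assumed to be integers, as in the "No. 1644" notation. The function names are hypothetical.

```python
import numpy as np


def odd_even_split(file_numbers, gfd):
    """Odd-numbered files form the training group, even-numbered files the validation group."""
    nums = np.asarray(file_numbers, dtype=int)
    vals = np.asarray(gfd, dtype=float)
    odd = nums % 2 == 1
    return (nums[odd], vals[odd]), (nums[~odd], vals[~odd])


def validation_metrics(gfd, is_fault, threshold):
    """Return TDR, TNR, ACC and the number of misidentified files for a fixed threshold."""
    gfd = np.asarray(gfd, dtype=float)
    y = np.asarray(is_fault, dtype=bool)
    pred = gfd >= threshold                  # assumed rule: fault if GFD >= threshold
    tp = int(np.sum(pred & y))               # faults correctly detected
    tn = int(np.sum(~pred & ~y))             # normal files correctly kept normal
    tdr = tp / int(y.sum())
    tnr = tn / int((~y).sum())
    acc = (tp + tn) / y.size
    n_wrong = y.size - tp - tn
    return tdr, tnr, acc, n_wrong
```

Because the two groups interleave along the whole life cycle, both cover the same stages while never sharing a file.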

Bearing 4 of Experiment 1
The previous section established the judgment region for training, consisting of files No. 1500 ∼ 1800. For each criterion, the optimal classification model and threshold were identified using the ROC curve. Two indexes were obtained from the ROC training: the distance to the upper left corner (i.e. the distance to (0, 1)) and the ACC. The results are presented in figures 13 and 14. The minimum distance to (0, 1) is commonly used as the main indicator of the optimal classification model. With the 30% classification model, the distance to (0, 1) was 0.0156, the smallest of the three; the initial damage location was No. 1644, with a diagnostic threshold of 8.96 and an ACC of 99.26%. With the 60% classification model, the distance to (0, 1) was 0.039; the initial damage location was No. 1634, with a diagnostic threshold of 8.92 and an ACC of 97.80%. With the 90% classification model, the distance to (0, 1) was 0.046; the initial damage location was No. 1634, with a diagnostic threshold of 8.90 and an ACC of 97.80%. These results are compiled in table 4. Across the three classification models, the two indexes agreed: a smaller distance to (0, 1) was accompanied by an equal or higher ACC. The 30% model achieved better values on both indexes than the other two, so it was chosen as the optimal classification model of Bearing 4.
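The selection step for Bearing 4 reduces to picking the criterion whose ROC result lies closest to (0, 1). The sketch below encodes the training results reported above (table 4) and applies that rule; the dictionary layout is only an illustrative choice.

```python
# Training results of Bearing 4 in Experiment 1, as reported in the text (table 4).
results = {
    "30%": {"distance": 0.0156, "location": 1644, "threshold": 8.96, "acc": 99.26},
    "60%": {"distance": 0.039, "location": 1634, "threshold": 8.92, "acc": 97.80},
    "90%": {"distance": 0.046, "location": 1634, "threshold": 8.90, "acc": 97.80},
}

# The optimal classification model is the one closest to the ideal point (0, 1).
best = min(results, key=lambda name: results[name]["distance"])
print(best, results[best]["threshold"])  # 30% 8.96
```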
When the classification models were verified on the even group, the 30% classification model had 16 groups of misidentified signals, with TDR, TNR, and ACC of 98.44%, 95.85%, and 97.07%, respectively. The 60% classification model had 18 groups of misidentified signals, with TDR, TNR, and ACC of 96.95%, 96.48%, and 96.70%, respectively. The 90% classification model had 12 groups of misidentified signals, with TDR, TNR, and ACC of 96.18%, 99.30%, and 97.80%, respectively. These results are shown in table 5. The 90% classification model had the fewest misrecognitions and the highest overall ACC. However, since the ACC of all three models exceeded 90%, TDR is the index that matters most in practice. To maximize TDR, it is preferable to misjudge a normal state as a fault than to misjudge a fault as a normal state. This reduces the likelihood of missing damage signals, makes damage detection more reliable, and leaves sufficient time for equipment maintenance and for preventing more severe faults. Therefore, the 30% classification model, which has the highest TDR, remains the most effective choice. Because the ACC of both the training and validation groups exceeded 90% for all three models, the validation results support the 30% classification model as the optimal one. Finally, this study used the 30% optimal classification model to inspect normal bearings at other positions to verify its effectiveness. The results are shown in table 6. For Bearing 2, the ACC was a perfect 100%. Although Bearing 1 still had some misrecognitions, its ACC remained at 94.92%, which is considered acceptable in industry.
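The decision rule in this paragraph, which ranks models by TDR once every candidate clears the 90% ACC bar, can be written as a small function. The sketch reuses the validation figures reported above (table 5); the function name and the strict "greater than 90%" comparison are illustrative assumptions. It also shows why the rule matters: ranking by ACC alone would select the 90% model instead.

```python
# Validation results (even group) of Bearing 4 in Experiment 1, as reported (table 5).
validation = {
    "30%": {"tdr": 98.44, "tnr": 95.85, "acc": 97.07, "misidentified": 16},
    "60%": {"tdr": 96.95, "tnr": 96.48, "acc": 96.70, "misidentified": 18},
    "90%": {"tdr": 96.18, "tnr": 99.30, "acc": 97.80, "misidentified": 12},
}


def pick_model(results, min_acc=90.0):
    """Among models whose ACC exceeds min_acc, prefer the highest TDR (fewest missed faults)."""
    eligible = {name: r for name, r in results.items() if r["acc"] > min_acc}
    return max(eligible, key=lambda name: eligible[name]["tdr"])


by_acc = max(validation, key=lambda name: validation[name]["acc"])  # ranking by ACC alone
by_rule = pick_model(validation)                                     # TDR-first rule
print(by_acc, by_rule)  # 90% 30%
```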

Bearing 1 of Experiment 2
Section 4.2 established the judgment region for training, consisting of files No. 500 ∼ 700. Two indexes were obtained from the ROC training: the distance to the upper left corner, denoted as the distance to (0, 1), and the ACC. The results are shown in figures 15 and 16. With the 30% classification model, the distance to (0, 1) was 0, indicating a perfect prediction model; the initial damage location was No. 548, with a diagnostic threshold of 8.96 and an ACC of 100%. With the 60% classification model, the distance to (0, 1) was also 0; the initial damage location was No. 552, with a diagnostic threshold of 8.93 and an ACC of 100%. With the 90% classification model, the distance to (0, 1) was 0.0372; the initial damage location was No. 554, with a diagnostic threshold of 8.91 and an ACC of 98.34%. The results are summarized in table 7. Both the 30% and 60% classification models achieved perfect predictions in training, but the 30% model detected the fault feature earlier (No. 548 versus No. 552). The classification models were then verified on the even group. The 30% classification model had two groups of misidentified signals, with TDR, TNR, and ACC of 100%, 99.24%, and 99.58%, respectively. The 60% classification model had four groups of misidentified signals, with TDR, TNR, and ACC of 98.16%, 100%, and 99.17%, respectively. The 90% classification model had 13 groups of misidentified signals, with TDR, TNR, and ACC of 93.98%, 100%, and 97.30%, respectively. These results are summarized in table 8.
The 30% classification model had the fewest misrecognitions and the highest ACC, with a perfect TDR of 100%, which confirms it as the optimal choice. Although both the training and validation groups of all three classification models achieved ACC above 90%, the 30% classification model performed best among them.
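When two criteria both reach a distance of 0, as in the training results of Bearing 1 (table 7), the text favors the one that flags the fault earlier. A minimal sketch of that ranking, with the earlier initial damage location used as an assumed explicit tie-breaker, is:

```python
# Training results of Bearing 1 in Experiment 2, as reported in the text (table 7).
results = {
    "30%": {"distance": 0.0, "location": 548, "threshold": 8.96, "acc": 100.0},
    "60%": {"distance": 0.0, "location": 552, "threshold": 8.93, "acc": 100.0},
    "90%": {"distance": 0.0372, "location": 554, "threshold": 8.91, "acc": 98.34},
}

# Rank by distance to (0, 1); break ties by the earlier initial damage location.
best = min(results, key=lambda name: (results[name]["distance"], results[name]["location"]))
print(best, results[best]["threshold"])  # 30% 8.96
```

The selected model and threshold coincide with those obtained for Bearing 4 in Experiment 1.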
Finally, this study used the 30% optimal classification model to assess normal bearings at other positions in the experiments to confirm the effectiveness of the classification model. The results are shown in table 9. For Bearing 2 and Bearing 3, the ACC reached a perfect 100%. Bearing 4 had nine groups of misrecognized signals, but its diagnostic accuracy was still 99.08%. In practical application, an accuracy of 99.08% meets industry standards.
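For a normal bearing, every file is truly normal, so ACC reduces to the share of files whose GFD stays below the diagnostic threshold, and each misrecognition is a false alarm. A minimal sketch, assuming the same "fault if GFD ≥ threshold" rule as above (the function name and the input values are hypothetical):

```python
import numpy as np


def check_normal_bearing(gfd, threshold):
    """Apply a trained threshold to a bearing known to be healthy.

    Returns (ACC, number of misrecognized files); every file flagged as a
    fault is a false alarm because no file of this bearing is faulty.
    """
    gfd = np.asarray(gfd, dtype=float)
    false_alarms = int(np.sum(gfd >= threshold))
    return 1.0 - false_alarms / gfd.size, false_alarms


acc, wrong = check_normal_bearing([8.50] * 9 + [9.00], threshold=8.96)
print(acc, wrong)  # 0.9 1
```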

Discussion
This study employed GFDs and the ROC curve to develop a threshold-based fault diagnostic model and applied it to measured bearing vibration signals. The results validate the effectiveness of the proposed method and are summarized as follows:
(1) The GFD calculation results provide clear qualitative features, and qualitative analysis can easily separate the GFDs of normal operation and serious faults. However, when a bearing is damaged at the initial stage, the distinction between normal and abnormal states is unclear and a decision is difficult. Therefore, this study establishes criteria for evaluating initial damage to enable quantitative analysis of bearing faults.
(2) To implement automatic diagnosis, a qualitative analysis was first performed on the GFD calculation results. The fault trend in the GFD diagram divides the bearing life cycle into three stages: normal operation, initial damage, and serious fault. The criteria for evaluating initial damage are established within the initial damage stage, and quantitative analysis is performed using the ROC curve. By evaluating every threshold under each criterion, the method determines the optimal classification models and identifies the initial damage through the FDR and TDR indexes across the entire trend.
(3) Table 10 presents the optimal classification models built through thresholding. Bearing 4 of Experiment 1 and Bearing 1 of Experiment 2 shared the same optimal classification model and diagnostic threshold obtained through ROC training, which demonstrates the feasibility of the proposed method. When the classification models of Bearing 1 and Bearing 4 were applied to bearings at other positions, the ACC exceeded 90%, verifying the effectiveness of the proposed method.
(4) When the damage occurred in the balls (Bearing 4) in Experiment 1 and in the outer race (Bearing 1) in Experiment 2, the diagnostic thresholds and classification models derived from the GFDs with the ROC curve were consistent. However, when the inner race was damaged in Experiment 1 (Bearing 3), the damage features were less apparent than for ball and outer race damage, making it difficult to identify the optimal classification model from the GFDs through the ROC curve.

Conclusion
This study proposed a novel quantitative method for fault diagnosis. By analyzing the time variation of the GFDs of time-domain vibration signals measured during bearing operation, the researchers identified three stages in the bearing life cycle and conducted a qualitative analysis. However, qualitative analysis alone cannot pinpoint the location of the initial damage. Therefore, a threshold analysis of the time-varying GFD trends based on the ROC curve was proposed, and specific fault thresholds and diagnostic classification models were established as criteria for distinguishing normal and abnormal states. For the ball fault in Experiment 1 (Bearing 4) and the outer race fault in Experiment 2 (Bearing 1), the ROC curve yielded the same optimal classification model and the same diagnostic threshold, which supports the feasibility of the proposed method. To keep the evaluation of the models objective, the same signals were never used for both training and validation. The findings also showed that the models can diagnose bearings at other positions, contributing to the field of fault diagnosis. In addition, the novelty of this study lies in transforming a complex quantitative diagnosis into a simpler task. In recent years, machine learning methods have been applied to automatic fault diagnosis; however, supervised learning requires more powerful hardware and has drawbacks in practical applications.
The proposed method requires only simple calculations and no prior knowledge beyond a historical dataset of faults for the training phase. The model developed in this study proved effective in identifying the location of the initial damage. If it is combined with lifetime prediction by other algorithms, maintenance personnel can gain a more accurate understanding of equipment health and make better decisions about scheduled maintenance.

Figure 5. ROC curve based on criteria and flow chart for establishing the optimal classification models.

Figure 7 presents the flow chart of the bearing fault diagnosis method proposed in this study. The diagnostic method requires analyzing a range of signals, spanning from normal states to serious faults. During training, the time-domain vibration signals were filtered before the GFD features were calculated. Afterward, the

Figure 7. Flowchart of the proposed methodology.

Figure 8. Bearing test platform and sensors layout.
From left to right, the panels follow the stage categorization above: the left graph shows the fully normal operating state, the middle graph captures the initial damage phase, and the right graph shows the transition from a damaged state to total failure. From top to bottom, the rows present vibration signals from the bearings in the two repeated experiments under identical conditions: row (a) shows Bearing 3 and row (b) Bearing 4 (both from the first experiment), and row (c) shows Bearing 1 from the second experiment. Although time-domain vibration signals provide essential data, they do not always reveal the subtle signs of initial bearing degradation. To discern early-stage anomalies precisely, this study incorporates fractal theory, which allows a more rigorous and detailed examination of the onset of bearing deterioration. Fractal analysis of a signal is based on irregular patterns that recur at different scales. Through this theory, the self-similarity and complexity of a signal can be quantified by a fractal dimension, a key metric of the intrinsic patterns and complexity within the signal. A higher dimension generally indicates greater irregularity or complexity.

Figure 9. Time-domain vibration signals of bearings across different damage stages.

Figure 14. ACC in the ROC result of Bearing 4 in Experiment 1.

Figure 16. ACC in the ROC result of Bearing 1 in Experiment 2.
] represents the number of signals in the ith grid, while P[i,ϵ] represents the ratio of the mass in the ith grid to the total mass of all grids (i.e. P[i,ϵ] =

Table 2. Diagnostic criteria of the three classification models.

Table 3. Experimental measurement data and description.
multiple files. Each file recorded a vibration snapshot of 20 480 points sampled at 20 kHz, i.e. an interval of about 1 s (1.024 s). Table 3 presents the experiment data.

Table 4. Comparison of results of classification models of Bearing 4.

Table 5. Comparison of validation results of classification models of Bearing 4 in the even group.

Table 6. Results for normal bearings at other positions checked by the optimal classification model of Bearing 4.

Table 7. Comparison of results of classification models of Bearing 1.

Table 8. Comparison of validation results of classification models of Bearing 1 in the even group.

Table 9. Results for normal bearings at other positions checked by the optimal classification model of Bearing 1.

Table 10. Results of the optimal classification models for Bearing 1 and Bearing 4.