Research on Daily Load Curve Classification Based on Improved Fuzzy C-means Clustering Algorithm

To address the high unpredictability of user demand and the resulting three-phase unbalance in low-voltage distribution networks, a daily load curve clustering method that integrates the sparrow search algorithm (SSA) with the fuzzy C-means clustering algorithm (FCM) is presented. The initial load data set is pre-processed to reduce interference with the classification results, and load characteristic indices are extracted from each daily load curve to produce a reduced-dimensional data set. A clustering validity index is introduced to determine the optimal number of clusters, and the early warning mechanism of the SSA is adopted to improve global search capability. Together, these improvements reduce the sensitivity of the FCM algorithm to its initial clustering centers and its tendency to become trapped in local optima during optimization. Simulation results validate the accuracy and effectiveness of the improved clustering algorithm for daily load classification and provide a reference for resolving the three-phase load unbalance problem.


Introduction
The low-voltage distribution network is directly connected to users and supplies mixed single-phase and three-phase loads with highly erratic electricity consumption. As the structure of the power system grows increasingly complex, this creates a three-phase unbalance problem in the low-voltage distribution network [1]. Based on advanced metering, communication, and data management technologies, power supply companies can collect data from power users on a wide scale [2]. Accurately classifying users according to their electricity consumption characteristics is an effective way to manage three-phase unbalance.
Electric load clustering has been studied extensively. To address the sensitivity of clustering to the choice of initial clustering centers, Wang et al. proposed a two-layer clustering analysis approach [3]. To increase clustering accuracy, Xiang et al. presented a shape clustering method based on slope segmentation of the load curve [4][5]. All of these analyses, however, cluster directly at the level of the raw collected data. Because the volume of load data has grown enormously, direct clustering of electric loads is constrained by computational efficiency, which affects the objectivity and precision of the clustering results.
Recent works have therefore applied dimensionality reduction techniques to load curves. Liang et al. proposed reducing the dimensionality of electric load curves using kernel principal component analysis combined with an enhanced K-means algorithm [6]. Wang et al. reduced data dimensionality through singular value decomposition of user-side load characteristics, but the K-means algorithm they used is less reliable [7]. Liang et al. also employed deep learning for dimensionality reduction and FCM for clustering, but the approach was prone to converging to local optima [6].
To solve the problems mentioned above, this paper selects daily load characteristic indices to construct a reduced-dimensional curve data set, pre-processes the initial daily load data by filling in missing values and normalizing, introduces a clustering validity index to determine the optimal number of clusters automatically, and proposes an improved fuzzy C-means clustering algorithm.

Daily load curve feature extraction
Users can be classified into different types according to their patterns of electricity consumption over a 24-hour day [8]. The intelligent collection terminal records many daily load curves, from which the trend characteristics of the typical daily load curve of each user type can be summarized to reveal the common features of these users' electricity consumption. In this paper, the typical daily load curve of a user type is defined as the average of the daily load curves of all users in that group.
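As a minimal sketch of the averaging rule above, assuming the daily load curves are stored one per row in a NumPy array and each curve carries an integer group label (both hypothetical conventions, as is the function name `typical_daily_curves`), the typical daily load curve of each group is the element-wise mean of its members' curves:

```python
import numpy as np

def typical_daily_curves(curves, labels):
    """Return {group label: typical daily load curve}.

    curves: (N, T) array, one daily load curve per row (T = 24 for hourly data).
    labels: length-N array giving the user group of each curve.
    """
    curves = np.asarray(curves, dtype=float)
    labels = np.asarray(labels)
    # Element-wise mean over all curves that belong to the same group
    return {int(g): curves[labels == g].mean(axis=0) for g in np.unique(labels)}

# Toy example with 3-point curves for brevity (real curves have 24 points)
curves = np.array([[1.0, 0.5, 0.2],
                   [0.8, 0.7, 0.4],
                   [0.2, 0.4, 1.0]])
labels = np.array([0, 0, 1])
typical = typical_daily_curves(curves, labels)
```

Group 0 averages the first two curves, while group 1 contains a single curve and is therefore its own typical curve.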
Eight characteristic indices are selected to reduce the dimensionality of each user's daily load curve, characterize the curve more effectively, and lessen the mis-clustering caused by isometrics.
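Because the eight indices are not listed in this section, the sketch below uses eight commonly used daily load indices as hypothetical stand-ins; the function name `daily_load_features` and the choice of indices are assumptions, not the paper's definitions. It illustrates how each 24-point curve is mapped to an 8-dimensional feature vector, so that an N×24 data set becomes N×8:

```python
import numpy as np

def daily_load_features(curve):
    """Map one 24-point daily load curve to 8 illustrative indices.

    NOTE: these eight indices are hypothetical stand-ins and are not
    necessarily the eight indices chosen in the paper.
    """
    x = np.asarray(curve, dtype=float)
    p_max, p_min, p_mean = x.max(), x.min(), x.mean()
    return np.array([
        p_max,                      # daily peak load
        p_min,                      # daily valley load
        p_mean,                     # daily average load
        x.std(),                    # spread of the load over the day
        p_mean / p_max,             # daily load factor
        p_min / p_max,              # minimum-to-peak load ratio
        (p_max - p_min) / p_max,    # peak-valley difference rate
        float(np.argmax(x)),        # hour at which the peak first occurs
    ])

# 24 hourly points: night, daytime, evening peak, late evening
curve = np.array([0.3] * 8 + [0.6] * 10 + [1.0] * 4 + [0.5] * 2)
features = daily_load_features(curve)
```

Stacking `daily_load_features` over all curves gives the reduced-dimensional data set that is then clustered.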

Daily load data pre-processing
Prior to cluster analysis, the daily load data must be pre-processed to improve the accuracy of the clustering results:
1) Filling in missing load values. To reduce the effect of missing values on the classification results, each missing value is interpolated as the average of the four data points before and after it.
2) Normalizing the daily load data. Daily load curves with similar trends can differ greatly in load magnitude; normalization removes the effect of this magnitude difference on the clustering results. The normalization formula is:
$$x'_{ij} = \frac{x_{ij}}{x_{i,\max}} \tag{1}$$
where $x_{ij}$ and $x'_{ij}$ are the j-th data point of the i-th daily load curve before and after normalization, and $x_{i,\max}$ is the maximum load value in the i-th daily load curve.
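The two pre-processing steps can be sketched as follows. This is a minimal illustration, not the paper's code: the function names are hypothetical, and "the four data points before and after the missing value" is read here as two observed points on each side of the gap (the window half-width `k = 2` is an assumption).

```python
import numpy as np

def fill_missing(curve, k=2):
    """Fill NaN gaps with the mean of up to k observed points on each side.

    ASSUMPTION: "the four data points before and after" is read as two points
    before plus two points after the gap (k = 2).
    """
    original = np.asarray(curve, dtype=float)
    filled = original.copy()
    for j in np.flatnonzero(np.isnan(original)):
        neighbours = np.concatenate((original[max(j - k, 0):j],
                                     original[j + 1:j + 1 + k]))
        neighbours = neighbours[~np.isnan(neighbours)]  # skip other missing points
        if neighbours.size:  # leave the gap untouched if no neighbour is observed
            filled[j] = neighbours.mean()
    return filled

def normalize_by_max(curve):
    """x'_ij = x_ij / x_i,max: divide every point by the curve's own maximum."""
    x = np.asarray(curve, dtype=float)
    return x / np.nanmax(x)  # assumes the curve has a positive maximum

raw = np.array([2.0, 4.0, np.nan, 6.0, 8.0])
filled = fill_missing(raw)            # NaN -> mean(2, 4, 6, 8) = 5.0
normalized = normalize_by_max(filled)
```

Filling is done before normalization so that the interpolated value is expressed in the same physical units as its neighbours.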

Fuzzy C-means clustering algorithm
For an initial data set $X = \{x_1, x_2, \ldots, x_N\}$ containing N data points, let the number of clusters be C. The N data points are divided into C classes with clustering centers $V = \{v_1, v_2, \ldots, v_C\}$. The FCM algorithm iterates until the maximum number of iterations is reached or the objective function falls below a set threshold. Its objective function is:
$$J_m(U, V) = \sum_{i=1}^{C} \sum_{j=1}^{N} u_{ij}^{m} d_{ij}^{2}, \qquad \sum_{i=1}^{C} u_{ij} = 1 \tag{2}$$
where m is the fuzzy weighting exponent, generally taken as 2; $u_{ij}$ is the membership degree of data point $x_j$ in class i; and $d_{ij} = \lVert x_j - v_i \rVert$ is the Euclidean distance between $x_j$ and the clustering center $v_i$ of class i.
In cluster analysis, the FCM algorithm is prone to becoming trapped in local optima. The vigilantes and early warning mechanism of the SSA improve its global search capability, and this advantage can be used to find suitable initial clustering centers for the FCM algorithm.
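A minimal NumPy sketch of standard FCM with the usual alternating membership and center updates is shown below. The function name, the stopping rule (change in $J_m$ below `tol`), and the `init_centers` hook through which an optimizer such as the SSA could supply starting centers are assumptions made for illustration.

```python
import numpy as np

def fcm(X, C, m=2.0, max_iter=300, tol=1e-6, init_centers=None, seed=0):
    """Standard fuzzy C-means by alternating membership and center updates.

    X: (N, D) data. Returns centers V (C, D), memberships U (C, N) whose
    columns sum to 1, and the objective J_m. init_centers is a hook through
    which an optimizer such as the SSA can supply starting centers.
    """
    X = np.asarray(X, dtype=float)
    if init_centers is None:
        rng = np.random.default_rng(seed)
        V = X[rng.choice(len(X), size=C, replace=False)]
    else:
        V = np.asarray(init_centers, dtype=float).copy()
    J_old = np.inf
    for _ in range(max_iter):
        # d[i, j] = ||x_j - v_i||, floored to avoid division by zero
        d = np.fmax(np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2), 1e-12)
        # u_ij = 1 / sum_k (d_ij / d_kj)^(2 / (m - 1))
        U = 1.0 / ((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))).sum(axis=1)
        J = float(np.sum(U ** m * d ** 2))  # J_m for the current (U, V)
        if abs(J_old - J) < tol:            # stop when J_m no longer changes
            break
        J_old = J
        # v_i = sum_j u_ij^m x_j / sum_j u_ij^m
        W = U ** m
        V = (W @ X) / W.sum(axis=1, keepdims=True)
    return V, U, J

X = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]])
V, U, J = fcm(X, C=2, init_centers=[[1.0, 1.0], [4.0, 4.0]])
labels = U.argmax(axis=0)  # hard labels from the largest membership
```

Because the result depends on the starting centers, a poor `init_centers` can leave FCM in a local optimum, which is the weakness the SSA is used to address.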
The sparrow population X consists of n sparrows, and the position of each sparrow encodes a different random set of initial clustering centers for the FCM. Individuals are divided into discoverers, joiners, and vigilantes.
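Discoverers move according to Equation (3):
$$X_{i,j}^{t+1} = \begin{cases} X_{i,j}^{t} \cdot \exp\left(-\dfrac{i}{\alpha \cdot T_{\max}}\right), & R < ST \\ X_{i,j}^{t} + Q \cdot L, & R \ge ST \end{cases} \tag{3}$$
where $X_{i,j}^{t}$ is the position of the i-th sparrow in dimension j at the t-th iteration; $T_{\max}$ is the maximum number of iterations; $\alpha \in (0, 1]$ is a random number; Q is a normally distributed random number; L is a 1×d matrix whose elements are all 1; $R \in [0, 1]$ is the warning value, indicating the level of danger the sparrow faces; and $ST \in [0.5, 1]$ is the safety threshold.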
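The discoverer rule can be sketched as below, following the standard SSA form of Equation (3). The function name, the assumption that positions are sorted from best to worst fitness, and the way R and α are drawn are illustrative choices, not taken from the paper.

```python
import numpy as np

def update_discoverers(X, n_disc, R, ST, T_max, rng):
    """Discoverer step of the standard SSA (Equation (3)).

    X: (n, d) sparrow positions, sorted from best to worst fitness, so the
    first n_disc rows are the discoverers. R: warning value in [0, 1];
    ST: safety threshold in [0.5, 1]. Joiner rows are returned unchanged.
    """
    X = np.array(X, dtype=float)          # work on a copy
    d = X.shape[1]
    for i in range(n_disc):
        if R < ST:
            # No danger: wide-search mode, each coordinate scaled by a factor in (0, 1)
            alpha = rng.uniform(1e-8, 1.0)   # alpha drawn from (0, 1)
            X[i] = X[i] * np.exp(-(i + 1) / (alpha * T_max))  # rank is 1-based
        else:
            # Danger detected: move by a normally distributed step Q along L
            Q = rng.normal()
            X[i] = X[i] + Q * np.ones(d)      # L is a 1 x d row of ones
    return X

rng = np.random.default_rng(1)
X0 = rng.uniform(-1.0, 1.0, size=(10, 4))   # 10 sparrows, 4-dimensional positions
X1 = update_discoverers(X0, n_disc=2, R=rng.uniform(), ST=0.8, T_max=1000, rng=rng)
```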
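In addition to moving according to Equation (4), joiners may also update their positions according to Equation (5) when they act as vigilantes.
$$X_{i,j}^{t+1} = \begin{cases} Q \cdot \exp\left(\dfrac{X_{d}^{t} - X_{i,j}^{t}}{i^{2}}\right), & i > n/2 \\ X_{P}^{t+1} + \left| X_{i,j}^{t} - X_{P}^{t+1} \right| \cdot A^{+} \cdot L, & \text{otherwise} \end{cases} \tag{4}$$
where $X_P^{t+1}$ is the position of the best discoverer at the (t+1)-th iteration; $X_d^t$ is the global worst position at the t-th iteration; and A is a 1×d matrix whose elements are randomly assigned 1 or -1, with $A^{+} = A^{T}(AA^{T})^{-1}$.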
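Vigilantes typically make up 10% to 20% of the sparrow population and update their positions according to Equation (5):
$$X_{i,j}^{t+1} = \begin{cases} X_{b}^{t} + \beta \left| X_{i,j}^{t} - X_{b}^{t} \right|, & f_i \ne f_b \\ X_{i,j}^{t} + K \left( \dfrac{\left| X_{i,j}^{t} - X_{d}^{t} \right|}{(f_i - f_d) + \varepsilon} \right), & f_i = f_b \end{cases} \tag{5}$$
where $X_b^t$ is the global optimal position at the t-th iteration; $f_i$ is the fitness value of the current sparrow; $f_b$ is the global optimal fitness value; $f_d$ is the global worst fitness value; β is a step-size control parameter drawn from a standard normal distribution; $K \in [-1, 1]$ is a random number; and ε is a small constant that avoids division by zero.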
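The vigilante rule can be sketched as follows, assuming (as the step list later implies) that a larger fitness is better. The function name, the random selection of vigilantes via `ratio`, and the value of `eps` are illustrative assumptions.

```python
import numpy as np

def update_vigilantes(X, fit, ratio, rng, eps=1e-12):
    """Vigilante step of the standard SSA (Equation (5)); larger fitness is better.

    X: (n, d) positions; fit: length-n fitness values. A random fraction
    `ratio` (typically 0.1-0.2) of the sparrows act as vigilantes.
    """
    X = np.array(X, dtype=float)
    fit = np.asarray(fit, dtype=float)
    n = X.shape[0]
    b, w = int(np.argmax(fit)), int(np.argmin(fit))
    X_b, X_d = X[b].copy(), X[w].copy()   # global best and worst positions
    f_b, f_d = fit[b], fit[w]             # global best and worst fitness
    for i in rng.choice(n, size=max(1, int(ratio * n)), replace=False):
        if fit[i] != f_b:
            # Not the best sparrow: jump to a random point around the best position
            X[i] = X_b + rng.normal() * np.abs(X[i] - X_b)
        else:
            # Best sparrow senses danger: random step scaled by its distance to the worst
            K = rng.uniform(-1.0, 1.0)
            X[i] = X[i] + K * np.abs(X[i] - X_d) / ((fit[i] - f_d) + eps)
    return X

rng = np.random.default_rng(3)
X0 = rng.uniform(-1.0, 1.0, size=(10, 4))
fit0 = rng.uniform(0.0, 1.0, size=10)
X1 = update_vigilantes(X0, fit0, ratio=0.2, rng=rng)
```

Writing the first condition as "not the best" rather than as a strict inequality keeps the rule valid whether fitness is maximized or minimized.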
The fitness value of an individual sparrow measures how well the initial clustering centers encoded by its position perform. The fitness function is given by Equation (6).
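Because the fitness function itself is not reproduced in this section, the sketch below assumes a common choice: fitness $= 1/(1 + J_m)$, where $J_m$ is the FCM objective evaluated at the centers decoded from the sparrow's position. The function name and this exact form are hypothetical; the point is only that lower within-cluster dispersion yields higher fitness, matching the "higher fitness is better" convention of the steps that follow.

```python
import numpy as np

def sparrow_fitness(position, X, C, m=2.0):
    """Hypothetical fitness of one sparrow: 1 / (1 + J_m).

    position: flat vector of length C * D, decoded as C candidate centers.
    J_m is the FCM objective evaluated at those centers with the membership
    matrix that minimizes it, so lower J_m means higher fitness.
    """
    X = np.asarray(X, dtype=float)
    V = np.asarray(position, dtype=float).reshape(C, X.shape[1])
    d = np.fmax(np.linalg.norm(X[None, :, :] - V[:, None, :], axis=2), 1e-12)
    U = 1.0 / ((d[:, None, :] / d[None, :, :]) ** (2.0 / (m - 1.0))).sum(axis=1)
    J = np.sum(U ** m * d ** 2)
    return 1.0 / (1.0 + J)

X = np.array([[0.0, 0.0], [0.0, 0.1], [5.0, 5.0], [5.0, 5.1]])
good = sparrow_fitness([0.0, 0.05, 5.0, 5.05], X, C=2)  # one center per group
bad = sparrow_fitness([2.5, 2.5, 2.6, 2.6], X, C=2)     # both centers in between
```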

Clustering validity metrics
Because of the large volume of user daily load curve data, the most suitable number of clusters cannot be determined before the FCM algorithm performs cluster analysis. The Xie-Beni (XB) clustering validity index, which reflects both the membership and geometric properties of the classification results, is therefore used to quantify the validity of the results obtained with the current number of clusters and to determine the optimal number of clusters [10]. It is defined as:
$$XB = \frac{\sum_{i=1}^{C} \sum_{j=1}^{N} u_{ij}^{m} \lVert x_j - v_i \rVert^{2}}{N \cdot \min_{i \ne k} \lVert v_i - v_k \rVert^{2}} \tag{7}$$
where the numerator measures the compactness within classes and the denominator measures how well separated the different classes are from one another. The XB index is calculated for each candidate number of clusters, and the number of clusters with the smallest value is the optimal number of clusters $C_{best}$ for the given data set.
Classification of the user daily load curves with the SSA-FCM algorithm consists of the following steps:
1) Pre-process the acquired daily user load data;
2) Specify the minimum and maximum numbers of clusters, $C_{\min} = 2$ and $C_{\max} = 10$, and set $C = C_{\min}$;
3) Create the initial population: set the maximum number of iterations $T_{\max} = 1000$, the population size to 300, and the proportions of discoverers and joiners, then generate the initial sparrow population;
4) Calculate the fitness values of all individuals using Equation (6) to determine the best individual position $X_b$, the worst individual position $X_d$, and the worst fitness $f_d$;
5) Update the discoverer, joiner, and vigilante positions using Equations (3), (4), and (5);
6) Calculate the fitness value of each updated sparrow, and update its position if the new fitness value is greater than before the update;
7) If the termination condition is met, stop and output the sparrow with the highest current fitness value as the initial clustering centers of the FCM algorithm; otherwise, return to step 4);
8) Execute the FCM algorithm with the output of step 7) to classify the daily load curves and output the current clustering results;
9) Calculate the cluster validity index XB using Equation (7);
10) If $C < C_{\max}$, set $C = C + 1$ and return to step 3); otherwise, output the number of clusters with the minimum XB index and the corresponding final classification of the daily load curves.
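The outer loop over the number of clusters (steps 2, 9 and 10) can be sketched as below. All function names are hypothetical, and `simple_fcm` uses a deterministic farthest-first initialization purely as a stand-in for the SSA-optimized initial centers of steps 3 to 7, so that the example stays short and reproducible.

```python
import numpy as np

def xie_beni(X, U, V, m=2.0):
    """XB index: within-class compactness / (N * min squared center distance)."""
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)   # (C, N)
    compactness = np.sum(U ** m * d2)
    sep = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)  # (C, C)
    np.fill_diagonal(sep, np.inf)                             # ignore i == k
    return compactness / (X.shape[0] * sep.min())

def simple_fcm(X, C, m=2.0, iters=200):
    """Plain FCM with farthest-first initial centers (stand-in for SSA init)."""
    V = [X[0]]
    for _ in range(1, C):
        dist = np.min([((X - v) ** 2).sum(axis=1) for v in V], axis=0)
        V.append(X[np.argmax(dist)])
    V = np.array(V)
    for _ in range(iters):
        d = np.fmax(np.linalg.norm(X[None] - V[:, None], axis=2), 1e-12)
        U = 1.0 / ((d[:, None] / d[None]) ** (2.0 / (m - 1.0))).sum(axis=1)
        V = (U ** m @ X) / (U ** m).sum(axis=1, keepdims=True)
    return U, V

def best_cluster_number(X, cluster_fn, C_min=2, C_max=10):
    """Run the clustering for each C, score it with XB, keep the smallest."""
    scores = {}
    for C in range(C_min, C_max + 1):
        U, V = cluster_fn(X, C)
        scores[C] = xie_beni(X, U, V)
    return min(scores, key=scores.get), scores

offsets = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2], [0.2, 0.2]])
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + offsets for c in centers])   # three tight groups, 12 points
C_best, scores = best_cluster_number(X, simple_fcm, C_min=2, C_max=4)
```

With three well-separated groups, too few clusters inflate the numerator (poor compactness) and too many shrink the denominator (two centers inside one group), so the XB minimum falls at three clusters.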

Improved fuzzy C-means clustering algorithm
The daily load curves of 1,200 households in a city (sampling interval 1 h) are selected as the research object, and the eight load characteristic indices of each daily load curve are used for classification.

Clustering validity comparison
The SSA-FCM and FCM algorithms were each used to classify the user daily load curves, and the resulting XB index trends are shown in Figure 1. Figure 1 shows that, for the same number of clusters, the SSA-FCM algorithm produces classes whose daily load curves are more similar to one another, with better intra-class compactness. This indicates that using the optimal individual position found by the SSA as the initial clustering centers of the FCM algorithm significantly reduces the influence of the initial centers on the accuracy of the final classification results and alleviates the problem of falling into local optima.
Figures 2 and 3 show the clustering results of the SSA-FCM and FCM algorithms, respectively, at the optimal number of clusters, and Table 1 lists the number of daily load curves in each category. Figures 2 and 3 and Table 1 show that the FCM algorithm suffers from mis-clustering: it fails to distinguish between the fifth and sixth categories of users and misclassifies some daily load curves into the second and third categories. By contrast, the SSA-FCM algorithm yields more distinct electricity consumption characteristics for each type of user, demonstrating its superiority.

Clustering validity comparison of the two datasets
To examine the effect of feature extraction on the classification of user daily loads, the SSA-FCM algorithm was applied to both the reduced-dimensional dataset and the original dataset. The resulting XB metrics are shown in Figure 4. For every number of clusters, the reduced-dimensional dataset yields a better (smaller) XB index than the original dataset. The optimal number of clusters is 6 for the reduced-dimensional dataset but 5 for the original dataset. This suggests that the reduced-dimensional dataset accurately captures the morphological properties of daily load curves and reduces the mis-clustering caused by neglecting differences in daily load curve patterns.

6. Conclusion
In this paper, we propose an SSA-FCM clustering algorithm that accounts for user load characteristics in order to address the three-phase load unbalance problem.
1) Characteristic indices are used to characterize the daily load curve and to construct a reduced-dimensional data set, which reduces the mis-clustering caused by isometrics and improves the classification accuracy of the daily load curve;
2) After the daily load curves are interpolated and normalized, the clustering validity index XB is introduced into the FCM algorithm to determine the optimal number of clusters automatically, making the classification results more reasonable;
3) The SSA-FCM daily load curve classification algorithm uses the superior global search ability of the SSA to find the initial clustering centers for the FCM algorithm, overcoming the FCM algorithm's tendency to fall into local optima and improving the effectiveness of its classification.

Figure 2. Clustering results of daily load curves based on the SSA-FCM algorithm

Figure 4. XB indicator trend by two datasets

Table 1. Number of curves of each type obtained by the SSA-FCM algorithm