Nonparametric algorithm of electronic components test data pattern recognition

The paper discusses quality diagnostics of electrical radio components based on the results of non-destructive testing. The proposed clustering algorithm requires neither prior information on the number of classes nor a training sample; it determines the number of classes automatically. The division into classes arises from differences in the characteristics of the measured variables, which correspond to different product quality ranges.


Introduction
The problem of automatic clustering of products from real data is considered for the case when the number of classes is unknown [1]. The main difficulty lies in evaluating the results of clustering a data set. The result of clustering consists of the class assigned to each element, the number of elements in each class, and the list of class centers. The clustering result is adequate when the compactness hypothesis is satisfied for the classes obtained; in that case the class centers are clearly pronounced. However, this depends on the quality of the sample and the nature of the data, and in practice the centers are frequently not clearly expressed. In addition, there are many pattern recognition algorithms [2,3].
Developing methods for checking the quality of electrical radio products used as spacecraft components is a pressing problem of modern science and technology, because this industry requires products that perform their functions effectively. From the point of view of applications, the most important class of problems is diagnosing products without destructive testing [4].

Clustering Problem Statement
Let there be a collection of objects $O_1, O_2, \ldots, O_s$ whose properties are defined in the feature space. It is required to divide them into groups of objects that are, in a certain sense, close to each other. Information about the objects is given in the form of an $m \times s$ matrix, where $m$ is the number of input parameters (features) and $s$ is the sample size of products. Thus, the task of clustering is to divide the feature space into disjoint regions.
Each element of the sample (object) is characterized by certain values of the parameter vector and belongs to one of the classes (for example, $V_1$ or $V_2$). A cloud structure in the feature space, with each cloud defining a particular class (for example, satisfactory, medium quality and high quality), is typical for diagnostic tasks.
Self-learning pattern recognition is learning without any indication from a teacher as to whether the reaction of the system is correct or incorrect under various conditions. Suppose that a set of objects $X$ consists of several non-intersecting subsets $X_k$ ($k = 1, 2, \ldots, l$), corresponding to different classes of objects characterized by vectors $x \in X$. Since an object $x \in X$ appears in a particular set $X_k$ randomly, it is natural to consider the probability $P_k$ of an object $x$ appearing in the class $X_k$ and the conditional probability density $P_k(x)$ of the vector $x$ inside the corresponding class. In this case, the maxima of the probability densities $P_k(x)$ lie above the "centers" of the classes corresponding to the subsets $X_k$. However, when it is unknown to which class the object $x$ belongs, these conditional probability densities cannot be determined. The joint probability density $P(x)$ contains fairly complete information about the sets [5]. In particular, the maxima (modes) of this function correspond to the "centers" of the classes. Therefore, the self-learning problem is often reduced to restoring the joint probability density $P(x)$ and determining the "centers" and then the class boundaries [6]. For the case $x = (x_1, x_2)$, a possible form of the probability density is presented in figure 1. Figure 1 shows the probability density $P(x)$, which has two maxima and, consequently, two classes. There are several nonparametric methods for estimating the density function [7,8]. It is natural to take the coordinates of the maxima (modes) of the distribution, $(x_1(k), x_2(k))$, $k = 1, 2$, as the centers of the classes. Next, we consider the algorithm proposed for grouping data in the case where the initial number of classes is unknown.
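The idea of taking density modes as class centers can be illustrated with a minimal sketch. It assumes a Parzen-Rosenblatt kernel estimate with a Gaussian kernel and a hand-picked bandwidth; the text only states that nonparametric estimates exist [7,8], so the kernel, the bandwidth, the one-dimensional setting, the synthetic data and the function names `parzen_density` and `find_modes` are all illustrative assumptions rather than the estimator used in the paper.

```python
import numpy as np

def parzen_density(grid, sample, bandwidth):
    """Parzen-Rosenblatt estimate of P(x) on `grid` with a Gaussian kernel."""
    # Scaled differences between every grid point and every sample point.
    u = (grid[:, None] - sample[None, :]) / bandwidth
    kernel = np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)
    return kernel.mean(axis=1) / bandwidth

def find_modes(grid, density):
    """Return interior grid points where the estimate has a strict local maximum."""
    inner = density[1:-1]
    is_peak = (inner > density[:-2]) & (inner > density[2:])
    return grid[1:-1][is_peak]

# Synthetic, deterministic one-dimensional data: two groups around -3 and +3.
offsets = np.linspace(-1.0, 1.0, 51)
sample = np.concatenate([-3.0 + offsets, 3.0 + offsets])

grid = np.linspace(-6.0, 6.0, 601)
density = parzen_density(grid, sample, bandwidth=0.5)  # bandwidth is an assumed value
modes = find_modes(grid, density)
print(len(modes), modes)
```

The number of modes found plays the role of the number of classes, and their coordinates play the role of the class centers. In this sketch the result depends strongly on the bandwidth: too small a value produces spurious modes, while too large a value merges neighboring classes.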

Clustering Algorithm
Let a product be assigned to one or another quality category in accordance with the requirements of the state standard applicable to electronic components (EC). The product is characterized by certain values of the parameter vector, on the basis of which it is diagnosed. Consider a sample $x_i$, $i = 1, \ldots, s$, where $m$ is the dimension of the variable $x$ and $s$ is the sample size. It is necessary to single out from the available sample classes of products that are similar in quality characteristics. The "compactness hypothesis" is assumed to hold for the classes; in other words, the points in the feature space are grouped in distinct regions. The number of classes is unknown. The classification algorithm consists of the following sequence of steps.
1. Normalize and center the original sample.
The researcher sets boundary values for the parameter $\Delta$ so that it does not go beyond the permissible range $\Delta_{\min} < \Delta < \Delta_{\max}$. The parameter $\Delta$ is tuned as follows: several values of $\Delta$ are selected from the given range, a grouping is performed for each of them, and the number of classes is determined.
5. Classification is carried out for each element of the sample:
- take an arbitrary $k$-th element of the sample from the original sample;
- calculate the distance $r_{k,p}$ from this element to each of the centers of the primary classes, where $r_{k,p}$ is the Euclidean distance between the $k$-th and $p$-th elements of the sample.
After this procedure, if $k < s$, set $k = k + 1$ and return to the beginning of step 5. This process continues for all points of the original sample. As a result, the initial sample is divided into $N$ classes. Thus, the number of classes $N$ for each particular batch of products is determined at the end of the study.
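The overall flow (normalize and center, try several values of ∆, count the classes, then assign each element by Euclidean distance to the class centers) can be sketched as follows. This is not the algorithm of the paper: the intermediate steps are not reproduced in the text, so a simple threshold-based grouping is swapped in as a stand-in, and treating ∆ as a distance threshold is an assumption. Scaling to unit standard deviation, the synthetic data and all function names are likewise hypothetical.

```python
import numpy as np

def normalize_and_center(X):
    """Center each feature at zero and scale it to unit standard deviation."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0.0] = 1.0  # leave constant features unscaled
    return (X - X.mean(axis=0)) / std

def group_with_threshold(X, delta):
    """Stand-in grouping (NOT the paper's missing steps): a point opens a new
    primary class when it is farther than `delta` from every existing center."""
    centers = [X[0]]
    for x in X[1:]:
        if np.linalg.norm(np.array(centers) - x, axis=1).min() > delta:
            centers.append(x)
    return np.array(centers)

def classify(X, centers):
    """Assign every sample element to its nearest center (Euclidean distance)."""
    # r[k, p] is the distance between element k and center p.
    r = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return r.argmin(axis=1)

def scan_delta(X, deltas):
    """Return {delta: number of classes N} for each candidate delta."""
    Z = normalize_and_center(X)
    return {float(d): len(group_with_threshold(Z, d)) for d in deltas}

# Synthetic data: two tight groups of four points each (illustrative only).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1], [5.1, 5.1]])

counts = scan_delta(X, [0.01, 0.5, 1.0, 5.0])  # assumed range of delta values
Z = normalize_and_center(X)
labels = classify(Z, group_with_threshold(Z, 0.5))
print(counts)
print(labels)
```

In a scan of this kind, a number of classes that stays the same over a wide interval of ∆ is a natural candidate for N, while very small or very large values of ∆ split the sample into single points or merge it into one class.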

Processing Electronic Components Test Data
As an example, the clustering of electronic components (EC) is considered. One of the main tasks of the modern space industry is to equip spacecraft onboard equipment with high-reliability EC. First of all, it is necessary to prevent products that do not satisfy reliability requirements from entering the equipment. Solving this problem requires purchasing electronic components from verified suppliers, as well as carrying out incoming inspection, additional screening tests and destructive physical analysis of EC.
The following are the results of numerical calculations for EC diagnostics based on real data obtained by measuring the parameters of transistors. The data were provided by the test center of ISS-Reshetnev Company.
To diagnose the transistors, we classify all available observations in order to identify groups of transistors in the space of diagnostic indicators. It is required to group the 78 transistors into clusters according to 16 variables characterizing their quality.
According to the results of clustering by the proposed method, the optimal solution divides the whole sample into 2 classes corresponding to transistors of different quality. The table of cluster membership for each element shows that 35 transistors belong to the first cluster and 43 to the second. The average values of the variables for each cluster (the cluster centers) are also obtained. The products under consideration (transistors) belong to different manufacturing batches. The proposed algorithm shows rather high classification efficiency and can be applied in real-life tasks to improve the quality of EC diagnostics.
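The cluster sizes and per-cluster average values described above can be computed from a vector of cluster labels, as in the following minimal sketch. The function name `cluster_summary` and the small data set are hypothetical; the real 78 × 16 measurement data are not reproduced here.

```python
import numpy as np

def cluster_summary(X, labels):
    """Return {cluster: {"size": count, "center": mean of the variables}}."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    summary = {}
    for c in np.unique(labels):
        members = X[labels == c]
        summary[int(c)] = {"size": len(members), "center": members.mean(axis=0)}
    return summary

# Made-up measurements: five products described by two variables.
X = np.array([[1.0, 10.0], [3.0, 14.0], [10.0, 0.0], [12.0, 2.0], [14.0, 4.0]])
labels = np.array([0, 0, 1, 1, 1])

summary = cluster_summary(X, labels)
for cluster, info in summary.items():
    print(cluster, info["size"], info["center"])
```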

Conclusion
When analyzing real data, it is often necessary to group the data, which amounts to identifying clusters in the space of parameters characterizing the quality of the product. This article proposes an algorithm for solving the clustering problem that does not require knowledge of the number of classes. Based on the analysis of results on model and real data, conclusions can be drawn about the validity and quality of the classification carried out.