The Contribution of Principal Component Analysis to Data Optimization by Reducing Product Data in Transactions

Principal component analysis transforms an observed data table into a new data table that preserves the same correlation structure, with the aim of simplifying previously complex observation data so that it is easier to process and analyze. The dataset used here is transaction data of the kind commonly processed by association methods in sales analysis; it consists of 1397 types of products sold across 1200 transactions. The data contain products with very small sales, whose percentages have very little effect on the subsequent process, namely sales analysis with the association method. The authors therefore optimize the data into a ready-to-use form by removing products whose small percentage values contribute little to the analysis. Principal component analysis is used to reduce the products, or to form products that can represent the entire dataset, without reducing the quality of the data for analysis. In the experiments on the transaction data, the number of products fell by 65.21%: the original 1397 products were reduced to 486 representative products without loss of value.


Introduction
Principal Component Analysis (PCA) is a technique that transforms an observed data table into a new data table with a similar correlation structure. It compresses the maximum amount of information into the first columns of the transformed matrix, known as the principal components, while neglecting the remaining vectors that carry negligible or redundant information [1]. The objective of principal component analysis is to simplify previously complex observation data so that it is easier to process and analyze. According to researchers, principal component analysis is a linear statistical technique that replaces a group of original variables with simpler, uncorrelated variables that can still represent the information in the original set [2]. PCA can also be combined with the least squares support vector machine (LSSVM), where PCA reduces the dimension of the input variable X and thereby the number of inputs to the LSSVM [2], and it has been applied to image representation by extending it to two-dimensional PCA [3]. Data reduction is needed to optimize data processing, and principal component analysis can reduce data from a large scale to a set that represents all of the data by forming a matrix, calculating the variance-covariance matrix, examining the resulting eigenvalues, and inspecting the scree plot. The covariance matrix acts as a weighting in model estimation to avoid correlation. The procedure is as follows:
1. The initial data are arranged in an a x b matrix; the number of variables b will later be reduced to c, the number of principal components retained.
2. In the pre-principal-component-analysis stage, the variance-covariance matrix is computed; obtaining it requires first computing the deviation matrix.
3. In the singular value decomposition stage, the principal component analysis processes the matrix of variance-covariance values.
The covariance matrix is related to a weighting that affects the error. The analysis in [4] shows that the error generated from the simulated data differs depending on the weighting; the error of the unweighted simulated data was better than the estimated error of the weighted data. In 2019, Nema Salem and Saher Hussein performed data reduction using principal component analysis on the iris dataset [7]. A previous study [8] modified the a priori algorithm for sales data analysis. Building on this work, the authors optimize the sales data used by association methods such as a priori by reducing the sales data during preprocessing with principal component analysis.
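The pre-analysis steps above (deviation matrix, then variance-covariance matrix) can be sketched as follows. This is a minimal illustration with a toy data matrix, not the paper's 1200 x 1397 transaction matrix; the values are assumed for demonstration only.

```python
import numpy as np

# Toy a x b matrix: rows are observations (transactions),
# columns are variables (products). Values are illustrative.
X = np.array([[2.0, 0.0, 1.0],
              [4.0, 2.0, 3.0],
              [6.0, 4.0, 5.0],
              [8.0, 2.0, 7.0]])

n = X.shape[0]              # number of observations
D = X - X.mean(axis=0)      # deviation matrix (column means removed)
C = (D.T @ D) / (n - 1)     # variance-covariance matrix

# Sanity check against NumPy's built-in covariance estimator.
assert np.allclose(C, np.cov(X, rowvar=False))
```

The same two lines scale directly to the full transaction matrix; only the size of `X` changes.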
There are 1200 transactions involving 1397 products, taken from the sales data of a supermarket. These raw data are not optimal for sales analysis unless the products are reduced first, because they include products whose sales are so small that they are considered to have little influence on the sales analysis for that period. A product sold only once a month is one example; such cases require principal component analysis to optimize the data into a ready-to-use form for the sales analysis generally performed with association methods.
The sales data contain 1397 types of products sold across 1200 transactions, as shown in Table 1.

Table 1. Sales Data
Below is the calculation in the singular value decomposition algorithm used by principal component analysis to analyze large-scale data matrices, in which an (m x n) matrix is transformed into an (m x k) matrix without reducing the variation in the data. As in [9], the dataset X has m rows representing variables and n columns representing observations, arranged as a matrix of m row vectors, each of length n. The initial stage is to initialize the matrix of product types sold in the transactions. From this matrix, the deviation score is obtained by first computing the transpose of the deviation matrix. Once the deviation matrix and the deviation score are available, the variance-covariance matrix is computed:

Variance Covariance = Deviation Score Matrix × (1 / (n − 1)) (2)

where n is the length of the deviation score (the number of observations). After the pre-principal-component-analysis stage is complete, the principal component analysis proper processes the variance-covariance matrix to obtain the singular value decomposition. The variance-covariance matrix is used as the initial matrix, denoted A, and the decomposition A = USVᵀ is obtained through the following steps:

Step 1. Form the matrix AᵀA.
Step 2. Find the eigenvalues of AᵀA, sort them in descending order of absolute value, and take their square roots to obtain the singular values of A.
Step 3. Place the singular values on the diagonal of the matrix S.
Step 4. Collect the corresponding eigenvectors of AᵀA as the columns of V.
Step 5. Calculate the matrix U, the matrix of left singular vectors, as U = AVS⁻¹, completing the full singular value decomposition A = USVᵀ.

The resulting values are then normalized. The matrix A above is the full result of the singular value decomposition within the principal component analysis, which is normalized so that it can be carried into the next process.
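The singular value decomposition steps above can be sketched as follows: the eigenvalues of AᵀA give the singular values, its eigenvectors form V, and U = AVS⁻¹ completes A = USVᵀ. The matrix A here is a small illustrative example, not the paper's variance-covariance matrix.

```python
import numpy as np

# Illustrative 3 x 2 matrix standing in for the covariance-derived matrix A.
A = np.array([[3.0, 1.0],
              [1.0, 3.0],
              [1.0, 1.0]])

# Steps 1-4: eigen-decomposition of AᵀA, sorted in descending order;
# square roots of the eigenvalues are the singular values of A.
eigvals, V = np.linalg.eigh(A.T @ A)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]

s = np.sqrt(eigvals)             # singular values
S = np.diag(s)

# Step 5: U = A V S⁻¹, then verify the full decomposition A = U S Vᵀ.
U = A @ V @ np.linalg.inv(S)
assert np.allclose(U @ S @ V.T, A)
```

For the full-sized matrix one would call `np.linalg.svd(A)` directly, which is numerically more stable than forming AᵀA explicitly; the eigen-decomposition route is shown only to mirror the steps in the text.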
The transformation of the data within the matrix is apparent at this stage.

Results and Analysis
The eigenvalues, variability, and cumulative values from the principal component analysis are shown in Table 2, and the result is interpreted in Table 3, which lists the components (F1 through F1190) together with their corresponding product codes. Table 4 shows the types of products in the transaction data across the 1200 transactions. These products are reduced by principal component analysis to obtain representative products for further processing; in this case, the representative product data are those with a high transaction rate, and principal component analysis plays the role of identifying them. The matrix displayed in Table 5 is the initial matrix of the principal component analysis, into which the normalized dataset is entered; it measures 1200 x 1397, matching the number of transactions and the number of items in the dataset. Its singular value decomposition is then computed as described in the previous chapter. Once the process has run, calculating the deviation matrix, the deviation score, the variance-covariance matrix, and finally the singular value decomposition, the final result matrix can be seen in Table 6. The results of the principal component analysis run by the system, following the steps described in the previous chapter, appear in Tables 6 and 7, where the singular value decomposition matrix is shown. From this information, we conclude that the principal component analysis has reduced the size of the matrix, which in this case corresponds to the products.
Moreover, the principal components formed comprise 486 products that can represent the data in the next stage of the research. The number of items has thus been trimmed by around 65.21% of the original data. The data reduction is shown in Figure 3.
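The reduction described above can be sketched as follows, using the 0.1 cutoff on normalized singular value decomposition scores stated in the conclusion. The product names and score values here are hypothetical placeholders, not entries from the actual dataset.

```python
import numpy as np

# Hypothetical normalized scores for five products; products whose score
# falls below the 0.1 threshold contribute too little and are dropped.
products = ["P1", "P2", "P3", "P4", "P5"]
scores = np.array([0.85, 0.02, 0.40, 0.00, 0.13])

threshold = 0.1
kept = [p for p, s in zip(products, scores) if s >= threshold]
reduction = 100 * (1 - len(kept) / len(products))

print("retained products:", kept)
print(f"reduction: {reduction:.2f}%")
```

Applied to the full dataset with its 1397 products, the same filtering yields the 486 representative products and the 65.21% reduction reported above.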

Conclusion
There was a cut, or reduction, of 65.21% in the attributes: the 1397 products were reduced to 486 that have a large influence on the sales analysis performed at a later stage. This happened because the frequency with which the removed products appeared across all transactions was very small, as indicated by a singular value decomposition value below 0.1 (0 after normalization). The data resulting from this reduction are ready for use in sales analysis with the association method.