Application analysis of machine learning in fault diagnosis: A bibliometric analysis

This article analyzes the application of machine learning in fault diagnosis using bibliometrics, co-citation network analysis, and cluster analysis. The analysis found that, in research on the application of machine learning in fault diagnosis, China has the largest number of published papers and the most citations, followed by the United States and India. IEEE ACCESS is the journal that has published the most papers, and MECHANICAL SYSTEMS AND SIGNAL PROCESSING is the most cited journal. Using document co-citation analysis, a set of key documents in this field was identified. Currently, the important fault diagnosis algorithms include the support vector data description method, transfer learning, convolutional neural networks, nature-inspired optimization algorithms, Bayesian networks, wavelet packet decomposition, fuzzy logic, and so on. These algorithms were identified by keyword clustering, and their application fields include acoustic emission and fault diagnosis of doubly-fed induction generators.


Introduction
The earliest definition of a fault is the deviation of observed variables or calculated parameters from the acceptable range in a process [1][2]. Fault diagnosis refers to the process of finding the fault in a piece of equipment or a system. Fault diagnosis developed out of the need to establish monitoring systems [3], and it has gone through three stages [3][4]. The first stage is fault diagnosis based on the experience of experts and simple instruments. The second stage is fault diagnosis based on signal analysis and modeling, supported by sensing and detection technology. The third stage is intelligent fault diagnosis based on machine learning: with the development of computing, artificial intelligence, and information processing technology, fault diagnosis technology has entered a new stage of development [4][5][6][7][8].
This article uses bibliometrics, co-citation network analysis, and cluster analysis to examine the overall status of the application of machine learning in fault diagnosis. Combined with visual analysis, it tries to identify the hot issues that attract the most attention and the research fields with the greatest development potential, in order to guide further in-depth research.

Data sources
Based on the Web of Science Core Collection and the Incites database, this article uses a topic search with the search strategy TS=("machine learning" and "fault diagnosis"). The publication time of the literature is up to 2019, regardless of language, and the document types are article, review, and proceedings paper. In total, 752 academic papers were retrieved.

ICAMLDS 2020, Journal of Physics: Conference Series 1629 (2020) 012020, IOP Publishing, doi:10.1088/1742-6596/1629/1/012020

Analysis methods and tools
This article uses bibliometric analysis methods to examine the 752 retrieved papers from multiple perspectives, such as the country in which each paper was published, the influence of the journals, the co-citation of the literature, and the distribution of subject areas.
The co-citation network of the documents is analyzed and displayed with the CiteSpace visualization software, which reveals the important and key documents in this research field. Keyword clustering is then used to understand and analyze the specific research content and the main research fields.

Bibliometric analysis
Basic overview of national or regional papers
Analysis of the 752 retrieved papers with the Incites data analysis platform shows that, in research on the application of machine learning in fault diagnosis, China has published the most papers, followed by the United States and India. China is also the most frequently cited country. In terms of institutions, China has the largest number of institutions among the 100 most cited, with 31 institutions, but it has fewer institutions than the United States among the 10 most cited and the 50 most cited. After China and the United States, India has the most institutions among the 100 most cited, but it still lags far behind both countries.

Analysis of high level journals
An analysis of the number of publications and the citation frequency of the source journals of the 752 papers shows that five journals rank in the top ten for both publication volume and citation frequency, and all five are in the Q1 quartile of the JCR database. Among them, the journal with the largest publication volume is IEEE ACCESS, and the journal with the highest citation frequency is MECHANICAL SYSTEMS AND SIGNAL PROCESSING. IEEE TRANSACTIONS ON INDUSTRIAL ELECTRONICS has the highest impact factor.

Co-citation analysis of literature
Two articles are co-cited when they appear together in the reference list of a third, citing article; the two articles then form a co-citation relationship. Usually, two or more papers in a co-citation relationship are similar in research methods or research topics, so a group of key papers in a given topic field can be identified through co-citation analysis of the literature. Analyzing the co-citations of the 752 papers with CiteSpace yields figure 2-2. In the figure, the larger the node, the higher the citation frequency, and a link between two nodes indicates a co-citation relationship between the two documents. The nodes in red represent citation bursts, that is, documents whose citation counts rise or fall suddenly, and they indicate possible changes in the research direction of this field. The three documents with the highest citation frequency are shown in Table 2-3. Their citation frequency is increasing year by year, so they may become important and key achievements in this field.
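The co-citation relationship defined above can be illustrated with a short sketch. It assumes that each citing paper is available as a list of cited-reference identifiers; the paper names and references below are hypothetical, and this is not CiteSpace's own implementation, only a minimal count of how often each pair of references appears together.

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count, for every pair of references, how many citing papers cite both."""
    counts = Counter()
    for references in reference_lists:
        # Deduplicate and sort so each pair has one canonical ordering.
        for pair in combinations(sorted(set(references)), 2):
            counts[pair] += 1
    return counts


# Hypothetical citing papers and their reference lists (illustration only).
citing_papers = {
    "paper_1": ["A", "B", "C"],
    "paper_2": ["A", "B"],
    "paper_3": ["B", "C", "D"],
}

counts = cocitation_counts(citing_papers.values())
for pair, weight in sorted(counts.items()):
    print(pair, weight)
```

Each pair with a nonzero count corresponds to a link in the co-citation network, and the count can serve as the link weight.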

Analysis of discipline field
According to the discipline classification of Web of Science, an analysis of the 752 papers related to the application of machine learning in fault diagnosis shows that the papers are mainly distributed in the fields of engineering and computer science. More specifically, they are cited most frequently in subfields such as artificial intelligence, automation and control systems, information systems, mechanical engineering, interdisciplinary applications, and instruments & instrumentation. Among them, the engineering field appears most frequently, and the artificial intelligence field has the highest betweenness centrality.
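Betweenness centrality measures how often a node lies on the shortest paths between other pairs of nodes, so a field can have the highest betweenness without appearing most often. The following is a minimal sketch using Brandes' algorithm on an unweighted, undirected graph. The small discipline graph is hypothetical and is not derived from the 752 papers; the scores are unnormalized, whereas tools such as CiteSpace may report normalized values.

```python
from collections import deque


def betweenness_centrality(adjacency):
    """Unnormalized betweenness for an unweighted, undirected graph (Brandes)."""
    centrality = {node: 0.0 for node in adjacency}
    for source in adjacency:
        visit_order = []
        predecessors = {node: [] for node in adjacency}
        num_paths = {node: 0 for node in adjacency}
        distance = {node: -1 for node in adjacency}
        num_paths[source] = 1
        distance[source] = 0
        queue = deque([source])
        # Breadth-first search: count shortest paths from the source.
        while queue:
            v = queue.popleft()
            visit_order.append(v)
            for w in adjacency[v]:
                if distance[w] < 0:
                    distance[w] = distance[v] + 1
                    queue.append(w)
                if distance[w] == distance[v] + 1:
                    num_paths[w] += num_paths[v]
                    predecessors[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        dependency = {node: 0.0 for node in adjacency}
        while visit_order:
            w = visit_order.pop()
            for v in predecessors[w]:
                dependency[v] += num_paths[v] / num_paths[w] * (1 + dependency[w])
            if w != source:
                centrality[w] += dependency[w]
    # In an undirected graph every pair is counted from both endpoints.
    return {node: score / 2 for node, score in centrality.items()}


# Hypothetical discipline graph (illustration only).
adjacency = {
    "mechanical engineering": {"engineering"},
    "engineering": {"mechanical engineering", "artificial intelligence"},
    "artificial intelligence": {"engineering", "automation", "information systems"},
    "automation": {"artificial intelligence"},
    "information systems": {"artificial intelligence"},
}

scores = betweenness_centrality(adjacency)
for node, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(node, score)
```

In this toy graph, "artificial intelligence" bridges the most pairs of other nodes, which is the sense in which a field can act as a hub between disciplines.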

Cluster analysis
Keywords are an important reflection of the research content of an article. By analyzing keywords, the research directions and research hotspots in this field can be roughly understood. The keywords were therefore clustered, and the clustering results were evaluated with two indexes: the modularity of the clustering (Q value) and the mean silhouette value of the clusters (S value). It is generally believed that Q>0.3 means that the cluster structure is significant, S>0.5 means that the clustering is reasonable, and S>0.7 means that the clustering is convincing. Cluster analysis of the keywords of the 752 articles produced 9 clusters. Further analysis of the clusters divides the main research content into two categories: one is research on machine learning algorithms, and the other is the specific application of machine learning to different types of faults.
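To make the Q value concrete, the sketch below computes Newman's modularity for a partition of a small undirected graph, using Q = sum over clusters of (L_c / m) - (d_c / 2m)^2, where m is the number of edges, L_c is the number of edges inside cluster c, and d_c is the total degree of its nodes. The keyword graph and cluster labels are hypothetical, and the way CiteSpace builds and weights its network may differ from this simplified, unweighted version.

```python
from collections import defaultdict


def modularity(edges, cluster_of):
    """Newman modularity Q of a partition of an unweighted, undirected graph."""
    m = len(edges)
    internal_edges = defaultdict(int)  # L_c: edges with both ends in cluster c
    degree_sum = defaultdict(int)      # d_c: total degree of nodes in cluster c
    for u, v in edges:
        degree_sum[cluster_of[u]] += 1
        degree_sum[cluster_of[v]] += 1
        if cluster_of[u] == cluster_of[v]:
            internal_edges[cluster_of[u]] += 1
    return sum(
        internal_edges[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Hypothetical keyword co-occurrence graph: two triangles joined by one edge.
edges = [
    ("k1", "k2"), ("k2", "k3"), ("k1", "k3"),
    ("k4", "k5"), ("k5", "k6"), ("k4", "k6"),
    ("k3", "k4"),
]
cluster_of = {"k1": 0, "k2": 0, "k3": 0, "k4": 1, "k5": 1, "k6": 1}

q_value = modularity(edges, cluster_of)
print(round(q_value, 4))
print("significant structure" if q_value > 0.3 else "weak structure")
```

For this toy partition Q equals 5/14, which is above the 0.3 threshold quoted above, so the two-cluster structure would count as significant.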

Algorithm research in the field of fault diagnosis.
The keyword clustering table shows that much of the research concerns fault diagnosis algorithms. Cluster 0 (support vector data description method), cluster 1 (transfer learning algorithm), cluster 3 (convolutional neural network algorithm), cluster 4 (nature-inspired optimization algorithm), cluster 6 (Bayesian network), cluster 7 (wavelet packet decomposition algorithm), and cluster 8 (fuzzy logic algorithm) are all related to algorithms.

Application research in the field of fault diagnosis.
In practical applications, the research on fault diagnosis mainly involves acoustic emission and doubly-fed induction generators.
Cluster 2: The application of artificial neural networks in acoustic emission. Acoustic emission is a common physical phenomenon. Acoustic emission technology is a dynamic non-destructive testing method that judges the degree of internal damage of a structure based on the stress waves emitted from the material or structure. Some progress has been made in the use of artificial neural networks for acoustic emission signal processing. Because an artificial neural network is self-organizing, self-adaptive, and self-learning, and because the network is highly robust, it can solve the problem of noise interference in acoustic emission detection and can accurately judge the activity of an acoustic emission source [9].
Cluster 5: Fault detection of doubly-fed induction generators. The doubly-fed induction generator (DFIG) is mainly used in wind power generation systems, where it is the main type of generator. With the development of wind power generation, research on condition monitoring and fault diagnosis of wind turbines is becoming more and more important. At present, early intelligent fault diagnosis based on multi-sensor information fusion and the wavelet analysis method are important methods for fault detection of doubly-fed induction generators [10].

Conclusions
Machine learning is one of the most intelligent and cutting-edge research fields of artificial intelligence. The core of intelligent fault diagnosis is to effectively acquire, transmit, process, regenerate, and utilize diagnostic information, so that the system can accurately identify and predict the status of the diagnostic object in a given environment [11]. At present, in fault diagnosis research, China is not only the country with the most published papers but also the most frequently cited country. Based on the analysis of document co-citation relationships, a group of key documents in this field was identified; the papers published by Jia F, LeCun Y, and Janssens O have received the most citations. According to the keyword clustering analysis, the important fault diagnosis algorithms include the support vector data description method, transfer learning, convolutional neural networks, nature-inspired optimization algorithms, Bayesian networks, wavelet packet decomposition, fuzzy logic, and so on. Their application fields include acoustic emission and fault diagnosis of doubly-fed induction generators.
With the development of artificial intelligence technology and the rising level of automation and intelligence of mechanical equipment in China, machine learning algorithms will play a greater role in improving power systems, mechanical equipment, and their performance.