Local feature expansion Vision Transformer model for bearing fault diagnosis under noise environments

Published 28 March 2024 © 2024 IOP Publishing Ltd and Sissa Medialab
Citation Xinliang Zhang et al 2024 JINST 19 P03018 DOI 10.1088/1748-0221/19/03/P03018

Abstract

Vision Transformer (ViT) shows potential in bearing fault diagnosis because its multi-head self-attention mechanism and parallel feature extraction network efficiently produce a robust and complete feature representation of the fault. However, its adaptation to noise interference relies on a sufficiently large number of training samples to learn the local features of the fault, and its performance may degrade when only a limited number of samples are available for model training. To address this challenge, an improved ViT diagnosis model based on local feature expansion, LFE-ViT, is proposed. An auxiliary feature extraction block built on a local feature expansion network is introduced and works in parallel with the ViT encoder. By enlarging the receptive field, multi-scale local features in a high-dimensional space can be obtained from limited samples. The extracted local features are then passed to the ViT encoder through a feature embedding channel. Finally, by using the multi-head self-attention mechanism to capture global time-sequence information, a fault diagnosis model that combines local and global feature information is obtained. Experimental validation on the bearing fault dataset from Case Western Reserve University shows that LFE-ViT achieves satisfactory diagnosis performance with limited samples in noisy environments.
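The data flow described in the abstract (multi-scale local features from an enlarged receptive field, followed by self-attention over the time sequence) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say how the receptive field is enlarged, so dilated 1D convolution is assumed here as one common choice. The attention step is a single head with identity query/key/value projections, again an assumption made to keep the example short. All function names are hypothetical.

```python
import math


def receptive_field(kernel_size, dilation):
    """Number of input samples spanned by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation):
    """Zero-padded, same-length 1D correlation with `dilation` spacing between taps.

    Assumption: dilation stands in for the paper's receptive-field enlargement.
    """
    half = (len(kernel) // 2) * dilation  # padding needed on each side
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - half + k * dilation
            if 0 <= idx < len(signal):  # samples outside the signal count as zero
                acc += w * signal[idx]
        out.append(acc)
    return out


def multi_scale_local_features(signal, kernel, dilations):
    """Run one branch per dilation and stack them into per-time-step tokens."""
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    return [list(step) for step in zip(*branches)]


def self_attention(tokens):
    """Single-head scaled dot-product attention with identity projections.

    Assumption: a real ViT encoder uses learned projections and several heads.
    """
    dim = len(tokens[0])
    mixed = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        peak = max(scores)  # subtract the maximum for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        mixed.append(
            [sum(w / total * v[j] for w, v in zip(weights, tokens)) for j in range(dim)]
        )
    return mixed


# Toy vibration-like signal; the values are illustrative only.
signal = [math.sin(0.3 * t) for t in range(32)]
kernel = [1 / 3, 1 / 3, 1 / 3]
tokens = multi_scale_local_features(signal, kernel, dilations=(1, 2, 4))
fused = self_attention(tokens)
```

With a kernel of size 3, the three branches span 3, 5 and 9 input samples, so each token carries local information at three scales before the attention step mixes information across all time steps.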
