Local feature expansion Vision Transformer model for bearing fault diagnosis under noise environments

Xinliang Zhang; Hongbin Xie; Yitian Zhou; Lijie Jia

doi:10.1088/1748-0221/19/03/P03018

Journal of Instrumentation

Xinliang Zhang¹, Hongbin Xie¹, Yitian Zhou² and Lijie Jia¹

Published 28 March 2024 • © 2024 IOP Publishing Ltd and Sissa Medialab
Journal of Instrumentation, Volume 19, March 2024
Citation: Xinliang Zhang et al 2024 JINST 19 P03018
DOI: 10.1088/1748-0221/19/03/P03018

Author e-mails

zxldq@hpu.edu.cn

Author affiliations

¹ School of Electrical Engineering and Automation, Henan International Joint Laboratory of Direct Drive and Control of Intelligent Equipment, Henan Polytechnic University, Jiaozuo 454003, China

² Zhoushan Yangwangnaxin Technology Co. Ltd, Zhoushan Zhejiang 3161041, China

ORCID iDs

Xinliang Zhang https://orcid.org/0000-0003-0467-8946

Dates

Received 10 October 2023
Accepted 5 February 2024
Published 28 March 2024

Abstract

The Vision Transformer (ViT) shows potential in bearing fault diagnosis because its multi-head self-attention mechanism and parallel feature extraction network can efficiently build a robust and complete feature representation of the fault. However, its adaptation to noise interference relies on a sufficiently large number of training samples to learn the local features of the fault, and its performance may degrade when only a limited number of samples are available for training. To address this challenge, an improved ViT diagnosis model based on local feature expansion, LFE-ViT, is proposed. An auxiliary feature extraction block, built on a local feature expansion network, is introduced and works in parallel with the ViT encoder. By enlarging the receptive field, it makes multi-scale local features in a high-dimensional space available from the limited samples. The extracted local features are then passed to the ViT encoder through a feature embedding channel. Finally, the multi-head self-attention mechanism captures the global information of the time sequence, yielding a fault diagnosis model that combines local and global feature information. Experimental validation on the bearing fault dataset from Case Western Reserve University shows that LFE-ViT delivers satisfactory diagnosis performance with limited samples in noisy environments.
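The idea of enlarging the receptive field to obtain multi-scale local features can be illustrated with a small sketch. The abstract does not say how the local feature expansion network is built, so the dilated 1-D convolutions below are an assumed stand-in, and the toy signal, kernel, and dilation rates are hypothetical values chosen only for illustration. This is not the authors' implementation.

```python
# Illustrative sketch only. Dilated 1-D convolutions are an ASSUMED stand-in
# for the local feature expansion network; the abstract does not specify it.


def receptive_field(kernel_size, dilation):
    """Number of input samples spanned by one dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1-D dilated cross-correlation of signal with kernel."""
    span = receptive_field(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        # Kernel taps are spaced `dilation` samples apart.
        out.append(sum(w * signal[start + j * dilation]
                       for j, w in enumerate(kernel)))
    return out


def multi_scale_features(signal, kernel, dilations):
    """One value per dilation rate at each time step.

    Branches are aligned by window start and cropped to the shortest branch.
    """
    branches = [dilated_conv1d(signal, kernel, d) for d in dilations]
    common = min(len(b) for b in branches)
    return [[b[t] for b in branches] for t in range(common)]


signal = [float(i % 8) for i in range(32)]  # hypothetical toy vibration segment
kernel = [1 / 3, 1 / 3, 1 / 3]              # hypothetical averaging kernel
dilations = [1, 2, 4]                       # hypothetical dilation rates

features = multi_scale_features(signal, kernel, dilations)
fields = [receptive_field(len(kernel), d) for d in dilations]
```

With a three-tap kernel, the three dilation rates span 3, 5, and 9 input samples respectively, so each time step carries local features at three scales without adding kernel weights. In a model of the kind the abstract describes, vectors like these would be embedded and passed to the ViT encoder, where self-attention adds the global context.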
