Processing IMU action recognition based on brain-inspired computing with microfabricated MEMS resonators

Reservoir computing (RC) decomposes a recurrent neural network into a fixed network with recurrent connections and a trainable linear readout. Owing to its low training cost and ease of hardware implementation, it offers an effective way to process time-domain correlated information. In this paper, we build a hardware RC system based on a nonlinear MEMS resonator and construct an action recognition data set with time-domain correlation. Two standard benchmark data sets are used to verify the classification and prediction performance of the hardware RC system, and the feasibility of the new data set is validated with three general machine learning approaches. On this time-domain correlated data set, the hardware system achieves a relatively high experimental success rate of 91.04%. These results, together with the data set we built, enable broader implementation of brain-inspired computing with microfabricated devices and point toward the integration of perception and computation in our future work.


Introduction
Owing to significant improvements in processor computing power and the continuous optimization of software algorithms, machine learning (ML) has been widely used in artificial intelligence [1]. The contributions of ML are evident in computer vision, natural language processing, and action recognition [2][3][4]. To handle complex tasks, artificial neural networks, recurrent neural networks (RNNs), and convolutional neural networks have been successively proposed and iteratively improved. At the same time, the massive data generated and transmitted by the internet of things (IoT) impose new requirements on real-time information processing in edge computing devices built on new architectures [5,6]. With the continuous development of IoT and ubiquitous sensor applications, active training and testing on time-dependent information have become a key goal of edge computing [7].
Reservoir computing (RC) was originally designed to reduce the complexity of training RNNs [8][9][10]. Training an RNN requires complex algorithms to adjust every connection weight, which converges slowly and is computationally expensive [11,12]. RC avoids this problem by training only the weights of the linear output layer (usually called the readout layer), while the connection weights inside the reservoir layer stay fixed. Moreover, because the reservoir is not adaptively updated, it is well suited to hardware implementation with various nonlinear dynamical systems [9,13]. There are generally two ways to implement RC in hardware: (1) multiple nonlinear physical nodes form the reservoir layer; (2) a single nonlinear node with a single feedback loop forms the reservoir layer through time-domain multiplexing. Compared with the first approach, RC based on a time-delay nonlinear system greatly reduces the difficulty of hardware implementation, and it has been demonstrated in a variety of systems, such as spintronic [14][15][16], memristor [17][18][19][20][21], quantum [22][23][24], optoelectronic [25][26][27], and mechanical resonator [28][29][30] systems. RC is therefore one of the important approaches to realizing edge computing.
A hardware RC system requires two crucial properties: the ability to map inputs nonlinearly into a high-dimensional space, and a memory capacity that retains inputs from a few to tens of steps earlier. Owing to their pronounced Duffing nonlinearity and suitable decay characteristics, MEMS resonators are appropriate nonlinear nodes for a hardware reservoir [31,32]. Most of the prediction and classification tasks processed by hardware RC are based on general data sets, such as the parity benchmark [30], the nonlinear autoregressive moving average (NARMA) task [13,33,34], the Santa Fe laser task [26,35,36], Mackey-Glass time-series tasks [35], the nonlinear channel equalization benchmark [37,38], signal classification [18,26], isolated word recognition [13,18,30], video action recognition [34,39], and handwritten digit classification [20,40]. So far, however, no study has used a hardware RC system to process a human action recognition data set with time-dependent information. Human action recognition also plays a significant role in human-to-human interaction and interpersonal relations in modern society [41]. Toward the goal of integrating perception and calculation, that is, perceiving and processing the acceleration signals of an inertial measurement unit (IMU), we first build a time-delay nonlinear RC system based on a MEMS resonator, and verify its classification ability with the signal classification task and its prediction ability with the NARMA task. We then design and produce a novel IMU action recognition data set. Compared with traditional video-based action recognition data sets, this IMU data set conveys the information more directly and efficiently because it contains relatively little data. The results of processing this IMU human action recognition data set with the hardware system show that our hardware RC system is suitable for processing time-dependent dynamic signals.
This work lays the foundation for using MEMS resonators to sense and process IMU signals simultaneously, which, based on the conclusions of this paper, we envisage as the next step.
The structure of this paper is as follows. Section 2 describes the methods, including the fabrication process of the MEMS resonators, the hardware implementation of the RC structure, and the design of the IMU data set. Section 3 discusses in detail the simulation and experimental results of the hardware RC system on typical tasks. Section 4 concludes the paper with a brief summary and an outlook on possible future work.

Fabrication process of MEMS resonators
A crucial part of information processing in an RC system is the nonlinear mapping of time-dependent input data into a high-dimensional space [42]. Because information in a high-dimensional space is easier for the post-processing (readout) algorithm to separate, this nonlinear mapping directly determines the accuracy achieved on different data sets. Our hardware RC system implements this mapping with a single clamped-clamped silicon beam.
The clamped-clamped beam is manufactured in our fab using the silicon-on-insulator (SOI) process shown in figures 1(a)-(g). The process uses three masks in total and starts with a 4-inch SOI wafer (device layer thickness: 25 μm; buried oxide thickness: 1 μm; substrate thickness: 300 μm). After sequential pre-cleaning with acetone, alcohol, and deionized water, photoresist is patterned by lithography with the first mask. Cr (20 nm)/Au (300 nm) is then sputtered onto the patterned photoresist, and a lift-off process completes the electrode layer. Next, photoresist is patterned on the electrode layer by lithography with the second mask, and the device layer is etched by deep reactive ion etching (DRIE) using the patterned photoresist as a mask. For the subsequent backside process, a layer of photoresist is spin-coated to protect the front-side structure. The third mask is then used to pattern photoresist on the back of the SOI wafer by photolithography, and the substrate layer is etched by DRIE using this photoresist as a mask. The buried oxide layer is then removed by reactive ion etching (RIE), leaving the movable structure suspended. Finally, the front protective layer is removed by RIE and fuming nitric acid. An SEM image of the microfabricated structure is shown in figure 1(h). The beam thickness is 25 μm, and its in-plane length and width are 580 μm and 9 μm, respectively. Each electrode is 150 μm × 150 μm, which is convenient for wire bonding.

Hardware implementation of RC structure
RC systems based on MEMS resonators are manufactured with standard materials and processes, and their interface and feedback circuits are easily integrated, which makes them particularly attractive for hardware implementation. The time-delay nonlinear RC system consists of three main parts: the input layer, the reservoir layer, and the output layer. Both the input and output layers are realized with a National Instruments data acquisition (DAQ) card. In the input layer, the original input signal u(t) from each data set is multiplied by the mask signal m(t) and by the input gain β. The mask signal plays an important role in breaking the symmetry of the data and in keeping the nonlinear node in a transient state, so that diverse transient responses to the input data are obtained. Our mask signal consists of randomly selected values in the range (−1, 1) with zero mean. As illustrated in figure 2, the processed signal I = βu(t)m(t) is output by the digital-to-analog converter (DAC) of the DAQ card. The processed signal is then modulated onto the ac signal $V_{ac}\sin(2\pi f t)$ to drive the resonator. For visualization, the DAC output waveform in figure 2 shows part of the input signal of the NARMA10 data set (introduced later). The vertical axis represents the time span and the horizontal axis represents the actual magnitude of the processed signal.
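The masking and time-multiplexing step of the input layer can be sketched as follows. This is a minimal NumPy sketch, assuming a mask of N values drawn uniformly from (−1, 1) and a zero-order hold of a fixed number of DAC samples per virtual node; the function names and the `samples_per_node` parameter are hypothetical, and the subsequent modulation onto the ac carrier is not modeled.

```python
import numpy as np

def make_mask(n_nodes, seed=0):
    # Hypothetical mask: one value per virtual node, drawn uniformly from
    # [-1, 1), so its mean is zero in expectation.
    rng = np.random.default_rng(seed)
    return rng.uniform(-1.0, 1.0, size=n_nodes)

def mask_input(u, mask, beta=1.0):
    # I[k, i] = beta * u(k) * m(i): input sample u(k) is held for one delay
    # period tau = N * theta and scaled by the mask value of each theta slot.
    return beta * np.outer(np.asarray(u, dtype=float), mask)

def dac_waveform(masked, samples_per_node):
    # Serialize sample by sample and hold each value for samples_per_node
    # DAC samples (zero-order hold), giving a piecewise-constant sequence.
    return np.repeat(masked.ravel(), samples_per_node)

u = np.array([0.2, 0.5, 0.1])          # three input samples
m = make_mask(50)                      # N = 50 virtual nodes
I = mask_input(u, m, beta=1.0)
wave = dac_waveform(I, samples_per_node=4)
print(I.shape, wave.shape)             # (3, 50) (600,)
```

Each row of `I` corresponds to one delay period, so the serialized waveform is N times longer than the input sequence (times the hold length).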
The key component of the reservoir layer is the fabricated resonator. The nonlinear vibration of a single clamped-clamped beam under forced harmonic excitation is described by the Duffing equation
$$m\ddot{x}(t) + c\dot{x}(t) + k_1 x(t) + k_3 x^3(t) = F\cos(\omega t),$$
where m is the effective mass, c is the damping coefficient, $k_1$ is the linear spring constant, $k_3$ is the nonlinear spring constant, $F\cos(\omega t)$ is the harmonic electrostatic driving force, and x(t) is the vibration displacement. The value of $k_3$ determines the nonlinearity of the system. When the resonator is driven electrostatically, $k_3$ is determined by the nonlinear mechanical spring constant $k_{m3}$ and the electrostatic coefficient $k_{e3}$, namely $k_3 = k_{m3} + k_{e3}$. In this experiment, $k_3$ is positive and approximately equal to $3.8 \times 10^{12}$ N m$^{-3}$.
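To make the role of $k_3$ concrete, the sketch below integrates the Duffing equation with a fixed-step fourth-order Runge-Kutta scheme. It works in nondimensional units with illustrative values (m = 1, c = 0.1, $k_1$ = 1, F = 0.1, ω = 0.5), which are assumptions and not the device parameters; the function names are hypothetical. With $k_3 = 0$ the steady-state amplitude should match the linear result $F/\sqrt{(k_1 - m\omega^2)^2 + (c\omega)^2} \approx 0.133$, and a positive (hardening) $k_3$ stiffens the beam so that, below resonance, the amplitude drops.

```python
import numpy as np

def duffing_rhs(t, state, m, c, k1, k3, F, omega):
    # m*x'' + c*x' + k1*x + k3*x**3 = F*cos(omega*t), written as a
    # first-order system in (x, v).
    x, v = state
    a = (F * np.cos(omega * t) - c * v - k1 * x - k3 * x**3) / m
    return np.array([v, a])

def simulate(params, t_end, dt):
    # Fixed-step fourth-order Runge-Kutta integration starting from rest.
    n_steps = int(round(t_end / dt))
    state = np.zeros(2)
    xs = np.empty(n_steps + 1)
    xs[0] = state[0]
    t = 0.0
    for i in range(n_steps):
        s1 = duffing_rhs(t, state, **params)
        s2 = duffing_rhs(t + dt / 2, state + dt / 2 * s1, **params)
        s3 = duffing_rhs(t + dt / 2, state + dt / 2 * s2, **params)
        s4 = duffing_rhs(t + dt, state + dt * s3, **params)
        state = state + dt / 6 * (s1 + 2 * s2 + 2 * s3 + s4)
        t += dt
        xs[i + 1] = state[0]
    return xs

def steady_amplitude(xs, n_tail):
    # Peak |x| over the last n_tail samples, after the transient has decayed.
    return np.max(np.abs(xs[-n_tail:]))

# Illustrative nondimensional parameters (not the device values).
base = dict(m=1.0, c=0.1, k1=1.0, F=0.1, omega=0.5)
amp_linear = steady_amplitude(simulate({**base, "k3": 0.0}, 300.0, 0.01), 5000)
amp_hard = steady_amplitude(simulate({**base, "k3": 10.0}, 300.0, 0.01), 5000)

# Linear theory: F / sqrt((k1 - m*omega^2)^2 + (c*omega)^2) ~ 0.133
print(round(amp_linear, 3), amp_hard < amp_linear)   # 0.133 True
```

Sweeping `omega` up and down through resonance with this integrator (carrying the final state over between frequencies) would show the bent, hysteretic response typical of a hardening Duffing resonator.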
The natural frequency measured in the open-loop test is $f_n$ = 252 000 Hz. As shown in figure 1(h), the clamped-clamped beam has four electrodes: one driving electrode, one detection electrode, and two electrodes on both sides of the beam. When a dc bias voltage is applied to the beam and a harmonic driving signal is applied to the driving electrode, the nonlinear output signal can be observed at the detection electrode.
The clamped-clamped resonator is driven electrostatically into its nonlinear regime. The drive signal output by the DAQ card is applied to the metal driving electrode, and the bias voltage $V_{dc}$ is applied directly to the beam. The resonator output is also detected electrostatically and read out by an external interface circuit consisting of a trans-impedance amplifier, an in-phase amplifier, and a bandpass filter. The resulting voltage signal is then fed into an envelope detector (ENV) for demodulation, which recovers the amplitude information carried by the envelope. Part of the ENV output signal is depicted in figure 2: the blue line is the output of the external interface circuit and the orange line is the output of the ENV. All circuit modules are integrated on the green circuit board shown in figure 2, which measures 11.3 cm × 5.55 cm.
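The envelope detector in the interface circuit is an analog module. As a digital stand-in for checking what "recovering the envelope" means, the sketch below takes the magnitude of the analytic signal computed with an FFT-based Hilbert transform. The sampling rate, modulation frequency, and test signal are assumptions chosen so that the record holds whole numbers of carrier and modulation periods, which makes the FFT-based construction exact; `analytic_envelope` is a hypothetical name.

```python
import numpy as np

def analytic_envelope(x):
    # Envelope as |analytic signal|, using the FFT-based Hilbert transform:
    # keep DC (and Nyquist), double positive frequencies, zero negative ones.
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * h))

# Hypothetical test signal: a 253 kHz carrier whose amplitude varies slowly,
# sampled at 10 MHz for 2 ms (whole numbers of carrier and modulation periods).
fs = 10e6
t = np.arange(20000) / fs
amplitude = 1.0 + 0.5 * np.sin(2 * np.pi * 1e3 * t)
x = amplitude * np.cos(2 * np.pi * 253e3 * t)
env = analytic_envelope(x)
print(np.max(np.abs(env - amplitude)) < 1e-6)   # True
```

A hardware envelope detector (rectifier plus low-pass filter) approximates the same quantity with some ripple and lag; the analytic-signal form is only convenient for offline checks.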
After the ENV, the output signal is captured by the analog-to-digital converter of the DAQ card for post-processing and, at the same time, enters the delay feedback loop. An STM32F407 microcontroller with a 168 MHz core clock implements the signal delay in the digital domain. The delay time is τ = Nθ, where N is the number of virtual nodes of the RC system and θ is the time interval between adjacent virtual nodes. As shown in figure 2, the blue line is the input signal of the STM32F407 microcontroller and the orange line is its output after a delay of τ = 20 ms. The output of the microcontroller is multiplied by the feedback gain α, implemented with a voltage divider circuit, and the result is added to the input signal from the DAQ card and injected into the resonator, closing the delay feedback loop. The signal processed by this RC system is captured and stored by the DAQ card, and the readout is then trained and tested with ridge regression:
$$W = \left(X^{\mathsf{T}}X + \lambda\mathbb{I}\right)^{-1}X^{\mathsf{T}}y,$$
where X is the input matrix, W is the weight matrix to be trained, y is the target value, λ is the regularization parameter that prevents overfitting, and $\mathbb{I}$ is the identity matrix.
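The delay loop and the ridge readout can be sketched in discrete time. This is not a model of the MEMS resonator: the sketch assumes the common simplification in which a static nonlinearity (`np.tanh`, a stand-in) is followed by a first-order low-pass with a time constant equal to the node response time, so neighboring virtual nodes are coupled by a factor exp(−θ/T); θ/T = 0.2 is taken from the relation $T_d = 5\theta$ used later. The function names and the toy recall task are hypothetical.

```python
import numpy as np

def run_delay_reservoir(masked, alpha, theta_over_t, nonlinearity=np.tanh):
    # Discrete-time model of a single-node delay reservoir (an assumption,
    # not the resonator physics): node i at step k is driven by its own
    # state one delay tau = N*theta earlier (feedback gain alpha) plus the
    # masked input, and the node's finite response time couples each
    # virtual node to its predecessor through a first-order low-pass.
    n_steps, n_nodes = masked.shape
    states = np.zeros((n_steps, n_nodes))
    delayed = np.zeros(n_nodes)          # states from the previous delay period
    last = 0.0                           # most recent node output
    g = np.exp(-theta_over_t)            # low-pass factor per theta interval
    for k in range(n_steps):
        for i in range(n_nodes):
            drive = nonlinearity(alpha * delayed[i] + masked[k, i])
            last = g * last + (1.0 - g) * drive
            states[k, i] = last
        delayed = states[k].copy()
    return states

def ridge_fit(X, y, lam):
    # W = (X^T X + lam * Id)^(-1) X^T y, solved without an explicit inverse.
    A = X.T @ X + lam * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(1)
u = rng.uniform(0.0, 0.5, size=300)
mask = rng.uniform(-1.0, 1.0, size=50)
X = run_delay_reservoir(np.outer(u, mask), alpha=0.4, theta_over_t=0.2)
X = np.hstack([X, np.ones((len(u), 1))])   # bias column for the readout
target = np.roll(u, 1)                       # toy task: recall u(k-1)
W = ridge_fit(X[1:], target[1:], lam=1e-6)
prediction = X @ W
```

Solving the regularized normal equations with `np.linalg.solve` gives the same W as the closed-form expression while avoiding an explicit matrix inverse.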

The design of the IMU action recognition data set
Most classification data sets commonly used in research, such as the MNIST image recognition data set and the KTH video recognition data set, do not contain genuine time-domain information. The pixels of an MNIST image carry no actual temporal order. Studies using this data set [20,40] simply arrange the pixels in order and feed them in sequence, which does not create an actual time-domain correlation between pixels. Although the KTH video data set [34,39] is used to identify time-related image frames, the pixels within each frame are also time-independent. We therefore need a data set that is closely tied to temporal information for processing with MEMS resonators.
On this basis, we designed and produced a novel action recognition data set named IMU action recognition. Processing this data set with the MEMS-resonator RC system is a fundamental step toward integrating perception and calculation. The data set contains six actions: 'circling', 'stepping', 'walking', 'running', 'going upstairs', and 'going downstairs'. As shown in figures 3(a) and (b), four identical IMU sensors made by Wit-Motion are used to capture human movements. Each sensor, measuring 36.1 mm × 42 mm, is fixed on one of the experimenter's two wrists and two ankles to collect data. To ensure consistency, every recording is made in the same background environment. Eight male experimenters with different body shapes participated in recording the data set, and each performed each action ten times, giving a total of 8 × 6 × 10 = 480 recordings. For each action, the three-axis acceleration signal of each IMU sensor is recorded at a sampling rate of 20 Hz. Each action is recorded for 5 s, so each axis contributes 100 data points. Each recording therefore consists of the three-axis acceleration signals of four IMU sensors, i.e. 12 signals that together represent one action. All signals are taken directly from the IMU sensors, without any processing, to form the data set. Figure 3(c) shows the raw acceleration signals of the four IMU sensors in the time domain. Each panel represents an action to be classified and consists of four subplots, one per IMU sensor, each with three time-series curves for the three axes. The abscissa indicates the sample index and the ordinate represents the signal amplitude.
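The counts above fix the shape of the data set. The following sketch lays it out as a NumPy array; the axis order (subject, action, trial, channel, time) and the placeholder zeros are illustrative assumptions, not the published file format.

```python
import numpy as np

N_SUBJECTS, N_ACTIONS, N_TRIALS = 8, 6, 10
N_SENSORS, N_AXES = 4, 3              # wrists and ankles; x, y, z acceleration
FS_HZ, DURATION_S = 20, 5

n_recordings = N_SUBJECTS * N_ACTIONS * N_TRIALS   # 480
n_channels = N_SENSORS * N_AXES                     # 12
n_samples = FS_HZ * DURATION_S                      # 100

# Placeholder array with the layout described above; the axis order
# (subject, action, trial, channel, time) is an illustrative assumption.
raw = np.zeros((N_SUBJECTS, N_ACTIONS, N_TRIALS, n_channels, n_samples))
recordings = raw.reshape(n_recordings, n_channels, n_samples)

# Action label of each flattened recording (trial index varies fastest).
labels = np.tile(np.repeat(np.arange(N_ACTIONS), N_TRIALS), N_SUBJECTS)
print(recordings.shape, np.bincount(labels))   # (480, 12, 100) [80 80 80 80 80 80]
```

Flattening each recording gives 12 × 100 = 1200 values, which is the input size a conventional classifier would see without feature extraction.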
To verify the versatility and feasibility of this action recognition data set, we test it with three classification algorithms commonly used in ML: linear regression, logistic regression, and the multi-layer perceptron (MLP). For all three methods, 432 recordings are used for training and the remaining 48 for testing (the partitioning of the data set is described in detail in section 3). Linear regression is the most basic ML method; it uses regression equations (functions) to model the relationship between one or more independent variables (features) and a dependent variable (target value). Linear regression achieves an accuracy of 53.75%. Logistic regression adds a nonlinear sigmoid function on top of linear regression, which improves classification accuracy to some extent; its accuracy rises to 72.5%. An MLP is a highly parallel information processing system with strong adaptive learning ability. As a nonlinear system, it can handle complex multi-input, multi-output classification tasks, and this relatively complex algorithm reaches an accuracy of up to 76.46% on the IMU data set. Overall, the three ML algorithms yield different results on the IMU data set, and accuracy increases with algorithm complexity. This indicates that the IMU action recognition data set is reasonably designed and suitable for comparing methods.
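The two simplest baselines can be sketched in plain NumPy: linear regression used as a classifier (least squares onto one-hot targets, then the largest output wins) and multinomial logistic (softmax) regression trained by gradient descent. This is a sketch under stated assumptions, not the authors' setup: the data are synthetic Gaussian clusters standing in for the flattened IMU features, and the learning rate, iteration count, and function names are hypothetical. The MLP is omitted.

```python
import numpy as np

def one_hot(y, n_classes):
    return np.eye(n_classes)[y]

def add_bias(X):
    return np.hstack([X, np.ones((X.shape[0], 1))])

def linear_regression_classifier(X_tr, y_tr, X_te, n_classes):
    # Least-squares fit to one-hot targets; predict the largest output.
    W, *_ = np.linalg.lstsq(add_bias(X_tr), one_hot(y_tr, n_classes), rcond=None)
    return np.argmax(add_bias(X_te) @ W, axis=1)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)     # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def logistic_regression_classifier(X_tr, y_tr, X_te, n_classes, lr=0.2, n_iter=500):
    # Multinomial logistic (softmax) regression trained by gradient descent
    # on the mean cross-entropy loss.
    A = add_bias(X_tr)
    Y = one_hot(y_tr, n_classes)
    W = np.zeros((A.shape[1], n_classes))
    for _ in range(n_iter):
        W -= lr * A.T @ (softmax(A @ W) - Y) / len(A)
    return np.argmax(add_bias(X_te) @ W, axis=1)

# Synthetic stand-in for the flattened IMU features (real data: 480 x 1200).
rng = np.random.default_rng(0)
centers = 4.0 * rng.standard_normal((6, 20))
y = np.repeat(np.arange(6), 80)
X = centers[y] + rng.standard_normal((480, 20))
idx = rng.permutation(480)
tr, te = idx[:432], idx[432:]                 # 432 train / 48 test
mu, sd = X[tr].mean(axis=0), X[tr].std(axis=0)
X = (X - mu) / sd                             # standardize with training stats
acc_lin = np.mean(linear_regression_classifier(X[tr], y[tr], X[te], 6) == y[te])
acc_log = np.mean(logistic_regression_classifier(X[tr], y[tr], X[te], 6) == y[te])
print(acc_lin, acc_log)
```

On well-separated synthetic clusters both baselines score near 100%; the gap between them reported above only appears on harder, real data.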

Results and discussion
Having demonstrated the feasibility of the IMU action recognition data set, we now need to verify the usability of our hardware RC system based on a MEMS resonator. A hardware RC system requires two basic capabilities: classification and prediction. Two general data sets are used to illustrate these capabilities. Classification performance is investigated experimentally with the signal classification task, and prediction performance with the NARMA prediction task. Finally, we use the validated hardware RC system to process the IMU action recognition data set.
In the experiment, the MEMS resonator is operated in a probe station evacuated by a mechanical pump. Under this condition, the measured decay time of the resonator is $T_d \approx 2$ ms. Previous studies [36,42] have shown that the system performs best when the decay time $T_d$ and the minimum time scale θ satisfy $T_d = 5\theta$, which gives θ = 0.4 ms.
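The timing follows directly from these numbers. The short calculation below derives θ from $T_d = 5\theta$ and the delay τ = Nθ for the node counts used later (N = 50 and N = 200). It additionally assumes, as in standard time-delay RC, that each input sample occupies one full delay period, so processing time per input channel grows linearly with N; the helper name is hypothetical.

```python
T_D_S = 2e-3              # measured decay time T_d of about 2 ms
THETA_S = T_D_S / 5       # node spacing from T_d = 5 * theta -> 0.4 ms

def delay_time_s(n_nodes, theta_s=THETA_S):
    # tau = N * theta
    return n_nodes * theta_s

for n in (50, 200):
    tau = delay_time_s(n)
    # Assumption: each input sample occupies one full delay period, so a
    # 100-sample IMU channel needs 100 * tau of resonator time.
    print(f"N = {n}: tau = {tau * 1e3:.1f} ms, 100 samples take {100 * tau:.1f} s")
```

For N = 50 this reproduces the 20 ms delay quoted for the signal classification task; at N = 200 one 100-sample channel would take about 8 s, which illustrates why larger N costs computing time.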

Signal classification task
The signal classification task is relatively simple because its input data are linearly separable, meaning that a linear system can classify them with high accuracy. The input signal x(t) is a random sequence of sine and square waves, each period consisting of ten discrete points. The target y(t) is 1 for sine waves and 0 for square waves.
Figure 6. NMSE for the NARMA prediction task in the feedback gain-AC drive voltage plane with the hardware RC system. The color indicates the NMSE, the x-axis represents the feedback strength, and the y-axis represents the AC drive voltage. As the order m increases, the accuracy of the hardware RC system gradually decreases.
Because the nonlinear RC system classifies this data set with an accuracy of up to 100%, the normalized mean square error
$$\mathrm{NMSE} = \frac{1}{L}\sum_{n=1}^{L}\frac{\left(y(n) - \hat{y}(n)\right)^2}{\operatorname{var}(y)}$$
is used to distinguish the performance of different systems, where L is the number of samples, y(n) is the target, and $\hat{y}(n)$ is the system output. The bifurcation point frequency modulation (BPFM) method, discussed in detail in [32], is used to find the operating point of the system and to determine the order in which parameters are adjusted. In this experiment, the resonator drive frequency is $f_d$ = 253 000 Hz, the dc bias voltage is $V_{dc}$ = 10 V, and the ac drive voltage is $V_{ac}$ = 1.4 V. Because this data set is relatively simple, sufficiently high accuracy can be achieved without a large number of virtual nodes.
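The task data and the metric can be sketched together. The generator below draws a random sequence of sine and square periods of ten points each, with target 1 during sine periods and 0 during square periods; the square-wave amplitude (±1), its 50% duty cycle, the seed, and the function names are assumptions. The NMSE uses the population variance, since the normalization is not stated.

```python
import numpy as np

def make_sine_square(n_periods, points_per_period=10, seed=0):
    # Random sequence of sine and square periods (ten points each);
    # target is 1 during sine periods and 0 during square periods.
    rng = np.random.default_rng(seed)
    k = np.arange(points_per_period)
    sine = np.sin(2 * np.pi * k / points_per_period)
    square = np.where(k < points_per_period // 2, 1.0, -1.0)
    is_sine = rng.integers(0, 2, size=n_periods).astype(bool)
    x = np.concatenate([sine if s else square for s in is_sine])
    y = np.repeat(is_sine.astype(float), points_per_period)
    return x, y

def nmse(y, y_hat):
    # NMSE = mean((y - y_hat)^2) / var(y), with the population variance.
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    return np.mean((y - y_hat) ** 2) / np.var(y)

x, y = make_sine_square(200)
print(len(x), nmse(y, y), nmse(y, np.full_like(y, y.mean())))
```

The two reference values frame the scale: a perfect output gives NMSE = 0, and always predicting the target mean gives NMSE = 1, so the reported experimental value of $2.9 \times 10^{-2}$ is far closer to the former.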
Here, we choose N = 50, so the delay time of the feedback loop is τ = Nθ = 20 ms. The input gain is β = 1 and the feedback gain is α = 0.01. As illustrated in figure 4, the red dotted line represents the target output y(t). The simulation results (blue dots) are very close to the target, with NMSE = $1.3 \times 10^{-4}$, while the experimental results (orange dots) fluctuate around the target, with NMSE = $2.9 \times 10^{-2}$. For comparison, a good experimental result of NMSE = $1.5 \times 10^{-3}$ was reported for an optoelectronic RC system [27], and a different metric, SER, of 0.02 was reported in [43] with N = 56 virtual nodes.
It can be seen that this MEMS hardware RC system can handle classification tasks.

NARMA task
The NARMA task is one of the most commonly used benchmarks for measuring the predictive performance of RC systems. Owing to its relatively complex dynamics and controllable memory length, it has been widely studied. In this prediction task, the RC system is trained to predict the output of a complex system, namely an m-th order NARMA system driven by white noise. The prediction becomes more difficult as m increases, because a larger m requires a higher memory capacity of the RC system. The m-th order NARMA task is given by the recursive formula
$$y(n+1) = 0.3\,y(n) + 0.05\,y(n)\sum_{i=0}^{m-1} y(n-i) + 1.5\,u(n-m+1)\,u(n) + 0.1,$$
where y(n) is the target value of the task, m is the order, and u(n) is the white-noise input drawn from a uniform distribution over the interval (0, 0.5). The parameters in this experiment are $V_{dc}$ = 10 V, N = 50, and $f_d$ = 253 000 Hz. The first 1000 samples are used to train the RC system and the following 1000 samples are used for testing. Performance is evaluated with the NMSE. Figure 5 shows the results of the hardware RC system for m = 1, 2, 5, and 10. The simulation and experimental results agree relatively well for m = 1 and m = 2; as the prediction becomes more difficult, the experimental results gradually deviate from the simulation. For m = 10, the simulation gives NMSE = 0.115 and the experiment gives NMSE = 0.169. The experimental result is similar to values obtained with optoelectronic RC systems using the same number of virtual nodes: for instance, an NMSE of 0.168 is reported in [27] with N = 50, and 0.19 in [44]. Figure 6 shows the NMSE as a function of the delay feedback gain and the ac drive voltage in the hardware RC system, with figures 6(a)-(d) showing color maps for increasing order from m = 1 to m = 10. In the dark blue regions, the reservoir system gives reasonable results.
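A target sequence for this task can be generated directly from the recursion. The sketch below is a minimal NumPy version, assuming the sequence starts from y = 0 with no washout period discarded; the function name and seed are hypothetical.

```python
import numpy as np

def narma(u, m):
    # y(n+1) = 0.3 y(n) + 0.05 y(n) * sum_{i=0}^{m-1} y(n-i)
    #          + 1.5 u(n-m+1) u(n) + 0.1, starting from y = 0.
    u = np.asarray(u, dtype=float)
    y = np.zeros_like(u)
    for n in range(m - 1, len(u) - 1):
        y[n + 1] = (0.3 * y[n]
                    + 0.05 * y[n] * np.sum(y[n - m + 1:n + 1])
                    + 1.5 * u[n - m + 1] * u[n]
                    + 0.1)
    return y

rng = np.random.default_rng(0)
u = rng.uniform(0.0, 0.5, size=2000)   # white-noise input on (0, 0.5)
y10 = narma(u, 10)                     # NARMA10 target sequence
u_train, y_train = u[:1000], y10[:1000]
u_test, y_test = u[1000:], y10[1000:]
```

The term $u(n-m+1)\,u(n)$ is what makes the task memory-demanding: predicting y(n + 1) requires the input from m − 1 steps earlier, so larger m needs longer reservoir memory.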
A large region with NMSE < 0.02 is obtained for the NARMA1 task. As the order increases, the blue region gradually shrinks, meaning that the more memory a task demands, the more difficult it is to tune the parameters. In addition, better results are obtained when the ac voltage is around 1.4 V. This experiment shows that the MEMS hardware RC system can handle prediction tasks.

IMU action recognition task
Having shown that the hardware RC system with a MEMS resonator has basic classification and prediction capabilities, we now apply it to the IMU data set, whose feasibility was verified above. When classification tasks are processed, software algorithms are generally used to preprocess the data in advance, which is often referred to as feature extraction. Examples include the standard Lyon cochlear model used for the TI-46 isolated word classification task and the histogram of oriented gradients used for the KTH video classification task. To support the design concept of integrating perception and computing, the number of software steps applied to the raw information must be reduced. We therefore do not use feature extraction when processing the IMU data set.
For classification, we train six linear classifiers, one for each action in the data set, each by ridge regression. A classifier's target output is 1 when the input belongs to its action and 0 otherwise. In the testing phase, the output of each classifier is averaged over time, and a winner-takes-all rule assigns the input to the action whose classifier has the highest average output. Performance is evaluated with the success rate (SR), the fraction of test recordings classified correctly. As mentioned above, the action recognition data set contains 480 recordings in total. We randomly divide them into 10 groups of 48 recordings, each group containing all six actions of all eight experimenters. Performance is then estimated with ten-fold cross-validation to reduce fluctuations in the results caused by the random selection of training and test sets: in each fold, one group is used for testing and the remaining nine for training.
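The bookkeeping of this procedure can be sketched as follows. The fold construction encodes one reading of the split (each group holds one randomly chosen trial of every experimenter-action pair, giving 8 × 6 = 48 recordings); the function names are hypothetical, and the readout outputs would come from the six ridge classifiers trained on the reservoir states.

```python
import numpy as np

def make_folds(n_subjects=8, n_actions=6, n_trials=10, seed=0):
    # One interpretation of the split: each fold gets one randomly chosen
    # trial of every (subject, action) pair, i.e. 8 * 6 = 48 recordings.
    rng = np.random.default_rng(seed)
    folds = [[] for _ in range(n_trials)]
    for s in range(n_subjects):
        for a in range(n_actions):
            for fold, trial in enumerate(rng.permutation(n_trials)):
                folds[fold].append((s, a, int(trial)))
    return folds

def one_vs_all_targets(label, n_steps, n_classes=6):
    # Target matrix for one recording: 1 in its own action's column at every
    # time step, 0 in the other five columns.
    targets = np.zeros((n_steps, n_classes))
    targets[:, label] = 1.0
    return targets

def winner_takes_all(outputs):
    # outputs: (n_steps, n_classes) readout outputs for one test recording.
    # Average each classifier over time and pick the largest mean.
    return int(np.argmax(outputs.mean(axis=0)))

def success_rate(predicted, true):
    return float(np.mean(np.asarray(predicted) == np.asarray(true)))

folds = make_folds()
print([len(f) for f in folds])   # [48, 48, 48, 48, 48, 48, 48, 48, 48, 48]
```

In fold k, the six readouts are trained on the recordings of the other nine folds using `one_vs_all_targets`, each test recording is labeled with `winner_takes_all`, and the reported SR is the mean of `success_rate` over the ten folds.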
In this task, the parameters are $f_d$ = 252 850 Hz, α = 0.4, β = 1, $V_{dc}$ = 10 V, and $V_{ac}$ = 1.4 V. The choice of the number of virtual nodes N is crucial for such complex classification tasks, so we first examine its influence on the SR. In the simulation, N is scanned from 50 to 450. As shown in figure 7, the SR improves greatly as N increases while N ≤ 200. For N > 200, the SR only fluctuates within a certain range and no longer increases, which means that increasing N has reached its limit; values of N beyond 450 are therefore not worth trying. The best simulated SR is 92.3% at N = 250. Because the amount of data to be processed grows significantly with N, which affects the computing time and efficiency of the whole hardware RC system, we use N ≤ 200 in the actual experiments to preserve overall system performance. As shown in figure 8, the best experimental SR is 91.04% at N = 200. The confusion matrix shows the results for each action more clearly. The SR is relatively high for four actions: 'circling', 'stepping', 'walking', and 'running'. 'Going upstairs' and 'going downstairs' are relatively difficult to distinguish, and the RC system occasionally classifies 'going upstairs' as 'going downstairs'.

Conclusions
We designed and built a novel IMU action recognition data set to verify that our hardware RC system can handle time-domain correlated classification tasks. First, we introduced the implementation of the hardware RC system based on a MEMS resonator. Next, we described in detail how the IMU data set was collected and used three basic ML classification algorithms to demonstrate its feasibility. We then used a classification task and a prediction task, namely the signal classification task and the NARMA task, to verify the performance of the hardware system. Finally, the RC system successfully processed the IMU action recognition data set with an experimental SR of 91.04%. These results demonstrate the potential of hardware RC systems based on MEMS resonators for time-domain correlated tasks, and they pave the way toward sensing and processing information simultaneously, in other words, the integration of perception and calculation.