An object detection system using array planar capacitive sensors

This paper presents a target detection system based on an array of planar capacitive sensors and demonstrates its performance on gesture recognition. The system classifies gesture targets by collecting the capacitance values of the planar capacitive sensor array and training a BP neural network whose weights and thresholds are optimized by a genetic algorithm. In addition, the system performs visual detection of gesture targets, using transfer learning with the MobileNetV2 network in the Keras framework. A data fusion mechanism merges the visual recognition and the capacitive-array recognition into a final decision. Experimental results demonstrate that the target detection system exhibits outstanding performance in gesture recognition, with an average accuracy of 98.92%.


Introduction
As science and technology continue to advance, the demand for object detection systems keeps growing. Object detection systems are a core application in many fields and can save significant human and material resources. Furthermore, in environments that are hazardous to humans, the use of object detection systems is indispensable. They therefore have broad application prospects and have become one of the most active research areas [1].
This paper focuses on applying object detection systems to gesture recognition. In recent years, many researchers have studied this topic in depth; their work can be broadly divided into two types: tactile and non-tactile gesture recognition. Typical tactile approaches include data gloves [2] and EMG sensors [3,4], which are widely used in specific tactile gesture recognition fields. However, depending on the application field and scenario, some tactile systems may not achieve optimal results, which makes research on non-tactile gesture recognition significant. Non-tactile approaches mainly include depth cameras [5], miniature radars [6], Wi-Fi [7,8], ultrasound [9], and others. Their defining feature is the ability to recognize gestures without palm contact, which makes them more applicable in daily life scenarios.
In this paper, we design a gesture recognition system that combines an array-type capacitive sensor with a camera. The dataset collected by the capacitive sensor array is used to train and deploy a GA-BP neural network, while visual samples collected by the camera are recognized through transfer learning with the MobileNetV2 network. The two recognition results are then merged by a data fusion mechanism to produce the final decision.

System scheme design
The system mainly consists of an FPGA core circuit, a visual recognition circuit, an excitation signal circuit, an array switching circuit, an acquisition circuit, an LCD circuit, a Wi-Fi communication circuit, and a power supply circuit. The system design framework is shown in Figure 1. The core function of the system is to recognize specific targets. On the host FPGA core circuit, the target characteristic dataset is collected through the capacitance array, and the GA-BP neural network is then trained and deployed. On the slave visual recognition circuit, a dedicated dataset is created and the MobileNetV2 network is trained through transfer learning under the Keras framework to recognize the target. The slave sends the recognized target information to the host through serial communication; based on the host's data fusion judgment, the final recognition result is displayed on the LCD screen. At the same time, the system can drive the Wi-Fi communication circuit to send the final target information to the user's mobile phone.

Capacitor detection circuit design
In this section, the implementation principle of the proposed readout system is derived. For the circuit presented in Figure 2, a DDS frequency synthesis chip first generates the sinusoidal excitation Vi(t) = A·sin(2πft + φ), where A = 1 V, f = 500 kHz, and φ is set to zero at this stage. We first analyse the parasitic capacitances Cp1 and Cp2 of the capacitive sensor. From the equivalent connection of the parasitic capacitances, Cp1 is not connected to the measurement loop, while the influence of Cp2 is eliminated by the "virtual short" characteristic of the operational amplifier. The first stage is therefore a C/V conversion circuit whose output is V1(t) = -(Cx/Cref)·Vi(t), where Cx is the capacitance formed by the capacitive sensor and Cref is the feedback capacitor of the measurement circuit. The inputs of the subtractor formed by operational amplifier A3 are V1 and V2; with R6 = R7 = R8 = R9, it performs a simple subtraction, V3 = V1 - V2. Because this signal is still weak, it is amplified by the stage built around operational amplifier A4, giving V4 = G·V3, where G is the gain set by the feedback resistors of A4. The amplified signal is then smoothed by a second-order filter composed of A5, and a peak detection circuit converts it to a DC level for acquisition, so the final output Vout equals the peak amplitude of the filtered signal and varies linearly with the measured capacitance Cx.
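The signal chain described above can be sketched numerically. The following is a minimal simulation under assumed component values: only the excitation (A = 1 V, f = 500 kHz) follows the paper, while C_REF, GAIN, and the reference-path capacitance c_offset are illustrative assumptions.

```python
import numpy as np

# Assumed parameters; only the excitation (A = 1 V, f = 500 kHz) is from the paper.
A, F, FS = 1.0, 500e3, 50e6   # excitation amplitude (V), frequency (Hz), sample rate (Hz)
C_REF = 10e-12                # assumed feedback capacitance of the C/V stage
GAIN = 10.0                   # assumed gain of the A4 amplifier stage

def readout_peak(c_x, c_offset=2e-12):
    """Simulate the chain: C/V conversion -> subtraction of a reference
    path -> amplification -> ideal peak detection."""
    t = np.arange(0, 20 / F, 1 / FS)        # 20 excitation periods
    v_i = A * np.sin(2 * np.pi * F * t)     # excitation Vi(t)
    v1 = -(c_x / C_REF) * v_i               # C/V converter output V1
    v2 = -(c_offset / C_REF) * v_i          # assumed reference path V2
    v3 = v1 - v2                            # subtractor (R6 = R7 = R8 = R9)
    v4 = GAIN * v3                          # amplifier built around A4
    return np.max(np.abs(v4))               # ideal peak detector output
```

With these assumed values, readout_peak(3e-12) returns 1.0 V, since (3 pF - 2 pF)/10 pF × 10 × 1 V = 1 V, and the detected peak grows linearly with the sensed capacitance Cx.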

GA-BP neural network
A genetic algorithm is a method for searching for the optimal solution by simulating the natural evolution process. It is an efficient heuristic search and parallel random global optimization algorithm, widely used to optimize machine learning parameters and solve optimization problems. Because the BP neural network tends to fall into local optima, this study applies the genetic algorithm to optimize the weights and thresholds of the BP neural network, improving the accuracy of the model's predictions.
The steps of genetic algorithm optimization for the BP neural network can be divided into the following stages. First, the real-number strings representing the weights and thresholds of the BP neural network are encoded to generate the initial population. Second, the fitness value of each individual is calculated through a fitness function and compared. Third, individuals with higher fitness values are selected as parents for crossover. Fourth, some genes of the parent individuals are exchanged to generate new individuals. Fifth, the values of some genes at certain loci of the selected individuals are changed with a certain probability (mutation). Finally, the optimal weights and thresholds are obtained once the error limit or the maximum number of iterations is reached.
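The steps above can be sketched as a minimal GA in NumPy. The toy dataset, network sizes, and GA hyperparameters (population 40, 60 generations, 5% mutation rate) are illustrative assumptions, not the paper's actual values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dataset standing in for the capacitance samples (assumption).
X = rng.normal(size=(64, 4))
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)

N_IN, N_HID, N_OUT = 4, 6, 1
N_GENES = N_IN * N_HID + N_HID + N_HID * N_OUT + N_OUT  # weights + thresholds

def forward(genes, x):
    """Decode a real-number chromosome into BP-network weights/thresholds."""
    i = 0
    w1 = genes[i:i + N_IN * N_HID].reshape(N_IN, N_HID); i += N_IN * N_HID
    b1 = genes[i:i + N_HID]; i += N_HID
    w2 = genes[i:i + N_HID * N_OUT].reshape(N_HID, N_OUT); i += N_HID * N_OUT
    b2 = genes[i:]
    h = np.tanh(x @ w1 + b1)
    return 1 / (1 + np.exp(-(h @ w2 + b2)))   # sigmoid output

def fitness(genes):
    return 1.0 / (1.0 + np.mean((forward(genes, X) - y) ** 2))

pop = rng.normal(size=(40, N_GENES))           # step 1: initial population
for gen in range(60):
    fit = np.array([fitness(g) for g in pop])  # step 2: evaluate fitness
    order = np.argsort(fit)[::-1]
    parents = pop[order[:20]]                  # step 3: select the fittest
    children = []
    for _ in range(20):
        a, b = parents[rng.integers(20, size=2)]
        cut = rng.integers(1, N_GENES)         # step 4: single-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        mask = rng.random(N_GENES) < 0.05      # step 5: mutate some loci
        child[mask] += rng.normal(scale=0.3, size=mask.sum())
        children.append(child)
    pop = np.vstack([parents, children])       # step 6: next generation

best = max(pop, key=fitness)                   # GA-optimized weights/thresholds
```

In the full GA-BP scheme, `best` would then seed conventional back-propagation training rather than serve as the final model.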

MobileNetV2 network
In this paper, we utilized the MobileNetV2 network for transfer learning in gesture recognition. MobileNetV2 has been introduced in our previous research [10], so this section only gives a brief overview. MobileNetV2 inherits from MobileNet and uses ReLU6 as the activation function. Compared with ReLU, ReLU6 limits the maximum output value to 6. This is primarily to avoid the precision loss caused by representing large values with low-precision float16 on mobile or portable devices, which could degrade the extraction and description of classification features and thus the accuracy. For depth-wise separable convolutions, MobileNetV2 uses the Bottleneck structure, which consists of two regular convolutions and one depth-wise separable convolution. This structure first expands the dimensions through 1×1 convolutions, then extracts features using 3×3 depth-wise separable convolutions, and finally compresses the data using 1×1 convolutions. The two regular convolutions are activated with the ReLU6 and Linear functions, respectively. In the MobileNetV2 structure, the non-linear activation layer after the low-dimensional 1×1 convolution layer is removed; this is referred to as the linear bottleneck. Transforming high-dimensional information into low-dimensional information is akin to feature compression and may lose some information; applying the ReLU6 activation there would lose even more. Replacing ReLU6 with the Linear function therefore reduces the loss of feature information and helps maintain the model's accuracy.
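The activation pattern of the linear bottleneck can be illustrated with a minimal NumPy sketch. This is a 1-D stand-in for the real convolutions (actual MobileNetV2 operates on H×W×C tensors), and the weight shapes are illustrative:

```python
import numpy as np

def relu6(x):
    """ReLU6 caps activations at 6, staying safe in low-precision float16."""
    return np.minimum(np.maximum(x, 0.0), 6.0)

def bottleneck(x, w_expand, w_depth, w_project):
    """Sketch of the Bottleneck activation pattern on a flattened feature
    vector: 1x1 expand (ReLU6) -> depthwise step (ReLU6) -> 1x1 linear
    projection (no activation, i.e. the linear bottleneck)."""
    h = relu6(x @ w_expand)        # 1x1 expansion convolution + ReLU6
    h = relu6(h * w_depth)         # depthwise step: per-channel weights
    return h @ w_project           # linear bottleneck: NO activation here
```

Because the final projection is linear, the block's output can be negative, which a trailing ReLU6 would have clipped away, losing feature information.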

Sample collection and training
The experiment collected samples for nine different hand gestures and for inactivity separately. First, hands were placed in front of the array-type planar capacitive sensor, and the collection system sampled and recorded the capacitance values for the different hand gestures. Meanwhile, the camera collected samples of the same gestures; the sample collection data are shown in Figure 3. On the one hand, we classify the collected capacitance values, remove values with large errors, and then build the sample data for each gesture and for the capacitance array when there is no operation. Since the sampling is continuous, we select the better values, group them, and create labels, with each gesture corresponding to 1,000 sets of data. On the other hand, for the image samples, to improve the generalization ability of the model, we augment the collected samples with vertical flipping, rotation by 90 degrees, and multi-scale scaling. 1,000 samples were collected for each of the 10 categories, resulting in 10,000 images.
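The three image augmentations can be sketched as follows; the scale factors and the nearest-neighbour resizing are illustrative assumptions, not the paper's exact pipeline:

```python
import numpy as np

def augment(img):
    """Produce the augmented views described above for one HxWx3 sample:
    vertical flip, 90-degree rotation, and multi-scale (nearest-neighbour)
    scaling at assumed factors 0.5x and 2x."""
    out = [img,
           img[::-1],                          # vertical flip
           np.rot90(img)]                      # rotation by 90 degrees
    for s in (0.5, 2.0):                       # multi-scale scaling
        h, w = img.shape[:2]
        rows = (np.arange(int(h * s)) / s).astype(int)
        cols = (np.arange(int(w * s)) / s).astype(int)
        out.append(img[rows][:, cols])         # nearest-neighbour resize
    return out

rng = np.random.default_rng(0)
sample = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
augmented = augment(sample)   # original + 4 augmented views
```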

PC test experiment
Training the GA-BP neural network on the capacitance samples with PC-side software and predicting on the test set, we achieved a recognition accuracy of 94.5%; the confusion matrix is shown in Figure 4. For the MobileNetV2 transfer-learning model, after multiple parameter adjustments, camera recognition reaches an accuracy of 96.5% on the test set, with the confusion matrix shown in Figure 5.
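Confusion matrices like those in Figures 4 and 5 can be accumulated with a few lines; this is a generic sketch, not the paper's evaluation code:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are true labels, columns are predicted labels."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def accuracy(cm):
    """Overall accuracy = correct predictions (diagonal) / all predictions."""
    return np.trace(cm) / cm.sum()
```

The per-class rows of the matrix also reveal which gestures are most often confused with one another, which guided the parameter adjustments mentioned above.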

Embedded system test experiment
The trained models are deployed on embedded devices: the GA-BP neural network model on the main control FPGA and the MobileNetV2 network model on the subordinate OpenMV. A fusion decision mechanism then combines the judgments of the two sensors and outputs the final decision on the embedded device. The flow of the decision mechanism is shown in Figure 6. Let R1 and R2 denote the numerical values of the output labels of the capacitance array recognition and the camera recognition, obtained by converting the label strings into numbers, and let r1 and r2 denote the maximum recognition accuracies of the two recognizers. First, we check whether the two recognitions agree; if they do, that shared result is taken as the final result. If they disagree, we compare r1 and r2: if r1 is greater than r2, the final result is R1 from the capacitance array recognition; otherwise, it is R2 from the camera recognition. To verify the accuracy of the system's recognition, we recruited three volunteers. Each volunteer performed the designated gestures continuously at the system recognition position for 1,000 recognitions, totaling 10,000 recognitions, and the statistics were recorded. The identification test results are shown in Table 1. Based on the recorded results, the embedded system proposed in this paper achieves a high recognition accuracy of over 98.92%. Compared with the systems proposed by Wang et al. [2], Zhang et al. [7], and Wang et al. [9], the proposed system has notable advantages; in particular, it has a faster recognition rate. In terms of recognition accuracy, this system is higher than the systems proposed by Wang et al. [2] and Zhang et al. [7], but slightly lower than the one proposed by Wang et al. [9]. However, Wang et al. [9] used a dual-LSTM algorithm that significantly increases system resource consumption, which poses a considerable risk for embedded deployment.
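The decision flow described above reduces to a few lines; the label and accuracy values used in the usage note are illustrative:

```python
def fuse(label_cap, conf_cap, label_cam, conf_cam):
    """Fusion decision: if the capacitive array (R1, r1) and the camera
    (R2, r2) agree, keep the shared label; otherwise keep the label from
    the channel with the higher model accuracy."""
    if label_cap == label_cam:
        return label_cap                  # both channels agree
    return label_cap if conf_cap > conf_cam else label_cam
```

For example, with the accuracies measured on the PC (r1 = 0.945 for the capacitive array, r2 = 0.965 for the camera), a disagreement is resolved in favour of the camera's label.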

Conclusion
In this paper, we designed and deployed a system prototype for specific gesture recognition using an array-type multi-channel capacitive sensor and an OV7725 camera on an embedded system. The experimental results indicate that this method is feasible and that the system has promising engineering application prospects.

Figure 3. Sample image of gesture and blank collection.

Table 1. Recognition test results of different volunteers on the embedded device.

Comparison with the latest research findings
To validate the advancement and innovation of the gesture recognition system proposed in this paper, it was compared with the latest gesture recognition systems. The comparison with state-of-the-art systems and methods is shown in Table 2.

Table 2. Comparison results with state-of-the-art systems and methods.