HGSSA-bi LSTM: A Secure Multimodal Biometric Sensing Using Optimized Bi-Directional Long Short-Term Memory with Self-Attention

Biometric sensing technology has become a frequent element of everyday life as a result of the global demand for information security and safety legislation. In recent years, multimodal biometrics has become increasingly popular owing to its ability to overcome the shortcomings of unimodal biometric systems. This paper presents an HGSSA-Bi LSTM (bi-directional long short-term memory) model for multimodal biometric recognition. A pre-processing stage first removes unwanted noise using an extended cascaded filter (ECF), which combines a median filter and a Wiener filter. A CNN model then extracts features from the processed images. After feature extraction, the features are fused with the aid of discriminant correlation analysis (DCA). Finally, recognition is performed using the novel optimized hunger game search self-attention based Bi-LSTM model (HGSSA-Bi LSTM). The results obtained with the developed model are compared with previous approaches such as the CNN, RNN, DNN, and autoencoder models; the calculated performance is 98.5% accuracy, 98% precision, 97.5% F1-score, 98.5% sensitivity, and 99% specificity.

Biometric sensing is the term used to describe the process of verifying a person's identity using physical or behavioral traits. 1 The secure recognition of physical or behavioral traits is currently a difficult and important problem for both the scientific and business worlds. Faces, hands, irises, feet, ears, fingers, teeth, signatures, veins, retinas, voices, gaits, typing styles, scents, and DNA have all been used as biometric features. 2 Person verification based on biometric characteristics has received increasing consideration in the design of security systems. 3 The performance requirements of real-world systems cannot be satisfied by a single biometric characteristic. The majority of biometric systems have a high false rejection rate (FRR) and are neither user-friendly nor trusted by users. Although biometric traits are challenging to forge and difficult to acquire without the user's consent, the majority of these modalities are vulnerable to spoof attacks. 4,5 One biometric trait is used by unimodal biometric systems to identify individuals. 6,7 These systems are far from ideal, with a number of issues including excessive noise, lack of universality, inadequate discernment, 8 and oversensitivity to adversarial behavior. 9 To get beyond the drawbacks of utilizing only one biometric feature to identify people, multimodal biometric systems combine several modalities. A multimodal biometric system exceeds the performance of a unimodal biometric system. The unimodal biometric system is unsuitable for all applications because it uses a single biometric feature that is prone to noise, poor capture, and other intrinsic issues. The increased complexity and cost of multimodal biometric systems, as well as the inconvenience of using multiple biometric modalities, are significant downsides.
10,11 However, unimodal biometric systems can present their own challenges due to the intra-class variations and inter-class similarities that can lead to inaccurate identification of genuine users and imposters.
Therefore, accurate person identification with a simple approach is becoming increasingly important for our society's many security concerns. 12 Compared to other available traits, iris and fingerprint biometrics are simpler, more accurate, and more dependable. 13 These characteristics make their fusion an especially viable answer to current authentication issues. Additionally, combining fingerprint and iris biometrics is more accurate than combining either with face biometrics. 14 Although iris biometrics possess more features, stability, and attack resistance than fingerprint biometrics, traditional fusion methods continue to assign the same weight in fusion to each individual biometric. This is why even their best error rates are not perfect.
As each biometric has a distinct set of inherent defects, combining and fusing several biometrics is the best way to improve the performance of a biometric system. 15 The extraction of distinguishing characteristics is often a crucial step in biometric systems, following the gathering and storage of biometric data using capture sensors. 10 Deep learning (DL) algorithms have recently surpassed earlier feature extraction methods. DL algorithms build neural networks with hierarchical layers. 16 In a variety of fields, including pattern recognition, computer vision, robotics, and medical image segmentation, deep learning methods have enhanced recognition performance. In biometric systems, transfer learning and DL techniques based on convolutional neural networks (CNNs) have been used. 17 The rest of this paper is structured as follows: recent work on biometrics is analyzed in the Literature Review section. The HGSSA-bi LSTM model is explained in detail in the Proposed Model section. The implementation results of the proposed method are discussed in the Results and Discussion section. Finally, the paper is concluded in the Conclusions section.

Literature Review
As augmented and virtual reality (AR/VR) technologies advance, many enterprises are gathering large amounts of biometric information. Although these data are very valuable, they also raise privacy issues. A common biometric technique is ECG-based identity recognition (EIR), which identifies individuals from electrocardiograms (ECG). An ECG record is a time-continuous internal biological characteristic of a person. Thus, EIR could be less susceptible to attack than conventional biometric techniques such as facial recognition. Sun et al. 18 presented a personalized autoencoder (PerAE), an EIR system based on autoencoders. PerAE maintains a registered Attention-MemAE autoencoder model for each user, improving the autoencoder with attention and memory module methods. Other users' heartbeats are classified as anomalies by a user's Attention-MemAE. By employing a customized autoencoder, PerAE may decrease memory overhead and increase time efficiency, and it enhances the EIR system's capacity for adaptation, scaling, and maintenance. The experimental results indicate that only five minutes of a user's ECG data, approximately 500 heartbeat samples, are needed to train an Attention-MemAE with a recognition accuracy of 90%.
Kuzu et al. 19 presented a deep learning-based vein biometric verification method. Here, a unique method is utilized that combines an autoencoder that has undergone unsupervised training with a CNN that has undergone supervised training. In this case, backbone CNNs are deployed on top of a new densely connected convolutional autoencoder. This design is intended to improve the discriminative power of the characteristics derived from hand vein patterns. Experiments on dorsal, palm, and finger veins show that the suggested strategy enhances recognition rates compared to using CNNs alone for feature extraction. The outcomes outperform the present state of the art in vein biometric verification.
In order to identify and repair ECG heartbeat outliers, Karpinski et al. 20 presented a unique approach based on autoencoder neural networks. Heartbeats with major waveform aberrations are frequently regarded as abnormal and excluded. This results in poorer statistics, even though such heartbeats might be important in biometric applications. The approach enhances the number of legitimate heartbeats by correcting corrupted segments with effective self-learning techniques. The best autoencoder design for detecting ECG outliers was determined after a series of trials. To validate the proposed method, the free and open-source PhysioNet ECG-ID database was used. Although autoencoders had a greater error rate, they were considerably simpler to include in an identification pipeline since their hyperparameters do not need to be precisely tuned.
Hou et al. 21 introduced a new approach for finger-vein authentication that combines a convolutional autoencoder (CAE) and a support vector machine (SVM). The CAE used finger-vein images to generate feature codes, which were then used by the SVM to classify the finger veins. The CAE comprises three components: (1) an encoder for obtaining a high-level feature representation from the raw pixels of the finger-vein images; (2) an encoder layer for extracting the feature codes; and (3) a decoder for reconstructing the images from the high-level feature code. In this technique, SVMs were used as a competent classifier for the characteristics retrieved from the CAE. According to the findings of the study, the proposed deep learning-based solution surpassed previous methods for learning features without prior knowledge, suggesting significant promise for finger-vein authentication. Saponara et al. 22 presented fingerprint images that were reconstructed using a CNN autoencoder. An autoencoder is a method that can reproduce the data in the images, and CNNs are useful for feature extraction. Four fingerprint image datasets, drawn from the 2004 Fingerprint Verification Competition, were used to show the proposed method's robustness by calculating the cumulative match characteristics (CMC) between the original and reconstructed features. The CNN autoencoder identified people from the four fingerprint datasets (Datasets I-IV) with encouraging recognition rates of 98.1%, 97%, 95.9%, and 95.02%, respectively. The proposed architecture was examined and contrasted with current cutting-edge techniques, and the experimental findings demonstrate its suitability for reconstructing the complicated context of fingerprint images.
An innovative deep learning-based architecture was presented by Shahreza et al. 23 to protect biometric templates and increase identification effectiveness. A deep convolutional autoencoder structures the feature space, and from the features gathered at the autoencoder bottleneck layer, secure templates are made using the BioHashing approach. The test results show that, in a typical scenario, protected templates supplied by the system outperform raw feature templates protected by BioHashing. When the BioHashing key is stolen, the model outperforms BioHashing of the raw features obtained by earlier recognition techniques by a wide margin. The proposed method also applies to other vascular biometric modalities. Notably, an open-source framework allows other academics to validate the results.

Proposed Model
Multimodal biometrics refers to the process through which personal identification systems identify individuals using a variety of biometric characteristics. Multimodal authentication provides a better level of authentication compared to unimodal biometrics, which uses only one kind of biometric data, such as a fingerprint, face, palm print, or iris. As illustrated in Fig. 1, this work is completed in four stages: pre-processing, feature extraction, feature fusion, and recognition. In the beginning, pre-processing is done to lessen image noise and improve image quality. This initial method, called the ECF, combines the median filter and the Wiener filter. Secondly, in feature extraction, discriminative features are extracted from the biometric images with the aid of a CNN. In the DCA analysis, the features extracted from the fingerprint, palm print, and iris images are fused. The final stage is recognition, at which the HGSSA-bi LSTM model performs the multimodal biometric recognition. An optimization technique is introduced at this stage in order to reduce the loss function and tune the hyperparameters.
Image pre-processing.-The first step in the proposed biometric classification is pre-processing the image. A variety of image enhancement or filtering techniques are applied to remove noise from the images and sharpen them to improve quality, aesthetic appeal, and accuracy. Internet users may download images in a range of resolutions and quality levels. The ECF filter cascades the median filter (MF) and the Wiener filter (WF). The goal of this filtering approach is to remove salt-and-pepper noise from the images. The MF is particularly effective at reducing noise while maintaining an image's crisp edges. The WF is capable of eliminating additive noise and preserves edges more effectively than other filters. In order to eliminate extra noise, these filters are cascaded during the pre-processing stage.
To remove the salt-and-pepper noise, the collected images are first sent to the MF model. The filter scans the image pixel by pixel using a sliding window to remove noise. Whenever noise is found at a pixel position, the median value of the window is calculated and substituted for that pixel. When this procedure is complete, the images are then sent to the WF method to preserve the edges and lower the residual noise. The following steps are involved in applying the ECF approach to de-noise the input images.
• Step 1: A 2-D 3×3 sliding window is initially selected in order to obtain the pixel values of an image.
• Step 2: Each pixel is assessed for noise. One pixel value (χ_ab) is first selected in order to locate the noisy pixels among the other pixel values. The chosen pixel value is compared against the pixel boundaries (i.e., if χ_ab ≤ 0 or χ_ab ≥ 255, the pixel is deemed noisy).
• Step 3: Any pixel value in the image that fails to satisfy either of the conditions in Step 2 is labelled as noiseless and is not processed.
• Step 4: For each noisy pixel found in Step 2, this step evaluates the median value within the sliding window and replaces the noisy pixel with it.
• Step 5: The sliding window is moved to the next image pixel.
• Step 6: Steps 2 to 5 are repeated until every image pixel has been processed. The image is then fed to the WF to remove the residual noise left after edge grooming. In terms of effectiveness and quality, the WF excels at a number of tasks including denoising and smoothing, and it performs much faster than bilateral filters. Equation 1 expresses the WF process.
For a particular pixel location (a, b), the WF output is given by

K̂(a, b) = φ + ((ρ² − σ_n²)/ρ²) · (K(a, b) − φ),   [1]

where K is the input image, the local variance ρ² and the local mean φ are estimated from the A×B neighbourhood η of each pixel, and σ_n² is the variance of the noise (when it is unknown, the average of the local variances can be used as an estimate).
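As a hedged illustration of the ECF pipeline above (not the paper's released code), the cascade of a 3×3 median filter followed by an adaptive Wiener filter can be sketched with SciPy's `medfilt2d` and `wiener`; the image and noise pattern below are illustrative placeholders:

```python
import numpy as np
from scipy.signal import medfilt2d, wiener

def ecf_denoise(image, window=3):
    """Cascade: median filter for impulse noise, then adaptive Wiener filter."""
    img = image.astype(np.float64)
    med = medfilt2d(img, kernel_size=window)     # stage 1: median filter (MF)
    return wiener(med, mysize=(window, window))  # stage 2: Wiener filter (WF)

# Toy example: a horizontal gradient image corrupted with salt-and-pepper noise.
rng = np.random.default_rng(0)
clean = np.tile(np.arange(32.0) * 4.0, (32, 1))
noisy = clean.copy()
idx = rng.choice(clean.size, size=40, replace=False)
noisy.flat[idx[:20]] = 0.0     # "pepper" pixels
noisy.flat[idx[20:]] = 255.0   # "salt" pixels

denoised = ecf_denoise(noisy)
```

Away from the image borders, the isolated impulses are removed by the median stage, and the Wiener stage smooths what remains while leaving the underlying gradient intact.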
Feature extraction.-The feature extraction of the proposed method is performed by a CNN. CNNs are structured neural networks with several layers, most of which perform the same fundamental functions: convolution, pooling, and classification. The first functional modern CNN was dubbed LeNet-5. In essence, how these foundational layers are deployed and stacked, and how the network is trained, distinguish different CNNs from one another. The initial step in this process involves pre-processing the input images with a standard normalization. Then, to extract features and reduce redundancy, convolutional layers and pooling layers are used. As processing progresses, these simple features merge together effectively. Following this, partial merging of the features occurs, where each resultant feature contributes part of the definition for its designated class. Finally, the top characteristics are passed to a fully-connected layer which provides an output classification estimate.
Convolution layer.-To create feature maps that respond to different feature detectors, a convolution layer compresses the input by extracting interesting properties from it. Fundamental properties such as edges are filtered by the neurons in the first convolutional layer. Although each convolution kernel can extract features from the whole input image plane, neurons are assigned to particular regions of the input image plane in order to create feature maps with an equivalent receptive field. Neurons at the same depth share convolutional weights, a scheme inherited from time-delay neural networks (TDNNs), in which a stack of feature maps can be created. Using the same weight and bias vectors, each type of neuron generates one slice in this stack.
Convolution: A number of parameters, including zero-padding, input size, kernel size, map stack depth, and stride, define each convolutional layer.
For an M×N input, a k_a×k_b kernel, and row and column strides R_a and R_b, the size of the output feature map can be calculated as

((M − k_a)/R_a + 1) × ((N − k_b)/R_b + 1).
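As a hedged illustration (not the paper's code), the output-size formula above, together with the standard ReLU activation and non-overlapping max-pooling used in CNNs, can be sketched in numpy:

```python
import numpy as np

def conv2d_valid(image, kernel, stride=1):
    """Valid 2-D cross-correlation; output size (M - k)/stride + 1 per axis."""
    M, N = image.shape
    k = kernel.shape[0]
    out_h = (M - k) // stride + 1
    out_w = (N - k) // stride + 1
    out = np.zeros((out_h, out_w))
    for x in range(out_h):
        for y in range(out_w):
            patch = image[x*stride:x*stride+k, y*stride:y*stride+k]
            out[x, y] = np.sum(patch * kernel)  # dot product with the local area
    return out

def relu(x):
    return np.maximum(0.0, x)                   # f(x) = max(0, x)

def max_pool(fmap, size=2):
    """Non-overlapping max-pooling over size x size zones."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h*size, :w*size].reshape(h, size, w, size).max(axis=(1, 3))

img = np.arange(36, dtype=float).reshape(6, 6)
edge_kernel = np.array([[-1.0, 0.0, 1.0]] * 3)  # simple vertical-edge detector
fmap = relu(conv2d_valid(img, edge_kernel))     # (6 - 3)/1 + 1 = 4 -> 4x4 map
pooled = max_pool(fmap, size=2)                 # 4x4 -> 2x2
```

On this toy gradient image, every kernel response equals 6, so the feature map and the pooled map are constant, which makes the shape arithmetic easy to check by hand.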
Activation: The activation layer is applied after the weighted sum and bias. Because a pure perceptron computes only a linear combination of its inputs, a non-linear activation function is required for a neural network to be a general approximator of continuous functions in Euclidean space. LeCun recommended using the sigmoid function to compress the output of a pooling layer, but CNN performance is enhanced by the ReLU. Owing to its hard non-linearity, non-differentiability at zero, and sparsity-inducing behavior, ReLU performance is remarkable, and it has become the popular activation for convolutional outputs. The formula for ReLU is f(x) = max(0, x).

Pooling layer: Convolutional neural networks (CNNs) use pooling, also known as subsampling, as a crucial part of their architecture. It serves as a linking layer between successive convolutional layers and has many advantages. The two most common types of pooling are max-pooling and average-pooling. By focusing on local data through a pooling window and reducing data dimensionality, pooling lowers the likelihood of overfitting. Additionally, calculations can be made faster thanks to the data's lower dimension, which also offers translation, rotation, and scaling invariance when properly pooled. Non-overlapping max-pooling with a PR_a × PR_b pooling window simply takes the maximum over each pooling zone of the feature map. For a convolutional layer, the neuron value at position (x, y) on the j-th feature map in the i-th layer is computed as the dot product between the weight matrix and the local input area:

v_ij^xy = α( b_ij + Σ_m Σ_p Σ_q w_ijm^pq · v_(i−1)m^((x+p)(y+q)) ),

where α denotes the i-th layer activation function, b_ij is the additive bias of the j-th feature map in the i-th layer, and the index m runs over the feature maps in the (i − 1)-th layer that are linked to the j-th feature map.

Feature fusion.-There are two debatable problems with the feature fusion techniques discussed in the preceding section. The first problem arises when there is a limited sample size: in many real-world applications, the number of samples is lower than the number of features in either feature set. The covariance matrices are hence singular and non-invertible, which makes inverting them extremely difficult. One approach to this problem is to reduce the dimension of the feature vectors before employing canonical correlation analysis (CCA); a two-step PCA+CCA approach might thus be considered.
The second issue with CCA-based approaches is the sample class structure. Although CCA decorrelates the traits, we are also interested in classification in pattern recognition challenges. Linear discriminant analysis (LDA) based dimensionality reduction techniques give special consideration to this matter by finding projections that allow for the greatest distinction between classes. But a two-stage LDA+CCA approach will not work, since the class separation obtained by the first stage, LDA, is not preserved by the transformation applied in the second stage, CCA. Thus, it is essential to perform transformations that separate the classes within each group of features while maximizing the correlations between the two feature sets.
Our approach maximizes the correlations between features across both sets while taking into consideration the class structure, i.e., the classes to which each sample belongs. This helps to emphasize the differences between classes, and it aids multimodal recognition systems in integrating the pertinent data obtained by the various sensors. The proposed methodology, known as discriminant correlation analysis (DCA), is detailed below.

Consider a data matrix X whose n columns (samples) are divided into c distinct classes, with n_i columns belonging to the i-th class (n = Σ_i n_i). The feature vector x_ij ∈ X corresponds to the j-th sample in the i-th class. Let x̄_i denote the mean of the x_ij vectors in the i-th class and x̄ the mean of the entire feature set. The between-class scatter matrix is then

S_bx = Σ_{i=1}^{c} n_i (x̄_i − x̄)(x̄_i − x̄)^T = Φ_bx Φ_bx^T,

where Φ_bx = [√n_1 (x̄_1 − x̄), ..., √n_c (x̄_c − x̄)]. If the classes are well separated, Φ_bx^T Φ_bx is approximately diagonal. Since Φ_bx^T Φ_bx is symmetric positive semi-definite, it can be diagonalized as

Φ_bx^T Φ_bx = P Λ P^T,

where P is the matrix of orthogonal eigenvectors and Λ is the real, non-negative diagonal matrix of eigenvalues stored in decreasing order. Let Q (c×r) consist of the first r eigenvectors of P, corresponding to the r largest non-zero eigenvalues. The r most significant eigenvectors of S_bx are then obtained through the mapping Φ_bx Q Λ_r^(−1/2), and the transformation

W_bx = Φ_bx Q Λ_r^(−1),   with   W_bx^T S_bx W_bx = I,

unitizes the between-class scatter and reduces the dimensionality of the data matrix X to r, X′ = W_bx^T X. In the projected space, the between-class scatter matrix of X′ is the identity I, so the classes are separated. There are at most c − 1 non-zero generalized eigenvalues, so an upper bound for r is the rank of the data matrices, r ≤ min(c − 1, rank(X), rank(Y)).

The second feature set Y is handled using the same approach: a transformation W_by is determined that unitizes the between-class scatter matrix of the second modality, S_by, and reduces the dimensionality, Y′ = W_by^T Y. Although the between-set covariance matrix of the transformed sets is not necessarily diagonal, its non-diagonal elements are close to zero and its diagonal elements close to one. Due to this, there is little association between the centroids of different classes, and as a result the classes remain divided.

Having transferred X and Y to X′ and Y′, whose between-class scatter matrices are unitized, the feature sets have non-zero correlation only between corresponding features of the other set. The transformed feature sets X′ and Y′ have rank r, and the between-set covariance matrix S′_xy = X′ Y′^T (r×r) is non-degenerate. To diagonalize it, singular value decomposition (SVD) is applied, S′_xy = U Σ V^T, where the diagonal matrix Σ has non-zero main diagonal elements. Defining W_cx = U Σ^(−1/2) and W_cy = V Σ^(−1/2), the feature sets are transformed as X̂ = W_cx^T X′ and Ŷ = W_cy^T Y′, so that X̂ Ŷ^T = I. It is simple to demonstrate that the transformed feature sets' between-class scatter matrices are still diagonal, indicating that the classes remain distinct. The transformed feature vectors can be concatenated or added together to perform the feature-level fusion. The summation approach does, however, have the advantage of fewer dimensions, and the difference in recognition results between the two is minimal.
Multimodal biometric recognition.-The optimized hunger game search self-attention based bi-directional long short-term memory (HGSSA-bi LSTM) model, as illustrated in Fig. 3, performs the recognition in the proposed work. LSTM networks frequently use one-way information transmission, which means they can only use past data and cannot use future data.
The Bi-LSTM may consider data from the past as well as the future. Conceptually, two LSTM networks running in opposite time directions are joined into a single output. The forward LSTM learns from the input sequence's prior data, whilst the reverse LSTM learns from the input sequence's subsequent data. The hidden state H_t of the Bi-LSTM at time t concatenates the forward state h⃗_t and the backward state h⃖_t, i.e., H_t = [h⃗_t; h⃖_t]. To reduce the loss function and tune the hyperparameters, hunger game search self-attention (HGSSA) optimization is introduced in this method, as shown in Fig. 4. The HGS is an optimization technique that models animal social behavior and hunger. HGSSA mathematical modelling uses a population of N solutions X, together with the objective function values of those solutions.
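An illustrative numpy sketch of the two ideas above: (1) a bidirectional recurrence whose hidden state at time t concatenates a forward and a backward pass, and (2) a scaled dot-product self-attention layer applied to the resulting sequence. A simple tanh cell stands in for the LSTM cell to keep the sketch short, and all weights are random placeholders, not trained parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
T, d_in, d_h = 5, 4, 3                      # sequence length, input dim, hidden dim
x = rng.normal(size=(T, d_in))
Wf, Uf = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))
Wb, Ub = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h))

def run(x_seq, W, U):
    """One directional pass of a recurrent cell (an LSTM cell in the paper)."""
    h, out = np.zeros(d_h), []
    for x_t in x_seq:
        h = np.tanh(W @ x_t + U @ h)
        out.append(h)
    return np.array(out)

h_fwd = run(x, Wf, Uf)                      # past -> future
h_bwd = run(x[::-1], Wb, Ub)[::-1]          # future -> past
H = np.concatenate([h_fwd, h_bwd], axis=1)  # H_t = [h_t(fwd); h_t(bwd)], (T, 2*d_h)

def self_attention(H):
    """Scaled dot-product self-attention over the Bi-LSTM output sequence."""
    scores = H @ H.T / np.sqrt(H.shape[1])  # similarity between time steps
    w = np.exp(scores - scores.max(axis=1, keepdims=True))
    w /= w.sum(axis=1, keepdims=True)       # row-wise softmax weights
    return w @ H                            # attention-weighted context per step

ctx = self_attention(H)
```

Each row of `ctx` is a context vector that re-weights every time step of the bidirectional sequence, which is the role the self-attention layer plays on top of the Bi-LSTM in the proposed model.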
The position update step is carried out using the following equations:

X(t + 1) = X(t) · (1 + randn),                      r1 < l,
X(t + 1) = W1 · X_b + R · W2 · |X_b − X(t)|,        r1 > l, r2 > E,
X(t + 1) = W1 · X_b − R · W2 · |X_b − X(t)|,        r1 > l, r2 < E,

where X_b is the best solution found so far, R is a random value defined within the interval [−a, a] whose bound a shrinks with the iteration number, r1 and r2 are random variables in [0, 1], and randn produces a number from the normal distribution.

The control parameter E is defined as

E = sech(|LF_i − LF_b|),

where sech denotes the hyperbolic secant, sech(x) = 2/(e^x + e^(−x)), LF_i is the objective value of the current solution x_i, and LF_b is the best value of the objective function.

In addition, the hunger weights W1 and W2 are given by

W1_i = h_i · N / sh · r4   if r3 < l,   and   W1_i = 1   otherwise,
W2_i = (1 − exp(−|h_i − sh|)) · r5 · 2,

where the variables r3, r4, and r5 stand for random values in the range [0, 1], and the variable sh is the sum of the hungry feelings of all solutions, sh = Σ_i h_i. The hunger h_i of each solution is updated as

h_i = 0            if LF_i = LF_b,
h_i = h_i + h_n    otherwise,

where LF_b provides the optimal value of the objective, LF_i provides the objective of the current solution x_i, and the new hunger h_n is bounded below and scales with the normalized gap

(LF_i − LF_b) / (LF_w − LF_b) · r6 · 2 · (UB − LB),

in which LF_w is the worst value of the objective, UB and LB are the bounds of the search space, and the random variable r6 ∈ [0, 1] might indicate whether hunger has beneficial or detrimental consequences based on a variety of factors.
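A hedged numpy sketch of the hunger game search update loop described above, minimizing a toy sphere function. The constants (l = 0.08, hunger floor LH = 100) follow commonly published HGS defaults; this is an illustration of the equations, not the authors' implementation:

```python
import numpy as np

def hgs_minimize(f, dim=5, n=20, iters=200, lb=-10.0, ub=10.0, seed=0):
    rng = np.random.default_rng(seed)
    l, LH = 0.08, 100.0                       # switch probability, hunger floor
    X = rng.uniform(lb, ub, size=(n, dim))
    hungry = np.zeros(n)
    best_x, best_f, worst_f = X[0].copy(), np.inf, -np.inf
    for t in range(iters):
        F = np.array([f(x) for x in X])
        worst_f = max(worst_f, float(F.max()))        # LF_w
        if F.min() < best_f:
            best_f = float(F.min())                   # LF_b
            best_x = X[int(F.argmin())].copy()        # X_b
        # Hunger update: the best individual resets, the others grow hungrier.
        for i in range(n):
            if F[i] == best_f:
                hungry[i] = 0.0
            else:
                TH = ((F[i] - best_f) / (worst_f - best_f + 1e-12)
                      * rng.random() * 2 * (ub - lb))
                hungry[i] += max(TH, LH)              # new hunger h_n
        sh = hungry.sum() + 1e-12                     # sum of hunger, `sh`
        W1 = np.where(rng.random(n) < l, hungry * n / sh * rng.random(n), 1.0)
        W2 = (1.0 - np.exp(-np.abs(hungry - sh))) * rng.random(n) * 2.0
        a = 2.0 * (1.0 - t / iters)                   # shrinking interval [-a, a]
        for i in range(n):
            E = 1.0 / np.cosh(abs(F[i] - best_f))     # sech control parameter
            R = rng.uniform(-a, a)
            r1, r2 = rng.random(), rng.random()
            if r1 < l:
                X[i] = X[i] * (1.0 + rng.normal())
            elif r2 > E:
                X[i] = W1[i] * best_x + R * W2[i] * np.abs(best_x - X[i])
            else:
                X[i] = W1[i] * best_x - R * W2[i] * np.abs(best_x - X[i])
            X[i] = np.clip(X[i], lb, ub)
    return best_x, best_f

sphere = lambda x: float(np.sum(x ** 2))
x_best, f_best = hgs_minimize(sphere)
```

In the proposed model the objective would be the Bi-LSTM loss and the decision variables its hyperparameters; the sphere function is used here only so the contraction of the search toward X_b is easy to observe.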

Results and Discussion
This section discusses the experimental setup, performance results, and experimental outcomes. The discussion includes an assessment of the proposed HGSSA-Bi LSTM, demonstrating the efficiency and results of the system built on CNN feature extraction and DCA feature fusion. The approach is compared with other existing methods in terms of accuracy, precision, sensitivity, the ROC curve, and the confusion matrix.
Dataset description.-The CASIA datasets of fingerprint, iris, and palm print images were used to evaluate the suggested approach, which was developed on the Python platform. CASIA-IrisV3 has a total of 22,035 iris photos from over 700 participants. A near-infrared light source was used to capture the iris pictures, which are all 8-bit gray-level JPEG images. 500 subjects' fingerprints are represented by 20,000 images in CASIA-FingerprintV5; these fingerprint images were recorded in one session using the URU4000 fingerprint sensor. 7,200 palm photos were collected from 100 different people using a self-designed multiple-spectral imaging device for the CASIA multi-spectral palm print image library. 20% of the dataset's images are selected for testing, while 80% are selected for training.
Performance evaluation.-To evaluate how effectively a classification system performs, a confusion matrix is utilized. Table I describes the proposed method's performance analysis. The suggested HGSSA-bi LSTM has an accuracy of 98.5%. In terms of accuracy, the proposed technique is 3% better than CNN, 4% better than RNN, 5% better than DNN, and 4% better than the autoencoder.
The suggested HGSSA-bi LSTM has a precision of 98%; the proposed method is 6% better than CNN, 7% better than RNN, 6.5% better than DNN, and 5% better than the autoencoder in terms of precision. The achieved F1-score of the proposed HGSSA-bi LSTM is 97.5%; the proposed method is 5% better than CNN, 6% better than RNN, 6.5% better than DNN, and 4.5% better than the autoencoder in terms of F1-score. The achieved sensitivity of the proposed HGSSA-bi LSTM is 98.5%; the proposed method is 4.5% better than CNN, 5% better than RNN, 6.5% better than DNN, and 4.25% better than the autoencoder in terms of sensitivity. The achieved specificity of the proposed HGSSA-bi LSTM is 99%; the proposed method is 3.5% better than CNN, 4% better than RNN, 4.25% better than DNN, and 4% better than the autoencoder in terms of specificity. Because the HGSSA-bi LSTM model chooses the weight parameters well, the performance metrics increase compared with the existing methods.
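The metrics above follow directly from confusion-matrix counts. A small sketch with illustrative counts (not the paper's actual confusion matrix):

```python
def metrics(tp, fp, tn, fn):
    """Standard classification metrics from confusion-matrix counts."""
    accuracy    = (tp + tn) / (tp + fp + tn + fn)
    precision   = tp / (tp + fp)
    sensitivity = tp / (tp + fn)          # recall / true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return accuracy, precision, sensitivity, specificity, f1

# Hypothetical counts chosen so that accuracy works out to 98%.
acc, prec, sens, spec, f1 = metrics(tp=97, fp=2, tn=99, fn=2)
```

With these placeholder counts, accuracy is (97 + 99)/200 = 0.98, while precision and sensitivity both equal 97/99, so the F1-score coincides with them.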
Comparative analysis.-Figure 6 depicts the accuracy performance and Fig. 7 depicts the loss curve performance after training and testing. For the suggested model to function, the retrieved frames are fed into it. The accuracy and loss curves are plotted separately for the training and testing data; in the experiment, 80% of the data is used for training and 20% for testing. The accuracy and loss curves for the CASIA dataset are depicted graphically below.
The accuracy curve, as shown in Fig. 7, compares training and testing data. The accuracy increases as the number of iterations increases. For the CASIA dataset, the proposed model's overall accuracy after 100 iterations of training is 98%, and the graph demonstrates that it is very stable over additional iterations for both training and testing. The loss graph then shows that the model's loss percentage has dropped significantly, and a high number of iterations results in a very low loss; the graph demonstrates that the proposed model is relatively stable after several iterations. When compared to other available strategies, the experiment achieves a loss value of 0.03 at 100 iterations. Owing to the well-optimized training data that was utilized and the efficient model that was constructed, the proposed method achieves minimal error.
The proposed model's performance is further established using the receiver operating characteristic (ROC) curve, which plots the true positive rate (TPR) against the false positive rate (FPR). Figure 8 depicts the ROC curve. As the FPR is varied, the TPR rises in steps. For biometric image detection on the CASIA dataset, the area under the curve (AUC) for the recommended method is 96%.
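The AUC quoted above is the area under the TPR-versus-FPR curve; given sampled operating points it can be estimated with the trapezoidal rule. The points below are illustrative placeholders, not the paper's measurements:

```python
import numpy as np

# Illustrative sampled operating points (FPR, TPR) of a ROC curve.
fpr = np.array([0.0, 0.1, 0.2, 0.5, 1.0])
tpr = np.array([0.0, 0.7, 0.85, 0.95, 1.0])

# Trapezoidal-rule estimate of the area under the ROC curve.
auc = float(np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]) / 2.0))  # -> 0.87
```

A classifier whose curve hugs the top-left corner pushes this area toward 1.0, while a random classifier traces the diagonal and yields 0.5.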
The proposed work goes through four phases: pre-processing, feature extraction, feature fusion, and recognition. During the pre-processing step, the ECF approach is employed to filter out noise. The next step is feature extraction, which uses the CNN network model. In addition, the feature fusion process is carried out with DCA. The HGSSA-bi LSTM model then performs the recognition process. The suggested model's output is analyzed and compared to earlier approaches such as the CNN, RNN, DNN, and autoencoder models. The earlier models achieve only low performance owing to significant limitations, such as a high error rate and the time required to tweak the hyperparameters. Our proposed model achieves high accuracy, precision, F1-score, sensitivity, and specificity when discriminating using the aforementioned technique, efficiently performing the weight function and hyperparameter adjustment. Performance evaluations use a training percentage of 80%, and the outcome is assessed by accuracy, precision, recall, specificity, F1-score, and sensitivity. Because of the self-attention layer, the model's performance improves, and our proposed model achieves a high level of accuracy as a result.

Conclusions
To recognize biometric images, an HGSSA-Bi LSTM model is introduced in this research. Biometric images are recognized in four stages: pre-processing, feature extraction, feature fusion, and recognition. The ECF is used to remove noise during the pre-processing step. Furthermore, a CNN is employed to extract features from the biometric images during the feature extraction step. Following that, the feature fusion process is completed with the assistance of DCA. Finally, the HGSSA-Bi LSTM model is used in the recognition procedure to recognize multimodal biometric images. In addition, our suggested model goes through training and testing, with loss and accuracy curves. The suggested approach's performance analysis on the CASIA fingerprint, iris, and palm print datasets yields an accuracy of 98.5%, precision of 98%, F1-score of 97.5%, sensitivity of 98.5%, and specificity of 99%. Furthermore, the suggested approach is compared to other existing methods, such as CNN, RNN, DNN, and autoencoder, and the proposed results outperform the current methods.

ORCID Pankaj Nanglia https://orcid.org/0000-0001-5681-401X

ECS Sensors Plus, 2024 3 011401

There are m_1 2-D convolution kernels of size k_1 × k_1 in the first convolutional layer, and the pooling operation reduces the width and height of the input by factors PR_a and PR_b, respectively. A 2-D CNN architecture for extracting features from a biometric image using the red, green, and blue (RGB) channels can include multiple 2-D convolutional layers and pooling layers, as typically depicted in Fig. 2. The convolution layer is responsible for calculating the response of a learned filter to the input image, thus aiding in feature extraction. Given an N × N input image, the first 2-D convolution layer produces m_1 feature maps of size (N − k_1 + 1) × (N − k_1 + 1), depending on the 2-D convolution kernel size.

The neuron value v_ij^xy at location (x, y) of the current layer feature map is computed as in the convolution equation given earlier, where the width and height of the 2-D convolutional kernel determine the range of the offsets (p, q). The 2-D pooling procedure is used to reduce the feature maps' resolution, and its implementation is identical to that of the 1-D CNN. By exploiting the encapsulated function interfaces of deep-learning frameworks, all levels of the 2-D CNN model can be stacked. Many RGB biometric images have been processed using the 2-D CNN feature extraction method, and 2-D CNN architectures can extract biometric imaging features with both intra-class and inter-class variation.
When the feature count far exceeds the class count (p >> c), it is easier to work with the c × c matrix Φ_bx^T Φ_bx: by mapping its eigenvectors, the eigenvectors of the p × p scatter matrix can be obtained easily. Therefore, only the eigenvectors of the c × c covariance matrix Φ_bx^T Φ_bx need to be found.

Figure 2. Feature extraction based on CNN.
In order to achieve a diagonal between-set covariance matrix, singular value decomposition (SVD) is used to diagonalize S′_xy.

W_x = W_bx W_cx and W_y = W_by W_cy are the final transformation matrices for X and Y, respectively.

Table I. Performance analysis of the proposed method.