Enhancing adversarial robustness of quantum neural networks by adding noise layers

The rapid advancements in machine learning and quantum computing have given rise to a new research frontier: quantum machine learning. Quantum models designed for tackling classification problems possess the potential to deliver speed enhancements and superior predictive accuracy compared to their classical counterparts. However, recent research has revealed that quantum neural networks (QNNs), akin to their classical deep neural network-based classifier counterparts, are vulnerable to adversarial attacks. In these attacks, meticulously designed perturbations added to clean input data can result in QNNs producing incorrect predictions with high confidence. To mitigate this issue, we suggest enhancing the adversarial robustness of quantum machine learning systems by incorporating noise layers into QNNs. This is accomplished by solving a Min-Max optimization problem to control the magnitude of the noise, thereby increasing the QNN’s resilience against adversarial attacks. Extensive numerical experiments illustrate that our proposed method outperforms state-of-the-art defense techniques in terms of both clean and robust accuracy.


Introduction
The rapid advancement of machine learning, particularly in deep neural networks, has led to unprecedented progress in various fields, such as computer vision [1], natural language processing [2], and autonomous driving [3]. Simultaneously, quantum machine learning [4], a new computational paradigm that combines quantum computing with machine learning, is swiftly emerging. It leverages quantum parallelism and non-classical correlations, including quantum entanglement, to potentially accelerate or revolutionize existing classical algorithms [5,6]. Importantly, the fusion of these disciplines can lead to symbiotic advancements and offer fresh perspectives for tackling a wide range of challenging problems. For example, classical machine learning has been effectively utilized to address quantum many-body problems in the fields of physics and chemistry [7], along with propelling the progress of quantum simulation technology [8]. Simultaneously, the amalgamation of physics concepts and traditional machine learning techniques has exhibited considerable potential in solving practical quantum problems [9].
With the development of quantum machine learning on noisy intermediate-scale quantum devices [10], quantum neural networks (QNNs) [11][12][13][14][15] have surfaced as a promising approach to executing machine learning tasks on these devices. By fusing quantum computing with deep neural network models, this approach permits the embedding of input data into Hilbert spaces and the execution of classical machine learning tasks such as classification [11,15,16], generative modeling [13,17,18], among others [19,20]. QNNs consist of parameterized gate operations in quantum circuits, with classical optimization methods like gradient descent employed for optimizing the quantum gate parameters (e.g. Pauli rotation angles) for particular tasks and determining the optimal parameters [19,20]. Notably, these networks show potential quantum computational advantages when processing certain quantum synthesized data and solving discrete logarithm problems [21]. However, similar to their classical counterparts, quantum machine learning systems also exhibit a lack of robustness against adversarial attacks [22][23][24][25]. More specifically, for QNNs addressing classification problems, adversaries can generate small perturbations that result in erroneous and high-confidence classification outputs through adversarial attack algorithms. These manipulated inputs, known as adversarial examples [26], pose a significant challenge for deploying QNNs in security-sensitive applications such as quantum state recognition [24], quantum topological phase recognition [22], and medical diagnosis [24]. This security threat has also been experimentally demonstrated on superconducting quantum devices [24].
Various methods have been proposed in classical machine learning to improve robustness against adversarial attacks, including adversarial training [27,28] and model regularization with randomized noise [29][30][31]. Motivated by these works, recent research has investigated different approaches to enhance the adversarial robustness of quantum classifiers. Lu et al [22] proposed quantum adversarial training to enhance the robustness of quantum classifiers, but adversarial training inevitably results in a decrease in accuracy on clean datasets [32]. Du et al [33] investigated the use of depolarizing noise in quantum circuits to enhance the adversarial robustness of quantum models. However, the model's accuracy declines as noise increases beyond a threshold, and in practice, noise cannot render quantum models entirely robust [33].
To alleviate these issues, we introduce a framework built around the concept of a 'noise layer'. The noise layers randomly rotate the qubits on the Bloch sphere, and the coefficients controlling the random rotation angles are optimized through end-to-end training. Our framework generates adversarial examples within QNNs equipped with noise layers, thereby addressing the inner maximization problem of adversarial training. To tackle the outer minimization problem, we employ a combined loss that guides the QNN to balance accuracy on both clean and adversarial data. Our method makes no assumptions about the structure of the QNN and can be implemented with minimal additional parameters and computational overhead. In comparison with quantum adversarial training [22], our strategy achieves better accuracy on clean data while maintaining competitive robust accuracy on adversarial data.

QNN classifier
Classification tasks in supervised learning represent an important branch of quantum machine learning. These tasks involve a decision process in which a model is trained on labeled data, enabling it to predict the labels of new inputs. In the context of quantum classification, the objective is to learn an algorithm A : ρ → y that maps input quantum data ρ ∈ H, where H is a subspace of the Hilbert space, to a label y ∈ Y, with Y being a finite, countable set of labels. For simplicity, we assume the input quantum states are pure states. To learn A, a quantum classifier must be trained on a given training set T = {(ρ_i, y_i)}_{i=1}^N, where ρ_i and y_i are the input state and the corresponding label, respectively. Given a parameterized model f(θ, ρ), where θ is a set of adjustable parameters, learning is performed by optimizing θ to minimize the empirical risk

min_θ (1/N) Σ_{i=1}^N L(f(θ, ρ_i), y_i),

where L is a pre-defined loss function. A multitude of QNN models have been developed for classification tasks, drawing inspiration from classical deep learning models [11,12,15,16]. These QNNs consist of a series of quantum circuits comprising parameterized gates [19]. For classical data, the data needed for classification is first encoded into quantum states through specific state preparation procedures or feature mapping processes [34]; common methods include amplitude encoding [22,35] and angle encoding [14,36]. Amplitude encoding normalizes the input data and associates it with the probability amplitudes of the quantum state; this method can map classical vectors into a Hilbert space of dimension d = 2^n, where n is the total number of qubits used for the representation. Angle encoding is more direct: each feature corresponds to one qubit, so n qubits are required to represent n features.
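For illustration, the two encoding schemes can be sketched in a few lines of NumPy; the function names and the [0, π] feature scaling are our own choices, not part of the paper:

```python
import numpy as np

def amplitude_encode(x, n_qubits):
    """Pad and L2-normalise a classical vector so it can serve as the
    amplitude vector of an n-qubit state (dimension d = 2**n_qubits)."""
    d = 2 ** n_qubits
    padded = np.zeros(d)
    padded[: len(x)] = x
    norm = np.linalg.norm(padded)
    if norm == 0:
        raise ValueError("cannot encode the zero vector")
    return padded / norm

def angle_encode(x):
    """Map each feature to one single-qubit rotation angle (one qubit
    per feature), here rescaled into [0, pi]."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(), x.max()
    scaled = (x - lo) / (hi - lo) if hi > lo else np.zeros_like(x)
    return scaled * np.pi
```

Note the trade-off visible even in this sketch: amplitude encoding packs 2^n features into n qubits at the cost of a state-preparation circuit, while angle encoding spends one qubit per feature.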
Once the data is encoded into the quantum state ρ, a series of parameterized gates will be applied and optimized for the classification task. For quantum data, the data can be directly fed into the QNN circuit.
Various structures of QNNs have been proposed, including quantum convolutional neural networks (QCNNs) [12,37], hierarchical quantum classifiers [16], tensor networks [38], and others. Finally, in order to extract information from ρ for prediction, it is necessary to measure the expectation values of selected observables. In the case of binary classification problems, a natural approach is to measure a single qubit, producing a binary outcome of 0 or 1 as the predicted label ŷ. However, since measurement is a probabilistic process, the exact value could only be obtained through infinite sampling, and in practice the classifier outputs an estimate of the probability ŷ(ρ). It is noteworthy that while this paper focuses solely on binary classification, the same approach can be applied to multi-class problems.

Quantum adversarial attack
Recent research has revealed that techniques from classical adversarial learning can produce subtle perturbations in input data, deceiving highly accurate quantum classifiers [22][23][24]. Adversarial attacks can be categorized as either white-box or black-box attacks, depending on the adversary's access to the quantum model's information.
White-box attacks [26,27] involve the adversary having complete knowledge of the quantum model's architecture and parameters, while black-box attacks [22,23] occur when the adversary lacks any knowledge of the quantum model.
The fast gradient sign method (FGSM) [22,27] is an effective one-step adversarial attack in which the adversarial example is generated by

ρ^adv = ρ + ϵ · sgn(∇_ρ L(θ, ρ, y)),

where ϵ is the perturbation constraint determining the attack strength, sgn(·) is the sign function, and ∇_ρ denotes the gradient of the loss with respect to the legitimate input ρ with the correct label y. Various attack methods have been developed to improve attack efficiency and usability, such as the basic iterative method (BIM) [22,23,33,39], a multi-step variant of FGSM that generates adversarial examples via

ρ^adv_{k+1} = Π(ρ^adv_k + α · sgn(∇_ρ L(θ, ρ^adv_k, y))),

where ρ^adv_k denotes the adversarial example at step k, Π is the clipping function, which can be viewed as the projection operator onto normalized wave functions in quantum machine learning [22], and α is the attack step size.
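Both attacks can be sketched classically as follows; the `grad_fn` interface and the clip-then-renormalise step used to mimic the projection Π are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def fgsm(x, grad, eps):
    """One-step FGSM: x_adv = x + eps * sgn(grad of loss w.r.t. x)."""
    return x + eps * np.sign(grad)

def bim(x, grad_fn, eps, alpha, steps):
    """Multi-step BIM. After each step the iterate is clipped back into
    the eps-ball around x and renormalised, approximating the projection
    onto valid (normalised) quantum states."""
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))
        x_adv = np.clip(x_adv, x - eps, x + eps)   # eps-ball clipping
        x_adv = x_adv / np.linalg.norm(x_adv)      # renormalise state
    return x_adv
```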
One prevalent black-box attack is executed through substitute models [40], where the adversary trains a model using the target model's outputs as labels and then employs the trained model to generate adversarial examples. Transferable adversarial attacks [41], a variant of the substitute-model attack, are a crucial technique for attacking quantum models. In a transferable adversarial attack, the adversary generates adversarial examples from a source model, which can be either classical [22] or quantum [23], to attack the target model. The source and target models can have completely different structures, but they need to be trained on real training data.

Quantum adversarial defense
Due to the security threats posed by adversarial examples [26], several defense strategies have been proposed to enhance the adversarial robustness of quantum machine learning models, such as adversarial training [22,24] and the use of depolarizing noise [33].
Adversarial training involves treating adversarial examples as training data, which helps the model learn to resist adversarial examples and improves its adversarial robustness. The standard adversarial training approach aims to obtain the optimal QNN parameters θ by solving the following min-max problem:

min_θ max_{ρ^adv ∈ H_ϵ} L(f(θ, ρ^adv), y),    (4)

where the inner maximization finds the perturbed data ρ^adv, and H_ϵ is the set of adversarial perturbations subject to the ϵ-constraint. The outer minimization is optimized through standard QNN training. This simple and effective method has been experimentally verified on superconducting quantum computers [24]. In classical adversarial learning, injecting noise into the model to enhance its adversarial robustness has yielded promising results. Inspired by this, Du et al [33] proposed using depolarizing noise in quantum circuits to improve robustness, leveraging the connection between differential privacy and adversarial robustness. Injecting noise can also be viewed as a form of model regularization.
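One step of this min-max scheme can be sketched as follows, with the inner maximization delegated to an `attack` routine; the function names and gradient interface are our own assumptions:

```python
import numpy as np

def adversarial_training_step(theta, batch, loss_grad_theta, attack, lr):
    """One outer-minimisation step: the inner maximisation is
    approximated by `attack` (e.g. FGSM or BIM), then theta takes a
    gradient-descent step on the loss of the perturbed batch."""
    grads = np.zeros_like(theta)
    for rho, y in batch:
        rho_adv = attack(rho, y, theta)             # inner max (approx.)
        grads += loss_grad_theta(theta, rho_adv, y)  # outer min gradient
    return theta - lr * grads / len(batch)
```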

QNNs with noise layers
In this section, we propose improving the adversarial robustness of QNNs by incorporating noise layers. We will first introduce the proposed method and then discuss the impact of this method on QNNs.

Proposed method
Our main idea is to introduce randomness into the variational circuits to achieve model regularization. We introduce noise layers that add controllable perturbations to the qubits in the corresponding quantum circuits. Specifically, QNNs are generally considered quantum analogues of classical neural networks, consisting of multiple layers of parameterized quantum circuits. Each layer includes trainable rotation gates and entangling gates, and the circuit can be written as a product of unitary gates of the form

Û(θ) = ∏_{ℓ=1}^{P} Ŵ_ℓ V̂_ℓ(θ_ℓ),

where P denotes the number of layers in the variational circuit, Ŵ_ℓ represents the non-parametric unitary operations at the ℓth layer, V̂_ℓ(θ_ℓ) represents the unitary operations with variational parameters at the ℓth layer, and θ_ℓ denotes all parameters in the ℓth layer.
To increase the randomness of QNNs, we propose adding a noise layer before each QNN layer. As shown in figure 1(a), we denote the structure of the QNN with noise by Û_N(θ), so that the quantum state passes through the ℓth noise layer followed by the ℓth QNN layer. The QNN with noise layers can then be written as

Û_N(θ) = ∏_{ℓ=1}^{P} Ŵ_ℓ V̂_ℓ(θ_ℓ) N̂_ℓ,

where N̂_ℓ denotes the ℓth noise layer. The role of the noise layer is to induce random rotations on the Bloch sphere. This is achieved by incorporating single-qubit rotation gates, such as the RY gate shown in figure 1(b), whose rotation angles are random variables. The rotation angle of the ith qubit in the ℓth noise layer is

β_ℓ η_ℓ^i,  with η_ℓ^i ∼ N(0, σ²),    (7)

where η_ℓ^i is the noise variable of the ith qubit in the ℓth layer and β_ℓ is the coefficient used to control the size of each η_ℓ^i. For simplicity, we assume that η is randomly sampled from a Gaussian distribution with zero mean and variance σ²; our method can also be applied to other noise distributions, such as Gaussian-Bernoulli. Assuming that noise layers are added throughout the entire QNN, only one β_ℓ needs to be optimized for each parameterized quantum circuit layer. It can be regarded as a variational parameter of the QNN and optimized through gradient descent. Like most classifiers, the QNN with noise layers involves two independent processes, training and inference; the corresponding procedures are presented in algorithms 1 and 2, respectively. Algorithm 1. Training of quantum neural network with noise layer.

Input: the model with parameters θ, the constraint coefficient β, the combined loss function L′, the training set T, the learning rate r, the batch size n_b, and the optimizer f_o
Output: the trained model
1: Initialization: generate random initial parameters for θ
2: for number of training iterations do
3:   Randomly sample a batch of n_b training examples
4:   Generate random variables by (7)
5:   Set the rotation angles of the additional single-qubit rotation gates with the random variables
6:   Compute the (noisy) gradients G of the combined loss L′
7:   Update: θ, β ← f_o(θ, β, r, G)
8: end
9: Output the trained model

Algorithm 2. Testing of quantum neural network with noise layer.
Input: the trained model with parameters θ, the constraint coefficient β, the given testing sample ρ, and the number of measurements J
Output: predicted class label ŷ
1: Initialization: count ← 0
2: for i = 1 to J do
3:   Generate random variables by (7)
4:   Set the rotation angles of the additional single-qubit rotation gates with the random variables
5:   Measure the output qubit and obtain the measurement result y_i
6:   if y_i == 1 then count ← count + 1
7: end
8: if count/J > 1/2 then ŷ = 1, else ŷ = 0
9: Output the predicted class label

However, when training QNNs with noise layers, we found that the optimizer tends to minimize the noise, leading β to converge to values close to zero (as shown in table 2). To address this, we combine noise injection with adversarial training, which avoids overfitting to clean data and successfully defends against adversarial attacks. As previously mentioned, adversarial training aims to solve the min-max problem in (4). In QNNs with noise layers, the inner maximization problem amounts to finding an adversarial example for a given data point by solving

ρ^adv* = argmax_{ρ^adv ∈ ∆} L(f_ϵ(θ, ρ^adv), y),    (8)

where ∆ denotes a sufficiently small region ensuring that the adversarial perturbation does not fundamentally alter the original input, and f_ϵ(θ, ρ) represents the model with noise layers. Unlike standard adversarial example generation, our ρ^adv* is generated under noise injection. Moreover, when solving the outer minimization problem, to prevent the model from minimizing the loss only on perturbed data and losing information about clean data, we do not minimize solely the adversarial loss of (4); instead, we use the combined loss L′, a weighted sum of the losses on clean and perturbed data.
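The noise sampling of (7) and the algorithm-2 style majority-vote inference can be sketched as follows; `measure_once` stands in for one execution and measurement of the noisy circuit, and all names are illustrative assumptions:

```python
import numpy as np

def sample_noise_angles(beta, n_qubits, sigma2, rng):
    """Per (7): the angle of qubit i in noise layer l is
    beta[l] * eta, with eta drawn i.i.d. from N(0, sigma2)."""
    eta = rng.normal(0.0, np.sqrt(sigma2), (len(beta), n_qubits))
    return np.asarray(beta)[:, None] * eta

def noisy_inference(measure_once, beta, n_qubits, sigma2, J, rng=None):
    """Algorithm-2 style inference: resample the noise angles for each
    of the J runs, count how often the measured qubit reads 1, and
    predict class 1 when that frequency exceeds one half."""
    rng = rng or np.random.default_rng()
    count = 0
    for _ in range(J):
        angles = sample_noise_angles(beta, n_qubits, sigma2, rng)
        if measure_once(angles) == 1:
            count += 1
    return 1 if count / J > 0.5 else 0
```

Because the noise is resampled on every shot, two adversary-side evaluations of the model never see the same circuit, which is the property exploited in the analysis below.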
The combined loss L′ is defined as

L′(θ, ρ, ρ^adv, y) = γ L(θ, ρ, y) + (1 − γ) L(θ, ρ^adv, y),    (9)

where γ is a hyperparameter controlling the relative weight of the losses on clean and perturbed data; in this paper, we set γ = 0.5. The advantage of this approach is that, compared with the loss function of (4), the model in adversarial training no longer focuses solely on the 'current' training sample (whether clean or adversarial). With the loss defined by (9), when training on clean samples the model also accounts for the impact of adversarial examples; conversely, when training on adversarial examples, the model also considers the impact of clean samples.
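The combined loss (9) reduces to a one-line sketch; the `loss_fn` interface is our own assumption:

```python
def combined_loss(loss_fn, theta, rho, rho_adv, y, gamma=0.5):
    """L' = gamma * L(theta, rho, y) + (1 - gamma) * L(theta, rho_adv, y):
    a weighted sum of the clean and adversarial losses (gamma = 0.5
    in the paper)."""
    return gamma * loss_fn(theta, rho, y) + (1 - gamma) * loss_fn(theta, rho_adv, y)
```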

Analysis
First, during the training process of QNNs with noise layers, noise is randomly generated at each gradient descent step. Empirical evidence [42,43] suggests that this regularization-like technique assists the optimization algorithm in finding parameters that are robust to adversarial perturbations. Liu et al [42] theoretically demonstrated that adding random noise is equivalent to Lipschitz regularization; by controlling the noise coefficient, one can balance the model's robustness against the training loss. Moreover, as in classical adversarial machine learning, quantum adversarial attacks require access to the model's gradient information [22,24]. Owing to the peculiarities of variational circuits, finite-difference methods and the parameter-shift rule [14,19,44,45] are commonly used to estimate the gradients of QNNs; the parameter-shift rule is particularly suitable for implementation on current quantum devices. Under normal circumstances, the required partial derivatives can be numerically estimated with a finite-difference scheme of the form

∂L/∂x_i ≈ [L(x + ∆) − L(x − ∆)] / (2∥∆∥),

where ∆ is a small perturbation vector along the ith coordinate. To estimate the gradient, this method requires evaluating the loss function twice for each parameter. Due to the presence of noise layers, the two loss evaluations seen by an adversary are affected by independently sampled noise perturbations, resulting in a significant discrepancy between the computed gradient and the true gradient. Although the parameter-shift rule can obtain circuit gradients exactly, it likewise requires two circuit executions per parameter, so adversaries again cannot accurately obtain the gradient of circuits with noise layers. Since most attacks require the computation or estimation of gradients, noise layers perturb the gradients and deceive adversarial attacks, thereby reducing their success rate.
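The adversary's difficulty can be illustrated with a coordinate-wise central-difference estimator: when the loss is deterministic the estimate is accurate, but when it is stochastic (as with resampled noise layers) the two evaluations per coordinate see different noise and the estimate degrades. All names here are illustrative:

```python
import numpy as np

def central_difference_grad(loss_fn, x, delta=1e-3):
    """Central-difference gradient estimate: two loss evaluations per
    coordinate.  If loss_fn is stochastic (e.g. noise layers resample
    on every call), the two evaluations are inconsistent and the
    resulting 'gradient' is corrupted."""
    grad = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = delta
        grad[i] = (loss_fn(x + e) - loss_fn(x - e)) / (2 * delta)
    return grad
```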
Second, conventional adversarial training [22,28] aims to associate the same class label with a sample and its surrounding adversarial examples to help the model achieve a certain level of certified robustness. However, this approach can lead to severe overfitting on the training data, resulting in a decrease in the model's generalization ability on natural samples. Additionally, the noise injection in QNNs increases the model's randomness, inevitably leading to a decrease in accuracy. To address these issues, we introduce the combined loss function L ′ to successfully train QNNs. We optimize L ′ to find an appropriate value of β that balances the model's prediction accuracy with its robustness against adversarial attacks. This helps avoid excessive or insufficient noise, which can both affect the model's accuracy and increase the risk of overfitting to the training data.
Finally, to better understand the sample space of QNNs with embedded noise layers, we randomly selected 100 samples of the digits 1 and 9 from the MNIST dataset and applied unsupervised PCA to reduce the feature dimension to 2, visualizing the effect of adversarial perturbations on QNNs with noise layers. Data that the model classifies correctly are shown as circles and misclassified data as crosses. In figure 2(a), almost all data are correctly classified by the QNN trained with standard methods. In figure 2(b), we use ϵ = 0.15 to generate adversarial examples for the randomly selected samples, and approximately 65% of them cannot be correctly classified. In figure 2(c), we perform quantum adversarial training with ϵ = 0.15 and then generate the corresponding adversarial examples. Most of these samples are correctly classified by the adversarially trained model, but its performance on clean samples is degraded. This is because, to help the model learn more complex decision boundaries, adversarial training assigns adversarial examples the same class labels as the clean training samples, which causes the model to overfit the training data and degrades classification performance on clean samples. In figure 2(d), we also use ϵ = 0.15 to perform adversarial training on the QNN with noise layers; compared with standard adversarial training, the model performs better on both clean and adversarial data.

Numerical experiments
In this section, we assess the effectiveness of our defensive strategy through numerical simulations using Pennylane [46]. We first applied our approach to the 'moons' dataset sourced from scikit-learn [47], validating its compatibility with encoding techniques such as amplitude and angle encoding and examining the influence of noise layers on the data distribution in Hilbert space. We then extended our experiments to the ground-state quantum dataset of the one-dimensional (1D) transverse-field Ising model and the MNIST dataset [48].

Experimental setup
Dataset.
For the synthetic dataset [47], we generated 1000 data points with a noise level of 0.03 and divided them into class 0 and class 1, reserving twenty percent for testing. For the quantum dataset, following previous research [22,23], we classified the ground states of the 1D transverse-field Ising model

H = −Σ_n σ^z_n σ^z_{n+1} − J_x Σ_n σ^x_n,

where J_x represents the strength of the transverse field, and σ^z_n and σ^x_n denote the Pauli matrices of the nth spin. A quantum phase transition occurs at J_x = 1, between the ferromagnetic phase for 0 < J_x < 1 and the paramagnetic phase for J_x > 1. We sampled the model with a system size of 4 for various J_x values (from 0 to 2) and determined the corresponding ground states. These ground states and their labels constitute a dataset of 999 points, from which we randomly selected 40% for testing. For the MNIST dataset [48], to mimic quantum computation with limited classical resources, we reduced the image size from 28 × 28 pixels to 16 × 16 pixels and normalized the data. Training used 400 samples of digits 1 and 9 from the training set, while testing used 1000 samples from the test set.

Network structure and hyperparameters. We adopted the general multi-layer variational circuit structure for the QNNs, in line with previous research [22,23]. Unless otherwise mentioned, we used amplitude encoding [22,35] to transform classical data into quantum states. The circuit depth of the QNN was set to 10. We used the parameter-shift rule [14,19,44,45] to obtain the required gradients, and the quantum version of cross-entropy [22] served as the loss function. The Adam optimizer [49] minimized the loss function with a batch size of 64 and a learning rate of 0.005. We trained for 100 epochs on the synthetic and quantum datasets and 200 epochs on the MNIST dataset. In section 4.4.4, we also utilized the QCNN [12].
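The ground-state dataset described above can be sketched by exactly diagonalizing a dense transverse-field Ising Hamiltonian; the open boundary condition and the 0/1 labelling convention here are our own assumptions:

```python
import numpy as np

def tfim_ground_state(n, jx):
    """Dense 1D transverse-field Ising Hamiltonian
    H = -sum_k sz_k sz_{k+1} - jx * sum_k sx_k  (open boundary),
    returning its ground state and a phase label (which side of the
    critical point jx = 1 the sample lies on)."""
    sz = np.diag([1.0, -1.0])
    sx = np.array([[0.0, 1.0], [1.0, 0.0]])
    eye = np.eye(2)

    def op(single, site):
        # Embed a single-qubit operator at `site` via Kronecker products.
        out = np.array([[1.0]])
        for k in range(n):
            out = np.kron(out, single if k == site else eye)
        return out

    H = np.zeros((2 ** n, 2 ** n))
    for k in range(n - 1):
        H -= op(sz, k) @ op(sz, k + 1)   # nearest-neighbour coupling
    for k in range(n):
        H -= jx * op(sx, k)              # transverse field
    vals, vecs = np.linalg.eigh(H)
    return vecs[:, 0], int(jx > 1)
```

Sweeping `jx` over a grid in (0, 2) and collecting the returned state/label pairs reproduces the structure of the 999-point quantum dataset at system size 4.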
Diagrams of both types of QNNs (QNN and QCNN) are illustrated in figure 3. The QNN comprises P layers, each containing a rotation unit and an entangling unit. The rotation unit carries out arbitrary single-qubit rotations, while the entangling unit consists of controlled-NOT (CNOT) gates. Quantum data are converted into class predictions through a positive operator-valued measure (POVM) measurement along the Z-axis of the Bloch sphere. Furthermore, σ² in (7) was set to 2π²/5, and the initial value of β was set to 0.25.
Attack settings. The adversary was assumed to be aware of the QNN's randomization, and attacks were modeled according to the quantum adversarial training setup of Lu et al [22], using BIM to generate adversarial examples with an iteration count of 3 and a step size of 0.05. Unless otherwise specified, attack performance was evaluated at ϵ = 0.15. For the black-box attack, we used classical network architectures identical to those of [22], with the final layer of the network outputting two classes. We utilized a fully-connected neural network (FNN) and a convolutional neural network (CNN). The FNN includes two hidden layers (with 512 and 53 neurons, respectively), two Dropout layers to mitigate overfitting, and a Softmax output layer with two units. The CNN contains three convolutional layers (with 64, 128, and 64 filters and filter sizes of 8×8, 4×4, and 2×2, respectively), followed by a Flatten layer and an output layer with two units.

Baseline.
To the best of our knowledge, quantum adversarial training [22] serves as the most prevalent defense mechanism in quantum adversarial learning [22,24,50], which we use as the baseline defense model. To train the baseline model, at each epoch, we first generate the corresponding adversarial examples from clean samples and use them together as the training dataset for retraining. Additionally, we discuss several defense mechanisms, such as Gaussian data augmentation [51] and the use of depolarizing noise to protect quantum classifiers [33].

Synthetic dataset
We employed amplitude encoding [22,35] and angle encoding [14,36] to convert the synthetic dataset into quantum states, and trained QNNs using both the baseline training and our proposed training. As displayed in figure 4, our approach significantly reduced the training loss compared with the baseline, irrespective of the encoding and data type (clean or adversarial); notably, a smaller training loss often implies a more robust model [52]. As evidenced in table 1, our strategy improved robust accuracy on the synthetic dataset by approximately 3%-4% relative to the baseline. Figure 5 illustrates the data distribution in Hilbert space at varying stages under amplitude encoding, where the adversarial data is produced by FGSM [22,27]. Figures 5(a) and (b) depict the training set distribution in Hilbert space before training, for models with and without the noise layer. Figures 5(c) and (d) display the adversarial data distribution of the test set in Hilbert space after standard QNN training, with and without the noise layer. Finally, figures 5(e) and (f) present the adversarial data distribution of the test set in Hilbert space after training with the baseline and with our method, respectively. Figures 5(a) and (b) suggest that, without the noise layer, the training data presents a 'linear' distribution in Hilbert space, whereas the data appears more dispersed when the QNN incorporates the noise layer. This attribute might aid in accurately classifying as many unseen points as possible [36] and could be beneficial for subsequent training [53].

Quantum dataset
In contrast to classical inputs, quantum data can be fed into QNNs directly. Figure 6 contrasts the learning curves of three models on the quantum dataset: (1) the baseline QNN, (2) the QNN with the noise layer but without the combined loss of (9), and (3) the QNN with the noise layer and the combined loss of (9), i.e. our proposed approach. Without the combined loss, the training loss barely differs from the baseline on both clean and adversarial data, whereas our strategy significantly reduces the loss on both data types. Figures 6(b) and (c) then compare the standard training model, the baseline, and our model on the quantum dataset test set under the adversarial perturbations of FGSM [22,27] and BIM [22,39]. These diagrams demonstrate that adversarial training of the baseline model somewhat enhances its robustness; notably, however, our method surpasses the baseline.

MNIST dataset
In this section, we first investigate QNNs with noise layers from three distinct perspectives using the MNIST dataset [48]: (1) factors influencing the constraint coefficient β, (2) the implications of injecting noise layers at different stages, and (3) the effect of σ² in (7). Subsequently, we evaluate their robustness under white-box and black-box settings.

Factors affecting the constraint coefficient β
As discussed in section 3.1, training the constraint coefficient β requires the combined loss. We trained QNNs under different conditions: without the combined loss and without noise injection (QNN-V), without the combined loss but with noise injection (QNN-N), with the combined loss but without noise injection (QNN-C), and with both the combined loss and noise injection (QNN-CN), to compare how β converges during training with noise. As shown in table 2, without the combined loss the optimizer drives β toward values close to zero, consistent with the observation in section 3.1.

Impact of noise injection phase
In this section, we demonstrate the necessity of using noise layers during both the training and testing phases to improve a model's adversarial robustness. We experimented with four different models: the standard model without any noise layers, the model with noise layers only during testing, the model with noise layers only during training, and the model with noise layers during both phases. For the models with noise layers, we applied the combined loss to ensure optimal performance. Figure 7 displays the impact of different adversarial attacks using FGSM and BIM on the performance of the four models on the MNIST test set. It should be noted that when ϵ = 0, the robust accuracy degrades to the accuracy of clean samples.
As mentioned earlier, the primary purpose of noise injection is to fool gradient-based adversarial attacks. The simplest idea is to pretrain the QNN and then add noise layers to each layer only during the testing phase. Unfortunately, as shown in figure 7, adding noise layers only during testing leads to a significant decrease in both the accuracy and the robustness of the QNN. The testing-stage noise nevertheless matters: in figure 7, models that add noise layers only during training exhibit much lower robust accuracy than models that add them during both training and testing. These results suggest that omitting the noise layer at either stage reduces the model's adversarial robustness.
In summary, our experimental results demonstrate that injecting noise layers during both the training and testing phases, and using the combined loss to train the control coefficient β, are crucial for improving the adversarial robustness of QNNs. Figure 8 highlights the impact of standard training and of our proposed approach on the QNNs' accuracy for clean and adversarial data at different σ² values on the MNIST dataset. In the majority of instances, QNNs trained using our method remain substantially stable across various σ² values.

Effect of σ²
Nonetheless, when σ² = π²/2, a significant decrease in the QNN's accuracy is observed, regardless of the sample type. This observation suggests that incorporating the noise layer can impair the QNN's performance at notably high σ² values. It is important to remember, however, that σ² = π²/2 is a relatively large value. Therefore, we believe that within a reasonable range of σ², the noise's impact on the QNN's performance remains marginal or negligible.

White-box robustness
We evaluated the robustness of QNNs with added noise layers from two perspectives: circuit depth and network architecture. For different circuit depths, we conducted experiments on the defensive baseline models and on QNNs with noise layers added before each rotation unit layer. Additionally, we selected the QCNN depicted in figure 3 as an alternative network structure to demonstrate the generality of our approach, with the noise layer added before the quantum convolutional layer. Figure 9 illustrates the impact of circuit depth on robust accuracy. It should be noted that the QNN with a circuit depth of 40 was trained for only 100 epochs. The experimental results of the QCNN are listed in table 3. As shown in figure 9, our method achieved better robustness and higher clean accuracy than the baseline. We observed that increased circuit depth contributed to improved model robustness, consistent with the findings of Lu et al [22]. Furthermore, when the testing ϵ exceeds the value used in training (ϵ = 0.15), the model's robustness drops sharply, aligning with the conclusion of Madry et al [28]. However, our method offers greater resilience against adversarial attacks, delaying the onset of this robustness collapse. Table 3 demonstrates that our method also yields a significant improvement in the adversarial robustness of the QCNN model.

Black-box robustness
In this subsection, we evaluate our proposed approach against transferable attacks [22,24]. Transferable attacks involve training a source model to generate adversarial examples that are then used to attack a target model. Considering that adversaries may not have access to quantum devices and might only have classical resources at their disposal, we employ classical models as the source for generating adversarial examples, including the FNN and the CNN described in the attack settings of section 4.1. We adopt the network architecture, training process, and adversarial example generation methods (FGSM, BIM, and the momentum iterative method (MIM) proposed in [54]) following [22].
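The transfer setting can be sketched with two toy models: gradients are taken only through the attacker's source model, and the resulting examples are replayed against a target model the attacker never queries for gradients. Both models here are toy logistic classifiers (illustrative stand-ins for the classical source and the defended QNN target):

```python
import numpy as np

# Hedged sketch of a transfer (black-box) attack: FGSM is run against a
# surrogate *source* model; the crafted inputs are then fed to a different
# *target* model whose gradients the attacker never sees.

rng = np.random.default_rng(3)
w_src = rng.normal(size=4)   # source-model weights (attacker's surrogate)
w_tgt = rng.normal(size=4)   # target-model weights (unknown to the attacker)

def predict(w, x):
    return 1.0 / (1.0 + np.exp(-x @ w))

def fgsm_source(x, y, eps):
    grad_x = (predict(w_src, x) - y) * w_src  # gradient through the source only
    return x + eps * np.sign(grad_x)

x, y = rng.normal(size=4), 1.0
x_adv = fgsm_source(x, y, eps=0.1)
p_clean = predict(w_tgt, x)      # target's confidence on the clean input
p_adv = predict(w_tgt, x_adv)    # target's confidence on the transferred input
```

How much `p_adv` degrades relative to `p_clean` depends on how well gradients transfer between the two models, which is exactly what table 4 measures for the defended QNNs.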
The black-box robustness of all defense models is presented in table 4. Similarly, our proposed method demonstrates superior robustness compared to adversarial training, highlighting the effectiveness of our approach in countering transferable attacks.

Discussion
As discussed in section 2.3, several adversarial defense approaches have been proposed to secure quantum classifiers for practical applications. Among them, quantum adversarial training [22,24] is the most popular and reliable method, which we use as the baseline for our defense model. Additionally, we compare our proposed method with other similar approaches [33,51].
The distinction between Gaussian data augmentation [51] and our method is that the former only adds Gaussian noise to images during training, while we add noise layers to the network to defend against adversarial attacks during both the training and testing phases. During training, the noise layers help the optimizer find variational parameters that are robust to perturbed inputs. During testing, the noise perturbs gradients to deceive gradient-based attacks. Furthermore, Gaussian data augmentation cannot defend against adaptive attacks [55]: once the adversaries are aware of the augmentation process, they can generate adversarial examples to attack the model. In contrast, our experiments are conducted under the assumption that the adversaries know the randomization process.
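An adaptive adversary who knows the randomization can average gradients over several noise draws before taking the attack step (an expectation-over-transformation-style attack); this is the threat model our experiments assume. The sketch below uses a toy logistic model with parameter noise as an illustrative stand-in:

```python
import numpy as np

# Hedged sketch of an adaptive attack under known randomization: the
# adversary samples the injected noise several times, averages the
# gradients, and only then applies the FGSM sign step.

rng = np.random.default_rng(4)
w = rng.normal(size=4)
sigma2 = 0.01

def predict(x, noise):
    return 1.0 / (1.0 + np.exp(-x @ (w + noise)))  # noise on the parameters

def adaptive_fgsm(x, y, eps, n_draws=16):
    grads = []
    for _ in range(n_draws):
        noise = rng.normal(0.0, np.sqrt(sigma2), size=w.shape)
        grads.append((predict(x, noise) - y) * (w + noise))
    return x + eps * np.sign(np.mean(grads, axis=0))  # average, then sign

x, y = rng.normal(size=4), 1.0
x_adv = adaptive_fgsm(x, y, eps=0.1)
```

A defense that only randomizes the training data offers no obstacle to such an attacker, whereas randomness present at test time still forces the attacker to attack an expectation rather than a fixed model.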
Protecting quantum classifiers using quantum noise [33] is a related method that injects noise at different locations in QNNs. This method adds depolarizing noise to QNNs and introduces the concept of differential privacy to defend against adversarial attacks. Although this method provides theoretical guarantees, using differential privacy for defense compromises accuracy on clean data. Moreover, in classical adversarial machine learning, differential privacy performs poorly against attacks based on the L∞-norm. Additionally, that method requires manual configuration of the noise level; although our method also requires setting the noise-level hyperparameters, it can automatically adjust the noise level through the trainable constraint coefficient.

Conclusion and future work
In this paper, we propose adding noise layers to QNNs as a regularization technique aimed at improving the generalization of quantum models in terms of both clean and robust performance. The noise intensity within these layers is controlled by solving a min-max optimization problem during adversarial training. Our numerical results demonstrate that our method effectively withstands white-box and black-box attacks and outperforms state-of-the-art defense methods in terms of accuracy on both clean and adversarially perturbed data. It is important to note that while adversarial training is an effective approach for enhancing model robustness while maintaining high accuracy on both clean and adversarial data, it is computationally demanding: besides computing gradients for updating variational parameters, generating adversarial examples requires multiple additional gradient computations. In fact, developing a robust model through adversarial training can take approximately 3 to 30 times longer than standard model training. In future work, we plan to investigate methods that improve quantum model stability while reducing computational overhead. We believe that further research in this area is crucial for the development of safe and reliable quantum AI technology.

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.