Smoothness Enforcing Projected Gradient Descent Adversarial Training

This paper proposes a novel adversarial training method built on a smoothness enforcing framework, called Smoothness Enforcing Adversarial Training (SEAT). Traditional adversarial training methods are known to suffer from prohibitive computational overhead and mismatched class distributions. To address these issues, SEAT integrates smoothness enforcing methods into the adversarial training iterations to achieve better robustness and generalization. SEAT reuses the gradient information computed when updating model parameters, and integrates random initialization and output smearing into the process to mitigate the mismatch problem with better generalization performance. Furthermore, temporal ensembling serves as an implicit self-ensemble of the adversarial information, which benefits from a longer memory, and smoothness constraints are imposed across iterations. Unlike existing adversarial training methods, our method is free from arbitrarily complicated distributions and the expensive generation of adversarial examples. Extensive experiments validate the effectiveness of SEAT in comparison with state-of-the-art adversarial training methods.


Introduction
Recent breakthroughs in machine learning make the security aspects of machine learning models increasingly significant [1], [2], [3], [4], [5]. The problem is especially challenging in safety-critical environments due to an intriguing discovery: no complicated or varied methods are needed; small but intentionally worst-case perturbations can trick machine learning models into outputting high-confidence erroneous results [6], [7], [8]. For brevity, we call these perturbations adversarial attacks. Adversarial attacks have been extensively studied in the past few years. The L-BFGS attack [9] first introduced adversarial examples against deep neural networks. The Fast Gradient Sign Method (FGSM) [10] attacks the input by adding a small vector whose elements equal the sign of the elements of the gradient of the cost function. The Basic Iterative Method (BIM) and the Iterative Least-Likely Class Method (ILLC) [11] extend FGSM by running the optimization for multiple iterations. Many other adversarial attacks have also been proposed, for example, the Jacobian-based Saliency Map Attack (JSMA) [12], DeepFool [13], and the CW attack [14]. Clearly, all of these attacks can be viewed as specific attempts to fool machine learning models.
A regularization term can be equivalently seen as a prior distribution reflecting our prior knowledge [15], [16], [17]; intuitively, the output distribution of a good model should be insensitive and smooth. Because regularization is often executed by adding regularization terms that introduce additional information, our study can be regarded as regularization that adds adversarial information in order to manage the gap between adversarial examples and natural examples. Adversarial examples [18], [19], [20] are defined as examples produced by adversarial attacks that are almost indistinguishable from natural examples but can deceive the model [21], [22]. Our study focuses on incorporating adversarial examples into the training process, called adversarial training [23], [24]. In retrospect, this training method is in fact quite intuitive: the model defends against adversarial attacks by smoothing itself along adversarial directions, that is, the most sensitive input-space directions. Several studies have confirmed that adversarial training is effective [25], [26].
In general, exploiting adversarial examples to smooth the output distribution is a universal strategy for adversarial training, but its drawbacks are also obvious. One major drawback in the philosophy of adversarial training is the computational overhead of adversary generation: producing adversarial examples requires multiple gradient computations, resulting in unacceptable time cost. Besides, simple augmentation with adversarial examples often leaves the hybrid decision boundary particularly sensitive.
To alleviate the aforementioned challenges, many alternatives have been motivated. However, solving mismatched class distributions and heavy computational cost simultaneously still poses two main problems. The first is how to make the decoupling of the adversary update from the gradient calculation work collaboratively with smoothness enforcing. The second is how to control the trend of training so as to improve both classification ability and robustness.
To this end, this paper designs a robust feedback training loop that integrates smoothness enforcing methods into adversarial training, thereby realizing this collaboration. The resulting method is called Smoothness Enforcing Adversarial Training (SEAT). SEAT reuses the gradient information computed when updating model parameters. Meanwhile, random initialization and output smearing are integrated into the process to mitigate the mismatch problem with better generalization performance. Furthermore, this paper designs temporal ensembling as an implicit self-ensemble of the adversarial information, which benefits from a longer memory. In each iteration, smoothness constraints are imposed.
In summary, our main contributions are three-fold:
• SEAT realizes smoothness enforcing under the projected gradient descent adversarial training framework by combining smoothness enforcing with gradient information recycling. By doing so, SEAT keeps smoothness enforcing and adversarial training working collaboratively, which greatly improves robustness and eliminates the computational overhead.
• SEAT enhances the adversarial information obtained from a longer memory by introducing temporal ensembling, which trains the output distribution to be smooth on the low-dimensional manifold around local adversarial perturbations. By doing so, SEAT enriches the expression of adversarial samples and executes a robust feedback training loop.
• SEAT makes the generated decision boundary insensitive for classification by introducing smoothness enforcing methods, random initialization, and output smearing, which impose smoothness constraints. By doing so, SEAT exhibits smooth behavior in its classification results.
The remainder of this paper is organized as follows. Section II presents the related work. In Section III, we first formulate adversarial training and then introduce our algorithm for solving the proposed problem. Section IV reports results and experimental analyses of its effectiveness. Finally, we conclude in Section V.

Related Work
Goodfellow et al. [10] first proposed generating adversarial examples at each step and adding them to the training process. This attempt successfully incorporates adversarial information into the learning of the network and achieves excellent performance. Madry et al. [27] proposed training with PGD attacks instead of only non-iterative attacks to defend against powerful first-order attacks. This work has been followed by several variants. Xie et al. [28] defend against adversarial attacks by applying diverse feature denoising modules. Dhillon et al. [29] propose Stochastic Activation Pruning (SAP), which prunes a random subset of activations and scales up the survivors for balance. A great amount of effort has been devoted to PGD-based adversarial training, and many efficient methods have been proposed [30], [31]. However, in practice, these methods often require huge computational cost and are usually not easy to implement. To alleviate these challenges, Shafahi et al. [32] eliminate the overhead of adversary generation by recycling gradient information, and Zhang et al. [33] model the optimization formulation as a differential game. Compared with these state-of-the-art methods, our method achieves better performance.

Adversarial Training
We consider the adversarial training task with an underlying data distribution $D$ over pairs of examples $x \in \mathbb{R}^d$ and corresponding labels $y \in Q$, where $d$ is the input dimension, $x_i \in X$, and $Q$ is the space of all labels. We also assume $J$ is a pre-defined loss function, for example, the cross-entropy loss. We aim to learn a function $f: X \to [0,1]^Q$, parameterized by model parameters $\theta \in \mathbb{R}^p$, by solving the optimization problem
$$\min_\theta \mathbb{E}_{(x,y)\sim D}\left[J(x, y, \theta)\right].$$
Adversarial training incorporates adversarial examples into training, so we first briefly introduce the two most relevant adversarial attacks that generate such examples. Neural networks use Euclidean distance to approximate the perceived distance of the input space; when an image is mapped into a high-dimensional space, there are special directions in which an immeasurably small offset, corresponding to an imperceptibly small perceptual distance, yields a completely different category in the classification task. FGSM obtains the optimal perturbation under the infinity norm by
$$x_{adv} = x + \epsilon \cdot \mathrm{sign}\left(\nabla_x J(x, y, \theta)\right). \quad (1)$$
We apply a transformed version of this strategy in our training method. Instead of a single-step attack, Projected Gradient Descent (PGD) is a more powerful multi-step variant:
$$x_{adv}^{t+1} = \Pi_{x+S}\left(x_{adv}^{t} + \alpha \cdot \mathrm{sign}\left(\nabla_x J(x_{adv}^{t}, y, \theta)\right)\right), \quad (2)$$
where $\Pi_{x+S}$ projects back onto the set of allowed perturbations. This attack strategy is considered one of the most powerful first-order attacks, and we exploit it to test defense performance. Based on the above adversarial attacks, the natural saddle point problem in adversarial training can be derived [14]:
$$\min_\theta \mathbb{E}_{(x,y)\sim D}\left[\max_{(x_{adv},\, x)\in S} J(x_{adv}, y, \theta)\right], \quad (3)$$
where the constraint on perturbations $(x_{adv}, x) \in S$ is equivalent to $\|x_{adv} - x\|_p \le \epsilon$. Generally, we cannot obtain a closed form for the perturbation $\delta$; FGSM gives an approximation under the infinity norm, $\delta \approx \epsilon\, \mathrm{sign}(\nabla_x J(x, y, \theta))$. The goal of the inner loop of the min-max formulation in Eq. (3) is to find the maximum perturbation $\delta_{max}$ within the bound, while the outer loop aims to minimize the loss on the perturbed inputs. Generally, adversarial training handles both the non-convex outer minimization and the non-concave inner maximization by computing gradients with respect to the inputs to generate adversarial samples and using Stochastic Gradient Descent (SGD) to update the network parameters.
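To make the attack concrete, the following is a minimal sketch of the PGD attack of Eq. (2), assuming a PyTorch classifier with cross-entropy loss and inputs in $[0,1]$; the function name and default hyperparameters are illustrative, not part of the original method.

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=20):
    """Multi-step PGD under an l-infinity bound, as in Eq. (2)."""
    x_adv = x.clone().detach()
    # Random start inside the epsilon-ball around x.
    x_adv = torch.clamp(x_adv + torch.empty_like(x_adv).uniform_(-eps, eps), 0, 1)
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            # Ascend along the sign of the input gradient ...
            x_adv = x_adv + alpha * grad.sign()
            # ... and project back onto the epsilon-ball and the valid pixel range.
            x_adv = torch.clamp(torch.clamp(x_adv, x - eps, x + eps), 0, 1)
    return x_adv.detach()
```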

Proposed SEAT
Adversarial training essentially relies on the expensive generation of adversarial examples. Therefore, we make some assumptions on the training to approximate this generation process. We model the training as a noise approximation problem and solve for the optimal noise of each period with smoothness enforcing. With this compromise, the first objective function is given by
$$\min_\theta \mathbb{E}_{(x,y)\sim D}\, D\left[q(y|x),\; p(y|x+\delta_{adv}, \theta)\right], \quad (4)$$
where $D[q, p]$ is a function, for example, the cross entropy, that measures the distance between two conditional label distributions $q$ and $p$ given inputs, and $p(y|x+\delta_{adv}, \theta)$ is the probability distribution of the noisy input parameterized by $\theta$. This loss encourages the prediction to remain consistent with the actual result after adding the adversarial perturbation. $\delta_{adv}$ is the perturbation, defined as
$$\delta_{adv} = \arg\max_{\|\delta\|\le\epsilon} D\left[q(y|x),\; p(y|x+\delta, \theta)\right]. \quad (5)$$
In this study, we replace the generation of adversarial examples by computing the ascent step through reuse of the backward pass of the descent step. Firstly, inspired by [32], the gradient with respect to the network and the gradient with respect to the inputs can be shared: given the gradient used to update the network parameters, we can also obtain the gradient with respect to the inputs. Secondly, the network updates continuously after gradient computations; for arbitrarily complicated adversarial information, repeated and enhanced coding ensures that the information is not lost, so we perform this operation over a long period of time. More importantly, we assume that the outputs are smooth; the smoothness assumption prompts us to make the conditional output distribution smooth with respect to the conditional input.
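As a minimal sketch of the objective in Eq. (4), assuming KL divergence is chosen as the distance $D$ and that `q` holds soft targets for $q(y|x)$ (both names are illustrative assumptions):

```python
import torch.nn.functional as F

def smoothness_loss(model, x, delta_adv, q):
    """D[q, p]: divergence between the target distribution q(y|x) and the
    model's prediction on the adversarially perturbed input, as in Eq. (4)."""
    log_p = F.log_softmax(model(x + delta_adv), dim=1)
    return F.kl_div(log_p, q, reduction='batchmean')
```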
Since randomizing the outputs is a simple way to enforce smoothness, we apply this technique, output smearing, to construct diverse training sets. The noisy labels are established as follows:
$$\hat{y}_i = y_i + std \cdot z_i, \quad (6)$$
where $z_i$ is sampled independently from the standard normal distribution and $std$ is the standard deviation. We then normalize $\hat{y}$. By doing so, we construct three diverse training sets from the initial training set.
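Below is a minimal sketch of output smearing as in Eq. (6), assuming one-hot label tensors; clamping negative entries before renormalizing is our own assumption, since the text only states that $\hat{y}$ is normalized.

```python
import torch

def output_smear(y_onehot, std=0.1):
    """Perturb one-hot labels with independent Gaussian noise (Eq. (6))."""
    z = torch.randn_like(y_onehot)   # z ~ N(0, 1), sampled independently
    y_hat = y_onehot + std * z
    # Assumption: clamp to non-negative values before renormalizing,
    # so that y_hat remains a valid label distribution.
    y_hat = y_hat.clamp(min=0)
    return y_hat / y_hat.sum(dim=1, keepdim=True)
```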
Generally, moving along the negative gradient direction of the loss yields the fastest model optimization rate. We thereby define the gradient direction as the adversarial direction. The adversarial direction is extremely sensitive, in that moving along it most reduces the probability of the correct label. This belief prompts us to construct the training input as
$$x_{adv} = x + \epsilon \cdot \mathrm{sign}\left(\nabla_x J(x, y, \theta)\right), \quad (7)$$
where, unlike the introduction of $x_{adv}$ in Eq. (1), the gradient $\nabla_x J(x, y, \theta)$ can be obtained efficiently by reusing the backward pass that computes $\nabla_\theta J(x, y, \theta)$. We initialize the perturbation by sampling independently from the standard normal distribution.
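The gradient reuse can be sketched as follows: one backward pass fills both the parameter gradients consumed by the optimizer and the input gradient used to build the perturbation. This is a sketch under our reading of the method, with illustrative names:

```python
import torch
import torch.nn.functional as F

def shared_backward_step(model, optimizer, x, y, eps):
    """One SGD step whose backward pass also yields the input gradient."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    optimizer.zero_grad()
    loss.backward()      # fills param.grad and x.grad in the same pass
    optimizer.step()     # descent step on theta
    # Ascent direction on the input, recycled from the same backward pass.
    delta = eps * x.grad.sign()
    return delta.detach()
```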

Algorithm 1
The proposed SEAT algorithm. Note that the updates of the temporal ensemble δ can equally be done in the minibatch loop; in this pseudocode they are applied between epochs.
Input: set of training input indices with known labels L with training data X, the ensemble momentum coefficient α, the perturbation bound ε, the iterative number T, learning rate λ.
1: Generate δ by sampling independently from the standard normal distribution
2: Initialize network parameters θ and δ_inter ← 0
3: Generate training sets by output smearing
4: for m in [1, num_epochs] do
5:   for each minibatch B do
6:     for i in [1, T] do
7:       Train the network on the minibatch from training set L using stochastic gradient descent
8:       Update network parameters θ
9:       Calculate δ via Eq. (8) by reusing the input gradient from the backward pass
10:     end for
11:   end for
12:   Update the temporal ensemble δ_inter via Eq. (9) and correct the start deviation via Eq. (10)
13:   Clip δ to [−ε, ε]
14: end for
15: return network parameters θ, noisy inputs X

Previous works run gradient descent twice, once to generate adversarial examples and once to fit them. Instead of generating adversarial examples, we recycle the gradient information and directly use the gradient obtained while updating the network. In detail, the reuse is defined as
$$\delta^{(m)} = \epsilon \cdot \mathrm{sign}\left(\nabla_x J(x, y, \theta)\right), \quad (8)$$
where $\nabla_x J(x, y, \theta)$ is obtained from the same backward pass that computes $\nabla_\theta J(x, y, \theta)$. One problem with δ is that the training is difficult to evaluate due to the unknown information distribution: we have no direct information about the inputs, which can lead to generalization errors caused by replays. We therefore propose temporal ensembling to alleviate this challenge, maintaining an exponential moving average of δ and penalizing large offsets. To begin with, we form a consensus ensemble of the learnable perturbation under multiple inputs of the network-in-training across epochs: we calculate the gradient information $\nabla_x J(x, y, \theta)$ for each minibatch, map it to δ, and continuously accumulate the δ of each period during training. The temporal ensembling is defined as
$$\delta_{inter} \leftarrow \alpha\, \delta_{inter} + (1-\alpha)\, \delta^{(m)}, \quad (9)$$
where α is the ensemble momentum coefficient controlling how far the ensemble reaches into learning history; this weight controls the contribution of the information learned in a single period to the total information. One benefit of temporal ensembling is that it is controllable: we uniformly control the span of the updates that incorporate the learned information into the training process, yielding a better δ. We then correct the start deviation by
$$\hat{\delta} = \delta_{inter} \,/\, (1 - \alpha^m), \quad (10)$$
where m is the current epoch index. With temporal ensembling, we construct a δ that benefits from a long memory. Once δ is computed, our method simply becomes the computation of the divergence between the distributions $q(y|x)$ and $p(y|x+\delta_{adv}, \theta)$. However, we have little control over the effect of temporal ensembling and gradient reuse: for smoothness enforcing, the values computed by temporal ensembling and gradient-reuse replay can be unreliable or even erroneous, and these incorrect values would degrade performance. We can further refine the values based on the smoothness assumption. For simplicity, we use clipping, which considers the neighboring computed value and the overall value simultaneously. Since both replay and temporal ensembling cause abnormal fluctuations in δ, for each replay epoch and each temporal ensembling epoch the clip is imposed to measure the stability of δ: if δ > ε or δ < −ε, we regard the δ calculated in that round as unstable, eliminate it, and constrain it to equal the positive or negative perturbation bound.
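A minimal sketch of the temporal ensembling update of Eqs. (9) and (10), followed by the clip to the perturbation bound; it assumes epoch indices start at 1, and the function and variable names are illustrative.

```python
import torch

def update_temporal_ensemble(delta_inter, delta_new, alpha, epoch, eps):
    """EMA of per-epoch perturbations (Eq. (9)), start-deviation correction
    (Eq. (10)), and clipping of unstable values to the perturbation bound."""
    delta_inter = alpha * delta_inter + (1.0 - alpha) * delta_new
    delta_hat = delta_inter / (1.0 - alpha ** epoch)  # correct start deviation
    delta_hat = torch.clamp(delta_hat, -eps, eps)     # clip unstable values
    return delta_inter, delta_hat
```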

Experiments
We implement a series of experiments to evaluate the performance of our method. The first experiment, conducted on the MNIST, CIFAR-10, and CIFAR-100 datasets, shows that our method achieves the best defensive performance with competitive time overhead in comparison with state-of-the-art methods. Moreover, we provide several ablation studies to verify the indispensability of each module. We then compare our method with [32] under the same replay parameters and different attacks. We further discuss the effect of perturbation bounds and report results on CIFAR-10 for different perturbation bounds. Finally, we report a visual evaluation of the noisy inputs, demonstrating that our method is explainable and can defend against attacks reliably. We use SGD with a momentum of 0.9 and a weight decay of 5e-4. The learning rate starts from 0.1, is divided by 10 between the 12th and 22nd epochs, and is set to 0.01 after the 22nd epoch. The experiments are run on NVIDIA Tesla P40 GPUs. For all experiments, the batch size is 128.
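For reference, the reported optimizer settings correspond to the following PyTorch configuration; the placeholder network and the MultiStepLR milestones are one reading of the schedule described above, not a definitive reproduction.

```python
import torch
import torch.nn as nn

model = nn.Linear(3 * 32 * 32, 10)  # placeholder; a deep network in practice
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
# One reading of the schedule: lr 0.1, divided by 10 at epochs 12 and 22.
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[12, 22], gamma=0.1)
```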

Natural Validation Accuracy and Robustness on MNIST, CIFAR-10 and CIFAR-100
We compare our SEAT with state-of-the-art methods on the widely adopted MNIST, CIFAR-10, and CIFAR-100 benchmarks. All reported results are means over repeated experiments with different random seeds. Since [27] did not report full experimental details, we directly copy the results they reported.
The results in Table 1, Table 2, and Table 3 indicate that SEAT achieves state-of-the-art robustness. In Table 2 and Table 3, both the natural accuracy and the robustness accuracy on CIFAR-10, which has fewer categories, are better than the results on CIFAR-100, which has more; datasets with more categories are clearly more challenging for classifiers. Overall, the proposed SEAT ranks first in robustness accuracy on all datasets. In detail, according to Table 1, SEAT achieves the best robustness accuracy of 97.21% against PGD-40 and 93.73% against CW attacks on MNIST. It also achieves the best robustness accuracy of 49.22% on CIFAR-10 and 27.85% on CIFAR-100 against the PGD-20 attack, which is much better than the other state-of-the-art methods.

Moreover, when defending against strong PGD-100 attacks on CIFAR-10, the best robustness accuracy of the proposed SEAT is 48.80%, which is 2.04% higher than the second-ranked 10-PGD adversarial training. SEAT also outperforms the state-of-the-art methods on both CIFAR and MNIST against CW attacks. Many adversarial training experiments show that it is much more difficult to improve robustness under a controlled training time; nevertheless, the proposed SEAT achieves competitive computational overhead as well. The time cost improves considerably because SEAT does not generate adversarial samples. It should be noted that some methods have very small computational overhead but fail to achieve acceptable robustness accuracy. In contrast, SEAT effectively reduces the time cost by reducing the number of replays, and good replay parameter values span a large range. Since SEAT works much better than other methods on robustness, it achieves only acceptable results on natural accuracy; note that, given the basic trade-off between robustness and generalization, a small compromise in accuracy is acceptable in exchange for a substantial increase in robustness [32]. Following other adversarial training methods, we also apply the replay technique, training the model on the same minibatch t times in a row. Since the replay parameter can be selected without restriction, we only report experimental results for replay parameters that achieve good performance. In practice, a good replay parameter value is less than 10; outside this range, excessive replays cause rapid performance degradation. From the results, good values occur roughly when the replay parameter is between 4 and 8. In addition, as the replay parameter increases, the robustness of the model improves, which also leads to a decrease in accuracy on clean data. With this compromise, we arrive at our best replay parameter range. From Table 2 and Table 3, the correlation between robustness accuracy and natural accuracy is weakened when the replay parameters are in the optimal range, which indicates the stability and effectiveness of our method.

Analysis of Perturbation Bounds and Replay Parameters
To further study the robustness of the proposed method, we examine the effect of different perturbation bounds and different replay parameters against different attacks, combining the two parameters to analyze their relationship with performance. For this analysis, we use the same ε as during training. CIFAR-10 is selected because the changes in robustness accuracy on CIFAR-10 are the most intuitive.
Concerning the replay parameter k in Fig. 1, when k is relatively large, the robustness accuracy changes relatively little as the perturbation bound varies, which means that SEAT shows satisfactory stability under the parameter setting with the best robustness performance. As the perturbation bound increases, the robustness accuracies for different values of k fluctuate; this phenomenon is most evident for the attacked groups, including PGD-100, PGD-50, and PGD-20. This observation reflects that, although there is a basic trade-off between robustness accuracy and natural accuracy, our method makes an acceptable compromise on natural accuracy for a significantly improved robustness, and SEAT does not have much influence on natural accuracy.

Ablation Study
To assess the importance of the various components of the model, we design several ablation studies with ε = 8. In the experiments, we find that the clip has an important influence on the results. Since we apply the clip twice, we refer to the two applications as the 1st clip and the 2nd clip, respectively, and design four ablations: removing the 1st clip; removing the 2nd clip; removing the temporal ensembling and the 2nd clip; and removing the 1st clip, the temporal ensembling, and the 2nd clip. Specifically, "w/o 1,2-th clip" denotes SEAT without both clips, "w/o 2-th clip and TE" denotes SEAT without the 2nd clip and the temporal ensembling, and "w/o 1,2-th clip and TE" denotes SEAT without both clips and the temporal ensembling.
Table 4 and Table 5 compare the robustness accuracy of the proposed SEAT under the different combinations of ablation strategies on CIFAR against PGD-20 attacks, and indicate that SEAT performs best when all modules are used. Clearly, the temporal ensembling greatly improves the robustness of SEAT, and the 2nd clip further optimizes the results through its invariance constraint. Although the 2nd clip does not achieve significant improvements for SEAT, it still improves stability and insensitivity without extra complexity. Moreover, the 1st clip is essential for SEAT: removing it causes a significant drop in robustness accuracy. We speculate that this is because multiple replays without the clip cause δ to lose a lot of information, leaving only a few outliers. However, good performance is not available at all times: Figure 2 summarizes the results on natural data on CIFAR. Generally, the result is not too counterintuitive; the temporal ensembling and the clip also bring a compromise in natural accuracy.

Conclusion
This paper proposes an algorithm named SEAT to defend against strong first-order PGD attacks. In detail, this paper places smoothness enforcing methods under the adversarial training framework, so that smoothness enforcing can smooth the outputs efficiently within an improved adversarial training process. To eliminate the computational overhead, the gradient with respect to the network and the gradient with respect to the inputs are shared in one pair of forward and backward propagations. Moreover, temporal ensembling is designed to induce smoothness on the data manifold over a long history. Furthermore, to make the generated decision boundary insensitive for classification, smoothness enforcing methods, random initialization, and output smearing are introduced to impose smoothness constraints.
The robustness results of our experiments on the benchmark datasets outperform the current state-of-the-art adversarial training methods, which indicates that SEAT is an effective adversarial training method. Furthermore, the ablation studies show that the proposed modules are meaningful. The simplicity of our method is also worth emphasizing: SEAT is free from generating expensive adversarial examples and runs SGD on the adversarial loss, which makes its computational overhead competitive. These results verify that SEAT effectively defends against adversarial attacks.


Figure 1. Robustness accuracy of SEAT varying with perturbation bounds and replay parameters on CIFAR-10.

Table 1. Results of MNIST robust training with various methods.

Table 2. Results of CIFAR-10 robust training with various methods.
The CIFAR-10 dataset consists of 60000 32×32 color images in 10 categories, with 6000 images in each category. There are 50000 training images and 10000 test images. CIFAR-100 is similar to CIFAR-10 except that it has 100 categories, each containing 600 images. Each category has 500 training images and 100 test images.

Table 3. Results of CIFAR-100 robust training with various methods.
In our experiments, the perturbation bound ε is set to 8/255 in an infinity-norm sense for CIFAR. For MNIST, we set the perturbation size to ε = 0.3 in an infinity-norm sense. The maximal number of training epochs is set to 26 in all experiments. When testing, we perform multi-step PGD with step size 2/255 and ε = 8/255. For CW attacks, we set c = 5e2 and lr = 1e-2 for both CIFAR and MNIST.

Table 4. Effectiveness of the clip and the temporal ensembling on CIFAR-10 against PGD-20 attacks.

Table 5. Effectiveness of the clip and the temporal ensembling on CIFAR-100 against PGD-20 attacks.