On the convergence acceleration of the vanilla Generative Adversarial Network

Generative Adversarial Networks (GANs) are very successful and widely used in both academia and industry. The framework is inspired by game theory: two models, a generator and a critic, compete with each other, learn from each other, and make each other stronger. Most improvements have focused on stability, and there are already many successful variants such as the deep convolutional GAN (DCGAN) and the Wasserstein GAN (WGAN). But the speed of convergence during training is also an essential property that needs to be improved. In this paper, a decreasing noise is added to the MNIST data, and the perturbed data are delivered to the discriminator of the vanilla GAN to accelerate training. The results show that this change can accelerate convergence without significantly affecting the quality of the results, and the improvement should be general enough to apply to other models.


Introduction
Generative models are already widely used in many areas. A generative model is a model that generates new pictures from a data set. Generating pictures is important because it can be applied in many different fields: image fixing, tumour prediction, and facial identification could all work much better with a powerful generative model. With more and more companies focusing on it, the generative model is becoming increasingly important today.
One of the most popular generative models is the GAN [1]. Many models based on the original GAN have succeeded in many areas, such as conditional picture creation (CGAN [2]), super-resolution (SRGAN [3]), natural language processing, face recognition, and picture fixing. In recent years, applications of GANs have achieved impressive results, and the model is easy to introduce to industry. OpenAI, Google, NVIDIA, and many other companies have their own models based on the GAN. Benefiting from these advantages, the GAN has become one of the most important generative models.
However, the training of GANs faces several persistent problems. The most important is that a GAN always trains with a risk of falling into mode collapse. To address this, DCGAN [4] and WGAN [5] were created to enhance stability. But besides stability, speed is also a critical property of a GAN model. Since the diffusion model [6,7] appeared, it has been very successful in image generation because of its amazing image quality. The diffusion model is a Markov chain trained using variational inference [8]. In several fields, the diffusion model shows more potential and better results than the GAN. But the diffusion model also has its own problem: it usually needs more resources than a GAN to train on the same data set, and it always takes much more time to train [9]. So improving the training speed of these models has become a critical problem; with a faster training process, people could save resources and advance their research.
In this paper, I mainly focus on how to improve the GAN model. The idea is inspired by improved GANs [10,11] and by the diffusion model, which applies increasing noise during training. Instead of modifying the structure or the loss function, a decreasing noise function is added to the real data, and the mixed data are delivered to the discriminator to make discrimination harder. The improved GAN adds noise to every layer of the generator as well as to the input, whereas here only the input is modified. This modification should be general because it changes neither the structure nor the loss function, so ideally it is compatible with other models.
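As an illustration, the following is a minimal sketch of the modification in PyTorch. The schedule function noise_std and its parameters decay_steps and initial_std are hypothetical placeholders, not the paper's exact schedules (those are given later as Equations 1-3); the point is only that the perturbation decays toward zero with the training step.

import torch

def noise_std(step, decay_steps=5000, initial_std=0.5):
    # Hypothetical decreasing schedule: the perturbation fades to zero,
    # so that late in training the discriminator sees clean real data.
    return initial_std * max(0.0, 1.0 - step / decay_steps)

def perturb_real_batch(real_images, step):
    # Only the real batch fed to the discriminator is modified; the
    # generator, the loss functions, and the architecture are untouched.
    z = torch.randn_like(real_images)  # z ~ N(0, 1), same shape as the images
    return real_images + noise_std(step) * z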
The training is performed on the MNIST data set. The results compare the loss functions and image quality of models with and without the noise function, and show that the noise function helps accelerate convergence in the first 3,000-6,000 steps, producing a good enough picture earlier. The noise function also does not affect the final quality. These results suggest that adding noise to the real picture data improves the model, and that the improvement depends strongly on the noise function chosen. In the experiment section, several different functions are added to the model, and the resulting models show different training speeds.

Related work
Generative Adversarial Networks (GAN) and the conditional variant CGAN provide a very successful method for generating fake data. The quality of images generated by GANs is much better than that of other generative models, and the GAN offers a generalizable way to train deep learning models. To improve its stability, DCGAN was proposed: it introduces convolutional layers into the vanilla GAN, which helps extract features, and the image quality becomes better than the vanilla GAN's. But it is still very hard to train and sensitive to hyperparameters. WGAN [12] is expected to solve the mode collapse problem; it replaces the loss function with the Wasserstein distance, and there is another model based on it, called WGAN-GP. The diffusion model is a recently popular generative model. DDPM (denoising diffusion probabilistic models) and many other types of diffusion models have already been put into use, and these models show amazing quality in image generation and text-to-image fields. Despite its time cost, the diffusion model is a very powerful tool for generating new data.

Method
The method is outlined in Figure 1. The data set is the handwritten picture data set MNIST, a very widely used data set containing about 70,000 pictures. Training on it can generate pictures easily and quickly with few resources. The MNIST digits are also very easy for a human to recognize; they range from 0 to 9, which helps us evaluate the results.
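For reference, a minimal sketch of loading this data set, assuming a PyTorch pipeline with torchvision (the paper does not specify its data loading):

import torch
from torchvision import datasets, transforms

# 28x28 grayscale digits, rescaled from [0, 1] to [-1, 1] to match a
# tanh output layer in the generator (a common convention, assumed here).
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,)),
])
mnist = datasets.MNIST(root="data", train=True, download=True, transform=transform)
loader = torch.utils.data.DataLoader(mnist, batch_size=128, shuffle=True)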
The GAN model is a two-player zero-sum game [1,13] whose main objective is to reach a Nash equilibrium. However, the discriminator is usually much stronger than the generator at the beginning of training. During this period training is slow, because the discriminator cannot give useful "advice" to strengthen the generator. Making the discriminator's job harder can therefore help train the model. To explain this more mathematically, the variational lower bound need not be constant even when the JS divergence is constant; if the discriminator is crippled, the lower bound is no longer tight. This yields a non-constant objective, so the adversarial network can still work and roughly guide the model in the right direction during gradient descent [14]. Figure 2 shows the diffusion model's training process. In the forward process, the model keeps adding Gaussian noise to the original picture until it becomes pure Gaussian noise. The training process then teaches the model to estimate the noise added at every step, and finally the model can produce a generated picture by reversing this process. The model needs to add an increasing amount of noise, because the original picture is affected less and less by each Gaussian perturbation; the noise strength must be adjusted step by step to ensure that the picture becomes pure Gaussian noise within a finite number of steps.
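For reference, the forward (noising) step of the standard diffusion formulation [6] can be written as

x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps,  eps ~ N(0, I),

where the variance schedule beta_t increases with the step t, so that after finitely many steps the image x_T is close to pure Gaussian noise.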
One point needs attention. In the GAN training process, the noise can help accelerate training and guide the model along the gradient descent direction. But we still want the discriminator to distinguish fake data from real data once the generator is good enough, which means the noise is unnecessary in the later period of training. So, instead of the increasing noise schedule used in the diffusion model, the GAN needs a decreasing noise to ensure that the effect fades by the end.
Equations 1, 2 and 3 show the different noise functions used in the experiment; with them, the model is expected to generate hand-written images faster than the model without noise. Here z1 and z2 denote the noise added to the real data, and z ~ N(0,1) is sampled from the standard Gaussian distribution. Also, to illustrate the training process clearly, the loss functions are plotted against the training steps. The loss functions used for training are given in Equation 4; the discriminator and generator losses, written D(x) and G(x) respectively, are both binary cross-entropy functions.
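As a concrete reference, the following is a minimal PyTorch sketch of these binary cross-entropy losses, assuming the standard non-saturating form of the vanilla GAN objective (the paper states only that both losses are binary cross-entropy, so the exact form is an assumption); the function names discriminator_loss and generator_loss are hypothetical.

import torch
import torch.nn.functional as F

def discriminator_loss(discriminator, noisy_real, fake):
    # Equation 4 (discriminator side): binary cross-entropy with the
    # (noise-perturbed) real batch labeled 1 and generated samples labeled 0.
    real_logits = discriminator(noisy_real)
    fake_logits = discriminator(fake.detach())
    return (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
            + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))

def generator_loss(discriminator, fake):
    # Equation 4 (generator side), non-saturating form: the generator is
    # trained to make the discriminator label its samples as real.
    fake_logits = discriminator(fake)
    return F.binary_cross_entropy_with_logits(fake_logits, torch.ones_like(fake_logits))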
With this method, the model is expected to show a better result in less time than the vanilla GAN. Ideally, the quality of the images should not degrade because of the added function.

Results
This section illustrates the results of the different models. Figure 3 presents the loss functions of the model with noise. The two loss functions change rapidly in the first 7,500 steps and then fluctuate within a stable range. The figure shows that the loss function of the discriminator drops to roughly 0.75~1 after 5,000 steps; although it keeps fluctuating after 5,000 steps, its range is very stable.
In the training process of a GAN, the loss functions are expected to fluctuate because of the adversarial setup. In fact, the loss value itself cannot represent the quality of the images or the training results, but the trend of the loss function is informative.
Figure 4 shows the result for the model without noise. The trend after 5,000 steps does not change much, so we can focus on the first 5,000-7,000 steps. The figure illustrates that the loss function of the discriminator does not drop as fast as the former one; specifically, it fluctuates around 1 and ranges from 1 to 1.2 after 5,000 steps.
Comparing Figure 3 and Figure 4, the discriminator loss functions show different trends. After adding noise to the real data, the loss function drops faster than in the other model. The noise function is supposed to speed up training by making discrimination harder, and the faster drop of the loss function is consistent with that idea.
In addition, Figure 5 illustrates the other type of noise: a constant value, corresponding to Equation 2. The figure shows that this loss function also drops in the first 5,000 steps, but its end point is similar to that of the model without noise, perhaps because the noise is not strong enough to fool the discriminator. This difference shows that the strength of the noise is quite important for this improvement. In summary, the discriminator loss in Figure 3 drops faster than the other two in Figures 4 and 5, and it reaches a lower range: it finally fluctuates between 0.75 and 1.00, about 0.25 lower than the others. Figure 6 shows the results of the three models: the first picture is generated with noise after 3,748 steps, the second without noise after 5,622 steps, and the last with constant noise after 6,599 steps. The three images show similar quality, and all of them can be identified by a human. Figure 7 complements this by showing the other two models at the same 3,748 steps as the first; their quality and resolution are much worse than those of the model with the noise function. These pictures confirm that the image quality in the early period is indeed affected, and that the effect depends on the noise strength; with a proper strength, the training speed can be enhanced to some extent.

Discussion
The results shown in the figures are in accordance with expectations: the noise function disturbs the discriminator's ability to separate real data from fake data. If the discriminator is too strong, the generator struggles to improve; the noise avoids this situation to some extent.
It would be ideal to find a way to enhance both speed and quality, and the most promising direction may be to combine several methods. In DCGAN, convolutional layers are added to the generator to extract features. That approach is complementary to the noise function: DCGAN modifies the structure, while the noise only modifies the discriminator's input.
However, the noise only accelerates training; it does not help enhance the quality of the images. Also, if the model is asked to generate conditional pictures, it is still unknown whether the noise would help; it may prevent the model from learning the conditions clearly and exactly. Speed and quality are the two most important aspects of the model, so this method still has a lot of room for improvement.

Conclusion
From the results, the model with the noise function obtains a qualified picture earlier, and convergence is accelerated by the noise. In Figure 3, the discriminator loss drops faster and reaches a lower fluctuation range; compared with the curves in Figures 4 and 5, it has a stronger decreasing trend in the first 5,000 steps. In a GAN, the loss functions often keep fluctuating after a certain number of steps, and this happens in all three models. The curve in Figure 3 also has a lower bound: it fluctuates between 0.75 and 1, about 0.25 lower than the others. The hand-written pictures look similar, but they are generated after different numbers of training steps; the model with the noise function generates a comparable picture with fewer steps, which is the result expected in the introduction. From the two types of pictures, the added noise function helps accelerate the training process and brings convergence a little earlier.
Figure 3 illustrates a faster dropping trend in the first 5,000 steps. The noise function is expected to disturb the discriminator and strengthen the model in this way, which is supported by the results in Figures 3, 4 and 5; the digits in Figures 6 and 7 also support the idea. The goal of generative adversarial networks is to find an equilibrium that balances the generator and the discriminator, called the Nash equilibrium, and the noise adds a perturbation during this process. Without the perturbation, the model can easily fall into mode collapse through gradient descent; with it, the model can be pulled out of that dilemma, and the GAN's convergence is accelerated. As a very general way to improve the GAN model, this method could apply to many different types of GAN, so it is a useful way to help researchers and companies save training time, and it could be combined with other methods to improve both speed and quality. However, the method has not been tested on the conditional GAN to see whether the noise affects conditional results; that will be the next step of this research. The method also does not show a very impressive improvement on the MNIST data set, and it would be ideal if it showed a satisfactory result on a larger data set.

Figure 1. Introduction pipeline of the method section.

Figure 2. Training process of the diffusion model.

Figure 5. Loss function with constant noise. The discriminator loss in this model stays roughly within 1.0~1.2 after about 3,000 steps; the image quality at this point is expected to be similar to that of the other models.

Figure 6. Visualization results of the three models. From left to right: with noise, without noise, and with constant noise.

Figure 7. Visualization results of no noise (first) and constant noise (second), both after 3,748 steps.
