Face image generation and feature visualization using deep convolutional generative adversarial networks

Generative Adversarial Networks (GANs) aim to generate realistic and recognizable images, including portraits, cartoons and other modalities. Image generation has broad application prospects and important research value in the fields of public security and digital entertainment, and has become one of the current research hotspots. This article introduces and applies an important image generation model, the Generative Adversarial Network (GAN). Unlike recent image processing models such as the Variational Autoencoder (VAE), a GAN consists of a generative network that produces candidates and a discriminative network that evaluates them. The discriminative network learns to distinguish generated candidates from real ones, while the generative network learns to map from a latent space to the data distribution of interest. In this article, the GAN model and some of its extensions are implemented on the CelebA dataset, and the details are discussed through the images and graphs produced by the models. The GAN framework can accommodate specific training methods for various models and optimization algorithms. The experimental findings in this article show how the framework's potential may be assessed quantitatively and qualitatively using the generated samples.


Introduction
Images, especially face images, have a wide range of applications and important research significance in human life [1,2]. With the development of science and technology, various image acquisition sensors have appeared in daily life, producing different forms of face images, such as the face photos used in identity authentication and criminal investigation, as well as the face portraits and face cartoons used in digital entertainment. These image types constitute different face image domains. Face image generation aims to design mathematical models so that computers can automatically generate natural and realistic face images, covering portrait synthesis, comic synthesis, age synthesis, super-resolution reconstruction and facial beautification, and it has become one of the research hotspots today [3,4].
The Generative Adversarial Network [5] achieves its function through a game between two neural networks. The method was first proposed by Ian Goodfellow in 2014 and, owing to its superior performance, quickly became a major research hotspot in the following years. Even today, many researchers devote considerable energy and attention to this model, further updating and optimizing it and applying it to more areas. A GAN consists of a generative network and a discriminative network. The generative network takes random samples from the latent space as input, and its outputs must resemble the real samples in the training set. This article implements one of GAN's extension models, DCGAN, which stands for Deep Convolutional Generative Adversarial Networks [6]. The model embodies certain architectural constraints and has proven to be a strong unsupervised learner for image processing. With sufficient training on image datasets from diverse domains, the deep convolutional adversarial pair learns a hierarchy of representations, from object parts to entire scenes, in both the generator and the discriminator. The DCGAN discriminator has demonstrated performance competitive with other well-known unsupervised algorithms when evaluated on a variety of image classification tasks. The DCGAN generator, in turn, has been shown to exhibit intriguing vector arithmetic properties that allow easy manipulation of many semantic qualities of generated samples [7].

Dataset
The dataset used in this article is CelebA, the CelebFaces Attributes dataset [8]. It is a sizable face attributes collection of more than 200K celebrity photos, each of shape (64, 64, 3). The photos cover a wide range of poses and cluttered backgrounds. With 10,177 identities, 202,599 images, 5 landmark locations and 40 binary attribute annotations per image, CelebA offers great diversity, large quantity and rich annotations. It supports computer vision tasks in various fields, from face attribute recognition to face detection, and even landmark (facial component) localisation. What is more, it is widely used for face synthesis and editing, with both training and test splits.

KL divergence
The Kullback-Leibler divergence quantifies the difference between two probability distributions $p(x)$ and $q(x)$ over the same space $\mathcal{X}$:

$$D_{KL}(p \,\|\, q) = \int_{\mathcal{X}} p(x) \log \frac{p(x)}{q(x)} \, dx$$

A straightforward way to understand the KL divergence of $q$ from $p$ is as the excess surprise one would anticipate from using $q$ as a model when the actual distribution is $p$. It attains its minimum of zero when $p(x) = q(x)$ for all $x \in \mathcal{X}$. The formula also makes it clear that the KL divergence is asymmetric. When $p(x)$ is close to zero but $q(x)$ is significantly non-zero, the effect of $q(x)$ on $D_{KL}(p \,\|\, q)$ is largely ignored. When a symmetric measure of similarity between two equally significant distributions is needed, this can lead to problematic results.
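As a concrete illustration (a minimal numpy sketch, not part of the article's experiments), the discrete form of the KL divergence and its asymmetry can be checked numerically on hypothetical distributions:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Discrete KL divergence D_KL(p || q) = sum_x p(x) log(p(x) / q(x))."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    with np.errstate(divide="ignore", invalid="ignore"):
        # Terms with p(x) = 0 contribute nothing; eps guards q(x) = 0.
        terms = np.where(p > 0, p * np.log(p / (q + eps)), 0.0)
    return float(np.sum(terms))

p = [0.8, 0.15, 0.05]
q = [1/3, 1/3, 1/3]
print(kl_divergence(p, p))  # ~0 for identical distributions
print(kl_divergence(p, q))  # differs from kl_divergence(q, p): asymmetric
print(kl_divergence(q, p))
```

Swapping the arguments changes the value, which is exactly the asymmetry discussed above.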

JS divergence
The Jensen-Shannon divergence is another important measure of similarity between two probability distributions:

$$D_{JS}(p \,\|\, q) = \frac{1}{2} D_{KL}\!\left(p \,\Big\|\, \frac{p+q}{2}\right) + \frac{1}{2} D_{KL}\!\left(q \,\Big\|\, \frac{p+q}{2}\right)$$

Although the definition of JS divergence is based on KL divergence, it has notable differences: it always has a finite value. Furthermore, the advantage of JS divergence over KL divergence is that JS divergence is a symmetric measure between the two distributions.
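The two properties just mentioned, symmetry and finiteness, can be verified directly in a small numpy sketch (hypothetical distributions, illustrative only):

```python
import numpy as np

def kl(p, q):
    """Discrete KL divergence, using the 0 * log 0 = 0 convention."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    with np.errstate(divide="ignore", invalid="ignore"):
        return float(np.sum(np.where(p > 0, p * np.log(p / q), 0.0)))

def js_divergence(p, q):
    """D_JS(p || q) = 1/2 KL(p || m) + 1/2 KL(q || m) with m = (p + q) / 2."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p, q = [0.8, 0.15, 0.05], [0.1, 0.2, 0.7]
print(js_divergence(p, q) - js_divergence(q, p))  # symmetric: difference ~0
# Finite (equal to log 2) even for disjoint supports, where KL is infinite:
print(js_divergence([1, 0], [0, 1]))
```

The disjoint-support case is exactly the situation that makes KL divergence problematic for GAN analysis.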

GAN
Generative adversarial networks (GANs) are a new method for semi-supervised and unsupervised learning. They accomplish this by implicitly modelling high-dimensional data distributions. A GAN can be defined as a pair of networks trained in competition with one another, and it has two main parts. First, a discriminator D estimates the probability that a given sample comes from the actual dataset; it functions as a critic and is trained to distinguish genuine samples from fake ones. Second, a generator G produces synthetic samples given a noise variable input z, which introduces diversity into the possible outputs. G is trained to capture the genuine data distribution so that its generated samples are as authentic as possible; in other words, it can fool the discriminator into assigning them a high probability.
During training, these two models compete against each other: the generator G tries hard to mislead the discriminator, while the critic model D tries its best not to be tricked. This intriguing zero-sum game between the two models encourages each to enhance its capabilities.
Let $p_z$, $p_g$ and $p_r$ denote the distribution of the noise input $z$, the distribution of the data generated by $G$, and the real data distribution over samples $x$, respectively. The standard goal of data generation via maximum likelihood estimation is to find model parameters under which the actual data have high density:

$$\theta^* = \arg\max_\theta \sum_i \log p_\theta(x_i)$$

The issue is that $p_\theta(x)$ is usually intractable to compute. The idea of GAN is instead to build a discriminator that uses the power of deep discriminative models to determine whether a data instance is real or artificially created. While the generator $G$ seeks to produce high-quality data to deceive the discriminator, the discriminator $D$ strives to accurately distinguish the real data from the fake model-generated data. Ideally, $G$ fits the genuine underlying data distribution when $D$ can no longer discriminate between true and generated data. Consider the generator

$$x = G(z), \quad z \sim p_z,$$

where this mapping must be differentiable but carries no invertibility requirement; the basic implementation is a multi-layer perceptron. Consider the discriminator $D(x) \in [0, 1]$, which can be implemented by any neural network with a probabilistic output (e.g., a multi-layer perceptron with a logistic output, AlexNet, etc.). The joint objective of GAN can then be denoted as

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_r}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))],$$

which is clearly a minimax game. The discriminator is updated by gradient ascent on $V$:

$$\theta_D \leftarrow \theta_D + \eta \, \nabla_{\theta_D} \Big( \mathbb{E}_{x \sim p_r}[\log D(x)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))] \Big),$$

and the generator by gradient descent:

$$\theta_G \leftarrow \theta_G - \eta \, \nabla_{\theta_G} \, \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z)))].$$

For a fixed $G$, the maximizing discriminator is $D^*(x) = p_r(x) / (p_r(x) + p_g(x))$, and substituting it back yields

$$V(G, D^*) = 2\, D_{JS}(p_r \,\|\, p_g) - 2 \log 2.$$

The form of the JS divergence in the above equation explains that the objective of GAN is actually to minimize $D_{JS}(p_r \,\|\, p_g)$, and the equilibrium is $p_g(x) = p_r(x)$ with $D^*(x) = p_r(x) / (p_r(x) + p_g(x)) = 0.5$.
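The closed-form equilibrium analysis above can be checked numerically on a toy discrete example. The two distributions below are hypothetical stand-ins for the real and generated data distributions, chosen only for illustration:

```python
import numpy as np

# Two hypothetical discrete distributions standing in for p_r and p_g.
p_r = np.array([0.1, 0.4, 0.3, 0.15, 0.05])   # "real" data distribution
p_g = np.array([0.3, 0.2, 0.2, 0.2, 0.1])     # "generator" distribution

# Pointwise-optimal discriminator D*(x) = p_r(x) / (p_r(x) + p_g(x)).
d_star = p_r / (p_r + p_g)

# Value of V at D*: sum_x p_r log D* + p_g log(1 - D*).
v_star = float(np.sum(p_r * np.log(d_star) + p_g * np.log(1 - d_star)))

# Closed form: 2 * D_JS(p_r || p_g) - 2 log 2.
m = 0.5 * (p_r + p_g)
kl = lambda a, b: float(np.sum(a * np.log(a / b)))
jsd = 0.5 * kl(p_r, m) + 0.5 * kl(p_g, m)
print(v_star, 2 * jsd - 2 * np.log(2))  # the two values agree

# Any other discriminator achieves a strictly lower value of V.
d_other = np.clip(d_star + 0.05, 1e-9, 1 - 1e-9)
v_other = float(np.sum(p_r * np.log(d_other) + p_g * np.log(1 - d_other)))
print(v_other < v_star)  # True
```

The agreement of the two printed values confirms that, at the optimal discriminator, the minimax objective reduces to the JS divergence between real and generated distributions.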

DCGAN
The image generation model uses the idea of GAN and is specifically designed to synthesize images. DCGAN is a direct extension of the GAN described above; the distinction is that the DCGAN discriminator and generator use convolutional and transposed-convolutional layers, respectively. Convolution, activation and pooling are the three structural components that make up a basic CNN. A convolutional neural network (CNN) outputs the feature space unique to each input image. When handling image classification problems, a fully connected network (FCN) can then be leveraged to map from images to labels, using the feature encodings from the CNN as its input. Of course, the most crucial part of the entire process is the iterative adjustment of the network weights on training data by the backward propagation algorithm. The common CNNs used today, such as VGG and ResNet, are all modified and blended from basic CNNs [9,10].
Deep convolutional generation addresses unsupervised representation learning in the network. The discriminator is made up of stacked convolutional layers, batch normalization layers and LeakyReLU activations; its input is an image of size 3x64x64, and its output is the scalar probability that the input comes from the real data distribution. The generator is made up of transposed-convolutional layers, batch normalization layers and ReLU activations; its input is a latent vector drawn from the standard normal distribution, and its output is a 3x64x64 RGB image. The strided transposed-convolutional layers transform the latent vector into a volume with the same shape as the image. The discriminator D is a convolutional network that turns the input image into convolutional features and then applies the logistic function to obtain the probability that the image is real or fake. In addition, DCGAN differs from the ordinary GAN in the following ways: it uses strided convolutions instead of pooling, uses batch normalization to help convergence, uses ReLU in G (with tanh in the last layer, so that the output pixels are bounded and can be rescaled to the valid pixel range), uses LeakyReLU in D, and uses Adam as the optimizer. Everything else is consistent with GAN.
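The upsampling path from latent vector to 64x64 image can be sketched with the transposed-convolution output-size formula. The kernel, stride and padding values below follow the common DCGAN configuration and are stated as assumptions, not necessarily the exact settings of this article's implementation:

```python
def conv_transpose_out(size, kernel=4, stride=2, pad=1):
    """Spatial output size of a 2-D transposed convolution:
    out = (in - 1) * stride - 2 * pad + kernel."""
    return (size - 1) * stride - 2 * pad + kernel

# Generator path: the latent vector is treated as a 1x1 feature map and
# upsampled to 64x64.  First layer: kernel 4, stride 1, pad 0 (1 -> 4),
# then four stride-2 layers double the spatial size each time.
size = conv_transpose_out(1, kernel=4, stride=1, pad=0)
sizes = [size]
for _ in range(4):
    size = conv_transpose_out(size)
    sizes.append(size)
print(sizes)  # [4, 8, 16, 32, 64]
```

Each stride-2 layer exactly doubles the spatial resolution, which is why four of them take the 4x4 volume up to the 64x64 output image.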
The overall architecture of DCGAN is displayed in Figure 1.

ACGAN
Since the original GAN cannot control what kind of images it produces, CGAN proposes using category labels as supplementary information to direct the data generation process. At the implementation level, the label information is combined with the noise vector and delivered to the generator, and the label information is combined with the real image and supplied to the discriminator. The CGAN loss function is the conditional version of the GAN objective, where $y$ stands for the category label:

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_r}[\log D(x \mid y)] + \mathbb{E}_{z \sim p_z}[\log(1 - D(G(z \mid y)))].$$

ACGAN is a development on the CGAN foundation: the discriminator's role is expanded from differentiating true and false to also performing classification, so the ACGAN discriminator can be thought of as having an additional classification head. Accordingly, the loss function of ACGAN has two terms: a discrimination loss and a classification loss. The discrimination loss is no different from CGAN:

$$L_S = \mathbb{E}[\log P(S = \text{real} \mid x_{\text{real}})] + \mathbb{E}[\log P(S = \text{fake} \mid x_{\text{fake}})].$$

The classification loss is:

$$L_C = \mathbb{E}[\log P(C = c \mid x_{\text{real}})] + \mathbb{E}[\log P(C = c \mid x_{\text{fake}})].$$

The classification loss stated above marks the key contribution of ACGAN: for both the real pictures $x_{\text{real}}$ and the generator's fake pictures $x_{\text{fake}}$, the discriminator (or rather, the classifier inside the discriminator) should be able to predict the category they belong to. The discriminator is trained to maximize $L_S + L_C$, while the generator is trained to maximize $L_C - L_S$.
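As a minimal sketch of the two loss terms (the batch values below are hypothetical placeholders, not outputs of the trained model), the source and class log-likelihoods can be computed as follows:

```python
import numpy as np

def log_loss_source(d_real, d_fake):
    """Discrimination (source) term L_S: real/fake log-likelihood."""
    return float(np.mean(np.log(d_real)) + np.mean(np.log(1 - d_fake)))

def log_loss_class(class_probs, labels):
    """Classification term L_C: log-likelihood of the correct class labels."""
    return float(np.mean(np.log(class_probs[np.arange(len(labels)), labels])))

# Hypothetical discriminator outputs for a batch of 3 samples.
d_real = np.array([0.9, 0.8, 0.95])        # D(x_real), source head
d_fake = np.array([0.1, 0.2, 0.05])        # D(G(z)), source head
probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1],
                  [0.2, 0.1, 0.7]])         # class head predictions
labels = np.array([0, 1, 2])                # true class labels

L_S = log_loss_source(d_real, d_fake)
L_C = log_loss_class(probs, labels)
# The discriminator maximizes L_S + L_C; the generator maximizes L_C - L_S.
print(L_S, L_C)
```

Both terms are log-likelihoods, so they approach 0 from below as the source head and the class head become confident and correct.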

Images generated by GAN
Images generated by GAN across training epochs are shown in Figure 2. It is evident that the images show clearer human features and become more realistic as the training epoch grows.

Results of different learning rate ratios for G & D
The ratio of the learning rates of G and D is an important hyperparameter in the implementation of GAN. Three ratios are evaluated, 1:1, 1:2 and 2:1, and the different results are shown in Figure 3. To get more detailed curves for analysis, the losses by iteration are shown in Figure 5. It can be concluded from the curves that the loss of the discriminator soon converges to 0 when the ratio is set to G:D = 1:2, since the discriminator's task is far easier than the generator's. Meanwhile, the loss of the generator grows more slowly when the learning rate of D is relatively smaller.

PCA visualization of generated and true data
This section compares the images produced by the generator with true images via PCA. The results are shown in Figure 6. The visualization shows that the generated data distribution at epoch 0 is simply a point, and that after one epoch of training the generated data distribution (red) is already getting close to the true data distribution (blue). However, after 5 epochs of training, the generated data distribution converges toward a central point, which could be attributed to the GAN tending to fit a single mode of the data distribution.
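A plain-numpy PCA projection of this kind can be sketched as follows. The data below are synthetic stand-ins chosen to mimic a collapsed, single-mode generated distribution against a spread-out true distribution, not the actual CelebA features:

```python
import numpy as np

def pca_2d(x):
    """Project the rows of x onto their first two principal components (SVD)."""
    x = x - x.mean(axis=0)
    _, _, vt = np.linalg.svd(x, full_matrices=False)
    return x @ vt[:2].T

rng = np.random.default_rng(0)
# Stand-ins for flattened images: "true" data spread out, "generated" data
# collapsed tightly around one point, mimicking a single-mode fit.
true_data = rng.normal(0.0, 1.0, size=(200, 50))
fake_data = rng.normal(0.0, 0.05, size=(200, 50)) + 0.5

proj = pca_2d(np.vstack([true_data, fake_data]))
true_2d, fake_2d = proj[:200], proj[200:]

# Average distance of each cluster's points from its own centre in PCA space:
# the collapsed distribution occupies a far smaller region.
spread = lambda z: float(np.linalg.norm(z - z.mean(axis=0), axis=1).mean())
print(spread(true_2d), spread(fake_2d))
```

Fitting PCA on the combined real and generated samples, as done here, keeps the two point clouds in a shared 2-D coordinate system so their spreads are directly comparable.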

Results of loss curves and generated images for ACGAN
The loss curves of ACGAN are shown in Figure 7. The images generated by ACGAN with the label black hair are shown in Figure 8, where all the generated faces indeed have black hair; the two results differ to some extent because of the random noise. The images generated by ACGAN with the label no beard are shown in Figure 9. The images generated by ACGAN with the label male are shown in Figure 10, where clearly all the faces are male. Since all the images ACGAN generated with the label no beard are female, it is meaningful to figure out whether ACGAN is able to generate images of males with no beard. The results with the label male with no beard (i.e., male = 1 and beard = 0) are displayed in Figure 11.

Conclusion
This section summarizes what has been implemented in this article and what has been contributed by this project. Firstly, the generative adversarial network and its extension models have been introduced, reviewed and summarized, and the principles, methodology and mathematical derivations of variants of GAN have been elaborated in detail, especially the important divergence functions and the innovative structures and architectures. Secondly, a DCGAN model has been implemented from scratch on the CelebA dataset, and images have been generated throughout the process by epoch, by iteration and with different learning-rate ratios. Thirdly, different hyperparameter settings have been tried and tested, and numerous loss curves have been drawn to reflect the various scenarios and phenomena analysed; moreover, through the PCA analysis it is concluded that the GAN tends to fit a single mode of the data distribution. Furthermore, an ACGAN model has been fully implemented on the CelebA dataset with many labels (including 'black hair', 'no beard' and 'male', etc.) based on an open-source GitHub implementation, and it is pointed out that some results remain unexplained, such as why the images generated with the label no beard are all female, and ought to be further researched. Last but not least, experiments analysing ACGAN-based variants of generation tests have been conducted.

Figure 2. Generated images. From left to right are results from epoch 1, 5 and 20, respectively.

Figure 3. Generated images. From left to right are results with ratio 1:1, 1:2 and 2:1, respectively. The images generated by GAN with ratio 2:1 are of relatively higher quality. To figure out the reason, the losses of G and D are recorded with different learning rate ratios, and the results are shown in Figure 4.

Figure 6. PCA analysis of generated data (red) and true data (blue). From left to right are results from epoch 1, 5 and 20, respectively.

Figure 7. Loss curves of ACGAN by epoch and iteration, respectively.

Figure 8. Generated images with label black hair.

Figure 9. Generated images with label no beard.

Figure 10. Generated images with label male.

Figure 11. Generated images with labels male with no beard, and no beard, respectively.