deep_generative_image_models_using_a_laplacian_pyramid_of_adversarial_networks [2015/12/17 22:24] (current)
craffel created
{{tag>"Talk Summaries" "Neural Networks"}}

====== Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks ======

| Presenter | Emily Denton |
| Context | Columbia Neural Network Reading Group/Seminar Series |
| Date | 12/16/15 |

The goal is to build a parametric generative model of natural images. Most existing techniques can only generate small, simple, or low-quality images. The basic idea is to first model a low-resolution version of the image, then build a sequence of conditional image models which iteratively refine the generated image, all within the generative adversarial network framework.

===== Generative Modeling =====

Generative modeling can be framed as having access to a finite training set drawn from some data distribution, and wanting to learn a generative model of that distribution. There are many ways to quantify the "goodness" of the model, including how well samples from the model resemble entries in the training set and/or whether the training data have high likelihood under the model. Generative models can be used as part of a representation learning framework (e.g. encoder/decoder), in unsupervised learning tasks, or in density estimation to learn the structure of the data. As of a few months ago, most generative models did not produce realistic samples even for simple 32x32 images (CIFAR).

==== Generative Adversarial Networks ====

Generative adversarial networks attempt to learn how to draw good samples by defining two networks and training them in opposition to one another. The generative model maps from a prior noise distribution to the data space, and a discriminative model tries to decide whether an image is real or fake. The loss function is defined to improve the generative model's ability to fool the discriminator. The generator has access to the gradients of the discriminator, which helps it improve its generative process. There are many heuristics for choosing the capacity of the generator and discriminator, the learning rates, and the number of steps to train the discriminator vs. the generator; getting these heuristics right is crucial to the success of training. Conditional generative adversarial networks condition the generative model on an additional piece of information, e.g. the class label; the discriminative model is also given this information.
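The opposing objectives can be sketched directly from the discriminator's outputs. This is an illustrative numpy sketch, not the presenter's code; the function name and toy probabilities below are assumptions:

```python
import numpy as np

def gan_losses(d_real, d_fake):
    """Standard GAN losses, given discriminator probabilities in (0, 1).

    d_real: D's outputs on real images
    d_fake: D's outputs on generated images G(z)
    """
    # Discriminator wants d_real -> 1 and d_fake -> 0
    d_loss = -np.mean(np.log(d_real) + np.log(1.0 - d_fake))
    # Non-saturating generator loss: G wants d_fake -> 1 (fooling D)
    g_loss = -np.mean(np.log(d_fake))
    return d_loss, g_loss

# Toy example: D is fairly confident on both real and fake inputs
d_loss, g_loss = gan_losses(np.array([0.9, 0.8]), np.array([0.1, 0.2]))
```

In practice each network is updated on its own loss in alternation, which is where the learning-rate and step-count heuristics mentioned above come in.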

==== Laplacian Pyramid GAN ====

The Laplacian pyramid is an invertible multi-scale image representation: it repeatedly downsamples an image by factors of two and stores the residual which is missing at each step of the pyramid. The residual is computed against a Gaussian-upsampled version of each downsampled step. The result is one very low-resolution image plus a sequence of residuals. To learn a conditional generative adversarial network on this representation, the model generates the residual conditioned on the image at each scale. At each scale of the pyramid, an example of a "real" input is the residual of the downsampled version; the generative network takes in the downsampled version and produces an estimate of the residual. The discriminator takes in both the high-frequency image and the downsampled image. The GAN at each step is trained independently. Sampling from the model is akin to reconstructing from the Laplacian pyramid: first a low-resolution image is generated from noise (as in a normal GAN), then this image is passed to the next GAN, which produces a residual to compute a higher-resolution image, and so on. The whole model was not trained end-to-end because end-to-end training would force each low-resolution ground-truth image to map to only one high-resolution image; training independently allows exploring different possible outputs given one low-resolution input. Fine-tuning would be possible, but wasn't done.
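A minimal numpy sketch of the pyramid itself, using 2x2 average pooling and nearest-neighbour upsampling as stand-ins for the Gaussian blur/upsample (an assumption for brevity), shows the decomposition and its invertibility:

```python
import numpy as np

def downsample(img):
    # 2x2 average pooling (stand-in for blur + subsample)
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(img):
    # Nearest-neighbour 2x upsampling (stand-in for Gaussian upsampling)
    return img.repeat(2, axis=0).repeat(2, axis=1)

def build_pyramid(img, levels):
    """Return (low-res base image, residuals ordered coarse to fine)."""
    residuals = []
    for _ in range(levels):
        small = downsample(img)
        residuals.append(img - upsample(small))  # missing high-frequency detail
        img = small
    return img, residuals[::-1]

def reconstruct(base, residuals):
    """Invert the pyramid: upsample and add back each residual."""
    img = base
    for r in residuals:  # coarse to fine
        img = upsample(img) + r
    return img

img = np.random.rand(32, 32)
base, residuals = build_pyramid(img, 3)   # 32x32 -> 4x4 base + 3 residuals
recon = reconstruct(base, residuals)
```

Sampling in LAPGAN follows the `reconstruct` loop, except each residual comes from a GAN conditioned on the upsampled image rather than from storage.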

===== Experiments =====

On the CIFAR-10 dataset (32x32, 10 classes), each generative network took in 4 channels (three channels of color plus one channel of noise) and produced a residual image. Each discriminator was trained on the residual plus the down/upsampled image, plus a one-hot indicator of the class label. Minibatches consisted of a random sampling of real images and a random sampling of fake images. Some of the resulting images looked realistic, some looked a bit blurry; this could be due to propagating errors. Nearest-neighbor analysis in pixel space, feature space, and a transformation space indicated that the model was not learning a lookup table/memorizing the input images. In a human evaluation, humans classified about 90% of real images as real for reasonable presentation times, but classified only about 40% of generated images as real. The original GAN produced images which were classified as real only about 10% of the time.
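To make the input format concrete, here is a sketch of assembling the 4-channel generator input and the one-hot class indicator; shapes and helper names are illustrative assumptions, not from the talk:

```python
import numpy as np

def generator_input(upsampled_rgb, rng):
    """Stack the 3-channel upsampled image with one noise channel -> 4 channels."""
    _, h, w = upsampled_rgb.shape
    noise = rng.standard_normal((1, h, w))  # one channel of noise
    return np.concatenate([upsampled_rgb, noise], axis=0)

def one_hot(label, num_classes=10):
    """One-hot class indicator given to both generator and discriminator."""
    v = np.zeros(num_classes, dtype=np.float32)
    v[label] = 1.0
    return v

rng = np.random.default_rng(0)
x = generator_input(np.zeros((3, 32, 32)), rng)  # 4 x 32 x 32 input
y = one_hot(3)                                    # class 3 of 10
```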

The second experiment was on the LSUN dataset (10 million 64x64 images, 10 classes). A larger convolutional network was used, whose structure was chosen with cross-validation. The first stage or two was crucial: it chose where to put objects, which were then simply refined at later stages. The resulting images were realistic-looking. In general, GANs try hard not to place density where there is none in the input space, unlike, say, an L2 loss, which may do a lot of smoothing. Another way to confirm that the network isn't overfitting is to feed in the same initial low-resolution image and sample multiple times, producing multiple different images.

More recently, Radford et al. wrote up a collection of tricks (e.g. batch normalization, depth) which help train GANs. These tricks are orthogonal to the LAPGAN method, so they could be combined to produce even nicer images. These results could also be extended to an autoencoder framework to do unsupervised learning. One key area for exploration is making GAN models easier to train.