In this project, you'll define and train a DCGAN on a dataset of faces. Your goal is to get a generator network to generate new images of faces that look as realistic as possible!
The project will be broken down into a series of tasks from loading in data to defining and training adversarial networks. At the end of the notebook, you'll be able to visualize the results of your trained Generator to see how it performs; your generated samples should look like fairly realistic faces with small amounts of noise.
You'll be using the CelebFaces Attributes Dataset (CelebA) to train your adversarial networks.
This dataset is more complex than the digit datasets (like MNIST or SVHN) you've been working with, so you should prepare to define deeper networks and train them for longer to get good results. It is suggested that you utilize a GPU for training.
Since the project's main focus is on building the GANs, we've done some of the pre-processing for you. Each of the CelebA images has been cropped to remove parts of the image that don't include a face, then resized down to 64x64x3 NumPy images. Some sample data is shown below.
If you are working locally, you can download this data by clicking here
This is a zip file that you'll need to extract in the home directory of this notebook for further loading and processing. After extracting the data, you should be left with a directory of data processed_celeba_small/
# can comment out after executing
#!unzip processed_celeba_small.zip
data_dir = 'processed_celeba_small/'
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import pickle as pkl
import matplotlib.pyplot as plt
import numpy as np
import problem_unittests as tests
#import helper
%matplotlib inline
The CelebA dataset contains over 200,000 celebrity images with annotations. Since you're going to be generating faces, you won't need the annotations; you'll only need the images. Note that these are color images with 3 color channels (RGB) each.
As noted above, each CelebA image has been cropped to a face and resized down to 64x64x3; this pre-processed dataset is a smaller subset of the very large CelebA data.
There are a few other steps that you'll need to transform this data and create a DataLoader.

Exercise: Complete the get_dataloader function, such that it satisfies these requirements: your images should be square, Tensor images of size image_size x image_size in the x and y dimensions.

To create a dataset given a directory of images, it's recommended that you use PyTorch's ImageFolder wrapper, with a root directory processed_celeba_small/ and a data transformation passed in.
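One gotcha worth noting: ImageFolder expects its images to live inside at least one subdirectory of the root, which it treats as a (here unused) class label. The extracted data should already have this shape; the subfolder name shown below is illustrative:

# ImageFolder requires at least one subfolder under the root:
# processed_celeba_small/
#     celeba/            <- treated as a (dummy) class label
#         000001.jpg
#         000002.jpg
#         ...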
# necessary imports
import torch
from torchvision import datasets
from torchvision import transforms
def get_dataloader(batch_size, image_size, data_dir='processed_celeba_small/'):
    """
    Batch the neural network data using DataLoader
    :param batch_size: The size of each batch; the number of images in a batch
    :param image_size: The square size of the image data (x, y)
    :param data_dir: Directory where image data is located
    :return: DataLoader with batched data
    """
    # resize the images to image_size x image_size, then convert to Tensors
    transform = transforms.Compose([
        transforms.Resize(image_size),
        transforms.ToTensor()
    ])
    # ImageFolder dataset with the transform applied to every image
    my_dataset = datasets.ImageFolder(data_dir, transform=transform)
    # batch and shuffle the data
    data_loader = torch.utils.data.DataLoader(dataset=my_dataset,
                                              batch_size=batch_size,
                                              shuffle=True)
    return data_loader
Exercise: Create a DataLoader celeba_train_loader with appropriate hyperparameters.

Call the above function and create a dataloader to view images. You can decide on any reasonable batch_size parameter; your image_size must be 32. Resizing the data to a smaller size will make for faster training, while still creating convincing images of faces!

# Define function hyperparameters
batch_size = 128
img_size = 32
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
# Call your function and get a dataloader
celeba_train_loader = get_dataloader(batch_size, img_size)
Next, you can view some images! You should see square images of somewhat-centered faces.
Note: You'll need to convert the Tensor images into a NumPy type and transpose the dimensions to correctly display an image; suggested imshow code is below, but it may not be perfect.
# helper display function
def imshow(img):
npimg = img.numpy()
plt.imshow(np.transpose(npimg, (1, 2, 0)))
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
# obtain one batch of training images
dataiter = iter(celeba_train_loader)
images, _ = next(dataiter)  # _ for no labels; dataiter.next() is deprecated

# plot some of the images in the batch
fig = plt.figure(figsize=(20, 4))
plot_size = 20
for idx in np.arange(plot_size):
    # integer division so add_subplot gets an int grid size
    ax = fig.add_subplot(2, plot_size // 2, idx + 1, xticks=[], yticks=[])
    imshow(images[idx])
You need to do a bit of pre-processing; you know that the output of a tanh-activated generator will contain pixel values in a range from -1 to 1, so you need to rescale your training images to a range of -1 to 1. (Right now, they are in a range from 0 to 1.)
# TODO: Complete the scale function
def scale(x, feature_range=(-1, 1)):
''' Scale takes in an image x and returns that image, scaled
with a feature_range of pixel values from -1 to 1.
This function assumes that the input x is already scaled from 0-1.'''
# assume x is scaled to (0, 1)
# scale to feature_range and return scaled x
    min_val, max_val = feature_range  # avoid shadowing the min/max builtins
    x = x * (max_val - min_val) + min_val
return x
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
# check scaled range
# should be close to -1 to 1
img = images[0]
scaled_img = scale(img)
print('Min: ', scaled_img.min())
print('Max: ', scaled_img.max())
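Later, when displaying generated samples, you'll want the inverse mapping back to a displayable range. A minimal sketch, assuming inputs scaled to (-1, 1); the name unscale is illustrative and not part of the project code:

def unscale(x, feature_range=(-1, 1)):
    '''Map images from feature_range back to (0, 1) for plotting.'''
    min_val, max_val = feature_range
    return (x - min_val) / (max_val - min_val)

The view_samples helper at the end of this notebook does equivalent arithmetic inline, scaled to 0-255.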
A GAN consists of two adversarial networks: a discriminator and a generator.
Your first task will be to define the discriminator. This is a convolutional classifier like you've built before, only without any maxpooling layers. To deal with this complex data, it's suggested you use a deep network with normalization. You are also allowed to create any helper functions that may be useful.
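As a sanity check when sizing the layers below, recall the convolution output formula: out = floor((in - kernel + 2*padding) / stride) + 1. A small helper (illustrative only, not part of the project code) confirms the 32 -> 16 -> 8 -> 4 progression used in this discriminator:

def conv_out_size(in_size, kernel_size, stride=2, padding=1):
    '''Spatial output size of a standard convolution.'''
    return (in_size - kernel_size + 2 * padding) // stride + 1

# with kernel_size=4, stride=2, padding=1, each layer halves the input
assert conv_out_size(32, 4) == 16
assert conv_out_size(16, 4) == 8
assert conv_out_size(8, 4) == 4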
import torch.nn as nn
import torch.nn.functional as F
# helper conv function
def conv(in_channels, out_channels, kernel_size, stride=2, padding=1, batch_norm=True):
"""Creates a convolutional layer, with optional batch normalization.
"""
layers = []
conv_layer = nn.Conv2d(in_channels, out_channels,
kernel_size, stride, padding, bias=False)
# append conv layer
layers.append(conv_layer)
if batch_norm:
# append batchnorm layer
layers.append(nn.BatchNorm2d(out_channels))
# using Sequential container
return nn.Sequential(*layers)
class Discriminator(nn.Module):
def __init__(self, conv_dim):
"""
Initialize the Discriminator Module
:param conv_dim: The depth of the first convolutional layer
"""
super(Discriminator, self).__init__()
# complete init function
self.conv_dim = conv_dim
# 32x32 input
self.conv1 = conv(3, conv_dim, 4, batch_norm=False) # first layer, no batch_norm
# 16x16 out
self.conv2 = conv(conv_dim, conv_dim*2, 4)
# 8x8 out
self.conv3 = conv(conv_dim*2, conv_dim*4, 4)
# 4x4 out
# final, fully-connected layer
self.fc = nn.Linear(conv_dim*4*4*4, 1)
def forward(self, x):
"""
Forward propagation of the neural network
:param x: The input to the neural network
:return: Discriminator logits; the output of the neural network
"""
# define feedforward behavior
output = F.leaky_relu(self.conv1(x), 0.2)
output = F.leaky_relu(self.conv2(output), 0.2)
output = F.leaky_relu(self.conv3(output), 0.2)
# flatten
output = output.view(-1, self.conv_dim*4*4*4)
# final output layer
output = self.fc(output)
return output
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_discriminator(Discriminator)
The generator should upsample an input and generate a new image of the same size as our training data, 32x32x3. This should be mostly transpose convolutional layers with normalization applied to the outputs.

Exercise: Complete the Generator class; it should take in a latent vector of length z_size and output an image of shape 32x32x3.
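Transpose convolutions follow the mirrored formula: out = (in - 1)*stride - 2*padding + kernel (ignoring output_padding). A quick illustrative check that kernel_size=4, stride=2, padding=1 doubles the spatial size at each layer:

def deconv_out_size(in_size, kernel_size, stride=2, padding=1):
    '''Spatial output size of a transpose convolution (no output_padding).'''
    return (in_size - 1) * stride - 2 * padding + kernel_size

assert deconv_out_size(4, 4) == 8    # 4x4 -> 8x8
assert deconv_out_size(8, 4) == 16   # 8x8 -> 16x16
assert deconv_out_size(16, 4) == 32  # 16x16 -> 32x32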
# helper deconv function
def deconv(in_channels, out_channels, kernel_size, stride=2, padding=1, batch_norm=True):
"""Creates a transposed-convolutional layer, with optional batch normalization.
"""
# create a sequence of transpose + optional batch norm layers
layers = []
transpose_conv_layer = nn.ConvTranspose2d(in_channels, out_channels,
kernel_size, stride, padding, bias=False)
# append transpose convolutional layer
layers.append(transpose_conv_layer)
if batch_norm:
# append batchnorm layer
layers.append(nn.BatchNorm2d(out_channels))
return nn.Sequential(*layers)
class Generator(nn.Module):
def __init__(self, z_size, conv_dim):
"""
Initialize the Generator Module
:param z_size: The length of the input latent vector, z
:param conv_dim: The depth of the inputs to the *last* transpose convolutional layer
"""
super(Generator, self).__init__()
# complete init function
self.conv_dim = conv_dim
# first, fully-connected layer
self.fc = nn.Linear(z_size, conv_dim*4*4*4)
# transpose conv layers
self.t_conv1 = deconv(conv_dim*4, conv_dim*2, 4)
self.t_conv2 = deconv(conv_dim*2, conv_dim, 4)
self.t_conv3 = deconv(conv_dim, 3, 4, batch_norm=False)
def forward(self, x):
"""
Forward propagation of the neural network
:param x: The input to the neural network
:return: A 32x32x3 Tensor image as output
"""
# define feedforward behavior
# fully-connected + reshape
output = self.fc(x)
output = output.view(-1, self.conv_dim*4, 4, 4) # (batch_size, depth, 4, 4)
# hidden transpose conv layers + relu
output = F.relu(self.t_conv1(output))
output = F.relu(self.t_conv2(output))
# last layer + tanh activation
output = self.t_conv3(output)
        output = torch.tanh(output)  # F.tanh is deprecated
return output
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
tests.test_generator(Generator)
To help your models converge, you should initialize the weights of the convolutional and linear layers in your model. The original DCGAN paper says:
All weights were initialized from a zero-centered Normal distribution with standard deviation 0.02.
So, your next task will be to define a weight initialization function that does just this!
You can refer back to the lesson on weight initialization or even consult existing model code, such as the networks.py file in the CycleGAN GitHub repository, to help you complete this function.
import torch.nn as nn
def weights_init_normal(m):
"""
    Applies initial weights to certain layers in a model.
The weights are taken from a normal distribution
with mean = 0, std dev = 0.02.
:param m: A module or layer in a network
"""
# classname will be something like:
# `Conv`, `BatchNorm2d`, `Linear`, etc.
classname = m.__class__.__name__
# TODO: Apply initial weights to convolutional and linear layers
    if hasattr(m, 'weight') and (classname.find('Conv') != -1 or classname.find('Linear') != -1):
        # use the in-place init functions; nn.init.normal/constant (no underscore) are deprecated
        nn.init.normal_(m.weight.data, 0.0, 0.02)  # std dev 0.02, per the DCGAN paper
        if hasattr(m, 'bias') and m.bias is not None:
            nn.init.constant_(m.bias.data, 0.0)
Define your models' hyperparameters and instantiate the discriminator and generator from the classes defined above. Make sure you've passed in the correct input arguments.
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
def build_network(d_conv_dim, g_conv_dim, z_size):
# define discriminator and generator
D = Discriminator(d_conv_dim)
G = Generator(z_size=z_size, conv_dim=g_conv_dim)
# initialize model weights
D.apply(weights_init_normal)
G.apply(weights_init_normal)
print(D)
print()
print(G)
return D, G
# Define model hyperparams
d_conv_dim = 32
g_conv_dim = 32
z_size = 100
"""
DON'T MODIFY ANYTHING IN THIS CELL THAT IS BELOW THIS LINE
"""
D, G = build_network(d_conv_dim, g_conv_dim, z_size)
Check if you can train on GPU. Here, we'll set this as a boolean variable train_on_gpu
. Later, you'll be responsible for making sure that
- Models,
- Model inputs, and
- Loss function arguments
are moved to GPU, where appropriate.
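For reference only, the same bookkeeping can be done with PyTorch's device idiom; this is a sketch of an alternative pattern, not a change to the cells below:

import torch

# pick a device once, then move models and tensors with .to(device)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# e.g., D = D.to(device); real_images = real_images.to(device)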
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
import torch
# Check for a GPU
train_on_gpu = torch.cuda.is_available()
if not train_on_gpu:
print('No GPU found. Please use a GPU to train your neural network.')
else:
print('Training on GPU!')
Now you need to calculate the losses for both adversarial networks.

- For the discriminator, the total loss is the sum of the losses for real and fake images: d_loss = d_real_loss + d_fake_loss.
- Remember that we want the discriminator to output 1 for real images and 0 for fake images, so we need to set up the losses to reflect that.

The generator loss will look similar, only with flipped labels. The generator's goal is to get the discriminator to think its generated images are real.

You may choose to use either cross entropy or a least squares error loss to complete the following real_loss and fake_loss functions.
def real_loss(D_out):
'''Calculates how close discriminator outputs are to being real.
param, D_out: discriminator logits
return: real loss'''
batch_size = D_out.size(0)
labels = torch.ones(batch_size)
if train_on_gpu:
labels = labels.cuda()
criterion = nn.BCEWithLogitsLoss()
loss = criterion(D_out.squeeze(), labels)
return loss
def fake_loss(D_out):
'''Calculates how close discriminator outputs are to being fake.
param, D_out: discriminator logits
return: fake loss'''
batch_size = D_out.size(0)
labels = torch.zeros(batch_size)
if train_on_gpu:
labels = labels.cuda()
criterion = nn.BCEWithLogitsLoss()
loss = criterion(D_out.squeeze(), labels)
return loss
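Since the exercise allows either cross entropy or least squares, here is what least-squares (LSGAN-style) variants could look like. The _ls names are illustrative; the rest of this notebook uses the BCE versions above:

def real_loss_ls(D_out):
    '''Least-squares real loss: squared distance from the target of 1.'''
    return torch.mean((D_out.squeeze() - 1)**2)

def fake_loss_ls(D_out):
    '''Least-squares fake loss: squared distance from the target of 0.'''
    return torch.mean(D_out.squeeze()**2)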
import torch.optim as optim
# hyperparameters suggested by the DCGAN paper
lr = 0.0002
beta1 = 0.5
beta2 = 0.999  # Adam's default value

# Create optimizers for the discriminator D and generator G
d_optimizer = optim.Adam(D.parameters(), lr, betas=(beta1, beta2))
g_optimizer = optim.Adam(G.parameters(), lr, betas=(beta1, beta2))
Training will involve alternating between training the discriminator and the generator. You'll use your functions real_loss and fake_loss to help you calculate the discriminator losses.
You've been given some code to print out some loss statistics and save some generated "fake" samples.
Keep in mind that, if you've moved your models to GPU, you'll also have to move any model inputs to GPU.
def train(D, G, n_epochs, print_every=50):
'''Trains adversarial networks for some number of epochs
param, D: the discriminator network
param, G: the generator network
param, n_epochs: number of epochs to train for
param, print_every: when to print and record the models' losses
return: D and G losses'''
# move models to GPU
if train_on_gpu:
D.cuda()
G.cuda()
# keep track of loss and generated, "fake" samples
samples = []
losses = []
# Get some fixed data for sampling. These are images that are held
# constant throughout training, and allow us to inspect the model's performance
sample_size=16
fixed_z = np.random.uniform(-1, 1, size=(sample_size, z_size))
fixed_z = torch.from_numpy(fixed_z).float()
# move z to GPU if available
if train_on_gpu:
fixed_z = fixed_z.cuda()
# epoch training loop
for epoch in range(n_epochs):
# batch training loop
for batch_i, (real_images, _) in enumerate(celeba_train_loader):
batch_size = real_images.size(0)
real_images = scale(real_images)
# ===============================================
# YOUR CODE HERE: TRAIN THE NETWORKS
# ===============================================
# 1. Train the discriminator on real and fake images
d_optimizer.zero_grad()
# Compute the discriminator losses on real images
if train_on_gpu:
real_images = real_images.cuda()
D_real = D(real_images)
d_real_loss = real_loss(D_real)
# Generate fake images
z = np.random.uniform(-1, 1, size=(batch_size, z_size))
z = torch.from_numpy(z).float()
            # move z to GPU, if available
            if train_on_gpu:
                z = z.cuda()
            fake_images = G(z)

            # detach the fakes so this backward pass doesn't reach into G's graph
            # (avoids needing retain_graph=True; G is trained separately below)
            D_fake = D(fake_images.detach())
            d_fake_loss = fake_loss(D_fake)

            # add up loss and perform backprop
            d_loss = d_real_loss + d_fake_loss
            d_loss.backward()
d_optimizer.step()
# 2. Train the generator with an adversarial loss
g_optimizer.zero_grad()
# Compute the discriminator losses on fake images
# using flipped labels!
D_fake = D(fake_images)
g_loss = real_loss(D_fake) # use real loss to flip labels
# perform backprop
g_loss.backward()
g_optimizer.step()
# ===============================================
# END OF YOUR CODE
# ===============================================
# Print some loss stats
if batch_i % print_every == 0:
# append discriminator loss and generator loss
losses.append((d_loss.item(), g_loss.item()))
# print discriminator and generator loss
print('Epoch [{:5d}/{:5d}] | d_loss: {:6.4f} | g_loss: {:6.4f}'.format(
epoch+1, n_epochs, d_loss.item(), g_loss.item()))
## AFTER EACH EPOCH##
# this code assumes your generator is named G, feel free to change the name
# generate and save sample, fake images
G.eval() # for generating samples
samples_z = G(fixed_z)
samples.append(samples_z)
G.train() # back to training mode
# Save training generator samples
with open('train_samples.pkl', 'wb') as f:
pkl.dump(samples, f)
# finally return losses
return losses
Set your number of training epochs and train your GAN!
# set number of epochs
n_epochs = 8
"""
DON'T MODIFY ANYTHING IN THIS CELL
"""
# call training function
losses = train(D, G, n_epochs=n_epochs)
Plot the training losses for the generator and discriminator, recorded every print_every batches during training.
fig, ax = plt.subplots()
losses = np.array(losses)
plt.plot(losses.T[0], label='Discriminator', alpha=0.5)
plt.plot(losses.T[1], label='Generator', alpha=0.5)
plt.title("Training Losses")
plt.legend()
View samples of images from the generator, and answer a question about the strengths and weaknesses of your trained models.
# helper function for viewing a list of passed in sample images
def view_samples(epoch, samples):
    fig, axes = plt.subplots(figsize=(16, 4), nrows=2, ncols=8, sharey=True, sharex=True)
    for ax, img in zip(axes.flatten(), samples[epoch]):
        img = img.detach().cpu().numpy()
        img = np.transpose(img, (1, 2, 0))
        img = ((img + 1) * 255 / 2).astype(np.uint8)  # rescale from (-1, 1) to (0, 255)
        ax.xaxis.set_visible(False)
        ax.yaxis.set_visible(False)
        ax.imshow(img.reshape((32, 32, 3)))
# Load samples from generator, taken while training
with open('train_samples.pkl', 'rb') as f:
samples = pkl.load(f)
_ = view_samples(-1, samples)
When you answer this question, consider the following factors:
- The dataset is biased; it is made of "celebrity" faces that are mostly white.
- Model size; larger models have the opportunity to learn more features in a data feature space.
- Optimization strategy; optimizers and number of epochs affect your final result.
Answer:
The generated faces look like a blend of only one or two skin tones, which reflects bias in the dataset; categorizing the training data by attributes such as skin tone or sex might help. Hair also seems to strongly influence the generated faces, so it would be worth considering when curating a face dataset.
The model size seems adequate given the small 32x32 output images. Unfortunately, most of the generated faces are missing a chin, so I couldn't judge how the chin affects the overall face.
I initially trained for 20 epochs, but judging by the losses, stopping early at around 7 epochs would save training time with little loss in quality.
When submitting this project, make sure to run all the cells before saving the notebook. Save the notebook file as "dlnd_face_generation.ipynb" and save it as an HTML file under "File" -> "Download as". Include the "problem_unittests.py" file in your submission.