Assignment objective
This assignment provides feedback on your learning in deep learning theory and its application to data analytics and artificial intelligence problems.
It builds on Assignment 1 but requires a higher level of mastery of deep learning theory and of programming/engineering skills. In particular, you will gain experience training a much deeper network on a large-scale dataset, and you will encounter practical issues that help you consolidate textbook learning.
Task 1
Solving MNIST with Convolutional Neural Networks
In Assignment 1, you tackled the image classification problem in MNIST. There, you used a densely connected neural network. You should now know that this is not an optimal model architecture for the problem. In Assignment 2, you will apply the best practices of deep learning for computer vision to achieve better image classification performance.
Task 1.1
Revisit MNIST classification with DNN
Review your Assignment 1 solution, and reproduce the experiment here. Try to improve the model without changing the model architecture.
Task 1.2
Train a ConvNet from scratch
Build a ConvNet to replace the densely connected network in Task 1.1. Report the classification accuracy on the test set. Aim to achieve higher accuracy.
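One possible starting point is a small stack of convolution and pooling layers followed by a classifier head. The layer sizes below are illustrative choices, not a prescribed solution; any reasonable ConvNet satisfies the task.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# A small ConvNet for 28x28 grayscale MNIST digits.
# Filter counts and dense width are illustrative, not prescribed.
model = keras.Sequential([
    layers.Input(shape=(28, 28, 1)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Training then follows the same `model.fit` workflow as in Task 1.1, so the two architectures can be compared under identical epoch budgets.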
Task 1.3
Build an input pipeline for data augmentation
Build a data preprocessing pipeline to perform data augmentation. (You may use the Keras ImageDataGenerator or write your own transformations.)
Report the new classification accuracy. Make sure that you use the same number of training epochs as in Task 1.2.
(Optional) Profile your input pipeline to identify the most time-consuming operation. What actions have you taken to address that slow operation? (Hint: You may use the TensorFlow Profiler.)
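A minimal augmentation pipeline using the Keras ImageDataGenerator might look as follows. The transformation parameters are illustrative; note that transformations should suit MNIST (for example, horizontal flips are a poor choice, since they turn some digits into other symbols).

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Illustrative augmentation parameters for 28x28 digit images.
datagen = ImageDataGenerator(
    rotation_range=10,       # small rotations keep digits legible
    width_shift_range=0.1,
    height_shift_range=0.1,
    zoom_range=0.1,
)

# Stand-in batch; in the assignment this would be the MNIST training images.
x = np.random.rand(8, 28, 28, 1).astype("float32")
batch = next(datagen.flow(x, batch_size=8, shuffle=False))
print(batch.shape)
```

The generator's `flow` output can be passed directly to `model.fit`, which keeps the epoch count comparable with Task 1.2.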
Task 1.4
MNIST with transfer learning
Use a pretrained model as the convolutional base to improve the classification performance. (Hint: You may use models in Keras Applications or those on TensorFlow Hub.)
Try both with and without fine-tuning.
Report the model performance as before.
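One possible setup is sketched below, not a prescribed solution. MNIST digits are 28x28 grayscale, while pretrained bases in Keras Applications expect larger RGB inputs, so the images must be resized and given three channels; the base and target size chosen here are illustrative. For the actual assignment you would pass `weights="imagenet"`; `weights=None` appears below only so the sketch runs offline.

```python
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Pretrained convolutional base. Use weights="imagenet" in the assignment;
# weights=None here only avoids a network download in this sketch.
base = keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights=None)
base.trainable = False  # "without fine-tuning": freeze the base

inputs = keras.Input(shape=(28, 28, 1))
x = layers.Resizing(96, 96)(inputs)           # upscale to the base's input size
x = layers.Concatenate()([x, x, x])           # grayscale -> 3 channels
x = keras.applications.mobilenet_v2.preprocess_input(x * 255.0)
x = base(x, training=False)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(10, activation="softmax")(x)
model = keras.Model(inputs, outputs)
print(len(model.trainable_weights))
```

For the "with fine-tuning" setting, set `base.trainable = True` after an initial training phase and recompile with a much lower learning rate, so the pretrained features are adjusted gently rather than destroyed.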
Task 1.5
Performance comparison
How many parameters are trainable in each of the two settings (with and without fine-tuning)? How does the difference impact the training time?
Which setting achieved higher accuracy? Why did it work better for this problem? Have we benefitted from using the pretrained model?
Task 2
Fast training of deep networks
Task 2.1
Train a highly accurate network for CIFAR10
In this task, you will train deep neural networks on the CIFAR10 dataset. Compared with the datasets that you have worked with so far, CIFAR10 represents a relatively larger multi-class classification problem and presents a great opportunity for you to solve a "harder" problem.
Task 2.1.1
Document the hardware used
Before you start, write down your hardware specifications, including
the GPU model, the number of GPUs, and the GPU memory
the CPU model, the number of CPUs, and the CPU clock speed
(Hint: you may find commands like nvidia-smi, lscpu, or psutil useful.)
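A minimal way to collect these details from Python's standard library is sketched below; it falls back gracefully when no NVIDIA GPU is present. (`lscpu` or the `psutil` package give richer CPU detail, including clock speed.)

```python
import os
import platform
import subprocess

# CPU information from the standard library (portable but coarse).
print("CPU model:", platform.processor() or platform.machine())
print("Logical CPUs:", os.cpu_count())

# GPU model and memory via nvidia-smi, if an NVIDIA driver is installed.
try:
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv"],
        capture_output=True, text=True, check=True)
    print(out.stdout)
except (FileNotFoundError, subprocess.CalledProcessError):
    print("nvidia-smi not available; no NVIDIA GPU detected")
```

Recording these specifications up front makes the training-time comparisons in the later tasks reproducible.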
Task 2.1.2
Train a "shallow" ConvNet
Build a ConvNet with fewer than 10 layers. Train the network until it converges. You will use this network as a baseline for the later experiments.
Plot the training and validation history
Report the testing accuracy
Task 2.1.3
Train a ResNet
Train a residual neural network (ResNet) on the CIFAR10 training data and report the test accuracy and the training time.
ResNet is a popular network architecture for image classification. You may find more information about how ResNet works by reading this paper.
(You may implement a ResNet model or use an existing implementation. In either case, you should not use pretrained network weights.)
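The core of a ResNet is the residual block, in which the input is added back to the output of a short stack of convolutions. A minimal sketch in Keras is shown below; the filter counts, the use of a projection shortcut, and the two-block example network are illustrative choices, not a full CIFAR10 model.

```python
from tensorflow import keras
from tensorflow.keras import layers

def residual_block(x, filters, stride=1):
    """One basic residual block: two 3x3 convolutions plus a skip connection."""
    shortcut = x
    y = layers.Conv2D(filters, 3, strides=stride, padding="same",
                      use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.Activation("relu")(y)
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    if stride != 1 or shortcut.shape[-1] != filters:
        # Projection shortcut (1x1 conv) when the shape changes.
        shortcut = layers.Conv2D(filters, 1, strides=stride,
                                 use_bias=False)(x)
        shortcut = layers.BatchNormalization()(shortcut)
    y = layers.Add()([y, shortcut])       # the residual connection
    return layers.Activation("relu")(y)

# Tiny illustrative network: a stem plus two residual blocks.
inputs = keras.Input(shape=(32, 32, 3))
x = layers.Conv2D(16, 3, padding="same")(inputs)
x = residual_block(x, 16)
x = residual_block(x, 32, stride=2)       # downsample with a strided block
model = keras.Model(inputs, x)
print(model.output_shape)
```

A full CIFAR10 ResNet stacks many such blocks at increasing filter counts, followed by global average pooling and a 10-way softmax head.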
Task 2.2
Fast training of ResNet
In this task, you will experiment with different ways to reduce the time for training your ResNet on CIFAR10. There are different ways to speed up neural network training; below are two ideas. Please select at least one idea to implement. Explain the experiment steps and report the final performance and training time.
Option 1. Learning rate schedule
Use a learning rate schedule for the training. Some popular learning rate schedules include:
Step decay (e.g., see here)
Cyclical learning rates
Exponential decay
Also, Keras provides some convenient functions that you can use.
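As one example, step decay drops the learning rate by a fixed factor at regular epoch intervals. The initial rate, decay factor, and interval below are illustrative hyperparameters.

```python
import math

def step_decay(epoch, initial_lr=0.1, factor=0.5, drop_every=10):
    """Halve the learning rate every `drop_every` epochs (illustrative values)."""
    return initial_lr * (factor ** math.floor(epoch / drop_every))

# In Keras this plugs into training via
# keras.callbacks.LearningRateScheduler(step_decay).
for epoch in (0, 9, 10, 25):
    print(epoch, step_decay(epoch))
```

Cyclical and exponential schedules follow the same callback pattern, only with a different function of the epoch (or step) number.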
Option 2. Look ahead optimiser
Read this paper and implement the Lookahead optimiser.
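In outline, Lookahead keeps a set of "slow" weights phi while an inner optimiser advances "fast" weights theta; every k steps it interpolates phi <- phi + alpha * (theta - phi) and resets theta to phi. The toy sketch below uses plain SGD as the inner optimiser on a simple quadratic loss; k, alpha, the learning rate, and the step count are illustrative, and a real implementation would wrap a Keras optimizer instead.

```python
import numpy as np

def lookahead_sgd(grad_fn, w0, steps=150, k=5, alpha=0.5, lr=0.1):
    """Toy Lookahead: SGD inner loop with slow-weight interpolation every k steps."""
    phi = np.array(w0, dtype=float)   # slow weights
    theta = phi.copy()                # fast weights
    for step in range(1, steps + 1):
        theta -= lr * grad_fn(theta)          # inner SGD update
        if step % k == 0:                     # synchronisation point
            phi += alpha * (theta - phi)      # slow-weight interpolation
            theta = phi.copy()                # reset fast weights
    return phi

# Toy problem: minimise f(w) = ||w||^2 (gradient 2w), optimum at the origin.
w = lookahead_sgd(lambda w: 2 * w, [3.0, -2.0])
print(w)
```

For the assignment, the same two-loop structure would sit around your ResNet training steps, with the inner optimiser being SGD or Adam on the network weights.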
Task 2.3
Performance comparison
Based on the above experiments, which method or combination of methods results in the best accuracy for the same training time?
Task 3
(HD level task) Research on new models
Today, ResNet has become a very mature ConvNet architecture. In this task, you will research one recent ConvNet architecture. You may choose an architecture from the reference list below.
Write a short report for your research, covering these points:
Identify the main issues that your chosen architecture aims to address. (For example, does it try to reduce the number of parameters or to speed up the training?)
What measures does the architecture use to reduce the number of parameters, reduce the training cost, or improve the model performance?
Implement the architecture and compare its performance on CIFAR10 with ResNet. You may include your implementation, experiments, and analyses here in this notebook.
Comments