I- Design an autoencoder network that projects the data to a subspace to obtain features that can then be used for classification. You will use the Fashion-MNIST dataset.
We will compare two different approaches. First, the stacked autoencoder (SAE) with 5 hidden layers, trained with the MSE loss as found in the literature. Then use an MLP or an SVM as the classifier for the bottleneck-layer outputs (features) of the trained SAE. Use a CNN classifier as a comparison and explain the results.
I suggest the SAE layers be 500-200-XXX, i.e., the number of units XXX of the bottleneck layer (which sets the dimensionality of the feature space) is selected by you, and it should be as small as possible for good generalization. You should experiment with several bottleneck layer sizes and explain your criterion for selection.
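As a starting point, an SAE of this shape can be sketched in plain NumPy. This is a minimal illustration, not a required implementation: the class name, the tanh nonlinearity, the initialization, and the bottleneck size of 30 are assumptions you should adapt.

```python
import numpy as np

def init_layer(n_in, n_out, rng):
    # Small random weights; biases start at zero (illustrative choice).
    return rng.normal(0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out)

class StackedAutoencoder:
    """784 -> 500 -> 200 -> XXX -> 200 -> 500 -> 784, tanh hidden units,
    linear output, trained with MSE. XXX (here 30) is the bottleneck size."""
    def __init__(self, bottleneck=30, seed=0):
        rng = np.random.default_rng(seed)
        sizes = [784, 500, 200, bottleneck, 200, 500, 784]
        self.params = [init_layer(a, b, rng) for a, b in zip(sizes[:-1], sizes[1:])]

    def forward(self, x):
        acts = [x]
        for i, (W, b) in enumerate(self.params):
            z = acts[-1] @ W + b
            # tanh on hidden layers, linear reconstruction at the output
            acts.append(z if i == len(self.params) - 1 else np.tanh(z))
        return acts

    def encode(self, x):
        # Bottleneck features: activations after the third layer.
        return self.forward(x)[3]

    def mse_step(self, x, lr=1e-3):
        # One gradient-descent step on the reconstruction MSE.
        acts = self.forward(x)
        n = x.shape[0]
        delta = 2.0 * (acts[-1] - x) / n            # dMSE/d(output)
        for i in range(len(self.params) - 1, -1, -1):
            W, b = self.params[i]
            gW = acts[i].T @ delta
            gb = delta.sum(axis=0)
            if i > 0:
                delta = (delta @ W.T) * (1.0 - acts[i] ** 2)  # tanh derivative
            self.params[i] = (W - lr * gW, b - lr * gb)
        return float(np.mean((acts[-1] - x) ** 2))
```

After training, `encode` gives the XXX-dimensional features to feed the MLP or SVM classifier.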
II- Discard the decoder part of the SAE network after the bottleneck layer, i.e., keep a 2-hidden-layer MLP with XXX outputs, and train it by maximizing the quadratic mutual information (QMI) between the MLP output and the labels. The advantage of QMI is that it does not require the number of MLP outputs to match the number of labels, which makes it much more flexible than MSE. It uses distances between PDFs (functions).
However, the QMI criterion still needs class information for training. The labels exist as points in a 10-dimensional space (one-hot encoding), which simplifies the calculation of the QMI. Note that you must estimate the QMI with kernels, as explained in Chapter 2 of the ITL book. You wrote the code for MEE in the last homework, and the estimators are basically the same (i.e., sums of Gaussian terms). Recall that the only term that changes for backpropagation is dJ/de, which gives the injected error, so you can reuse the same program and just change the computation of the error. The estimators are presented in Chapter 2 (Eqs. 2.9 and 2.10) of the ITL book. Use minibatches of 200 samples to reduce the computational complexity.
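To make the sums of Gaussian terms concrete, here is a minimal NumPy sketch of an Euclidean-distance QMI estimator of the form V_J - 2·V_C + V_M (joint, cross, and marginal terms). The function names, the kernel width, and the exact normalization are assumptions to check against the estimators in Chapter 2; the gradient dJ/de for backprop is not shown.

```python
import numpy as np

def gaussian_gram(z, sigma):
    # Pairwise Gaussian kernel matrix; width sigma*sqrt(2) because the
    # Parzen kernels are convolved with each other in the ITL estimators.
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (4.0 * sigma ** 2))

def qmi_ed(y, c, sigma=1.0):
    """ED-QMI estimate between network outputs y (N x d) and one-hot
    labels c (N x 10), as V_J - 2*V_C + V_M (sums of Gaussian terms)."""
    n = y.shape[0]
    Gy = gaussian_gram(y, sigma)
    Gc = gaussian_gram(c, sigma)
    v_j = (Gy * Gc).sum() / n**2                  # joint-field term
    v_c = (Gy.mean(1) * Gc.mean(1)).sum() / n     # cross term
    v_m = Gy.mean() * Gc.mean()                   # product-of-marginals term
    return v_j - 2.0 * v_c + v_m
```

With minibatches of 200 samples the Gram matrices are 200 x 200, which keeps the O(N^2) cost manageable.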
Collaborative Assignment:
To apply the MLP as a classifier, you need an extra step: deciding the class of each test input. My hint is to implement a maximum a posteriori (MAP) classifier by using the model output (y) on the training data together with the label information (C), which allows you to estimate P(C|y) at test time because the prior probabilities are all equal. But I leave the details for you to decide, explain, and implement. Please compare the performance of this classifier with the SAE.
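One way to realize this MAP rule, sketched under the assumption of a Gaussian Parzen window per class: estimate p(y|C) from the trained-network outputs of each class, and with equal priors argmax_C p(y|C) equals argmax_C P(C|y). The function name and kernel width are illustrative, and you should justify your own choices.

```python
import numpy as np

def parzen_map_classify(y_train, labels_train, y_test, sigma=0.5, n_classes=10):
    """MAP decision from network outputs: Parzen estimate of p(y|C) per
    class; with equal priors, argmax p(y|C) = argmax P(C|y)."""
    # Gaussian kernel between every test output and every training output.
    d2 = ((y_test[:, None, :] - y_train[None, :, :]) ** 2).sum(-1)
    k = np.exp(-d2 / (2.0 * sigma ** 2))
    # Class-conditional density estimate: average kernel over each class.
    scores = np.stack(
        [k[:, labels_train == c].mean(axis=1)
         if np.any(labels_train == c) else np.zeros(len(y_test))
         for c in range(n_classes)],
        axis=1)
    return scores.argmax(axis=1)
```

The kernel width sigma should be cross-validated; a width tied to the one used in the QMI training is a natural starting point.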
Remember that the goal is to present a comprehensive comparison among all these ML algorithms, so report a confusion matrix for each method, and also quantify the computation time on your processor.
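For the comparison, a confusion matrix and a simple wall-clock timer can be computed as follows (helper names are illustrative; library routines such as scikit-learn's work equally well):

```python
import time
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes=10):
    # cm[i, j] = number of samples of true class i predicted as class j.
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm

def timed(fn, *args):
    # Wall-clock time of one call on this processor.
    t0 = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - t0
```

Per-class accuracy is the diagonal of the confusion matrix divided by its row sums, which makes the methods directly comparable class by class.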