Week 11: Pre-Class Assignment: Dimensionality Reduction#
✅ Put your name here.#
Goals for this Pre-Class Assignment#
Understand principal component analysis
Understand linear and non-linear transformations
Total number of points: 37 points
This assignment is due by 11:59 p.m. the day before class, and should be uploaded into the appropriate “Pre-Class Assignments” submission folder on D2L. Submission instructions can be found at the end of the notebook.
Part 0: Reading (12 points)#
✅ Do This: Read Chapter 8 and be sure you can answer the following questions:
Name four types of PCA. (2 points)
How can you determine performance/accuracy of a DR algorithm? (2 points)
Look up (using the internet or any other resource you might have) these methods - {Isomap, t-SNE, LDA, MDS} - and describe what they do. Are they built into sklearn? Is there any reason you would use these over LLE? (4 points)
What is the curse of dimensionality? (2 points)
Is dimensionality reduction reversible? That is, once you have performed dimensionality reduction, can you recover your original data again? (2 points)
Put your answers here
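As a quick aid for the sklearn part of the third question: if the imports below succeed, those methods are built into scikit-learn. This is only a sketch to get you started; describing what each method does still requires your own reading.

# A quick way to check which methods ship with scikit-learn.
from sklearn.manifold import Isomap, TSNE, MDS, LocallyLinearEmbedding
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis  # LDA

# If these imports succeed, all of the methods are available; see each
# class's docstring (e.g., help(Isomap)) for a description of the method.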
Part 1: Combining dimensionality reduction with ANNs (25 points)#
One of the uses of dimensionality reduction (DR) is that you can make ML methods run faster. The idea is that you string together two ML methods sequentially:
unsupervised learning is used for DR,
supervised learning is used for your task (e.g., classification).
Let’s try this by building an ANN classifier that does DR first. We’ll use the standard MNIST digits dataset, but feel free to swap in your own. Look at what the code in the next cell does. Be sure you understand all of it; then, run it.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib as mpl
import pandas as pd

# This can take a minute or two to download. Depending on your TF version you may get different results.
from keras.datasets import fashion_mnist, mnist

# Load the digits data and scale the pixel values to [0, 1].
(X_train, y_train), (X_test, y_test) = mnist.load_data()
X_train = X_train / 255.0
X_test = X_test / 255.0

# Flatten each 28x28 image into a 784-dimensional vector.
X_train = X_train.reshape(-1, 784)
X_test = X_test.reshape(-1, 784)

# Confirm we know the size of the data.
print("MNIST Digits:")
print(f"Size of training set: {X_train.shape}")
print(f"Size of testing set: {X_test.shape}")
print(f"Target labels: {np.unique(y_test)}")
Your task is to take the MNIST data from above and reduce its dimensionality using PCA, then run the lower-dimensional data through a neural network. This raises the obvious questions:
does the quality/score degrade, and by how much?
how few dimensions (down from 784) can we use and still get a given score?
how much time does this save?
You will answer these questions by making one plot with two panels.
✅ Do This: (6 points) Write code that
creates two neural network architectures. The only difference between the two architectures should be the number of neurons in the hidden layer. You should choose how to set up the other hyperparameters, e.g., the activation function, optimization method, weight initialization, etc. We don’t want to change too many hyperparameters; otherwise we would be comparing apples to oranges. Below there is some helper code to get you started. (2 points)
measures the time it takes to fit each architecture on the training set. (2 points)
measures the accuracy score on the training and test sets for each architecture. (2 points)
Some helper code is given below, with a sketch of the timing loop after it.
from tensorflow.keras.layers import Dense, Flatten
from tensorflow.keras.models import Sequential
from tensorflow.keras.optimizers.legacy import Adam

def create_ANN(input_shape=(784,), hidden_layers=(100,)):
    opt = Adam(learning_rate=0.01, beta_1=0.9, beta_2=0.999, epsilon=1e-07)
    model = Sequential()
    # The first hidden layer also defines the input shape.
    model.add(Dense(hidden_layers[0], input_shape=input_shape,
                    activation="relu", kernel_initializer='glorot_normal'))
    # Any additional hidden layers (input_shape is only needed on the first layer).
    for hl in hidden_layers[1:]:
        model.add(Dense(hl, activation="relu", kernel_initializer='glorot_normal'))
    # Output layer: one neuron per digit class.
    model.add(Dense(10, activation="softmax", kernel_initializer='glorot_normal'))
    model.compile(optimizer=opt,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    return model
# Create a network with a single hidden layer of 100 neurons and train it.
nn_100 = create_ANN((784,), (100,))
history = nn_100.fit(X_train, y_train, verbose=0, epochs=10)

# Plot the loss and accuracy curves over the training epochs.
nn_100_results = pd.DataFrame(history.history)
nn_100_results.plot(figsize=(8, 5), ylim=[0, 1], grid=True, xlabel="Epoch",
                    style=["r--o", "b--o"])

# Final accuracy on the training and test sets.
loss_train, accuracy_train = nn_100.evaluate(X_train, y_train, verbose=False)
loss_test, accuracy_test = nn_100.evaluate(X_test, y_test, verbose=False)
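For the timing measurements, Python's time.perf_counter is one option. Below is a minimal sketch; the hidden-layer widths (50 and 200) and the epoch count are assumptions, so choose your own values.

import time

# Two architectures that differ only in hidden-layer width (example widths).
for width in (50, 200):
    model = create_ANN((784,), (width,))
    start = time.perf_counter()
    model.fit(X_train, y_train, verbose=0, epochs=10)
    fit_time = time.perf_counter() - start
    _, acc_train = model.evaluate(X_train, y_train, verbose=0)
    _, acc_test = model.evaluate(X_test, y_test, verbose=0)
    print(f"{width} neurons: fit time {fit_time:.1f} s, "
          f"train acc {acc_train:.3f}, test acc {acc_test:.3f}")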
# List of PCA dimensions to try.
pca_dimensions = np.array([2, 3, 10, 50, 100, 200, 400, 784])
✅ Do This: (12 points) Write a piece of code that, given a number of dimensions, performs PCA on the X_train dataset and feeds the transformed dataset to the two NN architectures. Measure the fit time and the accuracy score on the training and test sets. Then repeat the same procedure for many PCA dimensions and make a plot with two panels (e.g., using subplots) (4 points):
in the left panel: plot the training and testing scores versus the number of PCA dimensions for the two ANN architectures; there will be four curves in total: two for each architecture. (2 points)
in the right panel: plot the fit time versus the number of PCA dimensions for each architecture; there should be two curves in total, one per architecture. (2 points)
Add axis labels and a legend for each line in the plot. (2 points)
For each panel, make sure to choose colors, line styles, markers, and other parameters that make the plot easy to understand. (2 points)
Do you think you should make the x-axis scale logarithmic?
In a final markdown cell, discuss your conclusions from looking at these two panels.
Below there is some helper code to get you started, followed by a sketch of one possible way to structure the experiment.
fig, ax = plt.subplots(1, 2, figsize = (10,4))
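Building on the helper line above, here is one possible structure for the experiment, shown as a hedged sketch: the hidden-layer widths (50 and 200), the epoch count, and the results dictionary are all assumptions that you should adapt to your own setup.

from sklearn.decomposition import PCA
import time

# Example hidden-layer widths for the two architectures (an assumption; pick your own).
hidden_layer_options = [(50,), (200,)]
results = {hidden: {"train": [], "test": [], "time": []} for hidden in hidden_layer_options}

for d in pca_dimensions:
    # Fit PCA on the training set only, then apply the same transform to the test set.
    pca = PCA(n_components=int(d))
    X_train_d = pca.fit_transform(X_train)
    X_test_d = pca.transform(X_test)

    for hidden in hidden_layer_options:
        model = create_ANN((int(d),), hidden)
        start = time.perf_counter()
        model.fit(X_train_d, y_train, verbose=0, epochs=10)
        results[hidden]["time"].append(time.perf_counter() - start)
        results[hidden]["train"].append(model.evaluate(X_train_d, y_train, verbose=0)[1])
        results[hidden]["test"].append(model.evaluate(X_test_d, y_test, verbose=0)[1])

# Left panel: accuracy vs. PCA dimension; right panel: fit time vs. PCA dimension.
fig, ax = plt.subplots(1, 2, figsize=(10, 4))
for hidden in hidden_layer_options:
    ax[0].plot(pca_dimensions, results[hidden]["train"], "--o", label=f"{hidden[0]} neurons, train")
    ax[0].plot(pca_dimensions, results[hidden]["test"], "-s", label=f"{hidden[0]} neurons, test")
    ax[1].plot(pca_dimensions, results[hidden]["time"], "--o", label=f"{hidden[0]} neurons")
ax[0].set(xlabel="PCA dimensions", ylabel="Accuracy", xscale="log")
ax[1].set(xlabel="PCA dimensions", ylabel="Fit time (s)", xscale="log")
ax[0].legend()
ax[1].legend()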
✅ Question: (3 points) Looking at your plot above, which PCA dimension gives a good trade-off between accuracy and speed? Explain your answer.
Put your answer here
✅ Question: (4 points) In Week 09 we did something similar, but using autoencoders. Go check the lecture code on D2L. How does PCA differ from what we did with autoencoders? Make sure to check the hyperparameters of the autoencoder.
Put your answer here
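If it helps while answering, note that PCA comes with an explicit (approximate) inverse, which plays a role analogous to an autoencoder's decoder. Here is a minimal sketch; the choice of 50 components is arbitrary.

from sklearn.decomposition import PCA

pca = PCA(n_components=50)
X_reduced = pca.fit_transform(X_train)          # 784 -> 50 dimensions (encoder-like)
X_recovered = pca.inverse_transform(X_reduced)  # 50 -> 784, lossy (decoder-like)

# Compare an original digit with its PCA reconstruction.
fig, ax = plt.subplots(1, 2, figsize=(6, 3))
ax[0].imshow(X_train[0].reshape(28, 28), cmap="gray")
ax[0].set_title("original")
ax[1].imshow(X_recovered[0].reshape(28, 28), cmap="gray")
ax[1].set_title("PCA reconstruction")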
Assignment wrap-up#
Please fill out the form that appears when you run the code below. You must completely fill this out in order to receive credit for the assignment!
from IPython.display import HTML
HTML(
"""
<iframe
src="https://forms.office.com/r/QyrbnptkyA"
width="800px"
height="600px"
frameborder="0"
marginheight="0"
marginwidth="0">
Loading...
</iframe>
"""
)
© Copyright 2023, Department of Computational Mathematics, Science and Engineering at Michigan State University.