In order to successfully complete this assignment, you must follow all the instructions in this notebook and upload your edited ipynb file to D2L with your answers on or before Friday, April 21st at 11:59pm ET.

BIG HINT: Read the entire homework before starting.


Homework 5: Applications of Linear Algebra#

The goal of this homework assignment is to show you how some of the things you learned in this course can be applied. We hope you find it interesting!

%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
import sympy as sym
sym.init_printing(use_unicode=True)
from sklearn import datasets
import pandas as pd

1. Principal Component Analysis (70 pts)#

Illustration of PCA.

Image from: https://en.wikipedia.org/wiki/Principal_component_analysis

In this question we are going to learn about the Principal Component Analysis (PCA) algorithm and how to use the Singular Value Decomposition (SVD) to implement PCA. First let us generate some data:

# This cell will generate 2D data - do not change its contents.
mu = np.array([4,2])
cov_mat = np.array([[8, 3], [3, 2]])
np.random.seed(1) # We seed the data so that it is the same every time
X = np.random.multivariate_normal(mu, cov_mat, size=2023)

Our data X is stored in a \(2023 \times 2\) numpy array. Think of this array as storing \(2023\) two-dimensional data points.

QUESTION 1.1: (5 pts) In the cell below, make a scatter plot of the data. To get full credit, make sure to:

  • (1) use plt.scatter;

  • (2) make the data points have some transparency using the alpha=0.5 option and control the marker size using the s=10 option;

  • (3) turn on the plt.grid; and

  • (4) use the option plt.axis('equal').

You may also wish to resize the markers used for each point. You should get a plot somewhat like the one above (without the arrows).
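
For reference, here is a minimal sketch of these plotting options on unrelated, made-up toy data (illustrative only; it is not the answer to this question):

# Illustrative sketch of the plotting options on toy data (not the homework data)
rng = np.random.default_rng(0)
toy = rng.normal(size=(200, 2))
plt.scatter(toy[:, 0], toy[:, 1], alpha=0.5, s=10)  # transparency and marker size
plt.grid(True)                                      # turn on the grid
plt.axis('equal');                                  # equal aspect ratio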

# Make your scatter plot here

The PCA algorithm attempts to find the directions of maximum variance within the data. This can be useful for finding patterns in the data. Looking at the picture above, the longer arrow pointing to the upper right is the direction of maximum variance in that data set, and the shorter arrow pointing to the upper left is the direction with the second most variation in the data that is also orthogonal to the first direction.

We would like to find these directions in an automated fashion, without having to visualize the data and without any knowledge of how the data was generated. Indeed, many data sets of interest are high dimensional and cannot be easily visualized. The principal components can be used as an orthogonal basis for representing the data along coordinates that correspond to decreasing variation within the data. In order to find these coordinates we would like to use tools from linear algebra and think of the principal components as vectors. There is one problem, however: the vectors in the picture above are not “anchored” at the origin; rather, they start at the center of the data set itself, whereas linear algebra likes its vectors to start at the origin. Therefore, we first need to shift the data to the origin.

To shift the data to the origin, we need to compute the sample mean of the data. If we have data \(x_1, \ldots, x_N \in \mathbb{R}^d\) (in our case, \(N=2023\) and \(d=2\)), then the mean of the data is:

\begin{equation*} \bar{x} := \frac{1}{N} \sum_{i=1}^N x_i \end{equation*}

Note that \(\bar{x} \in \mathbb{R}^d\), in other words it is a vector too. Then to center the data we subtract \(\bar{x}\) from each data point:

\begin{equation*} z_i = x_i - \bar{x}, \quad \text{for all } i = 1, \ldots, N. \end{equation*}
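
In NumPy this centering step is a one-liner thanks to broadcasting. Here is a minimal sketch on a tiny made-up data set (the names toy and toy_centered are illustrative, not part of the assignment):

# Illustrative sketch: centering a tiny toy data set (3 points in R^2)
toy = np.array([[1.0, 2.0],
                [3.0, 4.0],
                [5.0, 9.0]])
toy_mean = np.mean(toy, axis=0)       # x_bar: one mean per coordinate
toy_centered = toy - toy_mean         # z_i = x_i - x_bar, via broadcasting
print(np.mean(toy_centered, axis=0))  # approximately [0, 0]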

If you look at the data you plotted, you should see that it is not centered. So let’s center it now.

QUESTION 1.2: (5 pts) Center your data by computing X - np.average(X, axis=0), and store the centered data in the variable Z. Using your code from Question 1.1, make a scatter plot of your centered data. You should observe that it is now centered at the origin.

# Put your code here 

With our data centered at the origin, we can now treat each data point as a vector (in this case, each data point is a \(2\)-vector). We want to find the orthogonal directions of maximum variance within the data. To do so we need to estimate the covariance matrix of the data.

QUESTION 1.3: (5 pts) Compute the \(2 \times 2\) sample covariance matrix

\[C = \dfrac{1}{2023-1}Z^T Z\]

and store it in the variable C. Print C and print the variable cov_mat defined above, which was the covariance matrix used to generate the data. Spot check visually that C is close to cov_mat.

# Put your code here

Test your code below:

if np.allclose(C, np.cov(X.T)):
    print('Answer seems to be correct')
else:
    print('Answer seems to be incorrect')

The matrix \(C\) is symmetric, and therefore has two orthonormal eigenvectors. One can also verify that its eigenvalues must be non-negative. Let \(\lambda_1 \geq \lambda_2\) be the two eigenvalues of \(C\), and let \(\phi_1\) and \(\phi_2\) be the corresponding orthonormal eigenvectors, i.e., \(C \phi_i = \lambda_i \phi_i\) for \(i=1,2\), \(\| \phi_1 \| = \| \phi_2 \| = 1\), and \(\phi_1 \cdot \phi_2 = 0\). The first eigenvector, \(\phi_1\), points in the direction of maximum variance within the data set. The second eigenvector, \(\phi_2\), points in the orthogonal direction. These eigenvectors are the principal components of the data set. The eigenvalues measure the amount of variance in each direction: the variance of the data in the direction \(\phi_1\) is \(\lambda_1\) (so its standard deviation is \(\sqrt{\lambda_1}\)), and the variance of the data in the direction \(\phi_2\) is \(\lambda_2\) (standard deviation \(\sqrt{\lambda_2}\)).
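
As an aside, here is a minimal sketch of computing and ordering eigenvalues/eigenvectors for a made-up \(2 \times 2\) symmetric matrix (not the homework's C). Note that np.linalg.eigh returns the eigenvalues of a symmetric matrix in increasing order, so they need to be flipped to obtain a decreasing order:

# Illustrative sketch on a made-up symmetric matrix (not the homework's C)
M = np.array([[3.0, 1.0],
              [1.0, 2.0]])
vals, vecs = np.linalg.eigh(M)  # vals ascending; vecs[:, i] is the i-th eigenvector
vals = vals[::-1]               # reorder so that lambda_1 >= lambda_2
vecs = vecs[:, ::-1]            # reorder the eigenvector columns to match
print(np.allclose(M @ vecs[:, 0], vals[0] * vecs[:, 0]))  # expected: True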

QUESTION 1.4: (5 pts) Compute the eigenvectors and eigenvalues of C, and order the eigenvalues and corresponding eigenvectors in decreasing order so that \(\lambda_1 \geq \lambda_2\), \(C \phi_1 = \lambda_1 \phi_1\), and \(C \phi_2 = \lambda_2 \phi_2\). Then make a scatter plot of the centered data (same as in Question 1.2), but this time also plot (overlaid in the same figure) \(\sqrt{\lambda_1} \phi_1\) and \(\sqrt{\lambda_2} \phi_2\) as lines of a different color than the data markers used in your scatter plot (no need to put arrows on these vectors). You should get a plot that is quite similar to the one at the top of this notebook taken from Wikipedia! Moreover, you will have found the directions of maximum variance (i.e., the principal components) in the data in an automated fashion, without using any prior knowledge about how the data was generated!

# Put your code here

QUESTION 1.5: (5 pts) For the centered data stored in Z, make a change of basis from the standard basis to the orthonormal basis \(\mathcal{B} = \{ \phi_1, \phi_2 \}\). You should get a new \(2023 \times 2\) matrix/array, which contains the coordinates of your data in the PCA basis \(\mathcal{B}\). Store this new coordinate matrix in the variable Z_pca. Make a scatter plot of your data in the PCA basis coordinate system.

# Put your code here

QUESTION 1.6: (5 pts) Compare your scatter plot of the original data X from Question 1.1 and your scatter plot of Z_pca from Question 1.5. Describe the transformation from X to Z_pca using the language of transformation matrices from earlier in the course (i.e., scaling, translation, reflection, rotation, shear). There is no need to write down the matrices precisely, but be clear as to: (1) which transformation matrices get you from X to Z_pca; (2) what order they need to be applied in; and (3) how you specify the parameters of the matrices you are using (e.g., the scale parameter in a scaling matrix).

Put your answer here.

When we computed the sample covariance matrix \(C = \tfrac{1}{2022}Z^T Z\), that may have reminded you of the singular value decomposition (SVD). Let us explore this link further. Recall that the SVD of a matrix \(M\) decomposes the matrix as

\begin{equation*} M = U \Sigma V^T. \end{equation*}
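
Writing \(Z = U \Sigma V^T\) and plugging this into the formula for the sample covariance matrix gives a quick way to see the link (a brief side derivation, not something you need to reproduce):

\begin{equation*} C = \frac{1}{N-1} Z^T Z = \frac{1}{N-1} V \Sigma^T U^T U \Sigma V^T = V \left( \frac{\Sigma^T \Sigma}{N-1} \right) V^T. \end{equation*}

Since \(\Sigma^T \Sigma\) is diagonal, the columns of \(V\) are eigenvectors of \(C\) with eigenvalues \(\lambda_i = \sigma_i^2/(N-1)\), i.e., \(\sigma_i/\sqrt{N-1} = \sqrt{\lambda_i}\), where the \(\sigma_i\) are the singular values of \(Z\).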

QUESTION 1.7: (5 pts) Compute the SVD of Z using numpy and store the output of the svd function in the three variables U, s, and Vt. Verify that the rescaled singular values s/\(\sqrt{2022}\) are equal to the square roots of the eigenvalues you computed in Question 1.4. Additionally, verify that Vt.T is equal to the eigenvectors you computed in Question 1.4, up to possible sign changes.

# Put your code here
s.shape

QUESTION 1.8: (5 pts) Prove that \(Z_{\mathrm{pca}} = U \Sigma\), and verify this fact numerically, where \(Z_{\mathrm{pca}}\) is defined in QUESTION 1.5. Note: For the numerical part, if in Question 1.7 you needed to flip the sign of one or both of the columns of \(V\) to make them align with \(\phi_1\) and/or \(\phi_2\), then you will need to flip the sign of the corresponding column of \(U\) as well.

Put your proof here.

# Put your code here
print(Z_pca)

Now let us have some fun. We are going to work with a \(4096\)-dimensional data set in which each data point is a \(64 \times 64\) image of a person’s face. The next cell will download this data set for you.

# Download the faces dataset - you should not need to change its contents.
faces = datasets.fetch_olivetti_faces()
print(faces.images.shape)
print(faces.data.shape)

The data is stored in the faces variable. faces.images stores each face image; from the first print statement, we see that there are \(400\) face images and each one is \(64 \times 64\) pixels. faces.data stores the same data, but each face image has been unwound into a vector of length \(64^2 = 4096\). This latter format will be easier to work with, but harder to use for visualization.
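
Here is a quick, illustrative sanity check that the two views contain the same data; reshaping a row of faces.data back into a \(64 \times 64\) array should recover the corresponding image:

# Illustrative check: each row of faces.data is the matching image, unwound
print(np.allclose(faces.data[0].reshape(64, 64), faces.images[0]))  # expected: True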

The next cell plots the first \(15\) images in the faces data set.

# Plot the first 15 faces
fig = plt.figure(figsize=(8, 6))
for i in range(15):
    ax = fig.add_subplot(3, 5, i + 1, xticks=[], yticks=[])
    ax.imshow(faces.images[i], cmap=plt.cm.bone)

Our goal is to run PCA on the faces data set. As we will see below, this will be interesting for a number of reasons. In order to run PCA, we first need to compute the mean of the data set. This mean vector will correspond to the “average face” in the data set.

QUESTION 1.9: (5 pts) Store faces.data in the variable X (this part is done for you below) and compute the mean of the data set, \(\bar{x} \in \mathbb{R}^{4096}\), and store \(\bar{x}\) in the variable x_bar. Visualize \(\bar{x}\) by reshaping x_bar into a \(64 \times 64\) numpy array/matrix and store this reshaped array/matrix in the variable avg_face.

# This line will store faces.data in X
X = faces.data

# DO THIS: Put your code for computing x_bar here

# DO THIS: Reshape x_bar into a 64x64 numpy matrix/array here
avg_face = 

# This line will display the average face once you have stored it in the variable avg_face
plt.imshow(avg_face, cmap = 'bone')

QUESTION 1.10: (5 pts) Mean center the data set X and store the mean centered data in the variable Z. Compute the SVD of Z and store the output in the variables U, s, and Vt.

# Put your code here

Recall from Question 1.7 that the rows of Vt are the principal components of the data set, ordered in terms of decreasing variance. Another way to think about these principal components is that the first one is the single coordinate that explains the most about the data set, the second explains the second most, and so on. In this case, each principal component is a vector in \(\mathbb{R}^{4096}\), but we can reshape those vectors into \(64 \times 64\) images. These images are the so-called “eigenfaces.” They are face-like images that explain the variations between the different faces in the data set. The next cell will plot the first \(15\) eigenfaces, i.e., the first \(15\) principal components.

# Display the first 15 eigenfaces (i.e., the first 15 principal components) - no need to edit this cell.
eigenfaces = Vt[:15,:]
eigenfaces = np.reshape(eigenfaces, (15, 64, 64))
fig = plt.figure(figsize=(8, 6))
for i in range(15):
    ax = fig.add_subplot(3, 5, i + 1, xticks=[], yticks=[])
    ax.imshow(eigenfaces[i,:,:], cmap=plt.cm.bone)

QUESTION 1.11: (5 pts) Take a look at the first \(15\) eigenfaces plotted in the previous cell. Describe the different patterns that you see in the eigenfaces. What characteristics of the faces do they correspond to?

Put your answer here.

As part of the day 16 in-class work on the SVD, you compressed an image by treating it as a matrix, computing the SVD of that matrix (image), and then keeping only some of the singular values. We can do something similar with PCA, except we are not going to compute the SVD of each image as in the day 16 in-class work; rather, we are going to compute the coordinates of all of our face images in the PCA basis. This is more powerful than the day 16 approach because we only need to compute one SVD for all 400 face images (the one we already computed to get the principal components), as opposed to a different SVD for each image.

The first thing we need to do in order to carry out this compression is to decide how many principal components to use. Remember that each singular value (rescaled by \(1/\sqrt{399}\)) measures the standard deviation (the square root of the variance) of the face data along its corresponding principal component (eigenface). The larger the singular value, the more important the principal component. Principal components with small singular values can be discarded and will result in only a small reconstruction error when we try to reconstruct the original faces. The cell below plots the singular values rescaled by \(1/\sqrt{399}\).

# Plot the singular values - you should not need to change this cell.
plt.plot(s/np.sqrt(399))
plt.grid(True)
plt.xlabel('Singular value index')
plt.ylabel('Singular value')
plt.title('Singular values in decreasing order');
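
Before reading a cutoff from the plot above, another common diagnostic (a sketch, using the s you computed in Question 1.10) is the cumulative fraction of the total variance captured by the first \(k\) principal components; since the variance along component \(i\) is \(s_i^2/399\), these fractions depend only on the squared singular values:

# Illustrative sketch: cumulative fraction of variance captured by the first k components
var_fraction = s**2 / np.sum(s**2)
plt.plot(np.cumsum(var_fraction))
plt.grid(True)
plt.xlabel('Number of principal components k')
plt.ylabel('Cumulative fraction of variance')
plt.title('Variance captured by the first k principal components');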

From the singular-value plot we see that \(50\) principal components should suffice. Let \(z_1, \ldots, z_{400}\) be the centered face image vectors, i.e., \(z_i \in \mathbb{R}^{4096}\) for each \(i = 1, \ldots, 400\). Let \(v_1, \ldots, v_{50}\) be the first \(50\) principal components; remember that each \(v_i \in \mathbb{R}^{4096}\) and that they are orthonormal. Therefore \(v_1, \ldots, v_{50}\) form an orthonormal basis (ONB) for the subspace \(V_{\mathrm{pca}} = \mathrm{span}(v_1, \ldots, v_{50})\). To compress the images, we compute the coordinates of each centered image \(z_i\) in the subspace \(V_{\mathrm{pca}}\), which are given by:

\begin{equation*} \text{Coordinates of } z_i \text{ in } V_{\mathrm{pca}} = (z_i \cdot v_1, z_i \cdot v_2, \ldots, z_i \cdot v_{50}) \in \mathbb{R}^{50}. \end{equation*}

Note that if this works, we will have compressed each image from \(4096\) numbers down to \(50\), i.e., to roughly \(1.2\)% of its original size (a reduction of nearly \(99\)%)!

QUESTION 1.12: (5 pts) Compute the coordinates of each centered image \(z_i\) in the subspace \(V_{\mathrm{pca}} = \mathrm{span}(v_1, \ldots, v_{50})\). Store the compressed images in the variable Z_compressed, which should be a numpy matrix or numpy array of size \(400 \times 50\).

# Put your code here

Compressing images is great, but at some point you want to be able to go back and look at them. You can’t do that with the compressed \(50\)-dimensional vectors. So we need to also be able to uncompress them. This corresponds to projecting the centered images onto \(V_{\mathrm{pca}}\) and then adding back in the average face that we computed back in Question 1.9.

QUESTION 1.13: (10 pts) Do the following: (1) For each centered image vector \(z_i\), compute its projection onto \(V_{\mathrm{pca}}\), recalling the formula:

\begin{equation*} \mathrm{proj}_{V_{\mathrm{pca}}} z_i = (z_i \cdot v_1) v_1 + (z_i \cdot v_2) v_2 + \cdots + (z_i \cdot v_{50}) v_{50}. \end{equation*}

You should be able to use Z_compressed from Question 1.12 to help with this part. (2) Add back in the mean data point, \(\bar{x}\), to each of the projected image vectors, i.e., compute and store:

\begin{equation*} \tilde{x}_i = \mathrm{proj}_{V_{\mathrm{pca}}} z_i + \bar{x}. \end{equation*}

(3) Reshape and visualize the first 15 uncompressed faces (use/copy code from above as needed). Compare to the original faces that were plotted before Question 1.9. Comment on how well your compression worked.

# Put your code here

# (1) DO THIS: Project each centered face vector, z_i, onto V_pca 

# (2) DO THIS: Add back in x_bar to your projected vectors

# (3) DO THIS: Reshape and visualize the first 15 uncompressed faces (use code from above to help)

Put your comments here.


2. Linear Dynamical Systems (30 pts)#

We consider the following very simple model of a rocket railroad car.

Rocket car

Imagine a railroad car with mass \(m = 1\) which is powered by rocket engines on each side. We introduce the variables:

  • \(x(t)\) is the position of the rocket railroad car on the train track at time \(t\).

  • \(v(t)\) is the velocity of the rocket railroad car at time \(t\).

  • \(F(t)\) is the force from the rocket engines at time \(t\) (the sign depends on which engine is firing).

Mathematical Model. Trivially, the velocity is the derivative of the position with respect to time, i.e.

\[x'(t) = v(t).\]

Newton’s second law says that the force from the rocket engines is equal to the mass of the car (\(m = 1\)) multiplied by the acceleration (the derivative of the velocity), i.e.,

\[m v'(t) = F(t),\]

or, since \(m = 1\), simply

\[v'(t) = F(t).\]

Our goal. By controlling the force \(F(t)\) from the rocket engines, we wish to have the rocket railroad car move to the origin \(0\) and stop there (zero velocity). We assume that we have sensors which can measure the position \(x(t)\) and velocity \(v(t)\) of the car, and that we can then control the force \(F(t)\) by setting

\[F(t) = \alpha x(t) + \beta v(t)\]

for some constants \(\alpha\) and \(\beta\).

QUESTION 2.1: (6 pts) We aim to model this as a linear dynamical system. We let our state vector have two components, the position and the velocity, i.e.,

\[\begin{split}z(t) = \begin{bmatrix}x(t) \\ v(t)\end{bmatrix}\end{split}\]

Use the equations of motion above to write this as a \(2\times 2\) linear dynamical system, i.e.,

\[z'(t) = Az(t)\]
\[\begin{split}\begin{bmatrix}x'(t) \\ v'(t)\end{bmatrix} = \begin{bmatrix}? & ? \\ ? & ?\end{bmatrix} \begin{bmatrix}x(t) \\ v(t)\end{bmatrix}\end{split}\]

where \(A\) is a \(2\times 2\) matrix.

Put your answer here

Recall from ICA-16 that the solution to the ODE \(z'(t) = Az(t)\) with initial condition \(z(0) = z_0\) is given by

\[z(t) = e^{tA}z_0\]

where \(e^{tA}\) is the matrix exponential of \(A\).

Also, if \(A\) is an \(n \times n\) matrix that is diagonalizable as \(A = CEC^{-1}\), then for any real number \(t\), the matrix exponential of \(tA\) is

\[e^{tA} = Ce^{tE}C^{-1}\]

where \(e^{tE}\) is the diagonal matrix with the numbers \(e^{\lambda_1 t}, e^{\lambda_2 t}, \ldots, e^{\lambda_n t}\) on its diagonal, and \(\lambda_1, \lambda_2, \ldots, \lambda_n\) are the eigenvalues of \(A\) (the diagonal entries of \(E\)).
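
Here is a minimal sketch of this formula in NumPy, applied to an arbitrary made-up diagonalizable matrix (not the \(A\) from this problem), with scipy.linalg.expm used only as a cross-check (assuming SciPy is available in your environment):

# Illustrative sketch: matrix exponential via diagonalization, e^{tM} = C e^{tE} C^{-1}
from scipy.linalg import expm  # used only to double-check the result

M = np.array([[2.0, 1.0],
              [1.0, 2.0]])            # made-up example with eigenvalues 1 and 3
t = 0.5
vals, C = np.linalg.eig(M)            # M = C @ diag(vals) @ C^{-1}
e_tE = np.diag(np.exp(t * vals))      # diagonal matrix with e^{lambda_i t} entries
e_tM = C @ e_tE @ np.linalg.inv(C)    # the diagonalization formula
print(np.allclose(e_tM, expm(t * M))) # expected: True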

QUESTION 2.2: (6 pts) Suppose the car starts at an initial position of \(x(0) = 20\), has an initial velocity of \(v(0) = 30\), and we set the constants which control the force of the rocket engines as \(\alpha = -\tfrac{1}{8}\) and \(\beta = -\tfrac{3}{4}\).

Make a scatterplot of the state vector \(z(t)\) for timesteps \(t = 0, 1, \ldots, 50\).

# you can edit this code or write your own
A = # State matrix for this dynamical system
z_0 = # Initial state vector

# Diagonalize A 

N = 51 # Number of timesteps (t = 0, 1, ..., 50)
z_all = np.zeros((2, N)) # create a 2xN array to save all steps

for t in range(N):  # For each timestep  
    z_t =  # Compute z(t) using diagonalization formula z_t = C*e^(t*E)*C^(-1)*z_0
    z_all[:, t] = z_t.reshape(2, ) # save z(t) in array

plt.scatter(np.asarray(z_all[0,:]),np.asarray(z_all[1,:]), s=3) # plot all time steps
plt.scatter(list(z_0[0,:]),list(z_0[1,:]), color = 'r', s=5) # plot start point as a reference

QUESTION 2.3: (6 pts) For the choice of force control constants \(\alpha = -\tfrac{1}{8}\) and \(\beta = -\tfrac{3}{4}\), answer the following questions:

  • Did \(z(t)\) get close to the origin when we started at \(z_0 = \begin{bmatrix}20 \\ 30\end{bmatrix}\)?

  • Will \(z(t)\) get close to the origin no matter what the initial position and velocity are?

  • Can you prove your conclusion? Can you write down the general form of \(z(t)\) given the initial state \((x_0,v_0)\)?

Hint: Think about the eigenvalues of \(A\) and how the state vector \(z(t) = Ce^{tE}C^{-1}z_0\) behaves as \(t \to \infty\).

QUESTION 2.4: (6 pts) Now, suppose the car starts at an initial position of \(x(0) = 20\), has an initial velocity of \(v(0) = 30\), but we instead set the constants which control the force of the rocket engines as \(\alpha = \tfrac{1}{25}\) and \(\beta = -\tfrac{24}{25}\).

Make another scatterplot of the state vector \(z(t)\) for timesteps \(t = 0, 1, \ldots, 50\).

# Put your code here (hint: it will be similar to your code for Question 2.2)

QUESTION 2.5: (6 pts) For the choice of force control constants \(\alpha = \tfrac{1}{25}\) and \(\beta = -\tfrac{24}{25}\), answer the following questions:

  • Did \(z(t) \to 0\) as \(t \to \infty\) when we started at \(z_0 = \begin{bmatrix}20 \\ 30\end{bmatrix}\)?

  • Will \(z(t) \to 0\) as \(t \to \infty\) if we start at \(z_0 = \begin{bmatrix}-30 \\ 30\end{bmatrix}\) instead? What if we started at \(z_0 = \begin{bmatrix}-31 \\ 29\end{bmatrix}\)?

  • Can you explain why this happens?

Hint: Think about the eigenvalues and eigenvectors of \(A\), the entries in \(e^{tE}\), and how the state vector \(z(t) = Ce^{tE}C^{-1}z_0\) will behave as \(t \to \infty\) depending on the coordinates of \(z_0\) in the eigenvector basis.

Put your answers here


Congratulations, we’re done!#

Written by Dirk Colbry, Matthew Hirn, Ming Yan, Son Tu, and Santhosh Karnik, Michigan State University.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.