To successfully complete this assignment, follow all of the instructions in this notebook and upload your edited .ipynb file to D2L with your answers on or before Friday, April 7th at 11:59 pm ET.

BIG HINT: Read the entire homework before starting.

Homework 4: Projection and diagonalization#

%matplotlib inline
import matplotlib.pylab as plt
import numpy as np
import sympy as sym
import math
sym.init_printing(use_unicode=True)
# import for checking answers
import os.path
from urllib.request import urlretrieve
if not os.path.isfile("answercheck.py"):
    urlretrieve('https://raw.githubusercontent.com/colbrydi/jupytercheck/master/answercheck.py', 'answercheck.py');
from answercheck import checkanswer

1. Computing the exponential of a matrix (35 pts)#

In this exercise, we will learn how to apply the exponential function \(e^x\) to a matrix in a way that makes sense.

The first step is to use a fact from calculus that we can represent the function \(e^x\) by its Taylor series, that is, a polynomial with infinitely many terms:

\[e^x = \frac{x^0}{0!} + \frac{x^1}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots + \frac{x^n}{n!} + \ldots\]

The following function approximates this Taylor series using only the terms up to degree d (that is, the first d + 1 terms), i.e., it computes the Taylor polynomial of degree d.

def exp_taylor(x, d):
    y = 0.0
    for n in range(d + 1):
        y += (1 / math.factorial(n)) * x ** n 
    return y
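To get a feel for how quickly the series converges, this illustrative cell (not the answer to the question below) prints a few low-degree approximations of \(e^1\) next to math.e:

for d in [1, 2, 5]:
    approx = exp_taylor(1, d)
    print(d, approx, abs(approx - math.e))  # degree, approximation, absolute error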

QUESTION 1.1: (3 pts) Find the smallest degree \(d\) for which the Taylor polynomial of degree \(d\) computes the value of \(e^1 = e\) with enough accuracy that numpy cannot distinguish it from np.e using np.isclose.

(Hint: try using a for loop.)

# Put your answer to the above question here

The nice thing about polynomials is that they use only two operations: multiplication and addition. We can do both of these with matrices! So even though we don’t yet have a way to compute \(e^X\) when \(X\) is a matrix, we can compute

\[\frac{X^0}{0!} + \frac{X^1}{1!} + \frac{X^2}{2!} + \frac{X^3}{3!} + \ldots + \frac{X^n}{n!} + \ldots\]

(or at least an approximation to it using a finite number of terms). This is the definition of the matrix exponential \(e^X\).
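As a reminder of the two building blocks in code, np.linalg.matrix_power computes \(X^n\) (with \(X^0\) being the identity), and same-shaped matrices add entrywise. A small illustration on a matrix of our own choosing:

A = np.array([[1, 2], [3, 4]])
print(np.linalg.matrix_power(A, 0))      # X^0 is the identity matrix
print(np.linalg.matrix_power(A, 3))      # same result as A @ A @ A
print(np.linalg.matrix_power(A, 2) + A)  # matrix addition is entrywise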

QUESTION 1.2: (5 pts) Using the function np.linalg.matrix_power, modify the function above to compute the Taylor polynomial of degree d using a matrix X.

# Put your code here

def exp_taylor_matrix(X, d):
    y = 0.0
    for n in range(d + 1):
        y +=  # MODIFY THIS LINE
    return y

Test your code below:

X = np.matrix([[2, 3, -1], [1, -6, 2], [2, 1, -3]])
Y = exp_taylor_matrix(X, 1000) # We use a very high degree to get a good approximation

from answercheck import checkanswer
checkanswer.matrix(Y,"ebc8b8e6828a967e5ee6634bf3f003eb", decimal_accuracy=3);

QUESTION 1.3: (5 pts) Compare your result to applying numpy’s exponential function np.exp to the matrix \(X\). Why are the results not equal?

# Put your code for the above question here 

Put your explanation for the above question here

Great! Now we have a way to take the exponential of a matrix. However, as we learned, matrix multiplication is \(O(n^3)\) for an \(n \times n\) matrix, and this gets expensive when you’re doing it over and over to compute the powers of a matrix.

But section 3 of ICA 15 gave us another way to compute powers of matrices. We will use this to create a faster exp_taylor_matrix function.

For any diagonalizable matrix \(X\), we can find two matrices \(C\) and \(D\) where \(D\) is diagonal such that

\[X = CDC^{-1}.\]

QUESTION 1.4: (5 pts) Explain in words what \(C\) and \(D\) are and how you would compute them in numpy.

Put your answer to the above question here

By using the fact that \(I = C^{-1}C\), repeated multiplication gives

\[\begin{split} \begin{aligned} X^n &= (CDC^{-1})^n \\ &= \underbrace{CDC^{-1} \times CDC^{-1} \times \cdots \times CDC^{-1}}_{n \text{ factors}} \\ &= CD(C^{-1}C)D(C^{-1}C) \cdots (C^{-1}C)DC^{-1} \\ &= CDIDI \cdots IDC^{-1} \\ &= CD^nC^{-1}. \end{aligned} \end{split}\]
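As a numerical sanity check of this identity, you can compare both sides for a small diagonalizable matrix. This is only a sketch; the example matrix here is our own, not the \(X\) from the test cell above:

A = np.array([[4.0, 1.0], [2.0, 3.0]])   # a small diagonalizable example
vals, C = np.linalg.eig(A)               # eigenvalues, and a matrix whose columns are eigenvectors
D = np.diag(vals)                        # diagonal matrix of eigenvalues
n = 5
lhs = np.linalg.matrix_power(A, n)
rhs = C @ np.linalg.matrix_power(D, n) @ np.linalg.inv(C)
print(np.allclose(lhs, rhs))             # should print True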

QUESTION 1.5: (6 pts)

Define the polynomial (of matrices)

\[p(X) = I + X + 2 X^2 + 3 X^3.\]

Using the distributive property of matrix multiplication and the power property above, show that if \(X\) is diagonalizable, then

\[ p(X) = C \times p(D) \times C^{-1} .\]

Put your answer to the above question here. Writing in plain-text is fine, or you can choose to use LaTeX if you prefer.

QUESTION 1.6: (6 pts) For the diagonal matrix \(D = \begin{bmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{bmatrix} \), using the fact that \(D^n = \begin{bmatrix} \lambda_1^n & 0 \\ 0 & \lambda_2^n \end{bmatrix}, \)

show that

\[\begin{split}p(D) = \begin{bmatrix} p(\lambda_1) & 0 \\ 0 & p(\lambda_2) \end{bmatrix}\end{split}\]

(where \(p\) works on numbers instead of matrices by replacing the identity \(I\) with \(1\)).

Put your answer to the above question here. Writing in plain-text is fine, or you can choose to use LaTeX if you prefer.
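If you want to double-check your algebra for this particular \(p\), sympy can expand \(p(D)\) symbolically. A minimal sketch, where the symbols l1 and l2 are our stand-ins for \(\lambda_1\) and \(\lambda_2\):

l1, l2 = sym.symbols('lambda_1 lambda_2')
D = sym.Matrix([[l1, 0], [0, l2]])
pD = sym.eye(2) + D + 2 * D**2 + 3 * D**3  # p(D) computed with matrix operations
pD  # displays a diagonal matrix with p(lambda_i) on the diagonal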

It turns out that the same thing is true for the Taylor series of \(e^x\). That is, if

\[p(X) = \frac{X^0}{0!} + \frac{X^1}{1!} + \frac{X^2}{2!} + \frac{X^3}{3!} + \ldots + \frac{X^n}{n!} + \ldots \]

then

\[\begin{split}p(X) = C \times p(D) \times C^{-1} = C \begin{bmatrix} p(\lambda_1) & 0 & 0 & 0 \\ 0 & p(\lambda_2) & 0 & 0\\ 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & p(\lambda_n) \end{bmatrix} C^{-1}\end{split}\]

But if we remember that the Taylor series is equal to the exponential function, that is, \(p(\lambda_i) = e^{\lambda_i}\), we have

\[\begin{split}e^X = p(X) = C \begin{bmatrix} e^{\lambda_1} & 0 & 0 & 0 \\ 0 & e^{\lambda_2} & 0 & 0\\ 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & e^{\lambda_n} \end{bmatrix} C^{-1}.\end{split}\]

So let’s put this all together to write a faster exp_taylor_matrix function.
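Two numpy helpers used in the skeleton below are worth recalling: np.exp applies the exponential entrywise to an array, and np.diag builds a diagonal matrix from a 1-D array. For example:

v = np.array([0.0, 1.0, 2.0])
print(np.exp(v))   # entrywise exponential: [1, e, e^2]
print(np.diag(v))  # 3x3 diagonal matrix with v on its diagonal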

QUESTION 1.7: (5 pts) Fill in the code in the following function, adding any lines you need.

def faster_exp_taylor_matrix(X):
    eigenvals = # FIND EIGENVALUES
    expD = np.diag(np.exp(eigenvals))
    C = # FIND C
    Y = C @ expD @ np.linalg.inv(C)
    return Y

Rerun the test from before to check your answer:

X = np.matrix([[2, 3, -1], [1, -6, 2], [2, 1, -3]])
Y = faster_exp_taylor_matrix(X)


from answercheck import checkanswer
checkanswer.matrix(Y,"ebc8b8e6828a967e5ee6634bf3f003eb", decimal_accuracy=3);

Notice that the faster function is indeed faster! For larger matrices and higher degree polynomials, this would make even more of a difference.

# Run this cell!

import time
t_start = time.time()
Y = exp_taylor_matrix(X, 100)
print("Original function time:", time.time() - t_start, "seconds")
t_start = time.time()
Y = faster_exp_taylor_matrix(X)
print("Faster function time:", time.time() - t_start, "seconds")

2. Projection (35 pts)#

Given the vector \(v=[-2,1,3]^\top\) and the plane \(W = \{[x_1,x_2,x_3]^\top: x_1+x_2-x_3=0\}\) in \(\mathbb{R}^3\), we want to find the projection of \(v\) onto \(W\) using the following steps.

QUESTION 2.1: (5 pts) Use the equation defining the plane to solve for \(x_1\) in terms of \(x_2\) and \(x_3\). Use this to write the vectors in \(W\) depending on just the values \(x_2\) and \(x_3\).

(Hint: This is very similar to solving for the nullspace of a matrix where you have two free variables.)

Put your answer to the above question here

QUESTION 2.2: (5 pts) Rewrite your answer to QUESTION 2.1 as a linear combination of the form \(x_2 v_1 + x_3 v_2\), where \(v_1\) and \(v_2\) are vectors.

Check your values of \(v_1\) and \(v_2\) below:

# Put your answer for the above question here
v1 = 
v2 =
from answercheck import checkanswer
checkanswer.vector(v1 / np.linalg.norm(v1),"2a0160bf571ec2047b7cb0355066aa24");
checkanswer.vector(v2 / np.linalg.norm(v2),"0d5dd28ffe9384507c3378da18db6869");

QUESTION 2.3: (5 pts) Check that v1 and v2 are linearly independent. You may do this either by hand or using Python.

# Use this cell for an answer with Python (leave blank if doing by hand)

Use this cell for an answer by hand or to explain your Python answer if necessary

This tells us that these two vectors are a basis for \(W\), but you can check that they are not orthonormal.
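Concretely, a pair of vectors is orthonormal exactly when both have norm 1 and their dot product is 0. With two placeholder vectors (not your v1 and v2), the check looks like this:

u1 = np.array([1.0, 1.0, 0.0])  # placeholder vectors for illustration only
u2 = np.array([1.0, 0.0, 1.0])
print(np.linalg.norm(u1), np.linalg.norm(u2))  # both would be 1 for an orthonormal pair
print(np.dot(u1, u2))                          # would be 0 for an orthonormal pair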

QUESTION 2.4: (5 pts) Since an orthonormal basis is easier to work with, apply the Gram-Schmidt process to v1 and v2 and normalize the results. Store the resulting orthonormal vectors as w1 and w2.

# Put your answer to the above question here
w1 =
w2 =

QUESTION 2.5: (5 pts) Use Python to verify that {w1,w2} is indeed an orthonormal basis of the plane \(W\). Specifically, verify that

  • each basis vector has unit length,

  • the two basis vectors are orthogonal to each other, and

  • w1, w2 \(\in W\).

# Put your answer to the above question here

Recall the following definition of projection onto a subspace from Pre-class 13.

Definition: Let \(W\) be a subspace of \(\mathbb{R}^n\) of dimension \(m\). Let \(\{w_1,\cdots,w_m\}\) be an orthonormal basis for \(W\). Then the projection of a vector \(v\) in \(\mathbb{R}^n\) onto \(W\) is denoted \(\mbox{proj}_Wv\) and is defined as

\[\mbox{proj}_Wv = (v\cdot w_1)w_1+(v\cdot w_2)w_2+\cdots+(v\cdot w_m)w_m.\]
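This formula translates almost line for line into numpy. A minimal sketch (the helper name proj is our own) that projects v onto the span of a list ws of orthonormal vectors:

def proj(v, ws):
    # ws is a list of orthonormal vectors w_1, ..., w_m spanning the subspace
    return sum(np.dot(v, w) * w for w in ws)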

QUESTION 2.6: (5 pts) Use this formula to find \(P_W v\), the projection of \(v=[-2,1,3]^\top\) onto \(W\). Store the result in a variable named Pwv so the check below can find it.

# Put your answer to the above question here
from answercheck import checkanswer
checkanswer.vector(Pwv,"02b1f7fc612d9c4721e21e45f4444688");

QUESTION 2.7: (5 pts) What is the geometric meaning of \(v-P_W v\)? Use this to find the distance from \(v\) to \(W\).

Put your explanation here

# Put your code to find the distance here
distance =
from answercheck import checkanswer
checkanswer.matrix(distance,"dd5b8b47bdfbbad36cedf5ee17fe60b1");

3. Properties of orthogonal matrices (30 pts)#

In Pre-class 13, we learned the definition of an orthogonal matrix. Let us review it here.

Definition: An \(n\times n\) matrix \(U\) is orthogonal if the columns of \(U\) form an orthonormal basis of \(\mathbb{R}^{n}\).

Orthogonal matrices have many alternative characterizations. Explicitly, the following conditions are all equivalent (a quick numerical test based on condition 4 is sketched after the list).

  1. \(U\) is orthogonal.

  2. The rows of \(U\) form an orthonormal basis of \(\mathbb{R}^{n}\).

  3. The columns of \(U\) form an orthonormal basis of \(\mathbb{R}^{n}\).

  4. \(U\) is invertible, and \(U^T = U^{-1}\).

  5. For any two vectors \(x\) and \(y\) in \(\mathbb{R}^n\), multiplication by \(U\) preserves their dot product; that is, \((Ux)\cdot (Uy) = x \cdot y\).
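Condition 4 gives the easiest numerical test: \(U\) is orthogonal precisely when \(U^T U\) is the identity. A minimal sketch (the helper name is_orthogonal is our own):

def is_orthogonal(U):
    # condition 4: U^T equals U^{-1}, i.e., U^T U is the identity matrix
    U = np.asarray(U, dtype=float)
    return np.allclose(U.T @ U, np.eye(U.shape[0]))

print(is_orthogonal(np.eye(3)))  # the identity matrix is orthogonal: True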

QUESTION 3.1: (5 pts) Let \(v_1 = \left[\frac{1}{\sqrt 2}, 0, \frac{1}{\sqrt 2}\right]\), \(v_2 = [0, 1, 0]\), \(v_3 = \left[\frac{1}{\sqrt 2}, 0, -\frac{1}{\sqrt 2}\right]\).

Use one of conditions 2, 3, or 4 above to verify that \(v_1,v_2,v_3\) form an orthonormal basis of \(\mathbb{R}^3\).

# Put your answer for the above question here

QUESTION 3.2: (5 pts) Use Python to construct the following two \(3\times 3\) transformation matrices:

  • Rotation matrix R that rotates a given vector in \(\mathbb{R}^3\) around the \(y\)-axis counterclockwise by an angle of \(40^\circ\).

  • Reflection matrix F that reflects a given vector in \(\mathbb{R}^3\) with respect to the \(x\)-\(y\) plane.

# Put your answer for the above question here

QUESTION 3.3: (5 pts) Verify that R and F are orthogonal matrices by computing their inverses and checking that \(R^{-1}= R^T\) and \(F^{-1}= F^T\). It is also a fact that the product of orthogonal matrices is orthogonal. Verify that the product RF is an orthogonal matrix.

# Put your answer for the above question here

QUESTION 3.4: (5 pts) The determinant of an orthogonal matrix is always +1 or -1. Verify that the determinants of R, F, and RF are each either +1 or -1.

# Put your answer for the above question here

QUESTION 3.5: (5 pts) Compute the eigenvalues of R, F, and RF. What do you notice about the magnitudes of these eigenvalues? Can you come up with a theoretical explanation for this phenomenon?

# Put your answer for the above question here

QUESTION 3.6: (5 pts) Are each of the following statements true or false?

  1. The identity matrix is orthogonal.

  2. Every diagonal matrix is orthogonal.

  3. If \(A\) is an \(n \times n\) orthogonal matrix, and \(x\) is any column vector in \(\mathbb{R}^n\), then \(\|Ax\| = \|x\|\).

  4. Every entry of an orthogonal matrix must be between \(0\) and \(1\) inclusive.

  5. An orthogonal matrix must be symmetric.

Put your answers for the above questions here

Congratulations, we’re done!#