10 Pre-Class Assignment: Eigenvectors and Eigenvalues#

Goals for today’s pre-class assignment#

  1. Eigenvectors and Eigenvalues

  2. Solving Eigenproblems - A 2x2 Example

  3. Introduction to Markov Models


1. Eigenvectors and Eigenvalues#

Understanding Eigenvectors and Eigenvalues can be very challenging. These are complex topics with many facets. Different textbooks approach the problem from different directions. All have value. These facets include:

  • Understanding the mathematical definition of Eigenvalues.

  • Being able to calculate an Eigenvalue and Eigenvector.

  • Understanding what Eigenvalues and Eigenvectors represent.

  • Understanding how to use Eigenvalues and Eigenvectors to solve problems.

In this course we consider it more important to understand what eigenvectors and eigenvalues represent and how to use them. However, often this understanding comes from first learning how to calculate them.

Eigenvalues are a special set of scalars associated with a square matrix that are sometimes also known as characteristic roots, characteristic values (Hoffman and Kunze 1971), proper values, or latent roots (Marcus and Minc 1988, p. 144).

The determination of the eigenvalues and eigenvectors of a matrix is extremely important in physics and engineering, where it is equivalent to matrix diagonalization and arises in such common applications as stability analysis, the physics of rotating bodies, and small oscillations of vibrating systems, to name only a few.

The decomposition of a square matrix \(A\) into eigenvalues and eigenvectors is known in this work as eigen decomposition, and the fact that this decomposition is always possible, as long as the matrix consisting of the eigenvectors of \(A\) is square. This is known as the eigen decomposition theorem.

From: http://mathworld.wolfram.com/Eigenvalue.html

The following video provides an intuition for eigenvalues and eigenvectors.

from IPython.display import YouTubeVideo
YouTubeVideo("ue3yoeZvt8E",width=640,height=360, cc_load_policy=True)

Definition#

Let \(A\) be an \(n\times n\) matrix. Find a vector \(x\) in \(R^n\) such that:

\[Ax=\lambda x\]

The above can be rewritten as the following homogeneous equation, where \(I_n\) represents the identity matrix of size \(n\):

\[(A-\lambda I_n)x = 0\]

Trivially, this system can be solved by setting \(x=0\), but this isn’t very useful, so we want to consider another way to solve the equation: Finding values of \(\lambda\) such that the matrix \((A-\lambda I_n)\) becomes singular.

A matrix is singular when its determinant is zero, so the goal then would be to find solutions to \(|A-\lambda I_n|\) = 0. Finding this determinant works just the same as any other determinant, except now we have a variable within the expression. This will result in a polynomial, called the characteristic polynomial, whose roots will define the eigenvalues of the matrix \(A\).

Let’s consider this from a few more sources before working an example.

QUESTION: Explain why finding a nonzero solution to a system of homogeneous equations (like above) requires that the matrix corresponding to this system is singular.

Put your answer here

from IPython.display import YouTubeVideo
YouTubeVideo("PFDu9oVAE-g",width=640,height=360, cc_load_policy=True)

Examples:#

Here are a few more examples of how eigenvalues and eigenvectors are used (You are not required to understand all):

Using singular value decomposition for image compression. This post by Hassam Mahmood explains how you can compress an image by throwing away the small eigenvalues of \(A^TA\).

Spectral Clustering. Whether it’s in plants and biology, medical imaging, buisness and marketing, understanding the connections between fields on Facebook, or even criminology, clustering is an extremely important part of modern data analysis. It allows people to find important subsystems or patterns inside noisy data sets. One such method is spectral clustering, which uses the eigenvalues of the graph of a network. Even the eigenvector of the second smallest eigenvalue of the Laplacian matrix allows us to find the two largest clusters in a network.

Dimensionality Reduction/PCA. The principal components correspond to the largest eigenvalues of \(A^\top A\), and this yields the least squared projection onto a smaller dimensional hyperplane, and the eigenvectors become the axes of the hyperplane. Dimensionality reduction is extremely useful in machine learning and data analysis as it allows one to understand where most of the variation in the data comes from.

Low rank factorization for collaborative prediction. This is what Netflix does (or once did) to predict what rating you’ll have for a movie you have not yet watched. It uses the singular value decomposition and throws away the smallest eigenvalues of \(A^\top A\).

The Google Page Rank algorithm. The largest eigenvector of the graph of the internet is how the pages are ranked.

From: https://math.stackexchange.com/questions/1520832/real-life-examples-for-eigenvalues-eigenvectors


2. Solving Eigenproblems - A 2x2 Example#

from IPython.display import YouTubeVideo
YouTubeVideo("0UbkMlTu1vo",width=640,height=360, cc_load_policy=True)

Let’s consider calculating eigenvalues for a given \(2\times 2\) matrix. We want to solve:

\[|A - \lambda I_2 | = 0\]
\[\begin{split} \left| \left[ \begin{matrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{matrix} \right] - \lambda \left[ \begin{matrix} 1 & 0 \\ 0 & 1 \end{matrix} \right] \right| = \left| \left[ \begin{matrix} a_{11}-\lambda & a_{12} \\ a_{21} & a_{22}-\lambda \end{matrix} \right] \right| =0 \end{split}\]

We know this determinant:

\[(a_{11}-\lambda)(a_{22}-\lambda) - a_{12} a_{21} = 0 \]

If we expand the above, we get:

\[a_{11}a_{22}+\lambda^2-a_{11}\lambda-a_{22}\lambda - a_{12} a_{21} = 0\]

and

\[\lambda^2-(a_{11}+a_{22})\lambda+a_{11}a_{22} - a_{12} a_{21} = 0\]

This is the general characteristic polynomial for a 2x2 matrix. Solving for roots of this can be done using standard algebra techniques, including using the quadratic formula:

\[ \lambda = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}\ \text{where}\ a=1,\ b=a_{11}+a_{22},\ c=a_{11}a_{22}-a_{12}a_{21}\]

QUESTION: Using the above equation, what are the eigenvalues for the following \(2\times 2\) matrix? Try calculating this by hand and then store the lower value in a variable namede1 and the larger value in e2 to check your answer:

\[\begin{split}A = \left[ \begin{matrix} -4 & -6 \\ 3 & 5 \end{matrix} \right] \end{split}\]
# Put your answer here
from answercheck import checkanswer

checkanswer.float(e1,'c54490d3480079138c8c027a87a366e3');
from answercheck import checkanswer

checkanswer.float(e2,'d1bd83a33f1a841ab7fda32449746cc4');

DO THIS: Find a numpy function that will calculate eigenvalues and verify the answers from above.

# Put your answer here

QUESTION: What are the corresponding eigenvectors to the matrix \(A\)? This time you can try calculating by hand or just use the function you found in the previous answer. Store the eigenvector associated with the e1 value in a vector named v1 and the eigenvector associated with the eigenvalue e2 in a vector named v2 to check your answer.

Put your answer to the above question here.

from answercheck import checkanswer

checkanswer.eq_vector(v1,'35758bc2fa8ff4f04cfbcd019844f93d');
from answercheck import checkanswer

checkanswer.eq_vector(v2,'90b0437e86d2cf70050d1d6081d942f4');

DO THIS: sympy can also calculate the eigenvectors/values using some of its own methods. Make \(A\) into a sympy matrix and call the .eigenvects() methods on it. (Bonus: Try .charpoly() to retrieve the characteristic polynomial.)

sym.init_printing()

QUESTION: Both sympy and numpy have their own way to produce the eigenvectors and eigenvalues. How are they are different?

Put your answer to the above question here.


3. Introduction to Markov Models#

In probability theory, a Markov model is a stochastic model used to model randomly changing systems. It is assumed that future states depend only on the current state, not on the events that occurred before it.

State space diagram. See text for description A diagram representing a two-state Markov process, with the states labelled E and A.

Each number represents the probability of the Markov process changing from one state to another state, with the direction indicated by the arrow. For example, if the Markov process is in state A, then the probability it changes to state E is 0.4, while the probability it remains in state A is 0.6.

From: Wikipedia

The above state model can be represented by a transition matrix.

At each time step (\(t\)) the probability to move between states depends on the previous state (\(t-1\)):

\[A_{t} = 0.6A_{(t-1)}+0.7E_{(t-1)}\]
\[E_{t} = 0.4A_{(t-1)}+0.3E_{(t-1)}\]

The above state model (\(S_t = [A_t, E_t]^T\)) can be represented in the following matrix notation:

\[S_t = PS_{(t-1)}\]

DO THIS: Create a \(2 \times 2\) matrix (P) representing the transition matrix for the above Markov space.

#Put your answer to the above question here
from answercheck import checkanswer

checkanswer.matrix(P,'de1c99f4b4a8d7ea541a084d836ba7e4');

Written by Dr. Dirk Colbry, Michigan State University Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.