In order to successfully complete this assignment, you must follow all the instructions in this notebook and upload your edited ipynb file to D2L with your answers on or before 11:59pm on Friday December 11th.

BIG HINT: Read the entire homework before starting.

Homework 6: Linear Dynamical Systems

Picture of the COVID19 Virus, not really needed but provides motivation.

Image from: https://pixabay.com/

In this homework we are going to explore ways to model infectious disease using linear algebra. Given recent history I though I would do this example with COVID19.


1. Markov Steady State Model

The dynamics of infection and the spread of an epidemic can be modeled as a linear dynamical system.

In this model, we count the fraction of the population in the following four groups (States):

  • Susceptible: the individuals can be infected next day
  • Infected: the individuals are infected on the current day
  • Recovered (and immune): recovered individuals from the disease and will not be infected again
  • Deceased: the individuals died from the disease

We denote the fractions of these four groups in $x(t)$ as numbers between 0-1 which sum to 1. For example; $x(3)=(0.8,0.1,0.05,0.05)^T$ means that at day 3, 80\% of the population are susceptible, 10% are infected, 5% are recovered and immune, and 5% died.

We choose a simple Markov state transition model after each day:

  • 6% of the susceptible individuals will get infected
  • 2% of infected individuals will die
  • 10% of infected individuals will recover and immune to the disease
  • 14% of infected individuals will recover but not immune to the disease
  • The remainder transition of each state stays in that state.

COVID Model state diagram

%matplotlib inline
import numpy as np
import sympy as sym
import matplotlib.pylab as plt
sym.init_printing()

Do this: (5 points) We can write the changes of the distribution each day from the current state ($x_t$) to the next state ($x_{(t+1)}$) using the following linear equation: $$x_{(t+1)} = A x_t.$$

Define the state transition model $A$ using the above values?

#put your answer to the above question here. 
#The sum of the column vectors for a markov matrix should add to one.  This is easy to check:

np.allclose(np.sum(A,axis=0), [1,1,1,1])
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-3-48e889f3aa3f> in <module>
      1 #The sum of the column vectors for a markov matrix should add to one.  This is easy to check:
      2 
----> 3 np.allclose(np.sum(A,axis=0), [1,1,1,1])

NameError: name 'A' is not defined
from answercheck import checkanswer
checkanswer.detailedwarnings = False
checkanswer.matrix(A,'241f4130d119e260e6245fb10edde38b');
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-4-b241fc97a959> in <module>
      1 from answercheck import checkanswer
      2 checkanswer.detailedwarnings = False
----> 3 checkanswer.matrix(A,'241f4130d119e260e6245fb10edde38b');

NameError: name 'A' is not defined

Do this: (10 points) If we start with $x(0) = (1, 0, 0, 0)$ for day 0. Use the for loop to update the distribution of the four groups for 50 days. Save the state for each iteration and plot values as they change over time. use the code below to get your started.

The answercheck provided is looking for $x(50)$, the distribution of the four groups on the 50th day.

x = np.matrix([[1],[0],[0],[0]])
x_all = np.matrix(np.zeros((4,51)))
x_all[:,0] = x

# put your code here


for i in range(4):
    plt.plot(x_all[i].T)
from answercheck import checkanswer
checkanswer.vector(x50,'30b39b16792ccd77b33d8e6e1cf5f64b');
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-6-3c6a2587278a> in <module>
      1 from answercheck import checkanswer
----> 2 checkanswer.vector(x50,'30b39b16792ccd77b33d8e6e1cf5f64b');

NameError: name 'x50' is not defined

Do this: (5 points) Find the diagonalization of the matrix $A$ such that $C$ and $D$ obey the following equation: $$CDC^{-1}= A$$

# Put your answer to the above quesiton here
#Check the answer 
np.allclose(C*D*C**(-1), A)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-8-dddfd585db27> in <module>
      1 #Check the answer
----> 2 np.allclose(C*D*C**(-1), A)

NameError: name 'C' is not defined

We introduce another way to find the distribution after 50 days. We know that $$x_{50}=Ax_{49} = A^2x_{48} = \dots = A^{50}x_{0}$$ In addition, we have $$A^{50} = (CDC^{-1})^{50} = CD^{50}C^{-1}$$ Note that $D^{50}$ is computationally easy to compute and we do not need the for loop to find the distribution at 50 days.

d = np.diagonal(D)

x0 = np.matrix([[1],[0],[0],[0]])
d50 = d**50
x2 = C*np.diag(d50)*C**(-1)*x0
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-9-e5ec069abc2c> in <module>
----> 1 d = np.diagonal(D)
      2 
      3 x0 = np.matrix([[1],[0],[0],[0]])
      4 d50 = d**50
      5 x2 = C*np.diag(d50)*C**(-1)*x0

NameError: name 'D' is not defined
from answercheck import checkanswer
checkanswer.vector(x2,'30b39b16792ccd77b33d8e6e1cf5f64b');
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-10-c7e47a0907a9> in <module>
      1 from answercheck import checkanswer
----> 2 checkanswer.vector(x2,'30b39b16792ccd77b33d8e6e1cf5f64b');

NameError: name 'x2' is not defined

In class we learned that the long term steady state of a system can also be calculated using the eigenvectors of the Markov transition matrix associated with an eigenvalue of one. However, the example Markov models we had in class had only one eigenvector with an eigenvalue of one. This model had two eigenvalues equal to one which represents two steady states. Therefore this system actually has an infinite number of steady state solutions represented by the linear combination of the eigenvectors associated with the eigenvectors one.

QUESTION: (5 points) Write a linear combination of vectors representing all possible long term steady state solutions to this model. NOTE there are many possible "correct" ways to answer this question. Please explain your syntax and describe your answer.

YOUR ANSWER HERE


2. Epidemic Dynamics - Continuous Model

In this second model we flip the problem around and model the system using ordinary differential equations. For this model we will define some additional variables $X= [s, i, r, d]$ which are the individual components of the state vector.

For example, we can model the change in state of $s$ (susceptible) using the following equation:

$$\dot{s} = {ds(t)\over dt} = -0.06s(t)+ 0.14 i(t)$$

It means that the changes in the susceptible group depends on susceptible and infected individuals at time t. It increase because of the recovered people from infected ones and it decreases because of the infection.

Similarly, we have the equations for the other three groups [Infected, Recovered, Deceased]. $$\dot{i} = {di(t)\over dt} = 0.06 s(t)-0.26 i(t)\\ \dot{r} = {dr(t)\over dt}= 0.1 i(t) \\ \dot{d} = {dd(t)\over dt} = 0.02 i(t)$$

%matplotlib inline
import numpy as np
import sympy as sym
import matplotlib.pylab as plt
import pandas as pd
sym.init_printing()

##We are going toreuse C and D. Clear them out here so not to confuse with the answers from above.
C = None
D = None

Do this: (5 points) We can write it as system of ODEs as $$\dot{X}_t = BX_t$$ Write down the matrix $B$ in numpy.matrix

#Put your answer here
from answercheck import checkanswer
checkanswer.matrix(B,'afdaff8b9a1020c184054c412cf56ac3');
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-13-2b0d6ff6ddd0> in <module>
      1 from answercheck import checkanswer
----> 2 checkanswer.matrix(B,'afdaff8b9a1020c184054c412cf56ac3');

NameError: name 'B' is not defined

Do this: (5 points) Find the matrix $C$ and diagonal vector $d$ such that $d$ consists of the diagonal components of $D$ such that $CDC^{-1} = B$.

#Put your answer here
D = np.diag(d)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-15-da39a5e81e9b> in <module>
----> 1 D = np.diag(d)

NameError: name 'd' is not defined
#Check answer by verifying that the decomposition is correct.
np.allclose(C**(-1)*B*C, D)
np.allclose(C*D*C**(-1), B)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-16-2eab4f6ee090> in <module>
      1 #Check answer by verifying that the decomposition is correct.
----> 2 np.allclose(C**(-1)*B*C, D)
      3 np.allclose(C*D*C**(-1), B)

TypeError: unsupported operand type(s) for ** or pow(): 'NoneType' and 'int'

Using this decomposition of $B$ ODE system can be rewritten as follows: $$\dot{X}_t = B X_t=CDC^{-1}X_t$$

Turns out this ODE system can be tricky to solve directly. However, we can linearly transform the variable for the ODE into a simpler problem. Let $Y_t=C^{-1}X_t$, and it's derivative $\dot{Y}_t = C^{-1}\dot{X}_t$. Substituting this new variable into the right side of the above equations we get:

$$\dot{X}_t =CDY_t$$

Multiply both sides by $C^{-1}$;

$$C^{-1}\dot{X}_t =C^{-1}CD=Y_t$$

We can now substitute $\dot{Y}_t$ in the left side and remove the identity which reduces the equation to:

$$\dot{Y}_t = DY_t$$

If we look at the each component in $y$, we have: $$\dot{y}_i(t) = d_{i}y(t), \quad i = 1,\dots,n$$

where $d_i$ is the $i$-th entry of the diagonal of $D$. These components turn out to be a really simple differential equation that all have following analytical solution:

$$y_i(t) = e^{d_{i}t}y_i(0)$$

Lets create a new matrix (called expD) using the components from above.

$$ \exp(D)^t = \left[ \begin{matrix} e^{d_{1}t} & 0 & 0 & 0 \\ 0 & e^{d_{2}t} & 0 & 0 \\ 0 & 0 & e^{d_{3}t} & 0 \\ 0 & 0 & 0 & e^{d_{4}t} \end{matrix} \right] $$

and rewrite the equation back in matrix form:

$$Y_t = exp(D)^tY_0$$

Now that we can solve the "easy" equation we can transform our solution back to the original space using the following equation: $$X_t = CY_t$$

Let's see how this works in practice. Consider the following start state $X_o$.

X0 = np.matrix([[1],[0],[0],[0]])

Do this: (5 points) Transform it into a new state $Y_o$ using syntax describe above and the matrix C calculated in the previous step:

#put your answer here
from answercheck import checkanswer
checkanswer.vector(Y0,'77f4f618ee960c8b383cab77106dda48');
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-19-9a7b272dc526> in <module>
      1 from answercheck import checkanswer
----> 2 checkanswer.vector(Y0,'77f4f618ee960c8b383cab77106dda48');

NameError: name 'Y0' is not defined

Next we take the exponent of the diagonals (using d from above) and create a matrix expD such that:

expD = np.diag(np.exp(d))
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-20-0b07d6e16f01> in <module>
----> 1 expD = np.diag(np.exp(d))

NameError: name 'd' is not defined

Do this: (5 points) find the solution to $Y_t = exp(D)^tY_0$ for $t = 200$:

#put your answer here
from answercheck import checkanswer
checkanswer.vector(Y200,'31ae0e1a5e743b01539f481067e828aa');
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-22-00500a4d621a> in <module>
      1 from answercheck import checkanswer
----> 2 checkanswer.vector(Y200,'31ae0e1a5e743b01539f481067e828aa');

NameError: name 'Y200' is not defined

Do this: (5 points) transform Y200 back to X200 using the equation above:

#Put your answer here
from answercheck import checkanswer
checkanswer.vector(X200,'61f37036e1a6c782cf4a59d91b9e0a54');
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-24-5b05c763bb8a> in <module>
      1 from answercheck import checkanswer
----> 2 checkanswer.vector(X200,'61f37036e1a6c782cf4a59d91b9e0a54');

NameError: name 'X200' is not defined

Do this: (10 points) Write a loop that displays the value of $X_t$ for 200 iterations. (Hint: The first 50 steps should look similar to the plot in the previous question)

#Put your answer here

Question: (5 points) In your own words, describe why we make the effort to diagonalize these matrices.

YOUR ANSWER HERE


3. LSF Model with Real Data

Now lets look at the problem by looking at some real data. Consider the following dataset from COVID-19 Dashboard by the Center for Systems Science and Engineering (CSSE) at Johns Hopkins Universit This model is different because we do not have all data from all the states above. In this model we just have confirmed cases, confirmed deaths and number of people that have recovered.

%matplotlib inline
import numpy as np
import sympy as sym
import matplotlib.pylab as plt
import pandas as pd
sym.init_printing()
##DOWNLOAD the data (this may take a while)

from urllib.request import urlretrieve
urls = ['https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_confirmed_global.csv',
        'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_deaths_global.csv',
        'https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/time_series_covid19_recovered_global.csv',
       ]
states = ['confirmed','deaths','recovered']
colors = ['yellow', 'red', 'green']
for url,file in zip(urls,states):
    print(f"Downloading {file}")
    urlretrieve(url, f"{file}.csv");
Downloading confirmed
Downloading deaths
Downloading recovered
## Read in the data files
alldata = {}
allnames = {}
for file in states:
    print(f"Reading {file}")
    data=pd.read_csv(f"{file}.csv").values
    alldata[file] = data[:,4:]
    allnames[file] = data[:,1]
Reading confirmed
Reading deaths
Reading recovered
##Plot the data
Infected = np.sum(alldata['confirmed'],axis=0)
Immune = np.sum(alldata['recovered'],axis=0)
Deceased = np.sum(alldata['deaths'],axis=0)
for state, color in zip(states, colors):
    plt.plot(np.sum(alldata[state],axis=0), color=color, label=state);
plt.legend()
plt.xlabel('time')
plt.ylabel('Size of population')
Text(0, 0.5, 'Size of population')

All three curves in the above set can be modeled as third order polynomial in the form.

$$y = c_0 + c_1t + c_2t^2 + c_3t^3$$

As an example lets start by modeling the number of infected:

y = np.matrix(Infected).T

The x values are just a vector of numbers representing time which we will cast as a numpy.array:

t = np.array(list(range(len(y)))).T

The unknowns are the constant values $c = [c_0, c_1, c_2, c_3]^T$

Do This: (5 points) Define the matrix $A$ for the matrix equation $Ac=y$ that leads to the least-squares fit.

#Put your answer here

Do This: (5 points) The system is clearly overdefined. Define the Pseudoinverse for $A$

# Apseudo = 

Do This: (5 points) Solve the above system of equations for c using the Pseudoinverse.

#Put your answer here

Do This: (15 points) Summarise all of the steps by writting a function called solve3t(). That takes one of the three state vectors as inputs, solves using LSF and plots two curves (the original data and the model). the function should also return the C solution vector. Here is some code to help get you started:

def solve3t(data):
    y = np.matrix(data).T
    t = np.array(list(range(len(y)))).T
    
    ### Your code goes here
    
    plt.plot(t,y)
    plt.plot(t, A*c)
    plt.xlabel('Time')
    plt.ylabel('Number of people')
    return c
inf = solve3t(Infected)
inf
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-37-75b849f354c6> in <module>
----> 1 inf = solve3t(Infected)
      2 inf

<ipython-input-36-adc6a2c23d6f> in solve3t(data)
      6 
      7     plt.plot(t,y)
----> 8     plt.plot(t, A*c)
      9     plt.xlabel('Time')
     10     plt.ylabel('Number of people')

NameError: name 'A' is not defined
imm = solve3t(Immune)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-38-affb0978e8f5> in <module>
----> 1 imm = solve3t(Immune)

<ipython-input-36-adc6a2c23d6f> in solve3t(data)
      6 
      7     plt.plot(t,y)
----> 8     plt.plot(t, A*c)
      9     plt.xlabel('Time')
     10     plt.ylabel('Number of people')

NameError: name 'A' is not defined
dec = solve3t(Deceased)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-39-50ccd4fdf064> in <module>
----> 1 dec = solve3t(Deceased)

<ipython-input-36-adc6a2c23d6f> in solve3t(data)
      6 
      7     plt.plot(t,y)
----> 8     plt.plot(t, A*c)
      9     plt.xlabel('Time')
     10     plt.ylabel('Number of people')

NameError: name 'A' is not defined

Question: (5 points) Given the above results which of the three curves does the poorest job modeling the population data. in your own words explain how you came to this answer.

YOUR ANSWER HERE


Congratulations, we're done!

Turn in your assignment using D2L no later than 11:59pm on the day of class. See links at the end of this document for access to the class timeline for your section.

Written by Dirk Colbry and Ming Yan, Michigan State University Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.