Homework 2: Exploring models, data, and random numbers using NumPy and Matplotlib and solving a simple ODE

Homework 2: Exploring models, data, and random numbers using NumPy and Matplotlib and solving a simple ODE#

CMSE Logo

✅ Put your name here
#

Learning Goals#

Content Goals#

Review how to define and use functions in Python (including using the return statement and default arguments)
Use functions to calculate statistical values and build computational models
Load data using NumPy
Visualize models and data using matplotlib
Compare models to data to understand assumptions and limitations
Solve a simple ordinary differential equation (ODE) using a numerical method

Practice Goals#

Use the internet as a resource for learning how to write code, learn new things, and troubleshoot problems
Read Python module documentation to learn how to accomplish new things that may be unfamiliar to you

Assignment instructions#

Work through the following assignment, making sure to follow all the directions and answer all the questions.

This assignment is due at 11:59 pm on Friday, October 11. It should be uploaded into the “Homework Assignments” submission folder for Homework #2. Submission instructions can be found at the end of the notebook.

Version check#

Before you begin, make sure your Jupyter kernel is Python 3. There are significant differences between Python 2 and Python 3, and we use Python 3 in this course. The kernel type at the upper-right corner should say “Python 3”, and the following code should show a version of the form: 3.x.y (where “x” and “y” are the sub-version numbers).

If your kernel is Python 2, try selecting Python 3 from the Kernel menu above under the sub-menu “Change kernel”. If that does not work, consult the Teams help channel for assistance.

import sys
print(sys.version)

Grading#

Generating models and comparing to data (22 points)
Looking at distributions and calculating simple statistics (37 points)
Solving a simple ODE (23 points)

Total points possible: 82

0. Importing the modules you will need for this assignment#

In this assignment you will be using two of the new modules you’ve learned about, numpy and matplotlib. Of course, in order to make sure you can use these modules when you need them, you have to import them.

Put the import commands you need to include to be able to use numpy and matplotlib in this notebook in the cell below. Make sure to execute the cell before you move on!

# Put your import commands here

1. Generating models and comparing to data (22 points)#

As the course name suggests, CMSE 801 is a mix of computational modeling and data analysis. We’ve experimented a bit already this semester with both building and visualizing models and performing some exploratory data analysis. In this part of the assignment, you’re going to practice doing a bit of both!

Revisiting the population model from class#

In class we practiced writing a function that could produce a population growth model as a function of time. You’re going to recreate that function and that model here.

As a reminder, the model we used was logistic model. For this model, the growth of the population (\(P\)) as a function of time, \(t\), can be modeled using the following equation:

\[P(t) = \frac{K}{1 + Ae^{-k_R (t-t_0)}}\]

where \(K\) represents the carrying capacity of the population, which is the maximum population that the environment can sustain, and \(k_R\) is the relative growth rate coefficient (in units of 1/time). \(A\) is defined as:

\[A = \frac{K-P_{0}}{P_{0}}\]

where \(P_{0}\) is the initial value of the population at \(t=t_0\).

Remember that the exponential, \(e\), can be computed using the numpy module, np.exp().

✅ Question 1.1 (3 points)#

Write a function called calculate_population which takes the following values as input arguments:

An initial population, \(P_{0}\)
A carrying capacity, \(K\)
A relative growth rate, \(k_R\)
A list of times in years
An initial time \(t_0\)

Make sure your functions returns a list or an array of population values for the provided list of time values. (This should basically be the function you wrote in class.)

# Put your "calculate_population" function here

✅ Question 1.2 (3 points)#

Test your function to see if it generates the right set of population values by using the following parameters and making a plot of the result.

Use a carrying capacity, \(K\), of 12 billion, but define K to be “in billions” so that you just use 12 for your value (this avoids dealing with numbers with many zeros!)
Use an initial population, \(P_0\), of 1 billion, but define \(P_0\) to be “in billions” so that you just use 1 for your value, just like \(K\).
Use relative population growth rate, \(k_R\), of 0.03 per year.

You’ll calculate this population model for an array of years that span 1800 to 2100, which is provided in the cell below.

Make sure you label your x and y axes with appropriate labels.

If everything goes as planned, you should get a plot that looks like this:

# This is the array of time values you should use:
t_list = np.arange(1800,2101,1)

# Put your code here:

Comparing your model to data#

Now that you have a functional population growth modeling tool, you’re going to try to use that tool to model some population data!

Of course, before you can model the data, you need to load it into your notebook.

✅ Question 1.3 (2 points)#

If you haven’t already downloaded it, download the pop_data.csv file from the course website and make sure it is in the same place as your notebook so that you can load it into your notebook.

Using numpy.loadtxt() load all three columns of the pop_data.csv file into your notebook. The three columns are, in this order:

The years for the model
Population Model #1 (in billions)
Population Model #2 (in billions)

You are strongly encouraged to store each column in a separate variable to make it easier to work with the data as you continue through this assignment. Also, you should make sure to check and see if the data file has a header line at the top of the file explaining what each column is (which you can do by just opening the file or looking at it on the web). If so, you’ll want to make sure you skip that line when loading your data.

# Put your code for loading the data here

✅ Question 1.4 (3 points)#

Now that you’ve loaded up your data, start off by making a plot of Population Model #1 as a function of time. You should make sure that you plot the data as a collection of individual dots rather than connected lines.

You should end up with something that looks like this, but you can choose whatever color, marker shape, and marker size that you want:

Again, make sure you give your plot appropriate axis labels so it is easy to tell what your plot is showing.

# Put your code here

Using the data and your modeling tool to estimate the “best fit” parameters#

Later in the semester, we’ll learn about a tool that we can use to fit a model to data to determine the parameters of the model that best fit the data, but for now, you are doing to take a “guess-and-check” approach to try and fit the data using your model. Of course, you’ll want to make some educated guesses based on the properties of the data and the parameters you’re using to create your model.

✅ Question 1.5 (4 points)#

In order to take your best guess as to what the model parameters are that match the Population Model #1 data, make a plot that includes a smooth model line (like you did in Question 1.2) along with the data values (like you did in Question 1.4). Once you have your plot, adjust the carrying capacity, \(K\), and growth rate, \(k_R\), parameters in your model until your smooth line roughly goes through all of the data points. Report your “best fit” parameters in the markdown cell below.

It might take a bit of experimenting to narrow in on the best fit values, so make sure you think about how the parameters change the shape of your model curve!

# Put your code here

✎ Record your best fit values for \(K\) and \(k_R\) here.

My best fit parameters are:

\(K\) =

\(k_R\) =

Confronting the limitations of your model given new data#

Sometimes, in the face of new data, the model we’re using to understand our data becomes insufficient for capturing all of the features in the data. If we trust our data, this is often an indication that we need to revise our model by changing our understanding of the phenomenon we’re trying to model or developing a new theory to match the observed properties.

Now you’re going to look at the Population Model #2 data and see if your current model is sufficient for matching the new data.

✅ Question 1.6 (5 points)#

Make a new plot that includes all three of the following:

Your best fit smooth model line from Question 1.5
The Population Model #1 data
The Population Model #2 data (in a different color than Population Model #1)

Add a legend to your plot to make it easy to tell what each of the three components are and, again, labels your axes!

# Put your code here

✅ Question 1.7 (2 points)#

Based on your plot from Question 1.6, how do you think your model for population growth would need to change in order to fit the Population Model #2 data? Note that just changing the model parameters isn’t going to be enough to match this new model. Something fundamentally would need to change about how the population model is defined.

In the markdown cell below, propose a modification to the current population model defined in the beginning of Part 2 that you think would allow you to model this new data.

✎ Propose your modified model here.

2. Looking at Random Number Distributions and Statistics (37 points)#

Occasionally, when doing computatonal modeling, we will find it useful to generate random numbers. Computers are capable of generating what we call “pseudorandom” numbers, which are pretty close to being completely random, but eventually, when you generate a lot of random numbers, a pattern will start to repeat itself. This is because these “random” numbers are generated by a mathematical algorithm that has been turned into code. As the semester progress, you’ll see examples of where we use random number generators!

In this section of the homework assignment, you will practice using the new numpy module that you have been getting familiar with to generate and visualize the distributions of random numbers created by your computer. You’ll also practice writing more functions that can calculate basic statistical properties of these random numbers.

✅ Question 2.1 (3 points)#

Generate an array \(x\) that has \(n=1000\) random numbers that are uniformly distributed over the interval \([0,1)\).

You’ll want to look up how to use the uniform submodule of numpy.random for this question. You may just want to try using a google search to get the information you need, or you could start from the documentation page for the random number generator functions in NumPy.

To use one of the random number generators in your notebook, you would write something that looks like the following:

# This generates 20 random integers from the range 0 to 100. You can test it out in the provided code below.
my_random_numbers = np.random.randint(0, 100, size=20) 

Also generate an array \(y\) that has \(n=1000\) random numbers from the standard normal distribution. To generate this array, check how to use the standard_normal submodule of numpy.random.

Extra information: If you’re not familiar with uniform or standard normal distributions check out this page and this page to see a comparison between the two distributions. It is not expected that you would know the properties or differences between these distributions, but once you know what they are, it can be useful to know how to generate random numbers from such distributions.

# Try running this example code to see how it works!
# You can run the cell multiple times to see how the numbers change
my_random_numbers = np.random.randint(0, 100, size=20) 
print(my_random_numbers)

# Put your code here for generating "x" and "y" as described above

✅ Question 2.2 (3 points)#

In order to visualize the distribution of these random numbers, we’ll want to use matplotlib. Read the documentation on matplotlib.pyplot.hist and briefly explain what this function does.

✎ Put your answer here

✅ Question 2.3 (4 points)#

Now that you have a sense for what the hist function does, use it to plot histograms of the random numbers you generated in Question 2.1. You should plot the histograms of the \(x\) and \(y\) arrays as two separate plots (you can do this by creating an extra code cell, or your can using a plt.figure() call before each hist() plot). Make sure to label your axes so it is obvious which histogram is which.

# Put your code here (create a second code cell, if needed)

✅ Question 2.4 (9 points)#

As we saw in class, when we start making multiple plots that are all related in some way, it can be useful to use the plt.subplot() functionality in Matplotlib. You’re going to use subplot() along with hist() to visualize multiple distributions of randomly generated numbers.

Follow these directions when making your plots:

Matplotlib contains the function subplot that enables making multiple plots and arranging them together (we saw this in some of the previous assignments from class).

Plot the two histograms for \(x\) and \(y\) in a single row using subplot (this means you should have two plots, one for your \(x\) values, and one for your \(y\) values).

Repeat this for larger values of \(n\) such as \(n=10^4\) and \(n=10^6\), each time creating a new row of plots. Note that this means you will be creating more “\(x\)” and “\(y\)” values for these larger values of \(n\). when you start generating more \(x\) and \(y\) values and making your histograms, you can experiment with increasing the value of the bins argument in pyplot.hist for the larger value(s) of \(n\).

In the end, you should have a total of three rows of plots, each with two histograms side-by-side (6 plots total).

Once you’ve got all of your subplots setup correctly, use the command plt.tight_layout() to arrange the subplots cleanly.

What do you observe in the plots as \(n\) increases?

Make sure to label all the x and y axes and give each row of plots a title that indicates the value of \(n\).

Useful tip: There is an example of how to use the subplot functionality at the end of the Day 6 In-class Assignment.

# Put your code here

✎ Put your answer to “What do you observe in the plots as \(n\) increases?” here.

✅ Question 2.5 (5 points)#

Now we’ll try to calculate some basic statistics of our distributions. You may wish to revisit the details of the Day 7 in-class assignment for the discussion of some basic statistical calculations.

To start, write a function compute_average that computes and returns the average of the values in an input array. Your function should do this calculation “from scratch” rather than using a built-in Python function. However, you can use np.mean() to test and see if the function you wrote agrees with the NumPy function.

# Put your code here

✅ Question 2.6 (5 points)#

Now, write a function compute_median that computes and returns the median of the values in an input array. Note that the median is the middle term after sorting an array in ascending order, when the total number of terms is odd. Otherwise, the median is the average of the two middle terms after the sorting. You can use the np.sort() function for this problem to sort your values. To determine if you can just grab the middle value or need to calculate the average of the two middle values, you’ll need to check if the array has a odd number of terms or not!

Again, you can check your answer using the np.median() function

# Put your code here

✅ Question 2.7 (4 points)#

Now, generate arrays of different lengths with \(n\) ranging from \(10\) to \(10^7\) (e.g., choose \(10\), \(10^2\), \(10^3\), \(10^4\), etc.), whose entries are random numbers from the standard normal distribution.

Then, compute the array average for each \(n\) using your new function.

Finally, plot the average value as a function of \(n\) using a log scale for the x-axis by using the plt.semilogx() function. Make sure to label the axes and give the plot a title.

What do you observe about the average as \(n\) increases? Try running the code multiple times to explore the behavior as your random numbers change. You should be observing the “law of large numbers” – google this to see if it confirms your observations.

# Put your code here

✎ Put your answer to “What do you observe about the average as \(n\) increases?” here.

✅ Question 2.8 (4 points)#

Repeat the above plot but with arrays whose entries are uniformly distributed over the interval \([0,1)\) and compute and plot the median array value over \(n\) instead.

What do you observe about the median as \(n\) increases? Does this match your expectations?

# Put your code here

✎ Put your answer to “What do you observe about the median as \(n\) increases?” here.

3. Solving ODEs: Revisting population modeling again, but with a twist! (23 total points)#

Modeling population growth is different for populations where individuals from the population are removed, or “harvested,” at a constant rate. For example, if a village relies on fishing from the lake nearby to survive, we could model the population of the fish using harvesting. In this example, assume the village needs exactly 90 fish per month from the lake.

Model of population for fish goes like:

\[\frac{dP}{dt} = r P (K - P) - H\]

\(r\) is a natural growth factor: 0.002 (fish * month)\(^{-1}\)
\(K\) is the maximum supportable population of the lake: 700 fish
\(H\) is the harvesting rate: 90 fish/month
\(P\) is the value of the population (fish)
\(dP/dt\) is the rate of change of the population (fish per month)

It may also be useful in this section to remember how values can be calculated incrementally using derivatives (we referred to these as “update equations”). For example, if we know that the value of \(x\) is \(x_1\) at a given time, and we know value of the derivative at that time is \(dx/dt\), then we can compute the value of \(x\) at a short time (\(\Delta t\)) later:

\[x_2 = x_1 + \frac{dx}{dt}\Delta t\]

✅ Question 3.1 Write a function for the derivative (4 points)#

Write a function to calculate and return the derivative \(dP/dt\). Use the template below to get started on your function.

def derivs(______): # inputs?
    
    # some constants provided...
    r = 0.002
    K = 700
    H = 90
    
    # more code?

✅ Question 3.2 Compute a derivative (1 point)#

The current population of the lake is estimated to be 85 fish. Call your derivs function to show the value of \(dP/dt\) when population is 85. You should double-check that your function is working correctly by comparing your answer to the expected value, which is 14.55. If you’re not getting the correct value, make sure you correct this before moving on! Come to office hours or ask questions on Teams if you need help.

## your code here

✅ Question 3.3 Compute population values (7 points)#

The code below provides an array of times spanning 5 years at 1-month increments. It also provides a matching array for the pops, and the initial population (85 fish) has been inserted into the first position of the array.

Write a loop that uses your derivs function to compute values of the population over the course of the next 5 years. Insert your population values into the pops array using approproate NumPy array indexing. When you are done, there should be no more zeros left in the pops array

# times, incrementing month by month, for a total of 5 years
dt = 1/12
times = np.arange(0, 5, dt)

# pops, initially set to a bunch of 0s, and the first value has already been inserted
pops = np.zeros(len(times))
pops[0] = 85

## your code here

✅ Question 3.4 Plot population growth (3 points)#

Create a plot that shows the population growth of fish in the lake over 5 years, assuming a starting population value of 85. Make sure to include axis labels and a title for your plot.

## your code here

✅ Question 3.5 Plotting alternative models (6 points)#

It turns out it is hard to count the number of fish in a lake (how do you tell them apart??). The original estimate was 85, but it could be off by plus or minus 15 fish. The real number could be anywhere between 70 and 100.

Compute and plot population growth assuming different starting values:

At time = 0, \(P_0\) = 85 (you’ve already done this one)
At time = 0, \(P_0\) = 70
At time = 0, \(P_0\) = 100

Your plot should show three different curves, which should be labeled in a legend as well, so you can easily see the different starting values of the population, and the corresponding curves. Make sure to include axis labels and a title.

## your code here

✅ Question 3.6 Reflecting on the model results (2 points)#

What stands out to you about the plot from 3.5? What do these results mean for the real life consequences of this village and its food supply?

✎ Put your answer here

Wait! Before you submit your notebook, read the following important recommendations#

When your TA opens your notebook to grade the assignment, it will be really useful if your notebook is saved in a fully executed state so that they can see the output of all your code. Additionally, it may be necessary from time to time for the TA to actually run your notebook, but when they run the notebook, it is important that you are certain that all the code will actually run for them!

You should get into the following habit: before you save and submit your final notebook for your assignments, go to the “Kernel” tab at the top of the notebook and select “Restart and Run all”. This will restart your notebook and try to run the code cell-by-cell from the top to the bottom. Once it finished, review you notebook and make sure there weren’t any errors that popped up. Sometimes, when working with notebooks, we accidentally change code in one cell that break our code in another cell. If your TA stumbles across code cells that don’t work, they will likely have to give you a zero for those portions of the notebook that don’t work. Testing your notebook one last time is a good way to make sure this doesn’t happen.

Once you’ve tested your whole notebook again, make sure you save it one last time before you upload it to D2L.

Congratulations, you’re done!#

Submit this assignment by uploading it to the course Desire2Learn web page. Go to the “Homework Assignments” section, find the submission folder link for Homework #2, and upload it there.