Homework 4: Fitting models to data and evaluating the fit#

✅ Put your name here
Learning Goals#
Content Goals#
Fitting curves to data and evaluating model fit
Practice Goals#
Using open source tools to fit models to data
Reading and understanding documentation for open source tools
Assignment instructions#
Work through the following assignment, making sure to follow all of the directions and answer all of the questions.
This assignment is due at 11:59pm on Friday, November 15.
It should be uploaded into D2L Homework #4. Submission instructions can be found at the end of the notebook.
Table of Contents and Grading#
0. Importing the modules you will need for this assignment
1. Fitting a model to our stellar luminosity data (16 points)
2. Fitting a model to global sea level data (24 points)
Total points possible: 40
0. Importing the modules you will need for this assignment#
In this assignment you will likely be using matplotlib, numpy, pandas, and curve_fit. Of course, in order to make sure you can use these modules when you need them, you have to import them.
Put the import commands you need to include to be able to use all of the above modules in this notebook in the cell below. Make sure to execute the cell before you move on!
# Put your import commands here
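If it helps, a typical import cell for this assignment might look like the sketch below (note that curve_fit lives in scipy.optimize):

```python
# Modules used throughout this assignment
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
```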
1. Fitting curves to your data: The Luminosity Function (16 points)#
Having gone through the process of visualizing the star data in your last homework assignment, we now want to answer the following question:
What is the relation between Luminosity, Temperature, and Radius?
In the following we will try and find this relation.
1.1: The model#
Let’s revisit the plots that you should have ended up with in your last homework assignment:
Ideally, you noticed that the main sequence stars fall on an (almost) straight line in both plots. Since both axes are plotted on a log scale, we can infer that there exists a power-law relation between the luminosity, \(L\), the temperature, \(T\), and the radius, \(R\). Specifically, something of the form

\( L \propto T^{\beta} \, R^{\alpha} \)
For the sake of simplicity, we will try to model only the relation between Luminosity and Temperature. In order to simplify the above equation, let's look back at the description of our data:
Details about this dataset:#
The dataset contains information on 240 stars. The columns are
Temperature (K): Absolute temperature of the star measured in Kelvin
Luminosity (L/Lo): Luminosity of the star relative to the luminosity of the sun \(L_o = 3.828 \times 10^{26}\) Watts (Avg Luminosity of Sun)
Radius (R/Ro): Radius of the star relative to the radius of the sun \(R_o = 6.9551 \times 10^8\) m (Avg Radius of Sun)
Absolute Magnitude (Mv): Absolute Magnitude (which is a measurement of the star’s brightness)
Star type: Type of star; this is a limited dataset with only 6 types = “Red Dwarf”, “Brown Dwarf”, “White Dwarf”, “Main Sequence”, “Super Giants”, “Hyper Giants”
Star color: Color of the star
Spectral Class: Classification of stars (which is another way of categorizing stars based on their observed properties, compared to the "star type")
Notice that the luminosity is a relative quantity, i.e. what we have is the star's luminosity divided (normalized) by the luminosity of the sun, \(L_o = 3.828 \times 10^{26}\) W. Therefore, it makes sense to compare this relative luminosity with a relative temperature. That is, if we focus just on the temperature-luminosity relationship and normalize the temperature as well, the relation becomes

\( \frac{L}{L_o} = \left( \frac{T}{T_o} \right)^{\beta}, \)

where \(T_o = 5780\) K is the temperature of the sun. Let's see if we can find the value of \(\beta\) that best fits our data.
✅ Do this (7 points):
Reload the data, `stars.csv`, from the previous homework into a dataframe called `stars_data`.
Add a new column to your dataset and call it `Normalized Temperature`. This column is obtained by dividing the `Temperature (K)` column by the temperature of the sun, \(T_o = 5780\) K.
Write a function called `lum_model` that takes two arguments, the normalized temperature (the independent variable) and \(\beta\), and computes the relative luminosity according to the equation above (i.e. \(\left( \frac{T}{T_o} \right )^{\beta}\)). Remember, now that you have a new column that is the normalized temperature, \(\frac{T}{T_o}\), this simplifies your function: you don't need to do the normalization inside the function as long as you pass it the right column of temperature data.
# Put your code here
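As a reference point, a minimal sketch of these three steps might look like the following. It assumes the column name `Temperature (K)` from the data description above; adjust the names if your CSV headers differ.

```python
# Sketch: load the star data, add the normalized-temperature column,
# and define the power-law model (column names assumed from the description)
stars_data = pd.read_csv('stars.csv')
stars_data['Normalized Temperature'] = stars_data['Temperature (K)'] / 5780.0

def lum_model(norm_temp, beta):
    # Relative luminosity L/Lo as a power law in the normalized temperature T/To
    return norm_temp ** beta
```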
1.2 Fitting the model#
✅ Do this (4 points):
Now use `curve_fit` with the above function to find the parameter \(\beta\) by fitting the `Luminosity(L/Lo)` as a function of the `Normalized Temperature` for the main sequence stars only (you'll want to use a mask to select these stars).
What is the value of \(\beta\)? (You should just be able to print it out!)
# Put your code here
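One possible way to set up this fit is sketched below. The column names `Star type` and `Luminosity(L/Lo)` are assumptions based on the data description; check them against your actual dataframe.

```python
# Sketch: mask for the main sequence stars, then fit for beta
main_seq = stars_data[stars_data['Star type'] == 'Main Sequence']

params, covariance = curve_fit(lum_model,
                               main_seq['Normalized Temperature'],
                               main_seq['Luminosity(L/Lo)'])
beta = params[0]
print('Best-fit beta:', beta)
```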
1.3 Checking your model#
✅ Do this (3 points): Make a plot comparing your model and the data. Don't forget to log-scale your axes, label them, and add a legend.
# Put your code here
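A sketch of the comparison plot, reusing the `main_seq` dataframe and `beta` from the previous step:

```python
# Sketch: overlay the power-law model on the main sequence data, log-log axes
plt.scatter(main_seq['Normalized Temperature'], main_seq['Luminosity(L/Lo)'],
            s=10, label='Main sequence stars')

t_grid = np.linspace(main_seq['Normalized Temperature'].min(),
                     main_seq['Normalized Temperature'].max(), 200)
plt.plot(t_grid, lum_model(t_grid, beta), color='red', label='Power-law fit')

plt.xscale('log')
plt.yscale('log')
plt.xlabel('Normalized Temperature (T/To)')
plt.ylabel('Luminosity (L/Lo)')
plt.legend()
plt.grid(True)
plt.show()
```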
✅ Answer the following (2 points):
Do you think this highly simplified model is a good model for this data? Clearly justify your answer by thinking back to our discussions on how we can quantify goodness of fit (looking for more than a “yes” or “no” answer). Discuss if there are regions of the data where the model fits better or worse and how you’re identifying those regions.
Side Note: we’re not doing super careful stellar modeling here, so don’t worry too much about the physical interpretation of the value of \(\beta\) you found. We’re just trying to see if we can find a simple model that fits the data.
# Put any additional code here, if it helps you answer the question
✎ Put your answer here
2. Fitting a model to global sea level data (24 total points)#
Exploring changes in global sea levels#
For this part of the assignment, you’ll be exploring a dataset that contains information about the average global sea level as a function of time. This is real data that was collected from four different instruments dating back to 1993.
Your job is to look for patterns or trends in this dataset, fit those patterns/trends with models, and interpret your results.
✅ Question 2.1 (1 point): Using Pandas, you’re going to load the sl_global.txt dataset provided with this assignment. You should note that this file is not a CSV file like many of the files we’ve worked with in this course. As a result, the provided code to load the dataset uses the Pandas read_table() function instead of read_csv().
Once you’ve loaded the data, display the last 15 entries in the resulting Pandas dataframe using the appropriate Pandas function.
You should find that the dataset contains two columns:
Column 1: Date in years (where decimal values indicate fractional years)
Column 2: Global mean sea level represented as deviations [in mm] from a “base level” (also referred to as anomalies)
import pandas as pd
sl = pd.read_table('sl_global.txt')
### Put your code for displaying the last 15 rows below this comment
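If you keep the dataframe name sl from the provided cell, the last 15 rows can be displayed with Pandas' tail() method:

```python
# Display the last 15 rows of the dataframe
sl.tail(15)
```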
✅ Question 2.2 (7 points): Now that you've loaded the data, make a scatter plot of the global mean sea level as a function of time for the entire dataset. Make sure your plot has appropriate \(x\)- and \(y\)-axis labels and includes grid lines.
Once you have your plot:
Describe two patterns or trends that you observe in the data over the time range spanned by the data.
Hypothesize what is causing each observed pattern/trend separately.
Include your responses in the Markdown cell below.
### Put your code here
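One way to make this plot is sketched below. Because the header names inside sl_global.txt aren't shown here, the columns are accessed by position; swap in the real column names if you prefer.

```python
# Sketch: scatter plot of global mean sea level vs. time
time = sl.iloc[:, 0]        # first column: date in years
sea_level = sl.iloc[:, 1]   # second column: sea level anomaly in mm

plt.scatter(time, sea_level, s=5)
plt.xlabel('Year')
plt.ylabel('Global mean sea level anomaly (mm)')
plt.grid(True)
plt.show()
```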
✅ Do This: Comment on points 1 and 2 from above.
✅ Question 2.3 (2 points): Now that you have a sense for the properties of the entire dataset, you’re going to zoom in and only look at the data from 2000 to 2010.
Using masks, create a new Pandas dataframe for only those values that are between 2000 and 2010. Once you’ve created your new dataframe, make a scatter plot of the values in the new dataframe to ensure that you were successful.
### Put your code here
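A sketch of the masking step, again using positional column access since the header names aren't shown here; the name sl_zoom is just a placeholder.

```python
# Sketch: keep only the rows with dates between 2000 and 2010, then re-plot
time = sl.iloc[:, 0]
mask = (time >= 2000) & (time <= 2010)
sl_zoom = sl[mask]

plt.scatter(sl_zoom.iloc[:, 0], sl_zoom.iloc[:, 1], s=5)
plt.xlabel('Year')
plt.ylabel('Global mean sea level anomaly (mm)')
plt.grid(True)
plt.show()
```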
✅ Question 2.4 (6 points): For the "zoomed-in" part of the data, use curve_fit() to fit a line to the data that captures the large-scale trend. As a reminder, a line has the form

\( y(t) = A\,t + B, \)

where \(A\) and \(B\) are the parameters you're trying to find and \(t\) is the time.
Remember, to use curve_fit(), you need to define a function that takes the independent variable (time) and the parameters you’re trying to find (in this case, \(A\) and \(B\)).
Once you’ve created your fit, use your function to produce model values for time spanning from \(t = 2000\) to \(t = 2010\) in increments of 0.1 years.
Then, plot your model values (for the best fit) along with the data values.
Include a legend to label the model values and the data values, so the viewer can easily distinguish which is which.
### Put your code here
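A sketch of the linear fit and comparison plot, assuming the zoomed-in dataframe is called sl_zoom as in the previous sketch:

```python
# Sketch: fit y = A*t + B to the zoomed-in data and plot model vs. data
def line_model(t, A, B):
    return A * t + B

params, cov = curve_fit(line_model, sl_zoom.iloc[:, 0], sl_zoom.iloc[:, 1])

t_model = np.arange(2000, 2010.1, 0.1)   # t = 2000 to 2010 in 0.1-year steps
y_model = line_model(t_model, *params)

plt.scatter(sl_zoom.iloc[:, 0], sl_zoom.iloc[:, 1], s=5, label='Data')
plt.plot(t_model, y_model, color='red', label='Linear fit')
plt.xlabel('Year')
plt.ylabel('Global mean sea level anomaly (mm)')
plt.legend()
plt.grid(True)
plt.show()
```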
✅ Question 2.5 (8 points): Now that you’ve come up with a fit for the long-term trend in the data, you’re going to try to see if you can fit the shorter timescale variations in the data.
Using curve_fit() again, fit a model that is the combination of a line and a sinusoidal function. The functional form should look like this:

\( y(t) = A\,t + B + C \sin(D\,t + E) \)

The first term is the equation for a line with slope \(A\) and intercept \(B\), and the second term is a sine curve with amplitude \(C\), frequency \(D\), and phase shift \(E\).
In order to get curve_fit to correctly fit this model, you need to give it a plausible initial guess for the parameters of the model. Try using an initial set of parameters, p0 = [1.0, -1000.0, 10, 6, 0]
Remember to define a new function that takes the independent variable (time) and the parameters you’re trying to find (in this case, \(A\), \(B\), \(C\), \(D\), and \(E\)).
Once you have your best fit model, again plot the data and your model (in time increments of 0.1 years from t = 2000 to 2010) on the same plot (same as you did for Question 2.4). Again, use a legend to distinguish the model values from the data values.
Do the results support your hypotheses from Question 2.2 as to the source of the small-scale and large-scale patterns/trends in the data? Discuss why/why not in the Markdown cell below.
### Put your code here
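A sketch of the combined fit, reusing the names from the previous sketch and the suggested initial guess; the sine term is written as C*sin(D*t + E) to match the form given above.

```python
# Sketch: fit a linear trend plus a sinusoidal term to the zoomed-in data
def line_plus_sine(t, A, B, C, D, E):
    return A * t + B + C * np.sin(D * t + E)

p0 = [1.0, -1000.0, 10, 6, 0]   # suggested initial guess for [A, B, C, D, E]
params, cov = curve_fit(line_plus_sine,
                        sl_zoom.iloc[:, 0], sl_zoom.iloc[:, 1], p0=p0)

t_model = np.arange(2000, 2010.1, 0.1)
plt.scatter(sl_zoom.iloc[:, 0], sl_zoom.iloc[:, 1], s=5, label='Data')
plt.plot(t_model, line_plus_sine(t_model, *params), color='red',
         label='Line + sine fit')
plt.xlabel('Year')
plt.ylabel('Global mean sea level anomaly (mm)')
plt.legend()
plt.grid(True)
plt.show()
```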
✅ Do This: Discuss the results and whether or not they support your hypotheses from Question 2.2 here. Make sure to justify your answer!
Congratulations, you’re done!#
Submit this assignment by uploading it to the course Desire2Learn web page. Go to the “Homework Assignments” section, find the submission folder link for Homework #4, and upload it there.
© 2024 Copyright the Department of Computational Mathematics, Science and Engineering at Michigan State University