Homework 2
Deadline
Due Sunday, Sept 15, 2024 at midnight on Crowdmark
Instructions
Note that the problems in the text often specifically talk about the statsmodels package. Most of my examples in class do the same thing with the sklearn package instead. Throughout the course, if a problem asks you to do something, you can always answer using either package.
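To see that the choice of package does not change the fitted model, here is a minimal sketch that fits the same simple linear regression with sklearn's LinearRegression and with the statsmodels formula interface. The five data points are made up for illustration and are not from the textbook.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from sklearn.linear_model import LinearRegression

# Hypothetical toy data; replace with the columns from the homework data set.
df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0, 5.0],
                   "y": [2.1, 3.9, 6.2, 8.1, 9.7]})

# sklearn: expects a 2-D feature table and fits an intercept by default.
sk_model = LinearRegression().fit(df[["x"]], df["y"])
sk_coefs = np.array([sk_model.intercept_, sk_model.coef_[0]])

# statsmodels formula interface: the intercept is included automatically.
sm_results = smf.ols("y ~ x", data=df).fit()
sm_coefs = sm_results.params.to_numpy()  # order: Intercept, x

print("sklearn:    ", sk_coefs)
print("statsmodels:", sm_coefs)
```

The two coefficient vectors agree up to floating-point error; statsmodels additionally reports standard errors and p-values, which sklearn does not.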
This homework covers material from three classes. The problems listed below are from the textbook.
Fri 9/6 and Mon 9/9, we covered Sec 3.1 - Simple Linear Regression
3.7.8 (a)(i, ii, iii, and modified iv below) and (b)
Modified version of (a)(iv):
What are the predicted values for the inputs in the data set?
Compute the RSS and MSE using these predicted values.
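To make the two quantities concrete, here is a minimal NumPy-only sketch on a hypothetical five-point data set (the x and y values are made up). It fits the least-squares line with np.polyfit, computes the predicted values \(\hat y_i\) at the same inputs used for fitting, and then forms \(\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat y_i)^2\) and \(\mathrm{MSE} = \mathrm{RSS}/n\). Using NumPy directly, rather than statsmodels or sklearn, is just a choice to keep the arithmetic visible.

```python
import numpy as np

# Hypothetical toy inputs and responses; in the homework these would be
# the predictor and response columns of the data set.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.7])

# Least-squares line y ~ b0 + b1 * x (np.polyfit lists the slope first).
b1, b0 = np.polyfit(x, y, deg=1)

# Predicted (fitted) values at the same inputs used for fitting.
y_hat = b0 + b1 * x

# RSS = sum of squared residuals; MSE = RSS / n.
rss = np.sum((y - y_hat) ** 2)
mse = rss / len(y)

print("predicted:", y_hat)
print("RSS:", rss, "MSE:", mse)
```

With sklearn, model.predict(X) returns the predictions and sklearn.metrics.mean_squared_error(y, y_pred) returns RSS/n. With statsmodels, results.fittedvalues holds the predictions and results.ssr the RSS; be aware that results.mse_resid divides the RSS by the residual degrees of freedom rather than by n, so it is a different quantity.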
3.7.13 (skip part g)
Warning for part (b): np.random.normal takes the standard deviation, not the variance, as its scale argument.
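A minimal sketch of the fix: convert the variance to a standard deviation before calling np.random.normal. The variance 0.25 and the seed below are example values; use whatever the problem specifies. The normal() method of a NumPy Generator (for example np.random.default_rng(1).normal) follows the same loc/scale convention.

```python
import numpy as np

np.random.seed(1)  # fix the seed so the draws are reproducible

variance = 0.25           # example variance of the noise distribution
sd = np.sqrt(variance)    # np.random.normal wants the standard deviation

# 100 draws from a normal with mean 0 and variance 0.25: scale=0.5, NOT 0.25.
eps = np.random.normal(loc=0.0, scale=sd, size=100)

# Sanity check with a large sample: its variance should be close to 0.25.
big = np.random.normal(loc=0.0, scale=sd, size=100_000)
print(eps.shape, big.var())
```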
A note on code: the book’s use of the statsmodels package differs slightly from the examples in the Jupyter notebooks provided in class. In particular, the book’s lab examples and the homework statements implicitly use import statsmodels.api as sm, while we use import statsmodels.formula.api as smf. This leads to small differences in code, most visibly whether the function you call is OLS (sm.OLS) or ols (smf.ols). You may use whichever works for you; the answers should be the same.
Weds 9/11, we covered Sec 3.2 - Multiple Linear Regression
3.7.1
(Modified version of 3.7.9): Using the Auto data set, we will predict \(Y = \texttt{mpg}\) using all other variables except name and origin.
Generate the correlation matrix between all variables. Are there any pairs that are particularly highly correlated?
Using statsmodels, create a linear model predicting mpg from all other variables except name and origin. Is there a relationship between the predictors and the response? Justify your answer.
Which predictors appear to have a statistically significant relationship to the response?
What does the coefficient for the year variable suggest?
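All parts of this modified problem can be worked from one pandas/statsmodels workflow. The sketch below runs that workflow on a made-up DataFrame whose column names (weight, displacement, year, origin, name) imitate the Auto data; the numbers are hypothetical, so nothing it prints answers the homework. It assumes name and origin are ordinary columns, and the 0.8 cutoff for "highly correlated" and the 0.05 significance level are conventional choices, not part of the problem. Note that DataFrame.corr() needs numeric columns: in recent pandas versions a text column such as name raises an error unless it is dropped (or numeric_only=True is passed).

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical stand-in for the Auto data: made-up numbers with Auto-like
# column names. Replace `df` with the real Auto DataFrame for the homework.
rng = np.random.default_rng(0)
n = 200
weight = rng.normal(3000, 500, n)
df = pd.DataFrame({
    "weight": weight,
    "displacement": 0.1 * weight + rng.normal(0, 20, n),
    "year": rng.integers(70, 83, n),
    "origin": rng.integers(1, 4, n),
    "name": [f"car {i}" for i in range(n)],
})
df["mpg"] = 40 - 0.006 * weight + 0.7 * (df["year"] - 76) + rng.normal(0, 3, n)

# Correlation matrix: name is text and origin is excluded by the problem.
corr = df.drop(columns=["name", "origin"]).corr()
print(corr.round(2))

# List each pair once (upper triangle) whose |r| exceeds an arbitrary cutoff.
cutoff = 0.8
cols = list(corr.columns)
high = [(a, b, corr.loc[a, b])
        for i, a in enumerate(cols) for b in cols[i + 1:]
        if abs(corr.loc[a, b]) > cutoff]
print(high)

# Multiple regression of mpg on every column except mpg, name, and origin.
predictors = [c for c in df.columns if c not in ("mpg", "name", "origin")]
results = smf.ols("mpg ~ " + " + ".join(predictors), data=df).fit()

# Overall F-test of H0: every slope coefficient is zero.
print("F =", results.fvalue, "p-value =", results.f_pvalue)

# Predictors whose individual t-test p-value is below 0.05.
pvals = results.pvalues.drop("Intercept")
print(pvals[pvals < 0.05])

# Estimated change in mpg per one-unit increase in year,
# holding the other predictors fixed.
print("year coefficient:", results.params["year"])
```

results.summary() prints the same F-statistic, p-values, and coefficients in a single table. The F-test speaks to whether there is any relationship between the predictors and the response, while the individual p-values speak to which predictors matter once the others are in the model.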
Note: the content from Fri 9/13 will be included in HW3, due next week.
Important: Standard instructions for submissions and deadlines can be found on the Homework Info Page.