Homework 5
Deadline
Due Friday, Oct 18, 2024 at midnight on Crowdmark. Note the change in due date from the original calendar!
Instructions
This homework covers four classes. Problems listed below are from the textbook.
Wed 10/9, we covered 6.1 (Subset Selection).
Fri 10/11, we covered 6.2.1 (Ridge Regression), and Mon 10/14, we covered 6.2.2 (Lasso).
Wed 10/16, we covered Dimension Reduction.
Problems
This homework has only two problems, which together cover the topics above.
Do the subset selection problem written below.
6.6.9 (a-e, g)
For each of parts (c), (d), and (e), additionally provide a plot of the test error across the candidate \(\lambda\) (AKA \(\alpha\)) values for ridge/lasso, or across the candidate dimensions \(M\) for PCR, to justify your choice.
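As an illustration of the kind of plot requested, here is a minimal sketch that sweeps a grid of penalty values and records the test error for each. It is not the required solution: the data are synthetic stand-ins, ridge is fit in closed form with NumPy to keep the example self-contained, and the names `ridge_test_errors` and `plot_errors` are hypothetical helpers invented for this sketch.

```python
import numpy as np

def ridge_test_errors(X_train, y_train, X_test, y_test, alphas):
    """Return the test MSE of a closed-form ridge fit for each penalty in alphas."""
    # Standardize predictors using training statistics only.
    mu, sd = X_train.mean(axis=0), X_train.std(axis=0)
    Xtr, Xte = (X_train - mu) / sd, (X_test - mu) / sd
    y_mean = y_train.mean()  # intercept is left unpenalized
    p = Xtr.shape[1]
    errors = []
    for alpha in alphas:
        # Ridge coefficients: (X^T X + alpha I)^{-1} X^T (y - mean(y))
        beta = np.linalg.solve(Xtr.T @ Xtr + alpha * np.eye(p),
                               Xtr.T @ (y_train - y_mean))
        pred = y_mean + Xte @ beta
        errors.append(float(np.mean((y_test - pred) ** 2)))
    return np.array(errors)

def plot_errors(alphas, errors):
    """Plot test MSE against the penalty (log axis) and mark the minimizer."""
    import matplotlib.pyplot as plt  # imported here so the sweep runs without it
    best = int(np.argmin(errors))
    fig, ax = plt.subplots()
    ax.plot(alphas, errors, marker="o")
    ax.axvline(alphas[best], linestyle="--", color="gray")
    ax.set_xscale("log")
    ax.set_xlabel("alpha (lambda)")
    ax.set_ylabel("test MSE")
    return fig

# Synthetic stand-in data (assumption): 200 rows, 10 predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=2.0, size=200)
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

alphas = np.logspace(-3, 3, 25)
errors = ridge_test_errors(X_train, y_train, X_test, y_test, alphas)
best_alpha = alphas[int(np.argmin(errors))]
```

Calling `plot_errors(alphas, errors)` draws the curve with the chosen value marked. The same pattern carries over to lasso or PCR by swapping the fitting step and sweeping \(\alpha\) or \(M\) instead.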
Subset selection problem
Below are the training and testing errors from doing linear regression on different subsets of the variables from the auto data set to predict mpg.
| Variables | Train Score | Test Score |
|---|---|---|
| null model | 60.76 | 60.73 |
| (cylinders,) | 24.02 | 24.15 |
| (horsepower,) | 23.94 | 24.19 |
| (weight,) | 18.68 | 18.84 |
| (acceleration,) | 49.87 | 50.26 |
| (cylinders, horsepower) | 20.85 | 21.13 |
| (cylinders, weight) | 18.38 | 18.55 |
| (cylinders, acceleration) | 23.94 | 24.38 |
| (horsepower, weight) | 17.84 | 18.03 |
| (horsepower, acceleration) | 22.46 | 22.70 |
| (weight, acceleration) | 18.25 | 18.61 |
| (cylinders, horsepower, weight) | 17.76 | 17.99 |
| (cylinders, horsepower, acceleration) | 20.06 | 20.44 |
| (cylinders, weight, acceleration) | 18.13 | 18.54 |
| (horsepower, weight, acceleration) | 17.84 | 18.16 |
| (cylinders, horsepower, weight, acceleration) | 17.76 | 18.13 |
(Parts a, b, and c) For each of the three subset selection methods discussed in class ((a) best subset selection, (b) forward selection, and (c) backward selection), do the following:
Describe the steps the algorithm takes to arrive at its best model. Be sure to say what \(M_k\) is for \(k = 0, 1, \ldots, 4\).
What is the best model returned by the algorithm? Give the full equation for mpg in terms of the variables. You don’t know the learned coefficients, so those can be left in terms of \(\hat \beta_i\)’s.
How many models do you have to train to arrive at the conclusion?
(d) Are your answers to (a), (b), and (c) the same? Do we expect them to be?
Note that you are not expected to code any of these methods; you only need to work them out by hand from the table.
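For checking hand calculations, the three procedures can be sketched as lookups in a table of scores. The sketch below is illustrative only: the variables `a`, `b`, `c` and every number in `scores` are hypothetical and are not the homework's table. It assumes the convention that, within a given model size, candidates are compared by training error, and that the final choice among the \(M_k\) is made by test error.

```python
from itertools import combinations

variables = ("a", "b", "c")

# Hypothetical (train_error, test_error) per subset; NOT the homework's numbers.
scores = {
    (): (50.0, 50.5),
    ("a",): (30.0, 30.4),
    ("b",): (28.0, 28.9),
    ("c",): (40.0, 40.2),
    ("a", "b"): (20.0, 20.6),
    ("a", "c"): (19.0, 21.0),
    ("b", "c"): (27.0, 27.5),
    ("a", "b", "c"): (18.5, 20.9),
}

def key(subset):
    """Canonical table key: members of subset in the order of `variables`."""
    return tuple(v for v in variables if v in subset)

def train_error(subset):
    return scores[subset][0]

def best_subset():
    """M_k is the lowest-training-error model among ALL subsets of size k."""
    return [min((key(s) for s in combinations(variables, k)), key=train_error)
            for k in range(len(variables) + 1)]

def forward():
    """Start from the null model; add the single variable that helps most."""
    path = [()]
    while len(path[-1]) < len(variables):
        current = path[-1]
        candidates = [key(set(current) | {v}) for v in variables if v not in current]
        path.append(min(candidates, key=train_error))
    return path

def backward():
    """Start from the full model; drop the variable whose removal hurts least."""
    path = [variables]
    while path[-1]:
        current = path[-1]
        candidates = [key(set(current) - {v}) for v in current]
        path.append(min(candidates, key=train_error))
    return path[::-1]  # reorder as M_0, M_1, ..., M_p

def pick(path):
    """Choose among M_0..M_p by test error."""
    return min(path, key=lambda s: scores[s][1])
```

Each function returns the sequence \(M_0, \ldots, M_p\), and `pick` selects the final model from that sequence. On this toy table the three paths are not identical, which shows that the greedy methods only ever compare models nested along their own path.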
Note: the content from Fri, 10/18, will be included on the exam.
Important
Standard instructions for submissions and deadlines can be found on the Homework Info Page.