Homework 5

Deadline

Due Friday, Oct 18, 2024, at midnight on Crowdmark. Note that the due date has changed from the original calendar!

Instructions

This homework covers four classes. Numbered problems below are from the textbook.

  • Wed 10/9: Section 6.1 (subset selection)

  • Fri 10/11: Section 6.2.1 (ridge regression)

  • Mon 10/14: Section 6.2.2 (the lasso)

  • Wed 10/16: dimension reduction

Problems

This homework has only two problems, which together cover the topics above.

  • Do the subset selection problem written below.

  • 6.6.9 (a-e, g)

    • For each of parts (c), (d), and (e), also include a plot of the test error against the candidate values of \(\lambda\) (called \(\alpha\) in scikit-learn) for ridge and lasso, or against the candidate number of components \(M\) for PCR, to justify your choice.

Subset selection problem

Below are the training and test errors from fitting linear regressions that predict mpg using different subsets of the variables in the Auto data set.

Variables                                     Train Error  Test Error
null model                                    60.76        60.73
cylinders                                     24.02        24.15
horsepower                                    23.94        24.19
weight                                        18.68        18.84
acceleration                                  49.87        50.26
cylinders, horsepower                         20.85        21.13
cylinders, weight                             18.38        18.55
cylinders, acceleration                       23.94        24.38
horsepower, weight                            17.84        18.03
horsepower, acceleration                      22.46        22.70
weight, acceleration                          18.25        18.61
cylinders, horsepower, weight                 17.76        17.99
cylinders, horsepower, acceleration           20.06        20.44
cylinders, weight, acceleration               18.13        18.54
horsepower, weight, acceleration              17.84        18.16
cylinders, horsepower, weight, acceleration   17.76        18.13

  • (Parts a, b, and c) For each of the three subset selection methods discussed in class, namely (a) best subset selection, (b) forward stepwise selection, and (c) backward stepwise selection, do the following:

    • Describe the steps the algorithm takes to arrive at its choice of the best model.

    • Be sure to say what \(M_k\) is for \(k= 0 , 1, \cdots, 4\).

    • What is the best model returned by the algorithm? Give the full equation for mpg in terms of the variables; since you don’t know the fitted coefficients, leave them as \(\hat \beta_i\)’s.

    • How many models do you have to train to arrive at the conclusion?

  • (d) Are your answers to (a), (b), and (c) the same? Do we expect them to be?

Note that you do not need to code any of these methods; you only need to work them out by hand.
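Although no code is required, a short script can be used to check hand calculations. The sketch below stores the table as a dictionary keyed by the set of predictors and implements the three searches. It assumes one reading of the algorithms: each \(M_k\) is the size-\(k\) candidate with the lowest training error, and the final choice among \(M_0, \ldots, M_4\) uses the test error column in place of cross-validation. The names `ERRORS`, `best_subset`, `forward`, `backward`, and `pick_final` are hypothetical.

```python
from itertools import combinations

# (train error, test error) from the table, keyed by the set of predictors used.
ERRORS = {
    frozenset(): (60.76, 60.73),
    frozenset({"cylinders"}): (24.02, 24.15),
    frozenset({"horsepower"}): (23.94, 24.19),
    frozenset({"weight"}): (18.68, 18.84),
    frozenset({"acceleration"}): (49.87, 50.26),
    frozenset({"cylinders", "horsepower"}): (20.85, 21.13),
    frozenset({"cylinders", "weight"}): (18.38, 18.55),
    frozenset({"cylinders", "acceleration"}): (23.94, 24.38),
    frozenset({"horsepower", "weight"}): (17.84, 18.03),
    frozenset({"horsepower", "acceleration"}): (22.46, 22.70),
    frozenset({"weight", "acceleration"}): (18.25, 18.61),
    frozenset({"cylinders", "horsepower", "weight"}): (17.76, 17.99),
    frozenset({"cylinders", "horsepower", "acceleration"}): (20.06, 20.44),
    frozenset({"cylinders", "weight", "acceleration"}): (18.13, 18.54),
    frozenset({"horsepower", "weight", "acceleration"}): (17.84, 18.16),
    frozenset({"cylinders", "horsepower", "weight", "acceleration"}): (17.76, 18.13),
}
PREDICTORS = ["cylinders", "horsepower", "weight", "acceleration"]


def train_err(subset):
    return ERRORS[subset][0]


def pick_final(path):
    """Choose among M_0, ..., M_p by test error (standing in for cross-validation)."""
    return min(path, key=lambda s: ERRORS[s][1])


def best_subset(preds):
    path, fits = [], 0
    for k in range(len(preds) + 1):
        candidates = [frozenset(c) for c in combinations(preds, k)]
        fits += len(candidates)
        path.append(min(candidates, key=train_err))  # M_k: lowest training error of size k
    return path, fits


def forward(preds):
    current = frozenset()
    path, fits = [current], 1  # M_0 is the null model
    while len(current) < len(preds):
        candidates = [current | {var} for var in preds if var not in current]
        fits += len(candidates)
        current = min(candidates, key=train_err)  # add the single most helpful variable
        path.append(current)
    return path, fits


def backward(preds):
    current = frozenset(preds)
    path, fits = [current], 1  # M_p is the full model
    while current:
        candidates = [current - {var} for var in current]
        fits += len(candidates)
        current = min(candidates, key=train_err)  # drop the least useful variable
        path.append(current)
    return path[::-1], fits  # reorder as M_0, ..., M_p


for name, method in [("best subset", best_subset), ("forward", forward), ("backward", backward)]:
    path, fits = method(PREDICTORS)
    print(f"{name}: final model {sorted(pick_final(path))}, {fits} models fit")
```

Counting `fits` this way includes the null model as a fitted model; if your convention in class differs, adjust the starting count.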

Note: the content from Fri, 10/18, will be included on the exam.

Important

Standard instructions for submissions and deadlines can be found on the Homework Info Page.