
Ch 6.2: Shrinkage - Ridge regression

Lecture 17 - CMSE 381
Michigan State University
Dept of Computational Mathematics, Science & Engineering
Wed, Feb 25, 2026
Announcements

Last time:

This time:

Announcements:

Screenshot of the course schedule for
lectures 11 to 20.

Section 1

Last time
Subset selection

Algorithm diagram showing one way to break up the subset selection procedure into steps for choosing models based on training and testing scores.
Algorithm diagram showing the forward stepwise selection procedure for building a model by adding variables one at a time.
Algorithm diagram showing the backward stepwise selection procedure for building a model by removing variables one at a time.
What should you learn from this lecture?

Section 2

Ridge Regression
Goal

$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4$
Ridge regression

Before:
$$\mathrm{RSS} = \sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\right)^2$$
After:
$$\sum_{i=1}^{n}\left(y_i - \beta_0 - \sum_{j=1}^{p}\beta_j x_{ij}\right)^2 + \lambda\sum_{j=1}^{p}\beta_j^2 = \mathrm{RSS} + \lambda\sum_{j=1}^{p}\beta_j^2$$
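To make the penalized criterion concrete, here is a minimal NumPy sketch of the objective above; the function and variable names are our own, not from the lecture.

```python
import numpy as np

def ridge_objective(beta0, beta, X, y, lam):
    """Ridge criterion: RSS plus the L2 penalty on the slopes.

    Note that the intercept beta0 is NOT penalized.
    """
    residuals = y - (beta0 + X @ beta)   # y_i - beta0 - sum_j beta_j x_ij
    rss = np.sum(residuals ** 2)
    return rss + lam * np.sum(beta ** 2)
```

Setting `lam = 0` recovers ordinary least squares; larger values of `lam` push the slope estimates toward zero.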
Example from the Credit data

$\mathrm{RSS} + \lambda\sum_{j=1}^{p}\beta_j^2$
Plot of ridge regression coefficient
paths for the Credit data, showing
how the coefficient estimates shrink
toward zero as lambda increases.
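A sketch of how a coefficient-path plot like this can be produced with scikit-learn. The file name `Credit.csv` and the columns used are assumptions about a local copy of the Credit data, not something fixed by the lecture.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Assumed local copy of the Credit data with the usual ISLR column names.
credit = pd.read_csv("Credit.csv")
X = credit[["Income", "Limit", "Rating", "Student"]].copy()
X["Student"] = (X["Student"] == "Yes").astype(int)  # dummy-encode Student
y = credit["Balance"]

X_std = StandardScaler().fit_transform(X)  # ridge is scale-sensitive

# Refit the ridge model over a grid of lambda values (sklearn calls it alpha).
lambdas = np.logspace(-2, 5, 100)
coefs = [Ridge(alpha=lam).fit(X_std, y).coef_ for lam in lambdas]

plt.plot(lambdas, coefs)
plt.xscale("log")
plt.xlabel("lambda")
plt.ylabel("standardized coefficients")
plt.legend(X.columns)
plt.show()
```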
Same Setting, Different Plot

$\mathrm{RSS} + \lambda\sum_{j=1}^{p}\beta_j^2$, where $\|\beta\|_2 = \sqrt{\sum_{j=1}^{p}\beta_j^2}$

Plot of ridge regression coefficient
paths for the Credit data, showing
how coefficient estimates shrink as
the relative L2 norm decreases from
left to right.
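This version of the plot uses the relative norm $\|\hat{\beta}_\lambda^R\|_2 / \|\hat{\beta}\|_2$ on the x-axis. Continuing the sketch above (so `X_std`, `y`, `coefs`, and `lambdas` are as defined there):

```python
from sklearn.linear_model import LinearRegression

# Unpenalized least-squares fit gives the denominator ||beta_hat||_2.
beta_ols = LinearRegression().fit(X_std, y).coef_
rel_norm = [np.linalg.norm(c) / np.linalg.norm(beta_ols) for c in coefs]

plt.plot(rel_norm, coefs)
plt.xlabel("relative L2 norm of the ridge coefficients")
plt.ylabel("standardized coefficients")
plt.show()
```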

Scale equivariance (or lack thereof)

Scale equivariant: multiplying a variable by $c$ (using $cX_j$ in place of $X_j$) just returns the coefficient multiplied by $1/c$ (that is, $\frac{1}{c}\beta_j$). Least squares has this property, but ridge regression does not: the penalty $\lambda \sum_j \beta_j^2$ changes when the predictors are rescaled.
Solution: Standardize predictors

$$\tilde{x}_{ij} = \frac{x_{ij}}{\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_{ij} - \bar{x}_j\right)^2}}$$
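A small synthetic demo of the point above, using toy data of our own rather than the lecture's: rescaling a predictor changes the least-squares coefficient by exactly $1/c$, while the ridge coefficients change in a more complicated way, which is why standardizing first matters.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=100)

X_scaled = X.copy()
X_scaled[:, 0] *= 100.0  # e.g. record the first predictor in cents, not dollars

# Least squares is scale equivariant: the first coefficient shrinks by 1/100.
print(LinearRegression().fit(X, y).coef_)
print(LinearRegression().fit(X_scaled, y).coef_)

# Ridge is not: the penalized coefficients change non-proportionally.
print(Ridge(alpha=10.0).fit(X, y).coef_)
print(Ridge(alpha=10.0).fit(X_scaled, y).coef_)

# Standardizing each column, as in the formula above, removes the ambiguity.
X_std = (X_scaled - X_scaled.mean(axis=0)) / X_scaled.std(axis=0)
```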
Using Cross-Validation to find λ

LOOCV choice of λ for ridge regression and Credit data

Plot of LOOCV error versus lambda for ridge regression on the Credit
data, with dashed vertical lines indicating the selected lambda value.
Coding
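One possible sketch of this coding step, using scikit-learn: with its default `cv=None`, `RidgeCV` scores every candidate $\lambda$ by an efficient leave-one-out cross-validation, matching the LOOCV selection shown above. The synthetic data here is just a stand-in for the Credit data.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic stand-in for a real dataset such as Credit.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = X @ np.array([3.0, -2.0, 0.0, 1.0]) + rng.normal(size=200)

# RidgeCV with cv=None (the default) uses efficient leave-one-out CV
# to pick the best lambda from the supplied grid.
lambdas = np.logspace(-4, 4, 100)
ridge_cv = RidgeCV(alphas=lambdas).fit(X, y)

print("LOOCV-selected lambda:", ridge_cv.alpha_)
print("coefficients at that lambda:", ridge_cv.coef_)
```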

Bias-Variance tradeoff

Plot illustrating the bias-variance tradeoff for simulated data: squared bias (black), variance (green), and test mean squared error (purple).
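Curves like these can be approximated by simulation. A minimal sketch under our own toy setup (not the book's simulated data): repeatedly draw training sets, fit ridge at each $\lambda$, and estimate the squared bias and variance of the prediction at one fixed test point.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n, p, reps = 50, 20, 200
beta_true = rng.normal(size=p)
X_test = rng.normal(size=(1, p))            # one fixed test point
f_test = (X_test @ beta_true).item()        # true regression function there

for lam in np.logspace(-2, 3, 11):
    preds = []
    for _ in range(reps):                   # fresh training set each rep
        X = rng.normal(size=(n, p))
        y = X @ beta_true + rng.normal(size=n)
        preds.append(Ridge(alpha=lam).fit(X, y).predict(X_test)[0])
    preds = np.array(preds)
    bias2 = (preds.mean() - f_test) ** 2    # squared bias at the test point
    var = preds.var()                       # variance over training sets
    mse = bias2 + var + 1.0                 # + irreducible noise variance
    print(f"lambda={lam:9.3f}  bias^2={bias2:.3f}  var={var:.3f}  test MSE~{mse:.3f}")
```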
More Bias-Variance Tradeoff

Second plot illustrating the bias-variance tradeoff for simulated data: squared bias (black), variance (green), and test mean squared error (purple).
Advantages of Ridge

Ridge vs. Least Squares:
Ridge vs. Subset Selection:
Look back and look ahead

Screenshot of the course schedule for lectures 11 to 20.
  • What is regularization? Why do we need it?
  • What are the two basic types of regularization methods? How are they implemented mathematically in linear regression?
  • How do you fit a ridge regression model in Python?
  • How do you control the model flexibility & bias-variance tradeoff when using regularization?
  • How do you find the right amount of regularization using cross-validation? How do you do this in Python?
  • What additional precautions do you need to take when using regularization (compared to least squares)?
  • What are the advantages of regularization compared to Least Squares?
  • What are the advantages of regularization compared to subset selection?