
Ch 6.3: Dimension Reduction - PCA

Lecture 19 - CMSE 381
Michigan State University
Dept of Computational Mathematics, Science & Engineering
Mon, March 9, 2026
Announcements

Last time:

This lecture:

Announcements:

Screenshot of the course schedule for lectures 11 to 20.

Section 1

Last time
Goal

Y = β0 + β1X1 + β2X2 + β3X3 + β4X4
Shrinkage

Find β to minimize:
Least Squares:
$\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
Ridge:
$\mathrm{RSS} + \lambda \sum_{j=1}^{p} \beta_j^2$

The Lasso:

$\mathrm{RSS} + \lambda \sum_{j=1}^{p} |\beta_j|$
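A minimal sketch of fitting both penalized models with scikit-learn; the data here are synthetic stand-ins for the four predictors above, and note that scikit-learn calls the penalty weight $\lambda$ `alpha`:

```python
# Hedged sketch: ridge and lasso on made-up data (lambda is `alpha` in sklearn).
import numpy as np
from sklearn.linear_model import Ridge, Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))                  # four predictors X1..X4
y = 3 + 2 * X[:, 0] - 1 * X[:, 1] + rng.normal(scale=0.5, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)             # RSS + lambda * sum(beta_j^2)
lasso = Lasso(alpha=0.1).fit(X, y)             # RSS + lambda * sum(|beta_j|)

print("ridge coefs:", ridge.coef_)             # all shrunk, none exactly zero
print("lasso coefs:", lasso.coef_)             # some shrunk exactly to zero
```

The contrast in the printed coefficients illustrates the geometric pictures above: the lasso's diamond-shaped constraint tends to zero out coefficients, while the ridge's circular constraint only shrinks them.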
Diagram illustrating the geometric
constraint used in ridge regression.

Diagram illustrating the geometric
constraint used in lasso regression.

What will you learn from this lecture?

Section 2

Dimension Reduction
Linear transformation of predictors

Original Predictors:
$X_1, \ldots, X_p$
New Predictors:
$Z_1, \ldots, Z_M$
$Z_m = \sum_{j=1}^{p} \phi_{jm} X_j$
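A minimal numerical sketch of this linear transformation; the loadings $\phi_{jm}$ below are made up for illustration:

```python
# Sketch: Z_m = sum_j phi_{jm} X_j as a matrix product.
# X is an n x p data matrix; phi is a p x M matrix of hypothetical loadings.
import numpy as np

n, p, M = 5, 3, 2
rng = np.random.default_rng(1)
X = rng.normal(size=(n, p))                    # original predictors X1..Xp
phi = np.array([[0.6,  0.2],
                [0.5, -0.7],
                [0.3,  0.4]])                  # made-up loadings phi_{jm}

Z = X @ phi                                    # new predictors Z1..ZM
print(Z.shape)                                 # (5, 2): n observations, M features
```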
An example or two

Geometric interpretation

Diagram illustrating projection of a point
onto a line as a geometric interpretation
of dimensionality reduction.
Different projections

Diagram showing four different projections of data points onto lines in a dimensionality reduction example.
Histograms of Z values

Figures: the four projection examples from the previous slide, each paired with a histogram of its projected z values.
The goal

$y_i = \theta_0 + \sum_{m=1}^{M} \theta_m z_{im} + \varepsilon_i$
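In code, this amounts to fitting ordinary least squares on the $M$ derived predictors instead of the $p$ originals. A sketch on synthetic data, using PCA (previewing the next section) to build the $z_{im}$:

```python
# Sketch: regress y on M << p derived predictors Z rather than all p originals.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))                 # p = 10 original predictors
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.3, size=200)

M = 2                                          # keep only two derived predictors
Z = PCA(n_components=M).fit_transform(X)       # the scores z_{im}, shape (200, 2)
model = LinearRegression().fit(Z, y)           # estimates theta_0, theta_1..theta_M
print(model.intercept_, model.coef_)
```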

Section 3

PCA
An example dataset

Scatter plot of an example dataset with
two input variables and a line showing the
principal component direction of greatest
variability.
Projection onto first PC

Scatter plot showing an example dataset projected onto the first principal component, with the
principal component direction and the mean point indicated.

$Z_1 = 0.839\,(\mathtt{pop} - \overline{\mathtt{pop}}) + 0.544\,(\mathtt{ad} - \overline{\mathtt{ad}})$
What does it mean to have the highest variance?

Scatter plot showing an example dataset
projected onto the first principal
component, with the principal component
direction and the mean point indicated.
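One way to see "highest variance" concretely: project the centered data onto unit vectors at many angles and compare the variance of the resulting scores. A sketch on synthetic 2-D data; the grid-search winner should match the direction PCA reports, up to sign:

```python
# Sketch: the first PC direction maximizes the variance of the projected scores.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(3)
X = rng.multivariate_normal([0, 0], [[3, 1.5], [1.5, 1]], size=500)
Xc = X - X.mean(axis=0)                        # center the data first

angles = np.linspace(0, np.pi, 1000)
dirs = np.column_stack([np.cos(angles), np.sin(angles)])   # unit directions
variances = ((Xc @ dirs.T) ** 2).mean(axis=0)              # variance per direction

best = dirs[np.argmax(variances)]              # grid-search variance maximizer
pc1 = PCA(n_components=1).fit(X).components_[0]
print(best, pc1)                               # same direction, up to sign
```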
Toy for learning PCA

https://www.desmos.com/calculator/gvmq07pg1k
Principal component scores

Diagram illustrating principal component scores obtained by projecting observations onto the first
principal component direction.

$z_{i1} = 0.839\,(\mathtt{pop}_i - \overline{\mathtt{pop}}) + 0.544\,(\mathtt{ad}_i - \overline{\mathtt{ad}})$
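The scores are just the centered observations dotted with the loading vector, which is exactly what `PCA.transform` computes. A sketch using synthetic stand-ins for the `pop` and `ad` variables from the slide's example:

```python
# Sketch: principal component scores z_{i1} from the loading formula,
# on made-up data shaped like the pop/ad example.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
pop = rng.normal(50, 10, size=100)
ad = 0.5 * pop + rng.normal(0, 3, size=100)
X = np.column_stack([pop, ad])

pca = PCA(n_components=1).fit(X)
phi = pca.components_[0]                       # loadings (phi_11, phi_21)
scores = (X - X.mean(axis=0)) @ phi            # z_{i1} for every observation
# identical to pca.transform(X).ravel()
print(phi, scores[:3])
```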
Another view

Rotated view of the example dataset showing observations in relation to the first principal component
and their principal component scores.

The other principal components

Do PCA with Penguins
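A runnable sketch of the penguins exercise. The handful of rows below are made-up penguin-like measurements (bill length mm, bill depth mm, flipper length mm, body mass g); in class you would load the real dataset, e.g. seaborn's `"penguins"`, instead:

```python
# Sketch: PCA on penguin-style measurements (rows here are fabricated examples).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

X = np.array([
    [39.1, 18.7, 181, 3750],
    [46.5, 17.9, 192, 3500],
    [50.0, 15.2, 218, 5700],
    [38.6, 21.2, 191, 3800],
    [49.9, 16.1, 213, 5400],
    [45.2, 13.8, 215, 4750],
])

Xs = StandardScaler().fit_transform(X)         # PCA is scale-sensitive: standardize
pca = PCA(n_components=2).fit(Xs)
scores = pca.transform(Xs)                     # each penguin's (z1, z2)
print(pca.explained_variance_ratio_)           # share of variance per component
```

Standardizing first matters here: body mass is in grams and would otherwise dominate the millimeter-scale bill measurements.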

TL;DR

PCA

Scatter plot showing an example dataset
projected onto the first principal component,
with the principal component direction and
the mean point indicated.

Next time

Screenshot of the course schedule for lectures 11 to 20.