blackblack
picture picture

Ch 2.1: What is Statistical Learning?

Lecture 2 - CMSE 381
Michigan State University
::
Dept of Computational Mathematics, Science /span> Engineering
Weds, Jan 14 , 2026
Announcements

Last time:

PIC

Announcements:

Covered in this class

An example data set: Advertising

PIC

Data available at msu-cmse-courses.github.io/CMSE381-S26/DataSets/DataSets.html

Notation and Big Assumption

Input variables: X1,X2,,Xp

Output variable: Y

Y = f (X)+
Advertising Example

PIC
More examples

PIC
PIC

Section 1 picture picture

Prediction vs Inference
Prediction

Given a value X, try to provide an estimate for f (X).

Build a model:

Ŷ = f ^(X)
Example: If we spend $250 on TV advertising, what do we predict we will we make in sales?

PIC

Group question:

PIC

PIC

The blue solid line is f . The green dashed line is f ^.
Reduceable vs irreducable error All models are wrong, some are useful.

Y Ŷ

Reducible Error

Irreducible Error

More on error

E(Y Ŷ)2 = E[f (X) + f ^(X)]2 = [f (X) f ^(X)]2 + 𝑉𝑎𝑟()

[f (X) f ^(X)]2 is reducible, 𝑉𝑎𝑟() is irreducible

Inference

Want f , but not for prediction

(or possibly combined with prediction)

Determine whether each scenario is prediction, inference, or both.

Application PredictionInference
Predict effectiveness of vaccine
Determine the address written on
the image of an envelope.
Identify risk factors for getting long covid.
Transcribe an audio file of a person talking.
Predict stock prices.

Section 2 picture picture

How to estimate f ?
Input: Training data

PIC
Parametric methods

f (X) = β0+β1X1+β2X2+ +βpXp
  • Train the model

    Example:

    Find βis so that

    Y β0 + β1X1 + β2X2 + + βpXp
  • PIC

    How do you decide on the coefficients?

    Y β0 + β1X1
    PIC

    Desmos toy: https://www.desmos.com/calculator/skvt8c73l7

    Example Non-parametric method: Nearest Neighbors

    Nk(x) = Set of k nearest neighbors of x
    f ^(x) = 1 k xi Nk(x)yi
    PIC

    k = 15

    Parametric methods: Pros and Cons

    Pros
    Cons
    Overfitting

    Possible fix: Find more Flexible models, which means braider functional form

    Problem: needs more variables, could lead to overfitting

    Overfitting: Following the noise too closely

    PIC
    PIC
    Prediction Accuracy vs Model Interpretability

    PIC

    Supervised learning:
    Training data has response variable y for every input x

    PIC

    Unsupervised Learning:
    Training data does not have response variable y for every input x

    PIC

    Regression vs Classification

    Types of variables: Emphasize this is output variable, and which it is determines regression vs classification

    Emphasize that this has to do with the output variable type

    Section 3 picture picture

    Group work
    (a) We collect a set of data on the top 500 firms in the US. For each firm we record profit, number of employees, industry and the CEO salary. We are interested in understanding which factors affect CEO salary.

    From Ex 2.4.2

    (b) We are considering launching a new product and wish to know whether it will be a success or a failure. We collect data on 20 similar products that were previously launched. For each product we have recorded whether it was a success or failure, price charged for the product, marketing budget, competition price, and ten other variables.
    TL;DR

    Wrap up

    Next time:

    PIC

    Announcements: