Jupyter Notebook

Lecture 3 - Mean Squared Error

CMSE 381 - Fall 2024

Sept 4, 2024

This notebook has some code to go along with lecture 3 on Mean Squared Error.

# As always, we start with our favorite standard imports. 

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt 
import seaborn as sns
%matplotlib inline

Info about the data set

From https://rdrr.io/cran/ISLR/man/Auto.html
Also hosted on our class website Data Sets Page

Auto: Auto Data Set

Description

Gas mileage, horsepower, and other information for 392 vehicles.

Format

A data frame with 392 observations on the following 9 variables.

mpg: miles per gallon
cylinders: Number of cylinders between 4 and 8
displacement: Engine displacement (cu. inches)
horsepower: Engine horsepower
weight: Vehicle weight (lbs.)
acceleration: Time to accelerate from 0 to 60 mph (sec.)
year: Model year (modulo 100)
origin: Origin of car (1. American, 2. European, 3. Japanese)
name: Vehicle name

The original data contained 408 observations, but 16 observations with missing values were removed.

Source

This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University. The dataset was used in the 1983 American Statistical Association Exposition.

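# First, we're going to do all the data loading and cleanup we figured out last time.
auto = pd.read_csv('../../DataSets/Auto.csv')
# Missing values are recorded as '?' in the file, so mark them as NaN and drop those rows.
auto = auto.replace('?', np.nan)
auto = auto.dropna()
# With the '?' entries gone, horsepower can be converted from strings to integers.
auto['horsepower'] = auto['horsepower'].astype('int')
auto.shape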

I want to just predict acceleration using horsepower.

✅ Do this: Make a scatter plot of acceleration (the output variable) vs horsepower (the input variable). Does it look like there’s a relationship between the two variables?

# Your code here.
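If you get stuck, here is one possible sketch of the plot. To keep it self-contained, it uses a tiny stand-in data frame called `auto_demo` (a hypothetical name, with made-up numbers that are not rows from Auto.csv); in the notebook you would pass the cleaned `auto` data frame instead.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the cleaned `auto` frame. These values are made up
# for illustration only and are not taken from Auto.csv.
auto_demo = pd.DataFrame({
    'horsepower':   [100, 150, 200],
    'acceleration': [18.0, 15.0, 13.5],
})

# Input variable on the x-axis, output variable on the y-axis.
fig, ax = plt.subplots()
ax.scatter(auto_demo['horsepower'], auto_demo['acceleration'])
ax.set_xlabel('horsepower')
ax.set_ylabel('acceleration')
ax.set_title('acceleration vs horsepower')
```

With `%matplotlib inline` active, the figure should display below the cell once it runs.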

I’ve decided to use the model \( \hat {f}(\texttt{horsepower}) = 23-0.05 \cdot \texttt{horsepower} \)

✅ Do this: Make a pandas Series with entries \(\hat f(\texttt{horsepower})\) for each entry in auto.horsepower.

# Your code here
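One way to approach this, sketched on a small stand-in frame: arithmetic on a pandas Series is applied entry by entry, so the model can be written almost exactly as in the formula. The names `auto_demo` and `accel_hat` are my own placeholders, and the numbers are made up rather than taken from Auto.csv; swap in `auto.horsepower` in the notebook.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned `auto` frame. These values are made up
# for illustration only and are not taken from Auto.csv.
auto_demo = pd.DataFrame({
    'horsepower':   [100, 150, 200],
    'acceleration': [18.0, 15.0, 13.5],
})

# f_hat(horsepower) = 23 - 0.05 * horsepower, applied to the whole column at once.
# The result is a Series with the same index as auto_demo['horsepower'].
accel_hat = 23 - 0.05 * auto_demo['horsepower']
accel_hat.name = 'acceleration_hat'

print(accel_hat)
```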

✅ Do this: Using the series you just built, calculate the mean squared error,

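\( MSE = \frac{1}{n} \sum_{i=1}^n (y_i - \hat y_i)^2, \)

where \(y_i\) is the observed acceleration of the \(i\)-th car, \(\hat y_i = \hat f(\texttt{horsepower}_i)\) is the model's prediction for it, and \(n\) is the number of cars.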

# Your code here
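A minimal sketch of the MSE calculation, following the formula above term by term: subtract the predictions from the observed values, square, and average. As before, `auto_demo` is a hypothetical stand-in with made-up numbers (not rows from Auto.csv), and `y`, `y_hat`, and `mse` are placeholder names.

```python
import pandas as pd

# Hypothetical stand-in for the cleaned `auto` frame. These values are made up
# for illustration only and are not taken from Auto.csv.
auto_demo = pd.DataFrame({
    'horsepower':   [100, 150, 200],
    'acceleration': [18.0, 15.0, 13.5],
})

y = auto_demo['acceleration']                # observed values y_i
y_hat = 23 - 0.05 * auto_demo['horsepower']  # predictions y_hat_i from the model

# (y_i - y_hat_i)^2 for every row, then the average over all n rows.
squared_errors = (y - y_hat) ** 2
mse = squared_errors.mean()

print(mse)
```

Taking `.mean()` handles both the sum and the division by \(n\), so there is no need to look up the number of rows separately.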

Have some spare time? Can you mess around with the coefficients in your model to decrease the MSE?

# Your code here
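If you would rather not tweak the numbers by hand, one option is to wrap the MSE in a small function and try a grid of nearby coefficients. This is only a sketch: the helper `line_mse`, the grid ranges, and the stand-in frame `auto_demo` (made-up numbers, not rows from Auto.csv) are all my own choices for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the cleaned `auto` frame. These values are made up
# for illustration only and are not taken from Auto.csv.
auto_demo = pd.DataFrame({
    'horsepower':   [100, 150, 200],
    'acceleration': [18.0, 15.0, 13.5],
})
x = auto_demo['horsepower']
y = auto_demo['acceleration']

def line_mse(intercept, slope, x, y):
    """MSE of the model f_hat(x) = intercept + slope * x."""
    y_hat = intercept + slope * x
    return ((y - y_hat) ** 2).mean()

# MSE of the model from the text: 23 - 0.05 * horsepower.
base_mse = line_mse(23, -0.05, x, y)

# Try a small grid of intercepts and slopes around the original coefficients.
# The ranges here are arbitrary; widen or refine them as you like.
candidates = [(b0, b1)
              for b0 in np.linspace(21, 25, 9)
              for b1 in np.linspace(-0.07, -0.03, 9)]
best_b0, best_b1 = min(candidates, key=lambda c: line_mse(c[0], c[1], x, y))
best_mse = line_mse(best_b0, best_b1, x, y)

print(f'original model MSE: {base_mse:.4f}')
print(f'best on grid: intercept={best_b0:.2f}, slope={best_b1:.3f}, MSE={best_mse:.4f}')
```

Because the grid includes (up to rounding) the original coefficients, the best grid MSE should come out no larger than the MSE of the original model.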

Congratulations, we’re done!

Written by Dr. Liz Munch, Michigan State University
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.