Lecture 14: More K-Fold CV#
# Everyone's favorite standard imports
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_squared_error
1. Setting up \(k\)-fold CV on a slightly more complicated data set.#
Ok, let’s see how we can use \(k\)-fold CV for determining hyperparameters. Below, we’re going to generate a data set that is clearly non-linear.
# Set the seed so everyone has the same numbers
np.random.seed(42)
def f(t, m1=-7, m2=5, m3=-0.8, b=6):
    return m3 * t**3 + m2 * t**2 + m1 * t + b
n = 300
X_toy = np.random.uniform(-1,5,n)
y_toy = f(X_toy) + np.random.normal(0,2,n)
plt.scatter(X_toy,y_toy)
# Doing this so the plot isn't ugly
X_plot = X_toy.copy()
X_plot.sort()
plt.plot(X_plot,f(X_plot),c = 'red')
X_toy = X_toy.reshape(-1,1)
y_toy = y_toy.reshape(-1,1)
To do this, we are going to set up a polynomial model. For a fixed degree \(p\), we want to use the model
\[
y = \beta_0 + \beta_1 X + \beta_2 X^2 + \cdots + \beta_p X^p
\]
Before messing with this on our big data set, let’s see how we can trick linear regression into doing our work for us. Take a look at my silly input data.
✅ Do this: Given this input data, what is each column in the following matrix?
from sklearn.preprocessing import PolynomialFeatures

# The original cell defining the "silly" input data X is missing here;
# any small integer column vector works as a stand-in, for example:
X = np.array([[1], [2], [3], [4]])

p = 4
poly = PolynomialFeatures(p)
X_powers = poly.fit_transform(X)
X_powers
# This version might be easier to read; run it if it helps
X_powers.astype(int)
✅ Do this: What did I change from the above code to get the matrix below? What is different about the output matrix?
p = 4
poly = PolynomialFeatures(p, include_bias=False)
X_powers = poly.fit_transform(X)
X_powers
The trick in all this is that if I pass in this matrix to linear regression, the resulting model learned is exactly the model \(y = \beta_0 + \beta_1 X + \beta_2 X^2 + \cdots + \beta_p X^p\) we wanted to use earlier, so long as I line up the coefficients properly.
✅ Do this: For the original \(X_{toy}\) data set, use the LinearRegression class on all of the data (so no train/test splits yet) to train a polynomial model with \(p=3\) to predict \(y\). What is the equation of the model learned, including all the values for the coefficients?
# Your code here
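If you get stuck, here is a minimal sketch of one possible solution (not the only way to do it); it uses include_bias=False so that LinearRegression handles the intercept \(\beta_0\) itself.
# One possible solution sketch: build the powers of X_toy, then fit
poly = PolynomialFeatures(3, include_bias=False)
X_toy_powers = poly.fit_transform(X_toy)

model = LinearRegression()
model.fit(X_toy_powers, y_toy)

# beta_0 is the intercept; the remaining coefficients pair with X, X^2, X^3
print("beta_0:", model.intercept_)
print("beta_1, beta_2, beta_3:", model.coef_)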
✅ Do this: Copy your code from above and modify it to use \(k\)-fold cross validation for \(k=5\) to approximate the test error with a degree \(p=3\) model.
Hint: You have easy-mode code from last class that involved the cross_val_score command that would be super useful here.
# Your code here
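One way this could look, assuming the cross_val_score approach from last class; note that sklearn reports the negative MSE, so we flip the sign to get the error.
from sklearn.model_selection import cross_val_score

poly = PolynomialFeatures(3, include_bias=False)
X_toy_powers = poly.fit_transform(X_toy)

# scoring='neg_mean_squared_error' returns -MSE, so negate to get the error
scores = cross_val_score(LinearRegression(), X_toy_powers, y_toy,
                         cv=5, scoring='neg_mean_squared_error')
print("Approximate test error:", -scores.mean())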
✅ Do this: Using \(k\)-fold cross validation for \(k=5\), set up code to approximate the test error for each of the polynomial models below.
\(y = \beta_0 + \beta_1 X\)
\(y = \beta_0 + \beta_1 X + \beta_2 X^2\)
\(y = \beta_0 + \beta_1 X+ \beta_2 X^2+ \beta_3 X^3\)
\(y = \beta_0 + \beta_1 X+ \beta_2 X^2+ \beta_3 X^3+ \beta_4 X^4\)
\(y = \beta_0 + \beta_1 X+ \beta_2 X^2+ \beta_3 X^3+ \beta_4 X^4+ \beta_5 X^5\)
\(y = \beta_0 + \beta_1 X+ \beta_2 X^2+ \beta_3 X^3+ \beta_4 X^4+ \beta_5 X^5+ \beta_6 X^6\)
Then plot your resulting test errors for each degree. What is the best choice of polynomial for this data set?
# Your code here
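Here is one way the loop could be structured, reusing the pattern from the previous sketch; again, this is a sketch, not the only correct layout.
from sklearn.model_selection import cross_val_score

degrees = range(1, 7)
test_errors = []
for p in degrees:
    # Build the degree-p feature matrix and estimate test error with 5-fold CV
    X_p = PolynomialFeatures(p, include_bias=False).fit_transform(X_toy)
    scores = cross_val_score(LinearRegression(), X_p, y_toy,
                             cv=5, scoring='neg_mean_squared_error')
    test_errors.append(-scores.mean())

plt.plot(degrees, test_errors, marker='o')
plt.xlabel('Polynomial degree $p$')
plt.ylabel('5-fold CV estimate of test MSE')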
If you still have some time, try the following:
See if you can figure out the test errors for everything through a degree 10 polynomial.
What happens to the graph if you mess around with the coefficients of the original polynomial that we used to generate the data set?
# Your code here
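For the first extension, the only change from the previous sketch is the range of degrees (this assumes the imports above have already been run):
degrees = range(1, 11)
test_errors = []
for p in degrees:
    X_p = PolynomialFeatures(p, include_bias=False).fit_transform(X_toy)
    scores = cross_val_score(LinearRegression(), X_p, y_toy,
                             cv=5, scoring='neg_mean_squared_error')
    test_errors.append(-scores.mean())

plt.plot(degrees, test_errors, marker='o')
plt.xlabel('Polynomial degree $p$')
plt.ylabel('5-fold CV estimate of test MSE')
For the second extension, rerun the data-generating cell near the top with different values of m1, m2, m3, and b, then rerun this loop and compare the resulting plots.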
Congratulations, we’re done!#
Written by Dr. Liz Munch, Michigan State University
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.