Week 08: Pre-Class Assignment: Regularization


✅ Put your name here.

✅ Put your group member names here.


Goals for this Pre-Class Assignment

Practice regularization

Total number of points: 20 points

This assignment is due by 11:59 p.m. the day before class, and should be uploaded into the appropriate “Pre-Class Assignments” submission folder on D2L. Submission instructions can be found at the end of the notebook.

Part 0: Artificial Neural Networks

✅ Task: Read chapter 10 of your textbook, including the problems at the end of the chapter and their solutions in Appendix A.

Part 1: Regularization Techniques

In this problem you will learn how to use three different regularizers in the context of, you guessed it, linear regression. You will do this in 1D so that you can visualize what you have done.

L1 (LASSO) Regularization

L1 regularization penalizes the model using what we called in class the “human norm”: the Manhattan (walking on a grid) distance of the parameter vector from the origin, i.e. the sum of the absolute values of the model's parameters.
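As a quick numerical illustration, the L1 norm is just the sum of the absolute values of the parameters. The sketch below uses a made-up parameter vector `w` (a hypothetical example, not taken from the notebook) and checks the hand computation against NumPy's `np.linalg.norm` with `ord=1`.

```python
import numpy as np

# hypothetical parameter vector for a model with three coefficients
w = np.array([3.0, -4.0, 0.0])

# L1 ("human", Manhattan) norm: walk along each axis and add up the distances
l1_by_hand = np.sum(np.abs(w))
l1_numpy = np.linalg.norm(w, ord=1)

print(l1_by_hand, l1_numpy)  # prints: 7.0 7.0
```

The walker covers 3 units along one axis and 4 along the other, for a total of 7.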

L2 (Tikhonov, Ridge) Regularization

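L2 regularization penalizes the model using what we called in class the “bird norm”: the usual Euclidean (straight-line) distance of the parameter vector from the origin, i.e. the square root of the sum of the squared parameters. Ridge regression adds the square of this norm to the loss.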
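A small sketch of the L2 norm, again on a hypothetical parameter vector `w = (3, -4, 0)` chosen only for illustration. The bird flies straight to the point and covers 5 units, whereas walking on the grid (L1) would take 3 + 4 = 7. The squared norm, which is what the Ridge penalty uses, is also shown.

```python
import numpy as np

# hypothetical parameter vector for a model with three coefficients
w = np.array([3.0, -4.0, 0.0])

# L2 ("bird", Euclidean) norm: straight-line distance from the origin
l2_by_hand = np.sqrt(np.sum(w**2))
l2_numpy = np.linalg.norm(w)  # the default for a 1-D array is the 2-norm

# the Ridge penalty is usually written with the squared norm
l2_squared = np.sum(w**2)

print(l2_by_hand, l2_numpy, l2_squared)  # prints: 5.0 5.0 25.0
```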

L1 and L2 (Elastic Net) Regularization

Finally, if both of the above regularizers have merits, a good question is: which one is best? An even better question is: is there a way to combine them into a new variant that has the merits of both? The Elastic Net is one such approach: it uses a linear combination of the L1 (LASSO) and L2 (Ridge) penalties, thereby adding a bit of L1 behavior to L2. While the best choice always needs to be checked through validation, which you will explore in the context of your project, Elastic Net is probably a good place to start for any new problem you encounter.
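To make “a linear combination of the L1 and L2 penalties” concrete, here is a minimal sketch. The function name `elastic_net_penalty` and the weights `lam1` and `lam2` are hypothetical names chosen for illustration; this is the general idea, not the exact parameterization used by any particular library.

```python
import numpy as np

def elastic_net_penalty(w, lam1, lam2):
    """Weighted sum of the L1 norm and the squared L2 norm of w.

    lam1 and lam2 are hypothetical weights for illustration only,
    not sklearn's parameterization.
    """
    l1 = np.sum(np.abs(w))   # LASSO part
    l2_sq = np.sum(w**2)     # Ridge part
    return lam1 * l1 + lam2 * l2_sq

# hypothetical parameter vector
w = np.array([3.0, -4.0, 0.0])

print(elastic_net_penalty(w, 1.0, 0.0))  # pure L1 penalty: 7.0
print(elastic_net_penalty(w, 0.0, 1.0))  # pure L2 penalty: 25.0
print(elastic_net_penalty(w, 0.5, 0.5))  # an even mix: 16.0
```

Setting one weight to zero recovers a pure LASSO or pure Ridge penalty, so the two earlier regularizers are special cases of this combination.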


✅ Task: (12 points) Read the links above and write in your own words what the differences between these regularizers are. Think about it in terms of the problems each one solves:

What problem does Tikhonov (L2) regularization solve? What are the pros and cons of using L2 regularization?
What problem does L1 (LASSO) regularization solve? What are the pros and cons of using L1 regularization?
What problem does Elastic Net solve? What are the pros and cons of using the Elastic Net?

✎ Put your answers here!

```python
# basic libraries
import numpy as np
import matplotlib.pyplot as plt

# ML libraries
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.linear_model import ElasticNet
from sklearn import linear_model

# control the number of samples in your fake data
num_samples = 10

# fake data: a noisy straight line (no random seed, so each run differs)
slope = 1
interp = -1.5  # intercept of the true line
noise_level = 2.0
x = np.linspace(0,10, num_samples)
y = slope*x + interp + noise_level*np.random.randn(x.size)

# reshape into sklearn's desired shape: (n_samples, n_features)
x = np.reshape(x, (x.size,-1))

# finer grid for plotting (np.linspace defaults to 50 points)
x_fine = np.linspace(0, 10)
x_fine = np.reshape(x_fine, (x_fine.size,-1))

# split data: even-indexed points are held out as the test set,
# odd-indexed points are the training set used in every fit below
x_1 = x[0::2]
y_1 = y[0::2]
x_2 = x[1::2]
y_2 = y[1::2]


fig = plt.figure(figsize=(16,6))
plt.plot(x_1, y_1, '^', label='test')
plt.plot(x_2, y_2, 'o', label='train')

# First, just the basics! Ordinary least squares, no regularization
linear = linear_model.LinearRegression()
linear.fit(x_2, y_2)
y_linear_pred = linear.predict(x_fine)
print('Coeff for linear:', linear.coef_)
plt.plot(x_fine, y_linear_pred, label='normal LR')

# Tikhonov (L2 / Ridge): alpha sets the strength of the penalty
ridge = Ridge(alpha = 100.0)
ridge.fit(x_2,y_2)
y_ridge_pred = ridge.predict(x_fine)
print('Tikhonov/L2/Ridge parameters:', ridge.intercept_, ridge.coef_)
plt.plot(x_fine, y_ridge_pred, label='L2')


plt.grid(alpha=0.15)
plt.legend()
plt.show()
```

Coeff for linear: [0.76381366]
Tikhonov/L2/Ridge parameters: 2.3704220557045144 [0.25250038]


[Figure: test points, training points, and the normal LR and L2 (ridge) fits from the cell above]
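In the run shown above, the ridge slope (about 0.25) is much smaller than the ordinary least-squares slope (about 0.76): with alpha = 100 the L2 penalty pulls the coefficient strongly toward zero. The sketch below is a standalone illustration, not part of the assignment code. It fits all ten points of a similar noisy line (the seed `0` and the grid of `alpha` values are arbitrary choices made here for reproducibility) and prints how the ridge slope shrinks as `alpha` grows.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge

# synthetic 1D data similar to the cell above; the fixed seed is an
# assumption added here so the sweep gives the same numbers every run
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 10).reshape(-1, 1)
y = 1.0 * x.ravel() - 1.5 + 2.0 * rng.standard_normal(x.shape[0])

# unregularized reference fit
ols_slope = LinearRegression().fit(x, y).coef_[0]
print(f"OLS slope: {ols_slope:.3f}")

# larger alpha means a stronger L2 penalty on the slope
alphas = [0.01, 1.0, 10.0, 100.0, 1000.0]
ridge_slopes = []
for alpha in alphas:
    slope = Ridge(alpha=alpha).fit(x, y).coef_[0]
    ridge_slopes.append(slope)
    print(f"alpha={alpha:>7}: ridge slope = {slope:.3f}")
```

For a single feature with an unpenalized intercept, sklearn's ridge slope works out to S_xy / (S_xx + alpha), where S_xx and S_xy are the centered sum of squares of x and the centered sum of cross-products of x and y. The ordinary least-squares slope is S_xy / S_xx, so the shrinkage is smooth and monotone in alpha. This also explains the numbers printed above: for the five training points, S_xx ≈ 49.4, so 0.764 × 49.4 / (49.4 + 100) ≈ 0.25.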

✅ Task: (8 points) Modify the code above to include both Lasso and Elastic Net.

Run your code several times to build your intuition for whether the regularization is helping on average. That is, is the predicted line generally closer to the data you did not use in the fit? Write your comments on this below, in this markdown cell.
Read the sklearn documentation on the elastic net regularizer. How many parameters does it use and how are those parameters connected to the parameters of the L2 and L1 cases?