
Lec 28: SVMs#

# Everyone's favorite standard imports
import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt
%matplotlib inline
import time


# ML imports we've used previously
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV

# For today, we just need SVC
from sklearn.svm import SVC 

We have now discussed three related methods in class. As a reminder, the book's definitions are:

  • Maximal Margin Classifiers, where the goal was to find a separating hyperplane with no misclassifications,

  • Support vector classifiers, where we allow for a soft margin and hence some misclassifications, but only allow for a linear kernel, and

  • Support vector machines, where we have a soft margin and an option for kernels.

It turns out that sklearn has only one function to do all of this. As a reminder from last time, there are two things that are likely to be confusing.

  • The command is just called SVC, but you should think of it as fitting the most general SVM as defined in the book; we can then modify the inputs to recover the other options as necessary (see the sketch after this list).

  • The C input parameter is not the same as the C defined in the book (in sklearn, a larger C means less tolerance for violations, roughly the reverse of the book's budget). However, it controls the same thing: the amount of tolerance we have for data points on the wrong side of the margin and/or the wrong side of the boundary.
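
For instance, here is a rough sketch of how the three variants map onto calls to SVC (the particular C values are only illustrative, not prescribed by the book):

# All three of the book's methods are fit with the same SVC class.
mmc = SVC(kernel='linear', C=1e5)          # ~ maximal margin classifier: a huge C leaves essentially no tolerance for violations
svc_linear = SVC(kernel='linear', C=1)     # support vector classifier: linear kernel with a soft margin
svm_rbf = SVC(kernel='rbf', C=1, gamma=1)  # support vector machine: soft margin plus a nonlinear kernel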

The function below is the same as in the last notebook. Its only purpose is to make it easy to draw the boundaries learned by the SVM.

# Run this cell to define the function
def plot_svc(svc, X, y, h=0.02, pad=0.25):
    x_min, x_max = X[:, 0].min()-pad, X[:, 0].max()+pad
    y_min, y_max = X[:, 1].min()-pad, X[:, 1].max()+pad
    xvec = np.arange(x_min, x_max, h)
    yvec = np.arange(y_min, y_max, h)
    xx, yy = np.meshgrid(xvec, yvec)
    
    # Predict the class at each grid point and shade the decision regions
    Z = svc.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.2)

    plt.scatter(X[:,0], X[:,1], s=70, c=y, cmap=mpl.cm.Paired)
    # Support vectors indicated in plot by X's
    sv = svc.support_vectors_
    plt.scatter(sv[:,0], sv[:,1], c='k', marker='x', s=100, linewidths=1)
    
    if svc.kernel == 'linear':
        # Get the margin lines 
        w = svc.coef_[0]
        a = -w[0] / w[1]
        yhyperplane = a * xvec - (svc.intercept_[0]) / w[1]
        margin = 1 / np.sqrt(np.sum(svc.coef_ ** 2))
        ymargin_down = yhyperplane - np.sqrt(1 + a ** 2) * margin
        ymargin_up = yhyperplane + np.sqrt(1 + a ** 2) * margin
        plt.plot(xvec, ymargin_down, "k--")
        plt.plot(xvec, ymargin_up, "k--")

    
    
    plt.xlim(x_min, x_max)
    plt.ylim(y_min, y_max)
    plt.xlabel('X1')
    plt.ylabel('X2')
    plt.show()
    print('Number of support vectors: ', svc.support_.size)

Swapping out the kernel#

In today's class, we've been discussing changing the kernel function and then learning the model

\[ f(x) = \beta_0 + \sum_{i \in \mathcal{S}} \alpha_i K(x, x_i). \]
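
For the exercises below we will use the radial (RBF) kernel, which in sklearn's parameterization is

\[ K(x, x') = \exp\left(-\gamma \, \|x - x'\|^2\right), \]

where a larger \(\gamma\) makes the influence of each training point more local.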

data3 = np.loadtxt('../../DataSets/SVM-Data3.csv')
X = data3[:,:2]
y = data3[:,2]


plt.scatter(X[:,0], X[:,1], s=70, c=y, cmap=mpl.cm.Paired)
plt.xlabel('X1')
plt.ylabel('X2')
plt.show()

Do this: Train an SVC using a radial kernel (this is kernel = 'rbf' as input to the SVC function) with \(C=1\) and \(\gamma = 1\). Use the plot_svc function to see what the learned boundary looks like.

# Your code here
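
If you want to check your work, here is one possible sketch (assuming X and y are the arrays loaded above):

svm_rbf = SVC(kernel='rbf', C=1, gamma=1)  # radial kernel with C = 1 and gamma = 1
svm_rbf.fit(X, y)
plot_svc(svm_rbf, X, y)                    # draw the decision boundary and mark the support vectors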

Do this: What happens if you increase \(C\) to 100? Is this model looking better or worse than what you had before?

# Your code here
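
Again, a minimal sketch; only the value of C changes:

svm_rbf_100 = SVC(kernel='rbf', C=100, gamma=1)
svm_rbf_100.fit(X, y)
plot_svc(svm_rbf_100, X, y)  # compare the boundary and number of support vectors to the C = 1 model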

Do this: Use the GridSearchCV function (see the last lab for examples of using it) to determine the best \(C\) and \(\gamma\) parameters. Use the plot_svc function to take a look at the result.

# Your code here
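
One way to set up the search (the particular grid of values below is just an illustrative assumption):

param_grid = {'C': [0.01, 0.1, 1, 10, 100],
              'gamma': [0.01, 0.1, 0.5, 1, 5, 10]}
grid = GridSearchCV(SVC(kernel='rbf'), param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_)              # best C and gamma found by cross validation
plot_svc(grid.best_estimator_, X, y)  # look at the resulting boundary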

Still have time?#

Download the NIST data set from here: https://archive.ics.uci.edu/ml/datasets/optical+recognition+of+handwritten+digits

You just need two files for now, the training set optdigits.tra and the testing set optdigits.tes. The following commands will pull the data from the remote server. Optionally, you can download the files directly to your computer.

X_train = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/optdigits.tra', header=None)
X_train.head()
X_test = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/optdigits/optdigits.tes', header=None)
X_test.head()
# X_train = pd.read_csv('optdigits.tra', header=None) #<-- put this line back in if you saved the data locally
y_train = X_train[64]
X_train = X_train.drop(X_train.columns[64], axis=1)

# X_test = pd.read_csv('optdigits.tes', header=None) #<-- put this line back in if you saved the data locally
y_test = X_test[64]
X_test = X_test.drop(X_test.columns[64], axis=1)
print(X_train.shape)
print(X_test.shape)

This data set consists of 8x8 images of handwritten digits. The following command will draw a single data point for you. Mess around with the value of \(i\) below to see other examples.

i = 13
plt.imshow(X_train.values[i].reshape(8,8), cmap="gray") 
plt.show()
print(f'Data point {i} is labeled as {y_train[i]}')

Do this: Build a classifier to predict the correct digit for a given handwritten digit. As you do this, answer the following questions:

  • What choice of kernel does best?

  • What are the optimal choices of parameters for the SVC?

  • How well does your classifier do? Don’t forget that quality measures should always use testing data.

# Your answer here
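
A sketch of one possible workflow (the parameter grid is only an illustrative starting point; a coarse grid keeps the search reasonably fast):

param_grid = [{'kernel': ['linear'], 'C': [0.1, 1, 10, 100]},
              {'kernel': ['rbf'], 'C': [0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1]}]
grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_)  # which kernel and parameters did best in cross validation

# Quality measures should use the held-out testing data.
print('Test accuracy:', grid.score(X_test, y_test))
print(confusion_matrix(y_test, grid.predict(X_test)))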

Congratulations, we’re done!#

Written by Dr. Liz Munch, Michigan State University

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.