In-Class Assignment: Machine learning with an SVM Classifier#
Day 19#
CMSE 202#
✅ Put your name here
#✅ Put your group member names here
Agenda for today’s class#
Imports:#
# imports for the day
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.datasets import make_classification
from sklearn import svm
1. Review of Pre-Class assignment#
We’ll discuss, as a class, any questions that came up.
What does an SVM do? How do you define the “optimal” line of separation?#
✅ Do This: In the pre-class, we reviewed what a Support Vector Machine (SVM) is and how it worked. We asked you to consider how you might draw an “optimal” line of separation of a 2-class problem. Discuss this with your group members now and record the answer below.
✎ Do This - Write your discussion notes here.
2. The SVM Classifier#
As you watched in the pre-class videos, the basic idea of the Support Vector Machine classifier is as follows:
The algorithm creates a line and marks the points from each class that are the “closest” to the line. These points are called the support vectors. In the image at the top, the black dots have one support vector and the white circles have two (two points equidistant from the line). We then compute the distance between the line and the support vectors. This distance is called the margin.
An SVM tries to make a decision boundary, in the form of a line, such that the separation between the two classes is as wide as possible.
In this class, we’ll try running an SVM classifier and then see if we can judge how well it did. We can compare this to the line you made by hand in the pre-class assignment.
2.1 Classify your “blob” data#
Let’s bring your “blob” data and graph forward from the pre-class assignment and use them here as the data to run an SVM classifier on. We know by inspection that the data is linearly separable and that you tried to come up with the equation of a line that separates the two classes.
✅ Do This: Using what you did in your pre-class assignment, recreate the blob data, your scatter plot, and the line you drew to separate the blobs. Blobs created with the same random_state should come out the same. Is that true? If you didn’t get all of this worked out in your pre-class assignment, check in with your group and see what they did! It might be a good idea for everyone in your group to use the same random_state so you can compare more easily.
# make your blobs, scatter plot them and draw your line.
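# If you get stuck, here is one possible sketch (it assumes the imports cell above has been run).
# The make_blobs settings and the guessed line below are placeholders, not "the" answer --
# use whatever you settled on in your pre-class assignment.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=42)

plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)

# a hand-picked separating line in slope-intercept form, y = m*x + b
m_guess, b_guess = -1.0, 5.0   # made-up values; replace with your own guess
x_vals = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
plt.plot(x_vals, m_guess * x_vals + b_guess, "k--", label="my guess")
plt.legend()
plt.show()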
2.2 Running the SVM classifier#
The basic process for making an SVM classifier, as with all classifiers, is as follows:
create the model
fit the model
examine the result
Rather than trying to build an SVM classifier from scratch like you did with the Perceptron model, we’re going to use the sklearn
module. You can find the main scikit-learn documentation here.
To accomplish the steps above, you would use sklearn.svm.SVC
(short for “Support Vector Classifier”) to do the work. Here are some of the details.
To build the model you would use svm.SVC. This constructor takes a number of parameters; you can see all the parameters here.
The most important are the first two:
C - this is a parameter controlling how the fit is done. We will explore it later; for now the default is fine.
kernel - for linear separability we are going to use the "linear" kernel. Note that the kernel name is a string.
The constructor returns a model.
You then call my_model.fit() (or whatever you called your model variable) to fit the data. It takes two arguments:
The array of data, with shape (n_samples, n_features). That is, the provided array is n_samples long and each sample (each element in the array) is an array of n_features values.
An array of class labels of size n_samples, one for each data element. Typically these are integers.
This method returns the same model you called fit() on (thus you don’t need the return value).
Note: we are not using train_test_split for this model; you will fit your SVM model to the whole data set. We are simply comparing the linear separation you guessed to the one SVC produces.
✅ Do This: Create an SVC classifier using the "linear" kernel and with C=10 (if you don’t specify a value for the C argument, it will use the default, which is 1.0). Then, fit the model to your data. Note: you should not expect any meaningful output just yet, so as long as your call to fit() doesn’t throw an error, you should be good to move on.
# build your classifier and call the fit function on your data here
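# A minimal sketch, assuming X and y hold the blob data from the cell above.
model = svm.SVC(kernel="linear", C=10)   # linear kernel, C=10 as requested
model.fit(X, y)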
2.3 Unpacking the SVM results#
What information can we get from the fitted SVM? Remember that these fitted models have a number of attributes and methods that provide us with information regarding the fit.
There are some useful attributes:
The attributes coef_ and intercept_ (note the trailing underscore) can be used to draw the boundary the classifier created. We’ll talk more about that below.
The attribute support_vectors_ contains the coordinates of the support vectors used by the algorithm. In this case these are the 2D locations of the data points that were determined to be the support vectors.
And some useful methods:
predict takes a single parameter, an array of shape (number_of_data_to_predict, number_of_features), and returns an array of class predictions.
decision_function takes a single parameter, an array of shape (number_of_data_to_predict, number_of_features), and returns the distance from the decision boundary for each datum. The sign indicates the class the datum belongs to.
score takes two parameters, an array of sample data of shape (number_of_data_to_predict, number_of_features) and an array of number_of_data_to_predict class labels, and returns the accuracy (the fraction classified correctly).
✅ Do This: Explore some of the methods above and write a bit of code that prints the accuracy of our fitted classifier. Which method should you use for this? Check the accuracy using the same data you used to fit the model – does the result match your expectations?
# explore some of the functions from above and print the accuracy of your model for the data you provided above
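# One possible exploration, assuming model, X, and y come from the previous cells.
print("accuracy:", model.score(X, y))    # fraction of points classified correctly
print(model.predict(X[:5]))              # predicted class labels for the first 5 points
print(model.decision_function(X[:5]))    # signed distances from the decision boundary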
3. Visualizing the boundary and the margins#
We won’t go through the nitty-gritty details for “how” an SVM algorithm defines the optimal boundary – the pre-class video described conceptually what is happening – but we are going to try and get a few terms straight so we can plot some results and compare to our prediction.
3.1 What is a hyperplane?#
A hyperplane is a geometric element whose dimensionality is one less than the space in which it is embedded. For our purposes, a hyperplane divides the space it is embedded in. Thus:
a 2D space can be divided by a 1D line
a 3D space can be divided by a 2D plane
and so on…
The data that we generated is in 2D and thus we are trying to find a 1D hyperplane – a line – to divide the space. In general, an SVM can search any n-dimensional space and find an (n-1)-dimensional hyperplane (or at least the best one it is capable of finding) that divides the space. SVM literature therefore always talks about hyperplanes as the more general concept. For the moment we will be content with 1D hyperplanes, which are lines.
3.2 Drawing the boundary#
We are going to use the attributes of the fitted SVM to draw the line that was determined by the model. First, let’s remember that the equation of a line in slope-intercept form is:
\(y = mx + b\)
where \(m\) is the slope and \(b\) is the intercept. However, there is another form of the line equation, called the “standard” form:
\(Ax + By + C = 0\)
where \(A\) and \(B\) are the coefficients of the line. You should be able to convert back and forth between these two forms relatively easily. We had to do this when thinking about how to draw the best fit line from the Perceptron model as well!
These standard coefficients are what are returned by the .coef_ attribute of the classifier hyperplane. Assuming that we are doing a 2D (2-feature) classification, the coefficients represent a line in standard form:
\(Ax + By + C = 0\)
where .coef_ returns the array \([A, B]\) and intercept_ returns the \(C\) value of the line.
By pulling that information from the model, you should be able to determine the slope-intercept form and draw the decision boundary through the “blobs”.
✅ Do This: Plot the blobs again along with your predicted line from the pre-class assignment as well as the line from the model (i.e., the decision boundary).
# your code to draw the blobs and the resulting decision boundary line
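# A possible approach, assuming model, X, y, and your guessed line (m_guess, b_guess)
# come from the earlier cells.
A, B = model.coef_[0]          # standard-form coefficients [A, B]
C_val = model.intercept_[0]    # standard-form C

# convert A*x + B*y + C = 0 to slope-intercept form
slope = -A / B
intercept = -C_val / B

x_vals = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)
plt.plot(x_vals, slope * x_vals + intercept, "g-", label="SVC boundary")
plt.plot(x_vals, m_guess * x_vals + b_guess, "k--", label="my guess")
plt.legend()
plt.show()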
✅ Question: How well do your guess and the decision boundary agree? Is the decision boundary rotated at all when compared to your guess? If so, why would it be rotated that way?
✎ Do this - Erase this and put your answer here.
✅ Do This: Print the slope and the intercept you have calculated. Three decimal places will do. Confirm that these numbers match the line representing the decision boundary in your plot.
# print the slope and intercept here
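# For example, assuming slope and intercept were computed in the previous cell:
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")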
3.3 Draw the Support Vectors#
Remember that the attribute support_vectors_ (note the trailing underscore) of the model yields a list of the points that are used as support vectors.
✅ Do This: Using support_vectors_, redraw the blobs and boundary, but now also mark the support vectors the algorithm used to determine the boundary. If you have trouble coming up with a good way of marking the support vectors, chat with your group to brainstorm ideas!
# draw now with support vectors
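# One common way to mark the support vectors is with large, unfilled circles
# (this assumes the variables from the previous cells).
plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)
plt.plot(x_vals, slope * x_vals + intercept, "g-")

sv = model.support_vectors_   # coordinates of the support vectors
plt.scatter(sv[:, 0], sv[:, 1], s=150, facecolors="none", edgecolors="k",
            label="support vectors")
plt.legend()
plt.show()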
3.4 SVM Margins#
Besides drawing the boundary, the SVM tries to draw a margin, a set of parallel lines on either side of the boundary. You learned about this in the pre-class video.
The width of this margin is controlled by the C parameter, which you set when you create the model (C is always positive). C tells the SVM optimization how much you want to avoid misclassifying each training example. For large values of C, the optimization will choose a smaller-margin hyperplane if that hyperplane does a better job of getting all the training points classified correctly. Conversely, a very small value of C will cause the optimizer to look for a larger-margin separating hyperplane, even if that hyperplane misclassifies more points. For very tiny values of C, you should get misclassified examples, often even if your training data is linearly separable.
Large C - better job of classifying training data; smaller margin
Smaller C - more misclassifications; larger margin
3.4.1 Calculating the margin#
Below, we have written a function that can help you draw the margins so that you can see the effect that C
has on the SVM results. To think about how this works, let’s look at the image from the top of the notebook again:
The way that margins are determined follows this procedure:
Looking at the image, the distance between the two margins is always \(\frac{2}{\| w \|}\), and the distance from the boundary to one margin is half that value (the margins are equidistant from the boundary)
\(w\) is the vector \([A,B]\) (the coefficients of x and y in standard form), so \(\| w \| = \sqrt{A^2 + B^2} \) and the margin distance is therefore \(\frac{1}{\sqrt{A^2 + B^2}} \)
Given the boundary line x and y arrays and the boundary slope \(m\), the updated \(y_{margin-down}\) array is: \( y - \sqrt{1 + m^2}\cdot\frac{1}{\sqrt{A^2 + B^2}} \) and the \(y_{margin-up}\) array is: \(y + \sqrt{1 + m^2}\cdot\frac{1}{\sqrt{A^2 + B^2}} \)
3.4.2 Describing the function margin_y_fn#
The function below, margin_y_fn, takes 3 arguments:
the y array of values of the boundary line
the A coefficient from the fit
the B coefficient from the fit
It returns two values:
an array of y values for the lower margin (given the same x values as the boundary line)
an array of y values for the upper margin (also given the same x values as the boundary line).
The function appears below. Make sure to run it before trying the next task.
def margin_y_fn(y, A, B):
    # slope of the boundary line in slope-intercept form
    slope = -A/B
    # perpendicular distance from the boundary to each margin: 1/||w||
    margin_dist = 1/np.sqrt(A**2 + B**2)
    # shift the boundary's y values down/up by the vertical offset
    # corresponding to that perpendicular distance
    y_down = y - np.sqrt(1 + slope**2) * margin_dist
    y_up = y + np.sqrt(1 + slope**2) * margin_dist
    return y_down, y_up
✅ Do This: Using the function margin_y_fn
, make a plot with: the blobs, the boundary, the support vectors, and, now, the margins.
# put your answer here
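# A possible sketch, assuming margin_y_fn has been defined above and the
# variables from the previous cells are available.
y_line = slope * x_vals + intercept            # boundary y values
y_down, y_up = margin_y_fn(y_line, A, B)       # margin y values

plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)
plt.plot(x_vals, y_line, "g-", label="boundary")
plt.plot(x_vals, y_down, "g--", label="margins")
plt.plot(x_vals, y_up, "g--")
plt.scatter(model.support_vectors_[:, 0], model.support_vectors_[:, 1],
            s=150, facecolors="none", edgecolors="k")
plt.legend()
plt.show()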
4. Hyperparameter tuning (the effect of C
)#
As we discussed, the parameter C controls the amount of misclassification that is acceptable:
Large C - better job of classifying training data; smaller margin
Smaller C - more misclassifications; larger margin
In this last section, we will look into the effect of choosing a different value of C
for the same data. We will pull together all the work you have done so far into a single cell.
✅ Do This: Below, copy and paste all the necessary code that you wrote above such that the cell below does the following:
make your fake data
create an SVC model and fit it to your data
calculate the decision boundaries
plot the data, the optimal line, the decision boundaries and the support vectors, and
print the accuracy.
Make sure you run the cell and confirm that it produces the output you expect.
# your code
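# If it helps, here is one compact way the pieces from the earlier cells might fit together.
# It assumes margin_y_fn is defined above; the make_blobs settings and the guessed line
# (m_guess, b_guess) are placeholders from the earlier sketches.
X, y = make_blobs(n_samples=100, centers=2, cluster_std=1.0, random_state=42)

model = svm.SVC(kernel="linear", C=10)
model.fit(X, y)

A, B = model.coef_[0]
slope = -A / B
intercept = -model.intercept_[0] / B

x_vals = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
y_line = slope * x_vals + intercept
y_down, y_up = margin_y_fn(y_line, A, B)

plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)
plt.plot(x_vals, m_guess * x_vals + b_guess, "k--", label="my guess")
plt.plot(x_vals, y_line, "g-", label="boundary")
plt.plot(x_vals, y_down, "g--", label="margins")
plt.plot(x_vals, y_up, "g--")
plt.scatter(model.support_vectors_[:, 0], model.support_vectors_[:, 1],
            s=150, facecolors="none", edgecolors="k", label="support vectors")
plt.legend()
plt.show()

print("accuracy:", model.score(X, y))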
4.1 Automate the process to explore the effect of “C”#
Now that you have pulled all that code together and checked that it works, put it inside a function called test_C_parameter that runs all that code and takes the following arguments:
C - the hyperparameter that controls misclassifications
random_state - the state associated with the created data
Then run the function using a variety of choices for C
to observe how the margins change. You might need to change random_state
to get distributions of blobs that overlap a bit more in order to really understand how changing C
influences the results.
# put any and all code you need here, create additional cells as necessary
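# A possible skeleton for the function (it assumes margin_y_fn is defined above).
# The make_blobs settings inside are placeholders you will probably want to tweak.
def test_C_parameter(C, random_state):
    # regenerate the data so random_state controls the blobs
    X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0,
                      random_state=random_state)

    model = svm.SVC(kernel="linear", C=C)
    model.fit(X, y)

    A, B = model.coef_[0]
    slope = -A / B
    intercept = -model.intercept_[0] / B

    x_vals = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
    y_line = slope * x_vals + intercept
    y_down, y_up = margin_y_fn(y_line, A, B)

    plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.coolwarm)
    plt.plot(x_vals, y_line, "g-")
    plt.plot(x_vals, y_down, "g--")
    plt.plot(x_vals, y_up, "g--")
    plt.scatter(model.support_vectors_[:, 0], model.support_vectors_[:, 1],
                s=150, facecolors="none", edgecolors="k")
    plt.title(f"C = {C}, accuracy = {model.score(X, y):.3f}")
    plt.show()

# try a range of C values to see how the margins and misclassifications change
for C in [0.01, 0.1, 1, 10, 100]:
    test_C_parameter(C, random_state=42)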
✅ Question: What do you notice about changing C
in terms of the properties of the margins and the misclassifications?
✎ Do this - Erase this and put your answer here.
Congratulations, we’re done!#
Now, you just need to submit this assignment by uploading it to the course Desire2Learn web page for today’s submission folder (Don’t forget to add your names in the first cell).
© Copyright 2023 Department of Computational Mathematics, Science and Engineering at Michigan State University