CMSE 381 Code Portfolio Template

We provide this template to help you collect and organize useful code snippets as you work through this course. For each day with an in-class Jupyter notebook exercise, collect the snippets of code that you may reuse later. A growing portfolio will save you time during in-class coding sessions and when you work on your final project, with no need to fumble through every notebook again!

The general structure provided below serves as a starting point, not an exhaustive list of topics. You should add more sections as needed.

The code portfolio is not a required assignment. Ask your section instructor whether and how you should submit it.

Basic Data Handling#

Cleaning data#

Below, include some code snippets to:

  • replace missing data with NaN

  • get rid of rows with NaN

(Possible places to dig up the code: Day 1 Python Review)

# my snippets
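
As a starting point, here is a minimal sketch of both cleaning steps, assuming pandas. The toy DataFrame, its column names, and the placeholder codes "?" and -999 are made up for illustration; your data may mark missing values differently.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: "?" and -999 stand in for missing-value codes
df = pd.DataFrame({
    "age": [23, -999, 31, 45],
    "income": ["50000", "62000", "?", "58000"],
})

# Replace the placeholder codes with NaN
df_nan = df.replace(["?", -999], np.nan)

# Drop every row that contains at least one NaN
df_clean = df_nan.dropna()

print(df_clean)
```

Note that `dropna()` returns a new DataFrame and leaves `df_nan` unchanged.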


How do I drop a whole column of data?

(Possible places to dig up the code: Day 11 More Logistic Regression)

# my snippets
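
One possible way, assuming the data lives in a pandas DataFrame; the column names here are hypothetical.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "id": [1, 2, 3],
    "height": [1.6, 1.7, 1.8],
    "weight": [55, 70, 82],
})

# Drop the "id" column; the original df is left untouched
df_small = df.drop(columns=["id"])

print(df_small.columns.tolist())
```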

Data type conversion#

Convert some data to a specific type, e.g. to integers.

(Possible places to dig up the code: Day 1 Python Review)

# my snippets
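
A small sketch of type conversion with pandas `astype`. The columns are invented for illustration; note that converting floats to integers this way drops any fractional part.

```python
import pandas as pd

# Hypothetical toy data: years stored as strings, scores stored as floats
df = pd.DataFrame({
    "year": ["2019", "2020", "2021"],
    "score": [3.0, 4.0, 5.0],
})

# Convert both columns to integers
df["year"] = df["year"].astype(int)
df["score"] = df["score"].astype(int)

print(df.dtypes)
```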

Sometimes I may need to convert numbers to strings, because a qualitative variable was wrongly encoded numerically.

(Possible places to dig up the code: Day 7 Even More Linear Regression, Day 8 The Last of Linear Regression)

# my snippet

Extracting some variables from a large data frame#

Some snippets:

  • to get some rows

  • to get some columns with specific names

(Possible places to dig up the code: Day 1 Python Review)

# my snippets
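
The two cases above might look like this in pandas. The DataFrame and its column names are hypothetical.

```python
import pandas as pd

# Hypothetical toy data
df = pd.DataFrame({
    "name": ["Ann", "Bo", "Cy", "Di"],
    "height": [1.62, 1.80, 1.75, 1.68],
    "weight": [55, 82, 77, 60],
})

first_two = df.iloc[0:2]           # rows by position
tall = df[df["height"] > 1.7]      # rows that satisfy a condition
subset = df[["name", "weight"]]    # columns by name

print(tall)
```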


How about getting only the columns that are numeric variables?

(Possible places to dig up the code: Day 19 PCA)

# my snippets
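
One option, assuming pandas, is `select_dtypes`. The toy DataFrame is made up for illustration.

```python
import pandas as pd

# Hypothetical toy data with one text column and two numeric columns
df = pd.DataFrame({
    "name": ["Ann", "Bo", "Cy"],
    "height": [1.62, 1.80, 1.75],
    "weight": [55, 82, 77],
})

# Keep only the columns whose dtype is numeric
numeric_df = df.select_dtypes(include="number")

print(numeric_df.columns.tolist())
```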

Combining variables from different sources#

If I have data stored in different variables, how do I combine them into one array or dataframe?

(Possible places to dig up the code: Day 20 PCR)

# my snippets
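
A sketch of two common approaches, assuming NumPy and pandas: stacking 1D arrays as columns of one array, and concatenating DataFrames side by side. All variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical variables stored separately
x1 = np.array([1.0, 2.0, 3.0])
x2 = np.array([10.0, 20.0, 30.0])

# Option 1: stack 1D arrays as the columns of one 2D array
X = np.column_stack([x1, x2])

# Option 2: build a DataFrame, then attach another DataFrame column-wise
df = pd.DataFrame({"x1": x1, "x2": x2})
extra = pd.DataFrame({"x3": [0.1, 0.2, 0.3]})
combined = pd.concat([df, extra], axis=1)

print(X.shape)
print(combined)
```

`pd.concat(..., axis=1)` aligns rows by index, so it is worth checking that both frames share the same index before combining.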

Creating dummy variables for qualitative data#

How do I easily create dummy variables?

(Possible places to dig up the code: Day 7 Even More Linear Regression)

# my snippets
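
One possible approach is pandas `get_dummies`. The column names are invented; `drop_first=True` drops one level per variable so the dummies are not redundant with the intercept.

```python
import pandas as pd

# Hypothetical toy data with one qualitative variable
df = pd.DataFrame({
    "region": ["east", "west", "south", "east"],
    "sales": [10, 12, 9, 11],
})

# One dummy column per level of "region", minus the first level ("east")
dummies = pd.get_dummies(df, columns=["region"], drop_first=True)

print(dummies)
```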

Normalizing the data#

How do I normalize the data so the standard deviations of different variables are all the same? (There is more than one way.)

(Possible places to dig up the code: Day 17 Ridge Regression, Day 19 PCA, Day 20 PCR)

# my snippets
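
Two of the possible ways, sketched on made-up data: scikit-learn's `StandardScaler`, and the same calculation by hand with NumPy. Both subtract the column mean and divide by the column standard deviation.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two variables on very different scales
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])

# Way 1: StandardScaler
X_scaled = StandardScaler().fit_transform(X)

# Way 2: by hand (np.std uses the population formula, like StandardScaler)
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.std(axis=0))
```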

Basic Plotting#

How do I plot anything?? You will need to use matplotlib or seaborn. Don’t forget to collect the code for importing the relevant modules.

Histogram#

(Possible places to dig up the code: Day 1 Python Review)

# my snippets
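
A minimal histogram sketch with matplotlib on a made-up random sample. The `matplotlib.use("Agg")` line is only there so the code also runs as a plain script without a display; inside a notebook it can be left out.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical sample of 200 values
rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=200)

fig, ax = plt.subplots()
counts, bin_edges, _ = ax.hist(data, bins=20)  # draw the histogram
ax.set_xlabel("value")
ax.set_ylabel("count")
ax.set_title("Histogram of a toy sample")
```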

Visualizing relations between two variables#

What’s an example of a scatter plot or relplot?

(Possible places to dig up the code: Day 1 Python Review)

# my snippets

Visualizing relations between any pair of variables#

What’s an example of a pairplot?

(Possible places to dig up the code: Day 1 Python Review)

# my snippets


Another cool way to do it is by using seaborn’s heatmap.

(Possible places to dig up the code: Day 6 Multiple Linear Regression)

# my snippets

Combinatorics#

Sometimes I need to generate different combinations of variables using itertools. I need an example that uses it in a loop.

(Possible places to dig up the code: Day 6 Multiple Linear Regression)

# my snippets
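
A sketch of looping over every non-empty subset of a variable list with `itertools.combinations`. The variable names are placeholders; inside the inner loop you would typically fit a model on the chosen columns.

```python
from itertools import combinations

# Hypothetical predictor names
variables = ["TV", "radio", "newspaper"]

all_subsets = []
for k in range(1, len(variables) + 1):       # subset sizes 1, 2, 3
    for combo in combinations(variables, k):  # every subset of size k
        all_subsets.append(combo)

print(all_subsets)
```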

Accuracy and errors#

Getting a score for accuracy or error for classification#

How do I compute accuracy of model predictions and errors?

(Possible places to dig up the code: Day 9 Intro Classification)

# my snippets
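
A small sketch with made-up labels, assuming scikit-learn's `accuracy_score`. The error rate is one minus the accuracy, and the same number can be computed by hand with NumPy.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical true labels and model predictions
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

accuracy = accuracy_score(y_true, y_pred)   # fraction of correct predictions
error_rate = 1 - accuracy                   # fraction of wrong predictions
accuracy_by_hand = np.mean(y_true == y_pred)

print(accuracy, error_rate)
```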

Getting a confusion matrix for classification#

How do I compute and plot a confusion matrix?

(Possible places to dig up the code: Day 10 Logistic Regression, Day 11 More Logistic Regression)

# my snippets
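
A sketch of the computing part, assuming scikit-learn's `confusion_matrix` and made-up labels. Rows correspond to true classes and columns to predicted classes.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and model predictions
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Entry [i, j] counts samples of true class i predicted as class j
cm = confusion_matrix(y_true, y_pred)

print(cm)
```

For the plotting part, one option is `ConfusionMatrixDisplay.from_predictions(y_true, y_pred)` from `sklearn.metrics` in recent scikit-learn versions; seaborn's heatmap on `cm` is another.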

Mean squared error#

There is an easy way to compute the mean squared error.

(Possible places to dig up the code: Day 12 LOOCV)

# my snippet
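
A minimal sketch using scikit-learn's `mean_squared_error` on made-up numbers, with the by-hand version for comparison.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Hypothetical true values and predictions
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])

mse = mean_squared_error(y_true, y_pred)
mse_by_hand = np.mean((y_true - y_pred) ** 2)

print(mse)
```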

Validation methods#

Validation set method#

Also known as a simple train-test split.

(Possible places to dig up the code: Day 9 Intro Classification, Day 12 LOOCV)

# my snippets
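
A sketch with scikit-learn's `train_test_split` on made-up data. The 30% test size and the `random_state` value are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples, 2 features
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 30% of the samples for testing; fix the seed for reproducibility
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

print(X_train.shape, X_test.shape)
```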

Leave One Out Cross-validation (LOOCV)#

How do I calculate the LOOCV error?

There is a version with a for loop.

(Possible places to dig up the code: Day 12 LOOCV)

# my snippets
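
One way the for-loop version might look, assuming scikit-learn's `LeaveOneOut` and a linear regression model. The data is simulated, so the model and variable names are only placeholders.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut

# Hypothetical simulated data: y is roughly 2x + 1 plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(20, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 1, size=20)

loo = LeaveOneOut()
squared_errors = []
for train_idx, test_idx in loo.split(X):
    # Fit on all points but one, then predict the point left out
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    squared_errors.append((y[test_idx][0] - pred[0]) ** 2)

loocv_mse = np.mean(squared_errors)
print(loocv_mse)
```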

There is also an easier version without a for loop.

(Possible places to dig up the code: Day 12 LOOCV)

# my snippets
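
A loop-free sketch, assuming scikit-learn's `cross_val_score` with `LeaveOneOut` as the splitter. scikit-learn reports the negative MSE as the score, so the sign is flipped at the end. The data is simulated for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Hypothetical simulated data: y is roughly 2x + 1 plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(20, 1))
y = 2.0 * X[:, 0] + 1.0 + rng.normal(0, 1, size=20)

# One score per left-out point; scores are negative squared errors
scores = cross_val_score(
    LinearRegression(), X, y,
    cv=LeaveOneOut(), scoring="neg_mean_squared_error",
)
loocv_mse = -scores.mean()
print(loocv_mse)
```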

K-fold Cross-validation (k-fold CV)#

The code for splitting the data using KFold and a for loop. Don’t forget the code for importing the module.

(Possible places to dig up the code: Day 13 K-fold CV)

# my snippets
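
A sketch of the for-loop version, assuming scikit-learn's `KFold` with a linear regression model on simulated data. The choice of 5 folds and shuffling is arbitrary here.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Hypothetical simulated data: y is roughly 3x - 2 plus noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = 3.0 * X[:, 0] - 2.0 + rng.normal(0, 1, size=50)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_mse = []
for train_idx, test_idx in kf.split(X):
    # Fit on four folds, evaluate on the held-out fold
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    fold_mse.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))

cv_mse = np.mean(fold_mse)  # average test error over the folds
print(cv_mse)
```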

There is also a version without a for loop.

(Possible places to dig up the code: Day 13 K-fold CV)

# my snippets

Searching for an optimal parameter using CV#

How do I use GridSearchCV?

(Possible places to dig up the code: Day 27 SVC)

# my snippets
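
A sketch of a grid search over the cost parameter `C` of a linear support vector classifier. The dataset is generated with `make_classification`, and the candidate values of `C` are arbitrary choices for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Hypothetical simulated classification data
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# Candidate values to try for the parameter C
param_grid = {"C": [0.01, 0.1, 1, 10]}

# 5-fold CV for each candidate, scored by accuracy
search = GridSearchCV(SVC(kernel="linear"), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)

best_C = search.best_params_["C"]
best_score = search.best_score_   # mean CV accuracy of the best candidate
print(best_C, best_score)
```

After fitting, `search.best_estimator_` holds the model refitted on all the data with the best parameter.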

Linear regression#

Getting the model using sklearn#

Some code for linear regression:

  • import the right modules

  • adjust my data X, y if needed

  • fit the model

  • print the coefficients

  • print the model equations

(Possible places to dig up the code: Day 4 Simple Linear Regression, Day 5 More Linear Regression, Day 6 Multiple Linear Regression)

# my snippets with detailed comments on which line does what
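
The steps listed above can be sketched as follows, assuming scikit-learn's `LinearRegression`. The data is a made-up noiseless line, so the fitted numbers come out exact.

```python
import numpy as np
from sklearn.linear_model import LinearRegression   # import the right module

# Hypothetical data lying exactly on y = 2x + 1
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 5.0, 7.0, 9.0, 11.0])

X = x.reshape(-1, 1)            # sklearn expects a 2D X: (n_samples, n_features)

model = LinearRegression()      # create the model object
model.fit(X, y)                 # fit the model

slope = model.coef_[0]          # one coefficient per feature
intercept = model.intercept_    # the constant term
print("coefficients:", model.coef_, "intercept:", intercept)
print(f"y = {intercept:.2f} + {slope:.2f} * x")   # the model equation
```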


How do I use the fitted model to make predictions?

(Possible places to dig up the code: Day 7 Even More Linear Regression)

# my snippets

Getting the model using statsmodels#

I will need some code to:

  • fit the model

  • get the summary statistics like coefficients, p-values, confidence intervals, \(R^2\), etc.

(Possible places to dig up the code: Day 5 More Linear Regression, Day 6 Multiple Linear Regression)

# my snippets


How about prediction intervals?

(Possible places to dig up the code: Day 7 Even More Linear Regression)

# my snippets

Plotting the fitted model#

Now that I have the coefficients, how do I plot the line on top of the data?

(Possible places to dig up the code: Day 4 Simple Linear Regression, Day 5 More Linear Regression)

# my snippets


Interaction terms#

How do I include interaction terms when using sklearn?

(Possible places to dig up the code: Day 8 The Last of Linear Regression)

# my snippets

How about using statsmodels?

(Possible places to dig up the code: Day 8 The Last of Linear Regression)

# my snippets

KNN classification#

Model fitting#

How do I fit a KNN classifier?

(Possible places to dig up the code: Day 9 Intro Classification)

# my snippets

Making predictions#

Once I have fitted it, how do I make predictions for a new data point?

  • predicting the actual class

  • predicting the conditional probability

(Possible places to dig up the code: Day 9 Intro Classification)

# my snippets
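
A sketch that fits a KNN classifier and then makes both kinds of prediction, assuming scikit-learn's `KNeighborsClassifier`. The one-dimensional toy data and `n_neighbors=3` are made up for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical data: class 0 near the origin, class 1 near 9
X = np.array([[0.0], [1.0], [2.0], [8.0], [9.0], [10.0]])
y = np.array([0, 0, 0, 1, 1, 1])

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

X_new = np.array([[1.5], [8.5]])          # new data points
pred_class = knn.predict(X_new)           # predicted class labels
pred_proba = knn.predict_proba(X_new)     # estimated class probabilities

print(pred_class)
print(pred_proba)
```

Each row of `pred_proba` gives the fraction of the 3 nearest neighbors in each class, with columns ordered as in `knn.classes_`.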

Logistic Regression#

Fitting the model#

How do I fit a logistic regression model?

(Possible places to dig up the code: Day 10 Logistic Regression)

# my snippet

Then what?#

How do I:

  • check the coefficients?

  • check which classes there are?

  • make predictions?

(Possible places to dig up the code: Day 10 Logistic Regression)

# my snippet
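
A sketch that covers fitting and the follow-up questions, assuming scikit-learn's `LogisticRegression` with its default settings (which include some regularization). The toy data is made up.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: larger x values belong to class 1
X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = LogisticRegression()
clf.fit(X, y)

print("coefficients:", clf.coef_)      # shape (1, n_features) for two classes
print("intercept:", clf.intercept_)
print("classes:", clf.classes_)        # the class labels the model knows

X_new = np.array([[0.5], [4.5]])
pred_class = clf.predict(X_new)         # predicted labels
pred_proba = clf.predict_proba(X_new)   # columns follow clf.classes_
print(pred_class)
```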

Polynomial Regression#

How do I trick linear regression into fitting a polynomial?

(Possible places to dig up the code: Day 14 More K-fold CV, Day 21 Polynomial & Step Functions)

# my snippet
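
One possible trick, assuming scikit-learn's `PolynomialFeatures`: expand x into the columns x and x², then run ordinary linear regression on the expanded columns. The data below is a made-up noiseless quadratic.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data lying exactly on y = 1 + 2x + 3x^2
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = 1.0 + 2.0 * x + 3.0 * x**2

# Build the columns [x, x^2]; the intercept is handled by LinearRegression
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(x.reshape(-1, 1))

model = LinearRegression().fit(X_poly, y)
print(model.intercept_, model.coef_)
```

New points must go through the same `poly.transform` before calling `model.predict`.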

Ridge Regression#

Model fitting#

How do I fit a ridge regression model?

(Possible places to dig up the code: Day 17 Ridge Regression)

# my snippets
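
A sketch assuming scikit-learn's `Ridge` on simulated, standardized data. The penalty strength `alpha=1.0` is an arbitrary choice; an ordinary least squares fit is included only to show the shrinkage.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.preprocessing import StandardScaler

# Hypothetical simulated data with three predictors
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=40)

# Ridge penalizes coefficient size, so put the predictors on one scale first
X_scaled = StandardScaler().fit_transform(X)

ridge = Ridge(alpha=1.0).fit(X_scaled, y)
ols = LinearRegression().fit(X_scaled, y)   # unpenalized fit, for comparison

print("ridge:", ridge.coef_)
print("ols:  ", ols.coef_)
```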

Then what?#

How do I look at the coefficients? Make predictions?

(Possible places to dig up the code: Day 17 Ridge Regression)

# my snippets

Special cross-validation for Ridge regression#

There is some special code for doing CV for ridge regression.

(Possible places to dig up the code: Day 17 Ridge Regression)

# my snippets

Lasso regression#

Model fitting#

How do I fit a lasso regression model?

(Possible places to dig up the code: Day 18 Lasso Regression)

# my snippets
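
A sketch assuming scikit-learn's `Lasso` on simulated data where only the first two of five predictors matter. The value `alpha=0.5` is an arbitrary choice that is large enough here to zero out the irrelevant coefficients.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Hypothetical simulated data: only columns 0 and 1 influence y
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

lasso = Lasso(alpha=0.5).fit(X, y)

# Lasso can set coefficients exactly to zero, which selects variables
print(lasso.coef_)
print("kept predictors:", np.flatnonzero(lasso.coef_))
```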

Then what?#

How do I look at the coefficients? Make predictions?

(Possible places to dig up the code: Day 18 Lasso Regression)

# my snippets

Special cross-validation for Lasso regression#

There is some special code for doing CV for lasso regression.

(Possible places to dig up the code: Day 18 Lasso Regression)

# my snippets

Principal Component Analysis (PCA)#

Getting the PCs#

How do I compute the PCs using PCA?

(Possible places to dig up the code: Day 19 PCA)

# my snippets

Projecting data onto each PC#

How do I project data from the original coordinates to the PC space?

(Possible places to dig up the code: Day 19 PCA)

# my snippets

Getting other info about the PCs#

How do I get the amount of variance explained by each PC?

(Possible places to dig up the code: Day 19 PCA)

# my snippets
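
The three PCA questions above can be sketched together, assuming scikit-learn's `PCA` on simulated, standardized data. Keeping two components is an arbitrary choice for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical simulated data; columns 0 and 1 are made strongly correlated
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
X[:, 1] = 2.0 * X[:, 0] + 0.1 * X[:, 1]

X_scaled = StandardScaler().fit_transform(X)

pca = PCA(n_components=2).fit(X_scaled)

components = pca.components_                # each row is one PC direction
scores = pca.transform(X_scaled)            # data projected onto the PCs
var_ratio = pca.explained_variance_ratio_   # share of variance per PC

print(components)
print(var_ratio)
```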

Step functions#

Fitting a step function for regression#

How do I:

  • cut the domain into different pieces

  • represent each piece as a variable

  • then trick linear regression to get the coefficients?

(Possible places to dig up the code: Day 21 Polynomial & Step Functions)

# my snippets
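
The steps above can be sketched with `pd.cut` and `pd.get_dummies`, followed by ordinary linear regression. The data and cut points are made up; the first piece is dropped so it acts as the baseline captured by the intercept.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data that is constant on each of three pieces
x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=float)
y = np.array([1, 1, 1, 5, 5, 5, 2, 2, 2], dtype=float)

# Cut the domain into the pieces (0, 3], (3, 6], (6, 9]
pieces = pd.Series(pd.cut(x, bins=[0, 3, 6, 9]))

# One 0/1 variable per piece, dropping the first piece as the baseline
X_steps = pd.get_dummies(pieces, drop_first=True).to_numpy(dtype=float)

model = LinearRegression().fit(X_steps, y)
print(model.intercept_, model.coef_)
```

Here the intercept is the mean of y on the first piece, and each coefficient is the jump of that piece relative to the first.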

Fitting a step function for classification#

What do I need to do if it’s for classification?

(Possible places to dig up the code: Day 22 Step Classification)

# my snippets

Cubic splines#

Fitting a cubic spline model#

Given a wiggly curve, how do I fit a cubic spline model to it?

(Possible places to dig up the code: Day 23 Splines)

# my snippets

Then what?#

How do I find out the coefficients of the model?

(Possible places to dig up the code: Day 23 Splines)

# my snippets

Decision Trees#

Fitting regression trees#

How do I fit a regression tree?

(Possible places to dig up the code: Day 24 Decision Trees)

# my snippets

Fitting classification trees#

What if it is a classification problem?

(Possible places to dig up the code: Day 24 Decision Trees)

# my snippets
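
Regression and classification trees share the same interface in scikit-learn, so both can be sketched side by side. The toy data and `max_depth=2` are made up for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Hypothetical data with a single jump at x = 5
X = np.arange(10, dtype=float).reshape(-1, 1)
y_reg = np.where(X[:, 0] < 5, 0.0, 10.0)   # numeric response
y_clf = (X[:, 0] >= 5).astype(int)         # class labels 0/1

reg_tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y_reg)
clf_tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y_clf)

X_new = np.array([[2.0], [7.0]])
reg_pred = reg_tree.predict(X_new)   # predicted values
clf_pred = clf_tree.predict(X_new)   # predicted classes
print(reg_pred, clf_pred)
```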

Look at the model#

How do I even visualize the model? (At least two different ways)

(Possible places to dig up the code: Day 24 Decision Trees)

# my snippets

Making Predictions#

How do I predict the outcome with a tree model?

(Possible places to dig up the code: Day 24 Decision Trees)

# my snippets

Tree pruning#

How do I make the tree smaller?

(Possible places to dig up the code: Day 24 Decision Trees)

# my snippets

Random Forests#

Fitting the model#

How do I fit a random forest model?

(Possible places to dig up the code: Day 25 Random Forests)

# my snippets

Then what?#

How do I get some predictions? Or evaluate the model accuracy?

(Possible places to dig up the code: Day 25 Random Forests)

# my snippets
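
A sketch that fits a random forest classifier, makes predictions, and evaluates accuracy on a held-out set. The data is generated with `make_classification`; the number of trees and the split size are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical simulated classification data
X, y = make_classification(n_samples=200, n_features=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

y_pred = forest.predict(X_test)             # predictions on unseen data
test_accuracy = accuracy_score(y_test, y_pred)
print(test_accuracy)
```

For a regression problem, `RandomForestRegressor` from the same module follows the same pattern.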

Support Vector Machine (SVM)#

Fitting the SVC/SVM#

How do I train a Support Vector Classifier (SVC)/Support Vector Machine (SVM)?

(Possible places to dig up the code: Day 27 SVC, Day 28 SVM)

# my snippets

How do I then get the coefficients? And support vectors?

(Possible places to dig up the code: Day 27 SVC)

# my snippets
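
A sketch that trains a linear support vector classifier and then reads off the coefficients and support vectors, assuming scikit-learn's `SVC`. The toy points and `C=1.0` are made up; `coef_` is only available with a linear kernel.

```python
import numpy as np
from sklearn.svm import SVC

# Hypothetical, linearly separable 2D data
X = np.array([[0.0, 0.0], [1.0, 1.0], [1.0, 0.0],
              [3.0, 3.0], [4.0, 3.0], [3.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

svc = SVC(kernel="linear", C=1.0)
svc.fit(X, y)

print("coefficients:", svc.coef_)               # normal vector of the boundary
print("intercept:", svc.intercept_)
print("support vectors:", svc.support_vectors_)  # points that define the margin

pred = svc.predict(np.array([[0.5, 0.5], [3.5, 3.5]]))
print(pred)
```

Switching to `kernel="rbf"` or `kernel="poly"` gives a nonlinear SVM with the same `fit` and `predict` calls.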

What’s a handy function to plot the decision boundary and support vectors?

(Possible places to dig up the code: Day 27 SVC)

# my snippets

Feed-forward Neural Networks#

Creating the model#

How do I build a simple multilayer feed-forward neural network using PyTorch?

(Possible places to dig up the code: Day 30 Multilayer NN)

# my snippets

Formatting and splitting the data#

I may need to do something to load my data properly using PyTorch.

(Possible places to dig up the code: Day 30 Multilayer NN)

# my snippets

Training the network#

How do I then train the neural network with the data?

(Possible places to dig up the code: Day 30 Multilayer NN)

# my snippets

Use the network#

How do I use the trained network? Maybe to make some predictions.

(Possible places to dig up the code: Day 30 Multilayer NN)

# my snippets

Clustering#

Single linkage#

How do I cluster my data points using single linkage clustering? I may need to use different thresholds.

(Possible places to dig up the code: Day 32 Hierarchical Clustering)

# my snippets
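
One possible approach, assuming SciPy's hierarchical clustering tools: build the single linkage tree once with `linkage`, then cut it at different distance thresholds with `fcluster`. The points and thresholds are made up for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical 1D points forming three visible groups
X = np.array([[0.0], [1.0], [2.0], [10.0], [11.0], [20.0]])

# Build the single linkage merge tree
Z = linkage(X, method="single")

# Cut the tree: points stay together if they merged at distance <= t
labels_small = fcluster(Z, t=1.5, criterion="distance")
labels_large = fcluster(Z, t=8.5, criterion="distance")

print(labels_small)
print(labels_large)
```

A smaller threshold gives more, tighter clusters; a larger one merges groups that are farther apart.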

How do I then plot my clusters?

# my snippets


Created by Dr. Mengsen Zhang, Michigan State University

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.