CMSE 381 Code Portfolio Template#
We provide this template to help you collect and organize useful code snippets as you work through this course. For each day with an in-class Jupyter notebook exercise, you should collect some snippets of code that you may reuse later. This growing portfolio will save you time during in-class coding sessions and when you work on your final project, with no need to fumble through every notebook again!
The general structure provided below serves as a starting point, not an exhaustive list of topics. You should add more sections as needed.
The code portfolio is not a required assignment. Ask your section instructor whether and how you should submit it.
Basic Data Handling#
Cleaning data#
Include below some code snippets to:
replace missing data with NaN
get rid of rows with NaN
(Possible places to dig up the code: Day 1 Python Review)
# my snippets
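One possible starting point is sketched below with pandas. The data frame, its column names, and the `"?"` placeholder for missing values are made up for illustration; the Day 1 notebook may handle this differently.

```python
import numpy as np
import pandas as pd

# Hypothetical data frame where missing values were recorded as "?"
df = pd.DataFrame({"mpg": [21.0, "?", 18.5], "cyl": [4, 6, "?"]})

# Replace the placeholder with NaN
df_nan = df.replace("?", np.nan)

# Drop every row that contains at least one NaN
df_clean = df_nan.dropna()
print(df_clean)
```

Note that `dropna()` returns a new data frame and leaves `df_nan` unchanged.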
How do I drop a whole column of data?
(Possible places to dig up the code: Day 11 More Logistic Regression)
# my snippets
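A minimal sketch using `DataFrame.drop`, assuming a small made-up data frame with columns `a`, `b`, and `c`:

```python
import pandas as pd

# Hypothetical data frame with three columns
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Drop column "b"; the original df is left untouched
df_small = df.drop(columns=["b"])
print(df_small.columns.tolist())
```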
Data type conversion#
Convert some data to a specific type, e.g. to integers.
(Possible places to dig up the code: Day 1 Python Review)
# my snippets
Sometimes I may need to convert numbers to strings, because a qualitative variable was wrongly encoded numerically.
(Possible places to dig up the code: Day 7 Even More Linear Regression, Day 8 The Last of Linear Regression)
# my snippet
Extracting some variables from a large data frame#
Some snippets:
to get some rows
to get some columns with specific names
(Possible places to dig up the code: Day 1 Python Review)
# my snippets
How about getting only the columns that are numeric variables?
(Possible places to dig up the code: Day 19 PCA)
# my snippets
Combining variables from different sources#
If I have data stored in different variables, how do I combine them into one array or dataframe?
(Possible places to dig up the code: Day 20 PCR)
# my snippets
Creating dummy variables for qualitative data#
How do I easily create dummy variables?
(Possible places to dig up the code: Day 7 Even More Linear Regression)
# my snippets
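One common option is `pandas.get_dummies`. The sketch below assumes a made-up data frame with one qualitative column, `color`. Setting `drop_first=True` drops one level so the dummies are not perfectly collinear with an intercept.

```python
import pandas as pd

# Hypothetical data: "color" is a qualitative variable
df = pd.DataFrame({"price": [10, 12, 9], "color": ["red", "blue", "red"]})

# One dummy column per level, minus the first level ("blue")
dummies = pd.get_dummies(df, columns=["color"], drop_first=True)
print(dummies)
```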
Normalizing the data#
How do I normalize the data so the standard deviations of different variables are all the same? (There is more than one way.)
(Possible places to dig up the code: Day 17 Ridge Regression, Day 19 PCA, Day 20 PCR)
# my snippets
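One of the possible ways is scikit-learn's `StandardScaler`, which centers each column to mean 0 and scales it to standard deviation 1. The array below is made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: two variables on very different scales
X = np.array([[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # learn mean/std per column, then rescale

print(X_scaled.mean(axis=0))  # approximately 0 for every column
print(X_scaled.std(axis=0))   # approximately 1 for every column
```

When there is a train/test split, a common practice is to call `fit` on the training data only and then `transform` both sets with the same scaler.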
Basic Plotting#
How do I plot anything?
You will need to use matplotlib or seaborn. Don’t forget to collect the code for importing the relevant modules.
Histogram#
(Possible places to dig up the code: Day 1 Python Review)
# my snippets
Visualizing relations between two variables#
What’s an example of scatter or relplot?
(Possible places to dig up the code: Day 1 Python Review)
# my snippets
Visualizing relations between any pair of variables#
What’s an example of pairplot?
(Possible places to dig up the code: Day 1 Python Review)
# my snippets
Another cool way to do it is by using seaborn’s heatmap.
(Possible places to dig up the code: Day 6 Multiple Linear Regression)
# my snippets
Combinatorics#
Sometimes I need to generate different combinations of variables using itertools. I need an example that uses it in a loop.
(Possible places to dig up the code: Day 6 Multiple Linear Regression)
# my snippets
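A minimal sketch of `itertools.combinations` in a loop. The variable names are placeholders; in practice the loop body would usually fit a model on each subset instead of just printing it.

```python
from itertools import combinations

# Hypothetical predictor names
variables = ["TV", "radio", "newspaper"]

pairs = []
for combo in combinations(variables, 2):  # every unordered pair of variables
    pairs.append(combo)
    print(combo)
```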
Accuracy and errors#
Getting a score for accuracy or error for classification#
How do I compute accuracy of model predictions and errors?
(Possible places to dig up the code: Day 9 Intro Classification)
# my snippets
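One option is `sklearn.metrics.accuracy_score`, sketched below with made-up labels. The error rate is taken here as one minus the accuracy.

```python
from sklearn.metrics import accuracy_score

# Hypothetical true labels and model predictions
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

accuracy = accuracy_score(y_true, y_pred)  # fraction of correct predictions
error_rate = 1 - accuracy                  # fraction of wrong predictions
print(accuracy, error_rate)
```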
Getting a confusion matrix for classification#
How do I compute and plot a confusion matrix?
(Possible places to dig up the code: Day 10 Logistic Regression, Day 11 More Logistic Regression)
# my snippets
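A minimal sketch of the computation with `sklearn.metrics.confusion_matrix`, using made-up labels. Rows correspond to true classes and columns to predicted classes.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and model predictions
y_true = [0, 1, 1, 0, 1]
y_pred = [0, 1, 0, 0, 1]

cm = confusion_matrix(y_true, y_pred)  # cm[i, j]: true class i predicted as j
print(cm)
```

For plotting, `sklearn.metrics.ConfusionMatrixDisplay` or a seaborn heatmap of `cm` are two possibilities; the notebooks may use a different approach.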
Mean squared error#
There is an easy way.
(Possible places to dig up the code: Day 12 LOOCV)
# my snippet
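One easy way is probably `sklearn.metrics.mean_squared_error`. The values below are made up for illustration.

```python
from sklearn.metrics import mean_squared_error

# Hypothetical observed and predicted responses
y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]

mse = mean_squared_error(y_true, y_pred)  # mean of the squared differences
print(mse)
```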
Validation methods#
Validation set method#
Also known as a simple train/test split.
(Possible places to dig up the code: Day 9 Intro Classification, Day 12 LOOCV)
# my snippets
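A minimal sketch with `train_test_split`, assuming a toy data set of 10 samples. The 30% test fraction and `random_state=0` are arbitrary choices.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical data: 10 samples, 2 features
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 30% of the rows for validation; random_state makes the split repeatable
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)
print(X_train.shape, X_test.shape)
```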
Leave One Out Cross-validation (LOOCV)#
How do I calculate the LOOCV error?
There is a version with a for loop.
(Possible places to dig up the code: Day 12 LOOCV)
# my snippets
There is also an easier version without a for loop.
(Possible places to dig up the code: Day 12 LOOCV)
# my snippets
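One loop-free option is to pass `LeaveOneOut()` as the `cv` argument of `cross_val_score`. The sketch below assumes a linear regression model and synthetic data. scikit-learn reports the negative MSE for this scoring option, hence the sign flip.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Synthetic data for illustration: y is roughly 2 * x plus a little noise
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 1))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=20)

# One score per left-out observation
scores = cross_val_score(
    LinearRegression(), X, y, cv=LeaveOneOut(), scoring="neg_mean_squared_error"
)
loocv_mse = -scores.mean()  # flip the sign to get the usual MSE
print(loocv_mse)
```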
K-fold Cross-validation (k-fold CV)#
The code for splitting the data using KFold and a for loop. Don’t forget the code for importing the module.
(Possible places to dig up the code: Day 13 K-fold CV)
# my snippets
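A minimal sketch of the loop version, assuming a linear regression model, synthetic data, and 5 folds. All of these are illustrative choices.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import KFold

# Synthetic data for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X[:, 0] - 3.0 * X[:, 1] + rng.normal(scale=0.5, size=50)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
fold_mse = []
for train_idx, test_idx in kf.split(X):
    # Fit on the training folds, evaluate on the held-out fold
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    y_hat = model.predict(X[test_idx])
    fold_mse.append(mean_squared_error(y[test_idx], y_hat))

cv_mse = np.mean(fold_mse)  # k-fold CV estimate of the test MSE
print(cv_mse)
```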
There is also a version without a for loop.
(Possible places to dig up the code: Day 13 K-fold CV)
# my snippets
Searching for an optimal parameter using CV#
How do I use GridSearchCV?
(Possible places to dig up the code: Day 27 SVC)
# my snippets
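A minimal sketch, assuming a linear-kernel `SVC` and a search over its cost parameter `C`. The synthetic data set and the candidate grid are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic two-class data for illustration
X, y = make_classification(n_samples=100, n_features=4, random_state=0)

# Candidate values to try; each is scored with 5-fold CV
param_grid = {"C": [0.01, 0.1, 1, 10]}
search = GridSearchCV(SVC(kernel="linear"), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)  # the value of C with the best mean CV score
print(search.best_score_)   # that mean CV accuracy
```

After fitting, `search.best_estimator_` holds the model refit on all the data with the selected parameter.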
Linear regression#
Getting the model using sklearn#
Some code for linear regression:
import the right modules
adjust my data X, y if needed
fit the model
print the coefficients
print the model equation
(Possible places to dig up the code: Day 4 Simple Linear Regression, Day 5 More Linear Regression, Day 6 Multiple Linear Regression)
# my snippets with detailed comments on which line does what
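The steps listed above could be sketched as follows. The data are made up so that the true relation is y = 1 + 2x, and a single predictor is assumed.

```python
import numpy as np
from sklearn.linear_model import LinearRegression  # import the right module

# Hypothetical data; sklearn expects X to be 2-D, so reshape to (n_samples, 1)
X = np.array([1.0, 2.0, 3.0, 4.0]).reshape(-1, 1)
y = np.array([3.0, 5.0, 7.0, 9.0])

model = LinearRegression()  # create the model object
model.fit(X, y)             # estimate intercept and slope by least squares

print(model.intercept_)     # estimated intercept
print(model.coef_)          # estimated slope(s), one per column of X
print(f"y = {model.intercept_:.2f} + {model.coef_[0]:.2f} * x")  # model equation
```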
How do I use the fitted model to make predictions?
(Possible places to dig up the code: Day 7 Even More Linear Regression)
# my snippets
Getting the model using statsmodels#
I will need some code to:
fit the model
get the summary statistics like coefficients, p-values, confidence intervals, \(R^2\), etc.
(Possible places to dig up the code: Day 5 More Linear Regression, Day 6 Multiple Linear Regression)
# my snippets
How about prediction intervals?
(Possible places to dig up the code: Day 7 Even More Linear Regression)
# my snippets
Plotting the model fitted#
Now I have the coefficients, how do I plot the line on top of the data?
(Possible places to dig up the code: Day 4 Simple Linear Regression, Day 5 More Linear Regression)
# my snippets
Interaction terms#
How do I include interaction terms when using sklearn?
(Possible places to dig up the code: Day 8 The Last of Linear Regression)
# my snippets
How about using statsmodels?
(Possible places to dig up the code: Day 8 The Last of Linear Regression)
# my snippets
KNN classification#
Model fitting#
How do I fit a KNN classifier?
(Possible places to dig up the code: Day 9 Intro Classification)
# my snippets
Making predictions#
Once I have fitted it, how do I make predictions for a new data point?
predicting the actual class
predicting the conditional probability
(Possible places to dig up the code: Day 9 Intro Classification)
# my snippets
Logistic Regression#
Fitting the model#
How do I fit a logistic regression model?
(Possible places to dig up the code: Day 10 Logistic Regression)
# my snippet
Then what?#
How do I:
check the coefficients?
check which classes there are?
make predictions?
(Possible places to dig up the code: Day 10 Logistic Regression)
# my snippet
Polynomial Regression#
How do I trick linear regression into fitting a polynomial?
(Possible places to dig up the code: Day 14 More K-fold CV, Day 21 Polynomial & Step Functions)
# my snippet
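One way to do the trick is to expand x into the columns x and x^2 with `PolynomialFeatures` and then fit an ordinary linear regression on those columns. The sketch below assumes noise-free made-up data from y = 1 + 2x + 3x^2 and degree 2.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data from a known quadratic
x = np.linspace(-2, 2, 9)
y = 1 + 2 * x + 3 * x**2

# Build the columns [x, x^2]; the intercept is left to LinearRegression
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(x.reshape(-1, 1))

model = LinearRegression().fit(X_poly, y)
print(model.intercept_, model.coef_)
```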
Ridge Regression#
Model fitting#
How do I fit a ridge regression model?
(Possible places to dig up the code: Day 17 Ridge Regression)
# my snippets
Then what?#
How do I look at the coefficients? Make predictions?
(Possible places to dig up the code: Day 17 Ridge Regression)
# my snippets
Special cross-validation for Ridge regression#
There is some special code for doing CV for ridge regression.
(Possible places to dig up the code: Day 17 Ridge Regression)
# my snippets
Lasso regression#
Model fitting#
How do I fit a lasso regression model?
(Possible places to dig up the code: Day 18 Lasso Regression)
# my snippets
Then what?#
How do I look at the coefficients? Make predictions?
(Possible places to dig up the code: Day 18 Lasso Regression)
# my snippets
Special cross-validation for Lasso regression#
There is some special code for doing CV for lasso regression.
(Possible places to dig up the code: Day 18 Lasso Regression)
# my snippets
Principal Component Analysis (PCA)#
Getting the PCs#
How do I compute the PCs using PCA?
(Possible places to dig up the code: Day 19 PCA)
# my snippets
Projecting data onto each PC#
How do I project data from the original coordinates to the PC space?
(Possible places to dig up the code: Day 19 PCA)
# my snippets
Getting other info about the PCs#
How can I get the amount of variance explained by each PC?
(Possible places to dig up the code: Day 19 PCA)
# my snippets
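With scikit-learn's `PCA`, the fitted object exposes `explained_variance_ratio_`. The sketch below uses random data purely for illustration and keeps all three components, so the ratios sum to 1.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data for illustration: 50 samples, 3 variables
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))

pca = PCA(n_components=3)
pca.fit(X)

ratios = pca.explained_variance_ratio_  # fraction of total variance per PC
print(ratios)
print(ratios.cumsum())                  # cumulative variance explained
```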
Step functions#
Fitting a step function for regression#
How do I:
cut the domain into different pieces
represent each piece as a variable
then trick linear regression to get the coefficients?
(Possible places to dig up the code: Day 21 Polynomial & Step Functions)
# my snippets
Fitting a step function for classification#
What do I need to do if it’s for classification?
(Possible places to dig up the code: Day 22 Step Classification)
# my snippets
Cubic splines#
Fitting a cubic spline model#
Given a wiggly curve, how do I fit a cubic spline model to it?
(Possible places to dig up the code: Day 23 Splines)
# my snippets
Then what?#
How do I find out the coefficients of the model?
(Possible places to dig up the code: Day 23 Splines)
# my snippets
Decision Trees#
Fitting regression trees#
How do I fit a regression tree?
(Possible places to dig up the code: Day 24 Decision Trees)
# my snippets
Fitting classification trees#
What if it is a classification problem?
(Possible places to dig up the code: Day 24 Decision Trees)
# my snippets
Look at the model#
How do I even visualize the model? (At least two different ways)
(Possible places to dig up the code: Day 24 Decision Trees)
# my snippets
Making Predictions#
How do I predict the outcome with a tree model?
(Possible places to dig up the code: Day 24 Decision Trees)
# my snippets
Tree pruning#
How do I make the tree smaller?
(Possible places to dig up the code: Day 24 Decision Trees)
# my snippets
Random Forests#
Fitting the model#
How do I fit a random forest model?
(Possible places to dig up the code: Day 25 Random Forests)
# my snippets
Then what?#
How do I get some predictions? Or evaluate the model accuracy?
(Possible places to dig up the code: Day 25 Random Forests)
# my snippets
Support Vector Machine (SVM)#
Fitting the SVC/SVM#
How do I train a Support Vector Classifier (SVC)/Support Vector Machine (SVM)?
(Possible places to dig up the code: Day 27 SVC, Day 28 SVM)
# my snippets
How do I then get the coefficients? And support vectors?
(Possible places to dig up the code: Day 27 SVC)
# my snippets
What’s a handy function to plot the decision boundary and support vectors?
(Possible places to dig up the code: Day 27 SVC)
# my snippets
Feed-forward Neural Networks#
Creating the model#
How do I build a simple multilayer feed-forward neural network using PyTorch?
(Possible places to dig up the code: Day 30 Multilayer NN)
# my snippets
Formatting and splitting the data#
I may need to do something to load my data properly using PyTorch.
(Possible places to dig up the code: Day 30 Multilayer NN)
# my snippets
Training the network#
How do I then train the neural network with the data?
(Possible places to dig up the code: Day 30 Multilayer NN)
# my snippets
Use the network#
How do I use the trained network? Maybe to make some predictions.
(Possible places to dig up the code: Day 30 Multilayer NN)
# my snippets
Clustering#
Single linkage#
How do I cluster my data points using single linkage clustering? I may need to use different thresholds.
(Possible places to dig up the code: Day 32 Hierarchical Clustering)
# my snippets
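One possible approach uses SciPy's hierarchical clustering tools; the Day 32 notebook may use a different library. The four points and the two distance thresholds below are made up so that the effect of the threshold is easy to see.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Hypothetical data: two tight pairs of points that are far apart
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])

# Build the single linkage merge tree
Z = linkage(X, method="single")

# Cut the tree at two different distance thresholds
labels_small = fcluster(Z, t=0.5, criterion="distance")  # nothing merged yet
labels_large = fcluster(Z, t=2.0, criterion="distance")  # each pair merged
print(labels_small)
print(labels_large)
```

A larger threshold merges more points, so the number of clusters can only stay the same or decrease as `t` grows.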
How do I then plot my clusters?
# my snippets
Created by Dr. Mengsen Zhang, Michigan State University
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.