CMSE 381 Study Guide#
Here we summarize the detailed learning goals in terms of a series of questions that you should be able to answer at the end of this course.
(Work in progress… Last update 10/6/2025)
Part 1#
Linear Regression#
Part 2#
Validation methods#
(Required readings: chapter sections 5.1 … 10 pages)
Why do you need validation methods?
What is the difference between validation scores and true test errors?
What are the three most basic validation methods?
For each of these methods,
How do you calculate the validation score? You should be able to describe the procedures verbally and mathematically.
How do you calculate the validation score in Python?
What are the advantages and disadvantages of using this method compared to the other two methods? You should be able to describe them in terms of computational cost and the bias/variance of the estimate.
How do you use validation methods to select the appropriate meta-parameter?
Given a plot of validation score as a function of model flexibility, which depends on a specific meta-parameter of the model, you should be able to choose the right parameter by reading the plot, and provide an explanation based on the principle of bias-variance tradeoff.
Given a dataset, you should be able to generate this plot yourself in Python for any type of model covered in this class.
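As a minimal sketch of the questions above, the block below computes a validation score three ways (validation set, leave-one-out, and k-fold cross-validation) and then scans a flexibility meta-parameter with k-fold CV. It assumes scikit-learn is available; the synthetic cubic data, the helper `make_model`, and the choice of polynomial degree as the meta-parameter are illustrative assumptions, not part of the course material.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (an assumption for illustration): a noisy cubic curve.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = X[:, 0] ** 3 - 2 * X[:, 0] + rng.normal(scale=3.0, size=100)


def make_model(degree):
    # Hypothetical helper: polynomial regression whose degree sets the flexibility.
    return make_pipeline(PolynomialFeatures(degree, include_bias=False), LinearRegression())


# 1) Validation set approach: one random split, fit on one half, score on the other.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=0)
val_mse = np.mean((make_model(3).fit(X_tr, y_tr).predict(X_val) - y_val) ** 2)

# 2) Leave-one-out CV: n fits, each one holding out a single observation.
loo_mse = -cross_val_score(make_model(3), X, y, cv=LeaveOneOut(),
                           scoring="neg_mean_squared_error").mean()

# 3) k-fold CV (k = 5), repeated for each candidate degree.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
cv_mse = {
    d: -cross_val_score(make_model(d), X, y, cv=kfold,
                        scoring="neg_mean_squared_error").mean()
    for d in range(1, 9)
}
best_degree = min(cv_mse, key=cv_mse.get)

print(f"validation-set MSE: {val_mse:.2f}, LOOCV MSE: {loo_mse:.2f}")
print("5-fold CV MSE by degree:", {d: round(m, 2) for d, m in cv_mse.items()})
print("degree with lowest CV MSE:", best_degree)
```

Plotting `cv_mse` against degree gives the kind of validation curve described above: the score is high for models that are too rigid (bias), drops, and then tends to rise again as extra flexibility adds variance.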
Subset/feature selection#
(Required readings: chapter section 6.1 … 9 pages)
Why shouldn’t you have as many predictors in your model as possible?
What are the three basic methods that can be used to select the appropriate predictors (feature selection)?
What are the steps/procedures in the algorithm for each of these methods?
How do you carry out these procedures by hand, given the training and CV test errors for each combination of predictors?
What are the pros and cons of each of these feature selection methods? When can you use them, and when can you not?
How do you implement the feature selection algorithm in Python?
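One possible way to code a selection procedure is sketched below for forward stepwise selection: at each step the predictor that most reduces the training RSS is added, and the resulting sequence of models is then compared by cross-validated MSE. The synthetic data and the helper names `train_rss` and `cv_mse` are assumptions made for illustration, and the null model (no predictors) is left out for brevity.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic data (assumed): only predictors 0 and 3 actually matter.
rng = np.random.default_rng(1)
n, p = 200, 6
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=1.0, size=n)


def train_rss(columns):
    # Training residual sum of squares for a least squares fit on the given columns.
    model = LinearRegression().fit(X[:, columns], y)
    return np.sum((y - model.predict(X[:, columns])) ** 2)


def cv_mse(columns):
    # 5-fold cross-validated mean squared error for the same model.
    scores = cross_val_score(LinearRegression(), X[:, columns], y, cv=5,
                             scoring="neg_mean_squared_error")
    return -scores.mean()


selected, remaining = [], list(range(p))
path = []  # one (predictor list, CV MSE) pair per model size
while remaining:
    # Within a model size, compare candidates by training RSS.
    best = min(remaining, key=lambda j: train_rss(selected + [j]))
    selected.append(best)
    remaining.remove(best)
    # Across model sizes, compare by CV error.
    path.append((list(selected), cv_mse(selected)))

best_subset, best_score = min(path, key=lambda item: item[1])
for columns, score in path:
    print(columns, round(score, 3))
print("chosen predictors:", sorted(best_subset))
```

Backward stepwise selection follows the same pattern but starts from all predictors and removes one at a time; best subset selection would instead loop over every combination of each size.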
Regularization methods#
(Required readings: chapter section 6.2.1-6.2.2 … 12 pages)
What is regularization? Why do we need it?
What are the two basic types of regularization methods? How are they implemented mathematically in linear regression? Why are they also called Shrinkage methods?
How do you fit these two basic types of regularized linear regression models in Python?
How do you control the model flexibility & bias-variance tradeoff when using regularization?
How do you find the right amount of regularization using cross-validation? How do you do this in Python?
What additional precautions do you need to take when using regularization (compared to least squares)?
What are the advantages of regularization compared to Least Squares?
What are the advantages of regularization compared to subset selection?
What are the advantages of one type of shrinkage method over the other? When do you choose one over the other?
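The following sketch illustrates several of these questions at once with scikit-learn, assuming the two methods in question are ridge regression and the lasso. Predictors are standardized first (penalties are scale-sensitive), the penalty strength `alpha` is chosen by cross-validation, and the effect of the penalty on the coefficients is inspected. The synthetic data, the alpha grid, and the fixed `alpha=0.5` are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import Lasso, LassoCV, Ridge, RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumed): 10 predictors on very different scales, 3 of them relevant.
rng = np.random.default_rng(2)
n, p = 150, 10
Z = rng.normal(size=(n, p))
X = Z * np.linspace(0.5, 20.0, p)
beta_true = np.zeros(p)
beta_true[:3] = [4.0, -3.0, 2.0]
y = Z @ beta_true + rng.normal(size=n)

# Choose the penalty strength by cross-validation; scaling lives inside the pipeline
# so that it is re-estimated on each training fold.
alphas = np.logspace(-3, 3, 50)
ridge = make_pipeline(StandardScaler(), RidgeCV(alphas=alphas)).fit(X, y)
lasso = make_pipeline(StandardScaler(), LassoCV(alphas=alphas, cv=5)).fit(X, y)
ridge_alpha = ridge[-1].alpha_
lasso_alpha = lasso[-1].alpha_

# Shrinkage: a larger penalty pulls the ridge coefficients toward zero.
Xs = StandardScaler().fit_transform(X)
norm_small = np.linalg.norm(Ridge(alpha=0.01).fit(Xs, y).coef_)
norm_large = np.linalg.norm(Ridge(alpha=1000.0).fit(Xs, y).coef_)

# Sparsity: the lasso can set coefficients exactly to zero, ridge generally does not.
lasso_fixed = Lasso(alpha=0.5).fit(Xs, y)
n_zero_lasso = int(np.sum(lasso_fixed.coef_ == 0))
n_zero_ridge = int(np.sum(ridge[-1].coef_ == 0))

print("CV-selected alpha (ridge, lasso):", ridge_alpha, lasso_alpha)
print("ridge coefficient norm, small vs large alpha:", norm_small, norm_large)
print("exact zeros (lasso, ridge):", n_zero_lasso, n_zero_ridge)
```

Note that scikit-learn calls the penalty weight `alpha`, whereas textbooks usually write it as \(\lambda\); larger values mean less flexibility.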
Dimensionality reduction and feature engineering#
(Required readings: chapter section 6.3 … 9 pages)
How do you create new variables as linear combinations of the original predictors?
Why do we need Principal Component Analysis (PCA)? What is the main purpose of using it?
What is a principal component (PC)?
What does the first PC maximize? You should be able to explain this both geometrically in a plot and mathematically.
Similarly, what do the subsequent PCs maximize?
How do you compute the PCs in Python, given a dataset?
How do you project data points on each PC? You should also be able to plot the data points in the PC space.
How do you find out how much variance each PC explains?
Why do you want to use the PCs in regression models?
What assumptions do you have to make for it to be a good idea to use principal component regression (PCR)?
Or conversely, what is a typical bad scenario to use PCR?
How do you implement PCR in Python?
How do you interpret the model coefficients when using PCR?
How do you choose the number of PCs to use in PCR?
Given a figure of, e.g., cross-validation score as a function of the number of PCs, you should be able to choose appropriately and provide rationales in terms of bias-variance tradeoff.
You should be able to generate such figures in Python.
What are the relationships and differences between PCR and the feature selection and regularization methods that you learned in this part of the course?
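A minimal sketch of the PCA and PCR workflow with scikit-learn is given below: compute the PCs, project the data onto them, read off the proportion of variance explained, and choose the number of PCs for regression by cross-validation. The synthetic two-factor data and the helper `pcr` are assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (assumed): 8 correlated predictors driven by 2 hidden factors.
rng = np.random.default_rng(3)
n = 200
latent = rng.normal(size=(n, 2))
loadings = rng.normal(size=(2, 8))
X = latent @ loadings + 0.1 * rng.normal(size=(n, 8))
y = 2.0 * latent[:, 0] - 1.0 * latent[:, 1] + rng.normal(scale=0.5, size=n)

# PCA on standardized predictors.
Xs = StandardScaler().fit_transform(X)
pca = PCA().fit(Xs)
directions = pca.components_               # each row is one PC direction (loading vector)
scores = pca.transform(Xs)                 # data projected onto the PCs
explained = pca.explained_variance_ratio_  # proportion of variance explained per PC


def pcr(m):
    # Hypothetical helper: principal component regression using the first m PCs.
    return make_pipeline(StandardScaler(), PCA(n_components=m), LinearRegression())


cv_mse = {
    m: -cross_val_score(pcr(m), X, y, cv=5, scoring="neg_mean_squared_error").mean()
    for m in range(1, 9)
}
best_m = min(cv_mse, key=cv_mse.get)

print("variance explained:", np.round(explained, 3))
print("CV MSE by number of PCs:", {m: round(v, 3) for m, v in cv_mse.items()})
print("number of PCs with lowest CV MSE:", best_m)
```

Plotting `scores[:, 0]` against `scores[:, 1]` shows the data in PC space, and plotting `cv_mse` against the number of PCs gives the figure used to pick the number of components.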
Part 3#
Polynomial functions and step functions#
(Required readings: chapter section 7.1-7.2 … 4 pages)
Why do people want to use polynomial regression rather than simple linear regression?
How to fit polynomial regression models?
You should be able to describe the procedures verbally and implement them in Python.
Why do people want to use step functions rather than simple linear regression?
How to fit step function models?
You should be able to describe the procedures for both regression and classification problems.
You should also be able to implement them in Python.
What is common between polynomial regression and step functions?
What is an advantage of using step functions over polynomials, despite their simplicity?
You should be able to distinguish between models with equations that apply to the whole domain vs those that don’t.
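As a rough sketch of both fitting procedures, the block below fits a degree-4 polynomial and a step function to the same data. The step function is built by cutting the predictor into bins and regressing on one indicator variable per bin; swapping in logistic regression gives the classification version. The data, the cut points, and the degree are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumed): a noisy sine curve on [0, 10].
rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=300)
y = np.sin(x) + rng.normal(scale=0.3, size=300)

# Polynomial regression: one equation (in x, x^2, x^3, x^4) for the whole domain.
poly = make_pipeline(PolynomialFeatures(degree=4, include_bias=False),
                     LinearRegression()).fit(x[:, None], y)

# Step function: cut x into bins and create one indicator column per bin.
cuts = np.array([2.5, 5.0, 7.5])       # assumed cut points
bins = np.digitize(x, cuts)            # bin index 0..3 for each observation
dummies = np.eye(len(cuts) + 1)[bins]  # one-hot indicator matrix
step = LinearRegression(fit_intercept=False).fit(dummies, y)

# With one indicator per bin and no intercept, each coefficient is the bin mean of y.
bin_means = np.array([y[bins == k].mean() for k in range(len(cuts) + 1)])

# Classification version: the same indicators in a logistic regression for P(y > 0).
y_class = (y > 0).astype(int)
step_clf = LogisticRegression().fit(dummies, y_class)
probs = step_clf.predict_proba(np.eye(len(cuts) + 1))[:, 1]  # one probability per bin

print("step coefficients:", np.round(step.coef_, 3))
print("bin means of y:   ", np.round(bin_means, 3))
print("P(y > 0) per bin: ", np.round(probs, 3))
```

Both models are linear regressions on transformed versions of `x`; the difference is that the polynomial terms act on the whole domain while each indicator acts only within its own bin.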
Basis Functions and Splines#
(Required readings: chapter section 7.3-7.4 … 7 pages)
What are basis functions?
You should be able to articulate what the basis functions \(b_i\) are for the different models that we have covered.
Examples include polynomial, step functions, and cubic splines.
What is the purpose of using basis functions?
How do they influence model flexibility?
How does this purpose manifest in different models, such as regression splines?
How do you define cubic splines mathematically in terms of piecewise polynomials?
You should be able to write down the equations for the model between knots, the equations for the constraints at the knots, and the total degrees of freedom.
How to calculate the cubic spline coefficients by hand by applying the above mathematical definitions?
How to define cubic splines in terms of basis functions?
You should be able to describe it mathematically in terms of truncated power basis functions (the textbook version).
You should be able to visualize them as B-spline basis functions in Python.
How do you fit a cubic spline model to data in Python?
How do you change the model flexibility?
What are the relevant metaparameters?
How do you choose the appropriate flexibility?
What precautions must you take at the outer boundary? Why?
Decision Trees and Random Forests#
(Required readings: chapter section 8.1-8.2.2 … 16 pages)
How does a decision tree make decisions?
Given a tree and a data point with multiple predictors \((x_1, x_2,\cdots, x_p)\), you should be able to derive what the predicted outcome \(\hat{y}\) should be.
You should be able to do this for both regression and classification decision trees.
What are different parts of a decision tree called? What do they represent?
You should be able to point out what are: leaves (terminal nodes), internal nodes, branches/edges.
Given a simple example (e.g., two predictors), you should be able to point out which region of the predictor space maps to which leaf.
How do you fit a decision tree?
What are the cost functions for regression and classification trees, respectively?
How does recursive binary splitting work?
How does it end?
How do you do it in Python? How do you visualize the tree in Python?
Which meta-parameters of decision trees determine their flexibility?
Why do you need to prune the tree?
What are the advantages and disadvantages of using decision trees vs. linear regression?
What does bagging of decision trees accomplish?
How do you use out-of-bag error estimation for decision trees?
You should be able to describe this for both regression and classification trees.
What problem with bagging does a random forest address?
What is the relationship between random forest and bagging?
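A short sketch of these ideas with scikit-learn follows: fit and print a small regression tree, prune a fully grown tree with cost-complexity pruning, and compare bagging with a random forest using the out-of-bag (OOB) score. The synthetic three-region data and the specific settings (`max_leaf_nodes=3`, `ccp_alpha=0.01`, 200 trees) are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic data (assumed): three rectangular regions with different mean responses.
rng = np.random.default_rng(6)
X = rng.uniform(0, 1, size=(300, 2))
y = np.where(X[:, 0] < 0.5, 1.0, np.where(X[:, 1] < 0.5, 3.0, 5.0))
y = y + rng.normal(scale=0.2, size=300)

# A small tree: each leaf predicts the mean response of its region.
tree = DecisionTreeRegressor(max_leaf_nodes=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["x1", "x2"]))
pred = tree.predict([[0.8, 0.2]])[0]  # falls in the region x1 >= 0.5, x2 < 0.5

# Pruning: a fully grown tree versus one fitted with a cost-complexity penalty.
full = DecisionTreeRegressor(random_state=0).fit(X, y)
pruned = DecisionTreeRegressor(ccp_alpha=0.01, random_state=0).fit(X, y)

# Bagging considers every predictor at each split (max_features=None);
# a random forest considers only a random subset (here 1 of the 2 predictors).
bagging = RandomForestRegressor(n_estimators=200, max_features=None,
                                oob_score=True, random_state=0).fit(X, y)
forest = RandomForestRegressor(n_estimators=200, max_features=1,
                               oob_score=True, random_state=0).fit(X, y)

print("prediction at (0.8, 0.2):", round(pred, 2))
print("leaves, full vs pruned:", full.get_n_leaves(), pruned.get_n_leaves())
print("OOB R^2, bagging vs random forest:", bagging.oob_score_, forest.oob_score_)
```

For regressors, `oob_score_` is an \(R^2\) computed from predictions on observations left out of each bootstrap sample; for classifiers the same attribute reports accuracy. `sklearn.tree.plot_tree` can draw the fitted tree when matplotlib is available.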
Maximal Margin Classifiers and Support Vector Classifier/Machines#
(Required readings: chapter section 9.1-9.4 … 17 pages)
What is a hyperplane?
How do you mathematically describe this hyperplane?
How do you mathematically describe the two sides of the hyperplane?
Given the equation of the hyperplane and the coordinates for a point, you should be able to tell which side of the plane the point is on.
What qualifies a hyperplane as a separating hyperplane?
You should be able to describe this mathematically using an inequality.
You should also be able to determine whether a hyperplane is a separating hyperplane given a graph, or given the equation of the plane and the coordinates and class of a few points.
How to use a separating hyperplane as a classifier?
What makes a hyperplane a maximal margin hyperplane?
What are its margin and support vectors?
Given a graph, you should be able to clearly label the margin and support vectors. You should also be able to infer the size of the margin from reading the graph.
You should also be able to describe the optimization problem mathematically.
Given the equation of the maximal margin hyperplane and the coordinates of support vectors, you should be able to calculate the size of the margin by hand.
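The by-hand calculations above can be checked numerically. The sketch below uses a hypothetical hyperplane \(\beta_0 + \beta_1 x_1 + \beta_2 x_2 = 0\) and three made-up labeled points; it evaluates which side each point lies on, tests the separating condition \(y_i(\beta_0 + \beta^T x_i) > 0\), and computes perpendicular distances. The numbers are chosen only so the arithmetic is easy to follow, and the hyperplane is not claimed to be the maximal margin one for these points.

```python
import numpy as np

# Hypothetical hyperplane: -1 + 3*x1 + 4*x2 = 0 (numbers assumed for illustration).
beta0 = -1.0
beta = np.array([3.0, 4.0])

# Three made-up points with class labels +1 / -1.
points = np.array([[1.0, 1.0], [0.0, -1.0], [2.0, 0.0]])
labels = np.array([1, -1, 1])

# The sign of f(x) = beta0 + beta . x tells which side of the hyperplane x is on.
f = beta0 + points @ beta
side = np.sign(f)

# Separating hyperplane condition: y_i * f(x_i) > 0 for every observation.
separates = bool(np.all(labels * f > 0))

# Perpendicular distance from each point to the hyperplane: |f(x)| / ||beta||.
distances = np.abs(f) / np.linalg.norm(beta)

# The margin of this hyperplane is the smallest distance; the points attaining it
# would be the support vectors if this were the maximal margin hyperplane.
margin = distances.min()
closest = np.flatnonzero(np.isclose(distances, margin))

print("f(x):", f)
print("separates the classes:", separates)
print("distances:", distances, "margin:", margin, "closest points:", closest)
```

If the coefficients are already normalized so that \(\sum_j \beta_j^2 = 1\), the division by the norm is unnecessary and \(y_i f(x_i)\) is itself the signed distance.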
What is the difference between Support Vector Classifier and Maximal Margin Classifier?
What shortcomings of the Maximal Margin Classifier does the Support Vector Classifier overcome?
How?
You should be able to describe the difference verbally, graphically, and mathematically.
How to interpret each element of the mathematical formulation of Support Vector Machine graphically?
Given a graph of Support Vector Classifier and some data points, you should be able to derive the values of different parameters in the mathematical formulation (e.g. \(M\), \(\epsilon_i\)).
What parameters control the flexibility (and hence the bias-variance tradeoff) of the Support Vector Classifier?
What are the support vectors of Support Vector Classifier?
How do you use the Support Vector Classifier in Python?
You should be able to obtain and interpret the fitted parameters.
You should be able to select the hyperparameter value that optimizes the bias-variance tradeoff.
How is nonlinearity introduced in Support Vector Machine?
You should be able to describe how kernels work conceptually and mathematically.
What type of kernel makes Support Vector Machine equivalent to Support Vector Classifier?
What other kernels are there?
What type of kernel leads to more local behavior?
How to use Support Vector Machine in Python?
You should be able to fit a Support Vector Machine using different kernels covered in class.
You should be able to select the hyperparameter value that optimizes the bias-variance tradeoff.
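A minimal sketch with scikit-learn's `SVC` is shown below: a linear kernel gives a support vector classifier whose fitted hyperplane and support vectors can be inspected, and a radial (RBF) kernel with hyperparameters chosen by grid-search cross-validation gives a nonlinear support vector machine. The circular synthetic data and the parameter grid are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.svm import SVC

# Synthetic data (assumed): the classes are separated by a circle, not a line.
rng = np.random.default_rng(7)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.5).astype(int)

# Support vector classifier = SVM with a linear kernel.
linear_svc = SVC(kernel="linear", C=1.0).fit(X, y)
w = linear_svc.coef_[0]                          # hyperplane coefficients (linear kernel only)
b = linear_svc.intercept_[0]                     # hyperplane intercept
n_sv = linear_svc.support_vectors_.shape[0]      # number of support vectors
linear_acc = cross_val_score(SVC(kernel="linear", C=1.0), X, y, cv=5).mean()

# Nonlinear SVM: radial kernel, with C and gamma chosen by 5-fold cross-validation.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.01, 0.1, 1, 10]}
grid = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)

print("linear kernel: w =", w, "b =", b, "support vectors:", n_sv)
print("linear kernel CV accuracy:", round(linear_acc, 3))
print("best RBF parameters:", grid.best_params_)
print("best RBF CV accuracy:", round(grid.best_score_, 3))
```

One caution when comparing with textbook formulations: in scikit-learn, `C` is a penalty on margin violations, so a larger `C` tolerates fewer violations and gives a more flexible fit, which runs in the opposite direction to a formulation where \(C\) is a budget for violations. A polynomial kernel is available through `kernel="poly"` with its `degree` parameter.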
Neural Networks#
(Required readings: chapter section 10.1-10.3 … 14 pages)
What is the architecture of a 1-layer feed-forward neural network?
You should be able to describe what happens at each layer (e.g., input layer, hidden layer, output layer) conceptually and mathematically.
What is activation? What types of activation functions are there?
You should be able to explain it mathematically.
Given example inputs and weights from the previous matrix, you should be able to calculate the activation \(A\) by hand (with the help of a calculator).
How does a 1-layer feed-forward neural network produce an output?
Given a \(\beta\) matrix and activation from the previous layer, you should be able to calculate the output by hand.
What do the fitted parameters minimize in a 1-layer feed-forward neural network?
What is the architecture of multilayer feed-forward neural networks?
You should be able to describe what happens at each layer mathematically for a typical classification problem.
You should be able to calculate class probability for the output layer using softmax.
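To check a hand calculation, the forward pass of a small network can be written in a few lines of NumPy. The sketch below assumes a single hidden layer with ReLU activation and a two-class softmax output; all weights, biases, and inputs are made-up numbers chosen for easy arithmetic, not values from the course.

```python
import numpy as np


def relu(z):
    # ReLU activation: keep positive values, set negative values to zero.
    return np.maximum(z, 0.0)


def softmax(z):
    # Subtracting the maximum avoids overflow and does not change the result.
    e = np.exp(z - z.max())
    return e / e.sum()


# Made-up input with 2 predictors.
x = np.array([1.0, 2.0])

# Hidden layer: 3 units, one row of weights per unit, plus one bias per unit.
W = np.array([[0.5, -1.0],
              [1.0, 1.0],
              [-1.0, 0.5]])
w0 = np.array([0.0, -1.0, 0.5])
A = relu(w0 + W @ x)   # activations: relu([-1.5, 2.0, 0.5]) = [0.0, 2.0, 0.5]

# Output layer: 2 classes, one row of beta coefficients per class.
B = np.array([[1.0, 0.5, -1.0],
              [0.0, 1.0, 2.0]])
b0 = np.array([0.0, -1.0])
Z = b0 + B @ A         # linear scores per class: [0.5, 2.0]

probs = softmax(Z)     # class probabilities, which sum to 1
predicted_class = int(np.argmax(probs))

print("activations A:", A)
print("scores Z:", Z)
print("class probabilities:", np.round(probs, 4))
print("predicted class:", predicted_class)
```

A multilayer network repeats the hidden-layer step, feeding each layer's activations into the next layer before the final softmax.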
How to classify images using multilayer neural networks in Python?
What is the architecture of a convolutional neural network?
You should be able to describe the types of layers, how they are arranged, and the purpose of each of them.
How does convolution work?
You should be able to compute the result of the convolution between a simple matrix and a simple filter by hand.
What information does Convolutional Neural Network capture through convolution?
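The convolution step can be verified with a small NumPy loop. The sketch below slides a 2x2 filter over a 4x3 matrix with no padding and a stride of 1, multiplying element-wise and summing at each position (the filter is not flipped, which is the usual convention in neural network convolution layers). The matrix, the filter, and the function name `convolve2d_valid` are made up for illustration.

```python
import numpy as np


def convolve2d_valid(image, kernel):
    # Slide the kernel over every position where it fits entirely inside the image.
    kh, kw = kernel.shape
    out = np.zeros((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i:i + kh, j:j + kw]
            out[i, j] = np.sum(patch * kernel)  # element-wise product, then sum
    return out


# Made-up 4x3 "image" and 2x2 filter.
image = np.array([[1, 2, 3],
                  [4, 5, 6],
                  [7, 8, 9],
                  [10, 11, 12]], dtype=float)
kernel = np.array([[1, 0],
                   [0, -1]], dtype=float)

result = convolve2d_valid(image, kernel)
print(result)
```

Here every output entry is the top-left value of a patch minus its bottom-right value, which is \(-4\) everywhere for this matrix: the filter responds to the constant diagonal gradient. The output has shape 3x2 because a 2x2 filter fits in three row positions and two column positions.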
Unsupervised learning / Clustering#
(Required readings: chapter section 12.1, 12.4 … 16 pages)
What is the difference between supervised and unsupervised learning?
What do clustering methods aim to accomplish?
How to interpret a dendrogram of hierarchical clustering?
How are different linkage methods defined?
How to perform hierarchical clustering in Python?
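One way to run hierarchical clustering in Python is with SciPy, sketched below on synthetic data with three well-separated groups. The linkage matrix is computed for several linkage methods, the dendrogram is cut into three clusters, and the dendrogram structure is computed without drawing it. The data and the choice of three clusters are assumptions made for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Synthetic data (assumed): three well-separated groups of 20 points each.
rng = np.random.default_rng(8)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(20, 2)) for c in centers])

# One linkage matrix per method; each row records one merge:
# (cluster a, cluster b, merge height, size of the new cluster).
Z = {m: linkage(X, method=m, metric="euclidean")
     for m in ["complete", "average", "single"]}

# Cut the complete-linkage dendrogram so that at most 3 clusters remain.
labels = fcluster(Z["complete"], t=3, criterion="maxclust")

# Dendrogram layout without drawing (no_plot=True); drop that argument to plot
# when matplotlib is installed.
tree = dendrogram(Z["complete"], no_plot=True)

print("merges recorded:", Z["complete"].shape[0])
print("height of the final merge:", Z["complete"][-1, 2])
print("cluster sizes:", np.bincount(labels)[1:])
```

The height at which two branches fuse in the dendrogram is the merge distance stored in the third column of the linkage matrix, so observations that fuse near the bottom are more similar than those that fuse near the top.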