CMSE 381 Study Guide#

Here we summarize the detailed learning goals in terms of a series of questions that you should be able to answer at the end of this course.

(Work in progress… Last update 10/6/2025)

Part 1#

Linear Regression#


Part 2#

Validation methods#

(Required readings: chapter sections 5.1 … 10 pages)

  • Why do you need validation methods?

  • What is the difference between validation scores and true test errors?

  • What are the three most basic validation methods?

  • For each of these methods,

    • How do you calculate the validation score? You should be able to describe the procedures verbally and mathematically.

    • How do you calculate the validation score in Python?

    • What are the advantages and disadvantages of using this method compared to the other two methods? You should be able to describe them in terms of computational cost and the bias/variance of the estimate.

  • How do you use validation methods to select the appropriate meta-parameter?

    • Given a plot of validation score as a function of model flexibility, which depends on a specific meta-parameter of the model, you should be able to choose the right parameter by reading the plot, and provide an explanation based on the principle of bias-variance tradeoff.

    • Given a dataset, you should be able to generate this plot yourself in Python for any type of model covered in this class.
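
As a minimal sketch of the questions above, the code below computes a validation-set score, a 5-fold cross-validation score, and a leave-one-out score for polynomial regression of increasing degree. It assumes scikit-learn is the library in use (the guide does not name one), the data are synthetic, and names such as `make_model` and `cv_mse` are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, LeaveOneOut, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (made up for illustration): a noisy cubic in one predictor.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(100, 1))
y = X[:, 0] ** 3 - 2 * X[:, 0] + rng.normal(scale=3.0, size=100)


def make_model(degree):
    """Polynomial regression; `degree` is the flexibility meta-parameter."""
    return make_pipeline(PolynomialFeatures(degree, include_bias=False), LinearRegression())


def validation_set_mse(degree, seed=0):
    """Validation-set approach: fit on one random half, score on the other."""
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.5, random_state=seed)
    model = make_model(degree).fit(X_tr, y_tr)
    return float(np.mean((y_val - model.predict(X_val)) ** 2))


def cv_mse(degree, cv):
    """Cross-validated MSE; scikit-learn reports negative MSE, so flip the sign."""
    scores = cross_val_score(make_model(degree), X, y, cv=cv, scoring="neg_mean_squared_error")
    return float(-scores.mean())


kfold = KFold(n_splits=5, shuffle=True, random_state=0)
results = {
    d: {
        "validation_set": validation_set_mse(d),
        "kfold": cv_mse(d, kfold),
        "loocv": cv_mse(d, LeaveOneOut()),
    }
    for d in range(1, 9)
}

# Choose the degree with the lowest 5-fold CV error.
best_degree = min(results, key=lambda d: results[d]["kfold"])
print(best_degree, results[best_degree])
```

Plotting each column of `results` against the degree gives the validation-score-versus-flexibility curve described above.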

Subset/feature selection#

(Required readings: chapter section 6.1 … 9 pages)

  • Why shouldn’t you have as many predictors in your model as possible?

  • What are the three basic methods that can be used to select the appropriate predictors (feature selection)?

  • What are the steps/procedures in the algorithm for each of these methods?

  • How do you carry out these procedures by hand, given the training and cross-validation errors for each combination of predictors?

  • What are the pros and cons of each of these feature selection methods? When can or can't you use them?

  • How do you implement the feature selection algorithm in Python?
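
One possible sketch of forward stepwise selection, under stated assumptions: at each step the predictor that lowers the training RSS the most is added, and the final model size is then chosen by 5-fold cross-validation. The data are synthetic, the null model is skipped for brevity, scikit-learn is assumed, and the helper names (`train_rss`, `cv_mse`, `forward_stepwise`) are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic data (made up): only predictors 0 and 3 affect the response.
rng = np.random.default_rng(1)
n, p = 200, 6
X = rng.normal(size=(n, p))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + rng.normal(scale=1.0, size=n)

cv = KFold(n_splits=5, shuffle=True, random_state=0)


def train_rss(columns):
    """Training residual sum of squares for a least squares fit on `columns`."""
    model = LinearRegression().fit(X[:, columns], y)
    return float(np.sum((y - model.predict(X[:, columns])) ** 2))


def cv_mse(columns):
    """5-fold cross-validated MSE for a least squares fit on `columns`."""
    scores = cross_val_score(LinearRegression(), X[:, columns], y, cv=cv,
                             scoring="neg_mean_squared_error")
    return float(-scores.mean())


def forward_stepwise():
    """Return [(selected columns, CV MSE)] for model sizes 1..p."""
    selected, remaining, path = [], list(range(p)), []
    while remaining:
        # Among the models that add one more predictor, keep the lowest training RSS.
        best = min(remaining, key=lambda j: train_rss(selected + [j]))
        selected = selected + [best]
        remaining.remove(best)
        path.append((list(selected), cv_mse(selected)))
    return path


path = forward_stepwise()
# Compare the models of different sizes by CV error, not training error.
best_subset, best_cv = min(path, key=lambda item: item[1])
print(best_subset, round(best_cv, 3))
```

Training RSS is only used to compare models of the same size; comparing across sizes needs an estimate of test error, which is why the last step switches to cross-validation.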

Regularization methods#

(Required readings: chapter section 6.2.1-6.2.2 … 12 pages)

  • What is regularization? Why do we need it?

  • What are the two basic types of regularization methods? How are they implemented mathematically in linear regression? Why are they also called Shrinkage methods?

  • How do you fit these two basic types of regularized linear regression models in Python?

  • How do you control the model flexibility & bias-variance tradeoff when using regularization?

  • How do you find the right amount of regularization using cross-validation? How do you do this in Python?

  • What additional precautions do you need to take when using regularization (compared to least squares)?

  • What are the advantages of regularization compared to Least Squares?

  • What are the advantages of regularization compared to subset selection?

  • What are the advantages of one type of shrinkage method over the other? When do you choose one over the other?
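
A hedged sketch of fitting the two shrinkage methods with cross-validated regularization strength, assuming scikit-learn is available. Note that scikit-learn calls the penalty weight `alpha` rather than \(\lambda\), and its lasso objective scales the RSS by \(1/(2n)\), so the numeric values are not directly comparable to the textbook's \(\lambda\). The data and variable names are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LassoCV, RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (made up): 10 predictors on very different scales, 3 of them relevant.
rng = np.random.default_rng(2)
n, p = 150, 10
Z = rng.normal(size=(n, p))
y = 4.0 * Z[:, 0] - 3.0 * Z[:, 1] + 2.0 * Z[:, 2] + rng.normal(size=n)
X = Z * rng.uniform(0.5, 20.0, size=p)

alphas = np.logspace(-3, 2, 30)  # candidate regularization strengths

# Standardize first: the penalty treats all coefficients alike, so predictors
# must be on a common scale for the shrinkage to be fair.
ridge = make_pipeline(StandardScaler(), RidgeCV(alphas=alphas)).fit(X, y)
lasso = make_pipeline(StandardScaler(), LassoCV(alphas=alphas, cv=5)).fit(X, y)

ridge_coef, ridge_alpha = ridge[-1].coef_, ridge[-1].alpha_
lasso_coef, lasso_alpha = lasso[-1].coef_, lasso[-1].alpha_

print("ridge alpha:", ridge_alpha, "lasso alpha:", lasso_alpha)
print("coefficients set exactly to zero by the lasso:", int(np.sum(lasso_coef == 0)))
```

Putting the scaler inside the pipeline means it is refit on each training fold, so the cross-validation scores are not contaminated by the held-out data.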

Dimensionality reduction and feature engineering#

(Required readings: chapter section 6.3 … 9 pages)

  • How do you create new variables as linear combinations of the original predictors?

  • Why do we need Principal Component Analysis (PCA)? What is the main purpose of using it?

  • What is a principal component (PC)?

    • What does the first PC maximize? You should be able to explain this both geometrically in a plot and mathematically.

    • Similarly, what do the subsequent PCs maximize?

  • How do you compute the PCs in Python, given a dataset?

    • How do you project data points on each PC? You should also be able to plot the data points in the PC space.

    • How do you find out how much variance each PC explains?

  • Why do you want to use the PCs in regression models?

    • What assumptions do you have to make for it to be a good idea to use principal component regression (PCR)?

    • Or conversely, what is a typical bad scenario to use PCR?

  • How do you implement PCR in Python?

  • How do you interpret the model coefficients when using PCR?

  • How do you choose the number of PCs to use in PCR?

    • Given a figure of, e.g., cross-validation score as a function of the number of PCs, you should be able to choose appropriately and provide rationales in terms of bias-variance tradeoff.

    • You should be able to generate such figures in Python.

  • What is the relationship/differences between PCR and feature selection and regularization methods that you learned in this part of the course?
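
The PCA and PCR questions above could be explored with a sketch like the following. It assumes scikit-learn, uses synthetic data in which two hidden directions drive five correlated predictors, and the helper name `pcr_cv_mse` is hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (made up): two latent directions generate five predictors.
rng = np.random.default_rng(3)
n = 200
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, 5)) + 0.1 * rng.normal(size=(n, 5))
y = 2.0 * latent[:, 0] - 1.0 * latent[:, 1] + 0.5 * rng.normal(size=n)

# PCA on standardized predictors.
X_std = StandardScaler().fit_transform(X)
pca = PCA().fit(X_std)
scores = pca.transform(X_std)              # data projected onto each PC
explained = pca.explained_variance_ratio_  # share of variance per PC
print("variance explained:", np.round(explained, 3))

# PCR: regress y on the first m PC scores; choose m by cross-validation.
cv = KFold(n_splits=5, shuffle=True, random_state=0)


def pcr_cv_mse(m):
    model = make_pipeline(StandardScaler(), PCA(n_components=m), LinearRegression())
    s = cross_val_score(model, X, y, cv=cv, scoring="neg_mean_squared_error")
    return float(-s.mean())


cv_curve = {m: pcr_cv_mse(m) for m in range(1, 6)}
best_m = min(cv_curve, key=cv_curve.get)
print("CV MSE by number of PCs:", cv_curve, "best:", best_m)
```

The columns of `scores` are uncorrelated with each other, and a scatter plot of the first two columns shows the data in PC space. Plotting `cv_curve` against `m` gives the figure used to pick the number of components.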


Part 3#

Polynomial functions and step functions#

(Required readings: chapter section 7.1-7.2 … 4 pages)

  • Why do people want to use polynomial regression rather than simple linear regression?

  • How to fit polynomial regression models?

    • You should be able to describe the procedures verbally and implement them in Python

  • Why do people want to use step functions rather than simple linear regression?

  • How to fit step function models?

    • You should be able to describe the procedures for both regression and classification problems.

    • You should also be able to implement them in Python.

  • What is common between polynomial regression and step functions?

  • What is an advantage of step functions over polynomials, despite their simplicity?

    • You should be able to distinguish between models with equations that apply to the whole domain vs those that don’t.
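
Both models can be fitted by ordinary least squares on a transformed design matrix. The NumPy sketch below illustrates this on synthetic data; the cutpoints and degree are arbitrary choices. For simplicity the step-function design uses one indicator per bin and no intercept, which gives the same fitted values as an intercept plus indicators for all bins but one.

```python
import numpy as np

# Synthetic data (made up for illustration).
rng = np.random.default_rng(4)
x = rng.uniform(0, 10, size=200)
y = np.sin(x) + 0.2 * rng.normal(size=200)


def poly_design(x, degree):
    """Columns 1, x, x^2, ..., x^degree: a single equation for the whole domain."""
    return np.column_stack([x ** k for k in range(degree + 1)])


degree = 4
beta_poly, *_ = np.linalg.lstsq(poly_design(x, degree), y, rcond=None)
poly_pred = poly_design(x, degree) @ beta_poly

# Step functions: cut the range of x into bins, one indicator column per bin.
cutpoints = np.array([2.5, 5.0, 7.5])
bins = np.digitize(x, cutpoints)  # bin index 0..3 for each observation
step_design = (bins[:, None] == np.arange(len(cutpoints) + 1)).astype(float)
beta_step, *_ = np.linalg.lstsq(step_design, y, rcond=None)
step_pred = step_design @ beta_step

# Each step coefficient is simply the mean response inside its bin.
print(np.round(beta_step, 3))
```

The step fit only uses observations inside each bin, whereas every polynomial coefficient is influenced by all observations across the domain.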

Basis Functions and Splines#

(Required readings: chapter section 7.3-7.4 … 7 pages)

  • What are basis functions?

    • You should be able to articulate what the basis functions \(b_i\) are for the different models that we have covered.

    • Examples include polynomial, step functions, and cubic splines.

  • What is the purpose of using basis functions?

    • How do they influence model flexibility?

    • How does this purpose manifest in different models, such as regression splines?

  • How to define cubic splines mathematically in terms of piecewise polynomials?

    • You should be able to write down the equations for the model between knots

    • the equations for constraints at the knots

    • and the total degrees of freedom

  • How to calculate the cubic spline coefficients by hand by applying the above mathematical definitions?

  • How to define cubic splines in terms of basis functions?

    • You should be able to describe it mathematically in terms of truncated power basis functions (textbook version).

    • You should be able to visualize them as B-spline basis functions in Python.

  • How do you fit a cubic spline model to data in Python?

  • How do you change the model flexibility?

    • What are the relevant metaparameters?

    • How do you choose the appropriate flexibility?

  • What precautions must you take at the outer boundary? Why?
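
A minimal NumPy sketch of a cubic regression spline built from the truncated power basis: the columns are \(1, x, x^2, x^3\) plus one term \((x-\xi_k)^3_+\) per knot, giving \(K+4\) degrees of freedom for \(K\) knots. The data, knot locations, and the function name `truncated_power_basis` are assumptions made for illustration. B-spline bases span the same space of functions but are better conditioned numerically.

```python
import numpy as np


def truncated_power_basis(x, knots):
    """Design matrix with columns 1, x, x^2, x^3 and (x - knot)^3_+ for each knot."""
    x = np.asarray(x, dtype=float)
    cols = [np.ones_like(x), x, x ** 2, x ** 3]
    # (x - knot)^3_+ is zero to the left of the knot and cubic to the right.
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)


# Synthetic data (made up for illustration).
rng = np.random.default_rng(5)
x = np.sort(rng.uniform(0, 10, size=300))
y = np.sin(x) + 0.2 * rng.normal(size=300)

knots = [2.5, 5.0, 7.5]  # more knots means a more flexible fit
B = truncated_power_basis(x, knots)
beta, *_ = np.linalg.lstsq(B, y, rcond=None)

grid = np.linspace(0, 10, 201)
fit = truncated_power_basis(grid, knots) @ beta
dof = B.shape[1]  # K + 4 for a cubic spline with K knots
print("degrees of freedom:", dof)
```

Because each truncated term and its first two derivatives vanish at its knot, the fitted curve automatically satisfies the continuity constraints there.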

Decision Trees and Random Forests#

(Required readings: chapter section 8.1-8.2.2 … 16 pages)

  • How does a decision tree make decisions?

    • Given a tree and a data point with multiple predictors \((x_1, x_2,\cdots, x_p)\), you should be able to derive what the predicted outcome \(\hat{y}\) should be.

    • You should be able to do this for both regression and classification decision trees.

  • What are different parts of a decision tree called? What do they represent?

    • You should be able to point out what are: leaves (terminal nodes), internal nodes, branches/edges.

    • Given a simple example (e.g., two predictors), you should be able to point out which region in the predictor space maps to which leaf.

  • How do you fit a decision tree?

    • What is the cost function for regression and classification trees, respectively?

    • How does recursive binary splitting work?

    • How does it end?

    • How do you do it in Python? How do you visualize the tree in Python?

  • Which meta-parameters of decision trees determine their flexibility?

  • Why do you need to prune the tree?

  • What are the advantages and disadvantages of using decision trees vs. linear regression?

  • What does bagging of decision trees accomplish?

  • How do you use out-of-bag error estimation for decision trees?

    • You should be able to describe this for both regression and classification trees.

  • What problem of bagging does using random forest address?

  • What is the relationship between random forest and bagging?
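
The following sketch, assuming scikit-learn, fits a small regression tree, prints its splits as text, and compares bagging with a random forest using out-of-bag scores. The data and feature names are made up. In scikit-learn, bagging can be obtained from `RandomForestRegressor` by letting every split consider all predictors (`max_features=None`).

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import DecisionTreeRegressor, export_text

# Synthetic data (made up): the response jumps with x1 and, more weakly, with x2.
rng = np.random.default_rng(6)
X = rng.uniform(0, 1, size=(300, 4))
y = np.where(X[:, 0] < 0.5, 1.0, 5.0) + 2.0 * (X[:, 1] > 0.7) + 0.3 * rng.normal(size=300)

# A shallow tree: max_depth limits flexibility (at most 4 leaves here).
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["x1", "x2", "x3", "x4"]))

# Bagging: bootstrap samples, all 4 predictors considered at every split.
bagging = RandomForestRegressor(n_estimators=200, max_features=None,
                                oob_score=True, random_state=0).fit(X, y)
# Random forest: same idea, but only 2 randomly chosen predictors per split.
forest = RandomForestRegressor(n_estimators=200, max_features=2,
                               oob_score=True, random_state=0).fit(X, y)

# For regressors, oob_score_ is the R^2 on observations left out of each bootstrap sample.
oob_r2 = {"bagging": bagging.oob_score_, "random_forest": forest.oob_score_}
print(oob_r2)
```

For a graphical view of the tree, `sklearn.tree.plot_tree` draws the same structure with matplotlib.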

Maximal Margin Classifiers and Support Vector Classifier/Machines#

(Required readings: chapter section 9.1-9.4 … 17 pages)

  • What is a hyperplane?

    • How do you mathematically describe this hyperplane?

    • How do you mathematically describe the two sides of the hyperplane?

    • Given the equation of the hyperplane and the coordinates for a point, you should be able to tell which side of the plane the point is on.

  • What qualifies a hyperplane as a separating hyperplane?

    • You should be able to describe this mathematically using an inequality.

    • You should also be able to determine whether a hyperplane is a separating hyperplane given a graph, or given the equation of the plane and the coordinates and class of a few points.

  • How to use a separating hyperplane as a classifier?

  • What makes a hyperplane a maximal margin hyperplane?

    • What are its margin and support vectors?

    • Given a graph, you should be able to clearly label the margin and support vectors. You should also be able to infer the size of the margin from reading the graph.

    • You should also be able to describe the optimization problem mathematically.

    • Given the equation of the maximal margin hyperplane and the coordinates of support vectors, you should be able to calculate the size of the margin by hand.

  • What is the difference between Support Vector Classifier and Maximal Margin Classifier?

    • What shortcomings of Maximal Margin Classifier does Support Vector Classifier overcome?

    • How?

    • You should be able to describe the difference verbally, graphically, and mathematically.

  • How to interpret each element of the mathematical formulation of Support Vector Machine graphically?

    • Given a graph of Support Vector Classifier and some data points, you should be able to derive the values of different parameters in the mathematical formulation (e.g. \(M\), \(\epsilon_i\)).

  • What parameters control the flexibility (hence bias-variance trade-off) of Support Vector Classifier?

  • What are the support vectors of Support Vector Classifier?

  • How to use Support Vector Classifier in Python?

    • You should be able to obtain and interpret the fitted parameters.

    • You should be able to select the appropriate hyperparameter that optimizes the bias-variance trade-off.

  • How is nonlinearity introduced in Support Vector Machine?

    • You should be able to describe how kernels work conceptually and mathematically.

  • What type of kernel makes Support Vector Machine equivalent to Support Vector Classifier?

  • What other kernels are there?

  • What type of kernel leads to more local behavior?

  • How to use Support Vector Machine in Python?

    • You should be able to fit a Support Vector Machine using different kernels covered in class.

    • You should be able to select the appropriate hyperparameter that optimizes the bias-variance trade-off.
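
A small sketch tying several of these questions together, with made-up points and a made-up hyperplane. It checks which side of a hyperplane each point falls on, tests the separating condition \(y_i(\beta_0 + \beta^T x_i) > 0\), and then fits linear and radial-kernel classifiers with scikit-learn's `SVC` (an assumed library choice). Note that scikit-learn's `C` penalizes margin violations, so it works in the opposite direction to a budget-style \(C\): a large value allows few violations.

```python
import numpy as np
from sklearn.svm import SVC

# Six made-up points in two classes.
X = np.array([[3.0, 3.0], [3.0, 1.0], [2.0, 0.0],
              [0.0, 0.0], [-1.0, 1.0], [-1.0, -1.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# A candidate hyperplane -1 + x1 + x2 = 0 (chosen arbitrarily for illustration).
beta0, beta = -1.0, np.array([1.0, 1.0])
values = beta0 + X @ beta                 # f(x) for each point
sides = np.sign(values)                   # +1 / -1: which side of the hyperplane
is_separating = bool(np.all(y * values > 0))
distances = np.abs(values) / np.linalg.norm(beta)
smallest_distance = float(distances.min())

# A linear SVC with a large penalty approximates the maximal margin classifier.
linear_svc = SVC(kernel="linear", C=1000.0).fit(X, y)
w = linear_svc.coef_[0]
b = float(linear_svc.intercept_[0])
margin = 1.0 / np.linalg.norm(w)          # distance from the hyperplane to the margin
support_idx = sorted(linear_svc.support_.tolist())
print("w:", w, "b:", b, "margin:", margin, "support vectors:", X[support_idx])

# XOR-style data: not linearly separable, but a radial kernel handles it.
X_xor = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y_xor = np.array([1, 1, -1, -1])
linear_acc = SVC(kernel="linear", C=10.0).fit(X_xor, y_xor).score(X_xor, y_xor)
rbf_acc = SVC(kernel="rbf", gamma=2.0, C=10.0).fit(X_xor, y_xor).score(X_xor, y_xor)
print("training accuracy, linear vs. radial kernel:", linear_acc, rbf_acc)
```

The arbitrary hyperplane separates the classes but keeps a smaller distance to the nearest point than the fitted one, which is what distinguishes a separating hyperplane from the maximal margin hyperplane.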

Neural Networks#

(Required readings: chapter section 10.1-10.3 … 14 pages)

  • What is the architecture of a 1-layer feed-forward neural network?

    • You should be able to describe what happens at each layer (e.g., input layer, hidden layer, output layer) conceptually and mathematically.

  • What is activation? What types of activation functions are there?

    • You should be able to explain it mathematically.

    • Given example inputs and weights from the previous matrix, you should be able to calculate the activation \(A\) by hand (with the help of a calculator).

  • How does a 1-layer feed-forward neural network produce an output?

    • Given a \(\beta\) matrix and activation from the previous layer, you should be able to calculate the output by hand.

  • What do the fitted parameters minimize in a 1-layer feed-forward neural network?

  • What is the architecture of multilayer feed-forward neural networks?

    • You should be able to describe what happens at each layer mathematically for a typical classification problem.

    • You should be able to calculate class probability for the output layer using softmax.

  • How to classify images using multilayer neural networks in Python?

  • What is the architecture of a convolutional neural network?

    • You should be able to describe the types of layers, how they are arranged, and the purpose of each of them.

  • How does convolution work?

    • You should be able to compute the result of the convolution between a simple matrix and a simple filter by hand.

  • What information does Convolutional Neural Network capture through convolution?
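
The by-hand calculations above can be checked with a few lines of NumPy. This is a sketch with made-up weights and inputs. It assumes a ReLU activation and the deep-learning convention for convolution (slide the filter over the image, multiply elementwise, and sum, without flipping the filter). All variable names are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    e = np.exp(z - np.max(z))  # subtract the max for numerical stability
    return e / e.sum()


# Hidden layer: A_k = g(w_k0 + sum_j w_kj * x_j), with 2 inputs and 3 hidden units.
x = np.array([1.0, 2.0])
W = np.array([[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]])
w0 = np.array([0.0, -1.0, 0.5])
A = relu(w0 + W @ x)                    # pre-activations [-1, 0.5, 1.5] -> [0, 0.5, 1.5]

# Regression output: f(x) = beta_0 + sum_k beta_k * A_k.
beta0, beta = 1.0, np.array([2.0, -1.0, 1.0])
output = beta0 + beta @ A               # 1 + 0 - 0.5 + 1.5 = 2.0

# Classification output: one linear score per class, then softmax.
B = np.array([[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]])
class_scores = B @ A                    # [1.5, 1.0]
probs = softmax(class_scores)


def convolve2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding, no flipping) and sum the products."""
    kh, kw = kernel.shape
    out = np.empty((image.shape[0] - kh + 1, image.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


image = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
kernel = np.array([[1.0, 0.0], [0.0, -1.0]])
conv = convolve2d_valid(image, kernel)  # every entry is -4 for this image and filter
print(A, output, probs, conv, sep="\n")
```

The convolved image is constant here because the filter takes the difference along the diagonal, and this particular image changes by the same amount along every diagonal step.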

Unsupervised learning / Clustering#

(Required readings: chapter section 12.1, 12.4 … 16 pages)

  • What is the difference between supervised vs unsupervised learning?

  • What do clustering methods aim to accomplish?

  • How to interpret a dendrogram of hierarchical clustering?

  • How are different linkage methods defined?

  • How to perform hierarchical clustering in Python?
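
One way to perform hierarchical clustering is with SciPy's `scipy.cluster.hierarchy` module, sketched below on two synthetic, well-separated groups of points. The library choice, the data, and the variable names are assumptions for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import dendrogram, fcluster, linkage

# Synthetic data (made up): two groups of 10 points centered at (0, 0) and (5, 5).
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 0.5, size=(10, 2)),
               rng.normal(5, 0.5, size=(10, 2))])

# Build the merge tree under three linkage definitions.
trees = {m: linkage(X, method=m, metric="euclidean")
         for m in ("complete", "single", "average")}

# Each row of a linkage matrix records one merge:
# [cluster a, cluster b, merge height (dissimilarity), size of the new cluster].
merges = trees["complete"]
print(merges[-3:])

# Cut the tree so that at most 2 clusters remain.
labels = fcluster(merges, t=2, criterion="maxclust")

# no_plot=True returns the dendrogram layout without drawing it;
# drop the argument (with matplotlib installed) to see the figure.
info = dendrogram(merges, no_plot=True)
print(labels)
```

The height of each merge in the dendrogram is the dissimilarity between the two clusters being fused, so the tall final merge here reflects the gap between the two groups.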