{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Pre-Class Assignment: Regression\n", "# Day 13\n", "# CMSE 202" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "###
✅ Put your name here
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Goals for Preclass\n", "\n", "After this pre-class assignment you should be able to:\n", "\n", "1. Generate a variety of randomized data\n", "2. Construct a 1-dimensional Linear Regression fit to these data\n", "3. Explain how to judge the quality of a 1-dimensional Linear Regression fit\n", "\n", "**This assignment is due by 11:59 p.m. the day before class,** and should be uploaded into the appropriate submission folder on D2L. Submission instructions can be found at the end of the notebook." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Imports for the notebook\n", "\n", "Make sure you execute the following cell to get all of the imports you will need for this notebook.\n", "\n", "✅ **Review all of these imports. Are any of them unfamiliar to you?** If so, look up the unfamiliar module(s) to learn a bit about what they do." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import matplotlib.pyplot as plt\n", "import pandas as pd\n", "import numpy as np\n", "import random\n", "import statsmodels.api as sm\n", "from IPython.display import HTML" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "✎ Do This - Erase the contents of this cell and replace it with notes on any of the imports that were unfamiliar to you and what they appear to be for. (Double-click on this text to edit this cell, and hit shift+enter to save the text.)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "## 1. Regression\n", "\n", "In this pre-class assignment, we're going to revisit a concept that you should have explored a bit in CMSE 201: Regression.\n", "\n", "The term \"Regression\" actually represents an entire class of algorithms that are used to **model** data. A regression model provides a way to visualize and predict how a **predictor variable**, usually shown on the x-axis, relates to a **dependent variable**, usually shown on the y-axis. 
It is often the first place to start when trying to build a model of data.\n", "\n", "Of the family of regression algorithms, the one most often taught first is called Ordinary Least Squares, often shortened to OLS. When regression was first discussed in CMSE 201, you should also have learned a bit about the OLS approach to regression.\n", "\n", "**Let's revisit and review OLS.**\n", "\n", "A regression is used to estimate predictor parameters from data. In simple linear regression the x and y data are assumed to be linearly related. That is, there exists an equation:\n", "\n", "$$ y = Ax + B $$\n", "\n", "and we must discover the values of $A$, often called the **slope**, and $B$, often called the **intercept**, that are in some sense optimal. There are many forms of optimization, but we will focus on OLS optimization." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 1.1 Ordinary Least Squares\n", "\n", "The OLS approach to optimizing the parameters above is to \"minimize the residuals.\" The picture below demonstrates the concept of a residual:\n", "\n", "