To successfully complete this assignment, follow all of the instructions in this notebook and upload your edited .ipynb file to D2L with your answers on or before 11:59pm on Friday, September 25th.

BIG HINT: Read the entire homework before starting.

Homework 1: Data as Vectors

In this homework, we will download and explore some widely available datasets using the Python Scikit-learn module. We want you to start thinking of data samples as "feature vectors". Each sample in a dataset is composed of $n$ measurements. The individual measurements in a vector do not necessarily have to relate to one another, but each measurement $v_i \in V$ corresponds to a "similar" measurement $u_i \in U$ in another sample.
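For example (a toy illustration with made-up numbers, not data from the assignment), two wine samples might each be stored as a vector of the same three measurements, so that position $i$ in one vector is directly comparable to position $i$ in the other:

import numpy as np

# Two hypothetical wine samples, each measured the same way:
# [alcohol (%), malic acid (g/l), color intensity]
u = np.array([13.2, 1.78, 4.38])
v = np.array([14.2, 1.76, 6.75])

# u[0] and v[0] are "similar" measurements (both alcohol), and so on,
# which is what makes element-wise operations like this meaningful:
print(u - v)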

Simple clipart picture of some wine bottles. In this assignment we will download a dataset related to wine, so this image is intended as a visual anchor.

image from : pixabay.com


1. Download and Explore the data

The "Wine" is provided with the Sklearn library and is an easy dataset use as an example.

**DO THIS:** Run the following code to download the wine dataset.

%matplotlib inline
import matplotlib.pylab as plt
import numpy as np
import sklearn.datasets as sdata

sk_data = sdata.load_wine()

Let's inspect the sk_data class by using the dir command:

dir(sk_data)
['DESCR', 'data', 'feature_names', 'target', 'target_names']

The DESCR object looks interesting. Let's print it out and see what is going on...

print(sk_data.DESCR)
.. _wine_dataset:

Wine recognition dataset
------------------------

**Data Set Characteristics:**

    :Number of Instances: 178 (50 in each of three classes)
    :Number of Attributes: 13 numeric, predictive attributes and the class
    :Attribute Information:
		- Alcohol
		- Malic acid
		- Ash
		- Alcalinity of ash
		- Magnesium
		- Total phenols
		- Flavanoids
		- Nonflavanoid phenols
		- Proanthocyanins
		- Color intensity
		- Hue
		- OD280/OD315 of diluted wines
		- Proline

    - class:
            - class_0
            - class_1
            - class_2
    :Summary Statistics:
    
    ============================= ==== ===== ======= =====
                                   Min   Max   Mean     SD
    ============================= ==== ===== ======= =====
    Alcohol:                      11.0  14.8    13.0   0.8
    Malic Acid:                   0.74  5.80    2.34  1.12
    Ash:                          1.36  3.23    2.36  0.27
    Alcalinity of Ash:            10.6  30.0    19.5   3.3
    Magnesium:                    70.0 162.0    99.7  14.3
    Total Phenols:                0.98  3.88    2.29  0.63
    Flavanoids:                   0.34  5.08    2.03  1.00
    Nonflavanoid Phenols:         0.13  0.66    0.36  0.12
    Proanthocyanins:              0.41  3.58    1.59  0.57
    Colour Intensity:              1.3  13.0     5.1   2.3
    Hue:                          0.48  1.71    0.96  0.23
    OD280/OD315 of diluted wines: 1.27  4.00    2.61  0.71
    Proline:                       278  1680     746   315
    ============================= ==== ===== ======= =====

    :Missing Attribute Values: None
    :Class Distribution: class_0 (59), class_1 (71), class_2 (48)
    :Creator: R.A. Fisher
    :Donor: Michael Marshall (MARSHALL%PLU@io.arc.nasa.gov)
    :Date: July, 1988

This is a copy of UCI ML Wine recognition datasets.
https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data

The data is the results of a chemical analysis of wines grown in the same
region in Italy by three different cultivators. There are thirteen different
measurements taken for different constituents found in the three types of
wine.

Original Owners: 

Forina, M. et al, PARVUS - 
An Extendible Package for Data Exploration, Classification and Correlation. 
Institute of Pharmaceutical and Food Analysis and Technologies,
Via Brigata Salerno, 16147 Genoa, Italy.

Citation:

Lichman, M. (2013). UCI Machine Learning Repository
[http://archive.ics.uci.edu/ml]. Irvine, CA: University of California,
School of Information and Computer Science. 

.. topic:: References

  (1) S. Aeberhard, D. Coomans and O. de Vel, 
  Comparison of Classifiers in High Dimensional Settings, 
  Tech. Rep. no. 92-02, (1992), Dept. of Computer Science and Dept. of  
  Mathematics and Statistics, James Cook University of North Queensland. 
  (Also submitted to Technometrics). 

  The data was used with many others for comparing various 
  classifiers. The classes are separable, though only RDA 
  has achieved 100% correct classification. 
  (RDA : 100%, QDA 99.4%, LDA 98.9%, 1NN 96.1% (z-transformed data)) 
  (All results using the leave-one-out technique) 

  (2) S. Aeberhard, D. Coomans and O. de Vel, 
  "THE CLASSIFICATION PERFORMANCE OF RDA" 
  Tech. Rep. no. 92-01, (1992), Dept. of Computer Science and Dept. of 
  Mathematics and Statistics, James Cook University of North Queensland. 
  (Also submitted to Journal of Chemometrics).

**DO THIS:** How many features are there in this dataset? (You could count them manually, but it is highly recommended that you write code to find the right number. We may be using similar datasets in the future, and it is generally better to write code that is portable for when things change.) Store the number of features in the feature vector as the Python variable N so that you can check your answer below:

#put your answer to the above question here.
from answercheck import checkanswer

checkanswer(N,'c51ce410c124a10e0db5e4b97fc2af39');
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-5-9fe5336775ac> in <module>
      1 from answercheck import checkanswer
      2 
----> 3 checkanswer(N,'c51ce410c124a10e0db5e4b97fc2af39');

NameError: name 'N' is not defined
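If you get stuck, here is a minimal sketch of one possible approach (assuming sk_data has been loaded as above): the number of features is just the number of columns in the data matrix.

# Each row of sk_data.data is a sample and each column is a feature
N = sk_data.data.shape[1]
# Equivalently: N = len(sk_data.feature_names)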

**DO THIS:** In this dataset, how many different wines were tested using these $n$ features? Again, write code to calculate the answer instead of just "hard-coding" the number. Store the size in a variable named M.

#put your answer to the above question here.
from answercheck import checkanswer

checkanswer(M,'8f85517967795eeef66c225f7883bdcb');
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-7-c7675f24a4a7> in <module>
      1 from answercheck import checkanswer
      2 
----> 3 checkanswer(M,'8f85517967795eeef66c225f7883bdcb');

NameError: name 'M' is not defined
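As before, a minimal sketch of one possible approach (assuming sk_data is loaded): the number of wines is the number of rows in the data matrix.

# Each row of sk_data.data is one wine sample
M = sk_data.data.shape[0]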

The following figure graphs each feature for the entire dataset.

plt.figure(figsize=(20,10))
plt.plot(sk_data.data);
plt.legend(sk_data.feature_names)
<matplotlib.legend.Legend at 0x7f8fc6ac0ef0>

Another way to look at this dataset is as a large 2D array (or matrix) which can be viewed as an image using the imshow function. In this case we choose the "Reds" colormap to keep with the wine theme.

plt.figure(figsize=(20,2))
plt.imshow(sk_data.data.T,cmap="Reds")
plt.colorbar()
<matplotlib.colorbar.Colorbar at 0x7f8fc5a2cb00>

The pandas library can be helpful for viewing data. Here we will just show the basics of turning the raw data stored as a numpy array into a pandas.DataFrame object. Each row is a particular wine and the columns are the individual feature measurements.

import pandas

df = pandas.DataFrame(sk_data.data, columns = sk_data.feature_names)
df
alcohol malic_acid ash alcalinity_of_ash magnesium total_phenols flavanoids nonflavanoid_phenols proanthocyanins color_intensity hue od280/od315_of_diluted_wines proline
0 14.23 1.71 2.43 15.6 127.0 2.80 3.06 0.28 2.29 5.640000 1.04 3.92 1065.0
1 13.20 1.78 2.14 11.2 100.0 2.65 2.76 0.26 1.28 4.380000 1.05 3.40 1050.0
2 13.16 2.36 2.67 18.6 101.0 2.80 3.24 0.30 2.81 5.680000 1.03 3.17 1185.0
3 14.37 1.95 2.50 16.8 113.0 3.85 3.49 0.24 2.18 7.800000 0.86 3.45 1480.0
4 13.24 2.59 2.87 21.0 118.0 2.80 2.69 0.39 1.82 4.320000 1.04 2.93 735.0
5 14.20 1.76 2.45 15.2 112.0 3.27 3.39 0.34 1.97 6.750000 1.05 2.85 1450.0
6 14.39 1.87 2.45 14.6 96.0 2.50 2.52 0.30 1.98 5.250000 1.02 3.58 1290.0
7 14.06 2.15 2.61 17.6 121.0 2.60 2.51 0.31 1.25 5.050000 1.06 3.58 1295.0
8 14.83 1.64 2.17 14.0 97.0 2.80 2.98 0.29 1.98 5.200000 1.08 2.85 1045.0
9 13.86 1.35 2.27 16.0 98.0 2.98 3.15 0.22 1.85 7.220000 1.01 3.55 1045.0
10 14.10 2.16 2.30 18.0 105.0 2.95 3.32 0.22 2.38 5.750000 1.25 3.17 1510.0
11 14.12 1.48 2.32 16.8 95.0 2.20 2.43 0.26 1.57 5.000000 1.17 2.82 1280.0
12 13.75 1.73 2.41 16.0 89.0 2.60 2.76 0.29 1.81 5.600000 1.15 2.90 1320.0
13 14.75 1.73 2.39 11.4 91.0 3.10 3.69 0.43 2.81 5.400000 1.25 2.73 1150.0
14 14.38 1.87 2.38 12.0 102.0 3.30 3.64 0.29 2.96 7.500000 1.20 3.00 1547.0
15 13.63 1.81 2.70 17.2 112.0 2.85 2.91 0.30 1.46 7.300000 1.28 2.88 1310.0
16 14.30 1.92 2.72 20.0 120.0 2.80 3.14 0.33 1.97 6.200000 1.07 2.65 1280.0
17 13.83 1.57 2.62 20.0 115.0 2.95 3.40 0.40 1.72 6.600000 1.13 2.57 1130.0
18 14.19 1.59 2.48 16.5 108.0 3.30 3.93 0.32 1.86 8.700000 1.23 2.82 1680.0
19 13.64 3.10 2.56 15.2 116.0 2.70 3.03 0.17 1.66 5.100000 0.96 3.36 845.0
20 14.06 1.63 2.28 16.0 126.0 3.00 3.17 0.24 2.10 5.650000 1.09 3.71 780.0
21 12.93 3.80 2.65 18.6 102.0 2.41 2.41 0.25 1.98 4.500000 1.03 3.52 770.0
22 13.71 1.86 2.36 16.6 101.0 2.61 2.88 0.27 1.69 3.800000 1.11 4.00 1035.0
23 12.85 1.60 2.52 17.8 95.0 2.48 2.37 0.26 1.46 3.930000 1.09 3.63 1015.0
24 13.50 1.81 2.61 20.0 96.0 2.53 2.61 0.28 1.66 3.520000 1.12 3.82 845.0
25 13.05 2.05 3.22 25.0 124.0 2.63 2.68 0.47 1.92 3.580000 1.13 3.20 830.0
26 13.39 1.77 2.62 16.1 93.0 2.85 2.94 0.34 1.45 4.800000 0.92 3.22 1195.0
27 13.30 1.72 2.14 17.0 94.0 2.40 2.19 0.27 1.35 3.950000 1.02 2.77 1285.0
28 13.87 1.90 2.80 19.4 107.0 2.95 2.97 0.37 1.76 4.500000 1.25 3.40 915.0
29 14.02 1.68 2.21 16.0 96.0 2.65 2.33 0.26 1.98 4.700000 1.04 3.59 1035.0
... ... ... ... ... ... ... ... ... ... ... ... ... ...
148 13.32 3.24 2.38 21.5 92.0 1.93 0.76 0.45 1.25 8.420000 0.55 1.62 650.0
149 13.08 3.90 2.36 21.5 113.0 1.41 1.39 0.34 1.14 9.400000 0.57 1.33 550.0
150 13.50 3.12 2.62 24.0 123.0 1.40 1.57 0.22 1.25 8.600000 0.59 1.30 500.0
151 12.79 2.67 2.48 22.0 112.0 1.48 1.36 0.24 1.26 10.800000 0.48 1.47 480.0
152 13.11 1.90 2.75 25.5 116.0 2.20 1.28 0.26 1.56 7.100000 0.61 1.33 425.0
153 13.23 3.30 2.28 18.5 98.0 1.80 0.83 0.61 1.87 10.520000 0.56 1.51 675.0
154 12.58 1.29 2.10 20.0 103.0 1.48 0.58 0.53 1.40 7.600000 0.58 1.55 640.0
155 13.17 5.19 2.32 22.0 93.0 1.74 0.63 0.61 1.55 7.900000 0.60 1.48 725.0
156 13.84 4.12 2.38 19.5 89.0 1.80 0.83 0.48 1.56 9.010000 0.57 1.64 480.0
157 12.45 3.03 2.64 27.0 97.0 1.90 0.58 0.63 1.14 7.500000 0.67 1.73 880.0
158 14.34 1.68 2.70 25.0 98.0 2.80 1.31 0.53 2.70 13.000000 0.57 1.96 660.0
159 13.48 1.67 2.64 22.5 89.0 2.60 1.10 0.52 2.29 11.750000 0.57 1.78 620.0
160 12.36 3.83 2.38 21.0 88.0 2.30 0.92 0.50 1.04 7.650000 0.56 1.58 520.0
161 13.69 3.26 2.54 20.0 107.0 1.83 0.56 0.50 0.80 5.880000 0.96 1.82 680.0
162 12.85 3.27 2.58 22.0 106.0 1.65 0.60 0.60 0.96 5.580000 0.87 2.11 570.0
163 12.96 3.45 2.35 18.5 106.0 1.39 0.70 0.40 0.94 5.280000 0.68 1.75 675.0
164 13.78 2.76 2.30 22.0 90.0 1.35 0.68 0.41 1.03 9.580000 0.70 1.68 615.0
165 13.73 4.36 2.26 22.5 88.0 1.28 0.47 0.52 1.15 6.620000 0.78 1.75 520.0
166 13.45 3.70 2.60 23.0 111.0 1.70 0.92 0.43 1.46 10.680000 0.85 1.56 695.0
167 12.82 3.37 2.30 19.5 88.0 1.48 0.66 0.40 0.97 10.260000 0.72 1.75 685.0
168 13.58 2.58 2.69 24.5 105.0 1.55 0.84 0.39 1.54 8.660000 0.74 1.80 750.0
169 13.40 4.60 2.86 25.0 112.0 1.98 0.96 0.27 1.11 8.500000 0.67 1.92 630.0
170 12.20 3.03 2.32 19.0 96.0 1.25 0.49 0.40 0.73 5.500000 0.66 1.83 510.0
171 12.77 2.39 2.28 19.5 86.0 1.39 0.51 0.48 0.64 9.899999 0.57 1.63 470.0
172 14.16 2.51 2.48 20.0 91.0 1.68 0.70 0.44 1.24 9.700000 0.62 1.71 660.0
173 13.71 5.65 2.45 20.5 95.0 1.68 0.61 0.52 1.06 7.700000 0.64 1.74 740.0
174 13.40 3.91 2.48 23.0 102.0 1.80 0.75 0.43 1.41 7.300000 0.70 1.56 750.0
175 13.27 4.28 2.26 20.0 120.0 1.59 0.69 0.43 1.35 10.200000 0.59 1.56 835.0
176 13.17 2.59 2.37 20.0 120.0 1.65 0.68 0.53 1.46 9.300000 0.60 1.62 840.0
177 14.13 4.10 2.74 24.5 96.0 2.05 0.76 0.56 1.35 9.200000 0.61 1.60 560.0

178 rows × 13 columns

Another useful pandas function is describe, which gives some basic statistics for the measurements. Check that these statistics match up with the ones provided in the dataset DESCR.

df.describe()
alcohol malic_acid ash alcalinity_of_ash magnesium total_phenols flavanoids nonflavanoid_phenols proanthocyanins color_intensity hue od280/od315_of_diluted_wines proline
count 178.000000 178.000000 178.000000 178.000000 178.000000 178.000000 178.000000 178.000000 178.000000 178.000000 178.000000 178.000000 178.000000
mean 13.000618 2.336348 2.366517 19.494944 99.741573 2.295112 2.029270 0.361854 1.590899 5.058090 0.957449 2.611685 746.893258
std 0.811827 1.117146 0.274344 3.339564 14.282484 0.625851 0.998859 0.124453 0.572359 2.318286 0.228572 0.709990 314.907474
min 11.030000 0.740000 1.360000 10.600000 70.000000 0.980000 0.340000 0.130000 0.410000 1.280000 0.480000 1.270000 278.000000
25% 12.362500 1.602500 2.210000 17.200000 88.000000 1.742500 1.205000 0.270000 1.250000 3.220000 0.782500 1.937500 500.500000
50% 13.050000 1.865000 2.360000 19.500000 98.000000 2.355000 2.135000 0.340000 1.555000 4.690000 0.965000 2.780000 673.500000
75% 13.677500 3.082500 2.557500 21.500000 107.000000 2.800000 2.875000 0.437500 1.950000 6.200000 1.120000 3.170000 985.000000
max 14.830000 5.800000 3.230000 30.000000 162.000000 3.880000 5.080000 0.660000 3.580000 13.000000 1.710000 4.000000 1680.000000

2. Distance Measure

Now that you have a feel for the type of data available in the wine dataset, we need to build a measure to compare the wines. The following is a "stub" function. Modify it to return the Euclidean distance between two input feature vectors.

HINT: A good solution is one that does not "hard-code" properties of the wine dataset (such as the vector length); instead, a good function will calculate the distance between any two vectors in $R^n$ for any size of $n$. Recall that the Euclidean distance between two vectors $u, v \in R^n$ is $d(u,v) = \sqrt{\sum_{i=1}^{n}(u_i - v_i)^2}$.

def dist(u,v):
    # Modify this stub so that it returns the Euclidean distance between u and v
    d = 0
    return d

Let's test our function on a couple of simple examples for which we know the values:

dist([0,0],[0,1]) == 1
False
dist([0,0, 0],[1,0, 0]) == 1
False
dist([0,0],[3,4]) == 5
False
anyvec = [1,22,3,444,5.123,69,2229,42.0]
dist(anyvec,anyvec) == 0
True
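If you are unsure where to start, here is a minimal sketch of one way to write dist using numpy (one possible solution, not the only one); it works for vectors in $R^n$ for any $n$:

import numpy as np

def dist(u,v):
    # Convert the inputs to arrays so plain Python lists also work
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    # Euclidean distance: square root of the sum of squared differences
    return np.sqrt(np.sum((u - v)**2))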
from answercheck import checkanswer

checkanswer(dist(sk_data.data[0,:],sk_data.data[51,:]),'7a502b88ac326e0d79fe2cc8f33efd15');
Testing 0
Answer seems to be incorrect

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-17-251e579c82c3> in <module>
      1 from answercheck import checkanswer
      2 
----> 3 checkanswer(dist(sk_data.data[0,:],sk_data.data[51,:]),'7a502b88ac326e0d79fe2cc8f33efd15');

~/_CMSE314_F20/CMSE314/answercheck.py in __init__(self, var, hashtag)
     23 
     24     def __init__(self, var, hashtag=None):
---> 25         checkanswer.basic(var, hashtag)
     26 
     27     def basic(var, hashtag=None):

~/_CMSE314_F20/CMSE314/answercheck.py in basic(var, hashtag)
     48             else:
     49                 print("Answer seems to be incorrect\n")
---> 50                 assert checktag == hashtag, f"Answer is incorrect {checktag}"
     51         else:
     52             raise TypeError(f"No answer hastag provided: {checktag}")

AssertionError: Answer is incorrect cfcd208495d565ef66e7dff9f98764da

Assuming the distance measure above is working, we can calculate the distance between every pair of wines. Notice that this graph is symmetric because the distance between wine A and wine B is the same as the distance between wine B and wine A. Also notice that the diagonal of the matrix is always zero, i.e. the distance between wine A and itself is zero.

distmatrix = np.zeros((M,M))

def distance_matrix(A):
    # Compute the pairwise distance between every pair of rows (wines) in A
    for i in range(M):
        for j in range(M):
            distmatrix[i,j] = dist(A[i,:], A[j,:])
    # Display the resulting M x M distance matrix as an image
    plt.figure(figsize=(20,10))
    plt.imshow(distmatrix, cmap="Reds")    
    plt.colorbar();
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-18-8ba66a8843aa> in <module>
----> 1 distmatrix= np.zeros((M,M))
      2 
      3 def distance_matrix(A):
      4     for i in range(M):
      5         for j in range(M):

NameError: name 'M' is not defined
distance_matrix(sk_data.data)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-19-3e6d5f36ff3b> in <module>
----> 1 distance_matrix(sk_data.data)

NameError: name 'distance_matrix' is not defined

3. Introduction to Principal Component Analysis

Large datasets such as the wine dataset are considered "high dimensional" when the number of measurements in their feature vector gets large. What counts as "large" depends a little on what you are trying to do. If you are trying to visualize the data, anything bigger than 2 or 3 dimensions is large (it is hard for the human brain to visualize data in more than three dimensions).

Later in the semester you will learn how to use eigenvectors and eigenvalues to do Principal Component Analysis (PCA) of high dimensional data. PCA is probably the most common algorithm in a class of algorithms used for "dimensionality reduction". The purpose of these algorithms is to summarize complex, high dimensional data in a smaller set of dimensions that fit the problem you are trying to solve. In our case, we want to visualize the feature values of $N$ measurements using only 2 axes.

To learn more about PCA try checking out the PCA wikipedia page.

Before we get into all of the details about how to do the PCA math using eigenvalues/eigenvectors, we will just use a PCA function available in the sklearn library.

The following code imports the PCA function and reduces the wine data (sk_data.data) down to its two largest principal components. Think of these principal components as weighted sums of the original data specifically designed to retain as much information as possible.

#Reduce the data down to two principal components to make plotting easier.
from sklearn.decomposition import PCA
reduced_data = PCA(n_components=2).fit_transform(sk_data.data)

Now we plot the reduced dataset with different colors corresponding to the three wine classes, which are included with the data (sk_data.target):

#Strip out the three classes of data and plot
class0 = reduced_data[sk_data.target==0,:]
class1 = reduced_data[sk_data.target==1,:]
class2 = reduced_data[sk_data.target==2,:]

plt.scatter(class0[:,0],class0[:,1])
plt.scatter(class1[:,0],class1[:,1])
plt.scatter(class2[:,0],class2[:,1])

plt.xlabel('First principal component')
plt.ylabel('Second principal component')
Text(0, 0.5, 'Second principal component')

We can now see each of the sample wines and their "relative" relationship to each other. Unfortunately, the first two principal components do not have any units so it is hard to interpret their meaning. In the next section we will use "normalization" to clean up the data and make it easier to visualize.


4. Normalizing the data

One problem with the above PCA is that we treat each measurement in the feature vector as having the same units. This means some measurements get more "weight" in the PCA analysis just because they are bigger. One way to fix this problem is to "normalize" all the measurements to values between zero (0) and one (1). This normalization step allows us to better compare the measurements.

Let us assume that the above data is stored in a matrix $data$, with each row ($i \in M$) representing a wine and each column ($j \in N$) representing a feature. We want to "normalize" each measurement to a value between zero (0) and one (1) using the following equation:

For each wine and each feature ($j \in N$): $$A_{i,j} = \frac{data_{i,j} - min_j}{max_j-min_j}$$

where $min_j$ is the minimum value of the $j$th feature and $max_j$ is the maximum value of the $j$th feature.

**DO THIS:** Write a program to normalize all of the values in the sk_data.data dataset. Store the normalized values in a matrix $A$. HINT: Avoid writing lots of loops; libraries such as numpy, pandas, and scikit-learn all have functions that can turn a 20-line program into 3 lines of code.

#Put your answer to the above question here. 
from answercheck import checkanswer

checkanswer(A,'85608294aee283f63b58cfdc8da99a7c');
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-23-45d13c83fa0e> in <module>
      1 from answercheck import checkanswer
      2 
----> 3 checkanswer(A,'85608294aee283f63b58cfdc8da99a7c');

NameError: name 'A' is not defined
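One possible sketch (assuming sk_data is loaded; numpy's column-wise min and max avoid explicit loops):

# Column-wise minimum and maximum of each feature
mins = sk_data.data.min(axis=0)
maxs = sk_data.data.max(axis=0)

# Apply the normalization equation to every entry at once via broadcasting
A = (sk_data.data - mins) / (maxs - mins)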
plt.figure(figsize=(20,2))

plt.plot(A);
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-24-1c608064f02c> in <module>
      1 plt.figure(figsize=(20,2))
      2 
----> 3 plt.plot(A);

NameError: name 'A' is not defined
<Figure size 1440x144 with 0 Axes>
%matplotlib inline
import matplotlib.pylab as plt

plt.figure(figsize=(20,2))

plt.imshow(A.T, cmap='Reds');
plt.colorbar();
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-25-64a4179f5b4f> in <module>
      4 plt.figure(figsize=(20,2))
      5 
----> 6 plt.imshow(A.T, cmap='Reds');
      7 plt.colorbar();

NameError: name 'A' is not defined
<Figure size 1440x144 with 0 Axes>

**Do This:** Copy and paste the code from the PCA section above, replacing sk_data.data with the normalized matrix A.

# YOUR CODE HERE
raise NotImplementedError()
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-26-15b94d1fa268> in <module>
      1 # YOUR CODE HERE
----> 2 raise NotImplementedError()

NotImplementedError: 
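A sketch of what this might look like (assuming the normalized matrix $A$ from the previous step):

from sklearn.decomposition import PCA

#Reduce the normalized data down to two principal components
reduced_data = PCA(n_components=2).fit_transform(A)

#Strip out the three classes of data and plot
class0 = reduced_data[sk_data.target==0,:]
class1 = reduced_data[sk_data.target==1,:]
class2 = reduced_data[sk_data.target==2,:]

plt.scatter(class0[:,0],class0[:,1])
plt.scatter(class1[:,0],class1[:,1])
plt.scatter(class2[:,0],class2[:,1])

plt.xlabel('First principal component')
plt.ylabel('Second principal component')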

**Question:** Compare and contrast the graph generated in part 3 with the one generated in part 4. In your own words, explain why the normalized data is "better".

YOUR ANSWER HERE


Congratulations, we're done!

Turn in your assignment using D2L no later than 11:59pm on the day of class. See links at the end of this document for access to the class timeline for your section.

Written by Dirk Colbry, Michigan State University.
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.