In order to successfully complete this assignment, you must follow all the instructions in this notebook and upload your edited ipynb file to D2L with your answers on or before 11:59pm on Friday September 25th.
BIG HINT: Read the entire homework before starting.
In this homework, we will download and explore some widely available datasets using the Python Scikit-learn module. We want you to start thinking of data samples as "feature vectors". Each sample in a dataset is composed of $n$ measurements. The individual measurements within a vector do not necessarily relate to one another, but each measurement $v_i \in V$ corresponds to a "similar" measurement $u_i \in U$ in another sample.
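For example, two wine samples can be written as vectors whose components line up feature by feature. A toy illustration (the feature names come from the wine dataset, but the numbers here are made up):

```python
import numpy as np

# Two hypothetical wine samples; position i in each vector measures the same feature.
features = ["alcohol", "malic_acid", "ash"]
u = np.array([13.2, 1.8, 2.4])  # sample U (made-up values)
v = np.array([12.4, 2.1, 2.2])  # sample V (made-up values)

# u[0] and v[0] are both alcohol measurements, u[1] and v[1] both malic acid, etc.
for name, ui, vi in zip(features, u, v):
    print(f"{name}: {ui} vs {vi}")
```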
The "Wine" is provided with the Sklearn library and is an easy dataset use as an example.
✅ **DO THIS:** Run the following code to download the wine dataset.
%matplotlib inline
import matplotlib.pylab as plt
import numpy as np
import sklearn.datasets as sdata
sk_data = sdata.load_wine()
Let's inspect the `sk_data` object by using the `dir` command:
dir(sk_data)
The `DESCR` attribute looks interesting. Let's print it out and see what is going on...
print(sk_data.DESCR)
✅ **DO THIS:** How many features are there in this dataset? (You could count them manually, but it is highly recommended that you write code to find the right number. We may be using similar datasets in the future, and it is generally better to write code that is portable for when things change.) Store the number of features in the vector as the Python variable `N` so that you can check your answer below:
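As a reminder of the general technique (shown on a toy array, not the wine data), the `shape` attribute of a `numpy` array reports its dimensions as a `(rows, columns)` tuple:

```python
import numpy as np

# A toy array with 3 samples (rows) and 4 measurements (columns).
toy = np.array([[1, 2, 3, 4],
                [5, 6, 7, 8],
                [9, 10, 11, 12]])

print(toy.shape)     # (3, 4)
print(toy.shape[0])  # number of rows (samples)
print(toy.shape[1])  # number of columns (features)
```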
#put your answer to the above question here.
from answercheck import checkanswer
checkanswer(N,'c51ce410c124a10e0db5e4b97fc2af39');
✅ **DO THIS:** In this dataset, how many different wines were tested using these $n$ features? Again, write code to calculate the answer instead of just "hard coding" the number. Store the size in a variable named `M`:
#put your answer to the above question here.
from answercheck import checkanswer
checkanswer(M,'8f85517967795eeef66c225f7883bdcb');
The following figure plots each feature for every wine in the dataset.
plt.figure(figsize=(20,10))
plt.plot(sk_data.data);
plt.legend(sk_data.feature_names)
Another way to look at this dataset is as a large 2D array (or matrix), which can be viewed as an image using the `imshow` function. In this case we choose the "Reds" colormap to keep with the wine theme.
plt.figure(figsize=(20,2))
plt.imshow(sk_data.data.T,cmap="Reds")
plt.colorbar()
The `pandas` library can be helpful for viewing data. Here we will just show the basics for turning the raw data stored as a `numpy` array into a `pandas.DataFrame` object. Each row is a particular wine and the columns are the individual feature measurements.
import pandas
df = pandas.DataFrame(sk_data.data, columns = sk_data.feature_names)
df
Another useful `pandas` function is `describe`, which gives some basic statistics for the measurements. Check to make sure these statistics match up with the ones provided in the dataset's `DESCR`.
df.describe()
Now that you have a feel for the type of data available in the wine dataset, we need to build a measure to compare the wines. The following is a "stub" function. Modify it to return the Euclidean distance between two input feature vectors.
HINT: a good solution is one that does not "hard-code" properties of the wine dataset (such as vector length); instead, a good function will calculate the distance between any two vectors in $R^n$ for any size of $n$.
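Recall the definition of the Euclidean distance between two vectors $u, v \in R^n$:

$$d(u,v) = \sqrt{\sum_{i=1}^{n}(u_i - v_i)^2}$$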
def dist(u,v):
    """Return the Euclidean distance between vectors u and v (your code goes here)."""
    d = 0
    return d
Let's test our function on a couple of simple examples. The following are common examples for which we know the values:
dist([0,0],[0,1]) == 1
dist([0,0, 0],[1,0, 0]) == 1
dist([0,0],[3,4]) == 5
anyvec = [1,22,3,444,5.123,69,2229,42.0]
dist(anyvec,anyvec) == 0
from answercheck import checkanswer
checkanswer(dist(sk_data.data[0,:],sk_data.data[51,:]),'7a502b88ac326e0d79fe2cc8f33efd15');
Assuming the distance measure above is working, we can calculate the distance between all pairs of wines. Notice that this graph is symmetric because the distance between wine A and wine B is the same as the distance between wine B and wine A. Also notice that the diagonal of the matrix is always zero, i.e. the distance between wine A and itself is zero.
def distance_matrix(A):
    """Compute and display the matrix of pairwise distances between the rows of A."""
    m = A.shape[0]  # number of samples; avoids hard-coding the dataset size
    distmatrix = np.zeros((m, m))
    for i in range(m):
        for j in range(m):
            distmatrix[i, j] = dist(A[i, :], A[j, :])
    plt.figure(figsize=(20, 10))
    plt.imshow(distmatrix, cmap="Reds")
    plt.colorbar()
    return distmatrix

distmatrix = distance_matrix(sk_data.data)
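As an aside, the double loop above is simple but slow for large datasets. If SciPy is available, `scipy.spatial.distance.cdist` computes the same matrix in one vectorized call (a sketch, assuming your `dist` implements the standard Euclidean distance):

```python
from scipy.spatial.distance import cdist

# Pairwise Euclidean distances between all rows of sk_data.data.
fast_distmatrix = cdist(sk_data.data, sk_data.data, metric="euclidean")

# Should agree with the looped version (up to floating point error).
print(np.allclose(fast_distmatrix, distmatrix))
```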
Datasets such as the wine dataset are considered "high dimensional" when the number of measurements in their feature vectors gets large. What counts as "large" depends a little on what you are trying to do. If you are trying to visualize the data, anything bigger than 2 or 3 dimensions is large (it is hard for the human brain to visualize data in more than three dimensions).
Later in the semester you will learn how to use "eigenvectors" and "eigenvalues" to do Principal Component Analysis (PCA) of high-dimensional data. PCA is probably the most common algorithm in a class of algorithms used for "dimensionality reduction". The purpose of these algorithms is to summarize complex, high-dimensional data in a smaller set of dimensions that fit the problem you are trying to solve. In our case, we want to visualize the $N$ feature measurements using only 2 axes.
To learn more about PCA try checking out the PCA wikipedia page.
Before we get into all of the details about how to do the PCA math using eigenvalues/eigenvectors, we will just use the PCA function available in the sklearn library.
The following code imports the PCA function and reduces the wine data (`sk_data.data`) down to its two largest principal components. Think of each principal component as a weighted sum of the original features, specifically designed to retain as much information as possible.
#Reduce the data down to two principal components to make plotting easier.
from sklearn.decomposition import PCA
reduced_data = PCA(n_components=2).fit_transform(sk_data.data)
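If you are curious how much of the data's variation the two components actually retain, the fitted `PCA` object exposes an `explained_variance_ratio_` attribute. A small sketch of one way to inspect it:

```python
# Fit the PCA object first, then ask how much variance each component captures.
pca = PCA(n_components=2)
reduced_data = pca.fit_transform(sk_data.data)
print(pca.explained_variance_ratio_)  # fraction of total variance per component
```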
Now we plot the reduced data with different colors corresponding to the three wine classes that are included with the data (`sk_data.target`):
#Strip out the three classes of data and plot
class0 = reduced_data[sk_data.target==0,:]
class1 = reduced_data[sk_data.target==1,:]
class2 = reduced_data[sk_data.target==2,:]
plt.scatter(class0[:,0],class0[:,1])
plt.scatter(class1[:,0],class1[:,1])
plt.scatter(class2[:,0],class2[:,1])
plt.xlabel('First principal component')
plt.ylabel('Second principal component')
We can now see each of the sample wines and their relationships to one another. Unfortunately, the first two principal components do not have any units, so it is hard to interpret their meaning. In the next section we will use "normalization" to clean up the data and make it easier to visualize.
One problem with the above PCA is that we treat each measurement in the feature vector as having the same units. This means some measurements get more "weight" in the PCA analysis just because they are numerically bigger. One way to fix this problem is to "normalize" all the measurements to values between zero (0) and one (1). This normalization step allows us to better compare the measurements.
Let us assume that the above data is stored in a matrix $data$ with each row ($i \in M$) representing a wine and each column ($j \in N$) representing a feature. We want to "normalize" each measurement to a value between zero (0) and one (1) using the following equation:
For each wine and each feature ($j \in N$): $$A_{i,j} = \frac{data_{i,j} - min_j}{max_j-min_j}$$
where $min_j$ is the minimum value of the $j$th feature and $max_j$ is the maximum value of the $j$th feature.
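To make the formula concrete, here is a minimal sketch that applies it to a small toy matrix (made-up numbers, not the wine data):

```python
import numpy as np

toy = np.array([[1.0, 200.0],
                [2.0, 400.0],
                [3.0, 300.0]])

# Column-wise minimum and maximum: one value per feature.
col_min = toy.min(axis=0)
col_max = toy.max(axis=0)

# Broadcasting applies the equation to every entry at once.
toy_normalized = (toy - col_min) / (col_max - col_min)
print(toy_normalized)  # each column now spans [0, 1]
```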
✅ **DO THIS:** Write a program to normalize all of the values in the `sk_data.data` dataset. Store the normalized values in a matrix $A$. HINT: avoid writing lots of loops; libraries such as `numpy`, `pandas`, and `scikit-learn` all have functions that may help turn a 20-line program into 3 lines of code.
#Put your answer to the above question here.
from answercheck import checkanswer
checkanswer(A,'85608294aee283f63b58cfdc8da99a7c');
plt.figure(figsize=(20,2))
plt.plot(A);
%matplotlib inline
import matplotlib.pylab as plt
plt.figure(figsize=(20,2))
plt.imshow(A.T, cmap='Reds');
plt.colorbar();
✅ **Do This:** Copy and paste the code from the above PCA section and replace `sk_data.data` with the normalized matrix `A`.
# YOUR CODE HERE
raise NotImplementedError()
✅ **Question:** Compare and contrast the graph generated in part 3 with the one generated in part 4. In your own words, explain why the normalized data is "better".
YOUR ANSWER HERE
Turn in your assignment using D2L no later than 11:59pm on the day of class. See links at the end of this document for access to the class timeline for your section.
Written by Dirk Colbry, Michigan State University
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.