CMSE 491 Final Project Template#

INSTRUCTIONS: This is a template to help organize your project. All projects should include the 5 major sections below (you do not need to use this template file). If you use this file, complete your work below and remove content in parentheses. Also, remove this current cell.

✅ MY_NAME#

✅ Date#

INSTRUCTIONS FOR RUNNING THE NOTEBOOK#

This section should describe how to run this notebook and provide links/instructions to additional files. For example:

  • Example 1: The dataset used in this project is freely available at https://this_website.com. The name of the file to download is file_name.csv; it can be found by clicking the button in the top right corner. The files should be downloaded to the same folder as this notebook and renamed as follows: dataset_no1.csv -> train_data.csv, dataset_no2.csv -> test_data.csv.

  • Example 2: The dataset used in this project is too big to be uploaded to D2L, and it is not available on any website because it consists of data collected during my research project. However, here is a description of the dataset ….

  • Example 3: This notebook is used for analyzing data from Molecular Dynamics (MD) simulations. MD simulations typically produce data on the order of a few GB, which cannot be uploaded to D2L or stored on websites. The data produced by MD simulations are stored in files named [simulation_parameters]_[timestep_number].xyz, which contain the Cartesian positions, velocities, and accelerations of each particle. An example is given below:

    # Particle Name, Pos X, Pos Y, Pos Z, Vel X, Vel Y, Vel Z, Acc X, Acc Y, Acc Z,
    H, 0.0, 0.1, 0.2, 1.0, 2.0, -0.9, 0.0, 0.0, 0.0,  
    ....
    

Each line corresponds to a single particle. There are 5000 particles in the simulation.
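A minimal sketch of how one such file could be read, assuming the comma-separated layout shown above (a `#` header line and a trailing comma on each row). The function name `read_xyz_frame` and the dictionary keys are hypothetical, chosen only for illustration; it uses only Python's standard library.

```python
import csv
import io

# Column order assumed from the example header above.
COLUMNS = ["name", "x", "y", "z", "vx", "vy", "vz", "ax", "ay", "az"]


def read_xyz_frame(handle):
    """Parse one frame; return a list of dicts, one per particle."""
    particles = []
    for row in csv.reader(handle, skipinitialspace=True):
        # Drop empty fields, e.g. the one left by the trailing comma.
        fields = [field.strip() for field in row if field.strip()]
        if not fields or fields[0].startswith("#"):
            continue  # skip blank lines and the header comment
        record = {"name": fields[0]}
        for key, value in zip(COLUMNS[1:], fields[1:]):
            record[key] = float(value)
        particles.append(record)
    return particles


# Small in-memory example using the row shown above.
sample = io.StringIO(
    "# Particle Name, Pos X, Pos Y, Pos Z, Vel X, Vel Y, Vel Z, Acc X, Acc Y, Acc Z,\n"
    "H, 0.0, 0.1, 0.2, 1.0, 2.0, -0.9, 0.0, 0.0, 0.0,  \n"
)
frame = read_xyz_frame(sample)
print(len(frame), frame[0]["name"], frame[0]["vz"])
```

To read a real file, pass an open file object instead of the in-memory sample, for example `with open(path, newline="") as handle: frame = read_xyz_frame(handle)`.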

  • Example 4: This notebook requires the installation of additional Python packages. The packages can be installed via pip install [package_name] or via conda install -c [channel] [package_name]. Please install version 0.1.0, as newer versions do not work with the code below.

  • Example 5: This notebook requires the installation of additional python packages and the creation of a virtual environment to avoid conflicts. The instructions for installation are given in this link

  • Example 6: This notebook uses functions and classes that have been defined in the file helper_module.py. The file is provided on D2L and should be downloaded and saved in the same directory (folder) as this notebook.

  • Example 7: Please update seaborn to version 0.13.0 to avoid warnings in this notebook. This can be done via pip install -U seaborn or pip install seaborn==0.13.0.

  • Example 8: Instructions for running this notebook are given in the README file uploaded to D2L. If you don’t follow them, I deny any responsibility if your computer gets hacked by Anonymous.

You get the idea.

IMPORTANT NOTE: You should assume that the only packages installed on the instructor’s computer are the ones we used in class (numpy, seaborn, pandas, matplotlib, scipy, sklearn, tensorflow) and Python’s standard library modules (e.g. math, random, os), plus PIL. If at any point while working on your project you had to install a package, you should include a description similar to Example 4. This means that if you used pytorch or any other (relatively) common package (e.g. bokeh, networkx), you should include a description similar to Examples 4-7.
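One way to make such requirements easy to verify is a small check cell at the top of the notebook. The sketch below is only an illustration: the helper name `check_requirements` and the pinned versions in `REQUIRED` are hypothetical. Note that `importlib.metadata.version` expects the distribution name used with pip (for example scikit-learn rather than sklearn).

```python
from importlib import metadata


def check_requirements(requirements):
    """Return a list of human-readable problems; empty if every pin is met."""
    problems = []
    for package, wanted in requirements.items():
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            problems.append(f"{package} is not installed")
            continue
        # A pin of None means "any installed version is acceptable".
        if wanted is not None and installed != wanted:
            problems.append(f"{package} {installed} found, but {wanted} is required")
    return problems


# Hypothetical pins; replace with the packages your project actually needs.
REQUIRED = {"seaborn": "0.13.0"}

for line in check_requirements(REQUIRED):
    print("WARNING:", line)
```

If the cell prints nothing, the pinned packages are present at the expected versions.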

PROJECT TITLE HERE#

Background and Motivation#

(Provide context for the problem)

Goal#

(Clearly state the question(s) you set out to answer.)

Exploratory Data Analysis#

(Describe your data and make meaningful plots here)

Methodology#

(How did you go about answering your question(s)? Most of your code will be contained in this section. You can subdivide this section into parts such as hyperparameter tuning, cross-validation, feature engineering, and baseline.)
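As an illustration of the baseline and cross-validation pieces, here is a minimal sketch that compares a mean-predicting baseline against a least-squares line using k-fold cross-validation. The data are synthetic and all function names are hypothetical. It uses only the standard library so it runs anywhere; in a real project you would more likely rely on scikit-learn utilities such as `cross_val_score`.

```python
import random
import statistics


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle indices 0..n_samples-1 and split them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_mse(xs, ys, fit, k=5):
    """Average held-out mean squared error of `fit` over k folds."""
    fold_errors = []
    for test_idx in k_fold_indices(len(xs), k):
        held_out = set(test_idx)
        train_x = [x for i, x in enumerate(xs) if i not in held_out]
        train_y = [y for i, y in enumerate(ys) if i not in held_out]
        predict = fit(train_x, train_y)
        errors = [(predict(xs[i]) - ys[i]) ** 2 for i in test_idx]
        fold_errors.append(statistics.fmean(errors))
    return statistics.fmean(fold_errors)


def fit_mean_baseline(train_x, train_y):
    """Baseline: always predict the mean of the training targets."""
    mean_y = statistics.fmean(train_y)
    return lambda x: mean_y


def fit_line(train_x, train_y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = statistics.fmean(train_x)
    mean_y = statistics.fmean(train_y)
    sxx = sum((x - mean_x) ** 2 for x in train_x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Synthetic data for illustration only: y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [2 * x + 1 + rng.gauss(0, 0.5) for x in xs]

baseline_mse = cross_validated_mse(xs, ys, fit_mean_baseline)
line_mse = cross_validated_mse(xs, ys, fit_line)
print(f"baseline MSE: {baseline_mse:.3f}, linear MSE: {line_mse:.3f}")
```

Reporting the baseline next to each model makes it clear how much of the performance comes from the model itself rather than from the data.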

ML Model#

Make a section for each ML model you used, for example:

Linear Regression#

Support Vector Machines#

Convolutional Neural Network#

Results#

(How do your models compare? What did you find when you carried out your methods? Some of your code related to presenting results/figures/data may be replicated from the methods section or may only be present in this section. All of the plots that you plan on using for your presentation should be present in this section.)

Discussion and Conclusion#

(What did you learn from your results? What obstacles did you run into? What would you do differently next time? Clearly provide quantitative answers to your question(s). At least one of your questions should be answered with numbers. That is, it is not sufficient to answer “yes” or “no”; instead, say something quantitative, such as “variable 1 increased roughly 10% for every 1-year increase in variable 2.”)
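One possible way to arrive at a statement of that form is to fit a straight line to the logarithm of variable 1 against variable 2 and convert the slope to a percent change per unit. The sketch below assumes all values of variable 1 are positive; the function name and the data are made up for illustration.

```python
import math
import statistics


def percent_change_per_unit(xs, ys):
    """Fit log(y) = a + b * x by least squares; return 100 * (exp(b) - 1)."""
    log_ys = [math.log(y) for y in ys]  # requires every y > 0
    mean_x = statistics.fmean(xs)
    mean_log_y = statistics.fmean(log_ys)
    sxy = sum((x - mean_x) * (ly - mean_log_y) for x, ly in zip(xs, log_ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return 100 * (math.exp(slope) - 1)


# Made-up data: a quantity that grows by exactly 10% per year.
years = [0, 1, 2, 3, 4, 5]
values = [100 * 1.10 ** t for t in years]
print(f"{percent_change_per_unit(years, values):.1f}% per year")
```

With real, noisy data the result is an average rate, so it is worth reporting it alongside a plot of the fit.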

References#

(List the source(s) for any data and/or literature cited in your project. Ideally, this should be formatted using a formal citation format (MLA or APA or other, your choice!). Multiple free online citation generators are available, such as http://www.easybib.com/style. Important: if you use any code that you find on the internet for your project, you must cite it or you risk losing most/all of the points for your project.)