Michigan State University Data Science Capstone.
In this project your team will produce a clear set of instructions to easily install and demo (test) the software needed for your project. By following these directions your instructor and community partner should be able to install everything that is necessary to get your experiments running.
You will be expected to share the installation instructions with your classmates. At some point you will need to make sure there is a way to install and test the instructions without violating your NDA. One way to do this is to provide example data that does not come from the community partners. If there is a problem please notify your instructors now.
The location of the installation instructions will vary based on the language and tools being used. For most projects this will mean including an INSTALL.md file in your main git directory and maybe an environment.yml or requirements.txt file. In any case the instructions should be short, clear and have a minimum number of steps. Make sure you include the following:
Since data is typically NOT included in a git repository (except for small example files) the instructions should clearly be written such that both the community partners and the instructors know where to locate the data and where to put the data relative to the code. Use relative paths and not absolute paths to ensure portability of the instructions.
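As one way to keep data paths relative, a notebook or script can search upward from its own working directory for the data folder instead of hard-coding a location. The sketch below is only an illustration: the function name find_data_dir and the default folder name "data" are assumptions, not part of the assignment.

```python
from pathlib import Path


def find_data_dir(start, dirname="data"):
    """Walk upward from `start` until a folder named `dirname` is found.

    `dirname` defaults to "data" as an assumed convention; change it to
    match whatever folder name your INSTALL.md tells people to use.
    """
    start = Path(start).resolve()
    for folder in [start, *start.parents]:
        candidate = folder / dirname
        if candidate.is_dir():
            return candidate
    raise FileNotFoundError(
        f"No '{dirname}' folder found at or above {start}; "
        "see INSTALL.md for where to place the data."
    )
```

Calling find_data_dir(Path.cwd()) from a notebook anywhere inside the repository returns the same folder on every machine, and the error message points a reader back to the install instructions when the data has not been copied yet.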
The instructions should include a list of all software/packages needed to reproduce the results. These should include basic instructions for downloading and installing.
If using Python or R it is highly recommended that the team include instructions for creating a conda environment so that classmates, community partners and instructors can easily run the code. A conda environment lets them temporarily set up their computer with the same packages as your system.
Ideally you also want to include an environment.yml file with your project so that other students and instructors can use the following commands to set up the environment:
conda env create --prefix ./envs --file environment.yml
conda activate ./envs
The environment.yml file can be created to “copy” your current environment by running the following command:
conda env export --from-history > environment.yml
Note: try to keep your environment.yml file to the minimum set of packages you need to run your project. Do not create an environment.yml file from your entire base anaconda environment, as this will just install a lot of stuff that is not needed. A good way to figure out your minimal install is to create a new environment and add packages one-by-one until your code works. For example:
conda create --prefix ./envs pip jupyter pytest pylint pdoc3
conda activate ./envs
Now just use “conda install” and “pip install” to add all of your project dependencies until it starts working. For example, many of you will want to include one or all of the following:
conda install numpy
conda install matplotlib
conda install scipy
conda install jupyter
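To get a first guess at which packages to add, it can help to list what the project's own code imports. The following is a rough sketch, not a complete dependency resolver: the function names are hypothetical, it assumes Python 3.10 or newer (for sys.stdlib_module_names), it only reads .py files (not notebooks), and it skips a folder named envs on the assumption that the environment was created with the --prefix ./envs option.

```python
import ast
import sys
from pathlib import Path


def imported_modules(source):
    """Return the top-level module names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 means an absolute import; relative imports are local code
            names.add(node.module.split(".")[0])
    return names


def third_party_imports(project_dir, skip_dir="envs"):
    """Collect imports from .py files under project_dir, minus the standard library."""
    project_dir = Path(project_dir)
    found = set()
    for path in project_dir.rglob("*.py"):
        if skip_dir in path.relative_to(project_dir).parts:
            continue  # do not scan the conda environment itself
        found |= imported_modules(path.read_text(encoding="utf-8"))
    # sys.stdlib_module_names exists on Python 3.10+; older versions keep everything
    return found - set(getattr(sys, "stdlib_module_names", ()))
```

Running print(sorted(third_party_imports("."))) from the repository root gives a list of candidates. The result will also include your own local modules, and an import name is not always the package name (for example, the sklearn import comes from the scikit-learn package), so treat the output as a starting point.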
When you have it working, run the conda env export --from-history command from above. To deactivate and delete your environment just run:
conda deactivate
rm -rf ./envs
Test your install instructions on more than one computer. Try to make them as robust as possible so that the instructor does not need to figure out what is missing.
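One lightweight way to check an install on a second computer is a small script that reports which required packages cannot be found. This is a minimal sketch under stated assumptions: the REQUIRED list is hypothetical and should be replaced with the top-level import names your project actually uses.

```python
import importlib.util


def missing_packages(required):
    """Return the names in `required` that cannot be located for import.

    Use top-level import names (e.g. "scipy", not "scipy.stats").
    """
    return [name for name in required if importlib.util.find_spec(name) is None]


# Hypothetical list for illustration; edit it to match your project.
REQUIRED = ["numpy", "matplotlib", "scipy"]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages were found.")
```

Running this inside the activated environment right after the install steps tells the reader what is missing before they open the demo, so the instructor does not have to work it out from a traceback.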
NOTE FOR INTERNATIONAL STUDENTS with English as a second language. If you are running on Windows and your computer’s default language is not English you may have trouble with UTF-8 errors and the environment.yml file. Running the following line at a Windows PowerShell prompt (Set-Content is a PowerShell command and is not available in the classic command prompt) in your git repository directory may fix the problem.
conda env export --from-history | Set-Content -Encoding utf8 environment.yml
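Before re-exporting, it may be worth confirming that encoding really is the problem. The sketch below checks whether a file is plain UTF-8; the function name detect_encoding_problem is hypothetical. It relies on the assumption that the usual failure is a file saved as UTF-16, which older Windows PowerShell versions commonly produce with ">" redirection, or in a legacy code page.

```python
import codecs
from pathlib import Path


def detect_encoding_problem(path):
    """Return a short description of an encoding problem, or None if the file decodes as UTF-8."""
    raw = Path(path).read_bytes()
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "file starts with a UTF-16 byte order mark"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return f"not valid UTF-8 (byte offset {err.start})"
    return None
```

For example, print(detect_encoding_problem("environment.yml")) prints None for a clean file and a one-line description otherwise.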
Finally, include example code (jupyter notebooks or similar interface preferred) in an obvious location. Have the example(s) describe and show how to use the software, preview your end-to-end solution and show some preliminary results.
The example should work “out-of-the-box” with no special commands once your core software has been installed. Make sure this software is well documented. If your team is using Jupyter notebooks make sure you use proper syntax.
This demo doesn’t need to include finished details of the data/results but should provide a preview of all of the steps you are expecting in your end project for instructors to make sure the install worked and that everything will be ready for the community partners.
Make sure it is clear where your instructions are located. The INSTALL.md file is a good place to start but consider adding a line in your README.md file linking to the INSTALL.md file.
An ideal project will need at most one or two commands to install and run everything. Your instructor really prefers conda environments but teams can also use a pip install command (or something similar in other languages) with a requirements.txt file. A typical set of instructions might be:
git clone <<PROJECT URL>>
Copy the data from the /path/ teams folder into the empty “data” directory inside the cloned repository.
conda env create --prefix ./envs --file environment.yml
conda activate ./envs
jupyter notebook
Open the demo.ipynb file and continue the instructions from there.
Make sure your instructions are tailored to your audience. Some projects may need multiple sets of instructions, e.g. one for the instructor and one for the community partners. More is often better as long as they are clear and easy to understand.
Add your instruction files to your team git repository on or before the deadline.
The instructor will clone (or do a pull) to get the most recent copy of your repository. They will conduct a code review starting with the README file and then attempt to follow the INSTALL instructions. Points will be taken off for unclear steps. The instructor will provide feedback to help you achieve the balance between incomplete and overly detailed.
Review the feedback from your instructors and make necessary changes to the instructions based on the feedback.
Written by Dr. Dirk Colbry, Michigan State University
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.