CMSE 495

Michigan State University Data Science Capstone.

View the Project on GitHub msu-cmse-courses/cmse495-SS24

Project Introductions and First Contact

Drawing of an Alien as an instance of first contact

Agenda (80 Minutes)

Welcome Serena Lotreck as a guest facilitator

1. First Contact

Link to First Contact slides

2. Project Review

Today we will split everyone into temporary groups of 4-5 people. Each group will be assigned a project summary. You will only spend about 10 minutes in your groups. Your goal is to do a quick read-through of the project summary and start outlining roles and responsibilities for the summary report. You will get another 30 minutes on Friday to finish your review and give a short 5-minute group presentation.

NOTE: Remember that you are expected to individually review all of the project summaries and fill out your preferences as your first milestone assignment, which is due this weekend (Sunday night). The purpose of this review is to dig a little deeper into at least one of the projects and discuss as a group what you liked and disliked about it. See what other people are looking at to help you prioritize the projects.

Project Description Files

Once you get into your groups, go to the “General” channel in our course's Microsoft Team and click on “files” to find a folder of project descriptions. Select your team's file and read through it.

Link to Course Microsoft Team

NOTE: The above project descriptions may not be posted until the first day of class.

Project Review Report

NOTE: As with anything in this course, part of your grade is to read and follow instructions. If the instructions are confusing, ask your instructors. Try to avoid asking questions that are already answered in the instructions.

Find the Review_template.docx file in the teams directory and make a copy in the Project_Review_Assignment folder. Rename the copy so that it has the same name as the project with _review.docx at the end of the file name. Identify a scribe for the group (someone to create and share the report). As a group, you will need to type up a report that includes the following:

You are encouraged to work on this document outside of class. Your team will be given 3-5 minutes on Friday to give an overview of the project to the entire class.

NOTE: You still need to individually review ALL the projects and fill out the project selection form which is due on Sunday. The teams formed today are only temporary and the project is not necessarily the one you will be assigned for the semester.

Project Review Terms

The following are some terms we will use throughout the semester and that you may see in some of the project descriptions. Please let the instructors know if there are other terms we are using that should be included in this review.

Initial Project Breakdown: Split the main project into the major components that need to be completed. Think of components as a hierarchy with multiple levels. The top level is “the entire project”. Below that are the major components (e.g. the four categories outlined in the project description). What is the best way to break down the project for 4-5 teammates such that each member of the team makes an equal contribution? It is important not to think of a project as linear (from inputs to outputs). Instead, think about which parts will take the most time and assign members to those tasks early. With good planning you can research and build a component even if the prior steps are not complete.

Data Bibliography: Students will be expected to build a data bibliography of other possible sources of publicly available data that may help answer the project's questions. This is a lot like a paper bibliography, but it references datasets instead of papers.

Data Dictionary: Also known as a “Schema”, this is a list of features in a data set and their ranges. A Data Dictionary may vary by project but is an extremely helpful resource when reviewing a new data set.
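As an illustration of the Data Dictionary idea, here is a minimal sketch that lists each feature in a small table along with its range (for numeric features) or its distinct values (for categorical ones). The sample data, the column names, and the `build_data_dictionary` helper are all hypothetical and are not part of any course project; a real project would read its own data file.

```python
import csv
import io

# Hypothetical sample data used only for illustration.
SAMPLE_CSV = """age,height_cm,city
34,170.5,Lansing
29,182.0,Detroit
41,165.2,Lansing
"""


def build_data_dictionary(rows):
    """Summarize each feature: numeric range or sorted categorical values."""
    dictionary = {}
    for name in rows[0].keys():
        values = [row[name] for row in rows]
        try:
            # Treat the feature as numeric if every value parses as a float.
            numbers = [float(v) for v in values]
            dictionary[name] = {
                "type": "numeric",
                "min": min(numbers),
                "max": max(numbers),
            }
        except ValueError:
            # Otherwise record the distinct values that appear.
            dictionary[name] = {
                "type": "categorical",
                "values": sorted(set(values)),
            }
    return dictionary


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
data_dictionary = build_data_dictionary(rows)
for feature, summary in data_dictionary.items():
    print(feature, summary)
```

A fuller data dictionary would usually also record units, a plain-language description of each feature, and how missing values are encoded.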

Open Loops: A component of the project with an unclear solution. An example of an open loop is “Review data and identify sources of error”. This loop is open because the team does not know what the sources of error are and thus can’t plan how the error will be addressed. A closed loop is something like “use Beautiful Soup to scrape website”. Although this task may not be done, there is a clear path to a solution and no obvious unknowns that will inhibit progress. Early on there may be many open loops that are easy to close, for example “Pick a programming language”. Even if they are easy, we still consider them to be open loops.

MVP Milestone: This is the second major presentation required for the project (the first is the project plan and the last is the final report). The Minimum Viable Product (MVP) milestone is pivotal as we expect all teams to demonstrate an end-to-end solution for their project (i.e. no open loops).

Success Metrics: Realistic and measurable outcomes of the project. For example, if the project uses machine learning, what is a reasonable target recognition rate? You can’t just guess and throw out a number like 90%. How would the team know if that number is even possible given the data? Instead, it is better to establish a simple baseline metric (using a simple algorithm) and then measure progress by how far the team can move from that baseline.
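The baseline idea described above can be sketched as follows. This is one possible choice of “simple algorithm” (always predict the most common training label), not a required method for the course. The labels, the `majority_baseline_accuracy` helper, and the `model_accuracy` value are all made up for illustration.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most common training label."""
    majority_label = Counter(train_labels).most_common(1)[0][0]
    correct = sum(1 for label in test_labels if label == majority_label)
    return correct / len(test_labels)


# Hypothetical labels used only for illustration.
train_labels = ["cat", "cat", "dog", "cat", "dog", "cat"]
test_labels = ["cat", "dog", "cat", "cat"]

baseline = majority_baseline_accuracy(train_labels, test_labels)

# Hypothetical accuracy of the team's model on the same test labels.
model_accuracy = 0.85

# Progress is reported relative to the baseline, not to a guessed target.
improvement = model_accuracy - baseline
print(f"baseline={baseline:.2f} model={model_accuracy:.2f} "
      f"improvement={improvement:.2f}")
```

Reporting the improvement over the baseline gives the team a measurable outcome even when nobody knows in advance what accuracy the data can support.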

Next Steps: A task that can be conducted right now with no known blocks or prerequisites. The best next steps have clear criteria for completion. For example, a poor next step would be something like “write the project proposal”. A better next step may be “Write the outline for the project proposal”. An even better next step would be “Spend 20 minutes writing an outline for the project proposal and then share with the team”. What is nice about the last next step is that it has a clear goal, a clear finish, and a clear way to move forward.

Anticipated Challenges: Often related to an Open Loop, an Anticipated Challenge is something about the project that could cause it to fail. Each anticipated challenge should include suggestions for what to do to avoid or get around the problem. For example, “What if there is not enough data to train the Neural Network? If so, maybe research pre-trained models and transfer learning.” Another example would be “It could take a long time to determine the best visualization for this project. Make sure preliminary results of the model are obtained early in the project (within the first four weeks) so there is enough time to iterate over different visualizations.” Another really good one is “What do we do if someone gets sick or drops the class?”

3. First Contact D2L Quiz

Sometime before Friday, log onto D2L and complete the quick quiz in the content section. Don’t stress about these quizzes; they are not worth much. They should take less than 5 minutes, and I just ask that you put in your best effort.

Written by Dr. Dirk Colbry, Michigan State University
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.