CMSE 495


Michigan State University Data Science Capstone.

View the Project on GitHub msu-cmse-courses/cmse495-FS25

Project Introductions and First Contact

Drawing of an Alien as an instance of first contact

Agenda (70 Minutes)

1. Project Review

Today we will split everyone into temporary groups of 4-5 people. Each group will be assigned a project summary and expected to write a short report and give a summary this Friday. Please see the following link for more information about the assignment:

NOTE: You still need to individually review ALL the projects and fill out the project selection form which is due on Sunday. The teams formed today are only temporary and the project is not necessarily the one you will be assigned for the semester.

Project Review Terms

The following are some terms we will use throughout the semester and that you may see in some of the project descriptions. Please let the instructors know if there are other terms we are using that should be included in this review.

Initial Project Breakdown: Split the main project into the major components that need to be completed. Think of components as a hierarchy with multiple levels. The top level is “the entire project”. Below that are the major components (e.g. the four categories outlined in the project description). What is the best way to break down the project for 4-5 teammates such that each member of the team makes an equal contribution? It is important not to think of a project as linear (from inputs to outputs). Instead, think about which parts will take the most time and assign members to those tasks early. With good planning you can research and build a component even if the prior steps are not complete.

Data Bibliography: For many of the projects the team will be expected to build a data bibliography of other publicly available data sources that may help answer the community partner’s questions. This is a lot like a paper bibliography, except that the references are datasets. For a few projects the Data Bibliography may be a major contribution to the final results.

Data Dictionary: Also known as a “Schema”, this is a list of features in a data set and their ranges. A Data Dictionary may vary by project but is an extremely helpful resource when reviewing a new data set.
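
As a rough illustration of the Data Dictionary idea, the sketch below builds a simple one from tabular data using only the Python standard library. The sample data, the column names, and the `build_data_dictionary` helper are all hypothetical; a real project would adapt the summary fields (units, descriptions, allowed values) to its own data set.

```python
import csv
import io


def build_data_dictionary(rows):
    """Summarize each column: missing count, inferred type, and range or categories."""
    dictionary = {}
    columns = list(rows[0].keys()) if rows else []
    for col in columns:
        raw = [row[col] for row in rows]
        present = [v for v in raw if v not in ("", None)]
        entry = {"missing": len(raw) - len(present)}
        try:
            # If every non-missing value parses as a number, record its range.
            numbers = [float(v) for v in present]
            entry["min"] = min(numbers)
            entry["max"] = max(numbers)
            entry["type"] = "numeric"
        except ValueError:
            # Otherwise treat the column as categorical and list its distinct values.
            entry["type"] = "categorical"
            entry["values"] = sorted(set(present))
        dictionary[col] = entry
    return dictionary


# Hypothetical sample data standing in for a community partner's file.
SAMPLE_CSV = """site,temperature_c,status
A,21.5,ok
B,19.0,ok
C,,offline
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
data_dictionary = build_data_dictionary(rows)

for name, info in data_dictionary.items():
    print(name, info)
```

Even a summary this small can surface questions for the community partner, such as why a value is missing or whether a range is physically plausible.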

Open Loops: A component of the project with an unclear solution. An example of an open loop is “Review data and identify sources of error”. This loop is open because the team does not know what the sources of error are and thus can’t plan how the error will be resolved. A closed loop is something like “Use Beautiful Soup to scrape the website”. Although this task may not be done yet, there is a clear path to a solution and there are no obvious unknowns that will inhibit progress. Early on there may be many open loops that are easy to close, for example “Pick a programming language”. Even if they are easy, we still consider them to be open loops.

MVP Milestone: This is the second major presentation required for the project (the first is the project plan and the last is the final report). The Minimum Viable Product (MVP) milestone is pivotal as we expect all teams to demonstrate an end-to-end solution for their project (i.e. no open loops).

Success Metrics: All projects are required to have realistic and measurable outcomes. For example, if the project uses machine learning, what is a reasonable target recognition rate? It is typically a bad idea to just guess and throw out a number like 90%. How would the team know if that number is even possible given the data? Instead, it is better to establish a baseline metric (using a simple algorithm) and then measure progress by how far the team can move from that baseline.
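
One possible way to establish such a baseline for a classification project is a majority-class predictor, which always guesses the most common label seen in the training data. The sketch below is only an illustration: the labels, the `majority_class_baseline` helper, and the `model_accuracy` value are hypothetical placeholders, and a different simple algorithm may be a better baseline for a given project.

```python
from collections import Counter


def majority_class_baseline(train_labels, test_labels):
    """Always predict the most common training label; return that label and its test accuracy."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(1 for label in test_labels if label == most_common_label)
    return most_common_label, correct / len(test_labels)


# Hypothetical labels standing in for a real train/test split.
train_labels = ["ham", "spam", "ham", "ham", "spam", "ham"]
test_labels = ["ham", "spam", "ham", "ham"]

baseline_label, baseline_accuracy = majority_class_baseline(train_labels, test_labels)

# Hypothetical accuracy of the team's model on the same test set.
model_accuracy = 0.85
improvement = model_accuracy - baseline_accuracy

print(f"Baseline (always predict '{baseline_label}'): {baseline_accuracy:.2f}")
print(f"Improvement over baseline: {improvement:.2f}")
```

Reporting the improvement over the baseline, rather than the raw accuracy alone, shows how much of the result comes from the model and how much comes from the class balance of the data.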

Next Steps: A task that can be conducted immediately with no known blocks or prerequisites. The best next steps have clear criteria for completion. For example, a poor next step would be something like “Write the project proposal”. A better next step may be “Write the outline for the project proposal”. An even better next step would be “On Thursday spend 20 minutes writing an outline for the project proposal and then share it with the team”. What is nice about the last example is that it has a clear goal, a clear finish, and a clear way to move forward.

Anticipated Challenges: Often related to an Open Loop, an Anticipated Challenge is something about the project that could cause it to be delayed or to fail. Each anticipated challenge should include suggestions for how to avoid or get around the problem. For example, “What if there is not enough data to train the neural network? If so, research pre-trained models and transfer learning.” Another example would be “It could take a long time to determine the best visualization for this project. Make sure preliminary results of the model are obtained early in the project (within the first four weeks) so there is enough time to iterate over different visualizations.” Another really good one is “What do we do if someone gets sick or drops the class?”

2. First Contact

Link to First Contact slides

Written by Dr. Dirk Colbry, Michigan State University
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.