This is the webpage for the CMSE495 Data Science Capstone Course. These materials are provided as an Open Educational Resource (OER). Instructors interested in using these classroom resources should reach out to Dirk Colbry (colbrydi@msu.edu), who can provide all the materials and instructor notes.
Today we will split everyone into temporary groups of 4-5 people, and each group will be assigned a project summary. You will only spend about 10 minutes in your groups today. Your goal is to do a quick read-through of the project summary and start outlining roles and responsibilities for the summary report. You will get another 30 minutes on Friday to finish your review and give a short 5-minute group presentation.
NOTE: Remember that you are expected to individually review all of the project summaries and fill out your preferences as your first milestone assignment, which is due this weekend (Sunday night). The purpose of this review is to dig a little deeper into at least one of the projects and discuss as a group what you liked and disliked about it.
Once you get into your groups, go to the “General” channel in our course’s Microsoft Team and click on “Files”, where there will be a folder of project descriptions. Select your team’s file and read through it.
NOTE: The above project descriptions may not be posted until the first day of class.
Create a new Word document in the project descriptions folder with the same name as the project but with “_review.docx” at the end of the file name. Identify a scribe for the group (someone to make and share the report). As a group you will need to type up a report that includes the following:
Initial Project Breakdown: Split the main project into the major components that need to be completed. Think of components as a hierarchy with multiple levels. The top level is “the entire project”. Below that are the major components (e.g., the four categories outlined in the project description). What is the best way to break down the project for 4-5 teammates such that each member of the team makes an equal contribution? It is important not to think of a project as linear (from inputs to outputs). Instead, think about which parts will take the most time and assign members to those tasks early. With good planning you can research and build software components even if the prior steps are not complete.
Open Loops: A component of the project with an unclear solution. An example of an open loop is “Review data and identify sources of error”. This loop is open because the team does not know what the sources of error are and thus can’t plan how the errors will be resolved. A closed loop is something like “use Beautiful Soup to scrape website”. Although this task may not be done, there is a clear path to a solution and no obvious unknowns that will inhibit progress. Early on there may be many open loops that are easy to close, such as “Pick a programming language”. They are still open loops.
Success Metrics: Realistic and measurable outcomes of the project. For example, if the project uses machine learning, what is a reasonable target recognition rate? You can’t just guess and throw out a number like 90%. How would the team know if that number is even possible given the data? Instead, it is better to establish a simple baseline metric (using a simple algorithm) and then measure progress by how far the team can move from that baseline.
Next Steps: A task that can be conducted right now with no known blocks or prerequisites. The best next steps have clear criteria for completion. For example, a poor next step would be something like “write the project proposal”. A better next step may be “Write the outline for the project proposal”. An even better next step would be “Spend 20 minutes writing an outline for the project proposal and then share with the team”. What is nice about the last next step is that it has a clear goal, a clear finish, and a clear way to move forward.
Anticipated Challenges: Often related to an Open Loop, an Anticipated Challenge is something about the project that could cause it to fail. Each anticipated challenge should include suggestions for how to avoid or get around the problem. For example, “What if there is not enough data to train the neural network? If so, maybe research pre-trained models and transfer learning.” Another example would be “It could take a long time to determine the best visualization for this project. Make sure preliminary results of the model are obtained early in the project (within the first four weeks) so there is enough time to iterate over different visualizations.”
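As a rough illustration of the baseline idea described under Success Metrics, here is a minimal Python sketch. It assumes a classification project where accuracy is the metric, and it uses "always predict the most common label" as the simple algorithm. The function names, the labels, and the model accuracy are all hypothetical placeholders made up for this example; your project may call for a different metric or baseline.

```python
from collections import Counter


def majority_baseline_accuracy(train_labels, test_labels):
    """Accuracy of always predicting the most common training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(1 for label in test_labels if label == most_common_label)
    return correct / len(test_labels)


def improvement_over_baseline(model_accuracy, baseline_accuracy):
    """Fraction of the gap between the baseline and a perfect score that the model closes."""
    if baseline_accuracy >= 1.0:
        return 0.0  # no room left to improve on a perfect baseline
    return (model_accuracy - baseline_accuracy) / (1.0 - baseline_accuracy)


# Hypothetical labels, made up for illustration only.
train_labels = ["cat", "cat", "dog", "cat", "dog", "cat"]
test_labels = ["cat", "dog", "cat", "cat"]

baseline = majority_baseline_accuracy(train_labels, test_labels)
hypothetical_model_accuracy = 0.875  # placeholder value, not a real result

print(f"baseline accuracy: {baseline:.2f}")
print(f"gap closed: {improvement_over_baseline(hypothetical_model_accuracy, baseline):.2f}")
```

Reporting progress as "how much of the gap above the baseline was closed" gives the team a target grounded in the data rather than a guessed number like 90%.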
You are encouraged to work on this document outside of class. Your team will be given 3-5 minutes on Friday to give an overview of the project to the entire class.
NOTE: You still need to individually review ALL the projects and fill out the project selection form, which is due on Sunday. The teams formed today are only temporary, and the project is not necessarily the one you will be assigned for the semester.
Sometime before Friday, log onto D2L and complete the quick quiz in the content section. Don’t stress about these quizzes; they are not worth much. I just ask that you put in your best effort.
Written by Dr. Dirk Colbry, Michigan State University
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.