Michigan State University Data Science Capstone.
On Wednesday of this week the instructors will split everyone into temporary groups of 4-5 people. Each group will be assigned a project summary. You will spend about 10 minutes in your groups on Wednesday to get organized and then have an additional 30 minutes on Friday to finish up.
Your goal is to do a quick read-through of the project summary and start outlining roles and responsibilities for the summary report. Your team will be expected to give a short group presentation (under 5 minutes).
NOTE: Remember that you are expected to individually review all of the project summaries and fill out your preferences as your first milestone assignment, which is due this weekend (Sunday night). The purpose of this temporary review is to dig a little deeper into at least one of the projects and discuss as a group what you liked and disliked about it. See what other people are looking at to help you prioritize the projects.
Once you are assigned to a group and a project, go to the “General” channel in our course's Microsoft Team, click on “Files” and select the folder for the team. The project description will be in that folder.
NOTE: The above project descriptions may not be posted until the first day of class.
As soon as you get into your groups, introduce yourselves and identify people for the following roles.
Everyone on the team is responsible for reviewing and researching the proposal. Your job is to use your favorite search engines and see what you can learn immediately about the project. Pretend that this will be your project and you are trying to find the information you need to get started. This includes software resources, background information, data sources, etc. Include links and descriptions of what you find. Think about the first steps and the open loop questions that need to be answered.
In your groups you have 30 minutes to complete the review and get ready for the presentation. The timekeeper is responsible for the agenda and keeping your team on task. Here is a recommendation:
The note taker will be responsible for taking notes for the meeting. These notes will be shared with the rest of the class. If you are the note taker, do the following:
initial_Review.docx. Do not use spaces in the filename. The TEAMNAME is the name of the project your team is reviewing.
Your job is to make sure everyone is participating and working. Encourage discussion and help the timekeeper and note taker stay on task. Focus on making sure others are participating and not taking over the discussion yourself.
Your job is to present the 2-3 items your team picks to the entire group. You will have a maximum of 3 minutes to present, so make sure you know what you want to say. If you have time, write it down and practice.
Remember, as with everything in this course, part of your grade is to read and follow instructions. If the instructions are confusing, make an educated guess. If you feel an incorrect guess will impact the quality of your product, then ask your instructors. Avoid asking questions that can be answered in the instructions.
Find the Review_template.docx file in the teams directory and make a copy in the Project_Review_Assignment folder. Rename the file with the same name as the project short name and include _review.docx at the end of the file name. Identify a scribe for the group (someone to make and share the report). As a group you will need to type up a report that includes the following:
You are encouraged to work on this document outside of class. Your team will be given 3-5 minutes on Friday to give an overview of the project with a focus on key insights your group discovered. Identify one person to share with the entire class (under 5 minutes).
The team will be assessed on the clarity, organization, and professionalism of the document and presentation. Particular emphasis will be placed on how effectively the group integrates insights from their discussions to expand upon the original project description, helping peers gain a deeper understanding. Including relevant references and useful links is encouraged. The ultimate goal is to create an informative and unbiased report that supports students in selecting their final project without being influenced by personal opinions.
NOTE: You still need to individually review ALL the projects and fill out the project selection form, which is due on Sunday. The teams formed today are only temporary and the project is not necessarily the one you will be assigned for the semester.
The following are some terms we will use throughout the semester and that you may see in some of the project descriptions. Please let the instructors know if there are other terms we are using that should be included in this review.
Initial Project Breakdown: Split the main project into the major components that need to be completed. Think of components as a hierarchy with multiple levels. The top level is “the entire project”. Below that are the major components (ex. the four categories outlined in the project description). What is the best way to break down the project for 4-5 teammates such that each member of the team makes an equal contribution to the project? It is important to not think of a project as linear (from inputs to outputs). Instead, think about what parts will take the most time and assign members to those tasks early. With good planning you can research and build a component even if the prior steps are not complete.
Data Bibliography: For many of the projects it will be expected that the team build a data bibliography of other possible sources of publicly available data that may help answer the community partner’s questions. This is a lot like a paper bibliography, but we are referencing datasets. For a few projects the Data Bibliography may be a major contribution to the final results.
Data Dictionary: Also known as a “Schema”, this is a list of features in a data set and their ranges. A Data Dictionary may vary by project but is an extremely helpful resource when reviewing a new data set.
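As a rough illustration of the idea, the sketch below builds a small data dictionary from a list of records using only the Python standard library. The function name `build_data_dictionary`, the sample records, and the choice of reporting min/max for numeric features and distinct values for everything else are assumptions made for this example; a real project would adapt the summary to its own data.

```python
def build_data_dictionary(rows):
    """Summarize each feature in a list of row dicts.

    Numeric features get a (min, max) range; all other features get a
    sorted list of distinct values. Missing values (None) are counted.
    """
    # Collect feature names in first-seen order across all rows.
    names = []
    for row in rows:
        for name in row:
            if name not in names:
                names.append(name)

    dictionary = {}
    for name in names:
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        entry = {"missing": len(values) - len(present)}
        is_numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in present
        )
        if is_numeric:
            entry["type"] = "numeric"
            entry["range"] = (min(present), max(present))
        else:
            entry["type"] = "categorical"
            entry["values"] = sorted({str(v) for v in present})
        dictionary[name] = entry
    return dictionary


# Hypothetical sample records, invented for illustration only.
sample_rows = [
    {"age": 34, "county": "Ingham", "income": 52000.0},
    {"age": 51, "county": "Eaton", "income": None},
    {"age": 27, "county": "Ingham", "income": 61500.0},
]

data_dictionary = build_data_dictionary(sample_rows)
for feature, info in data_dictionary.items():
    print(feature, info)
```

Even a summary this small tends to surface useful questions early, such as why a feature has missing values or whether a range looks plausible.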
Open Loops: A component of the project with an unclear solution. An example of an open loop is “Review data and identify sources of error”. This loop is open because the team does not know what the sources of error are and thus can’t plan how the error will be resolved. A closed loop is something like “use Beautiful Soup to scrape website”. Although this task may not be done, there is a clear path to a solution and no obvious unknowns that will inhibit progress. Early on there may be many open loops that are easy to close. Even if they are easy, we still consider them to be open loops. For example, “Pick a programming language”.
MVP Milestone: This is the second major presentation required for the project (the first is the project plan and the last is the final report). The Minimum Viable Product (MVP) milestone is pivotal as we expect all teams to demonstrate an end-to-end solution for their project (i.e. no open loops).
Success Metrics: All projects are required to have realistic and measurable outcomes. For example, if the project uses machine learning, what is a reasonable target recognition rate? It is typically a bad idea to just guess and throw out a number like 90%. How would the team know if that number is even possible given the data? Instead, it is better to establish a baseline metric (using a simple algorithm) and then measure progress by how far the team can move from that baseline.
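One possible way to set up such a baseline for a classification task is sketched below: always predict the most common label seen in training, then report a model's accuracy relative to that. The majority-class baseline, the function names, and the labels are all assumptions chosen for illustration; other projects may need a different simple algorithm or a different metric.

```python
from collections import Counter


def accuracy(predictions, labels):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def majority_class_baseline(train_labels, test_labels):
    """Accuracy of always predicting the most common training label."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    baseline_predictions = [most_common_label] * len(test_labels)
    return accuracy(baseline_predictions, test_labels)


# Hypothetical labels and model output, invented for illustration only.
train_labels = ["cat", "cat", "cat", "dog", "cat", "dog"]
test_labels = ["cat", "dog", "cat", "cat", "dog"]
model_predictions = ["cat", "dog", "cat", "dog", "dog"]

baseline_score = majority_class_baseline(train_labels, test_labels)
model_score = accuracy(model_predictions, test_labels)
improvement = model_score - baseline_score

print(f"baseline={baseline_score:.2f} "
      f"model={model_score:.2f} "
      f"improvement={improvement:+.2f}")
```

Reporting the improvement over the baseline, rather than the raw score alone, gives the team a target that is grounded in what the data can actually support.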
Next Steps: A task that can be conducted immediately with no known blocks or prerequisites. The best next steps have clear criteria for completion. For example, a poor next step would be something like “write the project proposal”. A better next step may be “Write the outline for the project proposal”. An even better next step would be “On Thursday spend 20 minutes writing an outline for the project proposal and then share with the team”. What is nice about the last next step is that it has a clear goal, a clear finish, and a clear way to move forward.
Anticipated Challenges: Often related to an Open Loop, an Anticipated Challenge is something about the project that could cause it to be delayed or fail. Each anticipated challenge should include suggestions for what to do to avoid or get around the problem. For example, “What if there is not enough data to train the Neural Network? If so, maybe research pre-trained models and transfer learning.” Another example would be “It could take a long time to determine the best visualization for this project, so make sure preliminary results of the model are obtained early in the project (within the first four weeks) so there is enough time to iterate over different visualizations.” Another really good one is “What do we do if someone gets sick or drops the class?”
Written by Dr. Dirk Colbry, Michigan State University
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.