Michigan State University Data Science Capstone.
The capstone project was to determine if “justice is blind” by looking at legislation data from judges in Michigan. Students were given links to a website with a database and were able to successfully build a web scraper. Unfortunately, the data did not include key information (race and gender) needed to answer any of the research questions asked by the community partner. Some students felt the project “failed,” were frustrated, didn’t know how to proceed and lost motivation.
The team’s community partner was a very large corporation, and the students had the opportunity mid-semester to give a presentation to the CEO. This was a big deal, and the community partner contact asked the students to give a “practice” presentation a month before the CEO visit. The students were already stressed because this course is a lot of work, and neither of these presentations was required as part of the course milestones. The students figured that these presentations wouldn’t count toward their actual grade and delayed working on the practice presentation until the last minute. The practice went poorly, and the community partner contact was so unhappy with the students’ work that they cancelled the CEO visit. The students felt relieved that they didn’t have to talk to the CEO and continued on with the project. The students never told the instructor about the presentation requests, so the instructor was blindsided when the community partner contact complained about the students’ poor performance.
A student team was paired with a community partner for a semester-long consulting project. The partner was enthusiastic and met with the team every week, which initially seemed like a great sign. However, the meetings quickly turned into weekly task assignments given by the partner to the students. Instead of collaborating on a semester-long plan, the partner gave the students new instructions each week, often changing direction or adding new tasks without considering the team’s overall timeline or course deliverables.
The students tried to balance the partner’s weekly requests with the structured milestones required by the course. This led to long hours, duplicated work, and growing frustration. The team felt like they were constantly reacting instead of planning. They eventually reached out to the instructor for help.
After the instructor intervened, the situation improved slightly, but the partner began to express dissatisfaction with the students, accusing them of being unmotivated and unprofessional. The relationship became strained, and the team struggled to maintain a positive connection with the community partner while still meeting course expectations.
In the end, the team delivered a solid final product, but the experience left them feeling discouraged and unsure how to handle similar dynamics in the future.
Two weeks before the final project was due, the instructor received a long email from a student complaining about another team member who they felt was not contributing sufficiently to the project. The email was extremely detailed and documented various interactions that had raised concerns within the team throughout the semester. However, this email was the first time that anyone had told the instructor there were problems. The student wanted the team member to get a lower grade than everyone else, and wanted the instructor to intervene and tell that team member that they were not meeting expectations.
The scope of this capstone project was to find freely available data to answer a set of research questions about “Smart Cities.” The problem was that there were too many data sources (for example, Wi-Fi networks, types and locations of businesses, and city websites), and none of them was organized with specific information that would answer any of the research questions directly. Cleaning and organizing the data would have to be done by hand, representing many hours of manual labor, and even after that work was done there was no clear indication that the data would be sufficient to answer the research questions. Students became overwhelmed with the options and had no clear path forward. Unfortunately, the project’s community partners were not data experts and so could not guide the team, who lost a lot of time continuing their search for the “perfect” dataset.
A capstone team was assigned a classification project (i.e., a project where they are given labeled data and need to use machine learning to train a classifier). After a discussion with the community partner, there was consensus that an accuracy of at least 85% seemed reasonable for what the partner needed.
The team got into trouble when they kept trying different algorithms with different hyperparameters and couldn’t get their accuracy above 72%. The team took a long time to tell the community partner there was a problem, and when they did, the community partner was very upset at such a terrible result and told the team to keep trying.
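One way a team in this position might frame a number like 72% before talking to the partner is to compare it against a trivial baseline, such as always predicting the most common class. The sketch below is not from the case: the function name and the label counts are hypothetical, and it only illustrates the idea that an accuracy figure is easier to discuss when it sits next to a baseline.

```python
from collections import Counter


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most common label."""
    if not labels:
        raise ValueError("labels must not be empty")
    # Count of the single most frequent label in the data.
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Hypothetical label distribution, invented for illustration only.
example_labels = ["pass"] * 60 + ["fail"] * 40
baseline = majority_baseline_accuracy(example_labels)
print(f"Majority-class baseline accuracy: {baseline:.2f}")
```

If the baseline on the real data were close to the model's accuracy, that would suggest the features carry little signal; if it were much lower, the model would be doing real work even though it falls short of the agreed target. Either reading gives the team something concrete to bring to the partner early.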
A community partner wanted to analyze high-dimensional 3D data of materials and build a regression model that connected the data to mechanical measurements. The project materials included a paper (from the 1990s) describing a similar study that relied on quite a bit of expensive manual measurement of materials (i.e., no 3D data) and used simple linear correlation.
The community partner wanted to take advantage of automatic data gathering using 3D scanning technology and then use deep learning methods to build an automated model similar to the one in the 1990s paper. Unfortunately, we didn’t know until late in the semester that they only had seven (7) 3D models with 7 labeled points.
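With only seven labeled samples, one option a team could consider is falling back to a very simple model and evaluating it with leave-one-out cross-validation, so that every sample is used for testing exactly once. The sketch below assumes a single numeric feature has already been extracted from each 3D model; that assumption, the function names, and the numbers are all hypothetical and are not taken from the project.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def leave_one_out_mae(xs, ys):
    """Mean absolute error when each sample is held out once."""
    errors = []
    for i in range(len(xs)):
        # Train on every sample except the i-th one.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        prediction = slope * xs[i] + intercept
        errors.append(abs(ys[i] - prediction))
    return sum(errors) / len(errors)


# Seven made-up (feature, measurement) pairs, for illustration only.
feature = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
measurement = [2.1, 3.9, 6.2, 8.1, 9.8, 12.2, 13.9]
print(f"Leave-one-out MAE: {leave_one_out_mae(feature, measurement):.3f}")
```

A sketch like this does not make seven samples sufficient; it only gives an honest, if noisy, estimate of how a simple model behaves, which can support a conversation with the partner about collecting more labeled data.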
Members of a team came to the instructors asking for help with a fellow teammate. The teammate had been given the task of building a front-end GUI for their project. The problem wasn’t that the teammate was slacking in their role. In fact, you could say that the opposite was true: the teammate was working hard on the GUI, and the design of the result looked impressive. The problem was that the teammate was not sharing everything the others needed to get the code working on their own machines and just wanted to work alone. The team was worried the GUI was getting too complex, and the teammate was not looping any of the other teammates into the design. Because they didn’t understand how the GUI worked, they didn’t feel confident that their parts of the project (the back end) were going to fit in or even work with the front end. The teammate didn’t seem too concerned and just wanted to work independently from the rest of the team. The team decided to let the teammate work alone and, behind the teammate’s back, built a second GUI as a backup. In the end, they submitted both solutions, which really didn’t work together.
A student team was excited to begin work on a data-driven project with a community partner. The project involved analyzing proprietary customer behavior data to help the partner improve their service delivery. However, early in the semester, the team learned that the university and the community partner had not yet finalized a Non-Disclosure Agreement (NDA).
The NDA was necessary before the partner could legally share any data. Weeks passed as legal teams on both sides negotiated terms. The students grew increasingly frustrated. Without the data, they felt stuck and unsure how to proceed. They spent time waiting, checking in with the partner, and worrying about falling behind. Eventually, the NDA was signed—but by then, the semester was halfway over.
Despite the delay, the team managed to deliver a final report, but it lacked depth and polish. In hindsight, they realized there were many things they could have done earlier to prepare for the data and make better use of their time.
A student team was tasked with building a classification model to support a community partner’s decision-making process. Early in the project, the team realized that there were many possible machine learning models to choose from. Uncertain about which model would perform best, they made a strategic decision: each team member would independently explore a different model. This approach encouraged exploration and individual initiative. However, as the semester progressed, the team struggled to maintain cohesion. Each student treated their model as a standalone project, using different preprocessing techniques, evaluation metrics, and reporting styles. By the time of the final presentation, the team had five separate solutions to the same problem, but no unified comparison or synthesis.
The final report reflected this fragmentation. Rather than presenting a cohesive narrative or a comparative analysis of model performance, the report consisted of five loosely connected sections. The team had not agreed on common evaluation criteria, making it impossible to determine which model was most effective. The community partner was left without clear guidance, and the team missed an opportunity to demonstrate collaborative data science practices.
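One way to keep independently explored models comparable is to agree early on a single evaluation harness that applies the same data splits and the same metric to every candidate. The sketch below is a minimal illustration under stated assumptions: the `compare_models` interface, the two toy training functions, and the toy data are all hypothetical, and accuracy is used only as a stand-in for whatever metric a team and its partner would agree on.

```python
import random
from collections import Counter


def make_folds(n_samples, k, seed=0):
    """Shuffle indices once and deal them into k folds shared by all models."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def compare_models(models, X, y, k=5, seed=0):
    """Score every model on identical folds with an identical metric.

    `models` maps a name to a training function; each training function
    takes (X_train, y_train) and returns a predict(x) callable.
    """
    folds = make_folds(len(X), k, seed)
    results = {}
    for name, train in models.items():
        scores = []
        for fold in folds:
            held_out = set(fold)
            X_train = [X[i] for i in range(len(X)) if i not in held_out]
            y_train = [y[i] for i in range(len(y)) if i not in held_out]
            predict = train(X_train, y_train)
            y_pred = [predict(X[i]) for i in fold]
            scores.append(accuracy([y[i] for i in fold], y_pred))
        results[name] = sum(scores) / len(scores)
    return results


def train_majority(X_train, y_train):
    """Toy model: always predict the most common training label."""
    most_common = Counter(y_train).most_common(1)[0][0]
    return lambda x: most_common


def train_threshold(X_train, y_train):
    """Toy model: split one feature at the midpoint of the class means."""
    zeros = [x[0] for x, label in zip(X_train, y_train) if label == 0]
    ones = [x[0] for x, label in zip(X_train, y_train) if label == 1]
    threshold = (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2
    return lambda x: int(x[0] >= threshold)


# Toy, well-separated data invented for illustration only.
X = [[float(v)] for v in range(10)] + [[float(v)] for v in range(100, 110)]
y = [0] * 10 + [1] * 10
models = {"majority": train_majority, "threshold": train_threshold}
for name, score in compare_models(models, X, y).items():
    print(f"{name}: mean accuracy {score:.2f}")
```

Because every model sees the same folds and is scored the same way, the resulting numbers can go into one table in the final report, which is the comparison the five separate sections lacked.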
A student team was assigned a predictive modeling project for a community partner. While all members had some Python experience, their familiarity with Git and machine learning varied significantly. One student had prior experience with version control and model development, while others were still learning how to use Git effectively and were new to concepts like hyperparameter tuning and model evaluation.
The experienced student took on most of the technical work, including managing the Git repository and building the core model. Other team members contributed to documentation and presentation design but felt disconnected from the technical process. The team didn’t establish shared workflows or take time to build common understanding, which led to uneven contributions and limited collaboration.
During the final presentation, only one student could explain the modeling pipeline and Git workflow. The others struggled to engage with technical questions, and the final report reflected a lack of shared ownership. The project met its goals, but the team missed an opportunity for inclusive learning and skill development.
A student team was excited to use a cutting-edge large language model (LLM) for their community partner project. The model promised advanced capabilities for text classification and summarization, and two students strongly advocated for it, citing its popularity and recent breakthroughs. Since the tool required a paid license, the team submitted a short proposal to the instructor, which was approved in about a week.
Once they gained access, the team began integrating the model into their workflow. However, after a few weeks of experimentation, they discovered that the model’s API was more limited than expected. It didn’t support key features they had assumed were available, and adapting it to their domain-specific dataset proved difficult. The team had invested heavily in this tool and didn’t have a backup plan.
As the semester progressed, they struggled to pivot. The final presentation focused more on the challenges they faced than on actionable results. The community partner appreciated their effort but was left without a usable solution. The team reflected that their decision had been driven more by excitement than by careful evaluation.
A student team was assigned a semester-long data science project with a community partner. The team was on top of the course milestones, completing some of them early and even taking breaks to celebrate their progress. They had strong communication, divided assigned tasks fairly, and supported each other throughout the semester.
However, the team treated the course milestones as their primary roadmap and didn’t develop their own internal timeline. When they reached the Minimum Viable Product (MVP) milestone, their submission was extremely weak and consisted more of a rough sketch than a usable prototype. They hadn’t realized how much work would be needed after the MVP to refine, validate, and prepare the final deliverables.
As the semester neared its end, they found themselves short on time. Despite their strong collaboration and steady progress, the final product felt rushed. The community partner received a working solution, but the team knew it could have been more impactful with better planning. They had done everything “right” on paper, but hadn’t taken full ownership of the project timeline.
Written by Dr. Dirk Colbry, Michigan State University
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.