Day 12 In-Class: Data Contextualization and Wrangling
✅ Put your name here
✅ Put your group member names here
Learning Goal:
Search for, locate, and contextualize data sets from the internet, and develop troubleshooting methods for loading data sets into a Jupyter notebook.
Practices:
Articulate the context for a data set
Experiment with some of the ways you can load data into a Jupyter notebook
Identify the arguments needed for different ways to load data
Practice loading a data file in need of wrangling
Part 1: Pre-class Review/Discussion (15 minutes)
✅ Questions
In the pre-class, you explored two main ideas: data contextualization and data wrangling.
As a group, discuss and answer the following questions about the Data Feminism chapter (and record your notes below):
Summarize the main ideas of data contextualization and data wrangling.
What visualizations stood out to you and why?
Thinking about your personal ethics as someone developing data analysis skills, what ideas would you want to carry forward from this reading into your approach to data analysis and visualization?
What did you discover (or not discover) when you went to find the context for your data sets in the pre-class?
✎ Summarize your discussion here.
✅ Task
After you have talked through data contextualization from the pre-class, discuss as a group the tools you needed to load your data into your notebooks in the pre-class and record a list of the tools/arguments below. Include specific examples from each of your group members’ experiences.
✎ Put your tools here!
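For reference, here is a sketch of a few `pandas.read_csv` arguments that often come up when a file does not load cleanly. The file contents are made up for illustration, and `io.StringIO` stands in for a real filename or URL. Your own files may need different arguments.

```python
import io
import pandas as pd

# Hypothetical raw file: two note lines above the header, semicolon-separated
# columns, and "-999" used as a code for missing values.
raw_text = """Source: example agency
Downloaded: (date)
year;site;value
2019;A;12.5
2020;A;-999
2021;B;14.0
"""

df = pd.read_csv(
    io.StringIO(raw_text),  # stands in for a filename or URL
    sep=";",                # columns are separated by semicolons, not commas
    skiprows=2,             # skip the two note lines above the header row
    na_values=["-999"],     # treat the sentinel code as missing (NaN)
)
print(df)
print(df.dtypes)
```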
Part 2: Practice with Loading Data
✅ Question
Below are the links to three different data sets. Work together as a group to:
Obtain the data.
Load the data into your notebook.
Make at least one change to make the data more usable.
Links:
As you work, take notes on what tools/advice you used from Part 1 (you’ll want it to answer the discussion questions)!
# put your code here
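Here is one possible shape for this work, assuming a CSV-style file: load it, then make one or two changes so it is easier to use. The data and column names below are hypothetical stand-ins, since the original links are not reproduced here. With a real link, you could pass the URL string directly to `pd.read_csv`.

```python
import io
import pandas as pd

# Hypothetical file contents: messy column names, numbers stored with
# thousands separators, and "N/A" marking a missing value.
raw_text = """ Park Name , Year , Recreation Visits
Park A,2020,"1,200"
Park A,2021,"1,450"
Park B,2020,N/A
"""

df = pd.read_csv(io.StringIO(raw_text), na_values=["N/A"])

# Change 1: tidy the column names (strip spaces, lowercase, use underscores).
df.columns = df.columns.str.strip().str.lower().str.replace(" ", "_")

# Change 2: "1,200" was read as text, so remove the commas and convert to numbers.
df["recreation_visits"] = pd.to_numeric(
    df["recreation_visits"].str.replace(",", ""), errors="coerce"
)

# Change 3: drop rows where the measurement is missing.
df = df.dropna(subset=["recreation_visits"])
print(df)
```

Each change is a choice about how the data is represented. For example, dropping missing rows removes Park B entirely, so note such choices as you work.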
✅ Question
In the cell below, describe the strategies you used and the challenges you encountered.
✎ Summarize your challenges here!
Part 3: Finding Data to Answer an Exploratory Question
✅ Task
Below are several questions that require data (and your expert Python skills) to answer them. Your task is to choose one of the following questions and use internet resources to find a dataset that could answer the question. You can use the sites from the pre-class to get started, but you should also try a broader search as well.
Q1: Which New York City neighborhoods have seen the largest changes in deaths due to air pollution?
Q2: For a single US state, what is the distribution of Powerball lottery numbers over the last ~10 years?
Q3: How has national park usage/visitation changed over time?
Q4: In the 2015 Major League Baseball season, how many games did each team win/lose?
Q5: How many exoplanets are discovered each year (since the first discovery in 1992)?
✎ Write the exploratory question you chose here!
Part 4: Putting it All Together
✅ Task
Now that you have some practice articulating data context and finding data, it’s time to combine your skills, much as you will be asked to do in your semester projects!
Obtain a dataset that will (hopefully) answer your group’s chosen exploratory question.
Then, do your best to answer the data contextualization questions (as you did in the pre-class):
Who collected/generated the data?
How was the data collected/generated?
Who/what is included in the data?
Who/what is not included in the data?
What are the limitations or biases of the data?
✎ Write your answers to the questions above here and include the link to your dataset
✅ Task
Now, load the dataset into your notebook in the cell(s) below. If you need to clean up your data to be usable, put those steps here! Remember that you can also open the data file(s) in Jupyter if you are having trouble with loading/cleaning.
Make sure everyone in your group is able to get the data loaded and cleaned.
# put your code here!
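If a file fails to load, one troubleshooting approach is to print its first few raw lines in Python (similar to opening it in Jupyter's file viewer) and find where the real header row starts. So that it runs on its own, this sketch first writes a small made-up file to a temporary folder. The filename, note lines, column names, and values are all assumptions.

```python
import itertools
import os
import tempfile

import pandas as pd

# Create a small hypothetical file; in practice, use the file you downloaded.
contents = (
    "Example export (hypothetical)\n"
    "Generated by an example data portal\n"
    "year,value\n"
    "2001,3\n"
    "2002,5\n"
)
path = os.path.join(tempfile.mkdtemp(), "example_data.csv")
with open(path, "w") as f:
    f.write(contents)

# Step 1: look at the first few raw lines to see what the file really contains.
with open(path) as f:
    first_lines = list(itertools.islice(f, 5))
for i, line in enumerate(first_lines):
    print(i, repr(line))

# Step 2: find the header row and tell pandas to skip every line above it.
header_row = next(i for i, line in enumerate(first_lines) if line.startswith("year,"))
df = pd.read_csv(path, skiprows=header_row)
print(df.head())
```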
✅ Task
Looking at your dataset, answer the following questions:
What columns do you think will be most helpful for answering your question?
What do those columns mean/stand for?
Why do you think they might answer your question?
✎ Write your answers here
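A few quick `pandas` calls can help you answer the column questions above. The table below is hypothetical. Replace it with your own loaded `DataFrame`, and check the dataset's documentation for what each column means.

```python
import pandas as pd

# Hypothetical table standing in for your cleaned dataset.
df = pd.DataFrame({
    "team": ["Team A", "Team B", "Team C"],
    "wins": [90, 75, 81],
    "losses": [72, 87, 81],
})

print(df.columns.tolist())   # the column names
print(df.dtypes)             # the type of each column (numbers vs. text)
print(df.describe())         # summary statistics for the numeric columns
print(df["team"].nunique())  # how many distinct values a column has
```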
✅ Task
Now that your data is loaded and cleaned, you are going to make a plot (e.g., a scatter plot, line plot, histogram, or bar plot).
# put your code here
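One possible starting point is a minimal line plot with `matplotlib`, assuming your data has a year-like column and a numeric column. The DataFrame and its column names are hypothetical placeholders. A scatter plot (`ax.scatter`), histogram (`ax.hist`), or bar plot (`ax.bar`) follows the same pattern.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical yearly values standing in for your dataset's columns.
df = pd.DataFrame({"year": [2016, 2017, 2018, 2019, 2020],
                   "value": [10, 12, 9, 15, 14]})

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(df["year"], df["value"], marker="o")  # one point per year, joined by lines
ax.set_xlabel("Year")
ax.set_ylabel("Value (units from your data's documentation)")
ax.set_title("Value per year")
ax.set_xticks(df["year"])  # one tick per year instead of fractional years
plt.show()
```

Labeling the axes with units taken from the data's documentation makes the plot easier to interpret in the reflection below.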
✅ Reflection
Answer the following questions about your plot:
What is your plot showing?
What part of your exploratory question does it answer?
What information might be missing or incomplete?
How does the context of the data you articulated affect your interpretation of the plot?
✎ Write your answers here!
Part 5: Sharing your Work
As a group, prepare ~3 slides showing the steps you took to answer your research question. You should have:
1 slide on your research question and data context
1 slide on the steps you took to make your data usable
1 slide with your completed plot(s).
Make sure to explicitly discuss what choices you made in your data cleaning and plot making process and how this affects the representation of the data and the associated claims you can make (e.g. what is your data showing? Who/what is missing from the data? What information does the plot not capture?). You will be asked to do this on your final project, so this is great practice! When you are making slides, also consider what your classmates might find helpful if they encountered similar data.
Each group member is responsible for presenting some component of your group’s process to the class. As you listen to other students, you are responsible for taking notes on the things that each group needed to do to get their data usable!
✎ Put your notes here!
Acknowledgements: This assignment was originally designed by Dr. Rachel L.S. Frisbie as part of the MSU STEM Teaching and Learning Fellowship and updated for broad implementation with CMSE Graduate Students Emily Bolger and Rachel Roca.
Assignment wrap-up
Please fill out the form that appears when you run the code below. You must completely fill this out in order to receive credit for the assignment!
# Embed the course's end-of-class survey form in the notebook output.
from IPython.display import HTML
HTML(
"""
<iframe
src="https://cmse.msu.edu/cmse201-ic-survey"
width="800px"
height="600px"
frameborder="0"
marginheight="0"
marginwidth="0">
Loading...
</iframe>
"""
)
Congratulations, you’re done!
Submit this assignment by uploading your notebook to the course Desire2Learn web page. Go to the “In-Class Assignments” folder, find the appropriate submission link, and upload everything there. Make sure your name is on it!
Copyright © 2025, Department of Computational Mathematics, Science and Engineering at Michigan State University, All rights reserved.