Day 12 Pre-class: Introduction to Data Wrangling#
✅ Put your name here
Learning Objective:#
Search for, locate, and contextualize data sets from the internet and develop troubleshooting methods for loading data sets into a Jupyter notebook.
In this assignment you will:
Discuss the impact of data context
Experiment with some of the ways you can load data into a Jupyter notebook
Identify the arguments needed for different ways to load data
Practice loading a data file in need of wrangling
This assignment is due by 11:59 p.m. the day before class, and should be uploaded into the appropriate “Pre-Class Assignments” submission folder on D2L. Submission instructions can be found at the end of the notebook.
Part 1: Contexting our Discussion#
Before we dive deeper into where to find data and how to get it in a form that is usable in your notebooks, we first want to build some context for why data sources matter.
Read Chapter 6, “The Numbers Don’t Speak for Themselves,” of Data Feminism (located HERE) and answer the following questions in the cell below:
What are some ways we can identify the complexity of what data sets actually represent?
Go back and look at Figures 6.6 and 6.7. Tell us about your interpretation of that sequence of plots.
Identify some places in CMSE 201 so far where data context was, or would have been, impactful.
Thinking about your personal ethics as someone developing data analysis skills, what ideas would you want to carry forward from this reading into your approach to data analysis and visualization?
NOTE: While your personal views may not completely align with those of the authors, you should seek to glean insights that make your data analysis and visualization efforts more impactful, regardless of the context or application. By digesting multiple perspectives and approaches to data analysis and visualization, we can strive to make our work as high quality as possible.
….
….
….
Part 2: Finding Data#
Below are a few places (among many!) to find data sets. Choose three data sets (without worrying about whether they are “good” or not) and download them. In the cell below, paste links to the files.
Sites:
LINKS HERE:
Part 3: Loading and Contexting Data#
Watch the video below to see some of the ways we can load data into Jupyter notebooks! Then answer the questions below.
from IPython.display import YouTubeVideo
# Embed the video on loading data into Jupyter notebooks
YouTubeVideo("KxBgGdDP95Y", width=640, height=360)
In the cell below, include links to the package documentation for the different options mentioned in the video.
What are some of the common arguments used when loading in data files?
What are some common challenges you might run into when loading in data files? What challenges have you already run into?
✎ Put your answers here.
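As one illustration of the kinds of arguments the questions above ask about, the sketch below reads a small comma-separated text with NumPy's `np.loadtxt`. The file contents are made up and held in memory with `io.StringIO` so the cell runs on its own; with a real file you would pass its path instead. Which arguments you need depends entirely on your file, so treat this as an example rather than a template.

```python
import io
import numpy as np

# Hypothetical file contents (made-up values), kept in memory for illustration.
raw_text = """# station log (made-up values)
time,temperature,humidity
0,20.5,31
1,21.0,30
2,21.7,29
"""

# delimiter: the character that separates columns
# skiprows:  number of leading lines to skip (here the comment line and the column names)
# usecols:   which columns to keep (here the first two)
data = np.loadtxt(io.StringIO(raw_text), delimiter=",", skiprows=2, usecols=(0, 1))

print(data.shape)
print(data)
```

Changing `skiprows` to 1 here would make `np.loadtxt` try to convert the column names to numbers and fail, which is a common first error when loading text files.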
In previous assignments, we have loaded data that is ready to use with tools like numpy and pandas. In the cell below, choose one of your data sets and try to read it in with no additional arguments. Then answer the questions below.
# put your code here
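To show what "no additional arguments" can look like in practice, here is a minimal sketch using `pd.read_csv` on a made-up, semicolon-separated text held in memory (an assumption for illustration; your own file will differ). The read succeeds without an error, yet the result is not in a usable form, which is why it is worth inspecting the shape and column names right after loading.

```python
import io
import pandas as pd

# Hypothetical file contents (made-up values) that use ";" between columns.
raw_text = """city;year;count
Lansing;2020;14
Detroit;2020;52
"""

# No additional arguments: pandas assumes commas separate the columns.
df_default = pd.read_csv(io.StringIO(raw_text))

# Quick checks on what was actually loaded.
print(df_default.shape)          # rows and columns
print(list(df_default.columns))  # column names as pandas understood them
print(df_default.head())
```

Because the file contains no commas, every row lands in a single column whose name is the entire header line.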
✅ Question#
If your data was read in, what does it look like when you view it in the notebook? Is it in a usable form?
If your data was not read in, what errors do you see, and how might you address them?
✎ Put your answers here.
✅ Let’s try again!#
If your data was not read in properly above, try adjusting your arguments and see if you can get it loaded in.
If your data was read in properly, try an additional data set here!
# put your code here
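As a sketch of what adjusting arguments might involve, the example below loads a made-up text that has two lines of notes above the column names, uses semicolons between columns, and marks a missing value with the word "unknown". All of these details are assumptions chosen for illustration; the arguments your file needs may be different.

```python
import io
import pandas as pd

# Hypothetical file contents (made-up values) in need of some wrangling.
raw_text = """Survey export (made-up values)
Generated for illustration only
city;year;count
Lansing;2020;14
Detroit;2020;unknown
"""

df = pd.read_csv(
    io.StringIO(raw_text),
    sep=";",                # columns are separated by semicolons
    skiprows=2,             # skip the two lines of notes above the column names
    na_values=["unknown"],  # treat this marker as a missing value
)

print(df)
print(df.dtypes)
print(df["count"].isna().sum(), "missing value(s) in 'count'")
```

Checking `df.dtypes` after loading is a quick way to spot a numeric column that was read as text because of a stray marker like this one.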
✅ Reflecting#
What arguments did you use to read in your data?
What steps did you need to take to figure out how to read in your data?
✎ Put your answers here.
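One step that often helps when working out which arguments a file needs is to look at its first few raw lines before parsing anything. The sketch below does this with the Python standard library only. It writes a made-up file to a temporary folder so that it is self-contained (the file name and contents are hypothetical), then uses `csv.Sniffer` to guess the delimiter. The guess is a heuristic and can be wrong, so it is worth confirming by eye.

```python
import csv
import tempfile
from itertools import islice
from pathlib import Path

# Hypothetical file written to a temporary folder so the example runs on its own.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "example.csv"
    path.write_text("city;year;count\nLansing;2020;14\nDetroit;2020;52\n")

    # Read only the first few raw lines, without parsing them.
    with open(path) as f:
        first_lines = list(islice(f, 5))

# repr() makes separators and line endings visible.
for line in first_lines:
    print(repr(line))

# Guess the delimiter from the sample, considering only a few candidates.
dialect = csv.Sniffer().sniff("".join(first_lines), delimiters=",;\t")
print("Guessed delimiter:", repr(dialect.delimiter))
```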
✅ Contexting your Data Set#
Choose one of the three data sets you identified, and do your best to answer the following questions for that data set:
Who collected/generated the data?
How was the data collected/generated?
Who/what is included in the data?
Who/what is not included in the data?
What are the limitations or biases of the data?
Note: If the information needed to answer the questions above is not available, that is your answer! You may also need to search for information about the data beyond the source you downloaded it from.
✎ Put your answer here.
Assignment wrap-up#
Please fill out the form that appears when you run the code below. You must completely fill this out in order to receive credit for the assignment! Press “shift + enter” to execute the cell and bring up the assignment survey.
from IPython.display import HTML

# Embed the assignment survey in the notebook
HTML(
    """
    <iframe
        src="https://cmse.msu.edu/cmse201-pc-survey"
        width="800px"
        height="600px"
        frameborder="0"
        marginheight="0"
        marginwidth="0">
        Loading...
    </iframe>
    """
)
Acknowledgements: This assignment was originally designed by Dr. Rachel L.S. Frisbie as part of the MSU STEM Teaching and Learning Fellowship and updated for broad implementation with CMSE Graduate Students Emily Bolger and Rachel Roca.
Congratulations, you’re done!#
Submit this assignment by uploading it to the course Desire2Learn web page. Go to the “Pre-class assignments” folder, find the appropriate submission folder link, and upload it there.
See you in class!
© Copyright 2025, Department of Computational Mathematics, Science and Engineering at Michigan State University.