Day 13 Pre-class Assignment: Finding resources online and tinkering with code
✅ Put your name here
Goals for today’s pre-class assignment
Find appropriate datasets that can be used for analysis
Work with new coding tools that you find online
Assignment instructions
This assignment is due by 11:59 p.m. the day before class, and should be uploaded into the appropriate “Pre-class assignments” submission folder. If you run into issues with your code, make sure to use Teams to help each other out and receive some assistance from the instructors. Submission instructions can be found at the end of the notebook.
It is important that you do your best to complete the pre-class assignment! Going through this assignment and trying to complete it to the best of your ability will help to make sure you’re prepared for the content that is covered in class.
Part 1: Finding Data Online
Thus far in the course, we’ve usually provided you with the datasets for our assignments. However, there is a vast treasure trove of data on the internet, and we want to ensure that you can find and access these resources online. Aside from being an important skill to develop in general, this will also prove invaluable for most of you when working on your semester projects.
Not all datasets are created equal. There are several important qualities that you should look for as you pick a dataset:
Use data that comes from reliable sources. There should be a citation you can give for any data you end up using. Often, some of the most reliable sources are government agencies such as the CDC, the FDA, and (strangely) the CIA, as well as large NGOs and non-profit organizations (e.g., the World Bank). This isn’t necessarily a hard and fast rule for our purposes here, but it works well as a general rule of thumb.
The data should be rich. The power that computers have brought to modern data analysis is that they allow us to look at many variables in massive data sets. You should look for data that contains multiple variables (4+) and has many values for each variable (greater than 1000).
…But not too rich (just for this assignment). We don’t want you spending too much time trimming down data for this assignment (not true for your semester project!). We recommend looking for datasets that are less than 5 MB.
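The richness and size guidelines above can be checked quickly in code. The following is a minimal sketch, not an official part of the assignment: the function name `check_dataset`, the threshold constants, and the synthetic demo data are all made up for illustration. To try it on a real dataset, pass the path to your own CSV file instead of the demo file.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Thresholds taken from the guidelines above (constant names are hypothetical).
MIN_VARIABLES = 4   # "multiple variables (4+)"
MIN_ROWS = 1000     # "greater than 1000" values per variable
MAX_SIZE_MB = 5     # "less than 5 MB"


def check_dataset(csv_path):
    """Return a dict of simple richness and size checks for a CSV file."""
    df = pd.read_csv(csv_path)
    size_mb = os.path.getsize(csv_path) / 1e6
    return {
        "n_variables": df.shape[1],
        "n_rows": df.shape[0],
        "size_mb": size_mb,
        "enough_variables": df.shape[1] >= MIN_VARIABLES,
        "enough_rows": df.shape[0] > MIN_ROWS,
        "small_enough": size_mb < MAX_SIZE_MB,
    }


# Build a small synthetic CSV so the example runs without downloading anything.
rng = np.random.default_rng(seed=0)
demo = pd.DataFrame(
    {
        "year": rng.integers(1990, 2023, size=1500),
        "region": rng.choice(["north", "south", "east", "west"], size=1500),
        "yield_tons": rng.normal(50, 10, size=1500),
        "rainfall_mm": rng.normal(800, 150, size=1500),
    }
)
demo_path = os.path.join(tempfile.mkdtemp(), "demo_dataset.csv")
demo.to_csv(demo_path, index=False)

# Print each check on its own line.
report = check_dataset(demo_path)
for name, value in report.items():
    print(f"{name}: {value}")
```

A check like this only covers size and shape. Whether the source is reliable is still something you have to judge yourself.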
Here is a list of potential resources we recommend for finding data. You don’t need to use any of these, but they can be helpful if you’re unsure of where to start. This list is by no means exhaustive, and you should feel free to search around for other good data sources as well.
Kaggle: Kind of a random assortment of data sets. We’ve found everything from data on roadkill in Vermont to an exhaustive compilation of memes.
World Bank: Economic Data
And this article does a nice job of summarizing a large number of places where you can find datasets that make for good data analysis projects.
Watch the video below (or watch it on MediaSpace) showing an example of finding data from the FAO website.
from IPython.display import YouTubeVideo
YouTubeVideo("_WFZj2x6I0g", width=640, height=360)
✅ Task
Your task for this section is to find a data set. You will be sharing the dataset that you find with your group in class, so you should prepare a quick summary of your data. You should make a single slide with important information about your dataset including (but not limited to):
The source of your dataset
The variables in your dataset.
What makes your dataset interesting.
You may wish to use Google Slides or a shared PowerPoint presentation (e.g., via OneDrive) to make your slide, so that you can easily share it with your group mates.
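If you are unsure how to pull together the variable information for your slide, a few pandas calls usually get you most of the way there. The sketch below is only an illustration: the tiny DataFrame and its column names are hypothetical stand-ins, and you would replace them by loading your own file with `pd.read_csv`.

```python
import pandas as pd

# Hypothetical stand-in data; for a real dataset, load it with
# df = pd.read_csv("your_file.csv") instead.
df = pd.DataFrame(
    {
        "country": ["A", "B", "C", "D"],
        "year": [2019, 2020, 2021, 2022],
        "population_millions": [10.2, 55.1, 3.4, 120.9],
    }
)

# Variable names and the type of data each one holds.
print("Variables and their types:")
print(df.dtypes)

# How many observations the dataset contains.
print("\nNumber of rows:", len(df))

# Basic statistics (count, mean, min, max, ...) for the numeric variables.
print("\nSummary statistics for numeric variables:")
print(df.describe())

# Missing values are worth mentioning when you describe your data.
print("\nMissing values per variable:")
print(df.isna().sum())
```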
NOTE: If you have an idea for a data set you’d like to work with but are having trouble finding data in any of the resources we’ve provided, we strongly recommend searching Google! You can search for something like “(TOPIC YOU CARE ABOUT) data csv.” Hopefully you’ll be able to find something that works!
Part 2: Using New Coding Tools
A critical skill for you to develop is the ability to find and use new coding tools from online sources. Thus far, we’ve often provided you with the tools that you need for class, mainly from NumPy, Matplotlib, and pandas. However, there are many (many) more tools available to the Python coder that we haven’t talked about. If you want to learn these tools on your own, it’s best to use a coding skill that we’ll refer to as tinkering.
Code tinkering
Tinkering is where you experiment with a functioning piece of code that is new to you to understand it well enough to use it for your purposes.
Note: Tinkering is not an official term; it’s just what we’re calling this kind of code experimentation.
Important Ideas for Tinkering with Code
Always have a copy of the original code, in case you break things too much and need to start over. You can use the copy/comment/paste method (not officially recognized as a thing; just what we’re calling it).
Make sure there is some output to see the effect of the changes you make. Either plot the results or print them, with a preference for plotting (easier to see changes!)
Look for specific values in the code that you can change. Change the value of any numbers/strings you see.
Test your understanding of the code. When you feel like you understand how things work, make some predictions about what will happen if you change specific values.
Use your understanding of the code to get it to work for your purposes. Ultimately, we tinker with code to figure out how to use new tools that we find online.
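The ideas above can be sketched with a very small plotting example. This is just one way the copy/comment/paste method might look. The "original" snippet here is a made-up stand-in, not code from any particular website, and the variable name `frequency` is our own addition.

```python
import numpy as np
import matplotlib.pyplot as plt

# --- Original example, kept as a commented-out copy in case we break things ---
# x = np.linspace(0, 10, 100)
# fig, ax = plt.subplots()
# ax.plot(x, np.sin(x), color="blue", linewidth=1)

# --- Tinkered copy: change one value at a time and look at the plot ---
x = np.linspace(0, 10, 100)

# Prediction before running: doubling the frequency should give roughly
# twice as many wiggles across the same x-range.
frequency = 2

fig, ax = plt.subplots()
# Changed values compared with the original: color, linewidth, and frequency.
line, = ax.plot(x, np.sin(frequency * x), color="red", linewidth=3)
ax.set_title(f"sin({frequency}x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
```

In a notebook, the figure appears when the cell finishes running. Compare it with your prediction, then change another value (for example, `linewidth` or the number of points passed to `np.linspace`) and run the cell again.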
Watch the video below (or on MediaSpace) for a worked example of code tinkering.
from IPython.display import YouTubeVideo
YouTubeVideo("UTVR2DlQrRw", width=640, height=360)
✅ Task
Your task for this section is to go to the Matplotlib Gallery and find a “new-to-you” or “new-to-our-class” plotting tool or utility. Experiment with the code until you feel that you can use the tool for your purposes. You will be sharing (and potentially using) your new plotting tool with your group, so it’s important that you understand it well enough to get it to work. Write a quick summary of your plotting tool that you can share with your group, and be prepared to show an example of the tool in action.
# Use this space to tinker with the code for the plotting tool you found
✎ Write a summary of your new plotting tool here (e.g. what does it do? how does it work?)
# Put a functioning piece of code here that showcases what this tool can do
# Ideally this should be something more than the example code you started with
Assignment wrap-up
Please fill out the form that appears when you run the code below. You must completely fill this out in order to receive credit for the assignment!
from IPython.display import HTML
HTML(
"""
<iframe
src="https://cmse.msu.edu/cmse201-pc-survey"
width="800px"
height="600px"
frameborder="0"
marginheight="0"
marginwidth="0">
Loading...
</iframe>
"""
)
Congratulations, you’re done!
Submit this assignment by uploading it to the course Desire2Learn web page. Go to the “Pre-class assignments” folder, find the appropriate submission link, and upload it there.
See you in class!
Copyright © 2023, Department of Computational Mathematics, Science and Engineering at Michigan State University, All rights reserved.