Practice Midterm

✅ Put your name here


CMSE 202 Practice Midterm

The goal of this midterm is to give you the opportunity to test out some of the skills that you’ve developed thus far this semester. In particular, you’ll practice setting up a GitHub repository, committing and pushing repository changes, downloading data with command line tools, performing some data analysis, possibly using a new Python package, and writing a Python class. You should find that you have all of the skills necessary to complete this exam with even just eight weeks of CMSE 202 under your belt!

You are encouraged to look through the entire exam before you get started so that you can appropriately budget your time and understand the broad goals of the exam. Once you’ve read through it, start with Parts 1 and 2 so that your repository is set up and you have downloaded all of the data files, which are required for the later tasks. Let your instructor know right away if you have problems downloading the data!

The exam is set up so that even if you get stuck on one part there are opportunities to get points on the other parts, so consider jumping ahead if you feel like you aren’t making progress and then come back later if you have time.

Important note about using online resources: This exam is “open internet”. That means that you can look up documentation, google how to accomplish certain Python tasks, etc. Being able to effectively use the internet for computational modeling and data science is a very important skill, so we want to make sure you have the opportunity to exercise that skill. However: The use of any person-to-person communication software or generative AI is absolutely not acceptable. If you are seen accessing your email, using a chat program (e.g. Slack), accessing ChatGPT or a similar tool, or using any sort of collaborative cloud storage or document software (e.g. Google Documents), you will be at risk of receiving a zero on the exam.

Keep your eyes on your screen! Unfortunately, there isn’t enough space in the room for everyone to sit at their own table so please do your best to keep your eyes on your own screen. This exam is designed to give you the opportunity to show the instructor what you can do and you should hold yourself accountable for maintaining a high level of academic integrity. If any of the instructors observe suspicious behavior, you will, again, risk receiving a zero.


Part 0: Academic integrity statement

Read the following statement and edit the markdown text to put your name in the statement. This is your commitment to doing your own authentic work on this exam.

I, INSERT NAME HERE, affirm that this exam represents my own authentic work, without the use of any unpermitted aids or resources or person-to-person communication. I understand that this exam is an opportunity to showcase my own progress in developing and improving my computational skills and have done my best to demonstrate those skills.


Part 1: Add to your Git repository to track your progress on your exam (2 points)

Before you get too far along in the exam, you’re going to add it to the cmse202-xxx-turnin repository you created in class so that you can track your progress on the exam and preserve the final version that you turn in.

✅ Do the following:

  1. Navigate to your cmse202-xxx-turnin repository and create a new directory called midterm.

  2. Move this notebook into that new directory in your repository, then add it and commit it to your repository.

  3. Finally, to test that everything is working, “git push” the file so that it ends up in your GitHub repository.

Important: Double check you’ve added your Professor and your TA as collaborators to your “turnin” repository (you should have done this in the previous homework assignment).

Also important: Make sure that the version of this notebook that you are working on is the same one that you just added to your repository! If you are working on a different copy of the notebook, none of your changes will be tracked!

If everything went as intended, the file should now show up on your GitHub account in the “cmse202-xxx-turnin” repository inside the midterm directory that you just created. Periodically, you’ll be asked to commit your changes to the repository and push them to the remote GitHub location. Of course, you can always commit your changes more often than that, if you wish. It can be good to get into a habit of committing your changes any time you make a significant modification, or when you stop working on the project for a bit.

Question 1.1 (2 points): Do this: Before you move on, put the command that your instructor should run to clone your repository in the markdown cell below. Also make sure that you created the directory and pushed your change to GitHub as explained above.

# Put the command for cloning your repository here!

Part 2: Downloading and analyzing unfamiliar data (12 points)

In this part of the exam, you will load and visualize a dataset you have not used before.

In particular, you will be working with a dataset on international health care spending. The data is associated with a BuzzFeed article at https://www.buzzfeednews.com/article/peteraldhous/american-health-care . Interestingly, BuzzFeed maintains GitHub repositories for the data they use to make plots in their articles. (Although it should be noted that they are not the original source of the data, but they do cite their source at https://data.oecd.org/ - this repository has a TON of other interesting datasets.) The specific data you will be looking at is on health care spending and life expectancy in OECD countries. You can find the data files we will use here:

You will need to download two files from this repository, health_spending_per_cap.csv and life_expect_birth.csv. The direct links to these files are:

Question 2.1 (1 point): Do this now: Save these files in the same directory as your notebook so you can load them directly. Then, in the cell below, put the command or commands you used to download the files. If you did not use a command line tool to download the files, write down a command that would have fetched the files.

# Put the command(s) you used for fetching the data files here!

Question 2.2 (2 points): To get started, read in the life_expect_birth.csv dataset and then display the first 15 rows of the data. You can use Pandas for this task or any other Python tool you prefer.

### Put your code here

Question 2.3 (4 points): The dataset should contain a column called LOCATION, which holds three-letter country codes, such as AUS, AUT, … You will now select all rows for the United States (country code USA) and create a plot of life expectancy (Value) vs. year.

Do this: select all rows where the country code (LOCATION) is USA and create a plot of life expectancy (Value) [y-axis] vs. year of birth [x-axis]. Make sure to label your axes.
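As a hint, selecting rows by the value in one column can be sketched with only the standard library. This is a minimal illustration, not the expected solution: the inline CSV text, the location codes AAA and BBB, the TIME column name, and every number are made-up placeholders (only LOCATION and Value are column names mentioned above). With Pandas, a boolean mask on the LOCATION column would achieve the same selection.

```python
import csv
import io

# Made-up placeholder data; these are NOT real life-expectancy values.
# The TIME column name is an assumption; check the header of the real file.
toy_csv = """LOCATION,TIME,Value
AAA,2000,70.0
BBB,2000,71.5
AAA,2001,70.4
BBB,2001,71.9
"""


def select_location(rows, code):
    """Return (years, values) for every row whose LOCATION equals `code`."""
    years, values = [], []
    for row in rows:
        if row["LOCATION"] == code:
            years.append(int(row["TIME"]))
            values.append(float(row["Value"]))
    return years, values


# DictReader turns each CSV line into a dictionary keyed by the header names.
rows = list(csv.DictReader(io.StringIO(toy_csv)))
years, values = select_location(rows, "AAA")
print(years, values)
```

The two returned lists can be handed to a plotting call such as matplotlib's `plt.plot(years, values)`, followed by `plt.xlabel` and `plt.ylabel` to label the axes.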

### Put your code here

Question 2.4 (2 points): Now we will compare this against another country. You can choose any country you like, but if you feel uninspired, just use Canada (CAN).

Do this: Create a plot that shows the life expectancy vs. year of birth just as in Q2.3, but this time plot the curves for both the USA and another country of your choice (e.g. Canada / CAN). Make sure your axes are labeled.

### Put your code here

Question 2.5 (2 points): Finally, we want to see how health care spending per capita differs between the two countries.

Do this: Write code that loads the other dataset, health_spending_per_cap.csv. It contains data on the health spending per capita in many countries and for a large range of years. The health spending per capita is stored in US dollars. Inspect the CSV file, then write code that loads the data, selects the necessary rows for your two chosen countries (USA and whatever you chose before in Q2.4), and creates a plot that shows the year on the x-axis and the spending on the y-axis. The lines for both countries should appear in the same plot and the x- and y-axes should be labeled.

### Put your code here

Question 2.6 (1 point): The BuzzFeed article linked above uses a different way to visualize this data.

Visualization

Do this: explain in a few sentences how you would go about creating a plot like theirs using the datasets you already loaded. You are not required to re-create the plot, but you should describe what you would do in order to create it in a few sentences. If you can think of any issues that might occur when you create this version of the plot, point them out.

Do This - Erase the contents of this cell and put your answer here.


🛑 STOP

Pause to commit your changes to your Git repository!

Take a moment to save your notebook, commit the changes to your Git repository using the commit message “Committing Part 2”, and push the changes to GitHub.



Part 3: Working with a new Python package (8 points)

You might have noticed that the dataset you used in Part 2 used 3-letter codes to encode countries. For some countries, these are obvious (e.g. USA), but for others, they might be more obscure (e.g. the code for “Germany” is not GER but DEU). In this part, we will be using a Python package you probably have not used before to make sure we can interpret these country codes and to find codes for certain countries. The package is called pycountry and its source code can be found here: flyingcircusio/pycountry . It is also available from PyPI at https://pypi.org/project/pycountry/ .

Question 3.1 [2 points]: Unfortunately, pycountry is not already included with Anaconda. However, you should be able to install the package in the same way we have used previously to install Python packages.

Do this: Install the pycountry Python package.

What command did you use to install pycountry? Include this command in the Markdown cell below.

# Put the command for installing `pycountry` here!

Once pycountry is installed, running the following cell should not result in an error message. You might need to restart your Jupyter kernel after installing the package for this to work. (Once everything works, it should give you no output at all.)

# Running this cell should just work and not yield any output if `pycountry` is installed and available
import pycountry

Question 3.2 [2 points]: Looking at the pycountry documentation, find a way to determine how many (current, non-historic) countries the package is aware of in total. Then write code to display the number of countries.

### Put your code here

Question 3.3 [2 points]: Write code that performs a “fuzzy” search for a country called “England” and store its entry (a Python object!) in a variable called UK. (You might want to look at the package documentation page to find out how to perform a “fuzzy search” for country names. This search will always return a list of objects, so you want to make sure to use the only entry in the list returned, not the list itself.)

### Put your code here

The following cell will print the proper country name (“United Kingdom”) if your above code works correctly and stored the country object in the variable UK.

print("The country name is \"{}\"".format(UK.name))

Question 3.4 [2 points]: The UK object you just created includes an attribute with the “official name” of the country. For some countries, this can be different from their “commonly” used name. Use Python tools/commands for inspecting objects to find what the name of this attribute could be and print it. (If you were not able to complete question 3.3, you can create the UK object with this line: UK = pycountry.countries.get(alpha_2="GB"). Note that this is NOT the answer to question 3.3 but will return the same result.)

For full points, you need to both write code that prints the “official name” of the country and show the command(s) that you used to find the attribute name.
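As a reminder of what “inspecting objects” can look like, here is a minimal sketch using only Python built-ins. The `Sample` class and its attribute names are hypothetical and have nothing to do with pycountry; how informative each tool is for a given third-party object depends on how that object is implemented.

```python
class Sample:
    """Hypothetical class used only to demonstrate object inspection."""

    def __init__(self):
        self.name = "example"
        self.long_name = "a longer example"


obj = Sample()

# vars() returns the instance's attribute dictionary (when it has a __dict__).
print(vars(obj))

# dir() lists every attribute name, including inherited "dunder" names,
# so filter out names that start with an underscore.
public_names = [n for n in dir(obj) if not n.startswith("_")]
print(public_names)

# getattr() reads an attribute whose name is only known as a string.
print(getattr(obj, "long_name"))
```

Running `help(obj)` or `type(obj)` in a notebook cell are other common starting points.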

### Put your code here

🛑 STOP

Pause to commit your changes to your Git repository!

Take a moment to save your notebook, commit the changes to your Git repository using the commit message “Committing Part 3”, and push the changes to GitHub.



Part 4: Writing and using Python classes (9 points)

# The Country class. You'll need to edit/expand on this.
class Country:
    def __init__(self, name):
        self.name = name

    def print_name(self):
        print('The country name is {0}'.format(self.name))

c = Country("USA")
c.print_name()

Question 4.1 [3 points]: Do this: extend the Country class as described below and put the new version of this class in the code cell below.

  • Add a new attribute, is_historic, to the class so that it is set to a default of False when the class object is first initialized. This attribute will be used to represent historic countries that do not exist anymore. By default any country that is created will be treated as non-historic.

  • Add a new class method, make_historic, that takes no additional inputs (except for what all class methods take)

    • The only thing this method should do is to set the is_historic attribute to True

    • This function is not expected to return a value

  • Add another new class method, get_is_historic(), that also takes no additional inputs, except the usual

    • This method should return the current value of the is_historic attribute
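The pattern described in the list above (a flag with a default value, one method that sets it, and one that returns it) is sketched below on an unrelated, hypothetical `Book` class. The class, attribute, and method names are assumptions made for illustration only; your `Country` class must follow the specification above.

```python
class Book:
    """Hypothetical class; it is not part of the exam's Country class."""

    def __init__(self, title):
        self.title = title
        self.is_checked_out = False  # default value set at initialization

    def check_out(self):
        """Set the flag to True; nothing is returned."""
        self.is_checked_out = True

    def get_is_checked_out(self):
        """Return the current value of the flag."""
        return self.is_checked_out


book = Book("Example Title")
before = book.get_is_checked_out()  # flag still has its default value
book.check_out()
after = book.get_is_checked_out()   # flag has been changed by the method
print(before, after)
```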

### Put your code here

# This cell should now initialize a `Country` (for the fictional historic country
# of "Osparia") and print its name. You do not have to change this code, but once
# your class is finalized, the output should be:
# ```
#       The country name is Osparia
#        -> The country is historic
# ```

my_country = Country("Osparia")
my_country.make_historic()
my_country.print_name()
if my_country.get_is_historic():
    print(" -> The country is historic")

Question 4.2 [6 points]: Now that you have a functioning class, your next task is to create a second class, EnhancedCountry, that inherits the Country class.

Do this: Create a second class called EnhancedCountry that inherits the Country class and then adds new functionality as described below. Put this new class in the code cell below. (This extension is mostly independent of the extension in Q4.1, so if you did not manage to get that done, you can still get credit for this part by using the initial version of the Country class as the base.)

In this new EnhancedCountry class, do the following:

  • Add another new attribute, subdivisions, to the class such that the attribute is initialized to be an empty dictionary ({}) when the class object is first created. This dictionary will serve as a place to store country subdivisions, such as “states” in the United States. Other countries might have other types of subdivisions, such as Canada, which has “provinces”.

    • When adding the new attribute, make sure all other attributes inherited from Country are also still initialized.

  • Add a new class method, add_subdivision, that takes two inputs:

    • name (the name of the subdivision, such as “Michigan”)

    • type (the type of the subdivision, such as “state”)

    Using these two inputs, the method should update the subdivisions dictionary attribute so that the new name is a dictionary key and the type is the value associated with that key.

  • Add one final new method, print_subdivisions, that takes no input and prints a list of all subdivisions and their types, sorted alphabetically by subdivision name. (Partial credit if they are unsorted.)
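The ingredients listed above (calling the parent initializer, adding a dictionary attribute, and printing entries in sorted order) can be sketched on a pair of hypothetical classes. `Shelf`, `CatalogedShelf`, and all of their attribute and method names are made up for illustration and are not the exam's classes.

```python
class Shelf:
    """Hypothetical base class."""

    def __init__(self, label):
        self.label = label


class CatalogedShelf(Shelf):
    """Hypothetical subclass that adds a dictionary attribute."""

    def __init__(self, label):
        super().__init__(label)  # keep the base-class attributes initialized
        self.items = {}          # starts empty for every new object

    def add_item(self, name, kind):
        """Store `kind` under the key `name`."""
        self.items[name] = kind

    def sorted_items(self):
        """Return (name, kind) pairs sorted alphabetically by name."""
        return sorted(self.items.items())

    def print_items(self):
        for name, kind in self.sorted_items():
            print("{}: {}".format(name, kind))


shelf = CatalogedShelf("A1")
shelf.add_item("Zebra Guide", "paperback")
shelf.add_item("Atlas", "hardcover")
shelf.print_items()
```

Calling `sorted` on `dict.items()` orders the pairs by key first, which is why the names come out alphabetically even though they were added in a different order.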

### Put your code here

# This cell should now initialize an `EnhancedCountry` (for the fictional country
# of "Osparia" which consists of provinces and territories), add subdivisions and
# finally print them. You do not have to change this code, but once your class is
# finalized, the code in here should print a list of country subdivisions and their types.

my_country = EnhancedCountry("United Provinces of Osparia")
my_country.add_subdivision("East Neana", "province")
my_country.add_subdivision("Pennxico", "territory")
my_country.add_subdivision("West Wyoshire", "province")
my_country.add_subdivision("South Geoiana", "province")
my_country.add_subdivision("Oreginia", "territory")
my_country.add_subdivision("Marybama", "province")
my_country.add_subdivision("New Flovada", "province")
my_country.add_subdivision("Illibraska", "province")

my_country.print_subdivisions()

🛑 STOP

Pause to commit your changes to your Git repository!

Take a moment to save your notebook, commit the changes to your Git repository using the commit message “Committing Part 4”, and push the changes to GitHub.


You’re done! Congrats on finishing your CMSE 202 Midterm!

Make sure all of your changes to your repository are committed and pushed to GitHub. Also upload a copy of this notebook to the dropbox on D2L in case something went wrong with your repository or if you couldn’t get the repository to work.