Homework 2: Dictionaries, If Statements, Functions, and Modules
✅ Put your name here
CMSE 201 – Fall 2024
Learning Goals
Content Goals
Use if/elif/else statements to implement a logical flow
Write and execute functions
Use the numpy module for math and data
Display data on plots using matplotlib
Utilize dictionaries to store data
Practice Goals
Coding conventions
More advanced debugging
Using functions to build reusable/transferable code
Use visualization best practices to make informative plots
Assignment instructions
Work through the following assignment, making sure to follow all the directions and answer all the questions.
This assignment is due at 11:59 pm on Friday, October 11th. It should be uploaded into the “Homework Assignments” submission folder for Homework #2. Submission instructions can be found at the end of the notebook.
Academic integrity statement (2 Points)
In the markdown cell below, paste your personal academic integrity statement. By including this statement, you are confirming that you are submitting this as your own work and not that of someone else.
✎ Put your personal academic integrity statement here.
1. Dictionaries (10 Points)
Python dictionaries are particularly useful for keeping track of pieces of information of different types. Check out Part 4 of the Day 4 In Class Assignment for a reference on Python dictionaries! This problem will also have you use generative AI (specifically ChatGPT 4o) and critically evaluate its output.
Scenario: You are working on a project to manage different species of animals in a zoo. Each animal has a scientific name, average lifespan, and a list of continents where it can be found in the wild.
✅ 1.1 Task (1 Point)
Copy and paste the following prompt into ChatGPT 4o:
Create a Python dictionary that includes at least 10 animals, with each entry containing the animal’s scientific name, average lifespan, and a list of continents where it can be found. Your animals should have different lifespans, be found on different continents, and some of your animals should be found on multiple continents.
Copy the output it gives you and paste it into the cell below:
# Put your answer here
✅ 1.2 Task (3 Points)
In the cell below, write down your plan for evaluating the output ChatGPT 4o gave you for this prompt. Your plan must explain how you will make sure you understand how to construct a dictionary and how you will verify that the information in the dictionary is correct (i.e., the information about the different animals). Justify your plan based on your past experiences, particularly in the Day 7 In Class Assignment.
✎ Put your answer here!
✅ 1.3 Task (4 Points)
Now you will carry out the plan you described above. In the cell below, describe (using words and/or code) how evaluating the output went. Identify any places where the generated output was exactly what you needed as well as any places where the generated output was incorrect. Cite any resources you used.
✎ Put your answer here!
✅ 1.4 Task (2 Points)
You have made your dictionary, now it’s time to use it!
In the cell below, write two print statements: (1) print the scientific name of one of the animals, and (2) print a sentence that includes the average lifespan and one of the continents where the animal can be found.
For example, if your dictionary entry is 'Elephant': {'scientific_name': 'Loxodonta africana', 'lifespan': 70, 'continents': ['Africa', 'Asia']}, your output could be:
“The scientific name of the elephant is Loxodonta africana.”
If your dictionary entry is 'Lion': {'scientific_name': 'Panthera leo', 'lifespan': 15, 'continents': ['Africa']}, your output could be:
“The lion lives for an average of 15 years and can be found in Africa.”
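A minimal sketch of how nested lookups produce the two example sentences. It assumes the dictionary is keyed by the animal's common name and that each value is itself a dictionary with the keys shown above. The names animals, sentence_1, sentence_2, and lion are only illustrative.

```python
# Two entries in the same format as the examples above, keyed by common name
animals = {
    'Elephant': {'scientific_name': 'Loxodonta africana', 'lifespan': 70,
                 'continents': ['Africa', 'Asia']},
    'Lion': {'scientific_name': 'Panthera leo', 'lifespan': 15,
             'continents': ['Africa']},
}

# The first key selects the animal; the second key selects a field inside its entry
sentence_1 = f"The scientific name of the elephant is {animals['Elephant']['scientific_name']}."

lion = animals['Lion']
# lion['continents'] is a list, so [0] picks its first continent
sentence_2 = (f"The lion lives for an average of {lion['lifespan']} years "
              f"and can be found in {lion['continents'][0]}.")

print(sentence_1)
print(sentence_2)
```

Because each value is a dictionary, two sets of square brackets are needed. When a field holds a list, a third index selects an item from that list.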
# put your answer here
2. Debugging a Function (10 Points)
In the tasks below, you will be asked to find, describe, and fix the bugs in each example.
But first, let’s review what a bug is! (For more in-depth discussion and scaffolding, check out the Debugging Notebook on the course website.)
Code rarely works perfectly the first time, even for people with years of experience. When running code results in an error, or your code is not doing what you expect, we call that a bug. Fun fact: the term is often linked to a 1947 incident in which operators of the Harvard Mark II computer found a moth trapped in one of its relays, although engineers had been calling glitches “bugs” for decades before that. We call the process of finding and fixing bugs debugging.
Debugging is an important skill for every programmer, regardless of what they are working on or how much experience they have. While it sounds intimidating, the process of debugging can be broken down into three steps:
Locate where the bug is occurring
Figure out what is causing the bug
Fix the bug
In the following tasks, you will do these three things! Note: Keep in mind that not all bugs result in error messages, so look carefully at your code if it runs but isn’t doing what you expect!
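To make the note about silent bugs concrete, here is a small hypothetical function. It is not part of the tasks below. It runs without any error message but returns the wrong answer. Comparing its output with a result you have already worked out by hand is one way to carry out step 1 and locate the bug.

```python
# Hypothetical function: divides by the wrong count, so the result is silently wrong
def buggy_average(values):
    return sum(values) / (len(values) + 1)  # bug: the + 1 should not be there


# Fixed version: divide the sum by the number of values
def fixed_average(values):
    return sum(values) / len(values)


test_values = [2, 4, 6]  # by hand, the average is (2 + 4 + 6) / 3 = 4
print(buggy_average(test_values))  # 3.0 -> no error, but it disagrees with the hand calculation
print(fixed_average(test_values))  # 4.0
```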
If you use resources (e.g., Stack Overflow, ChatGPT, GitHub Copilot), you are expected to document how you used them and how you arrived at your solution.
✅ 2.1 Task (5 Points)
In this task, you will debug a function that calculates the total price of items in a shopping cart. The function applies a discount if the total price exceeds a certain threshold, but there is an error in the logic.
If your total grocery bill is over $100, you receive a specified percentage discount off your total purchase.
Find and fix the bug in the function below. Use the test data to work out where the bug is occurring, describe how you figured it out, and then fix it.
def calculate_total(prices, discount):
    total = sum(prices)
    if total > 100:
        total = total - discount
    return total
# Test data
cart = [45, 30, 40]
print(calculate_total(cart, 20))
95
# Your code here
✎ Put your answer here!
✅ 2.2 Task (5 Points)
The function below has another bug, this time in checking whether the cart is empty. Do the following:
What should the output of the function call be?
What line(s) are causing the bug?
Write a plain English description of the bug.
Fix the bug!
def cart_is_empty(cart):
    if len(cart) == 0:
        return False
    return True
# Test data
cart = []
print(cart_is_empty(cart))
False
# Put your code here
✎ Put your answer here!
3. Functions, if/else, and Basketball! (12 Points)
When the NBA drafts new players, teams tend to prefer younger players because they have more time to develop before reaching their prime. So, what is the prime age for a basketball player? Findings\(^{1}\) suggest that, on average, a basketball player’s prime age is between 27 and 31 years old.
In this section, you will sort through a list of ages of NBA players from the 2023-24 regular season and classify each player as in their prime or not. The ages range from 19 to 39.
\(^{1}\)Citation: An Empirical Analysis of Prime Performing Age of NBA Players, When Do They Reach Their Prime? by Tony Salameh
3.1 Finding Players in their Prime Age (4 points)
First, choose whether you want to check for prime age or for outside prime age in the data set. Then, write a function that takes in the age of an NBA player and returns the Boolean True if the age falls within the range you chose and False if it falls outside that range.
Write your function in the cell below. Demonstrate that your function works using the test data given (nba_player_ages_test_data).
Make sure to explain your function in a markdown cell below the code cell.
# test NBA player ages data
nba_player_ages_test_data = [24, 24, 24, 26, 23, 23, 25, 28, 25, 25, 30]
# put your code here
✎ Put your answer here!
3.2 Extending to the Full Data Set (2 points)
In the cell below, you will find a data set of ages of NBA players. Use your function on this list to count the number of players who fall into the category your function checks for.
Citation: NBA Player Stats
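One general way to count with a Boolean function is to loop over the list and add one to a counter each time the function returns True. The sketch below uses a hypothetical is_under_25 helper, which is deliberately not the prime-age check, on the 3.1 test data.

```python
# Test data from 3.1
nba_player_ages_test_data = [24, 24, 24, 26, 23, 23, 25, 28, 25, 25, 30]


def is_under_25(age):
    """Hypothetical helper for illustration: True if age is below 25."""
    return age < 25


# Add 1 to the counter each time the Boolean function returns True
count = 0
for age in nba_player_ages_test_data:
    if is_under_25(age):
        count += 1

print(count)  # 5
```

The same loop works with any function that returns True or False. Only the function being called changes.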
# full NBA player ages data set
nba_player_ages = [24, 24, 24, 26, 23, 23, 25, 28, 25, 25, 30, 29, 31, 23, 26, 26, 26, 29, 23, 25, 24, 21, 24, 19, 21, 22, 25, 21, 25, 24, 20, 31, 22, 23, 23, 23, 23, 20, 28, 35, 35, 35, 30, 27, 23, 24, 31, 35, 24, 24, 24, 31, 20, 24, 31, 34, 24, 25, 27, 22, 31, 23, 24, 20, 22, 27, 25, 25, 31, 25, 28, 27, 26, 22, 27, 24, 24, 24, 27, 26, 20, 32, 32, 23, 34, 24, 30, 23, 29, 28, 24, 29, 23, 22, 22, 20, 19, 31, 24, 19, 26, 26, 26, 36, 31, 19, 33, 33, 33, 33, 33, 22, 33, 35, 20, 30, 21, 21, 24, 34, 22, 20, 20, 30, 27, 24, 24, 24, 30, 26, 29, 35, 20, 22, 22, 23, 24, 29, 26, 28, 25, 30, 25, 28, 25, 25, 31, 26, 20, 25, 25, 35, 24, 21, 25, 27, 20, 33, 38, 21, 25, 25, 31, 24, 25, 31, 25, 28, 35, 28, 29, 24, 36, 33, 21, 37, 23, 20, 23, 21, 25, 23, 22, 31, 34, 21, 23, 29, 32, 31, 30, 28, 25, 26, 21, 23, 22, 33, 19, 20, 24, 31, 27, 24, 25, 27, 33, 34, 30, 21, 20, 37, 23, 30, 21, 20, 25, 25, 26, 23, 36, 26, 31, 26, 21, 22, 19, 22, 24, 33, 23, 39, 22, 26, 26, 24, 27, 22, 36, 24, 21, 23, 28, 21, 28, 26, 25, 24, 27, 35, 32, 20, 22, 27, 22, 26, 24, 32, 24, 27, 26, 28, 21, 28, 28, 22, 28, 23, 24, 30, 32, 29, 22, 21, 23, 33, 23, 19, 25, 20, 21, 27, 35, 35, 35, 37, 23, 28, 22, 24, 27, 22, 35, 26, 26, 28, 28, 23, 23, 23, 27, 21, 37, 23, 26, 23, 32, 31, 23, 26, 32, 36, 21, 27, 25, 25, 27, 26, 30, 32, 21, 24, 20, 19, 35, 27, 24, 21, 25, 27, 22, 24, 25, 21, 29, 22, 24, 34, 34, 23, 27, 26, 23, 23, 32, 26, 31, 24, 24, 30, 21, 23, 24, 24, 29, 25, 30, 25, 25, 23, 23, 32, 26, 28, 28, 38, 29, 31, 24, 23, 23, 23, 20, 24, 33, 20, 28, 22, 24, 23, 25, 30, 28, 28, 25, 32, 30, 21, 29, 26, 21, 26, 24, 24, 24, 24, 23, 29, 27, 25, 24, 24, 24, 24, 26, 30, 29, 26, 25, 23, 23, 22, 21, 35, 29, 19, 27, 26, 27, 24, 25, 21, 21, 29, 24, 23, 26, 30, 23, 21, 20, 25, 26, 22, 20, 22, 29, 27, 24, 25, 25, 24, 24, 29, 26, 26, 35, 20, 23, 19, 20, 21, 26, 22, 21, 27, 22, 24, 28, 25, 24, 37, 21, 31, 31, 31, 22, 21, 21, 33, 32, 21, 26, 25, 23, 25, 30, 28, 25, 24, 38, 38, 38, 27, 23, 24, 31, 24, 29, 23, 28, 27, 33, 27, 22, 
26, 29, 21, 20, 25, 20, 25, 22, 29, 26, 23, 21, 20, 20, 35, 23, 29, 19, 25, 28, 25, 24, 25, 22, 21, 24, 29, 22, 22, 26, 23, 22, 23, 23, 27, 22, 28, 31, 35, 25, 25, 31, 26]
# put your code here
3.3 Adding Complexity (4 points)
What if you wanted to label each player as prime age or not? In the cell below, write a function that takes in the age of an NBA player, determines whether the player is in the prime age range, and returns the string “Prime age” if they are and “Not prime” if they are not. Consider building on the code you wrote for 3.1 and 3.2, but you may also start from scratch. You may find it helpful to use the test data again!
Demonstrate that your code works by writing additional code that runs your function and determines the fraction of prime age and non-prime age players in the full data set (e.g., “40% of players are prime age”; this is not the correct percentage, just an example of the format).
Make sure to explain your function in a markdown cell below the code cell.
# put your code here
✎ Put your answer here!
3.4 Reflections (2 points)
✅ In the cell below, answer the following reflection questions:
Document your pathway to a solution for 3.3. Include whether or not you used previous code for your solution and justify that decision.
Briefly describe at least one other way to solve this problem that doesn’t use a function. Which approach do you think is better? Justify your answer.
✎ Put your answers here.
4. Numpy, matplotlib, and functions (24 Points)
4.1 Making a plotting function (7 points)
In CMSE 201, you will be making plots often. To help your future self save time, you are going to write a function to use when you make plots!
✅ In the cells below, write a function that takes in arrays of x and y values and displays a plot containing 4 subplots. You should use plt.subplot() and make use of labels, line shapes, and colors to make your plots clear.
You should plot the data for each month in a different subplot.
The plots should show the number of people employed in the US (in thousands) for each month across all years in the data set.
The labels should be clear and informative.
Functions should be reusable, so make sure your function also works for data sets other than this one (e.g., data that isn’t employment data).
You must also test your function with the employment data provided below to receive full credit.
Briefly explain your function in a markdown cell below the code cell.
Citation: US Bureau of Labor Statistics
import numpy as np
years = np.array([1939, 1940, 1941, 1942, 1943, 1944, 1945, 1946, 1947, 1948, 1949, 1950, 1951, 1952, 1953, 1954, 1955, 1956, 1957, 1958, 1959, 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 1971, 1972, 1973, 1974, 1975, 1976, 1977, 1978, 1979, 1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021, 2022, 2023])
# Number of people employed in the US by year in January (in thousands)
employment_jan = np.array([29923, 31603, 34481, 38347, 42172, 42654, 41895, 39829, 43535, 44679, 44668, 43526, 47288, 48296, 50144, 49469, 49496, 51975, 52887, 52076, 52478, 54274, 53683, 54891, 56115, 57487, 59582, 62529, 65407, 66804, 69439, 71176, 70865, 72441, 75617, 78100, 77293, 78503, 80690, 84594, 88808, 90800, 91033, 90565, 88990, 92673, 96372, 98732, 100678, 103753, 107161, 109189, 109051, 108367, 109795, 112598, 116504, 118318, 121363, 124813, 127704, 131009, 132699, 130853, 130580, 130766, 132779, 135425, 137475, 138397, 134066, 129795, 130839, 133243, 135263, 137551, 140562, 143196, 145636, 147667, 150062, 152045, 142916, 150014, 154773])
# Number of people employed in the US by year in April (in thousands)
employment_apr = np.array([30094, 31701, 35468, 39352, 42647, 42063, 41446, 40913, 43499, 44379, 44236, 44382, 47861, 48620, 50435, 49179, 50248, 52375, 53238, 51027, 53321, 54813, 53627, 55602, 56580, 57922, 60259, 63437, 65466, 67556, 70072, 71348, 71036, 73162, 76455, 78382, 76460, 79292, 81728, 86162, 89417, 90849, 91283, 90150, 89364, 93792, 97038, 99121, 101499, 104732, 107791, 109671, 108352, 108515, 110295, 113587, 117065, 119156, 122284, 125445, 128595, 131883, 132455, 130616, 130176, 131409, 133515, 136210, 137842, 138037, 131825, 130115, 131604, 133828, 135876, 138298, 141202, 143856, 146173, 148426, 150602, 130421, 144593, 151642, 155484])
# Number of people employed in the US by year in July (in thousands)
employment_jul = np.array([30419, 31942, 37137, 40472, 42700, 41904, 40874, 42153, 43743, 45160, 43531, 45454, 48061, 48143, 50536, 48835, 50987, 51955, 53123, 51039, 53804, 54306, 54123, 55746, 56794, 58412, 60965, 64301, 65888, 68126, 70729, 71053, 71315, 73709, 76913, 78636, 76770, 79547, 82834, 87204, 90217, 89840, 91601, 89521, 90437, 94789, 97648, 99473, 102247, 105550, 108069, 109822, 108290, 108799, 111060, 114607, 117377, 120016, 123112, 126209, 129423, 132228, 132173, 130585, 130184, 131850, 134297, 136522, 138042, 137492, 130661, 130420, 131992, 134153, 136391, 139076, 141989, 144515, 146772, 149023, 150934, 139240, 146761, 153038, 156211])
# Number of people employed in the US by year in October (in thousands)
employment_oct = np.array([31411, 33267, 37949, 41515, 42675, 41708, 38600, 43093, 44411, 45245, 42942, 46706, 48006, 49597, 50240, 48942, 51429, 52777, 52763, 51485, 53358, 54142, 54522, 56041, 57283, 58793, 61719, 64854, 66225, 68721, 71121, 70521, 71642, 74674, 77607, 78630, 77540, 79911, 83800, 87956, 90481, 90490, 91380, 88907, 91520, 95629, 98233, 100121, 103138, 106276, 108476, 109374, 108362, 109135, 111734, 115458, 118029, 120663, 123923, 126950, 130179, 132352, 131452, 130621, 130439, 132435, 134646, 136849, 138175, 136294, 130061, 130620, 132557, 134672, 137039, 139804, 142584, 145069, 147146, 149525, 151458, 142493, 148566, 153897, 156832])
# put your code here
✎ Put your answer here!
4.2 Using numpy arrays and functions for calculations (11 points)
The labor data is collected monthly, which makes it subject to seasonal variation and other fluctuations. A better approach is to look at the average over a year. Typically, we would use a moving average to do this, but for this exercise, we will average one month from each quarter of the year (i.e., January, April, July, and October) for each year.
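Averaging one month from each quarter means combining the four month arrays element by element, so that each year ends up with a single number. A minimal NumPy sketch, using only the first three years (1939 to 1941) of the arrays from 4.1, stacks the months as rows and averages down each column. np.vstack and np.mean(..., axis=0) are standard NumPy calls. Wrapping this in a function is left to the task.

```python
import numpy as np

# First three years (1939-1941) of the January, April, July, and October arrays from 4.1
employment_jan = np.array([29923, 31603, 34481])
employment_apr = np.array([30094, 31701, 35468])
employment_jul = np.array([30419, 31942, 37137])
employment_oct = np.array([31411, 33267, 37949])

# Rows are months, columns are years: shape (4, 3)
quarterly = np.vstack([employment_jan, employment_apr, employment_jul, employment_oct])

# axis=0 averages down each column, giving one average per year
yearly_average = np.mean(quarterly, axis=0)

print(yearly_average)  # [30461.75 32128.25 36258.75]
```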
For the data set we’ve been using, write a function that computes the average number of people employed in the US (in thousands) for each year. The function should take in the data set and return an array containing the average number of people employed in the US (in thousands) for each year.
Then, write another function that calls your average function and uses those averages to compute the percent change in average employment for each year compared to the previous year. We can calculate the percent difference like this:
\(\%\ \text{difference} = \frac{\text{final} - \text{initial}}{\text{initial}} \times 100\)
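As a quick worked check of the formula, take only the January values for 1939 (29,923 thousand) and 1940 (31,603 thousand) from the 4.1 data. This uses single months rather than the yearly averages the task asks for:

\[
\%\ \text{difference} = \frac{31603 - 29923}{29923} \times 100 = \frac{1680}{29923} \times 100 \approx 5.61
\]

So January employment grew by roughly 5.6% from 1939 to 1940.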
Finally, use plt.subplot() to plot both the average annual employment and the percent difference in employment for each year. Make sure to label your plots and axes.
When you find the percent difference, the resulting array will have one fewer entry than the number of years in the data set, because the first year has no previous year to compare against.
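The one-fewer-entry effect is easy to see with NumPy's np.diff, which returns the differences between consecutive elements. Those differences are exactly the "final - initial" terms of the formula. This minimal sketch assumes the yearly averages for 1939 to 1941, computed from the four 4.1 month arrays, as its input.

```python
import numpy as np

# Averages of the January, April, July, and October values for 1939, 1940, 1941 (4.1 data)
yearly_average = np.array([30461.75, 32128.25, 36258.75])

# np.diff gives (final - initial) for each consecutive pair of years;
# yearly_average[:-1] holds the matching "initial" values
percent_change = np.diff(yearly_average) / yearly_average[:-1] * 100

print(len(yearly_average))  # 3 years in
print(len(percent_change))  # 2 percent differences out
print(percent_change)
```

When plotting, pair the percent differences with years[1:] (all years except the first) so that the x and y arrays have the same length.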
Make sure you explain your work in a markdown cell below the code cell.
✅ In the cell below, write the necessary functions and demonstrate that they work with the test data. You should have a function to compute the averages, a function to compute the percent change, and code to make the plots (writing this as a function is not required).
# put your code here
✎ Put your answer here!
4.3 Putting it all together (6 points)
Along with this homework assignment, you will find a .csv file with some additional employment data.
✅ In the cell(s) below, you will use the data from the file (and the functions you wrote in 4.2). Your task is to:
Choose four months of data (i.e. 4 columns of the data file) to explore. Use different months than you did in 4.2.
Using those four months, calculate the percent difference for each one across all years and create the four plots to visualize this (e.g., if you choose February as one of the months, your plot should have 1940-2023 on the x-axis and percent change on the y-axis).
Remember that you can open up data files in Jupyter to see what the data looks like!
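One way to load a .csv file into NumPy arrays is np.genfromtxt with names=True, which reads the header row and lets you pull out each column by name. In the sketch below, the real file is replaced by a short in-memory stand-in built from a few rows of the 4.1 arrays. The column names "Year", "Jan", and "Apr" are assumptions, so check the actual header of the provided file. With the real file, pass its path (a string) in place of io.StringIO(csv_text).

```python
import io

import numpy as np

# Stand-in for the real .csv file (assumed layout: a Year column plus one column per month)
csv_text = """Year,Jan,Apr
1939,29923,30094
1940,31603,31701
1941,34481,35468
"""

# names=True turns the first row into field names; delimiter="," splits the columns
data = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True)

# Each column comes back as a float array that can be passed to your 4.2 functions
years = data["Year"]
employment_jan = data["Jan"]

print(years)
print(employment_jan)
```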
# put your work here
Congratulations, you’re done!
Submit this assignment by uploading it to the course Desire2Learn web page. Go to the “Homework Assignments” section, find the submission folder link for Homework #2, and upload it there.