Homework Assignment 1#

Git practice, debugging practice, new Python packages, and Python classes#

And exploring earthquake data#

✅ Put your name here.

✅ Put your GitHub username here.

Image credit: https://aroundmichigan.com/2018/04/15/history-earthquakes-michigan/

Goals for this homework assignment#

By the end of this assignment, you should be able to:

  • Use Git to create a repository, track changes to the files within the repository, and push those changes to a remote repository.

  • Debug some basic Python code that involves Pandas.

  • Read documentation and example code to use a new Python package.

  • Modify and use a simple Python class.

Work through the following assignment, making sure to follow all of the directions and answer all of the questions.

There are 55 points possible on this assignment. Point values for each part are included in the section headers and question prompts.

This assignment is due at 11:59 pm on Sunday, Feb 15. It should be uploaded into the “Homework Assignments” submission folder for Homework #1 on D2L. Submission instructions can be found at the end of the notebook.

Table of contents#

  1. Part 0: Office Hours and Academic Integrity

  2. Part 1: Git and CLI (9 points)

  3. Part 2: Debugging (6 points)

  4. Part 3: Downloading and analyzing unfamiliar data (15 points)

  5. Part 4: Using documentation to use a new Python package (8 points)

  6. Part 5: Practice with using Python classes (14 points)

  7. Part 6: Finishing (3 points)

# Calculate total points possible and print it
print("Total number of points possible on this assignment is %i." %(9+6+15+8+14+3))

Back to ToC

Part 0: Office Hours and Academic Integrity#

CMSE 202 Office Hours/Help Rooms#

This is a reminder that CMSE 202 offers a help room that is available to students across all 3 sections. The help room is a great place to get assistance if you run into challenges while working through your homework assignment.

NOTE: In the days before a homework is due, the help room tends to be very busy. To ensure you get adequate assistance, it is highly recommended that you start this homework early and seek help from office hours early.

You can find the office hours calendar on the course website.

The link to the Zoom help room can be accessed directly from the calendar.

Academic Integrity Statement#

In the markdown cell below, paste your personal academic integrity statement (this can be a statement you have used in the past or a statement you craft now to highlight that your work reflects your own learning, understanding, and honest effort). By including this statement, you are confirming that you are submitting this as your own work and not that of someone else. As a reminder, you are encouraged to use the HiTA learning assistant to support your learning. If you use any generative AI for this assignment, you must cite it each time you use information from it, just as you would for any other source, to give appropriate attribution to the origin of your work; see the Course AI Policy for additional details.

Put your personal academic integrity statement here.


Back to ToC

Part 1: CLI and Git (9 points)#

Setting up a new folder in your Git repository and adding your HW1#

Git is a very important professional tool, and we want you to get plenty of practice using it. The following set of questions checks your understanding of using Git and the command line by having you add, commit, and push your homework file in your cmse202-s26-turnin repository. You will share this repo with your course lead instructor and TA so that they can pull your completed assignments for grading. Additionally, please verify that your repository is set up as a private repository rather than a public repository.

Note: Although you will be uploading your assignment to GitHub to practice Git commands, you are still expected to submit the assignment to D2L.

# Put the command you used to clone the repository here

Question 1.1 (1 point):

  1. Navigate to your turnin repository on GitHub and add your instructor and TA as collaborators for the repository. This step is very important since we will need access to your repository to check the status of commits for homework assignments throughout the semester. To show you’ve done this, write your GitHub username and those of your instructor and TA below. You should check the Slack channel for your section of the course to get this information.

### Write your GitHub username: ###

### Write your TA GitHub username: ###

### Write your instructor GitHub username: ###

Question 1.2 (1 point): Using the command line interface on your local computer (JupyterHub or your machine), move inside the repository folder.

What command did you use to enter into the folder?

# Put the command to move into a new directory here.

Question 1.3 (1 point): Once inside the cmse202-s26-turnin repository, create a new folder called hw-01.

What is the command to create the new folder?

# Put the command to create the folder/directory here

Question 1.4 (1 point): Move this notebook into that new directory in your repository, then check the status of the repository.

This is an important step: you’ll want to make sure you save and close the notebook before you do this step and then re-open it once you’ve added it to your local repository directory. If you don’t do this, you could end up working on the wrong version of the notebook! Once you’ve moved the notebook correctly, re-open it and continue working on it.

# Put the command you used to check the status of your repository here.

Question 1.5 (1 point): From the CLI, check the status of your git repository. Copy and paste below the output of the status command.

# Paste it here

Question 1.6 (1 point): What is the name of the current branch of the repository that you are in? (Hint: There should only be one branch at this time. We’ll learn more about branches in git later in the semester.)

# Put your answer here

Question 1.7 (1 point): If you haven’t already, add your name and GitHub username to the top of the notebook, then add and commit ONLY the notebook.

# Put the command(s) to add and commit here 

What is the commit message you used?

# Copy your commit message here

Question 1.8 (1 point): Before moving on, check that the notebook you are working on is the correct one. Run the following cell. Are you in the new folder you just created? If not, close this notebook and open the one in the hw-01 folder. You’ll likely need to copy over the work you did on the above questions if you were working on the wrong notebook.

!pwd

Question 1.9 (1 point): Assuming that your notebook is in the right place and committed to your repository, push your changes to GitHub.

What command did you use to push your changes to GitHub?

# Put the command you used to push your changes to GitHub here

Before moving on…#

Important: Make sure you’ve added your Professor and your TA as collaborators to your new “turnin” repository with “Read” access so that they can see your assignment. You should check the Slack channel for your section of the course to get this information.

Double-check the following: Make sure that the version of this notebook that you are working on is the same one that you just added to your repository! If you are working on a different copy of the notebook, none of your changes will be tracked.

If everything went as intended, the file should now show up on your GitHub account in the “cmse202-s26-turnin” repository inside the hw-01 directory that you just created. Periodically, you’ll be asked to commit your changes to the repository. By the end of the assignment you should have multiple commits that correspond to your completion of each section (as specified below). Of course, you can always commit your changes more often than that, if you wish. It can be good to get into a habit of committing your changes any time you make a significant modification, or when you stop working on the project for a bit.


Back to ToC

Part 2: Debugging Pandas code (6 points)#

Reading Python and Pandas code and understanding errors and error messages#

In this section, you will practice reading and debugging code, specifically examples that use Pandas (since we’ll be regularly using Pandas in the course and we spent some time reviewing Pandas in class). Debugging can be one of the most frustrating and time-consuming parts of a computational project, so it’s worth practicing how to read error messages and debug code.

Review the following code. Make sure to read the comments to understand what the code is supposed to do. Then run the code and see what it outputs and/or the error message. Finally, make a copy of the code in the provided cell and then fix the code. When you fix the code, add a comment to explain what was wrong with the original code.

IMPORTANT NOTE #1: Not every block of code will produce an error message, but a block that runs without error still won’t produce the desired output. Even if there is no error, there is something you need to fix within the code.

IMPORTANT NOTE #2: In some cases, the example may use a bit of Pandas code that you’re not familiar with yet. In these cases, you’ll need to consult the internet (or the Pandas documentation) to figure out what the code is doing. This is a very common practice in computational modeling and data analysis.
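As a rough illustration of the kind of silent bug described in Important Note #1, the sketch below uses hypothetical toy data (the `toy` DataFrame and its `price` column are made up and unrelated to the cells that follow). It assumes the values were stored as text, so an operation that looks like arithmetic runs without error but gives an unintended result. Checking the column type is one way to spot this.

```python
import pandas as pd

# Hypothetical toy data: the 'price' values are stored as text, not numbers
toy = pd.DataFrame({"price": ["10", "20", "30"]})

# No error is raised, but "+" joins the text instead of adding numbers
doubled_text = toy["price"] + toy["price"]

# Checking the column type shows that the column is not numeric
print(pd.api.types.is_numeric_dtype(toy["price"]))

# Converting to a numeric type first gives the intended arithmetic
doubled_numbers = pd.to_numeric(toy["price"]) * 2

print(doubled_text.tolist())
print(doubled_numbers.tolist())
```

Printing intermediate values and types like this is often quicker than guessing at the cause from the final output alone.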

Import Pandas before moving on!#

# Import Pandas
import pandas as pd 

Question 2.1 (2 points): Review the following piece of Pandas code, read the comments to understand what it is supposed to do, then run the code to see what the output is. DO NOT MODIFY THIS CODE CELL (so that you can remember what the bug was).

## DO NOT CHANGE THIS CELL ##

# Group df by column 'Subject' and take the mean 

df = pandas.DataFrame({'Subject': ['Physics', 'Math',
                              'Math', 'Physics'],
                   'Scores': [88, 76, 92, 82]})

df.groupby('Subject').mean()

If you need to write any code to explore the nature of the bug, please do so in the cell below.

# Put exploratory code here, if needed

DO THIS: Now that you understand what the bug is, fix it in the cell below and add a comment explaining what the bug was and how you fixed it.

# Put your non-buggy code here

Question 2.2 (2 points): Review the following piece of Pandas code, read the comments to understand what it is supposed to do, then run the code to see what the output is. DO NOT MODIFY THIS CODE CELL (so that you can remember what the bug was).

## DO NOT CHANGE THIS CELL ##

#use this pandas function to display all the dates between when hw1 is released until
#when the hw is due


dates = pd.date_range("01302026", periods=22)

dates

If you need to write any code to explore the nature of the bug, please do so in the cell below.

# Put exploratory code here, if needed

DO THIS: Now that you understand what the bug is, fix it in the cell below and add a comment explaining what the bug was and how you fixed it.

# Put your non-buggy code here

Question 2.3 (2 points): Review the following piece of Pandas code, read the comments to understand what it is supposed to do, then run the code to see what the output is. DO NOT MODIFY THIS CODE CELL (so that you can remember what the bug was).

Note: Assume the original dataframe ‘df’ cannot be changed, i.e., don’t modify the values in ‘df’ manually by erasing and re-typing them. Instead, use functions and data cleaning methods to modify the data.

Note: The resulting dataframe should look like this:

       A    B
  0    3    3
  1    8    4

## DO NOT CHANGE THIS CELL ##

# Take a DataFrame with two columns of numbers
# multiply the two columns by each other
# Replace the old column 'A' with the new numbers
# Display the DataFrame
df = pd.DataFrame({'A': ['1', '2'], 'B': [3, 4]})
df['A'] = df['A']*df['B']
df

If you need to write any code to explore the nature of the bug, please do so in the cell below.

# Put exploratory code here, if needed

DO THIS: Now that you understand what the bug is, fix it in the cell below and add a comment explaining what the bug was and how you fixed it.

# Put your non-buggy code here

🛑 STOP#

Pause to add and commit your changes to your Git repository!

Take a moment to save your notebook and commit the changes to your Git repository using the commit message “Committing Part 2”. There is no need to push the changes to GitHub, but you can if you want.


Back to ToC

Part 3: Downloading and analyzing unfamiliar data (15 points)#

For this part of the homework assignment, you’re to download and analyze a dataset that you’ve likely not looked at before. You’ll perform some simple, exploratory analysis and create basic visualizations.

In particular, you’re going to be working with a dataset that contains information on earthquakes of magnitude three or greater detected in the United States from 1975 through February 2015. The dataset was used in the following BuzzFeed article:

Midwestern States Are Having Big Earthquakes Like Never Before

That headline might be alarming, but you’re going to take a look at the data yourself to draw some of your own conclusions. Thankfully, BuzzFeed makes the data it used for the article publicly available. The original data came from the U.S. Geological Survey, but you’ll be working with the same dataset that BuzzFeed gathered and prepared, which you can get from here:

https://raw.githubusercontent.com/BuzzFeedNews/2015-03-earthquake-maps/master/data/earthquake_states.csv

Question 3.1 (1 point): Do this now: Using the command line interface, save this file in the same directory as your notebook so you can load it directly. Then, in the cell below, put the command you used to download the file.

# Put the command you used for downloading the data files here!

Question 3.2 (2 points): To get started, read in the earthquake_states.csv dataset and then display the first 15 rows of the data using Pandas.

### Put your code here

Question 3.3 (4 points): Through visual inspection, let’s try to confirm that these earthquake readings do, in fact, come from the United States based on their latitude and longitude coordinates.

Make a scatter plot of latitude (on the \(y\)-axis) vs. longitude (on the \(x\)-axis).

Also, avoid using all of the default plot stylings and set the following:

  • Set the size of the scatter plot points to 1

  • Change the color of the plot to something other than the default. You choose!

  • Adjust the aspect ratio of the plot so that it is “equal”. This ensures the distances are the same in the x-direction as in the y-direction.

Give your \(x\) and \(y\) axes appropriate labels and double check that you put the right values on the right axis! (Remember: longitude should run east to west and latitude should run north to south.)

Does the resulting plot meet your expectations? Is there anything that surprises you about the plot? Discuss your observations of the plot in the markdown cell below.
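The styling options listed above can be sketched on made-up data. The example below is a minimal sketch that assumes random, hypothetical coordinates (`x_vals` and `y_vals` are not from the earthquake dataset) and uses matplotlib's `scatter`, `set_aspect`, and axis-label methods. The color is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical random points, used only to demonstrate the styling options
rng = np.random.default_rng(seed=0)
x_vals = rng.uniform(-10, 10, size=500)
y_vals = rng.uniform(-5, 5, size=500)

fig, ax = plt.subplots()

# s sets the marker size; color overrides the default color
ax.scatter(x_vals, y_vals, s=1, color="darkorange")

# "equal" draws one unit in x the same length as one unit in y
ax.set_aspect("equal")

ax.set_xlabel("x value (hypothetical)")
ax.set_ylabel("y value (hypothetical)")
```

In a notebook, the figure is drawn when the cell finishes running; in a standalone script you would likely need to call `plt.show()` as well.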

### Put your code here

Do This: Record your observations of your plot here.

Question 3.4 (1 point): If you took a close look at the data, you might have noticed that some of the values in the state column show up as “NaN”. This might explain some of your observations when you visualized the data. Since we want to look at earthquakes that actually occurred inside US states, what built-in Pandas function can you use to drop missing data from a Pandas DataFrame?

Use this function to create a new dataframe that only contains the rows with a valid entry in the state column.
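One possible approach to dropping rows based on a single column is sketched below on hypothetical toy data (the `toy` DataFrame and its `city` and `reading` columns are invented for illustration). It assumes `DataFrame.dropna` with its `subset` argument, which restricts the missing-value check to the named columns.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with missing values in two different columns
toy = pd.DataFrame({
    "city": ["Lansing", None, "Detroit", "Flint"],
    "reading": [1.2, 3.4, np.nan, 5.6],
})

# Drop rows only where 'city' is missing.
# The row with a missing 'reading' is kept because 'reading' is not in subset.
cleaned = toy.dropna(subset=["city"])

print(len(toy), len(cleaned))
```

Without `subset`, `dropna` would remove any row that has a missing value in any column, which may discard more data than intended.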

### Put your code here

Question 3.5 (3 points): How common are the high-magnitude earthquakes?

Make a histogram of the earthquake magnitudes using your new dataframe that contains only the valid US state data. If you don’t recall how to make histograms using Pandas or matplotlib, you might need to do a bit of internet searching and review some documentation. You should try modifying the number of bins in your histogram and use something other than the default. Choose something that you think gives you a reasonable view of the data.

What does your histogram tell you about how common earthquakes above a magnitude of 5 are? Explain your answer in the markdown cell below!
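As a small sketch of how the bin count is set, the example below draws a histogram of hypothetical, randomly generated values (not the earthquake magnitudes) with matplotlib's `hist` and a non-default `bins` argument. The choice of 25 bins is arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data drawn from a normal distribution
rng = np.random.default_rng(seed=1)
values = rng.normal(loc=50, scale=10, size=1000)

fig, ax = plt.subplots()

# bins=25 replaces the default of 10 bins;
# hist returns the per-bin counts and the bin edges
counts, bin_edges, _ = ax.hist(values, bins=25)

ax.set_xlabel("Value (hypothetical)")
ax.set_ylabel("Count")

print(len(counts), len(bin_edges))
```

Too few bins can hide structure and too many can make the plot noisy, so it is usually worth trying a few values.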

### Put your code here

Do This: Record your answers to the question at the end of Question 3.5 here.

Question 3.6 (2 points): Now that you’ve used your histogram to get a sense for how common earthquakes above a magnitude of 5 are, write a bit of code to calculate the percentage of earthquakes that are above a magnitude of 5. (You should use your new dataframe that contains only the valid US state data.)

Make sure to print out your answer in a readable way.
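One common pattern for this kind of calculation is to build a boolean mask and take its mean, since `True` counts as 1 and `False` as 0. The sketch below assumes a hypothetical Series of test scores and a made-up threshold of 80.

```python
import pandas as pd

# Hypothetical scores, unrelated to the earthquake data
scores = pd.Series([55, 72, 88, 91, 64, 79, 95, 83])

# Boolean mask: True where the score is above the threshold
above = scores > 80

# The mean of a boolean mask is the fraction of True values
percent_above = 100 * above.mean()

print(f"{percent_above:.1f}% of the scores are above 80.")
```

An equivalent approach is to divide the number of rows that pass the mask by the total number of rows.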

### Put your code here

Question 3.7 (2 points): The original BuzzFeed article suggests that Midwestern states are seeing an uptick in earthquakes, but how bad are things in Michigan?

Using a mask, create a new dataframe that just contains earthquakes that happened in Michigan and display that dataframe.

If you can’t figure out how to do this using a mask, but can complete it another way, feel free to do so for some fraction of the points.
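For reference, a mask is a Series of `True`/`False` values that is used to select rows. The sketch below shows the general pattern on hypothetical toy data (the `pets` DataFrame and its columns are invented for illustration).

```python
import pandas as pd

# Hypothetical toy data
pets = pd.DataFrame({
    "species": ["cat", "dog", "cat", "bird"],
    "age": [3, 5, 7, 2],
})

# The mask is True for rows where the condition holds
is_cat = pets["species"] == "cat"

# Indexing with the mask keeps only the True rows (original index labels are kept)
cats = pets[is_cat]

print(cats)
```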

Once you have your new dataframe, inspect it and answer the following questions:

  • How many unique Michigan earthquakes occurred in the timeframe covered by this dataset? (Make sure you look carefully at the resulting dataframe!)

  • Does everything in the dataframe make sense? Does it seem like there’s anything wrong with the data?

  • Where did the 1994 earthquake occur? You should be able to figure this out using the latitude and longitude information (and a map, of course!).

### Put your code here

Do This: Record your answers to the free-response questions in Question 3.7 here.


🛑 STOP#

Pause to commit your changes to your Git repository!

Take a moment to save your notebook and commit the changes to your Git repository using the commit message “Committing Part 3”.


Back to ToC

Part 4: Working with a less familiar Python package (8 points)#

In this part of the assignment, you will need to review a bit of documentation for a Python package that you’ve explored a bit previously this semester.

Now that we have a sense for the history of earthquakes in Michigan over the last few decades, what about the history of earthquakes in one of our neighbors to the south, Indiana?

The goal for this part of the assignment is to see if you can make a map of Indiana that displays all of the earthquakes that have occurred in Indiana from this dataset.

You are expected to do this using Folium, which we’ve used a bit in class up to this point. If for some reason, you don’t have Folium already installed, you may need to do that!

As you work on this part of the assignment, you should take advantage of the folium documentation available here because you’ll likely find some really useful examples!

Question 4.0: If you don’t already have folium installed, what command could you use to install it? (you should run this command on the command line, if you need to).

If you do already have it installed, what command did you use to install it?

# Put the command for installing folium here!

Question 4.1 (3 points): To start, let’s make a map of Indiana centered on Indianapolis. In order to do this, you’ll need to set up a Folium map with the following information:

  • Center the map on Indianapolis, which is located at roughly 39.77 degrees latitude and -86.16 degrees longitude

  • Set the initial zoom_start level to 7.

Create this map and make sure it displays in your notebook.

### Put your code here

Question 4.2 (1 point): Now that you’ve got a map of Indiana, you need to isolate the earthquakes in the dataset that occurred in Indiana.

Create a new dataframe that only includes earthquake data for Indiana. You can most easily do this with a mask, but you can use whatever method that works.

### Put your code here

Question 4.3 (4 points): OK, now that you have a map of Indiana and all of the earthquakes that happened there, do the following:

  • Add every earthquake to the map as a filled red circle at the appropriate latitude and longitude

  • Set the radius of the circle to be the magnitude of the earthquake \(\times\) 2000 so that the circle represents the magnitude of the earthquake but is also large enough to be easily visible. (Note: if your circles are too big or too small, you can adjust the multiplier as needed until you get something that looks reasonable)

It is recommended that you recreate your original map from Question 4.1 in your code cell for this question so that it is freshly initialized before you add all of the circles.

### Put your code here

🛑 STOP#

Pause to commit your changes to your Git repository!

Take a moment to save your notebook and commit the changes to your Git repository using the commit message “Committing Part 4”. There is no need to push the changes to GitHub yet, but you can if you want.


Back to ToC

Part 5: Practice with using Python classes (14 points)#

For this part of the assignment, you’re going to work on fleshing out a partially constructed Python class and then experiment with using it to see if it works as intended.

The backstory#

You’re working as part of a new data science team and your team has been tasked with creating a Python class that can run some simple data analysis on one or more datasets that are provided to it. The hope is that this new class will make it easier for folks who are new to the team to do some basic exploratory data analysis when presented with new project data.

Your team leader figured that this was a good opportunity to try using one of the new generative AI tools that are out there to help with the initial development of this new class. Your team leader used Claude to generate the starting point for this class, using the following prompt:

Can you provide an example of a python class that would be useful for someone working in a computational modeling and data analysis context?

Obviously, this is a pretty vague prompt. Regardless, your team now has a basic starting point to work from and your team leader wants to move forward with this idea. He has provided you with the following code that was generated by Claude. Review the code and try running it. Make sure you understand what this code is doing.

# This code was generated using claude.ai on September 7, 2023. URL: https://claude.ai/
# The prompt was: Can you provide an example of a python class that would be useful for someone working in a computational modeling and data analysis context?
# Beyond the prompt used by the "team leader", a follow-up prompt was given which requested that Claude add comments to the code to explain what it does.

import numpy as np

class DataAnalyzer:
    # The init method initializes the class instance 
    # and saves the input data as an attribute
    def __init__(self, data):
        self.data = data
        
    # Computes the mean of the data        
    def mean(self):
        return np.mean(self.data)
    
    # Computes the standard deviation of the data
    def std(self):
        return np.std(self.data)
    
    # Generates a histogram plot of the data
    # Uses matplotlib to create the plot
    def plot_histogram(self):
        import matplotlib.pyplot as plt
        plt.hist(self.data)
        plt.title('Data Distribution')
        plt.xlabel('Value')
        plt.ylabel('Frequency')
        plt.show()

# Example usage:
data = [1, 4, 5, 8, 10, 12, 15]
analyzer = DataAnalyzer(data)
print('Mean:', analyzer.mean()) 
print('Standard deviation:', analyzer.std())
analyzer.plot_histogram()

Modifying the class to alter its behavior and add new functionality#

In the code cell below, you’re provided with a second copy of this new Python class, DataAnalyzer. For the remainder of this section of the assignment, you will be modifying this version of the class to add new functionality and alter its behavior. You will then be provided with snippets of code designed to test your modifications and confirm that you’ve implemented them as intended.

When you make edits to the class provided below, make sure to run the cell to save your changes before running the included tests!

Note: Feel free to experiment with HiTA or one of the generative AI tools out there to help you expand upon and modify the initial starting point for this new Python class. If you do this, make sure to include a link to the tool you used in the markdown cell below along with the prompt you used to generate the code and the date you accessed the tool. Additionally, it is important to make sure that you understand the code you’re working with, so make sure to review the code that is generated and make sure you understand what it is doing!

# For the assignment prompts that follow, EDIT THIS VERSION OF THE PYTHON CLASS
# This should help to ensure that you can always fall back to the original version provided above, should you need to.
import numpy as np

class DataAnalyzer:
    # The init method initializes the class instance 
    # and saves the input data as an attribute
    def __init__(self, data):
        self.data = data

    # Computes the mean of the data        
    def mean(self):
        return np.mean(self.data)
    
    # Computes the standard deviation of the data
    def std(self):
        return np.std(self.data)
    
    # Generates a histogram plot of the data
    # Uses matplotlib to create the plot
    def plot_histogram(self):
        import matplotlib.pyplot as plt
        plt.hist(self.data)
        plt.title('Data Distribution')
        plt.xlabel('Value')
        plt.ylabel('Frequency')
        plt.show()

Question 5.1 (3 points): Create a new class method called median that calculates and returns the median of the dataset. You should be able to use the mean and std methods that are already defined in the class as a model for this.

Once you’ve defined the new method, you should be able to execute the cell below to see if the new method works as intended. If it does work as intended, you should find that the median value is 8.
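As a sketch of what adding a method to an existing class looks like, the example below uses a hypothetical `Measurements` class (not the `DataAnalyzer` class from this assignment) and adds a made-up `value_range` method next to an existing `mean` method.

```python
import numpy as np

class Measurements:
    # Hypothetical class used only to illustrate the pattern
    def __init__(self, values):
        self.values = values

    # An existing method
    def mean(self):
        return np.mean(self.values)

    # A new method added alongside the existing one.
    # It follows the same pattern: take self, use the stored data, return a result.
    def value_range(self):
        return np.max(self.values) - np.min(self.values)

m = Measurements([2, 9, 4, 7])
print("Mean:", m.mean())
print("Range:", m.value_range())
```

After editing a class definition in a notebook, the cell that defines the class has to be re-run and any existing instances recreated before the new method is available.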

# DO NOT EDIT THIS CODE. If it doesn't work, you need to make changes to the class above.
data = [1, 2, 4, 5, 8, 10, 12, 15, 16]
analyzer = DataAnalyzer(data)
print('Mean:', analyzer.mean()) 
print('Standard deviation:', analyzer.std())

# Test out the new "median" method
print('Median:', analyzer.median())

Question 5.2 (2 points): Now, add a new attribute to the class called “label”. This attribute should be a string that contains a label for the dataset that is being analyzed. This label should be set when the class is initialized (similar to how the “data” attribute is set) and the value should be defined by an input argument for the class.

Run the code provided below to see if this new attribute works as intended.
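Setting an attribute at initialization generally means adding a parameter to `__init__` and storing it on `self`. The sketch below shows this on a hypothetical `Sample` class with a made-up `units` attribute; the class, attribute, and method names are illustrative only.

```python
class Sample:
    # Hypothetical class used only to illustrate the pattern
    def __init__(self, values, units):
        self.values = values
        # Store the second input argument as an attribute
        self.units = units

    # Other methods can then read the attribute through self
    def describe(self):
        return "%i values measured in %s" % (len(self.values), self.units)

s = Sample([1.0, 2.5, 4.0], "meters")
print(s.units)
print(s.describe())
```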

# DO NOT EDIT THIS CODE. If it doesn't work, you need to make changes to the class above.
data = [1, 4, 5, 8, 10, 12, 15]
label = "Testing Data"
analyzer = DataAnalyzer(data, label)
print('The mean of %s is:' %analyzer.label, analyzer.mean()) 
print('The standard deviation of %s is:' %analyzer.label, analyzer.std())
print('The median of %s is:' %analyzer.label, analyzer.median())

Question 5.3 (4 points): Create another new method called “find_extreme” that takes a single argument, “direction”, which should be a string that is either “max” or “min”. This method should return the maximum or minimum value in the dataset, depending on the value of the “direction” argument. Your new method should also print some sort of warning message if neither “min” nor “max” is specified for the direction argument, alerting the user of their error.

Run the code provided below to see if this new method works as intended. You should be able to confirm that the minimum and maximum values are correct by looking at the “data” variable defined below.
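A method that branches on a string argument and warns about unexpected values might follow the pattern sketched below. The `Rounder` class, its `round_value` method, and the "up"/"down" options are hypothetical and chosen only to illustrate the if/elif/else structure. Printing a message and returning `None` is one possible way to handle bad input, not the only one.

```python
import math

class Rounder:
    # Hypothetical class used only to illustrate the pattern
    def __init__(self, value):
        self.value = value

    def round_value(self, mode):
        if mode == "up":
            return math.ceil(self.value)
        elif mode == "down":
            return math.floor(self.value)
        else:
            # Neither expected option was given, so alert the user
            print("Warning: mode must be 'up' or 'down', but got %r" % mode)
            return None

r = Rounder(2.3)
print(r.round_value("up"))
print(r.round_value("down"))
r.round_value("sideways")
```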

# DO NOT EDIT THIS CODE. If it doesn't work, you need to make changes to the class above.
data = [1, 4, 5, 8, 10, 12, 15]
label = "Testing Data"
analyzer = DataAnalyzer(data, label)
print('The maximum of %s is:' %analyzer.label, analyzer.find_extreme("max")) 
print('The minimum of %s is:' %analyzer.label, analyzer.find_extreme("min")) 
analyzer.find_extreme("mean")

Question 5.4 (1 point): Now that you have a label attribute as part of your class, it would be useful if the plot_histogram method used this new attribute to label the x-axis of the histogram. Modify the plot_histogram method so that it uses the label attribute as the x-axis label.

Run the code provided below to see if your modification to the method works as intended.

# DO NOT EDIT THIS CODE. If it doesn't work, you need to make changes to the class above.
data = [1, 4, 5, 8, 10, 12, 15]
label = 'Testing Data'
analyzer = DataAnalyzer(data, label)
analyzer.plot_histogram()

Testing your new DataAnalyzer class on real data#

Now that you have an enhanced version of the initial DataAnalyzer class that your team leader generated using Claude, let’s see if it works as intended on some real data! Specifically, since you already spent some time getting familiar with the earthquake data in Part 3 of this assignment, let’s see if your new class produces results that make sense based on your previous observations.

Question 5.5 (3 points): Create a new instance of your DataAnalyzer class called earthquake_analyzer and initialize it with the magnitude values from the earthquake data from Part 3 of this assignment. Make sure to set the label attribute to something that makes sense! (Note: You should use the data for all the US states, but not places outside the US. Make sure you don’t use just the Michigan data or Indiana data that you looked at previously.)

Once you’ve created your new instance of the class, use the full range of functionality you’ve added to the class to explore the data and see if you can confirm your observations from Part 3 of this assignment. Make sure you produce a histogram of the magnitude values and that you calculate the mean, standard deviation, median, minimum, and maximum values of the dataset.

### Put your code here

Question 5.6 (1 point): Do the results from your new class match the results from your findings in Question 3.5? Explain why or why not.

Do This: Record your answer to Question 5.6 here.


🛑 STOP#

Pause to commit your changes to your Git repository!

Take a moment to save your notebook and commit the changes to your Git repository using the commit message “Committing Part 5”. There is no need to push the changes to GitHub yet, but you can if you want.


Back to ToC

Part 6: Finishing (3 points)#

Question 6.1 (1 point): Have you put your name and GitHub username at the top of your notebook?

Question 6.2 (1 point): Have you added the TA and Instructor to your GitHub repository? (You should have done this in Part 1, and they should have shared this information via Slack.)

Question 6.3 (1 point): Finally, push your repository to GitHub so that all of the commits that you have been making along the way show up on GitHub.

# Put the command you used to push to GitHub here

NOTE: If everything has gone as planned, the grader will be able to see your commit messages and whether you pushed the repo at this stage. Double-check that things look correct on GitHub before you submit this notebook to D2L.


Assignment wrap-up#

Congratulations, you’re done!#

Submit this assignment by uploading it to the course Desire2Learn web page. Go to the “Homework Assignments” folder, find the dropbox link for Homework #1, and upload it there.

© Copyright 2026, Department of Computational Mathematics, Science and Engineering at Michigan State University