Homework Assignment 1

Git practice, debugging practice, new Python packages, and Python classes

✅ Put your name here.

✅ Put your GitHub username here.

Goals for this homework assignment

By the end of this assignment, you should be able to:

Use Git to create a repository, track changes to the files within the repository, and push those changes to a remote repository.
Debug some basic Python code that involves Pandas.
Read documentation and example code to use a new Python package.
Modify and use a simple Python class.

Work through the following assignment, making sure to follow all of the directions and answer all of the questions.

There are 68 points possible on this assignment. Point values for each part are included in the section headers and question prompts.

This assignment is due at 11:59 pm on Friday, February 21st. It should be uploaded into the “Homework Assignments” submission folder for Homework #1 on D2L. Submission instructions can be found at the end of the notebook. You must also fill out a survey regarding this assignment. The link to this survey can also be found at the end of the notebook.

Table of contents

Part 0: Visiting Office Hours or Help Room (6 points)
Part 1: CLI and Git (14 points)
Part 2: Debugging Pandas code (7 points)
Part 3: Downloading and analyzing unfamiliar data (8 points)
Part 4: Finding and Using a Python Package from GitHub (11 points)
Part 5: Practice with using Python classes (16 points)
Part 6: Finishing (6 points)

# Calculate total points possible and print it
print("Total number of points possible on this assignment is %i." %(6+14+7+8+11+16+6))

Back to ToC

Part 0: Visiting Office Hours or Help Room (6 points)

Going to Office Hours or Help Room

Why are we doing this?

We want to make sure that everyone knows how to access the resources available to you. One of the best resources you have at your disposal is office hours/help room.

What will you do?

(At minimum) go to one office hour or help room session (it doesn’t matter which one you go to). Come with one question that you would like to talk about. It can be big or small. It can be about the homework, but it doesn’t have to be. It can be anything about the course or about computational modeling and data analysis in general.

Once you get to office hours or help room, ask your question. All of the instructors for CMSE 202 (Professors, TAs, and LAs) will be adding to a running list of folks that we see during office hours; as long as your name appears on the list, you’ll get credit for this part of Homework 1.

NOTE: The day when the homework is due (Friday, February 21st at 11:59pm) will be the busiest time for folks to go to office hours or help room. You are STRONGLY encouraged to go to office hours or help room before Friday to get credit for this part of this assignment. (You should still feel free to go to office hours or help room on Friday for help, though!)

You can find the office hours calendar on the course website.

FINAL NOTE: If you are unable to attend office hours or help room, please contact the instructor to make alternative arrangements and explain why you are unable to attend.

✅ Question 0.1 (6 points)

Type below the question you asked and who you asked it to (make sure you know who you’re talking to!). Make sure you double-check that the instructor made note of this.

If you did not attend office hours or help room, please explain why.

✎ Put your question here.

✎ Put the name of the instructor you spoke with here.

Back to ToC

Part 1: CLI and Git (14 points)

Setting up a git repository to track your progress on your assignments

Git is a very important professional tool, and we want you to get plenty of practice using it. The following set of questions prompts you to create a (private) Git repo for storing, updating, and turning in your homework assignments. You will share this repo with your course lead instructor and TA so that they can pull your completed assignments for grading.

✅ Question 1.1 (2 points):

On GitHub, make sure you are logged into your account and then, if you haven’t already, create a new private GitHub repository called cmse202-s25-turnin. Important note: you may have already created this repository in a PCA; if so, please use that one. If not, please create a new one.
Once you’ve initialized the repository on GitHub, clone a copy of it onto JupyterHub or your computer.

# Put the command you used to clone the repository here

✅ Question 1.2 (1 point): Using the command line interface, move inside the repository folder

What command did you use to enter the folder?

# Put the command to move into a new directory here.

✅ Question 1.3 (1 point): Once inside the cmse202-s25-turnin repository, create a new folder called hw-01.

What is the command to create the new folder?

# Put the command to create the folder/directory here

✅ Question 1.4 (1 point): Move this notebook into that new directory in your repository, then check the status of the repository.

This is an important step: you’ll want to make sure you save and close the notebook before you do this step and then re-open it once you’ve added it to your repository. If you don’t do this, you could end up working on the wrong version of the notebook! Once you’ve moved the notebook correctly, re-open it and continue working on it.

# Put the command you used to check the status of your repository here.

✅ Question 1.5 (1 point): Copy and paste below the output of the status command.

# Paste it here

✅ Question 1.6 (1 point): What is the name of the current branch of the repository that you are in? (Hint: There should only be one branch at this time. We’ll learn more about branches in git later in the semester.)

# Put your answer here

✅ Question 1.7 (3 points): If you haven’t already, add your name and GitHub username to the top of the notebook, then add and commit ONLY the notebook.

# Put the command(s) to add and commit here

What is the commit message you used?

# Copy your commit message here

✅ Question 1.8 (1 point): Before moving on, check that the notebook you are working on is the correct one. Run the following cell. Are you in the new folder you just created? If not, close this notebook and open the one in the hw-01 folder. You’ll likely need to copy over the work you did on the above questions if you were working on the wrong notebook.

What command did you use to check which directory you are in? What command did you use to list the files in the folder?

# Put the commands you used to check your current directory and list the files in the folder here

✅ Question 1.9 (3 points): Assuming that your notebook is in the right place and committed to your repository, push your changes to GitHub.

What command did you use to push your changes to GitHub?

# Put the command you used to push your changes to GitHub here

Before moving on…

Important: Make sure you’ve added your Professor and your TA as collaborators to your new “turnin” repository with “Read” access so that they can see your assignment. Check the course website for your section to get this information.

Double-check the following: Make sure that the version of this notebook that you are working on is the same one that you just added to your repository! If you are working on a different copy of the notebook, none of your changes will be tracked.

If everything went as intended, the file should now show up on your GitHub account in the “cmse202-s25-turnin” repository inside the hw-01 directory that you just created. Periodically, you’ll be asked to commit your changes to the repository. By the end of the assignment you should have multiple commits that correspond to your completion of each section (as specified below). Of course, you can always commit your changes more often than that, if you wish. It can be good to get into a habit of committing your changes any time you make a significant modification, or when you stop working on the project for a bit.

Back to ToC

Part 2: Debugging Pandas code (7 points)

Reading Python and Pandas code and understanding errors and error messages

In this section, you will practice reading and debugging code, especially examples that use Pandas (since we’ll be using Pandas regularly in this course, and we spent some time reviewing it in class). Debugging can be one of the most frustrating and time-consuming parts of a computational project, so it’s worth spending time learning to parse error messages and debug code.
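As an optional illustration that is not part of the graded work, here is a minimal sketch of pulling out the most useful parts of an error: its type, its message, and the full traceback. The helper `describe_error` is hypothetical and was written only for this example.

```python
import traceback


def describe_error(func, *args):
    """Call func(*args); if it raises, return (error type name, message, traceback text)."""
    try:
        func(*args)
    except Exception as err:
        # type(err).__name__ is the error's class name; str(err) is its message
        return type(err).__name__, str(err), traceback.format_exc()
    return None  # no error was raised


name, message, tb_text = describe_error(int, "ninety")
print(name)     # ValueError
print(message)  # invalid literal for int() with base 10: 'ninety'
print(tb_text)  # full traceback text, starting with "Traceback (most recent call last):"
```

When you read a Pandas error, start with the last line of the traceback (type and message). Then work upward to find the line of your own code that triggered it.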

Review the following code. Make sure to read the comments to understand what the code is supposed to do. Then run the code and see what it outputs and/or the error message. Finally, make a copy of the code in the provided cell and then fix the code. When you fix the code add a comment to explain what was wrong with the original code.

IMPORTANT NOTE #1: Not every block of code will produce an error message, but none of them will produce the desired output. Even if there is no error, there is something you need to fix within the code.
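Code that runs without an error can still be wrong. A useful habit is to check a result against a value you can work out by hand. In this sketch, which uses hypothetical functions and made-up scores, floor division `//` silently drops the fractional part of an average:

```python
def average_buggy(values):
    # Bug: // is floor division, so the fractional part is silently discarded
    return sum(values) // len(values)


def average_fixed(values):
    # / is true division and keeps the fractional part
    return sum(values) / len(values)


scores = [90, 85, 80, 86]  # made-up values; by hand, 341 / 4 = 85.25

print(average_buggy(scores))  # 85 (no error raised, but wrong)
print(average_fixed(scores))  # 85.25
```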

IMPORTANT NOTE #2: In some cases, the example may use a bit of Pandas code that you’re not familiar with yet; in these cases, you’ll need to consult the internet (or the Pandas documentation) to figure out what the code is doing. This is a very common practice in computational modeling and data analysis.
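Besides searching online, you can read a function's documentation from inside Python. `help(some_function)` works anywhere, and in Jupyter `some_function?` does the same. Below is a minimal sketch that uses the standard library's `inspect` module. The choice of `statistics.mean` is arbitrary and only serves as an example.

```python
import inspect
import statistics

# The signature lists the parameters the function accepts
sig = inspect.signature(statistics.mean)
print(sig)

# inspect.getdoc returns the cleaned-up docstring; print its first line as a summary
summary = inspect.getdoc(statistics.mean).splitlines()[0]
print(summary)
```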

Import Pandas before moving on!

# Import Pandas
import pandas as pd 

✅ Question 2.1 (2 points): Review the following piece of Pandas code, read the comments to understand what it is supposed to do, then run the code to see what the output is. DO NOT MODIFY THIS CODE CELL (so that you can remember what the bug was).

## DO NOT CHANGE THIS CELL ##

# Group df by column 'Subject' and take the mean 

df = pandas.DataFrame({'Subject': ['Physics', 'Math',
                              'Math', 'Physics'],
                   'Scores': [88, 76, 92, 82]})

df.groupby('Subject').mean()

If you need to write any code to explore the nature of the bug, please do so in the cell below.

# Put exploratory code here, if needed

DO THIS: Now that you understand what the bug is, fix it in the cell below and add a comment explaining what the bug was and how you fixed it.

# Put your non-buggy code here

✅ Question 2.2 (2 points): Review the following piece of Pandas code, read the comments to understand what it is supposed to do, then run the code to see what the output is. DO NOT MODIFY THIS CODE CELL (so that you can remember what the bug was).

## DO NOT CHANGE THIS CELL ##

#use this pandas function to display all the dates between when hw1 is released until
#when the hw is due

dates = pd.date_range("01312025", periods=22)

dates

If you need to write any code to explore the nature of the bug, please do so in the cell below.

# Put exploratory code here, if needed

DO THIS: Now that you understand what the bug is, fix it in the cell below and add a comment explaining what the bug was and how you fixed it.

# Put your non-buggy code here

✅ Question 2.3 (2 points): Review the following piece of Pandas code, read the comments to understand what it is supposed to do, then run the code to see what the output is. DO NOT MODIFY THIS CODE CELL (so that you can remember what the bug was).

Note: Assume the original dataframe ‘df’ cannot be changed. That is, don’t modify the values in ‘df’ manually by erasing and re-typing them; instead, use functions and data cleaning methods to modify the data.

Note: The resulting dataframe should look like this:

	A	B
0	3	3
1	8	4

## DO NOT CHANGE THIS CELL ##

# Take a DataFrame with two columns of numbers
# multiply the two columns by each other
# Replace the old column 'A' with the new numbers
# Display the DataFrame
df = pd.DataFrame({'A': ['1', '2'], 'B': [3, 4]})
df['A'] = df['A']*df['B']
df

If you need to write any code to explore the nature of the bug, please do so in the cell below.

# Put exploratory code here, if needed

DO THIS: Now that you understand what the bug is, fix it in the cell below and add a comment explaining what the bug was and how you fixed it.

# Put your non-buggy code here

🛑 STOP

Pause to add and commit your changes to your Git repository! (1 point)

Take a moment to save your notebook and commit the changes to your Git repository using the commit message “Committing Part 2”. There is no need to push the changes to GitHub, but you can if you want.

Back to ToC

Part 3: Downloading and analyzing unfamiliar data (8 points)

For this part of the homework assignment, you will download and analyze a dataset that may be unfamiliar to you. Go to this website and download the associated dataset (for example, by downloading the data as a zip file): https://www.kaggle.com/datasets/jaidalmotra/pokemon-dataset/data. You’ll then perform some simple exploratory analysis.

Make sure this file and your hw file are in the same folder location!

✅ Question 3.1 (2 points): Do this now: read in this file using pandas and display the first 16 rows. You should see the Pokemon names and numbers along with types and other stats.

#put your code here

✅ Question 3.2 (2 points): Now let’s do some cleaning. Create a new dataframe with only the water type pokemon. Do this by creating a mask to filter the original dataset with the type1 column. How many pokemon have the water typing (using this masking method)?
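For reference, here is how a boolean mask works on a small made-up dataframe. The column names and values are hypothetical and do not come from the Pokemon file. Comparing a column to a value produces a True/False Series. Indexing the dataframe with that Series keeps only the rows where it is True.

```python
import pandas as pd

# Made-up toy data for illustration only
toy = pd.DataFrame({
    "name": ["Ada", "Ben", "Cy", "Dee"],
    "major": ["Math", "Physics", "Math", "CMSE"],
    "score": [91, 78, 85, 96],
})

mask = toy["major"] == "Math"  # one True/False value per row
math_rows = toy[mask]          # keep only the rows where mask is True

print(mask.tolist())   # [True, False, True, False]
print(len(math_rows))  # 2
```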

### Put your code here

✅ Question 3.3 (3 points): Using a mask, construct a new dataframe from your water-type dataframe that retains only the water Pokemon with an attack stat of 100 or higher, and name this dataframe “df_water_attackers”. Do the same thing with a different stat column (hp, defense, sp_attack, sp_defense, or speed) and give this dataframe an appropriate name as well. Which of these two dataframes has more Pokemon, and how did you answer this question? (Show your work/code.)

### Put your code here

🛑 STOP

Pause to commit your changes to your Git repository! (1 point)

Take a moment to save your notebook and commit the changes to your Git repository using the commit message “Committing Part 3”. There is no need to push the changes to GitHub yet, but you can if you want.

Back to ToC

Part 4: Finding and Using a Python Package from GitHub (11 points)

In this part of the assignment, you will explore the GitHub page for a new Python package.

We will use a new Python package, pyjokes. As you work on this part of the assignment, you should take advantage of the pyjokes GitHub page, which contains some really useful information.

✅ Question 4.1 (1 point): If you don’t already have the pyjokes package installed, what command could you use to install it? (you should run this command on the command line, if you need to).

If you do already have it installed, what command did you use to install it?

# Put the command for installing pyjokes here!

✅ Question 4.2 (2 points): Use pyjokes to output a joke in this notebook.

### Put your code here

✅ Question 4.3 (2 points): Display three jokes from pyjokes: one in Spanish, one in Russian, and one in a language of your choice other than English, Spanish, or Russian.

### Put your code here

✅ Question 4.4 (1 point): What are the names of the two/three different joke categories?

Put your answer here

✅ Question 4.5 (4 points): Where are the jokes coming from? Find the file that contains the English joke data and add this file to your hw-01 folder. How did you accomplish this task? Please describe in detail the steps you took to find the file and add it to your hw-01 folder.

🛑 STOP

Pause to commit your changes to your Git repository! (1 point)

Take a moment to save your notebook and commit the changes to your Git repository using the commit message “Committing Part 4”. There is no need to push the changes to GitHub yet, but you can if you want.

Back to ToC

Part 5: Practice with using Python classes (16 points)

For this part of the assignment, you’re going to work on fleshing out a partially constructed Python class and then experiment with using it to see if it works as intended.

The background

Curve fitting is a commonly used method for developing a mathematical function that represents the behavior of a dataset. It aims to find the best-fit curve that minimizes the sum of squared differences between the predicted and actual values.

In this part, we start by reviewing sequential (procedural) code and then see how the same task can be organized as object-oriented (OOP) code. Below is sequential code that uses SciPy’s optimization library to fit a set of data points to the function:

\(f(x) = a\times\exp(-bx)+c\),

where the coefficients \(a\), \(b\), and \(c\) are to be determined.
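To make “minimizes the sum of squared differences” concrete, this pure-Python sketch computes that sum for the model above. It uses a few made-up, noise-free points generated from the hypothetical values \(a=2\), \(b=1\), \(c=0.5\). The correct parameters give a sum of zero, and a different choice gives a larger value. Fitting means searching for the parameters that make this sum as small as possible.

```python
import math


def model(x, a, b, c):
    # Same form as f(x) = a * exp(-b x) + c, written with math.exp for plain Python floats
    return a * math.exp(-b * x) + c


def sum_squared_residuals(xs, ys, a, b, c):
    # Sum over all points of (actual - predicted)^2
    return sum((y - model(x, a, b, c)) ** 2 for x, y in zip(xs, ys))


xs = [0.0, 1.0, 2.0, 3.0]
ys = [model(x, 2.0, 1.0, 0.5) for x in xs]  # made-up, noise-free data

print(sum_squared_residuals(xs, ys, 2.0, 1.0, 0.5))  # 0.0 (perfect fit)
print(sum_squared_residuals(xs, ys, 1.5, 1.0, 0.5))  # greater than 0 (worse fit)
```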

First, let’s download the dataset using the curl command from the URL: https://raw.githubusercontent.com/huichiayu/cmse202-s25-supllemental_data/refs/heads/main/HW01/xy_dataset.csv.

(1 point)

# write your code in this cell

Run the cell below to load the necessary libraries.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

Use Pandas to read the data. You should end up with two numpy arrays: one for xdata and the other for ydata. Plot this dataset as a scatter plot. (1 point)

# write your code in this cell

# print the values
print(xdata)
print(ydata)

Below is the sequential code. Review and run it. Make sure you understand what this code is doing.

# procedural code
# plot data points
plt.plot(xdata, ydata, 'bo', label='data')


## define the function to be fitted. Here we use an exponential function.
def func(x, a, b, c):
    return a * np.exp(-b * x) + c


## use the curve fitting function in the SciPy library
popt, pcov = curve_fit(func, xdata, ydata)
print(popt)

## draw the obtained curve
new_x = np.linspace(0, 4, 100)
new_y = func(new_x, popt[0], popt[1], popt[2])

# label the fitted curve 'fit' (not 'data') so the legend distinguishes it from the data points
plt.plot(new_x, new_y, 'r-', label='fit')
plt.legend()
plt.show()
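The procedural code above unpacks `popt, pcov` but only uses `popt`. The SciPy documentation for `curve_fit` notes that the square roots of the diagonal of `pcov` give one-standard-deviation uncertainty estimates for the fitted parameters. This optional sketch uses synthetic data: the true values \(a=2.5\), \(b=1.3\), \(c=0.5\) plus a little random noise, all chosen for illustration. It also uses different variable names so it does not overwrite your `xdata`, `ydata`, or `popt`.

```python
import numpy as np
from scipy.optimize import curve_fit


def func(x, a, b, c):
    return a * np.exp(-b * x) + c


# Synthetic data for illustration: known parameters plus small Gaussian noise
rng = np.random.default_rng(0)
x_synth = np.linspace(0, 4, 50)
y_synth = func(x_synth, 2.5, 1.3, 0.5) + 0.05 * rng.normal(size=x_synth.size)

popt_synth, pcov_synth = curve_fit(func, x_synth, y_synth)
perr_synth = np.sqrt(np.diag(pcov_synth))  # one-standard-deviation uncertainty of a, b, c

for name, value, err in zip("abc", popt_synth, perr_synth):
    print(f"{name} = {value:.3f} +/- {err:.3f}")
```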

Now let’s create a Python class with the same functionalities for curve fitting. Below is a skeleton code as the starting point. Review the code and try running it. Make sure you understand what this code is doing.

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit



class FitData:

    def __init__(xdata, ydata):
        self.xdata = xdata
        self.ydata = ydata
        
    def summary_stats(self):
        return {
            "mean_x": np.mean(self.xdata), 
            "std_x": np.std(self.xdata),
            "mean_y": np.mean(self.ydata),
            "std_y": np.std(self.ydata)
        }

    def CurveFit_model(self):
        popt = curve_fit(self.func, self.xdata, self.ydata)
        return {
            "coefficient a": popt[0],
            "coefficient b": popt[1], 
            "coefficient c": popt[2]
        }

Modifying the class to alter its behavior and add new functionality

For the remainder of this section of the assignment, you will modify the code of the class (provided above) to add new functionality and alter its behavior.

When you make edits to the class provided, make sure to run the cell to save your changes before running tests!

Note: Feel free to experiment with using one of the generative AI tools out there to help you expand upon and modify the initial starting point for this new Python class. If you do this, make sure to include a link to the tool you used in the markdown cell below along with the prompt you used to generate the code and the date you accessed the tool. Additionally, it is important to make sure that you understand the code you’re working with, so make sure to review the code that is generated and make sure you understand what it is doing!

Let’s define the target function in the cell below. Run it.

import numpy as np

def func(x, a, b, c):
    return a * np.exp(-b * x) + c
    

Copy the OOP code to this cell and work from here.

# Copy the code to this cell.
# For the assignment prompts that follow, EDIT THIS VERSION OF THE PYTHON CLASS
# This should help to ensure that you can always fall back to the original version provided above, should you need to.

✅ Question 5.1 (3 points): Test the class by running the following code to see what the output is, then debug the FitData class. DO NOT MODIFY THIS CODE CELL (so that you can remember what the bug was).

## DO NOT CHANGE THIS CELL ##

# This is an example usage of the class "ModelData" If it doesn't work, you need to make changes to the class.
import matplotlib.pyplot as plt

fn = func

data = FitData(xdata, ydata)
stats = data.summary_stats()
model = data.CurveFit_model(func)


print(stats)
print(model)

DO THIS: Now that you understand what the bug(s) is(are), fix it(them) in the cell below and add a comment explaining what the bug(s) was(were) and how you fixed it.

# Put your non-buggy code here

✅ Question 5.2 (4 points): Now, create a new class method named plot_model. It takes three input arguments: magnitude, exponent, and intercept. As output, it should draw two plots on the same figure: a scatter plot of the data points {(xdata, ydata)} and the fitted curve.

# Put your code here

✅ Question 5.3 (2 points): Create a new class method called predict that predicts and returns the model prediction for a given input value \(x\).

If the predict method in your FitData class works correctly, you should be able to test it using the cell below.

## DO NOT CHANGE THIS CELL ##

# This is an example usage of the "predict" method. If it doesn't work, you need to make changes to your method.

CurvF_test1 = FitData(xdata, ydata)
CurvF_test1.CurveFit_model(func)
CurvF_test1.plot_model()

x_new = np.array([5.0, 6.5, -1.5]) 
y_pred = CurvF_test1.predict(x_new)

y_pred

# Put your code here

Testing your `FitData` class on real data

Now that you have an enhanced version of the initial FitData class, let’s see if it works as intended on some real data! Let’s see whether your new class produces results that make sense on the Iris dataset.

✅ Question 5.4 (4 points): You will need to do this in the following steps:

First, download the Iris data from https://raw.githubusercontent.com/yangy5/HWFiles/main/Iris.csv.
Extract the data of species “versicolor”.
Extract the data “sepal_length” and “sepal_width” from your versicolor data frame and save these values in two numpy arrays. (1 pt)
Next, create a new instance of your FitData class, then use these numpy arrays as data points to fit the target curve given earlier. (2 pt)
Finally, use your plot_model method (see Question 5.2) to plot the line as well as all the data points. (1 pt)

# download the data using curl

# use Pandas to load data

# Fit and plot the Iris sepal_length and sepal_width data

🛑 STOP

Pause to commit your changes to your Git repository! (1 point)

Take a moment to save your notebook and commit the changes to your Git repository using the commit message “Committing Part 5”. There is no need to push the changes to GitHub yet, but you can if you want.

Back to ToC

Part 6: Finishing (6 points)

Question 6.1 (2 points): Have you put your name and GitHub username at the top of your notebook?

Question 6.2 (2 points): Have you added the TA and Instructor to your GitHub repository? (You should have done this in Part 1, and they should have shared this information via Slack)

Question 6.3 (2 points): Finally, push your repository to GitHub so that all of the commits that you have been making along the way show up on GitHub.

# Put the command you used to push to GitHub here

NOTE: If everything has gone as planned, the grader will be able to see your commit messages and whether you pushed the repo at this stage. The version on GitHub will be graded for this assignment.

Assignment wrap-up

Please fill out the form that appears when you run the code below. You must completely fill this out in order to receive credit for the assignment!

from IPython.display import HTML
HTML(
"""
<iframe 
	src="https://forms.office.com/r/XKjjVkQDca" 
	width="800px" 
	height="600px" 
	frameborder="0" 
	marginheight="0" 
	marginwidth="0">
	Loading...
</iframe>
"""
)

Congratulations, you’re done!

If you like, you can upload this file to D2L for a record. Nevertheless, we will grade the copy on GitHub.
