Homework 1
Deadline
Homework is due Sunday, Jan 25, 2026, at 11:59 p.m. and must be submitted on https://www.msu-stt-learning.com/.
Instructions
This homework covers three classes. Problems listed below are from the textbook.
Mon 1/12, we covered Ch 1 and reviewed Python (Section 2.4).
For loading datasets, please use the dataset URL (https://msu-cmse-courses.github.io/CMSE381-S26/_downloads/d75c3811a83a66f8c261e5b599ef9e44/Auto.csv), as demonstrated in our coding practice.
Section 2.4.9: Use the Auto dataset from class.
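As a rough sketch of loading the Auto data from the URL given above, something like the following may work. The helper name `load_auto` and the `na_values="?"` setting are assumptions, not part of the assignment: some versions of Auto.csv mark missing horsepower values with `?`, and the course copy may or may not. The small in-memory sample is made up purely so the snippet can be checked offline.

```python
import io

import pandas as pd

# Dataset URL copied from the instructions above.
AUTO_URL = (
    "https://msu-cmse-courses.github.io/CMSE381-S26/_downloads/"
    "d75c3811a83a66f8c261e5b599ef9e44/Auto.csv"
)


def load_auto(source=AUTO_URL):
    """Read the Auto data from a URL, a local path, or a file-like object.

    `load_auto` is a hypothetical helper name. Treating "?" as missing is an
    assumption about how the file encodes missing values.
    """
    return pd.read_csv(source, na_values="?")


# Offline check on a tiny made-up sample (these are not real Auto rows).
sample = io.StringIO("mpg,horsepower,name\n18.0,130,car a\n25.0,?,car b\n")
demo = load_auto(sample)
print(demo.dtypes)
print(demo.isna().sum())
```

In a notebook, calling `load_auto()` with no argument would read directly from the course URL. Checking `isna().sum()` afterwards shows whether any columns contain missing values that need handling before Section 2.4.9.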
Weds 1/14, we covered Sec 2.2.
2.4.2. Explain your reasoning. Note that parts a and b were likely discussed in class.
2.4.4 a, b. Describe one example for each of parts a and b.
Fri 1/16, we covered Sec 2.2.1-2.2.2.
2.4.3. (Note that we have only covered irreducible error, not Bayes error, since the latter relates to classification.)
2.4.8
For loading datasets, please use the dataset URL (https://msu-cmse-courses.github.io/CMSE381-S26/_downloads/cc29ec6408d657de88bc7fe6de6b1170/College.csv)
Note that there is a typo in 2.4.8f, at least with respect to the version of the data set we have. The information in the Top10perc column is a percentage from 0 to 100 rather than a value between 0 and 1, so use the following replacement code instead.
college['Elite'] = pd.cut(college['Top10perc'], [0,50,100], labels=['No', 'Yes'])
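To see how the replacement line bins values, here is a minimal sketch on a made-up toy frame (the four `Top10perc` values are illustrative, not taken from College.csv). With `pd.cut`'s default settings the intervals are closed on the right, so the bins are (0, 50] and (50, 100].

```python
import pandas as pd

# Toy values chosen to sit on and around the bin edge at 50 (illustrative only).
toy = pd.DataFrame({"Top10perc": [5, 50, 51, 100]})

# Same call as the replacement code, applied to the toy frame.
toy["Elite"] = pd.cut(toy["Top10perc"], [0, 50, 100], labels=["No", "Yes"])

print(toy)
print(toy["Elite"].value_counts())
```

A value of exactly 50 is labeled `No`, and anything above 50 is labeled `Yes`. Under these defaults a value of exactly 0 would fall outside the first bin and come back as missing, so it may be worth confirming that the column's minimum is above 0 in your copy of the data.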
Suggested Question Order
Q1: 2.4.9(a)
Q2: 2.4.9(b)
Q3: 2.4.9(c)
Q4: 2.4.9(d)
Q5: 2.4.9(e)
Q6: 2.4.9(f)
Q7: 2.4.2(a)–(c) (conceptual; can be answered in one cell)
Q8: 2.4.3 (embed images in the notebook or upload to the Additional Files folder)
Q9: 2.4.4(a),(b) (conceptual; can be answered in one cell)
Q10: 2.4.8(c)
Q11: 2.4.8(d)
Q12: 2.4.8(e)
Q13: 2.4.8(f)
Q14: 2.4.8(g)
Q15: 2.4.8(h)
Important
Standard instructions for submissions and deadlines can be found on the Homework Info Page.