Homework 2: Visualizing Anthropometry Data

More plotting with Matplotlib and Modeling Human Size

[Image: da vinci human]

✅ Put your name here.

Learning Goals

In this homework you will get a bit more practice with some of the things that NumPy and Matplotlib can do. You’ll also use NumPy and Matplotlib to visualize anthropometry measurements.

Anthropometry is the scientific study of the measurements and proportions of the human body. This field has a critical role in several industries including clothing design, ergonomics, healthcare, consumer safety, and architecture.

By the end of the homework assignment you will have practiced:

Loading data using NumPy
Making a variety of plots in Matplotlib
Creating NumPy arrays
Defining functions that manipulate NumPy arrays
Plotting model values and loaded data values
Comparing models to data “by eye”

Table of Contents

Part 1. Loading data in NumPy (7 points)
Part 2. Making plots with Matplotlib (10 points)
Part 3. Working with data in NumPy (11 points)
Part 4. Modeling Growth (12 points)

Assignment instructions

Work through the following assignment, making sure to follow all the directions and answer all the questions.

This assignment is due at 11:59 pm on Monday, February 19. It should be uploaded into the “Homework Assignments” submission folder for Homework #2. Submission instructions can be found at the end of the notebook.

Total number of points: 40

Part 1: Loading data in NumPy (7 points)

In this section, we’re going to review how NumPy can be used to read and write data. Later on in the course, we will learn about another, more complex module for interacting with data, but NumPy will do the trick for now.

✅ Question 1.1: Make sure you set up your notebook to produce Matplotlib plots and import the right modules, including NumPy. Do that here.

# Put your code here

✅ Question 1.2: The first thing you’re going to do is make sure you can load in the data using NumPy. There is a file on the website (and on Teams, just in case) called anthrokids_subset.csv. Make sure that you download the file and move it to the same location on your computer as this Jupyter Notebook. The extension of this file (.csv) stands for “comma-separated values” and is commonly used for datasets. It is basically just a text file in which each row holds the measurements of a separate person in the data set. The columns are the different measurements taken on each person and are separated by commas.

The anthrokids_subset.csv file contains 5 columns with the following contents:

Age (years)
Weight (kg)
Height (cm)
Waist Circumference (cm)
Head Circumference (cm)

These data are a subset of the data acquired in Snyder et al., 1975: ANTHROPOMETRY OF INFANTS, CHILDREN, AND YOUTHS TO AGE 18 FOR PRODUCT SAFETY DESIGN, performed by the U.S. Consumer Product Safety Commission to design some of the original car seats for children.

Your job is to read the data into your Jupyter Notebook using NumPy, specifically using the loadtxt() function, which will load the data into a multi-column NumPy array. Once you’ve read in the data, print the array to see a portion of the array on the screen.

When you load in the data, you can either read the data in and store it in a single variable, or you can use the unpack argument to read the data into a separate variable for each column. As a reference, you may wish to review the examples at the bottom of the documentation page for the loadtxt() function.
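
As a minimal sketch of the two loading styles, the example below reads a small made-up table from an in-memory string instead of from anthrokids_subset.csv. The values and variable names are illustrative assumptions. Note that delimiter="," is needed because loadtxt() splits on whitespace by default. If a file starts with a header row, the skiprows argument can skip it; whether the course file has one is not stated here.

```python
import io
import numpy as np

# Made-up three-row, three-column CSV text standing in for a real file
csv_text = "2.0,12.5,86.0\n5.5,19.0,110.2\n9.0,28.3,133.1\n"

# Style 1: load everything into a single 2D array (rows x columns)
table = np.loadtxt(io.StringIO(csv_text), delimiter=",")
print(table.shape)

# Style 2: unpack=True transposes the result, giving one 1D array per column
col_a, col_b, col_c = np.loadtxt(io.StringIO(csv_text), delimiter=",", unpack=True)
print(col_a)
```

With the single-array style, a column can still be pulled out afterwards by slicing, for example table[:, 0] for the first column.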

# Put your code here

✅ Question 1.3: Before analyzing data, it is always good to summarize a portion of the data to get a sense of the population represented in the data. Answer the following in complete sentences, using print() statements:

How many separate children were measured in this data?
What is the age of the youngest and oldest child in this dataset?
Likewise, what is the height of the tallest child? (For example, “The youngest child is x years old”)
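
The kind of summary asked for here can be sketched with a few array methods and f-strings. The arrays below are made-up toy values, not the course data, and the variable names are assumptions chosen for illustration.

```python
import numpy as np

# Made-up ages (years) and heights (cm) for four people
ages = np.array([3.5, 7.0, 12.25, 16.0])
heights = np.array([98.0, 121.5, 150.0, 171.2])

n_people = ages.size      # one entry per person
youngest = ages.min()
oldest = ages.max()
tallest = heights.max()

print(f"There are {n_people} people in this toy example.")
print(f"The youngest is {youngest} years old and the oldest is {oldest} years old.")
print(f"The tallest is {tallest} cm tall.")
```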

# Put your code here

✅ Question 1.4: You are consulting for a clothing company making belts for children. What would you recommend for the average and standard deviation of belt sizes (in inches, not centimeters) for the population represented in this data set? (use print() statements to report your answer)
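
A unit conversion like this is a single array operation, since 1 inch is exactly 2.54 cm. The sketch below uses three made-up measurements rather than the course data. Keep in mind that np.std() (and the .std() method) computes the population standard deviation by default (ddof=0); pass ddof=1 if you want the sample standard deviation.

```python
import numpy as np

CM_PER_INCH = 2.54  # exact by definition

# Made-up circumferences in cm (equal to 20, 22 and 24 inches)
lengths_cm = np.array([50.8, 55.88, 60.96])

# Dividing the whole array converts every element at once
lengths_in = lengths_cm / CM_PER_INCH

mean_in = lengths_in.mean()
std_in = lengths_in.std()  # population standard deviation (ddof=0)

print(f"Mean: {mean_in:.2f} in, standard deviation: {std_in:.2f} in")
```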

# Put your code here

Part 2: Making plots with Matplotlib (10 points)

For this section, we’ll practice making some plots using the matplotlib module. You may or may not have seen some of these plotting commands before. When you come across something you don’t understand, try consulting the matplotlib documentation first.

You might also find it useful to look at the examples and the tutorials.

✅ Question 2.1: Plot a histogram of the ages of the children in this data set. Label the axes appropriately.

# Put your code here

✅ Question 2.2: Play around with the options of the hist function to better present the data and to answer this question. Based on the histogram or other calculations, roughly what is the most common age in this dataset? (An exact answer is not necessary.)

Put your answer here

✅ Question 2.3: Create two separate scatter plots, one next to the other. The one on the left should show weight versus age and the one on the right should show height versus age (age is on the x-axis in both). Choose an appropriate marker size to help visualize the large number of points. Don’t forget to label all the axes appropriately.
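
One common way to place two plots side by side is plt.subplots(1, 2), which returns a figure and one axes object per panel. The sketch below uses randomly generated placeholder data, and every name and label in it is an assumption for illustration. The s argument of scatter() sets the marker size.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data: 500 random points with two different trends
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=500)
y_left = 2.0 * x + rng.normal(0, 1, size=500)
y_right = x**2 + rng.normal(0, 5, size=500)

# One row, two columns of axes in a single figure
fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(10, 4))

ax_left.scatter(x, y_left, s=5)    # small markers reduce overlap
ax_left.set_xlabel("x (arbitrary units)")
ax_left.set_ylabel("linear trend")

ax_right.scatter(x, y_right, s=5)
ax_right.set_xlabel("x (arbitrary units)")
ax_right.set_ylabel("quadratic trend")

fig.tight_layout()  # keeps the axis labels from overlapping
```

In a notebook the figure is displayed when the cell finishes running.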

# Put your code here

✅ Question 2.4: Based on a visual assessment of these graphs, does weight or height increase more rapidly as a human ages? Explain your answer.

Put your answer here

✅ Question 2.5: Make a two-dimensional histogram of weight and height using hist2d(). This represents a “top-down” or “bird’s-eye” view of the distribution of children for each weight and height. Use 60 bins and add a colorbar to the plot. Make sure both axes and the colorbar have appropriate labels. “Frequency” is a good label choice for the colorbar because the 2D histogram indicates how often values fell into a particular (\(x\), \(y\)) bin.
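
As a sketch of the mechanics, hist2d() returns four things: the 2D array of counts, the bin edges along each axis, and the image object that a colorbar can be attached to. The example uses random placeholder data and a bin count chosen only for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder data: two loosely correlated random variables
rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=2000)
y = 0.5 * x + rng.normal(0.0, 0.5, size=2000)

fig, ax = plt.subplots()

# bins=20 means 20 bins along each axis, so counts has shape (20, 20)
counts, x_edges, y_edges, image = ax.hist2d(x, y, bins=20)

# The colorbar is built from the image object returned by hist2d()
colorbar = fig.colorbar(image, ax=ax)
colorbar.set_label("Frequency")

ax.set_xlabel("x (arbitrary units)")
ax.set_ylabel("y (arbitrary units)")
```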

# Put your code here

If everything went as planned, you should see in this 2D histogram representation that it is more common for children to share similar heights and weights, when height and weight are low (i.e., when a child is young). As children age, their heights and weights vary more.

Part 3: Working with data in NumPy (11 points)

In this section, we’re going to practice using functions and learn how NumPy can be used to write data.

✅ Question 3.1: Create a function to calculate body mass index (BMI). BMI has been shown to be moderately correlated with total body fat and is a common measure for determining if a patient may have adverse health outcomes due to weight.

\[ \textrm{BMI} = \frac{\textrm{Weight (kg)}}{ (\textrm{Height (m)}) ^2} \]

where weight is in kg and height in meters (not centimeters).
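
As a worked example with made-up numbers, a child who weighs 20 kg and is 110 cm (1.10 m) tall has

\[ \textrm{BMI} = \frac{20\ \textrm{kg}}{(1.10\ \textrm{m})^2} = \frac{20}{1.21}\ \textrm{kg/m}^2 \approx 16.5\ \textrm{kg/m}^2 \]

Using the height in centimeters by mistake gives \(20 / 110^2 \approx 0.00165\), which is too small by a factor of \(10^4\). A result of that size is a good hint that a unit conversion was missed.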

Given this equation, the function should have inputs weight and height, and then return BMI.

# Put your code here

✅ Question 3.2: Using this new function, calculate the BMI of all the children in this data set and store the results in a new NumPy array. Then, use the savetxt() function in NumPy to create a new data file called bmi_all.txt that stores the BMI calculations. We’re using the extension “.txt” to emphasize that this is a simple text file that could be read with any text editor. Make a plot of BMI vs age to make sure the values are what you expect them to be.

# Put your code here

✅ Question 3.3: Now, read the text file you just created back into a NumPy array and call this new array bmi2. Plot bmi2 vs the original array of BMI values you created above to make sure the values are the same as before. If everything worked as intended, you should get a straight diagonal line on your plot, which indicates a direct one-to-one mapping between your original values and the values you just read in from your new text file.
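
A numerical check can complement the visual one. The sketch below writes a small made-up array with savetxt(), reads it back with loadtxt(), and compares the two with np.allclose(). The file name and values are hypothetical, and a temporary directory is used so that nothing is left behind on disk.

```python
import os
import tempfile
import numpy as np

# Made-up values to write out and read back
values = np.array([15.2, 16.8, 18.1])

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example_values.txt")  # hypothetical file name
    np.savetxt(path, values)        # one value per line, plain text
    values_back = np.loadtxt(path)  # read the same file back in

# True when every element matches to within floating-point tolerance
round_trip_ok = np.allclose(values, values_back)
print(round_trip_ok)
```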

# Put your code here

✅ Question 3.4: For adults (age 18-65 years), BMIs in the range of 18.5 to 24.9 kg/m\(^2\) are considered healthy. A BMI of 25.0 or more is considered overweight. Determine how many adults in this dataset are overweight.
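
Counting rows that satisfy two conditions at once is a natural fit for boolean masks. The sketch below uses made-up arrays and hypothetical variable names. Each comparison produces an array of True/False values, the & operator combines them element by element (the parentheses are required), and np.sum() counts the True entries.

```python
import numpy as np

# Made-up paired measurements: one entry per person
age = np.array([10.0, 17.0, 18.0, 19.0, 20.0])
score = np.array([26.0, 24.0, 25.0, 23.0, 30.1])

# True only where BOTH conditions hold for the same person
mask = (age >= 18) & (score >= 25.0)

n_matching = np.sum(mask)  # True counts as 1, False as 0
print(f"{n_matching} of {age.size} entries satisfy both conditions.")
```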

# Put your code here

Part 4: Modeling Growth (12 points)

In this section, we’re going to practice applying a model to the relationship of height and age. This will entail combining several of the programming concepts to date.

Multiple different relationships have been proposed to describe how height changes with age. To start, a simple model would be a linear relationship.

✅ Question 4.1: Define a function to compute height according to the relationship:

\[ h = m a + b \]

where \(h\) is height in cm, \(a\) is age in years, and \(m\) and \(b\) are two parameters. For now, set \(m\) and \(b\) equal to 5.33 cm/yr and 82.00 cm, respectively. As a sanity check, call this function for age = 15 years. It should return a height of approximately 162 cm (5.33 × 15 + 82.00 = 161.95).

# Put your code here

✅ Question 4.2: Create a NumPy array called “increase_age” that goes from 2 to 20 in increments of 0.5 using np.arange(). Using the linear relationship function defined above, calculate the height for all values in increase_age and store in an array named linear_height. Create a scatter plot with the original data of height vs age (same as plot made above). Add a line plot of linear_height vs increase_age on top of the first plot (use a different color than the scatter points). Label your plot with appropriate axis labels.
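
One detail of np.arange() is worth keeping in mind: the stop value itself is excluded. The sketch below shows this on a small made-up range and then applies a hypothetical straight-line function to the whole array at once. Whether the endpoint should be included in your own array is for you to decide; if it should, the stop value needs to be slightly larger than the endpoint.

```python
import numpy as np

# The stop value is excluded: this ends at 1.5, not 2.0
half_open = np.arange(0, 2, 0.5)
print(half_open)

# Moving the stop past the endpoint includes 2.0
closed = np.arange(0, 2.5, 0.5)
print(closed)

def line(x, slope, intercept):
    """Hypothetical helper: evaluate slope * x + intercept."""
    return slope * x + intercept

# NumPy arithmetic is element-wise, so no loop is needed here
y = line(closed, 3.0, 1.0)
print(y)
```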

# Put your code here

Additional Models: The ICP-model is a more realistic model of growth with age. This model proposes that during childhood (age 2-12 years old), a quadratic relationship is appropriate:

\[ h = p_1 a^2 + p_2 a + p_3 \]

where \(h\) is height in cm and \(a\) is age in years. From the onset of puberty (age 12 and older), there is a growth spurt followed by a plateau, which can be modeled as a logistic function:

\[ h = \frac{p_4}{1 + e^{p_5 - p_6a}} + p_7\]

The parameters \([p_1, p_2, \ldots, p_6, p_7]\) should be set to [-0.235, 9.6, 65.2, 22.5, 16.4, 1.18, 145.3] respectively, with corresponding units [cm/yr\(^2\), cm/yr, cm, cm, dimensionless, 1/yr, cm].
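
Two properties of the logistic branch follow directly from its formula and can serve as quick checks on an implementation. For large ages the exponential term goes to zero, so the height levels off at

\[ \lim_{a \to \infty} h = p_4 + p_7 = 22.5 + 145.3 = 167.8\ \textrm{cm} \]

The growth spurt is centered where the exponent is zero:

\[ p_5 - p_6 a = 0 \;\Rightarrow\; a = \frac{p_5}{p_6} = \frac{16.4}{1.18} \approx 13.9\ \textrm{yr}, \qquad h = \frac{p_4}{2} + p_7 = 156.55\ \textrm{cm} \]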

✅ Question 4.3: Define one function to return the childhood height for any given age and one function to return the post-childhood height for any given age.

### Put your code here

✅ Question 4.4: Calculate the height using the ICP-model for all the ages in the array increase_age. As a suggestion, inside a loop over all ages in increase_age, you can have a conditional statement to determine if age is pre- or post- puberty, and then use the appropriate function above.
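
The loop-with-a-conditional pattern suggested here can be sketched on a generic piecewise function. The two branch functions and the threshold below are hypothetical placeholders, not the ICP-model. The last line shows np.where() as one possible vectorized alternative; it evaluates both branches for every element and then picks one per element.

```python
import numpy as np

def low_branch(x):
    """Hypothetical rule used below the threshold."""
    return 2.0 * x

def high_branch(x):
    """Hypothetical rule used at or above the threshold."""
    return x + 10.0

THRESHOLD = 10.0
x_values = np.arange(8.0, 12.5, 1.0)  # 8, 9, 10, 11, 12

# Loop version: choose the branch one element at a time
looped = np.zeros(len(x_values))
for i, x in enumerate(x_values):
    if x < THRESHOLD:
        looped[i] = low_branch(x)
    else:
        looped[i] = high_branch(x)

# Vectorized version: same result without an explicit loop
vectorized = np.where(x_values < THRESHOLD, low_branch(x_values), high_branch(x_values))

print(looped)
print(np.array_equal(looped, vectorized))
```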

Create a scatter plot of the original height vs age (same as plot made above). Add this ICP-model height vs increase_age on top of the plot. Label your plot with appropriate axis labels.

### Put your code here

✅ Question 4.5: Does the linear model or the ICP-model fit the height-to-age relationship better? How can you tell? Explain your reasoning.

Put your answer here

Wait! Before you submit your notebook, read the following important recommendations

When your TA opens your notebook to grade the assignment, it will be really useful if your notebook is saved in a fully executed state so that they can see the output of all your code. Additionally, it may be necessary from time to time for the TA to actually run your notebook, but when they run the notebook, it is important that you are certain that all the code will actually run for them!

You should get into the following habit: before you save and submit your final notebook for your assignments, go to the “Kernel” tab at the top of the notebook and select “Restart and Run all”. This will restart your notebook and try to run the code cell-by-cell from the top to the bottom. Once it has finished, review your notebook and make sure there weren’t any errors that popped up. Sometimes, when working with notebooks, we accidentally change code in one cell in a way that breaks our code in another cell. If your TA stumbles across code cells that don’t work, they will likely have to give you a zero for those portions of the notebook. Testing your notebook one last time is a good way to make sure this doesn’t happen.

Once you’ve tested your whole notebook again, make sure you save it one last time before you upload it to D2L.

Congratulations, you’re done!

Submit this assignment by uploading it to the course Desire2Learn web page. Go to the “Homework Assignments” section, find the submission folder link for Homework #2, and upload it there.