CMSE 495


This is the webpage for CMSE495 Data Science Capstone Course (Spring 2022)

View the Project on GitHub msu-cmse-courses/cmse495-SS22

In-Class Assignment: Git Log Data Collection

Image: silhouettes of people with the word "teamwork" underneath (free image from Pixabay)

In class today we are going to try an experiment: writing some code as a team. We will take a problem and divide it into parts. Each person or subgroup will work on their part, and then we will try to combine the parts as a group and see if they all work together.

Everyone will be given 2 class periods (today and next Friday) to finish this project, and teams will present their work at the end of the second class.

Agenda (80 Minutes)

Link to today's slides

0. Signing Instructions

The instructions were not clear, so there was a LOT of confusion about signing your student agreements and turning them in. Please review the following video and double-check your submission.

Direct Link to video


1. Team Charter

Reminder: the Team Charter assignment is due on Sunday.


2. Group Programming Project

Project description

A git repository keeps track of individual authors and their changes. We want to write a program to evaluate the contribution of authors in a git repository for grading. This will involve generating a list of all of the contributors, measuring each author's contribution, and graphing the results.

The course instructor has broken the project down into the following programming components:

  1. Make a list of git contributions (log)
  2. Measure the magnitude of a contribution (diff)
  3. Summarize the contribution of each author (grade)
  4. Graph the results suitable to inform a grade decision (graph)
  5. Management Group

Assuming we get all of these steps written as functions, we could imagine a program running from inside a git directory using the following syntax (or something similar):

from group1 import git_log
author_table = git_log()  # Convert the output of the "git log" command to a pandas table.

from group2 import git_diff
hash1 = author_table['Hash'][0]
hash2 = author_table['Hash'][1]
nlines = git_diff(hash1, hash2)  # Count the number of lines changed between the two commits.

from group3 import git_grade
authors = git_grade(author_table)  # Generate a dictionary of authors and their contributions for a repository.

from group4 import git_graph
git_graph(authors)  # Graph the results in a meaningful way.

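Where each of the variables is of the following type:

  - author_table: a pandas table (DataFrame) with the fields Author, Date, Hash, and Comment
  - hash1, hash2: commit hash strings taken from author_table
  - nlines: an integer number of lines changed
  - authors: a dictionary with author names as keys and their total number of lines as values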

Forming Teams

We are going to try to write each component of our software separately and then assemble them as a team. The instructor has assigned each group to one of the following teams:

| Component  | Team A   | Team B       |
|------------|----------|--------------|
| Management | AFRL     | Delta Dental |
| git_log    | Argonne  | Ford         |
| git_diff   | QSIDE    | Hope Village |
| git_grade  | Kelloggs | Neogen       |
| git_graph  | Boeing   | Old Nation   |

DO THIS: There are two breakout rooms set up (one for Team A and one for Team B). Please join the breakout room for your team and conduct a short meeting:

  1. Identify the members of the Management group
  2. Have someone from the management group create a Zoom room with individual breakout rooms for the four working groups (the management group will stay in the main room).
  3. Share the Zoom room URL with everyone on the team (do not switch yet).
  4. Return to the main room and share your team’s Zoom room with the rest of the class.

We should now have three Zoom rooms you can use: the main course Zoom room, the Team A Zoom room, and the Team B Zoom room. Make sure you have all three written down somewhere!!!

DO THIS: Once you have noted all three Zoom rooms, join your team’s Zoom room, get into your group’s breakout room, and start working.

For the groups in charge of generating code you should focus on doing the following:

  1. Have all of your members read through this entire document to see how your part of the project will fit in with the other parts.
  2. Write a stub function for your part of the code. A stub function provides the inputs and outputs in a format that can be tested by the other groups. The groups need to agree on and share stub functions using the git repository. NOTE: Make sure you get this done first, before writing your main program.
  3. After each step, graph or plot the output to make sure it is in the expected form.
  4. Write some test functions that send different data into your function and make sure it works as expected.
  5. Update your stub function with the final version in the git repository.
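For example, stubs for step 2, plus a test for step 4, could look like the sketch below. The returned values are placeholders (made-up authors, dates, and hashes), the optional `path` argument on `git_diff` is an assumption, and `test_stub_interfaces` is a hypothetical test name. The point is only that each stub already has the agreed inputs and outputs, so other groups can build against it.

```python
import pandas as pd


def git_log(path="."):
    """Stub: return a fixed table shaped like the real git_log output."""
    # Placeholder rows, newest commit first (the order "git log" prints).
    return pd.DataFrame(
        {
            "Author": ["Alice", "Bob", "Alice"],
            "Date": ["2022-01-21", "2022-01-20", "2022-01-19"],
            "Hash": ["c3c3c3c", "b2b2b2b", "a1a1a1a"],
            "Comment": ["Add plots", "Fix parser", "Initial commit"],
        }
    )


def git_diff(hash1, hash2, path="."):
    """Stub: pretend every pair of commits differs by 10 lines."""
    return 10


def git_grade(author_table):
    """Stub: give every author in the table the same fixed line count."""
    return {author: 10 for author in author_table["Author"].unique()}


def git_graph(authors):
    """Stub: print the dictionary instead of drawing a figure."""
    for author, nlines in authors.items():
        print(f"{author}: {nlines}")


def test_stub_interfaces():
    """Check that the four pieces fit together end to end."""
    table = git_log()
    assert list(table.columns) == ["Author", "Date", "Hash", "Comment"]
    assert isinstance(git_diff(table["Hash"][0], table["Hash"][1]), int)
    authors = git_grade(table)
    assert set(authors) == {"Alice", "Bob"}
    git_graph(authors)
```

Calling `test_stub_interfaces()` (or running it with pytest) should pass, printing one line per author. Rerunning the same test after a group swaps in its real function is a quick way to catch format mismatches.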

Key to the success of this project is careful communication between the groups. If a group finishes early, its members should join the management group and help the other groups out. Good luck!


Group 1: Make a list of git contributions (log)

author_table = git_log()

This group’s job is to write a function that parses the output of the git log command and returns it as a table. Key to the success of this is figuring out how to run git log from inside Python (there are multiple ways) and making sure that your output data is formatted consistently with what is expected as input downstream.

HINT: You may want to consider working with Group 2 to find a common syntax for accessing git from Python.
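As one possible common syntax, here is a minimal sketch using Python's standard `subprocess` module (a third-party package such as GitPython is another route). The helper name `run_git` is hypothetical; it runs any git subcommand inside a chosen folder and returns the text git prints.

```python
import subprocess


def run_git(args, path="."):
    """Run `git <args>` inside the folder `path` and return its output as text.

    Hypothetical helper that Groups 1 and 2 could both import.
    """
    result = subprocess.run(
        ["git", *args],       # arguments as a list: no shell quoting issues
        cwd=path,             # run as if we had cd'ed into the repository
        capture_output=True,  # collect stdout/stderr instead of printing them
        encoding="utf-8",     # decode the output bytes as UTF-8 text
        check=True,           # raise CalledProcessError if git fails
    )
    return result.stdout
```

For example, `print(run_git(["log", "--oneline", "-n", "5"]))` would print the five most recent commits of the repository in the current folder. Passing the arguments as a list, rather than as one shell string, avoids quoting problems with paths and commit messages.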

DO THIS: Identify or clone a git repository you can use for testing. Pick one with lots of entries from a handful of authors. (Ex: SEE-Segment)

DO THIS: As a group, create a file called group1.py and write a function called git_log that takes in a path to a git folder (default current folder ‘.’) and uses the “git log” command to generate a table of git commits from the folder and includes the following fields: Author, Date, Hash, Comment. Output this as a pandas table. As you make changes, commit/push this file to the assignment git repository.
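A minimal sketch of `git_log`, assuming `subprocess` and pandas are available. The custom `--pretty` format, the `%x1f` field separator, ISO dates (`--date=iso`), and the helper name `parse_log` are assumptions of this sketch, not part of the assignment. Keeping the parsing in its own function lets the group test it on a pasted string without running git.

```python
import subprocess

import pandas as pd

# One line per commit: author name, author date, full hash, subject line,
# separated by the ASCII "unit separator" byte (%x1f in git's format syntax),
# which is very unlikely to appear inside a name or commit message.
LOG_FORMAT = "%an%x1f%ad%x1f%H%x1f%s"
COLUMNS = ["Author", "Date", "Hash", "Comment"]


def parse_log(text):
    """Turn the formatted `git log` output into a pandas DataFrame."""
    rows = [line.split("\x1f") for line in text.split("\n") if line]
    return pd.DataFrame(rows, columns=COLUMNS)


def git_log(path="."):
    """Return a table of commits (newest first) for the repository at `path`."""
    output = subprocess.run(
        ["git", "log", f"--pretty=format:{LOG_FORMAT}", "--date=iso"],
        cwd=path,
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise an error if `path` is not a git repository
    ).stdout
    return parse_log(output)
```

Calling `git_log("path/to/repo")` then returns one row per commit, newest first, which is the order `git log` prints by default.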


Group 2: Measure the magnitude of a contribution (diff)

nlines = git_diff(hash1, hash2)

This group’s job is to write a function that takes two repository “hashes” and parses the output of the git diff command to return an integer representing the number of lines changed between the two hashes. Key to the success of this is figuring out how to run git diff from inside Python (there are multiple ways) and making sure that your output data is formatted consistently with what is expected as input downstream.

HINT: You may want to consider working with Group 1 to find a common syntax for accessing git from Python.

DO THIS: Identify or clone a git repository you can use for testing. Pick one with lots of entries from a handful of authors. (Ex: SEE-Segment)

DO THIS: As a group, create a file called group2.py and write a function called git_diff that takes two hash values and parses the output of the “git diff” command to return an integer with the total number of lines changed between the two commits. As you make changes, commit/push this file to the assignment git repository.
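One way to get line counts without parsing the full patch text is `git diff --numstat`, which prints one line per file as `<added> <deleted> <path>` separated by tabs (binary files show `-` for both counts). The sketch below assumes that "lines changed" means added plus deleted lines and skips binary files; the helper name `count_numstat_lines` and the optional `path` argument are hypothetical.

```python
import subprocess


def count_numstat_lines(numstat_text):
    """Sum added + deleted lines from `git diff --numstat` output.

    Each line looks like "<added> TAB <deleted> TAB <path>"; binary files
    report "-" for both counts and are skipped here.
    """
    total = 0
    for line in numstat_text.splitlines():
        if not line:
            continue
        added, deleted, _path = line.split("\t", 2)
        if added != "-" and deleted != "-":
            total += int(added) + int(deleted)
    return total


def git_diff(hash1, hash2, path="."):
    """Return the number of lines changed between two commits."""
    output = subprocess.run(
        ["git", "diff", "--numstat", hash1, hash2],
        cwd=path,
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise an error if a hash is unknown
    ).stdout
    return count_numstat_lines(output)
```

With this definition an edited line counts twice (one deletion plus one addition), and swapping `hash1` and `hash2` gives the same total. Whether that is the right measure for grading is one of the questions the team may want to discuss.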


Group 3: Summarize the contribution of each author (grade)

authors = git_grade(author_table) 

This group’s job is to write a function that takes a pandas table as input and uses the git_diff function to generate a dictionary of authors and the total number of lines that they have contributed. Key to the success of this is figuring out how to write the loop without a working git_log or git_diff function. This will require coordinating with Groups 1 and 2 to make sure you get the syntax right.

DO THIS: As a group, create a file called group3.py and write a function called git_grade which takes a pandas table as input and uses the Group 2 git_diff function to loop over all of the authors and add up the total number of lines each one contributed. This function should return a dictionary with author names as keys and each author's total number of lines as values. As you make changes, commit/push this file to the assignment git repository.
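A minimal sketch of the loop, assuming the table is in `git log` order (newest commit first), so each commit can be diffed against the row below it, with each diff credited to the author of the newer commit. The extra `diff` parameter is a hypothetical addition that lets Group 3 test with a fake function before Group 2's `git_diff` exists; by default it imports `git_diff` from `group2.py`.

```python
def git_grade(author_table, diff=None):
    """Return {author: total lines changed} for a git_log-style table.

    Assumes rows are newest-first, so row i is diffed against row i + 1.
    `diff` is a hypothetical hook; it defaults to Group 2's git_diff.
    """
    if diff is None:
        from group2 import git_diff as diff

    hashes = list(author_table["Hash"])
    names = list(author_table["Author"])
    # Start every author at 0 so authors of only the oldest commit still appear.
    totals = {name: 0 for name in names}
    # The last row is the oldest commit; it has nothing older to compare with.
    for i in range(len(hashes) - 1):
        older, newer = hashes[i + 1], hashes[i]
        totals[names[i]] += diff(older, newer)
    return totals
```

In a history with branches and merges, adjacent rows of `git log` are not always parent and child, so diffing each hash against `hash + "^"` (git's notation for a commit's first parent, which the root commit lacks) may be a more accurate alternative.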


Group 4: Graph the results suitable to inform a grade decision (graph)

git_graph(authors)

This group’s job is to write a function that takes a dictionary as input and outputs a graph representing the magnitude of contribution of each author for an input git repository. Key to the success of this is figuring out how best to summarize and visualize the data in a way that is easy for an instructor to understand.

DO THIS: As a group, create a file called group4.py and write a function called git_graph which takes a dictionary of authors as input and generates a figure that clearly shows the contribution of each author and can be used by an instructor to determine grades. As you make changes, commit/push this file to the assignment git repository.
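One possible design, sketched with matplotlib (an assumption; any plotting library would do): a horizontal bar chart sorted so the largest contributor is on top, with each bar labeled by that author's percentage of all lines. The function returns the figure and axes, rather than only displaying them, so the chart can be saved or tested; `title` is a hypothetical optional parameter.

```python
import matplotlib.pyplot as plt


def git_graph(authors, title="Lines changed per author"):
    """Draw a horizontal bar chart of {author: lines} and return (fig, ax)."""
    # Sort ascending: barh draws the first bar at the bottom,
    # so the biggest contributor ends up at the top of the chart.
    ranked = sorted(authors.items(), key=lambda item: item[1])
    names = [name for name, _ in ranked]
    counts = [count for _, count in ranked]
    total = sum(counts) or 1  # avoid dividing by zero when nothing changed

    fig, ax = plt.subplots(figsize=(8, 1.5 + 0.5 * len(names)))
    positions = list(range(len(names)))
    ax.barh(positions, counts, color="steelblue")
    ax.set_yticks(positions)
    ax.set_yticklabels(names)
    # Label each bar with that author's share of all changed lines.
    for y, count in zip(positions, counts):
        ax.text(count, y, f" {100 * count / total:.1f}%", va="center")
    ax.set_xlabel("Lines changed")
    ax.set_title(title)
    fig.tight_layout()
    return fig, ax
```

For example, `fig, ax = git_graph({"Alice": 120, "Bob": 45})` followed by `fig.savefig("contributions.png")` writes the chart to a file; in a Jupyter notebook the figure is usually displayed automatically.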


Group 5: Management Group

The management group will create the team Zoom room and a git repository and share them with the class. It is their job to integrate the functions and to support and coordinate the other groups.

DO THIS: Have all of your members read through this entire document to see how your part of the project will fit in with the other parts.

DO THIS: Create a git repository on gitlab.msu.edu and share this repository with all members of the class. The file structure for your git repository should probably be something like the following:

-- git_grader_repository
 |-- .gitignore
 |-- README.md
 |-- group1.py
 |-- group2.py
 |-- group3.py
 |-- group4.py
 |-- git_grader_demo.ipynb
 |-- git_grade.py

DO THIS: Check in with each group and make sure they can clone the repository and contribute their initial stub functions to it.

DO THIS: Continue to review all groups code and make sure that everything will work together when it is all finished. Anticipate challenges, write test scripts, ask questions and provide help when needed. Bring groups together for meetings if there is confusion. Generally be there to help out and make sure the project has the resources it needs to succeed.
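As an example of such test scripts, the management group could keep a few contract checks that encode only the formats described in this document and run them against the stubs now and against the real functions later. The function names below are hypothetical.

```python
import numbers

import pandas as pd

# Column names agreed on for the git_log table (from the Group 1 instructions).
REQUIRED_COLUMNS = ["Author", "Date", "Hash", "Comment"]


def check_author_table(table):
    """Fail loudly if git_log's output is not the agreed pandas table."""
    assert isinstance(table, pd.DataFrame), "git_log should return a pandas DataFrame"
    missing = [col for col in REQUIRED_COLUMNS if col not in table.columns]
    assert not missing, f"git_log output is missing columns: {missing}"


def check_line_count(nlines, source="git_diff"):
    """A line count must be a non-negative integer (numpy ints are accepted)."""
    assert isinstance(nlines, numbers.Integral) and nlines >= 0, (
        f"{source} returned {nlines!r}, expected a non-negative integer"
    )


def check_authors(authors):
    """git_grade should return a dict of {author name: line count}."""
    assert isinstance(authors, dict), "git_grade should return a dict"
    for name, total in authors.items():
        assert isinstance(name, str), f"author key {name!r} is not a string"
        check_line_count(total, source="git_grade")
```

In the demo notebook, `check_author_table(author_table)` and `check_authors(authors)` can run right after the corresponding steps, so a format change in one group's code fails immediately with a readable message instead of deep inside another group's function.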

DO THIS: Combine all of the functions into a single Python file called git_grade.py. Create a Jupyter notebook that demonstrates the use of the program on a couple of different git repositories. DO NOT wait until the end to write these tests. Having them early will help you visualize what needs to be done. Something like the following:

import git_grade as gg
author_table = gg.git_log()  # Convert the output of the "git log" command to a pandas table.


hash1 = author_table['Hash'][0]
hash2 = author_table['Hash'][1]
nlines = gg.git_diff(hash1, hash2)  # Count the number of lines changed between the two commits.

authors = gg.git_grade(author_table)  # Generate a dictionary of authors and their contributions for a repository.

gg.git_graph(authors)  # Graph the results in a meaningful way.

DO THIS: Coordinate a 5-minute (max) presentation and be ready to share what your entire team did with the instructor. Demos of the working code are expected. Be prepared to answer questions such as “What works?”, “What doesn’t work?”, “Is this a good way to grade contributions?”, and “Describe something interesting or challenging that happened during the project.” (Presentations will take place at the end of class next Friday.)

Written by Dr. Dirk Colbry, Michigan State University
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.