CMSE 495


This is the webpage for CMSE495 Data Science Capstone Course (Spring 2022)

View the Project on GitHub msu-cmse-courses/cmse495-SS22

Exploring external packages - Web Scraper Activity

You will be working mostly in your groups today. Please complete all of the following tasks and we will get together at around 3:30pm to discuss what you have learned.

Agenda (80 Minutes)

Submitting GitHub Issues vs. Submitting Pull Requests


1. Fork the UniScraper Project

Web pages often come in many different and unpredictable formats. The UniScraper project by MSU’s own Meng Cai is a web scraper that attempts to scrape words from web pages written in many different formats.

Work in your groups to fork the project, download your forked copy, and install and test the UniScraper tool.

https://github.com/caimeng2/UniScraper
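To get a feel for what "scraping words from a web page" involves before you read the project code, here is a minimal sketch that uses only the Python standard library. It is not UniScraper's code or API; the names `WordExtractor` and `extract_words` and the sample HTML are made up for illustration, and a real scraper has to cope with far messier pages.

```python
import re
from html.parser import HTMLParser


class WordExtractor(HTMLParser):
    """Hypothetical helper: collects visible text, skipping <script> and <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped tag
        self.chunks = []      # pieces of visible text, in document order

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def extract_words(html):
    """Return the lowercase words found in the visible text of an HTML string."""
    parser = WordExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    return re.findall(r"[a-z']+", text.lower())


# Made-up sample page; a real page would be downloaded first.
sample_html = """
<html><head><style>p { color: red; }</style></head>
<body><h1>Hello, World</h1><p>Scraping is <b>fun</b>.</p>
<script>var hidden = 1;</script></body></html>
"""

words = extract_words(sample_html)
print(words)  # ['hello', 'world', 'scraping', 'is', 'fun']
```

Notice that the style rule and the script contents are left out of the result. Deciding which parts of a page count as "words" is one of the design choices to look for when you review the real project.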


2. Code Review

As a team, review the code provided in the UniScraper project. See if you can get the example code working. What is it doing? What questions do you have?


3. Submit issues you find to the GitHub issues page

It is important to support code developers who post their code for free. One thing you can do is help them by providing useful feedback and ideas. When you see a bug, do not be afraid to add it to the GitHub “issues” page. Make sure you give a detailed description of the problem and all the information the developer needs to reproduce it.

See if you can find an issue or feature request for the package and write it up on GitHub issues. I have told the author of the package you would be doing this, so they are expecting your feedback. Make sure you are polite, professional and helpful.
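Part of the information a developer needs to reproduce a problem is a description of your environment. The sketch below shows one way to collect it with the Python standard library so you can paste it into an issue. The function name `environment_report` and the package name and version in the example call are placeholders, not anything defined by UniScraper or required by GitHub.

```python
import platform
import sys


def environment_report(package_versions=None):
    """Build a short plain-text summary of the current environment.

    package_versions is an optional dict mapping package names to version
    strings that you looked up yourself (for example with `pip show`).
    """
    lines = [
        f"Operating system: {platform.system()} {platform.release()}",
        f"Python version: {sys.version.split()[0]}",
    ]
    for name, version in (package_versions or {}).items():
        lines.append(f"{name} version: {version}")
    return "\n".join(lines)


# Placeholder package name and version, for illustration only.
print(environment_report({"example-package": "0.0.0"}))
```

Alongside this summary, a good issue usually includes the exact steps you ran, what you expected to happen, and the full error message you saw.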


4. Pull request

See if your group can fix any of the issues you find (or just make a suggestion). I would like each group to find something to contribute back to the project using a pull request. Do the following steps in your forked repository:

Please ask your instructor for help if needed.


5. Reflection

Finally, think about the following questions and be prepared to share some of your answers with the class.

Written by Dr. Dirk Colbry, Michigan State University
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.