Homework 01: Projects Decision

✅ Put your name here.


Projects (48 points)

You may have already given some thought to your project. In this problem you will walk through the steps your project will require, so that you can get a sense of its size and scope and check whether your choice of project is realistic.

It is extremely important that you think through and start your project in the next few weeks. It is difficult to predict whether a given dataset will do what you want, and you don’t want to make that discovery when it is too late to start over.

You can, and probably will, change your mind as we work through this process. Don’t worry; you are not locking anything in yet.

If you have not decided on your project yet, that’s fine. You will need to decide fairly soon, so this is a good exercise to help you narrow it down. If you truly have no idea, find something interesting at the UCI Machine Learning Repository and use it for this problem as if it were the dataset you will actually use. Note that this website works in an odd way: type a topic into the search box at the upper right (for example, “diabetes”), and it takes you to a Google search of the datasets rather than keeping you within its own pages. Try searching for some other topics that interest you.

The first thing you need to do is read Appendix A in your textbook. It lists the steps your project should follow, and those steps will be used as a guide for grading your project. Read that appendix very carefully. It may also be worth re-reading Chapter 1 before you start on this problem.

Once you have a dataset in mind and you have read Appendix A, answer these questions:

  1. What is the objective of your project? State a precise goal - don’t be vague.

  2. Where do you plan on getting your data, and are there any strings attached (e.g., is this proprietary research data)?

  3. What tools would you use to explore this dataset? Do you think there could be errors (typos, missing values) in the data? How will you manage the data?

  4. Do you think the data needs to be cleaned or transformed before you can use it?

  5. Can you think of 3 to 5 ML algorithms you could quickly throw at this? What are they? You may not know many yet, but the basic ideas are in Chapter 1 of your textbook.

  6. How will you present the big picture and final outcome to a wide audience in your poster?
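For questions 3 and 4, a first pass at spotting errors can be as simple as counting blank and unexpected entries in each column. The sketch below is a minimal illustration using only Python’s standard library (`csv`, `collections.Counter`) on a small made-up table; the column names, values, and the `screen` helper are hypothetical stand-ins for your own data and code. It flags missing values, a non-numeric entry in a numeric column, and inconsistent spellings of a category (`F` vs `f`), which are the kinds of problems that usually call for cleaning or transforming the data before modeling.

```python
import csv
import io
from collections import Counter

# Hypothetical raw data; with a real dataset you would open your own file,
# e.g. open("my_data.csv", newline="") (the filename is a placeholder).
RAW = """age,sex,glucose
52,F,148
47,M,
61,f,183
,M,89
38,F,abc
"""


def screen(text):
    """Count missing and non-numeric entries per column, and tally the
    distinct spellings seen in each column's non-numeric entries."""
    rows = list(csv.DictReader(io.StringIO(text)))
    missing = Counter()      # column -> number of blank cells
    non_numeric = Counter()  # column -> number of cells that are not numbers
    categories = {}          # column -> Counter of non-numeric values seen
    for row in rows:
        for col, value in row.items():
            value = value.strip()
            if value == "":
                missing[col] += 1
                continue
            try:
                float(value)
            except ValueError:
                non_numeric[col] += 1
                categories.setdefault(col, Counter())[value] += 1
    return len(rows), missing, non_numeric, categories


n_rows, missing, non_numeric, categories = screen(RAW)
print("rows:", n_rows)
print("missing:", dict(missing))
print("non-numeric:", dict(non_numeric))
print("spellings:", {col: dict(c) for col, c in categories.items()})
```

In a categorical column such as `sex`, every entry is non-numeric, so the useful signal there is the spelling tally; in a numeric column such as `glucose`, any non-numeric entry is a likely error.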

These questions are geared more toward supervised and unsupervised ML projects.
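For question 5, “quickly throwing” a few algorithms at a dataset usually means running several simple models through the same evaluation and comparing their scores. The sketch below is a minimal, standard-library-only illustration under assumed conditions: a tiny hypothetical two-class dataset, three simple classifiers chosen for brevity (a majority-class baseline, 1-nearest-neighbor, and nearest centroid), and leave-one-out accuracy as the score. For a real project you would more likely use a library such as scikit-learn on your own data; the point here is only the shape of the comparison loop.

```python
import math
from collections import Counter

# Tiny hypothetical labelled dataset: (feature vector, class label).
DATA = [
    ((1.0, 1.2), "a"), ((0.8, 1.0), "a"), ((1.1, 0.9), "a"),
    ((1.2, 1.1), "a"), ((0.9, 1.3), "a"),
    ((3.0, 3.2), "b"), ((3.1, 2.9), "b"), ((2.8, 3.0), "b"),
]


def majority(train, x):
    """Baseline: always predict the most common label in the training set."""
    return Counter(label for _, label in train).most_common(1)[0][0]


def nearest_neighbor(train, x):
    """1-NN: predict the label of the closest training point."""
    return min(train, key=lambda item: math.dist(item[0], x))[1]


def nearest_centroid(train, x):
    """Predict the label whose mean feature vector is closest to x."""
    groups = {}
    for features, label in train:
        groups.setdefault(label, []).append(features)
    centroids = {
        label: tuple(sum(col) / len(col) for col in zip(*points))
        for label, points in groups.items()
    }
    return min(centroids, key=lambda label: math.dist(centroids[label], x))


def loo_accuracy(model, data):
    """Leave-one-out: train on every point but one, test on the held-out point."""
    correct = 0
    for i, (x, y) in enumerate(data):
        train = data[:i] + data[i + 1:]
        correct += model(train, x) == y
    return correct / len(data)


for model in (majority, nearest_neighbor, nearest_centroid):
    print(f"{model.__name__:>16}: {loo_accuracy(model, DATA):.2f}")
```

Including a trivial baseline such as the majority-class predictor gives you a floor: any algorithm worth presenting on your poster should clearly beat it.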

Write all your answers in a Markdown cell, using good Markdown formatting.

Put your answers here!

© Copyright 2023, Department of Computational Mathematics, Science and Engineering at Michigan State University.