CMSE401 Quiz Instructions

This quiz is designed to take approximately 20 minutes to complete (you will be given 50 minutes).

Please read the following instructions before starting the quiz.

This is an open Internet quiz. Feel free to use anything on the Internet with one important exception...

  • DO NOT communicate live with other people during the quiz (either verbally or online). The goal here is to find answers to problems as you would in the real world.

You will be given 50 minutes to complete this quiz. Use your time wisely.

HINTS:

  • Neatness and grammar are important. We will ignore all notes or code we cannot read or understand.
  • Read the entire quiz from beginning to end before starting. Not all questions are equal in points vs. time so plan your time accordingly.
  • Some of the information provided may be a distraction. Do not assume you need to understand everything written to answer the questions.
  • Spaces for answers are provided. Delete the prompting text such as "Put your answer to the above question here" and replace it with your answer. Do not leave the prompting text with your answer.
  • Do not assume that the answer must be in the same format as the cell provided. Feel free to change the cell formatting (e.g., markdown to code, and vice versa) or add additional cells as needed to provide your answer.
  • When a question asks for an answer "in your own words" it is still okay to search the Internet for the answer as a reminder. However, we would like you to do more than cut and paste. Make the answer your own.
  • If you get stuck, try not to leave an answer blank. It is better to include some notes or stub functions that show your thinking process so we can give you partial credit.
  • Always provide links to any references you find helpful.
  • Feel free to delete the provided check marks (✅) as a way to keep track of which questions you have successfully completed.

Honor Code

I agree to neither give nor receive any help on this quiz from other people. I also understand that providing answers to questions on this quiz to other students is an academic misconduct violation, as is live communication or receiving answers to questions on this quiz from other people. It is important to me to be a person of integrity and that means that ALL ANSWERS on this quiz are my answers.

DO THIS: Include your name in the line below to acknowledge the above statement:

Put your name here.


Quiz 3 - CUDA

Consider the following code snippet, similar to the Average Filter loop from Homework 2.

int nBytes = sz.width*sz.height*channels*sizeof(char);
char * s_img = (char *) malloc(nBytes); // Source Image
char * o_img = (char *) malloc(nBytes); // Output Image

//make 2D pointer arrays from 1D image arrays
char **img = malloc(sz.height * sizeof(char*));
for (int r=0; r<sz.height; r++)
        img[r] = &s_img[r*sz.width];
char **output = malloc(sz.height * sizeof(char*));
for (int r=0; r<sz.height; r++)
        output[r] = &o_img[r*sz.width];

//average filter
for(int c=0;c<sz.width;c++) 
    for(int r=0;r<sz.height;r++)
    {
        double count = 0;
        double tot = 0;
        for(int cw=max(0,c-halfwindow); cw<min(sz.width,c+halfwindow+1); cw++)
            for(int rw=max(0,r-halfwindow); rw<min(sz.height,r+halfwindow+1); rw++)
            {
                count++;
                tot += (double) img[rw][cw];
            }
        output[r][c] = (int) (tot/count);
    }
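
As a point of reference, the loop above can be sketched as a small self-contained C function. This is only an illustration under stated assumptions, not the Homework 2 code: it assumes a single-channel, row-major image, uses `unsigned char` pixels with integer accumulation instead of `char` and `double`, and iterates rows in the outer loop. The names `make_rows`, `imin`, `imax`, and `average_filter` are hypothetical helpers introduced here.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: build an array of row pointers over a flat,
   row-major, single-channel image so that rows[r][c] == flat[r*width + c]. */
static unsigned char **make_rows(unsigned char *flat, int width, int height) {
    unsigned char **rows = malloc((size_t)height * sizeof *rows);
    assert(rows != NULL);
    for (int r = 0; r < height; r++)
        rows[r] = &flat[(size_t)r * width];
    return rows;
}

/* Integer min/max, since standard C does not provide them. */
static int imin(int a, int b) { return a < b ? a : b; }
static int imax(int a, int b) { return a > b ? a : b; }

/* Average filter: each output pixel is the truncated mean of the input
   pixels in a square window of side 2*halfwindow+1, clipped at the borders. */
static void average_filter(unsigned char **img, unsigned char **output,
                           int width, int height, int halfwindow) {
    assert(halfwindow >= 0);
    for (int r = 0; r < height; r++)
        for (int c = 0; c < width; c++) {
            int count = 0;
            int tot = 0;
            int r_stop = imin(height, r + halfwindow + 1);
            int c_stop = imin(width, c + halfwindow + 1);
            for (int rw = imax(0, r - halfwindow); rw < r_stop; rw++)
                for (int cw = imax(0, c - halfwindow); cw < c_stop; cw++) {
                    count++;
                    tot += img[rw][cw];
                }
            output[r][c] = (unsigned char)(tot / count);
        }
}
```

For a 3x3 image that is zero everywhere except for a center value of 90, calling `average_filter` with `halfwindow = 1` gives 10 at the center (nine pixels in the window), 15 on the edges (six pixels), and 22 at the corners (four pixels, truncated).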

The following is an attempt to replace the above loop with a CUDA kernel function:

__global__ void average_im( char * img_d, 
                            char * output_d, 
                            int sz_width, 
                            int sz_height, 
                            int halfwindow) {

    int c = blockIdx.x * blockDim.x + threadIdx.x;
    int r = blockIdx.y * blockDim.y + threadIdx.y;
    int i = r*sz_width+c;

    if (c < sz_width && r < sz_height) 
    {
        int count = 0;
        int tot = 0;
        int c_start = max(0,c-halfwindow);
        int c_stop = min(sz_width,c+halfwindow+1);
        int r_start = max(0,r-halfwindow);
        int r_stop = min(sz_height,r+halfwindow+1);
        for(int cw=c_start; cw<c_stop; cw++)
            for(int rw=r_start; rw<r_stop; rw++)
            {
                count++;
                tot += img_d[rw*sz_width+cw]; // read the neighbor pixel, not the center pixel
            }
        output_d[i] = (int) (tot/count);
    }
}

Question 1: (5 points) Assuming the above CUDA function compiles properly, the function can be called using the following lines of code:

average_im<<<numBlocks, numThreads>>>(img_d,output_d,sz.width,sz.height, halfwindow);

cudaError_t err = cudaGetLastError();
if (err != cudaSuccess) {
    fprintf(stderr, "\n\nError: %s\n\n", cudaGetErrorString(err)); fflush(stderr); exit(err);   
}

Write the lines of code needed to declare numBlocks and numThreads before making the above call. Make sure that there are an equal number of threads in the x and y directions of the block and that the block uses the maximum number of threads.

Put the answer to the above question here

Question 2: (5 points) Now, go back and write the code needed to allocate the space for img_d and output_d on the GPU. You can assume the CUDA_CALL macro we used in class is already defined.

Put your answer to the above question here

Question 3: (5 points) Write the command to copy the image file from the host to the GPU device and into the img_d variable. You can still assume the CUDA_CALL macro we used in class is already defined.

Put your answer to the above question here

Question 4: (5 points) This kernel function could benefit from shared memory tiling. Identify the data you would need to copy to shared memory to implement tiling. Write the line of code that you would need to add to the above function to declare the local variable(s) in shared memory. (HINT: Do not implement shared memory tiling, I just want to see if you understand which data you would use and the syntax for declaring the variable to be shared by a block of threads).

Put your answer to the above question here

Question 5: (5 points) Assume that the above functions are part of a larger program that is called mybigGPUproject and can be run on a development node using the following commands:

module load CUDA
./mybigGPUproject

Write an HPCC submission script to submit this GPU job to the scheduler. It will need to run for 3 hours and require 5 GB of RAM and 1 GPU. (HINT: Make sure you use srun to allow SLURM to manage which GPU the job will get assigned).

Put the answer to the above question here

Congratulations, you're done with your quiz

DO THIS: Please save the file and upload the Jupyter notebook to the course Desire2Learn (D2L) web page for today's dropbox. Send a message to your instructor through the Zoom chat to let him know you are done, and wait until you are excused.

Written by Dr. Dirk Colbry, Michigan State University
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.