CMSE/MTH 401 - Quiz 4 - MPI

This is an open internet quiz. Feel free to use anything on the internet with one important exception...

The quiz was designed to take 20 minutes; you will be given 60 minutes to complete it (wishful thinking by the instructor: you will be given the entire quiz time if needed). Use your time wisely.

(25 points) Message Passing Interface

In this quiz we will answer questions about the following program, which uses MPI to run a Linux command on each of the processors in an MPI job. In this case the command is uptime, but the code could easily be modified to run any command.

NOTE: if you want to test the following code, do not include the first line (%%writefile run_command.c) in your file. That line is included here to allow the Jupyter notebook to export the file and is NOT valid C.

Question 1: (5 points) Assume that the above code is stored in a file named run_command.c in your current directory. What command(s) are needed to compile the code (with optimization) and run the MPI job on the current node with exactly 2 processors?

Put the answer to the above question here
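For reference, the general pattern looks like the following sketch. This is illustrative, not necessarily the graded answer: the compiler wrapper and launcher names (mpicc, mpiexec) and accepted flags depend on which MPI implementation is installed on your cluster.

```shell
# Illustrative sketch; wrapper/launcher names vary by MPI installation
mpicc -O3 -o run_command run_command.c   # compile with optimization
mpiexec -n 2 ./run_command               # launch with exactly 2 processes on this node
```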

Something is wrong with the above code. When the program is run using two processors, it returns an error similar to the following:

Hello From dev-amd20-v100 (rank 0), The output from my command is:
 15:06:18 up 83 days, 23:35, 48 users,  load average: 14.40, 13.57, 12.63

[warn] Epoll ADD(4) on fd 31 failed.  Old events were 0; read change was 0 (none); write change was 1(add): Bad file descriptor
[dev-amd20-v100:155472] *** An error occurred in MPI_Recv
[dev-amd20-v100:155472] *** reported by process [4204462081,0]
[dev-amd20-v100:155472] *** on communicator MPI_COMM_WORLD
[dev-amd20-v100:155472] *** MPI_ERR_TRUNCATE: message truncated
[dev-amd20-v100:155472] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[dev-amd20-v100:155472] ***    and potentially your MPI job)

Question 2: (5 points) Fix the error in the above code (directly modify the code above). What was wrong and how did you fix it? (Put your answer in the cell below).

NOTE: there are a lot of ways to fix the code. We are looking for answers that show you understand what the code is trying to do. It may not be enough just to get it to compile and run; you need to check that the output makes sense.

Put the answer to the above question here

Question 3: (5 points) Assuming we got the code working correctly, write a SLURM submission script to run the code on 50 processors. Note, the code just runs the Linux uptime command and thus the entire program uses a trivial amount of system resources. Based on this information, request "reasonable" resources for running this job on the HPCC so that the job will not sit in the queue too long yet still has enough resources to run (HINT: many HPCC SLURM defaults may be sufficient). Not required, but feel free to put comments next to the resources to explain your reasoning.

Put the answer to the above question here
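For illustration, a minimal submission script for an MPI job of this size might look like the following. This is a sketch, not the graded answer: partition names, defaults, and sensible limits vary between clusters, and the walltime and memory values shown are placeholder assumptions for a trivial command like uptime.

```shell
#!/bin/bash
#SBATCH --job-name=run_command
#SBATCH --ntasks=50            # one MPI rank per task
#SBATCH --time=00:02:00        # uptime finishes almost instantly (assumed value)
#SBATCH --mem-per-cpu=100M     # trivial memory footprint (assumed value)

srun ./run_command             # srun launches all 50 MPI ranks
```

Keeping the requests small is what keeps the queue wait short: the scheduler can backfill a 50-task, two-minute job into gaps it could never find for a large, long request.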


The above program is almost pleasantly parallel (i.e. the individual processors do not need to talk to each other and only pass their results to the rank 0 processor). I can think of three reasons why someone would want to use this type of low-communication MPI job:

  1. The Linux code will execute at approximately the same time on each processor. This synchronous execution may be useful depending on what the command is doing.
  2. By passing the output to the rank 0 processor, the code can post-process all of the data at once and produce just one output file instead of many.
  3. Assuming the correct resources are requested, running in an MPI job can also guarantee that the code runs on different computers.

The disadvantage of running an almost pleasantly parallel job as an MPI job is that it could take longer to schedule than a truly pleasantly parallel job, which can use job arrays. Job arrays are faster to schedule because not all of the resources need to be available to start the calculations.
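As a sketch of the mechanism described above (the directive values are illustrative assumptions), a job-array script asks SLURM to launch many independent copies of the same script, and each copy can identify itself through the SLURM_ARRAY_TASK_ID environment variable:

```shell
#!/bin/bash
#SBATCH --array=0-49           # 50 independent array tasks, scheduled separately
#SBATCH --ntasks=1             # each task needs only a single processor
#SBATCH --time=00:02:00        # assumed walltime for a trivial command

# Each array task runs this script independently; tasks start as soon as
# any single processor is free, rather than waiting for all 50 at once.
echo "This is array task $SLURM_ARRAY_TASK_ID"
```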

Question 4: (5 points) Write a SLURM submission script that is truly pleasantly parallel by using a job array that will run the uptime command on 50 processors (we don't care if the processors are on the same computer).

Put the answer to the above question here


In class we have now discussed: pleasantly parallel jobs using job arrays, shared-memory parallel jobs using OpenMP, GPU-accelerated jobs using CUDA, and now shared-network jobs using MPI.

Question 5: (5 points) In your own words, describe the type of program/problem that would require an MPI job. You don't need to be specific; just describe the characteristics of a program that would necessitate an MPI job and could not run as a pleasantly parallel job, an OpenMP job, or a CUDA program.

Put your answer to the above question here.


Congratulations, you're done with your quiz.

Now, you just need to submit this assignment by uploading it to the course Desire2Learn web page for today's dropbox.

Written by Dr. Dirk Colbry, Michigan State University Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.