Link to this document's Jupyter Notebook

In order to successfully complete this assignment you must do the required reading, watch the provided videos and complete all instructions. The embedded survey form must be entirely filled out and submitted on or before 11:59pm on Sunday January 31. Students must come to class the next day prepared to discuss the material covered in this assignment.


Pre-Class Assignment: Schedulers


Goals for today's pre-class assignment

  1. Batch Schedulers
  2. Assignment wrap up

1. Batch Schedulers

Batch Schedulers are used on large shared systems to manage cluster resources. The HPCC uses the SLURM batch scheduler:

Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm requires no kernel modifications for its operation and is relatively self-contained. As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work. Optional plugins can be used for accounting, advanced reservation, gang scheduling (time sharing for parallel jobs), backfill scheduling, topology optimized resource selection, resource limits by user or bank account, and sophisticated multifactor job prioritization algorithms.

Note: These videos start out on a gateway node. We will be using the http://ondemand.hpcc.msu.edu system.

The following videos provide an introduction to using the SLURM scheduler on the HPCC:

Part 1: Interactive Jobs

Commands used in the above video:

ssh dev-intel16-k80
who | wc -l
clear
squeue -l
squeue -l | wc -l
squeue -l | grep RUNNING | wc -l
squeue -l | grep PENDING | wc -l
salloc
env | grep SLURM
salloc --time 00:10:00
exit
salloc -N 3 --ntasks-per-node 2 --time 00:10:00
salloc -N 3 --ntasks-per-node 2 --mem 22gb --time 00:10:00
cd MPI_Example
mpirun ./hello_mpi
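
The squeue pipelines above count jobs by filtering the long listing on a state name. As a minimal sketch of the same idea, the hypothetical helper count_state below (not from the video) counts matching lines on standard input, so it can be tried on saved output without cluster access. The sample listing is made up for illustration and only mimics the general shape of squeue -l output.

```shell
#!/bin/bash
# count_state is a hypothetical helper, not an HPCC or Slurm command.
# It reads a job listing on stdin and prints how many lines contain
# the given state as a whole word (for example RUNNING or PENDING).
count_state() {
    # grep -c prints the count but exits non-zero when the count is 0,
    # so "|| true" keeps the helper from failing under "set -e".
    grep -c -w -- "$1" || true
}

# Made-up sample that only mimics the shape of "squeue -l" output.
sample_listing='JOBID PARTITION NAME USER STATE TIME TIME_LIMIT NODES NODELIST(REASON)
1001 general job_a alice RUNNING 5:00 1:00:00 1 node-001
1002 general job_b bob PENDING 0:00 2:00:00 3 (Priority)
1003 general job_c carol RUNNING 12:30 4:00:00 2 node-[002-003]'

running=$(printf '%s\n' "$sample_listing" | count_state RUNNING)
pending=$(printf '%s\n' "$sample_listing" | count_state PENDING)
echo "running=$running pending=$pending"
```

On the cluster, squeue -l | count_state RUNNING would play the same role as squeue -l | grep RUNNING | wc -l. squeue also accepts a --states option for filtering by job state; check squeue --help on your system for the exact form.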

DO THIS: Use the salloc command to start an interactive session on a single CPU on the cluster, then answer the following questions.

QUESTION: How many people are currently logged into the node?

Put your answer to the above question here.

QUESTION: What is the name of the node?

Put your answer to the above question here.

Part 2: Basics of Job Scheduling

Commands used in the above video:

ssh dev-intel14-k20
who | wc -l
clear
cd HWLOC_Example
vim hwloc_ex.sh # remember your vim commands; :q! quits without saving
sbatch hwloc_ex.sh
cat slurm-6883891.out # your number will be different
cat hwloc_ex.sh
sbatch -N 3 --ntasks-per-node 2 --mem 5gb --time 01:00:00 hwloc_ex.sh 
sq
ls

Example hwloc_ex.sh script used in the above video:

#!/bin/bash
#SBATCH -N 3
#SBATCH -c 2
#SBATCH --mem 10gb
#SBATCH --time 02:00:00

echo Hello from $HOSTNAME

module load hwloc

lstopo $HOSTNAME.png
#mailme $HOSTNAME.png

scontrol show job $SLURM_JOB_ID
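
The #SBATCH lines in the script are read by sbatch as options for the job. Options given on the sbatch command line take precedence over them, which is why the sbatch -N 3 --ntasks-per-node 2 --mem 5gb --time 01:00:00 hwloc_ex.sh command in the list above can change the request without editing the file. As a small sketch for checking what a script asks for before submitting it, the hypothetical helper show_directives below prints only the directive options. The temporary file is created just so the example is self-contained.

```shell
#!/bin/bash
# show_directives is a hypothetical helper, not part of Slurm.
# It prints the options from each "#SBATCH" line of a submission script.
show_directives() {
    sed -n 's/^#SBATCH[[:space:]]*//p' "$1"
}

# Write a copy of the directive header to a temporary file so the
# example runs anywhere, with or without a scheduler installed.
demo_script=$(mktemp)
cat > "$demo_script" <<'EOF'
#!/bin/bash
#SBATCH -N 3
#SBATCH -c 2
#SBATCH --mem 10gb
#SBATCH --time 02:00:00

echo Hello from $HOSTNAME
EOF

directives=$(show_directives "$demo_script")
echo "$directives"
rm -f "$demo_script"
```

Running show_directives hwloc_ex.sh in the HWLOC_Example directory would list the same four requests: nodes, CPUs per task, memory, and wall time.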

DO THIS: See if you can write a submission script that grabs a node on the cluster and uses the env command to print out all of the environment variables.

QUESTION: Copy and paste the contents of your submission script here:

Put your answer to the above question here.

QUESTION: If you were able to get your submission script to run, copy and paste the line from the env output that shows which node your job ran on.

Put your answer to the above question here.

Part 3: Scheduling Shared Memory Jobs (Ex: OpenMP)

Commands used in the above video:

ssh dev-intel14-k20
who | wc -l
clear
ls
cd openmp_exercise

Example prime.sh script used in the above video:

#!/bin/bash
#SBATCH -N 1
#SBATCH -c 32 
#SBATCH --time 01:00:00
#SBATCH --mem 4gb

time ./prime_openmp

for THREADCOUNT in 1 2 4 8 16 32;
do
        echo $THREADCOUNT 
        export OMP_NUM_THREADS=$THREADCOUNT
        ./prime_openmp | grep  500000 | awk '{print $3}'
done

scontrol show job $SLURM_JOB_ID
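
The loop in prime.sh hard-codes thread counts up to 32 to match the -c 32 request. A minimal sketch of one way to keep the two in step is to derive the list from SLURM_CPUS_PER_TASK, which Slurm sets inside a job when -c is given. The helper name thread_counts and the fallback of 1 outside a job are assumptions for illustration, not part of the video, and the loop only prints what it would run.

```shell
#!/bin/bash
# thread_counts is a hypothetical helper. It prints the powers of two
# from 1 up to the CPU count passed as its first argument.
thread_counts() {
    local max=$1 n=1 out=""
    while [ "$n" -le "$max" ]; do
        out="$out$n "
        n=$((n * 2))
    done
    echo "${out% }"
}

# Inside a job submitted with "-c 32" Slurm sets SLURM_CPUS_PER_TASK=32.
# Outside a job the variable is unset, so fall back to 1 (an assumption).
cpus=${SLURM_CPUS_PER_TASK:-1}

for THREADCOUNT in $(thread_counts "$cpus"); do
    export OMP_NUM_THREADS=$THREADCOUNT
    echo "would run ./prime_openmp with OMP_NUM_THREADS=$OMP_NUM_THREADS"
done
```

With this approach, changing the -c value in the #SBATCH header is enough to change the range of thread counts that the timing loop covers.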

QUESTION: How would you modify your script from Part 2 to run on more than one CPU on the same node?

Put your answer to the above question here.

Part 4: Scheduling Shared Network Jobs (Ex: MPI)

Commands used in the above video:

ssh dev-intel14-k20
who | wc -l
clear
cd MPI_Example
ls

Example submission script used in the above video:

#!/bin/bash
#SBATCH -N 30
#SBATCH --ntasks-per-node 1
#SBATCH --time 01:00:00
#SBATCH --mem 15gb

mpirun ./hello_mpi 

scontrol show job $SLURM_JOB_ID
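
Because --ntasks-per-node and --mem are both per-node values, the size of a request grows with -N. As a back-of-the-envelope sketch, the hypothetical helper request_totals below multiplies them out for the script above and for the salloc example from Part 1. It does arithmetic only and does not talk to the scheduler.

```shell
#!/bin/bash
# request_totals is a hypothetical helper, not a Slurm command.
# Arguments: nodes, tasks per node, memory per node in GB.
# Prints the total task count and the total memory across all nodes.
request_totals() {
    local nodes=$1 tasks_per_node=$2 mem_gb_per_node=$3
    echo "tasks=$((nodes * tasks_per_node)) mem_gb=$((nodes * mem_gb_per_node))"
}

# The script above: -N 30, --ntasks-per-node 1, --mem 15gb
request_totals 30 1 15

# The Part 1 example: salloc -N 3 --ntasks-per-node 2 --mem 22gb
request_totals 3 2 22
```

Working out these totals before submitting can help when choosing a request, since larger requests generally have to wait for more resources to become free at the same time.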

DO THIS: Modify your submission script from Part 2 to run on four nodes.

QUESTION: If you were able to get your submission script to run on multiple nodes, copy and paste the line from the env output that shows which nodes your job ran on.

Put your answer to the above question here.


2. Assignment wrap up

Please fill out the form that appears when you run the code below. You must completely fill this out in order to receive credit for the assignment!

Direct Link to Google Form

If you have trouble with the embedded form, please make sure you log on with your MSU google account at googleapps.msu.edu and then click on the direct link above.

Assignment-Specific QUESTION: How would you modify your script from Part 2 to run on more than one CPU on the same node?

Put your answer to the above question here

QUESTION: Summarize what you did in this assignment.

Put your answer to the above question here

QUESTION: What questions do you have, if any, about any of the topics discussed in this assignment after working through the jupyter notebook?

Put your answer to the above question here

QUESTION: How well do you feel this assignment helped you to achieve a better understanding of the above mentioned topic(s)?

Put your answer to the above question here

QUESTION: What was the most challenging part of this assignment for you?

Put your answer to the above question here

QUESTION: What was the least challenging part of this assignment for you?

Put your answer to the above question here

QUESTION: What kind of additional questions or support, if any, do you feel you need to have a better understanding of the content in this assignment?

Put your answer to the above question here

QUESTION: Do you have any further questions or comments about this material, or anything else that's going on in class?

Put your answer to the above question here

QUESTION: Approximately how long did this pre-class assignment take?

Put your answer to the above question here


Congratulations, we're done!

To get credit for this assignment you must fill out and submit the above survey form on or before the assignment due date.

Course Resources:

Written by Dr. Dirk Colbry, Michigan State University
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.