To successfully complete this assignment, you must do the required reading, watch the provided videos, and complete all instructions. The embedded survey form must be entirely filled out and submitted on or before 11:59pm on Sunday, January 31. Students must come to class the next day prepared to discuss the material covered in this assignment.
Batch schedulers are used on large shared systems to manage cluster resources. The HPCC uses the SLURM batch scheduler, which its developers describe as follows:
Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Slurm requires no kernel modifications for its operation and is relatively self-contained. As a cluster workload manager, Slurm has three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work. Optional plugins can be used for accounting, advanced reservation, gang scheduling (time sharing for parallel jobs), backfill scheduling, topology optimized resource selection, resource limits by user or bank account, and sophisticated multifactor job prioritization algorithms.
Note: These videos start out on a gateway node. We will be using the http://ondemand.hpcc.msu.edu system.
The following videos provide an introduction to using the SLURM scheduler on the HPCC:
from IPython.display import YouTubeVideo
YouTubeVideo("yJY2IevXaJs",width=570,height=360)
Commands used in the above video:
ssh dev-intel16-k80
who | wc -l
clear
squeue -l
squeue -l | wc -l
squeue -l | grep RUNNING | wc -l
squeue -l | grep PENDING | wc -l
salloc
env | grep SLURM
salloc --time 00:10:00
exit
salloc -N 3 --ntasks-per-node 2 --time 00:10:00
salloc -N 3 --ntasks-per-node 2 --mem 22gb --time 00:10:00
cd MPI_Example
mpirun ./hello_mpi
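Because this notebook runs Python, the `env | grep SLURM` step from the list above can also be reproduced from a code cell. The following is a minimal sketch and not something shown in the video: the helper name `slurm_variables` is hypothetical, and it only matches variable names that start with `SLURM`, whereas `grep` matches the text anywhere on the line.

```python
import os


def slurm_variables(environ=None):
    """Return the environment variables whose names start with 'SLURM'.

    Pass a dictionary to inspect something other than the live environment.
    """
    if environ is None:
        environ = os.environ
    return {
        name: value
        for name, value in sorted(environ.items())
        if name.startswith("SLURM")
    }


# Print one NAME=value line per variable, similar to the shell pipeline.
for name, value in slurm_variables().items():
    print(f"{name}={value}")
```

Outside of a Slurm allocation the loop is expected to print nothing, which matches what `env | grep SLURM` shows before `salloc` is run.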
✅ DO THIS: Use the salloc command to jump onto a single CPU on the cluster and answer the following questions.
✅ QUESTION: How many people are currently logged into the node?
Put your answer to the above question here.
✅ QUESTION: What is the name of the node?
Put your answer to the above question here.
from IPython.display import YouTubeVideo
YouTubeVideo("cn_CA5ZHWj4",width=570,height=360)
Commands used in the above video:
ssh dev-intel14-k20
who | wc -l
clear
cd HWLOC_Example
vim hwloc_ex.sh # press Esc, then type :q! to quit vim without saving
sbatch hwloc_ex.sh
cat slurm-6883891.out # your number will be different
cat hwloc_ex.sh
sbatch -N 3 --ntasks-per-node 2 --mem 5gb --time 01:00:00 hwloc_ex.sh
sq
ls
Example hwloc_ex.sh script used in the above video:
#!/bin/bash
#SBATCH -N 3
#SBATCH -c 2
#SBATCH --mem 10gb
#SBATCH --time 02:00:00
echo Hello from $HOSTNAME
module load hwloc
lstopo $HOSTNAME.png
#mailme $HOSTNAME.png
scontrol show job $SLURM_JOB_ID
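The `#SBATCH` lines in the script above are ordinary comments to bash, but `sbatch` reads them as options until it reaches the first executable command. As a rough illustration of that rule, the sketch below pulls the directives out of the script text. The function name `sbatch_directives` is hypothetical, and this is a simplified model of what `sbatch` does, not its actual parser.

```python
def sbatch_directives(script_text):
    """Collect the options that follow '#SBATCH' in a submission script.

    Scanning stops at the first line that is neither blank nor a comment,
    because sbatch ignores directives that appear after that point.
    """
    directives = []
    for line in script_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#SBATCH"):
            directives.append(stripped[len("#SBATCH"):].strip())
        elif stripped and not stripped.startswith("#"):
            break  # first executable command reached
    return directives


# The hwloc_ex.sh script from this notebook, copied as text.
HWLOC_EX = """#!/bin/bash
#SBATCH -N 3
#SBATCH -c 2
#SBATCH --mem 10gb
#SBATCH --time 02:00:00
echo Hello from $HOSTNAME
module load hwloc
lstopo $HOSTNAME.png
#mailme $HOSTNAME.png
scontrol show job $SLURM_JOB_ID
"""

print(sbatch_directives(HWLOC_EX))
```

Options given on the `sbatch` command line take precedence over the `#SBATCH` lines, so the `sbatch -N 3 --ntasks-per-node 2 --mem 5gb --time 01:00:00 hwloc_ex.sh` command in the video requests 5gb and one hour, not the 10gb and two hours written in the script.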
✅ DO THIS: See if you can write a submission script that grabs a node on the cluster and uses the env command to print out all of the environment variables.
✅ QUESTION: Copy and paste the contents of your submission script here:
Put your answer to the above question here.
✅ QUESTION: If you were able to get your submission script to run, copy and paste the line from the env output that shows which node your job ran on.
Put your answer to the above question here.
from IPython.display import YouTubeVideo
YouTubeVideo("o39erAZuDj4",width=570,height=360)
Commands used in the above video:
ssh dev-intel14-k20
who | wc -l
clear
ls
cd openmp_exercise
Example prime.sh script used in the above video:
#!/bin/bash
#SBATCH -N 1
#SBATCH -c 32
#SBATCH --time 01:00:00
#SBATCH --mem 4gb
time ./prime_openmp
for THREADCOUNT in 1 2 4 8 16 32
do
    echo $THREADCOUNT
    export OMP_NUM_THREADS=$THREADCOUNT
    ./prime_openmp | grep 500000 | awk '{print $3}'
done
scontrol show job $SLURM_JOB_ID
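The loop in the script above prints one value per thread count. Assuming that value is an elapsed time in seconds (an assumption about the output format of `prime_openmp`, which this notebook does not document), a small helper like the one below could turn those numbers into speedup and parallel efficiency. The function name and the timings are hypothetical placeholders, not measured results.

```python
def scaling_table(seconds_by_threads):
    """Return (threads, seconds, speedup, efficiency) rows.

    Speedup is T(1) / T(n) and efficiency is speedup / n,
    so a single-thread timing must be present under key 1.
    """
    baseline = seconds_by_threads[1]
    rows = []
    for threads in sorted(seconds_by_threads):
        seconds = seconds_by_threads[threads]
        speedup = baseline / seconds
        rows.append((threads, seconds, speedup, speedup / threads))
    return rows


# Placeholder timings in seconds, invented for illustration only.
# Replace them with the numbers printed by your own job.
example_timings = {1: 8.0, 2: 4.0, 4: 2.5}

for threads, seconds, speedup, efficiency in scaling_table(example_timings):
    print(f"{threads:>2} threads: {seconds:5.2f} s  "
          f"speedup {speedup:4.2f}  efficiency {efficiency:4.2f}")
```

An efficiency close to 1.0 means the extra threads are being used well; values that fall as the thread count grows suggest overhead or serial work is starting to dominate.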
✅ QUESTION: How would you modify your script from Part 2 to run on more than one CPU on the same node?
Put your answer to the above question here.
from IPython.display import YouTubeVideo
YouTubeVideo("wZ4Q3gKUq5I",width=570,height=360)
Commands used in the above video:
ssh dev-intel14-k20
who | wc -l
clear
cd MPI_Example
ls
Example MPI submission script used in the above video:
#!/bin/bash
#SBATCH -N 30
#SBATCH --ntasks-per-node 1
#SBATCH --time 01:00:00
#SBATCH --mem 15gb
mpirun ./hello_mpi
scontrol show job $SLURM_JOB_ID
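It can help to work out what a request adds up to before submitting it. In Slurm, `--mem` is a per-node amount, and the total task count is the node count times `--ntasks-per-node`. The sketch below does that arithmetic for two requests that appear in this notebook. The function name `request_totals` is hypothetical, and the assumption that `mpirun` starts one rank per allocated task holds for typical Slurm-aware MPI installations but is not guaranteed everywhere.

```python
def request_totals(nodes, ntasks_per_node, mem_gb_per_node):
    """Return (total tasks, total memory in GB) for a node-based request.

    Assumes --mem is specified per node, as in the scripts in this notebook.
    """
    return nodes * ntasks_per_node, nodes * mem_gb_per_node


# The MPI script above: -N 30, --ntasks-per-node 1, --mem 15gb
print(request_totals(30, 1, 15))

# The salloc example from the first video: -N 3, --ntasks-per-node 2, --mem 22gb
print(request_totals(3, 2, 22))
```

Larger totals generally wait longer in the queue, so it is worth checking that a request asks for no more than the job needs.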
✅ DO THIS: Modify your submission script from Part 2 to run on four nodes.
✅ QUESTION: If you were able to get your submission script to run on multiple nodes, copy and paste the line from the env output that shows which nodes your job ran on.
Put your answer to the above question here.
Please fill out the form that appears when you run the code below. You must completely fill this out in order to receive credit for the assignment!
If you have trouble with the embedded form, please make sure you log on with your MSU Google account at googleapps.msu.edu and then click on the direct link above.
✅ Assignment-Specific QUESTION: How would you modify your script from Part 2 to run on more than one CPU on the same node?
Put your answer to the above question here
✅ QUESTION: Summarize what you did in this assignment.
Put your answer to the above question here
✅ QUESTION: What questions do you have, if any, about any of the topics discussed in this assignment after working through the Jupyter notebook?
Put your answer to the above question here
✅ QUESTION: How well do you feel this assignment helped you to achieve a better understanding of the above mentioned topic(s)?
Put your answer to the above question here
✅ QUESTION: What was the most challenging part of this assignment for you?
Put your answer to the above question here
✅ QUESTION: What was the least challenging part of this assignment for you?
Put your answer to the above question here
✅ QUESTION: What kind of additional questions or support, if any, do you feel you need to have a better understanding of the content in this assignment?
Put your answer to the above question here
✅ QUESTION: Do you have any further questions or comments about this material, or anything else that's going on in class?
Put your answer to the above question here
✅ QUESTION: Approximately how long did this pre-class assignment take?
Put your answer to the above question here
from IPython.display import HTML
HTML(
"""
<iframe
src="https://cmse.msu.edu/cmse401-pc-survey"
width="100%"
height="500px"
frameborder="0"
marginheight="0"
marginwidth="0">
Loading...
</iframe>
"""
)
To get credit for this assignment you must fill out and submit the above survey form on or before the assignment due date.
Written by Dr. Dirk Colbry, Michigan State University
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.