About SLURM Scheduler
In the Central Cluster, we use SLURM as the cluster workload manager to schedule and manage user jobs.
SLURM handles job queueing, compute node allocation, and the starting and execution of jobs.
To run any application or script on the compute nodes, you need to submit a job to SLURM.
SLURM considers a fair-share factor when calculating your job's priority.
The priority influences the order in which a user's queued jobs are scheduled to run,
based on the portion of the computing resources allocated to the user and the resources their jobs have already consumed.
You can always check your fair-share details and your account usage with the sshare command.
For the details of the fair-share calculation, you may refer to the SLURM official fair-share documentation.
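For example, a quick check of your own fair-share record might look like the following; the exact columns shown depend on the cluster's accounting configuration:
sshare -l -u $USER    # long-format fair-share report for your own user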
The usage consumed on different compute nodes is weighted according to the following table:
Node Type | Node Name | Usage Weight | Generic Resource (GRES)
---|---|---|---
Compute Node (96GB Memory) | chpc-cn[002-050] | CPU=1.0 | none
Compute Node (192GB Memory) | chpc-m192a[001-010] | CPU=1.0 | none
Large Memory Node (3TB Memory) | chpc-large-mem01 | CPU=1.0 | none
GPU Node | chpc-k80gpu[001-003] | CPU=1.0, GRES/gpu=2.0 | gpu:K80:4
We recommend downloading SLURM's Command Option Summary for quick reference. Here is a quick start video about the common SLURM tools:
Submitting a Job
To submit a job in SLURM, sbatch, srun and salloc are the commands used to allocate resources and run the job.
All of these commands share the standard options for specifying the resources required to run your job.
Usually, we create a script that sets up the environment and runs the command or program that needs to run on the HPC platform.
In this situation, the command sbatch is the best option for running a script, because we can add the job specification using #SBATCH directives at the top of the script.
Here is an example of running a script with the job specification given on the command line, followed by the equivalent way to add the specification inside the script:
sbatch -J JobName -N 2 -c 16 jobscript.sh
With the above command, SLURM will allocate 2 nodes, with 16 CPUs per task, and run jobscript.sh under the job name "JobName". If this specification is needed every time you run the job script, you may put it inside the shell script as below:
#!/bin/bash
#SBATCH -J JobName
#SBATCH -N 2 -c 16
Then you just need to run the script with sbatch and SLURM will load the specification from your script.
sbatch jobscript.sh
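When the job is accepted, sbatch prints the job ID assigned to it; you will need this ID for the monitoring and control commands described later. The number below is only an illustration:
Submitted batch job 123456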
Here is a list of common options for the commands sbatch, srun, and salloc:
Option | Description
---|---
-J, --job-name=<jobname> | Name for the job
-n, --ntasks=<number> | Number of tasks
-N, --nodes=<minnodes[-maxnodes]> | Number of nodes to assign. You may specify a range; SLURM will try to assign the maxnodes you want, and the job will run once the available nodes meet the minnodes requirement.
-c, --cpus-per-task=<ncpus> | CPUs per task
--mem=<size[units]> | Memory required per node. units may be K, M, G or T
--mem-per-cpu=<size[units]> | Memory required per allocated CPU. units may be K, M, G or T
-t, --time=<time> | Set a limit on the total run time of the job. time may be given as "mins", "HH:MM:SS" or "days-HH:MM:SS"
-o, --output=<filename pattern> | Write the output to a file. By default both standard output and standard error are directed to the same file.
-e, --error=<filename pattern> | Write the error to a file.
--mail-type=<type> | Notify the user by email when certain event types occur. type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL
--mail-user=<email> | Email address to receive the notifications
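As an illustration, a job script combining several of these options could look like the sketch below; the job name, resource values, output file and email address are placeholders rather than site requirements:
#!/bin/bash
#SBATCH -J JobName
#SBATCH -N 1
#SBATCH -n 4
#SBATCH --mem=8G
#SBATCH -t 02:00:00
#SBATCH -o job_%j.out                # %j expands to the job ID
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=user@example.com

./myprogram                          # replace with your own program or command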
Here is a video reference for how to use Batch Scripting with SLURM:
Submitting a Job to GPU Nodes
You need to use the --gres option to request GPU cards for your job. You may specify the option inside your batch script like this:
#!/bin/bash
#SBATCH -J JobName
#SBATCH -N 1
#SBATCH --gres=gpu:2
GPU is a type of Generic Resource (GRES) in SLURM. In the above example, SLURM will assign 2 GPU cards to your job. You may also specify the GPU type like this:
#!/bin/bash
#SBATCH -J JobName
#SBATCH -N 1
#SBATCH --gres=gpu:K80:2
If your batch job needs to run different tasks on separate GPUs, you may use srun with the --exclusive option inside your batch file:
#!/bin/bash
#SBATCH -J JobName
#SBATCH -N 1
#SBATCH -n 4
#SBATCH --gres=gpu:4
srun --gres=gpu:2 -n2 --exclusive task1.sh &    # run task1 on 2 of the GPUs, in the background
srun --gres=gpu:1 -n1 --exclusive task2.sh &    # run task2 on 1 GPU
srun --gres=gpu:1 -n1 --exclusive task3.sh &    # run task3 on 1 GPU
wait                                            # wait for all background job steps to finish
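To confirm which GPUs a job step actually received, you can print the GPU environment from inside the step. On most GRES-enabled clusters SLURM sets CUDA_VISIBLE_DEVICES for each step, but this depends on the cluster's GRES configuration:
srun --gres=gpu:1 -n1 bash -c 'echo "GPUs visible to this step: $CUDA_VISIBLE_DEVICES"; nvidia-smi -L'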
Checking the Job Status
After you submit a job, you can check its status using the command squeue.
You may use the -l option with squeue to show a long report of the running or pending jobs.
Once a job is cancelled or completed, it is no longer shown by the squeue command.
By default the squeue output contains the job ID, partition, job name, job owner, job state,
time used by the job, allocated node information and the job pending reason.
If you want more information about a job, you may use the command scontrol show job to get its full details.
You may specify the job ID to view a particular job with the command scontrol show job <job_id>.
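For example, with a placeholder job ID:
squeue -u $USER -l          # long listing of your own running and pending jobs
scontrol show job 123456    # full details of job 123456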
Controlling the Submitted Job
If a running or pending job does not behave as you intend, you may use the command scancel to cancel the running or scheduled job.
To cancel a job, specify the job ID with the command scancel <job_id>.
After the cancel request is submitted, SLURM will stop the job as soon as possible.
If you want to cancel all the jobs you have submitted, you may specify your username with the command scancel -u <username>.
All your scheduled jobs will be terminated.
Besides cancelling a job, you may also modify it with the scontrol command, but some specifications cannot be changed once the job is running.
For example, you may use the command scontrol update JobId=<job_id> NumNodes=<minnodes> to modify the number of nodes requested by the job.
If the job has not started, you can increase or decrease the requested node count.
If the job has already started, you can only decrease the number of nodes running the job.
For more details about modifying a job, you may refer to the SLURM FAQ: 24. Can I change my job's size after it has started running?
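A minimal sketch of these commands, using a placeholder job ID and username:
scancel 123456                              # cancel one job by its job ID
scancel -u myusername                       # cancel every job you have submitted
scontrol update JobId=123456 NumNodes=1     # change the node count (a running job can only shrink)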
PBS to SLURM
If you have been using a PBS batch script to submit jobs, you only need to change the submission command from qsub to sbatch.
SLURM understands the PBS batch script options used in the script. Just run the script with sbatch and SLURM will submit the job according to your PBS batch script options.
The most common problem you may face is the "#PBS -q" option, because SLURM treats the PBS queue name as a SLURM partition name.
If you face this problem, just remove the "#PBS -q" line and the job will be accepted.
Even though SLURM accepts PBS batch script options, you should use either PBS batch script options or SLURM batch script options, not a mixture of both.
The following charts list the difference between PBS and SLURM commands, environment variables and job specifications:
PBS | SLURM | Meaning
---|---|---
qsub <job-file> | sbatch <job-file> | Submit <job script> to the queue
qstat <-u username> | squeue <-u username> | Check jobs for a particular user in the scheduling queue
qstat -f <job_id> | scontrol show job <job_id> | Show job details
qdel <job_id> | scancel <job_id> | Delete <job_id>
qdel `qselect -u user` | scancel -u user | Delete all jobs belonging to user
Environment variables | |
$PBS_JOBID | $SLURM_JOBID | Job ID
$PBS_O_WORKDIR | $SLURM_SUBMIT_DIR | Submit directory
$PBS_NODEFILE | $SLURM_JOB_NODELIST | Allocated node list
$PBS_ARRAY_INDEX | $SLURM_ARRAY_TASK_ID | Job array index
 | $SLURM_CPUS_PER_TASK | Number of cores/processes
Job specifications | |
qsub -l nodes=10 | sbatch -N 10 | Number of nodes
qsub -l nodes=2:ppn=16 | sbatch -N 2 -c 16 | Number of nodes and CPUs per node
qsub -l mem=8g | sbatch --mem=8g | Memory requirement
qsub -l pmem=1g | sbatch --mem-per-cpu=1g | Memory per CPU
qsub -l walltime=HH:MM:SS | sbatch -t [min] | Set a wallclock limit
qsub -o filename | sbatch --output filename | Standard output file
qsub -e filename | sbatch -e filename | Standard error file
qsub -V | sbatch --export=all | Export environment to allocated node
qsub -v np=12 | sbatch --export=np | Export a single variable
qsub -N jobname | sbatch -J jobname | Job name
qsub -r [y or n] | sbatch --requeue | Job restart
qsub -m be | sbatch --mail-type=[BEGIN,END,FAIL,REQUEUE,ALL] | Event notification
qsub -M notify@cuhk.edu.hk | sbatch --mail-user=notify@cuhk.edu.hk | Notification email address
qsub -W depend=afterany:jobid | sbatch --dependency=afterany:jobid | Job dependency