About SLURM Scheduler
In the Central Cluster, we use SLURM as the cluster workload manager to schedule and manage user jobs.
SLURM handles job queueing, compute node allocation, and the starting and execution of jobs.
To run any application or script on the compute nodes, you need to submit a job to SLURM.
SLURM considers a fair-share factor when calculating your job's priority.
The priority influences the order in which a user's queued jobs are scheduled to run,
based on the portion of the computing resources allocated to the user and the resources their jobs have already consumed.
You can always check your fair-share details and your account usage with the sshare command.
For the details of the fair-share calculation, you may refer to the SLURM official fair-share documentation.
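For example, a quick check of your own fair-share record might look like the following; the exact columns shown depend on the cluster's accounting configuration:
sshare -l -u $USER    # long-format fair-share report for your own user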
The usage consumed on different compute nodes is weighted according to the following table:
Node Type | Node Name | Usage Weight | Generic Resource (GRES)
---|---|---|---
Compute Node (96GB Memory) | chpc-cn[002-050] | CPU=1.0 | none
Compute Node (192GB Memory) | chpc-m192a[001-010] | CPU=1.0 | none
Large Memory Node (3TB Memory) | chpc-large-mem01 | CPU=1.0 | none
GPU Node | chpc-k80gpu[001-003] | CPU=1.0, GRES/gpu=2.0 | gpu:K80:4
We recommend downloading SLURM's Command Option Summary for quick reference. Here is a quick start video about the common SLURM tools:
Submitting a Job
To submit a job in SLURM, sbatch, srun and salloc are the commands used to allocate resources and run the job.
All of these commands share the standard options for specifying the resources required to run your job.
Usually, we create a script that sets up the environment and runs the command or program that needs to run on the HPC platform.
In this situation, the command sbatch is the best option for running a script, because we can add the job specification using #SBATCH directives at the top of the script.
Here is an example of running a script with the job specification given on the command line, followed by the equivalent way to add the specification inside the script:
sbatch -J JobName -N 2 -c 16 jobscript.sh
With the above command, SLURM will allocate 2 nodes, with 16 CPUs per task, and run jobscript.sh under the job name "JobName". If this specification is needed every time you run the job script, you may put it inside the shell script as below:
#!/bin/bash
#SBATCH -J JobName
#SBATCH -N 2 -c 16
Then you just need to run the script with sbatch and SLURM will load the specification from your script.
sbatch jobscript.sh
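When the job is accepted, sbatch prints the job ID assigned to it; you will need this ID for the monitoring and control commands described later. The number below is only an illustration:
Submitted batch job 123456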
Here is a list of common options for the commands sbatch, srun, and salloc:
Option | Description
---|---
-J, --job-name=<jobname> | Name for the job
-n, --ntasks=<number> | Number of tasks
-N, --nodes=<minnodes[-maxnodes]> | Number of nodes to assign. You may specify a range; SLURM will try to assign the maxnodes you want, and the job will run once the available nodes meet the minnodes requirement.
-c, --cpus-per-task=<ncpus> | CPUs per task
--mem=<size[units]> | Memory required per node. units may be K, M, G or T
--mem-per-cpu=<size[units]> | Memory required per allocated CPU. units may be K, M, G or T
-t, --time=<time> | Set a limit on the total run time of the job. time may be given as "mins", "HH:MM:SS" or "days-HH:MM:SS"
-o, --output=<filename pattern> | Write the output to a file. By default both standard output and standard error are directed to the same file.
-e, --error=<filename pattern> | Write the error to a file.
--mail-type=<type> | Notify the user by email when certain event types occur. type values are NONE, BEGIN, END, FAIL, REQUEUE, ALL
--mail-user=<email> | Email address to receive the notifications
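As an illustration, a job script combining several of these options could look like the sketch below; the job name, resource values, output file and email address are placeholders rather than site requirements:
#!/bin/bash
#SBATCH -J JobName
#SBATCH -N 1
#SBATCH -n 4
#SBATCH --mem=8G
#SBATCH -t 02:00:00
#SBATCH -o job_%j.out                # %j expands to the job ID
#SBATCH --mail-type=END,FAIL
#SBATCH --mail-user=user@example.com

./myprogram                          # replace with your own program or command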
Here is a video reference for how to use Batch Scripting with SLURM:
Submitting a Job to GPU Nodes
You need to use the --gres option to request GPU cards for your job. You may specify the option inside your batch script like this:
#!/bin/bash
#SBATCH -J JobName
#SBATCH -N 1
#SBATCH --gres=gpu:2
GPU is a type of Generic Resource (GRES) in SLURM. In the above example, SLURM will assign 2 GPU cards to your job. You may also specify the GPU type like this:
#!/bin/bash
#SBATCH -J JobName
#SBATCH -N 1
#SBATCH --gres=gpu:K80:2
If your batch job needs to run different tasks on separate GPUs, you may use srun with the --exclusive option inside your batch file:
#!/bin/bash
#SBATCH -J JobName
#SBATCH -N 1
#SBATCH -n 4
#SBATCH --gres=gpu:4
srun --gres=gpu:2 -n2 --exclusive task1.sh &    # run task1 on 2 of the GPUs, in the background
srun --gres=gpu:1 -n1 --exclusive task2.sh &    # run task2 on 1 GPU
srun --gres=gpu:1 -n1 --exclusive task3.sh &    # run task3 on 1 GPU
wait                                            # wait for all background job steps to finish
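To confirm which GPUs a job step actually received, you can print the GPU environment from inside the step. On most GRES-enabled clusters SLURM sets CUDA_VISIBLE_DEVICES for each step, but this depends on the cluster's GRES configuration:
srun --gres=gpu:1 -n1 bash -c 'echo "GPUs visible to this step: $CUDA_VISIBLE_DEVICES"; nvidia-smi -L'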
Checking the Job Status
After you submit a job, you can check its status using the command squeue.
You may use the -l option with squeue to show a long report of the running or pending jobs.
Once a job is cancelled or completed, it is no longer shown by the squeue command.
By default the squeue output contains the job ID, partition, job name, job owner, job state,
time used by the job, allocated node information and the job pending reason.
If you want more information about a job, you may use the command scontrol show job to get its full details.
You may specify the job ID to view a particular job with the command scontrol show job <job_id>.
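For example, with a placeholder job ID:
squeue -u $USER -l          # long listing of your own running and pending jobs
scontrol show job 123456    # full details of job 123456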
Controlling the Submitted Job
If a running or pending job does not behave as you intend, you may use the command scancel to cancel the running or scheduled job.
To cancel a job, specify the job ID with the command scancel <job_id>.
After the cancel request is submitted, SLURM will stop the job as soon as possible.
If you want to cancel all the jobs you have submitted, you may specify your username with the command scancel -u <username>.
All your scheduled jobs will be terminated.
Besides cancelling a job, you may also modify it with the scontrol command, but some specifications cannot be changed once the job is running.
For example, you may use the command scontrol update JobId=<job_id> NumNodes=<minnodes> to modify the number of nodes requested by the job.
If the job has not started, you can increase or decrease the requested node count.
If the job has already started, you can only decrease the number of nodes running the job.
For more details about modifying a job, you may refer to the SLURM FAQ: 24. Can I change my job's size after it has started running?
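A minimal sketch of these commands, using a placeholder job ID and username:
scancel 123456                              # cancel one job by its job ID
scancel -u myusername                       # cancel every job you have submitted
scontrol update JobId=123456 NumNodes=1     # change the node count (a running job can only shrink)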
PBS to SLURM
If you have been using a PBS batch script to submit jobs, you only need to change the submission command from qsub to sbatch.
SLURM understands the PBS batch script options used in the script. Just run the script with sbatch and SLURM will submit the job according to your PBS batch script options.
The most common problem you may face is the "#PBS -q" option, because SLURM treats the PBS queue name as a SLURM partition name.
If you face this problem, just remove the "#PBS -q" line and the job will be accepted.
Even though SLURM accepts PBS batch script options, you should use either PBS batch script options or SLURM batch script options, not a mixture of both.
The following charts list the difference between PBS and SLURM commands, environment variables and job specifications:
PBS | SLURM | Meaning
---|---|---
qsub <job-file> | sbatch <job-file> | Submit <job script> to the queue
qstat <-u username> | squeue <-u username> | Check jobs for a particular user in the scheduling queue
qstat -f <job_id> | scontrol show job <job_id> | Show job details
qdel <job_id> | scancel <job_id> | Delete <job_id>
qdel `qselect -u user` | scancel -u user | Delete all jobs belonging to user
Environment variables | |
$PBS_JOBID | $SLURM_JOBID | Job ID
$PBS_O_WORKDIR | $SLURM_SUBMIT_DIR | Submit directory
$PBS_NODEFILE | $SLURM_JOB_NODELIST | Allocated node list
$PBS_ARRAY_INDEX | $SLURM_ARRAY_TASK_ID | Job array index
 | $SLURM_CPUS_PER_TASK | Number of cores/processes
Job specifications | |
qsub -l nodes=10 | sbatch -N 10 | Number of nodes
qsub -l nodes=2:ppn=16 | sbatch -N 2 -c 16 | Number of nodes and CPUs per node
qsub -l mem=8g | sbatch --mem=8g | Memory requirement
qsub -l pmem=1g | sbatch --mem-per-cpu=1g | Memory per CPU
qsub -l walltime=HH:MM:SS | sbatch -t [min] | Set a wallclock limit
qsub -o filename | sbatch --output filename | Standard output file
qsub -e filename | sbatch -e filename | Standard error file
qsub -V | sbatch --export=all | Export environment to allocated node
qsub -v np=12 | sbatch --export=np | Export a single variable
qsub -N jobname | sbatch -J jobname | Job name
qsub -r [y or n] | sbatch --requeue | Job restart
qsub -m be | sbatch --mail-type=[BEGIN,END,FAIL,REQUEUE,ALL] | Event notification
qsub -M notify@cuhk.edu.hk | sbatch --mail-user=notify@cuhk.edu.hk | Notification email address
qsub -W depend=afterany:jobid | sbatch --dependency=afterany:jobid | Job dependency