Introduction to Slurm

What is Slurm?

Unlike running code on your own desktop or laptop, on an HPC facility you do not directly access the individual pieces of hardware which make up the system.

Instead, in most cases, you instruct a special software service, known as the scheduler, to run applications on your behalf. The scheduler analyses the type of job you want to run, taking note of the type and quantity of resources you indicate it needs (number of processors, amount of memory, etc.), and finds the best place to fit your job among the available servers which make up the HPC facility.

Since the scheduler does this on behalf of all users, it can efficiently coordinate thousands of jobs at the same time, managing their distribution across the facility to make the best use of the available resources and finding space for your jobs to fit.

If you tried to run jobs yourself, without an overview of everything that is currently running or waiting to run, you would likely start a job on a server that didn't have enough free RAM, or one where all of the processors were already allocated to existing jobs, leading to conflicts over resources and jobs running inefficiently.

On a system with only a few physical servers this might seem more complicated than necessary, but once you start using systems that have dozens, hundreds or even thousands of physical servers, it rapidly becomes impractical for one person to keep track of which jobs are running where, and what the capabilities of each server are. The scheduler does this for you.

We use Slurm, a very popular open-source scheduler, but almost every HPC facility uses a scheduler of some sort, and most are comparable in terms of functionality. For more technical implementation details regarding Slurm, see the website operated by the maintainers of the software at https://slurm.schedmd.com/.

You can also find out more about the various types of scheduling software at the independent High Performance Computing information wiki.

Read on for the most common Slurm commands you will encounter in your use of the HPC facilities at Newcastle University.


Slurm Commands

The vast majority of those using the HPC facilities at Newcastle University will only ever need a small handful of the possible Slurm commands. These are:

sbatch

The sbatch command submits a batch script to the scheduler; the script describes the resources your job needs and the commands to run once those resources have been allocated. Full information on the sbatch command can be obtained while logged in to our HPC facilities with the man command:

$ man sbatch
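
As a sketch of typical usage, the batch script below requests some modest resources and runs a single program; the job name, resource values, program name and returned job ID are illustrative (the defq partition appears in the sinfo output later on this page):

$ cat myjob.sh
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --partition=defq
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --mem=4G
#SBATCH --time=01:00:00

# Run the application on the allocated resources
srun ./my_program input.dat

$ sbatch myjob.sh
Submitted batch job 17890000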


srun

The srun command launches tasks on resources allocated by the scheduler; it is typically used to run the individual steps inside an sbatch script, or to start an interactive session. Full information on the srun command can be obtained while logged in to our HPC facilities with the man command:

$ man srun
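
As a brief sketch, srun can also be used on its own to request an interactive session on a compute node; the interactive partition name comes from the sinfo output later on this page, and the prompt shown is illustrative:

$ srun --partition=interactive --ntasks=1 --time=00:30:00 --pty bash
[abc12345@sb008 ~]$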


squeue

The squeue command will, by default, list all of the jobs the scheduler knows about, for all users: those that are currently running as well as those waiting to run.

$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          17864081    bigmem  bd6_job abc12345  R   23:23:02      1 xln02
          17864079    bigmem  bd5_job abc12345  R 1-09:31:39      1 xln01
          17857465    bigmem  bd3_job abc12345  R 3-11:36:38      1 ln04
          17864082    bigmem  bd7_job abc12345  R 2-17:26:22      1 ln02
          17864072    bigmem  bd4_job abc12345  R 2-18:41:22      1 ln03
          17867113    bigmem   USTAR1 n12345    R 1-05:31:14      1 mb01
          17824926 bigmem,de vep4-Sen n45678    PD      0:00      1 (DependencyNeverSatisfied)
          17878168      defq f3dyn.sh bcd23456  PD      0:00      4 (Resources)
          17878169      defq q5R8e5r1 n67832    PD      0:00      4 (Resources)
          17880197      defq highrevo defg78901 PD      0:00      4 (Resources)
          17877696      defq T1_9_VNS n87654    PD      0:00      2 (Dependency)
          17877695      defq T1_9_VNS n87654    PD      0:00      2 (Dependency)
          17877637      defq T1_9_VNS n87654    PD      0:00      2 (Dependency)
          17877469      defq T1_7_VNS n87654    PD      0:00      2 (Dependency)
          17878216      defq nemo_72. n91267    R    3:03:47      1 sb059
          17864625      defq run6-Por nabc123   R   22:51:51     12 sb[049-051,053,055-057,060-062,075,080]
$

You can easily restrict the output to a single user:

$ squeue --user=abc12345
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          17864081    bigmem  bd6_job abc12345  R   23:23:02      1 xln02
          17864079    bigmem  bd5_job abc12345  R 1-09:31:39      1 xln01
          17857465    bigmem  bd3_job abc12345  R 3-11:36:38      1 ln04
          17864082    bigmem  bd7_job abc12345  R 2-17:26:22      1 ln02
          17864072    bigmem  bd4_job abc12345  R 2-18:41:22      1 ln03
$

Or look for just the waiting jobs:

$ squeue --states=PD
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          17824926 bigmem,de vep4-Sen  n123213 PD       0:00      1 (DependencyNeverSatisfied)
          17882234      defq    kk.sh    n1225 PD       0:00      6 (Resources)
          17878168      defq f3dyn.sh b7891234 PD       0:00      4 (Resources)
          17878169      defq q5R8e5r1    n9927 PD       0:00      4 (Resources)
          17880197      defq highrevo b7891234 PD       0:00      4 (Resources)
          17877696      defq T1_9_VNS    n5627 PD       0:00      2 (Dependency)
          17877695      defq T1_9_VNS    n5627 PD       0:00      2 (Dependency)
          17877637      defq T1_9_VNS    n5627 PD       0:00      2 (Dependency)
          17877469      defq T1_7_VNS    n5627 PD       0:00      2 (Dependency)
          17772908 defq,long Birmingh a1234567 PD       0:00      1 (DependencyNeverSatisfied)
          17772519 defq,long Finland_ a1234567 PD       0:00      1 (DependencyNeverSatisfied)
          17763133 defq,long UK1_OSst a1234567 PD       0:00      1 (DependencyNeverSatisfied)
          17777214 defq,long OSstatus a1234567 PD       0:00      1 (DependencyNeverSatisfied)
          17777213 defq,long OSstatus a1234567 PD       0:00      1 (DependencyNeverSatisfied)
$
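
These options can be combined; for example, to show only your own running jobs (using the same illustrative user as above):

$ squeue --user=abc12345 --states=R
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          17864081    bigmem  bd6_job abc12345  R   23:23:02      1 xln02
          17864079    bigmem  bd5_job abc12345  R 1-09:31:39      1 xln01
          17857465    bigmem  bd3_job abc12345  R 3-11:36:38      1 ln04
          17864082    bigmem  bd7_job abc12345  R 2-17:26:22      1 ln02
          17864072    bigmem  bd4_job abc12345  R 2-18:41:22      1 ln03
$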

Common state codes are:

R - the job is running
PD - the job is pending (waiting for resources, or for a dependency to be satisfied)
CG - the job is in the process of completing
CD - the job has completed
CA - the job was cancelled
F - the job failed

Full information on the squeue command, and the full set of state codes, can be obtained while logged in to our HPC facilities with the man command:

$ man squeue


sinfo

The sinfo command shows you the general status of the HPC cluster partitions: which nodes are in each partition, what state they are in, and the maximum runtime limit for each partition.

Each line of output from sinfo shows the status of one or more hosts within a partition:

$ sinfo
PARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST
defq*          up 2-00:00:00      1  down* sb044
defq*          up 2-00:00:00     31    mix sb[008,012-015,024-025,027,031,040-041]
defq*          up 2-00:00:00     78  alloc sb[001-007,009-011,016-023,026,028-030,032-039]
short          up      10:00      1  down* sb044
short          up      10:00     37    mix ln[02-04],mb01,sb[008,012-015,024-025,027,031,040-041],xln[01-02]
short          up      10:00     78  alloc sb[001-007,009-011,016-023,026,028-030,032-039,042-043,045-047]
short          up      10:00      6   idle ln01,mb02,mn[01-04]
long           up 30-00:00:0      1  down* sb044
long           up 30-00:00:0     31    mix sb[008,012-015,024-025,027,031,040-041,048,063-064,070-071,076,081,083-087,090-096,106]
long           up 30-00:00:0     78  alloc sb[001-007,009-011,016-023,026,028-030,032-039,042-043,045-047,049-062,065-069]
interactive    up 1-00:00:00     37    mix ln[02-04],mb01,sb[008,012-015,024-025,027,031,040-041,048,063-064,070-071,076],xln[01-02]
interactive    up 1-00:00:00     78  alloc sb[001-007,009-011,016-023,026,028-030,032-039,042-043,045-047,049-062,065-069,072-075,]
interactive    up 1-00:00:00      6   idle ln01,mb02,mn[01-04]
bigmem         up 14-00:00:0      6    mix ln[02-04],mb01,xln[01-02]
bigmem         up 14-00:00:0      6   idle ln01,mb02,mn[01-04]
$

Looking at the output above, it shows that for the short partition we have the following:

- a maximum job runtime of 10 minutes
- 1 node that is down (sb044)
- 37 nodes in a mixed state (running jobs, but with some resources still free)
- 78 nodes that are fully allocated to jobs
- 6 nodes that are idle and available for new work

This can be useful when scheduling jobs, or deciding on a particular partition/resource type to use.
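
For example, to check which nodes are currently idle in the short partition (output based on the listing above):

$ sinfo --partition=short --states=idle
PARTITION   AVAIL  TIMELIMIT  NODES  STATE NODELIST
short          up      10:00      6   idle ln01,mb02,mn[01-04]
$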

Full information on the sinfo command can be obtained while logged in to our HPC facilities with the man command:

$ man sinfo


sacct

The sacct command is superficially similar to squeue, but is primarily used to retrieve data on historic jobs. You can extract information from previous jobs to understand how they ran and what resources they used, possibly as a means to improve the effectiveness of future jobs.

By default, sacct will show all of your own jobs (including finished, waiting and running jobs) in the current time window, which is approximately equivalent to the current day. With no other parameters the output shows basic information about each job: the job ID, job name, partition, account, number of allocated CPUs, state and exit code:

$ sacct
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
17784183       T2_Decay       long    mygroup         88    TIMEOUT      0:0 
17784183.ba+      batch               mygroup         44  CANCELLED     0:15 
17784183.0   Foucault.+               mygroup         88  CANCELLED     0:15 
17794687_2   Artifi_Mo+       long    mygroup          8    RUNNING      0:0 
17794687_2.+      batch               mygroup          8    RUNNING      0:0 
17794688_1   Artifi_Mo+       long    mygroup          8    RUNNING      0:0 
17794688_1.+      batch               mygroup          8    RUNNING      0:0 
17794688_2   Artifi_Mo+       long    mygroup          8    RUNNING      0:0 
17794688_2.+      batch               mygroup          8    RUNNING      0:0 
17851999     RESPONSE_+       long    mygroup          1    RUNNING      0:0 
17851999.ba+      batch               mygroup          1    RUNNING      0:0 
17854203       job_3_FC       long    mygroup         32    RUNNING      0:0 
17854203.ba+      batch               mygroup         32    RUNNING      0:0 
17854203.0   interIsoF+               mygroup         32    RUNNING      0:0 
17854204       job_2_FC       long    mygroup         32    RUNNING      0:0
$

A common option is to limit the results by job state. Here's an example reporting on jobs that failed due to insufficient memory allocation in the last 7 days:

$ sacct --state=OOM --starttime now-7days --endtime now
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- -------- 
17789551_142   QTABmelt       defq     mygroup         4 OUT_OF_ME+    0:125 
17789551_14+      batch                mygroup         4 OUT_OF_ME+    0:125 
17789551_144   QTABmelt       defq     mygroup         4 OUT_OF_ME+    0:125 
17789551_14+      batch                mygroup         4 OUT_OF_ME+    0:125 
17789551_153   QTABmelt       defq     mygroup         4 OUT_OF_ME+    0:125
$

You may then choose to get more detailed information on one of the jobs which failed with OUT_OF_MEMORY by using the -j (job ID) and -l (long output) parameters. Note that the long output is very wide, so it will wrap in most terminal windows:

$ sacct -j 17789551 -l
JobID     JobIDRaw    JobName  Partition  MaxVMSize  MaxVMSizeNode  MaxVMSizeTask  AveVMSize     MaxRSS MaxRSSNode MaxRSSTask     AveRSS MaxPages MaxPagesNode   MaxPagesTask   AvePages     MinCPU MinCPUNode MinCPUTask     AveCPU  
 NTasks  AllocCPUS    Elapsed      State ExitCode AveCPUFreq ReqCPUFreqMin ReqCPUFreqMax ReqCPUFreqGov     ReqMem ConsumedEnergy  MaxDiskRead MaxDiskReadNode MaxDiskReadTask    AveDiskRead MaxDiskWrite MaxDiskWriteNode MaxDiskWriteTask  
 AveDiskWrite    ReqTRES  AllocTRES TRESUsageInAve TRESUsageInMax TRESUsageInMaxNode TRESUsageInMaxTask TRESUsageInMin TRESUsageInMinNode TRESUsageInMinTask TRESUsageInTot TRESUsageOutMax TRESUsageOutMaxNode TRESUsageOutMaxTask TRESUsage
OutAve TRESUsageOutTot 
------------ ------------ ---------- ---------- ---------- -------------- -------------- ---------- ---------- ---------- ---------- ---------- -------- ------------ -------------- ---------- ---------- ---------- ---------- ---------- -
------- ---------- ---------- ---------- -------- ---------- ------------- ------------- ------------- ---------- -------------- ------------ --------------- --------------- -------------- ------------ ---------------- ---------------- -
------------- ---------- ---------- -------------- -------------- ------------------ ------------------ -------------- ------------------ ------------------ -------------- --------------- ------------------- ------------------- ---------
------ ---------------
17789551_43+ 17789992.ba+      batch               142804K          sb099              0    142804K   3494420K      sb099          0   3494420K        0        sb099              0          0   02:44:08      sb099          0   02:44:08  
      1          4   00:45:30  COMPLETED      0:0        74K             0             0             0     2500Mc              0        1.16M           sb099               0          1.16M        0.00M            sb099                0  
        0.00M            cpu=4,mem+ cpu=02:44:08,+ cpu=02:44:08,+ cpu=sb099,energy=+ cpu=0,fs/disk=0,m+ cpu=02:44:08,+ cpu=sb099,energy=+ cpu=0,fs/disk=0,m+ cpu=02:44:08,+ energy=0,fs/di+ energy=sb099,fs/di+           fs/disk=0 energy=0,
fs/di+ energy=0,fs/di+ 
17789551_435 17789993       QTABmelt       defq                                                                                                                                                                                              
                 4   02:42:35 OUT_OF_ME+    0:125                  Unknown       Unknown       Unknown     2500Mc                                                                                                                            
              billing=4+ billing=4+                                                                                                                                                                                                          
                       
17789551_43+ 17789993.ba+      batch               142804K          sb099              0    142804K  10220208K      sb099          0  10220208K        0        sb099              0          0   09:56:26      sb099          0   09:56:26  
      1          4   02:42:35 OUT_OF_ME+    0:125        12K             0             0             0     2500Mc              0        1.16M           sb099               0          1.16M        0.00M            sb099                0  
        0.00M            cpu=4,mem+ cpu=09:56:26,+ cpu=09:56:26,+ cpu=sb099,energy=+ cpu=0,fs/disk=0,m+ cpu=09:56:26,+ cpu=sb099,energy=+ cpu=0,fs/disk=0,m+ cpu=09:56:26,+ energy=0,fs/di+ energy=sb099,fs/di+           fs/disk=0 energy=0,
fs/di+ energy=0,fs/di+ 
17789551_436 17789997       QTABmelt       defq                                                                                                                                                                                              
                 4   00:01:21  COMPLETED      0:0                  Unknown       Unknown       Unknown     2500Mc                                                                                                                            
              billing=4+ billing=4+
$
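
Rather than the full long listing, you may prefer to ask sacct for just the fields you are interested in with the --format option. The field names below are standard sacct fields, and the values shown are an illustrative cut-down view of the failed array task above:

$ sacct -j 17789551_435 --format=JobID,JobName,Partition,Elapsed,ReqMem,MaxRSS,State,ExitCode
       JobID    JobName  Partition    Elapsed     ReqMem     MaxRSS      State ExitCode 
------------ ---------- ---------- ---------- ---------- ---------- ---------- -------- 
17789551_435   QTABmelt       defq   02:42:35     2500Mc            OUT_OF_ME+    0:125 
17789551_43+      batch              02:42:35     2500Mc  10220208K OUT_OF_ME+    0:125 
$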

Parsing sacct output

If you want to parse the above long-format output in a script, or paste it into a spreadsheet, then adding the --parsable option will separate the fields with a '|' character so that the output can be split reliably by column.
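
For illustration, here is the default (short-format) output with --parsable, using a few of the jobs from the first sacct listing above; the same option applies to the long format:

$ sacct --parsable
JobID|JobName|Partition|Account|AllocCPUS|State|ExitCode|
17784183|T2_Decay|long|mygroup|88|TIMEOUT|0:0|
17784183.batch|batch||mygroup|44|CANCELLED|0:15|
17854203|job_3_FC|long|mygroup|32|RUNNING|0:0|
17854203.batch|batch||mygroup|32|RUNNING|0:0|
$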

Full information on the sacct command can be obtained while logged in to our HPC facilities with the man command:

$ man sacct


scancel

The scancel command allows you to remove pending jobs from the scheduler queue, as well as to ask for any of your running jobs to be stopped.

First, find the job ID of the job you want to remove from the queue, or stop (replace n1234 with your normal University IT account):

$ squeue -u n1234
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
          18020902      long 13_alcoh    n1234  R       4:34      1 sb027
          18015794      long 13_SN2_1    n1234  R 1-02:18:49      1 sb094
          18015793      long 13_E2_2_    n1234  R 1-02:35:24      1 sb110
          18018643      long dynamics    n1234  R   22:21:16      1 sb100
$

In this case, we choose to cancel the running (state R) job 18018643:

$ scancel 18018643
$

Note: In most cases you are only able to cancel jobs which you started yourself.
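
scancel can also select jobs by user and state, which is useful for clearing several jobs at once; for example, to cancel all of your own pending jobs (again, replace n1234 with your own account):

$ scancel --user=n1234 --state=PENDING
$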

Full information on the scancel command can be obtained while logged in to our HPC facilities with the man command:

$ man scancel

