On an HPC facility, unlike on your own desktop or laptop, you do not directly access the hardware that makes up the system.
Instead, in most cases, you instruct a special software service, known as the scheduler, to run applications on your behalf. The scheduler analyses the type of job you want to run, taking note of the type and quantity of resources you indicate it needs (number of processors, amount of memory, etc.), and finds the best place to fit your job among the available servers which make up the HPC facility.
Since the scheduler does this on behalf of all users, it is able to efficiently coordinate the running of thousands of jobs at the same time, managing the distribution of jobs across the facility to make the best use of the available resources, and finding space for your jobs to fit.
If you tried to run jobs yourself, without an overview of everything that is currently running or waiting to run, you would likely start a job on a server that did not have enough free RAM, or where all of the processors were already allocated to existing jobs. This leads to conflicts over the available resources and to jobs not running efficiently.
On a system with only a small number of physical servers this may seem more complicated than needed, but once you start using systems that have dozens, hundreds or even thousands of physical servers, it rapidly becomes impractical for one person to keep track of what jobs are running where, and what the capabilities of each server are. The scheduler does this for you.
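To make the idea of "finding a place to fit your job" concrete, here is a toy sketch of first-fit placement. It is not how Slurm actually decides (real schedulers also weigh priorities, time limits and much more); the node names are borrowed from the examples further down this page, and the free CPU and memory figures are invented purely for illustration.

```shell
#!/usr/bin/env bash
# Toy first-fit placement: NOT Slurm's real algorithm, only an illustration.
# Each line of the node table is: name free_cpus free_mem_gb (figures invented).
nodes='sb001 0 12
sb008 4 16
mb01 16 512
sb012 8 64'

# first_fit CPUS MEM_GB: print the first node with enough free CPUs and memory,
# or "none" if the job would have to wait in the queue.
first_fit() {
    printf '%s\n' "$nodes" | awk -v cpus="$1" -v mem="$2" '
        $2 >= cpus && $3 >= mem { print $1; found = 1; exit }
        END { if (!found) print "none" }'
}

first_fit 4 8      # a small job fits on sb008
first_fit 8 128    # a large-memory job only fits on mb01
first_fit 32 8     # no node has 32 free CPUs, so the job would wait
```

Even in this tiny table, a job started by hand on sb001 would have collided with work already running there, which is the situation the scheduler exists to prevent.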
We use Slurm, a very popular open-source scheduler. Almost every HPC facility uses a scheduler of some sort, and most of them are comparable in terms of functionality. For more technical implementation details regarding Slurm, you can access the website operated by the maintainers of the software:
You can find out more about the various types of scheduling software at the independent High Performance Computing information wiki:
Read on for the most common Slurm commands you will encounter when using the HPC facilities at Newcastle University.
Most users of the HPC facilities at Newcastle University will only ever need a small handful of the available Slurm commands. These are sbatch, srun, squeue, sinfo, sacct and scancel, each covered in turn below.
Full information on the sbatch command can be obtained while logged in to our HPC facilities with the man command:
$ man sbatch
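As a rough illustration of what you hand to sbatch, the sketch below writes a minimal job script and submits it only if sbatch is available. The file name hello_job.sh and the resource values are assumptions chosen for illustration; the partition name short is taken from the sinfo example later on this page, so check which partitions and limits apply to you before reusing it.

```shell
#!/usr/bin/env bash
# Write a minimal job script. The lines starting with #SBATCH are read by
# sbatch as options; to the shell itself they are ordinary comments.
cat > hello_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --partition=short
#SBATCH --ntasks=1
#SBATCH --mem=1G
#SBATCH --time=00:05:00

echo "Hello from $(hostname)"
EOF

# Submit the script if we are on a system that has Slurm installed.
if command -v sbatch >/dev/null 2>&1; then
    sbatch hello_job.sh
else
    echo "sbatch not found; hello_job.sh written but not submitted"
fi
```

Unless told otherwise, Slurm normally writes the job's output to a file named slurm-<jobid>.out in the directory you submitted from.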
Full information on the srun command can be obtained while logged in to our HPC facilities with the man command:
$ man srun
The squeue command will, by default, show a list of all currently running jobs that the scheduler knows about, for all users, as well as those jobs which are currently waiting to run.
$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
17864081 bigmem bd6_job abc12345 R 23:23:02 1 xln02
17864079 bigmem bd5_job abc12345 R 1-09:31:39 1 xln01
17857465 bigmem bd3_job abc12345 R 3-11:36:38 1 ln04
17864082 bigmem bd7_job abc12345 R 2-17:26:22 1 ln02
17864072 bigmem bd4_job abc12345 R 2-18:41:22 1 ln03
17867113 bigmem USTAR1 n12345 R 1-05:31:14 1 mb01
17824926 bigmem,de vep4-Sen n45678 PD 0:00 1 (DependencyNeverSatisfied)
17878168 defq f3dyn.sh bcd23456 PD 0:00 4 (Resources)
17878169 defq q5R8e5r1 n67832 PD 0:00 4 (Resources)
17880197 defq highrevo defg78901 PD 0:00 4 (Resources)
17877696 defq T1_9_VNS n87654 PD 0:00 2 (Dependency)
17877695 defq T1_9_VNS n87654 PD 0:00 2 (Dependency)
17877637 defq T1_9_VNS n87654 PD 0:00 2 (Dependency)
17877469 defq T1_7_VNS n87654 PD 0:00 2 (Dependency)
17878216 defq nemo_72. n91267 R 3:03:47 1 sb059
17864625 defq run6-Por nabc123 R 22:51:51 12 sb[049-051,053,055-057,060-062,075,080]
$
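When the full listing is long, a per-state count is often enough. The sketch below tallies the ST column with awk; the function name count_by_state is hypothetical, and the sample rows are copied from the listing above so that the example runs without access to a cluster. It assumes job names contain no spaces, since a space would shift the columns.

```shell
#!/usr/bin/env bash
# count_by_state: read squeue's default output on stdin and print one
# "STATE COUNT" line per state code found in the fifth (ST) column.
count_by_state() {
    awk 'NR > 1 { n[$5]++ }                      # skip the header line
         END    { for (s in n) print s, n[s] }' | sort
}

# Sample rows taken from the listing above.
sample='JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
17864081 bigmem bd6_job abc12345 R 23:23:02 1 xln02
17867113 bigmem USTAR1 n12345 R 1-05:31:14 1 mb01
17878168 defq f3dyn.sh bcd23456 PD 0:00 4 (Resources)
17877696 defq T1_9_VNS n87654 PD 0:00 2 (Dependency)
17878216 defq nemo_72. n91267 R 3:03:47 1 sb059'

printf '%s\n' "$sample" | count_by_state
# On a cluster you would pipe the live listing instead:  squeue | count_by_state
```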
You can easily restrict the output to a single user:
$ squeue --user=abc12345
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
17864081 bigmem bd6_job abc12345 R 23:23:02 1 xln02
17864079 bigmem bd5_job abc12345 R 1-09:31:39 1 xln01
17857465 bigmem bd3_job abc12345 R 3-11:36:38 1 ln04
17864082 bigmem bd7_job abc12345 R 2-17:26:22 1 ln02
17864072 bigmem bd4_job abc12345 R 2-18:41:22 1 ln03
$
Or look for just the waiting jobs:
$ squeue --states=PD
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
17824926 bigmem,de vep4-Sen n123213 PD 0:00 1 (DependencyNeverSatisfied)
17882234 defq kk.sh n1225 PD 0:00 6 (Resources)
17878168 defq f3dyn.sh b7891234 PD 0:00 4 (Resources)
17878169 defq q5R8e5r1 n9927 PD 0:00 4 (Resources)
17880197 defq highrevo b7891234 PD 0:00 4 (Resources)
17877696 defq T1_9_VNS n5627 PD 0:00 2 (Dependency)
17877695 defq T1_9_VNS n5627 PD 0:00 2 (Dependency)
17877637 defq T1_9_VNS n5627 PD 0:00 2 (Dependency)
17877469 defq T1_7_VNS n5627 PD 0:00 2 (Dependency)
17772908 defq,long Birmingh a1234567 PD 0:00 1 (DependencyNeverSatisfied)
17772519 defq,long Finland_ a1234567 PD 0:00 1 (DependencyNeverSatisfied)
17763133 defq,long UK1_OSst a1234567 PD 0:00 1 (DependencyNeverSatisfied)
17777214 defq,long OSstatus a1234567 PD 0:00 1 (DependencyNeverSatisfied)
17777213 defq,long OSstatus a1234567 PD 0:00 1 (DependencyNeverSatisfied)
$
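The last column shows why each job is waiting, so grouping pending jobs by reason gives a quick picture of the queue. This is a sketch only: pending_by_reason is a hypothetical helper, the sample rows are copied from the listing above, and it assumes each reason is a single word.

```shell
#!/usr/bin/env bash
# pending_by_reason: read squeue's default output on stdin and count pending
# (PD) jobs by the reason shown in the final column.
pending_by_reason() {
    awk '$5 == "PD" {
             reason = $NF
             gsub(/[()]/, "", reason)             # "(Resources)" -> "Resources"
             n[reason]++
         }
         END { for (r in n) print n[r], r }' | sort -k1,1nr -k2
}

sample='JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
17824926 bigmem,de vep4-Sen n123213 PD 0:00 1 (DependencyNeverSatisfied)
17882234 defq kk.sh n1225 PD 0:00 6 (Resources)
17878168 defq f3dyn.sh b7891234 PD 0:00 4 (Resources)
17877696 defq T1_9_VNS n5627 PD 0:00 2 (Dependency)
17877695 defq T1_9_VNS n5627 PD 0:00 2 (Dependency)
17877637 defq T1_9_VNS n5627 PD 0:00 2 (Dependency)
17772908 defq,long Birmingh a1234567 PD 0:00 1 (DependencyNeverSatisfied)'

printf '%s\n' "$sample" | pending_by_reason
```

On a live system, squeue --noheader --states=PD --format=%r prints just the reason for each pending job, which avoids relying on column positions.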
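Common state codes include R (running) and PD (pending, i.e. waiting to run), both of which appear in the ST column of the examples above.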
Full information on the squeue command, and possible status codes, can be obtained while logged in to our HPC facilities with the man command:
$ man squeue
The sinfo command shows you the general status of the HPC cluster partitions, which nodes are available in each partition as well as the default runtime limits for each.
Each line of output from sinfo shows the status of one or more hosts against a partition:
$ sinfo
PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
defq* up 2-00:00:00 1 down* sb044
defq* up 2-00:00:00 31 mix sb[008,012-015,024-025,027,031,040-041]
defq* up 2-00:00:00 78 alloc sb[001-007,009-011,016-023,026,028-030,032-039]
short up 10:00 1 down* sb044
short up 10:00 37 mix ln[02-04],mb01,sb[008,012-015,024-025,027,031,040-041],xln[01-02]
short up 10:00 78 alloc sb[001-007,009-011,016-023,026,028-030,032-039,042-043,045-047]
short up 10:00 6 idle ln01,mb02,mn[01-04]
long up 30-00:00:0 1 down* sb044
long up 30-00:00:0 31 mix sb[008,012-015,024-025,027,031,040-041,048,063-064,070-071,076,081,083-087,090-096,106]
long up 30-00:00:0 78 alloc sb[001-007,009-011,016-023,026,028-030,032-039,042-043,045-047,049-062,065-069]
interactive up 1-00:00:00 37 mix ln[02-04],mb01,sb[008,012-015,024-025,027,031,040-041,048,063-064,070-071,076],xln[01-02]
interactive up 1-00:00:00 78 alloc sb[001-007,009-011,016-023,026,028-030,032-039,042-043,045-047,049-062,065-069,072-075,]
interactive up 1-00:00:00 6 idle ln01,mb02,mn[01-04]
bigmem up 14-00:00:0 6 mix ln[02-04],mb01,xln[01-02]
bigmem up 14-00:00:0 6 idle ln01,mb02,mn[01-04]
$
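Looking at the output above, the short partition (time limit 10:00, i.e. 10 minutes) has 1 node down (sb044), 37 nodes in the mix state (partly allocated), 78 nodes fully allocated (alloc) and 6 nodes idle.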
This can be useful when scheduling jobs, or deciding on a particular partition/resource type to use.
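The per-state node counts for a partition can be pulled out of sinfo output with a short awk filter. This is a sketch under assumptions: nodes_in_partition is a hypothetical helper name, and the sample rows are taken from the output above with the node lists shortened.

```shell
#!/usr/bin/env bash
# nodes_in_partition NAME: read sinfo's default output on stdin and print
# "STATE NODES" for every line belonging to partition NAME.
nodes_in_partition() {
    awk -v part="$1" '
        NR == 1 { next }                          # skip the header line
        {
            name = $1
            sub(/\*$/, "", name)                  # the default partition is shown as "defq*"
            if (name == part) print $5, $4
        }'
}

# Sample rows based on the output above (node lists shortened).
sample='PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
defq* up 2-00:00:00 1 down* sb044
short up 10:00 1 down* sb044
short up 10:00 37 mix ln[02-04],mb01,sb[008,012-015],xln[01-02]
short up 10:00 78 alloc sb[001-007,009-011,016-023]
short up 10:00 6 idle ln01,mb02,mn[01-04]'

printf '%s\n' "$sample" | nodes_in_partition short
# On a cluster:  sinfo | nodes_in_partition short
```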
Full information on the sinfo command can be obtained while logged in to our HPC facilities with the man command:
$ man sinfo
The sacct command is superficially similar to squeue, but is primarily used to retrieve data on historic jobs. You can extract information from previous jobs to understand how they ran and what resources they used, possibly as a means to improve the effectiveness of future jobs.
By default, sacct shows all jobs for the current user (finished, waiting and running) in the current time window, which is approximately the current day. With no other parameters the output shows basic information about each job (job ID, job name, partition, account, allocated CPUs, state and exit code):
$ sacct
JobID        JobName    Partition  Account    AllocCPUS  State      ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
17784183     T2_Decay   long       mygroup    88         TIMEOUT    0:0
17784183.ba+ batch                 mygroup    44         CANCELLED  0:15
17784183.0   Foucault.+            mygroup    88         CANCELLED  0:15
17794687_2   Artifi_Mo+ long       mygroup    8          RUNNING    0:0
17794687_2.+ batch                 mygroup    8          RUNNING    0:0
17794688_1   Artifi_Mo+ long       mygroup    8          RUNNING    0:0
17794688_1.+ batch                 mygroup    8          RUNNING    0:0
17794688_2   Artifi_Mo+ long       mygroup    8          RUNNING    0:0
17794688_2.+ batch                 mygroup    8          RUNNING    0:0
17851999     RESPONSE_+ long       mygroup    1          RUNNING    0:0
17851999.ba+ batch                 mygroup    1          RUNNING    0:0
17854203     job_3_FC   long       mygroup    32         RUNNING    0:0
17854203.ba+ batch                 mygroup    32         RUNNING    0:0
17854203.0   interIsoF+            mygroup    32         RUNNING    0:0
17854204     job_2_FC   long       mygroup    32         RUNNING    0:0
$
A common option is to limit the results by job status. Here's an example which reports on jobs which failed due to insufficient memory allocation in the last 7 days:
$ sacct --state=OOM --starttime now-7days --endtime now
JobID        JobName    Partition  Account    AllocCPUS  State      ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
17789551_142 QTABmelt   defq       mygroup    4          OUT_OF_ME+ 0:125
17789551_14+ batch                 mygroup    4          OUT_OF_ME+ 0:125
17789551_144 QTABmelt   defq       mygroup    4          OUT_OF_ME+ 0:125
17789551_14+ batch                 mygroup    4          OUT_OF_ME+ 0:125
17789551_153 QTABmelt   defq       mygroup    4          OUT_OF_ME+ 0:125
$
You may then choose to get more detailed information on one of those jobs which failed due to OUT_OF_MEMORY by using the -j (job ID) and -l (long output) parameters:
$ sacct -j 17789551 -l
JobID JobIDRaw JobName Partition MaxVMSize MaxVMSizeNode MaxVMSizeTask AveVMSize MaxRSS MaxRSSNode MaxRSSTask AveRSS MaxPages MaxPagesNode MaxPagesTask AvePages MinCPU MinCPUNode MinCPUTask AveCPU
NTasks AllocCPUS Elapsed State ExitCode AveCPUFreq ReqCPUFreqMin ReqCPUFreqMax ReqCPUFreqGov ReqMem ConsumedEnergy MaxDiskRead MaxDiskReadNode MaxDiskReadTask AveDiskRead MaxDiskWrite MaxDiskWriteNode MaxDiskWriteTask
AveDiskWrite ReqTRES AllocTRES TRESUsageInAve TRESUsageInMax TRESUsageInMaxNode TRESUsageInMaxTask TRESUsageInMin TRESUsageInMinNode TRESUsageInMinTask TRESUsageInTot TRESUsageOutMax TRESUsageOutMaxNode TRESUsageOutMaxTask TRESUsage
OutAve TRESUsageOutTot
------------ ------------ ---------- ---------- ---------- -------------- -------------- ---------- ---------- ---------- ---------- ---------- -------- ------------ -------------- ---------- ---------- ---------- ---------- ---------- -
------- ---------- ---------- ---------- -------- ---------- ------------- ------------- ------------- ---------- -------------- ------------ --------------- --------------- -------------- ------------ ---------------- ---------------- -
------------- ---------- ---------- -------------- -------------- ------------------ ------------------ -------------- ------------------ ------------------ -------------- --------------- ------------------- ------------------- ---------
------ ---------------
17789551_43+ 17789992.ba+ batch 142804K sb099 0 142804K 3494420K sb099 0 3494420K 0 sb099 0 0 02:44:08 sb099 0 02:44:08
1 4 00:45:30 COMPLETED 0:0 74K 0 0 0 2500Mc 0 1.16M sb099 0 1.16M 0.00M sb099 0
0.00M cpu=4,mem+ cpu=02:44:08,+ cpu=02:44:08,+ cpu=sb099,energy=+ cpu=0,fs/disk=0,m+ cpu=02:44:08,+ cpu=sb099,energy=+ cpu=0,fs/disk=0,m+ cpu=02:44:08,+ energy=0,fs/di+ energy=sb099,fs/di+ fs/disk=0 energy=0,
fs/di+ energy=0,fs/di+
17789551_435 17789993 QTABmelt defq
4 02:42:35 OUT_OF_ME+ 0:125 Unknown Unknown Unknown 2500Mc
billing=4+ billing=4+
17789551_43+ 17789993.ba+ batch 142804K sb099 0 142804K 10220208K sb099 0 10220208K 0 sb099 0 0 09:56:26 sb099 0 09:56:26
1 4 02:42:35 OUT_OF_ME+ 0:125 12K 0 0 0 2500Mc 0 1.16M sb099 0 1.16M 0.00M sb099 0
0.00M cpu=4,mem+ cpu=09:56:26,+ cpu=09:56:26,+ cpu=sb099,energy=+ cpu=0,fs/disk=0,m+ cpu=09:56:26,+ cpu=sb099,energy=+ cpu=0,fs/disk=0,m+ cpu=09:56:26,+ energy=0,fs/di+ energy=sb099,fs/di+ fs/disk=0 energy=0,
fs/di+ energy=0,fs/di+
17789551_436 17789997 QTABmelt defq
4 00:01:21 COMPLETED 0:0 Unknown Unknown Unknown 2500Mc
billing=4+ billing=4+
$
Parsing sacct output
If you want to parse the above long-format output in a script, or paste it into a spreadsheet, then adding the --parsable option will delimit the fields with a '|' character so that the output is more easily split by column.
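As a sketch of what parsing by column can look like, the example below reads pipe-delimited records and reports the peak memory of each job step in MiB. Several things here are assumptions: the helper name peak_rss_mib is hypothetical, the sample records are hand-written in the style of sacct --parsable2 --noheader --format=JobID,State,ReqMem,MaxRSS (using figures from the long output above), and MaxRSS is assumed to carry a K suffix as it does there. --parsable2 is the variant of --parsable that omits the trailing '|' at the end of each line.

```shell
#!/usr/bin/env bash
# peak_rss_mib: read "JobID|State|ReqMem|MaxRSS" records on stdin and print
# the job step, its state and its peak memory in whole MiB.
peak_rss_mib() {
    awk -F'|' '
        $4 ~ /K$/ {                               # only rows with a MaxRSS such as "10220208K"
            kib = $4
            sub(/K$/, "", kib)
            printf "%s %s %d MiB\n", $1, $2, kib / 1024
        }'
}

# Hand-written sample records (not captured output).
sample='17789551_435|OUT_OF_MEMORY|2500Mc|
17789551_435.batch|OUT_OF_MEMORY|2500Mc|10220208K
17789551_436|COMPLETED|2500Mc|'

printf '%s\n' "$sample" | peak_rss_mib
```

On a cluster, the same filter could be fed from sacct directly, for example: sacct -j 17789551 --parsable2 --noheader --format=JobID,State,ReqMem,MaxRSS | peak_rss_mib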
Full information on the sacct command can be obtained while logged in to our HPC facilities with the man command:
$ man sacct
The scancel command allows you to remove pending jobs from the scheduler queue, as well as request any running job to be stopped.
First, find the Job ID of the job you want to remove from the queue, or stop (replace n1234 with your own University IT account username):
$ squeue -u n1234
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
18020902 long 13_alcoh n1234 R 4:34 1 sb027
18015794 long 13_SN2_1 n1234 R 1-02:18:49 1 sb094
18015793 long 13_E2_2_ n1234 R 1-02:35:24 1 sb110
18018643 long dynamics n1234 R 22:21:16 1 sb100
$
In this case, we choose to cancel the running (state R) job 18018643:
$ scancel 18018643
$
Note: In most cases you are only able to cancel jobs which you submitted yourself.
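If you need to cancel several jobs at once, it helps to review the commands before running them. The sketch below is a dry run: cancel_commands is a hypothetical helper that prints one scancel line per job whose name starts with a given prefix, and the sample rows are copied from the listing above.

```shell
#!/usr/bin/env bash
# cancel_commands PREFIX: read squeue's default output on stdin and print a
# scancel command for each job whose NAME (third column) starts with PREFIX.
# Nothing is cancelled; the commands are only printed for review.
cancel_commands() {
    awk -v prefix="$1" '
        NR > 1 && index($3, prefix) == 1 { print "scancel", $1 }'
}

sample='JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
18020902 long 13_alcoh n1234 R 4:34 1 sb027
18015794 long 13_SN2_1 n1234 R 1-02:18:49 1 sb094
18015793 long 13_E2_2_ n1234 R 1-02:35:24 1 sb110
18018643 long dynamics n1234 R 22:21:16 1 sb100'

printf '%s\n' "$sample" | cancel_commands 13_
```

scancel can also select jobs itself: for example, scancel --user=n1234 --state=PENDING cancels all of that user's pending jobs in one step.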
Full information on the scancel command can be obtained while logged in to our HPC facilities with the man command:
$ man scancel