Also: “Why is my job not running yet?”
You can list the jobs you have submitted that are still waiting to run with the squeue -t PD --me command:
$ squeue -t PD --me
   JOBID PARTITION     NAME     USER ST  TIME NODES NODELIST(REASON)
18020727    bigmem generati    n1234 PD  0:00     1 (Resources)
18020208    bigmem   AOH_a5    n1234 PD  0:00     1 (Resources)
17824926 bigmem,de vep4-Sen    n1234 PD  0:00     1 (DependencyNeverSatisfied)
$
PD is the short code for the PENDING job state, passed here to the -t option. See man squeue for a full list of squeue options and job state codes.
The NODELIST(REASON) column shows the reason that your job is not currently running.
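When many jobs are pending, a tally of the reasons can be easier to read than the full listing. The following is a minimal sketch rather than a site-provided tool: count_reasons is a hypothetical helper name, and it assumes your Slurm version supports the -h (no header) and -o "%r" (reason only) options of squeue, which you can confirm in man squeue.

```shell
# count_reasons: hypothetical helper, not part of Slurm or the site tools.
# Reads one reason per line on stdin and prints "<count> <reason>",
# most frequent reason first. Blank lines are ignored.
count_reasons() {
    awk 'NF { n[$0]++ }
         END { for (r in n) printf "%d %s\n", n[r], r }' |
        sort -k1,1nr -k2
}

# Typical use on a login node (requires Slurm, so it is commented out here):
#   squeue -t PD --me -h -o "%r" | count_reasons
```

Counting whole lines, rather than splitting on spaces, keeps reasons that contain spaces intact.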
You can use the Simple Slurm Tools utilities to find out more about why a job is not running. In particular, sproject can help explain current restrictions on your project account.
For example:
sproject PROJECT_NAME
The output above shows that project comet_imlibrw has two pending jobs, both requesting GPU resources. One is not running because it reports a node down or drained reason, and the other because gpu004 is unavailable.
It also shows that, in this case, the total number of GPU cards requested exceeds the project limits, so not all jobs will run even once the pending reasons are resolved.
For reference, the possible REASON status codes are listed below: