====== Simple Slurm Tools ======
A set of additional tools for working with, reporting on, or interacting with HPC systems using standard tools such as Slurm and lmod.
Most of these tools are not going to be used every day, but they can make it easier to pull data out of Slurm, understand why a job is not running, or analyse system utilisation.
* For more information: https://github.com/megatron-uk/Simple-Slurm-Tools
No //modules// need to be loaded to run these tools; you only need ''python3'' and the ''module'' commands to be available.
----
===== shistory =====
The ''shistory'' tool prints some simple time-series metrics from the Slurm job database to visualise job submissions and scheduler utilisation.
Available options:
$ shistory --h
usage: sjobs [-h] [-csv] [-csv_user] [-keyfield KEYFIELD] [-keytype KEYTYPE] [-day] [-week] [-month] [-year] [-periods PERIODS] [-pc PC]
optional arguments:
-h, --help show this help message and exit
-csv Enable CSV summary output only [default disabled].
-csv_user Enable CSV user stats output only [default disabled].
-keyfield KEYFIELD One of [runtime, waittime, cores, nodes, ramcore, cpuhours], [default is runtime].
-keytype KEYTYPE Data type to use for the keyfield, one of [min, max, mean, total], [default is total].
-day Reports are in periods of one day.
-week Reports are in periods of one week [default].
-month Reports are in periods of one month.
-year Reports are in periods of one year.
-periods PERIODS Total number of reporting periods to produce history for [default is 1].
-pc PC Percentile figure for reports [defaults is 75].
The default is to display metrics for a //single week//:
$ shistory
Report period : week
Report count : 1
Percentile : 75%
Please wait, starting retrieval of job data...
Please wait, analysing job data...
Period (type) Jobs - CPUHours - RunTime - WaitTime - Cores - Nodes - RAM/Core -
Total Total Min/Max/Mean/75% Min/Max/Mean/75% Min/Max/Mean/75% Min/Max/Mean/75% Min/Max/Mean/75%
============= ======= ========== ======================= ======================= =================== ================ ===============================
2026-05-11 week 34754 161367.9 0/ 9045/ 29/ 11 0/ 1440/ 945/ 1211 1/ 256/ 3/ 1 1/ 7/ 1/ 1 128/ 409600/ 2505/ 1024
You can generate reports over multiple periods by adding the ''-periods'' parameter. The example below covers //6 days//:
$ shistory -periods 6 -day
Report period : day
Report count : 6
Percentile : 75%
Please wait, starting retrieval of job data...
Please wait, analysing job data...
Period (type) Jobs - CPUHours - RunTime - WaitTime - Cores - Nodes - RAM/Core -
Total Total Min/Max/Mean/75% Min/Max/Mean/75% Min/Max/Mean/75% Min/Max/Mean/75% Min/Max/Mean/75%
============= ======= ========== ======================= ======================= =================== ================ ===============================
2026-05-12 day 3250 36030.3 0/ 5819/ 66/ 39 9/ 1433/ 976/ 1286 1/ 256/ 6/ 10 1/ 1/ 1/ 1 128/ 409600/ 4653/ 4096
2026-05-13 day 1987 35124.6 0/ 9045/ 120/ 272 0/ 1436/ 834/ 1307 1/ 256/ 9/ 10 1/ 1/ 1/ 1 128/ 409600/ 4119/ 5120
2026-05-14 day 1446 20298.9 0/ 4134/ 95/ 26 8/ 1439/ 760/ 1258 1/ 256/ 5/ 1 1/ 4/ 1/ 1 128/ 44800/ 4024/ 4096
2026-05-15 day 10036 24015.1 0/ 2887/ 16/ 11 0/ 1440/ 1007/ 1247 1/ 256/ 2/ 1 1/ 1/ 1/ 1 128/ 92160/ 1807/ 1024
2026-05-16 day 3124 17523.4 0/ 4236/ 28/ 17 19/ 1333/ 601/ 682 1/ 256/ 2/ 1 1/ 7/ 1/ 1 128/ 44800/ 1497/ 1024
2026-05-17 day 290 11706.9 0/ 2618/ 63/ 0 65/ 1264/ 780/ 788 1/ 200/ 7/ 8 1/ 1/ 1/ 1 1024/ 44800/ 6394/ 8192
Data can be output in CSV format by appending the ''-csv'' option. Example:
$ shistory -periods 6 -day -csv
date,period,jobs (total),cpu hours (total),job runtime (total),job runtime (min),job runtime (max),job runtime (mean),job runtime (75%),job waittime (total),job waittime (min),job waittime (max),job waittime (mean),job waittime (75%),job cores (min),job cores (max),job cores (mean),job cores (75%),job nodes (min),job nodes (max),job nodes (mean),job nodes (75%),ram per core (min),ram per core (max),ram per core (mean),ram per core (75%)
2026-05-12,day,3250,36030.30361111109,214961.3333333321,0.0,5818.75,66.14194871794834,39.13333333333333,3175350.7000000007,10.133333333333333,1434.4833333333333,977.0309846153848,1287.1833333333334,1,256,6.3218461538461534,10,1,1,1.0,1,128.0,409600.0,4652.953025641026,4096.0
2026-05-13,day,1987,35124.617500000124,238749.00000000128,0.0,9044.783333333333,120.15551082033281,271.93333333333334,1659104.0499999863,1.5666666666666667,1436.9833333333333,834.9793910417646,1308.0166666666667,1,256,8.668847508807247,10,1,1,1.0,1,128.0,409600.0,4119.00035686508,5120.0
2026-05-14,day,1446,20298.91138888913,137364.19999999896,0.0,4133.516666666666,94.99598893499237,25.9,1101173.2000000088,8.9,1439.9333333333334,761.5305670816105,1259.3333333333333,1,256,5.166666666666667,1,1,4,1.0020746887966805,1,128.0,44800.0,4023.921991701245,4096.0
2026-05-15,day,10036,24015.085277777747,158294.81666666773,0.0,2887.1833333333334,15.772699946858083,11.1,10103467.599999985,0.25,1439.9333333333334,1006.7225587883604,1247.8666666666666,1,256,1.9364288561179752,1,1,1,1.0,1,128.0,92160.0,1807.4715025906735,1024.0
2026-05-16,day,3124,17523.400000000085,88880.23333333309,0.0,4236.466666666666,28.450778915919685,16.9,1880613.6000000057,19.966666666666665,1334.6833333333334,601.9889884763143,683.15,1,256,1.6840588988476313,1,1,7,1.0057618437900129,1,128.0,44800.0,1497.2560819462228,1024.0
2026-05-17,day,290,11706.935833333335,18410.566666666527,0.016666666666666666,2618.4166666666665,63.48471264367768,0.13333333333333333,226532.0333333323,66.38333333333334,1265.1666666666667,781.1449425287321,788.9166666666666,1,200,7.172413793103448,8,1,1,1.0,1,1024.0,44800.0,6393.820689655173,8192.0
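The CSV output is convenient for further processing. As a minimal sketch, the Python below reads two rows from the example above (with the columns trimmed for brevity) and derives the CPU hours per job for each period. The function name ''cpu_hours_per_job'' and the derived metric are illustrative assumptions and are not part of ''shistory'' itself.

```python
import csv
import io

# Two rows taken from the example output above, with the columns trimmed
# for brevity. In practice the text would be the captured output of
# `shistory -periods 6 -day -csv`.
SAMPLE = """\
date,period,jobs (total),cpu hours (total),job runtime (total),job runtime (mean)
2026-05-12,day,3250,36030.30361111109,214961.3333333321,66.14194871794834
2026-05-17,day,290,11706.935833333335,18410.566666666527,63.48471264367768
"""


def cpu_hours_per_job(csv_text):
    """Return (date, jobs, CPU hours per job) for each reporting period."""
    results = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        jobs = int(row["jobs (total)"])
        cpu_hours = float(row["cpu hours (total)"])
        # Guard against an empty period to avoid dividing by zero.
        per_job = cpu_hours / jobs if jobs else 0.0
        results.append((row["date"], jobs, per_job))
    return results


for date, jobs, per_job in cpu_hours_per_job(SAMPLE):
    print(f"{date}: {jobs} jobs, {per_job:.2f} CPU hours per job")
```

Reading the columns by header name, rather than by position, keeps the sketch working if only a subset of the columns is kept.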
----
===== sjobs =====
The ''sjobs'' tool gives a high-level overview of the utilisation of a given Slurm //partition// right __now__. It prints a summary for both //running// and //pending// jobs in the partition, including:
* Total running users, jobs, cores in use, memory allocated and current total runtime
* Largest number of cores per job, ram per job, ram per core and longest job runtime
* Average number of cores per job, ram per job, ram per core and average job runtime
To run the command, use the form: ''sjobs PARTITION_NAME''. For example, for ''default_free'':
$ sjobs default_free
-= Running =- -= Pending =-
============= =============
Total users : 12 Total users waiting : 9
Total running jobs : 111 Total waiting jobs : 109
Total allocated cores : 1966 Total requested cores : 1110
Total allocated memory : 16735 GB Total requested memory : 24975 GB
Total runtime : 12372 min Total waiting time : 2714 min
-
Largest job (cores) : 128 Largest waiting job (cores) : 256
Largest job (memory/job) : 1400 GB Largest waiting job (memory/job) : 1400 GB
Largest job (memory/core): 43 GB Largest waiting job (memory/core): 292 GB
Longest job runtime : 1728 min Longest waiting time : 170 min
-
Average job (cores) : 17 Average waiting job (cores) : 10
Average job (memory/job) : 150 GB Average waiting job (memory/job) : 229 GB
Average job (memory/core): 7 GB Average waiting job (memory/core): 10 GB
Average runtime : 111 min Average waiting time : 24 min
$
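In the example above, the per-job averages for cores, memory and time are consistent with each total divided by the number of jobs, with the fractional part dropped. The short Python check below is only an observation drawn from this one sample, not a statement of how ''sjobs'' calculates its figures; the helper name ''truncated_average'' is hypothetical.

```python
# Figures copied from the example output above:
# label -> (total, number of jobs, average printed by sjobs)
EXAMPLE = {
    "running cores": (1966, 111, 17),
    "running memory (GB)": (16735, 111, 150),
    "running runtime (min)": (12372, 111, 111),
    "pending cores": (1110, 109, 10),
    "pending memory (GB)": (24975, 109, 229),
    "pending wait (min)": (2714, 109, 24),
}


def truncated_average(total, count):
    """Integer average with the fractional part dropped (assumed behaviour)."""
    return total // count if count else 0


for label, (total, count, shown) in EXAMPLE.items():
    computed = truncated_average(total, count)
    status = "matches" if computed == shown else "differs"
    print(f"{label}: computed {computed}, shown {shown} ({status})")
```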
----
===== sproject =====
The ''sproject'' tool shows a summary of Slurm data for a single Slurm account code (remember, this is the //same// as your HPC project name; either **comet_abc123** or **rocket_abc123**). It is used to show:
* The resource limits applied to jobs submitted by this account code
* Which partitions the account code is allowed to submit to
* A summary of current resource demands by all jobs submitted by this account code, and...
* ... all relevant reasons //why// jobs submitted from this account code have not started
To run the tool, use the form: ''sproject PROJECT_NAME''. For example, for the project **comet_mopm**:
{{:started:sproject_example.png?1400|}}
The example above shows:
* The account code can access all //free// Slurm partitions, but none of the //paid// partitions
* The current Slurm resource limits in place on the account (both per-user and across the group)
* That more GPU cards have been requested than are allowed to be used simultaneously (the limit is currently **1**, but a total of **6** have been requested)
* That other jobs are already running, or ready to run, ahead of those from this project (reason = **[Priority]**)
This tool is intended to give members of a project a simple interface to understand the restrictions on their account, as well as a quick way of viewing the reasons why their jobs may not have started yet. The tool will attempt to identify //all// relevant reasons why a job may not be running yet; in some cases there may be //more than one reason//.
----
===== Non-Slurm (but Slurm related) tools =====
The Simple Slurm Tools software also includes some additional tools which, although not directly related to Slurm, are still useful on HPC systems.
----
==== modulespy ====
The ''modulespy'' tool interrogates a Linux software //module// and recurses through //all// of the dependencies it lists which are needed in order to use it.
For example, to find **all** of the modules needed in order to load ''Python/3.12.3'':
$ modulespy Python/3.12.3
Searching for all dependencies of: Python/3.12.3
Python/3.12.3 -> GCCcore/13.3.0
Python/3.12.3 -> binutils/2.42-GCCcore-13.3.0
Python/3.12.3 -> bzip2/1.0.8-GCCcore-13.3.0
Python/3.12.3 -> zlib/1.3.1-GCCcore-13.3.0
Python/3.12.3 -> libreadline/8.2-GCCcore-13.3.0
Python/3.12.3 -> ncurses/6.5-GCCcore-13.3.0
Python/3.12.3 -> SQLite/3.45.3-GCCcore-13.3.0
Python/3.12.3 -> XZ/5.4.5-GCCcore-13.3.0
Python/3.12.3 -> libffi/3.4.5-GCCcore-13.3.0
Python/3.12.3 -> OpenSSL/3
binutils/2.42-GCCcore-13.3.0 -> GCCcore/13.3.0
binutils/2.42-GCCcore-13.3.0 -> zlib/1.3.1-GCCcore-13.3.0
zlib/1.3.1-GCCcore-13.3.0 -> GCCcore/13.3.0
bzip2/1.0.8-GCCcore-13.3.0 -> GCCcore/13.3.0
libreadline/8.2-GCCcore-13.3.0 -> GCCcore/13.3.0
libreadline/8.2-GCCcore-13.3.0 -> ncurses/6.5-GCCcore-13.3.0
ncurses/6.5-GCCcore-13.3.0 -> GCCcore/13.3.0
SQLite/3.45.3-GCCcore-13.3.0 -> GCCcore/13.3.0
SQLite/3.45.3-GCCcore-13.3.0 -> libreadline/8.2-GCCcore-13.3.0
SQLite/3.45.3-GCCcore-13.3.0 -> Tcl/8.6.14-GCCcore-13.3.0
Tcl/8.6.14-GCCcore-13.3.0 -> GCCcore/13.3.0
Tcl/8.6.14-GCCcore-13.3.0 -> zlib/1.3.1-GCCcore-13.3.0
XZ/5.4.5-GCCcore-13.3.0 -> GCCcore/13.3.0
libffi/3.4.5-GCCcore-13.3.0 -> GCCcore/13.3.0
$
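Each ''parent -> child'' line in the output is one edge of a dependency graph, so the same module can appear many times. As a minimal sketch, assuming the output has been captured as text, the Python below collapses those edges into the unique set of modules needed by a given module. The function names are illustrative and are not part of ''modulespy''.

```python
from collections import defaultdict

# A few lines copied from the example output above.
SAMPLE = """\
Searching for all dependencies of: Python/3.12.3
Python/3.12.3 -> GCCcore/13.3.0
Python/3.12.3 -> SQLite/3.45.3-GCCcore-13.3.0
SQLite/3.45.3-GCCcore-13.3.0 -> GCCcore/13.3.0
SQLite/3.45.3-GCCcore-13.3.0 -> Tcl/8.6.14-GCCcore-13.3.0
Tcl/8.6.14-GCCcore-13.3.0 -> zlib/1.3.1-GCCcore-13.3.0
"""


def parse_edges(text):
    """Build a mapping of module -> set of direct dependencies."""
    graph = defaultdict(set)
    for line in text.splitlines():
        if " -> " not in line:
            continue  # skip the banner line and any blank lines
        parent, child = (part.strip() for part in line.split(" -> ", 1))
        graph[parent].add(child)
    return graph


def all_dependencies(graph, root):
    """Walk the graph from root and return every module reachable from it."""
    seen = set()
    stack = [root]
    while stack:
        module = stack.pop()
        for dep in graph.get(module, ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


for name in sorted(all_dependencies(parse_edges(SAMPLE), "Python/3.12.3")):
    print(name)
```

Tracking the modules already seen means that a shared dependency such as ''GCCcore/13.3.0'' is only listed once.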
----
==== modulesearch ====
The ''modulesearch'' tool finds any modules which have the stated module as a dependency.
For example, to find all modules which have ''Python'' as a dependency:
$ modulesearch -m Python
Searching for modules which have the dependency [Python]
Searching using PARTIAL matches
Search results will be held in [modulesearch.out]
Searching through [2214] modules, please wait:
Searched all packages
Found [82] results
Please 'cat modulesearch.out' to see found packages
$
Unlike the other tools, the results from ''modulesearch'' are not output to the terminal. Instead, view the contents of ''modulesearch.out'' (created in the current directory) to find the matching modules:
$ cat modulesearch.out
BeautifulSoup/
BeautifulSoup/4.12.3-GCCcore-13.3.0
Biopython/
Biopython/1.84-foss-2024a
Biopython/1.85-foss-2024a
cffi/
cffi/1.16.0-GCCcore-13.3.0
cryptography/
cryptography/42.0.8-GCCcore-13.3.0
Cython/
Cython/3.0.10-GCCcore-13.3.0
flit/
flit/3.9.0-GCCcore-13.3.0
Flye/
Flye/2.9.5-GCC-13.3.0
fonttools/
fonttools/4.53.1-GCCcore-13.3.0
GitPython/
GitPython/3.1.43-GCCcore-13.3.0
hatch-jupyter-builder/
hatch-jupyter-builder/0.9.1-GCCcore-13.3.0
hatchling/
hatchling/1.24.2-GCCcore-13.3.0
hypothesis/
hypothesis/6.103.1-GCCcore-13.3.0
IPython/
IPython/8.28.0-GCCcore-13.3.0
jedi/
jedi/0.19.1-GCCcore-13.3.0
JupyterLab/
JupyterLab/4.2.5-GCCcore-13.3.0
JupyterLab/4.4.3-GCCcore-13.3.0
jupyter-server/
jupyter-server/2.14.2-GCCcore-13.3.0
lit/
lit/18.1.8-GCCcore-13.3.0
lxml/
lxml/5.3.0-GCCcore-13.3.0
Mako/
Mako/1.3.5-GCCcore-13.3.0
maturin/
maturin/1.6.0-GCCcore-13.3.0
Meson/1.4.0-GCCcore-13.3.0
meson-python/
meson-python/0.16.0-GCCcore-13.3.0
PGAP/
PGAP/2025-05-06
poetry/
poetry/1.8.3-GCCcore-13.3.0
PuLP/
PuLP/2.8.0-foss-2024a
pybind11/
pybind11/2.12.0-GCC-13.3.0
Pysam/
Pysam/0.22.1-GCC-13.3.0
Python-bundle-PyPI/
Python-bundle-PyPI/2024.06-GCCcore-13.3.0
PyYAML/
PyYAML/6.0.2-GCCcore-13.3.0
PyZMQ/
PyZMQ/26.2.0-GCCcore-13.3.0
scikit-build/
scikit-build/0.17.6-GCCcore-13.3.0
scikit-build-core/
scikit-build-core/0.10.6-GCCcore-13.3.0
SciPy-bundle/
SciPy-bundle/2024.05-gfbf-2024a
setuptools-rust/
setuptools-rust/1.9.0-GCCcore-13.3.0
snakemake/
snakemake/8.27.0-foss-2024a
SPAdes/4.1.0-GCC-13.3.0
tornado/
tornado/6.4.1-GCCcore-13.3.0
Unicycler/
Unicycler/0.5.1-gompi-2024a
virtualenv/
virtualenv/20.26.2-GCCcore-13.3.0
wrapt/
wrapt/1.16.0-gfbf-2024a
Z3/
Z3/4.13.0-GCCcore-13.3.0
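In the listing above, entries ending in ''/'' appear to be module name headings, while the remaining entries are full ''name/version'' strings that can be passed to ''module load''. As a minimal sketch built on that assumption, the Python below keeps only the versioned entries and groups them by module name. The function name ''group_versions'' is hypothetical.

```python
from collections import defaultdict

# A few lines copied from the example modulesearch.out above.
SAMPLE_LINES = [
    "Biopython/",
    "Biopython/1.84-foss-2024a",
    "Biopython/1.85-foss-2024a",
    "Meson/1.4.0-GCCcore-13.3.0",
    "PGAP/",
    "PGAP/2025-05-06",
]


def group_versions(lines):
    """Map each module name to the list of versions found for it."""
    grouped = defaultdict(list)
    for line in lines:
        entry = line.strip()
        # Skip blank lines and bare "Name/" headings (assumed to carry no version).
        if not entry or entry.endswith("/"):
            continue
        name, _, version = entry.partition("/")
        grouped[name].append(version)
    return dict(grouped)


for name, versions in group_versions(SAMPLE_LINES).items():
    print(f"{name}: {', '.join(versions)}")
```

To run the same function against a real results file, pass it the open file instead of the sample list, for example ''group_versions(open("modulesearch.out"))''.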
To find //exact// matches, i.e. including a version suffix, such as ''Python/3.12.3'', use the ''-e'' //exact// option:
$ modulesearch -e -m Python/3.12.3
Searching for modules which have the dependency [Python/3.12.3]
Searching using EXACT matches
Search results will be held in [modulesearch.out]
Searching through [2214] modules, please wait:
Searched all packages
Found [2] results
Please 'cat modulesearch.out' to see found packages
... and the results:
$ cat modulesearch.out
PGAP/
PGAP/2025-05-06
----
[[:started:index|Back to Getting Started]]