All Slurm jobs consume resources. Some of the hardware resources we make available are free, while others are reserved for paying projects. It is important to understand that the resources you request in your Slurm job directly affect the costs incurred by your project.
We use the term Compute Hours to track the resources used in your Slurm jobs.
Our Methodology
When we calculate how many resources a Slurm job has used, we apply the following formula:
Number of resources * Job time in hours = Hours of Compute Resources
CPU Based Slurm Jobs
In the case of a Slurm job which only uses CPU resources, this becomes:
Total number of CPU cores * Hours = Total Hours of CPU Compute Resource
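For example (the figures here are purely illustrative), a job that requests 16 CPU cores and runs for 2 hours consumes:
16 cores * 2 hours = 32 Hours of CPU Compute Resource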
GPU Based Slurm Jobs
In the case of a Slurm job using GPU resources, the calculation is:
Total number of GPU cards * Hours = Total Hours of GPU Compute Resource
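Similarly, as another illustrative example, a job that requests 4 GPU cards and runs for 12 hours consumes:
4 GPU cards * 12 hours = 48 Hours of GPU Compute Resource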
Note that, as above, memory, disk storage and application type are not factors in the cost of your Slurm jobs.
We track hours of compute use for all jobs, not just those using paid compute resources. Those users who are a member of one or more HPC Projects can see a breakdown of their resource use by project under the My HPC Projects section of this website.
The number of CPU cores allocated to each task of a job is configured with the -c or --cpus-per-task parameter in your Slurm job file:
-c
--cpus-per-task
#SBATCH --cpus-per-task=
Unless you request multiple nodes, this parameter indicates how many CPU cores must be free on a single server for your job to start. Our largest node type in the Comet HPC facility offers up to 256 cores on a single node.
If you request more CPU cores than are available on a single server, without indicating that your job can run across multiple nodes, your job may never start.
Note: Jobs with multiple tasks are discussed in Advanced Slurm Job Optimisation.
The Comet Resources and Rocket Resources pages list the maximum number of CPU cores which can be requested by partition type.
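As a minimal illustrative sketch (the core count is an arbitrary example, not a recommendation), a single-node job needing 8 cores would include:
#SBATCH --cpus-per-task=8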
#SBATCH --gres=
GPU resources are requested by a named type and quantity, and not just a quantity. For example, to request a single Nvidia V100 which would normally be available on the power partition on Rocket:
#SBATCH --gres=gpu:Tesla-V100-SXM2-16GB:1
To request 4x Nvidia H100 cards on the Comet facility, as available in the gpu-l_paid partition:
#SBATCH --gres=gpu:H100-96GB:4
Consult the FAQ to understand GPU naming conventions, and which of our HPC node/partition types they are available on.
The number of physical nodes that your job needs in order to run is specified with the -N or --nodes parameter:
-N
--nodes
#SBATCH --nodes=
The Comet Resources and Rocket Resources pages list the maximum number of nodes which can be requested by partition type.
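For example (an illustrative request only), to ask for two nodes:
#SBATCH --nodes=2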
Until now we have looked at the simplest method of requesting memory - the total amount required per node that we run on. This uses the --mem parameter:
--mem
#SBATCH --mem=
The value expressed is an integer, defaulting to Megabytes. You can add a suffix of K, M, G or T to explicitly request Kilobytes, Megabytes, Gigabytes or Terabytes.
To calculate how much memory you need in total for the above method:
number_of_nodes * memory_per_node = total memory
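For example (the figures are illustrative), requesting 64 Gigabytes on each of 2 nodes:
#SBATCH --nodes=2
#SBATCH --mem=64G
2 nodes * 64G per node = 128G total memory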
Depending on your application/code, you may need more control over the amount of RAM allocated. It is possible to allocate RAM based on the number of CPU cores allocated - this may be useful if each task in your job is allocated one or more CPU cores, and each of those cores needs a specific amount of RAM:
#SBATCH --mem-per-cpu=
Calculating total RAM allocation is as follows:
total_number_of_cpu_cores * memory_per_cpu_core = total memory
If you have requested more than one node, then the calculation becomes:
(number_of_nodes * cpu_cores_per_node) * memory_per_cpu_core = total memory
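As an illustrative sketch (figures are arbitrary examples), a single-node job requesting 8 CPU cores with 4 Gigabytes per core:
#SBATCH --cpus-per-task=8
#SBATCH --mem-per-cpu=4G
8 cores * 4G per core = 32G total memory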
As with CPU core based RAM allocation, it is also possible to allocate RAM based on the number of GPU cards your job requests. Again, this may be desirable if your application needs a specific amount of RAM for each part of the work running on a particular GPU card:
#SBATCH --mem-per-gpu=
Calculating total memory allocation is therefore:
total_number_of_gpu_cards * memory_per_gpu = total memory
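For example (the memory figure is illustrative; the GPU type is the H100 shown above), requesting 2 GPU cards with 64 Gigabytes per card:
#SBATCH --gres=gpu:H100-96GB:2
#SBATCH --mem-per-gpu=64G
2 GPU cards * 64G per card = 128G total memory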
The Comet Resources and Rocket Resources pages list the maximum amount of RAM per node and RAM per job which can be requested by partition type.
Set the expected runtime of your job with the -t or --time parameter:
-t
--time
#SBATCH --time=
The time syntax is any of:
minutes
hh:mm:ss
days-hours
days-hh:mm:ss
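For example, to request a runtime of 1 day and 12 hours using the days-hh:mm:ss syntax:
#SBATCH --time=1-12:00:00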
If you do not specify a runtime for your job, it will be assigned the default value for the partition you submit to.
Choose a partition which matches your required runtime - once a job has started its time limit cannot be altered, and a job which needs more runtime than it was allocated will fail.
If your required runtime is greater than any available in our standard partitions, see our FAQ for how to proceed.
The Comet Resources and Rocket Resources pages list the maximum runtime which is permitted by partition type.
Our First and Second Job scripts did not specify a partition - instead the scheduler allocated them to the default partition.
However, now that we have started to explore requesting specific resources, we need to consider carefully which is the most appropriate partition to send our jobs to.
The Comet Resources and Partitions and Rocket Resources and Partitions pages detail which partitions are available in each facility. You must study the tables for each and decide which is the most appropriate for your job, based on the resources (CPU, GPU, RAM, runtime, nodes) that your job requires.
You define the partition your job is submitted to using the -p or --partition parameter:
-p
--partition
#SBATCH --partition=
For example, to submit your job to the Large.b memory nodes on Comet, using the smaller group of free nodes:
#SBATCH --partition=highmem_free
Another example submits the job to the GPU-S GPU nodes on Comet, using the larger group of paid nodes:
#SBATCH --partition=gpu-s_paid
It is important to note that access to paid partitions is only available to HPC project teams and their members who have a positive credit balance remaining.
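Bringing these parameters together, the following is a minimal illustrative job script sketch (the resource figures are arbitrary examples, the partition and GPU names are those shown above, and the application command is a hypothetical placeholder) - adjust every value to suit your own workload and the partition tables:
#!/bin/bash
#SBATCH --partition=gpu-l_paid
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
#SBATCH --gres=gpu:H100-96GB:1
#SBATCH --mem-per-gpu=64G
#SBATCH --time=0-12:00:00

# Replace with the commands that run your own application (hypothetical placeholder)
./my_application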