====== Resources and Partitions ======

All Slurm jobs //cost// resources. Some of the hardware resources we make available are //free//, whereas some are reserved for //paying// projects. It is important to understand that the resources you request in your Slurm job directly impact the costs incurred by your project. We use the term **Compute Hours** to track the resources used in your Slurm jobs.

**Our Methodology**

When we calculate how many resources a Slurm job has used, we use the following mechanism:

> ''Number of resources * Job time in hours = Hours of Compute Resources''

**CPU Based Slurm Jobs**

In the case of a Slurm job which only uses CPU resources, this becomes:

> ''Total number of CPU cores * Hours = Total Hours of CPU Compute Resource''

**GPU Based Slurm Jobs**

In the case of a Slurm job using GPU resources, the calculation is:

> ''Total number of GPU cards * Hours = Total Hours of GPU Compute Resource''

Note that, as above, memory, disk storage and application type are __not__ factors in the //cost// of your Slurm jobs.

We track hours of compute use for //all// jobs, not just those using //paid// compute resources. Users who are members of one or more HPC Projects can see a breakdown of their resource use by project under the [[https://hpc.researchcomputing.ncl.ac.uk/projects/|My HPC Projects]] section of this website.

  * [[#Resource Types|Resources]] - The resources which you can configure in your Slurm job files
  * [[#Partitions]] - Depending on which resources your job needs, you will need to choose a particular //partition// to send your job to

----

===== Resource Types =====

  * [[#CPU Cores]]
  * [[#GPU Cards]]
  * [[#Nodes]]
  * [[#RAM]]
  * [[#Runtime]]

==== CPU Cores ====

The number of CPU cores allocated to each //task// of a job is configured with the ''-c'' or ''--cpus-per-task'' parameter in your Slurm job file:

  #SBATCH --cpus-per-task=

Unless you request multiple nodes, this parameter indicates how many CPU cores must be free on a single server before your job can start. Our largest node type for the **Comet** HPC facility offers up to 256 cores on a single node. If you request more CPU cores than are available on a single server, without indicating that your job can run on multiple nodes, then your job may never start.

Note: Jobs with //multiple tasks// are discussed in [[advanced:slurm|Advanced Slurm Job Optimisation]].

The [[started:comet_resources|Comet Resources]] and [[started:rocket_resources|Rocket Resources]] pages list the maximum number of CPU cores which can be requested by partition type.

----

==== GPU Cards ====

  #SBATCH --gres=

GPU resources are requested by a named type and quantity, //not// just a quantity. For example, to request a //single// Nvidia V100, which would normally be available on the **power** partition on **Rocket**:

  #SBATCH --gres=gpu:Tesla-V100-SXM2-16GB:1

To request 4x Nvidia H100 cards on the **Comet** facility, as available in the **gpu-l_paid** partition:

  #SBATCH --gres=gpu:H100-96GB:4

Consult the [[faq:011|FAQ]] to understand GPU naming conventions, and which of our HPC node/partition types they are available on.

----

==== Nodes ====

The number of physical nodes that your job needs in order to run is specified with the ''-N'' or ''--nodes'' parameter:

  #SBATCH --nodes=

The [[started:comet_resources|Comet Resources]] and [[started:rocket_resources|Rocket Resources]] pages list the maximum number of nodes which can be requested by partition type.
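To see how these requests fit together, here is a minimal sketch of a job header combining the CPU core, GPU and node parameters described above, together with the Compute Hours it would accrue under the methodology at the top of this page. The job name, application binary, core count and 8 hour run length are illustrative assumptions; the partition and GPU type are taken from the **gpu-l_paid**/H100 example in the GPU Cards section.

  #!/bin/bash
  # A minimal sketch only - the job name and application binary are hypothetical,
  # and the core/GPU counts should be checked against the partition limits.
  #SBATCH --job-name=resource-demo
  #SBATCH --partition=gpu-l_paid          # paid GPU partition used in the H100 example above
  #SBATCH --nodes=1                       # -N / --nodes: number of physical servers
  #SBATCH --cpus-per-task=16              # -c / --cpus-per-task: CPU cores for the task
  #SBATCH --gres=gpu:H100-96GB:4          # 4x H100 cards, requested by type and quantity
  #
  # Compute Hours for this job, using the methodology above, if it runs for 8 hours:
  #   CPU: 16 cores * 8 hours = 128 hours of CPU compute resource
  #   GPU: 4 cards * 8 hours  =  32 hours of GPU compute resource
  ./my_application                        # hypothetical application binary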
----

==== RAM ====

[[:started:first_job|Until now]] we have looked at the simplest method of requesting memory - the total amount required on each node that we run on. This uses the ''--mem'' parameter:

  #SBATCH --mem=

The value expressed is an integer, defaulting to Megabytes. You can add a suffix of K, M, G or T to explicitly request Kilobytes, Megabytes, Gigabytes or Terabytes.

To calculate how much memory you need in total for the above method:

''number_of_nodes * memory_per_node = total memory''

== RAM per core ==

Depending on your application/code, you may need more control over the amount of RAM allocated. It is possible to allocate RAM based on the number of CPU cores allocated - this may be useful if each job that you run is allocated one or more CPU cores, and //each// of those needs a specific amount of RAM:

  #SBATCH --mem-per-cpu=

Calculating the total RAM allocation is as follows:

''total_number_of_cpu_cores * memory_per_cpu_core = total memory''

If you have requested more than one node, then the calculation becomes:

''(number_of_nodes * cpu_cores_per_node) * memory_per_cpu_core = total memory''

== RAM per GPU ==

As with CPU core based RAM allocation, it is also possible to allocate RAM based on the number of GPU cards your job requests. Again, this may be desirable if your application needs a specific amount of RAM for each part which is working on a particular GPU card:

  #SBATCH --mem-per-gpu=

Calculating the total memory allocation is therefore:

''total_number_of_gpu_cards * memory_per_gpu = total memory''

The [[started:comet_resources|Comet Resources]] and [[started:rocket_resources|Rocket Resources]] pages list the maximum amount of RAM per node and RAM per job which can be requested by partition type.

----

==== Runtime ====

Set the expected runtime of your job with the ''-t'' or ''--time'' parameter. The ''time'' syntax is any of:

  * ''days-hours''
    * e.g. 04-16 (4 days, 16 hours)
  * ''days-hh:mm:ss''
    * e.g. 5-01:15:00 (5 days, 1 hour, 15 minutes)
  * ''hh:mm:ss''
    * e.g. 19:30:59 (19 hours, 30 minutes, 59 seconds)

  #SBATCH --time=

If you do not specify a runtime for your job, it will be assigned the default value for the partition you submit to. Choose a partition which matches your required runtime - once a job has started, its runtime allocation cannot be altered, and a job which needs __more__ runtime than it was allocated //will// fail. If your required runtime is //greater// than any available in our standard partitions, see our [[faq:001|FAQ]] for how to proceed.

The [[started:comet_resources|Comet Resources]] and [[started:rocket_resources|Rocket Resources]] pages list the maximum runtime which is permitted by partition type.
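As a sketch of how the memory and runtime parameters combine, the job header below requests memory per CPU core and a two day runtime limit. The partition name, core count and memory figure are illustrative assumptions - check the resource pages for the limits that actually apply to you.

  #!/bin/bash
  # A minimal sketch only - the partition name, core count and memory figure are
  # assumptions; check the Comet/Rocket resource pages for the real limits.
  #SBATCH --partition=default_free        # hypothetical partition name
  #SBATCH --nodes=2                       # two physical servers
  #SBATCH --cpus-per-task=8               # 8 CPU cores per task (one task per node here)
  #SBATCH --mem-per-cpu=4G                # 4 Gigabytes of RAM per allocated CPU core
  #SBATCH --time=2-00:00:00               # days-hh:mm:ss - a 2 day runtime limit
  #
  # Total memory, using the calculation above:
  #   (2 nodes * 8 cores per node) * 4G per core = 64G in total
  ./my_application                        # hypothetical application binary

Note that ''--mem'', ''--mem-per-cpu'' and ''--mem-per-gpu'' are alternative ways of requesting memory, so use only one of them in a given job.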
----

===== Partitions =====

Our [[started:first_job|First]] and [[started:first_job|Second]] Job scripts did not specify a partition - instead the scheduler allocated them to the default. However, now that we have started to explore requesting specific resources, we need to consider carefully //which is the most appropriate partition// to send our jobs to.

The [[started:comet_resources|Comet Resources and Partitions]] and [[started:rocket_resources|Rocket Resources and Partitions]] pages detail which partitions are available in each facility. You must study the tables for each and decide which is the most appropriate for your job, based on the resources (CPU, GPU, RAM, runtime, nodes) that //your job// requires.

You define the partition your job is submitted to using the ''-p'' or ''--partition'' parameter:

  #SBATCH --partition=

For example, to submit your job to the [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/7|Large.b]] memory nodes on **Comet**, using the smaller number of //free// nodes:

  #SBATCH --partition=highmem_free

Another example, submitting the job to the [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/9|GPU-S]] GPU nodes on **Comet**, using the larger //paid// group of nodes:

  #SBATCH --partition=gpu-s_paid

It is important to note that access to **paid** partitions is only available to [[started:paying|HPC project teams and their members who have a positive credit balance remaining]].

----

[[:started:index|Back to Getting Started]]