====== HPC Resources & Partitions - Comet ======

  * [[https://hpc.researchcomputing.ncl.ac.uk/calc/resources/|Our HPC Resources List]] - Full technical details of all resource types on the Comet HPC
  * [[#Partitions for Unfunded Projects|Free Partitions]]
  * [[#Partitions for Funded Projects|Paid Partitions]]
  * [[#Partition Descriptions|Description of Partitions on Comet]]
    * [[#short]]
    * [[#default]]
    * [[#long]]
    * [[#highmem]]
    * [[#gpu-s]]
    * [[#gpu-l]]
    * [[#interactive]]
    * [[#interactive-gpu]]
    * [[#low-latency]]

===== Partitions for Unfunded Projects =====

These partitions and resources are available to __all__ Comet users. Beyond our [[:policies:acceptable_use|Acceptable Use Policy]] there are no restrictions on the use of these resources.

^ Partition ^ Node Types ^ GPU ^ Max Resources ^ Default Runtime ^ Maximum Runtime ^ Default Memory ^
| short_free | [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/6|Standard.b]] | No | | 10 minutes | 30 minutes | 1GB per core |
| default_free | [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/6|Standard.b]] | No | | 24 hours | 48 hours | 1GB per core |
| long_free | [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/6|Standard.b]] | No | | 4 days | 14 days | 1GB per core |
| highmem_free | [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/7|Large.b]] | No | | 24 hours | 5 days | 4GB per core |
| gpu-s_free | [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/9|GPU-S]] | Yes | | 24 hours | 14 days | 2GB per core |
| interactive_free | [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/6|Standard.b]] | No | | 2 hours | 8 hours | 1GB per core |
| interactive-gpu_free | [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/9|GPU-S]] | Yes | | 2 hours | 8 hours | 2GB per core |
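As a quick orientation, the sketch below shows a minimal Slurm batch script targeting the ''short_free'' partition from the table above. The partition name and runtime limit are taken from the table; the job name and resource values are illustrative assumptions only - adjust them for your own workload.

<code bash>
#!/bin/bash
# Minimal test job for the free tier (all values are illustrative).
#SBATCH --job-name=quick-test        # hypothetical job name
#SBATCH --partition=short_free       # free partition from the table above
#SBATCH --time=00:10:00              # wall time; short_free allows up to 30 minutes
#SBATCH --ntasks=1                   # a single task
#SBATCH --cpus-per-task=1            # one CPU core
#SBATCH --mem=1G                     # matches the 1GB-per-core default

# Replace with your own test command
srun hostname
</code>

Submit the script (saved, for example, as ''test.sh'') with ''sbatch test.sh'' and monitor it with ''squeue -u $USER''.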
----

===== Partitions for Funded Projects =====

These partitions are available to all projects that have __allocated funds__ to their Comet HPC Project accounts. If you have not allocated funds to your HPC Project, or your balance is negative, then you will not be able to submit jobs to these partitions.

For further details on paid resource types, see our [[:policies:billing|Billing & Project Funds]] policy page.

^ Partition ^ Node Types ^ GPU ^ Max Resources ^ Default Runtime ^ Maximum Runtime ^ Default Memory ^
| short_paid | [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/6|Standard.b]] | No | | 10 minutes | 30 minutes | 1GB per core |
| default_paid | [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/6|Standard.b]] | No | | 24 hours | 48 hours | 1GB per core |
| long_paid | [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/6|Standard.b]] | No | | 4 days | 14 days | 1GB per core |
| highmem_paid | [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/7|Large.b]] | No | | 24 hours | 5 days | 4GB per core |
| gpu-s_paid | [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/9|GPU-S]] | Yes | | 24 hours | 14 days | 2GB per core |
| gpu-l_paid | [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/10|GPU-L]] | Yes | | 24 hours | 14 days | 2GB per core |
| interactive_paid | [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/6|Standard.b]] | No | | 2 hours | 8 hours | 1GB per core |
| interactive-gpu_paid | [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/9|GPU-S]] | Yes | | 2 hours | 8 hours | 2GB per core |
| low-latency_paid | [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/8|Standard.Lowlatency]] | No | 1024 cores | 24 hours | 4 days | 1GB per core |

----

===== Partition Descriptions =====

==== short ====

The **short** partition is intended for quick tests, proof-of-concept runs, debugging and other tasks which can be completed quickly. It is not intended to run entire compute jobs.

== Examples ==

  * Test code runs on a particular node type

==== default ====

The **default** partition has the largest number of general CPU resources in the Comet HPC facility and is intended to run the bulk of our compute workloads outside of multi-node MPI / low-latency and GPU requirements.

Default runtime is set to 24 hours and default memory allocation is set to 1GB per allocated CPU core. There is no defined maximum memory allocation - this is limited by the size of the [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/6|Standard.b]] nodes it is built on. It is your responsibility to determine the most appropriate runtime (in the range of 0 - 48 hours), required number of CPU cores and memory allocation for your specific application (see the sketch after the examples below).

== Examples ==

  * Most batch compute jobs
  * Non-low-latency compute
  * Non-interactive compute
  * Non-GPU compute
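To make those choices explicit, here is a sketch of a fuller batch script for the **default** partition. The partition names come from the tables above; the runtime, core and memory figures, and the program name, are illustrative assumptions rather than recommendations.

<code bash>
#!/bin/bash
# Example batch job for the default partition (illustrative values only).
#SBATCH --job-name=my-analysis          # hypothetical job name
#SBATCH --partition=default_free        # or default_paid for funded projects
#SBATCH --time=36:00:00                 # explicit runtime, within the 48-hour limit
#SBATCH --ntasks=1                      # one task...
#SBATCH --cpus-per-task=16              # ...using 16 CPU cores
#SBATCH --mem=32G                       # total memory for the job (the default would be 1GB x 16 cores)

# Replace with your own application and arguments
srun ./my_program --input data.in
</code>

Requesting only the runtime, cores and memory you actually need helps the scheduler start your job sooner, and on the paid partitions it also keeps your costs down.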
==== long ====

The **long** partition has the same hardware resources as the **default** partition, since it is based on the same number and type of nodes ([[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/6|Standard.b]]); however, runtimes are longer: jobs default to 4 days and may run for up to a maximum of 14 days.

== Examples ==

  * Jobs which require the same type of resources as available in **default**, but which need to run longer than 48 hours.

==== highmem ====

The **highmem** partition allows jobs which need a larger amount of memory to be run. Note that unlike the [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/1/2|Standard.a]] compute nodes of Rocket (128GB), the Comet [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/6|Standard.b]] compute nodes are //substantially// larger (1.1TB), so you may not need to use the [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/7|Large.b]] compute nodes (1.5TB) for many large jobs.

Note that the [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/7|Large.b]] compute nodes are //also// connected by a faster network; if you need to run large processes across multiple nodes simultaneously via MPI (and they do not fit on the **low-latency** node types), then the use of **highmem** may be an option for you.

Jobs submitted to the **highmem** partition may run for longer (up to 5 days) than on the **default** partition (2 days), though not as long as on the **long** partition (14 days).

Consider this partition if your workload needs more than 1TB of memory on a single node; otherwise the **default** or **long** partitions may be more suitable for you.

== Examples ==

  * Jobs needing more than 1TB of RAM on a single node
  * Large jobs needing to communicate via low-latency networking

==== gpu-s ====

The **gpu-s** partition uses the [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/9|GPU-S]] node type on Comet. These nodes are suitable for most types of GPU-accelerated compute, though please check whether any of your CUDA/OpenCL code paths require double-precision (FP64) capability; consult the {{ :started:datasheets:nvidia_l40s_datasheet.pdf|Nvidia L40S datasheet}}, as these cards offer only limited performance in that mode.

The nodes hosting the L40S cards are //also// connected via faster networking, just like **highmem** and **low-latency**, so you can take advantage of faster inter-node communication if your job spans more than one node, as well as faster IO speeds to/from the main NOBACKUP storage.

Any jobs run on the paid **gpu-s** partition are costed by the number of GPU cards you request: if your job requests //two// cards, then it will cost //twice// as much as a single card over the same amount of time. If you use our paid partitions, be careful when requesting resources via Slurm that you request //and use// only what you actually need. Users of our unpaid **gpu-s** partition do not incur any costs for the use of GPU cards, but the number available is __strictly limited__.

Default runtime is 24 hours, but you may request up to a maximum of 14 days. The use of GPU resources is closely monitored. A sample GPU job request is sketched after the examples below.

== Examples ==

  * Most CUDA/OpenCL compute workloads
  * FP8, FP16 and FP32 code paths
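Here is a sketch of a single-GPU job on this partition. The partition name comes from the table above; the GPU request uses the standard Slurm ''--gres'' syntax (some sites also accept ''--gpus=1''), and the remaining values are illustrative assumptions.

<code bash>
#!/bin/bash
# Example single-GPU job on gpu-s (illustrative values only).
#SBATCH --job-name=gpu-test             # hypothetical job name
#SBATCH --partition=gpu-s_free          # or gpu-s_paid for funded projects
#SBATCH --gres=gpu:1                    # request one GPU card; on paid partitions two cards cost twice as much
#SBATCH --time=12:00:00                 # within the 14-day maximum
#SBATCH --cpus-per-task=8               # CPU cores to support the GPU work
#SBATCH --mem=16G                       # total memory (the default would be 2GB per core)

# Replace with your own GPU application; a CUDA module load may also be required
srun ./my_gpu_program
</code>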
==== gpu-l ====

The **gpu-l** partition uses a single node, [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/10|GPU-L]], which contains a very small number of Nvidia H100 cards. These cards represent some of the most powerful GPU compute options currently available. See the {{ :started:datasheets:nvidia_h100_tensor_core_gpu_datasheet.pdf|Nvidia H100 datasheet}} for further information.

This node type is also connected to NOBACKUP via faster networking, just as with **gpu-s**, **highmem** and **low-latency**, so jobs can take advantage of faster IO read/write speeds.

As with **gpu-s**, if your job requests //two// cards, then it will cost //twice// as much as a single card over the same amount of time. Be careful when requesting resources via Slurm that you request //and use// what you actually need; this partition represents the most costly use of your HPC Project balance. Be certain of your job parameters __before__ you launch a multi-day compute run.

There is __no__ unpaid access to the **gpu-l** partition. __All__ users must be members of at least one HPC Project with a positive balance.

Default runtime, as with **gpu-s**, is 24 hours, but a maximum of 14 days may be requested.

== Examples ==

  * GPU compute code which is reliant on FP64 / double-precision models
  * GPU workloads which do not fit within the on-board memory of the cards in the **gpu-s** partition

==== low-latency ====

The **low-latency** partition uses the [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/8|Standard.Lowlatency]] node type, connected by the faster, low-latency network, and is intended for multi-node MPI workloads which need fast inter-node communication. It is available to funded projects only, with a maximum of 1024 cores per job, a default runtime of 24 hours and a maximum of 4 days.

==== interactive ====

The **interactive** partition provides short interactive sessions on [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/6|Standard.b]] nodes, rather than batch jobs. Sessions default to 2 hours and may run for up to 8 hours, with a default of 1GB of memory per core.

==== interactive-gpu ====

The **interactive-gpu** partition provides the same short interactive sessions as **interactive**, but on the [[https://hpc.researchcomputing.ncl.ac.uk/calc/show/2/9|GPU-S]] node type, so that GPU-accelerated work can be run interactively. Sessions default to 2 hours and may run for up to 8 hours, with a default of 2GB of memory per core.

----

[[:started:index|Back to Getting Started]]