Our HPC vendor is in the process of installing the new Matlab 2026 release for us. This will become the default option for new Open OnDemand sessions as well as at the command line. The previous versions of Matlab will remain for the present time.
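If you need to keep using an older release after the new version becomes the default, you can normally select it explicitly when loading the software. A minimal sketch, assuming Matlab is provided via environment modules as with most other software on Comet (the version strings below are placeholders, not the real module names):

```
# See which Matlab versions are installed
$ module avail matlab

# Load a specific older release instead of the new default
$ module load matlab/2024b
```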
We are in the process of implementing ANSYS 2025 R2, which will replace the older releases currently available on Comet.
This version is configured specifically to work at the command line and from the Open OnDemand desktop environment. You can read the new ANSYS guide here:
Note that this new version of ANSYS is not installed as an environment module, because the list of software components required by ANSYS is very large. There will be small differences in how you launch the various ANSYS applications, all of which are being documented in the new ANSYS guide for Comet.
cometlogin01 is now available once more, so we again have two working login nodes.
Due to ongoing InfiniBand hardware issues, /nobackup (the Lustre filesystem) has been mounted via a slower connection following the recent reboot of cometlogin01. Both login nodes are now functioning normally and can be used for:
Because cometlogin01 now has a much slower connection to Lustre (/nobackup), please do NOT use it for large file transfers. If you are on cometlogin01 and need to transfer large data to or from /nobackup, please either `ssh cometlogin02` from your current session, or log off and on again until you get a session on cometlogin02.
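For example, a quick way to check which login node your session is on and hop across before starting a large transfer (the hostname output shown is illustrative):

```
# Check which login node this session is on
$ hostname
cometlogin01

# Move to the login node with the fast Lustre connection before transferring data
$ ssh cometlogin02
```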
Many thanks for your patience and consideration for other users.
A change request (3009741) has been submitted to take Comet down for maintenance.
This will cover the following changes:
Because of the number of changes, this will need to go through the NUIT change management process and is subject to approval before we can proceed. As the maintenance is substantial, it is expected to take Comet out of service for a full working day (e.g. 9-5).
Until the change is approved we will not have a definite date for the outage. Once approved, we will look at the currently running jobs and schedule the maintenance for as soon as we can, bearing in mind that jobs still running in the long partitions can run for up to 14 days. We therefore do not expect the maintenance to be any sooner than the 18th of May.
Once we have a decision from the change management board and a date, we will post here and to the HPC-Users email distribution list.
One of our two login nodes, cometlogin01, is down after a crash this morning at about 11am.
Please continue to log in as normal at comet.hpc.ncl.ac.uk but note that you will only be connected to cometlogin02. If possible, please avoid resource heavy activities on the login node while capacity is reduced.
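For example, from your own machine (replace the username with your own):

```
$ ssh your_username@comet.hpc.ncl.ac.uk
```

While cometlogin01 is down, this will always connect you to cometlogin02.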
Once we have more details and an estimate for the time to resolution we will post a further update.
Freesurfer has now been added to Comet in the form of a new application container.
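As a rough sketch of how a containerised application is usually run on Comet, the following assumes the Freesurfer image is invoked via Apptainer; the image path is a placeholder and the actual invocation is described in the Freesurfer guide:

```
# Hypothetical image path - check the Freesurfer guide for the real location
$ apptainer exec /path/to/freesurfer.sif recon-all -version
```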
If you are waiting to access GPU resources on Comet (e.g. the gpu-s_paid, gpu-s_free or gpu-l_paid partitions) then please be aware that our HPC support vendor is currently investigating a network issue affecting all GPU nodes.
It is likely that this network issue (an Infiniband network fabric controller problem) has been the cause of the stuck jobs, dropped Lustre connections and drain states on the GPU nodes.
Because all GPU nodes share the same network fabric backplane, it is not possible for us to take the two currently affected nodes (gpu004 and hgpu001) out of service and substitute alternatives: they are all connected to the same impacted network controller.
We have installed vLLM on Comet within a container environment. This is an easy-to-use LLM inference engine.
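As a minimal sketch of what vLLM provides, the commands below start its OpenAI-compatible server and query it. They assume you are already inside the container environment on a GPU node, and the model name is only an example; see the vLLM guide on Comet for how the container is actually launched:

```
# Start the OpenAI-compatible inference server (model name is an example)
$ vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000

# From the same node, list the models the server is offering
$ curl http://localhost:8000/v1/models
```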
Following analysis of the most recent performance report data we are collecting from Comet, we have taken the decision to redistribute some of the compute resources.
A request has been logged with our HPC vendor to move six compute nodes from _paid partitions to the equivalent _free partitions. This will change the resource distribution as follows:
Moving nodes out of the paid partitions: short_paid, default_paid, long_paid, interactive-std_paid
Moving nodes into the equivalent free partitions: short_free, default_free, long_free, interactive-std_free
You will not need to do anything to make use of these extra resources: jobs submitted to the various _free partitions will automatically be distributed over the new resources as they become available. The intention is to reduce waiting times for all free jobs; per-project limits on the number of cores and simultaneous jobs are not being increased.
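Submission to the _free partitions is unchanged; a minimal batch script sketch, where the account name and resource requests are placeholders:

```
#!/bin/bash
#SBATCH --partition=short_free
#SBATCH --account=my_account_name
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=01:00:00

# Replace with your actual workload
./my_program
```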
This work is expected to take place over the next few days. Update: this work has now been completed and the new node allocations are in place.
No change is being made to GPU resources (gpu-s_paid, gpu-l_paid, interactive-GPU_paid, gpu-s_free or interactive-GPU_free) at this time.
Several users have reported that normal srun sessions are not starting across various partitions.
The typical error will show as follows:
```
$ srun --partition=gpu-l_paid --account=my_account_name --pty bash
srun: job 1337731 queued and waiting for resources
srun: job 1337731 has been allocated resources
srun: StepId=1337731.0 aborted before step completely launched.
srun: Job step aborted: Waiting up to 32 seconds for job step to finish.
srun: error: task 0 launch failed: Unspecified error
```
So far we have identified this happening on the following partitions:
It is not present on:
Due to current resource allocation, we do not have any evidence yet for:
Currently sbatch jobs are unaffected. A software incident has been raised with our HPC support vendor to begin looking at this issue.
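If you are blocked by the srun failure, the same work can usually be submitted with sbatch in the meantime. A sketch for the gpu-l_paid partition; the account name, GPU request syntax and time limit are assumptions for illustration:

```
#!/bin/bash
#SBATCH --partition=gpu-l_paid
#SBATCH --account=my_account_name
#SBATCH --gres=gpu:1
#SBATCH --time=02:00:00

nvidia-smi          # confirm the GPU allocation
./my_gpu_program    # replace with your actual workload
```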
The software list has now been updated to provide an information page for every software module that was requested to be installed on Comet. A small number of requested software packages have been found to be missing; these will be followed up with our HPC vendor for installation.
A new version of the CASTEP container has been released. This is now updated to the latest 26.1.1 release of the application.
Full details are included in the CASTEP user guide wiki page.