
HPC Newsletter - 2026/03

Welcome to the March 2026 HPC newsletter. This month we have a lot of new software packages to announce, along with some changes to the HPC website and some new wiki articles for you.

Unfortunately this month we also had a few system problems - ongoing connectivity issues to RDW, slow file transfer speeds to Lustre and some resilience issues with the login nodes and filestore.

HPC Summary for March 2026

Software Changes

New software

There are lots of new software titles on Comet this month. Some have been provisioned using the standard 'module' approach, while many of the more complex packages now use container environments, which cut down on the number of dependency conflicts we see when multiple modules are installed.

Changed software

One bug was fixed in an existing module, and a few new features have been added to one of our previously-published container environments.

If you have any suggestions for additions to the container environments we've made available, please get in touch - we're happy to consider any reasonable request!

Website & Documentation

System Changes

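This month we had a period of scheduled maintenance (23rd March) which did not go entirely to plan. This was supposed to include three discrete actions: fixing a security bug, adding cometlogin02 back in as a login server, and making a configuration change to improve Lustre copy speeds.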

Whilst the security bug was fixed, there were issues adding cometlogin02 back in, and we continued running on a single login server for a period of time. This is now fixed, but only several days after the maintenance, so the outage was significantly longer than planned. The HPC vendor has offered their apologies for the duration of this outage.

Unfortunately the configuration change to improve Lustre copy speeds resulted in no change. The vendor is now looking at other options to address the bottleneck of getting data on to the /nobackup filesystem.

We also recently experienced another short outage of the Lustre filesystem. The vendor restored the service relatively quickly and has identified the root cause (a split-brain condition detected between the Lustre controller pair). A solution is currently being developed to reduce the chance of future outages.

We have also started to move datasets and databases which are relatively static and can be used across multiple groups to the /nobackup/shared area. The first example is the set of databases used by PGAP, but we're happy to add more. This lets other groups share the same data, and means that those files are not subject to the data retention policies which are applied across most project areas. Drop us a message if you are interested in adding shared datasets to this area.

Community Events

The wider RSE team are busy organising a HPC community session which we hope will take place monthly or thereabouts. It will be an opportunity to drop by, talk about what you are using the HPC for, suggest improvements and engage with a wider variety of HPC users in the Newcastle University research community. There will be further announcements on this shortly.

Also, a reminder that we have an active “HPC and Code Community” channel in Teams that you can drop in to and chat to others within the Newcastle HPC community.

In The Pipeline

Docker is still being trialled - whilst we have been able to test it working alongside Slurm, there are some drawbacks compared to Apptainer and Podman which mean it is sub-optimal in a HPC/scheduler environment. We intend to finish testing Docker, document the restrictions/drawbacks in the wiki and then make the new module available for use soon.

In the meantime, if you want to use containers to build your software environments on Comet, Apptainer and Podman are working very well and are being used daily.
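As a rough illustration of what running a command inside a container looks like, the sketch below builds an `apptainer exec` command line in Python. The image name `my_env.sif` and the choice to bind `/nobackup` are hypothetical placeholders, not a description of how any Comet container is actually set up; check the wiki for the images and bind paths that apply to you.

```python
import shlex
from typing import List, Optional, Sequence


def build_apptainer_command(
    image: str,
    command: Sequence[str],
    binds: Optional[Sequence[str]] = None,
) -> List[str]:
    """Return the argument list for running `command` inside `image`.

    Each entry in `binds` is passed to Apptainer's --bind option, which
    makes a host directory visible inside the container.
    """
    args = ["apptainer", "exec"]
    for bind in binds or []:
        args += ["--bind", bind]
    args.append(image)
    args.extend(command)
    return args


if __name__ == "__main__":
    cmd = build_apptainer_command(
        image="my_env.sif",                # hypothetical image file
        command=["python3", "--version"],  # command to run in the container
        binds=["/nobackup"],               # assumption: expose /nobackup inside
    )
    # Print the command so it can be pasted into a Slurm batch script,
    # or pass `cmd` to subprocess.run(cmd, check=True) to run it directly.
    print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems if a path or argument contains spaces.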

We are also investigating Nextflow (https://www.nextflow.io/) and Globus (https://www.globus.org/globus-connect-personal) as two new software packages to bring to Comet. Expect to see and hear more about these next month.

An update to CASTEP is also planned for the coming weeks, moving from our current 25.12 installation to the recently released 26.1 version.

Additional system work is planned over the next month, including a possible replacement of the optical uplinks/transceivers that connect Comet to the campus network. Our NUIT network team have detected errors on at least one of the cables and this is thought to be a possible underlying cause of the intermittent dropping of the /rdw/ network filesystem on the login nodes. The NUIT network team are liaising with the HPC vendor to arrange replacement of the cables.
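Until the underlying cause is resolved, scripts that copy data to or from RDW may want to check that the filesystem is present before starting. The following is a minimal sketch, assuming `/rdw` appears as an ordinary mount point on the login node; if it is automounted, you may need to test a path beneath it instead. The function name and retry settings are illustrative only.

```python
import os
import time


def wait_for_mount(path, attempts=5, delay=10.0,
                   is_mounted=os.path.ismount, sleep=time.sleep):
    """Return True once `path` is a mount point, retrying up to `attempts` times.

    `is_mounted` and `sleep` are parameters so the check can be tested
    without touching the real filesystem or actually waiting.
    """
    for attempt in range(1, attempts + 1):
        if is_mounted(path):
            return True
        if attempt < attempts:
            sleep(delay)  # pause before the next check
    return False


if __name__ == "__main__":
    if wait_for_mount("/rdw", attempts=3, delay=5.0):
        print("/rdw is mounted, safe to start the transfer")
    else:
        print("/rdw is not available, try again later")
```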

