
HPC Service Updates - 2026 - January


(29th) January 2026 - Ongoing issues with job rescheduling and pre-emption

A number of users of Comet have noticed that certain jobs (mainly default_free and long_free, but also seen sporadically elsewhere) have been unexpectedly stopped, paused and rescheduled, even after running for many hours or several days.

This is not expected behaviour: we do not envisage job pre-emption based on priority levels for the vast majority of Comet. The only area where pre-emption is part of the design specification is the set of nodes which make up the low-latency partition. If these nodes are idle, they may take up extra load from the default_paid job queue to prevent them from being under-utilised. Pre-emption should not be in place anywhere else.

We are working with the HPC support vendor to understand why jobs outside of the low-latency partition are being stopped and rescheduled, as this is clearly a waste of compute time for the affected jobs. Once the cause is identified and a solution designed, we will update you on the timeline for getting this resolved.
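Until this is resolved, long-running jobs may want to detect that they have been restarted so that they can resume from their own checkpoint rather than start again. The following is a minimal sketch only. It assumes that Comet's scheduler is Slurm (this page does not say so) and relies on Slurm's SLURM_RESTART_COUNT environment variable; the function names restart_count and was_requeued are hypothetical.

```python
import os


def restart_count(environ=None):
    """Return how many times the scheduler reports restarting this job.

    Assumption: the scheduler is Slurm, which sets SLURM_RESTART_COUNT
    for jobs that have been requeued. Returns 0 if the variable is
    missing or not an integer.
    """
    env = os.environ if environ is None else environ
    raw = env.get("SLURM_RESTART_COUNT", "0")
    try:
        return int(raw)
    except ValueError:
        return 0


def was_requeued(environ=None):
    """True if this job appears to have been restarted at least once."""
    return restart_count(environ) > 0
```

A job script could call was_requeued() at start-up and, if it returns True, load its most recent checkpoint before continuing. Writing and loading checkpoints is left to the application.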


(29th) January 2026 - New software added

The following new software has been added by our HPC support vendor:

The list of all software requests can be found on the software page for Comet.


(22nd) January 2026 - Issues with Comet this week

Now that Comet is coming into heavy usage, some new issues have emerged from the end of last week and into this week.

Apologies to those of you who have experienced these and thanks to you all for your patience and for continuing to let us know about problems as they occur (email https://hpc.researchcomputing.ncl.ac.uk/ or log a ticket at NUService).

/tmp space on nodes

Currently, all nodes have both /tmp and /scratch directories (on the node's fast internal NVMe drive). /scratch is a very large partition intended for temporary working files. Unfortunately, many applications have been attempting to use the much smaller /tmp directory.

We have seen the /tmp directory on compute nodes filling up, sometimes leading to job failures with error messages about being unable to create temporary files, as well as more obscure error messages.
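As an illustration of how a job can steer its own temporary files away from /tmp, here is a minimal Python sketch. It assumes that /scratch is writable by the job and that the application honours the TMPDIR environment variable. The function names pick_temp_root and make_job_tempdir are hypothetical, and this page does not say whether Comet expects any particular directory layout under /scratch.

```python
import os
import tempfile
from pathlib import Path


def pick_temp_root(preferred="/scratch"):
    """Return `preferred` if it is a writable directory, else the system default."""
    path = Path(preferred)
    if path.is_dir() and os.access(path, os.W_OK | os.X_OK):
        return str(path)
    return tempfile.gettempdir()


def make_job_tempdir(preferred="/scratch", prefix="job-"):
    """Create a private working directory and point temporary files at it."""
    root = pick_temp_root(preferred)
    workdir = tempfile.mkdtemp(prefix=prefix, dir=root)
    # Child processes that honour TMPDIR will write here rather than to /tmp.
    os.environ["TMPDIR"] = workdir
    # The tempfile module caches its default directory, so update it too.
    tempfile.tempdir = workdir
    return workdir
```

Calling make_job_tempdir() once at the start of a job means later calls such as tempfile.NamedTemporaryFile() land under the chosen directory. The job should remove that directory when it finishes (for example with shutil.rmtree) so that the node's disk does not fill up.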

What's being done?

Working with our supplier, OCF, we have asked for:

These changes should not affect any jobs: running jobs are allowed to complete, but new jobs are not sent to nodes in 'drain'. However, it will take some days to complete this change on all nodes.

Issues with Open OnDemand sessions

Various issues have been reported with Open OnDemand VNC desktop, RStudio and Matlab sessions. Most commonly, sessions have failed to start and have instead jumped immediately to 'completed'.

The issues have been tracked down to node compute030, which is the first node 'in line' for free sessions in Open OnDemand.

What's being done?

compute030 has been put into 'drain' so that once running jobs have completed it can be rebuilt.

In the meantime, other nodes are now picking up new Open OnDemand sessions and we've had no further reports of issues.

Please do email us at hpc.researchcomputing@newcastle.ac.uk if you notice problems on Comet, even things like missing libraries, which you might have dealt with by local installs on Rocket. We can't promise to fix everything centrally but we do aim to have Comet's core software operating properly.

