====== HPC Service Updates - 2026 - January ======

----

===== (29th) January 2026 - Ongoing issues with job rescheduling and pre-emption =====

A number of Comet users have noticed that certain jobs (mainly **default_free** and **long_free**, but also seen sporadically elsewhere) have been unexpectedly stopped, paused and rescheduled, even after running for many hours or several days.

This is **not** expected behaviour: we do not envisage job pre-emption based on priority levels for the vast majority of Comet. The only area where pre-emption is part of the design specification is the set of nodes that make up the **low-latency** partition. If these nodes are idle, they **may** take up extra load from the **default_paid** job queue, to prevent them from being under-utilised. This should not be in place //anywhere else//.

We are working with the HPC support vendor to understand why jobs outside the low-latency partition are being stopped and rescheduled, as this is clearly a waste of compute time for the affected jobs. Once the cause is identified and a solution designed, we will update you on the timeline for getting this resolved.

----

===== (29th) January 2026 - New software added =====

The following new software has been added by our HPC support vendor:

  * System packages (do __not__ need to be loaded via ''module''): screen, tmux, emacs, ImageMagick, bc
  * New modules:
    * **Gromacs** (Nvidia CUDA / OpenCL), molecular dynamics package: load with ''module load GROMACS/2026-cuda'' or ''module load GROMACS/2026-opencl''
    * **Miniforge**, conda tool configured to use the conda-forge software channels: load with ''module load Miniforge''
    * **libudunits**, software library: load with ''module load UDUNITS''
    * **gdal**, software library: load with ''module load GDAL''
    * **lapack**, BLAS library: load with ''module load LAPACK''
  * Open requests:
    * Gaussian / Gauss View
    * CASTEP
    * Hybre
    * Stata
    * VS Code

The list of all software requests can be found on the [[advanced:software_list|software page for Comet]].

----

===== (22nd) January 2026 - Issues with Comet this week =====

Now that Comet is coming into heavy usage, some new issues have emerged since the end of last week. Apologies to those of you who have experienced these, and thank you all for your patience and for continuing to let us know about problems as they occur (contact us via https://hpc.researchcomputing.ncl.ac.uk/ or log a ticket at NUService).

==== /tmp space on nodes ====

Currently, all nodes have both ''/tmp'' and ''/scratch'' directories (on the node's internal fast NVMe drive). ''/scratch'' is a very large partition intended for temporary working files. Unfortunately, many applications have been attempting to use the much smaller ''/tmp'' directory instead.

We have seen the ''/tmp'' directory on compute nodes filling up, sometimes leading to job failures with error messages about being unable to create temporary files, as well as more obscure error messages.

=== What's being done? ===

Working with our supplier, OCF, we have requested the following:

  * ''TMPDIR'' to be set to point to the ''/scratch'' partition, so that any well-behaved application or library writes to that location instead. This has been completed.
  * The ''/tmp'' directory to be replaced with a symlink to ''/scratch''. This work must be done on each node individually: the node is taken gracefully out of service (drain), the change is made, and the node is reinstated.

These changes should not affect any jobs: running jobs are allowed to complete, but new jobs are not sent to nodes in 'drain'. However, it will take some days to complete this change on all nodes.

==== Issues with Open OnDemand sessions ====

Various issues have been reported with Open OnDemand VNC desktop, RStudio and Matlab sessions. Most commonly, sessions have failed to start and have instead jumped immediately to 'completed'.

The issues have been tracked down to node ''compute030'', which is the first node 'in line' for free sessions in Open OnDemand.

=== What's being done? ===

''compute030'' has been put into 'drain' so that it can be rebuilt once its running jobs have completed. In the meantime, other nodes are now picking up new Open OnDemand sessions and we have had no further reports of issues.

Please do [[mailto:hpc.researchcomputing@newcastle.ac.uk|email us]] at hpc.researchcomputing@newcastle.ac.uk if you notice problems on Comet, even things like missing libraries, which you might have dealt with by local installs on Rocket. We can't promise to fix everything centrally, but we do aim to have Comet's core software operating properly.

----

[[:status:index|Return to HPC Service Updates & Project News]]