We have published a new Bioapps container image to Comet.
This container provides AMD Epyc-optimised binaries of most of the bioinformatics software modules which either (a) are installed on Rocket or Comet, or (b) have been requested. This includes:
If this method of providing such an extensive set of software works well, we intend to keep the container updated and move away from the complicated set of dependencies, modules and runtimes needed to install these as independent packages on Comet.
Please consult the Bioapps documentation page and let us know how you get on - the issues of conflicting module dependencies when using several of these packages in the same script should be substantially reduced with this approach.
In addition, we've provided a simple container helper script for CASTEP and DNAscent, allowing you to source a single shell script and then use a new command, container.run, to run applications inside the Apptainer container, complete with all necessary filesystem mounts. We hope you agree that running:
$ source /nobackup/shared/containers/container.sh
$ container.run app_name
… is much easier than …
$ module load apptainer
$ apptainer exec --bind /scratch:/scratch --bind /nobackup:/nobackup /nobackup/shared/containers/container.sif app_name
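For illustration, a helper like container.sh could be implemented as a small shell function along the following lines. This is a hypothetical sketch based only on the two commands shown above - the real script's contents are not reproduced here, and the DRY_RUN mechanism is purely illustrative:

```shell
#!/bin/sh
# Hypothetical sketch of what a helper like container.sh might define.
# CONTAINER_SIF and the DRY_RUN toggle are assumptions for illustration.
CONTAINER_SIF="/nobackup/shared/containers/container.sif"

container_run() {
    # Assemble the full apptainer invocation with the standard bind mounts.
    cmd="apptainer exec --bind /scratch:/scratch --bind /nobackup:/nobackup $CONTAINER_SIF $*"
    if [ -n "$DRY_RUN" ]; then
        # Print the command instead of executing it (no apptainer needed).
        echo "$cmd"
    else
        module load apptainer
        $cmd
    fi
}

# With DRY_RUN set, show the equivalent long-form command:
DRY_RUN=1
container_run app_name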
The second login node (cometlogin02) has had a number of load/ssh issues over the past 24 hours. Our HPC vendor is currently rebuilding/reconfiguring the node.
If you are allocated cometlogin02 whilst this work is under way, you can get back to the working (albeit slower than usual) cometlogin01 node by simply exiting the SSH session and connecting again; connections alternate between the two login nodes, so on average every second SSH connection should land on cometlogin01.
As per our Connecting - Onsite page, you may also choose to connect directly to the first login node, if you wish.
Please bear with us while the second node is reinstated and normal performance is returned.
The software package CASTEP has now been installed on Comet. This has been built as a container image with AMD Epyc architecture optimised builds of LAPACK, OpenBLAS, FFTW3 and more. Both serial and parallel/MPI CASTEP binaries are included.
Please consult our CASTEP guide to get started.
The Avogadro 2 software module (module load Avogadro) now works on non-GPU Open OnDemand desktop sessions as well as GPU ones. A missing dependency on non-GPU desktops was previously stopping it from starting.
You may still prefer a GPU desktop depending on the complexity of your data and your performance needs (Avogadro uses OpenGL to render/visualise output).
The Apptainer guide has been updated to include an example of how to use overlay images to turn read-only container images into read/write images.
This is possibly useful to users of Comet who use one of the (growing) number of Apptainer images provided at /nobackup/shared/containers, but would still like to add their own scripts/enhancements without needing to rebuild the entire base image.
This is based on the guidance available at: https://apptainer.org/docs/user/main/persistent_overlays.html
Please note that in line with University policy, rates for paid resource use have now been updated for the 2026 financial year.
Updated figures can be viewed using the interactive cost calculator tool. The new pricing will be applied to invoices raised from the end of February onwards.
Our HPC vendor has now added these new modules:
- module load LAPACK/3.12.0
- module load ChimeraX, and start with the command ChimeraX. Whilst you can run this on any desktop, rendering performance is likely to be too low on non-GPU nodes.
- module load Avogadro, and start with the command avogadro2. Currently this only runs on GPU nodes (it does not currently work via MESA / VirtualGL; this is being investigated).
- module load VSCode, and then start with the command code.
- module load Hypre.
- module load BOLT-LMM, and start with the command bolt.

Having demonstrated that we are able to reach the limits of the network bandwidth between the Comet login node(s) and RDW, we have moved on to measuring performance between the login node and Lustre - thereby removing the external network connectivity from the equation.
Here are the results for the tests transferring the same data from the /scratch area on local NVMe drives to Lustre:
It's easy to see that we are getting the same performance characteristics as the RDW to Lustre results published yesterday. Here they are, side-by-side for comparison:
- cp, cat and default dd options are the slowest - ranging from 48-94 Megabytes/second.
- Adding mbuffer to the cat or dd pipelines also shows an improvement in throughput.
- The best result comes from a dd block size of 128 Kilobytes, reaching a transfer rate of over 700 Megabytes/second.

And one final set of data - Lustre to Lustre operations. This excludes the performance characteristics of all local filesystems and is purely focused on the speed of the Lustre client reading/writing to the Lustre service. No surprises, as this mirrors all other observations we have made so far:
A number of users of Comet have mentioned slower-than-expected speeds downloading data from the main University Research Data Warehouse (RDW). On initial investigation, using rsync to transfer data from RDW to a project folder on Lustre (i.e. in /nobackup), the transfer rate was low, but broadly equivalent (within tens of Megabytes/second) to what was observed doing the same with Rocket.
However, on closer examination we have found some very surprising results.
First, here are a set of results for transferring a large file from RDW to the local NVMe scratch space (/scratch) on a login node. This will always be the fastest method of getting data on to Comet, as we are only exercising a single network transfer and using the local NVMe drives of the login servers. This demonstrates the capacity of the link between RDW and Comet and/or the rest of the Campus network. RDW is mounted with rsize=1048576,wsize=1048576 params to optimise for large transfers:
Some observations:
- Using the cp command, the speeds attained are around the 1.9 Gigabytes/second range; exactly what we would expect.
- If you use rsync, then your speeds typically drop to about 370 Megabytes/second. We would expect lower speeds from rsync than from less complex methods, but this does seem lower than expected.
- With the dd tool we see speeds steadily increasing as we use successively larger block sizes; transfer rates level off after increasing to 64 Kilobyte blocks.
- Adding a buffering tool (mbuffer) to smooth out reads and writes has no detectable benefit and remains no faster than rsync.
Second, if transferring data from RDW to your home directory on Comet (/mnt/nfs/home), then we observe a very different level of performance. This involves two network filesystems - first to read from RDW and second to write to the NFSv4 mounted home directories. Both RDW and NFS Homes are mounted with the rsize=1048576,wsize=1048576 params to optimise for large transfers:
Observations:
- Transfers are slower than to the local /scratch drives on the login nodes, as we are making an NFS read from RDW to get the data, and then making an NFS write to the Comet home directory server.
- The slowest methods are rsync (around 250 Megabytes/second) and dd with default block sizes (usually 512 bytes as standard, giving ~210 Megabytes/second).
- The fastest method is dd with a block size of 128 Kilobytes (netting a speed of 530 Megabytes/second), though gains are negligible beyond 32-64 Kilobytes.
- Using dd in conjunction with mbuffer offers no benefits.
Lastly, the data for copying from RDW to Lustre, aka /nobackup. Again RDW is mounted with rsize=1048576,wsize=1048576 for optimal large transfers.
Key points:
- Using cp and cat results in the worst possible speeds - at the lowest point this can be as bad as 40 Megabytes/second.
- Little better is dd with the default 512 Byte block size; this achieves no better than 100 Megabytes/second.
- Using rsync instead shows a consistent increase to around 290 Megabytes/second, making this a better option than cp, cat or dd with default options.
- Speeds improve again by piping cat through mbuffer. For example cat /rdw/01/group/file.dat | mbuffer > /nobackup/proj/group/file.dat. This increases speeds to around 450-500 Megabytes/second.
- Increasing the block size of dd yields substantial, incremental improvements in transfer speed, peaking at more than 850 Megabytes/second, which is the highest result we have observed when copying direct from RDW to Lustre. An example would be: dd if=/rdw/01/group/file.dat of=/nobackup/proj/group/file.dat bs=1M.
- Increasing the dd block size beyond 1M results in performance dropping sharply.
- Combining mbuffer with dd shows inconsistent performance and is not advised.
The test file is ~5000 Megabytes of random, non-compressible data, and was generated with: dd if=/dev/random of=/rdw/1/group/test_data bs=1M count=5000
The variables used in the tests were:

- $SOURCE, a path on RDW, e.g.: /rdw/1/group/test_file
- $DEST, a path on either local NVMe drives, NFS home, or Lustre: /scratch, /mnt/nfs/home/$USER or /nobackup/proj/PROJ_NAME
- $BS, either 8k, 16k, 32k, 64k, 128k, 256k, 512k, 1m or 2m

The commands tested were:

- time cp $SOURCE $DEST
- time rsync --progress $SOURCE $DEST
- time cat $SOURCE > $DEST/test_file
- time cat $SOURCE | mbuffer -m 1g -p 10 > $DEST/test_file
- time dd if=$SOURCE of=$DEST/test_file
- time dd if=$SOURCE of=$DEST/test_file bs=$BS
- time dd if=$SOURCE bs=$BS | mbuffer -m 1g -p 10 > $DEST/test_file

The real time was used to record the duration of each copy, and transfer rates were calculated as file size in bytes / real time = bytes per second.
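The shape of the dd portion of these tests can be sketched as a simple loop. This stand-in uses small files under /tmp so it is self-contained - the real tests used RDW, NFS home and Lustre paths and a multi-gigabyte file, and a coarse seconds-resolution timer is used here purely for illustration:

```shell
#!/bin/sh
# Self-contained sketch of the dd block-size sweep; all paths/sizes are
# illustrative stand-ins for the real RDW/Lustre tests described above.
SOURCE=/tmp/bench_source
DEST=/tmp/bench_dest
mkdir -p "$DEST"

# Generate a small non-compressible stand-in file (the real one was ~5 GB).
dd if=/dev/urandom of="$SOURCE" bs=1M count=8 2>/dev/null
SIZE=$(wc -c < "$SOURCE")

for BS in 8k 64k 128k 1m; do
    START=$(date +%s)
    dd if="$SOURCE" of="$DEST/test_file" bs="$BS" 2>/dev/null
    END=$(date +%s)
    ELAPSED=$((END - START))
    [ "$ELAPSED" -eq 0 ] && ELAPSED=1   # avoid divide-by-zero on tiny files
    # file size / elapsed time = transfer rate
    echo "bs=$BS rate=$((SIZE / ELAPSED)) bytes/sec"
done
```

At realistic file sizes the per-block-size differences dominate; here the file is deliberately tiny so the loop completes instantly.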
Using strace to monitor the read/write calls of the cp command shows that when copying from RDW to /scratch the requested block size is 1MB. This matches the RDW NFS mount params:
$ strace -e read,write cp /rdw/03/rse-hpc/test_file /scratch/ 2>&1 | head -15
...
...
read(3, "", 4096) = 0
read(3, ";\362C\233%\364\233Z\233\301}>\22+\t\357_l\302EL\360>\36\310\335.F+\204\244\235"..., 1048576) = 1048576
write(4, ";\362C\233%\364\233Z\233\301}>\22+\t\357_l\302EL\360>\36\310\335.F+\204\244\235"..., 1048576) = 1048576
read(3, "y\241\372;\357\246}\36\235l\0207\23\334\204\217T\366%\343\326\211\n\361J\314{\177\300\306J\235"..., 1048576) = 1048576
write(4, "y\241\372;\357\246}\36\235l\0207\23\334\204\217T\366%\343\326\211\n\361J\314{\177\300\306J\235"..., 1048576) = 1048576
read(3, "*\17\n\330n=i\235\355\214\337\37\263h\25;\333\337\334Yq&y\30\216\3}v\220\371\6\36"..., 1048576) = 1048576
write(4, "*\17\n\330n=i\235\355\214\337\37\263h\25;\333\337\334Yq&y\30\216\3}v\220\371\6\36"..., 1048576) = 1048576
Doing the same to Lustre shows it trying to read and write in 4MB blocks:
$ strace -e read,write cp /rdw/03/rse-hpc/test_file /nobackup/proj/comet_training/n1234/1 2>&1 | head -15
...
...
read(3, "", 4096) = 0
read(3, ";\362C\233%\364\233Z\233\301}>\22+\t\357_l\302EL\360>\36\310\335.F+\204\244\235"..., 4194304) = 4194304
write(4, ";\362C\233%\364\233Z\233\301}>\22+\t\357_l\302EL\360>\36\310\335.F+\204\244\235"..., 4194304) = 4194304
read(3, "S\210\25i\1Y\234*dy/\377\324&\332\277\7o/\31\251z\315e\\SEy\232\373d\317"..., 4194304) = 4194304
write(4, "S\210\25i\1Y\234*dy/\377\324&\332\277\7o/\31\251z\315e\\SEy\232\373d\317"..., 4194304) = 4194304
read(3, "\311\350\351\320S\226\372\321\27\266\360z.\30\201\257\317\374\356\271\365;\32\240\367(.\302z;\211\330"..., 4194304) = 4194304
write(4, "\311\350\351\320S\226\372\321\27\266\360z.\30\201\257\317\374\356\271\365;\32\240\367(.\302z;\211\330"..., 4194304) = 4194304
Points to note:
- Use rsync if you have many files to transfer.
- For single large files, use dd if=sourcefile of=/path/to/destination/outfile bs=128k
- The poor cp performance to Lustre will be raised with our HPC vendor for further investigation.

As per our news article from yesterday, we have now implemented resource caps across the two types of project:
A breakdown of the resource limits will be added to the HPC Resources & Partitions - Comet page so that you understand how these apply to your jobs.
Again, the intention is to enable fairer access to the resources of the Comet HPC facility for the widest possible range of staff and students. If you need to make a case for a different resource limit for your project, please get in touch.
Now that the changes to the Slurm priorities and CPU over-subscription have been implemented, our next phase of development for Comet will focus on the implementation of resource limits for both free and paid partitions.
These changes are intended to allow a fairer distribution of resources across all users and projects. Currently, as on Rocket, there are very few limits in place on the number of simultaneous resources or jobs. Going forwards we will introduce two distinct levels of resource caps:
- _free partitions_
- _paid partitions_

As it stands today, the resource limits for all users of Comet are:
| Resource | Unfunded Projects | Funded Projects |
|---|---|---|
| MaxSubmitJobs | 512 | 512 |
| MaxJobs | 256 | 256 |
| CPU | Unlimited | Unlimited |
| Nodes | Unlimited | Unlimited |
| GPU | Unlimited | Unlimited |
| GPU RAM | Unlimited | Unlimited |
| RAM | Unlimited | Unlimited |
| Local Disk (e.g. /scratch) | Unlimited | Unlimited |
In the first iteration of the new resource caps, we will be implementing the following limits:
| Resource | Unfunded Projects | Funded Projects |
|---|---|---|
| MaxSubmitJobs | 128 | 512 |
| MaxJobs | 256 | 1024 |
| CPU | 512 | 2048 |
| Nodes | Unlimited | Unlimited |
| GPU | 1 | 8 |
| GPU RAM | Unlimited | Unlimited |
| RAM | Unlimited | Unlimited |
| Local Disk | Unlimited | Unlimited |
These limits will be per project, and are intended to stop the situation where a small number of users are able to monopolise almost the entire resources of a given partition (either paid or free). We will notify all registered users via the HPC-Users distribution list once these resource limits are in place. In most cases the only impact on your jobs is that they may need to queue for a little longer if you are submitting a large number at the same time; again, this is to give more users and projects a better chance of accessing the resources.
Those projects contributing towards the operation of Comet are given the higher resource cap for the duration that their funding balance remains positive.
As posted earlier, the planned maintenance to the Slurm configuration is now underway (starting at 9:15am). We expect this to be completed shortly and afterwards any jobs in the PENDING state will automatically be released.
This work should resolve the CPU over-subscription issues which have been encountered, as well as the rare case which has resulted in jobs being stopped and rescheduled due to priority levels.
10:13am - Update
The change has now been implemented and the maintenance reservation window will shortly be removed. Any pending jobs should automatically restart.
We will be monitoring resource allocation closely over the next few hours/days - if you spot any cases which you believe may stem from CPU over-subscription again, please do let us know. The same goes for any jobs in any partition other than low-latency and default_paid which get suspended or rescheduled; please let us know as a priority.
A number of users have reported strange write issues on Lustre (/nobackup). These appear to manifest as text editors being unable to write content on the Lustre filesystem, while some tools such as echo, cat and standard shell redirection (> and >>) are seemingly unaffected.
Kernel messages (e.g. dmesg) on affected nodes show a number of Lustre warnings and errors that have occurred this afternoon.
An incident has been raised with our vendor to assess the situation and provide a fix.
15:18 - Update
Our support vendor indicates that extreme load levels on one of the Lustre storage appliances may have been the cause of this incident. The affected services have been restarted and nodes appear to be reading/writing the /nobackup filesystem normally again. We will be following up with our vendor to get a more detailed explanation of the high load scenario and to determine how to prevent this from happening again.
Orca (https://www.faccts.de/orca/) is now installed on Comet and can be loaded as follows:
module load Orca
Please note the specific upper-case first character of the name - ORCA or orca will not load it.
The vendor has restored the Comet login service (e.g. ssh comet.hpc.ncl.ac.uk). Unfortunately it appears that the SSH host key fingerprints for one of the login servers have been lost.
If you attempt to log in to Comet now you will see a warning from your SSH client looking like this:
$ ssh comet.hpc.ncl.ac.uk
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ED25519 key sent by the remote host is
SHA256:ABCDEFGHIJKLMNOPQRTU12345678.
Please contact your system administrator.
$
This is expected, since the original fingerprints of that server have now changed. To resolve this one-time issue, run the following command on your Linux or macOS device:
$ ssh-keygen -f $HOME/.ssh/known_hosts -R comet.hpc.ncl.ac.uk
If you are using an alternative SSH client on a Windows platform, the error message from your software (e.g. PuTTY, Mobaxterm or similar) should indicate the equivalent command to run.
An issue has been identified with the Comet login nodes. The usual access method of ssh comet.hpc.ncl.ac.uk or ssh comet.ncl.ac.uk is not working.
Please use the direct connection alternatives: ssh cometlogin02.comet.hpc.ncl.ac.uk or ssh cometlogin01.comet.hpc.ncl.ac.uk to temporarily bypass the faulty configuration - the incident has been raised with our HPC vendor, who are working on a fix as a priority this afternoon.
Following the unintended configuration of Slurm priorities and pre-emption rules we have requested that our HPC vendor make the following changes to the operation of Comet:
In addition, we are taking the opportunity to make a better distribution of the available compute node resources. The following changes will be made to partitions:
- 2 additional 'hmem' compute nodes (hmem009 and hmem010) to be added to the short_free, long_free and interactive-std_free partitions, giving a total of 9 compute nodes / 2304 cores across all free partitions.
- 7 additional 'hmem' compute nodes (hmem001-004 and hmem006-008) to be added to the short_paid, long_paid and interactive-std_paid partitions, giving a total of 39 compute nodes / 9984 cores across all paid partitions, plus a further 4 nodes accessed from the low-latency_paid partition if they are idle, for a total combined core count of 11008.

The design intention of Comet was to put most of our compute resource into standard compute nodes (i.e. not low-latency), as the historical data from operating Rocket indicated most of our workloads fit into that classification. However we do have some users who need large scale parallel jobs, and that is what the low-latency_paid partition is for. Since we don't run low latency jobs most of the time, we wanted the ability to use that resource when it is not being used for its original purpose.
The job pre-emption rules allow for this, and the specification as set for Comet stated:
> Spare capacity in the low-latency_paid partition can be used by default_paid jobs to prevent it sitting idle and allow for 'burst' expansion of the default_paid partition… but this capacity must be evacuated and priority given over if a job is submitted to the low-latency_paid partition which would require them.
Jobs submitted to short_paid and long_paid are not subject to this configuration, neither are jobs submitted to any of the free partitions.
This does mean that if the default_paid partition is at capacity, your job may run on extra capacity provided by low-latency_paid, but it is in danger of being stopped/rescheduled if a real low-latency job is submitted. You should always consider adding checkpointing to your jobs to allow resuming from a service interruption.
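A checkpoint-and-resume job can be sketched as follows. This is a minimal illustration, not a template we publish: the partition name is taken from above, the --signal timing is an assumption, the "work" is a simple counter, and the checkpoint file path is a /tmp stand-in (a real job would checkpoint to shared storage such as /nobackup):

```shell
#!/bin/bash
#SBATCH --partition=default_paid   # partition where pre-emption can occur
#SBATCH --requeue                  # let Slurm requeue the job if pre-empted
#SBATCH --signal=B:TERM@60         # ask for SIGTERM ~60s before the job is stopped (assumed timing)

# Minimal checkpoint/resume sketch: the "work" is a counter, and the
# checkpoint file records the last completed step. Use shared storage
# (e.g. /nobackup) in a real job; /tmp is only for this illustration.
CKPT=/tmp/ckpt_demo_state

STEP=0
[ -f "$CKPT" ] && STEP=$(cat "$CKPT")   # resume from the last checkpoint

# On SIGTERM (sent when the job is about to be stopped), save progress and
# exit cleanly so the requeued job can pick up where it left off.
trap 'echo "$STEP" > "$CKPT"; exit 0' TERM

while [ "$STEP" -lt 5 ]; do
    STEP=$((STEP + 1))
    echo "$STEP" > "$CKPT"              # checkpoint after each unit of work
done
echo "finished at step $STEP"
```

The key points are the --requeue flag, trapping the termination signal, and writing state often enough that a rescheduled job loses little work.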
The work to make the changes outlined above will be carried out on the morning of Wednesday 11th of February, at 9:00am. A resource reservation window has been put in place to prevent any jobs from running/starting during this piece of work. We expect the change to be implemented within a few minutes, but it involves a restart/reload of the Slurm service on each node, so we do not want to risk causing faults with running jobs at that time.