This page is intended to act as a timeline of events for the Comet HPC project, as well as major changes in functionality or policies relating to the system.
This week we will start the journey to full production release of Comet.
This will be a staggered approach; initially we will be limiting new project signups to 20 or so individuals / teams who have approached us over the previous months. We will invite those users to register new HPC Projects for Comet and start the process of managing their teams, getting access to Comet and using the resources. We anticipate these 20 projects to be spread over a week or two to minimise the chance of disruption or bottlenecks caused by a mass enrolment of new projects and users. Anyone included in this set of users/projects will receive an email invite and instructions within the next 48 hours.
We expect these new project registrations to begin no later than Friday 21st November.
Once this phase of new projects is registered and working we will then move to the wider release of Comet to the rest of the University research community. This will also involve the registration of all the legacy Rocket HPC Projects and their activation on Comet. This will enable users in existing Rocket projects to move their data and workloads from the old to new system.
We expect the migration of Rocket projects and full release of Comet to the entire University research community to start no later than the first week of December.
During this beta test / initial project registration phase, the owner of an HPC Project you are a member of may use the 'notify' function within the HPC Portal to start sending reminders to members of their projects. Please remember that you will only be able to log in once a project you are a member of is live and you have passed the HPC Driving Test; Rocket projects will not be live until after this phase.
To be clear - you do not have to re-register all of your Rocket projects - this will be done for you. A small amount of maintenance may be required to reactivate your Rocket projects at the time we announce the migration from Rocket to Comet, but in most cases this should be limited to updating your project descriptions and/or ensuring you have project owners listed.
The date for Rocket project/data migration will be announced within the next two weeks.
We are now moving into the next phase of the Comet HPC project.
Starting next week (the week ending Friday 21st November) we will be contacting a number of people who have asked to create HPC projects, but whose requests we have been unable to register (due to the freeze of new accounts on Rocket since this summer). We will prioritise approximately 20 such requests which have been on hold since the summer.
Over the next two weeks we will invite a number of those outstanding requests to re-register using Comet, following the new process on the HPC Portal website. Once registered they will be able to manage additional project members and start working on Comet using the new resources and software. The RSE team will be available throughout this period to provide guidance and support during the initial registration process and when it comes to using the new project management tools directly within the HPC Portal website itself - unlike Rocket, use of the ITService ticketing system or the Grouper group management system is not necessary to sign up for, or to manage team members of your projects on Comet.
Assuming that all goes well with the registration of those initial projects, we will look to move as quickly as possible to full release of Comet to the entire research community. All users will then have the opportunity to register new HPC project accounts and to start the process of migrating any existing Rocket projects and data to the new facility.
In anticipation of the release of the HPC Project registration workflow we urge everyone to sign up for and complete the HPC Driving Test as soon as possible; once Comet is fully launched you must have passed the HPC Driving Test in order to request a new project or to log in and use the system. Doing this now will save you time when it comes to migrating from Rocket in the very near future.
In summary - the project phases now look like this:
As soon as the system launches in full production mode we would encourage all existing Rocket users to begin the process of migrating existing data and workflows as an urgent priority due to the lack of support, parts and general low levels of reliability with that system.
Beginning early next week those users shortlisted to be part of the initial round of new HPC Project registrations will be contacted directly with instructions on how to start the process.
We will be making a change to the cometloginaccess user group today - this will change the current, manually-defined membership list to one which is generated automatically from membership of an active HPC Project under My HPC Projects. This is, of course, in preparation for the move to the Beta testing phase and the approach of the production launch of the Comet service.
All early-access users should retain login permissions as they are already members of at least one test project (comettestgroup1 - 3).
Please get in touch if you encounter any issues logging in from 6th November onwards.
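As a rough illustration of the membership rule described above, the Python sketch below derives a login group from project membership. It is illustrative only: the function name login_access_members, the dictionary layout and the user names are hypothetical and do not reflect how the HPC Portal or the cometloginaccess group is actually implemented.

```python
def login_access_members(projects):
    """Collect every user who belongs to at least one active project."""
    members = set()
    for project in projects:
        # Only active projects contribute members to the login group.
        if project["active"]:
            members.update(project["members"])
    return members


# Hypothetical example data: one active test project, one inactive project.
example_projects = [
    {"name": "comettestgroup1", "active": True, "members": {"user_a", "user_b"}},
    {"name": "dormant_project", "active": False, "members": {"user_c"}},
]

print(sorted(login_access_members(example_projects)))
```

In this sketch a user who belongs only to an inactive project (user_c) does not appear in the resulting group, which matches the intent that login access follows active project membership.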
The changes to Slurm accounts on Comet (mandatory submission of an --account code, and membership of either the _free partitions only or both the _free and _paid partitions) will be implemented this week after an initial test on Friday 31st October.
You may temporarily lose access while your new Slurm permissions are installed and your test permissions (membership of hpcusers and no restrictions on partitions) are removed. Once the changes are implemented our early-access users must include --account in their job submissions, but should otherwise not notice any difference (i.e. your comettestgroup1-3 account codes will still grant access to everything).
This is in prep for the first full-day HPC training workshop on November 11th.
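To illustrate what a mandatory account code means for job submission, the Python sketch below assembles an sbatch command line that always carries an --account option. The helper name build_sbatch_command and the script name job.sh are hypothetical; comettestgroup1 is one of the early-access test account codes mentioned above. No partition name is given because the final partition names are not stated here.

```python
import shlex


def build_sbatch_command(script, account, partition=None):
    """Return an sbatch argument list that always includes an account code."""
    if not account:
        # Mirrors the new policy: a submission without an account code is refused.
        raise ValueError("an account code is required for job submission")
    args = ["sbatch", f"--account={account}"]
    if partition:
        args.append(f"--partition={partition}")
    args.append(script)
    return args


# Hypothetical job script submitted against an early-access test account.
command = build_sbatch_command("job.sh", "comettestgroup1")
print(shlex.join(command))  # sbatch --account=comettestgroup1 job.sh
```

The same option can be placed inside the job script itself as an #SBATCH --account=... directive, which avoids having to type it on every submission.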
We will shortly be starting work on implementing the changes needed to take the Slurm configuration from the current test configuration to one which is necessary to operate Comet in production.
Current test environment configuration:
Intended production configuration:
This effectively means the enforcement of account codes (since your 'default' account will have no permissions), and the restriction of jobs submitted to paid queues to those projects which are (a) active, (b) funded, and (c) have a positive remaining balance.
Early access users are in groups which are, for all reasonable purposes, considered funded (your membership of comettestgroup1, comettestgroup2 or comettestgroup3). So early access users will still be able to submit against paid partitions with those account codes until we move to beta testing and real project registrations.
However, you may temporarily lose the ability to submit Slurm jobs over the coming days whilst we work on this implementation.
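The production rule for paid queues described above can be pictured as a simple three-part check. The Python sketch below is illustrative only: the Project class, its field names and the example values are hypothetical, and it does not represent how the check is actually enforced by Slurm or the HPC Portal.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    active: bool
    funded: bool
    balance: float  # remaining balance, in whatever unit billing uses


def may_use_paid_partitions(project):
    """A project qualifies only if it is active, funded and has a positive balance."""
    return project.active and project.funded and project.balance > 0


# Hypothetical examples: one fully eligible project and one with no balance left.
eligible = Project("comettestgroup1", active=True, funded=True, balance=100.0)
exhausted = Project("example_project", active=True, funded=True, balance=0.0)

print(may_use_paid_partitions(eligible))   # True
print(may_use_paid_partitions(exhausted))  # False
```

All three conditions must hold at once, so a funded project whose balance has reached zero would fall back to the free partitions only.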
We are rapidly approaching the launch of Comet - the replacement HPC facility at Newcastle University.
Firstly our (longer than anticipated!) acceptance testing of the new infrastructure has been completed, and we are pleased to announce that the system has passed all test criteria. We are now working with an expanded group of early-access users from across the research community to test more 'real world' workloads. This will continue for another few weeks.
Performance Testing
The system has exceeded all performance criteria which were set during procurement, and the aggregate math/floating point performance for the facility was recorded as exceeding 1120 TFLOPS (1127.2, to be precise).
As a like-for-like comparison the CPU models used in most of the Rocket nodes are: Xeon E5-2699 v4 = 0.75-1.0 TFLOPS per CPU, x2 CPU per server for 1.5-2.0 TFLOPS per node
The actual benchmarks run on Comet show the following: AMD Epyc 9745 CPU = 9 TFLOPS per CPU, x2 CPU per server for 18 TFLOPS per node
Additionally, Rocket has ~5400 CPU cores (averaging 44 cores per node), whilst Comet has more than 14000 (with 256 cores per node). CPU based performance therefore should be substantially improved in almost all workloads.
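The per-node comparison above can be worked through with a few lines of arithmetic. The Python sketch below uses only the figures quoted in this section; the core counts are the approximate values given (~5400 and "more than 14000"), so the core ratio is a rough lower bound rather than an exact figure.

```python
# Figures quoted above (TFLOPS per CPU, two CPUs per server).
rocket_tflops_per_cpu = (0.75, 1.0)   # Xeon E5-2699 v4, low and high estimate
comet_tflops_per_cpu = 9.0            # AMD Epyc 9745, as benchmarked
cpus_per_node = 2

rocket_node = tuple(x * cpus_per_node for x in rocket_tflops_per_cpu)
comet_node = comet_tflops_per_cpu * cpus_per_node

# Speed-up per node, against the high and low Rocket estimates respectively.
speedup_low = comet_node / rocket_node[1]
speedup_high = comet_node / rocket_node[0]

# Approximate ratio of total CPU cores (Comet has "more than 14000").
core_ratio = 14000 / 5400

print(f"Rocket node: {rocket_node[0]}-{rocket_node[1]} TFLOPS")
print(f"Comet node: {comet_node} TFLOPS")
print(f"Per-node speed-up: {speedup_low:.0f}x to {speedup_high:.0f}x")
print(f"Core count ratio: about {core_ratio:.1f}x")
```

On these figures a single Comet node delivers roughly 9 to 12 times the floating point throughput of a typical Rocket node, alongside about 2.6 times as many cores across the facility.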
RAM has also increased from 128GB in most nodes of Rocket to 1.5TB in everything but the GPU nodes (they are 768GB instead). As a result, almost every node type of Comet can be considered equal or superior to the old 'bigmem' nodes of Rocket.
High speed file storage (/nobackup) has been increased; now ~2.2PB, compared to the ~500TB of Rocket.
Local scratch drives (/scratch) on all nodes have also been swapped to solid-state NVMe units, so also much, much faster than Rocket.
GPU Capabilities
Whilst the performance of the replacement GPU cards (32x Nvidia L40S 48GB, and 4x H100 96GB) was not part of our performance testing criteria, a number of the members of our early-access and testing community have benchmarked these against figures available from the GPU cards (4x Nvidia Tesla V100 16GB) available on Rocket and on the BEDE Tier 2 facility, showing significant performance gains in all areas.
New Features
New features not available on Rocket have also been implemented and extensively tested, as a quick summary:
We have prepared documentation for using Open OnDemand inside our documentation wiki: https://hpc.researchcomputing.ncl.ac.uk/dokuwiki/doku.php?id=advanced:interactive
Both of the installed container toolsets also support the creation of new containers directly on the HPC itself, something many of our contemporaries do not offer. We understand that the availability and support of Linux environments across the University can make creating and managing containers locally somewhat difficult right now. Hence we have gone a step further and allow you to create containers on Comet itself, not just run them, all without the need for sudo or additional security permissions.
You can find our documentation for containers on Comet here: https://hpc.researchcomputing.ncl.ac.uk/dokuwiki/doku.php?id=advanced:containers
NOTE: Our HPC vendor is still working on Docker implementation, and we expect this to continue as we move into the migration phase and official Comet service launch.
HPC Portal / Documentation Website
The https://hpc.researchcomputing.ncl.ac.uk website replaced the old Rocket pages on the NUIT website and has continued to be updated throughout the summer with documentation written to specifically support Comet, including the new features described above. This should be your one-stop shop for all things HPC at Newcastle University.
We have also written extensive 'Getting Started', 'Software', 'FAQ' and 'Policies' sections which should cover most of the questions we've been asked over the years in relation to HPC.
Our documentation is available here: https://hpc.researchcomputing.ncl.ac.uk/dokuwiki/
For those of you planning to include HPC costs in your projects and grant proposals, the website features an interactive 'cost calculator' to help you plan the financial envelope for your predicted HPC resource use. It is somewhat modelled on the ARCHER2 application form (those of you who have applied for ARCHER2 access will be familiar with this process of predicting the type and quantity of resources needed). Many of you have already made use of this, and we're happy to provide guidance to anyone who needs it.
You can access the cost calculator at any time here: https://hpc.researchcomputing.ncl.ac.uk/calc/
Also released this summer was the 'HPC Driving Test' - this is a mandatory requirement that all HPC users must take (and pass!) before access to Comet will be approved.
The driving test is online: https://hpc.researchcomputing.ncl.ac.uk/quiz/
I'd urge everyone to try the practice quiz and attempt the HPC Driving Test as soon as possible - given how close we are to launching the Comet service. The intention is not to 'gatekeep' the facility, but to ensure that everyone who has access to the resources also has the core knowledge to make use of them effectively and with minimal disruption to other users.
To be clear once more: You will not be allowed to access Comet until you have passed the HPC Driving Test.
Also, once the Comet service changes to production, you will manage all aspects of your HPC projects through the 'My HPC Projects' area of the website (not yet accessible), including:
Expected Timeline / Next Steps
The RSE team will run the first of the new 'Introduction to HPC' workshops on November 11th, and the signup for this event has been substantial! This will be the first of our new events to take place on Comet (rather than Rocket), and we hope that it will offer new users a glimpse of what is possible using HPC facilities at Newcastle.
The first of a series of induction events for existing Rocket/HPC users is planned for the beginning of December; we are running a number of these over the coming months as an opportunity for existing Rocket users to engage with us, see others in the Newcastle HPC community and to discuss what changes and new facilities are available on Comet.
We expect to start allowing registration of new HPC projects towards the end of November - as this has been paused over the summer while we completed the integration of Comet. This will be the first opportunity for the research community to move data and compute workflows over to Comet. The initial number of projects/users in this phase will be limited so that we are not overwhelmed with queries related to possible teething troubles within the first few weeks.
All being well, at that point we will then open up the 'My HPC Projects' section of the new website, and this will allow you to manage your existing Rocket projects and request new ones. This will be the trigger to switch to Comet as the supported HPC facility, and Rocket will then move on to a wind-down process.
Ideally, we would like to complete the migration of all Rocket projects by end of calendar year due to the ongoing fragility of Rocket; though there are factors in this process which are outside of our control (i.e. hardware failures).
Expect a further update as we move into November and have a more concrete initial date for registration of new projects, followed by the secondary date for the start of your Rocket project migrations.
The RSE team are currently preparing for the first full-day HPC training workshop to be delivered using Comet. The workshop is scheduled for November 11th and is for new users of HPC in Newcastle.
This will be the first 'live' use of Comet, so staff are working hard to ensure that the teaching material works correctly and reliably. As previously mentioned to early-access users, there may be some configuration changes / interruptions to the early-access service needed to move towards this more production-like setting.
If the workshop event works as expected, this will give us a good indication that the service is ready to move towards beta-testing and the subsequent registration of new HPC Projects in November.
Our HPC vendor is now in the process of implementing High-Availability across the two login servers. This means that instead of having to manually type the address of one server or the other, you will soon be able to just use the system alias comet.hpc.ncl.ac.uk and you will get an SSH connection, regardless of which login server is available.
Our Connection Details wiki page has been updated with the new hostnames to use - these are available to use now.
Email invites to Comet alpha testers and early access users will be going out shortly. The email will describe how to apply and what features will be available; additional, early-access information is available on our Comet early-access users wiki page.
Early-access will be for approximately 4 weeks, running until the end of October. If you are one of the early-access users then you can access and use all of the features of Comet free of charge.
Once early-access is finished we will move to beta testing and invite new HPC projects and groups to sign up and use the facility in a production manner. Based on current timelines this is likely to be early November.
[As of Monday 6th October the email invitations to alpha-testers & early-access users have now been sent.]
Our proposed plan for October is to open the Comet facility up for alpha testing for a period of 4-6 weeks.
This will allow all of our test users as well as an extended list of experienced HPC users to use the system in a free-form manner, try out code and push the limits of what works. Here is a summary of what we expect alpha testing to entail:
$HOME
/nobackup
At the end of the alpha testing window we will remove all content created under the shared test directories and all provisioned user accounts will be removed, in preparation for running the service in production mode - this will revert to requiring passing the HPC driving test and registration of new HPC projects as we move forward.
Those identified as alpha testers will be contacted shortly and then be given the information needed in order to access the service. If you are not contacted then you will have a further opportunity to get early access during the beta testing phase, possibly as a member of the selected group of HPC projects who will be invited to use the system in a production manner. Finally, after beta testing all users will be invited to move from Rocket to Comet and transfer their data and workflows as we fully transition to a production service.
Podman container support is now fully tested and integrated - you can create and manage containers on Comet login nodes and run those container images across any of our compute nodes; including for GPU jobs; many of our users have been requesting this, so it's a big step forwards.
Job checkpointing is also implemented, so you can stop and resume your Slurm jobs at any point if you follow the instructions.
Only a few test criteria remain at this point - we have over 400 individual test criteria records logged and now fewer than 20 remaining. We expect to finish test sign-off very shortly and move into Alpha testing for most of October, where our testers will stress the system, try out all of the new tools and have an opportunity to do things ahead of our formal opening to our Beta tester projects.
Beta testers (likely starting late October / early November) will then be invited to use the system in a more production-like manner, ahead of the full release of the facility to the University and the sending of migration requests to Rocket users.
Hardware OpenGL / Nvidia 3D visualisation is now implemented on the Open OnDemand VNC Desktop Session (GPU) session types. This means applications can take advantage of the Nvidia GPU cards for hardware accelerated rendering and display output.
Apptainer/Singularity container support is now 100% implemented and working across all compute nodes. We are still working on Podman container support, which is also nearing completion. Jupyter Lab and RStudio have also now been added to our interactive session options via Open OnDemand.
We are in the process of testing the Slurm job checkpointing implementation.
The RSE team and academic colleagues have moved to testing higher level application functionality; container technology, interactive applications (Matlab), software 3D visualisation and commercial software packages are installed.
Dear HPC and Research community colleagues,
The installation and commissioning of the replacement High Performance Computer facility, Comet, is continuing to progress.
We are now at a position where we are testing functionality and performance of the new hardware, and this continues to make progress against the extensive list of criteria set in our agreement with the supplier. Whilst we are not quite at the point where we can give you a definite date for the availability of the new system, we are confident it will be ready in autumn 2025. The information below gives you the broad outline of the schedule over the coming months.
In order to progress with the remainder of the project schedule, the academic steering group for AHPC within Newcastle have decided that we must now suspend the creation, on Rocket, of new HPC projects and project registration requests. This is so we can reduce the efforts required to continue to support Rocket (an aging system with the capacity to pull in a huge level of support time) and focus our efforts on readying Comet for use. New project creation will resume later this summer (on Comet, instead of Rocket) towards the end of the Beta Testing phase as we move into the migration of Rocket projects to Comet.
Comet hardware was installed, basic operating system functionality and network connectivity implemented.
This is our current stage of work. We are methodically testing both hardware and basic system functions, as well as higher level capabilities such as the scheduler, GPU and CPU compute functions and storage performance. The testing involves co-operation between the system vendor, the Research Software Engineering team, and academic colleagues from across the University.
Once we complete user acceptance testing, we will invite a number of expert HPC users to explore the performance and functionality of the new system. This will be an opportunity to test the new software, tools and performance available in a sandpit environment. During this period system configuration is likely to be in flux, with job queues, software and configuration changing as we shape the system to more accurately cater for the needs of our users and the rest of the Newcastle University infrastructure. Any work during this period will be considered 'at risk' and the system will not be suitable for 'production' work due to the possibility of missing software.
Alpha testers will be expected to work largely without support during this time. We expect alpha testing to last up to a month.
After alpha testing has been completed, and any configuration changes that it highlights have been applied, we move into the beta testing phase. During this time we will invite a wider range of existing Rocket projects or new projects who have been waiting to start over the summer to apply for accounts on Comet.
Beta testers will experience the Comet service in a 'production' setting: use of the new HPC portal website (https://hpc.researchcomputing.ncl.ac.uk) to manage their projects and team members, usage reports and metrics, and the introduction of job queues which will match production (including 'free' and 'paid' jobs and billing). Most software and functionality (including the new interactive jobs support and container technology) will be in place at this point.
The amount of production, or live projects, created on Comet will be strictly limited at this point.
Beta testers will be able to call upon RSE support resource at regular drop-in surgeries, though will be expected to be able to access and use HPC systems independently; this phase will not be suitable for novice or new users. We expect beta testing to also last approximately one month.
After the beta testing phase has completed, and we are confident that Comet is working in a 'production' manner, we will start the process of migration of projects and users from Rocket.
The lifespan of Rocket will be very limited at this point, and the project timeline states that Rocket will be shut down THREE MONTHS after the completion of user acceptance testing, listed previously. Once we move to the migration phase you will be given a slot during which you will be required to move over any code from Rocket to Comet, after which any further jobs will be expected to run on Comet. Your account on Rocket will then be shut down.
Again, drop-in surgeries will be available to provide advice to projects and users who are asked to migrate, but we cannot move files or data on your behalf.
If you take no action during this phase, then your code, data and jobs on Rocket will be lost at the point it is shut down. There will be no exceptions for extending Rocket. It is already beyond end of life.
Please note that all users will be required to undertake (and pass) an online HPC knowledge test before an account will be provisioned for you on Comet. Further information will be published on this test in due course, and you are advised to take the test as soon as it becomes available to coincide with the start of the Beta Testing stage. To prepare your project for migration, you may wish to move historical data to RDW and scripts to GitHub.
A range of induction sessions for existing HPC users, and our Introduction to HPC training workshops for new users will be available from the start of the migration phase.
More details on Comet, our revamped support documentation and our workshop offerings are available on the new HPC support portal:
- https://hpc.researchcomputing.ncl.ac.uk
Sincerely,
HPC Working Group and Support Team
hpc.researchcomputing@newcastle.ac.uk Research Software Engineering https://rse.ncldata.dev/contact