====== HPC Service Status ======
This page is intended to act as a timeline of events for the Comet HPC project, as well as major changes in functionality or policies relating to the system.
----
===== (10th) February 2026 - Lustre write issues =====
A number of users have reported strange write issues on Lustre (''/nobackup''). These appear to manifest as text editors being unable to write files to the Lustre filesystem, while tools such as ''echo'', ''cat'' and standard shell redirection (''>'' and ''>>'') are seemingly unaffected.
Kernel messages (e.g. from ''dmesg'') on affected nodes show a number of Lustre warnings and errors that have occurred this afternoon.
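If you are logged in to an affected node and want to see whether it is reporting these, a quick check is the following (a minimal sketch; note that reading the kernel log may require privileges that ordinary users do not have on some systems):
<code bash>
# show recent Lustre-related kernel messages with human-readable timestamps
dmesg -T | grep -i lustre | tail -n 20
</code>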
An incident has been raised with our vendor to assess the situation and provide a fix.
**15:18 - Update**
Our support vendor indicates that extreme load levels on one of the Lustre storage appliances may have been the cause of this incident. The affected services have been restarted and nodes appear to be reading/writing the ''/nobackup'' filesystem normally again. We will be following up with our vendor to get a more detailed explanation of the high load scenario and to determine how to prevent this from happening again.
----
===== (10th) February 2026 - Orca now installed =====
Orca (https://www.faccts.de/orca/) is now installed on Comet and can be loaded as follows:
''module load Orca''
Please note that the module name is case-sensitive, with only the first character upper-case: ''ORCA'' or ''orca'' will not load it.
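For example (a minimal sketch; the ''module avail'' output will depend on which versions are actually installed):
<code bash>
# the module name is case-sensitive: 'Orca', not 'ORCA' or 'orca'
module load Orca

# list the Orca versions available on Comet
module avail Orca
</code>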
----
===== (4th) February 2026 - Comet login service restored =====
The vendor has restored the Comet login service (e.g. ''ssh comet.hpc.ncl.ac.uk''). Unfortunately, it appears that the SSH host keys for one of the login servers have been lost, so its fingerprints have changed.
If you attempt to log in to Comet now you will see a __warning__ from your SSH client that looks like this:
<code>
$ ssh comet.hpc.ncl.ac.uk
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the ED25519 key sent by the remote host is
SHA256:ABCDEFGHIJKLMNOPQRTU12345678.
Please contact your system administrator.
$
</code>
This is expected, since that server's original fingerprints have now changed. To resolve this **one-time issue**, run the following command on your **Linux** or **macOS** device:
<code>
$ ssh-keygen -f $HOME/.ssh/known_hosts -R comet.hpc.ncl.ac.uk
</code>
If you are using an alternative SSH client on a **Windows** platform, the error message from your software (e.g. PuTTY, MobaXterm or similar) //should// indicate the //equivalent// command to run.
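If you would like to double-check the key you are being offered before accepting it, one option is to fetch the current host key and print its fingerprint (a minimal sketch, assuming ''ssh-keyscan'' and ''ssh-keygen'' are installed; note that this page does not list the official fingerprints):
<code bash>
# fetch the current ED25519 host key and print its SHA256 fingerprint
ssh-keyscan -t ed25519 comet.hpc.ncl.ac.uk 2>/dev/null | ssh-keygen -lf -
</code>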
----
===== (4th) February 2026 - Comet login nodes =====
An issue has been identified with the Comet login nodes. The usual access method of ''ssh comet.hpc.ncl.ac.uk'' or ''ssh comet.ncl.ac.uk'' is not working.
Please use the direct connection alternatives ''ssh cometlogin02.comet.hpc.ncl.ac.uk'' or ''ssh cometlogin01.comet.hpc.ncl.ac.uk'' to //temporarily// bypass the faulty configuration. The incident has been raised with our HPC vendor, who are treating a fix as a priority this afternoon.
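If you connect frequently, a temporary alias in your SSH configuration can save some typing; a minimal sketch for **Linux**/**macOS** (the ''Host'' alias and ''User'' value are placeholders, substitute your own username):
<code>
# ~/.ssh/config - temporary entry while the usual address is unavailable
Host comet-direct
    HostName cometlogin01.comet.hpc.ncl.ac.uk
    User your_username
</code>
After adding this, ''ssh comet-direct'' will connect you to that login node.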
----
===== (4th) February 2026 - Planned Changes to Slurm Configuration =====
Following the unintended configuration of Slurm priorities and pre-emption rules, we have requested that our HPC vendor make the following changes to the operation of Comet:
* Priority levels will be removed from all partitions - //excluding// **default_paid** and **low-latency_paid**
* Job pre-emption and rescheduling will be disabled on all partitions - //excluding// **default_paid** and **low-latency_paid**
* Sharing / over-subscription of compute node resources will be disabled on all partitions
In addition, we are taking the opportunity to better distribute the available compute node resources across partitions. The following changes will be made:
* Two further compute nodes (''hmem009'' and ''hmem010'') will be added to the **short_free**, **long_free** and **interactive-std_free** partitions, giving a total of **9** compute nodes / **2304** cores across __all__ free partitions.
* Seven further compute nodes (''hmem001-004'' and ''hmem006-008'') will be added to the **short_paid**, **long_paid** and **interactive-std_paid** partitions, giving a total of **39** compute nodes / **9984** cores across all paid partitions, plus a further **4** nodes accessed from the **low-latency_paid** partition //if they are idle//, for a total combined core count of **11008**.
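Once the changes are in place, you can check the partition layout yourself with ''sinfo''; a minimal sketch (the output columns are partition name, node count and CPUs per node):
<code bash>
# list each partition with its node count and per-node CPU count
sinfo -o "%P %D %c"
</code>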
The design intention for Comet was to put most of our compute resource into standard compute nodes (i.e. //not// low-latency), as historical data from operating Rocket indicated that most of our workloads fit that classification. However, we do have some users who need large-scale parallel jobs, and that is what the **low-latency_paid** partition is for. Since low-latency jobs are not running most of the time, we want the ability to use that resource when it is not needed for its original purpose.
The job pre-emption rules allow for this, and the specification as set for Comet stated:
> Spare capacity in the low-latency_paid partition can be used by //default_paid// jobs to prevent it sitting idle and allow for 'burst' expansion of the default_paid partition... but this capacity //must// be evacuated and priority given over if a job is submitted to the //low-latency_paid// partition which would require them.
> Jobs submitted to short_paid and long_paid are not subject to this configuration, neither are jobs submitted to any of the free partitions.
This does mean that if the **default_paid** partition is at capacity, your job //may// run on extra capacity provided by **low-latency_paid**, but it is then at risk of being stopped/rescheduled //if// a real low-latency job is submitted. You should always consider adding [[:advanced:slurm_checkpoints|checkpointing]] to your jobs so they can resume after a service interruption.
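As a sketch of what that can look like in a job script (''my_app'', ''save_checkpoint'', ''checkpoint.dat'' and the resource requests are placeholders, and the advance warning only helps if the partition's grace time allows it):
<code bash>
#!/bin/bash
#SBATCH --partition=default_paid
#SBATCH --nodes=1
#SBATCH --ntasks=16
#SBATCH --time=24:00:00
#SBATCH --requeue              # allow Slurm to requeue this job if it is pre-empted
#SBATCH --signal=B:USR1@300    # signal the batch shell 5 minutes before the job is stopped

# write a checkpoint when Slurm warns that the job is about to be stopped
trap 'save_checkpoint; exit 0' USR1

# run the application in the background so the shell can handle the signal,
# restarting from the latest checkpoint if one exists
if [ -f checkpoint.dat ]; then
    ./my_app --restart checkpoint.dat &
else
    ./my_app &
fi
wait
</code>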
The work to make the changes outlined above will be carried out on the morning of **Wednesday 11th of February**, starting at **9:00am**. A resource reservation window has been put in place to prevent any jobs from running or starting during this work. We expect the change itself to take only a few minutes, but it involves a restart/reload of the Slurm service on each node, so we do not want to risk causing faults in running jobs at that time.
----
===== (29th) January 2026 - Ongoing issues with job rescheduling and pre-emption =====
A number of Comet users have noticed that certain jobs (mainly in **default_free** and **long_free**, but also seen sporadically elsewhere) have been unexpectedly stopped, paused or rescheduled, even after running for many hours or several days.
This is **not** expected behaviour: we do not intend job pre-emption based on priority levels across the vast majority of Comet. The only place where this is part of the design specification is the set of nodes that make up the **low-latency** partition. If those nodes are idle, they **may** take on extra load from the **default_paid** job queue to prevent them being under-utilised. Pre-emption should not be in place //anywhere else//.
We are working with the HPC support vendor to understand why jobs outside of the low-latency partition are being stopped and rescheduled, as this is clearly a waste of compute time for those affected jobs.
Once the cause is identified and a solution designed we will update you on the timeline to get this resolved.
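In the meantime, if you suspect one of your own jobs was affected, ''sacct'' can show its state history; a minimal sketch (replace the job ID with your own; a pre-empted job will typically show a state such as ''PREEMPTED'' or ''REQUEUED''):
<code bash>
# show the state, run time and nodes for a specific job
sacct -j 1234567 --format=JobID,JobName,Partition,State,Elapsed,NodeList
</code>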
----
===== (29th) January 2026 - New software added =====
The following new software has been added by our HPC support vendor:
* System packages (these do __not__ need to be loaded via ''module''): screen, tmux, emacs, ImageMagick, bc
* New modules
* **GROMACS** (NVIDIA CUDA / OpenCL), molecular dynamics package: load with ''module load GROMACS/2026-cuda'' or ''module load GROMACS/2026-opencl''
* **Miniforge**, conda tool configured to use the conda-forge software channels: load with ''module load Miniforge''
* **UDUNITS** (libudunits), units-of-measure library: load with ''module load UDUNITS''
* **GDAL**, geospatial data library: load with ''module load GDAL''
* **LAPACK**, linear algebra library (built on BLAS): load with ''module load LAPACK''
* Open requests:
* Gaussian / Gauss View
* CASTEP
* Hybre
* Stata
* VS Code
The list of all software requests can be found on the [[advanced:software_list|software page for Comet]].
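As a usage sketch for the new **Miniforge** module (the environment name and package list below are placeholders, not recommendations):
<code bash>
# load the Miniforge module, then create and activate a conda-forge environment
module load Miniforge
conda create -n myenv python=3.12 numpy
conda activate myenv   # if this fails, follow the 'conda init' instructions that conda prints
</code>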
----
===== (22nd) January 2026 - Issues with Comet this week =====
Now that Comet is coming into heavy usage, some new issues have emerged from the end of last week and into this week.
Apologies to those of you who have experienced these issues, and thank you all for your patience and for continuing to let us know about problems as they occur (see https://hpc.researchcomputing.ncl.ac.uk/, email hpc.researchcomputing@newcastle.ac.uk, or log a ticket in NUService).
==== /tmp space on nodes ====
Currently, all nodes have both ''/tmp'' and ''/scratch'' directories (on the node's internal fast NVMe drive). ''/scratch'' is a very large partition intended for temporary working files. Unfortunately, many applications have been attempting to use the much smaller ''/tmp'' directory instead.
We have seen the ''/tmp'' directory on compute nodes filling up, sometimes leading to job failures with error messages about being unable to create temporary files, as well as other, more obscure errors.
=== What's being done? ===
Working with our supplier, OCF, we have asked them to:
* Set ''TMPDIR'' to point to the ''/scratch'' partition, so that any well-behaved application/library writes to that location instead. This has been completed.
* Replace the ''/tmp'' directory with a symlink to ''/scratch''. This work must be done on each node individually: taking it gracefully out of service (drain), making the change and re-instating the node.
These changes should not affect any jobs: running jobs are allowed to complete, and new jobs are not sent to nodes in 'drain'. However, it will take some days to complete this change on all nodes.
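Until that work is complete, you can point applications at ''/scratch'' explicitly from within a job script; a minimal sketch (the directory layout under ''/scratch'' is an assumption, adjust it to whatever exists on the node):
<code bash>
# create a per-job temporary directory on the node-local NVMe drive
export TMPDIR=/scratch/$USER/$SLURM_JOB_ID
mkdir -p "$TMPDIR"

# ... run your application here ...

# clean up the temporary directory when the job finishes
rm -rf "$TMPDIR"
</code>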
==== Issues with Open OnDemand sessions ====
Various issues have been reported with Open OnDemand VNC desktop, RStudio and MATLAB sessions. Most commonly, sessions have failed to start and have instead jumped immediately to 'completed'.
The issues have been tracked down to node ''compute030'', which is the first node 'in line' for free sessions in Open OnDemand.
=== What's being done? ===
''compute030'' has been put into 'drain' so that once running jobs have completed it can be rebuilt.
In the meantime, other nodes are now picking up new Open OnDemand sessions and we've had no further reports of issues.
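If you want to confirm the state of a particular node yourself, ''sinfo'' can report it; a minimal sketch for ''compute030'' (the output columns are node name, state and the reason it was set):
<code bash>
# show the node name, its state (e.g. drain) and the reason recorded for that state
sinfo -n compute030 -o "%n %t %E"
</code>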
Please do [[mailto:hpc.researchcomputing@newcastle.ac.uk|email us]] at hpc.researchcomputing@newcastle.ac.uk if you notice problems on Comet, even things like missing libraries, which you might have dealt with by local installs on Rocket. We can't promise to fix everything centrally but we do aim to have Comet's core software operating properly.
----
==== Previous Updates ====
* [[:status:index_2025|2025]]
----
[[:wiki:index|Back to HPC Documentation Home]]