====== HPC Service Status ======

===== (6th) October 2025 - Comet early access =====

Email invites to Comet alpha testers and early-access users will be going out shortly. The email will describe how to apply and what features will be available; additional early-access information is available on our [[:advanced:comet_alpha_testers|Comet early-access users wiki page]]. Early access will run for approximately four weeks, until the end of October. If you are one of the early-access users, you can access and use all of the features of Comet free of charge.

Once early access is finished we will move to beta testing and invite new HPC projects and groups to sign up and use the facility in a production manner. Based on current timelines this is likely to be early November.

//[As of Monday 6th October the email invitations to alpha testers and early-access users have now been sent.]//

===== October 2025 - Comet HPC project update =====

Our proposed plan for October is to open the Comet facility up for //alpha// testing for a period of 4-6 weeks. This will allow all of our test users, as well as an extended list of experienced HPC users, to use the system in a free-form manner, try out code and push the limits of what works.
Here is a summary of what we expect alpha testing to entail:

  * There will be **no** registration of projects, teams or groups (alpha testers will //not// be able to register more people to join) - the list of alpha testers we contact will be the upper bound of who can access Comet in this period
  * You will have access to a personal ''$HOME'' directory on NFS, and a //shared test group// directory on Lustre / ''/nobackup''
  * You can use any pre-installed software
  * Command-line access via the login nodes and all compute nodes; run [[:started:slurm_basics|Slurm jobs]], build/run [[:advanced:containers|containers]], and run interactive/graphical applications via [[:advanced:interactive|Open OnDemand]]
  * No restrictions on accessing [[:started:paying|paid / free resource partitions]]
  * Not all software that was requested during the initial requirements gathering for Comet will be installed - this will still take place over the coming months
  * No monthly [[:policies:billing|billing or resource/financial reports]]
  * No enforcement of HPC driving tests
  * The level of support from the RSE team will be **limited** as we concentrate on further integration work between Comet and University systems (e.g. the RDW filestore) and on our upcoming Comet induction training workshops in anticipation of beta testing in early November
  * Comet should **not** be used for essential production work, as it will still be in some state of flux (we may, for example, change Slurm partitions or enforce account codes during this period)

At the end of the alpha testing window we will remove **all** content created under the shared test directories and delete **all** provisioned user accounts, in preparation for running the service in production mode - access will then revert to requiring the [[:started:register|HPC driving test]] and [[:policies:access|registration of new HPC projects]] as we move forward.
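As a concrete illustration of the access described above, a first job on Comet during alpha testing might look like the following minimal Slurm batch script. This is a sketch only: the partition name, time limit and log-file layout shown here are illustrative assumptions, not confirmed Comet configuration - check the [[:started:slurm_basics|Slurm basics]] page for the actual values.

```shell
#!/bin/bash
# Minimal Slurm batch script sketch for Comet alpha testing.
# NOTE: the partition name below is a hypothetical example, not a
# confirmed Comet partition - alpha testers have unrestricted access
# to both paid and free partitions during this period.
#SBATCH --job-name=alpha-smoke-test
#SBATCH --partition=default       # hypothetical partition name
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=00:10:00
#SBATCH --output=%x-%j.out        # job name and job ID in the log file name

# Keep any output under your $HOME or the shared test group directory
# on /nobackup - remember all shared test content is removed at the
# end of the alpha testing window.
echo "Running on $(hostname) as $USER"
```

Submit the script with ''sbatch test-job.sh'' and check its state with ''squeue --me''.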
Those identified as alpha testers will be contacted shortly and given the information needed to access the service. If you are not contacted, you will have a further opportunity to get early access during the beta testing phase, possibly as a member of the selected group of HPC projects who will be invited to use the system in a production manner. Finally, after beta testing, all users will be invited to move from Rocket to Comet and transfer their data and workflows as we fully transition to a production service.

===== (Late) September 2025 - Comet HPC project update =====

[[:advanced:podman|Podman]] container support is now fully tested and integrated - you can create and manage containers on Comet login nodes and run those container images across any of our compute nodes, including for GPU jobs. Many of our users have been requesting this, so it's a big step forwards. [[:advanced:slurm_checkpoints|Job checkpointing]] is also implemented, so you can stop and resume your Slurm jobs at any point if you follow the instructions.

Only a few test criteria remain at this point - we have over 400 individual test criteria records logged and now //fewer than 20 remaining//. We expect to finish test sign-off very shortly and move into **Alpha** testing for most of October, where our testers will stress the system, try out all of the new tools and have an opportunity to do things ahead of our formal opening to our **Beta** tester projects. **Beta** testers (likely starting late October / early November) will then be invited to use the system in a more production-like manner ahead of the full release of the facility to the University and migration requests being sent to Rocket users.

===== September 2025 - Comet HPC project update =====

[[:advanced:software:virtualgl|Hardware OpenGL / Nvidia 3D]] visualisation is now implemented on the [[:advanced:interactive|Open OnDemand]] VNC Desktop Session (GPU) session types.
This means applications can take advantage of the Nvidia GPU cards for hardware-accelerated rendering and display output. [[:advanced:software:apptainer|Apptainer/Singularity]] container support is now fully implemented and working across all compute nodes. We are still working on [[:advanced:software:podman|Podman]] container support, which is also nearing completion. [[:advanced:software:jupyter|Jupyter Lab]] and [[:advanced:software:rstudio|RStudio]] have also been added to our interactive session options via Open OnDemand. We are in the process of testing the [[:advanced:slurm_checkpoints|Slurm job checkpointing]] implementation.

===== August 2025 - Comet HPC project update =====

The RSE team and academic colleagues have moved on to testing higher-level application functionality; container technology, interactive applications ([[:advanced:software:matlab|Matlab]]), [[:advanced:software:mesa|software 3D visualisation]] and commercial software packages are installed.

===== July 2025 - update from the Comet HPC Project Team =====

Dear HPC and Research community colleagues,

The installation and commissioning of the replacement High Performance Computing facility, Comet, is continuing to progress. We are now at a position where we are testing the functionality and performance of the new hardware, and this continues to make progress against the extensive list of criteria set in our agreement with the supplier. Whilst we are not quite at the point where we can give you a definite date for the availability of the new system, we are confident it will be ready in autumn 2025; the information below gives you a broad outline of the schedule over the coming months.

In order to progress with the remainder of the project schedule, the academic steering group for AHPC within Newcastle have decided that **we must now suspend the creation, on Rocket, of new HPC projects and project registration requests**.
This is so we can reduce the effort required to continue to support Rocket (an ageing system with the capacity to consume a huge amount of support time) and focus our efforts on readying Comet for use. New project creation will resume later this summer (on Comet, instead of Rocket) towards the end of the Beta Testing phase, as we move into the migration of Rocket projects to Comet.

===== June 2025 - Status of the Comet HPC project =====

Comet hardware has been installed, and basic operating system functionality and network connectivity have been implemented.

----

===== Our project stages =====

  * User Acceptance Testing
  * Alpha Testing
  * Beta Testing
  * Rocket Project Migration

=== User Acceptance Testing ===

This is our current stage of work. We are methodically testing both the hardware and basic system functions, as well as higher-level capabilities such as the scheduler, GPU and CPU compute functions and storage performance. The testing involves co-operation between the system vendor, the Research Software Engineering team, and academic colleagues from across the University.

=== Alpha Testing ===

Once we complete user acceptance testing, we will invite a number of expert HPC users to explore the performance and functionality of the new system. This will be an opportunity to test the new software, tools and performance in a sandpit environment. During this period the system configuration is likely to be in flux; we will change job queues, software and configuration as we shape the system to more accurately cater for the needs of our users and the rest of the Newcastle University infrastructure. Any work during this period will be considered 'at risk' and the system will not be suitable for 'production' work due to the possibility of missing software. //Alpha testers will be expected to work largely without support during this time.
We expect alpha testing to last up to a month.//

=== Beta Testing ===

After alpha testing has been completed, and any configuration changes that it highlights have been applied, we will move into the beta testing phase. During this time we will invite a wider range of existing Rocket projects, or new projects who have been waiting to start over the summer, to apply for accounts on Comet. Beta testers will experience the Comet service in a 'production' setting: use of the new HPC portal website (https://hpc.researchcomputing.ncl.ac.uk) to manage their projects and team members; usage reports and metrics; and the introduction of job queues which will match production (including 'free' and 'paid' jobs and billing). Most software and functionality (including the new interactive jobs support and container technology) will be in place at this point. The number of production (live) projects created on Comet will be strictly limited at this stage. Beta testers will be able to call upon RSE support at regular drop-in surgeries, though they will be expected to be able to access and use HPC systems independently; this phase will not be suitable for novice or new users. We expect beta testing to also last approximately one month.

=== Rocket Project Migration ===

After the beta testing phase has completed, and we are confident that Comet is working in a 'production' manner, we will start the process of migrating projects and users from Rocket. The lifespan of Rocket will be very limited at this point, and the project timeline states that **Rocket will be shut down THREE MONTHS after the completion of user acceptance testing**, listed previously. Once we move to the migration phase you will be given a slot during which you will be required to move any code from Rocket to Comet, with the expectation that any further jobs will run on Comet. Your account on Rocket will then be shut down.
Again, drop-in surgeries will be available to provide advice to projects and users who are asked to migrate, but **we cannot move files or data on your behalf**.

**//If you take no action during this phase, then your code, data and jobs on Rocket will be lost at the point it is shut down. There will be no exceptions for extending Rocket. It is already beyond end of life.//**

----

====== Further Information ======

Please note that **all users will be required to undertake (and pass) an online HPC knowledge test before an account will be provisioned for you on Comet**. Further information on this test will be published in due course, and you are advised to take it as soon as it becomes available, to coincide with the start of the Beta Testing stage.

To prepare your project for migration, you may wish to move historical data to [[https://services.ncl.ac.uk/itservice/core-services/filestore/servicedefinition/|RDW]] and scripts to [[https://github.com/orgs/newcastleuniversity/sso|GitHub]].

A range of induction sessions for existing HPC users, and our Introduction to HPC training workshops for new users, will be available from the start of the migration phase. More details on Comet, our revamped support documentation and our workshop offerings are available on the new HPC support portal:

  - https://hpc.researchcomputing.ncl.ac.uk

Sincerely,

HPC Working Group and Support Team\\
hpc.researchcomputing@newcastle.ac.uk\\
Research Software Engineering\\
https://rse.ncldata.dev/contact

----

[[:wiki:index|Back to HPC Documentation Home]]