This page is under development. We invite interested parties to contribute to this wiki, especially if anything is mistaken or difficult to understand. A Nextflow working group is available on Teams. Please [[https://teams.microsoft.com/l/channel/19%3A2f1e4b0f03b14c2289145f57c2886324%40thread.tacv2/HPC%20NextFlow?groupId=7059214c-2200-4ad6-a739-9d350c74c7a9&tenantId=9c5012c9-b616-44c2-a917-66814fbe3e87|join with this link]].

===== Nextflow =====

Nextflow is a system for managing complex workflows, which can include many small processes as well as intensive processes that require high performance computing facilities. This page is not a tutorial on developing a Nextflow workflow; it outlines best practices for running Nextflow in an HPC environment on Comet.

==== Installation ====

Nextflow 24.10.2 is available as a module. Depending on demand, we may provide future versions of Nextflow as additional modules. Nextflow 24.10.2 can be loaded in the standard way:

  module load Nextflow/24.10.2

If a different version is required, you can install [[https://docs.seqera.io/nextflow/install|nextflow]] into a folder in your home directory. Download the binary and make it executable using:

  curl -s https://get.nextflow.io | bash
  chmod +x nextflow

Copy the binary to ''~/.local/bin'' and use:

  echo $PATH

to check that ''~/.local/bin'' is in your path. To confirm that Nextflow has been correctly installed, use:

  nextflow info

which should return some basic information about Nextflow.

Nextflow requires Java, and Java versions below 17 are deprecated. Java 17 is the default on Comet, but keep an eye on this in future releases. Java 21 can be loaded as a module if required. The version of Java you have loaded can be seen using:

  java -version

===== Running Nextflow =====

Nextflow can run its workflows 'locally' and/or via slurm.
Nextflow offers a flexible range of options for running different parts of a workflow. However, on Comet, Nextflow should use only the 'local' and 'slurm' executors((an executor is the part of nextflow which submits and manages jobs)). 'Local' jobs run on the node they are invoked from, and 'slurm' jobs are submitted as proper HPC jobs. 'Local' should be used for short, small jobs; 'slurm' should be used for long, resource-intensive jobs.

For this discussion, the **login node** is the node a user is assigned at login; the load on this node should be kept as low as possible. A **worker node** is a node on Comet to which slurm assigns a job. The Nextflow **local node** is the node on which the **head process** (the main Nextflow job) is running, i.e. the worker node that slurm assigned to the main Nextflow job.

==== Nextflow on Comet ====

**For production runs, the entire nextflow workflow must be submitted to slurm**. In this case, Nextflow can still use both the 'local' and 'slurm' executors internally. The only difference is that 'local' jobs run on the worker node to which the main Nextflow process has been assigned, rather than on the login node. This keeps the head process off the login nodes, which is especially important where the workflow includes processes running on the local executor. The login node should not be used for long-running processes.

The slurm sbatch file used for Nextflow should request sufficient resources for the main process and any 'local' parts of the workflow. Nextflow can be submitted to slurm using a standard sbatch file that loads the Java (>=17) and apptainer modules.
For example,

  #!/bin/bash
  #SBATCH --account=
  #SBATCH --mem=257GB
  #SBATCH --cpus-per-task=1
  #SBATCH --time=00:09:00
  #SBATCH --partition=
  
  module load Java/21.0.5
  module load apptainer
  
  nextflow run job.nf -profile short_free -c me.config

In the above example, complete the ''--account='' and ''--partition='' lines with your account name and the name of the appropriate partition for the head process of Nextflow.

When run on the login node or another computer, Nextflow supplies a live record of the workflow status, allowing users to monitor progress and see whether any part of the workflow has failed. This is not available directly in the terminal when the Nextflow job is submitted to slurm. To recover the live updates, e.g.

  Nextflow 26.04.1 is available - Please consider updating your version to it
  N E X T F L O W  ~  version 99.99.99
  Launching `job.nf` [aaa_bbb] revision: abcde1234
  executor >  slurm (8)
  [11/12ab21] Hello (3)      | 3 of 3 ✔
  [71/34c56d] Iamrunning (3) | 3 of 3 ✔
  [54/7e8d9e] GoodBye        | 1 of 1 ✔

we suggest monitoring the sbatch ''.out'' file (by default named after the slurm job ID) with ''tail'':

  tail -f -n 100 slurm-.out

This shows the normal live-update Nextflow screen so that progress can be monitored. Increase ''-n'' if more lines are needed. The file does not display clearly in other tools or text editors, but reads well with ''tail -f''.

== Using containers in your Workflow ==

Instead of docker or singularity, [[Apptainer]] should be used on Comet. Apptainer can be loaded using:

  module load apptainer

Apptainer is a drop-in replacement for Singularity, so any instructions found online for Singularity will work with Apptainer. Simply replace 'singularity' with 'apptainer' in any instructions.

Containers should be built on the login node or your local machine. Worker nodes are not set up for container builds, so building containers during a workflow is unsupported. Once built, containers should be transferred to the Nextflow apptainer cacheDir defined in your config file, e.g.
  apptainer {
      enabled = true
      autoMounts = true
      cacheDir = '/nobackup/project//containers/'
  }

===== Nextflow Workflows =====

Nextflow workflows are controlled via ''*.nf'' and ''*.config'' files. The ''*.nf'' files are used to run individual steps in the workflow, while ''*.config'' files set parameters controlling the workflow itself, although these parameters can apply to individual workflow steps as well as globally to the whole workflow.

On Comet, a profile can be defined for each partition in the nextflow.config file. This ensures that there is always a default parameter set for each partition. The command line flag ''-profile '' can then be used to select a given set of defaults, e.g.

  profiles {
      short_free {
          params { partition = 'short_free' }
          process { resourceLimits = [ memory: 1000.GB, cpus: 128, time: 10.m ] }
          executor { name = 'slurm'
                     queueSize = 50 }
      }
      default_free {
          params { partition = 'default_free' }
          process { resourceLimits = [ memory: 1000.GB, cpus: 128, time: 24.h ] }
          executor { name = 'slurm'
                     queueSize = 50 }
      }
      long_free {
          params { partition = 'long_free' }
          process { resourceLimits = [ memory: 1000.GB, cpus: 128, time: 96.h ] }
          executor { name = 'slurm'
                     queueSize = 50 }
      }
      local {
          executor { name = 'local' }
      }
  }

These parameters can be overridden in a workflow-specific config file.

=== Slurm submission inside the workflow ===

Inside the Nextflow workflow, large jobs should be submitted to slurm via profiles with executor = 'slurm' in the config file. For each part of the workflow that uses the slurm executor, Nextflow generates an sbatch file and submits it to the queue automatically. The Nextflow config files must therefore contain all the information required to submit a job, including the account and partition flags. If any sbatch parameter required by slurm is missing, the jobs will not run.
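As a minimal sketch of how the account and partition could reach the generated sbatch files, a process scope might pass them through Nextflow's ''queue'' and ''clusterOptions'' directives. This assumes the ''params.account'' and ''params.partition'' names used in the examples on this page; the site-wide nextflow.config on comet may do this differently, so treat the block as illustrative only:

```groovy
// Illustrative sketch only; not the actual site nextflow.config.
// Assumes params.account and params.partition are set in a profile
// or in a config file passed with -c.
process {
    executor = 'slurm'

    // Closures are evaluated when each task is submitted, so values set in
    // profiles and in workflow-specific config files are picked up.
    queue = { params.partition }                        // partition the task is submitted to
    clusterOptions = { "--account=${params.account}" }  // extra flags passed to sbatch
}
```

With a block like this in place, a missing ''account'' or ''partition'' parameter would show up in the generated sbatch options, which is one reason to check the resolved configuration before a long run.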
A nextflow.config file in one of the locations Nextflow searches for config files((https://training.nextflow.io/2.2/it/basic_training/config/)) can set sensible default parameters, and users should create a local, workflow-specific config file which overrides the relevant parts of nextflow.config. For example, the account and partition flags mentioned above:

  params {
      account = 'comet_rsehpc'
      partition = 'short_free'
  }

would override any account and partition values stored in nextflow.config. The local, workflow-specific ''*.config'' can be used in a Nextflow run using:

  nextflow run job.nf -profile short_free -c local.config

This runs the job.nf script using the short_free profile (containing, for example, defaults for the short_free queue on Comet). Any parameter set in local.config overrides the corresponding value from nextflow.config and the short_free profile. A user can see a summary of the parameters used by Nextflow, taking profiles and local configs into account, using:

  nextflow -c local.config config -profile short_free

The priority order for config parameters can be found [[https://training.nextflow.io/2.1/basic_training/config/|here]], and example config files can be found at [[https://nf-co.re/configs/|nf-core]].

Because Nextflow can create large, randomly named directory trees containing many files, we suggest running ''nextflow clean''((https://bioinfo-guidelines.readthedocs.io/en/latest/nextflow/running.html#cleanup))((https://nf-co.re/docs/running/advanced-topics/managing_work_directory_growth)) on a regular basis.

Below are a few suggestions on best practice for config files:

  * Config files should also contain a queueSize tag set to 50-200 to limit the number of jobs submitted at once.
  * We recommend storing containers in your directory on nobackup so that they do not take up too much of your home quota.
  * Avoid keeping 100,000s of files: copy useful output to ''nobackup/home'' and use ''nextflow clean'' regularly.
  * Some HPC workflow tips can be found at [[https://seqera.io/blog/5_tips_for_hpc_users/]] and [[https://seqera.io/blog/5-more-tips-for-nextflow-user-on-hpc/]].
  * Include ''module = 'apptainer:Java/21.0.5' '' in the process scope of the config file. We can add this to the nextflow.config, but any additional modules needed for the workflow can be added here, separated by a '':''. Modules needed for the head process can be loaded in the sbatch file as normal. Modules can also be loaded from the ''*.nf'' files, inside the '' """ """ '' section, which also allows ''module purge'' to be used to unload modules in case of a version conflict.

== Controlling hybrid workflows ==

Submitting many short jobs to slurm can be inefficient, as it takes Nextflow a while to detect that a slurm job has completed. It is faster to define a ''withLabel'' tag with executor = 'local', which runs those jobs on the node where the head process is running. ''withLabel'' tags can be written into local.config files, which allows specific workflow steps to have their own settings, e.g. ncpu, memory, time, executor((this can either be 'local' to run on the head node, or slurm, which submits an sbatch job)), modules etc.

Sometimes the Java VM can take considerable resources ([[https://nf-co.re/bacass/1.1.1/docs/usage]], [[https://seqera.io/blog/best-practices-deploying-pipelines-with-hpc-workload-managers/]]). You can limit this by setting JVM flags in the ''NXF_OPTS'' environment variable((https://seqera.io/blog/best-practices-deploying-pipelines-with-hpc-workload-managers/)):

  export NXF_OPTS="-Xms1000m -Xmx4g"

==== Files and Directories for your workflow ====

  * slurm submission script to submit the main Nextflow process to a worker node
  * nextflow.config file, which should contain reasonable defaults
  * local config files, which should contain parameters specific to a given workflow
  * cacheDir for your apptainer containers
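
As an illustration of the hybrid approach described under "Controlling hybrid workflows", a local config file might route short steps to the local executor and heavy steps to slurm using ''withLabel'' selectors. This is a sketch only: the label names ''quick_local'' and ''heavy_slurm'' and all resource values are hypothetical and should be adapted to the workflow.

```groovy
// Illustrative sketch; label names and resource values are hypothetical.
process {
    // Short, small steps: run on the node hosting the head process,
    // so they must fit within the resources requested in the sbatch file.
    withLabel: 'quick_local' {
        executor = 'local'
        cpus     = 1
        memory   = 2.GB
        time     = 5.m
    }

    // Long, resource-intensive steps: submitted to slurm as separate jobs.
    withLabel: 'heavy_slurm' {
        executor = 'slurm'
        cpus     = 16
        memory   = 64.GB
        time     = 12.h
    }
}
```

A process picks up one of these settings by declaring the matching label in its definition in the ''*.nf'' file, for example ''label 'quick_local' ''.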