This page is under development. We invite interested parties to make contributions to this wiki, especially if anything is mistaken or difficult to understand.
A Nextflow working group is available on Teams. Please join with this link
Nextflow is a system which can manage complex workflows including many small processes as well as intensive processes requiring high performance computing facilities. This discussion is not designed as a tutorial for developing a nextflow workflow, but tries to outline best practices for running nextflow in an HPC environment on comet.
Nextflow 24.10.2 is available as a module (Nextflow/24.10.2). Depending on demand, we may provide future versions of Nextflow as additional modules.
If you require Nextflow 24.10.2 this can be loaded as a module in the standard way:
module load Nextflow/24.10.2
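If a different version is required, you can install Nextflow into a folder in your home directory:
curl -s https://get.nextflow.io | bash
chmod +x nextflow
The chmod command ensures that the binary is executable. Move the binary to a folder on your PATH, e.g. ~/.local/bin, and use the following to check that the folder is on your PATH and that nextflow runs:
echo $PATH
nextflow info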
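The PATH check can also be scripted. The following is a minimal sketch, assuming the binary was placed in ~/.local/bin; the helper name dir_on_path is hypothetical and not part of Nextflow or comet.

```shell
#!/bin/bash
# Hypothetical helper: succeed if directory $1 is an entry of $PATH.
dir_on_path() {
    case ":${PATH}:" in
        *":$1:"*) return 0 ;;
        *)        return 1 ;;
    esac
}

# Assumption: the nextflow binary was moved to ~/.local/bin.
if dir_on_path "$HOME/.local/bin"; then
    echo "$HOME/.local/bin is on PATH"
else
    echo "add to ~/.bashrc: export PATH=\"\$HOME/.local/bin:\$PATH\""
fi
```

Wrapping PATH in colons means the first and last entries are matched the same way as entries in the middle.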
Nextflow requires Java, and Java versions below 17 are deprecated. Java 17 is the default on comet, but keep an eye on this in future releases. Java 21 can be loaded as a module if required. The version of Java you have loaded can be seen using:
java -version
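To check the version inside a script, the major version can be parsed from the output. This is a sketch only: the helper name java_major is hypothetical, and it assumes the usual output format in which the version appears in double quotes on the line containing the word "version".

```shell
#!/bin/bash
# Hypothetical helper: extract the major Java version from a line such as
#   openjdk version "21.0.5" 2024-10-15
java_major() {
    local ver
    # Keep only the text between the first pair of double quotes.
    ver=$(printf '%s\n' "$1" | sed -n 's/^[^"]*"\([^"]*\)".*/\1/p')
    case "$ver" in
        1.*) ver=${ver#1.} ;;   # legacy scheme: 1.8.0_292 means Java 8
    esac
    # Drop everything from the first '.', '_' or '-' onwards.
    printf '%s\n' "${ver%%[._-]*}"
}

# java -version writes to stderr, so redirect it before reading.
if command -v java >/dev/null 2>&1; then
    major=$(java_major "$(java -version 2>&1 | grep -m 1 'version')")
    if [ "${major:-0}" -ge 17 ]; then
        echo "Java ${major} is recent enough for Nextflow"
    else
        echo "Java ${major:-unknown} is too old; load a newer Java module"
    fi
fi
```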
Nextflow can run its workflows 'locally' and/or via slurm. Nextflow offers a range of executors for running different parts of the workflow very flexibly. However, on comet, nextflow should make use of only the 'local' and 'slurm' executors. 'Local' jobs run on the node they are invoked from, and 'slurm' jobs are submitted as proper HPC jobs. 'Local' should be used for short, small jobs; 'slurm' should be used for long, resource-intensive jobs.
For this discussion we refer to the login node as the node a user is assigned at login and the load on this node should be minimised as much as possible. A worker node is a node on comet to which a job is assigned by slurm. The nextflow local node is the node on which the main nextflow job is running, i.e. the worker node that slurm assigned to the main nextflow job.
For production runs, the entire nextflow workflow must be submitted to slurm, so that the head process does not run on the login node. Nextflow can still make use of both 'local' and 'slurm' executors internally. The only difference is that 'local' jobs will run on the worker node to which the main nextflow process has been assigned, rather than on the login node. This is especially important where the nextflow workflow includes processes running on the local executor. The login node should not be used for long-running processes.
The slurm sbatch file used for nextflow should request sufficient resources for the main process and any 'local' parts of the workflow. Nextflow slurm submission can be carried out using a standard sbatch file that loads the Java (>=17) and apptainer modules.
For example,
#!/bin/bash
#SBATCH --account=<your_account_name>
#SBATCH --mem=257GB
#SBATCH --cpus-per-task=1
#SBATCH --time=00:09:00
#SBATCH --partition=<your_partition>
module load Java/21.0.5
module load apptainer
nextflow run job.nf -profile short_free -c me.config
In the above example, replace <your_account_name> and <your_partition> with your account name and the name of the appropriate partition for the head node of nextflow.
When run on the login node or another computer, nextflow supplies a live record of the workflow status, allowing users to monitor progress and see whether any part of the workflow has failed. This is not available directly in the terminal when the nextflow job is submitted to slurm. To recover the live updates, e.g.
Nextflow 26.04.1 is available - Please consider updating your version to it
N E X T F L O W ~ version 99.99.99
Launching `job.nf` [aaa_bbb] revision: abcde1234
executor > slurm (8)
[11/12ab21] Hello (3) | 3 of 3 ✔
[71/34c56d] Iamrunning (3) | 3 of 3 ✔
[54/7e8d9e] GoodBye | 1 of 1 ✔
follow the slurm .out file of the main nextflow job using tail:
tail -f -n 100 slurm-<process_id>.out
Here <process_id> is the slurm job ID of the main nextflow job. The -f flag makes tail keep printing new lines as they are written to the file.
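Submission and log following can be combined. The sketch below assumes the sbatch file does not set --output, so that slurm writes to its default slurm-<job_id>.out file in the submission directory; the helper name submit_and_follow and the script name in the usage line are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: submit an sbatch script and follow its slurm log.
submit_and_follow() {
    local script=$1 out jobid
    # --parsable makes sbatch print only "jobid" (or "jobid;cluster").
    out=$(sbatch --parsable "$script") || return 1
    jobid=${out%%;*}
    echo "Submitted job ${jobid}; following slurm-${jobid}.out (Ctrl-C to stop)"
    # -F keeps retrying until the file exists, which helps while the job is queued.
    tail -F -n 100 "slurm-${jobid}.out"
}

# Usage (the script name is an assumption):
#   submit_and_follow nextflow.sbatch
```

Stopping tail with Ctrl-C only stops the display; the slurm job itself keeps running.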
Apptainer should be used on comet instead of Docker or Singularity. Apptainer can be loaded using:
module load apptainer
Apptainer is a drop-in replacement for Singularity, so any instructions found online for Singularity will work with Apptainer. Simply replace 'singularity' with 'apptainer' in any instructions.
Containers should be built on the login node or your local machine. Worker nodes are not set up for container builds, so attempting to build a container during a workflow is unsupported.
Containers, once built, should be transferred to the nextflow apptainer cacheDir defined in your config file e.g.
apptainer {
    enabled = true
    autoMounts = true
    cacheDir = '/nobackup/project/<project_name>/containers/'
}
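For nextflow to find a pre-built image in cacheDir, the file name has to match what nextflow expects. The sketch below rests on an assumption that should be verified against your Nextflow version: that cached images are named by dropping the URI scheme, replacing ':' and '/' with '-', and appending '.img'. The helper name cache_image_name and the example image are hypothetical. The script only prints the pull command, to be run on the login node.

```shell
#!/bin/bash
# Assumption: nextflow names cached images by dropping the URI scheme,
# replacing ':' and '/' with '-', and appending '.img'. Verify on your version.
cache_image_name() {
    local name=${1#*://}      # drop e.g. 'docker://'
    name=${name//:/-}
    name=${name//\//-}
    printf '%s.img\n' "$name"
}

# Hypothetical image; <project_name> is left as a placeholder.
image='docker://quay.io/biocontainers/fastqc:0.11.9--0'
cache_dir='/nobackup/project/<project_name>/containers'
echo "apptainer pull ${cache_dir}/$(cache_image_name "$image") ${image}"
```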
Nextflow workflows are controlled via *.nf and *.config files. The *.nf files are used to run individual steps in the workflow, while *.config files set parameters controlling the workflow itself, although these parameters can operate on individual workflow steps as well as globally on the whole workflow.
On comet, profiles can be defined for each partition in the nextflow.config file. This ensures that there is always a default parameter set for each partition. The command line flag -profile can then be used to select a given set of defaults, e.g.
profiles {
    short_free {
        params {
            partition = 'short_free'
        }
        process {
            resourceLimits = [ memory: 1000.GB, cpus: 128, time: 10.m ]
        }
        executor {
            name = 'slurm'
            queueSize = 50
        }
    }
    default_free {
        params {
            partition = 'default_free'
        }
        process {
            resourceLimits = [ memory: 1000.GB, cpus: 128, time: 24.h ]
        }
        executor {
            name = 'slurm'
            queueSize = 50
        }
    }
    long_free {
        params {
            partition = 'long_free'
        }
        process {
            resourceLimits = [ memory: 1000.GB, cpus: 128, time: 96.h ]
        }
        executor {
            name = 'slurm'
            queueSize = 50
        }
    }
    local {
        executor {
            name = 'local'
        }
    }
}
Inside the nextflow workflow, large jobs should be submitted to slurm via profiles with
executor = 'slurm'
Sbatch files are generated by nextflow for each part of the workflow that uses the slurm executor, and are submitted to the queue automatically. Therefore, the nextflow config files must contain all the information required to submit a job. This includes the account and partition flags. If any of the sbatch parameters required by slurm are missing, the jobs will not run. A nextflow.config file in one of the locations nextflow searches for config files can set sensible default parameters, and users should create a local, workflow-specific config file which overrides the relevant parts of nextflow.config. For example, the aforementioned account and partition flags:
params {
    account = 'comet_rsehpc'
    partition = 'short_free'
}
nextflow run job.nf -profile short_free -c local.config
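The account and partition params only take effect if something in the configuration passes them on to slurm. A minimal sketch, assuming the nextflow.config in use does not already do this, uses the queue and clusterOptions process directives. The script writes a local.config in the current directory (overwriting any existing one) and, if nextflow is available, prints the merged configuration.

```shell
#!/bin/bash
# Write a workflow-specific config; values are the examples used on this page.
# The quoted 'EOF' stops the shell from expanding ${params.account}.
cat > local.config <<'EOF'
params {
    account   = 'comet_rsehpc'
    partition = 'short_free'
}

process {
    // Assumption: these two lines are not already set in nextflow.config.
    queue          = params.partition                 // partition for submitted jobs
    clusterOptions = "--account=${params.account}"    // passed through to sbatch
}
EOF

# Inspect the merged result before launching anything.
if command -v nextflow >/dev/null 2>&1; then
    nextflow -c local.config config -profile short_free
fi
```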
A user can see a summary of the parameters used by Nextflow, taking into account profiles and local configs, using:
nextflow -c local.config config -profile short_free
The priority order for config parameters can be found here and example config files can be found at nf-core
Because Nextflow can create large directory trees with randomly named directories and many files, we suggest running nextflow clean on a regular basis.
nextflow clean
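The clean command expects either the -n (dry run) or the -f (force) option. A cautious pattern is to preview first and delete afterwards. The wrapper names below are hypothetical; run names can be listed with nextflow log.

```shell
#!/bin/bash
# Hypothetical wrappers around `nextflow clean`.

# Dry run: list what would be removed without deleting anything.
nf_clean_preview() {
    nextflow clean -n "$@"
}

# Force the removal once the preview looks right.
nf_clean_apply() {
    nextflow clean -f "$@"
}

# Usage, run from the launch directory of the workflow:
#   nf_clean_preview -before <run_name>
#   nf_clean_apply   -before <run_name>
```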
Below we present a few suggestions on the best practice for config files:
nobackup/home
module = 'apptainer:Java/21.0.5' (multiple modules are separated by :)
""" """
module purge
Submitting many short jobs to slurm can be inefficient, as it takes Nextflow a while to detect that a slurm job has completed. It is faster to define a withLabel selector with executor = 'local', which runs those jobs on the node hosting the main nextflow job.
withLabel selectors can be written into local.config files, which allows specific workflow steps to have specific settings, e.g. ncpu, memory, time, executor, modules etc.
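As an illustration, a withLabel block for short steps might look like the following. This is a sketch only: the label name short_local, the file name labels.config, and the resource values are all hypothetical and should be adapted to the workflow.

```shell
#!/bin/bash
# Write a label-based override to its own config file.
# 'short_local' is a hypothetical label; the resource values are placeholders.
cat > labels.config <<'EOF'
process {
    withLabel: 'short_local' {
        executor = 'local'   // run on the node hosting the main nextflow job
        cpus     = 1
        memory   = 2.GB
        time     = 5.m
    }
}
EOF

# In job.nf, a process opts in with the directive:  label 'short_local'
# Then launch with:  nextflow run job.nf -profile short_free -c labels.config
```

The sbatch file for the main nextflow job then has to request enough resources to cover these local steps as well as the head process.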
Sometimes the Java VM can take considerable resources (see https://nf-co.re/bacass/1.1.1/docs/usage and https://seqera.io/blog/best-practices-deploying-pipelines-with-hpc-workload-managers/). You can limit this by setting the NXF_OPTS environment variable:
export NXF_OPTS="-Xms1000m -Xmx4g"