Task Array Slurm Job

This is a very simple example of how to run an application/script/code/analysis in parallel on multiple input files.

For a more complete worked example, consider reading through our Advanced Slurm Job Optimisation guide.

Where you have multiple sets of data to process, and they can all be processed using the same commands, then it is beneficial to ask Slurm to automate this for us.

Instead of submitting 4 separate jobs, each taking 1 hour to process a data file, we can submit one job that runs all 4 at the same time.

  • Uses the HPC Project group myhpcproject; change to use your real HPC Project name
  • Submitted to the free default_free partition (--partition=default_free)
  • Requests 4 array tasks, numbered 1 to 4 (--array=1-4)
  • Requests 1 task with 1 CPU for each array task (--ntasks-per-node=1 and --cpus-per-task=1), for a total allocation of 4 CPU cores (array_tasks * cpus_per_task)
  • Requests 1GB of RAM for each array task (--mem=1G)
  • Requests up to 1 hour of runtime for each array task (--time=01:00:00), for a maximum possible total of 4 Compute Hours ((array_tasks * cpus_per_task) * time_in_hours)
  • Prints the time each task started to its log file
  • Prints the name of the compute node each task is running on
  • Prints the time each task finished to its log file

Of the single-node Slurm job types, the Task Array job type is the most complex, and its job script needs a little more explanation than the previous types. Its advantage is that you do not need to change anything in your existing application/code/script: if it can already process a named data/input file, then it will work in parallel via the Task Array without any changes.

#!/bin/bash

#SBATCH --account=myhpcproject
#SBATCH --partition=default_free
#SBATCH --array=1-4
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --time=01:00:00

# Log when we started
echo "Job started at: $(date)"

# Log which compute node this task is running on
echo "Job running on: ${SLURM_JOB_NODELIST}"

# Add your custom commands you want to run here
my_command input_file.data.${SLURM_ARRAY_TASK_ID} > output.log.${SLURM_ARRAY_TASK_ID}

# Log when we finished
echo "Job finished at: $(date)"

Notice that we have used a new Slurm variable: SLURM_ARRAY_TASK_ID. Slurm sets this variable only for array jobs, and it receives the Task ID of each array task launched by Slurm. Remember that we have asked for 4 array tasks (--array=1-4), so each of the tasks which Slurm launches for us will see its own Task ID in $SLURM_ARRAY_TASK_ID, e.g.

Task #1 will see SLURM_ARRAY_TASK_ID = 1
Task #2 will see SLURM_ARRAY_TASK_ID = 2
Task #3 will see SLURM_ARRAY_TASK_ID = 3
Task #4 will see SLURM_ARRAY_TASK_ID = 4

This allows each task to run the same commands on a different set of data. As long as we name the input data files with the numbers of the Task IDs we expect (in this case, 1 through 4), we can set up a directory of data files to be processed all at the same time.
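Because each task builds its input file name from the Task ID, a missing or misnamed file only shows up once that task runs. The following is a minimal sketch of a guard that could run before the real command. The function name resolve_input is hypothetical and not part of Slurm; the input_file.data naming is the scheme used on this page.

```shell
#!/bin/bash
# Sketch: work out this task's input file and check it before running anything.
# resolve_input is a hypothetical helper name, not a Slurm command.
resolve_input() {
    # Slurm only sets SLURM_ARRAY_TASK_ID for array tasks; fail clearly otherwise.
    if [ -z "${SLURM_ARRAY_TASK_ID:-}" ]; then
        echo "SLURM_ARRAY_TASK_ID is not set; is this an array job?" >&2
        return 1
    fi
    local input="input_file.data.${SLURM_ARRAY_TASK_ID}"
    # Refuse to continue if the expected file is missing or unreadable.
    if [ ! -r "${input}" ]; then
        echo "Input file ${input} is missing or unreadable" >&2
        return 1
    fi
    echo "${input}"
}

# In the job script this could replace the bare file name, for example:
#   input=$(resolve_input) || exit 1
#   my_command "${input}" > "output.log.${SLURM_ARRAY_TASK_ID}"
```

With a guard like this, a task whose file is absent stops with a clear message in its own log rather than passing a non-existent file name to my_command.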

In the case above, assuming we had a command named my_command which processes some data, we will create a directory of input data files named:

  • input_file.data.1
  • input_file.data.2
  • input_file.data.3
  • input_file.data.4
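Existing data files rarely arrive with numbered names. One possible way to prepare such a directory, sketched here under assumptions, is to create numbered symbolic links that point at the original files. The function name number_inputs and the raw_data directory are hypothetical examples.

```shell
#!/bin/bash
# Sketch: give arbitrarily named data files the numbered names used on this page.
# number_inputs and the raw_data directory are hypothetical examples.
number_inputs() {
    local source_dir="$1"
    local n=0
    local f
    # The glob expands in sorted order, so the numbering is repeatable.
    for f in "${source_dir}"/*; do
        [ -f "${f}" ] || continue
        n=$((n + 1))
        # Symbolic links avoid copying large data files; link to the absolute path.
        ln -s "$(cd "$(dirname "${f}")" && pwd)/$(basename "${f}")" "input_file.data.${n}"
    done
    # Print how many files were numbered: the upper bound N for --array=1-N.
    echo "${n}"
}

# Example, run from the directory the job will be submitted from:
#   number_inputs raw_data
```

The printed count is the highest Task ID that has a matching input file.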

Slurm will launch the job script once for each Task ID, so each one of those files will be processed, in parallel, by my_command. In effect, Slurm does this for us:

my_command input_file.data.1 > output.log.1
my_command input_file.data.2 > output.log.2
my_command input_file.data.3 > output.log.3
my_command input_file.data.4 > output.log.4

But all at the same time.
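Before submitting, the expansion of the command line can be checked with a plain loop that sets the variable by hand. This is only a sketch: it does not involve Slurm, runs serially, and prints the commands rather than running them. The function name preview_array is hypothetical.

```shell
#!/bin/bash
# Sketch: print the command each array task would run, without involving Slurm.
# Setting SLURM_ARRAY_TASK_ID by hand only imitates what Slurm does for array tasks.
preview_array() {
    local first="$1"
    local last="$2"
    local SLURM_ARRAY_TASK_ID
    for SLURM_ARRAY_TASK_ID in $(seq "${first}" "${last}"); do
        echo "my_command input_file.data.${SLURM_ARRAY_TASK_ID} > output.log.${SLURM_ARRAY_TASK_ID}"
    done
}

# Prints the same four command lines listed above.
preview_array 1 4
```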

This is a very simple way of processing huge numbers of data files with exactly the same code/application/script at the same time. It is, however, limited to the maximum number of running jobs that your project is allowed (see the MaxJobs value in the Comet Resource Limits - what they mean guide), and it requires some planning beforehand to structure your data.
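Part of that planning can sometimes be avoided by leaving the original file names alone and listing them in a text file, one per line, so that each task picks the line matching its Task ID. This is a sketch of that idea rather than a recommended site practice; pick_input and file_list.txt are hypothetical names.

```shell
#!/bin/bash
# Sketch: map a Task ID to a file name via a list file instead of renaming data.
# pick_input and file_list.txt are hypothetical names chosen for this example.
pick_input() {
    local list_file="$1"
    local task_id="$2"
    # sed -n 'Np' prints only line N of the list file.
    sed -n "${task_id}p" "${list_file}"
}

# In an array job script this might be used as:
#   input=$(pick_input file_list.txt "${SLURM_ARRAY_TASK_ID}")
#   my_command "${input}" > "output.log.${SLURM_ARRAY_TASK_ID}"
```

The list file then needs as many lines as there are Task IDs in the array.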

A more complete, worked example can be found in the Advanced Slurm Job Optimisation article.


Developed and operated by
Research Software Engineering
Copyright © Newcastle University
Contact us @rseteam