Task Array Slurm Job

When you have multiple sets of data that can all be processed using the same commands, it is beneficial to ask Slurm to automate this for you.

Instead of submitting 4 jobs, each taking 1 hour to process a data file, we can submit one job to run all 4 at the same time.

Uses the HPC Project group myhpcproject; change to use your real HPC Project name
Submitted to the free default_free partition (--partition=default_free)
Requests 4 parallel tasks (--ntasks-per-node=4)
Requests 1 CPU per task (--cpus-per-task=1), for a total allocation of 4 CPU cores (ntasks_per_node * cpus_per_task)
Requests 1GB of RAM (--mem=1G)
Requests up to 1 hour of runtime (--time=01:00:00), for a maximum possible total of 4 Compute Hours ((ntasks_per_node * cpus_per_task) * time_in_hours)
Prints the time the job started to the log file
Prints the name of the compute node(s) it will run on
Prints the time the job finished to the log file
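
The core and Compute Hour figures in the list above can be checked with a few lines of shell arithmetic. This is only a sketch: the variable names are illustrative, are not read by Slurm, and simply mirror the values requested in the job script.

```shell
#!/bin/bash

# Illustrative values only; they mirror the #SBATCH requests and are not Slurm variables.
NTASKS_PER_NODE=4
CPUS_PER_TASK=1
TIME_IN_HOURS=1

# Total CPU cores = ntasks_per_node * cpus_per_task
TOTAL_CORES=$((NTASKS_PER_NODE * CPUS_PER_TASK))

# Maximum Compute Hours = total cores * time_in_hours
COMPUTE_HOURS=$((TOTAL_CORES * TIME_IN_HOURS))

echo "CPU cores allocated: $TOTAL_CORES"
echo "Maximum Compute Hours: $COMPUTE_HOURS"
```

Changing any one of the three values shows how quickly the maximum Compute Hours grow, for example when the time limit is raised.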

Of the single-node Slurm job types, the Task Array job type is the most complex, and the job script needs a little more explanation than the previous types:

#!/bin/bash

#SBATCH --account=myhpcproject
#SBATCH --partition=default_free
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --time=01:00:00

# Log when we started
echo "Job started at: $(date)"

# Show which node(s) we are running on
HOSTNAMES=$(scontrol show hostnames "$SLURM_JOB_NODELIST")
echo "Job is running on: $HOSTNAMES"

# Add any 'module load' commands here

# Add your custom commands you want to run here
my_command "input_file.data.${SLURM_ARRAY_TASK_ID}" > "output.log.${SLURM_ARRAY_TASK_ID}"

# Log when we finished
echo "Job finished at: $(date)"

Notice that we have used a new Slurm variable: SLURM_ARRAY_TASK_ID. This variable receives the Task ID of each task launched by Slurm. Remember that we have asked for 4 tasks (--ntasks-per-node=4), so each task that Slurm launches for us will see its own Task ID in $SLURM_ARRAY_TASK_ID, e.g.

Task #1 will see SLURM_ARRAY_TASK_ID = 1
Task #2 will see SLURM_ARRAY_TASK_ID = 2
Task #3 will see SLURM_ARRAY_TASK_ID = 3
Task #4 will see SLURM_ARRAY_TASK_ID = 4

This allows each task to run the same commands on a different set of data. As long as we name the input data files with the numbers of the Task IDs we expect (in this case, 1 through 4), we can set up a directory of data files to be processed all at the same time.

In the case above, assuming we had a command named my_command which processes some data, we would create a directory of input data files named:

input_file.data.1
input_file.data.2
input_file.data.3
input_file.data.4
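
Because everything depends on the Task ID matching a file name, it can help to check both before running the real command. The sketch below uses a hypothetical helper, input_for_task, which is not part of Slurm; it assumes the input_file.data.N naming shown above. In stock Slurm, SLURM_ARRAY_TASK_ID is only populated for jobs submitted as a job array (for example with the sbatch --array option), so a guard against an empty value is a cheap safeguard.

```shell
#!/bin/bash

# Hypothetical helper (not part of Slurm): print the input file for a task ID,
# or fail with a message if the ID is empty or the file does not exist.
input_for_task() {
    local task_id="$1"
    local data_dir="${2:-.}"

    if [ -z "$task_id" ]; then
        echo "No task ID given; is SLURM_ARRAY_TASK_ID set?" >&2
        return 1
    fi

    local input_file="${data_dir}/input_file.data.${task_id}"
    if [ ! -f "$input_file" ]; then
        echo "Missing input file: $input_file" >&2
        return 1
    fi

    echo "$input_file"
}

# Inside the job script this could be used as:
#   INPUT=$(input_for_task "$SLURM_ARRAY_TASK_ID") || exit 1
#   my_command "$INPUT" > "output.log.${SLURM_ARRAY_TASK_ID}"
```

Failing early with a clear message is usually easier to diagnose than an empty output log after the job has finished.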

Slurm will launch the job script, and each one of those files will then be processed, in parallel, by my_command. In effect, Slurm does this for us:

my_command input_file.data.1 > output.log.1
my_command input_file.data.2 > output.log.2
my_command input_file.data.3 > output.log.3
my_command input_file.data.4 > output.log.4

But all at the same time.
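
To get a feel for this behaviour without submitting a job, the same pattern can be emulated on a single machine with background processes. This is a rough sketch, not how Slurm itself works internally: my_command is replaced by a stand-in function, SLURM_ARRAY_TASK_ID is set by hand rather than by Slurm, and the files are created in a scratch directory using the input_file.data.N naming from the file listing.

```shell
#!/bin/bash

# Stand-in for the real my_command (hypothetical): upper-case the input file.
my_command() {
    tr '[:lower:]' '[:upper:]' < "$1"
}

# Work in a scratch directory so no real data is touched.
WORKDIR=$(mktemp -d)
cd "$WORKDIR" || exit 1

# Create four small input files, one per Task ID.
for ID in 1 2 3 4; do
    echo "sample data $ID" > "input_file.data.$ID"
done

# Start one background process per ID, then wait for all of them to finish.
for ID in 1 2 3 4; do
    SLURM_ARRAY_TASK_ID=$ID   # set by hand here; Slurm sets it in a real job
    my_command "input_file.data.${SLURM_ARRAY_TASK_ID}" > "output.log.${SLURM_ARRAY_TASK_ID}" &
done
wait

# List the output files that were produced.
ls output.log.*
```

Each output.log.N file should contain the upper-cased contents of the matching input_file.data.N, which is a quick way to confirm the naming scheme before using real data.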
