Where you have multiple sets of data to process, and they can all be processed using the same commands, it is beneficial to ask Slurm to automate this for you.
Instead of submitting 4 jobs, each taking 1 hour to process a data file, we can submit one job that runs all 4 at the same time.
A Task Array job of this type requests the following resources:

--partition=default_free   the partition (queue) to submit the job to
--ntasks-per-node=4        run 4 tasks on the node
--cpus-per-task=1          1 CPU core per task; cores requested = ntasks_per_node * cpus_per_task
--mem=1G                   1 GB of memory for the job
--time=01:00:00            a 1-hour wall-clock limit; core-hours consumed = (ntasks_per_node * cpus_per_task) * time_in_hours
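With the values above, that is 4 * 1 = 4 cores requested, and at most (4 * 1) * 1 = 4 core-hours consumed if the job runs for its full 1-hour limit.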
Of the single-node Slurm job types, the Task Array job is the most complex, and its job script needs a little more explanation than the previous types:
#!/bin/bash
#SBATCH --account=myhpcproject
#SBATCH --partition=default_free
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --time=01:00:00
# Run this script as an array of 4 tasks, with Task IDs 1 to 4
#SBATCH --array=1-4

# Log when we started
echo "Job started at: `date`"

# Show which node(s) we are running on
HOSTNAMES=`scontrol show hostnames $SLURM_JOB_NODELIST`
echo "Job is running on: $HOSTNAMES"

# Add any 'module load' commands here

# Add your custom commands you want to run here
my_command input_file.data.${SLURM_ARRAY_TASK_ID} > output.log.${SLURM_ARRAY_TASK_ID}

# Log when we finished
echo "Job finished at: `date`"
Notice that we have used a new Slurm variable: SLURM_ARRAY_TASK_ID. Each task launched by Slurm receives its own Task ID in this variable. Remember that we have asked for 4 array tasks (#SBATCH --array=1-4), so each of those tasks will see a different value in $SLURM_ARRAY_TASK_ID, e.g.
Task #1 will see SLURM_ARRAY_TASK_ID = 1
Task #2 will see SLURM_ARRAY_TASK_ID = 2
Task #3 will see SLURM_ARRAY_TASK_ID = 3
Task #4 will see SLURM_ARRAY_TASK_ID = 4
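As a minimal sketch of how the Task ID can drive per-task behaviour inside a job script, the snippet below picks an input file from a bash array; the file names here are made-up examples and not part of the guide above:

# Pick a per-task input based on $SLURM_ARRAY_TASK_ID
# (the file names below are hypothetical examples)
FILES=(sample_A.data sample_B.data sample_C.data sample_D.data)
INPUT=${FILES[$((SLURM_ARRAY_TASK_ID - 1))]}   # bash arrays start at 0, Task IDs start at 1
echo "Task ${SLURM_ARRAY_TASK_ID} will process ${INPUT}"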
This allows each task to run the same commands on a different set of data. As long as we name the input data files after the Task IDs we expect (in this case, 1 through 4), we can set up a directory of data files to be processed all at the same time.
In the case above, assuming we have a command named my_command which processes some data, we would create a directory of input data files named as follows (one way to prepare and check such files is sketched after the list):
input_file.data.1
input_file.data.2
input_file.data.3
input_file.data.4
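If the data starts life as one large, line-oriented file, a minimal sketch of preparing the four numbered inputs might look like this; all_input.data is a hypothetical file name, and GNU coreutils split is assumed to be available:

# Split one large input into 4 numbered chunks named input_file.data.1 .. input_file.data.4
# (assumes GNU split; --number=l/4 splits into 4 files without breaking lines)
split --numeric-suffixes=1 --suffix-length=1 --number=l/4 all_input.data input_file.data.

# Confirm all 4 numbered input files are present before submitting the job
ls -l input_file.data.{1,2,3,4}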
Slurm will launch the job script once for each task, and each one of those files will be processed, in parallel, by my_command. In effect, Slurm does this for us:
my_command input_file.data.1 > output.log.1
my_command input_file.data.2 > output.log.2
my_command input_file.data.3 > output.log.3
my_command input_file.data.4 > output.log.4
But all at the same time.
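Once the job script and input files are in place, the job is submitted in the usual way; task_array_job.sh is only an assumed name for the script above:

# Submit the Task Array job script (the file name here is an assumption)
sbatch task_array_job.sh

# Watch the individual array tasks queue and run
squeue -u $USER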