This is an old revision of the document!

Building a Parallel Task Array Solution

This is an example scenario, but the principles of how you would approach it (e.g. first building a pipeline to process a single input file, understanding your input data structure, and finally converting your pipeline to a task array) are applicable to all problems which are based on the analysis of many independent data files.

Steps to a solution

It's not often feasible to jump straight to a parallel task array solution, and we encourage HPC users to approach the problem in several stages:

Identify the problem
Build a single job solution
Test the single job process
Refactoring your input data
Building a multi job solution
Testing a multi job solution
Scaling up & resource limits

Identify the problem

In this example scenario we are faced with the following problem of analysing a large amount of image data:

We need to analyse files generated by an instrument.
We need to transform each image to remove uncessary data or to highlight certain features.
We have been provided with tens of thousands of images, all of which need to be processed before we can generate some metrics from the image.

In our example we will use the EuroSAT RGB dataset as our dataset. This is large enough to demonstrate the file quantity issue (over 27000 images), but each image is small enough to not require substantial processing time. In reality you will probably have datasets that are both large in number, as well as in complexity/size.

The processing of image data could mean anything, it may involve heuristics to identify features, the application of some image transformation algorithm or similar. But, in our case, let us assume that the workflow for analysing each input image is a relatively simple sequence of:

Converting the input file from RGB to greyscale
Extracting the min, max and median brightness of all pixels in the image
Recording the brightness values of the image filename for later analysis

We know that processing thousands of these files could be performed on a local desktop, but it may take days or weeks depending on the number (and size) of files we need to analyse. We want to offload the processing to the HPC so that we don't need to keep our local machine running 24×7.

Following this example

If you want to work through this example, you can download the EuroSAT RGB image dataset and set it up with the following script:

$ wget -nc -q https://zenodo.org/records/7711810/files/EuroSAT_RGB.zip?download=1 -O EUROSAT.zip
$ unzip -q -n EUROSAT.zip

The resulting directory tree should look like this:

EuroSAT_RGB
EuroSAT_RGB/Forest
EuroSAT_RGB/Highway
EuroSAT_RGB/Residential
…

Each of the directories should contain several thousand images, such as:

A single job solution

The first step is to build an image processing pipeline for a single file which performs each step of the workflow. Remember we need to:

Grab the name of the input image file
Convert the file from a colour RGB image to greyscale
Calculate the minimum, maximum and mean brightness of the image
Save the brightness value and filename for later processing

So each step of the pipeline should produce, in memory, something similar to this:

Step 1. Load source image Residential_78.jpg

Step 2. Convert RGB to greyscale

Step 3. Detect and extract the minimum, maximum and mean brightness levels

Here we have written a very basic Python application which takes the name of an input filename (-file), and the name of an output directory (-out). It then uses the PIL module to do the image transformation we mandated, and then extracts the min, max and mean brightness of the image, finally it saves an output file containing the filename and the pixel values we extracted, to be used in later processing. Download the file below and save it as image_processor.py:

#!/usr/bin/env python3
import argparse
import sys
import os
from PIL import Image, ImageStat

def process_an_image(filename = None, outdir = None):
	""" Process an image """
	print(f"Opening source image: {filename}")
	i = Image.open(filename)

	# Convert to greyscale
	print(f"Converting to greyscale")
	i_greyscale = i.convert('L')	

	# Extract pixel data
	print(f"Extracting pixel data")
	i_stat = ImageStat.Stat(i_greyscale)

	# Save metadata for later processing
	data = f"filename:{filename} min:{i_stat.extrema[0][0]} max:{i_stat.extrema[0][1]} mean:{i_stat.mean[0]}"
	output_filename = os.path.split(filename)[1]
	out_name = outdir + "/" + output_filename + ".txt"
	
	print(f"Saving pixel data as: {out_name}")
	f = open(out_name, "w")
	f.write(data)
	f.close

	# Exit
	sys.exit(0)


if __name__ == "__main__":
	parser = argparse.ArgumentParser("image_processor")
	parser.add_argument("-file", help="Name of file to process", type=str)
	parser.add_argument("-out", help="Name of output directory", type=str)
	args = parser.parse_args()
	filename = None
	outdir = None

	if args.file:
		filename = args.file

	if args.out:
		outdir = args.out		

	if filename and outdir:
		process_an_image(filename, outdir)
	else:
		print("You need to provide -file and -out arguments.")
		print("See: ./image_processor -h")
		sys.exit(1)

As this is a relatively lightweight process, on a single image file, we can test this out on a single image file from the EuroSAT dataset by running directly on one of the login nodes as follows:

# Load the Python module to ensure we are not relying on a system version of Python
$ module load Python/3.12.3
# Create a directory to hold our output data
$ mkdir metadata
# Now run our image processor on a single file
$ python3 image_processor.py -file EuroSAT_RGB/Residential/Residential_1.jpg -out metadata
Opening source image: EuroSAT_RGB/Residential/Residential_1.jpg
Converting to greyscale
Extracting pixel data
Saving pixel data as: metadata/Residential_1.jpg.txt
$

Checking the output file we can see it has recorded the min/max/mean pixel brightness values as intended:

$ cat metadata/Residential_1.jpg.txt 
filename:EuroSAT_RGB/Residential/Residential_1.jpg min:57 max:253 mean:82.02685546875
$

In terms of sequence of operations, this looks like:

stateDiagram-v2 Open_Residential_1.jpg --> Process_Residential_1.jpg Process_Residential_1.jpg --> Save_Metadata

We have tested this directly at the command line, next we should check it functions correctly under Slurm.

Testing a single job

You can now run this as a Slurm job. Here's a simple job script which would process a single image file, just as we did at the command line above. Save the contents of the file below as image_job.sh, and submit as sbatch image_job.sh:

#!/bin/bash

#SBATCH --partition=short_free
#SBATCH --account=comet_abc123 # Remember to use your own account code
#SBATCH --ntasks=1
#SBATCH --mem=1g
#SBATCH -t 00:02:00

module load Python/3.12.3
python3 image_processor.py -file EuroSAT_RGB/Residential/Residential_1.jpg -out metadata

This should work identically to running it at the command line, though we have to wait for the job to be scheduled. In theory if we were processing very large files then we may have had to submit the Slurm job anyway, as the login node may not have had enough RAM to handle even a single large image. Let's submit the simple job file and check that it does work as intended:

$ sbatch image_job.sh 
Submitted batch job 1607386
$ cat slurm-1607386.out 
Opening source image: EuroSAT_RGB/Residential/Residential_1.jpg
Converting to greyscale
Extracting pixel data
Saving pixel data as: metadata/Residential_1.jpg.txt
$ cat metadata/Residential_1.jpg.txt
filename:EuroSAT_RGB/Residential/Residential_1.jpg min:60 max:155 mean:92.23583984375
$

So yes, it works as well under Slurm as it did directly at the command line, and the sequence of operations is similar to before:

stateDiagram-v2 Slurm_start_task --> Open_Residential_1.jpg Open_Residential_1.jpg --> Process_Residential_1.jpg Process_Residential_1.jpg --> Save_Metadata

If we didn't have many images to process, we could now just add the additional images in to the script and have them processed in turn, for example:

#!/bin/bash

#SBATCH --partition=short_free
#SBATCH --account=comet_abc123
#SBATCH --ntasks=1
#SBATCH --mem=1g
#SBATCH -t 00:02:00

module load Python/3.12.3
python3 image_processor.py -file EuroSAT_RGB/Residential/Residential_1.jpg -out metadata
python3 image_processor.py -file EuroSAT_RGB/Residential/Residential_2.jpg -out metadata
python3 image_processor.py -file EuroSAT_RGB/Residential/Residential_3.jpg -out metadata
python3 image_processor.py -file EuroSAT_RGB/Residential/Residential_4.jpg -out metadata
python3 image_processor.py -file EuroSAT_RGB/Residential/Residential_5.jpg -out metadata

However, there are more than 27000 images included in the EuroSAT data set, so it quickly becomes impractical to both list them all individually, and the time to finish increases linearly as we add each image processing command. If we were to run like that, the sequence would start to look like this:

stateDiagram-v2 Slurm_start_task --> Open_Residential_1.jpg Open_Residential_1.jpg --> Process_Residential_1.jpg Process_Residential_1.jpg --> Save_Metadata1 Save_Metadata1 --> Open_Residential_2.jpg Open_Residential_2.jpg --> Process_Residential_2.jpg Process_Residential_2.jpg --> Save_Metadata2 Save_Metadata2 --> Open_Residential_3.jpg Open_Residential_3.jpg --> Process_Residential_3.jpg Process_Residential_3.jpg --> Save_Metadata3

So it would get very long, and many times slower. It would certainly work, but is not the best use of resources or time.

However, we do know our analysis pipeline works. This is the first step complete.

Refactoring your input data

We know the image processing pipeline works for a single input file, but you are reading this guide because you presumably have hundreds or thousands of files to analyse.

So how do we do that? Well, at this point there is a decision to be made, and the ease of implementation will vary depending on the situation you find yourself in with your input data, as described in the two options below:

Scenario 1

Your files are already sequentially named, or have complete control over your input data files naming convention and can set them to be named or numbered sequentially.

Scenario 2

You do not have control over your input data files, they are not sequentially named, or they already exist with some naming convention which needs to be maintained.

Scenario 1 - Control of file naming

By far the easiest mechanism of moving to a task array based solution is if you have control over your input data naming convention. If you are able to arrange your input data files with sequential filenames, then it becomes almost trivial to convert to a task array.

By sequential naming, we refer to a filename which has some part of the name numbered in sequence - it does not necessarily need to start from 1.

Example sequential numbering schemes:

file1.jpg, file2.jpg, file3.jpg
data.1.dat, data.2.dat, data.3.dat
sequence_data.5000.seq, sequence_data.5001.seq, sequence_data.5002.seq
1001.txt, 1002.txt, 1003.txt
etc.

If you already have files named like this, or can rename them to be like this, then you will find that you can easily convert your image_job.sh job file to a task array with almost zero effort. We'll show this below.

Scenario 2 - Existing file naming conventions

If you have directory trees of input files, naming schemes that are random or otherwise unable to be ordered sequentially by number, then you will need to generate an additional input file where those files can be read sequentially.

Although this sounds complicated, it is actually quite simple. If you have a single directory of files, you can use the ls command and redirection to send all of your filenames to an input file:

Assuming we wanted all of the files in the EuroSAT_RGB/Residential directory we could do:

$ ls EuroSAT_RGB/Residential/*.jpg | sort > residential_files.txt

This command:

Lists all of the files ending in .jpg in the EuroSAT_RGB/Residential directory
Sorts them by name
Send the sorted list to a file named residential_files.txt

Looking at the top of the residential_files.txt file we can see the files are now listed:

$ head residential_files.txt 
EuroSAT_RGB/Residential/Residential_1000.jpg
EuroSAT_RGB/Residential/Residential_1001.jpg
EuroSAT_RGB/Residential/Residential_1002.jpg
EuroSAT_RGB/Residential/Residential_1003.jpg
EuroSAT_RGB/Residential/Residential_1004.jpg
EuroSAT_RGB/Residential/Residential_1005.jpg
EuroSAT_RGB/Residential/Residential_1006.jpg
EuroSAT_RGB/Residential/Residential_1007.jpg
EuroSAT_RGB/Residential/Residential_1008.jpg
EuroSAT_RGB/Residential/Residential_1009.jpg
$

… and it has all 3000 files:

$ wc -l residential_files.txt 
3000 residential_files.txt
$

We will use this residential_files.txt input file later, in our task array job.

If we cannot rename our files to be sequentially numbered, then this input file is the best way of moving to a task array based setup. If you have complicated subdirectory trees with your input files spread across multiple directories, then you may need to use something like find to list your files, rather than the simple ls command.

Building a task array solution

How you write a task array job file is determined by whether you have sequentially named input data, or are using the input data file method listed above.

Task array solution - sequential filenames

To convert our existing image_job.sh job script to a task array, we can make the following modifications:

Add the –array= parameter, which tells Slurm the quantity and numbers of the tasks we want to run - this range should correspond to the sequential numbering range of your input files

The numbers of the tasks Slurm will start for us get turned in to the variable ${SLURM_ARRAY_TASK_ID}. In the small example below, we are asking Slurm to launch four tasks and each task will get the number from 1 to 4 (–array=1-4).

Save the file below as image_job_seq.sh and run as sbatch image_job_seq.sh:

#!/bin/bash

#SBATCH --partition=default_free
#SBATCH --account=comet_abc123 # Remember to use your own account code
#SBATCH --mem=1g
#SBATCH --nodes=1
#SBATCH --tasks=4
#SBATCH --array=1-4
#SBATCH --cpus-per-task=1

module load Python/3.12.3
python3 image_processor.py -file EuroSAT_RGB/Residential/Residential_${SLURM_ARRAY_TASK_ID}.jpg -out metadata

We use the fact each task gets a unique number to make each task also use a differently named input image file. In this case Slurm will start the four tasks and the commands run by each one will be one of:

python3 image_processor.py -file EuroSAT_RGB/Residential/Residential_1.jpg -out metadata

python3 image_processor.py -file EuroSAT_RGB/Residential/Residential_2.jpg -out metadata

python3 image_processor.py -file EuroSAT_RGB/Residential/Residential_3.jpg -out metadata

python3 image_processor.py -file EuroSAT_RGB/Residential/Residential_4.jpg -out metadata

You don't need to do anything else - Slurm will automatically take care of launching each image_processor.py task with a differently named input file, and since we wrote image_processor.py to work completely independently of any other file, there will be no conflicts and each task will run simultaneously with the other three. In terms of sequence, it looks like this:

stateDiagram-v2 Submit_Job --> Start_Tasks Start_Tasks --> Open_Residential_1.jpg Start_Tasks --> Open_Residential_2.jpg Start_Tasks --> Open_Residential_3.jpg Start_Tasks --> Open_Residential_4.jpg Open_Residential_1.jpg --> Process_Residential_1.jpg Process_Residential_1.jpg --> Save_Metadata_1 Open_Residential_2.jpg --> Process_Residential_2.jpg Process_Residential_2.jpg --> Save_Metadata_2 Open_Residential_3.jpg --> Process_Residential_3.jpg Process_Residential_3.jpg --> Save_Metadata_3 Open_Residential_4.jpg --> Process_Residential_4.jpg Process_Residential_4.jpg --> Save_Metadata_4

Because each of the tasks executes entirely independently and in parallel, this gives us a x4 speed up versus processing the four images in sequence.

Task array solution - using an input data file

To convert our existing image_job.sh job script to a task array and instead use an input file, we can make the following modifications:

Add the –array= parameter, which tells Slurm the quantity and numbers of the tasks we want to run - this range should correspond to the sequential numbering range of your input files

The numbers of the tasks Slurm will start for us get turned in to the variable ${SLURM_ARRAY_TASK_ID}. In the small example below, we are asking Slurm to launch four and each task will get the number from 1 to 4 (–array=1-4).

Save the file below as image_job_inputfile.sh and run with sbatch image_job_inputfile.sh:

#!/bin/bash

#SBATCH --partition=default_free
#SBATCH --account=comet_abc123 # Remember to use your own account code
#SBATCH --mem=1g
#SBATCH --nodes=1
#SBATCH --array=1-4
#SBATCH --cpus-per-task=1

module load Python/3.12.3

# This next line generates a unique input data file for each task which is launched, as long as we generated the 'residential_files.txt' input as
# per our earlier example.
INPUT_FILE=$(awk NR==${SLURM_ARRAY_TASK_ID} residential_files.txt)

# Then each task launches image_processor with a unique INPUT_FILE name
python3 image_processor.py -file $INPUT_FILE -out metadata

Because we don't have sequentially numbered input files we use a little trick with the awk command to read a specific line from our input file. In this case each task reads its own line from the input file; task one reading the filename on line one, task two reading the filename on line two, and so on.

The sequence of operations is exactly the same as the sequentially named file example, as we have not changed anything other than the method that the filenames are retrieved:

stateDiagram-v2 Submit_Job --> Start_Tasks Start_Tasks --> Open_Residential_1.jpg Start_Tasks --> Open_Residential_2.jpg Start_Tasks --> Open_Residential_3.jpg Start_Tasks --> Open_Residential_4.jpg Open_Residential_1.jpg --> Process_Residential_1.jpg Process_Residential_1.jpg --> Save_Metadata_1 Open_Residential_2.jpg --> Process_Residential_2.jpg Process_Residential_2.jpg --> Save_Metadata_2 Open_Residential_3.jpg --> Process_Residential_3.jpg Process_Residential_3.jpg --> Save_Metadata_3 Open_Residential_4.jpg --> Process_Residential_4.jpg Process_Residential_4.jpg --> Save_Metadata_4

This therefore gets exactly the same x4 speed up as the previous example. The only difference is the potential order the files are processed and that they don't need to be sequentially named.

This achieves the same goal as the sequentially numbered task array example (four tasks launched, each reading a different input file), and while the awk line is a little opaque for those who have not written a great deal of shell script, you can treat it as an automatic filename supplier, as long as you give it your input file (in this case, residential_files.txt we created earlier).

Testing a multi job solution

Testing the sequential filename solution

First, let's test the sequentially numbered task array solution:

$ sbatch image_job_seq.sh 
Submitted batch job 1607653

# Check for output logs
$ ls slurm-1607653*
slurm-1607653_1.out  slurm-1607653_2.out  slurm-1607653_3.out  slurm-1607653_4.out

# See contents of output logs
$ cat slurm-1607653*
Opening source image: EuroSAT_RGB/Residential/Residential_1.jpg
Converting to greyscale
Extracting pixel data
Saving pixel data as: metadata/Residential_1.jpg.txt
Opening source image: EuroSAT_RGB/Residential/Residential_2.jpg
Converting to greyscale
Extracting pixel data
Saving pixel data as: metadata/Residential_2.jpg.txt
Opening source image: EuroSAT_RGB/Residential/Residential_3.jpg
Converting to greyscale
Extracting pixel data
Saving pixel data as: metadata/Residential_3.jpg.txt
Opening source image: EuroSAT_RGB/Residential/Residential_4.jpg
Converting to greyscale
Extracting pixel data
Saving pixel data as: metadata/Residential_4.jpg.txt
$

Has the data been generated correctly?

$ ls metadata/
Residential_1.jpg.txt  Residential_2.jpg.txt  Residential_3.jpg.txt  Residential_4.jpg.txt
$ cat metadata/Residential_1.jpg.txt 
filename:EuroSAT_RGB/Residential/Residential_1.jpg min:60 max:155 mean:92.23583984375
$ cat metadata/Residential_2.jpg.txt 
filename:EuroSAT_RGB/Residential/Residential_2.jpg min:54 max:254 mean:115.12548828125
$ cat metadata/Residential_3.jpg.txt 
filename:EuroSAT_RGB/Residential/Residential_3.jpg min:62 max:255 mean:102.17138671875
$ cat metadata/Residential_4.jpg.txt 
filename:EuroSAT_RGB/Residential/Residential_4.jpg min:56 max:158 mean:90.14990234375
$

Yes, the four files which were processed in parallel were analysed successfully, using our existing, sequential image_process.py application.

Testing the input data file solution

Now, let's test the solution which uses the input data file, and does not rely on sequentially named image files.

# Submit job
$ sbatch image_job_inputfile.sh 
Submitted batch job 1607718

# Did Slurm logs get created for each task?
$ ls slurm-1607718_*
slurm-1607718_1.out  slurm-1607718_2.out  slurm-1607718_3.out  slurm-1607718_4.out

# Check Slurm output logs
$ cat slurm-1607718_*
Opening source image: EuroSAT_RGB/Residential/Residential_1000.jpg
Converting to greyscale
Extracting pixel data
Saving pixel data as: metadata/Residential_1000.jpg.txt
Opening source image: EuroSAT_RGB/Residential/Residential_1001.jpg
Converting to greyscale
Extracting pixel data
Saving pixel data as: metadata/Residential_1001.jpg.txt
Opening source image: EuroSAT_RGB/Residential/Residential_1002.jpg
Converting to greyscale
Extracting pixel data
Saving pixel data as: metadata/Residential_1002.jpg.txt
Opening source image: EuroSAT_RGB/Residential/Residential_1003.jpg
Converting to greyscale
Extracting pixel data
Saving pixel data as: metadata/Residential_1003.jpg.txt
$

So it seems to have ran okay (note that it processed the files in a different order, based on the sort method with used to generate residential_files.txt). What about the analysed data files?

# Did the output data files get created?
$ ls metadata/
Residential_1000.jpg.txt  Residential_1001.jpg.txt  Residential_1002.jpg.txt  Residential_1003.jpg.txt

# Check output data
$ cat metadata/Residential_1000.jpg.txt 
filename:EuroSAT_RGB/Residential/Residential_1000.jpg min:47 max:218 mean:94.780517578125
$ cat metadata/Residential_1001.jpg.txt 
filename:EuroSAT_RGB/Residential/Residential_1001.jpg min:61 max:254 mean:85.656982421875
$ cat metadata/Residential_1002.jpg.txt 
filename:EuroSAT_RGB/Residential/Residential_1002.jpg min:66 max:254 mean:114.40478515625
$ cat metadata/Residential_1003.jpg.txt 
filename:EuroSAT_RGB/Residential/Residential_1003.jpg min:59 max:153 mean:79.7548828125

Again, yes; the four parallel tasks ran successfully, and they each generated (presumably) correct data from their independent image files.

We can be confident that both the sequential filename method and the input data file method do produce the same output - although the ordering of the image files could differ between the two, depending on how we order/sort the input file.

But hold on, we have thousands of input files, and we have only processed four of them! Well yes; as always we should start off small and test that things work before we scale up. Read the next section to understand how to scale up your task arrays jobs.

Scaling up & resource limits

If we look at the SBATCH headers we set in both input_job_seq. and input_job_inputfile.sh they are the same:

#SBATCH --partition=default_free
#SBATCH --account=comet_abc123 # Remember to use your own account code
#SBATCH --mem=1g
#SBATCH --nodes=1
#SBATCH --array=1-4
#SBATCH --cpus-per-task=1

How we increase the number of parallel copies of image_process.py is to configure the –array values to better suit our needs. Remember that:

–array= controls the quantity of tasks (4, in the case above) and the numbering of the ${SLURM_ARRAY_TASK_ID} variable which each task inherits (1 to 4 in the previous example)

It would seem that we could increase tasks to some huge number to run all of our input data files in parallel… but we are constrained by several factors:

In the case of Comet we do not have 27000 CPU cores! (each task requires a minimum of 1 CPU core)
Even if we did have enough cores to launch an obscene number of tasks, this risks a detrimental impact of the HPC service to all other users!
Each HPC project will have a resource limit on the number of cores and jobs (or tasks) it can submit or have running at any time - a paid project may therefore be able to launch a higher number of tasks from an array job than an unfunded project

When increasing the array figure the form below is used:

array=start-end%max_running

The %max_running argument is optional, and if not set, the number of running tasks is the sequence from start to end, e.g:

# Four running tasks, numbered 1, 2, 3, 4
#SBATCH --array=1-4

# Twenty eight running tasks, numbered 100 to 127
#SBATCH --array=100-127

If you include the %max_running argument, then this applies a limit to the simultaneous running tasks, e.g.:

# Four tasks, limited to two running at a time, numbered 1, 2, 3, 4
#SBATCH --array=1-4%2

# Twenty eight tasks, limited to five running at a time, numbered 100 to 127
#SBATCH --array=100-127%5

A key point to remember is that Slurm will create a job entry for each element of your array. If you have an array which defines 28 entries, then you will get 28 new jobs created… if you have an array with 200 entries then you will create 200 new Slurm job entries.

This means that large arrays can create huge numbers of Slurm jobs very, very quickly. For this reason we do currently have some limits set on the number of jobs both users and their entire group can create (see Resource Limits - Understanding what they mean for more details on the current limits - MaxSubmitJobs and GrpSubmitJobs limits are applicable to the maximum size of the –array setting, but also take in to account everything else currently running or pending against your account code).

If you have huge numbers of data files to process, then you may need to approach this in several batches to avoid overload Slurm and the job submission system, and to stay inside the limits enforced on your project (you can use our sproject tool from Simple Slurm Tools to quickly check your project limits).

Possible workarounds to array task limits

Whilst you may have limits in the number of individual tasks that you can launch as part of a Slurm job (and most HPC systems do have limits of some sort), these can easily be worked around with a little bit of extra shell script logic.

In the case of the EuroSAT image database, with 27000 files, there is little chance of using being able to launch 27K tasks at the same time, so we have a number of approaches.

Option 1 - Launch multiple Slurm jobs, manually

The simplest method is to split the file list of 27000 files down to more manageable sizes, in batches of 250, 500, 1000 entries or similar - based on what your actual MaxSubmitJobs and GrpSubmitJobs limits are set to. Launch all 250/500/1000 tasks, wait for them to finish, and then submit another batch.

This is the easiest, requires no additional scripting, but does mean that you need to watch for your jobs finishing and then submitting the next batch. This may take some time with tens of thousands, or even millions of input files.

Option 2 - Have each array job handle multiple input files

Presently our solution has each array task handle just one input file… but there's no reason why we have to keep it like that. We could add a little more logic to the Slurm script and instead have each task process a list of files… so although we only launch 250 tasks (for example), each one could loop over 1/250 of the input files.

Here's a possible solution for the 27000 file data set we are using in this example.

Generate a list of all input files:

$ find EuroSAT_RGB/ -type f -print > all_files.txt

The file all_files.txt now contains all 27000 files in the dataset:

$ wc -l all_files.txt 
27000 all_files.txt
$

Now use the command line tool split to create chunks of the all_files.txt by the number of tasks we will have in our Slurm array, in this case we'll limit it to 250 tasks:

$ split --numeric-suffixes=001 -n l/250 -d all_files.txt
$ ls
download.sh             x004  x016  x028  x040  x052  x064  x076  x088  x100  x112  x124  x136  x148  x160  x172  x184  x196  x208  x220  x232  x244
EuroSAT_RGB             x005  x017  x029  x041  x053  x065  x077  x089  x101  x113  x125  x137  x149  x161  x173  x185  x197  x209  x221  x233  x245
EUROSAT.zip             x006  x018  x030  x042  x054  x066  x078  x090  x102  x114  x126  x138  x150  x162  x174  x186  x198  x210  x222  x234  x246
all_files.txt           x007  x019  x031  x043  x055  x067  x079  x091  x103  x115  x127  x139  x151  x163  x175  x187  x199  x211  x223  x235  x247
image_job_inputfile.sh  x008  x020  x032  x044  x056  x068  x080  x092  x104  x116  x128  x140  x152  x164  x176  x188  x200  x212  x224  x236  x248
image_job_seq.sh        x009  x021  x033  x045  x057  x069  x081  x093  x105  x117  x129  x141  x153  x165  x177  x189  x201  x213  x225  x237  x249
image_job.sh            x010  x022  x034  x046  x058  x070  x082  x094  x106  x118  x130  x142  x154  x166  x178  x190  x202  x214  x226  x238  x250
image_processor.py      x011  x023  x035  x047  x059  x071  x083  x095  x107  x119  x131  x143  x155  x167  x179  x191  x203  x215  x227  x239
README.txt              x012  x024  x036  x048  x060  x072  x084  x096  x108  x120  x132  x144  x156  x168  x180  x192  x204  x216  x228  x240
x001                    x013  x025  x037  x049  x061  x073  x085  x097  x109  x121  x133  x145  x157  x169  x181  x193  x205  x217  x229  x241
x002                    x014  x026  x038  x050  x062  x074  x086  x098  x110  x122  x134  x146  x158  x170  x182  x194  x206  x218  x230  x242
x003                    x015  x027  x039  x051  x063  x075  x087  x099  x111  x123  x135  x147  x159  x171  x183  x195  x207  x219  x231  x243

This gives us 250 files, each with a portion of the files listed in all_files.txt. The files are also sequentially numbered (in this case x001 through to x250 - perfectly matching the number format used by the Slurm array tasks).

With a little bit of modification to our earlier sequential Slurm job file, we can make each of the 250 tasks we run read from their own input files:

#!/bin/bash

#SBATCH --partition=default_free
#SBATCH --account=comet_training # Remember to use your own account code
#SBATCH --mem=1g
#SBATCH --nodes=1
#SBATCH --array=1-250
#SBATCH --cpus-per-task=1

module load Python/3.12.3

# Turn the non-zero padded SLURM_ARRAY_TASK_ID (e.g. 7)
# into a zero-padded string which matches the filenames (e.g. x007)
INPUT_DATA_FILE="x`printf %03d ${SLURM_ARRAY_TASK_ID}`"

# Read from the x001 - x250 input file, which contains a portion of ALL the input files we need to process
cat ${INPUT_DATA_FILE} | while read INPUT_FILENAME
do
    python3 image_processor.py -file ${INPUT_FILENAME} -out metadata
done

So this allows us to start tasks up to the limit allowed by our Slurm account code (in this case we assume 250 is our limit - see the output from sproject for your Slurm account for your real limit), and each task will then individually process multiple files in it's own short loop, until each one is completed.

In this specific case, it means each running task will process 108 input files (27000 / 250) in the short loop section.

Downloads

You can download all of the scripts used in this example using the link below:

slurm_task_array_example.zip

Back to Advanced Slurm Job Optimisation