Advanced Slurm Job Types

This page is an introduction for users who are starting to move beyond running a single application on a single input. Parallel programming and parallel computing form a large field, and this page only points to two techniques you may want to consider once you need extra performance.

Sections:

Types of Parallelism - Considering your input data
Parallel jobs - Building a parallel task array solution
Parallel jobs - Building a simple MPI solution

Types of Parallelism - Considering your input data

The type of parallelism we employ in HPC generally depends on the category of data we are processing. The two categories that we encounter most often are explained below.

Independent Data Input

In certain workloads you have lots of data files which are completely independent of each other - what you do to one data file has no bearing on what you do to another. Each file is loaded, processed and the output saved independently:

stateDiagram-v2
    InputData_1 --> Processing_1
    Processing_1 --> OutputData_1

Examples of data sets where processing one file may have no impact on any of the other files you need to analyse:

Sequencing data - where each file produced by the sampling process is analysed independently of any other.

Individual image files - where the analysis or transformation of each JPEG file does not depend on any other file before or after.

This type of data processing is ideal for implementation as a Parallel Task Array Job in Slurm. A task array lets you process all (or a large number) of the data files at the same time, greatly decreasing the overall time it takes to get results for your entire dataset.
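As a minimal sketch of how one task in an array might pick its own input file, the Python below reads the `SLURM_ARRAY_TASK_ID` environment variable that Slurm sets for each array task. The `input` directory, the `*.dat` pattern and the `select_input` helper are assumptions made for illustration; they are not part of any site configuration.

```python
import os
from pathlib import Path


def select_input(files, task_id):
    """Return the input file assigned to one array task (0-based index)."""
    ordered = sorted(files)  # sort so every task sees the same ordering
    if not 0 <= task_id < len(ordered):
        raise IndexError(f"task id {task_id} has no matching input file")
    return ordered[task_id]


def main():
    # Slurm sets SLURM_ARRAY_TASK_ID for each task of an array job,
    # for example one submitted with `sbatch --array=0-99 job.sh`.
    # Defaulting to 0 lets the script also run outside Slurm for testing.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    # "input" and "*.dat" are hypothetical; substitute your own layout.
    files = [str(p) for p in Path("input").glob("*.dat")]
    if not files:
        print("no input files found")
        return
    print(f"task {task_id} will process {select_input(files, task_id)}")


if __name__ == "__main__":
    main()
```

Because every task runs the same script and differs only in its task ID, the processing code itself does not need to know that it is part of an array.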

Single Complex Data Input

In other workloads you may only have a single data file to analyse, but that data is so large or so complex that it can only be analysed in a reasonable timescale by processing smaller subsets of it. The subsets are not fully independent, however: the output from analysing all of them is needed for the final calculation or output.

stateDiagram-v2
    state fork_state <<fork>>
    InputData --> fork_state
    fork_state --> subData_Processing_1
    fork_state --> subData_Processing_2
    fork_state --> subData_Processing_N
    state join_state <<join>>
    subData_Processing_1 --> join_state
    subData_Processing_2 --> join_state
    subData_Processing_N --> join_state
    join_state --> finalData
    finalData --> [*]

Examples where you may need to analyse one data file in this way:

Computational Fluid Dynamics - Calculating each area of a mesh independently, but needing the output of those calculations to produce the final combined analysis.

Creative Commons licensed image

Large Scale Satellite Imagery - Analysing individual tiles of large format images or sensor data, where the processing of each tile must also rely on attributes of its neighbours.

Original image from the Max Planck Institute

This type of workload is better suited to implementation as a Parallel Job Using MPI, but it can be much more difficult to implement if you are starting from scratch.

By following the MPI approach you can break one massive problem down into a number of smaller sub-problems, distribute those sub-problems across a large number of CPU cores, and so decrease the overall time it takes to arrive at a result.
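The split, process and combine pattern can be sketched in plain Python. This is a serial illustration only and does not use MPI: in a real MPI program each chunk would typically be handled by a separate process and the partial results gathered back for the final step. The function names and the sum-of-squares workload are assumptions chosen to keep the example small.

```python
def split(data, n_parts):
    """Divide data into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        # The first `extra` chunks take one additional element each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def process_chunk(chunk):
    """Work done on one sub-problem; here simply a sum of squares."""
    return sum(x * x for x in chunk)


def combine(partials):
    """Final step that needs the output of every sub-problem."""
    return sum(partials)


data = list(range(10))
chunks = split(data, 3)                         # fork: one chunk per worker
partials = [process_chunk(c) for c in chunks]   # would run in parallel under MPI
total = combine(partials)                       # join: needs every partial result
print(total)
```

The final `combine` step cannot start until every partial result is available, which is what distinguishes this pattern from the fully independent files handled by a task array.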

Parallel Task Array Jobs

Continuing from our earlier Task Array Slurm Job, this guide explores a more complete example of how you can develop a task array based solution for processing huge quantities of input data, with only small changes to your existing sbatch job files and no changes to your underlying code or application.

This guide explores:

How to build a single job solution
Testing a single job
Refactoring your data to work in task arrays
Building an example task array solution
Considering resource limits and other limitations

See the Building a parallel task array solution guide.

Parallel Jobs Using MPI

This guide explores the more complicated approach to achieving parallelism on a single compute problem: dividing the task up into discrete units of work and distributing them using MPI. This requires more software engineering skill and a careful approach when designing and writing your code, but it can achieve substantial improvements in performance if your problem is suitable for this approach: