Job files are just text files. They can be created with any text editor (but not a word processor like Word). The job file tells the HPC scheduler what hardware your job requires and contains the commands that run your computational workload.
The scheduler takes your job file and looks across the available hardware in the HPC facility for the most suitable place to run your commands. The choices you make and the options you set in your job file inform the scheduler's decision about when and where to run your job.
As noted, with an HPC system you do not normally interact directly with the server or servers which are used to run your code or application; the job file describes what you need, and it is up to the scheduler to run it on your behalf.
The actual code you want to run might be some custom Python or R, or it may involve starting up Matlab or Ansys to run a pre-defined solution. Whatever you intend to run, the job script is the way you do this.
In 99.9% of cases you will need a job file to tell the scheduler where to find your job, what resources you need, and how to run it. As the job file is also code, we recommend that you use software version control tools, such as Git / GitHub.com, to store your job files, especially if your job files are shared amongst others in your HPC project group.
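As an illustrative sketch (the directory name and commit message are arbitrary), keeping a job file under Git version control might look like this:

cd ~/my-hpc-project                         # hypothetical directory holding your job files
git init                                    # turn the directory into a Git repository
git add firstjob.sh                         # stage the job file we create below
git commit -m "Add first Slurm job script"  # record it in the repository history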
First, log on to the HPC facility.
All of your job files start the same way. In your text editor (for example nano), create a new file (let us call it firstjob.sh) and add this line at the very top:
#!/bin/bash
This tells the system that when the scheduler actually runs your job, it will be run by the bash command, a built-in utility on Linux systems. bash allows for simple scripting and flow control, and is an easy way to get started writing scripts and chaining together sequences of actions on Linux.
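As a purely illustrative aside (none of this is required for our first job, and the file names are hypothetical), bash makes it easy to loop over files and chain commands together:

#!/bin/bash
# Loop over some hypothetical input files and report each one
for input in sample1.dat sample2.dat sample3.dat
do
    echo "Processing $input"
done
# Chain commands: the message only prints if the date command succeeded
date && echo "The date command ran successfully"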
After you have added the #!/bin/bash header line, you then add the commands which you want your job to run. In this case we are going to print the current date and time, report which server the job runs on, and then print the date and time again.
Let us run nano firstjob.sh and enter the following simple job:
#!/bin/bash
#SBATCH --account=myhpcgroup

date
echo "We are running on $HOSTNAME"
date
For the --account line, make sure that you change myhpcgroup to the name of the HPC project group you are a member of (e.g. rockhpc_abc23 or comethpc_xyz23).
You must always use a valid account name in your job files, as the system needs to know which project to allocate your resource use to. Unlike earlier HPC facilities, our current systems require that you specify your account. The --account field is no longer an optional parameter.
Save the file as firstjob.sh and exit nano.
We send the job to the scheduler using the sbatch command and the name of the job file. In the example above we use sbatch firstjob.sh:
$ sbatch firstjob.sh
Submitted batch job 134567899
$
Okay, so the job is now submitted, but where did it go, and how do you see what your code did?
Well, we need to understand that running jobs on an HPC facility is often somewhat different from running the same scripts and code on a local computer: your job does not run interactively on your screen, so you will not see its output appear as it runs.
Instead, all output that your job would have sent to the screen is captured and written to a text log file slurm-<job_ID>.out, where job_ID is replaced by the number which sbatch reported when you submitted the job. In this example we would find that a new file named slurm-134567899.out has been created.
The .out file may not contain anything at this time, as the scheduler still needs to find free space across the HPC facility to run your job. There are likely other jobs running on the HPC, and we have to wait our turn.
You can monitor the status of any jobs you submit by running the squeue --me command.
At some point your job will run, and once it has completed the slurm-134567899.out file will contain anything printed to the screen during that run. In the case of the simple job script we typed above, something along the lines of this should be shown:
$ cat slurm-134567899.out
Mon 31 Mar 14:03:45 BST 2025
We are running on compute37.hpc
Mon 31 Mar 14:03:46 BST 2025
$
When we submitted our test job script, how did the HPC scheduler know how much RAM or how many CPU cores (both referred to as job resources) to allocate to it, or how long it was going to run?
It didn't.
Whilst the scheduler software is quite sophisticated and can manage thousands of jobs running across many dozens or hundreds of servers, it has no means of magically identifying how much memory or processor power your application needs to run. In our example above your job ran with the default resource allocation; we are fortunate that such a basic job can run with the default resource values.
The basic resource requirements of a job are CPU cores, memory (RAM) and run time.
The resources of our HPC facilities are grouped into logical containers called partitions. These groupings are used to gather together servers of similar capabilities and performance. Each partition type has (or may have) a different default resource allocation.
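On many Slurm systems a partition can be requested explicitly with the --partition directive; the partition name below is purely hypothetical, so check the Resources and Partitions section for the names used on our facilities:

#SBATCH --partition=standard    # hypothetical partition name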
All Slurm jobs cost resources. Some of the hardware resources we make available are free, whereas some are reserved for paying projects. It is important to understand that the resources you request in your Slurm job directly impact the costs incurred by your project.
Our Methodology
When we calculate how many resources a Slurm job has used we use the following mechanism:
Number of resources * Job time in hours = Hours of Compute Resources
CPU Based Slurm Jobs
In the case of a Slurm job which only uses CPU resources, this becomes:
Total number of CPU cores * Hours = Total Hours of CPU Compute Resource
GPU Based Slurm Jobs
In the case of a Slurm job using GPU resources, the calculation is:
Total number of GPU cards * Hours = Total Hours of GPU Compute Resource
Note that, as the calculations above show, RAM / memory is not a factor in the cost of your Slurm jobs.
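As a worked example (the numbers are purely illustrative), a job using 8 CPU cores that runs for 3 hours consumes:

8 CPU cores * 3 hours = 24 Hours of CPU Compute Resource

and a job using 2 GPU cards that runs for 3 hours consumes:

2 GPU cards * 3 hours = 6 Hours of GPU Compute Resource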
Read on to understand how to change the basic resources allocated to your Slurm job; the most common resources you will need to request are CPU cores, Memory and job time:
We can adjust the amount of RAM the scheduler will allocate us using the --mem parameter in our job file:
#SBATCH --mem=
The default values are interpreted as Megabytes, so --mem=1000M and --mem=1000 are identical. You may use the optional suffixes K, M, G and T for Kilobytes, Megabytes, Gigabytes and Terabytes, respectively.
While we don't directly include the amount of memory your job requires as part of our costing methodology, it is still important for Slurm to know how much your job needs, so that it can find the right space on the right server for your job to start.
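For example, any of the following directives would request memory explicitly; the first two request the same 1000 Megabytes, while the third uses the G suffix with an arbitrarily chosen value:

#SBATCH --mem=1000      # 1000 Megabytes (Megabytes are the default unit)
#SBATCH --mem=1000M     # identical to the line above
#SBATCH --mem=4G        # 4 Gigabytes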
We can adjust the number of CPU cores the scheduler will allocate us using the --cpus-per-task parameter in our job file:
#SBATCH --cpus-per-task=
Or, use the shorthand -c:
#SBATCH -c
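For example, the following two equivalent directives each request 2 CPU cores, the value we use in the second job below:

#SBATCH --cpus-per-task=2
#SBATCH -c 2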
We can adjust the amount of time we want the scheduler to allocate to our job using the --time parameter in our job file:
#SBATCH --time=
The shorthand version is just -t:
#SBATCH -t
The format is hh:mm:ss, or days-hh:mm:ss.
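For example (values chosen arbitrarily), the first directive below requests 30 minutes and the second requests 1 day and 12 hours:

#SBATCH --time=00:30:00
#SBATCH --time=1-12:00:00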
The amount of time your job uses only starts at the point the scheduler begins to run it - it does not include the time your job may spend waiting to start.
When the resource cost of your job is calculated it is only the elapsed time the job was running which is used, not the time you requested.
If you request 4 hours, but the job only runs for 2 hours, then your resource cost is based on 2 hours, not 4.
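Putting this together with the costing formula above (illustrative numbers again), a job that requested 4 CPU cores and 4 hours, but finished after 2 hours, is charged:

4 CPU cores * 2 hours = 8 Hours of CPU Compute Resource

rather than the 16 hours it would have cost had it run for the full requested time.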
Now that we know what the basic resource 'building blocks' of a job script are, let us write a second job, this time adding a number of explicit resource requests:
#!/bin/bash
#SBATCH --account=myhpcgroup
#SBATCH --mem=1000M
#SBATCH --cpus-per-task=2
#SBATCH --time=00:05:00

date
echo "We are running on $HOSTNAME"
date
Save the file as secondjob.sh and submit the job as before:
$ sbatch secondjob.sh
Submitted batch job 154543891
$
Wait for the job to run (check via squeue --me) and look for the output in slurm-154543891.out:
$ cat slurm-154543891.out
Mon 31 Mar 14:40:44 BST 2025
We are running on compute09.hpc
Mon 31 Mar 14:40:46 BST 2025
$
Although in this limited case there appears to be no visible difference, by explicitly requesting a specific number of CPU cores, amount of RAM and run time we gave the scheduler more information with which to make an informed choice about where to run our job. It also allowed the scheduler to allocate the specific resources our job needed; before, we were just guessing that the scheduler had given us enough!
In a trivial case such as this it likely makes no significant difference, but what if we had a big job requiring hundreds of Gigabytes of RAM, or a hundred CPU cores, or there were already hundreds of other jobs running on the same server?
If you do not request the right amount of resources for your job, then you may end up in one of the following scenarios: the scheduler may terminate your job when it exceeds the memory or run time it was allocated, or your job may request far more than it actually needs, waiting longer to start and tying up resources that other jobs could have used.
It may seem harsh to have your job terminated, but the scheduler is working on behalf of all users of the facility - and a job which is trying to use more resources than the scheduler expected may cause many other users to be negatively impacted.
To make a more informed decision about how to write your job scripts to take advantage of our different partition types, please now consult our Resources and Partitions section.