It is important to understand that the HPC facilities, like most non-desktop computing environments, do not have a single filesystem or storage area.
This is often one of the biggest changes for users coming from a non-HPC environment, even those who have traditionally used Linux or other Unix-based systems such as macOS.
Before working on the HPC facilities, you should know what the different storage locations and types are, how to access them and, most importantly, what to use each one for. Using the wrong storage area may slow down your jobs or those of other users, or leave your data in a location which is subject to shorter retention policies than you expect.
The areas you can find on our HPC facilities include the following:
The HOME filesystem holds your personal home directory. It is provided to allow you to log in to the facility, install and build your own software, and store configuration data.
HOME is implemented on a general-purpose Linux server which presents the HOME directory to all nodes in the facility. Because this is a general-purpose server, it is not optimised for high-throughput, parallel data operations or other intensive use.
To minimise disruption to other users, your space on HOME is restricted by a disk quota. You can view your current quota usage at any time by running the quota command:
$ quota
Disk quotas for user n1234 (uid 123456):
Filesystem blocks quota limit grace files quota limit grace
nfs:/home 31586244 41943040 41943040 36480 0 0
$
By default the output is reported in 1 KB blocks. You can see more readable figures (MB, GB, TB) with the following option:
$ quota -s
Disk quotas for user n1234 (uid 123456):
Filesystem space quota limit grace files quota limit grace
nfs:/home 30846M 40960M 40960M 36480 0 0
The above sample output shows a quota of 40960 MB (40 GB), of which user n1234 is using 30846 MB (~30 GB).
Running Slurm jobs will always attempt to write their output and error logs to your HOME, so if you exceed your quota it is highly likely that some or all of your subsequent Slurm jobs will fail. If your Slurm jobs fail, first check that you have not exceeded your HOME quota.
Your home directory can always be referenced by the environment variable $HOME, both at the command line and in your SBATCH job files.
Your home directory is accessible on every node in the HPC facility and is available to your running jobs. However, we recommend that jobs which have large data demands, or which constantly write to output files during their runtime, do not write to HOME.
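A minimal sketch of an SBATCH job file which sends its Slurm logs and working output to a shared project area instead of HOME (the path /nobackup/proj/myproject and the program name are hypothetical examples; substitute your own HPC Project directory, and make sure the logs directory exists before you submit):
#!/bin/bash
#SBATCH --job-name=example
#SBATCH --output=/nobackup/proj/myproject/logs/%x-%j.out
#SBATCH --error=/nobackup/proj/myproject/logs/%x-%j.err

# Work in the project area rather than $HOME so that large output files
# do not count against your HOME quota
cd /nobackup/proj/myproject
./my_program --input data/input.dat --output results/output.dat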
Because your HOME contains configuration files vital to your login to the facility, it is not a suitable location for sharing code or data with other users. If you are working on a shared project or with a wider group, we recommend not keeping your code and data in your personal home directory; use the NOBACKUP area instead and, ideally, use a revision control system such as GitHub for working on code.
In most cases, in the event of an employee or student leaving the University, information governance restrictions mean that we are unable to allow the transfer of code or data from their personal HOME directory to another member of their team. It is the responsibility of an HPC Project owner to ensure that all project members are working in the most appropriate location.
The HOME directory is the only location on the HPC facilities which is backed up.
The use of the HOME directory should typically be limited to the following categories:
The RDW (Research Data Warehouse) filesystem is not technically part of the HPC facility, but allows users to access data and shared groups stored on the central University filestore.
The RDW filesystem is not intended to be used for interactive compute activity. You should first copy any data down from RDW to your personal HOME or shared NOBACKUP project directory via an interactive login (on a login node), run your compute activity, then copy any results back (again, interactively from a login node).
RDW is normally mapped to the /rdw directory on the login nodes. From there you will find a directory tree which corresponds to the same network path you would use to access the RDW filesystem from a typical Windows client.
It is important to note that, due to technical restrictions, it is not possible to access the RDW filesystem from any nodes other than the login nodes. If your jobs attempt to access RDW whilst running on a compute node, they will fail. We also recommend that you do not run computation interactively on data which remains on RDW, even on a login node; always copy data locally from RDW before running computation which accesses it.
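A sketch of this workflow on a login node (the RDW share and project paths shown are hypothetical placeholders; substitute your own locations):
$ # On a login node: copy input data from RDW into your NOBACKUP project area
$ rsync -av /rdw/your-rdw-share/input-data/ /nobackup/proj/myproject/input-data/
$ # ...submit and run your jobs against the copy on /nobackup...
$ # When finished, copy the results back to RDW for long-term storage
$ rsync -av /nobackup/proj/myproject/results/ /rdw/your-rdw-share/results/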
Completed data sets should be moved to RDW, as it is the most appropriate location for long-term data storage. Data left on the HPC facilities long term may be subject to automated cleanup under our data retention policies.
You should factor the use of RDW for long-term storage of your data into your research project costs. Using our HPC facilities for long-term data storage is not supported.
Note: The RSE team do not have any control over the RDW filesystem, quotas, shares or permissions, nor do we have access to any of your RDW groups or directories. Requests for support, new areas or permission changes on RDW locations should be raised through the NUIT ITService self-service system, using the “Research Data Warehouse” category, which will be actioned by a member of the NUIT Infrastructure team.
You should use RDW for these categories of activity:
The NOBACKUP filesystem is a high-performance data filesystem implemented using a Lustre storage architecture.
The filesystem is optimised for heavy data reads and writes, supporting many simultaneous users and IO requests. All HPC Projects are given an area on the NOBACKUP filesystem which project members can use for sharing input and output data files and shared code.
NOBACKUP is available on all nodes under the directory /nobackup. On nodes which are connected by low-latency (InfiniBand) networking, the NOBACKUP filesystem is also connected over the same networking for increased performance.
To determine what space you have used on the NOBACKUP filesystem, you can use the lfs quota command:
$ lfs quota /nobackup
Disk quotas for usr n1234 (uid 123456):
Filesystem kbytes quota limit grace files quota limit grace
/nobackup 1956 0 0 - 115 0 0 -
Disk quotas for grp rocketloginaccess (gid 987654):
Filesystem kbytes quota limit grace files quota limit grace
/nobackup 0 0 0 - 0 0 0 -
$
The lfs quota command will show you the disk space used by your account, as well as each HPC Project group that you are a member of. To see the output in more human-readable units (MB, GB, TB), use the following option:
$ lfs quota /nobackup -h
Disk quotas for usr n1234 (uid 123456):
Filesystem used quota limit grace files quota limit grace
/nobackup 1.91M 0k 0k - 115 0 0 -
Disk quotas for grp rocketloginaccess (gid 987654):
Filesystem used quota limit grace files quota limit grace
/nobackup 0k 0k 0k - 0 0 0 -
$
At present, no quota enforcement is enabled on the NOBACKUP filesystem, in order to allow HPC Projects' use of the shared disk space to spike periodically. If projects abuse this flexibility then quotas may be enforced.
On previous HPC facilities we also provided personal areas on NOBACKUP, in addition to the HPC Project shared areas. For Comet this has been removed: all data and code placed on NOBACKUP must therefore be in a shared area which has a minimum of two project members.
If you have a suggestion for a data set which could be shared by multiple projects (for example, several of the common biobank databases), please get in touch with RSE and we can arrange for it to be created in an area on NOBACKUP outside of your project shared area and made accessible to all users.
Note: On the departure of a project member, you may request that the code/data created by that user within the project shared area be re-assigned to another project member.
The SCRATCH directory is a location available on each node within the HPC facility. It is implemented using the local, high-speed, solid-state drives of that particular node. As such it is not shared: any data created in that directory is only available on that particular node, but access to it is very fast.
The SCRATCH directory can be referenced via the $SCRATCH or $TMP environment variables, both at the command line and within your SBATCH files.
Quotas are not enabled on the SCRATCH directory; you are responsible for cleaning up all files created during the run of your Slurm job.
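A minimal sketch of this pattern inside an SBATCH job file (the project path and program name are hypothetical examples):
#!/bin/bash
#SBATCH --job-name=scratch-example

# Copy input data from the shared NOBACKUP area to the fast, node-local SCRATCH
cp /nobackup/proj/myproject/input.dat $SCRATCH/

# Run the computation against the local copy
/nobackup/proj/myproject/bin/my_program --input $SCRATCH/input.dat --output $SCRATCH/output.dat

# Copy the results back to NOBACKUP - SCRATCH is only visible on this node
cp $SCRATCH/output.dat /nobackup/proj/myproject/results/

# Clean up the files created in SCRATCH during this job
rm -f $SCRATCH/input.dat $SCRATCH/output.dat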
We suggest using the SCRATCH directory for:
A summary of the available filesystem locations on our HPC facilities:
| Location | Variable | Quota | Speed | Size | Run jobs from here? | Shared with others? | Re-assignable? | Availability | Long-term storage | Backed Up |
|---|---|---|---|---|---|---|---|---|---|---|
| HOME | $HOME | Yes | Medium | 75 TB | Yes - if low bandwidth | No | No | All Nodes | No | Yes |
| RDW | | Yes | Slow | >5 PB | No | Yes | Yes | Login Nodes | Yes | Yes |
| NOBACKUP | $NOBACKUP | No | Fast | ~2 PB | Yes | Yes | Yes | All Nodes | No | No |
| SCRATCH | $SCRATCH or $TMP | No | Fastest | 2-8 TB | Yes | No | No | Unique per Node | No | No |
We recommend that all HPC users use a revision control system, such as https://github.com/, to store their code and Slurm job scripts. The RSE team run workshops to introduce version control using Git. None of the HPC filesystems should be the primary storage location for your code.
Newcastle University maintains an Enterprise account on GitHub, through which you may authenticate using your University IT account.
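As a brief example of this workflow (the repository URL and project path are hypothetical placeholders):
$ # Clone your code repository into your shared project area on NOBACKUP
$ cd /nobackup/proj/myproject
$ git clone https://github.com/your-organisation/your-code.git
$ # Pull and push changes as you work, so that GitHub remains the primary copy of your code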