====== Data And HPC Filesystems ======
===== HPC Filesystem Areas =====
It is important to understand that the HPC facilities, like most non-desktop computing environments, //do not// have a single filesystem or storage area.
This is often one of the biggest changes for users coming from a non-HPC environment, even for those who have traditionally used Linux or other Unix-based systems such as macOS.
Before working on the HPC facilities, you should know what the different storage locations and types are, how to access them, and, most importantly, //what to use each for//. Using the wrong storage area may slow down your own jobs or those of other users, or leave your data in a location with shorter retention policies than you expect.
The areas you can find on our HPC facilities include the following:
* [[#HOME_Filesystem|HOME]] - Your own personal home directory
* [[#RDW_Filesystem|RDW]] - Long term, large capacity, central University storage
* [[#NOBACKUP_Filesystem|NOBACKUP]] - Fast, shared, group working area
* [[#SCRATCH_Directory|SCRATCH]] - Fast, local, temporary working area
==== HOME Filesystem ====
The HOME filesystem represents your //personal// home directory. This is provided to allow you to log in to the facility, install and build your own software, and store configuration data.
HOME is implemented on a general purpose Linux server which presents the HOME directory to all nodes in the facility. Because this is a general purpose server, it is //not// optimised for high-throughput, parallel data operations or other intensive purposes.
In order to minimise disruption to other users, your space on HOME is restricted by a disk quota. You can view your current quota usage at any time by running the //quota// command:
$ quota
Disk quotas for user n1234 (uid 123456):
Filesystem blocks quota limit grace files quota limit grace
nfs:/home 31586244 41943040 41943040 36480 0 0
$
By default the output is reported in 1 KB blocks; you can see more readable figures (MB, GB, TB) with the //-s// option:
$ quota -s
Disk quotas for user n1234 (uid 123456):
Filesystem space quota limit grace files quota limit grace
nfs:/home 30846M 40960M 40960M 36480 0 0
The above sample output shows a quota of 40960MB (40GB), of which user n1234 is using 30846MB (~30GB).
Running Slurm jobs will always attempt to write their output logs and errors to your HOME; for this reason, if you exceed your quota, //it is highly likely that some or all of your subsequent Slurm jobs will fail!// If your Slurm jobs fail, __first__ check that you have not exceeded your HOME quota.
Your home directory can always be referenced by the environment variable **$HOME**, both at the command line and in your SBATCH job files.
Your home directory is accessible on every node in the HPC facility and is available to your running jobs. However, we recommend that jobs with large data demands, or which write constantly to output files during their runtime, do //not// write to HOME (see the sketch below).
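As a minimal sketch, this is how **$HOME** might be referenced in an SBATCH job file while directing heavy output elsewhere. The program name, file names and the ///nobackup/proj/myproject// path are hypothetical examples, not real paths on the facility:
#!/bin/bash
#SBATCH --job-name=example-job
#SBATCH --output=example-job-%j.log

# Small input such as a configuration file can safely live in HOME
CONFIG="$HOME/my-project/settings.conf"

# Heavy output should go to a NOBACKUP project area (hypothetical path), not HOME
OUTDIR=/nobackup/proj/myproject/results
mkdir -p "$OUTDIR"

# my_program is a placeholder for your own application
./my_program --config "$CONFIG" --output "$OUTDIR/output.dat"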
Because your HOME contains configuration files vital to allow your login to the facility, it is not a suitable location for sharing code or data with other users. If you are working on a shared project or with a wider group then we recommend not keeping your code and data in your personal home directory; use the NOBACKUP area instead and, ideally, use a revision control system, such as GitHub, for working on code.
In most cases, in the event of an employee or student leaving the University, information governance restrictions mean that we are __unable__ to allow the transfer of code or data from their personal HOME directory to another member of their team. //It is the responsibility of an HPC Project owner to ensure that all project members are working in the most appropriate location.//
The HOME directory is the //only// location on the HPC facilities which is backed up.
=== Suggested Use ===
The use of the HOME directory should typically be limited to the following categories:
* General Linux tasks (moving files around, editing configuration data)
* Compiling software
* Installing custom software for your own use
* Short test cases where data is limited or not heavily IO bound
----
==== RDW Filesystem ====
The [[https://services.ncl.ac.uk/itservice/core-services/filestore/researchdatawarehouse|RDW]] (Research Data Warehouse) filesystem is not technically part of the HPC facility, but allows users to access data and shared groups stored on the central University filestore.
The RDW filesystem is not intended for compute activity. You should first copy any data you need down from RDW to your personal HOME or shared NOBACKUP project directory via an interactive login (on a login node), run your compute activity, then copy any results back (again interactively, from a login node).
RDW is normally mapped to the **/rdw** directory on the login nodes; beneath it you will find a directory tree corresponding to the same network path you would use to access the RDW filesystem from a typical Windows client.
It is important to note that due to technical restrictions, //it is not possible to access the RDW filesystem from any nodes other than the login nodes//. If your jobs attempt to access RDW whilst running on a compute node __then they will fail__. We also recommend that you do //not// run computation interactively on data which remains on RDW, even on a login node; please //always// copy data locally from RDW before running computation which accesses it.
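For example, a typical workflow on a login node might look like the following. The share and project directory names are hypothetical; substitute the paths for your own RDW share and NOBACKUP project area:
$ # On a login node: copy input data down from RDW to the project NOBACKUP area
$ cp -r /rdw/my-research-share/inputs /nobackup/proj/myproject/
$ # ... submit your Slurm jobs against the copied data ...
$ # Once the jobs have finished, copy results back to RDW for long-term storage
$ cp -r /nobackup/proj/myproject/results /rdw/my-research-share/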
Completed data sets should be moved to RDW, as it is the most appropriate location for long-term storage of data. Data left on the HPC facilities long term may be subject to automated cleanup under our [[:policies:data|data retention policies]].
You should factor in the use of RDW for long term storage of your data as part of your research project costs. Using our HPC facilities for data storage is //not// supported.
**Note:** The RSE team do **not** have any control over the RDW filesystem, quotas, shares or permissions, nor do we have access to any of your RDW groups or directories. Requests for support, new areas or permission changes on RDW locations should be raised through the [[https://nuservice.ncl.ac.uk|NUIT ITService]] self-service system, using the "__Research Data Warehouse__" category, which will be actioned by a member of the NUIT Infrastructure team.
=== Suggested Use ===
You should use RDW for these categories of activity:
* Long-term storage of completed project data sets
* Sharing data (and code) with users outside of the HPC facility
* Import of data from non-Linux environments
* Interactive sessions on login nodes where you manually copy data down and then back up again
----
==== NOBACKUP Filesystem ====
The NOBACKUP filesystem is a high-performance data filesystem implemented using a [[https://en.wikipedia.org/wiki/Lustre_(file_system)|Lustre]] storage architecture.
The filesystem is optimised for heavy data reads and writes, supporting many simultaneous users and IO requests. All HPC Projects are given an area on the NOBACKUP filesystem which project members can use __for sharing input and output data files and shared code__.
NOBACKUP is available on all nodes under the directory name **/nobackup**. On nodes which are connected by low-latency (Infiniband) networking, the NOBACKUP filesystem is also connected over the same networking for increased performance.
To determine what space you have used on the NOBACKUP filesystem, you can use the //lfs quota// command:
$ lfs quota /nobackup
Disk quotas for usr n1234 (uid 123456):
Filesystem kbytes quota limit grace files quota limit grace
/nobackup 1956 0 0 - 115 0 0 -
Disk quotas for grp rocketloginaccess (gid 987654):
Filesystem kbytes quota limit grace files quota limit grace
/nobackup 0 0 0 - 0 0 0 -
$
The //lfs quota// command will show you the disk space used by your account, as well as each HPC Project group that you are a member of. To see the output in more human-readable units (MB, GB, TB), use the following option:
$ lfs quota /nobackup -h
Disk quotas for usr n1234 (uid 123456):
Filesystem used quota limit grace files quota limit grace
/nobackup 1.91M 0k 0k - 115 0 0 -
Disk quotas for grp rocketloginaccess (gid 987654):
Filesystem used quota limit grace files quota limit grace
/nobackup 0k 0k 0k - 0 0 0 -
$
At present, no quota enforcement is enabled on the NOBACKUP filesystem in order to allow HPC Projects to periodically //spike// in their utilisation of shared disk space. If projects abuse this facility then quotas may be enforced.
On previous HPC facilities we also provided //personal// areas on NOBACKUP, in addition to the HPC Project shared areas. For **Comet** this facility has been removed: all data and code placed on NOBACKUP //must// therefore be in a shared area which has a minimum of two project members.
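As a sketch of working in a project shared area (the project path and group name below are hypothetical; your own project area will have its own path and Linux group):
$ # Change to the project shared area (hypothetical path)
$ cd /nobackup/proj/myproject
$ # Create a working directory for a particular run
$ mkdir -p runs/experiment-01
$ # Check how much space the project group is currently using (hypothetical group name)
$ lfs quota -h -g myprojectgroup /nobackup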
If you have a suggestion for a data set which could be shared by multiple projects (for example, several of the common biobank databases), please get in touch with RSE; we can arrange for these to be created in an area on NOBACKUP outside of your project shared area and made accessible to all users.
**Note:** On departure of a project member, you may request that the code and data created by that user within the project shared area be re-assigned to another project member.
=== Suggested Use ===
* Shared data working area with other HPC Project members
* Shared code, where the code used by all project members needs to be identical
----
==== SCRATCH Directory ====
The SCRATCH directory is a location available on each node within the HPC facility. It is implemented using the local, high-speed, solid-state drives of that particular node. As such it is //not// shared: any data created in that directory is //only// available on that particular node, but access to it is //very// fast.
The SCRATCH directory can be referenced via the **$SCRATCH** //or// **$TMP** environment variable, both at the command line and within your SBATCH files.
Quotas are not enabled on the SCRATCH directory; //you// are responsible for cleaning up all files created during the run of your Slurm job.
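A minimal sketch of using SCRATCH within a Slurm job, including the clean-up step (the program, input file and ///nobackup/proj/myproject// path are hypothetical examples):
#!/bin/bash
#SBATCH --job-name=scratch-example

# Create a per-job working directory on the node-local SCRATCH area
WORKDIR="$SCRATCH/$SLURM_JOB_ID"
mkdir -p "$WORKDIR"

# Copy input data from the shared area (hypothetical path) to fast local storage
cp /nobackup/proj/myproject/input.dat "$WORKDIR/"

# Run the computation against the local copy (my_program is a placeholder)
./my_program --input "$WORKDIR/input.dat" --output "$WORKDIR/output.dat"

# Copy back only the results you want to keep
cp "$WORKDIR/output.dat" /nobackup/proj/myproject/results/

# Clean up SCRATCH: removing temporary files is your responsibility
rm -rf "$WORKDIR"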
=== Suggested Use ===
The SCRATCH directory is suggested for:
* High-speed temporary storage for files written to whilst a Slurm job is running
* Temporary, ephemeral data which does not need to be kept, but which is essential while the job runs
* Log files created while a Slurm job is running
----
===== Summary =====
A summary of the available filesystem locations on our HPC facilities:
^ Location ^ Variable ^ Quota ^ Speed ^ Size ^ Run jobs from here? ^ Shared with others? ^ Re-assignable? ^ Availability ^ Long-term storage ^ Backed Up ^
| HOME | $HOME | Yes | Medium | 75 TB | Yes - //if low bandwidth// | No | No | All Nodes | No | Yes |
| RDW | | Yes | Slow | >5 PB | No | Yes | Yes | Login Nodes | Yes | Yes |
| NOBACKUP | $NOBACKUP | No | Fast | ~2 PB | Yes | Yes | Yes | All Nodes | No | No |
| SCRATCH | $SCRATCH or $TMP | No | Fastest | 2-8 TB | Yes | No | No | Unique per Node | No | No |
* //Re-assignable// refers to the ability, under normal IT / Information Governance policies, to transfer ownership of files and directories in this area when the original creator leaves the University.
We recommend that //all// HPC users use a revision control system, such as https://github.com/, to store their code and Slurm job scripts. The RSE team run [[https://rse.ncldata.dev/events|workshops]] to introduce version control using Git. //None// of the HPC filesystems should be the primary storage location of your code.
Newcastle University maintains an [[https://services.ncl.ac.uk/itservice/technical-services/softwaredevtoolkit/|Enterprise]] account on GitHub, through which you may authenticate using your University IT account.
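For example, a repository can be cloned onto a login node and changes pushed back as you work (the organisation, repository and script names below are hypothetical):
$ # On a login node: clone your repository into your HOME area
$ git clone git@github.com:my-organisation/my-slurm-scripts.git $HOME/my-slurm-scripts
$ cd $HOME/my-slurm-scripts
$ # ... edit your job scripts, then commit and push the changes ...
$ git add run_job.sh
$ git commit -m "Update Slurm job script"
$ git push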
----
[[:started:index|Back to Getting Started]]