HPC Support

File Transfers and RDW

Monitoring RDW shares

RDW is file storage for research, hosted in our campus datacentre. You can view the files from both Windows and Linux. Permissions on the files are set using your campus login, so it’s essential that you are logged in as the correct campus user in order to see your files.

Checking the size of data

Check the size of your data before transferring; this will give you an idea of how much space you’ll need on the destination and how long the transfer may take. Windows File Explorer will give you an estimate of directory size (right-click, then ‘Properties’), but you can get much better information at the Linux command line using du. Try this tutorial: https://www.geeksforgeeks.org/du-command-linux-examples/
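As a rough sketch of that check, the commands below build a small throwaway directory (a stand-in for real project data; every name in it is made up for illustration) and report its total size, the size of each subdirectory, and the number of files. To use it on real data, point src at your own directory instead.

```shell
# Build a small throwaway directory to stand in for real project data.
# (All names here are hypothetical; set src to your own directory instead.)
src=$(mktemp -d)
mkdir -p "$src/raw" "$src/results"
printf 'sample\n' > "$src/raw/run1.dat"
printf 'sample\n' > "$src/raw/run2.dat"
printf 'summary\n' > "$src/results/summary.txt"

# Total size of the directory, in human-readable units.
du -sh "$src"

# Size of each immediate subdirectory, to see where the space goes.
du -h -d 1 "$src"

# Number of files that a transfer would need to handle.
file_count=$(find "$src" -type f | wc -l | tr -d ' ')
echo "files: $file_count"
```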

Principles for setting up large file transfers

  • Prioritise the most important data and plan the directory structure to help you find data later.
  • Keep what you need - consider whether it's necessary to keep large data that can be easily re-downloaded or re-created
  • Include a ‘README’ text file at the top level directory for a project or section, with information about the data, how it was generated and its expected uses.
  • Take a look at source and destination files before you start.
  • Use a file synchronisation tool, like rsync (Linux) or robocopy (Windows), which can keep track of the copy and only move data which isn’t already present at the destination.
  • Start small
    • Try a ‘dry run’ to check your command
    • Try a small amount of data / number of files
    • Try a small number of directories
  • Try a ‘dry run’ for the whole transfer and output to a log file (this will be quick because no data is moved)
  • Run the transfer and output to a log file
  • Run the transfer again to confirm success

Large data transfer tips

Consider the route the data takes. If you run a copy command on a laptop at home to copy data from Rocket to RDW, the data will have to move off campus to your laptop and back again via the (much slower) internet connection of your laptop. If you log in to Rocket from your laptop and run a copy command to RDW, the data will move straight from Rocket to RDW without leaving the (much faster) data centre network.

Avoid graphical copy/paste. It's slow and error-prone. Instead of a graphical file manager, use a file synchronisation tool, which can keep track of the copy and only move data which isn’t already present at the destination. This means that a transfer which fails part-way through is not wasted: next time the command is run, the transfer will pick up from where it stopped.

Viewing and copying to RDW on Linux Command Line

Viewing your RDW share on Rocket

When an RDW share is set up, you will be provided with its Windows share name. Navigate to your RDW share on Rocket from the login node. RDW is split up ‘behind the scenes’ into numbered blocks, for admin purposes. This means there will be a number between 01 and 08 after /rdw in the path, so the path will be like: /rdw/05/share_name

To find a new share called “share-name”, use:

$ cd /rdw
$ find -maxdepth 2 -type d -name "share-name"

NB: Groups with more than one project may have a super-directory on RDW, like /rdw/02/group/share_name. A share nested like this sits one level deeper, so search with -maxdepth 3 to find it.
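The effect of the search depth can be tried safely away from /rdw. The sketch below imitates the layout in a temporary directory; the block numbers, the group directory and the share names are all invented for illustration.

```shell
# Imitate the /rdw layout in a temporary directory (hypothetical names).
rdw=$(mktemp -d)
mkdir -p "$rdw/05/share-one"          # share directly under a numbered block
mkdir -p "$rdw/02/group/share-two"    # share inside a group super-directory

# Depth 2 reaches shares that sit directly under a numbered block.
found_direct=$(find "$rdw" -maxdepth 2 -type d -name "share-one")

# Depth 2 stops short of shares nested under a group directory...
missed_nested=$(find "$rdw" -maxdepth 2 -type d -name "share-two")

# ...so allow one more level to find them.
found_nested=$(find "$rdw" -maxdepth 3 -type d -name "share-two")

echo "direct share:      $found_direct"
echo "nested at depth 2: ${missed_nested:-not found}"
echo "nested at depth 3: $found_nested"
```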

Using rsync to copy to RDW

For large data, you may find the scp command limiting. The rsync utility provides advanced features for file transfer and is typically faster compared to both scp and sftp. It is especially useful for transferring large and/or many files. The syntax is similar to cp and scp. Rsync can be used on a locally mounted filesystem or a remote filesystem.

rsync is a powerful command

  • It's possible to overwrite, duplicate or delete data accidentally with rsync
  • Ensure you understand the options you use
  • Check source and destination are the right way round
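  • Check whether a trailing slash / is needed on the source path: with a trailing slash, rsync copies the contents of the directory; without one, it copies the directory itself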
  • Always do a 'dry run'
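The trailing slash that matters most is the one on the source path, and a dry run is a safe way to see its effect. The following is a minimal sketch using throwaway local directories rather than a real share; the directory and file names are placeholders.

```shell
# Throwaway source and destination directories (hypothetical names).
work=$(mktemp -d)
mkdir -p "$work/TestDir" "$work/dest"
touch "$work/TestDir/testfile1" "$work/TestDir/testfile2"
cd "$work" || exit 1

# No trailing slash on the source: TestDir itself would be created inside dest,
# so the listed paths begin with "TestDir/".
no_slash=$(rsync -rltv --dry-run TestDir dest/)
echo "$no_slash"

# Trailing slash on the source: only the contents of TestDir would land in dest,
# so the files are listed without the "TestDir/" prefix.
with_slash=$(rsync -rltv --dry-run TestDir/ dest/)
echo "$with_slash"
```

Because both commands use --dry-run, nothing is copied; comparing the two file lists shows where the data would end up.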

Try out a dry run:

[userid@login01 ~]$ cd /nobackup/proj/training/userid/
[userid@login01 userid]$ rsync -trlv --inplace TestDir /rdw/03/rse-hpc/training/userid --dry-run

sending incremental file list
TestDir/
TestDir/testfile1
TestDir/testfile2

sent 121 bytes  received 26 bytes  294.00 bytes/sec
total size is 0  speedup is 0.00 (DRY RUN)

Run ‘for real’:

[userid@login01 userid]$ rsync -trlv --inplace TestDir /rdw/03/rse-hpc/training/userid

sending incremental file list
created directory /rdw/03/rse-hpc/training/userid
TestDir/
TestDir/testfile1
TestDir/testfile2

sent 197 bytes  received 415 bytes  408.00 bytes/sec
total size is 0  speedup is 0.00

Re-run the rsync transfer command: This gives you confidence that nothing was missed. The second run should be very fast, as rsync will not need to copy any data. With -v, rsync lists only the files that still need transferring, so few or no files should appear in the output.
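One way to make that confirmation explicit is sketched here on throwaway local directories, not a real share. After the copy, a dry run with --itemize-changes and without -v prints one line for each item that would still change, so empty output suggests nothing is outstanding. The paths and file names are placeholders.

```shell
# Throwaway source and destination directories (hypothetical names).
work=$(mktemp -d)
mkdir -p "$work/TestDir" "$work/dest"
printf 'data\n' > "$work/TestDir/testfile1"
printf 'data\n' > "$work/TestDir/testfile2"

# First run: copy the data for real.
rsync -rlt --inplace "$work/TestDir" "$work/dest/"

# Second run: a dry run that itemizes anything still left to transfer.
pending=$(rsync -rlt --inplace --itemize-changes --dry-run "$work/TestDir" "$work/dest/")

if [ -z "$pending" ]; then
    echo "Nothing left to transfer"
else
    echo "Still to transfer:"
    echo "$pending"
fi
```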

Output to a log file

Try out a dry run:

rsync --dry-run -rltv --inplace --itemize-changes --progress --stats --whole-file --size-only /nobackup/myusername/source /rdw/path/to/my/share/destination/ 2>&1 | tee /home/myusername/meaningful-log-name.log1

Run ‘for real’

rsync -rltv --inplace --itemize-changes --progress --stats --whole-file --size-only /nobackup/myusername/source /rdw/path/to/my/share/destination/ 2>&1 | tee /home/myusername/meaningful-log-name.log2

rsync options

Those familiar with rsync will often use the -av option, which preserves permissions, but leads to group modification errors on RDW. For Rocket and RDW, replace -av with -rltv

  • `-r` = recurse through subdirectories
  • `-l` = copy symlinks
  • `-t` = preserve timestamps
  • `-v` = verbose
  • `--inplace --whole-file --size-only` speed up transfer and prevent rsync filling up space with a large temporary directory
  • `--itemize-changes --progress --stats` for more informative output
  • `| tee` sends output both to the screen and to a log file
  • Use `man rsync` for more information on options.

Note: RDW has a super-fast connection to Rocket, which means that it takes more resource to compress and un-compress the data than it does to do the transfer.

Troubleshooting

Read the error messages - Not all errors are a cause for concern:

files/attrs were not transferred

files/attrs were not transferred: This error may be returned because RDW doesn't 'know' about Rocket's groups.

  • Applying group permissions from Rocket will fail because RDW has 'trumped' our local permissions and imposed its own permissions.
  • This does not prevent the transfer: if only the group attribute of a file couldn't be set, the file itself is still copied. The message looks like: rsync: chgrp … failed: Invalid argument (22)
  • RDW shares are set up to allow access for users permitted by the PI.
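When a log file is long, it can help to separate the expected chgrp messages from anything else. The sketch below works on a made-up log: the log lines and paths are invented for illustration and only imitate the general shape of rsync's messages, so adjust the patterns to match what your own log contains.

```shell
# A small made-up log standing in for a real rsync log file.
# (The paths and messages are illustrative, not real output.)
log=$(mktemp)
cat > "$log" <<'EOF'
sending incremental file list
rsync: chgrp "/rdw/path/to/my/share/destination/a.dat" failed: Invalid argument (22)
rsync: chgrp "/rdw/path/to/my/share/destination/b.dat" failed: Invalid argument (22)
rsync: send_files failed to open "/nobackup/myusername/source/c.dat": Permission denied (13)
rsync error: some files/attrs were not transferred (see previous errors) (code 23)
EOF

# Count the group-change failures, which are expected on RDW.
chgrp_count=$(grep -c '^rsync: chgrp ' "$log")

# Show every other per-file error; these are the ones worth investigating.
other_errors=$(grep '^rsync: ' "$log" | grep -v '^rsync: chgrp ')

echo "chgrp failures (expected): $chgrp_count"
echo "other errors:"
echo "$other_errors"
```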

Long transfers being halted by permissions errors

This usually means the ‘Kerberos ticket’ for your user has expired. Kerberos tickets tell the system what your user is allowed to do. For security, they automatically expire at a set time after you log in. Most of the time these tickets are renewed automatically while you’re working, but they can expire during long copy commands. You need to take two steps to avoid timeouts:

  • Firstly, you will periodically need to renew your Kerberos authentication ticket, which controls your access to /rdw and expires after 10 hours. The krenew command will do the renewal automatically for up to a week.
  • Secondly, to stop your process being killed if you are logged out, run it within a tmux session, then detach from your login. You can reattach later if necessary.

An example session might look like this. Start a new tmux session, then run the command to copy your data to /rdw:

$ tmux
$ krenew -v -- bash -c 'rsync -trlv --inplace /nobackup/myuser/ /rdw/myshare/ >> mylogfile'

  • Detach the tmux session with <Ctrl-b d>
  • If necessary, start tmux again and attach to your previous session: tmux attach

Data Transfer over Fast Connections

Transfers from Rocket to RDW don’t leave our fast data centre network. The options needed are the same as for using rsync for disk to disk transfers in the same machine:

  • DON'T use compression -z
  • DO use --inplace

Why not -z? Compression uses lots of CPU, and this becomes a bottleneck once network speed is fast enough.

Why --inplace? Rsync usually creates a temp file on disk before copying, which places load on the CPU and hard drive. --inplace tells rsync not to create the temp file but send the data straight away. It doesn’t matter if the connection is interrupted, because rsync keeps track and tries again.

Rsync over Slow Connections

For rsync over a slow connection, like the internet:

  • DO use compression -z
  • DON’T use --inplace.
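The two sets of advice can be captured in a small helper, sketched below. The function name rsync_opts and the labels "fast" and "slow" are invented for this example, and it assumes the -rltv base options recommended above for Rocket and RDW.

```shell
# Print suggested rsync options for a connection type (hypothetical helper):
#   fast = inside the data centre network, e.g. Rocket to RDW
#   slow = over the internet
rsync_opts() {
    case "$1" in
        fast) echo "-rltv --inplace" ;;   # no compression, write in place
        slow) echo "-rltvz" ;;            # compress, and leave out --inplace
        *)    echo "usage: rsync_opts fast|slow" >&2
              return 1 ;;
    esac
}

# Print the commands these options would produce (SOURCE and DEST are placeholders).
echo "fast: rsync $(rsync_opts fast) SOURCE DEST"
echo "slow: rsync $(rsync_opts slow) SOURCE DEST"
```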

Developed and operated by
Research Software Engineering
Copyright © Newcastle University
Contact us @rseteam