RDW is file storage for research, hosted in our campus datacentre. Files can be viewed from both Windows and Linux. Permissions on the files are set using your campus login, so you must be logged in as the correct campus user to see your files.
Check the size of your data before transferring. This will give you an idea of how much space you’ll need on the destination and how long the transfer may take. Windows File Explorer will give you an estimate of directory size via right-click and ‘Properties’, but you can get much better information at the Linux command line using du. Try this tutorial: https://www.geeksforgeeks.org/du-command-linux-examples/
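As a rough sketch of the size check, the commands below summarise a directory with du. The SRC variable is a placeholder (it defaults to the current directory here); point it at the data you plan to move. The --max-depth and sort -h options assume GNU coreutils, as found on most Linux systems.

```shell
# Placeholder: set SRC to the directory you plan to transfer.
SRC="${SRC:-$PWD}"

# Total size of the directory in human-readable units.
total=$(du -sh "$SRC" | cut -f1)
echo "Total size: $total"

# Size of each subdirectory one level down, smallest first.
du -h --max-depth=1 "$SRC" | sort -h

# Number of regular files in the tree.
file_count=$(find "$SRC" -type f | wc -l)
echo "Files: $file_count"
```

The file count is worth knowing alongside the total size, because a transfer of very many small files generally takes longer than the same volume held in a few large files.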
Consider the route the data takes. If you run a copy command on a laptop at home to copy data from Rocket to RDW, the data will have to move off campus to your laptop and back again via the (much slower) internet connection of your laptop. If you log in to Rocket from your laptop and run a copy command to RDW, the data will move straight from Rocket to RDW without leaving the (much faster) data centre network.
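One way to check which route a copy will take is to test whether RDW is visible on the machine you are typing on. This is only a sketch: it assumes that RDW is mounted at /rdw on Rocket, as described in this guide, and is not mounted at that path on your own computer. RDW_ROOT is a hypothetical variable added so the path can be overridden.

```shell
# Assumption: RDW appears at /rdw on Rocket and not on your own machine.
rdw_root="${RDW_ROOT:-/rdw}"

if [ -d "$rdw_root" ]; then
    route="direct"
    echo "$rdw_root is visible on $(hostname): a copy started here writes to RDW directly."
else
    route="indirect"
    echo "$rdw_root is not visible on $(hostname): log in to Rocket and start the copy there."
fi
```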
Avoid graphical copy/paste. It’s slow and error-prone. Instead of a graphical file manager, use a file synchronisation tool, which keeps track of the copy and only moves data that isn’t already present at the destination. This means that a transfer which fails part-way through is not wasted: the next time the command is run, the transfer picks up from where it stopped.
When an RDW share is set up, you will be provided with its Windows share name. Navigate to your RDW share on Rocket from the login node. RDW is split up ‘behind the scenes’ into numbered blocks, for admin purposes. This means there will be a number between 01 and 08 after /rdw in the path, so the path will be like: /rdw/05/share_name
To find a new share called “share-name”, use:
$ cd /rdw
$ find -maxdepth 2 -type d -name "share-name"
NB: Groups with more than 1 project may have a super-directory on rdw like /rdw/02/group/share_name
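Because a share under a group super-directory sits one level deeper, a search limited to two levels may miss it. The sketch below wraps the search in a small helper, find_share (a hypothetical name), that looks three levels down. It is demonstrated on a mock directory tree so it can be run anywhere; on Rocket the root would be /rdw.

```shell
# find_share ROOT NAME: print the first directory called NAME
# found within three levels below ROOT.
find_share() {
    find "$1" -maxdepth 3 -type d -name "$2" 2>/dev/null | head -n 1
}

# Mock layout standing in for /rdw: one plain share, one under a group directory.
demo=$(mktemp -d)
mkdir -p "$demo/05/share-name" "$demo/02/group/other-share"

find_share "$demo" "share-name"
find_share "$demo" "other-share"

# On Rocket, with a placeholder share name:
# share_path=$(find_share /rdw "share-name")
```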
For large data, you may find the scp command limiting. The rsync utility provides advanced features for file transfer and is typically faster than both scp and sftp. It is especially useful for transferring large files, many files, or both. The syntax is similar to cp and scp. Rsync can be used on a locally mounted filesystem or a remote filesystem.
[userid@login01 ~]$ cd /nobackup/proj/training/userid/
[userid@login01 userid]$ rsync -trlv --inplace TestDir /rdw/03/rse-hpc/training/userid --dry-run
sending incremental file list
TestDir/
TestDir/testfile1
TestDir/testfile2
sent 121 bytes received 26 bytes 294.00 bytes/sec
total size is 0 speedup is 0.00 (DRY RUN)
[userid@login01 userid]$ rsync -trlv --inplace TestDir /rdw/03/rse-hpc/training/userid
sending incremental file list
created directory /rdw/03/rse-hpc/training/userid
TestDir/
TestDir/testfile1
TestDir/testfile2
sent 197 bytes received 415 bytes 408.00 bytes/sec
total size is 0 speedup is 0.00
Re-run the rsync transfer command: This gives you confidence that nothing was missed. The second run should be very fast, as rsync will not need to copy any data and will simply list all the files.
rsync --dry-run -rltv --inplace --itemize-changes --progress --stats --whole-file --size-only /nobackup/myusername/source /rdw/path/to/my/share/destination/ 2>&1 | tee /home/myusername/meaningful-log-name.log1
rsync -rltv --inplace --itemize-changes --progress --stats --whole-file --size-only /nobackup/myusername/source /rdw/path/to/my/share/destination/ 2>&1 | tee /home/myusername/meaningful-log-name.log2
Those familiar with rsync will often use the -av option, which preserves permissions but leads to group modification errors on RDW. For Rocket and RDW, replace -av with -rltv.
Note: RDW has a very fast connection to Rocket, so compressing and un-compressing the data takes more resources than transferring it uncompressed. Leave out rsync’s -z option.
Read the error messages - Not all errors are a cause for concern:
files/attrs were not transferred: This error may be returned because RDW doesn't 'know' about Rocket's groups.
rsync: chgrp … failed: Invalid argument (22)
This usually means the Kerberos ticket for your user has expired. Kerberos tickets tell the system what your user is allowed to do. For security, they expire automatically at a set time after you log in. Most of the time these tickets are renewed automatically while you’re working, but they can expire during long copy commands. You need to take two steps to avoid timeouts.
* Firstly, you will periodically need to renew your Kerberos authentication ticket, which controls your access to /rdw and expires after 10 hours. The krenew command will do the renewal automatically for up to a week.
* Secondly, to stop your process being killed if you are logged out, run it within a tmux session, then detach from your login. You can reattach later if necessary.
An example session might look like this:
Start a new tmux session and run the transfer under krenew:
$ tmux
$ krenew -v -- bash -c 'rsync -trlv --inplace /nobackup/myuser/ /rdw/myshare/ >> mylogfile'
Detach from the session, leaving the transfer running:
<Ctrl-b d>
Reattach later to check on progress:
$ tmux attach
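Before starting a long transfer, it can be worth confirming that a valid ticket exists at all. The sketch below assumes the klist command from MIT Kerberos is installed; its -s option prints nothing and reports validity through the exit status. The function name check_ticket is hypothetical, and the suggestion to run kinit is a general Kerberos convention rather than site-specific advice.

```shell
# check_ticket: return 0 if a valid Kerberos ticket is present,
# 1 if not, and 2 if klist is unavailable on this machine.
check_ticket() {
    if ! command -v klist >/dev/null 2>&1; then
        echo "klist not found; cannot check the Kerberos ticket here" >&2
        return 2
    fi
    if klist -s; then
        echo "Kerberos ticket is valid"
        return 0
    fi
    echo "No valid Kerberos ticket: log in again or run kinit before copying" >&2
    return 1
}

check_ticket
ticket_status=$?
echo "check_ticket exit status: $ticket_status"
```

Running klist without -s prints the ticket’s expiry time, which shows how much of the ticket’s lifetime is left.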
Transfers from Rocket to RDW don’t leave our fast data centre network. The options needed are the same as for using rsync for disk-to-disk transfers within the same machine:
* leave out -z
* use --inplace
Why not -z? Compression uses a lot of CPU, and this becomes the bottleneck once the network is fast enough.
Why --inplace? Rsync usually writes each file to a temporary file at the destination and renames it when the copy is complete, which places extra load on the CPU and disk. --inplace tells rsync to skip the temporary file and write the data straight into the destination file. An interrupted connection is not a problem, because re-running the command picks up where the transfer stopped.
For rsync over a slow connection like the internet: