====== Gathering Troubleshooting Information ======
Whether you are troubleshooting on your own or with help from others, you will need to know "What, Where, When, and How" an error occurred.
It is a good idea to copy and paste directly from the terminal, so that you capture the exact time, the commands you ran, the hostname and the current directory.
==== Where and When? ====
Edit your shell's configuration file (for example, to configure bash, add the following to the ''~/.bashrc'' file):
export PS1="[\d \t \u@\h:\w ] $ " # shows date, time, user, host and current directory in your prompt
HISTTIMEFORMAT="%d/%m/%y %T " # adds timestamps to your history
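The timestamp format can be previewed before relying on it. The sketch below assumes that ''date'' accepts the same strftime-style codes as ''HISTTIMEFORMAT'', which is the case for the codes used here; the variable names ''fmt'' and ''stamp'' are only for illustration.

```shell
# Same format string as the HISTTIMEFORMAT setting above:
# day/month/two-digit year, then the time as HH:MM:SS.
fmt="%d/%m/%y %T "

# Ask 'date' to render the current time with that format.
stamp="$(date +"$fmt")"

echo "History entries will be prefixed like: ${stamp}"
```

The new settings only take effect in shells started after the edit, or after running ''source ~/.bashrc'' in the current one.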
Alternatively, gather this information by running commands:
* ''$ hostname'' will output something like: cometlogin01.comet.hpc.ncl.ac.uk
* ''$ pwd'' will output something like: /nobackup/proj/MyProject
* ''$ date'' will output something like: Mon 7 Jul 12:25:26 BST 2025
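The three commands above can also be captured in one go and saved to a file for attaching to a support request. This is a minimal sketch; the file name ''troubleshooting-info.txt'' is an arbitrary choice, not a site convention.

```shell
# Hypothetical output file name; change it to suit.
report="troubleshooting-info.txt"

# Group the commands so that all of their output goes to one file.
{
  echo "Host:      $(hostname)"
  echo "Directory: $(pwd)"
  echo "Date:      $(date)"
} > "$report"

# Show what was recorded.
cat "$report"
```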
==== Context: What and How? ====
Always provide any scripts you were using. Where there is a directory containing many relevant scripts and data files, you can use:
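* ''$ cat FILENAME'' will output the content of the text file FILENAME to the console
* ''$ ls -al DIRECTORY'' will list the contents of DIRECTORY, including hidden files (leave DIRECTORY out to list the current directory)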
* ''$ tree'' will show the tree structure from the current directory
* ''$ module list'' will show currently loaded modules
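When there is a lot to report, the commands above can be combined so that their output lands in a single file. The following is a sketch under some assumptions: ''myjob.sh'' and ''context-report.txt'' are made-up names, and the ''module'' command is assumed to write its listing to standard error, as it often does, so that stream is redirected as well.

```shell
# Hypothetical names: adjust the script path and the output file to your case.
script="myjob.sh"
out="context-report.txt"

{
  echo "== Directory listing =="
  ls -al

  echo "== Loaded modules =="
  # 'module' is usually a shell function and may not exist on every machine.
  if type module >/dev/null 2>&1; then
    module list 2>&1
  else
    echo "(module command not available)"
  fi

  echo "== Script: $script =="
  if [ -f "$script" ]; then
    cat "$script"
  else
    echo "(no such file)"
  fi
} > "$out"

echo "Context written to $out"
```

Check the file before sending it on, in case a script contains passwords or other private details.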
===== Records of previous Slurm jobs =====
If some jobs work and others don't, it can help to compare the resources they used. They may not have used the resources you expected!
* ''sacct'' provides information about your finished jobs
For a single job, numbered 1000667:
sacct --jobs 1000667 --format=User,JobID,Jobname%50,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist
To output information on all your jobs, leave out the ''--jobs'' option (by default, ''sacct'' then only reports jobs since midnight today):
sacct --format=User,JobID,Jobname%50,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist
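To compare a job that worked with one that did not, ''--jobs'' accepts a comma-separated list, and ''--starttime'' widens the search beyond today. The sketch below reuses the field list from the commands above; the second job ID and the date are invented for illustration, and the ''sacct'' calls are skipped on machines where Slurm is not installed.

```shell
# Same field list as in the commands above, kept in one place for reuse.
fields="User,JobID,Jobname%50,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist"

if command -v sacct >/dev/null 2>&1; then
  # Two jobs side by side; 1000668 is a made-up job ID.
  sacct --jobs 1000667,1000668 --format="$fields"

  # All your jobs since a given date; the date is an arbitrary example.
  sacct --starttime 2025-07-01 --format="$fields"
else
  echo "sacct not found: run this on a machine with Slurm installed" >&2
fi
```

Differences in the ''elapsed'', ''MaxRss'' and ''ncpus'' columns between the two jobs are usually a good place to start looking.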