====== Gathering Troubleshooting Information ======

Whether you are troubleshooting on your own or with help from others, you will need to know what error occurred, and where, when, and how it happened.
Copy and paste directly from the terminal so that you capture the exact time, the commands you ran, the hostname, and the current directory.

==== Where and When? ====
Edit the shell environment configuration file (for example, to configure bash, add the following to the ''~/.bashrc'' file):
<code>
# Show date, time, user, host and current directory in your prompt
export PS1="[\d \t \u@\h:\w ] $ "
# Add timestamps to your history
HISTTIMEFORMAT="%d/%m/%y %T "
</code>

Alternatively, gather this information by running commands:
  * ''$ hostname'' will output something like: cometlogin01.comet.hpc.ncl.ac.uk
  * ''$ pwd'' will output something like: /nobackup/proj/MyProject
  * ''$ date'' will output something like: Mon  7 Jul 12:25:26 BST 2025
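
The three commands above can be combined into one snippet that saves the details to a file, ready to paste into a support request. This is only a minimal sketch: the function name ''gather_where_when'' and the file name ''where_when.txt'' are illustrative assumptions, not part of any existing tooling.

```shell
# Hypothetical helper: print where and when you are working.
gather_where_when() {
    echo "Host:      $(hostname)"
    echo "Directory: $(pwd)"
    echo "Date:      $(date)"
}

# Show the details on screen and keep a copy (file name is an example).
gather_where_when | tee where_when.txt
```

Run it in the same terminal, and from the same directory, where the error occurred, so the recorded values match the failing session.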

==== Context:  What and How? ====
Always provide any scripts you were using. If a directory contains many relevant scripts and data files, the following commands help to describe it:
  * ''$ cat <filename>'' will output the content of a text file to the console
  * ''$ ls -al <directory>'' will list the directory contents
  * ''$ tree'' will show the tree structure from the current directory
  * ''$ module list'' will show currently loaded modules
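
The commands above could be wrapped into a single report, for instance as sketched below. The function name ''gather_context'' and the output file ''context.txt'' are hypothetical. The sketch assumes ''tree'' and ''module'' may be missing on some systems, so it checks for each before calling it.

```shell
# Hypothetical helper: collect context about a working directory.
# Usage: gather_context [directory]   (defaults to the current directory)
gather_context() {
    dir="${1:-.}"
    echo "== Directory listing: $dir =="
    ls -al "$dir"

    # tree is not installed everywhere, so check before calling it.
    if command -v tree >/dev/null 2>&1; then
        echo "== Tree =="
        tree "$dir"
    fi

    # module is usually provided by the cluster environment; on many
    # systems it writes to stderr, so redirect that into the report.
    if command -v module >/dev/null 2>&1; then
        echo "== Loaded modules =="
        module list 2>&1
    fi
}

# Save a report for the current directory (file name is an example).
gather_context . > context.txt
```

Attach ''context.txt'' alongside the scripts themselves; the listing does not replace the script contents, which ''cat'' will still show.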

===== Records of previous Slurm jobs =====
If some jobs work and others don't, it helps to compare the resources they used: a failing job may not have used the resources you expected.
  * ''sacct'' provides information about your finished jobs

For a single job numbered 1000667:
<code>
sacct --jobs 1000667 --format=User,JobID,Jobname%50,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist
</code>
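To output information on all your jobs, leave out the ''--jobs'' option (by default ''sacct'' only reports jobs since midnight of the current day; add ''--starttime'' to look further back):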
<code>
sacct --format=User,JobID,Jobname%50,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist
</code>
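
To compare a job that worked with one that did not, the same fields can be requested for several jobs at once, since ''--jobs'' accepts a comma-separated list. The following is a sketch only: the function name ''compare_jobs'' is hypothetical, and the shorter field list is an assumption chosen to keep the output narrow.

```shell
# Hypothetical helper: show the same fields for several jobs, side by side.
# Usage: compare_jobs <jobid> [<jobid> ...]
compare_jobs() {
    # Join the job IDs with commas, the form that --jobs expects.
    ids="$(echo "$*" | tr ' ' ',')"
    sacct --jobs "$ids" \
          --format=JobID,JobName%30,State,Elapsed,MaxRSS,NNodes,NCPUS,NodeList
}
```

For example, ''compare_jobs 1000667 1000668'' would list both jobs together, where 1000668 stands in for the ID of a second job of your own.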