====== Gathering Troubleshooting Information ======

For troubleshooting, either on your own or with assistance from others, you will want to know "What, Where, When, and How" an error occurred. It's a good idea to copy and paste directly from the terminal, so that you capture the exact time, what you were doing, the hostname and the current directory.

==== Where and When? ====

Edit your shell configuration file (for example, to configure bash, add the following to the ''~/.bashrc'' file):

  export PS1="[\d \t \u@\h:\w ] $ " # shows date, time, user, host and current directory in your prompt
  HISTTIMEFORMAT="%d/%m/%y %T "     # adds timestamps to your history

Alternatively, gather this information by running commands:

  * ''$ hostname'' will output something like: cometlogin01.comet.hpc.ncl.ac.uk
  * ''$ pwd'' will output something like: /nobackup/proj/MyProject
  * ''$ date'' will output something like: Mon 7 Jul 12:25:26 BST 2025

==== Context: What and How? ====

Always provide any scripts you were using. Where there is a directory containing many relevant scripts and data files, you can use:

  * ''$ cat'' followed by a file name will output the content of a text file to the console
  * ''$ ls -al'' will list the directory contents
  * ''$ tree'' will show the tree structure from the current directory
  * ''$ module list'' will show currently loaded modules

===== Records of previous Slurm jobs =====

If some jobs work and others don't, it can be handy to look at differences in the resources they used; they may not have used the resources you expected.

  * ''sacct'' provides information about your finished jobs

For a single job, numbered 1000667:

  sacct --jobs 1000667 --format=User,JobID,Jobname%50,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist

To output information on all your jobs, leave out the ''--jobs'' option:

  sacct --format=User,JobID,Jobname%50,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist
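
The individual commands on this page can also be collected into one report file to attach to a support request. The following is a minimal sketch only, not an official tool: the function name ''gather_context'', the default report file name and the shortened ''sacct'' field list are all illustrative assumptions. It skips ''module'' and ''sacct'' where those commands are not available.

```shell
#!/usr/bin/env bash
# Sketch: write the "where, when, what" details into a single report file.
# gather_context and the default file name are hypothetical, chosen for illustration.

gather_context() {
    # $1: report file to write (optional); $2: Slurm job ID to look up (optional)
    local report="${1:-troubleshooting-report.txt}"
    local jobid="${2:-}"
    {
        echo "== Where and when =="
        echo "hostname: $(hostname)"
        echo "directory: $(pwd)"
        echo "date: $(date)"

        echo "== Directory contents =="
        ls -al

        echo "== Loaded modules =="
        # 'module' only exists where environment modules are set up
        if command -v module >/dev/null 2>&1; then
            module list 2>&1   # module list normally prints to stderr
        else
            echo "module command not available"
        fi

        # Only query Slurm when a job ID was given and sacct is installed
        if [ -n "$jobid" ] && command -v sacct >/dev/null 2>&1; then
            echo "== Slurm job $jobid =="
            sacct --jobs "$jobid" \
                --format=JobID,Jobname%50,state,elapsed,MaxRss,nnodes,ncpus,nodelist
        fi
    } > "$report"
    echo "Wrote $report"
}

# Run with the script's own arguments, e.g.: ./gather.sh report.txt 1000667
gather_context "$@"
```

Run it from the directory where the problem occurred, so that the recorded path and directory listing match what you were working on. Check the report before sharing it, in case the directory listing reveals file names you would rather keep private.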