For troubleshooting either on your own or with assistance from others, you will want to know “What, Where, When, and How” an error occurred. It's a good idea to copy and paste from the terminal, to capture the exact time, what you were doing, the hostname and current directory.
Edit the shell environment configuration file (for example, to configure bash, add the following to the ~/.bashrc file):
export PS1="[\d \t \u@\h:\w ] $ " # shows date, time, user, host and current directory in your prompt
HISTTIMEFORMAT="%d/%m/%y %T " # adds timestamps to your history
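HISTTIMEFORMAT takes strftime format codes, which the date command also understands, so the layout can be previewed before editing ~/.bashrc. The following is a minimal sketch; the ISO-style alternative is an assumption added for illustration and is not part of the original settings.

```shell
#!/usr/bin/env bash
# Preview the timestamp layout that HISTTIMEFORMAT="%d/%m/%y %T " would give.
# %d/%m/%y is day/month/two-digit year; %T is HH:MM:SS.
stamp=$(date +"%d/%m/%y %T")
echo "history entries would be prefixed like: ${stamp}"

# Hypothetical alternative (not from the original): year-first ISO style,
# which sorts chronologically and avoids day/month ambiguity.
iso_stamp=$(date +"%F %T")
echo "ISO-style alternative: ${iso_stamp}"
```

After changing ~/.bashrc, open a new terminal (or reload the file) and run history to check that earlier commands now carry a timestamp.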
Alternatively, gather this information by running commands:
$ hostname
will output something like: cometlogin01.comet.hpc.ncl.ac.uk
$ pwd
will output something like: /nobackup/proj/MyProject
$ date
will output something like: Mon 7 Jul 12:25:26 BST 2025
Always provide any scripts you were using. Where there is a directory containing many relevant scripts and data files, you can use:
$ cat <filename>
will output the content of a text file to the console
$ ls -al <directory>
will list the directory contents
$ tree
will show the tree structure from the current directory
$ module list
will show currently loaded modules
If some jobs work and others don't, it can be handy to look at differences in the resources they used; maybe they didn't use the resources you expected!
sacct
provides information about your finished jobs
For a single job numbered 1000667:
sacct --jobs 1000667 --format=User,JobID,Jobname%50,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist
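When one job worked and another did not, it may help to request both in a single call so the rows sit next to each other. The sketch below assumes a second, hypothetical job ID (1000668) and uses sacct's --parsable2 option, which prints pipe-delimited output that is easier to compare or paste into a support request.

```shell
#!/usr/bin/env bash
# Sketch: show accounting data for two jobs together.
# 1000667 is the job from the example above; 1000668 is a hypothetical second job.
good_job=1000667
bad_job=1000668

# Same columns as the single-job example.
fields="User,JobID,Jobname%50,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist"

if command -v sacct >/dev/null 2>&1; then
    # --jobs accepts a comma-separated list of job IDs;
    # --parsable2 separates fields with '|' and omits the trailing delimiter.
    sacct --jobs "${good_job},${bad_job}" --format="${fields}" --parsable2
else
    echo "sacct not found: run this on a system with Slurm installed" >&2
fi
```

Differences in elapsed, MaxRss or nodelist between the two jobs are often a useful first clue.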
To output information on all your jobs, leave out the --jobs option:
sacct --format=User,JobID,Jobname%50,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist
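The context commands earlier in this section (date, hostname, pwd, ls, module list) can be bundled into one text file to attach to a support request. This is only a sketch: the report file name is a hypothetical choice, and the check for module assumes it may be missing in some shells.

```shell
#!/usr/bin/env bash
# Sketch: collect troubleshooting context into one text file.
# The file name pattern is a hypothetical choice, not a site convention.
report="troubleshoot-$(date +%Y%m%d-%H%M%S).txt"

{
    echo "== When =="
    date
    echo "== Where =="
    hostname
    pwd
    echo "== Directory contents =="
    ls -al
    echo "== Loaded modules =="
    # 'module' is usually a shell function and may be absent in some shells.
    if type module >/dev/null 2>&1; then
        # module implementations often print to stderr, so capture that too.
        module list 2>&1
    else
        echo "(module command not available in this shell)"
    fi
} > "${report}"

echo "Wrote ${report}"
```

Run it from the directory where the error occurred, so that pwd and ls describe the right location.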