For troubleshooting, either on your own or with help from others, you will want to know what went wrong, and where, when, and how it happened. It's a good idea to copy and paste from the terminal, capturing the exact time, what you were doing, the hostname, and the current directory.
Edit your shell's configuration file (for example, to configure bash, add the following to the ~/.bashrc file):
~/.bashrc
export PS1="[\d \t \u@\h:\w ] $ "  # shows date, time, user, host and current directory in your prompt
HISTTIMEFORMAT="%d/%m/%y %T "      # adds timestamps to your history
Alternatively, gather this information by running commands:
$ hostname
$ pwd
$ date
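As a minimal sketch, the three commands above can be wrapped in one shell function so the context is printed as a single labelled block that is easy to paste into a support request. The function name `capture_context` is hypothetical and not part of any existing tool.

```shell
# Hypothetical helper (name is an assumption): print where and when
# a problem occurred as one labelled block.
capture_context() {
    echo "Host:      $(hostname)"
    echo "Directory: $(pwd)"
    echo "Date:      $(date)"
}

# Print the context to the terminal.
capture_context
```

To keep a copy, the output can be appended to a file of your choosing, for example `capture_context >> notes.txt`.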
Always provide any scripts you were using. Where a directory contains many relevant scripts and data files, you can show file contents, the directory layout, and your currently loaded modules with:
$ cat <filename>
$ ls -al <directory>
$ tree
$ module list
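The commands above could also be combined into one report file to send along with a support request. The following is a sketch only: the function name `collect_report`, the default output name `report.txt`, and the choice to include only `*.sh` files are all assumptions made for illustration.

```shell
# Hypothetical helper (name and defaults are assumptions): write a directory
# listing, the contents of its *.sh scripts, and the loaded modules
# to a single report file.
collect_report() {
    dir="${1:-.}"
    out="${2:-report.txt}"
    {
        echo "== ls -al $dir =="
        ls -al "$dir"
        for f in "$dir"/*.sh; do
            [ -f "$f" ] || continue      # skip if no scripts matched
            echo "== cat $f =="
            cat "$f"
        done
        # 'module' is only present where environment modules are installed.
        if command -v module >/dev/null 2>&1; then
            echo "== module list =="
            module list 2>&1
        fi
    } > "$out"
}
```

For example, `collect_report . report.txt` would write the report for the current directory.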
If some jobs work and others don't, it can be handy to compare the resources they used. They may not have used the resources you expected!
The sacct command reports this accounting information.
For a single job numbered 1000667:
sacct --jobs 1000667 --format=User,JobID,Jobname%50,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist
Or, without the --jobs option, for all of your jobs in sacct's default time window:
sacct --format=User,JobID,Jobname%50,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist
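To compare a working job with a failing one, a small wrapper can request the same fields for both in one call. This is a sketch under stated assumptions: the function name `compare_jobs` and the second job number in the usage note are hypothetical, and the field list is a shortened selection from the commands above. It relies on `--jobs` accepting a comma-separated list and on `--parsable2` producing pipe-delimited output.

```shell
# Hypothetical helper (name is an assumption): print the same accounting
# fields for two jobs in pipe-delimited form, so the rows are easy to compare.
compare_jobs() {
    sacct --jobs "$1,$2" --parsable2 \
        --format=JobID,State,Elapsed,MaxRSS,MaxVMSize,NNodes,NCPUS,NodeList
}
```

For example, `compare_jobs 1000667 1000668` would list both jobs, where 1000668 stands in for the job you want to compare against.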