This is a historic project which has previously made use of HPC facilities at Newcastle University.
For further information about this project, please contact:
Various many-task computing (MTC) and HPC workloads, depending on the application:
Many single-core jobs running 4-72 hr (memory requirements approx. 8-24 GB).
Some high-memory, multi-core, long-running (120+ hr) jobs needing ~128 GB or more.
Some short- to medium-running (up to ~24 hr), high-core-count jobs needing 5-10 execution threads and approx. 32-64 GB of RAM.
All conceivable permutations of the above: different applications have different runtime and memory requirements, and some bottleneck stages can't be subdivided.
Highly parallel, map-reduce style scatter-gather operations.
Frequent, multiple simultaneous large (>0.5 TB) amounts of IO, too big for local scratch.
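As a rough illustration of the scatter-gather pattern mentioned above, the sketch below splits a list of genomic intervals into chunks, processes each chunk independently and merges the partial results. It is a minimal single-machine sketch only: the function names (scatter, count_bases, scatter_gather) and the example intervals are hypothetical, and on a cluster each chunk would more likely be submitted as its own job by a workflow manager rather than run in a thread pool.

```python
from concurrent.futures import ThreadPoolExecutor


def scatter(items, n_chunks):
    """Split items into at most n_chunks contiguous, roughly equal chunks."""
    n_chunks = max(1, min(n_chunks, len(items)))
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional item each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def count_bases(chunk):
    """Hypothetical per-chunk step: total length of (contig, start, end) intervals."""
    return sum(end - start for _contig, start, end in chunk)


def scatter_gather(items, n_chunks, worker):
    """Scatter items into chunks, run worker on each, then gather by summing."""
    chunks = scatter(items, n_chunks)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        partials = list(pool.map(worker, chunks))
    return sum(partials)


# Made-up example intervals, for illustration only.
intervals = [
    ("chr1", 0, 100),
    ("chr1", 100, 250),
    ("chr2", 0, 50),
    ("chr2", 50, 75),
    ("chr3", 10, 20),
]
total = scatter_gather(intervals, 2, count_bases)
```

The gather step here is a simple sum; stages that cannot be subdivided would bypass the scatter step and run as a single job.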
Most code will be compiled C or Java, and sometimes Python, R, or Perl. These applications are not developed by us, so workload and run-time requirements cannot be controlled by refactoring the code base, nor would we wish to. This means that job queue restrictions need to be flexible; you may not have these set up in a life-science-friendly way. Additionally, all JVM GATK jobs are refractory to checkpointing and can't easily be suspended to disk owing to JVM issues.
Some of the code in these applications makes use of AVX instructions and necessitates newer versions of GCC to compile and run.
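Because AVX support depends on the CPU of the node a job lands on, it can be worth checking before running an AVX-enabled binary. The sketch below is one possible check, assuming a Linux x86 node where /proc/cpuinfo contains a "flags" line; the function names cpu_flags and has_avx are hypothetical helpers, not part of any of the tools listed here.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx(cpuinfo_text):
    """True if the exact flag 'avx' is listed (not just 'avx2' or similar)."""
    return "avx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as fh:
            print("AVX available:", has_avx(fh.read()))
    except OSError:
        # Assumption: the file is only present on Linux systems.
        print("/proc/cpuinfo could not be read on this system")
```

Only the first "flags" line is inspected, which assumes all cores on a node report the same feature set.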
Initially:
emacs (vim not acceptable as a text editor)
Emacs Speaks Statistics (ESS)
byobu (on login node)
pigz (parallel implementation of gzip)
Sun Java JDK 1.8 u151 (OpenJDK not usable for these applications)
GATK 3.8
GATK 4 Beta 5
Cromwell Workflow Management System release 29.
Picard tools 2.14
GCC 5.x, 6.x and 7.x (with access to libs and include for development) preferably installed as interchangeable modules.
GNU binutils 2.29.1
Burrows-Wheeler Aligner bwa-0.7.16a (compiled with latest GCC 7.x or 6.x)
Perl 5.24.3, with modules: Perl DBI, JSON, Set::IntervalTree, PerlIO::gzip, ensembl-xs (all of which must be current release versions), all compiled with GCC 6.x or higher.
R 3.4.2 (compiled with new GCC) (and available with and without OpenBLAS math library, preferably as modules)
samtools, htslib and bcftools 1.6 (all compiled with new GCC)
Ensembl Variant Effect Predictor (VEP) v90 or v91 (soon to be released) (note: the Perl module Bio::DB::HTS needs to be installed correctly; this is part of the VEP installation procedure and can be tricky)
STAR aligner 2.5
Manta Structural variant and indel caller
LUMPY structural variant caller (0.2.13) (requires cmake, python 2.7 + various other libs)
smbclient (from samba)
bedtools 2.26.0
gnuplot
More binaries will likely need compiling as the project develops, and these in turn may have other nested dependencies. Whilst I can compile them, various libraries may need to be added to the system at a later date so that they are in the relevant include and lib paths. These may also be needed at run time on worker nodes, as not all binaries will be statically linked.
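One way to spot shared libraries that are missing on a worker node is to inspect the output of ldd for a dynamically linked binary. The sketch below assumes a Linux system with ldd on the PATH and the usual "name => not found" line format; missing_libraries and check_binary are hypothetical helper names.

```python
import subprocess


def missing_libraries(ldd_output):
    """Return names of shared libraries that ldd reports as 'not found'."""
    missing = []
    for line in ldd_output.splitlines():
        name, sep, target = line.partition("=>")
        if sep and target.strip() == "not found":
            missing.append(name.strip())
    return missing


def check_binary(path):
    """Run ldd on a binary and list unresolved libraries (assumes ldd exists)."""
    result = subprocess.run(
        ["ldd", path], capture_output=True, text=True, check=False
    )
    return missing_libraries(result.stdout)
```

Running check_binary on the same binary from a login node and from a worker node would show whether a library is installed on one but not the other.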