Our Research Projects

Genome Analysis of Streptomyces

This is an active project that currently uses the HPC facilities at Newcastle University.

Project Contacts

For further information about this project, please contact:


Project Description

Comparative genomics of bacterial genomes sequenced on Illumina platforms applies computational tools and analytical methods to explore the genetic diversity, evolutionary relationships and functional differences between bacterial isolates. Starting from Illumina short-read sequencing data, workflows typically include quality control and trimming, genome assembly or reference-based mapping, annotation of coding and non-coding features, and identification of genomic variation such as SNPs, indels and gene-presence/absence patterns. Comparative genomics then integrates these outputs to investigate core and accessory genome structure, detect mobile genetic elements, quantify population structure, and reconstruct phylogenetic relationships. These analyses support a wide range of biological and applied questions, including outbreak investigation, antimicrobial-resistance surveillance, evolutionary studies, and the discovery of traits associated with virulence, host adaptation or environmental persistence.


Software or Compute Methods

Prokka is a rapid prokaryotic genome-annotation pipeline that identifies coding sequences, rRNAs and tRNAs and assigns functional annotations using curated databases. It is widely used to annotate bacterial and archaeal genomes and to generate GFF/GBK files for downstream analysis.
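
As a rough illustration, a Prokka run can be driven from Python by building the command line and handing it to subprocess. The sketch below uses only options that Prokka documents (--outdir, --prefix, --cpus, --genus). The function name, file names and CPU count are hypothetical, and the genus is simply taken from this project's title.

```python
import subprocess


def prokka_command(contigs, outdir, prefix, cpus=8, genus="Streptomyces"):
    """Build a Prokka command line as an argument list (nothing is executed here)."""
    return [
        "prokka",
        "--outdir", outdir,    # directory that receives the GFF/GBK outputs
        "--prefix", prefix,    # base name shared by all output files
        "--cpus", str(cpus),   # assumed core count; match the HPC job request
        "--genus", genus,      # genus name recorded in the output
        contigs,               # assembled contigs in FASTA format
    ]


# Hypothetical file names, for illustration only.
cmd = prokka_command("isolate1_contigs.fasta", "annotation/isolate1", "isolate1")
print(" ".join(cmd))

# On a system where Prokka is installed, the command could be run with:
# subprocess.run(cmd, check=True)
```

Keeping the command as a list rather than a single string avoids shell-quoting problems when file names contain spaces.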



Roary is a high-speed pangenome analysis tool that takes annotated genomes (typically Prokka GFF outputs) and clusters their genes into core, accessory and unique groups, producing core-genome alignments for comparative genomics and phylogenetics.
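
The core/accessory/unique split can be illustrated with a small toy example. This is not Roary's implementation, only a sketch of the classification idea: each gene cluster is labelled by how many isolates carry it. The 99% core threshold mirrors the documented default of Roary's -cd option; the function name, isolate names and cluster names are hypothetical.

```python
def classify_clusters(presence, n_isolates, core_fraction=0.99):
    """Label each gene cluster as core, accessory or unique.

    presence maps a cluster name to the set of isolates that carry it.
    """
    labels = {}
    for cluster, isolates in presence.items():
        count = len(isolates)
        if count >= core_fraction * n_isolates:
            labels[cluster] = "core"        # present in (nearly) every isolate
        elif count == 1:
            labels[cluster] = "unique"      # present in a single isolate
        else:
            labels[cluster] = "accessory"   # present in some isolates only
    return labels


# Toy data: three hypothetical isolates and four hypothetical clusters.
presence = {
    "dnaA": {"iso1", "iso2", "iso3"},
    "rpoB": {"iso1", "iso2", "iso3"},
    "group_101": {"iso1", "iso2"},
    "group_202": {"iso3"},
}
labels = classify_clusters(presence, n_isolates=3)
print(labels)
```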



RAxML (Randomized Axelerated Maximum Likelihood) is a computational tool for constructing phylogenetic trees using maximum likelihood. It provides a range of nucleotide and amino-acid substitution models and efficient bootstrapping, and is well suited for large alignments requiring high accuracy.
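
A typical invocation might look like the sketch below, which assumes the classic RAxML version 8 command-line interface (not RAxML-NG) and the multithreaded raxmlHPC-PTHREADS binary. It combines a rapid bootstrap analysis with a search for the best-scoring tree (-f a) under the GTRGAMMA model. The function name, file names, thread count, seed and number of bootstrap replicates are illustrative assumptions.

```python
def raxml_command(alignment, run_name, threads=8, bootstraps=100, seed=12345):
    """Build a RAxML v8 command for rapid bootstrapping plus a best-tree search."""
    return [
        "raxmlHPC-PTHREADS",
        "-T", str(threads),      # number of threads
        "-f", "a",               # rapid bootstrap and best-scoring ML tree in one run
        "-m", "GTRGAMMA",        # nucleotide substitution model
        "-p", str(seed),         # random seed for the parsimony starting trees
        "-x", str(seed),         # random seed for rapid bootstrapping
        "-#", str(bootstraps),   # number of bootstrap replicates
        "-s", alignment,         # input alignment
        "-n", run_name,          # suffix used for the output file names
    ]


# Hypothetical alignment and run name.
raxml_cmd = raxml_command("core_alignment.phy", "core_tree")
print(" ".join(raxml_cmd))
```

Fixing the two seeds makes a run reproducible, which is helpful when a job has to be resubmitted to the cluster.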



ARIBA detects antimicrobial-resistance genes and other genetic targets directly from sequencing reads. It maps reads to curated reference databases and performs local assemblies where needed, enabling rapid AMR profiling, virulence detection and sequence typing.
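
The usual ARIBA sequence of download, prepare and run can be sketched as three commands. The subcommands (getref, prepareref, run) and the card reference name follow the ARIBA documentation; the function name, read file names and output prefix are hypothetical.

```python
def ariba_commands(database, reads_1, reads_2, out_prefix):
    """Return the three ARIBA steps as argument lists, in the order they are run."""
    ref_prefix = f"{out_prefix}.{database}"
    prepared = f"{ref_prefix}.prepareref"
    return [
        # 1. Download the reference database (writes <ref_prefix>.fa and .tsv).
        ["ariba", "getref", database, ref_prefix],
        # 2. Cluster and index the references so that reads can be mapped to them.
        ["ariba", "prepareref",
         "-f", f"{ref_prefix}.fa", "-m", f"{ref_prefix}.tsv", prepared],
        # 3. Map the paired reads and run local assemblies for this isolate.
        ["ariba", "run", prepared, reads_1, reads_2, f"{out_prefix}.run"],
    ]


# Hypothetical read files for a single isolate.
steps = ariba_commands(
    "card", "isolate1_R1.fastq.gz", "isolate1_R2.fastq.gz", "isolate1_ariba"
)
for step in steps:
    print(" ".join(step))
```

In practice the first two steps only need to run once per database; only the final run step is repeated for each isolate.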



FastTree constructs approximately maximum-likelihood phylogenetic trees for very large alignments. It trades a small amount of accuracy for substantial speed improvements, making it useful when analysing many genomes or large pangenome alignments.
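
FastTree prints its tree to standard output, so a wrapper needs to capture that stream. A minimal sketch for a nucleotide alignment under the GTR model follows; the executable name FastTree may differ between installations, and the function names and file paths are hypothetical.

```python
import subprocess


def fasttree_command(alignment):
    """Build a FastTree command for a nucleotide alignment under the GTR model."""
    return ["FastTree", "-gtr", "-nt", alignment]


def run_fasttree(alignment, tree_path):
    """Run FastTree and save the Newick tree that it prints to standard output."""
    with open(tree_path, "w") as handle:
        subprocess.run(fasttree_command(alignment), stdout=handle, check=True)


# Hypothetical alignment file; only the command is built here.
fasttree_cmd = fasttree_command("core_gene_alignment.aln")
print(" ".join(fasttree_cmd))

# On a system where FastTree is installed:
# run_fasttree("core_gene_alignment.aln", "core_tree.nwk")
```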



SPAdes is a genome assembler developed primarily for microbial genomes. It uses multiple k-mer sizes and advanced heuristics to generate high-quality assemblies from short-read data or hybrid read sets that include long reads.
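
A paired-end assembly could be launched roughly as follows. By default SPAdes chooses its k-mer sizes automatically; the optional -k list shown here must contain odd values below 128, as the SPAdes manual requires. The function name, file names, thread count and the example k values are illustrative assumptions.

```python
def spades_command(reads_1, reads_2, outdir, threads=8, kmers=None):
    """Build a SPAdes command for one paired-end Illumina library."""
    cmd = [
        "spades.py",
        "-1", reads_1,        # forward reads
        "-2", reads_2,        # reverse reads
        "-o", outdir,         # output directory
        "-t", str(threads),   # number of threads
    ]
    if kmers:
        # SPAdes only accepts odd k-mer sizes smaller than 128.
        if any(k % 2 == 0 or k >= 128 for k in kmers):
            raise ValueError("SPAdes k-mer sizes must be odd and below 128")
        cmd += ["-k", ",".join(str(k) for k in kmers)]
    return cmd


# Hypothetical read files and k-mer sizes.
spades_cmd = spades_command(
    "isolate1_R1.fastq.gz", "isolate1_R2.fastq.gz",
    "assembly/isolate1", kmers=[21, 33, 55],
)
print(" ".join(spades_cmd))
```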



Snippy is a bacterial variant-calling pipeline that maps sequencing reads to a reference genome, calls high-quality SNPs and small indels, and produces a core-SNP alignment suitable for phylogenetic and outbreak analysis.
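
One common use of a core-SNP alignment in outbreak analysis is a table of pairwise SNP distances. The toy sketch below counts differing positions between aligned sequences and skips ambiguous or gap characters. It is not part of Snippy, and the function name and sequences are hypothetical; a real alignment would first be read from the FASTA file that Snippy writes.

```python
from itertools import combinations


def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count positions where two aligned sequences differ, skipping N and gaps."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a != b and a not in ignore and b not in ignore
    )


# Toy alignment with hypothetical isolates.
alignment = {
    "isolate1": "ACGTACGT",
    "isolate2": "ACGTACGA",
    "isolate3": "ACCTANGA",
}
distances = {
    (x, y): snp_distance(alignment[x], alignment[y])
    for x, y in combinations(alignment, 2)
}
for pair, dist in distances.items():
    print(pair, dist)
```

Isolates separated by only a few SNPs are candidates for belonging to the same transmission cluster, although any threshold depends on the organism and the sampling timeframe.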