This is an active project making use of HPC facilities at Newcastle University.
For further information about this project, please contact:
This project uses HPC resources for large-scale processing and analysis of text corpora for research in accounting and finance. The corpus contains tens of thousands of transcripts spanning multiple years, and the study requires sentence-level processing, yielding several million sentence records. The work involves building and repeatedly re-running an NLP pipeline (parsing/cleaning, sentence segmentation, tokenisation, and feature extraction) as preprocessing settings and model specifications are iteratively refined. HPC is needed because end-to-end reprocessing at this scale is computationally and memory intensive: running the full workflow on a personal computer is prohibitively slow and constrains other research tasks. Access to HPC will enable faster, more reliable processing and iteration, supporting reproducible text-based measures for downstream empirical analyses and research outputs.
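As an illustration of the sentence-level step, the sketch below segments transcripts into sentence records with spaCy. The directory layout, model choice, and record fields are assumptions made for the example, not the project's actual configuration.

```python
# Minimal sketch: turn a directory of plain-text transcripts into
# sentence-level records. Paths and fields are illustrative.
from pathlib import Path

import spacy

# Small English pipeline; the project may use a different model.
nlp = spacy.load("en_core_web_sm")

def sentence_records(transcript_dir):
    """Yield one record per sentence across all transcripts."""
    for path in sorted(Path(transcript_dir).glob("*.txt")):
        doc = nlp(path.read_text(encoding="utf-8"))
        for i, sent in enumerate(doc.sents):
            yield {
                "transcript": path.stem,   # hypothetical file naming
                "sentence_id": i,
                "text": sent.text.strip(),
                "tokens": [tok.text for tok in sent],
            }

if __name__ == "__main__":
    for record in sentence_records("transcripts/"):
        print(record["transcript"], record["sentence_id"], record["text"][:60])
```

At corpus scale, the per-file loop would typically be replaced with spaCy's batched `nlp.pipe(..., n_process=...)` so that segmentation parallelises across the CPU cores of a single node.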
This Python-based workflow uses Stanford CoreNLP and spaCy/NLTK for large-scale text processing and word embedding extraction. GPUs and high-memory CPU nodes are required for Transformer-based inference and for memory-intensive parsing of millions of records. HPC enables these compute-heavy tasks to be parallelised, supporting efficient and reproducible model iteration at scale.
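For the GPU-bound embedding extraction, a hedged sketch follows using the Hugging Face transformers library, which the description does not name explicitly; the model, pooling strategy, and batch size are assumptions for illustration.

```python
# Sketch of batched sentence-embedding extraction on a GPU node.
# Model name, mean pooling, and batch size are illustrative choices.
import torch
from transformers import AutoModel, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").to(device).eval()

@torch.no_grad()
def embed(sentences, batch_size=64):
    """Return one mean-pooled vector per input sentence."""
    chunks = []
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        enc = tokenizer(batch, padding=True, truncation=True,
                        return_tensors="pt").to(device)
        hidden = model(**enc).last_hidden_state        # (batch, tokens, dim)
        mask = enc["attention_mask"].unsqueeze(-1)     # ignore padding tokens
        pooled = (hidden * mask).sum(1) / mask.sum(1)  # mean over real tokens
        chunks.append(pooled.cpu())
    return torch.cat(chunks)

vectors = embed(["Revenue grew this quarter.", "Margins were flat."])
print(vectors.shape)  # (2, 768) for bert-base-uncased
```

Batching keeps GPU utilisation high while bounding memory, which matters when several million sentence records are re-embedded on each iteration of the pipeline.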