====== ldsc ======
This software and its user guide are still under development. It is not yet available on Comet.
> ldsc is a command line tool for estimating heritability and genetic correlation from GWAS summary statistics. ldsc also computes LD Scores.
The original version of **ldsc** was available from https://github.com/bulik/ldsc, but it __no longer works__ on any modern version of Python. **Do not** attempt to use the older package.
The updated version of **ldsc** for Python 3.9+ is based on https://github.com/CBIIT/ldsc, although this too has not been updated very often.
----
===== Running ldsc on Comet =====
The **ldsc** tool is closely tied to older versions of Python (at least 3.9, but less than 3.12) and also requires a number of Python modules (numpy, scipy, matplotlib, etc). Rather than ask all users to install these custom versions, we have installed ''ldsc.py'' on Comet as a tiny, custom **Apptainer** //container image//.
The **ldsc** container is stored in the ''/nobackup/shared/containers'' directory and is accessible to __all__ users of Comet. You do //not// need to take a copy of the container file; it should be left in its original location.
You can find the container files here:
* ''/nobackup/shared/containers/ldsc.2026.03.sif''
We //normally// recommend using the latest version of the container. In the case of ldsc, the version numbers represent the date the container image was created (normally with the //current// version of the ldsc tool from GitHub at that time).
**Container Image Versions**
We //may// reference a specific container file, such as **ldsc.2026.03**, but you should always check whether this is the most recent version of the container available. Simply ''ls'' the ''/nobackup/shared/containers'' directory and you will be able to see if there are any newer versions listed.
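As a minimal sketch of that check, the function below prints the newest image in a directory. The name ''latest_ldsc_image'' is hypothetical, and it assumes images keep the ''ldsc.YYYY.MM.sif'' naming shown above, so that a plain lexical sort places the most recent date last.

```shell
# Print the newest ldsc container image found in a directory.
# Assumption: images are named ldsc.YYYY.MM.sif, so sorting the names
# lexically also sorts them by date.
# $1 = directory to search (defaults to the shared containers directory)
latest_ldsc_image() {
    ls "${1:-/nobackup/shared/containers}"/ldsc.*.sif 2>/dev/null | sort | tail -n 1
}
```

Calling ''latest_ldsc_image'' with no arguments searches ''/nobackup/shared/containers''. The matching helper script should have the same name with ''.sh'' in place of ''.sif''.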
We have provided a convenience script that reduces **all** of the steps needed to run applications inside the container, as well as to access your ''$HOME'', ''/scratch'' and ''/nobackup'' directories, to just two simple commands.
* ''/nobackup/shared/containers/ldsc.2026.03.sh''
There is a corresponding ''.sh'' script for //each version// of the container image we make available.
Just ''source'' this file and it will take care of loading ''apptainer'', setting up your ''bind'' directories and calling the ''exec'' command for you - and give you a single command called ''container.run'' (instead of the really long //apptainer exec// command) to then //run// anything you want inside the container.
----
===== Simple use of ldsc =====
All of the **ldsc** commands are installed in the ''$PATH'' and can be called from inside the container without giving their full path or prefixing them with ''python''.
The following **ldsc** commands are available:
* ''ldsc''
* ''munge_sumstats''
* ''make_annot''
As an example, to run ''ldsc'', simply call it with the ''container.run'' helper as follows:
$ source /nobackup/shared/containers/ldsc.2026.03.sh
$ container.run ldsc -h
usage: ldsc.py [-h] [--out OUT] [--bfile BFILE] [--l2] [--extract EXTRACT] [--keep KEEP] [--ld-wind-snps LD_WIND_SNPS] [--ld-wind-kb LD_WIND_KB]
[--ld-wind-cm LD_WIND_CM] [--print-snps PRINT_SNPS] [--annot ANNOT] [--thin-annot] [--cts-bin CTS_BIN] [--cts-breaks CTS_BREAKS]
[--cts-names CTS_NAMES] [--per-allele] [--pq-exp PQ_EXP] [--no-print-annot] [--maf MAF] [--h2 H2] [--h2-cts H2_CTS] [--rg RG]
[--ref-ld REF_LD] [--ref-ld-chr REF_LD_CHR] [--w-ld W_LD] [--w-ld-chr W_LD_CHR] [--overlap-annot] [--print-coefficients]
[--frqfile FRQFILE] [--frqfile-chr FRQFILE_CHR] [--no-intercept] [--intercept-h2 INTERCEPT_H2] [--intercept-gencov INTERCEPT_GENCOV]
[--M M] [--two-step TWO_STEP] [--chisq-max CHISQ_MAX] [--ref-ld-chr-cts REF_LD_CHR_CTS] [--print-all-cts] [--print-cov]
[--print-delete-vals] [--chunk-size CHUNK_SIZE] [--pickle] [--yes-really] [--invert-anyway] [--n-blocks N_BLOCKS] [--not-M-5-50]
[--return-silly-things] [--no-check-alleles] [--samp-prev SAMP_PREV] [--pop-prev POP_PREV]
options:
-h, --help show this help message and exit
...
$
----
===== Data used by ldsc =====
The sample data files used in the [[https://github.com/CBIIT/ldsc|Basic Usage Example]] from the developers' GitHub page have already been downloaded and are available in the shared data area of the ''/nobackup'' filesystem on Comet:
* ''/nobackup/shared/data/ldsc''
* ''/nobackup/shared/data/ldsc/eas_ldscores/''
* ''/nobackup/shared/data/ldsc/sumstats/BBJ_HDLC22.sumstats.gz''
If you have a data set used in **ldsc** that would be useful to others, please [[:contact:index|Contact us]] and we will arrange to have it moved to the shared data area, where you can continue to access it, share it with others, and have it excluded from the [[:policies:data|Data retention]] policies.
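Before pointing **ldsc** at an LD Score directory, it can be worth confirming that all 22 per-chromosome files are present. The sketch below is illustrative only: the function name ''missing_ldscore_chrs'' is hypothetical, and the ''<chr>.l2.ldscore.gz'' file naming is an assumption that you should check against the actual contents of the directory.

```shell
# Print the chromosome numbers (1-22) that have no LD Score file in a directory.
# Assumption: files are named <chr>.l2.ldscore.gz, e.g. 1.l2.ldscore.gz.
# $1 = LD Score directory, with or without a trailing slash
missing_ldscore_chrs() {
    chr=1
    while [ "$chr" -le 22 ]; do
        # ${1%/} drops one trailing slash so both forms of the path work
        [ -e "${1%/}/$chr.l2.ldscore.gz" ] || printf '%s\n' "$chr"
        chr=$((chr + 1))
    done
}
```

For example, ''missing_ldscore_chrs /nobackup/shared/data/ldsc/eas_ldscores/'' should print nothing if the directory is complete under that naming assumption.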
----
===== Running the ldsc example =====
You can follow the [[https://github.com/CBIIT/ldsc|Basic Usage Example]] from the **ldsc** Github page, as follows:
$ source /nobackup/shared/containers/ldsc.2026.03.sh
$ container.run ldsc \
--h2 /nobackup/shared/data/ldsc/sumstats/BBJ_HDLC22.sumstats.gz \
--ref-ld-chr /nobackup/shared/data/ldsc/eas_ldscores/ \
--w-ld-chr /nobackup/shared/data/ldsc/eas_ldscores/
You should see the following output:
*********************************************************************
* LD Score Regression (LDSC)
* Version 3.0.1
* (C) 2014-2019 Brendan Bulik-Sullivan and Hilary Finucane
* Broad Institute of MIT and Harvard / MIT Department of Mathematics
* GNU General Public License v3
*********************************************************************
Call:
./ldsc.py \
--h2 /nobackup/shared/data/ldsc/sumstats/BBJ_HDLC22.sumstats.gz \
--ref-ld-chr /nobackup/shared/data/ldsc/eas_ldscores/ \
--w-ld-chr /nobackup/shared/data/ldsc/eas_ldscores/
Beginning analysis at Wed Mar 25 12:00:59 2026
Reading summary statistics from /nobackup/shared/data/ldsc/sumstats/BBJ_HDLC22.sumstats.gz ...
RuntimeWarning: compression has no effect when passing a non-binary object as input for file /nobackup/shared/data/ldsc/sumstats/BBJ_HDLC22.sumstats.gz
Read summary statistics for 61663 SNPs.
Reading reference panel LD Score from /nobackup/shared/data/ldsc/eas_ldscores/[1-22] ... (ldscore_fromlist)
Read reference panel LD Scores for 1208050 SNPs.
Removing partitioned LD Scores with zero variance.
Reading regression weight LD Score from /nobackup/shared/data/ldsc/eas_ldscores/[1-22] ... (ldscore_fromlist)
Read regression weight LD Scores for 1208050 SNPs.
After merging with reference panel LD, 14193 SNPs remain.
After merging with regression SNP LD, 14193 SNPs remain.
WARNING: number of SNPs less than 200k; this is almost always bad.
Using two-step estimator with cutoff at 30.
Total Observed scale h2: 0.2747 (0.1014)
Lambda GC: 1.2103
Mean Chi^2: 1.2826
Intercept: 0.9211 (0.051)
Ratio < 0 (usually indicates GC correction).
Analysis finished at Wed Mar 25 12:01:01 2026
Total time elapsed: 2.12s
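If you need the heritability estimate in a script, it can be pulled out of a saved log file. This is a minimal sketch, assuming the log contains a ''Total Observed scale h2:'' line in the same form as the output above. The function name ''ldsc_h2'' is hypothetical.

```shell
# Print "<h2> <standard error>" from an ldsc log file.
# Assumption: the log contains a line of the form
#   Total Observed scale h2: 0.2747 (0.1014)
# $1 = path to the log file
ldsc_h2() {
    awk -F': ' '/^[ \t]*Total Observed scale h2:/ {
        split($2, parts, " ")   # parts[1] = estimate, parts[2] = "(se)"
        se = parts[2]
        gsub(/[()]/, "", se)    # strip the surrounding parentheses
        print parts[1], se
    }' "$1"
}
```

Pass the path of the log file written by your run as the only argument.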
**Note the following:**
* We used the ''/nobackup/shared/data/ldsc'' prefix for the //already downloaded and processed data sets// used in the example.
  * The directories given to the ''--ref-ld-chr'' and ''--w-ld-chr'' arguments **must** end with a __trailing slash__ (''/''); a bug in **ldsc** is triggered if that final slash is missing.
* If not explicitly set, the default will be to output the **ldsc** log/results file to your ''$HOME''.
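The last two points can be handled in a small wrapper. The sketch below is one possible approach, not part of the installed helper script: ''with_slash'' and ''run_h2'' are hypothetical names, and ''run_h2'' assumes the ''.sh'' helper has already been sourced so that ''container.run'' exists. The ''--out'' option is the one listed in the ''ldsc -h'' usage text.

```shell
# Append a trailing slash to a directory path unless it already has one.
with_slash() {
    case "$1" in
        */) printf '%s\n' "$1" ;;
        *)  printf '%s/\n' "$1" ;;
    esac
}

# Hypothetical wrapper around the h2 example.
# Assumption: the ldsc .sh helper has been sourced, so container.run is defined.
# $1 = sumstats file, $2 = LD Score directory, $3 = output prefix for --out
run_h2() {
    container.run ldsc \
        --h2 "$1" \
        --ref-ld-chr "$(with_slash "$2")" \
        --w-ld-chr "$(with_slash "$2")" \
        --out "$3"
}
```

Passing an explicit output prefix as the third argument keeps the log and results in a location of your choosing rather than the default.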
-----
===== Accessing Data =====
As long as you use the ''container.run'' method to launch the applications, you will automatically be able to read and write to files in your ''$HOME'', ''/scratch'' and ''/nobackup'' directories. This means that you can refer to the downloaded data sets under ''/nobackup/shared/data/ldsc'' if necessary.
If you run any of the applications inside the container manually, //without// using the ''container.run'' helper, you will need to use the ''--bind'' argument to ''apptainer'' to ensure that all relevant directories are exposed within the container.
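A manual invocation might look like the following sketch. It is not the content of the provided helper script: the function name ''ldsc_exec'' and the ''APPTAINER'' override variable are hypothetical, and it assumes ''apptainer'' is already on your ''$PATH'' (for example after ''module load apptainer'').

```shell
# Run a command inside a container image without the container.run helper,
# binding the three user-writable areas explicitly.
# Set APPTAINER=echo to print the arguments instead of running them (dry run).
# $1 = path to the .sif image, remaining arguments = command to run inside it
ldsc_exec() {
    sif="$1"
    shift
    "${APPTAINER:-apptainer}" exec \
        --bind "$HOME" --bind /scratch --bind /nobackup \
        "$sif" "$@"
}
```

For example, ''ldsc_exec /nobackup/shared/containers/ldsc.2026.03.sif ldsc -h'' would be the manual counterpart of ''container.run ldsc -h''.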
Do remember that the container filesystem itself cannot be changed, so you won't be able to write to or update ''/usr/local'', ''/opt'', ''/etc'' or any other internal folders. Keep output directories restricted to the three areas listed above.
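A job script can guard against this before starting a long run. The check below is a minimal sketch using a hypothetical function name, ''in_writable_area''. It only looks at the path prefix of an absolute path and does not test permissions.

```shell
# Succeed if an absolute path lies under $HOME, /scratch or /nobackup.
# This is a prefix check only; it does not test that the path is writable.
# $1 = absolute path of the intended output file or directory
in_writable_area() {
    case "$1" in
        "$HOME"/*|/scratch/*|/nobackup/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

It can be used as ''in_writable_area "$OUT" || echo "choose another output location"'', where ''$OUT'' is whatever output prefix you intend to pass to **ldsc**.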
----
===== Building ldsc for Comet =====
**Important!**
This section is only for RSE HPC admin staff, or users who wish to understand how the ldsc container was built. If you are only interested in //using// ldsc, stop reading now.
**Build script**
#!/bin/bash
IMAGE_DATE=`date +%Y.%m`
echo "Loading modules..."
module load apptainer
echo ""
echo "Building container..."
export APPTAINER_TMPDIR=/scratch
echo ""
echo "Container will have date suffix $IMAGE_DATE"
SOURCE_DIR=`pwd`
apptainer build --bind $SOURCE_DIR:/mnt ldsc.$IMAGE_DATE.sif ldsc.def 2>&1 | tee ldsc.log
**Container definition**
Bootstrap: docker
From: ubuntu:jammy
####################################################################
#
# ldsc Container
# ==================
# This is a runtime environment for ldsc: https://github.com/cbiit/ldsc
# Please see:
# https://hpc.researchcomputing.ncl.ac.uk/dokuwiki/dokuwiki/doku.php?id=advanced:software:ldsc
#
####################################################################
%post
# Prevent interactive prompts
export DEBIAN_FRONTEND=noninteractive
####################################################################
#
# Basic system packages
#
####################################################################
# Update & install only necessary packages
apt-get update
apt-get install -y aptitude wget unzip python3 python3-pip
ln -sf /usr/bin/python3 /usr/bin/python
# Clean up APT cache to save space
apt-get clean
# Any Python modules installed via pip go here
# pip install NAME --break-system-packages
# Remove any Python cache files after pip
pip cache purge
#################################################################################
#
# This is all the custom stuff needed to build the various ldsc tools
#
#################################################################################
# Src and opt
mkdir -p /src/zipped
mkdir -p /opt/bin
mkdir -p /opt/data
echo ""
echo "INSTALL LDSC"
echo "============"
echo ""
cd /src
wget -q https://github.com/CBIIT/ldsc/archive/refs/heads/main.zip -O /src/zipped/ldsc.zip
cd /opt
unzip /src/zipped/ldsc.zip
mv ldsc-main ldsc
cd ldsc
echo ""
echo "INSTALL PYTHON MODULES"
echo "======================"
echo ""
# Patch numpy version
cp requirements.txt requirements.txt.old
cat requirements.txt.old | grep -v ^numpy > requirements.txt
echo "numpy==1.22.4" >> requirements.txt
echo "matplotlib" >> requirements.txt
# Install requirements
pip install -r requirements.txt
# Remove anything not needed to run
rm -f dockerfile environment* setup.py requirements.txt
echo ""
echo "DOWNLOAD REFERENCE DATA"
echo "======================="
# Download reference data
wget -q https://ldlink.nih.gov/LDlinkRestWeb/copy_and_download/BBJ_HDLC22.txt -O /src/zipped/BBJ_HDLC22.txt
python munge_sumstats.py --sumstats /src/zipped/BBJ_HDLC22.txt --out /opt/data/BBJ_HDLC22
rm -f /opt/data/BBJ_HDLC22.log
wget -q "https://drive.usercontent.google.com/u/0/uc?id=1BtpWx02ON33KfjyCFSdmoWYlMZWImh2f&export=download" -O /src/zipped/eas_ldscores.tar.bz2
cd /opt/data
tar -jxf /src/zipped/eas_ldscores.tar.bz2
# Remove all src packages
echo ""
echo "FINAL CLEAN UP"
echo "=============="
echo ""
cd /
rm -rf /src
pip cache purge
%environment
export PATH=/opt/ldsc:$PATH
%runscript
**Helper script**
----
[[:advanced:software|Back to software]]