====== ldsc ======

This software and user guide are still under development. It is not yet available on Comet.

> ldsc is a command line tool for estimating heritability and genetic correlation from GWAS summary statistics. ldsc also computes LD Scores.

The original version of **ldsc** was available from https://github.com/bulik/ldsc, but this __no longer works__ on any modern version of Python. **Do not** attempt to use the older package.

The updated version of **ldsc** for Python 3.9+ is based on https://github.com/CBIIT/ldsc, although that repository has not been updated very often either.

----

===== Running ldsc on Comet =====

The **ldsc** tool is closely tied to older versions of Python (at least 3.9, but less than 3.12) and also requires a number of Python modules (numpy, scipy, matplotlib, etc). Rather than ask all users to install these custom versions, we have installed ''ldsc.py'' on Comet as a tiny, custom **Apptainer** //container image//.

The **ldsc** container is stored in the ''/nobackup/shared/containers'' directory and is accessible to __all__ users of Comet. You do //not// need to take a copy of the container file; it should be left in its original location.

You can find the container files here:

  * ''/nobackup/shared/containers/ldsc.2026.03.sif''

We //normally// recommend using the latest version of the container. In the case of ldsc, the version numbers represent the date the container image was created (normally with the //current// version of the ldsc tool from GitHub at that time).

**Container Image Versions**

We //may// reference a specific container file, such as **ldsc.2026.03**, but you should always check whether this is the most recent version of the container available. Simply ''ls'' the ''/nobackup/shared/containers'' directory and you will be able to see if there are any newer versions listed.
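The version check described above can also be scripted. The following is a minimal sketch, assuming the images keep the ''ldsc.YYYY.MM.sif'' naming pattern shown on this page, so that sorting by name also sorts by date. The function name ''latest_ldsc_image'' is hypothetical and is not part of the Comet installation.

```shell
# Hypothetical helper: print the newest ldsc container image in a directory.
# Assumes images are named ldsc.YYYY.MM.sif, so name order equals date order.
latest_ldsc_image() {
    local dir="${1:-/nobackup/shared/containers}"
    local image latest=""
    for image in "$dir"/ldsc.*.sif; do
        # Skip the unexpanded pattern when no image matches.
        [ -e "$image" ] || continue
        # Globs expand in sorted order, so the last match is the newest.
        latest="$image"
    done
    # Print nothing (and return non-zero) when no image was found.
    [ -n "$latest" ] && printf '%s\n' "$latest"
}
```

Calling ''latest_ldsc_image'' with no argument looks in ''/nobackup/shared/containers''; passing a directory as the first argument looks there instead.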
We have provided a convenience script that automates **all** of the steps needed to run applications inside the container, including access to your ''$HOME'', ''/scratch'' and ''/nobackup'' directories, reducing them to just two simple commands.

  * ''/nobackup/shared/containers/ldsc.2026.03.sh''

There is a corresponding ''.sh'' script for //each version// of the container image we make available.

Just ''source'' this file and it will take care of loading ''apptainer'', setting up your ''bind'' directories and calling the ''exec'' command for you. It gives you a single command called ''container.run'' (instead of the really long //apptainer exec// command) to then //run// anything you want inside the container.

----

===== Simple use of ldsc =====

All of the **ldsc** commands are installed in the ''$PATH'' and can be called from inside the container without giving their full path or prefixing them with ''python''.

The following **ldsc** commands are available:

  * ''ldsc''
  * ''munge_sumstats''
  * ''make_annot''

As an example, to run ''ldsc'', simply call it with the ''container.run'' helper as follows:

  $ source /nobackup/shared/containers/ldsc.2026.03.sh
  $ container.run ldsc -h
  usage: ldsc.py [-h] [--out OUT] [--bfile BFILE] [--l2] [--extract EXTRACT]
                 [--keep KEEP] [--ld-wind-snps LD_WIND_SNPS]
                 [--ld-wind-kb LD_WIND_KB] [--ld-wind-cm LD_WIND_CM]
                 [--print-snps PRINT_SNPS] [--annot ANNOT] [--thin-annot]
                 [--cts-bin CTS_BIN] [--cts-breaks CTS_BREAKS]
                 [--cts-names CTS_NAMES] [--per-allele] [--pq-exp PQ_EXP]
                 [--no-print-annot] [--maf MAF] [--h2 H2] [--h2-cts H2_CTS]
                 [--rg RG] [--ref-ld REF_LD] [--ref-ld-chr REF_LD_CHR]
                 [--w-ld W_LD] [--w-ld-chr W_LD_CHR] [--overlap-annot]
                 [--print-coefficients] [--frqfile FRQFILE]
                 [--frqfile-chr FRQFILE_CHR] [--no-intercept]
                 [--intercept-h2 INTERCEPT_H2]
                 [--intercept-gencov INTERCEPT_GENCOV] [--M M]
                 [--two-step TWO_STEP] [--chisq-max CHISQ_MAX]
                 [--ref-ld-chr-cts REF_LD_CHR_CTS] [--print-all-cts]
                 [--print-cov] [--print-delete-vals] [--chunk-size CHUNK_SIZE]
                 [--pickle] [--yes-really] [--invert-anyway]
                 [--n-blocks N_BLOCKS] [--not-M-5-50] [--return-silly-things]
                 [--no-check-alleles] [--samp-prev SAMP_PREV]
                 [--pop-prev POP_PREV]
  options:
    -h, --help            show this help message and exit
  ...
  $

----

===== Data used by ldsc =====

The sample data files used in the [[https://github.com/CBIIT/ldsc|Basic Usage Example]] from the developers' GitHub page have already been downloaded and are available in the shared data area of the ''/nobackup'' filesystem on Comet:

  * ''/nobackup/shared/data/ldsc''
  * ''/nobackup/shared/data/ldsc/eas_ldscores/''
  * ''/nobackup/shared/data/ldsc/sumstats/BBJ_HDLC22.sumstats.gz''

If you have a data set used in **ldsc** that would be useful to others, please [[:contact:index|contact us]] and we will arrange to have it moved to the shared data area, where you can continue to access it, share it with others, and have it excluded from the [[:policies:data|Data retention]] policies.

----

===== Running the ldsc example =====

You can follow the [[https://github.com/CBIIT/ldsc|Basic Usage Example]] from the **ldsc** GitHub page, as follows:

<code>
$ source /nobackup/shared/containers/ldsc.2026.03.sh
$ container.run ldsc \
    --h2 /nobackup/shared/data/ldsc/sumstats/BBJ_HDLC22.sumstats.gz \
    --ref-ld-chr /nobackup/shared/data/ldsc/eas_ldscores/ \
    --w-ld-chr /nobackup/shared/data/ldsc/eas_ldscores/
</code>

You should see the following output:

<code>
*********************************************************************
* LD Score Regression (LDSC)
* Version 3.0.1
* (C) 2014-2019 Brendan Bulik-Sullivan and Hilary Finucane
* Broad Institute of MIT and Harvard / MIT Department of Mathematics
* GNU General Public License v3
*********************************************************************
Call:
./ldsc.py \
--h2 /nobackup/shared/data/ldsc/sumstats/BBJ_HDLC22.sumstats.gz \
--ref-ld-chr /nobackup/shared/data/ldsc/eas_ldscores/ \
--w-ld-chr /nobackup/shared/data/ldsc/eas_ldscores/

Beginning analysis at Wed Mar 25 12:00:59 2026
Reading summary statistics from /nobackup/shared/data/ldsc/sumstats/BBJ_HDLC22.sumstats.gz ...
RuntimeWarning: compression has no effect when passing a non-binary object as input for file /nobackup/shared/data/ldsc/sumstats/BBJ_HDLC22.sumstats.gz
Read summary statistics for 61663 SNPs.
Reading reference panel LD Score from /nobackup/shared/data/ldsc/eas_ldscores/[1-22] ... (ldscore_fromlist)
Read reference panel LD Scores for 1208050 SNPs.
Removing partitioned LD Scores with zero variance.
Reading regression weight LD Score from /nobackup/shared/data/ldsc/eas_ldscores/[1-22] ... (ldscore_fromlist)
Read regression weight LD Scores for 1208050 SNPs.
After merging with reference panel LD, 14193 SNPs remain.
After merging with regression SNP LD, 14193 SNPs remain.
WARNING: number of SNPs less than 200k; this is almost always bad.
Using two-step estimator with cutoff at 30.
Total Observed scale h2: 0.2747 (0.1014)
Lambda GC: 1.2103
Mean Chi^2: 1.2826
Intercept: 0.9211 (0.051)
Ratio < 0 (usually indicates GC correction).

Analysis finished at Wed Mar 25 12:01:01 2026
Total time elapsed: 2.12s
</code>

**Note the following:**

  * We used the ''/nobackup/shared/data/ldsc'' prefix for the //already downloaded and processed data sets// used in the example.
  * The directories given to the ''--ref-ld-chr'' and ''--w-ld-chr'' arguments **must** have the __trailing slash__ (''/'') added; **ldsc** has a bug that is triggered when that final slash is missing.
  * If an output location is not explicitly set (''--out''), **ldsc** writes its log/results file to your ''$HOME'' by default.

----

===== Accessing Data =====

As long as you use the ''container.run'' method to launch the applications, you will automatically be able to read and write to files in your ''$HOME'', ''/scratch'' and ''/nobackup'' directories. This means that you can refer to the downloaded data sets under ''/nobackup/shared/data/ldsc'' if necessary.
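Because those directories are writable from inside the container, results can be sent to a directory of your choice rather than the default location. The following is a minimal sketch that uses the ''--out'' option listed in the ''ldsc -h'' usage output, which takes an output file name prefix. The function name ''ldsc_h2'', the ''LDSC_RUN'' variable and the results directory are hypothetical. The sketch also assumes that ''container.run'' is a shell function or an executable rather than an alias.

```shell
# Hypothetical wrapper: run an h2 estimate and write the results under a chosen directory.
# LDSC_RUN defaults to the container.run helper; it is a variable only so that the
# function can be tried without the container (for example LDSC_RUN=echo).
ldsc_h2() {
    local sumstats="$1" out_dir="$2"
    # Shared LD score directory from this page; the trailing slash is required.
    local ld_dir="/nobackup/shared/data/ldsc/eas_ldscores/"
    local name
    name="$(basename "$sumstats" .sumstats.gz)"
    # Create the output directory under $HOME, /scratch or /nobackup.
    mkdir -p "$out_dir" || return 1
    # --out is a file name prefix, not a directory.
    ${LDSC_RUN:-container.run} ldsc \
        --h2 "$sumstats" \
        --ref-ld-chr "$ld_dir" \
        --w-ld-chr "$ld_dir" \
        --out "$out_dir/$name"
}
```

After sourcing the helper script for the container, a call such as ''ldsc_h2 /nobackup/shared/data/ldsc/sumstats/BBJ_HDLC22.sumstats.gz "$HOME/ldsc_results"'' would pass ''$HOME/ldsc_results/BBJ_HDLC22'' as the output prefix.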
If you run any of the applications inside the container manually, //without// using the ''container.run'' helper, you will need to use the ''--bind'' argument to ''apptainer'' to ensure that all relevant directories are exposed within the container.

Remember that the container filesystem itself cannot be changed, so you won't be able to write to or update ''/usr/local'', ''/opt'', ''/etc'' or any other internal folders. Keep output directories restricted to the three areas listed above.

----

===== Building ldsc for Comet =====

**Important!**

This section is only for RSE HPC admin staff, or users who wish to understand how the ldsc container was built. If you are only interested in //using// ldsc, stop reading now.

**Build script**

<code bash>
#!/bin/bash

# Date suffix (year.month) used in the container image file name
IMAGE_DATE=$(date +%Y.%m)

echo "Loading modules..."
module load apptainer

echo ""
echo "Building container..."
export APPTAINER_TMPDIR=/scratch

echo ""
echo "Container will have date suffix $IMAGE_DATE"
SOURCE_DIR=$(pwd)
apptainer build --bind "$SOURCE_DIR":/mnt "ldsc.$IMAGE_DATE.sif" ldsc.def 2>&1 | tee ldsc.log
</code>

**Container definition**

<code>
Bootstrap: docker
From: ubuntu:jammy

####################################################################
#
# ldsc Container
# ==================
# This is a runtime environment for ldsc: https://github.com/cbiit/ldsc
# Please see:
# https://hpc.researchcomputing.ncl.ac.uk/dokuwiki/dokuwiki/doku.php?id=advanced:software:ldsc
#
####################################################################

%post
    # Prevent interactive prompts
    export DEBIAN_FRONTEND=noninteractive

    ####################################################################
    #
    # Basic system packages
    #
    ####################################################################

    # Update & install only necessary packages
    apt-get update
    apt-get install -y aptitude wget unzip python3 python3-pip
    ln -sf /usr/bin/python3 /usr/bin/python

    # Clean up APT cache to save space
    apt-get clean

    # Any Python modules installed via pip go here
    # pip install NAME --break-system-packages

    # Remove any Python cache files after pip
    pip cache purge

    #################################################################################
    #
    # This is all the custom stuff needed to build the ldsc tool
    #
    #################################################################################

    # Src and opt
    mkdir -p /src/zipped
    mkdir -p /opt/bin
    mkdir -p /opt/data

    echo ""
    echo "INSTALL LDSC"
    echo "============"
    echo ""
    cd /src
    wget -q https://github.com/CBIIT/ldsc/archive/refs/heads/main.zip -O /src/zipped/ldsc.zip
    cd /opt
    unzip /src/zipped/ldsc.zip
    mv ldsc-main ldsc
    cd ldsc

    echo ""
    echo "INSTALL PYTHON MODULES"
    echo "======================"
    echo ""
    # Patch numpy version
    cp requirements.txt requirements.txt.old
    grep -v '^numpy' requirements.txt.old > requirements.txt
    echo "numpy==1.22.4" >> requirements.txt
    echo "matplotlib" >> requirements.txt
    # Install requirements
    pip install -r requirements.txt
    # Remove anything not needed to run
    rm -f dockerfile environment* setup.py requirements.txt

    echo ""
    echo "DOWNLOAD REFERENCE DATA"
    echo "======================="
    # Download reference data
    wget -q https://ldlink.nih.gov/LDlinkRestWeb/copy_and_download/BBJ_HDLC22.txt -O /src/zipped/BBJ_HDLC22.txt
    python munge_sumstats.py --sumstats /src/zipped/BBJ_HDLC22.txt --out /opt/data/BBJ_HDLC22
    rm -f /opt/data/BBJ_HDLC22.log
    wget -q "https://drive.usercontent.google.com/u/0/uc?id=1BtpWx02ON33KfjyCFSdmoWYlMZWImh2f&export=download" -O /src/zipped/eas_ldscores.tar.bz2
    cd /opt/data
    tar -jxf /src/zipped/eas_ldscores.tar.bz2

    # Remove all src packages
    echo ""
    echo "FINAL CLEAN UP"
    echo "=============="
    echo ""
    cd /
    rm -rf /src
    pip cache purge

%environment
    export PATH=/opt/ldsc:$PATH

%runscript
</code>

**Helper script**

----

[[:advanced:software|Back to software]]