
CNVKit

A command-line toolkit and Python library for detecting copy number variants and alterations genome-wide from high-throughput sequencing.

CNVKit on Comet

We have installed CNVKit using the published Docker container: https://hub.docker.com/r/etal/cnvkit/. CNVKit is not installed as a module. The Docker container has been converted to the easier-to-use Apptainer format for local use on Comet. Please read the Apptainer guide for more general information on how this works; this document is not intended to be a comprehensive guide to Apptainer tools.

The CNVKit container is stored in the /nobackup/shared/containers directory and is accessible to all users of Comet. You do not need to take a copy of the container file; it should be left in its original location.

You can find the container file here: /nobackup/shared/containers/cnvkit.0.9.13.sif

Container Image Versions

We may reference a specific container file, such as cnvkit.0.9.13.sif, but you should always check whether this is the most recent version of the container available. Simply ls the /nobackup/shared/containers directory and you will be able to see if there are any newer versions listed.
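The version check described above can be sketched as a small shell function. This is only an illustration: latest_cnvkit_image is a hypothetical helper name (it is not part of the Comet tooling), and it assumes the image files keep the cnvkit.<version>.sif naming pattern and that GNU sort is available.

```shell
# Print the newest CNVKit image found in a container directory.
# latest_cnvkit_image is a hypothetical helper name, not something installed on Comet.
latest_cnvkit_image() {
    local dir="${1:-/nobackup/shared/containers}"
    # sort -V orders version numbers numerically, so 0.9.9 sorts before 0.9.13
    ls -1 "$dir"/cnvkit.*.sif 2>/dev/null | sort -V | tail -n 1
}
```

Calling latest_cnvkit_image with no arguments looks in /nobackup/shared/containers; it prints nothing if no matching image is found.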


Running CNVKit on Comet

We have provided a convenience script that reduces the steps needed to launch CNVKit inside the container to just two simple commands.

There is a corresponding .sh script for each version of the container image we make available.

Source this file and it will load apptainer, set up your bind directories (allowing the container to read and write to your $HOME, /scratch and /nobackup directories) and define a single command called container.run. This replaces a long apptainer exec command and lets you run anything you want inside the container.

For example, to see the help page for the cnvkit.py batch command, run:

$ source /nobackup/shared/containers/cnvkit.0.9.13.sh
$ container.run cnvkit.py batch -h

You should see the options list for the batch command:

$ source /nobackup/shared/containers/cnvkit.0.9.13.sh
$ container.run cnvkit.py batch -h
usage: cnvkit.py batch [-h] [-m {hybrid,amplicon,wgs}]
                       [--segment-method {cbs,flasso,haar,none,hmm,hmm-tumor,hmm-germline}] [-y] [-c]
                       [--drop-low-coverage] [-p [PROCESSES]] [-q MIN_MAPQ] [--rscript-path PATH]
                       [--diploid-parx-genome DIPLOID_PARX_GENOME] [-n [FILES ...]] [-f FILENAME]
                       [-t FILENAME] [-a FILENAME] [--annotate FILENAME] [--short-names]
                       [--target-avg-size TARGET_AVG_SIZE] [-g FILENAME]
                       [--antitarget-avg-size ANTITARGET_AVG_SIZE]
                       [--antitarget-min-size ANTITARGET_MIN_SIZE] [--output-reference FILENAME]
                       [--cluster] [-r REFERENCE] [-d DIRECTORY] [--scatter] [--diagram]
                       [sample_fnames ...]

positional arguments:
  sample_fnames         Mapped sequence reads (.bam) or pre-computed per-base depth (bedGraph .bed.gz with
                        tabix index .tbi or .csi)

options:
  -h, --help            show this help message and exit
...
...
$

You can continue to use the container.run command as many times as you need in the same script or same bash session:

$ source /nobackup/shared/containers/cnvkit.0.9.13.sh
$ container.run cnvkit.py batch
$ container.run guess_baits.py
$ container.run cnv_annotate.py

We strongly recommend that you use this helper script and the container.run command to run software from inside the container as it will always ensure that you have correctly set up the bind directories and you are using the correct container version.

Using this command you can embed calls to CNVKit in your Slurm sbatch jobs and existing compute pipelines and use it as if it were a locally installed application.
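As a rough sketch of what such a Slurm job might look like, the snippet below writes a minimal job script to cnvkit_job.sh. The resource requests, the DATA_DIR location and the file names (tumour.bam, reference.cnn) are placeholders chosen for illustration, not recommended values; the batch options used (-r, -d, -p) are those shown in the help output above.

```shell
# Write a minimal Slurm job script; resource values and paths are placeholders.
cat > cnvkit_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=cnvkit-batch
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=02:00:00

# Load apptainer and define the container.run command
source /nobackup/shared/containers/cnvkit.0.9.13.sh

# Hypothetical project directory; replace with your own location
DATA_DIR=/nobackup/myproject

# Run the batch pipeline on one sample against an existing reference
container.run cnvkit.py batch "${DATA_DIR}/tumour.bam" \
    -r "${DATA_DIR}/reference.cnn" \
    -d "${DATA_DIR}/results" \
    -p "${SLURM_CPUS_PER_TASK}"
EOF
```

The job would then be submitted with sbatch cnvkit_job.sh. Keeping the output directory under /nobackup means the container can write to it.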


Accessing Data

As long as you use the container.run method to launch CNVKit applications, you will automatically be able to read and write files in your $HOME, /scratch and /nobackup directories.

If you run CNVKit manually, without using the container.run helper, you will need to use the --bind argument to apptainer to ensure that all relevant directories are exposed within the container.
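A manual invocation might look like the sketch below, which mirrors the bind options used by the helper script. The wrapper name cnvkit_manual is hypothetical and exists only to keep the long command readable. It also assumes that Apptainer makes your $HOME available by default, which is the usual behaviour but depends on site configuration.

```shell
# Image path as used elsewhere in this guide; check for newer versions first.
IMAGE=/nobackup/shared/containers/cnvkit.0.9.13.sif

# cnvkit_manual is a hypothetical wrapper name, used only for readability.
cnvkit_manual() {
    # Expose /scratch and /nobackup inside the container, then run the
    # requested command with its arguments passed through unchanged.
    apptainer exec \
        --bind /scratch:/scratch \
        --bind /nobackup:/nobackup \
        "$IMAGE" "$@"
}

# On Comet, after `module load apptainer`:
#   cnvkit_manual cnvkit.py batch -h
```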

Do remember that the container filesystem itself cannot be changed, so you won't be able to write to or update /usr/local, /opt, /etc or any other internal folders. Keep output directories restricted to the three areas listed above.
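If you build pipelines around the container, a small guard can catch an output path outside the writable areas before a long job starts. The function below is an illustrative sketch: in_writable_area is a hypothetical name, and it assumes you pass it an absolute path.

```shell
# Succeed only if the given absolute path lies under $HOME, /scratch or /nobackup.
# in_writable_area is a hypothetical helper name used for illustration.
in_writable_area() {
    case "$1" in
        "$HOME"/*|/scratch/*|/nobackup/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use in a pipeline script (OUT_DIR is a placeholder variable):
#   in_writable_area "$OUT_DIR" || { echo "Bad output dir: $OUT_DIR" >&2; exit 1; }
```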


Building CNVKit on Comet

Important

This section is only relevant to RSE HPC staff, or users who want to know how we provisioned CNVKit on Comet. If all you are interested in is using CNVKit, then stop reading now.

Build Commands

$ module load apptainer
$ apptainer pull cnvkit.0.9.13.sif docker://etal/cnvkit:0.9.13
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
INFO:    Fetching OCI image...
0.0b / 160.2MiB [----------------------------------------------------------------------------] 0 % 0.0 b/s 0s
160.2MiB / 160.2MiB [======================================================================] 100 % 0.0 b/s 0s
61.9MiB / 61.9MiB [========================================================================] 100 % 0.0 b/s 0s
28.4MiB / 28.4MiB [========================================================================] 100 % 0.0 b/s 0s
521.2KiB / 521.2KiB [======================================================================] 100 % 0.0 b/s 0s
2.0GiB / 2.0GiB [==========================================================================] 100 % 0.0 b/s 0s
INFO:    Extracting OCI image...
INFO:    Inserting Apptainer configuration...
INFO:    Creating SIF file...
INFO:    To see mksquashfs output with progress bar enable verbose logging
$
$ ls -sh cnvkit.0.9.13.sif 
1.3G cnvkit.0.9.13.sif
$

Container Definition

There is no container definition file - the original Docker image was simply converted to Apptainer image format automatically at the time it was pulled from the Docker container registry.

Helper Script

#!/bin/bash

module load apptainer

IMAGE_NAME=/nobackup/shared/containers/cnvkit.0.9.13.sif

container.run() {
	# Run a command inside the container...
	# automatically bind the /scratch and /nobackup dirs
	# pass through any additional parameters given on the command line
	# (quoted so that arguments containing spaces are preserved)
	apptainer exec --bind /scratch:/scratch --bind /nobackup:/nobackup "${IMAGE_NAME}" "$@"
}

