
Installation

Quick Reference

| Method | Best For | Includes Data |
|---|---|---|
| Docker/Singularity | Production & HPC | ✅ All bundled |
| Clone + Install | Development | ✅ Via Git LFS |
| pip + Data Clone | Custom environments | ⚠️ Requires env var |

Option 1: Docker

The easiest way to run Krewlyzer with all dependencies and data:

# Use a specific release tag (no :latest tag is published)
docker pull ghcr.io/msk-access/krewlyzer:0.5.3

Versioned Tags Only

We publish versioned tags only (e.g., :0.5.3). There is no :latest tag. Check releases for available versions.

Running with Docker

docker run --rm -v "$PWD":/data ghcr.io/msk-access/krewlyzer:0.5.3 \
    run-all -i /data/sample.bam \
    --reference /data/hg19.fa \
    --output /data/results/ \
    --assay xs2

Volume Mounting

Use -v "$PWD":/data to mount your current directory at /data inside the container. Every path passed to krewlyzer must then use the /data/ prefix. For Nextflow pipelines, volume mounting is automatic.
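
The host-to-container path mapping can be sketched in Python. This is a minimal illustration, not part of Krewlyzer: the helper names to_container_path and docker_command are hypothetical, and the image tag and flags are copied from the example command above.

```python
from pathlib import PurePosixPath

# Versioned tag taken from the example above; substitute the release you use.
IMAGE = "ghcr.io/msk-access/krewlyzer:0.5.3"


def to_container_path(host_path: str, mount_root: str, container_root: str = "/data") -> str:
    """Translate an absolute host path under mount_root to its path in the container.

    Raises ValueError if host_path is not inside mount_root, because such a
    file would not be visible inside the container.
    """
    relative = PurePosixPath(host_path).relative_to(PurePosixPath(mount_root))
    return str(PurePosixPath(container_root) / relative)


def docker_command(mount_root: str, bam: str, reference: str, output: str, assay: str) -> list[str]:
    """Build the docker argument list (hypothetical helper for illustration)."""
    return [
        "docker", "run", "--rm",
        "-v", f"{mount_root}:/data",
        IMAGE,
        "run-all",
        "-i", to_container_path(bam, mount_root),
        "--reference", to_container_path(reference, mount_root),
        "--output", to_container_path(output, mount_root),
        "--assay", assay,
    ]
```

Passing the resulting list to subprocess.run(cmd, check=True) avoids shell quoting problems when the mounted directory contains spaces.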


Option 2: Singularity/Apptainer (HPC)

For HPC clusters where Docker isn't available:

# Pull and convert to Singularity Image Format (SIF)
singularity pull krewlyzer.sif docker://ghcr.io/msk-access/krewlyzer:0.5.3

# Or using Apptainer (newer name for Singularity)
apptainer pull krewlyzer.sif docker://ghcr.io/msk-access/krewlyzer:0.5.3

Running with Singularity

singularity exec krewlyzer.sif krewlyzer run-all \
    -i /path/to/sample.bam \
    --reference /path/to/hg19.fa \
    --output /path/to/results/ \
    --assay xs2

HPC Bind Paths

By default, Singularity auto-binds $HOME, /tmp, and $PWD. Bind any other path explicitly, for example -B /scratch:/scratch.
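
To work out which extra binds a given set of input files needs, something like the following sketch may help. It assumes the default auto-bound locations listed above ($HOME, /tmp, $PWD) and absolute POSIX paths; the function names are hypothetical and a site administrator may have configured different defaults.

```python
from pathlib import PurePosixPath


def is_under(path: PurePosixPath, root: PurePosixPath) -> bool:
    """True if path equals root or lies somewhere below it."""
    return path == root or root in path.parents


def extra_binds(paths: list[str], home: str, cwd: str, auto_bound: tuple[str, ...] = ("/tmp",)) -> list[str]:
    """Return -B arguments for the parent directories of files that are not auto-bound."""
    roots = [PurePosixPath(home), PurePosixPath(cwd)]
    roots += [PurePosixPath(p) for p in auto_bound]

    needed: list[str] = []
    for raw in paths:
        path = PurePosixPath(raw)
        if any(is_under(path, root) for root in roots):
            continue  # already visible inside the container
        parent = str(path.parent)
        if parent not in needed:
            needed.append(parent)

    args: list[str] = []
    for directory in needed:
        args += ["-B", f"{directory}:{directory}"]
    return args
```

The returned arguments can be placed between "singularity exec" and the image name.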


Option 3: Clone Repository

Full installation with bundled data (for development or when Docker isn't available):

# Clone repository with LFS data (requires the Git LFS extension to be installed)
git clone https://github.com/msk-access/krewlyzer.git
cd krewlyzer
git lfs pull

# Install in development mode
pip install -e .

# Verify
krewlyzer --version

Why This Works Without Configuration

With pip install -e . (editable mode), Python runs code directly from the source directory. Asset paths resolve to src/krewlyzer/data/ where all LFS files exist.


Option 4: pip Install + Data Clone

For environments where you want PyPI code with external data:

Step 1: Install Package

pip install krewlyzer

Step 2: Clone Data Repository

# Shallow clone (faster, code not needed)
git clone --depth 1 https://github.com/msk-access/krewlyzer.git ~/.krewlyzer-data
cd ~/.krewlyzer-data && git lfs pull

Step 3: Configure Environment Variable

# Set for current session
export KREWLYZER_DATA_DIR=~/.krewlyzer-data/src/krewlyzer/data

# Add to shell profile for persistence (use ~/.zshrc instead if your shell is zsh, the macOS default)
echo 'export KREWLYZER_DATA_DIR=~/.krewlyzer-data/src/krewlyzer/data' >> ~/.bashrc

Required for pip Install

The KREWLYZER_DATA_DIR environment variable is required when installing via pip. Without it, bundled assets cannot be located automatically and asset auto-loading will fail. You can still pass explicit paths with options such as --pon-model.


Requirements

  • OS: Linux or macOS (tested on Ubuntu 20.04+, macOS 12+)
  • Python: 3.10+
  • RAM: ≥16GB recommended for large BAM files
  • Reference: Indexed FASTA file (hg19 or hg38)

Reference Genome Setup

Download and index the reference genome that matches your data (hg19 or hg38):

# hg19
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/hg19.fa.gz
gunzip hg19.fa.gz
samtools faidx hg19.fa   # writes hg19.fa.fai

# hg38
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
gunzip hg38.fa.gz
samtools faidx hg38.fa   # writes hg38.fa.fai

Bundled Data Files

Krewlyzer includes annotation files in src/krewlyzer/data/:

| Directory | Contents |
|---|---|
| ChromosomeBins/ | 100kb genome bins (hg19, hg38) |
| ChromosomeArms/ | Chromosome arm definitions |
| WpsAnchors/ | WPS anchor regions (TSS + CTCF) |
| OpenChromatinRegion/ | Tissue-specific OCR for OCF |
| MethMark/ | Methylation markers for UXM |
| pon/ | Panel of Normals models |
| TFBS/ | GTRD meta-clusters for region entropy |
| ATAC/ | TCGA ATAC peaks for region entropy |

Troubleshooting

"Asset not found" or "PON not found"

If you installed via pip install krewlyzer, you need to set up the data directory:

git clone --depth 1 https://github.com/msk-access/krewlyzer.git ~/.krewlyzer-data
cd ~/.krewlyzer-data && git lfs pull
export KREWLYZER_DATA_DIR=~/.krewlyzer-data/src/krewlyzer/data

"ModuleNotFoundError: krewlyzer._core"

The Rust extension failed to build. Ensure you have:

  • Python 3.10+
  • C compiler (gcc or clang)
  • Rust toolchain (install via rustup)

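Reinstall with verbose output so the build log is visible (--force-reinstall and --no-cache-dir keep pip from reusing an existing install or a cached build):

pip install --force-reinstall --no-cache-dir -v krewlyzer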

"htslib not found"

Install htslib development files:

# Debian/Ubuntu
sudo apt-get install libhts-dev

# macOS (Homebrew)
brew install htslib

Memory Errors

For large BAM files, increase available memory or process chromosomes separately:

krewlyzer extract sample.bam -r hg19.fa -o output/ --chromosomes chr1,chr2
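
Processing chromosomes in batches can be scripted. The sketch below builds one command per batch using only the flags shown in the command above. Whether separate runs can safely share an output directory is not stated here, so as an assumption each batch writes to its own subdirectory. The function name extract_commands is hypothetical.

```python
def extract_commands(
    bam: str,
    reference: str,
    out_dir: str,
    chromosomes: list[str],
    per_run: int = 1,
) -> list[list[str]]:
    """Build one 'krewlyzer extract' argument list per batch of chromosomes."""
    if per_run < 1:
        raise ValueError("per_run must be at least 1")

    commands = []
    for start in range(0, len(chromosomes), per_run):
        group = chromosomes[start:start + per_run]
        # Assumption: keep each batch's output separate to avoid overwrites.
        batch_dir = f"{out_dir.rstrip('/')}/{'_'.join(group)}/"
        commands.append([
            "krewlyzer", "extract", bam,
            "-r", reference,
            "-o", batch_dir,
            "--chromosomes", ",".join(group),
        ])
    return commands
```

Each list can be run in turn with subprocess.run(cmd, check=True), so that only one batch is held in memory at a time.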