Installation
Quick Reference
| Method | Best For | Includes Data |
|---|---|---|
| Docker/Singularity | Production & HPC | ✅ All bundled |
| Clone + Install | Development | ✅ Via Git LFS |
| pip + Data Clone | Custom environments | ⚠️ Requires env var |
Option 1: Docker (Recommended)
The easiest way to run Krewlyzer with all dependencies and data:
# Use a specific release tag (no :latest tag is published)
docker pull ghcr.io/msk-access/krewlyzer:0.5.3
Versioned Tags Only
We publish versioned tags only (e.g., :0.5.3). There is no :latest tag.
Check the project's releases page for available versions.
Running with Docker
docker run --rm -v $PWD:/data ghcr.io/msk-access/krewlyzer:0.5.3 \
run-all -i /data/sample.bam \
--reference /data/hg19.fa \
--output /data/results/ \
--assay xs2
Volume Mounting
Use -v $PWD:/data to mount your current directory into the container at /data. Inside the container, refer to every input and output file by its /data/ path. For Nextflow pipelines, volume mounting is automatic.
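The path translation implied by the mount can be sketched as a small shell helper. The function name to_container_path is hypothetical and not part of Krewlyzer; it only illustrates how a host path under the mounted directory corresponds to a /data/ path inside the container.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Krewlyzer): map a host path under the
# mounted directory to the path the container sees under /data.
to_container_path() {
    local host_path="$1"
    local mount_root="${2:-$PWD}"   # defaults to the directory passed to -v
    case "$host_path" in
        "$mount_root"/*)
            # Strip the mount root and prepend the container mount point.
            printf '/data/%s\n' "${host_path#"$mount_root"/}"
            ;;
        *)
            echo "error: $host_path is outside $mount_root" >&2
            return 1
            ;;
    esac
}

# Example:
#   to_container_path "$PWD/sample.bam"    prints /data/sample.bam
```

Files outside the mounted directory are rejected because the container cannot see them unless they are mounted separately.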
Option 2: Singularity/Apptainer (HPC)
For HPC clusters where Docker isn't available:
# Pull and convert to Singularity Image Format (SIF)
singularity pull krewlyzer.sif docker://ghcr.io/msk-access/krewlyzer:0.5.3
# Or using Apptainer (newer name for Singularity)
apptainer pull krewlyzer.sif docker://ghcr.io/msk-access/krewlyzer:0.5.3
Running with Singularity
singularity exec krewlyzer.sif krewlyzer run-all \
-i /path/to/sample.bam \
--reference /path/to/hg19.fa \
--output /path/to/results/ \
--assay xs2
HPC Bind Paths
Singularity auto-binds $HOME, /tmp, and $PWD. Bind any other path explicitly, for example -B /scratch:/scratch.
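As a rough sketch, the bind flags for a set of directories could be assembled like this. The helper name bind_flags is hypothetical; it assumes the default auto-binds described above and skips directories that fall under them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a -B flag for each directory that is not
# already covered by the default binds ($HOME, /tmp, and $PWD).
bind_flags() {
    local dir flags=""
    for dir in "$@"; do
        case "$dir" in
            "$HOME"|"$HOME"/*|/tmp|/tmp/*|"$PWD"|"$PWD"/*)
                ;;  # already visible inside the container
            *)
                flags="$flags -B $dir:$dir"
                ;;
        esac
    done
    printf '%s\n' "${flags# }"
}

# Example (word splitting is intentional, so paths must not contain spaces):
#   singularity exec $(bind_flags /scratch) krewlyzer.sif krewlyzer --version
```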
Option 3: Clone Repository
Full installation with bundled data (for development or when Docker isn't available):
# Clone repository with LFS data
git clone https://github.com/msk-access/krewlyzer.git
cd krewlyzer
git lfs pull
# Install in development mode
pip install -e .
# Verify
krewlyzer --version
Why This Works Without Configuration
With pip install -e . (editable mode), Python runs code directly from the source directory.
Asset paths resolve to src/krewlyzer/data/ where all LFS files exist.
Option 4: pip Install + Data Clone
For environments where you want the package from PyPI and the data files kept in a separate location:
Step 1: Install Package
pip install krewlyzer
Step 2: Clone Data Repository
# Shallow clone (faster; only the data files are needed)
git clone --depth 1 https://github.com/msk-access/krewlyzer.git ~/.krewlyzer-data
cd ~/.krewlyzer-data && git lfs pull
Step 3: Configure Environment Variable
# Set for current session
export KREWLYZER_DATA_DIR=~/.krewlyzer-data/src/krewlyzer/data
# Add to shell profile for persistence
echo 'export KREWLYZER_DATA_DIR=~/.krewlyzer-data/src/krewlyzer/data' >> ~/.bashrc
Required for pip Install
The KREWLYZER_DATA_DIR environment variable is required when installing via pip.
Without it, asset auto-loading will fail. You can still use explicit paths like --pon-model.
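A quick sanity check before running can catch the two most likely setup mistakes: the variable is unset, or the clone still contains Git LFS pointer files because git lfs pull was never run. The sketch below is illustrative only; the function name check_krewlyzer_data is hypothetical, and the pointer test relies on the standard first line of a Git LFS pointer file.

```shell
#!/usr/bin/env bash
# Hypothetical check (not part of Krewlyzer): verify that
# KREWLYZER_DATA_DIR is set, exists, and holds real data files.
check_krewlyzer_data() {
    local data_dir="${KREWLYZER_DATA_DIR:-}"
    if [ -z "$data_dir" ]; then
        echo "KREWLYZER_DATA_DIR is not set" >&2
        return 1
    fi
    if [ ! -d "$data_dir" ]; then
        echo "KREWLYZER_DATA_DIR is not a directory: $data_dir" >&2
        return 1
    fi
    # Unpulled LFS files are small text stubs that start with this line.
    if grep -rIlq 'version https://git-lfs.github.com/spec/v1' "$data_dir" 2>/dev/null; then
        echo "Git LFS pointer files found; run 'git lfs pull' in the clone" >&2
        return 1
    fi
    echo "Data directory looks usable: $data_dir"
}
```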
Requirements
- OS: Linux or macOS (tested on Ubuntu 20.04+, macOS 12+)
- Python: 3.10+
- RAM: ≥16GB recommended for large BAM files
- Reference: Indexed FASTA file (hg19 or hg38)
Reference Genome Setup
Download and index the reference genome:
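One possible way to do this is sketched below. It assumes wget, gunzip, and samtools are on your PATH. The UCSC goldenPath URL is an assumption and is not taken from this page, so substitute your institution's reference if it differs. The function name setup_reference is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: download a UCSC reference build and index it with samtools.
# The URL pattern below is an assumption (UCSC goldenPath layout).
setup_reference() {
    local build="${1:-hg19}"
    local url="https://hgdownload.soe.ucsc.edu/goldenPath/${build}/bigZips/${build}.fa.gz"
    wget -O "${build}.fa.gz" "$url" &&
    gunzip "${build}.fa.gz" &&        # leaves ${build}.fa
    samtools faidx "${build}.fa"      # writes ${build}.fa.fai
}

# Usage:
#   setup_reference hg19
#   setup_reference hg38
```

The resulting .fa file is what --reference expects, with the .fai index next to it.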
Bundled Data Files
Krewlyzer includes annotation files in src/krewlyzer/data/:
| Directory | Contents |
|---|---|
| ChromosomeBins/ | 100kb genome bins (hg19, hg38) |
| ChromosomeArms/ | Chromosome arm definitions |
| WpsAnchors/ | WPS anchor regions (TSS + CTCF) |
| OpenChromatinRegion/ | Tissue-specific OCR for OCF |
| MethMark/ | Methylation markers for UXM |
| pon/ | Panel of Normals models |
| TFBS/ | GTRD meta-clusters for region entropy |
| ATAC/ | TCGA ATAC peaks for region entropy |
Troubleshooting
"Asset not found" or "PON not found"
If you installed via pip install krewlyzer, you need to set up the data directory:
git clone --depth 1 https://github.com/msk-access/krewlyzer.git ~/.krewlyzer-data
cd ~/.krewlyzer-data && git lfs pull
export KREWLYZER_DATA_DIR=~/.krewlyzer-data/src/krewlyzer/data
"ModuleNotFoundError: krewlyzer._core"
The Rust extension failed to build. Ensure you have:
- Python 3.10+
- C compiler (gcc or clang)
- Rust toolchain (install via rustup)
Reinstall with verbose output:
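Before reinstalling, it can help to confirm which prerequisites are actually missing. The sketch below is illustrative: the function name missing_build_tools is hypothetical, and the pip flags in the trailing comment are standard pip options rather than anything specific to Krewlyzer.

```shell
#!/usr/bin/env bash
# Sketch: list build prerequisites that cannot be found on PATH.
missing_build_tools() {
    local missing=""
    command -v rustc >/dev/null 2>&1 || missing="$missing rustc"
    command -v cargo >/dev/null 2>&1 || missing="$missing cargo"
    if ! command -v gcc >/dev/null 2>&1 && ! command -v clang >/dev/null 2>&1; then
        missing="$missing gcc-or-clang"
    fi
    # Prints an empty line when everything was found.
    printf '%s\n' "${missing# }"
}

# Once nothing is reported missing, reinstall with verbose logs:
#   pip install -v --no-cache-dir --force-reinstall krewlyzer
```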
"htslib not found"
Install htslib development files:
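The exact command depends on your platform. The sketch below prints a likely candidate. The package names (libhts-dev on Debian/Ubuntu, htslib on Homebrew and Bioconda) are the commonly used ones and are not taken from this page, and the function name htslib_install_hint is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print a typical htslib install command for the platform.
# Linux is assumed to be Debian/Ubuntu; other distributions differ.
htslib_install_hint() {
    local os="${1:-$(uname -s)}"
    case "$os" in
        Linux)  echo "sudo apt-get install -y libhts-dev" ;;
        Darwin) echo "brew install htslib" ;;
        *)      echo "conda install -c conda-forge -c bioconda htslib" ;;
    esac
}

# Usage: htslib_install_hint    (detects the platform with uname -s)
```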
Memory Errors
For large BAM files, increase available memory or process chromosomes separately:
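One generic way to process chromosomes separately is to split the BAM with samtools first. The sketch below assumes samtools is on your PATH and that the input BAM is coordinate-sorted and indexed. The function name split_bam_by_chrom is hypothetical, and how per-chromosome results are combined afterwards is left open.

```shell
#!/usr/bin/env bash
# Sketch: write one indexed BAM per reference sequence so that each
# piece can be processed on its own with a smaller memory footprint.
split_bam_by_chrom() {
    local bam="$1" outdir="$2" chrom
    mkdir -p "$outdir"
    # idxstats prints one line per reference: name, length, mapped, unmapped.
    # The final "*" line counts unplaced reads and is skipped.
    samtools idxstats "$bam" | cut -f1 | grep -v '^\*$' |
    while read -r chrom; do
        samtools view -b -o "$outdir/$chrom.bam" "$bam" "$chrom"
        samtools index "$outdir/$chrom.bam"
    done
}

# Usage: split_bam_by_chrom sample.bam per_chrom/
```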