Nextflow Pipeline

Run Krewlyzer at scale with the Nextflow pipeline.

Quick Start

nextflow run msk-access/krewlyzer \
    --samplesheet samples.csv \
    --ref /path/to/hg19.fa \
    --outdir results/

Workflow Architecture

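The pipeline extracts fragments from each BAM once, then fans the resulting BED file out to independent feature modules that Nextflow can run in parallel. FSR runs after FSC, while UXM and mFSD take their own inputs and run alongside the main path: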

flowchart TB
    BAM["sample.bam"] --> EXTRACT["KREWLYZER_EXTRACT"]
    EXTRACT --> BED["sample.bed.gz"]

    BED --> MOTIF["KREWLYZER_MOTIF"]
    BED --> FSC["KREWLYZER_FSC"]
    BED --> FSD["KREWLYZER_FSD"]
    BED --> WPS["KREWLYZER_WPS"]
    BED --> OCF["KREWLYZER_OCF"]
    BED --> ENTROPY["KREWLYZER_REGION_ENTROPY"]
    BED --> RMDS["KREWLYZER_REGION_MDS"]

    FSC --> FSR["KREWLYZER_FSR"]

    subgraph "Parallel Paths"
        METH_BAM["meth.bam"] --> UXM["KREWLYZER_UXM"]
        BAM2["BAM + MAF"] --> MFSD["KREWLYZER_MFSD"]
    end
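
The dependencies in the flowchart can be read as a small directed graph. The following is a minimal sketch, not part of the pipeline itself, that transcribes those edges and groups the processes into "waves" of steps with no outstanding dependencies. The dictionary and the `parallel_waves` helper are hypothetical names chosen for illustration. Nextflow itself schedules by dataflow (a process starts as soon as its inputs arrive), so waves are only an approximation of the actual run order.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Edges transcribed from the flowchart: each process maps to the processes
# whose output it consumes. This table is illustrative, not pipeline code.
DEPENDS_ON = {
    "KREWLYZER_EXTRACT": set(),
    "KREWLYZER_MOTIF": {"KREWLYZER_EXTRACT"},
    "KREWLYZER_FSC": {"KREWLYZER_EXTRACT"},
    "KREWLYZER_FSD": {"KREWLYZER_EXTRACT"},
    "KREWLYZER_WPS": {"KREWLYZER_EXTRACT"},
    "KREWLYZER_OCF": {"KREWLYZER_EXTRACT"},
    "KREWLYZER_REGION_ENTROPY": {"KREWLYZER_EXTRACT"},
    "KREWLYZER_REGION_MDS": {"KREWLYZER_EXTRACT"},
    "KREWLYZER_FSR": {"KREWLYZER_FSC"},
    # "Parallel Paths": these read their own inputs, not the extracted BED.
    "KREWLYZER_UXM": set(),
    "KREWLYZER_MFSD": set(),
}


def parallel_waves(depends_on):
    """Group processes into waves; all processes in a wave are independent."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything runnable right now
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished to unlock successors
    return waves


waves = parallel_waves(DEPENDS_ON)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

Under these assumptions the graph resolves into three waves: extraction together with the two independent paths, then the seven BED-based modules, then FSR.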

Documentation

| Page | Description |
| --- | --- |
| Samplesheet | Input samplesheet format |
| Parameters | All pipeline parameters |
| Outputs | Output channels and files |
| Examples | Workflow examples |

Features

  • Parallel processing - Process multiple samples simultaneously
  • Resume support - Resume failed runs from cached results with Nextflow's -resume option
  • Container support - Docker/Singularity
  • Cloud ready - AWS, Google Cloud, Azure

Performance Benchmarks

Real-world performance from MSK-ACCESS v1/v2 duplex plasma samples:

| Sample Type | Duration | CPU Usage | Peak Memory |
| --- | --- | --- | --- |
| Healthy control | 2-5 min | 90-140% | 1.7-1.9 GB |
| ctDNA plasma | 4-6 min | 190-300% | 2.8-3.2 GB |

Tested Configuration

  • Docker with amd64 emulation on Apple Silicon
  • 8 CPUs, 32 GB memory
  • Panel mode with --skip-pon and --duplex enabled
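
The benchmark figures can be turned into a rough estimate of how many samples fit on a machine at once. The sketch below assumes that 100% CPU corresponds to one fully used core (the convention used by tools such as `top` and `docker stats`), takes the upper bound of each reported range, and ignores operating-system and emulation overhead. The function name `max_concurrent_samples` is hypothetical and not part of Krewlyzer.

```python
import math


def max_concurrent_samples(total_cpus, total_mem_gb, peak_cpu_percent, peak_mem_gb):
    """Estimate how many samples can run at once without oversubscription."""
    # Assumption: 100% CPU usage equals one core; round up to whole cores.
    cores_per_sample = math.ceil(peak_cpu_percent / 100)
    by_cpu = total_cpus // cores_per_sample
    by_mem = int(total_mem_gb // peak_mem_gb)
    # The scarcer resource sets the limit.
    return min(by_cpu, by_mem)


# Tested configuration from this page: 8 CPUs, 32 GB memory.
healthy = max_concurrent_samples(8, 32, peak_cpu_percent=140, peak_mem_gb=1.9)
ctdna = max_concurrent_samples(8, 32, peak_cpu_percent=300, peak_mem_gb=3.2)

print(f"healthy control: up to {healthy} concurrent samples")
print(f"ctDNA plasma: up to {ctdna} concurrent samples")
```

With these inputs the estimate is CPU-bound in both cases: about 4 healthy-control samples or 2 ctDNA samples at a time. Treat the numbers as a starting point for tuning rather than a guarantee.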

See Also