Nextflow Pipeline

Run Krewlyzer at scale with the Nextflow pipeline.

Quick Start

```bash
nextflow run msk-access/krewlyzer \
    --samplesheet samples.csv \
    --ref /path/to/hg19.fa \
    --outdir results/
```
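If you launch runs from a Python driver script, the following minimal sketch builds the Quick Start command as an argument list. It can optionally append Nextflow's built-in -resume option, which reuses cached task results from a previous run. The helpers build_krewlyzer_command and run_pipeline are hypothetical. Only --samplesheet, --ref and --outdir come from this page.

```python
import shlex
import subprocess
from typing import List


def build_krewlyzer_command(samplesheet: str, ref: str, outdir: str,
                            resume: bool = False) -> List[str]:
    """Assemble the Quick Start invocation as an argv list (hypothetical helper)."""
    cmd = [
        "nextflow", "run", "msk-access/krewlyzer",
        "--samplesheet", samplesheet,  # pipeline params use a double dash
        "--ref", ref,
        "--outdir", outdir,
    ]
    if resume:
        # -resume is a core Nextflow option (single dash); it reuses cached task results.
        cmd.append("-resume")
    return cmd


def run_pipeline(cmd: List[str]) -> int:
    """Launch Nextflow and return its exit code (requires nextflow on PATH)."""
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    cmd = build_krewlyzer_command("samples.csv", "/path/to/hg19.fa", "results/", resume=True)
    print(shlex.join(cmd))  # print the shell-equivalent command without running it
```

Passing a list rather than a single string to subprocess.run avoids shell quoting issues when paths contain spaces.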

Workflow Architecture

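The pipeline runs KREWLYZER_EXTRACT once per sample BAM to produce sample.bed.gz. It then fans that file out to independent feature processes, which Nextflow schedules in parallel. KREWLYZER_FSR depends on the output of KREWLYZER_FSC. KREWLYZER_UXM (methylation BAM) and KREWLYZER_MFSD (BAM plus MAF) run on their own inputs: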

```mermaid
flowchart TB
    BAM["sample.bam"] --> EXTRACT["KREWLYZER_EXTRACT"]
    EXTRACT --> BED["sample.bed.gz"]

    BED --> MOTIF["KREWLYZER_MOTIF"]
    BED --> FSC["KREWLYZER_FSC"]
    BED --> FSD["KREWLYZER_FSD"]
    BED --> WPS["KREWLYZER_WPS"]
    BED --> OCF["KREWLYZER_OCF"]
    BED --> ENTROPY["KREWLYZER_REGION_ENTROPY"]
    BED --> RMDS["KREWLYZER_REGION_MDS"]

    FSC --> FSR["KREWLYZER_FSR"]

    subgraph "Parallel Paths"
        METH_BAM["meth.bam"] --> UXM["KREWLYZER_UXM"]
        BAM2["BAM + MAF"] --> MFSD["KREWLYZER_MFSD"]
    end
```

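To see which processes can run concurrently for a single sample, encode the edges of the workflow diagram as a dependency map. You can then layer it with Python's standard-library graphlib.TopologicalSorter (Python 3.9+). The process names come from the diagram. The sketch assumes that sample.bam, meth.bam and the BAM + MAF pair are already available, so UXM and MFSD have no upstream process. The helper execution_waves is hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each process mapped to the processes whose outputs it consumes,
# transcribed from the workflow diagram. Input files are not modeled as nodes.
DEPENDENCIES: Dict[str, Set[str]] = {
    "KREWLYZER_EXTRACT": set(),
    "KREWLYZER_MOTIF": {"KREWLYZER_EXTRACT"},
    "KREWLYZER_FSC": {"KREWLYZER_EXTRACT"},
    "KREWLYZER_FSD": {"KREWLYZER_EXTRACT"},
    "KREWLYZER_WPS": {"KREWLYZER_EXTRACT"},
    "KREWLYZER_OCF": {"KREWLYZER_EXTRACT"},
    "KREWLYZER_REGION_ENTROPY": {"KREWLYZER_EXTRACT"},
    "KREWLYZER_REGION_MDS": {"KREWLYZER_EXTRACT"},
    "KREWLYZER_FSR": {"KREWLYZER_FSC"},
    "KREWLYZER_UXM": set(),   # reads meth.bam directly
    "KREWLYZER_MFSD": set(),  # reads BAM + MAF directly
}


def execution_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group processes into waves whose dependencies all finished in earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()
    waves: List[List[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sort for stable, readable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(execution_waves(DEPENDENCIES), start=1):
        print(f"wave {i}: {', '.join(wave)}")
```

Waves are a simplification. Nextflow starts each task as soon as its own inputs are available, so KREWLYZER_FSR can begin while other second-wave processes are still running. Actual concurrency is also bounded by the executor's CPU and memory limits.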

Documentation

Page	Description
Samplesheet	Input samplesheet format
Parameters	All pipeline parameters
Outputs	Output channels and files
Examples	Workflow examples

Features

Parallel processing - Process multiple samples simultaneously
Resume support - Resume interrupted or failed runs with Nextflow's -resume option, reusing cached results
Container support - Docker/Singularity
Cloud ready - AWS, Google Cloud, Azure

Performance Benchmarks

Runtime and resource usage measured on MSK-ACCESS v1/v2 duplex plasma samples:

Sample Type	Duration	CPU Usage (100% = 1 core)	Peak Memory
Healthy control	2-5 min	90-140%	1.7-1.9 GB
ctDNA plasma	4-6 min	190-300%	2.8-3.2 GB

Tested Configuration

Docker with amd64 emulation on Apple Silicon
8 CPUs, 32 GB memory
Panel mode with --skip-pon and --duplex enabled
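For capacity planning, you can turn the benchmark figures into a rough concurrency estimate. CPU usage above 100% means more than one core is busy, so 300% is roughly three cores. The sketch below makes three assumptions: the table values are per-sample totals, the upper end of each range applies, and scheduler and container overhead can be ignored. The names Profile and max_concurrent_samples are hypothetical.

```python
import math
from typing import NamedTuple


class Profile(NamedTuple):
    """Upper-bound per-sample usage taken from the benchmark table."""
    name: str
    cpu_percent: float   # 100% ~= one fully busy core
    peak_mem_gb: float


def max_concurrent_samples(p: Profile, host_cpus: int, host_mem_gb: float) -> int:
    """Samples that fit at once when packing by both CPU and memory (rough estimate)."""
    cores_per_sample = p.cpu_percent / 100.0
    by_cpu = math.floor(host_cpus / cores_per_sample)
    by_mem = math.floor(host_mem_gb / p.peak_mem_gb)
    return min(by_cpu, by_mem)  # the scarcer resource sets the limit


PROFILES = [
    Profile("Healthy control", cpu_percent=140, peak_mem_gb=1.9),
    Profile("ctDNA plasma", cpu_percent=300, peak_mem_gb=3.2),
]

if __name__ == "__main__":
    # Tested configuration: 8 CPUs, 32 GB memory.
    for p in PROFILES:
        n = max_concurrent_samples(p, host_cpus=8, host_mem_gb=32)
        print(f"{p.name}: ~{n} samples at once")
```

Under these assumptions, CPU rather than memory is the limiting resource on the tested 8-CPU, 32 GB host for both sample types: about 5 healthy-control or 2 ctDNA samples at once.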

See Also

CLI Reference - Command-line usage
Panel Mode - MSK-ACCESS workflows