Nextflow Pipeline

Run gbcms at scale on HPC clusters with automatic parallelization.

Overview

The Nextflow workflow provides:

Automatic parallelization across samples
SLURM/HPC integration with resource management
Containerization with Docker/Singularity
Resume capability for failed runs

Pipeline Architecture

flowchart TD
    CSV([📄 samplesheet.csv]):::input --> Parse[Parse Samplesheet]
    MAF([📄 variants.maf]):::input --> FilterCheck{filter_by_sample<br/>AND .maf input?}
    Parse --> FilterCheck

    FilterCheck -->|Yes| FilterMAF["FILTER_MAF<br/>(per-sample MAF extraction)"]
    FilterCheck -->|No| Ready[All samples get full variants file]

    FilterMAF --> HasData{Variants found?}
    HasData -->|Yes| Ready2[Join filtered MAF with BAM]
    HasData -->|No| Skip([⚪ Skip sample]):::skip
    FilterMAF --> Summary["PIPELINE_SUMMARY<br/>(aggregate filter stats)"]

    Ready --> Run["GBCMS_RUN<br/>(per-sample counting)"]:::run
    Ready2 --> Run

    Run --> Output([📊 VCF/MAF output]):::output
    Summary --> SummaryOut([📋 pipeline_summary.tsv]):::output

    classDef input fill:#3498db,color:#fff,stroke:#2471a3,stroke-width:2px;
    classDef run fill:#27ae60,color:#fff,stroke:#1e8449,stroke-width:2px;
    classDef output fill:#9b59b6,color:#fff,stroke:#7d3c98,stroke-width:2px;
    classDef skip fill:#95a5a6,color:#fff,stroke:#7f8c8d,stroke-width:2px;
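
The per-sample routing in the diagram can be sketched as a small shell function. This is only an illustration of the decision logic, not the pipeline's actual code: `route_sample` is a hypothetical helper, the variant count is assumed to be known already, and the `PIPELINE_SUMMARY` aggregation is not modeled. The names `filter_by_sample`, `FILTER_MAF`, and `GBCMS_RUN` are taken from the diagram.

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring the flowchart's decisions for one sample.
#   $1  filter_by_sample setting ("true" or "false")
#   $2  path to the variants file
#   $3  number of variants found for this sample after filtering (assumed known)
route_sample() {
    local filter_by_sample="$1" variants="$2" n_variants="$3"

    # Filtering applies only when filter_by_sample is set AND the input is a .maf file.
    if [ "$filter_by_sample" = "true" ] && [ "${variants##*.}" = "maf" ]; then
        if [ "$n_variants" -gt 0 ]; then
            echo "GBCMS_RUN with per-sample filtered MAF"
        else
            # FILTER_MAF found nothing for this sample, so it is skipped.
            echo "skip sample"
        fi
    else
        echo "GBCMS_RUN with full variants file"
    fi
}

route_sample true  variants.maf 12   # prints: GBCMS_RUN with per-sample filtered MAF
route_sample true  variants.maf 0    # prints: skip sample
route_sample false variants.maf 0    # prints: GBCMS_RUN with full variants file
route_sample true  variants.vcf 0    # prints: GBCMS_RUN with full variants file
```

Note that a VCF input always takes the "No" branch, so every sample receives the full variants file regardless of the `filter_by_sample` setting.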

Quick Start

nextflow run nextflow/main.nf \
    --input samplesheet.csv \
    --variants variants.vcf \
    --fasta reference.fa \
    -profile docker
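
To pick up a failed run where it stopped (the resume capability listed in the Overview), Nextflow's standard `-resume` option can be appended to the same command. The sketch below wraps the Quick Start command in a function so extra options pass straight through. The function name `gbcms_nf` and the `NEXTFLOW_BIN` variable are assumptions made for this example, not part of the pipeline; `NEXTFLOW_BIN` exists only so the command can be previewed with `echo` before anything is launched.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the Quick Start command.
# Any extra arguments (for example -resume) are passed through to Nextflow.
gbcms_nf() {
    # NEXTFLOW_BIN is an assumed override; it defaults to the real nextflow binary.
    local nf="${NEXTFLOW_BIN:-nextflow}"
    "$nf" run nextflow/main.nf \
        --input samplesheet.csv \
        --variants variants.vcf \
        --fasta reference.fa \
        -profile docker \
        "$@"
}

# Preview the resumed command without launching anything.
NEXTFLOW_BIN=echo gbcms_nf -resume
```

Calling `gbcms_nf -resume` without the override runs the pipeline and reuses cached results from the previous attempt, provided it is started from the same launch directory so Nextflow can find its work directory.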

Documentation

Page	Description
Samplesheet	Input CSV format
Parameters	All configuration options
Examples	Common usage patterns

CLI Reference: for processing a small number of samples without the pipeline
Troubleshooting: common issues and fixes