Nextflow Pipeline
Run gbcms at scale on HPC clusters with automatic parallelization.
Overview
The Nextflow workflow provides:
- Automatic parallelization across samples
- SLURM/HPC integration with resource management
- Containerization with Docker/Singularity
- Resume capability for failed runs
Pipeline Architecture
flowchart TD
CSV([📄 samplesheet.csv]):::input --> Parse[Parse Samplesheet]
MAF([📄 variants.maf]):::input --> FilterCheck{"filter_by_sample<br/>AND .maf input?"}
Parse --> FilterCheck
FilterCheck -->|Yes| FilterMAF["FILTER_MAF<br/>(per-sample MAF extraction)"]
FilterCheck -->|No| Ready[All samples get full variants file]
FilterMAF --> HasData{Variants found?}
HasData -->|Yes| Ready2[Join filtered MAF with BAM]
HasData -->|No| Skip([⚪ Skip sample]):::skip
FilterMAF --> Summary["PIPELINE_SUMMARY<br/>(aggregate filter stats)"]
Ready --> Run["GBCMS_RUN<br/>(per-sample counting)"]:::run
Ready2 --> Run
Run --> Output([📊 VCF/MAF output]):::output
Summary --> SummaryOut([📋 pipeline_summary.tsv]):::output
classDef input fill:#3498db,color:#fff,stroke:#2471a3,stroke-width:2px;
classDef run fill:#27ae60,color:#fff,stroke:#1e8449,stroke-width:2px;
classDef output fill:#9b59b6,color:#fff,stroke:#7d3c98,stroke-width:2px;
classDef skip fill:#95a5a6,color:#fff,stroke:#7f8c8d,stroke-width:2px;
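The branching in the diagram can be sketched in plain Python. This is only an illustration of the control flow, not the pipeline's actual Nextflow code: the function name `plan_runs`, the in-memory list of MAF rows, and matching samples on the `Tumor_Sample_Barcode` column are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

MafRow = Dict[str, object]


def plan_runs(
    samples: List[str],
    maf_rows: List[MafRow],
    variants_path: str,
    filter_by_sample: bool,
) -> Tuple[Dict[str, List[MafRow]], List[str]]:
    """Decide which variants each sample is counted against.

    Hypothetical helper mirroring the flowchart; returns (runs, skipped).
    """
    is_maf = variants_path.endswith(".maf")
    runs: Dict[str, List[MafRow]] = {}
    skipped: List[str] = []
    for sample in samples:
        if filter_by_sample and is_maf:
            # FILTER_MAF branch: keep only the rows for this sample
            # (assumed here to be matched on Tumor_Sample_Barcode).
            rows = [r for r in maf_rows if r["Tumor_Sample_Barcode"] == sample]
            if rows:
                runs[sample] = rows  # joined with the sample's BAM downstream
            else:
                skipped.append(sample)  # no variants found: sample is skipped
        else:
            # No filtering: every sample gets the full variants file.
            runs[sample] = list(maf_rows)
    return runs, skipped


# Toy data, invented for the example.
maf = [
    {"Tumor_Sample_Barcode": "S1", "Chromosome": "1", "Start_Position": 100},
    {"Tumor_Sample_Barcode": "S1", "Chromosome": "2", "Start_Position": 200},
    {"Tumor_Sample_Barcode": "S2", "Chromosome": "3", "Start_Position": 300},
]
runs, skipped = plan_runs(["S1", "S2", "S3"], maf, "variants.maf", True)
print({s: len(r) for s, r in runs.items()}, skipped)
```

With filtering on and a `.maf` input, `S3` has no matching rows and is skipped. With any other variants file, the same call would hand all three rows to every sample.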
Quick Start
nextflow run nextflow/main.nf \
--input samplesheet.csv \
--variants variants.vcf \
--fasta reference.fa \
-profile docker
Documentation
| Page | Description |
|---|---|
| Samplesheet | Input CSV format |
| Parameters | All configuration options |
| Examples | Common usage patterns |
Related
- CLI Reference: for processing a small number of samples
- Troubleshooting: common issues