Samplesheet Format

The samplesheet is a CSV file defining input samples for the Nextflow pipeline.

Basic Format

sample,bam,bai
sample1,/path/to/sample1.bam,/path/to/sample1.bam.bai
sample2,/path/to/sample2.bam,
sample3,/path/to/sample3.bam,/path/to/sample3.bam.bai

Columns

Column	Required	Description
`sample`	Yes	Sample identifier (used in output filenames)
`bam`	Yes	Path to BAM or CRAM file
`bai`	No	Path to the index file. If empty, the pipeline auto-discovers `.bam.bai` or `.bai` for BAM, and `.cram.crai` or `.crai` for CRAM.
`suffix`	No	Per-sample output suffix
`bam_type`	No	BAM type label for multi-BAM merging (e.g., `duplex`, `simplex`)
`tsb`	No	Tumor_Sample_Barcode pattern(s) for MAF filtering
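As an illustration of the column rules above, here is a minimal Python sketch that reads a samplesheet and checks the required columns. It is not the pipeline's own parser: the function name `read_samplesheet` and the error messages are hypothetical, and treating blank optional cells as `None` is an assumption made for the example.

```python
import csv
import io

REQUIRED = ("sample", "bam")
OPTIONAL = ("bai", "suffix", "bam_type", "tsb")


def read_samplesheet(text):
    """Parse samplesheet CSV text into a list of dicts, one per row.

    Hypothetical helper for illustration only.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    missing = [c for c in REQUIRED if c not in header]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")

    rows = []
    for row in reader:
        for col in REQUIRED:
            if not (row.get(col) or "").strip():
                raise ValueError(f"line {reader.line_num}: empty '{col}'")
        # Assumption: absent or blank optional cells are normalised to None.
        rows.append({c: (row.get(c) or "").strip() or None
                     for c in REQUIRED + OPTIONAL})
    return rows
```

With the Basic Format example, the row for `sample2` would come back with `bai` set to `None`, which is the case auto-discovery is meant to handle.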

Index Auto-Discovery

If the bai column is empty, the pipeline auto-discovers the index based on the alignment file extension:

BAM files (.bam):

<bam>.bai (e.g., sample.bam.bai)
<bam_without_extension>.bai (e.g., sample.bai)

CRAM files (.cram, v5.3.0+):

<cram>.crai (e.g., sample.cram.crai)
<cram_without_extension>.crai (e.g., sample.crai)

Tip

Leave the bai column empty to use auto-discovery. You can also provide an explicit index path of any naming convention.
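The lookup described above can be sketched in Python as follows. This is a hedged illustration, not the pipeline's code: `candidate_indexes` and `discover_index` are hypothetical names, and trying the candidates in the order they are listed above is an assumption.

```python
from pathlib import Path

# Index extension for each supported alignment extension.
INDEX_EXT = {".bam": ".bai", ".cram": ".crai"}


def candidate_indexes(alignment):
    """Return the two index paths considered for an alignment file."""
    path = Path(alignment)
    index_ext = INDEX_EXT.get(path.suffix)
    if index_ext is None:
        raise ValueError(f"unsupported alignment extension: {path.suffix!r}")
    return [
        Path(str(path) + index_ext),  # e.g. sample.bam.bai / sample.cram.crai
        path.with_suffix(index_ext),  # e.g. sample.bai / sample.crai
    ]


def discover_index(alignment, explicit=None):
    """Use the explicit index if given, else the first candidate on disk."""
    if explicit:
        return Path(explicit)
    for candidate in candidate_indexes(alignment):
        if candidate.exists():
            return candidate
    return None  # nothing found; the caller decides how to report it
```

An explicit `bai` value short-circuits the search, which is why an index with any naming convention works when its path is given in the samplesheet.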

CRAM Support (v5.3.0+)

The bam column accepts both BAM and CRAM file paths. The reference given with --fasta is passed automatically to both the Rust engine and the Python layers for CRAM decoding, so no additional configuration is needed.

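Per-Sample Suffix

When one sample has several BAM files (for example duplex, simplex, and unfiltered), repeat the sample ID on each row and set a distinct suffix per row: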

sample,bam,bai,suffix
sample1,/path/to/sample1.duplex.bam,,-duplex
sample1,/path/to/sample1.simplex.bam,,-simplex
sample1,/path/to/sample1.unfiltered.bam,,-unfiltered

Output files: sample1-duplex.maf, sample1-simplex.maf, sample1-unfiltered.maf

Multi-Sample MAF Filtering

When --filter_by_sample is used with a multi-sample MAF passed as --variants, each sample is genotyped only on the variants whose Tumor_Sample_Barcode matches that sample.

Exact Match (default)

When no tsb column is provided, the sample column is matched exactly against Tumor_Sample_Barcode:

sample,bam
P-0012345-T01-IM7,/path/to/tumor.bam
P-0067890-T01-IM7,/path/to/tumor2.bam

Regex Match

Set the tsb column to a pattern when the sample ID differs from the barcodes you want to select (e.g., patient-level filtering):

sample,bam,tsb
patient_A,/path/to/A.bam,P-0012345
patient_B,/path/to/B.bam,P-0067890

The pattern P-0012345 matches any Tumor_Sample_Barcode that contains it (e.g., P-0012345-T01-IM7, P-0012345-T02-IM7).

Comma-Separated Multi-Select

Select specific samples by listing several patterns separated by commas. Quote the tsb field so that the commas are not read as CSV column separators:

sample,bam,tsb
tumor_pair,/path/to/A.bam,"P-0012345-T01-IM7,P-0012345-T02-IM7"

Variant rows matched by more than one pattern are deduplicated automatically, so each appears only once.

Note

Samples with 0 matching variants are skipped and reported in pipeline_summary.tsv.
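The three matching modes above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the pipeline's implementation: `select_variants` is a hypothetical name, variants are represented as plain dicts, and patterns are applied with `re.search` so that a pattern matches anywhere in the barcode.

```python
import re


def select_variants(variants, sample, tsb=None):
    """Return the MAF rows whose Tumor_Sample_Barcode matches a samplesheet row.

    variants: list of dicts, each with a 'Tumor_Sample_Barcode' key.
    sample:   value of the samplesheet 'sample' column.
    tsb:      value of the 'tsb' column, or None/empty for exact matching.
    """
    if not tsb:
        # Default mode: the sample ID must equal the barcode exactly.
        return [v for v in variants if v["Tumor_Sample_Barcode"] == sample]

    # Assumption: commas only separate patterns, so a pattern that itself
    # contains a comma (such as a{1,2}) is not supported by this sketch.
    patterns = [re.compile(p.strip()) for p in tsb.split(",") if p.strip()]

    selected = []
    for v in variants:
        # any() stops at the first hit, so a row matched by several
        # patterns is still appended only once.
        if any(p.search(v["Tumor_Sample_Barcode"]) for p in patterns):
            selected.append(v)
    return selected
```

An empty result corresponds to the case in the note above, where the sample is skipped.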

Multi-BAM Type Merging

When --merge_counts is enabled, rows that have a bam_type value are grouped by sample ID, and each group is merged into a single output MAF with type-prefixed count columns.

sample,bam,bai,suffix,bam_type
sample1,/path/to/sample1.duplex.bam,,,duplex
sample1,/path/to/sample1.simplex.bam,,,simplex

When bam_type is set:

The --column-prefix is auto-derived from the type label (e.g., duplex_)
The output suffix is auto-derived as -{bam_type} (e.g., sample1-duplex.maf) unless an explicit suffix column value is provided
After per-BAM genotyping, the pipeline groups MAFs by sample and runs gbcms merge
Combined simplex_duplex_* columns are computed when both types are present

Suffix auto-derivation

When bam_type is set, you do not need to set suffix — it is automatically derived as -{bam_type}. Only set suffix explicitly if you need a custom value that differs from the bam_type label.
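The derivation rules above can be summarised in a short Python sketch. The function `resolve_output_naming` is hypothetical, and the fallback of `<sample>.maf` when neither suffix nor bam_type is set is an assumption made for the example, not something this page specifies.

```python
def resolve_output_naming(row):
    """Derive the output MAF name and count-column prefix for one row.

    row: dict with 'sample' and optional 'suffix' / 'bam_type' values.
    """
    sample = row["sample"]
    bam_type = row.get("bam_type") or None

    # An explicit suffix wins; otherwise fall back to -{bam_type}.
    suffix = row.get("suffix") or (f"-{bam_type}" if bam_type else "")

    # The column prefix is derived from the type label, e.g. "duplex_".
    column_prefix = f"{bam_type}_" if bam_type else ""

    return {"output": f"{sample}{suffix}.maf", "column_prefix": column_prefix}
```

For the row `sample1,/path/to/sample1.duplex.bam,,,duplex`, this gives `sample1-duplex.maf` with the prefix `duplex_`, matching the examples in the list above.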

Minimum 2 BAM types per sample

Merge requires at least 2 inputs per sample. Samples with only 1 BAM type skip the merge step and output the single MAF directly.
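To make the grouping rule concrete, here is a minimal Python sketch that decides which samples would be merged. The name `plan_merges` is hypothetical, and the real pipeline performs this grouping on MAF outputs inside the workflow rather than on samplesheet rows.

```python
from collections import defaultdict


def plan_merges(rows):
    """Split samples into those to merge and those passed through as-is.

    rows: list of dicts with 'sample' and optional 'bam_type' values.
    Returns (to_merge, passthrough), each mapping sample ID -> list of types.
    """
    by_sample = defaultdict(list)
    for row in rows:
        if row.get("bam_type"):  # rows without a type are not grouped
            by_sample[row["sample"]].append(row["bam_type"])

    # Merge needs at least two inputs per sample.
    to_merge = {s: t for s, t in by_sample.items() if len(t) >= 2}
    passthrough = {s: t for s, t in by_sample.items() if len(t) < 2}
    return to_merge, passthrough
```

A sample listed once with `bam_type` set to `duplex` lands in `passthrough`, mirroring the behaviour described above where the single MAF is output directly.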

See Merge CLI for details on the merge algorithm.

Parameters — All pipeline options
Examples — Common usage patterns
Troubleshooting — Common issues