Choose Your Mode
gbcms supports two sequencing contexts. Pick the one that matches your data — the rest of the setup follows from there.
flowchart TD
Start(["What sequencing data?"]):::start
Start -->|"cfDNA · IMPACT · WGS\nWES · Panel"| DNA(["gbcms dna"]):::dna
Start -->|"STAR-aligned RNA-seq\ndUTP stranded"| RNA(["gbcms rna"]):::rna
Start -->|"Unstranded RNA-seq\n(random orientation)"| RNAL(["gbcms rna\n--no-strandedness"]):::rnal
classDef start fill:#9b59b6,color:#fff,stroke:#7d3c98,stroke-width:2px;
classDef dna fill:#27ae60,color:#fff,stroke:#1e8449,stroke-width:2px;
classDef rna fill:#3498db,color:#fff,stroke:#2471a3,stroke-width:2px;
classDef rnal fill:#2471a3,color:#fff,stroke:#1a5276,stroke-width:2px;
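The decision in the flowchart above can be sketched as a small lookup. This is only an illustration: `ModeChoice`, `choose_mode`, and the assay keywords are hypothetical names invented for this example, not part of gbcms. The sketch builds only the subcommand prefix, using the `gbcms dna`, `gbcms rna`, and `--no-strandedness` names from the diagram, and leaves out input and output options because they are not described on this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModeChoice:
    """Hypothetical container for the subcommand picked by the flowchart."""

    subcommand: str                      # "dna" or "rna"
    extra_flags: tuple[str, ...] = ()    # e.g. ("--no-strandedness",)

    def argv(self) -> list[str]:
        # Only the command prefix; file arguments are intentionally omitted.
        return ["gbcms", self.subcommand, *self.extra_flags]


# Assay keywords taken from the DNA branch of the flowchart (spelling is an assumption).
DNA_ASSAYS = {"cfdna", "impact", "wgs", "wes", "panel"}


def choose_mode(assay: str, stranded: bool = True) -> ModeChoice:
    """Map a data type to a mode, following the three flowchart branches."""
    key = assay.strip().lower()
    if key in DNA_ASSAYS:
        return ModeChoice("dna")
    if key == "rna-seq":
        if stranded:
            return ModeChoice("rna")
        return ModeChoice("rna", ("--no-strandedness",))
    raise ValueError(f"unrecognised assay: {assay!r}")


if __name__ == "__main__":
    print(" ".join(choose_mode("WGS").argv()))
    print(" ".join(choose_mode("rna-seq", stranded=False).argv()))
```

Keeping the choice in one function makes it easy to drive both modes from a sample sheet that records the assay type per sample.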
DNA mode (`gbcms dna`)
Use for: cfDNA (MSK-ACCESS, IMPACT), WGS, WES, targeted gene panels
Key capabilities:
- Windowed indel detection with 3-layer safeguards (±5bp, adaptively extended in repeats)
- Mutant Fragment Size Distribution (`--mfsd`) for short-fragment enrichment analysis
- UMI-aware fragment deduplication (`--umi-tag`)
- Multi-allelic sibling exclusion to prevent REF inflation at complex loci
Defaults: MAPQ 20 · base quality 20 · duplicates filtered · PairHMM standard gap penalties
RNA mode (`gbcms rna`)
Use for: STAR-aligned RNA-seq (dUTP stranded or unstranded)
Key capabilities:
- NH:i:1 MAPQ rescue for novel splice junction reads
- dUTP strandedness filtering (disable with `--no-strandedness` for unstranded libraries)
- A-to-I RNA editing site flagging via REDIportal (`--rna-editing-db`)
- Splice junction tracking (`rna_splice_spanning_count`)
Defaults: MAPQ 1 · base quality 20 · secondary/supplementary/QC-failed reads filtered · PairHMM relaxed RT gap penalties
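The defaults listed for the two modes can be read as simple per-read and per-base checks. The sketch below is an illustration under stated assumptions, not gbcms's implementation: `ReadFilter`, `DNA_DEFAULTS`, and `RNA_DEFAULTS` are hypothetical names, thresholds are assumed to be inclusive, and only the filters named on this page are encoded. The NH:i:1 rescue and the PairHMM gap penalties are not modelled. The flag bit values come from the SAM specification.

```python
from dataclasses import dataclass

# SAM FLAG bits as defined in the SAM specification.
FLAG_SECONDARY = 0x100
FLAG_QC_FAIL = 0x200
FLAG_DUPLICATE = 0x400
FLAG_SUPPLEMENTARY = 0x800


@dataclass(frozen=True)
class ReadFilter:
    """Hypothetical bundle of the per-mode defaults listed on this page."""

    min_mapq: int
    min_baseq: int
    excluded_flags: int  # bitwise OR of SAM FLAG bits that reject a read

    def keeps_read(self, flag: int, mapq: int) -> bool:
        # Assumption: thresholds are inclusive (mapq >= min_mapq passes).
        return mapq >= self.min_mapq and not (flag & self.excluded_flags)

    def keeps_base(self, baseq: int) -> bool:
        return baseq >= self.min_baseq


# DNA: MAPQ 20, base quality 20, duplicates filtered.
DNA_DEFAULTS = ReadFilter(min_mapq=20, min_baseq=20, excluded_flags=FLAG_DUPLICATE)

# RNA: MAPQ 1, base quality 20, secondary/supplementary/QC-failed filtered.
RNA_DEFAULTS = ReadFilter(
    min_mapq=1,
    min_baseq=20,
    excluded_flags=FLAG_SECONDARY | FLAG_SUPPLEMENTARY | FLAG_QC_FAIL,
)

if __name__ == "__main__":
    # A primary read with MAPQ 3 passes the RNA threshold but not the DNA one.
    print(RNA_DEFAULTS.keeps_read(flag=0, mapq=3), DNA_DEFAULTS.keeps_read(flag=0, mapq=3))
```

The main practical difference this makes visible is the MAPQ threshold: low-MAPQ reads that RNA mode keeps would be dropped under the DNA defaults.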
Prerequisites
| Requirement | DNA | RNA |
|---|---|---|
| Python 3.10+ | ✅ | ✅ |
| BAM file with `.bai` index | ✅ | ✅ |
| Reference FASTA with `.fai` index | ✅ | ✅ |
| VCF or MAF with variant positions | ✅ | ✅ |
| STAR-aligned BAM (NH tag) | — | ✅ |
| Gene strand annotation in MAF (`gene_strand` column) | — | Recommended |
| REDIportal TABLE1 file | — | Optional |
BAM Index
If your BAM lacks an index, create one with `samtools index sample.bam`.
FASTA Index
If your FASTA lacks an index, create one with `samtools faidx reference.fa`.
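The two index checks can be automated before a run. The following is a minimal sketch using only the standard library. The helper names are hypothetical, and it assumes the usual samtools naming: `sample.bam.bai` (or the also common `sample.bai`) next to the BAM, and `reference.fa.fai` next to the FASTA. Whether gbcms accepts both BAM index spellings is an assumption. The sketch only reports the commands to run and does not execute them.

```python
from pathlib import Path


def bam_index_candidates(bam: Path) -> list[Path]:
    """Return the index paths conventionally used for a BAM file."""
    # sample.bam -> sample.bam.bai (samtools default) and sample.bai
    return [bam.with_name(bam.name + ".bai"), bam.with_suffix(".bai")]


def fasta_index_path(fasta: Path) -> Path:
    """Return the .fai path that `samtools faidx` writes next to the FASTA."""
    return fasta.with_name(fasta.name + ".fai")


def missing_index_commands(bam: Path, fasta: Path) -> list[str]:
    """List the samtools commands needed to create any missing index."""
    commands = []
    if not any(p.exists() for p in bam_index_candidates(bam)):
        commands.append(f"samtools index {bam}")
    if not fasta_index_path(fasta).exists():
        commands.append(f"samtools faidx {fasta}")
    return commands


if __name__ == "__main__":
    for cmd in missing_index_commands(Path("sample.bam"), Path("reference.fa")):
        print(cmd)
```

An empty result means both indexes were found and the inputs satisfy the indexing rows of the prerequisites table.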
Related
- Installation — Install via PyPI, Docker, or from source
- Quick Start — Run your first counting job (DNA and RNA examples)
- CLI Reference — DNA — Full option reference for `gbcms dna`
- Nextflow Pipeline — For processing many samples in parallel on HPC