Architecture
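py-gbcms uses a hybrid Python/Rust architecture. The Python layer handles the CLI, orchestration, and VCF/MAF input and output, while the performance-critical BAM read counting runs in a compiled Rust extension (gbcms._rs).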
System Overview
```mermaid
flowchart TB
    subgraph Python["🐍 Python Layer"]
        CLI["CLI<br/>cli.py"] --> Pipeline["Orchestration<br/>pipeline.py"]
        Pipeline --> Reader["Input Adapters<br/>VcfReader, MafReader"]
        Pipeline --> Writer["Output Writers<br/>VcfWriter, MafWriter"]
    end

    subgraph Rust["🦀 Rust Layer (gbcms._rs)"]
        Counter["count_bam<br/>counting.rs"] --> CIGAR[CIGAR Parser]
        Counter --> Stats["Strand Bias<br/>stats.rs"]
    end

    Pipeline -->|"PyO3"| Counter
    Counter -->|"BaseCounts"| Pipeline

    style Python fill:#3776ab,color:#fff
    style Rust fill:#dea584,color:#000
```
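To make the Rust layer's job more concrete, here is a minimal pure-Python sketch of per-read counting at a single-nucleotide variant: walk each read's CIGAR string to find the base aligned to the target position, then tally reference and alternate support per strand. It is not the counting.rs implementation. The names `parse_cigar`, `base_at`, `count_snv`, and `StrandCounts`, and the `(ref_start, cigar, seq, is_reverse)` read tuple, are hypothetical stand-ins for real BAM records.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional

# One CIGAR operation: a length followed by an op code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    """Split a CIGAR string like '2S5M' into [(2, 'S'), (5, 'M')]."""
    return [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]


def base_at(ref_start: int, cigar: str, seq: str, pos: int) -> Optional[str]:
    """Return the read base aligned to 0-based reference position `pos`.

    Returns None if the read does not cover `pos` or has a deletion/skip there.
    """
    ref, qry = ref_start, 0
    for length, op in parse_cigar(cigar):
        if op in "M=X":  # consumes reference and query
            if ref <= pos < ref + length:
                return seq[qry + (pos - ref)]
            ref += length
            qry += length
        elif op in "IS":  # consumes query only
            qry += length
        elif op in "DN":  # consumes reference only
            if ref <= pos < ref + length:
                return None
            ref += length
        # H and P consume neither reference nor query
    return None


@dataclass
class StrandCounts:
    """Hypothetical per-strand allele counts (not the real BaseCounts type)."""
    ref_fwd: int = 0
    ref_rev: int = 0
    alt_fwd: int = 0
    alt_rev: int = 0


def count_snv(
    reads: Iterable[tuple[int, str, str, bool]], pos: int, ref_base: str, alt_base: str
) -> StrandCounts:
    """Tally reads supporting REF or ALT at a 0-based SNV position, split by strand."""
    counts = StrandCounts()
    for ref_start, cigar, seq, is_reverse in reads:
        base = base_at(ref_start, cigar, seq, pos)
        if base == ref_base:
            if is_reverse:
                counts.ref_rev += 1
            else:
                counts.ref_fwd += 1
        elif base == alt_base:
            if is_reverse:
                counts.alt_rev += 1
            else:
                counts.alt_fwd += 1
    return counts
```

A production counter would also apply the read filters and quality thresholds from the configuration (such as `min_mapq` and `min_baseq`) before counting; this sketch omits them to keep the CIGAR walk in focus.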
Data Flow
```mermaid
flowchart LR
    subgraph Input
        VCF["VCF/MAF"]
        BAM[BAM Files]
        FASTA[Reference]
    end
    subgraph Process
        Load[Load Variants]
        Validate[Validate vs Ref]
        Count[Count Reads]
    end
    subgraph Output
        Result["VCF/MAF + Counts"]
    end
    VCF --> Load --> Validate
    FASTA --> Validate
    Validate --> Count
    BAM --> Count
    Count --> Result
```
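The load, validate, and count stages can be sketched in plain Python with in-memory stand-ins: a dictionary plays the role of the FASTA reference, and a caller-supplied function plays the role of the Rust counter. `Variant`, `ref_matches`, `run`, and `count_fn` are hypothetical names, and silently skipping variants whose REF allele disagrees with the reference is an assumption of this sketch; the real pipeline may report or flag them instead.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Variant:
    """Hypothetical minimal variant record; pos is 1-based, as in VCF/MAF."""
    chrom: str
    pos: int
    ref: str
    alt: str


def ref_matches(reference: dict[str, str], v: Variant) -> bool:
    """Validate vs Ref: does the REF allele match the reference sequence?"""
    seq = reference.get(v.chrom)
    if seq is None:
        return False  # unknown contig
    start = v.pos - 1          # 1-based -> 0-based
    end = start + len(v.ref)   # half-open end
    if start < 0 or end > len(seq):
        return False  # allele runs off the contig
    return seq[start:end].upper() == v.ref.upper()


def run(
    variants: list[Variant],
    reference: dict[str, str],
    count_fn: Callable[[Variant], int],
) -> list[tuple[Variant, int]]:
    """Load -> Validate -> Count, keeping only variants that pass validation."""
    results = []
    for v in variants:
        if not ref_matches(reference, v):
            continue  # assumption: mismatches are skipped in this sketch
        results.append((v, count_fn(v)))
    return results
```

Validating before counting means no BAM work is spent on variants whose coordinates or alleles are inconsistent with the reference.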
Coordinate System
Internally, all coordinates are normalized to 0-based, half-open intervals:
```mermaid
flowchart LR
    VCF["VCF (1-based)"] -->|"-1"| Internal["Internal (0-based)"]
    MAF["MAF (1-based)"] -->|"-1"| Internal
    Internal -->|"to Rust"| Rust["gbcms._rs"]
    Rust -->|"+1"| Output["Output (1-based)"]
```
| Format | System | Example |
|---|---|---|
| VCF/MAF input | 1-based | chr1:100 |
| Internal | 0-based | chr1:99 |
| Output | 1-based | chr1:100 |
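A minimal sketch of the conversions in the table, assuming an interval is described by a start and an end. The function names are hypothetical (the real normalization lives in core/kernel.py). The key fact is that a 1-based, fully closed interval [start, end] becomes the 0-based, half-open interval [start - 1, end): only the start shifts.

```python
def one_based_to_half_open(start: int, end: int) -> tuple[int, int]:
    """Convert 1-based closed [start, end] to 0-based half-open [start0, end0)."""
    return start - 1, end


def vcf_to_half_open(pos: int, ref: str) -> tuple[int, int]:
    """A VCF record at 1-based POS spans len(REF) reference bases."""
    return one_based_to_half_open(pos, pos + len(ref) - 1)


def half_open_to_one_based(start0: int, end0: int) -> tuple[int, int]:
    """Inverse conversion for output: [start0, end0) -> 1-based closed [start, end]."""
    return start0 + 1, end0
```

For example, `vcf_to_half_open(100, "A")` gives `(99, 100)`, matching the chr1:100 / chr1:99 rows above. The half-open form keeps lengths simple (`end0 - start0 == len(REF)`) and matches the 0-based positions stored in BAM records.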
Formulas
Variant Allele Frequency (VAF)
$$\mathrm{VAF} = \frac{AD}{AD + RD}$$

Where:
- AD = alternate allele read count
- RD = reference allele read count
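As a worked example, a site with AD = 25 alternate reads and RD = 75 reference reads has VAF = 25 / (25 + 75) = 0.25. The sketch below computes the same quantity; the function name is hypothetical, and since the behavior at zero depth is not specified here, returning 0.0 in that case is an assumption.

```python
def vaf(ad: int, rd: int) -> float:
    """Variant allele frequency: AD / (AD + RD).

    Returning 0.0 when there is no coverage is an assumption of this sketch.
    """
    depth = ad + rd
    if depth == 0:
        return 0.0
    return ad / depth
```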
Strand Bias (Fisher's Exact Test)
```text
     |  Forward   Reverse
-----+--------------------
 Ref |     a         b
 Alt |     c         d
-----+--------------------
```
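The p-value comes from Fisher's exact test on this 2×2 contingency table, where a and b are reference-allele reads on the forward and reverse strands, and c and d are the corresponding alternate-allele reads. A low p-value (< 0.05) indicates a potential strand-bias artifact.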
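For reference, here is a self-contained sketch of a two-sided Fisher's exact test on the table above, using only the standard library's `math.comb`. Whether stats.rs computes a one- or two-sided p-value is not stated here, so the two-sided choice is an assumption, as is the function name. The p-value sums the hypergeometric probabilities of every table with the same row and column totals that is no more likely than the observed one.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for [[a, b], [c, d]].

    Rows are Ref/Alt, columns are Forward/Reverse strand.
    """
    row1 = a + b   # reference-allele reads
    row2 = c + d   # alternate-allele reads
    col1 = a + c   # forward-strand reads
    n = row1 + row2
    if n == 0:
        return 1.0  # empty table: no evidence of bias
    denom = comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    total = 0.0
    for x in range(lo, hi + 1):
        p = prob(x)
        # Small relative tolerance so floating-point ties count as "as extreme"
        if p <= p_obs * (1 + 1e-7):
            total += p
    return min(1.0, total)
```

With a = 1, b = 9, c = 11, d = 3 (reference reads mostly on the reverse strand, alternate reads mostly on the forward strand), the p-value is about 0.0028, well below the 0.05 threshold.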
Module Structure
```text
src/gbcms/
├── cli.py            # Typer CLI
├── pipeline.py       # Orchestration
├── core/
│   └── kernel.py     # Coordinate normalization
├── io/
│   ├── input.py      # VcfReader, MafReader
│   └── output.py     # VcfWriter, MafWriter
├── models/
│   └── core.py       # Pydantic config
└── utils/
    └── logging.py    # Structured logging

rust/src/
├── lib.rs            # PyO3 module (_rs)
├── counting.rs       # BAM processing
├── stats.rs          # Fisher's exact test
└── types.rs          # Variant, BaseCounts
```
Configuration
All settings live in GbcmsConfig, a Pydantic model that groups them as follows:
```mermaid
flowchart TB
    GbcmsConfig --> OutputConfig[Output Settings]
    GbcmsConfig --> ReadFilters[Read Filters]
    GbcmsConfig --> QualityThresholds[Quality Thresholds]
    OutputConfig --> D1[output_dir, format, suffix]
    ReadFilters --> D2[exclude_secondary, exclude_duplicates]
    QualityThresholds --> D3[min_mapq, min_baseq]
```
See models/core.py for definitions.
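The grouping can be mirrored with nested models. This sketch uses standard-library dataclasses in place of Pydantic so it runs without dependencies; the field names come from the diagram, but the class name `GbcmsConfigSketch`, the nested attribute names `output`/`filters`/`quality`, the field types, the allowed formats, and the helper `passes_read_filters` are all assumptions. No defaults are given, because the real defaults are not stated here.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class OutputConfig:
    output_dir: Path
    format: str   # assumption: "vcf" or "maf"
    suffix: str

    def __post_init__(self) -> None:
        if self.format not in {"vcf", "maf"}:
            raise ValueError(f"unsupported output format: {self.format!r}")


@dataclass
class ReadFilters:
    exclude_secondary: bool
    exclude_duplicates: bool


@dataclass
class QualityThresholds:
    min_mapq: int
    min_baseq: int

    def __post_init__(self) -> None:
        # Hand-written stand-in for the validation a Pydantic model would declare.
        if self.min_mapq < 0 or self.min_baseq < 0:
            raise ValueError("quality thresholds must be non-negative")


@dataclass
class GbcmsConfigSketch:
    output: OutputConfig
    filters: ReadFilters
    quality: QualityThresholds


def passes_read_filters(
    cfg: GbcmsConfigSketch, mapq: int, is_secondary: bool, is_duplicate: bool
) -> bool:
    """Hypothetical helper: apply the read filters and MAPQ threshold to one read."""
    if cfg.filters.exclude_secondary and is_secondary:
        return False
    if cfg.filters.exclude_duplicates and is_duplicate:
        return False
    return mapq >= cfg.quality.min_mapq
```

Grouping related settings into small nested models keeps each concern (output, read filtering, quality) independently validated, which is the same structure the diagram shows for GbcmsConfig.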