Changelog¶
All notable changes to py-gbcms are documented here.
Full History
See GitHub Releases for complete release notes.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[2.3.0] - 2026-02-06¶
✨ Added¶
- Nextflow BAI Auto-Discovery: Checks `.bam.bai` and `.bai` extensions automatically
- Documentation Modernization: Hierarchical navigation, glightbox, panzoom, abbreviations
- Performance Benchmarks: cfDNA duplex sample metrics in documentation
- RHEL 8 Installation Guide: Conda-based source installation for legacy Linux
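The BAI auto-discovery entry above can be illustrated with a minimal Python sketch. The real check lives in the Nextflow workflow; the function name `find_bai` and the lookup order (`.bam.bai` first, then `.bai`) are assumptions made for this example.

```python
from pathlib import Path
from typing import Optional


def find_bai(bam: Path) -> Optional[Path]:
    """Return the first existing index file next to `bam`, or None.

    Hypothetical helper: mirrors the two extensions named in the changelog.
    """
    candidates = [
        bam.with_name(bam.name + ".bai"),  # sample.bam -> sample.bam.bai
        bam.with_suffix(".bai"),           # sample.bam -> sample.bai
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

A caller would typically stop with a clear error when `find_bai` returns `None`, rather than let a downstream tool fail on the missing index.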
🔄 Changed¶
- Dockerfile: Added `procps`, `bash`, OCI labels, `maturin[patchelf]`, selective COPY
- Nextflow Config: `--platform linux/amd64`, shell config, local profile, observability (trace/report/timeline/dag)
- MkDocs: Switched to `navigation.sections`, 20+ abbreviations with hover tooltips
- GitHub Actions: Consolidated deploy-docs workflows, added caching and PR validation
- CI Wheels: Migrated from `manylinux_2_28` to `manylinux_2_34` (AlmaLinux 9 with OpenSSL 3.0+)
🔧 Fixed¶
- Nextflow: Empty `--suffix` argument no longer causes failures
- Admonitions: Converted GitHub-style alerts to MkDocs syntax
- CI Build: Resolved `curl-sys` OpenSSL version conflict by switching to `manylinux_2_34`
[2.2.0] - 2026-02-04¶
✨ Added¶
- Multi-platform Wheel Publishing: Maturin-based CI builds for Linux (x86_64, aarch64), macOS (Intel, Apple Silicon), and Windows
- Structured Logging: New `utils/logging.py` module with Rich console output, timing utilities, and log file support
- Mermaid Diagrams: Architecture documentation with interactive flowcharts
- Release Guide: Comprehensive `docs/RELEASE.md` with git-flow workflow
🔄 Changed¶
- Folder Restructure: Moved Rust code to `rust/` (bundled as `gbcms._rs`)
- Config Hierarchy: Nested Pydantic models (`ReadFilters`, `QualityThresholds`, `OutputConfig`) for better organization
- Code Quality: Added `__all__` exports, docstrings, and type hints across all modules
- StrEnum: Modern enum pattern with Python 3.10 backport
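For readers unfamiliar with the StrEnum pattern mentioned above: `enum.StrEnum` only exists in the standard library from Python 3.11, so a project supporting 3.10 needs a fallback. The sketch below shows one common way to write it. It is not taken from the py-gbcms source, and `OutputFormat` is a hypothetical enum used only for illustration.

```python
import sys
from enum import Enum

if sys.version_info >= (3, 11):
    from enum import StrEnum
else:
    class StrEnum(str, Enum):
        """Minimal backport: members are also plain str instances."""

        def __str__(self) -> str:
            return str(self.value)


class OutputFormat(StrEnum):  # hypothetical enum, for illustration only
    VCF = "vcf"
    MAF = "maf"
```

Because members are real strings, `OutputFormat.VCF == "vcf"` holds, and `OutputFormat("maf")` looks a member up from raw user input.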
📚 Documentation¶
- New `docs/ARCHITECTURE.md` with system diagrams
- New `docs/DEVELOPMENT.md` (developer guide)
- New `docs/TESTING.md` (testing guide)
- Updated MkDocs with mermaid2 plugin and snippet includes
[2.1.2] - 2025-11-25¶
🔧 Fixed¶
- PyPI Distribution: Fixed source distribution size issue by correctly excluding large files (tests, docs, etc.) via `pyproject.toml` configuration.
[2.1.1] - 2025-11-25 [YANKED]¶
Yanked Release
This release was yanked from PyPI due to a source distribution size limit error. Use 2.1.2 instead.
🔧 Fixed¶
- PyPI Distribution: Added `MANIFEST.in` to reduce source distribution size (this failed to work with Hatchling)
- Documentation: Added comprehensive Installation guide
- Documentation: Unified Contributing guide (merged code + docs contributions)
- Documentation: Added Changelog to documentation navigation
[2.1.0] - 2025-11-25¶
✨ Added¶
Nextflow Workflow¶
- Production-ready Nextflow workflow for processing multiple samples in parallel
- SLURM cluster support with customizable queue configuration
- Per-sample suffix support via optional `suffix` column in samplesheet
- Docker and Singularity profiles for containerized execution
- Automatic BAI index discovery with validation
- Resume capability for failed workflow runs
- Resource management with automatic retry and scaling
- Comprehensive documentation in `docs/NEXTFLOW.md` and `nextflow/README.md`
Documentation¶
- Usage pattern comparison guide (`docs/WORKFLOWS.md`) for choosing between CLI and Nextflow
- MkDocs integration for beautiful GitHub Pages documentation
- Local documentation preview with live reload (`mkdocs serve`)
- Staging deployment from `develop` branch for testing docs
- Production deployment from `main` branch
- Reorganized documentation structure with clear CLI vs Nextflow separation
- CLI Quick Start guide (`docs/quick-start.md`)
🔧 Changed¶
- Documentation workflow: docs now live on `main` branch with automated deployment
- GitBook integration: configured to read from `main` branch
- Nextflow module: improved parameter passing with `meta.suffix` support
📝 Documentation¶
- Complete Nextflow workflow guide with SLURM examples
- Per-sample suffix usage examples
- Git-flow documentation workflow guide
- Local preview instructions
- Updated README with clear usage pattern separation
[2.0.0] - 2025-11-21¶
🚀 Major Rewrite¶
Version 2.0.0 represents a complete rewrite of py-gbcms with a focus on performance, correctness, and modern architecture.
✨ Added¶
Core Features¶
- Rust-based Counting Engine: Hybrid Python/Rust architecture for 20x+ performance improvement
- Strand Bias Statistics: Fisher's exact test p-values and odds ratios for both reads (`SB_PVAL`, `SB_OR`) and fragments (`FSB_PVAL`, `FSB_OR`)
- Fragment-Level Counting: Majority-rule fragment counting with strand-specific counts (`RDF`, `ADF`)
- Variant Allele Fractions: Read-level (`VAF`) and fragment-level (`FAF`) allele fraction calculations
- Thread Control: Explicit control over parallelism via `--threads` argument (default: 1)
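To make the strand bias statistics concrete, here is a pure-Python sketch of a two-sided Fisher's exact test and odds ratio on a 2x2 table of strand counts. The table layout (rows = ref/alt, columns = forward/reverse), the function names, and the handling of zero cells are assumptions for this example. The actual engine computes these values in Rust and may differ in detail.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided p-value for the table [[a, b], [c, d]].

    Assumed layout: a=ref forward, b=ref reverse, c=alt forward, d=alt reverse.
    """
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum every table at least as extreme (no more probable) than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(p, 1.0)


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Sample odds ratio (a*d)/(b*c); inf or nan when the denominator is zero."""
    if b * c == 0:
        return float("inf") if a * d > 0 else float("nan")
    return (a * d) / (b * c)
```

A balanced table such as `fisher_exact_two_sided(5, 5, 5, 5)` gives a p-value of 1, while a table where ref reads sit on one strand and alt reads on the other gives a very small p-value.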
Input/Output¶
- VCF Output Format: Standard VCF with comprehensive INFO and FORMAT fields
- MAF Output Format: Extended MAF with custom columns for strand counts and statistics
- Column Preservation: Input MAF columns are preserved in output
- Multiple BAM Support: Process multiple samples via `--bam-list` or repeated `--bam` arguments
- Sample ID Override: Explicit sample naming via `--bam sample_id:path` syntax
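The `sample_id:path` syntax can be parsed along the following lines. This is a sketch, not the project's parser: `parse_bam_arg` is a hypothetical name, and the fallback to the file stem when no ID is given is an assumption.

```python
from pathlib import Path
from typing import Tuple


def parse_bam_arg(value: str) -> Tuple[str, Path]:
    """Split a `--bam` value into (sample_id, path). Hypothetical helper."""
    sample_id, sep, path = value.partition(":")
    if sep and sample_id and path:
        return sample_id, Path(path)
    # No explicit ID: fall back to the file name without its extension (assumption).
    bam = Path(value)
    return bam.stem, bam
```

Splitting on the first colon is the simplest rule, but it would misread Windows drive letters such as `C:\data\x.bam`; a real parser would need to account for that.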
Filters¶
- `--filter-duplicates`: Filter duplicate reads (default: enabled)
- `--filter-secondary`: Filter secondary alignments
- `--filter-supplementary`: Filter supplementary alignments
- `--filter-qc-failed`: Filter reads that failed QC
- `--filter-improper-pair`: Filter improperly paired reads
- `--filter-indel`: Filter reads with indels in CIGAR
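Most of these filters correspond to bits in the SAM flag field. The sketch below shows how such a check could look. The bit values come from the SAM specification, while `FilterOptions`, `is_filtered`, and the definition of "improper pair" (paired but not flagged as a proper pair) are assumptions for illustration. The indel filter is omitted because it inspects the CIGAR string rather than the flag.

```python
from dataclasses import dataclass

# Flag bits as defined in the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
SECONDARY = 0x100
QC_FAILED = 0x200
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


@dataclass(frozen=True)
class FilterOptions:  # hypothetical stand-in for the CLI filter switches
    duplicates: bool = True  # only duplicate filtering is on by default
    secondary: bool = False
    supplementary: bool = False
    qc_failed: bool = False
    improper_pair: bool = False


def is_filtered(flag: int, opts: FilterOptions) -> bool:
    """Return True if a read with this SAM flag should be skipped."""
    if opts.duplicates and (flag & DUPLICATE):
        return True
    if opts.secondary and (flag & SECONDARY):
        return True
    if opts.supplementary and (flag & SUPPLEMENTARY):
        return True
    if opts.qc_failed and (flag & QC_FAILED):
        return True
    # Assumption: "improper" means paired but not marked as a proper pair.
    if opts.improper_pair and (flag & PAIRED) and not (flag & PROPER_PAIR):
        return True
    return False
```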
CLI & Usability¶
- Modern CLI: Built with Typer and Rich for beautiful terminal output
- Progress Tracking: Real-time progress bars and status indicators
- Direct Invocation: Use `gbcms run` instead of `python -m gbcms.cli`
- Output Customization: `--suffix` flag for output filename customization
- Flexible Input: Support for both VCF and MAF input formats
Infrastructure¶
- Docker Support: Production-ready multi-stage Dockerfile with optimized layers
- Type Safety: Full type annotations with mypy support
- Type Stubs: Provided `.pyi` stub file for Rust extension
- Comprehensive Tests: Extended test suite with accuracy and filter validation
- CI/CD: GitHub Actions workflows for testing, linting, and releases
🔄 Changed¶
Architecture¶
- Migrated from pure Python to hybrid Python/Rust architecture
- Core counting logic implemented in Rust using `rust-htslib`
- Data parallelism over variants with per-thread BAM readers
Output Formats¶
- VCF FORMAT fields: Strand-specific counts now use comma-separated values (e.g., `RD=5,3` for forward, reverse)
- MAF columns: Standardized column names (`t_ref_count_forward`, `t_alt_count_reverse`, etc.)
- Coordinate System: Internal 0-based indexing with correct conversion for VCF (1-based) and MAF output
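As a small illustration of the two conventions above, the sketch below converts between 0-based internal positions and 1-based VCF positions, and splits a comma-separated strand count such as `5,3`. The helper names are hypothetical and the code is not taken from py-gbcms.

```python
from typing import Tuple


def to_vcf_pos(pos0: int) -> int:
    """Convert a 0-based internal position to a 1-based VCF POS."""
    return pos0 + 1


def from_vcf_pos(pos1: int) -> int:
    """Convert a 1-based VCF POS to a 0-based internal position."""
    return pos1 - 1


def parse_strand_counts(value: str) -> Tuple[int, int]:
    """Split a 'forward,reverse' value such as '5,3' into two integers."""
    forward, reverse = (int(part) for part in value.split(","))
    return forward, reverse
```

With the example from the list, `parse_strand_counts("5,3")` yields 5 forward and 3 reverse reads, so the total depth for that allele is their sum.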
Performance¶
- Speed: 20x+ faster than v1.x on typical datasets
- Memory: Efficient per-thread BAM readers with minimal overhead
- Scalability: Configurable thread pool for optimal resource usage
Dependencies¶
- Python: Updated to require Python ≥3.10
- Rust: pyo3 0.27.1, rust-htslib 0.51.0, statrs 0.18.0
- Python Packages: pysam ≥0.21.0, typer ≥0.9.0, rich ≥13.0.0, pydantic ≥2.0.0
🗑️ Removed¶
- Legacy Python Counting: Pure Python implementation removed in favor of Rust
- Old CLI: Deprecated `python -m gbcms.cli` entry point
- Unused Dependencies: Removed `cyvcf2` and `numba` (no longer needed)
- Pre-commit Hooks: Removed in favor of explicit linting in CI
🐛 Fixed¶
- Correct handling of complex variants (MNPs, DelIns)
- Proper strand assignment for fragment counting
- Reference validation against FASTA for all variant types
- Thread-safe BAM access with per-thread readers
📚 Documentation¶
- Complete rewrite of all documentation
- New guides: `INSTALLATION.md`, `CLI_FEATURES.md`, `INPUT_OUTPUT.md`
- Comprehensive API documentation
- Docker usage examples
- Contributing guidelines updated
🔧 Technical Details¶
Rust Components¶
- `gbcms._rs`: PyO3-based Rust extension (bundled in wheel)
- Fisher's exact test via `statrs` crate
- Rayon-based parallelism with configurable thread pools
- Safe memory management with Rust's ownership model
Testing¶
- 16 comprehensive test cases
- Accuracy validation with synthetic BAM files
- Filter validation for all read flag combinations
- Integration tests with real-world data
⚠️ Breaking Changes¶
Version 2.0.0 is not backward compatible with 1.x. Key breaking changes:
- CLI syntax: Use `gbcms run` instead of `python -m gbcms.cli`
- Output format: VCF/MAF column structures have changed
- Default behavior: Only duplicate filtering enabled by default (was: all filters)
- Dependencies: Requires Rust toolchain for installation from source
- Python version: Minimum Python 3.10 (was: 3.8)
📦 Installation¶
# From PyPI (includes pre-built wheels)
pip install py-gbcms
# From source (requires Rust)
pip install git+https://github.com/msk-access/py-gbcms.git
# Docker
docker pull ghcr.io/msk-access/py-gbcms:2.0.0
🙏 Acknowledgments¶
This rewrite was designed and implemented with a focus on correctness, performance, and modern best practices in bioinformatics software development.
[1.x] - Legacy¶
Previous versions (1.x) used a pure Python implementation. See git history for details.