Changelog
All notable changes to this project will be documented in this file.
[0.8.2] - 2026-03-26
Fixed
- Region MDS Output Collision: Fixed
Path.with_suffix()compound-extension bug that causedMDS.exon.tsvandMDS.gene.tsvto both resolve toMDS.tsv, silently destroying per-exon and per-gene data. Affects standaloneregion-mds,motif, andrun-allcommands. - Motif Tracking Paths: Fixed 5 additional
with_suffix()bugs in motif output tracking (EndMotif, BreakPointMotif, MDS, MDS.ontarget, EndMotif1mer) that caused path collisions and silent PON z-score normalization failures. - Gene Format Conversion: Region MDS gene output now correctly undergoes format conversion (Parquet/gzip) when no PON is provided.
- MDS Z-score Logging: Z-score append failures upgraded from
debugtowarninglevel to surface issues in production runs.
Added
- Compound Extension Tests: 57 new parametrized tests (
test_compound_extension.py) covering all 13 compound base names across TSV, Parquet, both formats, and roundtrip. - Developer Guide: "Known Gotchas" section documenting the
Path.with_suffix()anti- pattern with safe alternative and complete table of affected names.
Changed
- Nextflow Outputs: Added missing Parquet emit declarations for TFBS sync (2), ATAC sync (2), ATAC/TFBS ontarget sync (2), and fsc_counts (4 — TSV+Parquet, off/on-target).
- SLURM Script: Tuned head process memory (16G→32G), queue size (100→200), added
--output_format both --compress_tsv true --verbose truefor 14K+ sample production runs.
[0.8.1] - 2026-03-24
Fixed
- WPS Panel GC Correction: Panel WPS now uses on-target GC correction factors instead of off-target. Panel anchors overlap capture regions, making on-target correction more accurate. Falls back to off-target factors when on-target unavailable.
- Rust WPS Background: Fixed
coitreesmetadata type mismatch in WPS background consumer (wps.rs:1684) — uses.clone()for cross-platform compatibility (returns&usizeon macOS butusizeon CI Linux).
Changed
- Dead Code Cleanup: Removed 11 unused Python z-score functions from
pon_integration.py(all replaced by Rust equivalents). Module reduced from 448 to 98 lines, 14 to 3 functions. Remaining:load_pon_model,compute_nrl_zscore,compute_periodicity_zscore. - GC Factor Resolution:
gc_str/gc_ontarget_strpath resolution hoisted to a shared section inunified_processor.py, eliminating duplication between panel WPS and TFBS/ATAC.
Added
- build-pon Logging: OCF on-target/off-target baseline status now logged during PON build.
- sbatch Script:
scripts/build_pon_unfiltered.shfor building PON from high-coverage unfiltered BAMs on SLURM clusters.
Data
- PON Models Updated: Rebuilt xs1/xs2 PON models for both
all_uniqueandduplexvariants with krewlyzer 0.8.x.
[0.8.0] - 2026-03-17
Added
- On-target PON Z-scores: Panel mode now computes on-target/off-target PON baselines for MDS, OCF, and FSD features, providing clinically-scoped z-scores in panel assays.
- FSR On-target Output: FSR now emits a separate
.FSR.ontarget.tsvfile in panel mode. - FSR Real Genomic Coordinates: Region labels now reflect true genomic window coordinates instead of internal indices.
Fixed
- Output Format / Compress: All output files now correctly respect
--output-formatand--compressflags (gzip compression, path handling, GC correction factors loading). - Rust GC Correction: Added missing
PathBufimport; fixed path handling for correction factor files. - Rust Clippy: Replaced manual string strip with
strip_suffix(manual_striplint).
Documentation
- Corrected stale FSR column names in
concepts.mdandjson-output.md. - Updated FSR on-target output docs and window/panel-mode descriptions.
- Documented
_core.pyistub maintenance requirements.
CI
- GitHub Actions Node.js 24: Bumped
actions/checkoutv4→v5,actions/setup-pythonv5→v6,actions/cachev4→v5 to address Node.js 20 deprecation (enforced June 2, 2026).
[0.7.0] - 2026-03-02
Added
- Configurable Output Formats:
--output-format tsv|parquet|both(default:tsv) controls all tabular feature outputs.--compressgzip-compresses TSV outputs (.tsv.gz). WPS outputs remain always-Parquet regardless of setting.
Fixed
- build-pon Intermediate Files: Explicitly force
output_format="tsv"andcompress=Falsein allprocess_sample()calls withinbuild-pon(both parallel and sequential paths). Prevents silent failure if default output format changes — intermediate files are internal scratch consumed bypd.read_csv(sep="\t"). - Feature Serializer: Include
mds_zin JSON output for thefrom_outputs()code path. - OCF Base File:
OCF.tsv= all reads (on + off combined), not off-target.OCF.offtarget.tsvis the true panel off-target score. Corrected in docs and code comments. - Rust wps.rs: Remove erroneous
*dereference onnode.metadata(E0614 —usizeis Copy). - Rust gc_correction.rs: Prefix unused
valid_regions_pathparam with_to silence compiler warning; parameter retained for API symmetry. - MkDocs Snippets: Fix
--8<-- "CHANGELOG.md"/--8<-- "CONTRIBUTING.md"broken includes by changingpymdownx.snippets.base_pathfrom scalardocsto list['.', 'docs']so repo-root files resolve without path traversal blocked by CI deploy sandbox.
Documentation
- Post-0.6.0 Docs Audit (12 files, 7 issue categories):
- Fixed broken
--8<-- "../CHANGELOG.md"/--8<-- "../CONTRIBUTING.md"snippet includes - Corrected outdated "no global
--output-formatflag" note incli/run-all.md - Updated
metadata.json→metadata.tsvacross 7 files (8 references total) - Added WPS always-Parquet exception note to
reference/output-files.md - Added build-pon intermediate TSV format note to
guides/building-pon.md - Updated test count (248 → 244 + 4 skipped) and added CI lint steps to
developer-guide.md - Added
--output_formatand--compressparameters tonextflow/parameters.md - Output Format Options section: New section in
reference/output-files.mddocumenting--output-format,--compress, WPS always-Parquet exception, and--generate-json - OCF Variant Clarification: Added 3-variant table and note block in
reference/output-files.mdexplainingOCF.tsv(all reads) vsOCF.ontarget.tsvvsOCF.offtarget.tsv - docs/index.md: Replaced
:latestDocker tag with explicit:0.7.0per release policy
CI
- Lint Job: Added parallel
lintjob (ruff · black · mypy · cargo clippy -- -D warnings) running alongside tests on all push/PR events
[0.6.0] - 2026-02-28
Added
- mFSD Base Quality Filtering:
--min-baseq/-Q(default 20) gates variant evidence by base quality - mFSD GC Correction: Rust-native LOESS GC bias correction for variant fragment size distributions
- mFSD Duplex Weighting: Proper consensus fragment handling via
--duplex - Region MDS
--sample-name: Consistent output naming without post-hoc rename - Feature Serializer: Auto-load
fsc_counts,region_mds,uxminfrom_outputs() - IRIS Batch Submission:
scripts/run_krewlyzer_iris.shfor SLURM/IRIS cluster runs with--generate_json - nf-core Institutional Configs:
custom_config_baseparam and IRIS profile support - Versioned Documentation: Implemented
mikefor dev/stable doc versions - Nextflow mfsd Module: Full standalone params (
--reference,--correction-factors,--mapq,--minlen,--maxlen,--min-baseq,--duplex,--no-skip-duplicates) - Nextflow runall:
fsc_counts.tsvoutput declaration,--min-baseqwired - mFSD Integration Tests: 161 lines of new test coverage
Fixed
- mFSD MAF Parsing: Header-based column lookup (fixes column-index mismatch with different MAF flavors)
- Nextflow FILTER_MAF: Complete overhaul — eliminated join operator blocking, replaced regex with substring match, fixed SyntaxError in versions.yml, dynamic maxForks for SLURM
- Nextflow Workflow Streaming: Fixed RUNALL blocking from
remainder:true,failOnMismatch, channel round-robin; usedmultiMapfor proper routing - Nextflow RUNALL Outputs: Added 14 output declarations, fixed BreakPointMotif casing, explicit publishDir
- Region MDS Nextflow:
--sample-namereplacesmvworkaround - Nextflow Config: Executor queueSize placement,
-qsCLI flag, global publishDir removal - WPS CLI Tests: Fixed
--inputflag (was positional arg) — recovered 2 skipped tests - Pandas FutureWarning: Fixed
pd.concatwith all-NA columns in PON test fixture
Changed
- Code Quality: Black formatted 71 files, ruff fixed 129 lint errors, cargo clippy applied
- Ruff Config: Added
[tool.ruff]topyproject.tomlwith documented E402/F821 ignores - Agent Config:
.agent/→.agent/rules/withalwaysApplyfrontmatter
Documentation
- 45-item Audit: Corrections across 25 doc files including
.csv→.tsv(7 files),.WPS.tsv.gz→.parquet(3 files), phantom--output-formatremoved, Docker versions→X.Y.Z, parameters.md 12→28, outputs.md 14→41, JSON schema corrected, developer guide Rust table 10→19, architecture pipeline signature updated - PDF Embedding: Fixed rendering with mkdocs-pdf plugin
[0.5.3] - 2026-02-06
Documentation
- Admonition Formatting: Converted blockquotes to proper MkDocs admonitions across 10+ docs files for consistent styling
- Table Rendering: Fixed tables not rendering by adding required blank lines before table headers
- LaTeX Formulas: Fixed underscore escaping in math blocks (mFSD, FSR, WPS formulas)
- Abbreviations: Added glossary with auto-append for consistent acronym tooltips (cfDNA, WPS, etc.)
- Docs CI: Added PR validation trigger with
mkdocs build --strictto catch issues before deployment
[0.5.2] - 2026-02-05
Added
- Dual BAM Support (mFSD): New
--mfsd-bamparameter inrun-allto use a dedicated duplex BAM for mFSD while other features use the main all_unique BAM - Auto-duplex detection from filename patterns (
-duplex,_duplex,.duplex) - Startup banner shows mFSD BAM path when specified
- Nextflow: Added
mfsd_bamcolumn to samplesheet schema
Fixed
- Correction Factors Loading: Fixed delimiter detection for
.correction_factors.tsvfiles (CSV format with TSV extension)
Documentation
- Release Guide: Added version format note (use
0.5.2notv0.5.2in code) - mFSD docs: Updated with dual BAM workflow examples
- Samplesheet docs: Added
mfsd_bamcolumn documentation - Footer: Added site footer with author attribution and Antigravity acknowledgment
- Theme: Changed color scheme to blood-red (#ef5552) to match "krew" (blood) branding
[0.5.1] - 2026-02-04
Fixed
- Docker Image: Data folder now bundled in container (was missing in 0.5.0)
- CI Tests: Tests requiring bundled data now skip in PyPI installs via
@requires_datamarker
Changed
- Docker Tags: Use versioned tags only (e.g.,
:0.5.1); no:latesttag published
Documentation
- Installation: Added Singularity/Apptainer section for HPC clusters
- Installation: Clarified Docker uses release tags, not
:latest - Nextflow Examples: Added IRIS HPC cluster profile example (
-profile iris) - Testing Guide: Added Data Availability section explaining PyPI vs source
- Structure: Moved Changelog to Development section; removed duplicate files
- Mermaid Diagrams: Upgraded to
mkdocs-mermaid2plugin for theme-aware rendering - Removed: Full-site PDF export (unreliable rendering)
[0.5.0] - 2026-02-02
Added
PON Framework
build-poncommand: Generate PON models from cohort samples with FSD/WPS/OCF/MDS baselines- Bundled PON assets: Pre-computed xs1/xs2 PON models for
all_uniqueandduplexvariants --pon-variantflag: Select betweenall_unique(max coverage) orduplex(highest accuracy) PON models- Duplex tag warning: mFSD warns when
--duplexis used but no cD tags found in BAM
Panel Mode (MSK-ACCESS)
- Assay-aware asset resolution: Auto-load targets, anchors, PON via
-A/--assay xs1|xs2 - On/Off-target splitting: All tools produce separate
.ontarget.tsvoutputs - Gene-level FSC: New
{sample}.FSC.gene.tsvand{sample}.FSC.regions.e1only.tsv - Bait edge masking: WPS
--bait-paddingto remove capture artifacts --skip-target-regions: Force WGS mode for panel assays
Duplex Sequencing
--duplexflag (mFSD): Enable family size (cD tag) weighting for duplex BAMs- LLR scoring: Log-likelihood ratio classification for cross-species support
- GC-weighted mFSD: 5 new GC-corrected mean fragment size columns
Region-Based Analysis
- Region Entropy: TFBS/ATAC dual-output architecture with per-cluster entropy
- Region MDS: Per-gene Motif Diversity Score with E1-only filtering
- Rust backend: New
region_entropy.rsfor high-performance calculation
Feature Enhancements
- Jagged Index: 1-mer End Motif analysis with C-end fraction
- WPS v2.0: Hierarchical stacking, extended anchors, panel-specific normalization
- Output formats:
--formatflag (tsv/parquet/json) for all tools --generate-json: Unified JSON export with all features
Infrastructure
- Git LFS: Large files (.gz, .parquet, .bed) tracked via git-lfs
- BGZF compression: All BED outputs use block-gzip format
- GC references: Pre-computed Parquet format for GRCh37/GRCh38
- Unified processor: Single-pass Rust pipeline for Extract→Motif→FSC→FSD→WPS→OCF
Changed
Nextflow Pipeline
- nf-core compliance: Refactored to shared INPUT_CHECK subworkflow
- KREWLYZER_RUNALL: Unified module with 30+ output channels
pon_variantparameter: New pipeline parameter (default:all_unique)
CLI
- 9 tools with
--pon-variant: run-all, fsc, fsd, fsr, wps, ocf, motif, region-entropy, region-mds - Centralized asset resolution:
resolve_pon_model()andresolve_target_regions() - Filter flags:
--mapq,--minlen,--maxlen,--skip-duplicates,--require-proper-pair - Parallel processing:
--threadsoption across all tools
Data & Assets
- Terminology: Renamed "blacklist" to "exclude_regions" for inclusive language
- Directory structure:
data/{type}/{genome}/{variant}/for organized assets - Memory optimization: Parallel sample processing with spawn context in build-pon
Fixed
- mFSD: Filter discordant reads with extreme TLEN values
- mFSD: Fix verbose mode hanging by moving debug logging outside parallel loop
- Proper pair detection for legacy BAMs without proper pair flags
- GC correction for gene-level FSC in panel mode
- CIGAR handling improvements for INDELs and complex variants
- BAM extraction for v1 ACCESS and bgzip output format
Documentation
- 11 docs updated: PON Variant Selection across pon.md, pipeline.md, usage.md, feature docs
- Release Guide: New
docs/advanced/release-guide.mdfor version release process - Math rendering: LaTeX formulas in all feature documentation
- Glossary: New terms and definitions
Tests
- 28 new test files: Unit, integration, and e2e coverage
- 4 PON variant tests: test_asset_resolution.py
- Real data tests: test_real_data.py for end-to-end validation
[0.3.2] - 2025-12-18
Fixed
- CI Build: Removed
gfortranandscikit-misc- GC correction now fully in Rust - FSC GC Correction: Added
--gc-correct/--no-gc-correctand--verboseflags to FSC CLI - Python LOESS Removed: Removed
gc_correct()fromhelpers.pyandpostprocess.py
Changed
- Dockerfile: Removed
gfortrandependency (no longer needed) - GitHub Actions: Removed
gfortranfrom CI workflows - Nextflow Modules: Updated all container versions to
0.3.2
[0.3.1] - 2025-12-18
Added
- Rust LOESS GC Correction: New
rust/src/gc_correction.rsmodule using thelowesscrate - Per-fragment-type correction (short, intermediate, long)
- Configurable LOESS parameters (fraction, iterations, delta)
- FSR GC Correction:
--gc-correct/--no-gc-correctflag (default: True) - Uses corrected counts from Rust before calculating ratios
--verboseflag for detailed logging- WPS GC Correction:
--gc-correct/--no-gc-correctflag (default: True) --reference, -r: Reference FASTA for computing region GC content- FASTA-based GC computation using rust-htslib::faidx
- Graceful fallback if reference not provided
Changed
- FSC Rust Backend: Added
count_fragments_gc_corrected()function for integrated GC correction - WPS Rust Backend: Updated
calculate_wps()withreference_path,gc_correct,verboseparameters - lowess Dependency: Updated from 0.3 to 0.4 for improved API
Documentation
- Updated
docs/features/fsr.mdanddocs/features/wps.mdwith GC correction options
[0.3.0] - 2025-12-16
Added
- mFSD Variant Type Support: Now handles all small variant types:
- SNV (single nucleotide)
- MNV (multi-nucleotide)
- Insertion
- Deletion
- Complex (substitution + indel)
- 4-Way Fragment Classification: Fragments classified as REF, ALT, NonREF, or N (low quality)
- Comprehensive Statistics: 6 pairwise KS comparisons (ALT-REF, ALT-NonREF, etc.)
- Derived Metrics: VAF_Proxy, Error_Rate, N_Rate, Size_Ratio, Quality_Score
- Quality Flags: ALT_Confidence (HIGH/LOW/NONE), KS_Valid
- Distributions Output:
--output-distributionsflag generates per-variant size distributions - Verbose Logging:
--verboseflag for debug-level logging with variant type breakdown - MRD Support: Proper handling of low fragment counts (1-2) common in MRD settings
Changed
- BREAKING: mFSD output format changed from 11 columns to 39 columns
- Nextflow MFSD: Distributions and verbose logging enabled by default
- Fragment Counting: Now counts fragments (R1 only) instead of reads to avoid double-counting
Fixed
- CIGAR Handling: Improved sequence extraction for INDELs and complex variants
[0.2.3] - 2025-12-16
Changed
- Nextflow Pipeline: Added
FILTER_MAFmodule for multi-sample MAF filtering: - New
mafandsingle_sample_mafcolumns in samplesheet - Filters MAF by
Tumor_Sample_Barcodematching sample ID (regex:.*sample_id.*) single_sample_maf=truebypasses filtering for per-sample MAFs- Skips MFSD when filtered MAF has zero variants (with warning)
- Memory-efficient streaming for large MAF files (100s of MBs)
- Outputs filtered MAF + mFSD results
- Nextflow Modules: Updated all container versions to
0.2.3
[0.2.2] - 2025-12-15
Changed
- Project Structure: Migrated to recommended maturin "src layout" for better Python/Rust separation:
- Python code moved from
krewlyzer/tosrc/krewlyzer/ - Rust code moved from
krewlyzer-core/torust/ - Single
rust/Cargo.toml(removed duplicate rootCargo.toml) - Rust module now imports as
krewlyzer._core(private) - Dockerfile: Rewritten as multi-stage build for smaller image size (~200MB vs ~1GB); amd64 only
- OCI Labels: Added container metadata for GitHub Container Registry
Fixed
- Distribution Compatibility: Updated release workflow to build
manylinux_2_28wheels (compatible with RHEL 8+, AlmaLinux 8, etc.) - Source Builds: Included
clangandllvm-develin build environment forbindgen/hts-sys - Docker Build: Added
patchelfto maturin installation for Linux wheel building - Test Imports: Updated test files to use new
krewlyzer._coreimport path - CI OpenSSL: Use system OpenSSL (
OPENSSL_NO_VENDOR=1) instead of building from source - CI Python Versions: Build wheels for Python 3.10, 3.11, 3.12 in release workflow
- Documentation: Updated logo paths for new
src/layout
[0.2.1] - 2025-12-15
Fixed
- Rust Compilation: Resolved cross-platform build issues with
coitreesmetadata types (usizevs&usize) by using explicit.to_owned()conversion. - CI Build: Added
gfortran,clang, andlibclang-devto CI workflows to fixscikit-miscandrust-htslibcompilation failures. - Permission Errors: CI scripts now robustly handle
sudopermissions when installing system dependencies.
[0.2.0] - 2025-12-12
Added
- Unified Engine: New high-performance Rust core (
krewlyzer-core) that processes Extract, Motif, FSC, FSD, WPS, and OCF in a single parallelized pass. - Fragment Extraction: dedicated
extractcommand (via Rust) to convert BAM to BED with configurable filters. - Extract Documentation: New
docs/features/extract.mddetailing extraction logic and JSON metadata. - Calculation Details: Comprehensive formulas and interpretation guides added to all feature documentation.
- Root Cargo.toml: Added to support standard
maturinbuilds for the hybrid Python-Rust package.
Changed
- Performance: Significant speedup (3-4x) for end-to-end analysis due to multi-threaded processing and single-pass I/O.
- Build System: Migrated to
maturinbackend for robust Rust extension compilation. - CLI (
run-all): Now defaults to the Unified Engine. - CLI Filters: Added
--mapq,--minlen,--maxlen,--skip-duplicates,--require-proper-pairflags torun-all,extract, andmotif. - Motif Outputs: Renamed output files to use
.tsvextension consistently (e.g.,{sample}.EndMotif.tsv). - Data Handling:
motifnow uses the unified engine, eliminating the need forbedtoolsbinary entirely. - Documentation: Updated
README.md,usage.md, andpipeline.mdto reflect the new workflow.- Corrected
pipeline.mdsamplesheet format documentation to match Nextflow schema. - Updated
usage.mdand feature docs to correctly specify output directory arguments.
- Corrected
Fixed
- Test Suite: Cleaned up
tests/directory, removing obsolete scripts and fixing integration tests (test_science.py,test_run_all_unified_wrapper.py). - Validation: Fixed BAM header issues in tests.
Removed
- Legacy Python Backends: Removed pure Python implementations of
extract,motif,fsc,fsd, ensuring all paths use the unified Rust core. - Validation Artifacts: Deleted temporary validation scripts and data.
[0.1.7] - 2025-11-26
Fixed
- PyPI Metadata: Added
readmeandlicensefields topyproject.tomlto ensure the long description is correctly displayed on PyPI.
[0.1.6] - 2025-11-26
Fixed
- Docker Build: Removed
libatlas-base-devdependency fromDockerfileas it is not available in thepython:3.10-slim(Debian Trixie) base image.
[0.1.5] - 2025-11-26
Fixed
- Docker Publishing: Switched to
GITHUB_TOKENfor GHCR authentication to fix permission issues. - PyPI Publishing: Added
skip-existing: trueto handle existing versions gracefully. - CI/CD: Added build checks for Python package and Docker image to PR workflows.
[0.1.4] - 2025-11-26
Fixed
- Test Dependencies: Removed unused
pybedtoolsimports fromfsr.py,fsd.py,uxm.py, andfsc.pywhich were causingImportErrorin CI environments wherepybedtoolsis not installed.
[0.1.3] - 2025-11-26
Changed
- Dependency Reduction: Removed
pybedtoolsdependency. - Refactor:
motif.pynow usespandasfor blacklist filtering and sorting, removing the need forbedtoolsbinary. - CI/CD: Added
pytestandpytest-mocktotestoptional dependencies inpyproject.toml.
[0.1.2] - 2025-11-26
Added
- Mutant Fragment Size Distribution (
mfsd): New module to compare fragment size distributions of mutant vs. wild-type reads using VCF/MAF input. - Enhanced Fragment Size Ratios (
fsr): Added "Ultra-Short" (<100bp) ratio bin. - Documentation: Comprehensive MkDocs website (
docs/) with material theme. - Pipeline:
run-allcommand now supports--variant-inputformfsdanalysis. - Nextflow: Pipeline updated to support optional variant input in samplesheet.
Changed
- Updated
README.mdto point to the new documentation site. - Added
mkdocsandmkdocs-materialas optional dependencies.