Changelog
All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
[0.0.15] - 2026-06-01
Added
- CPU+GPU JSON Merge Helpers: `load_model_results(directory, evaluator_name)` and `load_all_model_results(directory)` in `eval_engine.py` transparently discover and merge `*_model_results.json` (CPU) and `*_gpu_model_results.json` (GPU) files. GPU model keys (AUC, OOF probs, SHAP) are merged into the CPU dict. Used by report templates, scoreboard, and multimodal engine.
- KREVIEW_SCOREBOARD: New Nextflow process (`scoreboard.nf`) generates `scoreboard_combined__all.parquet` after all CPU/GPU eval jobs complete. Uses `build_scoreboard()`, which now calls `load_all_model_results()` for unified GPU+CPU scoring.
- GPU Exit Code: `kreview eval gpu` now exits with code 1 if no models produced valid AUC results (was silent success). Prevents Nextflow from treating empty GPU results as success.
- Unit Tests: 11 new tests in `test_merge_helpers.py` covering CPU-only, GPU-only, CPU+GPU merge, malformed JSON, and directory scan scenarios.
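A minimal sketch of the CPU+GPU merge idea, for illustration only. The function name `merge_cpu_gpu_results` and the choice to skip malformed files are assumptions, not the actual `eval_engine.py` implementation; only the two filename patterns come from the entries above.

```python
import json
from pathlib import Path


def merge_cpu_gpu_results(directory, evaluator_name):
    """Return one dict with GPU keys layered onto the CPU dict (illustrative)."""
    base = Path(directory)
    candidates = [
        base / f"{evaluator_name}_model_results.json",      # CPU output
        base / f"{evaluator_name}_gpu_model_results.json",  # GPU output
    ]
    merged = {}
    for path in candidates:
        if not path.is_file():
            continue  # CPU-only or GPU-only runs are valid
        try:
            data = json.loads(path.read_text())
        except json.JSONDecodeError:
            continue  # assumption: malformed files are skipped, not fatal
        if isinstance(data, dict):
            merged.update(data)
    return merged
```

Using exact filenames avoids one pitfall of globbing: the pattern `*_model_results.json` also matches `*_gpu_model_results.json`.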
Changed
- GPU JSON Output Naming: GPU eval module now produces `*_gpu_model_results.json` (was `*_model_results.json`), preventing filename collision when CPU and GPU outputs are collected into the same Nextflow channel.
- Report Input Signature: `KREVIEW_REPORT` now accepts 6 inputs (matrices, model_results, eval_stats, selection_qc, joblib_files, scoreboard) for complete dashboard rendering.
- Workflow Wiring: `kreview_eval.nf` creates unified `ch_all_jsons` and `ch_all_joblib` channels that mix CPU+GPU outputs and feeds them to SCOREBOARD → REPORT.
- Scoreboard Loading: `build_scoreboard()` now uses `load_all_model_results()` instead of a manual glob+load loop.
- Multimodal Baselines Loading: `_load_per_evaluator_baselines()` now uses `load_all_model_results()` instead of a manual glob+load loop.
- FSD Density Calculation: Added `select_dtypes(include="number")` filter before density computation in `fsd.py` and `fsd_genomewide.py` to prevent TypeError on non-numeric metadata columns.
- Multimodal NF Module: Removed unnecessary staging loop; `--results-dir .` uses Nextflow work dir symlinks directly.
- Dockerfile Optimization: Selective builder `COPY` (only `pyproject.toml`, `kreview/`, `LICENSE`), single `pip install "${WHL}[gpu]"` pass to ensure local wheel resolution, dropped `python3.12-dev` and `wget` from GPU runtime, merged OCI labels into single `LABEL` instruction.
- CI Parallelism: `test.yml` split into parallel `test` (Python) and `docker` (matrix `[cpu, gpu]`) jobs with `fail-fast: false`. GPU build gets `jlumbroso/free-disk-space` cleanup (~20-30 GB freed).
- `.dockerignore`: Expanded to exclude `docs/`, `nextflow/`, `scripts/`, `tests/`, `.github/`, `.agents/`, which reduces build context transfer.
Fixed
- OOF Label Key Search: `"oof_labels".endswith("_oof_labels")` is `False`, so the search now checks both exact match and suffix match. Applied to both `report_template.qmd` and `report_multimodal_template.qmd`.
- FSD TypeError: Non-numeric columns (e.g., `sample_id`, `filename`) caused `TypeError: ufunc 'divide' not supported` in FSD density calculation. Now filtered to numeric columns only.
- Multimodal Validation Logging: Now shows CPU vs GPU JSON counts separately for better debugging.
- Docker GPU Build CI Failure: GPU image build exceeded runner disk space (~14 GB free). Fixed by adding the `jlumbroso/free-disk-space` action and optimizing Dockerfile layers.
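The OOF label key fix can be illustrated with a short sketch. The helper name `find_oof_label_key` and the example keys are hypothetical; the report templates may structure the lookup differently.

```python
def find_oof_label_key(results):
    """Return the first key that is 'oof_labels' or ends with '_oof_labels'."""
    for key in results:
        # A suffix-only check misses the bare key, because the suffix
        # "_oof_labels" is one character longer than "oof_labels".
        if key == "oof_labels" or key.endswith("_oof_labels"):
            return key
    return None


# The pitfall described in the entry above:
assert not "oof_labels".endswith("_oof_labels")
```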
Removed
- `tabpfn-extensions[interpretability]` from `[gpu]` extras: unused since v0.0.13 (SHAP computation replaced by `shapiq`). Eliminates `transformers`, `wandb`, `huggingface-hub`, `tokenizers` transitive dependencies (~500 MB).
- Deprecated HPC scripts: `run_hpc_0.8.3.sh`, `run_hpc_v0.0.10.sh`, `run_hpc_v0.0.11.sh`, `runner.sh`, `utils/symlinker.py`, replaced by unified `scripts/run_hpc.sh`.
Breaking Changes
- GPU JSON output renamed: `{eval}_model_results.json` → `{eval}_gpu_model_results.json`
- `KREVIEW_REPORT` input signature expanded from 2 to 6 inputs
[0.0.14] - 2026-05-25
Added
- Reproducibility Seed: `--seed` CLI flag on `run`, `eval cpu`, `eval gpu`, `eval multimodal`, and `select` commands (default: 42).
- Deterministic Mode: `--deterministic / --no-deterministic` flag for PyTorch cuDNN (default: True) on all eval commands.
- `kreview/reproducibility.py`: `seed_everything()` utility for Python, PyTorch, and cuDNN seeding.
- Nextflow Params: `seed` and `deterministic` params in `nextflow.config`, forwarded through all 8 pipeline modules.
- Reproducibility Docs: New section in `models-and-metrics.md` documenting seed propagation strategy.
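A rough sketch of what a seeding utility of this kind usually looks like. The name `seed_everything_sketch` and its signature are assumptions; the real `seed_everything()` in `kreview/reproducibility.py` may differ. PyTorch is imported lazily so the sketch also runs where it is not installed.

```python
import random


def seed_everything_sketch(seed=42, deterministic=True):
    """Seed Python's RNG and, when PyTorch is available, torch and cuDNN."""
    random.seed(seed)
    try:
        import torch
    except ImportError:
        return seed  # CPU-only environment without PyTorch
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # silently ignored without CUDA
    # Deterministic cuDNN kernels trade some speed for reproducibility.
    torch.backends.cudnn.deterministic = deterministic
    torch.backends.cudnn.benchmark = not deterministic
    return seed
```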
Fixed
- 3 hardcoded `random_state=42` values in `_select_multimodal_features()` now accept the caller's seed.
- 7 internal forwarding gaps where `random_state` was not passed to child functions.
- GPU models (TabPFN, TabICL) now receive the `random_state` parameter via `_build_gpu_model()`.
- `shapiq.TabularExplainer` and `shap.PermutationExplainer` now seeded for reproducible SHAP values.
- `BorutaShap.fit()` now receives `random_state` for deterministic feature selection.
- SLURM Partition Routing: All `withName` process blocks now pin `queue` explicitly to bypass the nf-core iris institutional config's dynamic queue closure that injects the inaccessible `cpu` partition into `sbatch -p`.
- KREVIEW_LABEL Error Strategy: Changed from inherited `'ignore'` (from iris config) to `'terminate'`; the pipeline aborts immediately if labeling fails instead of deadlocking.
[0.0.13] - 2026-05-24
Added
- GPU Foundation Models: TabPFN v8.0.3 and TabICL v2.1 support via `_build_gpu_model()` with updated import paths (`tabpfn.TabPFNClassifier`, `tabicl.TabICLClassifier`).
- shapiq Integration: SHAP values for GPU models now computed via `shapiq.TabularExplainer` (model-agnostic kernel-based Shapley values), replacing the deprecated `tabpfn-extensions` interpreter.
- Unified Model Persistence: `_save_fitted_models()` helper in `cli_eval.py` provides consistent joblib saves for both CPU and GPU models. `--skip-gpu-joblib` flag opts out of large GPU model files.
- Multimodal GPU Models: `kreview eval multimodal` now accepts `--gpu-models tabpfn,tabicl` for GPU-accelerated stacking and raw feature evaluation.
- Pre-computed SHAP Fallback: Report templates render mean |SHAP| bar charts from JSON when joblib files are unavailable (e.g., `--skip-gpu-joblib`).
- KREVIEW_REPORT_MULTIMODAL: New Nextflow process for rendering multimodal stacking dashboards in multistage mode.
- HPC Script v0.0.13: Updated SLURM launch script with GPU multimodal params, Boruta-SHAP selection, and top_percentile.
Changed
- Feature Selection: `--top-percentile` replaces `--top-k` for MI-based feature selection. Percentage-based selection adapts to varying feature set sizes across evaluators.
- Boruta-SHAP MI Reducer: When Boruta-SHAP confirms >500 features, an MI-based reducer narrows the set to `top_percentile` (default 10%).
- SLURM Hardening: `KREVIEW_LABEL` and `KREVIEW_EVAL_GPU_SINGLE` processes now set `cache = 'lenient'` and `scratch = false` to prevent institutional queue/cache interference on IRIS.
- Dynamic Report Rendering: ROC CI reverse-map, DCA loop, and AUC deltas now dynamically discover models from `DYNAMIC_MODELS` instead of hardcoding LR/RF/XGB.
Fixed
- ROC CI KeyError: Hardcoded `{"Logistic Regression": "lr", ...}` reverse-map in `report_template.qmd` replaced with dynamic lookup from `DYNAMIC_MODELS`, preventing crashes when GPU models are present.
- DCA Loop: DCA now renders curves for all trained models (was hardcoded to RF/XGB only).
- AUC Deltas: Pairwise AUC delta display now discovers all `auc_delta_*` keys dynamically.
- Monolithic GPU Fitted Capture: `gpu_res, gpu_fitted = gpu_models(...)` now captures fitted models (was `_ = ...`).
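The dynamic `auc_delta_*` discovery amounts to a prefix scan over the results dict instead of a fixed list of model pairs. A minimal sketch, assuming a flat dict; the function name and the example pair names are hypothetical:

```python
def collect_auc_deltas(results, prefix="auc_delta_"):
    """Map each '<pair>' suffix of an 'auc_delta_<pair>' key to its value."""
    return {
        key[len(prefix):]: value
        for key, value in results.items()
        if key.startswith(prefix)
    }
```

A new GPU model that adds, say, its own delta key then shows up in the display without any template change.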
Dependencies
- `tabpfn >= 8.0.3` (was `>= 2.0`)
- `tabicl >= 2.1` (was `>= 0.0.4`)
- Added `shapiq` for GPU model SHAP computation.
[0.0.12] - 2026-05-24
Added
- Nextflow publishDir: All 8 multistage modules now publish outputs to `params.outdir` with `mode: copy`. Output structure: `labels/`, `matrices/raw/`, `matrices/selected/`, `models/cpu/`, `models/gpu/`, `matrices/fused/`, `models/multimodal/`, `reports/`.
- Label-first DAG: `KREVIEW_LABEL` runs once as the first step in multistage mode, producing `labels.parquet` shared across all extract jobs. Eliminates 26× redundant re-labeling (~13 min saved).
- `--labels` CLI flag: `kreview extract` now accepts `--labels /path/to/labels.parquet` to skip internal labeling and use pre-computed labels.
- Boruta-SHAP in HPC script: `scripts/run_hpc_v0.0.11.sh` now passes `--multimodal_selection boruta_shap` for rigorous multimodal feature selection.
Changed
- Report DAG dependency: Report process now receives both matrices and `*_model_results.json` from CPU/GPU eval, enabling complete dashboard rendering (ROC, SHAP, metrics). Report runs in parallel with FUSE + MULTIMODAL.
- Extract module: `extract.nf` now accepts `labels.parquet` as input from upstream `KREVIEW_LABEL`.
Documentation
- README: Comprehensive overhaul. Added mRMR/Boruta-SHAP, GPU models, multimodal stacking, Nextflow HPC, and a pipeline architecture Mermaid diagram; fixed stale `--workers` flag.
- Statistical Tests: Detailed mRMR documentation (optimization objective, F-statistic/Pearson explanation, algorithm steps), Boruta-SHAP documentation (shadow features, SHAP importance, flowchart, configuration table, when-to-choose guide), fixed stale Selection QC keys.
- Nextflow Operations: Updated module list to match actual filenames, added `iris` profile docs, added publishDir output structure tree, added `parq-cli` tip.
- Pipeline CLI: Fixed stale `--workers 4`, added `--labels` to extract example, added `--ch-hotspot-maf`, added label-first modular pipeline flow, added `parq-cli` tip.
- Pipeline Architecture: Updated DAG (Label-first, Report parallel with Multimodal), fixed process table with actual `_single.nf` module names, added publishDir column.
[0.0.11] - 2026-05-24
Changed
- Feature Selection (Default): Default single-evaluator strategy changed from `hybrid_union` to `mrmr` (Minimum Redundancy Maximum Relevance). mRMR selects features maximizing target correlation while minimizing inter-feature redundancy, preventing multicollinearity.
- Multimodal Selection: `--multimodal-selection` now supports `boruta_shap` (interaction-aware, SHAP-based) in addition to `mi` (mutual information, default).
- CLI Log Output: `kreview select` and `kreview run` now display method-aware selection summaries (mRMR shows variance-dropped count; hybrid_union shows AUC∩MI overlap breakdown).
Added
- `mrmr-selection` dependency: Added `mrmr-selection>=0.2.8` to `pyproject.toml`.
- `BorutaShap` dependency: Added `BorutaShap>=1.0.17` to `pyproject.toml`.
- Structured startup logging: `kreview select` and `kreview eval multimodal` now emit `log.info("select_start", ...)` and `log.info("eval_multimodal_start", ...)` at startup for machine-parseable audit trails.
- Multimodal selection method persistence: `multimodal_results.json` now includes `raw_features_selection_method` (`"mi"` or `"boruta_shap"`).
- Selection Strategy value box: Multimodal dashboard executive summary displays the cross-evaluator selection strategy.
- Scoreboard `selection_method` column: Now visible in both single-evaluator and multimodal dashboard scoreboard tables.
- mRMR scatter disclaimer: When `strategy="mrmr"`, the Feature Selection scatter plot displays a callout explaining that AUC/MI axes are observational (mRMR uses F-statistic + Pearson correlation internally).
- mRMR scatter coloring: Selected features labeled "Selected (mRMR)" with cyan color, distinct from hybrid_union's 4-way AUC/MI overlap coloring.
Fixed
- `KeyError` crash in `cli_select.py`: Fixed crash when `--strategy mrmr` was used; the log output tried to access hybrid-union-only QC keys (`n_overlap_both`, `n_auc_only`, `n_mi_only`).
- Misleading `cli.py` log output: `kreview run --strategy mrmr` now shows mRMR-specific summary instead of zeroed-out hybrid_union counts.
- Stale docstrings: Updated module docstrings in `selection.py`, `cli_select.py`, `cli_eval.py`, and `eval_engine.py` to reflect mRMR as default.
[0.0.10] - 2026-05-22
Added
- Modular CLI: Extracted `run` into atomic sub-commands (`label`, `extract`, `fuse`, `eval`, `select`, `report`).
- Nextflow DAG: Complete rebuild of monolithic pipeline into a multi-stage, per-evaluator parallelized workflow.
- GPU Evaluation: Native support for PyTorch-based GPU models (TabPFN, TabICL) via the `kreview eval --gpu-models` target.
- Multimodal Models: Cross-evaluator stacking capability allowing evaluation of multiple fragmentomics assays simultaneously.
- Testing Optimization: Cut unit test execution time from >6 min to <30 s via module-scoped caching.
- Docker Containers: Stabilized multi-target release action generating discrete CPU and GPU containers cleanly.
[0.0.9] - 2026-05-07
Changed
- Feature Selection: Replaced Cohen's D `--top-n` selection with hybrid union: top X% by Univariate AUC ∪ top X% by Mutual Information. This captures both linear and non-linear predictors.
- CLI: `--top-n` deprecated (prints warning, ignored). Use `--top-percentile` (default: 10%) instead. `--compute-univariate-auc` now defaults to `True`.
- Volcano Plot: X-axis changed from Cohen's D to Univariate AUC for better alignment with model-based selection.
- Top-20 Bar Chart: Ranked by Univariate AUC instead of Cohen's D.
- Statistical Ledger: Sort priority changed to `univariate_auc > mutual_info > cohens_d > kw_statistic`.
- Feature Cards: Display AUC + MI scores instead of Cohen's D.
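The hybrid union rule (top X% by Univariate AUC ∪ top X% by Mutual Information) can be sketched as follows. Function names, the rounding rule (ceiling, at least one feature per ranking), and the return layout are assumptions for illustration, not the project's implementation.

```python
import math


def top_fraction(scores, pct):
    """Names of the top `pct` percent of features by score (at least one)."""
    n_keep = max(1, math.ceil(len(scores) * pct / 100.0))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_keep])


def hybrid_union(auc_scores, mi_scores, top_percentile=10.0):
    """Union of the two top-percentile sets, plus overlap counts for QC."""
    by_auc = top_fraction(auc_scores, top_percentile)
    by_mi = top_fraction(mi_scores, top_percentile)
    return {
        "selected": sorted(by_auc | by_mi),
        "n_overlap_both": len(by_auc & by_mi),
        "n_auc_only": len(by_auc - by_mi),
        "n_mi_only": len(by_mi - by_auc),
    }
```

A feature ranked highly by only one of the two scores is still kept, which is how the union retains both linear and non-linear predictors.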
Added
- `mutual_info_score()`: New function in `eval_engine.py` using `sklearn.feature_selection.mutual_info_classif` (k=3) for non-linear feature relevance scoring.
- `selection_qc` metadata: Saved in `model_results.json` with method, overlap stats, and feature counts for audit trail.
- Feature Selection QC Scatter: New dashboard plot showing AUC vs MI with 4-color category coding (Both / AUC-only / MI-only / Dropped).
- Scoreboard columns: `selection_method`, `n_selected_features`, `selection_overlap_pct`.
- Structured logging: `feature_scoring_complete`, `feature_selection_complete`, `univariate_auc_disabled`, `variance_guard_dropped`, `model_skip_insufficient_data`.
- Resume checkpoint: Warns on legacy JSONs missing `selection_qc` (pre-v0.0.9 Cohen's D runs).
- Tests: 13 new tests for `univariate_auc` (6) and `mutual_info_score` (7) covering bounds, NaN handling, constant features, signal detection, and reproducibility.
- univariate_auc logging: `log.warning("univariate_auc_failed")` replaces bare silent `except`.
Fixed
- Duplicate label mask (GAP-1): Removed duplicate 4-tier label mask construction; reuse scoring target for model target.
- Redundant import (GAP-3): Removed `import structlog as _slog` inside evaluator loop; use module-level `log`.
- Silent degradation (GAP-4): Explicit warning when `--no-compute-univariate-auc` degrades to MI-only selection.
- Double imputation (GAP-6): Cached `_impute()` result in variance guard to avoid re-computation.
- Silent model skip (GAP-7): Added echo + structlog warning when model eligibility fails (< 20 samples or single class).
[0.0.8] - 2026-05-06
Fixed
- AUC Consistency (D-01): ROC plots and KDE density now use out-of-fold predictions matching the official AUC value boxes. Eliminates the misleading AUC discrepancy (e.g., 0.845 vs 0.72) caused by mixing training/subsample models.
- Dashboard Crashes (D-02/D-03): All `cohens_d_true_vs_healthy` references guarded with column existence checks. Statistical Ledger falls back to `kw_statistic` → `univariate_auc` when Cohen's D is unavailable.
- Classification Labels (C-02): Confusion matrix and classification report now correctly show "ctDNA Negative" instead of "Healthy Normal".
- Fold Variability (C-01): Now displays all three models (LR, RF, XGBoost) with per-model std in title instead of hardcoded RF-only.
- Duplicate Imports (C-03): Removed redundant plotly import in report template.
- Volcano Hover (GAP-1): NaN-safe hover text shows "N/A" instead of "nan" for zero-variance features.
- JSON Bloat (GAP-6): OOF probability arrays rounded to 6 decimal places (~40% size reduction per evaluator).
- HPC Memory (OOM): Base memory bumped 256→512GB for genome-wide evaluators (FSD, WPS).
Added
- Structured Logging: `structlog`-based logging in CLI `report()` with per-evaluator timing, success/fail counts, and summary statistics.
- Smart Resume (GAP-4): Resume checkpoint validates JSON freshness and warns if OOF keys are missing from pre-hardening runs.
- SHAP Waterfall Guards (S-04): Validates array lengths before plotting to prevent IndexError.
- SHAP Docstrings: All three SHAP helper functions now have full docstrings.
- Variable Initialization: `oof_y`, `oof_preds`, and `kde_labels` initialized at template top level to prevent NameError when models fail.
Removed
- `plot_threshold_sensitivity()`: dead code replaced by pre-computed threshold sweep in v0.0.7.
- Redundant `cross_val_score` import and second independent CV computation.
- Stale `plot_threshold_sensitivity` entry from `_modidx.py`.
[0.0.7] - 2026-04-13
Added
- Dashboard Redesign: Restructured from single-page to 6-page progressive disclosure hierarchy (Executive Summary, Model Validation, Feature Explanation, Biomarker Yield, Cohort & QC, Data Explorer).
- XGBoost Integration: Full XGBoost model evaluation alongside LR and RF in `single_feature_model()`.
- Bootstrap AUC CIs: 95% confidence intervals for all model AUCs using `scipy.stats.bootstrap`.
- Decision Curve Analysis (DCA): Standalone `decision_curve_analysis()` function computing net clinical benefit across thresholds for RF and XGBoost models.
- Precision-Recall Curves: PR curves and average precision for all three models (LR, RF, XGBoost).
- Fold-Level AUC Tracking: Per-fold AUC tracking via `cross_val_score` for all three models with `*_auc_std` stability metric.
- Threshold Sensitivity Sweep: 50-point sweep of sensitivity, specificity, and PPV across thresholds (0.01–0.99) for RF.
- Feature Stability: CV cross-fold feature importance consistency scoring (0.0–1.0 scale).
- QC Metrics: Per-feature `n_missing`, `pct_missing`, and `is_zero_variance` computed in `evaluate_feature()`.
- Feature Cards: Auto-generated metadata cards from evaluator registry with tier, category, and derived feature type detection.
- SHAP Tabbed Interface: Unified SHAP tabs for RF and XGBoost with consistent layout.
- Verdict Value Box: AUC-threshold verdict (Strong ≥0.80, Moderate ≥0.70, Weak <0.70) in Executive Summary.
- Label × Cancer Type Sunburst: Interactive sunburst visualization replacing static bar chart.
- Per-Sample Coverage Histogram: Histogram of non-null feature percentage per sample.
- Per-Chromosome Feature Chart: Conditional chromosome-level feature bar chart.
- Feature Importances Tab: Top-20 RF Gini importance bar chart in Model Validation.
- AUC Deltas: RF–LR and XGB–RF AUC deltas surfaced in Performance Metrics.
- Training Time: Model training time displayed in Performance Metrics tab.
- VAF Scatter: Re-added tumor burden independence scatter (top feature vs max_vaf, LOWESS trendline).
- great_tables Integration: Column-grouped data explorer using `great_tables` with auto-detected feature family spanners.
- Quarto Auto-Discovery: `_find_quarto()` probes PATH then 7 well-known install locations (Positron, Homebrew, system, conda).
- Methods Link: Sidebar link to API documentation site.
- Documentation: New dashboard guide, DCA methodology doc, feature cards API reference, comprehensive CLI flag docs.
- Tests: 24 new tests (53 total) covering DCA, PR curves, fold AUC, threshold sweep, feature stability, QC metrics, training time, AUC deltas, and feature cards.
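For readers unfamiliar with Decision Curve Analysis, the standard net-benefit formula is NB(t) = TP/n − (FP/n) · t/(1 − t), where t is the probability threshold. The sketch below implements that textbook formula; the function names are illustrative and do not reproduce the project's `decision_curve_analysis()`.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating every sample with probability >= threshold."""
    n = len(y_true)
    pairs = list(zip(y_true, y_prob))
    tp = sum(1 for y, p in pairs if p >= threshold and y == 1)
    fp = sum(1 for y, p in pairs if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold.
    return tp / n - (fp / n) * threshold / (1.0 - threshold)


def decision_curve(y_true, y_prob, thresholds):
    """(threshold, net benefit) pairs; thresholds must lie strictly in (0, 1)."""
    return [(t, net_benefit(y_true, y_prob, t)) for t in thresholds]
```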
Changed
- JSON schema expanded from 24 to 44 fields per feature set.
- Dashboard template: 1,555 lines, 52 balanced code blocks.
- Language discipline applied across docs and template (removed promotional/overclaiming language).
- `custom.scss`: bumped font size to 0.95rem, added QC warning class, added `@media print` rules.
Fixed
- Silent `except: pass` in calibration code replaced with `log.warning()`.
- Division by `len(df)` without zero-guard in class balance text.
- Scoreboard NaN values: sensitivity, specificity, n_samples, n_positive now correctly extracted from `rf_classification_report` dict instead of non-existent top-level keys.
- QC tab Figure code dump: unsuppressed `fig.update_traces()` return displayed as raw `Figure({...})` text.
- Dashboard `debug: true` leaked Python output into rendered HTML; replaced with `warning: false`.
- QC row heights summed to 135%; rebalanced to 100% (10+40+25+25).
- Undefined `_to_float_array` in `wps_panel.py` and `wps_genomewide.py` replaced with correctly imported `parse_array`.
- SHAP waterfall rendering stabilized for edge cases with zero feature contributions.
- Typo `fallsbacks` → `fallbacks` in `pyproject.toml`.
- `great_tables>=0.15.0` added to `pyproject.toml` dependencies.
[0.0.6] - 2026-04-10
Fixed
- Bin-Level Extractor Coverage Filtering: Fixed critical data bug where FSC and FSR on-target extractors computed `median()` across all 28,823 genomic bins, including the 98% with zero coverage. The zero-coverage sentinel values (`-29.897` for log2, `0.0` for ratios) dominated the median, producing constant features and AUC=0.500 for all models. Now filters to `total > 0` bins before aggregation.
- SHAP Feature Shape Mismatch: Fixed `TreeExplainer` crashes caused by passing a feature-subset matrix instead of the full model-trained feature set. SHAP now computes on all 50 model features; `--shap-features` only limits visualization display count.
- SHAP Binary Output Handling: Added robust handling for all `TreeExplainer` return shapes (list, 3D array, 2D array) and safe `expected_value` extraction for binary classifiers.
- Error Visibility: Changed Quarto stderr capture from first 500 chars to last 1500 chars, surfacing actual Python errors instead of kernel boot messages.
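The coverage-filtering bug is easy to reproduce on toy data. In the sketch below, the helper name `covered_median` and the bin values are made up for illustration; only the `-29.897` sentinel and the `total > 0` rule come from the entry above.

```python
from statistics import median


def covered_median(values, totals):
    """Median over bins with total > 0; NaN when no bin has coverage."""
    covered = [v for v, t in zip(values, totals) if t > 0]
    return median(covered) if covered else float("nan")


# Eight zero-coverage bins carrying the log2 sentinel, two covered bins.
values = [-29.897] * 8 + [0.25, 0.75]
totals = [0] * 8 + [120, 80]

naive = median(values)                     # sentinel dominates the median
filtered = covered_median(values, totals)  # only covered bins contribute
```

With most bins uncovered, the naive median equals the sentinel for every sample, which is what produced constant features.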
Added
- Nextflow HPC Parity: Wired all new CLI flags (`--top-n`, `--shap-samples`, `--shap-features`, `--resume`, `--skip-report`, `--cvd-safe`) into Nextflow `nextflow.config` and `run.nf` with HPC-optimized SLURM defaults (10-fold CV, 5000 SHAP samples, 64GB RAM).
- CLI Configuration Logging: All three commands (`label`, `run`, `report`) now log their full parameter state at startup for reproducibility.
- Resume Support: `--resume` flag skips evaluators with existing model results, enabling incremental re-runs.
- Variance Guard: CLI automatically drops zero-variance features before model training with a logged warning, preventing wasted compute on constant columns.
- Coverage Monitoring: All bin-level extractors now emit `n_covered_bins` and `n_total_bins` in the feature matrix for downstream QC.
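A variance guard of the kind described above can be sketched in a few lines. This version works on plain column lists and treats `None` as missing; the function name and data layout are assumptions, and the real CLI presumably operates on a dataframe.

```python
def drop_zero_variance(columns):
    """Split columns into (kept, dropped_names) by whether values vary."""
    kept, dropped = {}, []
    for name, values in columns.items():
        observed = {v for v in values if v is not None}
        if len(observed) <= 1:
            dropped.append(name)  # constant or entirely missing
        else:
            kept[name] = values
    return kept, dropped
```

The caller can then log the dropped names before fitting any model.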
Changed
- Memory Management: Aggressive dashboard optimization: subsampled KDE plots (2000 samples), `gc.collect()` between SHAP runs, explicit dataframe deletion after filtering.
- SHAP Explainer: Switched from generic `shap.Explainer` to `shap.TreeExplainer` to prevent OOM-causing `KernelExplainer` fallback.
- HPC Resource Allocation: `process_high` memory increased from 32GB to 64GB for SHAP-heavy report generation.
[0.0.5] - 2026-04-09
Fixed
- PyPI Release Wheel Artifact Synchronization: Enforced a permanent "Triple Bump" protocol that keeps the version manually matched across `settings.ini`, `nextflow.config`, and `kreview/__init__.py`. This resolves a version-decoupling bug that blocked the GitHub Actions `python -m build` step from publishing wheels that accurately track the package version.
[0.0.4] - 2026-04-09
Fixed
- Documentation Sync: Hardcoded the `mike` version provider's default alias to `latest`, allowing the GitHub Actions GH-Pages workflow to map tags.
- Git Flow Constraints: Implemented a mandatory 8-step PR-driven workflow that avoids raw fast-forward merges from the terminal, which were blocking CI.
[0.0.3] - 2026-04-09
Added
- Global Config Parity: Synchronized dynamic execution limits tracking across native Python modules (v0.0.3) and HPC workflow blocks (`nextflow.config` 0.0.3).
- Core CLI Flags: Added an `@app.callback()` to the `kreview` Typer app, enabling standalone `--version` invocation so that Nextflow can capture the version in `software_versions.yaml`.
Fixed
- Systemic Module Sync: Repaired `__init__.py` decoupling bugs where sequential tool paths caused legacy `0.0.1` footprints to bypass metadata injections.
- Git Asset Purge: Added dynamic CLI Quarto runtime outputs (`html`, `static_plots`, `nohup`) to `.gitignore`, preventing blob accumulation in the remote repository.
- CI Formatting: Established uniform `black .` compliance across both `nbs/` JSON source notebooks and `/kreview` Python exports for GitHub Actions lint workflows.
[0.0.2] - 2026-04-09
Added
- Diagnostic Upgrades: Added explicit sensitivity and specificity evaluations that compute Youden's J cutoffs automatically and log the metrics to the results JSON.
- Model Expansion: Added `XGBoost` modeling alongside the existing `Random Forest` and `Logistic Regression` classifiers.
- Theme Selection: Introduced `set_theme()` support driven by the `--cvd-safe` toggle. Allows switching between the Okabe-Ito (colorblind-safe) and global Muted Neon visualization themes.
- Cluster Deployment Optimization: Added an `[all]` pip extra that installs all HPC dependencies in one step (`pip install -e ".[all]"`).
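Youden's J statistic is J = sensitivity + specificity − 1, and the cutoff is the threshold that maximizes it. A minimal sketch, assuming binary labels in {0, 1} and a "positive if probability >= threshold" rule; the function name and return layout are illustrative, not the project's code.

```python
def youden_cutoff(y_true, y_prob):
    """Return the threshold maximizing J, with its sensitivity/specificity."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes are required")
    pairs = list(zip(y_true, y_prob))
    best = None
    for t in sorted(set(y_prob)):  # every observed score is a candidate
        tp = sum(1 for y, p in pairs if y == 1 and p >= t)
        tn = sum(1 for y, p in pairs if y == 0 and p < t)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1.0
        if best is None or j > best["j"]:  # ties keep the lowest threshold
            best = {"threshold": t, "sensitivity": sens,
                    "specificity": spec, "j": j}
    return best
```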
Fixed
- Dashboard Standalone Render: Resolved `format: dashboard` clipping anomalies caused by undocumented Quarto `_quarto.yml` project context bleeding. Formatting constraints are now hardcoded, allowing the pipeline to render when installed in isolation under site-packages.
- Path Resolution Hacks: Removed namespace package AST resolution assumptions and substituted `__file__`-based path checks, resolving template `NoneType` bugs under editable `pip install -e .` installs.
- CLI Options: Restored the broken `--verbose` option parsing.
[0.0.1] - 2026-04-09
Added
- Formally released the production-grade `kreview` Evaluation Framework for fragmentomics features.
- Five-tier classification algorithm implemented for accurate clinical ctDNA labeling: `Possible ctDNA−`, `True ctDNA+`, `Possible ctDNA+`, `Healthy Normal`, `Malignancy (Heme)`.
- Scikit-learn Pipeline injection to eliminate cross-fold scaling leakage during standard ML `evaluate_feature`.
- DuckDB I/O optimizations with internal batch chunk sizes configured for Desktop (`--chunk-size 50`) and HPC SLURM networks (`--chunk-size 500`).
- Integrated dynamic `kaleido` plotting backends for clinical PDF Quarto workflows.
- Dedicated Nextflow NF-Core DSL2 environment scaling wrapper specifically designed to ingest `manifest.txt` parameters recursively without crushing symmetric `work/` limits.
- GitHub Container Registry release pipeline (`ghcr.io/msk-access/kreview`) fully established utilizing OCI registry labels.
Fixed
- Out-of-fold probability caching logic strictly refactored away from simplistic aggregation bias.
- Hardcoded `papermill` exceptions mapped into modular pipeline architecture.
- Replaced ambiguous test metrics with explicitly verified Benjamini-Hochberg (FDR) corrections over Mann-Whitney metrics.
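For reference, the Benjamini-Hochberg adjustment multiplies each sorted p-value by m/rank and then enforces monotonicity from the largest rank down. The sketch below is the textbook step-up procedure; the function name is illustrative and this is not the project's code.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of its own p * m / rank and every higher-ranked value.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```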