Command Line Interface (CLI)

kreview exposes each of its main pipeline stages as a terminal command, built with Typer.

kreview

ctDNA fragmentomics feature evaluation

Usage:

kreview [OPTIONS] COMMAND [ARGS]...

Options:

  --version
  --install-completion  Install completion for the current shell.
  --show-completion     Show completion for the current shell, to copy it or
                        customize the installation.

eval

Model evaluation commands

Usage:

kreview eval [OPTIONS] COMMAND [ARGS]...

cpu

Per-evaluator evaluation using LR, RF, XGBoost (CPU).

Iterates over all *_matrix.parquet files in --matrices-dir, trains the specified models, and writes *_model_results.json to --output.

Usage:

kreview eval cpu [OPTIONS]

Options:

  --matrices-dir PATH             Directory containing *_matrix.parquet files
                                  from kreview extract  [required]
  --output PATH                   Output directory  [default: output/]
  --models TEXT                   Comma-separated CPU models: lr,rf,xgb
                                  [default: lr,rf,xgb]
  --cv-folds INTEGER              Cross-validation folds  [default: 5]
  --resume                        Skip evaluators with existing results
  --seed INTEGER                  Random seed for reproducibility.  [default:
                                  42]
  --deterministic / --no-deterministic
                                  Enable PyTorch deterministic mode (slower
                                  but reproducible).  [default: deterministic]
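The file discovery and --resume behavior described for eval cpu can be sketched roughly as follows. This is a minimal illustration, not kreview's actual code: the function name `pending_matrices` is hypothetical, and the mapping from `<evaluator>_matrix.parquet` to `<evaluator>_model_results.json` is an assumption based on the file patterns named above.

```python
from pathlib import Path


def pending_matrices(matrices_dir, output_dir, resume=False):
    """Return (evaluator, matrix_path, results_path) for each matrix still to process.

    Hypothetical helper: the naming scheme is assumed, not taken from kreview.
    """
    matrices_dir, output_dir = Path(matrices_dir), Path(output_dir)
    jobs = []
    for matrix_path in sorted(matrices_dir.glob("*_matrix.parquet")):
        # Assumed naming: "<evaluator>_matrix.parquet" -> "<evaluator>_model_results.json"
        evaluator = matrix_path.name[: -len("_matrix.parquet")]
        results_path = output_dir / f"{evaluator}_model_results.json"
        if resume and results_path.exists():
            continue  # mirrors --resume: skip evaluators that already have results
        jobs.append((evaluator, matrix_path, results_path))
    return jobs
```

Under these assumptions, a second invocation with `resume=True` only returns evaluators whose results file is still missing.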

gpu

Per-evaluator evaluation using TabPFN, TabICL (GPU).

Fine-tuning is on by default; pass --no-finetune for zero-shot inference. Iterates over all *_matrix.parquet files in --matrices-dir and writes the results JSON files to --output.

Usage:

kreview eval gpu [OPTIONS]

Options:

  --matrices-dir PATH             Directory containing *_matrix.parquet files
                                  from kreview extract  [required]
  --output PATH                   Output directory  [default: output/]
  --models TEXT                   Comma-separated GPU models: tabpfn,tabicl
                                  [default: tabpfn,tabicl]
  --cv-folds INTEGER              Cross-validation folds  [default: 5]
  --no-finetune                   Use zero-shot inference instead of fine-
                                  tuning (not recommended)
  --finetune-epochs INTEGER       Fine-tuning epochs  [default: 30]
  --finetune-lr FLOAT             Fine-tuning learning rate  [default: 1e-05]
  --device TEXT                   PyTorch device: cuda, cpu  [default: cuda]
  --shap                          Compute SHAP values
  --shap-samples INTEGER          Max SHAP samples  [default: 500]
  --resume                        Skip evaluators with existing results
  --skip-gpu-joblib               Skip saving GPU model joblib files (can be
                                  >200MB each)
  --seed INTEGER                  Random seed for reproducibility.  [default:
                                  42]
  --deterministic / --no-deterministic
                                  Enable PyTorch deterministic mode (slower
                                  but reproducible).  [default: deterministic]

multimodal

Cross-evaluator multimodal evaluation with stacking and ablation.

Reads the per-evaluator *_model_results.json files, takes their out-of-fold (OOF) probabilities, and combines them into a stacking matrix. Three strategies are run:

Stacking: Meta-learner on OOF probabilities across evaluators
Raw features (if --super-matrix provided): MI or Boruta-SHAP selected features
Ablation: Leave-one-evaluator-out importance analysis

Usage:

kreview eval multimodal [OPTIONS]

Options:

  --results-dir PATH              Directory with *_model_results.json files
                                  from eval cpu/gpu  [required]
  --super-matrix PATH             Optional path to super_matrix.parquet for
                                  raw-feature strategy
  --output PATH                   Output directory  [default: output/]
  --models TEXT                   Comma-separated CPU models for multimodal
                                  evaluation (lr,rf,xgb)  [default: rf,xgb]
  --gpu-models TEXT               Comma-separated GPU models: tabpfn,tabicl.
                                  Empty = CPU only.
  --top-percentile FLOAT          Top N% of features for MI selection (matches
                                  per-evaluator pipeline)  [default: 10.0]
  --multimodal-selection TEXT     Multimodal feature selection: mi (default,
                                  fast) or boruta_shap (interaction-aware)
                                  [default: mi]
  --cv-folds INTEGER              Cross-validation folds  [default: 5]
  --device TEXT                   PyTorch device: cuda, cpu  [default: cuda]
  --no-finetune                   Zero-shot GPU inference (not recommended)
  --finetune-epochs INTEGER       GPU fine-tuning epochs  [default: 30]
  --finetune-lr FLOAT             GPU fine-tuning learning rate  [default:
                                  1e-05]
  --seed INTEGER                  Random seed for reproducibility.  [default:
                                  42]
  --deterministic / --no-deterministic
                                  Enable PyTorch deterministic mode (slower
                                  but reproducible).  [default: deterministic]
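To make the stacking and ablation strategies more concrete, here is a minimal sketch of how OOF probabilities might be arranged into a stacking matrix, and how leave-one-evaluator-out views could be derived from it. The input layout (`{evaluator: {sample_id: probability}}`), the function names, and the choice to keep only samples scored by every evaluator are all assumptions made for illustration. The schema of the results JSON is not documented here, and fitting the meta-learner is deliberately left out.

```python
def build_stacking_matrix(oof_by_evaluator):
    """Arrange OOF probabilities as rows=samples, columns=evaluators.

    oof_by_evaluator: {evaluator: {sample_id: oof_probability}} (assumed layout).
    """
    evaluators = sorted(oof_by_evaluator)
    if not evaluators:
        return [], [], []
    # Assumption: keep only samples that every evaluator scored, so rows are complete.
    shared = set.intersection(*(set(oof_by_evaluator[e]) for e in evaluators))
    sample_ids = sorted(shared)
    rows = [[oof_by_evaluator[e][s] for e in evaluators] for s in sample_ids]
    return sample_ids, evaluators, rows


def leave_one_out_views(evaluators, rows):
    """Yield (dropped_evaluator, kept_evaluators, reduced_rows) for ablation."""
    for i, dropped in enumerate(evaluators):
        kept = [e for j, e in enumerate(evaluators) if j != i]
        reduced = [[v for j, v in enumerate(row) if j != i] for row in rows]
        yield dropped, kept, reduced
```

In an ablation of this shape, a meta-learner would be refit on each reduced matrix, and the drop in performance relative to the full matrix would serve as that evaluator's importance.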

extract

Label samples and extract feature matrices (no eval/model/report).

Runs the labeling pipeline, then extracts features for each matched evaluator into *_matrix.parquet files. This is the first half of kreview run, designed for parallelized Nextflow execution.

Usage:

kreview extract [OPTIONS]

Options:

  --cancer-samplesheet PATH       Cancer samplesheet CSV  [required]
  --healthy-xs1-samplesheet PATH  Healthy XS1 samplesheet CSV  [required]
  --healthy-xs2-samplesheet PATH  Healthy XS2 samplesheet CSV  [required]
  --cbioportal-dir PATH           Directory with cBioPortal files  [required]
  --krewlyzer-dir TEXT            krewlyzer output directory  [required]
  --output PATH                   Output directory for matrices  [default:
                                  output/]
  --min-vaf FLOAT                 Min VAF for Possible ctDNA+ (default 1%)
                                  [default: 0.01]
  --min-fragments INTEGER         Min fragments PF for Depth QC (samples below
                                  are Insufficient Data)  [default: 2000]
  --min-variants INTEGER          Min # variants passing VAF for Possible
                                  ctDNA+  [default: 1]
  --ch-hotspot-maf PATH           Optional TSV of CH hotspot variants for CH-
                                  only demotion.
  --features TEXT                 Comma-separated evaluator names (default:
                                  all)
  --tier INTEGER                  Run only this tier
  --chunk-size TEXT               Samples per DuckDB read batch. 'auto'
                                  (default) probes parquet row density at
                                  runtime, or pass an integer to override
                                  (e.g. --chunk-size 200).  [default: auto]
  --labels PATH                   Path to a pre-computed labels.parquet file.
                                  When provided, skips the internal labeling
                                  step entirely. Used by Nextflow multistage
                                  to avoid re-running labeling per evaluator.
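The --chunk-size option is typed as TEXT because it accepts either the literal 'auto' or an integer. A minimal sketch of how such a value could be normalized is shown below; `parse_chunk_size` is a hypothetical helper, and returning `None` to mean "probe at runtime" is an assumption rather than kreview's documented behavior.

```python
def parse_chunk_size(value):
    """Return None for 'auto' (probe at runtime) or a positive int override.

    Hypothetical helper illustrating the 'auto'-or-integer convention.
    """
    text = str(value).strip().lower()
    if text == "auto":
        return None
    try:
        size = int(text)
    except ValueError:
        raise ValueError(
            f"--chunk-size must be 'auto' or an integer, got {value!r}"
        ) from None
    if size <= 0:
        raise ValueError("--chunk-size must be a positive integer")
    return size
```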

features-list

List all registered feature evaluators.

Usage:

kreview features-list [OPTIONS]

fuse

Fuse per-evaluator matrices into a single super-matrix.

Discovers all *_matrix.parquet files in --output-dir, extracts their feature columns (prefixed with evaluator name), outer-joins on SAMPLE_ID, and writes super_matrix.parquet for downstream multimodal evaluation.

Usage:

kreview fuse [OPTIONS]

Options:

  --output-dir PATH         Directory containing *_matrix.parquet files
                            [required]
  --min-evaluators INTEGER  Minimum number of evaluators a sample must appear
                            in to be retained  [default: 1]
  --output-name TEXT        Filename for the fused super-matrix (written to
                            --output-dir)  [default: super_matrix.parquet]
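The fusion step can be pictured as an outer join keyed on SAMPLE_ID, with each feature column prefixed by its evaluator name. The sketch below uses plain dictionaries in place of parquet files so that it stays self-contained. The `__` separator, the function name, and the use of `None` for missing values are assumptions made for illustration only.

```python
def fuse_matrices(matrices, min_evaluators=1):
    """Outer-join per-evaluator feature rows on sample ID.

    matrices: {evaluator: {sample_id: {feature: value}}} (assumed in-memory layout).
    Returns {sample_id: {"<evaluator>__<feature>": value_or_None}}.
    """
    # Collect every prefixed column first so all fused rows share the same keys.
    columns = sorted(
        {
            f"{ev}__{feat}"
            for ev, rows in matrices.items()
            for row in rows.values()
            for feat in row
        }
    )
    fused, seen_in = {}, {}
    for ev, rows in matrices.items():
        for sample_id, row in rows.items():
            out = fused.setdefault(sample_id, dict.fromkeys(columns))  # None = missing
            for feat, value in row.items():
                out[f"{ev}__{feat}"] = value
            seen_in[sample_id] = seen_in.get(sample_id, 0) + 1
    # Mirrors --min-evaluators: drop samples present in too few evaluators.
    return {s: r for s, r in fused.items() if seen_in[s] >= min_evaluators}
```

With `min_evaluators=1` every sample is kept, which matches the outer-join semantics. Raising the threshold trades sample count for fewer missing values in the fused matrix.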

label

Generate ctDNA labels without feature evaluation.

Usage:

kreview label [OPTIONS]

Options:

  --cancer-samplesheet PATH       Cancer samplesheet CSV  [required]
  --healthy-xs1-samplesheet PATH  Healthy XS1 samplesheet CSV  [required]
  --healthy-xs2-samplesheet PATH  Healthy XS2 samplesheet CSV  [required]
  --cbioportal-dir PATH           Directory with cBioPortal files  [required]
  --output PATH                   Output parquet file  [default:
                                  labels.parquet]
  --min-vaf FLOAT                 Min VAF for Possible ctDNA+ (default 1%)
                                  [default: 0.01]
  --min-fragments INTEGER         Min fragments PF for Depth QC (samples below
                                  are Insufficient Data)  [default: 2000]
  --min-variants INTEGER          Min # variants passing VAF for Possible
                                  ctDNA+  [default: 1]
  --ch-hotspot-maf PATH           Optional TSV of CH hotspot variants for CH-
                                  only demotion. Samples with only CH
                                  mutations are demoted to Possible ctDNA−.
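The threshold options above interact in a way that a small example may make easier to follow. The sketch below is only one plausible reading of the option descriptions, not kreview's labeling logic: the order of the checks, the `(vaf, is_ch)` input format, and the function name are assumptions, and cases the options do not describe return `None` rather than a guessed label.

```python
def threshold_label(n_fragments, variants, min_vaf=0.01, min_fragments=2000,
                    min_variants=1):
    """Apply the documented thresholds to one sample (illustrative only).

    variants: list of (vaf, is_ch_hotspot) tuples (assumed input format).
    """
    # --min-fragments: samples below the depth threshold are Insufficient Data.
    if n_fragments < min_fragments:
        return "Insufficient Data"
    # --min-vaf: keep the CH flag of every variant that passes the VAF cutoff.
    passing = [is_ch for vaf, is_ch in variants if vaf >= min_vaf]
    # --min-variants: too few passing variants is not covered by these options.
    if len(passing) < min_variants:
        return None
    # --ch-hotspot-maf: assumed to demote samples whose passing variants are all CH.
    if passing and all(passing):
        return "Possible ctDNA−"
    return "Possible ctDNA+"
```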

report

Regenerate HTML dashboards from existing matrix parquet files.

Scans --input-dir for *_matrix.parquet files, renders each as a standalone Quarto HTML dashboard, and writes them to --out-dir. Each evaluator is rendered independently, so a single failure does not block the remaining dashboards.

Usage:

kreview report [OPTIONS]

Options:

  --input-dir PATH         Directory with *_matrix.parquet files  [required]
  --out-dir PATH           Directory to deposit Quarto reports  [default:
                           reports/]
  --cvd-safe               Render dashboards and plots using an Okabe-Ito
                           Colorblind-Safe palette instead of default neon.
  --shap-samples INTEGER   Max samples for SHAP explainability computation in
                           dashboards.  [default: 500]
  --shap-features INTEGER  Max features to visualize in SHAP plots.  [default:
                           10]
  --multimodal             Render the multimodal dashboard for stacking model
                           results.

run

Run full pipeline: label → extract → evaluate → report.

Usage:

kreview run [OPTIONS]

Options:

  --cancer-samplesheet PATH       [required]
  --healthy-xs1-samplesheet PATH  [required]
  --healthy-xs2-samplesheet PATH  [required]
  --cbioportal-dir PATH           [required]
  --krewlyzer-dir TEXT            krewlyzer output directory  [required]
  --output PATH                   Output directory  [default: output/]
  --min-vaf FLOAT                 [default: 0.01]
  --min-fragments INTEGER         Min fragments PF for Depth QC (samples below
                                  are Insufficient Data)  [default: 2000]
  --min-variants INTEGER          [default: 1]
  --features TEXT                 Comma-separated features to run
  --tier INTEGER                  Run features of this tier only
  --cvd-safe                      Render dashboards and plots using an Okabe-
                                  Ito Colorblind-Safe palette instead of
                                  default neon.
  --skip-report / --no-skip-report
                                  Skip HTML report generation  [default: no-
                                  skip-report]
  --cv-folds INTEGER              Number of cross-validation folds (3-20,
                                  default 5)  [default: 5]
  --impute-strategy TEXT          Imputation strategy for missing values:
                                  median, mean, or zero  [default: median]
  --export-duckdb                 Export a persistent duckdb data lake
                                  containing all feature matrices
  --chunk-size TEXT               Samples per DuckDB read batch. 'auto'
                                  (default) probes parquet row density at
                                  runtime, or pass an integer to override
                                  (e.g. --chunk-size 200).  [default: auto]
  --top-n INTEGER                 [DEPRECATED] Use --top-percentile instead.
                                  If set, overrides --top-percentile with a
                                  fixed count.
  --top-percentile FLOAT          Top X% of features to select per metric
                                  (AUC, MI). The union of both sets feeds into
                                  models.  [default: 10.0]
  --strategy TEXT                 Feature selection strategy: mrmr (default)
                                  or hybrid_union  [default: mrmr]
  --multimodal-selection TEXT     Multimodal selection: mi (default) or
                                  boruta_shap  [default: mi]
  --shap-samples INTEGER          Max samples for SHAP explainability
                                  computation in dashboards. Lower values
                                  reduce memory usage.  [default: 500]
  --shap-features INTEGER         Max features to visualize in SHAP
                                  beeswarm/waterfall plots.  [default: 10]
  --resume                        Skip models whose AUC results already exist
                                  in the output JSON. Enables incremental
                                  runs: first CPU, then GPU with --resume.
  --compute-univariate-auc        Compute per-feature univariate LR AUC.
                                  Required for hybrid selection (default:
                                  True).  [default: True]
  --ch-hotspot-maf PATH           Optional TSV of CH hotspot variants for CH-
                                  only demotion. Samples with only CH
                                  mutations are demoted to Possible ctDNA−.
  --gpu-models TEXT               Comma-separated GPU models to run after CPU
                                  models: tabpfn,tabicl. Empty (default) = CPU
                                  only. Requires torch + model packages (pip
                                  install kreview[gpu]).
  --no-finetune                   Use zero-shot inference for GPU models
                                  instead of fine-tuning (not recommended).
  --finetune-epochs INTEGER       Number of fine-tuning epochs for GPU
                                  foundation models.  [default: 30]
  --finetune-lr FLOAT             Learning rate for GPU model fine-tuning.
                                  [default: 1e-05]
  --device TEXT                   PyTorch device for GPU models: cuda, cpu.
                                  [default: cuda]
  --skip-gpu-joblib               Skip saving GPU model joblib files (can be
                                  >200MB each).
  --seed INTEGER                  Random seed for reproducibility across all
                                  models and CV splits.  [default: 42]
  --deterministic / --no-deterministic
                                  Enable PyTorch deterministic mode (slower
                                  but reproducible). Default: True.  [default:
                                  deterministic]

select

Score features and apply feature selection to extracted matrices.

Reads all *_matrix.parquet files from --matrices-dir, computes feature scores (univariate AUC + mutual information), and selects the top features using mRMR (default, redundancy-aware) or hybrid union (top N% by AUC ∪ top N% by MI).

Writes selected matrices, eval stats, and QC metadata to --output (or overwrites originals when --overwrite is set).

Usage:

kreview select [OPTIONS]

Options:

  --matrices-dir PATH             Directory with *_matrix.parquet files from
                                  kreview extract  [required]
  --top-percentile FLOAT          Top X% of features to select (controls K
                                  for mRMR, or per-metric cutoff for
                                  hybrid_union)  [default: 50.0]
  --strategy TEXT                 Feature selection strategy: mrmr (default,
                                  redundancy-aware) or hybrid_union (AUC∪MI)
                                  [default: mrmr]
  --cv-folds INTEGER              Cross-validation folds for univariate AUC
                                  scoring  [default: 5]
  --impute-strategy TEXT          Imputation strategy for variance check:
                                  median, mean, zero  [default: median]
  --output PATH                   Output directory for selected matrices
                                  (ignored when --overwrite is set)  [default:
                                  output/]
  --overwrite                     Overwrite original matrices in --matrices-
                                  dir instead of writing to --output
  --compute-univariate-auc / --no-compute-univariate-auc
                                  Compute per-feature univariate AUC (disable
                                  for MI-only selection)  [default: compute-
                                  univariate-auc]
  --seed INTEGER                  Random seed for reproducibility.  [default:
                                  42]
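The hybrid_union strategy (top N% by AUC ∪ top N% by MI) can be illustrated with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, higher scores are assumed to be better for both metrics, and the rounding rule (ceiling, with at least one feature per metric) is a guess rather than kreview's documented behavior. The mRMR strategy is not shown.

```python
import math


def top_percentile(scores, percentile):
    """Return the names of the top `percentile`% features by score.

    scores: {feature_name: score}; ties are broken by feature name.
    """
    # Assumed rounding: ceil, and always keep at least one feature.
    k = max(1, math.ceil(len(scores) * percentile / 100.0))
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    return set(ranked[:k])


def hybrid_union(auc, mi, percentile=50.0):
    """Union of the top-N% features by univariate AUC and by mutual information."""
    return top_percentile(auc, percentile) | top_percentile(mi, percentile)
```

Because the result is a union, it can contain up to twice as many features as either per-metric cutoff alone when the two rankings disagree.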