Command Line Interface (CLI)
kreview exposes every primary pipeline stage as a terminal command, built with Typer.
kreview
ctDNA fragmentomics feature evaluation
Usage:
Options:
--version
--install-completion Install completion for the current shell.
--show-completion Show completion for the current shell, to copy it or
customize the installation.
eval
Model evaluation commands
Usage:
cpu
Per-evaluator evaluation using LR, RF, XGBoost (CPU).
Iterates over all *_matrix.parquet files in --matrices-dir, trains the specified models, and writes *_model_results.json to --output.
Usage:
Options:
--matrices-dir PATH Directory containing *_matrix.parquet files
from kreview extract [required]
--output PATH Output directory [default: output/]
--models TEXT Comma-separated CPU models: lr,rf,xgb
[default: lr,rf,xgb]
--cv-folds INTEGER Cross-validation folds [default: 5]
--resume Skip evaluators with existing results
--seed INTEGER Random seed for reproducibility. [default:
42]
--deterministic / --no-deterministic
Enable PyTorch deterministic mode (slower
but reproducible). [default: deterministic]
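The file discovery and --resume behavior described for eval cpu can be pictured with a short sketch. This is not kreview's implementation: the helper name pending_evaluators is hypothetical, and the pairing of <name>_matrix.parquet with <name>_model_results.json is an assumption inferred from the file patterns in the help text.

```python
from pathlib import Path

SUFFIX = "_matrix.parquet"


def pending_evaluators(matrices_dir, output_dir, resume=True):
    """Return evaluator names that still need model results.

    Assumption (not confirmed by kreview): a matrix named
    <name>_matrix.parquet yields <name>_model_results.json.
    """
    matrices_dir, output_dir = Path(matrices_dir), Path(output_dir)
    pending = []
    for matrix in sorted(matrices_dir.glob("*" + SUFFIX)):
        name = matrix.name[: -len(SUFFIX)]
        result = output_dir / f"{name}_model_results.json"
        if resume and result.exists():
            continue  # same idea as --resume: skip finished evaluators
        pending.append(name)
    return pending
```

Calling pending_evaluators("output/", "output/") before a long run is one way to preview which evaluators a resumed invocation would still have to train, under the naming assumption above.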
gpu
Per-evaluator evaluation using TabPFN, TabICL (GPU).
Fine-tuning is on by default; pass --no-finetune for zero-shot inference. Iterates over all *_matrix.parquet files in --matrices-dir and writes *_model_results.json files to --output.
Usage:
Options:
--matrices-dir PATH Directory containing *_matrix.parquet files
from kreview extract [required]
--output PATH Output directory [default: output/]
--models TEXT Comma-separated GPU models: tabpfn,tabicl
[default: tabpfn,tabicl]
--cv-folds INTEGER Cross-validation folds [default: 5]
--no-finetune Use zero-shot inference instead of fine-
tuning (not recommended)
--finetune-epochs INTEGER Fine-tuning epochs [default: 30]
--finetune-lr FLOAT Fine-tuning learning rate [default: 1e-05]
--device TEXT PyTorch device: cuda, cpu [default: cuda]
--shap Compute SHAP values
--shap-samples INTEGER Max SHAP samples [default: 500]
--resume Skip evaluators with existing results
--skip-gpu-joblib Skip saving GPU model joblib files (can be
>200MB each)
--seed INTEGER Random seed for reproducibility. [default:
42]
--deterministic / --no-deterministic
Enable PyTorch deterministic mode (slower
but reproducible). [default: deterministic]
multimodal
Cross-evaluator multimodal evaluation with stacking and ablation.
Reads per-evaluator *_model_results.json files for out-of-fold (OOF) probabilities and combines them into a stacking matrix. Three strategies are run:
- Stacking: Meta-learner on OOF probabilities across evaluators
- Raw features (if --super-matrix provided): MI or Boruta-SHAP selected features
- Ablation: Leave-one-evaluator-out importance analysis
Usage:
Options:
--results-dir PATH Directory with *_model_results.json files
from eval cpu/gpu [required]
--super-matrix PATH Optional path to super_matrix.parquet for
raw-feature strategy
--output PATH Output directory [default: output/]
--models TEXT Comma-separated CPU models for multimodal
evaluation (lr,rf,xgb) [default: rf,xgb]
--gpu-models TEXT Comma-separated GPU models: tabpfn,tabicl.
Empty = CPU only.
--top-percentile FLOAT Top N% of features for MI selection (matches
per-evaluator pipeline) [default: 10.0]
--multimodal-selection TEXT Multimodal feature selection: mi (default,
fast) or boruta_shap (interaction-aware)
[default: mi]
--cv-folds INTEGER Cross-validation folds [default: 5]
--device TEXT PyTorch device: cuda, cpu [default: cuda]
--no-finetune Zero-shot GPU inference (not recommended)
--finetune-epochs INTEGER GPU fine-tuning epochs [default: 30]
--finetune-lr FLOAT GPU fine-tuning learning rate [default:
1e-05]
--seed INTEGER Random seed for reproducibility. [default:
42]
--deterministic / --no-deterministic
Enable PyTorch deterministic mode (slower
but reproducible). [default: deterministic]
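The stacking and ablation strategies of eval multimodal can be illustrated with plain dictionaries. This is a minimal sketch, not kreview's code: the in-memory shape {evaluator: {sample_id: probability}}, the function names, and the choice to keep only samples shared by every evaluator are all assumptions, since the JSON schema is not documented here.

```python
def build_stacking_matrix(oof_by_evaluator):
    """Combine per-evaluator OOF probabilities into one row per sample.

    oof_by_evaluator: {evaluator: {sample_id: probability}} (hypothetical
    shape). Only samples present in every evaluator are kept here.
    """
    if not oof_by_evaluator:
        return [], {}
    evaluators = sorted(oof_by_evaluator)
    shared = set.intersection(*(set(v) for v in oof_by_evaluator.values()))
    rows = {
        sample: [oof_by_evaluator[ev][sample] for ev in evaluators]
        for sample in sorted(shared)
    }
    return evaluators, rows


def leave_one_out_views(evaluators, rows):
    """Yield (dropped_evaluator, reduced_rows) pairs for ablation."""
    for i, dropped in enumerate(evaluators):
        reduced = {s: vals[:i] + vals[i + 1:] for s, vals in rows.items()}
        yield dropped, reduced
```

A meta-learner would be fitted on the rows of the full matrix, then refitted on each reduced view; the drop in performance when one evaluator is removed is what the ablation strategy reports as that evaluator's importance.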
extract
Label samples and extract feature matrices (no eval/model/report).
Runs the labeling pipeline, then extracts features for each matched
evaluator into *_matrix.parquet files. This is the first half of
kreview run, designed for parallelized Nextflow execution.
Usage:
Options:
--cancer-samplesheet PATH Cancer samplesheet CSV [required]
--healthy-xs1-samplesheet PATH Healthy XS1 samplesheet CSV [required]
--healthy-xs2-samplesheet PATH Healthy XS2 samplesheet CSV [required]
--cbioportal-dir PATH Directory with cBioPortal files [required]
--krewlyzer-dir TEXT krewlyzer output directory [required]
--output PATH Output directory for matrices [default:
output/]
--min-vaf FLOAT Min VAF for Possible ctDNA+ (default 1%)
[default: 0.01]
--min-fragments INTEGER Min fragments PF for Depth QC (samples below
are Insufficient Data) [default: 2000]
--min-variants INTEGER Min # variants passing VAF for Possible
ctDNA+ [default: 1]
--ch-hotspot-maf PATH Optional TSV of CH hotspot variants for CH-
only demotion.
--features TEXT Comma-separated evaluator names (default:
all)
--tier INTEGER Run only this tier
--chunk-size TEXT Samples per DuckDB read batch. 'auto'
(default) probes parquet row density at
runtime, or pass an integer to override
(e.g. --chunk-size 200). [default: auto]
--labels PATH Path to a pre-computed labels.parquet file.
When provided, skips the internal labeling
step entirely. Used by Nextflow multistage
to avoid re-running labeling per evaluator.
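The multistage pattern that --labels enables (label once, then extract per evaluator) might be scripted as below. The sketch only assembles argument lists from flags listed in this reference; every file path and the evaluator names evaluator_a and evaluator_b are placeholders, and the samplesheet flags are repeated for extract because the help text marks them as required.

```python
import shlex

# Placeholder inputs; substitute real paths.
SHARED = [
    "--cancer-samplesheet", "cancer.csv",
    "--healthy-xs1-samplesheet", "healthy_xs1.csv",
    "--healthy-xs2-samplesheet", "healthy_xs2.csv",
    "--cbioportal-dir", "cbioportal/",
]


def label_cmd(labels_path="labels.parquet"):
    """Argument list for a single labeling pass."""
    return ["kreview", "label", *SHARED, "--output", labels_path]


def extract_cmd(evaluator, labels_path="labels.parquet"):
    """Argument list for one evaluator, reusing pre-computed labels."""
    return [
        "kreview", "extract", *SHARED,
        "--krewlyzer-dir", "krewlyzer_out/",
        "--output", "output/",
        "--features", evaluator,
        "--labels", labels_path,
    ]


if __name__ == "__main__":
    print(shlex.join(label_cmd()))
    for name in ["evaluator_a", "evaluator_b"]:  # hypothetical names
        print(shlex.join(extract_cmd(name)))
```

Each printed line can be run as an independent job, for example by passing the list to subprocess.run, which is the property a workflow engine relies on when it fans extraction out across evaluators.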
features-list
List all registered feature evaluators.
Usage:
fuse
Fuse per-evaluator matrices into a single super-matrix.
Discovers all *_matrix.parquet files in --output-dir, extracts
their feature columns (prefixed with evaluator name), outer-joins on
SAMPLE_ID, and writes super_matrix.parquet for downstream multimodal
evaluation.
Usage:
Options:
--output-dir PATH Directory containing *_matrix.parquet files
[required]
--min-evaluators INTEGER Minimum number of evaluators a sample must appear
in to be retained [default: 1]
--output-name TEXT Filename for the fused super-matrix (written to
--output-dir) [default: super_matrix.parquet]
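The join that fuse describes (prefix each feature with its evaluator name, outer-join on SAMPLE_ID, drop samples seen in too few evaluators) can be sketched with nested dictionaries. This is an illustration under assumptions: the "__" separator, the use of None for missing values, and the function name fuse are not taken from kreview.

```python
def fuse(matrices, min_evaluators=1):
    """Outer-join per-evaluator feature rows on sample ID.

    matrices: {evaluator: {sample_id: {feature: value}}} (hypothetical
    in-memory stand-in for the *_matrix.parquet files).
    """
    columns = sorted(
        f"{ev}__{feat}"  # "__" separator is an assumption
        for ev, samples in matrices.items()
        for feat in {f for row in samples.values() for f in row}
    )
    sample_ids = sorted({s for samples in matrices.values() for s in samples})
    fused = {}
    for sid in sample_ids:
        present = [ev for ev, samples in matrices.items() if sid in samples]
        if len(present) < min_evaluators:
            continue  # same idea as --min-evaluators
        row = dict.fromkeys(columns)  # None marks a missing value
        for ev in present:
            for feat, value in matrices[ev][sid].items():
                row[f"{ev}__{feat}"] = value
        fused[sid] = row
    return fused
```

Because the join is an outer join, a sample missing from one evaluator keeps its row with empty cells for that evaluator's columns; raising min_evaluators trades sample count for completeness.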
label
Generate ctDNA labels without feature evaluation.
Usage:
Options:
--cancer-samplesheet PATH Cancer samplesheet CSV [required]
--healthy-xs1-samplesheet PATH Healthy XS1 samplesheet CSV [required]
--healthy-xs2-samplesheet PATH Healthy XS2 samplesheet CSV [required]
--cbioportal-dir PATH Directory with cBioPortal files [required]
--output PATH Output parquet file [default:
labels.parquet]
--min-vaf FLOAT Min VAF for Possible ctDNA+ (default 1%)
[default: 0.01]
--min-fragments INTEGER Min fragments PF for Depth QC (samples below
are Insufficient Data) [default: 2000]
--min-variants INTEGER Min # variants passing VAF for Possible
ctDNA+ [default: 1]
--ch-hotspot-maf PATH Optional TSV of CH hotspot variants for CH-
only demotion. Samples with only CH
mutations are demoted to Possible ctDNA−.
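How the three thresholds and the CH hotspot list interact can be approximated in a few lines. This is a hedged reading of the help text, not the actual labeling pipeline: the order of the checks, the inclusive comparisons, the interpretation of "only CH mutations" as "every passing variant is a CH hotspot", and the function name are assumptions, and labels decided by other rules are returned as None.

```python
def threshold_label(fragments, variants, min_vaf=0.01, min_fragments=2000,
                    min_variants=1, ch_hotspots=frozenset()):
    """Apply the documented thresholds to one sample.

    variants: list of (variant_id, vaf) tuples (hypothetical shape).
    Returns a label named in the help text, or None when these
    thresholds alone do not decide the label.
    """
    if fragments < min_fragments:
        return "Insufficient Data"  # fails depth QC
    passing = [vid for vid, vaf in variants if vaf >= min_vaf]
    if len(passing) < min_variants:
        return None  # not covered by this sketch
    if passing and ch_hotspots and all(v in ch_hotspots for v in passing):
        return "Possible ctDNA−"  # CH-only demotion
    return "Possible ctDNA+"
```

For instance, a sample with 5000 fragments and a single variant at 5% VAF comes out as Possible ctDNA+ with the defaults, and is demoted only if that variant appears in the supplied hotspot set.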
report
Regenerate HTML dashboards from existing matrix parquet files.
Scans --input-dir for *_matrix.parquet files, renders each as a
standalone Quarto HTML dashboard, and writes them to --out-dir.
Each evaluator is rendered independently so a single failure does not
block the remaining dashboards.
Usage:
Options:
--input-dir PATH Directory with *_matrix.parquet files [required]
--out-dir PATH Directory to deposit Quarto reports [default:
reports/]
--cvd-safe Render dashboards and plots using an Okabe-Ito
Colorblind-Safe palette instead of default neon.
--shap-samples INTEGER Max samples for SHAP explainability computation in
dashboards. [default: 500]
--shap-features INTEGER Max features to visualize in SHAP plots. [default:
10]
--multimodal Render the multimodal dashboard for stacking model
results.
run
Run full pipeline: label → extract → evaluate → report.
Usage:
Options:
--cancer-samplesheet PATH [required]
--healthy-xs1-samplesheet PATH [required]
--healthy-xs2-samplesheet PATH [required]
--cbioportal-dir PATH [required]
--krewlyzer-dir TEXT krewlyzer output directory [required]
--output PATH Output directory [default: output/]
--min-vaf FLOAT [default: 0.01]
--min-fragments INTEGER Min fragments PF for Depth QC (samples below
are Insufficient Data) [default: 2000]
--min-variants INTEGER [default: 1]
--features TEXT Comma-separated features to run
--tier INTEGER Run features of this tier only
--cvd-safe Render dashboards and plots using an Okabe-
Ito Colorblind-Safe palette instead of
default neon.
--skip-report / --no-skip-report
Skip HTML report generation [default: no-
skip-report]
--cv-folds INTEGER Number of cross-validation folds (3-20,
default 5) [default: 5]
--impute-strategy TEXT Imputation strategy for missing values:
median, mean, or zero [default: median]
--export-duckdb Export a persistent duckdb data lake
containing all feature matrices
--chunk-size TEXT Samples per DuckDB read batch. 'auto'
(default) probes parquet row density at
runtime, or pass an integer to override
(e.g. --chunk-size 200). [default: auto]
--top-n INTEGER [DEPRECATED] Use --top-percentile instead.
If set, overrides --top-percentile with a
fixed count.
--top-percentile FLOAT Top X% of features to select per metric
(AUC, MI). The union of both sets feeds into
models. [default: 10.0]
--strategy TEXT Feature selection strategy: mrmr (default)
or hybrid_union [default: mrmr]
--multimodal-selection TEXT Multimodal selection: mi (default) or
boruta_shap [default: mi]
--shap-samples INTEGER Max samples for SHAP explainability
computation in dashboards. Lower values
reduce memory usage. [default: 500]
--shap-features INTEGER Max features to visualize in SHAP
beeswarm/waterfall plots. [default: 10]
--resume Skip models whose AUC results already exist
in the output JSON. Enables incremental
runs: first CPU, then GPU with --resume.
--compute-univariate-auc Compute per-feature univariate LR AUC.
Required for hybrid selection. [default:
True]
--ch-hotspot-maf PATH Optional TSV of CH hotspot variants for CH-
only demotion. Samples with only CH
mutations are demoted to Possible ctDNA−.
--gpu-models TEXT Comma-separated GPU models to run after CPU
models: tabpfn,tabicl. Empty (default) = CPU
only. Requires torch + model packages (pip
install kreview[gpu]).
--no-finetune Use zero-shot inference for GPU models
instead of fine-tuning (not recommended).
--finetune-epochs INTEGER Number of fine-tuning epochs for GPU
foundation models. [default: 30]
--finetune-lr FLOAT Learning rate for GPU model fine-tuning.
[default: 1e-05]
--device TEXT PyTorch device for GPU models: cuda, cpu.
[default: cuda]
--skip-gpu-joblib Skip saving GPU model joblib files (can be
>200MB each).
--seed INTEGER Random seed for reproducibility across all
models and CV splits. [default: 42]
--deterministic / --no-deterministic
Enable PyTorch deterministic mode (slower
but reproducible). Default: True. [default:
deterministic]
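The incremental workflow mentioned under --resume (CPU models first, GPU models in a second pass) could be assembled as two invocations. The sketch uses only flags listed above; all paths are placeholders, and skipping the report in the first pass is a design choice of this example, not a documented recommendation.

```python
import shlex

# Placeholder inputs; substitute real paths.
BASE = [
    "kreview", "run",
    "--cancer-samplesheet", "cancer.csv",
    "--healthy-xs1-samplesheet", "healthy_xs1.csv",
    "--healthy-xs2-samplesheet", "healthy_xs2.csv",
    "--cbioportal-dir", "cbioportal/",
    "--krewlyzer-dir", "krewlyzer_out/",
    "--output", "output/",
    "--seed", "42",
]

# Pass 1: CPU models only (no --gpu-models), report deferred.
cpu_pass = BASE + ["--skip-report"]

# Pass 2: same inputs and seed; --resume skips models that already
# have results, so only the GPU models are trained.
gpu_pass = BASE + ["--resume", "--gpu-models", "tabpfn,tabicl",
                   "--device", "cuda"]

if __name__ == "__main__":
    print(shlex.join(cpu_pass))
    print(shlex.join(gpu_pass))
```

Keeping --output and --seed identical across both passes matters here, since --resume looks for existing results in the output location and the seed governs the CV splits.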
select
Score features and apply feature selection to extracted matrices.
Reads all *_matrix.parquet files from --matrices-dir, computes
feature scores (univariate AUC + mutual information), and selects the
top features using mRMR (default, redundancy-aware) or hybrid union
(top N% by AUC ∪ top N% by MI).
Writes selected matrices, eval stats, and QC metadata to --output
(or overwrites originals when --overwrite is set).
Usage:
Options:
--matrices-dir PATH Directory with *_matrix.parquet files from
kreview extract [required]
--top-percentile FLOAT Top X% of features to select (controls K
for mRMR, or per-metric cutoff for
hybrid_union) [default: 50.0]
--strategy TEXT Feature selection strategy: mrmr (default,
redundancy-aware) or hybrid_union (AUC∪MI)
[default: mrmr]
--cv-folds INTEGER Cross-validation folds for univariate AUC
scoring [default: 5]
--impute-strategy TEXT Imputation strategy for variance check:
median, mean, zero [default: median]
--output PATH Output directory for selected matrices
(ignored when --overwrite is set) [default:
output/]
--overwrite Overwrite original matrices in --matrices-
dir instead of writing to --output
--compute-univariate-auc / --no-compute-univariate-auc
Compute per-feature univariate AUC (disable
for MI-only selection) [default: compute-
univariate-auc]
--seed INTEGER Random seed for reproducibility. [default:
42]
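The hybrid_union strategy (top N% by AUC ∪ top N% by MI) reduces to two rankings and a set union. The following is a minimal sketch, not kreview's implementation: rounding the cutoff up with ceil, keeping at least one feature, breaking ties by name, and the function names are assumptions. mRMR is a different, redundancy-aware procedure and is not shown.

```python
import math


def top_fraction(scores, top_percentile):
    """Names of the highest-scoring features within the given percentile."""
    k = max(1, math.ceil(len(scores) * top_percentile / 100.0))
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return set(ranked[:k])


def hybrid_union(auc, mi, top_percentile=50.0):
    """Union of the top features by univariate AUC and by mutual information."""
    return top_fraction(auc, top_percentile) | top_fraction(mi, top_percentile)
```

The union can exceed N% of the features when the two metrics disagree, which is the point of the strategy: a feature that ranks highly on either metric survives.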