Evaluation Engine API Reference

The kreview.eval_engine module contains the statistical testing functions, ML model training routines, visualization generators, and clinical-utility computations.

For conceptual explanations, see:


kreview.eval_engine

FeatureEvaluator

Base class for all feature evaluators. Defines the extraction contract that transforms raw DuckDB queries into 1D arrays.

extract(df)

Transform the loaded raw dataframe into meaningful scalar metrics. Called per sample-group or per sample.
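A minimal subclass can illustrate this contract. Everything below other than the `extract(df)` signature is an illustrative assumption: the `FragmentLengthEvaluator` name, the `sample_id`/`length` columns, and the median metric are not part of the library.

```python
import numpy as np
import pandas as pd


class FeatureEvaluator:
    """Base contract: raw rows in, 1D array of scalar metrics out."""

    def extract(self, df: pd.DataFrame) -> np.ndarray:
        raise NotImplementedError


class FragmentLengthEvaluator(FeatureEvaluator):
    """Hypothetical evaluator: one scalar per sample group."""

    def extract(self, df: pd.DataFrame) -> np.ndarray:
        # Reduce each sample's rows to a single scalar metric
        # (here, the median fragment length).
        return df.groupby("sample_id")["length"].median().to_numpy()
```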

parse_array(s)

Parse a string-encoded numeric array into a list of floats.

Handles formats like '[1.0 2.0 3.0]' from parquet serialization. Returns an empty list on any parse failure (no silent corruption).
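Based only on the documented behavior (space-separated values inside brackets, empty list on any failure), a sketch implementation could look like this; it is not the library source:

```python
def parse_array(s):
    """Parse a string-encoded numeric array like '[1.0 2.0 3.0]'.

    Sketch of the documented contract: returns [] on any parse
    failure rather than a partially parsed list.
    """
    try:
        # Strip the surrounding brackets, then split on whitespace.
        return [float(tok) for tok in str(s).strip("[]").split()]
    except (ValueError, AttributeError):
        return []
```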

univariate_auc(feature_col, y, n_folds=5, random_state=42)

Compute cross-validated AUC for a single feature using univariate logistic regression (LR).

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `feature_col` | | pandas Series or array-like of a single feature. | *required* |
| `y` | | Binary label array (0/1). | *required* |
| `n_folds` | `int` | Number of CV folds. | `5` |
| `random_state` | `int` | Random seed. | `42` |

Returns:

| Type | Description |
| --- | --- |
| `float` | Cross-validated AUC. Returns 0.5 if the feature is constant, there are too few samples per class, or CV fails. |
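The documented fallbacks can be sketched without the full CV loop. The rank-based (Mann-Whitney) AUC below is a stand-in for the cross-validated LR AUC, and the "too few samples per class" guard is assumed to compare against `n_folds`; treat this as an approximation, not the library implementation.

```python
import numpy as np


def univariate_auc_sketch(feature_col, y, n_folds=5, random_state=42):
    """Sketch of univariate_auc's documented guards and output range."""
    x = np.asarray(feature_col, dtype=float)
    y = np.asarray(y)
    # Documented fallbacks: constant feature, or too few samples per class.
    if np.all(x == x[0]) or min((y == 0).sum(), (y == 1).sum()) < n_folds:
        return 0.5
    pos, neg = x[y == 1], x[y == 0]
    # Rank AUC = P(pos > neg) + 0.5 * P(tie).
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return greater + 0.5 * ties
```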

set_theme(cvd_safe=False)

Dynamically updates the global label and model colors based on the color-vision-deficiency (CVD) preference.

evaluate_feature(feature_values, labels, total_fragments=None, max_vaf=None)

Run all statistical tests for a single feature in one stratum and write the resulting metrics directly into the scoring dict.

plot_violin(df, feature_col, label_col='label', title='')

4-group violin with overlaid box plot and individual points for small groups.

plot_density(df, feature_col, label_col='label', title='')

Overlaid density curves per group — shows distribution shape differences.

plot_feature_vs_vaf(df, feature_col, vaf_col='max_vaf', label_col='label', title='')

Continuous relationship between feature and tumor burden (VAF proxy).

plot_roc_curves(y_true_dict, y_score_dict, title='')

Overlay ROC curves for multiple comparisons.

plot_feature_importance(importances, title='')

Bar plot of RF feature importances.

plot_threshold_sensitivity(results_df, title='')

Show how label counts shift with VAF/min_variants thresholds.

decision_curve_analysis(y_true, y_prob, thresholds=None)

Compute Decision Curve Analysis (DCA) net benefit data.

For each threshold, calculates the net benefit of using the model vs treating all or treating none. This helps clinicians choose an operating threshold that balances false positives against missed detections.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `y_true` | `ndarray` | Binary ground truth labels (0/1). | *required* |
| `y_prob` | `ndarray` | Predicted probabilities for the positive class. | *required* |
| `thresholds` | `ndarray \| None` | Array of decision thresholds to evaluate. Defaults to `np.linspace(0.01, 0.99, 99)`. | `None` |

Returns:

| Type | Description |
| --- | --- |
| `dict` | Dictionary with keys `thresholds`, `net_benefit_model`, and `net_benefit_treat_all`. |
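Assuming the standard net-benefit formula NB(t) = TP/N - FP/N * t/(1 - t), the documented return contract can be sketched as follows (the real function may differ in details such as vectorization):

```python
import numpy as np


def decision_curve_analysis(y_true, y_prob, thresholds=None):
    """Sketch of DCA net benefit under the standard formula."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    if thresholds is None:
        thresholds = np.linspace(0.01, 0.99, 99)
    n = len(y_true)
    prevalence = y_true.mean()
    nb_model, nb_all = [], []
    for t in thresholds:
        pred = y_prob >= t
        tp = np.sum(pred & (y_true == 1)) / n
        fp = np.sum(pred & (y_true == 0)) / n
        odds = t / (1.0 - t)  # weight on false positives at threshold t
        nb_model.append(tp - fp * odds)
        # Treat-all baseline: TP rate = prevalence, FP rate = 1 - prevalence.
        nb_all.append(prevalence - (1.0 - prevalence) * odds)
    return {
        "thresholds": np.asarray(thresholds),
        "net_benefit_model": np.asarray(nb_model),
        "net_benefit_treat_all": np.asarray(nb_all),
    }
```

A model is clinically useful at threshold t when its net benefit exceeds both the treat-all curve and zero (treat none).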

single_feature_model(X, y, feature_names=None, cancer_types=None, assays=None, n_folds=5, random_state=42)

Train logistic regression (LR), random forest (RF), and XGBoost (XGB) models on a feature matrix with stratified CV.

Returns (results_dict, lr_pipeline, rf_model, xgb_model).
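One of the audit fixes, M-02 (bootstrap 95% CI on AUC), can be sketched in isolation. The function name, the percentile-bootstrap approach, and the rank-based AUC below are illustrative assumptions; the library's own CV and model-fitting code is omitted.

```python
import numpy as np


def bootstrap_auc_ci(y_true, y_score, n_boot=1000, seed=42):
    """Percentile bootstrap 95% CI for AUC (sketch of fix M-02)."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    rng = np.random.default_rng(seed)

    def auc(y, s):
        pos, neg = s[y == 1], s[y == 0]
        if len(pos) == 0 or len(neg) == 0:
            return np.nan  # resample lost a class; skip it
        return ((pos[:, None] > neg[None, :]).mean()
                + 0.5 * (pos[:, None] == neg[None, :]).mean())

    stats = []
    for _ in range(n_boot):
        # Resample with replacement, keeping (label, score) pairs together.
        idx = rng.integers(0, len(y_true), len(y_true))
        a = auc(y_true[idx], y_score[idx])
        if not np.isnan(a):
            stats.append(a)
    if not stats:
        return np.array([np.nan, np.nan])
    return np.percentile(stats, [2.5, 97.5])
```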

Fixes applied (audit v3): - C-01: LR uses Pipeline(scaler+lr) to prevent data leakage - C-02: Subgroup metrics use out-of-fold predictions (unbiased) - H-01: LR has class_weight="balanced", XGB has scale_pos_weight - H-07: Bare except replaced with Exception - M-02: Bootstrap 95% CI on AUC values