
Labeling API Reference

The kreview.labels module implements the 5-tier ctDNA labeling engine.

For the biological rationale behind each label, see the ctDNA Labeling guide.


kreview.labels

CtDNALabeler

Assigns 5-tier ctDNA labels to ACCESS cfDNA samples: True ctDNA+, Possible ctDNA+, Possible ctDNA−, Healthy Normal, and Insufficient Data.

Features
  • IMPACT tissue rescue for True ctDNA+ confirmation
  • Continuous VAF regression targets (mean_vaf, std_vaf)
  • Optional CH hotspot filtering with automatic demotion of samples whose only evidence is clonal hematopoiesis
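To make the label set concrete, here is a minimal sketch that models the five tiers as a Python Enum and expresses the CH-only demotion condition as a predicate. CtDNALabel and is_ch_only are hypothetical names, not part of the documented API. The predicate also assumes that any SV or CNA evidence keeps a sample from counting as CH-only. This page does not say which tier a CH-only sample is demoted to, so the sketch stops at detecting the condition.

```python
from enum import Enum


class CtDNALabel(str, Enum):
    """The five tiers named in the CtDNALabeler description (hypothetical type)."""
    TRUE_POSITIVE = "True ctDNA+"
    POSSIBLE_POSITIVE = "Possible ctDNA+"
    POSSIBLE_NEGATIVE = "Possible ctDNA−"
    HEALTHY_NORMAL = "Healthy Normal"
    INSUFFICIENT_DATA = "Insufficient Data"


def is_ch_only(n_somatic_snvs: int, n_non_ch_variants: int,
               has_sv: bool = False, has_cna: bool = False) -> bool:
    """Hypothetical check for the CH-only demotion condition.

    True when the sample has VAF-passing SNVs, none of them is non-CH,
    and (assumption) there is no SV or CNA evidence either.
    """
    return (n_somatic_snvs > 0 and n_non_ch_variants == 0
            and not has_sv and not has_cna)
```

Mixing in str lets each member compare equal to its display string (CtDNALabel.HEALTHY_NORMAL == "Healthy Normal"), which keeps labels readable when written out to a table.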

load_ch_hotspots(ch_maf_path)

Load a CH hotspot MAF file and return a set of variant keys.

The MAF file must have columns: Hugo_Symbol, Chromosome, Start_Position, Reference_Allele, Tumor_Seq_Allele2.

Each row is converted to a (chrom, pos, ref, alt) tuple for fast lookup during SNV summary computation.
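A minimal sketch of this parsing step, assuming a tab-separated MAF whose header row may be preceded by '#' comment lines. The function names parse_ch_hotspot_maf and load_ch_hotspot_keys are hypothetical, and storing the position as an int is an assumption; the real loader may key on strings or read the file with pandas.

```python
import csv
from typing import Set, Tuple

# Hypothetical key type: (chrom, pos, ref, alt), with pos stored as int.
VariantKey = Tuple[str, int, str, str]

REQUIRED_COLUMNS = ("Hugo_Symbol", "Chromosome", "Start_Position",
                    "Reference_Allele", "Tumor_Seq_Allele2")


def parse_ch_hotspot_maf(text: str) -> Set[VariantKey]:
    """Parse tab-separated MAF text into a set of (chrom, pos, ref, alt) keys."""
    # Drop blank lines and '#' version/comment lines that often precede the header.
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"MAF is missing required columns: {missing}")
    # Hugo_Symbol is required but not part of the key, matching the documented tuple.
    return {
        (row["Chromosome"], int(row["Start_Position"]),
         row["Reference_Allele"], row["Tumor_Seq_Allele2"])
        for row in reader
    }


def load_ch_hotspot_keys(ch_maf_path: str) -> Set[VariantKey]:
    """Read a CH hotspot MAF from disk and return its variant keys."""
    with open(ch_maf_path, encoding="utf-8") as fh:
        return parse_ch_hotspot_maf(fh.read())
```

Returning a set means each membership test during SNV summarization is an average O(1) hash lookup, which is what keeps per-variant CH tagging cheap.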

compute_impact_match(eligible_ids, maf, clinical)

For each eligible ACCESS sample, check whether any somatic variant, at any VAF, also appears in the same patient's IMPACT tissue sample.
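The matching rule can be sketched as a set intersection between each ACCESS sample's variants and the pooled variants of that patient's IMPACT tissue samples. The input shapes used here (MAF rows as dicts with a Tumor_Sample_Barcode field, and clinical as a mapping from sample ID to a (patient ID, assay) pair) and the name impact_match are assumptions for illustration, not the documented signature.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Set, Tuple

VariantKey = Tuple[str, int, str, str]


def _key(rec: Mapping) -> VariantKey:
    # Same (chrom, pos, ref, alt) key shape as the CH hotspot lookup.
    return (str(rec["Chromosome"]), int(rec["Start_Position"]),
            rec["Reference_Allele"], rec["Tumor_Seq_Allele2"])


def impact_match(eligible_ids: Iterable[str], maf: List[Mapping],
                 clinical: Mapping[str, Tuple[str, str]]) -> Dict[str, bool]:
    """Hypothetical sketch: does any ACCESS variant recur in the patient's IMPACT tissue?

    Assumed inputs (not the documented types):
      maf      -- records with Tumor_Sample_Barcode plus the four key columns
      clinical -- sample_id -> (patient_id, assay), assay "ACCESS" or "IMPACT"
    """
    # All variants per sample; no VAF threshold is applied ("any VAF").
    by_sample: Dict[str, Set[VariantKey]] = defaultdict(set)
    for rec in maf:
        by_sample[rec["Tumor_Sample_Barcode"]].add(_key(rec))

    # Pool the variants of every IMPACT tissue sample belonging to each patient.
    impact_by_patient: Dict[str, Set[VariantKey]] = defaultdict(set)
    for sample_id, (patient_id, assay) in clinical.items():
        if assay == "IMPACT":
            impact_by_patient[patient_id] |= by_sample.get(sample_id, set())

    result: Dict[str, bool] = {}
    for sample_id in eligible_ids:
        patient_id = clinical.get(sample_id, (None, None))[0]
        access_vars = by_sample.get(sample_id, set())
        result[sample_id] = bool(access_vars & impact_by_patient.get(patient_id, set()))
    return result
```

Pooling per patient, rather than pairing one ACCESS sample with one IMPACT sample, is a design choice of this sketch that tolerates patients with several tissue samples.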

compute_snv_summary(eligible_ids, maf, min_vaf=0.01, min_variants=1, ch_variants=None)

Summarize somatic SNV status per eligible sample.

Returns one row per eligible sample with columns: has_snv, n_somatic_snvs, n_total_somatic_snvs, max_vaf, mean_vaf, std_vaf, n_ch_variants, n_non_ch_variants.

mean_vaf and std_vaf are computed only from VAF-passing variants (those above min_vaf), providing continuous regression targets for the Stage 2 Quantifier.

If ch_variants is provided, variants matching CH hotspots are tagged and counted separately. n_non_ch_variants counts only VAF-passing, non-CH variants and is the key input for CH-only demotion.
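A hypothetical sketch of the summary logic follows. Several details are assumptions rather than documented behavior: each MAF record carries a precomputed VAF field; n_somatic_snvs counts VAF-passing variants while n_total_somatic_snvs counts all of them; max_vaf is taken over all variants; has_snv means at least min_variants passing variants; std_vaf is the population standard deviation; and n_ch_variants counts only VAF-passing CH matches. The name snv_summary is illustrative.

```python
import statistics
from typing import Dict, Iterable, List, Mapping, Optional, Set, Tuple

VariantKey = Tuple[str, int, str, str]


def _key(rec: Mapping) -> VariantKey:
    return (str(rec["Chromosome"]), int(rec["Start_Position"]),
            rec["Reference_Allele"], rec["Tumor_Seq_Allele2"])


def snv_summary(eligible_ids: Iterable[str], maf: List[Mapping],
                min_vaf: float = 0.01, min_variants: int = 1,
                ch_variants: Optional[Set[VariantKey]] = None) -> List[dict]:
    """Hypothetical sketch of the per-sample SNV summary (assumed "VAF" field)."""
    ch = ch_variants or set()
    by_sample: Dict[str, List[Mapping]] = {}
    for rec in maf:
        by_sample.setdefault(rec["Tumor_Sample_Barcode"], []).append(rec)

    rows = []
    for sid in eligible_ids:
        recs = by_sample.get(sid, [])
        all_vafs = [float(r["VAF"]) for r in recs]
        # VAF-passing: strictly above min_vaf, following the wording on this page.
        passing = [r for r in recs if float(r["VAF"]) > min_vaf]
        pass_vafs = [float(r["VAF"]) for r in passing]
        n_ch = sum(1 for r in passing if _key(r) in ch)
        rows.append({
            "sample_id": sid,
            "has_snv": len(passing) >= min_variants,
            "n_somatic_snvs": len(passing),
            "n_total_somatic_snvs": len(recs),
            "max_vaf": max(all_vafs) if all_vafs else 0.0,
            # Regression targets use passing variants only; None when there are none.
            "mean_vaf": statistics.fmean(pass_vafs) if pass_vafs else None,
            "std_vaf": statistics.pstdev(pass_vafs) if pass_vafs else None,
            "n_ch_variants": n_ch,
            "n_non_ch_variants": len(passing) - n_ch,
        })
    return rows
```

Under these assumptions n_ch_variants + n_non_ch_variants always equals n_somatic_snvs, so a sample with passing SNVs but n_non_ch_variants equal to 0 is exactly the CH-only case.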

compute_sv_summary(eligible_ids, sv_df)

Summarize somatic SV status per eligible sample (binary presence/absence).
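Because the SV summary is binary, it reduces to a set-membership check. This sketch assumes SV records are dicts carrying the sample identifier under a hypothetical Sample_ID field; sv_summary and the has_sv output column are illustrative names.

```python
from typing import Iterable, List, Mapping


def sv_summary(eligible_ids: Iterable[str], sv_records: List[Mapping],
               sample_field: str = "Sample_ID") -> List[dict]:
    """Hypothetical sketch: binary SV presence/absence per eligible sample."""
    # Any number of SVs collapses to a single "present" flag.
    samples_with_sv = {rec[sample_field] for rec in sv_records}
    return [{"sample_id": sid, "has_sv": sid in samples_with_sv}
            for sid in eligible_ids]
```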

compute_cna_summary(eligible_ids, cna_df)

Summarize CNA status per eligible sample.
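This page does not say what the CNA summary contains, so the following is only an illustration under stated assumptions: long-format records with one integer copy-number call per gene (hypothetical Sample_ID and call fields), any nonzero call counted as an alteration, and hypothetical output columns has_cna and n_cna_events.

```python
from typing import Dict, Iterable, List, Mapping


def cna_summary(eligible_ids: Iterable[str], cna_records: List[Mapping],
                sample_field: str = "Sample_ID",
                call_field: str = "call") -> List[dict]:
    """Hypothetical sketch of a per-sample CNA summary.

    Assumes an integer copy-number call per record, where 0 means no change.
    """
    counts: Dict[str, int] = {}
    for rec in cna_records:
        if int(rec[call_field]) != 0:  # count only altered genes
            sid = rec[sample_field]
            counts[sid] = counts.get(sid, 0) + 1
    return [{"sample_id": sid,
             "has_cna": counts.get(sid, 0) > 0,
             "n_cna_events": counts.get(sid, 0)}
            for sid in eligible_ids]
```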