Labeling API Reference
The kreview.labels module implements the 5-tier ctDNA labeling engine.
For the biological rationale behind each label, see the ctDNA Labeling guide.
kreview.labels
CtDNALabeler
Assigns 5-tier ctDNA labels to ACCESS cfDNA samples.
The five tiers are True ctDNA+, Possible ctDNA+, Possible ctDNA−,
Healthy Normal, and Insufficient Data.
Features
- IMPACT tissue rescue for True ctDNA+ confirmation
- Continuous VAF regression targets (mean_vaf, std_vaf)
- Optional CH hotspot filtering with automatic demotion of samples whose only evidence is clonal hematopoiesis
load_ch_hotspots(ch_maf_path)
Load a CH hotspot MAF file and return a set of variant keys.
The MAF file must have columns: Hugo_Symbol, Chromosome, Start_Position, Reference_Allele, Tumor_Seq_Allele2.
Each row is converted to a (chrom, pos, ref, alt) tuple for fast lookup during SNV summary computation.
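The conversion described above can be sketched as follows. This is a minimal illustration, not the module's implementation: it assumes a tab-delimited MAF with a header row, uses only the standard library, and the helper name load_ch_hotspots_sketch is hypothetical. Whether the real function casts positions to integers or normalizes chromosome names is not stated in this reference, so those choices are assumptions.

```python
import csv
import io

REQUIRED_COLUMNS = (
    "Hugo_Symbol",
    "Chromosome",
    "Start_Position",
    "Reference_Allele",
    "Tumor_Seq_Allele2",
)


def load_ch_hotspots_sketch(handle):
    """Hypothetical stand-in for load_ch_hotspots.

    `handle` is any text file object holding a tab-delimited MAF.
    Returns a set of (chrom, pos, ref, alt) tuples.
    """
    # Assumption: comment lines, if present, start with "#".
    lines = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(lines, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"MAF is missing required columns: {missing}")
    keys = set()
    for row in reader:
        # Assumption: positions are stored as integers in the key.
        keys.add(
            (
                row["Chromosome"],
                int(row["Start_Position"]),
                row["Reference_Allele"],
                row["Tumor_Seq_Allele2"],
            )
        )
    return keys


# Illustrative rows only; a real CH hotspot file would be read from ch_maf_path.
example_maf = io.StringIO(
    "Hugo_Symbol\tChromosome\tStart_Position\tReference_Allele\tTumor_Seq_Allele2\n"
    "DNMT3A\t2\t25457242\tC\tT\n"
    "JAK2\t9\t5073770\tG\tT\n"
)
ch_variants = load_ch_hotspots_sketch(example_maf)
```

Storing the keys in a set keeps each membership test at constant average cost, which matters when every somatic variant of every sample is checked against the hotspot list.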
compute_impact_match(eligible_ids, maf, clinical)
For each eligible ACCESS sample, check whether any somatic variant (at any VAF) also appears in the same patient's IMPACT tissue sample.
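A rough sketch of this matching logic is shown below. The input layout is an assumption made for illustration: variants are plain dicts, sample_info maps each sample to a (patient_id, assay) pair, and the function name impact_match_sketch is hypothetical. The real function takes maf and clinical tables whose column names are not documented here. Note that no VAF filter is applied, in line with "any VAF".

```python
from collections import defaultdict


def variant_key(record):
    """Build the (chrom, pos, ref, alt) key used to compare variants."""
    return (record["chrom"], record["pos"], record["ref"], record["alt"])


def impact_match_sketch(eligible_ids, variants, sample_info):
    """Return {access_sample_id: bool}; True if a variant is shared with IMPACT.

    variants: iterable of dicts with sample_id, chrom, pos, ref, alt
        (hypothetical layout).
    sample_info: {sample_id: (patient_id, assay)}, assay "ACCESS" or "IMPACT"
        (hypothetical layout).
    """
    keys_by_sample = defaultdict(set)
    for rec in variants:
        keys_by_sample[rec["sample_id"]].add(variant_key(rec))

    # Pool the variant keys of every IMPACT sample belonging to a patient.
    impact_keys_by_patient = defaultdict(set)
    for sample_id, (patient_id, assay) in sample_info.items():
        if assay == "IMPACT":
            impact_keys_by_patient[patient_id] |= keys_by_sample.get(sample_id, set())

    matches = {}
    for sample_id in eligible_ids:
        patient_id, _ = sample_info[sample_id]
        access_keys = keys_by_sample.get(sample_id, set())
        tissue_keys = impact_keys_by_patient.get(patient_id, set())
        matches[sample_id] = bool(access_keys & tissue_keys)
    return matches


variants = [
    {"sample_id": "A1", "chrom": "7", "pos": 100, "ref": "A", "alt": "T"},
    {"sample_id": "I1", "chrom": "7", "pos": 100, "ref": "A", "alt": "T"},
    {"sample_id": "A2", "chrom": "12", "pos": 200, "ref": "C", "alt": "G"},
    {"sample_id": "I2", "chrom": "12", "pos": 999, "ref": "C", "alt": "G"},
]
sample_info = {
    "A1": ("P1", "ACCESS"),
    "I1": ("P1", "IMPACT"),
    "A2": ("P2", "ACCESS"),
    "I2": ("P2", "IMPACT"),
}
impact_match = impact_match_sketch(["A1", "A2"], variants, sample_info)
```

In this toy input, sample A1 shares a variant with the tissue sample of patient P1, while A2 has no variant in common with the tissue sample of patient P2.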
compute_snv_summary(eligible_ids, maf, min_vaf=0.01, min_variants=1, ch_variants=None)
Summarize somatic SNV status per eligible sample.
Returns one row per eligible sample with the columns:
has_snv, n_somatic_snvs, n_total_somatic_snvs, max_vaf, mean_vaf, std_vaf, n_ch_variants, n_non_ch_variants.
mean_vaf and std_vaf are computed only from VAF-passing variants
(those above min_vaf), providing continuous regression targets
for the Stage 2 Quantifier.
If ch_variants is provided, variants matching CH hotspots are
tagged and counted separately. n_non_ch_variants counts only
VAF-passing, non-CH variants, which makes it the key input for CH-only demotion.
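The per-sample summary described above can be approximated with the standard library as in the sketch below. Only two behaviours come from this reference: mean_vaf and std_vaf use VAF-passing variants only, and n_non_ch_variants counts VAF-passing, non-CH variants. Everything else is an assumption made for illustration: the dict-based input layout, the inclusive threshold, has_snv meaning at least min_variants passing variants, max_vaf and n_ch_variants being computed over passing variants, the use of the sample standard deviation, and the function name snv_summary_sketch.

```python
from statistics import mean, stdev


def snv_summary_sketch(eligible_ids, variants, min_vaf=0.01, min_variants=1,
                       ch_variants=None):
    """Hypothetical stand-in for compute_snv_summary; returns {sample_id: row}."""
    ch_variants = ch_variants or set()
    rows = {}
    for sample_id in eligible_ids:
        sample_vars = [v for v in variants if v["sample_id"] == sample_id]
        # Assumption: the threshold is inclusive (vaf >= min_vaf).
        passing = [v for v in sample_vars if v["vaf"] >= min_vaf]
        vafs = [v["vaf"] for v in passing]
        # Tag each passing variant as CH or not via its (chrom, pos, ref, alt) key.
        is_ch = [
            (v["chrom"], v["pos"], v["ref"], v["alt"]) in ch_variants
            for v in passing
        ]
        rows[sample_id] = {
            # Assumption: has_snv requires at least min_variants passing variants.
            "has_snv": len(passing) >= min_variants,
            "n_somatic_snvs": len(passing),
            "n_total_somatic_snvs": len(sample_vars),
            "max_vaf": max(vafs) if vafs else None,
            "mean_vaf": mean(vafs) if vafs else None,
            # Assumption: sample standard deviation, undefined for < 2 values.
            "std_vaf": stdev(vafs) if len(vafs) > 1 else None,
            "n_ch_variants": sum(is_ch),
            "n_non_ch_variants": len(passing) - sum(is_ch),
        }
    return rows


hotspots = {("2", 25457242, "C", "T")}
calls = [
    {"sample_id": "S1", "chrom": "2", "pos": 25457242, "ref": "C", "alt": "T", "vaf": 0.02},
    {"sample_id": "S1", "chrom": "17", "pos": 500, "ref": "G", "alt": "A", "vaf": 0.005},
    {"sample_id": "S2", "chrom": "7", "pos": 100, "ref": "A", "alt": "T", "vaf": 0.10},
    {"sample_id": "S2", "chrom": "12", "pos": 200, "ref": "C", "alt": "G", "vaf": 0.30},
]
summary = snv_summary_sketch(["S1", "S2"], calls, ch_variants=hotspots)
```

In this toy input, S1 has one passing variant and it sits on a CH hotspot, so its n_non_ch_variants is 0 even though has_snv is true. That combination is the kind of pattern a CH-only demotion rule would presumably look for; the exact rule is not specified in this reference.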
compute_sv_summary(eligible_ids, sv_df)
Summarize somatic SV status per eligible sample (binary presence/absence).
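Binary presence/absence can be illustrated with a very small sketch. The input here is simply an iterable of sample IDs that have at least one SV call; the real function receives sv_df, whose columns are not documented in this reference. The output key has_sv and the function name sv_summary_sketch are hypothetical.

```python
def sv_summary_sketch(eligible_ids, sv_sample_ids):
    """Hypothetical stand-in for compute_sv_summary.

    Returns {sample_id: {"has_sv": bool}} with one entry per eligible sample,
    including samples that have no SV calls at all.
    """
    samples_with_sv = set(sv_sample_ids)
    return {
        sample_id: {"has_sv": sample_id in samples_with_sv}
        for sample_id in eligible_ids
    }


# S1 has two SV calls, S3 has none, and S9 is not eligible so it is ignored.
sv_summary = sv_summary_sketch(["S1", "S3"], ["S1", "S1", "S9"])
```

Iterating over the eligible IDs rather than over the SV calls means that samples without any SV still receive a row, and calls from non-eligible samples are dropped.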
compute_cna_summary(eligible_ids, cna_df)
Summarize CNA status per eligible sample.