Evaluator Feature Registry
This documentation is generated directly from the nbs/features/*.ipynb Jupyter notebooks. Each notebook is the working execution environment for one biological feature, and the nbdev-export step automatically compiles the notebooks into the Python classes below.
The registry.py module dynamically discovers all FeatureEvaluator subclasses and registers them into the kreview execution engine.
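The discovery step can be pictured with a minimal sketch. It assumes discovery works by walking the subclass tree of the base class; the actual mechanism in registry.py is not shown in this page, and the stand-in classes and the `discover_evaluators` helper below are hypothetical.

```python
from typing import Dict, Type


class FeatureEvaluator:
    """Stand-in base class for illustration; the real one lives in kreview."""


class FSCGeneEvaluator(FeatureEvaluator):
    """Empty placeholder subclass."""


class FSDOnTargetEvaluator(FeatureEvaluator):
    """Empty placeholder subclass."""


def discover_evaluators(base: Type[FeatureEvaluator]) -> Dict[str, Type[FeatureEvaluator]]:
    """Walk the subclass tree of `base` and index every class by its class name."""
    found: Dict[str, Type[FeatureEvaluator]] = {}
    stack = list(base.__subclasses__())
    while stack:
        cls = stack.pop()
        found[cls.__name__] = cls
        # Include indirect subclasses as well.
        stack.extend(cls.__subclasses__())
    return found


registry = discover_evaluators(FeatureEvaluator)
print(sorted(registry))
```

Note that `__subclasses__()` only sees classes whose modules have already been imported, so a registry built this way would need to import the feature modules first.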
For the biological rationale behind each feature, see the Fragmentomics Feature Glossary.
Fragment Size Coverage & Distributions
These features measure distortions in the fragment length distribution of circulating cell-free DNA (cfDNA) that are attributed to necrotic tumor shedding.
kreview.features.fsc_gene
FSCGeneEvaluator
kreview.features.fsc_binlevel
FSCOnTargetEvaluator
Bases: FeatureEvaluator
Extracts GC-corrected log2 fragment size category signals from on-target genomic bins.
Only bins with read coverage (total > 0) are included in aggregation. On-target panels typically cover ~2% of bins; without this filter, the 98% zero-coverage bins dominate the median with sentinel values.
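The effect of the coverage filter can be illustrated with a small sketch. The bin records, the `log2_short` column name, the `-10.0` sentinel, and the `covered_median` helper are all assumptions made for the example; only the `total > 0` condition comes from the description above.

```python
from statistics import median

# Hypothetical bin records. `total` is the read coverage of the bin and
# `log2_short` is a per-bin signal. -10.0 stands in for a sentinel value
# written to zero-coverage bins (an assumption for this example).
bins = [
    {"total": 0, "log2_short": -10.0},
    {"total": 0, "log2_short": -10.0},
    {"total": 0, "log2_short": -10.0},
    {"total": 120, "log2_short": 0.4},
    {"total": 95, "log2_short": 0.2},
]


def covered_median(rows, value_key, coverage_key="total"):
    """Median of `value_key` over rows with coverage > 0 (NaN if none are covered)."""
    values = [row[value_key] for row in rows if row[coverage_key] > 0]
    return median(values) if values else float("nan")


unfiltered = median(row["log2_short"] for row in bins)
filtered = covered_median(bins, "log2_short")
print(unfiltered, filtered)
```

In this toy input the unfiltered median is the sentinel itself, while the filtered median reflects only the two covered bins.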
kreview.features.fsc_binlevel_genomewide
FSCGenomewideEvaluator
Bases: FeatureEvaluator
Extracts GC-corrected log2 fragment size category signals from genomewide bins.
Only bins with read coverage (total > 0) are included in aggregation. Genomewide panels typically cover ~93% of bins, but the filter ensures uncovered bins don't contribute noise to summary statistics.
kreview.features.fsc_regions
FSCRegionsEvaluator
Bases: FeatureEvaluator
Extracts fragment size category ratios aggregated across gene-level regions.
Only regions with read coverage (total > 0) are included in aggregation. FSC regions are typically ~99.8% covered, so this filter is a consistency safeguard rather than a critical fix.
kreview.features.fsd
FSDOnTargetEvaluator
Bases: FeatureEvaluator
Extracts normalized densities for on-target fragment size buckets.
Derived metrics:
- Bimodality index: mono-nucleosomal peak / di-nucleosomal valley
- Shannon entropy of the size distribution
- 143/166 ratio (classic cfDNA short-fragment proxy)
- Per-chromosome 143/166 ratio (if chrom/region column available)
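A rough sketch of how the first three derived metrics could be computed from a size-to-density mapping follows. The window boundaries used for the mono-nucleosomal peak and the valley, the base-2 logarithm, and the `fsd_summary` helper are assumptions for illustration and may differ from the evaluator's actual definitions.

```python
import math


def shannon_entropy(densities):
    """Shannon entropy (in bits) of the densities after renormalising to sum 1."""
    total = sum(densities)
    probs = [d / total for d in densities if d > 0]
    return -sum(p * math.log2(p) for p in probs)


def fsd_summary(density_by_size, mono=(150, 180), valley=(200, 280)):
    """Summarise a {fragment_size: density} mapping.

    `mono` and `valley` are hypothetical size windows (in bp); both must
    contain at least one size present in the mapping.
    """
    peak = max(d for s, d in density_by_size.items() if mono[0] <= s <= mono[1])
    trough = min(d for s, d in density_by_size.items() if valley[0] <= s <= valley[1])
    return {
        "bimodality_index": peak / trough if trough > 0 else float("nan"),
        "entropy": shannon_entropy(list(density_by_size.values())),
        "ratio_143_166": density_by_size[143] / density_by_size[166],
    }


# Toy distribution with four size buckets, not real data.
toy = {143: 0.2, 166: 0.4, 250: 0.1, 330: 0.3}
summary = fsd_summary(toy)
print(summary)
```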
kreview.features.fsd_genomewide
FSDGenomewideEvaluator
Bases: FeatureEvaluator
Extracts normalized densities for genomewide fragment size buckets.
Derived metrics:
- Bimodality index: mono-nucleosomal peak / di-nucleosomal valley
- Shannon entropy of the size distribution
- 143/166 ratio (classic cfDNA short-fragment proxy)
- Per-chromosome 143/166 ratio (if chrom/region column available)
kreview.features.fsr
FSROnTargetEvaluator
Bases: FeatureEvaluator
Extracts the short/long fragment size ratio across on-target genomic bins.
Only bins with read coverage (total_count > 0) are included in aggregation. On-target panels typically cover ~2% of bins; without this filter, the 98% zero-coverage bins dominate the median with zero values.
Per-chromosome metrics: median short_long_ratio per chromosome, with the chromosome parsed from the region column (format chrN:start-end).
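The per-chromosome aggregation can be sketched as below. The column names `region`, `total_count`, and `short_long_ratio` follow the description above; the row values and the helpers `chrom_of` and `per_chrom_median` are hypothetical.

```python
from collections import defaultdict
from statistics import median


def chrom_of(region):
    """Return the chromosome part of a 'chrN:start-end' string."""
    return region.split(":", 1)[0]


def per_chrom_median(rows):
    """Median short_long_ratio per chromosome over covered bins only."""
    grouped = defaultdict(list)
    for row in rows:
        if row["total_count"] > 0:  # skip zero-coverage bins
            grouped[chrom_of(row["region"])].append(row["short_long_ratio"])
    return {chrom: median(values) for chrom, values in grouped.items()}


# Toy rows, not real data.
rows = [
    {"region": "chr1:0-100000", "total_count": 50, "short_long_ratio": 0.8},
    {"region": "chr1:100000-200000", "total_count": 0, "short_long_ratio": 0.0},
    {"region": "chr1:200000-300000", "total_count": 70, "short_long_ratio": 1.2},
    {"region": "chr2:0-100000", "total_count": 40, "short_long_ratio": 0.9},
]
result = per_chrom_median(rows)
print(result)
```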
kreview.features.fsr_genomewide
FSRGenomewideEvaluator
Bases: FeatureEvaluator
Extracts the short/long fragment size ratio across genomewide bins.
Only bins with read coverage (total_count > 0) are included in aggregation. Genomewide panels typically cover ~93% of bins, but the filter ensures uncovered bins don't contribute noise to summary statistics.
Per-chromosome metrics: median short_long_ratio per chromosome, with the chromosome parsed from the region column (format chrN:start-end).
Nucleosome Protection (WPS & TFBS)
These features measure the protection footprints that bound transcription factors and nucleosomes (DNA wrapped around histones) leave on fragments during nuclease digestion.
kreview.features.wps_panel
WPSPanelEvaluator
Bases: FeatureEvaluator
Extracts WPS nucleosome binding geometries with spectral features.
For each WPS array, extracts:
- mean, std (original)
- peak-to-valley amplitude
- median absolute deviation
- spectral max power and dominant frequency (FFT-based periodicity)
- local_depth scalar (if available)
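The per-array summary can be sketched with the standard library alone. To stay dependency-free, the example swaps the FFT for a naive O(n^2) discrete Fourier transform, which yields the same power spectrum on short arrays. The use of a population standard deviation, mean-centring before the transform, and the `wps_summary` helper are assumptions, not the evaluator's confirmed behavior.

```python
import cmath
import math
from statistics import mean, median, pstdev


def power_spectrum(signal):
    """Naive DFT power at frequencies 1..n//2 of the mean-centred signal."""
    n = len(signal)
    centre = mean(signal)
    centred = [x - centre for x in signal]
    powers = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centred)
        )
        powers.append(abs(coeff) ** 2)
    return powers


def wps_summary(signal):
    """Scalar summaries of one WPS-like array."""
    med = median(signal)
    powers = power_spectrum(signal)
    best = max(range(len(powers)), key=powers.__getitem__)
    return {
        "mean": mean(signal),
        "std": pstdev(signal),  # population std; an assumption
        "peak_to_valley": max(signal) - min(signal),
        "mad": median(abs(x - med) for x in signal),
        "spectral_max_power": powers[best],
        # powers[0] corresponds to k = 1, hence the +1; unit is cycles per position.
        "dominant_frequency": (best + 1) / len(signal),
    }


# Toy trace: a sine wave with a 20-position period sampled over 200 positions.
trace = [math.sin(2 * math.pi * t / 20) for t in range(200)]
summary = wps_summary(trace)
print(summary["dominant_frequency"], summary["peak_to_valley"])
```

For the toy trace the dominant frequency corresponds to one cycle every 20 positions, which is the period the sine wave was built with.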
Handles both numpy array and string columns from krewlyzer parquets.
kreview.features.wps_genomewide
WPSGenomeEvaluator
Bases: FeatureEvaluator
Extracts genome-wide WPS metrics with spectral features.
For each WPS array, extracts:
- mean, std (original)
- peak-to-valley amplitude (nucleosome occupancy proxy)
- median absolute deviation (robust dispersion)
- spectral max power and dominant frequency (FFT-based periodicity)
Handles both numpy array and string columns from krewlyzer parquets.
kreview.features.wps_background
WPSBackgroundEvaluator
kreview.features.tfbs
TFBSOnTargetEvaluator
kreview.features.tfbs_genomewide
TFBSGenomewideEvaluator
Cleavage Signatures (EndMotifs)
Models the cleavage preferences of specific nucleases (such as DNASE1L3), which cut accessible DNA at characteristic motifs such as CCCA.
kreview.features.endmotif
EndMotifOnTargetEvaluator
Bases: FeatureEvaluator
Extracts 4-mer fragment end motif frequencies for on-target regions.
Produces raw 256 4-mer frequencies plus derived summary metrics:
- Shannon entropy (cleavage site diversity)
- DNASE1L3 signature score (CC-ending motif sum)
- Top-10 motif concentration
- Purine/pyrimidine asymmetry at terminal base
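A hedged sketch of the four summary metrics follows. It reads "CC-ending" literally as motifs whose last two bases are CC and treats the first base of the 4-mer as the terminal base; both readings are assumptions, as are the base-2 logarithm and the `endmotif_summary` helper.

```python
import math
from itertools import product


def endmotif_summary(freqs):
    """Summarise a {4-mer: frequency} mapping (renormalised to sum 1)."""
    total = sum(freqs.values())
    probs = {motif: f / total for motif, f in freqs.items()}
    entropy = -sum(p * math.log2(p) for p in probs.values() if p > 0)
    # Assumption: "CC-ending" means the last two bases of the motif are CC.
    cc_score = sum(p for motif, p in probs.items() if motif.endswith("CC"))
    top10 = sum(sorted(probs.values(), reverse=True)[:10])
    # Assumption: the terminal base is the first character of the motif.
    purine = sum(p for motif, p in probs.items() if motif[0] in "AG")
    pyrimidine = sum(p for motif, p in probs.items() if motif[0] in "CT")
    return {
        "entropy": entropy,
        "cc_score": cc_score,
        "top10_concentration": top10,
        "pur_pyr_asymmetry": purine - pyrimidine,
    }


# Uniform toy input over all 256 4-mers, not real data.
uniform = {"".join(kmer): 1.0 for kmer in product("ACGT", repeat=4)}
summary = endmotif_summary(uniform)
print(summary)
```

A uniform input is a convenient sanity check: entropy reaches its maximum of 8 bits and the asymmetry is zero.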
kreview.features.endmotif_genomewide
EndMotifGenomewideEvaluator
Bases: FeatureEvaluator
Extracts 4-mer fragment end motif frequencies for genomewide regions.
Produces raw 256 4-mer frequencies plus derived summary metrics:
- Shannon entropy (cleavage site diversity)
- DNASE1L3 signature score (CC-ending motif sum)
- Top-10 motif concentration
- Purine/pyrimidine asymmetry at terminal base
kreview.features.endmotif_1mer
EndMotif1merEvaluator
Bases: FeatureEvaluator
Extracts 1-mer fragment end base frequencies with strand bias metrics.
Derived metrics:
- Purine/pyrimidine asymmetry: (A+G) - (C+T)
- A/T strand bias: A / (A+T)
- C/G strand bias: C / (C+G)
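The three formulas translate directly into code. The sketch below assumes the input is a mapping from base to frequency; the `one_mer_metrics` helper, the NaN fallback for empty denominators, and the toy frequencies are hypothetical.

```python
def one_mer_metrics(freq):
    """Apply the three 1-mer formulas to a {base: frequency} mapping."""
    a, c, g, t = (freq[base] for base in "ACGT")
    return {
        "pur_pyr_asymmetry": (a + g) - (c + t),
        "at_bias": a / (a + t) if (a + t) > 0 else float("nan"),
        "cg_bias": c / (c + g) if (c + g) > 0 else float("nan"),
    }


# Toy frequencies, not real data.
metrics = one_mer_metrics({"A": 0.2, "C": 0.4, "G": 0.1, "T": 0.3})
print(metrics)
```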
kreview.features.breakpoint_motif
BreakPointMotifOnTargetEvaluator
kreview.features.breakpoint_motif_genomewide
BreakPointMotifGenomewideEvaluator
Motif Divergence Scores
Measures the statistical divergence of end-motif distributions from healthy baselines.
kreview.features.mds
MDSOnTargetEvaluator
Bases: FeatureEvaluator
On-target MDS signature.
Extracts ALL numeric columns from the single-row MDS on-target parquet rather than just the 2 originally hardcoded scalars.
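Selecting every numeric column from a single-row table can be sketched on a plain dictionary that stands in for the parquet row. The column names other than `mds_e1`, the exclusion of booleans, and the `numeric_fields` helper are assumptions for illustration.

```python
from numbers import Real


def numeric_fields(row):
    """Keep every numeric value in a single-row mapping, dropping other types."""
    return {
        key: float(value)
        for key, value in row.items()
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, Real) and not isinstance(value, bool)
    }


# Hypothetical single-row record standing in for the parquet contents.
row = {"sample_id": "S1", "mds": 0.91, "mds_e1": 0.12, "n_fragments": 1000, "passed_qc": True}
features = numeric_fields(row)
print(features)
```

Taking all numeric columns this way means new scalar columns in the source file flow through without a code change.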
kreview.features.mds_genomewide
MDSGenomewideEvaluator
Bases: FeatureEvaluator
Genomewide MDS signature.
Extracts ALL numeric columns from the single-row MDS genomewide parquet rather than just the 2 originally hardcoded scalars.
kreview.features.mds_gene
MDSGeneEvaluator
Bases: FeatureEvaluator
Gene-specific MDS signatures with cross-gene distribution statistics.
Per-gene metrics: mds_mean, mds_e1, mds_std, mds_z, mds_e1_z
Cross-gene derived: z-score std/skew, fraction of genes diverged (|z|>2)
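The cross-gene statistics can be sketched from a list of per-gene z-scores. The population standard deviation, the moment-based skewness formula, and the `cross_gene_summary` helper are assumptions; only the |z| > 2 divergence threshold comes from the description above.

```python
from statistics import mean, pstdev


def cross_gene_summary(z_scores, threshold=2.0):
    """Spread, skewness, and diverged fraction of per-gene z-scores."""
    mu = mean(z_scores)
    sigma = pstdev(z_scores)
    # Moment-based (population) skewness; undefined when all values are equal.
    skew = (
        mean(((z - mu) / sigma) ** 3 for z in z_scores) if sigma > 0 else float("nan")
    )
    diverged = sum(1 for z in z_scores if abs(z) > threshold) / len(z_scores)
    return {"z_std": sigma, "z_skew": skew, "frac_diverged": diverged}


# Toy z-scores for five hypothetical genes, not real data.
gene_z = {"GENE_A": -0.5, "GENE_B": 0.3, "GENE_C": 2.6, "GENE_D": -2.4, "GENE_E": 0.0}
summary = cross_gene_summary(list(gene_z.values()))
print(summary)
```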
kreview.features.mds_exon
MDSExonEvaluator
Bases: FeatureEvaluator
Exon-level MDS distributions with cross-exon statistics.
Per-gene exon-level: mean and std of MDS across exons per gene.
Cross-exon derived: global mean, std, skew, fraction diverged (|MDS|>2).
Accessibility & Orientation
kreview.features.atac
ATACOnTargetEvaluator
kreview.features.atac_genomewide
ATACGenomewideEvaluator
kreview.features.ocf_ontarget
OCFOntargetEvaluator
Bases: FeatureEvaluator
Extracts on-target OCF metrics per tissue with cross-tissue aggregates.
Cross-tissue derived metrics:
- max_ocf_z: highest z-score across all tissues
- n_tissues_elevated: count of tissues with z > 2.0
- ocf_entropy: Shannon entropy of positive z-score distribution
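A minimal sketch of the three cross-tissue aggregates follows. It assumes that the entropy is computed over positive z-scores renormalised to sum to 1, with a base-2 logarithm; the `ocf_cross_tissue` helper and the tissue labels are hypothetical.

```python
import math


def ocf_cross_tissue(z_by_tissue, threshold=2.0):
    """Aggregate per-tissue OCF z-scores into three scalars."""
    zs = list(z_by_tissue.values())
    positive = [z for z in zs if z > 0]
    total = sum(positive)
    # Entropy of the positive z-scores treated as a probability distribution.
    entropy = (
        -sum((z / total) * math.log2(z / total) for z in positive) if total > 0 else 0.0
    )
    return {
        "max_ocf_z": max(zs),
        "n_tissues_elevated": sum(1 for z in zs if z > threshold),
        "ocf_entropy": entropy,
    }


# Toy z-scores for three hypothetical tissues, not real data.
toy = {"tissue_a": 4.0, "tissue_b": 4.0, "tissue_c": -1.0}
summary = ocf_cross_tissue(toy)
print(summary)
```

Under this reading, a low entropy means the elevated signal is concentrated in one tissue, while a high entropy means it is spread across several.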
kreview.features.ocf_offtarget
OCFOfftargetEvaluator
Bases: FeatureEvaluator
Extracts off-target OCF metrics per tissue with cross-tissue aggregates.
Cross-tissue derived metrics:
- max_ocf_z: highest z-score across all tissues
- n_tissues_elevated: count of tissues with z > 2.0
- ocf_entropy: Shannon entropy of positive z-score distribution