Evaluator Feature Registry

This documentation is directly synthesized from the nbs/features/*.ipynb Jupyter notebooks. These notebooks act as the active execution environment for each specific biological feature. During the nbdev-export step, they are automatically compiled into the Python classes below.

The registry.py module dynamically discovers all FeatureEvaluator subclasses and registers them into the kreview execution engine.
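The discovery step can be pictured with a minimal sketch. It assumes discovery walks `__subclasses__()` recursively and keys each class by a `name` attribute; the real registry.py may work differently. `FeatureEvaluator` below is a stand-in, and the two toy subclasses, the `name` attribute, and `discover` are hypothetical.

```python
from typing import Dict, Type


class FeatureEvaluator:
    """Stand-in for the real kreview base class (assumed interface)."""

    name: str = "base"


class ToyFSCEvaluator(FeatureEvaluator):  # hypothetical subclass
    name = "toy_fsc"


class ToyWPSEvaluator(FeatureEvaluator):  # hypothetical subclass
    name = "toy_wps"


def discover(base: Type[FeatureEvaluator]) -> Dict[str, Type[FeatureEvaluator]]:
    """Recursively collect every subclass of `base`, keyed by its `name`."""
    found: Dict[str, Type[FeatureEvaluator]] = {}
    for sub in base.__subclasses__():
        found[sub.name] = sub
        found.update(discover(sub))  # also pick up grandchildren
    return found


registry = discover(FeatureEvaluator)
print(sorted(registry))
```

Note that `__subclasses__()` only sees classes whose modules have already been imported, so a scheme like this would need the feature modules loaded before discovery runs.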

For the biological rationale behind each feature, see the Fragmentomics Feature Glossary.


πŸ“ Fragment Size Coverage & Distributions

These features measure fragment-length distortions in circulating cell-free DNA (cfDNA) that arise from biased shedding by necrotic tumor cells.

kreview.features.fsc_gene

FSCGeneEvaluator

Bases: FeatureEvaluator

Extracts all gene-level fragment size characteristics.

kreview.features.fsc_binlevel

FSCOnTargetEvaluator

Bases: FeatureEvaluator

Extracts GC-corrected log2 fragment size category signals from on-target genomic bins.

Only bins with read coverage (total > 0) are included in aggregation. On-target panels typically cover ~2% of bins; without this filter, the 98% zero-coverage bins dominate the median with sentinel values.
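The effect of the coverage filter can be illustrated with a toy table. This is a sketch under assumptions: the `short_log2` column name and the `-10.0` sentinel are invented for illustration, and only the `total > 0` condition comes from the description above.

```python
import pandas as pd

# Toy bin table: 2 covered bins out of 10. The -10.0 sentinel for
# zero-coverage bins is an assumed value, not the documented one.
bins = pd.DataFrame({
    "total": [0, 0, 0, 0, 120, 0, 0, 0, 95, 0],
    "short_log2": [-10.0] * 4 + [0.4] + [-10.0] * 3 + [0.6] + [-10.0],
})

# Without the filter, the sentinel values dominate the median.
unfiltered_median = bins["short_log2"].median()

# With the filter, only bins that actually received reads are aggregated.
covered = bins[bins["total"] > 0]
filtered_median = covered["short_log2"].median()

print(unfiltered_median, filtered_median)
```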

kreview.features.fsc_binlevel_genomewide

FSCGenomewideEvaluator

Bases: FeatureEvaluator

Extracts GC-corrected log2 fragment size category signals from genomewide bins.

Only bins with read coverage (total > 0) are included in aggregation. Genomewide panels typically cover ~93% of bins, but the filter ensures uncovered bins don't contribute noise to summary statistics.

kreview.features.fsc_regions

FSCRegionsEvaluator

Bases: FeatureEvaluator

Extracts fragment size category ratios aggregated across gene-level regions.

Only regions with read coverage (total > 0) are included in aggregation. FSC regions are typically ~99.8% covered, so this filter is a consistency safeguard rather than a critical fix.

kreview.features.fsd

FSDOnTargetEvaluator

Bases: FeatureEvaluator

Extracts normalized densities for on-target fragment size buckets.

Derived metrics:

- Bimodality index: mono-nucleosomal peak / di-nucleosomal valley
- Shannon entropy of the size distribution
- 143/166 ratio (classic cfDNA short-fragment proxy)
- Per-chromosome 143/166 ratio (if chrom/region column available)
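The derived metrics listed above can be sketched on a toy size distribution. The size windows used for the mono-nucleosomal peak (150 to 180 bp) and the di-nucleosomal valley (220 to 280 bp) are assumptions, as are the density values and the `window` helper; the evaluator may define these differently.

```python
import math

# Toy normalized densities keyed by fragment size in bp (made-up values
# that sum to 1).
density = {120: 0.10, 143: 0.15, 166: 0.50, 250: 0.05, 330: 0.20}


def window(d, lo, hi):
    """Return the densities whose fragment size falls in [lo, hi]."""
    return [v for size, v in d.items() if lo <= size <= hi]


# Bimodality index: mono-nucleosomal peak over di-nucleosomal valley.
mono_peak = max(window(density, 150, 180))  # assumed peak window
valley = min(window(density, 220, 280))     # assumed valley window
bimodality = mono_peak / valley

# Shannon entropy (in bits) of the size distribution.
entropy = -sum(p * math.log2(p) for p in density.values() if p > 0)

# 143/166 short-fragment ratio.
ratio_143_166 = density[143] / density[166]

print(bimodality, entropy, ratio_143_166)
```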

kreview.features.fsd_genomewide

FSDGenomewideEvaluator

Bases: FeatureEvaluator

Extracts normalized densities for genomewide fragment size buckets.

Derived metrics:

- Bimodality index: mono-nucleosomal peak / di-nucleosomal valley
- Shannon entropy of the size distribution
- 143/166 ratio (classic cfDNA short-fragment proxy)
- Per-chromosome 143/166 ratio (if chrom/region column available)

kreview.features.fsr

FSROnTargetEvaluator

Bases: FeatureEvaluator

Extracts the short/long fragment size ratio across on-target genomic bins.

Only bins with read coverage (total_count > 0) are included in aggregation. On-target panels typically cover ~2% of bins; without this filter, the 98% zero-coverage bins dominate the median with zero values.

Per-chromosome metrics: median short_long_ratio per chromosome, parsed from region column format chrN:start-end.
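A minimal sketch of the per-chromosome aggregation, assuming region strings follow the `chrN:start-end` format described above. The regular expression, the toy rows, and the choice to skip unparseable regions are assumptions for illustration.

```python
import re
from collections import defaultdict
from statistics import median

# Matches e.g. "chr1:100000-200000" and captures the chromosome name.
REGION_RE = re.compile(r"^(chr[^:]+):(\d+)-(\d+)$")

# Toy (region, short_long_ratio) rows; values are made up.
rows = [
    ("chr1:0-100000", 0.30),
    ("chr1:100000-200000", 0.50),
    ("chr1:200000-300000", 0.40),
    ("chr2:0-100000", 0.20),
    ("chr2:100000-200000", 0.80),
    ("not_a_region", 0.99),  # does not match the format, so it is skipped
]

by_chrom = defaultdict(list)
for region, ratio in rows:
    match = REGION_RE.match(region)
    if match is None:
        continue
    by_chrom[match.group(1)].append(ratio)

per_chrom_median = {chrom: median(vals) for chrom, vals in by_chrom.items()}
print(per_chrom_median)
```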

kreview.features.fsr_genomewide

FSRGenomewideEvaluator

Bases: FeatureEvaluator

Extracts the short/long fragment size ratio across genomewide bins.

Only bins with read coverage (total_count > 0) are included in aggregation. Genomewide panels typically cover ~93% of bins, but the filter ensures uncovered bins don't contribute noise to summary statistics.

Per-chromosome metrics: median short_long_ratio per chromosome, parsed from region column format chrN:start-end.


βœ‚οΈ Nucleosome Protection (WPS & TFBS)

Measures the protection footprints that transcription factors and nucleosomes (histone-wrapped DNA) leave on fragments by blocking nuclease cleavage before the DNA is shed.

kreview.features.wps_panel

WPSPanelEvaluator

Bases: FeatureEvaluator

Extracts WPS nucleosome binding geometries with spectral features.

For each WPS array, extracts:

- mean, std (original)
- peak-to-valley amplitude
- median absolute deviation
- spectral max power and dominant frequency (FFT-based periodicity)
- local_depth scalar (if available)

Handles both numpy array and string columns from krewlyzer parquets.
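The per-array statistics above can be sketched with NumPy on a synthetic trace. This is an illustration under assumptions: the 190 bp period, the 1 bp sampling step, and mean-centring before the FFT are choices made for the toy signal, not confirmed details of the evaluator.

```python
import numpy as np

# Synthetic WPS trace: a periodic signal plus a constant offset.
positions = np.arange(1900)
period_bp = 190  # assumed repeat length for the toy signal
wps = 2.0 + 5.0 * np.sin(2 * np.pi * positions / period_bp)

# Basic and robust dispersion statistics.
mean, std = wps.mean(), wps.std()
peak_to_valley = wps.max() - wps.min()
mad = np.median(np.abs(wps - np.median(wps)))

# Power spectrum of the mean-centred signal (real-input FFT).
power = np.abs(np.fft.rfft(wps - mean)) ** 2
freqs = np.fft.rfftfreq(wps.size, d=1.0)  # cycles per bp

# Skip the zero-frequency bin when looking for the dominant component.
k = 1 + int(np.argmax(power[1:]))
spectral_max_power = power[k]
dominant_freq = freqs[k]
dominant_period = 1.0 / dominant_freq

print(peak_to_valley, mad, dominant_freq, dominant_period)
```

Because the toy trace contains a whole number of cycles, the power concentrates in a single frequency bin; real WPS arrays would show a broader peak.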

kreview.features.wps_genomewide

WPSGenomeEvaluator

Bases: FeatureEvaluator

Extracts genome-wide WPS metrics with spectral features.

For each WPS array, extracts:

- mean, std (original)
- peak-to-valley amplitude (nucleosome occupancy proxy)
- median absolute deviation (robust dispersion)
- spectral max power and dominant frequency (FFT-based periodicity)

Handles both numpy array and string columns from krewlyzer parquets.

kreview.features.wps_background

WPSBackgroundEvaluator

Bases: FeatureEvaluator

Extracts periodicity distances for nucleosomes.

kreview.features.tfbs

TFBSOnTargetEvaluator

Bases: FeatureEvaluator

Extracts TFBS footprint metrics for on-target regions.

kreview.features.tfbs_genomewide

TFBSGenomewideEvaluator

Bases: FeatureEvaluator

Extracts TFBS footprint metrics for genomewide regions.


🛑 Cleavage Signatures (EndMotifs)

Models the cleavage preferences of specific nucleases (such as DNASE1L3) that cut accessible DNA, for example at CCCA motifs.

kreview.features.endmotif

EndMotifOnTargetEvaluator

Bases: FeatureEvaluator

Extracts 4-mer fragment end motif frequencies for on-target regions.

Produces raw 256 4-mer frequencies plus derived summary metrics:

- Shannon entropy (cleavage site diversity)
- DNASE1L3 signature score (CC-ending motif sum)
- Top-10 motif concentration
- Purine/pyrimidine asymmetry at terminal base
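Three of the summary metrics above can be sketched on toy counts. The sketch reads "CC-ending" literally as 4-mers whose last two bases are CC; that reading is an assumption, and if the evaluator instead means motifs that begin with CC, `endswith` would become `startswith`. The counts are made up.

```python
import math
from itertools import product

# All 256 possible 4-mers over the DNA alphabet.
motifs = ["".join(p) for p in product("ACGT", repeat=4)]

# Toy counts: every motif seen once, a few motifs over-represented.
counts = {m: 1 for m in motifs}
counts.update({"CCCA": 45, "CCAG": 30, "AACC": 20, "GTCC": 10})

total = sum(counts.values())
freq = {m: c / total for m, c in counts.items()}

# Shannon entropy in bits; the maximum is 8 for a uniform 256-way split.
entropy = -sum(p * math.log2(p) for p in freq.values() if p > 0)

# Signature score: summed frequency of motifs ending in "CC" (assumed reading).
cc_score = sum(p for m, p in freq.items() if m.endswith("CC"))

# Top-10 concentration: share of all ends taken by the 10 commonest motifs.
top10 = sum(sorted(freq.values(), reverse=True)[:10])

print(entropy, cc_score, top10)
```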

kreview.features.endmotif_genomewide

EndMotifGenomewideEvaluator

Bases: FeatureEvaluator

Extracts 4-mer fragment end motif frequencies for genomewide regions.

Produces raw 256 4-mer frequencies plus derived summary metrics:

- Shannon entropy (cleavage site diversity)
- DNASE1L3 signature score (CC-ending motif sum)
- Top-10 motif concentration
- Purine/pyrimidine asymmetry at terminal base

kreview.features.endmotif_1mer

EndMotif1merEvaluator

Bases: FeatureEvaluator

Extracts 1-mer fragment end base frequencies with strand bias metrics.

Derived metrics:

- Purine/pyrimidine asymmetry: (A+G) - (C+T)
- A/T strand bias: A / (A+T)
- C/G strand bias: C / (C+G)
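The three formulas above translate directly into code. The base frequencies are made-up values, and returning NaN for an empty denominator is an assumption about edge-case handling; `safe_ratio` is a hypothetical helper.

```python
import math

# Toy 1-mer end-base frequencies (made-up values that sum to 1).
base_freq = {"A": 0.25, "C": 0.40, "G": 0.20, "T": 0.15}


def safe_ratio(numerator: float, denominator: float) -> float:
    """Divide, returning NaN when the denominator is zero."""
    return numerator / denominator if denominator else math.nan


a, c, g, t = (base_freq[b] for b in "ACGT")

pur_pyr_asymmetry = (a + g) - (c + t)  # purines minus pyrimidines
at_bias = safe_ratio(a, a + t)         # A / (A+T)
cg_bias = safe_ratio(c, c + g)         # C / (C+G)

print(pur_pyr_asymmetry, at_bias, cg_bias)
```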

kreview.features.breakpoint_motif

BreakPointMotifOnTargetEvaluator

Bases: FeatureEvaluator

Extracts 4-mer adjacent breakpoint motifs for on-target regions.

kreview.features.breakpoint_motif_genomewide

BreakPointMotifGenomewideEvaluator

Bases: FeatureEvaluator

Extracts 4-mer adjacent breakpoint motifs for genomewide regions.


🧬 Motif Divergence Scores

Measures the statistical divergence of end-motif distributions from healthy baselines.

kreview.features.mds

MDSOnTargetEvaluator

Bases: FeatureEvaluator

On-target MDS signature.

Extracts all numeric columns from the single-row MDS on-target parquet, rather than only the two scalars that were originally hardcoded.
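Selecting every numeric column from a single-row table can be sketched with pandas. The frame is built in memory instead of being read from a parquet file, and the column names are hypothetical.

```python
import pandas as pd

# Stand-in for the single-row MDS table; column names are hypothetical.
row = pd.DataFrame([{
    "sample_id": "S1",       # non-numeric, should be ignored
    "mds": 0.91,
    "mds_e1": 0.88,
    "n_fragments": 120000,
}])

# Keep every numeric column instead of naming a fixed pair of scalars.
numeric = row.select_dtypes(include="number")

# Flatten the single row into a plain feature dictionary.
features = {col: float(numeric[col].iloc[0]) for col in numeric.columns}

print(features)
```

One consequence of this design is that new numeric columns added upstream flow into the feature set without a code change.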

kreview.features.mds_genomewide

MDSGenomewideEvaluator

Bases: FeatureEvaluator

Genomewide MDS signature.

Extracts all numeric columns from the single-row MDS genomewide parquet, rather than only the two scalars that were originally hardcoded.

kreview.features.mds_gene

MDSGeneEvaluator

Bases: FeatureEvaluator

Gene-specific MDS signatures with cross-gene distribution statistics.

Per-gene metrics: mds_mean, mds_e1, mds_std, mds_z, mds_e1_z

Cross-gene derived: z-score std/skew, fraction of genes diverged (|z|>2)
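The cross-gene statistics can be sketched from a set of per-gene z-scores. The gene names and values are made up, and the use of population (biased) standard deviation and skewness is an assumption; the evaluator may use sample estimators instead.

```python
from statistics import mean, pstdev

# Toy per-gene MDS z-scores (made-up values).
gene_z = {"TP53": 2.5, "KRAS": -0.5, "EGFR": 0.2, "BRCA1": -2.2, "MYC": 1.0}
values = list(gene_z.values())

# Spread of z-scores across genes (population standard deviation).
z_std = pstdev(values)

# Population skewness: mean of the cubed standardized deviations.
mu = mean(values)
z_skew = mean([((v - mu) / z_std) ** 3 for v in values])

# Fraction of genes whose z-score exceeds 2 in absolute value.
frac_diverged = sum(abs(v) > 2 for v in values) / len(values)

print(z_std, z_skew, frac_diverged)
```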

kreview.features.mds_exon

MDSExonEvaluator

Bases: FeatureEvaluator

Exon-level MDS distributions with cross-exon statistics.

Per-gene exon-level: mean and std of MDS across exons per gene.

Cross-exon derived: global mean, std, skew, fraction diverged (|MDS|>2).


πŸ—ΊοΈ Accessibility & Orientation

kreview.features.atac

ATACOnTargetEvaluator

Bases: FeatureEvaluator

Extracts ATAC footprint metrics for on-target regions.

kreview.features.atac_genomewide

ATACGenomewideEvaluator

Bases: FeatureEvaluator

Extracts ATAC footprint metrics for genomewide regions.

kreview.features.ocf_ontarget

OCFOntargetEvaluator

Bases: FeatureEvaluator

Extracts on-target OCF metrics per tissue with cross-tissue aggregates.

Cross-tissue derived metrics:

- max_ocf_z: highest z-score across all tissues
- n_tissues_elevated: count of tissues with z > 2.0
- ocf_entropy: Shannon entropy of positive z-score distribution
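The cross-tissue aggregates can be sketched as follows. The tissue names and z-scores are made up, and the entropy step assumes the positive z-scores are normalized to sum to 1 before the entropy is taken; the evaluator may define the distribution differently.

```python
import math

# Toy per-tissue OCF z-scores (made-up values).
tissue_z = {"liver": 3.0, "lung": 1.0, "colon": -0.5, "breast": 2.5, "t_cell": 0.0}

# Highest z-score across all tissues.
max_ocf_z = max(tissue_z.values())

# Number of tissues above the z > 2.0 threshold.
n_tissues_elevated = sum(z > 2.0 for z in tissue_z.values())

# Entropy over positive z-scores, normalized to a probability distribution
# (assumed interpretation of "positive z-score distribution").
positive = [z for z in tissue_z.values() if z > 0]
total = sum(positive)
ocf_entropy = (
    -sum((z / total) * math.log2(z / total) for z in positive) if total > 0 else 0.0
)

print(max_ocf_z, n_tissues_elevated, ocf_entropy)
```

Under this reading, a low entropy means the elevated signal is concentrated in one tissue, while a high entropy means it is spread across several.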

kreview.features.ocf_offtarget

OCFOfftargetEvaluator

Bases: FeatureEvaluator

Extracts off-target OCF metrics per tissue with cross-tissue aggregates.

Cross-tissue derived metrics:

- max_ocf_z: highest z-score across all tissues
- n_tissues_elevated: count of tissues with z > 2.0
- ocf_entropy: Shannon entropy of positive z-score distribution