Citation & Scientific Background

If you use Krewlyzer in your work, please cite this repository and the relevant methods papers below.

Primary Literature

Krewlyzer implements or adapts methods from the following foundational papers in cfDNA fragmentomics:


OCF — Orientation-aware Fragmentation

Sun K, Jiang P, Chan KC, et al. Orientation-aware plasma cell-free DNA fragmentation analysis in open chromatin regions informs tissue of origin. Genome Res. 2019;29(3):418-427. DOI

Key Concept: OCF measures differentially phased fragment ends (Upstream/Downstream) at tissue-specific open chromatin regions to infer tissue-of-origin.

Mechanism:

- In open chromatin, nucleosomes are evicted, exposing longer stretches of linker DNA
- During apoptosis, endonucleases cut the exposed linker DNA
- This creates a characteristic pattern: U ends peak ~60bp to the right of the open chromatin region (OCR) center, and D ends peak ~60bp to the left
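The phasing pattern above can be turned into a single number by counting U and D ends in two windows around the expected peaks. The following is a minimal sketch, not Krewlyzer's implementation: the function name `ocf_score`, the input representation (end offsets relative to the OCR center), and the 10bp half-width are assumptions chosen for illustration.

```python
from collections import Counter

def ocf_score(u_ends, d_ends, peak=60, half_width=10):
    """Toy OCF value from fragment-end offsets relative to an OCR center.

    u_ends / d_ends: iterables of integer offsets (bp) of upstream (U) and
    downstream (D) fragment ends, measured from the OCR center.
    peak and half_width are illustrative assumptions, not Krewlyzer defaults.
    """
    u = Counter(u_ends)
    d = Counter(d_ends)
    score = 0
    # Left window: D ends are expected to dominate around -peak.
    for pos in range(-peak - half_width, -peak + half_width + 1):
        score += d[pos] - u[pos]
    # Right window: U ends are expected to dominate around +peak.
    for pos in range(peak - half_width, peak + half_width + 1):
        score += u[pos] - d[pos]
    return score

# Phased ends (region is open in a contributing tissue) give a positive value.
phased = ocf_score(u_ends=[58, 60, 61, 63], d_ends=[-62, -60, -59])
# Unphased ends cancel out.
flat = ocf_score(u_ends=[-60, 60], d_ends=[-60, 60])
print(phased, flat)
```

This sketch works on raw counts; a real pipeline would likely normalize end counts by sequencing depth before summing, so that scores are comparable across samples.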

Healthy Baseline:

- T-cells: Highest OCF (dominant cfDNA source)
- Liver: Second highest
- Other tissues: Near zero

Cancer Pattern:

| Cancer Type | OCF Change |
|-------------|------------|
| HCC (liver) | ↑ Liver OCF, correlates with tumor fraction (R=0.36) |
| Colorectal | ↑ Intestine OCF (R=0.89), ↓ T-cell OCF |
| Lung | ↑ Lung OCF, ↓ T-cell OCF |

WPS — Windowed Protection Score

Snyder MW, Kircher M, Hill AJ, Daza RM, Shendure J. Cell-free DNA Comprises an In Vivo Nucleosome Footprint that Informs Its Tissues-Of-Origin. Cell. 2016;164(1-2):57-68. DOI

Key Concept: WPS quantifies nucleosome occupancy by comparing fragments that span a protection window vs those ending within it.

Formula:

WPS(k) = N_spanning(k) - N_ends(k)

Interpretation:

| WPS Value | Meaning |
|-----------|---------|
| Positive | Nucleosome present (DNA protected) |
| ~Zero | Transitional region |
| Negative | Open chromatin (nucleosome-free) |
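To make the formula concrete, here is a minimal sketch of WPS at a single position. It assumes fragments are given as inclusive `(start, end)` coordinates and uses a 120bp protection window, the long-fragment setting described by Snyder et al.; the function name `wps` and the input layout are hypothetical, not Krewlyzer's API.

```python
def wps(fragments, k, window=120):
    """Windowed Protection Score at position k.

    fragments: iterable of (start, end) inclusive coordinates.
    window: protection window size in bp (120 is an assumption taken from
            the long-fragment setting in Snyder et al.).
    """
    half = window // 2
    w_start, w_end = k - half, k + half
    spanning = 0
    ending = 0
    for start, end in fragments:
        if start <= w_start and end >= w_end:
            # Fragment covers the whole window: evidence of protection.
            spanning += 1
        elif w_start <= start <= w_end or w_start <= end <= w_end:
            # Fragment has an endpoint inside the window: evidence of cutting.
            ending += 1
    return spanning - ending

protected = [(900, 1070), (930, 1100), (910, 1080)]
exposed = [(1010, 1180), (820, 990)]
print(wps(protected, 1000), wps(exposed, 1000))
```

Evaluating this at every position along a region yields the WPS track whose peaks and troughs are interpreted as in the table above.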

Healthy vs Cancer:

- Nucleosome patterns are cell-type specific, which allows tissue-of-origin inference
- Cancer: Aberrant nucleosome positioning at oncogene and tumor suppressor gene (TSG) promoters
- Loss of 10bp periodicity at dysregulated genes


FSC/FSR — Fragment Size Coverage & Ratio (DELFI)

Cristiano S, et al. Genome-wide cell-free DNA fragmentation in patients with cancer. Nature. 2019;570(7761):385-389. DOI

Mouliere F, Chandrananda D, et al. Enhanced detection of circulating tumor DNA by fragment size analysis. Sci Transl Med. 2018;10(466):eaat4921. DOI

Key Concept: DELFI (DNA Evaluation of Fragments for earLy Interception) analyzes short/long fragment ratios genome-wide for cancer detection.

Fragment Classes:

| Class | Size Range | Origin |
|-------|------------|--------|
| Short | 100-150bp | Enriched in tumor cfDNA |
| Long | 151-220bp | Healthy/mono-nucleosomal |
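Using the size classes above, a short/long ratio can be computed per genomic bin. The sketch below is illustrative only: the function name `short_long_by_bin`, the `(chrom, start, length)` input tuples, and the 5Mb default bin size are assumptions for this example rather than Krewlyzer's actual interface or defaults.

```python
from collections import defaultdict

SHORT = (100, 150)  # bp, inclusive
LONG = (151, 220)   # bp, inclusive

def short_long_by_bin(fragments, bin_size=5_000_000):
    """Short/long fragment ratio per genomic bin.

    fragments: iterable of (chrom, start, length) tuples.
    bin_size: bin width in bp (5 Mb here is an illustrative assumption).
    """
    counts = defaultdict(lambda: [0, 0])  # bin -> [n_short, n_long]
    for chrom, start, length in fragments:
        key = (chrom, start // bin_size)
        if SHORT[0] <= length <= SHORT[1]:
            counts[key][0] += 1
        elif LONG[0] <= length <= LONG[1]:
            counts[key][1] += 1
        # Fragments outside both classes are ignored.
    return {
        key: (n_short / n_long if n_long else float("nan"))
        for key, (n_short, n_long) in counts.items()
    }

toy_fragments = [
    ("chr1", 100, 140), ("chr1", 2_000, 166), ("chr1", 3_000, 170),
    ("chr1", 6_000_000, 120), ("chr1", 6_100_000, 145),
    ("chr1", 6_200_000, 180), ("chr1", 7_000_000, 300),
]
ratios = short_long_by_bin(toy_fragments)
print(ratios)
```

In practice, per-bin counts are usually corrected for GC content and coverage before ratios are compared between samples; that step is omitted here.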

Healthy vs Cancer:

| Metric | Healthy | Cancer |
|--------|---------|--------|
| Modal peak | ~166bp | Left-shifted (~145bp) |
| Short/Long ratio | Low (baseline) | Elevated |
| Genome-wide variability | Minimal | Increased aberrations |

Performance (Cristiano et al.): sensitivity ranging from 57% to >99% across 7 cancer types at 98% specificity (overall AUC=0.94)


UXM — Fragment-level Methylation

Loyfer N, et al. A DNA methylation atlas of normal human cell types. Nature. 2023;613(7943):355-364. DOI

Key Concept: Classify each cfDNA fragment as Unmethylated (U), Mixed (X), or Methylated (M) to deconvolve cell-type contributions.

Classification Thresholds:

- U: ≤25% methylated CpGs
- M: ≥75% methylated CpGs
- X: Between 25-75%
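The thresholds above translate directly into a per-fragment classifier. This is a minimal sketch under stated assumptions: the function name `classify_fragment`, the boolean-list input, and the `min_cpgs=3` cutoff for skipping CpG-poor fragments are hypothetical choices for illustration, not values taken from Krewlyzer.

```python
from collections import Counter

def classify_fragment(cpg_methylated, min_cpgs=3):
    """Classify one fragment as 'U', 'X', or 'M'.

    cpg_methylated: list of booleans, one per CpG covered by the fragment
                    (True = methylated).
    min_cpgs: fragments with fewer CpGs are skipped (assumed cutoff).
    Returns None for fragments that are skipped.
    """
    n = len(cpg_methylated)
    if n < min_cpgs:
        return None
    fraction = sum(cpg_methylated) / n
    if fraction <= 0.25:
        return "U"
    if fraction >= 0.75:
        return "M"
    return "X"

toy = [
    [False, False, False, False],  # 0% methylated
    [True, True, True, False],     # 75% methylated
    [True, False, True, False],    # 50% methylated
    [True],                        # too few CpGs
]
labels = [classify_fragment(f) for f in toy]
uxm_counts = Counter(label for label in labels if label is not None)
print(labels, dict(uxm_counts))
```

The per-region U/X/M proportions derived from such counts are what a deconvolution step would then compare against a reference atlas.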

Healthy cfDNA Composition:

| Cell Type | Contribution |
|-----------|--------------|
| Megakaryocytes | ~31% |
| Granulocytes | ~30% |
| Monocytes/Macrophages | ~20% |
| Endothelial | ~6% |
| Hepatocytes | ~3% |

Resolution: Achieves ~0.1% detection (10x better than array-based methods)


Motif / Jagged Ends

Jiang P, Xie T, Ding SC, et al. Detection and characterization of jagged ends of double-stranded DNA in plasma. Genome Res. 2020;30(8):1144-1153. DOI

Key Concept: cfDNA fragments have single-stranded "jagged" ends that vary by tissue origin and health status.

Key Findings:

- 87.8% of cfDNA molecules have jagged ends
- Jaggedness relates to nuclease activity (DNASE1/DNASE1L3)
- End motif diversity reflects fragmentation patterns

Healthy vs Cancer:

| Comparison | Jaggedness |
|------------|------------|
| Healthy vs Cancer (ctDNA) | Lower in healthy, higher in cancer |
| Fetal vs Maternal | Fetal has higher jaggedness |
| Tumor vs Wild-type | Tumor-derived has higher jaggedness |

MDS (Motif Diversity Score):

- High (~1.0): Random/diverse fragmentation (healthy-like)
- Low: Stereotyped fragmentation (possible tumor signal)
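An MDS on the 0 to 1 scale described above can be computed as the Shannon entropy of 4-mer end-motif frequencies divided by its maximum, log2(256) = 8 bits. The sketch below assumes motifs are already extracted as uppercase strings; the function name `motif_diversity_score` is hypothetical and this is not presented as Krewlyzer's exact implementation.

```python
import math
from collections import Counter
from itertools import product

def motif_diversity_score(motifs, k=4):
    """Normalized Shannon entropy of k-mer end motifs (0 = one motif, 1 = uniform)."""
    valid = set("ACGT")
    # Keep only k-mers made of unambiguous bases.
    counts = Counter(m for m in motifs if len(m) == k and set(m) <= valid)
    total = sum(counts.values())
    if total == 0:
        return float("nan")
    entropy_bits = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy_bits / math.log2(4 ** k)  # divide by the maximum (8 bits for k=4)

all_motifs = ["".join(p) for p in product("ACGT", repeat=4)]  # 256 motifs, once each
uniform_mds = motif_diversity_score(all_motifs)
single_mds = motif_diversity_score(["CCCA"] * 100)
two_motif_mds = motif_diversity_score(["CCCA"] * 50 + ["CCAG"] * 50)
print(uniform_mds, single_mds, two_motif_mds)
```

Multiplying the normalized score by 8 gives the same quantity in bits, which is useful when comparing against values reported on an unnormalized scale.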


mFSD — Variant-centric Fragment Size

Mouliere F, Chandrananda D, et al. Enhanced detection of circulating tumor DNA by fragment size analysis. Sci Transl Med. 2018;10(466):eaat4921. DOI

Key Concept: mFSD analyzes fragment size distributions specifically at variant loci, enabling mutation-level fragmentation profiling.

Methodology:

- Extract fragments overlapping known variant positions
- Compare size distributions of variant-supporting vs wild-type fragments
- Tumor-derived fragments tend to be shorter
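The comparison step above can be sketched with two simple summaries: the shift in median length and a two-sample Kolmogorov-Smirnov statistic. The choice of these two statistics, the function names, and the `(length, supports_variant)` input pairs are assumptions made for illustration; the source does not specify which test Krewlyzer applies.

```python
from bisect import bisect_right
from statistics import median

def ks_statistic(a, b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(a), sorted(b)
    points = sorted(set(a) | set(b))
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in points
    )

def compare_sizes(fragments):
    """fragments: iterable of (length, supports_variant) pairs at one locus."""
    alt = [n for n, is_alt in fragments if is_alt]
    ref = [n for n, is_alt in fragments if not is_alt]
    if not alt or not ref:
        return None  # nothing to compare
    return {
        "median_alt": median(alt),
        "median_ref": median(ref),
        "median_shift": median(alt) - median(ref),  # negative = variant reads shorter
        "ks": ks_statistic(alt, ref),
    }

locus = [(140, True), (145, True), (150, True),
         (165, False), (167, False), (170, False), (172, False)]
result = compare_sizes(locus)
print(result)
```

With only a handful of variant-supporting fragments per locus, such statistics are noisy, so they are best treated as supporting evidence alongside other variant metrics.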

Clinical Application:

- Enhanced variant calling specificity
- Fragment-level evidence for somatic mutations
- Integration with VAF for confident detection


Region Entropy — TFBS/ATAC Size Entropy

Helzer KT, Sharifi MN, Sperger JM, et al. Analysis of cfDNA fragmentomics metrics and commercial targeted sequencing panels. Nat Commun. 2025;16:9122. DOI

Key Concept: Shannon entropy of fragment size distributions at transcription factor binding sites (TFBS) and open chromatin regions enables cancer phenotyping.

Data Sources:

- TFBS: GTRD v19.10 (808 transcription factors, top 5000 experimentally supported sites per TF)
- ATAC: TCGA ATAC-seq (cancer-type-specific open chromatin regions for 23 cancer types)

Methodology: From Helzer et al.: "Shannon entropy was calculated on the frequency of the fragment lengths... This yielded a single entropy value for each TF [or cancer type] in each sample."
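The entropy calculation quoted above can be sketched in a few lines. The log base (2, giving bits), the function name `size_entropy`, and the toy per-TF data are assumptions for illustration; the quoted passage does not state the base.

```python
import math
from collections import Counter

def size_entropy(lengths):
    """Shannon entropy (bits) of the fragment-length frequency distribution."""
    counts = Counter(lengths)
    total = sum(counts.values())
    if total == 0:
        return float("nan")
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy data: fragment lengths overlapping the sites of two transcription factors.
lengths_by_tf = {
    "TF_A": [166, 166, 166, 166],  # one length only
    "TF_B": [150, 160, 170, 180],  # four equally frequent lengths
}
entropy_by_tf = {tf: size_entropy(v) for tf, v in lengths_by_tf.items()}
print(entropy_by_tf)
```

Each sample then contributes one entropy value per TF (or per cancer-type region set), which forms the feature vector used for downstream comparison.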

Key Findings:

- TFBS/ATAC entropy works well for cancer detection and subtyping
- Can be applied to commercial targeted sequencing panels without WGS
- Diversity metrics measure the spread of fragment sizes at regulatory regions

GitHub Data: Zhao-Lab-UW-DHO/fragmentomics_metrics


Region MDS — Per-Gene Motif Diversity Score

Helzer KT, Sharifi MN, Sperger JM, et al. Analysis of cfDNA fragmentomics metrics and commercial targeted sequencing panels. Nat Commun. 2025;16:9122. DOI

Key Concept: Region MDS applies Motif Diversity Score (Shannon entropy of 4-mer end motifs) at the gene/exon level rather than globally, enabling detection of localized aberrant fragmentation patterns.

Methodology:

- Calculate MDS independently for each exon/target region
- Identify E1 (first exon) of each gene by genomic position
- Aggregate to gene-level statistics (mean, E1, std)
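The aggregation steps above can be sketched as follows, assuming per-exon MDS values have already been computed. The function name `aggregate_gene_mds`, the `(gene, start, mds)` row layout, the placeholder gene names, and the use of population standard deviation are all assumptions for this example. The toy values are on the unnormalized entropy scale in bits (maximum 8 for 4-mers).

```python
from collections import defaultdict
from statistics import mean, pstdev

def aggregate_gene_mds(exon_rows):
    """Gene-level summary of per-exon MDS values.

    exon_rows: iterable of (gene, exon_start, mds) tuples.
    E1 is taken as the exon with the lowest start coordinate, i.e. by
    genomic position only; strand is not considered in this sketch.
    """
    by_gene = defaultdict(list)
    for gene, start, mds in exon_rows:
        by_gene[gene].append((start, mds))
    summary = {}
    for gene, exons in by_gene.items():
        exons.sort()  # order exons by genomic start
        values = [mds for _, mds in exons]
        summary[gene] = {
            "mean": mean(values),
            "e1": exons[0][1],
            "std": pstdev(values),  # 0.0 for single-exon genes
            "n_exons": len(values),
        }
    return summary

rows = [
    ("GENE_A", 1000, 7.0), ("GENE_A", 5000, 7.5), ("GENE_A", 3000, 8.0),
    ("GENE_B", 200, 6.5),
]
gene_summary = aggregate_gene_mds(rows)
print(gene_summary)
```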

Key Findings (from Helzer et al.):

- Per-region fragmentomics metrics work effectively on commercial panels
- E1 (first exon) closest to promoter shows most pronounced cancer-associated changes
- MDS changes correlate with aberrant gene regulation in cancer

Interpretation:

| MDS Value | Meaning |
|-----------|---------|
| Higher (~7.5-8.0) | Diverse motif usage (healthy) |
| Lower (~6.0-7.0) | Restricted motifs (potentially aberrant) |

Clinical Application:

- Detect genes with aberrant fragmentation patterns
- Z-score normalization against a panel of normals (PON) enables per-gene anomaly detection
- E1 focus for promoter-proximal signal
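Per-gene z-score normalization against a PON can be sketched as below. The function name `gene_z_scores`, the dictionary layouts, and the rule of skipping genes with fewer than two PON values or zero PON variance are assumptions for illustration, not Krewlyzer's documented behavior.

```python
from statistics import mean, stdev

def gene_z_scores(sample, pon):
    """Z-score each gene's MDS against the panel-of-normals distribution.

    sample: {gene: mds_value} for one sample.
    pon: {gene: [mds_value, ...]} across normal samples.
    Genes with fewer than two PON values or zero PON spread are skipped.
    """
    z = {}
    for gene, value in sample.items():
        reference = pon.get(gene, [])
        if len(reference) < 2:
            continue
        sd = stdev(reference)
        if sd == 0:
            continue
        z[gene] = (value - mean(reference)) / sd
    return z

pon = {"GENE_A": [7.0, 8.0, 7.5, 7.5], "GENE_B": [7.8, 7.8]}
sample = {"GENE_A": 6.0, "GENE_B": 7.8, "GENE_C": 7.0}
z_scores = gene_z_scores(sample, pon)
print(z_scores)
```

A strongly negative z-score flags a gene whose end-motif diversity is lower than in normals; the cutoff for calling an anomaly is a separate choice that depends on PON size.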


Acknowledgements

Krewlyzer was developed by Ronak Shah at Memorial Sloan Kettering Cancer Center.

The fragmentomics methods implemented here build upon foundational work from laboratories worldwide including Dennis Lo (CUHK), Jay Shendure (UW), Victor Velculescu (JHU), and others.