Citation & Scientific Background
If you use Krewlyzer in your work, please cite this repository and the relevant methods papers below.
Primary Literature
Krewlyzer implements or adapts methods from the following foundational papers in cfDNA fragmentomics:
OCF — Orientation-aware Fragmentation
Sun K, Jiang P, Chan KC, et al. Orientation-aware plasma cell-free DNA fragmentation analysis in open chromatin regions informs tissue of origin. Genome Res. 2019;29(3):418-427. DOI
Key Concept: OCF measures differentially phased fragment ends (Upstream/Downstream) at tissue-specific open chromatin regions to infer tissue-of-origin.
Mechanism:
- In open chromatin → nucleosomes are evicted → longer linker DNA exposed
- During apoptosis → endonuclease cuts exposed linker DNA
- Creates characteristic pattern: U ends peak ~60bp right, D ends peak ~60bp left of OCR center
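The pattern above can be turned into a single score by counting U and D ends in two windows either side of the OCR center. The following is a minimal sketch, not Krewlyzer's implementation: the function name `ocf_score`, the 60 bp offset (taken from the description above), and the ±10 bp window half-width are illustrative assumptions, and no depth normalization is applied.

```python
from collections import Counter

def ocf_score(u_ends, d_ends, peak=60, half_bin=10):
    """Sketch of an orientation-aware fragmentation score for one OCR set.

    u_ends / d_ends: positions of U and D fragment ends relative to the
    OCR center (negative = left of center). `peak` and `half_bin` are
    assumed illustrative values, not confirmed tool defaults.
    """
    u = Counter(u_ends)
    d = Counter(d_ends)
    score = 0
    # Left window: D ends are expected to dominate, so count D minus U.
    for pos in range(-peak - half_bin, -peak + half_bin + 1):
        score += d[pos] - u[pos]
    # Right window: U ends are expected to dominate, so count U minus D.
    for pos in range(peak - half_bin, peak + half_bin + 1):
        score += u[pos] - d[pos]
    return score
```

With this convention, a tissue whose open chromatin regions show the phased pattern gets a positive score, while unphased ends cancel out to roughly zero.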
Healthy Baseline:
- T-cells: Highest OCF (dominant cfDNA source)
- Liver: Second highest
- Other tissues: Near zero
Cancer Pattern:
| Cancer Type | OCF Change |
|---|---|
| HCC (liver) | ↑ Liver OCF, correlates with tumor fraction (R=0.36) |
| Colorectal | ↑ Intestine OCF (R=0.89), ↓ T-cell OCF |
| Lung | ↑ Lung OCF, ↓ T-cell OCF |
WPS — Windowed Protection Score
Snyder MW, Kircher M, Hill AJ, Daza RM, Shendure J. Cell-free DNA Comprises an In Vivo Nucleosome Footprint that Informs Its Tissues-Of-Origin. Cell. 2016;164(1-2):57-68. DOI
Key Concept: WPS quantifies nucleosome occupancy by comparing fragments that span a protection window vs those ending within it.
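Formula: WPS(x) = (number of fragments fully spanning the window centered at x) − (number of fragments with an endpoint inside that window)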
Interpretation:
| WPS Value | Meaning |
|---|---|
| Positive | Nucleosome present (DNA protected) |
| ~Zero | Transitional region |
| Negative | Open chromatin (nucleosome-free) |
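As a rough illustration of how the score is computed at a single position, the sketch below counts spanning and ending fragments for one window. The function name `wps_at` and the 120 bp default window are assumptions made for this example (the window size follows the description in Snyder et al.); Krewlyzer's actual parameters may differ.

```python
def wps_at(position, fragments, window=120):
    """Sketch of a windowed protection score at one genomic position.

    fragments: iterable of (start, end) tuples with inclusive coordinates.
    window: assumed protection window size in bp.
    """
    half = window // 2
    w_start, w_end = position - half, position + half
    spanning = 0
    ending = 0
    for start, end in fragments:
        if start < w_start and end > w_end:
            # Both endpoints lie outside the window: the fragment spans it.
            spanning += 1
        elif w_start <= start <= w_end or w_start <= end <= w_end:
            # At least one endpoint falls inside the window.
            ending += 1
    return spanning - ending
```

Fragments that neither span the window nor end inside it do not contribute to the score.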
Healthy vs Cancer:
- Nucleosome patterns are cell-type specific → infer tissue-of-origin
- Cancer: Aberrant nucleosome positioning at oncogene/TSG promoters
- Loss of 10bp periodicity at dysregulated genes
FSC/FSR — Fragment Size Coverage & Ratio (DELFI)
Cristiano S, et al. Genome-wide cell-free DNA fragmentation in patients with cancer. Nature. 2019;570(7761):385-389. DOI
Mouliere F, Chandrananda D, et al. Enhanced detection of circulating tumor DNA by fragment size analysis. Sci Transl Med. 2018;10(466):eaat4921. DOI
Key Concept: DELFI (DNA Evaluation of Fragments for earLy Interception) analyzes short/long fragment ratios genome-wide for cancer detection.
Fragment Classes:
| Class | Size Range | Origin |
|---|---|---|
| Short | 100-150bp | Enriched in tumor cfDNA |
| Long | 151-220bp | Healthy/mono-nucleosomal |
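Using the size classes in the table, a short/long ratio can be sketched per genomic bin as follows. The function names and the input layout (a mapping from a bin label to a list of fragment lengths) are hypothetical and chosen only for illustration.

```python
def short_long_ratio(lengths, short=(100, 150), long=(151, 220)):
    """Ratio of short to long fragments; size ranges are inclusive (bp).

    Returns None when the bin contains no long fragments.
    """
    lengths = list(lengths)
    n_short = sum(1 for n in lengths if short[0] <= n <= short[1])
    n_long = sum(1 for n in lengths if long[0] <= n <= long[1])
    return n_short / n_long if n_long else None

def ratios_by_bin(lengths_by_bin):
    """Apply short_long_ratio to every bin in a {bin_label: lengths} mapping."""
    return {bin_id: short_long_ratio(v) for bin_id, v in lengths_by_bin.items()}
```

Fragments outside both ranges (for example, di-nucleosomal fragments above 220bp) are ignored by this sketch.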
Healthy vs Cancer:
| Metric | Healthy | Cancer |
|---|---|---|
| Modal peak | ~166bp | Left-shifted (~145bp) |
| Short/Long ratio | Low (baseline) | Elevated |
| Genome-wide variability | Minimal | Increased aberrations |
Performance: 57-99% sensitivity across 7 cancer types at 98% specificity (AUC=0.94)
UXM — Fragment-level Methylation
Loyfer N, et al. A DNA methylation atlas of normal human cell types. Nature. 2023;613(7943):355-364. DOI
Key Concept: Classify each cfDNA fragment as Unmethylated (U), Mixed (X), or Methylated (M) to deconvolve cell-type contributions.
Classification Thresholds:
- U: ≤25% methylated CpGs
- M: ≥75% methylated CpGs
- X: Between 25% and 75% methylated CpGs
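The thresholds above translate directly into a per-fragment classifier. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the `min_cpgs` filter (skipping fragments with too few CpGs to classify reliably) is an illustrative parameter rather than a documented Krewlyzer setting.

```python
from collections import Counter

def classify_fragment(meth_calls, min_cpgs=3):
    """Label one fragment as 'U', 'X', or 'M' from per-CpG calls.

    meth_calls: sequence of 0/1 (or False/True) values, one per CpG.
    Returns None if the fragment covers fewer than `min_cpgs` CpGs
    (assumed filter, not taken from the text above).
    """
    if len(meth_calls) < min_cpgs:
        return None
    frac = sum(meth_calls) / len(meth_calls)
    if frac <= 0.25:
        return "U"
    if frac >= 0.75:
        return "M"
    return "X"

def uxm_proportions(fragments, min_cpgs=3):
    """Fraction of classified fragments in each of the U, X, and M classes."""
    labels = (classify_fragment(f, min_cpgs) for f in fragments)
    counts = Counter(label for label in labels if label is not None)
    total = sum(counts.values())
    return {k: counts[k] / total for k in "UXM"} if total else {}
```

The resulting U/X/M proportions at marker regions are what a deconvolution step would then compare against a reference atlas.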
Healthy cfDNA Composition:
| Cell Type | Contribution |
|---|---|
| Megakaryocytes | ~31% |
| Granulocytes | ~30% |
| Monocytes/Macrophages | ~20% |
| Endothelial | ~6% |
| Hepatocytes | ~3% |
Resolution: Resolves cell-type contributions down to ~0.1% (roughly 10x better than array-based methods)
Motif / Jagged Ends
Jiang P, Xie T, Ding SC, et al. Detection and characterization of jagged ends of double-stranded DNA in plasma. Genome Res. 2020;30(8):1144-1153. DOI
Key Concept: cfDNA fragments have single-stranded "jagged" ends that vary by tissue origin and health status.
Key Findings:
- 87.8% of cfDNA molecules have jagged ends
- Jaggedness relates to nuclease activity (DNASE1/DNASE1L3)
- End motif diversity reflects fragmentation patterns
Healthy vs Cancer:
| Metric | Healthy | Cancer (ctDNA) |
|---|---|---|
| Jaggedness | Lower | Higher |
| Fetal vs Maternal | Fetal has higher jaggedness | — |
| Tumor vs Wild-type | — | Tumor-derived has higher jaggedness |
MDS (Motif Diversity Score):
- High (~1.0): Random/diverse fragmentation (healthy-like)
- Low: Stereotyped fragmentation (possible tumor signal)
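MDS is the Shannon entropy of end-motif frequencies. The sketch below assumes 4-mer motifs over A/C/G/T and a hypothetical function name; with `normalize=True` the entropy is divided by its maximum so the score falls in the 0 to 1 range described above, and with `normalize=False` it is reported in bits (maximum 8 for 4-mers).

```python
import math
from collections import Counter

def motif_diversity_score(end_motifs, k=4, normalize=True):
    """Shannon entropy of k-mer end-motif frequencies.

    end_motifs: iterable of motif strings; entries that are not exactly
    k bases from A/C/G/T are skipped. Returns None if nothing is usable.
    """
    counts = Counter(
        m for m in end_motifs if len(m) == k and set(m) <= set("ACGT")
    )
    total = sum(counts.values())
    if total == 0:
        return None
    entropy_bits = -sum(
        (c / total) * math.log2(c / total) for c in counts.values()
    )
    # log2(4**k) == 2*k is the entropy of a uniform motif distribution.
    return entropy_bits / (2 * k) if normalize else entropy_bits
```

A sample whose fragment ends use all 256 possible 4-mers equally often scores 1.0 (8 bits unnormalized); a sample with a single motif scores 0.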
mFSD — Variant-centric Fragment Size
Mouliere F, Chandrananda D, et al. Enhanced detection of circulating tumor DNA by fragment size analysis. Sci Transl Med. 2018;10(466):eaat4921. DOI
Key Concept: mFSD analyzes fragment size distributions specifically at variant loci, enabling mutation-level fragmentation profiling.
Methodology:
- Extract fragments overlapping known variant positions
- Compare size distributions of variant-supporting vs wild-type fragments
- Tumor-derived fragments tend to be shorter
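One way to sketch the comparison step is with a median shift and a two-sample Kolmogorov-Smirnov statistic. Both summary statistics are choices made for this example, not a statement of what Krewlyzer computes, and the function names are hypothetical.

```python
from bisect import bisect_right
from statistics import median

def ks_statistic(a, b):
    """Maximum distance between the empirical CDFs of two samples."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in set(a) | set(b):
        fa = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        fb = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(fa - fb))
    return d

def compare_variant_fragments(variant_lengths, wildtype_lengths):
    """Summarize how variant-supporting fragment sizes differ from wild-type.

    A negative median_shift means variant-supporting fragments are shorter.
    Both inputs must be non-empty.
    """
    return {
        "median_shift": median(variant_lengths) - median(wildtype_lengths),
        "ks": ks_statistic(variant_lengths, wildtype_lengths),
    }
```

In practice the number of variant-supporting fragments at a single locus is often small, so any such statistic should be read alongside the fragment counts.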
Clinical Application:
- Enhanced variant calling specificity
- Fragment-level evidence for somatic mutations
- Integration with VAF for confident detection
Region Entropy — TFBS/ATAC Size Entropy
Helzer KT, Sharifi MN, Sperger JM, et al. Analysis of cfDNA fragmentomics metrics and commercial targeted sequencing panels. Nat Commun 16, 9122 (2025). DOI
Key Concept: Shannon entropy of fragment size distributions at transcription factor binding sites (TFBS) and open chromatin regions enables cancer phenotyping.
Data Sources:
- TFBS: GTRD v19.10 (808 transcription factors, top 5000 experimentally supported sites per TF)
- ATAC: TCGA ATAC-seq (cancer-type-specific open chromatin regions for 23 cancer types)
Methodology: From Helzer et al.: "Shannon entropy was calculated on the frequency of the fragment lengths... This yielded a single entropy value for each TF [or cancer type] in each sample."
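The calculation described in the quote can be sketched as below. The logarithm base (2, giving bits) and the function names are assumptions for illustration; the quoted text does not specify them.

```python
import math
from collections import Counter

def size_entropy(fragment_lengths):
    """Shannon entropy (bits) of the fragment length frequency distribution."""
    counts = Counter(fragment_lengths)
    total = sum(counts.values())
    if total == 0:
        return None
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def entropy_per_region_set(lengths_by_set):
    """One entropy value per TF or cancer type.

    lengths_by_set: {name: lengths of all fragments overlapping that
    TF's binding sites or that cancer type's open chromatin regions}.
    """
    return {name: size_entropy(v) for name, v in lengths_by_set.items()}
```

For example, a region set where fragments split evenly between two lengths gives 1 bit, while a set where every fragment has the same length gives 0.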
Key Findings:
- TFBS/ATAC entropy works well for cancer detection and subtyping
- Can be applied to commercial targeted sequencing panels without WGS
- Diversity metrics measure the spread of fragment sizes at regulatory regions
GitHub Data: Zhao-Lab-UW-DHO/fragmentomics_metrics
Region MDS — Per-Gene Motif Diversity Score
Helzer KT, Sharifi MN, Sperger JM, et al. Analysis of cfDNA fragmentomics metrics and commercial targeted sequencing panels. Nat Commun 16, 9122 (2025). DOI
Key Concept: Region MDS applies Motif Diversity Score (Shannon entropy of 4-mer end motifs) at the gene/exon level rather than globally, enabling detection of localized aberrant fragmentation patterns.
Methodology:
- Calculate MDS independently for each exon/target region
- Identify E1 (first exon) of each gene by genomic position
- Aggregate to gene-level statistics (mean, E1, std)
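The aggregation steps above can be sketched as follows, given per-exon MDS values that have already been computed. The record layout, the function name, the strand-aware choice of E1, and the use of the population standard deviation are all assumptions for this example.

```python
from statistics import mean, pstdev

def gene_level_mds(exons):
    """Aggregate per-exon MDS values to gene-level statistics.

    exons: list of dicts with keys 'gene', 'start', 'strand', 'mds'
    (hypothetical layout). E1 is taken as the lowest-start exon on the
    '+' strand and the highest-start exon on the '-' strand.
    """
    by_gene = {}
    for exon in exons:
        by_gene.setdefault(exon["gene"], []).append(exon)

    summary = {}
    for gene, rows in by_gene.items():
        if rows[0].get("strand") == "-":
            e1 = max(rows, key=lambda r: r["start"])
        else:
            e1 = min(rows, key=lambda r: r["start"])
        values = [r["mds"] for r in rows]
        summary[gene] = {
            "mean": mean(values),
            "e1": e1["mds"],
            "std": pstdev(values),  # 0.0 for single-exon genes
        }
    return summary
```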
Key Findings (from Helzer et al.):
- Per-region fragmentomics metrics work effectively on commercial panels
- E1 (first exon) closest to promoter shows most pronounced cancer-associated changes
- MDS changes correlate with aberrant gene regulation in cancer
Interpretation:
| MDS Value | Meaning |
|---|---|
| Higher (~7.5-8.0) | Diverse motif usage (healthy) |
| Lower (~6.0-7.0) | Restricted motifs (potentially aberrant) |
Clinical Application:
- Detect genes with aberrant fragmentation patterns
- Z-score normalization against a panel of normals (PON) enables per-gene anomaly detection
- E1 focus for promoter-proximal signal
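Z-score normalization against a PON can be sketched as below. The input layout (gene to value for the sample, gene to a list of values for the PON) and the function name are hypothetical; genes with fewer than two PON values or zero PON variance are skipped in this sketch.

```python
from statistics import mean, stdev

def pon_zscores(sample, pon):
    """Per-gene z-scores of a sample relative to a panel of normals.

    sample: {gene: value}; pon: {gene: [values from normal samples]}.
    """
    zscores = {}
    for gene, value in sample.items():
        ref = pon.get(gene, [])
        if len(ref) < 2:
            continue  # not enough normals to estimate spread
        sd = stdev(ref)
        if sd == 0:
            continue  # avoid division by zero
        zscores[gene] = (value - mean(ref)) / sd
    return zscores
```

A strongly negative z-score for a gene's MDS would then flag more restricted motif usage than seen in the normals.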
Acknowledgements
Krewlyzer was developed by Ronak Shah at Memorial Sloan Kettering Cancer Center.
The fragmentomics methods implemented here build upon foundational work from laboratories worldwide including Dennis Lo (CUHK), Jay Shendure (UW), Victor Velculescu (JHU), and others.