Skip to content

Fragment Size Ratio (FSR)

Command: krewlyzer fsr

Plain English

FSR measures the ratio of short (tumor-enriched) to long (healthy) DNA fragments. A higher core_short_long_ratio means more tumor-derived DNA in your sample.

Example: core_short_long_ratio = 1.5 suggests ~30% tumor burden (vs. ~0.9 in healthy plasma).


Purpose

Computes short/long fragment ratios for cancer biomarker analysis. Uses PoN-normalization before ratio calculation for accurate cross-sample comparison.


Processing Flowchart

flowchart LR
    BED["sample.bed.gz"] --> RUST["Rust Pipeline"]
    BINS["100kb Bins"] --> RUST
    GC["GC Correction"] --> RUST

    RUST --> COUNTS["Raw Counts"]

    subgraph "Normalization Order"
        COUNTS --> NORM["PoN Normalize"]
        NORM --> RATIO["Compute Ratio"]
    end

    RATIO --> FSR["FSR.tsv"]

    subgraph "With --target-regions"
        RUST --> FSR_ON["FSR.ontarget.tsv"]
    end
Use mouse to pan and zoom

Python/Rust Architecture

flowchart TB
    subgraph "Python (CLI)"
        CLI["fsr.py"] --> UP["unified_processor.py"]
        UP --> ASSETS["AssetManager"]
    end

    subgraph "Rust Backend"
        UP --> RUST["_core.run_unified_pipeline()"]
        RUST --> GC["GC correction"]
        GC --> COUNT["Size bin counting"]
    end

    subgraph "Python (Post-processing)"
        COUNT --> PROC["fsr_processor.py"]
        PROC --> PON["PON normalization"]
        PON --> OUT["FSR.tsv"]
    end
Use mouse to pan and zoom

Biological Context

The ratio of short to long fragments is a key indicator of tumor burden in cfDNA:

Fragment Type Size Range Biological Source
core_short 100-149bp Tumor DNA (sub-nucleosomal, ~145bp peak)
mono_nucl 150-259bp Standard mono-nucleosomal cfDNA
di_nucl 260-399bp Di-nucleosomal (healthy chromatin)
long 400+bp Very long fragments

Key Biomarker: core_short_long_ratio – Higher ratio = higher probability of tumor DNA

Why Short Fragments = Tumor?

Tumor cells have abnormal chromatin structure: - Disrupted nucleosome positioning → non-canonical cutting - Smaller protected regions → shorter fragments - Result: Tumor cfDNA peaks at ~145bp vs. healthy cfDNA at ~166bp


Usage

# Basic usage
krewlyzer fsr -i sample.bed.gz -o output_dir/ --sample-name SAMPLE

# With PON normalization (recommended)
krewlyzer fsr -i sample.bed.gz -o output/ -P cohort.pon.parquet

# Panel data with on/off-target split
krewlyzer fsr -i sample.bed.gz -o output/ \
    --target-regions MSK-ACCESS_targets.bed

CLI Options

Option Short Type Default Description
--input -i PATH required Input .bed.gz file (output from extract)
--output -o PATH required Output directory
--sample-name -s TEXT Override sample name
--bin-input -b PATH Custom bin file
--pon-model -P PATH PON model for hybrid GC correction
--pon-variant TEXT all_unique PON variant: all_unique or duplex
--skip-pon FLAG Skip PON z-score normalization
--target-regions -T PATH Target BED (for on/off-target split)
--skip-target-regions FLAG Force WGS mode (ignore bundled targets)
--assay -A TEXT Assay code (xs1/xs2) for bundled assets
--genome -G TEXT hg19 Genome build (hg19/hg38)
--windows -w INT 100000 Window size
--continue-n -c INT 50 Consecutive window number
--gc-correct FLAG True Apply GC bias correction
--threads -t INT 0 Number of threads (0=all cores)
--verbose -v FLAG Enable verbose logging

Size Bin Definitions

FSR uses the Rust backend's 5-channel size bins:

Bin Name Size Range Biological Meaning
ultra_short 65-99bp TF footprints, highly tumor-specific
core_short 100-149bp Primary tumor biomarker
mono_nucl 150-259bp Standard mono-nucleosomal
di_nucl 260-399bp Di-nucleosomal, healthy-enriched
long 400+bp Multi-nucleosomal (rare)

Formulas

Normalization Order (Critical)

Important

FSR normalizes counts to PoN BEFORE computing ratios.

Step 1 - Normalize short:

\[ \text{core_short_norm} = \frac{\text{core_short_count}}{\text{PoN_core_short_mean}} \]

Step 2 - Normalize long:

\[ \text{long_norm} = \frac{\text{long_count}}{\text{PoN_long_mean}} \]

Step 3 - Compute ratio:

\[ \text{core_short_long_ratio} = \frac{\text{core_short_norm}}{\text{long_norm}} \]

This removes batch effects before ratio calculation, ensuring accurate cross-sample comparison.

Step 4 - Log2 ratio (optional):

\[ \text{short_long_log2} = \log_2(\text{core_short_long_ratio}) \]
log2 Value Meaning
0 Equal short/long (baseline)
> 0.5 Elevated short fragments (tumor signal)
< -0.5 Depleted short fragments

Output Format

Output: {sample}.FSR.tsv

Column Type Description
region str Genomic region (chr:start-end)
ultra_short_count int Ultra-short fragments (65-99bp)
core_short_count int Core short fragments (100-149bp)
mono_nucl_count int Mono-nucleosomal fragments (150-259bp)
di_nucl_count int Di-nucleosomal fragments (260-399bp)
long_count int Long fragments (400+bp)
total_count int Total fragments
ultra_short_ratio float ultra_short / total
core_short_ratio float core_short / total
mono_nucl_ratio float mono_nucl / total
di_nucl_ratio float di_nucl / total
long_ratio float long / total
core_short_long_ratio float core_short / long (primary biomarker)

Panel Mode (--target-regions)

For targeted sequencing panels (MSK-ACCESS):

krewlyzer fsr -i sample.bed.gz -o output/ \
    --target-regions MSK-ACCESS_targets.bed

Output Files

File Contents Use Case
{sample}.FSR.tsv Off-target fragments Unbiased ratio (primary)
{sample}.FSR.ontarget.tsv On-target fragments Gene-level ratio

Important

Off-target = unbiased – preferred for tumor detection.
On-target = capture-biased – reflects panel design, not true biology.


Clinical Interpretation

Metric Healthy Plasma Cancer (ctDNA)
Modal fragment size ~166bp Left-shifted (~145bp)
core_short_long_ratio ~0.8-1.0 (baseline) >1.2 elevated
Interpretation Normal profile Elevated tumor burden

Decision Flowchart

flowchart TD
    FSR[FSR Results] --> Q1{core_short_long_ratio > 1.3?}
    Q1 -->|Yes| HIGH[High tumor burden]
    Q1 -->|No| Q2{core_short_long_ratio > 1.1?}
    Q2 -->|Yes| MOD[Moderate tumor signal]
    Q2 -->|No| LOW[Low/no detectable tumor]
Use mouse to pan and zoom

Note

Thresholds depend on your cohort and should be validated against known samples.


See Also

  • FSC – Full 5-channel coverage
  • FSD – Size distribution by arm
  • PON Models – Normalization baselines
  • Glossary – Terminology reference
  • Citation – DELFI paper references