Fragment Size Ratio (FSR)

Command: krewlyzer fsr

Plain English

FSR measures the ratio of short (tumor-enriched) to long (healthy) DNA fragments. A higher core_short_long_ratio means more tumor-derived DNA in your sample.

Example: core_short_long_ratio = 1.5 suggests ~30% tumor burden (vs. ~0.9 in healthy plasma).

Purpose

Computes short/long fragment ratios for cancer biomarker analysis. Uses PoN-normalization before ratio calculation for accurate cross-sample comparison.

Processing Flowchart

flowchart LR
    BED["sample.bed.gz"] --> RUST["Rust Pipeline"]
    BINS["100kb Bins"] --> RUST
    GC["GC Correction"] --> RUST

    RUST --> COUNTS["Raw Counts"]

    subgraph "Normalization Order"
        COUNTS --> NORM["PoN Normalize"]
        NORM --> RATIO["Compute Ratio"]
    end

    RATIO --> FSR["FSR.tsv"]

    subgraph "With --target-regions"
        RUST --> FSR_ON["FSR.ontarget.tsv"]
    end

Use mouse to pan and zoom

Python/Rust Architecture

flowchart TB
    subgraph "Python (CLI)"
        CLI["fsr.py"] --> UP["unified_processor.py"]
        UP --> ASSETS["AssetManager"]
    end

    subgraph "Rust Backend"
        UP --> RUST["_core.run_unified_pipeline()"]
        RUST --> GC["GC correction"]
        GC --> COUNT["Size bin counting"]
    end

    subgraph "Python (Post-processing)"
        COUNT --> PROC["fsr_processor.py"]
        PROC --> PON["PON normalization"]
        PON --> OUT["FSR.tsv"]
    end

Use mouse to pan and zoom

Biological Context

The ratio of short to long fragments is a key indicator of tumor burden in cfDNA:

Fragment Type	Size Range	Biological Source
core_short	100-149bp	Tumor DNA (sub-nucleosomal, ~145bp peak)
mono_nucl	150-259bp	Standard mono-nucleosomal cfDNA
di_nucl	260-399bp	Di-nucleosomal (healthy chromatin)
long	400+bp	Very long fragments

Key Biomarker: core_short_long_ratio – Higher ratio = higher probability of tumor DNA

Why Short Fragments = Tumor?

Tumor cells have abnormal chromatin structure: - Disrupted nucleosome positioning → non-canonical cutting - Smaller protected regions → shorter fragments - Result: Tumor cfDNA peaks at ~145bp vs. healthy cfDNA at ~166bp

Usage

# Basic usage
krewlyzer fsr -i sample.bed.gz -o output_dir/ --sample-name SAMPLE

# With PON normalization (recommended)
krewlyzer fsr -i sample.bed.gz -o output/ -P cohort.pon.parquet

# Panel data with on/off-target split
krewlyzer fsr -i sample.bed.gz -o output/ \
    --target-regions MSK-ACCESS_targets.bed

CLI Options

Option	Short	Type	Default	Description
`--input`	`-i`	PATH	required	Input .bed.gz file (output from extract)
`--output`	`-o`	PATH	required	Output directory
`--sample-name`	`-s`	TEXT		Override sample name
`--bin-input`	`-b`	PATH		Custom bin file
`--pon-model`	`-P`	PATH		PON model for hybrid GC correction
`--pon-variant`		TEXT	all_unique	PON variant: `all_unique` or `duplex`
`--skip-pon`		FLAG		Skip PON z-score normalization
`--target-regions`	`-T`	PATH		Target BED (for on/off-target split)
`--skip-target-regions`		FLAG		Force WGS mode (ignore bundled targets)
`--assay`	`-A`	TEXT		Assay code (xs1/xs2) for bundled assets
`--genome`	`-G`	TEXT	hg19	Genome build (hg19/hg38)
`--windows`	`-w`	INT	100000	Window size
`--continue-n`	`-c`	INT	50	Consecutive window number
`--gc-correct`		FLAG	True	Apply GC bias correction
`--threads`	`-t`	INT	0	Number of threads (0=all cores)
`--verbose`	`-v`	FLAG		Enable verbose logging

Size Bin Definitions

FSR uses the Rust backend's 5-channel size bins:

Bin Name	Size Range	Biological Meaning
`ultra_short`	65-99bp	TF footprints, highly tumor-specific
`core_short`	100-149bp	Primary tumor biomarker
`mono_nucl`	150-259bp	Standard mono-nucleosomal
`di_nucl`	260-399bp	Di-nucleosomal, healthy-enriched
`long`	400+bp	Multi-nucleosomal (rare)

Formulas

Normalization Order (Critical)

Important

FSR normalizes counts to PoN BEFORE computing ratios.

Step 1 - Normalize short:

\[ \text{core_short_norm} = \frac{\text{core_short_count}}{\text{PoN_core_short_mean}} \]

Step 2 - Normalize long:

\[ \text{long_norm} = \frac{\text{long_count}}{\text{PoN_long_mean}} \]

Step 3 - Compute ratio:

\[ \text{core_short_long_ratio} = \frac{\text{core_short_norm}}{\text{long_norm}} \]

This removes batch effects before ratio calculation, ensuring accurate cross-sample comparison.

Step 4 - Log2 ratio (optional):

\[ \text{short_long_log2} = \log_2(\text{core_short_long_ratio}) \]

log2 Value	Meaning
0	Equal short/long (baseline)
> 0.5	Elevated short fragments (tumor signal)
< -0.5	Depleted short fragments

Output Format

Output: {sample}.FSR.tsv

Column	Type	Description
`region`	str	Genomic region (chr:start-end)
`ultra_short_count`	int	Ultra-short fragments (65-99bp)
`core_short_count`	int	Core short fragments (100-149bp)
`mono_nucl_count`	int	Mono-nucleosomal fragments (150-259bp)
`di_nucl_count`	int	Di-nucleosomal fragments (260-399bp)
`long_count`	int	Long fragments (400+bp)
`total_count`	int	Total fragments
`ultra_short_ratio`	float	ultra_short / total
`core_short_ratio`	float	core_short / total
`mono_nucl_ratio`	float	mono_nucl / total
`di_nucl_ratio`	float	di_nucl / total
`long_ratio`	float	long / total
`core_short_long_ratio`	float	core_short / long (primary biomarker)

Panel Mode (--target-regions)

For targeted sequencing panels (MSK-ACCESS):

krewlyzer fsr -i sample.bed.gz -o output/ \
    --target-regions MSK-ACCESS_targets.bed

Output Files

File	Contents	Use Case
`{sample}.FSR.tsv`	Off-target fragments	Unbiased ratio (primary)
`{sample}.FSR.ontarget.tsv`	On-target fragments	Gene-level ratio

Important

Off-target = unbiased – preferred for tumor detection.
On-target = capture-biased – reflects panel design, not true biology.

Clinical Interpretation

Metric	Healthy Plasma	Cancer (ctDNA)
Modal fragment size	~166bp	Left-shifted (~145bp)
`core_short_long_ratio`	~0.8-1.0 (baseline)	>1.2 elevated
Interpretation	Normal profile	Elevated tumor burden

Decision Flowchart

flowchart TD
    FSR[FSR Results] --> Q1{core_short_long_ratio > 1.3?}
    Q1 -->|Yes| HIGH[High tumor burden]
    Q1 -->|No| Q2{core_short_long_ratio > 1.1?}
    Q2 -->|Yes| MOD[Moderate tumor signal]
    Q2 -->|No| LOW[Low/no detectable tumor]

Use mouse to pan and zoom

Note

Thresholds depend on your cohort and should be validated against known samples.

Fragment Size Ratio (FSR)

Purpose

Processing Flowchart

Python/Rust Architecture

Biological Context

Why Short Fragments = Tumor?

Usage

CLI Options

Size Bin Definitions

Formulas

Normalization Order (Critical)

Output Format

Panel Mode (--target-regions)

Output Files

Clinical Interpretation

Decision Flowchart

See Also