Orientation-aware Fragmentation (OCF)

Command: krewlyzer ocf

Plain English

OCF detects where cfDNA fragments came from by looking at their "orientation" near regulatory regions. Different tissues cut DNA in different directions—OCF captures this signal for tissue-of-origin detection.

Use case: Identify liver cancer vs. colon cancer based on cfDNA fragmentation patterns.

Purpose

Computes orientation-aware cfDNA fragmentation (OCF) values in tissue-specific open chromatin regions. Enables tissue-of-origin analysis from cfDNA.

Processing Flowchart

flowchart LR
    BED["sample.bed.gz"] --> RUST["Rust Pipeline"]
    OCR["Open Chromatin Regions"] --> RUST
    GC["GC Correction"] --> RUST

    RUST --> OCF["OCF.tsv"]
    RUST --> SYNC["OCF.sync.tsv"]

    subgraph "With --pon-model"
        OCF --> PON["z-score normalization"]
    end

    subgraph "With --target-regions"
        RUST --> OCF_ON["OCF.ontarget.tsv"]
    end

Use mouse to pan and zoom

Warning

OCF regions are only available for GRCh37/hg19. For hg38, you must provide a custom OCR file with -r/--ocr-input.

Python/Rust Architecture

flowchart TB
    subgraph "Python (CLI)"
        CLI["ocf.py"] --> UP["unified_processor.py"]
        UP --> ASSETS["AssetManager"]
    end

    subgraph "Rust Backend"
        UP --> RUST["_core.run_unified_pipeline()"]
        RUST --> GC["GC correction"]
        GC --> OCF_CALC["OCF strand counting"]
    end

    subgraph "Python (Post-processing)"
        OCF_CALC --> PROC["Output cleanup"]
        PROC --> PON["PON z-scores"]
        PON --> OUT["OCF.tsv"]
    end

Use mouse to pan and zoom

Biological Context

OCF (Sun et al., 2019) measures the phasing of upstream (U) and downstream (D) fragment ends in open chromatin regions, informing tissue-of-origin of cfDNA.

Usage

# Basic usage
krewlyzer ocf -i sample.bed.gz -o output_dir/ --genome hg19

# With PON for z-scores
krewlyzer ocf -i sample.bed.gz -o output/ -P tissue.pon.parquet

# Panel data with on/off-target split
krewlyzer ocf -i sample.bed.gz -o output/ \
    --target-regions MSK-ACCESS_targets.bed

CLI Options

Option	Short	Type	Default	Description
`--input`	`-i`	PATH	required	Input .bed.gz file
`--output`	`-o`	PATH	required	Output directory
`--sample-name`	`-s`	TEXT		Override sample name
`--ocr-input`	`-r`	PATH		Open chromatin regions file
`--target-regions`	`-T`	PATH		Target BED (for on/off-target split)
`--skip-target-regions`		FLAG		Force WGS mode (ignore bundled targets)
`--assay`	`-A`	TEXT		Assay code (xs1/xs2) for bundled assets
`--genome`	`-G`	TEXT	hg19	Genome build (hg19/hg38)
`--pon-model`	`-P`	PATH		PON model for z-score computation
`--pon-variant`		TEXT	all_unique	PON variant: `all_unique` or `duplex`
`--skip-pon`		FLAG		Skip PON z-score normalization
`--gc-correct`		FLAG	True	Apply GC bias correction
`--verbose`	`-v`	FLAG		Enable verbose logging
`--threads`	`-t`	INT	0	Number of threads (0=all)

Output Files

File	Description
`{sample}.OCF.tsv`	Summary OCF per tissue type
`{sample}.OCF.sync.tsv`	Detailed sync scores

Formulas

OCF Score Calculation

\[ \text{OCF} = \sum \left( \text{Right}_{-60} + \text{Left}_{+60} \right) - \sum \left( \text{Left}_{-60} + \text{Right}_{+60} \right) \]

Where: - \(\text{Right}_{-60}\) = Right fragment ends at -60bp from OCR center (phased) - \(\text{Left}_{+60}\) = Left fragment ends at +60bp from OCR center (phased) - \(\text{Left}_{-60}\), \(\text{Right}_{+60}\) = Background (unphased)

Calculation Details: 1. Fragments are mapped relative to the center of the Open Chromatin Region (OCR) 2. Left/Right ends counted in 10bp bins across ±1000bp window 3. Counts normalized by total sequencing depth

PON Normalization

When --pon-model is provided, OCF output includes z-score columns:

Output Columns with PON

Column	Formula	Description
`OCF`	Raw OCF score	Phased fragment orientation
`ocf_z`	`(OCF - PON_mean) / PON_std`	Z-score vs healthy baseline

Z-Score Interpretation

ocf_z	Meaning
-2 to +2	Normal tissue contribution
> +2	Elevated tissue signal (possible tumor origin)
< -2	Decreased tissue contribution

Panel Mode

For targeted sequencing panels (MSK-ACCESS):

krewlyzer ocf -i sample.bed.gz -o output/ \
    --target-regions MSK-ACCESS_targets.bed

How Panel OCF Works

In panel mode, OCF produces two complementary outputs using a sophisticated two-pass approach:

flowchart TB
    subgraph "Pass 1: Genome-Wide"
        OCR1["All 50K OCR regions"] --> RUN1["OCF analysis"]
        FRAGS1["All fragments"] --> RUN1
        RUN1 --> OCF["OCF.tsv"]
        RUN1 --> SYNC["OCF.sync.tsv"]
    end

    subgraph "Pass 2: Panel-Focused"
        OCR2["Panel OCRs (~500)"] --> RUN2["OCF analysis"]
        TARGET["Target regions"] --> FILTER["Filter OCRs"]
        OCR1 --> FILTER
        FILTER --> OCR2
        FRAGS2["On-target frags"] --> RUN2
        RUN2 --> OCFON["OCF.ontarget.tsv"]
    end

Use mouse to pan and zoom

Panel OCF Regions

Before the ontarget OCF run, the genome-wide OCR atlas (~50,000 regions) is filtered to keep only regions that overlap with panel targets (+2kb promoter extension). For a typical panel like MSK-ACCESS:

	Genome-Wide	Panel-Filtered
OCR regions	~50,000	~500
Noise reduction	-	~99%

Output Files

File	Fragment Source	OCR Regions	Use Case
`{sample}.OCF.tsv`	All fragments	All ~50K	Unbiased genome-wide tissue signal
`{sample}.OCF.ontarget.tsv`	On-target only	Panel ~500	Panel-focused tissue signal
`{sample}.OCF.sync.tsv`	All fragments	All ~50K	Debugging/visualization
`{sample}.OCF.ontarget.sync.tsv`	On-target only	Panel ~500	Panel OCF detail
`{sample}.OCF.offtarget.tsv`	Off-target only	All ~50K	Off-target baseline
`{sample}.OCF.offtarget.sync.tsv`	Off-target only	All ~50K	Off-target detail

Note

The ontarget naming is consistent with other features (FSD.ontarget, FSC.ontarget). For OCF, ontarget means both on-target fragments AND panel-filtered OCR regions.

Why Both Filters?

flowchart LR
    subgraph "On-Target Fragments"
        CAP["Captured near panel genes"]
    end

    subgraph "Panel OCR Regions"
        POCR["OCRs near panel genes"]
    end

    CAP --> BOTH["Same genomic space"]
    POCR --> BOTH
    BOTH --> SIGNAL["Maximum signal-to-noise"]

Use mouse to pan and zoom

On-target fragments and panel OCR regions both focus on the same genomic space (near panel target genes), so combining both filters maximizes the signal-to-noise ratio for tissue-of-origin detection.

Example: MSK-ACCESS Panel

Tissue	OCF.tsv (Genome-Wide)	OCF.ontarget.tsv (Panel)
Liver	265.3	52.8
Intestine	224.4	-20.9
Lung	173.0	-9.6
Breast	108.5	19.0
Ovary	123.1	88.9
Placenta	-25.1	-51.6
T-cell	8.9	86.7

Tip

Genome-wide OCF provides the unbiased baseline for tissue-of-origin analysis. Panel OCF provides a focused view specific to your assay's target regions.

Clinical Interpretation

Healthy Plasma Baseline

Tissue	OCF Value
T-cells (hematopoietic)	Highest
Liver	Second highest
Other tissues	Near zero

Cancer-Specific Patterns

Cancer Type	Expected OCF Change
Hepatocellular carcinoma	↑ Liver OCF
Colorectal cancer	↑ Intestine OCF, ↓ T-cell OCF
Lung cancer	↑ Lung OCF, ↓ T-cell OCF

Interpretation Guide

Pattern	Interpretation
↑ Tissue-specific OCF	Tumor shedding from that tissue
↓ T-cell OCF	Dilution by tumor DNA
OCF correlates with tumor fraction	Higher ctDNA → stronger signal

Orientation-aware Fragmentation (OCF)

Purpose

Processing Flowchart

Python/Rust Architecture

Biological Context

Usage

CLI Options

Output Files

Formulas

OCF Score Calculation

PON Normalization

Output Columns with PON

Z-Score Interpretation

Panel Mode

How Panel OCF Works

Panel OCF Regions

Output Files

Why Both Filters?

Example: MSK-ACCESS Panel

Clinical Interpretation

Healthy Plasma Baseline

Cancer-Specific Patterns

Interpretation Guide

See Also