Panel Mode

Panel mode enables accurate cfDNA analysis for capture-based sequencing panels like MSK-ACCESS.

Overview

When using targeted capture panels, two key issues affect cfDNA analysis:

GC Bias: Capture probes introduce additional GC bias on top of sequencing bias
Coverage Splitting: On-target fragments behave differently than off-target fragments

Panel mode addresses both by:

- Training the GC model on off-target fragments only (unbiased by capture)
- Computing dual baselines for on-target and off-target regions separately

Enabling Panel Mode

Building GC Reference Assets (One-time)

For panel mode, generate panel-specific GC reference assets:

krewlyzer build-gc-reference hg19.fa -o data/gc/ \
    --target-regions msk_access_baits.bed

This generates both standard and on-target GC reference files.

At PON Build Time

krewlyzer build-pon samples.txt \
    --assay xs2 \
    --reference hg19.fa \
    --target-regions msk_access_baits.bed \
    --output msk-access.pon.parquet

At Sample Processing Time

The --assay flag enables panel-specific optimizations:

# MSK-ACCESS v2 with all panel features
krewlyzer run-all -i sample.bam -r hg19.fa -o out/ \
    --assay xs2 \
    --target-regions msk_access_baits.bed \
    --pon-model msk-access.pon.parquet

What `--assay` Enables

Feature	Without --assay	With --assay
Gene FSC	Window-based only	+ Gene-level aggregation (`FSC.gene.tsv`)
WPS Anchors	Genome-wide (~15k)	Panel-specific (~2k + genome-wide)
WPS Output	Single `WPS.parquet`	Dual: `WPS.parquet` + `WPS.panel.parquet`
JSON Output	Standard features	+ `fsc_gene`, `wps_panel`

Dual WPS Output

With --assay, Krewlyzer generates two WPS files:

File	Anchors	Use Case
`{sample}.WPS.parquet`	Genome-wide TSS+CTCF	Cancer detection signature
`{sample}.WPS.panel.parquet`	Panel gene anchors	Targeted gene profiling

This dual output provides both broad cancer signals and focused gene-level analysis.

Minimal Panel Mode (No PON)

For quick analysis without a custom PON:

krewlyzer run-all -i sample.bam -r hg19.fa -o out/ \
    --assay xs2 \
    --target-regions targets.bed

Auto-Loading Assets with `--assay`

When you specify --assay, krewlyzer automatically loads bundled assets:

# Auto-loads PON and target regions for xs2
krewlyzer run-all -i sample.bam -r hg19.fa -o out/ --assay xs2

Assay	PON Model	Target Regions
`xs1`	`xs1.all_unique.pon.parquet`	`xs1.targets.bed.gz`
`xs2`	`xs2.all_unique.pon.parquet`	`xs2.targets.bed.gz`
`wgs`	None	None (WGS mode)

This auto-loading applies to all tools, including extract, run-all, fsc, fsd, fsr, wps, ocf, region-entropy, motif, region-mds, and build-pon.

Forcing WGS Mode with `--skip-target-regions`

To force WGS-like behavior even when using a panel assay (e.g., for comparison):

# Use xs2 PON but disable panel mode (process as WGS)
krewlyzer run-all -i sample.bam -r hg19.fa -o out/ \
    --assay xs2 \
    --skip-target-regions

This is useful when:

- Comparing panel samples to WGS baselines
- Running validation without on/off-target splitting
- Processing samples where target regions don't apply

Note

--skip-target-regions only disables target region loading. The PON model is still loaded from --assay unless you also add --skip-pon.

Flag Priority

Asset resolution follows this priority order:

1. Explicit path (--target-regions path/to/file.bed) - highest priority
2. Skip flag (--skip-target-regions) - forces WGS mode
3. Bundled asset (auto-loaded from --assay)
4. None - WGS mode (no targets)
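
The priority order above can be sketched as a small function. This is an illustrative sketch only, not krewlyzer's actual implementation: `resolve_target_regions` and the `BUNDLED_TARGETS` mapping are hypothetical names, and the mapping simply mirrors the bundled-asset table shown earlier.

```python
from typing import Optional

# Hypothetical lookup mirroring the bundled-asset table (not krewlyzer's real API).
BUNDLED_TARGETS = {
    "xs1": "xs1.targets.bed.gz",
    "xs2": "xs2.targets.bed.gz",
    "wgs": None,
}


def resolve_target_regions(
    explicit_path: Optional[str] = None,
    skip_target_regions: bool = False,
    assay: Optional[str] = None,
) -> Optional[str]:
    """Return the target-regions file to use, or None for WGS mode."""
    if explicit_path is not None:   # 1. explicit --target-regions wins
        return explicit_path
    if skip_target_regions:         # 2. --skip-target-regions forces WGS mode
        return None
    if assay is not None:           # 3. bundled asset for the assay, if any
        return BUNDLED_TARGETS.get(assay)
    return None                     # 4. nothing specified: WGS mode
```

Under these assumptions, passing both an explicit path and the skip flag still returns the explicit path, because the explicit path is checked first.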

How It Works

GC Correction

WGS Mode:     All fragments → GC model → Single correction curve
Panel Mode:   Off-target fragments → GC model → Unbiased correction curve
              On-target fragments → GC model → Capture-aware correction curve

The GC model is built from off-target fragments because:

- On-target fragments have probe-specific GC bias
- Off-target fragments represent natural cfDNA (similar to WGS)
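
To make the on-target/off-target split concrete, here is a minimal sketch of partitioning fragments by overlap with the target regions. It assumes that a fragment counts as on-target when it overlaps any target interval by at least one base; the overlap rule krewlyzer actually applies may differ. The function names are hypothetical, and the linear scan is written for clarity rather than speed.

```python
from collections import defaultdict


def build_index(targets):
    """Group (chrom, start, end) target intervals by chromosome."""
    index = defaultdict(list)
    for chrom, start, end in targets:
        index[chrom].append((start, end))
    return index


def is_on_target(index, chrom, start, end):
    """True if the half-open fragment [start, end) overlaps any target."""
    return any(t_start < end and start < t_end
               for t_start, t_end in index.get(chrom, []))


def split_fragments(fragments, targets):
    """Partition fragments into (on_target, off_target) lists."""
    index = build_index(targets)
    on_target, off_target = [], []
    for chrom, start, end in fragments:
        bucket = on_target if is_on_target(index, chrom, start, end) else off_target
        bucket.append((chrom, start, end))
    return on_target, off_target
```

In this sketch only the `off_target` list would feed the GC model, while `on_target` fragments would get their own capture-aware correction.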

Dual Correction Factor Files

In panel mode, krewlyzer extract generates TWO correction factor files:

File	Source	Used For
`{sample}.correction_factors.csv`	Off-target fragments	Primary biomarker analysis
`{sample}.correction_factors.ontarget.csv`	On-target fragments	Copy number, variant calling

Feature Splitting

In panel mode, each feature outputs two files:

Feature	Primary File	On-Target File
FSC	`.FSC.tsv`	`.FSC.ontarget.tsv`
FSR	`.FSR.tsv`	`.FSR.ontarget.tsv`
FSD	`.FSD.tsv`	`.FSD.ontarget.tsv`
OCF	`.OCF.tsv`	`.OCF.ontarget.tsv`
TFBS	`.TFBS.tsv` (genome-wide)	`.TFBS.ontarget.tsv` (panel regions)
ATAC	`.ATAC.tsv` (genome-wide)	`.ATAC.ontarget.tsv` (panel regions)

Note

On-target outputs use on-target GC correction factors (.correction_factors.ontarget.csv) when available, providing better accuracy for capture-biased data.

Note on OCF ontarget: OCF.ontarget uses both on-target fragments and panel-filtered open chromatin regions (OCRs). This dual-filter approach maximizes signal-to-noise for panel-specific tissue-of-origin detection. See OCF Feature for details.

Note on TFBS/ATAC:

- Primary files (.TFBS.tsv, .ATAC.tsv) use all fragments across all ~808 TFs / 23 cancer types → WGS-comparable baseline
- On-target files use pre-intersected panel regions → panel-specific signal enrichment

See Region Entropy for details.

Primary files are used for:

- Fragment-based biomarkers
- GC-corrected coverage analysis
- Comparison with WGS baselines

On-target files are used for:

- Copy number analysis
- Integration with variant calling
- Panel-specific tissue signals (OCF)
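
As a quick sanity check after a panel-mode run, the naming pattern in the feature table can be used to confirm that each primary file has its on-target counterpart. The sketch below assumes every file is prefixed with `{sample}.` (as shown for `{sample}.FSC.tsv`) and sits directly in the output directory; `expected_outputs` and `missing_ontarget` are hypothetical helpers, not part of krewlyzer.

```python
from pathlib import Path

# Features from the table above that have an on-target counterpart.
FEATURES = ["FSC", "FSR", "FSD", "OCF", "TFBS", "ATAC"]


def expected_outputs(out_dir, sample):
    """Map each feature to its (primary, on-target) output paths."""
    out = Path(out_dir)
    return {
        feat: (out / f"{sample}.{feat}.tsv",
               out / f"{sample}.{feat}.ontarget.tsv")
        for feat in FEATURES
    }


def missing_ontarget(out_dir, sample):
    """List features whose primary file exists but whose on-target file does not."""
    return [
        feat
        for feat, (primary, ontarget) in expected_outputs(out_dir, sample).items()
        if primary.exists() and not ontarget.exists()
    ]
```

An empty list from `missing_ontarget("out/", "sample")` would suggest that feature splitting ran for every feature that produced a primary file.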

Target Regions File

The --target-regions BED file should contain the capture probe coordinates:

chr1    11166102    11166202    MTOR_exon1
chr1    27022522    27022622    ARID1A_exon1
...

Use the bait coordinates (not target intervals)
Standard BED format (0-based, half-open)
Optional 4th column for region names
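
A bait file can be checked against these conventions before it is passed to --target-regions. The following is a minimal sketch with a hypothetical `parse_bed_line` helper; it only checks the three required columns and the coordinate ordering, and it splits on any whitespace so that both tab- and space-separated lines are accepted.

```python
def parse_bed_line(line):
    """Parse one BED line into (chrom, start, end, name).

    Coordinates are 0-based, half-open, so the region length is end - start.
    The name is None when the optional 4th column is absent.
    """
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected at least 3 columns, got {len(fields)}")
    chrom = fields[0]
    start, end = int(fields[1]), int(fields[2])
    if start < 0 or end <= start:
        raise ValueError(f"invalid interval: start={start}, end={end}")
    name = fields[3] if len(fields) > 3 else None
    return chrom, start, end, name


chrom, start, end, name = parse_bed_line("chr1\t11166102\t11166202\tMTOR_exon1")
print(name, end - start)  # MTOR_exon1 100
```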

Gene-Centric FSC (MSK-ACCESS)

For MSK-ACCESS panels (v1 and v2), krewlyzer provides gene-level FSC aggregation:

# FSC with gene-level output for MSK-ACCESS v2
krewlyzer fsc -i sample.bed.gz -o out/ --assay xs2

Output Files

File	Description
`{sample}.FSC.tsv`	Standard window-based FSC
`{sample}.FSC.gene.tsv`	Gene-level FSC (146 genes for xs2)

Gene FSC Output Format

gene    n_regions  total_bp  ultra_short  core_short  mono_nucl  di_nucl  long  total  ultra_short_ratio  ...
ATM     62         8432      1234         5678        9012       3456     789   20169  0.0612             ...
BRCA2   42         5689      ...
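
The ATM row above is consistent with each ratio column being the bin count divided by `total` (1234 / 20169 ≈ 0.0612), and with `total` being the sum of the five size bins. The sketch below reproduces that arithmetic. It assumes the file is tab-separated and that the other ratio columns follow the same `<bin>_ratio` naming; only `ultra_short_ratio` is shown in the example, so the remaining names are a guess.

```python
import csv
import io

# The ATM row from the example, written as tab-separated text (assumed delimiter).
EXAMPLE = (
    "gene\tn_regions\ttotal_bp\tultra_short\tcore_short\tmono_nucl\tdi_nucl\tlong\ttotal\n"
    "ATM\t62\t8432\t1234\t5678\t9012\t3456\t789\t20169\n"
)

BINS = ["ultra_short", "core_short", "mono_nucl", "di_nucl", "long"]


def bin_ratios(row):
    """Return count / total for each size bin (assumed definition of the ratios)."""
    total = int(row["total"])
    if total == 0:
        return {f"{b}_ratio": 0.0 for b in BINS}
    return {f"{b}_ratio": int(row[b]) / total for b in BINS}


row = next(csv.DictReader(io.StringIO(EXAMPLE), delimiter="\t"))
ratios = bin_ratios(row)
print(round(ratios["ultra_short_ratio"], 4))  # 0.0612
```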

Supported Assays

Assay	Flag	Genes
MSK-ACCESS v1	`--assay xs1`	128
MSK-ACCESS v2	`--assay xs2`	146

The gene groupings are bundled with krewlyzer in data/genes/GRCh37/.

Panel WPS Anchors (MSK-ACCESS)

For MSK-ACCESS panels, WPS analysis uses panel-specific anchors filtered to genes in the panel:

# WPS with panel-specific anchors for MSK-ACCESS v2
krewlyzer wps -i sample.bed.gz -o out/ \
    --wps-anchors $(python -c "from krewlyzer.core.wps_anchor_filter import get_bundled_wps_anchors; print(get_bundled_wps_anchors('xs2', 'GRCh37'))")

Bundled Panel Anchors

Assay	File	Anchors
MSK-ACCESS v1	`xs1.wps_anchors.bed.gz`	1,611
MSK-ACCESS v2	`xs2.wps_anchors.bed.gz`	1,820

Anchor Types

TSS anchors: Transcription start sites for panel genes
CTCF anchors: CTCF binding sites within 100kb of panel TSS sites
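
The CTCF selection rule can be illustrated with a small filter. This sketch assumes each site is reduced to a single position and that "within 100kb" means an absolute distance of at most 100,000 bp on the same chromosome; how krewlyzer measures the distance when building the bundled anchor files is not stated here, and `ctcf_near_tss` is a hypothetical name.

```python
WINDOW_BP = 100_000  # assumed meaning of "within 100kb"


def ctcf_near_tss(ctcf_sites, tss_sites, window=WINDOW_BP):
    """Keep (chrom, pos) CTCF sites within `window` bp of any panel TSS."""
    tss_by_chrom = {}
    for chrom, pos in tss_sites:
        tss_by_chrom.setdefault(chrom, []).append(pos)
    return [
        (chrom, pos)
        for chrom, pos in ctcf_sites
        if any(abs(pos - tss) <= window for tss in tss_by_chrom.get(chrom, []))
    ]
```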

Tip

Using panel-specific anchors reduces noise from irrelevant genome-wide signals and focuses WPS analysis on oncologically relevant regions.

PON Compatibility

The PON model stores whether it was built in panel mode:

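from krewlyzer.pon.model import PonModel

# Load a previously built PON and inspect the panel settings stored in it
pon = PonModel.load("msk-access.pon.parquet")
print(f"Panel mode: {pon.panel_mode}")            # whether the PON was built in panel mode
print(f"Target file: {pon.target_regions_file}")  # target regions recorded with the PON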

For best results, use a PON built with the same --target-regions as sample processing.