MSK-ACCESS Quickstart

This guide walks through analyzing MSK-ACCESS panel data with Krewlyzer.

Prerequisites

Krewlyzer installed (pip install krewlyzer or Docker)
MSK-ACCESS BAM file (sorted, indexed)
Reference genome (hg19.fa)

Basic Panel Analysis

krewlyzer run-all -i ACCESS_sample.bam \
    --reference hg19.fa \
    --output results/ \
    --assay xs2 \
    --target-regions MSK-ACCESS-v2_targets.bed

What This Does

Flag	Effect
`--assay xs2`	Enables MSK-ACCESS v2 optimizations
`--target-regions`	Splits outputs into on/off-target

Output Files

Standard Outputs

File	Description
`sample.bed.gz`	Extracted fragments
`sample.FSC.tsv`	Fragment size coverage
`sample.FSC.ontarget.tsv`	FSC for on-target regions
`sample.FSR.tsv`	Fragment size ratios
`sample.FSD.tsv`	Size distribution by arm
`sample.WPS.parquet`	Nucleosome protection (genome-wide)
`sample.OCF.tsv`	Tissue of origin footprint
`sample.EndMotif.tsv`	End motif frequencies
`sample.MDS.tsv`	Motif Diversity Score

Panel-Specific Outputs (with `--assay`)

File	Description
`sample.FSC.gene.tsv`	Gene-level FSC (146 genes for xs2)
`sample.WPS.panel.parquet`	Panel-focused WPS (~2k anchors)

With PON Normalization

For z-score computation against healthy baselines:

krewlyzer run-all -i ACCESS_sample.bam \
    --reference hg19.fa \
    --output results/ \
    --assay xs2 \
    --target-regions MSK-ACCESS-v2_targets.bed \
    --pon-model xs2.pon.parquet

This adds z-score columns to all outputs (e.g., z_core_short, wps_nuc_z, mds_z).

JSON Output for ML

Generate a unified JSON file for machine learning pipelines:

krewlyzer run-all -i ACCESS_sample.bam \
    --reference hg19.fa \
    --output results/ \
    --assay xs2 \
    --target-regions MSK-ACCESS-v2_targets.bed \
    --pon-model xs2.pon.parquet \
    --generate-json

This creates sample.features.json with all features. See JSON Output for schema details.

Assay Options

Assay Code	Panel	Genes	WPS Anchors
`xs1`	MSK-ACCESS v1	128	1,611
`xs2`	MSK-ACCESS v2	146	1,820

Standalone Tool Examples

FSC with Gene Aggregation

krewlyzer fsc -i sample.bed.gz -o results/ \
    --assay xs2 \
    --target-regions targets.bed

Outputs: sample.FSC.tsv + sample.FSC.gene.tsv

WPS with Dual Output

krewlyzer wps -i sample.bed.gz -o results/ \
    --assay xs2 \
    --target-regions targets.bed

Outputs: sample.WPS.parquet (genome-wide) + sample.WPS.panel.parquet (panel genes)

Building Your Own PON

Create a PON from your healthy plasma samples:

# Create sample list file
ls /path/to/healthy/*-duplex.bam > healthy_samples.txt

# Build PON (HPC with custom temp directory)
krewlyzer build-pon healthy_samples.txt \
    --assay xs2 \
    --reference hg19.fa \
    --target-regions MSK-ACCESS-v2_targets.bed \
    --temp-dir /scratch/$USER/pon_tmp \
    --output xs2.pon.parquet \
    --threads 16 \
    --verbose

Tip: Use --temp-dir to specify a directory with more disk space than /tmp. Each BAM extraction creates temporary BED.gz files (~100MB each).

Nextflow Batch Processing

For multiple samples, use the Nextflow pipeline:

nextflow run nextflow/main.nf \
    --samplesheet samples.csv \
    --ref /path/to/hg19.fa \
    --outdir results/

Samplesheet Example

sample,bam,meth_bam,vcf,bed,maf,single_sample_maf,assay,pon,targets
ACCESS_001,/data/sample1.bam,,,,,false,XS2,,
ACCESS_002,/data/sample2.bam,,,,,false,XS2,,
ACCESS_003,/data/sample3.bam,,,,,false,XS2,,

When assay=XS2 is set, the pipeline automatically resolves: - PON: {asset_dir}/pon/GRCh37/xs2.pon.parquet - Targets: {asset_dir}/targets/GRCh37/xs2.targets.bed

Key Differences from WGS

Aspect	WGS	MSK-ACCESS
Coverage	Uniform	Targeted
GC Model	All fragments	Off-target only
FSC Output	Windows	+ Gene-level
WPS Output	Genome-wide	+ Panel-specific
Fragments used	All	Split on/off-target

Troubleshooting

"0% of reads pass filters"

Duplex/consensus BAMs need:

krewlyzer run-all ... --no-require-proper-pair

Missing Gene BED

Ensure bundled data is present:

ls $(python -c "from krewlyzer.assets import AssetManager; a=AssetManager('hg19'); print(a.base_path)")/genes/GRCh37/
# Should show: xs1.genes.bed.gz, xs2.genes.bed.gz

Next Steps

Panel Mode Details - How on/off-target splitting works
PON Building - Create your own PON
JSON Schema - ML integration
Troubleshooting - Common issues