Input File Formats
This page documents the expected formats for custom input files used as overrides. Krewlyzer validates these formats when you provide custom files.
Quick Reference
| File Type | Columns | Used By | Example |
|---|---|---|---|
| Sample List | paths | build-pon | /path/to/sample.bam |
| BED3 | chrom, start, end | --bin-input, --target-regions | chr1\t0\t100000 |
| Gene BED | chrom, start, end, gene, [name] | --gene-bed | chr1\t100\t5000\tTP53\texon1 |
| Arms BED | chrom, start, end, arm | --arms-file | chr1\t0\t125000000\t1p |
| WPS Anchors | BED6 format | --wps-anchors, --wps-background | chr1\t1000\t2000\tGene_TSS\t0\t+ |
| Region BED | chrom, start, end, label | --ocr-file, --tfbs-regions, --atac-regions | chr1\t500\t800\tLiver |
| GC Factors TSV | length_bin, gc_pct, factor | --gc-factors | 10\t45\t1.05 |
Sample List
Plain text file with one sample path per line for PON building.
Format
| Input Type | Description |
|---|---|
| .bam / .cram | Full processing including MDS baseline |
| .bed.gz | Pre-extracted fragments (faster, no MDS) |
Notes
- One path per line
- No header row
- Paths can be absolute or relative to working directory
- Mixing BAM and BED.gz inputs is allowed
Used By
- build-pon SAMPLE_LIST - First positional argument
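The rules above (one path per line, no header, mixed input types allowed) can be sketched as a small pre-flight check. This is not Krewlyzer's own loader: the function name `classify_sample_paths` is hypothetical, and skipping blank lines is an assumption rather than documented behavior.

```python
def classify_sample_paths(lines):
    """Split sample-list lines into BAM/CRAM inputs and pre-extracted BED.gz inputs."""
    alignments, fragments = [], []
    for raw in lines:
        path = raw.strip()
        if not path:
            continue  # assumption: blank lines are ignored
        if path.endswith((".bam", ".cram")):
            alignments.append(path)  # full processing, including MDS baseline
        elif path.endswith(".bed.gz"):
            fragments.append(path)  # pre-extracted fragments, no MDS
        else:
            raise ValueError(f"Unsupported sample type: {path}")
    return alignments, fragments


# Mixed BAM/CRAM and BED.gz entries in one list, as the notes allow.
alignments, fragments = classify_sample_paths(
    ["/data/s1.bam\n", "s2.cram\n", "\n", "frags/s3.bed.gz\n"]
)
```

Running a check like this before `build-pon` makes it easy to see which samples would contribute to the MDS baseline and which would not.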
BED3
Standard 3-column BED format for genomic intervals.
Format
| Column | Type | Description |
|---|---|---|
| chrom | string | Chromosome (e.g., chr1, chrX) |
| start | int | 0-based start position |
| end | int | End position (exclusive; equal to the 1-based inclusive end) |
Example
chr1 0 100000
Used By
- --bin-input/-b - Custom bins for FSC/FSR
- --target-regions/-T - Panel capture regions
- --mark-input/-m - UXM methylation markers
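To make the coordinate convention concrete, here is a minimal sketch of a BED3 line check. The helper `parse_bed3_line` is hypothetical and is not part of Krewlyzer; it only applies the column rules from the table above.

```python
def parse_bed3_line(line, line_number=1):
    """Parse one BED3 line into (chrom, start, end) with basic sanity checks."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 3:
        raise ValueError(
            f"line {line_number}: expected 3 tab-separated columns, got {len(fields)}"
        )
    chrom = fields[0]
    # int() raises ValueError if a coordinate is not numeric.
    start, end = int(fields[1]), int(fields[2])
    if start < 0 or end <= start:
        raise ValueError(
            f"line {line_number}: need 0 <= start < end, got {start}, {end}"
        )
    return chrom, start, end


chrom, start, end = parse_bed3_line("chr1\t0\t100000")
# With a 0-based start and an exclusive end, the interval length is end - start.
length = end - start
```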
Gene BED
Extended BED format for gene annotations with 4-5 columns.
Format
| Column | Type | Required | Description |
|---|---|---|---|
| chrom | string | ✅ | Chromosome |
| start | int | ✅ | 0-based start |
| end | int | ✅ | 1-based end |
| gene | string | ✅ | Gene symbol (e.g., TP53) |
| name | string | Optional | Exon/region name |
Example
chr17 7676594 7676707 TP53 exon1
chr17 7676707 7676863 TP53 exon2
chr7 140719327 140724764 BRAF exon15
Used By
- --gene-bed - Custom gene files for panel FSC
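The optional fifth column is the main thing a reader of this format has to handle. The following sketch, with a hypothetical `parse_gene_bed` helper, accepts 4 or 5 tab-separated columns and stores `None` when the name is absent; how Krewlyzer itself represents a missing name is not stated here.

```python
def parse_gene_bed(lines):
    """Return (chrom, start, end, gene, name) tuples; name is None when omitted."""
    records = []
    for number, line in enumerate(lines, start=1):
        fields = line.rstrip("\n").split("\t")
        if len(fields) not in (4, 5):
            raise ValueError(f"line {number}: expected 4 or 5 columns, got {len(fields)}")
        chrom, start, end, gene = fields[0], int(fields[1]), int(fields[2]), fields[3]
        name = fields[4] if len(fields) == 5 else None
        records.append((chrom, start, end, gene, name))
    return records


records = parse_gene_bed([
    "chr17\t7676594\t7676707\tTP53\texon1\n",
    "chr7\t140719327\t140724764\tBRAF\n",  # 4-column row: name omitted
])
```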
Arms BED
Chromosome arm annotations for FSD analysis.
Format
| Column | Type | Description |
|---|---|---|
| chrom | string | Chromosome (e.g., chr1) |
| start | int | 0-based start position |
| end | int | 1-based end position |
| arm | string | Arm identifier (must match pattern: Np or Nq) |
Arm Pattern
The arm column must match the regex pattern: ^\d{1,2}[pq]$
Valid examples: 1p, 1q, 22p, 22q
Invalid: Arm1, chr1p, p
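The pattern can be checked directly against the valid and invalid examples listed above. This is a sketch using Python's `re` module; the helper name `is_valid_arm` is hypothetical.

```python
import re

# The arm pattern exactly as documented above.
ARM_PATTERN = re.compile(r"^\d{1,2}[pq]$")


def is_valid_arm(label):
    """Return True if the label is one or two digits followed by 'p' or 'q'."""
    return ARM_PATTERN.match(label) is not None


valid = [arm for arm in ["1p", "1q", "22p", "22q"] if is_valid_arm(arm)]
invalid = [arm for arm in ["Arm1", "chr1p", "p"] if not is_valid_arm(arm)]
```

Because the pattern requires leading digits, a label such as Xp would not match it as written.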
Example
chr1 0 125000000 1p
Used By
- --arms-file/-a - Custom chromosome arms for FSD
WPS Anchors
BED6 format for WPS anchor regions (TSS, CTCF sites, etc.).
Format
| Column | Type | Description |
|---|---|---|
| chrom | string | Chromosome |
| start | int | 0-based start |
| end | int | 1-based end |
| name | string | Anchor name (e.g., TP53_TSS) |
| score | int | Score (typically 0) |
| strand | string | Strand: +, -, or . |
Example
chr1 11873 14409 DDX11L1_TSS 0 +
chr1 29553 31109 MIR1302-2_TSS 0 +
chr17 7676594 7676707 TP53_TSS 0 -
Used By
- --wps-anchors - Custom WPS anchor regions
- --wps-background/-B - Background normalization regions
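As an illustration of the six columns, the sketch below parses one anchor line into a typed record and rejects strands other than +, -, or . as listed in the table. The `Anchor` type and `parse_anchor` function are hypothetical names, not Krewlyzer APIs.

```python
from typing import NamedTuple


class Anchor(NamedTuple):
    chrom: str
    start: int
    end: int
    name: str
    score: int
    strand: str


VALID_STRANDS = {"+", "-", "."}


def parse_anchor(line):
    """Parse one BED6 line, checking the column count and the strand value."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 tab-separated columns, got {len(fields)}")
    chrom, start, end, name, score, strand = fields
    if strand not in VALID_STRANDS:
        raise ValueError(f"invalid strand: {strand!r}")
    return Anchor(chrom, int(start), int(end), name, int(score), strand)


anchor = parse_anchor("chr17\t7676594\t7676707\tTP53_TSS\t0\t-")
```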
Region BED
Labeled genomic regions for OCF, TFBS, and ATAC analysis.
Format
| Column | Type | Description |
|---|---|---|
| chrom | string | Chromosome |
| start | int | 0-based start |
| end | int | 1-based end |
| label | string | Region label (tissue, TF name, cancer type) |
Example
chr1 500 800 Liver
Used By
- --ocr-file/-r - Open chromatin regions for OCF
- --tfbs-regions - Transcription factor binding sites
- --atac-regions - ATAC-seq peaks
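Since the fourth column acts as a grouping key, a quick way to inspect a Region BED is to collect intervals per label. The sketch below assumes tab-separated input; `regions_by_label` is a hypothetical helper, and all rows except the first are made-up illustration data.

```python
from collections import defaultdict


def regions_by_label(lines):
    """Group (chrom, start, end) intervals by the label in column 4."""
    grouped = defaultdict(list)
    for line in lines:
        # Unpacking raises ValueError if a line has fewer than 4 columns.
        chrom, start, end, label = line.rstrip("\n").split("\t")[:4]
        grouped[label].append((chrom, int(start), int(end)))
    return dict(grouped)


grouped = regions_by_label([
    "chr1\t500\t800\tLiver\n",
    "chr2\t100\t400\tLiver\n",  # hypothetical row
    "chr3\t10\t50\tLung\n",  # hypothetical row
])
```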
GC Factors TSV
Tab-separated correction factors for GC bias normalization.
Format
| Column | Type | Description |
|---|---|---|
| length_bin | int | Fragment length bin: (length - 60) // 5 |
| gc_pct | int | GC percentage (0-100) |
| factor | float | Correction factor (typically 0.5-2.0) |
Example
10 45 1.05
Notes
- Length bin 10 corresponds to fragments 110-114bp
- GC percentage is rounded to the nearest integer
- Factors near 1.0 indicate minimal bias
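The bin arithmetic in the notes above can be sketched as follows. The function names are hypothetical, and falling back to a factor of 1.0 for a missing (length_bin, gc_pct) pair is an assumption, not documented Krewlyzer behavior.

```python
def length_bin(fragment_length):
    """Map a fragment length to its bin: (length - 60) // 5."""
    return (fragment_length - 60) // 5


def bin_bounds(bin_index):
    """Return the inclusive fragment-length range covered by a bin."""
    low = 60 + 5 * bin_index
    return low, low + 4


def load_factors(lines):
    """Read length_bin, gc_pct, factor rows into a dict keyed by (length_bin, gc_pct)."""
    factors = {}
    for line in lines:
        lb, gc, factor = line.rstrip("\n").split("\t")
        factors[(int(lb), int(gc))] = float(factor)
    return factors


factors = load_factors(["10\t45\t1.05"])
# A 112 bp fragment with 45% GC falls in length bin 10.
factor = factors.get((length_bin(112), 45), 1.0)  # assumption: default to 1.0
```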
Used By
- --gc-factors/-F - Custom GC correction factors
Compression Support
All BED files can be gzip-compressed (.bed.gz). Krewlyzer automatically detects and handles compression.
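How Krewlyzer detects compression is not described here. One common approach, shown as a sketch below, is to check for the two-byte gzip magic number rather than trusting the file extension; `open_maybe_gzip` is a hypothetical helper.

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file, transparently decompressing it if it is gzip-compressed."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":  # gzip magic number
        return gzip.open(path, "rt")
    return open(path, "rt")
```

A caller can then iterate over lines the same way for `.bed` and `.bed.gz` inputs, for example `with open_maybe_gzip("my_genes.bed.gz") as fh: ...`.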
Validation
Validate your files before analysis:
# Validate specific files
krewlyzer validate --gene-bed my_genes.bed
krewlyzer validate --arms-bed my_arms.bed --wps-anchors my_anchors.bed
# Validate bundled assets
krewlyzer validate --genome hg19
If validation fails, you'll see:
- Expected format and columns
- Line number of the first error
- Example of correct format
See Troubleshooting > Asset Validation for common errors.