Testing Guide

Overview

Krewlyzer's test suite contains 245 pytest tests covering all features.

Category	Tests	Speed	Location
Unit	155	<1s	`tests/unit/`
Integration	52	5-30s	`tests/integration/`
CLI	10	2-5s	`tests/cli/`
E2E	3	30-60s	`tests/e2e/`
Asset Resolution	19	<1s	`tests/test_asset_resolution.py`

Note

Rust code is tested via Python. The test_rust_python_equivalence.py suite verifies Rust output matches Python implementations.

Feature → Test Map

When modifying a feature, update the corresponding test file:

Feature	Test File(s)	Tests
FSC	`unit/test_fsc.py`	7
FSD	`unit/test_fsd.py`, `integration/test_fsd_cli.py`	8
FSR	`integration/test_fsr_cli.py`	3
WPS	`unit/test_wps.py`, `integration/test_wps_cli.py`	19
Motif	`integration/test_motif.py`	3
OCF	`integration/test_ocf.py`	1
mFSD	`integration/test_mfsd.py`	9
UXM	`integration/test_uxm.py`	1
Region Entropy	`integration/test_region_entropy.py`	10
Extract	`integration/test_extract.py`	4
PON Model	`unit/test_pon_model.py`	51
PON Validation	`unit/test_pon_validation.py`	14
PON Build	`integration/test_pon.py`, `test_pon_dual_gc.py`	8
Gene BED	`unit/test_gene_bed.py`	25
Asset Manager	`unit/test_asset_manager.py`	10
External Data Dir	`unit/test_external_data_dir.py`	6
Asset Resolution	`test_asset_resolution.py`	19
BGZF Reader	`unit/test_bgzf_reader.py`	5
Normalization	`unit/test_normalization.py`	6
run-all flags	`unit/test_run_all_flags.py`	6
Rust/Python	`unit/test_rust_python_equivalence.py`	11
Real Data	`integration/test_real_data.py`	5

Running Tests

# All tests
pytest tests/

# By category
pytest tests/unit/           # Fast unit tests
pytest tests/integration/    # Tool integration
pytest tests/e2e/            # Full pipeline

# Specific feature
pytest tests/unit/test_fsc.py
pytest tests/unit/test_wps.py -v

# Stop on first failure
pytest -x

# With coverage
pytest tests/ --cov=krewlyzer --cov-report=html

Test Markers

pytest -m unit           # Unit tests only
pytest -m integration    # Integration tests
pytest -m "not slow"     # Skip slow tests
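Custom markers such as unit, integration, and slow generally need to be registered; otherwise pytest warns about unknown marks and fails under --strict-markers. How Krewlyzer registers them is not shown here. The following is a minimal sketch of one common approach, a pytest_configure hook in conftest.py; the marker descriptions are illustrative assumptions.

```python
# conftest.py (sketch): register the custom markers used by `pytest -m ...`.
# The descriptions below are assumptions, not Krewlyzer's actual wording.
MARKERS = {
    "unit": "fast, isolated unit tests",
    "integration": "tests that exercise a tool on small inputs",
    "slow": 'long-running tests (deselect with -m "not slow")',
}


def pytest_configure(config):
    """Hook called by pytest at startup; adds one 'markers' ini line per marker."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

With the markers registered, `pytest --markers` lists them alongside the built-in ones.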

Data Availability

Important

The entire src/krewlyzer/data/ folder is EXCLUDED from PyPI wheels to keep the wheel size under 100 MB. Tests that require bundled data files are skipped automatically when the data is missing, as in CI and PyPI installs.

Data by Install Method

Install Method	Data Files	Test Coverage
`pip install krewlyzer` (PyPI)	❌ None	~85% (skips asset tests)
`pip install -e .` (git clone)	✅ All	100%
Docker image	✅ All	100%

How It Works

Tests that verify bundled assets use the @requires_data decorator from conftest.py:

from conftest import requires_data

@requires_data
class TestAssetManager:
    def test_gene_bed_exists(self):
        # Skipped if data not available
        ...
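The definition of requires_data is not shown here. A decorator like this is commonly built on pytest.mark.skipif, so the following is only a sketch under that assumption. The data_available helper, the default path, and the skip reason are hypothetical; the real conftest.py may detect the data differently.

```python
import os
from pathlib import Path

import pytest


def data_available() -> bool:
    """Return True if a non-empty data directory can be found.

    Assumption: the data lives in KREWLYZER_DATA_DIR if set, otherwise in
    src/krewlyzer/data relative to the current working directory.
    """
    data_dir = Path(os.environ.get("KREWLYZER_DATA_DIR", "src/krewlyzer/data"))
    return data_dir.is_dir() and any(data_dir.iterdir())


# Applying this marker to a test function or class skips it when data is missing.
requires_data = pytest.mark.skipif(
    not data_available(),
    reason="bundled data files not available",
)
```

Because the condition is evaluated once at import time, the skip decision is fixed when the test session starts.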

Running Full Tests Locally

# Clone the repository (includes data/)
git clone https://github.com/msk-access/krewlyzer.git
cd krewlyzer

# Development install (uses source data directly)
pip install -e ".[test]"

# Run all tests - data-dependent tests will pass
pytest tests/ -v

External Data Directory

For PyPI installs, you can supply the data by pointing the KREWLYZER_DATA_DIR environment variable at a copy of the data directory:

export KREWLYZER_DATA_DIR=/path/to/krewlyzer/src/krewlyzer/data
pytest tests/ -v
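The lookup order behind this override is not spelled out here. As a minimal sketch, assuming the environment variable takes precedence over the bundled directory, resolution might look like the following. The function name resolve_data_dir and its fallback behaviour are hypothetical, not Krewlyzer's actual API.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

ENV_VAR = "KREWLYZER_DATA_DIR"


def resolve_data_dir(
    bundled: Path, env: Optional[Mapping[str, str]] = None
) -> Optional[Path]:
    """Pick the first existing directory: env override first, then bundled.

    Returns None when neither location exists, which is the case where
    data-dependent tests would be skipped.
    """
    env = os.environ if env is None else env
    candidates = []
    override = env.get(ENV_VAR)
    if override:
        candidates.append(Path(override).expanduser())
    candidates.append(bundled)
    for candidate in candidates:
        if candidate.is_dir():
            return candidate
    return None
```

Passing env explicitly keeps the function easy to test without mutating the real process environment.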

Fixtures

Key fixtures from tests/conftest.py:

Fixture	Description
`temp_bam`	Minimal BAM with proper pair
`temp_bedgz`	Minimal BED.gz (3 fragments)
`temp_reference`	FASTA reference (12kb chr1)
`temp_bins`	FSC bins file
`temp_arms`	FSD arms file
`temp_transcripts`	WPS transcripts file
`temp_ocr`	OCF regions file
`temp_vcf`	mFSD variants file
`full_test_data`	Bundle for run-all
`real_bam`	Production data (3144 reads)
`real_pon`	MSK-ACCESS v1 PON

Adding Tests

Naming Convention

File: test_<feature>.py
Function: test_<action>_<expected>()

Example Unit Test

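import pytest

from krewlyzer import fsc  # assumed import path; adjust to where fsc lives


@pytest.mark.unit
def test_fsc_counts_fragments_by_size(temp_bedgz, temp_bins):
    """FSC should categorize fragments into size bins."""
    result = fsc.count_fragments(temp_bedgz, temp_bins)

    # The result should expose the size-bin columns and contain at least one fragment.
    assert "core_short" in result.columns
    assert result["total"].sum() > 0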

Example Integration Test

import pytest


@pytest.mark.integration
def test_fsc_cli_produces_output(temp_bedgz, temp_bins, tmp_path):
    """FSC CLI should write TSV output."""
    from typer.testing import CliRunner
    from krewlyzer.cli import app

    # Invoke the CLI in-process; no subprocess is spawned.
    runner = CliRunner()
    result = runner.invoke(app, [
        "fsc", "-i", str(temp_bedgz),
        "-b", str(temp_bins), "-o", str(tmp_path)
    ])

    assert result.exit_code == 0
    assert (tmp_path / "test.FSC.tsv").exists()

Example Data-Dependent Test

from conftest import requires_data

@requires_data
def test_bundled_gene_bed_loads(manager):
    """Test bundled gene BED (only runs with source data)."""
    path = manager.get_gene_bed("xs1")
    assert path.exists()