
Testing Guide

This guide covers running the test suite, adding new tests, and validating counting accuracy for gbcms.

Running Tests

Quick Test

# Run all tests
pytest -v

# Run with coverage
pytest --cov=gbcms --cov-report=html

# Run specific test file
pytest tests/test_accuracy.py -v

Test Categories

| Category | Files | Purpose |
|---|---|---|
| Accuracy | test_accuracy.py | SNP, indel, complex variant counting, DP invariant |
| Shifted Indels | test_shifted_indels.py | Windowed indel detection (±5bp), 3-layer safeguards |
| Complex Masking | test_fuzzy_complex.py | Quality-aware masked comparison, ambiguity detection, MSI gap penalties |
| Fragment Consensus | test_fragment_consensus.py | Quality-weighted R1/R2 conflict resolution, DPF invariant |
| Normalization | test_normalization.py | Left-alignment, REF validation, homopolymer detection, dynamic window expansion |
| DP Neither | test_dp_neither.py | Gap 1D: DP includes third-allele/neither reads |
| Multi-Allelic | test_multi_allelic.py | Gap 1A: Sibling ALT exclusion, overlapping indel DP |
| CLI | test_cli_sample_id.py | Argument parsing, input validation, error paths, --lenient-bam, subcommands |
| Filters | test_filters.py | Read filtering logic |
| MAF | test_maf_*.py | MAF column preservation, reader |
| Pipeline | test_pipeline_v2.py | End-to-end workflow |
| Strand | test_strand_counts.py | Strand-specific counts |
| Alignment Backend | test_alignment_backend.py | SW vs PairHMM concordance, backend integration |

Rust-Level Tests

# Run Rust unit tests (normalize + counting inline tests)
cd rust && cargo test

# Run a specific Rust test
cargo test test_window_expansion_long_homopolymer

Rust tests live inside #[cfg(test)] modules in normalize.rs (20 tests) and cover:

| Area | Tests | Purpose |
|---|---|---|
| Left-alignment | 10+ | SNP passthrough, homopolymer shifts, offset handling |
| Repeat detection | 3 | find_tandem_repeat() edge cases |
| Adaptive padding | 3 | Context padding from repeat spans |
| Window expansion | 1 | Gap 1B: >100bp repeat normalization |

Test Structure

tests/
├── test_accuracy.py             # Variant type accuracy + DP invariant
├── test_alignment_backend.py    # SW vs PairHMM concordance
├── test_cli_sample_id.py        # CLI argument parsing
├── test_dp_neither.py           # Gap 1D: DP includes third-allele reads
├── test_filters.py              # Read filtering
├── test_fragment_consensus.py   # Fragment-level quality consensus + DPF invariant
├── test_fuzzy_complex.py        # Quality-aware masked complex matching + MSI penalties
├── test_maf_preservation.py     # MAF column preservation
├── test_maf_reader.py           # MAF input parsing
├── test_multi_allelic.py        # Gap 1A: Sibling ALT exclusion
├── test_normalization.py        # Left-alignment, REF validation, window expansion
├── test_pipeline_v2.py          # End-to-end pipeline
├── test_shifted_indels.py       # Windowed indel detection (±5bp)
└── test_strand_counts.py        # Strand-specific counts

Writing Tests

Basic Test Template

import pytest
from pathlib import Path

def test_my_feature(tmp_path):
    """Test description."""
    # Arrange
    input_file = tmp_path / "input.txt"
    input_file.write_text("test data")

    # Act (my_function is a placeholder for the code under test)
    result = my_function(input_file)

    # Assert
    assert result.success
    assert result.count == 42

Accuracy Test Template

def test_snp_accuracy():
    """Verify SNP counting against known BAM."""
    # Create variant
    variant = Variant("chr1", 100, "A", "T", "SNP")

    # Run counting (bam_path points at the known BAM; remaining arguments are elided)
    results = count_bam(bam_path, [variant], decomposed=[None], ...)

    # Validate allele counts
    assert results[0].rd == 50
    assert results[0].ad == 10
    # Gap 1D invariant: DP includes ALL reads (including 'neither')
    assert results[0].dp >= results[0].rd + results[0].ad

Multi-Allelic Test Template

def test_with_siblings():
    """Verify sibling ALT exclusion at multi-allelic sites."""
    v1 = Variant("chr1", 100, "A", "T", "SNP")
    v2 = Variant("chr1", 100, "A", "C", "SNP")

    results = count_bam(
        bam_path, [v1, v2], decomposed=[None, None],
        sibling_variants=[[v2], [v1]],  # Gap 1A: sibling info
        ...
    )

    # Gap 1D invariant holds for each sibling's result
    for r in results:
        assert r.dp >= r.rd + r.ad
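
The `sibling_variants` argument in the template above pairs each variant with the other ALTs at the same site. For larger fixtures it may be easier to derive that list than to write it by hand. The following is a minimal sketch, not the project's own helper: the `Variant` tuple is a hypothetical stand-in for the real type, `build_sibling_lists` is an illustrative name, and it assumes siblings are defined as variants sharing the same chromosome and position (overlapping indels at different positions would need a wider rule).

```python
from collections import defaultdict
from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical stand-in for the project's Variant type."""
    chrom: str
    pos: int
    ref: str
    alt: str
    vtype: str


def build_sibling_lists(variants):
    """For each variant, list the other variants at the same (chrom, pos)."""
    by_site = defaultdict(list)
    for v in variants:
        by_site[(v.chrom, v.pos)].append(v)
    # Identity comparison so a variant is never listed as its own sibling.
    return [
        [s for s in by_site[(v.chrom, v.pos)] if s is not v]
        for v in variants
    ]


v1 = Variant("chr1", 100, "A", "T", "SNP")
v2 = Variant("chr1", 100, "A", "C", "SNP")
v3 = Variant("chr1", 200, "G", "A", "SNP")
siblings = build_sibling_lists([v1, v2, v3])
```

Here `siblings` has the same shape as the hand-written `[[v2], [v1]]` in the template, with an empty list for the variant that has no sibling.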

Key Invariants to Assert

All counting tests should verify:

| Invariant | Description |
|---|---|
| dp >= rd + ad | DP includes 'neither' reads (Gap 1D) |
| dpf >= rdf + adf | DPF includes discarded ambiguous fragments |
| rd == rd_fwd + rd_rev | Strand consistency |
| ad == ad_fwd + ad_rev | Strand consistency |
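
Because every counting test is expected to check the same four relations, they can be bundled into one helper. This is a sketch under assumptions: `Counts` is a hypothetical stand-in that only mirrors the field names used in the invariants, and `check_count_invariants` is an illustrative name, not part of gbcms.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical stand-in exposing the count fields named in the invariants."""
    dp: int
    rd: int
    ad: int
    dpf: int
    rdf: int
    adf: int
    rd_fwd: int
    rd_rev: int
    ad_fwd: int
    ad_rev: int


def check_count_invariants(c):
    """Return the names of violated invariants (empty list when all hold)."""
    checks = {
        "dp >= rd + ad": c.dp >= c.rd + c.ad,
        "dpf >= rdf + adf": c.dpf >= c.rdf + c.adf,
        "rd == rd_fwd + rd_rev": c.rd == c.rd_fwd + c.rd_rev,
        "ad == ad_fwd + ad_rev": c.ad == c.ad_fwd + c.ad_rev,
    }
    return [name for name, ok in checks.items() if not ok]


good = Counts(dp=62, rd=50, ad=10, dpf=32, rdf=25, adf=5,
              rd_fwd=26, rd_rev=24, ad_fwd=6, ad_rev=4)
bad = Counts(dp=55, rd=50, ad=10, dpf=32, rdf=25, adf=5,
             rd_fwd=26, rd_rev=24, ad_fwd=6, ad_rev=3)
```

In a test, `assert check_count_invariants(result) == []` reports every broken relation by name instead of stopping at the first failed assertion.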

Manual Validation

Using samtools for Spot-Check

# Check counts at specific position
samtools mpileup -r chr1:100-100 -q 20 \
    -f ref.fa sample.bam 2>/dev/null | \
    awk '{print "DP="$4}'

Comparing with gbcms Output

# Run gbcms
gbcms run -v variants.maf -b sample.bam -f ref.fa -o output/

# Check output
awk -F'\t' 'NR==2 {print "REF="$41, "ALT="$42}' output/*.maf

Accuracy Validation

Variant Types Tested

| Type | Test | Status |
|---|---|---|
| SNP | test_snp_accuracy | |
| Insertion | test_insertion_accuracy | |
| Deletion | test_deletion_accuracy | |
| Complex | test_complex_accuracy | |
| MNP | test_mnp_accuracy | |
| Shifted Indels | test_shifted_indels.py (15 cases) | |
| Complex Masking | test_fuzzy_complex.py (15 cases) | |
| DP Neither | test_dp_neither.py (3 cases) | |
| Multi-Allelic | test_multi_allelic.py (4 cases) | |
| Fragment Consensus | test_fragment_consensus.py (3 cases) | |
| Window Expansion | test_normalization.py (9 cases) | |
| MSI Gap Penalties | test_fuzzy_complex.py::TestGap3A | |

Real-World Validation

# Compare gbcms vs samtools for a SNP
# Position: chr1:11168293 G>A (the contig is named "1" in this BAM and MAF)

# gbcms output
awk -F'\t' '$5=="1" && $6=="11168293" {print "REF="$41, "ALT="$42}' output.maf

# samtools output
samtools mpileup -r 1:11168293-11168293 -q 20 -f ref.fa sample.bam | \
    awk '{gsub(/\^.|\$/,"",$5); print "DP="$4, "Pileup="$5}'
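
The samtools command prints the raw pileup string, which still has to be tallied by hand before it can be compared with the REF and ALT counts from gbcms. The sketch below shows one way to do that tally. `count_pileup_bases` is an illustrative helper, not part of gbcms or samtools, and it ignores strand by folding lower-case calls into upper-case.

```python
from collections import Counter


def count_pileup_bases(ref_base, pileup):
    """Tally base calls in one mpileup base column (column 5)."""
    counts = Counter()
    i = 0
    while i < len(pileup):
        ch = pileup[i]
        if ch == "^":
            # Read start: '^' is followed by one mapping-quality character.
            i += 2
            continue
        if ch in "+-":
            # Indel: sign, decimal length, then that many bases.
            j = i + 1
            while j < len(pileup) and pileup[j].isdigit():
                j += 1
            if j == i + 1:
                i += 1  # malformed: no length after the sign, skip it
            else:
                i = j + int(pileup[i + 1:j])
            continue
        if ch in ".,":
            counts[ref_base.upper()] += 1  # match on forward/reverse strand
        elif ch.upper() in "ACGTN":
            counts[ch.upper()] += 1  # mismatch on forward/reverse strand
        # '$', '*', '#', '<' and '>' carry no base call and are skipped.
        i += 1
    return counts


counts = count_pileup_bases("G", "..,,A^Ia$.+2AG,-1t*")
```

For the G>A site above, `counts["G"]` and `counts["A"]` are the numbers to compare with REF and ALT. Small differences are possible because samtools applies its own filters, such as a default minimum base quality, which may not match the gbcms settings.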

Coverage Targets

| Module | Target | Current |
|---|---|---|
| cli.py | 90% | ~90% |
| pipeline.py | 70% | 29% |
| io/input.py | 85% | 82% |
| io/output.py | 90% | 96% |
| models/core.py | 90% | 90% |

Run coverage report:

pytest --cov=gbcms --cov-report=html
open htmlcov/index.html  # macOS; on Linux use xdg-open
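
The per-module targets can also be checked automatically rather than read off the HTML report. The sketch below assumes a JSON report produced with `pytest --cov=gbcms --cov-report=json`, which writes `coverage.json` by default. The function names are illustrative, and matching modules by path suffix is an assumption about how file paths appear in the report.

```python
import json

# Targets copied from the Coverage Targets table, keyed by path suffix.
TARGETS = {
    "cli.py": 90.0,
    "pipeline.py": 70.0,
    "io/input.py": 85.0,
    "io/output.py": 90.0,
    "models/core.py": 90.0,
}


def load_report(path="coverage.json"):
    """Load a coverage.py JSON report from disk."""
    with open(path) as fh:
        return json.load(fh)


def modules_below_target(report, targets=TARGETS):
    """Return {suffix: (percent, target)} for modules under their target."""
    shortfalls = {}
    for path, data in report["files"].items():
        normalized = path.replace("\\", "/")
        for suffix, target in targets.items():
            if normalized == suffix or normalized.endswith("/" + suffix):
                pct = data["summary"]["percent_covered"]
                if pct < target:
                    shortfalls[suffix] = (round(pct, 1), target)
    return shortfalls


# Synthetic report fragment using the "Current" figures for two modules.
sample_report = {
    "files": {
        "src/gbcms/pipeline.py": {"summary": {"percent_covered": 29.0}},
        "src/gbcms/io/output.py": {"summary": {"percent_covered": 96.0}},
    }
}
shortfalls = modules_below_target(sample_report)
```

Against a real report, `modules_below_target(load_report())` gives the same kind of mapping, which a CI step could use to fail the build when it is non-empty.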