Integration Test Suite

The instar project includes a Python-based integration test suite that verifies instar behaves as a faithful drop-in replacement for qemu-img: instar info must produce output identical to qemu-img info, instar check must correctly detect structural corruption in QCOW2 images, and instar compare and instar convert must produce byte-for-byte identical output to qemu-img compare and qemu-img convert. Any difference in output is considered a bug.

Architecture

The test suite uses:

  • testtools - Extended unittest framework with better assertions
  • testscenarios - Parameterized test scenarios
  • stestr - Parallel test runner with result storage

Tests compare instar output against one of two references:

  1. Live qemu-img output (for safe images - info, compare, convert)
  2. Stored expected output files (for malicious images)
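The reference selection can be sketched in a few lines. The helper below is illustrative, not the suite's actual API; the manifest fields (safety, path, expected_override) are the ones described later in this page:

```python
# Illustrative sketch of reference-output selection (not the suite's actual code).
import subprocess
from pathlib import Path

def reference_output(entry: dict, testdata: Path) -> str:
    """Return the output instar must match for one manifest entry."""
    if entry["safety"] == "malicious":
        # Never run qemu-img on a malicious image: use the stored file.
        return (testdata / entry["expected_override"]).read_text()
    # Safe image: capture live qemu-img output.
    proc = subprocess.run(
        ["qemu-img", "info", str(testdata / entry["path"])],
        capture_output=True, text=True, check=True)
    return proc.stdout
```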

Test Categories

Safe Images (test_info_safe.py)

Tests against known-safe disk images. These run qemu-img directly and compare outputs character-for-character.

Malicious Images (test_info_malicious.py)

Tests against images designed to exploit vulnerabilities (e.g., backing file references to /etc/passwd). These use pre-stored expected output files instead of running qemu-img, since running qemu-img on malicious images defeats the security purpose of instar.

Check Validation Tests (test_check_formats.py)

Tests for the instar check operation:

  • Format detection: Verifies check correctly identifies QCOW2, VMDK, and VHD formats
  • Corrupt images: Tests against deliberately corrupt format headers (VMDK, VHDX, VHD)
  • QCOW2 structural validation: Uses 4 script-generated corrupt QCOW2 images:
      • Clean baseline (should pass with 0 errors)
      • Overlapping clusters (two L2 entries pointing to the same host cluster)
      • Refcount-zero (referenced cluster with refcount=0)
      • Leaked cluster (refcount>0 but no L2 reference)
  • Unsafe quirks mode: Verifies non-QCOW2 formats are treated as raw with --unsafe-quirks

Corrupt QCOW2 test images are generated by instar-testdata/custom/check-validation/create-corrupt-images.py, which creates images with qemu-img/qemu-io and then surgically corrupts specific QCOW2 structures via binary manipulation.
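The surgical corruption amounts to parsing just enough of the header to locate an on-disk structure, then overwriting bytes in place. A stdlib sketch of the technique (field offsets follow the QCOW2 header layout; this is illustrative, not the actual logic of create-corrupt-images.py):

```python
# Sketch of header-guided binary corruption in the style of
# create-corrupt-images.py (illustrative, not the script's actual code).
import struct

QCOW2_MAGIC = 0x514649FB  # b'QFI\xfb'

def parse_qcow2_header(data: bytes) -> dict:
    """Read the fields needed to locate on-disk structures (all big-endian,
    at the offsets defined by the QCOW2 header layout)."""
    magic, version = struct.unpack_from(">II", data, 0)
    assert magic == QCOW2_MAGIC, "not a QCOW2 image"
    (cluster_bits,) = struct.unpack_from(">I", data, 20)
    (l1_table_offset,) = struct.unpack_from(">Q", data, 40)
    (refcount_table_offset,) = struct.unpack_from(">Q", data, 48)
    return {"version": version,
            "cluster_size": 1 << cluster_bits,
            "l1_table_offset": l1_table_offset,
            "refcount_table_offset": refcount_table_offset}

def zero_first_refcount_table_entry(image: bytearray) -> None:
    """Crude analogue of the refcount-zero case: zero one 8-byte refcount
    table entry so the clusters it covered lose their refcounts."""
    hdr = parse_qcow2_header(bytes(image))
    off = hdr["refcount_table_offset"]
    image[off:off + 8] = bytes(8)
```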

Compare Tests (test_compare.py)

Tests for the instar compare operation, cross-validated against qemu-img compare:

Raw-vs-raw (TestCompareRawIdentical, TestCompareRawDifferent, TestCompareRawSizeMismatch, TestCompareRawJson):

  • Identical images: Self-compare and two identical files
  • Different content: Mismatch at offset 0 and at mid-file offsets
  • Size mismatch: Non-strict (zeros = identical), non-strict (non-zero = differs), and strict mode (always fails on size difference)
  • JSON output: Validates identical, first-mismatch-offset, total-bytes-compared, and size-mismatch fields

QCOW2-vs-raw (TestCompareQcow2VsRaw):

  • Identical content across formats (including all-zeros)
  • Different content reports correct mismatch offset
  • Cross-validated against qemu-img compare

QCOW2-vs-QCOW2 (TestCompareQcow2VsQcow2):

  • Identical and different content between two QCOW2 images
  • Virtual size mismatch handling
  • Cross-validated against qemu-img compare

Compressed QCOW2 (TestCompareQcow2Compressed):

  • Compressed QCOW2 vs raw with same content (zlib decompression)
  • Compressed vs uncompressed QCOW2 with same content
  • Cross-validated against qemu-img compare

Backing chains (TestCompareBackingChain):

  • QCOW2 overlay with raw backing file vs flattened raw (identical)
  • QCOW2 overlay vs different raw (mismatch detected)
  • Deep chain (3-level: top -> mid -> base) vs flattened raw (identical)
  • Two different QCOW2 backing chains with same virtual content (identical)
  • All scenarios cross-validated against qemu-img compare

Test images are created at runtime using qemu-img create, qemu-io write, qemu-img convert -c (for compressed), and qemu-img create -b (for backing chains), so no external testdata is needed.

qemu-img cross-validation: Every scenario verifies byte-for-byte identical stdout and matching exit codes with qemu-img compare.
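That check reduces to running both binaries with identical arguments and demanding identical observable behavior. A sketch of such a helper (names are illustrative, not the suite's actual base.py API), demonstrated with a stand-in command:

```python
# Illustrative cross-validation helper (not the suite's actual base.py API).
import subprocess
import sys

def capture(argv):
    """Run a command and return (exit_code, stdout) for comparison."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    return proc.returncode, proc.stdout

def assert_cross_validated(instar_argv, qemu_argv):
    """Fail unless both tools agree byte-for-byte on stdout and exit code."""
    ours, theirs = capture(instar_argv), capture(qemu_argv)
    if ours != theirs:
        raise AssertionError(
            f"divergence:\n  instar:   {ours!r}\n  qemu-img: {theirs!r}")

# Stand-in for e.g. ["instar", "compare", a, b] vs ["qemu-img", "compare", a, b]:
assert_cross_validated([sys.executable, "-c", "print('identical')"],
                       [sys.executable, "-c", "print('identical')"])
```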

Convert Tests (test_convert.py)

Tests for the instar convert operation (QCOW2 to raw), cross-validated against qemu-img convert:

Basic conversion (TestConvertBasicQcow2ToRaw):

  • Empty QCOW2 image to raw
  • QCOW2 with written data to raw
  • Output size matches virtual size
  • All cross-validated against qemu-img convert

Compressed QCOW2 (TestConvertCompressed):

  • Compressed QCOW2 to raw (zlib decompression)
  • Compared against original raw source

Backing chains (TestConvertBackingChain):

  • QCOW2 overlay with raw backing flattened to raw
  • Deep chain (3-level: top -> mid -> base) flattened to raw
  • Cross-validated against qemu-img convert

Raw passthrough (TestConvertRawToRaw):

  • Raw to raw identity conversion

Error handling (TestConvertErrors):

  • Unsupported output format rejected
  • Nonexistent input file rejected

Manifest images (TestConvertManifestImages):

  • Converts real-world QCOW2 images from the test manifest
  • Cross-validates against qemu-img convert output
  • Skips images with cluster_size > 64KB (unsupported)
  • Skips images whose virtual_size exceeds available temp space

Adversarial Image Tests (test_adversarial.py)

Tests verifying that instar safely handles malicious and malformed images without crashing, hanging, or consuming excessive resources. Uses the run_adversarial() helper in base.py which enforces timeouts (hang detection), memory limits via RLIMIT_AS (resource exhaustion), and signal checks (crash detection).
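A minimal sketch of what such a helper enforces (an illustrative re-implementation, not base.py's actual code; the RLIMIT_AS cap makes this POSIX-only):

```python
# Illustrative re-implementation of an adversarial runner (POSIX-only;
# not the actual run_adversarial() from base.py).
import resource
import subprocess

def run_adversarial(argv, timeout_s=10, mem_bytes=512 * 1024 * 1024):
    """Run a command with a wall-clock timeout and an address-space cap."""
    def limit_memory():
        # Applied in the child just before exec: caps total virtual memory.
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))
    try:
        proc = subprocess.run(argv, capture_output=True,
                              timeout=timeout_s, preexec_fn=limit_memory)
    except subprocess.TimeoutExpired:
        # Hang detection: the child was killed after the timeout.
        return {"hang": True, "crash": False, "returncode": None}
    # On POSIX a negative return code means the child died from a signal.
    return {"hang": False, "crash": proc.returncode < 0,
            "returncode": proc.returncode}
```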

Phase 1 — CVE-adjacent attacks:

  • Compression bombs: Zlib and ZSTD compressed QCOW2 images with extreme expansion ratios. Verifies decompression buffer bounds are enforced and output files stay small.
  • Circular backing chains: 2-level cycle (A→B→A), 3-level cycle (A→B→C→A), and self-referencing (A→A). Verifies chain discovery detects the cycle and rejects it.
  • Deep backing chains: Chains at 16 levels (device limit) and 17 levels (exceeds limit). Verifies depth enforcement and correct rejection.
  • Integer overflow: L1 table size near u32::MAX, L1 size = 0, cluster_bits below minimum (8) and above maximum (22). Verifies checked arithmetic prevents undefined behavior.

Phase 2 — boundary value cases:

  • Refcount order edges: refcount_order = 7 (128-bit, invalid) and 255 (extreme value). Verifies clamping or rejection.
  • Oversized virtual size: 1 petabyte and u64::MAX virtual sizes. Verifies info reports the size and check doesn't allocate based on virtual size alone.
  • VMDK grain size: Zero and huge (2^63) grain sizes. Verifies checked_mul prevents division by zero and overflow.
  • VHDX conflicting headers: Dual headers with different sequence numbers and valid CRC-32C checksums.
  • BAT beyond EOF: VHD and VHDX images with BAT entries pointing past end of file. Verifies I/O error handling.

Phase 3 — format confusion:

  • Polyglot files: QCOW2 magic header with VMDK descriptor body, and QCOW2 magic with ELF binary content. Verifies format detection works (magic wins) and structural validation catches inconsistencies.
  • Truncated headers: QCOW2 v2 header cut at 32 bytes, VMDK with only 8 bytes (magic + version), VHD footer at 48 bytes. Verifies all operations fail gracefully with no crash.
  • VMDK descriptor attacks: Null bytes in descriptor, multiple extent declarations, and inflated 1MB descriptor size claim. Verifies parser handles adversarial text safely.

Test images are generated by scripts in instar-testdata/scripts/ (the private testdata repository). Scripts that generate adversarial or CVE-reproducer images must always be placed in instar-testdata, never in the public instar repository.

All generated images live in instar-testdata/custom/audit/.

Security Tests (test_security.py)

Tests verifying instar's security properties:

  • Backing file references are detected but not followed
  • External data file references are reported but not read
  • VMDK descriptor extent paths are not accessed

Running Tests

Make Targets

# Create Python virtual environment (first time only)
make test-venv

# Run safe tests (default, suitable for development)
make test

# Run CI-suitable tests
make test-ci

# Run all tests including malicious images (explicit opt-in)
make test-malicious

# Run with verbose output (useful for debugging diffs)
make test-report

# Clean test artifacts
make clean-tests

Direct stestr Usage

cd tests
source .venv/bin/activate

# Run all tests
stestr run

# Run specific test module
stestr run test_info_safe

# Run with verbose output
stestr run --serial -- --verbose

# List available tests
stestr list

Test Image Manifest

Test images are defined in tests/manifest.json:

{
    "id": "cirros-qcow2",
    "path": "downloaded/cirros/cirros-0.6.3-x86_64-disk.img",
    "format": "qcow2",
    "safety": "safe",
    "run_in_ci": true,
    "description": "CirrOS minimal cloud image",
    "tags": ["qcow2", "cloud-image"]
}

Manifest Fields

Field                    Description
id                       Unique identifier for the test image
path                     Path relative to testdata root
format                   Expected disk format (qcow2, vmdk, vhd, etc.)
safety                   safe, caution, or malicious
run_in_ci                Whether to include in CI test runs
unsafe_quirks_required   If true, requires --unsafe-quirks flag for qemu-img compatibility
description              Human-readable description
tags                     Searchable tags for filtering
expected_override        Path to expected output file (for malicious images)
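Scenario selection from the manifest can be sketched as a simple filter over these fields (the loader name is illustrative, not the suite's actual API):

```python
# Illustrative manifest filter using the fields described above.
import json

def ci_safe_scenarios(manifest_text: str):
    """Yield (id, entry) for safe images that are included in CI runs."""
    for entry in json.loads(manifest_text):
        if entry.get("safety") == "safe" and entry.get("run_in_ci"):
            yield entry["id"], entry
```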

Unsafe Quirks Testing

Images marked with unsafe_quirks_required: true do not have valid format headers or partition tables. In default (secure) mode, instar rejects these files as "unknown format" rather than accepting them as raw images.

To test qemu-img compatibility for these images, use --unsafe-quirks:

# Default mode: rejects files without valid structure
instar info random-garbage.raw
# Error: Unknown format (no valid disk image header or partition table)

# Unsafe quirks mode: matches qemu-img behavior
instar info --unsafe-quirks random-garbage.raw
# file format: raw

See configuration.md and quirks.md for details on safe vs unsafe quirks.

Test Data Location

Test images are stored in a separate repository (instar-testdata) to keep the main repository small. The location is resolved in order:

  1. INSTAR_TESTDATA_PATH environment variable
  2. ../instar-testdata (sibling directory)
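The resolution order above can be sketched as (helper name illustrative):

```python
# Illustrative sketch of the testdata resolution order.
import os
from pathlib import Path

def resolve_testdata(repo_root: Path) -> Path:
    """INSTAR_TESTDATA_PATH wins; otherwise fall back to the sibling checkout."""
    override = os.environ.get("INSTAR_TESTDATA_PATH")
    if override:
        return Path(override)
    return repo_root.parent / "instar-testdata"
```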

Expected Output Overrides

For malicious images where running qemu-img would be dangerous, store the expected output in tests/expected_outputs/:

tests/expected_outputs/
└── qcow2_backing_etc_passwd.txt

Reference this file in the manifest:

{
    "id": "qcow2-backing-passwd",
    "expected_override": "expected_outputs/qcow2_backing_etc_passwd.txt"
}

Image Notes

The docs/image_notes/ directory documents which test images exposed specific quirks or implementation details. When a test image reveals unexpected qemu-img behavior that requires compatibility work, create a markdown file documenting:

  • The specific values that revealed the behavior
  • How qemu-img handles the case
  • How instar now handles it
  • Links to relevant quirks documentation

See Image Notes for existing documentation.

Adding New Test Images

  1. Add the image to the instar-testdata repository
  2. Add an entry to tests/manifest.json
  3. For safe images: add a scenario to test_info_safe.py
  4. For malicious images:
       • Create an expected output file in tests/expected_outputs/
       • Add a scenario to test_info_malicious.py
  5. If the image exposes new quirks: create docs/image_notes/<image-id>.md

Output Comparison

The test suite performs exact string comparison. On failure, it shows:

  • A unified diff with whitespace made visible (markers for trailing spaces, tabs, and trailing newlines)
  • Raw repr() of both outputs for debugging
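A whitespace-visible diff can be built with difflib; the marker glyphs below are illustrative stand-ins, not necessarily the suite's actual choices:

```python
# Illustrative whitespace-visible diff (marker glyphs are stand-ins).
import difflib

def visible(line: str) -> str:
    """Replace tabs and trailing spaces with printable markers."""
    stripped = line.rstrip(" ")
    trailing = len(line) - len(stripped)
    return (stripped + "·" * trailing).replace("\t", "→")

def visible_diff(expected: str, actual: str) -> str:
    """Unified diff of two outputs with whitespace made visible."""
    return "\n".join(difflib.unified_diff(
        [visible(l) for l in expected.splitlines()],
        [visible(l) for l in actual.splitlines()],
        fromfile="qemu-img", tofile="instar", lineterm=""))
```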

Environment Variables

Variable               Description
INSTAR_TESTDATA_PATH   Override default testdata location
INSTAR_BINARY_PATH     Override default instar binary location

Differential Fuzzing

The project includes a differential fuzzer (scripts/differential-fuzz.py) that compares instar against qemu-img on randomly generated images. This is Phase 3 of the security audit plan (PLAN-audit.md).

How it works

For each iteration the fuzzer:

  1. Picks a random seed (logged for reproducibility).
  2. Generates a random disk image with qemu-img create, varying format (qcow2, raw, vmdk, vpc), virtual size (1M-1G), cluster size, compression, and data patterns (zeros, random, sparse, MBR).
  3. Creates separate copies for instar and qemu-img.
  4. Runs a random chain of 2-4 operations (info, check, convert, compressed convert) against both tools.
  5. Compares outputs at each stage: exit codes, normalized JSON info output, and converted file content (SHA-256 of raw-flattened output).
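The comparison at step 5 can be sketched as follows; the quirk field name below is an assumption (the authoritative exclusion list lives in quirks.md):

```python
# Illustrative per-stage comparison helpers ("actual-size" as a quirk
# field is an assumption; see quirks.md for the real exclusions).
import hashlib
import json

QUIRK_FIELDS = {"actual-size"}  # host-dependent disk size: excluded

def normalize_info(raw_json: str) -> dict:
    """Parse JSON info output and drop known-quirk fields before comparing."""
    return {k: v for k, v in json.loads(raw_json).items()
            if k not in QUIRK_FIELDS}

def content_hash(path) -> str:
    """SHA-256 of the raw-flattened converted output."""
    with open(path, "rb") as fh:
        return hashlib.sha256(fh.read()).hexdigest()
```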

Known quirks (see quirks.md) are excluded from comparison: disk size fields and format-specific metadata.

libyal cross-validation

When libyal tools are installed in the environment (libvmdk-utils, libvhdi-utils, libqcow-utils), the fuzzer adds two additional comparison layers:

  1. Info cross-check: Parsed fields from vmdkinfo, vhdiinfo, and qcowinfo (virtual size, format version, cluster size, etc.) are compared against instar's JSON output for the same image.
  2. Parse-success consistency: For each format, if the libyal tool successfully parses the image, instar check should report no errors (and vice versa). Disagreements are flagged as divergences.

This closes the gap where VMDK/VHD/VHDX had no differential reference for check validation (qemu-img check only supports QCOW2), and provides a third independent opinion for QCOW2. libyal tools are optional — the fuzzer degrades gracefully when they are unavailable.
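The info cross-check boils down to scraping key/value lines from a libyal tool's text output and comparing the numbers. The "Media size" line format below is an assumption about qcowinfo/vmdkinfo output, not a verified field name:

```python
# Illustrative libyal info scraper (the "Media size" line format is an
# assumption about qcowinfo/vmdkinfo output).
import re

def parse_info_fields(text: str) -> dict:
    """Collect 'Key: value' pairs from a libyal *info tool's output."""
    fields = {}
    for line in text.splitlines():
        m = re.match(r"\s*([^:\t]+?)\s*:\s+(\S.*)", line)
        if m:
            fields[m.group(1)] = m.group(2)
    return fields

def media_size_bytes(fields: dict):
    """Extract the byte count from e.g. 'Media size: 1.0 GiB (1073741824 bytes)'."""
    m = re.search(r"(\d+)\s*bytes", fields.get("Media size", ""))
    return int(m.group(1)) if m else None
```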

Running locally

python3 scripts/differential-fuzz.py \
    --instar src/target/release/instar \
    --iterations 100 \
    --seed 42 \
    --log-dir ./fuzz-logs

CI integration

The fuzzer runs automatically via .github/workflows/differential-fuzz.yml at three tiers:

Trigger             Iterations     When
pull_request        100            PR changes fuzzer script or workflow
push to develop     200            Post-merge smoke test
schedule            1000           Nightly at 02:00 UTC
workflow_dispatch   configurable   Manual trigger

On failure, the workflow uploads logs as artifacts and auto-files GitHub Issues with the security-audit label, including the seed, iteration, image attributes, and a reproduction command.

Reproducing a divergence

Each divergence report includes the seed and iteration number. To reproduce:

python3 scripts/differential-fuzz.py \
    --instar src/target/release/instar \
    --iterations <ITERATION + 1> \
    --seed <SEED> \
    --fail-fast

Phase 6: Coverage-Guided Fuzzing

Coverage-guided fuzzing uses cargo-fuzz (libFuzzer) to exercise the no_std parser crates directly, without the full VMM/KVM stack. A mock CallTable backed by fuzzer input provides the I/O layer, allowing libFuzzer to explore malformed input space that differential fuzzing (Phase 3) cannot reach.

Fuzz targets

13 targets across all parser crates, organized in src/fuzz/:

Target                  Crate    Type
fuzz_format_detect      shared   Buffer-based
fuzz_qcow2_header       qcow2    Buffer-based
fuzz_qcow2_l1l2         qcow2    CallTable
fuzz_qcow2_refcount     qcow2    CallTable
fuzz_qcow2_decompress   qcow2    CallTable
fuzz_vmdk_header        vmdk     Buffer-based
fuzz_vmdk_grain         vmdk     CallTable
fuzz_vhd_footer         vhd      Buffer-based
fuzz_vhd_bat            vhd      CallTable
fuzz_vhdx_header        vhdx     Buffer-based
fuzz_vhdx_metadata      vhdx     CallTable
fuzz_raw_partition      raw      Buffer-based
fuzz_luks_header        luks     Buffer-based

Buffer-based targets call parser functions that take &[u8] directly (e.g. QcowHeader::parse(data)). CallTable targets use the mock CallTable from src/fuzz/src/lib.rs to simulate sector-based I/O from the fuzzer input.

Running locally

# Inside the instar-build container:
cd src/fuzz

# Run a single target for 60 seconds
cargo fuzz run fuzz_qcow2_header -- -max_total_time=60

# Run with specific corpus
cargo fuzz run fuzz_qcow2_header corpus/fuzz_qcow2_header/

# Minimize a crash
cargo fuzz tmin fuzz_qcow2_header artifacts/fuzz_qcow2_header/<crash>

# Generate coverage report
cargo fuzz coverage fuzz_qcow2_header

Corpus seeding

The seed corpus is extracted from instar-testdata using:

python3 scripts/extract-fuzz-corpus.py --testdata /path/to/instar-testdata

This copies test images into per-target corpus directories under src/fuzz/corpus/, filtered by format. Header-only targets receive truncated copies. Hand-crafted minimal valid inputs are also generated for each format.

CI integration

The CI workflow (.github/workflows/coverage-fuzz.yml) runs:

  • Nightly: 1 hour per target at 04:00 UTC
  • PR validation: 60-second smoke test when fuzz/parser code changes
  • Manual dispatch: configurable duration and target selection

Crashes are minimized with cargo fuzz tmin and filed as GitHub Issues with the security-audit label immediately when found. New corpus entries are pushed to instar-testdata/custom/fuzz-corpus/ after nightly runs.

Automated bug fixes

The CI workflow (.github/workflows/fuzz-autofix.yml) runs daily at 06:00 UTC and picks up open security-audit issues. It invokes Claude Code (30-turn limit) to diagnose and fix the crash, then verifies the fix by rebuilding and running core tests. Two attempts per issue; failed issues are labelled autofix-failed for human attention. Complexity guardrails prevent runaway fixes (max 3 files, no cross-crate changes, no new dependencies).
