Integration Test Suite¶
The instar project includes a Python-based integration test suite that verifies:

- instar info produces output identical to qemu-img info
- instar check correctly detects structural corruption in QCOW2 images
- instar compare produces byte-for-byte identical output to qemu-img compare
- instar convert produces output identical to qemu-img convert

Since instar aims to be a drop-in replacement, any difference in output is
considered a bug.
Architecture¶
The test suite uses:
- testtools - Extended unittest framework with better assertions
- testscenarios - Parameterized test scenarios
- stestr - Parallel test runner with result storage
Tests compare instar output against either:
1. Live qemu-img output (for safe images - info, compare, convert)
2. Stored expected output files (for malicious images)
Test Categories¶
Safe Images (test_info_safe.py)¶
Tests against known-safe disk images. These run qemu-img directly and
compare outputs character-for-character.
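The core of such a test reduces to running both tools and comparing their captured output exactly. A minimal sketch of that idea, using hypothetical helper names (`run_and_capture`, `info_outputs_match`) rather than the suite's actual API:

```python
import subprocess

def run_and_capture(cmd: list[str]) -> tuple[int, str]:
    """Run a command and return (exit code, stdout)."""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.returncode, proc.stdout

def info_outputs_match(image: str,
                       instar: str = "instar",
                       qemu: str = "qemu-img") -> bool:
    """True when both tools emit identical stdout and exit codes."""
    return (run_and_capture([instar, "info", image])
            == run_and_capture([qemu, "info", image]))
```

Any deviation, even a single character of whitespace, makes the comparison fail.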
Malicious Images (test_info_malicious.py)¶
Tests against images designed to exploit vulnerabilities (e.g., backing file
references to /etc/passwd). These use pre-stored expected output files
instead of running qemu-img, since running qemu-img on malicious images
defeats the security purpose of instar.
Check Validation Tests (test_check_formats.py)¶
Tests for the instar check operation:
- Format detection: Verifies check correctly identifies QCOW2, VMDK, VHD formats
- Corrupt images: Tests against deliberately corrupt format headers (VMDK, VHDX, VHD)
- QCOW2 structural validation: Uses 4 script-generated corrupt QCOW2 images:
- Clean baseline (should pass with 0 errors)
- Overlapping clusters (two L2 entries pointing to same host cluster)
- Refcount-zero (referenced cluster with refcount=0)
- Leaked cluster (refcount>0 but no L2 reference)
- Unsafe quirks mode: Verifies non-QCOW2 formats are treated as raw with
--unsafe-quirks
Corrupt QCOW2 test images are generated by
instar-testdata/custom/check-validation/create-corrupt-images.py, which creates
images with qemu-img/qemu-io and then surgically corrupts specific QCOW2
structures via binary manipulation.
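The surgical corruption step amounts to overwriting a known field at a fixed offset in place. A sketch of that kind of binary manipulation (helper name hypothetical, not the script's actual code):

```python
import struct

def overwrite_be64(path: str, offset: int, value: int) -> None:
    """Overwrite one big-endian u64 field (e.g. an L2 entry or a
    refcount table pointer) at a known offset in an image file."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(struct.pack(">Q", value))
```

For example, pointing two L2 entries at the same host cluster offset yields an overlapping-clusters image, while zeroing a refcount entry for a referenced cluster yields the refcount-zero case.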
Compare Tests (test_compare.py)¶
Tests for the instar compare operation, cross-validated against qemu-img
compare:
Raw-vs-raw (TestCompareRawIdentical, TestCompareRawDifferent,
TestCompareRawSizeMismatch, TestCompareRawJson):
- Identical images: Self-compare and two identical files
- Different content: Mismatch at offset 0 and at mid-file offsets
- Size mismatch: non-strict mode where the extra area is all zeros (treated
as identical), non-strict where it is non-zero (reported as differing), and
strict mode (always fails on a size difference)
- JSON output: Validates identical, first-mismatch-offset,
total-bytes-compared, and size-mismatch fields
QCOW2-vs-raw (TestCompareQcow2VsRaw):
- Identical content across formats (including all-zeros)
- Different content reports correct mismatch offset
- Cross-validated against qemu-img compare
QCOW2-vs-QCOW2 (TestCompareQcow2VsQcow2):
- Identical and different content between two QCOW2 images
- Virtual size mismatch handling
- Cross-validated against qemu-img compare
Compressed QCOW2 (TestCompareQcow2Compressed):
- Compressed QCOW2 vs raw with same content (zlib decompression)
- Compressed vs uncompressed QCOW2 with same content
- Cross-validated against qemu-img compare
Backing chains (TestCompareBackingChain):
- QCOW2 overlay with raw backing file vs flattened raw (identical)
- QCOW2 overlay vs different raw (mismatch detected)
- Deep chain (3-level: top -> mid -> base) vs flattened raw (identical)
- Two different QCOW2 backing chains with same virtual content (identical)
- All scenarios cross-validated against qemu-img compare
Test images are created at runtime using qemu-img create, qemu-io write,
qemu-img convert -c (for compressed), and qemu-img create -b (for backing
chains), so no external testdata is needed.
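These fixtures boil down to a handful of qemu-img invocations. A sketch that builds the command lists for a one-level backing chain (helper name hypothetical; assumes a qemu-img new enough to accept the -F backing-format flag):

```python
def backing_chain_cmds(base: str, overlay: str,
                       size: str = "64M") -> list[list[str]]:
    """qemu-img commands for a raw base plus a QCOW2 overlay on top."""
    return [
        # Raw backing file of the given virtual size.
        ["qemu-img", "create", "-f", "raw", base, size],
        # QCOW2 overlay referencing the raw base (-F names its format).
        ["qemu-img", "create", "-f", "qcow2", "-b", base, "-F", "raw",
         overlay],
    ]
```

The deep-chain scenarios repeat the overlay step, each new QCOW2 layer backed by the previous one.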
qemu-img cross-validation: Every scenario verifies byte-for-byte
identical stdout and matching exit codes with qemu-img compare.
Convert Tests (test_convert.py)¶
Tests for the instar convert operation (QCOW2 to raw), cross-validated
against qemu-img convert:
Basic conversion (TestConvertBasicQcow2ToRaw):
- Empty QCOW2 image to raw
- QCOW2 with written data to raw
- Output size matches virtual size
- All cross-validated against qemu-img convert
Compressed QCOW2 (TestConvertCompressed):
- Compressed QCOW2 to raw (zlib decompression)
- Compared against original raw source
Backing chains (TestConvertBackingChain):
- QCOW2 overlay with raw backing flattened to raw
- Deep chain (3-level: top -> mid -> base) flattened to raw
- Cross-validated against qemu-img convert
Raw passthrough (TestConvertRawToRaw):
- Raw to raw identity conversion
Error handling (TestConvertErrors):
- Unsupported output format rejected
- Nonexistent input file rejected
Manifest images (TestConvertManifestImages):
- Converts real-world QCOW2 images from the test manifest
- Cross-validates against qemu-img convert output
- Skips images with cluster_size > 64KB (unsupported)
- Skips images whose virtual_size exceeds available temp space
Adversarial Image Tests (test_adversarial.py)¶
Tests verifying that instar safely handles malicious and malformed images
without crashing, hanging, or consuming excessive resources. Uses the
run_adversarial() helper in base.py which enforces timeouts (hang
detection), memory limits via RLIMIT_AS (resource exhaustion), and signal
checks (crash detection).
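A simplified sketch of what such a helper can look like (the real run_adversarial in base.py may differ in details):

```python
import resource
import signal
import subprocess

def run_adversarial(cmd, timeout=30, mem_bytes=512 * 1024 * 1024):
    """Run cmd with a wall-clock timeout (hang detection) and an
    RLIMIT_AS cap (resource exhaustion); flag signal deaths (crashes)."""
    def cap_memory():
        # Runs in the child between fork and exec; caps address space.
        resource.setrlimit(resource.RLIMIT_AS, (mem_bytes, mem_bytes))

    proc = subprocess.run(cmd, capture_output=True, text=True,
                          timeout=timeout, preexec_fn=cap_memory)
    if proc.returncode < 0:  # terminated by a signal
        raise AssertionError(
            f"crashed with {signal.Signals(-proc.returncode).name}")
    return proc.returncode, proc.stdout, proc.stderr
```

A subprocess.TimeoutExpired from the helper signals a hang; a negative return code signals a crash; an ENOMEM-style failure under the cap signals unbounded allocation.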
Phase 1 — CVE-adjacent attacks:

- Compression bombs: zlib- and zstd-compressed QCOW2 images with extreme expansion ratios. Verifies decompression buffer bounds are enforced and output files stay small.
- Circular backing chains: 2-level cycle (A→B→A), 3-level cycle (A→B→C→A), and self-referencing (A→A). Verifies chain discovery detects the cycle and rejects it.
- Deep backing chains: chains at 16 levels (device limit) and 17 levels (exceeds limit). Verifies depth enforcement and correct rejection.
- Integer overflow: L1 table size near u32::MAX, L1 size = 0, cluster_bits below minimum (8) and above maximum (22). Verifies checked arithmetic prevents undefined behavior.
Phase 2 — boundary value cases:

- Refcount order edges: refcount_order = 7 (128-bit, invalid) and 255 (extreme value). Verifies clamping or rejection.
- Oversized virtual size: 1 petabyte and u64::MAX virtual sizes. Verifies info reports the size and check doesn't allocate based on virtual size alone.
- VMDK grain size: zero and huge (2^63) grain sizes. Verifies checked_mul prevents division by zero and overflow.
- VHDX conflicting headers: dual headers with different sequence numbers and valid CRC-32C checksums.
- BAT beyond EOF: VHD and VHDX images with BAT entries pointing past end of file. Verifies I/O error handling.
Phase 3 — format confusion:

- Polyglot files: QCOW2 magic header with VMDK descriptor body, and QCOW2 magic with ELF binary content. Verifies format detection works (magic wins) and structural validation catches inconsistencies.
- Truncated headers: QCOW2 v2 header cut at 32 bytes, VMDK with only 8 bytes (magic + version), VHD footer at 48 bytes. Verifies all operations fail gracefully with no crash.
- VMDK descriptor attacks: null bytes in the descriptor, multiple extent declarations, and an inflated 1 MB descriptor size claim. Verifies the parser handles adversarial text safely.
Test images are generated by scripts in instar-testdata/scripts/ (the private
testdata repository). Scripts that generate adversarial or CVE-reproducer images
must always be placed in instar-testdata, never in the public instar repository.
All generated images live in instar-testdata/custom/audit/.
Security Tests (test_security.py)¶
Tests verifying instar's security properties:

- Backing file references are detected but not followed
- External data file references are reported but not read
- VMDK descriptor extent paths are not accessed
Running Tests¶
Make Targets¶
# Create Python virtual environment (first time only)
make test-venv
# Run safe tests (default, suitable for development)
make test
# Run CI-suitable tests
make test-ci
# Run all tests including malicious images (explicit opt-in)
make test-malicious
# Run with verbose output (useful for debugging diffs)
make test-report
# Clean test artifacts
make clean-tests
Direct stestr Usage¶
cd tests
source .venv/bin/activate
# Run all tests
stestr run
# Run specific test module
stestr run test_info_safe
# Run with verbose output
stestr run --serial -- --verbose
# List available tests
stestr list
Test Image Manifest¶
Test images are defined in tests/manifest.json:
{
"id": "cirros-qcow2",
"path": "downloaded/cirros/cirros-0.6.3-x86_64-disk.img",
"format": "qcow2",
"safety": "safe",
"run_in_ci": true,
"description": "CirrOS minimal cloud image",
"tags": ["qcow2", "cloud-image"]
}
Manifest Fields¶
| Field | Description |
|---|---|
| `id` | Unique identifier for the test image |
| `path` | Path relative to testdata root |
| `format` | Expected disk format (qcow2, vmdk, vhd, etc.) |
| `safety` | `safe`, `caution`, or `malicious` |
| `run_in_ci` | Whether to include in CI test runs |
| `unsafe_quirks_required` | If true, requires the `--unsafe-quirks` flag for qemu-img compatibility |
| `description` | Human-readable description |
| `tags` | Searchable tags for filtering |
| `expected_override` | Path to expected output file (for malicious images) |
Unsafe Quirks Testing¶
Images marked with unsafe_quirks_required: true do not have valid format
headers or partition tables. In default (secure) mode, instar rejects these
files as "unknown format" rather than accepting them as raw images.
To test qemu-img compatibility for these images, use --unsafe-quirks:
# Default mode: rejects files without valid structure
instar info random-garbage.raw
# Error: Unknown format (no valid disk image header or partition table)
# Unsafe quirks mode: matches qemu-img behavior
instar info --unsafe-quirks random-garbage.raw
# file format: raw
See configuration.md and quirks.md for details on safe vs unsafe quirks.
Test Data Location¶
Test images are stored in a separate repository (instar-testdata) to keep
the main repository small. The location is resolved in order:
1. INSTAR_TESTDATA_PATH environment variable
2. ../instar-testdata (sibling directory)
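That resolution order can be sketched as a small helper (function name hypothetical, not the suite's actual code):

```python
import os
from pathlib import Path
from typing import Optional

def resolve_testdata(repo_root: Path) -> Optional[Path]:
    """Resolve instar-testdata: env var first, then sibling directory."""
    env = os.environ.get("INSTAR_TESTDATA_PATH")
    if env:
        return Path(env)
    sibling = repo_root.parent / "instar-testdata"
    return sibling if sibling.is_dir() else None
```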
Expected Output Overrides¶
For malicious images where running qemu-img would be dangerous, store the
expected output in tests/expected_outputs/ and reference the file in the
manifest:
{
"id": "qcow2-backing-passwd",
"expected_override": "expected_outputs/qcow2_backing_etc_passwd.txt"
}
Image Notes¶
The docs/image_notes/ directory documents which test images exposed
specific quirks or implementation details. When a test image reveals
unexpected qemu-img behavior that requires compatibility work, create a
markdown file documenting:
- The specific values that revealed the behavior
- How qemu-img handles the case
- How instar now handles it
- Links to relevant quirks documentation
See Image Notes for existing documentation.
Adding New Test Images¶
1. Add the image to the instar-testdata repository
2. Add an entry to tests/manifest.json
3. For safe images: add a scenario to test_info_safe.py
4. For malicious images:
    - Create an expected output file in tests/expected_outputs/
    - Add a scenario to test_info_malicious.py
5. If the image exposes new quirks: create docs/image_notes/&lt;image-id&gt;.md
Output Comparison¶
The test suite performs exact string comparison. On failure, it shows:
- Unified diff with whitespace made visible (␣ for trailing spaces, → for
tabs, ↵ for trailing newlines)
- Raw repr() of both outputs for debugging
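The whitespace markers can be produced by a small transform applied to each line before diffing. A sketch of that approach (function names hypothetical):

```python
import difflib

def reveal(line: str) -> str:
    """Make whitespace visible: ␣ trailing spaces, → tabs, ↵ newline."""
    body = line.rstrip("\n")
    newline = "↵" if line.endswith("\n") else ""
    stripped = body.rstrip(" ")
    trail = "␣" * (len(body) - len(stripped))  # trailing spaces only
    return (stripped + trail).replace("\t", "→") + newline

def visible_diff(expected: str, actual: str) -> str:
    """Unified diff of two outputs with whitespace markers applied."""
    exp = [reveal(l) for l in expected.splitlines(keepends=True)]
    act = [reveal(l) for l in actual.splitlines(keepends=True)]
    return "\n".join(difflib.unified_diff(
        exp, act, fromfile="qemu-img", tofile="instar", lineterm=""))
```

This makes a mismatch caused by a single trailing space immediately obvious in the failure output.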
Environment Variables¶
| Variable | Description |
|---|---|
| `INSTAR_TESTDATA_PATH` | Override default testdata location |
| `INSTAR_BINARY_PATH` | Override default instar binary location |
Differential Fuzzing¶
The project includes a differential fuzzer (scripts/differential-fuzz.py)
that compares instar against qemu-img on randomly generated images. This is
Phase 3 of the security audit plan (PLAN-audit.md).
How it works¶
For each iteration the fuzzer:

1. Picks a random seed (logged for reproducibility).
2. Generates a random disk image with qemu-img create, varying format (qcow2, raw, vmdk, vpc), virtual size (1M-1G), cluster size, compression, and data patterns (zeros, random, sparse, MBR).
3. Creates separate copies for instar and qemu-img.
4. Runs a random chain of 2-4 operations (info, check, convert, compressed convert) against both tools.
5. Compares outputs at each stage: exit codes, normalized JSON info output, and converted file content (SHA-256 of raw-flattened output).
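Seed-driven determinism is what makes divergences reproducible: every random choice is derived from the logged seed. A sketch of that idea (names and value ranges illustrative, not the script's actual parameters):

```python
import random

FORMATS = ["qcow2", "raw", "vmdk", "vpc"]
PATTERNS = ["zeros", "random", "sparse", "mbr"]

def plan_iteration(seed: int) -> dict:
    """Derive one fully reproducible fuzz iteration from a seed."""
    rng = random.Random(seed)  # private RNG: same seed, same plan
    return {
        "seed": seed,
        "format": rng.choice(FORMATS),
        "size_mb": rng.choice([1, 16, 64, 256, 1024]),
        "pattern": rng.choice(PATTERNS),
        "ops": rng.sample(["info", "check", "convert", "convert -c"],
                          k=rng.randint(2, 4)),
    }
```

Because the plan is a pure function of the seed, re-running with the logged seed replays the exact image and operation chain that diverged.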
Known quirks (see quirks.md) are excluded from comparison: disk size fields and format-specific metadata.
libyal cross-validation¶
When libyal tools are installed in the environment (libvmdk-utils,
libvhdi-utils, libqcow-utils), the fuzzer adds two additional
comparison layers:
- Info cross-check: parsed fields from vmdkinfo, vhdiinfo, and qcowinfo (virtual size, format version, cluster size, etc.) are compared against instar's JSON output for the same image.
- Parse-success consistency: for each format, if the libyal tool successfully parses the image, instar check should report no errors (and vice versa). Disagreements are flagged as divergences.
This closes the gap where VMDK/VHD/VHDX had no differential reference
for check validation (qemu-img check only supports QCOW2), and
provides a third independent opinion for QCOW2. libyal tools are
optional — the fuzzer degrades gracefully when they are unavailable.
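A sketch of the parsing side of the info cross-check, assuming the libyal tools' "key : value" output lines (helper name hypothetical):

```python
import re

def parse_libyal_info(text: str) -> dict:
    """Extract key/value fields from vmdkinfo/vhdiinfo/qcowinfo output.

    Assumes lines shaped like 'Media size\t: 1048576 bytes'; keys are
    lower-cased so they can be mapped onto instar's JSON field names.
    """
    fields = {}
    for line in text.splitlines():
        m = re.match(r"\s*([^:]+?)\s*:\s*(.+)", line)
        if m:
            fields[m.group(1).lower()] = m.group(2).strip()
    return fields
```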
Running locally¶
python3 scripts/differential-fuzz.py \
--instar src/target/release/instar \
--iterations 100 \
--seed 42 \
--log-dir ./fuzz-logs
CI integration¶
The fuzzer runs automatically via .github/workflows/differential-fuzz.yml
at three tiers:
| Trigger | Iterations | When |
|---|---|---|
| `pull_request` | 100 | PR changes fuzzer script or workflow |
| `push` to develop | 200 | Post-merge smoke test |
| `schedule` | 1000 | Nightly at 02:00 UTC |
| `workflow_dispatch` | configurable | Manual trigger |
On failure, the workflow uploads logs as artifacts and auto-files GitHub
Issues with the security-audit label, including the seed, iteration,
image attributes, and a reproduction command.
Reproducing a divergence¶
Each divergence report includes the seed and iteration number. To reproduce:
python3 scripts/differential-fuzz.py \
--instar src/target/release/instar \
--iterations <ITERATION + 1> \
--seed <SEED> \
--fail-fast
Phase 6: Coverage-Guided Fuzzing¶
Coverage-guided fuzzing uses cargo-fuzz (libFuzzer) to exercise the
no_std parser crates directly, without the full VMM/KVM stack. A
mock CallTable backed by fuzzer input provides the I/O layer,
allowing libFuzzer to explore malformed input space that differential
fuzzing (Phase 3) cannot reach.
Fuzz targets¶
13 targets across all parser crates, organized in src/fuzz/:
| Target | Crate | Type |
|---|---|---|
| `fuzz_format_detect` | shared | Buffer-based |
| `fuzz_qcow2_header` | qcow2 | Buffer-based |
| `fuzz_qcow2_l1l2` | qcow2 | CallTable |
| `fuzz_qcow2_refcount` | qcow2 | CallTable |
| `fuzz_qcow2_decompress` | qcow2 | CallTable |
| `fuzz_vmdk_header` | vmdk | Buffer-based |
| `fuzz_vmdk_grain` | vmdk | CallTable |
| `fuzz_vhd_footer` | vhd | Buffer-based |
| `fuzz_vhd_bat` | vhd | CallTable |
| `fuzz_vhdx_header` | vhdx | Buffer-based |
| `fuzz_vhdx_metadata` | vhdx | CallTable |
| `fuzz_raw_partition` | raw | Buffer-based |
| `fuzz_luks_header` | luks | Buffer-based |
Buffer-based targets call parser functions that take &[u8]
directly (e.g. QcowHeader::parse(data)). CallTable targets
use the mock CallTable from src/fuzz/src/lib.rs to simulate
sector-based I/O from the fuzzer input.
Running locally¶
# Inside the instar-build container:
cd src/fuzz
# Run a single target for 60 seconds
cargo fuzz run fuzz_qcow2_header -- -max_total_time=60
# Run with specific corpus
cargo fuzz run fuzz_qcow2_header corpus/fuzz_qcow2_header/
# Minimize a crash
cargo fuzz tmin fuzz_qcow2_header artifacts/fuzz_qcow2_header/<crash>
# Generate coverage report
cargo fuzz coverage fuzz_qcow2_header
Corpus seeding¶
The seed corpus is extracted from instar-testdata: test images are copied
into per-target corpus directories under src/fuzz/corpus/, filtered by
format. Header-only targets receive truncated copies, and hand-crafted
minimal valid inputs are generated for each format.
CI integration¶
The CI workflow (.github/workflows/coverage-fuzz.yml) runs:
- Nightly: 1 hour per target at 04:00 UTC
- PR validation: 60-second smoke test when fuzz/parser code changes
- Manual dispatch: configurable duration and target selection
Crashes are minimized with cargo fuzz tmin and filed as GitHub
Issues with the security-audit label immediately when found. New
corpus entries are pushed to instar-testdata/custom/fuzz-corpus/
after nightly runs.
Automated bug fixes¶
The CI workflow (.github/workflows/fuzz-autofix.yml) runs daily
at 06:00 UTC and picks up open security-audit issues. It invokes
Claude Code (30-turn limit) to diagnose and fix the crash, then
verifies the fix by rebuilding and running core tests. Two attempts
per issue; failed issues are labelled autofix-failed for human
attention. Complexity guardrails prevent runaway fixes (max 3 files,
no cross-crate changes, no new dependencies).
Related Documentation¶
- Format Coverage - Comparison with oslo.utils format_inspector
- Format Detection Safety - Security model for format auto-detection
- Security Analysis - CVE analysis and threat model