Phase 6: integration tests
Master plan: PLAN-map.md · Previous phase: PLAN-map-phase-05-baselines.md
Status: Complete
tests/test_map.py shipped with five test classes
(TestMapSmoke / TestMapBaselineSource / TestMapWindowFilter /
TestMapErrorPaths / TestMapDivergenceRegression). Final
count: 95 active tests + 91 documented skips, 0 failures.
tests/base.py gained 'map' in COMMAND_OUTPUT_DIRS and
the get_profile_for_installed_qemu helper for the 3
map-json profiles. The full sweep surfaced and fixed two
real bugs mid-phase: (1) JSON output was missing a trailing
newline (renderer fixed; the phase 4 "no trailing newline"
doc note was a cat -A misread, corrected in
docs/quirks.md); (2) host-side --start-offset > file_size
check compared against the on-disk file size rather than the
virtual size, causing spurious rejections for sparse
qcow2 sources (check removed).
Mission
Add tests/test_map.py that exercises instar map end-to-end
against the phase 5 baseline matrix. Each safe-tier
source image × output type cell is compared byte-for-byte
against the version-keyed expected output for the qemu-img
installed on the test host. Cases where instar deliberately
diverges from qemu-img (chain sources refused, VHDX
partial-present, etc.) are skipped with documented reasons rather
than failing.
Phase 6 also covers window-filter behaviour
(--start-offset / --max-length) via in-test fixtures
(no baselines — built with qemu-img create + targeted
writes in setUp), error paths (the --image-opts rejection
and other host-side guards from phase 3b), and divergence
regression tests so a future change that silently lifts a
known divergence is surfaced rather than allowed to drift.
Why this is its own phase
Phases 1-5 built the renderer and produced the baselines.
Phase 6 is where the byte-for-byte parity claim is actually
verified — every safe-tier qcow2 / raw / vmdk / vhd / vhdx
image gets run through instar map and compared against
the matching baseline. Without phase 6, the parity claim is
hand-verification only.
Splitting from phase 5 (baselines) keeps the bulk-data work
in instar-testdata and the Python plumbing in instar.
Splitting from phases 7-8 (fuzz) keeps the deterministic
regression suite separate from the random-input
campaigns — phase 6 fails loudly on a regression in a
specific image; phase 7-8 surface unknown bugs.
Architecture
tests/base.py extensions
Add 'map' to COMMAND_OUTPUT_DIRS:

    COMMAND_OUTPUT_DIRS = {
        'info': 'qemu-img',
        'check': 'check',
        'compare': 'compare',
        'measure': 'measure',
        'create': 'create-info',
        'map': 'map',  # PLAN-map phase 6
    }

Add get_profile_for_installed_qemu(self, output_type, command)
helper. With map-json's 3 profiles (one per qemu-img format
era), tests must select the profile matching the installed
qemu-img — next(iter(profiles['profiles'])) is no longer
safe.

    def get_profile_for_installed_qemu(
        self,
        output_type: str,
        command: str,
    ) -> str:
        """
        Resolve the profile name for the installed qemu-img version.

        Returns the profile string (e.g. 'profile-6-0-0',
        'profile-10-0-0') that the version_to_profile map records
        for the host's qemu-img version. Falls back to the first
        profile when no entry shares the installed major.minor.
        """
        profiles = self.get_output_profiles(output_type, command)
        if not self._qemu_version:
            # Cached lookup failed; fall back to the first profile.
            return next(iter(profiles['profiles']))
        major, minor = self._qemu_version
        # version_to_profile keys are "X.Y.Z" strings. Find the
        # best-prefix match (e.g. installed 10.0.8 picks any
        # 10.0.x entry); if none, fall back to the first profile.
        v2p = profiles['version_to_profile']
        prefix = f'{major}.{minor}.'
        for key, profile in v2p.items():
            if key.startswith(prefix):
                return profile
        return next(iter(profiles['profiles']))

The helper does not fail when the installed qemu-img isn't in the baseline matrix. Instead it falls back to the first profile and lets the byte-equality assertion produce a clear failure if drift is real.
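The prefix rule is easiest to check in isolation. The sketch below restates it as a free function; select_profile, PROFILES, and V2P are hypothetical names and sample data, not part of tests/base.py. The trailing dot in the prefix is what keeps an installed 1.0 from matching a 10.0.x key.

```python
from typing import Mapping, Optional, Tuple


def select_profile(
    profiles: Mapping[str, object],
    version_to_profile: Mapping[str, str],
    qemu_version: Optional[Tuple[int, int]],
) -> str:
    """Return the profile for qemu_version, or the first profile as fallback."""
    fallback = next(iter(profiles))
    if not qemu_version:
        return fallback
    major, minor = qemu_version
    # Trailing dot: '1.0.' must not match '10.0.0'.
    prefix = f'{major}.{minor}.'
    for key, profile in version_to_profile.items():
        if key.startswith(prefix):
            return profile
    return fallback


# Hypothetical baseline metadata, shaped like the helper's input.
PROFILES = {'profile-6-0-0': {}, 'profile-10-0-0': {}}
V2P = {
    '6.0.0': 'profile-6-0-0',
    '10.0.0': 'profile-10-0-0',
    '10.0.8': 'profile-10-0-0',
}

for version in [(10, 0), (6, 0), (11, 0), (1, 0), None]:
    print(version, '->', select_profile(PROFILES, V2P, version))
```

A matcher written this way can be unit-tested without a qemu-img binary, which is one way to cover several distinct version strings cheaply.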
tests/test_map.py outline
Five test classes. TestMapSmoke inherits from InstarTestBase;
the other four inherit from TestMapSmoke to share the
run_instar_map helper and the _testdata_root /
_qemu_version class attributes:

- TestMapSmoke(InstarTestBase): shared helper + wiring checks.
  - run_instar_map(*args, timeout=60): analogous to run_instar_measure.
  - test_help_succeeds: instar map --help returns 0 and contains the documented flags.
  - test_baselines_present: get_output_profiles(...) returns non-empty profiles and version_to_profile for both map-human and map-json.
  - test_smoke_qcow2_runs_and_returns_zero: pick a small safe-tier qcow2 (e.g. qcow2-min-cluster or cirros-qcow2), run instar map FILENAME, expect rc==0 and stdout contains the header row.
- TestMapBaselineSource(TestMapSmoke): per-image factory generating one test per (image, output_type). Uses _make_map_source_test, analogous to measure's _make_source_test pattern.
  - Skip when no baseline file exists.
  - Skip when baseline meta.json shows non-zero return_code (qemu-img reported an error for that cell: chain image without -F hint, etc.).
  - Skip when KNOWN_MAP_DIVERGENCES lists the image.
  - Otherwise run instar map IMAGE --output=TYPE, fetch the expected output via get_expected_output(..., profile=get_profile_for_installed_qemu(...)), and assert byte equality.
  - Generates ~78 tests (~39 images × 2 output types).
- TestMapWindowFilter(TestMapSmoke): in-test fixtures exercising --start-offset / --max-length.
  - setUp constructs a small qcow2 fixture in a tempfile.TemporaryDirectory() via qemu-img create -f qcow2 -o cluster_size=65536 fixture.qcow2 1M, then writes allocated clusters at known offsets via a raw image + qemu-img convert -f raw -O qcow2 intermediate (the same pattern phase 4a used).
  - test_default_window_emits_all_extents: no flags, same output as no-window.
  - test_start_offset_clips_leading_extents: emit starting at a known cluster boundary, assert subsequent extents only.
  - test_max_length_clips_trailing_extents: emit only the first N bytes, assert no extents past N.
  - test_start_offset_plus_max_length_window: combo.
  - test_start_offset_past_eof_errors: host-side pre-check produces a clear error.
  - test_max_length_past_eof_clips_silently: non-error; output ends at virtual_size.
  - These do not assert byte-equality against qemu-img (no baseline for window cases); they assert structural properties (extent count, byte ranges reachable). The phase 4a MapRenderer unit tests already pin the byte-level output shape.
- TestMapErrorPaths(TestMapSmoke): host-side guards from phase 3b.
  - test_image_opts_rejected: instar map --image-opts FILE returns non-zero with a stderr message mentioning --image-opts.
  - test_missing_source_file_errors: non-existent FILENAME returns non-zero with a stderr message.
  - test_invalid_sector_size_errors: non-power-of-2 --sector-size returns non-zero.
  - test_chain_qcow2_rejected_with_has_backing: pick a chain image from the safe-tier manifest (e.g. qcow2-overlay-chain), run instar map, expect non-zero exit with HAS_BACKING-style stderr message.
  - test_vmdk_monolithicflat_rejected: pick a vmdk descriptor image (if present in the safe tier), expect the peek_is_vmdk_descriptor host-side refusal message.
- TestMapDivergenceRegression(TestMapSmoke): for every entry in KNOWN_MAP_DIVERGENCES, assert the divergence still happens. If a future change accidentally fixes a divergence, this surfaces it as a failure so the entry can be cleanly removed from KNOWN_MAP_DIVERGENCES.
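The skip ordering in the per-image factory can be sketched with stand-in data. Everything below other than the unittest calls is hypothetical: IMAGES, BASELINES, KNOWN_DIVERGENCES, and fake_instar_map replace the real manifest, baseline files, and instar invocation, and the output-type pattern in each divergence entry is ignored for brevity.

```python
import unittest

# Hypothetical stand-ins for the safe-tier manifest and baseline files.
IMAGES = ['img-ok', 'img-no-baseline', 'img-baseline-error', 'img-divergent']
BASELINES = {
    ('img-ok', 'human'): {'return_code': 0, 'stdout': b'Offset\n'},
    ('img-baseline-error', 'human'): {'return_code': 1, 'stdout': b''},
    ('img-divergent', 'human'): {'return_code': 0, 'stdout': b'Offset\n'},
}
KNOWN_DIVERGENCES = {'img-divergent': ('*', 'example reason')}


def fake_instar_map(image_id, output_type):
    """Placeholder for running `instar map IMAGE --output=TYPE`."""
    return b'Offset\n'


def _make_map_source_test(image_id, output_type):
    """Build one test method for a single (image, output_type) cell."""
    def test(self):
        baseline = BASELINES.get((image_id, output_type))
        if baseline is None:
            self.skipTest(f'no baseline for {image_id}/{output_type}')
        if baseline['return_code'] != 0:
            self.skipTest('qemu-img baseline recorded a non-zero exit')
        if image_id in KNOWN_DIVERGENCES:
            self.skipTest(KNOWN_DIVERGENCES[image_id][1])
        actual = fake_instar_map(image_id, output_type)
        self.assertEqual(actual, baseline['stdout'])
    return test


class TestMapBaselineSketch(unittest.TestCase):
    """Empty shell; test methods are attached below."""


for _image in IMAGES:
    _name = f"test_map_human_{_image.replace('-', '_')}"
    setattr(TestMapBaselineSketch, _name,
            _make_map_source_test(_image, 'human'))
```

Binding image_id and output_type as factory arguments, rather than reading loop variables inside the test body, avoids the late-binding closure problem where every generated test would see the last image.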
KNOWN_MAP_DIVERGENCES
Module-scope dict mapping image_id -> (output_type_pattern,
reason). Each entry documents a known instar-vs-qemu-img
divergence that the TestMapBaselineSource factory skips
rather than fails. Phase 6 entries cover:

    KNOWN_MAP_DIVERGENCES = {
        # Chain sources: instar refuses with HAS_BACKING; qemu-img
        # walks the chain and emits depth-tagged extents.
        'qcow2-overlay-chain': ('*', 'chain composition deferred; see PLAN-map.md'),
        'chain-middle-qcow2': ('*', 'chain composition deferred; see PLAN-map.md'),
        'chain-top-qcow2': ('*', 'chain composition deferred; see PLAN-map.md'),
        'sf-vda': ('*', 'chain composition deferred; see PLAN-map.md'),
        # debian-12-sfagent uses sf-vda-backing as its backing image; instar
        # refuses both. (Add additional chain images as phase 6 surfaces them.)

        # Compressed-cluster reporting: instar emits compressed: false
        # unconditionally; qemu-img emits compressed: true for compressed
        # cluster extents. Affects map-json output for qcow2 sources with
        # compressed clusters.
        'qcow2-zstd': ('json', 'compressed-cluster reporting deferred; see docs/quirks.md'),

        # Raw sparse: instar reports one fully-allocated extent; qemu-img
        # walks SEEK_HOLE. Phase 4c quirks doc.
        'raw-sparse-empty': ('*', 'raw SEEK_HOLE detection not implemented'),

        # VHDX partial-present: instar treats every partially-present block
        # as fully data; qemu-img walks the per-sector bitmap.
        # (Specific image IDs filled in during 6c when the baselines are
        # consulted to find which images trigger the divergence.)

        # VMDK multi-extent: refused host-side by peek_is_vmdk_descriptor.
        # (Image IDs added during 6c.)
    }

The list is intentionally conservative in this draft: actual
entries are added during step 6b/6c as make
test-integration surfaces specific failing cells. The
phase 4c quirks doc enumerates the categories; phase 6
maps each category to specific image IDs.
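One way to consume the (output_type_pattern, reason) tuples is a small lookup that returns the skip reason for a cell. This is a sketch under an assumption the plan does not state: that the pattern is a glob matched against the output type name ('human' or 'json'). The divergence_reason helper is hypothetical; the two entries are copied from the draft dict.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Two entries copied from the draft dict; the helper below is hypothetical.
KNOWN_MAP_DIVERGENCES = {
    'qcow2-overlay-chain': ('*', 'chain composition deferred; see PLAN-map.md'),
    'qcow2-zstd': ('json', 'compressed-cluster reporting deferred; see docs/quirks.md'),
}


def divergence_reason(image_id: str, output_type: str) -> Optional[str]:
    """Return the documented reason if this cell is a known divergence."""
    entry = KNOWN_MAP_DIVERGENCES.get(image_id)
    if entry is None:
        return None
    pattern, reason = entry
    # Assumption: the pattern is a glob over the output type name.
    return reason if fnmatchcase(output_type, pattern) else None


print(divergence_reason('qcow2-zstd', 'json'))
print(divergence_reason('qcow2-zstd', 'human'))
```

With this shape, qcow2-zstd is skipped only for the json cell and still byte-compared for the human cell, which matches the comment that the compressed-cluster divergence affects map-json output.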
Window-filter fixture construction
Phase 4a established a clean pattern: truncate a raw
image to the desired virtual size, python3 -c "..."
writes bytes at known offsets, qemu-img convert -f raw
-O qcow2 produces a fragmented qcow2. Phase 6's
TestMapWindowFilter.setUp follows the same recipe to
keep fixture construction obvious and self-contained:

    # Module-level imports needed: os, shutil, subprocess, tempfile
    def setUp(self):
        super().setUp()
        self.tmpdir = tempfile.mkdtemp(prefix='instar-map-test-')
        self.addCleanup(shutil.rmtree, self.tmpdir, ignore_errors=True)
        raw_path = os.path.join(self.tmpdir, 'fixture.raw')
        qcow_path = os.path.join(self.tmpdir, 'fixture.qcow2')
        # 1 MiB raw with two 64 KiB allocated runs
        subprocess.run(['truncate', '-s', '1M', raw_path], check=True)
        with open(raw_path, 'r+b') as f:
            f.seek(0)
            f.write(b'\xab' * 0x10000)
            f.seek(0x80000)
            f.write(b'\xcd' * 0x10000)
        subprocess.run(
            ['qemu-img', 'convert', '-f', 'raw', '-O', 'qcow2',
             raw_path, qcow_path],
            check=True,
        )
        self.fixture = qcow_path

Each window-filter test then runs instar map
self.fixture with various window args and asserts
structural invariants.
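A structural check of this kind might look like the sketch below. It assumes the map-json output is a JSON array of objects carrying start and length keys, as qemu-img map --output=json emits; the extents_outside_window helper and the SAMPLE data are hypothetical and are not taken from a real instar run.

```python
import json


def extents_outside_window(map_json: str, start: int, length: int):
    """Return extents that fall, even partly, outside [start, start + length)."""
    end = start + length
    return [
        extent for extent in json.loads(map_json)
        if extent['start'] < start or extent['start'] + extent['length'] > end
    ]


# Hypothetical output for a 1 MiB image mapped from offset 0x80000 onward.
SAMPLE = json.dumps([
    {'start': 0x80000, 'length': 0x10000, 'data': True},
    {'start': 0x90000, 'length': 0x70000, 'data': False},
])

# Both extents sit inside the window [0x80000, 0x100000).
assert extents_outside_window(SAMPLE, 0x80000, 0x80000) == []
```

A test such as test_max_length_clips_trailing_extents could assert that this list is empty for the requested window, without pinning the exact bytes of the output.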
Test-suite size budget
- TestMapSmoke: ~4 tests
- TestMapBaselineSource: ~78 tests (39 images × 2 output types); ~20 skipped due to chain / divergence / non-zero baseline
- TestMapWindowFilter: ~6 tests
- TestMapErrorPaths: ~5 tests
- TestMapDivergenceRegression: ~5-8 tests (one per KNOWN_MAP_DIVERGENCES entry)
Total: ~100 tests, ~75-80 active + ~20-25 documented skips.
Open questions
- Profile selection fallback: when the installed qemu-img isn't in the baseline matrix at all (e.g. qemu-img 11.0.0 ships before we regenerate the baselines), should the test fail or skip? Recommendation: pick the newest profile and let the byte-equality assertion run; if it fails, the test surfaces a real format drift that the user should investigate.
- KNOWN_MAP_DIVERGENCES for VHDX partial-present: which specific VHDX images trigger this? Recommendation: determine empirically during 6c. Let test-integration run, observe the failures, and add the specific image IDs to the list with the documented reason.
- Window-filter byte-exact assertions: the master plan left window cases out of baseline generation. Should phase 6 byte-compare instar's window output against qemu-img run inside the test? Recommendation: no. Phase 4a's MapRenderer unit tests already pin the byte-level output shape; window tests only need to assert structural properties (extent count, byte ranges). Adding qemu-img-comparison would duplicate phase 4a's coverage.
- VMDK monolithicFlat fixture availability: the safe-tier manifest may not contain a multi-extent VMDK source. Recommendation: skip the test_vmdk_monolithicflat_rejected test if no such fixture exists (with a clear reason); the host-side guard is exercised by the unit-level MapArgs tests anyway.
- Sf-vda as a divergence: sf-vda is in KNOWN_SOURCE_SCANNER_DIVERGENCES for measure (qcow2 scanner difference). For map, sf-vda likely has a chain (it's an overlay) and so is a chain divergence here. Recommendation: add to map's list with the chain reason; the measure entry stays unchanged.
- Compressed-cluster divergence (which images?): the safe tier includes qcow2-zstd (a deliberately compressed qcow2 fixture). Other compressed-cluster images may also need entries. Recommendation: start with qcow2-zstd, add more during 6c if baselines reveal them.
- Test runtime budget: ~100 tests × ~1s each = ~2 minutes for the full suite. Acceptable; comparable to test_measure.py's ~3-minute runtime.
- --start-offset semantics divergence: phase 4c quirks doc noted that instar's window filter is byte-level while qemu-img clamps to cluster boundaries on output. Phase 6's window tests should assert instar's byte-level behaviour rather than qemu-img parity; the divergence is documented and intentional for v1.
Execution
| Step | Effort | Model | Isolation | Brief for sub-agent |
|---|---|---|---|---|
| 6a | medium | sonnet | none | Extend tests/base.py: add 'map': 'map' to COMMAND_OUTPUT_DIRS (line 27 area). Add get_profile_for_installed_qemu(self, output_type, command) method per the schema in the Architecture section — pick the profile whose version_to_profile entry matches the host's qemu-img version by major-minor prefix, falling back to the first profile when no match. Create tests/test_map.py with TestMapSmoke(InstarTestBase) containing: run_instar_map(*args, timeout=60) helper, test_help_succeeds, test_baselines_present (asserts non-empty profiles for both human/json), test_smoke_qcow2_runs_and_returns_zero (pick cirros-qcow2 or qcow2-min-cluster from the safe tier; assert rc==0 and stdout starts with the Offset header row). Run make test-integration TEST=test_map and confirm the smoke tests pass. |
| 6b | high | sonnet | none | Add TestMapBaselineSource(TestMapSmoke) to tests/test_map.py. Define KNOWN_MAP_DIVERGENCES per the Architecture section starting with the chain images (qcow2-overlay-chain, chain-middle-qcow2, chain-top-qcow2, sf-vda), qcow2-zstd (compressed), and raw-sparse-empty (SEEK_HOLE). Implement _make_map_source_test(image_dict, output_type) factory analogous to measure's _make_source_test (test_measure.py line 693): skip cases without baseline meta.json, skip non-zero-exit baselines, skip KNOWN_MAP_DIVERGENCES entries, run instar map IMAGE --output=TYPE, fetch the expected output via get_expected_output(image_id, profile, output_type, command='map') where profile = self.get_profile_for_installed_qemu(output_type, 'map'), and assert byte equality after substitute_testdata_root. Loop over _safe_tier_images() × {human, json} to setattr ~78 test methods. Run make test-integration TEST=test_map and report pass/skip/fail counts. Iterate the KNOWN_MAP_DIVERGENCES list based on actual failures — add specific image IDs that trigger the documented categories (VHDX partial-present, VMDK multi-extent, etc.). High effort because: the per-image factory generates many tests, the version-keyed profile selection has edge cases, and the KNOWN_MAP_DIVERGENCES list needs empirical iteration to capture every image-specific divergence cleanly. |
| 6c | medium | sonnet | none | Add TestMapWindowFilter(TestMapSmoke) to tests/test_map.py. setUp constructs a small fragmented qcow2 fixture per the "Window-filter fixture construction" section above (truncate raw → write bytes at 0 and 0x80000 → qemu-img convert). Tests: test_default_window_emits_all_extents (no flags), test_start_offset_clips_leading_extents (--start-offset=0x80000), test_max_length_clips_trailing_extents (--max-length=0x10000), test_start_offset_plus_max_length_window (combination), test_start_offset_past_eof_errors (host-side rejection), test_max_length_past_eof_clips_silently (output ends at virtual_size). Tests assert structural properties — extent count, byte ranges, presence/absence of specific offsets — not byte-equality against qemu-img. Run make test-integration TEST=test_map.TestMapWindowFilter. |
| 6d | medium | sonnet | none | Add TestMapErrorPaths(TestMapSmoke) and TestMapDivergenceRegression(TestMapSmoke) to tests/test_map.py. Error-path tests: test_image_opts_rejected (stderr contains --image-opts), test_missing_source_file_errors, test_invalid_sector_size_errors, test_chain_qcow2_rejected_with_has_backing (runs instar map qcow2-overlay-chain.qcow2, expects non-zero exit + stderr mentioning backing/chain). Divergence regression: for each entry in KNOWN_MAP_DIVERGENCES, assert the divergence is still observable (i.e. when the test runs instar map <image>, instar produces output that differs from the baseline in a way matching the documented reason). Use assertNotEqual(stdout, expected) so an accidental fix is surfaced loudly. Run make test-integration TEST=test_map. |
| 6e | low | sonnet | none | Update ARCHITECTURE.md operations/map/ entry: append "Integration tests in tests/test_map.py cross-validate instar map against the qemu-img map baselines in instar-testdata/expected-outputs/map-* for every safe-tier image, plus in-test fixtures for window-filter behaviour, error paths, and divergence-regression assertions for the known instar-vs-qemu-img gaps." Update CHANGELOG.md Unreleased / Added with one line citing the new integration tests. Run pre-commit run --all-files. |
Total: 5 commits.
Why no opus step
Phase 6 is plumbing — extending an established test pattern (test_measure.py) to a new command. No new algorithmic work; no subtle correctness arguments. Sonnet with a detailed brief for the per-image factory in 6b is the right tool. The high-effort flag on 6b is for iteration volume (empirically discovering the right KNOWN_MAP_DIVERGENCES entries) rather than reasoning depth.
Out of scope for phase 6
- Coverage-guided fuzz harness (phase 7).
- Differential fuzz against qemu-img map (phase 8).
- Window-case byte-exact comparison against qemu-img (deferred; phase 4a unit tests cover output shape).
- Backing-chain composition support (future work).
- Compressed-cluster reporting fix (future work).
- New testdata fixtures specifically for map (the safe-tier manifest already covers the formats needed).
- Output-profile machinery additions in instar's VMM (phase 5 produced 1 + 3 profiles cleanly; no vmm-side handling needed).
Success criteria
- tests/test_map.py exists with the five test classes enumerated above.
- tests/base.py has 'map' in COMMAND_OUTPUT_DIRS and a get_profile_for_installed_qemu helper.
- make test-integration TEST=test_map runs to completion with a documented mix of pass / skip / no-fail outcomes (typical: ~75 pass, ~25 skip, 0 fail).
- KNOWN_MAP_DIVERGENCES covers every cell that would otherwise produce an assertEqual mismatch; each entry has a clear reason citing PLAN-map.md or docs/quirks.md.
- TestMapDivergenceRegression catches accidental divergence fixes (assertNotEqual against the baseline).
- TestMapWindowFilter verifies the window-filter behaviour without requiring qemu-img comparison.
- make lint, make test-rust, and pre-commit run --all-files remain clean (phase 6 is Python-only and doesn't touch Rust code).
- ARCHITECTURE.md and CHANGELOG.md reflect phase 6.
Risks and mitigations
- Per-image factory generates noise on failure: with ~78 generated tests, a single regression looks like 78 failures unless the factory short-circuits. Mitigation: the factory's skip logic catches missing/non-zero baselines and known divergences. Real failures are rare and indicate real format drift.
- Version-keyed profile selection picks the wrong profile: with 3 map-json profiles and qemu-img installed at an in-between version (8.2.0 falls in profile-10-0-0; 8.1.5 falls in profile-6-1-0), the profile lookup must be careful. Mitigation: 6a's get_profile_for_installed_qemu uses major-minor prefix matching; explicit unit tests verify the matcher for at least 3 distinct version strings.
- Window-filter test fixtures depend on qemu-img: setUp calls qemu-img convert, which fails if qemu-img isn't on PATH. Mitigation: skip the class if qemu-img is unavailable (the rest of the test suite already depends on qemu-img for baseline generation; same constraint applies).
- KNOWN_MAP_DIVERGENCES is incomplete on first run: step 6b's brief explicitly calls out the iterative process: run the full suite, observe specific failures, add to the list with documented reasons, repeat until pass/skip is clean. The list is a living document.
- sf-vda etc. may have ambiguous reasons: an image may be both a chain source AND have compressed clusters. Mitigation: pick the more-fundamental reason (chain wins over compressed), document both in the comment.
- CI runs on a host with a qemu-img not in the baseline matrix: the profile fallback picks the first profile, byte-equality fails, test reports a real failure. Mitigation: instar-testdata's matrix is regenerated periodically; if a new qemu-img ships before the next regen, the test fails loudly, which is the right signal.
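The qemu-img availability mitigation can be expressed with a class-level skip. This is a minimal sketch: the class name and the single placeholder test are hypothetical, and only shutil.which and unittest.skipUnless are real library calls.

```python
import shutil
import unittest

# Resolved once at import time; None when qemu-img is not on PATH.
QEMU_IMG = shutil.which('qemu-img')


@unittest.skipUnless(QEMU_IMG, 'qemu-img not found on PATH')
class TestMapWindowFilterSketch(unittest.TestCase):
    """Hypothetical stand-in for the real window-filter class."""

    def test_fixture_tool_available(self):
        # Only runs when the decorator did not skip the class.
        self.assertTrue(QEMU_IMG)
```

Skipping at class level keeps setUp from raising inside every test, so a host without qemu-img reports documented skips rather than a wall of errors.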
Back brief
Before executing any step, the executing agent should
back-brief: which file is being edited (tests/base.py
for 6a, tests/test_map.py for 6a-d, ARCHITECTURE.md +
CHANGELOG.md for 6e), which existing test class is the
closest template (test_measure.py's
TestMeasureBaselineSource for the per-image factory;
in-test fixture construction follows phase 4a's pattern),
and the iteration loop expected during 6b (run, observe
failures, extend KNOWN_MAP_DIVERGENCES, repeat). The
reviewer should verify that the per-image factory
correctly handles all four skip categories (no baseline,
non-zero baseline, KNOWN_MAP_DIVERGENCES,
profile-not-found), that TestMapWindowFilter's fixture setUp is
self-cleaning, and that TestMapDivergenceRegression
genuinely asserts continued divergence (assertNotEqual,
not just skip).