PLAN-resize phase 11: integration tests
Prompt
Before responding to questions or discussion points in this
document, explore the instar codebase thoroughly. Read the
existing test surface (tests/base.py, tests/test_create.py,
tests/helpers/info_json.py, tests/test_convert.py,
tests/manifest.json), understand how
InstarTestBase exposes binaries / qemu-version detection /
testdata roots, how assert_info_equivalent normalises JSON
across writers, and how test_create.py walks the cross-
version baseline matrix. Phase 11 is structurally a sibling
of test_create.py's phase-8 matrix; differences come from
resize's specific surface (the [+-]SIZE end_spec, shrink
semantics, the qemu-doesn't-support-vmdk/vhd/vhdx asymmetry).
Where a question touches on external concepts (qemu-img resize semantics, posix preallocation, stestr discovery), research as needed to give a confident answer. Flag any uncertainty explicitly rather than guessing.
This is a phase plan under PLAN-resize.md. Refer to that
master plan for overall context. Phases 1–10 shipped the
planner, the guest binary, the host CLI + preallocation
post-pass, and the cross-version baseline matrix (3,280
baselines across 80 qemu-img versions in instar-testdata).
Phase 11 is the runtime harness that diffs instar resize
against those baselines and the live system qemu-img.
Mission
Ship tests/test_resize.py covering six surfaces:
1. Schema-drift tripwire. A single test that walks expected-outputs/resize-info-json/<target>/<version>/ and asserts the on-disk case set matches the RESIZE_CASES mirror declared at the top of test_resize.py. Catches drift between the test mirror and the testdata baseline generator (the test_create_cases_match_baselines precedent at tests/test_create.py:906).
2. Cross-version baseline matrix: one test per (target, case) tuple generated from RESIZE_CASES. For each case:
   - instar create the start image (matches the baseline's qemu-img create step byte-for-byte modulo the create writer-divergence whitelist already established in phase 8).
   - instar resize -f <fmt> [--shrink] [--preallocation <mode>] <path> <end_spec> exactly as the baseline's resize step did (the applied_shrink_flag and preallocation fields in the matching meta.json dictate which flags to pass).
   - Run system qemu-img info --output=json on the post-resize file.
   - assert_info_equivalent against the baseline's <case>.stdout.txt.
   For vmdk / vhd / vhdx baselines where the matching meta records resize_return_code != 0 (qemu rejects), this matrix skips with skipTest('qemu-img cannot resize <fmt>; phase 11 falls back to consistency test') and defers coverage to surface 5 below.
3. Live cross-validation: for a curated subset of RESIZE_CASES (qcow2 + raw only, the formats qemu supports for resize), independently run qemu-img create → qemu-img resize and instar create → instar resize, then instar info --output=json on both, and assert the normalised JSONs match. Mirrors TestCreateCrossValidation in tests/test_create.py:1043.
4. Round-trip check: for every (target, case) where instar check is meaningful (excludes raw): instar create → instar resize → instar check, assert check reports clean. Catches writer-after-resize regressions that produce a file qemu-img info accepts but the instar reader flags. Mirrors TestCreateRoundTripCheck in tests/test_create.py:1186.
5. Internal-consistency suite (vmdk / vhd / vhdx where qemu-img can't resize): per case, instar create → instar resize → instar info --output=json, assert key fields match the expected post-resize state, at minimum virtual-size == expected_final_size from the meta. For vhd / vhdx this is also followed by instar check (vmdk check is a no-op today, same as create's matrix). These are the only resize tests for vmdk / vhd / vhdx; the tripwire skip in surface 2 references this surface so reviewers know coverage isn't lost.
6. Targeted error-path tests: small fixed set, not matrix-generated, exercising the host CLI rejections that never reach a baseline:
   - Subtractive end_spec without --shrink (mirrors baseline 64M-to-1M-no-shrink but covers raw and qcow2 both; surface 2 covers it for qcow2 + raw via baseline, surface 6 makes the error-message contract explicit).
   - Invalid [+-]SIZE strings (+0, abc, 0, empty).
   - --preallocation=metadata on raw → rejected by host (covered by phase 9 unit test but worth a smoke).
   - --preallocation=falloc + shrink → rejected by host (phase 9 again; smoke).
   - --object / --image-opts rejected with the "not supported" message (already-shipped phase 8 surface, but the integration test pins it).
What the survey turned up
- tests/base.py's InstarTestBase at lines 36–428 is the parent class for every integration test. It provides get_instar_binary() (falls back to src/target/release/instar and calls skipTest if missing), the _qemu_version tuple (best-effort qemu-img --version parse for matching baselines), and _testdata_root (env var INSTAR_TESTDATA_PATH or sibling-dir fallback). No run_instar_resize helper exists yet; phase 11 adds it to the test class (or to base.py if more than one test module needs it; for v1 a private helper on TestResizeSmoke is enough).
- tests/test_create.py's phase-8 structure at lines 817–1245 is the closest precedent: a top-level mirror dict (CREATE_CASES), a _make_baseline_test() factory that builds one method per case, a setattr loop in module scope that attaches the methods to the TestCreateBaselineMatrix class, a sibling TestCreateCrossValidation with _make_xval_test(), and a TestCreateRoundTripCheck with _make_check_test(). Phase 11 reuses the same shape.
- tests/helpers/info_json.py's assert_info_equivalent at lines 126–166 is the comparison primitive. It already handles UNIVERSAL_DIVERGENCE (actual-size, dirty-flag), CACHE_HINT_FIELDS, per-target divergence (vhdx log-size, vmdk cid / parent-cid), and $FILENAME substitution. Resize doesn't introduce new divergence fields for the post-resize info JSON specifically; the same set applies. Confirmed by spot-checking three baselines from phase 10's output (qcow2 64M-minus-32M-shrink, raw 1M-to-4M-prealloc-full, vhdx 1M-to-64M-default).
- KNOWN_WRITER_DIVERGENCES at tests/test_create.py:788–814 is the create-time whitelist that phase 11 must respect: if instar's create-time output diverges from qemu's for a given case, the post-resize info will also diverge (the divergence is sticky), and the resize test must skip the same cases. The dict is currently 16 entries covering refcount_bits=*, compat=0.10, zstd, vhdx default block_size, and vhd CHS rounding. Phase 11 imports it from test_create and applies the same skip logic.
- The phase-10 baselines at instar-testdata/expected-outputs/resize-info-json/: 41 cases per version × 80 versions = 3,280 baselines. Each case has <case_name>.{stdout.txt,stderr.txt,meta.json}. The meta records create_return_code, resize_return_code, info_return_code, applied_shrink_flag, expected_final_size, preallocation, and options_list: every field phase 11 needs for filtering and command-line reconstruction.
- The cross-tool capability gap: phase 10 confirmed qemu-img does not support resize on vmdk / vhd / vhdx on any shipped version. Of the 41 cases per version, 14 (vmdk × 3 + vhd × 6 + vhdx × 5) record qemu's "Image format driver does not support resize" rejection. Phase 11 must:
  - In surface 2 (baseline matrix), skip these cases with a message pointing at surface 5.
  - In surface 3 (live cross-validation), skip these targets entirely.
  - In surface 5 (consistency suite), exercise them via instar create → instar resize → instar info → instar check.
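A minimal sketch of the two InstarTestBase behaviours described in the first bullet: the best-effort version parse and the testdata-root fallback. The function names, the regex, and the sibling directory name instar-testdata are assumptions for illustration; the real logic lives in tests/base.py and may differ.

```python
import os
import re
from pathlib import Path


def parse_qemu_version(version_output):
    """Best-effort parse of `qemu-img --version` output into an int tuple.

    Returns None when no version number can be found, so callers can skip
    rather than guess which baseline directory applies.
    """
    match = re.search(r'version\s+(\d+)\.(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    # A missing patch component (e.g. "2.5") is treated as 0.
    return tuple(int(part) if part is not None else 0
                 for part in match.groups())


def resolve_testdata_root(repo_root, environ=os.environ):
    """Prefer INSTAR_TESTDATA_PATH, else fall back to a sibling checkout.

    The sibling directory name is an assumption made for this sketch.
    """
    override = environ.get('INSTAR_TESTDATA_PATH')
    if override:
        return Path(override)
    return Path(repo_root).resolve().parent / 'instar-testdata'
```

Passing the environment as a parameter keeps the fallback testable without mutating os.environ.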
Algorithmic design
File layout
tests/test_resize.py, modelled on test_create.py.
Single file because the surfaces share most of their
plumbing (the RESIZE_CASES mirror, the
_apply_resize_args_for_case helper, the qemu-version dir
resolution). Total estimated size ~900 lines including
the mirror and the per-case factories — comparable to
test_create.py's 1,245.
Class hierarchy:
class TestResizeSmoke(InstarTestBase):
    """Smoke tests: small fixed set covering the happy paths
    per format + a handful of error paths."""
class TestResizeBaselineMatrix(TestResizeSmoke):
    """Schema-drift tripwire + per-(target, case) baseline diff.
    Generated via _make_resize_baseline_test() / setattr loop."""
class TestResizeCrossValidation(TestResizeSmoke):
    """Curated subset, instar vs system qemu-img for qcow2 + raw."""
class TestResizeRoundTripCheck(TestResizeSmoke):
    """instar create -> resize -> check across all cases
    where check is meaningful."""
class TestResizeConsistency(TestResizeSmoke):
    """vmdk / vhd / vhdx: instar-only round-trip + virtual-size
    assertion + (where applicable) instar check."""
The RESIZE_CASES mirror
Phase 11's mirror lives at the top of test_resize.py,
shape:
# (case_name, start_size, end_spec, create_opts, prealloc)
RESIZE_CASES = {
    'qcow2': [
        ('1M-to-4M-default', '1M', '4M', [], None),
        # ... 18 more entries matching RESIZE_CASES in
        # instar-testdata/scripts/generate-baselines.py.
    ],
    # vmdk / vhd / vhdx / raw lists identical to the generator.
}
The schema-drift tripwire (surface 1) asserts the on-disk
case-name set in
expected-outputs/resize-info-json/<target>/<version>/
matches the in-test mirror, so any future generator
change without a test update fails CI loudly. Exactly
mirrors test_create.py:906's
test_create_cases_match_baselines.
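The tripwire comparison can be sketched as a plain set difference between the mirror's case names and the meta.json files on disk. This is an illustration only: the helper names are hypothetical, and the assumption that one <case>.meta.json exists per case comes from the baseline layout described in the survey.

```python
import tempfile
from pathlib import Path


def on_disk_case_names(version_dir):
    """Collect case names from the <case>.meta.json files in one baseline dir."""
    suffix = '.meta.json'
    return {p.name[:-len(suffix)] for p in Path(version_dir).glob('*' + suffix)}


def case_set_drift(mirror_cases, version_dir):
    """Return (missing_on_disk, missing_in_mirror); both empty means no drift."""
    mirrored = {case[0] for case in mirror_cases}
    on_disk = on_disk_case_names(version_dir)
    return sorted(mirrored - on_disk), sorted(on_disk - mirrored)


# Demonstration against a throwaway directory (case names are illustrative).
with tempfile.TemporaryDirectory() as td:
    for name in ('1M-to-4M-default', '64M-minus-32M-shrink'):
        (Path(td) / f'{name}.meta.json').write_text('{}')
    mirror = [('1M-to-4M-default', '1M', '4M', [], None)]
    drift = case_set_drift(mirror, td)
```

Reporting both directions of the difference makes the failure message say whether the mirror or the generator moved.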
Per-case command-line reconstruction
Each baseline's meta.json records the exact flags that
were passed. Phase 11 builds the instar command line from
the case tuple (not the meta) so the test fails if the
mirror drifts away from what the generator produces, then
sanity-checks the case against the meta to confirm:
- meta['applied_shrink_flag'] matches our locally inferred flag: (subtractive end_spec OR resolved-end-bytes < start-bytes) AND not case_name.endswith('-no-shrink').
- meta['preallocation'] matches the case's prealloc field.
- meta['expected_final_size'] matches our resolve_resize_end_bytes(start, end_spec).
If any mismatch: fail the test with a clear "mirror/baseline divergence: regenerate" message. The infer-locally-and-compare pattern catches a generator bug that silently records the wrong flag set.
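The local inference above can be sketched as follows. The bodies are simplified stand-ins, not the generator's code: this parse_qemu_size accepts only whole numbers with an optional k/M/G/T binary suffix, which is an assumption that covers the case names shown in this plan but not everything qemu-img accepts.

```python
import re

_UNITS = {'': 1, 'k': 1024, 'M': 1024**2, 'G': 1024**3, 'T': 1024**4}


def parse_qemu_size(text):
    """Parse a size such as '64M' into bytes (simplified; binary multiples)."""
    match = re.fullmatch(r'(\d+)([kMGT]?)', text)
    if match is None:
        raise ValueError(f'invalid size: {text!r}')
    return int(match.group(1)) * _UNITS[match.group(2)]


def resolve_resize_end_bytes(start, end_spec):
    """Resolve an absolute or [+-]relative end_spec against the start size."""
    start_bytes = parse_qemu_size(start)
    if end_spec.startswith('+'):
        return start_bytes + parse_qemu_size(end_spec[1:])
    if end_spec.startswith('-'):
        return start_bytes - parse_qemu_size(end_spec[1:])
    return parse_qemu_size(end_spec)


def infer_shrink_flag(case_name, start, end_spec):
    """(subtractive OR end < start) AND the case is not a '-no-shrink' one."""
    shrinks = (end_spec.startswith('-')
               or resolve_resize_end_bytes(start, end_spec) < parse_qemu_size(start))
    return shrinks and not case_name.endswith('-no-shrink')
```

Comparing infer_shrink_flag(...) with meta['applied_shrink_flag'] and resolve_resize_end_bytes(...) with meta['expected_final_size'] gives two of the three sanity checks.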
A small _apply_resize_args_for_case(target, case, meta)
helper builds the resize command's argv. It returns the
list [*f_flag, *shrink_flag, *prealloc_flag, path,
end_spec] ready to pass to run_instar_resize.
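One possible shape for that helper, written as a free function so it runs standalone. The name build_resize_argv is hypothetical, passing the target name straight to -f is an assumption (create goes through _instar_target_name), and the shrink flag is read from the meta on the understanding that the mirror-vs-meta check has already confirmed it.

```python
def build_resize_argv(target, case, meta, path='$PATH'):
    """Build [*f_flag, *shrink_flag, *prealloc_flag, path, end_spec].

    `path` defaults to the '$PATH' placeholder that the caller substitutes
    once the temporary image path is known.
    """
    _case_name, _start_size, end_spec, _create_opts, prealloc = case
    argv = ['-f', target]
    if meta.get('applied_shrink_flag'):
        argv.append('--shrink')
    if prealloc is not None:
        argv.extend(['--preallocation', prealloc])
    argv.extend([path, end_spec])
    return argv
```

Keeping the placeholder in the returned list lets the same argv be logged in failure messages before the temporary directory exists.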
The run_instar_resize helper
Added to TestResizeSmoke (parallels run_instar_create
in test_create.py:26):
# Requires `import subprocess` at module scope.
def run_instar_resize(self, *args, timeout=120):
    # Returns (stdout, stderr, returncode); a timeout maps to rc -1.
    instar = self.get_instar_binary()
    cmd = [str(instar), 'resize', *[str(a) for a in args]]
    try:
        r = subprocess.run(cmd, capture_output=True,
                           text=True, timeout=timeout)
        return r.stdout, r.stderr, r.returncode
    except subprocess.TimeoutExpired:
        return '', f'Timeout after {timeout}s', -1
Timeout is 120 (vs create's 60) because resize on a
non-raw format spins up the guest VMM, which costs ~5–8 s
per case. 41 cases × 7 s = ~5 min for the matrix; well
under stestr's per-test timeout.
Per-(target, case) factory
Mirrors _make_baseline_test at test_create.py:938:
def _make_resize_baseline_test(target, case):
    case_name, start_size, end_spec, create_opts, prealloc = case
    def test(self):
        # 1. Skip if qemu didn't produce a usable baseline.
        meta = self._baseline_meta(target, case_name)
        if meta is None:
            self.skipTest(...)
        if meta['create_return_code'] != 0:
            self.skipTest('baseline: qemu create rejected ' ...)
        if meta['resize_return_code'] != 0:
            # vmdk/vhd/vhdx land here universally; covered
            # by TestResizeConsistency instead.
            self.skipTest(
                f'qemu-img cannot resize {target} '
                f'(meta resize_rc={meta["resize_return_code"]}); '
                f'phase 11 covers via TestResizeConsistency'
            )
        if meta['info_return_code'] != 0:
            self.skipTest('baseline info rejected')
        # 2. Skip if the case is a known writer divergence.
        # (Create-time divergence transfers to post-resize info.)
        known = KNOWN_WRITER_DIVERGENCES.get((target, case_name))
        if known is not None:
            self.skipTest(f'create-time divergence carries over: {known}')
        # 3. Skip the shrink-rejection cases: those exercise
        # a host CLI error path, not a baseline-diff.
        if case_name.endswith('-no-shrink'):
            self.skipTest('rejection case; covered by smoke tests')
        # 4. Sanity-check: mirror vs meta.
        self._assert_case_matches_meta(target, case, meta)
        # 5. Build the start image with instar create.
        with tempfile.TemporaryDirectory() as td:
            path = Path(td) / f'image.{ext_for(target)}'
            # Pass the create opts as -o key=val, plus -f.
            c_args = ['-f', _instar_target_name(target)]
            for opt in create_opts:
                c_args.extend(['-o', opt])
            c_args.extend([str(path), start_size])
            _, c_stderr, c_rc = self.run_instar_create(*c_args)
            self.assertEqual(c_rc, 0, ...)
            # 6. Run the resize.
            r_args = self._apply_resize_args_for_case(target, case, meta)
            r_args = [a.replace('$PATH', str(path)) for a in r_args]
            _, r_stderr, r_rc = self.run_instar_resize(*r_args)
            self.assertEqual(r_rc, 0, ...)
            # 7. Run system qemu-img info on the post-resize file
            # and compare to the baseline.
            info_stdout, info_stderr, info_rc = self._run_qemu_img_info(path)
            if info_rc == -1 and 'not installed' in info_stderr:
                self.skipTest('system qemu-img not installed')
            self.assertEqual(info_rc, 0, ...)
            expected = self._baseline_stdout(target, case_name).read_text()
            assert_info_equivalent(self, info_stdout, expected,
                                   target, tmp_path=str(path),
                                   msg=f'{target}/{case_name}')
    test.__name__ = f'test_baseline_{target}_{case_name.replace("-", "_")}'
    test.__doc__ = (
        f'instar resize -f {target} {" ".join(create_opts)} '
        f'{start_size} -> {end_spec} '
        f'(prealloc={prealloc}) matches phase-10 baseline.'
    )
    return test
The mirror-vs-meta sanity check is the divergence-catcher: if the test mirror falls out of sync with the testdata generator the test fails with a clear message before reaching the more expensive qemu/instar invocations.
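The skip decisions in the factory can also be expressed as a pure function, which makes the ordering easy to unit-test without a binary. This is a sketch, not the planned implementation: the function name and the first message are hypothetical, and it checks the -no-shrink suffix before resize_return_code on the assumption that those baselines also record a non-zero resize code and would otherwise be reported as "cannot resize".

```python
def baseline_skip_reason(target, case_name, meta, known_divergences):
    """Return a skip message, or None if the case should run end-to-end."""
    if meta is None:
        return 'no baseline for this qemu-img version'  # illustrative wording
    if case_name.endswith('-no-shrink'):
        # Checked early so the rejection cases get their own message.
        return 'rejection case; covered by smoke tests'
    if meta['create_return_code'] != 0:
        return 'baseline: qemu create rejected'
    if meta['resize_return_code'] != 0:
        return (f'qemu-img cannot resize {target} '
                f'(meta resize_rc={meta["resize_return_code"]}); '
                f'phase 11 covers via TestResizeConsistency')
    if meta['info_return_code'] != 0:
        return 'baseline info rejected'
    known = known_divergences.get((target, case_name))
    if known is not None:
        return f'create-time divergence carries over: {known}'
    return None
```

Inside the generated test the result would feed self.skipTest(reason) when it is not None.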
Surface 3: cross-validation subset
RESIZE_CROSS_VALIDATION_CASES = [
    # qcow2 only (qemu supports), small + default options to
    # keep run time bounded.
    ('qcow2', ('1M-to-4M-default', '1M', '4M', [], None)),
    ('qcow2', ('1M-to-64M-default', '1M', '64M', [], None)),
    ('qcow2', ('1M-to-4M-prealloc-full', '1M', '4M', [], 'full')),
    ('qcow2', ('64M-minus-32M-shrink', '64M', '-32M', [], None)),
    # raw paths
    ('raw', ('1M-to-64M-default', '1M', '64M', [], None)),
    ('raw', ('64M-to-1M-shrink', '64M', '1M', [], None)),
    ('raw', ('1M-to-4M-prealloc-full', '1M', '4M', [], 'full')),
]
For each entry: independently run qemu and instar through the
full create + resize pipeline, then compare via
instar info on both outputs (not qemu info on both — using
instar info on both sides ensures we're measuring writer
agreement, not reader agreement; matches the create pattern at
test_create.py:1120).
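The writer-agreement comparison can be illustrated with a deliberately simplified normaliser. It is a stand-in for assert_info_equivalent, not a copy of it: the ignored-field set below takes actual-size and dirty-flag from the UNIVERSAL_DIVERGENCE description and adds filename as an assumption, and it ignores the per-target and cache-hint handling the real helper has.

```python
import json

# Fields allowed to differ between two independently written images.
# 'filename' is included because the two images live at different paths.
IGNORED_FIELDS = frozenset({'actual-size', 'dirty-flag', 'filename'})


def normalise_info(raw_json):
    """Parse info JSON and drop the fields that are allowed to differ."""
    info = json.loads(raw_json)
    return {key: value for key, value in info.items()
            if key not in IGNORED_FIELDS}


def writers_agree(instar_info_json, qemu_info_json):
    """True when the same reader reports equivalent images for both writers."""
    return normalise_info(instar_info_json) == normalise_info(qemu_info_json)
```

Because the same reader produces both JSON documents, any remaining difference points at the writers rather than at reader behaviour.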
Surface 4: round-trip check
For every (target, case) in RESIZE_CASES:
- Skip raw (instar check is no-op).
- Skip cases where instar check is known to fail on the
created image (KNOWN_CHECK_FAILURES from test_create.py:1176,
currently just qcow2/1G-rb-64; none of the resize
case set hits this today, but the dict is imported
defensively in case it grows).
- Skip the -no-shrink rejection cases.
- instar create → instar resize → instar check, expect
rc==0.
Catches a resize emitter regression that produces a file
qemu-img info accepts but instar check flags. Mirrors
test_create.py:1198.
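The selection rules above can be sketched as a filter over the mirror. The function name is hypothetical, and keying known_check_failures by (target, case_name) tuples is an assumption about the shape of KNOWN_CHECK_FAILURES.

```python
def round_trip_cases(resize_cases, known_check_failures):
    """Select the (target, case_name) pairs the round-trip surface would run."""
    selected = []
    for target, cases in resize_cases.items():
        if target == 'raw':
            continue  # instar check is a no-op on raw
        for case in cases:
            case_name = case[0]
            if (target, case_name) in known_check_failures:
                continue
            if case_name.endswith('-no-shrink'):
                continue  # rejection case; no image to check
            selected.append((target, case_name))
    return selected
```

The module-scope setattr loop would then attach one generated test per selected pair.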
Surface 5: consistency suite (vmdk / vhd / vhdx)
For every (target, case) where
meta['resize_return_code'] != 0 (i.e. qemu rejects but
instar supports) — vmdk × 3 + vhd × 6 + vhdx × 5 = 14 cases:
def test(self):
    with tempfile.TemporaryDirectory() as td:
        path = Path(td) / f'image.{ext_for(target)}'
        # Create the start image at the baseline's start_size.
        _, _, c_rc = self.run_instar_create(...)
        self.assertEqual(c_rc, 0, ...)
        # Resize.
        _, _, r_rc = self.run_instar_resize(...)
        self.assertEqual(r_rc, 0, ...)
        # Info: virtual-size matches expected_final_size.
        info_stdout, _, info_rc = self.run_instar_info(
            path, output_format='json')
        self.assertEqual(info_rc, 0, ...)
        info = json.loads(info_stdout)
        self.assertEqual(
            info['virtual-size'],
            meta['expected_final_size'],
            ...
        )
        # Check (vhd / vhdx only; vmdk check is a no-op).
        if target in ('vhd', 'vhdx'):
            _, k_stderr, k_rc = self.run_instar_check(path)
            self.assertEqual(k_rc, 0, ...)
The vmdk subset skips instar check since the integration
suite as a whole treats vmdk checks as unsupported today.
Some of these cases inherit known divergences from
KNOWN_WRITER_DIVERGENCES (vhdx default block_size, vhd
CHS rounding); for consistency purposes those don't
matter, because the post-resize virtual size is the only field we
assert. The create-time divergence is what trips the baseline
matrix; it does not trip the virtual-size assertion.
Surface 6: targeted error-path tests
Seven methods on TestResizeSmoke, no factory:
def test_shrink_without_flag_rejected_raw(self): ...
def test_shrink_without_flag_rejected_qcow2(self): ...
def test_invalid_size_string_rejected(self): ... # 4 subtests
def test_metadata_on_raw_rejected(self): ...
def test_preallocation_falloc_with_shrink_rejected(self): ...
def test_object_flag_rejected(self): ...
def test_image_opts_flag_rejected(self): ...
Each is a small standalone case targeting a specific error message. Pins the contract that phase 8 / 9 established and catches regressions in the human-facing strings.
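The four-subtest shape of test_invalid_size_string_rejected can be sketched with unittest's subTest. The runner below is a stand-in so the block executes without the instar binary; its rejection rule and its 'invalid size' message are assumptions, not instar's real parser or error text.

```python
import re
import unittest

_SIZE_RE = re.compile(r'[+-]?(\d+)[kMGT]?')


def fake_run_instar_resize(*args):
    """Stand-in runner returning (stdout, stderr, returncode).

    Rejects strings that are not [+-]<digits>[kMGT] or whose number is zero.
    """
    end_spec = args[-1]
    match = _SIZE_RE.fullmatch(end_spec)
    if match is None or int(match.group(1)) == 0:
        return '', f'invalid size: {end_spec!r}', 1
    return '', '', 0


class InvalidSizeSketch(unittest.TestCase):
    def test_invalid_size_string_rejected(self):
        # One subtest per bad string, so a single failure names its input.
        for bad in ('+0', 'abc', '0', ''):
            with self.subTest(end_spec=bad):
                _, stderr, rc = fake_run_instar_resize(
                    '-f', 'raw', 'image.raw', bad)
                self.assertNotEqual(rc, 0)
                self.assertIn('invalid size', stderr)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(InvalidSizeSketch)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In the real test the stand-in would be replaced by self.run_instar_resize and the asserted substring by whatever message the host CLI actually emits.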
Test surface
The full test_resize.py adds roughly 95 generated
tests + 1 tripwire + ~9 fixed tests ≈ 105 cases:
- Surface 1 (schema tripwire): 1 test.
- Surface 2 (baseline matrix): 41 entries (19 qcow2 + 8 raw + 3 vmdk + 6 vhd + 5 vhdx), of which:
  - Run end-to-end (qcow2 + raw): 19 + 8 = 27 cases minus the 2 shrink-without-flag cases minus the known writer divergences ≈ 22 actively run.
  - Skip via meta (vmdk + vhd + vhdx baselines all record qemu rejection): 3 + 6 + 5 = 14 cases skipped.
  - Skip via writer divergence: ~6 cases.
  - Total: 41 entries, ~22 actively run, the rest skipped with documented reasons.
- Surface 3 (cross-validation): 7 cases.
- Surface 4 (round-trip check): ~33 cases (all non-raw, non-no-shrink), ~30 actively run.
- Surface 5 (consistency suite): 14 cases.
- Surface 6 (error paths): ~9 fixed tests.
End-to-end wall-clock estimate:
- Each non-raw resize ≈ 6 s (KVM spin-up + planning +
guest exec + post-pass).
- Raw resize ≈ 0.1 s.
- Total: roughly 70 actively-run KVM tests × 7 s + 30
raw × 0.1 s ≈ ~8 minutes under stestr's default
serial run. With --concurrency=4 (what
make test-container uses) ≈ 2 minutes.
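The estimate above works out as follows. The per-test costs and counts are the figures quoted in this section; dividing by the concurrency assumes ideal linear scaling, which real runs will not quite reach.

```python
def estimate_wall_clock_seconds(kvm_tests, raw_tests,
                                kvm_cost=7.0, raw_cost=0.1, concurrency=1):
    """Serial cost of all tests divided by the worker count (ideal scaling)."""
    serial = kvm_tests * kvm_cost + raw_tests * raw_cost
    return serial / concurrency


serial_seconds = estimate_wall_clock_seconds(70, 30)
parallel_seconds = estimate_wall_clock_seconds(70, 30, concurrency=4)
serial_minutes = round(serial_seconds / 60, 1)
parallel_minutes = round(parallel_seconds / 60, 1)
```

That is 70 × 7 s + 30 × 0.1 s = 493 s, a little over 8 minutes serially and roughly 2 minutes across four workers.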
Public API delta
None. Phase 11 is purely an addition under tests/.
Open questions
- Should the baseline matrix skip on qemu-img missing or hard-fail? test_create.py uses skipTest. Same here: CI installs qemu-img; dev hosts may not. Skip keeps the test honest about what it actually measured.
- Should we cross-validate via qemu-img info or instar info? The create precedent uses instar info on both outputs. Reason: we want to measure writer agreement, not reader agreement. Phase 11 follows suit. (Separately, surface 2 uses qemu-img info against the baseline, matching the baseline generator's recording; that's a reader agreement check.)
- Should the per-case factory build the start image via instar create or qemu-img create? Building via instar create is the simpler path: we already have it in the harness, with no shell-out to qemu-img beyond the final info call. The downside: if instar create has a divergence for a given case, the post-resize info will also diverge, and the baseline test fails for a reason that's really a create bug. The fix: skip via KNOWN_WRITER_DIVERGENCES (already imported) so create-time divergences don't pollute resize results. Recommendation: use instar create. The create matrix is well-covered by phase 8; carrying the writer-divergence whitelist forward into resize is honest and simple.
- Should we also test additive end_spec via qemu-img info after qemu-img create (no instar involvement) to confirm baselines are well-formed? No, that's phase 10's job (schema sanity check is done at generation time and via surface 1's tripwire). Phase 11's purpose is to validate instar against the recorded truth, not the truth against itself.
- Should the live cross-validation surface include --preallocation=metadata for qcow2? Today instar rejects qcow2 + metadata + resize via the PreallocationUnsupported planner rejection (phase 2c, deferred). qemu accepts it. Including the case would fail consistently until the planner gains support, which is queued under master-plan Future work. Recommendation: skip the metadata cross-validation for now; add a TODO comment pointing at the future-work entry so it lands the day the planner lifts the rejection.
- Stestr concurrency. Resize tests spin up the guest VMM, which opens /dev/kvm per test. The container path uses --concurrency 4; KVM allows concurrent VMs in separate processes, so this is fine. No special handling needed.
- stestr filtering / --no-discover. No changes to the existing test-discovery config. test_resize.py gets picked up automatically by the same stestr.conf that catches test_create.py.
- The shrink-without-flag rejection cases (64M-to-1M-no-shrink for qcow2 and raw). The baseline records qemu's error. Phase 11's surface 2 skips them (a baseline-diff on an error message is too brittle); surface 6 covers the rejection contract explicitly via a fixed assertion on the stderr substring. Trade-off accepted.
- virtual-size field name: qemu-img info JSON uses virtual-size (hyphen). Confirmed by reading tests/test_create.py:261 and a phase-10 baseline. Phase 11's surface 5 uses the same key.
- VHD expected_final_size mismatch from CHS rounding. qemu's virtual-size differs from instar's because qemu rounds VHD virtual_size up to the next CHS-aligned multiple (documented in KNOWN_WRITER_DIVERGENCES). For surface 5's consistency check on vhd, the assertion is info['virtual-size'] == meta['expected_final_size'] using the value instar wrote (no rounding). Since the consistency check runs only instar (qemu rejected the resize), there is no divergence. The risk is that the value drifts if a future phase teaches the vhd planner CHS rounding, at which point the test catches it and we update the expected formula. Recommendation: document the CHS-rounding caveat in a comment so future-us doesn't get surprised.
Execution
| Step | Effort | Model | Isolation | Brief for sub-agent |
|---|---|---|---|---|
| 11a | medium | sonnet | none | Create tests/test_resize.py containing surfaces 1, 6, plus the TestResizeSmoke parent class. Surface 1: schema-drift tripwire walking expected-outputs/resize-info-json/<target>/<version>/ and asserting against the in-test RESIZE_CASES mirror (mirrors tests/test_create.py:906). Surface 6: the nine targeted error-path tests covering shrink-without-flag, invalid size strings, metadata-on-raw, falloc+shrink, --object, --image-opts. Add run_instar_resize() to TestResizeSmoke (parallel to run_instar_create in test_create.py:26). Add helper _apply_resize_args_for_case(target, case, meta) that builds the resize argv from a case + its meta. Add helper resolve_resize_end_bytes(start, end_spec) (port the function from phase 10's generator). Run the smoke + error tests via make test-integration -- tests.test_resize. Commit. |
| 11b | medium | sonnet | none | Add surface 2 (TestResizeBaselineMatrix): the per-(target, case) factory _make_resize_baseline_test() and the setattr loop. Add helpers _baseline_root / _baseline_version_dir / _baseline_stdout / _baseline_meta / _run_qemu_img_info (lift from test_create.py:834–904). Add the mirror-vs-meta _assert_case_matches_meta() sanity check. Import KNOWN_WRITER_DIVERGENCES from test_create and apply it as a skip. Run the matrix; expect ~22 actively-run + ~19 documented skips. Commit. |
| 11c | small | sonnet | none | Add surfaces 3, 4, 5: TestResizeCrossValidation (~7 cases for qcow2 + raw), TestResizeRoundTripCheck (~30 cases create→resize→check), TestResizeConsistency (~14 cases for vmdk/vhd/vhdx). All three use the same setattr-factory pattern. Confirm wall-clock under 10 min in serial; under 3 min with make test-container's --concurrency 4. Commit. |
| 11d | small | sonnet | none | Wall-clock smoke: run make test-rust && make test-integration end-to-end on a clean tree. Capture timings. If any surface unexpectedly skips more than half its cases, investigate (likely a meta-decode bug). Mark phase 11 complete in docs/plans/PLAN-resize.md. Commit. |
Out of scope for phase 11
- Differential fuzz (phase 12).
- Documentation (phase 13).
- Lifting the qcow2 + metadata planner rejection (master-plan Future work).
- Cross-tool consistency for vmdk / vhd / vhdx (qemu doesn't support resize; surface 5 is the substitute).
- Coverage on the multi-file vmdk subformats (twoGbMaxExtentSparse etc.): instar rejects them outright; covered by existing planner unit tests.
Success criteria for phase 11
- make test-integration passes (including phase 11) in serial mode and with --concurrency 4.
- ~22 baseline cases actively diff against the recorded qemu-img output; the remaining ~19 skip with documented reasons (qemu doesn't support format, create-time divergence, rejection case).
- Cross-validation runs ~7 cases, all pass.
- Round-trip check passes for every actively-run case.
- Consistency suite covers all 14 vmdk/vhd/vhdx cases, asserting virtual-size matches expected_final_size and (for vhd / vhdx) instar check reports clean.
- Targeted error tests all pass.
- The schema-drift tripwire fails loudly the next time RESIZE_CASES in the generator changes without a matching mirror update; verify by intentionally removing one entry from the in-test mirror, running the tripwire, observing the failure, restoring.
- No actual-size / dirty-flag / cache-hint mismatch surfaces (the existing assert_info_equivalent normalisation covers everything we've seen in the phase-10 baselines).
Sub-agent guidance
Read these files before starting any step:
- tests/test_create.py (the entire file; phase 11 is structurally a sibling, so reuse its helpers and patterns).
- tests/base.py:36–428 (InstarTestBase surface).
- tests/helpers/info_json.py (assert_info_equivalent, the divergence sets; confirm nothing new is needed for resize).
- instar-testdata/scripts/generate-baselines.py (RESIZE_CASES definition + parse_qemu_size / resolve_resize_end_bytes; port the resolver into the test file so the mirror-vs-meta sanity check can recompute byte sizes).
- instar-testdata/expected-outputs/resize-info-json/qcow2/10.2.0/ (sample baselines + meta to confirm field shapes before writing the harness).
- docs/plans/PLAN-resize-phase-10-baselines.md (the capability-gap section explaining why vmdk / vhd / vhdx baselines record qemu rejection).
- docs/plans/PLAN-resize-phase-09-preallocation.md (the host CLI rejections phase 11's surface 6 pins).
The management session review checklist is the same as
prior phases: per-step git diff review; smoke before
full matrix; report any surface that unexpectedly skips
more than half its cases (signals a meta-decoding bug,
not a feature gap).
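The "more than half skipped" signal from the checklist can be computed from per-surface outcomes. This is a sketch with hypothetical names; it assumes the outcomes have already been collected as 'pass' / 'skip' / 'fail' strings per surface, which is not something the existing harness is known to provide.

```python
def surfaces_skipping_too_much(results, threshold=0.5):
    """Return the surfaces whose skip ratio exceeds `threshold`, sorted."""
    flagged = []
    for surface, outcomes in results.items():
        if not outcomes:
            continue  # nothing ran; no ratio to compute
        skipped = sum(1 for outcome in outcomes if outcome == 'skip')
        if skipped / len(outcomes) > threshold:
            flagged.append(surface)
    return sorted(flagged)
```

A surface that appears in the returned list is the cue to inspect meta decoding before assuming a feature gap.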