Phase 7: cross-version qemu-img create baselines¶
Master plan: PLAN-create.md · Previous phase: PLAN-create-phase-06-preallocation.md
Prompt¶
Before responding to questions or discussion points in this
document, explore the instar codebase thoroughly. Read relevant
source files, understand existing patterns (the baseline
generator, detect-profiles.py dedup flow, how phase 6 of
PLAN-measure.md shipped its analogous matrix, how
tests/test_measure.py consumes versioned baselines, how the
testdata repo layout works under expected-outputs/), and ground
answers in what the code actually does today. Where a question
touches on external concepts (qemu-img create flag matrix per
qemu version, qemu-img info --output=json schema drift across
6.0.0 → 10.2.0, qcow2 / vmdk / vhd / vhdx on-disk layout
differences in qemu-img create defaults), research as needed.
Flag uncertainty explicitly rather than guessing.
Status: Not started¶
Mission¶
Extend instar-testdata's baseline generator so that for every
qemu-img binary in qemu-img-binaries/x86_64/ (~80 versions
from 6.0.0 to 10.2.0) we record, for each (target_format,
options, size) case in a curated matrix:
1. qemu-img create -f <target> [-o KEY=VAL,...] <tmpfile> <size>: exit code + stderr ("Formatting '...' ..."), then
2. qemu-img info --output=json <tmpfile>: stdout (the info JSON, which is the comparable artefact).
The recorded baselines feed phase 8's integration tests, which
compare instar info --output=json on an instar create-produced
image against the version-matched qemu baseline, asserting field
equivalence (modulo the documented divergence whitelist: uuid,
mtime, tool version, dirty flag, etc.).
Phase 7 is implementation in instar-testdata/, not the instar
repo. The only instar-repo edits are this plan file and one
CHANGELOG.md / ARCHITECTURE.md paragraph in the last step.
Why this is its own phase¶
Decomposes cleanly into:
- Script edit (medium): teach generate-baselines.py and detect-profiles.py to understand a fifth command (create) whose semantics differ from measure: there is no source-image axis, every case is a (target_format, options, size) triple, and the comparable output is qemu-img info's JSON on the produced file rather than qemu-img create's own output (which is only a "Formatting ..." log line and the exit code).
- Long-running data generation (low effort, slow wall clock): ~80 qemu versions × ~30 cases per version × 2 steps (create + info) ≈ 4 800 invocations. Each is sub-second; total ~20 minutes serial.
- Commit the artefact to the testdata repo.
Splitting from phase 8 means the baselines exist before the test code that consumes them; phase 8 is then pure test plumbing.
What the survey turned up¶
instar-testdata/scripts/generate-baselines.py shape today¶
COMMANDS dict at lines 68-132 has entries for info, check,
compare, and measure. The main() dispatch loop at lines
740-859 has an explicit if command_name == 'measure': branch
covering measure's two sub-modes (--size cases and source-image ×
target_format cases) and an else: branch covering the
standard one-image-per-invocation commands.
measure was wired up in phase 6 of PLAN-measure.md via:
- A 'measure' entry in COMMANDS with extra target_formats and size_cases keys.
- A module-level SIZE_CASES const (lines 139-174) listing (case_name, size_str, target_format, options_list) tuples, patched into COMMANDS['measure']['size_cases'] after the dict literal closes.
- Two new generator functions: generate_measure_size_baseline (line 440) and generate_measure_source_baseline (line 525).
- A dedicated dispatch branch in main() for the measure command at line 758.
Phase 7 mirrors this pattern for create with one structural
difference: there is no source-image sub-mode, so the dispatch
branch only iterates size_cases.
qemu-img create invocation shape¶
- Exit 0 on success, 1 on most failures.
- Emits a one-line "Formatting '...', fmt=... size=..." message on every run. This plan treats it as stderr output that -q does not reliably suppress, but upstream qemu prints the line with printf (i.e. to stdout), so confirm the stream during 7a. Either way: do not pass -q, and capture the create step's output verbatim.
- Output file path is positional; we use a tempfile-scoped path under the testdata repo's temp dir.
- After success, qemu-img info --output=json <FILE> produces the JSON the integration tests will compare against.
qemu-img create validates options against the running
qemu-img version's known key set; on binaries that predate an
option (e.g. extended_l2=on before 5.2, compression_type=zstd
before 5.1) it returns non-zero with "Unknown option". Both of
those examples predate our 6.0.0 floor, so check which cases
actually fail rather than assuming. Where a failure does occur
we record it verbatim; phase 8 will skip comparison when the
baseline's create-step exit is non-zero, matching the
info/check/measure precedent.
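The skip rule above can be sketched as a small predicate on the recorded meta.json. This is a minimal sketch only: create_return_code is the key name used in the step 7c brief, while the function name should_compare is hypothetical and not something the existing test code is known to define.

```python
import json


def should_compare(meta_text: str) -> bool:
    """Return True only when the recorded create step exited 0.

    `create_return_code` is the key named in the 7c brief; a missing key
    is treated as "do not compare" so that a malformed baseline is
    skipped rather than silently trusted.
    """
    meta = json.loads(meta_text)
    return meta.get("create_return_code") == 0


ok_baseline = should_compare('{"create_return_code": 0}')
failed_baseline = should_compare('{"create_return_code": 1}')
```

Treating a missing key as a skip is a design assumption; phase 8 may prefer to fail loudly instead.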
What instar implements that we want baselined¶
Phases 1–6 of PLAN-create.md ship:
- qcow2: cluster_size, refcount_bits, extended_l2, lazy_refcounts, compat, preallocation (off / metadata / falloc / full), compression_type (accept-ignore), backing (-b/-F/-u).
- vmdk: subformat (monolithicSparse, streamOptimized), grain_size, backing.
- vhd (vpc): subformat (dynamic, fixed), force_size (accept-ignore), backing.
- vhdx: subformat (dynamic only), block_size, log_size (accept-ignore), backing.
- raw: preallocation (off, falloc, full). Raw has no metadata to baseline since qemu-img info on a raw file just reports size + format, but we baseline raw anyway as a sanity floor (any drift in info's raw JSON schema becomes visible).
Backing-file cases need a parent fixture; defer to phase 8's
integration tests which can construct chains at test time
(matching what tests/test_create.py::TestCreateBackingChain
already does in phase 5). Phase 7 baselines stick to the
no-backing matrix.
Size matrix discipline¶
The master plan suggested 1M / 1G / 1T as the size sweep. For phase 7 we restrict to 1M, 64M, 1G as the canonical sweep and drop 1T for these reasons:
- The info JSON for a 1T qcow2 differs from a 1G qcow2 only in the virtual-size field and the lazy-refcounts / extended-l2-influenced cluster_size derivations; every other field is sweep-independent. So 1T adds little coverage beyond 1G.
- preallocation=full and preallocation=falloc at 1T would actually allocate 1 TiB on disk during baseline generation. Even capped to 1G these modes need real disk space.
- 1T fixtures stay sparse for off / metadata but the create step takes noticeably longer on some qemu versions where the refcount table is written eagerly.
For preallocation=falloc and preallocation=full we cap at
1M in the case list — these modes write real blocks and a
1G allocation per case × 80 versions = wall-clock waste.
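To make the cost of the preallocation cap concrete, the arithmetic can be sketched as below. It assumes two block-writing cases per version (qcow2 falloc and full, as in the case list) and 80 versions; the helper names are hypothetical, and the figure is total bytes written over the run, not peak disk usage, since each fixture is deleted after its case.

```python
# qemu-style size suffixes are binary multiples (1M = 1 048 576 bytes).
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def size_to_bytes(size_str: str) -> int:
    """Convert a size such as '1M' or '64M' to bytes."""
    suffix = size_str[-1].upper()
    if suffix in UNITS:
        return int(size_str[:-1]) * UNITS[suffix]
    return int(size_str)


def bytes_written(size_str: str, prealloc_cases: int, versions: int) -> int:
    """Total bytes physically written by block-allocating cases."""
    return size_to_bytes(size_str) * prealloc_cases * versions


at_1g = bytes_written("1G", prealloc_cases=2, versions=80)  # 160 GiB
at_1m = bytes_written("1M", prealloc_cases=2, versions=80)  # 160 MiB
```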
Output schema¶
instar-testdata/expected-outputs/
└── create-info-json/ # only output type
├── qcow2/
│ ├── 6.0.0/
│ │ ├── 1M-default.stdout.txt # info JSON
│ │ ├── 1M-default.stderr.txt # create + info stderr combined
│ │ ├── 1M-default.meta.json
│ │ ├── 1G-cs-64k.stdout.txt
│ │ ├── 1G-extended-l2.stdout.txt
│ │ ├── 1M-prealloc-full.stdout.txt
│ │ ...
│ ├── 7.2.0/
│ ...
├── vmdk/
├── vhd/
├── vhdx/
├── raw/
├── profiles/ # detect-profiles dedup buckets
│ └── profile-NN/
└── version-map.json
Key conventions:
- One output type: create-info-json. No human variant: the info-human output embeds the absolute path, which complicates cross-host portability, and the JSON view is the contract.
- Per-format bucket directly under create-info-json/ (qcow2/, vmdk/, vhd/, vhdx/, raw/); no _size/ pseudo-bucket because every create case is "size-mode", so the format axis is the natural grouping.
- <case-name> = <size>-<options-key> where <options-key> is default, cs-64k, extended-l2, prealloc-metadata, etc. (the same naming convention measure uses).
- meta.json records the exit codes for both the create step and the info step, plus stderr byte counts, plus the options list as raw strings (so phase 8 can reconstruct the invocation).
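One possible shape for the meta.json payload described in the last convention is sketched here. Only create_return_code is a key name that appears elsewhere in this plan; every other key, and the function name build_meta, is an assumption for illustration and should be aligned with whatever measure's meta.json already records.

```python
import json


def build_meta(case_name, options_list, create_rc, create_stderr,
               info_rc=None, info_stderr=b""):
    """Assemble the per-case metadata record.

    info_rc stays None when the info step was skipped because the
    create step failed.
    """
    return {
        "case_name": case_name,
        "options": list(options_list),             # raw -o strings, in order
        "create_return_code": create_rc,
        "info_return_code": info_rc,               # hypothetical key name
        "create_stderr_bytes": len(create_stderr), # hypothetical key name
        "info_stderr_bytes": len(info_stderr),     # hypothetical key name
    }


meta = build_meta("1G-cs-64k", ["cluster_size=64k"], 0, b"", info_rc=0)
meta_text = json.dumps(meta, indent=2, sort_keys=True)
```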
Generator function signature¶
A single new function:
def generate_create_baseline(
    binary: Path,
    version: str,
    case_name: str,
    size_str: str,
    target_format: str,
    options_list: list[str],
    output_dir: Path,
    tmp_dir: Path,
    timeout: int = 60,
) -> dict:
    """
    Generate one create + info baseline.
    Pipeline:
      1. qemu-img create -f TARGET [-o KEY=VAL,...] <tmp> SIZE
      2. if (1) succeeded: qemu-img info --output=json <tmp>
      3. write <case_name>.stdout.txt = info JSON
      4. write <case_name>.stderr.txt = create stderr + (if
         info ran) info stderr
      5. write <case_name>.meta.json with both exit codes
    Always deletes the tmp file before returning.
    """
The tmp_dir argument is a per-version subdir under
/tmp/create-baselines/<version>/ so concurrent generator runs
(if ever parallelised) don't collide.
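The create-then-info core of that pipeline might look like the sketch below. It is not the generator's actual code: run_create_then_info is a hypothetical helper, the two argv lists are assumed to be built by the caller, and the run parameter exists only so the flow can be exercised without a qemu-img binary.

```python
import subprocess
from pathlib import Path


def run_create_then_info(create_argv, info_argv, tmp_file, timeout=60,
                         run=subprocess.run):
    """Run the create step, then info only if create exited 0.

    Returns (create_result, info_result_or_None). The tmp file is
    removed in all cases, including timeouts and other exceptions.
    """
    try:
        created = run(create_argv, capture_output=True, timeout=timeout)
        info = None
        if created.returncode == 0:
            info = run(info_argv, capture_output=True, timeout=timeout)
        return created, info
    finally:
        # missing_ok covers the case where create failed before writing.
        Path(tmp_file).unlink(missing_ok=True)
```

Capturing both stdout and stderr of the create step means the "Formatting ..." line is recorded whichever stream a given qemu version uses.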
CREATE_CASES shape¶
Per-target case lists. Following the measure precedent, define as a module-level dict near the top of the script:
# Curated cases for create baselines.
# Each entry is (case_name, size_str, options_list).
# options_list entries are passed as individual -o arguments.
# Non-zero exits (e.g. extended_l2=on before 5.2) recorded verbatim.
CREATE_CASES = {
    'qcow2': [
        # Sizes only
        ('1M-default', '1M', []),
        ('64M-default', '64M', []),
        ('1G-default', '1G', []),
        # cluster_size sweep at 1G
        ('1G-cs-512', '1G', ['cluster_size=512']),
        ('1G-cs-4k', '1G', ['cluster_size=4k']),
        ('1G-cs-64k', '1G', ['cluster_size=64k']),
        ('1G-cs-1M', '1G', ['cluster_size=1M']),
        ('1G-cs-2M', '1G', ['cluster_size=2M']),
        # refcount_bits
        ('1G-rb-1', '1G', ['refcount_bits=1']),
        ('1G-rb-8', '1G', ['refcount_bits=8']),
        ('1G-rb-64', '1G', ['refcount_bits=64']),
        # extended_l2 (requires qemu-img >= 5.2)
        ('1G-extended-l2', '1G', ['extended_l2=on,cluster_size=64k']),
        ('64M-extended-l2', '64M', ['extended_l2=on,cluster_size=64k']),
        # compat
        ('1G-compat-v2', '1G', ['compat=0.10']),
        # lazy_refcounts
        ('1G-lazy-refcounts', '1G', ['lazy_refcounts=on']),
        # preallocation (capped at 1M for falloc/full to avoid 1G of disk per case)
        ('1M-prealloc-metadata', '1M', ['preallocation=metadata']),
        ('1M-prealloc-falloc', '1M', ['preallocation=falloc']),
        ('1M-prealloc-full', '1M', ['preallocation=full']),
    ],
    'vmdk': [
        ('1M-default', '1M', []),  # monolithicSparse default
        ('64M-default', '64M', []),
        ('1G-default', '1G', []),
        ('1G-stream-optimized', '1G', ['subformat=streamOptimized']),
        ('1G-monolithic-sparse', '1G', ['subformat=monolithicSparse']),
        ('1G-grain-4k', '1G', ['subformat=monolithicSparse']),  # qemu default grain
        # grain_size sweep (instar exposes --grain-size; qemu-img does not have
        # an -o key for grain). For phase 7 we keep grain at default and let
        # phase 8 cover instar-only grain combinations via round-trip checks.
    ],
    'vhd': [
        ('1M-default', '1M', []),  # dynamic by default
        ('64M-default', '64M', []),
        ('1G-default', '1G', []),
        ('1M-fixed', '1M', ['subformat=fixed']),  # fixed allocates SIZE bytes
        ('16M-fixed', '16M', ['subformat=fixed']),
    ],
    'vhdx': [
        ('1M-default', '1M', []),
        ('64M-default', '64M', []),
        ('1G-default', '1G', []),
        ('1G-block-16M', '1G', ['block_size=16M']),
        ('1G-block-32M', '1G', ['block_size=32M']),
    ],
    'raw': [
        # raw produces no metadata; baselines exist as a sanity floor
        # against info-JSON schema drift.
        ('1M-default', '1M', []),
        ('1G-default', '1G', []),
    ],
}
Rough totals for the draft above: 18 qcow2 + 6 vmdk + 5 vhd + 5 vhdx + 2 raw = 36 cases per version (35 once the 1G-grain-4k vmdk case is struck; see Open questions). 36 × 80 versions × 3 files per case (stdout / stderr / meta.json) = ~8 640 files, ≈ 15 MiB on disk (within the testdata repo's existing baseline footprint).
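The totals can be checked mechanically. The sketch below hard-codes the per-format counts of the draft case dict (including the vmdk grain case) and the plan's rough figures of 80 versions and ~0.3 s per case; all three inputs are estimates from this document, not measurements.

```python
# Per-format case counts from the draft CREATE_CASES dict.
CASE_COUNTS = {"qcow2": 18, "vmdk": 6, "vhd": 5, "vhdx": 5, "raw": 2}
VERSIONS = 80          # approximate number of qemu-img binaries
FILES_PER_CASE = 3     # stdout.txt, stderr.txt, meta.json

cases_per_version = sum(CASE_COUNTS.values())                 # 36
total_files = cases_per_version * VERSIONS * FILES_PER_CASE   # 8640

# ~0.3 s per case, kept in integer tenths to avoid float noise.
serial_seconds = cases_per_version * VERSIONS * 3 // 10       # 864
serial_minutes = serial_seconds / 60                          # 14.4
```

Dropping the grain case gives 35 cases, 8 400 files and 14 minutes, which is where the "≥ 8 400" success criterion comes from.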
qemu-img create argument shape per format¶
qemu-img uses -f for source format and -O for target
format in commands like convert, but for create the format
is set with -f. We pass -f <target_format> per case. For
target_format=vhd, qemu-img's driver name is vpc; do not rely
on vhd being accepted as an alias. Phase 7's case dict uses vhd
for the directory name (matches the instar -f vhd UI) and
translates to vpc when building the qemu-img command, mirroring
the existing shared/format_code mapping.
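A minimal sketch of that split between the directory name and the qemu driver name follows. The helper names (qemu_driver_name, create_argv, baseline_dir) are hypothetical; only the vhd to vpc mapping and the create-info-json/<target>/<version>/ layout come from this plan.

```python
from pathlib import Path

# instar-facing format name -> qemu-img driver name. Only vhd differs.
QEMU_DRIVER_NAMES = {"vhd": "vpc"}


def qemu_driver_name(target_format: str) -> str:
    return QEMU_DRIVER_NAMES.get(target_format, target_format)


def create_argv(binary, target_format, options_list, image_path, size_str):
    """Build the qemu-img create command line for one case."""
    argv = [str(binary), "create", "-f", qemu_driver_name(target_format)]
    for option in options_list:
        argv += ["-o", option]      # one -o per options_list entry
    return argv + [str(image_path), size_str]


def baseline_dir(root: Path, target_format: str, version: str) -> Path:
    """Output directory keeps the instar-facing name (vhd, not vpc)."""
    return root / "create-info-json" / target_format / version


argv = create_argv("qemu-img", "vhd", ["subformat=fixed"], "/tmp/a.vhd", "1M")
out_dir = baseline_dir(Path("expected-outputs"), "vhd", "10.2.0")
```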
detect-profiles.py updates¶
The dedup flow groups versions whose stdout output for a given case is byte-identical. For create:
- Hash per (case-name, target-format) pair across all versions.
- We expect high dedup: qemu-img's create defaults for qcow2 shifted at well-known version boundaries (compat default at 3.0, default cluster_size at 1.0; both well before our version floor). For 6.0.0 → 10.2.0 we expect ≤ 10 distinct profiles for most cases.
- The script likely already handles arbitrary output_type strings; verify during 7b.
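The dedup idea can be illustrated with a content hash per version. This is a sketch of the technique only, not detect-profiles.py's actual implementation (which should be read during 7b); the function name is hypothetical and SHA-256 is an arbitrary choice of digest.

```python
import hashlib
from collections import defaultdict


def group_versions_by_output(outputs: dict[str, bytes]) -> list[list[str]]:
    """Bucket versions whose recorded stdout is byte-identical.

    `outputs` maps a qemu version string to the stdout bytes recorded
    for one (case-name, target-format) pair.
    """
    buckets = defaultdict(list)
    for version, data in outputs.items():
        buckets[hashlib.sha256(data).hexdigest()].append(version)
    # Plain string sort, so "10.2.0" orders before "6.0.0"; good enough
    # for a deterministic result in this illustration.
    return sorted(sorted(versions) for versions in buckets.values())


profiles = group_versions_by_output(
    {"6.0.0": b'{"a": 1}', "7.2.0": b'{"a": 1}', "10.2.0": b'{"a": 2}'}
)
```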
Execution mechanics¶
- 36 cases × 80 versions × ~0.3s per case (create + info + unlink) ≈ 14 minutes serial.
- Sanity pass: run against latest (10.2.0) only; verify all 36 expected baselines appear with non-empty stdout.txt for non-error cases.
- Full pass: run with no --version filter; let it complete.
- Verify the existing generator already supports --version <V> for single-version runs (lines 233-240 in get_qemu_binaries).
Why not subset to a few qemu versions¶
Same reason as measure phase 6: the cross-version matrix is the contract phase 8 enforces. Cutting it saves disk for one release cycle and accumulates unverified versions thereafter.
What qemu-img info --output=json reports per format¶
Quick reminder of the comparable fields (phase 8 will pick the comparison set):
- All targets: filename, format, virtual-size, actual-size, format-specific.type.
- qcow2: cluster-size, format-specific.data.compat, format-specific.data.lazy-refcounts, format-specific.data.refcount-bits, format-specific.data.extended-l2, format-specific.data.preallocation (if set), dirty-flag (always false on a fresh create).
- vmdk: cluster-size (= grain size), format-specific.data.cid, format-specific.data.parent-cid, format-specific.data.create-type, format-specific.data.extents[*].
- vhd: cluster-size (= block size for dynamic; ignored for fixed), format-specific.data.subformat (qemu reports this via a newer field; instar exposes it via the disk-type byte in the footer).
- vhdx: cluster-size (= block size), format-specific.data.log-size, format-specific.data.has-parent (always false here).
Fields that will differ between qemu-img and instar at phase 8 comparison time and need a divergence whitelist:
- filename: absolute path differs between baseline-time (qemu tmp path) and test-time (instar tmp path). Strip / replace with $FILENAME before recording, similarly to measure's TESTDATA_ROOT_PLACEHOLDER.
- actual-size: depends on the host filesystem (block size, whether preallocation=full was actually zero-filled vs fallocated). Compare with a tolerance, or exclude.
- vmdk cid / parent-cid: random 32-bit values per invocation. Always differ. Exclude from comparison.
- vhdx header-id / data-region UUIDs: random. Exclude.
These exclusions are phase 8's problem; phase 7 just records the raw qemu output. Document the expected drift list in 7e's docs paragraph so phase 8 has a starting point.
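The two halves of this split can be sketched as follows: a recording-time filename substitution for phase 7, and a comparison-time view for phase 8. Both function names are hypothetical, and the excluded key set is only the starting list from this section, not a finalised whitelist.

```python
import copy
import json

FILENAME_PLACEHOLDER = "$FILENAME"


def placeholder_filename(info_text: str, tmp_path: str) -> str:
    """Phase 7 side: swap the tmp path for a placeholder.

    Works on the raw text so every other byte of qemu's output is
    preserved. Assumes the path appears JSON-encoded as json.dumps
    would encode it.
    """
    return info_text.replace(json.dumps(tmp_path),
                             json.dumps(FILENAME_PLACEHOLDER))


def comparable_view(info: dict) -> dict:
    """Phase 8 side: drop fields expected to differ between runs."""
    view = copy.deepcopy(info)
    view.pop("actual-size", None)
    data = view.get("format-specific", {}).get("data", {})
    for key in ("cid", "parent-cid"):
        data.pop(key, None)
    return view


raw = '{"filename": "/tmp/x/img.vmdk", "format": "vmdk", "actual-size": 12288}'
recorded = placeholder_filename(raw, "/tmp/x/img.vmdk")
```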
Public surface added in phase 7¶
In instar-testdata/scripts/generate-baselines.py:
COMMANDS['create'] = {
    'output_types': {
        'create-info-json': 'json',
    },
    'targets': ['qcow2', 'vmdk', 'vhd', 'vhdx', 'raw'],
    'create_cases': None,  # patched in after CREATE_CASES is defined
}
CREATE_CASES = { ... }  # the dict above
COMMANDS['create']['create_cases'] = CREATE_CASES
A new generate_create_baseline() function (signature above)
and an elif command_name == 'create': branch in main()'s
dispatch loop that iterates targets × cases.
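The targets × cases iteration can be sketched as a flat generator, which keeps the dispatch branch to a single loop. iter_create_jobs is a hypothetical helper and the two-format dict here is a cut-down stand-in for CREATE_CASES.

```python
def iter_create_jobs(create_cases):
    """Yield (target, case_name, size_str, options_list) for every case."""
    for target, cases in create_cases.items():
        for case_name, size_str, options_list in cases:
            yield target, case_name, size_str, options_list


# Cut-down stand-in for the real CREATE_CASES dict.
SAMPLE_CASES = {
    "qcow2": [("1M-default", "1M", []),
              ("1G-cs-64k", "1G", ["cluster_size=64k"])],
    "raw": [("1M-default", "1M", [])],
}

jobs = list(iter_create_jobs(SAMPLE_CASES))
```

Because dicts preserve insertion order, the jobs come out grouped by format in the order the dict literal lists them.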
In instar-testdata/scripts/detect-profiles.py: add
create-info-json to whatever drives the per-output-type
iteration (verify the shape during 7b — measure's wiring
already established the precedent).
In the instar repo:
- CHANGELOG.md: a "create cross-version baselines committed to instar-testdata" paragraph.
- ARCHITECTURE.md: add create-info-json to the test-image-generation / baseline-matrix paragraph if such a paragraph exists.
Open questions¶
These should be answered during execution; escalate to the operator rather than guessing.
- -q quiet on qemu-img create. qemu-img's -q flag suppresses the "Formatting ..." stderr line on some versions but not all. Recommendation: do not pass -q, capture stderr verbatim; phase 8's comparison only looks at the stdout.txt JSON, not the stderr file.
- What about -o backing_file=... cases? Out of scope: needs a parent fixture, complicates the per-case cleanup, and phase 8 already plans backing tests as round-trips rather than baseline comparisons. Defer to a follow-up if needed.
- Should we also baseline qemu-img create failure modes (e.g. SIZE=0, conflicting options)? Negative-path coverage is valuable but better placed in phase 8 as instar-specific error-text assertions. Phase 7 sticks to happy-path option combinations.
- compression_type=zstd (qcow2). instar accept-ignores it. qemu-img added support in 5.1. We could include ('1G-zstd', '1G', ['compression_type=zstd']) to verify the info JSON's compression-type field round-trips. Recommendation: include it; it's one extra case.
- vmdk grain_size via -o. qemu-img does not accept grain_size as an -o key; grain is fixed by subformat. instar exposes --grain-size independently. Phase 7's vmdk case list cannot baseline grain via qemu-img; phase 8 covers grain via round-trip checks. The case marked 1G-grain-4k in the draft above is removed: keeping a case named "grain" with no -o grain_size= would be misleading. Strike that line; final vmdk list is 5 cases.
- VHDX subformat=fixed. qemu-img doesn't emit it; instar doesn't either (phase 1d notes vhdx is dynamic-only). Skip from phase 7's case list (already absent).
- Tmp directory location. The script runs in the testdata repo dir. Use tempfile.mkdtemp(prefix='create-baselines-') under the system tmp; clean up at the end of each version's loop iteration. Do not use the testdata repo's directory: accidental commits of the tmp fixtures would balloon the repo.
- Per-case timeout. Default 60s. qemu-img create at 1G non-preallocated completes in < 1s on a normal host; preallocation=full at 1M is < 1s. 60s is a generous ceiling for the slow disk / VM case.
Execution¶
| Step | Effort | Model | Isolation | Brief for sub-agent |
|---|---|---|---|---|
| 7a | medium | sonnet | none | In the instar-testdata repo at /srv/kasm_profiles/mikal/vscode/src/shakenfist/instar-testdata, edit scripts/generate-baselines.py. Add the 'create' entry to COMMANDS mirroring the measure precedent: single output_types entry (create-info-json: 'json'), 'targets' list of the five target formats we baseline, 'create_cases': None to be patched in after CREATE_CASES is defined. Define CREATE_CASES as a module-level dict near SIZE_CASES, populated with the case lists above (qcow2 = 18 cases, vmdk = 5, vhd = 5, vhdx = 5, raw = 2 — 35 total per version). Add a new generate_create_baseline() function next to generate_measure_size_baseline(), with the pipeline described under "Generator function signature" above (qemu-img create into a tmp path, then qemu-img info --output=json, capture both exit codes, write stdout/stderr/meta.json, always clean up the tmp file). Strip the absolute tmp path from the info JSON (replace with a $FILENAME placeholder, mirror the TESTDATA_ROOT_PLACEHOLDER pattern). Add an elif command_name == 'create': branch to main()'s dispatch loop, mirroring the measure branch but iterating for target, cases in create_cases.items(): for case in cases:. Translate target 'vhd' to the qemu-img canonical 'vpc' at command-build time only — the directory name and case dict key stay vhd for UI symmetry with instar. Run a sanity check: python scripts/generate-baselines.py --command create --version 10.2.0. Confirm expected-outputs/create-info-json/qcow2/10.2.0/1M-default.stdout.txt exists and contains a valid JSON object with "format": "qcow2", "virtual-size": 1048576. Spot-check one vhd-fixed (vhd/10.2.0/1M-fixed.stdout.txt, "format": "vpc", "virtual-size": 1048576) and one vhdx (vhdx/10.2.0/1M-default.stdout.txt). Do NOT generate the full matrix yet — that is step 7c. The instar repo is not touched in this step. |
| 7b | medium | sonnet | none | In the instar-testdata repo, edit scripts/detect-profiles.py to handle the create-info-json output type. The existing script's iteration should already pick up arbitrary expected-outputs/<output_type>/ directories; verify by reading the script's directory-discovery logic. If it has a hard-coded output-type whitelist, extend it; if it walks the directory tree, no edit needed beyond confirming behaviour. Run python scripts/detect-profiles.py --output-type create-info-json after the 7a sanity pass and confirm an expected-outputs/create-info-json/profiles/ subdir + version-map.json are generated for the 10.2.0-only matrix (which will have exactly one profile per case since only one version is recorded). instar repo not touched. |
| 7c | medium | sonnet | none | This step is execution, not coding. Run the generator against the full qemu-img-binaries matrix to produce all baselines. Command: python scripts/generate-baselines.py --command create (no --version filter). Expected wall clock ~15 min on a normal host (~80 versions × 35 cases × ~0.3 s each). Watch for warnings or skips. Old qemu versions may fail some option combinations (extended_l2=on before 5.2, compression_type=zstd before 5.1, subformat=streamOptimized in some early 6.x point releases); these record as non-zero exit baselines and are expected, not errors. After completion, run python scripts/detect-profiles.py --output-type create-info-json to regenerate the dedup profiles. Spot-check a handful: (a) create-info-json/qcow2/10.2.0/1G-cs-64k.stdout.txt should report "cluster-size": 65536; (b) create-info-json/vhd/10.2.0/1M-fixed.stdout.txt should report "virtual-size": 1048576, "format-specific" carrying the fixed subformat indicator; (c) for any case whose .meta.json shows a non-zero create_return_code, the matching .stderr.txt should carry qemu-img's rejection message (e.g. "Unknown option"). Note that extended_l2 landed in qemu 5.2, so qcow2/6.0.0/1G-extended-l2 is expected to succeed; a failure there should be investigated rather than accepted. If the runtime exceeds 45 minutes or anything looks wrong, stop and surface the issue rather than continuing. Operator may run this step themselves rather than delegating, since it is long-running and best in a tmux/nohup session. instar repo not touched. |
| 7d | low | sonnet | none | In the instar-testdata repo, git add expected-outputs/create-info-json scripts/generate-baselines.py scripts/detect-profiles.py. Inspect git status --short and verify the diff is roughly "two scripts modified, ~8 600 new files under expected-outputs/create-info-json/". Total disk delta should be ≤ 20 MiB. Commit with a clear message. Push to the remote only if the operator approves (do not push unprompted). The instar repo is still not touched. |
| 7e | low | sonnet | none | Back in the instar repo (/srv/kasm_profiles/mikal/vscode/src/shakenfist/instar-wt-resize), update CHANGELOG.md Unreleased / Added with a paragraph: "Cross-version qemu-img create baselines committed to instar-testdata/expected-outputs/create-info-json/. Generated against every qemu-img binary in qemu-img-binaries/x86_64/ (6.0.0 through 10.2.0). Consumed by phase 8's integration tests. (phase 7)". Add to the create paragraph in ARCHITECTURE.md near the existing measure / info baseline mention, noting create-info-json joins the matrix. Also mark phase 7 of PLAN-create.md as Complete in the execution table. Run pre-commit run --all-files. Only CHANGELOG.md, ARCHITECTURE.md, and docs/plans/PLAN-create.md modified in the instar repo. |
Total: 2 commits (1 in instar-testdata from step 7d, which stages both scripts and the baselines together; 1 in instar from step 7e).
Out of scope for phase 7¶
- Test code that consumes the baselines (phase 8).
- Backing-file baselines (defer to phase 8 round-trips; needs parent fixtures, not a baseline-generator concern).
- Negative-path baselines (size=0, invalid options): better as instar-specific assertions in phase 8.
- monolithicFlat or twoGbMaxExtent* vmdk subformats: instar doesn't emit these.
- Encrypted-create (encrypt.*): instar rejects with a clear "not yet supported" error.
- Updating existing info / check / compare / measure baselines.
- Pinning instar-testdata to a specific commit from the instar side (not done for any other operation).
Success criteria¶
- instar-testdata/scripts/generate-baselines.py recognises the create command and produces baselines under expected-outputs/create-info-json/<target>/<version>/.
- instar-testdata/scripts/detect-profiles.py produces create-info-json/profiles/ with dedup buckets plus a populated version-map.json.
- The full matrix is generated and committed to instar-testdata: ≥ 35 cases × ≥ 80 versions × 3 files per case = ≥ 8 400 baseline files (≈ 15 MiB on disk).
- Spot-check pass: at least 3 baseline files (a qcow2 default, a qcow2 with options, a vhd fixed) carry the expected virtual-size / cluster-size / format-specific fields per the qemu-img info schema for the recorded version.
- The instar repo's CHANGELOG.md + ARCHITECTURE.md note the new baselines.
- No instar code changes in this phase beyond docs.
Risks and mitigations¶
- Old qemu versions reject a recent option → silent baseline corruption (a recorded non-zero exit that phase 8 can't compare against an instar success). Mitigation: meta.json carries the create-step exit code; phase 8's test logic skips comparison when the baseline's create_return_code is non-zero.
- Disk volume: ~15 MiB worst case. Existing expected-outputs/ is larger. Acceptable.
- Wall-clock: ~15 min for the full matrix. Long enough to risk operator distraction. Mitigation: the script already logs per-version progress; run in tmux/nohup.
- Tmp-dir cleanup: a crash mid-loop leaves fixture files. Mitigation: per-version tempfile.mkdtemp + a try/finally removing the dir. Cleanup is best-effort; orphaned files are harmless (in /tmp).
- -o key drift across qemu versions: some keys renamed or removed between versions (e.g. preallocation_mode → preallocation in qemu 2.0). Phase 7's case list uses the modern names; old-version failures get recorded as non-zero exit baselines and skipped at phase 8 comparison time.
- CREATE_CASES drift over time: as instar adds new qcow2 features (e.g. data files, encryption) the matrix grows. Acceptable; it's config-as-code in one place.
- VHD canonical name vpc: easy to get this wrong in the command-build path. Mitigation: 7a's brief explicitly calls out the vhd → vpc translation; a test in 7a's sanity pass exercises it (the vhd/10.2.0/ directory only populates if the translation works).
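The tmp-dir mitigation (per-version mkdtemp plus try/finally) can be wrapped in a context manager, sketched below. version_tmp_dir is a hypothetical name, and the prefix follows the 'create-baselines-' suggestion from the open questions with the version appended as an assumption.

```python
import shutil
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def version_tmp_dir(version: str):
    """Yield a fresh per-version scratch dir, removed on exit.

    Removal runs even if the body raises; ignore_errors keeps cleanup
    best-effort, matching the "orphans are harmless" stance.
    """
    path = Path(tempfile.mkdtemp(prefix=f"create-baselines-{version}-"))
    try:
        yield path
    finally:
        shutil.rmtree(path, ignore_errors=True)


# Usage: the directory is gone even though the body "crashed".
seen = []
try:
    with version_tmp_dir("10.2.0") as scratch:
        seen.append(scratch)
        (scratch / "1M-default.qcow2").write_bytes(b"")
        raise RuntimeError("simulated mid-loop crash")
except RuntimeError:
    pass
```

This does not help if the process is killed outright (SIGKILL, power loss), which is the residual case the risk item accepts.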
Bugs fixed during this work¶
(To be filled in.)
Back brief¶
Before executing any step, the executing agent should
back-brief: which repo is being edited (instar-testdata vs
instar), which scripts are being modified, and which paths
are being written. The reviewer should verify nothing in the
instar repo changes except CHANGELOG.md, ARCHITECTURE.md,
and the PLAN-create.md execution row in step 7e.