PLAN-dd phase 07: Cross-version baselines

Master plan: PLAN-dd.md

Previous phase: PLAN-dd-phase-06-integration.md

Status: Complete (test 6ebe645; testdata baselines pending operator push)

Outcome.

- 7a: added dd to the testdata generate-baselines.py (DD_CASES, generate_dd_baseline mirroring resize) plus a baselines-dd Makefile target, and registered dd-info-json in detect-profiles.py. The assumption that detect-profiles.py handled arbitrary types was wrong: MULTI_BUCKET_TYPES needed the entry.
- 7b: generated 1440 baselines across 80 qemu versions → 80 profiles (one per version, as with create, because qemu's info-JSON format drifts per version) plus version-map.json.
- 7c (6ebe645): tests/test_dd_baselines.py compares instar dd's result info to the qemu baseline for the host's profile. 16 cases are compared and 3 are skipped: count=0 vmdk (qemu exits 1) and two vhdx cases (a pre-existing 32-vs-8 MiB block-size writer divergence, not a dd bug).
- 7d: docs folded into phase 10 (holistic dd docs).

Operator step: the testdata repo changes (generator + detect-profiles.py + the generated expected-outputs/dd-info-json/ tree) are committed on a branch there and need review + push to the protected main ([[testdata-push-token]]).

Prompt

Before responding to questions or discussion points in this document, explore the instar codebase thoroughly. Read relevant source files, understand existing patterns (VMM structure, guest operation layout, shared crate conventions, call table ABI, format parsing, test infrastructure), and ground your answers in what the code actually does today. Do not speculate about the codebase when you could read it instead. Flag any uncertainty explicitly rather than guessing.

Mission

Add dd cross-version baselines using the established testdata baseline mechanism — the same generate-baselines.py + detect-profiles.py + Makefile flow that produces the create-info-json / resize-info-json / amend-info-json baselines. No new mechanism: dd is a producing command exactly like resize/amend (create a fixture → run the operation → capture qemu-img info --output=json of the result), so it slots into the generator as one more command with a curated case list.

The baselines capture qemu-img dd's result info across all installed qemu versions; the consuming test runs instar dd on the same fixture and asserts its result info matches the captured qemu baseline (per profile). This complements the live cross-validation from phases 3/4/6 (which runs against a single installed qemu) with a committed, multi-version reference.

Two repos are involved:

- instar-testdata (../instar-testdata): generator + Makefile target + the generated expected-outputs/dd-info-json/ tree.
- instar (this repo): the consuming test + docs.

Design

Mirror `resize`/`amend`, not the image-driven commands

info/check/measure/map iterate manifest images. The producing commands (create, resize, amend, rebase, commit) instead use curated operand-driven case lists and procedurally created fixtures (qemu-img create). dd is a producing command, so it follows the producing pattern. resize is the closest analog: its pipeline is qemu-img create -f FMT <tmp> <start> → qemu-img resize ... → qemu-img info --output=json, driven by RESIZE_CASES of (case_name, start_size, end_spec, create_opts, prealloc). dd's pipeline is qemu-img create -f <in_fmt> <tmp> <size> → qemu-img dd <operands> -O <out_fmt> if=… of=… → qemu-img info --output=json <out>.
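The three-stage pipeline described above can be sketched as a pure function that only builds the argv lists. This is a minimal illustration, not the generator's code: the function name `dd_pipeline_commands` and its parameter names are hypothetical, and the operand order simply follows the pipeline as written in this plan.

```python
def dd_pipeline_commands(binary, input_format, input_size, window_operands,
                         output_format, tmp_in, tmp_out):
    """Return the create, dd and info argv lists for one dd baseline case.

    Hypothetical helper: it only assembles the commands so the pipeline
    shape can be inspected without running any qemu-img binary.
    """
    qemu_img = str(binary)  # the per-version qemu-img, never instar
    create = [qemu_img, "create", "-f", input_format, str(tmp_in), input_size]
    dd = [qemu_img, "dd", *window_operands, "-O", output_format,
          f"if={tmp_in}", f"of={tmp_out}"]
    info = [qemu_img, "info", "--output=json", str(tmp_out)]
    return [create, dd, info]
```

Keeping command construction separate from execution makes it easy to check that every stage uses the same per-version binary before anything is run.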

The generator runs qemu-img throughout (str(binary) is the per-version qemu-img). It does not run instar — the baseline is the qemu reference. (The exploration that fed this plan initially suggested generating from the instar binary; that is wrong and must not be done.)

dd output info doesn't depend on input data

A windowed dd's output virtual-size/format derive from the input virtual size + window + output format, not the input's data content. So fixtures can be empty qemu-img create images (no qemu-io pattern needed), exactly like resize/amend. actual-size (allocation) is normalised at test time by the existing substitute_actual_size helper.
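To illustrate why empty fixtures are enough, the sketch below normalises the run-specific fields of a `qemu-img info --output=json` document so that two results can be compared on format and virtual-size alone. It is not the existing `substitute_actual_size` helper, whose signature is not shown here; `normalise_info` and the `$ACTUAL_SIZE` placeholder are hypothetical names chosen for this example.

```python
import json


def normalise_info(info_json_text, image_path):
    """Replace run-specific fields in qemu-img info JSON with placeholders.

    Hypothetical stand-in for the real comparators: the path and the
    allocation vary per run, while format and virtual-size should not.
    """
    info = json.loads(info_json_text)
    if info.get("filename") == str(image_path):
        info["filename"] = "$FILENAME"
    if "actual-size" in info:
        info["actual-size"] = "$ACTUAL_SIZE"  # assumed placeholder text
    return info
```

After normalisation, two outputs with the same format and virtual size compare equal even when their paths and allocated sizes differ.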

Curated `DD_CASES` — target the cross-version-sensitive behaviour

The dd-specific thing worth pinning across versions is the output virtual-size rounding (qcow2/vmdk/vhdx → round_up(out_vsize, 512), vhd → CHS) and the empty-window per-format behaviour. Whole-image dd (out_vsize == input size, already 512-aligned) exercises no rounding, so the list must include windowed cases. Keep it curated (~20–25 cases), not a manifest × format matrix.
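The two rounding rules can be worked through for the 3000-byte window. The sketch below assumes the CHS geometry calculation from the published VHD specification, combined with an upward search for the first geometry that covers the requested sector count. That combination reproduces the 3072 and 34816 figures quoted in this plan, but whether qemu and instar follow exactly this routine for every size is an assumption. The function names are illustrative and are not instar's.

```python
SECTOR = 512
MAX_CHS_SECTORS = 65535 * 16 * 255


def round_up(value, multiple):
    """Round value up to the next multiple (qcow2/vmdk/vhdx style rounding)."""
    return -(-value // multiple) * multiple


def vhd_geometry(total_sectors):
    """CHS geometry per the VHD specification's calculation (assumed here)."""
    total_sectors = min(total_sectors, MAX_CHS_SECTORS)
    if total_sectors >= 65535 * 16 * 63:
        spt, heads = 255, 16
        cyl_times_heads = total_sectors // spt
    else:
        spt = 17
        cyl_times_heads = total_sectors // spt
        heads = max((cyl_times_heads + 1023) // 1024, 4)
        if cyl_times_heads >= heads * 1024 or heads > 16:
            spt, heads = 31, 16
            cyl_times_heads = total_sectors // spt
        if cyl_times_heads >= heads * 1024:
            spt, heads = 63, 16
            cyl_times_heads = total_sectors // spt
    return cyl_times_heads // heads, heads, spt


def chs_round_sketch(size_bytes):
    """Smallest CHS-expressible size that holds size_bytes (illustrative)."""
    wanted = round_up(size_bytes, SECTOR) // SECTOR
    if wanted > MAX_CHS_SECTORS:
        raise ValueError("size exceeds the CHS-addressable maximum")
    probe = wanted
    while True:
        cyls, heads, spt = vhd_geometry(probe)
        if cyls * heads * spt >= wanted:
            return cyls * heads * spt * SECTOR
        probe += 1


print(round_up(3000, 512))     # 3072
print(chs_round_sketch(3000))  # 34816
```

For 3000 bytes the request is 6 sectors, and the first geometry with at least one cylinder is 1 × 4 × 17 = 68 sectors, which is 34816 bytes. A whole-image case that is already 512-aligned leaves `round_up` with nothing to do, which is why windowed cases are needed to exercise it.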

Representative case axes (the implementer finalises exact names):

- Input formats: raw and qcow2 (output-rounding is largely input-format-independent; vmdk/vhd/vhdx inputs are covered by phase-6 live tests).
- Output formats: raw, qcow2, vmdk, vpc (vhd), vhdx.
- Windows:
    - whole-image (baseline sanity per output format),
    - bs=1000 count=3: non-512 out_vsize 3000 → exercises the 512 rounding (and VHD's CHS 34816), the highest-value cross-version case per output format,
    - bs=65536 skip=2 count=4: an aligned window,
    - count=0: empty window per output format (captures the qcow2/vpc readable-vsize-0, the vmdk qemu-exits-1, and the vhdx behaviour; note the known count=0 -O vhdx instar limitation, covered under Known limitation interaction below).

A DD_CASES shape like RESIZE_CASES: (case_name, input_size, input_format, window_operands, output_format), e.g. ('1M-raw-bs1000-count3-vhd', '1M', 'raw', ['bs=1000','count=3'], 'vpc').
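A short excerpt shows how the tuple shape might look in practice, together with a small sanity check on the list. Only the `1M-raw-bs1000-count3-vhd` entry comes from this plan; the other case names and the `validate_dd_cases` helper are hypothetical placeholders for whatever the implementer finalises.

```python
# (case_name, input_size, input_format, window_operands, output_format)
DD_CASES = [
    ("1M-raw-whole-qcow2", "1M", "raw", [], "qcow2"),             # hypothetical
    ("1M-raw-bs1000-count3-vhd", "1M", "raw",
     ["bs=1000", "count=3"], "vpc"),                              # from the plan
    ("1M-raw-bs65536-skip2-count4-vmdk", "1M", "raw",
     ["bs=65536", "skip=2", "count=4"], "vmdk"),                  # hypothetical
    ("1M-qcow2-count0-vhdx", "1M", "qcow2", ["count=0"], "vhdx"), # hypothetical
]

INPUT_FORMATS = {"raw", "qcow2"}
OUTPUT_FORMATS = {"raw", "qcow2", "vmdk", "vpc", "vhdx"}


def validate_dd_cases(cases):
    """Return a list of problems found in a DD_CASES-style list."""
    problems = []
    seen = set()
    for name, _size, in_fmt, _window, out_fmt in cases:
        if name in seen:
            problems.append(f"duplicate case name: {name}")
        seen.add(name)
        if in_fmt not in INPUT_FORMATS:
            problems.append(f"{name}: unexpected input format {in_fmt}")
        if out_fmt not in OUTPUT_FORMATS:
            problems.append(f"{name}: unexpected output format {out_fmt}")
    return problems
```

Case names become file names in the baseline tree, so a uniqueness check of this kind guards against one case silently overwriting another.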

Versions where `qemu-img dd -O` is unsupported

-f/-O were added to qemu-img dd in a later qemu series; very old versions in qemu-img-binaries/ may reject -O (or dd entirely). The generator records the non-zero exit in the .meta.json (as resize/amend already do for unsupported transitions), and the consuming test skips any profile/version whose baseline meta shows a non-zero qemu exit. Report which versions are skipped.
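The skip rule can be sketched as a filter over the recorded metadata. The key names `dd_exit_code` and `info_exit_code` are assumptions, since the plan only says that both exit codes are recorded in the `.meta.json`, and the version strings in the usage example are made up.

```python
import json


def qemu_baseline_failed(meta_text,
                         exit_keys=("dd_exit_code", "info_exit_code")):
    """True if the recorded qemu run exited non-zero at any stage.

    The key names are hypothetical; a missing key is treated as success.
    """
    meta = json.loads(meta_text)
    return any(meta.get(key, 0) != 0 for key in exit_keys)


def split_versions(metas):
    """Partition {version: meta JSON text} into (usable, skipped) lists."""
    usable, skipped = [], []
    for version, text in sorted(metas.items()):
        (skipped if qemu_baseline_failed(text) else usable).append(version)
    return usable, skipped
```

Returning the skipped versions alongside the usable ones makes it straightforward to report which versions were left out, as requested above.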

Known limitation interaction

count=0 -O vhdx: instar's empty VHDX is rejected by qemu-img info (master plan Future work). For that single case the consuming test must not compare instar's info to the qemu baseline (which is readable) — assert instar exit 0 only, matching the phase-4 handling. The generator still records qemu's baseline normally.
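One way to express this special case is a single check function that always requires a zero exit from instar and only compares info when the case is not the empty-window vhdx one. The function name and the shape of its arguments are hypothetical; the real test would follow the resize/amend baseline tests.

```python
def check_dd_case(window_operands, output_format, instar_exit,
                  instar_info, baseline_info):
    """Return a list of failure messages; an empty list means the case passes.

    Hypothetical helper: instar_info and baseline_info are the normalised
    info documents, and instar_info may be None for the exit-only case.
    """
    failures = []
    if instar_exit != 0:
        failures.append(f"instar dd exited {instar_exit}")
    # count=0 -O vhdx: instar's empty VHDX is not readable by qemu-img info,
    # so only the exit code is asserted for that case.
    exit_only = "count=0" in window_operands and output_format == "vhdx"
    if not exit_only and instar_info != baseline_info:
        failures.append("result info differs from the qemu baseline")
    return failures
```

Keeping the exception inside one predicate means it can be deleted in a single place once the limitation is resolved.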

Steps

| Step | Effort | Model | Isolation | Brief for sub-agent |
|------|--------|-------|-----------|---------------------|
| 7a | medium | sonnet | none | In `../instar-testdata/scripts/generate-baselines.py`, add dd following `resize` exactly. Read `generate_resize_baseline` (≈1511) and `RESIZE_CASES` (≈433) and the `resize` `COMMANDS` entry (≈160) + main dispatch. Add: a `'dd'` `COMMANDS` entry with `'output_types': {'dd-info-json': 'json'}` and `dd_cases`; a `DD_CASES` curated list (axes per the Design; ~20–25 cases); `generate_dd_baseline(binary, version, case_name, input_size, input_format, window_operands, output_format, output_dir, tmp_dir, ...)` whose pipeline is `qemu-img create -f <input_format> <tmp_in> <input_size>` → `qemu-img dd <window_operands> -O <qemu_out_fmt> if=<tmp_in> of=<tmp_out>` → `qemu-img info --output=json <tmp_out>`, writing `<case>.stdout.txt` (info JSON, `$FILENAME`-normalised), `<case>.stderr.txt` (dd stderr + marker + info stderr), `<case>.meta.json` (both exit codes, window operands, input/output formats); the `main()` dispatch branch iterating `DD_CASES`; and the `'dd'` output-dir layout matching resize. Map `vpc` ↔ instar's vhd naming as resize/create do for their format args. Run a one-version sanity gen (e.g. `generate-baselines.py --command dd --version <newest>` or the resize-equivalent flag) and show one produced `dd-info-json/.../<case>.stdout.txt` is valid JSON. Then in `../instar-testdata/Makefile` add a `baselines-dd` target mirroring `baselines-amend`/`baselines-resize`. Report the diff and the sample output. Do NOT run the full multi-version generation yet (step 7b) and do NOT commit. |
| 7b | — | — | — | Management/operator-run, not a sub-agent. From `../instar-testdata`: `make baselines-dd` (full multi-version generation across `qemu-img-binaries/`) then `make profiles` (`detect-profiles.py` regenerates `dd-info-json/profiles/` + `version-map.json`). Inspect: confirm `expected-outputs/dd-info-json/` populated, the profile count is sane, and which old versions were skipped (no dd `-O`). This runs the real qemu binaries; the management session does it and reviews output. The testdata repo is a separate repo with a protected `main`, so committing/pushing it is an operator step (see [[testdata-push-token]]). |
| 7c | medium | sonnet | none | In the instar repo, add the consuming baseline test (in `tests/test_dd.py` or a `tests/test_dd_baselines.py`), mirroring the `resize`/`amend` baseline test (read `tests/test_resize.py` / `tests/test_amend.py` and the `get_output_profiles` / `get_expected_output` helpers in `tests/base.py` ≈111–243, plus `substitute_testdata_root` / `substitute_actual_size` in `tests/helpers/comparators.py`). For each dd case × profile: recreate the fixture with `qemu-img create` (same spec the generator used), run `instar dd <window> -O <fmt>`, run `qemu-img info --output=json` on instar's output, normalise `$FILENAME`/`actual-size`, and assert it equals the loaded baseline. Skip any (case, version/profile) whose baseline meta records a non-zero qemu exit. Special-case `count=0 -O vhdx`: assert instar exit 0 only (do not compare info). Add the `dd` mapping to `COMMAND_OUTPUT_DIRS` in `tests/base.py` if needed (e.g. `'dd': 'dd-info'`). Gate the test so it skips cleanly when the `dd-info-json` baselines are absent (until 7b is committed in testdata). Run `^test_dd\.` and report. |
| 7d | low | sonnet | none | Docs: note `dd-info-json` baselines in `ARCHITECTURE.md` (alongside the other `*-info-json` baseline types) and add a `CHANGELOG.md` entry. (`README.md`/`AGENTS.md` only if they enumerate baseline types.) Flip the phase-7 row in this plan + the master plan after 7b/7c land. |

Verification

- generate-baselines.py --command dd produces valid dd-info-json baselines; make baselines-dd + make profiles populate expected-outputs/dd-info-json/ + version-map.json.
- The consuming test passes (^test_dd\.), iterating profiles, with instar dd's result info matching the qemu baseline for every supported (case, version); unsupported old versions are skipped via meta exit codes.
- count=0 -O vhdx is asserted exit-0-only (not info-compared).
- The dd-specific rounding (512 for qcow2/vmdk/vhdx, CHS for vhd) is represented in the baselines via the bs=1000 count=3 cases; confirm the captured virtual-size values match the phase-4 findings (3072 / 34816).
- pre-commit run --all-files passes in the instar repo.
- instar-repo changes limited to tests/ + docs; testdata-repo changes are the generator, Makefile target, and generated baselines.
- Commit messages follow conventions (model/context/effort). The testdata repo is committed/pushed as an operator step.

Hand-off

Remaining phases: 8 coverage-guided fuzzing (operand parser, window math, chs_rounded_size, read primitives), 9 differential fuzzing vs qemu-img dd (random bs/count/skip/-O; resolve the count=0 -O vhdx limitation here), 10 docs. The [[dd-qemu-img-parity-contract]] memory records the verified rules.
