PLAN-dd phase 09: Differential fuzzing vs qemu-img dd
Master plan: PLAN-dd.md
Previous phase: PLAN-dd-phase-08-fuzz.md
Status: Complete (op_dd f62d7d9; fixes 779e7a7, b80c5d7)
Outcome.
op_dd added to differential-fuzz.py. A 500-iteration --ops dd campaign (seed 20260624) ran clean once the two real bugs it surfaced were fixed:
1. Pre-existing convert data loss (779e7a7): the vmdk/vhd/vhdx writers passed a full grain/block to read_chain_virtual_cluster (which fills only one input cluster), so qcow2 inputs whose cluster size is smaller than the grain/block were truncated. Fixed with read_chain_virtual_range. This also fixes instar convert on develop and is worth a separate cherry-pick.
2. Dense-VHD output-capacity under-estimate (b80c5d7): a dense VHD's per-block bitmap/alignment overhead exceeded the generic headroom, stalling the final write; the output capacity is now sized explicitly.
Empty-window -O vmdk / -O vhdx are the only whitelisted known divergences. The convert integration suite stays at 201/0; the dd tests and make test-rust are green.
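The first bug is easiest to see in a toy model. The Python sketch below is an illustration only: instar's readers are Rust, and the names read_cluster_into / read_range_into, the 4 KiB cluster and the 64 KiB grain are assumptions chosen for the example, not instar's signatures or defaults. A per-cluster read handed a grain-sized buffer fills only the first cluster and leaves the rest zero; a range read loops cluster by cluster until the buffer is full.

```python
CLUSTER = 4096   # assumed qcow2 input cluster size
GRAIN = 65536    # assumed output grain/block size (larger than CLUSTER)


def make_image(size):
    """Toy virtual disk whose bytes are all non-zero, so truncation shows up."""
    return bytes((i % 251) + 1 for i in range(size))


def read_cluster_into(image, offset, buf):
    """Per-cluster read: copies at most up to the end of the cluster at offset."""
    in_cluster = offset % CLUSTER
    n = min(CLUSTER - in_cluster, len(buf), len(image) - offset)
    buf[:n] = image[offset:offset + n]
    return n


def read_range_into(image, offset, buf):
    """Range read: repeats per-cluster reads until buf is full or EOF."""
    done = 0
    view = memoryview(buf)
    while done < len(buf) and offset + done < len(image):
        done += read_cluster_into(image, offset + done, view[done:])
    return done


# The bug: one per-cluster call for a whole grain leaves GRAIN - CLUSTER zeros.
image = make_image(2 * GRAIN)
grain = bytearray(GRAIN)
print(read_cluster_into(image, 0, grain))   # 4096 bytes filled
fixed = bytearray(GRAIN)
print(read_range_into(image, 0, fixed))     # 65536 bytes filled
```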
Prompt
Before responding to questions or discussion points in this document, explore the instar codebase thoroughly. Read relevant source files, understand existing patterns (VMM structure, guest operation layout, shared crate conventions, call table ABI, format parsing, test infrastructure), and ground your answers in what the code actually does today. Do not speculate about the codebase when you could read it instead. Flag any uncertainty explicitly rather than guessing.
Mission
Add dd to scripts/differential-fuzz.py so random
bs/count/skip/-O invocations are cross-checked against the
real qemu-img dd binary, and resolve (by handling/documenting)
the degenerate empty-window divergences. This is the broadest
parity guard — it explores the operand space the fixed integration
matrix (phases 3/4/6) cannot, especially non-512 bs, unaligned
windows, and skip/count interactions.
Design
op_dd, modeled on op_convert
scripts/differential-fuzz.py dispatches per operation via an
if/elif chain (≈ line 3204) over the OPERATIONS list (≈ line
50). op_convert (≈ line 675) is the template for an image
producer: pick a random target, run instar + qemu-img, compare
exit codes (compare_exit_codes), and, if both succeeded,
compare content by flattening both outputs to raw with
qemu-img convert -O raw (neutral ground) and comparing them with
files_match (raw targets compare directly).
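A minimal sketch of the neutral-ground compare, assuming only the standard qemu-img convert -f/-O syntax. flatten_cmd, to_comparable, sha256_of and content_matches are hypothetical helpers standing in for the script's own files_match/_file_sha256 idioms, whose exact signatures are not reproduced here.

```python
import hashlib
import subprocess


def flatten_cmd(path, fmt, out_path, qemu_img='qemu-img'):
    """argv that flattens an image to raw with qemu-img (neutral ground)."""
    return [qemu_img, 'convert', '-f', fmt, '-O', 'raw', path, out_path]


def to_comparable(path, fmt, scratch_path, timeout=120):
    """Return a path whose bytes are the image's guest-visible content."""
    if fmt == 'raw':
        return path  # raw targets compare directly
    subprocess.run(flatten_cmd(path, fmt, scratch_path),
                   check=True, timeout=timeout, capture_output=True)
    return scratch_path


def sha256_of(path, chunk=1 << 20):
    """Stream a file through SHA-256 so large outputs need not fit in memory."""
    h = hashlib.sha256()
    with open(path, 'rb') as f:
        for block in iter(lambda: f.read(chunk), b''):
            h.update(block)
    return h.hexdigest()


def content_matches(instar_path, qemu_path, fmt, scratch_dir):
    """Flatten both outputs (unless raw) and compare their digests."""
    a = to_comparable(instar_path, fmt, f'{scratch_dir}/instar.raw')
    b = to_comparable(qemu_path, fmt, f'{scratch_dir}/qemu.raw')
    return sha256_of(a) == sha256_of(b)
```

Flattening both sides, rather than comparing structured files byte for byte, keeps the check independent of writer-specific layout choices such as the vhdx block size.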
op_dd(instar_bin, instar_copy, qemu_copy, fmt, timeout, rng)
mirrors this:
1. Random target_fmt ∈ {raw, qcow2, vmdk, vpc, vhdx}.
2. Random window operands (the value-add over the fixed matrix):
- bs: draw from a mix that exercises the rounding + sub-sector
paths — common 512-multiples (512, 4096, 65536, 1M) and
non-512 values (e.g. 1000, 999, odd numbers) and the
boundaries (1, near INT_MAX). instar matches qemu for all of
these (verified phases 3/4); the fuzzer's job is to keep that
true. Bound bs×count so buffers/outputs stay small.
- count: None (whole image) or a random block count
(including values beyond EOF, and occasionally 0 — see known
divergences).
- skip: 0..~2× the image's block count (so skip-past-EOF is
reachable).
3. Build operands for BOTH tools (same values): if=, of=,
bs=, optional count=/skip=, plus -O target_fmt.
4. Run run_instar(instar_bin, ['dd'], …) and
run_qemu_img(['dd'], …).
5. compare_exit_codes(...); if both non-zero, return None; if both
succeeded, flatten both outputs to raw and files_match them
(mirroring op_convert's raw-vs-structured branches), returning a
dd_content_divergence dict on mismatch.
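Steps 2 and 3 might look like the following sketch. BS_CHOICES, MAX_WINDOW, the probabilities and both function names are hypothetical; the real op_dd draws from the rng it is passed and should mirror op_convert's idioms. dd_argv builds the operand list once per tool; apart from the if=/of= paths (presumably each tool's own copy and output), the values are identical, with -O placed before the key=value operands as in qemu-img dd's usage line.

```python
import random

# Hypothetical operand pool: 512-multiples plus non-512 and tiny values.
BS_CHOICES = [512, 4096, 65536, 1 << 20, 1000, 999, 1, 3, 513]
MAX_WINDOW = 64 << 20   # hypothetical cap on the requested window bs * count
INT_MAX = (1 << 31) - 1


def draw_dd_operands(rng, in_vsize):
    """Return (bs, count, skip) for one iteration; count=None means whole image."""
    if rng.random() < 0.05:
        bs = INT_MAX - rng.randrange(512)        # near the upper boundary
    else:
        bs = rng.choice(BS_CHOICES)
    blocks = max(1, -(-in_vsize // bs))          # input length in bs blocks, rounded up
    roll = rng.random()
    if roll < 0.3:
        count = None                             # whole image
    elif roll < 0.35:
        count = 0                                # degenerate empty window
    else:
        # Up to 2x the block count (past EOF), but bs * count <= max(bs, MAX_WINDOW).
        count = rng.randint(1, max(1, min(2 * blocks, MAX_WINDOW // bs)))
    skip = rng.randint(0, 2 * blocks)            # skip-past-EOF is reachable
    return bs, count, skip


def dd_argv(src, dst, target_fmt, bs, count=None, skip=0):
    """Operand list for either tool: -O first, then key=value operands."""
    argv = ['-O', target_fmt, f'bs={bs}']
    if count is not None:
        argv.append(f'count={count}')
    if skip:
        argv.append(f'skip={skip}')
    return argv + [f'if={src}', f'of={dst}']


rng = random.Random(20260624)
target_fmt = rng.choice(['raw', 'qcow2', 'vmdk', 'vpc', 'vhdx'])
bs, count, skip = draw_dd_operands(rng, 16 << 20)
print(dd_argv('in.qcow2', 'out.img', target_fmt, bs, count, skip))
```

Capping count rather than bs keeps the boundary bs values reachable: for a near-INT_MAX bs the cap collapses count to 1, which still covers the whole (smaller) input.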
Known-divergence handling (the count=0 -O vhdx resolution)
These are pre-established, not new bugs. op_dd must recognise
them and NOT report them as failures (the standard fuzzer pattern,
cf. KNOWN_DIVERGENCE_FIELDS / KNOWN_WRITER_DIVERGENCES):
- Empty-window -O vmdk (out_vsize == 0, i.e. count=0 or skip past
  the count-clamped end): qemu-img dd itself exits 1 (monolithicSparse
  cannot represent a 0-capacity disk), while instar exits 0 with an
  (unreadable) empty vmdk. This is an exit-code divergence on a
  degenerate input; skip/whitelist it.
- Empty-window -O vhdx: instar's 0-virtual-size vhdx is rejected by
  qemu-img info/convert (the documented phase-4 limitation), so
  flattening instar's output to raw fails; qemu's empty vhdx is
  readable. Skip/whitelist this case.
- vhdx block size (instar 32 MiB vs qemu 8 MiB default): a
  pre-existing convert/vhdx-writer divergence. It does not
  affect the round-trip-to-raw content compare (raw flattening is
  block-size-agnostic), so op_dd needs no special handling for it,
  but do not be surprised by it.
Detect the empty-window case in op_dd (recompute out_vsize from
the chosen bs/count/skip and the input's virtual size, or
detect from the results — qemu rc≠0 for vmdk, instar-output
unflattenable for vhdx) and treat target_fmt ∈ {vmdk, vhdx} +
empty-window as a known divergence with a clear inline comment.
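The recomputation route can be sketched as below. dd_out_vsize follows the reading this plan gives (count clamps the absolute end offset, and skipping past that end gives an empty window); both helpers and the constant are hypothetical names, and the whitelist deliberately matches nothing but a zero out_vsize with a vmdk or vhdx target.

```python
# Targets whose empty-window behaviour is a documented known divergence.
KNOWN_EMPTY_WINDOW_FMTS = frozenset({'vmdk', 'vhdx'})


def dd_out_vsize(in_vsize, bs, count=None, skip=0):
    """Output virtual size under the plan's reading of qemu-img dd:
    count clamps the absolute end offset, and skip past that end gives 0."""
    end = in_vsize if count is None else min(count * bs, in_vsize)
    return max(0, end - skip * bs)


def is_known_empty_window_divergence(target_fmt, out_vsize):
    # vmdk: qemu-img dd exits 1 (monolithicSparse cannot be 0-capacity).
    # vhdx: qemu-img rejects instar's 0-size vhdx (phase-4 limitation).
    # Anything else, including every non-empty window, must still be compared.
    return out_vsize == 0 and target_fmt in KNOWN_EMPTY_WINDOW_FMTS


print(dd_out_vsize(1 << 20, 512, count=4, skip=1))  # 1536
```

Under this reading, count=4 skip=1 with bs=512 yields three blocks rather than four, which is exactly the kind of skip/count interaction the campaign should keep in parity.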
Resolution scope: "resolve the count=0 -O vhdx limitation"
means the fuzzer handles it cleanly as a documented known
divergence so the campaign runs clean — NOT that instar's empty
vhdx is made qemu-readable (the phase-4 agent could not isolate
why qemu rejects instar's specific empty BAT layout). Keep
"make instar's empty vhdx qemu-readable" in the master plan's
Future work.
Registration + campaign
- Add 'dd' to OPERATIONS (≈ line 50) and an elif op == 'dd': div = op_dd(...)
  arm to the dispatch (≈ line 3204). When no --ops filter is given, dd is
  then included automatically.
- Check .github/workflows/differential-fuzz.yml (≈ the differential-fuzz.py
  invocation, line 164): if it runs all ops, dd is picked up; if it
  hardcodes an op list, add dd.
- Run a campaign (e.g. --ops dd --iterations 2000 [--seed …]) to confirm it
  runs clean. Broad parity is already verified (phases 3/4), so expect no
  divergences beyond the documented empty-window cases. Any REAL divergence
  the campaign finds is a bug to fix (diagnose, minimise, and fix, as in
  phases 3/4); do not whitelist a genuine mismatch.
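The registration in the first bullet amounts to the shape below. This is a hypothetical excerpt: the real OPERATIONS list and dispatch chain are longer, the 'convert' op name is assumed, and the op_convert/op_dd arguments are replaced by zero-argument stand-ins so the sketch runs on its own.

```python
# Hypothetical excerpt: the real OPERATIONS list (≈ line 50) is longer.
OPERATIONS = ['convert', 'dd']


def selected_ops(ops_filter=None):
    """With no --ops filter every registered operation runs, dd included."""
    return list(ops_filter) if ops_filter else list(OPERATIONS)


def run_op(op, op_convert, op_dd):
    """Same shape as the if/elif dispatch (≈ line 3204); returns a divergence or None."""
    if op == 'convert':
        div = op_convert()
    elif op == 'dd':
        div = op_dd()
    else:
        raise ValueError(f'unknown operation {op!r}')
    return div
```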
Steps
| Step | Effort | Model | Isolation | Brief for sub-agent |
|---|---|---|---|---|
| 9a | high | opus | none | In scripts/differential-fuzz.py, add op_dd modeled on op_convert (≈675): random target_fmt ∈ {raw,qcow2,vmdk,vpc,vhdx} + random bs (mix of 512-multiples AND non-512 values + boundaries, bounded so outputs stay small) + random count (or None)/skip (reaching past EOF). Run instar dd and qemu-img dd with identical operands; compare_exit_codes; on dual success, flatten both outputs to raw via qemu-img convert -O raw and files_match (raw targets compare directly), returning a dd_content_divergence dict on mismatch. Add 'dd' to OPERATIONS and the dispatch elif. Handle the empty-window known divergences (-O vmdk: qemu exits 1; -O vhdx: instar output unflattenable) by detecting out_vsize == 0 and whitelisting those target_fmts with an inline comment citing the phase-4 limitation; do NOT whitelist anything else. Read op_convert, compare_exit_codes (≈490), generate_image (≈94, for the input's virtual size), run_instar/run_qemu_img, and files_match/_file_sha256 to mirror idioms. Check .github/workflows/differential-fuzz.yml and update if it enumerates ops. THEN run a campaign: python3 scripts/differential-fuzz.py --instar <release binary> --ops dd --iterations 2000 (use the project's documented invocation / container per AGENTS.md; /dev/kvm + qemu-img present). Report the campaign result: clean, or any divergence with its seed. If a REAL divergence is found (not an empty-window known case), diagnose, minimise, and fix it (in the dd guest/host code); that is the point of this phase. Report the fix. |
Per the master plan / PLAN-TEMPLATE.md, the sub-agent implements
and the management session reviews before committing (verify the
known-divergence whitelist is narrow — only empty-window vmdk/vhdx
— and that op_dd genuinely round-trip-compares content, not a
tautology). Suggested commit: the op_dd addition + any real-bug
fix the campaign surfaces (separate commits if a code fix is
needed).
Verification
- op_dd added; dd in OPERATIONS and the dispatch.
- A --ops dd campaign of ≥2000 iterations runs clean (no divergences beyond
  the documented empty-window vmdk/vhdx cases). Report the iteration count
  + seed.
- The known-divergence whitelist is limited to empty-window -O vmdk /
  -O vhdx; non-512 bs, unaligned windows, and skip/count combos produce NO
  divergence (parity holds).
- Any real divergence found is fixed in dd code (not whitelisted) and
  re-verified.
- differential-fuzz.yml includes dd (or auto-runs it).
- pre-commit run --all-files passes.
- Commit messages follow conventions (model/context/effort).
Hand-off
Phase 10 (docs) is the last phase: docs/dd.md; README /
ARCHITECTURE / AGENTS / CHANGELOG updates (including the deferred
phase-7 dd-info-json baseline notes); the docs/plans/index.md
Complete flip; and confirming the master plan's Future-work list
captures the deferred items (PVE extensions, --image-opts/
--object/-U, the empty-vhdx readability fix, and fuzz_dd_operands).
The [[dd-qemu-img-parity-contract]] memory records the verified
rules.