instar amend subcommand
Prompt
Before responding to questions or discussion points in this
document, explore the instar codebase thoroughly. Read relevant
source files, understand existing patterns (VMM structure, guest
operation layout, shared crate conventions, call table ABI,
format parsing, test infrastructure), and ground your answers in
what the code actually does today. Do not speculate about the
codebase when you could read it instead. Where a question touches
on external concepts (QCOW2 header layout, the qcow2 v2/v3
feature model, qemu-img amend semantics, lazy refcounts), research
as needed to give a confident answer. Flag any uncertainty
explicitly rather than guessing.
All planning documents go in docs/plans/. Phase plans for this
master plan are named PLAN-amend-phase-NN-<descriptive>.md
alongside this file and linked from the Execution table below.
They are not added to docs/plans/order.yml — only the master
plan is.
Consult ARCHITECTURE.md for the overall system structure (host
VMM, KVM guest, call table, device emulation). Consult AGENTS.md
for build commands, project conventions, code organisation, and the
security-model summary. Consult docs/qcow2/ for the qcow2 format
notes and docs/commentary/ for design rationale.
I prefer one commit per logical change, and at minimum one commit per phase. Each commit should be self-contained: it should build, pass tests, and have a clear commit message explaining what changed and why.
Situation
PLAN-convert-followups.md enumerated the qemu-img subcommands
deferred from the original convert effort. That roster —
measure, create, resize, rebase, commit, map,
snapshot — is now complete (PLAN-snapshot.md closed it),
and check --repair shipped on top (PLAN-check-repair.md). The
convert-followups "Future work" section names qemu-img amend as
the next candidate sibling capability:
> If `qemu-img amend` ever becomes useful (changing image options after creation, e.g. cluster size), add it as a fourth phase. (PLAN-convert-followups.md:144)
PLAN-check-repair.md:430 likewise lists it as a deferred sibling.
The operations matrix in docs/usage.md:906 rates amend as
Low priority, oVirt-only. It is one of a small set of
unimplemented low-priority verbs (amend, dd, bitmap,
bench); amend is the one with an existing plan slot and the
clearest correctness contract, so it is the natural next pickup.
What qemu-img amend does. It changes format options of an
existing image in place, without converting or recreating it. For
qcow2 the load-bearing options are the compatibility version
(compat=0.10 ⇔ qcow2 v2, compat=1.1 ⇔ qcow2 v3) and the
lazy_refcounts flag (a qcow2 v3 compatible-feature bit).
qemu-img's amend also nominally accepts refcount_bits,
cluster_size (rejected — not amendable), data_file,
data_file_raw, encryption parameters, and backing_file /
backing_fmt. Of these, only compat and lazy_refcounts change
header bytes without rewriting cluster/refcount data; the rest are
either out of scope, expensive structural rewrites, or already
owned by another subcommand (rebase owns backing-file changes).
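The key triage described above can be sketched as a small classifier over the `-o` string. This is a minimal sketch, not instar's `parse_o_options`: the `AmendKey`/`OptError` types and `classify_o_options` are hypothetical, qemu's `,,` comma escaping is ignored, and treating every key that starts with `encrypt` as an encryption key is an assumption.

```rust
/// One accepted `-o` key (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum AmendKey {
    /// `compat=0.10` -> Compat(false) (qcow2 v2); `compat=1.1` -> Compat(true) (v3).
    Compat(bool),
    /// `lazy_refcounts=on|off`.
    LazyRefcounts(bool),
}

/// Why a `-o` pair was rejected (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum OptError {
    Malformed(String),       // no `=` in the pair
    BadValue(String),        // known key, unsupported value
    NotYetSupported(String), // real qemu-img amend key, outside v1 scope
    NotAmendable(String),    // qemu-img rejects it too
    Unknown(String),
}

/// Split `k=v,k=v` and triage each key.
pub fn classify_o_options(opts: &str) -> Result<Vec<AmendKey>, OptError> {
    let mut keys = Vec::new();
    for pair in opts.split(',').filter(|p| !p.is_empty()) {
        let (key, value) = pair
            .split_once('=')
            .ok_or_else(|| OptError::Malformed(pair.to_string()))?;
        let parsed = match (key, value) {
            ("compat", "0.10") => AmendKey::Compat(false),
            ("compat", "1.1") => AmendKey::Compat(true),
            ("lazy_refcounts", "on") => AmendKey::LazyRefcounts(true),
            ("lazy_refcounts", "off") => AmendKey::LazyRefcounts(false),
            ("compat", _) | ("lazy_refcounts", _) => {
                return Err(OptError::BadValue(pair.to_string()))
            }
            ("cluster_size", _) => return Err(OptError::NotAmendable(key.to_string())),
            ("refcount_bits", _) | ("data_file", _) | ("data_file_raw", _)
            | ("backing_file", _) | ("backing_fmt", _) => {
                return Err(OptError::NotYetSupported(key.to_string()))
            }
            // Assumption: every encryption option name starts with "encrypt".
            (k, _) if k.starts_with("encrypt") => {
                return Err(OptError::NotYetSupported(key.to_string()))
            }
            _ => return Err(OptError::Unknown(key.to_string())),
        };
        keys.push(parsed);
    }
    Ok(keys)
}
```

Rejecting per key, rather than silently ignoring unrecognised ones, keeps "not yet supported" and "not amendable" distinguishable in the error the user sees.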
Why this is the right next step and lower-risk than its
predecessors. amend's in-scope options touch only the qcow2
header — the fixed 72-byte (v2) / 104-byte (v3) region plus
the header-extension area. Unlike resize (rewrites L1 / refcount
tables) and snapshot/check --repair (rewrite the snapshot
table and refcount tree), the in-scope amend operations do not
touch L1, L2, refcount blocks, or cluster data. This makes amend
the most contained mutation operation in the roster — but it is
not trivial, because the v2⇔v3 transition has real subtleties
(header length change, v3-only field initialisation/zeroing,
refcount-width constraints, and possible header-extension
relocation — see Open questions).
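The "header-extension area" is a sequence of type/length records. A minimal sketch of walking it, assuming the qcow2 spec's encoding (big-endian `u32` type, big-endian `u32` length, data padded to a multiple of 8 bytes, type 0 ends the list) and a buffer holding the whole header cluster. `extensions_start` and `extension_area_end` are hypothetical helpers, not the qcow2 crate's actual walker, and the backing-file string (located via `backing_file_offset`) is not covered.

```rust
/// Read a big-endian u32 at `off`, or None if it would run past `buf`.
fn be_u32(buf: &[u8], off: usize) -> Option<u32> {
    let b = buf.get(off..off.checked_add(4)?)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Where extensions begin: right after the 72-byte v2 header, or at `header_length` for v3.
pub fn extensions_start(buf: &[u8]) -> Option<usize> {
    match be_u32(buf, 4)? {
        2 => Some(72),
        3 => Some(be_u32(buf, 100)? as usize),
        _ => None,
    }
}

/// Walk the extension records; return (number of extensions, offset just past the end marker).
pub fn extension_area_end(buf: &[u8]) -> Option<(usize, usize)> {
    let mut off = extensions_start(buf)?;
    let mut count = 0;
    loop {
        let ext_type = be_u32(buf, off)?;
        let len = be_u32(buf, off.checked_add(4)?)? as usize;
        off = off.checked_add(8)?;
        if ext_type == 0 {
            // End-of-extensions marker (its length field is 0).
            return Some((count, off));
        }
        // Record data is padded up to the next multiple of 8 bytes.
        let padded = len.checked_add(7)? & !7;
        off = off.checked_add(padded)?;
        if off > buf.len() {
            return None;
        }
        count += 1;
    }
}
```

The value returned for a v2 image is what a v2→v3 upgrade has to move past offset 104.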
The relevant existing infrastructure this plan builds on:
- In-place mutation idiom. `resize` and `rebase` already establish the host→guest mutation pattern: probe the format host-side, open the file `O_RDWR`, attach it as the virtio-block output device, populate a `*Config` struct at `OPERATION_CONFIG_ADDR`, launch the guest op, harvest a `*Result` message over the serial channel, and render human/json output. `run_resize_guest` (`src/vmm/src/main.rs:4664`) and `run_rebase_guest` (`src/vmm/src/main.rs:4986`) are the reference flows. `rebase`'s selective header-field patch (`src/crates/rebase/src/qcow2.rs:265`: a 12-byte write at the backing-file offset, no full-header rebuild) is the closest prior art for amend, which is also a small header patch.
- qcow2 header model. `src/crates/qcow2/src/lib.rs` already parses every field amend cares about: `version`, `incompatible_features`, `compatible_features`, `autoclear_features`, `refcount_bits` (derived from `refcount_order`), `header_length`, and the header-extension walk. Named feature-bit constants exist: `COMPAT_LAZY_REFCOUNTS = 1 << 0` and `INCOMPAT_DIRTY`/`CORRUPT`/`EXTERNAL_DATA`/`COMPRESSION`/`EXTENDED_L2`. `compat_str()`/`compat_value()` already map version → "0.10"/"1.1". Field offset constants (`VERSION_OFFSET`, `INCOMPATIBLE_FEATURES_OFFSET = 72`, `COMPATIBLE_FEATURES_OFFSET = 80`, `AUTOCLEAR_FEATURES_OFFSET = 88`, `REFCOUNT_ORDER_OFFSET = 96`, `HEADER_LENGTH_OFFSET = 100`) are all defined.
- qcow2 header writer. `build_header()` in `src/crates/qcow2/src/create.rs:329` constructs a complete v3 header into a caller-supplied buffer using the shared `write_be_u32/u64` helpers. `resize`'s grow path reuses it wholesale (`src/crates/resize/src/qcow2.rs:513`). Caveat: `build_header()` always emits a v3 header (it writes `header_length = 104`, `refcount_order`, and the v3 feature words). A `compat=0.10` downgrade needs a v2-shaped header (72-byte fixed region, no v3 words), which `build_header()` cannot currently produce; see Open questions.
- Planner-crate convention. `src/crates/{create,resize,rebase,commit,snapshot}/` each expose `no_std` planner functions that take parsed state + options and emit a patch list, with inline `#[cfg(test)]` unit tests and `tests/` round-trip suites. amend gets `src/crates/amend/`.
- Guest-op convention. `src/operations/{resize,rebase,…}/` each build a `no_std`, `panic = "abort"`, `opt-level = "z"` guest binary with a `#[no_mangle] _start()` that reads its `*Config`, reads sector 0 via `read_output_sector`, dispatches by format, applies patches, and calls `call_table.send_*_result`. amend gets `src/operations/amend/`, checked under the 384 KiB cap by `scripts/check-binary-sizes.sh:65`.
- Call-table / ABI boundary. `src/shared/src/lib.rs` holds the `repr(C)` `*Config`/`*Result` structs near `OPERATION_CONFIG_ADDR` (`ResizeConfig` at line 3075, `RebaseConfig` at line 3259, etc.) and the `CallTable` function pointers. `crates/guest-protocol/proto/guest.proto` holds the `GuestMessage` oneof (`resize_result = 12`, `rebase_result = 13`, …). amend adds an `AmendConfig`, an `AmendResult`, a `send_amend_result` pointer, and an `AmendResultMessage`.
- `-o` option parsing. `parse_o_options`/`parse_create_o_options` (`src/vmm/src/main.rs:8776`/`:9034`) split `-o k=v,k=v`, validate `(target, key)` against a whitelist, and parse typed values (`parse_o_bool`, `parse_o_size_u32`). amend's surface is entirely `-o`-driven (`qemu-img amend -o compat=…,lazy_refcounts=…`), so this helper is central.
- Test / baseline / fuzz harnesses. Python integration tests in `tests/test_*.py` (cross-checked against `qemu-img` as oracle, manifest-driven images, profile-grouped baselines via `tests/base.py`); cross-version baselines generated by `instar-testdata/scripts/generate-baselines.py` into `expected-outputs/<op>-info-json/<target>/<version>/`; coverage-guided fuzz targets in `src/fuzz/fuzz_targets/`; differential fuzz `op_*` functions in `scripts/differential-fuzz.py`. Note the qcow2 unit-test feature-gate quirk: `cargo test -p qcow2 --features create` (see `Makefile:484-511`).
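As an illustration of how those offset constants are used, here is a sketch that decodes the fields amend needs from a raw header buffer. The offsets are the ones listed above, except `VERSION_OFFSET = 4`, which comes from the qcow2 spec (the version follows the 4-byte magic). `HeaderSummary`, `read_summary`, and `lazy_refcounts` are hypothetical and much narrower than the real parser in `src/crates/qcow2/src/lib.rs`. v2 headers have no `refcount_order` or feature words, so the sketch reports them as 16-bit refcounts with zeroed features.

```rust
// Offsets as listed above; VERSION_OFFSET = 4 comes from the qcow2 spec.
const VERSION_OFFSET: usize = 4;
const INCOMPATIBLE_FEATURES_OFFSET: usize = 72;
const COMPATIBLE_FEATURES_OFFSET: usize = 80;
const AUTOCLEAR_FEATURES_OFFSET: usize = 88;
const REFCOUNT_ORDER_OFFSET: usize = 96;
const HEADER_LENGTH_OFFSET: usize = 100;
const COMPAT_LAZY_REFCOUNTS: u64 = 1 << 0;

/// The subset of header state amend reasons about (hypothetical type).
#[derive(Debug, PartialEq)]
pub struct HeaderSummary {
    pub version: u32,
    pub incompatible: u64,
    pub compatible: u64,
    pub autoclear: u64,
    pub refcount_bits: u32,
    pub header_length: u32,
}

fn be_u32(b: &[u8], o: usize) -> u32 {
    u32::from_be_bytes([b[o], b[o + 1], b[o + 2], b[o + 3]])
}

fn be_u64(b: &[u8], o: usize) -> u64 {
    (u64::from(be_u32(b, o)) << 32) | u64::from(be_u32(b, o + 4))
}

/// Decode the fixed header from a buffer of at least 104 bytes.
pub fn read_summary(b: &[u8]) -> Option<HeaderSummary> {
    if b.len() < 104 {
        return None;
    }
    match be_u32(b, VERSION_OFFSET) {
        // v2: no feature words or refcount_order; refcounts are always 16-bit.
        2 => Some(HeaderSummary {
            version: 2,
            incompatible: 0,
            compatible: 0,
            autoclear: 0,
            refcount_bits: 16,
            header_length: 72,
        }),
        3 => {
            let order = be_u32(b, REFCOUNT_ORDER_OFFSET);
            if order > 6 {
                return None; // refcount_bits must be 1..=64
            }
            Some(HeaderSummary {
                version: 3,
                incompatible: be_u64(b, INCOMPATIBLE_FEATURES_OFFSET),
                compatible: be_u64(b, COMPATIBLE_FEATURES_OFFSET),
                autoclear: be_u64(b, AUTOCLEAR_FEATURES_OFFSET),
                refcount_bits: 1 << order,
                header_length: be_u32(b, HEADER_LENGTH_OFFSET),
            })
        }
        _ => None,
    }
}

/// True when the lazy-refcounts compatible-feature bit is set.
pub fn lazy_refcounts(h: &HeaderSummary) -> bool {
    (h.compatible & COMPAT_LAZY_REFCOUNTS) != 0
}
```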
Mission and problem statement
Implement instar amend such that:
- It accepts a subset of `qemu-img amend`'s surface:
  - `-o OPTIONS` is the only way to specify changes, matching qemu-img. v1 recognises exactly two qcow2 keys: `compat=0.10|1.1` (set the qcow2 compatibility version) and `lazy_refcounts=on|off` (set/clear the v3 lazy-refcounts compatible-feature bit). Any other recognised-but-unsupported `qemu-img amend` key (`refcount_bits`, `data_file`, `data_file_raw`, encryption keys, `backing_file`/`backing_fmt`) is rejected at runtime with a clear "not yet supported" error, the same posture `create`/`resize` took for `--object`/`--image-opts`. `cluster_size` is rejected as not-amendable (qemu-img rejects it too).
  - `-f FMT` forces format detection.
  - `-q` suppresses the success line; errors still go to stderr.
  - `--output {human,json}` selects rendering.
- It is qcow2-only in v1. Non-qcow2 inputs (raw, vmdk, vhd, vhdx) are rejected with a clear error; `qemu-img amend` itself only meaningfully supports qcow2 (and luks, which is out of scope). This mirrors `snapshot` (qcow2-only) and `check --repair` (qcow2-only).
- The in-scope mutations are header-only and must be applied safely in place:
  - `compat=1.1` upgrade (v2 → v3). Set `version = 3`, initialise the v3 feature words (incompatible/compatible/autoclear = 0 unless `lazy_refcounts` is also requested), set `refcount_order` to match the existing refcount width, set `header_length = 104`, and preserve/relocate any header extensions that currently sit at the v2 offset 72 into the post-v3-header region (offset ≥ 104). No L1 / refcount / cluster changes.
  - `compat=0.10` downgrade (v3 → v2). Refuse if any incompatible feature is set (`DIRTY`, `CORRUPT`, `EXTERNAL_DATA`, `COMPRESSION`, `EXTENDED_L2`) or if the image uses any v3-only structure a v2 reader cannot understand (non-16-bit refcounts, zero clusters via extended L2, compression). On success, write `version = 2`, zero the v3-only fixed-header words, and re-lay the header to the v2 72-byte shape (relocating extensions back to offset 72). If `lazy_refcounts` was set, it must be cleared as part of the downgrade (lazy refcounts is a v3 feature).
  - `lazy_refcounts=on|off` (v3 only; requesting it while downgrading to v2 or against a v2 image is an error): set or clear `COMPAT_LAZY_REFCOUNTS` in `compatible_features`. When clearing, the at-rest image is already refcount-consistent (instar never opens images read-write outside an operation), so no refcount flush is required, but this assumption must be validated against the dirty bit and documented.
- It cross-validates against `qemu-img amend`: after an instar amend, `qemu-img info`/`check`/`compare` on the result must agree with the same image amended by `qemu-img amend` (info-equivalence baselines, same model as create/resize).
- It is exercised by Rust unit tests, Rust round-trip tests, Python integration tests, cross-version baselines, coverage-guided fuzzing of the planner, and differential fuzzing against qemu-img.
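The fixed-header part of the `compat=1.1` upgrade amounts to six big-endian writes. A minimal sketch, assuming the extension area has already been relocated (or is empty) and reusing the offsets quoted earlier plus the spec's `VERSION_OFFSET = 4`; `apply_v3_fields` is a hypothetical name, not the planner's API. A v2 image always has 16-bit refcounts, so the matching `refcount_order` is 4 (2^4 = 16).

```rust
const VERSION_OFFSET: usize = 4; // qcow2 spec
const INCOMPATIBLE_FEATURES_OFFSET: usize = 72;
const COMPATIBLE_FEATURES_OFFSET: usize = 80;
const AUTOCLEAR_FEATURES_OFFSET: usize = 88;
const REFCOUNT_ORDER_OFFSET: usize = 96;
const HEADER_LENGTH_OFFSET: usize = 100;
const COMPAT_LAZY_REFCOUNTS: u64 = 1 << 0;
const V3_HEADER_LENGTH: u32 = 104;

fn put_u32(buf: &mut [u8], off: usize, v: u32) {
    buf[off..off + 4].copy_from_slice(&v.to_be_bytes());
}

fn put_u64(buf: &mut [u8], off: usize, v: u64) {
    buf[off..off + 8].copy_from_slice(&v.to_be_bytes());
}

/// Turn a v2 fixed header into a minimal v3 one, in place (hypothetical helper).
/// Assumes bytes 72..104 are free: any extensions there were relocated first.
pub fn apply_v3_fields(header: &mut [u8], lazy: bool) -> Result<(), &'static str> {
    if header.len() < V3_HEADER_LENGTH as usize {
        return Err("header buffer shorter than 104 bytes");
    }
    if header[VERSION_OFFSET..VERSION_OFFSET + 4] != 2u32.to_be_bytes() {
        return Err("not a v2 header");
    }
    put_u32(header, VERSION_OFFSET, 3);
    put_u64(header, INCOMPATIBLE_FEATURES_OFFSET, 0);
    let compat = if lazy { COMPAT_LAZY_REFCOUNTS } else { 0 };
    put_u64(header, COMPATIBLE_FEATURES_OFFSET, compat);
    put_u64(header, AUTOCLEAR_FEATURES_OFFSET, 0);
    // v2 refcounts are always 16-bit, i.e. refcount_order = 4.
    put_u32(header, REFCOUNT_ORDER_OFFSET, 4);
    put_u32(header, HEADER_LENGTH_OFFSET, V3_HEADER_LENGTH);
    Ok(())
}
```

Refusing anything that is not already v2 keeps the function from silently rewriting the feature words of an existing v3 header.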
Out of scope for v1 (see Future work): refcount_bits changes
(require rewriting the entire refcount tree); data_file /
data_file_raw external-data-file attach/detach; encryption /
LUKS amend; backing_file / backing_fmt via amend (owned by
rebase); any non-qcow2 format; converting between cluster sizes.
Open questions
These need resolution during detailed (phase) planning. Each is a real fork in the design, grounded in what the code does today.
- Downgrade header writer. `build_header()` (`create.rs:329`) only emits v3 headers. A `compat=0.10` downgrade needs a v2-shaped header (72-byte fixed region, v3 words absent/zeroed). Options: (a) extend `build_header()` with a target-version parameter; (b) write a small amend-specific header serializer in `src/crates/amend/`; (c) do a selective field patch (zero offsets 72–103, set version, fix `header_length`) à la rebase. Leaning toward (c) for the downgrade and reusing `build_header()` for the upgrade, but the header-extension relocation question (below) may force a fuller rebuild.
- Header-extension relocation. Header extensions live after the fixed header: at offset 72 for v2, and at offset 104 (`header_length`) for v3. A v2→v3 upgrade must move existing extensions from 72 to ≥104 (and a v3→v2 downgrade must move them back), being careful not to collide with the first refcount/L1 cluster. Question: do any of our test images / real oVirt images actually carry v2 header extensions, or is the common case "no extensions, so relocation is a no-op"? If relocation is needed, where does the extra space come from: is there always slack up to `cluster_size`? `qemu-img` rewrites the header cluster; we likely must too. Research qemu's `qcow2_amend_options`/`qcow2_update_header` for the exact relocation algorithm and the corrupt-bit crash-safety ordering.
- Refcount-width constraint on downgrade. qcow2 v2 only supports 16-bit refcounts. If a v3 image has `refcount_bits != 16`, a `compat=0.10` downgrade is impossible without rewriting the refcount tree (out of scope). Confirm we simply refuse such downgrades with a clear error rather than attempt them, matching the "refuse rather than guess" posture from `check --repair`.
- Is amend ever a no-op / idempotent? If the requested options already match the current header (e.g. `compat=1.1` on an already-v3 image with the same lazy flag), does qemu-img rewrite anyway or short-circuit? Decide whether instar emits an `ACTION_NOOP`-style result (cf. resize's noop action) and whether the file mtime/header should be touched at all. Match qemu-img's observable behaviour where it is stable across versions; document divergences in `KNOWN_AMEND_DIVERGENCES`.
- Lazy-refcounts clear safety. qemu, when clearing `lazy_refcounts`, ensures refcounts are flushed/consistent before dropping the bit. instar only ever touches images via a single guest operation and never leaves an image open read-write, so an at-rest image should already be consistent. Validate this claim: if the `DIRTY` incompatible bit is set, refuse (the image was left dirty by some other writer); otherwise clearing the compatible bit is safe. Confirm against the qcow2 spec and qemu source.
Resolved in phase 2 (PLAN-amend-phase-02-qcow2-planner.md,
src/crates/amend/):
- OQ1 / OQ2 (downgrade writer + relocation): option (A) of the phase-2 plan,
which corresponds to (b) above: a dedicated copy-and-adjust serializer in the amend crate (not
build_header, which only emits v3 and cannot preserve
existing extensions). A version change rebuilds the whole
header cluster, relocating the extension area + backing-file
string to the new fixed-header boundary (72↔104) and bumping
backing_file_offset by the shift; ExtensionRelocationUnsupported
only when the shift would overflow the cluster. Fixture work
confirmed real qemu layouts (v2 exts at 72; v3 header_length
= 112 with a feature-name-table ext; backing-file always
preceded by the backing-format ext). Upgrade writes the
minimal header_length = 104 and omits the feature-name-table
ext (info-equivalent; any residual divergence is recorded in
phase 6).
- OQ3 (refcount width): downgrade refuses refcount_bits != 16
with DowngradeRefcountWidth.
- OQ4 (no-op): target == current for both version and lazy
⇒ AmendAction::NoOp, zero patches (idempotent; the file is
not touched). Whether qemu rewrites anyway on a no-op is
reconciled in phase 6.
- OQ5 (lazy clear safety): any DIRTY/CORRUPT image is
refused (Dirty), so a non-dirty lazy clear is safe with no
refcount flush. A v3→v2 downgrade silently clears inherited
lazy; only an explicit lazy_refcounts=on against a v2
target is refused (LazyRequiresV3). (A review caught and
fixed a first-cut bug that rejected the silent-clear case.)
- OQ6 (crash-safety) remains for phase 3 — the planner emits
only the final target bytes; write ordering / corrupt-bit
guarding is a guest concern.
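The OQ3, OQ4 and OQ5 resolutions above reduce to a small decision table. This sketch reuses the outcome names `NoOp`, `Dirty`, `DowngradeRefcountWidth` and `LazyRequiresV3`, but its types and signature are hypothetical, `DowngradeBlockedFeature` is modeled on the `ERROR_DOWNGRADE_BLOCKED_FEATURE` code named under Future work, and header relocation is left out entirely. The incompatible-feature bit positions (dirty = bit 0, corrupt = bit 1) follow the qcow2 spec.

```rust
// qcow2 spec bit positions.
const INCOMPAT_DIRTY: u64 = 1 << 0;
const INCOMPAT_CORRUPT: u64 = 1 << 1;
const COMPAT_LAZY_REFCOUNTS: u64 = 1 << 0;

/// Header state the planner sees (hypothetical type).
pub struct Current {
    pub version: u32,
    pub incompatible: u64,
    pub compatible: u64,
    pub refcount_bits: u32,
}

/// Parsed `-o` request; None means the option was not given (hypothetical type).
pub struct Request {
    pub compat_v3: Option<bool>,
    pub lazy: Option<bool>,
}

#[derive(Debug, PartialEq)]
pub enum Plan {
    NoOp,
    Amend { version: u32, lazy: bool },
}

#[derive(Debug, PartialEq)]
pub enum AmendError {
    Dirty,
    DowngradeRefcountWidth,
    DowngradeBlockedFeature,
    LazyRequiresV3,
}

pub fn decide(cur: &Current, req: &Request) -> Result<Plan, AmendError> {
    // OQ5: never touch an image another writer left dirty or marked corrupt.
    if (cur.incompatible & (INCOMPAT_DIRTY | INCOMPAT_CORRUPT)) != 0 {
        return Err(AmendError::Dirty);
    }
    let cur_lazy = (cur.compatible & COMPAT_LAZY_REFCOUNTS) != 0;
    let target_version = match req.compat_v3 {
        Some(true) => 3,
        Some(false) => 2,
        None => cur.version,
    };
    if target_version == 2 {
        // Only an explicit `lazy_refcounts=on` is refused for a v2 target.
        if req.lazy == Some(true) {
            return Err(AmendError::LazyRequiresV3);
        }
        if cur.version == 3 {
            // OQ3: v2 can only describe 16-bit refcounts.
            if cur.refcount_bits != 16 {
                return Err(AmendError::DowngradeRefcountWidth);
            }
            // Any remaining incompatible bit (external data, compression, extended L2).
            if cur.incompatible != 0 {
                return Err(AmendError::DowngradeBlockedFeature);
            }
        }
    }
    // An inherited lazy bit is silently dropped when going to v2.
    let target_lazy = if target_version == 2 { false } else { req.lazy.unwrap_or(cur_lazy) };
    // OQ4: nothing to change means no patches at all.
    if target_version == cur.version && target_lazy == cur_lazy {
        return Ok(Plan::NoOp);
    }
    Ok(Plan::Amend { version: target_version, lazy: target_lazy })
}
```

In this sketch `lazy_refcounts=off` against a v2 image falls through to `NoOp`, because OQ5 only refuses an explicit `on`.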
- Crash-safety / write ordering. Header rewrites must be crash-safe. `check --repair` established the pattern of guarding risky in-place mutation with the `corrupt` incompatible bit and ordering writes so a crash leaves either the old or the new valid state. Does a single-cluster header rewrite need that machinery, or is a single sector/cluster write atomic enough for our guarantees? Decide and document.
Resolved in phase 3 (PLAN-amend-phase-03-guest.md,
src/operations/amend/): a direct header write, no corrupt-bit
guard, matching resize/rebase (and qemu's own
qcow2_update_header, which rewrites the header cluster without
the guard). The load-bearing fixed-header fields live in the
first sector (atomic flip), and the host fsyncs the output file
after the guest halts (vmm/src/main.rs file.sync_all()); the
corrupt-bit dance is reserved for check --repair's
multi-cluster, multi-phase mutation. Residual: a 512-byte-sector
image whose relocated extensions span multiple sectors has a
torn-header window on crash — the same window qemu has — accepted
for v1, with a corrupt-bit guard noted as possible future
hardening.
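The relocation arithmetic from OQ1/OQ2 and the torn-header window just described can be made concrete. A minimal sketch, assuming the caller has already measured where the extension area plus backing-file string ends; `plan_relocation`, `Relocation` and `RelocError` are hypothetical, and only the `ExtensionRelocationUnsupported` outcome name comes from the planner. `sectors` counts the 512-byte sectors the rewritten header region spans; more than one is the multi-sector case accepted for v1.

```rust
/// Why a relocation cannot be planned (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum RelocError {
    Malformed,
    ExtensionRelocationUnsupported,
}

#[derive(Debug, PartialEq)]
pub struct Relocation {
    /// New end of the extension area + backing-file string.
    pub new_end: u64,
    /// Adjusted backing_file_offset (0 keeps meaning "no backing file").
    pub new_backing_offset: u64,
    /// 512-byte sectors covered by the rewritten header region.
    pub sectors: u64,
}

/// Move everything in [old_start, area_end) so it starts at new_start
/// (72 -> 104 on upgrade, header_length -> 72 on downgrade).
pub fn plan_relocation(
    old_start: u64,
    new_start: u64,
    area_end: u64,
    backing_offset: u64,
    cluster_size: u64,
) -> Result<Relocation, RelocError> {
    let area_len = area_end.checked_sub(old_start).ok_or(RelocError::Malformed)?;
    let new_end = new_start.checked_add(area_len).ok_or(RelocError::Malformed)?;
    if new_end > cluster_size {
        // The shifted area would spill out of the header cluster.
        return Err(RelocError::ExtensionRelocationUnsupported);
    }
    let new_backing_offset = if backing_offset == 0 {
        0
    } else {
        // The backing-file string sits inside the area, so it shifts with it.
        let rel = backing_offset.checked_sub(old_start).ok_or(RelocError::Malformed)?;
        new_start + rel
    };
    Ok(Relocation { new_end, new_backing_offset, sectors: (new_end + 511) / 512 })
}
```

For example, a v2 image with a backing-format extension at 72..96 and a 10-byte backing-file name at 96 ends at 106; after the +32 shift the name starts at 128 and the header still fits in one sector.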
- Does amend need to read more than sector 0? `resize`/`rebase` read sector 0 to detect format, then read more as needed. amend needs the full header cluster (`cluster_size` bytes, at least 512 B) to see header extensions and compute relocation. Confirm the guest reads the whole first cluster, not just sector 0, before planning.
Resolved in phase 3: yes — the guest reads the whole first
cluster (read_byte_range(.., 0, .., cluster_size)) into a
dedicated buffer before planning, after a cluster_size <=
scratch-limit bounds check. A defensive layout guard also
refuses images whose refcount-table or L1 offset falls inside
cluster 0 (which the whole-cluster rewrite would clobber).
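The two pre-planning guards can be sketched as one check over already-parsed header values. The names are hypothetical throughout. The accepted range, 512 B to 2 MiB (`cluster_bits` 9 to 21), matches the 13 cluster sizes exercised under Defects and qemu's limits. `scratch_limit` stands in for the guest's buffer size, whose real value is not given here.

```rust
#[derive(Debug, PartialEq)]
pub enum GuardError {
    BadClusterBits,
    ClusterTooLarge,
    TableInHeaderCluster,
}

/// Pre-planning checks (hypothetical helper); returns the cluster size to read.
pub fn check_header_cluster(
    cluster_bits: u32,
    scratch_limit: u64,
    l1_table_offset: u64,
    refcount_table_offset: u64,
) -> Result<u64, GuardError> {
    // 512 B (2^9) to 2 MiB (2^21).
    if !(9..=21).contains(&cluster_bits) {
        return Err(GuardError::BadClusterBits);
    }
    let cluster_size = 1u64 << cluster_bits;
    // The whole first cluster must fit in the guest's read buffer.
    if cluster_size > scratch_limit {
        return Err(GuardError::ClusterTooLarge);
    }
    // Rewriting cluster 0 would clobber a table that starts inside it.
    if l1_table_offset < cluster_size || refcount_table_offset < cluster_size {
        return Err(GuardError::TableInHeaderCluster);
    }
    Ok(cluster_size)
}
```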
- ABI footprint. `AmendConfig`/`AmendResult` shapes: how much do we pass? Likely `target_format`, `flags` (a bitfield: set-compat-v2/v3, set-lazy-on/off, quiet), the parsed current header summary the host already probed (version, refcount_bits, feature words, cluster_size, header_length), and an error/action code back. Decide whether the host pre-parses the header and passes a summary (like resize passes `current_virtual_size`) or the guest re-parses from the device. Prefer the resize model: the host probes and passes a cross-check; the guest re-parses and validates against it.
Resolved in phase 1. The ABI froze on the resize model:
AmendConfig (128 B) carries target_format, flags
(present-bit + value-bit pairs: FLAG_SET_COMPAT/
FLAG_COMPAT_V3, FLAG_SET_LAZY/FLAG_LAZY_ON, FLAG_QUIET),
and the host-probed cross-check (current_version,
current_refcount_bits, both feature words, cluster_size,
virtual_size); AmendResult (64 B) returns action
(noop/amended), resulting_version, resulting_lazy_refcounts,
and error. Only one call-table pointer was added
(send_amend_result), bumping CallTable::VERSION 17 → 18; no
new device-I/O primitive was needed (resolves Open question 6/7's
ABI half — amend reuses resize's read_output_sector /
write_output_sector). Header-length is not carried in the
config: cluster_size bounds the header-cluster read and the
guest derives the rest from its own re-parse.
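The present-bit/value-bit pairs let the guest tell "not requested" apart from "requested off", which a single bit per option cannot express. Below is a sketch of the host-side encoding and guest-side decoding. The flag names come from the ABI above. The bit positions, the `u32` width, and both helper functions are assumptions; the real constants live in `src/shared/src/lib.rs`.

```rust
// Hypothetical bit positions; only the names come from the frozen ABI.
pub const FLAG_SET_COMPAT: u32 = 1 << 0;
pub const FLAG_COMPAT_V3: u32 = 1 << 1;
pub const FLAG_SET_LAZY: u32 = 1 << 2;
pub const FLAG_LAZY_ON: u32 = 1 << 3;
pub const FLAG_QUIET: u32 = 1 << 4;

/// Host side: encode the parsed `-o` request (None = option not given).
pub fn encode_flags(compat_v3: Option<bool>, lazy: Option<bool>, quiet: bool) -> u32 {
    let mut flags = 0;
    if let Some(v3) = compat_v3 {
        flags |= FLAG_SET_COMPAT;
        if v3 {
            flags |= FLAG_COMPAT_V3;
        }
    }
    if let Some(on) = lazy {
        flags |= FLAG_SET_LAZY;
        if on {
            flags |= FLAG_LAZY_ON;
        }
    }
    if quiet {
        flags |= FLAG_QUIET;
    }
    flags
}

/// Guest side: a value bit only counts when its present bit is set.
pub fn decode_pair(flags: u32, present: u32, value: u32) -> Option<bool> {
    if (flags & present) == 0 {
        None
    } else {
        Some((flags & value) != 0)
    }
}
```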
Execution
Phase plans are written one at a time, at the recommended effort, and reviewed before the next is drafted. The phase breakdown mirrors the established subcommand-plan shape (ABI → planner → guest → host → tests → baselines → fuzz → docs), compressed because the mutation is header-only.
| Phase | Plan | Status |
|---|---|---|
| 1. ABI: `AmendConfig`/`AmendResult`, call-table `send_amend_result`, `AmendResultMessage` proto, magic/flag/error constants | PLAN-amend-phase-01-abi.md | Complete |
| 2. qcow2 amend planner crate (`src/crates/amend/`): header patch computation for compat up/down + lazy toggle, all validation (downgrade blockers, refcount-width, extension relocation), inline unit tests | PLAN-amend-phase-02-qcow2-planner.md | Complete |
| 3. Guest op (`src/operations/amend/`): read config, read full header cluster, dispatch qcow2, apply patches, send result; binary-size check | PLAN-amend-phase-03-guest.md | Complete |
| 4. Host VMM subcommand: `AmendArgs` clap surface, `-o` option parsing + validation, host-side format probe, `run_amend`/`run_amend_guest`, human/json rendering | PLAN-amend-phase-04-host-cli.md | Complete |
| 5. Rust round-trip tests (`src/crates/amend/tests/`): amend → re-parse, assert header invariants for each transition | PLAN-amend-phase-05-rust-tests.md | Complete |
| 6. Python integration tests (`tests/test_amend.py`): cross-check vs `qemu-img amend` with post-op info/check/compare, known-divergence registry | PLAN-amend-phase-06-integration.md | Complete |
| 7. Cross-version baselines: `AMEND_CASES` in `generate-baselines.py`, `expected-outputs/amend-info-json/`, testdata push | PLAN-amend-phase-07-baselines.md | Complete (testdata push operator-gated) |
| 8. Coverage fuzz (`fuzz_amend_planners.rs`) + differential fuzz (`op_amend` in `differential-fuzz.py`) | PLAN-amend-phase-08-fuzz.md | Complete: harnesses landed; the cluster-size defect they found (core `.bss` overflow into the op region) is root-caused and fixed (see Defects) |
| 9. Docs: `docs/amend.md`, `docs/usage.md`, `CHANGELOG.md`, `ARCHITECTURE.md`/`README.md`/`AGENTS.md`, `index.md`/`order.yml` | PLAN-amend-phase-09-docs.md | Complete |
Agent guidance
Execution model
All implementation work is done by sub-agents, never in the
management session. The management session is reserved for
planning, review, and decision-making. The workflow per step:
plan (high effort) → spawn a sub-agent with the brief →
review the actual files (the summary describes intent, not
necessarily what changed) → fix or retry (improve the brief or
upgrade the model) → commit once satisfied. Use
isolation: "worktree" for risky/experimental steps.
Planning effort
The master plan is created at high effort. Phases 1 (ABI), 2 (qcow2 planner: header relocation, downgrade safety, refcount-width constraints), and 8 (differential fuzz oracle) should be planned at high effort; they involve format-spec interpretation, the v2/v3 transition algorithm, and cross-validation correctness. Phases 3–7 and 9 follow well-established patterns from resize/rebase and can be planned at medium effort once the briefs front-load the research.
Step-level guidance
Each phase plan includes a step table:
| Step | Effort | Model | Isolation | Brief for sub-agent |
|------|--------|-------|-----------|---------------------|
| 1a | medium | sonnet | none | One-sentence summary of what to do and which files to touch |
| 1b | high | opus | worktree | Why this needs high effort: requires understanding X to do Y |
Effort levels — high: multiple files, judgment calls, non-obvious invariants, external spec research. medium: clear brief, well-defined approach. low: purely mechanical.
Model choice — opus for deep reasoning / cross-file architectural understanding / subtle correctness (the v2⇔v3 transition, header relocation, refcount safety, ABI changes bridging VMM and guest). sonnet for well-briefed implementation. haiku for mechanical tasks. When in doubt, skew to the more capable model — a failed implementation wastes more than a heavier model costs. A detailed brief compensates for a lighter model.
Brief for sub-agent — write it as if briefing a colleague who
has never seen the codebase: what to change, which files, what
patterns to follow, and non-obvious constraints (the 384 KiB guest
binary cap, the no_std requirement of the format/planner crates,
the call-table boundary, the repr(C) layout discipline, the
qcow2 --features create test gate). Front-load the research the
planner already did. For example, instead of "write the qcow2
upgrade patch", write "in src/crates/amend/src/qcow2.rs,
implement plan_amend_qcow2(header: &QcowHeader, req: &AmendRequest,
scratch: &mut [u8]) -> Result<AmendPlan, AmendError>. For the
v2→v3 upgrade, reuse the field offsets from
src/crates/qcow2/src/lib.rs (VERSION_OFFSET,
REFCOUNT_ORDER_OFFSET = 96, HEADER_LENGTH_OFFSET = 100) and
emit a single header-cluster Write; relocate any extension at
offset 72 to offset 104 first."
Management session review checklist
After a sub-agent completes, verify:
- The files that were supposed to change actually changed (read them, don't trust the summary).
- No unrelated files were modified.
- `make instar` builds and `make lint` is clean.
- Guest binaries pass `make check-binary-sizes` (384 KiB).
- `make test-rust` (including `cargo test -p qcow2 --features create` and the new `amend` crate) passes.
- Relevant `make test-integration` targets pass.
- `pre-commit run --all-files` passes.
- The changes match the intent of the brief, semantically and not just syntactically.
- Commit message follows project conventions (`Co-Authored-By` with model, context window, effort level, and other settings; `Signed-off-by`; `Prompt:` paragraph).
Administration and logistics
Success criteria
We will know this plan has been successfully implemented because the following statements will be true:
- `instar amend -o compat=1.1 <v2-image>` produces an image that `qemu-img info` reports as `compat 1.1`, and `qemu-img check` and `qemu-img compare` (against the original, data-equivalent) both pass.
- `instar amend -o compat=0.10 <v3-image-without-v3-features>` produces a `compat 0.10` image; the same amend against an image with any blocking v3 feature is refused with a clear, qemu-comparable error.
- `instar amend -o lazy_refcounts=on|off <v3-image>` toggles the compatible-feature bit and nothing else.
- For every in-scope case, an instar-amended image is info-equivalent to the same image amended by `qemu-img amend` across the supported qemu-img version matrix (baselines).
- `make instar` builds and `make lint` is clean.
- Guest binaries pass `make check-binary-sizes` (384 KiB limit); the new `amend.bin` is registered in `scripts/check-binary-sizes.sh`.
- All Rust unit tests pass (`make test-rust`), including the new `amend` crate and round-trip tests.
- All Python integration tests pass (`make test-integration`), including `tests/test_amend.py`.
- Coverage-guided fuzzing of the amend planner and differential fuzzing against `qemu-img amend` run clean.
- `pre-commit run --all-files` passes.
- Documentation in `docs/` (`amend.md`, `usage.md`) and `ARCHITECTURE.md`, `README.md`, `AGENTS.md`, `CHANGELOG.md`, `docs/plans/index.md`, `docs/plans/order.yml` are updated.
Future work
Obvious extensions deferred from v1:
- `core.bin` is at its 64 KiB budget ceiling (surfaced in phase 1). Wiring amend's result sender into `core` took it from 63 680 B (97 %) to 65 304 B (99 %), and only after moving the result strings off the wire to keep it under the 65 536 B cap (see phase-1 plan Open question 5). ~232 B of headroom remain. The next subcommand/phase that adds anything to `core` will overflow it; at that point we must either keep trimming per-feature or lift the core memory budget (move the operations load address in `src/shared/src/lib.rs`'s memory map: a loader/layout change affecting every guest binary). An operator decision, not resolved here. Since then, the phase-8 defect fix raised `OPERATION_LOAD_ADDR` from 0x20000 to 0x22000, giving core a 72 KiB region (see Defects found during this work); the headroom figures above predate that change.
- `refcount_bits` amend. Changing the refcount width requires rewriting the entire refcount tree (and possibly growing/shrinking refcount blocks). Substantial; reuse the `src/crates/snapshot/` refcount mutators and the resize refcount-table growth helpers if a real workflow demands it.
- External data file amend (`data_file`, `data_file_raw`): attach/detach an external data file, manage the `INCOMPAT_EXTERNAL_DATA` bit and the `DATA` header extension.
- LUKS / encryption amend (changing keyslots/parameters).
- `backing_file`/`backing_fmt` via amend. qemu-img amend can change these, but instar already owns this via `rebase -u`; decide whether to alias or leave it to `rebase`.
- Non-qcow2 amend. qemu-img amend is qcow2-centric; revisit only if a concrete need appears.
- Header-extension relocation hardening. If v1 punts on images carrying v2 header extensions (refusing rather than relocating), lift that restriction here.
- Compressed-image downgrade (documented divergence, phase 6). `instar amend -o compat=0.10` refuses a v3 image with the compression (zstd) incompatible feature (`ERROR_DOWNGRADE_BLOCKED_FEATURE`), because v2 cannot represent `compression_type` and instar's v1 does not recompress cluster data, in keeping with the "refuse rather than guess" posture. qemu-img 10.0.8 accepts the same downgrade (rewriting the compression type). `tests/test_amend.py` asserts instar's refusal and records the divergence without failing. Lifting this would mean recompressing zstd→zlib (out of v1 scope).
Bugs fixed during this work
List any bugs encountered and fixed during development here. At the
start of phase 1, scan the security-audit GitHub issue tracker
for any open qcow2-header / feature-bit / version-detection issues
that this work should resolve or be aware of.
Defects found during this work
- Cluster-size `ERROR_HEADER_MISMATCH` (FIXED). `instar amend` spuriously failed with `ERROR_HEADER_MISMATCH` for certain qcow2 cluster sizes (512, 2048, 16384, 32768, 262144 …) while `qemu-img amend` accepted the same operation; the default 64 KiB cluster and many other sizes succeeded, which is why the by-example tests (phases 5–7, all built at the default cluster) never caught it. Found by the phase-8 differential fuzzer (`op_amend`).
Root cause: core.bin's .bss overflowed its 64 KiB budget
(0x10000–0x20000) into the operation region at 0x20000. The
static OUTPUT_DEVICE: Option<VirtioBlock> was linked at 0x20380,
and when core initialised the output block device it wrote the
VirtioBlock struct there — clobbering ~72 bytes of the loaded op's
code at 0x20380–0x203c7. For amend that region held the header
cross-check, whose corrupted bytes then branched data-dependently on
the cluster-size values (error for some sizes, accidental success for
others). Every op's bytes there were corrupted, but only amend had
critical branch logic at that offset. The flat-binary size lint missed
it because the .bin excludes .bss (the file was under budget while
the runtime extent reached 0x203d0).
Root-caused with host-only KVM debugging (a hardware data-write
watchpoint on 0x20380 trapped the writing instruction in core.bin) —
the guest op is too codegen-fragile for source instrumentation.
Fix: OPERATION_LOAD_ADDR raised 0x20000 → 0x22000 (giving
core a 72 KiB region so its .bss no longer overlaps the op);
src/operations/*/linker.ld OPERATION_BASE updated to match; and
scripts/check-binary-sizes.sh rewritten to validate the
.bss-inclusive ELF memory extent (not just the flat .bin size), so
this class of overflow is caught in future. Verified: amend passes all
13 cluster sizes (512 … 2 MiB), the differential fuzzer reports 0
divergences over 60 iterations, and the resize/create/check/snapshot/
map suites are unregressed.
- resize grow on a 64 KiB-cluster qcow2 (PRE-EXISTING, separate, out of scope). `instar resize <img> 8M` on a `qemu-img create -f qcow2 -o cluster_size=65536 <img> 4M` image fails with resize error 13 (header mismatch). It reproduces identically on the clean branch before this fix (so it is not caused by the `OPERATION_LOAD_ADDR` move), and the case is not covered by the resize test suite (which passes). Flagged for a separate investigation; likely a distinct resize-planner accounting issue, unrelated to the amend `.bss` corruption above.
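The lint change described in the cluster-size defect comes down to comparing the highest runtime address (file bytes plus `.bss`, i.e. `p_vaddr + p_memsz` over the ELF `PT_LOAD` segments) against the next region's base, instead of comparing the flat `.bin` size. The real check is a shell script. This Rust sketch only shows the arithmetic: it takes the segment list as input rather than parsing ELF, and every name except `OPERATION_LOAD_ADDR` is hypothetical.

```rust
/// One ELF PT_LOAD segment: where it starts and how much memory it occupies.
/// p_memsz includes .bss, unlike the bytes copied into the flat .bin.
pub struct LoadSegment {
    pub vaddr: u64,
    pub memsz: u64,
}

/// Hypothetical name for the core region's base.
pub const CORE_LOAD_ADDR: u64 = 0x10000;
/// Hypothetical name for the pre-fix operation base.
pub const OLD_OPERATION_LOAD_ADDR: u64 = 0x20000;
pub const OPERATION_LOAD_ADDR: u64 = 0x22000;

/// Highest address (exclusive) the binary occupies at runtime.
pub fn memory_extent(segs: &[LoadSegment]) -> u64 {
    segs.iter().map(|s| s.vaddr + s.memsz).max().unwrap_or(0)
}

/// The runtime extent must not reach into the next region.
pub fn fits_below(segs: &[LoadSegment], next_region: u64) -> bool {
    memory_extent(segs) <= next_region
}
```

With the reported runtime extent of 0x203d0, the old layout fails the check and the new 72 KiB core region passes it.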
Documentation index maintenance
This master plan has been added to:
- `docs/plans/index.md`: a row in the Master plans table (dated 2026-06-15, status Complete (phases 1-9), phases listed).
- `docs/plans/order.yml`: an entry so it appears in the docs navigation. Phase files are not added to `order.yml`.
All phases are complete; the index.md status column has been
updated to Complete.
Back brief
Before executing any step of this plan, the executing agent should back brief the operator as to its understanding of the plan and how the intended work aligns with it.