Phase 4: output formatting
Master plan: PLAN-map.md · Previous phase: PLAN-map-phase-03-host-cli.md
Status: Complete
Streaming MapRenderer<'a, W: Write> in src/vmm/src/main.rs
writes each extent to stdout (via a BufWriter over
stdout().lock()) as the MapExtentMessage arrives in the
vCPU loop; host memory stays O(1) regardless of how fragmented
the source is. Human and JSON output match qemu-img map
byte-for-byte modulo the eight quirks now documented in
docs/quirks.md. 21 byte-exact unit tests pin the renderer
against expected output sequences. BrokenPipe on stdout
short-circuits cleanly with exit 0.
Mission
Replace the phase 3c placeholder renderer
(format_map_human / format_map_json / print_map_result)
with a streaming MapRenderer that emits each
MapExtentMessage as it arrives during the vCPU loop and
matches qemu-img map's output byte-for-byte for both
--output=human and --output=json. Two divergences are
documented (not fixed) in docs/quirks.md: raw-source
sparseness via SEEK_HOLE (not implemented) and qcow2
compressed-cluster reporting (always emitted as
compressed: false).
After phase 4, the only thing standing between us and the
cross-version baseline matrix is the testdata sweep in
phase 5. Phase 4's own success criterion is byte-equality
against the system qemu-img map for a handful of hand-
crafted fixtures across all five source formats.
Why this is its own phase
Phase 3 shipped a working CLI but with structural-only output. Phase 4 has two distinct deliverables that are each non-trivial:
- Byte-exact output: qemu-img's human format uses 16-char fixed-width columns, special-cases 0 (no 0x prefix), elides hole / zero-allocated extents entirely (only data: true rows appear), and echoes the filename exactly as it was passed on argv. The JSON format always emits compressed: false, omits offset for non-data extents, separates objects with ,\n, and never indents. Matching these is mechanical, but the phase 3c placeholder gets every one of these details wrong.
- Streaming emission: phase 3 buffers extents in a Vec<MapExtentMessage>. For pathologically fragmented sources (17M extents on a 1 TiB qcow2) the buffer reaches multi-GiB host memory. Phase 4 swaps the buffer for a MapRenderer that writes each extent to stdout as it arrives in the vCPU loop, bringing host memory back to O(1).
Bundling these together is the right grain — the renderer shape (a struct holding a writer + state) is what enables both, and the test surface is the same.
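The 17M figure can be sanity-checked with a little arithmetic. The sketch below assumes a 64 KiB cluster size (the qcow2 default) and a worst case in which every cluster is reported as its own extent. The function names are hypothetical, and the per-extent byte count is a caller-supplied guess, not a measured size of MapExtentMessage.

```rust
/// Worst-case extent count: every cluster is reported as a separate extent.
/// Assumes `cluster_bytes` divides `image_bytes` evenly.
fn worst_case_extents(image_bytes: u64, cluster_bytes: u64) -> u64 {
    image_bytes / cluster_bytes
}

/// Host memory needed to buffer `extents` entries.
/// `bytes_per_extent` is an illustrative assumption supplied by the caller.
fn buffered_bytes(extents: u64, bytes_per_extent: u64) -> u64 {
    extents * bytes_per_extent
}
```

Under those assumptions, `worst_case_extents(1 << 40, 64 * 1024)` gives 16,777,216 extents (roughly 17M). At an assumed 128 bytes per buffered entry that is 2 GiB, which is the order of magnitude the streaming renderer avoids.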
Architecture
MapRenderer struct
    struct MapRenderer<'a, W: Write> {
        writer: &'a mut W,
        output_format: MapOutputFormat,
        /// The argv string the user passed for the source. Used
        /// verbatim in the human-output "File" column. qemu-img
        /// echoes whatever was on the command line (e.g. relative
        /// paths stay relative).
        filename: String,
        /// True after the first extent has been emitted. The JSON
        /// writer uses this to decide whether to prefix the next
        /// extent with `,\n` (subsequent) or write `[` first
        /// (initial).
        first_extent: bool,
        /// Count of extents the renderer has actually written to
        /// the output (human-mode emits only `data: true` rows,
        /// so this can be lower than the count of extents the
        /// guest sent).
        extents_written: u64,
    }

    enum MapOutputFormat {
        Human,
        Json,
    }

    impl<'a, W: Write> MapRenderer<'a, W> {
        pub fn new(writer: &'a mut W, output_format: &str, filename: String) -> Self { ... }

        /// Write the format-specific header / opening character.
        /// Called once before any `emit_extent` call.
        pub fn begin(&mut self) -> io::Result<()> { ... }

        /// Write one extent's representation. For human mode this
        /// emits a row only if the extent has `data == true`;
        /// holes and zero-allocated extents are silently skipped.
        /// For JSON mode every extent is emitted.
        pub fn emit_extent(
            &mut self,
            ext: &guest_::MapExtentMessage,
        ) -> io::Result<()> { ... }

        /// Write the format-specific closing character. Called
        /// once after the last `emit_extent` and before printing
        /// any error.
        pub fn finish(&mut self) -> io::Result<()> { ... }
    }
Exact human output specification
Header row (4 columns, each left-justified in a 16-char field; last column has no trailing pad):
byte-by-byte:
- Offset (6) + 10 spaces (= 16 chars) +
- Length (6) + 10 spaces (= 16 chars) +
- Mapped to (9) + 7 spaces (= 16 chars) +
- File (4) + \n
Total: 53 bytes.
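The byte breakdown above maps directly onto Rust's left-justify format spec. This is a minimal sketch; `human_header` is a hypothetical helper name, not necessarily how the renderer's begin() is written.

```rust
/// Build the human-mode header: three labels left-justified in 16-char
/// fields, then the last label with no trailing pad, then a newline.
fn human_header() -> String {
    format!("{:<16}{:<16}{:<16}{}\n", "Offset", "Length", "Mapped to", "File")
}
```

The result is 16 + 16 + 16 + 4 + 1 = 53 bytes, matching the total above.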
Data row format:
- offset and length: lowercase hex with 0x prefix unless the value is zero, in which case just the literal 0. (E.g. 0x10000 for 65536, but 0 for 0.)
- mapped_to: same rule as offset/length.
- filename: the argv string the user passed, no quoting, no resolution. Phase 4 captures this from the MapArgs::input field.
Holes and zero-allocated extents are elided — no
human-mode row is emitted for them. This matches qemu-img:
only extents with data: true produce visible rows. The
empty-image case prints only the header line (verified
against qemu-img map on a freshly-created qcow2).
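The zero rule needs an explicit branch in Rust, because `{:#x}` renders zero as `0x0` rather than `0`. The sketch below shows one way to produce a data row under the rules above; `hex_or_zero` and `human_row` are hypothetical names, and the caller is assumed to have already filtered out non-data extents.

```rust
/// Lowercase hex with a `0x` prefix, or the literal `0` for zero.
fn hex_or_zero(v: u64) -> String {
    if v == 0 {
        "0".to_string()
    } else {
        format!("{:#x}", v)
    }
}

/// One human-mode data row: three 16-char left-justified columns,
/// then the filename verbatim, then a newline.
fn human_row(offset: u64, length: u64, mapped_to: u64, filename: &str) -> String {
    format!(
        "{:<16}{:<16}{:<16}{}\n",
        hex_or_zero(offset),
        hex_or_zero(length),
        hex_or_zero(mapped_to),
        filename
    )
}
```

For example, `human_row(0, 65536, 327680, "test.qcow2")` yields a row whose columns read `0`, `0x10000`, and `0x50000`, each padded to 16 characters.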
Exact JSON output specification
Opening: [ (1 byte, no newline).
Per-extent object (always on its own line, no leading whitespace):
{ "start": N, "length": N, "depth": 0, "present": B, "zero": B, "data": B, "compressed": false[, "offset": N]}
Field order is fixed: start, length, depth, present, zero,
data, compressed, offset. offset is emitted only when
data == true. compressed: false is always emitted
(qcow2 compressed-cluster reporting is a documented
divergence; see docs/quirks.md follow-up).
Inter-object separator: ,\n (comma then newline). The
final object has no trailing comma.
Closing: ] (1 byte). No trailing newline — qemu-img does
not emit one.
Whitespace inside each object:
- Single space after {; no space before } (as in the object template above).
- Single space after each :.
- Single space after each , (but the inter-object
separator is ,\n so this rule applies only inside the
object).
- Boolean values lowercase (true / false).
- Numbers are decimal (never hex, scientific notation, or
quoted).
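The whitespace and field-order rules can be captured in a single format string. This is a sketch following the per-extent template above; `json_object` is a hypothetical helper, and its flat argument list stands in for whatever fields MapExtentMessage actually carries.

```rust
/// Render one JSON extent object following the template:
/// fixed field order, `depth` and `compressed` as constants,
/// and `offset` appended only for data extents.
fn json_object(start: u64, length: u64, present: bool, zero: bool, data: bool, offset: u64) -> String {
    let mut s = format!(
        "{{ \"start\": {}, \"length\": {}, \"depth\": 0, \"present\": {}, \"zero\": {}, \"data\": {}, \"compressed\": false",
        start, length, present, zero, data
    );
    if data {
        s.push_str(&format!(", \"offset\": {}", offset));
    }
    // The closing brace follows the last value directly, as in the template.
    s.push('}');
    s
}
```

Rust's `Display` for `bool` already prints lowercase `true` / `false`, and `u64` prints as plain decimal, so neither needs special handling.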
State → triple → field-set mapping
| guest state | present | zero | data | emit offset |
|---|---|---|---|---|
| "hole" | false | true | false | no |
| "zero" | true | true | false | no |
| "data" | true | false | true | yes |
This matches the phase 3c map_state_triple table. Phase 4
keeps that helper as the source of truth; the field-set
order changes (compressed becomes a constant false).
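The table translates into a single match. The sketch below is an illustration only: `state_fields` is a hypothetical name (the real helper is map_state_triple), and representing the guest state as a string is an assumption based on the quoted values in the table.

```rust
/// Map a guest state name to (present, zero, data, emit_offset).
/// Returns None for a state the table does not list.
fn state_fields(state: &str) -> Option<(bool, bool, bool, bool)> {
    match state {
        "hole" => Some((false, true, false, false)),
        "zero" => Some((true, true, false, false)),
        "data" => Some((true, false, true, true)),
        _ => None,
    }
}
```

Keeping the mapping in one place means the human renderer (which only looks at the data flag) and the JSON renderer (which emits all three flags) cannot drift apart.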
Streaming pivot in run_map
Phase 3b's vCPU loop pushed MapExtentMessages into
extents: Vec<MapExtentMessage> and called
print_map_result(&extents, ...) after the loop. Phase 4
restructures:
- Before the vCPU loop: open stdout, wrap it in a BufWriter, construct MapRenderer, call begin(). The header / opening [ is emitted before any extent message arrives. On the error path (guest fails to start, KVM crashes) this leaves a small amount of wasted output, but it matches the streaming semantic: a user already reading the table sees the header before the failure.

  Alternative considered: defer begin() until the first MapExtent arrives. This avoids the wasted header on guest failure, at the cost of branching on first_extent inside the loop and of special-casing the empty image, which must still print the header. Open question 1 settles on emitting before the loop.
- In the loop:
  - On MapExtent payload: call renderer.emit_extent(&e). Errors propagate.
  - On MapResult payload: store + flag (same as phase 3b).
  - Other payloads: route via format_message when verbose.
- After the loop:
  - If MapResult had a non-OK error: call map_error_message(code) and write to stderr. Do not call renderer.finish(): the output may be a partial table or a JSON open bracket; the JSON consumer will see invalid output, but the user gets a clear stderr message.
  - If OK: call renderer.finish(). For JSON this writes ]; for human this is a no-op.
The renderer holds a &mut W so the vCPU loop owns the
underlying stdout lock. BufWriter performance: a 17M-
extent run does 17M small writes; flushing per extent
would dominate runtime. The BufWriter flushes on drop
(via finish() plus Drop) so the user sees the output
in order.
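The control flow described in this section can be sketched in JSON mode as follows. Everything here is a simplified stand-in: `Payload` and `run_loop` are hypothetical, extents arrive pre-rendered as strings, and the real loop is driven by vCPU exits rather than an iterator. Note that the sketch flushes explicitly, since `BufWriter`'s `Drop` implementation discards any flush error.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical stand-ins for the guest payloads seen by the vCPU loop.
enum Payload {
    /// A pre-rendered JSON object, for brevity.
    MapExtent(String),
    MapResult { ok: bool },
}

/// Stream extents to `out` as they arrive; close the array only on success.
/// Returns whether the guest reported success.
fn run_loop<W: Write>(out: W, payloads: impl Iterator<Item = Payload>) -> io::Result<bool> {
    let mut w = BufWriter::new(out);
    w.write_all(b"[")?; // begin(): emitted before any extent arrives
    let mut first = true;
    let mut succeeded = false;
    for p in payloads {
        match p {
            Payload::MapExtent(obj) => {
                if !first {
                    w.write_all(b",\n")?;
                }
                w.write_all(obj.as_bytes())?;
                first = false;
            }
            Payload::MapResult { ok } => succeeded = ok,
        }
    }
    if succeeded {
        w.write_all(b"]")?; // finish(): skipped on the error path
    }
    // Flush explicitly so write errors surface instead of being lost on drop.
    w.flush()?;
    Ok(succeeded)
}
```

Only one extent is ever held in memory at a time, which is what keeps host memory constant however many extents the guest sends.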
Tests (Vec-backed writer)
Unit tests construct MapRenderer against a Vec<u8>
and assert exact byte sequences. Test set:
- Empty extents, human: header only.
- Empty extents, JSON: [].
- Single data extent, human: byte-for-byte against the expected 16-char-column row.
- Single data extent, JSON: byte-for-byte object.
- Hole-only extent, human: header only (hole elided).
- Hole-only extent, JSON: one object with present: false, zero: true, data: false, compressed: false, no offset.
- Mixed extents (data + hole + zero-alloc), human: only the data row appears.
- Mixed extents, JSON: all three objects in order with ,\n separators.
- Zero offset / zero length value, human: the literal 0 (not 0x0).
- Large u64 file_offset, human: lowercase hex.
- Large u64 file_offset, JSON: decimal.
- compressed: false always emitted in JSON: grep-style assertion across all three states.
- Field order in JSON: regex match on "start":.*"length":.*"depth":.*"present":.*"zero":.*"data":.*"compressed".
- Filename preserved verbatim in human "File" column: pass a path with .. and a trailing space; assert the bytes match exactly.
Each test starts with MapRenderer::new(&mut buf, ..., "<filename>"),
calls begin(), emits extents, calls finish(), then
compares buf to expected &[u8] literal byte sequences
generated from qemu-img map output for a matching
synthetic image.
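As an illustration of the test shape, here is a cut-down, human-mode-only renderer driven against a `Vec<u8>`. The names `Extent`, `HumanRenderer`, and `render_human` are hypothetical stand-ins, not the real MapRenderer or MapExtentMessage; the point is the pattern of borrowing the buffer mutably, running begin / emit / finish, then comparing bytes.

```rust
use std::io::{self, Write};

/// Hypothetical stand-in for the guest's extent message.
struct Extent {
    start: u64,
    length: u64,
    data: bool,
    file_offset: u64,
}

/// Lowercase hex with a `0x` prefix, or the literal `0` for zero.
fn hex_or_zero(v: u64) -> String {
    if v == 0 { "0".to_string() } else { format!("{:#x}", v) }
}

/// Cut-down renderer with the same begin / emit_extent / finish shape.
struct HumanRenderer<'a, W: Write> {
    writer: &'a mut W,
    filename: String,
    extents_written: u64,
}

impl<'a, W: Write> HumanRenderer<'a, W> {
    fn new(writer: &'a mut W, filename: &str) -> Self {
        HumanRenderer { writer, filename: filename.to_string(), extents_written: 0 }
    }

    fn begin(&mut self) -> io::Result<()> {
        writeln!(self.writer, "{:<16}{:<16}{:<16}{}", "Offset", "Length", "Mapped to", "File")
    }

    fn emit_extent(&mut self, e: &Extent) -> io::Result<()> {
        if !e.data {
            return Ok(()); // holes and zero-allocated extents are elided
        }
        self.extents_written += 1;
        writeln!(
            self.writer,
            "{:<16}{:<16}{:<16}{}",
            hex_or_zero(e.start),
            hex_or_zero(e.length),
            hex_or_zero(e.file_offset),
            self.filename
        )
    }

    fn finish(&mut self) -> io::Result<()> {
        Ok(()) // human mode has no closing character
    }
}

/// Render into a Vec and return the bytes plus the count of rows written.
fn render_human(extents: &[Extent], filename: &str) -> io::Result<(Vec<u8>, u64)> {
    let mut buf: Vec<u8> = Vec::new();
    let written;
    {
        let mut r = HumanRenderer::new(&mut buf, filename);
        r.begin()?;
        for e in extents {
            r.emit_extent(e)?;
        }
        r.finish()?;
        written = r.extents_written;
    }
    Ok((buf, written))
}
```

A "mixed extents, human" test would pass one data extent and one hole to `render_human` and assert that the buffer holds exactly the header plus one row.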
Tear-down of phase 3c helpers
Phase 3c shipped:
- map_state_triple(state) -> (bool, bool, bool, bool)
- format_map_human(extents) -> String
- format_map_json(extents) -> String
- map_error_message(code) -> Option<&'static str>
- print_map_result(extents, result, output_format)
Phase 4 keeps map_state_triple (now part of MapRenderer)
and map_error_message (used by run_map for the error-
stderr path). Drops format_map_human, format_map_json,
and print_map_result — their tests in
map_renderer_tests get rewritten against MapRenderer.
The phase 3c unit tests that pinned structural invariants (state triple, error messages) survive verbatim. The format-output tests get replaced with byte-exact assertions.
docs/quirks.md additions
A new ## map subcommand quirks section with these
entries:
- --image-opts is rejected: same as measure; reuse wording.
- Backing-chain depth is always 0: chain composition is deferred (PLAN-map.md follow-up); depth is always 0 in v1. qemu-img walks the chain when present.
- Raw source sparseness: instar reports raw images as one fully-allocated data: true extent. qemu-img walks SEEK_HOLE / SEEK_DATA on the underlying file and reports sparse regions as present: true, zero: true, data: false. The host-side SEEK_HOLE prepass is tracked as future work.
- VHDX partial-present: instar treats PAYLOAD_BLOCK_PARTIALLY_PRESENT as data: true. qemu-img walks the per-sector bitmap and emits per-sector extents. Future work.
- VMDK multi-extent: refused with a clear error pointing at qemu-img.
- qcow2 compressed clusters: instar emits compressed: false for every extent, including compressed ones. qemu-img emits compressed: true for compressed-cluster extents. The phase 1 walker classifies compressed clusters as Data with a file offset but does not carry the compressed bit through the FFI / protobuf path. Future work: extend MapExtentRecord / MapExtentMessage with a compressed: bool field.
- Trailing newline: qemu-img emits no trailing newline after ] in JSON mode. instar matches this.
Open questions
1. Header emission timing: emit before the first extent (cleaner streaming semantics) or after KVM confirms the guest started (cleaner error path)? Recommendation: before. The header is independent of whether the guest succeeds; a partial table is no worse than a no-table-then-error sequence, and the streaming pivot is cleaner.
2. BufWriter flush on error: if an extent emit fails (stdout closed, broken pipe), should the renderer swallow the error or propagate? Recommendation: propagate, but downgrade BrokenPipe to a clean exit (the user piped to head or similar). Need to handle the std-lib pattern for BrokenPipe.
3. Filename column quoting: qemu-img passes the argv string verbatim. What about embedded newlines or tab characters? Recommendation: pass verbatim and accept that the produced output may be confusing; qemu-img does the same. A future hardening pass can shell-quote suspect strings.
4. JSON streaming and --output=json error paths: if the guest reports ERROR_HAS_BACKING after the [ has been written, the user sees a truncated [ on stdout plus the error on stderr. Phase 4 accepts this: JSON consumers should check the exit code, not parse stdout speculatively. Document in quirks.
5. Output profile machinery for map: the master plan said "add map-human / map-json output profiles if the baseline matrix reveals version drift". Phase 4 verifies against the current system qemu-img only (10.0.8 in dev). Phase 5's baseline sweep is where we discover whether older qemu-img versions diverged in output format. Recommendation: phase 4 does not add the profile machinery; phase 5 adds it if needed.
6. Output to a non-terminal: when stdout isn't a tty (piped into jq, redirected to a file), no extra handling is needed; println! / write! work the same way. The BufWriter lock-grab is the same.
7. extents_emitted from the guest vs. extents actually written: the guest's MapResultMessage counts every extent it sent. The renderer's extents_written may be lower (human mode skips holes). The host doesn't reconcile these because they serve different purposes (the guest count is for the streaming-protocol audit; the renderer count is internal). Document in the renderer's doc comment.
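For open question 2, one possible shape for the BrokenPipe downgrade is sketched below. `downgrade_broken_pipe` is a hypothetical helper, not the code that shipped. It relies on the fact that Rust programs ignore SIGPIPE by default, so a write to a closed pipe surfaces as an `io::Error` with kind `BrokenPipe` rather than killing the process.

```rust
use std::io::{self, ErrorKind};

/// Classify the result of a write to stdout.
/// Ok(true): keep going. Ok(false): the reader went away, stop and exit 0.
/// Err(_): a genuine I/O failure that should be reported.
fn downgrade_broken_pipe(res: io::Result<()>) -> io::Result<bool> {
    match res {
        Ok(()) => Ok(true),
        Err(e) if e.kind() == ErrorKind::BrokenPipe => Ok(false),
        Err(e) => Err(e),
    }
}
```

A caller would wrap each `emit_extent` (and the final flush) in this helper and break out of the loop on `Ok(false)`. Using `write!` with explicit error handling instead of `println!` matters here, because `println!` panics when the write fails.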
Execution
| Step | Effort | Model | Isolation | Brief for sub-agent |
|---|---|---|---|---|
| 4a | high | opus | none | Add the MapRenderer<'a, W: Write> struct + MapOutputFormat enum to src/vmm/src/main.rs per the schema in the Architecture section. Implement new, begin, emit_extent, finish. Human mode: 16-char-column rows, hex with lowercase 0x prefix or literal 0 for zeros, elide non-data extents. JSON mode: streaming [...] array with ,\n separators, compressed: false always emitted, offset only for data extents, no trailing newline. Add ≥14 unit tests in mod map_renderer_tests per the "Tests" section using a Vec<u8> writer and exact-byte assertions. Generate expected byte sequences by running qemu-img map against synthetic fixtures during development (commit the expected bytes inline). Keep map_state_triple and map_error_message from phase 3c (they will be reused by 4b). The renderer is unused by run_map until 4b lands — phase 3c's renderer still owns the call site. High effort because: byte-for-byte parity has many small traps (the literal 0 rule, the JSON ,\n separator, the field order, the compressed: false always-emitted rule) and each one is invisible to a structural test. Cross-check expected bytes against qemu-img map --output={human,json} for the matching synthetic images during 4a. Touches only src/vmm/src/main.rs. |
| 4b | medium | sonnet | none | Replace the phase 3c renderer in run_map: drop the extents: Vec<MapExtentMessage> collector and the print_map_result(&extents, &result, ...) call. Construct a MapRenderer before the vCPU loop, call renderer.begin(). In the loop's MapExtent arm call renderer.emit_extent(&e). After the loop, if the result has a non-OK error call map_error_message(code) to stderr; otherwise call renderer.finish(). Delete the obsolete format_map_human / format_map_json / print_map_result functions. Update the phase 3c map_renderer_tests to import from the new MapRenderer-based shape (or delete the tests that targeted the removed functions; the byte-exact tests in 4a cover the new contract). Handle BrokenPipe cleanly per Open question 2. Run make instar, make lint, make test-rust. The test count should remain ≥ 870 (phase 3c had +18; phase 4a adds ≥14 new; removing the deleted-function tests subtracts some, net should be a small positive). |
| 4c | low | sonnet | none | Add a new ## map subcommand quirks section to docs/quirks.md between the measure subcommand quirks (line 791 area) and the next major section. Cover the seven entries listed in the "docs/quirks.md additions" section above. Each entry follows the format used by the measure section: short heading, two-paragraph body where applicable, cross-reference to PLAN-map.md or the relevant source code. No code changes — pure docs. Run pre-commit run --all-files. |
| 4d | low | sonnet | none | Update ARCHITECTURE.md's operations/map/ entry to mention that phase 4 has shipped: byte-for-byte human / JSON output via the MapRenderer struct, streaming emission, documented divergences in docs/quirks.md. Update CHANGELOG.md Unreleased / Added with one line crediting phase 4. Run pre-commit run --all-files. |
Total: 4 commits.
Out of scope for phase 4
- Cross-version baseline generation (phase 5).
- map-human / map-json output profile machinery (phase 5 if baselines reveal drift).
- Integration tests against real testdata (phase 6).
- Coverage-guided fuzz harness (phase 7).
- Differential fuzz against qemu-img map (phase 8).
- Compressed-cluster reporting fix (future work; the walker / FFI / proto path needs a new bit).
- Raw SEEK_HOLE host-side prepass (future work).
- VHDX partial-present per-sector walk (future work).
- Backing-chain composition (future work).
Success criteria
- MapRenderer<'a, W: Write> lands in src/vmm/src/main.rs with new, begin, emit_extent, finish.
- run_map uses MapRenderer for output; no Vec<MapExtentMessage> host-side buffering remains.
- For a hand-crafted fragmented qcow2 fixture:
  - instar map --output=json FIXTURE produces byte-identical output to qemu-img map --output=json FIXTURE (modulo the documented divergences).
  - instar map --output=human FIXTURE produces byte-identical output to qemu-img map --output=human FIXTURE.
- For the same source images:
  - Holes are elided in human output.
  - compressed: false appears in every JSON object.
  - No trailing newline after ].
- make instar builds.
- make lint clean.
- make test-rust passes; renderer tests assert exact byte sequences.
- pre-commit run --all-files clean.
- docs/quirks.md documents the seven map-specific divergences.
- ARCHITECTURE.md and CHANGELOG.md reflect phase 4.
Risks and mitigations
- Byte-for-byte traps: the literal-0 rule and the compressed always-emit rule are both invisible to structural tests. Mitigation: 4a writes the expected byte sequences inline in the test source, generated by running qemu-img during development. Any future qemu-img format change will fail the byte-exact tests and surface a baseline-update task.
- Streaming write performance: 17M extents through a BufWriter<Stdout> should be no slower than the Vec-then-render path. Mitigation: phase 4 doesn't micro-optimise; phase 7 / 8 (fuzz / differential) will surface any pathological cases.
- BrokenPipe error spam: piping instar map into head produces an EPIPE after head closes its end of the pipe. Mitigation: 4b catches BrokenPipe specifically and exits 0 (matches what coreutils does for the same pattern).
- Test fragility: byte-exact tests against the current qemu-img will break when qemu-img changes format. Mitigation: the cross-version baseline matrix in phase 5 handles version-keyed expected outputs; phase 4's tests pin only the current-system output. If qemu-img 11.x changes format, phase 5's output-profile machinery absorbs it.
- compressed: false divergence: emitting the field as a literal false even for compressed clusters produces output that differs from qemu-img for the niche compressed-cluster case. Mitigation: documented in quirks; the differential fuzzer (phase 8) is configured to skip the compressed-cluster paths pending the FFI extension.
- Header before guest failure: writing the human header before the guest has produced any output means a guest startup failure leaves the header line (Offset Length Mapped to File\n, with its column padding) on stdout. Mitigation: documented in quirks; the exit code distinguishes success from failure. In JSON mode the equivalent residue is a lone [, which is not valid JSON on its own, so consumers must rely on the exit code there too.
Back brief
Before executing any step, the executing agent should
back-brief: which file is being edited (almost always
src/vmm/src/main.rs), which existing function is the
closest template (MapRenderer is genuinely new —
nothing in the codebase has the exact shape; the
streaming pattern is closest to how convert's
progress messages drive send_progress, but the
renderer is simpler), and which expected byte sequences
in the tests came from running qemu-img map against
which synthetic fixtures. The reviewer should verify no
step bleeds into phase 5 (baselines), phase 6
(integration tests), or phase 7 / 8 (fuzz).