# PLAN-resize phase 9: preallocation modes
## Prompt
Before responding to questions or discussion points in this
document, explore the instar codebase thoroughly. Read relevant
source files, understand existing patterns (VMM structure, guest
operation layout, shared crate conventions, call table ABI,
format parsing, test infrastructure), and ground your answers in
what the code actually does today. Do not speculate about the
codebase when you could read it instead. Where a question touches
on external concepts (POSIX posix_fallocate, Linux
fallocate(FALLOC_FL_ZERO_RANGE), qemu-img preallocation
semantics), research as needed to give a confident answer. Flag
any uncertainty explicitly rather than guessing.
This is a phase plan under PLAN-resize.md. Refer to that
master plan for overall context. Phases 1–8 shipped the planner,
the guest binary, and the host CLI; phase 9 lifts the
deferred-from-phase-8 --preallocation=falloc|full post-pass.
## Mission
Replace the two "deferred to phase 9" rejections in `run_resize_raw` and `run_resize_nonraw` with the actual host-side preallocation finalisation:

- `off`: no post-pass (the default; already handled).
- `falloc`: `posix_fallocate` over the newly-added file region. Reserves disk blocks without writing data.
- `full`: zero the newly-added file region. Tries `fallocate(FALLOC_FL_ZERO_RANGE)` first (fast: no data is actually written on btrfs / ext4 / xfs); falls back to a 64 KiB `pwrite` loop on filesystems that don't support it (tmpfs, NFS, some FUSE).
- `metadata`: the qcow2/vhdx grow planners are responsible for this mode (master plan, Per-format resize plans). Phase 9 leaves the planner's existing `PreallocationUnsupported` rejection in place; lifting it is queued under Future work.

The "newly-added file region" is interpreted format-by-format:

- raw + grow: `[current_virtual_size, new_virtual_size)`. Matches qemu exactly.
- raw + shrink: no post-pass. `falloc`/`full` + shrink is rejected up-front as nonsensical (you can't preallocate space you're discarding).
- non-raw + grow: `[current_file_size, new_file_size)`, i.e. the bytes the planner physically appended past the pre-resize EOF (the new L1 region for qcow2 L1Grow, the new BAT region for vhd / vhdx / vmdk relocate, etc.). This is a deliberate divergence from qemu, which preallocates the entire data-region span for the new virtual size. Reaching full qemu parity requires per-format walk-and-populate logic comparable to a `dd if=/dev/zero` over the data region; queued under Future work.
- non-raw + shrink: no post-pass. Same rationale as raw.

The master plan's note ("the host knows total_file_size from ResizeResult and can fallocate the rest") establishes this interpretation: the host preallocates the file's new region, not the format's data region. Phase 13 documents the divergence in docs/quirks.md.
## What the survey turned up
- `apply_preallocation` at `src/vmm/src/main.rs:8511-8549` already does the heavy lifting: it takes `(file, mode, offset, len)`, dispatches on `"falloc"`/`"full"`, and calls `posix_fallocate` or `fill_zeros`. It is reusable except for one detail: its error-message prefix (`"create: posix_fallocate failed"`) is hard-coded to the create operation.
- `fill_zeros` at `src/vmm/src/main.rs:8432-8499` does the `fallocate(FALLOC_FL_ZERO_RANGE)` + write-loop fallback. Already operation-agnostic.
- The current phase-8 raw rejection lives at the top of `run_resize_raw`'s body: it rejects `metadata`/`falloc`/`full` with a "deferred to phase 9" message.
- The current phase-8 non-raw rejection lives at the top of `run_resize_nonraw`'s body: it rejects `falloc`/`full` with a "deferred to phase 9" message. `metadata` is routed through to the guest, whose planner currently rejects it with `PREALLOCATION_UNSUPPORTED`.
- `run_create_nonraw` at `src/vmm/src/main.rs:7191-7231` is the reference call site for `apply_preallocation` (used for qcow2's data-region post-pass).
## Algorithmic design
### Shared helper: make `apply_preallocation` op-agnostic
The existing error messages hard-code "create:". Phase 9
parameterises this so the resize call sites surface
"resize:" instead:
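```rust
// Signature only: the body is unchanged apart from using `op_name`
// in place of the hard-coded "create:" prefix.
fn apply_preallocation(
    file: &std::fs::File,
    op_name: &str,    // NEW: "create" or "resize"
    mode: &str,       // "off" | "falloc" | "full"
    data_offset: u64, // start of the region to preallocate
    data_len: u64,    // length of the region; 0 is a no-op
) -> Result<(), Box<dyn std::error::Error>>;
```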
fill_zeros stays op-agnostic (its caller wraps the error
with the op prefix already; resize follows the same pattern).
The two existing apply_preallocation call sites in
run_create_nonraw are updated to pass "create".
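To make the `op_name` threading concrete, here is a minimal sketch of the dispatch the parameterised helper performs. It is not the existing code: the name `apply_preallocation_sketch`, the `"zero-fill failed"` wording, the handling of unknown modes, and the `String` error type (the real helper returns `Box<dyn Error>`) are all assumptions. The two backends are injected as closures standing in for `posix_fallocate` and `fill_zeros`, so the sketch runs without libc.

```rust
use std::io;

/// Hypothetical sketch of the op-agnostic dispatch inside apply_preallocation.
/// `falloc` stands in for posix_fallocate, `zero` for fill_zeros.
fn apply_preallocation_sketch<A, Z>(
    op_name: &str,
    mode: &str,
    data_offset: u64,
    data_len: u64,
    falloc: A,
    zero: Z,
) -> Result<(), String>
where
    A: FnOnce(u64, u64) -> io::Result<()>,
    Z: FnOnce(u64, u64) -> io::Result<()>,
{
    // "off" and empty ranges are no-ops (the behaviour the unit tests expect).
    if mode == "off" || data_len == 0 {
        return Ok(());
    }
    match mode {
        // The caller-supplied op_name replaces the hard-coded "create:".
        "falloc" => falloc(data_offset, data_len)
            .map_err(|e| format!("{op_name}: posix_fallocate failed: {e}")),
        "full" => zero(data_offset, data_len)
            .map_err(|e| format!("{op_name}: zero-fill failed: {e}")),
        other => Err(format!("{op_name}: unsupported preallocation mode {other:?}")),
    }
}
```

Injecting the backends this way also previews the closure style that Test surface recommends for `fill_zeros_inner`.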
### Raw post-pass
```rust
use std::fs::OpenOptions;

fn run_resize_raw(...) -> Result<(), ...> {
    // ... validation as today ...

    // Reject shrink + falloc/full combinations up-front; they
    // are nonsensical (preallocating space we're discarding).
    let mode = args.preallocation.as_str();
    if matches!(mode, "falloc" | "full") && new_virtual_size < probed.current_virtual_size {
        return Err(format!(
            "resize: --preallocation={mode} is meaningless when shrinking"
        ).into());
    }
    // Raw has no metadata to populate.
    if mode == "metadata" {
        return Err("resize: --preallocation=metadata is not supported for raw".into());
    }

    let file = OpenOptions::new().read(true).write(true).open(&args.filename)?;
    file.set_len(new_virtual_size)?;

    // Phase 9: post-pass the newly-added region only. For shrink
    // (rejected above for falloc/full), noop (size unchanged) and
    // mode "off", nothing is preallocated.
    if new_virtual_size > probed.current_virtual_size {
        let added = new_virtual_size - probed.current_virtual_size;
        apply_preallocation(&file, "resize", mode, probed.current_virtual_size, added)?;
    }
    file.sync_all()?;
    // ... render success as today ...
}
```
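The raw decision logic above can be pulled into a pure function and unit-tested without touching a file. A minimal sketch; `raw_prealloc_range` is a hypothetical name, and returning `None` for `off` assumes `apply_preallocation` would treat `off` as a no-op anyway:

```rust
/// Hypothetical helper: validate a raw resize request up-front and return the
/// (offset, len) range to post-pass, or None when nothing needs doing.
fn raw_prealloc_range(mode: &str, current: u64, new: u64) -> Result<Option<(u64, u64)>, String> {
    // Preallocating space that is about to be discarded is rejected.
    if matches!(mode, "falloc" | "full") && new < current {
        return Err(format!("resize: --preallocation={mode} is meaningless when shrinking"));
    }
    // Raw has no metadata to populate.
    if mode == "metadata" {
        return Err("resize: --preallocation=metadata is not supported for raw".to_string());
    }
    // "off", shrink and same-size resizes need no post-pass.
    if mode == "off" || new <= current {
        return Ok(None);
    }
    // Grow: exactly the bytes past the old virtual size.
    Ok(Some((current, new - current)))
}
```

The caller would then `set_len`, hand a `Some((offset, len))` result to `apply_preallocation`, and `sync_all`, as in the sketch above.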
### Non-raw post-pass
```rust
use std::fs::OpenOptions;

fn run_resize_nonraw(...) -> Result<(), ...> {
    // ... existing validation ...

    // Phase 9: reject falloc/full + shrink BEFORE run_resize_guest, so a
    // rejected request never touches the image. Checking result.action
    // after the guest runs would report an error for an image the guest
    // has already shrunk. (Assumes the probed current virtual size and
    // the requested new size are in scope here, as in the raw path.)
    let mode = args.preallocation.as_str();
    if matches!(mode, "falloc" | "full") && new_virtual_size < probed.current_virtual_size {
        return Err(format!(
            "resize: --preallocation={mode} is meaningless when shrinking"
        ).into());
    }

    // ... run_resize_guest call as today, yielding `result` ...

    // Phase 9: post-pass the newly-appended file region
    // [file_size_before, file_size_after). This runs BEFORE set_len so we
    // operate on the physical appended bytes; set_len then trims/extends
    // to the exact EOF the planner reported.
    let file = OpenOptions::new().read(true).write(true).open(&args.filename)?;
    if matches!(mode, "falloc" | "full")
        && result.file_size_after > result.file_size_before
    {
        let added = result.file_size_after - result.file_size_before;
        apply_preallocation(&file, "resize", mode, result.file_size_before, added)?;
    }
    file.set_len(result.file_size_after)?;
    file.sync_all()?;
    // ... render success as today ...
}
```
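The file is opened read+write here instead of write-only as in 8b. `posix_fallocate` and `fallocate(FALLOC_FL_ZERO_RANGE)` themselves only need a writable fd, and `fill_zeros`' `pwrite` fallback doesn't need read access either. The read side matters for glibc's `posix_fallocate`: on filesystems without native `fallocate(2)` support it falls back to an emulation that `pread`s each block before writing to it, and that read fails on a write-only fd (based on glibc's generic implementation; worth confirming against the glibc version in use).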
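Unlike raw, the non-raw range comes from the file sizes the planner reports, not from virtual sizes. A minimal sketch of that arithmetic, assuming a hypothetical `ResizeSizes` struct that mirrors `ResizeResult`'s `file_size_before` / `file_size_after` fields; any falloc/full + shrink rejection is assumed to have happened earlier, so this only computes the range:

```rust
/// Hypothetical mirror of the two ResizeResult fields the post-pass reads.
struct ResizeSizes {
    file_size_before: u64,
    file_size_after: u64,
}

/// Returns the (offset, len) range to hand to apply_preallocation, or None
/// when the post-pass has nothing to do.
fn appended_range(mode: &str, r: &ResizeSizes) -> Option<(u64, u64)> {
    // Only falloc and full post-pass; "off" needs nothing, and "metadata"
    // has already been rejected by the guest planner by this point.
    if !matches!(mode, "falloc" | "full") {
        return None;
    }
    // checked_sub yields None if the planner left the file smaller.
    let added = r.file_size_after.checked_sub(r.file_size_before)?;
    // An unchanged file size also means there is nothing to preallocate.
    (added > 0).then_some((r.file_size_before, added))
}
```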
### The metadata-mode story
metadata is the qcow2/vhdx-specific mode where the new L2 /
BAT entries are pre-populated to point at zero clusters.
Phase 2c's PreallocationUnsupported rejection in the qcow2
planner stays in place; phase 5b's VHDX planner does the
same. Phase 9 doesn't lift either rejection — it stays
queued under master-plan Future work ("QCOW2 grow with
Preallocation::Metadata").
For phase 9, when the user passes `--preallocation=metadata`:

- raw: rejected with `"resize: --preallocation=metadata is not supported for raw"` (no metadata to populate).
- non-raw: routed through to the guest; the planner returns `PreallocationUnsupported`; the host maps this to `"preallocation mode not supported by this format"` via the existing `map_resize_error`. Matches the post-phase-2c behaviour.
### Shrink + preallocation rejection
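`falloc` / `full` only make sense when growing. Phase 9 rejects them up-front for shrink with `resize: --preallocation=<mode> is meaningless when shrinking`.

(qemu's behaviour, as understood when this plan was written: it accepts the combination silently and performs the shrink without preallocating, so matching qemu would mean silently ignoring the flag. We reject explicitly for clarity; docs/quirks.md documents this as a deliberate divergence. Unverified: qemu-img's resize path may instead refuse any non-growing resize that requests preallocation, with an error along the lines of "Preallocation can only be used for growing images". Check the qemu source before writing the quirks entry; if it does, this is parity rather than a divergence.)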
### Sparse-format data-region preallocation: deferred
As noted in the mission, phase 9 does not match qemu's behaviour of preallocating the entire data-region span for sparse formats. The Future-work entry will land per-format data-region preallocation passes (qcow2: walk every L1 entry, allocate L2 tables and data clusters; vhdx: write every BAT entry's data block; vmdk: walk every GD entry; vhd: walk every BAT entry). Each format's pass is comparable in complexity to a half-create operation.
## Test surface
Unit tests live alongside the existing `parse_*` tests in `resize_size_parser_tests`:

- `apply_preallocation` no-ops for `mode="off"` and `len=0`.
- `apply_preallocation` falloc-mode allocates the requested range (verified via the `stat -c %b` block count).
- `apply_preallocation` full-mode zeros the range (verified by reading back and asserting all-zero).
- `apply_preallocation` full-mode's write-loop fallback triggers when `FALLOC_FL_ZERO_RANGE` returns `EOPNOTSUPP` (mocked via a test wrapper that intercepts `fallocate`).
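Both verifications need small helpers. A minimal sketch using only `std` on Unix: `MetadataExt::blocks()` reports `st_blocks`, the same count `stat -c %b` prints, so the falloc test need not shell out, and `range_is_zero` reads the range back for the full-mode test. Both helper names are hypothetical.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::{FileExt, MetadataExt};

/// Bytes the filesystem has allocated to `file`. st_blocks (the figure
/// `stat -c %b` prints) is reported in 512-byte units.
fn allocated_bytes(file: &File) -> io::Result<u64> {
    Ok(file.metadata()?.blocks() * 512)
}

/// True if every byte in [offset, offset + len) reads back as zero.
/// Reads in 64 KiB chunks with pread so the file position is untouched.
fn range_is_zero(file: &File, offset: u64, len: u64) -> io::Result<bool> {
    let mut buf = vec![0u8; 64 * 1024];
    let end = offset + len;
    let mut pos = offset;
    while pos < end {
        let want = (end - pos).min(buf.len() as u64) as usize;
        file.read_exact_at(&mut buf[..want], pos)?;
        if buf[..want].iter().any(|&b| b != 0) {
            return Ok(false);
        }
        pos += want as u64;
    }
    Ok(true)
}
```

A falloc test could then compare `allocated_bytes` before and after the call; how much extra a filesystem allocates beyond the requested length varies, so an "at least `len` more" assertion is the portable form.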
The fallback-path test is the trickiest. The current code
calls libc::fallocate directly, with no abstraction layer
to mock. Options:
- Add a `fill_zeros_with_fallback` variant that takes a closure for the fallocate call, so tests can inject a no-op closure that returns `EOPNOTSUPP`.
- Skip the fallback test in unit tests; rely on the integration suite plus a feature flag that disables the fallocate fast path during testing.
- Test on a filesystem that doesn't support `FALLOC_FL_ZERO_RANGE` (e.g. tmpfs in some configurations).

Recommendation: option 1, named `fill_zeros_inner`. It takes a `fallocate_fn` closure; production callers pass the real syscall, tests pass a closure that returns `EOPNOTSUPP`. `fill_zeros` becomes a thin wrapper.
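A minimal sketch of that split, assuming a closure signature `Fn(&File, u64, u64) -> io::Result<()>` (hypothetical; a raw fd would work equally well). `EOPNOTSUPP` is hard-coded as 95, its value on x86-64 and aarch64 Linux, to keep the sketch free of the libc crate; production code would use `libc::EOPNOTSUPP` and pass a closure wrapping `libc::fallocate(fd, libc::FALLOC_FL_ZERO_RANGE, ...)`.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;

/// EOPNOTSUPP on x86-64 / aarch64 Linux (hard-coded for this sketch only).
const EOPNOTSUPP: i32 = 95;
/// Chunk size of the pwrite fallback (64 KiB, as in the existing fill_zeros).
const CHUNK: usize = 64 * 1024;

/// Hypothetical fill_zeros_inner: try the injected fast path and fall back
/// to a pwrite loop only when it reports EOPNOTSUPP.
fn fill_zeros_inner<F>(file: &File, offset: u64, len: u64, fallocate_fn: F) -> io::Result<()>
where
    F: Fn(&File, u64, u64) -> io::Result<()>,
{
    match fallocate_fn(file, offset, len) {
        Ok(()) => return Ok(()),
        // Unsupported by this filesystem: fall through to the write loop.
        Err(e) if e.raw_os_error() == Some(EOPNOTSUPP) => {}
        // Any other error (EIO, ENOSPC, ...) bubbles up unchanged.
        Err(e) => return Err(e),
    }
    let zeros = vec![0u8; CHUNK];
    let end = offset + len;
    let mut pos = offset;
    while pos < end {
        let n = (end - pos).min(CHUNK as u64) as usize;
        file.write_all_at(&zeros[..n], pos)?;
        pos += n as u64;
    }
    Ok(())
}
```

The three cases from step 9a's brief map directly onto closures: a succeeding closure means the write loop never runs, `EOPNOTSUPP` triggers the 64 KiB loop, and any other errno (here `EIO`) propagates.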
End-to-end integration tests (raw + qcow2 + vhd + vhdx + vmdk × off / falloc / full / metadata) land in phase 11.
## Public API delta
None outside src/vmm/src/main.rs. The apply_preallocation
signature changes (adds op_name) but it's a private
function within the VMM binary — no external callers.
## Open questions
- Should the non-raw post-pass match qemu's data-region semantics? v1 doesn't; documented divergence. Recommendation: add the per-format data-region walk in a follow-up phase once user demand is clear; for now the host-side post-pass is "preallocate the appended region of the file", which is a tiny fraction of what qemu does. Queue under master-plan Future work.
- Should `--preallocation=metadata` for non-raw + grow silently succeed when the planner rejects it? No: the user explicitly asked for a mode the planner doesn't support, so surfacing the rejection is the right behaviour.
- `falloc` + shrink rejection wording. qemu is believed to accept silently (verify against qemu-img's resize code before documenting); we reject. Recommendation: reject with a clear message (`resize: --preallocation=<mode> is meaningless when shrinking`) so the user notices they've passed conflicting flags. Document the divergence.
- Order of `apply_preallocation` vs `set_len` for non-raw. `apply_preallocation` first (it operates on the `[file_size_before, file_size_after)` range the planner already wrote), then `set_len` to commit the exact EOF. The other order would have `set_len` extend the file with sparse zeros and `apply_preallocation` then materialise them; also correct, but it adds latency between the guest's writes and the post-pass. Recommendation: post-pass first, then `set_len`. The planner's `file_size_after` is already the target EOF; the post-pass just guarantees the bytes are physically allocated.
- `fsync` ordering. A single `sync_all` after both the post-pass and `set_len` makes the allocation and the final size durable together. fsync provides durability, not atomicity: a crash before it returns can still leave either change partially applied. Matches the existing `run_create_raw` flow.
- `apply_preallocation` error mapping. `apply_preallocation` currently returns `Box<dyn Error>`, so the resize call sites can propagate it directly. The host's stderr message is whatever the helper produced; we don't wrap it.
- `metadata` mode for raw. qemu's "qemu-img: Preallocation can only be set for image creation" error is create-specific, so it doesn't settle this. For resize, qemu is believed to accept `--preallocation=metadata` on raw as a no-op (raw has no metadata); unverified, since qemu's raw backend may instead reject `metadata` as an unsupported truncate mode. Options: match the accept-as-no-op behaviour, or reject explicitly. Phase 9 takes the explicit-reject path for clarity.
- The `fill_zeros_inner` injection. If we don't want to add closure-based injection just for tests, an alternative is a conditional-compilation flag (`#[cfg(test)]`) that swaps the implementation: less ergonomic, but a smaller change. Recommendation: closure injection. Its runtime cost is negligible (at most one indirect call per `fill_zeros` invocation, next to a syscall), and the test ergonomics are much better.
## Execution
| Step | Effort | Model | Isolation | Brief for sub-agent |
|---|---|---|---|---|
| 9a | medium | sonnet | none | Make apply_preallocation op-agnostic. In src/vmm/src/main.rs, change its signature to add an op_name: &str argument that gets used in error messages (replacing the hard-coded "create:" prefix). Refactor fill_zeros to expose a fill_zeros_inner(fd, offset, length, fallocate_fn) taking a closure; the public fill_zeros calls it with the real libc::fallocate. Update the two existing apply_preallocation call sites in run_create_nonraw to pass "create". Add unit tests for fill_zeros_inner covering (a) the fast path (closure returns success), (b) the EOPNOTSUPP fallback (closure returns Err(EOPNOTSUPP) → write-loop fires + verifies bytes are zero on a tempfile), (c) the unrelated-error path (closure returns Err(EIO) → bubbles up). make instar, make lint, make test-rust, pre-commit run --all-files clean. |
| 9b | medium | sonnet | none | Wire the resize post-pass. In run_resize_raw (in src/vmm/src/main.rs), replace the existing falloc/full/metadata rejection with: reject metadata for raw with the message documented above; reject falloc/full + shrink as meaningless; otherwise call apply_preallocation(&file, "resize", mode, current_virtual_size, added) after set_len and before sync_all, where added = new_virtual_size.saturating_sub(current_virtual_size). In run_resize_nonraw, replace the falloc/full rejection: reject falloc/full + shrink up-front, before run_resize_guest launches the guest, so a rejected request never modifies the image; open the output file with read+write (not write-only as it is today); when result.file_size_after > result.file_size_before, call apply_preallocation(&file, "resize", mode, result.file_size_before, result.file_size_after - result.file_size_before) after the guest finishes and before the final set_len(result.file_size_after). Verify make instar builds. Add a smoke test that constructs a ResizeArgs with each preallocation mode and runs Cli::try_parse_from (no actual KVM launch; it just verifies clap still accepts every mode). |
## Out of scope for phase 9
- Per-format data-region preallocation walks for sparse formats (qemu's actual behaviour for `falloc`/`full` on qcow2 / vhdx / vmdk / vhd-dynamic). Queued under master-plan Future work.
- Lifting the planner's `PreallocationUnsupported` rejection for qcow2 `metadata` mode (master-plan Future work).
- Cross-version baselines (phase 10).
- End-to-end integration tests against real images (phase 11).
- Differential fuzzing (phase 12).
- Documentation, CHANGELOG, follow-ups (phase 13).
## Success criteria for phase 9
- `cargo build -p instar` clean.
- `make instar` builds.
- `make lint`, `pre-commit run --all-files` clean.
- `make test-rust` passes; new `fill_zeros_inner` tests succeed.
- `instar resize -f raw --preallocation=falloc foo.raw +1G` succeeds end-to-end (verified manually if `/dev/kvm` is available; otherwise verified by phase 11).
- `--preallocation=falloc` + shrink rejected with the documented message.
- `--preallocation=metadata` for raw rejected with the documented message.
- `apply_preallocation`'s `"create:"` prefix is replaced with the caller's `op_name`; both `run_create_nonraw` and the resize call sites pass the right name.
## Sub-agent guidance
Read these files before starting any step:
- `src/vmm/src/main.rs:8432-8549` (`fill_zeros` + `apply_preallocation`).
- `src/vmm/src/main.rs:7191-7231` (the `run_create_nonraw` call site that currently passes `"create"`-style error prefixes through the existing helper).
- `src/vmm/src/main.rs::run_resize_raw` (the function whose rejection branch phase 9 replaces).
- `src/vmm/src/main.rs::run_resize_nonraw` (same, for non-raw).
- `src/vmm/src/main.rs::ResizeArgs` (the `preallocation` field is already there from 8a; nothing to add).

The management session review checklist is the same as prior phases.