# Instar and qemu-img security audit
## Prompt
Before responding to questions or discussion points in this document, explore the instar codebase thoroughly. Read relevant source files, understand existing patterns (project structure, command-line argument handling, input source abstractions, output formatting, error handling), and ground your answers in what the code actually does today. Do not speculate about the codebase when you could read it instead. Where a question touches on external concepts (disk image format specs, KVM isolation, compression algorithms, CVE databases), research as needed to give a confident answer. Flag any uncertainty explicitly rather than guessing.
## Situation
We have been working on instar, a security-focused replacement for
qemu-img, for a while now. Our focus so far has been on basic
functionality such as implementing the info, check, and convert
subcommands. However, we must not forget that our primary driver is
to provide a command-line-compatible replacement for qemu-img which
resolves the security issues that qemu-img has. This planning document
is an attempt to ensure we keep that focus as we get closer to
releasing instar.
## Mission and problem statement
Your goal in this plan is to act as an adversarial pentester. You
are authorised to run as many research agents as required in order
to achieve your goal -- which is to find exploitable vulnerabilities
in instar. Those vulnerabilities should be tracked in this document.
It is likely that referring to qemu-img or the greater qemu code
base might help you find flaws, and any flaws you find in those tools
should be tracked in this document as well.
## Existing test image corpus
The `shakenfist/instar-testdata` repository contains test images
used by the integration test suite, defined in `tests/manifest.json`
(currently 44 images). This corpus is a valuable asset for seeding
fuzzing and validating security properties, but it has significant
gaps that this audit should fill.
### What we already have
- Real-world images: CirrOS, Debian (multiple architectures), Shaken Fist production images, QEMU iotest images, Disk2VHD.
- Edge cases: Min/max cluster sizes (512B, 2MB), refcount widths (1-bit, 64-bit), extended L2, ZSTD compression, lazy refcounts, dirty/corrupt feature bits.
- Backing chains: Two-layer and three-layer chains, cross-format chains (QCOW2 on VMDK).
- Raw format variants: MBR, GPT, no-partition-table, sparse, truncated, corrupted, misleading headers, minimal 1-byte.
- Malicious/CVE images: Backing file path traversal (CVE-2015-5163), external data file (CVE-2024-32498), VMDK path traversal, unknown incompatible feature bits.
- Malformed images: AFL-discovered VHD/VMDK, corrupt format headers (VMDK version, VMDK descriptor, VHDX region, VHD disk type), overlapping clusters, refcount errors, leaked clusters.
- Format detection: VDI, QED, ISO, LUKS v1/v2 (detection only, not full parsing).
### Gaps to fill during this audit
The following categories of adversarial images are not yet in the corpus and should be created during Phase 2:
- Compression bombs: QCOW2 with extreme expansion ratio compressed clusters. VMDK with DEFLATE grains that decompress to excessive sizes.
- Circular backing chains: A->B->A and A->B->C->A loops.
- Deep backing chains: Chains exceeding the 16-device limit.
- Integer overflow triggers: QCOW2 with L1 table size near `u32::MAX`, cluster_bits at boundaries, refcount_order yielding extreme widths.
- Polyglot/format confusion: Files valid as two formats simultaneously. QCOW2 magic with VMDK body.
- Truncated structured formats: QCOW2 with header cut short mid-field. VMDK truncated after magic. VHD with footer at wrong offset. VHDX with partial metadata region.
- Oversized field values: Virtual size claiming petabytes. L1/refcount table offsets beyond EOF. Snapshot counts in the millions.
- VMDK-specific: Zero grain size. Extremely large grain size. Multi-extent descriptors (should be rejected). Descriptor with embedded newlines/control characters.
- VHD/VHDX-specific: BAT entries pointing beyond EOF. Block size of zero. VHDX with conflicting dual headers (different sequence numbers, both with valid CRC).
All new test images should be added to `instar-testdata/custom/`
under an appropriate subdirectory (e.g. `custom/audit/`) and
registered in `tests/manifest.json` with `safety: "malicious"` or
`safety: "malformed"` as appropriate.
## Open questions

- How do we find security flaws in `instar` and `qemu-img`? What techniques apart from reading the code are we going to use?
Answer: We should use a layered approach combining multiple techniques, each of which catches different classes of bugs:
- Differential fuzzing (random walks) -- finds behavioral divergences and crashes via randomized operation sequences.
- Adversarial image crafting -- hand-crafted malformed images targeting specific vulnerability classes (integer overflows, compression bombs, backing chain attacks, format confusion).
- CVE reproduction -- confirm that every known qemu-img CVE is mitigated by instar's architecture.
- Static analysis and unsafe code review -- audit all `unsafe` blocks, review integer arithmetic for overflow potential, and run `cargo clippy` / `cargo audit` for known issues.
- Boundary/interface auditing -- specifically audit the VMM (host-side) code since bugs there bypass the KVM sandbox entirely.
- Coverage-guided fuzzing -- use `cargo-fuzz` (libFuzzer) or AFL on the `no_std` parser crates directly to find crashes in format parsing code without needing the full VMM.
## Progress

| Phase | Status | Branch | Notes |
|---|---|---|---|
| 1a. Unsafe code audit | Done | static-analysis | All unsafe blocks classified (0 unsound, 1 fragile in VMM, ~5 fragile in guest FFI boundary). SAFETY comments added to VMM. |
| 1b. Integer arithmetic review | Done | static-analysis | VHDX BAT `as u32` truncation found and fixed (`u32::try_from().ok()?`). All checked arithmetic patterns verified. |
| 1c. Static analysis tooling | Done | static-analysis | Nightly clippy clean (12 lints fixed), cargo audit 0 vulns, truncating cast table added to docs/security-audits.md. |
| 2. Adversarial image crafting | Done | adversarial-testing | 61 adversarial images across 12 categories. Scripts in instar-testdata/scripts/. |
| 3. Differential fuzzing | Done | differential-fuzzing | 1070-line fuzzer with libyal cross-validation. CI workflow: 100/200/1000 iterations (PR/merge/nightly). Auto-files issues on divergences. |
| 4. CVE reproduction | Done | cve-reproduction | 6 CVEs verified (19 tests, 7 reproducer images, 0 bypasses). All mitigated by existing architecture. |
| 5. VMM boundary audit | Done | (main) | 8 bugs fixed: sector bounds checking (2 High), BackingStore overflow/capacity (2 Medium), IO buffer cap (Medium), sandboxed info exit handling (Medium), DebugBuffer OOM (Moderate), SerialDecoder cap (Low). Plus desc_idx validation. |
| 6. Coverage-guided fuzzing | In progress | audit | Harness infrastructure, fuzz targets, corpus seeding, CI smoke tests merged. Detailed plan: PLAN-coverage-fuzzing.md |
Commits on static-analysis:
1. 9f40bdd Fix VHDX BAT calculation integer overflow
2. 403e3fc Add SAFETY comments to VMM unsafe blocks
3. 0433bb9 Add Phase 1 security audit results
4. c3205bc Update docs for static analysis audit
5. c1e6519 Fix nightly clippy lints, add truncating cast audit
Bugs found: 9 total (all fixed).

- Phase 1b: VHDX BAT integer overflow (medium severity)
- Phase 5: No sector bounds check in do_read/do_write (high)
- Phase 5: Integer overflow in sector * sector_size (high)
- Phase 5: Integer overflow in offset + buf.len() in BackingStore (medium)
- Phase 5: No capacity enforcement on BackingStore writes (medium)
- Phase 5: Unbounded IO buffer from guest data_desc.len (medium)
- Phase 5: run_sandboxed_info silently ignores unknown exits (medium)
- Phase 5: DebugBuffer (COM2) unbounded String growth (moderate)
- Phase 5: SerialDecoder buffer no size cap (low)
## Execution
### Phase 1: Static analysis and code review
Before running anything, audit the code for structural weaknesses.
#### 1a. Unsafe code audit
Enumerate every `unsafe` block in the guest code and the VMM. For
each one, document:
* What invariant it relies on.
* Whether that invariant is enforced or merely assumed.
* Whether a malicious image could violate the invariant.
Pay special attention to:
* Raw pointer dereferences in the call table and config areas.
* The CallTable magic validation (0x494D4147) -- is this
sufficient? Can a malformed image influence the config area?
* Static mutable state accessed via unsafe cells.
* Memory layout assumptions (e.g. .bss not overlapping config at
0x80000).
#### 1b. Integer arithmetic review

Search for integer arithmetic in header parsing that could overflow:

* L1/L2 table size calculations in QCOW2 (CVE-2014-0223 analog).
* Cluster size shifts and multiplications.
* Refcount table offset computations.
* VMDK grain directory/table size calculations.
* VHD/VHDX BAT size and offset calculations.
For each case, verify that overflow is either impossible (types are wide enough) or explicitly checked.
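A minimal sketch of the checked pattern 1b looks for, using the QCOW2 L1 table as the example (each L1 entry is 8 bytes; `l1_size` and `l1_table_offset` come straight from an untrusted header). The function name is illustrative, not instar's actual API:

```rust
// Hedged sketch: widen before multiplying, then check the addition,
// so a hostile l1_size/offset pair can never wrap.
fn l1_table_end(l1_table_offset: u64, l1_size: u32) -> Option<u64> {
    // u32 * 8 always fits in u64, but keeping checked_mul makes the
    // pattern uniform and survives future type changes.
    let bytes = u64::from(l1_size).checked_mul(8)?;
    l1_table_offset.checked_add(bytes) // None == overflow == reject image
}

fn main() {
    assert_eq!(l1_table_end(0x30000, 4), Some(0x30020));
    assert_eq!(l1_table_end(u64::MAX, 1), None); // wrap caught, not silent
    println!("checked L1 arithmetic ok");
}
```

"Overflow is impossible" claims should be backed by the types, as in the multiplication here; everything else gets an explicit check.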
#### 1c. Static analysis tooling

- Run `cargo clippy` with all warnings on the full workspace.
- Run `cargo audit` to check for known vulnerable dependencies.
- Grep for `as u32`, `as u16`, and `as usize` truncating casts that could lose bits.
### Phase 2: Adversarial image crafting
Generate hand-crafted malicious images targeting specific attack
classes. For each class, create test images and verify instar either
rejects them or handles them safely. All new images should be added
to shakenfist/instar-testdata/custom/audit/ and registered in
tests/manifest.json (see "Gaps to fill" in the test corpus
section above for the full list of missing categories).
#### 2a. Header corruption attacks

- QCOW2 with `cluster_bits` outside the valid range (< 9 or > 21).
- QCOW2 with an `l1_size` that would cause integer overflow when multiplied by the entry size.
- QCOW2 with a `refcount_order` yielding extreme refcount widths.
- QCOW2 v3 with unknown incompatible feature bits set.
- VMDK with grain size of 0 or extremely large values.
- VHD with footer checksum mismatch.
- VHDX with invalid CRC-32C in headers.
- Files with valid magic bytes but truncated headers.
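Many of these images can be generated with a few lines each. As a sketch (assuming the standard QCOW2 header layout; the output filename is illustrative), here is the first bullet, a header with an out-of-range `cluster_bits`:

```rust
use std::io::Write;

// Hedged sketch: hand-craft a QCOW2 v3 header whose cluster_bits is 30
// (a 1 GiB cluster), well outside the valid 9..=21 range. Field offsets
// follow the QCOW2 spec: magic@0, version@4, cluster_bits@20, size@24,
// header_length@100.
fn bad_cluster_bits_header() -> Vec<u8> {
    let mut h = vec![0u8; 104]; // minimal v3 header
    h[0..4].copy_from_slice(b"QFI\xfb");                    // magic
    h[4..8].copy_from_slice(&3u32.to_be_bytes());           // version 3
    h[20..24].copy_from_slice(&30u32.to_be_bytes());        // invalid cluster_bits
    h[24..32].copy_from_slice(&(1u64 << 20).to_be_bytes()); // 1 MiB virtual size
    h[100..104].copy_from_slice(&104u32.to_be_bytes());     // header_length
    h
}

fn main() -> std::io::Result<()> {
    let h = bad_cluster_bits_header();
    std::fs::File::create("qcow2-cluster-bits-30.qcow2")?.write_all(&h)?;
    Ok(())
}
```

The expected instar behaviour is a clean rejection of the header, never an allocation sized by `1 << cluster_bits`.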
#### 2b. Compression bomb attacks
- QCOW2 with a compressed cluster that expands to a very large size relative to the compressed input.
- Verify decompression buffers are bounded (current limit is 2 sectors / 128KB -- is this enforced for all cluster sizes?).
- VMDK with DEFLATE-compressed grains that expand excessively.
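The property under test here can be expressed independently of any particular inflater. A hedged sketch (the function and the closure-based "produce one byte" abstraction are illustrative, not instar's decompression API):

```rust
/// Copies decompressed bytes into `out`, failing as soon as the stream
/// tries to produce more than `out.len()` bytes (the cluster size).
/// `produce` stands in for one step of a real DEFLATE/ZSTD inflater.
fn decompress_bounded(
    mut produce: impl FnMut() -> Option<u8>,
    out: &mut [u8],
) -> Result<usize, &'static str> {
    let mut n = 0;
    while let Some(b) = produce() {
        if n == out.len() {
            // A compression bomb hits this branch instead of growing a buffer.
            return Err("compressed cluster expands past cluster size");
        }
        out[n] = b;
        n += 1;
    }
    Ok(n)
}

fn main() {
    // A 3-byte stream fits a 4-byte cluster buffer.
    let mut src = vec![0xAA_u8; 3].into_iter();
    let mut out = [0u8; 4];
    assert_eq!(decompress_bounded(|| src.next(), &mut out), Ok(3));
}
```

The audit question from the bullet above is exactly this: is the equivalent cap applied for every cluster size, not just the default?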
#### 2c. Backing chain attacks

- Image with a backing file path containing `../` traversal.
- Image with a backing file path pointing to `/etc/passwd` or a similar sensitive file.
- Image with a circular backing chain (A -> B -> A).
- Image with deeply nested backing chain (> 16 levels).
- Image with backing file that claims a different format than it actually is.
- Verify the host-side path allowlist rejects all of these before they reach the guest.
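The shape of the host-side check these tests exercise can be sketched as below. This is an assumption about the approach, not instar's actual allowlist code; the function name is invented, and a real implementation would also canonicalise the result against the permitted base directory:

```rust
use std::path::{Component, Path};

// Hedged sketch: reject backing file paths that are absolute or contain
// any non-plain component (`..`, `.`, root, drive prefix) before any
// file is opened. Deliberately strict: only relative chains of normal
// components pass.
fn backing_path_allowed(p: &str) -> bool {
    let path = Path::new(p);
    !path.is_absolute()
        && path.components().all(|c| matches!(c, Component::Normal(_)))
}

fn main() {
    assert!(backing_path_allowed("base.qcow2"));
    assert!(backing_path_allowed("nested/base.qcow2"));
    assert!(!backing_path_allowed("../../etc/passwd")); // traversal
    assert!(!backing_path_allowed("/etc/passwd"));      // absolute
}
```

Circular and over-deep chains are a separate check (track visited paths and a depth counter during chain resolution); the path filter above only addresses the first two bullets.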
#### 2d. Format confusion attacks
- A file with QCOW2 magic but VMDK structure after the header.
- A file that is valid as multiple formats simultaneously (polyglot).
- A raw file with no partition table (should be rejected by default, accepted only with `--unsafe-quirks`).
- A file with format magic at unusual offsets.
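Polyglot resistance comes down to deterministic probing. A hedged sketch (the ordering, the `probe` name, and raw-as-fallback are illustrative assumptions, not instar's detection code), using well-known magics checked at offset 0 only:

```rust
// Hedged sketch: first-match-wins probing at offset 0, in a fixed
// priority order, so a file resolves to at most one format and magic
// at unusual offsets is ignored.
fn probe(header: &[u8]) -> &'static str {
    const MAGICS: &[(&[u8], &str)] = &[
        (b"QFI\xfb", "qcow2"),   // QCOW2
        (b"KDMV", "vmdk"),       // VMDK sparse extent
        (b"conectix", "vpc"),    // VHD footer/copy
        (b"vhdxfile", "vhdx"),   // VHDX file identifier
    ];
    for &(magic, name) in MAGICS {
        if header.starts_with(magic) {
            return name;
        }
    }
    "raw" // subject to the no-partition-table rejection above
}

fn main() {
    assert_eq!(probe(b"QFI\xfb\x00\x00\x00\x03"), "qcow2");
    assert_eq!(probe(b"no magic here"), "raw");
}
```

Under a scheme like this, a "QCOW2 magic with VMDK body" polyglot is always parsed as QCOW2 and must fail QCOW2 validation; it can never be silently reinterpreted as VMDK.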
#### 2e. Oversized field attacks
- QCOW2 claiming virtual size > physical file size by orders of magnitude.
- Snapshot count fields with extreme values.
- L1 table offset pointing beyond end of file.
- Refcount table offset pointing beyond end of file.
- VMDK descriptor claiming more extents than exist.
### Phase 3: Differential fuzzing (random walks)
This is the original plan, refined with specifics.
- Pick a random seed (so a given chain can be recreated later for debugging). Log this seed to a per-run log file.
- Randomly select a format and various attributes for it and generate an input file. Log the format and attributes for the run. Attributes to vary:
- Format: qcow2, raw, vmdk, vpc, vhdx.
- Virtual size: powers of 2 from 1MB to 8GB.
- Cluster size (qcow2): 512, 4K, 64K, 256K, 2M.
- Compression: on/off.
- Backing files: 0, 1, or 2 levels deep.
- Data patterns: zeros, random, structured (partition tables).
- Randomly select a chain of operations to perform, execute them against both `qemu-img` and `instar` (on separate copies of the image, of course), and compare output at each stage. Log each operation performed for debugging purposes. If the output differs, exit with a descriptive message describing how to reproduce the difference. Operations `instar` has already tracked as unsafe quirks in docs/ should be avoided, as they will by definition result in differences.
- Operations to chain:
  - `info` (compare JSON output).
  - `check` (compare exit codes and error detection).
  - `convert` to each supported output format.
  - `compare` of original vs converted (should be identical).
  - `convert` with `-c` (compressed output).
  - Multi-step: convert A -> B -> C, verify content preserved.
- Run for a configurable number of iterations (suggest 1000+ per session) with timeout per operation (30 seconds).
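The reproducibility requirement above (log a seed, regenerate the same chain later) can be met with any small deterministic PRNG. A hedged sketch, with a self-contained xorshift64 so no external crate is needed; the operation names are taken from the list above but the function is illustrative:

```rust
// Hedged sketch of Phase 3's reproducible randomness: a logged seed
// fully determines the operation chain.
struct XorShift64(u64);

impl XorShift64 {
    fn next(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

fn op_chain(seed: u64, len: usize) -> Vec<&'static str> {
    const OPS: &[&str] = &["info", "check", "convert", "compare", "convert -c"];
    let mut rng = XorShift64(seed.max(1)); // xorshift state must be non-zero
    (0..len)
        .map(|_| OPS[(rng.next() % OPS.len() as u64) as usize])
        .collect()
}

fn main() {
    let chain = op_chain(42, 5);
    println!("seed=42 chain={:?}", chain);
    // Same seed, same chain: this is what makes a logged failure replayable.
    assert_eq!(chain, op_chain(42, 5));
}
```

The same seeded generator would drive format/attribute selection so that a single logged integer reproduces an entire failing run.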
### Phase 4: CVE reproduction
For each known qemu-img CVE, verify that instar is not vulnerable. Where possible, obtain or craft a reproducer image and confirm instar handles it safely.
Priority CVEs to test:

* CVE-2024-32498 -- external data file QCOW2 feature used to read host files. Instar rejects this feature bit; verify it.
* CVE-2015-5163 -- backing file path traversal. Verify the host-side allowlist blocks this.
* CVE-2022-47951 -- VMDK descriptor used to access host files. Verify VMDK descriptor parsing doesn't follow file paths.
* CVE-2015-5162 -- resource exhaustion via compressed images. Verify decompression bounds.
* CVE-2014-0223 -- integer overflow in QCOW1 L1 table size. Instar doesn't support QCOW1, but verify analogous QCOW2 paths.
* CVE-2024-3567 -- qemu-img info DoS via crafted input. Verify instar doesn't hang or consume excessive resources.
### Phase 5: VMM boundary audit
The KVM sandbox is instar's primary security boundary. Bugs in the VMM (host-side) code bypass this entirely. Audit:
- Virtio-block emulation -- Can a malicious guest craft virtio requests that cause the VMM to read/write outside the image file? Check bounds validation on sector offsets and lengths.
- Serial protocol handling -- Can malformed protobuf messages from the guest cause the VMM to crash or misbehave? Test with truncated, oversized, and malformed serial messages.
- Memory mapping -- Can the guest access VMM memory through the KVM memory map? Verify no host memory is mapped into the guest address space beyond what's intended.
- Device MMIO handling -- Are all MMIO register accesses bounds-checked? Can the guest trigger out-of-bounds access in the MMIO handlers?
- Signal handling -- Can the guest cause the VMM to enter an unexpected state via signals or KVM exits?
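For the virtio-block bullet, the bounds validation under audit reduces to one overflow-safe comparison per request. A hedged sketch (names and the 512-byte sector assumption are illustrative, not the VMM's actual code):

```rust
// Hedged sketch of the check the Phase 5 audit verifies: every guest
// request is validated against image capacity with checked arithmetic
// before any host-side read or write. sector * 512 and off + len are
// exactly the two multiplications/additions a hostile guest can drive
// to overflow.
const SECTOR_SIZE: u64 = 512;

fn request_in_bounds(sector: u64, len: u64, capacity_bytes: u64) -> bool {
    sector
        .checked_mul(SECTOR_SIZE)
        .and_then(|off| off.checked_add(len))
        .map_or(false, |end| end <= capacity_bytes)
}

fn main() {
    assert!(request_in_bounds(0, 512, 4096));        // first sector
    assert!(!request_in_bounds(8, 1, 4096));         // one byte past EOF
    assert!(!request_in_bounds(u64::MAX, 512, 4096)); // multiplication wraps
}
```

The two High-severity Phase 5 findings in the progress table (missing sector bounds check, `sector * sector_size` overflow) are precisely the absence of this guard.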
### Phase 6: Coverage-guided fuzzing of parser crates
This phase has been expanded into a detailed standalone plan. See PLAN-coverage-fuzzing.md for the full plan covering harness architecture, fuzz targets, corpus seeding, CI workflow, and success criteria.
Summary: Build `cargo-fuzz` (libFuzzer) harnesses for each
`no_std` parser crate (qcow2, vmdk, vhd, vhdx, raw, luks) using
a mock `CallTable` that reads from fuzzer input. Seed with the
existing test image corpus (including Phase 2 adversarial images).
Run nightly via CI with automatic issue filing for crashes. Target
24+ hours cumulative with no unresolved crashes.
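The harness shape can be sketched without the libfuzzer-sys scaffolding: a cargo-fuzz target is a single entry point over raw bytes, and the invariant is "error, never panic". Both functions below are illustrative stand-ins, not real parser crate APIs:

```rust
// Hedged sketch of a Phase 6 fuzz target. In a real harness this body
// sits inside libfuzzer-sys's fuzz_target!(|data: &[u8]| { ... }); it
// is a plain function here so the shape is visible and testable.
fn fuzz_one(data: &[u8]) {
    // The parser must return an error, never panic, on arbitrary input.
    let _ = parse_qcow2_header(data);
}

// Toy stand-in for a parser crate entry point: validates the magic and
// returns the version field.
fn parse_qcow2_header(data: &[u8]) -> Result<u32, &'static str> {
    if data.len() < 8 || &data[0..4] != b"QFI\xfb" {
        return Err("not qcow2");
    }
    Ok(u32::from_be_bytes([data[4], data[5], data[6], data[7]]))
}

fn main() {
    fuzz_one(b"");                        // empty input must not panic
    fuzz_one(b"QFI");                     // truncated magic must not panic
    assert_eq!(parse_qcow2_header(b"QFI\xfb\x00\x00\x00\x03"), Ok(3));
}
```

Because the real parser crates are `no_std` and read through the `CallTable`, a mock table backed by the fuzzer's input buffer gives libFuzzer coverage of the parsing code with no KVM involved.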
## Vulnerability tracking
Discovered vulnerabilities are not tracked in this file. This plan is the audit methodology; findings go elsewhere so that CI automation can create and update them without merge conflicts, and so that they have proper workflow states.
### Two-stage approach

- While the repo is private: Use GitHub Issues with the `security-audit` label. All issues are already invisible to the public since the repo is private.
- Once the repo is public: Enable GitHub Security Advisories (Settings > Code security -- this option only appears for public repos). Migrate any open `security-audit` issues into GHSA drafts and close the issues. GHSA drafts remain private until explicitly published, so unresolved findings never become visible to attackers.
CI fuzzing jobs should create issues automatically via
`gh issue create --label security-audit` when they find
failures. Each issue should include the seed, format, operation
chain, and enough detail to reproduce the finding.
### Public audit log
At the conclusion of each audit (or audit phase), update a public
document at docs/security-audits.md recording:
- What was audited -- which components and phases were covered.
- Techniques used -- from the methodology list above.
- Duration and scale -- how long fuzzing ran, how many differential iterations were executed, etc.
- Resolved findings -- number of bugs found and fixed, with links to commits. Enough detail for users to understand the scope of what was tested.
- Test images added -- how many new adversarial images were added to the corpus as a result.
- Standing security properties -- a summary of the architectural properties (KVM isolation, RAW hardening, backing chain allowlist, feature bit enforcement, bounded decompression, Rust memory safety) that are verified during every audit.
The document should not include details of unresolved vulnerabilities -- those remain tracked privately in GitHub Security Advisories (GHSA) until fixed. The goal is to give users confidence that instar takes security seriously, without creating a roadmap for attackers.
The document should also include instructions for reporting
vulnerabilities via GitHub Security Advisories and link to the
existing CVE analysis in docs/security.md.
Link the new document from docs/index.md in the "Platform
Analysis" table alongside the existing security.md entry.
## Administration and logistics
### Infrastructure
The audit phases have different compute requirements:
- Phases 1, 2, 4, 5 (code review, image crafting, CVE reproduction, VMM boundary audit) are interactive work done in Claude Code sessions. No dedicated infrastructure needed.
- Phase 3 (differential fuzzing) and Phase 6 (coverage-guided fuzzing) are long-running jobs that need dedicated machines.
The long-running phases will run on VM-based CI runners (not bare metal) so that a runaway fuzzing job cannot damage the host. The runners need nested virtualisation enabled for KVM access:
- Differential fuzzing (Phase 3): Runs on a `[self-hosted, debian-12, xl]` runner (or a dedicated `fuzzing` label if isolation from the regular CI queue is desired). Requires KVM (for running instar) and qemu-img (for comparison). Runs 1000+ iterations per session with a 30-second timeout per operation.
- Coverage-guided fuzzing (Phase 6): Runs on a standard VM runner. Runs `cargo-fuzz` against the parser crates directly -- no KVM needed. Intended to run for extended periods (24+ hours cumulative across all parser crates).
These should be implemented as `workflow_dispatch` CI workflows
so they can be triggered on demand and also scheduled nightly
once stable. Each workflow should:
- Accept configurable parameters (iteration count, seed, fuzz duration).
- Log results to artifacts.
- Automatically file `security-audit` issues via `gh issue create` when failures are found.
## Success criteria
We will know when this plan has been successfully implemented because the following statements will be true:
- All six phases have been executed.
- All known qemu-img CVEs have been verified as mitigated.
- Coverage-guided fuzzing has run for a minimum duration (suggest 24+ hours cumulative across all parser crates) with no new crashes.
- Differential fuzzing has run 1000+ iterations with no unexplained divergences.
- All `unsafe` blocks have been documented and justified.
- All discovered bugs are tracked as GitHub Issues (with the `security-audit` label) or GitHub Security Advisories, with sufficient detail to reproduce each finding.
## Future work
This section lists obvious extensions, known issues, unrelated bugs encountered along the way, and anything else we should one day do but have chosen to defer, so that we don't forget them.
- Continuous fuzzing infrastructure -- Set up OSS-Fuzz or similar to continuously fuzz instar parser crates.
- Property-based testing -- Use the `proptest` or `arbitrary` crates for structured random input generation within the Rust test suite.
- Formal verification -- For critical arithmetic paths (cluster lookup, refcount computation), consider using tools like Kani or Prusti to prove absence of overflow.
- QEMU iotests integration -- QEMU has an extensive iotest suite (tests/qemu-iotests/) that exercises edge cases. Running a subset against instar would increase confidence.
- Multi-device backing chain fuzzing -- Phase 6 fuzzes single-device parsing only. Simulating backing chains (split fuzz input across multiple virtual devices) would cover chain resolution logic and cross-format backing file handling.
- Sanitizer variants -- Run coverage-guided fuzzing with AddressSanitizer (ASan), MemorySanitizer (MSan), and UndefinedBehaviorSanitizer (UBSan) to catch memory safety and undefined behaviour issues that don't manifest as crashes under the default libFuzzer configuration.
- Automated fuzzer bug fixes -- A scheduled CI job that picks up `security-audit` issues filed by fuzzers and uses Claude Code to propose fixes as PRs. Detailed plan: PLAN-fuzz-autofix.md.
## Bugs fixed during this work
Bugs found and fixed during audit execution are tracked as
closed GitHub Issues with the `security-audit` label. The
`docs/security-audits.md` public log summarises resolved
findings with links to commits.
## Back brief
Before executing any step of this plan, please back-brief the operator on your understanding of the plan and how the work you intend to do aligns with it.