# Reading Order
In which order should you read the instar source code?
Lions organised his UNIX commentary around the logical flow of the system rather than alphabetical file order. We do the same here: the reading order follows a request from the moment it enters the host-side binary until a result comes back out. By the end, you will have read every major module and understood how they connect.
Prerequisites: Read `docs/technology-primer.md` first. It covers KVM, virtio, page tables, protection rings, and bare-metal execution -- all background knowledge assumed below.
## Phase 1: The Host Side (VMM)

All files in this phase are under `src/vmm/src/`.
### Step 1: `main.rs` -- The entry point
What you will find: The clap argument parser that defines every
subcommand (info, copy, check, compare, convert). The main()
function that dispatches to the appropriate handler. The entire KVM
lifecycle: creating a VM, allocating guest memory, setting up page tables
and a GDT, loading the guest binaries, configuring the vCPU, and running
the main vCPU loop.
What to pay attention to:
- The constants at the top of the file define the guest memory layout. These are duplicated from `shared/src/lib.rs` because the VMM is a normal `std` Rust binary while `shared` is `no_std`. The comment "must match shared crate" appears frequently -- this is a deliberate design tension. The shared crate is the source of truth; the VMM duplicates values for practical reasons.
- The `setup_long_mode()` function is the most unusual code in the VMM. It configures the vCPU to start directly in 64-bit long mode, bypassing real mode and protected mode entirely. It writes page tables and a GDT into guest memory, sets CR0/CR3/CR4/EFER, and configures segment registers. This is what eliminates the need for a BIOS or bootloader.
- The vCPU run loop (`loop { match vcpu.run() { ... } }`) handles VM exits. The exit reasons you will see are: `IoOut` (serial port writes from the guest), `MmioWrite`/`MmioRead` (virtio MMIO from the guest), `Hlt` (guest halted -- operation complete), and `Shutdown` (triple fault or error). Each exit type dispatches to a handler.
- The `write_operation_config()` function serialises an operation-specific config struct (e.g. `InfoConfig`, `ConvertConfig`) directly into guest memory at `OPERATION_CONFIG_ADDR`. This is how the host passes parameters to the guest without any syscall or IPC mechanism.
Read this file first because it is the frame that holds everything else together. It is also the largest single file in the codebase (~1800 lines), so budget time accordingly.
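The run-loop shape described above can be sketched with plain Rust types. This is an illustrative stand-in, not the actual kvm-ioctls API instar uses; the `VmExit` enum and handler bodies are assumptions:

```rust
// Illustrative sketch of the VMM's exit-dispatch loop. The enum mirrors
// the exit reasons listed in the text; names are not the real API.
#[derive(Debug)]
enum VmExit {
    IoOut { port: u16, data: Vec<u8> },  // serial writes from the guest
    MmioWrite { addr: u64, value: u32 }, // virtio MMIO from the guest
    Hlt,                                 // guest halted -- operation complete
    Shutdown,                            // triple fault or error
}

fn run_vcpu(mut exits: impl Iterator<Item = VmExit>) -> Result<(), String> {
    loop {
        match exits.next().ok_or("vcpu stream ended unexpectedly")? {
            VmExit::IoOut { port, data } if port == 0x3F8 => {
                // In instar this feeds the host-side protobuf decoder.
                println!("serial: {} bytes", data.len());
            }
            VmExit::IoOut { .. } => {} // other ports: ignore
            VmExit::MmioWrite { addr, value } => {
                println!("mmio write {addr:#x} = {value:#x}");
            }
            VmExit::Hlt => return Ok(()),                  // done
            VmExit::Shutdown => return Err("guest shut down".into()),
        }
    }
}
```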
### Step 2: `virtio/block.rs`, `virtio/mmio.rs` -- Device emulation
What you will find: The implementation of virtio-block devices using
the virtio-queue crate. The VirtioBlockDevice struct that manages
MMIO registers, virtqueue state, and request processing. The MMIO
read/write handlers that the vCPU run loop dispatches to on
MmioRead/MmioWrite exits.
What to pay attention to:
- The MMIO register layout follows the virtio 1.0 specification. Each device occupies a 4KB region starting at `MMIO_BASE`. Register offsets like `0x000` (MagicValue), `0x070` (QueueNotify), `0x100+` (device-specific config) are defined by the spec.
- When the guest writes to the QueueNotify register, the VMM processes all pending descriptors in the virtqueue. Each descriptor chain represents a virtio-blk request: a header (sector number + operation type), a data buffer, and a status byte.
- The virtio-block layer translates virtqueue requests into host file I/O. This is the security boundary in action: the VMM reads/writes raw bytes from files; it never interprets those bytes as image format data.
- The `validate_io_request()` method checks every I/O request before it reaches the backing store: the sector must be within device capacity, the offset calculation uses `checked_mul` to prevent integer overflow, and the buffer size is capped at `MAX_IO_BUFFER_SIZE` (1 MB) to prevent OOM from a malicious guest descriptor. The `read_descriptor()` helper validates descriptor indices against the queue size.
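A hedged sketch of that validation, assuming 512-byte sectors; the error values and exact signature are illustrative, not the real instar code:

```rust
// Sketch of per-request validation with checked arithmetic.
const SECTOR_SIZE: u64 = 512;
const MAX_IO_BUFFER_SIZE: u64 = 1 << 20; // 1 MB cap per request

fn validate_io_request(sector: u64, len: u64, capacity_bytes: u64) -> Result<u64, &'static str> {
    if len > MAX_IO_BUFFER_SIZE {
        return Err("buffer too large"); // guards against OOM from guest descriptors
    }
    // checked_mul/checked_add catch integer overflow from hostile values.
    let offset = sector.checked_mul(SECTOR_SIZE).ok_or("offset overflow")?;
    let end = offset.checked_add(len).ok_or("length overflow")?;
    if end > capacity_bytes {
        return Err("request past end of device");
    }
    Ok(offset) // safe byte offset into the backing file
}
```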
### Step 3: `io_thread.rs` -- Threaded I/O
What you will find: The I/O thread that handles virtio-block
requests off the main vCPU thread when ioeventfd is enabled. The
IoDevice struct that wraps a file handle with sector-aligned I/O.
What to pay attention to:
- With ioeventfd, the guest's doorbell write does not cause a VM exit. Instead, KVM signals an eventfd. The I/O thread waits on that eventfd and processes requests asynchronously while the vCPU continues running. This is the key performance optimisation.
- `DeviceRole` distinguishes input (read-only) from output (writable) devices. This is enforced at the I/O thread level, not just in the guest. A compromised guest that tries to write to an input device will have the request rejected by the host.
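The host-side role check can be sketched as follows (the `Request` enum and `allow` helper are hypothetical; only the `DeviceRole` name comes from the source):

```rust
// Sketch of host-side role enforcement.
#[derive(Clone, Copy, PartialEq)]
enum DeviceRole {
    Input,  // read-only source image
    Output, // writable destination image
}

enum Request { Read, Write }

/// The host rejects writes to input devices even if a compromised
/// guest submits them.
fn allow(role: DeviceRole, req: &Request) -> bool {
    match req {
        Request::Read => true,
        Request::Write => role == DeviceRole::Output,
    }
}
```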
### Step 4: `chain.rs` -- Backing chain discovery
What you will find: The host-side logic for discovering and
validating QCOW2 backing file chains. The BackingChain struct that
holds the ordered list of images. Path validation against a security
allowlist. Circular reference and depth limit checks.
What to pay attention to:
- Chain discovery runs on the host side, iteratively. The VMM runs `instar info` on each image in the chain (launching a fresh KVM guest each time) to extract the backing file path. This is critical for security: the host validates paths before opening files, so a malicious image cannot trick the guest into reading arbitrary host files.
- The `validate_backing_path()` function checks paths against an allowlist. Without this, a crafted QCOW2 backing file path like `../../../../etc/shadow` could be used for directory traversal.
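A minimal sketch of allowlist validation, assuming the real `validate_backing_path()` performs at least these two checks (the exact signature and any additional checks are assumptions):

```rust
use std::path::{Component, Path};

// Sketch: reject traversal components, then require the path to live
// under one of the allowed roots.
fn validate_backing_path(path: &Path, allowlist: &[&Path]) -> bool {
    // "../../../../etc/shadow" fails here: no ParentDir components allowed.
    if path.components().any(|c| matches!(c, Component::ParentDir)) {
        return false;
    }
    // Component-wise prefix match against the allowlist.
    allowlist.iter().any(|root| path.starts_with(root))
}
```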
### Step 5: `config.rs`, `error.rs`, `backing.rs`, `stats.rs`
What you will find: Supporting modules for configuration file parsing, error types, backing store abstractions, and performance statistics.
What to pay attention to in `backing.rs`: The `BackingStore` enforces device capacity on every write via `checked_add` overflow protection and an explicit `end > capacity` check. This is a defence-in-depth layer behind the virtio-block validation -- even if the virtio layer were bypassed, the backing store would reject writes beyond the image boundary.
Read the others lightly. They exist to support the main flow and contain few surprising design decisions.
## Phase 2: The ABI Contract (Shared)

The single file `src/shared/src/lib.rs` is the most important file in the codebase after `vmm/main.rs`.
### Step 6: `shared/src/lib.rs` -- The contract
What you will find: Every type, constant, and address that both
the host-side VMM and the guest-side operations must agree on. The
CallTable struct (the guest's "syscall table"). All operation config
structs (InfoConfig, CopyConfig, CheckConfig, CompareConfig,
ConvertConfig). All result structs. Memory layout constants with
compile-time overlap assertions.
What to pay attention to:
- The `CallTable` is the central abstraction. It is a C-repr struct of function pointers that the core binary writes to a fixed memory address (`CALL_TABLE_ADDR = 0x80000`). Operations read this struct to call back into the core for I/O. This is the guest's equivalent of a syscall table, except it is just a struct in memory -- no privilege transition is needed because the guest has no privilege levels.
- Every struct is `#[repr(C)]` to ensure a stable ABI between separately compiled binaries. The core and operations are compiled independently and loaded at different addresses. They share no Rust-level linking; their only connection is the binary layout of these structs.
- The memory map constants (`CALL_TABLE_ADDR`, `OPERATION_CONFIG_ADDR`, `SCRATCH_MEM_BASE`, `STACK_BASE`, etc.) define the guest's physical address space. The `const _: () = assert!(...)` blocks are compile-time checks that regions do not overlap. This is a pattern worth noting: memory layout bugs in bare-metal code are catastrophic and hard to debug, so catching them at compile time is valuable.
- The `cached_read!` macro generates sector-cached read functions. Every format crate needs to read typed values from byte offsets within a virtio device. Reading a full sector for each 4-byte value would be wasteful, so the macro generates functions that cache the most recently read sector. This pattern appears in every format crate.
- The `bump_allocator!` macro provides heap allocation for operations that need `alloc` (ZSTD decompression, deflate compression). It is backed by a fixed address range in scratch memory, never frees, and can be reset to zero between logical operations. This is appropriate because the guest runs one operation and halts -- there is no long-running process that needs a general-purpose allocator.
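A miniature version of the pattern: a `#[repr(C)]` function-pointer table at a fixed guest address, with a compile-time check that it does not spill into the next region. The two addresses come from the text; the field names are illustrative, and the real `CallTable` has many more entries:

```rust
// Sketch of the shared-crate ABI pattern.
const CALL_TABLE_ADDR: usize = 0x80000;
const OPERATION_CONFIG_ADDR: usize = 0x81000;

#[repr(C)] // stable layout across separately compiled binaries
struct CallTable {
    read_input_sector: extern "C" fn(sector: u64, buf: *mut u8) -> i32,
    verbose_print: extern "C" fn(msg: *const u8),
}

// Compile-time overlap assertion, in the const _: () = assert!(...) style.
const _: () = assert!(
    CALL_TABLE_ADDR + core::mem::size_of::<CallTable>() <= OPERATION_CONFIG_ADDR,
    "call table overlaps operation config region"
);
```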
### Step 6a: `shared/src/format_detection.rs` -- Magic number matching
What you will find: The detect_format_from_header() function that
identifies image formats by their magic numbers. Also detect_vhd_footer()
(VHD stores its magic at the end of the file) and ISO detection at a
non-zero offset.
Why this is in shared: Both the guest info operation and the host-side
chain discovery need format detection. Putting it in shared avoids
duplication.
### Step 6b: `shared/src/bitmap.rs` -- Overlap detection
What you will find: A 1-bit-per-unit bitmap used by the check operation to detect cluster/grain/block overlaps. When two L2 entries point to the same host cluster, this bitmap catches it.
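The core of a 1-bit-per-unit overlap detector can be sketched in a few lines (the struct and method names here are illustrative, not the real `bitmap.rs` API):

```rust
// Sketch: one bit per host cluster; marking a cluster twice signals
// that two metadata entries claim the same storage.
struct ClusterBitmap {
    bits: Vec<u8>,
}

impl ClusterBitmap {
    fn new(clusters: usize) -> Self {
        Self { bits: vec![0; (clusters + 7) / 8] }
    }

    /// Marks `cluster` as used; returns true if it was already marked,
    /// i.e. an overlap was detected.
    fn mark(&mut self, cluster: usize) -> bool {
        let (byte, bit) = (cluster / 8, 1u8 << (cluster % 8));
        let seen = self.bits[byte] & bit != 0;
        self.bits[byte] |= bit;
        seen
    }
}
```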
## Phase 3: The Guest Side (Core)

### Step 7: `core/src/main.rs` -- Guest boot
What you will find: The _start() entry point that runs when the
vCPU begins executing. Device initialisation (creating VirtioBlock
instances by probing MMIO addresses). The setup_call_table() function
that writes function pointers to CALL_TABLE_ADDR. The call_operation()
function that jumps to the operation binary at OPERATION_LOAD_ADDR.
What to pay attention to:
- The `SingleThreadCell` wrapper is the guest's solution to Rust's requirement that statics be `Sync`. Since the guest runs on exactly one vCPU with no threads, a `Sync` wrapper around `UnsafeCell` is sound. This is well documented in the code with safety comments.
- The call table setup populates every function pointer. The `verbose_print` pointer is conditionally set to either `ct_debug_print` or `ct_silent_print` (a no-op) based on the verbose flag in the operation config. This avoids serial I/O overhead when verbose output is not requested.
- After setting up the call table, the core calls `call_operation()`, which transmutes `OPERATION_LOAD_ADDR` into a function pointer and calls it. This is how two separately compiled binaries (core + operation) cooperate at runtime. The operation binary's `_start` function is at the start of its loaded image.
- The `cstr_to_str()` function converts null-terminated C strings to Rust `&str`. It validates UTF-8 and has a length limit to prevent unbounded reads on unterminated strings. This defensive coding is important because the function is called with pointers from operation binaries that may be processing untrusted data.
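A safe-Rust sketch of that defensive conversion (the real function works on raw pointers; this version takes a slice to stay self-contained, and the signature is an assumption):

```rust
// Sketch of a bounded C-string conversion: scan at most `max_len` bytes
// for a NUL terminator, then validate UTF-8. Unterminated or invalid
// input yields None instead of reading out of bounds or panicking.
fn cstr_to_str(bytes: &[u8], max_len: usize) -> Option<&str> {
    let window = &bytes[..bytes.len().min(max_len)];
    let nul = window.iter().position(|&b| b == 0)?; // unterminated -> None
    core::str::from_utf8(&window[..nul]).ok()       // invalid UTF-8 -> None
}
```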
### Step 7a: `core/src/virtio.rs` -- Guest-side virtio driver
What you will find: The VirtioBlock struct that implements the
guest side of the virtio protocol. MMIO register reads/writes, virtqueue
descriptor chain construction, and request submission.
What to pay attention to:
- This is the mirror image of `vmm/src/virtio.rs`. The VMM implements the device side; the core implements the driver side. Both must agree on the MMIO register layout, descriptor format, and request header structure.
- The DMA pool (`DMA_POOL_BASE`) is used for virtio request headers, data buffers, and status bytes. These must be in guest-physical memory that both the guest CPU and the VMM can access.
### Step 7b: `core/src/serial.rs` -- Guest-side serial protocol
What you will find: Functions for sending protobuf-encoded messages
over the serial port (send_init, send_progress, send_error,
send_complete, send_info_result, etc.). The read_config() function
that receives configuration from the VMM at startup.
What to pay attention to:
- The serial port is used for structured messages, not for general I/O. All messages are framed Protocol Buffer messages. The guest writes bytes to I/O port `0x3F8` (COM1) using x86 `OUT` instructions. The VMM receives them via `IoOut` VM exits.
## Phase 4: An Operation (Info)
Now that you understand the infrastructure, read a complete operation to see how it all fits together.
### Step 8: `operations/info/src/main.rs` -- Format detection
What you will find: The _start() entry point for the info
operation. It reads the call table from CALL_TABLE_ADDR, gets the
operation config, reads sectors from the input device via the call table,
detects the image format, extracts metadata (virtual size, cluster size,
backing file, etc.), and sends results back via send_info_result().
What to pay attention to:
- The operation is `#![no_std]` and `#![no_main]`. It has no standard library, no main function, and no runtime. It is a flat binary loaded at a fixed address.
- All I/O goes through the call table. `(call_table.read_input_sector)()` reads a sector; `(call_table.send_info_result)()` sends results. The operation never touches hardware directly -- it calls functions that the core provided.
- Format detection is a cascade of magic number checks: QCOW2 magic `0x514649fb` at offset 0, VMDK magic `0x564d444b` at offset 0, VHD magic `conectix` at the end of the file, and so on. If no magic matches and unsafe quirks are disabled, the image must have a valid MBR or GPT partition table to be accepted as raw.
- For each detected format, the operation reads format-specific metadata. QCOW2 gets the full header parsed (version, cluster_bits, L1 table, incompatible features, header extensions for backing format and external data file). VMDK gets descriptor parsing. VHD/VHDX get footer/header parsing.
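The cascade can be sketched for two of the formats; the enum and helper are illustrative (not the real instar API), while the magic values are the ones from the format specifications quoted above:

```rust
// Sketch of a magic-number detection cascade.
#[derive(Debug, PartialEq)]
enum Format {
    Qcow2,
    Vmdk,
    Unknown,
}

fn detect(header: &[u8]) -> Format {
    if header.len() >= 4 {
        // QCOW2 stores its magic big-endian: "QFI\xfb" = 0x514649fb.
        let magic = u32::from_be_bytes([header[0], header[1], header[2], header[3]]);
        if magic == 0x514649fb {
            return Format::Qcow2;
        }
        // VMDK sparse magic 0x564d444b is stored little-endian,
        // so the on-disk bytes read "KDMV".
        if &header[0..4] == b"KDMV" {
            return Format::Vmdk;
        }
    }
    Format::Unknown
}
```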
## Phase 5: A Format Crate (QCOW2)

### Step 9: `crates/qcow2/src/lib.rs` -- QCOW2 parsing
What you will find: Header constant definitions. The read_cluster()
function that does L1 → L2 → data lookup. Extended L2 subcluster support.
Compressed cluster decompression (zlib and ZSTD behind feature flags).
Refcount table reading. Backing chain walking via read_virtual_offset().
What to pay attention to:
- This crate is used by every operation that touches QCOW2 data (info, check, compare, convert). It is the canonical implementation -- operations do not maintain their own QCOW2 parsing code.
- The L1/L2 lookup is the heart of QCOW2. A virtual offset is split into an L1 index, an L2 index, and an intra-cluster offset. The L1 table maps to L2 tables; each L2 entry maps to a host cluster. Extended L2 entries are 16 bytes instead of 8, with subcluster allocation bitmaps.
- Compressed clusters require special handling. The cluster's host offset and compressed size are packed into the L2 entry differently from standard clusters. The compressed data may span sector boundaries, so it is read into `COMPRESSED_BUF_SIZE` (cluster size + one sector).
- The `read_virtual_offset()` function implements backing chain flattening. If a cluster is unallocated in the top image, it walks down the chain to find it in a backing image. Each backing image may be a different format (QCOW2, raw, VMDK, VHD), so the function dispatches to the appropriate reader based on the `ChainConfig` metadata.
- Checked arithmetic (`checked_mul`, `checked_add`) is used throughout for calculations involving untrusted header values. An integer overflow in an L1 table size calculation could cause an out-of-bounds read.
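The index split follows directly from the QCOW2 spec for standard (8-byte) L2 entries; this helper is illustrative, not the crate's actual code:

```rust
// Split a QCOW2 virtual offset into (l1_index, l2_index, offset-in-cluster).
// Each L2 table occupies exactly one cluster of 8-byte entries.
fn split_virtual_offset(voff: u64, cluster_bits: u32) -> (u64, u64, u64) {
    let cluster_size = 1u64 << cluster_bits;
    let l2_entries = cluster_size / 8; // entries per L2 table
    let in_cluster = voff & (cluster_size - 1);
    let cluster_index = voff >> cluster_bits;
    let l2_index = cluster_index % l2_entries;
    let l1_index = cluster_index / l2_entries;
    (l1_index, l2_index, in_cluster)
}
```

With 64 KiB clusters (`cluster_bits = 16`), one L2 table covers 8192 clusters, so the L1 index only increments every 512 MiB of virtual address space.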
## Phase 6: The Convert Operation

### Step 10: `operations/convert/src/main.rs` -- Full pipeline
What you will find: The most complex operation. Reads virtual content
from any supported input format (with backing chain flattening and
decompression), writes to any supported output format (raw, QCOW2, VMDK,
VHD, VHDX). QCOW2 output involves writing headers, L2 tables, refcount
tables, and optionally compressed clusters. VMDK output uses a configurable
grain size (4KB-64KB via ConvertConfig.output_grain_size). VHD and VHDX
output use a configurable block size (via ConvertConfig.output_block_size)
with format-specific defaults and validation ranges.
What to pay attention to:
- The `ScratchLayout` struct computes buffer addresses at runtime based on the output cluster size. This is necessary because QCOW2 cluster sizes range from 512 bytes to 2MB, and the scratch memory region is shared among multiple buffers. Three conceptual buffers (header, L2 table, refcount block) share a single "multipurpose" buffer because they are used in non-overlapping phases.
- QCOW2 output uses "linear cluster allocation with OFLAG_COPIED" -- the simplest possible allocation strategy. Clusters are written sequentially, each L2 entry gets OFLAG_COPIED set (meaning the refcount is exactly 1), and refcount metadata sizing uses iterative convergence because the refcount tables themselves consume clusters that need refcounting.
- The `--extended-l2` flag switches from 8-byte to 16-byte L2 entries with subcluster allocation/zero bitmaps. This halves the entries per L2 table (requiring more L1 entries) and sets `incompatible_features` bit 4 in the output header. Written data clusters have their subcluster bitmaps computed by scanning each 2 KiB range for zeros (`compute_subcluster_bitmap()`); compressed clusters use a zero bitmap per the QCOW2 spec.
- The `--luks-encrypt-passphrase` flag enables LUKS-encrypted output (`crypt_method=2`). The VMM generates random key material and passes it to the guest at a dedicated memory region. The guest builds a LUKS v1 header (PBKDF2 + AFsplitter + AES-XTS key wrapping), writes it to clusters 1-K, encrypts each data cluster with AES-XTS, and stores an `EXT_ENCRYPT_HEADER` extension pointer in the QCOW2 header.
- The `bump_allocator!()` macro provides heap allocation for compression (miniz_oxide) and ZSTD decompression (ruzstd). The heap is reset (`HEAP_POS.store(0, ...)`) between clusters to prevent exhaustion.
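The subcluster-bitmap scan described above is simple enough to sketch; the function name matches the prose, but the exact signature and return type are assumptions (the real entry packs allocation and zero bitmaps into a 16-byte extended L2 entry):

```rust
// Illustrative subcluster allocation bitmap: bit N is set when the N-th
// 2 KiB subcluster of the cluster contains any non-zero byte.
fn compute_subcluster_bitmap(cluster: &[u8]) -> u32 {
    const SUBCLUSTER_SIZE: usize = 2048; // 2 KiB, up to 32 subclusters
    let mut bitmap = 0u32;
    for (i, chunk) in cluster.chunks(SUBCLUSTER_SIZE).enumerate().take(32) {
        if chunk.iter().any(|&b| b != 0) {
            bitmap |= 1 << i; // subcluster holds data
        }
    }
    bitmap
}
```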
## Phase 7: The Test Infrastructure

### Step 11: `tests/` -- Integration tests
What you will find: Python tests using testtools and stestr. The
test base class in base.py. The manifest (manifest.json) that defines
test images with their expected properties. Tests compare instar output
against qemu-img output to verify drop-in compatibility.
What to pay attention to:
- Tests are organised by operation: `test_info.py`, `test_check.py`, `test_compare.py`, `test_convert.py`, `test_security.py`, `test_adversarial.py`, `test_cve_reproduction.py`.
- Safety levels control which tests run: `safe` (default), `caution` (edge cases), `malicious` (CVE reproducers, opt-in only).
- The oslo.utils cross-validation tests (`test_oslo_crossval.py`) verify that instar's format detection agrees with OpenStack's `format_inspector`. This catches drift between the two implementations.
### Step 12: `src/fuzz/` -- Coverage-guided fuzzing
What you will find: A cargo-fuzz (libFuzzer) project with 13 fuzz
targets that exercise the no_std parser crates directly, without the
VMM/KVM stack. The harness module (src/lib.rs) provides a mock
CallTable backed by thread-local fuzzer input. Fuzz targets are split
into buffer-based (header parsing) and CallTable-dependent (L1/L2 lookup,
refcount traversal, decompression, BAT/grain lookup).
What to pay attention to:
- The mock `CallTable` in `src/fuzz/src/lib.rs` replaces real sector I/O with reads from the fuzzer input buffer. This is the key abstraction that decouples the parsers from the VMM for fuzzing.
- Buffer-based targets (e.g. `fuzz_qcow2_header`) call `parse()` methods directly with the fuzz input as `&[u8]`. No CallTable is needed.
- CallTable-dependent targets (e.g. `fuzz_qcow2_l1l2`) initialise parser state via the mock CallTable, then exercise lookup functions at both fixed and fuzz-derived offsets.
- The corpus seeding script (`scripts/extract-fuzz-corpus.py`) extracts seed images from the separate `instar-testdata` repository, filtered by format. No adversarial images are stored in the main repo.
- The CI workflow (`coverage-fuzz.yml`) runs nightly and on PRs that touch fuzz/parser code. Crashes are minimised and filed as GitHub Issues immediately.
### Step 13: `scripts/differential-fuzz.py` -- Differential fuzzing
What you will find: A Python script that generates random valid images
and compares instar output against qemu-img (and optionally libyal tools).
This is complementary to coverage-guided fuzzing: differential fuzzing
explores valid image space, while coverage fuzzing explores malformed input
space.
### Step 14: `.github/workflows/fuzz-autofix.yml` -- Automated bug fixes
What you will find: A CI workflow that closes the loop on fuzzer
findings. It picks up open security-audit issues (filed by the coverage
and differential fuzzers), invokes Claude Code to diagnose and fix the
crash, verifies the fix by rebuilding and running tests, and creates a PR.
What to pay attention to:
- Two fix attempts per issue. The second attempt receives the diff and failure output from the first as additional context.
- Complexity guardrails: 30-turn limit, max 3 source files changed, no cross-crate changes, no new dependencies. Issues that exceed these limits are labelled `autofix-complex` for human attention.
- The workflow uses `--body-file` for issue comments and PR bodies to avoid YAML escaping issues with markdown in shell heredocs.
## Summary: The Complete Data Flow

To trace a single `instar info image.qcow2` from start to finish:

- VMM `main()` parses CLI args, opens `image.qcow2`
- VMM creates a KVM VM, allocates 32MB guest memory
- VMM writes page tables and GDT into guest memory
- VMM loads `core.bin` at `0x10000`, `info.bin` at `0x20000`
- VMM writes `InfoConfig` to `0x81000` in guest memory
- VMM creates virtio-block device backed by `image.qcow2`
- VMM configures vCPU (long mode, stack, entry point) and runs it
- Core `_start()` initialises virtio devices, sets up call table
- Core jumps to `info.bin` at `0x20000`
- Info `_start()` reads sector 0 via call table, detects QCOW2 magic
- Info parses QCOW2 header, extracts virtual size, backing file, etc.
- Info calls `send_info_result()` via call table
- Core sends protobuf message over serial port (x86 `OUT`)
- VMM receives bytes via `IoOut` VM exit, decodes protobuf
- Info returns; core calls `send_complete()`, then `HLT`
- VMM sees `Hlt` exit, formats output, prints to stdout