# Direct Memory I/O for KVM Guests
This document describes how to implement direct memory-based data transfer for bare-metal KVM guests, where input data is mapped into guest memory before execution and output is read from a different region after VM exit.
## Size Limitations
Warning: Direct memory I/O is NOT suitable for large files.
For disk images in the 10-100+ GB range, use virtio-block instead. Direct memory requires mapping the entire input/output into the guest's physical address space, which is impractical for files larger than ~1-2 GB.
Practical size limits:
| Constraint | Limit | Impact |
|---|---|---|
| Guest physical address space | ~1 TB (40-bit) | Hard limit on total mapped memory |
| Host memory allocation | System RAM | Cannot map more than available |
| Practical single buffer | ~1-2 GB | Beyond this, use virtio-block |
For large file processing, direct memory would require implementing a complex chunking protocol - at which point virtio-block is simpler and provides random access natively.
## Overview
Direct memory I/O is the simplest approach for small data transfers. The VMM allocates memory regions, maps them into the guest's physical address space, and the guest reads/writes directly to these regions. No virtio protocol or complex device emulation is required.
Best for: Metadata, configuration, small files (< 1 GB)
## Memory Region Setup

KVM uses memory slots (memslots) to map guest physical addresses to host virtual addresses. The key ioctl is `KVM_SET_USER_MEMORY_REGION`:
```c
struct kvm_userspace_memory_region {
    __u32 slot;             // Slot ID (query the limit via KVM_CAP_NR_MEMSLOTS)
    __u32 flags;            // KVM_MEM_LOG_DIRTY_PAGES, KVM_MEM_READONLY
    __u64 guest_phys_addr;  // Guest physical address
    __u64 memory_size;      // Size in bytes (page-aligned)
    __u64 userspace_addr;   // Host virtual address backing this memory
};
```
### Key Constraints

- Maximum memslots: ~512 (query via `KVM_CAP_NR_MEMSLOTS`)
- Slots must not overlap in guest physical address space
- Memory size must be page-aligned
- For large page support, the guest physical and host virtual addresses must match in their low 21 bits (i.e., be congruent modulo 2 MB)
### Memory Region Flags

- `KVM_MEM_LOG_DIRTY_PAGES` - Track modified pages (for migration)
- `KVM_MEM_READONLY` - Guest writes trigger `KVM_EXIT_MMIO` instead of succeeding
### Recommended Memory Layout

Guest Physical Address Space:

```text
0x00000000 - 0x000FFFFF   Reserved (real-mode structures if needed)
0x00100000 - 0x0FFFFFFF   Guest RAM (code, stack, heap)
0x10000000 - 0x100FFFFF   Input buffer (1 MB example)
0x10100000 - 0x101FFFFF   Output buffer (1 MB example)
0xF0000000 - 0xF0000FFF   Coalesced MMIO ring (if using)
```
### Buffer Placement Best Practices
- Place I/O buffers outside guest code/stack to avoid memslot overlap
- Use read-only memslots for input data to catch accidental writes
- Align buffers to 2MB boundaries for large page efficiency
- Consider placing output buffer at higher addresses for clear separation
## Guest Completion Signaling
The guest must signal when it has finished processing. Options:
### HLT Instruction

The simplest approach: the guest executes `HLT`, causing `KVM_EXIT_HLT`:
```rust
// VMM handling
match vcpu.run()? {
    VcpuExit::Hlt => {
        // Read the output from guest memory
        let output = read_guest_memory(output_buffer_addr);
    }
    _ => {} // other exit reasons handled elsewhere
}
```
### Port I/O Signal

The guest writes to a designated port to signal completion with a status byte:

```rust
// VMM handling
VcpuExit::IoOut(port, data) if port == 0x3f8 => {
    let status = data[0];
    // Interpret the output based on the status byte
}
```
### Memory-Mapped Status

The guest writes a status value to a known memory location:

```rust
// Guest code
unsafe {
    let status_ptr = 0x10200000 as *mut u32;
    status_ptr.write_volatile(PROCESSING_COMPLETE);
}
```

The VMM can poll this location, or map the status page read-only so that the guest's write triggers `KVM_EXIT_MMIO`.
## Coalesced MMIO/PIO

For higher-performance scenarios, KVM supports coalesced I/O, where multiple writes are batched into a ring buffer without causing VM exits.

### Ring Buffer Structure

```c
struct kvm_coalesced_mmio {
    __u64 phys_addr;   // Guest physical address of the write
    __u32 len;         // Length (1, 2, 4, or 8 bytes)
    union {
        __u32 pad;
        __u32 pio;     // 1 if port I/O, 0 if MMIO
    };
    __u8 data[8];      // Data written
};

struct kvm_coalesced_mmio_ring {
    __u32 first, last; // Ring indices
    struct kvm_coalesced_mmio coalesced_mmio[];
};
```
### Registration

```c
struct kvm_coalesced_mmio_zone zone = {
    .addr = 0xF0000000,
    .size = 0x1000,
};
ioctl(vm_fd, KVM_REGISTER_COALESCED_MMIO, &zone);
```
### Accessing the Ring

The ring is accessible by calling `mmap` on the VM fd at `KVM_COALESCED_MMIO_PAGE_OFFSET`:

```c
void *ring_page = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE,
                       MAP_SHARED, vm_fd,
                       KVM_COALESCED_MMIO_PAGE_OFFSET * PAGE_SIZE);
struct kvm_coalesced_mmio_ring *ring = ring_page;
```
### Processing Pattern

```c
while (ring->first != ring->last) {
    struct kvm_coalesced_mmio *entry = &ring->coalesced_mmio[ring->first];
    process_write(entry->phys_addr, entry->data, entry->len);
    ring->first = (ring->first + 1) % KVM_COALESCED_MMIO_MAX;
}
```
## Synchronization

### Memory Barriers

When sharing memory between host and guest:

```c
/* Producer: before publishing a shared index, make the data visible */
smp_wmb();  /* write memory barrier */

/* Consumer: after reading the shared index, before reading the data */
smp_rmb();  /* read memory barrier */
```
### Cache Coherency
- x86 maintains cache coherency automatically for normal memory
- Use volatile access or compiler barriers to prevent optimization issues
- For non-coherent DMA, explicit cache management may be needed
## Implementation Example (Rust)

### VMM Setup

```rust
// Allocate page-aligned input and output buffers
let input_buffer = allocate_aligned(INPUT_SIZE, PAGE_SIZE);
let output_buffer = allocate_aligned(OUTPUT_SIZE, PAGE_SIZE);

// Copy the input data into the buffer
input_buffer.copy_from_slice(&input_data);

// Map the input buffer (read-only for the guest)
let input_region = kvm_userspace_memory_region {
    slot: 1,
    flags: KVM_MEM_READONLY,
    guest_phys_addr: INPUT_GPA,
    memory_size: INPUT_SIZE as u64,
    userspace_addr: input_buffer.as_ptr() as u64,
};
vm.set_user_memory_region(input_region)?;

// Map the output buffer (read-write for the guest)
let output_region = kvm_userspace_memory_region {
    slot: 2,
    flags: 0,
    guest_phys_addr: OUTPUT_GPA,
    memory_size: OUTPUT_SIZE as u64,
    userspace_addr: output_buffer.as_ptr() as u64,
};
vm.set_user_memory_region(output_region)?;
```
### Guest Access

```rust
// Guest code - process the data
let input = unsafe {
    slice::from_raw_parts(INPUT_GPA as *const u8, INPUT_SIZE)
};
let output = unsafe {
    slice::from_raw_parts_mut(OUTPUT_GPA as *mut u8, OUTPUT_SIZE)
};

// Process the input, writing results to the output buffer
process(input, output);

// Signal completion
unsafe { asm!("hlt"); }
```
### VMM Completion Handling

```rust
loop {
    match vcpu.run()? {
        VcpuExit::Hlt => {
            // Copy the output back; `output_len` must be communicated by
            // the guest (e.g., via a status word) or fixed by convention.
            let result = output_buffer[..output_len].to_vec();
            return Ok(result);
        }
        VcpuExit::Shutdown => {
            return Err("guest shut down unexpectedly".into());
        }
        _ => continue,
    }
}
```
## Advantages
- Simplicity: No protocol overhead, just memory access
- Performance: Zero-copy possible, no virtqueue processing
- Latency: Minimal exit overhead (just HLT or I/O)
- Debugging: Easy to inspect memory contents
## Limitations
- Size Constraints: Cannot handle files larger than available memory (~1-2 GB practical limit)
- Fixed Layout: Input/output sizes determined at setup
- No Streaming: Cannot process data larger than allocated buffers
- Synchronization: Guest must complete before VMM reads output
- No Flow Control: Guest cannot signal partial progress easily
- No Random Access for Large Files: Would require complex chunking protocol
## When to Use
Direct memory I/O is ideal for:
- Small data only (< 1 GB)
- Single-shot processing tasks
- Metadata or configuration processing
- Low-latency requirements
- Simple guest implementations
- Initial prototyping before implementing virtio
Do NOT use for:
- Large disk images (10s-100s of GB) - use virtio-block
- Streaming data - use virtio-vsock
- Any file larger than available system memory