Other Data Transfer Mechanisms for KVM Guests¶
This document covers additional data transfer mechanisms beyond direct memory, virtio-vsock, and virtio-block.
Port I/O (IN/OUT Instructions)¶
Overview¶
Port I/O uses x86 IN/OUT instructions to communicate with the VMM. This is the mechanism used for legacy PC hardware like serial ports.
Exit Structure¶
struct {
__u8 direction; // KVM_EXIT_IO_IN (0) or KVM_EXIT_IO_OUT (1)
__u8 size; // 1, 2, or 4 bytes
__u16 port; // I/O port number
__u32 count; // Number of operations
__u64 data_offset; // Offset in kvm_run for data
} io;
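When an IN/OUT instruction traps, KVM returns to the VMM with exit reason KVM_EXIT_IO and places the data inside the kvm_run structure. A minimal sketch of handling a guest write to COM1, assuming vcpu_fd and the mmap'ed run structure already exist:
// Sketch: dispatch a port I/O exit from the vCPU run loop.
ioctl(vcpu_fd, KVM_RUN, 0);

if (run->exit_reason == KVM_EXIT_IO &&
    run->io.direction == KVM_EXIT_IO_OUT &&
    run->io.port == 0x3f8) {
    // The bytes written by the guest live inside kvm_run at data_offset;
    // count > 1 for string instructions (REP OUTS).
    const uint8_t *data = (const uint8_t *)run + run->io.data_offset;
    for (uint32_t i = 0; i < run->io.count; i++)
        fwrite(data + i * run->io.size, run->io.size, 1, stdout);
}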
Common Ports¶
| Port | Device |
|---|---|
| 0x3f8-0x3ff | COM1 serial |
| 0x2f8-0x2ff | COM2 serial |
| 0x80 | Debug port |
| 0x60-0x64 | Keyboard controller |
Trade-offs¶
Advantages: - Simple to implement - Fast (faster than MMIO) - No memory mapping required
Limitations: - x86 only - Small data size (1-4 bytes per operation) - VM exit per operation (unless using ioeventfd) - Limited port space (65536 ports)
Use Case¶
Best for low-frequency control/status operations, not bulk data.
Memory-Mapped I/O (MMIO)¶
Overview¶
MMIO uses memory accesses to unmapped guest physical addresses to trigger VM exits, similar to how hardware registers work.
Exit Structure¶
struct {
__u64 phys_addr; // Guest physical address
__u8 data[8]; // Data (up to 8 bytes)
__u32 len; // Access length
__u8 is_write; // 1 = write, 0 = read
} mmio;
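On an MMIO exit the VMM resolves the access itself; for reads it must fill in mmio.data before the next KVM_RUN. A minimal sketch, where device_write and device_read are hypothetical handlers for the emulated device:
// Sketch: dispatch an MMIO exit to an emulated device.
if (run->exit_reason == KVM_EXIT_MMIO) {
    if (run->mmio.is_write) {
        uint64_t value = 0;
        memcpy(&value, run->mmio.data, run->mmio.len);   // at most 8 bytes
        device_write(run->mmio.phys_addr, value, run->mmio.len);
    } else {
        uint64_t value = device_read(run->mmio.phys_addr, run->mmio.len);
        memcpy(run->mmio.data, &value, run->mmio.len);   // returned to the guest
    }
}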
Trade-offs¶
Advantages: - Works on all architectures - Natural for device register emulation - Can use read-only memory regions (KVM_MEM_READONLY) so reads are served from RAM and only writes trap (see the sketch after the limitations below)
Limitations: - Slower than port I/O on x86 - VM exit per access - Limited to 8 bytes per operation
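A minimal sketch of that read-only trick, assuming vm_fd and a host allocation backing_mem; the slot number and addresses are illustrative:
struct kvm_userspace_memory_region region = {
    .slot            = 1,
    .flags           = KVM_MEM_READONLY,      // writes trap with KVM_EXIT_MMIO, reads hit RAM
    .guest_phys_addr = 0xC0000000,
    .memory_size     = 0x100000,
    .userspace_addr  = (uint64_t)backing_mem, // host allocation backing the region
};
ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, &region);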
ioeventfd / irqfd¶
ioeventfd¶
Avoids exits to userspace: when the guest writes to a registered I/O port or MMIO address, KVM signals an eventfd from inside the kernel instead of returning to the VMM.
struct kvm_ioeventfd {
__u64 datamatch; // Match value (optional)
__u64 addr; // Address to monitor
__u32 len; // Access size
__s32 fd; // eventfd to signal
__u32 flags; // KVM_IOEVENTFD_FLAG_*
};
Flags:
- KVM_IOEVENTFD_FLAG_DATAMATCH: Only trigger on matching value
- KVM_IOEVENTFD_FLAG_PIO: Port I/O (vs MMIO)
- KVM_IOEVENTFD_FLAG_DEASSIGN: Remove registration
Use Case: Notification-only communication (doorbell pattern).
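A minimal registration sketch, assuming vm_fd is an open VM file descriptor; the port number and match value are arbitrary:
int efd = eventfd(0, 0);

struct kvm_ioeventfd ioev = {
    .datamatch = 1,                       // only writes of the value 1 signal the fd
    .addr      = 0xe0,                    // I/O port to watch
    .len       = 4,
    .fd        = efd,
    .flags     = KVM_IOEVENTFD_FLAG_PIO | KVM_IOEVENTFD_FLAG_DATAMATCH,
};
ioctl(vm_fd, KVM_IOEVENTFD, &ioev);
// KVM now signals efd in the kernel on a matching OUT; no exit reaches the VMM.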
irqfd¶
Inject interrupts into guest via eventfd.
struct kvm_irqfd {
__u32 fd; // eventfd
__u32 gsi; // Guest interrupt line
__u32 flags;
__u32 resamplefd; // For level-triggered emulation
};
Use Case: Host-to-guest notifications injected from the kernel, without an ioctl round trip through the VMM.
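A minimal sketch; irqfd requires an in-kernel irqchip (e.g. created with KVM_CREATE_IRQCHIP), and the GSI number here is illustrative:
int irq_efd = eventfd(0, 0);

struct kvm_irqfd irqfd_cfg = { .fd = irq_efd, .gsi = 10 };
ioctl(vm_fd, KVM_IRQFD, &irqfd_cfg);

// Any host thread can now inject the interrupt without touching the vCPU:
uint64_t one = 1;
write(irq_efd, &one, sizeof(one));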
Combined Pattern¶
Guest writes to doorbell → ioeventfd signals host
Host completes work → irqfd injects interrupt
Guest handles interrupt → reads result from shared memory
This pattern keeps the data path almost entirely free of exits to userspace.
Custom MMIO Device¶
Overview¶
A custom MMIO-based device provides a middle ground between direct memory I/O and full virtio implementations. It combines memory-mapped control registers with DMA-like data transfer and ioeventfd/irqfd for efficient signaling.
This approach emulates a simplified hardware device without implementing a real hardware protocol or the full virtio specification.
Architecture¶
Guest Physical Address Space:
┌─────────────────────────────────────────────┐
│ Control Registers (4KB, MMIO) │ 0xFE000000
│ 0x00: command (u32) │
│ 0x04: status (u32) │
│ 0x08: data_gpa (u64) ← guest phys addr│
│ 0x10: data_length (u64) │
│ 0x18: result_gpa (u64) │
│ 0x20: result_length (u64) │
│ 0x28: doorbell (u32) ← triggers work │
├─────────────────────────────────────────────┤
│ Data Buffer (variable, shared memory) │ 0x10000000
│ Guest writes input here, host reads │
├─────────────────────────────────────────────┤
│ Result Buffer (variable, shared memory) │ 0x20000000
│ Host writes output here, guest reads │
└─────────────────────────────────────────────┘
Protocol¶
Commands:
| Value | Command | Description |
|---|---|---|
| 0x01 | READ_CHUNK | Read chunk from input at offset |
| 0x02 | WRITE_CHUNK | Write chunk to output at offset |
| 0x03 | GET_SIZE | Query input file size |
| 0x04 | FLUSH | Flush output to storage |
Status:
| Value | Status | Description |
|---|---|---|
| 0x00 | IDLE | Ready for command |
| 0x01 | BUSY | Processing command |
| 0x02 | COMPLETE | Command finished successfully |
| 0x03 | ERROR | Command failed |
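The command and status codes from the tables above, written out as a header shared (hypothetically) between the guest driver and the VMM device model; the VMM code below uses these names:
enum custom_dev_cmd {
    CMD_READ_CHUNK  = 0x01,
    CMD_WRITE_CHUNK = 0x02,
    CMD_GET_SIZE    = 0x03,
    CMD_FLUSH       = 0x04,
};

enum custom_dev_status {
    STATUS_IDLE     = 0x00,
    STATUS_BUSY     = 0x01,
    STATUS_COMPLETE = 0x02,
    STATUS_ERROR    = 0x03,
};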
Operation Flow¶
1. Guest writes command parameters to registers
2. Guest writes to doorbell → triggers ioeventfd (handled in the kernel, no exit to userspace)
3. Host receives eventfd notification
4. Host reads the command parameters from its emulated register state (or from a descriptor ring in shared guest memory)
5. Host performs operation (read/write file)
6. Host writes result to result buffer
7. Host updates status register
8. Host injects interrupt via irqfd (optional)
9. Guest reads status and result
VMM Implementation¶
// Register ioeventfd for doorbell
struct kvm_ioeventfd doorbell = {
.addr = 0xFE000028, // Doorbell address
.len = 4,
.fd = eventfd(0, 0),
.flags = 0, // MMIO (not PIO)
};
ioctl(vm_fd, KVM_IOEVENTFD, &doorbell);
// Register irqfd for completion notification
struct kvm_irqfd completion_irq = {
.fd = eventfd(0, 0),
.gsi = 32, // IRQ line
};
ioctl(vm_fd, KVM_IRQFD, &completion_irq);
// Main loop
while (1) {
// Wait for doorbell
uint64_t val;
read(doorbell.fd, &val, sizeof(val));
// Read command parameters from the emulated control registers (captured on
// earlier MMIO exits). In this simplified flow the registers at 0x08/0x10
// carry the file offset and chunk length, since the data and result buffers
// live at fixed guest physical addresses.
uint32_t cmd = read_guest_mmio(0xFE000000);
uint64_t offset = read_guest_mmio(0xFE000008);
uint64_t length = read_guest_mmio(0xFE000010);
// Process command
switch (cmd) {
case CMD_READ_CHUNK:
// Host fills the result buffer, which the guest then reads
pread(input_fd, guest_result_buffer, length, offset);
break;
case CMD_WRITE_CHUNK:
// Host writes out data the guest placed in the data buffer
pwrite(output_fd, guest_data_buffer, length, offset);
break;
}
// Signal completion
write_guest_mmio(0xFE000004, STATUS_COMPLETE);
uint64_t one = 1;
write(completion_irq.fd, &one, sizeof(one));
}
Guest Implementation¶
use core::slice;

const CTRL_BASE: u64 = 0xFE00_0000;
const CMD_READ_CHUNK: u32 = 0x01;
const STATUS_COMPLETE: u32 = 0x02;

struct CustomDevice {
    ctrl: *mut u8,          // mapped control registers at CTRL_BASE
    data_buffer: *mut u8,   // guest → host input buffer
    result_buffer: *mut u8, // host → guest output buffer
}
impl CustomDevice {
    fn read_chunk(&mut self, offset: u64, length: u64) -> &[u8] {
        unsafe {
            // Write command parameters: in this simplified flow the registers
            // at 0x08/0x10 carry the file offset and chunk length
            (self.ctrl.add(0x00) as *mut u32).write_volatile(CMD_READ_CHUNK);
            (self.ctrl.add(0x08) as *mut u64).write_volatile(offset);
            (self.ctrl.add(0x10) as *mut u64).write_volatile(length);
            // Ring doorbell
            (self.ctrl.add(0x28) as *mut u32).write_volatile(1);
            // Poll for completion (or wait for interrupt)
            while (self.ctrl.add(0x04) as *mut u32).read_volatile() != STATUS_COMPLETE {
                core::hint::spin_loop();
            }
            // Return the chunk the host placed in the result buffer
            slice::from_raw_parts(self.result_buffer, length as usize)
        }
    }
}
Large File Support¶
For large files (10s-100s of GB), the custom device uses chunked access:
For a 100GB file with 1MB chunks:
- 100,000 READ_CHUNK commands
- Each command: ~1μs doorbell + ~10ms I/O = ~10ms per chunk
- Total: ~17 minutes (similar to virtio-block)
The 64-bit offset field in the command structure supports files up to 16 EB.
Comparison with Virtio-block¶
Without rust-vmm crates:
| Aspect | Custom MMIO | Virtio-block |
|---|---|---|
| Protocol complexity | Low (~600 lines) | High (~1600 lines) |
| Complexity advantage | ✓ ~60% less code | |
With rust-vmm crates:
| Aspect | Custom MMIO | Virtio-block |
|---|---|---|
| Protocol complexity | ~550 lines | ~450 lines |
| Complexity advantage | | ✓ ~20% less code |
| Standardization | None | VIRTIO spec |
| Tooling | None | qemu, libvirt |
| Batching | Manual | Native (virtqueue) |
| Scatter-gather | Manual | Native |
| Error handling | Custom | Standardized |
| Guest driver crate | ✗ | ✓ (virtio-drivers) |
Key insight: The rust-vmm ecosystem has eliminated Custom MMIO's main advantage (implementation simplicity). Virtio-block is now easier to implement AND provides better features.
Trade-offs¶
Advantages: - No userspace exits on the data path (with ioeventfd/irqfd) - Flexible protocol design - Easy to debug (simple state machine) - Natural fit for request/response patterns - Good for learning how device emulation works
Limitations: - Non-standard (no ecosystem support) - No longer simpler than virtio-block (with rust-vmm crates) - Must implement batching manually for throughput - Requires careful synchronization design - No existing implementations to reference - No guest driver crate available
When to Use¶
Good fit: - Learning/educational purposes - Protocols that don't fit virtio semantics - When avoiding exits to userspace is absolutely critical - Very simple, single-purpose devices
Consider virtio-block instead when: - Using Rust (rust-vmm crates available) - Production deployment planned - Need scatter-gather for performance - Want standardized error handling - Integration with existing tooling required
Recommendation changed: With rust-vmm crates, virtio-block is now the simpler choice for most use cases. Custom MMIO is primarily useful for learning or specialized protocols.
Implementation Complexity¶
From scratch:
| Component | Lines of Code |
|---|---|
| Guest driver | ~100-200 |
| VMM device | ~200-400 |
| Protocol handling | ~100-200 |
| Total | ~400-800 |
With kvm-ioctls (for ioeventfd):
| Component | Lines of Code |
|---|---|
| Guest driver | ~100-200 |
| VMM device | ~150-300 |
| Protocol handling | ~100-200 |
| Total | ~350-700 |
Compared to Virtio-block with rust-vmm crates: ~450 lines
The complexity advantage of Custom MMIO has largely disappeared. Use virtio-block unless you have a specific reason not to.
Virtio-console¶
Overview¶
Stream-based communication device, more sophisticated than serial port I/O.
Features¶
- Multiple ports (up to 32768)
- Bidirectional streams
- Integration with host console/tty
- Supports both console and generic port modes
Device ID¶
VIRTIO_ID_CONSOLE (3)
Trade-offs¶
Advantages: - Higher throughput than serial - Multiple channels - Standard virtio transport
Limitations: - Stream semantics (no message boundaries) - Still requires virtio implementation
Use Case¶
Text/log output, interactive console, moderate data transfer.
Virtio-fs (virtiofs)¶
Overview¶
Shared filesystem using FUSE protocol over virtio transport.
Architecture¶
Guest Host
| |
+-- virtiofs driver |
| | |
| +-- virtio-fs device --|-- virtiofsd daemon
| | |
+-- /mnt/shared <--------------|-- /host/shared
Features¶
- DAX (Direct Access) for zero-copy file access
- Metadata caching
- Multiple request queues
Use Case¶
When file semantics are natural (config files, large data files).
Vhost Mechanism¶
Overview¶
Kernel-accelerated virtio backend that keeps the data path in the kernel, bypassing the userspace VMM (e.g. QEMU).
Components¶
- vhost-net: Network acceleration
- vhost-scsi: SCSI acceleration
- vhost-vsock: Vsock acceleration
Benefits¶
- Kernel handles data path
- Reduces context switches
- Near-native performance
Limitations¶
- Requires kernel support
- More complex setup
PCI Passthrough (VFIO)¶
Overview¶
Assign physical PCI devices directly to guests for bare-metal performance.
Requirements¶
- IOMMU (Intel VT-d, AMD-Vi)
- Compatible device
- VFIO driver
Trade-offs¶
Advantages: - Native hardware performance - Full device capabilities
Limitations: - Device dedicated to single VM - Hardware dependency - Complex setup - No live migration
Use Case¶
When specific hardware acceleration is required (GPU, NIC, NVMe).
Shared Memory (ivshmem)¶
Overview¶
Inter-VM shared memory device for zero-copy communication.
Features¶
- Direct memory sharing between VMs
- Optional interrupt mechanism
- Flexible size
Use Case¶
High-performance inter-VM communication.
Hypercalls¶
Overview¶
Direct guest-to-hypervisor calls for special operations.
Common Hypercalls¶
| Number | Name | Purpose |
|---|---|---|
| 1 | KVM_HC_VAPIC_POLL_IRQ | APIC interrupt polling |
| 5 | KVM_HC_KICK_CPU | Wake target CPU |
| 9 | KVM_HC_CLOCK_PAIRING | TSC synchronization |
| 12 | KVM_HC_MAP_GPA_RANGE | Request page mapping |
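On x86, the guest places the hypercall number in RAX and arguments in RBX/RCX/RDX/RSI, then executes VMCALL (Intel) or VMMCALL (AMD); the return value comes back in RAX. A guest-side sketch for an Intel CPU:
// Sketch: issue a one-argument KVM hypercall from a 64-bit guest (Intel VMCALL).
static inline long kvm_hypercall1(unsigned long nr, unsigned long arg0)
{
    long ret;
    asm volatile("vmcall"
                 : "=a"(ret)             // return value in RAX
                 : "a"(nr), "b"(arg0)    // number in RAX, first argument in RBX
                 : "memory");
    return ret;
}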
Trade-offs¶
Advantages: - Very low latency - Direct communication
Limitations: - Architecture-specific - Register-sized data only - Custom protocol required
Comparison Summary¶
| Mechanism | Throughput | Latency | Complexity | Architecture |
|---|---|---|---|---|
| Port I/O | Low | Medium | Low | x86 only |
| MMIO | Low | Medium | Low | All |
| ioeventfd/irqfd | High | Low | Medium | All |
| Custom MMIO Device | High | Low | Low-Medium | All |
| Virtio-console | Medium | Medium | Medium | All |
| Virtio-fs | High | Medium | High | All |
| Vhost | Very High | Low | High | All |
| VFIO | Native | Native | Very High | Requires HW |
| Hypercalls | Low | Very Low | Low | Varies |
Recommendations¶
| Use Case | Recommended Mechanism |
|---|---|
| Simple signaling | ioeventfd + irqfd |
| Debug output | Port I/O (serial) |
| Bulk data, one-shot | Direct memory |
| Streaming data | Virtio-vsock |
| Block semantics | Virtio-block |
| Prototyping block I/O | Custom MMIO device |
| File sharing | Virtio-fs |
| Maximum performance | Vhost or VFIO |
| Inter-VM | ivshmem |