QCOW2 L1/L2 Tables - Address Translation
QCOW2 uses a two-level table structure for mapping guest (virtual) disk offsets to host (physical) file offsets. This design enables sparse allocation and efficient copy-on-write operations.
Address Translation Overview
Guest Offset (logical address)
         |
         v
+-------------------+
|     L1 Table      |  Memory-resident, one entry per L2 table
|    [l1_index]     |
+-------------------+
         |
         v  (L2 table offset)
+-------------------+
|     L2 Table      |  One cluster per table, cached on demand
|    [l2_index]     |
+-------------------+
         |
         v  (cluster offset + in-cluster offset)
+-------------------+
|   Data Cluster    |  Actual disk data
+-------------------+
Index Calculations
Given a guest offset, compute the table indices:
// Configuration derived from header
cluster_size = 1 << cluster_bits; // e.g., 65536 (64 KB)
l2_entries = cluster_size / l2_entry_size; // e.g., 8192 for standard entries
l2_bits = log2(l2_entries); // e.g., 13
// Address translation
l1_index = guest_offset >> (l2_bits + cluster_bits);
l2_index = (guest_offset >> cluster_bits) & (l2_entries - 1);
in_cluster_offset = guest_offset & (cluster_size - 1);
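// Final host offset. This applies only to a normal cluster: the L1 entry is
// non-zero and the L2 entry is uncompressed with a non-zero host offset.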
l2_table_offset = l1_table[l1_index] & L1E_OFFSET_MASK;
cluster_descriptor = l2_table[l2_index];
host_cluster_offset = cluster_descriptor & L2E_OFFSET_MASK;
host_offset = host_cluster_offset + in_cluster_offset;
Example: 64 KB Clusters (cluster_bits = 16)
cluster_size = 65536 bytes
l2_entries = 65536 / 8 = 8192 entries per L2 table
l2_bits = 13
For guest offset 0x12345678:
l1_index = 0x12345678 >> (13 + 16) = 0x12345678 >> 29 = 0
l2_index = (0x12345678 >> 16) & 0x1FFF = 0x1234 & 0x1FFF = 0x1234
in_cluster_offset = 0x12345678 & 0xFFFF = 0x5678
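As a cross-check of the worked example, the index arithmetic can be sketched in C. The names `Qcow2Indices` and `qcow2_split_offset` are illustrative and not taken from any particular implementation, and the sketch assumes `l2_entry_size` is either 8 (standard) or 16 (extended).

```c
#include <stdint.h>

/* Table indices for one guest offset (illustrative type, not from QEMU). */
typedef struct {
    uint64_t l1_index;
    uint64_t l2_index;
    uint64_t in_cluster_offset;
} Qcow2Indices;

/* Assumption: l2_entry_size is 8 (standard) or 16 (extended L2). */
static Qcow2Indices qcow2_split_offset(uint64_t guest_offset,
                                       unsigned cluster_bits,
                                       unsigned l2_entry_size)
{
    /* Entries per L2 table is a power of two, so l2_bits is a subtraction:
     * 3 bits for 8-byte entries, 4 bits for 16-byte entries. */
    unsigned entry_bits = (l2_entry_size == 16) ? 4 : 3;
    unsigned l2_bits = cluster_bits - entry_bits;
    uint64_t cluster_size = 1ULL << cluster_bits;
    uint64_t l2_entries = 1ULL << l2_bits;

    Qcow2Indices idx;
    idx.l1_index = guest_offset >> (l2_bits + cluster_bits);
    idx.l2_index = (guest_offset >> cluster_bits) & (l2_entries - 1);
    idx.in_cluster_offset = guest_offset & (cluster_size - 1);
    return idx;
}
```

Calling `qcow2_split_offset(0x12345678, 16, 8)` reproduces the three values computed by hand above.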
L1 Table Entry Format (64 bits)
 63  62      56 55                                  9 8          0
+---+----------+-------------------------------------+------------+
| C | Reserved |      L2 Table Offset (47 bits)      |  Reserved  |
+---+----------+-------------------------------------+------------+
  |                               |
  |                               +-- Must be cluster-aligned (bits 0-8 = 0)
  +-- COPIED flag (refcount == 1)
| Bits | Name | Description |
|---|---|---|
| 0-8 | Reserved | Must be zero |
| 9-55 | Offset | L2 table offset (cluster-aligned) |
| 56-62 | Reserved | Must be zero |
| 63 | COPIED | Set if L2 table refcount is exactly 1 |
Masks:
#define L1E_OFFSET_MASK 0x00fffffffffffe00ULL
#define L1E_RESERVED_MASK 0x7f000000000001ffULL
Special values:
- Entry = 0: L2 table not allocated (all clusters in this range unallocated)
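The L1 field layout in the table above can be decoded with two masks. The following is a minimal sketch, assuming the entry has already been converted from its on-disk big-endian form to host byte order. The mask values are derived from the bit ranges in the table, and `L1EntryInfo` and `decode_l1_entry` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define L1E_OFFSET_MASK   0x00fffffffffffe00ULL  /* bits 9-55 */
#define L1E_RESERVED_MASK 0x7f000000000001ffULL  /* bits 0-8 and 56-62 */
#define QCOW_OFLAG_COPIED (1ULL << 63)

/* Decoded view of one L1 entry (illustrative type). */
typedef struct {
    uint64_t l2_table_offset;   /* 0 means no L2 table is allocated */
    bool copied;                /* COPIED flag, bit 63 */
    bool reserved_bits_set;     /* true if any must-be-zero bit is set */
} L1EntryInfo;

/* Assumption: l1_entry is already in host byte order. */
static L1EntryInfo decode_l1_entry(uint64_t l1_entry)
{
    L1EntryInfo info;
    info.l2_table_offset = l1_entry & L1E_OFFSET_MASK;
    info.copied = (l1_entry & QCOW_OFLAG_COPIED) != 0;
    info.reserved_bits_set = (l1_entry & L1E_RESERVED_MASK) != 0;
    return info;
}
```

A caller would typically treat `reserved_bits_set` as a corruption signal and `l2_table_offset == 0` as "the whole range covered by this entry is unallocated".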
L2 Table Entry Format - Standard (64 bits)
 63  62  61      56 55                            9 8     1  0
+---+---+----------+-------------------------------+-------+---+
| C | M |   Rsvd   |      Host Cluster Offset      | Rsvd  | Z |
+---+---+----------+-------------------------------+-------+---+
  |   |                            |                         |
  |   |                            +-- Cluster-aligned offset|
  |   +-- COMPRESSED flag (M)                                |
  +-- COPIED flag (C)                                        +-- ZERO flag (Z)
| Bits | Name | Description |
|---|---|---|
| 0 | ZERO | Cluster reads as zeros (v3+ only) |
| 1-8 | Reserved | Must be zero |
| 9-55 | Offset | Host cluster offset (cluster-aligned) |
| 56-61 | Reserved/Size | Reserved for standard, size for compressed |
| 62 | COMPRESSED | Cluster is compressed |
| 63 | COPIED | Refcount is exactly 1 |
Masks:
#define L2E_OFFSET_MASK 0x00fffffffffffe00ULL
#define L2E_STD_RESERVED_MASK 0x3f000000000001feULL
#define QCOW_OFLAG_COPIED (1ULL << 63)
#define QCOW_OFLAG_COMPRESSED (1ULL << 62)
#define QCOW_OFLAG_ZERO (1ULL << 0)
Cluster Type Determination
typedef enum QCow2ClusterType {
QCOW2_CLUSTER_UNALLOCATED, // offset=0, zero=0: read from backing
QCOW2_CLUSTER_ZERO_PLAIN, // offset=0, zero=1: reads as zeros
QCOW2_CLUSTER_ZERO_ALLOC, // offset!=0, zero=1: allocated but zeros
QCOW2_CLUSTER_NORMAL, // offset!=0, compressed=0: standard data
QCOW2_CLUSTER_COMPRESSED, // compressed=1: compressed data
} QCow2ClusterType;
QCow2ClusterType get_cluster_type(uint64_t l2_entry) {
if (l2_entry & QCOW_OFLAG_COMPRESSED) {
return QCOW2_CLUSTER_COMPRESSED;
}
uint64_t offset = l2_entry & L2E_OFFSET_MASK;
if (l2_entry & QCOW_OFLAG_ZERO) {
return offset ? QCOW2_CLUSTER_ZERO_ALLOC : QCOW2_CLUSTER_ZERO_PLAIN;
}
return offset ? QCOW2_CLUSTER_NORMAL : QCOW2_CLUSTER_UNALLOCATED;
}
Compressed Cluster Descriptor
When bit 62 (COMPRESSED) is set, the entry format changes:
 63  62  61              csize_shift                              0
+---+---+---------------------------+------------------------------+
| 0 | 1 |  Compressed Sectors - 1   |    Compressed Data Offset    |
+---+---+---------------------------+------------------------------+
      ^               ^                             ^
      |               |                             |
      |               +-- Number of 512B sectors    +-- NOT cluster-aligned!
      +-- COMPRESSED = 1 (COPIED always 0)
The bit positions are dynamic based on cluster size:
csize_shift = 62 - (cluster_bits - 8);
csize_mask = (1 << (cluster_bits - 8)) - 1;
cluster_offset_mask = (1ULL << csize_shift) - 1;
// Parsing a compressed entry
uint64_t compressed_offset = l2_entry & cluster_offset_mask;
int nb_sectors = ((l2_entry >> csize_shift) & csize_mask) + 1;
int compressed_size = nb_sectors * 512 -
(compressed_offset & 511); // Adjust for alignment
Example for 64 KB clusters (cluster_bits = 16):
csize_shift = 62 - 8 = 54
csize_mask = 0xFF (8 bits for sector count)
cluster_offset_mask = (1 << 54) - 1
Bits 0-53: Compressed data offset (54 bits)
Bits 54-61: Sector count - 1 (8 bits, max 255+1 = 256 sectors = 128 KB)
Bit 62: COMPRESSED = 1
Bit 63: COPIED = 0 (always for compressed)
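To make the dynamic field boundaries concrete, here is a small C sketch that parses a compressed descriptor for any `cluster_bits`, using the same formulas as above. The `CompressedDesc` type and `parse_compressed_entry` name are hypothetical, and the entry is assumed to be in host byte order with the COMPRESSED flag already checked by the caller.

```c
#include <stdint.h>

/* Decoded compressed-cluster descriptor (illustrative type). */
typedef struct {
    uint64_t host_offset;   /* byte offset of the compressed data, unaligned */
    unsigned nb_sectors;    /* 512-byte sectors spanned by the data */
    uint64_t max_size;      /* upper bound on the compressed byte count */
} CompressedDesc;

/* Assumption: caller has verified that bit 62 (COMPRESSED) is set. */
static CompressedDesc parse_compressed_entry(uint64_t l2_entry,
                                             unsigned cluster_bits)
{
    unsigned csize_shift = 62 - (cluster_bits - 8);
    uint64_t csize_mask = (1ULL << (cluster_bits - 8)) - 1;
    uint64_t offset_mask = (1ULL << csize_shift) - 1;

    CompressedDesc d;
    d.host_offset = l2_entry & offset_mask;
    /* The field stores the sector count minus one. */
    d.nb_sectors = (unsigned)((l2_entry >> csize_shift) & csize_mask) + 1;
    /* Data starts mid-sector, so subtract the offset within the first sector. */
    d.max_size = (uint64_t)d.nb_sectors * 512 - (d.host_offset & 511);
    return d;
}
```

For example, with 64 KB clusters an entry with sector field 3 and data offset 0x50100 spans 4 sectors and holds at most 4 * 512 - 0x100 = 1792 bytes of compressed data.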
Extended L2 Entries (Subclusters)
When incompatible feature bit 4 (EXTL2) is set, L2 entries are 128 bits (16 bytes) instead of 64 bits, enabling subcluster allocation.
Each cluster is divided into 32 subclusters, each tracked independently.
Extended L2 Entry Format (128 bits)
First 64 bits: Standard L2 entry (bit 0 unused)
Second 64 bits: Subcluster bitmap
 63                              32 31                             0
+----------------------------------+--------------------------------+
|      Zero Bitmap (32 bits)       |  Allocation Bitmap (32 bits)   |
+----------------------------------+--------------------------------+
                 |                                 |
                 |                                 +-- Bit N: subcluster N allocated
                 +-- Bit 32+N: subcluster N reads as zeros
#define QCOW_EXTL2_SUBCLUSTERS_PER_CLUSTER 32
#define QCOW_OFLAG_SUB_ALLOC(X) (1ULL << (X))
#define QCOW_OFLAG_SUB_ZERO(X) (1ULL << ((X) + 32))
// Subcluster index calculation
subcluster_bits = cluster_bits - 5; // 32 subclusters
subcluster_size = cluster_size / 32;
sc_index = (guest_offset >> subcluster_bits) & 31;
// Check subcluster status (bitmap is the second 64 bits of the entry)
bool is_allocated = (bitmap & QCOW_OFLAG_SUB_ALLOC(sc_index)) != 0;
bool is_zero = (bitmap & QCOW_OFLAG_SUB_ZERO(sc_index)) != 0;
Subcluster Types
typedef enum QCow2SubclusterType {
QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN, // Not allocated, read backing
QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC, // Cluster allocated, SC not
QCOW2_SUBCLUSTER_ZERO_PLAIN, // Reads zeros, no allocation
QCOW2_SUBCLUSTER_ZERO_ALLOC, // Allocated but reads zeros
QCOW2_SUBCLUSTER_NORMAL, // Normal allocated data
QCOW2_SUBCLUSTER_COMPRESSED, // Entire cluster compressed
QCOW2_SUBCLUSTER_INVALID, // Invalid state
} QCow2SubclusterType;
Important constraints:
- Compressed clusters cannot use subclusters (bitmap must be 0)
- Extended L2 requires cluster_bits >= 14 (minimum 16 KB clusters)
- Minimum subcluster size is 512 bytes
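One way to map an extended L2 entry and a subcluster index onto the subcluster types is sketched below. The decision order is an assumption inferred from the enum comments and the constraints listed here, not a copy of any implementation. In particular, treating a compressed entry with a non-zero bitmap as invalid is this sketch's choice, and `classify_subcluster` is a hypothetical name.

```c
#include <stdint.h>

#define L2E_OFFSET_MASK         0x00fffffffffffe00ULL
#define QCOW_OFLAG_COMPRESSED   (1ULL << 62)
#define QCOW_OFLAG_SUB_ALLOC(X) (1ULL << (X))
#define QCOW_OFLAG_SUB_ZERO(X)  (1ULL << ((X) + 32))

typedef enum QCow2SubclusterType {
    QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN,
    QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC,
    QCOW2_SUBCLUSTER_ZERO_PLAIN,
    QCOW2_SUBCLUSTER_ZERO_ALLOC,
    QCOW2_SUBCLUSTER_NORMAL,
    QCOW2_SUBCLUSTER_COMPRESSED,
    QCOW2_SUBCLUSTER_INVALID,
} QCow2SubclusterType;

/* Assumption: sc_index is in the range 0..31. */
static QCow2SubclusterType classify_subcluster(uint64_t l2_entry,
                                               uint64_t bitmap,
                                               unsigned sc_index)
{
    uint32_t alloc_bits = (uint32_t)bitmap;
    uint32_t zero_bits = (uint32_t)(bitmap >> 32);

    if (l2_entry & QCOW_OFLAG_COMPRESSED) {
        /* Compressed clusters cannot use subclusters: bitmap must be 0. */
        return bitmap == 0 ? QCOW2_SUBCLUSTER_COMPRESSED
                           : QCOW2_SUBCLUSTER_INVALID;
    }
    if (alloc_bits & zero_bits) {
        /* A subcluster cannot be both allocated and all-zero. */
        return QCOW2_SUBCLUSTER_INVALID;
    }
    if (l2_entry & L2E_OFFSET_MASK) {
        /* A host cluster exists. */
        if (bitmap & QCOW_OFLAG_SUB_ZERO(sc_index)) {
            return QCOW2_SUBCLUSTER_ZERO_ALLOC;
        }
        if (bitmap & QCOW_OFLAG_SUB_ALLOC(sc_index)) {
            return QCOW2_SUBCLUSTER_NORMAL;
        }
        return QCOW2_SUBCLUSTER_UNALLOCATED_ALLOC;
    }
    /* No host cluster: no subcluster may claim to be allocated. */
    if (alloc_bits != 0) {
        return QCOW2_SUBCLUSTER_INVALID;
    }
    if (bitmap & QCOW_OFLAG_SUB_ZERO(sc_index)) {
        return QCOW2_SUBCLUSTER_ZERO_PLAIN;
    }
    return QCOW2_SUBCLUSTER_UNALLOCATED_PLAIN;
}
```

Returning a single enum per subcluster keeps the read path simple: the caller switches on the result to read data, return zeros, or fall through to the backing file.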
Subcluster Bitmap Validation (check operation)
The instar check operation validates extended-L2 subcluster bitmaps
against the QCOW2 spec's invalid-combination rules. The following
invariants are enforced:
Standard (non-compressed) entries:
| # | Condition | Meaning |
|---|---|---|
| I1 | (alloc_bits & zero_bits) != 0 | Subcluster simultaneously allocated and all-zero |
| I2 | host_offset == 0 && alloc_bits != 0 | Bitmap claims subclusters allocated but no host cluster |
| I3 | host_offset != 0 && alloc_bits == 0 && zero_bits == 0 | Host cluster allocated but no subcluster references it |
Compressed entries:
| # | Condition | Meaning |
|---|---|---|
| C1 | sc_bitmap != 0 (excluding legacy alloc_bits=0xFFFF_FFFF) | Spec reserves all 64 bitmap bits as zero |
Note: C1 accepts alloc_bits == 0xFFFF_FFFF with zero_bits == 0
for compatibility with images produced by older QEMU versions.
Each violation increments corruptions, total_errors, and
subcluster_errors in the check result. Detailed messages are
emitted via debug_print (visible with --verbose).
Implemented by qcow2::validate_subcluster_bitmap() in the qcow2
crate, called from the L2 walk in the check operation.
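The invariants can be expressed compactly as a function that returns a bit set of violations. The C sketch below is only an illustration of the rules in the tables above. It is not the code of `qcow2::validate_subcluster_bitmap()`, and the flag names and the `check_subcluster_bitmap` signature are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Violation flags (hypothetical names, one per invariant). */
#define SC_ERR_I1 (1u << 0)  /* subcluster both allocated and zero */
#define SC_ERR_I2 (1u << 1)  /* allocated subclusters but no host cluster */
#define SC_ERR_I3 (1u << 2)  /* host cluster but empty bitmap */
#define SC_ERR_C1 (1u << 3)  /* non-zero bitmap on a compressed entry */

/* Returns 0 if the bitmap is valid, otherwise an OR of SC_ERR_* flags. */
static unsigned check_subcluster_bitmap(bool compressed,
                                        uint64_t host_offset,
                                        uint64_t sc_bitmap)
{
    uint32_t alloc_bits = (uint32_t)sc_bitmap;
    uint32_t zero_bits = (uint32_t)(sc_bitmap >> 32);
    unsigned errors = 0;

    if (compressed) {
        /* Legacy exception: all-ones allocation bitmap with no zero bits. */
        bool legacy = (alloc_bits == 0xFFFFFFFFu) && (zero_bits == 0);
        if (sc_bitmap != 0 && !legacy) {
            errors |= SC_ERR_C1;
        }
        return errors;
    }

    if ((alloc_bits & zero_bits) != 0) {
        errors |= SC_ERR_I1;
    }
    if (host_offset == 0 && alloc_bits != 0) {
        errors |= SC_ERR_I2;
    }
    if (host_offset != 0 && alloc_bits == 0 && zero_bits == 0) {
        errors |= SC_ERR_I3;
    }
    return errors;
}
```

Returning flags rather than stopping at the first failure lets a caller count every violation on an entry, which matches the per-violation counters described above.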
The COPIED Flag
The COPIED flag (bit 63) is an optimization hint:
- Set (1): Cluster/table refcount is exactly 1; can modify in-place
- Clear (0): May be shared with snapshots; must COW before writing
This avoids refcount lookups during normal writes when the flag is set.
Rules:
1. COPIED is never set for compressed clusters
2. COPIED must be consistent with actual refcount
3. When creating snapshots, COPIED is cleared on shared clusters
4. When refcount drops to 1, COPIED should be set
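The rules above reduce to one expected value for the flag given an L2 entry and the refcount of the cluster it points to. The helpers below are a hedged C sketch with hypothetical names. They assume the refcount has already been looked up elsewhere, and they apply to L2 entries only, since bit 62 is reserved in L1 entries.

```c
#include <stdbool.h>
#include <stdint.h>

#define QCOW_OFLAG_COPIED     (1ULL << 63)
#define QCOW_OFLAG_COMPRESSED (1ULL << 62)

/* Returns the L2 entry with COPIED set or cleared to match the refcount. */
static uint64_t fix_copied_flag(uint64_t l2_entry, uint64_t refcount)
{
    /* Rule 1: compressed clusters never carry COPIED. */
    bool want = (refcount == 1) && !(l2_entry & QCOW_OFLAG_COMPRESSED);
    if (want) {
        return l2_entry | QCOW_OFLAG_COPIED;
    }
    return l2_entry & ~QCOW_OFLAG_COPIED;
}

/* Rule 2: the stored flag must agree with the actual refcount. */
static bool copied_flag_consistent(uint64_t l2_entry, uint64_t refcount)
{
    return fix_copied_flag(l2_entry, refcount) == l2_entry;
}

/* Write fast path: in-place modification is allowed only with COPIED set. */
static bool can_write_in_place(uint64_t l2_entry)
{
    return (l2_entry & QCOW_OFLAG_COPIED) != 0;
}
```

A checker could call `copied_flag_consistent` for every entry and use `fix_copied_flag` to compute the repaired value when the two disagree.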
L2 Table Caching
QEMU loads L2 tables on demand and caches them:
// L2 cache sizing formula
l2_cache_max_disk = l2_cache_size * cluster_size / l2_entry_size;
// For 1 MB cache, 64 KB clusters, standard entries:
// 1 MB * 64 KB / 8 = 8 GB of disk addressable
L2 tables are loaded in "slices" (typically 4 KB) rather than full clusters for memory efficiency.
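The sizing formula can be evaluated in both directions: how much disk a given cache covers, and how much cache a given disk needs. This is a minimal C sketch with hypothetical function names. It assumes all sizes are in bytes and that the cache size is a multiple of the entry size.

```c
#include <stdint.h>

/* Bytes of guest disk whose L2 entries fit in a cache of l2_cache_size bytes. */
static uint64_t l2_cache_coverage(uint64_t l2_cache_size,
                                  uint64_t cluster_size,
                                  uint64_t l2_entry_size)
{
    /* Divide first so the intermediate product cannot overflow as easily. */
    return (l2_cache_size / l2_entry_size) * cluster_size;
}

/* Cache bytes needed to hold every L2 entry for a disk of disk_size bytes. */
static uint64_t l2_cache_needed(uint64_t disk_size,
                                uint64_t cluster_size,
                                uint64_t l2_entry_size)
{
    /* One L2 entry per guest cluster, rounding a partial cluster up. */
    uint64_t clusters = (disk_size + cluster_size - 1) / cluster_size;
    return clusters * l2_entry_size;
}
```

With a 1 MB cache, 64 KB clusters, and 8-byte entries, `l2_cache_coverage` gives 8 GB, matching the figure above; 16-byte extended entries halve that to 4 GB.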
Table Constraints
- L1 table: Must be contiguous in image file
- L2 tables: Each exactly one cluster; cluster-aligned
- Offsets: All table offsets must be cluster-aligned
- Reserved bits: Must be zero; reject images with non-zero reserved bits
- Maximum L1 size: 32 MB (QEMU limit)
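The constraints above translate into a few cheap predicates that a reader can run before trusting a table. The sketch below is illustrative only: the helper names are hypothetical, "32 MB" is assumed to mean 32 MiB, and `L1E_RESERVED_MASK` is derived from the reserved bit ranges of the L1 entry format rather than quoted from a header.

```c
#include <stdbool.h>
#include <stdint.h>

#define L1_ENTRY_SIZE         8ULL
#define MAX_L1_SIZE           (32ULL * 1024 * 1024)  /* assumes 32 MB = 32 MiB */
#define L1E_RESERVED_MASK     0x7f000000000001ffULL  /* bits 0-8 and 56-62 */
#define L2E_STD_RESERVED_MASK 0x3f000000000001feULL  /* bits 1-8 and 56-61 */

/* True if offset is a multiple of the cluster size. */
static bool is_cluster_aligned(uint64_t offset, unsigned cluster_bits)
{
    return (offset & ((1ULL << cluster_bits) - 1)) == 0;
}

/* True if an L1 table with l1_entries entries stays within the size limit. */
static bool l1_size_ok(uint64_t l1_entries)
{
    return l1_entries <= MAX_L1_SIZE / L1_ENTRY_SIZE;
}

/* True if none of the must-be-zero bits selected by reserved_mask is set. */
static bool reserved_bits_clear(uint64_t entry, uint64_t reserved_mask)
{
    return (entry & reserved_mask) == 0;
}
```

An image would be rejected as soon as any predicate fails, for example when an L2 table offset taken from the L1 table is not cluster-aligned.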
Backing File Chain Resolution
To resolve a read for a guest cluster:
1. Check the local L2 entry
2. If unallocated (offset=0, zero=0):
   a. If a backing file exists: repeat the lookup in the backing file
   b. If no backing file exists: return zeros
3. Continue until an allocated cluster is found or the chain ends
The backing file path is stored after the header (see backing_file_offset).
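The chain walk described by the resolution steps can be sketched with a simplified in-memory model. Everything here is an assumption made for illustration: the `Image` struct flattens the L1/L2 lookup into one array of standard L2 entries per image, all images in the chain are assumed to have the same virtual size, and compressed clusters are ignored.

```c
#include <stddef.h>
#include <stdint.h>

#define L2E_OFFSET_MASK 0x00fffffffffffe00ULL
#define QCOW_OFLAG_ZERO (1ULL << 0)

/* Hypothetical flat view of an image: one standard L2 entry per guest cluster. */
typedef struct Image {
    const uint64_t *entries;
    size_t nb_clusters;
    const struct Image *backing;   /* NULL at the end of the chain */
} Image;

/* Where the data for a guest cluster comes from. */
typedef struct {
    const Image *source;    /* image holding the data, or NULL for zeros */
    uint64_t host_offset;   /* valid only when source is not NULL */
} Resolved;

/* Assumption: cluster_index is valid for every image in the chain. */
static Resolved resolve_cluster(const Image *img, size_t cluster_index)
{
    Resolved r = { NULL, 0 };

    for (; img != NULL; img = img->backing) {
        uint64_t entry = img->entries[cluster_index];

        if (entry & QCOW_OFLAG_ZERO) {
            return r;               /* reads as zeros, stop here */
        }
        uint64_t offset = entry & L2E_OFFSET_MASK;
        if (offset != 0) {
            r.source = img;         /* allocated in this image */
            r.host_offset = offset;
            return r;
        }
        /* Unallocated (offset=0, zero=0): continue with the backing file. */
    }
    return r;                       /* chain ended: return zeros */
}
```

The loop is the iterative form of the recursion in step 2a; a ZERO entry in an upper image hides whatever the backing file holds for that cluster.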