QCOW2 Reference Counting System¶

The refcount system tracks how many times each cluster is referenced, enabling snapshots, copy-on-write, and free space management.

Overview¶

Every cluster in a QCOW2 image has an associated reference count: - 0: Cluster is free (available for allocation) - 1: Cluster is used by one reference (can modify in-place) - >= 2: Cluster is shared (snapshots); must COW before writing

Two-Level Refcount Structure¶

Like the L1/L2 tables, refcounts use a two-level hierarchy:

Cluster Index
     |
     v
+-------------------+
| Refcount Table    |  Memory-resident, one entry per refcount block
| [table_index]     |
+-------------------+
     |
     v  (refcount block offset)
+-------------------+
| Refcount Block    |  One cluster, contains many refcount entries
| [block_index]     |
+-------------------+
     |
     v
   Refcount Value

Index Calculations¶

// Configuration from header
refcount_bits = 1 << refcount_order;  // e.g., 16 bits for order=4
refcount_block_entries = cluster_size * 8 / refcount_bits;

// For cluster at byte offset 'cluster_offset'
cluster_index = cluster_offset / cluster_size;
refcount_table_index = cluster_index / refcount_block_entries;
refcount_block_index = cluster_index % refcount_block_entries;

Example: 64 KB Clusters, 16-bit Refcounts¶

cluster_size = 65536 bytes
refcount_bits = 16 (refcount_order = 4)
refcount_block_entries = 65536 * 8 / 16 = 32768 entries per block

Each refcount block covers: 32768 * 64 KB = 2 GB of clusters

Refcount Table Entry (64 bits)¶

 63                                              9  8            0
+------------------------------------------------+--------------+
|        Refcount Block Offset (55 bits)         |   Reserved   |
+------------------------------------------------+--------------+
                         |
                         +-- Must be cluster-aligned

Bits	Name	Description
0-8	Reserved	Must be zero
9-63	Offset	Refcount block offset (cluster-aligned)

Masks:

#define REFT_OFFSET_MASK    0xfffffffffffffe00ULL
#define REFT_RESERVED_MASK  0x1ffULL

Special values: - Entry = 0: Refcount block not allocated (all clusters free)

Variable Refcount Widths¶

The refcount_order field (v3 only) specifies refcount entry size:

Order	Bits	Max Value	Use Case
0	1	1	Single allocation bit
1	2	3	Limited snapshots
2	4	15	Limited snapshots
3	8	255	Moderate snapshots
4	16	65535	Default, many snapshots
5	32	4 billion	Extreme cases
6	64	Unlimited	Maximum precision

Version 2 images always use 16-bit refcounts (order = 4).

Reading Refcount Entries¶

uint64_t get_refcount(void *refcount_block, int index, int refcount_order) {
    int refcount_bits = 1 << refcount_order;

    switch (refcount_order) {
    case 0:  // 1-bit
        return (((uint8_t*)refcount_block)[index / 8] >> (index % 8)) & 0x1;
    case 1:  // 2-bit
        return (((uint8_t*)refcount_block)[index / 4] >> (2 * (index % 4))) & 0x3;
    case 2:  // 4-bit
        return (((uint8_t*)refcount_block)[index / 2] >> (4 * (index % 2))) & 0xf;
    case 3:  // 8-bit
        return ((uint8_t*)refcount_block)[index];
    case 4:  // 16-bit
        return be16_to_cpu(((uint16_t*)refcount_block)[index]);
    case 5:  // 32-bit
        return be32_to_cpu(((uint32_t*)refcount_block)[index]);
    case 6:  // 64-bit
        return be64_to_cpu(((uint64_t*)refcount_block)[index]);
    }
}

Note: Multi-byte entries are stored in big-endian format.

Free Cluster Allocation¶

To find a free cluster:

uint64_t find_free_cluster(BDRVQcow2State *s) {
    uint64_t cluster_index = s->free_cluster_index;

    while (1) {
        uint64_t refcount = get_refcount_for_cluster(s, cluster_index);
        if (refcount == 0) {
            s->free_cluster_index = cluster_index + 1;
            return cluster_index * s->cluster_size;
        }
        cluster_index++;
        // Check bounds...
    }
}

The free_cluster_index is a hint for where to start searching.

Refcount Updates¶

When allocating or freeing clusters:

int update_refcount(BlockDriverState *bs, uint64_t offset,
                    uint64_t length, int addend) {
    // For each cluster in range [offset, offset+length):
    //   1. Load refcount block (allocate if needed)
    //   2. Read current refcount
    //   3. Add 'addend' (-1 for free, +1 for alloc)
    //   4. Write new refcount
    //   5. If refcount became 0, update free_cluster_index
    //   6. If refcount became 1, may need to set COPIED flag
}

Lazy Refcounts¶

When compatible feature bit 0 (LAZY_REFCOUNTS) is set: - Refcount updates may be deferred - DIRTY incompatible bit set while image is open - On clean close, DIRTY bit cleared - On unclean shutdown, refcounts may be inconsistent

This improves write performance at the cost of requiring qemu-img check after crashes.

Refcount Table Growth¶

When allocating clusters beyond current refcount table coverage:

Allocate new, larger refcount table
Allocate new refcount blocks as needed
Copy existing table entries
Update header atomically
Free old table

Self-describing allocation: New refcount structures must track themselves, creating a chicken-and-egg problem solved by allocating at end of image and computing refcounts for new structures.

Relationship to COPIED Flag¶

The COPIED flag in L1/L2 entries is an optimization:

if (l2_entry & QCOW_OFLAG_COPIED) {
    // Refcount is 1, can write in-place
    write_to_cluster(offset, data);
} else {
    // May be shared, need to check refcount
    if (get_refcount(cluster) > 1) {
        // COW: allocate new cluster, copy, update L2
        new_cluster = allocate_cluster();
        copy_cluster(old_cluster, new_cluster);
        update_l2_entry(l2_index, new_cluster | QCOW_OFLAG_COPIED);
        decrement_refcount(old_cluster);
    }
    write_to_cluster(new_offset, data);
}

Consistency Checking¶

To verify refcount consistency:

Build temporary refcount table by scanning all L1/L2 tables
For each referenced cluster, increment temporary refcount
Compare computed refcounts with on-disk refcounts
Verify COPIED flags match actual refcounts

qemu-img check image.qcow2

Refcount Metadata Sizing¶

// Calculate metadata requirements
int64_t clusters = (disk_size + cluster_size - 1) / cluster_size;
int64_t refcount_block_entries = cluster_size * 8 / refcount_bits;
int64_t refcount_blocks = (clusters + refcount_block_entries - 1)
                          / refcount_block_entries;
int64_t refcount_table_entries = refcount_blocks;
int64_t refcount_table_size = refcount_table_entries * 8;

// Don't forget refcount blocks for metadata clusters themselves!

Snapshot Refcount Interaction¶

When creating a snapshot: 1. Copy current L1 table to snapshot 2. Increment refcounts for all referenced clusters (+1) 3. Clear COPIED flags on shared clusters

When deleting a snapshot: 1. Decrement refcounts for all snapshot-referenced clusters (-1) 2. Free clusters that reach refcount 0 3. Set COPIED flags where refcount becomes 1

Implementation Notes¶

Caching: Refcount blocks should be cached; frequent lookups during I/O
Atomic updates: Table updates must be atomic (write new, update header)
Overflow: Check for refcount overflow before incrementing
Underflow: Decrementing refcount 0 indicates corruption
Alignment: All refcount structures must be cluster-aligned

📝 Report an issue with this page