
qemu-img Quirks

This document describes known behaviors in qemu-img that differ from what one might expect, and how instar handles these cases.

Quirk Classification: Safe vs Unsafe

Quirks are classified into two categories based on their security implications:

Safe Quirks

Safe quirks affect output formatting or calculation methods but do not introduce security vulnerabilities. Examples include:

  • Size rounding (to block or sector boundaries)
  • Number formatting (banker's rounding, significant figures)
  • VHD size calculation methods

instar mimics safe quirks by default for qemu-img compatibility. Use --ignore-quirks to get more intuitive behavior.

Unsafe Quirks

Unsafe quirks are behaviors that can enable security vulnerabilities or reduce format identification accuracy. Examples include:

  • RAW as fallback format - Treating any unrecognized file as a valid raw disk image, which enables backing file disclosure attacks
  • ISO reported as RAW - Not detecting ISO 9660 format, reducing format visibility for policy decisions

instar does NOT mimic unsafe quirks by default. Instead, instar applies additional validation (e.g., requiring MBR/GPT partition tables for raw images, detecting ISO 9660 format). Use --unsafe-quirks to match qemu-img's behavior for compatibility testing.

Summary

| Flag | Safe Quirks | Unsafe Quirks |
| --- | --- | --- |
| (default) | Enabled (qemu-img compatible) | Disabled (secure) |
| --ignore-quirks | Disabled (intuitive output) | Disabled (secure) |
| --unsafe-quirks | Enabled (qemu-img compatible) | Enabled (insecure) |

See configuration.md for full flag documentation.

Extra Detail Mode

instar can provide additional format-specific information that qemu-img does not output. This extra information is disabled by default for qemu-img compatibility, but can be enabled with the --extra-detail flag.

VDI Format-Specific Information

qemu-img does not output format-specific information for VDI (VirtualBox) images, even though the format contains useful metadata:

{
    "format": "vdi",
    "format-specific": {
        "type": "vdi",
        "data": {
            "image-type": "dynamic",
            "block-size": 1048576,
            "blocks-in-image": 10,
            "blocks-allocated": 0,
            "uuid": "914d94c9-e6a6-4968-9064-29fd03a9cdc2"
        }
    }
}

Default behavior: instar matches qemu-img by not outputting VDI format-specific information.

With --extra-detail flag: instar outputs the VDI format-specific section, providing additional metadata about the image structure.

When to Use --extra-detail

Use this flag when you need:

  • VDI image type (dynamic vs fixed)
  • VDI block allocation statistics
  • VDI image UUID

The extra information is particularly useful for:

  • Debugging VirtualBox image issues
  • Migration planning (understanding allocation patterns)
  • Image inspection and auditing


QCOW2 disk_size Calculation

Classification: Safe Quirk

Observed Behavior

For QCOW2 files, qemu-img info reports a disk size that may differ from the actual file size on disk. For example, with a generated QCOW2 v2 test file:

  • Actual file size (from stat or ls -l): 196616 bytes
  • qemu-img reported disk size: 197120 bytes (192 KiB)
  • Difference: 504 bytes

Root Cause

qemu-img calculates the "disk size" based on the QCOW2 internal structure, specifically by finding the highest allocated offset in the file's metadata (L1 table, refcount table, etc.) and rounding up to a sector boundary (512 bytes).

For the test file:

  • L1 table offset: 196608 (0x30000)
  • L1 table has 1 entry (8 bytes)
  • Actual file end: 196608 + 8 = 196616 bytes
  • qemu-img calculation: 196608 + 512 = 197120 bytes (sector-aligned)

Why This Happens

qemu-img appears to calculate "disk size" as the expected size based on the image's internal structure, not the actual filesystem size. This calculation:

  1. Finds the highest used offset in metadata structures
  2. Rounds up to the nearest sector boundary (512 bytes)
  3. Reports this as the "disk size"

This approach makes sense for images that might be sparse or have trailing allocations, but can report larger sizes than the actual file.
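The three steps above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not qemu-img's or instar's actual code; the function names are hypothetical.

```python
SECTOR_SIZE = 512  # bytes; the sector size named in the text


def round_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return -(-value // alignment) * alignment


def metadata_based_size(highest_offset: int, length: int) -> int:
    """End of the highest metadata structure, rounded up to a sector boundary."""
    return round_up(highest_offset + length, SECTOR_SIZE)


# Values from the QCOW2 v2 test file described above.
l1_offset, l1_bytes = 0x30000, 8
print(l1_offset + l1_bytes)                      # 196616 (actual file end)
print(metadata_based_size(l1_offset, l1_bytes))  # 197120 (sector-aligned)
```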

instar Behavior

Default behavior: instar matches qemu-img by calculating disk size based on the image's internal metadata structure, rounded up to sector boundaries. This ensures drop-in replacement compatibility.

With --ignore-quirks flag: instar reports the actual file size from the underlying storage, matching what stat or ls -l reports.

Why Match qemu-img?

Since instar aims to be a drop-in replacement for qemu-img info, matching the output exactly (including this calculation) reduces friction for users migrating from qemu-img. Scripts and tools that parse qemu-img output will work unchanged.

The --ignore-quirks flag provides an escape hatch for users who need the true filesystem size.

Test Implications

The test file qcow2_v2.qcow2 in instar-testdata was generated with qemu-img (qemu-img create -f qcow2 -o compat=0.10 ...). By matching qemu-img's calculation, tests can perform exact output comparison.

Block-Rounded Disk Size

Classification: Safe Quirk

Observed Behavior

qemu-img reports "disk size" rounded up to filesystem block boundaries (4096 bytes), not the actual file size.

For the QCOW2 v2 test file:

  • Actual file size: 196616 bytes
  • qemu-img disk size: 200704 bytes (196 KiB)
  • Calculation: ceil(196616 / 4096) * 4096 = 49 * 4096 = 200704
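As a quick check of the calculation, the block rounding can be written with integer arithmetic only. This is a sketch; `block_rounded` is a hypothetical helper name, and the 4096-byte block size is the one stated above.

```python
def block_rounded(size: int, block_size: int = 4096) -> int:
    """Round a byte count up to a whole number of filesystem blocks."""
    return (size + block_size - 1) // block_size * block_size


size = block_rounded(196616)
print(size)          # 200704
print(size // 1024)  # 196 (KiB)
```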

instar Behavior

Default behavior: instar matches qemu-img by rounding file size up to 4096-byte blocks.

With --ignore-quirks flag: instar reports the actual file size.

Human-Readable Size Formatting

Classification: Safe Quirk

Observed Behavior

qemu-img uses %0.3g printf format (3 significant figures) for human-readable sizes. This rounds to 3 significant figures, with the number of decimal places depending on the magnitude:

For values >= 100 (displayed as integers):

Rounds to nearest integer using "round half to even" (banker's rounding):

  • 126.998 GiB → "127 GiB" (rounds up from 126.998)
  • 192.5 KiB → "192 KiB" (rounds to even from 192.5)
  • 256.5 KiB → "256 KiB" (rounds to even from 256.5)
  • 127.5 GiB → "128 GiB" (rounds to even from 127.5)

For values 10-99 (displayed with 1 decimal place):

Standard rounding applies:

  • 20.6875 MiB → "20.7 MiB" (rounds from 20.6875)
  • 15.44 KiB → "15.4 KiB" (rounds from 15.44)

Technical Details

This behavior stems from C printf's %0.3g format which:

  1. Rounds to 3 significant figures using "round half to even" (banker's rounding)
  2. Removes trailing zeros after the decimal point
  3. For integer results, displays no decimal point

The key distinction is at exact midpoints (like 192.5): C rounds to the nearest even number (192), while Rust's default round() rounds away from zero (193).
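Python's `.3g` format follows the same rules as C's `%0.3g` for these inputs, so the examples above can be reproduced directly. The sketch below is illustrative only; `format_3sig` and `round_half_away` are hypothetical names, and the second function simply imitates round-half-away-from-zero for comparison.

```python
import math


def format_3sig(value: float, unit: str) -> str:
    """Format with 3 significant figures, dropping trailing zeros.

    Caveat: like C, this switches to exponent notation once the value
    rounds to 1000 or more; that case is not handled in this sketch.
    """
    return f"{value:.3g} {unit}"


def round_half_away(value: float) -> int:
    """Round half away from zero (positive values only), for comparison."""
    return math.floor(value + 0.5)


for value, unit in [(126.998, "GiB"), (192.5, "KiB"), (256.5, "KiB"),
                    (127.5, "GiB"), (20.6875, "MiB"), (15.44, "KiB")]:
    print(f"{value} -> {format_3sig(value, unit)}")

print(round_half_away(192.5))  # 193, versus "192" from the .3g format
```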

instar Behavior

Default behavior: instar matches qemu-img's formatting using banker's rounding:

  • Values >= 100: round to nearest integer (ties to even)
  • Values 10-99: round to 1 decimal place (ties to even)
  • Values 1-9: round to 2 decimal places (ties to even)
  • Values < 1: round to 3 decimal places (ties to even)

With --ignore-quirks flag: instar uses consistent rounding with 1 decimal place when the value is not a whole number (e.g., "192.5 KiB" instead of "192 KiB").

Child Node File Length

Classification: Safe Quirk

Observed Behavior

In qemu-img 8.0+, the Child node '/file' section reports a "file length" (human) or "virtual-size" (JSON) that may differ from the actual filesystem size.

qemu-img reports the larger of:

  1. The actual filesystem file size
  2. The calculated size based on internal metadata (e.g., L1 table offset rounded up to sector boundary for QCOW2)

For files with data beyond the metadata structures (like real disk images), qemu-img reports the actual file size. For minimal files where the metadata calculation exceeds the actual size (like empty test images), it reports the metadata-based calculation.

Example

For a minimal QCOW2 v2 test file:

  • Actual file size: 196616 bytes
  • L1 table calculation: (196608 + 512) = 197120 bytes
  • qemu-img file length: max(196616, 197120) = 197120 bytes

For a real disk image (cirros):

  • Actual file size: 21692416 bytes
  • L1 table calculation: much smaller (metadata is at the start)
  • qemu-img file length: max(21692416, calc) = 21692416 bytes
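The selection rule can be expressed as a one-line `max`. The sketch below is an assumption about how the rule might be coded, not instar's implementation; `child_file_length` and its `ignore_quirks` parameter are hypothetical, and the metadata value used for the cirros case is a placeholder because the source only says it is "much smaller".

```python
def child_file_length(actual_size: int, metadata_size: int,
                      ignore_quirks: bool = False) -> int:
    """Return the reported file length for the '/file' child node."""
    if ignore_quirks:
        return actual_size  # plain filesystem size
    return max(actual_size, metadata_size)


# Minimal test image: the metadata calculation wins.
print(child_file_length(196616, 197120))        # 197120
# Real image: the actual size wins (197120 is a placeholder metadata value).
print(child_file_length(21692416, 197120))      # 21692416
# With quirks ignored, the actual size is always reported.
print(child_file_length(196616, 197120, True))  # 196616
```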

instar Behavior

Default behavior: instar matches qemu-img by reporting the larger of the actual file size and the internal metadata calculation.

With --ignore-quirks flag: instar reports the actual filesystem size.

Summary of --ignore-quirks Effects

When --ignore-quirks is specified:

| Field | Default (qemu-img compatible) | With --ignore-quirks |
| --- | --- | --- |
| disk size | Block-rounded (4096 bytes) | Actual file size |
| file length | max(actual, metadata calc) | Actual file size |
| Size formatting | 3 significant figures | 1 decimal place |

File Sparseness and Git

Classification: Safe Quirk (environmental, not a qemu-img behavior)

Observed Behavior

qemu-img's reported "disk size" depends on the actual allocation of sparse files on disk. When disk images are transferred through git (clone, fetch), sparse holes may be filled with zeros, increasing the reported disk size.

For example, the iotest-dynamic-1G.vhdx file:

  • Original (sparse): disk size 66.1 MiB
  • After git clone: disk size 100 MiB (holes filled with zeros)
  • After fallocate -d: disk size 66.1 MiB (holes restored)

Root Cause

Git stores file contents as blobs and does not preserve sparse file semantics. When git writes a file during checkout, it writes all bytes sequentially, effectively "filling in" sparse holes with actual zero bytes. This increases the file's allocated blocks on disk.

CI/Testing Implications

Test baselines are generated with sparse files. When the testdata repository is cloned in CI, the files may lose sparseness, causing disk_size mismatches.

Solution

After cloning the testdata repository, restore sparse holes using cp --sparse=always, which is more robust than fallocate -d:

find downloaded/ -type f \( \
    -name "*.qcow2" -o \
    -name "*.vmdk" -o \
    -name "*.vhd" -o \
    -name "*.vhdx" -o \
    -name "*.img" \
\) -print0 | while IFS= read -r -d '' file; do
    cp --sparse=always "$file" "$file.sparse"
    mv "$file.sparse" "$file"
done

Why cp --sparse=always instead of fallocate -d?

fallocate -d (FALLOC_FL_PUNCH_HOLE) can only punch holes in contiguous zero-filled regions that are aligned to filesystem block boundaries. Files with partial zero blocks (blocks containing mostly zeros but a few non-zero bytes) cannot have those regions converted to holes.

cp --sparse=always reads the file content and writes a new file, skipping zero-filled blocks entirely. This correctly handles files with complex sparse patterns where fallocate -d would leave extra blocks allocated.

Test Framework Handling

Even with cp --sparse=always, re-sparsified files may not have identical block allocation patterns to the original. Different filesystems, kernel versions, or sparse detection algorithms can result in significantly different allocation patterns.

For this reason, the test comparison framework (tests/helpers/comparators.py) looks up the actual disk size from the filesystem at test time using os.stat().st_blocks * 512 and substitutes this value into the expected output before comparison. This ensures:

  1. Tests compare against the filesystem's actual view of the file
  2. No reliance on potentially stale baseline values for disk size
  3. Exact matching instead of arbitrary tolerance thresholds

This approach is more scientifically correct than using tolerance, because:

  1. actual-size reflects filesystem allocation, not image content
  2. We're testing that instar correctly reports what the filesystem says
  3. Both instar and the test framework query the same filesystem state
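A minimal sketch of the substitution step follows. It is not the contents of tests/helpers/comparators.py; the function names are hypothetical, and it assumes a Unix-like platform where `os.stat()` exposes `st_blocks` in 512-byte units.

```python
import os


def allocated_bytes(path: str) -> int:
    """Bytes actually allocated on disk; st_blocks counts 512-byte units."""
    return os.stat(path).st_blocks * 512


def substitute_actual_size(expected: dict, path: str) -> dict:
    """Return a copy of the baseline with actual-size taken from the filesystem."""
    patched = dict(expected)  # leave the stored baseline untouched
    patched["actual-size"] = allocated_bytes(path)
    return patched
```

A comparison would then call `substitute_actual_size(baseline, image_path)` before checking the tool's JSON output for equality, so the stored baseline value for disk size never takes part in the comparison.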

Note

This is not a qemu-img quirk per se, but rather a filesystem/git interaction that affects qemu-img output consistency in CI environments.

VHD Virtual Size Calculation

Classification: Safe Quirk

Observed Behavior

qemu-img calculates VHD virtual size differently depending on the creator application that produced the VHD file. The VHD footer contains both a "current size" field (explicit virtual size in bytes) and CHS geometry values (cylinders, heads, sectors per track).

For Virtual PC and legacy qemu VHDs (creator_app = "vpc " or "qemu"):

qemu-img calculates virtual size from CHS geometry:

virtual_size = cylinders × heads × sectors_per_track × 512

For modern applications (Hyper-V, Disk2vhd, XenServer, Azure, etc.):

qemu-img uses the disk_size field directly from the VHD footer.

Example

For the virtualpc-dynamic.vhd test image (created by Virtual PC):

  • Footer disk_size field: 136,365,211,648 bytes
  • CHS geometry: 65,278 cylinders × 16 heads × 255 sectors
  • CHS-calculated size: 65,278 × 16 × 255 × 512 = 136,363,130,880 bytes
  • qemu-img reports: 136,363,130,880 bytes (CHS calculation)

The difference (2,080,768 bytes) exists because Virtual PC's geometry algorithm cannot exactly represent the requested size, so it rounds down to the nearest CHS-representable value.

Why This Matters

Virtual PC and original qemu create VHD files that rely on CHS geometry for compatibility with legacy systems. Using the disk_size field directly for these images would report a larger virtual size than the geometry can address, potentially causing data corruption if writes exceed the CHS-addressable range.

Maximum CHS Geometry

When CHS geometry reaches maximum values (65,535 × 16 × 255 = 267,382,800 sectors = ~127 GiB), qemu-img falls back to using the disk_size field regardless of creator application. This prevents truncation for large disks.
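The creator-dependent rule and the maximum-geometry fallback can be sketched together as follows. This is an illustration under stated assumptions, not qemu-img's or instar's code: `vhd_virtual_size` is a hypothetical name, and treating "reaches maximum values" as "sector count is at or above 65,535 × 16 × 255" is an assumption.

```python
SECTOR_SIZE = 512
MAX_CHS_SECTORS = 65535 * 16 * 255  # 267,382,800 sectors
CHS_CREATORS = (b"vpc ", b"qemu")   # creators whose size comes from geometry


def vhd_virtual_size(creator_app: bytes, disk_size: int,
                     cylinders: int, heads: int, sectors_per_track: int) -> int:
    """Pick the virtual size from CHS geometry or the footer disk_size field."""
    chs_sectors = cylinders * heads * sectors_per_track
    if creator_app in CHS_CREATORS and chs_sectors < MAX_CHS_SECTORS:
        return chs_sectors * SECTOR_SIZE
    return disk_size  # modern creators, or geometry at its maximum


# The virtualpc-dynamic.vhd example from above.
print(vhd_virtual_size(b"vpc ", 136_365_211_648, 65278, 16, 255))  # 136363130880
```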

Known Creator Applications

| Creator App | Size Method | Application |
| --- | --- | --- |
| vpc | CHS | Microsoft Virtual PC |
| qemu | CHS | qemu (legacy) |
| qem2 | disk_size | qemu (modern) |
| win | disk_size | Microsoft Hyper-V |
| d2v | disk_size | Disk2vhd |
| tap\0 | disk_size | XenServer |
| CTXS | disk_size | XenConverter |
| wa\0\0 | disk_size | Microsoft Azure |

instar Behavior

Default behavior: instar matches qemu-img by checking the creator_app field and using CHS calculation for "vpc " and "qemu" creators (unless CHS is at maximum), or disk_size field for all others.

With --ignore-quirks flag: Currently no change; the VHD size calculation always matches qemu-img for maximum compatibility.

RAW as Fallback Format

Classification: Unsafe Quirk - This behavior enables security vulnerabilities.

Observed Behavior

qemu-img treats any file that does not match a known format's magic number as a "raw" disk image. This includes:

  • Actual raw disk images (with MBR/GPT partition tables)
  • Plain text files
  • Binary data files
  • Corrupted or truncated images
  • Random garbage

For example, a simple text file:

$ echo "This is just a plain text file." > /tmp/test.txt
$ qemu-img info /tmp/test.txt
image: /tmp/test.txt
file format: raw
virtual size: 512 B (512 bytes)
disk size: 4 KiB

Why This Matters

This behavior has important implications:

  1. No format validation: qemu-img cannot distinguish between a genuine raw disk image and arbitrary data. A user could upload a PDF, JPEG, or executable and qemu-img would happily call it a "raw" disk image.

  2. Testing considerations: When testing format detection, any file that fails to match known formats will be reported as "raw" rather than "unknown" or generating an error.

Security Implications: The Root Cause of Backing File Attacks

This "raw as fallback" behavior is the fundamental design flaw that enables backing file disclosure attacks (CVE-2015-5163, CVE-2024-32498, etc.).

Consider what happens when qemu-img processes a QCOW2 image with backing_file = "/etc/shadow":

  1. qemu-img opens the QCOW2 image and parses its header
  2. qemu-img sees the backing file reference to /etc/shadow
  3. qemu-img opens /etc/shadow and tries to detect its format
  4. /etc/shadow has no recognized magic number (it's a text file)
  5. qemu-img treats /etc/shadow as a "raw" disk image
  6. qemu-img reads the file contents as disk data

If qemu-img instead rejected files that don't match any known disk image format, the attack would fail at step 5. The backing file would be rejected as "not a valid disk image" rather than being slurped up as "raw" data.

This design choice - treating unknown files as valid raw images rather than rejecting them - is what transforms a simple path reference into a data exfiltration vulnerability. A more defensive design would require backing files to have recognizable disk image headers (QCOW2, VMDK, VHD, or at minimum a valid MBR/GPT partition table for raw images).

Note: instar avoids this vulnerability entirely through its KVM sandbox architecture - the guest cannot open arbitrary files regardless of format detection behavior. See format-detection-safety.md for details.

Cloud Environment Implications

In cloud environments (OpenStack, etc.), format validation cannot rely solely on qemu-img. OpenStack's Glance uses oslo.utils format_inspector which detects GPT/MBR partition tables to distinguish "actual disk images" from "files we don't recognize."

Comparison with oslo.utils format_inspector

oslo.utils takes a different approach:

| File Type | qemu-img | oslo.utils |
| --- | --- | --- |
| MBR-partitioned disk | raw | gpt (detects MBR) |
| GPT-partitioned disk | raw | gpt |
| FAT filesystem (no partition) | raw | raw |
| Plain text file | raw | raw |
| Random garbage | raw | raw |
| Corrupted QCOW2 | raw (usually) | error or raw |

oslo.utils can therefore distinguish "files with valid partition tables" (likely real disk images, reported as "gpt") from "files we don't recognize" (reported as "raw"), whereas qemu-img labels both as "raw".

instar Behavior

Default behavior (secure): instar requires files detected as "raw" to have a valid partition table (MBR or GPT). Files without recognized format headers AND without valid partition tables are rejected as "unknown format" rather than being silently accepted as raw images.

This prevents the backing file disclosure attacks described above, because /etc/shadow would be rejected as "not a valid disk image" rather than being treated as a raw disk.

With --unsafe-quirks flag: instar matches qemu-img's behavior, treating any unrecognized file as a valid raw image. This is required for exact qemu-img output compatibility but should only be used in controlled testing environments, never in production.

Partition table detection: instar checks for:

  • MBR: Valid 0xAA55 signature at offset 510-511, with at least one partition entry having a valid boot flag (0x00 or 0x80)
  • GPT: Protective MBR with partition type 0xEE, followed by valid GPT header at LBA 1
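The checks listed above might look roughly like the sketch below. It is not instar's implementation: the function names are hypothetical, the requirement that a counted partition entry have a non-zero type byte is an added assumption (so that an all-zero table does not pass), and the GPT check assumes 512-byte sectors and only looks for the "EFI PART" header signature.

```python
MBR_TABLE_OFFSET = 446  # first of four 16-byte partition entries
MBR_SIGNATURE = b"\x55\xaa"  # 0xAA55 stored little-endian at offsets 510-511


def has_mbr(sector0: bytes) -> bool:
    """Check the boot signature and look for one plausible partition entry."""
    if len(sector0) < 512 or sector0[510:512] != MBR_SIGNATURE:
        return False
    for i in range(4):
        entry = sector0[MBR_TABLE_OFFSET + 16 * i: MBR_TABLE_OFFSET + 16 * (i + 1)]
        boot_flag, part_type = entry[0], entry[4]
        # Assumption: skip empty slots (type 0x00) before testing the boot flag.
        if part_type != 0 and boot_flag in (0x00, 0x80):
            return True
    return False


def has_gpt(image: bytes, sector_size: int = 512) -> bool:
    """Check for a protective MBR (type 0xEE) and a GPT header at LBA 1."""
    if not has_mbr(image[:512]):
        return False
    types = [image[MBR_TABLE_OFFSET + 16 * i + 4] for i in range(4)]
    return 0xEE in types and image[sector_size:sector_size + 8] == b"EFI PART"
```

Under these checks a text file such as /etc/shadow fails at the signature test, which is the property the default behavior relies on.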

See format-coverage.md for comparison with oslo.utils format_inspector.

Test Images

The instar-testdata repository includes several test cases for this behavior:

  • raw-random-garbage.raw - Random bytes (detected as raw)
  • raw-misleading-header.raw - QCOW2 magic but invalid header (detected as raw)
  • raw-minimal-1byte.raw - Single byte file (detected as raw)

ISO 9660 Detection vs RAW

Classification: Unsafe Quirk - Related to format identification accuracy.

Observed Behavior

qemu-img does not specifically detect ISO 9660 (CD/DVD image) format. Instead, it treats ISO files as "raw" disk images:

$ qemu-img info ubuntu.iso
image: ubuntu.iso
file format: raw
virtual size: 4.7 GiB (5046586880 bytes)
disk size: 4.7 GiB

Why This Matters

ISO 9660 is a distinct filesystem format used for CD/DVD images, with a well-defined structure:

  • Primary Volume Descriptor at sector 16 (byte offset 32768)
  • Standard identifier "CD001" at bytes 1-5 of the PVD

Treating ISO files as "raw" means:

  1. Cloud platforms cannot distinguish ISOs from actual raw disk images
  2. Policy decisions (e.g., "reject ISO uploads") require external detection
  3. Format-specific handling (e.g., mount options) cannot be automated

instar Behavior

Default behavior (secure): instar detects ISO 9660 format by checking for the "CD001" magic at byte offset 32769. ISO files are reported as file format: iso rather than raw. This allows:

  • OpenStack/Glance to identify and policy-control ISO uploads
  • Better format reporting for administrators
  • Accurate format statistics

With --unsafe-quirks flag: instar matches qemu-img's behavior, treating ISO files as "raw" disk images. This is required for exact qemu-img output compatibility but provides less information about the actual file format.

Technical Details

ISO 9660 detection checks for:

  • "CD001" identifier at byte offset 32769 (32768 + 1)
  • Works with both small (512-byte) and large (65536-byte) sector sizes

The detection is performed after other format checks (QCOW2, VMDK, VHD, etc.) but before the partition table validation for raw images.
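The magic check itself is small. The sketch below shows only the "CD001" comparison at byte offset 32769 and is not instar's code; `is_iso9660` is a hypothetical name, and the caller is assumed to have read at least the first 32774 bytes of the file.

```python
ISO_PVD_OFFSET = 16 * 2048  # sector 16 of 2048-byte sectors = byte 32768
ISO_MAGIC = b"CD001"        # standard identifier at bytes 1-5 of the descriptor


def is_iso9660(data: bytes) -> bool:
    """Return True if the ISO 9660 standard identifier is present."""
    start = ISO_PVD_OFFSET + 1  # byte offset 32769
    return data[start:start + len(ISO_MAGIC)] == ISO_MAGIC
```

Short inputs simply yield a shorter slice, so the comparison returns False rather than raising.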

Check Operation Format Handling

Classification: Unsafe Quirk - Related to format identification and validation accuracy.

Quirk 1: Format Misidentification

Observed Behavior

qemu-img check only recognizes QCOW2 format. All other image formats (VMDK, VHDX, VHD, VDI, etc.) are treated as "raw" format:

$ qemu-img check image.vmdk
qemu-img: Could not open 'image.vmdk': Unknown image format
# Or with older versions:
This image format does not support checks

qemu-img does not attempt to detect the actual format when running check.

Why This Matters

  1. Format misidentification: A valid VMDK image is not recognized as VMDK - it's either rejected or processed as unknown/raw format.

  2. Reduced visibility: Administrators cannot determine what format an image actually is using qemu-img check.

instar Behavior

Default behavior (secure): instar detects the actual format of the image using the same detection logic as instar info. VMDK images are identified as "vmdk", VHDX as "vhdx", etc.

With --unsafe-quirks flag: instar matches qemu-img's behavior, only detecting QCOW2 format. All other formats are reported as "raw".

Quirk 2: Lack of Validation for Non-QCOW2 Formats

Observed Behavior

qemu-img check only performs structural validation for QCOW2 images. For all other formats, it reports that checks are not supported and exits with success:

$ qemu-img check simple.vmdk
This image format does not support checks
$ echo $?
0  # Success exit code despite no validation performed

This means a corrupt VMDK, VHDX, or VHD file would appear to "pass" the check simply because qemu-img didn't actually examine it.

Why This Matters

  1. False sense of security: Users may believe an image has been validated when no validation occurred.

  2. Missed corruptions: Corrupt headers, invalid offsets, and malformed metadata are not detected for non-QCOW2 formats.

instar Behavior

Default behavior (secure): instar performs format-appropriate validation for supported formats:

  • VMDK: Validates header version (1-3), capacity > 0, grain size power of 2, descriptor offset within file bounds
  • VHDX: Validates file signature and region table signature at offset 0x30000
  • VHD: Validates footer cookie and disk type (2=fixed, 3=dynamic, 4=diff)

Images with structural problems are marked with FLAG_HAS_CORRUPTIONS and report specific error counts. Images that pass validation are marked FLAG_VALID.
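As an illustration of the VHD checks, the sketch below validates the footer cookie and disk type and returns a list of problems. It is not instar's validator: `check_vhd_footer` is a hypothetical name, and the field positions (8-byte cookie "conectix" at offset 0, disk type as a big-endian 32-bit value at offset 60) are taken from the standard 512-byte VHD footer layout and should be treated as an assumption here.

```python
import struct

VALID_DISK_TYPES = {2: "fixed", 3: "dynamic", 4: "diff"}


def check_vhd_footer(footer: bytes) -> list:
    """Return a list of error strings; an empty list means the checks passed."""
    if len(footer) < 64:
        return ["footer too short to contain the disk type field"]
    errors = []
    if footer[0:8] != b"conectix":
        errors.append("invalid footer cookie")
    (disk_type,) = struct.unpack_from(">I", footer, 60)
    if disk_type not in VALID_DISK_TYPES:
        errors.append(f"invalid disk type ({disk_type})")
    return errors
```

A caller could map an empty list to FLAG_VALID and a non-empty list to FLAG_HAS_CORRUPTIONS, with the list length as the error count.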

With --unsafe-quirks flag: instar skips validation for non-QCOW2 formats, matching qemu-img's behavior. Non-QCOW2 images are marked as FLAG_NOT_SUPPORTED | FLAG_VALID without examination.

Test Images (Planned)

The following corrupt test images are planned for instar-testdata to validate corruption detection. Tests skip gracefully if these files do not exist:

| Image | Format | Corruption |
| --- | --- | --- |
| vmdk-corrupt-version.vmdk | VMDK | Invalid version (255) |
| vhdx-corrupt-region.vhdx | VHDX | Invalid region table signature |
| vhd-corrupt-disktype.vhd | VHD | Invalid disk type (255) |

These images should be placed in custom/format-coverage/ when created.

Summary

| Mode | Format Detection | Validation |
| --- | --- | --- |
| Default (secure) | All formats | QCOW2, VMDK, VHDX, VHD |
| --unsafe-quirks | QCOW2 only | QCOW2 only |

Check JSON Schema Consistency

Classification: Safe Quirk - Affects JSON output schema predictability.

Observed Behavior

qemu-img check --output=json conditionally omits fields from its JSON output when their values are zero. For example, a QCOW2 image with no corruptions produces:

{
    "filename": "test.qcow2",
    "format": "qcow2",
    "check-errors": 0,
    "image-end-offset": 262144,
    "total-clusters": 2,
    "allocated-clusters": 0,
    "fragmented-clusters": 0
}

The corruptions, leaks, and refcount-errors fields are absent. They only appear when their values are greater than zero:

{
    "filename": "corrupt.qcow2",
    "format": "qcow2",
    "check-errors": 3,
    "corruptions": 3,
    "image-end-offset": 262144,
    ...
}

Why This Matters

  1. Inconsistent schema: Callers must handle both the presence and absence of these fields, adding complexity to JSON parsing.

  2. Brittle tooling: Tools that expect a fixed set of fields may break when corruptions are first encountered, or may silently treat missing fields as absent rather than zero.

  3. API contract ambiguity: It is unclear whether a missing field means "zero errors" or "not checked".

instar Behavior

Default behavior (consistent schema): instar always includes corruptions, leaks, and refcount-errors in JSON output, regardless of their values. This provides a predictable, fixed schema that callers can rely on:

{
    "filename": "test.qcow2",
    "format": "qcow2",
    "check-errors": 0,
    "corruptions": 0,
    "leaks": 0,
    "refcount-errors": 0,
    "image-end-offset": 262144,
    "total-clusters": 2,
    "allocated-clusters": 0,
    "fragmented-clusters": 0
}

With --unsafe-quirks flag: instar matches qemu-img's behavior, omitting corruptions, leaks, and refcount-errors when their values are zero.
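Callers that still have to consume the conditional schema (from qemu-img, or from instar with --unsafe-quirks) could normalize it before use. The sketch below is one possible approach, with a hypothetical function name; it assumes that a missing counter means zero, which is exactly the ambiguity noted above.

```python
import json

# Fields that the conditional schema omits when their value is zero.
ALWAYS_PRESENT = ("corruptions", "leaks", "refcount-errors")


def normalize_check_output(raw_json: str) -> dict:
    """Parse check output and fill in omitted counters with 0."""
    result = json.loads(raw_json)
    for key in ALWAYS_PRESENT:
        result.setdefault(key, 0)  # assumption: absent means zero
    return result


clean = normalize_check_output('{"filename": "test.qcow2", "check-errors": 0}')
print(clean["corruptions"], clean["leaks"], clean["refcount-errors"])  # 0 0 0
```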

Current Validation Limitations

instar's QCOW2 check implementation has the following limitations compared to qemu-img:

  1. Partial L2 table validation: Only the first sector of each L2 table is validated (approximately 12.5% coverage for 64KB clusters). The fragmentation calculation is based on this partial sample.

  2. No refcount validation: The refcount table offset is verified, but individual refcount entries are not read or validated. This means:

       • refcount-errors will always be 0
       • leaks will always be 0

Users comparing instar output against qemu-img check may notice these discrepancies, particularly for images with refcount issues or extensive L2 table corruption beyond the first sector.

Future Additions

Additional quirks will be documented here as they are discovered during compatibility testing.
