# Phase 5: differential-fuzz timeout classification (category B2)

Parent plan: `PLAN-fuzzing-bugs.md`
## Goal

Stop the differential-fuzz harness from filing `exit_code_divergence` issues when `qemu-img` times out. Those divergences reflect a known upstream `qemu-img` pathology (it hangs on adversarial qcow2 shrink inputs), not an instar defect.
Closes: #336, #334, #315.
## Planning effort

Low. The change is confined to `scripts/differential-fuzz.py` and is mechanical once the harness's qemu-runner code is located.
## Investigation

Issue bodies look like:

```
"instar_rc": 0, "qemu_rc": -1,
"qemu_stderr": "TIMEOUT after 30s",
"operation": "resize", "apply_shrink": true,
"options": ["cluster_size=512", "lazy_refcounts=on"]
```
The harness wraps `qemu-img` calls in a 30-second timeout (grep `scripts/differential-fuzz.py` for `TIMEOUT`, `subprocess`, or `timeout=`). When the timeout fires, the harness sets `qemu_rc = -1` and `qemu_stderr = "TIMEOUT after 30s"`, then compares exit codes against instar's. Because instar succeeded (`rc = 0`) and qemu "reported" -1, the divergence check trips even though qemu never produced a real exit status.
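To make the failure mode concrete, here is a minimal sketch of the behaviour described above, assuming the harness uses `subprocess.run(..., timeout=...)`. The names `run_reference`, `exit_codes_diverge` and `QEMU_TIMEOUT_S` are hypothetical, not the harness's actual functions, and a short sleeping child stands in for a hanging `qemu-img`.

```python
import subprocess
import sys

QEMU_TIMEOUT_S = 30.0  # hypothetical constant mirroring the harness's 30-second limit


def run_reference(argv, timeout=QEMU_TIMEOUT_S):
    """Run a reference command the way the harness appears to run qemu-img.

    Hypothetical helper: on timeout it returns the sentinel rc -1 and a
    synthetic stderr string instead of a real exit status.
    """
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills and waits for the child before re-raising.
        return -1, f"TIMEOUT after {timeout:g}s"
    return proc.returncode, proc.stderr


def exit_codes_diverge(instar_rc, qemu_rc):
    # Old behaviour: any mismatch counts, including the synthetic -1.
    return instar_rc != qemu_rc


if __name__ == "__main__":
    # Stand-in for a hanging qemu-img: a child that sleeps past a short timeout.
    hang = [sys.executable, "-c", "import time; time.sleep(5)"]
    qemu_rc, qemu_stderr = run_reference(hang, timeout=0.5)
    print(qemu_rc, qemu_stderr)            # -1 TIMEOUT after 0.5s
    print(exit_codes_diverge(0, qemu_rc))  # True: a false-positive divergence
```

The sentinel -1 is indistinguishable from a real return code once it reaches the comparison, which is why the fix has to act before the divergence check rather than inside it.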
The correct classification is *inconclusive: external tool unavailable*. The harness already has a notion of skipped operations, and it should treat qemu timeouts the same way.
## Implementation

In `scripts/differential-fuzz.py`:

- Locate the function that runs `qemu-img` and translates a `subprocess.TimeoutExpired` into `qemu_rc = -1` / `qemu_stderr = "TIMEOUT after 30s"`.
- After capturing the timeout, instead of recording an `exit_code_divergence`, increment an `inconclusive_qemu_timeout` counter and skip the divergence check for this iteration.
- Surface the counter in the run summary so we don't lose visibility on how often qemu hangs.
- Adjust any tests under `scripts/tests/` (if they exist; check first) that depended on the old behaviour.
The harness already handles "`qemu-img` not installed" and similar conditions by skipping; model the timeout case the same way.
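The classification change can be sketched as follows. This is a minimal illustration, not the harness's code: `QemuResult`, `classify`, `summary_lines` and the `agree` label are hypothetical, and only the `exit_code_divergence` and `inconclusive_qemu_timeout` names come from this plan. It assumes the timeout handler can set an explicit `timed_out` flag.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class QemuResult:
    """Hypothetical record of one qemu-img invocation."""
    rc: int
    stderr: str
    timed_out: bool = False  # set in the TimeoutExpired handler


def classify(instar_rc, qemu, counters):
    """Return the outcome label for one iteration and update counters."""
    if qemu.timed_out:
        # qemu-img never finished, so there is no exit code to compare:
        # record it like the existing skip cases instead of a divergence.
        counters["inconclusive_qemu_timeout"] += 1
        return "inconclusive"
    if instar_rc != qemu.rc:
        counters["exit_code_divergence"] += 1
        return "exit_code_divergence"
    counters["agree"] += 1
    return "agree"


def summary_lines(counters):
    # Always print the timeout counter, even when zero, so hangs stay visible.
    keys = ["agree", "exit_code_divergence", "inconclusive_qemu_timeout"]
    return [f"{key}: {counters[key]}" for key in keys]


if __name__ == "__main__":
    counters = Counter()
    print(classify(0, QemuResult(-1, "TIMEOUT after 30s", timed_out=True), counters))  # inconclusive
    print(classify(0, QemuResult(1, "qemu-img: error"), counters))  # exit_code_divergence
    print(classify(0, QemuResult(0, ""), counters))                 # agree
    print("\n".join(summary_lines(counters)))
```

Keying off an explicit flag rather than matching the `"TIMEOUT after 30s"` string keeps the classification correct if the timeout value or message ever changes.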
## Verification

- Re-run each seed:

  ```sh
  python3 scripts/differential-fuzz.py \
      --instar src/target/release/instar \
      --seed 59095591 --iterations 752 --fail-fast
  python3 scripts/differential-fuzz.py \
      --instar src/target/release/instar \
      --seed 804298385 --iterations 196 --fail-fast
  python3 scripts/differential-fuzz.py \
      --instar src/target/release/instar \
      --seed 829673908 --iterations 405 --fail-fast
  ```

  None should report `exit_code_divergence` on the iteration listed in the corresponding issue (#336, #334, #315). All three should now produce an `inconclusive` record.
- Confirm the run summary mentions the `inconclusive_qemu_timeout` counter and that it is non-zero for the runs above.
- Make sure real exit-code divergences (e.g. the ones from category B1, until phase 4 lands) are still reported.
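The first two checks can be scripted. This is a minimal sketch that assumes the harness prints its summary as `name: N` lines; the real format is not specified in this plan, so adjust `counter_value` to whatever the harness actually prints. `RUNS`, `build_command`, `counter_value`, `check_output` and `main` are hypothetical names.

```python
import re
import subprocess

# Seeds and iteration counts copied from the commands above.
RUNS = [(59095591, 752), (804298385, 196), (829673908, 405)]
INSTAR = "src/target/release/instar"


def build_command(seed, iterations):
    return [
        "python3", "scripts/differential-fuzz.py",
        "--instar", INSTAR,
        "--seed", str(seed),
        "--iterations", str(iterations),
        "--fail-fast",
    ]


def counter_value(text, name):
    """Return the integer on a 'name: N' summary line, or None if absent.

    Assumes the 'name: N' summary format; this is not confirmed.
    """
    match = re.search(rf"^{re.escape(name)}:\s*(\d+)\s*$", text, re.MULTILINE)
    return int(match.group(1)) if match else None


def check_output(text):
    """Return a list of problems found in one run's combined output."""
    problems = []
    if counter_value(text, "exit_code_divergence") not in (None, 0):
        problems.append("exit_code_divergence counter is non-zero")
    # A fail-fast stop before the summary also leaves this counter missing.
    if not counter_value(text, "inconclusive_qemu_timeout"):
        problems.append("inconclusive_qemu_timeout missing or zero")
    return problems


def main():
    """Re-run each seed from the repository root; return 0 if all pass."""
    all_ok = True
    for seed, iterations in RUNS:
        proc = subprocess.run(build_command(seed, iterations), capture_output=True, text=True)
        problems = check_output(proc.stdout + proc.stderr)
        print(f"seed {seed}: " + ("ok" if not problems else "; ".join(problems)))
        all_ok = all_ok and not problems
    return 0 if all_ok else 1
```

Call `main()` from the repository root after building instar. The third check (real divergences still reported) still needs a seed known to diverge, such as one from category B1.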
## Steps
| Step | Effort | Model | Isolation | Brief |
|---|---|---|---|---|
| 5a | low | sonnet | none | In `scripts/differential-fuzz.py`, change `qemu-img` timeout handling: instead of recording the timeout as an `exit_code_divergence`, classify it as `inconclusive_qemu_timeout` (matching the existing skip-style classifications). Surface the counter in the per-run summary. Don't change the timeout value (30s). |
| 5b | low | sonnet | none | Run the three seeds in the Verification section and confirm no divergence is reported. |
| 5c | low | sonnet | none | Close the three issues with `gh issue close <n> -c "Not a bug: qemu-img upstream hangs on this adversarial qcow2 shrink input. Harness updated in <sha> to classify qemu-img timeouts as inconclusive; see PLAN-fuzzing-bugs-phase-05-diff-fuzz-timeouts.md."`. |
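Step 5c can also be scripted. A minimal sketch, assuming the GitHub CLI is installed and authenticated; `close_commands` and `close_all` are hypothetical helpers, and `sha` must be the actual commit from step 5a. It defaults to a dry run so the commands can be reviewed first.

```python
import subprocess

ISSUES = [336, 334, 315]
COMMENT = (
    "Not a bug: qemu-img upstream hangs on this adversarial qcow2 shrink input. "
    "Harness updated in {sha} to classify qemu-img timeouts as inconclusive; "
    "see PLAN-fuzzing-bugs-phase-05-diff-fuzz-timeouts.md."
)


def close_commands(sha):
    """Build one `gh issue close` argv per issue, with the closing comment."""
    return [["gh", "issue", "close", str(n), "-c", COMMENT.format(sha=sha)] for n in ISSUES]


def close_all(sha, dry_run=True):
    for argv in close_commands(sha):
        if dry_run:
            # Show which issue would be closed without calling gh.
            print(" ".join(argv[:4]), "...")
        else:
            subprocess.run(argv, check=True)
```

Passing the argv as a list avoids shell quoting problems with the comment text.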
## Commit shape

One commit for step 5a ("differential-fuzz: classify qemu-img timeouts as inconclusive"). Steps 5b (verification) and 5c (housekeeping) need no commit of their own.