Plan: Multi-distro install + qemu-img differential CI
Status: Drafted, not started
Drafted as a follow-up to v0.2 packaging work. Do not start until
the v0.2 release ships and the install smoke test in
.github/workflows/functional-tests.yml (job package-smoke) has
been observed to be stable for a couple of weeks.
Goal
Run instar's full functional test suite against the .deb / .rpm
packages on a representative matrix of Linux distributions in the
GitHub merge queue. Each matrix entry installs the package, points
the test harness at /usr/bin/instar, and runs every
tests/test_*.py against the qemu-img version that ships with
that distribution.
The point is two-fold:
- Catch packaging regressions. Asset-path errors, runtime dependency mistakes, mode-bit drift, resolver fallback bugs that don't surface in the in-tree build layout. The PR-level smoke test catches the obvious cases on debian:trixie; the merge-queue matrix catches them across distros.
- Catch qemu-img output-profile regressions. instar adapts its qemu-img-compatible output to the locally detected qemu-img version (see tests/test_oslo_crossval.py and the version detection code in src/vmm/). Today that adaptation is only exercised against whatever qemu-img happens to be installed on the build runner. Running across distros means we exercise it against several real qemu-img versions that users actually run, and we catch the case where instar's profile selection logic has drifted away from the reality of a specific distro.
Existing infrastructure to reuse
Most of the pieces are already in place:
- Test harness already takes INSTAR_BINARY_PATH (tests/base.py:261) and INSTAR_TESTDATA_PATH (tests/base.py:64). Pointing the existing test suite at a distro-installed binary requires no harness changes.
- instar-testdata already provides the test corpus and qemu-img baseline output. Cloning is parametrized via GITLAB_TESTDATA_TOKEN and is already done in every job in functional-tests.yml.
- Smoke script tools/test-package-install.sh accepts both .deb and .rpm and any distro image. It can be the per-matrix-entry inner test runner with minor extensions (running the full suite instead of the smoke checks).
- Pattern for matrix testing inside merge_group is well established in shakenfist sibling repos:
    - shakenfist/shakenfist/.github/workflows/functional-tests.yml splits functional_matrix_pr (PR-level smoke) from functional_matrix_merge (merge-queue cluster tests). Lift the same shape.
    - shakenfist/kerbside-patches/.github/workflows/functional-tests.yml has a Rocky 9/10 + Fedora cross-distro matrix that's conceptually almost identical to what instar needs.
- KVM-capable self-hosted runners are already set up ([self-hosted, debian-12, xl] is used by every functional test job today, with /dev/kvm passthrough into containers).
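
The reason no harness changes are needed is that the binary and testdata locations come from the environment. The sketch below illustrates that kind of lookup. It is not the code in tests/base.py: the function names and the in-tree fallback path are hypothetical, and only the two environment variable names come from this plan.

```python
import os
from pathlib import Path

# Hypothetical fallback for an in-tree build; the real default lives in
# tests/base.py and may differ.
IN_TREE_BINARY = Path("build/instar")


def resolve_instar_binary(environ=os.environ):
    """Return the binary under test, preferring INSTAR_BINARY_PATH when set."""
    override = environ.get("INSTAR_BINARY_PATH")
    return Path(override) if override else IN_TREE_BINARY


def resolve_testdata(environ=os.environ):
    """Return the testdata checkout, or None when INSTAR_TESTDATA_PATH is unset."""
    override = environ.get("INSTAR_TESTDATA_PATH")
    return Path(override) if override else None
```

With this shape, a matrix entry only has to export INSTAR_BINARY_PATH=/usr/bin/instar before invoking the suite.
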
Open design decisions
These need to be resolved before implementation. Each is more of a policy call than a technical hurdle.
1. glibc baseline
The current build container is based on debian:trixie
(glibc 2.41), so the produced binaries refuse to run on any
system with glibc older than 2.39. That excludes Rocky/RHEL 9
(2.34), Debian 12 (2.36), Ubuntu 22.04 LTS (2.35), and
Fedora 39 and older.
Three options:
- Pin the build base to debian:bookworm (glibc 2.36). Single artifact set covers Debian 12+, Ubuntu 22.04+, Fedora 38+, Rocky 10. Still excludes Rocky 9. Lowest-friction way to widen support without per-distro builds. Note that Ubuntu 22.04 ships glibc 2.35, so it is covered only if the binaries need no symbols newer than 2.35; the Phase 1 smoke test should confirm this.
- Pin to rockylinux:9 (glibc 2.34). Widest coverage including Rocky 9 / RHEL 9. Risk: nightly Rust + Docker buildkit toolchain availability on Rocky 9 is less polished than Debian's; some apt-side build deps (libqcow-utils, libvhdi-utils) may not have direct Rocky equivalents.
- Build per-distro artifacts. Most flexible (each artifact is glibc-perfect for its target distro), most expensive (matrix grows by N, artifact count grows by N).
Recommendation: start with debian:bookworm as a single
build base. If Rocky 9 demand materializes, add a parallel
rockylinux:9-based build. Don't go to per-distro builds
until forced by an actual incompatibility.
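
The trade-off between build bases comes down to one comparison: the highest GLIBC_x.y symbol version a binary references against the glibc each distro ships. Below is a minimal sketch of that check, using the glibc versions quoted in this section. The function names are hypothetical, and the list of symbol version tags is assumed to come from inspecting the built binary (for example the version tags printed by `objdump -T`).

```python
import re

# glibc versions quoted in this plan, as (major, minor) tuples.
DISTRO_GLIBC = {
    "rocky-9": (2, 34),
    "ubuntu-22.04": (2, 35),
    "debian-12": (2, 36),
    "debian-13": (2, 41),
}

# Matches tags such as GLIBC_2.36 or GLIBC_2.2.5, but not GLIBC_PRIVATE.
_GLIBC_TAG = re.compile(r"GLIBC_(\d+)\.(\d+)(?:\.\d+)*$")


def required_glibc(symbol_versions):
    """Highest GLIBC_x.y symbol version referenced, as (major, minor)."""
    found = []
    for tag in symbol_versions:
        match = _GLIBC_TAG.match(tag)
        if match:
            found.append((int(match.group(1)), int(match.group(2))))
    return max(found, default=(0, 0))


def runnable_on(symbol_versions, distros=DISTRO_GLIBC):
    """Distros whose glibc is at least as new as the binary's requirement."""
    needed = required_glibc(symbol_versions)
    return sorted(name for name, have in distros.items() if have >= needed)
```

For example, `runnable_on(["GLIBC_2.2.5", "GLIBC_2.36"])` excludes both Rocky 9 and Ubuntu 22.04, whereas a binary whose highest tag is GLIBC_2.34 runs on all four entries. The effective baseline is set by the symbols actually used, not only by the build image.
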
2. qemu-img baseline drift
instar-testdata captures qemu-img reference output. That output was captured against one specific qemu-img version. The matrix will run against several different qemu-img versions (qemu 7.x on Debian 12, 8.x on Ubuntu 24.04, 9.x on Fedora 40+, etc.). Some divergence between baseline and live qemu-img output is expected and not a bug.
tests/test_oslo_crossval.py already has a precedent
(assert_known_oslo_divergence) for "we know this differs and
that's fine". Extend that pattern: assertions that record
"these fields legitimately differ on qemu X.Y" rather than
asserting bit-for-bit equality.
The clean path:
- Categorize each test assertion as either portable (must match on every qemu-img we run against) or version-specific (matches the baseline only on the exact qemu-img version that produced the baseline).
- Make version-specific assertions skip or use a tolerant comparator on other versions, with a clear log line saying "skipped because qemu-img X.Y != baseline qemu-img X.Y'".
- The merge-queue matrix records per-distro qemu-img versions in the test output so the source of any divergence is obvious.
This is the largest design block; do not start the matrix implementation until this comparator policy is resolved. A mismatch storm on first run will make the whole job permanently red and people will tune it out.
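
One possible shape for the "skip with a clear log line" rule is sketched below. It assumes the first line of `qemu-img --version` looks like `qemu-img version 7.2.9 (...)` and compares only major.minor. Both function names are hypothetical, and the real policy may need finer granularity than an exact version match.

```python
import re


def parse_qemu_img_version(banner):
    """Extract (major, minor) from the first line of `qemu-img --version`."""
    match = re.search(r"qemu-img version (\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError(f"unrecognised qemu-img banner: {banner!r}")
    return int(match.group(1)), int(match.group(2))


def version_specific_check(live, baseline):
    """Return (should_run, log_line) for a version-specific assertion."""
    if live == baseline:
        return True, ""
    fmt = "{}.{}".format
    return False, (
        f"skipped because qemu-img {fmt(*live)} "
        f"!= baseline qemu-img {fmt(*baseline)}"
    )
```

Portable assertions would bypass this check entirely and always run.
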
3. PR vs merge-queue split
Mirror the shakenfist pattern:
- Pull request events: build, unit tests, integration on the build tree (existing functional-tests.yml jobs), plus the package-smoke job introduced for v0.2 (debian:trixie install + minimal smoke).
- Merge queue events: everything above, plus the new package-matrix job described below. Entries: 6+ distro packages, full test suite each.
Gate each job with if: github.event_name == 'merge_group' (merge-queue only) or
!= 'merge_group' (PR only).
4. Build-once vs build-per-distro
The simple model is one build job that produces the .deb and .rpm, then N matrix entries that consume those artifacts. That works only if a single artifact is glibc-compatible with every matrix entry, which constrains decision (1).
If we go with per-distro builds, the matrix becomes (build, test)
pairs and the workflow is more complex. That is avoidable by
picking the right glibc baseline.
Distro matrix
Initial proposal once the above decisions are made:
| Distro | Package | qemu-img range | Notes |
|---|---|---|---|
| Debian 12 | .deb | 7.2 | LTS, current "stable" |
| Debian 13 | .deb | 9.x | "trixie", current build base for v0.2 |
| Ubuntu 22.04 | .deb | 6.2 | LTS, oldest target |
| Ubuntu 24.04 | .deb | 8.2 | LTS |
| Fedora latest | .rpm | 9.x | Bleeding edge qemu-img |
| Rocky/RHEL 9 | .rpm | 8.2 | Wide enterprise install |
| Rocky/RHEL 10 | .rpm | 9.x | Newer enterprise |
Seven entries. Add openSUSE Leap, Arch, or Alpine only when
specific user demand surfaces. Each entry runs the full
functional suite, so total wall time is roughly
max(per-entry runtime) ≈ 30-45 min if they parallelize on
distinct runners.
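
The wall-time estimate only holds when every entry gets its own runner. The sketch below shows how the figure changes with pool size, using a greedy longest-first assignment. The per-entry runtimes are illustrative placeholders inside the 30-45 min range quoted above, not measurements, and the function name is hypothetical.

```python
import heapq


def estimated_wall_time(runtimes, runners):
    """Greedy longest-first estimate of wall time on `runners` parallel runners."""
    loads = [0] * runners
    heapq.heapify(loads)
    for minutes in sorted(runtimes, reverse=True):
        # Hand the next-longest entry to the runner that frees up first.
        heapq.heappush(loads, heapq.heappop(loads) + minutes)
    return max(loads)


# Seven entries, minutes each (placeholder values).
RUNTIMES = [30, 35, 45, 30, 40, 35, 30]
```

With seven runners the estimate is the longest single entry (45 min). With four runners it rises to 70 min, and a single runner takes the full 245 min.
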
Implementation phases
Phase 1: glibc baseline
- Decision: pin build base to debian:bookworm (or whatever is decided in design block 1).
- Update src/.devcontainer/Dockerfile FROM line.
- Re-run make instar and confirm artifact still works in the existing test suite and in the v0.2 package-smoke job.
- Re-run smoke test against the older-glibc images (e.g. debian:bookworm, ubuntu:22.04, rockylinux:9) to confirm the package now installs there.
- Update CHANGELOG/README minimum-glibc note.
This phase is independently valuable -- it widens v0.2.x patch-release compatibility without any of the matrix work. Could be merged as soon as decided.
Phase 2: comparator policy for qemu-img drift
- Audit tests/ for assertions that depend on qemu-img output.
- For each, decide: portable, version-specific, or baseline-only.
- Implement assert_qemu_compatible(...) helper that takes a baseline value and a tolerance (exact, semver-compatible, field-subset, or skip).
- Refactor the existing assertions to use it.
- This phase ships independently; no CI changes yet.
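
To make the helper concrete, here is one possible sketch of assert_qemu_compatible. Only the helper name and the four tolerance labels come from this plan. The signature and the meaning given to each tolerance are assumptions made for illustration (in particular, "semver-compatible" is read as "same major version" and "field-subset" as "every baseline field matches, extra live fields are tolerated").

```python
from enum import Enum


class Tolerance(Enum):
    EXACT = "exact"
    SEMVER = "semver-compatible"
    FIELD_SUBSET = "field-subset"
    SKIP = "skip"


def assert_qemu_compatible(baseline, live, tolerance=Tolerance.EXACT):
    """Compare live qemu-img output with the baseline under a tolerance."""
    if tolerance is Tolerance.SKIP:
        return
    if tolerance is Tolerance.EXACT:
        ok = live == baseline
    elif tolerance is Tolerance.SEMVER:
        # Assumed meaning: dotted version strings agree on the major component.
        ok = str(live).split(".")[0] == str(baseline).split(".")[0]
    elif tolerance is Tolerance.FIELD_SUBSET:
        # Assumed meaning: every baseline field is present and equal in live.
        ok = all(k in live and live[k] == v for k, v in baseline.items())
    else:
        raise ValueError(f"unknown tolerance: {tolerance!r}")
    if not ok:
        raise AssertionError(
            f"{tolerance.value} mismatch: live={live!r} baseline={baseline!r}"
        )
```

A call such as `assert_qemu_compatible(baseline_info, live_info, Tolerance.FIELD_SUBSET)` then records at the call site how strict each comparison is meant to be.
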
Phase 3: matrix runner script
- Generalize tools/test-package-install.sh into something like tools/test-package-functional.sh that:
    - Installs the package in a distro container,
    - Runs the full Python integration suite inside the container (or by ssh into a per-distro VM, depending on KVM access strategy),
    - Sets INSTAR_BINARY_PATH=/usr/bin/instar and INSTAR_TESTDATA_PATH=....
- Decide whether the test runs inside the container (faster, but the container needs Python + pytest + qemu-utils) or outside with the container providing only the installed binary (cleaner, but container shells must surface enough back to the host runner).
- Reuse the smoke script's docker run skeleton.
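
For the "run inside the container" variant, the docker run invocation could look roughly like the argv built below. This is a sketch, not the smoke script's actual skeleton: the function name, the /packages and /testdata mount points, and the inner command are hypothetical. Only the two environment variables and the /dev/kvm passthrough come from this plan.

```python
def docker_test_argv(image, package_dir, testdata_dir, inner_cmd, kvm=True):
    """Build (but do not run) the docker argv for one matrix entry."""
    argv = ["docker", "run", "--rm"]
    if kvm:
        # Pass the host KVM device through, as the functional jobs do today.
        argv += ["--device", "/dev/kvm"]
    argv += [
        # Hypothetical mount points inside the container.
        "-v", f"{package_dir}:/packages:ro",
        "-v", f"{testdata_dir}:/testdata:ro",
        "-e", "INSTAR_BINARY_PATH=/usr/bin/instar",
        "-e", "INSTAR_TESTDATA_PATH=/testdata",
        image,
    ]
    return argv + list(inner_cmd)
```

Returning a list rather than a shell string keeps quoting out of the picture; the result can be handed to `subprocess.run(argv, check=True)` directly.
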
Phase 4: workflow integration
- Add merge_group: trigger to .github/workflows/functional-tests.yml.
- Add package-matrix job, gated to merge_group only.
- Matrix over the seven distros above.
- Each entry: checkout, Docker, build .deb/.rpm, prepare testdata, call the runner script with the right (package, distro-image) pair.
- Output: per-distro test result JSON, summarized on the merge queue check.
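
The summary step could be as small as the sketch below. The JSON schema (distro, qemu_img, passed, failed) is hypothetical, since the plan does not define one, and so is the function name. It does follow the design note that each entry should record the qemu-img version it ran against.

```python
import json


def summarize(result_docs):
    """Return (all_passed, text) from an iterable of per-distro JSON strings."""
    rows = [json.loads(doc) for doc in result_docs]
    lines = [
        f"{r['distro']} (qemu-img {r['qemu_img']}): "
        f"{r['passed']} passed, {r['failed']} failed"
        for r in sorted(rows, key=lambda r: r["distro"])
    ]
    all_passed = all(r["failed"] == 0 for r in rows)
    return all_passed, "\n".join(lines)
```

Printing the qemu-img version next to each result makes it easy to see whether a failure is tied to one distro's qemu-img.
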
Phase 5: enable GitHub merge queue
- One-time GitHub setting: Settings -> Branches -> branch protection on develop/main -> "Require merge queue".
- Verify a real PR merge through the queue exercises the matrix and gates on it.
Dependencies and risks
- GITLAB_TESTDATA_TOKEN must be available to merge_group events. Self-hosted runners already get it; verify the same for this new job class.
- Self-hosted runner pool sizing -- seven concurrent matrix entries running KVM workloads, each pulling distro images and running ~30 min of tests. Probably not an issue given shakenfist/kerbside-patches handle multi-machine cluster topologies in the same merge-queue model, but worth measuring on first runs.
- GitHub merge queue has its own quirks (no rerun on individual entries, retries reset the whole queue). If a single distro is flaky it'll block all merges. Plan: any matrix entry that fails twice in a row gets temporarily marked continue-on-error: true while the underlying issue is investigated; do not let one flaky distro hold the whole queue.
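
The "fails twice in a row" rule is simple enough to state as code. The sketch below assumes a per-entry history of pass/fail results, oldest first. How that history is collected is left open, and the function name is hypothetical.

```python
def should_quarantine(history, threshold=2):
    """True when the most recent `threshold` runs of an entry all failed.

    `history` is a list of booleans, oldest first, where True means the
    run passed.
    """
    recent = history[-threshold:]
    return len(recent) == threshold and not any(recent)
```

An entry flagged this way would be the candidate for a temporary continue-on-error: true, and the flag clears once a run passes again.
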
Out of scope
- Publishing the packages to apt/dnf repositories (PPA, Copr, internal mirrors). Separate project.
- Code-signing the packages. Separate project.
- macOS/Windows packaging -- instar requires /dev/kvm and cannot run there.
- ARM (aarch64) packaging -- explicitly deferred per PLAN-release.md until test hardware exists.