
Automated SPICE test harness

Prompt

Before responding to questions or discussion points in this document, explore the kerbside codebase thoroughly. Read relevant source files, understand existing patterns (the SPICE protocol implementation in kerbside/spiceprotocol/, the proxy connection model in kerbside/proxy.py, the source driver abstraction in kerbside/sources/, the REST API in kerbside/api.py, the SQLAlchemy/alembic data model in kerbside/db.py and alembic/, Pydantic-based config in kerbside/config.py, audit logging, the .vv file generation path, and the legacy Python SPICE client at testclient/ryll/). Ground your answers in what the code actually does today. Do not speculate about the codebase when you could read it instead. Where a question touches on external concepts (SPICE protocol, QXL, vdagent, libvirt graphics, OpenStack Nova consoles, oVirt console API, Shaken Fist), research as needed to give a confident answer. Flag any uncertainty explicitly rather than guessing.

This plan also crosses repository boundaries. Phases that land in other repos must consult those repos in the same grounded way:

  • shakenfist/ryll — the Rust SPICE client. The control socket and digest decoding land here. Explore the channel handlers under shakenfist-spice-renderer/src/channels/, the headless session at shakenfist-spice-renderer/src/session.rs, and the input/output event enums in shakenfist-spice-renderer/src/channels/mod.rs.
  • shakenfist/uncalibrated-sextant — the UEFI Rust test target guest. The visual digest format spec lives at docs/visual-digest-format.md; the encoder at src/digest.rs; the serial drain at src/serial.rs. No changes are expected here for v1 of this plan, but the digest crate extraction must not change the on-wire payload.
  • A new repo for the shared shakenfist-visual-digest crate, holding the encoder, decoder, and a CLI decoder.

All planning documents for this plan — master and every phase plan — live in this repo's docs/plans/, regardless of which repo each phase's implementation lands in. Kerbside is the main driver of this work, and keeping every phase plan co-located keeps the design conversation in one searchable place. Sub-agents executing a cross-repo phase must therefore be briefed that the plan they are following lives in shakenfist/kerbside/docs/plans/ even when all of their commits will land elsewhere.

Consult ARCHITECTURE.md for the overall proxy architecture, channel model, and connection lifecycle. Consult AGENTS.md for build commands, project conventions, key file map, and code organisation. The docs/ tree contains protocol documentation derived from upstream SPICE sources.

When we get to detailed planning, I prefer a separate plan file per detailed phase. These separate files should be named for the master plan, in the same directory as the master plan, and simply have -phase-NN-descriptive appended before the .md file extension. Phase status is tracked in the table under the Execution section below.
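The naming rule above can be sketched as a small helper. This is illustrative only: `phase_plan_name` and its arguments are hypothetical names, not something that exists in the repo, and the two-digit zero padding is inferred from the phase plan filenames listed under Execution.

```python
from pathlib import PurePosixPath


def phase_plan_name(master_plan: str, phase: int, slug: str) -> str:
    """Derive a phase plan path from the master plan path.

    Appends -phase-NN-<slug> before the .md extension and keeps the
    file in the same directory as the master plan.
    """
    master = PurePosixPath(master_plan)
    if master.suffix != ".md":
        raise ValueError(f"expected a .md master plan, got {master_plan!r}")
    phase_file = f"{master.stem}-phase-{phase:02d}-{slug}{master.suffix}"
    return str(master.with_name(phase_file))


print(phase_plan_name("docs/plans/PLAN-test-harness.md", 1, "digest-crate"))
# docs/plans/PLAN-test-harness-phase-01-digest-crate.md
```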

I prefer one commit per logical change, and at minimum one commit per phase. Do not batch unrelated changes into a single commit. Each commit should be self-contained: it should build, pass tests, and have a clear commit message explaining what changed and why.

Situation

We have three components that together could form an automated SPICE test harness, but no glue between them:

  • Kerbside (this repo) — the SPICE proxy. Existing tests are unit tests in kerbside/tests/ plus a tempest plugin (tempest-plugin/kerbside_tempest_plugin/tests/api/test_spice_via_kerbside.py) that asserts the SPICE link handshake against a Nova instance but does not authenticate or drive guest I/O.
  • Ryll (shakenfist/ryll) — the Rust SPICE client. Its --headless mode (shakenfist-spice-renderer/src/session.rs:405-609) runs a tokio::select! loop that drains channel events, optionally emits a single keypress every 2 seconds via --cadence as a latency probe, and optionally pastes text via --paste-text. The input event enum already covers keyboard, mouse, and paste; the channel event enum already exposes display surfaces, cursor, latency samples, and statistics. There is no scripting layer and no control interface.
  • Uncalibrated Sextant (shakenfist/uncalibrated-sextant) — the UEFI Rust test target guest. Three scenes (Awaiting → Booting → Parked) driven by keypresses and a clipboard-paste decryption challenge. Two assertion oracles already exist: a versioned QR digest in the bottom-right of the framebuffer (spec at docs/visual-digest-format.md in that repo, encoder at src/digest.rs) that updates per painted line and on cursor blink, and a structured plain-text event log drained over serial at shutdown (src/serial.rs:78).

The current Kerbside CI deploys a full Kolla-OpenStack all-in-one before running the tempest plugin. OpenStack is not part of our long-term ecosystem; it was the hypervisor stack with the most readily available tooling. The CI workflow lives in .github/workflows/functional-tests.yml. Hypervisor backends live at kerbside/sources/{base.py,ovirt.py,shakenfist.py} — there is no static / test backend today.

A latency loadtest exists at loadtests/latency/. It runs against uefi-latency-guest (a separate image fetched from images.shakenfist.com) using the legacy Python SPICE client at testclient/ryll/. The tempest plugin does not currently reference uefi-latency-guest. The user's original framing mentioned a tempest test using it; on inspection, only the loadtest does. This may indicate planned work that never landed, or a misremembered detail — to be clarified during phase 4 planning.

The Python client at testclient/ryll/ is the predecessor to shakenfist/ryll. The Rust rewrite was started from lessons learned building the Python version, and is now a strict feature superset. The loadtest is the only remaining consumer of the Python client; removing the testclient/ryll/ tree is part of this plan (phase 4). Note the namespace collision: "Ryll" in this document means the Rust client (shakenfist/ryll) unless the prefix testclient/ or the qualifier "legacy Python" is used.

The digest format on Uncalibrated Sextant is a versioned binary protocol (10-byte header, eight per-channel chained CRC32C hashes, up to 44 bytes of raw recent events, 4-byte framebuffer integrity hash). The encoder exists in one repo; no decoder exists anywhere.
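To make the size budget concrete, here is a minimal Python sketch that splits a digest payload into the four regions named above. The region sizes come from this paragraph; the field order, the assumption that the event region is the only variable-length part, and the assumption that the framebuffer hash trails the payload are guesses made for illustration. The authoritative layout is docs/visual-digest-format.md in the Sextant repo, and `split_digest` and `DigestFields` are hypothetical names.

```python
from dataclasses import dataclass

HEADER_LEN = 10
CHANNEL_COUNT = 8
HASH_LEN = 4          # a CRC32C value is 32 bits
MAX_EVENT_LEN = 44
FB_HASH_LEN = 4

MIN_LEN = HEADER_LEN + CHANNEL_COUNT * HASH_LEN + FB_HASH_LEN  # 46 bytes
MAX_LEN = MIN_LEN + MAX_EVENT_LEN                              # 90 bytes


@dataclass(frozen=True)
class DigestFields:
    header: bytes
    channel_hashes: tuple
    recent_events: bytes
    framebuffer_hash: bytes


def split_digest(payload: bytes) -> DigestFields:
    """Split a payload by size only; no field is interpreted.

    ASSUMPTION: regions appear in the order header, channel hashes,
    recent events, framebuffer hash. Check the format spec before use.
    """
    if not MIN_LEN <= len(payload) <= MAX_LEN:
        raise ValueError(f"payload length {len(payload)} outside "
                         f"{MIN_LEN}..{MAX_LEN}")
    hashes_end = HEADER_LEN + CHANNEL_COUNT * HASH_LEN
    events_end = len(payload) - FB_HASH_LEN
    hashes = tuple(payload[i:i + HASH_LEN]
                   for i in range(HEADER_LEN, hashes_end, HASH_LEN))
    return DigestFields(
        header=payload[:HEADER_LEN],
        channel_hashes=hashes,
        recent_events=payload[hashes_end:events_end],
        framebuffer_hash=payload[events_end:],
    )
```

Under these assumptions a full payload is at most 90 bytes, which is the figure a decoder would need to budget for when reading the QR code.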

Mission and problem statement

Build an end-to-end automated SPICE test harness that exercises Kerbside as a proxy in front of a real qemu/KVM guest (Uncalibrated Sextant), driven by Ryll over a control socket, with assertions against both the live QR digest and the post-mortem serial drain.

The harness must:

  • Not require OpenStack or any other heavyweight control plane to run. Direct qemu + Kerbside + Ryll, glued by a small static hypervisor driver, is the v1 target.
  • Re-cover the latency loadtest's scope on the new substrate early, so the control socket design is validated against an existing real test before we build new ones on top.
  • Allow the tempest plugin to grow Sextant-driven scenarios (paste round-trip, scene transitions, digest assertions) without changing how it integrates with whatever cloud is underneath. The same plugin should be able to run against direct qemu, against the existing OpenStack lane, and (eventually) against a Shaken Fist deployment.
  • Leave the door open for the control socket to be reused for non-test purposes: interactive debugging, one-off bug repros, and a possible SPICE MCP server later. This shapes the verb set: coarse, intent-bearing primitives, not raw protocol pokes.

Out of scope for v1:

  • Mouse, USB redirection, clipboard via vdagent, audio, and WebDAV. These can be added once the spine is in place and Sextant grows the matching scenes.
  • Replacing the existing OpenStack CI lane. We add a parallel direct-qemu lane first; whether to demote or retire the OpenStack lane is a follow-up decision.
  • A Shaken Fist CI lane. That waits on the SF installer rewrite.

Open questions

These are deferred to the relevant phase plans, not the master plan, but listed here so they are not forgotten:

  • Shared crate naming and home. Working name is shakenfist-visual-digest. Open: own repo vs living in one of the existing repos. Lean toward own repo since neither side naturally owns it. Decide at the start of phase 1.
  • Ryll digest dependency model. A Cargo optional feature (digest-decode, off by default) so production builds carry zero digest code, vs harness-side decode via the crate's CLI tool with Ryll only exposing raw screenshots. Lean toward the feature flag; decide at the start of phase 6.
  • Control socket transport and framing. Unix socket + line-delimited JSON is the working assumption. Open: whether to also stream DigestUpdated events asynchronously, or require polling. Recommend bidirectional: requests/responses for actions, unsolicited event stream for observations (digest, latency, frames). Decide during phase 3.
  • Phase 4 scope: does a tempest uefi-latency-guest test exist? Grep says no, only the loadtest does. Either the user has planned-but-not-landed work, or the framing was approximate. Resolve before drafting phase 4.
  • Sextant image distribution. Working assumption is to commit the qcow2 to this repo. Open: size budget, LFS vs plain git, whether to also expose it as a CI artifact for cross-repo consumers.
  • Direct-qemu CI runner shape. GitHub Actions runners need /dev/kvm. We have self-hosted runners with vm, static, debian-12 labels (per .github/actionlint.yaml). Confirm which label set gives us nested KVM during phase 5.
  • OpenStack CI lane fate. Once direct-qemu CI is proven, do we retire it, demote to nightly, or keep on PRs? Defer until after phase 7.
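As a rough illustration of the bidirectional model recommended for the control socket above, the sketch below frames messages as one JSON object per line and lets the client tell responses (matched by request id) apart from unsolicited events. Everything protocol-specific here is hypothetical: the `id`, `verb`, `params`, `event`, and `ok` keys, the `send_key` verb, and the `fake_ryll` stand-in server are assumptions for illustration, since the real verb set is decided in phase 3. A `socket.socketpair()` stands in for the Unix socket so the example runs without Ryll.

```python
import json
import socket
import threading


def send_msg(stream, msg: dict) -> None:
    """Write one JSON object terminated by a newline."""
    stream.write(json.dumps(msg).encode("utf-8") + b"\n")
    stream.flush()


def fake_ryll(conn: socket.socket) -> None:
    """Stand-in server: read one request, emit one event, then reply."""
    with conn, conn.makefile("rwb") as stream:
        req = json.loads(stream.readline())
        send_msg(stream, {"event": "DigestUpdated", "seq": 1})
        send_msg(stream, {"id": req["id"], "ok": True})


def request(stream, req_id: int, verb: str, **params):
    """Send a request; return (response, events seen while waiting)."""
    send_msg(stream, {"id": req_id, "verb": verb, "params": params})
    events = []
    while True:
        line = stream.readline()
        if not line:
            raise ConnectionError("control socket closed")
        msg = json.loads(line)
        if msg.get("id") == req_id:
            return msg, events
        events.append(msg)  # unsolicited observation, not our reply


def demo():
    client, server = socket.socketpair()
    worker = threading.Thread(target=fake_ryll, args=(server,))
    worker.start()
    with client, client.makefile("rwb") as stream:
        result = request(stream, 1, "send_key", key="enter")
    worker.join()
    return result


if __name__ == "__main__":
    print(demo())
```

Matching replies by id is what lets events interleave with responses on a single stream; a polling-only design would not need the id but would lose observations that arrive between polls.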

Execution

Phases land across three repos. The "Repo" column tells you where the code goes. Every phase plan lives in this directory regardless of repo, so the plan stays searchable from one place.

| Phase | Repo | Plan | Status |
|---|---|---|---|
| 1. Shared visual-digest crate | new (shakenfist-visual-digest) | PLAN-test-harness-phase-01-digest-crate.md | Implementation complete; Sextant PR pending operator |
| 2. Static source driver | kerbside | PLAN-test-harness-phase-02-static-hypervisor.md | Implementation complete |
| 3. Control socket on Ryll | ryll | PLAN-test-harness-phase-03-control-socket.md | Implementation complete; PR pending operator |
| 4. Port latency loadtest to control socket and remove legacy testclient/ryll/ | kerbside | PLAN-test-harness-phase-04-port-latency.md | Not started |
| 5. Direct-qemu CI workflow | kerbside | PLAN-test-harness-phase-05-direct-qemu-ci.md | Not started |
| 6. Digest decoding in Ryll | ryll | PLAN-test-harness-phase-06-digest-decoding.md | Not started |
| 7. First Sextant scenario tempest test | kerbside | PLAN-test-harness-phase-07-scenario-test.md | Not started |
| 8. OpenStack CI lane disposition | kerbside | PLAN-test-harness-phase-08-openstack-disposition.md | Not started |

Indicative effort and model recommendations (firmed up when each phase plan is written):

| Phase | Effort | Model | Notes |
|---|---|---|---|
| 1 | medium | sonnet | Self-contained crate extraction; spec already exists. New repo bootstrap is the main novelty. |
| 2 | medium | sonnet | Follows the existing BaseSource interface in kerbside/sources/base.py. Bounded scope, clear pattern. |
| 3 | high | opus | New protocol design that must serve both tests and future MCP. Touches Ryll's tokio event loop. Getting the verb set wrong is expensive to undo. |
| 4 | medium | sonnet | Rewrite the loadtest to drive the control socket, then delete the legacy testclient/ryll/ tree (the loadtest is its only consumer). Also touches loadtests/latency/Dockerfile and any README / AGENTS.md references. Validates phase 3's API; small but high-signal. |
| 5 | high | opus | New CI workflow integrating multiple binaries, debugging KVM/runner-environment unknowns, many edge cases. |
| 6 | high | opus | SurfaceMirror integration, QR detection on draw-event change, subtle correctness around when to re-decode. |
| 7 | medium | sonnet | Composes phase 6 primitives into a scenario. Once the spine is in place this is mostly glue. |
| 8 | low | sonnet | A CI workflow tweak plus a documentation update. |

Sequencing notes

  • Phases 1 and 2 are independent and can run in either order (or in parallel).
  • Phase 3 is the long-lead item; ideally we start it as soon as phase 1 has a working decoder we can prototype against in Ryll. Note that phase 3 itself does not depend on phase 1 — the control socket lands without any digest knowledge.
  • Phase 4 validates phase 3 in isolation; it does not need phase 6.
  • Phase 5 needs phases 2 (static hypervisor) and 3 (control socket) but not phases 1 / 6 / 7.
  • Phase 6 needs phase 1 (the decoder crate).
  • Phase 7 needs phases 1, 2, 3, 5, and 6 — the first phase that exercises the full stack.
  • Phase 8 is a follow-up to phase 7.

So the critical path is roughly: 2 → 3 → 5, with 1 → 6 in parallel, joining at 7. Phase 4 fits opportunistically after 3.
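The hard dependencies in the sequencing notes can be checked mechanically. The sketch below encodes them as a graph and groups phases into waves that could run in parallel, using `graphlib` from the Python standard library. Two readings are assumptions: that phase 4 hard-depends on phase 3 (it validates it), and that phase 8 hard-depends on phase 7 (it is described as a follow-up). The soft preference to start phase 3 once phase 1 has a working decoder is deliberately left out, because the notes say phase 3 does not depend on phase 1.

```python
from graphlib import TopologicalSorter

# phase -> set of phases that must land first (hard dependencies only)
DEPENDS_ON = {
    1: set(),
    2: set(),
    3: set(),
    4: {3},              # assumption: validating phase 3 requires it
    5: {2, 3},
    6: {1},
    7: {1, 2, 3, 5, 6},
    8: {7},              # assumption: "follow-up" means strictly after
}


def ready_waves(deps):
    """Group phases into waves; each wave only needs earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on a dependency cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


print(ready_waves(DEPENDS_ON))
# [[1, 2, 3], [4, 5, 6], [7], [8]]
```

Under these hard dependencies alone, phases 2 and 3 can start together; the "2 → 3" ordering in the critical path above reflects scheduling preference rather than a dependency stated in the notes.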

Agent guidance

This plan follows the conventions in PLAN-TEMPLATE.md at the repo root. The execution model, effort levels, model-choice guidance, brief-writing standards, and management-session review checklist all apply unchanged and are not duplicated here. Phase plans should fill in the per-step tables described there.

Specific notes for this plan:

  • Cross-repo work needs cross-repo back-briefing. Before starting a phase whose code lives in another repo, the management session should confirm the operator has the matching repo checked out and that the sub-agent understands the phase plan lives in shakenfist/kerbside/docs/plans/ even though the commits will land elsewhere.
  • Each phase that crosses repos should produce its own commit(s) in the right repo. Do not bundle ryll and kerbside changes into the same git operation. Commit messages in the implementing repo should reference the master plan path (e.g. shakenfist/kerbside/docs/plans/PLAN-test-harness.md) so the trail back is obvious.
  • The control socket protocol is the highest-impact design decision in this plan. Phase 3's plan should be reviewed end-to-end before any implementation starts, and the phase 4 port should be drafted concurrently to catch verb-set gaps early.

Back brief

Before executing any step of this plan, please back brief the operator as to your understanding of the plan and how the work you intend to do aligns with that plan.

Administration and logistics

Success criteria

We will know this plan has been successfully implemented when the following statements are true:

  • pre-commit run --all-files passes on every commit on this branch.
  • A new tempest test in tempest-plugin/kerbside_tempest_plugin/tests/api/ boots an Uncalibrated Sextant qcow2 under direct qemu/KVM, fronts it with Kerbside via the static hypervisor driver, drives the Awaiting → Booting → paste → Parked sequence via Ryll's control socket, asserts the QR digest reflects the inputs sent, and asserts the serial drain at shutdown matches the expected event sequence. All of this runs without OpenStack.
  • The legacy latency loadtest in loadtests/latency/ drives Ryll's control socket. Its latency numbers are unchanged within noise vs the pre-port baseline.
  • The legacy Python SPICE client at testclient/ryll/ has been deleted, along with any remaining references to it in the Dockerfile, README, AGENTS.md, and ARCHITECTURE.md.
  • Ryll's production binary does not carry the visual-digest crate unless built with the digest-decode feature.
  • The shared shakenfist-visual-digest crate is the sole home of the digest format. Uncalibrated Sextant consumes it for encoding; Ryll consumes it for decoding; the format spec lives there.
  • README.md, ARCHITECTURE.md, AGENTS.md, and the relevant pages under docs/ are updated to describe the static hypervisor driver, the direct-qemu test workflow, and the way Sextant + Ryll wire into the tempest plugin.

Future work

Items deliberately deferred:

  • Mouse / USB redirect / clipboard via vdagent / audio / WebDAV scenario tests. Need matching scenes or hooks in Sextant first; Sextant's pointer collector is explicitly marked deferred in its ARCHITECTURE.md.
  • Shaken Fist CI lane. Blocked on the SF installer rewrite (per session conversation 2026-06-02).
  • SPICE MCP server. The control socket is shaped so that an MCP server fronting it is a follow-up exercise, not a redesign. Out of scope here.
  • Type checking via mypy. Kerbside has no tox -emypy env; setting one up is a separate cleanup.
  • pre-commit install per clone. The hook config is in the repo but each developer/runner must run pre-commit install to wire it as a git hook. Worth a contributors' note when we touch CONTRIBUTING / the readme.

Bugs fixed during this work

This section should list any bugs we encounter during development that we fixed.

(None yet.)

Documentation index maintenance

docs/plans/index.md should gain a Master-plans entry for this plan when the first phase plan is written, and the row should track overall status. Phase plan files themselves should not be added to index.md.
