Automated SPICE test harness
Prompt
Before responding to questions or discussion points in this
document, explore the kerbside codebase thoroughly. Read
relevant source files, understand existing patterns (the
SPICE protocol implementation in kerbside/spiceprotocol/,
the proxy connection model in kerbside/proxy.py, the
source driver abstraction in kerbside/sources/, the REST
API in kerbside/api.py, the SQLAlchemy/alembic data model
in kerbside/db.py and alembic/, Pydantic-based config in
kerbside/config.py, audit logging, the .vv file generation
path, and the legacy Python SPICE client at
testclient/ryll/). Ground your answers in what the code
actually does today. Do not speculate about the codebase
when you could read it instead. Where a question touches on
external concepts (SPICE protocol, QXL, vdagent, libvirt
graphics, OpenStack Nova consoles, oVirt console API,
Shaken Fist), research as needed to give a confident
answer. Flag any uncertainty explicitly rather than
guessing.
This plan also crosses repository boundaries. Phases that land in other repos must consult those repos in the same grounded way:
- shakenfist/ryll: the Rust SPICE client. The control socket and digest decoding land here. Explore the channel handlers under shakenfist-spice-renderer/src/channels/, the headless session at shakenfist-spice-renderer/src/session.rs, and the input/output event enums in shakenfist-spice-renderer/src/channels/mod.rs.
- shakenfist/uncalibrated-sextant: the UEFI Rust test target guest. The visual digest format spec lives at docs/visual-digest-format.md, the encoder at src/digest.rs, and the serial drain at src/serial.rs. No changes are expected here for v1 of this plan, but the digest crate extraction must not change the on-wire payload.
- A new repo for the shared shakenfist-visual-digest crate, holding the encoder, the decoder, and a CLI decoder.
All planning documents for this plan — master and every
phase plan — live in this repo's docs/plans/,
regardless of which repo each phase's implementation
lands in. Kerbside is the main driver of this work, and
keeping every phase plan co-located keeps the design
conversation in one searchable place. Sub-agents
executing a cross-repo phase must therefore be briefed
that the plan they are following lives in
shakenfist/kerbside/docs/plans/ even when all of
their commits will land elsewhere.
Consult ARCHITECTURE.md for the overall proxy
architecture, channel model, and connection lifecycle.
Consult AGENTS.md for build commands, project
conventions, key file map, and code organisation. The
docs/ tree contains protocol documentation derived from
upstream SPICE sources.
When we get to detailed planning, I prefer a separate plan
file per detailed phase. These separate files should be
named for the master plan, in the same directory as the
master plan, and simply have -phase-NN-descriptive
appended before the .md file extension. Phase status is
tracked in the table under the Execution section below.
I prefer one commit per logical change, and at minimum one commit per phase. Do not batch unrelated changes into a single commit. Each commit should be self-contained: it should build, pass tests, and have a clear commit message explaining what changed and why.
Situation
We have three components that together could form an automated SPICE test harness, but no glue between them:
- Kerbside (this repo): the SPICE proxy. Existing tests are unit tests in kerbside/tests/ plus a tempest plugin (tempest-plugin/kerbside_tempest_plugin/tests/api/test_spice_via_kerbside.py) that asserts the SPICE link handshake against a Nova instance but does not authenticate or drive guest I/O.
- Ryll (shakenfist/ryll): the Rust SPICE client. Its --headless mode (shakenfist-spice-renderer/src/session.rs:405-609) runs a tokio::select! loop that drains channel events, optionally emits a single keypress every 2 seconds via --cadence as a latency probe, and optionally pastes text via --paste-text. The input event enum already covers keyboard, mouse, and paste; the channel event enum already exposes display surfaces, cursor, latency samples, and statistics. There is no scripting layer and no control interface.
- Uncalibrated Sextant (shakenfist/uncalibrated-sextant): the UEFI Rust test target guest. It has three scenes (Awaiting → Booting → Parked) driven by keypresses and a clipboard-paste decryption challenge. Two assertion oracles already exist: a versioned QR digest in the bottom-right of the framebuffer (spec at docs/visual-digest-format.md in that repo, encoder at src/digest.rs) that updates per painted line and on cursor blink, and a structured plain-text event log drained over serial at shutdown (src/serial.rs:78).
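The serial drain is the post-mortem oracle. Its exact line format is defined by Sextant's src/serial.rs and is not reproduced here, so the following Python sketch assumes, hypothetically, one event per non-blank line with made-up event names. It shows the kind of ordered-subsequence check a harness could use: expected events must appear in order, while unrelated events (such as cursor blinks) may be interleaved between them.

```python
from typing import Iterable, List, Optional


def parse_drain(text: str) -> List[str]:
    """Split a serial drain into event lines.

    Assumption: one event per non-blank line. The real line format is
    whatever Sextant's src/serial.rs emits; adjust parsing to match.
    """
    return [line.strip() for line in text.splitlines() if line.strip()]


def first_missing(expected: Iterable[str], actual: List[str]) -> Optional[str]:
    """Return the first expected event not found in order, or None.

    Expected events must appear in `actual` in the same relative order,
    but unrelated events may be interleaved between them.
    """
    position = 0
    for want in expected:
        # Skip unrelated events until the wanted one turns up.
        while position < len(actual) and actual[position] != want:
            position += 1
        if position == len(actual):
            return want
        # Consume the match, so a repeated expectation needs a repeated event.
        position += 1
    return None
```

Returning the first missing event, rather than a bare boolean, gives a failing test a useful message about where the sequence diverged.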
The current Kerbside CI deploys a full Kolla-OpenStack
all-in-one before running the tempest plugin. OpenStack
is not part of our long-term ecosystem; it was the
hypervisor stack with the most readily available
tooling. The CI workflow lives in
.github/workflows/functional-tests.yml. Hypervisor
backends live at kerbside/sources/{base.py,ovirt.py,shakenfist.py}
— there is no static / test backend today.
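A static driver only needs to describe a fixed set of SPICE endpoints, plus a way to start the guests behind them. The Python sketch below is hypothetical throughout: the StaticConsole fields, the JSON inventory shape, and the helper names are illustrations and do not reproduce the BaseSource method names in kerbside/sources/base.py. The qemu options used (-enable-kvm, pflash firmware, -vga qxl, -spice, -serial file:, -display none) are standard qemu flags; the memory size and file paths are placeholders.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class StaticConsole:
    """One fixed SPICE endpoint a static driver might advertise.

    Field names are hypothetical; a real driver must map these onto
    whatever BaseSource in kerbside/sources/base.py expects.
    """
    name: str
    host: str
    port: int


def load_inventory(raw: str) -> Dict[str, StaticConsole]:
    """Parse a hypothetical JSON inventory: a list of console objects."""
    consoles: Dict[str, StaticConsole] = {}
    for entry in json.loads(raw):
        console = StaticConsole(entry["name"], entry["host"], int(entry["port"]))
        if console.name in consoles:
            raise ValueError(f"duplicate console name: {console.name}")
        consoles[console.name] = console
    return consoles


def qemu_argv(console: StaticConsole, disk: str, ovmf_code: str,
              serial_log: str) -> List[str]:
    """Build a qemu command line exposing a SPICE server for `console`.

    Uses standard qemu options; memory size and paths are placeholders.
    """
    return [
        "qemu-system-x86_64",
        "-enable-kvm",
        "-m", "512",
        # UEFI firmware code image, mapped read-only.
        "-drive", f"if=pflash,format=raw,readonly=on,file={ovmf_code}",
        "-drive", f"file={disk},format=qcow2",
        "-vga", "qxl",
        # Plain SPICE listener for the proxy to connect to; no ticket.
        "-spice", f"addr={console.host},port={console.port},disable-ticketing=on",
        # Capture the guest's serial output for the post-mortem drain.
        "-serial", f"file:{serial_log}",
        "-display", "none",
    ]
```

Deriving both the inventory entry and the qemu command line from the same StaticConsole record is one way to stop the port Kerbside proxies to from drifting away from the port qemu actually listens on.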
A latency loadtest exists at loadtests/latency/. It
runs against uefi-latency-guest (a separate image
fetched from images.shakenfist.com) using the legacy
Python SPICE client at testclient/ryll/. The tempest
plugin does not currently reference
uefi-latency-guest. The user's original framing
mentioned a tempest test using it; on inspection, only
the loadtest does. This may indicate planned work that
never landed, or a misremembered detail — to be
clarified during phase 4 planning.
The Python client at testclient/ryll/ is the
predecessor to shakenfist/ryll. The Rust rewrite was
started from lessons learned building the Python
version, and is now a strict feature superset. The
loadtest is the only remaining consumer of the Python
client; removing the testclient/ryll/ tree is part
of this plan (phase 4). Note the namespace collision:
"Ryll" in this document means the Rust client
(shakenfist/ryll) unless the prefix testclient/ or
the qualifier "legacy Python" is used.
The digest format on Uncalibrated Sextant is a versioned binary protocol (10-byte header, eight per-channel chained CRC32C hashes, up to 44 bytes of raw recent events, 4-byte framebuffer integrity hash). The encoder exists in one repo; no decoder exists anywhere.
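The stated sizes bound the payload: 10 + 8 × 4 + 4 = 46 bytes with no events, and 46 + 44 = 90 bytes at most. The Python sketch below turns that arithmetic into a length check and a splitter. It assumes, without confirmation from docs/visual-digest-format.md, that the regions appear in the order listed, that hashes are little-endian, and that the events region is sized by elimination; the chain_update rule is likewise a hypothetical reading of "chained". The crc32c function itself is a standard bitwise CRC-32C (Castagnoli) implementation.

```python
import struct
from dataclasses import dataclass
from typing import List

HEADER_LEN = 10
CHANNELS = 8
HASH_LEN = 4  # CRC32C is a 32-bit checksum
MAX_EVENTS_LEN = 44
FB_HASH_LEN = 4
MIN_LEN = HEADER_LEN + CHANNELS * HASH_LEN + FB_HASH_LEN  # 46
MAX_LEN = MIN_LEN + MAX_EVENTS_LEN  # 90


def crc32c(data: bytes, crc: int = 0) -> int:
    """Bitwise CRC-32C (reflected polynomial 0x82F63B78).

    Passing a previous result as `crc` continues the computation, so
    crc32c(b, crc32c(a)) == crc32c(a + b).
    """
    crc ^= 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF


def chain_update(previous: int, event: bytes) -> int:
    """Hypothetical chaining rule: continue the channel's CRC with the event."""
    return crc32c(event, previous)


@dataclass
class Digest:
    header: bytes
    channel_hashes: List[int]
    events: bytes
    framebuffer_hash: int


def split_payload(payload: bytes) -> Digest:
    """Split a decoded QR payload into its four regions (assumed order)."""
    if not MIN_LEN <= len(payload) <= MAX_LEN:
        raise ValueError(f"payload length {len(payload)} outside {MIN_LEN}..{MAX_LEN}")
    hashes_end = HEADER_LEN + CHANNELS * HASH_LEN
    channel_hashes = list(struct.unpack_from(f"<{CHANNELS}I", payload, HEADER_LEN))
    (fb_hash,) = struct.unpack_from("<I", payload, len(payload) - FB_HASH_LEN)
    events = payload[hashes_end:len(payload) - FB_HASH_LEN]
    return Digest(payload[:HEADER_LEN], channel_hashes, events, fb_hash)
```

Because a CRC can be continued from a previous value, chaining this way is equivalent to one CRC over a channel's concatenated events; on its own it does not commit to where one event ends and the next begins.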
Mission and problem statement
Build an end-to-end automated SPICE test harness that exercises Kerbside as a proxy in front of a real qemu/KVM guest (Uncalibrated Sextant), driven by Ryll over a control socket, with assertions against both the live QR digest and the post-mortem serial drain.
The harness must:
- Not require OpenStack or any other heavyweight control plane to run. Direct qemu + Kerbside + Ryll, glued by a small static hypervisor driver, is the v1 target.
- Re-cover the latency loadtest's scope on the new substrate early, so the control socket design is validated against an existing real test before we build new ones on top.
- Allow the tempest plugin to grow Sextant-driven scenarios (paste round-trip, scene transitions, digest assertions) without changing how it integrates with whatever cloud is underneath. The same plugin should be able to run against direct qemu, against the existing OpenStack lane, and (eventually) against a Shaken Fist deployment.
- Leave the door open for the control socket to be reused for non-test purposes: interactive debugging, one-off bug repros, and a possible SPICE MCP server later. This shapes the verb set toward coarse, intent-bearing primitives rather than raw protocol pokes.
Out of scope for v1:
- Mouse, USB redirection, clipboard via vdagent, audio, and WebDAV. These can be added once the spine is in place and Sextant grows the matching scenes.
- Replacing the existing OpenStack CI lane. We add a parallel direct-qemu lane first; whether to demote or retire the OpenStack lane is a follow-up decision.
- A Shaken Fist CI lane. That waits on the SF installer rewrite.
Open questions
These are deferred to the relevant phase plans, not the master plan, but listed here so they are not forgotten:
- Shared crate naming and home. Working name is shakenfist-visual-digest. Open: own repo vs living in one of the existing repos. Lean toward own repo since neither side naturally owns it. Decide at the start of phase 1.
- Ryll digest dependency model. Cargo optional feature (digest-decode, off by default) so production builds carry zero digest code, vs harness-side decode via the crate's CLI tool with Ryll only exposing raw screenshots. Lean feature flag; decide at the start of phase 6.
- Control socket transport and framing. Unix socket + line-delimited JSON is the working assumption. Open: whether to also stream DigestUpdated events asynchronously, or require polling. Recommend bidirectional: requests/responses for actions, unsolicited event stream for observations (digest, latency, frames). Decide during phase 3.
- Phase 4 scope: does a tempest uefi-latency-guest test exist? Grep says no, only the loadtest does. Either the user has planned-but-not-landed work, or the framing was approximate. Resolve before drafting phase 4.
- Sextant image distribution. Working assumption is to commit the qcow2 to this repo. Open: size budget, LFS vs plain git, and whether to also expose it as a CI artifact for cross-repo consumers.
- Direct-qemu CI runner shape. GitHub Actions runners need /dev/kvm. We have self-hosted runners with vm, static, debian-12 labels (per .github/actionlint.yaml). Confirm which label set gives us nested KVM during phase 5.
- OpenStack CI lane fate. Once direct-qemu CI is proven, do we retire it, demote it to nightly, or keep it on PRs? Defer until after phase 7.
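A minimal Python sketch of the working assumption for the control socket: line-delimited JSON over a stream socket, with responses matched to requests by id and unsolicited events buffered rather than dropped. Every field name (id, verb, args, event) and the key verb are hypothetical placeholders until phase 3 fixes the real schema; in use, sock would be a socket.AF_UNIX connection to Ryll's control socket path.

```python
import json
import socket
from typing import Any, Dict, List, Tuple


def encode(message: Dict[str, Any]) -> bytes:
    """Serialise one message as a single newline-terminated JSON line."""
    return (json.dumps(message, separators=(",", ":")) + "\n").encode("utf-8")


class LineReader:
    """Accumulate bytes from a stream socket and return whole JSON lines."""

    def __init__(self, sock: socket.socket) -> None:
        self.sock = sock
        self.buffer = b""

    def read_message(self) -> Dict[str, Any]:
        # Keep reading until at least one complete line is buffered.
        while b"\n" not in self.buffer:
            chunk = self.sock.recv(4096)
            if not chunk:
                raise ConnectionError("control socket closed mid-message")
            self.buffer += chunk
        line, self.buffer = self.buffer.split(b"\n", 1)
        return json.loads(line)


def call(sock: socket.socket, reader: LineReader, request_id: int,
         verb: str, **args: Any) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Send one request and wait for the response carrying the same id.

    Unsolicited messages that arrive first are collected and returned,
    so observations (digest, latency, frames) are never silently lost.
    """
    sock.sendall(encode({"id": request_id, "verb": verb, "args": args}))
    events: List[Dict[str, Any]] = []
    while True:
        message = reader.read_message()
        if message.get("id") == request_id:
            return message, events
        events.append(message)
```

Collecting events inside call() is one way to honour the bidirectional recommendation: a test can issue actions and still inspect every DigestUpdated event that arrived while it was waiting for the response.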
Execution
Phases land across three repos. The "Repo" column tells you where the code goes. Every phase plan lives in this directory regardless of repo, so the plan stays searchable from one place.
| Phase | Repo | Plan | Status |
|---|---|---|---|
| 1. Shared visual-digest crate | new (shakenfist-visual-digest) | PLAN-test-harness-phase-01-digest-crate.md | Implementation complete; Sextant PR pending operator |
| 2. Static source driver | kerbside | PLAN-test-harness-phase-02-static-hypervisor.md | Implementation complete |
| 3. Control socket on Ryll | ryll | PLAN-test-harness-phase-03-control-socket.md | Implementation complete; PR pending operator |
| 4. Port latency loadtest to control socket and remove legacy testclient/ryll/ | kerbside | PLAN-test-harness-phase-04-port-latency.md | Not started |
| 5. Direct-qemu CI workflow | kerbside | PLAN-test-harness-phase-05-direct-qemu-ci.md | Not started |
| 6. Digest decoding in Ryll | ryll | PLAN-test-harness-phase-06-digest-decoding.md | Not started |
| 7. First Sextant scenario tempest test | kerbside | PLAN-test-harness-phase-07-scenario-test.md | Not started |
| 8. OpenStack CI lane disposition | kerbside | PLAN-test-harness-phase-08-openstack-disposition.md | Not started |
Indicative effort and model recommendations (firmed up when each phase plan is written):
| Phase | Effort | Model | Notes |
|---|---|---|---|
| 1 | medium | sonnet | Self-contained crate extraction; spec already exists. New repo bootstrap is the main novelty. |
| 2 | medium | sonnet | Follows the existing BaseSource interface in kerbside/sources/base.py. Bounded scope, clear pattern. |
| 3 | high | opus | New protocol design that must serve both tests and future MCP. Touches Ryll's tokio event loop. Getting the verb set wrong is expensive to undo. |
| 4 | medium | sonnet | Rewrite the loadtest to drive the control socket, then delete the legacy testclient/ryll/ tree (the loadtest is its only consumer). Also touches loadtests/latency/Dockerfile and any README / AGENTS.md references. Validates phase 3's API; small but high-signal. |
| 5 | high | opus | New CI workflow integrating multiple binaries, debugging KVM/runner-environment unknowns, many edge cases. |
| 6 | high | opus | SurfaceMirror integration, QR detection on draw-event change, subtle correctness around when to re-decode. |
| 7 | medium | sonnet | Composes phase 6 primitives into a scenario. Once the spine is in place this is mostly glue. |
| 8 | low | sonnet | A CI workflow tweak plus a documentation update. |
Sequencing notes
- Phases 1 and 2 are independent and can run in either order (or in parallel).
- Phase 3 is the long-lead item; ideally we start it as soon as phase 1 has a working decoder we can prototype against in Ryll. Note that phase 3 itself does not depend on phase 1 — the control socket lands without any digest knowledge.
- Phase 4 validates phase 3 in isolation; it does not need phase 6.
- Phase 5 needs phases 2 (static hypervisor) and 3 (control socket) but not phases 1 / 6 / 7.
- Phase 6 needs phase 1 (the decoder crate).
- Phase 7 needs phases 1, 2, 3, 5, and 6 — the first phase that exercises the full stack.
- Phase 8 is a follow-up to phase 7.
So the critical path is roughly: 2 → 3 → 5, with 1 → 6 in parallel, joining at 7. Phase 4 fits opportunistically after 3.
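The sequencing notes above can be transcribed into a dependency graph and checked mechanically. This Python sketch uses the standard library's graphlib.TopologicalSorter to compute the earliest wave in which each phase could start, treating every phase as one unit of work; the effort column would be needed to turn waves into an actual schedule.

```python
from graphlib import TopologicalSorter
from typing import Dict, Set

# Prerequisites per phase, transcribed from the sequencing notes.
DEPENDS_ON: Dict[int, Set[int]] = {
    1: set(),
    2: set(),
    3: set(),
    4: {3},
    5: {2, 3},
    6: {1},
    7: {1, 2, 3, 5, 6},
    8: {7},
}


def waves(graph: Dict[int, Set[int]]) -> Dict[int, int]:
    """Earliest wave in which each phase can start (wave 0 = no prerequisites).

    Phases in the same wave can run in parallel. TopologicalSorter raises
    CycleError if the graph ever describes a circular dependency.
    """
    wave: Dict[int, int] = {}
    # static_order() yields every prerequisite before its dependents.
    for phase in TopologicalSorter(graph).static_order():
        wave[phase] = max((wave[dep] + 1 for dep in graph[phase]), default=0)
    return wave
```

With these dependencies, phases 1, 2 and 3 share wave 0, phases 4, 5 and 6 share wave 1, phase 7 is wave 2 and phase 8 is wave 3. Phases 2 and 3 sit in the same wave because the notes give neither as a prerequisite of the other.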
Agent guidance
This plan follows the conventions in PLAN-TEMPLATE.md
at the repo root. The execution model, effort levels,
model-choice guidance, brief-writing standards, and
management-session review checklist all apply unchanged
and are not duplicated here. Phase plans should fill in
the per-step tables described there.
Specific notes for this plan:
- Cross-repo work needs cross-repo back-briefing. Before starting a phase whose code lives in another repo, the management session should confirm the operator has the matching repo checked out and that the sub-agent understands the phase plan lives in shakenfist/kerbside/docs/plans/ even though the commits will land elsewhere.
- Each phase that crosses repos should produce its own commit(s) in the right repo. Do not bundle ryll and kerbside changes into the same git operation. Commit messages in the implementing repo should reference the master plan path (e.g. shakenfist/kerbside/docs/plans/PLAN-test-harness.md) so the trail back is obvious.
- The control socket protocol is the highest-impact design decision in this plan. Phase 3's plan should be reviewed end-to-end before any implementation starts, and the phase 4 port should be drafted concurrently to catch verb-set gaps early.
Back brief
Before executing any step of this plan, please back brief the operator as to your understanding of the plan and how the work you intend to do aligns with that plan.
Administration and logistics
Success criteria
We will know when this plan has been successfully implemented because the following statements will be true:
- pre-commit run --all-files passes on every commit on this branch.
- A new tempest test in tempest-plugin/kerbside_tempest_plugin/tests/api/ boots an Uncalibrated Sextant qcow2 under direct qemu/KVM, fronts it with Kerbside via the static hypervisor driver, drives the Awaiting → Booting → paste → Parked sequence via Ryll's control socket, asserts that the QR digest reflects the inputs sent, and asserts that the serial drain at shutdown matches the expected event sequence. All of this runs without OpenStack.
- The existing latency loadtest in loadtests/latency/ drives Ryll's control socket. Its latency numbers are unchanged within noise vs the pre-port baseline.
- The legacy Python SPICE client at testclient/ryll/ has been deleted, along with any remaining references to it in the Dockerfile, README, AGENTS.md, and ARCHITECTURE.md.
- Ryll's production binary does not carry the visual-digest crate unless built with the digest-decode feature.
- The shared shakenfist-visual-digest crate is the sole home of the digest format. Uncalibrated Sextant consumes it for encoding; Ryll consumes it for decoding; the format spec lives there.
- README.md, ARCHITECTURE.md, AGENTS.md, and the relevant pages under docs/ are updated to describe the static hypervisor driver, the direct-qemu test workflow, and the way Sextant + Ryll wire into the tempest plugin.
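The loadtest criterion needs a concrete definition of "within noise" before phase 4 can pass or fail it. One option, sketched here in Python, compares medians with a relative tolerance and an absolute floor; both tolerance values are placeholders and should be replaced by figures derived from the run-to-run variance of the pre-port baseline.

```python
import statistics
from typing import Sequence


def within_noise(baseline_ms: Sequence[float], candidate_ms: Sequence[float],
                 rel_tolerance: float = 0.10, abs_floor_ms: float = 1.0) -> bool:
    """Return True if the candidate median is close to the baseline median.

    The allowed difference is the larger of a fraction of the baseline
    median and a fixed floor, so very small latencies are not held to an
    unrealistically tight bound. Both defaults are placeholders.
    """
    base = statistics.median(baseline_ms)
    cand = statistics.median(candidate_ms)
    allowed = max(abs(base) * rel_tolerance, abs_floor_ms)
    return abs(cand - base) <= allowed
```

Medians keep a single slow sample from failing the check; if tail latency matters, the same comparison can be repeated on a high percentile.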
Future work
Items deliberately deferred:
- Mouse / USB redirect / clipboard via vdagent / audio / WebDAV scenario tests. These need matching scenes or hooks in Sextant first; Sextant's pointer collector is explicitly marked deferred in its ARCHITECTURE.md.
- Shaken Fist CI lane. Blocked on the SF installer rewrite (per session conversation 2026-06-02).
- SPICE MCP server. The control socket is shaped so that an MCP server fronting it is a follow-up exercise, not a redesign. Out of scope here.
- Type checking via mypy. Kerbside has no tox -e mypy env; setting one up is a separate cleanup.
- pre-commit install per clone. The hook config is in the repo, but each developer/runner must run pre-commit install to wire it as a git hook. Worth a contributors' note when we touch CONTRIBUTING / the README.
Bugs fixed during this work
This section should list any bugs we encounter during development that we fixed.
(None yet.)
Documentation index maintenance
docs/plans/index.md should gain a Master-plans entry
for this plan when the first phase plan is written, and
the row should track overall status. Phase plan files
themselves should not be added to index.md.