Phase 5: Direct-qemu CI workflow
Part of PLAN-test-harness.md. This phase
lives entirely in kerbside. The plan file lives here in
docs/plans/ per the master plan's single-home rule.
Goal
Stand up a new GitHub Actions lane that boots an Uncalibrated Sextant qcow2 under direct qemu/KVM on a self-hosted runner, fronts it with kerbside (configured against phase 2's static source driver), and connects ryll (phase 3's headless control socket) through kerbside as a smoke check that the spine works end-to-end. No OpenStack, no oVirt, no Shaken Fist.
Phase 5 builds the lane. Phase 7 builds the rich Sextant scenario tempest test that runs in the lane. Phase 5's smoke check is deliberately minimal — its job is to prove kerbside + ryll + qemu + sextant wire up correctly, not to assert anything about Sextant's UI behaviour.
This phase is scope-bounded to:
- A new workflow file (working name `.github/workflows/direct-qemu-functional.yml`).
- A small set of helper scripts under `tools/direct-qemu/` that the workflow calls (no large inline shell in YAML, per the global CLAUDE.md convention).
- A committed Sextant qcow2 image under `tests/fixtures/` so the workflow does not have to build Sextant.
- A documented procedure for rebuilding that qcow2 when Sextant changes (a short shell wrapper, not a new CI job).
- A single smoke-test Python script that drives ryll's control socket through kerbside and asserts the lane works.
Out of scope for phase 5:
- A scenario tempest test (paste round-trip, scene transitions, digest assertions). That's phase 7.
- QR digest decoding in ryll. That's phase 6.
- Restoring keypress-to-screen latency semantics. Phase 6 too.
- A `headless` Cargo feature for ryll. Phase 6 (already tracked in the master plan's phase 6 row). Phase 5's runner environment carries the same heavy GUI/audio runtime stack that phase 4's loadtest image does, by intent, not oversight.
- A second runner shape: phase 5 picks one runner label set that has `/dev/kvm` and ships against it. If a future phase wants the lane to run on additional shapes, that's separate work.
- Replacing or demoting the existing OpenStack lane. That's phase 8.
- Wiring the latency loadtest into GitHub Actions. Tracked in phase 4's Future work; not bundled here.
- Multi-arch (aarch64 / ppc64le) Sextant builds.
Decisions baked into this plan
These are judgment calls made while drafting, surfaced explicitly so they can be challenged before code lands.
- Runner label set: try `[self-hosted, vm]` first. The existing functional-tests workflow runs main matrix jobs on `[self-hosted, vm]` and a sanity job on `[self-hosted, vm, debian-12]`; `.github/actionlint.yaml` registers `vm`, `static`, and `debian-12`. Phase 5's first workflow step probes `/dev/kvm` and dumps `lsmod | grep kvm` so an absent KVM device fails loudly and obviously, not silently via a qemu hang. If `[self-hosted, vm]` does not expose `/dev/kvm` to the workflow, the implementing agent for step 5c iterates by pushing label-set variants and watching CI; this iteration is expected and is the reason phase 5 is rated high effort.
- Run on the runner host, not in a container. Self-hosted runners with the `vm` label are themselves VMs and provide the isolation boundary; nesting docker for KVM passthrough adds complexity (privileged containers, `--device /dev/kvm`, cgroup juggling) without buying isolation we don't already have. The workflow `apt-get install`s qemu, ovmf, openssl, and the runtime libs ryll needs, builds ryll fresh from `main` HEAD (mirroring phase 4's loadtest Dockerfile pattern but without the docker wrapper), and runs everything as processes on the runner. Each run starts clean because the runner is ephemeral.
- Sextant qcow2 ships in this repo at `tests/fixtures/uncalibrated-sextant.qcow2`. Current build is 576 KB (per the Sextant Makefile's `qemu-img convert` step), small enough that plain git is fine and Git LFS is unnecessary. Storing it here makes the workflow deterministic, removes a cross-repo fetch step, and keeps the failure modes of "Sextant changed underneath us" visible in PR diffs. A `tools/direct-qemu/rebuild-sextant-qcow2.sh` wrapper documents how to refresh it (clones Sextant, runs `make release`, copies the output). The wrapper is not invoked by CI; it's a developer convenience.
- TLS material is generated fresh inside each workflow run. openssl one-liners in `tools/direct-qemu/generate-tls.sh` produce a self-signed CA and a kerbside proxy cert+key into `/tmp/kerbside-tls/`. No certs in the repo. Reasons: (a) avoids committing key material, (b) avoids cert expiry surprises, (c) the smoke client trusts the generated CA via env var.
- Kerbside runs as a background process under the workflow, not as a systemd unit. Direct binary invocation (`kerbside &` with PID capture and a `kill` in cleanup) is the simplest viable shape for an ephemeral CI run. systemd unit work belongs to a future "package kerbside for hosts" phase, not here.
- The smoke check is a stand-alone Python script, not a tempest test. Phase 7 owns the tempest scenario test; phase 5 deliberately does not touch `tempest-plugin/kerbside_tempest_plugin/`. The smoke script lives at `tools/direct-qemu/smoke-client.py`, opens the ryll control socket, runs `hello`, `status`, `screenshot`, and asserts that `agent_connected == true`, that at least one surface is populated, and that the screenshot is a non-empty PNG. It exits non-zero on any RPC error or missing surface. The script is also used as a local debugging tool by developers, not just CI.
- One sources.yaml shape, generated inline by `lane-up.sh`. No template file under `etc/`. The workflow knows the UUID, hypervisor IP, port, and ticket because it generates them. Inline heredoc keeps the wiring obvious and removes a layer of indirection.
- Smoke check covers control-socket through kerbside, not control-socket direct to ryll. The whole point of the phase is to prove kerbside's role in the connection path. Ryll is launched against the `.vv` file that kerbside issues (`--file`; `--url` pointing at the kerbside `.vv` endpoint would be equivalent); ryll's TCP SPICE connection therefore goes through kerbside, and the control socket exposes whether that connection succeeded (`agent_connected`, surfaces populated). If we pointed ryll at qemu's SPICE port directly we'd bypass the thing being tested.
- `.vv` retrieval uses kerbside's REST API directly. Avoids pulling in keystoneauth1 / openstacksdk just to ask kerbside for a console URL. The lane issues a plain `urllib.request` GET against kerbside's API for the static console's `.vv` file, writes it to disk, and passes the file to ryll via `--file`. (`--url` would also work if kerbside's API is publicly addressable; the file path is simpler.)
- Artifact retention: 90 days, matching functional-tests. No reason to diverge.
- No matrix. A single job, single runner, single sextant qcow2. Matrix variants (multiple qemu machine types, multiple SPICE configurations, TLS vs plaintext) are deferred; we pick one combination that we know works.
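The KVM probe described in the first decision could look roughly like the sketch below. It is a minimal Python illustration, not the workflow's actual step: the function name `probe_kvm` and the exact diagnostic wording are assumptions, and the plan itself only requires that a missing `/dev/kvm` fails loudly.

```python
import os
import stat


def probe_kvm(path: str = "/dev/kvm") -> tuple[bool, str]:
    """Return (usable, diagnostic) for the KVM device node at `path`.

    Hypothetical helper: the plan only says the probe must fail loudly.
    """
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return False, f"{path} is missing: this runner exposes no KVM device"
    except PermissionError:
        return False, f"{path} cannot be inspected: permission denied"
    if not stat.S_ISCHR(st.st_mode):
        return False, f"{path} exists but is not a character device"
    if not os.access(path, os.R_OK | os.W_OK):
        return False, f"{path} exists but is not readable and writable by this user"
    return True, f"{path} is usable"
```

A caller would print the diagnostic and exit non-zero when the first element is `False`, so the failure shows up in the first workflow step rather than as a qemu hang later.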
Situation
Existing CI
- `.github/workflows/functional-tests.yml` is the OpenStack lane. Runner labels `[self-hosted, vm]` (main matrix) and `[self-hosted, vm, debian-12]` (sanity). Wallclock ~90–100 minutes; depends on a kolla-ansible all-in-one deploy and invokes `./tools/run-tempest-tests`. The tempest plugin's one test (`tempest-plugin/kerbside_tempest_plugin/tests/api/test_spice_via_kerbside.py`) asserts the SPICE link handshake against a Nova-provisioned instance.
- `.github/actionlint.yaml` registers exactly three self-hosted labels: `vm`, `static`, `debian-12`. No `kvm` label exists; the master plan's open question about which set gives nested KVM is resolved by phase 5 empirically.
- `tox.ini` has `py3`, `flake8`, `cover`, `bindep`. No tempest env. Tempest is invoked by `tools/run-tempest-tests`, not via tox.
Phase 2 output (already on this branch)
- `kerbside/sources/static.py` reads a YAML file at `SOURCES_PATH` (default `./sources.yaml`). Each console entry needs `uuid`, `name`, `hypervisor`, `hypervisor_ip`, `insecure_port`, `ticket`; `secure_port` and `host_subject` are optional.
- `etc/example-static-sources.yaml` is the canonical example and includes the QEMU command-line equivalent we'll run in CI.
- Kerbside's CLI entry point is `kerbside` (`pyproject.toml` lines 114–116). Config comes from `/etc/kerbside/kerbside.ini` plus `KERBSIDE_*` env vars (overriding the INI). Phase 5 sets env vars rather than writing an INI file.
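The required and optional fields listed above lend themselves to a quick pre-flight check before kerbside is started. The sketch below is illustrative only: `check_console_entry` is a hypothetical helper, it does not reproduce the logic in `kerbside/sources/static.py`, and rejecting unknown fields is an assumption (the real driver may be more lenient).

```python
# Field names are taken from the plan text; everything else is hypothetical.
REQUIRED_FIELDS = ("uuid", "name", "hypervisor", "hypervisor_ip", "insecure_port", "ticket")
OPTIONAL_FIELDS = ("secure_port", "host_subject")


def check_console_entry(entry: dict) -> list[str]:
    """Return a list of problems with one console entry; empty means it looks OK."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in entry
    ]
    known = set(REQUIRED_FIELDS) | set(OPTIONAL_FIELDS)
    # Assumption: flag unknown keys so that typos surface early.
    problems += [f"unknown field: {name}" for name in sorted(entry) if name not in known]
    return problems
```

Running this against the values the lane generates (UUID, hypervisor IP, port, ticket) before writing sources.yaml would turn a misspelled key into an immediate, readable failure instead of a kerbside startup error.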
Phase 3 output (on the ryll test-harness-control-socket branch)
- `--headless`, `--url <URL>`, `--file <PATH>`, `--control-socket <PATH>` are the relevant flags (`ryll/src/config.rs:23..45`).
- Protocol at `ryll/docs/control-socket-protocol.md`: NDJSON framed over a Unix-domain socket, mode 0600. Verbs: `hello`, `status`, `screenshot`, `subscribe`, `send_key`, `paste`, `unsubscribe`. Events: `latency`, `agent_connected`, `paste_completed`, `paste_failed`, `dropped`.
- Single-client; second connection is rejected with `busy`.
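NDJSON framing means one JSON object per newline-terminated line. A minimal sketch of the two framing helpers is shown below; it is not the code in `loadtests/latency/orchestrator.py` (whose `_send` and `_recv_line` helpers the smoke client is meant to lift), and the helper names and the `verb` key used in the usage note are assumptions, since the message schema lives in `ryll/docs/control-socket-protocol.md`.

```python
import json
import socket


def send_message(sock: socket.socket, message: dict) -> None:
    """Serialise one message as a single compact JSON line and send it."""
    payload = json.dumps(message, separators=(",", ":")).encode("utf-8") + b"\n"
    sock.sendall(payload)


def recv_message(reader) -> dict:
    """Read one newline-terminated JSON message from a binary file-like reader."""
    line = reader.readline()
    if not line:
        # An empty read means the peer closed the connection.
        raise ConnectionError("peer closed the socket before sending a full line")
    return json.loads(line)
```

In use, the client would connect an `AF_UNIX` socket to the control-socket path, wrap it with `sock.makefile("rb")` for reading, and exchange messages such as a hypothetical `{"verb": "hello", "protocol_version": "1.0"}`. Buffered line reads avoid having to reassemble messages split across `recv` calls.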
Phase 4 output (already on this branch)
- Multi-stage Dockerfile at `loadtests/latency/Dockerfile` carries the cargo-build pattern that phase 5 reuses (without the docker wrapper).
- `loadtests/latency/orchestrator.py` is a 295-line control-socket client. Phase 5's smoke client is a smaller, different-shape script (no cadence, no CSV, just hello + status + screenshot + assert), so reuse is partial at best. Concretely: the framing helpers (`_send`, `_recv_line`) and the hello handshake are worth lifting; the cadence thread and CSV writer are not. Phase 5 writes its own script and lifts only those helpers.
Uncalibrated Sextant
- Build: `scripts/build.sh` runs `cargo build --release` in a docker container; `scripts/mkesp.sh` packages the BOOTX64.EFI into a 33 MB raw image; the Makefile converts to qcow2.
- Current qcow2 size: 576 KB. UEFI firmware (OVMF) is required by the guest; the runner must `apt-get install ovmf` and qemu must be pointed at the OVMF code+vars files.
- Serial: `src/serial.rs` writes via UEFI Serial Protocol. The qemu launcher must add `-serial file:<path>` to capture it.
- The repo's `make release-verify` boots both the raw image and the qcow2 headless, polls `dist/serial.log` for the banner `"Hello from Uncalibrated Sextant\r"` within 30s. Phase 5 reuses that banner string as a "guest is alive" oracle.
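The "guest is alive" oracle amounts to polling the serial log until the banner appears or a deadline passes. The sketch below shows one way to do that in Python; `wait_for_banner` is a hypothetical helper, the 30-second default mirrors the `release-verify` budget quoted above, and the match deliberately omits the trailing `\r` so it stays in line with a plain `grep` for the banner text.

```python
import time
from pathlib import Path

# Banner text from the plan; matched as bytes without the trailing "\r".
BANNER = b"Hello from Uncalibrated Sextant"


def wait_for_banner(log_path: str, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll `log_path` until the banner appears; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if BANNER in Path(log_path).read_bytes():
                return True
        except FileNotFoundError:
            pass  # qemu may not have created the serial log yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Re-reading the whole file on each poll is acceptable here because the serial log stays small; a longer-running guest would want to read incrementally.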
Mission and problem statement
After phase 5:
- `.github/workflows/direct-qemu-functional.yml` runs on PR and on push to the test-harness branch. The workflow:
  - Probes the runner for `/dev/kvm`, qemu, ovmf; fails fast with a readable diagnostic if any are missing.
  - Installs runtime deps for ryll (libgl1, libx11-6, libxcb1, libxkbcommon0, libwayland-client0, libasound2, libssl3) and build deps for the cargo step.
  - Clones ryll from `main` HEAD and builds the release binary (`cargo build --release --no-default-features -p ryll`, same as phase 4).
  - Installs kerbside in a venv on the runner.
  - Generates self-signed TLS material via `tools/direct-qemu/generate-tls.sh`.
  - Boots the sextant qcow2 from `tests/fixtures/` in qemu with KVM acceleration, OVMF firmware, SPICE on a known port with a known ticket, and serial captured to a file.
  - Writes a sources.yaml inline and starts kerbside with `KERBSIDE_SOURCES_PATH` and TLS env vars set.
  - Polls kerbside's healthcheck endpoint until ready, fetches the static console's `.vv` file via the kerbside API, and runs `tools/direct-qemu/smoke-client.py` against ryll launched headless against that `.vv` file.
  - Asserts smoke-client exits 0 and the qemu serial log contains the Sextant boot banner.
  - On any failure or success, uploads the kerbside log, qemu serial log, ryll stderr, and the smoke-client output as a workflow artifact (90-day retention).
  - Tears down qemu and kerbside processes.
- `tests/fixtures/uncalibrated-sextant.qcow2` is committed (~576 KB, plain git).
- `tools/direct-qemu/rebuild-sextant-qcow2.sh` documents the refresh procedure; it is not called by CI.
- `tools/direct-qemu/{generate-tls.sh, start-qemu.sh, start-kerbside.sh, lane-up.sh, lane-down.sh, smoke-client.py}` are the workflow's building blocks. `lane-up.sh`, `smoke-client.py`, and `lane-down.sh` are the scripts the workflow invokes directly; the others are called by `lane-up.sh`.
- `pre-commit run --all-files` is clean on every commit.
Open questions
These do not block writing this plan but must be resolved before or during the implementation:
- Which runner label set has `/dev/kvm`? The plan defaults to `[self-hosted, vm]`; step 5c iterates by pushing variants if the first attempt fails. Resolution: empirical, via CI. No code change required if `[self-hosted, vm]` works.
- Is OVMF (`ovmf` package on Debian) available on the runners, or does it need a custom build? The Debian 12 package is current enough for Sextant in the local build process; assume it works in CI. Verify in step 5c's first push.
- Does the runner's qemu support `q35,accel=kvm` without extra flags? `qemu-system-x86_64 -machine q35,accel=kvm` is the same invocation Sextant's local build uses; should work on a runner with KVM enabled. If not, `accel=tcg` fallback is acceptable for the smoke check (slow but functional).
- Does kerbside have a healthcheck endpoint suitable for startup polling? If yes, use it. If no, fall back to "wait for the configured port to accept connections". This is a small read-the-code task for the step 5b sub-agent, not a blocker for the plan.
- Whether to add a `tox -edirect-qemu` env that runs the smoke client locally. Not in scope for this phase, but worth considering as a follow-up convenience. Noted in Future work.
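If kerbside turns out to have no healthcheck endpoint, the "wait for the configured port to accept connections" fallback could be sketched as below. This is an illustration under assumptions: `wait_for_port` is a hypothetical helper, the 30-second default matches the give-up time in the step 5b brief, and a successful TCP connect only shows that something is listening, not that kerbside is fully initialised.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connect to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # The connection is closed immediately; only reachability matters.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

The same loop shape works for a real healthcheck: replace the connect with an HTTP GET and treat any non-success response as "not ready yet".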
Execution
Each step is one logical change → one commit on the
test-harness branch (the existing kerbside phase branch).
Per the master plan's single-branch discipline, do not open
new branches.
| Step | Repo | Effort | Model | Isolation | Brief for sub-agent |
|---|---|---|---|---|---|
| 5a. Commit Sextant qcow2 + rebuild script | kerbside | low | sonnet | none | Build (or fetch) a fresh release qcow2 from /srv/kasm_profiles/mikal/vscode/src/shakenfist/uncalibrated-sextant. Run make release in that repo (it uses docker; the runner has docker). Copy the resulting dist/uncalibrated-sextant.qcow2 to tests/fixtures/uncalibrated-sextant.qcow2 in kerbside and git add it (plain git, no LFS). Add tools/direct-qemu/rebuild-sextant-qcow2.sh: a short bash script that takes an optional --sextant-repo PATH arg (default /srv/kasm_profiles/mikal/vscode/src/shakenfist/uncalibrated-sextant), cds in, runs make release, copies the output to the kerbside fixtures path, prints the SHA256 of the new image, and reminds the operator to commit the result. Use set -euo pipefail. Update tests/fixtures/README.md (create if missing) with a one-paragraph "Sextant qcow2: how it was built, when it was last refreshed, how to refresh" note. Verify the image with qemu-img info and include the size + format + virtual size in the commit body. One commit. |
| 5b. Add lane glue scripts | kerbside | medium | sonnet | none | Create tools/direct-qemu/ and add: (1) generate-tls.sh: uses openssl to mint a self-signed CA + a kerbside proxy cert/key with CN matching kerbside-ci, writes everything to a tempdir passed in argv. Idempotent. (2) start-qemu.sh: takes qcow2 path, OVMF code+vars paths, SPICE port, ticket, serial log path; backgrounds qemu with the canonical Sextant launch args from phase 5 plan situation section; writes PID to a file. (3) start-kerbside.sh: takes sources.yaml path, TLS dir, log path; writes sources.yaml inline (heredoc with UUID/port/ticket arguments substituted in), exports KERBSIDE_* env vars, backgrounds kerbside, writes PID. Polls the configured port (curl or nc) until kerbside accepts connections; gives up after 30s. (4) smoke-client.py: stdlib-only Python 3.10+, connects to ryll's control socket, performs hello with protocol_version "1.0", runs status, asserts agent_connected is true and surfaces is non-empty within a 30-second budget (polling status every second); runs screenshot for surface 0 and asserts the returned PNG is non-empty. Logs each step to stderr. Exits 0 on success, 1 on assertion failure, 2 on any RPC/socket/parse error. Lift the framing helpers (_send, _recv_line) from loadtests/latency/orchestrator.py; don't reimplement them. (5) lane-up.sh: top-level workflow entry point, calls 1+2+3 in order, fetches the .vv file from kerbside's API (read the kerbside API source to find the right URL shape; the test_spice_via_kerbside tempest plugin does the same thing, so model the request on it), launches ryll --headless --file <vv> --control-socket /tmp/ryll-ci.sock & and writes its PID. (6) lane-down.sh: kills ryll/kerbside/qemu PIDs from their pidfiles (best-effort, never errors), removes the TLS tempdir, prints "lane down". Use set -euo pipefail everywhere except lane-down.sh where best-effort is the goal. bash -n and shellcheck clean. python3 -m py_compile on the Python script. One commit. |
| 5c. Add direct-qemu CI workflow | kerbside | medium | sonnet | none | Create .github/workflows/direct-qemu-functional.yml. Triggers: push to test-harness branch, pull_request against develop. Runs-on: [self-hosted, vm] (the open question about runner labels is resolved empirically; expect to iterate by pushing label-set variants if this combination fails to expose /dev/kvm). Single job, no matrix. Job timeout: 60 minutes. Steps in order: (1) actions/checkout@v4. (2) Diagnostic probe: ls -la /dev/kvm and lsmod \| grep kvm \|\| true and qemu-system-x86_64 --version and ls -l /usr/share/OVMF/ (where Debian's ovmf package installs the firmware; adjust if the runner differs). Fail fast on missing /dev/kvm. (3) apt-get update && apt-get install -y qemu-system-x86 ovmf openssl curl jq libgl1 libx11-6 libxcb1 libxkbcommon0 libwayland-client0 libasound2 libssl3 build-essential cmake pkg-config libasound2-dev libxcb1-dev libxkbcommon-dev libwayland-dev libegl1-mesa-dev libgl1-mesa-dev libxcb-render0-dev libxcb-shape0-dev libxcb-xfixes0-dev libx11-dev libssl-dev. (4) Install Rust via rustup (the runner may already have it; check first with which cargo). (5) Clone ryll from https://github.com/shakenfist/ryll.git to a tempdir at main HEAD; cargo build --release --no-default-features -p ryll; copy target/release/ryll to /usr/local/bin/ryll. (6) Install kerbside: python3 -m venv /tmp/kerbside-venv && /tmp/kerbside-venv/bin/pip install -e . from the checked-out repo. Expose the venv on PATH. (7) tools/direct-qemu/lane-up.sh: single call, exit code is the lane health. (8) tools/direct-qemu/smoke-client.py /tmp/ryll-ci.sock: exits non-zero on lane failure. (9) Assert the Sextant boot banner appears in the serial log: grep "Hello from Uncalibrated Sextant" /tmp/sextant-serial.log. (10) Always-run step that uploads /tmp/kerbside.log, /tmp/sextant-serial.log, /tmp/ryll-ci.stderr, /tmp/smoke-client.log, and any core dumps as direct-qemu-artifacts (actions/upload-artifact@v4, retention-days: 90, if-no-files-found: warn). (11) Always-run cleanup step that calls tools/direct-qemu/lane-down.sh. Use actionlint locally before committing. Pre-commit clean. Do not run the workflow as part of this step; the operator will push and watch CI, and if the first run fails on runner-label or KVM probe, expect a follow-up commit on this same branch to adjust. One commit. |
Sequencing notes
- 5a is a leaf change. Land it first so steps 5b and 5c have a real qcow2 to point at.
- 5b depends on 5a (the lane scripts reference the qcow2 path). Lands second.
- 5c depends on 5b (the workflow calls lane-up.sh / lane-down.sh / smoke-client.py). Lands third.
- After 5c lands, the operator pushes the branch and watches CI. The first push is expected to surface runner-label or KVM-probe issues; follow-up commits on the same branch adjust the workflow until CI is green. This is not a separate phase — it's part of finishing 5c.
Branch and PR shape: All phase 5 commits land on the
existing test-harness branch of kerbside. The operator opens
the kerbside PR (which already carries phases 1–4's kerbside
work) once phase 5 is done. Whether to split the kerbside PR
into per-phase PRs is a master-plan-level decision, not a
phase 5 decision.
Agent guidance
This phase plan follows the conventions in PLAN-TEMPLATE.md
at the kerbside repo root. The execution model, effort levels,
model-choice guidance, brief-writing standards, and management-
session review checklist all apply unchanged and are not
duplicated here.
Notes specific to phase 5:
- Don't try to use docker in the workflow. The phase 4 loadtest image uses docker because it's a portable loadtest harness shipped to users; the phase 5 lane runs in CI on an ephemeral runner and gains nothing from containerisation. Building ryll directly on the runner is the simpler path.
- Don't reimplement what phase 2/3/4 already gave us. The static source driver is already wired up; the control socket protocol is documented and tested; the smoke client lifts framing helpers from the loadtest orchestrator. If a sub-agent feels like it needs a new protocol verb or a new sources.yaml field, push back — that's protocol drift that wants its own plan.
- CI iteration is part of finishing 5c. The phase plan cannot pre-resolve which runner label set has `/dev/kvm`; the implementing agent must accept that pushing the branch and watching CI is the resolution mechanism. Each iteration is a small follow-up commit on the same branch, not a new phase.
- Heavy ryll runtime is acceptable. Phase 6's `headless` Cargo feature is the cure; phase 5 lives with the bloat. Don't try to slim the apt-get list by skipping libs the GUI stack needs at runtime, because ryll will fail to start.
- The smoke check should fail loudly. If kerbside doesn't come up, fail. If the .vv fetch returns 4xx/5xx, fail. If ryll's hello rejects, fail. If `agent_connected` never becomes true within the budget, fail. Silent success on a half-broken lane is the worst outcome; a noisy red workflow is better.
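The fail-loudly rule and the exit codes from the step 5b brief (0 on success, 1 on assertion failure, 2 on RPC/socket/parse error) can be sketched as a budgeted polling loop. This is an illustration only: `wait_for_agent` and `looks_like_png` are hypothetical names, `get_status` stands in for whatever function performs the `status` RPC, and the reply is assumed to be a JSON object with `agent_connected` and `surfaces` keys, which the protocol document may shape differently.

```python
import sys
import time

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the fixed 8-byte PNG file signature

# Exit codes as described in the step 5b brief.
EXIT_OK, EXIT_ASSERTION, EXIT_RPC = 0, 1, 2


def looks_like_png(data: bytes) -> bool:
    """True if `data` starts with the PNG signature and has content after it."""
    return len(data) > len(PNG_SIGNATURE) and data.startswith(PNG_SIGNATURE)


def wait_for_agent(get_status, budget: float = 30.0, interval: float = 1.0) -> int:
    """Poll status until the agent is connected and a surface exists."""
    deadline = time.monotonic() + budget
    while True:
        try:
            status = get_status()  # assumed to return the decoded status reply
        except (OSError, ValueError) as exc:
            # Socket failures are OSError; JSON decode failures are ValueError.
            print(f"status RPC failed: {exc}", file=sys.stderr)
            return EXIT_RPC
        if status.get("agent_connected") is True and status.get("surfaces"):
            return EXIT_OK
        if time.monotonic() >= deadline:
            print("agent never connected within the budget", file=sys.stderr)
            return EXIT_ASSERTION
        time.sleep(interval)
```

Keeping the RPC behind a callable makes the loop testable without a running ryll, and returning distinct codes lets the workflow log distinguish a broken lane from a broken client.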
Back brief
Before executing any step of this plan, please back brief the operator as to your understanding of the step and how the work you intend to do aligns with that step's brief.
Administration and logistics
Success criteria
Phase 5 is done when:
- `tests/fixtures/uncalibrated-sextant.qcow2` exists in the repo and `qemu-img info` reports a valid qcow2.
- `tools/direct-qemu/` contains the seven scripts described above; each is executable; `bash -n` and `shellcheck` clean; `python3 -m py_compile` clean on the Python script.
- `.github/workflows/direct-qemu-functional.yml` exists, `actionlint` reports no errors against the local registered-label list, and a CI run against the test-harness branch reaches the smoke-client step.
- A CI run completes the smoke-client step with exit 0 and the workflow finishes green. Wallclock budget: ≤ 30 min (ryll build dominates, ~8–15 min; everything else is sub-minute).
- Artifacts are uploaded and downloadable from the workflow run.
- The master plan's phase-5 row is marked "Implementation complete; PR pending operator".
- `pre-commit run --all-files` from kerbside root is clean.
Future work
Items deliberately deferred from phase 5:
- Scenario tempest test. Phase 7. Phase 5 leaves the tempest plugin tree alone so phase 7 has a clean seam.
- QR digest assertion in the smoke check. Phase 6 adds digest decoding to ryll behind the `digest-decode` Cargo feature; once that lands, phase 5's smoke client can be extended (or phase 7's tempest test can ride on top) to verify the digest reflects sent input.
- Slim CI image via ryll's `headless` Cargo feature. Phase 6. The phase 5 workflow's apt-get list will shrink noticeably once that lands.
- `tox -edirect-qemu` for local runs. A local convenience env that runs `lane-up.sh` and the smoke client without requiring a full GitHub Actions environment. Small; worth doing as a developer-experience follow-up.
- A second runner shape. If the lane proves useful and we want it on `[self-hosted, static]` or `[self-hosted, vm, debian-12]` too, add a matrix later.
- OpenStack lane disposition. Phase 8 decides whether to retire, demote-to-nightly, or keep the OpenStack lane. Phase 5 explicitly does not change the OpenStack lane.
Bugs fixed during this work
(None yet.)