Phase 05: Handle SPICE_MSG_DISPLAY_STREAM_DESTROY_ALL¶
Prompt¶
Before answering questions or making design decisions in this
document, read the relevant ryll source. Key files:
shakenfist-spice-renderer/src/channels/display.rs:135-143
(the StreamState struct — plain data, no OS resources),
shakenfist-spice-renderer/src/channels/display.rs:1002-1152
(the display-channel message dispatch; STREAM_CREATE / DATA /
CLIP / DESTROY / DATA_SIZED / ACTIVATE_REPORT are handled,
DESTROY_ALL is not),
shakenfist-spice-renderer/src/channels/display.rs:517
(streams: HashMap<u32, StreamState> — the only state the
handler needs to touch), and
shakenfist-spice-protocol/src/constants.rs:179-211 (the
display_server opcode module; STREAM_DESTROY_ALL is
missing). Cross-reference the SPICE protocol at
/srv/src-reference/spice/spice-protocol/spice/enums.h:497
(SPICE_MSG_DISPLAY_STREAM_DESTROY_ALL = 126),
/srv/src-reference/spice/spice-common/spice.proto:787
(empty payload), and the one-line equivalent in
spice-gtk/src/channel-display.c:1878-1881 which calls
clear_streams(channel) — itself a for-loop over c->streams
that destroys each and zeroes c->nstreams. ryll's
equivalent is self.streams.clear().
This phase fixes bug K5 — "Unhandled
SPICE_MSG_DISPLAY_STREAM_DESTROY_ALL (msg type 126)" —
identified from the pedantic zip at 08:36Z during dogfooding
session 001. The master plan flags this as small ("Phase 05
— own plan (small); empty payload, one-line equivalent in
spice-gtk").
Situation¶
What we already established¶
The opcode and payload. SPICE display message 126 is
SPICE_MSG_DISPLAY_STREAM_DESTROY_ALL with an empty body
(the .proto declares it as Empty stream_destroy_all). The
server emits it whenever it wants to tear down every active
stream at once — typically immediately before a resolution
change or surface reconfiguration, where each individual
STREAM_DESTROY (msg 125) would be redundant book-keeping.
ryll currently treats it as an unknown opcode. The
display-channel dispatch at
shakenfist-spice-renderer/src/channels/display.rs:1148
catches anything not explicitly matched and calls
logging::log_unknown_once("display", msg_type, payload).
That registers the gap key display:hexdump:126 in the
warn_once registry the very first time msg 126 arrives.
session 001's pedantic zip recorded exactly that key.
Stream state lives in a single field. Each
DisplayChannel owns
streams: HashMap<u32, StreamState> (display.rs:517).
StreamState is plain data — surface id, codec, dest
rectangle, optional cached DHT segments (display.rs:135-143).
No OS handles, no decoder state, nothing to clean up beyond
the Drop impl that runs implicitly when the HashMap is
cleared. spice-gtk has clear_streams(channel) which does
the same thing plus an explicit decoder teardown; the ryll
equivalent really is self.streams.clear().
The graceful-degrade backstop already works. The
STREAM_DATA / STREAM_DATA_SIZED handler at
display.rs:1072 looks up the stream via
self.streams.get_mut(&stream_id) and silently no-ops on
miss. So a STREAM_DATA for a stream that DESTROY_ALL would
have removed is harmless in the current code — provided the
server doesn't reuse the stream_id for a new stream with
different parameters before re-issuing STREAM_CREATE.
What we don't yet know¶
Whether ryll has ever experienced the stream-id-reuse hazard
in the wild. Servers normally send STREAM_CREATE for the
new generation immediately after STREAM_DESTROY_ALL, which
overwrites the stale entry in our HashMap and recovers
correct behaviour. So the worst observable today is:
- A small per-resolution-change memory leak (one
StreamStateper orphaned stream — bounded by the server's stream count, typically 1-2). - The
display:hexdump:126gap key persisting in--pedanticreports until the fix lands, distracting reviewers from real gaps.
Neither rises above "low impact" in the master plan's K5 classification.
Mission and problem statement¶
Make ryll handle SPICE_MSG_DISPLAY_STREAM_DESTROY_ALL
explicitly, matching spice-gtk's clear_streams semantic:
drop every entry from the in-channel streams HashMap and
log an info! line so a maintainer reading the trace can
see the event. The display:hexdump:126 gap key must stop
appearing.
The phase succeeds when:
- A guest-driven resolution change (the most common trigger
for msg 126) does not add
display:hexdump:126to the warn_once registry. - The stream HashMap is empty immediately after the DESTROY_ALL is processed, so any subsequent STREAM_CREATE starts from a clean slate.
- No regression in the existing
STREAM_DESTROYhandler (which still removes a single stream by id). - No new dependencies or refactors — the change is a single match arm plus a constant declaration.
Approach¶
The fix¶
Add the opcode to shakenfist-spice-protocol/src/constants.rs's
display_server module:
Slot it between STREAM_DESTROY (125) and the next contiguous
opcode (MIGRATE if defined, otherwise the gap before draw
opcodes at 302). This keeps the numeric order obvious to
anyone scanning the enum.
Add a new arm to the display-channel dispatch at
display.rs:1136 (right after STREAM_DESTROY):
display_server::STREAM_DESTROY_ALL => {
// Empty payload — server signals "tear down every
// active stream", typically before a resolution change
// or surface reconfiguration. Equivalent to spice-gtk's
// clear_streams(channel) at channel-display.c:1855.
info!(
"display: stream_destroy_all (clearing {} streams)",
self.streams.len()
);
self.streams.clear();
}
That's the entire functional change. Two locations, both mechanical.
Add it to the protocol's message-name table¶
shakenfist-spice-protocol/src/logging.rs:383 already maps
STREAM_DESTROY → "stream_destroy" for the human-readable
trace logging. Add the parallel entry for STREAM_DESTROY_ALL
→ "stream_destroy_all" so trace output stays consistent.
Tests¶
The display channel's message dispatch is &mut self over a
fully-constructed DisplayChannel, which needs mpsc channels,
Arcs, snapshots, capture, byte counter, and notification
plumbing. Building a test harness for this single match arm
would be disproportionate.
The right level of testing for K5:
- Protocol-constant test in
shakenfist-spice-protocol/src/constants.rs::tests(or a freshtestsmod if none exists in that file) — a one-line assertion thatdisplay_server::STREAM_DESTROY_ALL == 126. Pins the value against future typos / accidental edits. Cheap. - Manual integration check: trigger a resolution change
on the guest (e.g.
xrandr --output ... --mode 1024x768then back to native), confirm--pedanticdoes not logdisplay:hexdump:126after the fix, and a debug-level trace shows the newstream_destroy_all (clearing N streams)line.
No unit test for the self.streams.clear() side effect is
worth its construction cost. The HashMap's clear is
trivially correct, and the integration check confirms the
end-to-end path.
Why not also clear surfaces?¶
A natural question: should DESTROY_ALL also drop any
self.surfaces state that the streams referenced? spice-gtk's
clear_streams only touches streams — surfaces are managed
by a separate SURFACE_DESTROY message family. Mirror that
narrow scope; expanding to surfaces would be speculative.
Why not also reset codec / cache state?¶
The cached DHT segments inside each StreamState are
already dropped by the HashMap::clear call (they're owned
Option<Vec<u8>> fields). The shared GLZ dictionary and the
image cache are per-channel state independent of streams,
and the SPICE server is responsible for sending appropriate
invalidation messages for those (INVAL_ALL_PIXMAPS and
similar) — folding their reset into the DESTROY_ALL handler
would invent semantics the protocol doesn't guarantee.
Tasks¶
- Add
pub const STREAM_DESTROY_ALL: u16 = 126;toshakenfist-spice-protocol/src/constants.rsin thedisplay_servermodule, immediately afterSTREAM_DESTROY. - Add
display_server::STREAM_DESTROY_ALL => "stream_destroy_all"to the message-name lookup inshakenfist-spice-protocol/src/logging.rs. - Add a new match arm in
shakenfist-spice-renderer/src/channels/display.rsafter the existingSTREAM_DESTROYarm. Body:info!log with current stream count, thenself.streams.clear(). Comment cites the spice-gtk equivalent for traceability. - Add a sanity unit test in
shakenfist-spice-protocol(constants::tests::display_stream_destroy_all_opcode_pinned) assertingdisplay_server::STREAM_DESTROY_ALL == 126. - Update
PLAN-session-001-feedback.mdK5 row in the "Confirmed bugs" table with the resolution pointer. - Update
PLAN-session-001-feedback.mdExecution table row for Phase 05 → Done. - Manual integration check (deferred operator action,
bundles with the running Phase 02–04 manual
checklist): connect to a guest that supports resolution
changes, trigger one (
xrandror display-settings UI), then dump the warn_once registry (the gap-popup in the bottom status bar) and confirmdisplay:hexdump:126is absent. With--verbose, also confirm the newdisplay: stream_destroy_all (clearing N streams)info line appears in the trace.
Out of scope¶
- Wider audit of unhandled display opcodes. The
--pedanticmode already does this systematically by registering everydisplay:hexdump:*gap key; K5 is the one entry from session 001 that warranted its own phase. Future-session gaps surface naturally without preemptive audit. - Clearing surfaces or codec caches in the same handler. Different protocol messages own those concerns; folding them in invents semantics. See "Why not…" rationale above.
- Refactoring
STREAM_DESTROYandSTREAM_DESTROY_ALLto share an implementation. They operate on different shapes (one entry by id vs. clear all); merging would obscure rather than simplify. - Headless / web mode tests. The display channel is the same code on every frontend; the protocol-constant guard covers the only ryll-specific invariant. No mode-specific surface.
- Telemetry on stream lifecycle. Counting destroys / destroy-alls in the snapshot would be informative for diagnosing churn but is independent of K5's "handle the opcode" fix. Revisit only if a future bug points at stream churn as a root cause.
Open questions¶
-
Should the
info!log becomedebug!once the fix has been live for one dogfooding session?info!is right for now — it's the most visible signal that the fix is reaching the handler, and the message is rare (one per resolution change). If a future session shows it as noise, demote in a follow-up. -
Should we proactively also handle other documented display opcodes that have empty payloads or trivial semantics? No — drive-by additions break the "session 001 → phase plan → fix" provenance chain that makes the master plan auditable. Each unhandled opcode that turns up in a future pedantic zip gets its own narrow fix.
Companion docs¶
None new. ARCHITECTURE.md already describes the display
channel and its stream / surface separation; this change
operates entirely within the established model. AGENTS.md
§20 ("--pedantic mode: registry observer pattern") already
covers how the warn_once registry surfaces gaps like
display:hexdump:126 — the fix's effect is to make that gap
disappear from the registry, which §20's machinery handles
automatically once the opcode is matched explicitly.