Channel diagnostics audit
Why this doc exists
Every active SPICE channel should surface enough state in a bug report that an operator can characterise its current behaviour without re-running the session with extra logging. This document records the per-channel observability audit as a living checklist: when a new channel is added, a new section should be added here before the channel lands. It was spun out of phase 4 of the stream-caps-and-flap work, driven by sessions 002d/002e where audio went silent during video playback and the bug report contained no packet counts, no decoder errors, and no device-side stats to distinguish between competing hypotheses.
Today's audit matrix
The table below is a snapshot of the state at phase-4 plan-write time (2026-05-18). Update the Fields and Gap columns when a channel's snapshot is expanded.
| Channel | Fields | Channel-specific signals | Gap |
|---|---|---|---|
| display | 40 | rich (post-phase-3) | none |
| main | 15 | mm_time, session_id, keepalive_timeout_fired | OK |
| inputs | 14 | motion_count, recent_events | OK |
| cursor | 14 | cache_entries, cache_contents | OK |
| playback | 8 | none | all audio-side state |
| usbredir | 8 | none | all USB-side state |
| webdav | 8 | none | all HTTP-side state |
| record | 0 | channel skipped at link time | document the skip |
Minimum diagnostic baseline
Every channel snapshot should publish the eight transport common fields
(already universal across all channels) plus four new baseline additions.
The baseline additions are added inline in each snapshot struct rather
than through a shared struct marked #[serde(flatten)]. This is less DRY,
but it avoids the serde-flatten footgun and is consistent with today's
pattern.
Transport common (already universal)
| Field | Description |
|---|---|
| bytes_in | Total bytes received on this channel since session start |
| bytes_out | Total bytes sent on this channel since session start |
| last_recv_ts_secs | Wall-clock timestamp (seconds) of the most recent received message; None if nothing received yet |
| last_send_ts_secs | Wall-clock timestamp (seconds) of the most recent sent message; None if nothing sent yet |
| ping_recv_count | Number of PING messages received from the server |
| pong_send_count | Number of PONG replies sent to the server |
| last_ping_recv_ts_secs | Timestamp of the most recent PING received; None if no PING received yet |
| writer_dropped_count | Number of outbound writes dropped because the write queue was full |
New baseline additions (phases 4B–4E)
These four fields should be added to every channel snapshot that does not already have them. Phase 4B adds them to inputs, cursor, and main; phases 4C–4E add them to playback, usbredir, and webdav alongside channel-specific fields.
| Field | Description |
|---|---|
| messages_recv_by_opcode | Map of server-opcode → receive count since session start; gives a complete picture of what message types the server has sent on this channel |
| messages_send_by_opcode | Map of client-opcode → send count since session start |
| last_unknown_opcode | The most recent opcode the channel received but did not recognise; surfaces protocol-coverage gaps that warn_once would otherwise swallow silently |
| unknown_opcode_count | Total number of unrecognised opcodes received since session start |
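As a rough illustration of the baseline shape, the sketch below declares the eight transport common fields and the four baseline additions inline in a single struct. ExampleChannelSnapshot is a hypothetical name, and the concrete types (u64 counters, u64 second timestamps, u16 opcode keys) are assumptions rather than the types used in snapshots.rs. The serde derives are omitted so the block compiles with the standard library alone.

```rust
use std::collections::BTreeMap;

/// Hypothetical snapshot for a new channel. Field names follow the two
/// tables above; the field types are assumptions made for illustration.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct ExampleChannelSnapshot {
    // Transport common (already universal).
    pub bytes_in: u64,
    pub bytes_out: u64,
    pub last_recv_ts_secs: Option<u64>,
    pub last_send_ts_secs: Option<u64>,
    pub ping_recv_count: u64,
    pub pong_send_count: u64,
    pub last_ping_recv_ts_secs: Option<u64>,
    pub writer_dropped_count: u64,

    // Baseline additions (phases 4B-4E), declared inline rather than
    // through a shared flattened struct.
    pub messages_recv_by_opcode: BTreeMap<u16, u64>,
    pub messages_send_by_opcode: BTreeMap<u16, u64>,
    pub last_unknown_opcode: Option<u16>,
    pub unknown_opcode_count: u64,

    // Channel-specific fields would follow below this comment.
}
```

A BTreeMap is used here so that the opcode maps serialise in a stable key order, which makes two bug reports easier to diff; a HashMap would work equally well if ordering does not matter.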
The recent_action_ring field (a bounded ring of structured "interesting
event" rows) was discussed as a fifth baseline addition but is deferred
— display's recent_decodes is the only precedent and promoting it to
every channel adds complexity without a clear immediate payoff. Revisit
if a future channel needs a time-ordered event log.
Per-channel wish-lists
display
The display channel already meets the baseline and, post phase 3, carries
rich channel-specific instrumentation: per-stream packet/decode/frame
counters, the active video decoder backend (exposed as video_decoder_backend
per stream, e.g. "MJPEG (ImageIO)" or "H264 (openh264)"), recent decode-duration
rings, STREAM_REPORT fields, and a streams_recently_destroyed ring that feeds
the phase-7 flap-notification work. Phase 6 adds H.264 support and per-stream
decoder backend identification. No additions needed in phase 4.
main
Main already has session identity (session_id), multimedia clock state
(mm_time, mm_time_recv_ts_secs), and keepalive-timeout bookkeeping
(keepalive_timeout_fired). The client_keepalive_send_count and
last_client_keepalive_send_ts_secs fields described in earlier
revisions of this audit were removed in phase 11A — the main-channel
spurious-PONG keepalive they tracked was a K1 band-aid made redundant
by the 370d8ce5 fix. The inputs channel, described in the "inputs"
section below, retains the same field names to track its KEY_MODIFIERS
idle restatement, which is kept on the cross-channel-idleness hypothesis;
see PLAN-stream-caps-and-flap-phase-11-remove-pong-keepalive.md. The
four new baseline fields (messages_recv_by_opcode etc.) will be
added in phase 4B.
One nuance: the per-opcode map is less useful on main than on other
channels because the agent-related opcodes (VD_AGENT_*) are nested
under main's payload framing — a flat opcode map at the main-channel
level won't distinguish agent message subtypes. This is acceptable for
now; agent observability is a separate concern covered by the vdagent
probe in phase 8.
inputs
Inputs already has motion_count and a recent_events ring that
records the last N input events with timestamps. The four new baseline
fields will be added in phase 4B. No channel-specific gaps identified.
cursor
Cursor already has cache_entries (current number of cached cursors)
and cache_contents (a summary of what is cached). The four new
baseline fields will be added in phase 4B. No channel-specific gaps
identified.
playback
Playback today has only the eight transport common fields. An operator reading a playback snapshot cannot tell whether the server sent any audio data, whether the client decoded it, or whether the audio device actually consumed it. The target shape after phase 4C should include:
Session metadata:
- Whether a playback session is currently active and, if so, when it
started and what multimedia clock value the server assigned at start
- Cumulative count of PLAYBACK_START messages received since session
open (each start begins a new audio session)
- Cumulative count of PLAYBACK_STOP messages received
- The codec in use for the current session (Opus, raw PCM, or an
unrecognised codec number)
- The sample rate and channel count negotiated at the most recent start
- The last volume levels per channel and the last mute flag, as set
by the server
- The last latency hint the server sent (in milliseconds)
Data plumbing:
- Count of PLAYBACK_DATA packets received from the server
- Count of those packets successfully decoded
- Count of those packets that failed to decode
- Total compressed bytes received (sum of DATA payload lengths)
- Total PCM bytes produced by the decoder
- A recent ring of per-packet decode durations in microseconds
(capacity ~64; useful for spotting decode-latency spikes)
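The recent ring of decode durations in the list above could be kept in a small bounded queue. The following is a minimal sketch under assumptions: DecodeDurationRing and its methods are hypothetical names, not types from the renderer crate, and the capacity is passed in by the caller.

```rust
use std::collections::VecDeque;
use std::time::Duration;

/// Bounded ring of recent per-packet decode durations in microseconds.
/// Hypothetical helper; the real snapshot field may be shaped differently.
pub struct DecodeDurationRing {
    capacity: usize,
    samples_us: VecDeque<u64>,
}

impl DecodeDurationRing {
    pub fn new(capacity: usize) -> Self {
        DecodeDurationRing {
            capacity,
            samples_us: VecDeque::with_capacity(capacity),
        }
    }

    /// Record one decode duration, evicting the oldest entry when full.
    pub fn push(&mut self, elapsed: Duration) {
        if self.capacity == 0 {
            return;
        }
        if self.samples_us.len() == self.capacity {
            self.samples_us.pop_front();
        }
        // Clamp instead of wrapping if the value does not fit in u64.
        let us = elapsed.as_micros().min(u64::MAX as u128) as u64;
        self.samples_us.push_back(us);
    }

    /// Largest duration currently held; a quick check for a latency spike.
    pub fn max_us(&self) -> Option<u64> {
        self.samples_us.iter().copied().max()
    }

    /// Oldest-to-newest copy suitable for embedding in a snapshot.
    pub fn to_vec(&self) -> Vec<u64> {
        self.samples_us.iter().copied().collect()
    }
}
```

In use, the decode path would time each packet with std::time::Instant and push the elapsed duration; the snapshot builder would then call to_vec() on a ring created with a capacity of about 64.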
Device pipeline (fed from audio-thread atomics):
- Total cpal output-callback invocations since the most recent device
open (proves the audio device is pulling)
- Count of callbacks where the ring buffer had zero ready samples at
callback entry; these are true underruns where we handed the device
silence
- Count of times decoded samples were dropped because the ring buffer
was full (encoder running ahead of the device clock)
- Total samples consumed by the device since the most recent device open
These counters together let an operator answer three questions: did the server send audio, did we decode it, and did the device pull it? Those are the questions the 002d/002e audio-silence bug report could not answer.
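One way to picture how the device-side atomics and the data-plumbing counters combine is sketched below. DeviceCounters, Verdict, and triage are hypothetical names, and the decision order is an assumption about how an operator might read the numbers, not logic that exists in the client. Relaxed ordering is used because the values are independent statistics that do not guard other memory.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Counters written by the audio output callback and read when a
/// snapshot is taken. Hypothetical shape for illustration only.
#[derive(Default)]
pub struct DeviceCounters {
    pub callbacks: AtomicU64,
    pub underruns: AtomicU64,
    pub samples_consumed: AtomicU64,
}

impl DeviceCounters {
    /// `ready` is the number of samples in the ring buffer at callback
    /// entry; `wanted` is how many samples the device asked for.
    pub fn on_callback(&self, ready: usize, wanted: usize) {
        self.callbacks.fetch_add(1, Ordering::Relaxed);
        if ready == 0 {
            // Nothing to hand over: the device gets silence.
            self.underruns.fetch_add(1, Ordering::Relaxed);
        }
        let consumed = ready.min(wanted) as u64;
        self.samples_consumed.fetch_add(consumed, Ordering::Relaxed);
    }
}

#[derive(Debug, PartialEq)]
pub enum Verdict {
    ServerSentNothing,
    DecodeFailing,
    DeviceNotPulling,
    DeviceStarved,
    Flowing,
}

/// Walk the three questions in order: sent, decoded, pulled.
pub fn triage(packets_recv: u64, packets_decoded: u64, dev: &DeviceCounters) -> Verdict {
    let callbacks = dev.callbacks.load(Ordering::Relaxed);
    let underruns = dev.underruns.load(Ordering::Relaxed);
    if packets_recv == 0 {
        Verdict::ServerSentNothing
    } else if packets_decoded == 0 {
        Verdict::DecodeFailing
    } else if callbacks == 0 {
        Verdict::DeviceNotPulling
    } else if underruns == callbacks {
        Verdict::DeviceStarved
    } else {
        Verdict::Flowing
    }
}
```

The same reading can be done by eye from the JSON in a bug report; the function only makes the order of elimination explicit.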
usbredir
Usbredir today has only the eight transport common fields. A bug report for a USB redirect problem gives no information about which devices are currently forwarded or whether any data is flowing. The target shape after phase 4D should include:
Handshake:
- The server capability bitmap observed during the usbredir hello exchange
- The client capability bitmap we sent
Device tracking:
- The list of currently-redirected devices; for each device: USB vendor
ID, product ID, device class, time the redirect was established, bytes
transferred to the guest, and bytes received from the guest
- Cumulative count of device-connect events since channel open
- Cumulative count of device-disconnect events since channel open
- Timestamp of the most recent device connect or disconnect event
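The device-tracking items above might be held in a structure like the following sketch. UsbDeviceTracker and RedirectedDevice are hypothetical names, the u32 handle used as a map key is an assumption about how devices would be identified, and the vendor and product IDs in any usage are arbitrary example values.

```rust
use std::collections::BTreeMap;

/// Per-device row for the snapshot. Field types are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct RedirectedDevice {
    pub vendor_id: u16,
    pub product_id: u16,
    pub device_class: u8,
    pub redirected_at_secs: u64,
    pub bytes_to_guest: u64,
    pub bytes_from_guest: u64,
}

/// Tracks currently-redirected devices plus cumulative event counts.
#[derive(Debug, Default)]
pub struct UsbDeviceTracker {
    devices: BTreeMap<u32, RedirectedDevice>, // keyed by a hypothetical handle
    pub connect_count: u64,
    pub disconnect_count: u64,
    pub last_event_ts_secs: Option<u64>,
}

impl UsbDeviceTracker {
    pub fn on_connect(&mut self, handle: u32, vendor_id: u16, product_id: u16,
                      device_class: u8, now_secs: u64) {
        self.devices.insert(handle, RedirectedDevice {
            vendor_id,
            product_id,
            device_class,
            redirected_at_secs: now_secs,
            bytes_to_guest: 0,
            bytes_from_guest: 0,
        });
        self.connect_count += 1;
        self.last_event_ts_secs = Some(now_secs);
    }

    /// Returns false, and counts nothing, if the handle was not tracked.
    pub fn on_disconnect(&mut self, handle: u32, now_secs: u64) -> bool {
        if self.devices.remove(&handle).is_none() {
            return false;
        }
        self.disconnect_count += 1;
        self.last_event_ts_secs = Some(now_secs);
        true
    }

    /// Devices still redirected, ordered by handle.
    pub fn current(&self) -> Vec<&RedirectedDevice> {
        self.devices.values().collect()
    }
}
```

Keeping the cumulative counts separate from the live device list means a report still shows that a device came and went even when the list is empty at snapshot time.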
webdav
Webdav today has only the eight transport common fields. A bug report for a guest file-sharing problem gives no information about whether any HTTP requests have been processed. The target shape after phase 4E should include:
- Count of HTTP requests received over the spice-vmc transport
- Total bytes sent in HTTP response bodies
- Number of currently-open HTTP connection objects (active sessions)
- Timestamp of the most recent HTTP request received
- Timestamp of the most recent HTTP response sent
- Count of times the decompressed-size limit was exceeded (currently a warn-only path; surfacing the count lets operators know the limit is being hit)
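The list above maps onto a handful of counters. The sketch below is one possible shape under assumptions: WebdavCounters and its method names are hypothetical, and check_decompressed_size stands in for whatever code path currently emits the warning.

```rust
/// Hypothetical webdav counters; names and types are illustrative.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct WebdavCounters {
    pub http_requests_recv: u64,
    pub response_body_bytes_sent: u64,
    pub open_connections: u64,
    pub last_request_ts_secs: Option<u64>,
    pub last_response_ts_secs: Option<u64>,
    pub decompress_limit_exceeded_count: u64,
}

impl WebdavCounters {
    pub fn on_connection_opened(&mut self) {
        self.open_connections += 1;
    }

    pub fn on_connection_closed(&mut self) {
        // Saturate so a stray close cannot underflow the gauge.
        self.open_connections = self.open_connections.saturating_sub(1);
    }

    pub fn on_request(&mut self, now_secs: u64) {
        self.http_requests_recv += 1;
        self.last_request_ts_secs = Some(now_secs);
    }

    pub fn on_response(&mut self, body_len: usize, now_secs: u64) {
        self.response_body_bytes_sent += body_len as u64;
        self.last_response_ts_secs = Some(now_secs);
    }

    /// Returns true when `size` is within `limit`; otherwise counts the
    /// hit so the warn-only path becomes visible in the snapshot.
    pub fn check_decompressed_size(&mut self, size: usize, limit: usize) -> bool {
        if size > limit {
            self.decompress_limit_exceeded_count += 1;
            return false;
        }
        true
    }
}
```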
record
The record channel is skipped at link time in main_channel.rs (the
handler logs Skipping channel: record and does not establish the
channel). There is no user-visible surface for microphone capture today
and no RecordSnapshot struct. This is explicitly in scope for phase
4 to document rather than implement — the channel stays skipped with
a one-line comment explaining why. If microphone capture is added in
a future phase, a RecordSnapshot should be created at that point
following the baseline template in this document.
How to add a new channel
- Define a snapshot struct in shakenfist-spice-renderer/src/snapshots.rs. Start with the eight transport common fields (copy from any existing snapshot) plus the four baseline additions (messages_recv_by_opcode, messages_send_by_opcode, last_unknown_opcode, unknown_opcode_count). Add channel-specific fields below a separating comment.
- Wire opcode counting in the channel's message-dispatch match arm: increment messages_recv_by_opcode[opcode] on every receive and messages_send_by_opcode[opcode] on every send. Treat unrecognised opcodes as unknown_opcode_count += 1 / last_unknown_opcode = Some(opcode) rather than silently ignoring them.
- Add a ChannelSnapshots arm for the new snapshot type in shakenfist-spice-renderer/src/snapshots.rs.
- Extend snapshot_json_for in ryll/src/bugreport.rs so the new channel's snapshot is included in the appropriate bug-report channel type.
- Write a serialiser test mirroring the test_display_snapshot_serialises pattern: construct a snapshot with all fields populated, serialise to JSON, and assert that each field is present with the expected value. This catches serde renames and flatten footguns before they reach a real bug report.
- Update this document: add a new section under Per-channel wish-lists with a brief description of what the new channel surfaces and what gaps (if any) remain.
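The opcode-counting step in the checklist above can be sketched as follows. OpcodeStats, dispatch, record_send, and the two MSG_EXAMPLE_* constants are hypothetical and belong to an imaginary channel; only the four field names come from this document. The sketch assumes unknown opcodes are still counted in the per-opcode receive map, since the map is incremented on every receive.

```rust
use std::collections::BTreeMap;

// Hypothetical opcodes for an imaginary channel.
const MSG_EXAMPLE_DATA: u16 = 101;
const MSG_EXAMPLE_STOP: u16 = 102;

/// The four baseline additions, isolated from the rest of a snapshot.
#[derive(Debug, Default)]
pub struct OpcodeStats {
    pub messages_recv_by_opcode: BTreeMap<u16, u64>,
    pub messages_send_by_opcode: BTreeMap<u16, u64>,
    pub last_unknown_opcode: Option<u16>,
    pub unknown_opcode_count: u64,
}

impl OpcodeStats {
    /// Count a received message, then dispatch on its opcode.
    /// Returns true when the opcode was recognised.
    pub fn dispatch(&mut self, opcode: u16) -> bool {
        *self.messages_recv_by_opcode.entry(opcode).or_insert(0) += 1;
        match opcode {
            MSG_EXAMPLE_DATA | MSG_EXAMPLE_STOP => {
                // Real handling of the message body would go here.
                true
            }
            other => {
                // Record the gap instead of silently ignoring it.
                self.unknown_opcode_count += 1;
                self.last_unknown_opcode = Some(other);
                false
            }
        }
    }

    /// Count an outbound message just before it is queued for writing.
    pub fn record_send(&mut self, opcode: u16) {
        *self.messages_send_by_opcode.entry(opcode).or_insert(0) += 1;
    }
}
```

Incrementing the receive map before the match keeps the count correct even if a handler arm returns early on a malformed payload.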