Phase 1: Display-channel instrumentation¶
Prompt¶
Before responding to questions or discussion points in this
document, explore the ryll codebase thoroughly. Read the
referenced source files, understand existing patterns (the
channel snapshot system, the recent_decodes ring, the
channel-state.json bug-report artefact), and ground your
answers in what the code actually does today. Do not speculate
about the codebase when you could read it instead.
Goal¶
Add enough display-channel signal to a bug report's
channel-state.json that a maintainer can tell, without
re-running the session, whether a "video not keeping up"
complaint is caused by slow image decode, the read loop
falling behind the socket, or SPICE-level ACK backpressure.
At the end of this phase no threading changes have been made. The display channel still runs as a single async task; we have only added measurement of its hot path. The data this phase produces is what justifies (or refutes) phases 2–4.
Concretely, the bug report's channel-state.json for the
display channel gains three families of fields:
- Per-decode wall time on every entry in
recent_decodes, plus session-cumulative counters and a min/max/mean over the recent ring. - Socket-read fill stats: how often the 256 KB read chunk came back full (a cheap proxy for "the OS recv buffer had bytes waiting when we read") and the max chunk size observed.
- ACK-send stats: total ACKs sent and the recent interval distribution between ACK sends.
Scope is display only. Cursor / playback / main are out of scope for this phase; if their data is needed later we revisit.
Design¶
Per-decode wall time¶
DecodeResult in
shakenfist-spice-renderer/src/snapshots.rs:15-31 currently
records image type, dimensions, cache flag, success flag, and a
session-relative timestamp — but no duration. Add:
#[derive(Debug, Clone, Serialize)]
pub struct DecodeResult {
pub image_type: String,
pub image_id: u64,
pub width: u32,
pub height: u32,
pub from_cache: bool,
pub success: bool,
pub timestamp_secs: f64,
/// Wall-clock microseconds spent decompressing this image.
/// Zero for `FromCache` and for failures that short-circuit
/// before the decoder is invoked.
pub decode_duration_us: u32,
}
u32 microseconds covers ~71 minutes per decode, which is well
beyond any plausible per-image budget. Microseconds (not
milliseconds) keep small Lz4 / Pixmap decodes resolvable; many
finish in well under 1 ms.
In shakenfist-spice-renderer/src/channels/display.rs the
decode entry point is decode_image_and_emit at line 1251.
Decompression dispatch (decompress_glz, decompress_spice_lz4,
decode_mjpeg_frame, raw Pixmap copy) happens between the
ImageDescriptor read at line 1293 and the record_decode
call at line 1672. Wrap the decompression dispatch with an
Instant::now() / elapsed(). For FromCache and for early
returns that never invoke the decoder, the duration is 0.
Aggregate decode stats on DisplaySnapshot¶
Add to DisplaySnapshot in
shakenfist-spice-renderer/src/snapshots.rs:34-59:
/// Total decodes recorded since session start. Counts every
/// `record_decode` call, including failures and cache hits.
pub decode_total_count: u64,
/// Decodes where `success == false`. Subset of the total.
pub decode_failed_count: u64,
/// Decodes where `from_cache == true`. Subset of the total.
pub decode_from_cache_count: u64,
/// Min / max / mean of `decode_duration_us` over the entries
/// currently in `recent_decodes` for which the decoder ran
/// (i.e. excluding cache hits and short-circuited failures).
/// Zero when the ring contains no such entries.
pub decode_recent_min_us: u32,
pub decode_recent_max_us: u32,
pub decode_recent_mean_us: u32,
The two cumulative counters live on the channel struct and
increment in record_decode. The three recent-window stats
are computed at snapshot-emit time over the cloned ring in
update_snapshot (display.rs:2120-2150) — no extra state.
Socket-read fill stats¶
run_loop in display.rs:599-642 reads up to 256 KB into a
stack-allocated chunk: [u8; 262144] and then processes the
&chunk[..n] slice. When n == chunk.len() the OS recv buffer
had at least one full chunk waiting at read time, which is a
cheap signal that the read loop is not consuming faster than
the server is sending.
Add to the channel struct (and mirror onto DisplaySnapshot):
/// Total reads from the socket since session start. One per
/// `s.read(&mut chunk).await` that returned `n > 0`.
pub socket_read_count: u64,
/// Reads where `n == 262144` (the full chunk size). A high
/// ratio of these to `socket_read_count` indicates the OS recv
/// buffer was usually non-empty when we read — i.e. the read
/// loop is not keeping up with arrival rate.
pub socket_reads_at_chunk_cap: u64,
/// Largest `n` observed over the session. Useful as a sanity
/// check when chunk capacity is later tuned.
pub socket_max_chunk_bytes: u32,
Update at display.rs:619 and following (after the read
returns and before the chunk is processed). FIONREAD-style
direct buffer inspection is rejected: it requires raw-fd access
to a tokio stream and adds a per-platform code path, and the
fill-rate proxy answers the diagnostic question well enough.
ACK-send stats¶
process_messages at display.rs:666-700 calls
self.send_ack() whenever message_count - last_ack >=
ack_window. The snapshot already exposes ack_window,
message_count, and last_ack, but a static snapshot can't
tell whether ACKs are firing on a healthy cadence or whether a
long pause has accumulated unACKed messages. Add:
/// Total ACK messages sent to the server since session start.
pub ack_send_count: u32,
/// Session-relative seconds of the most recent ACK send. None
/// until the first ACK is sent.
pub last_ack_send_ts_secs: Option<f64>,
/// Intervals (seconds) between the last N consecutive ACK
/// sends, oldest first. Bounded ring; see RECENT_ACK_INTERVALS_CAP.
pub recent_ack_intervals_secs: VecDeque<f64>,
Cap the ring at 32. The math: with a typical ack_window of
1000 messages and a busy display session at thousands of msgs/s,
32 intervals is roughly the last 10–30 seconds of activity —
enough to see whether ACKs paused without bloating the snapshot.
Define a constant alongside MAX_RECENT_DECODES in display.rs:
Increment and record in send_ack (or in process_messages at
the call site, whichever already owns the timing).
Bug-report flow¶
DisplaySnapshot derives Serialize (snapshots.rs:34) and is
serialized to channel-state.json via
ChannelSnapshots::snapshot_json_for("display") at
ryll/src/bugreport.rs:260-292. New fields therefore flow into
the bug report without any change to the bug-report code path.
The existing test test_display_snapshot_serialises at
ryll/src/bugreport.rs:1853 covers serialisation shape and
needs to be extended to assert the new fields are present.
Steps¶
Step 1: Extend DecodeResult and DisplaySnapshot¶
- In
shakenfist-spice-renderer/src/snapshots.rs: - Add
decode_duration_us: u32toDecodeResult. - Add
decode_total_count,decode_failed_count,decode_from_cache_count,decode_recent_{min,max,mean}_us,socket_read_count,socket_reads_at_chunk_cap,socket_max_chunk_bytes,ack_send_count,last_ack_send_ts_secs,recent_ack_intervals_secstoDisplaySnapshot. - Update the
Defaultderive —VecDequeand numeric types default to empty/zero so the derive still works.
Step 2: Time the decode in decode_image_and_emit¶
- In
shakenfist-spice-renderer/src/channels/display.rs: - At the start of
decode_image_and_emit(around line 1251) do not start the timer yet; we only want to time the decompression work, not the header parsing. - Once
image_typeis resolved (around line 1294) and just before the decompression dispatch, capturelet decode_start = std::time::Instant::now();. - After the decompression dispatch produces
decompressed(around line 1668, just before therecord_decodecall at 1672), computelet decode_duration_us = decode_start.elapsed().as_micros() as u32;saturating atu32::MAX(usetry_intowithunwrap_or(u32::MAX)). - For
FromCacheand for early returns that never invoke the decoder,decode_duration_usis0. - Pass the value into the
DecodeResultliteral at 1672.
Step 3: Aggregate decode counts and recent-window stats¶
- Add fields to the channel struct (mirroring the snapshot):
decode_total_count: u64,decode_failed_count: u64,decode_from_cache_count: u64. - In
record_decode(display.rs:2113), increment the counters from theDecodeResultflags. - In
update_snapshot(display.rs:2120-2150), after cloningrecent_decodesintosnap, iterate the ring and computemin / max / meanofdecode_duration_usover entries where!from_cache && success. Empty result → all three are 0.
Step 4: Socket-read fill stats¶
- Add fields to the channel struct:
socket_read_count: u64,socket_reads_at_chunk_cap: u64,socket_max_chunk_bytes: u32. - In
run_loop(display.rs:619+), after theif n == 0check and beforeself.byte_counter.add(...): self.socket_read_count += 1;if n == chunk.len() { self.socket_reads_at_chunk_cap += 1; }self.socket_max_chunk_bytes = self.socket_max_chunk_bytes.max(n as u32);- Mirror to the snapshot in
update_snapshot.
Step 5: ACK-send stats¶
- Add fields to the channel struct:
ack_send_count: u32,last_ack_send_ts_secs: Option<f64>,recent_ack_intervals_secs: VecDeque<f64>. - Define
const RECENT_ACK_INTERVALS_CAP: usize = 32;nearMAX_RECENT_DECODES. - At the existing ACK call site (
display.rs:693-694) — or insend_ackitself if that keeps the math local — after a successful ACK send: - Compute
now = self.traffic.elapsed().as_secs_f64(). - If
last_ack_send_ts_secsisSome(prev), pushnow - previntorecent_ack_intervals_secs; pop the front if length exceedsRECENT_ACK_INTERVALS_CAP. - Set
last_ack_send_ts_secs = Some(now). ack_send_count += 1.- Mirror to the snapshot in
update_snapshot(clone the ring).
Step 6: Tests¶
- Extend
test_display_snapshot_serialisesatryll/src/bugreport.rs:1853to set the new fields on aDisplaySnapshotand assert they survive serialisation. - Add a unit test for the recent-window aggregation: build a
VecDeque<DecodeResult>with mixed cache-hit / failure / success entries and assert that the min/max/mean computation excludes cache hits and failures. - Add a unit test for the ACK-interval ring: push 40 intervals and assert that only the most recent 32 are retained, in order.
Step 7: Manual verification¶
make buildandmake testsucceed.make lint(orpre-commit run --all-files) passes.- Run ryll against a SPICE server, trigger a bug report after
a few seconds of activity, unzip it, and verify
channel-state.jsonfor the display channel contains all new fields with plausible values: decode_total_count > 0.decode_recent_max_us > decode_recent_min_uson a busy session, both> 0.socket_read_count > 0,socket_max_chunk_bytes <= 262144.ack_send_count > 0after enough traffic to exceed an ACK window, with a non-emptyrecent_ack_intervals_secs.
Step 8: Documentation¶
- Update
docs/troubleshooting.mdwith a short section describing the new fields and how to read them when diagnosing "video not keeping up" reports. Cross-reference the master plan. - Mention the new fields in
ARCHITECTURE.md's description ofDisplaySnapshot(if such a description exists there). - No
README.mdupdate needed — the diagnostic fields are internal to bug reports, not user-facing.
Administration and logistics¶
Success criteria¶
DecodeResultcarriesdecode_duration_usand the field appears inchannel-state.jsonfor every recorded decode.DisplaySnapshotexposes cumulative decode counters, recent-window min/max/mean decode duration, socket-read fill stats, and ACK-send stats.recent_ack_intervals_secsis capped at 32 entries.- All new fields default to zero / empty so existing tests that
build a default
DisplaySnapshotkeep passing. make build,make test, andmake lint(orpre-commit run --all-files) all pass.- A bug report generated during a session with display traffic contains plausible values for every new field.
Risks¶
- Per-decode
Instant::now()cost.Instant::now()on Linux is avDSO-fastclock_gettimecall (tens of ns). Two of them per decode at a few thousand decodes per second is below the noise floor of the work we're already doing. as u32truncation on slow decodes. A decode taking71 minutes would saturate. Such a case is itself a bug worth seeing;
unwrap_or(u32::MAX)makes it visible rather than crashing.- ACK-ring memory. 32 ×
f64= 256 bytes per session. Negligible. - Snapshot clone cost.
update_snapshotalready clones therecent_decodesring (capMAX_RECENT_DECODES = 20) on every call; adding a 32-entryrecent_ack_intervals_secsclone is negligible. - Backward compatibility of
channel-state.json. The file is internal to bug reports and parsed by maintainers, not by external tooling, so adding fields is safe.
Back brief¶
Before executing any step of this plan, please back brief the operator as to your understanding of the plan and how the work you intend to do aligns with that plan.