Phase 4: Render-side arrival-to-display latency¶
Prompt¶
Before responding to questions or discussion points in this
document, explore the ryll codebase thoroughly. Read the
referenced source files, understand existing patterns (the
ChannelEvent enum, the per-channel event-tx senders, the
egui process_events drain loop, the existing LatencyTracker
that measures network-only PING-to-PING), and ground your
answers in what the code actually does today. Do not speculate
about the codebase when you could read it instead.
Goal¶
Make "decode and I/O look healthy but the screen is still
laggy" a falsifiable hypothesis. After phase 4, a bug report
exposes how long display events (ImageReady* and
DisplayMark) sit in the channel→app mpsc queue before the
egui frame loop picks them up. A maintainer reading the
report can then say: phases 1–3 say socket / decode /
writer-task are fine; phase 4 lag is high; therefore the
egui loop is starved (probably by long-running synchronous
work inside App::update), and that's where to look.
This phase is optional per the master plan and is only justified once a real "video not keeping up" bug report collected with phases 1–3 active shows decode and I/O are not the bottleneck. Implementing it now means the next report already has the data, no second pass needed.
Out of scope: - "Inside the egui paint" timing. That would require instrumenting eframe or the platform compositor and is disproportionate to the diagnostic value. - End-to-end "pixels visible to user" timing. Unmeasurable without external instrumentation. - Changes to renderer or egui architecture.
Design¶
Definition of render-side latency¶
Render-side latency = the time between event emission in
the display channel and event processing in the egui frame
loop. Specifically, for an event e:
where produced_at_secs is captured in display.rs just
before event_tx.send(e) and consumed_at_secs is captured
in app.rs immediately after event_rx.try_recv() returns
Ok(e). Both use self.traffic.elapsed().as_secs_f64(),
which shares a single epoch across the renderer and the host
(verified during phase 1 — TrafficSink::elapsed() returns
start.elapsed() from a single fixed Instant).
Two event types are instrumented:
ImageReady/ImageReadyChroma/ImageReadyAlpha: per-image emission. High cadence (many per server frame). Catches "egui is processing one ImageReady per egui frame while the channel is emitting ten" — i.e. the renderer cannot keep up with the decode rate.DisplayMark: per-server-frame boundary. Lower cadence. Catches "egui frame loop is paced at 30 fps but the server is sending at 60 fps and the marks back up".
The two metrics share plumbing and tell different stories; both are kept.
What this catches that phases 1–3 do not¶
| Phase 1 sees | Phase 4 sees |
|---|---|
| Decode time (renderer-side) | Time the event sat in the mpsc before app pickup |
| Socket-read fill ratio | egui loop pickup cadence |
| ACK-window cadence | (n/a) |
Phase 4's ImageReady* lag rising while phase 1's
decode_recent_max_us and socket_reads_at_chunk_cap look
healthy is the diagnostic pattern that points at the
renderer/egui loop. With phases 1–3 plus phase 4, every
phase of the pipeline has at least one observable.
Event schema changes¶
In shakenfist-spice-renderer/src/channels/mod.rs:
pub enum ChannelEvent {
// ... existing variants ...
ImageReady {
display_channel_id: u8,
surface_id: u32,
left: u32,
top: u32,
width: u32,
height: u32,
pixels: Vec<u8>,
#[allow(dead_code)]
image_id: u64,
/// Session-relative seconds at the moment the
/// emitting channel called `event_tx.send`. Used by
/// the app to compute mpsc-queue lag. See
/// PLAN-video-keeping-up-phase-04.
produced_at_secs: f64,
},
ImageReadyChroma {
// ... existing fields, plus:
produced_at_secs: f64,
},
ImageReadyAlpha {
// ... existing fields, plus:
produced_at_secs: f64,
},
DisplayMark {
produced_at_secs: f64,
},
// ... other variants unchanged ...
}
DisplayMark is currently a unit variant (DisplayMark).
Phase 4 promotes it to a struct variant so it can carry the
timestamp. All consumers must be updated, but in practice
there's only one (the app's process_events).
Emission sites in display.rs¶
Six call sites build ImageReady* events:
- Line 522 —
build_image_eventforCompositeMode::Normal - Line 532 —
build_image_eventforCompositeMode::ChromaKey - Line 543 —
build_image_eventforCompositeMode::AlphaBlend - Line 1194 — direct
event_tx.send(ImageReady { … })for MJPEG stream frames - Line 1888 — direct
build_image_eventcall for one decode path - Line 1905 — direct
build_image_eventcall for another decode path
…plus one DisplayMark emit site (display.rs:962).
The cleanest implementation centralises the timestamp grab
in build_image_event so all three ImageReady* paths share
it. Direct call sites set the timestamp explicitly. The
DisplayMark site captures the timestamp at the
event_tx.send line.
App-side lag tracking¶
In ryll/src/app.rs, add to RyllApp (next to video_drop_count):
/// Phase-04: bounded ring of mpsc-queue lag samples (μs)
/// for `ChannelEvent::ImageReady*` events. Each entry is
/// `(consumed_at - produced_at) * 1_000_000` from the app's
/// event-drain loop. Cap 32, oldest evicted.
recent_image_ready_lag_us: VecDeque<u32>,
/// Phase-04: same shape but for `ChannelEvent::DisplayMark`.
recent_display_mark_lag_us: VecDeque<u32>,
Define const RECENT_LAG_RING_CAP: usize = 32; near the
existing FPS_WINDOW_SIZE constant.
In process_events (around ryll/src/app.rs:1398):
while let Ok(event) = self.event_rx.try_recv() {
let consumed_at_secs = self.traffic.elapsed().as_secs_f64();
match event {
ChannelEvent::ImageReady { produced_at_secs, .. }
| ChannelEvent::ImageReadyChroma { produced_at_secs, .. }
| ChannelEvent::ImageReadyAlpha { produced_at_secs, .. } => {
let lag_us = ((consumed_at_secs - produced_at_secs).max(0.0)
* 1_000_000.0) as u32;
push_with_cap(&mut self.recent_image_ready_lag_us, lag_us);
// ... existing blit logic, with destructuring of the
// remaining fields ...
}
ChannelEvent::DisplayMark { produced_at_secs } => {
let lag_us = ((consumed_at_secs - produced_at_secs).max(0.0)
* 1_000_000.0) as u32;
push_with_cap(&mut self.recent_display_mark_lag_us, lag_us);
// ... existing DisplayMark handling ...
}
// ... other variants ...
}
}
push_with_cap is a tiny local helper:
fn push_with_cap(ring: &mut VecDeque<u32>, value: u32) {
ring.push_back(value);
if ring.len() > RECENT_LAG_RING_CAP {
ring.pop_front();
}
}
Saturate negative differences at zero (.max(0.0)) to handle
the harmless case where consumed_at reads back slightly
earlier than produced_at due to monotonic clock granularity
on some platforms.
Snapshot fields¶
In ryll/src/bugreport.rs, extend AppSnapshot:
pub struct AppSnapshot {
// ... existing fields ...
/// Phase-04: min / max / mean μs of mpsc-queue lag over
/// the most recent `ImageReady*` events processed by the
/// egui frame loop. Window size is the
/// `RECENT_LAG_RING_CAP` constant in `app.rs`. Zero when
/// no qualifying samples have been observed. A high mean
/// when phase-1 decode and socket-fill metrics look
/// healthy implicates the egui loop / GUI thread as the
/// bottleneck. See PLAN-video-keeping-up-phase-04.
pub image_ready_lag_recent_min_us: u32,
pub image_ready_lag_recent_max_us: u32,
pub image_ready_lag_recent_mean_us: u32,
/// Same shape but for `DisplayMark` events (per-frame
/// boundary). Lower cadence than `image_ready_*` so the
/// recent window covers a longer real-time interval.
pub display_mark_lag_recent_min_us: u32,
pub display_mark_lag_recent_max_us: u32,
pub display_mark_lag_recent_mean_us: u32,
}
No raw VecDeque<u32> rings in the snapshot — the
aggregates are sufficient for the diagnostic question. This
keeps session.json small and matches phase 1's
decode_recent_* pattern.
In update_app_snapshot() (ryll/src/app.rs:1930-1970),
compute the six values from the channel struct's rings:
let (img_min, img_max, img_mean) =
recent_lag_stats(&self.recent_image_ready_lag_us);
snap.image_ready_lag_recent_min_us = img_min;
snap.image_ready_lag_recent_max_us = img_max;
snap.image_ready_lag_recent_mean_us = img_mean;
let (mark_min, mark_max, mark_mean) =
recent_lag_stats(&self.recent_display_mark_lag_us);
snap.display_mark_lag_recent_min_us = mark_min;
snap.display_mark_lag_recent_max_us = mark_max;
snap.display_mark_lag_recent_mean_us = mark_mean;
recent_lag_stats is a private helper analogous to phase 1's
recent_decode_duration_stats:
fn recent_lag_stats(ring: &VecDeque<u32>) -> (u32, u32, u32) {
if ring.is_empty() {
return (0, 0, 0);
}
let mut min = u32::MAX;
let mut max = 0;
let mut sum: u64 = 0;
for &v in ring {
if v < min { min = v; }
if v > max { max = v; }
sum += v as u64;
}
let mean = u32::try_from(sum / ring.len() as u64)
.unwrap_or(u32::MAX);
(min, max, mean)
}
Within-batch correlation: a known limitation¶
process_events drains the event channel in a tight
try_recv loop. Every event drained in one call sees the
same consumed_at_secs (modulo a few clock ticks). So if 10
ImageReady events were pending when an egui frame fires,
the 10 lag samples will be close together in time and tightly
correlated. The max sample in the ring is therefore the most
informative single number; the mean is biased by batch size.
This is documented in the doc comments on the snapshot fields
and accepted. A future refinement could sample at most one
event per drain call to decorrelate, but the diagnostic
question — "did anything sit in the queue for a long time?"
— is answered by max either way.
Where produced_at_secs is set¶
To keep emission-site changes minimal, plumb the timestamp
through build_image_event so the three composite variants
share one capture point:
fn build_image_event(
&self,
surface_id: u32,
base: &DrawBase,
img: DecompressedImage,
composite: CompositeMode,
) -> ChannelEvent {
let produced_at_secs = self.traffic.elapsed().as_secs_f64();
match composite {
CompositeMode::Normal => ChannelEvent::ImageReady {
// ... existing fields ...
produced_at_secs,
},
// ... similarly for ChromaKey and AlphaBlend ...
}
}
The MJPEG-stream emit (display.rs:1194) inlines an
ImageReady literal; it captures the timestamp directly at
the event_tx.send line.
The DisplayMark emit (display.rs:962) captures the
timestamp at its single line.
Steps¶
Step 1: Extend ChannelEvent variants with produced_at_secs¶
- In
shakenfist-spice-renderer/src/channels/mod.rs, addproduced_at_secs: f64toImageReady,ImageReadyChroma, andImageReadyAlpha. PromoteDisplayMarkfrom a unit variant to a struct variant with a singleproduced_at_secs: f64field. - Update any
match ChannelEvent::DisplayMarkpatterns in the codebase tomatch ChannelEvent::DisplayMark { .. }.
Step 2: Set timestamps at emission sites¶
- In
shakenfist-spice-renderer/src/channels/display.rs, modifybuild_image_eventto captureself.traffic.elapsed().as_secs_f64()once at the top and embed it in all three composite variants. - Update the direct
ImageReadyliteral at line 1194 to setproduced_at_secs. - Update the
DisplayMarksend at line 962 to capture and pass the timestamp.
Step 3: Add app-side lag rings¶
- Define
const RECENT_LAG_RING_CAP: usize = 32;nearFPS_WINDOW_SIZEinryll/src/app.rs. - Add
recent_image_ready_lag_us: VecDeque<u32>andrecent_display_mark_lag_us: VecDeque<u32>toRyllApp, initialised to empty in the constructor. - Add the
push_with_capandrecent_lag_statsprivate helper fns (free functions inapp.rs; co-locate with the existing helpers).
Step 4: Compute lag in process_events¶
- At the top of the
while let Ok(event) = …loop body, captureconsumed_at_secs = self.traffic.elapsed().as_secs_f64(). - For each of the four event variants, destructure
produced_at_secs, compute the lag, andpush_with_capinto the matching ring. Saturate negatives at zero. - Keep all existing handler logic intact — phase 4 only adds the lag-recording lines.
Step 5: Extend AppSnapshot¶
- Add the six new fields to
AppSnapshotinryll/src/bugreport.rswithDefaultcovered by0. - In
update_app_snapshot(), callrecent_lag_statsover each ring and assign the three values to the matching snapshot fields.
Step 6: Tests¶
recent_lag_stats_empty_ring_returns_zeros— pure function test.recent_lag_stats_computes_min_max_mean— feed[100, 300, 200]and assert(100, 300, 200).push_with_cap_caps_at_RECENT_LAG_RING_CAP— push 40 distinct values; assert ring length is 32 and the oldest 8 are evicted.app_snapshot_image_ready_lag_serialises— extendtest_app_snapshot_serialisesinryll/src/bugreport.rsto set the six new fields and assert they appear in the JSON.
A direct integration test "ImageReady emitted with timestamp T, app processes at T+lag, snapshot reflects lag" requires spinning up a display channel with a fake stream; that's heavier than the diagnostic value warrants. The unit tests plus the existing snapshot-serialisation test cover the contract.
Step 7: Build, test, lint, pre-commit gates¶
make build, make test, make lint, and
pre-commit run --all-files all pass.
Step 8: Documentation¶
- Extend the "Reading display
channel-state.jsonfor 'video not keeping up'" section indocs/troubleshooting.mdto cover the newAppSnapshotfields insession.jsonand how to read them in combination with phases 1–3's data. Specifically: a highimage_ready_lag_recent_max_uswhen phase-1decode_recent_max_usandsocket_reads_at_chunk_caplook healthy points at the egui loop. - Update
ARCHITECTURE.md'sAppSnapshotrow to mention the new fields. - Mark phase 4
Donein the master plan's execution table. - Promote the master plan's Open Question to a Resolved
decision: render-side latency = mpsc-queue lag for
ImageReady*andDisplayMarkevents; rationale already in this plan's Design section.
Administration and logistics¶
Success criteria¶
- All four event variants (
ImageReady,ImageReadyChroma,ImageReadyAlpha,DisplayMark) carryproduced_at_secs: f64. AppSnapshotexposes six new fields: min/max/mean μs of recent-window lag forImageReady*andDisplayMark.- A bug report from a real session shows
image_ready_lag_recent_mean_usin the low tens of μs on a healthy egui loop and rises observably under artificial load. - All existing tests pass.
- New tests in step 6 pass.
make build,make test,make lint, andpre-commit run --all-filesall pass.
Risks¶
- Within-batch correlation (documented in Design). The
maxover the ring is the most informative number; the mean is biased toward batch size. Accepted; documented. DisplayMarkenum-variant promotion is a breaking API change inside the renderer crate. The crate's only consumer in this repo is ryll itself; bothdisplay.rs(emit) andapp.rs(consume) are updated in the same commit. No external consumers.produced_at_secsfield clash with future Serde round-tripping.ChannelEventis not serialised today (it lives entirely in-process); the new field is fine for now. IfChannelEventever grows aSerializederive, the field naturally serialises with the rest.- Lag during egui startup. Before the first egui paint, events may queue for hundreds of ms. The ring fills with startup-lag samples that aren't representative of steady-state. After 32 events the ring rotates and steady-state values dominate; the doc comment notes this.
- Clock granularity on some platforms.
Instanton macOS / Linux is nanosecond-grade; on Windows it's 100 ns. Either way far below the μs unit we report. Saturating negative differences at zero handles the theoretical edge case. - Wasted work if real bug reports show phases 1–3 always catch the bottleneck. Phase 4 was framed as optional; the cost of doing it now is one focused change-set, and the cost of skipping it is needing a second collection pass when the first report says "decode and I/O look fine". Pre-empting that round-trip seems worth the cost.
Back brief¶
Before executing any step of this plan, please back brief the operator as to your understanding of the plan and how the work you intend to do aligns with that plan.