# Phase 2: Repaint cadence fix
Parent plan: PLAN-idle-cpu-and-latency.md
## Goal
Stop driving the egui render loop at an unconditional 60 Hz. Phase 1 measured this single line as the cause of ~6 of 6.24 idle cores: each repaint marks the frame dirty, wgpu re-rasterises the full scene, and Mesa's llvmpipe spreads that across all 16 CPU rasteriser threads. The same behaviour exists on systems with real GPUs — it just costs power instead of CPU there.
Target: idle CPU under 10% of one core, with no perceptible change in interactive responsiveness.
## Background
This is the only repaint trigger that runs every frame. Two other repaint sites exist:
- app.rs:411: `ctx.request_repaint()` in the constructor's connection setup.
- app.rs:1438 and app.rs:1659: `request_repaint_after(1s)` inside conditional dialog branches.
Channel events arrive on `event_rx` (a tokio `mpsc::Receiver`) and are drained by `process_events()` at app.rs:493, which uses `try_recv()` in a loop. egui only wakes if something has called `request_repaint`; if it doesn't wake, the events sit in the queue.
Channel tasks themselves run on the tokio runtime and have
no access to egui::Context today. That's the obstacle:
the natural fix is "wake egui when an event is pushed", but
the pusher (a channel handler) doesn't have a context
handle.
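The drain loop described above can be sketched with std's channel standing in for the tokio receiver. This is a hedged analogue, not the actual `process_events()` body:

```rust
use std::sync::mpsc;

// std analogue of the tokio mpsc drain described above. try_recv()
// never blocks: it drains whatever is queued, then returns to the
// frame. If nothing has called request_repaint, this loop simply
// isn't reached again until egui next wakes.
fn process_events(rx: &mpsc::Receiver<String>, handled: &mut Vec<String>) {
    while let Ok(ev) = rx.try_recv() {
        handled.push(ev);
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    for ev in ["display-update", "cursor-move", "audio"] {
        tx.send(ev.to_string()).unwrap();
    }
    let mut handled = Vec::new();
    process_events(&rx, &mut handled);
    println!("drained {} events", handled.len()); // drained 3 events
    process_events(&rx, &mut handled); // queue already empty: no-op
    println!("total {}", handled.len()); // total 3
}
```

This is exactly why the queue can stall: the drain is pull-based, so someone must wake the UI before the pull happens.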
## Approach
Two viable shapes, plus a stopgap:
### Option A: Pass egui::Context to channel tasks
Plumb `egui::Context` (cheaply cloneable; it's an `Arc` internally) into each channel handler's spawn site, store it on the handler, and call `ctx.request_repaint()` after every `event_tx.send(...)`.
Pros: precisely event-driven; egui sleeps fully when idle. Cons: touches many files (every channel handler), grows constructor signatures, couples the protocol crates to egui (or requires a trait abstraction to avoid that coupling).
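The trait abstraction mentioned in the cons could look like the following sketch. The `RepaintRequester` trait and all names here are hypothetical, not existing code; in the real app the UI crate would implement the trait for `egui::Context` by forwarding to its `request_repaint()`:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{mpsc, Arc};

// Hypothetical trait so protocol crates depend on this, not on egui.
trait RepaintRequester: Send + Sync {
    fn request_repaint(&self);
}

// A channel handler stores the requester next to its event sender and
// pings it after every send (Option A's shape).
struct ChannelHandler {
    repaint: Arc<dyn RepaintRequester>,
    events: mpsc::Sender<String>,
}

impl ChannelHandler {
    fn push(&self, ev: &str) {
        self.events.send(ev.to_string()).unwrap();
        self.repaint.request_repaint(); // wake the UI for this event
    }
}

// Test double counting repaint requests (egui::Context in real code).
struct CountingRequester(AtomicUsize);
impl RepaintRequester for CountingRequester {
    fn request_repaint(&self) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}

fn main() {
    let counter = Arc::new(CountingRequester(AtomicUsize::new(0)));
    let (tx, rx) = mpsc::channel();
    let handler = ChannelHandler { repaint: counter.clone(), events: tx };
    handler.push("display-update");
    handler.push("cursor-move");
    println!("repaints: {}", counter.0.load(Ordering::SeqCst)); // repaints: 2
    assert_eq!(rx.try_iter().count(), 2);
}
```

The trait keeps the coupling one-directional, but every handler constructor still grows a parameter, which is the cost noted above.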
### Option B: Single bridging task
Spawn one tokio task at startup that owns an `egui::Context` clone and a clone of the event sender's notification source. Whenever an event is enqueued, the task calls `ctx.request_repaint()`.
Implementation: add a `tokio::sync::Notify` shared between the producer (channel handlers) and the bridging task. Channel handlers call `notify.notify_one()` after each `event_tx.send(...)`. The bridging task does `notify.notified().await; ctx.request_repaint();` in a loop.
Pros: minimal change to channel handlers (one new line per send site). egui still sleeps when idle. No egui dependency in protocol crates.
Cons: slightly indirect; a stray `notify_one` without a matching send would cause a wasted repaint (harmless).
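The bridge's shape can be sketched with std threads, under the assumption that a boolean permit plus a condvar is an adequate stand-in for `tokio::sync::Notify` and an atomic counter for `egui::Context`; this is an illustrative analogue, not the async implementation:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

// Minimal stand-in for tokio::sync::Notify: notify_one() stores a
// permit; notified() consumes one, blocking until it exists.
#[derive(Default)]
struct Notify {
    permit: Mutex<bool>,
    cv: Condvar,
}

impl Notify {
    fn notify_one(&self) {
        *self.permit.lock().unwrap() = true;
        self.cv.notify_one();
    }
    fn notified(&self) {
        let mut p = self.permit.lock().unwrap();
        while !*p {
            p = self.cv.wait(p).unwrap();
        }
        *p = false;
    }
}

fn main() {
    let notify = Arc::new(Notify::default());
    // Stands in for egui::Context: counts request_repaint() calls.
    let repaints = Arc::new(AtomicUsize::new(0));

    // The bridging task: one loop, detached for this sketch.
    let (n, r) = (Arc::clone(&notify), Arc::clone(&repaints));
    thread::spawn(move || loop {
        n.notified();
        r.fetch_add(1, Ordering::SeqCst); // ctx.request_repaint() in real code
    });

    // A channel handler: after every event_tx.send(...), one extra line.
    for _event in 0..3 {
        // event_tx.send(event) would go here in the real handler.
        notify.notify_one();
        thread::sleep(Duration::from_millis(50)); // let the bridge drain the permit
    }

    println!("repaints: {}", repaints.load(Ordering::SeqCst));
}
```

Between notifications the bridge blocks; there is no polling anywhere, which is the whole point of the option.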
### Option C: Leave the polling loop, slow it down
Change `Duration::from_millis(16)` to `from_millis(33)` or `from_millis(50)`. Half or a third of the CPU. Trivial, no other changes.
Pros: a one-line change. Reversible. No event-routing work. Still responsive to most inputs.
Cons: still rasterises 20-30 full frames per second of nothing. Doesn't fix the underlying behaviour. CPU target (<10% of one core) is unreachable with this alone.
## Recommendation
Option B with a fallback timer. Event-driven repaints for everything that matters (channel events, mouse movement, keyboard input — egui handles the latter two itself), plus a slow periodic repaint (say, 1 Hz) so sparklines and time-based UI elements still update.
This gives:
- Idle CPU collapse from ~6 cores to near zero (egui sleeps; one tokio task wakes once a second to ping the bandwidth tracker).
- Full responsiveness to channel events (display updates, cursor moves, audio).
- Sparklines tick once a second, matching their actual data-arrival cadence.
The 1 Hz fallback does 60x less work than the current 60 Hz loop, but still catches anything that relies on `Instant::elapsed()`-style logic (transient status messages, the bandwidth tracker's `tick()`).
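The combined wake rule (event permit or fallback timeout, whichever fires first) can be sketched with `Condvar::wait_timeout`. The function name is illustrative, the interval is shortened for the demo, and this std sketch only models the decision, not egui's actual scheduler:

```rust
use std::sync::{Condvar, Mutex};
use std::time::Duration;

// Why did the render loop wake? Either a channel event requested a
// repaint, or the fallback interval elapsed (1 Hz in the plan) so
// time-based UI (sparklines, status fades) can update.
fn wait_for_wake(permit: &Mutex<bool>, cv: &Condvar, fallback: Duration) -> &'static str {
    let mut pending = permit.lock().unwrap();
    loop {
        if *pending {
            *pending = false;
            return "event-repaint";
        }
        let (guard, timeout) = cv.wait_timeout(pending, fallback).unwrap();
        pending = guard;
        if timeout.timed_out() {
            return "fallback-tick";
        }
        // Otherwise: notified (or spurious wakeup); loop re-checks the permit.
    }
}

fn main() {
    let permit = Mutex::new(false);
    let cv = Condvar::new();
    let fallback = Duration::from_millis(50); // 1 s in the real plan

    // No event pending: the fallback timer fires.
    println!("{}", wait_for_wake(&permit, &cv, fallback)); // fallback-tick

    // An event arrived (a handler set the permit and notified).
    *permit.lock().unwrap() = true;
    cv.notify_one();
    println!("{}", wait_for_wake(&permit, &cv, fallback)); // event-repaint
}
```

Either outcome produces exactly one repaint, so idle cost is bounded by the fallback rate while event latency stays at "immediately on notify".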
## Constraints and edge cases
- Mouse-over-surface and keyboard input already trigger egui repaints via egui's own input handling; no fix needed there.
- Bandwidth tracker ticks once per second (app.rs:bandwidth.tick() at line ~1050). The 1 Hz fallback covers this exactly.
- Cadence mode (`--cadence`) injects a keystroke every 2 seconds; that's its own task that either calls `request_repaint` already or relies on the existing 60 Hz. Need to verify and possibly add an explicit repaint call there.
- Bug-report status message timeout (5-second fade at app.rs:1080-1086) needs at least one repaint after the deadline to clear the label. The 1 Hz fallback covers this too.
- Connection-state transitions (connect, disconnect) are channel events; the new event-driven path handles them.
- TLS handshake and connection retries happen in async tasks before any `event_tx.send`. If they need to update the UI mid-handshake, they need a repaint trigger too. Verify.
## Steps
| Step | Effort | Model | Isolation | Brief for sub-agent |
|---|---|---|---|---|
| 2a | high | opus | none | In ryll/src/app.rs, spawn a "repaint bridge" tokio task during RyllApp::new. The task holds (1) an egui::Context clone obtained from cc.egui_ctx.clone() in the eframe creator, and (2) a clone of an Arc<tokio::sync::Notify>. The task body is loop { notify.notified().await; ctx.request_repaint(); }. Store the Arc<Notify> on RyllApp so the management code that pushes channel events can call notify.notify_one(). Then, every place that sends on event_tx (the same mpsc::Sender<ChannelEvent> that channel handlers hold) must call notify.notify_one() immediately after. Identify those sites: search for event_tx.send( in ryll/src/channels/. Pass the Arc<Notify> into each channel handler's new() alongside the existing event_tx. This is intrusive — multiple files change. Then in update() at app.rs:2169, replace request_repaint_after(16ms) with request_repaint_after(1s) as a fallback for time-based UI elements (sparklines, status message expiry). Verify the cadence mode keystroke injection still wakes egui; if not, add a notify.notify_one() there too. Add a brief // Repaint when channel events arrive; 1s fallback for time-based UI. comment near the new code. |
| 2b | low | sonnet | none | Manual smoke test against make test-qemu: (a) connect, observe idle CPU drops to <10% of one core after a few seconds with no input; (b) move mouse over the surface — UI responds without lag; (c) type — guest sees keystrokes; (d) trigger a status message (F8 with no surface) — message appears and clears within ~5 seconds; (e) bandwidth sparkline updates once per second. Document the measured idle CPU in this plan file and update the master plan's success-criteria check. |
## Success criteria for this phase
- Idle CPU under 10% of one core (measured: connected, no input, no display activity, mouse outside window).
- All interactive behaviour unchanged: typing, mouse, scroll, dialog open/close, status messages, sparklines.
- `pre-commit run --all-files` passes; `make test` passes.
- Single commit for step 2a (the implementation), single commit for step 2b (the measurement note appended to this file).