Phase 5 — Auto-snapshot bug-report mode

Phase 5 of PLAN-stream-caps-and-flap.md.

Prompt

Before responding to questions or discussion points in this document, explore the ryll codebase thoroughly. The bug-report plumbing already exists end-to-end — your job is to wire a new automated trigger to the existing assembly + write path, not to redesign either. Specifically read:

ryll/src/bugreport.rs — BugReport::new (assembly) and BugReport::write_zip (file emission).
ryll/src/app.rs:2186 — the call site that builds a BugReport for a manual F8 trigger; manual_bug_report_dir for the existing dir-resolution logic.
ryll/src/config.rs — how CLI args are declared and threaded through.
ryll/src/main.rs:228 and :772 — how bug_report_dir is resolved and handed to the renderer.

Flag uncertainty explicitly rather than guessing.

Goal

Add a flight-data-recorder mode that fires a complete bug report on a fixed cadence into a rolling subdirectory. Operator sets it once at session start; whatever happens during the run is captured by construction, regardless of whether the operator notices a symptom in real time.

This directly addresses the session-002f problem: audio worked for the entire run, so no bug-report was filed — but if audio had gone silent for 30 seconds mid-session the operator might not have caught it in time to trigger a report manually. With auto-snapshot at a 30-second cadence, the silent window would have been captured in two reports either side and we could correlate playback counters across the boundary.

Scope

In scope:

New CLI flag --auto-snapshot-interval SECONDS (0 or unset = disabled, anything ≥ 1 enables periodic capture).
New CLI flag --auto-snapshot-cap N (default 20) — the maximum number of auto-snapshot zips kept on disk; oldest pruned when capacity is exceeded.
A tokio interval task spawned at session start when the interval is set; ticks every N seconds; assembles a BugReport via the existing BugReport::new path and writes it via write_zip into a dedicated auto-snapshots/ subdirectory of the operator's bug-report dir.
Auto-generated description matching the manual-report convention: "auto-snapshot T+47.3s" (session uptime embedded so an operator can correlate across snapshots).
BugReportType for auto-snapshots. The channel-state dispatch checks the report type to pick which channels' snapshots to embed, so the type decides whether the phase 4 playback diagnostics end up in the zip. Connection already exists and is lightweight, but covers only main. Decide in 5A: reuse Connection, or add a new BugReportType::AutoSnapshot that also includes playback/usbredir/webdav? Working answer: add the new variant.
Per operator direction, pcap stays in auto-snapshot zips. The disk cost (~700 KiB pcap + ~150 KiB JSON + a screenshot if Display, so roughly 1 MiB per zip and ~20 MiB at the default cap of 20) is acceptable for the diagnostic value.
Filename scheme: auto-snapshots/ryll-auto-snapshot-<utc_iso>-T+<uptime_secs>.zip — UTC ISO timestamp for sorting, uptime appended so the filename alone tells you when in the run it fired.
Operator awareness — startup notification. When the auto-snapshot interval task spawns at session start, push one NotifySeverity::Info notification via push_notification with NotificationSource::Internal: "Auto-snapshot mode enabled — every {N}s, max {cap} snapshots, saving to {path}". One-shot, never repeated, no cool-down (this is the operator's confirmation that the flag took effect). Without it, the operator has no in-app signal that the mode is active until the first zip lands on disk.
Operator awareness — live counter in the stats panel. Surface auto_snapshots_saved: u64 and auto_snapshots_pruned: u64 on a snapshot the stats panel reads (AppSnapshot is the natural home; it already carries other session-wide counters). Render a single line in the existing stats panel as "Auto-snapshot: {saved}/{cap}" when the mode is enabled (hide the line entirely when disabled so it doesn't add visual noise to non-auto sessions). Operator can glance at it any time without scrolling notifications. Updated by the interval task after each successful write_zip (saved += 1) and after each prune (pruned += deleted).
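
The counter behaviour described above can be sketched as follows. This is a minimal, std-only illustration, not the ryll implementation: AutoSnapshotCounters, record_saved, record_pruned, totals and stats_line are hypothetical names, and the plan itself places the two fields on AppSnapshot rather than on a separate struct.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical counter pair (assumption: the plan puts these fields on
/// `AppSnapshot`; a standalone struct keeps the sketch self-contained).
#[derive(Default)]
pub struct AutoSnapshotCounters {
    saved: AtomicU64,
    pruned: AtomicU64,
}

impl AutoSnapshotCounters {
    /// Called by the interval task after each successful `write_zip`.
    pub fn record_saved(&self) {
        self.saved.fetch_add(1, Ordering::Relaxed);
    }

    /// Called after a prune pass with the number of zips deleted.
    pub fn record_pruned(&self, deleted: u64) {
        self.pruned.fetch_add(deleted, Ordering::Relaxed);
    }

    /// Returns `(saved, pruned)` as currently recorded.
    pub fn totals(&self) -> (u64, u64) {
        (
            self.saved.load(Ordering::Relaxed),
            self.pruned.load(Ordering::Relaxed),
        )
    }
}

/// Builds the stats-panel line, or `None` when the mode is disabled so the
/// panel can omit the row entirely.
pub fn stats_line(counters: &AutoSnapshotCounters, enabled: bool, cap: usize) -> Option<String> {
    if !enabled {
        return None;
    }
    let (saved, _pruned) = counters.totals();
    Some(format!("Auto-snapshot: {saved}/{cap}"))
}
```

Atomics let the interval task bump the counters without taking a lock the GUI thread also needs; if the fields live behind the existing AppSnapshot mutex instead, plain u64 fields would do.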

Out of scope:

A hamburger-menu live toggle. CLI-only for this phase. Adding a UI toggle later is straightforward but doesn't add diagnostic value over the CLI flag (operator decides before the run whether they're hunting for an intermittent issue).
Per-snapshot description customisation. The auto string is deterministic; an operator who wants a specific description should use the manual F8 path.
Compression of the pcap before zip. The existing zip flow already compresses; extra work for marginal saving.
Screenshot in auto-snapshots. Decision: include the screenshot if it can be captured cheaply from the existing trigger-snapshot ring (AppSnapshot's surfaces already provide the latest frame). If reaching that data from a non-GUI thread is intrusive, drop the screenshot for auto reports — pcap + channel-state is the high-value payload. Document the decision in code comments.
A notification per snapshot. At 30s × 10min that's 20 notifications, which would bury everything else in the panel. The startup notification + live stats-panel counter cover awareness without the noise. If a future operator wants per-snapshot logging, a separate --auto-snapshot-verbose flag with a milestone-cadence notification ("10 saved, 1 pruned" every Nth tick) is the cheap-to-add follow-up; deferred deliberately.
Disconnect-triggered auto-snapshot. The existing auto-disconnect path already fires a bug report on disconnect; auto-snapshot mode is independent of that.

Open questions

Q1 (decide in 5A): how does the auto-snapshot task reach the data it needs? BugReport::new takes references to traffic, channel_snapshots, app_snapshot, notifications, plus the target host/port and optional surface pixels. The GUI thread holds these on the RyllApp struct. Working proposal:
Introduce an AutoSnapshotState struct holding Arc<TrafficCapture>, Arc<ChannelSnapshots>, Arc<Mutex<AppSnapshot>>, Arc<Mutex<NotificationStore>>, plus the resolved target host/port, output dir, cap, and interval. Construct once on session bring-up after RyllApp::new.
The interval task owns the AutoSnapshotState and calls a new BugReport::new_auto(...) helper that wraps BugReport::new with the auto description and a None screenshot path.
All the existing fields on RyllApp are already Arc-backed (verify by reading app.rs:2186 and nearby) — if not, this step bumps them to Arc.
Q2 (decide in 5A): how do we prune to the cap? Working proposal: after each successful write_zip, scan auto-snapshots/ for ryll-auto-snapshot-*.zip, sort by filename (which is timestamp-ordered), keep the newest N, delete the rest. Single-pass, no need to track state in memory across ticks. If the operator deletes some zips between ticks, our prune still works correctly (it operates on whatever's on disk).
Q3 (decide in 5A): what happens on write_zip failure? The existing manual path surfaces failures via the NotificationStore. The auto-snapshot task should NOT spam the UI on every failed tick (e.g. if disk is full, every 30 seconds would be too noisy). Working proposal: log at warn on the first failure, emit a single NotifySeverity::Warn notification with a 5-minute cool-down, log at debug thereafter, but never block the interval task — keep ticking in case the underlying problem clears.
Q4 (open): does auto-snapshot mode work without --bug-report-dir? Working answer: the existing manual_bug_report_dir() fallback chain (--bug-report-dir → <capture_dir>/bug-reports/ → cwd) applies unchanged. Auto-snapshots go into <that_dir>/auto-snapshots/. If neither --bug-report-dir nor --capture is set, zips land in ./auto-snapshots/ in the current working directory, so an operator can run --auto-snapshot-interval 30 from any directory and find the data afterwards.
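
One possible reading of the Q3 working proposal, as a std-only sketch. FailureCooldown and FailureAction are hypothetical names, and the sketch assumes the warn log and the notification are re-armed together once the cool-down has elapsed; the plan does not pin that detail down.

```rust
use std::time::{Duration, Instant};

/// What the interval task should do with a failed `write_zip`.
#[derive(Debug, PartialEq, Eq)]
pub enum FailureAction {
    /// First failure, or cool-down elapsed: log at warn and push one
    /// Warn notification.
    WarnAndNotify,
    /// Still inside the cool-down window: log at debug only.
    DebugOnly,
}

/// Hypothetical helper remembering when a failure was last surfaced.
pub struct FailureCooldown {
    cooldown: Duration,
    last_notified: Option<Instant>,
}

impl FailureCooldown {
    pub fn new(cooldown: Duration) -> Self {
        Self { cooldown, last_notified: None }
    }

    /// Classifies a failure observed at `now`. Never blocks, so the
    /// interval loop keeps ticking regardless of the outcome.
    pub fn on_failure(&mut self, now: Instant) -> FailureAction {
        match self.last_notified {
            Some(prev) if now.duration_since(prev) < self.cooldown => FailureAction::DebugOnly,
            _ => {
                self.last_notified = Some(now);
                FailureAction::WarnAndNotify
            }
        }
    }
}
```

Passing `now` in explicitly rather than calling Instant::now() inside the helper keeps it testable without sleeping for five minutes.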

Design notes

Where it slots in

[ryll startup]
    │
    ├─ parse args (--auto-snapshot-interval N, --auto-snapshot-cap M)
    │
    └─ RyllApp::new(...) returns the GUI app
            │
            └─ on session start (post-handshake):
                    │
                    └─ if auto_snapshot_interval > 0:
                            │
                            ├─ resolve dir = <bug_report_dir>/auto-snapshots/
                            ├─ build AutoSnapshotState (Arc'd handles)
                            └─ tokio::spawn(auto_snapshot_loop(state))
                                    │
                                    └─ interval.tick() every N seconds:
                                            ├─ BugReport::new_auto(...)
                                            ├─ report.write_zip(dir)
                                            ├─ prune dir to cap
                                            └─ continue loop

Filename scheme

Each zip:

auto-snapshots/ryll-auto-snapshot-2026-05-18T20-37-42Z-T+47.3s.zip

ISO-style UTC timestamp + session uptime. The directory listing sorts chronologically; uptime tells the operator how far into the session each snapshot landed without opening the metadata.json.
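
A sketch of a filename generator matching the example above. UtcStamp and auto_snapshot_filename are hypothetical names; the UTC fields are passed in as plain integers so the sketch does not assume which time crate ryll uses.

```rust
/// UTC wall-clock fields (assumption: a real implementation would fill
/// these from whatever time source ryll already depends on).
pub struct UtcStamp {
    pub year: u16,
    pub month: u8,
    pub day: u8,
    pub hour: u8,
    pub minute: u8,
    pub second: u8,
}

/// Builds `ryll-auto-snapshot-<utc_iso>-T+<uptime_secs>s.zip`.
/// The timestamp fields are zero-padded to a fixed width so that a plain
/// lexical sort of the filenames is also a chronological sort.
pub fn auto_snapshot_filename(ts: &UtcStamp, uptime_secs: f64) -> String {
    format!(
        "ryll-auto-snapshot-{:04}-{:02}-{:02}T{:02}-{:02}-{:02}Z-T+{:.1}s.zip",
        ts.year, ts.month, ts.day, ts.hour, ts.minute, ts.second, uptime_secs
    )
}
```

Colons are replaced with hyphens in the time portion, as in the example, which keeps the name valid on filesystems that reject `:`.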

Disk pressure

Per snapshot (rough order of magnitude):

channel-state.json: 5–15 KiB
metadata.json, session.json, notifications.json, runtime-metrics.json: ~5 KiB combined
pcap: ~700 KiB per 20-second window (varies with bandwidth)
screenshot.png: ~700 KiB for a 1920×1472 surface
screenshot-region.png: not applicable for auto-snapshots (no region selected)

Per zip after compression: 700 KiB – 1.5 MiB. At the default cap of 20, total disk is roughly 14–30 MiB (20 × 0.7 MiB to 20 × 1.5 MiB). Acceptable; adjustable via --auto-snapshot-cap.

Interaction with `--capture`

--capture <dir> already creates <dir>/bug-reports/ for manual reports. Auto-snapshots go into <dir>/bug-reports/auto-snapshots/ when --capture is set and no explicit --bug-report-dir is provided. Documented in 5B.
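
The directory resolution described here and in Q4 could look roughly like this. auto_snapshot_dir is a hypothetical helper; in ryll the base directory is expected to come from the existing manual_bug_report_dir logic, whose exact signature this sketch does not assume.

```rust
use std::path::{Path, PathBuf};

/// Resolves where auto-snapshot zips go, following the fallback chain
/// `--bug-report-dir` -> `<capture>/bug-reports/` -> current directory,
/// then appending the `auto-snapshots/` subdirectory.
pub fn auto_snapshot_dir(
    bug_report_dir: Option<&Path>,
    capture_dir: Option<&Path>,
    cwd: &Path,
) -> PathBuf {
    let base = match (bug_report_dir, capture_dir) {
        // An explicit --bug-report-dir always wins.
        (Some(dir), _) => dir.to_path_buf(),
        // Otherwise reuse the manual-report directory under --capture.
        (None, Some(capture)) => capture.join("bug-reports"),
        // Neither flag set: fall back to the working directory.
        (None, None) => cwd.to_path_buf(),
    };
    base.join("auto-snapshots")
}
```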

Interaction with `BugReport::new`'s 2-second metric sample

BugReport::new blocks for 2 seconds to sample runtime metrics. The interval task runs on its own tokio task; the blocking sample happens on a spawn_blocking thread so the tokio runtime stays responsive. If interval N < 3 seconds, samples overlap (mostly harmless — each is independent and the data is per-sample). Document a minimum recommended interval of 10 s in the CLI help text.
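
The flag semantics (0 or unset disables, anything ≥ 1 enables) together with the 10 s recommendation can be captured in one small normalisation step. A sketch only: AutoSnapshotInterval, resolve_interval and RECOMMENDED_MIN_SECS are hypothetical names, and whether a below-recommended value should merely be documented or also logged is left open by the plan.

```rust
use std::time::Duration;

/// Minimum interval recommended in the CLI help text.
pub const RECOMMENDED_MIN_SECS: u64 = 10;

#[derive(Debug, PartialEq, Eq)]
pub enum AutoSnapshotInterval {
    /// Flag unset or set to 0: no interval task is spawned.
    Disabled,
    /// Flag set to 1 or more.
    Enabled {
        period: Duration,
        /// True when the value is accepted but shorter than recommended,
        /// e.g. so the caller can log a hint at startup.
        below_recommended: bool,
    },
}

/// Normalises the raw `--auto-snapshot-interval` value.
pub fn resolve_interval(raw: Option<u64>) -> AutoSnapshotInterval {
    match raw {
        None | Some(0) => AutoSnapshotInterval::Disabled,
        Some(secs) => AutoSnapshotInterval::Enabled {
            period: Duration::from_secs(secs),
            below_recommended: secs < RECOMMENDED_MIN_SECS,
        },
    }
}
```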

Execution step table

Step	Effort	Model	Isolation	Brief for sub-agent
5A	medium	sonnet	none	Core implementation. Add `auto_snapshot_interval: Option<u64>` and `auto_snapshot_cap: Option<usize>` to `ryll/src/config.rs` (clap derive). Thread through `main.rs` to `RyllApp::new`. Add `AutoSnapshotState` struct in a new `ryll/src/auto_snapshot.rs` (or `bugreport.rs` if cleaner) holding Arc'd handles to traffic, channel_snapshots, app_snapshot, notifications, plus resolved target_host/port, output_dir, cap, interval. Per Q1 working proposal: if any of these fields on `RyllApp` aren't already `Arc`-backed, bump them. Add `BugReport::new_auto(...)` helper that wraps `BugReport::new` with auto-generated description (`format!("auto-snapshot T+{:.1}s", session_uptime_secs)`) and `BugReportType::AutoSnapshot` (new variant — defaults its channel name dispatch to include playback + main + display + cursor + inputs + usbredir + webdav so a single zip carries everything). Spawn the interval task from `RyllApp::on_session_ready` (or wherever the session-bring-up code lives) when `auto_snapshot_interval` is set. At spawn time push one `NotifySeverity::Info` notification (`NotificationSource::Internal`): `"Auto-snapshot mode enabled — every {N}s, max {cap} snapshots, saving to {path}"` — one-shot, no cool-down. Add `auto_snapshots_saved: u64` and `auto_snapshots_pruned: u64` to `AppSnapshot` (or the snapshot the stats panel reads); bump after each successful write_zip / prune respectively. Render in the existing stats panel as `"Auto-snapshot: {saved}/{cap}"` only when the mode is enabled (hide the line entirely when disabled). Implement the prune-to-cap step per Q2: glob `auto-snapshots/ryll-auto-snapshot-*.zip`, sort by filename, delete oldest beyond cap. Per Q3: handle write_zip errors with a notification cool-down (5 min, single warn log on first failure). Per Q4: subdirectory is `<bug_report_dir>/auto-snapshots/` using `manual_bug_report_dir`'s fallback chain. Verify `make build && make test && make lint && pre-commit run --all-files`.
5B	low	haiku	none	Docs touch-up. Update `README.md` (if it covers CLI flags) and `docs/configuration.md` to document `--auto-snapshot-interval` and `--auto-snapshot-cap`. Add a short paragraph to `docs/troubleshooting.md` under or near the playback-observability section explaining when to enable auto-snapshot mode (intermittent issues, flight-data-recorder use case). Cross-link from `docs/libvirt-spice-recommendations.md`'s "Side-by-side testing recipe" section as an alternative to manual periodic reports. Run `pre-commit run --all-files`.
5C	—	—	—	Operator smoke test. Run a ryll session against `sf-4` with `--auto-snapshot-interval 30 --auto-snapshot-cap 20`. Let it run for ≥ 3 minutes while doing typical workload. Confirm: (a) at session start the notification panel shows one `Info` notification confirming auto-snapshot mode is enabled with the interval, cap, and target path; (b) the stats panel shows `"Auto-snapshot: N/20"` and N increments by 1 every ~30 s; (c) `<bug-report-dir>/auto-snapshots/` contains 6+ zips spaced ~30 s apart; (d) each zip's `channel-state.json` shows playback fields populating differently across snapshots (proves the snapshot is being re-taken not just the same data re-zipped); (e) after the cap is exceeded, oldest zips are pruned and the stats counter still reads N/20 (no overflow); (f) pcap is present in each zip and contains the ~20 seconds of traffic preceding the snapshot; (g) the notification panel does NOT show one notification per snapshot (proves we didn't accidentally make it noisy). This is operator verification, not a code change.

Commits: one per step (5A, 5B). 5C is operator verification.

Test plan

Automated (5A):

Unit test for the filename generator: given (utc_now, uptime_secs), produces the expected pattern.
Unit test for the prune helper: given a list of N+5 fake filenames, deletes the 5 oldest (by lexical sort, which is timestamp sort by construction).
Integration test (if feasible): construct a small AutoSnapshotState, run the interval loop for 2 ticks (interval = 1 s), assert two zips appear in the target dir and both deserialise without error.
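
To make the prune helper unit-testable without touching the filesystem, the selection step can be a pure function over filenames, with the directory scan and deletion kept as a thin wrapper around it. A minimal sketch; select_for_pruning is a hypothetical name.

```rust
const PREFIX: &str = "ryll-auto-snapshot-";
const SUFFIX: &str = ".zip";

/// Given the filenames found in `auto-snapshots/`, returns the ones to
/// delete so that at most `cap` auto-snapshot zips remain. Files that do
/// not match the auto-snapshot pattern are ignored, never deleted.
pub fn select_for_pruning(mut names: Vec<String>, cap: usize) -> Vec<String> {
    // Only consider files this feature created.
    names.retain(|n| n.starts_with(PREFIX) && n.ends_with(SUFFIX));
    // Lexical order is chronological because the timestamp is fixed-width.
    names.sort();
    // Keep the newest `cap`; everything before that is surplus.
    let excess = names.len().saturating_sub(cap);
    names.truncate(excess);
    names
}
```

A test along the lines of the plan would feed in cap + 5 generated names in arbitrary order and assert that exactly the 5 lexically smallest come back.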

Manual (5C):

Operator-driven; the smoke test in 5C is the contract.

Documentation impact

README.md / docs/configuration.md: new CLI flag docs (5B).
docs/troubleshooting.md: paragraph on auto-snapshot mode and when to use it (5B).
docs/libvirt-spice-recommendations.md: brief reference from the side-by-side testing recipe (5B).
Phase 10 (documentation phase) will further consolidate.

Success criteria

--auto-snapshot-interval 30 produces a new zip every ~30 seconds in <bug-report-dir>/auto-snapshots/.
Each zip is a full bug-report artefact equivalent to a manual F8 trigger (channel-state.json with all phase-4 diagnostics, pcap, metadata, runtime-metrics).
Rolling cap enforced; oldest zips pruned when cap is exceeded.
Disk usage stays bounded at roughly cap × ~1 MiB per zip.
The auto-snapshot task does not interfere with the GUI thread, the audio thread, or the manual F8 report path.
Operator awareness: one Info notification at session start confirms the mode is enabled (with interval, cap, and target path). A live counter in the stats panel shows "Auto-snapshot: {saved}/{cap}" and increments on each successful write; hidden when the mode is disabled. No per-snapshot notifications.
make build && make test && make lint && pre-commit run --all-files clean.

Back brief

Before executing 5A, the implementing sub-agent should back-brief the operator with:

Which fields on RyllApp need to become Arc-backed (if any).
Whether they're introducing a new BugReportType::AutoSnapshot variant or piggybacking on Connection.
The filename scheme (confirm UTC ISO + uptime).
How they're handling the cpal/audio-thread thread-safety concern (should be a non-issue if channel_snapshots is already Arc<ChannelSnapshots> — verify).
