Phase 2: Encoder pipeline (framebuffer → H.264 NAL units)¶
Prompt¶
Before responding to questions or making changes, explore the
codebase. Read the master plan at
docs/plans/PLAN-web-frontend.md (especially Resolutions §2
encoder choice and §4 frame pacing) and the Phase 1 plan and
its execution history. Key files for this phase:
- `ryll/src/capture.rs` — the closest in-tree precedent for an openh264 pipeline. The `VideoWriter` struct (~line 175) shows the lifecycle: `Encoder::new()`, the `RgbaSliceU8` wrapper, `YUVBuffer::from_rgb_source()` for RGBA→YUV conversion, `force_intra_frame()` for keyframe-on-demand, `encoder.encode(&yuv)` returning a layered bitstream, and the NAL-extraction loop (types 7=SPS, 8=PPS, 5=IDR slice, else=non-IDR slice).
- `shakenfist-spice-renderer/src/display/surface.rs` — owns RGBA pixels: `pixels(&self) -> &[u8]`, `size(&self) -> (u32, u32)`, `is_dirty(&self) -> bool`, `consume_dirty(&mut self) -> bool` (read-and-clear).
- `shakenfist-spice-renderer/src/session.rs` — the existing orchestrator that the `--web` mode will eventually layer with its own `EncoderTask` driver (Phase 4–5). Phase 2 does not wire production usage; we only build the encoder in isolation.
- `ryll/Cargo.toml` — `openh264 = { version = "0.6", optional = true }` is already gated behind the `capture` feature. Phase 2 will add the dep to the renderer's `Cargo.toml` (unconditional for now per the master plan; the "ship one fat binary" stance defers feature-gating to future work).
External references: openh264 0.6 docs, the H.264 NAL unit
type table (7=SPS, 8=PPS, 5=IDR, 1=non-IDR), webrtc-rs's
H264Payloader (in webrtc-rtp) for the Phase 3 pre-flight.
Flag any uncertainty rather than guessing.
Goal¶
Build a live H.264 encoder driven by framebuffer mutation, ready to be consumed by the WebRTC plumbing in Phase 3. After this phase:
- `shakenfist-spice-renderer/src/encoder/` is a new module containing a `FrameSource` trait, an `H264Encoder` stateful encoder, an `EncoderTask` async driver (30 fps cap, encode-on-dirty, keyframe-on-demand), and supporting types (`EncodedFrame`, `EncoderControl`).
- A test harness produces synthetic frames via a simple `FrameSource` impl, and the encoder writes the resulting Annex-B NAL stream to a `.h264` file that can be played with `ffplay test.h264`.
- A small unit test confirms webrtc-rs's H.264 packetiser accepts our NAL output without errors. This is the master plan's required pre-flight before Phase 3 takes a hard dependency on H.264.
No network code. No --web mode wiring. No GUI changes.
Scope¶
In:
- A new `encoder` module inside the renderer crate.
- Adding `openh264 = "0.6"` and `webrtc-rtp` (or the parent `webrtc` crate, scoped to just the RTP packetiser) to the renderer's `Cargo.toml`.
- A `FrameSource` trait that decouples the encoder from how pixels arrive. Phase 4–5 will provide a production impl; Phase 2 ships only a synthetic test impl.
- A stateful H.264 encoder with explicit keyframe-on-demand.
- 30 fps cap + encode-on-dirty + force-keyframe-on-attach pacing in an async tokio task.
- Annex-B NAL-unit output framing. (Annex-B prefixes each NAL with the start code `00 00 00 01`; this is what webrtc-rs's H.264 packetiser expects, and what `ffplay` accepts as a raw `.h264` file.)
- A test harness that produces a `.h264` file consumable by `ffplay`.
- A unit test against `webrtc-rs::rtp::codecs::h264::H264Payloader`.
Out:
- Production wiring (the encoder receiving pixels from real SPICE display channel events). That belongs in Phase 4–5 when the `--web` orchestrator exists.
- VP8 swap. The pre-flight in step 2e is just a test — if it fails, we add VP8 in a follow-up commit, but Phase 2 ships H.264-only by default.
- Per-rectangle dirty tracking. The master plan defers this to future work.
- Hardware encoders (NVENC/QSV/VAAPI). Future work.
- Cargo feature gating. The master plan defers this; the encoder is unconditionally compiled in the MVP.
- Bitrate / rate-control tuning beyond openh264 defaults. We can revisit if the Phase 5 CPU-budget instrumentation surfaces a problem.
- Removing the `capture` feature gate from the existing `--capture` flow. The two paths can coexist; consolidation is a separate follow-up.
Approach¶
Module shape¶
shakenfist-spice-renderer/src/encoder/:
- `mod.rs` — re-exports the public API.
- `frame_source.rs` — the `FrameSource` trait + a `SyntheticFrameSource` test impl.
- `h264.rs` — the `H264Encoder` (stateful wrapper around openh264), RGBA → YUV conversion, NAL-unit extraction in Annex-B framing.
- `task.rs` — the `EncoderTask` async driver with the 30 fps / dirty / keyframe loop.
The renderer's lib.rs re-exports:
pub mod encoder;
pub use encoder::{
EncodedFrame, EncoderControl, EncoderTask, FrameSource,
H264Encoder,
};
Frame source abstraction¶
pub trait FrameSource: Send + 'static {
/// Acquire the next frame to encode. Returns None if no
/// new frame is available since the last call. The
/// implementation handles dirty tracking and any
/// synchronisation with concurrent writers; the encoder
/// just consumes.
///
/// The returned reference is valid until the next call
/// to `next_frame`. Implementations may copy pixels into
/// an internal staging buffer to satisfy this lifetime.
fn next_frame(&mut self) -> Option<FrameRef<'_>>;
}
pub struct FrameRef<'a> {
pub width: u32,
pub height: u32,
pub rgba: &'a [u8],
/// Wall-clock timestamp in microseconds. Used by the
/// task to derive RTP timestamps later in Phase 3. Pick
/// a stable origin (e.g. encoder start instant) and
/// monotonically increase.
pub timestamp_us: u64,
}
Production usage in Phase 4–5 will provide a FrameSource
that owns/observes the renderer's surface map. Phase 2 only
ships SyntheticFrameSource for testing — animated test
pattern, e.g. a moving gradient over a fixed background.
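A minimal sketch of what `SyntheticFrameSource` could look like, assuming the `FrameRef` shape above (reproduced here so the sketch is self-contained); the gradient pattern itself is illustrative, not prescribed by the plan:

```rust
// Mirrors the FrameRef shape defined in frame_source.rs.
pub struct FrameRef<'a> {
    pub width: u32,
    pub height: u32,
    pub rgba: &'a [u8],
    pub timestamp_us: u64,
}

pub struct SyntheticFrameSource {
    width: u32,
    height: u32,
    frame_index: u64,
    staging: Vec<u8>, // internal buffer satisfying the next_frame lifetime
}

impl SyntheticFrameSource {
    pub fn new(width: u32, height: u32) -> Self {
        Self {
            width,
            height,
            frame_index: 0,
            staging: vec![0; (width * height * 4) as usize],
        }
    }

    /// Always produces a frame: a bright band scrolling down one row per
    /// call, over a fixed gradient background.
    pub fn next_frame(&mut self) -> Option<FrameRef<'_>> {
        let (w, h) = (self.width as usize, self.height as usize);
        let band_row = (self.frame_index as usize) % h;
        for y in 0..h {
            for x in 0..w {
                let i = (y * w + x) * 4;
                let in_band = y.abs_diff(band_row) < 8;
                self.staging[i] = if in_band { 255 } else { (x * 255 / w) as u8 }; // R
                self.staging[i + 1] = if in_band { 255 } else { (y * 255 / h) as u8 }; // G
                self.staging[i + 2] = 64; // B
                self.staging[i + 3] = 255; // A (opaque)
            }
        }
        let timestamp_us = self.frame_index * 33_333; // 30 fps cadence
        self.frame_index += 1;
        Some(FrameRef {
            width: self.width,
            height: self.height,
            rgba: &self.staging,
            timestamp_us,
        })
    }
}
```

The staging buffer is reused across calls, which is why the returned reference is only valid until the next `next_frame` call.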
H.264 encoder (stateful)¶
pub struct H264Encoder {
inner: openh264::encoder::Encoder,
width: u32,
height: u32,
}
impl H264Encoder {
pub fn new(width: u32, height: u32) -> Result<Self> { ... }
/// Encode one RGBA frame, returning Annex-B framed NAL
/// units. If `force_keyframe` is true, this frame is
/// emitted as an IDR (which produces SPS/PPS NALs too).
pub fn encode(
&mut self,
rgba: &[u8],
force_keyframe: bool,
) -> Result<EncodedFrame> { ... }
}
pub struct EncodedFrame {
/// Annex-B-framed NAL units. Each NAL is prefixed with
/// the 4-byte start code `00 00 00 01`. Concatenating
/// `EncodedFrame::nal_units` over time produces a valid
/// `.h264` Annex-B stream.
pub nal_units: Vec<Vec<u8>>,
pub timestamp_us: u64,
pub keyframe: bool,
}
Key constraint inherited from openh264 / `capture.rs`: even dimensions are required (`w & !1`, `h & !1`). If the surface size is odd, round down and crop.
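A minimal sketch of the even-dimension clamp and the Annex-B framing described above; `clamp_even` and `annex_b_frame` are hypothetical helper names, not part of the planned API:

```rust
/// Round odd dimensions down to even, per openh264's requirement.
/// Returns None if a dimension would collapse to zero.
fn clamp_even(width: u32, height: u32) -> Option<(u32, u32)> {
    let (w, h) = (width & !1, height & !1);
    if w == 0 || h == 0 {
        None
    } else {
        Some((w, h))
    }
}

/// Wrap a raw NAL body in Annex-B framing: a 4-byte start code prefix.
fn annex_b_frame(nal: &[u8]) -> Vec<u8> {
    let mut out = Vec::with_capacity(nal.len() + 4);
    out.extend_from_slice(&[0x00, 0x00, 0x00, 0x01]);
    out.extend_from_slice(nal);
    out
}
```

Concatenating the output of `annex_b_frame` over successive NALs yields exactly the `.h264` byte stream `ffplay` expects.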
Async task driver¶
pub struct EncoderTask<S: FrameSource> {
encoder: H264Encoder,
source: S,
output: mpsc::Sender<EncodedFrame>,
control: mpsc::Receiver<EncoderControl>,
fps_cap: u32,
}
pub enum EncoderControl {
/// Force the next frame to be an IDR keyframe. Called
/// by Phase 3+ whenever a new viewer attaches.
RequestKeyframe,
/// Stop the task.
Stop,
}
impl<S: FrameSource> EncoderTask<S> {
pub fn spawn(...) -> tokio::task::JoinHandle<Result<()>> {
tokio::task::spawn_blocking(|| {
// openh264 is a sync C library; run on a blocking
// thread to avoid stalling the tokio executor on
// long encodes.
...
})
}
}
The task loop selects on:
- `control_rx.recv()` — handle `RequestKeyframe` (set a `keyframe_pending` flag) and `Stop` (break).
- A tick at `1_000_000 / fps_cap` microsecond cadence (33333 µs at 30 fps). On tick, call `source.next_frame()`. If `Some`, encode (forcing a keyframe if `keyframe_pending`) and send the `EncodedFrame` on `output`. If `None`, skip the tick — no idle frames.
The `spawn_blocking` choice trades async fairness for encoder simplicity: openh264 is pure CPU, and running it inside the async executor would block other tasks. A blocking thread is the right model. Inside that thread the loop polls the tokio channels with `blocking_recv` / `blocking_send` (entering the async context via `tokio::runtime::Handle::current()` only if a `select!` is genuinely needed) — or, simpler, the task uses `std::thread::spawn` plus plain `recv` / `send` on crossbeam-style channels and reserves tokio for the rest. Implementation detail; the agent picks whichever is cleanest.
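Under either channel choice, the loop skeleton is the same shape. A self-contained sketch follows, using `std::sync::mpsc` channels so it runs without tokio, with the tick and encode step stubbed out (the `bool` stands in for `EncodedFrame.keyframe`):

```rust
use std::sync::mpsc::{self, TryRecvError};
use std::time::Duration;

enum EncoderControl {
    RequestKeyframe,
    Stop,
}

/// Blocking encoder loop: drain control messages, then produce one
/// (stubbed) frame per tick.
fn run_loop(control: mpsc::Receiver<EncoderControl>, out: mpsc::Sender<bool>) {
    let mut keyframe_pending = true; // force a keyframe on the first frame
    loop {
        // Drain pending control messages without blocking the tick cadence.
        loop {
            match control.try_recv() {
                Ok(EncoderControl::RequestKeyframe) => keyframe_pending = true,
                Ok(EncoderControl::Stop) | Err(TryRecvError::Disconnected) => return,
                Err(TryRecvError::Empty) => break,
            }
        }
        // Stand-in for the 33 ms tick, next_frame(), and encode().
        std::thread::sleep(Duration::from_millis(1));
        if out.send(keyframe_pending).is_err() {
            return; // receiver gone; shut down
        }
        keyframe_pending = false;
    }
}
```

Note the flag is cleared only after a frame is actually sent, so a keyframe request arriving while the source is idle still takes effect on the next real frame.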
VP8 contingency pre-flight (step 2e)¶
The master plan says: "Before declaring Phase 2 done, run a
small integration test through webrtc-rs's H.264
packetiser to confirm browser playback works." We can
approximate "browser playback works" with a unit test that:
- Creates a `webrtc::rtp::codecs::h264::H264Payloader`.
- Feeds it the NAL bytes from a few `EncodedFrame`s produced by the encoder (with the Annex-B start codes stripped, since the payloader expects raw NALs).
- Asserts: the payloader succeeds, produces non-empty RTP payloads, and the packets for an IDR frame include the SPS and PPS NAL types (as STAP-A aggregation or single-NAL-unit packets, per RFC 6184).
If that succeeds, Phase 2 is done with H.264. If it fails
(packetiser rejects the NALs, missing SPS/PPS handling,
etc.), the contingency is a single follow-up commit
introducing a VP8 alternative via vpx-encode. Same
FrameSource and EncoderTask shape; different inner
encoder. Don't pre-build the VP8 path; only build it if
2e fails.
Prerequisites¶
- Phase 1 complete. (It is.)
- thought-bubble rebased onto current develop. (It is.)
Steps¶
| Step | Effort | Model | Isolation | Brief for sub-agent |
|---|---|---|---|---|
| 2a | low | sonnet | none | Scaffold the encoder module. Create shakenfist-spice-renderer/src/encoder/{mod,frame_source,h264,task}.rs (the latter three as empty stubs with TODO-style placeholders). Define the public types in mod.rs (re-exports) and the FrameSource / FrameRef types in frame_source.rs. Add openh264 = "0.6" to shakenfist-spice-renderer/Cargo.toml. Add pub mod encoder; and the re-exports to the renderer's lib.rs. Run make build, make lint, make test, pre-commit run --all-files. No new logic yet — this commit just establishes the module shape. Single commit. |
| 2b | high | opus | worktree | Implement H264Encoder in h264.rs. Lift the openh264 setup from ryll/src/capture.rs:175-310 (read it carefully): Encoder::new(), RgbaSliceU8::new, YUVBuffer::from_rgb_source, force_intra_frame(), encode(&yuv), the NAL-extraction loop with type-7 (SPS), type-8 (PPS), type-5 (IDR), type-1 (non-IDR) handling. Differences from capture.rs: (1) no MP4 muxing — emit Annex-B framed NAL units directly (each NAL prefixed with [0x00, 0x00, 0x00, 0x01]); (2) keyframe-on-demand via a force_keyframe: bool parameter, not just on first frame; (3) no last_timestamp_ms state — that lives in the task. The struct holds (width, height, openh264 Encoder). Even-dimension constraint: w & !1, h & !1; round down + return Err if either becomes 0. Add a unit test that encodes a 64x64 black frame, asserts the result has at least one IDR + SPS + PPS, and asserts subsequent encodes (without force_keyframe) produce smaller bitstreams. Run make checks. Worktree because the openh264 lifecycle has subtle failure modes (encoder state corruption on bad inputs) and we want to be able to throw the worktree away if the experiment goes sideways. Single commit. |
| 2c | medium | sonnet | none | Implement EncoderTask and EncoderControl in task.rs. The task takes a FrameSource, an H264Encoder, an mpsc::Sender<EncodedFrame>, an mpsc::Receiver<EncoderControl>, and an fps_cap: u32. Loop: select on the control receiver and a timer tick at 1_000_000 / fps_cap us. On RequestKeyframe, set a keyframe_pending flag for the next frame. On Stop, break. On tick, source.next_frame() — if Some(frame), encode (forcing keyframe if pending), send EncodedFrame on output, clear the flag; if None, skip. Use tokio::task::spawn_blocking because openh264 is a sync C library and blocking the tokio executor on encode times is bad. Inside the blocking thread, channels are tokio::sync::mpsc accessed via blocking_recv / blocking_send (or try_recv if you prefer non-blocking polling). Returns JoinHandle<Result<()>>. Add a unit test that spawns the task with a SyntheticFrameSource (defined in 2d, but stub-feed it here for the test if 2d is not yet landed — or better: have 2d add the test atop 2c's groundwork). Run make checks. Single commit. |
| 2d | medium | sonnet | none | Build the test harness. Add a SyntheticFrameSource to frame_source.rs that generates a 1280x720 RGBA frame with an animated gradient (e.g. a moving horizontal sine band over a fixed checkerboard); the timestamp is the frame index × 33333 us. Add an integration-style test in shakenfist-spice-renderer/tests/encoder_smoke.rs that: spawns an EncoderTask with the synthetic source and a 30 fps cap; runs it for ~3 seconds (90 frames); collects the output EncodedFrames; concatenates the Annex-B-framed NALs; writes the result to target/encoder_smoke.h264. Test asserts: at least one keyframe, total frame count ≈ 90 (allow ±5 for tick scheduling), file is non-empty. The test does not assert decode correctness — that's a manual ffplay target/encoder_smoke.h264 check after the test runs. Mention that in the test's doc comment so the human verifying the phase knows. Run make checks. Single commit. |
| 2e | high | opus | worktree | The webrtc-rs H.264 packetiser pre-flight. Add webrtc-rtp = "0.x" (or webrtc = "0.x" scoped) as a renderer dev-dependency. Add a unit test in shakenfist-spice-renderer/tests/webrtc_h264_smoke.rs that: encodes ~3 frames with the H264Encoder (one IDR + two non-IDR); for each EncodedFrame, strips the Annex-B start codes from each NAL and feeds the raw NALs to webrtc::rtp::codecs::h264::H264Payloader::payload(mtu, &nal) (MTU ~1200); asserts the payloader returns Ok(non_empty_payloads) and that for the IDR frame, payloads include the SPS and PPS NAL types as STAP-A or single-NAL-unit packets (per RFC 6184). If the test passes: Phase 2 is done. If it fails or hits API surprises (e.g. payloader expects a different framing): commit the test as-is documenting the failure, then a follow-up commit adds VP8 via vpx-encode as the encoder backend. Worktree because this is the highest-risk pre-flight: if the test surprises us, the whole branch may need reshaping and we want to be able to throw the worktree away. Single commit (or two if VP8 contingency triggers). |
After 2e, the renderer crate's src/encoder/ has mod.rs,
frame_source.rs, h264.rs, task.rs. The renderer's
Cargo.toml has openh264 as a regular dep and
webrtc-rtp (or scoped webrtc) as a dev-dep. Tests in
shakenfist-spice-renderer/tests/ cover the smoke run +
the packetiser pre-flight.
Step details¶
Step 2b expanded brief¶
The lift from capture.rs is mostly mechanical, but pay
attention to:
- Annex-B framing. `capture.rs` builds an MP4-shaped `frame_data` Vec by appending NAL bodies without start codes (because MP4 uses length-prefix framing). Phase 2 needs Annex-B framing: each NAL prefixed with `[0x00, 0x00, 0x00, 0x01]`. Build the output as `Vec<Vec<u8>>` where each inner `Vec` is one Annex-B framed NAL. The caller can concatenate for `.h264` file output or strip start codes for RTP payloading.
- SPS/PPS handling. Each IDR encode emits SPS (NAL type 7) and PPS (NAL type 8) followed by the IDR slice (type 5). Non-IDR encodes emit only the slice (type 1). Emit all NALs from each `encode()` call; don't filter SPS/PPS to the first frame only — the receiver may need them after a forced keyframe.
- Keyframe-on-demand. openh264's `force_intra_frame()` is sticky — it sets a flag that the next encode consumes. Call it before the next `encode()` whenever `force_keyframe == true`. Test that calling it twice between encodes still produces only one IDR (don't double-force).
- Encoder error recovery. If `encode()` returns `Err`, the encoder may be in a bad state. Strategy: log the error, return it from the public `encode()` method, and let the caller decide whether to recreate the encoder. Don't try to recover internally.
- No `Mp4Writer` / `mp4::AvcConfig`. Drop all the MP4 muxing code entirely — it does not belong in the live path.
Step 2c expanded brief¶
The async runtime model has two reasonable shapes:
- `spawn_blocking` — the encoder loop runs on tokio's blocking pool. Channels are `tokio::sync::mpsc` accessed via `blocking_recv()`. Simple, plays well with tokio.
- `std::thread::spawn` — fully detached from tokio. Channels are `crossbeam::channel`. More independent, but adds a dep.
Pick `spawn_blocking`. It's already idiomatic in this codebase (see how the audio path uses cpal, which has a similar shape).
The 30 fps tick can be a `std::thread::sleep` inside the blocking thread — a tokio timer would require entering the async context, which is awkward inside `spawn_blocking`. A simple `Instant::now() + Duration::from_micros(33_333) - elapsed` sleep handles drift well enough. If a frame encode takes longer than the budget, skip the next tick (don't catch up, just move on).
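The skip-don't-catch-up pacing can be sketched as a small pacer; the `Pacer` type below is hypothetical, not part of the planned API:

```rust
use std::time::{Duration, Instant};

/// Sleeps until the next tick boundary; if an encode overran the budget,
/// skips the missed boundaries rather than queueing catch-up frames.
struct Pacer {
    start: Instant,
    interval: Duration,
    next_tick: u32,
}

impl Pacer {
    fn new(fps: u32) -> Self {
        Self {
            start: Instant::now(),
            interval: Duration::from_micros(1_000_000 / fps as u64),
            next_tick: 1,
        }
    }

    fn wait(&mut self) {
        let mut deadline = self.start + self.interval * self.next_tick;
        let now = Instant::now();
        if deadline <= now {
            // Overran the budget: jump past every missed boundary.
            let elapsed_us = now.duration_since(self.start).as_micros() as u64;
            let interval_us = self.interval.as_micros() as u64;
            self.next_tick = (elapsed_us / interval_us + 1) as u32;
            deadline = self.start + self.interval * self.next_tick;
        }
        std::thread::sleep(deadline.saturating_duration_since(Instant::now()));
        self.next_tick += 1;
    }
}
```

Anchoring each deadline to the original `start` instant (rather than sleeping a fixed interval after each encode) keeps the long-run rate at the cap even when individual ticks jitter.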
The keyframe-pending flag is local to the loop — no need for an atomic or a mutex, since the loop runs on a single thread.
Step 2e expanded brief¶
The test must handle the Annex-B → raw-NAL conversion itself: strip the start codes before handing NALs to the payloader. In pseudo-code:
for nal_with_start_code in encoded_frame.nal_units {
    // Strip the leading 4-byte start code [0, 0, 0, 1].
    let raw_nal = Bytes::copy_from_slice(&nal_with_start_code[4..]);
    let payloads = payloader.payload(1200, &raw_nal)?;
    assert!(!payloads.is_empty());
}
If the payloader API differs (the H264Payloader takes
Bytes or has a different contract), adapt — the goal is
"the packetiser doesn't reject our NAL output". If it
rejects, that's the signal to switch to VP8. Document the
exact failure mode in the commit message before adding the
VP8 alternative.
If the test passes: this is a small, fast unit test that should live in the codebase as a regression check. It does not boot a full WebRTC connection — that's Phase 3.
Acceptance criteria¶
- `make lint` passes after each of 2a, 2b, 2c, 2d, 2e.
- `make test` passes after each step.
- After 2d: `target/encoder_smoke.h264` exists after the test run and plays in `ffplay` (manual verification step, documented in the test).
- After 2e: the webrtc-rs H.264 packetiser unit test passes, OR a follow-up commit lands the VP8 alternative.
- `pre-commit run --all-files` passes.
- The encoder module is self-contained: no back-deps on ryll, no use of `egui`, no use of `eframe`. Verify with `grep -rE "egui|eframe|ryll" shakenfist-spice-renderer/src/encoder/` returning nothing.
- Each of 2a–2e is a single commit on `thought-bubble` with a message that follows project conventions.
Risks¶
- openh264 API drift between versions. `capture.rs` uses openh264 0.6. If a sub-agent reads outdated docs and imports a newer-version API, the build will fail. Anchor on what `capture.rs` already uses.
- Annex-B vs MP4 framing confusion. Easy to mix up. The brief is explicit, but the test in 2d is the safety net (`ffplay` will reject malformed framing immediately).
- `spawn_blocking` + tokio mpsc. `blocking_recv` works inside `spawn_blocking` because the runtime allows blocking calls there. But `blocking_send` requires the channel not to be in a single-threaded local set. Keep channels at module / static scope (no `LocalSet`).
- Frame-rate drift under load. If encode occasionally takes >33 ms, the next tick is skipped rather than caught up. Acceptable for 30 fps. If we ever bump to 60 fps and load matters, we'll need a smarter scheduler.
- webrtc-rs H.264 packetiser surprises. The pre-flight in 2e exists specifically to catch this. If the packetiser API does not accept the NAL framing we produce, the contingency is VP8. The risk is that VP8 via `vpx-encode` has its own surprises, but those would be discovered before Phase 3 starts.
- `webrtc-rtp` crate version churn. The webrtc-rs ecosystem has been moving. Pin a version and verify it builds inside the devcontainer.
Documentation updates¶
After 2e, update:
- `ARCHITECTURE.md` — note the new encoder module, where it sits in the data flow (DisplaySurface → FrameSource → H264Encoder → EncoderTask → mpsc → Phase 3 WebRTC), and the pacing model.
- `AGENTS.md` — note that `make test` now exercises the encoder smoke test; mention `ffplay target/encoder_smoke.h264` as a manual verification step for the encoder.
- `docs/multi-mode-parity.md` — no row change yet; the encoder isn't user-visible until Phase 4 wires it up.
- `docs/plans/PLAN-web-frontend.md` — flip the Phase 2 row in the Execution table from "Not written" to "In progress" on the first phase commit, and to "Complete" on 2e.
- `docs/plans/index.md` — the Web frontend status row stays at "In progress"; the phases-list line gets Phase 2 marked complete when 2e lands.
These doc updates can be batched into the 2e commit, or done as a follow-up commit on top of 2e.
Estimated total scope¶
Roughly 800–1200 lines of new code across five commits, the bulk of it in 2b (H.264 encoder, ~300 lines) and 2c (async task driver, ~200 lines). 2a is scaffolding (~50 lines), 2d is a test harness (~150 lines), 2e is a test (~100 lines) plus possibly a VP8 commit if the contingency triggers (another ~300 lines).
Back brief¶
Before executing 2a, the implementing agent should
back-brief: which dependency versions will be added, the
concrete FrameSource and EncodedFrame API shapes, and
how the agent intends to handle the tokio / spawn_blocking
boundary. Do not start editing without the back-brief.
Subsequent steps (2b–2e) follow the same pattern: back-brief first, edit second.