Reproducible product videos with Playwright + Remotion
How CaptureBeam pairs Playwright (capture) with Remotion (render) to compute polished motion video from a YAML script. Synthetic cursor, auto-zoom, captions — all derived, none recorded.
CaptureBeam pairs two specific tools: Playwright for capture, Remotion for render. Most of the engineering is in how they hand off to each other. Here's the architecture.
The split
Capture is expensive. Render is cheap. The split lets us re-render without re-capturing — switch aspect ratios, swap brand presets, change the cursor color, all in seconds. Capture only re-runs when the YAML or the underlying UI changes.
Two artifacts come out of capture:
- raw.webm: the bare browser recording from Playwright's recordVideo or, in --high-res mode, a video stitched from per-frame screenshots.
- events.json: each step's resolved bounding rect, target metadata, and timestamps, plus per-keystroke timing, network-idle moments, and recovery flags.
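To make the second artifact concrete, here is a minimal sketch of what one entry in events.json might look like. Every field name below is an assumption chosen for illustration; the actual schema may differ.

```typescript
// Hypothetical shape for one entry in events.json. Field names are
// illustrative assumptions, not CaptureBeam's actual schema.
interface Rect { x: number; y: number; width: number; height: number }

interface StepEvent {
  stepId: string;
  rect: Rect;                // resolved bounding rect at action time
  target: { role?: string; name?: string; selector?: string };
  startMs: number;           // offset from the start of the recording
  endMs: number;
  keystrokesMs?: number[];   // per-keystroke timestamps for typing steps
  networkIdleMs?: number[];  // moments when the network went idle
  recovery?: "rect-fallback" | "visual-match"; // present only if recovery ran
}

// Center of a rect: the point a synthetic cursor would travel to.
function rectCenter(r: Rect): { x: number; y: number } {
  return { x: r.x + r.width / 2, y: r.y + r.height / 2 };
}

const example: StepEvent = {
  stepId: "click-save",
  rect: { x: 120, y: 200, width: 100, height: 40 },
  target: { role: "button", name: "Save" },
  startMs: 1500,
  endMs: 1900,
};
```

Because the render side only ever reads this file, anything it needs to draw (cursor path, zoom target, caption timing) has to be derivable from fields like these.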
What Playwright does
Playwright runs the YAML steps against a real browser. Each step:
- Resolves the natural-language (NL) target via getByRole / getByLabel / getByText / getByTestId, with multi-attempt recovery and a persistent cache of selectors that succeeded on earlier runs.
- Captures the resolved bounding rect at action time and writes it to events.json.
- Saves an element screenshot to out/anchors/<stepId>.png as a visual anchor for future re-runs.
- Hides the OS cursor by injecting cursor: none CSS on every navigation and parking the mouse at (-1, -1) between steps.
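The per-step work above can be sketched roughly as follows. This is not the actual runner: runClickStep, StepRecord, and the LocatorLike / PageLike interfaces are hypothetical names. The interfaces list only the Playwright methods the sketch calls (boundingBox, screenshot, click, addStyleTag, mouse.move), so the example runs without the dependency.

```typescript
// Minimal structural types covering only the calls used below. Playwright's
// Page and Locator expose methods with these names; the interfaces exist so
// the sketch runs without installing Playwright.
interface Rect { x: number; y: number; width: number; height: number }
interface LocatorLike {
  boundingBox(): Promise<Rect | null>;
  screenshot(options: { path: string }): Promise<unknown>;
  click(): Promise<void>;
}
interface PageLike {
  addStyleTag(options: { content: string }): Promise<unknown>;
  mouse: { move(x: number, y: number): Promise<void> };
}
interface StepRecord { stepId: string; rect: Rect; startMs: number; endMs: number }

async function runClickStep(
  page: PageLike,
  target: LocatorLike,
  stepId: string,
  events: StepRecord[],
): Promise<void> {
  // Simplification: the style is applied at the start of each step here,
  // rather than once per navigation.
  await page.addStyleTag({ content: "* { cursor: none !important; }" });

  // Capture the rect at action time, before the click can change the layout.
  const rect = await target.boundingBox();
  if (rect === null) throw new Error(`step ${stepId}: target has no bounding box`);

  // Save the visual anchor for future re-runs.
  await target.screenshot({ path: `out/anchors/${stepId}.png` });

  const startMs = Date.now();
  await target.click();
  events.push({ stepId, rect, startMs, endMs: Date.now() });

  // Park the real mouse off-screen so hover states don't leak into the video.
  await page.mouse.move(-1, -1);
}
```

With real Playwright, the target would come from something like page.getByRole("button", { name: "Save" }), and the events array would be serialized to events.json at the end of the run.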
recordVideo runs in the background and produces raw.webm at the configured resolution (1080p, 1440p, or 4K). When the output has to hold up at 4K, we switch to --high-res mode: per-frame screenshots stitched with ffmpeg, which is slower but pixel-perfect.
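As a rough illustration of the recordVideo path, the sketch below maps a quality preset to the viewport and recordVideo options that Playwright's browser.newContext accepts. The Quality type, the SIZES table, and contextOptionsFor are hypothetical names, and the pixel dimensions are the conventional ones for each label rather than values taken from the project.

```typescript
type Quality = "1080p" | "1440p" | "4K";

// Conventional pixel dimensions for each preset (an assumption).
const SIZES: Record<Quality, { width: number; height: number }> = {
  "1080p": { width: 1920, height: 1080 },
  "1440p": { width: 2560, height: 1440 },
  "4K": { width: 3840, height: 2160 },
};

// Builds an options object in the shape browser.newContext() expects:
// a viewport plus recordVideo with an output directory and frame size.
function contextOptionsFor(quality: Quality, videoDir: string) {
  const size = SIZES[quality];
  return { viewport: size, recordVideo: { dir: videoDir, size } };
}

const options = contextOptionsFor("1440p", "out");
```

Playwright chooses the video filename inside dir, so ending up with a file called raw.webm would take a follow-up step such as saving or renaming the recording once the page closes.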
What Remotion does
Remotion takes raw.webm + events.json and composes the polished output. The key insight: every overlay is computed from events.json; nothing is lifted from pixels in the recording.
- Synthetic cursor. Eased rect-to-rect motion between step targets, with click ripples and motion blur at high speed. Drawn in Remotion as an absolutely-positioned div, not in the recording.
- Auto-zoom. The camera transform is keyframed around each step's rect: a 350ms ease-in, a hold for the duration of the action, then a 250ms ease-out.
- Highlights, captions, title card, ripples. All per-step schema fields, rendered as overlay layers above the recording.
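Since these layers are pure functions of time and step data, the cursor and zoom math can be sketched without Remotion at all. The version below is an assumption-laden illustration: the smoothstep curve, the choice to finish zooming in exactly when the action starts, and the names zoomProgress and cursorAt are all hypothetical. In a Remotion component the time would typically come from the current frame divided by the composition's fps.

```typescript
interface Point { x: number; y: number }

const clamp01 = (t: number): number => Math.min(1, Math.max(0, t));

// Smoothstep easing: 0 at t=0, 1 at t=1, zero slope at both ends.
// The real easing curve is not specified in the post; this is a stand-in.
const smoothstep = (t: number): number => {
  const c = clamp01(t);
  return c * c * (3 - 2 * c);
};

// Returns 0 (no zoom) to 1 (fully zoomed on the step rect) at time tMs.
// Assumes the zoom-in finishes when the action starts and the zoom-out
// begins when it ends.
function zoomProgress(
  tMs: number,
  stepStartMs: number,
  stepEndMs: number,
  zoomInMs = 350,
  zoomOutMs = 250,
): number {
  const inStart = stepStartMs - zoomInMs;
  const outEnd = stepEndMs + zoomOutMs;
  if (tMs <= inStart || tMs >= outEnd) return 0;
  if (tMs < stepStartMs) return smoothstep((tMs - inStart) / zoomInMs);
  if (tMs <= stepEndMs) return 1; // hold for the action
  return 1 - smoothstep((tMs - stepEndMs) / zoomOutMs);
}

// Eased position between two step targets; t runs from 0 to 1.
function cursorAt(from: Point, to: Point, t: number): Point {
  const e = smoothstep(t);
  return { x: from.x + (to.x - from.x) * e, y: from.y + (to.y - from.y) * e };
}
```

The zoom value would then scale and translate the stage toward the step rect, while cursorAt positions the cursor div between the centers of consecutive rects.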
The recording layer underneath is just a video tag inside a rounded clip. Switching aspect ratios is a render-time flag because the camera transform handles the crop. Switching brand presets is a render-time flag because the cursor and stage CSS are computed.
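To show why the aspect ratio can be a render-time flag, here is a small sketch of the crop computation: given the source frame, a target aspect ratio, and a focus point, it returns the largest window of that ratio that fits in the frame, kept as close to the focus as the edges allow. The function name cropWindow and its clamping policy are assumptions for illustration.

```typescript
interface Rect { x: number; y: number; width: number; height: number }

const clamp = (v: number, lo: number, hi: number): number =>
  Math.min(hi, Math.max(lo, v));

// Largest window with the requested aspect ratio (width / height) that fits
// inside the source frame, centered on `focus` and clamped to the frame.
function cropWindow(
  source: { width: number; height: number },
  aspect: number,
  focus: { x: number; y: number },
): Rect {
  const sourceAspect = source.width / source.height;
  // Wider target than source: full width, reduced height. Otherwise the reverse.
  const width = aspect >= sourceAspect ? source.width : source.height * aspect;
  const height = aspect >= sourceAspect ? source.width / aspect : source.height;
  return {
    x: clamp(focus.x - width / 2, 0, source.width - width),
    y: clamp(focus.y - height / 2, 0, source.height - height),
    width,
    height,
  };
}

// A square crop of a 1920x1080 capture, following a target near the right edge.
const square = cropWindow({ width: 1920, height: 1080 }, 1, { x: 1900, y: 540 });
```

The same capture can feed a 16:9, 1:1, or 9:16 render by changing only the aspect argument, which is the property the render-time flag relies on.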
The recovery chain
On re-run against a new build, target resolution can fail. Our chain:
- Try the cached resolved selector from the previous run.
- If that misses, re-resolve from the NL intent.
- If that misses, fall back to the previous run's bounding rect. This rect-fallback recovery is the last resort today.
- On the roadmap: visual-anchor template-match against the cached element screenshot. High confidence → click; medium → tag recovery: visual-match; low → emit a needs-review diff with old vs. new screenshots side-by-side.
Each recovery attempt is logged in events.json so post-mortems are possible. The runner doesn't silently fail; it tries hard, then bubbles a diff to a human.
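The chain reads naturally as an ordered list of resolvers tried until one succeeds, with every attempt logged. The sketch below shows that pattern under assumed names (resolveWithRecovery, Attempt, and the via labels other than rect-fallback are hypothetical). It covers only the three strategies described as shipping today, not the roadmap visual match.

```typescript
interface Rect { x: number; y: number; width: number; height: number }

type Via = "cached-selector" | "re-resolved" | "rect-fallback";

interface Attempt { via: Via; ok: boolean }
interface Resolution { via: Via; rect: Rect }

// Each strategy resolves to a rect, or null when it misses.
interface Strategy { via: Via; run: () => Promise<Rect | null> }

// Tries strategies in order, logging each attempt. Throws when all miss,
// so a failure surfaces to a human instead of passing silently.
async function resolveWithRecovery(
  chain: Strategy[],
  log: Attempt[],
): Promise<Resolution> {
  for (const { via, run } of chain) {
    const rect = await run();
    log.push({ via, ok: rect !== null });
    if (rect !== null) return { via, rect };
  }
  throw new Error("target unresolved after all recovery attempts; needs review");
}
```

The log array stands in for the recovery entries written to events.json; a post-mortem can then see which strategies missed before one hit.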
The bundle cache
Remotion bundling is the slowest fixed cost — about 5–10 seconds per cold render. We hash the contents of src/render and src/schema; if unchanged, we reuse the bundle from .remotion-cache and skip bundling entirely. Re-renders against a stable composition are sub-second to start.
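A content-hash cache of this kind can be sketched with Node's standard library alone. The helper names (listFiles, readEntries, hashEntries, needsRebundle) and the choice of SHA-256 are assumptions; the post does not say which hash is used or how the cached value is stored.

```typescript
import { createHash } from "node:crypto";
import { readdirSync, readFileSync, statSync } from "node:fs";
import { join } from "node:path";

type Entry = [path: string, contents: string | Uint8Array];

// Recursively lists files under dir, sorted so the hash is deterministic.
function listFiles(dir: string): string[] {
  return readdirSync(dir).sort().flatMap((name) => {
    const full = join(dir, name);
    return statSync(full).isDirectory() ? listFiles(full) : [full];
  });
}

// Reads every file under the given directories into (path, contents) pairs.
function readEntries(dirs: string[]): Entry[] {
  return dirs.flatMap(listFiles).map((file): Entry => [file, readFileSync(file)]);
}

// Hashes paths and contents together, so renames and edits both invalidate.
function hashEntries(entries: Entry[]): string {
  const hash = createHash("sha256");
  for (const [path, contents] of entries) {
    hash.update(path);
    hash.update("\0");
    hash.update(contents);
    hash.update("\0");
  }
  return hash.digest("hex");
}

// Rebundle when there is no cached hash or the sources have changed.
function needsRebundle(currentHash: string, cachedHash: string | null): boolean {
  return cachedHash === null || cachedHash !== currentHash;
}
```

In use, hashEntries(readEntries(["src/render", "src/schema"])) would be compared with a hash stored alongside the bundle in .remotion-cache, and the bundler would run only when needsRebundle returns true.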
Open architecture
The whole pipeline is a Node script you can read in an afternoon. No proprietary capture format. No closed render engine. We bet on Playwright and Remotion specifically because both are battle-tested open-source projects with active dev communities — and we wanted the underlying tools to be ones engineers already trusted.
If you want to extend the renderer with a new visual effect, write a Remotion component and reference it from src/render. If you want a new step type, add a case to the discriminated union in src/schema.ts and a handler in src/runner.ts. There's no hidden magic.
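As an illustration of the step-type extension point, here is a minimal discriminated union with an exhaustive handler. The step types and fields shown (click, type, wait) are hypothetical examples, not the contents of src/schema.ts.

```typescript
// Hypothetical step schema: each variant is tagged by its `type` field.
type Step =
  | { type: "click"; target: string }
  | { type: "type"; target: string; text: string }
  | { type: "wait"; ms: number };

// One case per step type. The `never` assignment in the default branch makes
// the compiler flag this function whenever a new variant lacks a handler.
function describeStep(step: Step): string {
  switch (step.type) {
    case "click":
      return `click "${step.target}"`;
    case "type":
      return `type "${step.text}" into "${step.target}"`;
    case "wait":
      return `wait ${step.ms}ms`;
    default: {
      const unreachable: never = step;
      throw new Error(`unhandled step: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Adding a fourth variant to the union without a matching case turns the never assignment into a compile error, which keeps the schema and the runner in step.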