$ emrebener

Pi Harness Skills

Two MCP-free skills for AI coding agents: browser-tools, a self-contained Node.js skill that gives an agent the core browser automation Playwright MCP offers, and web-search, a single Node script that turns a self-hosted SearXNG instance into a search backend. Neither puts an MCP server in the middle. Both ship in one repo. I use them daily inside Pi, the AI coding harness from Mario Zechner.

Pi is the reason these exist. Pi doesn’t support MCP at all. Mario’s argument is that MCP tools cost tokens whether or not you use them, and skills are the better primitive. I agree. But the gap that leaves is real: Playwright MCP and Chrome DevTools MCP are how almost every agent today gets a browser. If your harness doesn’t speak MCP, you don’t get one. Same story for web search: Anthropic’s web-search tool, Tavily, Serper, and Perplexity all arrive as MCP servers or paid APIs.

1. What I built

Skill-shaped alternatives to both. The project started as a fork of Mario’s own first cut and grew, across a few long Claude Code sessions, into something I now reach for daily. The shape that emerged:

  • One-shot CLI scripts, no always-on daemon. browser-tools is 23 small scripts and a shared lib.js. Each connects to Chromium on :9222 over the Chrome DevTools Protocol, does its job, and exits. The agent invokes them like any other shell command. web-search is one script.
  • Zero idle context cost. Nothing loads until the agent triggers the skill. Pi’s whole pitch is keeping context cheap; this matches it.
  • Transparent and hackable. Plain JavaScript files the user can read and modify. No protocol layer, no MCP server, nothing in front of the actual work.
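For a sense of what “connects to Chromium on :9222” involves: a browser started with --remote-debugging-port serves a small HTTP endpoint, /json/list, that lists open targets along with the WebSocket URL a CDP client attaches to. The sketch below shows only that discovery step, assuming the script drives the first ordinary tab. listTargets and pickPageTarget are hypothetical names, not functions from the skill’s lib.js.

```javascript
// Hypothetical discovery step for a one-shot script (not the skill's lib.js).
// Assumes Chromium was started with --remote-debugging-port=9222.

// Fetch the DevTools target list (Node 18+ has fetch built in).
async function listTargets(port = 9222) {
  const res = await fetch(`http://127.0.0.1:${port}/json/list`);
  if (!res.ok) throw new Error(`DevTools endpoint returned HTTP ${res.status}`);
  return res.json();
}

// Pick the first ordinary tab; skip service workers, extensions, iframes, etc.
function pickPageTarget(targets) {
  const page = targets.find((t) => t.type === "page" && t.webSocketDebuggerUrl);
  if (!page) throw new Error("no page target found; is Chromium running on :9222?");
  return page;
}
```

A one-shot script would call pickPageTarget(await listTargets()), open a CDP WebSocket connection to the returned webSocketDebuggerUrl, do its single job, and exit. Nothing stays resident between calls.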

The headline browser features cover what an agent actually needs to drive a page: accessibility snapshots with stable refs, named multi-sessions with isolated ports and profiles, background capture of console and network activity, actionability waits on every interaction, plus the long tail of dialogs, file uploads, drag-and-drop, select boxes, key chords, hover, scroll, tabs, performance traces, cookie inspection, and readable-content extraction.

2. The hardest part: the dialog-handler race

Native alert / confirm / prompt dialogs hang agents. In a side-by-side benchmark, an unhandled confirm() hung the original browser-click.js for 268 seconds before the run had to be aborted. The fixed version finished the same scenario in 76 seconds. Dialogs and permission popups don’t show up in feature lists, but they kill an agent run the first time it hits one.

The race is structural and the fix took a while to find. browser-dialog.js exists to attach a CDP dialog handler before the click that triggers the dialog. The first version attached the handler in a separate process from the click. Half the time the dialog fired before the handler was ready, and the test hung. A sleep 0.5 before the click was flaky; a bigger sleep was slower and still flaky.

The fix: browser-dialog.js writes a marker file the moment its CDP handler is attached. The agent’s script waits for the file before clicking.

browser-dialog.js accept &
until [ -f ~/.cache/browser-tools/sessions/default/dialog-armed ]; do sleep 0.1; done
browser-click.js @e3

A file is the lowest-common-denominator readiness signal that works from a bash until loop in a script the model just wrote. No IPC primitive to learn, no shared memory, no socket. Just a path the dialog tool documents and the model can check. The whole “agent-friendly design” of the skill is in that snippet: tools the model can compose with &, until, and [ -f ... ], because that’s the surface area the model already knows.
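The tool side of that handshake, as a minimal Node sketch under stated assumptions: the marker is written only after the handler is attached, via a temp file plus rename so a poller never sees a half-written file, and the waiter polls with a deadline instead of looping forever. armMarker, clearMarker, waitForMarker, and the stale-marker cleanup are illustrative; they are not browser-dialog.js’s actual code.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical: called by the dialog tool only AFTER its CDP handler is attached.
// Write a temp file, then rename: rename is atomic on POSIX filesystems,
// so a poller never observes a partially written marker.
function armMarker(markerPath) {
  fs.mkdirSync(path.dirname(markerPath), { recursive: true });
  const tmp = `${markerPath}.${process.pid}.tmp`;
  fs.writeFileSync(tmp, String(Date.now()));
  fs.renameSync(tmp, markerPath);
}

// Hypothetical: remove a marker left over from an earlier run so a stale
// file can't be mistaken for "armed". No error if it doesn't exist.
function clearMarker(markerPath) {
  fs.rmSync(markerPath, { force: true });
}

// Poll for the marker like the bash `until [ -f ... ]` loop,
// but reject after timeoutMs instead of waiting forever.
function waitForMarker(markerPath, { timeoutMs = 5000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    const tick = () => {
      if (fs.existsSync(markerPath)) return resolve(true);
      if (Date.now() >= deadline) {
        return reject(new Error(`timed out waiting for ${markerPath}`));
      }
      setTimeout(tick, intervalMs);
    };
    tick();
  });
}
```

Clearing the marker before arming is the part worth copying: without it, a dialog-armed file left behind by an earlier run would satisfy the wait before the new handler exists, which reintroduces the original race.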

3. The pattern: state in the DOM, not in a daemon

The skill works for an agent for three reasons that all reduce to the same idea: state lives in the lowest-friction place the next process can read.

Refs live in the DOM, not in memory. browser-snapshot.js returns a compact accessibility tree (button "Sign Up" [ref=e5]) and writes the refs as attributes onto the page. Every interaction tool accepts @e5 in place of a CSS selector. The agent stops guessing selectors and gets it right the first time. Because the refs are on the page, they survive across stateless CLI calls. No daemon needed to remember them.
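A sketch of the ref scheme’s three pieces, assuming a hypothetical data-agent-ref attribute (the skill’s real attribute name isn’t stated here). formatNode produces the snapshot line, stampRefs is the page-side half that writes refs onto elements, and resolveTarget is what each CLI tool would run on its argument so @e5 becomes an attribute selector any fresh process can query.

```javascript
// Hypothetical attribute name; the skill's real one may differ.
const REF_ATTR = "data-agent-ref";

// Snapshot side: one accessibility node per line, e.g.
//   button "Sign Up" [ref=e5]
function formatNode(role, name, ref) {
  return `${role} ${JSON.stringify(name)} [ref=${ref}]`;
}

// Page side: number the interactive elements and write each ref onto its
// element. A real tool would ship this into the page (for example via the
// CDP Runtime.evaluate method); here it accepts any objects with setAttribute.
function stampRefs(elements) {
  return elements.map((el, i) => {
    const ref = `e${i + 1}`;
    el.setAttribute(REF_ATTR, ref);
    return ref;
  });
}

// Tool side: "@e5" becomes an attribute selector a fresh process can pass to
// querySelector; anything else is treated as an ordinary CSS selector.
function resolveTarget(arg) {
  const m = /^@(e\d+)$/.exec(arg);
  if (m) return `[${REF_ATTR}="${m[1]}"]`;
  if (arg.startsWith("@")) throw new Error(`malformed ref: ${arg}`);
  return arg;
}
```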

Zero always-on context cost. Playwright MCP and Chrome DevTools MCP inject 20+ tool definitions into every conversation, whether or not a browser is ever needed. This skill loads nothing until it triggers. The CLI scripts are documented inside SKILL.md, which Pi reads lazily.

Sessions isolate on the filesystem. Each named session gets its own browser, port, profile, and log directory. Two agents can drive two different sites at once with no collisions. I hit a port-collision bug myself during eval runs, and that’s what motivated the design.
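Session isolation can be as simple as deriving everything from the session name. This sketch reuses the sessions/<name>/ layout visible in the dialog-armed path above; the profile/ and logs/ subdirectories and the first-free-port policy starting at 9222 are assumptions for illustration, not the skill’s documented behavior.

```javascript
const os = require("node:os");
const path = require("node:path");

const BASE_PORT = 9222; // the default session's CDP port

// Per-session layout. The sessions/<name>/ root matches the dialog-armed
// marker path; the profile/ and logs/ names are hypothetical.
function sessionPaths(name, home = os.homedir()) {
  // Reject names that could escape the sessions directory (e.g. "../etc").
  if (!/^[A-Za-z0-9_-]+$/.test(name)) throw new Error(`bad session name: ${name}`);
  const root = path.join(home, ".cache", "browser-tools", "sessions", name);
  return {
    root,
    profileDir: path.join(root, "profile"), // --user-data-dir for this browser
    logDir: path.join(root, "logs"),
  };
}

// Hypothetical policy: the first port at or above BASE_PORT that no other
// session has claimed.
function allocatePort(takenPorts) {
  const taken = new Set(takenPorts);
  let port = BASE_PORT;
  while (taken.has(port)) port += 1;
  return port;
}
```

A real launcher would also confirm the port is actually free (another process could hold it) before starting Chromium with --remote-debugging-port and --user-data-dir pointed at these paths.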

A few more constraints fell out of asking “what does the agent need, and nothing more”:

  • A background monitor (browser-monitor.js start) records console messages, uncaught errors, and network activity. CDP doesn’t replay history, so the monitor has to be attached before the activity you care about. The skill’s docs make that explicit so the model doesn’t trip on it.
  • Interactions wait for elements to be visible, enabled, and stable before firing. No more flaky clicks on a button mid-animation.
  • npm install pulls a private Chromium build (~150 MB, one time), so no system browser is required. A Node-version guard in package.json turns Node 26’s puppeteer extraction bug, traced to a pinned 2020-era extract-zip dependency, into an upfront EBADENGINE error instead of a silently broken install.
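One way to implement “visible, enabled, and stable” is to poll the element’s state and fire only when its bounding box is identical on two consecutive reads. In this sketch the probe is injected so the logic runs without a browser; in a real tool it would be answered over CDP. waitActionable, the probe’s return shape, and the two-read stability rule are assumptions, not the skill’s actual implementation.

```javascript
// Hypothetical actionability wait. `probe` resolves to
// { visible, enabled, box: { x, y, width, height } }; a real tool would
// answer it over CDP, here it is injected so the logic runs standalone.

const sleep = (ms) => new Promise((r) => setTimeout(r, ms));

function sameBox(a, b) {
  return !!a && !!b &&
    a.x === b.x && a.y === b.y && a.width === b.width && a.height === b.height;
}

// Resolve with the box once the element is visible, enabled, and has not
// moved between two consecutive polls (i.e. it is no longer mid-animation).
async function waitActionable(probe, { timeoutMs = 5000, intervalMs = 50 } = {}) {
  const deadline = Date.now() + timeoutMs;
  let prev = null;
  while (Date.now() < deadline) {
    const s = await probe();
    if (s.visible && s.enabled && sameBox(prev, s.box)) return s.box;
    // Only a box read while actionable counts toward stability.
    prev = s.visible && s.enabled ? s.box : null;
    await sleep(intervalMs);
  }
  throw new Error("element never became visible, enabled, and stable");
}
```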

What it won’t do: cross-browser engines, full network mocking, Playwright’s trace viewer. Those are authoring tools. An agent doesn’t need them, so they’re not in the skill.

4. The search half: same shape, smaller

web-search is the other half of what an agent needs. One Node script, no dependencies (Node 18+ has fetch built in), pointed at a self-hosted SearXNG instance.

SearXNG is what makes this practical. It’s an open-source metasearch engine that aggregates 70+ sources (Google, Bing, DuckDuckGo, Brave, Wikipedia, GitHub) into one ranked list, with no tracking and no per-query bill. Self-host it once (Docker Compose, ten minutes on a small VPS or a spare machine), and the agent has a search backend it can hammer as hard as it wants.

search.js "claude code skills"
search.js "claude code" --count 5 --time week
search.js "openai release notes" --category news --json

Filters cover the obvious axes: result count, category (general / news / images / videos), time range, language, safe-search level. Output is plain text by default or a JSON array with --json when the agent wants to parse it. Configuration is a SEARXNG_URL env var or a gitignored config.json. The env var wins if both are set.
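The configuration and filter handling above map onto SearXNG’s /search endpoint and its q, format, categories, time_range, language, and safesearch parameters. This is a sketch, not search.js itself: the function names and the config.json key url are hypothetical, and format=json only works if the instance lists json under search.formats in its settings.yml.

```javascript
const fs = require("node:fs");

// SEARXNG_URL wins over config.json. The "url" key is a hypothetical name.
function resolveBaseUrl(env = process.env, configPath = "config.json") {
  if (env.SEARXNG_URL) return env.SEARXNG_URL;
  if (fs.existsSync(configPath)) {
    const cfg = JSON.parse(fs.readFileSync(configPath, "utf8"));
    if (cfg.url) return cfg.url;
  }
  throw new Error("set SEARXNG_URL or add a config.json");
}

// Build a request against SearXNG's /search endpoint. The trailing slash
// keeps instances hosted under a subpath (e.g. /searx) working.
function buildSearchUrl(base, query, { category, time, language, safesearch } = {}) {
  const url = new URL("search", base.endsWith("/") ? base : `${base}/`);
  url.searchParams.set("q", query);
  url.searchParams.set("format", "json");
  if (category) url.searchParams.set("categories", category); // general, news, images, videos
  if (time) url.searchParams.set("time_range", time); // day, week, month, year
  if (language) url.searchParams.set("language", language);
  if (safesearch !== undefined) url.searchParams.set("safesearch", String(safesearch)); // 0, 1, 2
  return url;
}

// Trim to `count` results; plain text by default, a JSON array on request.
function formatResults(results, count = 10, json = false) {
  const top = results
    .slice(0, count)
    .map(({ title, url, content }) => ({ title, url, content: content ?? "" }));
  if (json) return JSON.stringify(top, null, 2);
  return top.map((r, i) => `${i + 1}. ${r.title}\n   ${r.url}\n   ${r.content}`).join("\n\n");
}

// Network half: uses the fetch built into Node 18+, so no dependencies.
async function search(base, query, { count = 10, json = false, ...filters } = {}) {
  const res = await fetch(buildSearchUrl(base, query, filters));
  if (!res.ok) throw new Error(`SearXNG returned HTTP ${res.status}`);
  const body = await res.json();
  return formatResults(body.results ?? [], count, json);
}
```

Keeping formatting separate from fetching means the plain-text and JSON paths can be checked without a live SearXNG instance.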

A hundred lines of Node plus SearXNG replaces what most agents reach for: Perplexity, Tavily, Serper, Brave’s API, Anthropic’s web-search tool. None of those are bad, but they all charge per query and route the agent’s traffic through a third party. For an agent that searches dozens of times in a single run, and most spec-driven loops do, that math gets noticeable fast.