Two MCP-Free Skills for Pi: A Browser and a Search Box

I’ve been using Pi, the AI coding harness from Mario Zechner. It’s fast and responsive, the agents are sharp, and it skips a lot of overhead other harnesses carry. But that last bit comes with a catch: Pi doesn’t support MCP servers at all. Mario’s argument is that MCP tools cost tokens whether you use them or not, and skills are the better primitive for agent capabilities. I agree with the take.

It does leave a real gap, though. Playwright MCP and Chrome DevTools MCP are how most agents get a browser today. If your harness can’t speak MCP, you pretty much don’t get a browser (aside from some paid tools, which I won’t get into).

So I built one. An actual skill as an alternative to Playwright MCP.

browser-tools is a self-contained Node.js skill that gives Pi agents the same browser automation Playwright MCP offers, with no MCP server in the middle. Other coding harnesses like Claude Code or Codex CLI support MCP servers, so Playwright MCP or Chrome DevTools MCP works fine with them. Pi needed a non-MCP solution, which didn’t really exist.

This project started as a fork of Mario’s own first cut at the idea and grew, across a few long Claude Code sessions, into something I now reach for daily. Mario’s version had a few rough edges: a hard-coded macOS path for Chrome, no support for multiple sessions, a handful of missing features.

What the skill ships with

23 small CLI scripts. One shared lib.js. No always-on daemon. Each script connects to Chromium on :9222 over the Chrome DevTools Protocol, does its job, and exits. The agent calls them like any other shell command.

The headline features:

Accessibility snapshot with stable refs. browser-snapshot.js returns a compact accessibility tree: button "Sign Up" [ref=e5]. Every interaction tool accepts @e5 in place of a CSS selector. This is the one thing I missed most when working without Playwright MCP — the agent was constantly guessing selectors and getting them wrong. Refs end that.
Named multi-sessions. Each session gets its own browser, port, profile and logs. Two agents can drive two different sites at once with no collisions. I hit a port-collision bug myself during eval runs, and that’s what motivated the design.
Console and network capture. A background daemon (browser-monitor.js start) records console messages, uncaught errors, and network activity. browser-console.js --errors and browser-network.js --failed surface what matters. CDP doesn’t replay history, so the daemon has to be attached before the activity you care about. The skill’s docs make that explicit so the agent doesn’t trip on it.
Actionability waits everywhere. Interactions wait for elements to be visible, enabled, and stable. No more flaky clicks on a button that’s still mid-animation.
All the rest: dialogs, file uploads, drag-and-drop (auto-detects HTML5 vs mouse-gesture), select boxes, key chords, hover, scroll, tabs, performance traces, cookie inspection, readable-content extraction.
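
The actionability waits in the list above can be pictured as a polling loop. The sketch below is an illustration under assumptions, not the skill’s actual code: waitForActionable, sameBox, and the probe callback are hypothetical names, and probe stands in for whatever CDP call reads an element’s current state.

```javascript
// Hypothetical sketch: an element counts as "stable" when its position and
// size are unchanged between two consecutive samples.
function sameBox(a, b) {
  return Boolean(a && b) &&
    ['x', 'y', 'width', 'height'].every((key) => a[key] === b[key]);
}

// probe() is assumed to return { visible, enabled, box: { x, y, width, height } }.
async function waitForActionable(probe, { timeoutMs = 5000, intervalMs = 50 } = {}) {
  const deadline = Date.now() + timeoutMs;
  let previousBox = null;
  while (Date.now() < deadline) {
    const { visible, enabled, box } = await probe();
    // Only act once the element is visible, enabled, and has stopped moving.
    if (visible && enabled && sameBox(previousBox, box)) return box;
    previousBox = box ?? null;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`element not actionable within ${timeoutMs} ms`);
}
```

A click would then be issued only after waitForActionable resolves, which is what keeps a mid-animation button from swallowing it.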

The dialog-handler race and the marker-file fix

The MCP-parity push made me add browser-dialog.js for native alert / confirm / prompt. The first version had a race: the dialog handler attached just after the click fired, so half the time the dialog was missed and the test hung. I tried a sleep 0.5 before the click and got flaky behavior; a bigger sleep was slower and still flaky.

The fix is a small thing I’m pretty proud of. browser-dialog.js writes a marker file the moment its handler is attached:

browser-dialog.js accept &
until [ -f ~/.cache/browser-tools/sessions/default/dialog-armed ]; do sleep 0.1; done
browser-click.js @e3

The agent doesn’t need IPC. A file is the lowest-common-denominator readiness signal that works from a bash until loop in a script the model just wrote.
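
The bash loop above has a direct Node equivalent for callers that would rather stay in JavaScript. This is a minimal sketch: waitForMarker is a hypothetical helper, not part of the skill, and the marker path simply mirrors the one in the shell snippet.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: resolve once the marker file exists, reject on timeout.
async function waitForMarker(markerPath, { timeoutMs = 5000, intervalMs = 100 } = {}) {
  const deadline = Date.now() + timeoutMs;
  while (!fs.existsSync(markerPath)) {
    if (Date.now() >= deadline) {
      throw new Error(`marker ${markerPath} did not appear within ${timeoutMs} ms`);
    }
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}

// Path copied from the shell example; "default" is the session name.
const armedMarker = path.join(
  os.homedir(), '.cache', 'browser-tools', 'sessions', 'default', 'dialog-armed'
);
```

Calling await waitForMarker(armedMarker) before the click plays the same role as the until loop, with the addition of a timeout so a handler that never arms fails loudly instead of spinning forever.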

In a side-by-side benchmark against the pre-fix skill, an unhandled native confirm() hung the old browser-click.js for 268 seconds before the run had to be aborted. The new dialog tool wrapped up the same scenario in 76 seconds. Popups like this don’t show up in feature lists, but they kill agent runs the first time you hit one.

Why it works for an agent

The design hinges on a few constraints: zero idle context cost, transparency over a server protocol, and state that lives in the DOM rather than in a daemon.

Zero always-on context cost. Playwright MCP and Chrome DevTools MCP inject 20+ tool definitions into every conversation, whether or not a browser is ever needed. This skill loads nothing until it triggers. Pi’s whole pitch is keeping context cheap, and this matches that.

Transparent and hackable. 23 small scripts the user can read and modify. No always-on daemon, no protocol layer to reason about.

Refs live in the DOM, not in memory. The snapshot writes attributes onto the page, so even across stateless CLI calls @e5 keeps pointing at the same element. No daemon needed for state.
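
One way to picture DOM-resident refs: the snapshot stamps each element with an attribute, and every later CLI call translates @e5 into an attribute selector. The sketch below assumes an attribute called data-ref and a helper called toSelector; both names are hypothetical and the skill may use different ones.

```javascript
// Assumed attribute name; the real skill may stamp elements differently.
const REF_ATTRIBUTE = 'data-ref';

// Hypothetical helper: turn an "@e5"-style ref into a CSS attribute selector.
function toSelector(target) {
  const match = /^@(e\d+)$/.exec(target);
  // Anything that is not a ref passes through as an ordinary CSS selector.
  return match ? `[${REF_ATTRIBUTE}="${match[1]}"]` : target;
}
```

Because the lookup key sits on the element itself, a fresh process can resolve the ref with a plain selector query and needs no memory of the earlier snapshot call.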

Standard install. npm install pulls a private Chromium build (~150 MB, one time). No system browser required. Works on Linux, macOS and Windows. There’s a Node-version guard in package.json because Node 26 has a Puppeteer extraction bug I traced to a pinned 2020-era extract-zip dependency. The guard turns a silent broken install into an upfront EBADENGINE.

What it won’t do: cross-browser engines, full network mocking, Playwright’s trace viewer. Those are authoring tools, and an agent doesn’t need them.

How it held up in a spec-driven run

I just finished a spec-driven implementation loop where my Pi agent used browser-tools to verify every UI change as it went. The agent took screenshots, read console errors, clicked through forms, and caught a regression I would’ve missed. The skill never got in the way.

The search skill: a Node script over SearXNG

Browser automation was only half of what I needed; the other half was search. The second skill in the repo is web-search, and it answers the same brief as the first: an agent needs to look something up, your harness can’t run a search MCP server, and you don’t want to wire up a paid API just so the model can google a stack trace.

It’s a single Node script with no dependencies. Node 18+ has fetch built in, so that covers it. The script points at a self-hosted SearXNG instance and returns ranked results.

SearXNG is what makes this practical. It’s an open-source metasearch engine that aggregates results from 70+ sources (Google, Bing, DuckDuckGo, Brave, Wikipedia, GitHub and more) and returns them in a single ranked list with no tracking and no per-query bill. You self-host it (a Docker Compose file, ten minutes on a small VPS or a spare machine) and your agent has a search backend it can hammer as hard as it wants. No API tab running up.

search.js "claude code skills"
search.js "claude code" --count 5 --time week
search.js "openai release notes" --category news --json

Filters cover the obvious axes: result count, category (general / news / images / videos), time range (day / week / month / year), language, safe-search level. The full set of filters the script accepts is documented in the README. Output is plain text by default or a JSON array with --json for when the agent wants to parse it.
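
To make the filter mapping concrete, here is a rough sketch of how such options could be turned into a SearXNG request URL. buildSearchUrl and its option names are hypothetical, not the script’s real internals; the query parameters (q, format, categories, time_range, language, safesearch) are the ones SearXNG’s search API accepts. The sketch assumes the instance is served from the root path and has JSON output enabled.

```javascript
// Hypothetical helper: map CLI-style options onto SearXNG /search parameters.
function buildSearchUrl(baseUrl, query, options = {}) {
  // Assumes SearXNG lives at the root of baseUrl (no path prefix).
  const url = new URL('/search', baseUrl);
  url.searchParams.set('q', query);
  // JSON output has to be enabled in the instance settings.
  url.searchParams.set('format', 'json');
  if (options.category) url.searchParams.set('categories', options.category);
  if (options.time) url.searchParams.set('time_range', options.time);
  if (options.language) url.searchParams.set('language', options.language);
  if (options.safesearch !== undefined) {
    url.searchParams.set('safesearch', String(options.safesearch));
  }
  return url;
}
```

The resulting URL can be passed straight to the built-in fetch. A result-count option would most likely be applied afterwards by slicing the results array in the response, since that limit is not among the request parameters used here.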

Configuration is one of two things: a SEARXNG_URL env var, or web-search/config.json (gitignored). The env var wins if both are set.
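
The precedence rule is simple enough to sketch in a few lines. resolveSearxngUrl is a hypothetical function, and the searxngUrl key inside config.json is an assumed name; check the README for the real one.

```javascript
const fs = require('node:fs');

// Hypothetical sketch: the env var wins, config.json is the fallback.
function resolveSearxngUrl(env, configPath) {
  if (env.SEARXNG_URL) return env.SEARXNG_URL;
  if (fs.existsSync(configPath)) {
    const config = JSON.parse(fs.readFileSync(configPath, 'utf8'));
    // "searxngUrl" is an assumed key name, used here for illustration only.
    if (config.searxngUrl) return config.searxngUrl;
  }
  throw new Error('Set SEARXNG_URL or create web-search/config.json');
}
```

Passing process.env and the path to web-search/config.json gives the behavior described above, and the explicit error means a missing configuration surfaces before any request is made.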

The combination of self-hosted SearXNG plus a 100-line Node script replaces what most agents reach for: Perplexity, Tavily, Serper, Brave’s API, the Anthropic web-search tool. None of those are bad, but they all charge per query and route your agent’s traffic through a third party. SearXNG is free, private, and yours. For an agent that searches dozens of times in a single run, that math gets noticeable fast.

Install and try them

Repo: github.com/Emrebener/Pi-Harness-Skills.

Drop the folder in your Pi (or Claude Code) skills directory. browser-tools/ needs npm install once (it pulls the bundled Chromium). web-search/ needs no install at all: just point SEARXNG_URL at your SearXNG instance. Then your agent has a browser and a search engine, with no MCP server in sight.