# sidekar - Full Reference

> The sidecar for AI agents. A coordination and automation substrate with four capability pillars: browser automation (CDP + Chrome extension), desktop automation (macOS), an agent communication bus, and background automation (monitor + cron).

## What is sidekar?

sidekar is a single binary that gives AI agents the ability to see, interact with, and coordinate through browsers, desktops, and each other. It is not an orchestrator, not a harness, and not an agent OS. It equips autonomous agents without replacing their control loop. Install with `curl -fsSL sidekar.dev/install | sh`. That downloads the binary and runs `sidekar install` to add `SKILL.md` to each detected agent’s skills directory.

## Install

```sh
curl -fsSL sidekar.dev/install | sh
```

## Getting started (step by step)

1. **Install:** run the curl command above (binary + `sidekar install` for skills).
2. **Chrome extension (optional):** load the unpacked extension from `extension/`, run `sidekar device login`, sign in from the extension popup, then use `sidekar ext ...` commands.
3. **`sidekar device login` (optional):** authenticates this device against sidekar.dev for remote session access (web dashboard, tunnel workflows).
4. **Launch an agent:** `sidekar <agent> [args…]` for any agent CLI on PATH (e.g. `sidekar claude`, `sidekar codex`).

## Core capabilities (four pillars)

### 1. Browser automation (CDP + Chrome extension)

Full Chrome DevTools Protocol access. Navigate, click, type, screenshot, read page content. The optional Chrome extension complements CDP for the same automation goals. Smart perception returns compact page summaries, not raw DOM dumps. Agents escalate from text to screenshot only when needed.

**Web search and multi-page reads** (`search`, `read-urls`) and **batch** (multi-step sequences) are **use cases** on the browser stack, not separate product pillars.

**Key tools:** navigate, read, text, ax-tree, observe, screenshot, click, type, fill, press, scroll, keyboard, search, read-urls, batch

**batch** example (multi-step in one call):
```json
{"actions": [
  {"tool": "click", "target": "--text Continue", "retries": 2},
  {"tool": "wait-for-nav"},
  {"tool": "screenshot"}
]}
```
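A payload like the one above can also be assembled programmatically. The following is a minimal sketch that only builds the JSON shape shown in the example; the helper names `batch_action` and `build_batch` are hypothetical and not part of sidekar, and how the payload is handed to `batch` is left out.

```python
import json

def batch_action(tool, target=None, retries=None):
    """Build one batch step, omitting keys that were not supplied."""
    action = {"tool": tool}
    if target is not None:
        action["target"] = target
    if retries is not None:
        action["retries"] = retries
    return action

def build_batch(actions):
    """Wrap steps in the {"actions": [...]} envelope from the example above."""
    return json.dumps({"actions": list(actions)})

# Same three steps as the JSON example: click, wait for navigation, screenshot.
payload = build_batch([
    batch_action("click", target="--text Continue", retries=2),
    batch_action("wait-for-nav"),
    batch_action("screenshot"),
])
print(payload)
```

Only the keys that appear in the example (`tool`, `target`, `retries`) are modeled here; any other keys the tool may accept are not assumed.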

**Perception escalation order (stop at the first that works):**
1. `read` - Reader-mode text extraction (articles, docs, search results). Low cost.
2. `ax-tree -i` / `observe` - Interactive elements with ref numbers. Low cost.
3. `text` - Full page text with refs. Low-medium cost.
4. `dom` - HTML structure with selectors. Medium cost.
5. `screenshot` (cropped) - Visual of one element. Medium cost.
6. `screenshot` (full page) - Full visual. High cost.
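
The escalation order can be read as a cheapest-first loop. The sketch below is one way an agent might express it, not sidekar's own implementation: `run_tool` and `is_enough` are hypothetical callbacks supplied by the caller, and the ladder entries are labels for the steps above rather than exact command syntax.

```python
from typing import Callable, Optional, Tuple

# Ladder copied from the list above: (step label, relative cost).
PERCEPTION_LADDER = [
    ("read", "low"),
    ("ax-tree -i / observe", "low"),
    ("text", "low-medium"),
    ("dom", "medium"),
    ("screenshot (cropped)", "medium"),
    ("screenshot (full page)", "high"),
]

def perceive(
    run_tool: Callable[[str], Optional[str]],
    is_enough: Callable[[str], bool],
) -> Tuple[Optional[str], Optional[str]]:
    """Walk the ladder cheapest-first and stop at the first usable result."""
    for step, _cost in PERCEPTION_LADDER:
        result = run_tool(step)  # hypothetical: runs the step, returns its output
        if result is not None and is_enough(result):
            return step, result
    return None, None  # nothing on the ladder produced a usable result

# Usage with stubbed outputs: `read` returns nothing useful, so the loop escalates once.
fake_outputs = {"read": "", "ax-tree -i / observe": "[1] button Submit"}
step, result = perceive(fake_outputs.get, lambda text: bool(text.strip()))
print(step)
```

Keeping the ladder as data makes the stopping rule explicit: the more expensive steps are never run once a cheaper one has succeeded.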

**Element targeting priority:**
1. Refs from `ax-tree`/`observe`/`text` (`click 3`, `type 5 hello`)
2. Text search (`click --text Submit`)
3. CSS selectors (`#id`, `[data-testid="..."]`)
4. `eval` with `querySelector`
5. Coordinates from a screenshot (last resort)
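
The priority list amounts to "use the first locator you have, in this order". A minimal sketch of that selection logic follows; `pick_target` and its parameter names are hypothetical and exist only for illustration.

```python
def pick_target(ref=None, text=None, css=None, js_query=None, coords=None):
    """Return (strategy, value) for the highest-priority locator available."""
    candidates = [
        ("ref", ref),             # 1. ref number from ax-tree / observe / text
        ("text", text),           # 2. visible text, as in `click --text Submit`
        ("css", css),             # 3. CSS selector
        ("eval", js_query),       # 4. querySelector expression run through eval
        ("coordinates", coords),  # 5. screenshot coordinates, last resort
    ]
    for strategy, value in candidates:
        if value is not None:
            return strategy, value
    raise ValueError("no locator available")

# A ref wins over coordinates; text wins over a CSS selector.
print(pick_target(ref=3, coords=(120, 48)))
print(pick_target(text="Submit", css="#go"))
```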

### 2. Desktop automation (macOS)

macOS Accessibility API integration. Find and interact with native app elements, take screenshots, launch and manage applications. No browser required.

**Key tools:** desktop_screenshot, desktop_apps, desktop_windows, desktop_find, desktop_click, desktop_launch, desktop_activate, desktop_quit

### 3. Agent communication bus

Agents discover and message each other via a local bus (SQLite-backed). Send requests, broadcast, hand off tasks.

**Key tools:** register, unregister, bus who, bus send, bus done

**Message kinds:** request (expects response), response (answers a prior request), fyi (informational)
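
The three message kinds can be modeled as a small type. This is an illustrative sketch only: the three kind names come from the list above, but the `Message` fields (`sender`, `recipient`, `body`, `reply_to`) and the `validate` helper are assumptions, not the bus's actual schema.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Kind(Enum):
    REQUEST = "request"    # expects a response
    RESPONSE = "response"  # answers a prior request
    FYI = "fyi"            # informational, no reply expected

@dataclass
class Message:
    kind: Kind
    sender: str                     # hypothetical field
    recipient: str                  # hypothetical field
    body: str                       # hypothetical field
    reply_to: Optional[int] = None  # hypothetical id of the request being answered

    def expects_response(self) -> bool:
        return self.kind is Kind.REQUEST

def validate(msg: Message) -> Message:
    """Enforce the one structural rule implied above: a response answers a request."""
    if msg.kind is Kind.RESPONSE and msg.reply_to is None:
        raise ValueError("a response must reference a prior request")
    return msg

req = validate(Message(Kind.REQUEST, "planner", "browser-agent", "open the dashboard"))
print(req.expects_response())
```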

### 4. Background automation

Watch browser tabs for meaningful changes and run tools on a cron schedule. Reactive alerts and unattended runs; results delivered via the bus.

**Key tools:** monitor start/stop/status, cron_create, cron_list, cron_delete

## Tool Categories

Core tools are always available. Extended tools load on demand via `tools(action: "load", category: "<name>")`:

- **forms**: select, upload, drag, clear, focus, dialog, paste, clipboard, inserttext
- **nav**: back, forward, reload, wait-for, wait-for-nav, find, resolve
- **debug**: console, network, block, eval, dom, storage, cookies, sw, security
- **media**: viewport, zoom, grid, media, animations, pdf, download
- **desktop**: desktop_screenshot, desktop_apps, desktop_windows, desktop_find, desktop_click, desktop_launch, desktop_activate, desktop_quit
- **session**: hover, lock, unlock, activate, minimize, kill, monitor, frames, frame
- **meta**: config, install
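
Before calling an extended tool, an agent needs to know which category to load. The sketch below copies the category table above into a dictionary and formats the loader call as written in this section; `category_for` and `load_call` are hypothetical helper names.

```python
# Category table copied from the list above.
EXTENDED_TOOLS = {
    "forms": ["select", "upload", "drag", "clear", "focus", "dialog",
              "paste", "clipboard", "inserttext"],
    "nav": ["back", "forward", "reload", "wait-for", "wait-for-nav",
            "find", "resolve"],
    "debug": ["console", "network", "block", "eval", "dom", "storage",
              "cookies", "sw", "security"],
    "media": ["viewport", "zoom", "grid", "media", "animations", "pdf",
              "download"],
    "desktop": ["desktop_screenshot", "desktop_apps", "desktop_windows",
                "desktop_find", "desktop_click", "desktop_launch",
                "desktop_activate", "desktop_quit"],
    "session": ["hover", "lock", "unlock", "activate", "minimize", "kill",
                "monitor", "frames", "frame"],
    "meta": ["config", "install"],
}

def category_for(tool):
    """Return the extended category containing `tool`, or None if it is not listed."""
    for category, tools in EXTENDED_TOOLS.items():
        if tool in tools:
            return category
    return None

def load_call(tool):
    """Format the loader call for `tool`, or None when no extended category lists it."""
    category = category_for(tool)
    if category is None:
        return None
    return f'tools(action: "load", category: "{category}")'

print(load_call("pdf"))
print(load_call("click"))
```

A `None` result here only means the tool is absent from the extended table, which is what one would expect for core tools such as `click`.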

## Tab Isolation

Multiple agents share the same Chrome instance. Each session owns its tabs. Never touch tabs you didn't create. Use `newtab` to open new tabs, `tabs` to list yours, `close` to remove them.
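
One way for an agent to honor this rule is to keep its own record of the tabs it opened and check that record before acting. The sketch below is a client-side illustration, not how sidekar enforces ownership; `TabLedger` and the tab ID values are hypothetical.

```python
class TabLedger:
    """Tracks the tabs this session opened so other agents' tabs are left alone."""

    def __init__(self):
        self._owned = set()

    def opened(self, tab_id):
        """Record a tab this session created (for example via `newtab`)."""
        self._owned.add(tab_id)

    def closed(self, tab_id):
        """Forget a tab after this session closes it."""
        self._owned.discard(tab_id)

    def require_owned(self, tab_id):
        """Return tab_id if this session created it, otherwise refuse."""
        if tab_id not in self._owned:
            raise PermissionError(f"tab {tab_id!r} was not created by this session")
        return tab_id

ledger = TabLedger()
ledger.opened("tab-1")                 # hypothetical tab ID
print(ledger.require_owned("tab-1"))   # safe to act on
```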

## Profiles

- `launch` - default shared profile
- `launch --profile <name>` - isolated browser instance
- `launch --profile new` - auto-generated profile ID
- `launch --headless` - no visible window
- `launch --browser brave` - use a specific browser
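
The launch options above can be assembled into an argument list. This sketch uses only the flags listed in this section and assumes they can be combined in a single `launch` call, which the list does not state; `launch_args` is a hypothetical helper and nothing is executed.

```python
def launch_args(profile=None, headless=False, browser=None):
    """Build the argument list for `launch` from the options listed above."""
    args = ["launch"]
    if profile is not None:
        args += ["--profile", profile]   # a name, or "new" for an auto-generated ID
    if headless:
        args.append("--headless")
    if browser is not None:
        args += ["--browser", browser]   # e.g. "brave"
    return args

print(launch_args())                              # default shared profile
print(launch_args(profile="new", headless=True))  # assumed combination of flags
```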

## Common Patterns

**Navigate and read:** Call `navigate` with a URL. An auto-brief of the page is returned.

**Fill a form:** Use `fill` with a fields object: `{"#email": "user@example.com", "#password": "secret"}`

**Search the web:** Call `search` with a query. Results are extracted automatically.

**Research multiple pages:** Call `read-urls` with a list of URLs. Content is extracted in parallel.

**Rich text editors:** Click the editor, then use `keyboard` (not `type`) to preserve cursor position.

## Supported agents

Claude Code, Codex, Cursor, Copilot, Gemini CLI, Windsurf, Cline, ChatGPT Desktop, OpenCode, Pi, and others via `sidekar install` or the skills registry.

## Pricing

- Personal: Free (all features, no limits)
- Professional: $9/month (honor system, for work that pays)

## Links

- Website: https://sidekar.dev
- Install: https://sidekar.dev/install
- Compact version: https://sidekar.dev/llms.txt