# PhoneWork — Roadmap

## ✅ Milestone 2: Mailboy as a Versatile Assistant (COMPLETED)

**Goal:** Elevate the mailboy (GLM-4.7 orchestrator) from a mere Claude Code relay into a fully capable phone assistant. Users can control their machine, manage files, search the web, get direct answers, and track long-running tasks — all without necessarily opening a Claude Code session.

### M2.1 — Direct Q&A (no CC session required)

The mailboy already has an LLM; it just needs permission to answer. When no active session exists and the user asks a general question, the mailboy should reply with its own knowledge instead of asking "which project?".

**Changes:**
- Update the system prompt: give the mailboy explicit permission to answer questions directly
- Add a heuristic in `_run_locked`: if there's no active session and the message looks like a question (ends with `?`, contains `what/how/why/explain`), skip the tool loop and reply directly
- No new files; a prompt change plus a small logic tweak in `orchestrator/agent.py`

---

### M2.2 — Shell Tool

Run arbitrary shell commands on the host machine and return stdout/stderr.

Covers: check git status, tail logs, kill a process, `ps aux | grep`, `pip list`, etc.

**New tool:** `orchestrator/tools.py` → `ShellTool`

```
name: "run_shell"
args: command (str), cwd (str, optional, defaults to WORKING_DIR), timeout (int, default 30)
returns: {stdout, stderr, exit_code}
```

**Safety guards:**
- Blocklist of destructive patterns (`rm -rf /`, `format`, `mkfs`, `shutdown`, `reboot`, `dd if=`, `:(){:|:&};:`) — refuse with a clear error
- `cwd` must be under `WORKING_DIR` (reuse `_resolve_dir`) or be an explicit absolute path approved by the user (raise a confirmation request)
- Timeout hard cap: 120 s; for longer tasks see M2.4

**New slash command:** `/shell ` (bypasses LLM; runs directly)

---

### M2.3 — File Operations Tool

Read files, list directories, search content, and send files back to the user in Feishu.
Covers: "show me the error log", "what files are in my project?", "search for TODO comments"

**New tool:** `orchestrator/tools.py` → `FileOpsTool` (single tool, `action` dispatch)

```
name: "file_ops"
args: action ("read" | "list" | "search" | "send"), path (str), query (str, for search), max_bytes (int, default 8000)
```

- `read`: read file, truncate to `max_bytes`, return content
- `list`: recursive directory tree (depth-limited to 3), file sizes
- `search`: grep-like ripgrep/Python search for `query` in `path`
- `send`: upload and send file via `bot/feishu.py::send_file()` (already implemented) — tool receives `chat_id` via context var (add alongside `current_user_id`)

**Safety:** all paths must be under `WORKING_DIR`

---

### M2.4 — Long-Running Task Manager

This is the key UX upgrade. `claude -p` and shell commands that take minutes need fire-and-forget with completion notification.

**Design:**
- Add `BackgroundTask` dataclass: `{task_id, description, started_at, status, conv_id_or_none}`
- `TaskRunner` singleton in `agent/task_runner.py`:
  - `submit(coro, description, notify_chat_id) -> task_id`
  - wraps coroutine in `asyncio.create_task`; on completion sends a Feishu notification via `send_text(notify_chat_id, ...)`
  - stores tasks in-memory dict `{task_id: BackgroundTask}`
- When `manager.send()` is called for a CC session:
  - if `cc_timeout > 60`: automatically run in background, return immediately with `"⏳ Task # started. I'll notify you when it's done."`
  - otherwise run inline as today

**New tool:** `run_background` — explicitly submits any shell command or CC prompt as a background task and returns `task_id` immediately.

**New slash command:** `/tasks` — list running/completed background tasks with status.

**New tool:** `task_status` — check status of a specific `task_id`, optionally get output so far.

**Notification format:**
```
✅ Task #abc123 done (42s)
/new todo_app: fix the login bug
[CC output truncated to 800 chars]...
```

---

### M2.5 — Web Tool

Let the mailboy fetch URLs and search the web for quick answers.

Covers: "What's new in the latest LangChain?", "fetch this GitHub issue", "Help me search for this paper"

**Backend:** 秘塔AI Search MCP (`https://metaso.cn/api/mcp`) — accessible from mainland China, official API, Bearer token auth. Requires `METASO_API_KEY` in `keyring.yaml`. Get key at: https://metaso.cn/search-api/api-keys

**New tool:** `WebTool` (three actions dispatched via one tool)

```
name: "web"
args: action ("search" | "fetch" | "ask"), query (str), url (str), scope (str), max_chars (int)
```

- `search`: calls `metaso_web_search` — returns top results (title + snippet + URL)
  - `scope` options: `webpage` (default), `paper`, `document`, `video`, `podcast`
- `fetch`: calls `metaso_web_reader` with `format=markdown` — extracts clean content from URL
- `ask`: calls `metaso_chat` — RAG answer combining search + generation (for quick factual Q&A)

**Implementation:** HTTP POST to `https://metaso.cn/api/mcp` with JSON-RPC body, `Authorization: Bearer ` header. Use `httpx.AsyncClient` (already installed).

**New config key:** `METASO_API_KEY` in `keyring.yaml` and `config.py` (optional — WebTool disabled gracefully if not set)

---

### M2.6 — Scheduling & Reminders

Set a one-shot reminder or run a recurring check.
Covers: "remind me in 30 minutes", "check if the tests pass every 5 minutes"

**Design:** `agent/scheduler.py` — thin wrapper around `asyncio` with:
- `schedule_once(delay_seconds, coro, description)` — fire once
- `schedule_recurring(interval_seconds, coro_factory, description, max_runs)` — repeat N times
- All scheduled jobs send a Feishu notification on completion (same as M2.4)
- Jobs stored in-memory; cleared on server restart (acceptable for now)

**New tool:** `scheduler`

```
args: action ("remind" | "repeat"), delay_seconds (int), interval_seconds (int), message (str), conv_id (str, optional — if set, forward to that CC session)
```

**New slash command:** `/remind m|h|s ` — set a reminder without LLM

---

## Implementation Order

1. **M2.1** — Direct Q&A (prompt + 10-line logic change; highest ROI, zero risk)
2. **M2.4** — Background task runner (unblocks long CC jobs; foundational for M2.5/M2.6)
3. **M2.2** — Shell tool (most-used phone use case)
4. **M2.3** — File ops tool (`send_file` already done; rest is straightforward)
5. **M2.5** — Web tool (秘塔AI MCP; needs `METASO_API_KEY`)
6. **M2.6** — Scheduling (builds on M2.4 notification infra)

---

## Files to Create / Modify

| File | Change |
|------|--------|
| `orchestrator/agent.py` | M2.1 prompt update + question heuristic |
| `orchestrator/tools.py` | Add `ShellTool`, `FileOpsTool`, `WebTool`, `TaskStatusTool`, `SchedulerTool` |
| `agent/task_runner.py` | New — `TaskRunner` singleton, `BackgroundTask` dataclass |
| `agent/scheduler.py` | New — `schedule_once`, `schedule_recurring` |
| `bot/commands.py` | Add `/shell`, `/tasks`, `/remind` commands |
| `bot/feishu.py` | Add `chat_id` context var for file send from tool |
| `bot/handler.py` | Pass `chat_id` into context var alongside `user_id` |
| `requirements.txt` | Add `httpx` (if not already present as transitive dep) |

---

## Verification Checklist

- [x] M2.1: Ask "what is a Python generator?" — mailboy replies directly, no tool call
- [x] M2.2: Send "check git status in todo_app" — `ShellTool` runs, output returned
- [x] M2.2: Send "rm -rf /" — blocked by safety guard
- [x] M2.3: Send "show me the last 50 lines of audit/abc123.jsonl" — file content returned
- [x] M2.3: Send "send me the sessions.json file" — file arrives in Feishu chat
- [x] M2.4: Start a long CC task (e.g. `--timeout 120`) — bot replies immediately, notifies on finish
- [x] M2.4: `/tasks` — lists running task with elapsed time
- [x] M2.5: "What are the new features in Python 3.13?" — `web ask` returns RAG answer from metaso
- [x] M2.5: "Read this URL for me: https://example.com" — page content extracted as markdown
- [x] M2.6: `/remind 10m deploy check` — 10 min later, message arrives in Feishu

---

## ✅ Milestone 3: Multi-Host Architecture (Router / Host Client Split) (COMPLETED)

**Goal:** Split PhoneWork into two deployable components — a public-facing **Router** and one or more **Host Clients** behind NAT. A user can be served by multiple nodes simultaneously. Intelligence is split: the router runs a cheap LLM for routing decisions only; each node runs the full mailboy LLM for execution. A standalone script preserves the current single-machine experience.
### Architecture

```
┌──────────┐  WebSocket   ┌────────────────────────────────────┐
│  Feishu  │◄────────────►│  Router (public VPS)               │
│  Cloud   │              │  - Feishu event handler            │
└──────────┘              │  - Router LLM (routing only)       │
                          │  - Node registry + active node map │
                          │  - NO mailboy, NO sessions         │
                          └───────────┬────────────────────────┘
                                      │ WebSocket (host clients connect in)
                          ┌───────────┴────────────────────────┐
                          │                                    │
               ┌──────────▼──────────┐            ┌────────────▼────────┐
               │  Host Client A      │            │  Host Client B      │
               │  (home-pc)          │            │  (work-server)      │
               │  - Mailboy LLM      │            │  - Mailboy LLM      │
               │  - CC sessions      │            │  - CC sessions      │
               │  - Shell / files    │            │  - Shell / files    │
               │  - Task runner      │            │  - Task runner      │
               └─────────────────────┘            └─────────────────────┘
```

**Key design decisions:**
- Host clients connect TO the router (outbound WebSocket) — NAT-transparent
- A user can be registered on multiple nodes simultaneously
- The **router LLM** decides *which node* to route each message to (cheap, one-shot)
- The **node mailboy LLM** handles the full orchestration loop (sessions, tools, CC)
- Each node maintains its own conversation history per user
- Task completion notifications: node pushes to router → router sends to Feishu

---

### M3.1 — Shared Protocol Module

Foundation for both sides.
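
Both sides exchange JSON text frames and pick the message class from the frame's `type` field. A minimal sketch of that dispatch, assuming two cut-down stand-in classes (`Forward`, `Pong`) and hypothetical helpers `encode`, `decode`, and `MESSAGE_TYPES`; none of these names are taken from the actual module, and the real classes carry more fields:

```python
import json
from dataclasses import asdict, dataclass


# Illustrative stand-ins only; the real message classes live in shared/protocol.py.
@dataclass
class Forward:
    type: str = "forward"
    id: str = ""
    text: str = ""


@dataclass
class Pong:
    type: str = "pong"


# Hypothetical registry mapping the wire "type" value to a message class.
MESSAGE_TYPES = {"forward": Forward, "pong": Pong}


def encode(msg) -> str:
    """Serialize a message dataclass to a JSON text frame."""
    return json.dumps(asdict(msg))


def decode(raw: str):
    """Parse a JSON frame and build the class named by its "type" field."""
    data = json.loads(raw)
    cls = MESSAGE_TYPES.get(data.get("type"))
    if cls is None:
        raise ValueError(f"unknown message type: {data.get('type')!r}")
    # Unknown extra keys raise TypeError here; a stricter decoder could filter them.
    return cls(**data)
```

Keeping the registry next to the class definitions means a new message type is added in one place, and an unrecognized frame fails loudly instead of being silently dropped.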
**`shared/protocol.py`:**

```python
from dataclasses import dataclass, field


@dataclass
class RegisterMessage:
    type: str = "register"
    node_id: str = ""
    serves_users: list[str] = field(default_factory=list)
    working_dir: str = ""
    capabilities: list[str] = field(default_factory=list)  # ["claude_code", "shell", "file_ops", "web"]
    display_name: str = ""  # human-readable, shown in /nodes


@dataclass
class ForwardRequest:
    type: str = "forward"
    id: str = ""  # correlation id, router awaits matching response
    user_id: str = ""
    chat_id: str = ""
    text: str = ""


@dataclass
class ForwardResponse:
    type: str = "forward_response"
    id: str = ""
    reply: str = ""
    error: str = ""


@dataclass
class TaskComplete:
    type: str = "task_complete"
    task_id: str = ""
    user_id: str = ""
    chat_id: str = ""
    result: str = ""


@dataclass
class Heartbeat:
    type: str = "ping"  # either "ping" or "pong"
```

Serialization: JSON + type-field dispatch. Both sides import from `shared/`.

---

### M3.2 — Host Client: Full Mailboy Node

Each host client is a self-contained assistant: receives a raw user message from the router, runs the full LLM + tool loop, returns the reply.

**Host client config** (`host_config.yaml`):

```yaml
NODE_ID: home-pc
DISPLAY_NAME: Home PC
ROUTER_URL: wss://router.example.com/ws/node
ROUTER_SECRET:

# LLM for this node's mailboy
OPENAI_BASE_URL: https://open.bigmodel.cn/api/paas/v4/
OPENAI_API_KEY:
OPENAI_MODEL: glm-4.7

WORKING_DIR: C:/Users/me/projects
METASO_API_KEY:

# Which Feishu open_ids this node serves (can overlap with other nodes)
SERVES_USERS:
  - ou_abc123def456
  - ou_xyz789
```

**Startup flow:**
1. Connect WebSocket to `ROUTER_URL` with `Authorization: Bearer `
2. Send `RegisterMessage` → router adds node to registry
3.
   Enter receive loop:
   - `ForwardRequest` → run local mailboy LLM → send `ForwardResponse`
   - `ping` → send `pong`

**What the host client runs:**
- Full `orchestrator/agent.py` (mailboy LLM, tool loop, per-user history, active session)
- Full `orchestrator/tools.py` (CC, shell, file ops, web, scheduler — all local)
- `agent/manager.py`, `agent/cc_runner.py`, `agent/task_runner.py` — unchanged

Task completion flow:
- Background task finishes → host client pushes `TaskComplete` to router
- Router receives it → calls `send_text(chat_id, result)` via Feishu API

**New files:**
- `host_client/main.py` — entry point, WebSocket connect + receive loop, reconnect
- `host_client/config.py` — loads `host_config.yaml`

**Reused unchanged:**
- `orchestrator/` — entire mailboy stack moves here as-is
- `agent/` — entire session/execution stack moves here as-is

---

### M3.3 — Router: Node Registry + Routing LLM

The router is thin: Feishu integration, node registry, and a small LLM that decides which node to forward each message to.

**Node registry** (`router/nodes.py`):
- `{node_id: NodeConnection}` — connected nodes
- `NodeConnection`: WebSocket, `node_id`, `serves_users[]`, `capabilities[]`, `display_name`, `connected_at`, `last_heartbeat`
- `get_nodes_for_user(open_id) -> list[NodeConnection]` — may return multiple
- `get_active_node(user_id) -> NodeConnection | None` — per-user active node preference
- `set_active_node(user_id, node_id)` — updated by router LLM or `/node` command

**Router LLM** (`router/routing_agent.py`):

Lightweight, one-shot routing decision. System prompt:

```
You are a routing assistant. A user has sent a message. Choose which node to forward it to.

Connected nodes for this user:
- home-pc (ACTIVE): sessions=[todo_app, blog], capabilities=[claude_code, shell, file_ops]
- work-server: sessions=[], capabilities=[claude_code, shell]

Rules:
- If the message references an active session, route to the node owning it.
- If the user names a machine explicitly ("on work-server", "@work-server"), route there.
- If only one node is connected, route there without asking.
- If ambiguous with multiple idle nodes, ask the user to clarify.
- For meta commands (/nodes, /help), handle directly without routing.
```

One tool: `route_to(node_id: str)`. No history. No multi-step loop. Single LLM call.

**WebSocket endpoint** (`router/ws.py`):

```
GET /ws/node
Authorization: Bearer 
```

- Validates secret → accepts registration → adds to registry
- Forwards `ForwardRequest` → host client
- Receives `ForwardResponse` → resolves pending `asyncio.Future`
- Receives `TaskComplete` → calls `send_text(chat_id, result)` to Feishu
- Heartbeat: ping every 30s, drop if no pong in 10s

**Request correlation** (`router/rpc.py`):
- `forward(node, user_id, chat_id, text) -> str` (reply)
- Assigns UUID `request_id`, stores `Future` in pending map
- Sends `ForwardRequest` over node's WebSocket
- Awaits `Future` with timeout (default 600s for long CC tasks)
- On `ForwardResponse`, resolves Future with `reply` or raises on `error`

**Modified files:**
- `main.py` → mounts `/ws/node`, starts `NodeRegistry`
- `bot/handler.py` → after allowlist check, calls `routing_agent.route(user_id, chat_id, text)` instead of `agent.run(user_id, text)` directly
- `config.py` → adds `ROUTER_SECRET`, `ROUTER_LLM_*` (can be same or different model)

**New files:**
- `router/nodes.py` — `NodeRegistry`, `NodeConnection`
- `router/ws.py` — WebSocket endpoint
- `router/rpc.py` — `forward()` with future correlation
- `router/routing_agent.py` — single-shot routing LLM

---

### M3.4 — Standalone Mode Script

Single-machine users run `python standalone.py` — identical UX to today's `python main.py`. Internally uses the full M3 architecture with both components in one process.

**`standalone.py`:**

```python
"""
Run router + host client in a single process (localhost mode).
Equivalent to the pre-M3 single-machine setup.
"""
import asyncio
import secrets

import uvicorn

from router.main import create_app
from host_client.main import NodeClient


async def main():
    secret = secrets.token_hex(16)
    router_url = "ws://127.0.0.1:8000/ws/node"

    # Start FastAPI router in background; keep a reference so the task is not garbage-collected
    config = uvicorn.Config(create_app(router_secret=secret), host="127.0.0.1", port=8000)
    server = uvicorn.Server(config)
    server_task = asyncio.create_task(server.serve())

    # Wait for router to be ready
    await asyncio.sleep(1.0)

    # Start host client connecting to localhost
    client = NodeClient.from_keyring(router_url=router_url, secret=secret)
    await client.run()  # reconnect loop


if __name__ == "__main__":
    asyncio.run(main())
```

**Config:** `standalone.py` reads the same `keyring.yaml` as today. The host client inherits all LLM/CC config from it. User only maintains one config file.

---

### M3.5 — Node Health + User-Facing Status

**`/nodes` slash command** (handled at router, before forwarding):

```
Connected Nodes:
  → home-pc  [ACTIVE]  sessions=2  online 3h
    work-server        sessions=0  online 47m
Use "/node " to switch active node.
```

**`/node ` slash command** — sets active node for user.

**Router `/health` updates:**

```json
{
  "nodes": [
    {"node_id": "home-pc", "status": "online", "users": 2, "sessions": 3},
    {"node_id": "work-server", "status": "offline", "last_seen": "5m ago"}
  ]
}
```

**Feishu notifications on node events (sent to all affected users):**

```
⚠️ Node "home-pc" disconnected.
✅ Node "home-pc" reconnected.
```

---

## Final Project Structure (post-M3)

```
PhoneWork/
├── shared/
│   └── protocol.py          # Wire protocol (shared by router + host client)
│
├── router/                  # Deployable unit 1: public VPS
│   ├── main.py              # FastAPI app factory, mounts /ws/node
│   ├── nodes.py             # NodeRegistry, NodeConnection
│   ├── ws.py                # WebSocket endpoint for host clients
│   ├── rpc.py               # forward(node, user_id, chat_id, text) → reply
│   └── routing_agent.py     # Single-shot routing LLM
│
├── bot/                     # Part of router
│   ├── handler.py           # Feishu event handler (now calls routing_agent)
│   ├── feishu.py            # Send text/file/card to Feishu
│   └── commands.py          # /nodes, /node, /help handled here; rest forwarded
│
├── host_client/             # Deployable unit 2: dev machine
│   ├── main.py              # WS connect to router, receive loop, reconnect
│   └── config.py            # host_config.yaml loader
│
├── orchestrator/            # Part of host client (full mailboy)
│   ├── agent.py             # Mailboy LLM (unchanged)
│   └── tools.py             # Tools: CC, shell, file ops, web, scheduler
│
├── agent/                   # Part of host client (local execution)
│   ├── manager.py           # Session registry
│   ├── cc_runner.py         # Claude Code runner
│   ├── task_runner.py       # Background tasks
│   ├── scheduler.py         # Reminders
│   └── audit.py             # Audit log
│
├── standalone.py            # Runs router + host client in one process
├── config.py                # Router config (keyring.yaml)
└── requirements.txt
```

---

## M3 Implementation Order

1. **M3.1** — Shared protocol (foundation)
2. **M3.2** — Host client daemon (wrap existing mailboy + agent stack)
3. **M3.3** — Router (node registry, WS, routing LLM, refactor handler)
4. **M3.4** — Standalone script
5.
   **M3.5** — Node health, `/nodes`, `/node` commands

---

## M3 Verification Checklist

- [x] `python standalone.py` — works identically to current `python main.py`
- [x] Router starts, host client connects, registration logged
- [x] Feishu message → routing LLM selects node → forwarded → reply returned
- [x] `/nodes` shows all connected nodes with active marker
- [x] `/node work-server` — switches active node, confirmed in next message
- [x] Two nodes serving same user — message routed to active node
- [x] Kill host client → router marks offline, user sees "Node home-pc is offline"
- [x] Host client reconnects → re-registered, messages flow again
- [x] Long CC task on node finishes → router forwards completion notification to Feishu
- [x] Wrong `ROUTER_SECRET` → connection rejected with 401

---

## M3 Implementation Notes (from M2 code review)

Three concrete details discovered from reading the actual M2 code that must be handled during M3 implementation:

### 1. `bot/commands.py` accesses node-local state directly

The current `commands.py` calls `agent._active_conv`, `manager.list_sessions()`, `task_runner.list_tasks()`, `scheduler` — all of which move to the host client in M3.

**Resolution:** At the router, `bot/commands.py` is reduced to two commands: `/nodes` and `/node `. All other slash commands (`/new`, `/status`, `/close`, `/switch`, `/direct`, `/smart`, `/shell`, `/tasks`, `/remind`) are forwarded to the active node as-is — the node's mailboy handles them using its local `commands.py`. The node's command handler remains unchanged from M2.

### 2. `chat_id` must be forwarded to the node

`bot/handler.py` calls `set_current_chat(chat_id)` before invoking the agent. In M3, `handler.py` stays at the router but the agent (and `set_current_chat`) moves to the node. The `chat_id` travels in `ForwardRequest` (already planned), and `host_client/main.py` must call `set_current_chat(msg.chat_id)` before invoking the local `agent.run()`.
This is essential for `FileSendTool` and `SchedulerTool` to work.

### 3. `orchestrator/tools.py` imports `config.WORKING_DIR`

`_resolve_dir()` imports `WORKING_DIR` from root `config.py`. When `orchestrator/` moves to the host client, this import must switch to `host_client/config.py`. In standalone mode, `host_client/config.py` can re-export from root `config.py` to keep a single `keyring.yaml`.
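
One way to make the import switch in note 3 less brittle is for the path check to take the working directory as an argument instead of importing it. A minimal sketch under that assumption, using a hypothetical helper `resolve_under`; the real `_resolve_dir` is not shown in this roadmap and may behave differently:

```python
from pathlib import Path


def resolve_under(working_dir: str, user_path: str) -> Path:
    """Resolve user_path against working_dir and reject anything outside it.

    Hypothetical stand-in for a _resolve_dir-style check; the caller passes
    WORKING_DIR from whichever config module is in use (root or host client).
    """
    base = Path(working_dir).resolve()
    # Joining with an absolute user_path discards base, so the check below
    # also rejects absolute paths that point elsewhere.
    candidate = (base / user_path).resolve()
    try:
        # relative_to raises ValueError when candidate is not inside base.
        candidate.relative_to(base)
    except ValueError:
        raise ValueError(f"path escapes working dir: {user_path}") from None
    return candidate
```

With this shape, the router-side and node-side configs never need to be imported by `orchestrator/tools.py` directly; the host client would pass its own `WORKING_DIR` at call time, and standalone mode would pass the value loaded from `keyring.yaml`.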