PhoneWork — Roadmap
✅ Milestone 2: Mailboy as a Versatile Assistant (COMPLETED)
Goal: Elevate the mailboy (GLM-4.7 orchestrator) from a mere Claude Code relay into a fully capable phone assistant. Users can control their machine, manage files, search the web, get direct answers, and track long-running tasks — all without necessarily opening a Claude Code session.
M2.1 — Direct Q&A (no CC session required)
The mailboy already has an LLM; it just needs permission to answer. When no active session exists and the user asks a general question, the mailboy should reply with its own knowledge instead of asking "which project?".
Changes:
- Update the system prompt: give the mailboy explicit permission to answer questions directly
- Add a heuristic in _run_locked: if there's no active session and the message looks like a question (ends with "?" or contains what/how/why/explain), skip the tool loop and reply directly
- No new files; a pure prompt + logic tweak in orchestrator/agent.py
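The M2.1 heuristic could look like the following minimal sketch. looks_like_question and should_answer_directly are hypothetical helper names, and accepting the full-width "？" is an assumption added for users typing Chinese punctuation in Feishu.

```python
import re

# Question words from the M2.1 heuristic; \b avoids matching "show" or "somehow"
QUESTION_WORDS = re.compile(r"\b(what|how|why|explain)\b", re.IGNORECASE)


def looks_like_question(text: str) -> bool:
    """True if the message ends with a question mark or contains a question word."""
    stripped = text.strip()
    # Also accept the full-width "？" (assumption: users often type Chinese punctuation)
    return stripped.endswith(("?", "？")) or QUESTION_WORDS.search(stripped) is not None


def should_answer_directly(text: str, has_active_session: bool) -> bool:
    """Skip the tool loop only when no CC session is active and the text looks like a question."""
    return not has_active_session and looks_like_question(text)
```

A pure keyword test also matches task-like requests such as "explain the error in the todo_app logs", which is worth keeping in mind when tuning the word list.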
M2.2 — Shell Tool
Run arbitrary shell commands on the host machine and return stdout/stderr.
Covers: check git status, tail logs, kill a process, ps aux | grep, pip list, etc.
New tool: orchestrator/tools.py → ShellTool
name: "run_shell"
args: command (str), cwd (str, optional, defaults to WORKING_DIR), timeout (int, default 30)
returns: {stdout, stderr, exit_code}
Safety guards:
- Blocklist of destructive patterns (rm -rf /, format, mkfs, shutdown, reboot, dd if=, :(){ :|:& };:) — refuse with a clear error
- cwd must be under WORKING_DIR (reuse _resolve_dir) or be an explicit absolute path approved by the user (raise a confirmation request)
- Timeout hard cap: 120 s; for longer tasks see M2.4
New slash command: /shell <command> (bypasses LLM; runs directly)
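A minimal sketch of the M2.2 guard and runner, assuming asyncio subprocesses. The regex patterns, blocked_reason, run_shell and the -1 exit code for refusals are illustrative choices rather than a fixed API; the cwd check against WORKING_DIR is left to _resolve_dir.

```python
import asyncio
import re
from typing import Optional

MAX_TIMEOUT = 120  # hard cap from M2.2; longer jobs belong in the M2.4 task runner

# Illustrative blocklist; each pattern targets one of the destructive commands above
BLOCKED_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/(?:\s|$)"),  # rm -rf / (but not rm -rf /tmp/x)
    re.compile(r"(?:^|[;&|]\s*)format\b"),  # `format` only in command position
    re.compile(r"\bmkfs\b"),
    re.compile(r"\bshutdown\b"),
    re.compile(r"\breboot\b"),
    re.compile(r"\bdd\s+if="),
]
FORK_BOMB = ":(){:|:&};:"  # compared after stripping all whitespace


def blocked_reason(command: str) -> Optional[str]:
    """Return why a command is refused, or None if it may run."""
    if FORK_BOMB in re.sub(r"\s+", "", command):
        return "fork bomb"
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(command):
            return f"matches blocked pattern {pattern.pattern!r}"
    return None


async def run_shell(command: str, cwd: str, timeout: int = 30) -> dict:
    """Run a shell command and return {stdout, stderr, exit_code}."""
    reason = blocked_reason(command)
    if reason is not None:
        return {"stdout": "", "stderr": f"Refused: {reason}", "exit_code": -1}
    timeout = min(timeout, MAX_TIMEOUT)
    proc = await asyncio.create_subprocess_shell(
        command,
        cwd=cwd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        out, err = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        proc.kill()  # kills the shell itself; children it spawned may survive
        await proc.wait()
        return {"stdout": "", "stderr": f"Timed out after {timeout}s", "exit_code": -1}
    return {
        "stdout": out.decode(errors="replace"),
        "stderr": err.decode(errors="replace"),
        "exit_code": proc.returncode,
    }
```

Matching format only in command position keeps git log --format=oneline usable. Any regex blocklist is easy to bypass, so it guards against accidents rather than acting as a security boundary.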
M2.3 — File Operations Tool
Read files, list directories, search content, and send files back to the user in Feishu. Covers: "show me the error log", "what files are in my project?", "search for TODO comments"
New tool: orchestrator/tools.py → FileOpsTool (single tool, action dispatch)
name: "file_ops"
args: action ("read" | "list" | "search" | "send"), path (str), query (str, for search),
max_bytes (int, default 8000)
- read: read the file, truncate to max_bytes, return its content
- list: recursive directory tree (depth-limited to 3) with file sizes
- search: grep-like search (ripgrep or pure Python) for query in path
- send: upload and send the file via bot/feishu.py::send_file() (already implemented); the tool receives chat_id via a context var (added alongside current_user_id)
Safety: all paths must be under WORKING_DIR
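One way to enforce "all paths must be under WORKING_DIR" is to resolve every path (following .. and symlinks) and compare it against the resolved root. The sketch below assumes pathlib on Python 3.9+ (for Path.is_relative_to); resolve_under, read_file and list_tree are hypothetical helpers covering the read and list actions.

```python
import tempfile
from pathlib import Path


def resolve_under(working_dir: str, path: str) -> Path:
    """Resolve path relative to working_dir and refuse anything that escapes it."""
    root = Path(working_dir).resolve()
    candidate = (root / path).resolve()  # an absolute `path` replaces root here
    if not candidate.is_relative_to(root):
        raise PermissionError(f"{path!r} is outside WORKING_DIR")
    return candidate


def read_file(working_dir: str, path: str, max_bytes: int = 8000) -> dict:
    """The "read" action: at most max_bytes of the file, decoded as UTF-8."""
    data = resolve_under(working_dir, path).read_bytes()
    return {
        "content": data[:max_bytes].decode("utf-8", errors="replace"),
        "truncated": len(data) > max_bytes,
    }


def list_tree(working_dir: str, path: str = ".", max_depth: int = 3) -> str:
    """The "list" action: indented tree with file sizes, at most max_depth levels deep."""
    lines: list[str] = []

    def walk(directory: Path, depth: int) -> None:
        for entry in sorted(directory.iterdir(), key=lambda p: p.name):
            indent = "  " * depth
            if entry.is_dir():
                lines.append(f"{indent}{entry.name}/")
                if depth + 1 < max_depth:
                    walk(entry, depth + 1)
            else:
                lines.append(f"{indent}{entry.name} ({entry.stat().st_size} B)")

    walk(resolve_under(working_dir, path), 0)
    return "\n".join(lines)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        Path(root, "sub").mkdir()
        Path(root, "sub", "b.txt").write_text("abc")
        print(list_tree(root))  # "sub/" then "  b.txt (3 B)"
```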
M2.4 — Long-Running Task Manager
This is the key UX upgrade: claude -p runs and shell commands that take minutes need fire-and-forget execution with a completion notification.
Design:
- Add a BackgroundTask dataclass: {task_id, description, started_at, status, conv_id_or_none}
- TaskRunner singleton in agent/task_runner.py:
  - submit(coro, description, notify_chat_id) -> task_id
  - wraps the coroutine in asyncio.create_task; on completion sends a Feishu notification via send_text(notify_chat_id, ...)
  - stores tasks in an in-memory dict {task_id: BackgroundTask}
- When manager.send() is called for a CC session:
  - if cc_timeout > 60: automatically run in the background and return immediately with "⏳ Task #<id> started. I'll notify you when it's done."
  - otherwise run inline as today
New tool: run_background — explicitly submits any shell command or CC prompt as a
background task and returns task_id immediately.
New slash command: /tasks — list running/completed background tasks with status.
New tool: task_status — check status of a specific task_id, optionally get output so far.
Notification format:
✅ Task #abc123 done (42s)
/new todo_app: fix the login bug
[CC output truncated to 800 chars]...
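A minimal sketch of the M2.4 runner. It assumes the Feishu sender is injected as an async notify(chat_id, text) callable standing in for send_text, and that 6-hex-character task IDs are short enough to type; both are assumptions. started_at is recorded with time.monotonic() so elapsed times are immune to clock changes.

```python
import asyncio
import time
import uuid
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Coroutine, Optional

# Stand-in for bot/feishu.py::send_text: (chat_id, text) -> awaitable
Notify = Callable[[str, str], Awaitable[None]]


@dataclass
class BackgroundTask:
    task_id: str
    description: str
    started_at: float  # time.monotonic() at submit time
    status: str = "running"  # running | done | failed
    conv_id_or_none: Optional[str] = None
    result: str = ""


class TaskRunner:
    def __init__(self, notify: Notify) -> None:
        self._notify = notify
        self.tasks: dict[str, BackgroundTask] = {}  # in-memory only
        self._handles: dict[str, asyncio.Task] = {}  # strong refs so tasks are not GC'd

    def submit(self, coro: Coroutine[Any, Any, Any], description: str,
               notify_chat_id: str, conv_id: Optional[str] = None) -> str:
        """Start coro in the background and return its task_id immediately."""
        task_id = uuid.uuid4().hex[:6]
        record = BackgroundTask(task_id, description, time.monotonic(), conv_id_or_none=conv_id)
        self.tasks[task_id] = record
        self._handles[task_id] = asyncio.create_task(self._run(record, coro, notify_chat_id))
        return task_id

    async def _run(self, record: BackgroundTask, coro: Coroutine[Any, Any, Any], chat_id: str) -> None:
        try:
            record.result = str(await coro)
            record.status, icon = "done", "✅"
        except Exception as exc:  # report failures instead of losing them
            record.result = f"{type(exc).__name__}: {exc}"
            record.status, icon = "failed", "❌"
        elapsed = int(time.monotonic() - record.started_at)
        body = record.result[:800]  # same truncation as the notification format
        await self._notify(
            chat_id,
            f"{icon} Task #{record.task_id} {record.status} ({elapsed}s)\n{record.description}\n{body}",
        )
```

Keeping finished records in self.tasks lets /tasks list completed work as well as running jobs.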
M2.5 — Web Tool
Let the mailboy fetch URLs and search the web for quick answers. Covers: "What's new in the latest LangChain?", "fetch this GitHub issue", "help me search for this paper"
Backend: 秘塔AI (Metaso) Search MCP (https://metaso.cn/api/mcp), accessible from mainland China; official API with Bearer token auth. Requires METASO_API_KEY in keyring.yaml.
Get key at: https://metaso.cn/search-api/api-keys
New tool: WebTool (three actions dispatched via one tool)
name: "web"
args: action ("search" | "fetch" | "ask"), query (str), url (str), scope (str), max_chars (int)
- search: calls metaso_web_search — returns top results (title + snippet + URL); scope options: webpage (default), paper, document, video, podcast
- fetch: calls metaso_web_reader with format=markdown — extracts clean content from a URL
- ask: calls metaso_chat — RAG answer combining search + generation (for quick factual Q&A)
Implementation: HTTP POST to https://metaso.cn/api/mcp with JSON-RPC body,
Authorization: Bearer <METASO_API_KEY> header. Use httpx.AsyncClient (already installed).
New config key: METASO_API_KEY in keyring.yaml and config.py (optional — WebTool
disabled gracefully if not set)
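The request can be sketched as a JSON-RPC 2.0 tools/call message, which is how MCP invokes tools. Assumptions to check against Metaso's documentation: the argument keys (q, scope, url, format, message), that the endpoint accepts a stateless tools/call without a prior initialize handshake, and that it answers with plain JSON rather than an SSE stream. build_request and call_metaso are hypothetical names.

```python
import asyncio
import itertools
from typing import Any, Optional

METASO_MCP_URL = "https://metaso.cn/api/mcp"
_request_ids = itertools.count(1)

# WebTool action -> MCP tool name (from the roadmap)
ACTION_TO_TOOL = {"search": "metaso_web_search", "fetch": "metaso_web_reader", "ask": "metaso_chat"}


def build_request(action: str, query: str = "", url: str = "", scope: str = "webpage") -> dict[str, Any]:
    """Build a JSON-RPC 2.0 tools/call body; the argument keys are assumptions."""
    if action == "search":
        arguments = {"q": query, "scope": scope}
    elif action == "fetch":
        arguments = {"url": url, "format": "markdown"}
    elif action == "ask":
        arguments = {"message": query}
    else:
        raise ValueError(f"unknown web action: {action!r}")
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": ACTION_TO_TOOL[action], "arguments": arguments},
    }


async def call_metaso(payload: dict[str, Any], api_key: Optional[str], timeout: float = 30.0) -> dict[str, Any]:
    """POST the payload with Bearer auth; degrade gracefully when no key is configured."""
    if not api_key:
        return {"error": "WebTool disabled: METASO_API_KEY is not set"}
    import httpx  # imported lazily so the disabled path works even without httpx

    headers = {
        "Authorization": f"Bearer {api_key}",
        "Accept": "application/json, text/event-stream",  # MCP Streamable HTTP clients send both
    }
    async with httpx.AsyncClient(timeout=timeout) as client:
        resp = await client.post(METASO_MCP_URL, json=payload, headers=headers)
        resp.raise_for_status()
        return resp.json()  # assumes a JSON (not SSE) response body


if __name__ == "__main__":
    request = build_request("search", query="LangChain release notes")
    print(asyncio.run(call_metaso(request, api_key=None)))
```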
M2.6 — Scheduling & Reminders
Set a one-shot reminder or run a recurring check. Covers: "remind me in 30 minutes", "check if the tests pass every 5 minutes"
Design: agent/scheduler.py — thin wrapper around asyncio with:
- schedule_once(delay_seconds, coro, description) — fire once
- schedule_recurring(interval_seconds, coro_factory, description, max_runs) — repeat up to max_runs times
- All scheduled jobs send a Feishu notification on completion (same as M2.4)
- Jobs stored in-memory; cleared on server restart (acceptable for now)
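A thin asyncio scheduler could look like this sketch. It also shows why schedule_recurring takes a coro_factory: a coroutine object can only be awaited once, so each run needs a fresh one. Job IDs, the jobs dict and cancel are hypothetical, and here the scheduled coroutine itself is expected to send the Feishu notification.

```python
import asyncio
import itertools
from typing import Any, Awaitable, Callable, Coroutine


class Scheduler:
    def __init__(self) -> None:
        self.jobs: dict[str, tuple[str, asyncio.Task]] = {}  # in-memory; lost on restart
        self._ids = itertools.count(1)

    def _add(self, description: str, runner: Coroutine[Any, Any, None]) -> str:
        job_id = f"job{next(self._ids)}"
        task = asyncio.create_task(runner)
        self.jobs[job_id] = (description, task)
        # Forget finished (or cancelled) jobs automatically
        task.add_done_callback(lambda _t: self.jobs.pop(job_id, None))
        return job_id

    def schedule_once(self, delay_seconds: float, coro: Awaitable[Any], description: str) -> str:
        async def runner() -> None:
            await asyncio.sleep(delay_seconds)
            await coro
        return self._add(description, runner())

    def schedule_recurring(self, interval_seconds: float,
                           coro_factory: Callable[[], Awaitable[Any]],
                           description: str, max_runs: int) -> str:
        async def runner() -> None:
            for _ in range(max_runs):
                await asyncio.sleep(interval_seconds)
                await coro_factory()  # fresh coroutine for every run
        return self._add(description, runner())

    def cancel(self, job_id: str) -> bool:
        entry = self.jobs.get(job_id)
        return entry is not None and entry[1].cancel()
```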
New tool: scheduler
args: action ("remind" | "repeat"), delay_seconds (int), interval_seconds (int),
message (str), conv_id (str, optional — if set, forward to that CC session)
New slash command: /remind <N>m|h|s <message> — set a reminder without LLM
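The /remind <N>m|h|s <message> syntax is simple enough to parse without the LLM. A minimal sketch, with parse_remind as a hypothetical helper that returns (delay_seconds, message), or None for malformed input:

```python
import re
from typing import Optional, Tuple

_REMIND = re.compile(r"^/remind\s+(\d+)([smh])\s+(.+)$", re.DOTALL)
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_remind(text: str) -> Optional[Tuple[int, str]]:
    """Parse '/remind 10m deploy check' into (600, 'deploy check')."""
    match = _REMIND.match(text.strip())
    if match is None:
        return None
    amount, unit, message = match.groups()
    return int(amount) * _UNIT_SECONDS[unit], message.strip()
```

The result maps directly onto the scheduler tool's remind action (delay_seconds, message).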
Implementation Order
- M2.1 — Direct Q&A (prompt + 10-line logic change; highest ROI, zero risk)
- M2.4 — Background task runner (unblocks long CC jobs; foundational for M2.5/M2.6)
- M2.2 — Shell tool (most-used phone use case)
- M2.3 — File ops tool (send_file already done; rest is straightforward)
- M2.5 — Web tool (秘塔AI MCP; needs METASO_API_KEY)
- M2.6 — Scheduling (builds on M2.4 notification infra)
Files to Create / Modify
| File | Change |
|---|---|
| orchestrator/agent.py | M2.1 prompt update + question heuristic |
| orchestrator/tools.py | Add ShellTool, FileOpsTool, WebTool, TaskStatusTool, SchedulerTool |
| agent/task_runner.py | New — TaskRunner singleton, BackgroundTask dataclass |
| agent/scheduler.py | New — schedule_once, schedule_recurring |
| bot/commands.py | Add /shell, /tasks, /remind commands |
| bot/feishu.py | Add chat_id context var for file send from tool |
| bot/handler.py | Pass chat_id into context var alongside user_id |
| requirements.txt | Add httpx (if not already present as transitive dep) |
Verification Checklist
- M2.1: Ask "what is a Python generator?" — mailboy replies directly, no tool call
- M2.2: Send "check git status in todo_app" — ShellTool runs, output returned
- M2.2: Send "rm -rf /" — blocked by safety guard
- M2.3: Send "show me the last 50 lines of audit/abc123.jsonl" — file content returned
- M2.3: Send "send me the sessions.json file" — file arrives in Feishu chat
- M2.4: Start a long CC task (e.g. --timeout 120) — bot replies immediately, notifies on finish
- M2.4: /tasks — lists running task with elapsed time
- M2.5: "What are the new features in Python 3.13?" — web ask returns RAG answer from metaso
- M2.5: "Read this URL for me: https://example.com" — page content extracted as markdown
- M2.6: /remind 10m deploy check — 10 min later, message arrives in Feishu
✅ Milestone 3: Multi-Host Architecture (Router / Host Client Split) (COMPLETED)
Goal: Split PhoneWork into two deployable components — a public-facing Router and one or more Host Clients behind NAT. A user can be served by multiple nodes simultaneously. Intelligence is split: the router runs a cheap LLM for routing decisions only; each node runs the full mailboy LLM for execution. A standalone script preserves the current single-machine experience.
Architecture
┌──────────┐  WebSocket   ┌────────────────────────────────────┐
│  Feishu  │◄────────────►│ Router (public VPS)                │
│  Cloud   │              │ - Feishu event handler             │
└──────────┘              │ - Router LLM (routing only)        │
                          │ - Node registry + active node map  │
                          │ - NO mailboy, NO sessions          │
                          └───────────┬────────────────────────┘
                                      │ WebSocket (host clients connect in)
                          ┌───────────┴────────────────────────┐
                          │                                    │
               ┌──────────▼──────────┐              ┌──────────▼──────────┐
               │ Host Client A       │              │ Host Client B       │
               │ (home-pc)           │              │ (work-server)       │
               │ - Mailboy LLM       │              │ - Mailboy LLM       │
               │ - CC sessions       │              │ - CC sessions       │
               │ - Shell / files     │              │ - Shell / files     │
               │ - Task runner       │              │ - Task runner       │
               └─────────────────────┘              └─────────────────────┘
Key design decisions:
- Host clients connect TO the router (outbound WebSocket) — NAT-transparent
- A user can be registered on multiple nodes simultaneously
- The router LLM decides which node to route each message to (cheap, one-shot)
- The node mailboy LLM handles the full orchestration loop (sessions, tools, CC)
- Each node maintains its own conversation history per user
- Task completion notifications: node pushes to router → router sends to Feishu
M3.1 — Shared Protocol Module
Foundation for both sides.
shared/protocol.py:
from dataclasses import dataclass, field


@dataclass
class RegisterMessage:
    type: str = "register"
    node_id: str = ""
    serves_users: list[str] = field(default_factory=list)
    working_dir: str = ""
    capabilities: list[str] = field(default_factory=list)  # ["claude_code", "shell", "file_ops", "web"]
    display_name: str = ""  # human-readable, shown in /nodes


@dataclass
class ForwardRequest:
    type: str = "forward"
    id: str = ""  # correlation id, router awaits matching response
    user_id: str = ""
    chat_id: str = ""
    text: str = ""


@dataclass
class ForwardResponse:
    type: str = "forward_response"
    id: str = ""
    reply: str = ""
    error: str = ""


@dataclass
class TaskComplete:
    type: str = "task_complete"
    task_id: str = ""
    user_id: str = ""
    chat_id: str = ""
    result: str = ""


@dataclass
class Heartbeat:
    type: str = "ping"  # "ping" (router → node) or "pong" (node → router)
Serialization: JSON + type-field dispatch. Both sides import from shared/.
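JSON with type-field dispatch can be sketched as below. To keep the block self-contained it redeclares a subset of the message classes (the real ones would be imported from shared/protocol.py); encode, decode, and the choice to ignore unknown keys for forward compatibility are assumptions.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Any


@dataclass
class ForwardRequest:
    type: str = "forward"
    id: str = ""
    user_id: str = ""
    chat_id: str = ""
    text: str = ""


@dataclass
class ForwardResponse:
    type: str = "forward_response"
    id: str = ""
    reply: str = ""
    error: str = ""


@dataclass
class Heartbeat:
    type: str = "ping"  # or "pong"


# type field -> dataclass; ping and pong share one class
MESSAGE_TYPES: dict[str, type] = {
    "forward": ForwardRequest,
    "forward_response": ForwardResponse,
    "ping": Heartbeat,
    "pong": Heartbeat,
}


def encode(msg: Any) -> str:
    return json.dumps(asdict(msg), ensure_ascii=False)


def decode(raw: str) -> Any:
    data = json.loads(raw)
    cls = MESSAGE_TYPES.get(data.get("type"))
    if cls is None:
        raise ValueError(f"unknown message type: {data.get('type')!r}")
    known = {f.name for f in fields(cls)}
    # Drop keys this side doesn't know so a newer peer can add fields safely
    return cls(**{k: v for k, v in data.items() if k in known})
```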
M3.2 — Host Client: Full Mailboy Node
Each host client is a self-contained assistant: receives a raw user message from the router, runs the full LLM + tool loop, returns the reply.
Host client config (host_config.yaml):
NODE_ID: home-pc
DISPLAY_NAME: Home PC
ROUTER_URL: wss://router.example.com/ws/node
ROUTER_SECRET: <shared_secret>
# LLM for this node's mailboy
OPENAI_BASE_URL: https://open.bigmodel.cn/api/paas/v4/
OPENAI_API_KEY: <key>
OPENAI_MODEL: glm-4.7
WORKING_DIR: C:/Users/me/projects
METASO_API_KEY: <optional>
# Which Feishu open_ids this node serves (can overlap with other nodes)
SERVES_USERS:
- ou_abc123def456
- ou_xyz789
Startup flow:
- Connect WebSocket to ROUTER_URL with Authorization: Bearer <ROUTER_SECRET>
- Send RegisterMessage → router adds node to registry
- Enter receive loop:
  - ForwardRequest → run local mailboy LLM → send ForwardResponse
  - ping → send pong
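The host-client receive loop can be sketched transport-agnostically, with recv/send coroutines standing in for the WebSocket and run_agent/set_current_chat standing in for agent.run and the chat context setter (all assumptions about the eventual interfaces). Handling each ForwardRequest in its own task keeps the loop free to answer pings during a long agent run; handling it inline could delay pongs past the router's 10 s limit. Because asyncio.create_task runs each task in a copy of the current context, a context variable set inside one request stays private to it.

```python
import asyncio
import json
from typing import Awaitable, Callable, Optional

Recv = Callable[[], Awaitable[Optional[str]]]  # returns None when the socket closes
Send = Callable[[str], Awaitable[None]]
RunAgent = Callable[[str, str], Awaitable[str]]  # (user_id, text) -> reply


async def _serve_forward(msg: dict, send: Send, run_agent: RunAgent,
                         set_current_chat: Callable[[str], None]) -> None:
    set_current_chat(msg["chat_id"])  # file send and scheduler tools need the chat_id
    try:
        reply, error = await run_agent(msg["user_id"], msg["text"]), ""
    except Exception as exc:
        reply, error = "", f"{type(exc).__name__}: {exc}"
    await send(json.dumps({"type": "forward_response", "id": msg["id"],
                           "reply": reply, "error": error}, ensure_ascii=False))


async def receive_loop(recv: Recv, send: Send, run_agent: RunAgent,
                       set_current_chat: Callable[[str], None]) -> None:
    in_flight: set[asyncio.Task] = set()
    while (raw := await recv()) is not None:
        msg = json.loads(raw)
        if msg.get("type") == "ping":
            await send(json.dumps({"type": "pong"}))
        elif msg.get("type") == "forward":
            # One task per request so pings are still answered during long runs
            task = asyncio.create_task(_serve_forward(msg, send, run_agent, set_current_chat))
            in_flight.add(task)
            task.add_done_callback(in_flight.discard)
    # Socket closed: let in-flight requests finish (a real client might cancel them instead)
    await asyncio.gather(*in_flight, return_exceptions=True)
```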
What the host client runs:
- Full orchestrator/agent.py (mailboy LLM, tool loop, per-user history, active session)
- Full orchestrator/tools.py (CC, shell, file ops, web, scheduler — all local)
- agent/manager.py, agent/pty_process.py, agent/task_runner.py — unchanged
Task completion flow:
- Background task finishes → host client pushes TaskComplete to router
- Router receives it → calls send_text(chat_id, result) via Feishu API
New files:
- host_client/main.py — entry point, WebSocket connect + receive loop, reconnect
- host_client/config.py — loads host_config.yaml
Reused unchanged:
- orchestrator/ — entire mailboy stack moves here as-is
- agent/ — entire session/execution stack moves here as-is
M3.3 — Router: Node Registry + Routing LLM
The router is thin: Feishu integration, node registry, and a small LLM that decides which node to forward each message to.
Node registry (router/nodes.py):
- {node_id: NodeConnection} — connected nodes
- NodeConnection: WebSocket, node_id, serves_users[], capabilities[], display_name, connected_at, last_heartbeat
- get_nodes_for_user(open_id) -> list[NodeConnection] — may return multiple
- get_active_node(user_id) -> NodeConnection | None — per-user active node preference
- set_active_node(user_id, node_id) — updated by router LLM or /node command
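The registry reduces to two dictionaries. This sketch types the WebSocket loosely and adds one behavior the roadmap does not specify (an assumption): if the user's preferred node is offline and exactly one connected node serves them, that node is treated as active. Keeping the preference in a separate map means it survives a reconnect of the preferred node.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class NodeConnection:
    ws: Any  # the node's WebSocket; left untyped in this sketch
    node_id: str
    serves_users: list[str] = field(default_factory=list)
    capabilities: list[str] = field(default_factory=list)
    display_name: str = ""
    connected_at: float = field(default_factory=time.time)
    last_heartbeat: float = field(default_factory=time.time)


class NodeRegistry:
    def __init__(self) -> None:
        self.nodes: dict[str, NodeConnection] = {}  # connected nodes only
        self._active: dict[str, str] = {}  # user_id -> preferred node_id

    def register(self, conn: NodeConnection) -> None:
        self.nodes[conn.node_id] = conn

    def unregister(self, node_id: str) -> None:
        self.nodes.pop(node_id, None)  # user preferences are kept for reconnects

    def get_nodes_for_user(self, open_id: str) -> list[NodeConnection]:
        return [n for n in self.nodes.values() if open_id in n.serves_users]

    def get_active_node(self, user_id: str) -> Optional[NodeConnection]:
        node = self.nodes.get(self._active.get(user_id, ""))
        if node is not None and user_id in node.serves_users:
            return node
        # Fallback (assumption): a single serving node counts as active
        candidates = self.get_nodes_for_user(user_id)
        return candidates[0] if len(candidates) == 1 else None

    def set_active_node(self, user_id: str, node_id: str) -> None:
        node = self.nodes.get(node_id)
        if node is None or user_id not in node.serves_users:
            raise KeyError(f"node {node_id!r} is not connected for this user")
        self._active[user_id] = node_id
```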
Router LLM (router/routing_agent.py):
Lightweight, one-shot routing decision. System prompt:
You are a routing assistant. A user has sent a message. Choose which node to forward it to.
Connected nodes for this user:
- home-pc (ACTIVE): sessions=[todo_app, blog], capabilities=[claude_code, shell, file_ops]
- work-server: sessions=[], capabilities=[claude_code, shell]
Rules:
- If the message references an active session, route to the node owning it.
- If the user names a machine explicitly ("on work-server", "@work-server"), route there.
- If only one node is connected, route there without asking.
- If ambiguous with multiple idle nodes, ask the user to clarify.
- For meta commands (/nodes, /help), handle directly without routing.
One tool: route_to(node_id: str). No history. No multi-step loop. Single LLM call.
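The single route_to tool can be declared in the OpenAI-style function-calling format (the config points an OpenAI-compatible client at the GLM endpoint). Restricting node_id with an enum of the user's connected nodes steers the model toward real nodes, assuming the endpoint honours enum; parse_route validates the answer regardless. The two rules that need no judgment (one connected node, or an explicit "@node" / "on node" mention) can also be checked in plain code before spending an LLM call. All helper names are hypothetical, and the pre-check is an optimization suggestion rather than part of the plan above.

```python
import json
import re
from typing import Optional, Sequence


def route_to_tool(node_ids: Sequence[str]) -> dict:
    """OpenAI-style tool schema for route_to, limited to the user's connected nodes."""
    return {
        "type": "function",
        "function": {
            "name": "route_to",
            "description": "Forward the user's message to the given node.",
            "parameters": {
                "type": "object",
                "properties": {"node_id": {"type": "string", "enum": list(node_ids)}},
                "required": ["node_id"],
            },
        },
    }


def parse_route(arguments_json: str, node_ids: Sequence[str]) -> Optional[str]:
    """Validate the tool-call arguments; None means the model picked an unknown node."""
    node_id = json.loads(arguments_json).get("node_id")
    return node_id if node_id in node_ids else None


def route_without_llm(text: str, node_ids: Sequence[str]) -> Optional[str]:
    """Apply the rules that need no judgment; None means 'ask the routing LLM'."""
    if len(node_ids) == 1:
        return node_ids[0]
    lowered = text.lower()
    for node_id in node_ids:
        name = re.escape(node_id.lower())
        # "@work-server" or "on work-server", but not "on work-server2"
        if re.search(rf"(?:@|\bon\s+){name}(?![\w-])", lowered):
            return node_id
    return None
```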
WebSocket endpoint (router/ws.py):
GET /ws/node
Authorization: Bearer <ROUTER_SECRET>
- Validates secret → accepts registration → adds to registry
- Forwards ForwardRequest → host client
- Receives ForwardResponse → resolves pending asyncio.Future
- Receives TaskComplete → calls send_text(chat_id, result) to Feishu
- Heartbeat: ping every 30s, drop if no pong in 10s
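The heartbeat rule (ping every 30 s, drop after 10 s without a pong) can be sketched as one coroutine per node connection. This assumes the WebSocket reader sets an asyncio.Event whenever a pong arrives; heartbeat_loop returning is the caller's signal to unregister the node and close the socket. The names are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable


async def heartbeat_loop(send_ping: Callable[[], Awaitable[None]], pong_received: asyncio.Event,
                         interval: float = 30.0, pong_timeout: float = 10.0) -> None:
    """Ping every `interval` s; return as soon as a pong is more than `pong_timeout` s late."""
    while True:
        pong_received.clear()
        await send_ping()
        try:
            await asyncio.wait_for(pong_received.wait(), pong_timeout)
        except asyncio.TimeoutError:
            return  # caller unregisters the node and closes the socket
        await asyncio.sleep(interval)
```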
Request correlation (router/rpc.py):
forward(node, user_id, chat_id, text) -> str (reply):
- Assigns a UUID request_id, stores a Future in the pending map
- Sends ForwardRequest over the node's WebSocket
- Awaits the Future with a timeout (default 600s for long CC tasks)
- On ForwardResponse, resolves the Future with reply, or raises on error
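The correlation logic can be sketched as a small class holding the pending-future map, one instance per node connection. To stay self-contained it takes the node's send coroutine instead of a NodeConnection (a simplification of the forward(node, ...) signature); RemoteError, resolve and fail_all are hypothetical names. fail_all covers a case the list implies but does not spell out: when a node disconnects, its waiting callers should fail fast instead of sitting out the 600 s timeout.

```python
import asyncio
import json
import uuid
from typing import Awaitable, Callable

Send = Callable[[str], Awaitable[None]]


class RemoteError(RuntimeError):
    """The node answered with a non-empty error field."""


class PendingRequests:
    """One instance per node connection: request_id -> Future awaiting that node's reply."""

    def __init__(self) -> None:
        self._pending: dict[str, asyncio.Future] = {}

    async def forward(self, send: Send, user_id: str, chat_id: str, text: str,
                      timeout: float = 600.0) -> str:
        request_id = uuid.uuid4().hex
        future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        try:
            await send(json.dumps({"type": "forward", "id": request_id, "user_id": user_id,
                                   "chat_id": chat_id, "text": text}, ensure_ascii=False))
            return await asyncio.wait_for(future, timeout)
        finally:
            self._pending.pop(request_id, None)  # on success, error, or timeout

    def resolve(self, msg: dict) -> None:
        """Called by the WebSocket reader for every forward_response."""
        future = self._pending.get(msg.get("id", ""))
        if future is None or future.done():
            return  # late reply after a timeout, or an unknown id
        if msg.get("error"):
            future.set_exception(RemoteError(msg["error"]))
        else:
            future.set_result(msg.get("reply", ""))

    def fail_all(self, reason: str) -> None:
        """Called when the node disconnects so callers don't wait for the full timeout."""
        for future in self._pending.values():
            if not future.done():
                future.set_exception(ConnectionError(reason))
```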
Modified files:
- main.py → mounts /ws/node, starts NodeRegistry
- bot/handler.py → after the allowlist check, calls routing_agent.route(user_id, chat_id, text) instead of agent.run(user_id, text) directly
- config.py → adds ROUTER_SECRET, ROUTER_LLM_* (can be the same or a different model)
New files:
- router/nodes.py — NodeRegistry, NodeConnection
- router/ws.py — WebSocket endpoint
- router/rpc.py — forward() with future correlation
- router/routing_agent.py — single-shot routing LLM
M3.4 — Standalone Mode Script
Single-machine users run python standalone.py — identical UX to today's python main.py.
Internally uses the full M3 architecture with both components in one process.
standalone.py:
"""
Run router + host client in a single process (localhost mode).
Equivalent to the pre-M3 single-machine setup.
"""
import asyncio
import secrets

import uvicorn

from router.main import create_app
from host_client.main import NodeClient


async def main() -> None:
    # Fresh shared secret per run; only this process knows it
    secret = secrets.token_hex(16)
    router_url = "ws://127.0.0.1:8000/ws/node"

    # Start FastAPI router in background (keep a reference so the task isn't garbage-collected)
    config = uvicorn.Config(create_app(router_secret=secret), host="127.0.0.1", port=8000)
    server = uvicorn.Server(config)
    server_task = asyncio.create_task(server.serve())

    # Wait until uvicorn reports startup instead of sleeping a fixed 1 s
    while not server.started:
        if server_task.done():  # serve() returned early, e.g. port 8000 in use
            raise RuntimeError("router failed to start")
        await asyncio.sleep(0.05)

    # Start host client connecting to localhost
    client = NodeClient.from_keyring(router_url=router_url, secret=secret)
    await client.run()  # reconnect loop


if __name__ == "__main__":
    asyncio.run(main())
Config: standalone.py reads the same keyring.yaml as today. The host client inherits all LLM/CC config from it, so the user maintains only one config file.
M3.5 — Node Health + User-Facing Status
/nodes slash command (handled at router, before forwarding):
Connected Nodes:
→ home-pc [ACTIVE] sessions=2 online 3h
work-server sessions=0 online 47m
Use "/node <name>" to switch active node.
/node <name> slash command — sets active node for user.
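The /nodes listing is mostly string formatting. This sketch assumes each node reports a session count to the router (the router holds no sessions itself, so the count would have to arrive via registration or heartbeats, which the roadmap does not specify); NodeSummary, humanize and format_nodes are hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class NodeSummary:
    node_id: str
    sessions: int  # assumed to be reported by the node
    connected_at: float  # time.time() when the node registered


def humanize(seconds: float) -> str:
    """Coarse uptime: whole hours, else whole minutes, else seconds."""
    seconds = int(seconds)
    if seconds >= 3600:
        return f"{seconds // 3600}h"
    if seconds >= 60:
        return f"{seconds // 60}m"
    return f"{seconds}s"


def format_nodes(nodes: Sequence[NodeSummary], active_id: Optional[str],
                 now: Optional[float] = None) -> str:
    now = time.time() if now is None else now
    lines = ["Connected Nodes:"]
    for node in nodes:
        is_active = node.node_id == active_id
        marker = "→" if is_active else " "
        tag = " [ACTIVE]" if is_active else ""
        uptime = humanize(now - node.connected_at)
        lines.append(f"{marker} {node.node_id}{tag}  sessions={node.sessions}  online {uptime}")
    lines.append('Use "/node <name>" to switch active node.')
    return "\n".join(lines)
```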
Router /health updates:
{
"nodes": [
{"node_id": "home-pc", "status": "online", "users": 2, "sessions": 3},
{"node_id": "work-server", "status": "offline", "last_seen": "5m ago"}
]
}
Feishu notifications on node events (sent to all affected users):
⚠️ Node "home-pc" disconnected.
✅ Node "home-pc" reconnected.
Final Project Structure (post-M3)
PhoneWork/
├── shared/
│ └── protocol.py # Wire protocol (shared by router + host client)
│
├── router/ # Deployable unit 1: public VPS
│ ├── main.py # FastAPI app factory, mounts /ws/node
│ ├── nodes.py # NodeRegistry, NodeConnection
│ ├── ws.py # WebSocket endpoint for host clients
│ ├── rpc.py # forward(node, user_id, chat_id, text) → reply
│ └── routing_agent.py # Single-shot routing LLM
│
├── bot/ # Part of router
│ ├── handler.py # Feishu event handler (now calls routing_agent)
│ ├── feishu.py # Send text/file/card to Feishu
│ └── commands.py # /nodes, /node, /help handled here; rest forwarded
│
├── host_client/ # Deployable unit 2: dev machine
│ ├── main.py # WS connect to router, receive loop, reconnect
│ └── config.py # host_config.yaml loader
│
├── orchestrator/ # Part of host client (full mailboy)
│ ├── agent.py # Mailboy LLM (unchanged)
│ └── tools.py # Tools: CC, shell, file ops, web, scheduler
│
├── agent/ # Part of host client (local execution)
│ ├── manager.py # Session registry
│ ├── pty_process.py # Claude Code runner
│ ├── task_runner.py # Background tasks
│ ├── scheduler.py # Reminders
│ └── audit.py # Audit log
│
├── standalone.py # Runs router + host client in one process
├── config.py # Router config (keyring.yaml)
└── requirements.txt
M3 Implementation Order
- M3.1 — Shared protocol (foundation)
- M3.2 — Host client daemon (wrap existing mailboy + agent stack)
- M3.3 — Router (node registry, WS, routing LLM, refactor handler)
- M3.4 — Standalone script
- M3.5 — Node health, /nodes, /node commands
M3 Verification Checklist
- python standalone.py — works identically to current python main.py
- Router starts, host client connects, registration logged
- Feishu message → routing LLM selects node → forwarded → reply returned
- /nodes shows all connected nodes with active marker
- /node work-server — switches active node, confirmed in next message
- Two nodes serving same user — message routed to active node
- Kill host client → router marks offline, user sees "Node home-pc is offline"
- Host client reconnects → re-registered, messages flow again
- Long CC task on node finishes → router forwards completion notification to Feishu
- Wrong ROUTER_SECRET → connection rejected with 401
M3 Implementation Notes (from M2 code review)
Three concrete details discovered from reading the actual M2 code that must be handled during M3 implementation:
1. bot/commands.py accesses node-local state directly
The current commands.py calls agent._active_conv, manager.list_sessions(),
task_runner.list_tasks(), scheduler — all of which move to the host client in M3.
Resolution: At the router, bot/commands.py is reduced to two commands:
/nodes and /node <name>. All other slash commands (/new, /status, /close,
/switch, /direct, /smart, /shell, /tasks, /remind) are forwarded to the
active node as-is — the node's mailboy handles them using its local commands.py.
The node's command handler remains unchanged from M2.
2. chat_id must be forwarded to the node
bot/handler.py calls set_current_chat(chat_id) before invoking the agent.
In M3, handler.py stays at the router but the agent (and set_current_chat) moves
to the node. The chat_id travels in ForwardRequest (already planned), and
host_client/main.py must call set_current_chat(msg.chat_id) before invoking the
local agent.run(). This is essential for FileSendTool and SchedulerTool to work.
3. orchestrator/tools.py imports config.WORKING_DIR
_resolve_dir() imports WORKING_DIR from root config.py. When orchestrator/
moves to the host client, this import must switch to host_client/config.py.
In standalone mode, host_client/config.py can re-export from root config.py to
keep a single keyring.yaml.