PhoneWork/ROADMAP.md
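Yuyao Huang (Sam) 8ecc701d5e feat: add a task scheduler, a background task runner, and support for several tools
Implement a background task scheduler (scheduler.py) and a task runner (task_runner.py), supporting asynchronous execution and status tracking of long-running tasks
Add support for several tools: shell command execution, file operations (read/write/search/send), web search/Q&A, timed reminders, and more
Extend the README and ROADMAP docs to describe the new features and the planned multi-host architecture
Add METASO_API_KEY to the config file to support 秘塔AI (Metaso AI) search
Optimize the agent logic to recognize general questions automatically and answer them directly without creating a session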
2026-03-28 13:45:20 +08:00

PhoneWork — Roadmap

Milestone 2: Mailboy as a Versatile Assistant

Goal: Elevate the mailboy (GLM-4.7 orchestrator) from a mere Claude Code relay into a fully capable phone assistant. Users should be able to control their machine, manage files, search the web, get direct answers, and track long-running tasks — all without necessarily opening a Claude Code session.

M2.1 — Direct Q&A (no CC session required)

The mailboy already has an LLM; it just needs permission to answer. When no active session exists and the user asks a general question, the mailboy should reply with its own knowledge instead of asking "which project?".

Changes:

  • Update the system prompt: give the mailboy explicit permission to answer questions directly
  • Add a heuristic in _run_locked: if there's no active session and the message looks like a question (ends with ?, contains what/how/why/explain), skip the tool loop and reply directly
  • No new modules; only a prompt update and a small logic tweak in orchestrator/agent.py
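The heuristic described above could look roughly like the sketch below. The function names and the exact cue list are hypothetical (the roadmap only names what/how/why/explain as examples), and the full-width question mark is included on the assumption that users also write in Chinese.

```python
import re

# Hypothetical cue words; the roadmap lists what/how/why/explain as examples.
QUESTION_CUES = re.compile(r"\b(what|how|why|explain)\b", re.IGNORECASE)


def looks_like_question(text: str) -> bool:
    """Return True when a message reads like a general question."""
    stripped = text.strip()
    if not stripped or stripped.startswith("/"):
        return False  # empty input and slash commands are never direct Q&A
    if stripped.endswith(("?", "\uff1f")):  # ASCII and full-width question marks
        return True
    return bool(QUESTION_CUES.search(stripped))


def should_answer_directly(text: str, has_active_session: bool) -> bool:
    """Skip the tool loop only when no session is active and the text is a question."""
    return not has_active_session and looks_like_question(text)
```

A keyword check like this will misfire on some imperative messages that happen to contain "how" or "what", so the system prompt remains the main control and the heuristic is only a shortcut.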

M2.2 — Shell Tool

Run arbitrary shell commands on the host machine and return stdout/stderr. Covers: check git status, tail logs, kill a process, ps aux | grep, pip list, etc.

New tool: ShellTool in orchestrator/tools.py

name: "run_shell"
args: command (str), cwd (str, optional, defaults to WORKING_DIR), timeout (int, default 30)
returns: {stdout, stderr, exit_code}

Safety guards:

  • Blocklist of destructive patterns (rm -rf /, format, mkfs, shutdown, reboot, dd if=, :(){:|:&};:) — refuse with a clear error
  • cwd must be under WORKING_DIR (reuse _resolve_dir) or be an explicit absolute path approved by the user (raise a confirmation request)
  • Timeout hard cap: 120 s; for longer tasks see M2.4

New slash command: /shell <command> (bypasses LLM; runs directly)
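A minimal sketch of the blocklist check and the timeout cap, assuming asyncio subprocesses are used. The regular expressions are illustrative translations of the blocklist above, not a complete safety net, and the cwd confinement via _resolve_dir is left out here.

```python
import asyncio
import re
from typing import Optional

# Illustrative patterns derived from the blocklist above; not exhaustive.
BLOCKED_PATTERNS = [
    r"\brm\s+-rf\s+/(\s|$)",
    r"^\s*format\b",  # only when "format" is the command itself
    r"\bmkfs\b",
    r"\bshutdown\b",
    r"\breboot\b",
    r"\bdd\s+if=",
    r":\(\)\s*\{\s*:\s*\|\s*:\s*&\s*\}\s*;\s*:",  # classic fork bomb
]
MAX_TIMEOUT = 120  # hard cap in seconds


def is_blocked(command: str) -> bool:
    """Return True if the command matches any destructive pattern."""
    return any(re.search(pattern, command) for pattern in BLOCKED_PATTERNS)


async def run_shell(command: str, cwd: Optional[str] = None, timeout: int = 30) -> dict:
    """Run a command and return stdout, stderr and exit_code."""
    if is_blocked(command):
        return {"stdout": "", "stderr": "blocked by safety guard", "exit_code": -1}
    proc = await asyncio.create_subprocess_shell(
        command,
        cwd=cwd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        out, err = await asyncio.wait_for(proc.communicate(), min(timeout, MAX_TIMEOUT))
    except asyncio.TimeoutError:
        proc.kill()  # do not leave the process running after the deadline
        await proc.wait()
        return {"stdout": "", "stderr": "timed out", "exit_code": -1}
    return {
        "stdout": out.decode(errors="replace"),
        "stderr": err.decode(errors="replace"),
        "exit_code": proc.returncode,
    }
```

Pattern blocklists are easy to bypass (quoting, aliases, scripts), so this is best treated as a guard against accidents rather than a security boundary.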


M2.3 — File Operations Tool

Read files, list directories, search content, and send files back to the user in Feishu. Covers: "show me the error log", "what files are in my project?", "search for TODO comments"

New tool: FileOpsTool in orchestrator/tools.py (single tool, action dispatch)

name: "file_ops"
args: action ("read" | "list" | "search" | "send"), path (str), query (str, for search),
      max_bytes (int, default 8000)
  • read: read file, truncate to max_bytes, return content
  • list: recursive directory tree (depth-limited to 3), file sizes
  • search: grep-like search for query under path (ripgrep if available, otherwise a pure-Python fallback)
  • send: upload and send file via bot/feishu.py::send_file() (already implemented) — tool receives chat_id via context var (add alongside current_user_id)

Safety: all paths must be under WORKING_DIR
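One way to enforce the WORKING_DIR rule and the max_bytes truncation is sketched below. The helper names resolve_under and read_file are hypothetical; the project may already cover the path check with _resolve_dir.

```python
from pathlib import Path


def resolve_under(working_dir: str, user_path: str) -> Path:
    """Resolve user_path and refuse anything that escapes working_dir."""
    root = Path(working_dir).resolve()
    candidate = (root / user_path).resolve()  # also collapses ".." and symlinks
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"{user_path!r} is outside WORKING_DIR")
    return candidate


def read_file(working_dir: str, user_path: str, max_bytes: int = 8000) -> dict:
    """Read at most max_bytes and report whether the content was cut off."""
    path = resolve_under(working_dir, user_path)
    with path.open("rb") as handle:
        data = handle.read(max_bytes + 1)  # one extra byte reveals truncation
    return {
        "content": data[:max_bytes].decode("utf-8", errors="replace"),
        "truncated": len(data) > max_bytes,
    }
```

Resolving before comparing matters: a plain string-prefix check would accept paths such as `../other` or a symlink that points outside the working directory.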


M2.4 — Long-Running Task Manager

This is the key UX upgrade. claude -p and shell commands that take minutes need fire-and-forget with completion notification.

Design:

  • Add BackgroundTask dataclass: {task_id, description, started_at, status, conv_id_or_none}
  • TaskRunner singleton in agent/task_runner.py:
    • submit(coro, description, notify_chat_id) -> task_id
    • wraps coroutine in asyncio.create_task; on completion sends a Feishu notification via send_text(notify_chat_id, ...)
    • stores tasks in-memory dict {task_id: BackgroundTask}
  • When manager.send() is called for a CC session:
    • if cc_timeout > 60: automatically run in background, return immediately with "⏳ Task #<id> started. I'll notify you when it's done."
    • otherwise run inline as today

New tool: run_background — explicitly submits any shell command or CC prompt as a background task and returns task_id immediately.

New slash command: /tasks — list running/completed background tasks with status.

New tool: task_status — check status of a specific task_id, optionally get output so far.

Notification format:

✅ Task #abc123 done (42s)
/new todo_app: fix the login bug

[CC output truncated to 800 chars]...
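The design above can be sketched as follows. This is a simplified, in-memory version under stated assumptions: the notify callable stands in for send_text(notify_chat_id, ...), the result and conv_id fields are illustrative, and the "failed" status and its icon are additions not specified in the roadmap.

```python
import asyncio
import time
import uuid
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, Optional


@dataclass
class BackgroundTask:
    task_id: str
    description: str
    started_at: float = field(default_factory=time.time)
    status: str = "running"  # "running", "done" or "failed"
    conv_id: Optional[str] = None
    result: Optional[str] = None


class TaskRunner:
    def __init__(self, notify: Callable[[str, str], Awaitable[None]]):
        self._notify = notify  # stand-in for send_text(chat_id, text)
        self.tasks: Dict[str, BackgroundTask] = {}
        self._handles: Dict[str, asyncio.Task] = {}  # keep references alive

    def submit(self, coro: Awaitable[str], description: str, notify_chat_id: str) -> str:
        """Schedule coro on the running loop and return its task id at once."""
        task_id = uuid.uuid4().hex[:6]
        record = BackgroundTask(task_id, description)
        self.tasks[task_id] = record
        self._handles[task_id] = asyncio.create_task(
            self._run(record, coro, notify_chat_id)
        )
        return task_id

    async def _run(self, record: BackgroundTask, coro: Awaitable[str], chat_id: str) -> None:
        try:
            record.result = await coro
            record.status, icon = "done", "\u2705"
        except Exception as exc:  # report failures instead of losing them
            record.result = repr(exc)
            record.status, icon = "failed", "\u274c"
        elapsed = int(time.time() - record.started_at)
        body = (record.result or "")[:800]  # same truncation as the format above
        header = f"{icon} Task #{record.task_id} {record.status} ({elapsed}s)"
        await self._notify(chat_id, f"{header}\n{record.description}\n\n{body}")
```

Holding the asyncio.Task objects in a dict is deliberate: the event loop keeps only weak references to tasks, so a task with no other reference can be garbage-collected before it finishes.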

M2.5 — Web Tool

Let the mailboy fetch URLs and search the web for quick answers. Covers: "What has changed in the latest LangChain?", "fetch this GitHub issue", "help me search for this paper"

Backend: 秘塔AI Search MCP (https://metaso.cn/api/mcp) — mainland China accessible, official API, Bearer token auth. Requires METASO_API_KEY in keyring.yaml. Get key at: https://metaso.cn/search-api/api-keys

New tool: WebTool (three actions dispatched via one tool)

name: "web"
args: action ("search" | "fetch" | "ask"), query (str), url (str), scope (str), max_chars (int)
  • search: calls metaso_web_search — returns top results (title + snippet + URL)
    • scope options: webpage (default), paper, document, video, podcast
  • fetch: calls metaso_web_reader with format=markdown — extracts clean content from URL
  • ask: calls metaso_chat — RAG answer combining search + generation (for quick factual Q&A)

Implementation: HTTP POST to https://metaso.cn/api/mcp with JSON-RPC body, Authorization: Bearer <METASO_API_KEY> header. Use httpx.AsyncClient (already installed).

New config key: METASO_API_KEY in keyring.yaml and config.py (optional — WebTool disabled gracefully if not set)
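A sketch of how the request could be assembled, assuming the endpoint accepts the standard MCP tools/call JSON-RPC method. The tool names come from the list above; the argument keys passed in arguments (for example "q" or "scope") are assumptions to check against the 秘塔AI documentation, and the server may require additional headers.

```python
import itertools
import json
from typing import Dict, Tuple

METASO_MCP_URL = "https://metaso.cn/api/mcp"  # endpoint named in the roadmap

# Map WebTool actions to the MCP tool names listed above.
ACTION_TO_TOOL = {
    "search": "metaso_web_search",
    "fetch": "metaso_web_reader",
    "ask": "metaso_chat",
}

_request_ids = itertools.count(1)


def build_mcp_request(action: str, arguments: dict, api_key: str) -> Tuple[Dict[str, str], dict]:
    """Return (headers, body) for one JSON-RPC tools/call request."""
    if action not in ACTION_TO_TOOL:
        raise ValueError(f"unknown web action: {action!r}")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": ACTION_TO_TOOL[action], "arguments": arguments},
    }
    return headers, body


# Sending it with httpx (not executed here), inside an async function:
#     async with httpx.AsyncClient(timeout=30) as client:
#         resp = await client.post(METASO_MCP_URL, headers=headers, json=body)
#         resp.raise_for_status()
```

Keeping the request builder separate from the HTTP call makes it easy to disable WebTool when METASO_API_KEY is missing and to unit-test the payload without network access.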


M2.6 — Scheduling & Reminders

Set a one-shot reminder or run a recurring check. Covers: "remind me in 30 minutes", "check if the tests pass every 5 minutes"

Design: agent/scheduler.py — thin wrapper around asyncio with:

  • schedule_once(delay_seconds, coro, description) — fire once
  • schedule_recurring(interval_seconds, coro_factory, description, max_runs) — repeat N times
  • All scheduled jobs send a Feishu notification on completion (same as M2.4)
  • Jobs stored in-memory; cleared on server restart (acceptable for now)

New tool: scheduler

args: action ("remind" | "repeat"), delay_seconds (int), interval_seconds (int),
      message (str), conv_id (str, optional — if set, forward to that CC session)

New slash command: /remind <N>m|h|s <message> — set a reminder without LLM
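The scheduler wrapper and the /remind argument parsing might be sketched like this. parse_remind is a hypothetical helper name, description is used only as the asyncio task name here, and the Feishu notification on completion is omitted.

```python
import asyncio
import re
from typing import Awaitable, Callable, Tuple

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_remind(args: str) -> Tuple[int, str]:
    """Parse '<N>m|h|s <message>' into (delay_seconds, message)."""
    match = re.fullmatch(r"\s*(\d+)([smh])\s+(.+)", args, re.DOTALL)
    if match is None:
        raise ValueError("usage: /remind <N>m|h|s <message>")
    amount, unit, message = match.groups()
    return int(amount) * _UNIT_SECONDS[unit], message.strip()


def schedule_once(delay_seconds: float, coro: Awaitable, description: str) -> asyncio.Task:
    """Run coro once after delay_seconds; must be called from a running loop."""
    async def _job() -> None:
        await asyncio.sleep(delay_seconds)
        await coro

    return asyncio.create_task(_job(), name=description)


def schedule_recurring(
    interval_seconds: float,
    coro_factory: Callable[[], Awaitable],
    description: str,
    max_runs: int,
) -> asyncio.Task:
    """Call coro_factory every interval_seconds, max_runs times in total."""
    async def _job() -> None:
        for _ in range(max_runs):
            await asyncio.sleep(interval_seconds)
            await coro_factory()  # a fresh coroutine is needed for every run

    return asyncio.create_task(_job(), name=description)
```

The recurring variant takes a factory rather than a coroutine because a coroutine object can be awaited only once. Callers should keep the returned task, both to cancel the job later and to prevent it from being garbage-collected.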


Implementation Order

  1. M2.1 — Direct Q&A (prompt + 10-line logic change; highest ROI, zero risk)
  2. M2.4 — Background task runner (unblocks long CC jobs; foundational for M2.5/M2.6)
  3. M2.2 — Shell tool (most-used phone use case)
  4. M2.3 — File ops tool (send_file already done; rest is straightforward)
  5. M2.5 — Web tool (秘塔AI MCP; needs METASO_API_KEY)
  6. M2.6 — Scheduling (builds on M2.4 notification infra)

Files to Create / Modify

| File | Change |
|---|---|
| orchestrator/agent.py | M2.1 prompt update + question heuristic |
| orchestrator/tools.py | Add ShellTool, FileOpsTool, WebTool, TaskStatusTool, SchedulerTool |
| agent/task_runner.py | New: TaskRunner singleton, BackgroundTask dataclass |
| agent/scheduler.py | New: schedule_once, schedule_recurring |
| bot/commands.py | Add /shell, /tasks, /remind commands |
| bot/feishu.py | Add chat_id context var for file send from tool |
| bot/handler.py | Pass chat_id into context var alongside user_id |
| requirements.txt | Add httpx (if not already present as transitive dep) |

Verification Checklist

  • M2.1: Ask "what is a Python generator?" — mailboy replies directly, no tool call
  • M2.2: Send "check git status in todo_app" — ShellTool runs, output returned
  • M2.2: Send "rm -rf /" — blocked by safety guard
  • M2.3: Send "show me the last 50 lines of audit/abc123.jsonl" — file content returned
  • M2.3: Send "send me the sessions.json file" — file arrives in Feishu chat
  • M2.4: Start a long CC task (e.g. --timeout 120) — bot replies immediately, notifies on finish
  • M2.4: /tasks — lists running task with elapsed time
  • M2.5: "What are the new features in Python 3.13?" — web ask returns RAG answer from metaso
  • M2.5: "Read this URL for me: https://example.com" — page content extracted as markdown
  • M2.6: /remind 10m deploy check — 10 min later, message arrives in Feishu


Milestone 3: Multi-Host Architecture (Router / Host Client Split)

Goal: Split PhoneWork into two deployable components — a public-facing Router and one or more Host Clients behind NAT. A user can be served by multiple nodes simultaneously. Intelligence is split: the router runs a cheap LLM for routing decisions only; each node runs the full mailboy LLM for execution. A standalone script preserves the current single-machine experience.

Architecture

┌──────────┐  WebSocket   ┌────────────────────────────────────┐
│  Feishu  │◄────────────►│          Router (public VPS)       │
│  Cloud   │              │  - Feishu event handler            │
└──────────┘              │  - Router LLM (routing only)       │
                          │  - Node registry + active node map │
                          │  - NO mailboy, NO sessions         │
                          └───────────┬────────────────────────┘
                                      │ WebSocket (host clients connect in)
                          ┌───────────┴────────────────────────┐
                          │                                    │
               ┌──────────▼──────────┐           ┌────────────▼────────┐
               │   Host Client A     │           │   Host Client B     │
               │   (home-pc)         │           │   (work-server)     │
               │  - Mailboy LLM      │           │  - Mailboy LLM      │
               │  - CC sessions      │           │  - CC sessions      │
               │  - Shell / files    │           │  - Shell / files    │
               │  - Task runner      │           │  - Task runner      │
               └─────────────────────┘           └─────────────────────┘

Key design decisions:

  • Host clients connect TO the router (outbound WebSocket) — NAT-transparent
  • A user can be registered on multiple nodes simultaneously
  • The router LLM decides which node to route each message to (cheap, one-shot)
  • The node mailboy LLM handles the full orchestration loop (sessions, tools, CC)
  • Each node maintains its own conversation history per user
  • Task completion notifications: node pushes to router → router sends to Feishu

M3.1 — Shared Protocol Module

Foundation for both sides.

shared/protocol.py:

from dataclasses import dataclass, field

@dataclass
class RegisterMessage:
    type: str = "register"
    node_id: str = ""
    serves_users: list[str] = field(default_factory=list)
    working_dir: str = ""
    capabilities: list[str] = field(default_factory=list)  # ["claude_code", "shell", "file_ops", "web"]
    display_name: str = ""  # human-readable, shown in /nodes

@dataclass
class ForwardRequest:
    type: str = "forward"
    id: str = ""          # correlation id, router awaits matching response
    user_id: str = ""
    chat_id: str = ""
    text: str = ""

@dataclass
class ForwardResponse:
    type: str = "forward_response"
    id: str = ""
    reply: str = ""
    error: str = ""

@dataclass
class TaskComplete:
    type: str = "task_complete"
    task_id: str = ""
    user_id: str = ""
    chat_id: str = ""
    result: str = ""

@dataclass
class Heartbeat:
    type: str = "ping"  # "ping" (router → node) or "pong" (node → router)

Serialization: JSON + type-field dispatch. Both sides import from shared/.
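A minimal sketch of JSON serialization with type-field dispatch, shown for two of the message classes. The encode/decode helpers and the MESSAGE_TYPES registry are hypothetical names; the remaining message classes would be registered the same way.

```python
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class ForwardRequest:
    type: str = "forward"
    id: str = ""
    user_id: str = ""
    chat_id: str = ""
    text: str = ""


@dataclass
class ForwardResponse:
    type: str = "forward_response"
    id: str = ""
    reply: str = ""
    error: str = ""


# Registry keyed by each class's default "type" value.
MESSAGE_TYPES = {cls().type: cls for cls in (ForwardRequest, ForwardResponse)}


def encode(message) -> str:
    """Serialize a protocol dataclass to a JSON text frame."""
    return json.dumps(asdict(message), ensure_ascii=False)


def decode(raw: str):
    """Parse a JSON text frame and build the dataclass named by its type field."""
    data = json.loads(raw)
    cls = MESSAGE_TYPES.get(data.get("type"))
    if cls is None:
        raise ValueError(f"unknown message type: {data.get('type')!r}")
    known = {f.name for f in fields(cls)}
    # Ignore unknown keys so newer peers can add fields without breaking older ones.
    return cls(**{key: value for key, value in data.items() if key in known})
```

Usage: `decode(encode(ForwardRequest(id="r1", text="hi")))` yields an equal ForwardRequest instance.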


M3.2 — Host Client: Full Mailboy Node

Each host client is a self-contained assistant: receives a raw user message from the router, runs the full LLM + tool loop, returns the reply.

Host client config (host_config.yaml):

NODE_ID: home-pc
DISPLAY_NAME: Home PC
ROUTER_URL: wss://router.example.com/ws/node
ROUTER_SECRET: <shared_secret>

# LLM for this node's mailboy
OPENAI_BASE_URL: https://open.bigmodel.cn/api/paas/v4/
OPENAI_API_KEY: <key>
OPENAI_MODEL: glm-4.7

WORKING_DIR: C:/Users/me/projects
METASO_API_KEY: <optional>

# Which Feishu open_ids this node serves (can overlap with other nodes)
SERVES_USERS:
  - ou_abc123def456
  - ou_xyz789

Startup flow:

  1. Connect WebSocket to ROUTER_URL with Authorization: Bearer <ROUTER_SECRET>
  2. Send RegisterMessage → router adds node to registry
  3. Enter receive loop:
    • ForwardRequest → run local mailboy LLM → send ForwardResponse
    • ping → send pong
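The per-frame logic of the receive loop can be sketched independently of the WebSocket library. Here handle_frame is a hypothetical helper, and run_mailboy stands in for the local agent entry point (the roadmap refers to agent.run(user_id, text)); the outer loop would read a frame, call this function, and send back whatever it returns.

```python
import json
from typing import Awaitable, Callable, Optional


async def handle_frame(
    raw: str,
    run_mailboy: Callable[[str, str], Awaitable[str]],
) -> Optional[str]:
    """Handle one text frame from the router; return the reply frame, if any."""
    message = json.loads(raw)
    kind = message.get("type")
    if kind == "ping":
        return json.dumps({"type": "pong"})
    if kind == "forward":
        try:
            reply, error = await run_mailboy(message["user_id"], message["text"]), ""
        except Exception as exc:  # report failures to the router instead of dropping them
            reply, error = "", str(exc)
        return json.dumps(
            {"type": "forward_response", "id": message["id"], "reply": reply, "error": error}
        )
    return None  # unknown frame types are ignored
```

Echoing the request id back is what lets the router match the response to its pending request. In a real client each ForwardRequest would probably run in its own task, so that a long mailboy call does not delay pong replies.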

What the host client runs:

  • Full orchestrator/agent.py (mailboy LLM, tool loop, per-user history, active session)
  • Full orchestrator/tools.py (CC, shell, file ops, web, scheduler — all local)
  • agent/manager.py, agent/pty_process.py, agent/task_runner.py — unchanged

Task completion flow:

  • Background task finishes → host client pushes TaskComplete to router
  • Router receives it → calls send_text(chat_id, result) via Feishu API

New files:

  • host_client/main.py — entry point, WebSocket connect + receive loop, reconnect
  • host_client/config.py — loads host_config.yaml

Reused unchanged:

  • orchestrator/ — entire mailboy stack moves here as-is
  • agent/ — entire session/execution stack moves here as-is

M3.3 — Router: Node Registry + Routing LLM

The router is thin: Feishu integration, node registry, and a small LLM that decides which node to forward each message to.

Node registry (router/nodes.py):

  • {node_id: NodeConnection} — connected nodes
  • NodeConnection: WebSocket, node_id, serves_users[], capabilities[], display_name, connected_at, last_heartbeat
  • get_nodes_for_user(open_id) -> list[NodeConnection] — may return multiple
  • get_active_node(user_id) -> NodeConnection | None — per-user active node preference
  • set_active_node(user_id, node_id) — updated by router LLM or /node command
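An in-memory registry with the methods listed above might look like the sketch below. The register and unregister methods are assumed additions, and the websocket field is typed loosely because the concrete WebSocket class depends on the server framework.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class NodeConnection:
    node_id: str
    websocket: Any  # framework-specific connection object
    serves_users: List[str] = field(default_factory=list)
    capabilities: List[str] = field(default_factory=list)
    display_name: str = ""
    connected_at: float = field(default_factory=time.time)
    last_heartbeat: float = field(default_factory=time.time)


class NodeRegistry:
    def __init__(self) -> None:
        self._nodes: Dict[str, NodeConnection] = {}
        self._active: Dict[str, str] = {}  # user_id -> preferred node_id

    def register(self, conn: NodeConnection) -> None:
        self._nodes[conn.node_id] = conn

    def unregister(self, node_id: str) -> None:
        self._nodes.pop(node_id, None)

    def get_nodes_for_user(self, open_id: str) -> List[NodeConnection]:
        return [n for n in self._nodes.values() if open_id in n.serves_users]

    def set_active_node(self, user_id: str, node_id: str) -> None:
        node = self._nodes.get(node_id)
        if node is None or user_id not in node.serves_users:
            raise KeyError(f"node {node_id!r} is not available for this user")
        self._active[user_id] = node_id

    def get_active_node(self, user_id: str) -> Optional[NodeConnection]:
        # Returns None when no preference is set or the preferred node is offline.
        return self._nodes.get(self._active.get(user_id, ""))
```

The preference survives a disconnect in this sketch, so a node that reconnects under the same node_id becomes the user's active node again without another /node command.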

Router LLM (router/routing_agent.py):

Lightweight, one-shot routing decision. System prompt:

You are a routing assistant. A user has sent a message. Choose which node to forward it to.

Connected nodes for this user:
- home-pc (ACTIVE): sessions=[todo_app, blog], capabilities=[claude_code, shell, file_ops]
- work-server: sessions=[], capabilities=[claude_code, shell]

Rules:
- If the message references an active session, route to the node owning it.
- If the user names a machine explicitly ("on work-server", "@work-server"), route there.
- If only one node is connected, route there without asking.
- If ambiguous with multiple idle nodes, ask the user to clarify.
- For meta commands (/nodes, /help), handle directly without routing.

One tool: route_to(node_id: str). No history. No multi-step loop. Single LLM call.

WebSocket endpoint (router/ws.py):

GET /ws/node
Authorization: Bearer <ROUTER_SECRET>
  • Validates secret → accepts registration → adds to registry
  • Forwards ForwardRequest → host client
  • Receives ForwardResponse → resolves pending asyncio.Future
  • Receives TaskComplete → calls send_text(chat_id, result) to Feishu
  • Heartbeat: ping every 30s, drop if no pong in 10s

Request correlation (router/rpc.py):

  • forward(node, user_id, chat_id, text) -> str (reply)
  • Assigns a UUID as the request id (the id field of ForwardRequest), stores a Future in the pending map
  • Sends ForwardRequest over node's WebSocket
  • Awaits Future with timeout (default 600s for long CC tasks)
  • On ForwardResponse, resolves Future with reply or raises on error
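The correlation steps above can be sketched as a small class. The names RpcClient and on_response are hypothetical, and send stands in for whatever coroutine writes a text frame to the node's WebSocket.

```python
import asyncio
import json
import uuid
from typing import Awaitable, Callable, Dict


class RpcClient:
    def __init__(self, send: Callable[[str], Awaitable[None]]):
        self._send = send  # writes one text frame to the node's WebSocket
        self._pending: Dict[str, asyncio.Future] = {}

    async def forward(self, user_id: str, chat_id: str, text: str, timeout: float = 600.0) -> str:
        """Send a ForwardRequest and wait for the matching ForwardResponse."""
        request_id = uuid.uuid4().hex
        future: asyncio.Future = asyncio.get_running_loop().create_future()
        self._pending[request_id] = future
        frame = {"type": "forward", "id": request_id,
                 "user_id": user_id, "chat_id": chat_id, "text": text}
        try:
            await self._send(json.dumps(frame))
            return await asyncio.wait_for(future, timeout)
        finally:
            self._pending.pop(request_id, None)  # clean up on success, error or timeout

    def on_response(self, message: dict) -> None:
        """Called by the WebSocket handler for every ForwardResponse frame."""
        future = self._pending.get(message.get("id", ""))
        if future is None or future.done():
            return  # late or unknown response; nothing is waiting for it
        if message.get("error"):
            future.set_exception(RuntimeError(message["error"]))
        else:
            future.set_result(message.get("reply", ""))
```

Removing the entry in a finally block keeps the pending map from growing when requests time out or the node disconnects mid-request.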

Modified files:

  • main.py → mounts /ws/node, starts NodeRegistry
  • bot/handler.py → after allowlist check, calls routing_agent.route(user_id, chat_id, text) instead of agent.run(user_id, text) directly
  • config.py → adds ROUTER_SECRET, ROUTER_LLM_* (can be same or different model)

New files:

  • router/nodes.py: NodeRegistry, NodeConnection
  • router/ws.py: WebSocket endpoint
  • router/rpc.py: forward() with future correlation
  • router/routing_agent.py: single-shot routing LLM

M3.4 — Standalone Mode Script

Single-machine users run python standalone.py — identical UX to today's python main.py. Internally uses the full M3 architecture with both components in one process.

standalone.py:

"""
Run router + host client in a single process (localhost mode).
Equivalent to the pre-M3 single-machine setup.
"""
import asyncio
import secrets

import uvicorn

from router.main import create_app
from host_client.main import NodeClient


async def main():
    # Per-run shared secret; it never leaves this process
    secret = secrets.token_hex(16)
    router_url = "ws://127.0.0.1:8000/ws/node"

    # Start the FastAPI router in the background; keep a reference to the
    # task so it is not garbage-collected while the server is running
    config = uvicorn.Config(create_app(router_secret=secret), host="127.0.0.1", port=8000)
    server = uvicorn.Server(config)
    server_task = asyncio.create_task(server.serve())

    # Wait until uvicorn reports that it is accepting connections
    while not server.started:
        if server_task.done():
            raise RuntimeError("router failed to start")
        await asyncio.sleep(0.05)

    # Start the host client connecting to localhost
    client = NodeClient.from_keyring(router_url=router_url, secret=secret)
    await client.run()  # reconnect loop


if __name__ == "__main__":
    asyncio.run(main())

Config: standalone.py reads the same keyring.yaml as today. The host client inherits all LLM/CC config from it. User only maintains one config file.


M3.5 — Node Health + User-Facing Status

/nodes slash command (handled at router, before forwarding):

Connected Nodes:
→ home-pc  [ACTIVE]  sessions=2  online 3h
  work-server         sessions=0  online 47m

Use "/node <name>" to switch active node.

/node <name> slash command — sets active node for user.

Router /health updates:

{
  "nodes": [
    {"node_id": "home-pc", "status": "online", "users": 2, "sessions": 3},
    {"node_id": "work-server", "status": "offline", "last_seen": "5m ago"}
  ]
}

Feishu notifications on node events (sent to all affected users):

⚠️ Node "home-pc" disconnected.
✅ Node "home-pc" reconnected.

Final Project Structure (post-M3)

PhoneWork/
├── shared/
│   └── protocol.py              # Wire protocol (shared by router + host client)
│
├── router/                      # Deployable unit 1: public VPS
│   ├── main.py                  # FastAPI app factory, mounts /ws/node
│   ├── nodes.py                 # NodeRegistry, NodeConnection
│   ├── ws.py                    # WebSocket endpoint for host clients
│   ├── rpc.py                   # forward(node, user_id, chat_id, text) → reply
│   └── routing_agent.py         # Single-shot routing LLM
│
├── bot/                         # Part of router
│   ├── handler.py               # Feishu event handler (now calls routing_agent)
│   ├── feishu.py                # Send text/file/card to Feishu
│   └── commands.py              # /nodes, /node, /help handled here; rest forwarded
│
├── host_client/                 # Deployable unit 2: dev machine
│   ├── main.py                  # WS connect to router, receive loop, reconnect
│   └── config.py                # host_config.yaml loader
│
├── orchestrator/                # Part of host client (full mailboy)
│   ├── agent.py                 # Mailboy LLM (unchanged)
│   └── tools.py                 # Tools: CC, shell, file ops, web, scheduler
│
├── agent/                       # Part of host client (local execution)
│   ├── manager.py               # Session registry
│   ├── pty_process.py           # Claude Code runner
│   ├── task_runner.py           # Background tasks
│   ├── scheduler.py             # Reminders
│   └── audit.py                 # Audit log
│
├── standalone.py                # Runs router + host client in one process
├── config.py                    # Router config (keyring.yaml)
└── requirements.txt

M3 Implementation Order

  1. M3.1 — Shared protocol (foundation)
  2. M3.2 — Host client daemon (wrap existing mailboy + agent stack)
  3. M3.3 — Router (node registry, WS, routing LLM, refactor handler)
  4. M3.4 — Standalone script
  5. M3.5 — Node health, /nodes, /node commands

M3 Verification Checklist

  • python standalone.py — works identically to current python main.py
  • Router starts, host client connects, registration logged
  • Feishu message → routing LLM selects node → forwarded → reply returned
  • /nodes shows all connected nodes with active marker
  • /node work-server — switches active node, confirmed in next message
  • Two nodes serving same user — message routed to active node
  • Kill host client → router marks offline, user sees "Node home-pc is offline"
  • Host client reconnects → re-registered, messages flow again
  • Long CC task on node finishes → router forwards completion notification to Feishu
  • Wrong ROUTER_SECRET → connection rejected with 401