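Yuyao Huang (Sam) 8ecc701d5e feat: add task scheduler, background task runner, and support for multiple tools

Implement the background task scheduler (scheduler.py) and task runner (task_runner.py), supporting asynchronous execution and status tracking of long-running tasks
Add support for multiple tools: shell command execution, file operations (read/write/search/send), web search/Q&A, timed reminders, and more
Extend the README and ROADMAP docs to describe the new features and the planned multi-host architecture
Add METASO_API_KEY to the config file to support 秘塔AI (Metaso) search
Refine the agent logic to recognize general questions automatically and answer them directly without creating a session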

2026-03-28 13:45:20 +08:00

21 KiB


PhoneWork — Roadmap

Milestone 2: Mailboy as a Versatile Assistant

Goal: Elevate the mailboy (GLM-4.7 orchestrator) from a mere Claude Code relay into a fully capable phone assistant. Users should be able to control their machine, manage files, search the web, get direct answers, and track long-running tasks — all without necessarily opening a Claude Code session.

M2.1 — Direct Q&A (no CC session required)

The mailboy already has an LLM; it just needs permission to answer. When no active session exists and the user asks a general question, the mailboy should reply with its own knowledge instead of asking "which project?".

Changes:

Update the system prompt: give the mailboy explicit permission to answer questions directly
Add a heuristic in _run_locked: if there's no active session and the message looks like a question (ends with ?, contains what/how/why/explain), skip tool loop and reply directly
No new modules; a prompt update plus a small logic tweak in orchestrator/agent.py
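The heuristic above can be sketched as a small predicate. This is a minimal illustration, not the code in orchestrator/agent.py: the function names `looks_like_question` and `should_answer_directly` are hypothetical, and the keyword set is only the one listed above.

```python
import re

# Keywords taken from the heuristic above; the exact set is an assumption.
QUESTION_WORDS = re.compile(r"\b(what|how|why|explain)\b", re.IGNORECASE)


def looks_like_question(text: str) -> bool:
    """Return True if the message reads like a general question."""
    stripped = text.strip()
    if not stripped:
        return False
    # Accept both the ASCII and the full-width question mark.
    if stripped.endswith(("?", "？")):
        return True
    return bool(QUESTION_WORDS.search(stripped))


def should_answer_directly(text: str, has_active_session: bool) -> bool:
    """Skip the tool loop only when no session is active and the text is a question."""
    return not has_active_session and looks_like_question(text)
```

A keyword match is deliberately loose; a message such as "explain the diff in todo_app" would also qualify, so the prompt update remains the main control and the predicate only short-circuits the obvious cases.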

M2.2 — Shell Tool

Run arbitrary shell commands on the host machine and return stdout/stderr. Covers: check git status, tail logs, kill a process, ps aux | grep, pip list, etc.

New tool: orchestrator/tools.py → ShellTool

name: "run_shell"
args: command (str), cwd (str, optional, defaults to WORKING_DIR), timeout (int, default 30)
returns: {stdout, stderr, exit_code}

Safety guards:

Blocklist of destructive patterns (rm -rf /, format, mkfs, shutdown, reboot, dd if=, :(){:|:&};:) — refuse with a clear error
cwd must be under WORKING_DIR (reuse _resolve_dir) or be an explicit absolute path approved by the user (raise a confirmation request)
Timeout hard cap: 120 s; for longer tasks see M2.4

New slash command: /shell <command> (bypasses LLM; runs directly)

M2.3 — File Operations Tool

Read files, list directories, search content, and send files back to the user in Feishu. Covers: "show me the error log", "what files are in my project?", "search for TODO comments"

New tool: orchestrator/tools.py → FileOpsTool (single tool, action dispatch)

name: "file_ops"
args: action ("read" | "list" | "search" | "send"), path (str), query (str, for search),
      max_bytes (int, default 8000)

read: read file, truncate to max_bytes, return content
list: recursive directory tree (depth-limited to 3), file sizes
search: grep-like search for query in path (via ripgrep or a Python implementation)
send: upload and send file via bot/feishu.py::send_file() (already implemented) — tool receives chat_id via context var (add alongside current_user_id)

Safety: all paths must be under WORKING_DIR
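The path rule can be illustrated with `pathlib`. This is a sketch under the assumption that resolving symlinks and `..` before the containment check is acceptable; `resolve_under` and `read_truncated` are hypothetical helpers, not the existing `_resolve_dir`.

```python
from pathlib import Path


def resolve_under(working_dir: str, user_path: str) -> Path:
    """Resolve user_path against working_dir and refuse anything that escapes it."""
    root = Path(working_dir).resolve()
    # Joining with an absolute user_path yields that absolute path,
    # which then fails the containment check below.
    candidate = (root / user_path).resolve()
    if not candidate.is_relative_to(root):  # requires Python 3.9+
        raise PermissionError(f"{user_path!r} is outside WORKING_DIR")
    return candidate


def read_truncated(path: Path, max_bytes: int = 8000) -> str:
    """Read at most max_bytes from the file and decode leniently."""
    with path.open("rb") as fh:
        data = fh.read(max_bytes)
    return data.decode("utf-8", errors="replace")
```

Truncating on bytes can cut a multi-byte character in half, which is why the decode uses `errors="replace"` instead of raising.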

M2.4 — Long-Running Task Manager

This is the key UX upgrade. claude -p runs and shell commands that take minutes need fire-and-forget execution with a completion notification.

Design:

Add BackgroundTask dataclass: {task_id, description, started_at, status, conv_id_or_none}
TaskRunner singleton in agent/task_runner.py:
- submit(coro, description, notify_chat_id) -> task_id
- wraps coroutine in asyncio.create_task; on completion sends a Feishu notification via send_text(notify_chat_id, ...)
- stores tasks in-memory dict {task_id: BackgroundTask}
When manager.send() is called for a CC session:
- if cc_timeout > 60: automatically run in background, return immediately with "⏳ Task #<id> started. I'll notify you when it's done."
- otherwise run inline as today
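The design above could look roughly like the following. It is a sketch, not the contents of agent/task_runner.py: the `notify` callable stands in for `send_text`, the extra `result` field and the six-character task id are assumptions, and the singleton wiring is omitted.

```python
import asyncio
import time
import uuid
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Optional


@dataclass
class BackgroundTask:
    task_id: str
    description: str
    started_at: float = field(default_factory=time.time)
    status: str = "running"  # "running", "done" or "failed"
    conv_id: Optional[str] = None
    result: str = ""  # assumption: keep output so task_status can return it


class TaskRunner:
    def __init__(self, notify: Callable[[str, str], Awaitable[None]]):
        self._notify = notify  # notify(chat_id, text), e.g. send_text
        self.tasks: dict[str, BackgroundTask] = {}
        self._handles: set[asyncio.Task] = set()  # strong refs, avoids GC

    def submit(self, coro, description: str, notify_chat_id: str) -> str:
        """Schedule coro in the background and return its task_id at once."""
        task_id = uuid.uuid4().hex[:6]
        self.tasks[task_id] = BackgroundTask(task_id, description)
        handle = asyncio.create_task(self._run(task_id, coro, notify_chat_id))
        self._handles.add(handle)
        handle.add_done_callback(self._handles.discard)
        return task_id

    async def _run(self, task_id: str, coro, chat_id: str) -> None:
        task = self.tasks[task_id]
        try:
            task.result = str(await coro)
            task.status, icon = "done", "✅"
        except Exception as exc:
            task.result = repr(exc)
            task.status, icon = "failed", "❌"
        elapsed = int(time.time() - task.started_at)
        header = f"{icon} Task #{task_id} {task.status} ({elapsed}s)"
        await self._notify(chat_id, f"{header}\n{task.description}\n\n{task.result[:800]}")
```

`submit` must be called from inside a running event loop, since it relies on `asyncio.create_task`.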

New tool: run_background — explicitly submits any shell command or CC prompt as a background task and returns task_id immediately.

New slash command: /tasks — list running/completed background tasks with status.

New tool: task_status — check status of a specific task_id, optionally get output so far.

Notification format:

✅ Task #abc123 done (42s)
/new todo_app: fix the login bug

[CC output truncated to 800 chars]...

M2.5 — Web Tool

Let the mailboy fetch URLs and search the web for quick answers. Covers: "What's new in the latest LangChain?", "fetch this GitHub issue", "help me search for this paper"

Backend: 秘塔AI Search MCP (https://metaso.cn/api/mcp) — mainland China accessible, official API, Bearer token auth. Requires METASO_API_KEY in keyring.yaml. Get key at: https://metaso.cn/search-api/api-keys

New tool: WebTool (three actions dispatched via one tool)

name: "web"
args: action ("search" | "fetch" | "ask"), query (str), url (str), scope (str), max_chars (int)

search: calls metaso_web_search — returns top results (title + snippet + URL)
- scope options: webpage (default), paper, document, video, podcast
fetch: calls metaso_web_reader with format=markdown — extracts clean content from URL
ask: calls metaso_chat — RAG answer combining search + generation (for quick factual Q&A)

Implementation: HTTP POST to https://metaso.cn/api/mcp with JSON-RPC body, Authorization: Bearer <METASO_API_KEY> header. Use httpx.AsyncClient (already installed).
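As an illustration of the request shape, the sketch below builds the headers and JSON-RPC body for one call. It assumes the standard MCP `tools/call` method; the helper name `build_request` is hypothetical, and the argument keys passed in `arguments` are placeholders, since the real schema is defined by the Metaso MCP server.

```python
import itertools

METASO_MCP_URL = "https://metaso.cn/api/mcp"  # endpoint named in the text above
_ids = itertools.count(1)  # JSON-RPC request ids

# Maps each WebTool action to the MCP tool name given above.
ACTION_TO_TOOL = {
    "search": "metaso_web_search",
    "fetch": "metaso_web_reader",
    "ask": "metaso_chat",
}


def build_request(action: str, arguments: dict, api_key: str) -> tuple[dict, dict]:
    """Return (headers, json_body) for one MCP tools/call request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": ACTION_TO_TOOL[action], "arguments": arguments},
    }
    return headers, body
```

The pair can then be passed to `httpx.AsyncClient.post(METASO_MCP_URL, headers=headers, json=body)`. Keeping the builder separate from the HTTP call makes it testable without network access.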

New config key: METASO_API_KEY in keyring.yaml and config.py (optional — WebTool disabled gracefully if not set)

M2.6 — Scheduling & Reminders

Set a one-shot reminder or run a recurring check. Covers: "remind me in 30 minutes", "check if the tests pass every 5 minutes"

Design: agent/scheduler.py — thin wrapper around asyncio with:

schedule_once(delay_seconds, coro, description) — fire once
schedule_recurring(interval_seconds, coro_factory, description, max_runs) — repeat N times
All scheduled jobs send a Feishu notification on completion (same as M2.4)
Jobs stored in-memory; cleared on server restart (acceptable for now)
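A minimal sketch of the two entry points on top of plain asyncio tasks. The signatures follow the list above; the Feishu notification and the in-memory job table are left out, and returning the `asyncio.Task` handle is an assumption made here so callers can cancel a job.

```python
import asyncio
from typing import Awaitable, Callable


def schedule_once(delay_seconds: float, coro: Awaitable, description: str = "") -> asyncio.Task:
    """Await coro once after delay_seconds."""

    async def runner():
        await asyncio.sleep(delay_seconds)
        return await coro

    return asyncio.create_task(runner(), name=description or None)


def schedule_recurring(
    interval_seconds: float,
    coro_factory: Callable[[], Awaitable],
    description: str = "",
    max_runs: int = 1,
) -> asyncio.Task:
    """Call coro_factory() every interval_seconds, at most max_runs times."""

    async def runner():
        for _ in range(max_runs):
            await asyncio.sleep(interval_seconds)
            # A coroutine object can be awaited only once,
            # so each run needs a fresh one from the factory.
            await coro_factory()

    return asyncio.create_task(runner(), name=description or None)
```

Both functions need a running event loop. The caller should hold on to the returned task, because the event loop itself keeps only a weak reference to it.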

New tool: scheduler

args: action ("remind" | "repeat"), delay_seconds (int), interval_seconds (int),
      message (str), conv_id (str, optional — if set, forward to that CC session)

New slash command: /remind <N>m|h|s <message> — set a reminder without LLM
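Parsing the `/remind` argument string is small enough to show in full. The helper name `parse_remind` is hypothetical, and rejecting input without a message is an assumption about the desired behaviour.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
_REMIND_RE = re.compile(r"^\s*(\d+)\s*([smh])\s+(.+)$", re.IGNORECASE | re.DOTALL)


def parse_remind(args: str) -> tuple[int, str]:
    """Parse '<N>m|h|s <message>' into (delay_seconds, message)."""
    match = _REMIND_RE.match(args)
    if match is None:
        raise ValueError("usage: /remind <N>m|h|s <message>")
    amount, unit, message = match.groups()
    return int(amount) * _UNIT_SECONDS[unit.lower()], message.strip()
```

For example, `parse_remind("10m deploy check")` yields a delay of 600 seconds and the message `"deploy check"`, which can be handed straight to `schedule_once`.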

Implementation Order

M2.1 — Direct Q&A (prompt + 10-line logic change; highest ROI, zero risk)
M2.4 — Background task runner (unblocks long CC jobs; foundational for M2.5/M2.6)
M2.2 — Shell tool (most-used phone use case)
M2.3 — File ops tool (send_file already done; rest is straightforward)
M2.5 — Web tool (秘塔AI MCP; needs METASO_API_KEY)
M2.6 — Scheduling (builds on M2.4 notification infra)

Files to Create / Modify

File	Change
`orchestrator/agent.py`	M2.1 prompt update + question heuristic
`orchestrator/tools.py`	Add `ShellTool`, `FileOpsTool`, `WebTool`, `TaskStatusTool`, `SchedulerTool`
`agent/task_runner.py`	New — `TaskRunner` singleton, `BackgroundTask` dataclass
`agent/scheduler.py`	New — `schedule_once`, `schedule_recurring`
`bot/commands.py`	Add `/shell`, `/tasks`, `/remind` commands
`bot/feishu.py`	Add `chat_id` context var for file send from tool
`bot/handler.py`	Pass `chat_id` into context var alongside `user_id`
`requirements.txt`	Add `httpx` (if not already present as transitive dep)

Verification Checklist

M2.1: Ask "what is a Python generator?" — mailboy replies directly, no tool call
M2.2: Send "check git status in todo_app" — ShellTool runs, output returned
M2.2: Send "rm -rf /" — blocked by safety guard
M2.3: Send "show me the last 50 lines of audit/abc123.jsonl" — file content returned
M2.3: Send "send me the sessions.json file" — file arrives in Feishu chat
M2.4: Start a long CC task (e.g. --timeout 120) — bot replies immediately, notifies on finish
M2.4: /tasks — lists running task with elapsed time
M2.5: "What are the new features in Python 3.13?" — web ask returns RAG answer from metaso
M2.5: "Read this URL for me: https://example.com" — page content extracted as markdown
M2.6: /remind 10m deploy check — 10 min later, message arrives in Feishu

Milestone 3: Multi-Host Architecture (Router / Host Client Split)

Goal: Split PhoneWork into two deployable components — a public-facing Router and one or more Host Clients behind NAT. A user can be served by multiple nodes simultaneously. Intelligence is split: the router runs a cheap LLM for routing decisions only; each node runs the full mailboy LLM for execution. A standalone script preserves the current single-machine experience.

Architecture

┌──────────┐  WebSocket   ┌────────────────────────────────────┐
│  Feishu  │◄────────────►│          Router (public VPS)       │
│  Cloud   │              │  - Feishu event handler            │
└──────────┘              │  - Router LLM (routing only)       │
                          │  - Node registry + active node map │
                          │  - NO mailboy, NO sessions         │
                          └───────────┬────────────────────────┘
                                      │ WebSocket (host clients connect in)
                          ┌───────────┴────────────────────────┐
                          │                                    │
               ┌──────────▼──────────┐           ┌────────────▼────────┐
               │   Host Client A     │           │   Host Client B     │
               │   (home-pc)         │           │   (work-server)     │
               │  - Mailboy LLM      │           │  - Mailboy LLM      │
               │  - CC sessions      │           │  - CC sessions      │
               │  - Shell / files    │           │  - Shell / files    │
               │  - Task runner      │           │  - Task runner      │
               └─────────────────────┘           └─────────────────────┘

Key design decisions:

Host clients connect TO the router (outbound WebSocket) — NAT-transparent
A user can be registered on multiple nodes simultaneously
The router LLM decides which node to route each message to (cheap, one-shot)
The node mailboy LLM handles the full orchestration loop (sessions, tools, CC)
Each node maintains its own conversation history per user
Task completion notifications: node pushes to router → router sends to Feishu

M3.1 — Shared Protocol Module

Foundation for both sides.

shared/protocol.py:

from dataclasses import dataclass, field

@dataclass
class RegisterMessage:
    type: str = "register"
    node_id: str = ""
    serves_users: list[str] = field(default_factory=list)
    working_dir: str = ""
    capabilities: list[str] = field(default_factory=list)  # ["claude_code", "shell", "file_ops", "web"]
    display_name: str = ""  # human-readable, shown in /nodes

@dataclass
class ForwardRequest:
    type: str = "forward"
    id: str = ""          # correlation id, router awaits matching response
    user_id: str = ""
    chat_id: str = ""
    text: str = ""

@dataclass
class ForwardResponse:
    type: str = "forward_response"
    id: str = ""
    reply: str = ""
    error: str = ""

@dataclass
class TaskComplete:
    type: str = "task_complete"
    task_id: str = ""
    user_id: str = ""
    chat_id: str = ""
    result: str = ""

@dataclass
class Heartbeat:
    type: str = "ping"  # "ping" from the router, "pong" in the reply

Serialization: JSON + type-field dispatch. Both sides import from shared/.
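Type-field dispatch can be sketched with `dataclasses.asdict` and a registry keyed by the default `type` value. Only two of the message classes are repeated here to keep the example self-contained; `encode`, `decode` and `MESSAGE_TYPES` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ForwardRequest:
    type: str = "forward"
    id: str = ""
    user_id: str = ""
    chat_id: str = ""
    text: str = ""


@dataclass
class ForwardResponse:
    type: str = "forward_response"
    id: str = ""
    reply: str = ""
    error: str = ""


# Registry keyed by the wire value of the "type" field.
MESSAGE_TYPES = {cls().type: cls for cls in (ForwardRequest, ForwardResponse)}


def encode(message) -> str:
    """Serialize a protocol dataclass to a JSON string."""
    return json.dumps(asdict(message))


def decode(raw: str):
    """Parse a JSON string and build the dataclass named by its "type" field."""
    data = json.loads(raw)
    cls = MESSAGE_TYPES.get(data.get("type"))
    if cls is None:
        raise ValueError(f"unknown message type: {data.get('type')!r}")
    return cls(**data)
```

A heartbeat carries two possible `type` values, so in this scheme it would need two registry entries ("ping" and "pong") pointing at the same class.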

M3.2 — Host Client: Full Mailboy Node

Each host client is a self-contained assistant: receives a raw user message from the router, runs the full LLM + tool loop, returns the reply.

Host client config (host_config.yaml):

NODE_ID: home-pc
DISPLAY_NAME: Home PC
ROUTER_URL: wss://router.example.com/ws/node
ROUTER_SECRET: <shared_secret>

# LLM for this node's mailboy
OPENAI_BASE_URL: https://open.bigmodel.cn/api/paas/v4/
OPENAI_API_KEY: <key>
OPENAI_MODEL: glm-4.7

WORKING_DIR: C:/Users/me/projects
METASO_API_KEY: <optional>

# Which Feishu open_ids this node serves (can overlap with other nodes)
SERVES_USERS:
  - ou_abc123def456
  - ou_xyz789

Startup flow:

Connect WebSocket to ROUTER_URL with Authorization: Bearer <ROUTER_SECRET>
Send RegisterMessage → router adds node to registry
Enter receive loop:
- ForwardRequest → run local mailboy LLM → send ForwardResponse
- ping → send pong
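The receive loop's per-message logic can be kept independent of the WebSocket library. The sketch below handles one incoming frame and returns the frame to send back; `handle_frame` and the `run_mailboy` callable are hypothetical stand-ins for the real client and for the orchestrator entry point.

```python
import json
from typing import Awaitable, Callable, Optional


async def handle_frame(
    raw: str,
    run_mailboy: Callable[[str, str, str], Awaitable[str]],
) -> Optional[str]:
    """Turn one incoming JSON frame into the JSON frame to send back, if any."""
    msg = json.loads(raw)
    kind = msg.get("type")
    if kind == "ping":
        return json.dumps({"type": "pong"})
    if kind == "forward":
        response = {"type": "forward_response", "id": msg["id"], "reply": "", "error": ""}
        try:
            response["reply"] = await run_mailboy(msg["user_id"], msg["chat_id"], msg["text"])
        except Exception as exc:
            # Report the failure so the router's pending request can resolve.
            response["error"] = str(exc)
        return json.dumps(response)
    return None  # unknown frame types are ignored in this sketch
```

In a real client a long mailboy run would block further frames, including pings, if awaited inline; dispatching each ForwardRequest as its own task is one way around that.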

What the host client runs:

Full orchestrator/agent.py (mailboy LLM, tool loop, per-user history, active session)
Full orchestrator/tools.py (CC, shell, file ops, web, scheduler — all local)
agent/manager.py, agent/pty_process.py, agent/task_runner.py — unchanged

Task completion flow:

Background task finishes → host client pushes TaskComplete to router
Router receives it → calls send_text(chat_id, result) via Feishu API

New files:

host_client/main.py — entry point, WebSocket connect + receive loop, reconnect
host_client/config.py — loads host_config.yaml

Reused unchanged:

orchestrator/ — entire mailboy stack moves here as-is
agent/ — entire session/execution stack moves here as-is

M3.3 — Router: Node Registry + Routing LLM

The router is thin: Feishu integration, node registry, and a small LLM that decides which node to forward each message to.

Node registry (router/nodes.py):

{node_id: NodeConnection} — connected nodes
NodeConnection: WebSocket, node_id, serves_users[], capabilities[], display_name, connected_at, last_heartbeat
get_nodes_for_user(open_id) -> list[NodeConnection] — may return multiple
get_active_node(user_id) -> NodeConnection | None — per-user active node preference
set_active_node(user_id, node_id) — updated by router LLM or /node command

Router LLM (router/routing_agent.py):

Lightweight, one-shot routing decision. System prompt:

You are a routing assistant. A user has sent a message. Choose which node to forward it to.

Connected nodes for this user:
- home-pc (ACTIVE): sessions=[todo_app, blog], capabilities=[claude_code, shell, file_ops]
- work-server: sessions=[], capabilities=[claude_code, shell]

Rules:
- If the message references an active session, route to the node owning it.
- If the user names a machine explicitly ("on work-server", "@work-server"), route there.
- If only one node is connected, route there without asking.
- If ambiguous with multiple idle nodes, ask the user to clarify.
- For meta commands (/nodes, /help), handle directly without routing.

One tool: route_to(node_id: str). No history. No multi-step loop. Single LLM call.

WebSocket endpoint (router/ws.py):

GET /ws/node
Authorization: Bearer <ROUTER_SECRET>

Validates secret → accepts registration → adds to registry
Forwards ForwardRequest → host client
Receives ForwardResponse → resolves pending asyncio.Future
Receives TaskComplete → calls send_text(chat_id, result) to Feishu
Heartbeat: ping every 30s, drop if no pong in 10s

Request correlation (router/rpc.py):

forward(node, user_id, chat_id, text) -> str (reply)
Assigns a UUID as the request id (the id field of ForwardRequest), stores Future in pending map
Sends ForwardRequest over node's WebSocket
Awaits Future with timeout (default 600s for long CC tasks)
On ForwardResponse, resolves Future with reply or raises on error

Modified files:

main.py → mounts /ws/node, starts NodeRegistry
bot/handler.py → after allowlist check, calls routing_agent.route(user_id, chat_id, text) instead of agent.run(user_id, text) directly
config.py → adds ROUTER_SECRET, ROUTER_LLM_* (can be same or different model)

New files:

router/nodes.py — NodeRegistry, NodeConnection
router/ws.py — WebSocket endpoint
router/rpc.py — forward() with future correlation
router/routing_agent.py — single-shot routing LLM

M3.4 — Standalone Mode Script

Single-machine users run python standalone.py — identical UX to today's python main.py. Internally uses the full M3 architecture with both components in one process.

standalone.py:

"""
Run router + host client in a single process (localhost mode).
Equivalent to the pre-M3 single-machine setup.
"""
import asyncio
import secrets

import uvicorn

from router.main import create_app
from host_client.main import NodeClient


async def main():
    secret = secrets.token_hex(16)
    router_url = "ws://127.0.0.1:8000/ws/node"

    # Start FastAPI router in background; keep a reference so the task
    # is not garbage-collected while it runs
    config = uvicorn.Config(create_app(router_secret=secret), host="127.0.0.1", port=8000)
    server = uvicorn.Server(config)
    server_task = asyncio.create_task(server.serve())

    # Wait until uvicorn reports startup has finished (or the server task died)
    while not server.started and not server_task.done():
        await asyncio.sleep(0.1)

    # Start host client connecting to localhost
    client = NodeClient.from_keyring(router_url=router_url, secret=secret)
    await client.run()  # reconnect loop


if __name__ == "__main__":
    asyncio.run(main())

Config: standalone.py reads the same keyring.yaml as today. The host client inherits all LLM/CC config from it. User only maintains one config file.

M3.5 — Node Health + User-Facing Status

/nodes slash command (handled at router, before forwarding):

Connected Nodes:
→ home-pc  [ACTIVE]  sessions=2  online 3h
  work-server         sessions=0  online 47m

Use "/node <name>" to switch active node.

/node <name> slash command — sets active node for user.

Router /health updates:

{
  "nodes": [
    {"node_id": "home-pc", "status": "online", "users": 2, "sessions": 3},
    {"node_id": "work-server", "status": "offline", "last_seen": "5m ago"}
  ]
}

Feishu notifications on node events (sent to all affected users):

⚠️ Node "home-pc" disconnected.
✅ Node "home-pc" reconnected.

Final Project Structure (post-M3)

PhoneWork/
├── shared/
│   └── protocol.py              # Wire protocol (shared by router + host client)
│
├── router/                      # Deployable unit 1: public VPS
│   ├── main.py                  # FastAPI app factory, mounts /ws/node
│   ├── nodes.py                 # NodeRegistry, NodeConnection
│   ├── ws.py                    # WebSocket endpoint for host clients
│   ├── rpc.py                   # forward(node, user_id, chat_id, text) → reply
│   └── routing_agent.py         # Single-shot routing LLM
│
├── bot/                         # Part of router
│   ├── handler.py               # Feishu event handler (now calls routing_agent)
│   ├── feishu.py                # Send text/file/card to Feishu
│   └── commands.py              # /nodes, /node, /help handled here; rest forwarded
│
├── host_client/                 # Deployable unit 2: dev machine
│   ├── main.py                  # WS connect to router, receive loop, reconnect
│   └── config.py                # host_config.yaml loader
│
├── orchestrator/                # Part of host client (full mailboy)
│   ├── agent.py                 # Mailboy LLM (unchanged)
│   └── tools.py                 # Tools: CC, shell, file ops, web, scheduler
│
├── agent/                       # Part of host client (local execution)
│   ├── manager.py               # Session registry
│   ├── pty_process.py           # Claude Code runner
│   ├── task_runner.py           # Background tasks
│   ├── scheduler.py             # Reminders
│   └── audit.py                 # Audit log
│
├── standalone.py                # Runs router + host client in one process
├── config.py                    # Router config (keyring.yaml)
└── requirements.txt

M3 Implementation Order

M3.1 — Shared protocol (foundation)
M3.2 — Host client daemon (wrap existing mailboy + agent stack)
M3.3 — Router (node registry, WS, routing LLM, refactor handler)
M3.4 — Standalone script
M3.5 — Node health, /nodes, /node commands

M3 Verification Checklist

python standalone.py — works identically to current python main.py
Router starts, host client connects, registration logged
Feishu message → routing LLM selects node → forwarded → reply returned
/nodes shows all connected nodes with active marker
/node work-server — switches active node, confirmed in next message
Two nodes serving same user — message routed to active node
Kill host client → router marks offline, user sees "Node home-pc is offline"
Host client reconnects → re-registered, messages flow again
Long CC task on node finishes → router forwards completion notification to Feishu
Wrong ROUTER_SECRET → connection rejected with 401
