PhoneWork/ROADMAP.md
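Yuyao Huang (Sam) 6a0d409dd6 docs: add 秘塔AI Search MCP service documentation and update README
Add detailed documentation for the 秘塔AI Search MCP service in metaso.md, including API description and configuration guide
Update the command and feature descriptions in the README
Add ROADMAP.md to record future development plans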
2026-03-28 12:54:15 +08:00

PhoneWork — Roadmap

Milestone 2: Mailboy as a Versatile Assistant

Goal: Elevate the mailboy (GLM-4.7 orchestrator) from a mere Claude Code relay into a fully capable phone assistant. Users should be able to control their machine, manage files, search the web, get direct answers, and track long-running tasks — all without necessarily opening a Claude Code session.

M2.1 — Direct Q&A (no CC session required)

The mailboy already has an LLM; it just needs permission to answer. When no active session exists and the user asks a general question, the mailboy should reply with its own knowledge instead of asking "which project?".

Changes:

  • Update the system prompt: give the mailboy explicit permission to answer questions directly
  • Add a heuristic in _run_locked: if there is no active session and the message looks like a question (ends with ?, or contains what/how/why/explain), skip the tool loop and reply directly
  • No new files or dependencies; just a prompt update plus a small logic change in orchestrator/agent.py
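
The question heuristic could look roughly like the sketch below. The function names `looks_like_question` and `should_answer_directly` are hypothetical, and how `_run_locked` learns whether a session is active is not specified in this roadmap, so it is passed in as a plain boolean here.

```python
import re

# Keywords taken from the roadmap bullet; extend as needed.
QUESTION_WORDS = ("what", "how", "why", "explain")
_WORD_RE = re.compile(r"[a-z]+")


def looks_like_question(text: str) -> bool:
    """Return True if the message ends with a question mark or contains a question keyword."""
    stripped = text.strip()
    if not stripped:
        return False
    # Accept both the ASCII and the full-width question mark.
    if stripped.endswith(("?", "\uff1f")):
        return True
    # Match whole words so that "whatever" does not count as "what".
    words = set(_WORD_RE.findall(stripped.lower()))
    return any(word in words for word in QUESTION_WORDS)


def should_answer_directly(text: str, has_active_session: bool) -> bool:
    """Skip the tool loop only when no session is active and the text looks like a question."""
    return not has_active_session and looks_like_question(text)
```

Whole-word matching is a deliberate choice here: a substring check would misfire on messages such as "show whatever is in the log".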

M2.2 — Shell Tool

Run arbitrary shell commands on the host machine and return stdout/stderr. Covers: check git status, tail logs, kill a process, ps aux | grep, pip list, etc.

New tool: orchestrator/tools.py::ShellTool

name: "run_shell"
args: command (str), cwd (str, optional, defaults to WORKING_DIR), timeout (int, default 30)
returns: {stdout, stderr, exit_code}

Safety guards:

  • Blocklist of destructive patterns (rm -rf /, format, mkfs, shutdown, reboot, dd if=, :(){:|:&};:) — refuse with a clear error
  • cwd must be under WORKING_DIR (reuse _resolve_dir) or be an explicit absolute path approved by the user (raise a confirmation request)
  • Timeout hard cap: 120 s; for longer tasks see M2.4
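
A minimal sketch of the blocklist and timeout cap, assuming plain asyncio subprocesses. The regular expressions are illustrative and far from exhaustive, the names `is_blocked` and `run_shell` (as a free function) are hypothetical, and the cwd check against WORKING_DIR is left out for brevity.

```python
import asyncio
import re
from typing import Optional

MAX_TIMEOUT = 120  # hard cap from the roadmap, in seconds

# Illustrative patterns only; a production blocklist needs more care.
BLOCKED_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/(\s|$)"),  # rm -rf / but not rm -rf /tmp/x
    re.compile(r"(?:^|[;&|])\s*(?:sudo\s+)?(?:mkfs|shutdown|reboot|format)\b"),
    re.compile(r"\bdd\s+if="),
    re.compile(r":\(\)\s*\{.*\};\s*:"),  # classic fork bomb
]


def is_blocked(command: str) -> bool:
    """Return True if the command matches any destructive pattern."""
    return any(p.search(command) for p in BLOCKED_PATTERNS)


async def run_shell(command: str, cwd: Optional[str] = None, timeout: int = 30) -> dict:
    """Run a shell command and return stdout, stderr, and exit_code."""
    if is_blocked(command):
        return {"stdout": "", "stderr": "refused: command matches blocklist", "exit_code": -1}
    timeout = min(timeout, MAX_TIMEOUT)  # enforce the hard cap
    proc = await asyncio.create_subprocess_shell(
        command,
        cwd=cwd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        out, err = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()
        return {"stdout": "", "stderr": f"timed out after {timeout}s", "exit_code": -1}
    return {
        "stdout": out.decode(errors="replace"),
        "stderr": err.decode(errors="replace"),
        "exit_code": proc.returncode,
    }
```

Pattern matching of this kind is easy to bypass (quoting, variable expansion, aliases), so it is best treated as a guard against accidents rather than a security boundary.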

New slash command: /shell <command> (bypasses LLM; runs directly)


M2.3 — File Operations Tool

Read files, list directories, search content, and send files back to the user in Feishu. Covers: "show me the error log", "what files are in my project?", "search for TODO comments"

New tool: orchestrator/tools.py::FileOpsTool (single tool, action dispatch)

name: "file_ops"
args: action ("read" | "list" | "search" | "send"), path (str), query (str, for search),
      max_bytes (int, default 8000)
  • read: read file, truncate to max_bytes, return content
  • list: recursive directory tree (depth-limited to 3), file sizes
  • search: grep-like search for query under path (ripgrep or a pure-Python scan)
  • send: upload and send file via bot/feishu.py::send_file() (already implemented) — tool receives chat_id via context var (add alongside current_user_id)

Safety: all paths must be under WORKING_DIR
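
The path check and the truncating read might be sketched as follows. The helper names `resolve_under`, `is_allowed`, and `read_file` are hypothetical (the existing `_resolve_dir` may already cover part of this), and the root directory is passed in explicitly rather than read from config.

```python
from pathlib import Path


def resolve_under(root: Path, user_path: str) -> Path:
    """Resolve user_path against root and reject anything that escapes it."""
    root = root.resolve()
    # Joining with an absolute user_path discards root, so "/etc/passwd" is caught below too.
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"{user_path!r} is outside {root}")
    return candidate


def is_allowed(root: Path, user_path: str) -> bool:
    """Boolean convenience wrapper around resolve_under."""
    try:
        resolve_under(root, user_path)
    except PermissionError:
        return False
    return True


def read_file(root: Path, user_path: str, max_bytes: int = 8000) -> str:
    """Read at most max_bytes from a file under root, marking truncation."""
    path = resolve_under(root, user_path)
    with path.open("rb") as fh:
        data = fh.read(max_bytes + 1)  # one extra byte reveals whether more remains
    truncated = len(data) > max_bytes
    text = data[:max_bytes].decode("utf-8", errors="replace")
    return text + ("\n[truncated]" if truncated else "")
```

Resolving before comparing means `..` segments and symlinks are normalised first, so `../secrets` and a symlink pointing outside the root are both rejected.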


M2.4 — Long-Running Task Manager

This is the key UX upgrade. claude -p runs and shell commands that take minutes need fire-and-forget execution with a completion notification.

Design:

  • Add BackgroundTask dataclass: {task_id, description, started_at, status, conv_id_or_none}
  • TaskRunner singleton in agent/task_runner.py:
    • submit(coro, description, notify_chat_id) -> task_id
    • wraps coroutine in asyncio.create_task; on completion sends a Feishu notification via send_text(notify_chat_id, ...)
    • stores tasks in-memory dict {task_id: BackgroundTask}
  • When manager.send() is called for a CC session:
    • if cc_timeout > 60: automatically run in background, return immediately with "⏳ Task #<id> started. I'll notify you when it's done."
    • otherwise run inline as today
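
One possible shape for the runner, as a sketch under assumptions: the `notify` callable stands in for `send_text(chat_id, text)`, and the `result` field, the short hex task IDs, and the failure icon are illustrative additions that the design above does not prescribe.

```python
import asyncio
import time
import uuid
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, Optional


@dataclass
class BackgroundTask:
    task_id: str
    description: str
    started_at: float = field(default_factory=time.time)
    status: str = "running"  # "running" | "done" | "failed"
    conv_id: Optional[str] = None
    result: Optional[str] = None  # assumption: keep the output for task_status


class TaskRunner:
    def __init__(self, notify: Callable[[str, str], Awaitable[None]]):
        self._notify = notify  # stand-in for send_text(chat_id, text)
        self.tasks: Dict[str, BackgroundTask] = {}
        self._handles = set()  # strong references so pending tasks are not garbage-collected

    def submit(self, coro, description: str, notify_chat_id: str) -> str:
        """Schedule coro in the running event loop and return its task_id immediately."""
        task_id = uuid.uuid4().hex[:6]
        self.tasks[task_id] = BackgroundTask(task_id, description)
        handle = asyncio.create_task(self._run(task_id, coro, notify_chat_id))
        self._handles.add(handle)
        handle.add_done_callback(self._handles.discard)
        return task_id

    async def _run(self, task_id: str, coro, chat_id: str) -> None:
        task = self.tasks[task_id]
        try:
            task.result = str(await coro)
            task.status = "done"
        except Exception as exc:  # report failures instead of losing them
            task.result = repr(exc)
            task.status = "failed"
        elapsed = int(time.time() - task.started_at)
        icon = "✅" if task.status == "done" else "❌"
        body = task.result[:800]  # truncate long output for the chat message
        await self._notify(
            chat_id, f"{icon} Task #{task_id} {task.status} ({elapsed}s)\n{task.description}\n\n{body}"
        )
```

`submit` has to be called from inside a running event loop, because `asyncio.create_task` needs one; that matches a bot handler that is already async.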

New tool: run_background — explicitly submits any shell command or CC prompt as a background task and returns task_id immediately.

New slash command: /tasks — list running/completed background tasks with status.

New tool: task_status — check status of a specific task_id, optionally get output so far.

Notification format:

✅ Task #abc123 done (42s)
/new todo_app: fix the login bug

[CC output truncated to 800 chars]...

M2.5 — Web Tool

Let the mailboy fetch URLs and search the web for quick answers. Covers: "What has changed in the latest LangChain?", "fetch this GitHub issue", "help me find this paper"

Backend: 秘塔AI Search MCP (https://metaso.cn/api/mcp) — mainland China accessible, official API, Bearer token auth. Requires METASO_API_KEY in keyring.yaml. Get key at: https://metaso.cn/search-api/api-keys

New tool: WebTool (three actions dispatched via one tool)

name: "web"
args: action ("search" | "fetch" | "ask"), query (str), url (str), scope (str), max_chars (int)
  • search: calls metaso_web_search — returns top results (title + snippet + URL)
    • scope options: webpage (default), paper, document, video, podcast
  • fetch: calls metaso_web_reader with format=markdown — extracts clean content from URL
  • ask: calls metaso_chat — RAG answer combining search + generation (for quick factual Q&A)

Implementation: HTTP POST to https://metaso.cn/api/mcp with JSON-RPC body, Authorization: Bearer <METASO_API_KEY> header. Use httpx.AsyncClient (already installed).
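
As a rough illustration, the JSON-RPC envelope for an MCP `tools/call` request can be built as below. The argument keys passed to each metaso tool (for example `query` and `scope`) are assumptions that mirror the WebTool args and should be checked against metaso.md; `build_tool_call`, `build_headers`, and `call_tool` are hypothetical helper names.

```python
import itertools
from typing import Any, Dict

MCP_URL = "https://metaso.cn/api/mcp"
_request_ids = itertools.count(1)


def build_tool_call(tool: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 body for an MCP tools/call request."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }


def build_headers(api_key: str) -> Dict[str, str]:
    """Bearer auth plus content negotiation headers."""
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        # Assumption: the endpoint follows MCP's streamable HTTP transport,
        # which expects clients to accept both JSON and event streams.
        "Accept": "application/json, text/event-stream",
    }


async def call_tool(api_key: str, tool: str, arguments: Dict[str, Any], timeout: float = 30.0) -> Any:
    """POST the request; assumes the server answers with a plain JSON body."""
    import httpx  # third-party; imported lazily so the builders above stay stdlib-only

    async with httpx.AsyncClient(timeout=timeout) as client:
        resp = await client.post(
            MCP_URL, headers=build_headers(api_key), json=build_tool_call(tool, arguments)
        )
        resp.raise_for_status()
        return resp.json()
```

If the server replies with an event stream rather than a JSON body, `resp.json()` will fail and the response would need to be parsed as server-sent events instead.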

New config key: METASO_API_KEY in keyring.yaml and config.py (optional — WebTool disabled gracefully if not set)


M2.6 — Scheduling & Reminders

Set a one-shot reminder or run a recurring check. Covers: "remind me in 30 minutes", "check if the tests pass every 5 minutes"

Design: agent/scheduler.py — thin wrapper around asyncio with:

  • schedule_once(delay_seconds, coro, description) — fire once
  • schedule_recurring(interval_seconds, coro_factory, description, max_runs) — repeat N times
  • All scheduled jobs send a Feishu notification on completion (same as M2.4)
  • Jobs stored in-memory; cleared on server restart (acceptable for now)
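
The two scheduling primitives could be sketched on top of `asyncio.sleep` as follows. The extra `notify` parameter is an assumption added so the example is self-contained; in the design above the notification would presumably go through the same path as M2.4.

```python
import asyncio
from typing import Awaitable, Callable

Notify = Callable[[str], Awaitable[None]]


def schedule_once(delay_seconds: float, coro, description: str, notify: Notify) -> asyncio.Task:
    """Run coro once after delay_seconds, then send a notification."""
    async def runner() -> None:
        await asyncio.sleep(delay_seconds)
        await coro
        await notify(f"⏰ {description}")

    return asyncio.create_task(runner())


def schedule_recurring(
    interval_seconds: float,
    coro_factory: Callable[[], Awaitable[None]],
    description: str,
    max_runs: int,
    notify: Notify,
) -> asyncio.Task:
    """Run a fresh coroutine from coro_factory every interval_seconds, max_runs times."""
    async def runner() -> None:
        for run in range(1, max_runs + 1):
            await asyncio.sleep(interval_seconds)
            await coro_factory()  # a coroutine object can only be awaited once, hence the factory
            await notify(f"🔁 {description} (run {run}/{max_runs})")

    return asyncio.create_task(runner())
```

Both functions return the `asyncio.Task`, so the caller can keep a reference (which also prevents garbage collection) and cancel the job later.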

New tool: scheduler

args: action ("remind" | "repeat"), delay_seconds (int), interval_seconds (int),
      message (str), conv_id (str, optional — if set, forward to that CC session)

New slash command: /remind <N>m|h|s <message> — set a reminder without LLM
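
Parsing the `/remind` syntax without the LLM might look like this; `parse_remind` is a hypothetical helper, and returning `None` on malformed input is an illustrative choice.

```python
import re
from typing import Optional, Tuple

_REMIND_RE = re.compile(r"^/remind\s+(\d+)([smh])\s+(.+)$", re.DOTALL)
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_remind(text: str) -> Optional[Tuple[int, str]]:
    """Parse '/remind <N>m|h|s <message>' into (delay_seconds, message), or None if malformed."""
    match = _REMIND_RE.match(text.strip())
    if match is None:
        return None
    amount, unit, message = match.groups()
    return int(amount) * _UNIT_SECONDS[unit], message.strip()
```

For example, `parse_remind("/remind 10m deploy check")` yields `(600, "deploy check")`, which maps directly onto the `delay_seconds` and `message` arguments of the scheduler tool.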


Implementation Order

  1. M2.1 — Direct Q&A (prompt + 10-line logic change; highest ROI, zero risk)
  2. M2.4 — Background task runner (unblocks long CC jobs; foundational for long shell tasks in M2.2 and for M2.6)
  3. M2.2 — Shell tool (most-used phone use case)
  4. M2.3 — File ops tool (send_file already done; rest is straightforward)
  5. M2.5 — Web tool (秘塔AI MCP; needs METASO_API_KEY)
  6. M2.6 — Scheduling (builds on M2.4 notification infra)

Files to Create / Modify

| File | Change |
|------|--------|
| orchestrator/agent.py | M2.1 prompt update + question heuristic |
| orchestrator/tools.py | Add ShellTool, FileOpsTool, WebTool, TaskStatusTool, SchedulerTool |
| agent/task_runner.py | New — TaskRunner singleton, BackgroundTask dataclass |
| agent/scheduler.py | New — schedule_once, schedule_recurring |
| bot/commands.py | Add /shell, /tasks, /remind commands |
| bot/feishu.py | Add chat_id context var for file send from tool |
| bot/handler.py | Pass chat_id into context var alongside user_id |
| requirements.txt | Add httpx (if not already present as transitive dep) |

Verification Checklist

  • M2.1: Ask "what is a Python generator?" — mailboy replies directly, no tool call
  • M2.2: Send "check git status in todo_app" — ShellTool runs, output returned
  • M2.2: Send "rm -rf /" — blocked by safety guard
  • M2.3: Send "show me the last 50 lines of audit/abc123.jsonl" — file content returned
  • M2.3: Send "send me the sessions.json file" — file arrives in Feishu chat
  • M2.4: Start a long CC task (e.g. --timeout 120) — bot replies immediately, notifies on finish
  • M2.4: /tasks — lists running task with elapsed time
  • M2.5: "What's new in Python 3.13?" — web ask returns RAG answer from metaso
  • M2.5: "Read this URL for me: https://example.com" — page content extracted as markdown
  • M2.6: /remind 10m deploy check — 10 min later, message arrives in Feishu