Yuyao Huang (Sam) 6a0d409dd6 docs: add 秘塔AI Search MCP service documentation and update README

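Add metaso.md, a detailed document for the 秘塔AI Search MCP service, covering the API description and configuration guide
Update the command and feature descriptions in the README
Add ROADMAP.md to record future development plans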

2026-03-28 12:54:15 +08:00


PhoneWork — Roadmap

Milestone 2: Mailboy as a Versatile Assistant

Goal: Elevate the mailboy (GLM-4.7 orchestrator) from a mere Claude Code relay into a fully capable phone assistant. Users should be able to control their machine, manage files, search the web, get direct answers, and track long-running tasks — all without necessarily opening a Claude Code session.

M2.1 — Direct Q&A (no CC session required)

The mailboy already has an LLM; it just needs permission to answer. When no active session exists and the user asks a general question, the mailboy should reply with its own knowledge instead of asking "which project?".

Changes:

Update the system prompt: give the mailboy explicit permission to answer questions directly
Add a heuristic in _run_locked: if there's no active session and the message looks like a question (ends with ?, contains what/how/why/explain), skip tool loop and reply directly
No new modules; a prompt update plus a small logic tweak in orchestrator/agent.py
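The heuristic described above could look roughly like the following. This is a minimal sketch: the function names `looks_like_question` and `should_answer_directly` are hypothetical, and accepting the full-width `？` is an added assumption for Chinese input rather than something the roadmap specifies.

```python
import re

# Cue words taken from the heuristic in the roadmap (what/how/why/explain).
_QUESTION_WORDS = re.compile(r"\b(what|how|why|explain)\b", re.IGNORECASE)


def looks_like_question(message: str) -> bool:
    """Return True if the message ends with a question mark or contains a cue word."""
    text = message.strip()
    if not text:
        return False
    # Assumption: also accept the full-width question mark used in Chinese text.
    if text.endswith(("?", "？")):
        return True
    return bool(_QUESTION_WORDS.search(text))


def should_answer_directly(message: str, has_active_session: bool) -> bool:
    """Skip the tool loop only when no session is active and the text reads as a question."""
    return not has_active_session and looks_like_question(message)
```

A keyword check like this will misfire on some messages (for example, an instruction that happens to contain "how"), so the system prompt remains the main control and the heuristic only acts as a shortcut.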

M2.2 — Shell Tool

Run arbitrary shell commands on the host machine and return stdout/stderr. Covers: check git status, tail logs, kill a process, ps aux | grep, pip list, etc.

New tool: orchestrator/tools.py → ShellTool

name: "run_shell"
args: command (str), cwd (str, optional, defaults to WORKING_DIR), timeout (int, default 30)
returns: {stdout, stderr, exit_code}

Safety guards:

Blocklist of destructive patterns (rm -rf /, format, mkfs, shutdown, reboot, dd if=, :(){:|:&};:) — refuse with a clear error
cwd must be under WORKING_DIR (reuse _resolve_dir) or be an explicit absolute path approved by the user (raise a confirmation request)
Timeout hard cap: 120 s; for longer tasks see M2.4
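A rough sketch of how the blocklist and timeout cap might fit together, assuming an asyncio-based tool. The regex patterns are illustrative only and the `is_blocked` helper is hypothetical. The bare word `format` from the list above is deliberately left out here because it would also match harmless commands such as `git log --format=...`, and the `cwd` confinement check is omitted for brevity.

```python
import asyncio
import re
from typing import Optional

MAX_TIMEOUT = 120  # hard cap in seconds, from the safety guards above

# Illustrative patterns only; a production blocklist needs careful review.
_BLOCKED = [
    re.compile(pattern)
    for pattern in (
        r"\brm\s+-rf\s+/(\s|$)",  # rm -rf / (but not rm -rf /tmp/x)
        r"\bmkfs\b",
        r"\bshutdown\b",
        r"\breboot\b",
        r"\bdd\s+if=",
        r":\(\)\s*\{",            # start of the classic fork bomb
    )
]


def is_blocked(command: str) -> bool:
    """Return True if the command matches any destructive pattern."""
    return any(pattern.search(command) for pattern in _BLOCKED)


async def run_shell(command: str, cwd: Optional[str] = None, timeout: int = 30) -> dict:
    """Run a shell command and return {stdout, stderr, exit_code}."""
    if is_blocked(command):
        return {"stdout": "", "stderr": "refused: command matches blocklist", "exit_code": -1}
    timeout = min(timeout, MAX_TIMEOUT)
    proc = await asyncio.create_subprocess_shell(
        command,
        cwd=cwd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        out, err = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()
        return {"stdout": "", "stderr": f"timed out after {timeout}s", "exit_code": -1}
    return {
        "stdout": out.decode(errors="replace"),
        "stderr": err.decode(errors="replace"),
        "exit_code": proc.returncode,
    }
```

Pattern matching of this kind is easy to bypass (aliases, quoting, scripts), so it is best read as a guard against accidents rather than a security boundary.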

New slash command: /shell <command> (bypasses LLM; runs directly)

M2.3 — File Operations Tool

Read files, list directories, search content, and send files back to the user in Feishu. Covers: "show me the error log", "what files are in my project?", "search for TODO comments"

New tool: orchestrator/tools.py → FileOpsTool (single tool, action dispatch)

name: "file_ops"
args: action ("read" | "list" | "search" | "send"), path (str), query (str, for search),
      max_bytes (int, default 8000)

read: read file, truncate to max_bytes, return content
list: recursive directory tree (depth-limited to 3), file sizes
search: grep-like search for query under path (via ripgrep or plain Python)
send: upload and send file via bot/feishu.py::send_file() (already implemented) — tool receives chat_id via context var (add alongside current_user_id)

Safety: all paths must be under WORKING_DIR
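One way to enforce the path rule together with the `max_bytes` truncation for `read` is sketched below. The helper names `resolve_under` and `read_truncated` are hypothetical; the existing `_resolve_dir` mentioned in M2.2 may already cover the first part.

```python
from pathlib import Path


def resolve_under(root: Path, user_path: str) -> Path:
    """Resolve user_path relative to root and refuse anything that escapes it."""
    root = root.resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"{user_path!r} is outside {root}")
    return candidate


def read_truncated(root: Path, user_path: str, max_bytes: int = 8000) -> str:
    """Read at most max_bytes from a file confined to root."""
    path = resolve_under(root, user_path)
    with path.open("rb") as fh:
        data = fh.read(max_bytes)
    # Truncation can split a multi-byte character, so decode leniently.
    return data.decode("utf-8", errors="replace")
```

Because joining an absolute path onto `root` discards `root`, an input such as `/etc/passwd` is rejected by the same check as `../outside.txt`.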

M2.4 — Long-Running Task Manager

This is the key UX upgrade. Invocations of claude -p and shell commands that take minutes need fire-and-forget execution with a notification on completion.

Design:

Add BackgroundTask dataclass: {task_id, description, started_at, status, conv_id_or_none}
TaskRunner singleton in agent/task_runner.py:
- submit(coro, description, notify_chat_id) -> task_id
- wraps coroutine in asyncio.create_task; on completion sends a Feishu notification via send_text(notify_chat_id, ...)
- stores tasks in-memory dict {task_id: BackgroundTask}
When manager.send() is called for a CC session:
- if cc_timeout > 60: automatically run in background, return immediately with "⏳ Task #<id> started. I'll notify you when it's done."
- otherwise run inline as today
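The design above might be sketched as follows. This is an illustration under assumptions, not the planned implementation: the `Notifier` type stands in for `send_text(notify_chat_id, ...)`, the `result` field and the `failed` status are additions for the example, and the task ID format is arbitrary.

```python
import asyncio
import time
import uuid
from dataclasses import dataclass
from typing import Awaitable, Callable, Coroutine, Dict, Optional

# Hypothetical stand-in for send_text(chat_id, text).
Notifier = Callable[[str, str], Awaitable[None]]


@dataclass
class BackgroundTask:
    task_id: str
    description: str
    started_at: float                 # time.monotonic() value, used for elapsed time
    status: str = "running"           # running | done | failed (assumed values)
    conv_id: Optional[str] = None
    result: Optional[str] = None      # assumption: keep the output for task_status


class TaskRunner:
    def __init__(self, notify: Notifier) -> None:
        self._notify = notify
        self.tasks: Dict[str, BackgroundTask] = {}
        # Hold references so running asyncio tasks are not garbage-collected.
        self._handles: Dict[str, asyncio.Task] = {}

    def submit(self, coro: Coroutine, description: str, notify_chat_id: str) -> str:
        """Start coro in the background and return its task_id immediately."""
        task_id = uuid.uuid4().hex[:6]
        self.tasks[task_id] = BackgroundTask(task_id, description, time.monotonic())
        self._handles[task_id] = asyncio.create_task(
            self._run(task_id, coro, notify_chat_id)
        )
        return task_id

    async def _run(self, task_id: str, coro: Coroutine, chat_id: str) -> None:
        task = self.tasks[task_id]
        try:
            task.result = str(await coro)
            task.status = "done"
        except Exception as exc:
            task.result = repr(exc)
            task.status = "failed"
        elapsed = int(time.monotonic() - task.started_at)
        icon = "✅" if task.status == "done" else "❌"
        await self._notify(
            chat_id, f"{icon} Task #{task_id} {task.status} ({elapsed}s)\n{task.description}"
        )
        self._handles.pop(task_id, None)
```

`submit` must be called from inside a running event loop, since `asyncio.create_task` requires one.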

New tool: run_background — explicitly submits any shell command or CC prompt as a background task and returns task_id immediately.

New slash command: /tasks — list running/completed background tasks with status.

New tool: task_status — check status of a specific task_id, optionally get output so far.

Notification format:

✅ Task #abc123 done (42s)
/new todo_app: fix the login bug

[CC output truncated to 800 chars]...
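A small formatter that would produce messages in this shape is sketched below. The function name `format_notification` and its parameters are hypothetical; the 800-character limit is taken from the example above.

```python
def format_notification(task_id: str, elapsed_s: float, description: str,
                        output: str, max_chars: int = 800) -> str:
    """Build the completion message: header, description, blank line, truncated output."""
    header = f"✅ Task #{task_id} done ({int(elapsed_s)}s)"
    # Append "..." only when the output was actually cut.
    body = output if len(output) <= max_chars else output[:max_chars] + "..."
    return f"{header}\n{description}\n\n{body}"
```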

M2.5 — Web Tool

Let the mailboy fetch URLs and search the web for quick answers. Covers: "What's new in the latest LangChain?", "fetch this GitHub issue", "help me search for this paper"

Backend: 秘塔AI Search MCP (https://metaso.cn/api/mcp) — mainland China accessible, official API, Bearer token auth. Requires METASO_API_KEY in keyring.yaml. Get key at: https://metaso.cn/search-api/api-keys

New tool: WebTool (three actions dispatched via one tool)

name: "web"
args: action ("search" | "fetch" | "ask"), query (str), url (str), scope (str), max_chars (int)

search: calls metaso_web_search — returns top results (title + snippet + URL)
- scope options: webpage (default), paper, document, video, podcast
fetch: calls metaso_web_reader with format=markdown — extracts clean content from URL
ask: calls metaso_chat — RAG answer combining search + generation (for quick factual Q&A)

Implementation: HTTP POST to https://metaso.cn/api/mcp with JSON-RPC body, Authorization: Bearer <METASO_API_KEY> header. Use httpx.AsyncClient (already installed).
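The request described above could be assembled roughly as follows. The `tools/call` method and the `params` shape follow the Model Context Protocol's JSON-RPC convention, and the tool names come from the action list. Everything else is an assumption: the helper names are hypothetical, the argument keys each tool expects are not given here and must be taken from metaso.md, and the sketch assumes the server answers with a plain JSON body without requiring an initialization handshake first.

```python
import itertools
from typing import Any, Dict, Tuple

MCP_URL = "https://metaso.cn/api/mcp"

# Action-to-tool mapping taken from the WebTool action list.
ACTION_TOOLS = {
    "search": "metaso_web_search",
    "fetch": "metaso_web_reader",
    "ask": "metaso_chat",
}

_request_ids = itertools.count(1)  # JSON-RPC requests need distinct ids


def build_tool_call(action: str, arguments: Dict[str, Any],
                    api_key: str) -> Tuple[Dict[str, str], Dict[str, Any]]:
    """Return (headers, JSON-RPC body) for one MCP tools/call request."""
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": ACTION_TOOLS[action], "arguments": arguments},
    }
    return headers, body


async def call_metaso(action: str, arguments: Dict[str, Any], api_key: str,
                      timeout: float = 30.0) -> Dict[str, Any]:
    """POST the tool call and return the decoded JSON-RPC response."""
    import httpx  # imported lazily so build_tool_call works without httpx installed

    headers, body = build_tool_call(action, arguments, api_key)
    async with httpx.AsyncClient(timeout=timeout) as client:
        resp = await client.post(MCP_URL, headers=headers, json=body)
        resp.raise_for_status()
        return resp.json()
```

Keeping the payload builder separate from the network call makes it possible to unit-test the request shape without an API key.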

New config key: METASO_API_KEY in keyring.yaml and config.py (optional — WebTool disabled gracefully if not set)

M2.6 — Scheduling & Reminders

Set a one-shot reminder or run a recurring check. Covers: "remind me in 30 minutes", "check if the tests pass every 5 minutes"

Design: agent/scheduler.py — thin wrapper around asyncio with:

schedule_once(delay_seconds, coro, description) — fire once
schedule_recurring(interval_seconds, coro_factory, description, max_runs) — repeat N times
All scheduled jobs send a Feishu notification on completion (same as M2.4)
Jobs stored in-memory; cleared on server restart (acceptable for now)
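A thin asyncio wrapper along these lines might look like the sketch below. The extra `notify` parameter is an assumption added so the example is self-contained; in the real module it would presumably be the same Feishu notification path as M2.4.

```python
import asyncio
from typing import Awaitable, Callable, Coroutine

# Hypothetical stand-in for the Feishu notification call.
Notifier = Callable[[str], Awaitable[None]]


def schedule_once(delay_seconds: float, coro: Coroutine, description: str,
                  notify: Notifier) -> asyncio.Task:
    """Run coro once after delay_seconds, then send a notification."""
    async def runner() -> None:
        await asyncio.sleep(delay_seconds)
        await coro
        await notify(f"⏰ {description}")

    return asyncio.create_task(runner())


def schedule_recurring(interval_seconds: float,
                       coro_factory: Callable[[], Coroutine],
                       description: str, max_runs: int,
                       notify: Notifier) -> asyncio.Task:
    """Run a fresh coroutine every interval_seconds, max_runs times in total."""
    async def runner() -> None:
        for run in range(1, max_runs + 1):
            await asyncio.sleep(interval_seconds)
            # A coroutine object can only be awaited once, hence the factory.
            await coro_factory()
            await notify(f"🔁 {description} ({run}/{max_runs})")

    return asyncio.create_task(runner())
```

Callers should keep the returned `asyncio.Task` (for example in the in-memory job store), both to allow cancellation and because the event loop only holds weak references to tasks.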

New tool: scheduler

args: action ("remind" | "repeat"), delay_seconds (int), interval_seconds (int),
      message (str), conv_id (str, optional — if set, forward to that CC session)

New slash command: /remind <N>m|h|s <message> — set a reminder without LLM
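Parsing the `/remind` syntax without the LLM could be done with a single regular expression, as in this sketch. The function name `parse_remind` is hypothetical.

```python
import re
from typing import Tuple

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}
# /remind <N>m|h|s <message>
_REMIND_RE = re.compile(r"^/remind\s+(\d+)([smh])\s+(.+)$", re.DOTALL)


def parse_remind(text: str) -> Tuple[int, str]:
    """Return (delay_seconds, message) or raise ValueError on bad syntax."""
    match = _REMIND_RE.match(text.strip())
    if match is None:
        raise ValueError("usage: /remind <N>m|h|s <message>")
    amount, unit, message = match.groups()
    return int(amount) * _UNIT_SECONDS[unit], message.strip()
```

For example, `parse_remind("/remind 10m deploy check")` returns `(600, "deploy check")`, which maps directly onto the `delay_seconds` and `message` arguments of the scheduler tool.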

Implementation Order

M2.1 — Direct Q&A (prompt + 10-line logic change; highest ROI, zero risk)
M2.4 — Background task runner (unblocks long CC jobs; foundational for M2.5/M2.6)
M2.2 — Shell tool (most-used phone use case)
M2.3 — File ops tool (send_file already done; rest is straightforward)
M2.5 — Web tool (秘塔AI MCP; needs METASO_API_KEY)
M2.6 — Scheduling (builds on M2.4 notification infra)

Files to Create / Modify

File	Change
`orchestrator/agent.py`	M2.1 prompt update + question heuristic
`orchestrator/tools.py`	Add `ShellTool`, `FileOpsTool`, `WebTool`, `TaskStatusTool`, `SchedulerTool`
`agent/task_runner.py`	New — `TaskRunner` singleton, `BackgroundTask` dataclass
`agent/scheduler.py`	New — `schedule_once`, `schedule_recurring`
`bot/commands.py`	Add `/shell`, `/tasks`, `/remind` commands
`bot/feishu.py`	Add `chat_id` context var for file send from tool
`bot/handler.py`	Pass `chat_id` into context var alongside `user_id`
`requirements.txt`	List `httpx` explicitly (already installed, but possibly only as a transitive dep)

Verification Checklist

M2.1: Ask "what is a Python generator?" — mailboy replies directly, no tool call
M2.2: Send "check git status in todo_app" — ShellTool runs, output returned
M2.2: Send "rm -rf /" — blocked by safety guard
M2.3: Send "show me the last 50 lines of audit/abc123.jsonl" — file content returned
M2.3: Send "send me the sessions.json file" — file arrives in Feishu chat
M2.4: Start a long CC task (e.g. --timeout 120) — bot replies immediately, notifies on finish
M2.4: /tasks — lists running task with elapsed time
M2.5: "What are the new features in Python 3.13?" — web ask returns RAG answer from metaso
M2.5: "Read this URL for me: https://example.com" — page content extracted as markdown
M2.6: /remind 10m deploy check — 10 min later, message arrives in Feishu
