PhoneWork — Roadmap
Milestone 2: Mailboy as a Versatile Assistant
Goal: Elevate the mailboy (GLM-4.7 orchestrator) from a mere Claude Code relay into a fully capable phone assistant. Users should be able to control their machine, manage files, search the web, get direct answers, and track long-running tasks — all without necessarily opening a Claude Code session.
M2.1 — Direct Q&A (no CC session required)
The mailboy already has an LLM; it just needs permission to answer. When no active session exists and the user asks a general question, the mailboy should reply with its own knowledge instead of asking "which project?".
Changes:
- Update the system prompt: give the mailboy explicit permission to answer questions directly
- Add a heuristic in `_run_locked`: if there's no active session and the message looks like a question (ends with `?`, contains `what`/`how`/`why`/`explain`), skip the tool loop and reply directly
- Zero new code; pure prompt + logic tweak in `orchestrator/agent.py`
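The question heuristic described above could look roughly like the sketch below. The function names `looks_like_question` and `should_answer_directly` are hypothetical, as is the choice to match the keywords as whole words (so that "show" does not trigger on "how") and to accept a full-width question mark; how this hooks into `_run_locked` is left open.

```python
import re

# Keywords from the roadmap, matched as whole words (assumption) so that
# "show me the log" is not mistaken for a "how" question.
_QUESTION_WORDS = re.compile(r"\b(what|how|why|explain)\b", re.IGNORECASE)


def looks_like_question(text: str) -> bool:
    """Return True if the message ends with '?' or contains a question keyword."""
    stripped = text.strip()
    if not stripped:
        return False
    # Assumption: also accept the full-width question mark used in Chinese input.
    if stripped.endswith(("?", "\uff1f")):
        return True
    return bool(_QUESTION_WORDS.search(stripped))


def should_answer_directly(text: str, has_active_session: bool) -> bool:
    """Skip the tool loop only when no session is active and the text looks like a question."""
    return (not has_active_session) and looks_like_question(text)
```

A caller would check `should_answer_directly(message, has_active_session)` before entering the tool loop and fall through to the existing behavior otherwise.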
M2.2 — Shell Tool
Run arbitrary shell commands on the host machine and return stdout/stderr.
Covers: check git status, tail logs, kill a process, ps aux | grep, pip list, etc.
New tool: orchestrator/tools.py → ShellTool
name: "run_shell"
args: command (str), cwd (str, optional, defaults to WORKING_DIR), timeout (int, default 30)
returns: {stdout, stderr, exit_code}
Safety guards:
- Blocklist of destructive patterns (`rm -rf /`, `format`, `mkfs`, `shutdown`, `reboot`, `dd if=`, `:(){:|:&};:`) — refuse with a clear error
- `cwd` must be under `WORKING_DIR` (reuse `_resolve_dir`) or be an explicit absolute path approved by the user (raise a confirmation request)
- Timeout hard cap: 120 s; for longer tasks see M2.4
New slash command: /shell <command> (bypasses LLM; runs directly)
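A minimal sketch of the shell tool's core, assuming plain substring matching for the blocklist and `asyncio` subprocesses for execution. The helper name `find_blocked`, the `-1` exit code for refused or timed-out commands, and the error strings are assumptions; the `cwd` check against `WORKING_DIR` is omitted here.

```python
import asyncio
from typing import Optional

# Patterns from the safety-guard list; substring matching is crude
# (e.g. "format" also blocks `git log --format=...`), so treat it as a starting point.
BLOCKED_PATTERNS = ["rm -rf /", "format", "mkfs", "shutdown", "reboot", "dd if=", ":(){:|:&};:"]
MAX_TIMEOUT = 120  # hard cap in seconds


def find_blocked(command: str) -> Optional[str]:
    """Return the first blocklist pattern found in the command, or None."""
    for pattern in BLOCKED_PATTERNS:
        if pattern in command:
            return pattern
    return None


async def run_shell(command: str, cwd: Optional[str] = None, timeout: int = 30) -> dict:
    """Run a shell command and return {stdout, stderr, exit_code}."""
    hit = find_blocked(command)
    if hit is not None:
        return {"stdout": "", "stderr": f"blocked: command matches {hit!r}", "exit_code": -1}

    timeout = min(timeout, MAX_TIMEOUT)
    proc = await asyncio.create_subprocess_shell(
        command,
        cwd=cwd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        out, err = await asyncio.wait_for(proc.communicate(), timeout=timeout)
    except asyncio.TimeoutError:
        # Kill the child so it does not outlive the request.
        proc.kill()
        await proc.wait()
        return {"stdout": "", "stderr": f"timed out after {timeout}s", "exit_code": -1}

    return {
        "stdout": out.decode(errors="replace"),
        "stderr": err.decode(errors="replace"),
        "exit_code": proc.returncode,
    }
```

The `/shell` slash command could call `run_shell` directly with the text after the command name, so the same guards apply with or without the LLM in the loop.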
M2.3 — File Operations Tool
Read files, list directories, search content, and send files back to the user in Feishu. Covers: "show me the error log", "what files are in my project?", "search for TODO comments"
New tool: orchestrator/tools.py → FileOpsTool (single tool, action dispatch)
name: "file_ops"
args: action ("read" | "list" | "search" | "send"), path (str), query (str, for search),
max_bytes (int, default 8000)
- `read`: read file, truncate to `max_bytes`, return content
- `list`: recursive directory tree (depth-limited to 3), file sizes
- `search`: grep-like ripgrep/Python search for `query` in `path`
- `send`: upload and send file via `bot/feishu.py::send_file()` (already implemented) — tool receives `chat_id` via context var (add alongside `current_user_id`)
Safety: all paths must be under WORKING_DIR
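The path restriction and the `read` action might be sketched as follows. `resolve_under` and `read_truncated` are hypothetical names, and the base directory is passed in explicitly rather than read from `WORKING_DIR`; the existing `_resolve_dir` helper may already cover part of this.

```python
from pathlib import Path


def resolve_under(base: str, path: str) -> Path:
    """Resolve `path` relative to `base` and refuse anything that escapes it."""
    base_path = Path(base).resolve()
    # Joining with an absolute path replaces the base, so the check below catches that too.
    target = (base_path / path).resolve()
    try:
        target.relative_to(base_path)
    except ValueError:
        raise PermissionError(f"{path!r} is outside the working directory")
    return target


def read_truncated(base: str, path: str, max_bytes: int = 8000) -> str:
    """Read at most `max_bytes` bytes of a file under `base`, decoded as UTF-8."""
    target = resolve_under(base, path)
    with open(target, "rb") as handle:
        data = handle.read(max_bytes)
    # Truncation may split a multi-byte character; replace it instead of failing.
    return data.decode("utf-8", errors="replace")
```

Resolving before comparing means `..` segments and symlinks are normalized first, so a path such as `../secrets.txt` is rejected even though it starts inside the base directory textually.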
M2.4 — Long-Running Task Manager
This is the key UX upgrade. `claude -p` runs and shell commands that take minutes need
fire-and-forget execution with a completion notification.
Design:
- Add `BackgroundTask` dataclass: `{task_id, description, started_at, status, conv_id_or_none}`
- `TaskRunner` singleton in `agent/task_runner.py`:
  - `submit(coro, description, notify_chat_id) -> task_id`
  - wraps coroutine in `asyncio.create_task`; on completion sends a Feishu notification via `send_text(notify_chat_id, ...)`
  - stores tasks in-memory dict `{task_id: BackgroundTask}`
- When `manager.send()` is called for a CC session:
  - if `cc_timeout > 60`: automatically run in background, return immediately with `"⏳ Task #<id> started. I'll notify you when it's done."`
  - otherwise run inline as today
New tool: run_background — explicitly submits any shell command or CC prompt as a
background task and returns task_id immediately.
New slash command: /tasks — list running/completed background tasks with status.
New tool: task_status — check status of a specific task_id, optionally get output so far.
Notification format:
✅ Task #abc123 done (42s)
/new todo_app: fix the login bug
[CC output truncated to 800 chars]...
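One way the task runner design could be sketched is shown below. The `notify` callable stands in for `send_text`, and the `result` field, the 6-character task ID, and the failure icon are assumptions added for illustration; `submit` must be called from inside a running event loop because it uses `asyncio.create_task`.

```python
import asyncio
import time
import uuid
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, Optional


@dataclass
class BackgroundTask:
    task_id: str
    description: str
    started_at: float
    status: str = "running"
    conv_id: Optional[str] = None
    result: Optional[str] = None  # assumption: keep output for a later task_status lookup


class TaskRunner:
    def __init__(self, notify: Callable[[str, str], Awaitable[None]]):
        self._notify = notify  # async (chat_id, text); stand-in for send_text
        self.tasks: Dict[str, BackgroundTask] = {}
        self._handles = set()  # strong references so tasks are not garbage-collected

    def submit(self, coro, description: str, notify_chat_id: str) -> str:
        task_id = uuid.uuid4().hex[:6]
        record = BackgroundTask(task_id, description, time.monotonic())
        self.tasks[task_id] = record
        handle = asyncio.create_task(self._run(record, coro, notify_chat_id))
        self._handles.add(handle)
        handle.add_done_callback(self._handles.discard)
        return task_id

    async def _run(self, record: BackgroundTask, coro, chat_id: str) -> None:
        try:
            record.result = str(await coro)
            record.status, icon = "done", "✅"
        except Exception as exc:  # report failures instead of losing them
            record.result = repr(exc)
            record.status, icon = "failed", "❌"
        elapsed = int(time.monotonic() - record.started_at)
        text = (
            f"{icon} Task #{record.task_id} {record.status} ({elapsed}s)\n"
            f"{record.description}\n{record.result[:800]}"
        )
        await self._notify(chat_id, text)
```

With this shape, `/tasks` and `task_status` would only need to read from `runner.tasks`, and the `cc_timeout > 60` branch in `manager.send()` would reduce to a call to `submit` followed by an immediate reply.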
M2.5 — Web Tool
Let the mailboy fetch URLs and search the web for quick answers. Covers: "What's new in the latest LangChain?", "fetch this GitHub issue", "help me search for this paper"
Backend: 秘塔AI Search MCP (https://metaso.cn/api/mcp) — mainland China accessible,
official API, Bearer token auth. Requires METASO_API_KEY in keyring.yaml.
Get key at: https://metaso.cn/search-api/api-keys
New tool: WebTool (three actions dispatched via one tool)
name: "web"
args: action ("search" | "fetch" | "ask"), query (str), url (str), scope (str), max_chars (int)
- `search`: calls `metaso_web_search` — returns top results (title + snippet + URL)
  - `scope` options: `webpage` (default), `paper`, `document`, `video`, `podcast`
- `fetch`: calls `metaso_web_reader` with `format=markdown` — extracts clean content from URL
- `ask`: calls `metaso_chat` — RAG answer combining search + generation (for quick factual Q&A)
Implementation: HTTP POST to https://metaso.cn/api/mcp with JSON-RPC body,
Authorization: Bearer <METASO_API_KEY> header. Use httpx.AsyncClient (already installed).
New config key: METASO_API_KEY in keyring.yaml and config.py (optional — WebTool
disabled gracefully if not set)
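The request construction could be sketched as below, assuming the endpoint accepts the standard MCP `tools/call` JSON-RPC method. The helper name `build_mcp_call` is hypothetical, the argument names each metaso tool expects are not specified here and are passed through unchanged, and a real MCP endpoint may also require an initialization handshake or extra headers.

```python
from itertools import count
from typing import Any, Dict, Tuple

MCP_URL = "https://metaso.cn/api/mcp"

# Action-to-tool mapping taken from the WebTool description.
ACTION_TO_TOOL = {
    "search": "metaso_web_search",
    "fetch": "metaso_web_reader",
    "ask": "metaso_chat",
}

_request_ids = count(1)  # JSON-RPC requests need a unique id per call


def build_mcp_call(
    action: str, arguments: Dict[str, Any], api_key: str
) -> Tuple[Dict[str, str], Dict[str, Any]]:
    """Build the headers and JSON-RPC body for one WebTool action."""
    if action not in ACTION_TO_TOOL:
        raise ValueError(f"unknown web action: {action!r}")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",  # assumption: standard MCP tool invocation
        "params": {"name": ACTION_TO_TOOL[action], "arguments": arguments},
    }
    return headers, body
```

The resulting pair can be handed to `httpx.AsyncClient().post(MCP_URL, headers=headers, json=body)`. Keeping the builder free of network calls makes it easy to unit-test and to skip entirely when `METASO_API_KEY` is not set.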
M2.6 — Scheduling & Reminders
Set a one-shot reminder or run a recurring check. Covers: "remind me in 30 minutes", "check if the tests pass every 5 minutes"
Design: agent/scheduler.py — thin wrapper around asyncio with:
- `schedule_once(delay_seconds, coro, description)` — fire once
- `schedule_recurring(interval_seconds, coro_factory, description, max_runs)` — repeat N times
- All scheduled jobs send a Feishu notification on completion (same as M2.4)
- Jobs stored in-memory; cleared on server restart (acceptable for now)
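A minimal sketch of the two scheduling helpers, assuming they are thin wrappers over `asyncio.sleep` and `asyncio.create_task`. The Feishu notification and the in-memory job registry are left out; the private helpers `_once` and `_recurring` are hypothetical.

```python
import asyncio
from typing import Awaitable, Callable


async def _once(delay_seconds: float, coro: Awaitable) -> None:
    await asyncio.sleep(delay_seconds)
    await coro


async def _recurring(
    interval_seconds: float, coro_factory: Callable[[], Awaitable], max_runs: int
) -> None:
    for _ in range(max_runs):
        await asyncio.sleep(interval_seconds)
        # A coroutine object can only be awaited once, hence a factory per run.
        await coro_factory()


def schedule_once(delay_seconds: float, coro: Awaitable, description: str) -> asyncio.Task:
    """Run `coro` once after `delay_seconds`; must be called inside a running loop."""
    return asyncio.create_task(_once(delay_seconds, coro), name=description)


def schedule_recurring(
    interval_seconds: float,
    coro_factory: Callable[[], Awaitable],
    description: str,
    max_runs: int,
) -> asyncio.Task:
    """Run a fresh coroutine from `coro_factory` every interval, `max_runs` times."""
    return asyncio.create_task(
        _recurring(interval_seconds, coro_factory, max_runs), name=description
    )
```

Returning the `asyncio.Task` lets the caller keep a reference for cancellation. This is also why `schedule_recurring` takes a factory rather than a coroutine: each run needs its own coroutine object.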
New tool: scheduler
args: action ("remind" | "repeat"), delay_seconds (int), interval_seconds (int),
message (str), conv_id (str, optional — if set, forward to that CC session)
New slash command: /remind <N>m|h|s <message> — set a reminder without LLM
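Parsing the `/remind` command without the LLM might look like the sketch below. The function name `parse_remind` and the choice to return `None` on malformed input are assumptions.

```python
import re
from typing import Optional, Tuple

# /remind <N>m|h|s <message>
_REMIND_RE = re.compile(r"^/remind\s+(\d+)([smh])\s+(.+)$", re.DOTALL)
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_remind(text: str) -> Optional[Tuple[int, str]]:
    """Return (delay_seconds, message), or None if the command is malformed."""
    match = _REMIND_RE.match(text.strip())
    if match is None:
        return None
    amount, unit, message = match.groups()
    return int(amount) * _UNIT_SECONDS[unit], message.strip()
```

For example, `parse_remind("/remind 10m deploy check")` returns `(600, "deploy check")`, which maps directly onto the `delay_seconds` and `message` arguments of the scheduler tool.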
Implementation Order
- M2.1 — Direct Q&A (prompt + 10-line logic change; highest ROI, zero risk)
- M2.4 — Background task runner (unblocks long CC jobs; foundational for M2.5/M2.6)
- M2.2 — Shell tool (most-used phone use case)
- M2.3 — File ops tool (`send_file` already done; rest is straightforward)
- M2.5 — Web tool (秘塔AI MCP; needs `METASO_API_KEY`)
- M2.6 — Scheduling (builds on M2.4 notification infra)
Files to Create / Modify
| File | Change |
|---|---|
| `orchestrator/agent.py` | M2.1 prompt update + question heuristic |
| `orchestrator/tools.py` | Add `ShellTool`, `FileOpsTool`, `WebTool`, `TaskStatusTool`, `SchedulerTool` |
| `agent/task_runner.py` | New — `TaskRunner` singleton, `BackgroundTask` dataclass |
| `agent/scheduler.py` | New — `schedule_once`, `schedule_recurring` |
| `bot/commands.py` | Add `/shell`, `/tasks`, `/remind` commands |
| `bot/feishu.py` | Add `chat_id` context var for file send from tool |
| `bot/handler.py` | Pass `chat_id` into context var alongside `user_id` |
| `requirements.txt` | Add `httpx` (if not already present as transitive dep) |
Verification Checklist
- M2.1: Ask "what is a Python generator?" — mailboy replies directly, no tool call
- M2.2: Send "check git status in todo_app" — `ShellTool` runs, output returned
- M2.2: Send "rm -rf /" — blocked by safety guard
- M2.3: Send "show me the last 50 lines of audit/abc123.jsonl" — file content returned
- M2.3: Send "send me the sessions.json file" — file arrives in Feishu chat
- M2.4: Start a long CC task (e.g. `--timeout 120`) — bot replies immediately, notifies on finish
- M2.4: `/tasks` — lists running task with elapsed time
- M2.5: "What are the new features in Python 3.13?" — `web ask` returns RAG answer from metaso
- M2.5: "Help me read this URL: https://example.com" — page content extracted as markdown
- M2.6: `/remind 10m deploy check` — 10 min later, message arrives in Feishu