# PhoneWork — Roadmap

## Milestone 2: Mailboy as a Versatile Assistant

**Goal:** Elevate the mailboy (GLM-4.7 orchestrator) from a mere Claude Code relay into a
fully capable phone assistant. Users should be able to control their machine, manage files,
search the web, get direct answers, and track long-running tasks — all without necessarily
opening a Claude Code session.

### M2.1 — Direct Q&A (no CC session required)

The mailboy already has an LLM; it just needs permission to answer.
When no active session exists and the user asks a general question, the mailboy should
reply with its own knowledge instead of asking "which project?".

**Changes:**
- Update the system prompt: give the mailboy explicit permission to answer questions directly
- Add a heuristic in `_run_locked`: if there's no active session and the message looks like a
  question (ends with `?`, contains `what/how/why/explain`), skip the tool loop and reply directly
- No new modules; a prompt update plus a small logic change in `orchestrator/agent.py`
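
The question heuristic described above could look roughly like the sketch below. The function names are hypothetical (the roadmap only names `_run_locked`), and accepting the full-width `?` is an added assumption for Chinese-language messages. Word boundaries keep `how` from matching inside words such as `show`.

```python
import re

# Keywords taken from the roadmap's heuristic: what / how / why / explain.
_QUESTION_WORDS = re.compile(r"\b(what|how|why|explain)\b", re.IGNORECASE)


def looks_like_question(text: str) -> bool:
    """Return True if the message ends with '?' or contains a question keyword."""
    stripped = text.strip()
    if not stripped:
        return False
    # Assumption: also accept the full-width question mark used in Chinese text.
    if stripped.endswith(("?", "?")):
        return True
    return _QUESTION_WORDS.search(stripped) is not None


def should_answer_directly(text: str, has_active_session: bool) -> bool:
    """Skip the tool loop only when no session is active and the text reads as a question."""
    return not has_active_session and looks_like_question(text)
```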

---

### M2.2 — Shell Tool

Run arbitrary shell commands on the host machine and return stdout/stderr.
Covers: check git status, tail logs, kill a process, `ps aux | grep`, `pip list`, etc.

**New tool:** `orchestrator/tools.py` → `ShellTool`
```
name: "run_shell"
args: command (str), cwd (str, optional, defaults to WORKING_DIR), timeout (int, default 30)
returns: {stdout, stderr, exit_code}
```

**Safety guards:**
- Blocklist of destructive patterns (`rm -rf /`, `format`, `mkfs`, `shutdown`, `reboot`,
  `dd if=`, `:(){:|:&};:`) — refuse with a clear error
- `cwd` must be under `WORKING_DIR` (reuse `_resolve_dir`) or be an explicit absolute path
  approved by the user (raise a confirmation request)
- Timeout hard cap: 120 s; for longer tasks see M2.4
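
A minimal sketch of the blocklist and timeout cap, assuming the tool is built on `asyncio.create_subprocess_shell`. The regexes are illustrative translations of the patterns listed above and would need review (a bare `format` match also blocks harmless commands such as `git format-patch`). The `cwd` check via `_resolve_dir` is left out because that helper is not shown here.

```python
import asyncio
import re
from typing import Optional

MAX_TIMEOUT_S = 120  # hard cap from the roadmap

# Illustrative patterns for the blocklist; not an exhaustive safety net.
BLOCKED_PATTERNS = [
    re.compile(r"\brm\s+-rf\s+/(\s|$)"),            # rm -rf / (but not rm -rf /tmp/x)
    re.compile(r"\b(format|mkfs|shutdown|reboot)\b"),
    re.compile(r"\bdd\s+if="),
    re.compile(re.escape(":(){:|:&};:")),            # classic fork bomb
]


def is_blocked(command: str) -> bool:
    """Return True if the command matches any destructive pattern."""
    return any(pattern.search(command) for pattern in BLOCKED_PATTERNS)


async def run_shell(command: str, cwd: Optional[str] = None, timeout: int = 30) -> dict:
    """Run a shell command and return {stdout, stderr, exit_code}."""
    if is_blocked(command):
        return {"stdout": "", "stderr": "blocked by safety guard", "exit_code": -1}
    timeout = min(timeout, MAX_TIMEOUT_S)
    proc = await asyncio.create_subprocess_shell(
        command,
        cwd=cwd,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    try:
        out, err = await asyncio.wait_for(proc.communicate(), timeout)
    except asyncio.TimeoutError:
        proc.kill()        # do not leave the child running after a timeout
        await proc.wait()
        return {"stdout": "", "stderr": f"timed out after {timeout}s", "exit_code": -1}
    return {
        "stdout": out.decode(errors="replace"),
        "stderr": err.decode(errors="replace"),
        "exit_code": proc.returncode,
    }
```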

**New slash command:** `/shell <command>` (bypasses LLM; runs directly)

---

### M2.3 — File Operations Tool

Read files, list directories, search content, and send files back to the user in Feishu.
Covers: "show me the error log", "what files are in my project?", "search for TODO comments"

**New tool:** `orchestrator/tools.py` → `FileOpsTool` (single tool, `action` dispatch)
```
name: "file_ops"
args: action ("read" | "list" | "search" | "send"), path (str), query (str, for search),
      max_bytes (int, default 8000)
```

- `read`: read file, truncate to `max_bytes`, return content
- `list`: recursive directory tree (depth-limited to 3), file sizes
- `search`: grep-like search for `query` in `path` (ripgrep or a pure-Python scan)
- `send`: upload and send file via `bot/feishu.py::send_file()` (already implemented)
  — tool receives `chat_id` via context var (add alongside `current_user_id`)

**Safety:** all paths must be under `WORKING_DIR`
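
One way to enforce the path rule and the `read` truncation is sketched below. `resolve_under` and `read_truncated` are hypothetical names, and `root` stands in for `WORKING_DIR`. Resolving both paths before comparing them means `..` segments, absolute paths, and symlinks pointing outside the root are all rejected.

```python
from pathlib import Path


def resolve_under(root: Path, user_path: str) -> Path:
    """Resolve user_path relative to root and refuse anything outside root."""
    root = root.resolve()
    candidate = (root / user_path).resolve()
    if candidate != root and root not in candidate.parents:
        raise PermissionError(f"{user_path!r} is outside the working directory")
    return candidate


def read_truncated(root: Path, user_path: str, max_bytes: int = 8000) -> str:
    """Read at most max_bytes from a file under root, marking truncation."""
    path = resolve_under(root, user_path)
    with path.open("rb") as handle:
        data = handle.read(max_bytes + 1)  # one extra byte reveals truncation
    text = data[:max_bytes].decode("utf-8", errors="replace")
    if len(data) > max_bytes:
        text += "\n[truncated]"
    return text
```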

---

### M2.4 — Long-Running Task Manager

This is the key UX upgrade. `claude -p` runs and shell commands that take minutes need
fire-and-forget execution with a completion notification.

**Design:**
- Add `BackgroundTask` dataclass: `{task_id, description, started_at, status, conv_id_or_none}`
- `TaskRunner` singleton in `agent/task_runner.py`:
  - `submit(coro, description, notify_chat_id) -> task_id`
  - wraps coroutine in `asyncio.create_task`; on completion sends a Feishu notification
    via `send_text(notify_chat_id, ...)`
  - stores tasks in-memory dict `{task_id: BackgroundTask}`
- When `manager.send()` is called for a CC session:
  - if `cc_timeout > 60`: automatically run in background, return immediately with
    `"⏳ Task #<id> started. I'll notify you when it's done."`
  - otherwise run inline as today
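
The design above might be sketched as follows. This is an illustration under assumptions, not the planned implementation: the `notify` callable stands in for `send_text`, the `result` field and the failure branch are additions not listed in the roadmap, and the six-character task id is arbitrary. The runner keeps a reference to each `asyncio.Task`, since the event loop itself only holds weak references to tasks.

```python
import asyncio
import time
import uuid
from dataclasses import dataclass
from typing import Awaitable, Callable, Dict, Optional


@dataclass
class BackgroundTask:
    task_id: str
    description: str
    started_at: float
    status: str = "running"          # "running" | "done" | "failed"
    conv_id: Optional[str] = None
    result: Optional[str] = None     # assumption: keep output for task_status


class TaskRunner:
    def __init__(self, notify: Callable[[str, str], Awaitable[None]]):
        self._notify = notify        # stand-in for send_text(chat_id, text)
        self.tasks: Dict[str, BackgroundTask] = {}
        self._handles: set = set()   # strong references to running asyncio tasks

    def submit(self, coro, description: str, notify_chat_id: str) -> str:
        """Schedule coro in the background and return its task id immediately."""
        task_id = uuid.uuid4().hex[:6]
        self.tasks[task_id] = BackgroundTask(task_id, description, time.time())
        handle = asyncio.create_task(self._run(task_id, coro, notify_chat_id))
        self._handles.add(handle)
        handle.add_done_callback(self._handles.discard)
        return task_id

    async def _run(self, task_id: str, coro, chat_id: str) -> None:
        task = self.tasks[task_id]
        try:
            task.result = str(await coro)
            task.status, icon = "done", "✅"
        except Exception as exc:     # report failures instead of losing them
            task.result = repr(exc)
            task.status, icon = "failed", "❌"
        elapsed = int(time.time() - task.started_at)
        await self._notify(
            chat_id,
            f"{icon} Task #{task_id} {task.status} ({elapsed}s)\n"
            f"{task.description}\n\n{task.result[:800]}",
        )
```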

**New tool:** `run_background` — explicitly submits any shell command or CC prompt as a
background task and returns `task_id` immediately.

**New slash command:** `/tasks` — list running/completed background tasks with status.

**New tool:** `task_status` — check status of a specific `task_id`, optionally get output so far.

**Notification format:**
```
✅ Task #abc123 done (42s)
/new todo_app: fix the login bug

[CC output truncated to 800 chars]...
```

---

### M2.5 — Web Tool

Let the mailboy fetch URLs and search the web for quick answers.
Covers: "What has changed in the latest LangChain?", "fetch this GitHub issue", "Help me search for this paper"

**Backend:** 秘塔AI (Metaso) Search MCP (`https://metaso.cn/api/mcp`) — accessible from mainland China,
official API, Bearer token auth. Requires `METASO_API_KEY` in `keyring.yaml`.
Get key at: https://metaso.cn/search-api/api-keys

**New tool:** `WebTool` (three actions dispatched via one tool)
```
name: "web"
args: action ("search" | "fetch" | "ask"), query (str), url (str), scope (str), max_chars (int)
```

- `search`: calls `metaso_web_search` — returns top results (title + snippet + URL)
  - `scope` options: `webpage` (default), `paper`, `document`, `video`, `podcast`
- `fetch`: calls `metaso_web_reader` with `format=markdown` — extracts clean content from URL
- `ask`: calls `metaso_chat` — RAG answer combining search + generation (for quick factual Q&A)

**Implementation:** HTTP POST to `https://metaso.cn/api/mcp` with JSON-RPC body,
`Authorization: Bearer <METASO_API_KEY>` header. Use `httpx.AsyncClient` (already installed).

**New config key:** `METASO_API_KEY` in `keyring.yaml` and `config.py` (optional — WebTool
disabled gracefully if not set)
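
The request shape could be sketched as below. It assumes the endpoint accepts a standard MCP `tools/call` JSON-RPC request; the argument keys a caller passes (for example `url`) are guesses that need checking against the Metaso documentation, and a real MCP server may also expect an `initialize` handshake or extra headers first. `build_mcp_request` is a hypothetical helper; it returns `None` when no key is configured, which is one way to disable the tool gracefully.

```python
import itertools
from typing import Any, Dict, Optional, Tuple

METASO_MCP_URL = "https://metaso.cn/api/mcp"  # endpoint named in the roadmap

# Action -> MCP tool name, as listed in the roadmap.
ACTION_TO_TOOL = {
    "search": "metaso_web_search",
    "fetch": "metaso_web_reader",
    "ask": "metaso_chat",
}

_next_id = itertools.count(1)  # JSON-RPC request ids


def build_mcp_request(
    action: str, api_key: Optional[str], arguments: Dict[str, Any]
) -> Optional[Tuple[Dict[str, str], Dict[str, Any]]]:
    """Return (headers, payload) for one tool call, or None if no key is set."""
    if not api_key:
        return None  # WebTool disabled: METASO_API_KEY missing
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "jsonrpc": "2.0",
        "id": next(_next_id),
        "method": "tools/call",
        "params": {"name": ACTION_TO_TOOL[action], "arguments": arguments},
    }
    return headers, payload


# Sending it with httpx would then look like:
#   async with httpx.AsyncClient() as client:
#       resp = await client.post(METASO_MCP_URL, headers=headers, json=payload)
```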

---

### M2.6 — Scheduling & Reminders

Set a one-shot reminder or run a recurring check.
Covers: "remind me in 30 minutes", "check if the tests pass every 5 minutes"

**Design:** `agent/scheduler.py` — thin wrapper around `asyncio` with:
- `schedule_once(delay_seconds, coro, description)` — fire once
- `schedule_recurring(interval_seconds, coro_factory, description, max_runs)` — repeat up to `max_runs` times
- All scheduled jobs send a Feishu notification on completion (same as M2.4)
- Jobs stored in-memory; cleared on server restart (acceptable for now)
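
A rough sketch of the two scheduling helpers follows. The extra `notify` parameter is an assumption standing in for the Feishu notification, and the message formats are made up. The recurring variant takes a factory rather than a coroutine because a coroutine object can only be awaited once, so each run needs a fresh one. Callers should keep the returned `asyncio.Task` referenced until it finishes.

```python
import asyncio
from typing import Any, Awaitable, Callable, Coroutine

Notify = Callable[[str], Awaitable[None]]


async def _run_once(delay: float, coro: Coroutine, description: str, notify: Notify) -> None:
    await asyncio.sleep(delay)
    result = await coro
    await notify(f"⏰ {description}: {result}")


async def _run_recurring(
    interval: float,
    coro_factory: Callable[[], Coroutine[Any, Any, Any]],
    description: str,
    max_runs: int,
    notify: Notify,
) -> None:
    for run in range(1, max_runs + 1):
        await asyncio.sleep(interval)
        result = await coro_factory()  # fresh coroutine for every run
        await notify(f"🔁 {description} ({run}/{max_runs}): {result}")


def schedule_once(delay_seconds: float, coro: Coroutine, description: str, notify: Notify) -> asyncio.Task:
    """Fire coro once after delay_seconds; must be called from a running event loop."""
    return asyncio.create_task(_run_once(delay_seconds, coro, description, notify))


def schedule_recurring(
    interval_seconds: float,
    coro_factory: Callable[[], Coroutine[Any, Any, Any]],
    description: str,
    max_runs: int,
    notify: Notify,
) -> asyncio.Task:
    """Run coro_factory() every interval_seconds, up to max_runs times."""
    return asyncio.create_task(
        _run_recurring(interval_seconds, coro_factory, description, max_runs, notify)
    )
```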

**New tool:** `scheduler`
```
name: "scheduler"
args: action ("remind" | "repeat"), delay_seconds (int), interval_seconds (int),
      message (str), conv_id (str, optional — if set, forward to that CC session)
```

**New slash command:** `/remind <N>m|h|s <message>` — set a reminder without LLM
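
Parsing the `/remind` syntax without the LLM could be as simple as the sketch below. `parse_remind` is a hypothetical helper, and rejecting malformed input with a usage message is an assumption about the desired behavior.

```python
import re
from typing import Tuple

# /remind <N>m|h|s <message>
_REMIND_RE = re.compile(r"^/remind\s+(\d+)([smh])\s+(.+)$", re.DOTALL)
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def parse_remind(text: str) -> Tuple[int, str]:
    """Return (delay_seconds, message) for a /remind command."""
    match = _REMIND_RE.match(text.strip())
    if match is None:
        raise ValueError("usage: /remind <N>m|h|s <message>")
    amount, unit, message = match.groups()
    return int(amount) * _UNIT_SECONDS[unit], message.strip()
```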

---

## Implementation Order

1. **M2.1** — Direct Q&A (prompt + 10-line logic change; highest ROI, zero risk)
2. **M2.4** — Background task runner (unblocks long CC jobs; foundational for M2.5/M2.6)
3. **M2.2** — Shell tool (most-used phone use case)
4. **M2.3** — File ops tool (`send_file` already done; rest is straightforward)
5. **M2.5** — Web tool (秘塔AI MCP; needs `METASO_API_KEY`)
6. **M2.6** — Scheduling (builds on M2.4 notification infra)

---

## Files to Create / Modify

| File | Change |
|------|--------|
| `orchestrator/agent.py` | M2.1 prompt update + question heuristic |
| `orchestrator/tools.py` | Add `ShellTool`, `FileOpsTool`, `WebTool`, `TaskStatusTool`, `SchedulerTool` |
| `agent/task_runner.py` | New — `TaskRunner` singleton, `BackgroundTask` dataclass |
| `agent/scheduler.py` | New — `schedule_once`, `schedule_recurring` |
| `bot/commands.py` | Add `/shell`, `/tasks`, `/remind` commands |
| `bot/feishu.py` | Add `chat_id` context var for file send from tool |
| `bot/handler.py` | Pass `chat_id` into context var alongside `user_id` |
| `requirements.txt` | Add `httpx` (if not already present as transitive dep) |

---

## Verification Checklist

- [ ] M2.1: Ask "what is a Python generator?" — mailboy replies directly, no tool call
- [ ] M2.2: Send "check git status in todo_app" — `ShellTool` runs, output returned
- [ ] M2.2: Send "rm -rf /" — blocked by safety guard
- [ ] M2.3: Send "show me the last 50 lines of audit/abc123.jsonl" — file content returned
- [ ] M2.3: Send "send me the sessions.json file" — file arrives in Feishu chat
- [ ] M2.4: Start a long CC task (e.g. `--timeout 120`) — bot replies immediately, notifies on finish
- [ ] M2.4: `/tasks` — lists running task with elapsed time
- [ ] M2.5: "What's new in Python 3.13?" — `web ask` returns RAG answer from metaso
- [ ] M2.5: "Read this URL for me: https://example.com" — page content extracted as markdown
- [ ] M2.6: `/remind 10m deploy check` — 10 min later, message arrives in Feishu