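docs: update project architecture documentation and add roadmap file

- Restructure README.md, using diagrams to show the system architecture and component interactions
- Add ROADMAP.md detailing future development plans, divided into four phases
- Streamline the project setup instructions so they are clearer and easier to read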
Yuyao Huang (Sam) 2026-03-28 08:16:55 +08:00
parent c3741ea006
commit 29c0f2e403
2 changed files with 136 additions and 34 deletions


@@ -4,45 +4,54 @@ Feishu bot integration with Claude Code CLI.
 ## Architecture
-- **Agent CLI**: Claude Code (print mode)
-- **Chat Server**: FastAPI
-- **Client**: Feishu bot API (long-connection)
+```
+┌─────────────┐    WebSocket     ┌──────────────┐    LangChain     ┌─────────────┐
+│   Feishu    │ ◄──────────────► │   FastAPI    │ ◄──────────────► │   LLM API   │
+│  (client)   │                  │   (server)   │                  │  (OpenAI)   │
+└─────────────┘                  └──────────────┘                  └─────────────┘
+                                 ┌─────────────┐
+                                 │ Claude Code │
+                                 │    (PTY)    │
+                                 └─────────────┘
+```
+**Components:**
+| Module | Purpose |
+|--------|---------|
+| `main.py` | FastAPI entry point, starts WebSocket client + session manager |
+| `bot/handler.py` | Receives Feishu events, dispatches to orchestrator |
+| `bot/feishu.py` | Sends replies back to Feishu chats |
+| `orchestrator/agent.py` | LangChain agent with per-user history + active session tracking |
+| `orchestrator/tools.py` | Tools: `create_conversation`, `send_to_conversation`, `close_conversation` |
+| `agent/manager.py` | Session registry with idle timeout reaper |
+| `agent/pty_process.py` | Runs `claude -p` headlessly, manages session continuity |
+**Flow:** User message → Feishu WebSocket → Handler → Orchestrator (LLM decides action) → Tool → Session Manager → Claude Code PTY → Response back to Feishu
 ## Setup
-### 1. Feishu App
-Create app at https://open.feishu.cn:
-- Enable **Bot** capability
-- Enable **long-connection event subscription** (no public URL needed)
-- Get `FEISHU_APP_ID` and `FEISHU_APP_SECRET`
+1. **Feishu App**: Create at https://open.feishu.cn
+   - Enable Bot capability + long-connection event subscription
+   - Get `FEISHU_APP_ID` and `FEISHU_APP_SECRET`
-### 2. LLM Endpoint
-Configure OpenAI-compatible endpoint:
-- `OPENAI_BASE_URL`
-- `OPENAI_API_KEY`
-- `OPENAI_MODEL`
+2. **LLM Endpoint**: Configure OpenAI-compatible endpoint
+   - `OPENAI_BASE_URL`, `OPENAI_API_KEY`, `OPENAI_MODEL`
-### 3. Claude Code CLI
-- Install and authenticate `claude` command
-- Ensure available in PATH
+3. **Claude Code CLI**: Install and authenticate `claude` command
-### 4. Configuration
-```bash
-cp keyring.example.yaml keyring.yaml
-# Edit keyring.yaml with your credentials
-```
+4. **Configuration**:
+   ```bash
+   cp keyring.example.yaml keyring.yaml
+   # Edit keyring.yaml with your credentials
+   ```
-### 5. Run
-```bash
-pip install -r requirements.txt
-python main.py
-```
+5. **Run**:
+   ```bash
+   pip install -r requirements.txt
+   python main.py
+   ```
-## Requirements
-| Item | Notes |
-|---|---|
-| Python 3.11+ | Required |
-| Feishu App | Bot + long-connection enabled |
-| OpenAI-compatible LLM | API endpoint and key |
-| Claude Code CLI | Installed + authenticated |
+**Requirements**: Python 3.11+

ROADMAP.md Normal file

@@ -0,0 +1,93 @@
# PhoneWork Roadmap
Issues observed in real usage, grouped by impact. No priority order within each phase.
---
## Phase 1 — Core Reliability
These are friction points that directly break or degrade the basic send-message → get-reply loop.
### 1.1 Long output splitting
**Problem:** Feishu truncates messages at 4000 chars. Long code output is silently cut.
**Fix:** Automatically split into multiple sequential messages with `[1/3]`, `[2/3]` headers.
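
One possible shape for the splitter is sketched below. The function name `split_message`, the preference for breaking on newlines, and the 10-character header allowance are assumptions made for illustration, not part of the existing code; the 4000-character default comes from the problem statement above.

```python
def split_message(text: str, limit: int = 4000) -> list[str]:
    """Split text into chunks that each fit within `limit` characters, header included."""
    if len(text) <= limit:
        return [text]
    # Reserve room for a header such as "[999/999]\n" (assumes fewer than 1000 parts).
    body_limit = limit - len("[999/999]\n")
    bodies = []
    rest = text
    while rest:
        if len(rest) <= body_limit:
            bodies.append(rest)
            break
        # Prefer to break on a newline so individual code lines stay intact.
        cut = rest.rfind("\n", 0, body_limit)
        if cut <= 0:
            cut = body_limit  # no usable newline: fall back to a hard split
        bodies.append(rest[:cut])
        rest = rest[cut:].lstrip("\n")
    total = len(bodies)
    return [f"[{i}/{total}]\n{body}" for i, body in enumerate(bodies, start=1)]
```

Blank lines that fall exactly on a split boundary are dropped by this sketch; a version that must preserve output byte-for-byte would need to keep them.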
### 1.2 Concurrent message handling
**Problem:** If the user sends two messages quickly, both fire `agent.run()` simultaneously for the same user, causing race conditions in `_active_conv` and interleaved `--resume` calls to the same CC session.
**Fix:** Per-user async lock (or queue) so messages process one at a time per user.
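
A minimal sketch of the per-user lock, assuming the handler is already async. `handle_message` and the `run_agent` callable are hypothetical stand-ins for the real handler and `agent.run()`.

```python
import asyncio
from collections import defaultdict

# One lock per user, created lazily on first access.
_user_locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)


async def handle_message(user_id: str, text: str, run_agent) -> str:
    """Run `run_agent` for this user only after their previous message has finished."""
    async with _user_locks[user_id]:
        return await run_agent(user_id, text)
```

Messages from different users still run concurrently, because each user has a separate lock. Waiters on an `asyncio.Lock` are woken in FIFO order, so a user's messages are processed in arrival order.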
### 1.3 Session persistence across restarts
**Problem:** `manager._sessions` is in-memory. A server restart loses all active sessions. Users have to recreate them.
**Fix:** Persist `{conv_id, cwd, cc_session_id}` to a JSON file on disk; reload on startup.
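
The persistence step could look roughly like this. The function names and the in-memory shape (a dict keyed by `conv_id`) are assumptions; only the three persisted fields come from the text above. Writing to a temporary file and renaming it avoids leaving a half-written file behind if the server dies mid-save.

```python
import json
import os
import tempfile
from pathlib import Path


def save_sessions(sessions: dict[str, dict], path: Path) -> None:
    """Write the session table atomically: temp file first, then rename over the target."""
    records = [
        {"conv_id": conv_id, "cwd": s["cwd"], "cc_session_id": s["cc_session_id"]}
        for conv_id, s in sessions.items()
    ]
    fd, tmp = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(records, f)
    os.replace(tmp, path)


def load_sessions(path: Path) -> dict[str, dict]:
    """Return the saved session table, or an empty one if nothing was saved yet."""
    if not path.exists():
        return {}
    records = json.loads(path.read_text(encoding="utf-8"))
    return {
        r["conv_id"]: {"cwd": r["cwd"], "cc_session_id": r["cc_session_id"]}
        for r in records
    }
```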
### 1.4 Mail boy passthrough mode
**Problem:** The mail boy (GLM) sometimes paraphrases or summarizes instead of relaying verbatim, losing code blocks and exact output.
**Fix:** Bypass the mail boy entirely for follow-up messages — detect that there's an active session and call `manager.send()` directly without an LLM round-trip.
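
The routing decision can be sketched as a small function. Everything here is hypothetical: `active_conv` stands in for the per-user active-session map, `send_direct` for `manager.send()`, and `run_agent` for the LLM path.

```python
def route_message(user_id, text, active_conv, send_direct, run_agent):
    """Relay verbatim when the user has an active session; otherwise ask the LLM."""
    conv_id = active_conv.get(user_id)
    if conv_id is not None:
        # Active session: skip the LLM so code blocks and exact output survive.
        return send_direct(conv_id, text)
    return run_agent(user_id, text)
```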
---
## Phase 2 — Better Interaction Model
Reducing the number of messages needed to get things done.
### 2.1 Slash commands
**Problem:** Users must phrase everything as natural language for the mail boy to interpret.
**Fix:** Recognize a small set of commands directly in `handler.py` before hitting the LLM:
- `/new <dir>` — create session
- `/list` — list sessions
- `/close` — close active session
- `/switch <n>` — switch active session by number
- `/retry` — resend last message to CC
### 2.2 Multi-session switching
**Problem:** Only one "active session" per user. To switch projects, the user must remember conv_ids.
**Fix:** `/list` shows numbered sessions; `/switch 2` activates session #2. The system prompt shows all open sessions, not just the active one.
### 2.3 Feishu message cards
**Problem:** Plain text is hard to scan — code blocks, file paths, and status info all look the same.
**Fix:** Use Feishu Interactive Cards (`msg_type: interactive`) to render:
- Session status as a structured card (project name, cwd, session ID)
- Action buttons: **Continue**, **Close session**, **Run again**
---
## Phase 3 — Operational Quality
Making it reliable enough to leave running 24/7.
### 3.1 Health check improvements
**Problem:** `/health` only reports session count. No way to know if the Feishu WS connection is alive, or if CC is callable.
**Fix:** Add to `/health`:
- WebSocket connection status
- Last message received timestamp
- A `claude -p "ping"` smoke test result
### 3.2 Automatic reconnection
**Problem:** The Feishu WebSocket thread is a daemon — if it dies silently (network blip), no messages are received and there's no recovery.
**Fix:** Wrap `ws_client.start()` in a retry loop with exponential backoff and log reconnection events.
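
The retry loop might be sketched as follows. `run_with_reconnect` and its parameters are assumptions; `start` stands in for the blocking `ws_client.start()` call, and `sleep` is injectable so the loop can be exercised without waiting.

```python
import logging
import time

logger = logging.getLogger(__name__)


def run_with_reconnect(start, *, base_delay=1.0, max_delay=60.0,
                       max_retries=None, sleep=time.sleep):
    """Call the blocking `start()` repeatedly, doubling the wait after each exit."""
    retries = 0
    while True:
        try:
            start()  # blocks while the connection is healthy
            logger.warning("WebSocket client returned; reconnecting")
        except Exception:
            logger.exception("WebSocket client crashed; reconnecting")
        if max_retries is not None and retries >= max_retries:
            raise RuntimeError("gave up reconnecting")
        delay = min(max_delay, base_delay * 2 ** retries)
        logger.info("reconnect attempt %d in %.1fs", retries + 1, delay)
        sleep(delay)
        retries += 1
```

This sketch never resets the backoff. A production version would probably reset `retries` after the connection has stayed up for some time, so that one bad hour does not leave every later reconnect waiting the full `max_delay`.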
### 3.3 Per-session timeout configuration
**Problem:** All sessions share a 30-min idle timeout and 300s CC timeout. Long-running tasks (e.g. running tests) may need more; quick chats need less.
**Fix:** Allow per-session timeout overrides; expose via `/new <dir> --timeout 600`.
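
Parsing the proposed `/new <dir> --timeout 600` syntax could look like this. `parse_new_args` is a hypothetical helper; the 300-second default mirrors the CC timeout mentioned above.

```python
import shlex

DEFAULT_CC_TIMEOUT = 300  # seconds, the shared default described above


def parse_new_args(arg_string: str) -> tuple[str, int]:
    """Parse '<dir> [--timeout SECONDS]' into (directory, timeout_seconds)."""
    tokens = shlex.split(arg_string)
    timeout = DEFAULT_CC_TIMEOUT
    if "--timeout" in tokens:
        i = tokens.index("--timeout")
        if i + 1 >= len(tokens):
            raise ValueError("--timeout needs a value")
        timeout = int(tokens[i + 1])  # raises ValueError on non-numeric input
        if timeout <= 0:
            raise ValueError("--timeout must be positive")
        del tokens[i : i + 2]
    if len(tokens) != 1:
        raise ValueError("expected exactly one directory")
    return tokens[0], timeout
```

Using `shlex.split` means a quoted directory name containing spaces is kept as a single argument.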
### 3.4 Audit log
**Problem:** No record of what was sent to Claude Code or what it did. Impossible to debug after the fact.
**Fix:** Append each `(timestamp, conv_id, prompt, response)` to a JSONL file per session under the project directory.
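
A minimal sketch of the JSONL writer. The function name, the one-file-per-`conv_id` layout, and the UTC ISO 8601 timestamp format are assumptions; the four fields come from the text above.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_audit(log_dir: Path, conv_id: str, prompt: str, response: str) -> Path:
    """Append one JSON record per line to <log_dir>/<conv_id>.jsonl."""
    log_dir.mkdir(parents=True, exist_ok=True)
    path = log_dir / f"{conv_id}.jsonl"
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "conv_id": conv_id,
        "prompt": prompt,
        "response": response,
    }
    with path.open("a", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-ASCII chat text readable in the log.
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path
```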
---
## Phase 4 — Multi-user & Security
For sharing the bot with teammates.
### 4.1 User allowlist
**Problem:** Anyone who can message the bot can run arbitrary code via Claude Code.
**Fix:** `ALLOWED_OPEN_IDS` list in `keyring.yaml`; reject messages from unknown users.
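
The check itself is small. The sketch below assumes `keyring.yaml` has already been parsed into a dict; `is_allowed` is a hypothetical name. It fails closed: a missing or empty `ALLOWED_OPEN_IDS` rejects everyone, which is a design choice of this sketch rather than something the roadmap specifies.

```python
def is_allowed(open_id: str, config: dict) -> bool:
    """Return True only if `open_id` appears in the configured allowlist."""
    allowed = config.get("ALLOWED_OPEN_IDS") or []
    return open_id in set(allowed)
```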
### 4.2 Per-user session isolation
**Problem:** All users share the same `manager` singleton — user A could theoretically send to user B's session by guessing a conv_id.
**Fix:** Namespace sessions by user_id; `send_to_conversation` validates that the requesting user owns the session.
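
The ownership check could be sketched as below. `SessionOwners` is a hypothetical helper, not the existing `manager`. It raises the same error for unknown and foreign sessions, so a caller cannot probe which conv_ids exist.

```python
class SessionOwners:
    """Tracks which user created each session."""

    def __init__(self) -> None:
        self._owners: dict[str, str] = {}

    def register(self, conv_id: str, user_id: str) -> None:
        self._owners[conv_id] = user_id

    def check_owner(self, conv_id: str, user_id: str) -> None:
        """Raise PermissionError unless `user_id` owns `conv_id`."""
        if self._owners.get(conv_id) != user_id:
            raise PermissionError("no such session for this user")
```

A tool such as `send_to_conversation` would call `check_owner` before forwarding anything to the session.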
### 4.3 Working directory sandboxing
**Problem:** The safety check in `_resolve_dir` blocks paths outside `WORKING_DIR`, but Claude Code itself runs with `--dangerously-skip-permissions` and can write anywhere.
**Fix:** Consider running CC in a restricted user account or container; or drop `--dangerously-skip-permissions` and implement a permission-approval flow via Feishu buttons.