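docs: update project architecture documentation and add roadmap file

- Restructure README.md, using a diagram to show the system architecture and component interactions
- Add ROADMAP.md detailing future development plans, split into four phases
- Streamline the project setup instructions so they are clearer and easier to read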
2026-03-28 08:16:55 +08:00
commit 29c0f2e403
parent c3741ea006
2 changed files with 136 additions and 34 deletions
--- a/README.md
+++ b/README.md
@@ -4,45 +4,54 @@ Feishu bot integration with Claude Code CLI.
 ## Architecture
- **Agent CLI**: Claude Code (print mode)
- **Chat Server**: FastAPI
- **Client**: Feishu bot API (long-connection)
+```
+┌─────────────┐    WebSocket     ┌──────────────┐    LangChain     ┌─────────────┐
+│   Feishu    │ ◄──────────────► │   FastAPI    │ ◄──────────────► │  LLM API    │
 │   (client)  │                  │   (server)   │                  │ (OpenAI)    │
 └─────────────┘                  └──────────────┘                  └─────────────┘
                                        │
                                        ▼
                                 ┌─────────────┐
                                 │ Claude Code │
                                 │   (PTY)     │
                                 └─────────────┘
 ```
 **Components:**
 | Module | Purpose |
 |--------|---------|
 | `main.py` | FastAPI entry point, starts WebSocket client + session manager |
 | `bot/handler.py` | Receives Feishu events, dispatches to orchestrator |
 | `bot/feishu.py` | Sends replies back to Feishu chats |
 | `orchestrator/agent.py` | LangChain agent with per-user history + active session tracking |
 | `orchestrator/tools.py` | Tools: `create_conversation`, `send_to_conversation`, `close_conversation` |
 | `agent/manager.py` | Session registry with idle timeout reaper |
 | `agent/pty_process.py` | Runs `claude -p` headlessly, manages session continuity |
 **Flow:** User message → Feishu WebSocket → Handler → Orchestrator (LLM decides action) → Tool → Session Manager → Claude Code PTY → Response back to Feishu
 ## Setup
-### 1. Feishu App
+1. **Feishu App**: Create at https://open.feishu.cn
-Create app at https://open.feishu.cn:
+   - Enable Bot capability + long-connection event subscription
- Enable **Bot** capability
+   - Get `FEISHU_APP_ID` and `FEISHU_APP_SECRET`
 - Enable **long-connection event subscription** (no public URL needed)
 - Get `FEISHU_APP_ID` and `FEISHU_APP_SECRET`
-### 2. LLM Endpoint
+2. **LLM Endpoint**: Configure OpenAI-compatible endpoint
-Configure OpenAI-compatible endpoint:
+   - `OPENAI_BASE_URL`, `OPENAI_API_KEY`, `OPENAI_MODEL`
 - `OPENAI_BASE_URL`
 - `OPENAI_API_KEY`
 - `OPENAI_MODEL`
-### 3. Claude Code CLI
+3. **Claude Code CLI**: Install and authenticate `claude` command
 - Install and authenticate `claude` command
 - Ensure available in PATH
-### 4. Configuration
+4. **Configuration**:
-```bash
+   ```bash
-cp keyring.example.yaml keyring.yaml
+   cp keyring.example.yaml keyring.yaml
-# Edit keyring.yaml with your credentials
+   # Edit keyring.yaml with your credentials
-```
+   ```
-### 5. Run
+5. **Run**:
-```bash
+   ```bash
-pip install -r requirements.txt
+   pip install -r requirements.txt
-python main.py
+   python main.py
-```
+   ```
-## Requirements
+**Requirements**: Python 3.11+
 | Item | Notes |
 |---|---|
 | Python 3.11+ | Required |
 | Feishu App | Bot + long-connection enabled |
 | OpenAI-compatible LLM | API endpoint and key |
 | Claude Code CLI | Installed + authenticated |
--- a/ROADMAP.md
+++ b/ROADMAP.md
@@ -0,0 +1,93 @@
 # PhoneWork Roadmap
 Issues observed in real usage, grouped by impact. No priority order within each phase.
 ---
 ## Phase 1 — Core Reliability
 These are friction points that directly break or degrade the basic send-message → get-reply loop.
 ### 1.1 Long output splitting
 **Problem:** Feishu truncates messages at 4000 chars. Long code output is silently cut.
 **Fix:** Automatically split into multiple sequential messages with `[1/3]`, `[2/3]` headers.
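
A minimal sketch of the splitting step, assuming a plain-text reply and the 4000-character limit stated above. The function name `split_message` and the choice to cut at the last newline inside each window are illustrative assumptions, not part of the existing code.

```python
def split_message(text: str, limit: int = 4000) -> list[str]:
    """Split text into numbered chunks that each fit within `limit` chars."""
    if len(text) <= limit:
        return [text]
    # Reserve room for a header such as "[999/999]\n" so the final
    # message (header + body) never exceeds the limit.
    body_limit = limit - len("[999/999]\n")
    chunks = []
    rest = text
    while rest:
        if len(rest) <= body_limit:
            chunks.append(rest)
            break
        # Prefer cutting just after the last newline in the window so
        # lines of code are not broken in the middle.
        cut = rest.rfind("\n", 0, body_limit)
        cut = body_limit if cut <= 0 else cut + 1
        chunks.append(rest[:cut])
        rest = rest[cut:]
    total = len(chunks)
    return [f"[{i}/{total}]\n{chunk}" for i, chunk in enumerate(chunks, 1)]
```

Each returned string would be sent as its own Feishu message, in order. Cutting on newlines does not keep fenced code blocks intact across messages; that would need extra handling.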
 ### 1.2 Concurrent message handling
 **Problem:** If the user sends two messages quickly, both fire `agent.run()` simultaneously for the same user, causing race conditions in `_active_conv` and interleaved `--resume` calls to the same CC session.
 **Fix:** Per-user async lock (or queue) so messages process one at a time per user.
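
One way the per-user lock could look, assuming the handler is async. `handle_message` and the `run_agent` callable are hypothetical stand-ins for the real handler and `agent.run()`, whose signatures are not shown here.

```python
import asyncio
from collections import defaultdict

# One lock per user, created lazily the first time that user is seen.
_user_locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)


async def handle_message(user_id: str, text: str, run_agent) -> str:
    """Run the agent for one message, serialized per user."""
    # Messages from the same user wait here; different users do not block
    # each other because each has its own lock.
    async with _user_locks[user_id]:
        return await run_agent(user_id, text)
```

A lock keeps the change small. A per-user queue would be the alternative if messages ever need to be dropped, merged, or reordered while one is in flight.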
 ### 1.3 Session persistence across restarts
 **Problem:** `manager._sessions` is in-memory. A server restart loses all active sessions. Users have to recreate them.
 **Fix:** Persist `{conv_id, cwd, cc_session_id}` to a JSON file on disk; reload on startup.
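
A sketch of the save and reload steps, assuming each session can be reduced to the three fields named above. The function names and the in-memory shape (`conv_id` mapped to a dict) are assumptions for illustration; the real `manager._sessions` structure may differ.

```python
import json
import os
from pathlib import Path


def save_sessions(sessions: dict[str, dict], path: Path) -> None:
    """Write {conv_id, cwd, cc_session_id} records to a JSON file."""
    records = [
        {"conv_id": conv_id, "cwd": s["cwd"], "cc_session_id": s["cc_session_id"]}
        for conv_id, s in sessions.items()
    ]
    # Write to a temp file first, then swap it in, so a crash mid-write
    # does not leave a truncated sessions file behind.
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(records, indent=2), encoding="utf-8")
    os.replace(tmp, path)


def load_sessions(path: Path) -> dict[str, dict]:
    """Reload sessions on startup; a missing file means no sessions."""
    if not path.exists():
        return {}
    records = json.loads(path.read_text(encoding="utf-8"))
    return {
        r["conv_id"]: {"cwd": r["cwd"], "cc_session_id": r["cc_session_id"]}
        for r in records
    }
```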
 ### 1.4 Mail boy passthrough mode
 **Problem:** The mail boy (GLM) sometimes paraphrases or summarizes instead of relaying verbatim, losing code blocks and exact output.
 **Fix:** Bypass the mail boy entirely for follow-up messages — detect that there's an active session and call `manager.send()` directly without an LLM round-trip.
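
The routing decision could be sketched as below. `route_message`, `send_direct`, and `run_agent` are hypothetical names: `send_direct` stands in for `manager.send()` and `run_agent` for the LLM path, and the `active_conv` mapping is an assumed shape for the per-user active session state.

```python
from typing import Awaitable, Callable


async def route_message(
    user_id: str,
    text: str,
    active_conv: dict[str, str],
    send_direct: Callable[[str, str], Awaitable[str]],
    run_agent: Callable[[str, str], Awaitable[str]],
) -> str:
    """Relay verbatim when a session is active, else ask the LLM."""
    conv_id = active_conv.get(user_id)
    if conv_id is not None:
        # Active session: pass the text through untouched, no LLM round-trip,
        # so code blocks and exact output survive.
        return await send_direct(conv_id, text)
    # No active session: let the orchestrator decide what to do.
    return await run_agent(user_id, text)
```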
 ---
 ## Phase 2 — Better Interaction Model
 Reducing the number of messages needed to get things done.
 ### 2.1 Slash commands
 **Problem:** Users must phrase everything as natural language for the mail boy to interpret.
 **Fix:** Recognize a small set of commands directly in `handler.py` before hitting the LLM:
 - `/new <dir>` — create session
 - `/list` — list sessions
 - `/close` — close active session
 - `/switch <n>` — switch active session by number
 - `/retry` — resend last message to CC
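
A minimal parser for the command set above might look like this. `Command` and `parse_command` are illustrative names; how `handler.py` would dispatch the result is left open.

```python
from typing import NamedTuple, Optional

COMMANDS = {"/new", "/list", "/close", "/switch", "/retry"}


class Command(NamedTuple):
    name: str            # command without the leading slash, e.g. "new"
    arg: Optional[str]   # everything after the command, or None


def parse_command(text: str) -> Optional[Command]:
    """Return a Command for known slash commands, None for anything else."""
    parts = text.strip().split(maxsplit=1)
    if not parts or parts[0] not in COMMANDS:
        return None  # not a command: fall through to the LLM
    arg = parts[1].strip() if len(parts) > 1 else None
    return Command(parts[0][1:], arg)
```

Returning `None` for unknown input keeps natural-language messages on the existing LLM path.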
 ### 2.2 Multi-session switching
 **Problem:** Only one "active session" per user. To switch projects, the user must remember conv_ids.
 **Fix:** `/list` shows numbered sessions; `/switch 2` activates session #2. The system prompt shows all open sessions, not just the active one.
 ### 2.3 Feishu message cards
 **Problem:** Plain text is hard to scan — code blocks, file paths, and status info all look the same.
 **Fix:** Use Feishu Interactive Cards (`msg_type: interactive`) to render:
 - Session status as a structured card (project name, cwd, session ID)
 - Action buttons: **Continue**, **Close session**, **Run again**
 ---
 ## Phase 3 — Operational Quality
 Making it reliable enough to leave running 24/7.
 ### 3.1 Health check improvements
 **Problem:** `/health` only reports session count. No way to know if the Feishu WS connection is alive, or if CC is callable.
 **Fix:** Add to `/health`:
 - WebSocket connection status
 - Last message received timestamp
 - A `claude -p "ping"` smoke test result
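
The three additions could be assembled roughly as follows. Both function names, the payload keys, and the 30-second timeout are assumptions; the only thing taken from the list above is the `claude -p "ping"` command itself.

```python
import subprocess
import time
from typing import Optional, Sequence


def claude_smoke_test(
    cmd: Sequence[str] = ("claude", "-p", "ping"), timeout: float = 30.0
) -> bool:
    """Return True if the command exits with status 0 within the timeout."""
    try:
        result = subprocess.run(list(cmd), capture_output=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired):
        # Binary missing, not executable, or hung past the timeout.
        return False
    return result.returncode == 0


def build_health(
    session_count: int,
    ws_connected: bool,
    last_message_at: Optional[float],
    claude_ok: bool,
    now: Optional[float] = None,
) -> dict:
    """Assemble the /health payload from already-collected facts."""
    now = time.time() if now is None else now
    age = None if last_message_at is None else round(now - last_message_at, 1)
    return {
        "status": "ok" if ws_connected and claude_ok else "degraded",
        "sessions": session_count,
        "ws_connected": ws_connected,
        "last_message_at": last_message_at,  # epoch seconds, or None
        "last_message_age_s": age,
        "claude_ok": claude_ok,
    }
```

Running the smoke test on every `/health` request may be slow and costs an LLM call, so caching its result for a short interval is worth considering.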
 ### 3.2 Automatic reconnection
 **Problem:** The Feishu WebSocket thread is a daemon — if it dies silently (network blip), no messages are received and there's no recovery.
 **Fix:** Wrap `ws_client.start()` in a retry loop with exponential backoff and log reconnection events.
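
A sketch of the retry loop, assuming `ws_client.start()` blocks while the connection is alive and either returns or raises when it drops. `run_with_reconnect`, the logger name, and the delay parameters are illustrative choices.

```python
import logging
import time
from typing import Callable, Optional

logger = logging.getLogger("feishu.ws")


def run_with_reconnect(
    start: Callable[[], None],
    *,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    max_retries: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call start() repeatedly, backing off exponentially between attempts."""
    failures = 0
    while True:
        try:
            start()  # assumed to block until the connection ends
            logger.warning("WebSocket client exited; reconnecting")
        except Exception:
            logger.exception("WebSocket client crashed; reconnecting")
        failures += 1
        if max_retries is not None and failures > max_retries:
            raise RuntimeError(f"giving up after {max_retries} reconnect attempts")
        # 1s, 2s, 4s, ... capped at max_delay.
        delay = min(max_delay, base_delay * 2 ** (failures - 1))
        sleep(delay)
```

It would be called as `run_with_reconnect(ws_client.start)` inside the existing thread. This version never resets the failure count; a longer-lived variant would reset it after a connection has stayed up for some time.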
 ### 3.3 Per-session timeout configuration
 **Problem:** All sessions share a 30-min idle timeout and 300s CC timeout. Long-running tasks (e.g. running tests) may need more; quick chats need less.
 **Fix:** Allow per-session timeout overrides; expose via `/new <dir> --timeout 600`.
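
Parsing the proposed `/new <dir> --timeout 600` form could look like this. `parse_new_args` is a hypothetical helper, and the 300-second default is the shared CC timeout mentioned above.

```python
import shlex

DEFAULT_CC_TIMEOUT = 300  # seconds; the current shared default


def parse_new_args(arg_string: str) -> tuple[str, int]:
    """Parse the text after '/new' into (directory, timeout_seconds)."""
    tokens = shlex.split(arg_string)  # honors quotes around paths with spaces
    timeout = DEFAULT_CC_TIMEOUT
    positional = []
    i = 0
    while i < len(tokens):
        if tokens[i] == "--timeout":
            if i + 1 >= len(tokens):
                raise ValueError("--timeout needs a value in seconds")
            timeout = int(tokens[i + 1])  # raises ValueError on non-numbers
            i += 2
        else:
            positional.append(tokens[i])
            i += 1
    if len(positional) != 1:
        raise ValueError("usage: /new <dir> [--timeout SECONDS]")
    if timeout <= 0:
        raise ValueError("timeout must be positive")
    return positional[0], timeout
```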
 ### 3.4 Audit log
 **Problem:** No record of what was sent to Claude Code or what it did. Impossible to debug after the fact.
 **Fix:** Append each `(timestamp, conv_id, prompt, response)` to a JSONL file per session under the project directory.
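
Appending one record per exchange might be done as below. The `.audit` subdirectory and the `append_audit` name are assumptions, and `conv_id` is assumed to be safe to use as a file name.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_audit(project_dir: Path, conv_id: str, prompt: str, response: str) -> Path:
    """Append one (timestamp, conv_id, prompt, response) record as a JSON line."""
    log_dir = project_dir / ".audit"  # hypothetical location under the project
    log_dir.mkdir(parents=True, exist_ok=True)
    log_path = log_dir / f"{conv_id}.jsonl"
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "conv_id": conv_id,
        "prompt": prompt,
        "response": response,
    }
    # One JSON object per line; json.dumps escapes embedded newlines,
    # so multi-line output still occupies a single line in the file.
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return log_path
```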
 ---
 ## Phase 4 — Multi-user & Security
 For sharing the bot with teammates.
 ### 4.1 User allowlist
 **Problem:** Anyone who can message the bot can run arbitrary code via Claude Code.
 **Fix:** `ALLOWED_OPEN_IDS` list in `keyring.yaml`; reject messages from unknown users.
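
The check itself is small. The sketch below assumes `keyring.yaml` has already been parsed into a dict by whatever loader the project uses; the helper names are illustrative, and treating a missing or empty list as "deny everyone" is a design assumption.

```python
def load_allowed_ids(config: dict) -> frozenset[str]:
    """Read ALLOWED_OPEN_IDS from the parsed config; missing means empty."""
    return frozenset(config.get("ALLOWED_OPEN_IDS") or [])


def is_allowed(open_id: str, allowed: frozenset[str]) -> bool:
    """Default-deny: only IDs explicitly listed may use the bot."""
    return open_id in allowed
```

The handler would call `is_allowed` before dispatching to the orchestrator and drop or politely refuse anything else.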
 ### 4.2 Per-user session isolation
 **Problem:** All users share the same `manager` singleton — user A could theoretically send to user B's session by guessing a conv_id.
 **Fix:** Namespace sessions by user_id; `send_to_conversation` validates that the requesting user owns the session.
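
A sketch of the ownership check that `send_to_conversation` could run first. `SessionOwners` is a hypothetical helper, not the existing `manager` class.

```python
class SessionOwners:
    """Tracks which user created each conversation."""

    def __init__(self) -> None:
        self._owner_of: dict[str, str] = {}

    def register(self, conv_id: str, user_id: str) -> None:
        self._owner_of[conv_id] = user_id

    def require_owner(self, conv_id: str, user_id: str) -> None:
        """Raise PermissionError unless user_id owns conv_id."""
        # Unknown and foreign sessions raise the same error, so a caller
        # cannot probe which conv_ids exist.
        if self._owner_of.get(conv_id) != user_id:
            raise PermissionError("no such session for this user")

    def sessions_for(self, user_id: str) -> list[str]:
        """List only the caller's own sessions."""
        return [c for c, u in self._owner_of.items() if u == user_id]
```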
 ### 4.3 Working directory sandboxing
 **Problem:** The safety check in `_resolve_dir` blocks paths outside `WORKING_DIR`, but Claude Code itself runs with `--dangerously-skip-permissions` and can write anywhere.
 **Fix:** Consider running CC in a restricted user account or container; or drop `--dangerously-skip-permissions` and implement a permission-approval flow via Feishu buttons.