diff --git a/README.md b/README.md
index 899b74b..a080826 100644
--- a/README.md
+++ b/README.md
@@ -4,45 +4,54 @@ Feishu bot integration with Claude Code CLI.
 
 ## Architecture
 
-- **Agent CLI**: Claude Code (print mode)
-- **Chat Server**: FastAPI
-- **Client**: Feishu bot API (long-connection)
+```
+┌─────────────┐     WebSocket    ┌──────────────┐     LangChain    ┌─────────────┐
+│   Feishu    │ ◄──────────────► │   FastAPI    │ ◄──────────────► │   LLM API   │
+│  (client)   │                  │   (server)   │                  │  (OpenAI)   │
+└─────────────┘                  └──────────────┘                  └─────────────┘
+                                        │
+                                        ▼
+                                 ┌─────────────┐
+                                 │ Claude Code │
+                                 │    (PTY)    │
+                                 └─────────────┘
+```
+
+**Components:**
+
+| Module | Purpose |
+|--------|---------|
+| `main.py` | FastAPI entry point, starts WebSocket client + session manager |
+| `bot/handler.py` | Receives Feishu events, dispatches to orchestrator |
+| `bot/feishu.py` | Sends replies back to Feishu chats |
+| `orchestrator/agent.py` | LangChain agent with per-user history + active session tracking |
+| `orchestrator/tools.py` | Tools: `create_conversation`, `send_to_conversation`, `close_conversation` |
+| `agent/manager.py` | Session registry with idle timeout reaper |
+| `agent/pty_process.py` | Runs `claude -p` headlessly, manages session continuity |
+
+**Flow:** User message → Feishu WebSocket → Handler → Orchestrator (LLM decides action) → Tool → Session Manager → Claude Code PTY → Response back to Feishu
 
 ## Setup
 
-### 1. Feishu App
-Create app at https://open.feishu.cn:
-- Enable **Bot** capability
-- Enable **long-connection event subscription** (no public URL needed)
-- Get `FEISHU_APP_ID` and `FEISHU_APP_SECRET`
+1. **Feishu App**: Create at https://open.feishu.cn
+   - Enable Bot capability + long-connection event subscription
+   - Get `FEISHU_APP_ID` and `FEISHU_APP_SECRET`
 
-### 2. LLM Endpoint
-Configure OpenAI-compatible endpoint:
-- `OPENAI_BASE_URL`
-- `OPENAI_API_KEY`
-- `OPENAI_MODEL`
+2. **LLM Endpoint**: Configure OpenAI-compatible endpoint
+   - `OPENAI_BASE_URL`, `OPENAI_API_KEY`, `OPENAI_MODEL`
 
-### 3. Claude Code CLI
-- Install and authenticate `claude` command
-- Ensure available in PATH
+3. **Claude Code CLI**: Install and authenticate `claude` command
 
-### 4. Configuration
-```bash
-cp keyring.example.yaml keyring.yaml
-# Edit keyring.yaml with your credentials
-```
+4. **Configuration**:
+   ```bash
+   cp keyring.example.yaml keyring.yaml
+   # Edit keyring.yaml with your credentials
+   ```
 
-### 5. Run
-```bash
-pip install -r requirements.txt
-python main.py
-```
+5. **Run**:
+   ```bash
+   pip install -r requirements.txt
+   python main.py
+   ```
 
-## Requirements
-
-| Item | Notes |
-|---|---|
-| Python 3.11+ | Required |
-| Feishu App | Bot + long-connection enabled |
-| OpenAI-compatible LLM | API endpoint and key |
-| Claude Code CLI | Installed + authenticated |
+**Requirements**: Python 3.11+
diff --git a/ROADMAP.md b/ROADMAP.md
new file mode 100644
index 0000000..56ef465
--- /dev/null
+++ b/ROADMAP.md
@@ -0,0 +1,93 @@
+# PhoneWork Roadmap
+
+Issues observed in real usage, grouped by impact. No priority order within each phase.
+
+---
+
+## Phase 1 — Core Reliability
+
+These are friction points that directly break or degrade the basic send-message → get-reply loop.
+
+### 1.1 Long output splitting
+**Problem:** Feishu truncates messages at 4000 chars. Long code output is silently cut.
+**Fix:** Automatically split into multiple sequential messages with `[1/3]`, `[2/3]` headers.
+
+### 1.2 Concurrent message handling
+**Problem:** If the user sends two messages quickly, both fire `agent.run()` simultaneously for the same user, causing race conditions in `_active_conv` and interleaved `--resume` calls to the same CC session.
+**Fix:** Per-user async lock (or queue) so messages process one at a time per user.
+
+### 1.3 Session persistence across restarts
+**Problem:** `manager._sessions` is in-memory. A server restart loses all active sessions. Users have to recreate them.
+**Fix:** Persist `{conv_id, cwd, cc_session_id}` to a JSON file on disk; reload on startup.
+
+### 1.4 Mail boy passthrough mode
+**Problem:** The mail boy (GLM) sometimes paraphrases or summarizes instead of relaying verbatim, losing code blocks and exact output.
+**Fix:** Bypass the mail boy entirely for follow-up messages — detect that there's an active session and call `manager.send()` directly without an LLM round-trip.
+
+---
+
+## Phase 2 — Better Interaction Model
+
+Reducing the number of messages needed to get things done.
+
+### 2.1 Slash commands
+**Problem:** Users must phrase everything as natural language for the mail boy to interpret.
+**Fix:** Recognize a small set of commands directly in `handler.py` before hitting the LLM:
+- `/new ` — create session
+- `/list` — list sessions
+- `/close` — close active session
+- `/switch ` — switch active session by number
+- `/retry` — resend last message to CC
+
+### 2.2 Multi-session switching
+**Problem:** Only one "active session" per user. To switch projects, the user must remember conv_ids.
+**Fix:** `/list` shows numbered sessions; `/switch 2` activates session #2. The system prompt shows all open sessions, not just the active one.
+
+### 2.3 Feishu message cards
+**Problem:** Plain text is hard to scan — code blocks, file paths, and status info all look the same.
+**Fix:** Use Feishu Interactive Cards (`msg_type: interactive`) to render:
+- Session status as a structured card (project name, cwd, session ID)
+- Action buttons: **Continue**, **Close session**, **Run again**
+
+---
+
+## Phase 3 — Operational Quality
+
+Making it reliable enough to leave running 24/7.
+
+### 3.1 Health check improvements
+**Problem:** `/health` only reports session count. No way to know if the Feishu WS connection is alive, or if CC is callable.
+**Fix:** Add to `/health`:
+- WebSocket connection status
+- Last message received timestamp
+- A `claude -p "ping"` smoke test result
+
+### 3.2 Automatic reconnection
+**Problem:** The Feishu WebSocket thread is a daemon — if it dies silently (network blip), no messages are received and there's no recovery.
+**Fix:** Wrap `ws_client.start()` in a retry loop with exponential backoff and log reconnection events.
+
+### 3.3 Per-session timeout configuration
+**Problem:** All sessions share a 30-min idle timeout and 300s CC timeout. Long-running tasks (e.g. running tests) may need more; quick chats need less.
+**Fix:** Allow per-session timeout overrides; expose via `/new --timeout 600`.
+
+### 3.4 Audit log
+**Problem:** No record of what was sent to Claude Code or what it did. Impossible to debug after the fact.
+**Fix:** Append each `(timestamp, conv_id, prompt, response)` to a JSONL file per session under the project directory.
+
+---
+
+## Phase 4 — Multi-user & Security
+
+For sharing the bot with teammates.
+
+### 4.1 User allowlist
+**Problem:** Anyone who can message the bot can run arbitrary code via Claude Code.
+**Fix:** `ALLOWED_OPEN_IDS` list in `keyring.yaml`; reject messages from unknown users.
+
+### 4.2 Per-user session isolation
+**Problem:** All users share the same `manager` singleton — user A could theoretically send to user B's session by guessing a conv_id.
+**Fix:** Namespace sessions by user_id; `send_to_conversation` validates that the requesting user owns the session.
+
+### 4.3 Working directory sandboxing
+**Problem:** The safety check in `_resolve_dir` blocks paths outside `WORKING_DIR`, but Claude Code itself runs with `--dangerously-skip-permissions` and can write anywhere.
+**Fix:** Consider running CC in a restricted user account or container; or drop `--dangerously-skip-permissions` and implement a permission-approval flow via Feishu buttons.
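
Roadmap item 1.1 (long output splitting) could look roughly like the following. This is a minimal Python sketch, not code from this repository: `split_for_feishu` and `max_len` are hypothetical names, and the 4000-character limit is the figure quoted in the roadmap, not verified here. The sketch cuts on character count only, so a cut can land inside a code block.

```python
def split_for_feishu(text: str, max_len: int = 4000) -> list[str]:
    """Split text into parts of at most max_len characters, each with an "[i/n]" header.

    Hypothetical helper: the name and the default limit are assumptions.
    """
    if len(text) <= max_len:
        return [text]  # short messages are sent unchanged, without a header

    # The header length depends on how many digits n has, and n depends on
    # how much room the header leaves for the body, so iterate until stable.
    total = 2
    while True:
        body_len = max_len - len(f"[{total}/{total}]\n")  # longest header for this total
        if body_len <= 0:
            raise ValueError("max_len is too small to fit a header and any text")
        needed = -(-len(text) // body_len)  # ceiling division
        if needed <= total:
            break
        total = needed

    chunks = [text[i:i + body_len] for i in range(0, len(text), body_len)]
    return [f"[{i}/{len(chunks)}]\n{chunk}" for i, chunk in enumerate(chunks, start=1)]
```

Each returned part would then be sent as its own message, in order. Reserving room for the header inside `max_len` keeps every part under the limit, including the header itself.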
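
The per-user lock proposed in roadmap item 1.2 could be sketched as below. This is an illustration under assumptions, not the project's handler: `handle_message`, `_user_locks`, and the `run_agent` callable are hypothetical names standing in for whatever wraps `agent.run()`.

```python
import asyncio
from collections import defaultdict
from typing import Awaitable, Callable

# One lock per user, created lazily the first time that user sends a message.
# (Hypothetical module-level registry; a real service might also evict idle entries.)
_user_locks: defaultdict[str, asyncio.Lock] = defaultdict(asyncio.Lock)


async def handle_message(
    user_id: str,
    text: str,
    run_agent: Callable[[str, str], Awaitable[str]],
) -> str:
    """Process messages one at a time per user; different users still run concurrently."""
    async with _user_locks[user_id]:
        # Only one call per user is inside this block at any moment, so state
        # such as the active conversation cannot be touched by two runs at once.
        return await run_agent(user_id, text)
```

A second message from the same user waits until the first finishes. `asyncio.Lock` wakes waiters in the order they arrived, so replies come back in send order.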
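
Roadmap items 4.1 and 4.2 (allowlist plus session ownership) could be combined in one check, sketched below. Everything here is an assumption for illustration: `SessionRegistry`, `create`, and `check_send` are hypothetical names, and the real `manager` may store sessions differently. The allowlist would be loaded from `ALLOWED_OPEN_IDS` in `keyring.yaml`.

```python
from dataclasses import dataclass, field


@dataclass
class SessionRegistry:
    """Hypothetical registry: tracks which user owns each conversation."""

    allowed_open_ids: set[str]
    _owners: dict[str, str] = field(default_factory=dict)  # conv_id -> owner open_id

    def _check_allowed(self, open_id: str) -> None:
        if open_id not in self.allowed_open_ids:
            raise PermissionError(f"user {open_id!r} is not on the allowlist")

    def create(self, open_id: str, conv_id: str) -> None:
        """Record a new conversation as owned by open_id."""
        self._check_allowed(open_id)
        self._owners[conv_id] = open_id

    def check_send(self, open_id: str, conv_id: str) -> None:
        """Raise PermissionError unless open_id is allowed and owns conv_id."""
        self._check_allowed(open_id)
        # Unknown conv_ids and conv_ids owned by someone else get the same error,
        # so guessing an ID does not reveal whether it exists.
        if self._owners.get(conv_id) != open_id:
            raise PermissionError("conversation not found for this user")
```

Calling `check_send` at the top of `send_to_conversation` would reject both unknown users and users guessing another user's conv_id before anything reaches Claude Code.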