diff --git a/README.md b/README.md
index 899b74b..a080826 100644
--- a/README.md
+++ b/README.md
@@ -4,45 +4,54 @@ Feishu bot integration with Claude Code CLI.
 
 ## Architecture
 
-- **Agent CLI**: Claude Code (print mode)
-- **Chat Server**: FastAPI
-- **Client**: Feishu bot API (long-connection)
+```
+┌─────────────┐    WebSocket     ┌──────────────┐    LangChain     ┌─────────────┐
+│   Feishu    │ ◄──────────────► │   FastAPI    │ ◄──────────────► │  LLM API    │
+│   (client)  │                  │   (server)   │                  │ (OpenAI)    │
+└─────────────┘                  └──────────────┘                  └─────────────┘
+                                        │
+                                        ▼
+                                 ┌─────────────┐
+                                 │ Claude Code │
+                                 │   (PTY)     │
+                                 └─────────────┘
+```
+
+**Components:**
+
+| Module | Purpose |
+|--------|---------|
+| `main.py` | FastAPI entry point, starts WebSocket client + session manager |
+| `bot/handler.py` | Receives Feishu events, dispatches to orchestrator |
+| `bot/feishu.py` | Sends replies back to Feishu chats |
+| `orchestrator/agent.py` | LangChain agent with per-user history + active session tracking |
+| `orchestrator/tools.py` | Tools: `create_conversation`, `send_to_conversation`, `close_conversation` |
+| `agent/manager.py` | Session registry with idle timeout reaper |
+| `agent/pty_process.py` | Runs `claude -p` headlessly, manages session continuity |
+
+**Flow:** User message → Feishu WebSocket → Handler → Orchestrator (LLM decides action) → Tool → Session Manager → Claude Code PTY → Response back to Feishu
 
 ## Setup
 
-### 1. Feishu App
-Create app at https://open.feishu.cn:
-- Enable **Bot** capability
-- Enable **long-connection event subscription** (no public URL needed)
-- Get `FEISHU_APP_ID` and `FEISHU_APP_SECRET`
+1. **Feishu App**: Create at https://open.feishu.cn
+   - Enable Bot capability + long-connection event subscription
+   - Get `FEISHU_APP_ID` and `FEISHU_APP_SECRET`
 
-### 2. LLM Endpoint
-Configure OpenAI-compatible endpoint:
-- `OPENAI_BASE_URL`
-- `OPENAI_API_KEY`
-- `OPENAI_MODEL`
+2. **LLM Endpoint**: Configure OpenAI-compatible endpoint
+   - `OPENAI_BASE_URL`, `OPENAI_API_KEY`, `OPENAI_MODEL`
 
-### 3. Claude Code CLI
-- Install and authenticate `claude` command
-- Ensure available in PATH
+3. **Claude Code CLI**: Install and authenticate `claude` command
 
-### 4. Configuration
-```bash
-cp keyring.example.yaml keyring.yaml
-# Edit keyring.yaml with your credentials
-```
+4. **Configuration**:
+   ```bash
+   cp keyring.example.yaml keyring.yaml
+   # Edit keyring.yaml with your credentials
+   ```
 
-### 5. Run
-```bash
-pip install -r requirements.txt
-python main.py
-```
+5. **Run**:
+   ```bash
+   pip install -r requirements.txt
+   python main.py
+   ```
 
-## Requirements
-
-| Item | Notes |
-|---|---|
-| Python 3.11+ | Required |
-| Feishu App | Bot + long-connection enabled |
-| OpenAI-compatible LLM | API endpoint and key |
-| Claude Code CLI | Installed + authenticated |
+**Requirements**: Python 3.11+
diff --git a/ROADMAP.md b/ROADMAP.md
new file mode 100644
index 0000000..56ef465
--- /dev/null
+++ b/ROADMAP.md
@@ -0,0 +1,93 @@
+# PhoneWork Roadmap
+
+Issues observed in real usage, grouped by impact. No priority order within each phase.
+
+---
+
+## Phase 1 — Core Reliability
+
+These are friction points that directly break or degrade the basic send-message → get-reply loop.
+
+### 1.1 Long output splitting
+**Problem:** Feishu truncates messages at 4000 chars. Long code output is silently cut.
+**Fix:** Automatically split into multiple sequential messages with `[1/3]`, `[2/3]` headers.
+
+### 1.2 Concurrent message handling
+**Problem:** If the user sends two messages in quick succession, both fire `agent.run()` simultaneously for the same user, causing race conditions in `_active_conv` and interleaved `--resume` calls to the same Claude Code (CC) session.
+**Fix:** Per-user async lock (or queue) so messages process one at a time per user.
+
+### 1.3 Session persistence across restarts
+**Problem:** `manager._sessions` is in-memory. A server restart loses all active sessions. Users have to recreate them.
+**Fix:** Persist `{conv_id, cwd, cc_session_id}` to a JSON file on disk; reload on startup.
+
+### 1.4 Mail boy passthrough mode
+**Problem:** The mail boy (the orchestrator LLM, GLM) sometimes paraphrases or summarizes instead of relaying verbatim, losing code blocks and exact output.
+**Fix:** Bypass the mail boy entirely for follow-up messages — detect that there's an active session and call `manager.send()` directly without an LLM round-trip.
+
+---
+
+## Phase 2 — Better Interaction Model
+
+Reducing the number of messages needed to get things done.
+
+### 2.1 Slash commands
+**Problem:** Users must phrase everything as natural language for the mail boy to interpret.
+**Fix:** Recognize a small set of commands directly in `handler.py` before hitting the LLM:
+- `/new <dir>` — create session
+- `/list` — list sessions
+- `/close` — close active session
+- `/switch <n>` — switch active session by number
+- `/retry` — resend last message to CC
+
+### 2.2 Multi-session switching
+**Problem:** Only one "active session" per user. To switch projects, the user must remember each session's `conv_id`.
+**Fix:** `/list` shows numbered sessions; `/switch 2` activates session #2. The system prompt shows all open sessions, not just the active one.
+
+### 2.3 Feishu message cards
+**Problem:** Plain text is hard to scan — code blocks, file paths, and status info all look the same.
+**Fix:** Use Feishu Interactive Cards (`msg_type: interactive`) to render:
+- Session status as a structured card (project name, cwd, session ID)
+- Action buttons: **Continue**, **Close session**, **Run again**
+
+---
+
+## Phase 3 — Operational Quality
+
+Making it reliable enough to leave running 24/7.
+
+### 3.1 Health check improvements
+**Problem:** `/health` only reports the session count. There is no way to know whether the Feishu WebSocket connection is alive or whether Claude Code is callable.
+**Fix:** Add to `/health`:
+- WebSocket connection status
+- Last message received timestamp
+- A `claude -p "ping"` smoke test result
+
+### 3.2 Automatic reconnection
+**Problem:** The Feishu WebSocket thread is a daemon — if it dies silently (network blip), no messages are received and there's no recovery.
+**Fix:** Wrap `ws_client.start()` in a retry loop with exponential backoff and log reconnection events.
+
+### 3.3 Per-session timeout configuration
+**Problem:** All sessions share a 30-minute idle timeout and a 300-second Claude Code timeout. Long-running tasks (e.g. running tests) may need more; quick chats need less.
+**Fix:** Allow per-session timeout overrides; expose via `/new <dir> --timeout 600`.
+
+### 3.4 Audit log
+**Problem:** No record of what was sent to Claude Code or what it did. Impossible to debug after the fact.
+**Fix:** Append each `(timestamp, conv_id, prompt, response)` to a JSONL file per session under the project directory.
+
+---
+
+## Phase 4 — Multi-user & Security
+
+For sharing the bot with teammates.
+
+### 4.1 User allowlist
+**Problem:** Anyone who can message the bot can run arbitrary code via Claude Code.
+**Fix:** `ALLOWED_OPEN_IDS` list in `keyring.yaml`; reject messages from unknown users.
+
+### 4.2 Per-user session isolation
+**Problem:** All users share the same `manager` singleton, so user A could send to user B's session by guessing a `conv_id`.
+**Fix:** Namespace sessions by user_id; `send_to_conversation` validates that the requesting user owns the session.
+
+### 4.3 Working directory sandboxing
+**Problem:** The safety check in `_resolve_dir` blocks paths outside `WORKING_DIR`, but Claude Code itself runs with `--dangerously-skip-permissions` and can write anywhere.
+**Fix:** Consider running CC in a restricted user account or container; or drop `--dangerously-skip-permissions` and implement a permission-approval flow via Feishu buttons.