chore: initial monorepo scaffold + WDS Phase 1+2 artifacts

- Nx 22.7 monorepo (pnpm 11.1, TypeScript 5.9, Node 24)
- apps/api: NestJS 11 (CJS, per CODING-RULES.md PGD-DB-004)
- apps/web: React 19 + Vite 8 (ESM)
- libs/shared/api-interface: Zod contract base
- Docker Compose dev: Postgres 18, Valkey 8, MinIO, Mailpit
- WDS artifacts:
  - design-artifacts/A-Product-Brief/ (5 canonical docs + 16 dialogs)
  - design-artifacts/B-Trigger-Map/ (hub + 4 personas + feature impact)
- Stack canon: STACK.md v2.2 + CODING-RULES.md v2.0 + brand.md
- AGENTS.md + README.md as the entry point for devs/agents

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 14:34:20 +00:00
commit 17c08e6392
3631 changed files with 855518 additions and 0 deletions
--- a/.agents/skills/bmad-story-automator/data/adaptive-retry.md
+++ b/.agents/skills/bmad-story-automator/data/adaptive-retry.md
@@ -0,0 +1,102 @@
+# Adaptive Retry Strategy
+
+**Purpose:** Handle dev-story failures intelligently based on progress patterns and agent switching.
+
+**Version:** 2.0.0
+
+**See also:** `retry-fallback-strategy.md` for the universal retry/fallback pattern.
+
+---
+
+## Agent Alternation
+
+This strategy works WITH the retry-fallback pattern:
+- Odd attempts (1, 3, 5): Use primary agent
+- Even attempts (2, 4): Use fallback agent (if configured)
+- Plateau detection applies ACROSS agents: stalling at the same task with both agents indicates a complexity issue, not an agent issue
+
+---
+
+## Progress Tracking
+
+Track failure patterns across retries (per agent):
+
+```
+attempt_1_progress = {agent: primary, tasks: 5/9}
+attempt_2_progress = {agent: fallback, tasks: 4/9}
+attempt_3_progress = {agent: primary, tasks: 5/9}  # same as attempt 1
+attempt_4_progress = {agent: fallback, tasks: 5/9} # plateau detected
+attempt_5_progress = {agent: primary, tasks: 5/9}  # confirmed plateau
+```
+
+---
+
+## Decision Logic
+
+| Attempt | Condition | Action |
+|---------|-----------|--------|
+| 1 | FAILURE | Switch to fallback agent, retry |
+| 2 | FAILURE, progress > attempt_1 | Switch back to primary, retry with 2x poll interval |
+| 2 | FAILURE, progress ≤ attempt_1 | Switch back to primary, analyze if same plateau point |
+| 3 | FAILURE, plateau at same task (any agent) | Continue to attempt 4 (confirm with other agent) |
+| 4 | FAILURE, plateau confirmed across agents | **DEFER** story (complexity/context limit hit) |
+| 4 | FAILURE, variable progress | One more retry with extended timeout |
+| 5 | FAILURE, plateau confirmed | **DEFER** story |
+| 5 | FAILURE, zero progress all attempts | **ESCALATE** (likely API/connection issue) |
+| 5 | FAILURE, variable but incomplete | **ESCALATE** (all retries exhausted) |
+
+---
+
+## Plateau Detection
+
+If `tasks_completed` is identical across 2+ attempts AND the session crashed/stopped at the same task, this indicates a complexity or context limit.
+
+**Indicators:**
+- Same task number across multiple attempts
+- Session crashes at same point
+- No progress despite retries
+
+**Action:** Mark story as "deferred" and continue with next story.
+
+---
+
+## DEFER Action
+
+When a story is deferred (not failed):
+
+1. **Update state:** Mark story as "deferred" in progress table
+2. **Log:** "Story {N} deferred - dev-story hit complexity limit at {tasks_completed}/{tasks_total}"
+3. **Continue:** Proceed to next story (do not escalate to user unless custom instructions say otherwise)
+
+**Why defer vs fail?**
+- Deferred stories can be revisited manually
+- Doesn't block automation of remaining stories
+- Distinguishes from actual errors (API failures, etc.)
+
+---
+
+## Integration with Crash Recovery
+
+Adaptive retry works WITH crash recovery AND agent fallback:
+
+| Type | Trigger | Handling |
+|------|---------|----------|
+| **Adaptive Retry** | Session completed but FAILED (wrong output, tests failed) | Progress-based retry with agent alternation |
+| **Crash Recovery** | Session DIED unexpectedly (context limit, API error, kill) | Switch agent, retry with new session |
+| **Agent Fallback** | Primary agent fails | Automatic switch to fallback agent on next attempt |
+
+All three mechanisms work together:
+1. Primary crashes → switch to fallback, new session
+2. Fallback fails at task 5 → switch to primary, retry
+3. Primary fails at task 5 → plateau detected across agents → DEFER
+
+**Single attempt counter across all failure types.**
+
+---
+
+## Network Error Handling
+
+On network-related failures (see `retry-fallback-strategy.md`):
+- Sleep 60 seconds before next attempt
+- Network errors do NOT count toward plateau detection
+- Always retry after network error (up to max attempts)
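
A minimal sketch, not part of the committed file: the alternation rule, plateau check, and decision table in `adaptive-retry.md` above could be expressed roughly as the Bash functions below. The names `agent_for_attempt`, `classify_progress`, and `next_action` are hypothetical and are not `story-automator` subcommands. Two simplifications are assumed: a plateau is taken to be two consecutive identical `tasks_completed` counts (the document additionally requires the session to stop at the same task), and the caller is expected to leave network-error attempts out of the history.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not story-automator subcommands.

# Odd attempts use the primary agent; even attempts use the fallback if one is configured.
agent_for_attempt() {
  local attempt=$1 primary=$2 fallback=${3:-false}
  if (( attempt % 2 == 0 )) && [[ $fallback != false ]]; then
    echo "$fallback"
  else
    echo "$primary"
  fi
}

# Classify a failure history (one tasks_completed count per failed attempt).
# Prints: zero | plateau | variable
classify_progress() {
  local -a counts=("$@")
  local n=$# c all_zero=1
  if (( n == 0 )); then echo variable; return; fi
  for c in "${counts[@]}"; do
    (( c == 0 )) || all_zero=0
  done
  if (( all_zero )); then
    echo zero
  elif (( n >= 2 )) && (( counts[n-1] == counts[n-2] )); then
    # Assumption: two consecutive identical counts are treated as a plateau.
    echo plateau
  else
    echo variable
  fi
}

# Map (attempt number, failure history) to RETRY | DEFER | ESCALATE.
next_action() {
  local attempt=$1 pattern
  shift
  pattern=$(classify_progress "$@")
  if (( attempt >= 5 )); then
    if [[ $pattern == plateau ]]; then echo DEFER; else echo ESCALATE; fi
  elif (( attempt == 4 )) && [[ $pattern == plateau ]]; then
    echo DEFER
  else
    echo RETRY
  fi
}

# History from the Progress Tracking example, evaluated after attempt 4.
next_action 4 5 4 5 5   # prints DEFER
```

Keeping the classification separate from the action mapping means the plateau rule can be tightened (for example, by also comparing the failing task number) without touching the table logic.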
--- a/.agents/skills/bmad-story-automator/data/agent-config-presets.json
+++ b/.agents/skills/bmad-story-automator/data/agent-config-presets.json
@@ -0,0 +1,4 @@
+{
+  "version": "1.0.0",
+  "presets": []
+}
--- a/.agents/skills/bmad-story-automator/data/agent-config-prompts.md
+++ b/.agents/skills/bmad-story-automator/data/agent-config-prompts.md
@@ -0,0 +1,199 @@
+# Agent Configuration Prompts
+
+---
+
+## 🚨 PREREQUISITE (MUST BE MET BEFORE DISPLAYING)
+
+Before showing agent configuration prompts, you MUST have:
+
+1. ✅ **Complexity Matrix displayed** - User has seen the story complexity breakdown
+2. ✅ **`stories_json` populated** - Programmatic complexity data from `scripts/story-automator parse-story --rules`
+3. ✅ **Complexity summary available** - Counts of Low/Medium/High stories
+
+**If these are not met, DO NOT proceed with agent configuration. Go back and complete step 3.**
+
+---
+
+## Agent Configuration Display (v6.0.0)
+
+**IMPORTANT:** This prompt MUST reference the actual complexity data. Do not show generic prompts.
+
+**IMPORTANT:** Select the correct table variant based on `skip_automate`:
+- If `skip_automate` is **false**: show the **WITH auto** table
+- If `skip_automate` is **true**: show the **WITHOUT auto** table
+
+**IMPORTANT:** Before displaying options, check for saved presets:
+```bash
+presets_result=$("{buildStateDoc}" agent-config list --file "{agentConfigPresets}")
+preset_count=$(echo "$presets_result" | jq -r '.count')
+```
+- If `preset_count > 0`: include **[L]oad saved** option in the menu
+- If `preset_count == 0`: omit [L] option (show only S/U/C)
+
+### Variant A: WITH auto column (skip_automate=false)
+
+```
+**AI Agent Configuration (Based on Your Complexity Analysis)**
+
+Your stories by complexity:
+- Low: {low_count} stories
+- Medium: {medium_count} stories
+- High: {high_count} stories
+
+**Agent Details:**
+- **Claude:** `claude --dangerously-skip-permissions` + natural language skill prompt
+- **Codex:** `codex exec --full-auto` + natural language prompt (no command prefix)
+
+**Suggested Complexity-Based Configuration:**
+
+| Complexity | create | dev | auto | review | Rationale |
+|------------|--------|-----|------|--------|-----------|
+| Low | claude | claude | claude | claude | Claude handles simple tasks well |
+| Medium | codex | codex | codex | codex | Codex for moderate complexity (Claude fallback) |
+| High | codex | codex | codex | codex | Codex for complex work (Claude fallback) |
+| Retro | inherits default | - | - | - | Retrospectives follow the configured primary agent unless overridden |
+
+**Options:**
+1. **[S]uggested** - Apply complexity-based defaults above
+2. **[U]niform** - Same agent for ALL stories (you specify which)
+3. **[C]ustom** - Define your own per-complexity or per-task settings
+{IF_PRESETS}4. **[L]oad saved** - Use a previously saved configuration{END_IF_PRESETS}
+
+Enter choice ({IF_PRESETS}S/U/C/L{ELSE}S/U/C{END_IF_PRESETS}) or provide custom overrides:
+```
+
+**Conditional display rule:** `{IF_PRESETS}` blocks render only when `preset_count > 0`.
+
+### Variant B: WITHOUT auto column (skip_automate=true)
+
+```
+**AI Agent Configuration (Based on Your Complexity Analysis)**
+
+Your stories by complexity:
+- Low: {low_count} stories
+- Medium: {medium_count} stories
+- High: {high_count} stories
+
+**Agent Details:**
+- **Claude:** `claude --dangerously-skip-permissions` + natural language skill prompt
+- **Codex:** `codex exec --full-auto` + natural language prompt (no command prefix)
+
+**Suggested Complexity-Based Configuration:**
+
+| Complexity | create | dev | review | Rationale |
+|------------|--------|-----|--------|-----------|
+| Low | claude | claude | claude | Claude handles simple tasks well |
+| Medium | codex | codex | codex | Codex for moderate complexity (Claude fallback) |
+| High | codex | codex | codex | Codex for complex work (Claude fallback) |
+| Retro | inherits default | - | - | Retrospectives follow the configured primary agent unless overridden |
+
+**Options:**
+1. **[S]uggested** - Apply complexity-based defaults above
+2. **[U]niform** - Same agent for ALL stories (you specify which)
+3. **[C]ustom** - Define your own per-complexity or per-task settings
+{IF_PRESETS}4. **[L]oad saved** - Use a previously saved configuration{END_IF_PRESETS}
+
+Enter choice ({IF_PRESETS}S/U/C/L{ELSE}S/U/C{END_IF_PRESETS}) or provide custom overrides:
+```
+
+## Load Saved Preset Prompt (Option L)
+
+**Prerequisite:** `preset_count > 0` (checked before displaying main menu).
+
+```bash
+presets_result=$("{buildStateDoc}" agent-config list --file "{agentConfigPresets}")
+```
+
+Display:
+```
+**Saved Agent Configurations:**
+
+{numbered list from presets_result, e.g.:}
+1. all-claude (saved 2026-03-10)
+2. codex-heavy (saved 2026-03-08)
+
+[D]elete a preset
+
+Enter preset number to load, or [B]ack to return to options:
+```
+
+**Wait.**
+
+**IF number selected:**
+```bash
+preset_name="{selected preset name}"
+loaded=$("{buildStateDoc}" agent-config load --file "{agentConfigPresets}" --name "$preset_name")
+agent_config_json=$(echo "$loaded" | jq -r '.config')
+```
+Display loaded config summary, then proceed with this as `agent_config_json`.
+
+**IF D selected:**
+Ask which preset number to delete, then:
+```bash
+"{buildStateDoc}" agent-config delete --file "{agentConfigPresets}" --name "$delete_name"
+```
+Redisplay this prompt (or return to main options if no presets remain).
+
+**IF B selected:** Return to main S/U/C/L menu.
+
+---
+
+## Save Configuration Prompt
+
+**When to show:** After the user completes a **[C]ustom** or **[U]niform** configuration (NOT after [S]uggested or [L]oad).
+
+```
+**Save this configuration for future runs?**
+
+Enter a name to save (e.g., `all-claude`, `codex-heavy`) or [N]o to skip:
+```
+
+**Wait.**
+
+**IF name provided:**
+```bash
+"{buildStateDoc}" agent-config save --file "{agentConfigPresets}" --name "$save_name" --config-json "$agent_config_json"
+```
+Display: "Configuration saved as **{save_name}**."
+
+**IF N or empty:** Skip, continue.
+
+---
+
+## Uniform Agent Prompt (Option U)
+
+```
+**Uniform Agent Configuration**
+
+Use the same agent for ALL {total_count} stories regardless of complexity.
+
+Which agent for all tasks?
+- `claude` - Claude for everything (more capable, slower)
+- `codex` - Codex for everything (faster, simpler)
+- `claude, false` - Claude only, no fallback
+- `codex, claude` - Codex primary, Claude fallback
+
+Enter agent config:
+```
+
+## Custom Configuration Prompt (Option C)
+
+```
+**Custom Agent Configuration**
+
+Define agents per complexity level and/or per task.
+
+**Per-Complexity Format:** `complexity.task: primary, fallback`
+- `low.dev: claude, false` → Claude for low-complexity dev, no fallback
+- `medium.create: codex, claude` → Codex for medium-complexity create
+- `high.review: claude, false` → Claude for high-complexity review
+
+**Per-Task Format (applies to all complexities):** `task: primary, fallback`
+- `review: claude, false` → Claude for ALL reviews
+- `dev: codex, claude` → Codex for ALL dev
+
+**Complexity levels:** low, medium, high
+**Tasks:** create, dev, auto, review
+
+Enter overrides (comma-separated):
+```
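
A minimal sketch, not part of the committed file: one way to parse a single override entry in the `complexity.task: primary, fallback` and `task: primary, fallback` formats described in the Custom Configuration prompt above. `parse_override` is a hypothetical name, not a `story-automator` subcommand. The sketch assumes that an omitted fallback means "no fallback" and reports per-task entries with the complexity `all`. How the tool splits several entries from one input line is not specified in the source and is not covered here.

```bash
#!/usr/bin/env bash
# Hypothetical parser for illustration; not a story-automator subcommand.

# Parse one override entry and print "<complexity> <task> <primary> <fallback>".
# Per-task entries (no complexity prefix) report the complexity as "all".
parse_override() {
  local entry=$1 key value complexity task primary fallback
  if [[ $entry != *:* ]]; then
    echo "missing ':' in override: $entry" >&2
    return 1
  fi
  key=${entry%%:*}
  value=${entry#*:}
  key=${key//[[:space:]]/}       # strip all whitespace
  value=${value//[[:space:]]/}

  if [[ $key == *.* ]]; then
    complexity=${key%%.*}
    task=${key#*.}
  else
    complexity=all
    task=$key
  fi

  primary=${value%%,*}
  if [[ $value == *,* ]]; then
    fallback=${value#*,}
  else
    fallback=false   # Assumption: an omitted fallback means "no fallback".
  fi

  case $complexity in
    all|low|medium|high) ;;
    *) echo "unknown complexity: $complexity" >&2; return 1 ;;
  esac
  case $task in
    create|dev|auto|review) ;;
    *) echo "unknown task: $task" >&2; return 1 ;;
  esac

  echo "$complexity $task $primary $fallback"
}

parse_override "low.dev: claude, false"   # prints: low dev claude false
parse_override "review: claude, false"    # prints: all review claude false
```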
--- a/.agents/skills/bmad-story-automator/data/agent-fallback-troubleshooting.md
+++ b/.agents/skills/bmad-story-automator/data/agent-fallback-troubleshooting.md
@@ -0,0 +1,180 @@
+# Agent Fallback Troubleshooting
+
+### Issue: Session spawns Claude instead of Codex
+
+**Symptoms:**
+- Output shows Claude-specific messages (e.g., "You've used 84% of your weekly limit")
+- Expected Codex but got Claude
+
+**Cause:** The `--agent` flag must be passed to `story-automator tmux-wrapper spawn`, not to `build-cmd`.
+
+**Correct Usage (v1.4.0+):**
+```bash
+# Method 1: Use --agent flag on spawn (RECOMMENDED)
+session=$("$scripts" tmux-wrapper spawn dev "$epic" "$story_id" \
+  --agent codex \
+  --command "$("$scripts" tmux-wrapper build-cmd dev "$story_id")")
+
+# Method 2: Set environment variable before spawn
+export AI_AGENT="codex"
+session=$("$scripts" tmux-wrapper spawn dev "$epic" "$story_id" \
+  --command "$("$scripts" tmux-wrapper build-cmd dev "$story_id")")
+```
+
+**Wrong Usage:**
+```bash
+# WRONG - this doesn't work
+session=$("$scripts" tmux-wrapper spawn dev "$epic" "$story_id" \
+  --command "$("$scripts" tmux-wrapper build-cmd dev "$story_id" --agent codex)")
+```
+
+### Issue: Monitor reports "stuck" but Codex is active
+
+**Symptoms:**
+- `story-automator monitor-session` returns `stuck` state after 4 polls
+- Manual inspection shows Codex still producing output (no prompt, output continues to grow)
+
+**Cause:** The monitoring script relied on marker detection instead of output freshness.
+
+**Fixed in v2.4.0:**
+- Output freshness tracking (no marker reliance)
+- `CODEX_OUTPUT_STALE_SECONDS` controls how long Codex can be silent before "stuck"
+- Codex still gets a 6-poll grace period before "stuck"
+
+**Verification:**
+```bash
+# Check if session has AI_AGENT set
+tmux show-environment -t "session-name" AI_AGENT
+
+# Manual session status check
+"$scripts" tmux-status-check "session-name" --project-root "$PWD"
+```
+
+### Issue: log command error when using --agent flag
+
+**Symptoms:**
+```
+log: Unknown subcommand 'Codex agent detected - applying 1.5x timeout (90min)'
+```
+
+**Cause:** macOS ships a `/usr/bin/log` system command. If the `log()` bash function wasn't defined before its first use, bash fell through to the system command.
+
+**Fixed in v1.4.0:** The `log()` function is now defined before argument parsing in `story-automator monitor-session`.
+
+### Issue: Manual polling required as workaround
+
+**If monitoring still fails**, use this manual polling approach:
+```bash
+for i in {1..60}; do
+    sleep 30
+    # Check if session still exists
+    if ! tmux has-session -t "session-name" 2>/dev/null; then
+        echo "Session ended"
+        break
+    fi
+    # Check for shell prompt (completion indicator)
+    last_line=$(tmux capture-pane -t "session-name" -p | tail -1)
+    if echo "$last_line" | grep -qE '❯$|\$$|#$'; then
+        echo "Session complete (shell prompt detected)"
+        break
+    fi
+done
+```
+
+### Issue: Codex sessions explore files but don't execute full workflow (v1.4.0)
+
+**Symptoms:**
+- Session output shows file exploration (`sed`, `rg`, `cat` commands)
+- No actual review findings or story updates
+- Sprint-status never changes from "review" to "done"
+- Session completes but workflow steps 1-5 weren't followed
+
+**Cause:** Codex uses natural language prompts and may not follow structured workflow instructions as reliably as Claude.
+
+**Mitigation strategies:**
+1. **Use Claude for code-review by default** - More reliable at following multi-step workflows
+2. **Add explicit step markers** - Tell Codex to output "STEP 1 COMPLETE", "STEP 2 COMPLETE" etc.
+3. **Verify after session** - Check story file Status field, not just sprint-status
+
+**Recommended agent configuration for deterministic reliability:**
+```yaml
+agentConfig:
+  defaultPrimary: "auto"
+  defaultFallback: false  # Disable global fallback; opt in per task
+  perTask:
+    # create-story: Either agent works well
+    create:
+      primary: "claude"
+    # dev-story: Either agent works, Codex may be faster for simple tasks
+    dev:
+      primary: "codex"
+      fallback: "claude"
+    # code-review: Claude recommended - more reliable at following workflow
+    review:
+      primary: "claude"
+      fallback: false
+```
+
+### Issue: Code-review doesn't update sprint-status.yaml
+
+**Symptoms:**
+- Code-review session completes
+- Story file shows review was done (Dev Agent Record updated)
+- But sprint-status.yaml still shows "review" instead of "done"
+
+**Cause:** Step 5 of the code-review workflow updates sprint-status, but the session may not reach step 5, or it may use the wrong story key format.
+
+**Verification (v1.4.0):**
+```bash
+# Check story file status directly
+"$scripts" orchestrator-helper story-file-status 8.2
+
+# Compare with sprint-status
+"$scripts" orchestrator-helper sprint-status get "8-2-flipside-crypto-provider"
+
+# If story file shows "done" but sprint-status doesn't, manually sync:
+# Edit _bmad-output/implementation-artifacts/sprint-status.yaml and change "8-2-story-name: review" to "done"
+```
+
+### When to manually intervene
+
+**Intervene immediately if:**
+1. **5 code-review cycles with no progress** - Agent likely stuck in a loop
+2. **Story file shows "done" but sprint-status doesn't** - Sync issue, manual fix is faster
+3. **Tests passing but review keeps finding issues** - May be false positives
+4. **Codex sessions consistently incomplete** - Switch to Claude for that workflow
+
+**Steps for manual intervention:**
+```bash
+# 1. Check actual story status
+"$scripts" orchestrator-helper story-file-status {story_id}
+
+# 2. Run the project's test suite to verify code quality
+#    (npm test runs only if go test fails or is unavailable)
+go test ./src/... || npm test
+
+# 3. If tests pass, manually update sprint-status
+# Edit: _bmad-output/implementation-artifacts/sprint-status.yaml
+# Change: "8-2-story-name: review" to "8-2-story-name: done"
+
+# 4. Resume orchestration - it will see "done" and proceed to commit
+```
+
+### Debugging Agent Detection
+
+```bash
+# Check current agent type detection
+"$scripts" tmux-wrapper agent-type
+
+# Check what CLI command would be used
+"$scripts" tmux-wrapper agent-cli
+
+# Check what command prefix would be used
+"$scripts" tmux-wrapper skill-prefix
+
+# View session environment
+tmux show-environment -t "session-name"
+
+# Check story key normalization (v1.4.0)
+"$scripts" orchestrator-helper normalize-key "8.2"
+"$scripts" orchestrator-helper normalize-key "8-2-flipside-crypto-provider"
+```
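
A minimal sketch, not part of the committed file: the `/usr/bin/log` collision described in the troubleshooting notes above comes down to bash's command lookup order. Once a shell function named `log` is defined, bash resolves it before searching `PATH`. The function body below is illustrative only; the real `log()` in `story-automator monitor-session` may differ.

```bash
#!/usr/bin/env bash
# Illustrative only; the real log() in story-automator may differ.

# Define log() before anything calls it, so bash resolves the function
# rather than searching PATH (where macOS provides /usr/bin/log).
log() {
  printf '[%s] %s\n' "$(date +%H:%M:%S)" "$*" >&2
}

# Shell functions take precedence over executables found on PATH.
type -t log   # prints: function

log "Codex agent detected - applying 1.5x timeout (90min)"
```

Running `type -t log` near the top of a script is a cheap way to confirm that the function, not the system binary, will be called.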
--- a/.agents/skills/bmad-story-automator/data/agent-fallback.md
+++ b/.agents/skills/bmad-story-automator/data/agent-fallback.md
@@ -0,0 +1,136 @@
+# Agent Fallback Strategy (v3.0.0)
+
+**Multi-Agent Support:** The orchestrator can use Claude or Codex as AI coding agents, with automatic fallback on failure.
+
+## Configuration
+
+From state document (v3.0.0):
+```yaml
+agentConfig:
+  defaultPrimary: "auto"
+  defaultFallback: false
+  perTask:
+    dev:
+      primary: "codex"
+      fallback: "claude"
+  complexityOverrides:
+    low:
+      dev:
+        primary: "claude"
+        fallback: false
+```
+
+Agent selection is resolved via the deterministic agents file created in preflight:
+`_bmad-output/story-automator/agents/agents-{state_filename}.md`
+
+## Agent Differences
+
+| Agent | CLI | Prompt Style | Timeout | Todo Tracking |
+|-------|-----|--------------|---------|---------------|
+| Claude | `claude --dangerously-skip-permissions` | Natural language skill prompt | 60min | ☒/☐ checkboxes |
+| Codex | `codex exec --full-auto` | Natural language prompt | 90min (1.5x) | Not supported |
+
+**CRITICAL:** Both Claude and Codex prompts must name the skill/workflow to execute and include the story ID.
+
+The `story-automator tmux-wrapper build-cmd` function automatically generates the correct prompt format based on the `AI_AGENT` environment variable.
+
+**See `workflow-commands.md` for complete prompt templates.**
+
+## Fallback Behavior
+
+**When to fall back:**
+- Primary agent session crashes (non-zero exit)
+- Retries exhausted with primary agent
+- `fallback` is configured for the task and not disabled (`false`)
+
+**Fallback procedure:**
+1. Log: "Primary agent ({primary}) failed after {retries} attempts. Trying fallback ({fallback})..."
+2. Set environment: `AI_AGENT={fallback}`
+3. Respawn session with fallback agent
+4. Monitor as normal (timeouts auto-adjust based on agent type)
+5. If fallback also fails → CRITICAL escalation
+
+**Environment Variable:**
+```bash
+# Set before spawning session
+export AI_AGENT="codex"  # or "claude"
+
+# story-automator tmux-wrapper reads this automatically and generates correct prompt format
+session=$("$scripts" tmux-wrapper spawn dev {epic} {story_id} \
+  --command "$("$scripts" tmux-wrapper build-cmd dev {story_id})")
+```
+
+## Codex Monitoring Notes
+
+- **No todo checkboxes:** Codex doesn't use ☒/☐ - `todos_done` and `todos_total` will be 0
+- **Longer waits:** The status check script returns a 90s wait estimate for Codex (vs 60s for Claude)
+- **Different activity detection:** Uses output freshness + heartbeat (no marker reliance)
+- **Output staleness window:** `CODEX_OUTPUT_STALE_SECONDS` (default: 300)
+- **1.5x timeout multiplier:** `story-automator monitor-session` applies 1.5x multiplier when `--agent codex`
+- **Fake todo progress (v2.2):** When Codex is idle after activity, reports `1/1` to indicate "work done, needs verification"
+- **Idle vs Completed (v2.2):** Codex sessions report "idle" instead of "completed" when the CLI stops but no terminal markers are present
+
+## ⚠️ Codex Code-Review Limitations (v1.5.0)
+
+**CRITICAL: Codex is NOT recommended for code-review workflow.**
+
+### Known Issue: Sprint-Status Not Updated
+
+Codex code-review sessions often complete (CLI exits) WITHOUT updating `sprint-status.yaml` to "done". This causes:
+- Monitor reports "completed" but sprint-status unchanged
+- Orchestrator loops indefinitely, spawning new review cycles
+- 8+ cycles with 0 progress (observed in Story 8.2)
+
+### Root Cause
+
+Codex runs non-interactively via `codex exec`. When it finishes:
+1. Tmux session goes idle (no active CLI process)
+2. Monitor sees "idle" and marks as "completed"
+3. But workflow step 5 (update sprint-status) may not have executed
+4. Without a separate check, there is no way to verify that the workflow actually finished
+
+### Recommended Configuration
+
+```yaml
+agentConfig:
+  defaultPrimary: "codex"
+  defaultFallback: "claude"
+  perTask:
+    review:
+      primary: "claude"   # Never use Codex for code-review
+      fallback: false
+```
+
+### "incomplete" State (v2.2)
+
+The monitoring system now detects when Codex finishes but sprint-status wasn't updated:
+- `final_state: "completed"` → Verified: sprint-status shows "done"
+- `final_state: "incomplete"` → Session idle but sprint-status NOT "done"
+
+When "incomplete" is detected:
+- **Do NOT retry automatically** (prevents infinite loop)
+- Escalate to user with options:
+  1. Manual fix (update sprint-status yourself)
+  2. Run code-review with Claude
+  3. Skip this story
+
+### Verification Command (v2.2)
+
+Check if code-review actually completed:
+```bash
+"$scripts" orchestrator-helper verify-code-review {story_id}
+# Returns: {"verified":true/false, "sprint_status":"...", ...}
+```
+
+## Backwards Compatibility
+
+- If `agentConfig` is missing, the primary agent resolves from the active runtime provider and fallback is disabled
+- If `aiCommand` is set (legacy), use it directly with the generated natural language prompt
+- New orchestrations should use `agentConfig` instead of `aiCommand`
+- Agents file is authoritative when present
+
+---
+
+## Troubleshooting
+
+See `agent-fallback-troubleshooting.md` for detailed troubleshooting steps.
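
A minimal sketch, not part of the committed file: the two rules stated in `agent-fallback.md` above (a 1.5x timeout for Codex, and "completed" only when sprint-status shows "done") could be written as small Bash helpers. `timeout_minutes` and `review_final_state` are hypothetical names; the actual logic lives inside `story-automator monitor-session` and may be structured differently.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for illustration; not story-automator subcommands.

# Session timeout in minutes: Codex gets a 1.5x multiplier over the base.
timeout_minutes() {
  local agent=$1 base=${2:-60}
  if [[ $agent == codex ]]; then
    echo $(( base * 3 / 2 ))
  else
    echo "$base"
  fi
}

# A finished review session only counts as "completed" when sprint-status is "done".
review_final_state() {
  local sprint_status=$1
  if [[ $sprint_status == done ]]; then
    echo completed
  else
    echo incomplete
  fi
}

timeout_minutes claude      # prints 60
timeout_minutes codex       # prints 90
review_final_state review   # prints incomplete
```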
--- a/.agents/skills/bmad-story-automator/data/code-review-loop.md
+++ b/.agents/skills/bmad-story-automator/data/code-review-loop.md
@@ -0,0 +1,164 @@
+# Code Review Loop Pattern (v2.3)
+
+**Purpose:** Code review loop execution using script-based automation with per-task agent configuration.
+
+---
+
+## Configuration
+
+```
+reviewCycle = 1
+maxCycles = 5
+```
+
+---
+
+## Agent Selection (v3.0)
+
+Code-review uses **deterministic agent selection** from the agents file, same as all other workflow steps.
+
+```bash
+# Resolve agent for review task (uses agents file)
+resolve_agent_for_task "review" "$state_file" "{story_id}"
+review_agent="$primary_agent"
+review_fallback="$fallback_agent"
+
+echo "Code review using: primary=$review_agent, fallback=$review_fallback"
+```
+
+**Per-task override example in state document:**
+```yaml
+agentConfig:
+  defaultPrimary: "codex"
+  defaultFallback: "claude"
+  perTask:
+    review:
+      primary: "claude"      # Override: use Claude for reviews
+      fallback: false        # Disable fallback for reviews
+```
+
+**Note on Codex:** If Codex is configured for reviews and fails to update sprint-status, the `story-automator monitor-session --workflow review` verification catches this and returns `final_state: "incomplete"`, triggering the escalation path below.
+
+---
+
+## Loop Execution
+
+**WHILE reviewCycle ≤ maxCycles:**
+
+### 1. Spawn Review Session
+
+```bash
+scripts="$(printf "%s" "{project_root}/<installed-skill-root>/bmad-story-automator/scripts/story-automator")"
+# -x verifies the helper exists and is executable (-n would only test for a non-empty string)
+[ -x "$scripts" ] || { echo "story-automator helper not found" >&2; exit 1; }
+
+# ⚠️ CRITICAL: --command is REQUIRED - without it, no command runs → never_active failure!
+# Spawn with story-automator tmux-wrapper (handles naming, state cleanup, env vars)
+session_name=$("$scripts" tmux-wrapper spawn review {epic} {story_id} \
+  --agent "$review_agent" \
+  --cycle "$reviewCycle" \
+  --command "$("$scripts" tmux-wrapper build-cmd review {story_id} --agent "$review_agent" --state-file "$state_file")")
+```
+
+### 2. Monitor Session with Verification (v2.2)
+
+```bash
+# Single call replaces 14+ API roundtrips
+# Pass --workflow and --story-key for completion verification
+result=$("$scripts" monitor-session "$session_name" --json --verbose \
+  --agent "$review_agent" \
+  --workflow review --story-key {story_id} --state-file "$state_file")
+final_state=$(echo "$result" | jq -r '.final_state')
+output_file=$(echo "$result" | jq -r '.output_file')
+```
+
+**Note:** The `--workflow review --story-key` parameters enable sprint-status verification before marking complete.
+
+### 3. Parse Output
+
+```bash
+# Sub-agent parsing (haiku, 99% cheaper than main context)
+parsed=$("$scripts" orchestrator-helper parse-output "$output_file" review --state-file "$state_file")
+```
+
+### 4. Verify Sprint Status
+
+```bash
+status=$("$scripts" orchestrator-helper sprint-status get {story_key})
+is_done=$(echo "$status" | jq -r '.done')
+```
+
+---
+
+## Decision Logic
+
+### Handle final_state (v2.2)
+
+**IF final_state == "completed":**
+- Session verified complete (sprint-status shows "done")
+- Log "Code review passed, story marked done"
+- Cleanup: `"$scripts" tmux-wrapper kill "$session_name"`
+- **EXIT LOOP** → proceed to Git Commit
+
+**IF final_state == "incomplete":** (v2.2 - Codex-specific)
+- Session idle but sprint-status NOT updated
+- Cleanup: `"$scripts" tmux-wrapper kill "$session_name"`
+- Increment `reviewCycle`
+- If `reviewCycle <= maxCycles`: count this as a failed attempt and **CONTINUE** with a retry
+- If `reviewCycle > maxCycles`: Escalate with CRITICAL priority (Trigger #8), then present options:
+  1. **[1] Manual Fix** - Update sprint-status.yaml yourself
+  2. **[2] Run with Claude** - Re-run code-review with Claude agent
+  3. **[3] Skip Story** - Mark story as skipped and continue
+- **HALT** — wait for user choice only after maxCycles is exhausted
+
+**IF final_state == "crashed" or "stuck":**
+- Log "Review session failed: $final_state"
+- Cleanup: `"$scripts" tmux-wrapper kill "$session_name"`
+- Increment reviewCycle
+- **CONTINUE** (retry with new session)
+
+### Handle is_done check
+
+**IF is_done == true:**
+- Log "Sprint-status verified done"
+- **EXIT LOOP** → proceed to Git Commit
+
+**IF is_done == false AND final_state == "completed":**
+- This shouldn't happen with v2.2 verification
+- Fallback: check story file status
+- If story file shows "done", treat as complete
+
+**IF reviewCycle > maxCycles:**
+- Check escalation: `"$scripts" orchestrator-helper escalate review-loop "cycles=$reviewCycle"`
+- **HALT** — wait for user choice
+
+---
+
+## Sprint-Status Verification (v3.0)
+
+Status is determined by **CRITICAL issues remaining** after auto-fix:
+- "done" → 0 CRITICAL issues, proceed to commit
+- "in-progress" → 1+ CRITICAL issues, new review cycle
+
+HIGH/MEDIUM/LOW issues are tracked as action items but don't block automation.
+
+---
+
+## Output Verification Fallback (v1.4.0)
+
+If `output_verified == false` or output truncated, use story file fallback:
+
+```bash
+file_status=$("$scripts" orchestrator-helper story-file-status {story_id})
+# If status == "done", skip parsing - story is complete
+```
+
+---
+
+## Verification Command (v2.2)
+
+Check if code-review actually completed:
+
+```bash
+"$scripts" orchestrator-helper verify-code-review {story_id} --state-file "$state_file"
+# Returns: {"verified":true/false, "sprint_status":"...", ...}
+```
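
A minimal sketch, not part of the committed file: the control flow of the review loop in `code-review-loop.md` above, reduced to its cycle counting and `final_state` handling. `run_review_cycle` is a hypothetical stub standing in for the spawn and `monitor-session` steps; here it simply pretends the review passes on the third cycle. Session cleanup and the user-facing escalation options are left out.

```bash
#!/usr/bin/env bash
# Control-flow sketch only. run_review_cycle is a stand-in for the
# spawn + monitor-session steps and must print a final_state value.
run_review_cycle() {
  local cycle=$1
  # Stub: pretend the review passes on the third cycle.
  if (( cycle >= 3 )); then echo completed; else echo incomplete; fi
}

review_loop() {
  local max_cycles=${1:-5} cycle=1 state
  while (( cycle <= max_cycles )); do
    state=$(run_review_cycle "$cycle")
    case $state in
      completed)
        echo "done after $cycle cycle(s)"
        return 0
        ;;
      incomplete|crashed|stuck)
        cycle=$(( cycle + 1 ))   # failed attempt; retry with a new session
        ;;
      *)
        echo "unexpected final_state: $state" >&2
        return 2
        ;;
    esac
  done
  # All cycles used up: hand over to the escalation path.
  echo "escalate after $max_cycles cycles"
  return 1
}

review_loop 5   # prints: done after 3 cycle(s)
```

Returning a non-zero status on exhaustion lets a caller distinguish "passed" from "needs escalation" without parsing the message.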
--- a/.agents/skills/bmad-story-automator/data/complexity-rules.json
+++ b/.agents/skills/bmad-story-automator/data/complexity-rules.json
@@ -0,0 +1,246 @@
+{
+  "version": "2.0",
+  "thresholds": {
+    "low_max": 3,
+    "medium_max": 7
+  },
+  "structural_rules": {
+    "ac_count_medium": 6,
+    "ac_count_high": 10,
+    "ac_count_medium_score": 1,
+    "ac_count_high_score": 2,
+    "dependency_score": 1,
+    "large_story_word_threshold": 400,
+    "large_story_score": 1
+  },
+  "rules": [
+    {
+      "id": "external_api",
+      "label": "External API integration",
+      "pattern": "whatsapp|oauth|stripe|payment|third[- ]party|external api|twilio|sendgrid|mailgun|slack api|discord api|shopify|salesforce|hubspot|zapier|plaid|aws sdk|gcp sdk|azure sdk",
+      "score": 2
+    },
+    {
+      "id": "webhook_async",
+      "label": "Webhook/async processing",
+      "pattern": "webhook|async handler|asynchronous|message queue|queue worker|background job|event listener|pub.?sub|kafka|rabbitmq|sqs|nats|event.?driven|callback url",
+      "score": 2
+    },
+    {
+      "id": "realtime",
+      "label": "Real-time communication",
+      "pattern": "websocket|web socket|socket\\.io|sse|server.sent events|real.?time update|live update|push notification|long polling",
+      "score": 2
+    },
+    {
+      "id": "db_migration",
+      "label": "Database schema changes",
+      "pattern": "migration|schema change|new table|alter table|add column|database table|create index|foreign key|database schema|modify schema",
+      "score": 1
+    },
+    {
+      "id": "db_complex_query",
+      "label": "Complex database operations",
+      "pattern": "complex quer|join.*join|subquer|aggregate|group by|window function|recursive.*query|materialized view|stored procedure|database transaction|deadlock|connection pool",
+      "score": 2
+    },
+    {
+      "id": "data_transform",
+      "label": "Data transformation/ETL",
+      "pattern": "data transform|etl|data pipeline|data migration|bulk import|bulk export|csv.*(import|export|parse)|data mapping|data sync|batch process|normalize data|denormalize",
+      "score": 2
+    },
+    {
+      "id": "caching",
+      "label": "Caching layer",
+      "pattern": "cache|redis|memcache|cdn|invalidat|cache.?bust|stale.?while|cache.?strategy|in.?memory store|session store",
+      "score": 1
+    },
+    {
+      "id": "search_index",
+      "label": "Search/indexing",
+      "pattern": "elasticsearch|full.?text search|search index|algolia|typesense|meilisearch|solr|vector search|semantic search|fuzzy search|search engine",
+      "score": 2
+    },
+    {
+      "id": "file_storage",
+      "label": "File upload/storage",
+      "pattern": "file upload|s3|blob storage|image upload|media upload|file processing|pdf generat|csv generat|document generat|file download|cloud storage|presigned url",
+      "score": 1
+    },
+    {
+      "id": "auth_system",
+      "label": "Authentication system",
+      "pattern": "authenticat|login flow|sign.?up flow|session management|jwt|token refresh|password reset|magic link|sso|single sign|two.?factor|2fa|mfa|social login|auth middleware|auth guard",
+      "score": 2
+    },
+    {
+      "id": "authorization",
+      "label": "Authorization/permissions",
+      "pattern": "authori[zs]|rbac|role.?based|permission|access control|acl|policy engine|guard|middleware.*auth|protect.*route|tenant.*isol|multi.?tenant|row.?level security",
+      "score": 2
+    },
+    {
+      "id": "encryption",
+      "label": "Encryption/security",
+      "pattern": "encrypt|decrypt|hash|bcrypt|argon|hmac|digital signature|certificate|ssl|tls|secret.*management|vault|key.*rotation|sanitiz|xss|csrf|sql injection|security header|cors config",
+      "score": 1
+    },
+    {
+      "id": "state_management",
+      "label": "Complex state management",
+      "pattern": "state management|redux|zustand|recoil|jotai|context.*provider|global state|state machine|finite state|xstate|event sourc|cqrs|saga pattern|optimistic update",
+      "score": 1
+    },
+    {
+      "id": "backend_frontend",
+      "label": "Backend + Frontend combined",
+      "pattern": "backend.*frontend|frontend.*backend|full.?stack|api.*and.*ui|server.*and.*client|both.*api.*and|endpoint.*and.*page|controller.*and.*component",
+      "score": 2
+    },
+    {
+      "id": "microservice",
+      "label": "Service communication",
+      "pattern": "microservice|service.to.service|grpc|inter.?service|api gateway|service mesh|service discover|distributed|cross.?service|orchestrat.*service",
+      "score": 2
+    },
+    {
+      "id": "infrastructure",
+      "label": "Infrastructure changes",
+      "pattern": "docker|kubernetes|k8s|terraform|ci.?cd|pipeline|deploy|nginx|caddy|load balanc|auto.?scal|infrastructure|server config|environment variable|env config|systemd|reverse proxy",
+      "score": 2
+    },
+    {
+      "id": "error_handling",
+      "label": "Complex error handling",
+      "pattern": "error handling|error boundar|retry logic|circuit.?break|graceful.?degrad|fallback.*strateg|dead.?letter|error recover|exception handling|rollback|compensat.*transaction|idempoten",
+      "score": 1
+    },
+    {
+      "id": "transaction",
+      "label": "Transaction management",
+      "pattern": "transaction|atomic.*operation|two.?phase|eventual.?consisten|distributed.*lock|optimistic.*lock|pessimistic.*lock|conflict.*resolut|concurren.*control|race condition",
+      "score": 2
+    },
+    {
+      "id": "performance",
+      "label": "Performance optimization",
+      "pattern": "performance|optimiz|pagination|infinite scroll|virtual.*list|lazy load|code split|bundle.*size|lighthouse|core web vital|throttl|debounc|memoiz|profil",
+      "score": 1
+    },
+    {
+      "id": "rate_limiting",
+      "label": "Rate limiting/throttling",
+      "pattern": "rate limit|throttl|quota|usage.*limit|api.*limit|request.*limit|cooldown|backoff|exponential.*back",
+      "score": 1
+    },
+    {
+      "id": "batch_processing",
+      "label": "Batch/bulk operations",
+      "pattern": "batch.*process|bulk.*operat|mass.*update|bulk.*insert|batch.*job|scheduled.*task|cron|periodic.*task|bulk.*delete|queue.*process",
+      "score": 1
+    },
+    {
+      "id": "complex_form",
+      "label": "Complex forms",
+      "pattern": "multi.?step form|form wizard|dynamic form|form validation|conditional field|nested form|form builder|file.*input.*form|complex.*form|form.*state",
+      "score": 1
+    },
+    {
+      "id": "visualization",
+      "label": "Charts/visualization",
+      "pattern": "chart|graph|d3|visualization|dashboard.*widget|data.*viz|sparkline|heatmap|treemap|pie.*chart|bar.*chart|line.*chart|recharts|plotly|canvas.*draw",
+      "score": 1
+    },
+    {
+      "id": "drag_drop",
+      "label": "Drag and drop",
+      "pattern": "drag.?and.?drop|dnd|sortable|reorder|draggable|droppable|kanban.*board|drag.*handle",
+      "score": 1
+    },
+    {
+      "id": "accessibility",
+      "label": "Accessibility requirements",
+      "pattern": "accessib|a11y|screen reader|aria|wcag|keyboard.*navigat|focus.*management|tab.*order|assistive|color.*contrast",
+      "score": 1
+    },
+    {
+      "id": "i18n",
+      "label": "Internationalization",
+      "pattern": "i18n|internationali[zs]|locali[zs]|translat|multi.?language|rtl|right.?to.?left|locale|plural.*form|number.*format|date.*format.*locale",
+      "score": 1
+    },
+    {
+      "id": "integration_test",
+      "label": "Integration testing required",
+      "pattern": "integration test|e2e test|end.to.end|playwright|cypress|selenium|test.*api.*endpoint|test.*database|test.*external|contract.*test|smoke.*test",
+      "score": 1
+    },
+    {
+      "id": "test_fixtures",
+      "label": "Complex test setup",
+      "pattern": "test fixture|mock.*service|stub.*api|seed.*data|test.*factory|test.*database|test.*container|docker.*test|test.*environment|test.*isolation",
+      "score": 1
+    },
+    {
+      "id": "email_notification",
+      "label": "Email/notification system",
+      "pattern": "email.*send|notification.*system|push.*notif|sms.*send|in.?app.*notif|notification.*preference|email.*template|mailer|notification.*queue|alert.*system",
+      "score": 1
+    },
+    {
+      "id": "logging_monitoring",
+      "label": "Logging/monitoring/observability",
+      "pattern": "logging.*system|monitoring|observab|telemetry|tracing|distributed.*trace|log.*aggregat|metrics.*collect|health.*check|alerting|sentry|datadog|newrelic",
+      "score": 1
+    },
+    {
+      "id": "config_system",
+      "label": "Configuration/feature flags",
+      "pattern": "feature.*flag|feature.*toggle|config.*system|dynamic.*config|a.?b.*test|experiment|remote.*config|launch.*darkly|unleash|posthog.*flag",
+      "score": 1
+    },
+    {
+      "id": "frontend_only",
+      "label": "Frontend only (no backend)",
+      "pattern": "frontend only|ui only|css only|layout only|style only|cosmetic|visual.*only|markup.*only|static.*page|presentation.*only",
+      "score": -1
+    },
+    {
+      "id": "simple_crud",
+      "label": "Simple CRUD operations",
+      "pattern": "simple crud|basic crud|create read update delete|simple.*list|basic.*form|standard.*rest|straightforward|simple.*endpoint|basic.*page|simple.*component",
+      "score": -1
+    },
+    {
+      "id": "documentation_only",
+      "label": "Documentation/config only",
+      "pattern": "documentation only|readme|config.*change only|env.*update only|update.*docs|comment.*only|rename only|typo|text.*change only",
+      "score": -2
+    },
+    {
+      "id": "refactor_only",
+      "label": "Pure refactor (no behavior change)",
+      "pattern": "refactor only|code.*cleanup|rename|extract.*method|move.*file|reorgani[zs]e|restructure|no.*behavior.*change|no.*functional.*change",
+      "score": -1
+    },
+    {
+      "id": "simple_bugfix",
+      "label": "Simple/isolated bug fix",
+      "pattern": "simple.*fix|minor.*bug|typo.*fix|off.?by.?one|null.*check|missing.*import|syntax.*error|small.*patch|hotfix|one.?line.*fix",
+      "score": -1
+    },
+    {
+      "id": "uncertainty",
+      "label": "Uncertain/research-heavy scope",
+      "pattern": "research|investigate|spike|prototype|proof of concept|poc|tbd|to be determined|unclear|explore|experiment.*with|evaluate.*option|might.*need|may.*require",
+      "score": 1
+    },
+    {
+      "id": "breaking_change",
+      "label": "Breaking/migration change",
+      "pattern": "breaking.*change|backward.*compat|deprecat|migration.*guide|version.*bump.*major|api.*v\\d|legacy.*support|upgrade.*path",
+      "score": 2
+    }
+  ]
+}
--- a/.agents/skills/bmad-story-automator/data/complexity-scoring.md
+++ b/.agents/skills/bmad-story-automator/data/complexity-scoring.md
@@ -0,0 +1,153 @@
+# Story Complexity Scoring (v2.0.0)
+
+Estimate each story's complexity to predict dev-story success likelihood and inform agent selection. Scoring combines **regex-based pattern matching** (detecting domain signals in story text) with **structural analysis** (measuring story size and shape).
+
+---
+
+## How Scoring Works
+
+The Python helper (`scripts/story-automator parse-story --rules`) performs two passes:
+
+### Pass 1: Pattern Matching (regex rules)
+
+Each rule in `complexity-rules.json` has a regex pattern tested case-insensitively against the concatenation of the story's **title + description + acceptance criteria**. When a rule matches, its score is added (positive = complexity, negative = simplicity).
+
+### Pass 2: Structural Analysis
+
+The parser also examines the story's **structure** independent of text content:
+
+| Structural Factor | Condition | Score | Reason |
+|---|---|---|---|
+| Acceptance Criteria count (medium) | 6 < AC lines ≤ 10 | +1 | More ACs = more surface area to implement and verify |
+| Acceptance Criteria count (high) | AC lines > 10 | +2 | Replaces the medium bonus (not additive); a large AC count signals a multi-faceted story |
+| Explicit dependency | Story references dependency on another story | +1 | Cross-story dependencies add coordination overhead |
+| Large story | Word count > 400 | +1 | Verbose stories indicate broader scope |
+
+### Final Score
+
+`final_score = sum(matched_rule_scores) + structural_bonus`
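+
+The two passes and the final sum can be sketched as follows. This is a minimal illustration under stated assumptions, not the helper's actual source: `score_story`, `structural_bonus`, and `SAMPLE_RULES` are hypothetical names, only three rules are copied from `complexity-rules.json`, the word count is assumed to be a whitespace split of the same concatenated text, and the dependency flag is assumed to be detected elsewhere and passed in.
+
+```python
+import re
+
+# Three rules copied from complexity-rules.json; the real file defines many more.
+SAMPLE_RULES = [
+    {"id": "realtime", "score": 2,
+     "pattern": r"websocket|web socket|socket\.io|sse|server.sent events|real.?time update|live update|push notification|long polling"},
+    {"id": "caching", "score": 1,
+     "pattern": r"cache|redis|memcache|cdn|invalidat|cache.?bust|stale.?while|cache.?strategy|in.?memory store|session store"},
+    {"id": "simple_crud", "score": -1,
+     "pattern": r"simple crud|basic crud|create read update delete|simple.*list|basic.*form|standard.*rest|straightforward|simple.*endpoint|basic.*page|simple.*component"},
+]
+
+
+def structural_bonus(ac_count, has_dependency, word_count):
+    """Pass 2: score the story's shape, independent of its wording."""
+    bonus = 0
+    if ac_count > 10:        # high AC count replaces the medium bonus
+        bonus += 2
+    elif ac_count > 6:       # medium AC count
+        bonus += 1
+    if has_dependency:       # explicit dependency on another story
+        bonus += 1
+    if word_count > 400:     # large story
+        bonus += 1
+    return bonus
+
+
+def score_story(text, ac_count, has_dependency, rules=SAMPLE_RULES):
+    """Return (final_score, matched_rule_ids) for the concatenated story text."""
+    # Pass 1: every rule whose pattern matches anywhere in the text contributes its score.
+    matched = [r for r in rules if re.search(r["pattern"], text, re.IGNORECASE)]
+    rule_score = sum(r["score"] for r in matched)
+    bonus = structural_bonus(ac_count, has_dependency, len(text.split()))
+    return rule_score + bonus, [r["id"] for r in matched]
+```
+
+With these sample rules, a story reading "Add WebSocket live updates backed by a Redis cache" with 8 AC lines scores 2 + 1 + 1 = 4. If the real helper also uses unanchored search semantics, short tokens can match inside longer words: in this sketch `sse` matches "assessment", so "Write the assessment report" picks up the real-time rule's +2.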
+
+---
+
+## Rule Categories (40 rules)
+
+### External Integration (+2 each)
+
+| Rule | Detects |
+|---|---|
+| External API integration | Third-party services (Stripe, Twilio, WhatsApp, AWS SDK, etc.) |
+| Webhook/async processing | Webhooks, message queues, pub/sub, background jobs, event-driven patterns |
+| Real-time communication | WebSockets, SSE, push notifications, live updates, long polling |
+
+### Database & Data (+1 to +2)
+
+| Rule | Score | Detects |
+|---|---|---|
+| Database schema changes | +1 | Migrations, new tables, index creation, foreign keys |
+| Complex database operations | +2 | Complex queries, joins, subqueries, aggregates, stored procedures, transactions |
+| Data transformation/ETL | +2 | Data pipelines, bulk import/export, CSV parsing, data sync, normalization |
+| Caching layer | +1 | Redis, memcache, CDN, cache invalidation, session stores |
+| Search/indexing | +2 | Elasticsearch, Algolia, full-text search, vector search |
+| File upload/storage | +1 | S3, blob storage, file processing, PDF/CSV generation, presigned URLs |
+
+### Security & Auth (+1 to +2)
+
+| Rule | Score | Detects |
+|---|---|---|
+| Authentication system | +2 | Login flows, JWT, password reset, SSO, 2FA/MFA, social login |
+| Authorization/permissions | +2 | RBAC, ACL, row-level security, multi-tenant isolation, route guards |
+| Encryption/security | +1 | Encryption, hashing, CSRF/XSS protection, security headers, CORS |
+
+### State & Architecture (+1 to +2)
+
+| Rule | Score | Detects |
+|---|---|---|
+| Complex state management | +1 | Redux, Zustand, state machines, CQRS, event sourcing, optimistic updates |
+| Backend + Frontend combined | +2 | Full-stack changes touching both API and UI layers |
+| Service communication | +2 | Microservices, gRPC, API gateway, service mesh, distributed systems |
+| Infrastructure changes | +2 | Docker, Kubernetes, CI/CD, reverse proxies, deployment, auto-scaling |
+
+### Error Handling & Resilience (+1 to +2)
+
+| Rule | Score | Detects |
+|---|---|---|
+| Complex error handling | +1 | Error boundaries, retry logic, circuit breakers, graceful degradation, idempotency |
+| Transaction management | +2 | Atomic operations, distributed locks, conflict resolution, race conditions |
+
+### Performance (+1)
+
+| Rule | Score | Detects |
+|---|---|---|
+| Performance optimization | +1 | Pagination, lazy loading, code splitting, memoization, Core Web Vitals |
+| Rate limiting/throttling | +1 | Rate limits, quotas, backoff strategies, cooldowns |
+| Batch/bulk operations | +1 | Batch processing, bulk inserts/updates, cron jobs, scheduled tasks |
+
+### UI/UX Complexity (+1)
+
+| Rule | Score | Detects |
+|---|---|---|
+| Complex forms | +1 | Multi-step forms, wizards, dynamic forms, conditional fields |
+| Charts/visualization | +1 | D3, Recharts, dashboards, heatmaps, canvas drawing |
+| Drag and drop | +1 | DnD, sortable lists, Kanban boards, reorderable UI |
+| Accessibility | +1 | WCAG, ARIA, screen reader support, keyboard navigation |
+| Internationalization | +1 | i18n, translations, RTL support, locale-aware formatting |
+
+### Testing Signals (+1)
+
+| Rule | Score | Detects |
+|---|---|---|
+| Integration testing required | +1 | E2E tests, Playwright, Cypress, contract tests, API endpoint tests |
+| Complex test setup | +1 | Test fixtures, service mocks, seed data, test containers |
+
+### Cross-Cutting (+1)
+
+| Rule | Score | Detects |
+|---|---|---|
+| Email/notification system | +1 | Email sending, push notifications, SMS, in-app notifications |
+| Logging/monitoring | +1 | Observability, telemetry, distributed tracing, Sentry, Datadog |
+| Configuration/feature flags | +1 | Feature toggles, A/B tests, remote config, LaunchDarkly |
+
+### Simplicity Reducers (-1 to -2)
+
+| Rule | Score | Detects |
+|---|---|---|
+| Frontend only | -1 | UI-only, CSS-only, layout-only, static pages |
+| Simple CRUD | -1 | Basic CRUD, standard REST, straightforward endpoints |
+| Documentation/config only | -2 | README updates, config changes, doc-only changes |
+| Pure refactor | -1 | Code cleanup, renames, restructuring with no behavior change |
+| Simple bug fix | -1 | Typo fixes, null checks, missing imports, one-line patches |
+
+### Risk/Uncertainty Signals (+1 to +2)
+
+| Rule | Score | Detects |
+|---|---|---|
+| Uncertain scope | +1 | Research spikes, prototypes, POCs, TBD items, exploratory work |
+| Breaking change | +2 | Breaking changes, deprecations, major version bumps, migration guides |
+
+---
+
+## Complexity Levels
+
+| Score | Level | Meaning | Agent Recommendation |
+|---|---|---|---|
+| ≤ 3 | **Low** | High success probability | Claude handles well autonomously |
+| 4–7 | **Medium** | Normal execution, moderate risk | Codex primary with Claude fallback |
+| ≥ 8 | **High** | Consider longer timeouts, may need intervention | Codex primary with Claude fallback, monitor closely |
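+
+As a small illustration, the thresholds in the table map onto a lookup like the one below. `complexity_level` is a hypothetical name, and integer scores are assumed because every rule and structural bonus is a whole number.
+
+```python
+def complexity_level(score):
+    """Map a final complexity score to the level names used in the table above."""
+    if score <= 3:
+        return "Low"
+    if score <= 7:
+        return "Medium"
+    return "High"
+```
+
+Negative totals (for example a documentation-only story) fall into Low under this mapping.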
+
+---
+
+## Why This Matters
+
+**Session 3 learning:** Backend WhatsApp stories (6.5-6.8) consistently failed dev-story while frontend i18n stories (7.1-7.2) succeeded. The original 8-rule system couldn't distinguish these patterns.
+
+**v2.0 improvements:**
+- 40 rules across 11 categories (was 8 rules, 1 category)
+- Structural analysis adds AC count, dependency, and story size signals
+- 5 simplicity reducers (was 2) prevent over-scoring simple work
+- Expanded regex patterns catch contextual signals, not just exact keywords
+- Recalibrated thresholds account for higher score range
+
+**Without accurate complexity scoring:**
+- Agent configuration cannot be informed by actual story difficulty
+- Simple stories get over-provisioned (waste) or complex stories get under-provisioned (failure)
+- The orchestration may fail or produce suboptimal results
--- a/.agents/skills/bmad-story-automator/data/crash-recovery.md
+++ b/.agents/skills/bmad-story-automator/data/crash-recovery.md
@@ -0,0 +1,174 @@
+# Crash Recovery Pattern
+
+**Purpose:** Handle sessions that crash or disappear unexpectedly.
+
+---
+
+## Detection
+
+The status script returns `session_state` in CSV column 6:
+- `crashed` - Session exited with non-zero exit code (column 5 = exit code, column 4 = output file)
+- `not_found` - Session disappeared (killed, crashed without trace)
+
+---
+
+## Recovery Logic
+
+| Condition | Action |
+|-----------|--------|
+| `crashed` with output file | Read output, check partial progress, retry |
+| `not_found` (no output) | Session died silently, retry immediately |
+| Retry 1 failed | Retry with `-r2` suffix in session name |
+| Retry 2 failed | Escalate to user with diagnostics |
+
+---
+
+## Retry Pattern
+
+```bash
+# On crash/not_found, spawn retry with unique suffix
+project_slug=$(basename "$PWD" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]' | cut -c1-8)
+timestamp=$(date +%y%m%d-%H%M%S)
+session_name="sa-${project_slug}-${timestamp}-e{epic}-s{story_suffix}-{step}-r2"
+
+# Clear stale state (project-scoped v2.0)
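+# Prefer md5sum (Linux); fall back to md5 -q (macOS/BSD) when it is not installed.
+# A plain `md5sum ... | cut ... || md5 ...` never reaches the fallback, because the
+# pipeline's exit status comes from cut, which succeeds even on empty input.
+if command -v md5sum >/dev/null 2>&1; then
+  PROJECT_HASH=$(printf '%s' "$PWD" | md5sum | cut -c1-8)
+else
+  PROJECT_HASH=$(printf '%s' "$PWD" | md5 -q | cut -c1-8)
+fi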
+rm -f "/tmp/.sa-${PROJECT_HASH}-session-${session_name}-state.json"
+# ... spawn and monitor as normal
+```
+
+---
+
+## Agent Fallback (v3.0.0)
+
+**Before escalating**, check if fallback agent is configured:
+
+```bash
+# Resolve agents for this story/task from agents file
+selection=$("$scripts" orchestrator-helper agents-resolve \
+  --state-file "$state_file" --story "{story_id}" --task "{task}")
+primary=$(echo "$selection" | jq -r '.primary')
+fallback=$(echo "$selection" | jq -r '.fallback')
+
+if [ "$fallback" != "false" ] && [ -n "$fallback" ]; then
+  if [ "$current_agent" = "$primary" ]; then
+    export AI_AGENT="$fallback"
+    retry_count=0
+    session=$("$scripts" tmux-wrapper spawn dev {epic} {story_id} \
+      --command "$("$scripts" tmux-wrapper build-cmd dev {story_id})")
+    # Continue monitoring...
+  fi
+fi
+```
+
+**Fallback flow:**
+1. Primary agent crashes after 2 retries
+2. IF `fallback != "false"` AND haven't tried fallback yet
+3. Switch `AI_AGENT` to fallback agent
+4. Reset retry counter to 0
+5. Retry with fallback agent (gets 2 more attempts)
+6. IF fallback also fails after 2 retries → CRITICAL escalation
+
+**Log message:**
+"Primary agent (claude) failed after 2 attempts. Switching to fallback agent (codex)..."
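+
+The fallback flow above can be read as a small decision function. The sketch below is illustrative only: `next_step` and `MAX_ATTEMPTS_PER_AGENT` are hypothetical names, and treating `None` or an empty string like `"false"` is an assumption that mirrors the `-n` check in the shell snippet.
+
+```python
+MAX_ATTEMPTS_PER_AGENT = 2  # each agent gets two attempts before the flow moves on
+
+
+def next_step(current_agent, failed_attempts, primary, fallback):
+    """Decide what to do after a crash: retry, switch agents, or escalate.
+
+    Returns (action, agent). On "switch" the caller is expected to reset
+    its retry counter to 0 before spawning the fallback agent.
+    """
+    if failed_attempts < MAX_ATTEMPTS_PER_AGENT:
+        return ("retry", current_agent)
+    fallback_configured = fallback not in (None, "", "false")
+    if current_agent == primary and fallback_configured:
+        return ("switch", fallback)
+    # Either no fallback is configured or the fallback has also used its attempts.
+    return ("escalate", None)
+```
+
+Keeping the decision separate from the spawn call makes the "fallback tried at most once" rule easy to check: once `current_agent` is the fallback, the function can only retry or escalate.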
+
+---
+
+## Escalation (after exhausting all retries)
+
+Display:
+```
+**Session crashed for Story {N}**
+
+Primary agent: {primary} - Failed after 2 attempts
+Fallback agent: {fallback} - Failed after 2 attempts
+
+Exit code: {exit_code}
+Partial progress: {tasks_completed}/{tasks_total}
+
+[R]etry with primary
+[F]allback retry
+[S]kip story (mark deferred)
+[A]bort orchestration
+```
+
+Show any partial output captured for diagnostics.
+
+---
+
+## Integration with Adaptive Retry
+
+Crash recovery is SEPARATE from adaptive retry:
+- **Adaptive retry** = session completed but FAILED (wrong output, tests failed)
+- **Crash recovery** = session DIED unexpectedly (context limit, API error, kill)
+
+Both can occur: a session might crash on attempt 1, then fail normally on attempt 2.
+Track both counters independently.
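+
+One way to keep the two counters apart is sketched below. `AttemptCounters` is a hypothetical helper; `crashed` and `not_found` are the states listed under Detection, while `failed` is an assumed label for a session that completed with a bad result.
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass
+class AttemptCounters:
+    """Separate tallies for crash recovery and adaptive retry."""
+    crashes: int = 0    # session died: crashed or not_found
+    failures: int = 0   # session completed but the step failed
+
+    def record(self, session_state):
+        if session_state in ("crashed", "not_found"):
+            self.crashes += 1
+        elif session_state == "failed":
+            self.failures += 1
+```
+
+A story that crashes on attempt 1 and fails normally on attempt 2 ends with `crashes == 1` and `failures == 1`, so neither retry budget is charged for the other kind of problem.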
+
+---
+
+## Orchestrator Monitoring Task Crash (v1.9.0)
+
+### The Problem
+
+When the orchestrator uses background tasks (e.g., Bash with `run_in_background`) to monitor tmux sessions, the monitoring task itself can crash. This is **different** from the tmux session crashing.
+
+**Observed failure mode:**
+1. Orchestrator spawns background task to run create+dev+monitor loop
+2. Background task crashes after dev-story completes
+3. TaskOutput shows "running" but task is dead
+4. Tmux session actually completed successfully
+5. Orchestrator waits forever on dead monitoring task
+6. Code-review never runs because monitoring never returned
+
+### Detection
+
+Signs that your monitoring task has crashed (not the tmux session):
+
+| Signal | Meaning |
+|--------|---------|
+| `TaskOutput` returns empty 2+ times | Task may be dead |
+| Output file path doesn't exist | Task never wrote results |
+| "running" status but no progress | Task is stuck or dead |
+| Background task ID invalid | Task crashed |
+
+### Recovery Sequence
+
+**See `monitoring-fallback.md` for detailed fallback patterns.**
+
+Quick reference:
+1. Stop waiting on dead monitoring task
+2. Find tmux sessions: `tmux list-sessions | grep "sa-.*e{epic}-s{story}"`
+3. Check session status directly: `story-automator tmux-status-check`
+4. Verify source of truth: story file, sprint-status.yaml
+5. Resume based on verified state
+
+### Prevention
+
+**NEVER chain multiple workflow steps in a single background task:**
+
+```bash
+# ❌ WRONG - If this task crashes, all subsequent steps are lost
+for step in create dev review; do
+    session=$(...spawn...)
+    result=$(...monitor...)
+done
+
+# ✅ CORRECT - Each step is monitored separately
+# Step 1
+session=$(...spawn create...)
+result=$(...monitor...)
+# Verify state
+
+# Step 2 (only after Step 1 verified)
+session=$(...spawn dev...)
+result=$(...monitor...)
+# Verify state
+```
+
+### Key Principle
+
+**The tmux session is the source of truth for session state.**
+**The story file and sprint-status.yaml are the source of truth for workflow state.**
+
+Monitoring is just observation - if monitoring fails, verify from source of truth and continue.
--- a/.agents/skills/bmad-story-automator/data/data-file-index.md
+++ b/.agents/skills/bmad-story-automator/data/data-file-index.md
@@ -0,0 +1,100 @@
+# Data File Index (v1.9.0)
+
+**Purpose:** Explicit guidance on when to load each data file during execution.
+
+---
+
+## Loading Rules
+
+1. **LOAD ONCE** = Read at step initialization, keep in context
+2. **LOAD ON TRIGGER** = Read only when specific condition occurs
+3. **NEVER LOAD** = Reference/debug files, not for execution
+
+---
+
+## Step 03: Execute - File Loading Guide
+
+### LOAD ONCE (at step start)
+
+| File | Why |
+|------|-----|
+| `orchestrator-rules.md` | Core rules for orchestrator behavior |
+| `execution-patterns.md` | FORBIDDEN patterns - must know before any execution |
+| `scripts-reference.md` | Script usage patterns |
+
+### LOAD ON TRIGGER
+
+| File | When to Load |
+|------|--------------|
+| `retry-fallback-strategy.md` | When a step FAILS and you need retry logic |
+| `monitoring-fallback.md` | When monitoring FAILS (TaskOutput empty/error 2+ times) |
+| `crash-recovery.md` | When session CRASHES (not just fails) |
+| `code-review-loop.md` | When entering code review phase (Step D) |
+| `escalation-triggers.md` | When considering escalation to user |
+| `escalation-messages-core.md` | When displaying escalation message (triggers 1-4) |
+| `escalation-messages-extended.md` | When displaying escalation message (triggers 5-7; trigger 8's options are listed in `escalation-triggers.md`) |
+| `agent-fallback.md` | When switching from primary to fallback agent |
+| `agent-fallback-troubleshooting.md` | When fallback agent also fails |
+| `adaptive-retry.md` | When same task fails 3+ times (plateau detection) |
+| `subagent-prompts.md` | When parsing session output with sub-agent |
+| `monitoring-codex.md` | When using Codex agent (not Claude) |
+
+### NEVER LOAD DURING EXECUTION
+
+| File | Purpose |
+|------|---------|
+| `tmux-commands.md` | Reference doc - use scripts instead |
+| `tmux-long-command-*.md` | Debug/testing docs |
+| `complexity-scoring.md` | Used during preflight, not execution |
+| `preflight-prompts.md` | Used in step-02, not step-03 |
+| `stop-hook-*.md` | Setup docs, not execution |
+| `marker-file-format.md` | Internal format reference |
+| `success-patterns.md` | Output pattern reference |
+| `workflow-commands.md` | Reference doc |
+| `wrapup-templates.md` | Used in step-04, not step-03 |
+| `retrospective-*.md` | Used in step-03b retrospective section only |
+
+---
+
+## Quick Decision Tree
+
+```
+Starting execution?
+  → Load: orchestrator-rules.md, execution-patterns.md, scripts-reference.md
+
+Step failed?
+  → Load: retry-fallback-strategy.md
+  → If 3+ same failures: Load adaptive-retry.md
+
+Monitoring not responding?
+  → Load: monitoring-fallback.md
+
+Session crashed?
+  → Load: crash-recovery.md
+
+Entering code review?
+  → Load: code-review-loop.md
+
+Need to escalate?
+  → Load: escalation-triggers.md, then escalation-messages-*.md
+
+Using Codex?
+  → Load: monitoring-codex.md
+```
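+
+The decision tree amounts to a lookup from trigger to files. The sketch below is one hypothetical encoding: the event names (`step_failed`, `session_crashed`, and so on) are invented for illustration, and only the file names come from the tables above.
+
+```python
+# Files read once at the start of step-03.
+CORE_FILES = ["orchestrator-rules.md", "execution-patterns.md", "scripts-reference.md"]
+
+# Files read only when their trigger condition occurs.
+TRIGGER_FILES = {
+    "step_failed": ["retry-fallback-strategy.md"],
+    "repeated_failure": ["adaptive-retry.md"],       # same task failed 3+ times
+    "monitoring_unresponsive": ["monitoring-fallback.md"],
+    "session_crashed": ["crash-recovery.md"],
+    "code_review": ["code-review-loop.md"],
+    "escalation": ["escalation-triggers.md"],        # message template file is chosen afterwards
+    "codex": ["monitoring-codex.md"],
+}
+
+
+def files_to_load(events, already_loaded=()):
+    """Return the loaded-file list after handling events, without loading any file twice."""
+    loaded = list(already_loaded)
+    for event in events:
+        for name in TRIGGER_FILES.get(event, []):
+            if name not in loaded:
+                loaded.append(name)
+    return loaded
+```
+
+Starting from `CORE_FILES`, two step failures followed by a crash add only `retry-fallback-strategy.md` and `crash-recovery.md`; unknown events load nothing.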
+
+---
+
+## Anti-Pattern: Loading Everything
+
+**WRONG:**
+```
+Load ALL data files at start of step-03
+```
+
+**WHY WRONG:** Bloats context, increases confusion, wastes tokens.
+
+**CORRECT:**
+```
+Load 3 core files at start
+Load additional files ONLY when their trigger condition occurs
+```
--- a/.agents/skills/bmad-story-automator/data/escalation-messages-core.md
+++ b/.agents/skills/bmad-story-automator/data/escalation-messages-core.md
@@ -0,0 +1,103 @@
+# Escalation Message Templates
+
+Use these templates when an escalation trigger fires.
+
+## 1. Code Review Loop Exceeded
+
+**Pre-Escalation Verification:**
+```bash
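+# Check whether the story file itself already reports the story as done
+file_status=$("$scripts" orchestrator-helper story-file-status {story_id})
+file_done=$(echo "$file_status" | jq -r '.status')
+if [ "$file_done" = "done" ]; then
+    echo "✅ Story file shows done - sprint-status out of sync"
+fi
+
+# Run the test suite; npm test runs only if go test exits non-zero
+test_result=$(cd "$PROJECT_ROOT" && go test ./src/... 2>&1 || npm test 2>&1 || true)
+# Treat the run as passing when the captured output contains no "FAIL"
+tests_pass=$([[ "$test_result" != *"FAIL"* ]] && echo "true" || echo "false")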
+```
+
+**Diagnostic Summary (required):**
+| Cycle | Agent | Issues Found | Fixed | Duration |
+|-------|-------|--------------|-------|----------|
+{cycle_history_table}
+
+**Escalation message:**
+```
+🔔 DECISION NEEDED: Code Review Loop (5 cycles exhausted)
+
+Story: {story_name}
+Story ID: {story_id}
+```
+
+---
+
+## 2. Cannot Parse Session Output
+
+**Escalation message:**
+```
+🔔 DECISION NEEDED: Ambiguous Session Output
+
+Story: {story_name}
+Step: {step_name}
+Session: {session_id}
+
+Unable to determine if step succeeded or failed.
+
+Last 20 lines of output:
+{output_snippet}
+
+Options:
+[1] Mark as success and proceed
+[2] Mark as failure and retry
+[3] View full session output
+[4] Pause for manual inspection
+
+Select option:
+```
+
+---
+
+## 3. Session Spawn Failure
+
+**Escalation message:**
+```
+🔔 DECISION NEEDED: Session Spawn Failed
+
+Story: {story_name}
+Step: {step_name}
+Error: {error_message}
+
+Unable to spawn tmux session after retry.
+
+Options:
+[1] Retry again
+[2] Skip this step
+[3] Abort story
+[4] Pause orchestration
+
+Select option:
+```
+
+---
+
+## 4. Git Commit Failure
+
+**Escalation message:**
+```
+🔔 DECISION NEEDED: Git Commit Failed
+
+Story: {story_name}
+Error: {error_message}
+
+Unable to commit changes for this story.
+
+Options:
+[1] Retry commit
+[2] Skip commit and proceed (changes remain uncommitted)
+[3] Pause for manual git resolution
+[4] Abort story
+
+Select option:
+```
+
+---
--- a/.agents/skills/bmad-story-automator/data/escalation-messages-extended.md
+++ b/.agents/skills/bmad-story-automator/data/escalation-messages-extended.md
@@ -0,0 +1,76 @@
+# Escalation Message Templates (Extended)
+
+## 5. Unexpected Error
+
+**Escalation message:**
+```
+🔔 DECISION NEEDED: Unexpected Error
+
+Story: {story_name}
+Step: {step_name}
+Error: {error_message}
+
+An unexpected error occurred during orchestration.
+
+Options:
+[1] Retry current step
+[2] Skip current step
+[3] Abort story and continue with next
+[4] Pause orchestration for investigation
+
+Select option:
+```
+
+---
+
+## 6. Dependency Conflict
+
+**Escalation message:**
+```
+🔔 DECISION NEEDED: Potential Dependency Conflict
+
+Stories in parallel: {story_list}
+Detected conflict: {conflict_description}
+
+These stories may have conflicting changes.
+
+Options:
+[1] Continue in parallel (accept risk)
+[2] Run sequentially instead
+[3] Pause for manual review
+
+Select option:
+```
+
+---
+
+## 7. Dev-Story Implementation Failure
+
+**Pre-escalation behavior:**
+1. Check blocking status (conservative if uncertain)
+2. If BLOCKING: retry up to 3 times
+3. If NOT BLOCKING: retry once
+
+**Escalation message:**
+```
+🔔 DECISION NEEDED: Dev-Story Implementation Failure
+
+Story: {story_name}
+Step: dev-story
+Attempts: {attempt_count}
+Blocking: {yes/no} (affects stories: {list or "none"})
+
+Latest error:
+{error_summary}
+
+Options:
+[1] Retry dev-story - Spawn new session to fix
+[2] Manual fix - Pause orchestration so you can fix it
+[3] View session output - See full output
+[4] Skip story - Move to next (only if not blocking)
+[5] Abort orchestration - Stop entire build cycle
+
+Select option:
+```
+
+**Note:** Option [4] only valid if story is NOT blocking.
--- a/.agents/skills/bmad-story-automator/data/escalation-messages.md
+++ b/.agents/skills/bmad-story-automator/data/escalation-messages.md
@@ -0,0 +1,5 @@
+# Escalation Message Templates
+
+See:
+- `escalation-messages-core.md` (Triggers 1-4)
+- `escalation-messages-extended.md` (Triggers 5-7)
--- a/.agents/skills/bmad-story-automator/data/escalation-triggers.md
+++ b/.agents/skills/bmad-story-automator/data/escalation-triggers.md
@@ -0,0 +1,114 @@
+# Escalation Triggers
+
+**Purpose:** Conditions that require human decision and cannot be resolved autonomously.
+
+## Escalation Categories
+
+### CRITICAL Escalations
+**Definition:** Automation CANNOT proceed - requires human decision.
+
+**Behavior:**
+1. Delete marker file: `rm "{marker_file}"`
+2. Update state: set status to PAUSED in state document
+3. Present options (stop hook won't interfere)
+4. Wait for user input
+5. On resume: recreate marker, set IN_PROGRESS, continue
+
+**Triggers in this category:**
+- Code Review Loop Exceeded (#1)
+- Session Spawn Failure (#3)
+- Git Commit Failure (#4)
+- Unexpected Error (#5)
+- Dev-Story Implementation Failure (#7) when blocking + retries exhausted
+- Session Incomplete (#8) - session finished but workflow not verified complete (v2.2)
+
+### PREFERENCE Escalations
+**Definition:** Automation COULD proceed either way - user chooses direction.
+
+**Behavior:**
+1. Keep marker file (automation still "active")
+2. Present options
+3. Act on selection immediately
+
+**Triggers in this category:**
+- Cannot Parse Session Output (#2)
+- Dependency Conflict (#6)
+- Dev-Story Implementation Failure (#7) when NOT blocking
+
+---
+
+## Escalation Protocol
+
+When an escalation trigger is hit:
+1. Categorize: CRITICAL or PREFERENCE
+2. If CRITICAL: delete marker, set status to PAUSED
+3. Notify: sound/notification
+4. Present: situation + numbered options
+5. Wait: halt until user responds
+6. Log: record decision in action log
+7. Resume: if CRITICAL, recreate marker, set IN_PROGRESS, continue
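+
+The categories and protocol above can be sketched as two small functions. This is an illustration under assumptions: `categorize` and `protocol_steps` are hypothetical names, the step labels are placeholders rather than real commands, and trigger 7 is treated as CRITICAL whenever the story is blocking (retries are assumed to be exhausted by the time escalation is considered).
+
+```python
+CRITICAL, PREFERENCE = "CRITICAL", "PREFERENCE"
+
+
+def categorize(trigger, blocking=False):
+    """Map a trigger number (1-8) to its escalation category."""
+    if trigger in (1, 3, 4, 5, 8):
+        return CRITICAL
+    if trigger in (2, 6):
+        return PREFERENCE
+    if trigger == 7:
+        # Dev-story failure: category depends on whether other stories are blocked.
+        return CRITICAL if blocking else PREFERENCE
+    raise ValueError(f"unknown escalation trigger: {trigger}")
+
+
+def protocol_steps(trigger, blocking=False):
+    """Return the ordered protocol steps for one escalation."""
+    critical = categorize(trigger, blocking) == CRITICAL
+    steps = []
+    if critical:
+        steps += ["delete_marker", "set_status_PAUSED"]
+    steps += ["notify", "present_options", "wait_for_user", "log_decision"]
+    if critical:
+        steps += ["recreate_marker", "set_status_IN_PROGRESS"]
+    steps.append("continue")
+    return steps
+```
+
+The only difference between the two categories is the marker and status bookkeeping that brackets the user interaction; the notify, present, wait, and log steps are shared.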
+
+---
+
+## Trigger Index
+
+Escalation message templates for triggers 1-7 live in the files below; trigger 8's options are listed inline in its entry:
+- `data/escalation-messages-core.md` (Triggers 1-4)
+- `data/escalation-messages-extended.md` (Triggers 5-7)
+
+### 1. Code Review Loop Exceeded (CRITICAL)
+**Trigger:** Code review has run 5 cycles without clean status.
+**See:** `escalation-messages-core.md#1-code-review-loop-exceeded`
+
+### 2. Cannot Parse Session Output (PREFERENCE)
+**Trigger:** Output doesn't match success/failure patterns.
+**See:** `escalation-messages-core.md#2-cannot-parse-session-output`
+
+### 3. Session Spawn Failure (CRITICAL)
+**Trigger:** tmux session failed to spawn after retries.
+**See:** `escalation-messages-core.md#3-session-spawn-failure`
+
+### 4. Git Commit Failure (CRITICAL)
+**Trigger:** Git commit failed (conflict, hook error, etc.).
+**See:** `escalation-messages-core.md#4-git-commit-failure`
+
+### 5. Unexpected Error (CRITICAL)
+**Trigger:** Unhandled exception or unexpected condition.
+**See:** `escalation-messages-extended.md#5-unexpected-error`
+
+### 6. Dependency Conflict (PREFERENCE)
+**Trigger:** Parallel execution detects a potential conflict between stories.
+**See:** `escalation-messages-extended.md#6-dependency-conflict`
+
+### 7. Dev-Story Implementation Failure (CRITICAL or PREFERENCE)
+**Trigger:** dev-story completes with errors after retries.
+**See:** `escalation-messages-extended.md#7-dev-story-implementation-failure`
+
+### 8. Session Incomplete (CRITICAL) [v2.2]
+**Trigger:** `story-automator monitor-session` returns `final_state: "incomplete"` **after maxCycles is exhausted**.
+**Condition:** Session finished (idle/exited) but workflow verification failed across all retry attempts.
+**Typical cause:** Codex code-review session ended without updating sprint-status.
+
+**Why CRITICAL (not PREFERENCE):**
+- Automated retries already exhausted
+- Human must decide: manual fix, use Claude, or skip story
+
+**Options:**
+1. **[1] Manual Fix** - Update sprint-status.yaml yourself
+2. **[2] Run with Claude** - Re-run code-review with Claude agent
+3. **[3] Skip Story** - Mark story as skipped and continue
+4. **[X] Pause** - Stop orchestration for investigation
+
+**Verification command:**
+```bash
+"$scripts" orchestrator-helper verify-code-review {story_id}
+```
+
+---
+
+## Non-Escalation Conditions
+
+Handled automatically (no escalation):
+- Optional step (automate) skipped by override → log and continue
+- Session completes with clear success → continue
+- Session completes with clear failure → retry once, then escalate if it still fails
--- a/.agents/skills/bmad-story-automator/data/execution-patterns.md
+++ b/.agents/skills/bmad-story-automator/data/execution-patterns.md
@@ -0,0 +1,59 @@
+# Execution Patterns (v1.9.0)
+
+**Purpose:** Critical execution patterns and anti-patterns for the orchestrator.
+
+---
+
+## 🚨 FORBIDDEN EXECUTION PATTERNS (NO EXCEPTIONS)
+
+### NEVER Chain Multiple Workflow Steps
+
+**FORBIDDEN:**
+```bash
+# ❌ WRONG - Chaining steps in a loop bypasses per-step error handling
+for step in create dev; do
+  session=$("$scripts" tmux-wrapper spawn "$step" ...)
+  result=$("$scripts" monitor-session "$session" ...)
+done
+```
+
+**WHY:** If the monitoring task crashes mid-loop, ALL subsequent steps are lost. The orchestrator loses visibility even though tmux sessions may have completed successfully.
+
+**REQUIRED:**
+```bash
+# ✅ CORRECT - Each step is a separate operation with its own error handling
+# Step A: Create
+session=$("$scripts" tmux-wrapper spawn create ...)
+result=$("$scripts" monitor-session "$session" ...)
+"$scripts" tmux-wrapper kill "$session"
+# VERIFY state before proceeding
+
+# Step B: Dev (only after create verified)
+session=$("$scripts" tmux-wrapper spawn dev ...)
+result=$("$scripts" monitor-session "$session" ...)
+"$scripts" tmux-wrapper kill "$session"
+# VERIFY state before proceeding
+```
+
+---
+
+## ALWAYS Verify State After Each Step
+
+After each workflow step completes (create/dev/auto/review), **VERIFY state from source of truth** before proceeding to the next step:
+
+1. **Story file exists and has expected content** (for create-story)
+2. **Sprint-status.yaml shows correct status** (for dev-story, code-review)
+3. **DO NOT rely solely on monitoring output** - if monitoring fails, verify directly
+
+---
+
+## IF Monitoring Fails
+
+If `story-automator monitor-session` or background task monitoring fails:
+
+1. Check if tmux session still exists: `tmux list-sessions | grep {pattern}`
+2. Check session status directly: `"$scripts" tmux-status-check "$session"`
+3. Verify story file / sprint-status regardless of monitoring output
+4. Only escalate after direct verification confirms failure
+
+**See also:** `monitoring-fallback.md` for detailed fallback patterns.
--- a/.agents/skills/bmad-story-automator/data/marker-file-format.md
+++ b/.agents/skills/bmad-story-automator/data/marker-file-format.md
@@ -0,0 +1,67 @@
+# Marker File Format
+
+**Location:** Resolved by `orchestrator-helper marker path` for the active runtime layout:
+- Claude: `.claude/.story-automator-active`
+- Codex: follows the active Codex skill root parent, usually `.agents/.story-automator-active` or `.codex/.story-automator-active`
+
+If a runtime is explicitly selected but the installed story-automator skill is discovered under another supported root, the marker follows that active skill root. Always use `orchestrator-helper marker path` rather than hard-coding the marker path.
+
+**Purpose:** Enables the Stop hook to prevent premature stopping during orchestration.
+
+---
+
+## JSON Structure
+
+```json
+{
+  "epic": "{epic_id}",
+  "currentStory": "{first_story_id}",
+  "storiesRemaining": {story_count},
+  "stateFile": "{path_to_state_document}",
+  "startedAt": "{timestamp}",
+  "heartbeat": "{timestamp}",
+  "pid": {process_id},
+  "projectSlug": "{project_slug}"
+}
+```
+
+---
+
+## Field Descriptions
+
+| Field | Description |
+|-------|-------------|
+| `epic` | Epic identifier (e.g., "5") |
+| `currentStory` | Current story being processed (e.g., "5.3") |
+| `storiesRemaining` | Count of stories left in queue |
+| `stateFile` | Path to orchestration state document |
+| `startedAt` | Orchestration start timestamp (ISO 8601) |
+| `heartbeat` | Last activity timestamp, updated periodically |
+| `pid` | Process ID of orchestrator (crash detection) |
+| `projectSlug` | (v2.0) Project identifier for session naming |
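+
+As a rough illustration, a marker for epic 5 with story 5.3 in progress might look like the following. Every value here is hypothetical, including the state-file path, the timestamps, the PID, and the project slug; only the field names come from the structure above.
+
+```json
+{
+  "epic": "5",
+  "currentStory": "5.3",
+  "storiesRemaining": 4,
+  "stateFile": "_bmad-output/story-automator/example-state.md",
+  "startedAt": "2026-01-15T09:00:00Z",
+  "heartbeat": "2026-01-15T09:05:00Z",
+  "pid": 12345,
+  "projectSlug": "example-project"
+}
+```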
+
+---
+
+## Heartbeat Updates
+
+The orchestrator should update the heartbeat timestamp every ~5 minutes during long-running operations. This prevents the marker from going stale if the orchestrator is still running but taking a while on a complex story.
+
+**Staleness threshold:** 30 minutes (see story-automator stop-hook)
+
+---
+
+## Creation Command
+
+```bash
+project_slug=$("{deriveProjectSlug}" derive-project-slug --project-root "{project-root}" | jq -r '.slug')
+"{stateHelper}" orchestrator-helper marker create --epic "$epic_id" --story "$first_story_id" \
+  --remaining "$selected_count" --state-file "$state_path" \
+  --project-slug "$project_slug" --pid "$$" --heartbeat "{timestamp}"
+```
+
+---
+
+## Related Documentation
+
+- **Stop Hook:** See `stop-hook-config.md` for hook behavior
+- **Troubleshooting:** See `stop-hook-troubleshooting.md` for issues
--- a/.agents/skills/bmad-story-automator/data/monitoring-codex.md
+++ b/.agents/skills/bmad-story-automator/data/monitoring-codex.md
@@ -0,0 +1,66 @@
+# Codex-Specific Monitoring (v2.4.0)
+
+**Purpose:** Special handling for Codex CLI sessions in `story-automator monitor-session`.
+
+---
+
+## Agent Detection
+
+Codex sessions are detected by:
+1. `AI_AGENT` environment variable (most reliable)
+2. Explicit Codex CLI identifiers: `OpenAI Codex`, `codex exec`, `codex-cli`, `gpt-*-codex`, `tokens used`
+
+---
+
+## Session States for Codex
+
+| State | Meaning | Detection |
+|-------|---------|-----------|
+| `in_progress` | Codex actively working | Heartbeat alive OR output changed recently |
+| `idle` | Session alive but no prompt yet | Heartbeat idle + output stale (pre-stuck window) |
+| `completed` | CLI has exited | Prompt returned, pane exited, or `tokens used` |
+| `stuck` | No recent output for too long | Output stale beyond threshold |
+
+**Key Difference:** For Codex, "idle" is NOT the same as "completed". The CLI may have stopped but the workflow might not have finished.
+
+---
+
+## Output Freshness vs Completed Detection
+
+```
+output_fresh():   Output hash changed within CODEX_OUTPUT_STALE_SECONDS
+codex_completed(): Prompt returned, pane exited, or "tokens used"
+```
+
+**Priority:** `completed` > `in_progress` > `idle` > `stuck`
+
+### Output Staleness Window
+
+`CODEX_OUTPUT_STALE_SECONDS` (default: 300) defines how long Codex can be silent
+before the session is considered `stuck`. Any output change refreshes the timer.
+
+---
+
+## Code-Review Workflow Verification
+
+For code-review with Codex, story-automator monitor-session verifies completion:
+
+```bash
+# Must pass --workflow and --story-key for verification
+result=$("$scripts" monitor-session "$session" --json \
+  --workflow review --story-key {story_id})
+```
+
+**Verification checks:**
+1. Sprint-status.yaml shows "done" for story
+2. OR story file Status field shows "done"
+3. If neither → `final_state: "incomplete"`
+
+---
+
+## Fake Todo Progress
+
+Codex doesn't use TodoWrite, so `story-automator tmux-status-check` fakes progress:
+- Start: `todos_total=1, todos_done=0`
+- While running: Keep `0/1`
+- On idle after activity: Set `1/1` (signals "done, needs verification")
--- a/.agents/skills/bmad-story-automator/data/monitoring-fallback.md
+++ b/.agents/skills/bmad-story-automator/data/monitoring-fallback.md
@@ -0,0 +1,85 @@
+# Monitoring Failure Fallback (v1.9.0)
+
+**Purpose:** Recovery patterns when primary monitoring fails.
+
+---
+
+## When Primary Monitoring Fails
+
+Primary monitoring can fail in several ways:
+- Background task crashes (TaskOutput returns empty/error)
+- Network timeout during monitoring
+- Process killed unexpectedly
+- Output file missing or corrupted
+
+**Key insight:** The tmux session may have completed successfully even if monitoring died.
+
+---
+
+## Fallback Sequence
+
+When `story-automator monitor-session` fails or background monitoring task dies:
+
+```bash
+# STEP 1: Check if tmux session still exists
+sessions=$(tmux list-sessions -F '#{session_name}' 2>/dev/null | grep "sa-.*{story_pattern}" || true)
+
+# STEP 2: If session exists, check its status directly
+if [ -n "$sessions" ]; then
+    while IFS= read -r session; do
+        status=$("$scripts" tmux-status-check "$session")
+        session_state=$(echo "$status" | cut -d',' -f6)
+        # Act based on direct status
+    done <<< "$sessions"
+fi
+
+# STEP 3: ALWAYS verify source of truth regardless of session status
+# Story file check:
+story_file=$(ls _bmad-output/implementation-artifacts/{story_prefix}-*.md 2>/dev/null | head -1)
+if [ -f "$story_file" ]; then
+    # Story file exists - check its status field
+fi
+
+# Sprint-status check:
+status=$("$scripts" orchestrator-helper sprint-status get "{story_key}")
+is_done=$(echo "$status" | jq -r '.done')
+```
+
+---
+
+## Detection: Monitoring Task Crashed
+
+Signs that your monitoring task has crashed:
+
+| Signal | Meaning |
+|--------|---------|
+| `TaskOutput` returns empty 2+ times | Task may be dead |
+| Output file path doesn't exist | Task never wrote results |
+| "running" status but no progress | Task is stuck or dead |
+
+**Recovery:**
+1. Do NOT wait indefinitely for a dead monitoring task
+2. After 2+ empty TaskOutput results, switch to direct verification
+3. Use tmux session checks + source of truth verification
+4. Resume workflow based on verified state, not monitoring state
+
+---
+
+## Integration with Retry Logic
+
+**If fallback verification shows step succeeded:**
+- Proceed to next step (monitoring failed but workflow succeeded)
+- Log: "Monitoring failed but direct verification confirmed success"
+
+**If fallback verification shows step failed/incomplete:**
+- Apply normal retry/fallback strategy
+- Do NOT treat monitoring failure as step failure
+
+---
+
+## Key Principle
+
+**The tmux session is the source of truth for session state.**
+**The story file and sprint-status.yaml are the source of truth for workflow state.**
+
+Monitoring is just observation - if monitoring fails, verify from source of truth and continue.
--- a/.agents/skills/bmad-story-automator/data/monitoring-pattern-parsing.md
+++ b/.agents/skills/bmad-story-automator/data/monitoring-pattern-parsing.md
@@ -0,0 +1,27 @@
+# Monitoring Pattern: Parsing & Review Handling
+
+## Sub-Agent Pattern
+
+**ALWAYS use sub-agent for output parsing:**
+
+```bash
+# Correct: Let haiku parse
+parsed=$("$scripts" orchestrator-helper parse-output "$output_file" dev)
+action=$(echo "$parsed" | jq -r '.next_action')
+
+# WRONG: Parse yourself
+# content=$(cat "$output_file")  # DON'T DO THIS
+# if grep -q "SUCCESS" ...       # DON'T DO THIS
+```
+
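+**Why:** The sub-agent returns a parsed result of about 200 tokens. The main context already holds ~50k+ tokens, so reading raw session output into it wastes far more context than the sub-agent call costs.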
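+
+For the `dev` step, the parsed result read by `jq` above might look like this sketch. The keys follow `data/parse/dev.json`; the values are hypothetical, and whether the helper emits `tests_passed` and `build_passed` as JSON booleans or as strings is an assumption here.
+
+```json
+{
+  "status": "SUCCESS",
+  "tests_passed": true,
+  "build_passed": true,
+  "summary": "Example summary text",
+  "next_action": "proceed"
+}
+```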
+
+---
+
+## Code Review Special Handling
+
+See `code-review-loop.md` for review cycle logic. Key points:
+
+- Auto-fix via instruction: `code-review ${story_id} auto-fix all issues without prompting`
+- No menu detection needed - instruction handles it
+- After completion, verify sprint-status before proceeding
--- a/.agents/skills/bmad-story-automator/data/monitoring-pattern.md
+++ b/.agents/skills/bmad-story-automator/data/monitoring-pattern.md
@@ -0,0 +1,186 @@
+# Session Monitoring Pattern
+
+## Quick Reference
+
+**All monitoring is handled by the installed helper (`$scripts`, usually `scripts/story-automator`). DO NOT manually construct tmux commands.**
+
+### Binary Location
+
+```
+scripts/
+└── story-automator  # single Python helper (use subcommands below)
+```
+
+---
+
+## 🚨 FORBIDDEN PATTERNS (NO EXCEPTIONS)
+
+| Pattern | Why Forbidden |
+|---------|---------------|
+| `tmux capture-pane` directly | Context bloat, use status script |
+| `while true` loops in LLM context | Session crash, use `$scripts monitor-session` |
+| Manual session name construction | Error-prone, use `$scripts tmux-wrapper` |
+| Parsing raw output yourself | Use `$scripts orchestrator-helper parse-output` |
+
+---
+
+## Standard Workflow: Spawn + Monitor + Verify (Create Example)
+
+```bash
+# STEP 1: Spawn session (use $scripts tmux-wrapper)
+session_name=$("$scripts" tmux-wrapper spawn create 5 5.3 \
+  --command "$("$scripts" tmux-wrapper build-cmd create 5.3 --state-file "$state_file")")
+
+# STEP 2: Monitor until completion (SINGLE API CALL)
+result=$("$scripts" monitor-session "$session_name" \
+  --verbose --json \
+  --workflow create --story-key 5.3 --state-file "$state_file")
+
+# STEP 3: Verify success against the shared create contract
+validation=$("$scripts" orchestrator-helper verify-step create 5.3 --state-file "$state_file")
+verified=$(echo "$validation" | jq -r '.verified')
+
+# STEP 4: Act on verifier result
+[ "$verified" = "true" ] || echo "retry-or-escalate"
+
+# STEP 5: ALWAYS cleanup session (v1.2.0)
+"$scripts" tmux-wrapper kill "$session_name"
+```
+
+**Context savings:** This entire cycle is 5 bash calls instead of 15+ API roundtrips.
+
+**Session Cleanup (v1.2.0):** ALWAYS kill the session after processing, regardless of success or failure. Orphaned sessions consume resources and cause confusion.
+
+---
+
+## Script Quick Reference
+
+### $scripts tmux-wrapper
+
+```bash
+# Spawn session
+"$scripts" tmux-wrapper spawn <step> <epic> <story_id> [--command "..."] [--cycle N]
+
+# Generate session name only
+"$scripts" tmux-wrapper name <step> <epic> <story_id> [--cycle N]
+
+# Build workflow command
+"$scripts" tmux-wrapper build-cmd <step> <story_id> [extra_instruction] [--state-file PATH]
+
+# List/kill sessions
+"$scripts" tmux-wrapper list [--project-only]
+"$scripts" tmux-wrapper kill <session_name>
+"$scripts" tmux-wrapper kill-all [--project-only]
+```
+
+### $scripts monitor-session
+
+```bash
+# Monitor until completion (returns when session ends)
+"$scripts" monitor-session <session_name> [options]
+
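+# Options:
+#   --max-polls N     Maximum iterations (default: 30)
+#   --timeout MIN     Overall timeout in minutes (default: 60)
+#   --verbose         Print progress to stderr
+#   --json            Output as JSON instead of CSV
+#   --workflow STEP   Workflow being monitored (e.g. create, review); needed for completion verification
+#   --story-key KEY   Story key to verify (e.g. 5.3); needed for completion verification
+#   --state-file PATH Orchestration state file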
+
+# Output (JSON):
+# {"final_state":"completed|crashed|stuck|timeout|incomplete|not_found","output_file":"/tmp/...","exit_reason":"..."}
+```
+
+### $scripts orchestrator-helper
+
+```bash
+# Check sprint status
+"$scripts" orchestrator-helper sprint-status get <story_key>
+
+# Parse session output with sub-agent (haiku)
+"$scripts" orchestrator-helper parse-output <file> <step_type>
+
+# Marker file operations
+"$scripts" orchestrator-helper marker create --epic E --story S --remaining N
+"$scripts" orchestrator-helper marker remove
+"$scripts" orchestrator-helper marker check
+
+# Escalation checks
+"$scripts" orchestrator-helper escalate <trigger> <context>
+```
+
+### $scripts orchestrator-helper verify-step
+
+```bash
+# Validate create-story via the shared success verifier
+"$scripts" orchestrator-helper verify-step create 5.3 --state-file "$state_file"
+```
+
+---
+
+## Decision Flow
+
+After `$scripts monitor-session` returns:
+
+| final_state | Action |
+|-------------|--------|
+| `completed` | Run step verifier or parser for the active workflow |
+| `incomplete` | **(v2.2)** Session idle but workflow NOT verified → Escalate immediately |
+| `crashed` | Check retry count → retry or escalate |
+| `stuck` | Get output → investigate → may need restart |
+| `timeout` | Get output → escalate to user |
+| `not_found` | Session gone → check for partial work |
+
+---
+
+## Monitoring Failure Fallback (v1.9.0)
+
+**See `monitoring-fallback.md` for complete fallback patterns when monitoring fails.**
+
+Key points:
+- If monitoring crashes, the tmux session may still have completed successfully
+- Fall back to direct session checks + source of truth verification
+- Do NOT treat monitoring failure as step failure
+
+---
+
+## Statusline Time Gate (v2.6.0)
+
+**Purpose:** Prevent ALL false "stuck" escalations by using the Claude Code statusline as definitive proof-of-life.
+
+### How It Works
+
+Claude Code displays a statusline at the bottom of the terminal:
+```
+folder | ctx(N%) | HH:MM:SS
+                   ^^^^^^^^ <- This time updates continuously while Claude runs
+```
+
+The installed helper's `$scripts tmux-status-check` command:
+1. Parses the statusline time from the tmux pane
+2. Stores it in the session state file
+3. Compares with previous poll's time
+4. **If time has advanced → session is ALIVE → DO NOT escalate**
+
+### Decision Matrix
+
+| Previous Time | Current Time | Other Checks Say | Result |
+|---------------|--------------|------------------|--------|
+| 10:00:00 | 10:01:00 | stuck | `just_started` (time advanced = alive) |
+| 10:00:00 | 10:00:00 | stuck | `stuck` (time unchanged) |
+| (none) | 10:00:00 | stuck | `just_started` (first observation = alive) |
+| (none) | (none) | stuck | `stuck` (no statusline data) |
+
+### Key Principle
+
+**The statusline time gate is the FINAL AUTHORITY.** Even if all other detection methods (process checks, activity indicators, heartbeat) suggest the session is stuck, if the statusline time has advanced, the session is definitively alive and MUST NOT be escalated.
+
+This prevents false escalations for:
+- Complex sessions in long thinking phases
+- Sessions with unusual output patterns
+- Edge cases where other detection fails
+
+---
+
+## References
+
+- **Codex monitoring details:** `monitoring-codex.md`
+- **Output parsing + review handling:** `monitoring-pattern-parsing.md`
--- a/.agents/skills/bmad-story-automator/data/orchestration-policy.json
+++ b/.agents/skills/bmad-story-automator/data/orchestration-policy.json
@@ -0,0 +1,146 @@
+{
+  "version": 1,
+  "snapshot": {
+    "relativeDir": "_bmad-output/story-automator/policy-snapshots"
+  },
+  "runtime": {
+    "parser": {
+      "provider": "claude",
+      "model": "haiku",
+      "timeoutSeconds": 120
+    },
+    "merge": {
+      "maps": "deep",
+      "arrays": "replace"
+    }
+  },
+  "workflow": {
+    "sequence": ["create", "dev", "auto", "review", "retro"],
+    "repeat": {
+      "review": {
+        "maxCycles": 5,
+        "successVerifier": "review_completion",
+        "onIncomplete": "retry",
+        "onExhausted": "escalate"
+      }
+    },
+    "crash": {
+      "maxRetries": 2,
+      "onExhausted": "escalate"
+    }
+  },
+  "steps": {
+    "create": {
+      "label": "create-story",
+      "assets": {
+        "skillName": "bmad-create-story",
+        "workflowCandidates": ["workflow.md", "workflow.yaml"],
+        "instructionsCandidates": ["discover-inputs.md"],
+        "checklistCandidates": ["checklist.md"],
+        "templateCandidates": ["template.md"],
+        "required": ["skill"]
+      },
+      "prompt": {
+        "templateFile": "data/prompts/create.md",
+        "interactionMode": "autonomous"
+      },
+      "parse": {
+        "schemaFile": "data/parse/create.json"
+      },
+      "success": {
+        "verifier": "create_story_artifact",
+        "config": {
+          "glob": "_bmad-output/implementation-artifacts/{story_prefix}-*.md",
+          "expectedMatches": 1
+        }
+      }
+    },
+    "dev": {
+      "label": "dev-story",
+      "assets": {
+        "skillName": "bmad-dev-story",
+        "workflowCandidates": ["workflow.md", "workflow.yaml"],
+        "instructionsCandidates": [],
+        "checklistCandidates": ["checklist.md"],
+        "templateCandidates": [],
+        "required": ["skill"]
+      },
+      "prompt": {
+        "templateFile": "data/prompts/dev.md",
+        "interactionMode": "autonomous"
+      },
+      "parse": {
+        "schemaFile": "data/parse/dev.json"
+      },
+      "success": {
+        "verifier": "session_exit"
+      }
+    },
+    "auto": {
+      "label": "qa-generate-e2e-tests",
+      "assets": {
+        "skillName": "bmad-qa-generate-e2e-tests",
+        "workflowCandidates": ["workflow.md", "workflow.yaml"],
+        "instructionsCandidates": [],
+        "checklistCandidates": ["checklist.md"],
+        "templateCandidates": [],
+        "required": []
+      },
+      "prompt": {
+        "templateFile": "data/prompts/auto.md",
+        "interactionMode": "autonomous"
+      },
+      "parse": {
+        "schemaFile": "data/parse/auto.json"
+      },
+      "success": {
+        "verifier": "session_exit"
+      }
+    },
+    "review": {
+      "label": "code-review",
+      "assets": {
+        "skillName": "bmad-story-automator-review",
+        "workflowCandidates": ["workflow.yaml", "workflow.md"],
+        "instructionsCandidates": ["instructions.xml"],
+        "checklistCandidates": ["checklist.md"],
+        "templateCandidates": [],
+        "required": ["skill"]
+      },
+      "prompt": {
+        "templateFile": "data/prompts/review.md",
+        "interactionMode": "autonomous",
+        "acceptExtraInstruction": true,
+        "defaultExtraInstruction": "auto-fix all issues without prompting"
+      },
+      "parse": {
+        "schemaFile": "data/parse/review.json"
+      },
+      "success": {
+        "verifier": "review_completion",
+        "contractFile": "<skills-root>/bmad-story-automator-review/contract.json"
+      }
+    },
+    "retro": {
+      "label": "retrospective",
+      "assets": {
+        "skillName": "bmad-retrospective",
+        "workflowCandidates": ["workflow.md", "workflow.yaml"],
+        "instructionsCandidates": [],
+        "checklistCandidates": [],
+        "templateCandidates": [],
+        "required": ["skill"]
+      },
+      "prompt": {
+        "templateFile": "data/prompts/retro.md",
+        "interactionMode": "autonomous"
+      },
+      "parse": {
+        "schemaFile": "data/parse/retro.json"
+      },
+      "success": {
+        "verifier": "epic_complete"
+      }
+    }
+  }
+}
--- a/.agents/skills/bmad-story-automator/data/orchestrator-rules-appendix.md
+++ b/.agents/skills/bmad-story-automator/data/orchestrator-rules-appendix.md
@@ -0,0 +1,86 @@
+# Orchestrator Rules Appendix
+
+## Session Naming
+**See `tmux-commands.md` for complete session naming documentation.**
+
+Pattern: `sa-{project_slug}-{timestamp}-e{epic}-s{N}-{type}` where type = `create`, `dev`, `auto`, `review-{cycle}`
+
+## Workflow Command Arguments
+
+**CRITICAL:** ALWAYS pass required positional arguments to BMAD workflows.
+
+### Story ID Requirement
+
+**create-story, dev-story, code-review, automate (`testarch-automate` or `qa-generate-e2e-tests`)** — All require the story ID as a positional argument.
+
+**WRONG:**
+```text
+Execute the BMAD create-story workflow.
+```
+This causes create-story to create ALL stories in the epic, not just one.
+
+**CORRECT:**
+```text
+Execute the BMAD create-story workflow for story 5.3.
+```
+This creates ONLY story 5.3.
+
+### Validation After create-story
+
+**After create-story session completes:**
+1. Count story files BEFORE spawning session
+2. Count story files AFTER session completes
+3. Verify exactly ONE new file created
+4. IF 0 or >1 files → Escalate with file list
+
+**This prevents runaway story creation** where create-story creates 5.3, 5.4, 5.5, etc. instead of just the requested story.
+
+## State Updates
+
+After EVERY action:
+1. Update `currentStep` in state document
+2. Log action with timestamp
+3. Update story progress table
+
+## Escalation Protocol
+
+**See `data/escalation-triggers.md` for complete trigger definitions and behavior.**
+
+### Quick Reference
+
+| Category | Marker Action | State | When |
+|----------|---------------|-------|------|
+| CRITICAL | **DELETE** | PAUSED | Cannot proceed (retries exhausted) |
+| PREFERENCE | Keep | IN_PROGRESS | Could proceed either way |
+
+### CRITICAL Escalation (Key Steps)
+
+1. Delete marker: run `orchestrator-helper marker remove` via the installed story-automator helper
+2. Set state to PAUSED
+3. Present menu (stop hook won't interfere)
+4. On resume: recreate marker, set IN_PROGRESS
+
+### Dev-Story Smart Retry
+
+Before escalating, check if story is blocking:
+- **Blocking:** Retry up to 3 times → then CRITICAL
+- **Not blocking:** Retry once → then PREFERENCE (can skip)
+
+## Session Monitoring & Output Parsing
+
+**CRITICAL:** These topics have dedicated reference files. Load them when needed:
+
+- **Session Monitoring:** See `data/monitoring-pattern.md`
+  - FORBIDDEN patterns (capture-pane, etc.)
+  - Status script usage and CSV format
+  - Decision tree for poll results
+  - Polling loop with state tracking
+
+- **Output Parsing:** See `data/monitoring-pattern-parsing.md` (Sub-Agent Pattern section)
+  - NEVER parse output yourself
+  - ALWAYS use sub-agents (Task tool, haiku)
+  - Verification checkpoint before proceeding
+
+- **Sub-Agent Prompts:** See `data/subagent-prompts.md`
+  - Session Output Parser
+  - Code Review Analyzer (also see `subagent-prompts-analysis.md`)
--- a/.agents/skills/bmad-story-automator/data/orchestrator-rules.md
+++ b/.agents/skills/bmad-story-automator/data/orchestrator-rules.md
@@ -0,0 +1,180 @@
+# Orchestrator Rules
+
+Load once at workflow start. Do not re-read in subsequent steps.
+
+---
+
+## Your Role
+
+You are the **Build Cycle Orchestrator** — an autonomous coordinator that:
+- Spawns T-Mux sessions for each workflow step
+- Monitors progress and parses outputs
+- Handles code review loops until clean
+- Commits after each completed story
+- Escalates to user ONLY when decisions are needed
+
+## Ground Truth: sprint-status.yaml
+
+**CRITICAL:** `_bmad-output/implementation-artifacts/sprint-status.yaml` is the single source of truth.
+
+### 🚨 ABSOLUTE RULE: NEVER UPDATE sprint-status.yaml 🚨
+
+**YOU (the orchestrator) MUST NEVER, EVER write to sprint-status.yaml.**
+
+- ❌ NEVER use Edit tool on sprint-status.yaml
+- ❌ NEVER use Write tool on sprint-status.yaml
+- ❌ NEVER use Bash to modify sprint-status.yaml
+- ❌ NEVER "fix" mismatches by updating sprint-status.yaml
+
+**WHO updates it:** The T-Mux sessions running dev-story, code-review, etc.
+
+**IF MISMATCH DETECTED:**
+1. Do NOT "correct" sprint-status.yaml
+2. Re-run the workflow that SHOULD update it (dev-story, code-review)
+3. The session will update sprint-status.yaml as part of its workflow
+
+**When to READ (read-only):**
+- At initialization — check if earlier stories are incomplete
+- When resuming — verify current state matches
+- After each story "completes" — verify sprint-status shows `done`
+
+**Initialization/Resume check:**
+- If earlier stories in range are not `done`, ask user: "Stories X, Y are not complete. Process them first?"
+- If yes → add them to queue before requested stories
+
+**Post-story verification:**
+- After code review passes and commit succeeds, check sprint-status.yaml
+- If story is NOT marked `done` → re-run code-review (it will update sprint-status)
+- Only proceed to next story when sprint-status confirms `done`
+
+### Sprint-Status "done" from Dev-Story (Session 22 Note)
+
+**IMPORTANT:** If dev-story marks sprint-status as "done" but code-review later finds HIGH issues:
+- This is EXPECTED behavior - dev-story completes successfully, but code-review finds additional issues
+- The code-review workflow will update sprint-status appropriately
+- Do NOT trust "done" status from dev-story alone
+- ALWAYS run code-review to verify the implementation quality
+
+## Custom Instructions
+
+User-provided instructions are flexible and may apply to:
+- The orchestrator itself (e.g., "prioritize story 3")
+- Specific sessions (e.g., "always run tests" → pass to dev sessions)
+- Conditional situations (e.g., "always run tests after changes")
+
+**Interpret intelligently** — Don't mechanically inject instructions everywhere. Apply judgment about when and how instructions are relevant.
+
+## Core Rules
+
+1. **Coordinate, don't implement** — Spawn sessions, don't write code yourself
+2. **Log everything** — Update state document after every action
+3. **Escalate, don't decide** — When uncertain, ask the user
+4. **Use sub-agents for parsing** — Don't bloat context with raw output
+5. **Follow the sequence** — Don't skip or reorder steps
+6. **Sprint-status is truth** — Always sync with sprint-status.yaml
+7. **Always cleanup sessions** — Kill tmux sessions after completion (v1.2.0)
+8. **Verify state after each step** — Check source of truth, not just monitoring output (v1.9.0)
+
+---
+
+## State Verification After Each Step (v1.9.0)
+
+### 🚨 CRITICAL: Verify Before Proceeding
+
+After **EVERY** workflow step completes (create/dev/auto/review), you MUST verify state from the **source of truth** before proceeding to the next step.
+
+**DO NOT rely solely on monitoring output.** Monitoring can fail, crash, or lose connection. The source of truth is:
+- **Story files** in `_bmad-output/implementation-artifacts/`
+- **sprint-status.yaml** in `_bmad-output/implementation-artifacts/`
+
+### Verification Sequence
+
+After each step:
+
+```bash
+# 1. Get monitoring result (may be incomplete/failed)
+result=$("$scripts" monitor-session "$session" --json)
+final_state=$(echo "$result" | jq -r '.final_state')
+
+# 2. ALWAYS verify from source of truth regardless of monitoring result
+# For create-story: verify story file exists
+# For dev-story: verify sprint-status updated
+# For code-review: verify sprint-status shows "done"
+
+# 3. Only proceed when source of truth confirms success
+```
+
+### Monitoring Failure Fallback
+
+**See `monitoring-fallback.md` for complete fallback patterns.**
+
+Quick reference:
+1. Check if session exists: `tmux list-sessions | grep {session_pattern}`
+2. Check session status directly: `"$scripts" tmux-status-check "$session"`
+3. Verify source of truth: story file / sprint-status.yaml
+4. Proceed based on verified state, not monitoring state
+
+### Why This Matters
+
+Observed failure mode: Orchestrator's monitoring task crashed after dev-story completed. The tmux session had actually succeeded, but the orchestrator lost visibility and never ran code-review. **Direct state verification would have recovered from this.**
+
+---
+
+## Agent Fallback Strategy
+
+**See `agent-fallback.md` for complete multi-agent documentation.**
+**Troubleshooting:** `agent-fallback-troubleshooting.md`
+
+**Quick Reference:**
+- Primary/fallback agents configurable (Claude or Codex)
+- Different CLI commands and prompt styles per agent
+- Automatic fallback on crash after retries exhausted
+- Codex has 1.5x timeouts, no todo tracking
+
+---
+
+### 🚨 ABSOLUTE RULE: NEVER Change Working Directory 🚨
+
+**YOU (the orchestrator) MUST NEVER use the `cd` command.**
+
+- ❌ NEVER use `cd backend && ...`
+- ❌ NEVER use `cd /path/to/dir`
+- ❌ NEVER change working directory for ANY reason
+- ✅ ALWAYS use absolute paths for all file operations
+- ✅ ALWAYS use absolute paths for script invocations
+
+**Why?** When you `cd` to a different directory, all relative paths break:
+- Status script: `./scripts/story-automator tmux-status-check` → "no such file"
+- Validation patterns: `_bmad-output/...` → wrong location
+- All monitoring fails, causing fallback to FORBIDDEN patterns
+
+**Example - WRONG:**
+```bash
+cd backend && go test ./internal/api/...
+```
+
+**Example - CORRECT:**
+```bash
+go test {project_root}/backend/internal/api/...
+```
+
+### 🚨 ABSOLUTE RULE: NEVER Edit Source Code Directly 🚨
+
+**YOU (the orchestrator) MUST NEVER use Edit/Write tools on source code.**
+
+- ❌ NEVER use Edit tool on `.go`, `.ts`, `.tsx`, `.js`, `.py`, etc.
+- ❌ NEVER use Write tool to create source code files
+- ❌ NEVER "fix issues" by modifying code directly
+- ✅ ALWAYS spawn a tmux session (dev-story) to make code changes
+- ✅ ALWAYS delegate code fixes to child sessions
+
+**Why?** The orchestrator's role is COORDINATION, not implementation. All code changes must go through proper workflow sessions that:
+- Have full project context
+- Run tests after changes
+- Update sprint-status appropriately
+- Can be reviewed and audited
+
+## Appendix
+
+See `orchestrator-rules-appendix.md` for session naming, workflow command arguments, monitoring, and output parsing details.
+
--- a/.agents/skills/bmad-story-automator/data/parse/auto.json
+++ b/.agents/skills/bmad-story-automator/data/parse/auto.json
@@ -0,0 +1,10 @@
+{
+  "requiredKeys": ["status", "tests_added", "coverage_improved", "summary", "next_action"],
+  "schema": {
+    "status": "SUCCESS|FAILURE|AMBIGUOUS",
+    "tests_added": "integer",
+    "coverage_improved": "true|false",
+    "summary": "brief description",
+    "next_action": "proceed|retry|escalate"
+  }
+}
--- a/.agents/skills/bmad-story-automator/data/parse/create.json
+++ b/.agents/skills/bmad-story-automator/data/parse/create.json
@@ -0,0 +1,10 @@
+{
+  "requiredKeys": ["status", "story_created", "story_file", "summary", "next_action"],
+  "schema": {
+    "status": "SUCCESS|FAILURE|AMBIGUOUS",
+    "story_created": "true|false",
+    "story_file": "path or null",
+    "summary": "brief description",
+    "next_action": "proceed|retry|escalate"
+  }
+}
--- a/.agents/skills/bmad-story-automator/data/parse/dev.json
+++ b/.agents/skills/bmad-story-automator/data/parse/dev.json
@@ -0,0 +1,10 @@
+{
+  "requiredKeys": ["status", "tests_passed", "build_passed", "summary", "next_action"],
+  "schema": {
+    "status": "SUCCESS|FAILURE|AMBIGUOUS",
+    "tests_passed": "true|false",
+    "build_passed": "true|false",
+    "summary": "brief description",
+    "next_action": "proceed|retry|escalate"
+  }
+}
--- a/.agents/skills/bmad-story-automator/data/parse/retro.json
+++ b/.agents/skills/bmad-story-automator/data/parse/retro.json
@@ -0,0 +1,8 @@
+{
+  "requiredKeys": ["status", "summary", "next_action"],
+  "schema": {
+    "status": "SUCCESS|FAILURE|AMBIGUOUS",
+    "summary": "brief description",
+    "next_action": "proceed|retry|escalate"
+  }
+}
--- a/.agents/skills/bmad-story-automator/data/parse/review.json
+++ b/.agents/skills/bmad-story-automator/data/parse/review.json
@@ -0,0 +1,15 @@
+{
+  "requiredKeys": ["status", "issues_found", "all_fixed", "summary", "next_action"],
+  "schema": {
+    "status": "SUCCESS|FAILURE|AMBIGUOUS",
+    "issues_found": {
+      "critical": "integer",
+      "high": "integer",
+      "medium": "integer",
+      "low": "integer"
+    },
+    "all_fixed": "true|false",
+    "summary": "brief description",
+    "next_action": "proceed|retry|escalate"
+  }
+}
--- a/.agents/skills/bmad-story-automator/data/preflight-prompts.md
+++ b/.agents/skills/bmad-story-automator/data/preflight-prompts.md
@@ -0,0 +1,141 @@
+# Pre-flight Prompts
+
+Reference prompts for the pre-flight configuration step.
+
+---
+
+## Context Gathering Questions
+
+Present these questions to gather implementation context:
+
+```
+**Context Gathering:**
+
+To help the implementation sessions succeed, please clarify:
+
+1. **Technical Context:** Are there any architectural decisions, patterns, or conventions the dev sessions should follow?
+
+2. **Testing Requirements:** Any specific testing frameworks or coverage expectations?
+
+3. **Dependencies:** Are there external services, APIs, or packages that need to be set up first?
+
+4. **Known Challenges:** Any tricky areas or things that previous attempts struggled with?
+
+5. **Anything Else:** Any other context that would help the sessions succeed?
+
+Feel free to answer as much or as little as you'd like. You can also say 'none' if the stories are self-explanatory.
+```
+
+**After user responds:**
+- Think about their response before continuing
+- If response raises new questions, ask 1-2 follow-up questions
+- Continue until context is sufficient
+
+---
+
+## Agent Configuration (v1.2.0)
+
+```
+**AI Agent Selection:**
+
+Which AI coding agent should run your workflows?
+
+| Agent | CLI Command | Prompt Style | Best For |
+|-------|-------------|--------------|----------|
+| **Claude** | `claude --dangerously-skip-permissions` | Natural language skill prompt | BMAD workflows |
+| **Codex** | `codex exec --full-auto` | Natural language skill prompt | OpenAI Codex users |
+
+**Primary Agent:** (default: auto, resolves from active runtime provider)
+**Fallback Agent:** (default: false, disabled unless configured)
+**Enable Fallback:** (default: no)
+
+Examples:
+- `auto` → Active runtime provider, no fallback
+- `claude` → Claude primary, no fallback
+- `codex` → Codex primary, no fallback
+- `claude, none` → Claude only, no fallback
+- `codex, claude` → Codex primary, Claude fallback
+
+Enter agent config or press Enter for defaults:
+```
+
+Store response as `agentConfig` (v3.0.0):
+```yaml
+agentConfig:
+  defaultPrimary: "auto"
+  defaultFallback: false
+  perTask: {}
+  complexityOverrides: {}
+```
+
+---
+
+## Legacy AI Command Configuration (Deprecated)
+
+```
+**AI Command:**
+What command invokes Claude Code (or your AI CLI) in the terminal?
+
+Examples:
+- `claude --dangerously-skip-permissions` (default - autonomous mode, no prompts)
+- `claude` (interactive mode - will prompt for permissions)
+- `cursor` (Cursor IDE)
+- `/usr/local/bin/claude --dangerously-skip-permissions` (full path)
+
+Enter command or press Enter for default (`claude --dangerously-skip-permissions`):
+```
+
+Store response as `aiCommand`. **Note:** `aiCommand` is deprecated as of v1.2.0; use `agentConfig` instead.
+
+---
+
+## Execution Overrides
+
+```
+**Execution Overrides:**
+
+By default, the orchestrator will:
+- Run all steps: create-story → dev-story → automate → code-review
+- Run stories sequentially (one at a time)
+- Commit after each completed story
+
+**Would you like to change any defaults?**
+
+| Option | Default | Your Choice |
+|--------|---------|-------------|
+| Skip `automate` (guardrail tests) | No | ? |
+| Max parallel stories | 1 | ? |
+
+Enter changes (e.g., `skip automate, max parallel 2`) or `defaults` to keep all defaults:
+```
+
+---
+
+## Configuration Review Template
+
+```
+**Pre-flight Complete. Here's your configuration:**
+
+**Project Context Loaded:**
+- Product Brief: {loaded/not found}
+- PRD: {loaded/not found}
+- Architecture: {loaded/not found}
+- Other docs: {list or 'None'}
+
+**Epic:** {epic_name}
+**Stories:** {story_range} ({count} stories)
+
+**Stories to implement:**
+{story_list_with_titles}
+
+**AI Command:** `{aiCommand}`
+
+**Overrides:**
+- Skip automate: {yes/no}
+- Max parallel: {number}
+
+**Additional Context from Conversation:**
+{context_summary_or_'None provided'}
+
+**Does this look correct?** I'll create the state document and we can begin execution.
+```
--- a/.agents/skills/bmad-story-automator/data/preflight-requirements.md
+++ b/.agents/skills/bmad-story-automator/data/preflight-requirements.md
@@ -0,0 +1,74 @@
+# Preflight Requirements (v1.10.0)
+
+> **🚨 CRITICAL:** Load and internalize these requirements BEFORE executing any preflight steps.
+
+---
+
+## MANDATORY Sequence (NO EXCEPTIONS)
+
+Steps 1-3 MUST be completed IN ORDER using the Python helper BEFORE proceeding to steps 4-7:
+
+1. **Step 1-2:** Request and parse epic(s) → `scripts/story-automator parse-epic`
+2. **Step 3:** Parse ALL stories with complexity scoring → `scripts/story-automator parse-story --rules`
+3. **GATE:** Verify `stories_json` is populated with programmatic complexity data
+4. **Step 4:** Display Complexity Matrix (from step 3 data)
+5. **Steps 5-7:** Custom instructions, agent config, execution settings
+
+---
+
+## 🛑 FORBIDDEN PATTERNS
+
+- ❌ **NEVER** skip step 3 (complexity scoring)
+- ❌ **NEVER** manually assess complexity by reading epic/story content
+- ❌ **NEVER** proceed to agent configuration without displaying the Complexity Matrix
+- ❌ **NEVER** guess complexity levels - they MUST come from `parse-story --rules`
+- ❌ **NEVER** create state document without `stories_json` containing complexity data
+
+---
+
+## ✅ REQUIRED Verification
+
+Before agent configuration (steps 5-7), you MUST have:
+- [ ] `stories_json` variable populated with complexity data from Python helper
+- [ ] Complexity Matrix displayed to user showing all stories with levels/scores
+- [ ] User has seen the complexity breakdown before being asked about agents
+
+---
+
+## Why This Matters
+
+Without programmatic complexity scoring:
+- Agent configuration cannot be informed by actual story difficulty
+- User cannot make informed decisions about which agents to use
+- The orchestration may fail or produce suboptimal results
+
+The Python helper (`scripts/story-automator parse-story --rules`) applies consistent, deterministic rules from `data/complexity-rules.json` to score each story. This data MUST be gathered before agent configuration.
+
+---
+
+## Complexity Matrix Display Template
+
+After gathering complexity data, you MUST display:
+
+```
+**Story Complexity Matrix**
+
+| Story | Title | Score | Level | Reasons |
+|-------|-------|-------|-------|---------|
+| {storyId} | {title} | {score} | {level} | {reasons or "-"} |
+...
+
+**Summary:**
+- Low: {count} stories
+- Medium: {count} stories
+- High: {count} stories
+```
+
+---
+
+## Verification Gate (Step 3d)
+
+Before proceeding to step 5 (Custom Instructions), verify:
+- `stories_json` contains complexity data for ALL selected stories
+- Complexity Matrix has been displayed to user
+- If either is missing, DO NOT PROCEED - re-run step 3
--- a/.agents/skills/bmad-story-automator/data/prompts/auto.md
+++ b/.agents/skills/bmad-story-automator/data/prompts/auto.md
@@ -0,0 +1,4 @@
+Execute the BMAD {{label}} workflow for story {{story_id}}.
+
+{{skill_line}}{{workflow_line}}{{instructions_line}}{{checklist_line}}Story file: _bmad-output/implementation-artifacts/{{story_prefix}}-*.md
+Auto-apply all discovered gaps in tests.
--- a/.agents/skills/bmad-story-automator/data/prompts/create.md
+++ b/.agents/skills/bmad-story-automator/data/prompts/create.md
@@ -0,0 +1,7 @@
+Execute the BMAD create-story workflow for story {{story_id}}.
+
+{{skill_line}}{{workflow_line}}{{instructions_line}}{{template_line}}{{checklist_line}}Create story file at: _bmad-output/implementation-artifacts/{{story_prefix}}-*.md
+
+Story ID: {{story_id}}
+
+#YOLO - Do NOT wait for user input.
--- a/.agents/skills/bmad-story-automator/data/prompts/dev.md
+++ b/.agents/skills/bmad-story-automator/data/prompts/dev.md
@@ -0,0 +1,4 @@
+Execute the BMAD dev-story workflow for story {{story_id}}.
+
+{{skill_line}}{{workflow_line}}{{instructions_line}}{{checklist_line}}Story file: _bmad-output/implementation-artifacts/{{story_prefix}}-*.md
+Implement all tasks marked [ ]. Run tests. Update checkboxes.
--- a/.agents/skills/bmad-story-automator/data/prompts/retro.md
+++ b/.agents/skills/bmad-story-automator/data/prompts/retro.md
@@ -0,0 +1,33 @@
+Execute the BMAD retrospective workflow for epic {{story_id}}.
+
+{{skill_line}}{{workflow_line}}{{instructions_line}}Run the retrospective in #YOLO mode.
+Assume the user will NOT provide any input to the retrospective directly.
+For ALL prompts that expect user input, make reasonable autonomous decisions based on:
+- Sprint status data
+- Story files and their dev notes
+- Previous retrospective if available
+- Architecture and PRD documents
+
+Key behaviors:
+- When asked to confirm epic number: auto-confirm based on sprint-status
+- When asked for observations: synthesize from story analysis
+- When asked for decisions: make data-driven choices
+- When presented menus: select the most appropriate option based on context
+- Skip all "WAIT for user" instructions - continue autonomously
+
+After the retrospective has run and created documents, you MUST:
+1. Create a list of documentation that may need updates based on implementation learnings
+2. For each doc in the list, verify whether updates are actually needed by:
+   - Reading the current doc content
+   - Comparing against actual implementation code
+   - Checking for discrepancies between doc and code
+3. Update docs that have verified discrepancies
+4. Discard proposed updates where code matches docs
+
+Focus on these doc types:
+- Architecture decisions that changed during implementation
+- API documentation that diverged from specs
+- README files with outdated instructions
+- Configuration documentation
+
+EVERYTHING SHOULD BE AUTOMATED. THIS IS NOT A SESSION WHERE YOU SHOULD BE EXPECTING USER INPUT.
--- a/.agents/skills/bmad-story-automator/data/prompts/review.md
+++ b/.agents/skills/bmad-story-automator/data/prompts/review.md
@@ -0,0 +1,4 @@
+Execute the story-automator review workflow for story {{story_id}}.
+
+{{skill_line}}{{workflow_line}}{{instructions_line}}{{checklist_line}}Story file: _bmad-output/implementation-artifacts/{{story_prefix}}-*.md
+Review implementation, find issues, fix them automatically. {{extra_instruction}}
--- a/.agents/skills/bmad-story-automator/data/report-retention-policy.md
+++ b/.agents/skills/bmad-story-automator/data/report-retention-policy.md
@@ -0,0 +1,30 @@
+# Validation Report Retention Policy
+
+Purpose: keep workflow repo lean while preserving historical validation evidence.
+
+## Policy
+
+- Keep latest 10 validation reports in `validation-reports/` as `.md`.
+- Archive older reports into `validation-reports/archive/` as `.md.gz`.
+- Keep `validation-report-*-current.md` files unarchived.
+- Never delete archived `.md.gz` files automatically.
+
+## Suggested Maintenance Command
+
+Run from workflow root:
+
+```bash
+mkdir -p validation-reports/archive
+ls -1t validation-reports/validation-report-*.md \
+  | rg -v -- '-current\.md$' \
+  | awk 'NR>10' \
+  | while read -r f; do
+      gzip -c "$f" > "validation-reports/archive/$(basename "$f").gz" && rm "$f"
+    done
+```
+
+## Operational Notes
+
+- This policy applies to historical reports only.
+- Current run artifacts remain readable markdown.
+- Archival is optional during active development, recommended during wrap-up.
--- a/.agents/skills/bmad-story-automator/data/retrospective-automation.md
+++ b/.agents/skills/bmad-story-automator/data/retrospective-automation.md
@@ -0,0 +1,140 @@
+# Retrospective Automation Data
+
+This file provides instructions for running retrospectives in YOLO mode (fully automated, no user input expected).
+
+---
+
+## YOLO Mode Principles
+
+1. **No User Input Expected**: The retrospective must complete autonomously
+2. **Data-Driven Decisions**: All decisions based on sprint-status, story files, and artifacts
+3. **Safe Failure**: If anything goes wrong, log and skip - never escalate
+4. **Configured Agent**: Retrospectives inherit the configured primary agent unless `retro` is explicitly overridden
+
+---
+
+## Agent Constraints
+
+Retrospectives have complex multi-agent "party mode" interactions that require:
+- Natural language dialogue synthesis
+- Multi-step reasoning across story analysis
+- Document generation with rich context
+
+Retrospectives use the configured `agentConfig` retro selection. If no explicit `retro` override is present, they inherit the configured primary agent.
+
+### Timeout Configuration
+
+Retrospectives analyze all stories in an epic and generate comprehensive reports:
+- **Base timeout**: 60 minutes (3600000ms)
+- **Extended timeout for large epics (>10 stories)**: 90 minutes (5400000ms)
+
+---
+
+## YOLO Mode Prompt Template
+
+When spawning a retrospective in YOLO mode, use this prompt:
+
+```
+Execute the BMAD retrospective workflow for epic {epic_number}.
+
+READ this skill first: <installed-skill-root>/bmad-retrospective/SKILL.md
+READ this workflow file next: <installed-skill-root>/bmad-retrospective/workflow.md
+
+Run the retrospective in #YOLO mode.
+Assume the user will NOT provide any input to the retrospective directly.
+For ALL prompts that expect user input, make reasonable autonomous decisions based on:
+- Sprint status data
+- Story files and their dev notes
+- Previous retrospective if available
+- Architecture and PRD documents
+
+Key behaviors:
+- When asked to confirm epic number: auto-confirm based on sprint-status
+- When asked for observations: synthesize from story analysis
+- When asked for decisions: make data-driven choices
+- When presented menus: select the most appropriate option based on context
+- Skip all "WAIT for user" instructions - continue autonomously
+
+After the retrospective has run and created documents, you MUST:
+1. Create a list of documentation that may need updates based on implementation learnings
+2. For each doc in the list, verify whether updates are actually needed by:
+   - Reading the current doc content
+   - Comparing against actual implementation code
+   - Checking for discrepancies between doc and code
+3. Update docs that have verified discrepancies
+4. Discard proposed updates where code matches docs
+
+Focus on these doc types:
+- Architecture decisions that changed during implementation
+- API documentation that diverged from specs
+- README files with outdated instructions
+- Configuration documentation
+
+EVERYTHING SHOULD BE AUTOMATED. THIS IS NOT A SESSION WHERE YOU SHOULD BE EXPECTING USER INPUT.
+```
+
+---
+
+## Multi-Epic Support
+
+When multiple epics are provided to story-automator:
+
+### Tracking Multiple Epics
+
+State document should track:
+```yaml
+epics:
+  - epicNumber: 1
+    storyRange: ["1-1", "1-2", "1-3"]
+    status: "completed"
+    retrospectiveStatus: "completed"
+  - epicNumber: 2
+    storyRange: ["2-1", "2-2"]
+    status: "in_progress"
+    retrospectiveStatus: "pending"
+```
+
+### Aggregation Rules
+
+1. **Complete epics during run**: If epic N completes while stories from epic N+1 are being processed, trigger retrospective for epic N
+2. **Batch retrospectives**: After all stories complete, run retrospectives for all completed epics in order
+3. **Independent failures**: If retrospective for epic N fails, continue to epic N+1 retrospective
+
+### Safe Skip on Failure
+
+If a retrospective fails:
+1. Log: `⚠️ Retrospective for Epic {N} skipped: {reason}`
+2. Update state: `retrospectives.epic-{N}.status = "skipped"`
+3. Update state: `retrospectives.epic-{N}.reason = "{reason}"`
+4. Continue to next epic - **NEVER ESCALATE**
+
+---
+
+## Documentation Verification
+
+See `retrospective-doc-verification.md` for doc verification patterns and output parsing.
+
+## Error Handling
+
+### Network Errors
+
+If retrospective session fails due to network:
+1. Wait 60 seconds
+2. Retry once
+3. If retry fails, mark as skipped
+
+### Session Crashes
+
+If retrospective session crashes:
+1. Check output file for partial progress
+2. If retro doc was partially created, mark as partial
+3. Log crash reason
+4. Skip to next epic
+
+### Timeout
+
+If retrospective exceeds timeout:
+1. Check if core analysis completed
+2. If retro doc exists, mark as partial success
+3. Skip doc verification phase
+4. Continue to next epic
--- a/.agents/skills/bmad-story-automator/data/retrospective-doc-verification.md
+++ b/.agents/skills/bmad-story-automator/data/retrospective-doc-verification.md
@@ -0,0 +1,94 @@
+# Retrospective Doc Verification
+
+Companion to `retrospective-automation.md`. Contains doc verification patterns and output parsing guidance.
+
+## Doc Verification Patterns
+
+After retrospective generates documents, verify updates against code:
+
+### Documents to Check
+
+| Doc Type | Pattern | Verification Method |
+|----------|---------|---------------------|
+| Architecture | `*architecture*.md` | Compare decisions against implementation |
+| API Docs | `*api*.md`, `*openapi*.yaml` | Verify endpoints match code |
+| README | `README.md` | Check setup/usage instructions |
+| Config Docs | `*config*.md` | Verify env vars and settings |
+
+### Verification Prompt Template
+
+```
+Verify whether this documentation update is needed:
+
+**Document:** {doc_path}
+**Proposed Change:** {change_summary}
+**Reason:** {reason}
+
+Instructions:
+1. Read the current document at {doc_path}
+2. Read the relevant implementation code referenced
+3. Compare doc against actual implementation
+4. Determine if update is genuinely needed
+
+Output JSON:
+{
+  "should_update": true|false,
+  "confidence": "high"|"medium"|"low",
+  "reason": "explanation",
+  "discrepancies": ["list", "of", "specific", "issues"]
+}
+
+If discrepancies exist, apply the fix directly.
+```
+
+### Confidence Thresholds
+
+- **High confidence**: Auto-apply update
+- **Medium confidence**: Auto-apply with log note
+- **Low confidence**: Skip update, log for manual review
+
+---
+
+## Output Parsing
+
+### Parse Doc Proposals from Retrospective Output
+
+Look for sections in retrospective output:
+
+```
+## Documentation Updates Needed
+
+### {doc_path}
+- **Change:** {summary}
+- **Reason:** {reason}
+- **Impact:** {impact}
+```
+
+Extract into structured format:
+```json
+{
+  "proposals": [
+    {
+      "path": "{doc_path}",
+      "summary": "{summary}",
+      "reason": "{reason}",
+      "impact": "{impact}"
+    }
+  ]
+}
+```
+
+### Retrospective Completion Markers
+
+Successful completion indicators:
+- "Retrospective Complete" in output
+- "epic-{N}-retro-*.md" file created
+- Sprint status updated with retrospective done
+
+Failure indicators:
+- Session timeout
+- Error messages in output
+- No retro file created after 30+ minutes
+
+---
+
--- a/.agents/skills/bmad-story-automator/data/retrospective-prompts.md
+++ b/.agents/skills/bmad-story-automator/data/retrospective-prompts.md
@@ -0,0 +1,86 @@
+# Retrospective Prompts
+
+Prompts used by step-05-retrospective for automated retrospective execution.
+
+---
+
+## YOLO Mode Retrospective Prompt
+
+Use this prompt when spawning the retrospective session:
+
+```
+Execute the BMAD retrospective workflow for epic {epic_number}.
+
+READ this skill first: <installed-skill-root>/bmad-retrospective/SKILL.md
+READ this workflow file next: <installed-skill-root>/bmad-retrospective/workflow.md
+
+Run the retrospective in #YOLO mode.
+Assume the user will NOT provide any input to the retrospective directly.
+For ALL prompts that expect user input, make reasonable autonomous decisions based on:
+- Sprint status data
+- Story files and their dev notes
+- Previous retrospective if available
+- Architecture and PRD documents
+
+Key behaviors:
+- When asked to confirm epic number: auto-confirm based on sprint-status
+- When asked for observations: synthesize from story analysis
+- When asked for decisions: make data-driven choices
+- When presented menus: select the most appropriate option based on context
+- Skip all "WAIT for user" instructions - continue autonomously
+
+After the retrospective has run and created documents, you MUST:
+1. Create a list of documentation that may need updates based on implementation learnings
+2. For each doc in the list, verify whether updates are actually needed by:
+   - Reading the current doc content
+   - Comparing against actual implementation code
+   - Checking for discrepancies between doc and code
+3. Update docs that have verified discrepancies
+4. Discard proposed updates where code matches docs
+
+Focus on these doc types:
+- Architecture decisions that changed during implementation
+- API documentation that diverged from specs
+- README files with outdated instructions
+- Configuration documentation
+
+EVERYTHING SHOULD BE AUTOMATED. THIS IS NOT A SESSION WHERE YOU SHOULD BE EXPECTING USER INPUT.
+```
+
+---
+
+## Doc Verification Prompt
+
+Use this prompt when spawning doc verification subagents:
+
+```
+Verify whether this documentation update is needed:
+
+**Document:** ${proposed_doc.path}
+**Proposed Change:** ${proposed_doc.summary}
+**Reason:** ${proposed_doc.reason}
+
+Instructions:
+1. Read the current document at ${proposed_doc.path}
+2. Read the relevant implementation code referenced
+3. Compare doc against actual implementation
+4. Determine if update is genuinely needed
+
+Output JSON:
+{
+  "should_update": true|false,
+  "confidence": "high"|"medium"|"low",
+  "reason": "explanation",
+  "discrepancies": ["list", "of", "specific", "issues"] // only if should_update
+}
+
+If discrepancies exist, apply the fix directly. Output should_update=true only if you made changes.
+```
+
+---
+
+## Usage Notes
+
+- **YOLO Prompt:** Replace `{epic_number}` with actual epic number
+- **Doc Verification Prompt:** Replace `${proposed_doc.*}` variables with actual values
+- Both prompts are designed for fully automated execution (no user input expected)
--- a/.agents/skills/bmad-story-automator/data/retry-fallback-implementation.md
+++ b/.agents/skills/bmad-story-automator/data/retry-fallback-implementation.md
@@ -0,0 +1,100 @@
+# Retry & Fallback Implementation Examples
+
+**Purpose:** Detailed implementation wrapper and step-specific validation patterns.
+
+---
+
+## Implementation Pattern
+
+```bash
+# Universal retry wrapper with deterministic agent resolution
+task_type="{step}"  # create, dev, auto, or review
+resolve_agent_for_task "$task_type" "$state_file" "{story_id}"
+# Now primary_agent and fallback_agent are set for this story/task
+
+max_attempts=5
+attempt=0
+success=false
+
+while [ $attempt -lt $max_attempts ] && [ "$success" = "false" ]; do
+    attempt=$((attempt + 1))
+
+    # Alternate agent: odd attempts = primary, even = fallback (if available)
+    if [ $((attempt % 2)) -eq 1 ] || [ -z "$fallback_agent" ]; then
+        current_agent="$primary_agent"
+    else
+        current_agent="$fallback_agent"
+    fi
+
+    # Delay logic (after first attempt)
+    if [ $attempt -gt 1 ]; then
+        if [ $attempt -ge 4 ] || [ "$last_was_network_error" = "true" ]; then
+            echo "Waiting 60s before retry (attempt $attempt)..."
+            sleep 60
+        fi
+    fi
+
+    # Execute workflow step
+    session=$("$scripts" tmux-wrapper spawn {step} {epic} {story_id} \
+        --agent "$current_agent" \
+        --command "$("$scripts" tmux-wrapper build-cmd {step} {story_id} --agent "$current_agent" --state-file "$state_file")")
+    result=$("$scripts" monitor-session "$session" --json --agent "$current_agent")
+
+    # Cleanup session
+    "$scripts" tmux-wrapper kill "$session"
+
+    # Check for network errors
+    last_was_network_error="false"
+    if echo "$result" | grep -qiE "(connection refused|timeout|rate limit|503|502|never_active)"; then
+        last_was_network_error="true"
+    fi
+    if [ "$(echo "$result" | jq -r '.final_state')" = "crashed" ]; then
+        output_size=$(wc -c < "$(echo "$result" | jq -r '.output_file')" 2>/dev/null || echo "0")
+        [ "$output_size" -lt 100 ] && last_was_network_error="true"
+    fi
+
+    # Check success (step-specific validation)
+    # ... validation logic here ...
+
+    if [ "$validation_passed" = "true" ]; then
+        success=true
+    else
+        echo "Attempt $attempt failed (agent: $current_agent). $([ $attempt -lt $max_attempts ] && echo "Retrying..." || echo "Escalating.")"
+    fi
+done
+
+if [ "$success" = "false" ]; then
+    # All attempts exhausted - NOW escalate
+    escalate_to_user "Step failed after $max_attempts attempts"
+fi
+```
+
+---
+
+## Step-Specific Validation
+
+### Create Story
+```bash
+validation=$("$scripts" orchestrator-helper verify-step create {story_id} --state-file "$state_file")
+validation_passed=$(echo "$validation" | jq -r '.verified')
+```
+
+### Dev Story
+```bash
+parsed=$("$scripts" orchestrator-helper parse-output "$output_file" dev)
+next_action=$(echo "$parsed" | jq -r '.next_action')
+validation_passed=$([ "$next_action" = "proceed" ] && echo "true" || echo "false")
+```
+
+### Automate
+```bash
+parsed=$("$scripts" orchestrator-helper parse-output "$output_file" auto)
+# Non-blocking: log warning but continue
+validation_passed="true"  # Always proceed (automate is non-blocking)
+```
+
+### Code Review
+```bash
+# See code-review-loop.md for specific review cycle handling
+# Reviews have their own internal retry loop
+```
--- a/.agents/skills/bmad-story-automator/data/retry-fallback-strategy.md
+++ b/.agents/skills/bmad-story-automator/data/retry-fallback-strategy.md
@@ -0,0 +1,131 @@
+# Retry & Fallback Strategy
+
+**Purpose:** Universal retry and fallback agent pattern for all workflow steps (create, dev, auto, review).
+
+**Version:** 2.0.0
+
+---
+
+## Core Principle
+
+**NEVER escalate to user on first failure.** Exhaust all retry options first:
+1. Try fallback agent (if configured for this task)
+2. Retry with alternating agents up to 5 total attempts
+3. Sleep between retries if network issues detected
+4. Only escalate after all attempts exhausted
+
+---
+
+## Agent Configuration (v3.0.0)
+
+**Deterministic agent resolution via agents file:**
+
+```bash
+# Resolve agent for a specific task (create, dev, auto, review)
+# Uses agents file generated during preflight (complexity-aware)
+resolve_agent_for_task() {
+    local task="$1"
+    local state_file="$2"
+    local story_id="$3"
+
+    result=$("$scripts" orchestrator-helper agents-resolve \
+        --state-file "$state_file" \
+        --story "$story_id" \
+        --task "$task")
+
+    primary_agent=$(echo "$result" | jq -r '.primary')
+    fallback_agent=$(echo "$result" | jq -r '.fallback')
+
+    # jq -r prints JSON false as "false" and JSON null as "null"; both mean disabled
+    case "$fallback_agent" in false|null) fallback_agent="" ;; esac
+}
+
+# Usage:
+resolve_agent_for_task "review" "$state_file" "{story_id}"
+echo "Review task: primary=$primary_agent, fallback=$fallback_agent"
+```
+
+**Fallback behavior:**
+- If `fallback_agent` is empty, "false", or same as primary → retry with primary only
+- If `fallback_agent` differs → alternate between agents on retries
+- Precedence for each task: complexity overrides first, then per-task overrides, then defaults
+
+---
+
+## Retry Sequence (5 Attempts Max)
+
+| Attempt | Agent | Delay Before | Notes |
+|---------|-------|--------------|-------|
+| 1 | primary | none | Initial attempt |
+| 2 | fallback | 0-60s | Switch agent; delay if network error |
+| 3 | primary | 0-60s | Back to primary |
+| 4 | fallback | 60s | Always delay from attempt 4 onward |
+| 5 | primary | 60s | Final attempt |
+
+**If no fallback configured:** All 5 attempts use primary agent.
+
+---
+
+## Network Error Detection
+
+**Indicators of network/transient issues:**
+- Session output contains: "connection refused", "timeout", "rate limit", "503", "502"
+- Session crashed with zero output
+- `story-automator monitor-session` returns `final_state: "crashed"` with empty output
+- Session stuck at "never_active" state (no response from API)
+
+**On network error detection:**
+- Sleep 60 seconds before next attempt
+- Log: "Network issue detected, waiting 60s before retry..."
+
+---
+
+## Implementation & Validation Examples
+
+Detailed bash patterns and step-specific validation examples now live in:
+
+- **`retry-fallback-implementation.md`** (implementation wrapper + per-step validation)
+
+---
+
+## Escalation (After All Attempts)
+
+Only after exhausting all 5 attempts:
+
+1. Update state: `status = "AWAITING_DECISION"`
+2. Log all attempt details:
+   ```
+   [timestamp] ESCALATION: {step} failed after 5 attempts
+   - Attempt 1 (primary): {result}
+   - Attempt 2 (fallback): {result}
+   - Attempt 3 (primary): {result}
+   - Attempt 4 (fallback): {result}
+   - Attempt 5 (primary): {result}
+   ```
+3. Present options to user:
+   - Retry with different settings
+   - Skip this story
+   - Abort orchestration
+
+---
+
+## Integration with Adaptive Retry
+
+This strategy **replaces** the simple retry logic. The plateau detection in `adaptive-retry.md` still applies within this framework:
+
+- If the same task plateaus across 3+ attempts → DEFER instead of escalating
+- Plateau detection runs AFTER agent switching (so both agents hit same wall)
+
+---
+
+## Logging
+
+All retry attempts should be logged in the action log:
+```
+[timestamp] {step} attempt {N}/{max} with {agent}: {result}
+```
+
+On success after retry:
+```
+[timestamp] {step} succeeded on attempt {N} with {agent} (after {N-1} failures)
+```
--- a/.agents/skills/bmad-story-automator/data/scripts-reference.md
+++ b/.agents/skills/bmad-story-automator/data/scripts-reference.md
@@ -0,0 +1,102 @@
+# Command Reference
+
+All operations use the installed helper at `scripts/story-automator` (usually via the `$scripts` variable). **DO NOT construct tmux commands manually.**
+
+## Core Commands
+
+| Script | Purpose |
+|--------|---------|
+| `$scripts tmux-wrapper` | Session spawning, naming, lifecycle |
+| `$scripts monitor-session` | Batched polling (14+ API calls → 1) |
+| `$scripts tmux-status-check` | Context-efficient status checking (v2.4.0) |
+| `$scripts codex-status-check` | Codex-specific status with heartbeat (v2.4.0) |
+| `$scripts heartbeat-check` | CPU-based process heartbeat detection |
+| `$scripts orchestrator-helper` | Sprint-status, parsing, markers |
+| `$scripts orchestrator-helper verify-step` | Shared success verifier checks per step |
+| `$scripts orchestrator-helper agents-build` | Deterministic agents file generation |
+| `$scripts orchestrator-helper agents-resolve` | Agent lookup per story/task via state file or direct agents file |
+| `$scripts validate-story-creation` | Legacy story file count validation |
+| `$scripts commit-story` | Deterministic git commit with JSON output |
+
+## Usage Pattern
+
+> **⚠️ CRITICAL: `--command` IS REQUIRED**
+> You MUST pass `--command` with the built command string to `spawn`.
+> Without `--command`, the tmux session will be created but NO command runs → `never_active` failure.
+
+```bash
+scripts="{scriptsDir}"
+
+# ⚠️ --command is REQUIRED - without it, session sits idle!
+# Spawn session
+session=$("$scripts" tmux-wrapper spawn {type} {epic} {story_id} \
+  --agent "$agent" \
+  --command "$("$scripts" tmux-wrapper build-cmd {type} {story_id} --agent "$agent")")
+
+# Monitor session
+result=$("$scripts" monitor-session "$session" --json --agent "$agent")
+
+# Parse output
+parsed=$("$scripts" orchestrator-helper parse-output "$(printf '%s' "$result" | jq -r '.output_file')" {type})
+
+# Cleanup
+"$scripts" tmux-wrapper kill "$session"
+```
+
+## Deterministic Agent Selection
+
+Agent selection is driven by the agents file created during preflight:
+`_bmad-output/story-automator/agents/agents-{state_filename}.md`
+
+To resolve agents for a specific story/task:
+```bash
+selection=$("$scripts" orchestrator-helper agents-resolve --state-file "$state_file" --story "{story_id}" --task "{task}")
+primary=$(echo "$selection" | jq -r '.primary')
+fallback=$(echo "$selection" | jq -r '.fallback')
+```
+
+Direct agents-file resolution is also supported when you already know the generated agents plan path:
+```bash
+selection=$("$scripts" orchestrator-helper agents-resolve --agents-file "$agents_file" --story "{story_id}" --task "{task}")
+primary=$(echo "$selection" | jq -r '.primary')
+fallback=$(echo "$selection" | jq -r '.fallback')
+```
+
+## Step Types
+
+| Type | Description | Agent Support |
+|------|-------------|---------------|
+| `create` | Create story from epic | Claude, Codex |
+| `dev` | Implement story tasks | Claude, Codex |
+| `auto` | Test automation | Claude, Codex |
+| `review` | Code review with auto-fix | Claude, Codex |
+| `retro` | Retrospective (YOLO mode) | Claude, Codex |
+
+## Retrospective Commands (v1.5.0)
+
+**CRITICAL:** Retrospectives use a special step type that:
+- Resolves the retro agent from `agentConfig`
+- Returns full YOLO mode prompt with doc verification instructions
+- Uses epic_number instead of story_id
+
+```bash
+# For retro, "story_id" parameter is actually the epic_number
+retro_agent=$("$scripts" orchestrator-helper retro-agent --state-file "{state_file}" | jq -r '.primary')
+cmd=$("$scripts" tmux-wrapper build-cmd retro {epic_number} --agent "$retro_agent")
+session=$("$scripts" tmux-wrapper spawn retro "" {epic_number} --agent "$retro_agent" --command "$cmd")
+
+# Monitor (retrospectives never block, failures just logged)
+result=$("$scripts" monitor-session "$session" --json --agent "$retro_agent")
+"$scripts" tmux-wrapper kill "$session"
+```
+
+The `build-cmd retro` command automatically includes:
+- The bmad-retrospective skill invocation prompt
+- Full YOLO mode instructions (no user input expected)
+- Key autonomous behaviors for menus/prompts
+- Doc verification instructions with subagent patterns
+- Instructions to update docs that have verified discrepancies
+
+## Binary Location
+
+The installed helper lives at `../scripts/story-automator` relative to step files.
--- a/.agents/skills/bmad-story-automator/data/stop-hook-config.md
+++ b/.agents/skills/bmad-story-automator/data/stop-hook-config.md
@@ -0,0 +1,187 @@
+# Stop Hook Configuration
+
+This document defines the Stop hook required for the story-automator to prevent premature stopping during orchestration in Claude or Codex.
+
+**Related:** See `stop-hook-troubleshooting.md` for child session handling, manual override, and troubleshooting.
+
+---
+
+## Overview
+
+The Stop hook uses a **marker file approach**:
+1. When story-automator starts → Creates marker file with orchestration context
+2. When the active agent tries to stop → Hook script checks marker file
+3. If no marker or completed → Allow stop (normal agent usage)
+4. If marker exists with pending stories → Block stop with continuation guidance
+5. When story-automator completes → Removes marker file
+
+**Important (v2 fix):** The hook intentionally does NOT check the `stop_hook_active` flag. This flag stays `true` for the entire session after one blocked stop, which caused premature exits in long orchestrations. The marker file alone is the source of truth.
+
+---
+
+## Multi-Project Support (v2.0)
+
+**CRITICAL:** The marker file is now PROJECT-SCOPED to support running story-automator on multiple projects simultaneously.
+
+- **Old location (DEPRECATED):** `/tmp/.story-automator-active`
+- **New location:** runtime-specific project marker resolved by `orchestrator-helper marker path`
+
+### Why Project-Scoped?
+
+When running story-automator on multiple projects at the same time:
+- Old: All projects shared `/tmp/.story-automator-active` → Cross-project interference
+- New: Each project has its own marker in the active runtime layout. The marker follows the active installed skill root parent, for example `.claude/`, `.agents/`, or `.codex/`.
+
+### How It Works
+
+1. The installed hook command exports `PROJECT_ROOT` for the target project before invoking `story-automator stop-hook`
+2. The stop hook resolves the marker from `PROJECT_ROOT`, not from the caller's ambient working directory
+3. Project A's stop hook only sees Project A's marker
+4. Project B's stop hook only sees Project B's marker
+
+Do not hard-code the marker path. Use `orchestrator-helper marker path`; this keeps Claude, `.agents`-based Codex, and `.codex`-based Codex installs consistent with the active skill root.
+
+### State Files Also Scoped
+
+The status check script state files are also project-scoped:
+- **Old:** `/tmp/.tmux-session-{SESSION}-state.json`
+- **New:** `/tmp/.sa-{project_hash}-session-{SESSION}-state.json`
+
+Where `project_hash` = first 8 chars of MD5 hash of project root path.
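+
+For illustration, the path could be derived roughly as follows. `state_file_for` is a hypothetical name, and the sketch assumes the hash input is the project root path without a trailing newline and that GNU `md5sum` is available; the helper's real implementation may differ.
+
+```bash
+# Hypothetical helper: build the project-scoped state file path for a session.
+# Assumption: the hash is computed over the project root path with no trailing newline.
+state_file_for() {
+  local project_root="$1" session="$2" project_hash
+  project_hash=$(printf '%s' "$project_root" | md5sum | cut -c1-8)
+  printf '/tmp/.sa-%s-session-%s-state.json\n' "$project_hash" "$session"
+}
+```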
+
+---
+
+## Hook Configuration
+
+### Runtime Selection
+
+The helper selects hook configuration syntax from the active provider:
+- `BMAD_RUNTIME_PROVIDER`
+- `STORY_AUTOMATOR_RUNTIME_PROVIDER`
+
+Set one of these to `claude` or `codex` to force the provider. If neither is set, the helper infers the provider from the installed skill root.
+
+`AI_AGENT` only selects child-agent runtime for spawned work. It does not decide which top-level hook files are written.
+
+The provider decides which hook files are written. Marker location is resolved separately and follows the active installed story-automator skill root when possible. For example, a Codex run using a migrated `.claude/skills/bmad-story-automator` install still uses the `.claude/.story-automator-active` marker so the hook and orchestrator read the same file.
+
+For Claude, add this to the target project's `.claude/settings.json`:
+
+```json
+{
+  "hooks": {
+    "Stop": [
+      {
+        "hooks": [
+          {
+            "type": "command",
+            "command": "/absolute/path/to/scripts/story-automator stop-hook",
+            "timeout": 10
+          }
+        ]
+      }
+    ]
+  }
+}
+```
+
+For Codex, enable hooks in the target project's `.codex/config.toml`:
+
+```toml
+[features]
+codex_hooks = true
+```
+
+Then add this to `.codex/hooks.json`:
+
+```json
+{
+  "hooks": {
+    "Stop": [
+      {
+        "hooks": [
+          {
+            "type": "command",
+            "command": "/absolute/path/to/scripts/story-automator stop-hook",
+            "timeout": 10,
+            "statusMessage": "Checking story automator state"
+          }
+        ]
+      }
+    ]
+  }
+}
+```
+
+Codex trust is separate from hook configuration. A project can have the Story Automator hook written to disk and still require trust approval before Codex will run it. `ensure-stop-hook` now reports that state as pending trust instead of verified.
+
+### Binary Path is Always Absolute
+
+**The stop hook binary resolves itself to an absolute path.** Regardless of how the caller passes the `--command` argument (relative, project-relative, or absolute), the helper stores a consistent absolute path in `.claude/settings.json` or `.codex/hooks.json`.
+
+This prevents the inconsistency where the AI agent resolves frontmatter paths differently across sessions, which previously caused repeated hook installations and unnecessary restart loops.
+
+**Migration:** If an existing hook config contains a relative or project-relative path, `ensure-stop-hook` will normalize it to absolute in-place without triggering a restart (`reason: "hook_normalized"`).
+
+**When hook fails with "no such file or directory":**
+- Verify BMAD is installed in the target project
+- Check the binary exists in the active runtime skills tree, for example: `test -x <installed-skill-root>/bmad-story-automator/scripts/story-automator`
+- Ensure binary is executable: `chmod +x <installed-skill-root>/bmad-story-automator/scripts/story-automator`
+
+---
+
+## Marker File Format
+
+**Location (v2.0):** resolved by `orchestrator-helper marker path`
+
+*Note: The orchestrator adds the active marker entry returned by `orchestrator-helper marker path` to `.gitignore`. Common entries are `.claude/.story-automator-active`, `.agents/.story-automator-active`, and `.codex/.story-automator-active`.*
+
+Content (JSON - v1.2.0 with heartbeat):
+```json
+{
+  "epic": "epic-01",
+  "currentStory": "story-01",
+  "storiesRemaining": 3,
+  "stateFile": "/path/to/orchestration-epic01.md",
+  "startedAt": "2026-01-13T10:00:00Z",
+  "heartbeat": "2026-01-13T10:30:00Z",
+  "pid": 12345
+}
+```
+
+### Fields (v1.2.0):
+- `heartbeat`: Last activity timestamp, updated periodically during execution
+- `pid`: Process ID of the orchestrator (helps detect crashed sessions)
+
+### Staleness Check
+
+The stop hook checks whether the marker's `heartbeat` is older than 30 minutes. A stale heartbeat means the orchestrator has crashed, so the hook allows the stop. See `story-automator stop-hook` for the implementation.
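+
+The comparison itself is simple arithmetic on epoch seconds. The sketch below is not the hook's actual code: `heartbeat_is_stale` and `STALE_AFTER_SECONDS` are hypothetical names, and extracting the epoch with `jq` (shown in the comment) is one possible approach.
+
+```bash
+# Hypothetical sketch: a heartbeat is stale when it is more than 30 minutes old.
+STALE_AFTER_SECONDS=$((30 * 60))
+
+heartbeat_is_stale() {
+  local heartbeat_epoch="$1" now_epoch="$2"
+  [ $((now_epoch - heartbeat_epoch)) -gt "$STALE_AFTER_SECONDS" ]
+}
+
+# One way to obtain the inputs (marker_file is an assumed variable):
+# heartbeat_epoch=$(jq -r '.heartbeat | fromdateiso8601' "$marker_file")
+# now_epoch=$(date +%s)
+```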
+
+---
+
+## Verification Logic
+
+The orchestrator verifies hook installation at startup:
+
+```
+1. Resolve active runtime provider
+2. For Claude, check `.claude/settings.json`; for Codex, check `.codex/hooks.json` and `.codex/config.toml`
+3. Parse hook JSON and look for hooks.Stop array
+4. Check if any hook command contains "story-automator stop-hook"
+
+IF found → Continue
+IF not found → Add hook, instruct restart
+```
+
+---
+
+## Hook Behavior
+
+| Scenario | Action |
+|----------|--------|
+| `STORY_AUTOMATOR_CHILD=true` | `exit 0` → Always allow (child session) |
+| No marker file | `exit 0` → Allow stop |
+| Marker exists, `storiesRemaining=0` | `exit 0` → Allow stop |
+| Marker exists, `storiesRemaining > 0` | Output JSON → Block stop with reason |
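+
+The table reads top to bottom as a first-match decision. A minimal sketch, assuming the inputs have already been gathered: `stop_hook_action` is a hypothetical function that only prints `allow` or `block`; the JSON the real hook emits when blocking is not reproduced here.
+
+```bash
+# Hypothetical sketch of the decision order in the table (first match wins).
+# Args: child flag ("true" or empty), marker present ("yes"/"no"), stories remaining.
+stop_hook_action() {
+  local child="$1" marker_exists="$2" stories_remaining="${3:-0}"
+  if [ "$child" = "true" ]; then
+    echo "allow"          # child session: always allowed
+  elif [ "$marker_exists" != "yes" ]; then
+    echo "allow"          # no marker: normal agent usage
+  elif [ "$stories_remaining" -gt 0 ]; then
+    echo "block"          # orchestration still has pending stories
+  else
+    echo "allow"          # marker present but nothing left to do
+  fi
+}
+```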
+
+**Key fix (Session 10):** The hook no longer checks `stop_hook_active`. This flag was causing premature exits in long orchestrations because it stays `true` for the entire session after the first blocked stop.
--- a/.agents/skills/bmad-story-automator/data/stop-hook-recovery.md
+++ b/.agents/skills/bmad-story-automator/data/stop-hook-recovery.md
@@ -0,0 +1,87 @@
+# Stop Hook Recovery
+
+**You are reading this because the Stop hook blocked your attempt to stop.**
+
+---
+
+## Decision Matrix
+
+**Find your scenario and follow the action exactly:**
+
+| Scenario | Action | Why |
+|----------|--------|-----|
+| Just finished a story, more in queue | **CONTINUE** → Process next story | Stopping between stories wastes progress |
+| Session completed successfully | **CONTINUE** → Check state, process output | Session done ≠ orchestration done |
+| Session failed/errored | **CONTINUE** → Follow retry/fallback strategy | Retries not exhausted yet |
+| Feeling uncertain what to do | **CONTINUE** → Check state document, follow step instructions | Uncertainty is not a blocker |
+| Want to "check in" with user | **CONTINUE** → User will interrupt if needed | User doesn't want status updates |
+| Waiting at a menu for user input | **DECIDE** → Use sub-agent to gather context, make informed decision | Don't block on user, keep moving |
+| Multiple valid paths forward | **DECIDE** → Pick the path most likely to finish stories in order | Your job is to finish, not ask |
+| Unrecoverable error (all retries exhausted) | **STOP** → Follow stop procedure below | Cannot proceed without intervention |
+| External dependency down (API, service) | **RETRY** → Sleep with increasing delay (1m, 2m, 4m, 8m, 16m), max 5 attempts | Often recovers on its own |
+| User explicitly requested stop earlier | **STOP** → Follow stop procedure below | Honoring user intent |
+
+---
+
+## Quick Reference
+
+### CONTINUE (Most Scenarios)
+
+1. Read your state document
+2. Check `currentStep`, `storyQueue`, `currentStoryId`
+3. Resume execution from where you left off
+4. Process the next story in queue
+
+### DECIDE (Autonomous Decision Required)
+
+1. Spawn sub-agent to analyze current context
+2. Gather: state document, recent session output, story requirements
+3. Make the decision most likely to complete stories in order
+4. Document decision in state log, then execute
+5. **Never wait for user** - your job is to keep moving
+
+### RETRY (External Dependency Issues)
+
+1. Log the failure with timestamp
+2. Sleep with exponential backoff: 1m → 2m → 4m → 8m → 16m
+3. Retry the operation after each sleep
+4. After 5 failed attempts → treat as unrecoverable → STOP
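+
+The backoff schedule above doubles from one minute. As a sketch, with `backoff_seconds` being a hypothetical helper name rather than an existing command:
+
+```bash
+# Hypothetical helper: seconds to sleep before retry N (1-based), doubling from 60s.
+backoff_seconds() {
+  local attempt="$1"
+  echo $((60 * (1 << (attempt - 1))))
+}
+
+# Print the delay for each of the 5 allowed attempts, one value per line.
+for attempt in 1 2 3 4 5; do
+  backoff_seconds "$attempt"
+done
+```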
+
+### STOP (Genuinely Unrecoverable)
+
+1. Update state document status to `STOPPED`
+2. Delete marker using the installed helper: `orchestrator-helper marker remove`
+3. Log WHY you stopped (be specific)
+4. Exit cleanly
+
+---
+
+## Stop Hook Messages ≠ User Input
+
+The stop hook fires whenever the active agent (Claude or Codex) pauses, not just when it explicitly stops.
+
+**If you see this while at a decision point:**
+- This message is NOT telling you what to choose
+- Use a sub-agent to gather context and DECIDE autonomously
+- Do NOT wait for user - make the call yourself
+
+---
+
+## Core Principle
+
+**Your job is to finish every story in the correct order.**
+
+Do whatever it takes. Make autonomous decisions. Stop only when the situation is genuinely unrecoverable (all retries exhausted) or the user explicitly requested a stop.
+
+---
+
+## Common Mistakes to Avoid
+
+| Mistake | Correct Behavior |
+|---------|------------------|
+| Stopping to report progress | Continue silently, user sees state doc |
+| Stopping after one story completes | Continue to next story |
+| Stopping because session errored | Follow retry strategy first |
+| Waiting for user at decision points | Decide autonomously, keep moving |
+| Stopping on first API/service failure | Retry with exponential backoff (5 attempts) |
+| Asking user which path to take | Pick the path that finishes stories in order |
--- a/.agents/skills/bmad-story-automator/data/stop-hook-troubleshooting.md
+++ b/.agents/skills/bmad-story-automator/data/stop-hook-troubleshooting.md
@@ -0,0 +1,107 @@
+# Stop Hook Troubleshooting
+
+**Related:** See `stop-hook-config.md` for core configuration.
+
+---
+
+## Child Session Handling (Session 19 Fix)
+
+**CRITICAL:** The stop hook is installed at the PROJECT level. When the orchestrator spawns T-Mux sessions (create-story, dev-story, code-review), those child agent instances:
+1. Run in the same project directory
+2. Read the same project-level hook configuration
+3. Have the same stop hook configured
+4. See the same marker file
+
+**Problem:** Without distinction, the stop hook blocks child sessions from completing, creating infinite loops.
+
+**Solution:** All T-Mux child sessions MUST be spawned with:
+
+```bash
+tmux new-session -d -s "SESSION_NAME" -e STORY_AUTOMATOR_CHILD=true
+```
+
+The `-e STORY_AUTOMATOR_CHILD=true` flag exports the environment variable to the session. The stop hook checks this FIRST and immediately allows stop if set.
+
+**Who gets blocked vs allowed:**
+
+| Session Type | STORY_AUTOMATOR_CHILD | Stop Hook Behavior |
+|--------------|----------------------|-------------------|
+| Orchestrator | not set | BLOCKED (if marker + stories remaining) |
+| create-story | `true` | ALLOWED (always) |
+| dev-story | `true` | ALLOWED (always) |
+| code-review | `true` | ALLOWED (always) |
+| testarch-automate | `true` | ALLOWED (always) |
+| Internal scripts (e.g., haiku calls) | `true` | ALLOWED (always) |
+
+---
+
+## Internal Nested Agent Calls (Session 20 Fix)
+
+### Claude
+
+**CRITICAL:** Scripts that internally call `claude` (like `story-automator tmux-status-check` using Haiku for wait estimation) while an orchestration marker is active MUST prefix the call with the environment variable.
+
+```bash
+# WRONG - will hang when stop hook blocks the claude exit
+RESULT=$(claude -p --model haiku "..." 2>/dev/null)
+
+# CORRECT - allows claude to exit normally
+RESULT=$(STORY_AUTOMATOR_CHILD=true claude -p --model haiku "..." 2>/dev/null)
+```
+
+**Why:** Even non-interactive `claude -p` calls trigger the stop hook when they exit. Without the env var, the hook sees the marker file and blocks, causing the script to hang indefinitely.
+
+### Codex
+
+For Codex, apply the same `STORY_AUTOMATOR_CHILD=true` convention to any future internal non-interactive Codex calls that run inside an active story-automator project.
+
+---
+
+## Stop Hook Messages Are NOT User Input
+
+**When you present a menu and wait for user input, the stop hook may fire with messages like:**
+> "Story Automator is running with N stories remaining. Continue processing..."
+
+**THIS IS NOT USER INPUT.** Do not interpret stop hook feedback as a menu selection.
+
+- NEVER treat "continue processing" as selecting [R]esume
+- NEVER proceed past a menu because the stop hook fired
+- ALWAYS wait for ACTUAL user input (typed response)
+- Stop hook messages are about STOPPING behavior only
+
+**Why this happens:** The stop hook fires when the agent pauses, not just when explicitly stopping. During menu waits, it may fire repeatedly. Ignore these messages when waiting for user input.
+
+---
+
+## Manual Override
+
+If the orchestrator gets stuck, users can:
+1. Remove the marker file from the project root using the installed story-automator helper: `orchestrator-helper marker remove`
+2. Stop the active agent normally
+3. Resume later with the continue flow
+
+**For multi-project cleanup:**
+```bash
+# Remove marker for current project only
+helper="<installed-skill-root>/bmad-story-automator/scripts/story-automator"
+[ -x "$helper" ] || { echo "story-automator helper not found: $helper" >&2; exit 1; }
+"$helper" orchestrator-helper marker remove
+
+# Clean up project-scoped state files (optional; run from the project root so $PWD matches the hashed path)
+PROJECT_HASH=$(echo -n "$PWD" | md5sum | cut -c1-8)
+rm -f /tmp/.sa-${PROJECT_HASH}-session-*
+rm -f /tmp/sa-${PROJECT_HASH}-output-*
+```
+
+---
+
+## Troubleshooting
+
+| Issue | Check |
+|-------|-------|
+| Hook not running | Valid hook config? For Codex, is `[features].codex_hooks = true` set and is the project trusted? Script executable? Session restarted? |
+| "no such file" | BMAD installed? Path correct in the active runtime skills tree? Check each installed root, for example `.claude/skills`, `.agents/skills`, or `.codex/skills`. |
+| Premature stops | Marker exists? `storiesRemaining > 0`? v2 fix applied? |
+| Child sessions blocked | `STORY_AUTOMATOR_CHILD=true` set? Check spawn command. |
+| Script hangs | Internal agent calls missing env var? See Session 20 Fix. |
+| Hook fires during menus | Normal behavior - ignore messages, wait for real input. |
--- a/.agents/skills/bmad-story-automator/data/subagent-prompts-analysis.md
+++ b/.agents/skills/bmad-story-automator/data/subagent-prompts-analysis.md
@@ -0,0 +1,87 @@
+# Sub-Agent Analysis Prompts
+
+**Purpose:** Analysis-focused prompt templates for sub-agents spawned during story-automator execution.
+
+**Related:** See `subagent-prompts.md` for core execution prompts (parser, reader, updater).
+
+---
+
+## Code Review Analyzer
+
+**Use:** Analyze code review output to determine review status and next steps.
+
+**Prompt:**
+```
+You are a code review analyzer. Analyze the code review session output.
+
+Story: {story_name}
+Review cycle: {cycle_number} of 3
+Review output:
+---
+{review_output}
+---
+
+Determine the review outcome by looking for:
+1. "Story Status: done" or "Story Status: in-progress"
+2. "Issues Fixed: N" count
+3. "Issues Found: N High, N Medium, N Low"
+
+Return:
+{
+  "storyStatus": "done|in-progress|unknown",
+  "issuesFixed": N,
+  "highIssues": N,
+  "mediumIssues": N,
+  "lowIssues": N,
+  "recommendation": "proceed|retry|escalate",
+  "summary": "brief description of outcome"
+}
+```
+
+**Decision logic:**
+- storyStatus == "done" → proceed (exit review loop)
+- storyStatus == "in-progress" → retry (new review cycle needed)
+- storyStatus == "unknown" → check sprint-status.yaml directly
+
+**CRITICAL:** The orchestrator MUST verify sprint-status.yaml after review completes. The sub-agent analysis is advisory; sprint-status.yaml is the source of truth.
+
+---
+
+## Dependency Analyzer
+
+**Use:** Analyze stories for parallel execution safety.
+
+**Prompt:**
+```
+You are a dependency analyzer. Determine if these stories can safely run in parallel.
+
+Stories to analyze:
+{stories_list}
+
+For each pair of stories, check for:
+- File conflicts (modifying same files)
+- Logical dependencies (one builds on another)
+- Resource conflicts (same database tables, API endpoints)
+- Test conflicts (interfering test data)
+
+Return:
+{
+  "parallelSafe": true|false,
+  "conflicts": [
+    {
+      "story1": "...",
+      "story2": "...",
+      "conflictType": "file|logical|resource|test",
+      "description": "..."
+    }
+  ],
+  "recommendation": "parallel|sequential|partial",
+  "suggestedOrder": ["story order if sequential needed"]
+}
+```
+
+**Parallel safety indicators:**
+- Different feature areas → likely safe
+- Same component/module → check files
+- Database migrations → sequential only
+- Shared test fixtures → check for conflicts
--- a/.agents/skills/bmad-story-automator/data/subagent-prompts.md
+++ b/.agents/skills/bmad-story-automator/data/subagent-prompts.md
@@ -0,0 +1,153 @@
+# Sub-Agent Prompt Templates
+
+**Purpose:** Core prompt templates for sub-agents spawned during story-automator execution.
+
+**Related:** See `subagent-prompts-analysis.md` for analysis prompts (code review, dependency).
+
+---
+
+## Session Output Parser
+
+**Use:** Parse T-Mux session output to determine success/failure status.
+
+**Prompt (v1.2.0 - strengthened):**
+```
+You are a session output parser. Your job is CRITICAL - incorrect parsing leads to workflow failures.
+
+## MANDATORY STEPS (do these IN ORDER):
+
+1. **READ THE ENTIRE FILE FIRST** - Use the Read tool to load the complete file
+2. **COUNT LINES** - Note total line count. If <50 lines, output may be truncated
+3. **SCAN FOR KEY MARKERS** - Look for these patterns:
+   - SUCCESS: "✅", "complete", "done", "Story file created", "Tests passed"
+   - FAILURE: "❌", "error", "failed", "Exception", "panic"
+   - TRUNCATED: File ends mid-sentence, no clear conclusion
+
+4. **ANALYZE TASK PROGRESS** - Look for todo markers:
+   - "☒" = completed task
+   - "☐" = pending task
+   - Extract: tasks_completed / tasks_total
+
+5. **DETERMINE STATUS:**
+   - SUCCESS: Clear completion markers AND file not truncated
+   - FAILURE: Error markers OR crash indicators
+   - AMBIGUOUS: Truncated output OR no clear markers (recommend escalate)
+
+Session: {session_id}
+Step: {step_name}
+Story: {story_name}
+
+Output file: {output_file_path}
+
+## RESPONSE FORMAT (strict JSON):
+{
+  "status": "SUCCESS|FAILURE|AMBIGUOUS",
+  "summary": "1-2 sentence description",
+  "tasks_completed": 0,
+  "tasks_total": 0,
+  "issues": ["list any errors found"],
+  "nextAction": "proceed|retry|escalate",
+  "confidence": "high|medium|low",
+  "line_count": 0,
+  "reasoning": "brief explanation of how you determined status"
+}
+
+## CRITICAL RULES:
+- If output appears truncated (ends abruptly), set status="AMBIGUOUS" and nextAction="escalate"
+- NEVER guess status - if unclear, use AMBIGUOUS
+- Include line_count to verify you read the whole file
+- For dev-story: tasks_completed < tasks_total with idle session = FAILURE (session crashed)
+```
+
+**Context for parser:**
+- For create-story: Look for "Story file created" or file path in output. Verify file exists.
+- For dev-story: Look for "Implementation complete", "Status: review/done", test pass indicators
+- For code-review: Look for issue counts by severity (CRITICAL, HIGH, MEDIUM, LOW)
+- For automate: Look for test file creation confirmation
+
+**Why strengthened (Session 3):** Sub-agent sometimes returned incomplete analysis because it didn't read the entire file or missed truncation indicators.
+
+---
+
+## Story Reader
+
+**Use:** Read a story file and produce a structured summary for pre-flight context.
+
+**Prompt:**
+```
+You are a story reader. Analyze the following story file and extract key information for orchestration.
+
+Story file: {story_file_path}
+
+Content:
+---
+{story_content}
+---
+
+Extract and return:
+{
+  "storyId": "...",
+  "title": "...",
+  "type": "feature|bugfix|refactor|test|docs",
+  "complexity": "simple|moderate|complex",
+  "dependencies": ["list of dependencies or blockers"],
+  "acceptanceCriteria": ["list of key acceptance criteria"],
+  "technicalNotes": "any technical implementation hints",
+  "estimatedSteps": ["create-story", "dev-story", "automate?", "code-review"],
+  "parallelSafe": true|false,
+  "parallelReason": "why parallel execution is safe or not"
+}
+```
+
+---
+
+## State Document Updater
+
+**Use:** Generate state document update entries.
+
+**Prompt:**
+```
+You are a state document updater. Generate the appropriate update for the orchestration state.
+
+Action type: {action_type}
+Story: {story_name}
+Step: {step_name}
+Result: {result}
+Details: {details}
+
+Generate:
+1. Action log entry (timestamped)
+2. Progress table update (if applicable)
+3. Session reference update (if applicable)
+
+Return:
+{
+  "actionLogEntry": "timestamp | story | step | action | result",
+  "progressUpdate": {
+    "story": "...",
+    "column": "...",
+    "value": "..."
+  },
+  "sessionRef": {
+    "sessionId": "...",
+    "status": "...",
+    "completedAt": "..."
+  }
+}
+```
+
+---
+
+## Usage Notes
+
+1. **Context Isolation:** Each sub-agent runs in its own context. Pass only necessary information.
+
+2. **Return Format:** Always expect JSON responses for easy parsing.
+
+3. **Error Handling:** If a sub-agent response doesn't parse as JSON, escalate to the user.
+
+4. **Timeout:** Sub-agent calls should complete within 60 seconds by default; adjust the limit to the task and context size. On timeout, retry once, then escalate.
+
+5. **Logging:** Log all sub-agent calls and responses to action log for debugging.
+
+6. **Analysis Prompts:** For code review and dependency analysis prompts, see `subagent-prompts-analysis.md`.
--- a/.agents/skills/bmad-story-automator/data/success-patterns.md
+++ b/.agents/skills/bmad-story-automator/data/success-patterns.md
@@ -0,0 +1,93 @@
+# Success Patterns
+
+**Purpose:** Patterns for detecting when each workflow step has completed successfully.
+
+---
+
+## create-story
+
+**Success indicators:**
+- Story file created at expected path
+- Story file contains required sections (title, acceptance criteria, etc.)
+- Session output contains "Story created" or similar confirmation
+
+**Failure indicators:**
+- Error messages in session output
+- Story file not found after session completes
+- Session exits with non-zero code
+
+---
+
+## dev-story
+
+**Success indicators:**
+- Code changes committed or staged
+- Tests pass (if applicable)
+- Session output contains "Implementation complete" or similar
+- No unresolved errors in session output
+
+**Failure indicators:**
+- Test failures
+- Unresolved compilation/lint errors
+- Session output contains error messages
+- Session times out or crashes
+
+---
+
+## automate (guardrail tests)
+
+**Success indicators:**
+- Test files created
+- Tests pass when run
+- Session output confirms test generation complete
+
+**Failure indicators:**
+- Test generation errors
+- Generated tests fail immediately
+- Session output contains errors
+
+---
+
+## code-review
+
+**Success indicators (clean):**
+- "No issues found" or "LGTM" in session output
+- Zero blocking issues reported
+- Only informational/optional suggestions remain
+
+**Success indicators (issues found):**
+- Clear list of issues with file:line references
+- Issues categorized by severity
+- Actionable fix suggestions provided
+
+**Failure indicators:**
+- Unable to complete review
+- Session crashes or times out
+- Ambiguous output that can't be parsed
+
+---
+
+## git-commit
+
+**Success indicators:**
+- Commit created successfully
+- Commit message follows convention
+- No uncommitted changes remain (for story scope)
+
+**Failure indicators:**
+- Git errors (merge conflicts, etc.)
+- Commit hook failures
+- Unable to stage changes
+
+---
+
+## retrospective
+
+**Success indicators:**
+- Retrospective session completes
+- Summary document generated
+- Learnings captured
+
+**Failure indicators:**
+- Session incomplete
+- Unable to generate summary
--- a/.agents/skills/bmad-story-automator/data/tmux-commands.md
+++ b/.agents/skills/bmad-story-automator/data/tmux-commands.md
@@ -0,0 +1,204 @@
+# T-Mux Commands Reference
+
+**Related:** See `workflow-commands.md` for BMAD workflow invocation commands.
+
+---
+
+## Session Names
+
+**Pattern (v3.0 - MULTI-PROJECT):** `sa-{project_slug}-{YYMMDD}-{HHMMSS}-e{epic}-s{story}-{step}`
+
+**Examples:**
+- `sa-myproj-260114-223045-e6-s6-4-dev` (Project "myproj", Epic 6, Story 6.4, dev step)
+- `sa-webapp-260114-223512-e6-s6-4-review-1` (Project "webapp", review cycle 1)
+
+### Project Slug for Multi-Project Support
+
+**Why project slug (v3.0):**
+- **Isolates sessions per project** - List only current project's sessions
+- **Prevents cross-project interference** - Won't kill another project's sessions
+- **Enables parallel orchestration** - Run story-automator on multiple projects simultaneously
+
+**Generate project slug:**
+```bash
+# First 8 chars of project directory name (lowercase, alphanumeric only)
+project_slug=$(basename "$PWD" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]' | cut -c1-8)
+```
+
+**Example:** Project at `/home/user/my-awesome-project` → `project_slug="myawesom"`
+
+**Why timestamps with seconds (v2.1):**
+- Prevents collisions when multiple sessions spawn in same minute
+- Easier debugging across multiple orchestration runs
+- Session names are unique even if re-running same story
+- Can identify stale sessions from crashed runs
+
+**Generate full session name:**
+```bash
+project_slug=$(basename "$PWD" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]' | cut -c1-8)
+timestamp=$(date +%y%m%d-%H%M%S)  # Returns "260114-223045"
+session_name="sa-${project_slug}-${timestamp}-e{epic}-s{story_suffix}-{step}"
+```
+
+### Listing/Killing Project-Specific Sessions
+
+**List only current project's sessions:**
+```bash
+project_slug=$(basename "$PWD" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]' | cut -c1-8)
+tmux list-sessions 2>/dev/null | grep "^sa-${project_slug}-"
+```
+
+**Kill only current project's sessions:**
+```bash
+project_slug=$(basename "$PWD" | tr '[:upper:]' '[:lower:]' | tr -cd '[:alnum:]' | cut -c1-8)
+tmux list-sessions -F '#{session_name}' 2>/dev/null | grep "^sa-${project_slug}-" | xargs -I {} tmux kill-session -t {}
+```
+
+### No Dots in Session Names
+
+**Tmux session names CANNOT contain dots (`.`).** Dots in story IDs like "6.2" must be replaced with hyphens.
+
+```bash
+# Story ID to session name conversion
+# Story ID "6.2" → session suffix "s6-2" (NOT "s6.2")
+session_suffix=$(echo "{story_id}" | tr '.' '-')
+```
+
+**WRONG:** `sa-epic6-s6.2-review-1` ← Will fail with "can't find pane" error
+**RIGHT:** `sa-epic6-s6-2-review-1` ← Works correctly
+
+---
+
+## Status Check Script (PREFERRED)
+
+**ALWAYS use the status check script instead of raw pane capture.**
+
+Script: resolve the installed helper under the active installed skill root. Use `.claude/skills` for Claude, `.agents/skills` for Codex, or `.codex/skills` when that is the installed Codex skill root.
+
+```bash
+# ALWAYS use absolute path - relative paths break when directory changes
+script="$(printf "%s" "{project_root}/<installed-skill-root>/bmad-story-automator/scripts/story-automator")"
+"$script" tmux-status-check "SESSION_NAME"
+```
+
+**Returns CSV:** `status,todos_done,todos_total,active_task,wait_estimate,session_state`
+
+```
+active,3,7,Running tests,90,in_progress
+idle,0,0,,0,just_started
+idle,0,0,,0,completed
+not_found,0,0,,0,not_found
+error,0,0,capture_failed,30,error
+```
+
+**CSV Columns:**
+1. `status` - "active" | "idle" | "not_found" | "error" | "crashed"
+2. `todos_done` - completed todo count (Claude only; Codex returns 0)
+3. `todos_total` - total todo count (Claude only; Codex returns 0)
+4. `active_task` - current task (truncated, no commas) OR output file path (for --full/crashed)
+5. `wait_estimate` - seconds to wait before next check (heuristic-based). For crashed: exit code.
+6. `session_state` - **KEY COLUMN** for decision making:
+   - `just_started` - Session spawned, agent loading
+   - `in_progress` - Actively working
+   - `completed` - Was active, now finished cleanly
+   - `crashed` - Session exited with non-zero status (v2)
+   - `stuck` - Never became active after multiple polls
+   - `not_found` / `error` - Problem states
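+
+A minimal sketch of splitting one status line into the six columns listed above. The function name `parse_status_line` is hypothetical and not part of story-automator; the sketch only assumes the column order documented here and that `active_task` contains no commas.
+
+```bash
+# Hypothetical helper (not part of story-automator): split one CSV status
+# line into named variables, in the documented column order.
+parse_status_line() {
+    local line=$1
+    # IFS=, keeps empty fields (e.g. a blank active_task) in position.
+    IFS=, read -r status todos_done todos_total active_task wait_estimate session_state <<< "$line"
+}
+
+parse_status_line "active,3,7,Running tests,90,in_progress"
+echo "$status $todos_done/$todos_total '$active_task' wait=${wait_estimate}s state=$session_state"
+# prints: active 3/7 'Running tests' wait=90s state=in_progress
+```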
+
+**Agent Detection (v1.3.0):**
+The status check script automatically detects Claude vs Codex sessions:
+- **Claude:** Looks for `ctrl+c to interrupt`, `☒`/`☐` checkboxes
+- **Codex:** Looks for `OpenAI Codex`, `codex exec`, `codex-cli`, `gpt-*-codex`, `tokens used`
+- **Codex completion cues:** `tokens used` line, shell prompt return (e.g., `❯`, `$`, `#`), or clean tmux exit
+- Codex sessions get 1.5x the default wait estimate (90s vs 60s); the word "succeeded" on its own is not treated as an activity indicator
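+
+The marker strings above could be matched against captured pane text roughly as follows. This is an illustrative sketch only: `detect_agent` is a hypothetical name, and checking Codex markers before Claude markers is an assumption, not the script's confirmed behavior.
+
+```bash
+# Hypothetical sketch: classify pane text read from stdin using the marker
+# strings listed above. The check order (Codex first) is an assumption.
+detect_agent() {
+    local pane_text
+    pane_text=$(cat)
+    if grep -qE 'OpenAI Codex|codex exec|codex-cli|gpt-[^ ]*-codex|tokens used' <<< "$pane_text"; then
+        echo "codex"
+    elif grep -qE 'ctrl\+c to interrupt|☒|☐' <<< "$pane_text"; then
+        echo "claude"
+    else
+        echo "unknown"
+    fi
+}
+
+# Usage: feed it a raw pane capture.
+# tmux capture-pane -t "SESSION_NAME" -p | detect_agent
+```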
+
+**Runtime Behavior (v1.13.0):**
+- Normal `tmux-wrapper spawn` now uses a runner-based tmux path with explicit session state, not `tmux send-keys`
+- Lifecycle truth comes from the session state file first; pane capture is still used for exported `output_file` artifacts
+- Sessions keep dead panes with `remain-on-exit on`, so `pane_dead` and `pane_dead_status` remain inspectable after completion
+- Temporary migration switch: `SA_TMUX_RUNTIME=legacy|runner|auto` (`auto` is the default)
+
+**For full output (when completed/stuck):**
+```bash
+script="$(printf "%s" "{project_root}/<installed-skill-root>/bmad-story-automator/scripts/story-automator")"
+"$script" tmux-status-check "SESSION_NAME" --full
+```
+Returns: `idle,0,0,/tmp/sa-output-SESSION_NAME.txt,0,completed`
+
+---
+
+## Polling Pattern (for step-03-execute)
+
+**Use `wait_estimate` from CSV - heuristic estimates optimal interval.**
+
+| status | Action |
+|--------|--------|
+| `active` | Log: "{todos_done}/{todos_total} - {active_task}". Sleep `wait_estimate` seconds, re-poll |
+| `idle` | If `session_state` is `just_started`, keep polling. If it is `completed` or `stuck`, run `--full` and parse output per success-patterns.md |
+| `crashed` | Session crashed! Column 4 = output file, Column 5 = exit code. Apply adaptive retry strategy. |
+| `not_found` | Session ended unexpectedly, escalate |
+| `error` | Retry once, then escalate |
+
+**Crashed vs Completed (v2):**
+- `completed` = session was active, then exited cleanly (exit code 0)
+- `crashed` = session exited with non-zero exit code (context limit, API error, etc.)
+- Always check session_state to distinguish between success and failure!
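+
+The polling table can be read as a small decision function. The sketch below is illustrative: `next_action` and the action names it prints are hypothetical, and treating `idle` with `just_started` as "keep polling" is an assumption drawn from the `session_state` descriptions rather than confirmed orchestrator behavior.
+
+```bash
+# Hypothetical decision helper: map (status, session_state) to an action label.
+# Action names are illustrative and not part of story-automator.
+next_action() {
+    local status=$1 session_state=$2
+    case "$status" in
+        active)    echo "sleep_and_repoll" ;;
+        idle)
+            # Assumption: a freshly spawned session is idle but not finished.
+            if [ "$session_state" = "just_started" ]; then
+                echo "sleep_and_repoll"
+            else
+                echo "fetch_full_output"
+            fi ;;
+        crashed)   echo "adaptive_retry" ;;
+        not_found) echo "escalate" ;;
+        error)     echo "retry_once_then_escalate" ;;
+        *)         echo "escalate" ;;   # unknown status: fail safe
+    esac
+}
+
+next_action idle completed   # prints: fetch_full_output
+```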
+
+---
+
+## Core Commands
+
+### Create Session + Run Command
+
+**CRITICAL: All child sessions MUST set `STORY_AUTOMATOR_CHILD=true`**
+
+This environment variable tells the stop hook to allow the session to complete normally.
+Without it, the stop hook will block child sessions from stopping, causing infinite loops.
+
+```bash
+# Current implementation:
+# 1. create the session with an inert placeholder command
+# 2. set remain-on-exit on the pane/session
+# 3. respawn the pane into a bash runner that executes the per-session command file
+tmux new-session -d -s "SESSION_NAME" -x 200 -y 50 -c "PROJECT_PATH" \
+  -e STORY_AUTOMATOR_CHILD=true -e AI_AGENT=codex -e CLAUDECODE= -e BASH_ENV= \
+  /bin/sleep 86400
+tmux set-option -t "PANE_ID" remain-on-exit on
+tmux respawn-pane -k -t "PANE_ID" /usr/bin/bash "/tmp/.sa-<hash>-session-SESSION_NAME-runner.sh"
+```
+
+**Terminal Dimensions:** The `-x 200 -y 50` flags remain required. They preserve the wide pane geometry used for interactive agent sessions and pane-derived transcripts.
+
+**Command Files:** The runtime now always writes a per-session command file and a per-session runner file. This removes the old short-command vs long-command split and avoids quoting or line-wrap failures from `send-keys`. Explicit `tmux-wrapper kill` deletes these artifacts; stale terminal artifacts are garbage-collected after the retention TTL.
+
+See `data/tmux-long-command-debugging.md` for detailed troubleshooting.
+
+### Other Commands
+
+```bash
+tmux has-session -t "SESSION" 2>/dev/null  # Check exists
+tmux kill-session -t "SESSION"              # Kill session
+tmux list-sessions                          # List all
+tmux capture-pane -t "SESSION" -p -S -100   # Raw capture (use sparingly)
+```
+
+---
+
+## Variables
+
+**Agent Configuration (v1.3.0):**
+
+| Variable | Claude | Codex |
+|----------|--------|-------|
+| CLI | `claude --dangerously-skip-permissions` | `codex exec --full-auto` |
+| Prompt Style | Natural language skill prompt | Natural language skill prompt |
+| Timeout Multiplier | 1x (60min) | 1.5x (90min) |
+| Todo Tracking | ☒/☐ checkboxes | Not supported |
+
+**Environment Variables:**
+- `AI_AGENT` = `claude` or `codex` (used by story-automator tmux-wrapper and story-automator monitor-session)
+- `AI_COMMAND` = Full CLI (legacy, deprecated)
+
+`{projectPath}` = project root
+
+*See `workflow-commands.md` for BMAD workflow command patterns (including Codex natural language prompts).*
--- a/.agents/skills/bmad-story-automator/data/tmux-long-command-debugging.md
+++ b/.agents/skills/bmad-story-automator/data/tmux-long-command-debugging.md
@@ -0,0 +1,138 @@
+# Tmux Long Command Debugging Guide
+
+**Created:** 2026-01-21
+**Context:** Debugging retrospective session failures in story-automator
+**Root Cause:** Terminal width causes line-wrap corruption of long commands
+
+**Related:** See `tmux-long-command-testing.md` for detailed investigation steps and test scripts.
+
+---
+
+## Problem Summary
+
+Tmux sessions spawned via `tmux send-keys` were failing silently when commands exceeded ~1000 characters. Sessions would spawn successfully but the command would never execute, resulting in `stuck/never_active` status.
+
+**Symptoms:**
+- Session spawns successfully (tmux session exists)
+- Command appears in terminal output (visible in capture-pane)
+- No child processes running (Claude never starts)
+- No error messages visible
+- Monitor reports `stuck` or `never_active`
+
+---
+
+## Root Cause
+
+**Default tmux terminal dimensions:** 80 columns × 24 rows
+
+When `tmux send-keys` sends a command longer than the terminal width:
+1. The command wraps across multiple lines in the terminal buffer
+2. The shell receives the wrapped input as if it were multiple lines
+3. Shell parsing fails or behaves unexpectedly with multi-line wrapped input
+4. The command silently fails or produces syntax errors
+
+**Critical insight:** This is NOT a tmux bug or a shell bug individually - it's an interaction problem between how `tmux send-keys` delivers characters and how the shell's line editor handles wrapped input.
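+
+As a rough illustration of how much wrapping is involved, the number of terminal rows a command occupies is the ceiling of its length divided by the pane width. The helper name `wrapped_rows` is hypothetical, and the sketch ignores the prompt length.
+
+```bash
+# Hypothetical helper: rows occupied by a command of a given length at a
+# given pane width (ceiling division; prompt length ignored).
+wrapped_rows() {
+    local len=$1 width=$2
+    echo $(( (len + width - 1) / width ))
+}
+
+wrapped_rows 1200 80    # prints: 15
+wrapped_rows 1200 200   # prints: 6
+```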
+
+---
+
+## Solution
+
+Add explicit dimensions when creating tmux sessions:
+
+```bash
+# Before (BROKEN for long commands):
+tmux new-session -d -s "$session_name" -c "$PROJECT_ROOT"
+
+# After (FIXED):
+tmux new-session -d -s "$session_name" -x 200 -y 50 -c "$PROJECT_ROOT"
+```
+
+**Why 200×50:**
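+- 200 columns handled commands up to ~3000 chars without the wrap-related failure (a command longer than the pane width still wraps, but across far fewer rows)
+- 50 rows gives a taller visible pane, so each capture shows more output for monitoring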
+- These dimensions don't affect the actual terminal the user might attach to
+
+---
+
+## Key Insights
+
+### 1. Silent Failures are Deceptive
+
+The command appears in the terminal output but never executes. This makes debugging difficult because:
+- `tmux capture-pane` shows the command was "sent"
+- No error message is visible
+- The session exists and appears healthy
+
+**Lesson:** Always verify command execution by checking for child processes or activity indicators, not just command presence.
+
+### 2. Length Threshold is Approximate
+
+The exact failure point depends on:
+- Terminal width (obviously)
+- Command content (special characters, quotes)
+- Shell type (bash vs zsh)
+- tmux version
+
+**Lesson:** Use generous margins. If your longest expected command is 1500 chars, use 200+ column width.
+
+### 3. Quote Escaping is NOT the Issue
+
+Initial hypothesis was that escaped quotes (`\"`) or special characters caused parsing failures. Testing proved this wrong:
+
+```bash
+# This works fine with wide terminal:
+cmd='claude "test with \"quotes\" inside"'
+tmux send-keys -t "$sess" "$cmd" Enter  # SUCCESS at 200 cols
+```
+
+**Lesson:** Don't chase red herrings. Test the simplest hypothesis (length/width) before investigating complex escaping issues.
+
+### 4. Process Detection is Reliable
+
+The most reliable way to verify command execution:
+
+```bash
+PANE_PID=$(tmux display -t "$session" -p '#{pane_pid}')
+if pgrep -P "$PANE_PID" >/dev/null 2>&1; then
+    echo "Command is running"
+else
+    echo "No child processes - command failed"
+fi
+```
+
+---
+
+## Checklist for Future Debugging
+
+When tmux commands fail silently:
+
+- [ ] Check command length: `echo ${#cmd}`
+- [ ] Check terminal dimensions: `tmux display -t "$sess" -p '#{pane_width}'`
+- [ ] Test with wider terminal: `-x 200 -y 50`
+- [ ] Verify with process check: `pgrep -P $PANE_PID`
+- [ ] Check pane status: `tmux display -t "$sess" -p '#{pane_dead}'`
+- [ ] Capture full output: `tmux capture-pane -t "$sess" -p -S -100`
+
+---
+
+## Bug: Script File Path Not Executed (2026-02-09)
+
+**Symptoms identical to the terminal-width issue**, but with a different root cause.
+
+When `spawn` receives a command longer than 500 characters, it writes the command to a script file (`/tmp/sa-cmd-{session}.sh`) and sends the path via `tmux send-keys`. However, the path was sent **without the `bash` prefix**, so the shell received a raw file path instead of an executable command.
+
+**Affected commands:** Retrospective prompts (~1577 chars) — all other steps (create-story, dev-story, code-review) are under 500 chars and use direct `send-keys`.
+
+**Fix:** `src/story_automator/commands/tmux.py` - changed the long-command fallback to send `bash /tmp/sa-cmd-{session}.sh` instead of a raw script path, and to fail fast if writing the temp script or the `tmux send-keys` call fails.
+
+**Lesson:** Two independent failure modes can produce identical symptoms (`never_active`). The `-x 200 -y 50` fix handles line-wrapping for direct `send-keys`, but the script-file fallback path had its own bug. Always check both paths when debugging.
+
+---
+
+## Related Files
+
+- `scripts/story-automator tmux-wrapper` - Session spawning with `-x 200 -y 50` fix + script file `bash` prefix fix
+- `scripts/story-automator monitor-session` - Polling loop that detects stuck sessions
+- `scripts/story-automator tmux-status-check` - Status detection with activity indicators
+- `data/monitoring-pattern.md` - Overall monitoring architecture
+- `data/tmux-long-command-testing.md` - Detailed investigation and test scripts
--- a/.agents/skills/bmad-story-automator/data/tmux-long-command-testing.md
+++ b/.agents/skills/bmad-story-automator/data/tmux-long-command-testing.md
@@ -0,0 +1,184 @@
+# Tmux Long Command Testing & Investigation
+
+**Related:** See `tmux-long-command-debugging.md` for root cause analysis and solution.
+
+---
+
+## Investigation Process
+
+### Step 1: Verify Command Syntax
+
+First, confirm the command itself is valid:
+
+```bash
+# Build the command
+cmd=$("$scripts" tmux-wrapper build-cmd retro 2 --agent "codex")
+
+# Check for syntax issues
+echo "$cmd" | od -c | head -20  # Look for unexpected characters
+
+# Test parsing
+bash -n -c "$cmd"  # Syntax check only
+```
+
+**Finding:** Command syntax was correct. Quotes and escapes were properly formed.
+
+### Step 2: Test Progressive Lengths
+
+Test increasing lengths to bracket the breaking point:
+
+```bash
+test_length() {
+    local len=$1
+    local sess="test-len-$len-$$"
+    local prompt="Execute the BMAD retrospective workflow for epic 2. $(printf 'x%.0s' $(seq 1 $len))"
+
+    tmux new-session -d -s "$sess"
+    tmux send-keys -t "$sess" "claude --dangerously-skip-permissions \"$prompt\"" Enter
+    sleep 5
+
+    local capture=$(tmux capture-pane -t "$sess" -p)
+    tmux kill-session -t "$sess" 2>/dev/null
+
+    if echo "$capture" | grep -qiE "interrupt|Working|Running"; then
+        echo "Length $len: SUCCESS"
+    else
+        echo "Length $len: FAILED"
+    fi
+}
+
+# Test different lengths
+test_length 200   # SUCCESS
+test_length 500   # SUCCESS
+test_length 800   # SUCCESS
+test_length 1000  # SUCCESS
+test_length 1200  # FAILED
+```
+
+**Finding:** Commands failed around 1000-1200 characters.
+
+### Step 3: Test Terminal Width Hypothesis
+
+```bash
+# Default dimensions
+sess="test-default-$$"
+tmux new-session -d -s "$sess"
+tmux display -t "$sess" -p 'cols:#{pane_width} rows:#{pane_height}'
+# Output: cols:80 rows:24
+
+# Send long command ($long_cmd: any command past the Step 2 breaking point, ~1200+ chars)
+tmux send-keys -t "$sess" "$long_cmd" Enter
+sleep 10
+# Result: FAILED - no activity
+
+# Wide terminal
+sess="test-wide-$$"
+tmux new-session -d -s "$sess" -x 200 -y 50
+tmux display -t "$sess" -p 'cols:#{pane_width} rows:#{pane_height}'
+# Output: cols:200 rows:50
+
+# Send same long command
+tmux send-keys -t "$sess" "$long_cmd" Enter
+sleep 10
+# Result: SUCCESS - Claude running!
+```
+
+**Finding:** Wide terminal (200 cols) prevents the failure.
+
+### Step 4: Understand the Mechanism
+
+The shell's line editor (readline/zle) handles input differently when lines wrap:
+
+1. **Normal input:** Characters arrive, shell builds command buffer
+2. **Wrapped input:** Terminal sends characters that visually wrap
+3. **Problem:** Some shell/terminal combinations mishandle the wrap points
+4. **Result:** Command buffer corruption or premature execution
+
+This is why the command "appears" in the terminal (tmux captured it) but doesn't execute properly (shell didn't parse it correctly).
+
+---
+
+## Testing Methodology
+
+### Quick Smoke Test
+
+```bash
+#!/bin/bash
+# smoke-test-tmux-command.sh
+
+cmd="$1"
+cmd_len=${#cmd}
+
+echo "Testing command of length: $cmd_len"
+
+# Test with default dimensions
+sess="smoke-default-$$"
+tmux new-session -d -s "$sess"
+tmux send-keys -t "$sess" "$cmd" Enter
+sleep 5
+if tmux capture-pane -t "$sess" -p | grep -qiE "interrupt|Working|Running|Read"; then
+    echo "Default (80x24): SUCCESS"
+else
+    echo "Default (80x24): FAILED"
+fi
+tmux kill-session -t "$sess" 2>/dev/null
+
+# Test with wide dimensions
+sess="smoke-wide-$$"
+tmux new-session -d -s "$sess" -x 200 -y 50
+tmux send-keys -t "$sess" "$cmd" Enter
+sleep 5
+if tmux capture-pane -t "$sess" -p | grep -qiE "interrupt|Working|Running|Read"; then
+    echo "Wide (200x50): SUCCESS"
+else
+    echo "Wide (200x50): FAILED"
+fi
+tmux kill-session -t "$sess" 2>/dev/null
+```
+
+### Comprehensive Test
+
+```bash
+#!/bin/bash
+# test-tmux-long-commands.sh
+
+test_at_width() {
+    local width=$1
+    local cmd_len=$2
+    local sess="test-w${width}-l${cmd_len}-$$"
+
+    # Generate command of specific length
+    local padding=$(printf 'x%.0s' $(seq 1 $cmd_len))
+    local cmd="echo \"test $padding\""
+
+    tmux new-session -d -s "$sess" -x "$width" -y 24
+    tmux send-keys -t "$sess" "$cmd" Enter
+    sleep 2
+
+    local output=$(tmux capture-pane -t "$sess" -p -S -)  # -S - includes scrollback
+    tmux kill-session -t "$sess" 2>/dev/null
+
+    if echo "$output" | grep -q '^test xxx'; then  # match echo's output, not the typed command
+        echo "Width $width, Length $cmd_len: PASS"
+        return 0
+    else
+        echo "Width $width, Length $cmd_len: FAIL"
+        return 1
+    fi
+}
+
+# Test matrix
+for width in 80 120 160 200; do
+    for len in 500 1000 1500 2000; do
+        test_at_width $width $len
+    done
+done
+```
+
+---
+
+## References
+
+- tmux manual: `man tmux` (see `new-session` options)
+- Shell line editing: readline (bash) / zle (zsh)
+- Related issue: Commands with many arguments or long strings failing in tmux
--- a/.agents/skills/bmad-story-automator/data/workflow-commands.md
+++ b/.agents/skills/bmad-story-automator/data/workflow-commands.md
@@ -0,0 +1,118 @@
+# Workflow Prompt Reference
+
+**Related:** See `tmux-commands.md` for session naming and management.
+
+---
+
+## Multi-Agent Support
+
+| Agent | CLI Command | Prompt Style |
+|-------|-------------|--------------|
+| **Claude** | `claude --dangerously-skip-permissions` | Natural language skill prompt |
+| **Codex** | `codex exec --full-auto` | Natural language skill prompt |
+
+All child sessions receive explicit skill and workflow paths. Command wrappers are not required.
+
+---
+
+## Required Prompt Fields
+
+Every generated prompt must include:
+
+1. Which skill/workflow to execute
+2. The `SKILL.md` path when available
+3. The `workflow.md` or `workflow.yaml` path
+4. The story file pattern in `_bmad-output/implementation-artifacts`
+5. The story ID or epic ID
+6. Any automation instruction such as `#YOLO` or `auto-fix all issues without prompting`
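+
+A minimal sketch of assembling a prompt from these fields, using the dev-story wording from this reference. The function name `build_dev_story_prompt` and its argument order are hypothetical; the real prompts are generated by story-automator itself.
+
+```bash
+# Hypothetical builder (not part of story-automator): fill the dev-story
+# prompt from the installed skill root and a story ID.
+build_dev_story_prompt() {
+    local skill_root=$1 story_id=$2
+    # STORY_PREFIX: story ID with dots replaced by hyphens, e.g. 6.1 -> 6-1
+    local story_prefix=${story_id//./-}
+    cat <<EOF
+Execute the BMAD dev-story workflow for story ${story_id}.
+
+READ this skill first: ${skill_root}/bmad-dev-story/SKILL.md
+READ this workflow file next: ${skill_root}/bmad-dev-story/workflow.md
+Story file: _bmad-output/implementation-artifacts/${story_prefix}-*.md
+Implement all tasks marked [ ]. Run tests. Update checkboxes.
+EOF
+}
+
+build_dev_story_prompt ".claude/skills" "6.1"
+```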
+
+---
+
+## dev-story
+
+```bash
+tmux send-keys -t "SESSION" 'claude --dangerously-skip-permissions "Execute the BMAD dev-story workflow for story STORY_ID.
+
+READ this skill first: <installed-skill-root>/bmad-dev-story/SKILL.md
+READ this workflow file next: <installed-skill-root>/bmad-dev-story/workflow.md
+Story file: _bmad-output/implementation-artifacts/STORY_PREFIX-*.md
+Implement all tasks marked [ ]. Run tests. Update checkboxes."' Enter
+```
+
+---
+
+## code-review
+
+**MUST use the dedicated `bmad-story-automator-review` skill. Do NOT use a generic Task agent for reviews.**
+
+```bash
+tmux send-keys -t "SESSION" 'claude --dangerously-skip-permissions "Execute the story-automator review workflow for story STORY_ID.
+
+READ this skill first: <installed-skill-root>/bmad-story-automator-review/SKILL.md
+READ this workflow file next: <installed-skill-root>/bmad-story-automator-review/workflow.yaml
+Then read: <installed-skill-root>/bmad-story-automator-review/instructions.xml
+Validate with: <installed-skill-root>/bmad-story-automator-review/checklist.md
+Story file: _bmad-output/implementation-artifacts/STORY_PREFIX-*.md
+Review implementation, find issues, fix them automatically. auto-fix all issues without prompting"' Enter
+```
+
+**Why `auto-fix all issues without prompting`:** The dedicated review workflow normally presents a findings menu. This instruction tells it to automatically fix issues without prompting.
+
+---
+
+## create-story
+
+```bash
+tmux send-keys -t "SESSION" 'claude --dangerously-skip-permissions "Execute the BMAD create-story workflow for story STORY_ID.
+
+READ this skill first: <installed-skill-root>/bmad-create-story/SKILL.md
+READ this workflow file next: <installed-skill-root>/bmad-create-story/workflow.md
+Then read: <installed-skill-root>/bmad-create-story/discover-inputs.md
+Use template: <installed-skill-root>/bmad-create-story/template.md
+Validate with: <installed-skill-root>/bmad-create-story/checklist.md
+Create story file at: _bmad-output/implementation-artifacts/STORY_PREFIX-*.md
+Story ID: STORY_ID
+
+#YOLO - Do NOT wait for user input."' Enter
+```
+
+**CRITICAL:** Always pass the story ID (for example, `5.3`) to ensure create-story creates only that one story.
+
+---
+
+## automate
+
+```bash
+tmux send-keys -t "SESSION" 'claude --dangerously-skip-permissions "Execute the BMAD qa-generate-e2e-tests workflow for story STORY_ID.
+
+READ this skill first: <installed-skill-root>/bmad-qa-generate-e2e-tests/SKILL.md
+READ this workflow file next: <installed-skill-root>/bmad-qa-generate-e2e-tests/workflow.md
+Validate with: <installed-skill-root>/bmad-qa-generate-e2e-tests/checklist.md
+Story file: _bmad-output/implementation-artifacts/STORY_PREFIX-*.md
+Auto-apply all discovered gaps in tests."' Enter
+```
+
+If `bmad-qa-generate-e2e-tests` is missing from the installed skill root, story-automator install still succeeds, but the orchestrator should run with `Skip Automate = true`.
+
+---
+
+## retrospective
+
+```bash
+tmux send-keys -t "SESSION" 'claude --dangerously-skip-permissions "Execute the BMAD retrospective workflow for epic EPIC_ID.
+
+READ this skill first: <installed-skill-root>/bmad-retrospective/SKILL.md
+READ this workflow file next: <installed-skill-root>/bmad-retrospective/workflow.md
+Run the retrospective in #YOLO mode and assume the user will NOT provide input."' Enter
+```
+
+---
+
+## Variables
+
+- `AI_AGENT` = `claude` or `codex`
+- `AI_COMMAND` = full CLI command override, legacy and deprecated
+- `STORY_PREFIX` = story ID with dots replaced by hyphens, for example `6.1` -> `6-1`
+- `{projectPath}` = project root
+
+All commands assume the session was created with `STORY_AUTOMATOR_CHILD=true`.
--- a/.agents/skills/bmad-story-automator/data/wrapup-templates.md
+++ b/.agents/skills/bmad-story-automator/data/wrapup-templates.md
@@ -0,0 +1,131 @@
+# Wrapup Templates
+
+Templates for the wrapup step summary, learnings, and recommendations.
+
+---
+
+## Summary Report Template
+
+```
+**📊 Build Cycle Summary**
+
+**Epic:** {epic_name}
+**Stories:** {story_range} ({completed}/{total} completed)
+**Duration:** {start_time} to {end_time}
+
+---
+
+**Story Results:**
+
+| Story | Title | Status | Review Cycles | Notes |
+|-------|-------|--------|---------------|-------|
+{story_results_table}
+
+---
+
+**Execution Statistics:**
+
+| Metric | Value |
+|--------|-------|
+| Stories Completed | {count} |
+| Stories Skipped/Aborted | {count} |
+| Total Code Review Cycles | {count} |
+| Escalations | {count} |
+| Git Commits | {count} |
+
+---
+
+**Session Summary:**
+
+| Session Type | Count | Avg Duration |
+|--------------|-------|--------------|
+| create-story | {count} | {avg} |
+| dev-story | {count} | {avg} |
+| automate | {count} | {avg} |
+| code-review | {count} | {avg} |
+
+---
+
+**Escalations Encountered:**
+{escalation_list_or_'None'}
+
+**Issues Resolved:**
+{issues_resolved_list_or_'None'}
+```
+
+---
+
+## Learnings Entry Template
+
+Append this to the sidecar learnings file:
+
+```markdown
+## Run: {timestamp}
+
+**Epic:** {epic_name}
+**Stories:** {story_range}
+
+### Patterns Observed
+- {pattern_1}
+- {pattern_2}
+
+### Code Review Insights
+- Common issues: {list}
+- Average cycles to clean: {avg}
+
+### Timing Estimates
+- create-story: ~{avg_time}
+- dev-story: ~{avg_time}
+- code-review: ~{avg_time} per cycle
+
+### Recommendations for Future Runs
+- {recommendation_1}
+- {recommendation_2}
+```
+
+**Patterns to capture:**
+- Common code review issues (what kept failing?)
+- Steps that frequently needed escalation
+- Stories that took longer than expected
+- Successful patterns (what worked well?)
+
+---
+
+## Recommendations Template
+
+```
+**💡 Recommendations**
+
+Based on this build cycle run:
+
+**For Future Runs:**
+{recommendations_based_on_patterns}
+
+**Process Improvements:**
+{suggestions_for_workflow_improvements}
+
+**Technical Debt:**
+{any_tech_debt_identified}
+
+**Documentation Needs:**
+{any_docs_that_should_be_updated}
+```
+
+---
+
+## Completion Message Template
+
+```
+**✅ Story Automator Complete**
+
+**Results saved to:**
+- State document: `{state_document_path}`
+- Learnings: `{sidecarFile}`
+
+**Stories implemented:** {count}
+**Git commits made:** {count}
+
+Thank you for using Story Automator. The state document contains full history for reference.
+
+To run another build cycle, invoke the story-automator workflow again.
+```