chore: initial monorepo scaffold + WDS Phase 1+2 artifacts
- Nx 22.7 monorepo (pnpm 11.1, TypeScript 5.9, Node 24)
- apps/api: NestJS 11 (CJS, per CODING-RULES.md PGD-DB-004)
- apps/web: React 19 + Vite 8 (ESM)
- libs/shared/api-interface: Zod contract base
- Docker Compose dev: Postgres 18, Valkey 8, MinIO, Mailpit
- WDS artifacts:
  - design-artifacts/A-Product-Brief/ (5 canonical docs + 16 dialogs)
  - design-artifacts/B-Trigger-Map/ (hub + 4 personas + feature impact)
- Stack canon: STACK.md v2.2 + CODING-RULES.md v2.0 + brand.md
- AGENTS.md + README.md as the entry point for devs/agents

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,180 @@
# Agent Fallback Troubleshooting

### Issue: Session spawns Claude instead of Codex

**Symptoms:**
- Output shows Claude-specific messages (e.g., "You've used 84% of your weekly limit")
- Expected Codex but got Claude

**Cause:** The `--agent` flag must be passed to `story-automator tmux-wrapper spawn`, not to `build-cmd`.

**Correct Usage (v1.4.0+):**
```bash
# Method 1: Use --agent flag on spawn (RECOMMENDED)
session=$("$scripts" tmux-wrapper spawn dev "$epic" "$story_id" \
  --agent codex \
  --command "$("$scripts" tmux-wrapper build-cmd dev "$story_id")")

# Method 2: Set environment variable before spawn
export AI_AGENT="codex"
session=$("$scripts" tmux-wrapper spawn dev "$epic" "$story_id" \
  --command "$("$scripts" tmux-wrapper build-cmd dev "$story_id")")
```

**Wrong Usage:**
```bash
# WRONG - --agent is passed to build-cmd instead of spawn, so it has no effect
session=$("$scripts" tmux-wrapper spawn dev "$epic" "$story_id" \
  --command "$("$scripts" tmux-wrapper build-cmd dev "$story_id" --agent codex)")
```

### Issue: Monitor reports "stuck" but Codex is active

**Symptoms:**
- `story-automator monitor-session` returns `stuck` state after 4 polls
- Manual inspection shows Codex still producing output (no prompt, output continues to grow)

**Cause:** The monitoring script relied on marker detection instead of output freshness.

**Fixed in v1.4.0:**
- Output freshness tracking (no marker reliance)
- `CODEX_OUTPUT_STALE_SECONDS` controls how long Codex can be silent before it is reported as "stuck"
- Codex still gets a 6-poll grace period before "stuck"

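To illustrate what freshness tracking means here, the following is a minimal sketch, not the actual `monitor-session` code. The function names `freshness_state` and `fingerprint`, the three state labels, and the argument order are all assumptions made for illustration. The idea is to compare a fingerprint of the pane output between polls and only report staleness once the output has been unchanged for the configured number of seconds.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: classify a session by output freshness rather than by markers.
# Arguments: previous fingerprint, current fingerprint, epoch second of the last
# observed change, current epoch second, staleness threshold in seconds.
freshness_state() {
  local prev="$1" cur="$2" last_change="$3" now="$4" stale_after="$5"
  if [ "$prev" != "$cur" ]; then
    echo "active"   # output changed since the previous poll
  elif [ $(( now - last_change )) -ge "$stale_after" ]; then
    echo "stale"    # unchanged for at least the threshold
  else
    echo "quiet"    # unchanged, but still inside the allowed silence window
  fi
}

# Fingerprint whatever text arrives on stdin (cksum is POSIX).
fingerprint() { cksum | cut -d' ' -f1; }
```

In a polling loop, the current fingerprint could come from `tmux capture-pane -t "session-name" -p | fingerprint`, with `CODEX_OUTPUT_STALE_SECONDS` supplying the threshold argument.
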
**Verification:**
```bash
# Check if session has AI_AGENT set
tmux show-environment -t "session-name" AI_AGENT

# Manual session status check
"$scripts" tmux-status-check "session-name" --project-root "$PWD"
```

### Issue: log command error when using --agent flag

**Symptoms:**
```
log: Unknown subcommand 'Codex agent detected - applying 1.5x timeout (90min)'
```

**Cause:** macOS ships a `/usr/bin/log` system command. When the `log()` bash function was not yet defined at its first use, bash fell through to the system command, which rejected the message as an unknown subcommand.

**Fixed in v1.4.0:** The `log()` function is now defined before argument parsing in `story-automator monitor-session`.

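The ordering rule behind this fix can be checked in isolation. The snippet below is a minimal sketch: the body of `log()` and its `[monitor]` prefix are assumptions, not the script's real implementation. It relies only on bash resolving shell functions before external commands found on `PATH`, which `type -t` makes visible.

```bash
#!/usr/bin/env bash
# Define the function first; from here on, bash resolves "log" to it
# instead of any external command of the same name (e.g. /usr/bin/log on macOS).
log() { printf '[monitor] %s\n' "$*" >&2; }

# type -t prints "function" when a shell function is what the name resolves to.
resolved=$(type -t log)
log "log resolves to: $resolved"
```

Running `type -t log` before the definition on macOS would print `file` instead, which is the situation that produced the error above.
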
### Issue: Manual polling required as workaround

**If monitoring still fails**, use this manual polling approach:
```bash
# Poll every 30 seconds, up to 60 times (30 minutes total)
for i in {1..60}; do
  sleep 30
  # Check if session still exists
  if ! tmux has-session -t "session-name" 2>/dev/null; then
    echo "Session ended"
    break
  fi
  # Check for shell prompt (completion indicator).
  # Drop blank lines first: capture-pane includes the empty rows below the cursor.
  last_line=$(tmux capture-pane -t "session-name" -p | grep -v '^[[:space:]]*$' | tail -1)
  if echo "$last_line" | grep -qE '❯$|\$$|#$'; then
    echo "Session complete (shell prompt detected)"
    break
  fi
done
```

### Issue: Codex sessions explore files but don't execute full workflow (v1.4.0)

**Symptoms:**
- Session output shows file exploration (`sed`, `rg`, `cat` commands)
- No actual review findings or story updates
- Sprint-status never changes from "review" to "done"
- Session completes but workflow steps 1-5 weren't followed

**Cause:** Codex uses natural language prompts and may not follow structured workflow instructions as reliably as Claude.

**Mitigation strategies:**
1. **Use Claude for code-review by default** - More reliable at following multi-step workflows
2. **Add explicit step markers** - Tell Codex to output "STEP 1 COMPLETE", "STEP 2 COMPLETE" etc.
3. **Verify after session** - Check story file Status field, not just sprint-status

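If the step-marker strategy is used, the markers can also be checked mechanically after the session. The helper below is a hypothetical sketch (the name `first_missing_step` is not part of story-automator); it assumes the agent printed the markers verbatim in the form quoted above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read captured session output on stdin and print the
# number of the first step whose "STEP n COMPLETE" marker is missing.
# Returns 0 and prints nothing when all markers from 1..total are present.
first_missing_step() {
  local total="$1" output step
  output=$(cat)
  for (( step = 1; step <= total; step++ )); do
    if ! grep -qF "STEP ${step} COMPLETE" <<<"$output"; then
      echo "$step"
      return 1
    fi
  done
  return 0
}
```

For a five-step workflow this could be fed from the pane history, for example `tmux capture-pane -t "session-name" -p -S - | first_missing_step 5`.
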
**Recommended agent configuration for deterministic reliability:**
```yaml
agentConfig:
  defaultPrimary: "auto"
  defaultFallback: false  # Disable global fallback; opt in per task
  perTask:
    # create-story: Either agent works well
    create:
      primary: "claude"
    # dev-story: Either agent works, Codex may be faster for simple tasks
    dev:
      primary: "codex"
      fallback: "claude"
    # code-review: Claude recommended - more reliable at following workflow
    review:
      primary: "claude"
      fallback: false
```

### Issue: Code-review doesn't update sprint-status.yaml

**Symptoms:**
- Code-review session completes
- Story file shows review was done (Dev Agent Record updated)
- But sprint-status.yaml still shows "review" instead of "done"

**Cause:** Step 5 of the code-review workflow updates sprint-status, but the session may never reach step 5, or it may use the wrong story key format.

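The key-format mismatch is easier to see with a concrete lookup. The sketch below is hypothetical and is not how `orchestrator-helper` is implemented: `story_key_prefix` and `sprint_status_for` are illustrative names, and the code assumes sprint-status entries are plain `key: status` lines, possibly indented. It shows why a dotted id such as `8.2` has to be converted to the `8-2` form before it can match a key like `8-2-flipside-crypto-provider`.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: turn a story id such as "8.2" into the "8-2" prefix
# used by sprint-status keys. Ids that are already dashed pass through unchanged.
story_key_prefix() { printf '%s\n' "${1//./-}"; }

# Print the status recorded for the first key that equals the prefix or
# starts with "<prefix>-". Assumes flat "key: status" lines.
sprint_status_for() {
  local story_id="$1" file="$2" prefix
  prefix=$(story_key_prefix "$story_id")
  sed -E -n "s/^[[:space:]]*${prefix}(-[^:]*)?:[[:space:]]*//p" "$file" | head -n 1
}
```

A lookup with the raw dotted id and no conversion would match nothing, which is one way a session can finish without ever touching the intended entry.
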
**Verification (v1.4.0):**
```bash
# Check story file status directly
"$scripts" orchestrator-helper story-file-status 8.2

# Compare with sprint-status
"$scripts" orchestrator-helper sprint-status get "8-2-flipside-crypto-provider"

# If story file shows "done" but sprint-status doesn't, manually sync:
# Edit _bmad-output/implementation-artifacts/sprint-status.yaml and change "8-2-story-name: review" to "done"
```

### When to manually intervene

**Intervene immediately if:**
1. **5 code-review cycles with no progress** - Agent likely stuck in a loop
2. **Story file shows "done" but sprint-status doesn't** - Sync issue, manual fix is faster
3. **Tests passing but review keeps finding issues** - May be false positives
4. **Codex sessions consistently incomplete** - Switch to Claude for that workflow

**Steps for manual intervention:**
```bash
# 1. Check actual story status
"$scripts" orchestrator-helper story-file-status {story_id}

# 2. Run tests to verify code quality
#    (npm test runs only if the Go test command fails or is unavailable)
go test ./src/... || npm test

# 3. If tests pass, manually update sprint-status
# Edit: _bmad-output/implementation-artifacts/sprint-status.yaml
# Change: "8-2-story-name: review" to "8-2-story-name: done"

# 4. Resume orchestration - it will see "done" and proceed to commit
```

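Step 3 can be scripted when the edit has to be repeated. The function below is a hypothetical sketch, not a story-automator command: `mark_story_done` is an illustrative name, and it assumes the entry is a single `key: review` line. It writes through a temporary file because the in-place flag of `sed` behaves differently on macOS and GNU systems.

```bash
#!/usr/bin/env bash
# Hypothetical helper: flip one story from "review" to "done" in a
# sprint-status file. Only the line for the exact key is changed.
mark_story_done() {
  local key="$1" file="$2" tmp
  tmp=$(mktemp) || return 1
  # Rewrite into a temp file, then copy back so the original file keeps its permissions.
  sed -E "s/^([[:space:]]*${key}:[[:space:]]*)review([[:space:]]*)$/\1done\2/" "$file" > "$tmp" \
    && cat "$tmp" > "$file"
  local rc=$?
  rm -f "$tmp"
  return "$rc"
}
```

With the paths from the steps above, a call would look like `mark_story_done "8-2-story-name" _bmad-output/implementation-artifacts/sprint-status.yaml`. Reviewing the resulting diff before resuming orchestration is still advisable.
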
### Debugging Agent Detection

```bash
# Check current agent type detection
"$scripts" tmux-wrapper agent-type

# Check what CLI command would be used
"$scripts" tmux-wrapper agent-cli

# Check what command prefix would be used
"$scripts" tmux-wrapper skill-prefix

# View session environment
tmux show-environment -t "session-name"

# Check story key normalization (v1.4.0)
"$scripts" orchestrator-helper normalize-key "8.2"
"$scripts" orchestrator-helper normalize-key "8-2-flipside-crypto-provider"
```