sar/.claude/skills/bmad-story-automator/data/agent-fallback-troubleshooting.md
julian 17c08e6392 chore: initial monorepo scaffold + WDS Phase 1+2 artifacts
- Nx 22.7 monorepo (pnpm 11.1, TypeScript 5.9, Node 24)
- apps/api: NestJS 11 (CJS per CODING-RULES.md PGD-DB-004)
- apps/web: React 19 + Vite 8 (ESM)
- libs/shared/api-interface: Zod contract base
- Docker Compose dev: Postgres 18, Valkey 8, MinIO, Mailpit
- WDS artifacts:
  - design-artifacts/A-Product-Brief/ (5 canonical docs + 16 dialogs)
  - design-artifacts/B-Trigger-Map/ (hub + 4 personas + feature impact)
- Stack canon: STACK.md v2.2 + CODING-RULES.md v2.0 + brand.md
- AGENTS.md + README.md as the entry point for devs/agents

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 14:34:20 +00:00


Agent Fallback Troubleshooting

Issue: Session spawns Claude instead of Codex

Symptoms:

  • Output shows Claude-specific messages (e.g., "You've used 84% of your weekly limit")
  • Expected Codex but got Claude

Cause: The --agent flag was passed to build-cmd. It must be passed to story-automator tmux-wrapper spawn instead.

Correct Usage (v1.4.0+):

# Method 1: Use --agent flag on spawn (RECOMMENDED)
session=$("$scripts" tmux-wrapper spawn dev "$epic" "$story_id" \
  --agent codex \
  --command "$("$scripts" tmux-wrapper build-cmd dev "$story_id")")

# Method 2: Set environment variable before spawn
export AI_AGENT="codex"
session=$("$scripts" tmux-wrapper spawn dev "$epic" "$story_id" \
  --command "$("$scripts" tmux-wrapper build-cmd dev "$story_id")")

Wrong Usage:

# WRONG: --agent on build-cmd does not select the agent for the spawned session
session=$("$scripts" tmux-wrapper spawn dev "$epic" "$story_id" \
  --command "$("$scripts" tmux-wrapper build-cmd dev "$story_id" --agent codex)")
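Why placement matters: spawn only receives the text that build-cmd prints, as the single value of --command, so an --agent given to build-cmd never shows up in spawn's own argument list. The sketch below is a hypothetical parser, not story-automator's code; the function name and the precedence (--agent flag, then AI_AGENT, then a claude default) are assumptions chosen to match the two methods above.

```shell
# Hypothetical sketch of spawn-side agent selection; not the real tmux-wrapper.
# Assumed precedence: --agent flag, then $AI_AGENT, then a "claude" default.
resolve_agent() {
    local agent="${AI_AGENT:-claude}"
    while [ $# -gt 0 ]; do
        case "$1" in
            --agent)
                [ $# -ge 2 ] || { echo "--agent needs a value" >&2; return 2; }
                agent="$2"
                shift 2
                ;;
            *)
                # Every other argument is skipped, including the single
                # string passed to --command (build-cmd's output).
                shift
                ;;
        esac
    done
    printf '%s\n' "$agent"
}
```

With AI_AGENT unset, resolve_agent --command "build-cmd dev 8.2 --agent codex" still selects claude: the nested flag travels inside the --command value and is never parsed as an option.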

Issue: Monitor reports "stuck" but Codex is active

Symptoms:

  • story-automator monitor-session returns stuck state after 4 polls
  • Manual inspection shows Codex still producing output (no prompt, output continues to grow)

Cause: The monitoring script relied on marker detection instead of output freshness.

Fixed in v2.4.0:

  • Output freshness tracking (no marker reliance)
  • CODEX_OUTPUT_STALE_SECONDS controls how long Codex can be silent before "stuck"
  • Codex still gets a 6-poll grace period before being reported "stuck"
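A minimal sketch of freshness-based detection, to show the idea rather than the monitor's actual code. Only CODEX_OUTPUT_STALE_SECONDS comes from the tool; the 300-second fallback, hashing snapshots with cksum, the function and variable names, and the rule that both the stale window and the 6-poll grace must be exceeded are illustrative assumptions.

```shell
# Hypothetical sketch of freshness-based "stuck" detection, not the real
# monitor-session code. Only CODEX_OUTPUT_STALE_SECONDS comes from the tool;
# the 300 s fallback, the cksum hashing and the rule that BOTH limits must
# be exceeded are illustrative assumptions.
STALE_SECONDS="${CODEX_OUTPUT_STALE_SECONDS:-300}"
GRACE_POLLS=6

PREV_HASH=""     # hash of the previous pane snapshot
LAST_CHANGE=0    # epoch seconds when the output last changed
IDLE_POLLS=0     # consecutive polls with unchanged output
STATE=active

# is_stuck <seconds_since_last_change> <idle_polls>
is_stuck() {
    if [ "$1" -ge "$STALE_SECONDS" ] && [ "$2" -ge "$GRACE_POLLS" ]; then
        echo stuck
    else
        echo active
    fi
}

# poll_once <now_epoch_seconds> <pane_snapshot>
# Updates the globals above and sets STATE to "active" or "stuck".
poll_once() {
    local now="$1" snapshot="$2" hash
    hash=$(printf '%s' "$snapshot" | cksum)
    if [ "$hash" != "$PREV_HASH" ]; then
        # Output changed: the session is alive, so reset the silence clock
        PREV_HASH="$hash"
        LAST_CHANGE="$now"
        IDLE_POLLS=0
    else
        IDLE_POLLS=$((IDLE_POLLS + 1))
    fi
    STATE=$(is_stuck "$((now - LAST_CHANGE))" "$IDLE_POLLS")
}
```

In a live loop, each poll would call poll_once "$(date +%s)" "$(tmux capture-pane -t "session-name" -p)" and act on STATE. Passing the timestamp in, instead of reading the clock inside, keeps the logic testable.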

Verification:

# Check if session has AI_AGENT set
tmux show-environment -t "session-name" AI_AGENT

# Manual session status check
"$scripts" tmux-status-check "session-name" --project-root "$PWD"

Issue: log command error when using the --agent flag

Symptoms:

log: Unknown subcommand 'Codex agent detected - applying 1.5x timeout (90min)'

Cause: macOS ships a /usr/bin/log system command. If the log() bash function was not defined before its first use, bash resolved log to the system command instead.

Fixed in v1.4.0: The log() function is now defined before argument parsing in story-automator monitor-session.
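A minimal sketch of the ordering fix, assuming a script shaped like monitor-session. The timestamp format is illustrative, and the 60-minute base timeout is an assumption inferred from the "1.5x timeout (90min)" message (90 / 1.5).

```shell
# Define log() at the very top, before argument parsing or any other code
# that might call it. Shell functions are looked up before PATH, so once
# this exists, "log" no longer reaches macOS's /usr/bin/log.
log() {
    printf '[%s] %s\n' "$(date '+%H:%M:%S')" "$*" >&2
}

# Everything below, including argument parsing, can now call log safely.
# The 60-minute base is an assumption inferred from the 90-minute message.
base_timeout_min=60
codex_timeout_min=$(( base_timeout_min * 3 / 2 ))
log "Codex agent detected - applying 1.5x timeout (${codex_timeout_min}min)"
```

To confirm which log a script will use, command -v log prints log for a shell function and /usr/bin/log for the system command.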

Issue: Manual polling required as a workaround

If monitoring still fails, use this manual polling approach:

for i in {1..60}; do
    sleep 30
    # Check if session still exists
    if ! tmux has-session -t "session-name" 2>/dev/null; then
        echo "Session ended"
        break
    fi
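    # Check for a shell prompt (completion indicator). Skip blank rows first:
    # capture-pane also prints the empty lines below the cursor.
    last_line=$(tmux capture-pane -t "session-name" -p | grep -v '^[[:space:]]*$' | tail -n 1)
    # A trailing $ (bash), # (root) or % (zsh) suggests the agent has exited
    if printf '%s\n' "$last_line" | grep -qE '[$#%][[:space:]]*$'; then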
        echo "Session complete (shell prompt detected)"
        break
    fi
done
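The completion check inside the loop can be pulled into a small function that reads captured pane text on stdin, so the prompt pattern can be tried against sample text without a live tmux session. This is a sketch: the function name is hypothetical, and the $, # and % endings are assumptions about the shell's PS1. A line that happens to end in % (a progress percentage, for instance) also matches, so narrow the set if that causes false positives.

```shell
# Hypothetical helper: succeeds if captured pane text (on stdin) ends at a
# shell prompt. Blank rows are dropped first because capture-pane prints the
# empty lines below the cursor. The $, # and % prompt endings are assumptions
# about the shell's PS1.
pane_shows_prompt() {
    local last_line
    last_line=$(grep -v '^[[:space:]]*$' | tail -n 1)
    printf '%s\n' "$last_line" | grep -qE '[$#%][[:space:]]*$'
}
```

Inside the polling loop, the check then reads: if tmux capture-pane -t "session-name" -p | pane_shows_prompt; then echo "Session complete"; break; fi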

Issue: Codex sessions explore files but don't execute the full workflow (v1.4.0)

Symptoms:

  • Session output shows file exploration (sed, rg, cat commands)
  • No actual review findings or story updates
  • Sprint-status never changes from "review" to "done"
  • Session completes but workflow steps 1-5 weren't followed

Cause: Codex uses natural language prompts and may not follow structured workflow instructions as reliably as Claude.

Mitigation strategies:

  1. Use Claude for code-review by default - More reliable at following multi-step workflows
  2. Add explicit step markers - Tell Codex to output "STEP 1 COMPLETE", "STEP 2 COMPLETE" etc.
  3. Verify after session - Check story file Status field, not just sprint-status
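Step markers (strategy 2) can be checked mechanically once the session ends. A minimal sketch, assuming Codex was told to print lines of the form "STEP N COMPLETE"; the function name is hypothetical. It reads captured output on stdin and prints the highest completed step, or 0 if no marker appears.

```shell
# Hypothetical helper for strategy 2: reads session output on stdin and
# prints the highest N from lines containing "STEP N COMPLETE" (0 if none).
last_completed_step() {
    grep -oE 'STEP [0-9]+ COMPLETE' |
        awk '{ n = $2 + 0; if (n > max) max = n } END { print max + 0 }'
}
```

For a code-review run, compare the result with 5, the last workflow step: [ "$(tmux capture-pane -t "session-name" -p -S - | last_completed_step)" -ge 5 ]. The -S - option starts the capture at the beginning of the pane's scrollback, so markers that have scrolled off screen are still counted, within tmux's history limit.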

Recommended agent configuration for more predictable results:

agentConfig:
  defaultPrimary: "auto"
  defaultFallback: false  # Disable global fallback; opt in per task
  perTask:
    # create-story: Either agent works well
    create:
      primary: "claude"
    # dev-story: Either agent works, Codex may be faster for simple tasks
    dev:
      primary: "codex"
      fallback: "claude"
    # code-review: Claude recommended - more reliable at following workflow
    review:
      primary: "claude"
      fallback: false

Issue: Code-review doesn't update sprint-status.yaml

Symptoms:

  • Code-review session completes
  • Story file shows review was done (Dev Agent Record updated)
  • But sprint-status.yaml still shows "review" instead of "done"

Cause: Step 5 of the code-review workflow updates sprint-status, but the session may not reach step 5, or it may use the wrong story key format.

Verification (v1.4.0):

# Check story file status directly
"$scripts" orchestrator-helper story-file-status 8.2

# Compare with sprint-status
"$scripts" orchestrator-helper sprint-status get "8-2-flipside-crypto-provider"

# If the story file shows "done" but sprint-status doesn't, sync manually:
# Edit _bmad-output/implementation-artifacts/sprint-status.yaml and change "8-2-story-name: review" to "8-2-story-name: done"
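If you prefer a scripted sync to hand-editing, a sketch like the following rewrites one key. It assumes sprint-status.yaml stores each story as a single "key: status" line, possibly indented, as in the example above; the function name is hypothetical. It writes through a temporary file instead of sed -i, whose syntax differs between GNU and BSD/macOS sed, and copies the result back with cat so the original file keeps its permissions.

```shell
# Hypothetical helper: set_sprint_status <file> <story_key> <new_status>
# Assumes each story is one "key: status" line, optionally indented.
set_sprint_status() {
    local file="$1" key="$2" status="$3" tmp rc=0
    if ! grep -qE "^[[:space:]]*${key}:" "$file"; then
        echo "set_sprint_status: key '$key' not found in $file" >&2
        return 1
    fi
    tmp=$(mktemp) || return 1
    # Keep indentation and "key:" (group 1) and replace only the value.
    sed -E "s/^([[:space:]]*${key}:).*/\1 ${status}/" "$file" > "$tmp" &&
        cat "$tmp" > "$file" || rc=1
    rm -f "$tmp"
    return "$rc"
}
```

For the story in the verification example: set_sprint_status _bmad-output/implementation-artifacts/sprint-status.yaml 8-2-flipside-crypto-provider done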

When to manually intervene

Intervene immediately if:

  1. 5 code-review cycles with no progress - Agent likely stuck in a loop
  2. Story file shows "done" but sprint-status doesn't - Sync issue, manual fix is faster
  3. Tests passing but review keeps finding issues - May be false positives
  4. Codex sessions consistently incomplete - Switch to Claude for that workflow

Steps for manual intervention:

# 1. Check actual story status
"$scripts" orchestrator-helper story-file-status "$story_id"

# 2. Run the project's tests to verify code quality (use whichever suite applies)
go test ./src/...   # Go projects
npm test            # Node projects

# 3. If tests pass, manually update sprint-status
# Edit: _bmad-output/implementation-artifacts/sprint-status.yaml
# Change: "8-2-story-name: review" to "8-2-story-name: done"

# 4. Resume orchestration - it will see "done" and proceed to commit

Debugging Agent Detection

# Check current agent type detection
"$scripts" tmux-wrapper agent-type

# Check what CLI command would be used
"$scripts" tmux-wrapper agent-cli

# Check what command prefix would be used
"$scripts" tmux-wrapper skill-prefix

# View session environment
tmux show-environment -t "session-name"

# Check story key normalization (v1.4.0)
"$scripts" orchestrator-helper normalize-key "8.2"
"$scripts" orchestrator-helper normalize-key "8-2-flipside-crypto-provider"
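For intuition about what normalize-key has to do, here is a hypothetical sketch, not orchestrator-helper's implementation. It turns a dotted id such as 8.2 into the 8-2 prefix and looks up the first matching key in sprint-status.yaml, so both inputs shown above resolve to the same full key. It assumes keys follow the <epic>-<story>-<slug> pattern used in this document.

```shell
# Hypothetical sketch of story-key normalization; not orchestrator-helper's code.
# normalize_key <sprint_status_file> <id_or_key>
# Assumes keys follow "<epic>-<story>-<slug>", e.g. 8-2-flipside-crypto-provider.
normalize_key() {
    local file="$1" prefix key
    prefix=$(printf '%s' "$2" | tr '.' '-')   # "8.2" -> "8-2"; full keys pass through
    # First key equal to the prefix or starting with "<prefix>-"; requiring the
    # hyphen keeps "8-2" from matching "8-20-...".
    key=$(grep -oE "^[[:space:]]*${prefix}(-[a-z0-9-]+)?:" "$file" |
        head -n 1 | sed -E 's/^[[:space:]]*//; s/:$//')
    [ -n "$key" ] || return 1
    printf '%s\n' "$key"
}
```

A non-zero exit status signals that no matching key exists, which is often the real cause when sprint-status is never updated.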