Files
sar/.claude/skills/bmad-story-automator/data/adaptive-retry.md
julian 17c08e6392 chore: initial monorepo scaffold + WDS Phase 1+2 artifacts
- Nx 22.7 monorepo (pnpm 11.1, TypeScript 5.9, Node 24)
- apps/api: NestJS 11 (CJS conforme CODING-RULES.md PGD-DB-004)
- apps/web: React 19 + Vite 8 (ESM)
- libs/shared/api-interface: Zod contract base
- Docker Compose dev: Postgres 18, Valkey 8, MinIO, Mailpit
- WDS artifacts:
  - design-artifacts/A-Product-Brief/ (5 docs canônicos + 16 dialogs)
  - design-artifacts/B-Trigger-Map/ (hub + 4 personas + feature impact)
- Stack canon: STACK.md v2.2 + CODING-RULES.md v2.0 + brand.md
- AGENTS.md + README.md como entrada para devs/agentes

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 14:34:20 +00:00

3.6 KiB

Adaptive Retry Strategy

Purpose: Handle dev-story failures intelligently based on progress patterns and agent switching.

Version: 2.0.0

See also: retry-fallback-strategy.md for the universal retry/fallback pattern.


Agent Alternation

This strategy works WITH the retry-fallback pattern:

  • Odd attempts (1, 3, 5): Use primary agent
  • Even attempts (2, 4): Use fallback agent (if configured)
  • Plateau detection applies ACROSS agents (same task across both agents = complexity issue)

Progress Tracking

Track failure patterns across retries (per agent):

attempt_1_progress = {agent: primary, tasks: 5/9}
attempt_2_progress = {agent: fallback, tasks: 4/9}
attempt_3_progress = {agent: primary, tasks: 5/9}  # same as attempt 1
attempt_4_progress = {agent: fallback, tasks: 5/9} # plateau detected
attempt_5_progress = {agent: primary, tasks: 5/9}  # confirmed plateau

Decision Logic

Attempt Condition Action
1 FAILURE Switch to fallback agent, retry
2 FAILURE, progress > attempt_1 Switch back to primary, retry with 2x poll interval
2 FAILURE, progress ≤ attempt_1 Switch back to primary, analyze if same plateau point
3 FAILURE, plateau at same task (any agent) Continue to attempt 4 (confirm with other agent)
4 FAILURE, plateau confirmed across agents DEFER story (complexity/context limit hit)
4 FAILURE, variable progress One more retry with extended timeout
5 FAILURE, plateau confirmed DEFER story
5 FAILURE, zero progress all attempts ESCALATE (likely API/connection issue)
5 FAILURE, variable but incomplete ESCALATE (all retries exhausted)

Plateau Detection

If tasks_completed is identical across 2+ attempts AND the session crashed/stopped at the same task, this indicates a complexity or context limit.

Indicators:

  • Same task number across multiple attempts
  • Session crashes at same point
  • No progress despite retries

Action: Mark story as "deferred" and continue with next story.


DEFER Action

When a story is deferred (not failed):

  1. Update state: Mark story as "deferred" in progress table
  2. Log: "Story {N} deferred - dev-story hit complexity limit at {tasks_completed}/{tasks_total}"
  3. Continue: Proceed to next story (do not escalate to user unless custom instructions say otherwise)

Why defer vs fail?

  • Deferred stories can be revisited manually
  • Doesn't block automation of remaining stories
  • Distinguishes from actual errors (API failures, etc.)

Integration with Crash Recovery

Adaptive retry works WITH crash recovery AND agent fallback:

Type Trigger Handling
Adaptive Retry Session completed but FAILED (wrong output, tests failed) Progress-based retry with agent alternation
Crash Recovery Session DIED unexpectedly (context limit, API error, kill) Switch agent, retry with new session
Agent Fallback Primary agent fails Automatic switch to fallback agent on next attempt

All three mechanisms work together:

  1. Primary crashes → switch to fallback, new session
  2. Fallback fails at task 5 → switch to primary, retry
  3. Primary fails at task 5 → plateau detected across agents → DEFER

Single attempt counter across all failure types.


Network Error Handling

On network-related failures (see retry-fallback-strategy.md):

  • Sleep 60 seconds before next attempt
  • Network errors do NOT count toward plateau detection
  • Always retry after network error (up to max attempts)