# Adaptive Retry Strategy

**Purpose:** Handle dev-story failures intelligently based on progress patterns and agent switching.

**Version:** 2.0.0

**See also:** `retry-fallback-strategy.md` for the universal retry/fallback pattern.

---

## Agent Alternation

This strategy works WITH the retry-fallback pattern:
- Odd attempts (1, 3, 5): Use primary agent
- Even attempts (2, 4): Use fallback agent (if configured)
- Plateau detection applies ACROSS agents (same task across both agents = complexity issue)

---

## Progress Tracking

Track failure patterns across retries (per agent):

```
attempt_1_progress = {agent: primary, tasks: 5/9}
attempt_2_progress = {agent: fallback, tasks: 4/9}
attempt_3_progress = {agent: primary, tasks: 5/9}  # same as attempt 1
attempt_4_progress = {agent: fallback, tasks: 5/9} # plateau detected
attempt_5_progress = {agent: primary, tasks: 5/9}  # confirmed plateau
```

---

## Decision Logic

| Attempt | Condition | Action |
|---------|-----------|--------|
| 1 | FAILURE | Switch to fallback agent, retry |
| 2 | FAILURE, progress > attempt_1 | Switch back to primary, retry with 2x poll interval |
| 2 | FAILURE, progress ≤ attempt_1 | Switch back to primary, analyze if same plateau point |
| 3 | FAILURE, plateau at same task (any agent) | Continue to attempt 4 (confirm with other agent) |
| 4 | FAILURE, plateau confirmed across agents | **DEFER** story (complexity/context limit hit) |
| 4 | FAILURE, variable progress | One more retry with extended timeout |
| 5 | FAILURE, plateau confirmed | **DEFER** story |
| 5 | FAILURE, zero progress all attempts | **ESCALATE** (likely API/connection issue) |
| 5 | FAILURE, variable but incomplete | **ESCALATE** (all retries exhausted) |

---

## Plateau Detection

If `tasks_completed` is identical across 2+ attempts AND the session crashed/stopped at the same task, this indicates a complexity or context limit.

**Indicators:**
- Same task number across multiple attempts
- Session crashes at same point
- No progress despite retries

**Action:** Mark story as "deferred" and continue with next story.

---

## DEFER Action

When a story is deferred (not failed):

1. **Update state:** Mark story as "deferred" in progress table
2. **Log:** "Story {N} deferred - dev-story hit complexity limit at {tasks_completed}/{tasks_total}"
3. **Continue:** Proceed to next story (do not escalate to user unless custom instructions say otherwise)

**Why defer vs fail?**
- Deferred stories can be revisited manually
- Doesn't block automation of remaining stories
- Distinguishes from actual errors (API failures, etc.)

---

## Integration with Crash Recovery

Adaptive retry works WITH crash recovery AND agent fallback:

| Type | Trigger | Handling |
|------|---------|----------|
| **Adaptive Retry** | Session completed but FAILED (wrong output, tests failed) | Progress-based retry with agent alternation |
| **Crash Recovery** | Session DIED unexpectedly (context limit, API error, kill) | Switch agent, retry with new session |
| **Agent Fallback** | Primary agent fails | Automatic switch to fallback agent on next attempt |

All three mechanisms work together:
1. Primary crashes → switch to fallback, new session
2. Fallback fails at task 5 → switch to primary, retry
3. Primary fails at task 5 → plateau detected across agents → DEFER

**Single attempt counter across all failure types.**

---

## Network Error Handling

On network-related failures (see `retry-fallback-strategy.md`):
- Sleep 60 seconds before next attempt
- Network errors do NOT count toward plateau detection
- Always retry after network error (up to max attempts)