A retry storm occurs when an agent responds to 429 rate-limit errors by immediately retrying, creating a feedback loop that amplifies load instead of recovering. Left unchecked, this can lock your agent out of the API for extended periods and degrade service for other consumers on shared infrastructure.
Typical symptoms:

- X-RateLimit-Remaining: 0 for extended periods
- The Retry-After header value keeps increasing with each rejected request

Retry storms typically start when an agent lacks proper backoff logic and retries immediately on any failure. Contributing factors include: no maximum retry cap, ignoring the Retry-After header, multiple concurrent operations all retrying independently, and no circuit breaker to halt calls after repeated failures.
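The last factor above, a missing circuit breaker, can be addressed with a small state machine that stops issuing calls after repeated failures and only reopens after a cooldown. This is a minimal sketch; the threshold and cooldown values are illustrative assumptions, not prescribed settings:

```python
import time

class CircuitBreaker:
    """Halt calls after repeated failures (threshold/cooldown are assumed values)."""

    def __init__(self, failure_threshold=5, cooldown=30.0):
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None  # timestamp when the circuit opened

    def allow(self):
        # Closed circuit: calls pass through.
        if self.opened_at is None:
            return True
        # Open circuit: block calls until the cooldown has elapsed.
        if time.monotonic() - self.opened_at >= self.cooldown:
            self.opened_at = None  # half-open: let one attempt through
            self.failures = 0
            return True
        return False

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = time.monotonic()

    def record_success(self):
        self.failures = 0
        self.opened_at = None
```

Before each request, check `allow()`; skip the call entirely while it returns False. This caps the damage a single misbehaving loop can do, independent of the backoff logic.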
The Retry-After header gives the wait time in seconds; do not send any request until this period has elapsed.

Python retry logic with exponential backoff and Retry-After header support:

```python
import time, random, requests

def call_with_backoff(url, payload, max_retries=4):
    delay = 1.0
    for attempt in range(max_retries):
        resp = requests.post(url, json=payload)
        if resp.status_code != 429:
            return resp
        # Respect the Retry-After header if present
        retry_after = resp.headers.get("Retry-After")
        if retry_after:
            delay = float(retry_after)
        else:
            delay = min(delay * 2, 60)  # exponential, capped at 60s
        # Add jitter so concurrent clients don't retry in lockstep
        jitter = delay * random.uniform(0, 0.5)
        print(f"429 received. Waiting {delay + jitter:.1f}s...")
        time.sleep(delay + jitter)
    raise Exception("Max retries exceeded — circuit open")
```

You can also check rate-limit headers proactively before hitting the limit:
```shell
# Check remaining quota before sending a request
curl -s -o /dev/null -w "%{http_code}" \
  -D - https://api.delx.ai/v1/a2a \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"message/send","params":{"message":{"role":"user","parts":[{"type":"text","text":"ping"}]}},"id":1}' \
  | grep -i "x-ratelimit"

# Example response headers:
# X-RateLimit-Limit: 60
# X-RateLimit-Remaining: 45
# Retry-After: 0
```

Track X-RateLimit-Remaining and Retry-After from every response, and throttle proactively when the remaining quota is low, before you get a 429.

After implementing backoff, track 429 frequency and response latency percentiles over one full operational cycle (24 hours minimum). Confirm that: (1) 429 responses are followed by appropriate delays, not immediate retries, (2) the agent eventually recovers and resumes normal operation, and (3) CPU/memory usage remains stable during rate-limited periods. If 429s still cluster, increase your base delay or reduce concurrency further.
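The proactive throttling described above can be folded into client code: read X-RateLimit-Remaining from each response's headers and pause before the next call when quota runs low. A minimal sketch, where the threshold and pause length are assumptions to tune for your actual limit window:

```python
import time

LOW_QUOTA_THRESHOLD = 5  # assumed threshold; tune for your rate limit
PAUSE_SECONDS = 2.0      # assumed pause length; tune for your window

def throttle_from_headers(headers, sleep=time.sleep):
    """Pause proactively when the server reports low remaining quota.

    Returns True if a pause was taken, False otherwise. The `sleep`
    parameter is injectable so the function is easy to test.
    """
    remaining = headers.get("X-RateLimit-Remaining")
    if remaining is not None and int(remaining) <= LOW_QUOTA_THRESHOLD:
        sleep(PAUSE_SECONDS)
        return True
    return False
```

Calling this after every response keeps the agent below the limit in steady state, so the exponential-backoff path becomes a rarely exercised safety net rather than the primary flow-control mechanism.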