Social Media API Rate Limits: Backoff, Jitter, and Queueing Across Platforms

Erwan Prost

Social media API rate limits are caps each platform places on how many calls your application can make in a given window, enforced by returning HTTP 429 (and sometimes other 4xx or 5xx codes) once you cross them. Handling them well is not about knowing every platform's exact number. It is about building one ingestion path that treats limits as expected, not exceptional, using four primitives:

  • Respect the 429 status and the Retry-After header.
  • Back off exponentially with randomized jitter so retries do not synchronize.
  • Queue writes so a burst becomes a steady stream.
  • Prefer webhooks over polling so you spend almost no quota when nothing is happening.

This guide covers the strategy and architecture of rate-limit handling across Instagram, Facebook, Threads, TikTok, LinkedIn, X/Twitter, and YouTube, with working code. It does not reproduce per-endpoint reference numbers; those live in each platform's docs and drift constantly.

What are social media API rate limits and why do naive integrations break on them?

A rate limit is a ceiling on request volume per time window that a platform enforces by rejecting calls once you exceed it, almost always with HTTP 429 Too Many Requests and frequently a Retry-After header telling you how long to wait. The Meta Graph API, which powers Instagram, Facebook, and Threads, scales its application rate limit as roughly 200 calls per hour multiplied by the app's number of daily active users, an app-wide ceiling rather than a per-user budget (Meta, 2026). That single detail is why naive integrations break.

Naive integrations break because they treat the limit as a per-user allowance and the 429 as an error. Two failure modes follow. First, an app-wide ceiling means one noisy account, or one tight polling loop iterating every connected account, can exhaust the shared budget and starve every other account behind the same app. Second, code that catches a 429, logs it, and retries immediately turns one rejected call into a retry storm that keeps the app pinned against the ceiling. The platform is not telling you something went wrong. It is telling you to slow down, and a correct client slows down.
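To make the shared-budget problem concrete, the arithmetic can be sketched as follows. This is a rough illustration, not Meta's API: the 200 calls/hour multiplier is the approximate figure quoted above, and the function names and sample numbers (50 daily active users, 400 connected accounts) are assumptions made up for the example.

```javascript
// Rough budget arithmetic for an app-wide ceiling.
// ASSUMPTION: ~200 calls/hour per daily active user, as quoted above;
// check the platform's own docs for the live figure.
const CALLS_PER_HOUR_PER_DAU = 200;

function appCeilingPerHour(dailyActiveUsers) {
  return CALLS_PER_HOUR_PER_DAU * dailyActiveUsers;
}

// Calls per hour spent by a loop that polls every account on a fixed
// interval, at callsPerPoll requests per account per pass.
function pollingCallsPerHour(accounts, intervalSeconds, callsPerPoll = 1) {
  return accounts * callsPerPoll * (3600 / intervalSeconds);
}

// Hypothetical app: 50 daily active users, 400 connected accounts,
// each polled every 60 seconds with 2 calls per pass.
const ceiling = appCeilingPerHour(50);            // 10000
const polling = pollingCallsPerHour(400, 60, 2);  // 48000
console.log({ ceiling, polling, overBudget: polling > ceiling });
```

In this hypothetical, the polling loop alone asks for almost five times the shared hourly budget before a single user-facing call is made.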

There is a quieter third failure mode: the thundering herd. If a hundred queued jobs all hit a 429 at the same instant and all retry after exactly the same fixed delay, they collide again in lockstep, and again, indefinitely. Fixed-delay retries do not converge under contention. The fix for all three is the same small set of primitives, covered below, applied once at the ingestion layer rather than scattered across call sites.
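The lockstep effect is easy to see in a toy simulation. The sketch below is illustrative only: it assumes 100 clients rejected at the same instant, one retry each, and counts how many retries land in the busiest 100 ms slice. The names and numbers are made up for the example.

```javascript
// Compare how retries cluster with a fixed delay versus full jitter.
// Purely illustrative: 100 clients all rejected at t = 0, one retry each.
function peakRetriesPerBucket(delaysMs, bucketMs = 100) {
  const buckets = new Map();
  for (const d of delaysMs) {
    const key = Math.floor(d / bucketMs);
    buckets.set(key, (buckets.get(key) ?? 0) + 1);
  }
  return Math.max(...buckets.values());
}

const CLIENTS = 100;
const DELAY_MS = 2000;

// Fixed delay: every client waits exactly DELAY_MS.
const fixed = Array.from({ length: CLIENTS }, () => DELAY_MS);

// Full jitter: every client waits a random time in [0, DELAY_MS).
const jittered = Array.from({ length: CLIENTS }, () => Math.random() * DELAY_MS);

console.log("fixed delay, peak per 100 ms:", peakRetriesPerBucket(fixed));
console.log("full jitter, peak per 100 ms:", peakRetriesPerBucket(jittered));
```

With a fixed delay all 100 retries land in the same slice. With full jitter the figure varies per run, but the same 100 retries are spread over 20 slices, so the average load per slice is about 5.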

What do per-platform rate limits look like in 2026?

The only precise figure worth committing to is the app-wide Meta ceiling described above. For every other platform the exact numbers move between API versions and access tiers, so the durable thing to internalize is the shape, not the digits. Most platforms enforce a sliding hourly or daily window and signal exhaustion with HTTP 429 plus, often, a Retry-After header or rate-limit headers.

Read the table below as a contract surface, not a quota sheet. The cells describe the behavior your client must handle (window style, the rejection signal, whether a wait hint is provided), deliberately without invented exact limits, because a number copied into a blog post is wrong within a quarter and a fabricated one is wrong immediately. When you need the live figure for a specific endpoint, the platform's own rate-limit documentation is the source of truth, and a unified layer should pass the underlying signal through rather than hide it.

| Platform | Limit window (general shape) | Exhaustion signal | Wait hint |
| --- | --- | --- | --- |
| Instagram / Facebook / Threads (Meta Graph API) | App-wide ceiling: ~200 calls/hour x daily active users (app-level, not per-user) | HTTP 4xx with an X-App-Usage / throttling payload | Usage headers indicate proximity to the ceiling |
| X / Twitter | Sliding window per endpoint and access tier | HTTP 429 Too Many Requests | Retry-After / rate-limit reset header commonly present |
| TikTok | Per-endpoint window, varies by API surface | HTTP 429 (or error code in body) | Retry-After / documented cooldown |
| LinkedIn | Daily and/or per-application throttle by endpoint | HTTP 429 Too Many Requests | Documented daily reset; header varies by endpoint |
| YouTube (Data API) | Daily quota of unit costs per project (operations cost different amounts) | HTTP 403 quotaExceeded / rateLimitExceeded | Quota resets on a fixed daily schedule |

The takeaway from the table is that the signals differ enough that per-call-site handling is a maintenance trap, but they overlap enough that one normalized handler can cover all of them: detect rejection, find a wait hint if one exists, otherwise compute a backoff, and retry within a bounded budget. That is the next section.
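The "detect rejection, find a wait hint" half of that handler might look like the sketch below. Treat it as an assumption-laden illustration rather than any platform's contract: the function names, the plain-object input shape, the body patterns, and the 90% threshold are all hypothetical, and the usage header is assumed to carry a JSON object of percentages. Confirm each platform's actual headers and error codes in its documentation.

```javascript
// Normalize per-platform throttling signals into one shape.
// ASSUMPTION: input shape, body patterns, and the 90% threshold are
// illustrative; verify real headers and error codes per platform.

// Retry-After may be delta-seconds or an HTTP-date; returns ms or null.
function parseRetryAfter(value, now = Date.now()) {
  if (value == null || value === "") return null;
  const secs = Number(value);
  if (Number.isFinite(secs)) return Math.max(0, secs * 1000);
  const when = Date.parse(value);
  return Number.isNaN(when) ? null : Math.max(0, when - now);
}

// Usage header assumed to be JSON percentages, e.g. {"call_count": 95}.
function isNearCeiling(usageHeader, thresholdPct = 90) {
  if (!usageHeader) return false;
  try {
    const values = Object.values(JSON.parse(usageHeader));
    return values.some((pct) => Number(pct) >= thresholdPct);
  } catch {
    return false; // malformed header: ignore rather than fail the request path
  }
}

// `headers` is a plain object with lowercase keys; `body` is the response text.
function classifyThrottle({ status, headers = {}, body = "" }, now = Date.now()) {
  const quota403 =
    status === 403 && /quotaExceeded|rateLimitExceeded/i.test(body);
  const throttled = status === 429 || quota403;
  return {
    throttled,
    waitMs: throttled ? parseRetryAfter(headers["retry-after"], now) : null,
    nearCeiling: isNearCeiling(headers["x-app-usage"]),
  };
}
```

A caller would retry when `throttled` is true, sleeping for `waitMs` when it is not null and falling back to a computed backoff otherwise. `nearCeiling` is a soft signal that can be used to slow the queue before any rejection arrives.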

How do exponential backoff, jitter, and Retry-After fit together?

These three are a precedence chain, not three options. When a platform returns 429 with a Retry-After header, that value is authoritative: wait at least that long, because the server has told you when it will accept you again. Only when there is no Retry-After do you compute your own delay, and that delay should grow exponentially per attempt (roughly doubling) and carry randomized jitter so concurrent clients do not retry in lockstep. SocialAPI.ai's own webhook delivery follows the same escalating shape: 5 attempts at immediate, ~30 seconds, ~5 minutes, ~30 minutes, and ~3 hours, then the delivery is marked failed.

Figure: Exponential backoff with jitter (SocialAPI.ai retry schedule). Timeline from the initial 429 through attempt 1 (immediate), attempt 2 (~30 seconds), attempt 3 (~5 minutes), attempt 4 (~30 minutes), and attempt 5 (~3 hours), after which the delivery is marked failed.

Jitter is the part most implementations skip and most outages trace back to. Without it, every client that failed at the same moment retries at the same moment, so the recovery traffic is as spiky as the failure. Full jitter (pick a random delay between zero and the computed backoff ceiling) spreads the recovery load smoothly and converges fastest under contention. The helper below encodes the full precedence: honor Retry-After first, otherwise exponential backoff with full jitter, with a hard attempt cap so a persistently failing call eventually surfaces as an error instead of looping forever.

```javascript
// Retry-aware fetch for social platform APIs.
// Precedence: Retry-After header > exponential backoff + full jitter.
// Hard attempt cap so a persistent failure surfaces, not loops.

const MAX_ATTEMPTS = 5;
const BASE_MS = 1000;   // first backoff ceiling: ~1s
const CAP_MS = 60000;   // never wait more than 60s between attempts

function sleep(ms) {
  return new Promise((r) => setTimeout(r, ms));
}

// Full jitter: random point in [0, min(cap, base * 2^attempt)].
function backoffWithJitter(attempt) {
  const ceiling = Math.min(CAP_MS, BASE_MS * 2 ** attempt);
  return Math.random() * ceiling;
}

function retryAfterMs(res) {
  const h = res.headers.get("retry-after");
  if (!h) return null;
  const secs = Number(h);
  if (!Number.isNaN(secs)) return secs * 1000;      // delta-seconds form
  const when = Date.parse(h);                       // HTTP-date form
  return Number.isNaN(when) ? null : Math.max(0, when - Date.now());
}

async function fetchWithRetry(url, init = {}) {
  for (let attempt = 0; attempt < MAX_ATTEMPTS; attempt++) {
    const res = await fetch(url, init);

    // 429, and some platforms signal throttling via 403/4xx.
    const throttled =
      res.status === 429 ||
      (res.status === 403 && /quota|rate/i.test(await res.clone().text()));

    if (!throttled) return res;

    if (attempt === MAX_ATTEMPTS - 1) {
      throw new Error(`Rate limited after ${MAX_ATTEMPTS} attempts: ${url}`);
    }

    // Server's hint wins; otherwise back off with jitter.
    const wait = retryAfterMs(res) ?? backoffWithJitter(attempt);
    await sleep(wait);
  }
}
```

Two correctness details in that helper are easy to get wrong. Retry-After has two legal forms, an integer number of seconds and an HTTP-date, and a parser that handles only the integer form silently mis-waits on platforms that send the date form. And the jitter must be applied to the delay itself, not added on top of a fixed delay; full jitter means the random draw is the whole wait, which is what actually decorrelates concurrent retries.

How do you design for limits with queueing, caching, and webhooks?

Backoff is the reactive half of rate-limit handling; the proactive half is never generating the excess call in the first place. Three structural patterns do that work: queue writes behind a rate limiter so a burst is smoothed into a steady stream below the ceiling, cache reads so repeat lookups do not each cost a call, and prefer webhooks to polling so you spend quota proportional to real events instead of proportional to the clock. A webhook costs zero quota when the inbox is empty; a poll costs the same whether or not anything happened.

Queue every write. Replies, moderation actions, and DM sends should not call the platform inline from a request handler. Push them onto a queue drained by a worker that holds a token-bucket or leaky-bucket limiter sized below the platform ceiling, so a thousand queued replies become a controlled trickle the API accepts rather than a thousand simultaneous calls it rejects. The 429 handler from the previous section still wraps each individual send as a safety net, but the limiter means it almost never has to fire. This is also where idempotency matters: a retried job must dedupe on a stable id so a backoff-driven retry never double-posts.
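A write queue of that kind can be sketched in a few dozen lines. The version below is a minimal in-memory illustration under stated assumptions: the class name, the job shape (`{ id, send }`), and the rate and capacity values are hypothetical, and a production queue would persist both the jobs and the set of sent ids.

```javascript
// Minimal token bucket plus an idempotency check for queued writes.
// ASSUMPTION: names, rates, and the job shape ({ id, send }) are illustrative.
class TokenBucket {
  constructor({ capacity, refillPerSecond, now = () => Date.now() }) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.now = now;
    this.tokens = capacity;
    this.last = now();
  }

  // Top up tokens for the elapsed time, never above capacity.
  refill() {
    const t = this.now();
    const elapsedSec = (t - this.last) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSec * this.refillPerSecond
    );
    this.last = t;
  }

  // Take one token if available; returns false when the caller must wait.
  tryTake() {
    this.refill();
    if (this.tokens < 1) return false;
    this.tokens -= 1;
    return true;
  }

  // Milliseconds until one token will be available.
  msUntilNextToken() {
    this.refill();
    if (this.tokens >= 1) return 0;
    return Math.ceil(((1 - this.tokens) / this.refillPerSecond) * 1000);
  }
}

const pause = (ms) => new Promise((r) => setTimeout(r, ms));

// Drain jobs in order, waiting for tokens and skipping ids already sent.
async function drain(jobs, bucket, sentIds = new Set()) {
  for (const job of jobs) {
    if (sentIds.has(job.id)) continue; // retried job: never double-post
    while (!bucket.tryTake()) {
      await pause(bucket.msUntilNextToken());
    }
    await job.send(); // in practice, wrap this in the 429-aware retry helper
    sentIds.add(job.id);
  }
  return sentIds;
}
```

Sizing is the design choice that matters: `refillPerSecond` sets the sustained rate and should sit below the platform ceiling, while `capacity` bounds how large a burst is allowed through before the steady rate takes over.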

Cache reads with a short TTL keyed on the resource. Post metrics, account metadata, and thread lookups are read far more often than they change; a 30-to-120-second cache collapses N reads into one upstream call without users noticing staleness on data that updates slowly anyway. The single highest-leverage move, though, is replacing read polling with push. Polling every connected account every minute to catch new comments is the canonical way to burn an app-wide ceiling on empty inboxes; a webhook fires only when something actually arrives. The full webhook-over-poll argument, including the trade-offs, is in the social listening webhooks playbook, and the inbound moderation path that consumes those events is in the comment moderation API guide.
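The read cache described above can be sketched as a small TTL map that also coalesces concurrent lookups for the same key, so a burst of identical reads still costs one upstream call. This is an illustration under assumptions: the class name, the 60-second default, and the loader signature are hypothetical, and the cache is in-memory and unbounded.

```javascript
// Short-TTL read cache that coalesces concurrent lookups for the same key.
// ASSUMPTION: the 60 s default and the loader signature are illustrative.
class TtlCache {
  constructor({ ttlMs = 60_000, now = () => Date.now() } = {}) {
    this.ttlMs = ttlMs;
    this.now = now;
    this.entries = new Map(); // key -> { expiresAt, promise }
  }

  // Return the cached promise if still fresh; otherwise call loader() once.
  get(key, loader) {
    const hit = this.entries.get(key);
    if (hit && hit.expiresAt > this.now()) return hit.promise;

    const promise = new Promise((resolve) => resolve(loader())).catch((err) => {
      // Do not cache failures: drop the entry so the next read retries.
      if (this.entries.get(key)?.promise === promise) this.entries.delete(key);
      throw err;
    });
    this.entries.set(key, { expiresAt: this.now() + this.ttlMs, promise });
    return promise;
  }
}
```

A call site would look something like `cache.get("metrics:" + postId, () => fetchWithRetry(url).then((r) => r.json()))`, where `postId` and `url` are placeholders. Caching the parsed value rather than the response object matters, because a response body can only be read once.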

Why does a unified API absorb per-platform limit complexity?

A unified API does not raise any platform's ceiling (no abstraction can), but it relocates where the rate-limit logic lives: from your code, repeated per platform, down into one layer that handles accounting, backoff, and the 429-to-error contract once. Because the app-wide Meta ceiling described above is shared across every connected account rather than budgeted per user, the layer that owns the platform app is the only layer that can budget that shared ceiling correctly. That is the structural reason this complexity is better absorbed than re-implemented.

Without a unified layer, you write the precedence chain (Retry-After, then exponential backoff, then jitter, then attempt cap) once per platform, plus a different rejection-signal parser for each (429 here, a 403 quota error there, a usage-header payload elsewhere), plus separate token-bucket accounting per app. With a unified social inbox API, that logic is written once: you call one set of endpoints, the layer beneath you maps each platform's native throttling signal into one predictable response, and the queueing and backoff are applied below your code. Your retry policy stops being eight policies that drift independently and becomes one.

SocialAPI.ai applies exactly the backoff philosophy this guide argues for. Failed webhook deliveries retry on a bounded, escalating schedule (5 attempts at immediate, ~30 seconds, ~5 minutes, ~30 minutes, ~3 hours) and then surface as a failed delivery you can inspect and replay, rather than retrying forever or dropping silently. Plan tiers govern call volume by resource rather than by raw request count; the per-tier breakdown is on the rate limits by plan page, and the exact endpoint-level behavior, including which throttling signals pass through, is in the API reference.

Frequently asked questions

What is a social media API rate limit?
A social media API rate limit is a cap each platform sets on how many requests your application may make in a time window. Crossing it gets your calls rejected, almost always with HTTP 429 Too Many Requests and often a Retry-After header telling you how long to wait. The Meta Graph API, for example, scales its application rate limit as roughly 200 calls per hour multiplied by the app's daily active users, an app-wide ceiling rather than a per-user budget. The limit is a signal to slow down, not an error to retry immediately.
What should I do when an API returns HTTP 429?
First check for a Retry-After header. If present, wait at least that long, because the server has told you when it will accept you again, then retry. If there is no Retry-After, compute your own delay that grows exponentially per attempt (roughly doubling) and carries randomized jitter so concurrent clients do not retry in lockstep, and cap the total number of attempts so a persistent failure surfaces as an error instead of looping forever. Never catch a 429 and retry immediately; that turns one rejection into a retry storm.
Why does exponential backoff need jitter?
Without jitter, every client that failed at the same moment retries at the same moment, so the recovery traffic is as spiky as the failure that caused it (the thundering-herd problem). Fixed-delay retries under contention collide in lockstep indefinitely. Full jitter, picking a random delay between zero and the computed backoff ceiling, spreads recovery load smoothly and converges fastest. The randomness must be the whole wait, not an addition on top of a fixed delay, or it does not decorrelate concurrent retries.
Are social media API rate limits per user or per app?
It depends on the platform and you must not assume per-user. The Meta Graph API application rate limit is app-wide: roughly 200 calls per hour multiplied by the app's daily active users, an app-wide ceiling rather than a per-user budget (Meta, 2026). That means one noisy account or one tight polling loop can exhaust a shared budget and starve every other account behind the same app. Other platforms mix per-endpoint, per-application, and per-project quotas, so the durable rule is to read each platform's own rate-limit documentation and never infer the model from one platform.
How do I reduce how often I hit rate limits in the first place?
Generate fewer calls structurally. Queue write operations behind a token-bucket or leaky-bucket limiter sized below the platform ceiling so a burst becomes a steady stream. Cache reads that change slowly (metrics, metadata) with a short TTL so repeat lookups do not each cost a call. Most importantly, replace read polling with webhooks: a webhook costs zero quota when nothing happened, while polling costs the same whether or not there was anything new. Backoff handles the calls you still make; these patterns stop you making the unnecessary ones.
Does a unified social media API remove rate limits?
No. No abstraction can raise a platform's ceiling, because the limit is enforced by the platform, not by the layer in front of it. What a unified API does is relocate the handling: the backoff precedence, the per-platform rejection-signal parsing, and the shared-ceiling accounting are implemented once below your code instead of once per platform in it. You still operate within each platform's limits, but you write and maintain one retry and queueing policy instead of eight that drift apart over time.
What does SocialAPI.ai do when a delivery keeps failing?
Failed webhook deliveries are retried on a bounded, escalating schedule: 5 attempts at immediate, about 30 seconds, about 5 minutes, about 30 minutes, and about 3 hours. After the fifth attempt the delivery is marked failed rather than retried forever or dropped silently, and you can inspect and replay it through the deliveries API. This is the same exponential-backoff philosophy this guide recommends for handling platform rate limits in your own client code.

Rate-limit handling is not platform trivia you memorize; it is one small, well-understood set of primitives applied at the ingestion layer: honor Retry-After, back off exponentially with jitter, cap attempts, queue writes below the ceiling, cache slow-changing reads, and prefer webhooks to polling. The reason to centralize it, in your own code or behind a unified API, is that the Meta ceiling is app-wide and the per-platform signals diverge, so eight copies of this logic drift while one copy stays correct. Start from the unified social inbox API model, read the social listening webhooks playbook for the polling-replacement side, and check the SocialAPI.ai docs for the current endpoint-level behavior before you write the ingestion path.

Third-party statistics in this guide are linked inline so every number can be checked at its source. Primary source: Meta Graph API: rate limiting overview (app-level ceiling of ~200 calls/hour times daily active users, not a per-user budget)
