Rate Limiting

To ensure platform stability and fair resource allocation for all users, REALM AI enforces rate limits on API requests.

Understanding Limits

Scope: Limits are typically applied per API Key and may vary based on your subscription plan (e.g., Free, Pro, Enterprise).
Types: Different endpoints might have different limits based on their computational cost. For example, initiating a generation job might have a lower limit than fetching user profile data.
Time Window: Limits are usually defined over a specific time window (e.g., requests per minute, requests per hour).
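
To make the idea of a per-window limit concrete, here is a minimal client-side sketch that tracks how many requests were sent in the last window and reports how long to wait before sending another. The class name SlidingWindowLimiter and the example figures (2 requests per 60 seconds) are illustrative assumptions, not REALM AI's actual limits, and the server may count its windows differently (for example, as fixed windows), so treat this as a courtesy guard rather than a guarantee.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard allowing at most max_requests per window_seconds.

    Hypothetical helper for illustration; not part of any REALM AI SDK.
    """

    def __init__(self, max_requests, window_seconds, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock      # injectable so the limiter can be tested
        self.sent = deque()     # timestamps of requests still inside the window

    def seconds_until_allowed(self):
        """Return 0.0 if a request may be sent now, else the wait in seconds."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window_seconds:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            return 0.0
        # The oldest request leaves the window after this many seconds.
        return self.window_seconds - (now - self.sent[0])

    def record(self):
        """Call once for every request actually sent."""
        self.sent.append(self.clock())
```

A caller would check seconds_until_allowed() before each request, sleep for the returned duration if it is non-zero, and call record() after sending.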

Rate Limit Headers

Every API response includes the following headers to help you track your current status:

X-RateLimit-Limit: The maximum number of requests allowed in the current time window for the specific endpoint group.
X-RateLimit-Remaining: The number of requests remaining in the current time window.
X-RateLimit-Reset: The Unix timestamp (in seconds) indicating when the current rate limit window resets.
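
As a rough sketch of how a client might read these headers, the snippet below turns them into a small status object and computes the time left in the current window. The names RateLimitStatus, parse_rate_limit, and seconds_until_reset are hypothetical helpers introduced here. The code assumes the headers arrive as a mapping of strings; with a plain dict the lookup is case-sensitive, whereas HTTP client libraries usually provide a case-insensitive mapping.

```python
import time
from dataclasses import dataclass


@dataclass
class RateLimitStatus:
    limit: int       # X-RateLimit-Limit
    remaining: int   # X-RateLimit-Remaining
    reset_at: int    # X-RateLimit-Reset, Unix timestamp in seconds


def parse_rate_limit(headers):
    """Build a RateLimitStatus from a mapping of response headers.

    Returns None if any of the three headers is missing or not an integer.
    """
    try:
        return RateLimitStatus(
            limit=int(headers["X-RateLimit-Limit"]),
            remaining=int(headers["X-RateLimit-Remaining"]),
            reset_at=int(headers["X-RateLimit-Reset"]),
        )
    except (KeyError, TypeError, ValueError):
        return None


def seconds_until_reset(status, now=None):
    """Seconds left in the current window, never negative."""
    now = time.time() if now is None else now
    return max(0.0, status.reset_at - now)
```

One possible use: when remaining reaches 0, pause for seconds_until_reset(status) before the next call instead of waiting for a 429 response.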

Handling 429 Errors

If you exceed the rate limit for an endpoint, the API will return an HTTP 429 Too Many Requests status code.

When encountering a 429 error:

Stop Sending Requests: Immediately cease making requests to the endpoint that returned the error.
Check Reset Time: Inspect the X-RateLimit-Reset header to determine when you can resume sending requests.
Implement Backoff: Use an exponential backoff strategy for retrying requests. Start with a small delay (e.g., 1 second) and double it after each failed retry. Never wait less than the time remaining until the reset, and stop once the request succeeds or a maximum number of retries is reached.

Example Exponential Backoff (Conceptual):

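import time

def request_with_backoff(make_api_request, max_retries=5):
    """Call make_api_request(), retrying on HTTP 429 with exponential backoff.

    make_api_request is a placeholder for your own function. It must return
    a response object with status_code and headers attributes.
    """
    retry_delay = 1  # seconds
    for _ in range(max_retries):
        response = make_api_request()
        if response.status_code != 429:
            # Success, or an error that waiting will not fix: let the caller handle it.
            return response
        # Wait until the window resets, but never less than the current backoff delay.
        reset_time = int(response.headers.get('X-RateLimit-Reset', 0))
        wait_time = max(retry_delay, reset_time - time.time())
        print(f"Rate limit hit. Waiting for {wait_time:.2f} seconds.")
        time.sleep(wait_time)
        retry_delay *= 2  # Exponential increase
    raise RuntimeError(f"Still rate limited after {max_retries} retries")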

Optimizing Usage

Use Webhooks: Prefer webhooks over polling for asynchronous job updates to reduce unnecessary requests.
Batch Operations: Utilize batch endpoints (where available) to perform multiple actions in a single request.
Caching: Cache responses appropriately, especially for data that doesn't change frequently (e.g., model details, user profile).
Upgrade Plan: If your application consistently requires higher limits, consider upgrading your subscription plan.
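
The caching suggestion above can be sketched as a small time-to-live (TTL) cache that only calls the API when no fresh copy is held. TTLCache, the 300-second lifetime, and the key format are assumptions chosen for illustration; pick a lifetime that matches how often the underlying data actually changes.

```python
import time


class TTLCache:
    """Keep fetched values for ttl_seconds before fetching them again.

    Hypothetical helper for illustration; not part of any REALM AI SDK.
    """

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock       # injectable so expiry can be tested
        self._entries = {}       # key -> (expires_at, value)

    def get_or_fetch(self, key, fetch):
        """Return the cached value for key, calling fetch() only if it is stale or absent."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now < entry[0]:
            return entry[1]
        value = fetch()  # this is the only place a real API request would happen
        self._entries[key] = (now + self.ttl_seconds, value)
        return value
```

For example, cache.get_or_fetch("model:m1", fetch_model) would call the hypothetical fetch_model function at most once per lifetime for that key, so repeated lookups do not count against the rate limit.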

(Specific rate limits for different plans and endpoints may be detailed here or in your account dashboard.)

