How to: Handle Errors

Introduction

This guide covers error handling when interacting with the C2C API. Gracefully handling HTTP errors is an essential component of a robust integration with any third-party service.

Types of Errors

Errors in your integration can originate from various sources, which we can categorize into four main groups:

I/O Errors: Originate from hardware operations on your device, such as failed read/write operations
Application Errors: Arise from issues within your application code
Network Errors: Occur within the networking stack and are communicated by your networking library
API Errors: Generated by Frame.io's backend services

Each error category requires specific handling considerations. This guide primarily focuses on API errors, though we'll address general strategies for the other categories as well.

How API Errors are Returned

Frame.io's API communicates errors through two primary mechanisms:

Status Codes: HTTP error codes that indicate the nature of the problem
Error Messages: Payload content that provides additional error details, especially when multiple error conditions share the same status code

HTTP status codes are standardized numerical responses that communicate the outcome of an HTTP request. For more information, see Mozilla's HTTP status code documentation or HTTP Cats for a more visual approach.

Each Frame.io API endpoint specifies an expected success status code—typically 200 (OK), 201 (Created), or 204 (No Content). You can verify success either by checking for these specific codes or by confirming the code falls within the 200-299 range.

Status codes above 399 indicate errors. Most API errors return 4XX codes (400-499), signifying client-side issues. Errors outside this range typically originate from network infrastructure between your device and our service, with the notable exception of 500 (Internal Server Error), which indicates an unexpected issue within our server.

Similarly, a 404 (Not Found) response might be generated by intermediate services rather than our backend, despite being a 4XX code.

If you encounter unexpected status codes, please notify our team.
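
The ranges described above can be sketched as a small classifier. This is only an illustration: the function name and the category labels are assumptions made for this example and are not part of the Frame.io API.

```python
def classify_status(status_code):
    """Roughly classify an HTTP status code using the ranges described above."""
    if 200 <= status_code <= 299:
        return "success"
    if 400 <= status_code <= 499:
        # Usually a client-side issue reported by the API, although codes such
        # as 404 may also be produced by an intermediate service.
        return "client_error"
    if status_code == 500:
        # The one non-4XX code that signals an unexpected backend issue.
        return "server_error"
    if status_code > 399:
        # Other error codes typically come from network infrastructure.
        return "infrastructure_error"
    # 1XX and 3XX codes are neither a success nor an error.
    return "other"
```

A caller would normally still compare the code against the specific success status documented for the endpoint.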

Error Payload Schemas

Frame.io returns error details in two formats: simple and detailed. Your error handling logic should accommodate both formats.

Simple Error Schema

Here's an example of a failed request with an incorrect client_secret:

curl -X POST https://api.frame.io/v2/auth/device/code \
    --include \
    --header 'x-client-version: 2.0.0' \
    --form 'client_id=Some-Client-ID' \
    --form 'client_secret=bad_secret' \
    --form 'scope=asset_create offline'

Response:

HTTP/2 400
...

{"error":"invalid_client"}

The simple schema contains just a single field for error identification.

Detailed Error Schema

For comparison, here's a request without proper authorization:

curl -X POST https://api.frame.io/v2/devices/heartbeat \
    --header 'Authorization: Bearer bad-token' \
    --header 'x-client-version: 2.0.0' \
    | python -m json.tool

Response:

{
    "code": 409,
    "errors": [
        {
            "code": 409,
            "detail": "The channel you're uploading from is currently paused.",
            "status": 409,
            "title": "Channel Paused"
        }
    ],
    "message": "Channel Paused"
}

Detailed errors contain a message field that identifies the error type.
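
As a rough sketch, the two schemas can be folded into a single log-friendly string. The helper name describe_error is hypothetical, and the code assumes the payload has already been decoded into a dict shaped like the examples above.

```python
import json

# Payload copied from the detailed error example above.
DETAILED = json.loads('''
{
    "code": 409,
    "errors": [
        {
            "code": 409,
            "detail": "The channel you're uploading from is currently paused.",
            "status": 409,
            "title": "Channel Paused"
        }
    ],
    "message": "Channel Paused"
}
''')


def describe_error(payload):
    """Build a one-line description from a simple or detailed error payload."""
    # Simple schema: a single "error" field.
    if "error" in payload:
        return str(payload["error"])

    # Detailed schema: a "message" plus optional per-error "detail" strings.
    message = payload.get("message", "unknown error")
    details = [e["detail"] for e in payload.get("errors", []) if e.get("detail")]
    if details:
        return message + ": " + " ".join(details)
    return message
```

Keep using the raw error or message value for error-type lookups. The combined string is only meant for logs or user-facing messages.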

Determining Error Type

When handling Frame.io errors, first check for an error payload and then fall back to the HTTP status code if no payload is present.

Here's a basic error handling implementation:

# Dict of known error codes: native errors.
ERROR_STATUS_MAP = {
    429: SlowDownError,
    # ...
}

# Dict of known error messages: native errors.
ERROR_MESSAGE_MAP = {
    "Channel Paused": ChannelPausedError,
    "invalid_client": InvalidClientError,
    "slow_down": SlowDownError,
    # ...
}

def _c2c_extract_error_message(response):
    """
    Gets the error message from an error payload. Returns `None`
    if an error payload is not found.
    """

    # Try to decode the payload. If it is not JSON, return `None`.
    # JSON decode errors are subclasses of `ValueError`.
    try:
        payload = response.json()
    except ValueError:
        return None

    # A JSON body that is not an object cannot match either schema.
    if not isinstance(payload, dict):
        return None

    # Try the simple error schema first.
    message = payload.get("error")
    if message is not None:
        return message

    # Now try the detailed schema. Return `None` if we do not find one.
    return payload.get("message")

def _c2c_error_type_from_response(response):
    """
    Converts a bad HTTP response into an error.
    """
    error_message = _c2c_extract_error_message(response)

    # Try to do a lookup of the error type by message.
    error_type = ERROR_MESSAGE_MAP.get(error_message)
    if error_type is not None:
        return error_type()

    # If not, try to do a lookup by status code.
    error_type = ERROR_STATUS_MAP.get(response.status_code)
    if error_type is not None:
        return error_type()

    # Otherwise we are going to return an `UnknownAPIError` to signal that we
    # encountered an error from Frame.io's backend servers, but do not know the
    # message and/or status code.
    return UnknownAPIError(message=error_message)

def raise_on_frameio_error(response, expected_status):
    """
    Raises a native error from an HTTP response if the response indicates an error
    occurred. Expected status should be the status we expect to get (200, 201, 204,
    etc).
    """

    # If the status code is less than `400`, then it is not an error status code.
    if response.status_code < 400:

        # Check that the status code is the one we expected, otherwise raise an
        # error.
        if response.status_code != expected_status:
            raise UnexpectedStatusError(
                expected=expected_status, received=response.status_code
            )

        return None

    # Otherwise convert and raise a native error.
    raise _c2c_error_type_from_response(response)

The error lookup tables referenced in this example are provided at the end of this guide.

AWS Errors

When uploading file chunks, you interact directly with AWS S3, which has its own error format. See AWS's common error documentation for details. As a general rule, non-fatal AWS errors should be retried at least once.

AWS errors are returned as XML:

<?xml version="1.0" encoding="UTF-8"?>
<Error>
  <Code>NoSuchKey</Code>
  <Message>The resource you requested does not exist</Message>
  <Resource>/mybucket/myfoto.jpg</Resource> 
  <RequestId>4442587FB7D0A2F9</RequestId>
</Error>

The Code element identifies the error type.
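
A minimal sketch of reading the Code element with Python's standard library follows. The function name aws_error_code is an assumption for this example, and the sample body is the XML shown above.

```python
import xml.etree.ElementTree as ET

# The sample AWS error body from above, as raw response bytes.
SAMPLE_AWS_ERROR = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b"<Error>\n"
    b"  <Code>NoSuchKey</Code>\n"
    b"  <Message>The resource you requested does not exist</Message>\n"
    b"  <Resource>/mybucket/myfoto.jpg</Resource>\n"
    b"  <RequestId>4442587FB7D0A2F9</RequestId>\n"
    b"</Error>\n"
)


def aws_error_code(body):
    """Return the text of the <Code> element, or None if the body is not an
    AWS-style error document."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        # The body was not well-formed XML.
        return None
    if root.tag != "Error":
        return None
    return root.findtext("Code")
```

The returned code can then be looked up in a retry table such as the AWS error table later in this guide.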

Retrying Errors

When to Retry

The error tables in this guide indicate which API errors should be retried. For non-API errors from I/O operations, networking libraries, or AWS, consider retrying those that may result from transient conditions. Network congestion, temporary server outages, or packet loss typically warrant retry attempts. Most networking libraries raise TimeoutError when a request takes too long, which is a prime candidate for retry.

When in Doubt, Retry Once

Computing environments can experience unpredictable issues. Even with errors that appear fatal, it's often worth making one retry attempt. Temporary system states, hardware anomalies (such as cosmic ray bit flips), or rare memory conditions can cause seemingly fatal errors that resolve on a second attempt.

Some errors, however, should not be retried. For example, a 409: CHANNEL PAUSED response when creating an asset indicates the device is paused and should not upload. This state is deliberate and unlikely to change from a retry.

Exponential Backoff

Frame.io implements rate limiting, and exceeding these limits produces either a 429: Slow Down error or a 400 status with this payload:

HTTP/2 400

{"error":"slow_down"}

When you receive these responses, implement exponential backoff for retries. A recommended formula for calculating delay (in seconds) is:

delay = min(2 ** attempt / 2, 32.0)

Where attempt starts at 0. This produces delays of 0.5s, 1s, 2s, 4s, 8s, 16s, 32s, with all subsequent attempts waiting 32 seconds.
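
The formula can be checked with a few lines of Python. The function name backoff_delay is only illustrative.

```python
def backoff_delay(attempt):
    """Delay in seconds before the next retry; attempt starts at 0."""
    return min(2 ** attempt / 2, 32.0)


# Delays for the first nine attempts; the cap takes effect from attempt 6.
delays = [backoff_delay(attempt) for attempt in range(9)]
# delays == [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 32.0, 32.0]
```

The first seven delays add up to 63.5 seconds, which is the total wait for a call that is attempted eight times.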

Backoff jitter

We recommend adding randomness (jitter) to your backoff timing to prevent request synchronization across multiple devices recovering from the same error condition. This helps mitigate the thundering herd problem, where many devices retry simultaneously after an outage. A good approach is to add a random offset between 0 and half the calculated delay, for example random.uniform(0, delay / 2) in Python.
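
One way to combine the capped delay with jitter is sketched below. The function name and the optional rng parameter (added so the randomness can be seeded in tests) are assumptions for this example.

```python
import random


def backoff_delay_with_jitter(attempt, rng=random):
    """Capped exponential delay plus a random offset of up to half the delay."""
    base = min(2 ** attempt / 2, 32.0)
    # rng can be the random module or a seeded random.Random instance.
    return base + rng.uniform(0, base / 2)
```

With this variant, the wait for a given attempt always falls between the base delay and 1.5 times the base delay.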

While exponential backoff is essential for rate-limiting errors, it's also beneficial for handling network and I/O failures generally. This approach allows temporary resource constraints to resolve without additional load from your retry attempts.

Detecting Disconnected Status

When network errors occur, they may indicate that Frame.io is unreachable because:

Your local network is down
Frame.io's services are experiencing issues
An intermediate network component is failing

It's important to detect these conditions. When an error suggests connectivity issues, implement a monitoring task that checks for service restoration and informs the user of the disconnection.

Waiting for Connection and Authorization

Design your application to avoid making unnecessary requests when the device is refreshing authorization, awaiting user authorization, or unable to reach Frame.io. This reduces network overhead and improves user experience.

Block all API calls (except to https://api.frame.io/health) when you detect a disconnected state. When connectivity issues arise, start a background task that polls the health endpoint and blocks further API calls until connectivity is restored.

Similarly, if token expiration occurs, block authorization-dependent calls until a new token is issued. If token refresh fails, alert the user to re-authenticate.

When polling for connection status, apply the same exponential backoff approach described earlier.
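
The blocking behavior described in this section could be sketched with a threading.Event, as below. The ConnectionGate class and its method names are hypothetical. The check_health argument is a caller-supplied function, assumed here to request the health endpoint, catch its own exceptions, and return True once the service is reachable.

```python
import threading
import time


class ConnectionGate:
    """Blocks callers while Frame.io is believed to be unreachable."""

    def __init__(self, check_health, sleep=time.sleep):
        self._check_health = check_health
        self._sleep = sleep
        self._lock = threading.Lock()
        self._polling = False
        self._connected = threading.Event()
        self._connected.set()  # Assume we are connected until told otherwise.

    def wait_for_connected(self, timeout=None):
        """Block until connected. Returns False if the timeout expires first."""
        return self._connected.wait(timeout)

    def notify_disconnected(self):
        """Mark the connection as down and start a single background poller."""
        with self._lock:
            if self._polling:
                return  # A poller is already running.
            self._polling = True
            self._connected.clear()
        threading.Thread(target=self._poll, daemon=True).start()

    def _poll(self):
        attempt = 0
        while not self._check_health():
            # Same capped exponential backoff as described earlier.
            self._sleep(min(2 ** attempt / 2, 32.0))
            attempt += 1
        with self._lock:
            self._polling = False
            self._connected.set()
```

API calls other than the health check would call wait_for_connected() before sending a request, and the retry logic would call notify_disconnected() when an error suggests a connectivity problem. A similar gate could be used while waiting for a refreshed token.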

Request Timeouts

Configure appropriate timeout values for different types of requests:

Default: 15 seconds for basic requests
Authorization Refresh: 2 minutes to account for potential backend processing
File Chunk Upload: 5 minutes to accommodate slow networks when transferring larger data
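
These values could be kept in one place, for example as a small lookup table. The dictionary keys and the helper name below are assumptions chosen for this sketch.

```python
# Timeouts in seconds, taken from the list above.
REQUEST_TIMEOUTS = {
    "default": 15.0,
    "authorization_refresh": 120.0,  # 2 minutes
    "file_chunk_upload": 300.0,      # 5 minutes
}


def timeout_for(request_kind):
    """Return the timeout for a request kind, falling back to the default."""
    return REQUEST_TIMEOUTS.get(request_kind, REQUEST_TIMEOUTS["default"])
```

The result would then be passed to whatever timeout setting your HTTP client exposes.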

Example Retry Handler

Here's a pseudocode implementation showing error handling with exponential backoff:

import random
from time import sleep

# Errors we know are fatal and should not be retried.
FATAL_ERRORS = (
    ChannelPausedError,
    DevicesDisabledError,
    # ...
)

# Errors we know should be retried more than once.
RETRY_ERRORS = (
    TimeoutError,
    NotFoundError,
    SlowDownError,
    UnknownAPIError,
    # ...
)

# Errors that could be the result of Frame.io being unreachable.
DISCONNECTED_ERRORS = (
    TimeoutError,
    HttpClientError,
    # ...
)

def retry_with_backoff(next_handler):
    """
    Middleware for retrying errors with exponential backoff.
    """

    def retry_handler(call, retry_count):
        """
        Handler for retrying c2c API calls with exponential backoff.
        """

        # Keep the last error in its own variable: Python clears the name bound
        # by `except ... as error` when the except block ends.
        last_error = None

        # With a retry_count of 8, the delays between attempts total
        # 63.5 seconds +- ~32 seconds of jitter.
        for attempt in range(1, retry_count + 1):

            # If we are attempting to reach an endpoint that requires authorization
            # we should wait until we have valid authorization before attempting
            # a call. We need to do this each time in case our access_token
            # expires between attempts.
            C2C.wait_for_authorized(call)

            # Likewise, we should wait until we are connected to Frame.io to attempt
            # a call if we are not calling `https://api.frame.io/health`
            C2C.wait_for_connected(call)

            try:
                # Return the result on a success.
                return next_handler(call)
            except FATAL_ERRORS:
                # If we hit an error we know is fatal, re-raise it without
                # retrying.
                raise

            except RETRY_ERRORS as error:
                # If we hit an error we know we should retry many times, continue,
                # but notify our client if we think we may have been disconnected.
                last_error = error
                if isinstance(error, DISCONNECTED_ERRORS):
                    C2C.notify_disconnected()

            except Exception as error:
                # Otherwise, do not retry the call more than once.
                if attempt > 1:
                    raise
                last_error = error

            # There is no point in waiting after the final attempt.
            if attempt == retry_count:
                break

            # The delay for the next attempt should be no more than 32 seconds.
            # This algorithm will go: 0.5s, 1s, 2s, 4s, 8s, 16s, 32s, 32s, ...
            delay = min(2 ** (attempt - 1) / 2, 32.0)

            # Add some randomness (jitter) to the delay (up to half the value of
            # the delay in either direction).
            delay += random.uniform(-delay, delay) / 2

            # Wait between retries
            sleep(delay)

        # If we have exhausted all retries, raise the last error we saw.
        raise last_error

    return retry_handler

Error Tables

The following tables categorize Frame.io API errors and provide handling guidance. Here's what each column represents:

Message: The error payload message identifier
Error Type: A conceptual error category (detailed in the descriptions section)
HTTP Code: The HTTP status code
Schema: The error payload format (simple or detailed)
Retry: Retry recommendation (yes for multiple attempts, once for a single retry, no for fatal errors)

Asterisks (*) indicate special considerations detailed in the descriptions section.

Frame.io Error Messages

Message	Error Type	HTTP Code	Schema	Retry
"access_denied"	AccessDenied	401	simple	once
"authorization_pending"	AuthorizationPending	400	simple	yes
"Channel Paused"	ChannelPaused	409	detailed	no
"expired_token"	ExpiredToken	400	simple	no
"Invalid Argument"	InvalidArgument	422	detailed	no
"invalid_client"	InvalidClient	400	simple	no
"Invalid client version"	InvalidClientVersion	400	simple	no
"invalid_grant"	InvalidGrant	400	simple	no
"invalid_request"	InvalidRequest	400	simple	once
"Not Authorized"	UnauthorizedClient	401	detailed	no
"slow_down"	SlowDown	400	simple	yes
"unauthorized_client"	UnauthorizedClient	401	simple	yes*

Frame.io Status Codes

HTTP Code	Error Type	Retry
400	InvalidRequest	once
401	UnauthorizedClient	no
422	InvalidContentType	no
429	SlowDown	yes
500	InternalServerError	yes
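
For reference, the two tables above could be transcribed into lookup dictionaries along these lines. The names MESSAGE_RETRY, STATUS_RETRY, and retry_policy are assumptions for this sketch, and the fallback of "once" for unknown errors follows the "When in Doubt, Retry Once" guidance rather than anything in the tables.

```python
# Retry recommendation by error payload message (from the first table).
MESSAGE_RETRY = {
    "access_denied": "once",
    "authorization_pending": "yes",
    "Channel Paused": "no",
    "expired_token": "no",
    "Invalid Argument": "no",
    "invalid_client": "no",
    "Invalid client version": "no",
    "invalid_grant": "no",
    "invalid_request": "once",
    "Not Authorized": "no",
    "slow_down": "yes",
    # Marked yes* in the table: see the UnauthorizedClient description.
    "unauthorized_client": "yes",
}

# Retry recommendation by HTTP status code (from the second table).
STATUS_RETRY = {400: "once", 401: "no", 422: "no", 429: "yes", 500: "yes"}


def retry_policy(message, status_code):
    """Look up by payload message first, then by status code."""
    if message in MESSAGE_RETRY:
        return MESSAGE_RETRY[message]
    # Unknown errors get a single retry.
    return STATUS_RETRY.get(status_code, "once")
```

Looking up the message before the status code mirrors the order recommended in the Determining Error Type section.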

AWS Errors

See AWS's documentation for detailed descriptions.

Error	Retry
InternalError	yes
OperationAborted	yes
RequestTimeout	yes
ServiceUnavailable	yes
SlowDown	yes
[All Other Errors]	once

Parsing similar AWS errors

Both SlowDown and ServiceUnavailable from AWS indicate request rate issues and can be treated like Frame.io's SlowDown error: retry them with exponential backoff. Similarly, AWS's InternalError corresponds conceptually to InternalServerError in our API.
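
The AWS table can be expressed the same way. The names below are illustrative only, and the code is assumed to have been read from the Code element of the XML error body.

```python
# AWS error codes the table above marks as retryable more than once.
AWS_RETRY_MANY = {
    "InternalError",
    "OperationAborted",
    "RequestTimeout",
    "ServiceUnavailable",
    "SlowDown",
}


def aws_retry_policy(code):
    """Return "yes" for codes listed in the table and "once" for all others."""
    return "yes" if code in AWS_RETRY_MANY else "once"
```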

Descriptions

AccessDenied

Returned when a user declines authorization during device pairing.

AuthorizationPending

Indicates a user hasn't yet entered the device pairing code. Continue polling after the interval period specified in the device code response.
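
A polling loop for this case might look like the sketch below. Everything here is an assumption for illustration: request_token stands in for a caller-supplied function that performs the token request and returns the decoded JSON payload, interval is the value taken from the device code response, and max_polls is an arbitrary safety limit.

```python
import time


def poll_for_authorization(request_token, interval, sleep=time.sleep, max_polls=100):
    """Poll until the payload is something other than authorization_pending."""
    for _ in range(max_polls):
        payload = request_token()
        if payload.get("error") == "authorization_pending":
            # The user has not entered the code yet: wait, then poll again.
            sleep(interval)
            continue
        # Either a success payload or a different error for the caller to handle.
        return payload
    raise TimeoutError("authorization was not granted within the polling limit")
```

Other errors returned while polling, such as expired_token or slow_down, are handed back to the caller so they can be handled as described elsewhere in this section.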

ChannelPaused

The device channel was paused when the asset was created. Do not attempt to upload this asset again.

ExpiredToken

The device pairing code has expired. Generate a new code and restart the pairing process.

InternalServerError

Indicates an unexpected backend issue. Retry with backoff, as recommended in the status code table, and please report 500 errors to our team for investigation.

Note that some known issues return 500 errors when they should return InvalidRequest:

Attempting to upload to a non-existent device channel
Requesting an invalid custom chunk count

InvalidArgument

A payload parameter contained an invalid value. Verify the parameter values match API expectations.

InvalidContentType

The request's Content-Type header is unsupported. The API generally accepts:

multipart/form-data (authorization endpoints only)
application/x-www-form-urlencoded (all endpoints)
application/json (non-authorization endpoints)

InvalidClient

The provided credentials (client_id, client_secret, etc.) were not recognized. Verify your integration credentials.

InvalidClientVersion

The x-client-version header is either duplicated or contains an invalid semantic version.

InvalidGrant

The authorization grant type is invalid. Review the authorization guides for correct values.

InvalidRequest

The request parameters or payload format is incorrect. Verify field names and value formats.

If received during token refresh, your refresh token has expired and you must restart the authorization process.

SlowDown

You've exceeded request rate limits. Implement exponential backoff for subsequent requests. Note that making multiple device code requests on the same TCP connection can trigger this error—create new connections for each pairing request.

UnauthorizedClient

Typically indicates an expired or missing access_token. If you receive this error, refresh your token before retrying.

If encountered during token refresh, you must restart the authorization process and prompt the user to reconnect.

This error can also occur when accessing resources outside your device's authorization scope or when a project has disabled C2C devices. Verify that you've requested appropriate scopes during authorization.

Next Steps

We encourage you to contact our team with any questions and proceed to the advanced uploads guide. We look forward to supporting your integration progress.

If you haven't already, please review the Implementing C2C: Setting Up guide before proceeding.

You'll need the access_token obtained during the authentication and authorization process.

This guide builds on the Basic Upload guide and Advanced Uploads guide.
