How to: Upload (Advanced)

Strategies and requirements for making a reliable uploader


Introduction

Uploading assets is the bread and butter of every C2C integration, and there are a number of considerations to take into account when uploading assets reliably. This guide is your black-belt training for building a reliable, resilient, and fast uploader.

What will I need?

If you haven’t read the Implementing C2C: Setting Up guide, give it a quick glance before moving on!

You will also need the access_token you received by following the C2C hardware or C2C Application authentication and authorization guides.

In this guide we will be using the same test asset we used in the Basic Uploads guide.

Advanced asset parameters

When creating a new asset in Frame.io, there are a few advanced parameters that can be passed to modify upload behavior. The offset parameter is particularly important to get right, so please make sure to give it particular attention!

Offset - handling paused devices

It is critical that you supply a proper offset value. The offset is how we determine when a piece of media was made, and is critical to ensuring that your device does not upload media it is not supposed to. When a device is paused in Frame.io, the user is telling the device not to upload media created while paused. Read more on how the paused feature works here.

Our API is designed so that you do not have to be aware of when a device is paused or unpaused. Instead, when you upload a file, you supply how many seconds ago the file was created. Our server takes that value, compares it against the windows for which a device was paused, and returns an error if your device was paused when the media was created.
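One way to compute the offset is to work backward from the file's timestamp. This sketch assumes the filesystem mtime approximates the capture time; a real device should prefer its own capture timestamp if it records one:

```python
import os
import time

def offset_for(file_path: str) -> int:
    """Seconds since the file was created, for the offset parameter.

    Uses the filesystem mtime as a stand-in for the capture time;
    real devices should use their own capture timestamp when available.
    """
    return max(0, int(time.time() - os.path.getmtime(file_path)))
```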

Let's check this feature out. In the C2C Connections tab, go to the three-dot menu on your device, and click Pause.

Now when we try to create an asset:

Shell
{
curl -X POST https://api.frame.io/v2/assets \
    --header 'Authorization: Bearer [access_token]' \
    --header 'Content-Type: application/json' \
    --data-binary @- <<'__JSON__' 
        {
            "name": "C2C_TEST_CLIP.mp4", 
            "filetype": "video/mp4", 
            "filesize": 21136250,
            "offset": 0
        }
__JSON__
} | python -m json.tool
API endpoint specification

Docs for /v2/assets can be found here

... we will get the following error:

JSON
{
    "code": 409,
    "errors": [
        {
            "code": 409,
            "detail": "The channel you're uploading from is currently paused.",
            "status": 409,
            "title": "Channel Paused"
        }
    ],
    "message": "Channel Paused"
}

The device has been paused!

Now let's unpause our device. If we make the same request again, the asset will be created successfully.

But that's not right! We made this asset while the device was paused. We need to change our request's offset so that it places the asset within the paused window:

Shell
{
curl -X POST https://api.frame.io/v2/assets \
    --header 'Authorization: Bearer [access_token]' \
    --header 'Content-Type: application/json' \
    --data-binary @- <<'__JSON__' 
        {
            "name": "C2C_TEST_CLIP.mp4", 
            "filetype": "video/mp4", 
            "filesize": 21136250,
            "offset": 60
        }
__JSON__
} | python -m json.tool

We are telling Frame.io's backend that this asset was created 60 seconds ago, during the window in which our device was paused, and we will get the proper Channel Paused error back.

It's CRITICAL that the correct offset be set when uploading your media to Frame.io, or media that the user wishes to keep out of their project may be uploaded by accident. This could include sensitive IP, actor nudity, or any other critically sensitive footage.

Offset and retries

When retrying a failed call, make sure to update your offset. If you are retrying a failed call to create an asset over a long period of time, you may unintentionally let your offset drift out of the appropriate paused window.
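A retry loop that recomputes the offset on every attempt might be sketched like this. Here create_asset and compute_offset are hypothetical callables standing in for your API call and your clock math; real code should also back off exponentially between attempts, as covered in the errors guide:

```python
def create_asset_with_retries(create_asset, compute_offset, max_attempts=5):
    """Retry asset creation, recomputing the offset on every attempt
    so it never drifts relative to the paused windows."""
    last_error = None
    for _ in range(max_attempts):
        offset = compute_offset()  # fresh value each try: never reuse a stale offset
        try:
            return create_asset(offset)
        except Exception as error:  # real code: back off exponentially here
            last_error = error
    raise last_error
```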

Uploading to a specific channel

If your device has more than one channel, then you can specify which channel a piece of media should be uploaded on. Let's upload a file to channel 2:

Shell
{
curl -X POST https://api.frame.io/v2/assets \
    --header 'Authorization: Bearer [access_token]' \
    --header 'Content-Type: application/json' \
    --data-binary @- <<'__JSON__' 
        {
            "name": "C2C_TEST_CLIP.mp4", 
            "filetype": "video/mp4", 
            "filesize": 21136250,
            "offset": -10,
            "channel": 2
        }
__JSON__
} | python -m json.tool

The default channel is 0 if none is supplied. Most integrations will not need to supply this value.

Requesting a custom chunk count

When registering your asset with S3, our backend splits up your file with the goal of keeping each chunk at around 25 MB. Some integrations may find that this size is too large for highly congested networks and may wish to request a larger number of smaller chunks. You can request a specific chunk count by including a parts parameter like so:

Shell
{
curl -X POST https://api.frame.io/v2/assets \
    --header 'Authorization: Bearer [access_token]' \
    --header 'Content-Type: application/json' \
    --data-binary @- <<'__JSON__'
        {
            "name": "C2C_TEST_CLIP.mp4",
            "filetype": "video/mp4",
            "filesize": 21136250,
            "offset": 0,
            "parts": 4
        }
__JSON__
} | python -m json.tool

Now when we get our response payload, we will have four URLs:

JSON
{
    ...
    "upload_urls": [
        "https://frameio-uploads-production.s3-accelerate.amazonaws.com/parts/[part-01-path]",
        "https://frameio-uploads-production.s3-accelerate.amazonaws.com/parts/[part-02-path]",
        "https://frameio-uploads-production.s3-accelerate.amazonaws.com/parts/[part-03-path]",
        "https://frameio-uploads-production.s3-accelerate.amazonaws.com/parts/[part-04-path]"
    ],
    ...
}

And our chunk size will be:

Python
math.ceil(float(21136250) / float(4))
# 5284063 bytes

... with our last chunk being 5284061 bytes (21136250 - 5284063 * 3).
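The arithmetic above generalizes to any part count. A small helper (a sketch, not an official client function) that returns every chunk's size:

```python
import math

def chunk_sizes(filesize: int, parts: int) -> list:
    """Sizes for each chunk when a file is split into `parts` chunks:
    every chunk shares one rounded-up size except the (smaller) last one."""
    size = math.ceil(filesize / parts)
    return [size] * (parts - 1) + [filesize - size * (parts - 1)]
```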

If you choose to request custom part counts there are a number of limitations you need to be aware of, which are listed in this S3 documentation page. The high-level restrictions are:

- Each part can be no smaller than 5 MiB (5,242,880 bytes), with the exception of the last part, which can be as small as required.
- There can be no more than 10,000 parts.
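Given those two restrictions, you can compute the largest part count that is safe to request for a given file size. This helper is illustrative, not part of any official SDK:

```python
MIN_PART_BYTES = 5 * 1024 * 1024  # S3's 5 MiB floor for every part but the last
MAX_PARTS = 10_000                # S3's hard cap on part count

def max_safe_parts(filesize: int) -> int:
    """Largest `parts` value that keeps every chunk (except possibly
    the last) at or above the 5 MiB minimum."""
    return max(1, min(filesize // MIN_PART_BYTES, MAX_PARTS))
```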

If you request too many parts (so your chunk size falls below 5 MiB), you will get a 500: INTERNAL SERVER ERROR:

JSON
{
    "code": 500,
    "errors": [
        {
            "code": 500,
            "detail": "There was a problem with your request",
            "status": 500,
            "title": "Something went wrong"
        }
    ],
    "message": "Something went wrong"
}

When requesting your own part count, be very careful that you are conforming to S3's requirements.

Uploading efficiently

Our devices often find themselves in some of the worst network conditions imaginable, so we want to make sure our uploads are as efficient as possible — pushing the most bits with the fewest resources. Let's go over some tips and tricks for building an efficient uploader.

TCP connection re-use / pooling

Creating an encrypted connection requires a lot of negotiation. When opening a TCP connection to Frame.io, your app will spend a fair amount of time chatting with our server about its weekend in between the actual data transfers. A serial set of uploads may look like this:

Your App: Oh hey! Got something for you! I know [these] languages.
Frame.io: 'sup. Cool, I know [these] ones.
Your App: Let's use this string to make a code so we can talk.
Frame.io: Cool give me a second.
Frame.io: Alright, I worked out our code.
Your App: Me too!
Your App: Here's part of a file! [SCREAMS IN BINARY]
Frame.io: Got it, thank you!
Your App: That's it, thanks!

...

Your App: Oh hey! Got something else for you! I know [these] languages.
Frame.io: 'sup. I've never met you before in my life, so here's the ones I know...

Only one of those steps is actually sending your file data! One way to cut down on upload time is by re-using TCP connections so that we only have to negotiate our encryption once. Most HTTP libraries will come with a Client or Session abstraction that will keep one or more TCP connections alive and re-use them to make new requests.

That makes your chatter look more like this:

Your App: Oh hey! Got something for you! I know [these] languages.
Frame.io: 'sup. Cool, I know [these] ones.
Your App: Let's use this string to make a code so we can talk.
Frame.io: Cool give me a second.
Frame.io: Alright, I worked out our code.
Your App: Me too!
Your App: Here's part of a file! [SCREAMS IN BINARY]
Frame.io: Got it, thank you!
Your App: Oh, I have another part! (deep inhale) [SCREAMS IN BINARY]
Frame.io: Got that one too!
Your App: Oh, and a third!
...

The overhead of negotiating your encryption only happens once until the connection is closed.
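In Python, the same connection re-use can be sketched with the request/response interface of the standard library's http.client. The chunk paths and the `conn` wiring here are illustrative, not a drop-in implementation:

```python
def upload_chunks_pooled(conn, uploads):
    """Send each chunk as a PUT over one persistent connection.

    `conn` is anything with the request()/getresponse() interface of
    http.client.HTTPSConnection: the TLS handshake happens once when
    the connection opens, and every chunk after that reuses the socket.
    """
    statuses = []
    for path, chunk in uploads:
        conn.request(
            "PUT",
            path,
            body=chunk,
            headers={"content-type": "video/mp4", "x-amz-acl": "private"},
        )
        response = conn.getresponse()
        response.read()  # drain the body so the socket can be reused
        statuses.append(response.status)
    return statuses
```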

TCP handshake reference

If you want a more technical, less tongue-in-cheek reference for what’s happening during a TCP / HTTP handshake, Cloudflare has a great blogpost on it.

curl is capable of re-using TCP connections, so let's try that out! First, make another asset in Frame.io like we did in the last guide.

Next, let’s split up our chunks into their own files so they are a little bit easier to pass to curl (note: this is not an efficient workflow for production, just something we are doing to illustrate this example):

Shell
head -c 10568125 ~/Downloads/C2C_TEST_CLIP.mp4 > "C2C_TEST_CLIP-Chunk01"
tail -c 10568125 ~/Downloads/C2C_TEST_CLIP.mp4 > "C2C_TEST_CLIP-Chunk02"

Now we can use the --next argument to send two requests over the same TCP connection to Frame.io:

Shell
curl -X PUT https://frameio-uploads-production.s3-accelerate.amazonaws.com/parts/[part-1-path] \
        --include \
        --header 'content-type: video/mp4' \
        --header 'x-amz-acl: private' \
        --data-binary @C2C_TEST_CLIP-Chunk01 \
--next -X PUT https://frameio-uploads-production.s3-accelerate.amazonaws.com/parts/[part-2-path] \
        --include \
        --header 'content-type: video/mp4' \
        --header 'x-amz-acl: private' \
        --data-binary @C2C_TEST_CLIP-Chunk02

Now let’s compare that to doing them on separate TCP connections using two separate curl commands:

Shell
curl -X PUT https://frameio-uploads-production.s3-accelerate.amazonaws.com/parts/[part-1-path] \
        --include \
        --header 'content-type: video/mp4' \
        --header 'x-amz-acl: private' \
        --data-binary @C2C_TEST_CLIP-Chunk01 \
&& curl -X PUT https://frameio-uploads-production.s3-accelerate.amazonaws.com/parts/[part-2-path] \
        --include \
        --header 'content-type: video/mp4' \
        --header 'x-amz-acl: private' \
        --data-binary @C2C_TEST_CLIP-Chunk02
Reusing chunk URLs

You can upload a file chunk more than once, so feel free to re-use your URLs from one example to another.

On the MacBook this guide was written on, re-using the TCP connection took 14 seconds, while making separate requests took 17 seconds. That's almost 20% faster!

Parallel uploads

But how can we make this even faster? Most HTTP requests will not be capable of saturating your network’s bandwidth. By uploading multiple chunks at the same time, we can significantly cut down how long it takes:

Shell
curl -X PUT https://frameio-uploads-production.s3-accelerate.amazonaws.com/parts/[part-1-path] \
        --include \
        --header 'content-type: video/mp4' \
        --header 'x-amz-acl: private' \
        --data-binary @C2C_TEST_CLIP-Chunk01 \
& \
curl -X PUT https://frameio-uploads-production.s3-accelerate.amazonaws.com/parts/[part-2-path]\
        --include \
        --header 'content-type: video/mp4' \
        --header 'x-amz-acl: private' \
        --data-binary @C2C_TEST_CLIP-Chunk02 \
&

When your bandwidth supports it, this should take only as long as the slowest request.

When uploading in parallel, generally 2 uploads per CPU core of your device will help achieve maximum throughput. Any more than that and you will start to lose performance from too many requests fighting for the same resources. Do not blindly launch parallel requests without a limit, as it may negatively impact your performance.
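A bounded worker pool is the usual way to enforce that cap. In this sketch, upload_chunk is a hypothetical callable that uploads one chunk; the default of 2 workers per CPU core follows the guidance above:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def upload_parallel(upload_chunk, chunks, workers=None):
    """Upload chunks concurrently with a bounded worker pool."""
    # Default cap: 2 workers per CPU core, never unbounded.
    workers = workers or (os.cpu_count() or 1) * 2
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(upload_chunk, chunks))
```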

Parallel upload speeds

Depending on your network conditions, uploading in parallel may actually be slower than uploading sequentially. Very advanced integrations may monitor their throughput and raise / lower the number of parallel requests as needed. The terminal also handles background tasks differently from the way your language will handle parallelism. These examples are meant to show concepts, so do not take their profiling as an indication that this approach will be faster or slower in your language of choice. Always profile in your actual app!

Putting them both together

The highest possible throughput will come from re-using a pool of TCP connections when making parallel requests. Let's make a second test asset, then upload each asset in its own process, with each process uploading its chunks sequentially:

Shell
curl -X PUT https://frameio-uploads-production.s3-accelerate.amazonaws.com/parts/[asset01-chunk01] \
        --include \
        --header 'content-type: video/mp4' \
        --header 'x-amz-acl: private' \
        --data-binary @C2C_TEST_CLIP-Chunk01 \
--next -X PUT https://frameio-uploads-production.s3-accelerate.amazonaws.com/parts/[asset01-chunk02] \
        --include \
        --header 'content-type: video/mp4' \
        --header 'x-amz-acl: private' \
        --data-binary @C2C_TEST_CLIP-Chunk02 \
& \
curl -X PUT https://frameio-uploads-production.s3-accelerate.amazonaws.com/parts/[asset02-chunk01] \
        --include \
        --header 'content-type: video/mp4' \
        --header 'x-amz-acl: private' \
        --data-binary @C2C_TEST_CLIP-Chunk01 \
--next -X PUT https://frameio-uploads-production.s3-accelerate.amazonaws.com/parts/[asset02-chunk02] \
        --include \
        --header 'content-type: video/mp4' \
        --header 'x-amz-acl: private' \
        --data-binary @C2C_TEST_CLIP-Chunk02 \
&
Lean on your HTTP library

We’ll repeat that your HTTP library is likely to have some sort of abstraction to handle connection pooling and parallelism for you, and it will be a good resource to lean on. Try profiling a few different strategies from your library and see what works best!

Tracking progress

We require that our integrations supply basic upload progress to the user. Chunk-level granularity is acceptable, so if your asset has 3 chunks, as each chunk is uploaded successfully, your progress can jump from 0% → 33% → 66% → 100%.
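Chunk-level progress reduces to one division. A tiny helper (illustrative only):

```python
def chunk_progress(uploaded_chunks: int, total_chunks: int) -> int:
    """Whole-percent upload progress at chunk granularity."""
    if total_chunks == 0:
        return 100  # nothing to upload counts as complete
    return uploaded_chunks * 100 // total_chunks
```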

Finer-grained progress is nice for the end-user, but implementation will be highly dependent on your HTTP library, and therefore out of scope for these guides. Please reach out to us if you would like to implement better progress bars, but get stuck on how to execute them!

Uploading reliably

If you have not, please review our errors guide. When we make C2C calls in these examples, we are assuming that under the hood, each call is already handling errors the way we describe in that article (exponential backoff, waiting on network connection & authorization, etc).

To write a good uploader that functions across multiple power cycles, there are additional steps we need to take beyond the techniques for retrying a single request.

Creating an upload queue

In the real world, your device may be creating media faster than it can upload it, you may lose your connection to Frame.io for long periods of time, etc.

Often it’s helpful to have an upload queue, where the process that creates your media registers it with another process in your app that is responsible for uploading the media, and will do so when it is able to get to it.

To make things extra robust, it might be helpful to have two queues:

  • A media queue to take local files and register an asset with Frame.io for each.
  • A chunk queue for taking chunk URLs from the asset creator and uploading file chunks.

Let’s look at some pseudocode:

Python
# Where we are going to queue new files.
FILE_QUEUE = Queue()

# Where we are going to queue new chunks.
CHUNK_QUEUE = Queue()

# The http session that will handle TCP 
# connection pooling for us.
HTTP_SESSION = http.Session()

def take_picture():
    """Snaps a picture for the user."""

    image = MY_DEVICE.capture()
    file_path = MY_DEVICE.write_image(image)
    FILE_QUEUE.add(file_path)

def task_register_assets():
    """
    Pulls snapped pictures from the FILE_QUEUE, registers
    with Frame.io, and adds the chunks to the CHUNK_QUEUE.
    """
    while True:
        # Get the latest file added to the queue and register 
        # a C2C Asset for it.
        new_file = FILE_QUEUE.get()
        asset = c2c.create_asset_for_file(HTTP_SESSION, new_file)

        # Calculate the size for each chunk
        chunk_size = c2c.calculate_chunk_size(asset, new_file)

        # Create a message for each chunk with its parameters
        # and add it to the queue.
        chunk_start = 0
        for chunk_url in asset.upload_urls:
            message = {
                "file_path": new_file,
                "chunk_url": chunk_url,
                "chunk_start": chunk_start,
                "chunk_size": chunk_size,
            }

            # Put the message in the queue and move to the next chunk.
            CHUNK_QUEUE.put(message)
            chunk_start += chunk_size

def task_upload_chunk():
    """Takes a chunks and uploads them."""

    while True:
        info = CHUNK_QUEUE.get()
        c2c.upload_chunk(HTTP_SESSION, info)

def launch_upload_tasks():
    """Lauches our Frame.io upload tasks."""
    # Create a list to hold all of our tasks.
    tasks = list()

    # Create one task for registering assets.
    asset_task = run_task_in_thread(task_register_assets)
    tasks.append(asset_task)

    # Create 2 tasks per CPU core for uploading chunks.
    for _ in range(0, GET_CPU_COUNT() * 2):
        chunk_task = run_task_in_thread(task_upload_chunk)
        tasks.append(chunk_task)

    # Run these tasks until shutdown
    run_forever(tasks)
Error handling

In the above example, we assume that the functions invoked for c2c calls are handling errors as discussed in the errors guide.

Persistent queuing across power cycles

The above example works great when the device is on. But what happens if the device is powered down before the assets can be uploaded? We need to make sure that when the device powers down, it is able to pick up where it left off when it comes back online. To do this, we need to have a way to persist our queue to storage between cycles. The easiest way to do this is to use an embedded database like SQLite.

When implementing a persistent queue, you will need your persistence layer to be able to do the following:

  • Add newly created files for upload.
  • Mark when an asset has been created in Frame.io.
  • Mark when an asset could not be created due to an error.
  • Store file chunk info for uploader tasks.
  • Fetch the next chunk that should be uploaded.
  • Mark when a chunk has been successfully uploaded.
  • Mark when a chunk could not be uploaded due to an error.
  • Get list of files and their uploaded status to display to the user.
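A minimal SQLite-backed chunk queue covering a few of these operations might look like the following. This is a sketch with a toy schema, not a production design; real stores will also track errors, timestamps, and file records:

```python
import sqlite3

def open_store(path=":memory:"):
    """Open (or create) the chunk-queue database."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS chunks (
               id INTEGER PRIMARY KEY,
               chunk_url TEXT NOT NULL,
               state TEXT NOT NULL DEFAULT 'pending'
           )"""
    )
    return db

def add_chunk(db, url):
    """Queue a new chunk for upload."""
    db.execute("INSERT INTO chunks (chunk_url) VALUES (?)", (url,))
    db.commit()

def next_chunk(db):
    """Fetch and check out the next pending chunk, or None."""
    row = db.execute(
        "SELECT id, chunk_url FROM chunks WHERE state = 'pending' LIMIT 1"
    ).fetchone()
    if row:
        # Check the chunk out so other uploader tasks won't grab it.
        db.execute("UPDATE chunks SET state = 'in_flight' WHERE id = ?", (row[0],))
        db.commit()
    return row

def mark_done(db, chunk_id):
    """Record that a chunk uploaded successfully."""
    db.execute("UPDATE chunks SET state = 'done' WHERE id = ?", (chunk_id,))
    db.commit()
```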

Let’s rewrite the example from the previous section using a persistent store:

Python
# Our persistence layer for queuing uploads. Might use SQLite or another
# database under the hood.
C2C_UPLOAD_STORE = NewC2CUploadStore()

# The http session that will handle TCP 
# connection pooling for us.
HTTP_SESSION = http.Session()

def take_picture():
    """Snaps a picture for the user."""

    image = MY_DEVICE.capture()
    file_path = MY_DEVICE.write_image(image)

    # Add a new file to the store.
    C2C_UPLOAD_STORE.add_file(file_path)

def task_register_assets():
    """
    Pulls snapped pictures from the FILE_QUEUE, registers
    with Frame.io, and adds the chunks to the CHUNK_QUEUE.
    """
    while True:
        # Get the latest file path from our store.
        file_record = C2C_UPLOAD_STORE.get_file()

        try:
            asset = c2c.create_asset_for_file(HTTP_SESSION, file_record)
            chunk_size = c2c.calculate_chunk_size(asset, file_record)

            chunk_start = 0
            for chunk_url in asset.upload_urls:
                message = {
                    "file_path": file_record,
                    "chunk_url": chunk_url,
                    "chunk_start": chunk_start,
                    "chunk_size": chunk_size,
                }

                # Put the chunk job in the store.
                C2C_UPLOAD_STORE.new_chunk(message)
                chunk_start += chunk_size
        except Exception as error:
            # Tell the store we ran into an error so we can retry
            # later if needed.
            C2C_UPLOAD_STORE.file_asset_create_error(file_record, error)
        else:
            # Tell the store we successfully created our asset.
            C2C_UPLOAD_STORE.file_asset_created(file_record)

def task_upload_chunk():
    """Takes a chunks and uploads them."""

    while True:
        # Get the next chunk from our store. Our store should
        # check this chunk out so it will not be fetched by
        # other uploader tasks.
        chunk_record = C2C_UPLOAD_STORE.get_chunk()

        try:
            c2c.upload_chunk(HTTP_SESSION, chunk_record)
        except Exception as error:
            # If we get an error, mark it in the store so we
            # can try again later.
            C2C_UPLOAD_STORE.chunk_error(chunk_record, error)
        else:
            # Mark success of the chunk.
            C2C_UPLOAD_STORE.chunk_success(chunk_record)

def launch_upload_tasks():
    """Lauches our Frame.io upload tasks."""
    # Create a list to hold all of our tasks.
    tasks = list()

    # Create one task for registering assets.
    asset_task = run_task_in_thread(task_register_assets)
    tasks.append(asset_task)

    # Create 2 tasks per CPU core for uploading chunks.
    for _ in range(0, GET_CPU_COUNT() * 2):
        chunk_task = run_task_in_thread(task_upload_chunk)
        tasks.append(chunk_task)

    # Run these tasks until shutdown
    run_forever(tasks)

Now, since we are interacting with a persistent store, when our integration loses power it can pick back up where it left off.

Tracking errors

The most important thing your uploader should do is track when errors happen. If you retry a C2C API call 10 or so times as detailed in the errors guide, it’s sometimes a good idea to mark that chunk as errored in your persistence store, and move on to any other chunks in your backlog. This helps mitigate problems from poisoned uploads (see below).

When an upload hits a fatal error as detailed in the previous guide, make sure to mark that it should not be retried.

Killing stalled uploads

Your chunk upload task should have some maximum time limit, say 30 minutes, before a supervisor kills it and spins up a new one. This helps reduce the chance that all your uploader tasks get stuck and block all future uploads.

Retrying silent failures

If your device loses power during an upload, or the process uploading a chunk otherwise unexpectedly exits, it may not have time to report that an error occurred. When you fetch a chunk from your queue to upload, you should store the time that the chunk was fetched. If the chunk's success or failure is not reported within some reasonable amount of time (30 minutes, say), that chunk should become available again for an upload process to grab. Otherwise you may skip uploading chunks that were in-flight when a catastrophic error occurred.
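The reclaim logic can be sketched like this. The dict shape for a chunk record is hypothetical; your persistence layer will have its own:

```python
import time

STALE_AFTER = 30 * 60  # seconds before an unreported chunk is reclaimed

def reclaim_stale(chunks, now=None):
    """Return in-flight chunks whose worker never reported back, so the
    caller can move them back to 'pending'. Each chunk is a dict with
    'state' and 'checked_out_at' keys (hypothetical record shape)."""
    now = time.time() if now is None else now
    return [
        c for c in chunks
        if c["state"] == "in_flight" and now - c["checked_out_at"] > STALE_AFTER
    ]
```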

Mitigating poisoned uploads

A poisoned value is a message or job in a queue that always results in failure due to some unforeseen circumstance. If poisoned values are allowed to re-queue themselves, a system can grind to a halt as all processes get stuck endlessly trying to execute an impossible task. In the errors guide, we separated out which types of errors should be retried, but sometimes that isn't enough.

Poisoned uploads can be caused by:

  • Corrupt data on disk causing I/O errors that can normally be retried.
  • A process hitting an error so catastrophic it cannot communicate that an error occurred.
  • An error that is normally retriable being raised from a permanently bad state.

Corrupt data in an HTTP request, for example, could cause a proxy along the route to stall and always return a TimeoutError, which we would normally want to retry. But if we are dealing with corrupt media, we may always trigger a TimeoutError, which could cause our uploader to go into an infinite loop if it always pulls jobs in the order they were created, blocking any new uploads from being completed.

There are a few good strategies for handling poisoned uploads:

  • After an upload hits a reasonable per-attempt retry limit, deprioritize it in your upload queue so it does not block future uploads. Only come back to upload tasks which have experienced errors after all fresh media has been uploaded.
  • Track the number of times an upload has been checked out to catch uploads that are dying without reporting their progress. Use this value in addition to the number of explicit errors that have been reported.
  • Ensure you aren't counting network disconnections and authorization expirations as failures by following our best practices.
  • Retry a job at least three times (with each job having 10 network call retries itself), for a total of 30 attempts to upload a file chunk.
  • If an upload gets erroneously marked as poisoned due to consistent environmental failures (like a temperamental network), it's a great feature to allow users to manually reset the upload job once the network errors have been resolved.
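The checkout-counting strategy above can be sketched in a few lines. Again, the record shape and threshold are illustrative:

```python
MAX_CHECKOUTS = 3  # attempts per power cycle before we suspect poisoning

def is_suspect(chunk: dict) -> bool:
    """Flag a chunk as possibly poisoned.

    Counts checkouts as well as explicit errors, so uploads that die
    without reporting back are caught too. `chunk` is a dict with
    'checkouts' and 'errors' counters (hypothetical record shape)."""
    return max(chunk["checkouts"], chunk["errors"]) >= MAX_CHECKOUTS
```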

Retry once after restart

Before you remove an upload you suspect is poisoned from the upload queue, mark it to be re-tried the next time the device is power cycled. Your upload may be failing due to the device’s CPU, drivers, memory, or application logic being in a bad state or overloaded. When an upload is failing over and over, stop trying to upload it until the device or application is restarted. If the job once again fails 3 or so times, permanently mark it as poisoned and move on.

Any error that is marked as retry once in the errors guide should probably be retried once after a power cycle as well.

Clearing your queue

This may seem obvious in hindsight, but make sure you clear your queue of files that are no longer available for upload! This might occur when removable media is swapped out on your device, or files are deleted from storage, for instance. You don’t want to trigger a lot of errors from files no longer available.

You are required to clear your upload queue when the user connects to a new project. We do not want media that was queued to upload to an old project to start popping up into a new one. When a device or application is paired to a new project, check if that project matches the last one it was connected to. If it does not, clear your queue with extreme prejudice.
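The pairing check reduces to a few lines. Here `store` is a plain dict standing in for your persistence layer:

```python
def on_project_paired(store: dict, project_id: str) -> None:
    """Clear queued uploads when the device pairs to a different project."""
    if store.get("last_project") != project_id:
        store["queue"] = []  # old project's media must never leak into the new one
    store["last_project"] = project_id
```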

Next up

You are now an uploading expert! If you haven’t already, we encourage you to reach out to our team. We look forward to hearing from you!