An actually USEFUL guide on uploading large files via API

TheFlodge
Helpful | Level 6

Hello fellow Dropsketeers!

So after writing what I hope is viewed as a SCATHING indictment of Dropbox's awful CLI support and API functionality/documentation, I decided that, while it was a somewhat satisfying way to blow off steam, it doesn't really help anyone in the same predicament (other than letting them know they're not insane or stupid; it's just terribly implemented and poorly documented). It occurred to me that some of the pageviews on my rant were probably from people Googling terms in order to try to find something useful that could actually help them figure this out. That being the case, I decided to channel my frustration into a more useful outlet by writing this simple guide to uploading large files from the CLI via the API.

This guide makes a few assumptions:

1) You have a Dropbox account (or a free trial or something - access to Dropbox, basically).
2) You are working from Ubuntu or some other Linux distro's CLI, not a GUI, and know how to do simple things like run cURL and read output from "ls".
3) You have a Dropbox developer account (if not, go here - they're free for Dropbox users and easy to set up).
4) You have created your first "App" in the developer section (if not, click "App console" in the upper right, then "Create app" and follow this guide).

For the purposes of this guide, I am using cURL to illustrate how to upload the files. However, if you're working with files a few gigs or larger, you're going to want to script this; otherwise you'll spend the better part of an afternoon getting your content uploaded, because you have to upload things in 150MB chunks, and that chunking is done manually, by you, then POSTed one at a time, by you. In fairness, it's possible to run concurrent uploads, but I'm not ready to further abuse myself trying to translate their documentation into a useful format again...perhaps next time I'm feeling depressed and want to self-harm (it's a joke, people, relax - harming yourself is bad and if you're doing it or thinking about it, talk to a mental health professional).

Disclaimer: This guide is provided as-is and might be a little quick and dirty, as I'm also watching Squid Game while writing it (and that show just DEMANDS your attention, especially with original audio and subtitles), but I'll include CLI examples and output to illustrate exactly what I did in order to upload my example file, a 640MB gzipped archive of pictures (sexy pictures ☞-_^☞). I am not professionally affiliated with Dropbox in any way, other than as a customer bound to them by my annual subscription period (and, increasingly, the white hot fire of my rage).



Overview:
1) Split files into 150MB chunks
2) Generate a token
3) /upload_session/start
4) /upload_session/append_v2
5) /upload_session/finish
6) Verify
7) Scripting

 

----- 1) Split files into 150MB chunks -----

Okay, so the first thing we need to do here is choose a file and prepare it for upload. Let's take a look at my example file, which lives in "~/move/camera/" (I'm already in ~/, so I'm just using relative paths):

$ ls -la move/camera/camera6.tar.gz
-rw-rw-r-- 1 theflodge theflodge 670934575 Sep 11 2019 move/camera/camera6.tar.gz

This is clearly over the 150MB limit set for "/upload", hence the need to use "/upload_session/". (Fun fact: If you try to upload something over the limit, you don't get a useful error message; instead you get a 4xx HTML response page with no explanation of what happened or why. Helpful!)

Let's split this guy into chunks - in my case, I want to dump them into a directory in order to keep things neat, so I'll create a dir and output there. Additionally, "split" labels stuff with an alpha format by default (aa, ab, ac), but I like numbers better, so we'll pass the -d option to get that output (something something "giving it the D", laughter):

$ mkdir move/camera/camera6.tar.gz.d/
$ split -db 150000000 move/camera/camera6.tar.gz move/camera/camera6.tar.gz.d/camera6.tar.gz_
$ ls -la move/camera/camera6.tar.gz.d/
total 655268
drwxrwxr-x 2 theflodge theflodge 4096 Nov 7 08:41 .
drwxrwxr-x 3 theflodge theflodge 16384 Nov 7 08:41 ..
-rw-rw-r-- 1 theflodge theflodge 150000000 Nov 7 08:41 camera6.tar.gz_00
-rw-rw-r-- 1 theflodge theflodge 150000000 Nov 7 08:41 camera6.tar.gz_01
-rw-rw-r-- 1 theflodge theflodge 150000000 Nov 7 08:41 camera6.tar.gz_02
-rw-rw-r-- 1 theflodge theflodge 150000000 Nov 7 08:41 camera6.tar.gz_03
-rw-rw-r-- 1 theflodge theflodge 70934575 Nov 7 08:41 camera6.tar.gz_04

If you're not familiar with the split command, what I've done here, besides using -d to get numeric suffixes, is declare the size in bytes of the chunks I want with -b 150000000, the file I want to chop up (move/camera/camera6.tar.gz), and the destination and naming convention for the chunks (move/camera/camera6.tar.gz.d/camera6.tar.gz_). So:

$ split -<options> <filename> <destination>
-<options> : "-db 150000000" = d for numeric suffixes, b for "this many bytes", 150000000 because we want 150MB chunks (well, technically that's about 143 MiB, but I don't want to get into all that).
<filename> : "move/camera/camera6.tar.gz" = my sweet, innocent example file, about to be rent asunder.
<destination> : "move/camera/camera6.tar.gz.d/camera6.tar.gz_" = the folder I made plus the naming convention I want to use - if you put nothing here, you'll get "x00, x01" etc., which is fine, I just prefer to give everything descriptive names in case I forget due to the head trauma from repeatedly smashing my face into the keys in frustration.

If everything went well, there should be a directory full of chunks now. We can see in my output that there are five happy little chunks waiting to be POSTed, with the last little runty chunk being the leftover bytes that don't add up to 150MB (poor lil' guy).
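
A quick optional sanity check before moving on (not strictly necessary, just cheap insurance): the chunks, concatenated in order, should match the original byte for byte - cmp stays silent and exits 0 if they do:

$ cat move/camera/camera6.tar.gz.d/camera6.tar.gz_* | cmp - move/camera/camera6.tar.gz && echo "chunks match"
chunks match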


----- 2) Generate a token -----

 

At the top of this guide, I mentioned that I'm assuming you're a customer, have signed up for a free developer account, and have successfully created your first App (although I provided some links to do those things if you haven't already). You'll need an App from this point forward.

 

Go to your App (easiest way is just to go to the main developer page, then click "App console" in the upper right, followed by your App). Right here on the "Settings" page, there's a section to generate an OAuth 2 token - but don't do that yet. First, we need to change some permissions, and generated tokens are linked to permissions, so if you generate one now, then change the permissions, you'll need to generate a new one or it won't use the new permissions.

 

Go to the "Permissions" tab up at the top, then scroll down to "Files and Folders". To upload files, you need to enable "files.content.write" (and you might as well do read also, since verifying your upload via API would require this, whenever you get around to writing a script). Scroll to the bottom and click "Submit" on the little grey bar, which should then show a little green checkmark and "Permissions change successful", indicating you have successfully submitted to your Dropbox masters, peasant.

Head back over to the "Settings" tab - you'll probably be here a lot during testing and script writing, since the token they provide expires after a few hours. It IS possible to tell the token to never expire, but I don't recommend doing so because A) it's a security no-no, and B) that feature will be deprecated soon (or so they claim). It would be nice if you could SET an expiration time, but that's evidently too much to ask. Just leave this site open in the background when you're working on this, and if you need a new token, refresh the page and click "Generate" in the OAuth 2 section - do that now, and copy the token somewhere for use in the next steps.
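
One small convenience I'll lean on in a few sketches later in this guide (purely my own convention, nothing Dropbox requires): stash the token in a shell variable so you're not pasting that monster string into every single command.

$ export DROPBOX_TOKEN="sl.PASTE_YOUR_OWN_TOKEN_HERE"

Any of the requests below can then use --header "Authorization: Bearer $DROPBOX_TOKEN" instead of the literal token - just remember to re-export it whenever you generate a fresh one.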


----- 3) /upload_session/start -----

You can find their documentation for this API call here, but unless you're a masochist, you probably won't find it terribly useful, other than to provide the basic request format. Go ahead and copy their request format, but the first thing I recommend you do is chop off the last line, with --data-binary (and the trailing backslash from the previous line), since it's not really useful and will just confuse you more than help you - we'll do the actual uploading with the /upload_session/append_v2 call, not this one.

Think of the calls kind of like this:

/upload_session/start - establish connection
/upload_session/append_v2 - upload data
/upload_session/finish - close connection

Only the second call actually transmits any file data. Yes, it's technically possible to cram some data in the first and last calls, but for the sake of clarity, I'd advise against it for the moment - what you do after reading this guide is between you and God (or Satan if you're cool), but you came here for help, so just take my word on it for the time being, eh?

All we really want from this API call is a session_id, because that's what we need in order to string together our 150MB chunks into a single file. If your file is less than 150MB, then what are you even doing here? Just use /upload instead of torturing yourself with all this, you nut.

Below is the actual output from my test, showing the request and response:

$ curl -X POST https://content.dropboxapi.com/2/files/upload_session/start \
> --header "Authorization: Bearer sl.A73FgUsMf6w1LuuUY7HGcXbNr5EIbdDJBLbQxs_-qximeK5G5ldg0o6jAbOS2eH0QnCSLoxNxfiMVPlmME8M1qOPgylY9ug3RmMWGiqwzwHA2AoXZ_uOz0uvdsAYf8wy0Pd1sij4" \
> --header "Dropbox-API-Arg: {\"close\": false}" \
> --header "Content-Type: application/octet-stream"
{"session_id": "pid_upload_session:ABIELlngYaqIbBy6DPrwsPRoBupuyqmFiLpruEDqwewseXeX"}

Don't worry - my token and session_id have been changed to protect the innocent (Dragnet theme plays).

Most of the above is ripped straight from the help docs, with one notable exception - the token (Authorization: Bearer <token>). You'll need YOUR token in the same place in YOUR request. Additionally, as I said previously, I've chopped out the last line and the trailing backslash from the previous line, because I don't want to send any data with this request - all I want is that sweet, sweet session_id. Note that the whole value is needed, pid_upload_session: included, as you'll see in the next step.

Note: If you get a response with an error about "permissions" or "wrong scope" instead of session_id, that means you didn't listen to me earlier when I said "don't generate a token until AFTER you've changed the permissions" - you're either using a token from before you changed the permissions or you screwed up on setting the permissions. Go back to that step and try again.
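
If you're already thinking ahead to scripting, here's a rough sketch of catching the session_id straight into a shell variable instead of copy-pasting it. This assumes jq is installed for the JSON parsing and that your token is sitting in $DROPBOX_TOKEN as suggested back in step 2 - adapt to taste:

$ SESSION_ID=$(curl -s -X POST https://content.dropboxapi.com/2/files/upload_session/start \
> --header "Authorization: Bearer $DROPBOX_TOKEN" \
> --header "Dropbox-API-Arg: {\"close\": false}" \
> --header "Content-Type: application/octet-stream" | jq -r .session_id)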


----- 4) /upload_session/append_v2 -----

The documentation for this API call is here, but again, all we care about is the basic request format. This time, leave the --data-binary part in there - we're actually uploading data now. Note that it IS important to use --data-binary, not just --data or -d - the plain data options strip carriage returns and newlines out of the file, which will screw up a binary upload royally.

This time I'm going to show output from three requests, back to back, to properly illustrate the correct method:

$ curl -X POST https://content.dropboxapi.com/2/files/upload_session/append_v2 \
> --header "Authorization: Bearer sl.A73FgUsMf6w1LuuUY7HGcXbNr5EIbdDJBLbQxs_-qximeK5G5ldg0o6jAbOS2eH0QnCSLoxNxfiMVPlmME8M1qOPgylY9ug3RmMWGiqwzwHA2AoXZ_uOz0uvdsAYf8wy0Pd1sij4" \
> --header "Dropbox-API-Arg: {\"cursor\": {\"session_id\": \"pid_upload_session:ABIELlngYaqIbBy6DPrwsPRoBupuyqmFiLpruEDqwewseXeX\",\"offset\": 0},\"close\": false}" \
> --header "Content-Type: application/octet-stream" \
> --data-binary @move/camera/camera6.tar.gz.d/camera6.tar.gz_00
null

$ curl -X POST https://content.dropboxapi.com/2/files/upload_session/append_v2 \
> --header "Authorization: Bearer sl.A73FgUsMf6w1LuuUY7HGcXbNr5EIbdDJBLbQxs_-qximeK5G5ldg0o6jAbOS2eH0QnCSLoxNxfiMVPlmME8M1qOPgylY9ug3RmMWGiqwzwHA2AoXZ_uOz0uvdsAYf8wy0Pd1sij4" \
> --header "Dropbox-API-Arg: {\"cursor\": {\"session_id\": \"pid_upload_session:ABIELlngYaqIbBy6DPrwsPRoBupuyqmFiLpruEDqwewseXeX\",\"offset\": 150000000},\"close\": false}" \
> --header "Content-Type: application/octet-stream" \
> --data-binary @move/camera/camera6.tar.gz.d/camera6.tar.gz_01
null

$ curl -X POST https://content.dropboxapi.com/2/files/upload_session/append_v2 \
> --header "Authorization: Bearer sl.A73FgUsMf6w1LuuUY7HGcXbNr5EIbdDJBLbQxs_-qximeK5G5ldg0o6jAbOS2eH0QnCSLoxNxfiMVPlmME8M1qOPgylY9ug3RmMWGiqwzwHA2AoXZ_uOz0uvdsAYf8wy0Pd1sij4" \
> --header "Dropbox-API-Arg: {\"cursor\": {\"session_id\": \"pid_upload_session:ABIELlngYaqIbBy6DPrwsPRoBupuyqmFiLpruEDqwewseXeX\",\"offset\": 300000000},\"close\": false}" \
> --header "Content-Type: application/octet-stream" \
> --data-binary @move/camera/camera6.tar.gz.d/camera6.tar.gz_02
null

There are some differences here from /upload_session/start, starting with the response - instead of giving us an actual useful reply of some kind, we get null - this is Dropbox's way of telling you the upload succeeded. No, I don't know why they chose to do it that way, and yes, I think it would be lots more helpful for there to be actual usable data here, but hey, we're living in the world we've got, not the one we want, you know?

Something else different is the value of the Dropbox-API-Arg header - there's more in there, including some stuff we need to populate. You'll see our friend session_id from the last step has joined the party in his designated spot (so make sure YOUR session_id is in there, too), and he brought along ANOTHER friend (without even asking!) - offset, which, for the first request, is set to 0. This is an important piece of information, because it tells Dropbox where to pick up after the last request left off. Since this is our first actual upload with data in it, there's nothing to append to yet, so offset is 0 on this request. As we submit additional requests, we'll need to change this number to reflect the data we've already submitted - luckily, this is generally pretty simple, since most of these files are exactly 150000000 bytes, so we just increment the number by that amount for each subsequent request.

The final difference is the path to the chunk, shown after --data-binary. Note that the path is preceded by the "@" symbol - this tells cURL to read and transmit the file at that path instead of treating the string as literal data. You should ALWAYS have the "@" symbol after --data-binary, separated from --data-binary by a space, but do NOT put a space between the "@" symbol and the path. The correct format is:
--data-binary @move/camera/camera6.tar.gz.d/camera6.tar.gz_00

Once you've sent off the first request, make sure your second request has the correct offset value and the filename in the path for --data-binary ends with a number one higher than your last ("_00" becomes "_01", "_01" becomes "_02", etc). You can see on each of my subsequent requests that the offset has been increased and the filename has changed to reflect the next file in the session.

Note: If you input the wrong offset value, the API is kind enough to let you know and give you the correct value. However, at this stage that value should always be a multiple of 150000000 - if you're getting something more random looking, like "128632849" or some such, you probably used --data or -d like I said not to, and you'll need to start a new session. The last chunk DOES have a weird size, but we won't need to worry about that until the next step.
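
If you'd rather compute an offset than eyeball it, summing the sizes of the chunks you've already sent gives you the exact number - a habit that pays off once you script this. The sketch below uses GNU stat (on macOS/BSD the equivalent is stat -f %z):

$ stat -c %s move/camera/camera6.tar.gz.d/camera6.tar.gz_0[0-2] | awk '{sum += $1} END {print sum}'
450000000

That 450000000 is the offset to use when sending chunk _03.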

----- 5) /upload_session/finish -----

Up ahead! Is that...a glimmer of light in the gloom? A waft of fresh air? Could it be...?
Yes, adventurer! You are indeed seeing the light at the end of the tunnel and feeling the crisp, fresh air of the open world beyond - we're almost done.

This final request (documented here) is very similar to the /upload_session/append_v2 requests we just did, however it does get a little trickier, because the last chunk is that cute little runty one, so it has fewer bytes than the others, which means the offset value is no longer just a multiple of 150000000 - you'll need to add that runty, uneven size to the previous offset to find the current one. Oh no! MATHS!!

Don't fret, my numerically challenged friends! Just look at the filesize in bytes from the ls -la command we did waaay back in step 1 - in my case, you'll see it's listed as "670934575". If all the uploads have been done correctly with /upload_session/append_v2, that means the whole file has been uploaded, so the offset should be the entire filesize - we can just copy that value from the ls -la output and drop that into our offset value, no thinky parts required.
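
(If you're scripting this, stat hands you that number directly, with nothing to parse out of ls - again GNU stat, with stat -f %z being the macOS/BSD flavor.)

$ stat -c %s move/camera/camera6.tar.gz
670934575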

$ curl -X POST https://content.dropboxapi.com/2/files/upload_session/finish \
> --header "Authorization: Bearer sl.A73FgUsMf6w1LuuUY7HGcXbNr5EIbdDJBLbQxs_-qximeK5G5ldg0o6jAbOS2eH0QnCSLoxNxfiMVPlmME8M1qOPgylY9ug3RmMWGiqwzwHA2AoXZ_uOz0uvdsAYf8wy0Pd1sij4" \
> --header "Dropbox-API-Arg: {\"cursor\": {\"session_id\": \"pid_upload_session:ABIELlngYaqIbBy6DPrwsPRoBupuyqmFiLpruEDqwewseXeX\",\"offset\": 670934575},\"commit\": {\"path\": \"/move/camera/camera6.tar.gz\",\"mode\": \"add\",\"autorename\": true,\"mute\": false,\"strict_conflict\": false}}" \
> --header "Content-Type: application/octet-stream"

{"name": "camera6.tar.gz", "path_lower": "/move/camera/camera6.tar.gz", "path_display": "/move/camera/camera6.tar.gz", "id": "id:UzbsaLFsMIcAAAAAAAAeTA", "client_modified": "2021-11-07T07:51:35Z", "server_modified": "2021-11-07T07:51:36Z", "rev": "5d02e2547034580e30a11", "size": 670934575, "is_downloadable": true, "content_hash": "d278ad1a1937c1d08ec0804e1f0111dd48b48cd73449c04b8eea637540497db4"}

I've added a space between the request and response to make this a bit more readable, since the response actually HAS something to read this time around. Before we get into the response, however, let's look at how the request has changed.

You'll notice --data-binary is gone again, because we're done uploading data - we uploaded all the chunks in the last step, so there's nothing to send here. Another important element is the path value in the Dropbox-API-Arg header - this is the first time we've needed this value, and that's because we're defining where in our Dropbox we want to store this file now that the uploading is done.

Basically, the /upload_session/ stuff gets uploaded to some temp directory on Dropbox's servers, and doesn't actually make its way into your folders until the location is specified in this API call. I suspect this is to prevent a bunch of screwed up partial data from getting saved all over customer accounts from abandoned upload sessions - it's honestly pretty smart to do it this way, because it would be a total nightmare to track down and eliminate all these little rogue chunks otherwise. This is a case of "an ounce of prevention is worth a pound of cure" - by using temp files, they prevent the problem from ever happening.

What it means for us, however, is that this is where we define the location where we want our file to be saved. It's important to note, however, that you can't just save it anywhere (well, depending on how you set up your App waaaaay back at the beginning). If you chose "App folder", then all API uploads to this App go to their own folder, but if you chose "Full Dropbox", you can put it anywhere (one of my favorite scenes from "Cruel Intentions"). For the purposes of this guide, I'm going to assume you chose "App folder", because that's what I did, which is relevant because it impacts where your file is stored, regardless of what path you enter.

After setting up the App, I saw that my Dropbox contained a new top level folder called "Apps", and that inside this folder was another folder with the name of the App I created (crApp, in my case, because I am a mature grownup). Any path I specify is relative to that directory - if I uploaded to "/folder/foo.txt", it would live at /Apps/crApp/folder/foo.txt. Keep that in mind when searching for your files after upload.

Assuming your API call went well, you'll be provided with a response showing you the details of your finished file - this is the same information you would get if you uploaded a single file using /upload (assuming that file was under the 150MB limit for that API call). Most of it is pretty self-explanatory, so I won't go into detail here, just note that it shows the filename and path and you should be happy.

----- 6) Verify -----

Hurray! Your file is uploaded!! OR IS IT? As far as I'm concerned, an upload hasn't happened until I can download the file and verify everything is there and valid - a corrupted file doesn't do me a hell of a lot of good. Luckily, upon downloading the file on my Mac, I was pleased to discover that everything was intact.

If you followed this guide and got the same kind of output I did, you SHOULD find that your test asset is in perfect working order. You now officially know how to upload files over 150MB to Dropbox. Let's go out and celebrate!!
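
If you'd rather verify from the CLI instead of a GUI client, here's a rough sketch using the /files/download endpoint - this is exactly why I suggested enabling files.content.read back in step 2. It assumes the $DROPBOX_TOKEN variable from earlier and the same path we committed in step 5:

$ curl -s -X POST https://content.dropboxapi.com/2/files/download \
> --header "Authorization: Bearer $DROPBOX_TOKEN" \
> --header "Dropbox-API-Arg: {\"path\": \"/move/camera/camera6.tar.gz\"}" \
> --output camera6.check.tar.gz
$ cmp camera6.check.tar.gz move/camera/camera6.tar.gz && echo "round trip verified"
round trip verified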


----- 7) Scripting -----

Personally, I like to celebrate by writing a script! Woo!!

As I mentioned at the beginning, actually uploading large files this way is cumbersome and horrible - a 200GB file, to use an arbitrary number, will have 1334 separate chunks to be uploaded - have fun incrementing offset by 150000000 1333 times, and getting all the way up to _1333 on your --data-binary entries. Even a relatively small 10GB file would need 67 separate calls to /upload_session/append_v2.

That would all be completely absurd, so obviously, something better is needed. The reason I made this guide with cURL is so that anyone could look at the basic information and adapt it to their language of choice (I used Python + Requests, personally).
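
For what it's worth, here's a bare-bones shell sketch of what steps 3 through 5 look like once automated. It is NOT my actual script (that was Python + Requests), and it makes a few assumptions: the file has already been split as in step 1, jq is installed, GNU coreutils provide stat, and your token lives in $DROPBOX_TOKEN.

#!/usr/bin/env bash
# Sketch: chunked upload of one pre-split file via upload_session.
set -euo pipefail

SRC="move/camera/camera6.tar.gz"     # local file, already split into $SRC.d/ per step 1
DEST="/move/camera/camera6.tar.gz"   # path to commit inside Dropbox (or your App folder)
API="https://content.dropboxapi.com/2/files"

# 3) start a session and grab the session_id
SESSION_ID=$(curl -s -X POST "$API/upload_session/start" \
  --header "Authorization: Bearer $DROPBOX_TOKEN" \
  --header "Dropbox-API-Arg: {\"close\": false}" \
  --header "Content-Type: application/octet-stream" | jq -r .session_id)

# 4) append every chunk in order, tracking the offset by summing real chunk sizes
OFFSET=0
for CHUNK in "$SRC.d"/*; do
  echo "appending $CHUNK at offset $OFFSET"
  curl -s -X POST "$API/upload_session/append_v2" \
    --header "Authorization: Bearer $DROPBOX_TOKEN" \
    --header "Dropbox-API-Arg: {\"cursor\": {\"session_id\": \"$SESSION_ID\",\"offset\": $OFFSET},\"close\": false}" \
    --header "Content-Type: application/octet-stream" \
    --data-binary @"$CHUNK"
  echo
  OFFSET=$((OFFSET + $(stat -c %s "$CHUNK")))   # GNU stat; "stat -f %z" on macOS/BSD
done

# 5) finish the session - OFFSET now equals the full file size
curl -s -X POST "$API/upload_session/finish" \
  --header "Authorization: Bearer $DROPBOX_TOKEN" \
  --header "Dropbox-API-Arg: {\"cursor\": {\"session_id\": \"$SESSION_ID\",\"offset\": $OFFSET},\"commit\": {\"path\": \"$DEST\",\"mode\": \"add\",\"autorename\": true,\"mute\": false,\"strict_conflict\": false}}" \
  --header "Content-Type: application/octet-stream"

Summing the real chunk sizes with stat means the runty last chunk takes care of itself, so the offset handed to /upload_session/finish ends up being the full file size automatically.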

5 Replies

TheFlodge
Helpful | Level 6

There was actually more of this guide at the end regarding writing a script, but the site incorrectly claimed my post was over 30000 characters so I had to truncate it (it was actually less than 25000, but maybe they could tell more than half of it was dad jokes and snark and that counted against me). Then the post got marked as spam for a while and my irritation reached a level that made it impossible to care or invest any more effort into this.

The mods un-spamiffied it after I made a rage-filled post about it (the "report" feature was useless, but a hateful post DID seem to help), but I'm too emotionally drained from this whole rollercoaster to expend any further effort.

If the guide helps someone have less frustration than I did, I'm happy something good came of it.

Greg-DB
Dropbox Staff

Thanks for taking the time to share your experience and write up and post this guide. I hope it can be of use to others here.

fabianbitter
New member | Level 2

Thanks for this article. I wrote a script for handling this: http://github.com/bitterdev/dropbox-upload-script

Cheesefondue
New member | Level 2

Thanks sooooooooooo much. This was so, so useful.

TheFlodge
Helpful | Level 6
Thanks for telling me, buddy! Glad I could help at least one person, makes it all worthwhile. 🙂