Discuss Dropbox Developer & API
I am working on an AWS serverless app that queries a specific Dropbox folder tree for daily PDF uploads. My process and config/code are below. I _think_ I understand how the API endpoint is supposed to work, but the results I am seeing do not match what I expect, so the most likely explanation is that I actually do not understand how it works.
My App:
My app is simple: I watch a Dropbox folder for daily PDF uploads and, at the end of the day, download and merge all new PDFs into a single PDF. I am using the Node.js Dropbox package here: https://www.npmjs.com/package/dropbox-v2-api
I have no indication that the NodeJS package is not working as it should.
On a given day there are between 150 and 200 PDFs, anywhere from a couple of MB up to 500 MB. I'm not having any issues with the size of the PDFs; that part works great.
The Process:
What I expect to see is a list of all files added to the folder tree each day since the 2:00 AM cursor, excluding files with the ".tag" : "deleted" property.
What I am seeing is that ".tag" : "deleted" files are included in the results. So whereas my result set should be around 400 files, including supporting JPG and PSD files as well as the PDFs, I am seeing about 900 files because all of the deleted files are included even though I am explicitly excluding them.
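For what it's worth, since the continue endpoint reports delete events as entries with a ".tag" of "deleted", one workaround is to drop those entries client-side after each page is received. A minimal sketch (the helper name is mine, not from the Dropbox API):

```javascript
// Hypothetical helper: /2/files/list_folder/continue reports deletions as
// entries whose '.tag' is 'deleted', so they can be filtered out client-side
// before any download/merge processing.
function filterOutDeleted(entries) {
  return entries.filter((entry) => entry['.tag'] !== 'deleted');
}

module.exports = { filterOutDeleted };
```

This keeps regular 'file' and 'folder' entries untouched and only removes the delete notifications.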
/**
 * Get latest Dropbox cursor
 * @param event
 * @param callback
 * @returns {*}
 */
module.exports.getLatestCursor = (event, callback) => {
  const s3      = getAwsS3()
      , dropbox = getDropbox();

  console.log('[index.js][getLatestCursor] STEP 01 -- Get Latest Cursor');

  dropbox({
    resource: 'files/list_folder/get_latest_cursor',
    parameters: {
      path                           : process.env.DROPBOX_WATCH_FOLDER,
      recursive                      : true,
      include_deleted                : false,
      include_non_downloadable_files : false,
      include_media_info             : false,
      limit                          : 2000
    }
  }, (err, result, response) => {
    if (err) { return console.log(err); }

    console.log('[index.js][getLatestCursor] STEP 02 -- Prepare Latest Cursor', JSON.stringify(response));

    const params = {
      Bucket : process.env.S3_BUCKET_NAME,
      Key    : `cursor/${process.env.CURSOR_FILENAME}`,
      Body   : Buffer.from(JSON.stringify(response.body)),
      ACL    : 'private'
    };

    s3.upload(params, (err, data) => {
      console.log('[index.js][getLatestCursor] STEP 03 -- Save Latest Token to S3', data);
      if (err) throw err;
      callback(null, data);
    });
  });
};
Then, my call to list files:
/**
 * Get file list
 * @param event
 * @param callback
 * @returns {*}
 */
module.exports.getFileList = (event, callback) => {
  const bucket   = process.env.S3_BUCKET_NAME
      , prefix   = 'cursor'
      , filename = process.env.CURSOR_FILENAME;

  const s3 = getAwsS3();

  const params = {
    Bucket: bucket,
    Key: `${prefix}/${filename}`
  };

  s3.getObject(params, (err, data) => {
    if (err) {
      console.error(err);
      throw err;
    }

    console.log('[index.js][getFileList] @@@ DATA @@@', data);

    const dropbox = getDropbox()
        // data.Body is a Buffer, so parse the stored JSON before reading the cursor
        , cursor  = JSON.parse(data.Body.toString()).cursor
        , params  = {
            resource: 'files/list_folder/continue',
            parameters: {
              cursor: cursor
            }
          };

    dropbox(params, (err, result, response) => {
      if (err) {
        console.error(err);
        throw err;
      }

      console.log('[index.js][getFileList] @@@ ENTRIES @@@', result);

      let iter = 0
        , _debug_downloads = []
        , _debug_all_hr = []
        , _debug_all_lr = []
        , entries = []
        , downloadables = [];

      if (result && typeof result.entries !== 'undefined') {
        entries = result.entries;
        saveToS3Bucket(`debug/${formatDate(false, true)}/${kTS}/entries.json`, JSON.stringify(entries));

        entries = entries.map((entry, i) => {
          // Process the entries
        });

        // Storing results for debugging. Ignore this. It works fine.
        saveToS3Bucket(`debug/${formatDate(false, true)}/${kTS}/proofs.csv`, _debug_all_lr.join("\r\n"));
        saveToS3Bucket(`debug/${formatDate(false, true)}/${kTS}/artwork.csv`, _debug_all_hr.join("\r\n"));
        saveToS3Bucket(`debug/${formatDate(false, true)}/${kTS}/downloadables.csv`, _debug_downloads.join("\r\n"));
      }
      // process result set.
    });
  });
};
My questions are:
Thanks in advance.
Thanks for the detailed writeup!
First, I should note that the 'dropbox-v2-api' package isn't made by Dropbox itself, so I can't really offer support for that or say what it may actually be doing under the hood, but I'll take a look here and advise with respect to the Dropbox API.
Anyway, looking over your code and description, it looks like you have the right basic idea here for the most part (though it will depend on exactly what you're trying to accomplish of course), but there are a few things to note:
Ok, that clarifies the deleted file issue perfectly. I thought it meant to exclude all deleted files, so that makes sense.
I do need to update the cursor every day because I only want the files for that day. If I don't update the cursor, won't that give me all files since the cursor? So that would give me all of the files going back possibly multiple days. That won't fit my use case.
Yes, I caught the has_more issue last night. Technically that is the right thing to do, but it won't have any tangible impact since I have the limit set to 2,000 files and we are not coming anywhere close to that. The most I have seen to date is 900 in one day, including the deleted files. I am deploying the update once I have tested it, but did not include it here because I have not fully tested the new code.
Thanks for the response. I think this resolves my issue. I mainly needed to confirm I'm understanding the way the API endpoint works and to clarify on the deleted files.
Great, I'm glad that helps.
To further clarify a few things though:
"I only want the files for that day. If I don't update the cursor, won't that give me all files since the cursor? So that would give me all of the files going back possibly multiple days."
One option is to always update your stored cursor to be the latest cursor you last received, e.g., from /2/files/list_folder/continue itself. When you then call /2/files/list_folder/continue again, you'll only receive updates that occurred since that call to /2/files/list_folder/continue that gave you that cursor.
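That rolling-cursor pattern might look like the sketch below. `callDropbox`, `loadCursor`, and `saveCursor` are hypothetical stand-ins for the dropbox-v2-api call and the S3 persistence in the original code, not real library APIs:

```javascript
// Sketch: after each /2/files/list_folder/continue call, persist the cursor
// from that response, so the next daily run only receives changes made after
// this call. The three function parameters are assumed wrappers.
async function fetchDailyChanges(callDropbox, loadCursor, saveCursor) {
  const oldCursor = await loadCursor();                 // e.g. read from S3
  const result = await callDropbox('files/list_folder/continue', {
    cursor: oldCursor,
  });
  await saveCursor(result.cursor);                      // tomorrow starts here
  return result.entries;
}

module.exports = { fetchDailyChanges };
```

Storing `result.cursor` (rather than calling get_latest_cursor again) avoids a gap in which files uploaded between the two calls would be missed.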
"it won't have any tangible impact since I have the limit set to 2,000 files and we are not coming anywhere close to that. "
Be aware that the "limit" is only an approximate upper bound on how many items Dropbox will return per page; it does not affect the lower bound. In some cases, Dropbox may have to return far fewer entries per page, in which case you do need to check and follow 'has_more'.
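A pagination loop that follows 'has_more' could be sketched like this (`callDropbox` is a hypothetical wrapper for the API call, not part of any real package):

```javascript
// Sketch: even with 'limit' set high, a page may contain far fewer entries,
// so keep calling /2/files/list_folder/continue while has_more is true.
async function listAllChanges(callDropbox, cursor) {
  const entries = [];
  let result;
  do {
    result = await callDropbox('files/list_folder/continue', { cursor });
    entries.push(...result.entries);
    cursor = result.cursor;       // each page carries the next cursor
  } while (result.has_more);
  return { entries, cursor };     // final cursor is the one worth persisting
}

module.exports = { listAllChanges };
```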
I know this was posted about 3 years ago, but in case you read this: is there a TTL on the cursor? So if I don't run the script again for, say, 6 months, would the cursor, at least in theory, still be valid? I'm just curious about this. Our solution has been humming along perfectly for the past 2-3 years (thanks to your help), but this would be good to know.
@iconify These list_folder cursors do not expire by default, so they could last that long. However it is possible for them to become invalid at any time, due to some kinds of operations in the account, at which point /2/files/list_folder/continue would return a 'reset' error. That being the case, make sure your app is able to catch that error at any point; it would then need to restart from /2/files/list_folder to get a new valid cursor.
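One way to catch that case is sketched below. `callDropbox` is a hypothetical wrapper assumed to reject with the parsed API error body; the exact error shape your client surfaces may differ:

```javascript
// Sketch: a list_folder cursor can become invalid at any time, in which case
// /2/files/list_folder/continue returns a 'reset' error; recover by
// restarting from /2/files/list_folder to get a fresh cursor.
async function continueWithReset(callDropbox, cursor, path) {
  try {
    return await callDropbox('files/list_folder/continue', { cursor });
  } catch (err) {
    // Assumed error shape: { error: { '.tag': 'reset' } }
    if (err && err.error && err.error['.tag'] === 'reset') {
      // Cursor is no longer valid; re-list the folder tree from scratch.
      return callDropbox('files/list_folder', { path, recursive: true });
    }
    throw err;
  }
}

module.exports = { continueWithReset };
```

Note that the fresh listing returns the folder's current contents rather than a delta, so the app may need to de-duplicate against files it already processed.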