Help Understanding how /files/list_folder/continue

Explorer | Level 3

I am working on an AWS serverless app that queries a specific Dropbox folder tree for daily PDF uploads. My process and config/code are below. I _think_ I understand how the API endpoint is supposed to work, but the results I am seeing do not match what I expect, so the most likely explanation is that I actually do not understand how it works.

My App:

My app is simple. I watch a Dropbox folder for daily PDF uploads and, at the end of the day, download and merge all new PDFs into a single PDF. I am using the dropbox-v2-api Node.js package: https://www.npmjs.com/package/dropbox-v2-api

I have no indication that the Node.js package is not working as it should.

On a given day there are between 150 and 200 PDFs, ranging from a couple of MB up to 500 MB. I'm not having any issues with the size of the PDFs. That part works great.

The Process:

  1. At 2:00 AM every morning I call the get_latest_cursor endpoint and store the cursor.
  2. At 3:00 PM every afternoon I call /files/list_folder/continue, passing the stored cursor.
  3. My config has:
    1. recursive = true
    2. include_deleted = false
    3. limit = 2000

What I expect to see is a list of all files added to the folder tree since that day's 2:00 AM cursor, excluding entries with the ".tag" : "deleted" property.

What I am seeing is that ".tag" : "deleted" entries are included in the results. My result set should be around 400 files (the PDFs plus supporting JPG and PSD files), but I am seeing about 900 because all of the deleted files are included even though I am explicitly excluding them.
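
Whatever the cause of the extra entries, one option is to filter the page on the client side. A minimal sketch, assuming each entry carries a '.tag' of 'file', 'folder' or 'deleted' as in the Dropbox API's metadata objects; the helper name selectNewPdfs and the sample entries are hypothetical. Note that a file that was modified, not only added, would also come back as a 'file' entry.

```javascript
// Keep only live PDF file entries from a list_folder/continue page.
// Entries carry a '.tag' of 'file', 'folder' or 'deleted'.
function selectNewPdfs(entries) {
  return entries.filter(
    (entry) =>
      entry['.tag'] === 'file' &&
      entry.name.toLowerCase().endsWith('.pdf') // case-insensitive extension check
  );
}

// Hypothetical sample page: an added PDF, a supporting JPG, a folder and a deletion.
const page = [
  { '.tag': 'file', name: 'Order-1001.pdf' },
  { '.tag': 'file', name: 'Order-1001.jpg' },
  { '.tag': 'folder', name: 'Archive' },
  { '.tag': 'deleted', name: 'Old.PDF' },
];

console.log(selectNewPdfs(page).map((e) => e.name)); // [ 'Order-1001.pdf' ]
```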

/**
 * Get latest Dropbox cursor and store it in S3
 * @param event
 * @param context
 * @param callback
 * @returns {*}
 */
// AWS Lambda invokes Node.js handlers as (event, context, callback)
module.exports.getLatestCursor = (event, context, callback) => {

    const
        s3 = getAwsS3()
        , dropbox = getDropbox();

    console.log('[index.js][getLatestCursor] STEP 01 -- Get Latest Cursor')

    dropbox({
        resource: 'files/list_folder/get_latest_cursor',
        parameters: {
            path                           : process.env.DROPBOX_WATCH_FOLDER,
            recursive                      : true,
            include_deleted                : false,
            include_non_downloadable_files : false,
            include_media_info             : false,
            limit                          : 2000
        }
    }, (err, result, response) => {

        // Report failures to Lambda instead of only logging them
        if (err) {
            console.error(err);
            return callback(err);
        }

        // result holds the parsed JSON response, e.g. { cursor: '...' }
        console.log('[index.js][getLatestCursor] STEP 02 -- Prepare Latest Cursor', JSON.stringify(result))

        const params = {
            Bucket : process.env.S3_BUCKET_NAME,
            Key    : `cursor/${process.env.CURSOR_FILENAME}`,
            Body   : Buffer.from(JSON.stringify(result)),
            ACL    : 'private'
        };

        s3.upload(params, (err, data) => {
            // Throwing inside an async callback would crash the function; pass the error on
            if (err) return callback(err);
            console.log('[index.js][getLatestCursor] STEP 03 -- Save Latest Token to S3', data)
            callback(null, data)
        });
    });
};
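
The callback style in this code nests quickly once S3 calls are added. A minimal sketch of a promise wrapper, assuming only the call shape used above, dropbox({ resource, parameters }, (err, result, response) => ...); callDropbox is a hypothetical helper name, not part of dropbox-v2-api.

```javascript
// Wrap a callback-style client with the signature used above so calls can be awaited.
// callDropbox is a hypothetical helper, not part of the dropbox-v2-api package.
function callDropbox(dropbox, resource, parameters) {
  return new Promise((resolve, reject) => {
    dropbox({ resource, parameters }, (err, result) => {
      if (err) return reject(err);
      resolve(result); // the parsed JSON result
    });
  });
}

// Usage (inside an async function):
//   const { cursor } = await callDropbox(dropbox, 'files/list_folder/get_latest_cursor', {
//     path: process.env.DROPBOX_WATCH_FOLDER,
//     recursive: true,
//     include_deleted: false,
//   });
```

Node's built-in util.promisify should behave similarly here, since it resolves with the first value passed after the error argument.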

Then, my call to list files:

/**
 * Get file list
 * @param event
 * @param context
 * @param callback
 * @returns {*}
 */
module.exports.getFileList = (event, context, callback) => {

    const
          bucket   = process.env.S3_BUCKET_NAME
        , prefix   = 'cursor'
        , filename = process.env.CURSOR_FILENAME

    const s3 = getAwsS3();

    const params = {
        Bucket: bucket,
        Key: `${prefix}/${filename}`
    }

    s3.getObject(params, (err, data) => {

        if (err) {
            console.error(err);
            return callback(err);
        }

        console.log('[index.js][getFileList] @@@ DATA @@@', data)

        const
            dropbox  = getDropbox()
            // data.Body is a Buffer, so decode the stored JSON before reading .cursor
            , cursor = JSON.parse(data.Body.toString('utf8')).cursor
            , listParams = {
                resource: 'files/list_folder/continue',
                parameters: {
                    cursor : cursor
                }
            };

        dropbox(listParams, (err, result, response) => {
            if (err) {
                console.error(err);
                return callback(err);
            }

            console.log('[index.js][getFileList] @@@ ENTRIES @@@', result)

            let iter = 0
                , _debug_downloads = []
                , _debug_all_hr = []
                , _debug_all_lr = []
                , entries = []
                , downloadables = []

            if (result && typeof result.entries !== 'undefined') {
                entries = result.entries;

                saveToS3Bucket(`debug/${formatDate(false, true)}/${kTS}/entries.json`, JSON.stringify(entries));

                entries = entries.map((entry, i) => {
                    // Process the entries
                });

                // Storing results for debugging. Ignore this. It works fine.
                saveToS3Bucket(`debug/${formatDate(false, true)}/${kTS}/proofs.csv`, _debug_all_lr.join("\r\n"));
                saveToS3Bucket(`debug/${formatDate(false, true)}/${kTS}/artwork.csv`, _debug_all_hr.join("\r\n"));
                saveToS3Bucket(`debug/${formatDate(false, true)}/${kTS}/downloadables.csv`, _debug_downloads.join("\r\n"));
            }

            // process result set.
        });
    });
};
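
One detail worth checking when the cursor is read back: with the AWS SDK for JavaScript v2, s3.getObject() returns data.Body as a Buffer in Node.js, so a property such as data.Body.cursor is undefined until the stored JSON is decoded. A minimal sketch, assuming the stored object has the { cursor: ... } shape returned by get_latest_cursor; readStoredCursor and the sample cursor value are hypothetical.

```javascript
// Decode the JSON stored in S3 and pull out the cursor.
// readStoredCursor is a hypothetical helper; it accepts a Buffer or a string.
function readStoredCursor(body) {
  const text = Buffer.isBuffer(body) ? body.toString('utf8') : String(body);
  const parsed = JSON.parse(text);
  if (typeof parsed.cursor !== 'string' || parsed.cursor.length === 0) {
    throw new Error('Stored cursor file does not contain a "cursor" string');
  }
  return parsed.cursor;
}

// Round trip: what a { cursor } object looks like once stored, and read back.
const body = Buffer.from(JSON.stringify({ cursor: 'AAG-example-cursor' }));
console.log(body.cursor);            // undefined: a Buffer has no .cursor field
console.log(readStoredCursor(body)); // AAG-example-cursor
```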

My questions are:

  1. Why are the deleted files being included? They shouldn't be, should they?
  2. Am I using the cursor and the list_folder/continue correctly?

Thanks in advance.

1 Accepted Solution

Re: Help Understanding how /files/list_folder/continue

Dropboxer

Thanks for the detailed writeup! 

First, I should note that the 'dropbox-v2-api' package isn't made by Dropbox itself, so I can't really offer support for that or say what it may actually be doing under the hood, but I'll take a look here and advise with respect to the Dropbox API.

Anyway, looking over your code and description, it looks like you have the right basic idea here for the most part (though it will depend on exactly what you're trying to accomplish of course), but there are a few things to note:

  • Regarding the deleted entries, note that the 'include_deleted' parameter only applies to "entries for files and folders that used to exist but were deleted", that is, at the time of the call to /2/files/list_folder/get_latest_cursor. Files or folders that are deleted after that call will still be reported later by /2/files/list_folder/continue as 'deleted'. Does this account for the entries you're seeing? Essentially, it may just be items deleted between 2:00 AM and 3:00 PM. If that doesn't seem to be it, perhaps you could share a sample so we can take a look? Feel free to open an API ticket privately if you'd prefer.
  • Also, I don't see you checking the 'has_more' value returned by /2/files/list_folder/continue. You're not guaranteed to get everything back in one call, so you should check that 'has_more' value and call back again to /2/files/list_folder/continue as described in the /2/files/list_folder documentation.
  • Also, it may or may not make sense for your use case, but you don't need to call /2/files/list_folder/get_latest_cursor every day. You can store and re-use the last cursor you received to be able to just receive updates about changes that have occurred since you received that cursor. That would let you track all changes over time. As written, it seems you're not monitoring anything that occurs between 3:00 PM and 2:00 AM. The Detecting Changes guide may be helpful, if you haven't already read it.
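
The has_more point can be sketched as a loop that keeps calling /2/files/list_folder/continue with the newest cursor until has_more is false, returning all entries plus the final cursor, which can then be stored and reused for the next run. This assumes a hypothetical listFolderContinue(cursor) function that resolves to the endpoint's JSON body ({ entries, cursor, has_more }), for example a promise wrapper around the dropbox-v2-api call; the fake pages only simulate a multi-page response.

```javascript
// Follow has_more: call list_folder/continue with the newest cursor until
// Dropbox reports nothing left. listFolderContinue is a hypothetical function
// that resolves to the endpoint's JSON body: { entries, cursor, has_more }.
async function listAllChanges(listFolderContinue, startCursor) {
  const entries = [];
  let cursor = startCursor;
  let hasMore = true;
  while (hasMore) {
    const page = await listFolderContinue(cursor);
    entries.push(...page.entries);
    cursor = page.cursor; // always move to the cursor from the latest page
    hasMore = page.has_more;
  }
  return { entries, cursor }; // store this cursor for the next run
}

// Fake endpoint that returns three small pages, to show that one call is not enough.
const fakePages = {
  c0: { entries: [{ name: 'a.pdf' }], cursor: 'c1', has_more: true },
  c1: { entries: [{ name: 'b.pdf' }], cursor: 'c2', has_more: true },
  c2: { entries: [], cursor: 'c3', has_more: false },
};

listAllChanges(async (c) => fakePages[c], 'c0').then(({ entries, cursor }) => {
  console.log(entries.length, cursor); // 2 c3
});
```

Keeping the cursor from the last page, rather than the one the run started with, is what avoids both re-reading old entries and missing changes between runs.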

Re: Help Understanding how /files/list_folder/continue

Explorer | Level 3

Ok, that clarifies the deleted file issue perfectly. I thought it meant to exclude all deleted entries, so that makes sense.

I do need to update the cursor every day because I only want the files for that day. If I don't update the cursor, won't that give me all files since the cursor? So that would give me all of the files going back possibly multiple days. That won't fit my use case.

Yes, I caught the has_more issue last night. Technically that is the right thing to do, but it won't have any tangible impact since I have the limit set to 2,000 files and we are nowhere near that. The most I have seen to date is 900 in one day, including the deleted files. I will deploy the update once I have tested it; I did not include it here because the new code is not fully tested.

Thanks for the response. I think this resolves my issue. I mainly needed to confirm I'm understanding the way the API endpoint works and to clarify on the deleted files.

Re: Help Understanding how /files/list_folder/continue

Dropboxer

Great, I'm glad that helps.

To further clarify a few things though:

"I only want the files for that day. If I don't update the cursor, won't that give me all files since the cursor? So that would give me all of the files going back possibly multiple days."

One option is to always update your stored cursor to be the latest cursor you last received, e.g., from /2/files/list_folder/continue itself. When you then call /2/files/list_folder/continue again, you'll only receive updates that occurred since that call to /2/files/list_folder/continue that gave you that cursor. 
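
A daily job following that option might look like the sketch below. loadCursor, saveCursor and listChangesSince are hypothetical hooks (for example S3 reads/writes and a has_more loop over /2/files/list_folder/continue), and makeMemoryHooks is an in-memory stand-in used only to show the stored cursor moving forward from one run to the next.

```javascript
// One daily run: start from the stored cursor, collect what changed since then,
// then store the cursor returned at the end so the next run resumes from there.
// loadCursor, saveCursor and listChangesSince are hypothetical hooks.
async function runDaily({ loadCursor, saveCursor, listChangesSince }) {
  const cursor = await loadCursor();
  const { entries, cursor: nextCursor } = await listChangesSince(cursor);
  await saveCursor(nextCursor); // only reached if the listing succeeded
  return entries;
}

// In-memory stand-in for the hooks (hypothetical, for illustration only).
function makeMemoryHooks(initialCursor, changesByCursor) {
  let stored = initialCursor;
  return {
    loadCursor: async () => stored,
    saveCursor: async (c) => { stored = c; },
    listChangesSince: async (c) => changesByCursor[c],
    current: () => stored,
  };
}

const demo = makeMemoryHooks('cursor-day0', {
  'cursor-day0': { entries: [{ name: 'mon.pdf' }], cursor: 'cursor-day1' },
  'cursor-day1': { entries: [{ name: 'tue.pdf' }], cursor: 'cursor-day2' },
});

(async () => {
  console.log((await runDaily(demo)).map((e) => e.name)); // [ 'mon.pdf' ]
  console.log((await runDaily(demo)).map((e) => e.name)); // [ 'tue.pdf' ]
})();
```

Saving the cursor only after the listing succeeds means a failed run is retried from the same point next time, instead of silently skipping that day's files.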

"it won't have any tangible impact since I have the limit set to 2,000 files and we are not coming anywhere close to that. "

Be aware that the "limit" is only an approximate upper bound on how many items Dropbox will return per page; it does not affect the lower bound. In some cases, Dropbox may have to return far fewer entries per page, in which case you do need to check and follow 'has_more'.