searchContinueV2 seems to not work on the last batch it should find

tfrysinger
Explorer | Level 4

I am using the Java2 SDK to perform a search for directories matching a pattern, then deleting them. The directories are named as:

 

ROOT_DIR_CONSTANT/p_<YYYYMMDD>/p_<ID 1>_<SUB ID 1>

 

For example here are three directories:

 

/TestingRootDir/p_20211018/p_44c92de4-ddd6-42dc-823a-9dkkf35j5sds_test001

/TestingRootDir/p_20211018/p_44c92de4-ddd6-42dc-823a-9dkkf35j5sds_test002

/TestingRootDir/p_20211018/p_44c92de4-ddd6-42dc-823a-9dkkf35j5sds_test003

 

You will notice that all 3 are identical except for the last underscore ("_") section.

 

Here is my code:

 

String shareDir = "p_20211018/p_44c92de4-ddd6-42dc-823a-9dkkf35j5sds";
SearchV2Result result;
DbxClientV2 dbx;
try {
    dbx = DropboxClientFactory.getClient();
    SearchV2Builder searchV2Builder = dbx.files().searchV2Builder(shareDir);
    SearchOptions.Builder optionsBuilder = SearchOptions.newBuilder();
    optionsBuilder = optionsBuilder.withFilenameOnly(true)
            .withMaxResults(1L)
            .withPath("/TestingRootDir/" + subDirToCheck);
    searchV2Builder = searchV2Builder.withOptions(optionsBuilder.build());
    result = searchV2Builder.start();
    List<SearchMatchV2> matches = result.getMatches();
    boolean keepGoin = true;
    while (matches.size() > 0 && keepGoin) {
        for (SearchMatchV2 match : matches) {
            MetadataV2 meta = match.getMetadata();
            String path = meta.getMetadataValue().getPathDisplay();
            DeleteResult dResult = dbx.files().deleteV2(path);
            Timber.d("Lifecycle: found match: %s, delete result: %s", match.toString(), dResult.toString());
        }
        if (result.getHasMore()) {
            result = dbx.files().searchContinueV2(result.getCursor());
            matches = result.getMatches();
        } else {
            keepGoin = false;
        }
    }
} catch (Exception e) {
    Timber.d("Lifecycle: searchV2 had error: %s", e.getMessage());
}

What I am seeing is:

 

1. The call to searchV2Builder.start() finds the first directory, and result.getHasMore() returns true. 'matches' has a single item, which is the directory ending in 'test003'. This is all correct - the limit I specified in the search was '1L', so it should have matched just one, and since there are 3 directories there that would match, 'getHasMore()' should return true.

2. The for loop is entered and the delete on the first match within 'matches' works correctly, the 'test003' directory is removed.

3. After the single match item in 'matches' is processed, the for loop exits.

4. Since result.getHasMore() returned TRUE, searchContinueV2() is called and works correctly, so the variable 'matches' now contains a new single item, which is the directory ending in 'test001'. result.getHasMore() is TRUE again, which is also correct, since there is still one more directory that matches the search criteria.

5. Steps 2-3 above are executed and work correctly, the 'test001' directory is removed.

6. Step 4 occurs and since result.getHasMore() returns true, the searchContinueV2() is called. However this time, while getHasMore() is set to FALSE within the returned result (which is correct), there is no entry in 'matches', i.e. it is empty. The third and final directory was not found! WHY NOT?

 

If I change the code to not use the searchContinueV2, but instead just have it continuously call searchV2Builder.start().getMatches(), then it works as expected (i.e. all 3 directories are found and deleted):

 

List<SearchMatchV2> matches = searchV2Builder.start().getMatches();
while (matches.size() > 0) {
    for (SearchMatchV2 match : matches) {
        MetadataV2 meta = match.getMetadata();
        String path = meta.getMetadataValue().getPathDisplay();
        dbx.files().deleteV2(path);
    }
    matches = searchV2Builder.start().getMatches();
}

 

Alternatively, if I change the 'batch size' from 1L to 2L, and either keep my initial directory list at 3 or increase it by adding a 'test004' directory, the original code using searchContinueV2 still fails.

 

With a batch size of 2 and 5 directories that should match, it succeeded in finding the first batch of 2 (directories ending in 005 and 004), then upon using the cursor it found a second batch of only 1 (directory ending in 002).

 

With a batch size of 2 and using the second approach, I got 3 cycles through the loop with the last 'matches' containing a single item (the fifth item) and so all directories were properly deleted.

 

Thanks for any help.

6 Replies

Greg-DB
Dropbox Staff

The Dropbox API search functionality is not always guaranteed to completely/immediately return results, especially when changes are actively being made, due to indexing and caching.

 

That being the case, for a scenario like this, I recommend using listFolder/listFolderBuilder and listFolderContinue instead as that functionality is not subject to the same limitations.

tfrysinger
Explorer | Level 4

Greg - 

 

Is there a way with listFolder to have it match on a partial directory name? That is why I was using search. While I know the root dir name, I can specifically check the name of the directory associated with a date, AND I know the first ID, I do not know in advance what the last portion of the directory name might be. That is, given this directory path:

 

/TestingRootDir/p_20211018/p_44c92de4-ddd6-42dc-823a-9dkkf35j5sds_test001

 


I can derive this much ahead of time: /TestingRootDir/p_20211018/p_44c92de4-ddd6-42dc-823a-9dkkf35j5sds

 

But the last '_test001' may be any combination of characters. So what I was trying to do is get the list of directories associated with the first portion of the directory name only.

 

Can listFolder do that?

 

Thanks!

 

 

Greg-DB
Dropbox Staff

No, the listFolder functionality doesn't support partial matching like that. You would need to list the nearest parent folder and then apply whatever filtering you want on the results client-side.
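
A minimal sketch of the client-side filtering described above. It assumes the folder names have already been collected from the parent folder, for example by reading `getName()` from each entry returned by `listFolder` and `listFolderContinue`. `PrefixFilter` and `matching` are hypothetical names for illustration, not part of the Dropbox SDK, and the case-insensitive comparison is an assumption based on Dropbox treating paths case-insensitively.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration; not part of the Dropbox SDK.
final class PrefixFilter {
    private PrefixFilter() {}

    // Returns the names that start with the given prefix, ignoring case.
    // With the SDK, the names would come from listing the parent folder:
    // dbx.files().listFolder(parent), then listFolderContinue(cursor) while
    // getHasMore() is true, reading getName() from each entry.
    static List<String> matching(List<String> folderNames, String prefix) {
        String wanted = prefix.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String name : folderNames) {
            if (name.toLowerCase(Locale.ROOT).startsWith(wanted)) {
                hits.add(name);
            }
        }
        return hits;
    }
}
```

Calling `PrefixFilter.matching(names, "p_44c92de4-ddd6-42dc-823a-9dkkf35j5sds")` on the names listed under `/TestingRootDir/p_20211018` would keep only the `_test001`, `_test002`, and `_test003` style folders, which can then be passed to `deleteV2` one by one.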

Здравко
Legendary | Level 20

@tfrysinger wrote:
...
...
boolean keepGoin = true;
while (matches.size() > 0 && keepGoin) {
...

What I am seeing is:

...

6. Step 4 occurs and since result.getHasMore() returns true, the searchContinueV2() is called. However this time, while getHasMore() is set to FALSE within the returned result (which is correct), there is no entry in 'matches', i.e. it is empty. The third and final directory was not found! WHY NOT?

...

Hi @tfrysinger,

@Greg-DB has already explained the reason why 'there is no entry in' matches. In addition, I would like to point out a logical error in the loop in your code. Why not fix it instead of looking for alternatives? 🤔🤷

At any moment 'matches' can be empty even though matching entries exist! What happens in that case? If it happens on the first call, the loop is never entered and your code does nothing; otherwise the loop stops at the first such moment (in your case, at the last entry).

A reasonable question here is why you use an unreliable combination of values as the loop criterion. There is always a chance of a 'pause', which can confuse your code. The Dropbox API (including all SDKs, Java in your case) performs asynchronous calls whose results can NOT be predicted or forced ('batch size' is nothing more than a declared wish, which the server tries to honor and in most cases does, but without guarantees)! Your code should expect everything that is possible, including an 'unusual' empty batch (which is actually usual). Wouldn't it be easier to change the loop criterion to something like:

...
    while (matches.size() > 0 || result.getHasMore()) {
...

... or similar? 🙂 Just a small inversion of the logic, so the loop can no longer break off processing this way. Have you tried something like that? 😉

Hope this gives direction.
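
A minimal sketch of a loop that tolerates empty batches, written against a stand-in type rather than the real SDK. `PageDrain` and `Page` are hypothetical: `Page` models only the two values the loop depends on (the matches and the has-more flag, as exposed by `getMatches()` and `getHasMore()`), and the iterator stands in for `start()` followed by repeated `searchContinueV2(cursor)` calls. It addresses loop termination only; it does not make the search index return entries it has not caught up with.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical stand-in types for illustration; not part of the Dropbox SDK.
final class PageDrain {
    private PageDrain() {}

    // One batch of results: its matches and whether more batches follow.
    static final class Page {
        final List<String> matches;
        final boolean hasMore;

        Page(List<String> matches, boolean hasMore) {
            this.matches = matches;
            this.hasMore = hasMore;
        }
    }

    // Collects every match and stops only when a page reports hasMore == false.
    // An empty page with hasMore == true does not end the loop.
    static List<String> drain(Iterator<Page> pages) {
        List<String> all = new ArrayList<>();
        while (pages.hasNext()) {
            Page page = pages.next();
            all.addAll(page.matches); // may add nothing; that is fine
            if (!page.hasMore) {
                break;
            }
        }
        return all;
    }
}
```

The design choice is that only the has-more flag decides when to stop; the size of a batch is never used as a termination signal.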

tfrysinger
Explorer | Level 4

Thanks but your suggestion wouldn't help in this case anyway. As Greg explained, the Search API isn't guaranteed to return the exact results due to indexing and caching. I didn't know that until his answer. And sure, I could optimize the loop with an OR statement instead of testing at the bottom but that really is just that - perhaps an improvement in readability but nothing that would solve the problem he brought up.

 

Instead, as he suggested, I will use listFolder to list to the level that I can, then iterate through to find and delete the relevant subdirs.
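
A minimal sketch of the first step of that plan: splitting the known partial path into the parent folder to list and the name prefix to match against each entry. `PartialPath` and `split` are hypothetical names, not part of the Dropbox SDK. The empty-string parent for top-level folders assumes the API convention that the root folder is addressed as "" rather than "/".

```java
// Hypothetical helper for illustration; not part of the Dropbox SDK.
final class PartialPath {
    final String parent;     // folder to list, e.g. with listFolder
    final String namePrefix; // leading part of the folder names to match

    private PartialPath(String parent, String namePrefix) {
        this.parent = parent;
        this.namePrefix = namePrefix;
    }

    // Splits at the last '/': everything before it is the parent folder,
    // everything after it is the known beginning of the folder name.
    static PartialPath split(String partialPath) {
        int slash = partialPath.lastIndexOf('/');
        if (slash < 0) {
            throw new IllegalArgumentException("path must contain '/': " + partialPath);
        }
        // For a top-level folder the parent becomes "", the assumed root path.
        return new PartialPath(partialPath.substring(0, slash),
                               partialPath.substring(slash + 1));
    }
}
```

For the path in this thread, `split("/TestingRootDir/p_20211018/p_44c92de4-ddd6-42dc-823a-9dkkf35j5sds")` yields the parent `/TestingRootDir/p_20211018` and the prefix `p_44c92de4-ddd6-42dc-823a-9dkkf35j5sds`.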

Здравко
Legendary | Level 20

@tfrysinger wrote:

... And sure, I could optimize the loop with an OR statement instead of testing at the bottom but that really is just that - perhaps an improvement in readability but nothing that would solve the problem he brought up.

...

Hmm... 😁 Are you sure it's something related to the "readability" only?! 😉 Yes, I'm not sure it will solve the issue (in particular), but it's definitely a BUG!!! Sleeping bug, able to "wake up" at some point, at least.

Be careful!
