Forum Discussion
Raheel S.
11 years ago Explorer | Level 3
Offtopic: To Check or Not to check for duplication. (DBDatastores)
This is purely off topic. It's not Dropbox related at all; it's really a sync design question. (If this is against the rules, my apologies to the mods.)
I've been struggling with how to avoid saving duplicate DBRecords.
We know that record.recordID is the main identifier. You set it once when the record is created and never change it. (Obviously, why would anyone?)
Each DBRecord has a text field, record["text"]. All incoming DBRecords are stored in my local container, using the recordID as the identifier.
At some point the User:
- Delinks Dropbox
- Links Dropbox
My understanding is that delinking will delete the entire cache (which is good). The newly relinked DBDatastores will then try to send all those records again.
It is here that I don't really know what the right move is, one that's in line with the end user's expectations.
So this is what I planned.
Upon incoming Datastores and their records:
- Check whether I have any local data at all. If I don't, assume it's a new app and/or device installation. Add all DBRecords. Check for nothing.
- I do have some local data. The next move is:
  - Add all DBRecords as if they're new, OR
  - Check for duplication, and only add those that are completely new.
Both are doable. But now I'm thinking: what if the user intentionally wants duplicated entries, to modify them later? So I arrive at the conclusion that I should give users a choice: "Same entries found. Continue to add them as new, or cancel?"
So my question is: is this unnecessary complexity? Should I even be bothering with this? It bothers me because I don't like having these duplicates. I use this app myself, and I don't like deleting them later.
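The plan above can be sketched roughly as follows. This is only an illustration in Python, not Datastore API code: `DuplicatePolicy` and `merge_incoming` are hypothetical names, records are reduced to plain text strings, and the policy value stands in for whatever the user picks in the "add as new or cancel" prompt.

```python
from enum import Enum


class DuplicatePolicy(Enum):
    ADD_ALL = "add_all"        # import everything, duplicates included
    SKIP_DUPLICATES = "skip"   # import only texts not already stored


def merge_incoming(local_texts, incoming_texts, policy):
    """Return the local list after merging the incoming texts."""
    # No local data: treat it as a fresh install and add everything unchecked.
    if not local_texts or policy is DuplicatePolicy.ADD_ALL:
        return list(local_texts) + list(incoming_texts)

    # Local data exists and the user chose to skip duplicates.
    seen = set(local_texts)
    merged = list(local_texts)
    for text in incoming_texts:
        if text not in seen:
            seen.add(text)
            merged.append(text)
    return merged
```

Keeping the decision in one function means the user-facing choice only has to select a policy; the sync code itself does not change.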
How did the Dropbox engineers handle sync (local datastores transfer, etc.)?
And on a side note, is MD5 still fast enough for a simple text check, or should I look at SHA or something?
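On the hashing side note, a small Python sketch using the standard `hashlib` module shows that switching algorithms is a one-argument change. The helper name `content_key` is made up for illustration. For short text either digest is cheap to compute; MD5 is no longer considered collision resistant against deliberate attacks, so SHA-256 is the safer default if the text can come from untrusted sources, while accidental duplicate detection works with either.

```python
import hashlib


def content_key(text, algorithm="sha256"):
    """Hex digest of the UTF-8 bytes of text, usable as a duplicate key."""
    return hashlib.new(algorithm, text.encode("utf-8")).hexdigest()


md5_key = content_key("abc", "md5")   # 32 hex characters
sha_key = content_key("abc")          # 64 hex characters
```

Whether to normalize the text first (trimming whitespace, case folding) is an app-level decision, since it changes which entries count as "the same".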
Thanks
3 Replies
- Steve M. 11 years ago
Dropbox Staff
You may want to take a look at https://www.dropbox.com/developers/blog/84/initializing-data-in-datastores-with-getorinsert, which shows how to use getOrInsert to avoid duplicates. The basic idea is that you need to model your data in such a way that if two records are considered "duplicates," then they'll have the same record ID. Then getOrInsert will take care of avoiding duplicates altogether, which lets you stop worrying about it.
- Raheel S. 11 years ago Explorer | Level 3
Hey Steve!
Yeah, I fully utilise getOrInsert. It works okay. The above scenario is not really about Datastore handling but about what happens after I receive those DBRecords. In essence it's about handling duplication locally. I don't use LocalDatastores. Here's what I do.
I have a local db where I store the recordID and the text. (Yes, I don't use local Datastores, aah, for some reasons.)
For incoming DBRecords (at sync):
- I check if the recordID already exists. If yes, I update localDb.record.text = record[@"text"]. If it does not exist, I create a new row in my localDB with that recordID and text. This is fine. The uniqueness of a DBRecord is the record["text"]; that's it.
I save the recordID and text field on the localDB. OK, great.
Now, I open up this very same DBRecord. I want to update the text of this very DBRecord. I don't want to create a new one. I already know its recordID, because I stored it on my localDB.
I get that DBRecord using getOrInsert, and I update the record["text"] field (text --> text1). And sure enough the record is updated with the new data. BUT it is here that the relationship between the recordID and the text field has become invalid. When this DBRecord was created, recordID == md5("text"). NOW: md5("text1") != recordID.
Next time, when I create a new record with text1 (like importing from the same file) that is equal to the text1 from above, I have potentially created a duplicate DBRecord.
- I know that text1 exists already in a DBRecord.
- But I cannot avoid duplication by looking for md5(text1).
- That DBRecord has the older md5(text) as its ID, but its text was later updated to text1.
I want to be able to give myself a choice to avoid this kind of duplication. What would you guys do here?
It's way early in the morning right now. I hope what I wrote is understandable. This isn't a datastore problem. It's my problem :/
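One possible way to keep the recordID stable and still detect the duplicate described above is to index the local DB by a hash of each record's current text, separately from the recordID. The Python sketch below is an assumption-laden illustration: `LocalDB`, `upsert`, and `find_duplicates` are hypothetical names, and an in-memory dict stands in for the real local database.

```python
import hashlib
from collections import defaultdict


def text_hash(text):
    """MD5 hex digest of the text, as used for record IDs in this thread."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


class LocalDB:
    """In-memory stand-in for the local database (hypothetical)."""

    def __init__(self):
        self.text_by_id = {}                 # recordID -> current text
        self.ids_by_hash = defaultdict(set)  # hash of current text -> recordIDs

    def upsert(self, record_id, text):
        """Apply an incoming or edited record and keep the hash index current."""
        old = self.text_by_id.get(record_id)
        if old is not None:
            # The old text no longer belongs to this record.
            self.ids_by_hash[text_hash(old)].discard(record_id)
        self.text_by_id[record_id] = text
        self.ids_by_hash[text_hash(text)].add(record_id)

    def find_duplicates(self, text):
        """Record IDs whose current text hashes to the same value."""
        return set(self.ids_by_hash.get(text_hash(text), ()))


db = LocalDB()
rid = text_hash("text")      # recordID derived from the original text
db.upsert(rid, "text")
db.upsert(rid, "text1")      # edited in place, recordID unchanged
existing = db.find_duplicates("text1")  # contains rid although md5("text1") != rid
```

Before importing a new entry, the app could call `find_duplicates` and only then decide whether to ask the user about adding it again.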
- Steve M. 11 years ago
Dropbox Staff
Do you consider two records with the same ID but different text to be "the same" or not? (Is it a "duplicate" for both to exist?)
If the answer is "yes," then I think what you're doing should work well already. If the answer is "no," then you need to change your data model. (You should probably model editing an existing record as a delete and then an insert of a new record.)
I don't think anyone but you will be able to answer this question, since we don't know what your app is or does. :-)
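For readers who land on the "no" branch, the delete-then-insert model could look roughly like the Python sketch below. It is not the Dropbox SDK: `Table`, `get_or_insert`, `add_text`, and `edit_text` are hypothetical stand-ins, and deriving the record ID from an MD5 of the text follows the scheme described earlier in the thread.

```python
import hashlib


def record_id_for(text):
    """Derive the record ID from the text, so equal text means equal ID."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


class Table:
    """In-memory stand-in for a datastore table (not the Dropbox SDK)."""

    def __init__(self):
        self.records = {}  # recordID -> {"text": ...}

    def get_or_insert(self, record_id, defaults):
        # Return the existing record, or store and return a copy of defaults.
        return self.records.setdefault(record_id, dict(defaults))

    def delete(self, record_id):
        self.records.pop(record_id, None)


def add_text(table, text):
    return table.get_or_insert(record_id_for(text), {"text": text})


def edit_text(table, old_text, new_text):
    """Model an edit as delete + insert so the ID always matches the text."""
    table.delete(record_id_for(old_text))
    return add_text(table, new_text)


table = Table()
add_text(table, "text")
add_text(table, "text")             # same ID, so no second record
edit_text(table, "text", "text1")
add_text(table, "text1")            # re-import after the edit, still one record
```

The trade-off is that an edited entry gets a new record ID, so anything that stored the old ID has to be updated as well.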