<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Offtopic: To Check or Not to check for duplication. (DBDatastores) in Dropbox API Support &amp; Feedback</title>
    <link>https://www.dropboxforum.com/t5/Dropbox-API-Support-Feedback/Offtopic-To-Check-or-Not-to-check-for-duplication-DBDatastores/m-p/8468#M362</link>
    <description>Offtopic: To Check or Not to check for duplication. (DBDatastores)</description>
    <pubDate>Wed, 29 May 2019 09:45:49 GMT</pubDate>
    <dc:creator>Raheel S.</dc:creator>
    <dc:date>2019-05-29T09:45:49Z</dc:date>
    <item>
      <title>Offtopic: To Check or Not to check for duplication. (DBDatastores)</title>
      <link>https://www.dropboxforum.com/t5/Dropbox-API-Support-Feedback/Offtopic-To-Check-or-Not-to-check-for-duplication-DBDatastores/m-p/8468#M362</link>
      <description>&lt;P&gt;This is purely off topic: not Dropbox related at all, but really a sync design question. (If this is against the rules, my apologies to the mods.)&lt;/P&gt;

&lt;P&gt;I've been struggling with an idea to avoid saving duplicate DBRecords.&lt;/P&gt;

&lt;P&gt;We know that &lt;CODE&gt;record.recordID&lt;/CODE&gt; is the main identifier. It is set once, when the record is created, and never changes for the record's lifetime. (Obviously, why would anyone change it?)&lt;/P&gt;

&lt;P&gt;Each DBRecord has a text field, &lt;CODE&gt;record["text"]&lt;/CODE&gt;. All incoming DBRecords are stored in my local container, using the &lt;CODE&gt;recordID&lt;/CODE&gt; as the identifier.&lt;/P&gt;

&lt;P&gt;At some point the user:&lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt;Delinks Dropbox&lt;/LI&gt;
&lt;LI&gt;Links Dropbox again&lt;/LI&gt;
&lt;/OL&gt;

&lt;P&gt;My understanding is that delinking deletes the whole local cache (which is good). The newly relinked &lt;CODE&gt;DBDatastores&lt;/CODE&gt; will then try to send all those records again.&lt;/P&gt;

&lt;P&gt;This is where I don't really know what the right move is, one that's in line with the end user's expectations.&lt;/P&gt;

&lt;P&gt;So this is what I planned.&lt;/P&gt;

&lt;P&gt;When datastores and their records come in:&lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt;Check whether I have any local data at all. If I don't, assume it's a new app and/or device installation and add all DBRecords. &lt;STRONG&gt;Check for nothing.&lt;/STRONG&gt;
&lt;/LI&gt;
&lt;LI&gt;
&lt;EM&gt;I do have some local data&lt;/EM&gt;. The next move is:

&lt;UL&gt;
&lt;LI&gt;
&lt;STRONG&gt;Add all &lt;CODE&gt;DBRecords&lt;/CODE&gt; as if they're new&lt;/STRONG&gt;, &lt;STRONG&gt;OR&lt;/STRONG&gt;&lt;/LI&gt;
&lt;LI&gt;Check for duplication and only add those that are completely new.&lt;/LI&gt;
&lt;/UL&gt;
&lt;/LI&gt;
&lt;/OL&gt;

&lt;P&gt;Both are doable. But now I'm thinking to myself: what if the user intentionally wants duplicated entries, to make modifications later? So I arrive at the conclusion that I should give users a choice: &lt;STRONG&gt;Same entries found. Continue to add them as new or Cancel?&lt;/STRONG&gt;&lt;/P&gt;

&lt;P&gt;So my question is: Is this unnecessary complexity? Should I even be bothering about this? It bothers me because I don't like having these duplicates. I use this app myself, and I don't like deleting them later.&lt;/P&gt;

&lt;P&gt;How did the Dropbox engineers handle sync (localdatastores transfer etc.)?&lt;/P&gt;

&lt;P&gt;And on a side note, is MD5 still fast enough for a simple text check, or should I look at SHA or something?&lt;/P&gt;

&lt;P&gt;Thanks&lt;/P&gt;</description>
      <pubDate>Wed, 29 May 2019 09:45:49 GMT</pubDate>
      <guid>https://www.dropboxforum.com/t5/Dropbox-API-Support-Feedback/Offtopic-To-Check-or-Not-to-check-for-duplication-DBDatastores/m-p/8468#M362</guid>
      <dc:creator>Raheel S.</dc:creator>
      <dc:date>2019-05-29T09:45:49Z</dc:date>
    </item>
    <item>
      <title>Re: Offtopic: To Check or Not to check for duplication. (DBDatastores)</title>
      <link>https://www.dropboxforum.com/t5/Dropbox-API-Support-Feedback/Offtopic-To-Check-or-Not-to-check-for-duplication-DBDatastores/m-p/8469#M363</link>
      <description>&lt;P&gt;You may want to take a look at &lt;A href="https://www.dropbox.com/developers/blog/84/initializing-data-in-datastores-with-getorinsert" rel="nofollow noreferrer"&gt;https://www.dropbox.com/developers/blog/84/initializing-data-in-datastores-with-getorinsert&lt;/A&gt;, which shows how to use &lt;CODE&gt;getOrInsert&lt;/CODE&gt; to avoid duplicates. The basic idea is that you need to model your data in such a way that if two records are considered "duplicates," then they'll have the same record ID. Then &lt;CODE&gt;getOrInsert&lt;/CODE&gt; will take care of avoiding duplicates altogether, which lets you stop worrying about it.&lt;/P&gt;</description>
      <pubDate>Mon, 26 Jan 2015 11:40:15 GMT</pubDate>
      <guid>https://www.dropboxforum.com/t5/Dropbox-API-Support-Feedback/Offtopic-To-Check-or-Not-to-check-for-duplication-DBDatastores/m-p/8469#M363</guid>
      <dc:creator>Steve M.</dc:creator>
      <dc:date>2015-01-26T11:40:15Z</dc:date>
    </item>
    <item>
      <title>Re: Offtopic: To Check or Not to check for duplication. (DBDatastores)</title>
      <link>https://www.dropboxforum.com/t5/Dropbox-API-Support-Feedback/Offtopic-To-Check-or-Not-to-check-for-duplication-DBDatastores/m-p/8470#M364</link>
      <description>&lt;P&gt;Hey Steve!&lt;/P&gt;

&lt;P&gt;Yeah, I fully utilise &lt;CODE&gt;getOrInsert&lt;/CODE&gt;. It works okay. The above scenario is not really about datastore handling but about what happens after I receive those &lt;CODE&gt;DBRecords&lt;/CODE&gt;. In essence it's about handling duplication locally. I don't use &lt;CODE&gt;LocalDatastores&lt;/CODE&gt;.&lt;/P&gt;

&lt;P&gt;Here's what I do.&lt;BR /&gt;
I have a local DB where I store the &lt;CODE&gt;recordID&lt;/CODE&gt; and the &lt;CODE&gt;text&lt;/CODE&gt;. (Yes, I don't use LocalDatastores, for some reasons.)&lt;/P&gt;

&lt;P&gt;For each incoming DBRecord (at sync):&lt;/P&gt;

&lt;OL&gt;
&lt;LI&gt;I check whether the &lt;CODE&gt;recordID&lt;/CODE&gt; already exists. If it does, I update &lt;CODE&gt;localDb.record.text = record[@"text"]&lt;/CODE&gt;. If it does not, I create a new row in my &lt;CODE&gt;localDB&lt;/CODE&gt; with that &lt;CODE&gt;recordID&lt;/CODE&gt; and &lt;CODE&gt;text&lt;/CODE&gt;.
This is fine.
What makes a DBRecord unique is its &lt;CODE&gt;record["text"]&lt;/CODE&gt;; that's it.&lt;/LI&gt;
&lt;/OL&gt;
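
&lt;P&gt;The update-or-create step above can be sketched roughly as follows. This is only an illustration in Python with an in-memory SQLite table; the &lt;CODE&gt;records&lt;/CODE&gt; table and the helper names &lt;CODE&gt;open_local_db&lt;/CODE&gt; and &lt;CODE&gt;upsert_record&lt;/CODE&gt; are assumptions made for the example, not the real local DB.&lt;/P&gt;

```python
import sqlite3


def open_local_db():
    """Create a throwaway in-memory DB (hypothetical schema: recordID + text)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records (record_id TEXT PRIMARY KEY, text TEXT NOT NULL)"
    )
    return conn


def upsert_record(conn, record_id, text):
    """Update the row for record_id if it exists, otherwise insert a new row."""
    row = conn.execute(
        "SELECT 1 FROM records WHERE record_id = ?", (record_id,)
    ).fetchone()
    if row is not None:
        # Known recordID: only the text changes, the ID stays the same.
        conn.execute(
            "UPDATE records SET text = ? WHERE record_id = ?", (text, record_id)
        )
        return "updated"
    # Unknown recordID: store it as a new local row.
    conn.execute(
        "INSERT INTO records (record_id, text) VALUES (?, ?)", (record_id, text)
    )
    return "inserted"
```

&lt;P&gt;Calling &lt;CODE&gt;upsert_record&lt;/CODE&gt; twice with the same ID leaves a single row holding the latest text.&lt;/P&gt;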

&lt;P&gt;I save the &lt;CODE&gt;recordID&lt;/CODE&gt; and &lt;CODE&gt;text&lt;/CODE&gt; fields in the localDB.&lt;/P&gt;

&lt;P&gt;Ok great.&lt;/P&gt;

&lt;P&gt;Now I open up this very same &lt;CODE&gt;DBRecord&lt;/CODE&gt;. I want to update its text, not create a new record. I already know its &lt;CODE&gt;recordID&lt;/CODE&gt;, because I stored it in my localDB.&lt;/P&gt;

&lt;P&gt;I get that DBRecord using &lt;CODE&gt;getOrInsert&lt;/CODE&gt; and update its &lt;CODE&gt;record["text"]&lt;/CODE&gt; field (&lt;STRONG&gt;text --&amp;gt; text1&lt;/STRONG&gt;). Sure enough, the record is updated with the new data. BUT this is where the relationship between the &lt;CODE&gt;recordID&lt;/CODE&gt; and the &lt;CODE&gt;text&lt;/CODE&gt; field becomes invalid. When this &lt;CODE&gt;DBRecord&lt;/CODE&gt; was &lt;STRONG&gt;created&lt;/STRONG&gt;, &lt;CODE&gt;recordID&lt;/CODE&gt; == &lt;CODE&gt;md5("text")&lt;/CODE&gt;. NOW: &lt;CODE&gt;md5("text1")&lt;/CODE&gt; != &lt;CODE&gt;recordID&lt;/CODE&gt;.&lt;/P&gt;

&lt;P&gt;Next time I create a new record whose text equals the text1 from above (for example by importing from the same file), I have potentially created a duplicate DBRecord.&lt;/P&gt;

&lt;UL&gt;
&lt;LI&gt;I know that text1 already exists in a DBRecord.&lt;/LI&gt;
&lt;LI&gt;But I cannot avoid duplication by looking for md5(&lt;STRONG&gt;text1&lt;/STRONG&gt;).&lt;/LI&gt;
&lt;LI&gt;That DBRecord still has the &lt;STRONG&gt;older&lt;/STRONG&gt; md5(&lt;STRONG&gt;text&lt;/STRONG&gt;) as its ID, even though its text was later updated to &lt;STRONG&gt;text1&lt;/STRONG&gt;.&lt;/LI&gt;
&lt;/UL&gt;
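
&lt;P&gt;One possible way around this, sketched in Python purely as an illustration: keep a second local index from the hash of the &lt;EM&gt;current&lt;/EM&gt; text to the record IDs, and refresh it on every edit, so a duplicate check no longer depends on the ID matching the text. The &lt;CODE&gt;LocalStore&lt;/CODE&gt; class and all of its method names are hypothetical; nothing here is part of the Datastore API.&lt;/P&gt;

```python
import hashlib
from collections import defaultdict


def content_hash(text):
    """MD5 hex digest of the text, used here only as a lookup key."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


class LocalStore:
    """Hypothetical local DB: recordID -> text, plus a hash-of-current-text index."""

    def __init__(self):
        self.text_by_id = {}
        self.ids_by_hash = defaultdict(set)

    def create(self, text):
        # The ID is derived from the text once, at creation time, and never changes.
        record_id = content_hash(text)
        if record_id in self.text_by_id:
            # Get-or-insert style: an existing ID is returned untouched.
            return record_id
        self.text_by_id[record_id] = text
        self.ids_by_hash[content_hash(text)].add(record_id)
        return record_id

    def edit(self, record_id, new_text):
        # Keep the ID, but move the record to the index entry for its new text.
        old_text = self.text_by_id[record_id]
        self.ids_by_hash[content_hash(old_text)].discard(record_id)
        self.text_by_id[record_id] = new_text
        self.ids_by_hash[content_hash(new_text)].add(record_id)

    def find_duplicates(self, text):
        # IDs of all records whose current text hashes to the same value.
        return set(self.ids_by_hash.get(content_hash(text), ()))
```

&lt;P&gt;With this, after editing text to text1, &lt;CODE&gt;find_duplicates("text1")&lt;/CODE&gt; returns the old record's ID even though that ID is still md5("text"), and the app can then ask the user whether to add the entry again or cancel.&lt;/P&gt;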

&lt;P&gt;I want to be able to give myself a choice to avoid this kind of duplication. What would you guys do here?&lt;/P&gt;

&lt;P&gt;It's way too early in the morning right now. I hope what I wrote is understandable. This isn't a datastore problem. It's my problem &lt;img class="lia-deferred-image lia-image-emoji" src="https://www.dropboxforum.com/html/@EA4D5AD6084EAC95CB4E739348E74CC6/emoticons/1f615.png" alt=":confused_face:" title=":confused_face:" /&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 26 Jan 2015 18:50:00 GMT</pubDate>
      <guid>https://www.dropboxforum.com/t5/Dropbox-API-Support-Feedback/Offtopic-To-Check-or-Not-to-check-for-duplication-DBDatastores/m-p/8470#M364</guid>
      <dc:creator>Raheel S.</dc:creator>
      <dc:date>2015-01-26T18:50:00Z</dc:date>
    </item>
    <item>
      <title>Re: Offtopic: To Check or Not to check for duplication. (DBDatastores)</title>
      <link>https://www.dropboxforum.com/t5/Dropbox-API-Support-Feedback/Offtopic-To-Check-or-Not-to-check-for-duplication-DBDatastores/m-p/8471#M365</link>
      <description>&lt;P&gt;Do you consider two records with the same ID but different text to be "the same" or not? (Is it a "duplicate" for both to exist?)&lt;/P&gt;

&lt;P&gt;If the answer is "yes," then I think what you're doing should work well already. If the answer is "no," then you need to change your data model. (You should probably model editing an existing record as a delete and then an insert of a new record.)&lt;/P&gt;
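
&lt;P&gt;A minimal sketch of that delete-then-insert model, in Python for illustration only. It assumes the record ID is derived from the text with MD5, as described earlier in the thread; the plain dictionary standing in for the table and the helper names are hypothetical.&lt;/P&gt;

```python
import hashlib


def record_id_for(text):
    """Derive the record ID from the text (assumption: MD5 hex digest)."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def insert(table, text):
    """Get-or-insert: equal text always maps to the same single record."""
    record_id = record_id_for(text)
    table.setdefault(record_id, text)
    return record_id


def edit(table, old_id, new_text):
    """Model an edit as delete + insert so the ID always matches the current text."""
    table.pop(old_id, None)
    return insert(table, new_text)
```

&lt;P&gt;Under this model, importing text1 again after the edit resolves to the record that already holds text1, so no duplicate is created.&lt;/P&gt;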

&lt;P&gt;I don't think anyone but you will be able to answer this question, since we don't know what your app is or does. &lt;img class="lia-deferred-image lia-image-emoji" src="https://www.dropboxforum.com/html/@FBF7D2AB59A0D6E861EBF6A36F93B7E2/emoticons/1f642.png" alt=":slightly_smiling_face:" title=":slightly_smiling_face:" /&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 26 Jan 2015 20:14:00 GMT</pubDate>
      <guid>https://www.dropboxforum.com/t5/Dropbox-API-Support-Feedback/Offtopic-To-Check-or-Not-to-check-for-duplication-DBDatastores/m-p/8471#M365</guid>
      <dc:creator>Steve M.</dc:creator>
      <dc:date>2015-01-26T20:14:00Z</dc:date>
    </item>
  </channel>
</rss>

