<p>Dear all,</p>
<p>I have 2 lists stored in 2 text files may have duplicated records, the raw data looks like this:</p>
<div>lfruit lcountry<br>====== =========<br>orange japan<br>pear china<br>orange china<br>
apple american<br>cherry india<br>lemon china<br>lemon japan<br>strawberry korea<br>
banana thailand<br> australia<br>basically, what I want is:<br> 1. all of the duplicated records need to be removed and <br> 2. the unique items need bind with an unique integer ID, something like a PK in database, no sort needed.<br>
</div>
<div>but before you give answer here, pls also read below.</div>
<p>lfruit lcountry <br>====== =========<br>1 orange 1 japan<br>2 pear 2 china<br>3 apple 3 american<br>
4 cherry 4 india<br>5 lemon 5 taiwan<br>6 strawberry 6 korea<br>7 banana 7 thailand<br> 8 australia</p>
<p>Q1,the items in above lists may need to be added and deleted later, then how to make the list easy to extend and how to make sure the items have a sequenced, unique fixed, INTERGET type ID bind with those items?</p>
<p>Here is why I want an INTEGER ID not hash or uuid: the "uuid4" is not working on my case because I want make that ID may transfer information in low cost in a MCU protocol style later, I means the INTEGER ID used here also as the binary stream position id in my protocol, take lfruit data here for example, a bin stream 0111100 can with the meaning of lfruit items exists or not.</p>
<p><br>Also, a combination of 2 lists may needed later to generate new list or called matrix, also as above, an unique ID is also needed here:<br> lcombination = [lfruit] * [lcountry] <br> ============<br>1 japan orange #(1,1) <br>
2 japan pear #(1,2)<br>3 japan apple #(1,3)<br>4 japan cherry #(1,4)<br>5 japan lemon ...<br>6 japan strawberry ...<br>7 japan banana ...<br>
8 china orange #(2,1)<br>9 china pear #(2,2)<br>…… </p>
<div>Q2, because the lcombination come from the extendable items in lists, then how to make sure the unique ID here also is always fixed and unique?</div>
<p>BTW: my original plan is to use dict or list as the runtime data container and use sqlite as the storage also the assigee of the unique ID , however, base on answer from <br> <a href="http://old.nabble.com/(python)-how-to-define-unchangeable-global-ID-in-a-table--td29000959.html">http://old.nabble.com/(python)-how-to-define-unchangeable-global-ID-in-a-table--td29000959.html</a><br>
it may not just rely on sqlite ensure the unique ID assignee mechanism may works, then I asks help here, any answer or comment will be highly appricated!</p>
<p>Thanks,<br>KC </p>
<p> </p>