deduping
Peter Otten
__peter__ at web.de
Mon Jun 21 09:27:03 EDT 2010
dirknbr wrote:
> Hi
>
> I have 2 files (done and outf), and I want to choose unique elements
> from the 2nd column in outf which are not in done. This code works but
> is not efficient, can you think of a quicker way? The a=1 is just a
> redundant task obviously, I put it this way around because I think
> 'in' is quicker than 'not in' - is that true?
>
> done_={}
> for line in done:
>     done_[line.strip()]=0
>
> print len(done_)
>
> universe={}
> for line in outf:
>     if line.split(',')[1].strip() in universe.keys():
>         a=1
>     else:
>         if line.split(',')[1].strip() in done_.keys():
>             a=1
>         else:
>             universe[line.split(',')[1].strip()]=0

Instead of

    if key in some_dict.keys():
        #...

which in Python 2 builds a list of the dictionary's keys and then performs an
O(N) search of that list, you should use

    if key in some_dict:
        #...

which doesn't build a list and looks up the key in constant time on average.

Peter
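
For illustration only, here is a minimal sketch of how the whole task might
look with sets instead of dictionaries holding dummy values. It is not part
of the original reply. The function name unique_new_keys and the sample lines
are hypothetical, and the code assumes Python 3 and that every line of outf
has at least two comma-separated fields. Testing with "not in" directly
removes the need for the a=1 placeholder, and adding a key that is already in
a set is a no-op, so no separate check against universe is needed.

```python
def unique_new_keys(done_lines, outf_lines):
    """Return the unique second-column values of outf_lines that are not in done_lines."""
    # A set gives the same hash-based membership test as a dict,
    # without needing a dummy value for each key.
    done = {line.strip() for line in done_lines}
    universe = set()
    for line in outf_lines:
        # Split once per line instead of three times.
        key = line.split(',')[1].strip()
        if key not in done:
            universe.add(key)  # duplicates are ignored automatically
    return universe


# Hypothetical sample data standing in for the two open files.
done_lines = ["a\n", "b\n"]
outf_lines = ["1, a\n", "2, c\n", "3, c\n", "4, d\n"]
print(sorted(unique_new_keys(done_lines, outf_lines)))  # ['c', 'd']
```

Because open file objects iterate line by line, the two files could be passed
in place of the sample lists.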