sorted list vs dictionnary

Thu May 16 07:45:08 EDT 2002

Hello everyone,

I have a class that is able to process messages. I want to avoid
processing the same message twice, so a solution is to maintain a set
of processed messages IDs, and to check if for the presence of the ID
before processing a document. 

I've found two ways (without using anything outside the standard
library) of doing this:

 * use a dictionnary with the ID as the key and None as the value, and
 check for the key with cache.has_key(msg.ID)
 * use a sorted list and the bisect module to keep it sorted, and check
 for the key with 
 i = bisect.bisect_left(l,e)
 if i<len(l) and l[i] == e:
     pass

I'd expect the first one to be faster, since the lookup of a key in a
dictionnary is O(1), and insertion is a dictionnary is roughly O(1) too, 
whereas bisect_left is O(log(N)), and inserting in the middle of a list 
is O(N). 

Could anyone confirm the speed issue, and maybe share some insights about 
the memory efficiency of both solutions, especially when the number of
IDs get big (the order of magnitude is 50000)? I'm not too familiar with the
implementation of lists and dictionnary in the interpreter.

TIA

Alexandre Fayolle
-- 
LOGILAB, Paris (France).
http://www.logilab.com   http://www.logilab.fr  http://www.logilab.org
Narval, the first software agent available as free software (GPL).