How to represent sets

James J. Besemer jb at
Tue Sep 17 08:32:01 CEST 2002

Paul Rubin wrote:

>"James J. Besemer" <jb at> writes:
>>For dense sets isomorphic with integers, the long integer
>>implementation probably will be superior.
>Using immutable long integers to represent mutable sets doesn't make
>much sense.  Better to use arrays like the ones implemented by the
>array module.

First off, by "superior" I meant superior to a dictionary implementation.

Second, I don't see why arrays would be that much better than long 
integers.  Depends on the relative efficiency of arrays vs. long integers.

Perhaps I give the Python implementers too much credit, but I also 
expect that the implementation of long integers would be about as good 
as it gets in Python with arbitrary bit vectors.

I may be overly optimistic on this last point but even if the 
implementation of long integers is mediocre, I'd expect it to be better 
than arrays.

Given the generality of having to handle a variety of sub-types, off 
hand, I'd expect arrays to have a little more built-in overhead. 
 Furthermore, the array module has no methods for doing arithmetic on 
the entire array.  Thus to AND or OR two arrays together, you need to 
write an explicit loop or map().  I can't believe it would be as fast as 
the builtin operations for long integers.  And the work to make the 
result array the proper length would have to be spelled out in long 
hand, while it's atomic with long integers.

You seem to attach some special significance to the fact that long 
integers are "immutable" while arrays are not.  However, from the little 
bit that I've studied, the implementation is smart enough to reuse the 
erstwhile 'immutable' values of intermediate results from expressions, 
thus eliminating a pair of create/destroy operations.  I believe they 
also optimize special cases like

    bits1 |= bits2

and reuse the 'immutable' representation of the value of bits1 if the 
reference count is 1.  If not, then it should do.  In the more general 
cases, both implementations (arrays or long ints) would have to create a 
new destination result and probably free an old one.  

Maybe you're right but I just don't see why an array implementation of a 
bit set would be faster than a long integer implementation.

Do you have any benchmarks to support your argument?



James J. Besemer		503-280-0838 voice
2727 NE Skidmore St.		503-280-0375 fax
Portland, Oregon 97211-6557	mailto:jb at	

More information about the Python-list mailing list