How to represent sets

James J. Besemer jb at cascade-sys.com
Tue Sep 17 08:32:01 CEST 2002

```Paul Rubin wrote:

>"James J. Besemer" <jb at cascade-sys.com> writes:
>
>
>>For dense sets isomorphic with integers, the long integer
>>implementation probably will be superior.
>>
>>
>
>Using immutable long integers to represent mutable sets doesn't make
>much sense.  Better to use arrays like the ones implemented by the
>array module.
>

First off, by "superior" I meant superior to a dictionary implementation.

Second, I don't see why arrays would be that much better than long
integers.  Depends on the relative efficiency of arrays vs. long integers.

Perhaps I give the Python implementers too much credit, but I also
expect that the implementation of long integers would be about as good
as it gets in Python with arbitrary bit vectors.

I may be overly optimistic on this last point but even if the
implementation of long integers is mediocre, I'd expect it to be better
than arrays.

Given the generality of having to handle a variety of sub-types, off
hand, I'd expect arrays to have a little more built-in overhead.
Furthermore, the array module has no methods for doing arithmetic on
the entire array.  Thus to AND or OR two arrays together, you need to
write an explicit loop or map().  I can't believe it would be as fast as
the builtin operations for long integers.  And the work to make the
result array the proper length would have to be spelled out in long
hand, while it's atomic with long integers.

You seem to attach some special significance to the fact that long
integers are "immutable" while arrays are not.  However, from the little
bit that I've studied, the implementation is smart enough to reuse the
erstwhile 'immutable' values of intermediate results from expressions,
thus eliminating a pair of create/destroy operations.  I believe they
also optimize special cases like

bits1 |= bits2

and reuse the 'immutable' representation of the value of bits1 if the
reference count is 1.  If not, then it should do.  In the more general
cases, both implementations (arrays or long ints) would have to create a
new destination result and probably free an old one.

Maybe you're right but I just don't see why an array implementation of a
bit set would be faster than a long integer implementation.

Do you have any benchmarks to support your argument?

Regards

--jb

--
James J. Besemer		503-280-0838 voice
2727 NE Skidmore St.		503-280-0375 fax
Portland, Oregon 97211-6557	mailto:jb at cascade-sys.com