[Python-Dev] I think my set module is ready for prime time; comments?

Christopher Petrilli petrilli@amber.org
Tue, 23 Jan 2001 14:06:05 -0500


Eric S. Raymond [esr@thyrsus.com] wrote:
> I believe most uses of sets are small sets.  That reduces the speed ouch
> of using a list representation and increases the proportional memory
> ouch of a dictionary implementation.

The problem is that there are a lot of uses for large sets, especially 
when you begin to introduce intersections and unions.  If an
implementation is only useful for a few dozen (or a hundered) items in 
the set, that eliminates a lot of places where the real use of set
types is useful---optimizing large scale manipulations.

Zope for example, manipulates sets with 10,000 items in it on a
regular basis when doing text index manipulation.  The data structures 
are heavily optimized for this kind of behaviour, without a major
sacrifice in space.  I think Jim perhaps can talk to this. 

Unfortunately, for me, a Python implementation of Sets is only
interesting academicaly.  Any time I've needed to work with them at a
large scale, I've needed them *much* faster than Python could achieve
without a C extension.

Perhaps the difference is in problem domain.  In the "scripting"
problem domain, I would agree that Setswould rarely reach large sizes, 
and so a algorithm which performed in quadratic time might be fine,
because the actual resultant time is small.  However, in more
full-blown applications, this would be counter productive, and the
user would be forced implement their own (or use Aaron's excellent
kjBuckets).

Just my opinion, of course.
Chris
-- 
| Christopher Petrilli
| petrilli@amber.org