Py2.3: Feedback on Sets

Raymond Hettinger python at rcn.com
Sun Aug 17 21:54:07 EDT 2003


[Beni Cherniavsky]
> Generating the full powerset seems pretty useless.  `isdisjoint` sounds like a
> good idea since it can be implemented more effeciently than ``not a & b``.

True enough.  It is easy to putogether an early-out algorithm using itertools.
So, it can be done a C speed.  The question is whether there
are sufficient use cases to warrant expanding an already fat API.


> >* Does the performance meet your expectations?
> >
> I think so.  Almost half of my program's time is spent in set.py according to
> the profiler but when I optimized the sets module (and my dsets) with psyco,
> things got several percent slower.  Is this a good sign ?-)

It's a very good sign.

Already, most methods run at C speed thanks to itertools and dicts.
Try directing pysco to BaseSet._update() and BaseSet.__contains__().
Those are the two most important methods that do not run at C speed.


> >* Do you care that sets can only contain hashable elements?
> >
> No more than I care for dicts containing only hashable keys.  How else could
> you define it?

The hard way.  Keep a separate ordered list for elements that define ordering
operations but not a hash value.  Insert and Search using a binary search.

Keep a third list for elements that only support equality comparision.
Use linear search for that one.

This complicates the heck out of the code but would expand the range of
things storable in sets.  The only real life use case I can think of is using
a third party package that supplied objects without a hash code.


> What I do like is the auto-copying of mutable sets when used a set items; I
> wouldn't mind something like this to be extended into other Python types
> (primarily dicts).

The protocol would also work for dicts, but the key to getting it to work
is installing as_immutable methods in the objects that are being stored.
Something that touches that much code would also need compelling
use cases to justify it.  But yes, it can be done.


> >* How about the design constraint that the argument to most
> >   set methods must be another Set (as opposed to any iterable)?
> >
> I'd go for any iterable where it's enough and require __contains__ where
> that's needed.  I think together this should cover all method's needs.  I
> think it's better not to add artificail checks, so that other set-alike types
> can be used with sets (primarily mapping or set-alike types).

Wish granted.

I've relaxed the requirements that the arguments be a set and now any iterable
will do. This will come out in Py2.3.1.


> >* Are the docs clear?  Can you suggest improvements?
> >
> It's been a lot of time since I had to read them which is probably a good
> sign.

Great!

 
> >* Are sets helpful in your daily work or does the need arise
> >   only rarely?
> >
> Helpful.  Since they were released on 2.2, I feel free to import sets whenever
> it reflects my meaning clearer than dicts, even when I don't use any fancy
> operations.  I find the code more readable this way.

Me too.


> And sets of sets (or ImmutableSets as dict keys) are very helpful when you
> need them.

Good.  I've gotten almost no replies from people using sets of sets.


Raymond Hettinger






More information about the Python-list mailing list