Py2.3: Feedback on Sets

Beni Cherniavsky cben at techunix.technion.ac.il
Sun Aug 17 15:37:09 EDT 2003


In comp.lang.python, you wrote:
>I've gotten lots of feedback on the itertools module
>but have not heard a peep about the new sets module.
>
>* Are you overjoyed/outraged by the choice of | and &
>   as set operators (instead of + and *)?
>
``&`` and ``|`` are good.  In math too, the union/intersection operators are
very similar to the or/and opearators.  The later are overloaded as
addition/mulitplication in boolead algebra only because the or/and operators
are so similar and inconvenient.  This overloading is quite misguided because
these operators don't define a field or even a ring (xor/and would do that).

>* Is the support for sets of sets necessary for your work
>   and, if so, then is the implementation sufficiently
>   powerful?
>
I used sets very heavily in my last project, including nested sets.  The
implementation is powerful enough.  The concept of sets themselves became too
weak at some moment (I needed to associate data with each element) but I got
so used to the powerful union/intersection/etc. operations that I
implemented__ them over dicts...

__ http://www.technion.ac.il/~cben/python/dsets.py

I suspect that the set/dict boundary will be crossed quite frequently by
people; it's hard to know beforehand that you'll never want to asssociate
values with the keys.  Besides, set operators are a very powerful and
high-level way to manipulate dicts.  So perhaps it'd be a good idea to
integrate sets with dicts more along the lines of my dsets module.  So that
dicts grow in power as well.

The biggest lesson from my dsets module is that for set operations on dicts,
you really want to choose how to handle collisions (both sets have the same
key so two values contend for the result).  And you want to choose it
per-operation.  It's OK if you can only do it on named methods but not on
overloaded operators (there is no good syntax for that).

My dsets module treats iterables (including sets) as dsets whose values are
`None`.  This direction also suggests that sets could be true dicts: when you
omit the value in a dict initializer, it defaults to `None`.  But after some
thinking I think it would be suboptimal.  One reason: it means you should to
choose a collision function even when working with sets whose value are
meaningless, which makes little sense.  I now think it'd be better to separate
them (by presence of the __getitem__ method) and only give meaning to
collision functions if both arguments have values.  So:

- ``<set> & <set>`` means what it means now.
- ``<dict> & <set>`` means subset of dict.
- ``<dict>.intersection(<dict>, <collision_func>) means what it means in
  dsets.

But there are more complications:

- ``<dict> | <set>`` must mean union of sets because for keys not in <dict>
  you have no value.
- Should ``<dict> | <dict>``, ``<dict> & <dict>``, etc. raise exception on
  collisions (as now) or simply return sets?

I'll try to release a working design soon.  Feedback most welcome.

>* Is there a compelling need for additional set methods like
>   Set.powerset() and Set.isdisjoint(s) or are the current
>   offerings sufficient?
>
Generating the full powerset seems pretty useless.  `isdisjoint` sounds like a
good idea since it can be implemented more effeciently than ``not a & b``.

>* Does the performance meet your expectations?
>
I think so.  Almost half of my program's time is spent in set.py according to
the profiler but when I optimized the sets module (and my dsets) with psyco,
things got several percent slower.  Is this a good sign ?-)

>* Do you care that sets can only contain hashable elements?
>
No more than I care for dicts containing only hashable keys.  How else could
you define it?

What I do like is the auto-copying of mutable sets when used a set items; I
wouldn't mind something like this to be extended into other Python types
(primarily dicts).

>* How about the design constraint that the argument to most
>   set methods must be another Set (as opposed to any iterable)?
>
I'd go for any iterable where it's enough and require __contains__ where
that's needed.  I think together this should cover all method's needs.  I
think it's better not to add artificail checks, so that other set-alike types
can be used with sets (primarily mapping or set-alike types).

>* Are the docs clear?  Can you suggest improvements?
>
It's been a lot of time since I had to read them which is probably a good
sign.

>* Are sets helpful in your daily work or does the need arise
>   only rarely?
>
Helpful.  Since they were released on 2.2, I feel free to import sets whenever
it reflects my meaning clearer than dicts, even when I don't use any fancy
operations.  I find the code more readable this way.

And sets of sets (or ImmutableSets as dict keys) are very helpful when you
need them.

-- 
Beni Cherniavsky <cben at tx.technion.ac.il>





More information about the Python-list mailing list