[Python-3000] Need help completing ABC pep

Jeffrey Yasskin jyasskin at gmail.com
Sun Apr 22 10:12:33 CEST 2007

On 4/21/07, Guido van Rossum <guido at python.org> wrote:
> On 4/21/07, Jeffrey Yasskin <jyasskin at gmail.com> wrote:
> > PartiallyOrdered:
> > This ABC defines the 5 inequality operations <, <=, >=, >, and cmp().
> Actually, I'm hoping to eliminate cmp() completely. Also, even if it
> doesn't get eliminated, the existence of cmp() suggests a total
> ordering.

Ok. cmp() is more efficient when comparing large structures, but it's
not absolutely needed. In some languages it is also more convenient,
but that's because they let you take advantage of case statements.

> > Subclasses of PartiallyOrdered must define one of cmp() or <=.
> Actually, Python typically uses < as the single comparison operator
> that must be defined (e.g. list.sort() uses < exclusively).

With only a partial order, you need both < and == to define the other
operators, or you can do it with just <=. A total order allows you to
do it with either < or <= alone. sort(), at least in C++, needs at
least a "strict weak ordering"
(http://www.sgi.com/tech/stl/StrictWeakOrdering.html), which lets you
define equivalence but not equality using just <.
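To make that concrete, here's a sketch (hypothetical names, not PEP
text) of a PartiallyOrdered mixin that derives the other operators from
__le__ alone, plus a toy partial order where some pairs are simply
incomparable:

```python
class PartiallyOrdered:
    """Hypothetical mixin: subclasses define only __le__; the other
    comparison operators follow from it in any partial order."""

    def __le__(self, other):
        raise NotImplementedError

    def __lt__(self, other):
        # Strictly less: comparable one way but not the other.
        return self <= other and not other <= self

    def __eq__(self, other):
        # Antisymmetry: a <= b and b <= a implies a == b.
        return self <= other and other <= self

    def __ge__(self, other):
        return other <= self

    def __gt__(self, other):
        return other <= self and not self <= other


class DivisibleBy(PartiallyOrdered):
    """Toy partial order on positive integers: a <= b iff a divides b.
    2 and 3 are incomparable: neither divides the other."""

    def __init__(self, n):
        self.n = n

    def __le__(self, other):
        return other.n % self.n == 0
```

With a total order a single < would suffice, since not (b < a) then
implies a <= b; in a mere partial order that implication fails, which
is why <= alone (or < together with ==) is the minimum.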

> > > > The final specification of Set should spell out the invariants, but
> > > > they're clear enough that the PEP probably doesn't need to. It's odd,
> > > > though, that you mention optimizations to __eq__ and __le__ but not
> > > > their semantics.
> > >
> > > The semantics are kind of obvious. :-)
> > >
> > > Text spelling out the invariants in more detail that I can just paste
> > > into the PEP is most welcome!
> >
> > Bah! ;)
> >
> > Set:
> > Sets should implement an efficient (no worse than O(log N))
> > __contains__ operation. Sets are defined by their extensions.
> What's an extension?

Oops. It's the collection of objects 'x' for which 'x in set' is true.
We can probably just skip that sentence.

> > Subclasses may override the following default definitions of __eq__
> > and __le__ with more efficient versions, but they must respect the
> > same semantics:
> >   def __eq__(a, b): return len(a) == len(b) and all(x in a for x in b)
> >   def __le__(a, b): return len(a) <= len(b) and all(x in b for x in a)
> >
> > ComposableSet:
> > ... mathematical meaning of set composition:
> > Where a and b are instances of ComposableSet and __op__(a,b) is defined:
> >   x in (a | b) ↔ x in a or x in b
> >   x in (a & b) ↔ x in a and x in b
> >   x in (a ^ b) ↔ x in a != x in b
> >   x in (a - b) ↔ x in a and not x in b
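Those four laws can be sanity-checked mechanically against the built-in
set type (a throwaway check, not part of the proposed PEP text):

```python
def check_composition_laws(a, b, universe):
    """Verify the membership laws for |, &, ^ and - pointwise over
    a universe of candidate elements."""
    for x in universe:
        assert (x in (a | b)) == (x in a or x in b)
        assert (x in (a & b)) == (x in a and x in b)
        assert (x in (a ^ b)) == ((x in a) != (x in b))
        assert (x in (a - b)) == (x in a and x not in b)

check_composition_laws({1, 2, 3, 5, 8}, {2, 3, 4, 7}, range(10))
```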
> >
> > > > ComposableSet's open issues:
> > > > > Should we just pick a concrete return type (e.g. set)?
> > > >
> > > > For set compositions, you should say whether the arguments need to be
> > > > uniform. (Is (IntSet(1,2,3) | set('a','b','c')) allowed?) If they do,
> > > > then you don't need a concrete return type; if they don't, then you
> > > > do, and the default implementations should use it.
> > >
> > > You may be misunderstanding what I meant. I certainly don't want to
> > > constrain the potential return type of the composition operations
> > > (except that they should themselves be composable sets). The question
> > > about picking a specific return type was meant as a way out to
> > > providing a concrete default implementation. I'm still on the fence of
> > > this, as it could be a big waste of time (I imagine that composing
> > > Patricia trees can be done more efficiently than iterating over the
> > > elements).
> >
> > Borrowing some syntax from Java generics, which of:
> >   ComposableSet __or__(ComposableSet a, ComposableSet b);
> >   <CS extends ComposableSet> ComposableSet __or__(CS a, CS b);
> >   <CS extends ComposableSet> CS __or__(CS a, CS b);
> > do you mean to define?
> Can't say I understand that. The minimum requirement is that the
> return type is a subclass of ComposableSet. *Some* subclasses CS may
> themselves guarantee that the result is always a subclass of CS, but I
> don't want to force that. Returning a 'set' in the default
> implementation satisfies the former but not the latter. This is in
> itself fine (CS can override __or__ to return a CS). But it could be
> expensive -- even if the author of CS doesn't care about the return
> type being a CS itself, they might still care about the efficiency.

I think you want my first option then. How about:

"The default implementation ensures that __or__, __and__, __sub__, and
__xor__ do the right thing for any ComposableSet with Hashable
elements, but because it iterates over both arguments and returns a
built-in set, it may be slow. Subclasses should optimize these
operations for types they know about but continue to support
heterogenous operations by delegating to ComposableSet.

"This should be rare: If you define a ComposableSet that can hold
non-Hashable elements, your users should expect that they can compose
sets with identical types, but not necessarily sets with different
types, unless, of course, you give them extra guarantees."
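As a sketch of that arrangement (illustrative names only, not the
proposed API): the base class's __or__ iterates both operands and
returns a built-in set, while a subclass short-circuits for operands of
its own type and delegates for everything else:

```python
class ComposableSet:
    """Sketch of the slow default: iterate both operands, return a
    built-in set of (hashable) elements.  Handles heterogeneous
    operands, at the cost of a full iteration of each."""

    def __iter__(self):
        raise NotImplementedError

    def __or__(self, other):
        result = set()
        result.update(self)
        result.update(other)
        return result


class IntSet(ComposableSet):
    """Subclass that optimizes __or__ for its own kind but falls back
    to the generic default for any other operand."""

    def __init__(self, *ints):
        self._ints = frozenset(ints)

    def __iter__(self):
        return iter(self._ints)

    def __or__(self, other):
        if isinstance(other, IntSet):
            # Fast path: compose the underlying frozensets directly.
            return IntSet(*(self._ints | other._ints))
        return ComposableSet.__or__(self, other)
```

So IntSet | IntSet yields an IntSet, while IntSet | anything-else falls
back to the heterogeneous default and yields a built-in set.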

> > I think they have different implications about
> > what defaults you need to provide. The second is probably out because
> > it would mean (a|b|c) would likely break. This gets even more complex
> > if we consider the types of the contents of the sets... Here's a
> > possible overloaded signature for __or__. (Do you already have a
> > better notation for this?)
> I think this is more appropriate for other languages.

Having the compiler check the signature is definitely more appropriate
for other languages, or at least for a third-party module. But knowing
what types we accept as arguments and what type the result can be
matters in any language. My notation is, of course, probably not the
best we could come up with. :)

Jeffrey Yasskin
