[Python-3000] Need help completing ABC pep

Sat Apr 21 09:30:16 CEST 2007

On 4/20/07, Guido van Rossum <guido at python.org> wrote:
> On 4/20/07, Jeffrey Yasskin <jyasskin at gmail.com> wrote:
> > On Sets:
> > Do you have an example of a data structure that is a Set but not a
> > ComposableSet?
>
> Take for example a set that is a contiguous range of integers,
> represented by a (lo, hi) tuple. In many cases the return type of
> union and symmetric difference cannot be represented as an object of
> the same type. I don't want to force the implementer of such a set to
> provide implementations for operations they don't need.

Ah, good example. I think that PEPs that propose ABCs should be
encouraged to describe a few possible implementations, and should be
particularly encouraged to give examples of the implementations that
made them split two classes, or omit a method from a particular class.

> > Set should inherit from a PartiallyOrdered ABC that defines the
> > comparison operators.
>
> I guess then there would also be a TotallyOrdered ABC deriving from
> PartiallyOrdered which adds no operations but does add a constraint.
> I've added both, and made Set derive from PartiallyOrdered.

Cool. Here's some language for their invariants (checking myself with
the Fortress spec, and stealing its properties nearly verbatim):

PartiallyOrdered:
This ABC defines the 5 inequality operations <, <=, >=, >, and cmp().
(Note that == and != are defined by object.) Classes deriving from
this ABC should ensure that '<' and '>' form strict partial orders,
and '<=' and '>=' form weak partial orders, in the sense of
http://en.wikipedia.org/wiki/Partially_ordered_set. These should be
compatible with each other and with ==:

Where PO is a subtype of PartiallyOrdered, and a and b are instances of PO,
cmp(a, b) < 0 ↔ cmp(b, a) > 0
cmp(a, b) == 0 ↔ cmp(b, a) == 0
cmp(a, b) == None ↔ cmp(b, a) == None
a < b ↔ cmp(a, b) < 0
a <= b ↔ cmp(a, b) <= 0
a == b ↔ cmp(a, b) == 0
a >=b ↔ cmp(a, b) >= 0
a > b ↔ cmp(a, b) > 0

Subclasses of PartiallyOrdered must define one of cmp() or <=. The
other operators are defaulted as follows:
  def <=(a,b): return cmp(a,b) <= 0
  def cmp(a,b):
    if a<=b:
      if b<=a: return 0
      else: return -1
    else:
      if b<=a: return 1
      else: return None
  def ==(a,b): return a<=b and b<=a
  def <(a,b): return a<=b and not b<=a
  def >=(a,b): return b<=a
  def >(a,b): return b<a

Open Issues: 1) Is it reasonable to let cmp() return None for
unordered pairs, or should we instead define a 4-value enumeration for
it to return? 2) The @abstractmethod decorator isn't enough to specify
that subclasses must override at least one of a group of methods, and
there are classes with even more complex requirements like "subclasses
must override a or (b and (c or d))". 3)
http://docs.python.org/ref/customization.html#l2h-187 still implies
that < defines a total order over all python objects. Has that changed
for py3k? 4) How should we handle the reflected forms of these
operators?

TotallyOrdered:
This ABC derives from PartiallyOrdered. It adds no new operations but
implies a promise that <, <=, >, and >= are total orders in the sense
of http://en.wikipedia.org/wiki/Totally_ordered_set, and that cmp(a,b)
never returns None. Given this additional restriction, subclasses only
need to define one of <, <=, or cmp(), and the rest of the methods can
be defaulted.

Open Issue: How does the TotallyOrdered property inherit? If Real (an
abstract class) inherits from TotallyOrdered, do different subclasses
need to be totally comparable with each other? If long (a concrete
class) inherits from TotallyOrdered, and I subclass long, do I have to
ensure that my instances are totally comparable with other longs? I
think 'yes' to the second, to obey the Liskov substitution principle.
I don't know about the first: for Real it seems fine, but it might
inhibit refactoring the ABC hierarchy.

> > The final specification of Set should spell out the invariants, but
> > they're clear enough that the PEP probably doesn't need to. It's odd,
> > though, that you mention optimizations to __eq__ and __le__ but not
> > their semantics.
>
> The semantics are kind of obvious. :-)
>
> Text spelling out the invariants in more detail that I can just paste
> into the PEP is most welcome!

Bah! ;)

Set:
Sets should implement an efficient (no worse than O(log N))
__contains__ operation. Sets are defined by their extensions.
Subclasses may override the following default definitions of __eq__
and __le__ with more efficient versions, but they must respect the
same semantics:
  def __eq__(a, b): return len(a) == len(b) and all(x in a for x in b)
  def __le__(a, b): return len(a) <= len(b) and all(x in a for x in b)

ComposableSet:
... mathematical meaning of set composition:
Where a and b are instances of ComposableSet and __op__(a,b) is defined:
  x in (a | b) ↔ x in a or x in b
  x in (a & b) ↔ x in a and x in b
  x in (a ^ b) ↔ x in a != x in b
  x in (a - b) ↔ x in a and not x in b

> > ComposableSet's open issues:
> > > Should we just pick a concrete return type (e.g. set)?
> >
> > For set compositions, you should say whether the arguments need to be
> > uniform. (Is (IntSet(1,2,3) | set('a','b','c')) allowed?) If they do,
> > then you don't need a concrete return type; if they don't, then you
> > do, and the default implementations should use it.
>
> You may be misunderstanding what I meant. I certainly don't want to
> constrain the potential return type of the composition operations
> (except that they should themselves be composable sets). The question
> about picking a specific return type was meant as a way out to
> providing a concrete default implementation. I'm still on the fence of
> this, as it could be a big waste of time (I imagine that composing
> Patricia trees can be done more efficiently than iterating over the
> elements).

Borrowing some syntax from Java generics, which of:
  ComposableSet __or__(ComposableSet a, ComposableSet b);
  <CS extends ComposableSet> ComposableSet __or__(CS a, CS b);
  <CS extends ComposableSet> CS __or__(CS a, CS b);
do you mean to define? I think they have different implications about
what defaults you need to provide. The second is probably out because
it would mean (a|b|c) would likely break. This gets even more complex
if we consider the types of the contents of the sets... Here's a
possible overloaded signature for __or__. (Do you already have a
better notation for this?)

  <C super A, C super B, C extends Hashable> ComposableSet<C>
__or__(ComposableSet<A> a, ComposableSet<B> b);
  <C super A, C super B, (for x in A,B,C: CS<x> extends
ComposableSet<x>)> CS<C> __or__(CS<A> a, CS<B> b);

The first would probably be optimized when its arguments are of the
same type, but could fall back on the abstract methods from
ComposableSet, which would return a built-in (frozen?)set.

> > Numbers:
>
> Would you mind helping to write a PEP for numbers? I think I'm running
> out of round tuits just for the collections and the ABC
> infrastructure.

Bah! again. Yes, I'll try to throw something together with some help
from Neal. Who else has been thinking about the Numbers PEP?

> > Cardinal is also useful as the argument to (Sequence * Cardinal).
>
> I think there's little value for this in a dynamically typed language;
> the implementation can raise ValueError or treat negative numbers as
> zero (which is what Python currently does).

True, and any hypothetical type checkers that want to use such
information can define their own subclasses of Integer.

-- 
Namasté,
Jeffrey Yasskin