[Python-3000] Need help completing ABC pep
Guido van Rossum
guido at python.org
Sat Apr 21 00:24:26 CEST 2007
On 4/20/07, Jeffrey Yasskin <jyasskin at gmail.com> wrote:
> On 4/19/07, Guido van Rossum <guido at python.org> wrote:
> > I've started a PEP on Abstract Base Classes (ABCs), PEP 3119:
> >
> > http://www.python.org/dev/peps/pep-3119/
> >
> > While I'm not ready yet to answer tough questions about this compared
> > to alternative proposals, I *am* ready for feedback on the various
> > open issues sprinkled throughout the current text, especially
> > concerning decisions regarding exactly which operations to include in
> > the various ABCs, and also regarding the level of detail required in
> > the PEP.
>
> I have a bunch of comments that aren't all related to the open issues.
> I'm more familiar with statically typed languages, so sorry if some of
> these comments don't make sense in a Python context.
Thanks for the feedback! You're doing fine. :-)
> Hashable might want to specify the exception that should be thrown for
> mutable objects.
Thanks -- done.
> I wouldn't have guessed what Finite was from the name, but I don't
> mind it now that I know. I do prefer 'Sized'.
Sized it is.
> A possible name for the __contains__ ABC for sequences and strings is
> 'Searchable'. On the other hand, an annotation of some sort for the
> running time of the method might do as well, but I don't know of any
> prior art for that.
I've added the name suggestion to the open issues; I like it. I'm not
sure I want to break new ground specifying running time; it's a tricky
business.
> On Sets:
> Do you have an example of a data structure that is a Set but not a
> ComposableSet?
Take for example a set that is a contiguous range of integers,
represented by a (lo, hi) tuple. In many cases the return type of
union and symmetric difference cannot be represented as an object of
the same type. I don't want to force the implementer of such a set to
provide implementations for operations they don't need.
A different POV could be to move the concrete composition operations
back into Set and let the default implementations return built-in
``set`` instances. I'm not sure how useful that is.
> Set should inherit from a PartiallyOrdered ABC that defines the
> comparison operators.
I guess then there would also be a TotallyOrdered ABC deriving from
PartiallyOrdered which adds no operations but does add a constraint.
I've added both, and made Set derive from PartiallyOrdered.
> The final specification of Set should spell out the invariants, but
> they're clear enough that the PEP probably doesn't need to. It's odd,
> though, that you mention optimizations to __eq__ and __le__ but not
> their semantics.
The semantics are kind of obvious. :-)
Text spelling out the invariants in more detail that I can just paste
into the PEP is most welcome!
> ComposableSet's open issues:
> > Should we define an API for creating new instances (e.g. a class method or a fixed constructor signature)?
>
> There are some set types that can't be constructed from arbitrary
> sequences. For example, Patricia Tries
> (http://www.haskell.org/ghc/docs/latest/html/libraries/base/Data-IntSet.html)
> can be used to make very efficient sets of integers, but only
> integers.
Right. That's why I'm separating ComposableSet from Set.
> > Should we just pick a concrete return type (e.g. set)?
>
> For set compositions, you should say whether the arguments need to be
> uniform. (Is (IntSet(1,2,3) | set('a','b','c')) allowed?) If they do,
> then you don't need a concrete return type; if they don't, then you
> do, and the default implementations should use it.
You may be misunderstanding what I meant. I certainly don't want to
constrain the potential return type of the composition operations
(except that they should themselves be composable sets). The question
about picking a specific return type was meant as a way out to
providing a concrete default implementation. I'm still on the fence of
this, as it could be a big waste of time (I imagine that composing
Patricia trees can be done more efficiently than iterating over the
elements).
> Even this may be a
> problem if someone writes a TreeSet which requires
> TotallyOrdered-but-not-Hashable elements. In any case, the ABC should
> encourage implementors to return an instance of the same type as the
> arguments when the arguments are of the same type.
Another good argument against having a default concrete implementation
that always returns a built-in set.
> > Should we add the copy method?
> Not on ComposableSet, but maybe in a Copyable ABC, which it might inherit from.
I'd rather leave copying APIs out of this completely. There's a
separate copying protocol in Python already, you can define __copy__()
for a shallow copy and __deepcopy__() for a deep copy (copies the
elements recursively). This is also tied into pickling
(serialization).
> MutableSet:
> > Should we support more operations implemented by the Python 2 set type?
>
> You have groups of equivalent operations (add, update, __ior__),
> (discard, difference_update, __isub__), (??, intersection_update,
> __iand__), and (??, symmetric_difference_update, __ixor__)
>
> It would be nice to make these all uniform. Intersection is hard to
> support one element at a time, but symmetric difference would work
> with a toggle() method.
Cool, I've added toggle(). FWIW, I've remoevd the named methods
(update etc.) in favor of stretching the accepted args for the
operators (__ior__ etc.).
> The operations that work a group at a time are
> likely be more efficient on Sized arguments, and much more efficient
> on arguments of the same type. How much are you trying to support
> these kinds of optimizations to implementations?
Haven't thought much about it. Implementations can just override the
concrete methods if they have a faster way, and fall back on the
concrete base class methods if their faster way doesn't apply for a
particular argument.
> > Should we unify remove and discard, a la Java (which has a single method returning a boolean indicating whether it was removed or not)?
>
> Yes. :)
Done.
> Mappings:
> Set-like unions, intersections, and differences are useful on maps
> too, but you usually have to provide a function to say what to do with
> colliding values. This is probably a different PEP though.
Right. My dream is to unify the set and dict types, solving the
problem of how to specify an empty set ({} would do it), but this was
rejected by Raymond Hettinger for valid pragmatic reasons.
> Sequence:
> I guess you've already dealt with this, but allowing __getitem__ with
> integer indices on str objects forces you to use UTF-32 or play
> interesting games with caching. Most string algorithms are quite happy
> with (bidirectional? multipass?) iterators, but programmers aren't
> used to using them.
Yeah, we've had our fill of discussions about alternative
implementations. I want the *semantics* to be those of UTF-32 or
UTF-16, and the initial implementation will use that, since these
semantics are most familiar to Python users. At some point we may have
competing implementations.
> Numbers:
Would you mind helping to write a PEP for numbers? I think I'm running
out of round tuits just for the collections and the ABC
infrastructure.
> Haskell might be good to use for inspiration on the numeric classes:
> http://www.haskell.org/ghc/docs/latest/html/libraries/base/Prelude.html#7.
> Note that Haskell's "Real" doesn't imply that the number is
> fractional, just that it's not complex. Some differences I see from
> Bill Janssen's proposal:
> * A Haskell Num is constructible from any Integer (python's long).
> * A Haskell Fractional (including Complex) is constructible from any
> Rational (which can be constructed from a float)
> * The operations are spread into more classes. I think that primarily
> helps Complex and Fixed-point types fit into the hierarchy better. The
> Haskell folks actually complain that the operations aren't spread into
> _enough_ classes. They mostly want more group-theory related classes
> below Num.
> * The bitwise operators are segregated into a Bits class, but I don't
> see any benefit to that.
Right. Python has a long tradition of using ints (and longs) for that.
> * In a possible use for Cardinal, Integral**Cardinal=>Integral,
> Fractional**Integral=>Fractional, and Floating**Floating=>Floating.
That can be (and is currently) done dynamically (see below).
> * Complex isn't Orderable
>
> Fortress (http://research.sun.com/projects/plrg/fortress.pdf#page=270)
> has an extremely detailed hierarchy of algebraic traits, and it looks
> like they plan a similarly detailed hierarchy of numbers, but the
> numbers aren't yet specified beyond integers and rationals.
>
> Cardinal is also useful as the argument to (Sequence * Cardinal).
I think there's little value for this in a dynamically typed language;
the implementation can raise ValueError or treat negative numbers as
zero (which is what Python currently does).
> Index (Ix) is used in Haskell primarily in the form of tuples of Ints
> for multi-dimensional arrays.
Right, that's where we use it too (especially the NumPy community).
--
--Guido van Rossum (home page: http://www.python.org/~guido/)
More information about the Python-3000
mailing list