[Python-checkins] r55278 - peps/trunk/pep-3119.txt
guido.van.rossum
python-checkins at python.org
Sat May 12 01:36:50 CEST 2007
Author: guido.van.rossum
Date: Sat May 12 01:36:47 2007
New Revision: 55278
Modified:
peps/trunk/pep-3119.txt
Log:
Streamlined the containers section.
Modified: peps/trunk/pep-3119.txt
==============================================================================
--- peps/trunk/pep-3119.txt (original)
+++ peps/trunk/pep-3119.txt Sat May 12 01:36:47 2007
@@ -7,7 +7,7 @@
Type: Standards Track
Content-Type: text/x-rst
Created: 18-Apr-2007
-Post-History: 26-Apr-2007
+Post-History: 26-Apr-2007, 11-May-2007
Abstract
@@ -30,6 +30,9 @@
Functions (GFs), but about clarifying philosophical issues like "what
makes a set", "what makes a mapping" and "what makes a sequence".
+There's also a companion PEP 3141, which defines ABCs for numeric
+types.
+
Acknowledgements
----------------
@@ -329,7 +332,7 @@
C() # works
-**Notes:** The ``@abstractmethod`` decorator should only be used
+**Note:** The ``@abstractmethod`` decorator should only be used
inside a class body, and only for classes whose metaclass is (derived
from) ``ABCMeta``. Dynamically adding abstract methods to a class, or
attempting to modify the abstraction status of a method or class once
@@ -337,6 +340,11 @@
affects subclasses derived using regular inheritance; "virtual
subclasses" registered with the ``register()`` method are not affected.
+It has been suggested that we should also provide a way to define
+abstract data attributes. As it is easy to add these in a later
+stage, and as the use case is considerably less common (apart from
+pure documentation), we punt on this for now.
+
**Implementation:** The ``@abstractmethod`` decorator sets the
function attribute ``__isabstractmethod__`` to the value ``True``.
The ``ABCMeta.__new__`` method computes the type attribute
@@ -358,19 +366,14 @@
useful as an end-point for a super-call in framework using a
cooperative multiple-inheritance [7]_, [8]_.
-**Open issues:** Should we also provide a standard way to declare
-abstract data attributes? If so, how should these be spelled?
-Perhaps place ``@abstractattribute`` decorators on properties? Or use
-an ``@attributes(name1, name2, ...)`` class decorator? Strawman:
-let's back off on this; it's easy enough to add this later.
-
ABCs for Containers and Iterators
---------------------------------
The ``collections`` module will define ABCs necessary and sufficient
to work with sets, mappings, sequences, and some helper types such as
-iterators and dictionary views.
+iterators and dictionary views. All ABCs have the above-mentioned
+``ABCMeta`` as their metaclass.
The ABCs provide implementations of their abstract methods that are
technically valid but fairly useless; e.g. ``__hash__`` returns 0, and
@@ -381,7 +384,8 @@
Some ABCs also provide concrete (i.e. non-abstract) methods; for
example, the ``Iterator`` class has an ``__iter__`` method returning
itself, fulfilling an important invariant of iterators (which in
-Python 2 has to be implemented anew by each iterator class).
+Python 2 has to be implemented anew by each iterator class). These
+ABCs can be considered "mix-in" classes.
No ABCs defined in the PEP override ``__init__``, ``__new__``,
``__str__`` or ``__repr__``. Defining a standard constructor
@@ -390,27 +394,19 @@
representation for a collection is similarly left up to individual
implementations.
-
-Ordering ABCs
-'''''''''''''
-
-These ABCs are closer to ``object`` in the ABC hierarchy.
-
-``PartiallyOrdered``
- This ABC defines the 4 inequality operations ``<``, ``<=``, ``>=``,
- ``>``. (Note that ``==`` and ``!=`` are defined by ``object``.)
- Classes deriving from this ABC should implement a partial order
- as defined in mathematics. [9]_
-
-``TotallyOrdered``
- This ABC derives from ``PartiallyOrdered``. It adds no new
- operations but implies a promise of stronger invariants.
- Classes deriving from this ABC should implement a total order
- as defined in mathematics. [10]_
-
-**Open issues:** Where should these live? The ``collections`` module
-doesn't seem right, but making them built-ins seems a slippery slope
-too.
+**Note:** There are no ABCs for ordering operations (``__lt__``,
+``__le__``, ``__ge__``, ``__gt__``). Defining these in a base class
+(abstract or not) runs into problems with the accepted type for the
+second operand. For example, if class ``Ordering`` defined
+``__lt__``, one would assume that for any ``Ordering`` instances ``x``
+and ``y``, ``x < y`` would be defined (even if it just defines a
+partial ordering). But this cannot be the case: If both ``list`` and
+``str`` derived from ``Ordering``, this would imply that ``[1, 2] <
+(1, 2)`` should be defined (and presumably return False), while in
+fact (in Python 3000!) such "mixed-mode comparisons" operations are
+explicitly forbidden and raise ``TypeError``. See PEP 3100 and [14]_
+for more information. (This is a special case of a more general issue
+with operations that take another argument of the same type:
One Trick Ponies
@@ -418,23 +414,22 @@
These abstract classes represent single methods like ``__iter__`` or
``__len__``.
-
+
``Hashable``
The base class for classes defining ``__hash__``. The
- ``__hash__`` method should return an ``Integer`` (see "Numbers"
- below). The abstract ``__hash__`` method always returns 0, which
- is a valid (albeit inefficient) implementation. **Invariant:** If
- classes ``C1`` and ``C2`` both derive from ``Hashable``, the
- condition ``o1 == o2`` must imply ``hash(o1) == hash(o2)`` for all
- instances ``o1`` of ``C1`` and all instances ``o2`` of ``C2``.
- IOW, two objects shouldn't compare equal but have different hash
- values.
+ ``__hash__`` method should return an integer. The abstract
+ ``__hash__`` method always returns 0, which is a valid (albeit
+ inefficient) implementation. **Invariant:** If classes ``C1`` and
+ ``C2`` both derive from ``Hashable``, the condition ``o1 == o2``
+ must imply ``hash(o1) == hash(o2)`` for all instances ``o1`` of
+ ``C1`` and all instances ``o2`` of ``C2``. IOW, two objects
+ should never compare equal but have different hash values.
Another constraint is that hashable objects, once created, should
never change their value (as compared by ``==``) or their hash
value. If a class cannot guarantee this, it should not derive
from ``Hashable``; if it cannot guarantee this for certain
- instances only, ``__hash__`` for those instances should raise a
+ instances, ``__hash__`` for those instances should raise a
``TypeError`` exception.
**Note:** being an instance of this class does not imply that an
@@ -453,7 +448,11 @@
The base class for classes defining ``__next__``. This derives
from ``Iterable``. The abstract ``__next__`` method raises
``StopIteration``. The concrete ``__iter__`` method returns
- ``self``.
+ ``self``. Note the distinction between ``Iterable`` and
+ ``Iterator``: an ``Iterable`` can be iterated over, i.e. supports
+ the ``__iter__`` methods; an ``Iterator`` is what the built-in
+ function ``iter()`` returns, i.e. supports the ``__next__``
+ method.
``Sized``
The base class for classes defining ``__len__``. The ``__len__``
@@ -471,57 +470,54 @@
``Iterable``, then ``(x in o for x in o)`` should be a generator
yielding only True values for any instance ``o`` of ``C``.
- **Open issues:** strictly speaking, there are three variants of
- this method's semantics. The first one is for sets and mappings,
- which is fast: O(1) or O(log N). The second one is for membership
- checking on sequences, which is slow: O(N). The third one is for
- subsequence checking on (character or byte) strings, which is also
- slow: O(N). Would it make sense to distinguish these? The
- signature of the third variant is different, since it takes a
- sequence (typically of the same type as the method's target)
- intead of an element. For now, I'm using the same type for all
- three. This means that is is possible for ``x in o`` to be True
- even though ``x`` is never yielded by ``iter(o)``. A suggested
- name for the third form is ``Searchable`` (though people have
- objected against this name on the grounds that it has the wrong
- association).
+**Open issues:** Conceivably, instead of using the ABCMeta metaclass,
+these classes could override ``__instancecheck__`` and
+``__subclasscheck__`` to check for the presence of the applicable
+special method; for example::
+
+ class Sized(metaclass=ABCMeta):
+ @abstractmethod
+ def __hash__(self):
+ return 0
+ @classmethod
+ def __instancecheck__(cls, x):
+ return hasattr(x, "__len__")
+ @classmethod
+ def __subclasscheck__(cls, C):
+ return hasattr(C, "__bases__") and hasattr(C, "__len__")
+
+This has the advantage of not requiring explicit registration.
+However, the semantics hard to get exactly right given the confusing
+semantics of instance attributes vs. class attributes, and that a
+class is an instance of its metaclass; the check for ``__bases__`` is
+only an approximation of the desired semantics. **Strawman:** Let's
+do it, but let's arrange it in such a way that the registration API
+also works.
Sets
''''
-These abstract classes represent various stages of "set-ness". The
+These abstract classes represent read-only sets and mutable sets. The
most fundamental set operation is the membership test, written as ``x
-in s`` and implemented by ``s.__contains__(x)``. This is already
-taken care of by the `Container`` class defined above. Therefore, we
-define a set as a sized, iterable container for which certain
+in s`` and implemented by ``s.__contains__(x)``. This operation is
+already defined by the `Container`` class defined above. Therefore,
+we define a set as a sized, iterable container for which certain
invariants from mathematical set theory hold.
The built-in type ``set`` derives from ``MutableSet``. The built-in
-type ``frozenset`` derives from ``HashableSet``.
-
-You might wonder why we require a set to be sized -- surely certain
-infinite sets can be represented just fine in Python. For example,
-the set of even integers could be defined like this::
-
- class EvenIntegers(Container):
- def __contains__(self, x):
- return x % 2 == 0
-
-However, such sets have rather limited practical value, and deciding
-whether one such set is a subset of another would be difficult in
-general without using a symbolic algebra package. So I consider this
-out of the scope of a pragmatic proposal like this.
+type ``frozenset`` derives from ``Set`` and ``Hashable``.
``Set``
- This is a sized, iterable, partially ordered container, i.e. a
- subclass of ``Sized``, ``Iterable``, ``Container`` and
- ``PartiallyOrdered``. Not every subset of those three classes is
- a set though! Sets have the additional invariant that each
- element occurs only once (as can be determined by iteration), and
- in addition sets define concrete operators that implement the
- inequality operations as subclass/superclass tests. In general,
- the invariants for finite sets in mathematics hold. [11]_
+
+ This is a sized, iterable container, i.e., a subclass of
+ ``Sized``, ``Iterable`` and ``Container``. Not every subclass of
+ those three classes is a set though! Sets have the additional
+ invariant that each element occurs only once (as can be determined
+ by iteration), and in addition sets define concrete operators that
+ implement the inequality operations as subclass/superclass tests.
+ In general, the invariants for finite sets in mathematics
+ hold. [11]_
Sets with different implementations can be compared safely,
(usually) efficiently and correctly using the mathematical
@@ -539,57 +535,33 @@
can be compared to those for equality (and then they always
compare unequal).
+ This class also defines concrete operators to compute union,
+ intersection, symmetric and asymmetric difference, respectively
+ ``__or__``, ``__and__``, ``__xor__`` and ``__sub__``. These
+ operators should return instances of ``Set``. The default
+ implementations call the overridable class method
+ ``_from_iterable()`` with an iterable argument. This factory
+ method's default implementation returns a ``frozenset`` instance;
+ it may be overridden to return another appropriate ``Set``
+ subclass.
+
+ Finally, this class defines a concrete method ``_hash`` which
+ computes the hash value from the elements. Hashable subclasses of
+ ``Set`` can implement ``__hash__`` by calling ``_hash`` or they
+ can reimplement the same algorithm more efficiently; but the
+ algorithm implemented should be the same. Currently the algorithm
+ is fully specified only by the source code [15]_.
+
**Note:** the ``issubset`` and ``issuperset`` methods found on the
set type in Python 2 are not supported, as these are mostly just
aliases for ``__le__`` and ``__ge__``.
- **Open issues:** should we define comparison of instances of
- different concrete set types this way?
-
-``ComposableSet``
- This is a subclass of ``Set`` that defines abstract operators to
- compute union, intersection, symmetric and asymmetric difference,
- respectively ``__or__``, ``__and__``, ``__xor__`` and ``__sub__``.
- These operators should return instances of ``ComposableSet``. The
- abstract implementations return no meaningful values but raise
- ``NotImplementedError``; this is because any generic
- implementation would have to create new instances but the ABCs
- don't (and shouldn't, IMO) provide an API for creating new
- instances. The implementations of these operators should ensure
- that the results match the mathematical definition of set
- composition. [11]_
-
- **Open issues:** Should ``__or__`` and friends be abstract or
- concrete methods? Making them abstract means that every
- ComposableSet implementation must reimplement all of them. But
- making them concrete begs the question of the actual return type:
- since the ABC doesn't (and IMO shouldn't) define the constructor
- signature for subclasses, the concrete implementations in the ABC
- don't have an API to construct a new instance given an iterable.
- Perhaps the right choice is to have a static concrete factory
- function ``fromiterable`` which takes an iterable and returns
- a ``ComposableSet`` instance. Subclasses can override this and
- benefit from the default implementations of ``__or__`` etc.; or
- they can override ``__or__`` if they want to.
-
-``HashableSet``
- This is a subclass of both ``ComposableSet`` and ``Hashable``. It
- implements a concrete ``__hash__`` method that subclasses should
- not override; or if they do, the subclass should compute the same
- hash value. This is so that sets with different implementations
- still hash to the same value, so they can be used interchangeably
- as dictionary keys. (A similar constraint exists on the hash
- values for different types of numbers and strings.)
-
- **Open issues:** Spell out the hash algorithm. Should there be
- another ABC that derives from Set and Hashable, but not from
- Composable?
-
``MutableSet``
- This is a subclass of ``ComposableSet`` implementing additional
- operations to add and remove elements. The supported methods have
- the semantics known from the ``set`` type in Python 2 (except
- for ``discard``, which is modeled after Java):
+
+ This is a subclass of ``Set`` implementing additional operations
+ to add and remove elements. The supported methods have the
+ semantics known from the ``set`` type in Python 2 (except for
+ ``discard``, which is modeled after Java):
``.add(x)``
Abstract method returning a ``bool`` that adds the element
@@ -635,21 +607,18 @@
Mappings
''''''''
-These abstract classes represent various stages of mapping-ness. The
-``Mapping`` class represents the most common read-only mapping API.
-However, code *accepting* a mapping is encouraged to check for the
-``BasicMapping`` ABC when iteration is not used. This allows for
-certain "black-box" implementations that can look up values by key but
-don't provide a convenient iteration API. A hypothetical example
-would be an interface to a hierarchical filesystem, where keys are
-pathnames relative to some root directory. Iterating over all
-pathnames would presumably take forever, as would counting the number
-of valid pathnames.
+These abstract classes represent read-only mappings and mutable
+mappings. The ``Mapping`` class represents the most common read-only
+mapping API.
The built-in type ``dict`` derives from ``MutableMapping``.
-``BasicMapping``
- A subclass of ``Container`` defining the following methods:
+``Mapping``
+
+ A subclass of ``Container``, ``Iterable`` and ``Sized``. The keys
+ of a mapping naturally form a set. The (key, value) pairs (which
+ must be tuples) are also referred to as items. The items also
+ form a set. Methods:
``.__getitem__(key)``
Abstract method that returns the value corresponding to
@@ -664,25 +633,13 @@
Concrete method returning ``True`` if ``self[key]`` does not
raise ``KeyError``, and ``False`` if it does.
-``Mapping``
- A subclass of ``BasicMapping``, ``Iterable`` and ``Sized``. The
- keys of a mapping naturally form a set. The (key, value) pairs
- are also referred to as items. The items also form a set.
- Methods:
-
``.__len__()``
- Abstract method returning the length of the key set.
+ Abstract method returning the number of distinct keys (i.e.,
+ the length of the key set).
``.__iter__()``
Abstract method returning each key in the key set exactly once.
- ``.__eq__(obj)``
- Concrete method for comparing mappings. Two mappings, even
- with different implementations, can be compared for equality,
- and are considered equal if and only if their item sets are
- equal. **Open issues:** should we define comparison of
- instances of different concrete mapping types this way?
-
``.keys()``
Concrete method returning the key set as a ``Set``. The
default concrete implementation returns a "view" on the key
@@ -712,11 +669,6 @@
i.e. iterating over the items, keys and values should return
results in the same order.
-``HashableMapping``
- A subclass of ``Mapping`` and ``Hashable``. The values should be
- instances of ``Hashable``. The concrete ``__hash__`` method
- should be equal to ``hash(m.items())``.
-
``MutableMapping``
A subclass of ``Mapping`` that also implements some standard
mutating methods. Abstract methods include ``__setitem__``,
@@ -728,13 +680,15 @@
Sequences
'''''''''
-These abstract classes represent various stages of sequence-ness.
+These abstract classes represent read-only sequences and mutable
+sequences.
The built-in ``list`` and ``bytes`` types derive from
``MutableSequence``. The built-in ``tuple`` and ``str`` types derive
-from ``HashableSequence``.
+from ``Sequence`` and ``Hashable``.
``Sequence``
+
A subclass of ``Iterable``, ``Sized``, ``Container``. It
defines a new abstract method ``__getitem__`` that has a somewhat
complicated signature: when called with an integer, it returns an
@@ -747,15 +701,11 @@
**Open issues:** Other candidate methods, which can all have
default concrete implementations that only depend on ``__len__``
- and ``__getitem__`` with an integer argument: __reversed__, index,
- count, __add__, __mul__, __eq__, __lt__, __le__.
-
-``HashableSequence``
- A subclass of ``Sequence`` and ``Hashable``. The concrete
- ``__hash__`` method should implements the hashing algorithms used
- by tuples in Python 2.
+ and ``__getitem__`` with an integer argument: ``__reversed__``,
+ ``index``, ``count``, ``__add__``, ``__mul__``.
``MutableSequence``
+
A subclass of ``Sequence`` adding some standard mutating methods.
Abstract mutating methods: ``__setitem__`` (for integer indices as
well as slices), ``__delitem__`` (ditto), ``insert``, ``append``,
@@ -765,24 +715,14 @@
``sort()`` -- that is only required to exist on genuine ``list``
instances.
-**Open issues:** If all the elements of a sequence are totally
-ordered, the sequence itself can be totally ordered with respect to
-other sequences containing corresponding items of the same type.
-Should we reflect this by making ``Sequence`` derive from
-``TotallyOrdered``? Or ``Partiallyordered``? Also, we could easily
-define comparison of sequences of different types, so that e.g.
-``(1, 2, 3) == [1, 2, 3]`` and ``(1, 2) < [1, 2, 3]``. Should we?
-(It might imply ``["a", "b"] == "ab"`` and ``[1, 2] == b"\1\2"``.)
-
Strings
-------
-Python 3000 has two built-in string types: byte strings (``bytes``),
-deriving from ``MutableSequence``, and (Unicode) character strings
-(``str``), deriving from ``HashableSequence``. They also derive from
-``TotallyOrdered``. If we were to introduce ``Searchable``, they
-would also derive from that.
+Python 3000 will likely have at least two built-in string types: byte
+strings (``bytes``), deriving from ``MutableSequence``, and (Unicode)
+character strings (``str``), deriving from ``Sequence`` and
+``Hashable``.
**Open issues:** define the base interfaces for these so alternative
implementations and subclasses know what they are in for. This may be
@@ -790,29 +730,6 @@
``bytes`` type).
-Numbers
--------
-
-ABCs for numerical types are defined in PEP 3141.
-
-
-Guidelines for Writing ABCs
----------------------------
-
-Some suggestions for writing ABCs:
-
-* Use the ``@abstractmethod`` decorator.
-
-* Define abstract methods that could be useful as an end point when
- called via a super chain.
-
-* Define concrete methods that are very simple permutations of
- abstract methods (e.g. ``Mapping.get``).
-
-* Keep abstract classes small, one per use case instead of one per
- concept.
-
-
ABCs vs. Alternatives
=====================
@@ -940,6 +857,12 @@
.. [13] ABCMeta sample implementation
(http://svn.python.org/view/sandbox/trunk/abc/xyz.py)
+.. [14] python-dev email ("Comparing heterogeneous types")
+ http://mail.python.org/pipermail/python-dev/2004-June/045111.html
+
+.. [15] Function ``frozenset_hash()`` in Object/setobject.c
+ (http://svn.python.org/view/python/trunk/Objects/setobject.c)
+
Copyright
=========
More information about the Python-checkins
mailing list