Deprecating the old-style sequence protocol

This idea seems to come up regularly, so maybe it would be good to actually discuss it out (and, if necessary, explicitly reject it). Most recently, at https://github.com/ambv/typehinting/issues/170, Guido said:
FWIW, maybe we should try to deprecate supporting iteration using the old-style protocol? It's really a very old backwards compatibility measure (from when iterators were first introduced). Then eventually we could do the same for reversing using the old-style protocol.
The best discussion I found was from a 2013 thread (http://article.gmane.org/gmane.comp.python.ideas/23369/), which I'll quote below.
Anyway, the main argument for eliminating the old-style sequence protocol is that, unlike most other protocols in Python, it can't actually be checked for (without iterating the values). Despite a bunch of explicit workaround code (which registers builtin sequence types with `Iterable`, checks for C-API mappings in `reversed`, etc.), you still get false negatives when type-checking types like Steven's at runtime or type-checking time, and you still get false positives from `iter` and `reversed` themselves (`reversed(MyCustomMapping({1:2, 3:4}))` or `iter(typing.Iterable)` won't give you a `TypeError`, they'll give you a useless iterator--which may throw some other exception later when trying to iterate it, but even that isn't reliable).
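That false positive is easy to demonstrate. Here's a sketch with a hypothetical minimal mapping class standing in for the `MyCustomMapping` above:

```python
# A minimal pure-Python mapping: it defines __len__ and __getitem__,
# so reversed() happily falls back to the old-style sequence protocol
# (the C-API mapping check only catches builtin mappings like dict).
class MyCustomMapping:
    def __init__(self, data):
        self._data = dict(data)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, key):
        return self._data[key]

r = reversed(MyCustomMapping({1: 2, 3: 4}))  # no TypeError here
print(next(r))  # index 1 happens to be a key, so this "works": prints 2
try:
    next(r)     # index 0 is not a key
except KeyError:
    print("KeyError, not StopIteration")  # the unreliable later failure
```

The iterator is "useless" exactly as described: the first `next` call succeeds only by accident, and the eventual failure is a `KeyError` rather than anything a caller would be prepared for.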
I believe we could solve all of these problems by making `iter` and `reversed` raise a `TypeError`, without falling back to the old-style protocol, if the dunder method is `None` (like `hash`), changing the ABCs and static type checkers to use the same rules as `iter` and `reversed`, and adding `__reversed__ = None` to `collections.abc.Mapping`. (See http://bugs.python.org/issue25864 and http://bugs.python.org/issue25958 for details.)
Alternatively, if there were some way for a Python class to declare whether it's trying to be a mapping or a sequence or neither, as C API types do, I suppose that could be a solution. Or maybe the problems don't actually need to be solved.
But obviously, deprecating the old-style sequence protocol would make the problems go away.
---
Here's the argument against doing so:
On 2013-09-22 23:46:37 GMT, Steven D'Aprano wrote:
On Sun, Sep 22, 2013 at 12:37:52PM -0400, Terry Reedy wrote:
On 9/22/2013 10:22 AM, Nick Coghlan wrote:
The __getitem__ fallback is a backwards compatibility hack, not part of the formal definition of an iterable.
When I suggested that, by suggesting that the fallback *perhaps* could be called 'semi-deprecated, but kept for back compatibility' in the glossary entry, Raymond screamed at me and accused me of trying to change the language. He considers it an intended language feature that one can write a sequence class and not bother with __iter__. I guess we do not all agree ;-).
Raymond did not "scream", he wrote *one* word in uppercase for emphasis. I quote:
It is NOT deprecated. People use and rely on this behavior. It is a guaranteed behavior. Please don't use the glossary as a place to introduce changes to the language.
I agree, and I disagree with Nick's characterization of the sequence protocol as a "backwards-compatibility hack". It is an elegant protocol for implementing iteration of sequences, an old and venerable one that predates iterators, and just as much a part of Python's defined iterable behaviour as the business with calling next with no argument until it raises StopIteration. If it were considered *merely* for backward compatibility with Python 1.5 code, there was plenty of opportunity to drop it when Python 3 came out.
The sequence protocol allows one to write a lazily generated, potentially infinite sequence that still allows random access to items. Here's a toy example:
py> class Squares:
...     def __getitem__(self, index):
...         return index**2
...
py> for sq in Squares():
...     if sq > 9: break
...     print(sq)
...
0
1
4
9
Because it's infinite, there's no value that __len__ can return, and no need for a __len__. Because it supports random access to items, writing this as an iterator with __next__ is inappropriate. Writing *both* is unnecessary, and complicates the class for no benefit. As written, Squares is naturally thread-safe -- two threads can iterate over the same Squares object without interfering.
Also, elsewhere in the thread, someone else pointed out another example (which I'm rewriting to make it fit better with Steven's):
class TenSquares:
    def __len__(self): return 10
    def __getitem__(self, index):
        if 0 <= index < 10:
            return index**2
        raise IndexError
You can iterate this, convert it to a `list`, call `reversed` on it, etc., all in only 6 lines of code.
---
Guido's response was:
Hm. The example given there is a toy though. Something with a __getitem__ that maps its argument to its square might as well be a mapping. I really think it's time to slowly let go of this (no need to rush into removing support, but we could still frown upon its use).
And it's worth noting that making these examples work without the old-style sequence protocol isn't exactly hard: add a 1-line `__iter__` method, or a 1-line replacement for the old-style `iter`, or, for the second example, just inherit the `Sequence` ABC.
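For concreteness, here is a sketch of both fixes applied to the toy examples above (class names reused from Steven's and the other poster's code):

```python
import itertools
from collections.abc import Sequence

# Fix 1: add a short __iter__ to the infinite Squares example,
# so iteration no longer relies on the old-style fallback.
class Squares:
    def __getitem__(self, index):
        return index ** 2

    def __iter__(self):
        return (i ** 2 for i in itertools.count())

# Fix 2: inherit the Sequence ABC for the finite example;
# the mixin supplies __iter__, __reversed__, __contains__, etc.
class TenSquares(Sequence):
    def __len__(self):
        return 10

    def __getitem__(self, index):
        if 0 <= index < 10:
            return index ** 2
        raise IndexError(index)

print(list(TenSquares()))                # [0, 1, 4, 9, ..., 81]
print(list(reversed(TenSquares()))[:3])  # [81, 64, 49]
```

Both changes are a line or two, though the second does require importing `collections.abc` where the original six-liner needed nothing.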
Also, the thread-safety issue seems bogus. Any reasonable collection is thread-safe as an iterable.
Presumably the counter-argument is that, as trivial as those changes are, they're still not nearly as trivial as the original code, and in a quick-and-dirty script or interactive session, they may be more than you want to do (especially since they involve importing a module you didn't otherwise need). But I'll leave it to the people who are strongly against the deprecation to explain it, rather than putting words in their mouths.
---
Finally, as far as I can tell, the documentation of the old-style sequence protocol is in the library docs for `iter` and `reversed`, and the data model docs for `__reversed__` (but not `__iter__`), which say, respectively:
... object must be a collection object which supports the iteration protocol (the __iter__() method), or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0).
... seq must be an object which has a __reversed__() method or supports the sequence protocol (the __len__() method and the __getitem__() method with integer arguments starting at 0).
If the __reversed__() method is not provided, the reversed() built-in will fall back to using the sequence protocol (__len__() and __getitem__()). Objects that support the sequence protocol should only provide __reversed__() if they can provide an implementation that is more efficient than the one provided by reversed().

On 27 December 2015 at 13:07, Andrew Barnert via Python-ideas python-ideas@python.org wrote:
Anyway, the main argument for eliminating the old-style sequence protocol is that, unlike most other protocols in Python, it can't actually be checked for (without iterating the values). Despite a bunch of explicit workaround code (which registers builtin sequence types with `Iterable`, checks for C-API mappings in `reversed`, etc.), you still get false negatives when type-checking types like Steven's at runtime or type-checking time, and you still get false positives from `iter` and `reversed` themselves (`reversed(MyCustomMapping({1:2, 3:4}))` or `iter(typing.Iterable)` won't give you a `TypeError`, they'll give you a useless iterator--which may throw some other exception later when trying to iterate it, but even that isn't reliable).
I believe we could solve all of these problems by making `iter` and `reversed` raise a `TypeError`, without falling back to the old-style protocol, if the dunder method is `None` (like `hash`), changing the ABCs and static type checkers to use the same rules as `iter` and `reversed`, and adding `__reversed__ = None` to `collections.abc.Mapping`. (See http://bugs.python.org/issue25864 and http://bugs.python.org/issue25958 for details.)
Alternatively, if there were some way for a Python class to declare whether it's trying to be a mapping or a sequence or neither, as C API types do, I suppose that could be a solution. Or maybe the problems don't actually need to be solved.
But obviously, deprecating the old-style sequence protocol would make the problems go away.
[snip]
Finally, as far as I can tell, the documentation of the old-style sequence protocol is in the library docs for `iter` and `reversed`, and the data model docs for `__reversed__` (but not `__iter__`), which say, respectively:
... object must be a collection object which supports the iteration protocol (the __iter__() method), or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0).
... seq must be an object which has a __reversed__() method or supports the sequence protocol (the __len__() method and the __getitem__() method with integer arguments starting at 0).
If the __reversed__() method is not provided, the reversed() built-in will fall back to using the sequence protocol (__len__() and __getitem__()). Objects that support the sequence protocol should only provide __reversed__() if they can provide an implementation that is more efficient than the one provided by reversed().
There's an additional option we can consider, which is to move the backwards compatibility fallback to type creation time, rather than method lookup time. The two rules would be:
* if a type defines __getitem__ without also defining __iter__, add a default __iter__ implementation that assumes the type is a sequence
* if a type defines __getitem__ and __len__ without also defining __reversed__, add a default __reversed__ implementation that assumes the type is a sequence
(At the C level, even sequences need to use the mapping slots to support extended slicing, so we can't make the distinction based on which C level slots are defined)
As with using "__hash__ = None" to block the default inheritance of object.__hash__, setting "__iter__ = None" or "__reversed__ = None" in a class definition would block the addition of the implied methods.
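The `__hash__ = None` precedent this is modelled on already works today; a minimal illustration:

```python
# Setting __hash__ = None blocks the default inheritance of
# object.__hash__, so instances are explicitly unhashable.
class Unhashable:
    __hash__ = None

try:
    hash(Unhashable())
except TypeError as exc:
    print(exc)  # unhashable type: 'Unhashable'

# The proposal is for __iter__ = None and __reversed__ = None to
# block the implied sequence-protocol methods in the same way.
```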
However, while I think those changes would clean up some quirky edge cases without causing any harm, even doing all of that still wouldn't get us to the point of having a truly *structural* definition of the difference between a Mapping and a Sequence. For example, OrderedDict defines all of __len__, __getitem__, __iter__ and __reversed__ *without* being a sequence in the "items are looked up by their position in the sequence" sense.
These days, without considering the presence or absence of any non-dunder methods, the core distinction between sequences, multi-dimensional arrays and arbitrary mappings really lies in the type signature of the key parameter to __getitem__ et al (assuming a suitably defined Index type hint):
    MappingKey = Any
    DictKey = collections.abc.Hashable
    SequenceKey = Union[Index, slice]
    ArrayKey = Union[SequenceKey, Tuple["ArrayKey", ...]]
Regards, Nick.
[1] https://github.com/ambv/typehinting/issues/171

On 27.12.15 08:22, Nick Coghlan wrote:
These days, without considering the presence or absence of any non-dunder methods, the core distinction between sequences, multi-dimensional arrays and arbitrary mappings really lies in the type signature of the key parameter to __getitem__ et al (assuming a suitably defined Index type hint):
    MappingKey = Any
    DictKey = collections.abc.Hashable
    SequenceKey = Union[Index, slice]
    ArrayKey = Union[SequenceKey, Tuple["ArrayKey", ...]]
ArrayKey also includes Ellipsis.

On 27 December 2015 at 17:30, Serhiy Storchaka storchaka@gmail.com wrote:
On 27.12.15 08:22, Nick Coghlan wrote:
These days, without considering the presence or absence of any non-dunder methods, the core distinction between sequences, multi-dimensional arrays and arbitrary mappings really lies in the type signature of the key parameter to __getitem__ et al (assuming a suitably defined Index type hint):
    MappingKey = Any
    DictKey = collections.abc.Hashable
    SequenceKey = Union[Index, slice]
    ArrayKey = Union[SequenceKey, Tuple["ArrayKey", ...]]
ArrayKey also includes Ellipsis.
You're right, I was mistakenly thinking that memoryview implemented tuple indexing without ellipsis support, but it actually doesn't implement multi-dimensional indexing at all - once you cast to a multi-dimensional shape, most forms of subscript lookup are no longer permitted at all by the current implementation. So a more accurate array key description would look like:
ArrayKey = Union[SequenceKey, type(Ellipsis), Tuple["ArrayKey", ...]]
(I spelled out Ellipsis to minimise confusion with the tuple-as-frozen-list typing notation)
Cheers, Nick.

On Dec 26, 2015, at 22:22, Nick Coghlan ncoghlan@gmail.com wrote:
On 27 December 2015 at 13:07, Andrew Barnert via Python-ideas python-ideas@python.org wrote:
Anyway, the main argument for eliminating the old-style sequence protocol is that, unlike most other protocols in Python, it can't actually be checked for (without iterating the values). Despite a bunch of explicit workaround code (which registers builtin sequence types with `Iterable`, checks for C-API mappings in `reversed`, etc.), you still get false negatives when type-checking types like Steven's at runtime or type-checking time, and you still get false positives from `iter` and `reversed` themselves (`reversed(MyCustomMapping({1:2, 3:4}))` or `iter(typing.Iterable)` won't give you a `TypeError`, they'll give you a useless iterator--which may throw some other exception later when trying to iterate it, but even that isn't reliable).
...
There's an additional option we can consider, which is to move the backwards compatibility fallback to type creation time, rather than method lookup time.
Sure, that's possible, but why? It doesn't make it any easier to add the rule "__iter__ is None blocks fallback". It doesn't make it easier to eventually remove the old-style protocol if we decide to deprecate it (if anything, it seems to make it harder, by adding another observable difference). It might make it easier to write a perfect Iterable ABC, but making a pure-Python stdlib function simpler at the cost of major churn in the C implementation of multiple builtins and C API functions (and similar for other implementations) doesn't seem like a good tradeoff.
Unless it would be a lot simpler than I think? (I confess I haven't looked too much into what type() does under the covers, so maybe I'm overestimating the risk of changing it.)
However, while I think those changes would clean up some quirky edge cases without causing any harm, even doing all of that still wouldn't get us to the point of having a truly *structural* definition of the difference between a Mapping and a Sequence.
Agreed--but that wasn't the goal here. The existing nominal distinction between the two types, with all the most useful structurally-detectable features carved out separately, is a great design; the only problem is the quirky edge cases that erode the design and the workarounds needed to hold up the design; getting rid of those is the goal.
Sure, being able to structurally distinguish Mapping and Sequence would probably make that goal simpler, but it's neither necessary nor sufficient, and is probably impossible.
For example, OrderedDict defines all of __len__, __getitem__, __iter__ and __reversed__ *without* being a sequence in the "items are looked up by their position in the sequence" sense.
Sure, but that just means Sequence implies Reversible (and presumably is a subtype of Reversible) rather than the other way around. There's still a clear hierarchy there, despite it not being structurally detectable.
These days, without considering the presence or absence of any non-dunder methods, the core distinction between sequences, multi-dimensional arrays and arbitrary mappings really lies in the type signature of the key parameter to __getitem__ et al (assuming a suitably defined Index type hint):
Even that doesn't work. For example, most of the SortedDict types out there accept slices of keys, and yet a SkipListSortedDict[int, str] is clearly still not a sequence despite the fact that its __getitem__ takes Union[int, Slice[int]] just like a list[str] does. Unless the type system can actually represent "contiguous ints from 0" as a type, it can't make the distinction structurally. But, again, that's not a problem. I don't know of any serious language that solves the problem you're after (except maybe JS, Tcl, and others that just treat all sequences as mappings and have a clumsy API that everyone gets wrong half the time). The existing Python design, cleaned up a bit, would already be better than most languages, and good enough for me.

On 27.12.2015 04:07, Andrew Barnert via Python-ideas wrote:
This idea seems to come up regularly, so maybe it would be good to actually discuss it out (and, if necessary, explicitly reject it). Most recently, at https://github.com/ambv/typehinting/issues/170, Guido said:
FWIW, maybe we should try to deprecate supporting iteration using the old-style protocol? It's really a very old backwards compatibility measure (from when iterators were first introduced). Then eventually we could do the same for reversing using the old-style protocol.
The best discussion I found was from a 2013 thread (http://article.gmane.org/gmane.comp.python.ideas/23369/), which I'll quote below.
Anyway, the main argument for eliminating the old-style sequence protocol is that, unlike most other protocols in Python, it can't actually be checked for (without iterating the values). Despite a bunch of explicit workaround code (which registers builtin sequence types with `Iterable`, checks for C-API mappings in `reversed`, etc.), you still get false negatives when type-checking types like Steven's at runtime or type-checking time, and you still get false positives from `iter` and `reversed` themselves (`reversed(MyCustomMapping({1:2, 3:4}))` or `iter(typing.Iterable)` won't give you a `TypeError`, they'll give you a useless iterator--which may throw some other exception later when trying to iterate it, but even that isn't reliable).
I'm not sure I follow. The main purpose of ABCs was to be able to explicitly define a type as complying to the sequence, mapping, etc. protocols by registering the class with the appropriate ABCs.
https://www.python.org/dev/peps/pep-3119/
The "sequence protocol" is defined by the Sequence ABC, so by running an isinstance(obj, collections.abc.Sequence) check you can verify the protocol compliance.
Now, most of your email talks about iteration, so perhaps you're referring to a different protocol, that of iterating over arbitrary objects which implement .__getitem__(), but don't implement .__iter__() or .__len__().
However, the support for the iteration protocol is part of the Sequence ABC, so there's no way to separate the two. A Sequence must implement .__len__() as well as .__getitem__() and thus can always implement .__reversed__() and .__iter__().
An object which implements .__getitem__() without .__len__() is not a Python sequence (*).
Overall, the discussion feels somewhat arbitrary to me and is perhaps caused more by a misinterpretation or vague documentation which would need to be clarified, than by an actually missing feature in Python, paired with an important existing practical need :-)
Putting all this together, I believe you're talking about the iter() support for non-sequence, indexable objects. We don't have an ABC for this:
https://docs.python.org/3.5/library/collections.abc.html#collections-abstrac...
and can thus not check for it.
(*) The CPython interpreter actually has a different view on this. It only checks for a .__getitem__() method, not a .__len__() method, in PySequence_Check(). The length information is only queried where necessary and a missing implementation then results in an exception.

I think there's a lot of interesting stuff in this thread. Personally I don't think we should strive to distinguish between mappings and sequences structurally. We should instead continue to encourage inheriting from (or registering with) the corresponding ABCs. The goal is to ensure that there's one best-practice way to distinguish mappings from sequences, and it's by using isinstance(x, Sequence) or isinstance(x, Mapping).
If we want some way to turn something that just defines __getitem__ and __len__ into a proper sequence, it should just be made to inherit from Sequence, which supplies the default __iter__ and __reversed__. (Registration is *not* good enough here.) If we really want a way to turn something that just supports __getitem__ into an Iterable maybe we can provide an additional ABC for that purpose; let's call it a HalfSequence until we've come up with a better name. (We can't use Iterable for this because Iterable should not reference __getitem__.)
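The registration-versus-inheritance distinction matters here because registration only affects `isinstance` checks, while inheritance actually supplies the mixin methods. A sketch (class names are illustrative):

```python
from collections.abc import Sequence

class Plain:
    def __len__(self):
        return 3

    def __getitem__(self, i):
        if 0 <= i < 3:
            return i * 10
        raise IndexError(i)

# Registration changes isinstance() results, nothing else:
Sequence.register(Plain)
print(isinstance(Plain(), Sequence))     # True
print(hasattr(Plain, '__iter__'))        # False: no mixin methods supplied

# Inheritance actually supplies __iter__, __reversed__, count(), index(), ...
class Proper(Plain, Sequence):
    pass

print(list(Proper()))                    # [0, 10, 20], via Sequence.__iter__
```

(Note that `list(Plain())` would also "work" today, but only via the old-style fallback being discussed, which is exactly why registration is not good enough.)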
I also think it's fine to introduce Reversible as another ABC and carefully fit it into the existing hierarchy. It should be a one-trick pony and be another base class for Sequence; it should not have a default implementation. (But this has been beaten to death in other threads -- it's time to just file an issue with a patch.)

On 28 December 2015 at 03:04, Guido van Rossum guido@python.org wrote:
If we really want a way to turn something that just supports __getitem__ into an Iterable maybe we can provide an additional ABC for that purpose; let's call it a HalfSequence until we've come up with a better name. (We can't use Iterable for this because Iterable should not reference __getitem__.)
Perhaps collections.abc.Indexable would work? Invariant:
for idx, val in enumerate(container):
    assert container[idx] is val
That is, while enumerate() accepts any iterable, Indexable containers have the additional property that the contained values can be looked up by their enumeration index. Mappings (even ordered ones) don't qualify, since they offer a key:value lookup, but enumerating them produces an index:key relationship.
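Spelling that out with concrete containers (a quick sketch, not part of the proposal itself):

```python
# The proposed invariant holds for sequences:
container = ['a', 'b', 'c']
for idx, val in enumerate(container):
    assert container[idx] is val

# Mappings don't qualify: enumerating them produces index:key pairs,
# and the enumeration index is generally not a valid key.
mapping = {'x': 1, 'y': 2}
for idx, key in enumerate(mapping):
    assert mapping[key] in (1, 2)  # key:value lookup works
    # mapping[idx] would raise KeyError -- idx is not a key here
```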
Regards, Nick.

On 29.12.2015 03:59, Nick Coghlan wrote:
On 28 December 2015 at 03:04, Guido van Rossum guido@python.org wrote:
[ABCs are one honking great idea -- let's do more of those!]
[collections.abc.Indexable would be a good one.]
Maybe, I still cannot wrap my mind enough around the types-everywhere-in-python-please world.
But, what's so wrong about checking for __getitem__ or __len__ if necessary?
Best, Sven

On Dec 30, 2015, at 10:09, Sven R. Kunze srkunze@mail.de wrote:
On 29.12.2015 03:59, Nick Coghlan wrote:
On 28 December 2015 at 03:04, Guido van Rossum guido@python.org wrote: [ABCs are one honking great idea -- let's do more of those!]
[collections.abc.Indexable would be a good one.]
Maybe, I still cannot wrap my mind enough around the types-everywhere-in-python-please world.
But, what's so wrong about checking for __getitem__ or __len__ if necessary?
Well, for one thing, that will pick up mappings, generic types, and various other things that aren't indexable but use __getitem__ for other purposes.
It's the same problem as this thread in reverse: checking for __iter__ gives you false negatives because of the old-style sequence protocol; checking for __getitem__ gives you false positives because of the mapping protocol. But false positives are generally worse. Normally, you'd just EAFP it and write seq[idx] and deal with any exception; if you have to LBYL for some reason, a test that incorrectly passes many common values is not very helpful.
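The EAFP approach mentioned here can be sketched with a hypothetical helper:

```python
def first_item(obj, default=None):
    """EAFP: just try the subscript and handle the failure, rather
    than guessing up front whether obj is indexable."""
    try:
        return obj[0]
    except (TypeError, LookupError):
        # TypeError: not subscriptable at all (e.g. a generator)
        # LookupError: subscriptable, but 0 isn't a valid index/key
        return default

print(first_item([10, 20, 30]))   # 10
print(first_item({'a': 1}))       # None: 0 is not a key
print(first_item(iter([1, 2])))   # None: iterators aren't subscriptable
```

Note that even this has to lump the mapping case in with the failures by catching `LookupError`, which is the false-positive problem in miniature.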
Of course you could try a more stringent test--check for __getitem__ but not keys and not __extra__ and so on--but then you have to do that test everywhere; better to centralize it in one place.
Or, even better, to just accept that some things are not feasible for structural tests and just test for types that explicitly declare themselves Indexable (by inheritance or registration). That way, you may get false negatives, but not on common types, and it only takes one line of code to register that third-party class with Sequence if you need to--and no false positives.
Also, of course, ABCs are often useful as mixins. The fact that I can write a fully-fledged sequence with all the bells and whistles in 10 lines of code by inheriting from Sequence is pretty nice. Getting things like __iter__ for free by inheriting from Indexable (especially if the old-style sequence protocol is deprecated) would be similarly nice.
Again, you don't need to test for this all over the place--most of the time, you'll just EAFP. But when you do need to have a test, better to have one that says what it means, and doesn't pass false positives, and can be easily hooked for weird third-party classes, and so on.

On 31 December 2015 at 04:09, Sven R. Kunze srkunze@mail.de wrote:
On 29.12.2015 03:59, Nick Coghlan wrote:
On 28 December 2015 at 03:04, Guido van Rossum guido@python.org wrote:
[ABCs are one honking great idea -- let's do more of those!]
[collections.abc.Indexable would be a good one.]
Maybe, I still cannot wrap my mind enough around the types-everywhere-in-python-please world.
But, what's so wrong about checking for __getitem__ or __len__ if necessary?
Most of the time when I care, it's for early error detection. For normal function calls, your best bet is to just try the operation, and let the interpreter generate the appropriate exception - the traceback will give the appropriate context for the error, so there's little gain in doing your own check.
Things change when you're handing a callable off to be invoked later, whether that's through an object queue, atexit, context manager, thread pool, process pool, or something else. In those cases, a delayed exception will trigger in the invocation context, and so the traceback won't give the reader any information about which part of the code provided the bad arguments.
There are two main remedies for this:
1. Use runtime argument checking at the point the arguments are passed in
2. Use some form of structural type checking that allows code to be analysed for correctness without running it
The abc module provides a framework for the former task - if you know an algorithm needs a sequence (for example), you can write "isinstance(arg, Sequence)" before submitting the operation for execution and raise TypeError if the check fails. Folks passing in the wrong kind of argument then get a nice error message with a traceback at the point where they provided the incorrect data, rather than an obscure traceback that they then have to debug. As Andrew explains in his reply, this can be as simple as checking for a specific attribute, but it also extends to more complex criteria without changing the way you perform the runtime check.
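A sketch of that early-checking pattern (the queue and function names here are illustrative, not from the thread):

```python
from collections.abc import Sequence

work_queue = []

def submit(task_args):
    # Check at submission time, so a bad argument fails here with a
    # useful traceback, rather than later in whatever context
    # eventually runs the task.
    if not isinstance(task_args, Sequence):
        raise TypeError(
            f"expected a sequence, got {type(task_args).__name__!r}")
    work_queue.append(task_args)

submit([1, 2, 3])       # fine: list is a Sequence
try:
    submit({1: 2})      # dict is a Mapping, not a Sequence
except TypeError as exc:
    print(exc)
```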
Static analysers like mypy, pytypedecl and pylint provide support for the latter approach, by checking for consistency between the way objects are defined and created and the way they're used. While it's possible for incorrect code to pass static analysis, the vast majority of correct code will pass it (and any which fails would likely be confusing to a human reader as well). Since the analysis is static, runtime dynamism isn't relevant - the analyser can point out both sides of the inconsistency, even if they're encountered at different times or in different threads or processes when executed.
Cheers, Nick.
participants (6)
- Andrew Barnert
- Guido van Rossum
- M.-A. Lemburg
- Nick Coghlan
- Serhiy Storchaka
- Sven R. Kunze