[Python-ideas] Deprecating the old-style sequence protocol
Andrew Barnert
abarnert at yahoo.com
Sat Dec 26 22:07:28 EST 2015
This idea seems to come up regularly, so maybe it would be good to actually discuss it (and, if necessary, explicitly reject it). Most recently, at https://github.com/ambv/typehinting/issues/170, Guido said:
> FWIW, maybe we should try to deprecate supporting iteration using the old-style protocol? It's really a very old backwards compatibility measure (from when iterators were first introduced). Then eventually we could do the same for reversing using the old-style protocol.
The best discussion I found was from a 2013 thread (http://article.gmane.org/gmane.comp.python.ideas/23369/), which I'll quote below.
Anyway, the main argument for eliminating the old-style sequence protocol is that, unlike most other protocols in Python, it can't actually be checked for (without iterating the values). Despite a bunch of explicit workaround code (which registers builtin sequence types with `Iterable`, checks for C-API mappings in `reversed`, etc.), you still get false negatives when type-checking types like Steven's `Squares` (quoted below), whether at runtime or with a static type checker. And you still get false positives from `iter` and `reversed` themselves: `reversed(MyCustomMapping({1:2, 3:4}))` or `iter(typing.Iterable)` doesn't raise a `TypeError`; it returns a useless iterator, which may raise some other exception later when you try to iterate it, but even that isn't reliable.
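Here is a minimal sketch of both failure modes, assuming a current CPython 3 interpreter. `Squares` is Steven's class; `DictLike` is a hypothetical stand-in for a custom mapping that does not inherit from `collections.abc.Mapping` (classes that do inherit from it may behave differently on newer versions).

```python
from collections.abc import Iterable


class Squares:
    # Steven's example: only __getitem__, no __iter__.
    def __getitem__(self, index):
        return index ** 2


class DictLike:
    # Hypothetical mapping-like class that does not inherit from
    # collections.abc.Mapping; it just has __len__ and __getitem__.
    def __init__(self, data):
        self._data = dict(data)

    def __len__(self):
        return len(self._data)

    def __getitem__(self, key):
        return self._data[key]


# False negative: iter() accepts Squares, but the Iterable ABC says no.
print(isinstance(Squares(), Iterable))  # False
it = iter(Squares())
print(next(it), next(it), next(it))     # 0 1 4

# False positive: reversed() returns an iterator without complaint...
backwards = reversed(DictLike({1: 2, 3: 4}))
# ...which treats keys as positions len-1 down to 0 and fails partway.
print(next(backwards))                  # 2, only because key 1 happens to exist
try:
    next(backwards)                     # looks up key 0
except KeyError as exc:
    print("KeyError:", exc)             # KeyError: 0
```

Note that the first `next` succeeds by accident, which is the "isn't reliable" part: whether and when the error surfaces depends on which integer keys the mapping happens to contain.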
I believe we could solve all of these problems by making `iter` and `reversed` raise a `TypeError` (without falling back to the old-style protocol) if the corresponding dunder method is `None`, as `hash` already does; changing the ABC and the static type checker to use the same rules as `iter` and `reversed`; and adding `__reversed__ = None` to `collections.abc.Mapping`. (See http://bugs.python.org/issue25864 and http://bugs.python.org/issue25958 for details.)
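A minimal sketch of the opt-out half of that proposal, assuming an interpreter that honours `None` for `__iter__` and `__reversed__` the way `hash` honours `__hash__ = None` (CPython 3.6 and later do). The class name `Opaque` is hypothetical.

```python
from collections.abc import Iterable, Reversible


class Opaque:
    # Supports indexing and len(), but explicitly refuses iteration, so
    # iter() and reversed() must not fall back to the sequence protocol.
    __iter__ = None
    __reversed__ = None

    def __len__(self):
        return 3

    def __getitem__(self, key):
        return key * 2


obj = Opaque()
print(obj[1])                             # 2: indexing still works

for func in (iter, reversed):
    try:
        func(obj)
    except TypeError as exc:
        print(f"{func.__name__}: {exc}")  # a TypeError, not a useless iterator

# The ABCs follow the same rule, so runtime checks agree with iter/reversed.
print(isinstance(obj, Iterable), isinstance(obj, Reversible))  # False False
```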
Alternatively, if there were some way for a Python class to declare whether it's trying to be a mapping or a sequence or neither, as C API types do, I suppose that could be a solution. Or maybe the problems don't actually need to be solved.
But obviously, deprecating the old-style sequence protocol would make the problems go away.
---
Here's the argument against doing so:
On 2013-09-22 23:46:37 GMT, Steven D'Aprano wrote:
> On Sun, Sep 22, 2013 at 12:37:52PM -0400, Terry Reedy wrote:
>> On 9/22/2013 10:22 AM, Nick Coghlan wrote:
>>>
>>> The __getitem__ fallback is a backwards
>>> compatibility hack, not part of the formal definition of an iterable.
>>>
>> When I suggested that, by suggesting that the fallback *perhaps* could
>> be called 'semi-deprecated, but kept for back compatibility' in the
>> glossary entry, Raymond screamed at me and accused me of trying to
>> change the language. He considers it an intended language feature that
>> one can write a sequence class and not bother with __iter__. I guess we
>> do not all agree ;-).
>>
> Raymond did not "scream", he wrote *one* word in uppercase for emphasis.
> I quote:
>
>> It is NOT deprecated. People use and rely on this behavior. It is
>> a guaranteed behavior. Please don't use the glossary as a place to
>> introduce changes to the language.
>
> I agree, and I disagree with Nick's characterization of the sequence
> protocol as a "backwards-compatibility hack". It is an elegant protocol
> for implementing iteration of sequences, an old and venerable one that
> predates iterators, and just as much of Python's defined iterable
> behaviour as the business with calling next with no argument until it
> raises StopIteration. If it were considered *merely* for backward
> compatibility with Python 1.5 code, there was plenty of opportunity to
> drop it when Python 3 came out.
>
> The sequence protocol allows one to write a lazily generated,
> potentially infinite sequence that still allows random access to items.
> Here's a toy example:
>
> py> class Squares:
> ...     def __getitem__(self, index):
> ...         return index**2
> ...
> py> for sq in Squares():
> ...     if sq > 9: break
> ...     print(sq)
> 0
> 1
> 4
> 9
>
> Because it's infinite, there's no value that __len__ can return, and no
> need for a __len__. Because it supports random access to items, writing
> this as an iterator with __next__ is inappropriate. Writing *both* is
> unnecessary, and complicates the class for no benefit. As written,
> Squares is naturally thread-safe -- two threads can iterate over the
> same Squares object without interfering.
Also, elsewhere in the thread, someone else pointed out another example (which I'm rewriting to make it fit better with Steven's):
    class TenSquares:
        def __len__(self):
            return 10
        def __getitem__(self, index):
            if 0 <= index < 10: return index**2
            raise IndexError
You can iterate this, convert it to a `list`, call `reversed` on it, etc., all in only 6 lines of code.
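A quick check of that claim, as a sketch assuming Python 3. The `IndexError(index)` argument and the `in` and `sum` lines are additions for illustration; membership testing also falls back to the sequence protocol when there is no `__contains__` or `__iter__`.

```python
class TenSquares:
    def __len__(self):
        return 10

    def __getitem__(self, index):
        if 0 <= index < 10:
            return index ** 2
        raise IndexError(index)   # ends the old-style iteration


squares = TenSquares()
print(list(squares))            # [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
print(list(reversed(squares)))  # [81, 64, 49, 36, 25, 16, 9, 4, 1, 0]
print(49 in squares)            # True, found by scanning squares[0], squares[1], ...
print(sum(squares))             # 285
```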
---
Guido's response was:
> Hm. The example given there is a toy though. Something with a __getitem__
> that maps its argument to its square might as well be a mapping. I really
> think it's time to slowly let go of this (no need to rush into removing
> support, but we could still frown upon its use).
And it's worth noting that making these examples work without the old-style sequence protocol isn't exactly hard: add a 1-line `__iter__` method, or a 1-line replacement for the old-style `iter`, or, for the second example, just inherit from the `Sequence` ABC.
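A sketch of those three alternatives, assuming Python 3. The `old_style_iter` helper is hypothetical and written out over a few lines for clarity rather than squeezed into one; `collections.abc.Sequence` and `itertools.count` are the real standard-library names.

```python
import itertools
from collections.abc import Iterable, Sequence


# 1. Steven's Squares with a one-line __iter__ added.
class Squares:
    def __getitem__(self, index):
        return index ** 2

    def __iter__(self):
        return map(self.__getitem__, itertools.count())


# 2. A hypothetical stand-alone helper reproducing the old fallback for any
#    indexable object: yield seq[0], seq[1], ... until IndexError.
#    (CPython's built-in fallback also stops on StopIteration.)
def old_style_iter(seq):
    for i in itertools.count():
        try:
            item = seq[i]
        except (IndexError, StopIteration):
            return
        yield item


# 3. TenSquares inheriting from the Sequence ABC, which supplies
#    __iter__, __reversed__, __contains__, index() and count() as mixins.
class TenSquares(Sequence):
    def __len__(self):
        return 10

    def __getitem__(self, index):
        if 0 <= index < 10:
            return index ** 2
        raise IndexError(index)


print(isinstance(Squares(), Iterable))     # True: the ABC can now see it
print(list(old_style_iter("abc")))         # ['a', 'b', 'c']
print(isinstance(TenSquares(), Sequence))  # True
print(list(reversed(TenSquares()))[:3])    # [81, 64, 49]
```

The trade-off is the one discussed next: option 3 needs an import, and option 1 makes the class's infinite iteration explicit rather than implied by `__getitem__`.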
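Also, the thread-safety issue seems bogus. Any reasonable collection is thread-safe as an iterable, because each call to `iter` returns a new iterator with its own position, so threads iterating the same collection never share iteration state.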
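A small sketch of that point, assuming CPython: an old-style `TenSquares` and a plain `list` behave the same way, since each `iter()` call creates a fresh iterator holding its own index.

```python
import threading


class TenSquares:
    # Old-style sequence: no __iter__, so iter() uses the fallback.
    def __len__(self):
        return 10

    def __getitem__(self, index):
        if 0 <= index < 10:
            return index ** 2
        raise IndexError(index)


for shared in (TenSquares(), [i ** 2 for i in range(10)]):
    # Two iterators over one object advance independently...
    a, b = iter(shared), iter(shared)
    for _ in range(2):
        next(a)
    print(next(a), next(b))   # 4 0

    # ...so several threads can each consume the whole object safely.
    results = []
    lock = threading.Lock()

    def worker():
        items = list(shared)  # each list() call gets its own iterator
        with lock:
            results.append(items)

    threads = [threading.Thread(target=worker) for _ in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(all(r == results[0] for r in results), len(results))  # True 4
```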
Presumably the counter-argument is that, as trivial as those changes are, they're still not nearly as trivial as the original code, and in a quick-and-dirty script or interactive session, it may be more than you want to do (especially since it involves importing a module you didn't otherwise need). But I'll leave it to the people who are strongly against the deprecation to explain it, rather than putting words in their mouths.
---
Finally, as far as I can tell, the documentation of the old-style sequence protocol is in the library docs for `iter` and `reversed`, and the data model docs for `__reversed__` (but not `__iter__`), which say, respectively:
> ... object must be a collection object which supports the iteration protocol (the __iter__() method), or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0).
> ... seq must be an object which has a __reversed__() method or supports the sequence protocol (the __len__() method and the __getitem__() method with integer arguments starting at 0).
> If the __reversed__() method is not provided, the reversed() built-in will fall back to using the sequence protocol (__len__() and __getitem__()). Objects that support the sequence protocol should only provide __reversed__() if they can provide an implementation that is more efficient than the one provided by reversed().
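The two documented fallbacks can be observed directly with a hypothetical `Tracer` class that records every `__len__` and `__getitem__` call. This is a sketch assuming CPython; the loops are written out explicitly because `list()` may additionally call `__len__` through length hints.

```python
class Tracer:
    # Hypothetical class that logs every protocol call it receives.
    def __init__(self, n):
        self.n = n
        self.calls = []

    def __len__(self):
        self.calls.append("len")
        return self.n

    def __getitem__(self, index):
        self.calls.append(index)
        if 0 <= index < self.n:
            return index
        raise IndexError(index)


forward = Tracer(3)
for _ in iter(forward):       # old-style iter(): 0, 1, 2, ... until IndexError
    pass
print(forward.calls)          # [0, 1, 2, 3]

backward = Tracer(3)
for _ in reversed(backward):  # old-style reversed(): len() once, then n-1 down to 0
    pass
print(backward.calls)         # ['len', 2, 1, 0]
```

The asymmetry matches the docs: `iter` never needs `__len__` and discovers the end by probing one index past it, while `reversed` needs `__len__` up front to know where to start.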