[Python-ideas] Deprecating the old-style sequence protocol

Andrew Barnert abarnert at yahoo.com
Sat Dec 26 22:07:28 EST 2015

This idea seems to come up regularly, so maybe it would be good to actually discuss it properly (and, if necessary, explicitly reject it). Most recently, at https://github.com/ambv/typehinting/issues/170, Guido said:

> FWIW, maybe we should try to deprecate supporting iteration using the old-style protocol? It's really a very old backwards compatibility measure (from when iterators were first introduced). Then eventually we could do the same for reversing using the old-style protocol.

The best discussion I found was from a 2013 thread (http://article.gmane.org/gmane.comp.python.ideas/23369/), which I'll quote below.

Anyway, the main argument for eliminating the old-style sequence protocol is that, unlike most other protocols in Python, it can't actually be checked for (without iterating the values). Despite a bunch of explicit workaround code (which registers builtin sequence types with `Iterable`, checks for C-API mappings in `reversed`, etc.), you still get false negatives when type-checking types like Steven's at runtime or type-checking time, and you still get false positives from `iter` and `reversed` themselves (`reversed(MyCustomMapping({1:2, 3:4}))` or `iter(typing.Iterable)` won't give you a `TypeError`, they'll give you a useless iterator--which may throw some other exception later when trying to iterate it, but even that isn't reliable).
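
To make the false negative and the false positive concrete, here is a minimal sketch. The class names `OldStyleSquares` and `KeyedLookup` are made up for illustration and are not from either thread:

```python
from collections.abc import Iterable


class OldStyleSquares:
    # Hypothetical class: defines only __getitem__, so it is iterable
    # solely through the old-style sequence protocol.
    def __getitem__(self, index):
        if index >= 5:
            raise IndexError(index)
        return index ** 2


class KeyedLookup:
    # Hypothetical mapping-like class: __getitem__ takes keys, not positions.
    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]


squares = OldStyleSquares()

# False negative: the ABC check says "not iterable", yet iteration works.
is_iterable = isinstance(squares, Iterable)
values = list(squares)

# False positive: iter() happily returns an iterator; the problem only
# shows up (as a KeyError, not a TypeError) once we try to advance it.
lookup_iter = iter(KeyedLookup({1: 2, 3: 4}))
try:
    next(lookup_iter)
    failure = None
except KeyError as exc:
    failure = exc
```

Here `is_iterable` ends up `False` even though `values` is a perfectly good list, and `iter()` accepts the mapping-like object without complaint.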

I believe we could solve all of these problems by making `iter` and `reversed` raise a `TypeError`, without falling back to the old-style protocol, if the dunder method is `None` (like `hash`), change the ABC and static typer to use the same rules as `iter` and `reversed`, and add `__reversed__ = None` to `collections.abc.Mapping`. (See http://bugs.python.org/issue25864 and http://bugs.python.org/issue25958 for details.)
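
As a rough sketch of what that would look like from the user's side: the class name `IndexableOnly` and the helper `raises_type_error` are made up, and this assumes an interpreter where the `None` check described in issue 25958 is in place. On an interpreter without it, `iter` and `reversed` would fall back to `__getitem__` instead of raising.

```python
from collections.abc import Iterable


class IndexableOnly:
    # Hypothetical class: supports indexing but explicitly opts out of
    # iteration and reversal by setting the dunder methods to None.
    __iter__ = None
    __reversed__ = None

    def __len__(self):
        return 3

    def __getitem__(self, index):
        return index ** 2


def raises_type_error(func, obj):
    # Helper for this sketch only: report whether func(obj) raises TypeError.
    try:
        func(obj)
    except TypeError:
        return True
    return False


obj = IndexableOnly()
iter_blocked = raises_type_error(iter, obj)
reversed_blocked = raises_type_error(reversed, obj)
# With the same rule applied in the ABC, the isinstance check agrees.
abc_agrees = not isinstance(obj, Iterable)
still_indexable = obj[2]
```

The point is that `iter`, `reversed`, and the ABC would all give the same answer, while plain indexing keeps working.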

Alternatively, if there were some way for a Python class to declare whether it's trying to be a mapping or a sequence or neither, as C API types do, I suppose that could be a solution. Or maybe the problems don't actually need to be solved.

But obviously, deprecating the old-style sequence protocol would make the problems go away.


Here's the argument against doing so:

On 2013-09-22 23:46:37 GMT, Steven D'Aprano wrote:
> On Sun, Sep 22, 2013 at 12:37:52PM -0400, Terry Reedy wrote:
>> On 9/22/2013 10:22 AM, Nick Coghlan wrote:
>>> The __getitem__ fallback is a backwards
>>> compatibility hack, not part of the formal definition of an iterable.
>> When I suggested that, by suggesting that the fallback *perhaps* could 
>> be called 'semi-deprecated, but kept for back compatibility' in the 
>> glossary entry, Raymond screamed at me and accused me of trying to 
>> change the language. He considers it an intended language feature that 
>> one can write a sequence class and not bother with __iter__. I guess we 
>> do not all agree ;-).
> Raymond did not "scream", he wrote *one* word in uppercase for emphasis.
> I quote:
>> It is NOT deprecated.   People use and rely on this behavior.  It is 
>> a guaranteed behavior.  Please don't use the glossary as a place to 
>> introduce changes to the language.
> I agree, and I disagree with Nick's characterization of the sequence 
> protocol as a "backwards-compatibility hack". It is an elegant protocol 
> for implementing iteration of sequences, an old and venerable one that 
> predates iterators, and just as much of Python's defined iterable 
> behaviour as the business with calling next with no argument until it 
> raises StopIteration. If it were considered *merely* for backward 
> compatibility with Python 1.5 code, there was plenty of opportunity to 
> drop it when Python 3 came out.
> The sequence protocol allows one to write a lazily generated, 
> potentially infinite sequence that still allows random access to items. 
> Here's a toy example:
> py> class Squares:
> ...     def __getitem__(self, index):
> ...         return index**2
> ...
> py> for sq in Squares():
> ...     if sq > 9: break
> ...     print(sq)
> 0
> 1
> 4
> 9
> Because it's infinite, there's no value that __len__ can return, and no 
> need for a __len__. Because it supports random access to items, writing 
> this as an iterator with __next__ is inappropriate. Writing *both* is 
> unnecessary, and complicates the class for no benefit. As written, 
> Squares is naturally thread-safe -- two threads can iterate over the 
> same Squares object without interfering.

Also, elsewhere in the thread, someone else pointed out another example (which I'm rewriting to make it fit better with Steven's):

    class TenSquares:
        def __len__(self):
            return 10
        def __getitem__(self, index):
            if 0 <= index < 10: return index**2
            raise IndexError

You can iterate this, convert it to a `list`, call `reversed` on it, etc., all in only 6 lines of code.
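
For anyone who wants to try it, a short sketch of what the class gets for free from the old-style protocol (the variable names are just for illustration, and the class is repeated so the snippet stands alone):

```python
class TenSquares:
    def __len__(self):
        return 10

    def __getitem__(self, index):
        if 0 <= index < 10:
            return index ** 2
        raise IndexError(index)


ts = TenSquares()

# iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError.
forward = list(ts)

# reversed() falls back to __len__ plus __getitem__, counting down from len - 1.
backward = list(reversed(ts))

# Membership tests also fall back to old-style iteration.
has_16 = 16 in ts
```

Note that there is no `__iter__`, `__reversed__`, or `__contains__` anywhere in the class.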


Guido's response was:

> Hm. The example given there is a toy though. Something with a __getitem__
> that maps its argument to its square might as well be a mapping. I really
> think it's time to slowly let go of this (no need to rush into removing
> support, but we could still frown upon its use).

And it's worth noting that making these examples work without the old-style sequence protocol isn't exactly hard: add a 1-line `__iter__` method, or a 1-line replacement for the old-style `iter`, or, for the second example, just inherit the `Sequence` ABC.
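
Roughly, the three options could look like the sketch below. The helper name `getitem_iter` is made up, and I've written it as a small generator rather than a literal one-liner so that it also stops on `IndexError` the way `iter` does; for the infinite case something like `map(obj.__getitem__, itertools.count())` would be enough.

```python
from collections.abc import Iterable, Sequence
from itertools import count, islice


class Squares:
    def __getitem__(self, index):
        return index ** 2

    # Option 1: a one-line __iter__; each call returns a fresh generator,
    # so separate loops do not share any iteration state.
    def __iter__(self):
        return (self[i] for i in count())


def getitem_iter(obj):
    # Option 2 (hypothetical helper): spell out the old-style fallback
    # explicitly, stopping at the first IndexError.
    for i in count():
        try:
            yield obj[i]
        except IndexError:
            return


class TenSquares(Sequence):
    # Option 3: inherit the Sequence ABC, which supplies __iter__,
    # __reversed__, __contains__, index() and count() as mixin methods.
    def __len__(self):
        return 10

    def __getitem__(self, index):
        if 0 <= index < 10:
            return index ** 2
        raise IndexError(index)


first_four = list(islice(Squares(), 4))
via_helper = list(getitem_iter("abc"))
ten = list(TenSquares())
ten_reversed = list(reversed(TenSquares()))
checks = (isinstance(Squares(), Iterable), isinstance(TenSquares(), Iterable))
```

With options 1 and 3 the `Iterable` check succeeds as well, which is the whole point of moving off the old-style protocol.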

Also, the thread-safety issue seems bogus. Any reasonable collection is thread-safe as an iterable.

Presumably the counter-argument is that, as trivial as those changes are, they're still not nearly as trivial as the original code, and in a quick-and-dirty script or interactive session, it may be more than you want to do (especially since it involves importing a module you didn't otherwise need). But I'll leave it to the people who are strongly against the deprecation to explain it, rather than putting words in their mouths.


Finally, as far as I can tell, the documentation of the old-style sequence protocol is in the library docs for `iter` and `reversed`, and the data model docs for `__reversed__` (but not `__iter__`), which say, respectively:

> ... object must be a collection object which supports the iteration protocol (the __iter__() method), or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0).

> ... seq must be an object which has a __reversed__() method or supports the sequence protocol (the __len__() method and the __getitem__() method with integer arguments starting at 0).

> If the __reversed__() method is not provided, the reversed() built-in will fall back to using the sequence protocol (__len__() and __getitem__()). Objects that support the sequence protocol should only provide __reversed__() if they can provide an implementation that is more efficient than the one provided by reversed().
