[Python-ideas] Deprecating the old-style sequence protocol

Andrew Barnert abarnert at yahoo.com
Sat Dec 26 22:07:28 EST 2015


This idea seems to come up regularly, so maybe it would be good to actually discuss it properly (and, if necessary, explicitly reject it). Most recently, at https://github.com/ambv/typehinting/issues/170, Guido said:

> FWIW, maybe we should try to deprecate supporting iteration using the old-style protocol? It's really a very old backwards compatibility measure (from when iterators were first introduced). Then eventually we could do the same for reversing using the old-style protocol.

The best discussion I found was from a 2013 thread (http://article.gmane.org/gmane.comp.python.ideas/23369/), which I'll quote below.

Anyway, the main argument for eliminating the old-style sequence protocol is that, unlike most other protocols in Python, it can't actually be checked for (without iterating the values). Despite a bunch of explicit workaround code (which registers builtin sequence types with `Iterable`, checks for C-API mappings in `reversed`, etc.), you still get false negatives when type-checking types like Steven's at runtime or type-checking time, and you still get false positives from `iter` and `reversed` themselves (`reversed(MyCustomMapping({1:2, 3:4}))` or `iter(typing.Iterable)` won't give you a `TypeError`, they'll give you a useless iterator--which may throw some other exception later when trying to iterate it, but even that isn't reliable).
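
To make both failure modes concrete, here is a minimal sketch. `FiveSquares` and `Config` are hypothetical classes made up for illustration (they are not from the thread or the bug reports); the only library pieces used are `collections.abc.Iterable` and the builtins.

```python
from collections.abc import Iterable


class FiveSquares:
    """Hypothetical old-style sequence: only __getitem__, no __iter__."""

    def __getitem__(self, index):
        if index >= 5:
            raise IndexError(index)
        return index ** 2


class Config:
    """Hypothetical mapping-like class: __getitem__ expects string keys."""

    def __init__(self, data):
        self._data = dict(data)

    def __getitem__(self, key):
        return self._data[key]


# False negative: the ABC check reports "not iterable", yet iteration works.
squares = FiveSquares()
abc_says_iterable = isinstance(squares, Iterable)
values = list(squares)

# False positive: iter() returns an iterator for something that is not a
# sequence at all; the problem only surfaces on the first next() call.
it = iter(Config({"host": "localhost"}))
try:
    next(it)
    error = None
except KeyError as exc:
    error = exc

print(abc_says_iterable, values, type(error).__name__)
```

Note that the late failure is a `KeyError` here only because `Config` happens to raise one for the key `0`; a class that accepted integer keys would iterate "successfully" and yield nonsense.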


I believe we could solve all of these problems by making `iter` and `reversed` raise a `TypeError`, without falling back to the old-style protocol, if the dunder method is `None` (like `hash`); changing the ABC and the static type checker to use the same rules as `iter` and `reversed`; and adding `__reversed__ = None` to `collections.abc.Mapping`. (See http://bugs.python.org/issue25864 and http://bugs.python.org/issue25958 for details.)
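
As a rough illustration of the rule for `iter`, the proposed semantics can be emulated in pure Python. `strict_iter` and `Lookup` are hypothetical names invented for this sketch; it does not show how the interpreter itself would implement the check.

```python
def strict_iter(obj):
    """Sketch of the proposed rule: __iter__ = None means 'not iterable'."""
    # Special methods are looked up on the type, not the instance.
    if getattr(type(obj), "__iter__", "missing") is None:
        raise TypeError(f"{type(obj).__name__!r} object is not iterable")
    # Otherwise defer to the usual lookup, old-style fallback included.
    return iter(obj)


class Lookup:
    """Hypothetical mapping-like class that opts out of iteration."""

    __iter__ = None

    def __getitem__(self, key):
        return {"a": 1}[key]


class Squares:
    """Old-style sequence that does not opt out, so the fallback still applies."""

    def __getitem__(self, index):
        return index ** 2


try:
    strict_iter(Lookup())
    rejected = False
except TypeError:
    rejected = True

first_square = next(strict_iter(Squares()))
print(rejected, first_square)
```

The point of the design is that opting out is explicit and checkable up front, in the same way that `__hash__ = None` marks a class as unhashable.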

Alternatively, if there were some way for a Python class to declare whether it's trying to be a mapping or a sequence or neither, as C API types do, I suppose that could be a solution. Or maybe the problems don't actually need to be solved.

But obviously, deprecating the old-style sequence protocol would make the problems go away.

---


Here's the argument against doing so:

On 2013-09-22 23:46:37 GMT, Steven D'Aprano wrote:
> On Sun, Sep 22, 2013 at 12:37:52PM -0400, Terry Reedy wrote:
>> On 9/22/2013 10:22 AM, Nick Coghlan wrote:
>>> 
>>> The __getitem__ fallback is a backwards
>>> compatibility hack, not part of the formal definition of an iterable.
>>> 
>> When I suggested that, by suggesting that the fallback *perhaps* could 
>> be called 'semi-deprecated, but kept for back compatibility' in the 
>> glossary entry, Raymond screamed at me and accused me of trying to 
>> change the language. He considers it an intended language feature that 
>> one can write a sequence class and not bother with __iter__. I guess we 
>> do not all agree ;-).
>> 
> Raymond did not "scream", he wrote *one* word in uppercase for emphasis.
> I quote:
> 
>> It is NOT deprecated.   People use and rely on this behavior.  It is 
>> a guaranteed behavior.  Please don't use the glossary as a place to 
>> introduce changes to the language.
> 
> I agree, and I disagree with Nick's characterization of the sequence 
> protocol as a "backwards-compatibility hack". It is an elegant protocol 
> for implementing iteration of sequences, an old and venerable one that 
> predates iterators, and just as much of Python's defined iterable 
> behaviour as the business with calling next with no argument until it 
> raises StopIteration. If it were considered *merely* for backward 
> compatibility with Python 1.5 code, there was plenty of opportunity to 
> drop it when Python 3 came out.
> 
> The sequence protocol allows one to write a lazily generated, 
> potentially infinite sequence that still allows random access to items. 
> Here's a toy example:
> 
> py> class Squares:
> ...     def __getitem__(self, index):
> ...         return index**2
> ...
> py> for sq in Squares():
> ...     if sq > 9: break
> ...     print(sq)
> 0
> 1
> 4
> 9
> 
> Because it's infinite, there's no value that __len__ can return, and no 
> need for a __len__. Because it supports random access to items, writing 
> this as an iterator with __next__ is inappropriate. Writing *both* is 
> unnecessary, and complicates the class for no benefit. As written, 
> Squares is naturally thread-safe -- two threads can iterate over the 
> same Squares object without interfering.

Also, elsewhere in the thread, someone else pointed out another example (which I'm rewriting to make it fit better with Steven's):

    class TenSquares:
        def __len__(self):
            return 10
        def __getitem__(self, index):
            if 0 <= index < 10: return index**2
            raise IndexError

You can iterate this, convert it to a `list`, call `reversed` on it, etc., all in only 6 lines of code.
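
For example, the following sketch exercises the class using only builtins. The class body repeats the one above (with the `if` split over two lines); the variable names are just for illustration.

```python
class TenSquares:
    def __len__(self):
        return 10

    def __getitem__(self, index):
        if 0 <= index < 10:
            return index ** 2
        raise IndexError(index)


ts = TenSquares()

forward = list(ts)             # iter() falls back to __getitem__ with 0, 1, 2, ...
backward = list(reversed(ts))  # reversed() falls back to __len__ plus __getitem__
has_25 = 25 in ts              # membership tests iterate through the same fallback

print(forward)
print(backward)
print(has_25)
```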

---

Guido's response was:

> Hm. The example given there is a toy though. Something with a __getitem__
> that maps its argument to its square might as well be a mapping. I really
> think it's time to slowly let go of this (no need to rush into removing
> support, but we could still frown upon its use).

And it's worth noting that making these examples work without the old-style sequence protocol isn't exactly hard: add a 1-line `__iter__` method, or a 1-line replacement for the old-style `iter`, or, for the second example, just inherit the `Sequence` ABC.
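
Here is one way those changes might look; this is a sketch, not the only possible spelling. The `__iter__` body built on `itertools.count` is my own choice for the infinite case, and it only suits a `__getitem__` that never raises `IndexError`.

```python
import itertools
from collections.abc import Iterable, Sequence


class Squares:
    def __getitem__(self, index):
        return index ** 2

    # The one-line addition: an explicit __iter__ for the infinite case.
    def __iter__(self):
        return (self[i] for i in itertools.count())


class TenSquares(Sequence):
    # Inheriting from Sequence supplies __iter__, __reversed__,
    # __contains__, index() and count() as mixin methods.
    def __len__(self):
        return 10

    def __getitem__(self, index):
        if 0 <= index < 10:
            return index ** 2
        raise IndexError(index)


first_four = list(itertools.islice(Squares(), 4))
squares_is_iterable = isinstance(Squares(), Iterable)
ten_reversed = list(reversed(TenSquares()))
position_of_49 = TenSquares().index(49)

print(first_four, squares_is_iterable, ten_reversed, position_of_49)
```

With these versions both classes pass the `Iterable` check, so the runtime check and the actual behavior of `iter` agree.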

Also, the thread-safety issue seems bogus. Any reasonable collection is thread-safe as an iterable.


Presumably the counter-argument is that, as trivial as those changes are, they're still not nearly as trivial as the original code, and in a quick-and-dirty script or interactive session, it may be more than you want to do (especially since it involves importing a module you didn't otherwise need). But I'll leave it to the people who are strongly against the deprecation to explain it, rather than putting words in their mouths.

---

Finally, as far as I can tell, the documentation of the old-style sequence protocol is in the library docs for `iter` and `reversed`, and the data model docs for `__reversed__` (but not `__iter__`), which say, respectively:

> ... object must be a collection object which supports the iteration protocol (the __iter__() method), or it must support the sequence protocol (the __getitem__() method with integer arguments starting at 0).

> ... seq must be an object which has a __reversed__() method or supports the sequence protocol (the __len__() method and the __getitem__() method with integer arguments starting at 0).

> If the __reversed__() method is not provided, the reversed() built-in will fall back to using the sequence protocol (__len__() and __getitem__()). Objects that support the sequence protocol should only provide __reversed__() if they can provide an implementation that is more efficient than the one provided by reversed().

