[Python-ideas] Suggested MapView object (Re: __len__() for map())

Steven D'Aprano steve at pearwood.info
Wed Dec 12 03:12:50 EST 2018


On Wed, Dec 12, 2018 at 12:50:41PM +1300, Greg Ewing wrote:
> Steven D'Aprano wrote:
> >The iterator protocol is that iterators must:
> >
> >- have a __next__ method;
> >- have an __iter__ method which returns self;
> >
> >and the test for an iterator is:
> >
> >    obj is iter(obj)
> 
> By that test, it identifies as a sequence, as does testing it
> for the presence of __len__:

Since existing map objects are iterators, that breaks backwards 
compatibility. For code that does something like this:

    if obj is iter(obj):
        process_iterator()
    else:
        n = len(obj)
        process_sequence()

it will change behaviour, shifting map objects from the iterator branch 
to the sequence branch. That's a definite change in behaviour, which 
alone could change the meaning of the code, e.g. if the two process_* 
functions use different algorithms.

Or it could break the code outright, because your MapView objects can 
raise TypeError when you call len() on them.

I know that any object with a __len__ could in principle raise 
TypeError. But for anything else, we are justified in calling it a bug 
in the __len__ implementation. You're trying to sell it as a feature.
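
To make both failure modes concrete, here is a minimal sketch. 
MapViewSketch is a hypothetical stand-in written for this illustration, 
assuming only what the thread shows (func, args, a lazily created 
iterator, and a len() that raises once next() has been called); the class 
actually proposed may differ in detail. classify() is likewise an 
invented name that wraps the dispatch pattern above.

```python
class MapViewSketch:
    """Hypothetical stand-in for the proposed MapView, not the real class."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args
        self.iterator = None  # created lazily by the first next() call

    def __len__(self):
        if self.iterator is not None:
            # Assumed behaviour: len() is refused once iteration has begun.
            raise TypeError("Mapping iterator has no len()")
        return min(map(len, self.args))

    def __iter__(self):
        return map(self.func, *self.args)

    def __next__(self):
        if self.iterator is None:
            self.iterator = iter(self)
        return next(self.iterator)


def classify(obj):
    # Same dispatch as the if/else above, reduced to a label.
    if obj is iter(obj):
        return "iterator"
    return "sequence of length %d" % len(obj)


print(classify(map(str.upper, "abc")))            # iterator
print(classify(MapViewSketch(str.upper, "abc")))  # sequence of length 3

x = MapViewSketch(str.upper, "abc")
next(x)  # something, somewhere, advances it
try:
    classify(x)
except TypeError as err:
    print("TypeError:", err)  # TypeError: Mapping iterator has no len()
```

The first two calls show the silent change of branch; the last shows the 
same dispatch code failing outright once the object has been advanced.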



> >>> m is iter(m)
> False
> >>> hasattr(m, '__len__')
> True
> 
> So, code that doesn't know whether it has a sequence or iterator
> and tries to find out, will conclude that it has a sequence.
> Presumably it will then proceed to treat it as a sequence, which
> will work fine.

It will work fine, unless something has called __next__, which will 
cause len() to blow up in their face by raising TypeError.

I call these sorts of designs "landmines". They're absolutely fine, 
right up to the point where you hit the right combination of actions and 
step on the landmine. For anything else, this sort of thing would be a 
bug. You're calling it a feature.



> >py> x = MapView(str.upper, "abcdef")  # An imposter.
> >py> next(x)
> >'A'
> >py> next(x)
> >'B'
> >py> next(iter(x))
> >'A'
> 
> That's a valid point, but it can be fixed:
> 
>     def __iter__(self):
>         return self.iterator or map(self.func, *self.args)
> 
> Now it gives
> 
> >>> next(x)
> 'A'
> >>> list(x)
> []
> 
> There is still one case that will behave differently from the
> current map(), i.e. using list() first and then expecting it
> to behave like an exhausted iterator. I'm finding it hard to
> imagine real code that would depend on that behaviour, though.

That's not the only breakage. This is a pattern which I sometimes use:


def test(iterator):
    # Process items up to some symbol one way,
    # and items after that symbol another way.
    for a in iterator:
        print(1, a)
        if a == 'C': break
    # This relies on iterator NOT resetting to the beginning,
    # but continuing from where we left off
    # i.e. not being broken
    for b in iterator:
        print(2, b)


Because map() objects are iterators, right now I can pass them directly 
to that code, and it works as expected:

py> test(map(str.upper, 'abcde'))
1 A
1 B
1 C
2 D
2 E


Your MapView does not:

py> test(MapView(str.upper, 'abcde'))
1 A
1 B
1 C
2 A
2 B
2 C
2 D
2 E


This is why such iterators are deemed to be "broken".
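
The difference can be reproduced without the full class. In the sketch 
below, RestartingView is a hypothetical stand-in that assumes only the 
one property that matters here, namely that __iter__ hands out a fresh 
iterator on every call. split_at() is an invented variant of test() that 
collects the items instead of printing them.

```python
class RestartingView:
    """Hypothetical stand-in: every iter() call starts from the beginning."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args

    def __iter__(self):
        return map(self.func, *self.args)


def split_at(iterable, symbol):
    # Collect items up to and including `symbol`, then whatever the
    # second loop yields.
    before, after = [], []
    for a in iterable:
        before.append(a)
        if a == symbol:
            break
    for b in iterable:
        after.append(b)
    return before, after


print(split_at(map(str.upper, "abcde"), "C"))
# (['A', 'B', 'C'], ['D', 'E'])
print(split_at(RestartingView(str.upper, "abcde"), "C"))
# (['A', 'B', 'C'], ['A', 'B', 'C', 'D', 'E'])
```

With a real map object, the second loop picks up where the first left 
off. With the restarting object, each for loop calls iter() again and 
starts over.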



> > whether operations succeed or not depend on the
> >order that you call them:
> >
> >py> x = MapView(str.upper, "abcdef")
> >py> len(x)*next(x)  # Safe. But only ONCE.
> 
> But what sane code is going to do that?

You have an object that supports len() and next(). Why shouldn't 
people use both len() and next() on it when both are supported methods?
They don't have to be in a single expression:

x = MapView(blah blah blah)
a = some_function_that_calls_len(x)
b = some_function_that_calls_next(x)

That works. But reverse the order, and you step on a 
landmine:

b = some_function_that_calls_next(x)
a = some_function_that_calls_len(x)

The caller may not even know that the functions call next() or len(); 
they could be implementation details buried deep inside some library 
function they didn't even know they were calling.

Do you still think that it is the caller's code that is insane?
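
A minimal sketch of that order dependence follows. MapViewSketch is again 
a hypothetical stand-in, not the proposed class: it assumes the __iter__ 
fix quoted earlier and a len() that raises TypeError once next() has been 
called. calls_len() and calls_next() are invented placeholders for the 
two library functions.

```python
class MapViewSketch:
    """Hypothetical stand-in for the proposed MapView, not the real class."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args
        self.iterator = None

    def __len__(self):
        if self.iterator is not None:
            # Assumed behaviour: no len() once iteration has begun.
            raise TypeError("Mapping iterator has no len()")
        return min(map(len, self.args))

    def __iter__(self):
        # The fix quoted earlier in the thread.
        return self.iterator or map(self.func, *self.args)

    def __next__(self):
        if self.iterator is None:
            self.iterator = iter(self)
        return next(self.iterator)


def calls_len(obj):
    return len(obj)


def calls_next(obj):
    return next(obj)


# len() first, then next(): both calls succeed.
x = MapViewSketch(str.upper, "abcdef")
a = calls_len(x)
b = calls_next(x)
print(a, b)  # 6 A

# The same two calls in the opposite order: the second one raises.
y = MapViewSketch(str.upper, "abcdef")
b = calls_next(y)
try:
    a = calls_len(y)
except TypeError as err:
    print("TypeError:", err)  # TypeError: Mapping iterator has no len()
```

Neither helper does anything unusual; only the order in which they touch 
the object changes the outcome.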



> Remember, the iterator
> interface is only there for backwards compatibility.

Famous last words.



> That would fail under both Python 2 and the current Python 3.

Honestly Greg, you've been around long enough that you ought to 
recognise *minimal examples* for what they are. They're not meant to be 
real-world production code. They're the simplest, most minimal example 
that demonstrates the existence of a problem.

The fact that they are *simple* is to make it easy to see the underlying 
problem, not to give you an excuse to dismiss it.

You're supposed to imagine that in real-life code, the call to next() 
could be buried deep, deep, deep in a chain of 15 function calls in some 
function in some third party library that I don't even know is being 
called, and it took me a week to debug why len(obj) would sometimes 
fail mysteriously.

The problem is not the caller, or even the library code, but that your 
class magically and implicitly swaps from a sequence to a pseudo-iterator 
whether I want it to or not.

A perfect example of why DWIM code is so hated:

http://www.catb.org/jargon/html/D/DWIM.html



> >py> def innocent_looking_function(obj):
> >...     next(obj)
> >...
> >py> x = MapView(str.upper, "abcdef")
> >py> len(x)
> >6
> >py> innocent_looking_function(x)
> >py> len(x)
> >TypeError: Mapping iterator has no len()
> 
> If you're using len(), you clearly expect to have a sequence,
> not an iterator, so why are you calling a function that blindly
> expects an iterator?

*Minimal example* again.

You ought to be able to imagine the actual function is fleshed out, 
without expecting me to draw you a picture:

    if hasattr(obj, '__next__'):
        first = next(obj, sentinel)

Or if you prefer:

    try:
        first = next(obj)
    except TypeError:
        pass  # fall back on sequence algorithm
    except StopIteration:
        pass  # empty iterator


None of this boilerplate adds any insight at all to the discussion. 
There's a reason bug reports ask for minimal examples.

The point is, I'm calling some innocent looking function, and it breaks 
my sequence: len(obj) worked before I called the function, and 
afterwards, it raises TypeError.

I wouldn't have to care about the implementation if your MapView object 
didn't magically flip from sequence to iterator behind my back.



-- 
Steve

