[Python-ideas] Suggested MapView object (Re: __len__() for map())

Steven D'Aprano steve at pearwood.info
Tue Dec 11 11:26:27 EST 2018


On Tue, Dec 11, 2018 at 12:48:10PM +0100, E. Madison Bray wrote:

> Right now I'm specifically responding to the sub-thread that Greg
> started "Suggested MapView object", so I'm considering this a mostly
> clean slate from the previous thread "__len__() for map()".  Different
> ideas have been tossed around and the discussion has me thinking about
> broader possibilities.  I responded to this thread because I liked
> Greg's proposal and the direction he's suggesting.

Greg's code can be found here:

https://mail.python.org/pipermail/python-ideas/2018-December/054659.html

His MapView tries to be both an iterator and a sequence at the same 
time, but it is neither.

The iterator protocol is that iterators must:

- have a __next__ method;
- have an __iter__ method which returns self;

and the test for an iterator is:

    obj is iter(obj)

https://docs.python.org/3/library/stdtypes.html#iterator-types
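
As a rough sketch of that test in code (the helper name is_iterator is made up for illustration and is not part of any library):

```python
def is_iterator(obj):
    """Return True if obj follows the iterator protocol described above."""
    try:
        # An iterator has __next__, and iter() hands back the same object.
        return hasattr(obj, "__next__") and iter(obj) is obj
    except TypeError:
        # iter() raises TypeError for objects that cannot be iterated at all.
        return False


print(is_iterator(iter("abcdef")))  # True: an actual iterator
print(is_iterator("abcdef"))        # False: a sequence is iterable, not an iterator
print(is_iterator(42))              # False: not iterable in any sense
```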

Greg's MapView object is an *iterable* with a __next__ method, which 
makes it neither a sequence nor an iterator, but a hybrid that will 
surprise people who expect it to act consistently as either.


This is how iterators work:

py> x = iter("abcdef")  # An actual iterator.
py> next(x)
'a'
py> next(x)
'b'
py> next(iter(x))
'c'

Greg's hybrid violates that expected behaviour:

py> x = MapView(str.upper, "abcdef")  # An imposter.
py> next(x)
'A'
py> next(x)
'B'
py> next(iter(x))
'A'



As an iterator, it is officially "broken", continuing to yield values 
even after it is exhausted:

py> x = MapView(str.upper, 'a')
py> next(x)
'A'
py> next(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/steve/gregmapview.py", line 24, in __next__
    return next(self.iterator)
StopIteration
py> list(x)  # But wait! There's more!
['A']
py> list(x)  # And even more!
['A']
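
For anyone who wants to reproduce the two sessions above without Greg's file to hand, here is a minimal stand-in. It is NOT Greg's code: the class name HybridView and its single-sequence design are assumptions made purely for illustration. It only mimics the combination that causes the trouble, namely an __iter__ that returns a fresh iterator each time plus a __next__ that lazily creates one private iterator.

```python
class HybridView:
    """Hypothetical stand-in (not Greg's MapView): a sequence with __next__ bolted on."""

    def __init__(self, func, seq):
        self.func = func
        self.seq = seq
        self.iterator = None  # created lazily by the first next() call

    def __len__(self):
        if self.iterator is not None:
            raise TypeError("Mapping iterator has no len()")
        return len(self.seq)

    def __getitem__(self, index):
        return self.func(self.seq[index])

    def __iter__(self):
        # A fresh iterator every time, the way a sequence behaves.
        return map(self.func, self.seq)

    def __next__(self):
        if self.iterator is None:
            self.iterator = iter(self)
        return next(self.iterator)


x = HybridView(str.upper, "abcdef")
first, second = next(x), next(x)  # 'A' and 'B' from the private iterator
restarted = next(iter(x))         # iter(x) starts over: 'A', not 'C'

y = HybridView(str.upper, "a")
next(y)                           # 'A'
try:
    next(y)
except StopIteration:
    pass                          # exhausted, as far as next() is concerned
revived = list(y)                 # iter(y) builds a new map, so we get ['A'] again

print(first, second, restarted, revived)  # A B A ['A']
```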



This hybrid is fragile: whether an operation succeeds or not depends on 
the order in which you call them:

py> x = MapView(str.upper, "abcdef")
py> len(x)*next(x)  # Safe. But only ONCE.
'AAAAAA'

py> y = MapView(str.upper, "uvwxyz")
py> next(y)*len(y)  # Looks safe. But isn't.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/steve/gregmapview.py", line 12, in __len__
    raise TypeError("Mapping iterator has no len()")
TypeError: Mapping iterator has no len()

(For brevity, from this point on I shall trim the tracebacks and show 
only the final error message.)



Things that work once don't work a second time.

py> len(x)*next(x)  # Worked a moment ago, but now it is broken.
TypeError: Mapping iterator has no len()



If you pass your MapView object to another function, it can 
accidentally sabotage your code:

py> def innocent_looking_function(obj):
...     next(obj)
...
py> x = MapView(str.upper, "abcdef")
py> len(x)
6
py> innocent_looking_function(x)
py> len(x)
TypeError: Mapping iterator has no len()



I presume this is just an oversight, but indexing continues to work even 
when len() has been broken.


Greg seems to want to blame the unwitting coder who runs into these 
boobytraps:

    "But there are no surprises as long as you
    stick to one interface or the other. Weird things happen
    if you mix them up, but sane code won't be doing that."

(URL as above).

This MapView class offers a hybrid "sequence plus iterator, together at 
last!" double-headed API, and even its creator says that sane code 
shouldn't use that API. 

Unfortunately, you can't use the iterator API, because it's broken as an 
iterator, and you can't use it as a sequence, because any function you 
pass it to might use it as an iterator and pull the rug out from under 
your feet.

Greg's code is, apart from the addition of the __next__ method, almost 
identical to the version of mapview I came up with in my own testing. 
Except Greg's is even better, since I didn't bother handling the 
multiple-sequences case and his does.

It's the __next__ method which ruins it, by trying to graft 
almost-but-not-really iterator behaviour onto something which is 
otherwise a sequence. I don't think there's any way around that: I think 
that any attempt to make a single MapView object work as either a 
sequence with a length and indexing AND an iterator with next() and no 
length and no indexing is doomed to the same problems. Far from 
minimizing surprise, it will maximise it.
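
For contrast, here is a minimal sketch of a sequence-only view with no __next__ method at all. The name SeqMapView is hypothetical, and like my own test version it handles only a single underlying sequence; it is not the code from either of our experiments, just an illustration of the shape.

```python
from collections.abc import Sequence


class SeqMapView(Sequence):
    """Hypothetical sequence-only view: func is applied lazily on access."""

    def __init__(self, func, seq):
        self.func = func
        self.seq = seq

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing gives another lazy view over the sliced sequence.
            return SeqMapView(self.func, self.seq[index])
        return self.func(self.seq[index])

    # No __next__ here. Sequence supplies __iter__, which returns a
    # fresh iterator on every call and leaves the view itself untouched.


view = SeqMapView(str.upper, "abcdef")
print(len(view), view[0], list(view[1:3]))  # 6 A ['B', 'C']
print(next(iter(view)), len(view))          # A 6: iterating leaves len() alone

try:
    next(view)
except TypeError:
    print("next(view) is refused: a sequence is not an iterator")
```

Because the view never carries iteration state of its own, passing it to another function cannot change what len() or indexing do afterwards.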

Look at how many violations of the Principle Of Least Surprise Greg's 
MapView has:

- If an object has a __len__ method, calling len() on it shouldn't 
  raise TypeError;

- If you called len() before, and it succeeded, calling it again
  should also succeed;

- if an object has a __next__ method, it should be an iterator, 
  and that means iter(obj) is obj;

- if it isn't an iterator, you shouldn't be able to call next() on it;

- if it is an iterator, once it is exhausted, it should stay exhausted;

- iterating over an object (calling next() or iter() on it) shouldn't
  change it from a sequence to a non-sequence;

- passing a sequence to another function shouldn't result in that 
  sequence no longer supporting len() or indexing;

- if an object has a length, then it should still have a length even 
  after iterating over it.
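
A few of the expectations above can be checked mechanically. The following is only a partial sketch: the function name violations and the small Hybrid class are invented for illustration (Hybrid is not Greg's MapView), and the check deliberately consumes one item from any object that has a __next__ method.

```python
def violations(obj):
    """Return which of a few least-surprise checks obj fails (partial check).

    Note: calls next(obj) once if obj has a __next__ method.
    """
    found = []
    claims_len = hasattr(obj, "__len__")

    if claims_len:
        try:
            len(obj)
        except TypeError:
            found.append("has __len__ but len() raises TypeError")
            claims_len = False  # no working len() left to re-check

    if hasattr(obj, "__next__"):
        try:
            is_own_iterator = iter(obj) is obj
        except TypeError:
            is_own_iterator = False
        if not is_own_iterator:
            found.append("has __next__ but iter(obj) is not obj")
        try:
            next(obj)
        except StopIteration:
            pass

    if claims_len:
        try:
            len(obj)
        except TypeError:
            found.append("len() stopped working after next()")

    return found


class Hybrid:
    """Hypothetical object mixing both interfaces, for demonstration only."""

    def __init__(self, seq):
        self.seq = seq
        self.iterator = None

    def __len__(self):
        if self.iterator is not None:
            raise TypeError("no len() once next() has been called")
        return len(self.seq)

    def __iter__(self):
        return iter(self.seq)

    def __next__(self):
        if self.iterator is None:
            self.iterator = iter(self)
        return next(self.iterator)


print(violations([1, 2, 3]))        # []
print(violations(iter([1, 2, 3])))  # []
print(violations(Hybrid("abc")))    # two complaints
```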


I may have missed some.




-- 
Steve

