[Python-ideas] Suggested MapView object (Re: __len__() for map())
Steven D'Aprano
steve at pearwood.info
Tue Dec 11 11:26:27 EST 2018
On Tue, Dec 11, 2018 at 12:48:10PM +0100, E. Madison Bray wrote:
> Right now I'm specifically responding to the sub-thread that Greg
> started "Suggested MapView object", so I'm considering this a mostly
> clean slate from the previous thread "__len__() for map()". Different
> ideas have been tossed around and the discussion has me thinking about
> broader possibilities. I responded to this thread because I liked
> Greg's proposal and the direction he's suggesting.
Greg's code can be found here:
https://mail.python.org/pipermail/python-ideas/2018-December/054659.html
His MapView tries to be both an iterator and a sequence at the same
time, but it is neither.
The iterator protocol is that iterators must:
- have a __next__ method;
- have an __iter__ method which returns self;
and the test for an iterator is:
obj is iter(obj)
https://docs.python.org/3/library/stdtypes.html#iterator-types
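The protocol test above can be wrapped in a small helper. This is only a sketch: the names is_iterator and FreshIterable are hypothetical and introduced here for illustration, and FreshIterable is a stand-in for the general shape of the problem, not Greg's actual code.

```python
def is_iterator(obj):
    """Return True if obj satisfies the iterator protocol test."""
    # An iterator needs __next__, and iter(obj) must hand back obj itself.
    try:
        return hasattr(obj, "__next__") and iter(obj) is obj
    except TypeError:
        # iter() raises TypeError for objects that are not iterable at all.
        return False


class FreshIterable:
    """Hypothetical hybrid: has __next__, but iter() returns a new object."""

    def __init__(self, data):
        self.data = data
        self._it = iter(data)

    def __iter__(self):
        # Returns a brand-new iterator instead of self.
        return iter(self.data)

    def __next__(self):
        return next(self._it)


print(is_iterator(iter("abcdef")))       # a real iterator
print(is_iterator("abcdef"))             # an iterable, but not an iterator
print(is_iterator(FreshIterable("abc"))) # has __next__, still fails the test
```

Only the first call should report True; the hybrid fails because iter(obj) is not obj.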
Greg's MapView object is an *iterable* with a __next__ method, which
makes it neither a sequence nor an iterator, but a hybrid that will
surprise people who expect it to act consistently as either.
This is how iterators work:
py> x = iter("abcdef") # An actual iterator.
py> next(x)
'a'
py> next(x)
'b'
py> next(iter(x))
'c'
Greg's hybrid violates that expected behaviour:
py> x = MapView(str.upper, "abcdef") # An imposter.
py> next(x)
'A'
py> next(x)
'B'
py> next(iter(x))
'A'
As an iterator, it is officially "broken", continuing to yield values
even after it is exhausted:
py> x = MapView(str.upper, 'a')
py> next(x)
'A'
py> next(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/steve/gregmapview.py", line 24, in __next__
    return next(self.iterator)
StopIteration
py> list(x) # But wait! There's more!
['A']
py> list(x) # And even more!
['A']
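For contrast, here is a minimal sketch of how a built-in iterator behaves once it runs out. It uses only iter(), next() and list() on a plain string; the variable names are illustrative.

```python
x = iter("a")          # a built-in string iterator
first = next(x)        # consumes the only item

try:
    next(x)
    raised = False
except StopIteration:
    # A well-behaved iterator signals exhaustion...
    raised = True

# ...and then stays exhausted, however often it is asked again.
leftovers = list(x)
still_empty = list(x)

print(first, raised, leftovers, still_empty)
```

Once StopIteration has been raised, every later attempt to pull values yields nothing.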
This hybrid is fragile: whether operations succeed or not depends on the
order in which you call them:
py> x = MapView(str.upper, "abcdef")
py> len(x)*next(x) # Safe. But only ONCE.
'AAAAAA'
py> y = MapView(str.upper, "uvwxyz")
py> next(y)*len(y) # Looks safe. But isn't.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/steve/gregmapview.py", line 12, in __len__
    raise TypeError("Mapping iterator has no len()")
TypeError: Mapping iterator has no len()
(For brevity, from this point on I shall trim the tracebacks and show
only the final error message.)
Things that work once don't work a second time:
py> len(x)*next(x) # Worked a moment ago, but now it is broken.
TypeError: Mapping iterator has no len()
If you pass your MapView object to another function, it can
accidentally sabotage your code:
py> def innocent_looking_function(obj):
...     next(obj)
...
py> x = MapView(str.upper, "abcdef")
py> len(x)
6
py> innocent_looking_function(x)
py> len(x)
TypeError: Mapping iterator has no len()
I presume this is just an oversight, but indexing continues to work even
when len() has been broken.
Greg seems to want to blame the unwitting coder who runs into these
boobytraps:
"But there are no surprises as long as you
stick to one interface or the other. Weird things happen
if you mix them up, but sane code won't be doing that."
(URL as above).
This MapView class offers a hybrid "sequence plus iterator, together at
last!" double-headed API, and even its creator says that sane code
shouldn't use that API.
Unfortunately, you can't use the iterator API, because it's broken as an
iterator, and you can't use it as a sequence, because any function you
pass it to might use it as an iterator and pull the rug out from under
your feet.
Greg's code is, apart from the addition of the __next__ method, almost
identical to the version of mapview I came up with in my own testing.
Except Greg's is even better, since I didn't bother handling the
multiple-sequences case and his does.
It's the __next__ method which ruins it, by trying to graft
almost-but-not-really iterator behaviour onto something which otherwise
is a sequence. I don't think there's any way around that: I think that any
attempt to make a single MapView object work as either a sequence with a
length and indexing AND an iterator with next() and no length and no
indexing is doomed to the same problems. Far from minimising surprise,
it will maximise it.
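A sequence-only view, with no __next__ at all, might look roughly like the sketch below. This is an assumption-laden illustration, not Greg's code and not the version mentioned above: the class name SeqMapView is hypothetical, it supports integer indexing only, and it takes the shortest input as its length when given several sequences.

```python
from collections.abc import Sequence


class SeqMapView(Sequence):
    """Hypothetical lazy view applying func across one or more sequences."""

    def __init__(self, func, seq, *seqs):
        self.func = func
        self.seqs = (seq,) + seqs

    def __len__(self):
        # Assumption: like map(), stop at the shortest input.
        return min(len(s) for s in self.seqs)

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("SeqMapView indices must be integers")
        n = len(self)
        if index < 0:
            index += n
        if not 0 <= index < n:
            raise IndexError("SeqMapView index out of range")
        # Compute the item on demand; nothing is cached or consumed.
        return self.func(*(s[index] for s in self.seqs))

    # No __next__ here. The Sequence mixin supplies __iter__, which
    # returns a fresh, independent iterator on every call.


view = SeqMapView(str.upper, "abcdef")
print(len(view), next(iter(view)), next(iter(view)), len(view))
print(list(view), list(view))
```

Because iteration always goes through a separate iterator object, consuming items never changes what len() or indexing report on the view itself.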
Look at how many violations of the Principle Of Least Surprise Greg's
MapView has:
- If an object has a __len__ method, calling len() on it shouldn't
raise TypeError;
- If you called len() before, and it succeeded, calling it again
should also succeed;
- if an object has a __next__ method, it should be an iterator,
and that means iter(obj) is obj;
- if it isn't an iterator, you shouldn't be able to call next() on it;
- if it is an iterator, once it is exhausted, it should stay exhausted;
- iterating over an object (calling next() or iter() on it) shouldn't
change it from a sequence to a non-sequence;
- passing a sequence to another function shouldn't result in that
sequence no longer supporting len() or indexing;
- if an object has a length, then it should still have a length even
after iterating over it.
I may have missed some.
--
Steve