
I just ran into the following behavior, and found it surprising:
>>> len(map(float, [1,2,3]))
TypeError: object of type 'map' has no len()
I understand that map() could be given an infinite sequence and therefore might not always have a length. But in this case, it seems like map() should've known that its length was 3. I also understand that I can just call list() on the whole thing and get a list, but the nice thing about map() is that it doesn't copy data, so it's unfortunate to lose that advantage for no particular reason.

My proposal is to delegate map.__len__() to the underlying iterable. Similarly, map.__getitem__() could be implemented if the underlying iterable supports item access:

class map:

    def __init__(self, func, iterable):
        self.func = func
        self.iterable = iterable

    def __iter__(self):
        yield from (self.func(x) for x in self.iterable)

    def __len__(self):
        return len(self.iterable)

    def __getitem__(self, key):
        return self.func(self.iterable[key])

Let me know if there are any downsides to this that I'm not seeing. From my perspective, it seems like there would be only a number of (small) advantages:

- Less surprising
- Avoids some unnecessary copies
- Backwards compatible

-Kale
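A runnable, stand-alone version of the sketch above (renamed `SizedMap` here, a hypothetical name chosen to avoid shadowing the builtin) behaves as proposed when the underlying iterable is sized:

```python
class SizedMap:
    """Hypothetical map-like wrapper that delegates len() and item
    access to the underlying iterable (a sketch of the proposal,
    not the actual builtin)."""

    def __init__(self, func, iterable):
        self.func = func
        self.iterable = iterable

    def __iter__(self):
        # Re-apply the function on each pass over the iterable.
        return (self.func(x) for x in self.iterable)

    def __len__(self):
        # Raises TypeError if the underlying iterable is unsized.
        return len(self.iterable)

    def __getitem__(self, key):
        return self.func(self.iterable[key])


m = SizedMap(float, [1, 2, 3])
print(len(m))   # 3 -- no TypeError, unlike the builtin map
print(m[1])     # 2.0
print(list(m))  # [1.0, 2.0, 3.0]
```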

On Mon, Nov 26, 2018 at 02:06:52PM -0800, Michael Selik wrote:
If you know the input is sizeable, why not check its length instead of the map's?
The consumer of map may not be the producer of map. You might know that alist supports len(), but by the time I see it, I only see map(f, alist), not alist itself. -- Steve

Hi Kale Thank you for the sample code. It's most helpful. Please consider
>>> list(zip(range(4), range(4)))
[(0, 0), (1, 1), (2, 2), (3, 3)]
A sequence is iterable. An iterator is iterable. There are other things that are iterable. A random number generator is an iterator, whose underlying object does not have a length. Briefly, I don't like your suggestion because many important iterables don't have a length! -- Jonathan
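Jonathan's point is easy to verify: common iterators, including the builtin map itself, define no __len__ at all, so len() raises TypeError on them:

```python
import itertools
import random


def random_numbers():
    # An iterator with no possible length: it never ends.
    while True:
        yield random.random()


# None of these objects define __len__.
for obj in (itertools.count(), random_numbers(), map(str, [1, 2, 3])):
    assert not hasattr(obj, '__len__')
```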

On Tue, Nov 27, 2018 at 9:15 AM Jonathan Fine <jfine2358@gmail.com> wrote:
Briefly, I don't like your suggestion because many important iterables don't have a length!
That part's fine. The implication is that mapping over an iterable with a length would give a map with a known length, and mapping over something without a length wouldn't. But I think there are enough odd edge cases (for instance, is it okay to call the function twice if you __getitem__ twice, or should you cache it?) that it's probably best to keep the built-in map() simple and reliable.

Don't forget, too, that map() can take more than one iterable, and some may not have lengths. (You can define enumerate in terms of map and itertools.count; what is the length of the resulting enumeration?)

If you want a map-like object that takes specifically a single list, and is a mapped view to that list, then go for it - but that can be its own beast, not related to the map() built-in function. Also, it may be of value to check out more-itertools; you might find something there that you like.

ChrisA
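Chris's enumerate example can be written out explicitly; since itertools.count() is unsized (and infinite), any "length" of the result would have to come solely from the other iterable:

```python
from itertools import count


def my_enumerate(iterable, start=0):
    # enumerate() expressed in terms of map() and count();
    # count(start) has no length, yet the result is perfectly usable.
    return map(lambda i, x: (i, x), count(start), iterable)


print(list(my_enumerate(['a', 'b', 'c'])))  # [(0, 'a'), (1, 'b'), (2, 'c')]
```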

On Tue, Nov 27, 2018 at 09:36:08AM +1100, Chris Angelico wrote:
Don't forget, too, that map() can take more than one iterable
I forgot about that! But in this case, I think the answer is obvious: the length of the map object is the *smallest* length of the iterables, ignoring any unsized or infinite ones. Same would apply to zip(). But as per my previous post, there are other problems with this concept that aren't so easy to solve. -- Steve

On Tue, Nov 27, 2018 at 10:41 AM Steven D'Aprano <steve@pearwood.info> wrote:
Equally obvious and valid answer: The length is the smallest length of its iterables, ignoring any infinite ones, but if any iterable is unsized, the map is unsized. And both answers will surprise people. I still think there's room in the world for a "mapped list view" type, which retains a reference to an underlying list, plus a function, and proxies everything through to the function. It would NOT have the flexibility of map(), but it would be able to directly subscript, it wouldn't need any cache, etc, etc. ChrisA
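A minimal sketch of the "mapped list view" Chris describes (the name `MappedView` is hypothetical): it keeps a reference to an underlying sequence and re-applies the function on every access, so it is sized, subscriptable, re-iterable, and needs no cache:

```python
from collections.abc import Sequence


class MappedView(Sequence):
    """Hypothetical mapped sequence view: no copying, no caching;
    the function is re-applied on every access."""

    def __init__(self, func, seq):
        self.func = func
        self.seq = seq

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing yields another view over the sliced sequence.
            return MappedView(self.func, self.seq[index])
        return self.func(self.seq[index])


v = MappedView(str, [1, 2, 3])
print(len(v))   # 3
print(v[0])     # '1'
print(list(v))  # ['1', '2', '3']
print(list(v))  # ['1', '2', '3'] -- unlike map(), it can be iterated again
```

Deriving from the Sequence ABC supplies __iter__, __contains__, index() and count() for free once __len__ and __getitem__ exist.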

I don't really agree that there are multiple surprising answers here. If you iterate through the whole map, that will produce some number of elements, and that's the length. Whether you can calculate that number in __len__() depends on the particular iterables you have, which is fine, but I don't think the definition of length is ambiguous. But I think Steven is right that you can't implement __len__() for an iterator without running into some inconsistencies. It's just unfortunate that map() is an iterator. -Kale

I agree many important iterables do have a length. On Mon, Nov 26, 2018 at 02:06:52PM -0800, Michael Selik wrote:
If you know the input is sizeable, why not check its length instead of the map's?
The consumer of map may not be the producer of map.

Very good point. Honestly, I like the proposal but would love to see more reviews of the idea. Maybe I am missing something.

On Mon, Nov 26, 2018 at 01:29:21PM -0800, Kale Kundert wrote:
This seems straightforward, but I think there's more complexity than you might realise, a nasty surprise which I expect is going to annoy people no matter what decision we make, and the usefulness is probably less than you might think.

First, the usefulness: we still have to wrap the call to len() in a try...except block, even if we know we have a map object, because we won't know whether the underlying iterable supports len. So it won't reduce the amount of code we have to write. At best it will allow us to take a fast-path when len() returns a value, and a slow-path when it raises.

Here's the definition of the Sized abc:

https://docs.python.org/3/library/collections.abc.html#collections.abc.Sized

and the implementation simply checks for the existence of __len__. We (rightly) assume that if __len__ exists, the object has a known length, and that calling len() on it will succeed or at least not raise TypeError. Your proposal will break that expectation. map objects will be sized, but since sometimes the underlying iterator won't be, they may still raise TypeError.

Of course there are ways to work around this. We could just change our expectations: even Sized objects might not be *actually* sized. Or map() could catch the TypeError and raise instead a ValueError, or something. Or we could rethink the whole length concept (see below), which after all was invented back in Python 1 days and is looking a bit old.

As for the nasty surprise... do you agree that this ought to be an invariant for sized iterables?

count = len(it)
i = 0
for obj in it:
    i += 1
assert i == count

That's the invariant I expect, and breaking that will annoy me (and I expect many other people) greatly. But that means that map() cannot just delegate its length to the underlying iterable. The implementation must be more complex, keeping track of how many items it has seen.
And consider this case:

it = map(lambda x: x, [1, 2, 3, 4, 5])
x = next(it)
x = next(it)
assert len(it) == 5        # underlying length of the iterable
assert len(list(it)) == 3  # but only three items left
assert len(it) == 5        # still 5
assert len(list(it)) == 0  # but nothing left

So the length of the iterable has to vary as you iterate over it, or you break the invariant shown above. But that's going to annoy other people for another reason: we rightly expect that iterables shouldn't change their length just because you iterate over them! The length should only change if you *modify* them. So these two snippets should do the same:

# 1
n = len(it)
x = sum(it)

# 2
x = sum(it)
n = len(it)

but if map() updates its length as it goes, it will break that invariant.

So *whichever* behaviour we choose, we're going to break *something*. Either the reported length isn't necessarily the same as the actual length you get from iterating over the items, which will be annoying and confusing, or it varies as you iterate, which will ALSO be annoying and confusing. Either way, this apparently simple and obvious change will be annoying and confusing.

Rethinking object length
------------------------

len() was invented back in Python 1 days, or earlier, when we effectively had only one kind of iterable: sequences like lists, with a known length. Today, iterables can have:

1. a known, finite length;
2. a known infinite length;
3. an unknown length (and usually no way to estimate it).

At least. The len() protocol is intentionally simple: it only supports the first case, with the expectation that iterables will simply not define __len__ in the other two cases. Perhaps there is a case for updating the len() concept to explicitly handle cases 2 and 3, instead of simply not defining __len__. Perhaps it could return -1 for unknown and -2 for infinite. Or raise some other exception apart from TypeError.
(I know there have been times I've wanted to know if an iterable was infinite, before spending the rest of my life iterating over it...) And perhaps we can come up with a concept of total length, versus length of items remaining. But these aren't simple issues with obvious solutions, it would surely need a PEP. And the benefit isn't obvious either. -- Steve

Hi Steven, Thanks for the good feedback.
I think most of the time you would know whether the underlying iterable was sized or not. After all, if you need the length, whatever code you're writing would probably not work on an infinite/unsized iterable.
So the length of the iterable has to vary as you iterate over it, or you break the invariant shown above.
I think I see the problem here. map() is an iterator, where I was thinking of it as a wrapper around an iterable. Since an iterator is really just a pointer into an iterable, it doesn't really make sense for it to have a length. Give it one, and you end up with the inconsistencies you describe. I guess I probably would have disagreed with the decision to make map() an iterator rather than a wrapper around an iterable. Such a prominent function should have an API geared towards usability, not towards implementing a low-level protocol (in my opinion). But clearly that ship has sailed. -Kale

On Tue, Nov 27, 2018 at 12:37 PM Kale Kundert <kale@thekunderts.net> wrote:
For map() to return an iterable that can be used more than once, it has to be mapping over an iterable that can be used more than once. That limits it. The way map is currently defined, it can accept any iterable, and it returns a one-shot iterable (which happens to be its own iterator). That's why I think the best solution is to create a separate mapped-sequence-view that depends on its iterable being an actual sequence, and exposes itself as a sequence also. (Yes, I said "list" in my previous post, but any sequence would work.) It can carry the length through, it can directly support subscripting, etc, etc, etc. Both it and map() would have their places. ChrisA

On 11/26/2018 4:29 PM, Kale Kundert wrote:
The len function is defined as always returning the length, an int >= 0. Hence .__len__ methods should always do the same. https://docs.python.org/3/reference/datamodel.html#object.__len__ Objects that cannot do that should not have this method.

The previous discussion of this issue led to function operator.length_hint and special method object.__length_hint__ in 3.4.

https://docs.python.org/3/library/operator.html#operator.length_hint
"""
operator.length_hint(obj, default=0)
    Return an estimated length for the object o. First try to return its actual length, then an estimate using object.__length_hint__(), and finally return the default value.
    New in version 3.4.
"""

https://docs.python.org/3/reference/datamodel.html#object.__length_hint__
"""
object.__length_hint__(self)
    Called to implement operator.length_hint(). Should return an estimated length for the object (which may be greater or less than the actual length). The length must be an integer >= 0. This method is purely an optimization and is never required for correctness.
    New in version 3.4.
"""
But in this case, it seems like map() should've known that its length was 3.
As others have pointed out, this is not true. If not infinite, the size, defined as the number of items to be yielded, and hence the size of list(iterator), shrinks by 1 after every next call, just as with pop methods.
Last I heard, list() uses length_hint for its initial allocation. But this is undocumented implementation. Built-in map does not have .__length_hint__, for the reasons others gave for it not having .__len__. But for private code, you are free to define a subclass that does, with the definition you want. -- Terry Jan Reedy
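The fallback chain Terry quotes can be seen directly: length_hint() tries __len__ first, then __length_hint__, then the default.

```python
from operator import length_hint

print(length_hint([10, 20, 30]))         # 3 -- via list.__len__
print(length_hint(iter([10, 20, 30])))   # 3 -- via list_iterator.__length_hint__
print(length_hint(map(str, [1, 2])))     # 0 -- map defines neither; default is 0
print(length_hint(map(str, [1, 2]), 7))  # 7 -- explicit default
```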

On Mon, Nov 26, 2018 at 10:35 PM Kale Kundert <kale@thekunderts.net> wrote:
I mostly agree with the existing objections, though I have often found myself wanting this too, especially now that `map` does not simply return a list. This problem alone (along with the same problem for filter) has had a ridiculously outsized impact on the Python 3 porting effort for SageMath, and I find it really irritating at times. As a simple counter-proposal which I believe has fewer issues, I would really like it if the built-in `map()` and `filter()` at least provided a Python-level attribute to access the underlying iterables. This is necessary because if I have a function that used to take, say, a list as an argument, and it receives a `map` object, I now have to be able to deal with map()s, and I may have checks I want to perform on the underlying iterables before, say, I try to iterate over the `map`. Exactly what those checks are and whether or not they're useful may be highly application-specific, which is why say a generic `map.__len__` is not workable. However, if I can at least inspect those iterables I can make my own choices on how to handle the map. Exposing the underlying iterables to Python also has dangers in that I could directly call `next()` on them and possibly create some confusion, but consenting adults and all that...

On Wed, Nov 28, 2018 at 2:28 PM E. Madison Bray <erik.m.bray@gmail.com> wrote:
I'm a mathematician, so understand your concerns. Here's what I hope is a helpful suggestion. Create a module, say sage.itertools that contains (not tested) def py2map(iterable): return list(map(iterable)) The porting to Python 3 (for map) is now reduced to writing from .itertools import py2map as map at the head of each module. Please let me know if this helps. -- Jonathan

On Thu, Nov 29, 2018 at 1:46 AM Jonathan Fine <jfine2358@gmail.com> wrote:
With the nitpick that the arguments should be (func, *iterables) rather than just the single iterable, yes, this is a viable transition strategy. In fact, it's very similar to what 2to3 would do, except that 2to3 would do it at the call site. If any Py3 porting process is being held up significantly by this, I would strongly recommend giving 2to3 an eyeball - run it on some of your code, then either accept its changes or just learn from the diffs. It's not perfect (nothing is), but it's a useful tool. ChrisA
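With Chris's nitpick applied, the shim from the previous message would look like this (still not handling Python 2's func=None special case):

```python
def py2map(func, *iterables):
    # Python-2-style map: evaluate eagerly and return a list.
    return list(map(func, *iterables))


print(py2map(str, [1, 2, 3]))  # ['1', '2', '3']
print(py2map(lambda a, b: a + b, [1, 2], [10, 20]))  # [11, 22]
```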

On Wed, Nov 28, 2018 at 3:54 PM Chris Angelico <rosuav@gmail.com> wrote:
That effort is already mostly done and adding a helper function would not have worked as users *passing* map(...) as an argument to some function just expect it to work. The only alternative would have been replacing the builtin map with something else at the globals level. 2to3 is mostly useless since a major portion of Sage is written in Cython anyways. I just mentioned that porting effort for background. I still believe that the actual proposal of making the arguments to a map(...) call accessible from Python as attributes of the map object (ditto filter, zip, etc.) is useful in its own right, rather than just having this completely opaque iterator.

On Wed, Nov 28, 2018 at 04:04:33PM +0100, E. Madison Bray wrote:
Ah, that's what I was missing. But... surely the function will still work if they pass an opaque iterator *other* than map() and/or filter?

it = (func(x) for x in something if condition(x))
some_sage_function(it)

You surely don't expect to be able to peer inside every and any iterator that you are given? So if you have to handle the opaque iterator case anyway, how is it *worse* when the user passes map() or filter() instead of a generator like the above?
Perhaps... I *want* to agree with this, but I'm having trouble thinking of when and how it would be useful. Some concrete examples would help justify it. -- Steve

On Wed, Nov 28, 2018 at 4:14 PM Steven D'Aprano <steve@pearwood.info> wrote:
That one is admittedly tricky. For that matter it might be nice to have more introspection of generator expressions too, but there at least we have .gi_code if nothing else. But those are a far less common example in my case, whereas map() is *everywhere* in math code :)

On Thu, Nov 29, 2018 at 2:19 AM E. Madison Bray <erik.m.bray@gmail.com> wrote:
Considering that a genexp can do literally anything, I doubt you'll get anywhere with that introspection.
But those are a far less common example in my case, whereas map() is *everywhere* in math code :)
Perhaps then, the problem is that math code treats "map" as something that is more akin to "instrumented list" than it is to a generator. If you know for certain that you're mapping a low-cost pure function over an immutable collection, the best solution may be to proxy through to the original list than to generate values on the fly. And if that's the case, you don't want the Py3 map *or* the Py2 one, although the Py2 one can behave this way, at the cost of crazy amounts of efficiency. ChrisA

On Wed, Nov 28, 2018 at 4:24 PM Chris Angelico <rosuav@gmail.com> wrote:
Yep, that's a great example where it might be possible to introspect a given `map` object and take it apart to do something more efficient with it. This is less of a problem with internal code where it's easy to just not use map() at all, and that is often the case. But a lot of the people who develop code for Sage are mathematicians, not engineers, and they may not be aware of this, so they write code that passes `map()` objects to more internal machinery. And users will do the same even moreso. I can (and have) written horrible C-level hacks--not for this specific issue, but others like it--and am sometimes tempted to do the same here :(

One thing I'd like to add real quick to this (I'm on my phone so apologies for crappy quoting):

Although there are existing cases where there is a loss of efficiency over Python 2 map() when dealing with the opaque, iterable Python 3 map(), the latter also presents many opportunities for enhancements that weren't possible before. For example, previously a user might pass map(func, some_list) where func is some pure function and the iterable is almost always a list of some kind. Previously that map() call would be evaluated (often slowly) first.

But now we can treat a map as something a little more formal: a container for a function and one or more iterables, which happens to have this special functionality when you iterate over it, but is otherwise just a special container. This is technically already the case; we just can't directly access it as a container. If we could, it would be possible to implement various optimizations that might not otherwise have been obvious to the user. This is especially the case if the iterable is a simple list, which is something we can check. The function in this case very likely might actually be a C function that was wrapped with Cython. I can easily convert this on the user's behalf to a simple C loop or possibly even some other more optimal vectorized code.

These are application-specific special cases of course, but many such cases become easily accessible if map() and friends are usable as specialized containers.

On Wed, Nov 28, 2018, 16:31 E. Madison Bray <erik.m.bray@gmail.com> wrote:
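Since today's map exposes no such attributes, the idea can only be sketched with a stand-in class; `func` and `iters` below are the *proposed* (hypothetical) attributes, and the "optimized" path is just a list comprehension standing in for the C loop or vectorized code the message describes:

```python
class IntrospectableMap:
    """Stand-in for the proposed introspectable map (hypothetical);
    the real builtin map exposes neither .func nor .iters."""

    def __init__(self, func, *iters):
        self.func = func
        self.iters = iters

    def __iter__(self):
        return map(self.func, *self.iters)


def smart_list(m):
    # Fast path: if every underlying iterable is a plain list, a batch
    # strategy can replace lazy element-by-element iteration.
    if isinstance(m, IntrospectableMap) and all(
            isinstance(it, list) for it in m.iters):
        return [m.func(*args) for args in zip(*m.iters)]
    # Slow path: opaque iterator, just consume it.
    return list(m)


print(smart_list(IntrospectableMap(str, [1, 2, 3])))  # ['1', '2', '3']
```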

+1. Throwing away information is almost always a bad idea. That was fixed for classes and kwargs in 3.6, for example, which removed a lot of fiddly workarounds. Throwing away data needlessly is also why 2to3, baron, Parso and probably many more had to reimplement a Python parser instead of using the built-in one. We should have information preservation and transparency as general design goals, imo. Not because we can see the obvious use now but because it keeps the door open to discover uses later. / Anders

On Wed, Nov 28, 2018 at 05:37:39PM +0100, Anders Hovmöller wrote:
"Almost always"? Let's take this seriously, and think about the consequences if we actually believed that. If I created a series of integers: a = 23 b = 0x17 c = 0o27 d = 0b10111 e = int('1b', 12) your assertion would say it is a bad idea to throw away the information about how they were created, and hence we ought to treat all five values as distinct and distinguishable. So much for the small integer cache... Perhaps every single object we create ought to hold onto a AST representing the literal or expression which was used to create it. Let's not exaggerate the benefit, and ignore the costs, of "throwing away information". Sometimes we absolutely do want to throw away information, or at least make it inaccessible to the consumer of our data structures. Sometimes the right thing to do is *not* to open up interfaces unless there is a clear need for it to be open. Doing so adds bloat to the interface, prevents many changes in implementation including potential optimizations, and may carry significant memory burdens. Bringing this discussion back to the concrete proposal in this thread, as I said earlier, I want to agree with this proposal. I too like the idea of having map (and filter, and zip...) objects expose their arguments, and for the same reason: "it might be useful some day". But every time we scratch beneath the surface and try to think about how and when we would actually use that information, we run into conceptual and practical problems which suggests strongly to me that doing this will turn it into a serious bug magnet, an anti-feature which sounds good but causes more problems than it solves. I'm really hoping someone can convince me this is a good idea, but so far the proposal seems like an attractive nuisance and not a feature.
While that is a reasonable position to take in some circumstances, in others it goes completely against YAGNI. -- Steve

Hi everyone, first participation in Python's mailing list, don't be too hard on me.

Some suggested above changing the definition of len in the long term. I think it could be interesting to define len such that:

- if the object has a finite length: return that length (the way it works now)
- if it has an infinite length: return infinity
- if it has no length: return None

There's an issue with this solution: having None returned adds complexity to the usage of len, so I suggest having a wrapper over __len__ methods that throws the current error. But still, there's a problem with infinite-length objects. If people code:

for i in range(len(infinite_list)):
    # Something

it's not clear if people actually want to do this. It's open to discussion and it is just a suggestion.

If we now consider map, then the length of map (or filter or any other generator based on an iterator) is the same as the iterator itself, which could be either infinite or not defined.

Cheers
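A sketch of this three-way len as a plain function (names hypothetical; the "known infinite" case is signalled here by an ad-hoc marker attribute, since Python has no such protocol):

```python
import math


def extended_len(obj):
    """Hypothetical: finite length, math.inf for known-infinite
    objects (marked by an ad-hoc attribute), else None."""
    if getattr(obj, 'is_infinite', False):  # ad-hoc infinity marker
        return math.inf
    try:
        return len(obj)
    except TypeError:
        return None  # length unknown


print(extended_len([1, 2, 3]))      # 3
print(extended_len(map(str, [1])))  # None -- unknown
```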

On Thu, Nov 29, 2018 at 2:29 AM Adrien Ricocotam <ricocotam@gmail.com> wrote:
Do you anticipate that the `len()` function will be able to solve the Halting Problem? It is simply not possible to know whether a given iterator will produce finitely many or infinitely many elements. Even those that will produce finitely many do not, in general, have a knowable length without running them until exhaustion. Here's a trivial example:
Here's a slightly less trivial one:

In [1]: from itertools import count

In [2]: def mandelbrot(z):
   ...:     "Yield each value until escape iteration"
   ...:     c = z
   ...:     for n in count():
   ...:         if abs(z) > 2:
   ...:             return n
   ...:         yield z
   ...:         z = z*z + c

What should len(mandelbrot(my_complex_number)) be? Hint: depending on the complex number chosen, it might be any natural number (or it might not terminate).

--
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
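Running David's generator for a couple of sample points makes the point concrete: the number of yielded values depends entirely on the point chosen.

```python
from itertools import count


def mandelbrot(z):
    "Yield each value until escape iteration"
    c = z
    for n in count():
        if abs(z) > 2:
            return n
        yield z
        z = z*z + c


print(len(list(mandelbrot(3))))    # 0 -- |3| > 2, escapes immediately
print(len(list(mandelbrot(0.5))))  # 4 -- escapes after four iterations
# mandelbrot(0) would never terminate: z stays at 0 forever.
```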

[David Mertz]
Do you anticipate that the `len()` function will be able to solve the Halting Problem?
It is simply not possible to know whether a given iterator will produce
You don't have to solve the halting problem. You simply ask the object. The default behavior would be "I don't know", whether that's communicated by returning None or some other sentinel value (NaN?) or by raising a special exception. Then you simply override the default behavior for cases where the object does, or at least might, know. itertools.repeat, for example, would have an infinite length unless "times" is provided, in which case its length would be the value of "times". map would return the length of the shortest iterable unless there is an unknown-sized iterable, in which case len would be unknown; if all iterables are infinite, the length would be infinite.

We could add a decorator for length and/or length hints on generator functions:

@length(lambda times: times or float("+inf"))
def repeat(obj, times=None):
    if times is None:
        while True:
            yield obj
    else:
        for i in range(times):
            yield obj

On Thu, Nov 29, 2018 at 10:40 AM David Mertz <mertz@gnosis.cx> wrote:
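One possible (hypothetical) implementation of such a decorator: unlike the one-argument lambda in the message above, this sketch passes the generator function's full argument list to the length function, and wraps the returned generator so it grows a __length_hint__:

```python
import functools
import operator


class _SizedGenerator:
    """Wraps a generator and reports a precomputed length hint."""

    def __init__(self, gen, hint):
        self._gen = gen
        self._hint = hint

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._gen)

    def __length_hint__(self):
        return self._hint


def length(length_fn):
    """Hypothetical decorator: length_fn receives the same arguments
    as the generator function and returns the expected length."""
    def decorate(genfunc):
        @functools.wraps(genfunc)
        def wrapper(*args, **kwargs):
            return _SizedGenerator(genfunc(*args, **kwargs),
                                   length_fn(*args, **kwargs))
        return wrapper
    return decorate


@length(lambda obj, times=None: times if times is not None else float("inf"))
def repeat(obj, times=None):
    if times is None:
        while True:
            yield obj
    else:
        for i in range(times):
            yield obj


it = repeat('x', 3)
print(operator.length_hint(it))  # 3
print(list(it))                  # ['x', 'x', 'x']
```

Note that operator.length_hint() requires an integer >= 0, so the float("inf") case would still need the separate "unknown/infinite" sentinel convention being discussed.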

On Wed, Nov 28, 2018 at 11:04 PM Steven D'Aprano <steve@pearwood.info> wrote:
Not to go too off-topic, but I don't think this is a great example either. Although as a practical consideration I agree Python shouldn't preserve the base representation from which an integer was created, I often *wish* it would. It's useful information to have. There's nothing I hate more than doing hex arithmetic in Python and having it print out decimal results, then having to wrap everything in hex(...) before displaying. Base representation is still meaningful, often useful information.

E. Madison Bray wrote:
But it will only help if the user passes a map object in particular, and not some other kind of iterator. Also it won't help if the inputs to the map are themselves iterators that aren't amenable to inspection. This smells like exposing an implementation detail of your function in its API. I don't see how it would help with your Sage port either, since the original code only got the result of the mapping and wouldn't have been able to inspect the underlying iterables.

I wonder whether it's too late to redefine map() so that it returns a view object instead of an iterator, as was done when merging dict.{items, iteritems} etc. Alternatively, add a mapped() builtin that returns a view. -- Greg

On Wed, Nov 28, 2018 at 03:27:25PM +0100, E. Madison Bray wrote:
*scratches head* I presume that SageMath worked fine with Python 2 map and filter? You can have them back again:

# put this in a module called py2
_map = map
def map(*args):
    return list(_map(*args))

And similarly for filter. The only annoying part is to import this new map at the start of every module that needs it, but while that's annoying, I wouldn't call it a "ridiculously outsized impact". It's one line at the top of each module:

from py2 import map, filter

What am I missing?
Can you give a concrete example of what you would do in practice? I'm having trouble thinking of how and when this sort of thing would be useful. Aside from extracting the length of the iterable(s), under what circumstances would you want to bypass the call to map() or filter() and access the iterables directly?
I don't think that's worse than what we can already do if you hold onto a reference to the underlying iterable:

py> a = [1, 2, 3]
py> it = map(lambda x: x+100, a)
py> next(it)
101
py> a.insert(0, None)
py> next(it)
101

-- Steve

On Wed, Nov 28, 2018 at 4:04 PM Steven D'Aprano <steve@pearwood.info> wrote:
For example, some function that used to expect some finite-sized sequence such as a list or tuple is now passed a "map", possibly wrapping one or more iterables of arbitrary, possibly non-finite size. For the purposes of some algorithm I have, this is not useful and I need to convert it to a sequence anyway, but I don't want to do that without some guarantee that I won't blow up the user's memory usage. So I might want to check:

finite_definite = True
for it in my_map.iters:
    try:
        len(it)
    except TypeError:
        finite_definite = False

if finite_definite:
    my_seq = list(my_map)
else:
    ...  # some other algorithm

Of course, some arbitrary object could lie about its __len__, but I'm not concerned about pathological cases here. There may be other opportunities for optimization as well that are otherwise hidden. Either way, I don't see any reason to hide this data; it's a couple of slot attributes and instantly better introspection capability.

On Wed, Nov 28, 2018 at 04:14:24PM +0100, E. Madison Bray wrote:
But surely you didn't need to do this just because of *map*. Users could have passed an infinite, unsized iterable going back to Python 1 days with the old sequence protocol. They certainly could pass a generator or other opaque iterator apart from map. So I'm having trouble seeing why the Python 2/3 change to map made things worse for SageMath.

But in any case, this example comes back to the question of len again, and we've already covered why this is problematic. In case you missed it, let's take a toy example which demonstrates the problem:

def mean(it):
    if isinstance(it, map):
        # Hypothetical attribute access to the underlying iterable.
        n = len(it.iterable)
        return sum(it)/n

Now let's pass a map object to it:

data = [1, 2, 3, 4, 5]
it = map(lambda x: x, data)
assert len(it.iterable) == 5
next(it); next(it); next(it)
assert mean(it) == 4.5  # fails, as it actually returns 9/5 instead of 9/2

-- Steve

Suppose itr_1 is an iterator. Now consider

itr_2 = map(lambda x: x, itr_1)
itr_3 = itr_1

We now have itr_1, itr_2 and itr_3. They are all, effectively, the same iterator (unless we do an 'x is y' comparison). I conclude that this suggestion amounts to having a __len__ for ANY iterator, and not just a map. In other words, this suggestion has broader scope and consequences than were presented in the original post. -- Jonathan
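The aliasing is easy to demonstrate: consuming the map consumes the underlying iterator too, because they share state.

```python
itr_1 = iter([1, 2, 3])
itr_2 = map(lambda x: x, itr_1)

next(itr_2)         # consumes 1 from itr_1 as well
print(list(itr_1))  # [2, 3] -- the two objects advance together
```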

Probably the most prevalent reason it made things *worse* is that many functions that can take collections as arguments--in fact probably most--were never written to accept arbitrary iterables in the first place. Perhaps they should have been, but the majority of that was before my time, so I and others who worked on the Python 3 port were stuck with that. Sure, the fix is simple enough: check if the object is iterable (itself not always as simple as one might assume) and then call list() on it. But we're talking thousands upon thousands of functions that need to be updated, where examples involving map previously would have just worked.

But on top of the obvious workarounds, I would now like to do things like protect users, where possible, from doing things like passing arbitrarily sized data to relatively flimsy C libraries, or, as I mentioned in my last message, make new optimizations that weren't possible before. Of course this isn't always possible when dealing with an arbitrary opaque iterator, or some pathological cases. But I'm concerned more about doing the best we can in the most common cases (lists, tuples, vectors, etc.), which are *vastly* more common.

I use SageMath as an example, but I'm sure others could come up with their own clever use cases. I know there are other cases where I've wanted to at least try to get the len of a map, at least in cases where it was unambiguous (for example, making a progress meter or something).

On Wed, Nov 28, 2018, 16:33 Steven D'Aprano <steve@pearwood.info> wrote:

I should add: I know the history here of bitterness surrounding Python 3 complaints, and this is not one. I defend most things Python 3 and have ported many projects (Sage just being the largest by orders of magnitude, with every Python 3 porting quirk represented and often magnified). I agree with the new iterable map(), filter(), and zip() and welcomed that change. But I think making them more introspectable would be a useful enhancement. On Wed, Nov 28, 2018, 17:16 E. Madison Bray <erik.m.bray@gmail.com> wrote:

Hi Madison

Is there a URL somewhere where I can view code written to port Sage to Python 3? I've already found
https://trac.sagemath.org/search?q=python3
And because I'm a bit interested in cluster algebra, I went to
https://git.sagemath.org/sage.git/commit/?id=3a6f494ac1d4dbc1e22b0ecbebdbc63...
Is this a good example of the change required? Are there other examples worth looking at? -- Jonathan

On Wed, Nov 28, 2018 at 11:59 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
You either missed, or completely ignored, my previous message where I addressed this: "For example, previously a user might pass map(func, some_list) where func is some pure function and the iterable is almost always a list of some kind. Previously that map() call would be evaluated (often slowly) first. But now we can treat a map as something a little more formal: a container for a function and one or more iterables, which happens to have this special functionality when you iterate over it, but is otherwise just a special container. This is technically already the case; we just can't directly access it as a container. If we could, it would be possible to implement various optimizations that might not otherwise have been obvious to the user. This is especially the case if the iterable is a simple list, which is something we can check. The function in this case might very likely be a C function that was wrapped with Cython. I could easily convert this on the user's behalf to a simple C loop, or possibly even some other more optimal vectorized code. These are application-specific special cases of course, but many such cases become easily accessible if map() and friends are usable as specialized containers."

On 11/28/2018 9:27 AM, E. Madison Bray wrote:
One of the guidelines in the Zen of Python is "Special cases aren't special enough to break the rules." This proposal claims that the Python 3 built-in iterator class 'map' is so special that it should break the rule that iterators in general cannot, and therefore do not, have .__len__ methods, because their size may be infinite, unknowable until exhaustion, or declining with each .__next__ call.

For iterators, 3.4 added an optional __length_hint__ method. This makes sense for iterators, like tuple_iterator, list_iterator, range_iterator, and dict_keyiterator, based on a known finite collection. At the time, map.__length_hint__ was proposed and rejected as problematic, for obvious reasons, and insufficiently useful.

The proposal above amounts to adding an unspecified __length_hint__ misnamed as __len__. Won't happen. Instead, proponents should define and test one or more specific implementations of __length_hint__ in map subclass(es).
What makes the map class special among all built-in iterator classes? It appears not to be a property of the class itself, as an iterator class, but of its name. In Python 2, 'map' was bound to a different implementation of the map idea, a function that produced a list, which has a length. I suspect that if Python 3 were the original Python, we would not have this discussion.
This proposes to make map (and filter) special in a different way, by adding other special (dunder) attributes. In general, built-in callables do not attach their args to their output, for obvious reasons. If they do, they do not expose them. If input data must be saved, the details are implementation dependent. A C-coded callable would not necessarily save information in the form of Python objects. Again, it seems to me that the only thing special about these two, versus the other iterators left in itertools, is the history of the names.
If a function is documented as requiring a list, or a sequence, or a sized object, it is a user bug to pass an iterator. The only thing special about map and filter as errors is the rebinding of the names between Py2 and Py3, so that the same code may be good in 2.x and bad in 3.x. Perhaps 2.7, in addition to future imports of text as unicode and print as a function, should have had one to make map and filter be the 3.x iterators. Perhaps Sage needs something like

def size_map(func, *iterables):
    for it in iterables:
        if not hasattr(it, '__len__'):
            raise TypeError(f'iterable {repr(it)} has no size')
    return map(func, *iterables)

https://docs.python.org/3/library/functions.html#map says "map(function, iterable, ...) Return an iterator [...]" The wording is intentional. The fact that map is a class and the iterator an instance of the class is a CPython implementation detail. Another implementation could use the generator function equivalent given in the Python 2 itertools doc, or a translation thereof. I don't know what pypy and other implementations do. The fact that CPython itertools callables are (now) C-coded classes instead of Python-coded generator functions, or C translations thereof (which is tricky), is for performance and ease of maintenance. -- Terry Jan Reedy
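Terry's suggestion--defining __length_hint__ in a map subclass rather than redefining len--might look something like this sketch (`sized_map` is a hypothetical name; note the hint here is computed once at construction and is *not* decremented as items are consumed, which is one of the details a real implementation would have to settle):

```python
import operator

class sized_map(map):
    """map subclass that requires sized arguments and exposes a length hint.

    Hypothetical sketch: the hint is the shortest input's length at
    construction time and goes stale once the map is advanced.
    """
    def __new__(cls, func, *iterables):
        self = super().__new__(cls, func, *iterables)
        self._hint = min(len(it) for it in iterables)  # raises if any is unsized
        return self

    def __length_hint__(self):
        return self._hint

m = sized_map(str, [1, 2, 3])
print(operator.length_hint(m))  # 3
```

operator.length_hint() falls back to __length_hint__ here because the map type has no __len__.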

On Wed, Nov 28, 2018 at 02:53:50PM -0500, Terry Reedy wrote:
Thanks for the background Terry, but doesn't that suggest that sometimes special cases ARE special enough to break the rules? *wink* Unfortunately, I don't think it is obvious why map.__length_hint__ is problematic. It only needs to return the *maximum* length, or some sentinel (zero?) to say "I don't know". It doesn't need to be accurate, unlike __len__ itself. Perhaps we should rethink the decision not to give map() and filter() a length hint? [...]
No, in fairness, I too have often wanted to know the length of an arbitrary iterator, including map(), without consuming it. In general this is an unsolvable problem, but sometimes it is (or at least, at first glance *seems*) solvable. map() is one of those cases. If we could solve it, that would be great -- but I'm not convinced that it is solvable, since the solution seems worse than the problem it aims to solve. But I live in hope that somebody cleverer than me can point out the flaws in my argument. [...]
I think that's future_builtins: [steve@ando ~]$ python2.7 -c "from future_builtins import *; print map(len, [])" <itertools.imap object at 0xb7ed39ec> But that wouldn't have helped E. Madison Bray or SageMath, since their difficulty is not their own internal use of map(), but their users' use of map(). Unless they simply ban any use of iterators at all, which I imagine will be a backwards-incompatible change (and for that matter an excessive overreaction for many uses), SageMath can't prevent users from providing map() objects or other iterator arguments. -- Steve

On 11/28/2018 5:27 PM, Steven D'Aprano wrote:
Yes, but these cases are not special enough to break the rules for len and __len__, especially when an alternative already exists.
Unfortunately, I don't think it is obvious why map.__length_hint__ is problematic.
It is less obvious (there are more details to fill in) than the (exact) length_hints for the list, tuple, range, and dict iterators. These are *always* based on a sized collection. Map is *sometimes* based on sized collection(s). It is the other cases that are problematic, as illustrated by your next sentence.
Perhaps we should rethink the decision not to give map() and filter() a length hint?
I should have said this more explicitly. This is why I suggested that someone define and test one or more specific map.__length_hint__ implementations. Someone doing so should look into the C code for list to see how list handles iterators with a length hint. I suspect that low estimates are better than high estimates. Does list recognize any value as "I don't know"?
The current situation with length_hint reminds me a bit of the situation with annotations before the addition of typing. Perhaps it is time to think about conventions for the non-obvious 'other cases'.
Thanks for the info.
In particular, by people who are not vividly aware that we broke the back-compatibility rule by rebinding 'map' and 'filter' in 3.0. Breaking back-compatibility *again* by redefining len (to mean something like operator.length) is not the right solution to problems caused by the 3.0 break.
I think their special case problem requires some special case solutions. At this point, I am refraining from making suggestions. -- Terry Jan Reedy
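For reference on Terry's "I don't know" question: at the Python level, operator.length_hint() already encodes the missing-hint case as a caller-supplied default (0 unless overridden), which this small demonstration shows:

```python
import operator

# Iterators over sized collections carry an exact, decrementing hint:
it = iter([10, 20, 30])
print(operator.length_hint(it))  # 3
next(it)
print(operator.length_hint(it))  # 2

# map implements neither __len__ nor __length_hint__, so the caller's
# default -- 0 unless a second argument is given -- serves as "I don't know":
m = map(str, [10, 20, 30])
print(operator.length_hint(m))   # 0
```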

On Wed, Nov 28, 2018 at 11:27 PM Steven D'Aprano <steve@pearwood.info> wrote:
In general it's unsolvable, so no attempt should be made to provide a pre-baked attempt at a solution that won't always work. But in many, if not the majority of cases, it *is* solvable. So let's give intelligent people the tools they need to solve it in those cases that they know they can solve it :)
That is the majority of the case I was concerned about, yes.

On Thu, Nov 29, 2018 at 12:16:37PM +0100, E. Madison Bray wrote:
On Wed, Nov 28, 2018 at 11:27 PM Steven D'Aprano <steve@pearwood.info> wrote:
["it" below being the length of an arbitrary iterator]
So you say, but the solutions made so far seem fatally flawed to me. Just repeating the assertion that it is solvable isn't very convincing. -- Steve

On Thu, Nov 29, 2018 at 1:38 PM Steven D'Aprano <steve@pearwood.info> wrote:
Okay, let's keep it simple:

m = map(str, [1, 2, 3])
len_of_m = None
if len(m.iters) == 1 and isinstance(m.iters[0], Sized):
    len_of_m = len(m.iters[0])

You can give me pathological cases where that isn't true, but you can't say there's no context in which that wouldn't be virtually guaranteed, and consenting adults can decide whether or not that's a safe-enough assumption in their own code.

On Thu, Nov 29, 2018 at 02:16:48PM +0100, E. Madison Bray wrote:
Yes I can, and they aren't pathological cases. They are ordinary cases working the way iterators are designed to work. All you get is a map object. You have no way of knowing how many times the iterator has been advanced by calling next(). Consequently, there is no guarantee that len(m.iters[0]) == len(list(m)) except by the merest accident that the map object hasn't had next() called on it yet.

*This is not pathological behaviour*. This is how iterators are designed to work. The ability to partially advance an iterator, pause, then pass it on to another function to be completed is a huge benefit of the iterator protocol. I've written code like this on more than one occasion:

# toy example
for x in it:
    process(x)
    if condition(x):
        for y in it:
            do_something_else(y)
        # Strictly speaking, this isn't needed, since "it" is consumed.
        break

If I pass the partially consumed map iterator to your function, it will use the wrong length and give me back inaccurate results. (Assuming it actually uses the length as part of the calculated result.)

You might say that your users are not so advanced, or that they're naive enough not to even know they could do that, but that's a pretty unsafe assumption as well as being rather insulting to your own users, some of whom are surely advanced Python coders, not just naive dabblers. Even if only one in a hundred users knows that they can partially iterate over the map, and only one in a hundred of those actually do so, you're still making an unsafe assumption that will return inaccurate results based on an invalid value of len_of_m.
and consenting adults can decide whether or not that's a safe-enough assumption in their own code.
Which consenting adults? How am I, wearing the hat of a Sage user, supposed to know which of the hundreds of Sage functions make this "safe-enough" assumption and return inaccurate results as a consequence? -- Steve
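Steven's objection is easy to demonstrate with the real map type: a hypothetical len() that delegated to the underlying list would be wrong as soon as the iterator has been partially advanced.

```python
data = [1, 2, 3, 4, 5]
m = map(str, data)
next(m)  # someone partially advances the iterator...
next(m)  # ...before handing it on to your function

# A len() delegating to the underlying list would now report 5,
# but only 3 items actually remain to be produced:
remaining = list(m)
print(len(data), len(remaining))  # 5 3
```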

On Thu, Nov 29, 2018 at 3:43 PM Steven D'Aprano <steve@pearwood.info> wrote:
That's a fair point, and probably the killer flaw in this proposal (or any involving getting the lengths of iterators). I still think it would be useful to be able to introspect map objects, but this does throw some doubt on the overall reliability of the idea. I'd say that in most cases it would still work, but you're right that it's harder to guarantee in this context. One obvious workaround would be to attach a flag indicating whether or not __next__ has been called (or, as long as you have such a flag, why not a counter for the number of times __next__ has been called?). That would effectively solve the problem, but I admit it's a taller order in terms of adding API surface.
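The flag-or-counter workaround floated above could be prototyped today as a map subclass (`counting_map`, `.consumed`, and `.remaining()` are hypothetical names; this sketch assumes a single sized iterable):

```python
class counting_map(map):
    """Sketch of a map that counts __next__ calls.

    With the count and the source length recorded up front, a consumer
    can compute the remaining length even after partial iteration.
    """
    def __new__(cls, func, iterable):
        self = super().__new__(cls, func, iterable)
        self.consumed = 0
        self._source_len = len(iterable)  # assumes a sized iterable
        return self

    def __next__(self):
        value = super().__next__()
        self.consumed += 1
        return value

    def remaining(self):
        return self._source_len - self.consumed

m = counting_map(str, [1, 2, 3, 4])
next(m)
print(m.consumed, m.remaining())  # 1 3
```

This dodges the stale-length problem, at the cost of the extra per-instance state the message worries about.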

On Thu, Nov 29, 2018 at 2:44 PM Steven D'Aprano <steve@pearwood.info> wrote:
I think that what above all unites Sage users is knowledge of mathematics. Use of Python would be secondary. The goal surely is to discover and develop conventions and interface that work for such a group of users. In this area the original poster is probably the expert, and I think should be respected as such. Steve's post divides Sage users into "advanced Python coders" and "naive dabblers". This misses the point, which is to get something that works well for all users. This, I'd say, is one of the features of Python's success. Most Python users are people who want to get something done. By the way, I'd expect that most Sage users fall into the middle range of Python expertise. I think that to focus on the extremes is both unhelpful and divisive. -- Jonathan

On Thu, Nov 29, 2018 at 7:16 PM Jonathan Fine <jfine2358@gmail.com> wrote:
Yes, thank you. They are all very smart people--most of them much more so than I. The vast majority are mathematicians first, and software developers second, third, fourth, or even further down the line. Some of the most prolific contributors to Sage barely know how to use git without some wrappers we've provided around it (not that they couldn't learn, but let's be honest, git is a terrible tool for anyone who isn't Linus Torvalds). They still write good code and sometimes brilliant algorithms. But they're not all Python experts. Many of them are also students who are only using Python because Sage uses it, not using Sage because it uses Python. The Sagebook [1] may be their first introduction to Python, and even then it only introduces Python programming in drips and drabs as needed for the topics at hand (e.g. variables, loops, functions). I'm trying to consider users at all levels. [1] http://dl.lateralis.org/public/sagebook/sagebook-ba6596d.pdf

On 11/29/2018 8:16 AM, E. Madison Bray wrote:
As I have noted before, the existing sized collection __length_hint__ methods (properly) return the remaining items = len(underlying_iterable) - items_already_produced. This is fairly easy at the C level. The following seems to work in Python.

class map1:
    def __init__(self, func, sized):
        if isinstance(sized, (list, tuple, range, dict)):
            self._iter = iter(sized)
            self._gen = (func(x) for x in self._iter)
        else:
            raise TypeError(f'{sized} not one of list, tuple, range, dict')
    def __iter__(self):
        return self
    def __next__(self):
        return next(self._gen)
    def __length_hint__(self):
        return self._iter.__length_hint__()

m = map1(int, [1.0, 2.0, 3.0])
print(m.__length_hint__())
print('first item', next(m))
print(m.__length_hint__())
print('remainder', list(m))
print(m.__length_hint__())

# prints, as expected and desired
3
first item 1
2
remainder [2, 3]
0

A package could include a version of this, possibly compiled, for use when applicable. -- Terry Jan Reedy

On Wed, Nov 28, 2018 at 8:54 PM Terry Reedy <tjreedy@udel.edu> wrote:
This seems to be replying to the OP, whom I was quoting. On one hand I would argue that this is cherry-picking the "Zen" since not all rules are special in the first place. But in this case I agree that map should not have a length or possibly even a length hint (although the latter is more justifiable).
Who said anything about "special", or adding "special (dunder) attributes"? Nor did I make any general statement about all built-ins. For arbitrary functions it doesn't necessarily make sense to hold on to their arguments, but in the case of something like map() its arguments are the only thing that give it meaning at all: the fact remains that something like a map in particular can be treated in a formal sense as a collection of a function and some sequence of arguments (possibly unbounded) on which that function is to be evaluated (perhaps not immediately). As an analogy, a series is an object in its own right without having to evaluate the entire series: lots of information can be gleaned from its properties without evaluating it. Just because you don't see the use doesn't mean others can't find one. The CPython map() implementation already carries this data on it as "func" and "iters" members in its struct. It's trivial to expose those to Python as ".func" and ".iters" attributes. Nothing "special" about it. However, that brings me to...
Exactly how intentional is that wording though? If it returns an iterator, it has to return *some object* that implements iteration in the manner prescribed by map. Generator functions could theoretically allow attributes attached to them. Roughly speaking:

def map(func, *iters):
    def map_inner():
        for args in zip(*iters):
            yield func(*args)
    gen = map_inner()
    gen.func = func
    gen.iters = iters
    return gen

As it happens this won't work in CPython, since it does not allow attribute assignment on generator objects. Perhaps there's some good reason for that, but AFAICT--though I may be missing a PEP or something--this fact is not prescribed anywhere and is also particular to CPython. Point being, I don't think it's a massive leap or imposition on any implementation to go from "Return an iterator [...]" to "Return an iterator that has these attributes [...]" P.S.
It's not a user bug if you're porting a massive computer algebra application that happens to use Python as its implementation language (rather than inventing one from scratch) and your users don't need or want to know too much about Python 2 vs Python 3. Besides, the fact that they are passing an iterator now is probably in many cases a good thing for them, but it takes away my ability as a developer to find out more about what they're trying to do, as opposed to say just being given a list of finite size. That said, I regret bringing up Sage; I was using it as an example but I think the point stands on its own.
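Since CPython generators reject attribute assignment, the same idea can be prototyped with a small iterator class instead (`introspectable_map` and its `.func`/`.iters` attributes are hypothetical names, mirroring the internal struct members mentioned above):

```python
class introspectable_map:
    """Sketch of a map-like iterator that exposes its function and iterables."""
    def __init__(self, func, *iters):
        self.func = func
        self.iters = iters
        self._it = zip(*iters)  # advances lazily, like the builtin map

    def __iter__(self):
        return self

    def __next__(self):
        return self.func(*next(self._it))

m = introspectable_map(int, ["1", "2", "3"])
print(m.func, m.iters)  # the container-like view the thread asks for
print(list(m))          # [1, 2, 3]
```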

On Thu, Nov 29, 2018 at 10:18 PM E. Madison Bray <erik.m.bray@gmail.com> wrote:
Either this is Python, or it's just an algebra language that happens to be implemented in Python. If the former, the Py2/Py3 distinction should matter to your users, since they are programming in Python. If the latter, it's all about Sage, ergo you can rebind map to mean what you expect it to mean. Take your pick. ChrisA

On Thu, Nov 29, 2018 at 12:21 PM Chris Angelico <rosuav@gmail.com> wrote:
Why not both? ("Porque no los dos?") Sage is a superset of Python, and on some level (in terms of advanced programming constructs) users will need to care about the distinction. But most users don't really know exactly what it does when they pass something like map(a_func, a_list) as an argument to a function call. They don't necessarily appreciate the distinction that, depending on how that function is implemented, an arbitrary iterable has to be treated very differently from a list. I certainly don't mind supporting arbitrary iterables--I think they should be supported. But now there are optimizations I can't make that I could have made before, when map() just returned a list. In most cases I didn't have to make these optimizations manually because the code is written in Cython. It's true that when a user called map() previously some opportunities for optimization were already lost, but now it's even worse, because I have to treat a simple map of a list on par with the necessarily slower arbitrary-iterator case, when technically speaking there is no reason that has to be the case. Cython could even handle that case automatically by turning a map(<some_C_function_wrapped_by_cython>, <a_list>) into something like:

list = map.iters[0];
for (idx = 0; idx < PyList_GET_SIZE(list); idx++) {
    wrapped_c_function(PyList_GET_ITEM(list, idx));
}
If the latter, it's all about Sage, ergo you can rebind map to mean what you expect it to mean. Take your pick.
I'm still not sure what makes you think one can just blithely replace a builtin with something that doesn't work the way all other Python libraries expect that builtin to work. At best I could subclass map() and add this functionality, but now you're adding at least three pointers to every map() that are not necessary, since the information is already there in the C struct. For most cases this isn't too bad in terms of overhead, but consider cases (which I've seen plenty of) like:

list_of_lists = [map(int, x) for x in list_of_lists]

Now the user who previously expected to have a list of lists has a list of maps. It's already bad enough that each map holds a pointer to a function, but I wouldn't want to make that worse. Anyways, I'd love to get off the topic of Sage and just ask why you would object to useful introspection capabilities? I don't even care if it were CPython-specific.

On Thu, Nov 29, 2018 at 10:21:15PM +1100, Chris Angelico wrote:
False dichotomy. Sage is *all* of these things:
- a stand-alone application which is (partially) written in Python;
- an application which runs under IPython/Jupyter;
- a package which has to interoperate with other Python packages;
- an algebra language.
If the former, the Py2/Py3 distinction should matter to your users, since they are programming in Python.
Even if they know, and care, about the difference between iterators and lists, they cannot be expected to know or care about how the hundreds of Sage functions process lists differently from iterators. Which would be implementation details of the Sage functions, and subject to change without warning. I sympathise with this proposal. In my own tiny little way, I've had to grapple with something similar for the stdlib statistics library, and I'm not totally happy with the work-around I came up with. And I have a few ideas for the future which will either render the difference moot, or make the problem worse, I'm not sure which :-)
Sage wraps a number of Python libraries, such as numpy, sympy and others, and itself can run under IPython, which for all we know may already have monkeypatched the builtins for its own ~~nefarious~~ useful purposes. Are you really comfortable with monkeypatching the builtins in this way in such a complex ecosystem of packages? Maybe it will work, but I think you're being awfully gung-ho about the suggestion. (At least my earlier suggestion didn't involve monkeypatching the builtin map, merely shadowing it.) Personally, even if monkeypatching in this way solved the problem, as a (potential) user of SageMath I'd be really, really peeved if it patched map() in the way you suggest and regressed map() to the 2.x version. -- Steve

On Fri, Nov 30, 2018 at 12:18 AM Steven D'Aprano <steve@pearwood.info> wrote:
To be quite honest, no, I am not comfortable with it. But I *am* comfortable with expecting Python programmers to program in Python, and thus deeming that breakage as a result of user code being migrated from Py2 to Py3 is to be fixed by the user. You can mess around with map(), but there are plenty of other things you can't mess with, so I don't see why this one thing should be Sage's problem. ChrisA

On Thu, Nov 29, 2018 at 2:22 PM Chris Angelico <rosuav@gmail.com> wrote:
The users--often scientists--of SageMath and many other scientific Python packages* are not "Python programmers" as such**. My job as a software engineer is to make the lower-level libraries they use for their day-to-day research work _just work_, and in particular to _optimize_ that lower-level code in as many ways as I can. In some cases we do have to tell them about Python 2 vs Python 3 things (especially w.r.t. print()) but most of the time it is relatively transparent, as it should be. Steven has the right idea about it. Not every detail can be made perfectly transparent in terms of how users use or misuse them, no. But there are lots of areas where they should absolutely not have to care (e.g., as Steven wrote, they cannot be expected to know how every single function might treat an iterator like map() over a finite sequence distinctly from the original finite sequence itself). In the case of map(), although maybe I have not articulated it well, I can say for sure that I've had perfectly valid use cases that were stymied merely by a semi-arbitrary decision to hide the data wrapped by the "iterator returned by map()" (if you want to be pedantic about it). I'm willing to accept some explanation for why that would be actively harmful, but as someone with concrete problems to solve I'm less convinced by appeals to abstracts, or "why not just X" as if I hadn't considered "X" and found it flawed (which is not to say that I mind any new idea being put thoroughly through its paces). * (Pandas, SymPy, Astropy, and even lower-level packages like NumPy, not to mention Jupyter, which implements kernels for dozens of languages but is primarily implemented in Python) ** With an obligatory asterisk to counter a common refrain from those who experience impostor syndrome: if you are using this software then yes, you are in fact a Python programmer, you just haven't realized it yet ;)

On Thu, Nov 29, 2018 at 2:05 PM E. Madison Bray <erik.m.bray@gmail.com> wrote:
Well said. Unlike many people on this list, programming Python is not their top skill. For example, Paul Romer, the 2018 Economics Nobel Memorial Laureate. His strength is economics. Python is one of the many tools he uses. But it's not his top skill (smile). https://developers.slashdot.org/story/18/10/09/0042240/economics-nobel-laure... In some sense, I think, what Madison wants is an internal domain specific language (IDSL) that works well for Sage users. Just as Django is an IDSL that works well for many web developers. See, for example, https://martinfowler.com/books/dsl.html for the general idea. We might not agree on the specifics. But that's perhaps mostly a matter for the domain experts, such as Madison and Sage users. -- Jonathan

On 11/29/2018 6:13 AM, E. Madison Bray wrote:
On Wed, Nov 28, 2018 at 8:54 PM Terry Reedy <tjreedy@udel.edu> wrote:
I will come back to this when you do.
The use of 'iterator' is exactly intended, and the iterator protocol is *intentionally minimal*, with one iterator specific __next__ method and one boilerplate __iter__ method returning self. This is more minimal than some might like. An argument against the addition of length_hint and __length_hint__ was that it might be seen as extending at least the 'expected' iterator protocol. The docs were written to avoid this.
Instances of C-coded classes generally cannot be augmented. But set this issue aside.
Do you propose exposing the inner struct members of *all* C-coded iterators? (And would you propose that all Python-coded iterators should use public names for the equivalents?) Some subset thereof? (What choice rule?) Or only for map? If the latter, why do you consider map so special?
In both 2 and 3, the function has to deal with iterator inputs one way or another. In both 2 and 3, possible iterator inputs include maps passed as generator comprehensions, '(<expression with x> for x in iterable)'.
As a former 'scientist who programs' I can understand the desire for ignorance of such details. As a Python core developer, I would say that if you want Sage to allow and cater to such ignorance, you have to either make Sage a '2 and 3' environment, without burdening Python 3, or make future Sage a strictly Python 3 environment (as many scientific stack packages are doing or planning to do). ...
That said, I regret bringing up Sage; I was using it as an example but I think the point stands on its own.
Yes, the issues of hiding versus exposing implementation details, and that of saving versus deleting and, when needed, recreating 'redundant' information, are independent of Sage and 2 versus 3. -- Terry Jan Reedy

On Thu, Nov 29, 2018 at 9:36 PM Terry Reedy <tjreedy@udel.edu> wrote:
You still seem to be confusing my point. I'm not advocating even for __length_hint__ (I think there are times that would be useful but it's still pretty problematic). I admit one thing I'm a little stuck on though is that map() currently just immediately calls iter() on its arguments to get their iterators, and does not store references to the original iterables. It would be nice if more iterators could have an exposed reference to the objects they're iterating, in cases where that's even meaningful. For some reason I thought, for example, that a list_iterator could give me a reference back to the list itself. This was probably omitted intentionally but it still feels pretty limiting :(
Not necessarily, no. But certainly a few: I'm using map() as an example, but at the very least map() and filter(). An exact choice rule is something worth thinking about, but I don't think you're going to find an "objective" rule. I think it goes without saying that map() is special in a way: it's one of the most basic extensions to function application and is a fundamental construct in functional programming and from a category-theoretical perspective. I'm not saying Python's built-in map() needs to represent anything mathematically formal, but it's certainly quite fundamental, which is why it's a built-in in the first place.
Yes, but those are still less common, and generator expressions were not even around when Sage was first started: I've been around long enough to remember when they were added to the language, and they were well predated by map() and filter(). The Sagebook [1] introduces them around page 60. I'm not sure if it even introduces generator expressions at all. I think a lot of Python and C++ experts don't realize that the "iterator" concept is not at all immediately obvious to a lot of non-programmers. Most iterator inputs supplied by users are things like sized collections, for which it's easy to think about "going over them one by one", not more abstract iterators. This is true whether the user is a Python expert or not.
"ignorance" is not a word I would use here, frankly.
I agree there, that this is not really an argument about Sage or Python 2/3. Though I don't think this is an "implementation detail". In an abstract sense a map is a special container for a function and a sequence that has special semantics. As far as I'm concerned this is what it *is* in some ontological sense, and this fact is not a mere implementation detail. [1] http://dl.lateralis.org/public/sagebook/sagebook-ba6596d.pdf

On Fri, Nov 30, 2018 at 10:32:31AM +0100, E. Madison Bray wrote:
It's a built-in in the first place because, back in Python 0.9 or 1.0 or thereabouts, a fan of Lisp added it to the builtins (together with filter and reduce) and nobody objected (possibly because they didn't notice) at the time. It was much easier to add things to the language back then. During the transition to Python 3, Guido wanted to remove all three (as well as lambda): https://www.artima.com/weblogs/viewpost.jsp?thread=98196 Although map, filter and lambda have stayed, reduce has been relegated to the functools module. -- Steve

E. Madison Bray wrote:
This sounds like a backwards way to address the issue. If you have a function that expects a list in particular, it's up to its callers to make sure they give it one. Instead of making the function do a bunch of looking before it leaps, it would be better to define something like

def lmap(f, *args):
    return list(map(f, *args))

and then replace 'map' with 'lmap' elsewhere in your code. -- Greg

On Mon, Nov 26, 2018 at 10:35 PM Kale Kundert <kale@thekunderts.net> wrote:
Excellent proposal, followed by a flood of confused replies, which I will mostly disregard, since all miss the obvious. What's being proposed is simple: either
* len(map(f, x)) == len(x), or
* both raise TypeError
That implies, loosely speaking:
* map(f, Iterable) -> Iterable, and
* map(f, Sequence) -> Sequence
But, *not*:
* map(f, Iterable|Sequence) -> Magic.
So, the map() function becomes a factory, returning an object with __len__ or without, depending on what it was called with. /Paul
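Paul's factory idea can be sketched as a function returning a map subclass when every argument is sized (`SizedMap` and `smart_map` are hypothetical names; note this inherits the partial-consumption caveat raised elsewhere in the thread, since the stored length goes stale once iteration begins):

```python
from collections.abc import Sized

class SizedMap(map):
    """Sketch: a map over sized inputs, with a __len__ fixed at construction."""
    def __new__(cls, func, *iterables):
        self = super().__new__(cls, func, *iterables)
        self._len = min(len(it) for it in iterables)
        return self

    def __len__(self):
        return self._len

def smart_map(func, *iterables):
    """Factory: SizedMap if every argument is sized, else a plain map."""
    if all(isinstance(it, Sized) for it in iterables):
        return SizedMap(func, *iterables)
    return map(func, *iterables)

print(len(smart_map(str, [1, 2, 3])))              # 3
print(isinstance(smart_map(str, iter([1])), map))  # True -- but it has no len()
```

Because SizedMap subclasses map, isinstance checks keep working either way, as the follow-up message asks for.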

That would be great, especially if it returned objects of a subclass of map so that it didn't break any code that checks isinstance. However, I think this goes a little beyond map: I've run into cases using itertools where I wished the iterators could support len(). I suppose you could turn those all into factories too, but I wonder if that's the most elegant solution. On Thu, Nov 29, 2018 at 7:22 PM Paul Svensson <paul-python@svensson.org> wrote:

On Thu, Nov 29, 2018 at 08:13:12PM -0500, Paul Svensson wrote:
Excellent proposal, followed by a flood of confused replies, which I will mostly disregard, since all miss the obvious.
When everyone around you is making technical responses which you think are "confused", it is wise to consider the possibility that it is you who is missing something rather than everyone else.
Simple, obvious, and problematic. Here's a map object I prepared earlier:

    from itertools import islice
    mo = map(lambda x: x, "aardvark")
    list(islice(mo, 3))

If I now pass you the map object, mo, what should len(mo) return? Five or eight? No matter which choice you make, you're going to surprise and annoy people, and there will be circumstances where that choice will introduce bugs into their code.
But map objects aren't sequences. They're iterators. Just adding a __len__ method isn't going to make them sequences (not even "loosely speaking") or solve the problem above. In principle, we could make this work by turning the output of map() into a view like dict.keys() etc, or a lazy sequence type like range(), wrapping the underlying sequence. That might be worth exploring. I can't think of any obvious problems with a view-like interface, but that doesn't mean there aren't any. I've spent like 30 seconds thinking about it, so the fact that I can't see any problems with it means little. But it's also a big change, not just a matter of exposing the __len__ method of the underlying iterable (or iterables). -- Steve
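To make Steven's five-or-eight question concrete, here is a small runnable illustration (the variable names follow his example): a __len__ that simply delegated to the underlying string would report 8, even though only 5 items remain to be produced:

```python
from itertools import islice

mo = map(lambda x: x, "aardvark")   # underlying data has length 8
list(islice(mo, 3))                 # consume 'a', 'a', 'r'

remaining = list(mo)                # what the iterator can still produce
assert len(remaining) == 5          # 'd', 'v', 'a', 'r', 'k'
assert len("aardvark") == 8         # what a delegating __len__ would claim
```

Whichever answer len() gave, some consumer of mo would be misled.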

On Sat, 1 Dec 2018 at 01:17, Steven D'Aprano <steve@pearwood.info> wrote:
Something to consider that, so far, seems to have been overlooked is that the total length of the resulting map isn't only dependent upon the iterable, but also the mapped function. It is a pretty pathological case, but there is no guarantee that the function is a pure function, free from side effects. If the iterable is mutable and the mapped function has a reference to it (either from scoping or the iterable (in)directly containing a reference to itself), there is nothing to prevent the function modifying the iterable as the map is evaluated. For example, map can be used as a filter:

    it = iter((0, 16, 1, 4, 8, 29, 2, 13, 42))
    def filter_odd(x):
        while x % 2 == 0:
            x = next(it)
        return x
    tuple(map(filter_odd, it))
    # (1, 29, 13)

The above also illustrates the second way the total length of the map could differ from the length of the input iterable, even if it is immutable. If StopIteration is raised within the mapped function, map finishes early, so can be used in a manner similar to takewhile:

    def takewhile_lessthan4(x):
        if x < 4:
            return x
        raise StopIteration
    tuple(map(takewhile_lessthan4, range(9)))
    # (0, 1, 2, 3)

I really don't understand why this is true: under 'normal' usage, map shouldn't have any reason to silently swallow a StopIteration raised _within_ the mapped function. As I opened with, I wouldn't consider using map in either of these ways to be a good idea, and anyone doing so should probably be persuaded to find better alternatives, but it might be something to bear in mind. AJ

On Sat, 1 Dec 2018 at 10:44, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
It's not -- the StopIteration isn't terminating the map, it's terminating the iteration being performed by tuple().
That was a poor choice of wording on my part, it's rather that map doesn't do anything special in that regard. To whatever is iterating over the map, any unexpected StopIteration from the function isn't distinguishable from the expected one from the iterable(s) being exhausted. This issue was dealt with in generators by PEP-479 (by replacing the StopIteration with a RuntimeError). Whilst map, filter, and others may not be generators, I would expect them to be consistent with that PEP when handling the same issue.
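For comparison, here is what PEP 479 does in a generator: the same takewhile-style trick raises RuntimeError instead of silently truncating the output. (The generator below is my own illustration, not code from the thread.)

```python
def takewhile_lessthan4_gen():
    # Generator analogue of the earlier map-based takewhile trick.
    for x in range(9):
        if x >= 4:
            raise StopIteration  # PEP 479 re-raises this as RuntimeError
        yield x

try:
    tuple(takewhile_lessthan4_gen())
except RuntimeError as exc:
    print("PEP 479 intervened:", exc)
```

This is the inconsistency being pointed out: the identical StopIteration-in-the-body pattern is a loud error in generators but a silent early stop in map().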

A proposal to make map() not return an iterator seems like a non-starter. Yes, Python 2 worked that way, but that was a long time ago and we know better now. In the simple example it doesn't matter much:

    mo = map(lambda x: x, "aardvark")

But map() is more useful for the non-toy case:

    mo = map(expensive_db_lookup, list_of_keys)

list_of_keys can be a concrete list, but I'm using map() mainly specifically to get lazy iterator behavior. On Sat, Dec 1, 2018, 11:10 AM Paul Svensson <paul-python@svensson.org> wrote:

On Sat, Dec 01, 2018 at 11:27:31AM -0500, David Mertz wrote:
Paul is certainly not suggesting reverting the behaviour to the Python2 map, at the very least map(func, iterator) will continue to return an iterator. What Paul is *precisely* proposing isn't clear to me, except that map(func, sequence) will be "loosely" a sequence. What that means is not obvious. What is especially unclear is what his map() will do when passed multiple iterable arguments. [...]
list_of_keys can be a concrete list, but I'm using map() mainly specifically to get lazy iterator behavior.
Indeed. That's often why I use it too. But there is a good use-case for having map(), or a map-like function, provide either a lazy sequence like range() or a view. But the devil is in the details. Terry was right to encourage people to experiment with their own map-like function (a subclass?) to identify any tricky corners in the proposal. -- Steve

On Sat, Dec 01, 2018 at 11:07:53AM -0500, Paul Svensson wrote: [...]
I already discussed that: map is not currently a sequence, and just giving it a __len__ is not going to make it one. Making it a sequence, or a view of a sequence, is a bigger change, but worth considering, as I already said in part of my post you deleted. However, it is also a backwards incompatible change. In case it's not obvious from my example above, I'll be explicit:

    # current behaviour
    mo = map(lambda x: x, "aardvark")
    list(islice(mo, 3))  # discard the first three items
    assert ''.join(mo) == 'dvark'
    => passes

    # future behaviour, with your proposal
    mo = map(lambda x: x, "aardvark")
    list(islice(mo, 3))  # discard the first three items
    assert ''.join(mo) == 'dvark'
    => fails with AssertionError

Given the certainty that this change will break code (I know it will break *my* code, as I often rely on map() being an iterator, not a sequence) it might be better to introduce a new "mapview" type rather than change the behaviour of map() itself. On the other hand, since the fix is simple enough:

    mo = iter(mo)

perhaps all we need is a deprecation period of at least one full release before changing the behaviour. Either way, this isn't a simple or obvious change, and will probably need a PEP to nut out all the fine details. -- Steve

On Sat, Dec 1, 2018, 11:54 AM Steven D'Aprano <steve@pearwood.info> wrote:
Given that the anti-fix is just as simple and currently available, I don't see why we'd want a change: # map->sequence mo = list(mo) FWIW, I actually do write exactly that code fairly often, it's not hard.

On Sat, Dec 01, 2018 at 12:06:23PM -0500, David Mertz wrote:
Sure, but that makes a copy of the original data and means you lose the benefit of map being lazy. Naturally we will always have the ability to call list and eagerly convert to a sequence, but these proposals are for a way of getting the advantages of sequence-like behaviour while still keeping the advantages of laziness. With iterators, the only way to get that advantage of laziness is to give up the ability to query length, random access to items, etc even when the underlying data is a sequence and that information would have been readily available. We can, at least sometimes, have the best of both worlds. Maybe. -- Steve

Other than being able to ask len(), are there any advantages to a slightly less opaque map()? Getting the actual result of applying the function to the element is necessarily either eager or lazy, you can't have both. On Sat, Dec 1, 2018, 12:24 PM Steven D'Aprano <steve@pearwood.info> wrote:

On Sat, Dec 01, 2018 at 12:28:16PM -0500, David Mertz wrote:
I don't understand the point you think you are making here. There's no fundamental need to make a copy of a sequence just to apply a map function to it, especially if the function is cheap. (If it is expensive, you might want to add a cache.) This proof-of-concept wrapper class could have been written any time since Python 1.5 or earlier:

    class lazymap:
        def __init__(self, function, sequence):
            self.function = function
            self.wrapped = sequence
        def __len__(self):
            return len(self.wrapped)
        def __getitem__(self, item):
            return self.function(self.wrapped[item])

It is fully iterable using the sequence protocol, even in Python 3:

    py> x = lazymap(str.upper, 'aardvark')
    py> list(x)
    ['A', 'A', 'R', 'D', 'V', 'A', 'R', 'K']

Mapped items are computed on demand, not up front. It doesn't make a copy of the underlying sequence, it can be iterated over and over again, it has a length and random access. And if you want an iterator, you can just pass it to the iter() function. There are probably bells and whistles that can be added (a nicer repr? any other sequence methods? a cache?) and I haven't tested it fully. For backwards compatibility reasons, we can't just make map() work like this, because that's a change in behaviour. There may be tricky corner cases I haven't considered, but as a proof of concept I think it shows that the basic premise is sound and worth pursuing. -- Steve

Steven D'Aprano wrote:
For backwards compatibilty reasons, we can't just make map() work like this, because that's a change in behaviour.
Actually, I think it's possible to get the best of both worlds. Consider this:

    from operator import itemgetter

    class MapView:
        def __init__(self, func, *args):
            self.func = func
            self.args = args
            self.iterator = None
        def __len__(self):
            return min(map(len, self.args))
        def __getitem__(self, i):
            return self.func(*list(map(itemgetter(i), self.args)))
        def __iter__(self):
            return self
        def __next__(self):
            if not self.iterator:
                self.iterator = map(self.func, *self.args)
            return next(self.iterator)

If you give it sequences, it behaves like a sequence:
If you give it iterators, it behaves like an iterator:
If you use it as an iterator after giving it sequences, it also behaves like an iterator:
What do people think? Could we drop something like this in as a replacement for map() without disturbing anything too much? -- Greg

On Sun, Dec 2, 2018 at 12:08 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I can't help thinking that it will be extremely surprising to have the length remain the same while the items get consumed. After you take a couple of elements off, the length of the map is exactly the same, yet the length of a list constructed from that map won't be. Are there any other non-pathological examples where len(x) != len(list(x))? ChrisA

Chris Angelico wrote:
I can't help thinking that it will be extremely surprising to have the length remain the same while the items get consumed.
That can be fixed. The following version raises an exception if you try to find the length after having used it as an iterator. (I also fixed a bug -- I had screwed up the sequence case, and it wasn't re-iterating properly.)

    class MapView:
        def __init__(self, func, *args):
            self.func = func
            self.args = args
            self.iterator = None
        def __len__(self):
            return min(map(len, self.args))
        def __getitem__(self, i):
            return self.func(*list(map(itemgetter(i), self.args)))
        def __iter__(self):
            return map(self.func, *self.args)
        def __next__(self):
            if not self.iterator:
                self.iterator = iter(self)
            return next(self.iterator)
It will still report a length if you use len() *before* starting to use it as an iterator, but the length it returns is correct at that point, so I don't think that's a problem.
Are there any other non-pathological examples where len(x) != len(list(x))?
No longer a problem:
-- Greg

On Mon, Dec 03, 2018 at 02:04:31AM +1300, Greg Ewing wrote:
That's not really a "fix" as such, more of a violation of the principle of least astonishment. Perhaps more like the principle of most astonishment: the object changes from sized to unsized even if you don't modify its value or its type, but merely if you look at it the wrong way:

    # This is okay, doesn't change the nature of the object.
    for i in range(sys.maxint):
        try:
            print(mapview[i])
        except IndexError:
            break

    # But this unexpectedly changes it from sized to unsized.
    for x in mapview:
        break

That makes this object a fragile thing that can unexpectedly change from sized to unsized. Neither fish nor fowl, with a confusing API that is not quite a sequence, not quite an iterator, not quite sized, but just enough of each to lead people into error. Or... at least that's what the code is supposed to do; the code you give doesn't actually work that way:
I can't reproduce that behaviour with the code you give above. When I try it, it returns the length 3, even after the iterator has been completely consumed. I daresay you could jerry-rig something to "fix" this bug, but I think this is a poor API that tries to make a single type act like two conceptually different things at the same time. -- Steve

Steven D'Aprano wrote:
Yes, but keep in mind the purpose of the whole thing is to provide a sequence interface while not breaking old code that expects an iterator interface. Code that was written to work with the existing map() will not be calling len() on it at all, because that would never have worked.
Yes, it's a compromise in the interests of backwards compatibility. But there are no surprises as long as you stick to one interface or the other. Weird things happen if you mix them up, but sane code won't be doing that.
It sounds like you were still using the old version with a broken __iter__() method. This is my current complete code together with test cases:

    #-----------------------------------------------------------
    from operator import itemgetter

    class MapView:
        def __init__(self, func, *args):
            self.func = func
            self.args = args
            self.iterator = None
        def __len__(self):
            if self.iterator:
                raise TypeError("Mapping iterator has no len()")
            return min(map(len, self.args))
        def __getitem__(self, i):
            return self.func(*list(map(itemgetter(i), self.args)))
        def __iter__(self):
            return map(self.func, *self.args)
        def __next__(self):
            if not self.iterator:
                self.iterator = iter(self)
            return next(self.iterator)

    if __name__ == "__main__":
        a = [1, 2, 3, 4, 5]
        b = [2, 3, 5]

        print("As a sequence:")
        m = MapView(pow, a, b)
        print(list(m))
        print(list(m))
        print(len(m))
        print(m[1])
        print()

        print("As an iterator:")
        m = MapView(pow, iter(a), iter(b))
        print(next(m))
        print(list(m))
        print(list(m))
        try:
            print(len(m))
        except Exception as e:
            print("***", e)
        print()

        print("As an iterator over sequences:")
        m = MapView(pow, a, b)
        print(next(m))
        print(next(m))
        try:
            print(len(m))
        except Exception as e:
            print("***", e)
    #-----------------------------------------------------------

This is the output I get:

    As a sequence:
    [1, 8, 243]
    [1, 8, 243]
    3
    8

    As an iterator:
    1
    [8, 243]
    []
    *** Mapping iterator has no len()

    As an iterator over sequences:
    1
    8
    *** Mapping iterator has no len()

-- Greg

On Mon, Dec 10, 2018 at 5:23 AM E. Madison Bray <erik.m.bray@gmail.com> wrote:
Indeed; I believe it is very useful to have a map-like object that is effectively an augmented list/sequence.
but what IS a "map-like object"? I'm trying to imagine what that actually means. "map" takes a function and maps it onto an iterable, returning a new iterable. So a map object is an iterable -- what's under the hood being used to create it is (and should remain) opaque. Back in the day, Python was "all about sequences" -- so map() took a sequence and returned a sequence (an actual list, but that's not the point here). And that's pretty classic "map". With py3, there was a big shift toward iterables, rather than sequences, as the core type to work with. There are a few other benefits, but the main one is that often sequences were made simply so that they could be immediately iterated over, and that was a waste of resources:

    for i, item in enumerate(a_sequence):
        ...

    for x, y in zip(seq1, seq2):
        ...

These two are pretty obvious, but the same approach was taken over much of Python: dict.keys(), map(), range(), .... So now in Python, you need to decide, when writing code, what your API is -- does your function take a sequence? or does it take an iterable? Of course, a sequence is an iterable, but an iterable is not (necessarily) a sequence -- so back in the day, you didn't really need to make the decision. So in the case of the Sage example -- I wonder what the real problem is. If you have an API that requires a sequence, on Py2, folks may well have been passing it the result of a map() call -- note that they weren't passing a "map object" that is now somehow different than it used to be -- they were passing a list, plain and simple. And there are all sorts of places, when converting from py2 to py3, where you will now get an iterable that isn't a proper sequence, and if the API you are using requires a sequence, you need to wrap a list() or tuple() or some such around it to make the sequence.
Note that you can write your code to work under either 2 or 3, but it's really hard to write a library so that your users can run it under either 2 or 3 without any change in their code! But note: the fact that it's a map object is just one special case. I suppose one could write an API now that actually expects a map object (rather than a generic sequence or iterable) but it was literally impossible in py2 -- there was no such object. I'm still confused -- what's so wrong with:

    list(map(func, some_iterable))

if you need a sequence? You can, of course, make lazily-evaluated sequences (like range), and so you could make a map-like function that required a sequence as input and would lazily evaluate that sequence. This could be useful if you weren't going to work with the entire collection, but really wanted only to index out a few items, but I'm trying to imagine a use case for that, and I haven't managed to. And I don't think that's the use case that started this thread... -CHB
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
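The sequence-versus-iterable API decision Chris describes can be checked at runtime with the standard collections.abc machinery; a minimal sketch:

```python
from collections.abc import Iterable, Sequence

m = map(float, [1, 2, 3])
assert isinstance([1, 2, 3], Sequence)   # a list is a sequence
assert isinstance(m, Iterable)           # a map object is iterable...
assert not isinstance(m, Sequence)       # ...but it is not a sequence

# An API that requires a sequence forces the caller to materialise:
seq = list(m)
assert len(seq) == 3
```

This is exactly the py2-to-py3 conversion pattern mentioned above: wrap list() or tuple() around the iterable when the consuming API needs a real sequence.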

On Tue, Dec 11, 2018 at 2:16 AM Chris Barker <chris.barker@noaa.gov> wrote:
I don't understand why this is confusing. Greg gave an example of what this *might* mean up thread. It's not the only possible approach but it is one that makes a lot of sense to me. The way you're defining "map" is arbitrary and post-hoc. It's a definition that makes sense for "map" that's restricted to iterating over arbitrary iterators. It's how it happens to be defined in Python 3 for various reasons that you took time to explain at great length, which I regret to inform you was time wasted explaining things I already know. For something like a fixed sequence a "map" could just as easily be defined as a pair (<function>, <sequence>) that applies <function>, which I'm claiming is a pure function, to every element returned by the <sequence>. This transformation can be applied lazily on a per-element basis whether I'm iterating over it, or performing random access (since <sequence> is known for all N). Python has no formal notion of a pure function, but I'm an adult and can accept responsibility if I try to use this "map-like" object in a way that is not logically consistent. The stuff about Sage is beside the point. I'm not even talking about that anymore.

On Tue, 11 Dec 2018 at 10:38, E. Madison Bray <erik.m.bray@gmail.com> wrote:
What's confusing to *me*, at least, is what's actually being suggested here. There's a lot of theoretical discussion, but I've lost track of how it's grounded in reality:

1. If we're saying that "it would be nice if there were a function that acted like map but kept references to its arguments", that's easy to do as a module on PyPI. Go for it - no-one will have any problem with that.

2. If we're saying "the builtin map needs to behave like that", then

   2a. *Why*? What is so special about this situation that the builtin has to be changed?

   2b. Compatibility questions need to be addressed. Is this important enough to code that "needs" it that such code is OK with being Python 3.8+ only? If not, why aren't the workarounds needed for Python 3.7 good enough? (Long term improvement and simplification of the code *is* a sufficient reason here, it's just something that should be explicit, as it means that the benefits are long-term rather than immediate).

   2c. Weird corner case questions, while still being rare, *do* need to be addressed - once a certain behaviour is in the stdlib, changing it is a major pain, so we have a responsibility to get even the corner cases right.

   2d. It's not actually clear to me how critical that need actually is. Nice to have, sure (you only need a couple of people who would use a feature for it to be "nice to have") but beyond that I haven't seen a huge number of people offering examples of code that would benefit (you mentioned Sage, but that example rapidly degenerated into debates about Sage's design, and while that's a very good reason for not wanting to continue using that as a use case, it does leave us with few actual use cases, and none that I'm aware of that are in production code...)

3. If we're saying something else (your comment "map could just as easily be defined as..." suggests that you might be) then I'm not clear what it is.
Can you describe your proposal as pseudo-code, or a Python implementation of the "map" replacement you're proposing? Paul

On Tue, Dec 11, 2018 at 12:13 PM Paul Moore <p.f.moore@gmail.com> wrote:
It's true, this has been a wide-ranging discussion and it's confusing. Right now I'm specifically responding to the sub-thread that Greg started, "Suggested MapView object", so I'm considering this a mostly clean slate from the previous thread "__len__() for map()". Different ideas have been tossed around and the discussion has me thinking about broader possibilities. I responded to this thread because I liked Greg's proposal and the direction he's suggesting. I think that the motivation underlying much of this discussion, for both the OP who started the original thread, as well as myself and others, is that before Python 3 changed the implementation of map() there were certain assumptions one could make about map() called on a list* which, under normal circumstances, were quite reasonable and sane (e.g. len(map(func, lst)) == len(lst), or map(func, lst)[N] == func(lst[N])). Python 3 broke all of these assumptions, for reasons that I personally have no disagreement with, in terms of motivation. However, in retrospect, it might have been nice if more consideration were given to backwards compatibility for some "obvious" simple cases. This isn't a Python 2 vs Python 3 whine though: I'm just trying to think about how I might expect map() to work on different types of arguments, and I see no problem--so long as it's properly documented--with making its behavior somewhat polymorphic on the types of arguments. The idea would be to now enhance the existing built-ins to restore at least some previously lost assumptions, at least in the relevant cases. To give an analogy, Python 3.0 replaced range() with (effectively) xrange(). This broke a lot of assumptions that the object returned by range(N) would work much like a list, and Python 3.2 restored some of that list-like functionality by adding support for slicing and negative indexing on range(N). I believe it's worth considering such enhancements for filter() and map() as well, though these are obviously a bit trickier.
* or other fixed-length sequence, but let's just use list as a shorthand, and assume for the sake of simplicity a single list as well.
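The range() analogy is worth spelling out: the Python 3 range object is precisely the kind of lazy sequence under discussion — sized, indexable, sliceable, and re-iterable, all without materialising its elements:

```python
r = range(10)
assert len(r) == 10              # sized, yet no list is ever built
assert r[-1] == 9                # negative indexing (restored in 3.2)
assert r[2:5] == range(2, 5)     # slicing yields another lazy range
assert list(r) == list(r)        # iterating does not consume it
```

A sequence-aware map() could in principle offer the same guarantees whenever its inputs are themselves sequences.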
Sure, though since this is about the behavior of global built-ins that are commonly used by users at all experience levels, the problem is a bit hairier. Anybody can implement anything they want and put it in a third-party module. That doesn't mean anyone will use it. I still have to write code that handles map objects. In retrospect I think Guido might have had the right idea of wanting to move map() and filter() into functools along with reduce(). There's surprisingly much more at stake in terms of backwards compatibility and least-astonishment when it comes to built-ins. I think that's in part why the new Python 3 definitions of map() and filter() were kept so simple: although they were not backwards compatible, I do think they were well designed to minimize astonishment. That's why I don't necessarily disagree with the choices made (but still would like to think about how we can make enhancements going forward).
The same question could apply to the last time it was changed. I think now we're trying to find some middle ground.
That's a good point: I think the same arguments as for enhancing range() apply here, but this is worth further consideration (though having a more concrete proposal in the first place should come first).
It depends on what you mean by getting them "right". It's definitely worth going over as many corner cases as one can think of. Not all corner cases have a satisfying resolution (and may be highly context-dependent). In those cases getting it "right" is probably no more than documenting that corner case and perhaps warning against it.
That's a fair point worthy of further consideration. To me, at least, map on a list working as an augmented list is obvious, clear, useful, and solves most of the use-cases where having map.__len__ might be desirable, among others.
Again, I'm mostly responding to Greg's proposal which I like. To extend it, I'm suggesting that a call to map() where all the arguments are sequences** might return something like his MapView. If even that idea is crazy or impractical though, I can accept that. But I think it's quite analogous to how map on arbitrary iterables went from immediate evaluation to lazy evaluation while iterating: in the same way map on some sequence(s) can be evaluated lazily on random access. ** I have a separate complaint that there's no great way, at the Python level, to define a class that is explicitly a "sequence" as opposed to a more general "mapping", but that's a topic for another thread...
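On the "map over sequences returns a sequence" idea: one hedged sketch is a collections.abc.Sequence subclass. The class name MappedSeq is hypothetical, purely for illustration, and the ABC mixins supply iteration, containment, index(), and count() for free:

```python
from collections.abc import Sequence

class MappedSeq(Sequence):
    """Lazy, sized, indexable view of func applied to a sequence."""
    def __init__(self, func, seq):
        self.func = func
        self.seq = seq
    def __len__(self):
        return len(self.seq)
    def __getitem__(self, i):
        if isinstance(i, slice):
            return MappedSeq(self.func, self.seq[i])  # slices stay lazy
        return self.func(self.seq[i])

m = MappedSeq(str.upper, "abc")
assert isinstance(m, Sequence)       # explicitly a sequence
assert len(m) == 3 and m[-1] == 'C'
assert list(m) == ['A', 'B', 'C']    # re-iterable, computed on demand
```

Subclassing the ABC also answers the "how do I declare my class to be a sequence" complaint: isinstance(m, Sequence) is now simply true.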

On Tue, 11 Dec 2018 at 11:49, E. Madison Bray <erik.m.bray@gmail.com> wrote:
Thanks. That clarifies the situation for me very well. I agree with most of the comments you made, although I don't have any good answers. I think you're probably right that Guido's original idea to move map and filter to functools might have been better, forcing users to explicitly choose between a genexp and a list comprehension. On the other hand, it might have meant people used more lists than they needed to, as a result. Paul

On Tue, Dec 11, 2018 at 12:48:10PM +0100, E. Madison Bray wrote:
Greg's code can be found here: https://mail.python.org/pipermail/python-ideas/2018-December/054659.html

His MapView tries to be both an iterator and a sequence at the same time, but it is neither. The iterator protocol is that iterators must:

- have a __next__ method;
- have an __iter__ method which returns self;

and the test for an iterator is:

    obj is iter(obj)

https://docs.python.org/3/library/stdtypes.html#iterator-types

Greg's MapView object is an *iterable* with a __next__ method, which makes it neither a sequence nor an iterator, but a hybrid that will surprise people who expect it to act consistently as either. This is how iterators work:

    py> x = iter("abcdef")  # An actual iterator.
    py> next(x)
    'a'
    py> next(x)
    'b'
    py> next(iter(x))
    'c'

Greg's hybrid violates that expected behaviour:

    py> x = MapView(str.upper, "abcdef")  # An imposter.
    py> next(x)
    'A'
    py> next(x)
    'B'
    py> next(iter(x))
    'A'

As an iterator, it is officially "broken", continuing to yield values even after it is exhausted:

    py> x = MapView(str.upper, 'a')
    py> next(x)
    'A'
    py> next(x)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/steve/gregmapview.py", line 24, in __next__
        return next(self.iterator)
    StopIteration
    py> list(x)  # But wait! There's more!
    ['A']
    py> list(x)  # And even more!
    ['A']

This hybrid is fragile: whether operations succeed or not depends on the order that you call them:

    py> x = MapView(str.upper, "abcdef")
    py> len(x)*next(x)  # Safe. But only ONCE.
    'AAAAAA'
    py> y = MapView(str.upper, "uvwxyz")
    py> next(y)*len(y)  # Looks safe. But isn't.
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/steve/gregmapview.py", line 12, in __len__
        raise TypeError("Mapping iterator has no len()")
    TypeError: Mapping iterator has no len()

(For brevity, from this point on I shall trim the tracebacks and show only the final error message.) Things that work once, don't work a second time.
    py> len(x)*next(x)  # Worked a moment ago, but now it is broken.
    TypeError: Mapping iterator has no len()

If you pass your MapView object to another function, it can accidentally sabotage your code:

    py> def innocent_looking_function(obj):
    ...     next(obj)
    ...
    py> x = MapView(str.upper, "abcdef")
    py> len(x)
    6
    py> innocent_looking_function(x)
    py> len(x)
    TypeError: Mapping iterator has no len()

I presume this is just an oversight, but indexing continues to work even when len() has been broken. Greg seems to want to blame the unwitting coder who runs into these boobytraps: "But there are no surprises as long as you stick to one interface or the other. Weird things happen if you mix them up, but sane code won't be doing that." (URL as above). This MapView class offers a hybrid "sequence plus iterator, together at last!" double-headed API, and even its creator says that sane code shouldn't use that API. Unfortunately, you can't use the iterator API, because it's broken as an iterator, and you can't use it as a sequence, because any function you pass it to might use it as an iterator and pull the rug out from under your feet. Greg's code is, apart from the addition of the __next__ method, almost identical to the version of mapview I came up with in my own testing. Except Greg's is even better, since I didn't bother handling the multiple-sequences case and his does. It's the __next__ method which ruins it, by trying to graft almost-but-not-really iterator behaviour onto something which otherwise is a sequence. I don't think there's any way around that: I think that any attempt to make a single MapView object work as either a sequence with a length and indexing AND an iterator with next() and no length and no indexing is doomed to the same problems. Far from minimizing surprise, it will maximise it.
Look at how many violations of the Principle Of Least Surprise Greg's MapView has:

- If an object has a __len__ method, calling len() on it shouldn't raise TypeError;
- If you called len() before, and it succeeded, calling it again should also succeed;
- if an object has a __next__ method, it should be an iterator, and that means iter(obj) is obj;
- if it isn't an iterator, you shouldn't be able to call next() on it;
- if it is an iterator, once it is exhausted, it should stay exhausted;
- iterating over an object (calling next() or iter() on it) shouldn't change it from a sequence to a non-sequence;
- passing a sequence to another function shouldn't result in that sequence no longer supporting len() or indexing;
- if an object has a length, then it should still have a length even after iterating over it.

I may have missed some. -- Steve

Steven D'Aprano wrote:
By that test, it identifies as a sequence, as does testing it for the presence of __len__:
So, code that doesn't know whether it has a sequence or iterator and tries to find out, will conclude that it has a sequence. Presumably it will then proceed to treat it as a sequence, which will work fine.
That's a valid point, but it can be fixed:

def __iter__(self):
    return self.iterator or map(self.func, *self.args)

Now it gives
There is still one case that will behave differently from the current map(), i.e. using list() first and then expecting it to behave like an exhausted iterator. I'm finding it hard to imagine real code that would depend on that behaviour, though.
But what sane code is going to do that? Remember, the iterator interface is only there for backwards compatibility. That would fail under both Python 2 and the current Python 3.
If you're using len(), you clearly expect to have a sequence, not an iterator, so why are you calling a function that blindly expects an iterator? Again, this cannot be and could never have been working code.
I presume this is just an oversight, but indexing continues to work even when len() has been broken.
That could be fixed.
No. I would document it like this: It provides a sequence API. It also, *for backwards compatibility*, implements some parts of the iterator API, but new code should not rely on that, nor should any code expect to be able to use both interfaces on the same object. The backwards compatibility would not be perfect, but I think it would work in the vast majority of cases. I also envisage that the backwards compatibility provisions would not be kept forever, and that it would eventually become a pure sequence object. I'm not necessarily saying this *should* be done, just pointing out that it's a possible strategy for migrating map() from an iterator to a view, if we want to do that. -- Greg

On 12/11/2018 6:50 PM, Greg Ewing wrote:
Python has list and list_iterator, tuple and tuple_iterator, set and set_iterator, dict and dict_iterator, range and range_iterator. In 3.0, we could have turned map into a finite sequence analogous to range, and added a new map_iterator. To be completely lazy, such a map would have to restrict input to Sequences. To be compatible with 2.0 map, it would have to use list(iterable) to turn other finite iterables into concrete lists, making it only semi-lazy. Since I am too lazy to write the multi-iterable version, here is the one-iterable version to show the idea:

def __init__(self, func, iterable):
    self.func = func
    self.seq = iterable if isinstance(iterable, Sequence) else list(iterable)

Given the apparent little need for the extra complication, and the possibility of keeping a reference to sequences and explicitly applying list otherwise, it was decided to rebind 'map' to the fully lazy and general itertools.map.

-- Terry Jan Reedy
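[Editor's note: Terry's __init__ above can be fleshed out into a runnable sketch. The class name and the rest of the body are additions for illustration, not from the post.]

```python
from collections.abc import Sequence

class semilazy_map:
    """Sketch of the semi-lazy map described above: fully lazy over
    sequences, materializing any other finite iterable up front."""

    def __init__(self, func, iterable):
        self.func = func
        self.seq = iterable if isinstance(iterable, Sequence) else list(iterable)

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, i):
        return self.func(self.seq[i])

    def __iter__(self):
        # A fresh iterator each time, so the object is re-iterable.
        return map(self.func, self.seq)


m = semilazy_map(str.upper, "abc")
assert len(m) == 3
assert m[1] == "B"
assert list(m) == list(m) == ["A", "B", "C"]  # re-iterable, unlike map()
```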

On Wed, Dec 12, 2018 at 12:50:41PM +1300, Greg Ewing wrote:
Since existing map objects are iterators, that breaks backwards compatibility. For code that does something like this:

if obj is iter(obj):
    process_iterator()
else:
    n = len(obj)
    process_sequence()

it will change behaviour, shifting map objects from the iterator branch to the sequence branch. That's a definite change in behaviour, which alone could change the meaning of the code, e.g. if the two process_* functions use different algorithms. Or it could break the code outright, because your MapView objects can raise TypeError when you call len() on them. I know that any object with a __len__ could in principle raise TypeError. But for anything else, we are justified in calling it a bug in the __len__ implementation. You're trying to sell it as a feature.
It will work fine, unless something has called __next__, which will cause len() to blow up in their face by raising TypeError. I call these sorts of designs "landmines". They're absolutely fine, right up to the point where you hit the right combination of actions and step on the landmine. For anything else, this sort of thing would be a bug. You're calling it a feature.
That's not the only breakage. This is a pattern which I sometimes use:

def test(iterator):
    # Process items up to some symbol one way,
    # and items after that symbol another way.
    for a in iterator:
        print(1, a)
        if a == 'C':
            break
    # This relies on iterator NOT resetting to the beginning,
    # but continuing from where we left off,
    # i.e. not being broken.
    for b in iterator:
        print(2, b)

Being an iterator, right now I can pass map() objects directly to that code, and it works as expected:

py> test(map(str.upper, 'abcde'))
1 A
1 B
1 C
2 D
2 E

Your MapView does not:

py> test(MapView(str.upper, 'abcde'))
1 A
1 B
1 C
2 A
2 B
2 C
2 D
2 E

This is why such iterators are deemed to be "broken".
You have an object that supports len() and next(). Why shouldn't people use both len() and next() on it when both are supported methods? They don't have to be in a single expression:

x = MapView(blah blah blah)
a = some_function_that_calls_len(x)
b = some_function_that_calls_next(x)

That works. But reverse the order, and you step on a landmine:

b = some_function_that_calls_next(x)
a = some_function_that_calls_len(x)

The caller may not even know that the functions call next() or len(); they could be implementation details buried deep inside some library function they didn't even know they were calling. Do you still think that it is the caller's code that is insane?
Remember, the iterator interface is only there for backwards compatibility.
Famous last words.
That would fail under both Python 2 and the current Python 3.
Honestly Greg, you've been around long enough that you ought to recognise *minimal examples* for what they are. They're not meant to be real-world production code. They're the simplest, most minimal example that demonstrates the existence of a problem. The fact that they are *simple* is to make it easy to see the underlying problem, not to give you an excuse to dismiss it. You're supposed to imagine that in real-life code, the call to next() could be buried deep, deep, deep in a chain of 15 function calls in some function in some third party library that I don't even know is being called, and it took me a week to debug why len(obj) would sometimes fail mysteriously. The problem is not the caller, or even the library code, but that your class magically and implicitly swaps from a sequence to a pseudo-iterator whether I want it to or not. A perfect example of why DWIM code is so hated: http://www.catb.org/jargon/html/D/DWIM.html
*Minimal example* again. You ought to be able to imagine the actual function is fleshed out, without expecting me to draw you a picture:

if hasattr(obj, '__next__'):
    first = next(obj, sentinel)

Or if you prefer:

try:
    first = next(obj)
except TypeError:
    pass  # fall back on sequence algorithm
except StopIteration:
    pass  # empty iterator

None of this boilerplate adds any insight at all to the discussion. There's a reason bug reports ask for minimal examples. The point is, I'm calling some innocent looking function, and it breaks my sequence: len(obj) worked before I called the function, and afterwards, it raises TypeError. I wouldn't have to care about the implementation if your MapView object didn't magically flip from sequence to iterator behind my back.

-- Steve

and the test for an iterator is:
obj is iter(obj)
Is that a hard and fast rule? I know it's the vast majority of cases, but I imagine you could make an object that behaved exactly like an iterator, but returned some proxy object rather than itself. Not sure why one would do that, but it should be possible. - CHB

On Thu, Dec 13, 2018 at 3:07 PM Chris Barker - NOAA Federal via Python-ideas <python-ideas@python.org> wrote:
Yes, it is. https://docs.python.org/3/library/stdtypes.html#iterator-types For an iterable, __iter__ needs to return an appropriate iterator. For an iterator, __iter__ needs to return self (which is, by definition, the "appropriate iterator"). Note also that the behaviour around StopIteration is laid out there, including that an iterator whose __next__ has raised SI but then subsequently doesn't continue to raise SI is broken. (Though it *is* legit to raise StopIteration with a value the first time, and then raise a vanilla SI subsequently. Generators do this, rather than retain the return value indefinitely.) ChrisA

Chris Angelico wrote:
The docs aren't very clear on this point. They claim this is necessary so that the iterator can be used in a for-loop, but that's obviously not strictly true, since a proxy object could also be used. They also make no mention about whether one should be able to rely on this as a definitive test of iterator-ness. In any case, I don't claim that my MapView implements the full iterator protocol, only enough of it to pass for an iterator in most likely scenarios that assume one. -- Greg

On Thu, Dec 13, 2018 at 4:54 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
iterator.__iter__() Return the iterator object itself. I do believe "the iterator object itself" means that "iterator.__iter__() is iterator" should always be true. But maybe there's some other way to return "the object itself" other than actually returning "the object itself"? ChrisA

On Thu, 13 Dec 2018 at 05:55, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
See also https://docs.python.org/3.7/glossary.html#term-iterator, which reiterates the point that "Iterators are required to have an __iter__() method that returns the iterator object itself". By that point, I'd say the docs are pretty clear...
They also make no mention about whether one should be able to rely on this as a definitive test of iterator-ness.
That glossary entry is linked from https://docs.python.org/3.7/library/collections.abc.html#collections.abc.Ite..., so it would be pretty hard to argue that it's not part of the "definitive test of iterator-ness".
But not enough that it's legitimate to describe it as an "iterator". It may well be a useful class, and returning it from a map-like function may be a practical and effective thing to do, but describing it as an "iterator" does nothing apart from leading to distracting debates on how it doesn't work the same as an iterator. Better to just accept that it's *not* an iterator, and focus on whether it's useful... IMO, it sounds like it's useful, but it's not backward compatible (because it's not an iterator ;-)). Whether it's *sufficiently* useful to justify breaking backward compatibility is a different discussion (all I can say on that question is that I've never personally had a case where the current Python 3 behaviour of map is a problem). Paul

On Thu, Dec 13, 2018 at 06:53:54PM +1300, Greg Ewing wrote:
Whether your hybrid sequence+iterator is close enough to an iterator or not isn't the critical point here. If we really wanted to, we could break backwards compatibility, with or without a future import or a deprecation period, and simply declare that this is how map() will work in the future. Doing that, or not, becomes a question of whether the gain is worth the breakages.

The critical question here is whether a builtin ought to include the landmines your hybrid class does. *By design*, your class will blow up in people's faces if they try to use the full API offered. It violates at least two expected properties:

- As an iterator, it is officially "broken" because in at least two reasonable scenarios, it automatically resets after being exhausted. (Although presumably we could fix that with an "is_exhausted" flag.)
- As a sequence, it violates the expectation that if an object is Sized (it has a __len__ method), calling len() on it should not raise TypeError.

As a sequence, it is fragile and easily breakable, changing from a sequence to a (pseudo-)iterator whether the caller wants it to or not. Third-party code could easily flip the switch, leading to obscure errors. That second one is critical to your "Do What I Mean" design; the whole point of your class is for the object to automagically swap from behaving like a sequence to behaving like an iterator according to how it is used. Rather than expecting the user to make an explicit choice of which behaviour they want:

- use map() to get current iterator behaviour;
- use mapview() to get lazy-sequence behaviour;

your class tries to do both, and then guesses what the user wants depending on how the map object happens to get used.

-- Steve

On Wed, Dec 12, 2018 at 08:06:17PM -0800, Chris Barker - NOAA Federal wrote:
Yes, that's the rule for the iterator protocol. Any object can have an __iter__ method which returns anything you want. (It doesn't even have to be iterable, this is Python, and if you want to shoot yourself in the foot, you can.) But to be an iterator, the rule is that obj.__iter__() must return obj itself. Otherwise we say that obj is an iterable, not an iterator. https://docs.python.org/3/library/stdtypes.html#iterator.__iter__ -- Steve
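[Editor's note: the rule Steve states can be checked directly; a small illustration added here, not from the original post.]

```python
def is_iterator(obj):
    # The iterator protocol's rule: an iterator's __iter__ must return
    # the object itself; anything else is merely an iterable.
    return iter(obj) is obj

assert is_iterator(iter([1, 2, 3]))   # list_iterator
assert not is_iterator([1, 2, 3])     # a list is iterable, not an iterator
assert is_iterator(map(str, [1, 2]))  # current map objects are iterators
assert not is_iterator(range(3))      # range is a lazy sequence
```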

On 12/11/2018 6:48 AM, E. Madison Bray wrote:
A range represents an arithmetic sequence. Any usage of range that could be replaced by xrange, which is nearly all uses, made no assumption broken by xrange. The basic assumption was and is that a range/xrange can be repeatedly iterated. That this assumption was met in the first case by returning a list was somewhat of an implementation detail. In terms of mutability, a tuple would have been better, as range objects should not be mutable. (If [2,4,6] is mutated to [2,3,7], it is no longer a range (arithmetic sequence).)
and Python 3.2 restored some of that list-like functionality
As I see it, xranges were unfinished as sequence objects and 3.2 finished the job. This included having the min() and max() builtins calculate the min and max efficiently, as a human would, as the first or last of the sequence, rather than uselessly iterating and comparing all the items in the sequence. A proper analogy to range would be a re-iterable mapview (or 'mapseq') like what Steven D'Aprano proposes.
-- Terry Jan Reedy

On Mon, Dec 10, 2018 at 05:15:36PM -0800, Chris Barker via Python-ideas wrote: [...]
You might need a sequence. Why do you think that has to be an *eager* sequence? I can think of two obvious problems with eager sequences: space and time. They can use too much memory, and they can take too much time to generate them up-front and too much time to reap when they become garbage. And if you have an eager sequence, and all you want is the first item, you still have to generate all of them even though they aren't needed. We can afford to be profligate with memory when the data is small, but eventually you run into cases where having two copies of the data is one copy too many.
Or even if you *are* going to work with the entire collection, but you don't need them all at once. I once knew a guy whose fondest dream was to try the native cuisine of every nation of the world ... but not all in one meal.

This is a classic time/space tradeoff: for the cost of calling the mapping function anew each time we index the sequence, we can avoid allocating a potentially huge list and calling a potentially expensive function up front for items we're never going to use. Instead, we call it only on demand. These are the same principles that justify (x)range and dict views. Why eagerly generate a list up front, if you only need the values one at a time on demand? Why make a copy of the dict keys, if you don't need a copy?

These are not rhetorical questions. This is about avoiding the need to make unnecessary copies for those times we *don't* need an eager sequence generated up front, keeping the laziness of iterators and the random access of sequences. map(func, sequence) is a great candidate for this approach. It has to hold onto a reference to the sequence even as an iterator. The function is typically side-effect free (a pure function), and if it isn't, "consenting adults" applies. We've already been told there's at least one major Python project, Sage, where this would have been useful. There's a major functional language, Haskell, where nearly all sequence processing follows this approach.

I suggest we provide a separate mapview() type that offers only the lazy sequence API, without trying to be an iterator at the same time. If you want an eager sequence, or an iterator, they're only a single function call away:

list(mapview_instance)
iter(mapview_instance)  # or just stick to map()

Rather than trying to guess whether people want to treat their map objects as sequences or iterators, we let them choose which they want and be explicit about it. Consider the history of dict.keys(), values() and items() in Python 2. Originally they returned eager lists.
Did we try to retrofit view-like and iterator-like behaviour onto the existing dict.keys() method, returning a cunning object which somehow turned from a list to a view to an iterator as needed? Hell no! We introduced *six new methods* on dicts:

- dict.iterkeys()
- dict.viewkeys()

and similar for items() and values(). Compared to that, adding a single variant on map() that expects a sequence and returns a view on the sequence seems rather timid.

-- Steve
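[Editor's note: a minimal sketch of the separate lazy-sequence type being proposed. The class body is an illustration; "mapview" is the name used in the post, not an existing builtin.]

```python
class mapview:
    """Lazy sequence view over a sequence: len() and indexing work, the
    function is called on demand, and there is no iterator state to break."""

    def __init__(self, func, seq):
        self.func = func
        self.seq = seq

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return [self.func(x) for x in self.seq[index]]
        return self.func(self.seq[index])

    def __iter__(self):
        # The view is never next()-able itself; each iteration
        # gets a fresh iterator, so it cannot be "exhausted".
        return map(self.func, self.seq)


v = mapview(str.upper, "abcd")
assert len(v) == 4 and v[2] == "C"
assert list(v) == list(v) == ["A", "B", "C", "D"]  # iterating doesn't exhaust it
assert iter(v) is not v                            # a view, not an iterator
```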

Perhaps I got confused by the early part of this discussion. My point was that there is no "map-like" object at the Python level. (That is, no Map ABC.) Py2's map produced a sequence. Py3's map produced an iterable. So any API that was expecting a sequence could accept the result of a py2 map, but not a py3 map. There is absolutely nothing special about map here.

The example of range has been brought up, but I don't think it's analogous: py2 range returns a list, py3 range returns an immutable sequence, because that's as close as we can get to a sequence while preserving the lazy evaluation that is wanted. I _think_ someone may be advocating that map() could return an iterable if it is passed an iterable, and a sequence if it is passed a sequence. Yes, it could, but that seems like a bad idea to me.

But folks are proposing a "map" that would produce a lazy-evaluated sequence. Sure: as Paul said, put it up on PyPI and see if folks find it useful. Personally, I'm still finding it hard to imagine a use case where you need the sequence features, but also lazy evaluation is important. Sure, range() has that, but it came at almost zero cost, and I'm not sure the sequence features are used much.

Note: the one use case I can think of for a lazily evaluated sequence instead of an iterable is so that I can pick a random element with random.choice(). (Try to pick a random item from a dict.) But that doesn't apply here: pick a random item from the source sequence instead. This is a specific example of a general use case: you need to access only a subset of the mapped sequence (or access it out of order), so using the iterable version won't work, and it may be large enough that making a new sequence is too resource intensive. Seems rare to me, and in many cases you could do the subsetting before applying the function, so I think it's a pretty rare use case. But go ahead and make it; I've been wrong before :-)

-CHB

Sent from my iPhone

On Tue, Dec 11, 2018 at 11:10 AM Terry Reedy <tjreedy@udel.edu> wrote:
Well, the iterator / iterable distinction is important in this thread in many places, so I should have been more careful about that -- but not for this reason. Yes, a sequence is an iterable, but what I meant was an "iterable-that-is-not-a-sequence". -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

Steven D'Aprano wrote:
I suggest we provide a separate mapview() type that offers only the lazy sequence API, without trying to be an iterator at the same time.
Then we would be back to the bad old days of having two functions that do almost exactly the same thing. My suggestion was made in the interests of moving the language in the direction of having less warts, rather than adding more or moving the existing ones around. I acknowledge that the dual interface is itself a bit wartish, but it's purely for backwards compatibility, so it could be deprecated and eventually removed if desired. -- Greg

On Wed, Dec 12, 2018 at 11:31:03AM +1300, Greg Ewing wrote:
They aren't "almost exactly the same thing". One is a sequence, which is a rich API that includes random access to items and a length; the other is an iterator, which is an intentionally simple API which fails to meet the needs of some users.
It's a "bit wartish" in the same way that the sun is "a bit warmish".
but it's purely for backwards compatibility
And it fails at that too:

x = map(str.upper, "abcd")
x is iter(x)

returns True with the current map, an actual iterator, and False with your hybrid. Current map() is a proper, non-broken iterator; your hybrid is a broken iterator. (That's not me being derogative: it's the official term for iterators which don't stay exhausted.) I'd be more charitable if I thought the flaws were mere bugs that could be fixed. But I don't think there is any way to combine two incompatible interfaces, the sequence and iterator APIs, into one object without these sorts of breakages. Take the __next__ method out of your object, and it is a better version of what I proposed earlier. With the __next__ method, it's just broken.

-- Steve

On 12/1/2018 8:07 PM, Greg Ewing wrote:
Steven D'Aprano wrote:
After defining a separate iterable mapview sequence class
I presume you mean the '(iterable) sequence' and 'iterator' worlds. I don't think they should be mixed. A sequence is reiterable; an iterator is once through and done.
The last two (unnecessarily) restrict this to being a once-through iterator. I think much better would be

def __iter__(self):
    return map(self.func, *self.args)

-- Terry Jan Reedy

On 12/1/2018 2:08 PM, Steven D'Aprano wrote:
This proof of concept wrapper class could have been written any time since Python 1.5 or earlier:
class lazymap:
    def __init__(self, function, sequence):
One could now add at the top of the file

from collections.abc import Sequence

and here

if not isinstance(sequence, Sequence):
    raise TypeError(f'{sequence} is not a sequence')
For 3.x, I would add

def __iter__(self):
    return map(self.function, self.sequence)

but your point that iteration is possible even without, with the old protocol, is well made.
-- Terry Jan Reedy

To illustrate the distinction that someone (I think Steven D'Aprano) makes, I think these two (modestly tested, but could have flaws) implementations are both sensible for some purposes. Both are equally "obvious," yet they are different:
I wasn't sure what to set self._len to where it doesn't make sense. I thought of None, which makes len(mo) raise one exception, or -1, which makes len(mo) raise a different exception. I just chose an arbitrary "big" value in the above implementation. mo.__length_hint__() is a possibility, but that is specialized, not a way of providing a response to len(mo). I don't have to, but I do keep around mo._seqs as a handle to the underlying sequences. In concept those could be re-inspected for other properties as the user of the classes desired. On Sat, Dec 1, 2018 at 12:28 PM David Mertz <mertz@gnosis.cx> wrote:
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

I raised a related problem a while back when I found that random.sample can only take a sequence. The example I gave was randomly sampling points on a 2D grid to initialize a board for Conway's Game of Life:
It seems like there should be some way to pass along the information that the size *is* known, but I couldn't think of any way of passing that info along without adding massive amounts of complexity everywhere. If map is able to support len() under certain circumstances, it makes sense that other iterators and generators would be able to do the same. You might even want a way to annotate a generator function with logic about how it might support len(). I don't have an answer to this problem, but I hope this provides some sense of the scope of what you're asking. On Mon, Nov 26, 2018 at 3:36 PM Kale Kundert <kale@thekunderts.net> wrote:
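[Editor's note: the random.sample limitation mentioned above can be seen directly. A lazy sequence such as range is accepted, while a map object is not; a small illustration added here, not from the original post.]

```python
import random

random.seed(42)

# range is a lazy sequence: it has len() and indexing, so sampling
# works without materializing a million elements.
pick = random.sample(range(10**6), k=3)
assert len(pick) == 3

# map objects are iterators, not sequences, so random.sample rejects them.
try:
    random.sample(map(str, range(10)), k=3)
except TypeError:
    pass  # "population must be a sequence"
else:
    raise AssertionError("expected TypeError")
```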

On Tue, Nov 27, 2018 at 9:15 AM Jonathan Fine <jfine2358@gmail.com> wrote:
Briefly, I don't like your suggestion because many important iterables don't have a length!
That part's fine. The implication is that mapping over an iterable with a length would give a map with a known length, and mapping over something without a length wouldn't. But I think there are enough odd edge cases (for instance, is it okay to call the function twice if you __getitem__ twice, or should you cache it?) that it's probably best to keep the built-in map() simple and reliable. Don't forget, too, that map() can take more than one iterable, and some may not have lengths. (You can define enumerate in terms of map and itertools.count; what is the length of the resulting enumeration?) If you want a map-like object that takes specifically a single list, and is a mapped view to that list, then go for it - but that can be its own beast, not related to the map() built-in function. Also, it may be of value to check out more-itertools; you might find something there that you like. ChrisA
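[Editor's note: Chris's enumerate example can be written out. The map below has no well-defined length because count() is infinite, even when the other input is sized; a sketch added for illustration, not from the post.]

```python
from itertools import count

def my_enumerate(iterable, start=0):
    # enumerate() expressed as map over an infinite counter; len() of
    # the result could not sensibly delegate to its inputs, because
    # count() is unbounded.
    return map(lambda i, x: (i, x), count(start), iterable)

assert list(my_enumerate("ab")) == [(0, "a"), (1, "b")]
assert list(my_enumerate("ab", start=5)) == [(5, "a"), (6, "b")]
```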

On Tue, Nov 27, 2018 at 09:36:08AM +1100, Chris Angelico wrote:
Don't forget, too, that map() can take more than one iterable
I forgot about that! But in this case, I think the answer is obvious: the length of the map object is the *smallest* length of the iterables, ignoring any unsized or infinite ones. Same would apply to zip(). But as per my previous post, there are other problems with this concept that aren't so easy to solve. -- Steve
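[Editor's note: the "smallest length, ignoring unsized iterables" rule could look something like this; a hypothetical helper written for illustration only.]

```python
def proposed_map_len(*iterables):
    # Hypothetical rule from the post: the length of map(f, *iterables)
    # is the smallest length among the sized inputs, ignoring any
    # unsized (possibly infinite) ones.
    sizes = [len(it) for it in iterables if hasattr(it, "__len__")]
    if not sizes:
        raise TypeError("none of the inputs is sized")
    return min(sizes)

assert proposed_map_len([1, 2, 3], "ab") == 2
assert proposed_map_len([1, 2, 3], iter("abcdef")) == 3  # iterator ignored
```

As Chris points out in the following message, the equally defensible alternative is to declare the whole map unsized as soon as any input is unsized, which is exactly why the rule is not as obvious as it first appears.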

On Tue, Nov 27, 2018 at 10:41 AM Steven D'Aprano <steve@pearwood.info> wrote:
Equally obvious and valid answer: The length is the smallest length of its iterables, ignoring any infinite ones, but if any iterable is unsized, the map is unsized. And both answers will surprise people. I still think there's room in the world for a "mapped list view" type, which retains a reference to an underlying list, plus a function, and proxies everything through to the function. It would NOT have the flexibility of map(), but it would be able to directly subscript, it wouldn't need any cache, etc, etc. ChrisA

I don't really agree that there are multiple surprising answers here. If you iterate through the whole map, that will produce some number of elements, and that's the length. Whether you can calculate that number in __len__() depends on the particular iterables you have, which is fine, but I don't think the definition of length is ambiguous. But I think Steven is right that you can't implement __len__() for an iterator without running into some inconsistencies. It's just unfortunate that map() is an iterator. -Kale

I agree many important iterables do have a length. On Mon, Nov 26, 2018 at 02:06:52PM -0800, Michael Selik wrote:
If you know the input is sizeable, why not check its length instead of the map's?
The consumer of map may not be the producer of map. Very good point. Honestly, I like the proposal, but I would love to see more reviews of the idea. Maybe I am missing something.

On Mon, Nov 26, 2018 at 01:29:21PM -0800, Kale Kundert wrote:
This seems straightforward, but I think there's more complexity than you might realise, a nasty surprise which I expect is going to annoy people no matter what decision we make, and the usefulness is probably less than you might think.

First, the usefulness: we still have to wrap the call to len() in a try...except block, even if we know we have a map object, because we won't know whether the underlying iterable supports len(). So it won't reduce the amount of code we have to write. At best it will allow us to take a fast path when len() returns a value, and a slow path when it raises.

Here's the definition of the Sized abc:

https://docs.python.org/3/library/collections.abc.html#collections.abc.Sized

and the implementation simply checks for the existence of __len__. We (rightly) assume that if __len__ exists, the object has a known length, and that calling len() on it will succeed or at least not raise TypeError. Your proposal will break that expectation. map objects will be Sized, but since sometimes the underlying iterator won't be, they may still raise TypeError.

Of course there are ways to work around this. We could just change our expectations: even Sized objects might not be *actually* sized. Or map() could catch the TypeError and raise instead a ValueError, or something. Or we could rethink the whole length concept (see below), which after all was invented back in Python 1 days and is looking a bit old.

As for the nasty surprise... do you agree that this ought to be an invariant for sized iterables?

count = len(it)
i = 0
for obj in it:
    i += 1
assert i == count

That's the invariant I expect, and breaking that will annoy me (and I expect many other people) greatly. But that means that map() cannot just delegate its length to the underlying iterable. The implementation must be more complex, keeping track of how many items it has seen.
And consider this case:

it = map(lambda x: x, [1, 2, 3, 4, 5])
x = next(it)
x = next(it)
assert len(it) == 5        # underlying length of the iterable
assert len(list(it)) == 3  # but only three items left
assert len(it) == 5        # still 5
assert len(list(it)) == 0  # but nothing left

So the length of the iterable has to vary as you iterate over it, or you break the invariant shown above. But that's going to annoy other people for another reason: we rightly expect that iterables shouldn't change their length just because you iterate over them! The length should only change if you *modify* them. So these two snippets should do the same:

# 1
n = len(it)
x = sum(it)

# 2
x = sum(it)
n = len(it)

but if map() updates its length as it goes, it will break that invariant. So *whichever* behaviour we choose, we're going to break *something*. Either the reported length isn't necessarily the same as the actual length you get from iterating over the items, which will be annoying and confusing, or it varies as you iterate, which will ALSO be annoying and confusing. Either way, this apparently simple and obvious change will be annoying and confusing.

Rethinking object length
------------------------

len() was invented back in Python 1 days, or earlier, when we effectively had only one kind of iterable: sequences like lists, with a known length. Today, iterables can have:

1. a known, finite length;
2. a known infinite length;
3. an unknown length (and usually no way to estimate it).

At least. The len() protocol is intentionally simple; it only supports the first case, with the expectation that iterables will simply not define __len__ in the other two cases. Perhaps there is a case for updating the len() concept to explicitly handle cases 2 and 3, instead of simply not defining __len__. Perhaps it could return -1 for unknown and -2 for infinite. Or raise some other exception apart from TypeError.
(I know there have been times I've wanted to know if an iterable was infinite, before spending the rest of my life iterating over it...) And perhaps we can come up with a concept of total length, versus length of items remaining. But these aren't simple issues with obvious solutions, it would surely need a PEP. And the benefit isn't obvious either. -- Steve

Hi Steven, Thanks for the good feedback.
I think most of the time you would know whether the underlying iterable was sized or not. After all, if you need the length, whatever code you're writing would probably not work on an infinite/unsized iterable.
So the length of the iterable has to vary as you iterate over it, or you break the invariant shown above.
I think I see the problem here. map() is an iterator, where I was thinking of it as a wrapper around an iterable. Since an iterator is really just a pointer into an iterable, it doesn't really make sense for it to have a length. Give it one, and you end up with the inconsistencies you describe. I guess I probably would have disagreed with the decision to make map() an iterator rather than a wrapper around an iterable. Such a prominent function should have an API geared towards usability, not towards implementing a low-level protocol (in my opinion). But clearly that ship has sailed. -Kale

On Tue, Nov 27, 2018 at 12:37 PM Kale Kundert <kale@thekunderts.net> wrote:
For map() to return an iterable that can be used more than once, it has to be mapping over an iterable that can be used more than once. That limits it. The way map is currently defined, it can accept any iterable, and it returns a one-shot iterable (which happens to be its own iterator). That's why I think the best solution is to create a separate mapped-sequence-view that depends on its iterable being an actual sequence, and exposes itself as a sequence also. (Yes, I said "list" in my previous post, but any sequence would work.) It can carry the length through, it can directly support subscripting, etc, etc, etc. Both it and map() would have their places. ChrisA
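For what it's worth, the mapped-sequence-view Chris describes can be prototyped in a few lines on top of the `collections.abc.Sequence` mixin; `MapView` is a hypothetical name, not an existing builtin:

```python
from collections.abc import Sequence

class MapView(Sequence):
    # Hypothetical mapped-sequence-view: requires a real sequence and
    # presents itself as one, so len() and indexing carry through.
    def __init__(self, func, seq):
        self._func = func
        self._seq = seq

    def __len__(self):
        return len(self._seq)

    def __getitem__(self, index):
        if isinstance(index, slice):
            return MapView(self._func, self._seq[index])
        return self._func(self._seq[index])

v = MapView(str, [1, 2, 3, 4])
print(len(v))   # 4
print(v[2])     # 3 (as the string '3')
print(list(v))  # ['1', '2', '3', '4'] -- and reiterable, unlike map()
```

Because `Sequence` supplies `__iter__`, `__contains__`, and friends from `__len__` and `__getitem__`, the view is lazily evaluated yet sized, subscriptable, and reusable.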

On 11/26/2018 4:29 PM, Kale Kundert wrote:
The len function is defined as always returning the length, an int >= 0. Hence .__len__ methods should always do the same. https://docs.python.org/3/reference/datamodel.html#object.__len__ Objects that cannot do that should not have this method.

The previous discussion of this issue led to function operator.length_hint and special method object.__length_hint__ in 3.4.

https://docs.python.org/3/library/operator.html#operator.length_hint
"""
operator.length_hint(obj, default=0)
    Return an estimated length for the object o. First try to return its actual length, then an estimate using object.__length_hint__(), and finally return the default value.
    New in version 3.4.
"""

https://docs.python.org/3/reference/datamodel.html#object.__length_hint__
"""
object.__length_hint__(self)
    Called to implement operator.length_hint(). Should return an estimated length for the object (which may be greater or less than the actual length). The length must be an integer >= 0. This method is purely an optimization and is never required for correctness.
    New in version 3.4.
"""
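To make the quoted protocol concrete, this is how operator.length_hint behaves in CPython 3 today: sized objects and the sequence iterators report a hint, while the built-in map (which defines neither __len__ nor __length_hint__) yields only the default:

```python
from operator import length_hint

print(length_hint([1, 2, 3]))            # 3: falls back to len()

it = iter([1, 2, 3])
next(it)
print(length_hint(it))                   # 2: list_iterator's own hint,
                                         #    declining as items are consumed

print(length_hint(map(str, [1, 2, 3])))  # 0: no hint, default returned
```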
But in this case, it seems like map() should've known that its length was 3.
As others have pointed out, this is not true. If not infinite, the size, defined as the number of items to be yielded, and hence the size of list(iterator), shrinks by 1 after every next call, just as with pop methods.
Last I heard, list() uses length_hint for its initial allocation. But this is undocumented implementation. Built-in map does not have .__length_hint__, for the reasons others gave for it not having .__len__. But for private code, you are free to define a subclass that does, with the definition you want. -- Terry Jan Reedy
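As a sketch of the private subclass Terry mentions (the name `hinted_map` and the exact policy are hypothetical): give a map subclass a `__length_hint__` equal to the smallest known input length, computed once at construction. Per the docs quoted above it is only an estimate, so it is deliberately not updated as items are consumed:

```python
from operator import length_hint

class hinted_map(map):
    def __new__(cls, func, *iterables):
        self = super().__new__(cls, func, *iterables)
        # Hint = shortest *sized* input; 0 ("don't know") if none are sized.
        self._hint = min(
            (len(it) for it in iterables if hasattr(it, '__len__')),
            default=0,
        )
        return self

    def __length_hint__(self):
        return self._hint

m = hinted_map(lambda a, b: (a, b), [1, 2, 3], 'abcd')
print(length_hint(m))  # 3
print(list(m))         # [(1, 'a'), (2, 'b'), (3, 'c')]
```

An unsized input shorter than the sized ones would make the hint an overestimate, which the `__length_hint__` contract explicitly permits.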

On Mon, Nov 26, 2018 at 10:35 PM Kale Kundert <kale@thekunderts.net> wrote:
I mostly agree with the existing objections, though I have often found myself wanting this too, especially now that `map` does not simply return a list. This problem alone (along with the same problem for filter) has had a ridiculously outsized impact on the Python 3 porting effort for SageMath, and I find it really irritating at times. As a simple counter-proposal which I believe has fewer issues, I would really like it if the built-in `map()` and `filter()` at least provided a Python-level attribute to access the underlying iterables. This is necessary because if I have a function that used to take, say, a list as an argument, and it receives a `map` object, I now have to be able to deal with map()s, and I may have checks I want to perform on the underlying iterables before, say, I try to iterate over the `map`. Exactly what those checks are and whether or not they're useful may be highly application-specific, which is why say a generic `map.__len__` is not workable. However, if I can at least inspect those iterables I can make my own choices on how to handle the map. Exposing the underlying iterables to Python also has dangers in that I could directly call `next()` on them and possibly create some confusion, but consenting adults and all that...
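A rough sketch of what that could look like (`xmap`, `.func` and `.iters` are hypothetical names; the real builtin exposes nothing):

```python
class xmap(map):
    # Hypothetical: a map that keeps its constructor arguments
    # inspectable, as proposed above.
    def __new__(cls, func, *iterables):
        self = super().__new__(cls, func, *iterables)
        self.func = func
        self.iters = iterables
        return self

m = xmap(str, [1, 2, 3])
print(m.func)   # <class 'str'>
print(m.iters)  # ([1, 2, 3],)
print(list(m))  # ['1', '2', '3']
```

Iteration is untouched; the subclass only stashes references, so a consumer can inspect `m.iters` before deciding how to handle the map.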

On Wed, Nov 28, 2018 at 2:28 PM E. Madison Bray <erik.m.bray@gmail.com> wrote:
I'm a mathematician, so understand your concerns. Here's what I hope is a helpful suggestion. Create a module, say sage.itertools that contains (not tested)

    def py2map(iterable):
        return list(map(iterable))

The porting to Python 3 (for map) is now reduced to writing

    from .itertools import py2map as map

at the head of each module. Please let me know if this helps.

-- Jonathan

On Thu, Nov 29, 2018 at 1:46 AM Jonathan Fine <jfine2358@gmail.com> wrote:
With the nitpick that the arguments should be (func, *iterables) rather than just the single iterable, yes, this is a viable transition strategy. In fact, it's very similar to what 2to3 would do, except that 2to3 would do it at the call site. If any Py3 porting process is being held up significantly by this, I would strongly recommend giving 2to3 an eyeball - run it on some of your code, then either accept its changes or just learn from the diffs. It's not perfect (nothing is), but it's a useful tool. ChrisA

On Wed, Nov 28, 2018 at 3:54 PM Chris Angelico <rosuav@gmail.com> wrote:
That effort is already mostly done and adding a helper function would not have worked as users *passing* map(...) as an argument to some function just expect it to work. The only alternative would have been replacing the builtin map with something else at the globals level. 2to3 is mostly useless since a major portion of Sage is written in Cython anyways. I just mentioned that porting effort for background. I still believe that the actual proposal of making the arguments to a map(...) call accessible from Python as attributes of the map object (ditto filter, zip, etc.) is useful in its own right, rather than just having this completely opaque iterator.

On Wed, Nov 28, 2018 at 04:04:33PM +0100, E. Madison Bray wrote:
Ah, that's what I was missing. But... surely the function will still work if they pass an opaque iterator *other* than map() and/or filter?

    it = (func(x) for x in something if condition(x))
    some_sage_function(it)

You surely don't expect to be able to peer inside every and any iterator that you are given? So if you have to handle the opaque iterator case anyway, how is it *worse* when the user passes map() or filter() instead of a generator like the above?
Perhaps... I *want* to agree with this, but I'm having trouble thinking of when and how it would be useful. Some concrete examples would help justify it. -- Steve

On Wed, Nov 28, 2018 at 4:14 PM Steven D'Aprano <steve@pearwood.info> wrote:
That one is admittedly tricky. For that matter it might be nice to have more introspection of generator expressions too, but there at least we have .gi_code if nothing else. But those are a far less common example in my case, whereas map() is *everywhere* in math code :)

On Thu, Nov 29, 2018 at 2:19 AM E. Madison Bray <erik.m.bray@gmail.com> wrote:
Considering that a genexp can do literally anything, I doubt you'll get anywhere with that introspection.
But those are a far less common example in my case, whereas map() is *everywhere* in math code :)
Perhaps then, the problem is that math code treats "map" as something that is more akin to "instrumented list" than it is to a generator. If you know for certain that you're mapping a low-cost pure function over an immutable collection, the best solution may be to proxy through to the original list than to generate values on the fly. And if that's the case, you don't want the Py3 map *or* the Py2 one, although the Py2 one can behave this way, at the cost of crazy amounts of efficiency. ChrisA

On Wed, Nov 28, 2018 at 4:24 PM Chris Angelico <rosuav@gmail.com> wrote:
Yep, that's a great example where it might be possible to introspect a given `map` object and take it apart to do something more efficient with it. This is less of a problem with internal code where it's easy to just not use map() at all, and that is often the case. But a lot of the people who develop code for Sage are mathematicians, not engineers, and they may not be aware of this, so they write code that passes `map()` objects to more internal machinery. And users will do the same even moreso. I can (and have) written horrible C-level hacks--not for this specific issue, but others like it--and am sometimes tempted to do the same here :(

One thing I'd like to add real quick to this (I'm on my phone so apologies for crappy quoting):

Although there are existing cases where there is a loss of efficiency over Python 2 map() when dealing with the opaque, iterable Python 3 map(), the latter also presents many opportunities for enhancements that weren't possible before.

For example, previously a user might pass map(func, some_list) where func is some pure function and the iterable is almost always a list of some kind. Previously that map() call would be evaluated (often slowly) first. But now we can treat a map as something a little more formal: as a container for a function and one or more iterables, which happens to have this special functionality when you iterate over it, but is otherwise just a special container. This is technically already the case, we just can't directly access it as a container.

If we could, it would be possible to implement various optimizations that might not otherwise have been obvious to the user. This is especially the case if the iterable is a simple list, which is something we can check. The function in this case very likely might actually be a C function that was wrapped with Cython. I can easily convert this on the user's behalf to a simple C loop or possibly even some other more optimal vectorized code.

These are application-specific special cases of course, but many such cases become easily accessible if map() and friends are usable as specialized containers.

On Wed, Nov 28, 2018, 16:31 E. Madison Bray <erik.m.bray@gmail.com> wrote:

+1. Throwing away information is almost always a bad idea. That was fixed with classes and kwargs in 3.6, which removed a lot of fiddly workarounds, for example. Throwing away data needlessly is also why 2to3, baron, Parso and probably many more had to reimplement a Python parser instead of using the built-in one.

We should have information preservation and transparency as general design goals imo. Not because we can see the obvious use now but because it keeps the door open to discover uses later.

/ Anders

On Wed, Nov 28, 2018 at 05:37:39PM +0100, Anders Hovmöller wrote:
"Almost always"? Let's take this seriously, and think about the consequences if we actually believed that. If I created a series of integers: a = 23 b = 0x17 c = 0o27 d = 0b10111 e = int('1b', 12) your assertion would say it is a bad idea to throw away the information about how they were created, and hence we ought to treat all five values as distinct and distinguishable. So much for the small integer cache... Perhaps every single object we create ought to hold onto a AST representing the literal or expression which was used to create it. Let's not exaggerate the benefit, and ignore the costs, of "throwing away information". Sometimes we absolutely do want to throw away information, or at least make it inaccessible to the consumer of our data structures. Sometimes the right thing to do is *not* to open up interfaces unless there is a clear need for it to be open. Doing so adds bloat to the interface, prevents many changes in implementation including potential optimizations, and may carry significant memory burdens. Bringing this discussion back to the concrete proposal in this thread, as I said earlier, I want to agree with this proposal. I too like the idea of having map (and filter, and zip...) objects expose their arguments, and for the same reason: "it might be useful some day". But every time we scratch beneath the surface and try to think about how and when we would actually use that information, we run into conceptual and practical problems which suggests strongly to me that doing this will turn it into a serious bug magnet, an anti-feature which sounds good but causes more problems than it solves. I'm really hoping someone can convince me this is a good idea, but so far the proposal seems like an attractive nuisance and not a feature.
While that is a reasonable position to take in some circumstances, in others it goes completely against YAGNI. -- Steve

Hi everyone, first participation in Python's mailing list, don't be too hard on me.

Some suggested above to change the definition of len in the long term. I think it could be interesting to define len such that:

- If the object has a finite length: return that length (the way it works now)
- If it has an infinite length: return infinity
- If it has no length: return None

There's an issue with this solution: having None returned adds complexity to the usage of len, so I suggest having a wrapper over __len__ methods that raises the current error. But still, there's a problem with infinite length objects. If people code:

    for i in range(len(infinite_list)):
        # Something

it's not clear if people actually want to do this. It's open to discussion and it is just a suggestion.

If we now consider map, then the length of map (or filter or any other generator based on an iterator) is the same as the iterator itself, which could be either infinite or not defined.

Cheers
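The three-way len sketched above can be prototyped as a helper rather than by changing the builtin. Everything here is illustrative: `extended_len` and the `__extended_len__` hook are hypothetical names, and real objects have no such hook:

```python
import math

def extended_len(obj):
    # Three-way length: a finite int, math.inf for known-infinite
    # objects (signalled via a hypothetical __extended_len__ hook),
    # or None for "no length / unknown".
    hook = getattr(obj, '__extended_len__', None)
    if hook is not None:
        return hook()
    try:
        return len(obj)
    except TypeError:
        return None

class Naturals:
    # toy known-infinite iterable
    def __iter__(self):
        n = 0
        while True:
            yield n
            n += 1

    def __extended_len__(self):
        return math.inf

print(extended_len([1, 2, 3]))     # 3
print(extended_len(iter([1, 2])))  # None
print(extended_len(Naturals()))    # inf
```

Note this sidesteps the halting-problem objection raised below only for objects that explicitly declare themselves infinite; arbitrary iterators still come back as None.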

On Thu, Nov 29, 2018 at 2:29 AM Adrien Ricocotam <ricocotam@gmail.com> wrote:
Do you anticipate that the `len()` function will be able to solve the Halting Problem? It is simply not possible to know whether a given iterator will produce finitely many or infinitely many elements. Even those that will produce finitely many do not, in general, have a knowable length without running them until exhaustion. Here's a trivial example:
Here's a slightly less trivial one:

    In [1]: from itertools import count

    In [2]: def mandelbrot(z):
       ...:     "Yield each value until escape iteration"
       ...:     c = z
       ...:     for n in count():
       ...:         if abs(z) > 2:
       ...:             return n
       ...:         yield z
       ...:         z = z*z + c

What should len(mandelbrot(my_complex_number)) be? Hint: depending on the complex number chosen, it might be any natural number (or it might not terminate).

--
Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.

[David Mertz]
Do you anticipate that the `len()` function will be able to solve the Halting Problem?
It is simply not possible to know whether a given iterator will produce
You don't have to solve the halting problem. You simply ask the object. The default behavior would be "I don't know", whether that's communicated by returning None or some other sentinel value (NaN?) or by raising a special exception. Then you simply override the default behavior for cases where the object does or at least might know.

itertools.repeat, for example, would have an infinite length unless "times" is provided; then its length would be the value of "times". map would return the length of the shortest iterable unless there is an unknown-sized iterable, in which case len would be unknown; if all iterables are infinite, the length would be infinite.

We could add a decorator for length and/or length hints on generator functions:

    @length(lambda times: times or float("+inf"))
    def repeat(obj, times=None):
        if times is None:
            while True:
                yield obj
        else:
            for i in range(times):
                yield obj

On Thu, Nov 29, 2018 at 10:40 AM David Mertz <mertz@gnosis.cx> wrote:
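The decorator idea can be prototyped today without touching the builtins. This sketch (all names hypothetical) wraps each generator in a proxy that exposes `__length_hint__`, using 0 rather than `float("+inf")` for the unbounded case because `__length_hint__` must return an int >= 0:

```python
from functools import wraps
from operator import length_hint

def length(hint_from_args):
    # Hypothetical decorator: compute a length hint from the same
    # arguments the generator function was called with.
    class _Hinted:
        def __init__(self, gen, hint):
            self._gen = gen
            self._hint = hint
        def __iter__(self):
            return self
        def __next__(self):
            return next(self._gen)
        def __length_hint__(self):
            return self._hint

    def decorate(genfunc):
        @wraps(genfunc)
        def wrapper(*args, **kwargs):
            return _Hinted(genfunc(*args, **kwargs),
                           hint_from_args(*args, **kwargs))
        return wrapper
    return decorate

@length(lambda obj, times=None: 0 if times is None else times)
def repeat(obj, times=None):
    if times is None:
        while True:
            yield obj
    else:
        for _ in range(times):
            yield obj

print(length_hint(repeat('x', times=3)))  # 3
print(list(repeat('x', times=3)))         # ['x', 'x', 'x']
```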

On Wed, Nov 28, 2018 at 11:04 PM Steven D'Aprano <steve@pearwood.info> wrote:
Not to go too off-topic but I don't think this is a great example either. Although as a practical consideration I agree Python shouldn't preserve the base representation from which an integer were created I often *wish* it would. It's useful information to have. There's nothing I hate more than doing hex arithmetic in Python and having it print out decimal results, then having to wrap everything in hex(...) before displaying. Base representation is still meaningful, often useful information.

E. Madison Bray wrote:
But it will only help if the user passes a map object in particular, and not some other kind of iterator. Also it won't help if the inputs to the map are themselves iterators that aren't amenable to inspection. This smells like exposing an implementation detail of your function in its API. I don't see how it would help with your Sage port either, since the original code only got the result of the mapping and wouldn't have been able to inspect the underlying iterables. I wonder whether it's too late to redefine map() so that it returns a view object instead of an iterator, as was done when merging dict.{items, iter_items} etc. Alternatively, add a mapped() bultin that returns a view. -- Greg

On Wed, Nov 28, 2018 at 03:27:25PM +0100, E. Madison Bray wrote:
*scratches head* I presume that SageMath worked fine with Python 2 map and filter? You can have them back again:

    # put this in a module called py2
    _map = map

    def map(*args):
        return list(_map(*args))

And similarly for filter.

The only annoying part is to import this new map at the start of every module that needs it, but while that's annoying, I wouldn't call it a "ridiculously outsized impact". It's one line at the top of each module:

    from py2 import map, filter

What am I missing?
Can you give a concrete example of what you would do in practice? I'm having trouble thinking of how and when this sort of thing would be useful. Aside from extracting the length of the iterable(s), under what circumstances would you want to bypass the call to map() or filter() and access the iterables directly?
I don't think that's worse than what we can already do if you hold onto a reference to the underlying iterable:

    py> a = [1, 2, 3]
    py> it = map(lambda x: x+100, a)
    py> next(it)
    101
    py> a.insert(0, None)
    py> next(it)
    101

-- Steve

On Wed, Nov 28, 2018 at 4:04 PM Steven D'Aprano <steve@pearwood.info> wrote:
For example, some function that used to expect some finite-sized sequence such as a list or tuple is now passed a "map", possibly wrapping one or more iterables of arbitrary, possibly non-finite size. For the purposes of some algorithm I have this is not useful and I need to convert it to a sequence anyway, but I don't want to do that without some guarantee that I won't blow up the user's memory usage. So I might want to check:

    finite_definite = True
    for it in my_map.iters:
        try:
            len(it)
        except TypeError:
            finite_definite = False

    if finite_definite:
        my_seq = list(my_map)
    else:
        # some other algorithm

Of course, some arbitrary object could lie about its __len__ but I'm not concerned about pathological cases here. There may be other opportunities for optimization as well that are otherwise hidden. Either way, I don't see any reason to hide this data; it's a couple of slot attributes and instantly better introspection capability.

On Wed, Nov 28, 2018 at 04:14:24PM +0100, E. Madison Bray wrote:
But surely you didn't need to do this just because of *map*. Users could have passed an infinite, unsized iterable going back to Python 1 days with the old sequence protocol. They certainly could pass a generator or other opaque iterator apart from map. So I'm having trouble seeing why the Python 2/3 change to map made things worse for SageMath.

But in any case, this example comes back to the question of len again, and we've already covered why this is problematic. In case you missed it, let's take a toy example which demonstrates the problem:

    def mean(it):
        if isinstance(it, map):
            # Hypothetical attribute access to the underlying iterable.
            n = len(it.iterable)
            return sum(it)/n

Now let's pass a map object to it:

    data = [1, 2, 3, 4, 5]
    it = map(lambda x: x, data)
    assert len(it.iterable) == 5
    next(it); next(it); next(it)
    assert mean(it) == 4.5  # fails, as it actually returns 9/5 instead of 9/2

-- Steve

Suppose itr_1 is an iterator. Now consider

    itr_2 = map(lambda x: x, itr_1)
    itr_3 = itr_1

We now have itr_1, itr_2 and itr_3. They are all, effectively, the same iterator (unless we do an 'x is y' comparison). I conclude that this suggestion amounts to having a __len__ for ANY iterator, and not just a map. In other words, this suggestion has broader scope and consequences than were presented in the original post.

-- Jonathan

Probably the biggest reason it made things *worse* is that many functions that can take collections as arguments--in fact probably most--were never written to accept arbitrary iterables in the first place. Perhaps they should have been, but the majority of that was before my time, so I and others who worked on the Python 3 port were stuck with that.

Sure, the fix is simple enough: check if the object is iterable (itself not always as simple as one might assume) and then call list() on it. But we're talking thousands upon thousands of functions that need to be updated, where examples involving map previously would have just worked.

But on top of the obvious workarounds I would now like to do things like protect users, where possible, from doing things like passing arbitrarily sized data to relatively flimsy C libraries, or, as I mentioned in my last message, make new optimizations that weren't possible before. Of course this isn't always possible when dealing with an arbitrary opaque iterator, or some pathological cases. But I'm concerned more about doing the best we can in the most common cases (lists, tuples, vectors, etc.) which are *vastly* more common.

I use SageMath as an example but I'm sure others could come up with their own clever use cases. I know there are other cases where I've wanted to at least try to get the len of a map, at least in cases where it was unambiguous (for example making a progress meter or something).

On Wed, Nov 28, 2018, 16:33 Steven D'Aprano <steve@pearwood.info> wrote:
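The "check if it's iterable, then call list() on it" workaround described above can be captured in one small helper (`ensure_sequence` is a hypothetical name; note it will still hang forever on an infinite iterator, which is precisely the risk under discussion):

```python
from collections.abc import Iterable, Sequence

def ensure_sequence(obj):
    # Pass real sequences through untouched; materialize any other
    # iterable (map, filter, generators, ...) into a list.
    if isinstance(obj, Sequence):
        return obj
    if isinstance(obj, Iterable):
        return list(obj)
    raise TypeError(f"expected an iterable, got {type(obj).__name__}")

nums = [1, 2, 3]
print(ensure_sequence(nums) is nums)    # True: no copy made
print(ensure_sequence(map(str, nums)))  # ['1', '2', '3']
```

Applying something like this at thousands of call sites is exactly the porting burden being described; the helper only centralizes it.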

I should add, I know the history here of bitterness surrounding Python 3 complaints, and this is not one. I defend most things Python 3 and have ported many projects (Sage just being the largest by orders of magnitude, with every Python 3 porting quirk represented and often magnified). I agree with the new iterable map(), filter(), and zip() and welcomed that change. But I think making them more introspectable would be a useful enhancement.

On Wed, Nov 28, 2018, 17:16 E. Madison Bray <erik.m.bray@gmail.com> wrote:

Hi Madison

Is there a URL somewhere where I can view code written to port Sage to Python 3? I've already found https://trac.sagemath.org/search?q=python3 And because I'm a bit interested in cluster algebra, I went to https://git.sagemath.org/sage.git/commit/?id=3a6f494ac1d4dbc1e22b0ecbebdbc63...

Is this a good example of the change required? Are there other examples worth looking at?

-- Jonathan

On Wed, Nov 28, 2018 at 11:59 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
You either missed, or completely ignored, my previous message where I addressed this:

"For example, previously a user might pass map(func, some_list) where func is some pure function and the iterable is almost always a list of some kind. Previously that map() call would be evaluated (often slowly) first. But now we can treat a map as something a little more formal: as a container for a function and one or more iterables, which happens to have this special functionality when you iterate over it, but is otherwise just a special container. This is technically already the case, we just can't directly access it as a container. If we could, it would be possible to implement various optimizations that might not otherwise have been obvious to the user. This is especially the case if the iterable is a simple list, which is something we can check. The function in this case very likely might actually be a C function that was wrapped with Cython. I can easily convert this on the user's behalf to a simple C loop or possibly even some other more optimal vectorized code. These are application-specific special cases of course, but many such cases become easily accessible if map() and friends are usable as specialized containers."

On 11/28/2018 9:27 AM, E. Madison Bray wrote:
One of the guidelines in the Zen of Python is "Special cases aren't special enough to break the rules."

This proposal claims that the Python 3 built-in iterator class 'map' is so special that it should break the rule that iterators in general cannot and therefore do not have .__len__ methods, because their size may be infinite, unknowable until exhaustion, or declining with each .__next__ call.

For iterators, 3.4 added an optional __length_hint__ method. This makes sense for iterators, like tuple_iterator, list_iterator, range_iterator, and dict_keyiterator, based on a known finite collection. At the time, map.__length_hint__ was proposed and rejected as problematic, for obvious reasons, and insufficiently useful.

The proposal above amounts to adding an unspecified __length_hint__ misnamed as __len__. Won't happen. Instead, proponents should define and test one or more specific implementations of __length_hint__ in map subclass(es).
What makes the map class special among all built-in iterator classes? It appears not to be a property of the class itself, as an iterator class, but of its name. In Python 2, 'map' was bound to a different implementation of the map idea, a function that produced a list, which has a length. I suspect that if Python 3 were the original Python, we would not have this discussion.
This proposes to make map (and filter) special in a different way, by adding other special (dunder) attributes. In general, built-in callables do not attach their args to their output, for obvious reasons. If they do, they do not expose them. If input data must be saved, the details are implementation dependent. A C-coded callable would not necessarily save information in the form of Python objects. Again, it seems to me that the only thing special about these two, versus the other iterators left in itertools, is the history of the names.
If a function is documented as requiring a list, or a sequence, or a sized object, it is a user bug to pass an iterator. The only thing special about map and filter as errors is the rebinding of the names between Py2 and Py3, so that the same code may be good in 2.x and bad in 3.x.

Perhaps 2.7, in addition to future imports of text as unicode and print as a function, should have had one to make map and filter be the 3.x iterators. Perhaps Sage needs something like

    def size_map(func, *iterables):
        for it in iterables:
            if not hasattr(it, '__len__'):
                raise TypeError(f'iterable {repr(it)} has no size')
        return map(func, *iterables)

https://docs.python.org/3/library/functions.html#map says "map(function, iterable, ...) Return an iterator [...]" The wording is intentional. The fact that map is a class and the iterator an instance of the class is a CPython implementation detail. Another implementation could use the generator function equivalent given in the Python 2 itertools doc, or a translation thereof. I don't know what pypy and other implementations do. The fact that CPython itertools callables are (now) C-coded classes instead of Python-coded generator functions, or C translations thereof (which is tricky), is for performance and ease of maintenance.

-- Terry Jan Reedy

On Wed, Nov 28, 2018 at 02:53:50PM -0500, Terry Reedy wrote:
Thanks for the background Terry, but doesn't that suggest that sometimes special cases ARE special enough to break the rules? *wink* Unfortunately, I don't think it is obvious why map.__length_hint__ is problematic. It only needs to return the *maximum* length, or some sentinel (zero?) to say "I don't know". It doesn't need to be accurate, unlike __len__ itself. Perhaps we should rethink the decision not to give map() and filter() a length hint? [...]
No, in fairness, I too have often wanted to know the length of an arbitrary iterator, including map(), without consuming it. In general this is an unsolvable problem, but sometimes it is (or at least, at first glance *seems*) solvable. map() is one of those cases. If we could solve it, that would be great -- but I'm not convinced that it is solvable, since the solution seems worse than the problem it aims to solve. But I live in hope that somebody cleverer than me can point out the flaws in my argument. [...]
I think that's future_builtins:

    [steve@ando ~]$ python2.7 -c "from future_builtins import *; print map(len, [])"
    <itertools.imap object at 0xb7ed39ec>

But that wouldn't have helped E. Madison Bray or SageMath, since their difficulty is not their own internal use of map(), but their users' use of map(). Unless they simply ban any use of iterators at all, which I imagine will be a backwards-incompatible change (and for that matter an excessive overreaction for many uses), SageMath can't prevent users from providing map() objects or other iterator arguments.

-- Steve

On 11/28/2018 5:27 PM, Steven D'Aprano wrote:
Yes, but these cases is not special enough to break the rules for len and __len__, especially when an alternative already exists.
Unfortunately, I don't think it is obvious why map.__length_hint__ is problematic.
It is less obvious (there are more details to fill in) than the (exact) length_hints for the list, tuple, range, and dict iterators. These are *always* based on a sized collection. Map is *sometimes* based on sized collection(s). It is the other cases that are problematic, as illustrated by your next sentence.
Perhaps we should rethink the decision not to give map() and filter() a length hint?
I should have said this more explicitly. This is why I suggested that someone define and test one or more specific map.__length_hint__ implementations. Someone doing so should look into the C code for list to see how list handles iterators with a length hint. I suspect that low estimates are better than high estimates. Does list recognize any value as "I don't know"?
The current situation with length_hint reminds me a bit of the situation with annotations before the addition of typing. Perhaps it is time to think about conventions for the non-obvious 'other cases'.
Thanks for the info.
In particular, by people who are not vividly aware that we broke the back-compatibility rule by rebinding 'map' and 'filter' in 3.0. Breaking back-compatibility *again* by redefining len (to mean something like operator.length) is not the right solution to problems caused by the 3.0 break.
I think their special case problem requires some special case solutions. At this point, I am refraining from making suggestions. -- Terry Jan Reedy

On Wed, Nov 28, 2018 at 11:27 PM Steven D'Aprano <steve@pearwood.info> wrote:
In general it's unsolvable, so no attempt should be made to provide a pre-baked attempt at a solution that won't always work. But in many, if not the majority of cases, it *is* solvable. So let's give intelligent people the tools they need to solve it in those cases that they know they can solve it :)
That is the majority of the case I was concerned about, yes.

On Thu, Nov 29, 2018 at 12:16:37PM +0100, E. Madison Bray wrote:
On Wed, Nov 28, 2018 at 11:27 PM Steven D'Aprano <steve@pearwood.info> wrote:
["it" below being the length of an arbitrary iterator]
So you say, but the solutions made so far seem fatally flawed to me. Just repeating the assertion that it is solvable isn't very convincing. -- Steve

On Thu, Nov 29, 2018 at 1:38 PM Steven D'Aprano <steve@pearwood.info> wrote:
Okay, let's keep it simple:

m = map(str, [1, 2, 3])
len_of_m = None
if len(m.iters) == 1 and isinstance(m.iters[0], Sized):
    len_of_m = len(m.iters[0])

You can give me pathological cases where that isn't true, but you can't say there's no context in which that wouldn't be virtually guaranteed and consenting adults can decide whether or not that's a safe-enough assumption in their own code.

On Thu, Nov 29, 2018 at 02:16:48PM +0100, E. Madison Bray wrote:
Yes I can, and they aren't pathological cases. They are ordinary cases working the way iterators are designed to work. All you get is a map object. You have no way of knowing how many times the iterator has been advanced by calling next(). Consequently, there is no guarantee that len(m.iters[0]) == len(list(m)) except by the merest accident that the map object hasn't had next() called on it yet. *This is not pathological behaviour*. This is how iterators are designed to work. The ability to partially advance an iterator, pause, then pass it on to another function to be completed is a huge benefit of the iterator protocol. I've written code like this on more than one occasion:

# toy example
for x in it:
    process(x)
    if condition(x):
        for y in it:
            do_something_else(y)
        # Strictly speaking, this isn't needed, since "it" is consumed.
        break

If I pass the partially consumed map iterator to your function, it will use the wrong length and give me back inaccurate results. (Assuming it actually uses the length as part of the calculated result.) You might say that your users are not so advanced, or that they're naive enough not to even know they could do that, but that's a pretty unsafe assumption as well as being rather insulting to your own users, some of whom are surely advanced Python coders not just naive dabblers. Even if only one in a hundred users knows that they can partially iterate over the map, and only one in a hundred of those actually do so, you're still making an unsafe assumption that will return inaccurate results based on an invalid value of len_of_m.
and consenting adults can decide whether or not that's a safe-enough assumption in their own code.
Which consenting adults? How am I, wearing the hat of a Sage user, supposed to know which of the hundreds of Sage functions make this "safe-enough" assumption and return inaccurate results as a consequence? -- Steve
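The hazard Steven describes is easy to reproduce with a plain list iterator: the underlying sequence keeps its full length no matter how far the iterator has advanced, so any length computed from it is stale.

```python
data = ['a', 'b', 'c', 'd', 'e']
it = iter(data)
next(it)
next(it)                      # partially consume the iterator

assert len(data) == 5         # the underlying length is unchanged...
remaining = list(it)
assert len(remaining) == 3    # ...but only three items actually remain
```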

On Thu, Nov 29, 2018 at 3:43 PM Steven D'Aprano <steve@pearwood.info> wrote:
That's a fair point and probably the killer flaw in this proposal (or any involving getting the lengths of iterators). I still think it would be useful to be able to introspect map objects, but this does throw some doubt on the overall reliability of this. I'd say that in most cases it would still work, but you're right it's harder to guarantee in this context. One obvious workaround would be to attach a flag indicating whether or not __next__ has been called (or as long as you have such a flag, why not a counter for the number of times __next__ has been called)? That would effectively solve the problem, but I admit it's a taller order in terms of adding API surface.
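A minimal sketch of the counter idea (the class name and attributes are hypothetical, not an existing API): the wrapper counts successful calls to __next__, so a consumer holding a sized input can at least compute the remaining length, though it still cannot detect the underlying iterable being advanced behind its back.

```python
class CountingMap:
    """Hypothetical map-like wrapper that records how many items it has produced."""
    def __init__(self, func, iterable):
        self.func = func
        self.iterable = iterable
        self.consumed = 0          # number of times __next__ has succeeded
        self._it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        value = self.func(next(self._it))
        self.consumed += 1
        return value

m = CountingMap(str, [1, 2, 3])
next(m)
assert m.consumed == 1
# With a Sized input, the remaining length can be recovered:
assert len(m.iterable) - m.consumed == 2
```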

On Thu, Nov 29, 2018 at 2:44 PM Steven D'Aprano <steve@pearwood.info> wrote:
I think that what above all unites Sage users is knowledge of mathematics. Use of Python would be secondary. The goal surely is to discover and develop conventions and interface that work for such a group of users. In this area the original poster is probably the expert, and I think should be respected as such. Steve's post divides Sage users into "advanced Python coders" and "naive dabblers". This misses the point, which is to get something that works well for all users. This, I'd say, is one of the features of Python's success. Most Python users are people who want to get something done. By the way, I'd expect that most Sage users fall into the middle range of Python expertise. I think that to focus on the extremes is both unhelpful and divisive. -- Jonathan

On Thu, Nov 29, 2018 at 7:16 PM Jonathan Fine <jfine2358@gmail.com> wrote:
Yes, thank you. They are all very smart people--most of them much more so than I. The vast majority are mathematicians first, and software developers second, third, fourth, or even further down the line. Some of the most prolific contributors to Sage barely know how to use git without some wrappers we've provided around it (not that they couldn't learn, but let's be honest git is a terrible tool for anyone who isn't Linus Torvalds). They still write good code and sometimes brilliant algorithms. But they're not all Python experts. Many of them are also students who are only using Python because Sage uses it, and not using Sage because it uses Python. The Sagebook [1] may be their first introduction to Python, and even then it only introduces Python programming in dribs and drabs as needed for the topics at hand (e.g. variables, loops, functions). I'm trying to consider users at all levels. [1] http://dl.lateralis.org/public/sagebook/sagebook-ba6596d.pdf

On 11/29/2018 8:16 AM, E. Madison Bray wrote:
As I have noted before, the existing sized collection __length_hint__ methods (properly) return the remaining items = len(underlying_iterable) - items_already_produced. This is fairly easy at the C level. The following seems to work in Python.

class map1:
    def __init__(self, func, sized):
        if isinstance(sized, (list, tuple, range, dict)):
            self._iter = iter(sized)
            self._gen = (func(x) for x in self._iter)
        else:
            raise TypeError(f'{sized} not one of list, tuple, range, dict')
    def __iter__(self):
        return self
    def __next__(self):
        return next(self._gen)
    def __length_hint__(self):
        return self._iter.__length_hint__()

m = map1(int, [1.0, 2.0, 3.0])
print(m.__length_hint__())
print('first item', next(m))
print(m.__length_hint__())
print('remainder', list(m))
print(m.__length_hint__())

# prints, as expected and desired
3
first item 1
2
remainder [2, 3]
0

A package could include a version of this, possibly compiled, for use when applicable. -- Terry Jan Reedy

On Wed, Nov 28, 2018 at 8:54 PM Terry Reedy <tjreedy@udel.edu> wrote:
This seems to be replying to the OP, whom I was quoting. On one hand I would argue that this is cherry-picking the "Zen" since not all rules are special in the first place. But in this case I agree that map should not have a length or possibly even a length hint (although the latter is more justifiable).
Who said anything about "special", or adding "special (dunder) attributes"? Nor did I make any general statement about all built-ins. For arbitrary functions it doesn't necessarily make sense to hold on to their arguments, but in the case of something like map() its arguments are the only thing that give it meaning at all: The fact remains that for something like a map in particular it can be treated in a formal sense as a collection of a function and some sequence of arguments (possibly unbounded) on which that function is to be evaluated (perhaps not immediately). As an analogy, a series is an object in its own right without having to evaluate the entire series: lots of information can be gleaned from the properties of a series without having to evaluate it. Just because you don't see the use doesn't mean others can't find one. The CPython map() implementation already carries this data on it as "func" and "iters" members in its struct. It's trivial to expose those to Python as ".func" and ".iters" attributes. Nothing "special" about it. However, that brings me to...
Exactly how intentional is that wording though? If it returns an iterator it has to return *some object* that implements iteration in the manner prescribed by map. Generator functions could theoretically allow attributes attached to them. Roughly speaking:

def map(func, *iters):
    def map_inner():
        for args in zip(*iters):
            yield func(*args)
    gen = map_inner()
    gen.func = func
    gen.iters = iters
    return gen

As it happens this won't work in CPython since it does not allow attribute assignment on generator objects. Perhaps there's some good reason for that, but AFAICT--though I may be missing a PEP or something--this fact is not prescribed anywhere and is also particular to CPython. Point being, I don't think it's a massive leap or imposition on any implementation to go from "Return an iterator [...]" to "Return an iterator that has these attributes [...]" P.S.
It's not a user bug if you're porting a massive computer algebra application that happens to use Python as its implementation language (rather than inventing one from scratch) and your users don't need or want to know too much about Python 2 vs Python 3. Besides, the fact that they are passing an iterator now is probably in many cases a good thing for them, but it takes away my ability as a developer to find out more about what they're trying to do, as opposed to say just being given a list of finite size. That said, I regret bringing up Sage; I was using it as an example but I think the point stands on its own.
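The CPython restriction mentioned above is easy to confirm: generator objects reject attribute assignment, while a trivial wrapper class can expose func and iters just fine (a sketch, with hypothetical names):

```python
def gen():
    yield 1

g = gen()
failed = False
try:
    g.func = abs               # CPython generators reject attribute assignment
except AttributeError:
    failed = True
assert failed

class MapWrapper:
    """Hypothetical wrapper: an ordinary iterator that exposes its arguments."""
    def __init__(self, func, *iters):
        self.func = func
        self.iters = iters
        self._gen = (func(*args) for args in zip(*iters))

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._gen)

m = MapWrapper(str, [1, 2])
assert m.func is str and m.iters == ([1, 2],)
assert list(m) == ['1', '2']
```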

On Thu, Nov 29, 2018 at 10:18 PM E. Madison Bray <erik.m.bray@gmail.com> wrote:
Either this is Python, or it's just an algebra language that happens to be implemented in Python. If the former, the Py2/Py3 distinction should matter to your users, since they are programming in Python. If the latter, it's all about Sage, ergo you can rebind map to mean what you expect it to mean. Take your pick. ChrisA

On Thu, Nov 29, 2018 at 12:21 PM Chris Angelico <rosuav@gmail.com> wrote:
Why not both? Sage is a superset of Python, and while on some level (in terms of advanced programming constructs) users will need to care about the distinction, most users don't really know exactly what it does when they pass something like map(a_func, a_list) as an argument to a function call. They don't necessarily appreciate the distinction that, depending on how that function is implemented, an arbitrary iterable has to be treated very differently than a list. I certainly don't mind supporting arbitrary iterables--I think they should be supported. But now there are optimizations I can't make that I could have made before when map() just returned a list. In most cases I didn't have to make these optimizations manually because the code is written in Cython. It's true that when a user called map() previously some opportunities for optimization were already lost, but now it's even worse because I have to treat a simple map of a list on par with the necessarily slower arbitrary iterator case, when technically speaking there is no reason that has to be the case. Cython could even handle that case automatically as well by turning a map(<some_C_function_wrapped_by_cython>, <a_list>) into something like:

list = map.iters[0];
for (idx = 0; idx < PyList_Length(list); idx++) {
    wrapped_c_function(PyList_GET_ITEM(list, idx));
}
If the latter, it's all about Sage, ergo you can rebind map to mean what you expect it to mean. Take your pick.
I'm still not sure what makes you think one can just blithely replace a builtin with something that doesn't work how all other Python libraries expect that builtin to work. At best I could subclass map() and add this functionality, but now you're adding at least three pointers to every map() that are not necessary since the information is already there in the C struct. For most cases this isn't too bad in terms of overhead, but consider cases (which I've seen plenty of) like:

list_of_lists = [map(int, x) for x in list_of_lists]

Now the user who previously expected to have a list of lists has a list of maps. It's already bad enough that each map holds a pointer to a function, but I wouldn't want to make that worse. Anyways, I'd love to get off the topic of Sage and just ask why you would object to useful introspection capabilities? I don't even care if it were CPython-specific.
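For what it's worth, the subclassing route does work mechanically; a hypothetical sketch (the name xmap and its attributes are assumptions, not an existing API), which also shows where the extra per-instance pointers come from:

```python
class xmap(map):
    """Hypothetical map subclass that remembers its arguments."""
    def __new__(cls, func, *iterables):
        self = super().__new__(cls, func, *iterables)
        # These duplicate information already held in the C struct,
        # which is exactly the overhead objected to above.
        self.func = func
        self.iters = iterables
        return self

m = xmap(int, ['1', '2', '3'])
assert isinstance(m, map)            # passes existing isinstance checks
assert m.iters == (['1', '2', '3'],)
assert list(m) == [1, 2, 3]
```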

On Thu, Nov 29, 2018 at 10:21:15PM +1100, Chris Angelico wrote:
False dichotomy. Sage is *all* of these things:

- a stand-alone application which is (partially) written in Python;
- an application which runs under IPython/Jupyter;
- a package which has to interoperate with other Python packages;
- an algebra language.
If the former, the Py2/Py3 distinction should matter to your users, since they are programming in Python.
Even if they know, and care, about the difference between iterators and lists, they cannot be expected to know or care about how the hundreds of Sage functions process lists differently from iterators. Which would be implementation details of the Sage functions, and subject to change without warning. I sympathise with this proposal. In my own tiny little way, I've had to grapple with something similar for the stdlib statistics library, and I'm not totally happy with the work-around I came up with. And I have a few ideas for the future which will either render the difference moot, or make the problem worse, I'm not sure which :-)
Sage wraps a number of Python libraries, such as numpy, sympy and others, and itself can run under IPython, which for all we know may already have monkeypatched the builtins for its own ~~nefarious~~ useful purposes. Are you really comfortable with monkeypatching the builtins in this way in such a complex ecosystem of packages? Maybe it will work, but I think you're being awfully gung-ho about the suggestion. (At least my earlier suggestion didn't involve monkey-patching the builtin map, merely shadowing it.) Personally, even if monkeypatching in this way solved the problem, as a (potential) user of SageMath I'd be really, really peeved if it patched map() in the way you suggest and regressed map() to the 2.x version. -- Steve

On Fri, Nov 30, 2018 at 12:18 AM Steven D'Aprano <steve@pearwood.info> wrote:
To be quite honest, no, I am not comfortable with it. But I *am* comfortable with expecting Python programmers to program in Python, and thus deeming that breakage as a result of user code being migrated from Py2 to Py3 is to be fixed by the user. You can mess around with map(), but there are plenty of other things you can't mess with, so I don't see why this one thing should be Sage's problem. ChrisA

On Thu, Nov 29, 2018 at 2:22 PM Chris Angelico <rosuav@gmail.com> wrote:
The users--often scientists--of SageMath and many other scientific Python packages* are not "Python programmers" as such**. My job as a software engineer is to make the lower-level libraries they use for their day-to-day research work _just work_, and in particular _optimize_ that lower-level code in as many ways as I can find to. In some cases we do have to tell them about Python 2 vs Python 3 things (especially w.r.t. print()) but most of the time it is relatively transparent, as it should be. Steven has the right idea about it. Not every detail can be made perfectly transparent in terms of how users use or misuse them, no. But there are lots of areas where they should absolutely not have to care (e.g. like Steven wrote they cannot be expected to know how every single function might treat an iterator like map() over a finite sequence distinctly from the original finite sequence itself). In the case of map(), although maybe I have not articulated it well, I can say for sure that I've had perfectly valid use cases that were stymied merely by a semi-arbitrary decision to hide the data the wrapped by the "iterator returned by map()" (if you want to be pedantic about it). I'm willing to accept some explanation for why that would be actively harmful, but as someone with concrete problems to solve I'm less convinced by appeals to abstracts, or "why not just X" as if I hadn't considered "X" and found it flawed (which is not to say that I don't mind any new idea being put thoroughly through its paces.) * (Pandas, SymPy, Astropy, and even lower-level packages like NumPy, not to mention Jupyter which implements kernels for dozens of languages, but is primarily implemented in Python) ** With an obligatory asterisk to counter a common refrain from those who experience impostor syndrome, that if you are using this software then yes you are in fact a Python programmer, you just haven't realized it yet ;)

On Thu, Nov 29, 2018 at 2:05 PM E. Madison Bray <erik.m.bray@gmail.com> wrote:
Well said. Unlike many people on this list, programming Python is not their top skill. For example, Paul Romer, the 2018 Nobel Memorial Laureate in Economics. His strength is economics. Python is one of the many tools he uses. But it's not his top skill (smile). https://developers.slashdot.org/story/18/10/09/0042240/economics-nobel-laure... In some sense, I think, what Madison wants is an internal domain-specific language (IDSL) that works well for Sage users. Just as Django is an IDSL that works well for many web developers. See, for example https://martinfowler.com/books/dsl.html for the general idea. We might not agree on the specifics. But that's perhaps mostly a matter for the domain experts, such as Madison and Sage users. -- Jonathan

On 11/29/2018 6:13 AM, E. Madison Bray wrote:
On Wed, Nov 28, 2018 at 8:54 PM Terry Reedy <tjreedy@udel.edu> wrote:
I will come back to this when you do.
The use of 'iterator' is exactly intended, and the iterator protocol is *intentionally minimal*, with one iterator specific __next__ method and one boilerplate __iter__ method returning self. This is more minimal than some might like. An argument against the addition of length_hint and __length_hint__ was that it might be seen as extending at least the 'expected' iterator protocol. The docs were written to avoid this.
Instances of C-coded classes generally cannot be augmented. But set this issue aside.
Do you propose exposing the inner struct members of *all* C-coded iterators? (And would you propose that all Python-coded iterators should use public names for the equivalents?) Some subset thereof? (What choice rule?) Or only for map? If the latter, why do you consider map so special?
In both 2 and 3, the function has to deal with iterator inputs one way or another. In both 2 and 3, possible iterator inputs include maps passed as generator comprehensions, '(<expression with x> for x in iterable)'.
As a former 'scientist who programs' I can understand the desire for ignorance of such details. As a Python core developer, I would say that if you want Sage to allow and cater to such ignorance, you have to either make Sage a '2 and 3' environment, without burdening Python 3, or make future Sage a strictly Python 3 environment (as many scientific stack packages are doing or planning to do). ...
That said, I regret bringing up Sage; I was using it as an example but I think the point stands on its own.
Yes, the issues of hiding versus exposing implementation details, and that of saving versus deleting and, when needed, recreating 'redundant' information, are independent of Sage and 2 versus 3. -- Terry Jan Reedy

On Thu, Nov 29, 2018 at 9:36 PM Terry Reedy <tjreedy@udel.edu> wrote:
You still seem to be confusing my point. I'm not advocating even for __length_hint__ (I think there are times that would be useful but it's still pretty problematic). I admit one thing I'm a little stuck on though is that map() currently just immediately calls iter() on its arguments to get their iterators, and does not store references to the original iterables. It would be nice if more iterators could have an exposed reference to the objects they're iterating, in cases where that's even meaningful. For some reason I thought, for example, that a list_iterator could give me a reference back to the list itself. This was probably omitted intentionally but it still feels pretty limiting :(
Not necessarily, no. But certainly a few: I'm using map() as an example but at the very least map() and filter(). An exact choice rule is something worth thinking about but I don't think you're going to find an "objective" rule. I think it goes without saying that map() is special in a way: It's one of the most basic extensions to function application and is a fundamental construct in functional programming and from a category-theoretical perspective. I'm not saying Python's built-in map() needs to represent anything mathematically formal, but it's certainly quite fundamental, which is why it's a built-in in the first place.
Yes, but those are still less common, and generator expressions were not even around when Sage was first started: I've been around long enough to remember when they were added to the language, and were well predated by map() and filter(). The Sagebook [1] introduces them around page 60. I'm not sure if it even introduces generator expressions at all. I think a lot of Python and C++ experts don't realize that the "iterator" concept is not at all immediately obvious to a lot of non-programmers. Most iterator inputs supplied by users are things like sized collections for which it's easy to think about "going over them one by one" and not more abstract iterators. This is true whether the user is a Python expert or not.
"ignorance" is not a word I would use here, frankly.
I agree there, that this is not really an argument about Sage or Python 2/3. Though I don't think this is an "implementation detail". In an abstract sense a map is a special container for a function and a sequence that has special semantics. As far as I'm concerned this is what it *is* in some ontological sense, and this fact is not a mere implementation detail. [1] http://dl.lateralis.org/public/sagebook/sagebook-ba6596d.pdf

On Fri, Nov 30, 2018 at 10:32:31AM +0100, E. Madison Bray wrote:
It's a built-in in the first place because back in Python 0.9 or 1.0 or thereabouts, a fan of Lisp added it to the builtins (together with filter and reduce) and nobody objected (possibly because they didn't notice) at the time. It was much easier to add things to the language back then. During the transition to Python 3, Guido wanted to remove all three (as well as lambda): https://www.artima.com/weblogs/viewpost.jsp?thread=98196 Although map, filter and lambda have stayed, reduce has been relegated to the functools module. -- Steve

E. Madison Bray wrote:
This sounds like a backwards way to address the issue. If you have a function that expects a list in particular, it's up to its callers to make sure they give it one. Instead of making the function do a bunch of looking before it leaps, it would be better to define something like

def lmap(f, *args):
    return list(map(f, *args))

and then replace 'map' with 'lmap' elsewhere in your code. -- Greg

On Mon, Nov 26, 2018 at 10:35 PM Kale Kundert <kale@thekunderts.net> wrote:
Excellent proposal, followed by a flood of confused replies, which I will mostly disregard, since all miss the obvious. What's being proposed is simple, either:

* len(map(f, x)) == len(x), or
* both raise TypeError

That implies, loosely speaking:

* map(f, Iterable) -> Iterable, and
* map(f, Sequence) -> Sequence

But, *not*:

* map(f, Iterable|Sequence) -> Magic.

So, the map() function becomes a factory, returning an object with __len__ or without, depending on what it was called with. /Paul
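A rough sketch of the factory idea (the names are hypothetical): a sized input yields a sized, re-iterable result, while anything else falls back to the ordinary iterator-returning map().

```python
from collections.abc import Sized

class SizedMap:
    """Hypothetical sized, re-iterable result for map(f, Sequence)-style calls."""
    def __init__(self, func, iterable):
        self.func = func
        self.iterable = iterable

    def __iter__(self):
        return map(self.func, self.iterable)

    def __len__(self):
        return len(self.iterable)

def map_factory(func, iterable):
    # The factory: len(map_factory(f, x)) == len(x) when x is sized;
    # otherwise len() raises TypeError on the returned map, as proposed.
    if isinstance(iterable, Sized):
        return SizedMap(func, iterable)
    return map(func, iterable)

assert len(map_factory(str, [1, 2, 3])) == 3
assert list(map_factory(str, [1, 2])) == ['1', '2']
assert isinstance(map_factory(str, iter([1])), map)   # no len() here
```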

That would be great, especially if it returned objects of a subclass of map so that it didn't break any code that checks isinstance; however, I think this goes a little beyond map. I've run into cases using itertools where I wished the iterators could support len. I suppose you could turn those all into factories too, but I wonder if that's the most elegant solution. On Thu, Nov 29, 2018 at 7:22 PM Paul Svensson <paul-python@svensson.org> wrote:

On Thu, Nov 29, 2018 at 08:13:12PM -0500, Paul Svensson wrote:
Excellent proposal, followed by a flood of confused replies, which I will mostly disregard, since all miss the obvious.
When everyone around you is making technical responses which you think are "confused", it is wise to consider the possibility that it is you who is missing something rather than everyone else.
Simple, obvious, and problematic. Here's a map object I prepared earlier:

from itertools import islice
mo = map(lambda x: x, "aardvark")
list(islice(mo, 3))

If I now pass you the map object, mo, what should len(mo) return? Five or eight? No matter which choice you make, you're going to surprise and annoy people, and there will be circumstances where that choice will introduce bugs into their code.
But map objects aren't sequences. They're iterators. Just adding a __len__ method isn't going to make them sequences (not even "loosely speaking") or solve the problem above. In principle, we could make this work, by turning the output of map() into a view like dict.keys() etc, or a lazy sequence type like range(), wrapping the underlying sequence. That might be worth exploring. I can't think of any obvious problems with a view-like interface, but that doesn't mean there aren't any. I've spent like 30 seconds thinking about it, so the fact that I can't see any problems with it means little. But it's also a big change, not just a matter of exposing the __len__ method of the underlying iterable (or iterables). -- Steve
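range() is a useful model for the lazy-sequence option: it computes items on demand yet still has a length, supports random access, and can be iterated repeatedly, which is exactly the combination being asked of a map view.

```python
r = range(1, 10, 2)
assert len(r) == 5                     # sized without materializing anything
assert r[2] == 5                       # random access
assert list(r) == [1, 3, 5, 7, 9]
assert list(r) == [1, 3, 5, 7, 9]      # re-iterable, unlike an iterator
```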

On Sat, 1 Dec 2018 at 01:17, Steven D'Aprano <steve@pearwood.info> wrote:
Something to consider that, so far, seems to have been overlooked is that the total length of the resulting map isn't only dependent upon the iterable, but also the mapped function. It is a pretty pathological case, but there is no guarantee that the function is a pure function, free from side effects. If the iterable is mutable and the mapped function has a reference to it (either from scoping or the iterable (in)directly containing a reference to itself), there is nothing to prevent the function modifying the iterable as the map is evaluated. For example, map can be used as a filter:

it = iter((0, 16, 1, 4, 8, 29, 2, 13, 42))

def filter_odd(x):
    while x % 2 == 0:
        x = next(it)
    return x

tuple(map(filter_odd, it))
# (1, 29, 13)

The above also illustrates the second way the total length of the map could differ from the length of the input iterable, even if it is immutable. If StopIteration is raised within the mapped function, map finishes early, so can be used in a manner similar to takewhile:

def takewhile_lessthan4(x):
    if x < 4:
        return x
    raise StopIteration

tuple(map(takewhile_lessthan4, range(9)))
# (0, 1, 2, 3)

I really don't understand why this is true; under 'normal' usage, map shouldn't have any reason to silently swallow a StopIteration raised _within_ the mapped function. As I opened with, I wouldn't consider using map in either of these ways to be a good idea, and anyone doing so should probably be persuaded to find better alternatives, but it might be something to bear in mind. AJ

On Sat, 1 Dec 2018 at 10:44, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
It's not -- the StopIteration isn't terminating the map, it's terminating the iteration being performed by tuple().
That was a poor choice of wording on my part, it's rather that map doesn't do anything special in that regard. To whatever is iterating over the map, any unexpected StopIteration from the function isn't distinguishable from the expected one from the iterable(s) being exhausted. This issue was dealt with in generators by PEP-479 (by replacing the StopIteration with a RuntimeError). Whilst map, filter, and others may not be generators, I would expect them to be consistent with that PEP when handling the same issue.
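For comparison, here is how the same takewhile-style trick behaves inside a generator under PEP 479 (the default since Python 3.7): the StopIteration escaping the generator's frame is converted to RuntimeError rather than silently ending the outer loop.

```python
def takewhile_lessthan4(x):
    if x < 4:
        return x
    raise StopIteration

def gen_version(items):
    for x in items:
        yield takewhile_lessthan4(x)

try:
    result = list(gen_version(range(9)))
except RuntimeError:          # PEP 479: StopIteration -> RuntimeError
    result = 'RuntimeError'
assert result == 'RuntimeError'
```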

A proposal to make map() not return an iterator seems like a non-starter. Yes, Python 2 worked that way, but that was a long time ago and we know better now. In the simple example it doesn't matter much:

mo = map(lambda x: x, "aardvark")

But map() is more useful for the non-toy case:

mo = map(expensive_db_lookup, list_of_keys)

list_of_keys can be a concrete list, but I'm using map() mainly specifically to get lazy iterator behavior. On Sat, Dec 1, 2018, 11:10 AM Paul Svensson <paul-python@svensson.org> wrote:

On Sat, Dec 01, 2018 at 11:27:31AM -0500, David Mertz wrote:
Paul is certainly not suggesting reverting the behaviour to the Python2 map, at the very least map(func, iterator) will continue to return an iterator. What Paul is *precisely* proposing isn't clear to me, except that map(func, sequence) will be "loosely" a sequence. What that means is not obvious. What is especially unclear is what his map() will do when passed multiple iterable arguments. [...]
list_of_keys can be a concrete list, but I'm using map() mainly specifically to get lazy iterator behavior.
Indeed. That's often why I use it too. But there is a good use-case for having map(), or a map-like function, provide either a lazy sequence like range() or a view. But the devil is in the details. Terry was right to encourage people to experiment with their own map-like function (a subclass?) to identify any tricky corners in the proposal. -- Steve

On Sat, Dec 01, 2018 at 11:07:53AM -0500, Paul Svensson wrote: [...]
I already discussed that: map is not currently a sequence, and just giving it a __len__ is not going to make it one. Making it a sequence, or a view of a sequence, is a bigger change, but worth considering, as I already said in part of my post you deleted. However, it is also a backwards incompatible change. In case it's not obvious from my example above, I'll be explicit:

# current behaviour
mo = map(lambda x: x, "aardvark")
list(islice(mo, 3))  # discard the first three items
assert ''.join(mo) == 'dvark'
=> passes

# future behaviour, with your proposal
mo = map(lambda x: x, "aardvark")
list(islice(mo, 3))  # discard the first three items
assert ''.join(mo) == 'dvark'
=> fails with AssertionError

Given the certainty that this change will break code (I know it will break *my* code, as I often rely on map() being an iterator, not a sequence) it might be better to introduce a new "mapview" type rather than change the behaviour of map() itself. On the other hand, since the fix is simple enough:

mo = iter(mo)

perhaps all we need is a deprecation period of at least one full release before changing the behaviour. Either way, this isn't a simple or obvious change, and will probably need a PEP to nut out all the fine details. -- Steve

On Sat, Dec 1, 2018, 11:54 AM Steven D'Aprano <steve@pearwood.info wrote:
Given that the anti-fix is just as simple and currently available, I don't see why we'd want a change:

# map->sequence
mo = list(mo)

FWIW, I actually do write exactly that code fairly often, it's not hard.

On Sat, Dec 01, 2018 at 12:06:23PM -0500, David Mertz wrote:
Sure, but that makes a copy of the original data and means you lose the benefit of map being lazy. Naturally we will always have the ability to call list and eagerly convert to a sequence, but these proposals are for a way of getting the advantages of sequence-like behaviour while still keeping the advantages of laziness. With iterators, the only way to get that advantage of laziness is to give up the ability to query length, random access to items, etc even when the underlying data is a sequence and that information would have been readily available. We can, at least sometimes, have the best of both worlds. Maybe. -- Steve

Other than being able to ask len(), are there any advantages to a slightly less opaque map()? Getting the actual result of applying the function to the element is necessarily either eager or lazy, you can't have both. On Sat, Dec 1, 2018, 12:24 PM Steven D'Aprano <steve@pearwood.info wrote:

On Sat, Dec 01, 2018 at 12:28:16PM -0500, David Mertz wrote:
I don't understand the point you think you are making here. There's no fundamental need to make a copy of a sequence just to apply a map function to it, especially if the function is cheap. (If it is expensive, you might want to add a cache.) This proof of concept wrapper class could have been written any time since Python 1.5 or earlier:

class lazymap:
    def __init__(self, function, sequence):
        self.function = function
        self.wrapped = sequence
    def __len__(self):
        return len(self.wrapped)
    def __getitem__(self, item):
        return self.function(self.wrapped[item])

It is fully iterable using the sequence protocol, even in Python 3:

py> x = lazymap(str.upper, 'aardvark')
py> list(x)
['A', 'A', 'R', 'D', 'V', 'A', 'R', 'K']

Mapped items are computed on demand, not up front. It doesn't make a copy of the underlying sequence, it can be iterated over and over again, it has a length and random access. And if you want an iterator, you can just pass it to the iter() function. There are probably bells and whistles that can be added (a nicer repr? any other sequence methods? a cache?) and I haven't tested it fully. For backwards compatibility reasons, we can't just make map() work like this, because that's a change in behaviour. There may be tricky corner cases I haven't considered, but as a proof of concept I think it shows that the basic premise is sound and worth pursuing. -- Steve

Steven D'Aprano wrote:
For backwards compatibility reasons, we can't just make map() work like this, because that's a change in behaviour.
Actually, I think it's possible to get the best of both worlds. Consider this:

from operator import itemgetter

class MapView:
    def __init__(self, func, *args):
        self.func = func
        self.args = args
        self.iterator = None
    def __len__(self):
        return min(map(len, self.args))
    def __getitem__(self, i):
        return self.func(*list(map(itemgetter(i), self.args)))
    def __iter__(self):
        return self
    def __next__(self):
        if not self.iterator:
            self.iterator = map(self.func, *self.args)
        return next(self.iterator)

If you give it sequences, it behaves like a sequence:
If you give it iterators, it behaves like an iterator:
If you use it as an iterator after giving it sequences, it also behaves like an iterator:
What do people think? Could we drop something like this in as a replacement for map() without disturbing anything too much? -- Greg

On Sun, Dec 2, 2018 at 12:08 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
I can't help thinking that it will be extremely surprising to have the length remain the same while the items get consumed. After you take a couple of elements off, the length of the map is exactly the same, yet the length of a list constructed from that map won't be. Are there any other non-pathological examples where len(x) != len(list(x))? ChrisA
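For concreteness, here is a minimal sketch (a hypothetical class written for this illustration, not Greg's actual code) of the hazard Chris describes: when __len__ delegates to the source sequence while __next__ consumes a shared iterator, len(x) and len(list(x)) can disagree:

```python
class NaiveMapView:
    """Illustrative only: __len__ reports the source sequence's
    length even after items have been consumed by iteration."""
    def __init__(self, func, seq):
        self.func = func
        self.seq = seq
        self.it = iter(seq)

    def __len__(self):
        return len(self.seq)

    def __iter__(self):
        return self

    def __next__(self):
        return self.func(next(self.it))

m = NaiveMapView(str.upper, "abc")
next(m)                     # consume one item
assert len(m) == 3          # the reported length is unchanged...
assert len(list(m)) == 2    # ...but only two items actually remain
```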

Chris Angelico wrote:
I can't help thinking that it will be extremely surprising to have the length remain the same while the items get consumed.
That can be fixed. The following version raises an exception if you try to find the length after having used it as an iterator. (I also fixed a bug -- I had screwed up the sequence case, and it wasn't re-iterating properly.)

class MapView:
    def __init__(self, func, *args):
        self.func = func
        self.args = args
        self.iterator = None
    def __len__(self):
        return min(map(len, self.args))
    def __getitem__(self, i):
        return self.func(*list(map(itemgetter(i), self.args)))
    def __iter__(self):
        return map(self.func, *self.args)
    def __next__(self):
        if not self.iterator:
            self.iterator = iter(self)
        return next(self.iterator)
It will still report a length if you use len() *before* starting to use it as an iterator, but the length it returns is correct at that point, so I don't think that's a problem.
Are there any other non-pathological examples where len(x) != len(list(x))?
No longer a problem:
-- Greg

On Mon, Dec 03, 2018 at 02:04:31AM +1300, Greg Ewing wrote:
That's not really a "fix" as such, more of a violation of the principle of least astonishment. Perhaps more like the principle of most astonishment: the object changes from sized to unsized even if you don't modify its value or its type, but merely if you look at it the wrong way:

# This is okay, doesn't change the nature of the object.
for i in range(sys.maxint):
    try:
        print(mapview[i])
    except IndexError:
        break

# But this unexpectedly changes it from sized to unsized.
for x in mapview:
    break

That makes this object a fragile thing that can unexpectedly change from sized to unsized. Neither fish nor fowl, with a confusing API that is not quite a sequence, not quite an iterator, not quite sized, but just enough of each to lead people into error. Or... at least that's what the code is supposed to do; the code you give doesn't actually work that way:
I can't reproduce that behaviour with the code you give above. When I try it, it returns the length 3, even after the iterator has been completely consumed. I daresay you could jerry-rig something to "fix" this bug, but I think this is a poor API that tries to make a single type act like two conceptually different things at the same time. -- Steve

Steven D'Aprano wrote:
Yes, but keep in mind the purpose of the whole thing is to provide a sequence interface while not breaking old code that expects an iterator interface. Code that was written to work with the existing map() will not be calling len() on it at all, because that would never have worked.
Yes, it's a compromise in the interests of backwards compatibility. But there are no surprises as long as you stick to one interface or the other. Weird things happen if you mix them up, but sane code won't be doing that.
It sounds like you were still using the old version with a broken __iter__() method. This is my current complete code together with test cases:

#-----------------------------------------------------------
from operator import itemgetter

class MapView:

    def __init__(self, func, *args):
        self.func = func
        self.args = args
        self.iterator = None

    def __len__(self):
        if self.iterator:
            raise TypeError("Mapping iterator has no len()")
        return min(map(len, self.args))

    def __getitem__(self, i):
        return self.func(*list(map(itemgetter(i), self.args)))

    def __iter__(self):
        return map(self.func, *self.args)

    def __next__(self):
        if not self.iterator:
            self.iterator = iter(self)
        return next(self.iterator)

if __name__ == "__main__":

    a = [1, 2, 3, 4, 5]
    b = [2, 3, 5]

    print("As a sequence:")
    m = MapView(pow, a, b)
    print(list(m))
    print(list(m))
    print(len(m))
    print(m[1])
    print()

    print("As an iterator:")
    m = MapView(pow, iter(a), iter(b))
    print(next(m))
    print(list(m))
    print(list(m))
    try:
        print(len(m))
    except Exception as e:
        print("***", e)
    print()

    print("As an iterator over sequences:")
    m = MapView(pow, a, b)
    print(next(m))
    print(next(m))
    try:
        print(len(m))
    except Exception as e:
        print("***", e)
#-----------------------------------------------------------

This is the output I get:

As a sequence:
[1, 8, 243]
[1, 8, 243]
3
8

As an iterator:
1
[8, 243]
[]
*** Mapping iterator has no len()

As an iterator over sequences:
1
8
*** Mapping iterator has no len()

-- Greg

On Mon, Dec 10, 2018 at 5:23 AM E. Madison Bray <erik.m.bray@gmail.com> wrote:
Indeed; I believe it is very useful to have a map-like object that is effectively an augmented list/sequence.
but what IS a "map-like object"? I'm trying to imagine what that actually means. "map" takes a function and maps it onto an iterable, returning a new iterable. So a map object is an iterable -- what's under the hood being used to create it is (and should remain) opaque. Back in the day, Python was "all about sequences" -- so map() took a sequence and returned a sequence (an actual list, but that's not the point here). And that's pretty classic "map". With py3, there was a big shift toward iterables, rather than sequences, as the core type to work with. There are a few other benefits, but the main one is that often sequences were made simply so that they could be immediately iterated over, and that was a waste of resources:

for i, item in enumerate(a_sequence):
    ...

for x, y in zip(seq1, seq2):
    ...

These two are pretty obvious, but the same approach was taken over much of Python: dict.keys(), map(), range(), .... So now in Python, you need to decide, when writing code, what your API is -- does your function take a sequence? Or does it take an iterable? Of course, a sequence is an iterable, but an iterable is not (necessarily) a sequence -- so back in the day, you didn't really need to make the decision. So in the case of the Sage example -- I wonder what the real problem is. If you have an API that requires a sequence, on py2, folks may well have been passing it the result of a map() call -- note that they weren't passing a "map object" that is now somehow different than it used to be -- they were passing a list, plain and simple. And there are all sorts of places, when converting from py2 to py3, where you will now get an iterable that isn't a proper sequence, and if the API you are using requires a sequence, you need to wrap a list() or tuple() or some such around it to make the sequence.
Note that you can write your code to work under either 2 or 3, but it's really hard to write a library so that your users can run it under either 2 or 3 without any change in their code! But note: the fact that it's a map object is just one special case. I suppose one could write an API now that actually expects a map object (rather than a generic sequence or iterable), but it was literally impossible in py2 -- there was no such object. I'm still confused -- what's so wrong with:

list(map(func, some_iterable))

if you need a sequence? You can, of course, make lazily-evaluated sequences (like range), and so you could make a map-like function that required a sequence as input and would lazily evaluate that sequence. This could be useful if you weren't going to work with the entire collection, but really wanted to only index out a few items, but I'm trying to imagine a use case for that, and I haven't come up with one. And I don't think that's the use case that started this thread... -CHB
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Tue, Dec 11, 2018 at 2:16 AM Chris Barker <chris.barker@noaa.gov> wrote:
I don't understand why this is confusing. Greg gave an example of what this *might* mean up thread. It's not the only possible approach but it is one that makes a lot of sense to me. The way you're defining "map" is arbitrary and post-hoc. It's a definition that makes sense for "map" that's restricted to iterating over arbitrary iterators. It's how it happens to be defined in Python 3 for various reasons that you took time to explain at great length, which I regret to inform you was time wasted explaining things I already know. For something like a fixed sequence a "map" could just as easily be defined as a pair (<function>, <sequence>) that applies <function>, which I'm claiming is a pure function, to every element returned by the <sequence>. This transformation can be applied lazily on a per-element basis whether I'm iterating over it, or performing random access (since <sequence> is known for all N). Python has no formal notion of a pure function, but I'm an adult and can accept responsibility if I try to use this "map-like" object in a way that is not logically consistent. The stuff about Sage is beside the point. I'm not even talking about that anymore.

On Tue, 11 Dec 2018 at 10:38, E. Madison Bray <erik.m.bray@gmail.com> wrote:
What's confusing to *me*, at least, is what's actually being suggested here. There's a lot of theoretical discussion, but I've lost track of how it's grounded in reality:

1. If we're saying that "it would be nice if there were a function that acted like map but kept references to its arguments", that's easy to do as a module on PyPI. Go for it - no-one will have any problem with that.

2. If we're saying "the builtin map needs to behave like that", then:

   2a. *Why*? What is so special about this situation that the builtin has to be changed?

   2b. Compatibility questions need to be addressed. Is this important enough to code that "needs" it that such code is OK with being Python 3.8+ only? If not, why aren't the workarounds needed for Python 3.7 good enough? (Long term improvement and simplification of the code *is* a sufficient reason here, it's just something that should be explicit, as it means that the benefits are long-term rather than immediate.)

   2c. Weird corner case questions, while still being rare, *do* need to be addressed - once a certain behaviour is in the stdlib, changing it is a major pain, so we have a responsibility to get even the corner cases right.

   2d. It's not actually clear to me how critical that need actually is. Nice to have, sure (you only need a couple of people who would use a feature for it to be "nice to have") but beyond that I haven't seen a huge number of people offering examples of code that would benefit (you mentioned Sage, but that example rapidly degenerated into debates about Sage's design, and while that's a very good reason for not wanting to continue using that as a use case, it does leave us with few actual use cases, and none that I'm aware of that are in production code...)

3. If we're saying something else (your comment "map could just as easily be defined as..." suggests that you might be) then I'm not clear what it is.
Can you describe your proposal as pseudo-code, or a Python implementation of the "map" replacement you're proposing? Paul

On Tue, Dec 11, 2018 at 12:13 PM Paul Moore <p.f.moore@gmail.com> wrote:
It's true, this has been a wide-ranging discussion and it's confusing. Right now I'm specifically responding to the sub-thread that Greg started, "Suggested MapView object", so I'm considering this a mostly clean slate from the previous thread "__len__() for map()". Different ideas have been tossed around and the discussion has me thinking about broader possibilities. I responded to this thread because I liked Greg's proposal and the direction he's suggesting. I think that the motivation underlying much of this discussion, for both the OP who started the original thread, as well as myself and others, is that before Python 3 changed the implementation of map() there were certain assumptions one could make about map() called on a list* which, under normal circumstances, were quite reasonable and sane (e.g. len(map(func, lst)) == len(lst), or map(func, lst)[N] == func(lst[N])). Python 3 broke all of these assumptions, for reasons that I personally have no disagreement with, in terms of motivation. However, in retrospect, it might have been nice if more consideration were given to backwards compatibility for some "obvious" simple cases. This isn't a Python 2 vs Python 3 whine, though: I'm just trying to think about how I might expect map() to work on different types of arguments, and I see no problem--so long as it's properly documented--with making its behavior somewhat polymorphic on the types of arguments. The idea would be to now enhance the existing built-ins to restore at least some previously lost assumptions, at least in the relevant cases. To give an analogy, Python 3.0 replaced range() with (effectively) xrange(). This broke a lot of assumptions that the object returned by range(N) would work much like a list, and Python 3.2 restored some of that list-like functionality by adding support for slicing and negative indexing on range(N). I believe it's worth considering such enhancements for filter() and map() as well, though these are obviously a bit trickier.
* or other fixed-length sequence, but let's just use list as a shorthand, and assume for the sake of simplicity a single list as well.
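The range() precedent mentioned above is easy to demonstrate: since Python 3.2 the lazy range object supports len(), negative indexing, slicing and containment tests without materialising a list:

```python
# range() is lazy, yet behaves like a sequence in all of these ways.
r = range(10)
assert len(r) == 10             # sized
assert r[-1] == 9               # negative indexing
assert list(r[2:5]) == [2, 3, 4]  # slicing returns another lazy range
assert 7 in r                   # containment without iteration cost
```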
Sure, though since this is about the behavior of global built-ins that are commonly used by users at all experience levels the problem is a bit hairier. Anybody can implement anything they want and put it in a third-party module. That doesn't mean anyone will use it. I still have to write code that handles map objects. In retrospect I think Guido might have had the right idea of wanting to move map() and filter() into functools along with reduce(). There's a surprisingly lot more at stake in terms of backwards compatibility and least-astonishment when it comes to built-ins. I think that's in part why the new Python 3 definitions of map() and filter() were kept so simple: although they were not backwards compatible I do think they were well designed to minimize astonishment. That's why I don't necessarily disagree with the choices made (but still would like to think about how we can make enhancements going forward).
The same question could apply to the last time it was changed. I think now we're trying to find some middle-ground.
That's a good point: I think the same arguments as for enhancing range() apply here, but this is worth further consideration (though having a more concrete proposal in the first place should come first).
It depends on what you mean by getting them "right". It's definitely worth going over as many corner cases as one can think of. Not all corner cases have a satisfying resolution (and may be highly context-dependent). In those cases getting it "right" is probably no more than documenting that corner case and perhaps warning against it.
That's a fair point worthy of further consideration. To me, at least, map on a list working as an augmented list is obvious, clear, and useful, and solves most of the use-cases where having map.__len__ might be desirable, among others.
Again, I'm mostly responding to Greg's proposal which I like. To extend it, I'm suggesting that a call to map() where all the arguments are sequences** might return something like his MapView. If even that idea is crazy or impractical though, I can accept that. But I think it's quite analogous to how map on arbitrary iterables went from immediate evaluation to lazy evaluation while iterating: in the same way map on some sequence(s) can be evaluated lazily on random access. ** I have a separate complaint that there's no great way, at the Python level, to define a class that is explicitly a "sequence" as opposed to a more general "mapping", but that's a topic for another thread...

On Tue, 11 Dec 2018 at 11:49, E. Madison Bray <erik.m.bray@gmail.com> wrote:
Thanks. That clarifies the situation for me very well. I agree with most of the comments you made, although I don't have any good answers. I think you're probably right that Guido's original idea to move map and filter to functools might have been better, forcing users to explicitly choose between a genexp and a list comprehension. On the other hand, it might have meant people used more lists than they needed to, as a result. Paul

On Tue, Dec 11, 2018 at 12:48:10PM +0100, E. Madison Bray wrote:
Greg's code can be found here:
https://mail.python.org/pipermail/python-ideas/2018-December/054659.html

His MapView tries to be both an iterator and a sequence at the same time, but it is neither. The iterator protocol is that iterators must:

- have a __next__ method;
- have an __iter__ method which returns self;

and the test for an iterator is:

    obj is iter(obj)

https://docs.python.org/3/library/stdtypes.html#iterator-types

Greg's MapView object is an *iterable* with a __next__ method, which makes it neither a sequence nor an iterator, but a hybrid that will surprise people who expect it to act consistently as either.

This is how iterators work:

py> x = iter("abcdef")  # An actual iterator.
py> next(x)
'a'
py> next(x)
'b'
py> next(iter(x))
'c'

Greg's hybrid violates that expected behaviour:

py> x = MapView(str.upper, "abcdef")  # An imposter.
py> next(x)
'A'
py> next(x)
'B'
py> next(iter(x))
'A'

As an iterator, it is officially "broken", continuing to yield values even after it is exhausted:

py> x = MapView(str.upper, 'a')
py> next(x)
'A'
py> next(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/steve/gregmapview.py", line 24, in __next__
    return next(self.iterator)
StopIteration
py> list(x)  # But wait! There's more!
['A']
py> list(x)  # And even more!
['A']

This hybrid is fragile: whether operations succeed or not depends on the order in which you call them:

py> x = MapView(str.upper, "abcdef")
py> len(x)*next(x)  # Safe. But only ONCE.
'AAAAAA'
py> y = MapView(str.upper, "uvwxyz")
py> next(y)*len(y)  # Looks safe. But isn't.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/steve/gregmapview.py", line 12, in __len__
    raise TypeError("Mapping iterator has no len()")
TypeError: Mapping iterator has no len()

(For brevity, from this point on I shall trim the tracebacks and show only the final error message.)

Things that work once, don't work a second time.
py> len(x)*next(x)  # Worked a moment ago, but now it is broken.
TypeError: Mapping iterator has no len()

If you pass your MapView object to another function, it can accidentally sabotage your code:

py> def innocent_looking_function(obj):
...     next(obj)
...
py> x = MapView(str.upper, "abcdef")
py> len(x)
6
py> innocent_looking_function(x)
py> len(x)
TypeError: Mapping iterator has no len()

I presume this is just an oversight, but indexing continues to work even when len() has been broken. Greg seems to want to blame the unwitting coder who runs into these boobytraps: "But there are no surprises as long as you stick to one interface or the other. Weird things happen if you mix them up, but sane code won't be doing that." (URL as above.) This MapView class offers a hybrid "sequence plus iterator, together at last!" double-headed API, and even its creator says that sane code shouldn't use that API. Unfortunately, you can't use the iterator API, because it's broken as an iterator, and you can't use it as a sequence, because any function you pass it to might use it as an iterator and pull the rug out from under your feet. Greg's code is, apart from the addition of the __next__ method, almost identical to the version of mapview I came up with in my own testing. Except Greg's is even better, since I didn't bother handling the multiple-sequences case and his does. It's the __next__ method which ruins it, by trying to graft almost-but-not-really iterator behaviour onto something which otherwise is a sequence. I don't think there's any way around that: I think that any attempt to make a single MapView object work as either a sequence with a length and indexing AND an iterator with next() and no length and no indexing is doomed to the same problems. Far from minimizing surprise, it will maximise it.
Look at how many violations of the Principle Of Least Surprise Greg's MapView has:

- If an object has a __len__ method, calling len() on it shouldn't raise TypeError;
- if you called len() before, and it succeeded, calling it again should also succeed;
- if an object has a __next__ method, it should be an iterator, and that means iter(obj) is obj;
- if it isn't an iterator, you shouldn't be able to call next() on it;
- if it is an iterator, once it is exhausted, it should stay exhausted;
- iterating over an object (calling next() or iter() on it) shouldn't change it from a sequence to a non-sequence;
- passing a sequence to another function shouldn't result in that sequence no longer supporting len() or indexing;
- if an object has a length, then it should still have a length even after iterating over it.

I may have missed some. -- Steve

Steven D'Aprano wrote:
By that test, it identifies as a sequence, as does testing it for the presence of __len__:
So, code that doesn't know whether it has a sequence or iterator and tries to find out, will conclude that it has a sequence. Presumably it will then proceed to treat it as a sequence, which will work fine.
That's a valid point, but it can be fixed:

def __iter__(self):
    return self.iterator or map(self.func, *self.args)

Now it gives
There is still one case that will behave differently from the current map(), i.e. using list() first and then expecting it to behave like an exhausted iterator. I'm finding it hard to imagine real code that would depend on that behaviour, though.
But what sane code is going to do that? Remember, the iterator interface is only there for backwards compatibility. That would fail under both Python 2 and the current Python 3.
If you're using len(), you clearly expect to have a sequence, not an iterator, so why are you calling a function that blindly expects an iterator? Again, this cannot be and could never have been working code.
I presume this is just an oversight, but indexing continues to work even when len() has been broken.
That could be fixed.
No. I would document it like this: It provides a sequence API. It also, *for backwards compatibility*, implements some parts of the iterator API, but new code should not rely on that, nor should any code expect to be able to use both interfaces on the same object. The backwards compatibility would not be perfect, but I think it would work in the vast majority of cases. I also envisage that the backwards compatibility provisions would not be kept forever, and that it would eventually become a pure sequence object. I'm not necessarily saying this *should* be done, just pointing out that it's a possible strategy for migrating map() from an iterator to a view, if we want to do that. -- Greg

On 12/11/2018 6:50 PM, Greg Ewing wrote:
Python has list and list_iterator, tuple and tuple_iterator, set and set_iterator, dict and dict_iterator, range and range_iterator. In 3.0, we could have turned map into a finite sequence analogous to range, and added a new map_iterator. To be completely lazy, such a map would have to restrict input to Sequences. To be compatible with 2.0 map, it would have to use list(iterable) to turn other finite iterables into concrete lists, making it only semi-lazy. Since I am too lazy to write the multi-iterable version, here is the one-iterable version to show the idea:

def __init__(self, func, iterable):
    self.func = func
    self.seq = iterable if isinstance(iterable, Sequence) else list(iterable)

Given the apparent little need for the extra complication, and the possibility of keeping a reference to sequences and explicitly applying list otherwise, it was decided to rebind 'map' to the fully lazy and general itertools.map. -- Terry Jan Reedy
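Filling out Terry's one-iterable fragment into a runnable sketch (the class name and surrounding details are mine, purely illustrative of the semi-lazy idea):

```python
from collections.abc import Sequence

class SemiLazyMap:
    """Semi-lazy map sketch: keep a reference to sequence input,
    materialise any other finite iterable with list()."""
    def __init__(self, func, iterable):
        self.func = func
        self.seq = iterable if isinstance(iterable, Sequence) else list(iterable)

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, i):
        # The old sequence protocol also makes this object iterable.
        return self.func(self.seq[i])

m = SemiLazyMap(str.upper, iter("abc"))   # iterator input gets listed
assert len(m) == 3
assert m[0] == 'A'
assert list(m) == ['A', 'B', 'C']

s = [1, 2, 3]
assert SemiLazyMap(str, s).seq is s       # sequence input is not copied
```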

On Wed, Dec 12, 2018 at 12:50:41PM +1300, Greg Ewing wrote:
Since existing map objects are iterators, that breaks backwards compatibility. For code that does something like this:

if obj is iter(obj):
    process_iterator()
else:
    n = len(obj)
    process_sequence()

it will change behaviour, shifting map objects from the iterator branch to the sequence branch. That's a definite change in behaviour, which alone could change the meaning of the code. E.g. if the two process_* functions use different algorithms. Or it could break the code outright, because your MapView objects can raise TypeError when you call len() on them. I know that any object with a __len__ could in principle raise TypeError. But for anything else, we are justified in calling it a bug in the __len__ implementation. You're trying to sell it as a feature.
It will work fine, unless something has called __next__, which will cause len() to blow up in their face by raising TypeError. I call these sorts of designs "landmines". They're absolutely fine, right up to the point where you hit the right combination of actions and step on the landmine. For anything else, this sort of thing would be a bug. You're calling it a feature.
That's not the only breakage. This is a pattern which I sometimes use:

def test(iterator):
    # Process items up to some symbol one way,
    # and items after that symbol another way.
    for a in iterator:
        print(1, a)
        if a == 'C':
            break
    # This relies on iterator NOT resetting to the beginning,
    # but continuing from where we left off,
    # i.e. not being broken.
    for b in iterator:
        print(2, b)

Being an iterator, right now I can pass map() objects directly to that code, and it works as expected:

py> test(map(str.upper, 'abcde'))
1 A
1 B
1 C
2 D
2 E

Your MapView does not:

py> test(MapView(str.upper, 'abcde'))
1 A
1 B
1 C
2 A
2 B
2 C
2 D
2 E

This is why such iterators are deemed to be "broken".
You have an object that supports len() and next(). Why shouldn't people use both len() and next() on it when both are supported methods? They don't have to be in a single expression:

x = MapView(blah blah blah)
a = some_function_that_calls_len(x)
b = some_function_that_calls_next(x)

That works. But reverse the order, and you step on a landmine:

b = some_function_that_calls_next(x)
a = some_function_that_calls_len(x)

The caller may not even know that the functions call next() or len(), they could be implementation details buried deep inside some library function they didn't even know they were calling. Do you still think that it is the caller's code that is insane?
Remember, the iterator interface is only there for backwards compatibility.
Famous last words.
That would fail under both Python 2 and the current Python 3.
Honestly Greg, you've been around long enough that you ought to recognise *minimal examples* for what they are. They're not meant to be real-world production code. They're the simplest, most minimal example that demonstrates the existence of a problem. The fact that they are *simple* is to make it easy to see the underlying problem, not to give you an excuse to dismiss it. You're supposed to imagine that in real-life code, the call to next() could be buried deep, deep, deep in a chain of 15 function calls in some function in some third party library that I don't even know is being called, and it took me a week to debug why len(obj) would sometimes fail mysteriously. The problem is not the caller, or even the library code, but that your class magically and implicitly swaps from a sequence to a pseudo-iterator whether I want it to or not. A perfect example of why DWIM code is so hated: http://www.catb.org/jargon/html/D/DWIM.html
*Minimal example* again. You ought to be able to imagine the actual function is fleshed out, without expecting me to draw you a picture:

if hasattr(obj, '__next__'):
    first = next(obj, sentinel)

Or if you prefer:

try:
    first = next(obj)
except TypeError:
    ...  # fall back on sequence algorithm
except StopIteration:
    ...  # empty iterator

None of this boilerplate adds any insight at all to the discussion. There's a reason bug reports ask for minimal examples. The point is, I'm calling some innocent looking function, and it breaks my sequence: len(obj) worked before I called the function, and afterwards, it raises TypeError. I wouldn't have to care about the implementation if your MapView object didn't magically flip from sequence to iterator behind my back. -- Steve

and the test for an iterator is:
obj is iter(obj)
Is that a hard and fast rule? I know it's the vast majority of cases, but I imagine you could make an object that behaved exactly like an iterator, but returned some proxy object rather than itself. Not sure why one would do that, but it should be possible. - CHB

On Thu, Dec 13, 2018 at 3:07 PM Chris Barker - NOAA Federal via Python-ideas <python-ideas@python.org> wrote:
Yes, it is. https://docs.python.org/3/library/stdtypes.html#iterator-types For an iterable, __iter__ needs to return an appropriate iterator. For an iterator, __iter__ needs to return self (which is, by definition, the "appropriate iterator"). Note also that the behaviour around StopIteration is laid out there, including that an iterator whose __next__ has raised SI but then subsequently doesn't continue to raise SI is broken. (Though it *is* legit to raise StopIteration with a value the first time, and then raise a vanilla SI subsequently. Generators do this, rather than retain the return value indefinitely.) ChrisA
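The two rules Chris cites can be checked directly in the interpreter (a brief sketch of current, documented behaviour):

```python
# Rule 1: an iterator's __iter__ returns the iterator object itself.
it = iter([1])
assert iter(it) is it

# Rule 2: once exhausted, a well-behaved iterator keeps raising
# StopIteration on every subsequent next() call.
assert next(it) == 1
for _ in range(2):
    try:
        next(it)
        raised = False
    except StopIteration:
        raised = True
    assert raised
```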

Chris Angelico wrote:
The docs aren't very clear on this point. They claim this is necessary so that the iterator can be used in a for-loop, but that's obviously not strictly true, since a proxy object could also be used. They also make no mention about whether one should be able to rely on this as a definitive test of iterator-ness. In any case, I don't claim that my MapView implements the full iterator protocol, only enough of it to pass for an iterator in most likely scenarios that assume one. -- Greg

On Thu, Dec 13, 2018 at 4:54 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
iterator.__iter__()
    Return the iterator object itself.

I do believe "the iterator object itself" means that "iterator.__iter__() is iterator" should always be true. But maybe there's some other way to return "the object itself" other than actually returning "the object itself"? ChrisA

On Thu, 13 Dec 2018 at 05:55, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
See also https://docs.python.org/3.7/glossary.html#term-iterator, which reiterates the point that "Iterators are required to have an __iter__() method that returns the iterator object itself". By that point, I'd say the docs are pretty clear...
They also make no mention about whether one should be able to rely on this as a definitive test of iterator-ness.
That glossary entry is linked from https://docs.python.org/3.7/library/collections.abc.html#collections.abc.Ite..., so it would be pretty hard to argue that it's not part of the "definitive test of iterator-ness".
But not enough that it's legitimate to describe it as an "iterator". It may well be a useful class, and returning it from a map-like function may be a practical and effective thing to do, but describing it as an "iterator" does nothing apart from leading to distracting debates on how it doesn't work the same as an iterator. Better to just accept that it's *not* an iterator, and focus on whether it's useful... IMO, it sounds like it's useful, but it's not backward compatible (because it's not an iterator ;-)). Whether it's *sufficiently* useful to justify breaking backward compatibility is a different discussion (all I can say on that question is that I've never personally had a case where the current Python 3 behaviour of map is a problem). Paul

On Thu, Dec 13, 2018 at 06:53:54PM +1300, Greg Ewing wrote:
Whether your hybrid sequence+iterator is close enough to an iterator or not isn't the critical point here. If we really wanted to, we could break backwards compatibility, with or without a future import or a deprecation period, and simply declare that this is how map() will work in the future. Doing that, or not, becomes a question of whether the gain is worth the breakages.

The critical question here is whether a builtin ought to include the landmines your hybrid class does. *By design*, your class will blow up in people's faces if they try to use the full API offered. It violates at least two expected properties:

- As an iterator, it is officially "broken" because in at least two reasonable scenarios, it automatically resets after being exhausted. (Although presumably we could fix that with an "is_exhausted" flag.)

- As a sequence, it violates the expectation that if an object is Sized (it has a __len__ method), calling len() on it should not raise TypeError. As a sequence, it is fragile and easily breakable, changing from a sequence to a (pseudo-)iterator whether the caller wants it to or not. Third-party code could easily flip the switch, leading to obscure errors.

That second one is critical to your "Do What I Mean" design; the whole point of your class is for the object to automagically swap from behaving like a sequence to behaving like an iterator according to how it is used. Rather than expecting the user to make an explicit choice of which behaviour they want:

- use map() to get current iterator behaviour;
- use mapview() to get lazy-sequence behaviour;

your class tries to do both, and then guesses what the user wants depending on how the map object happens to get used. -- Steve
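For concreteness, here is a minimal sketch (my own construction for illustration, not Greg's actual code) of the kind of hybrid being criticized: it answers len() until someone starts iterating, at which point it flips to iterator mode and len() starts raising:

```python
class HybridMapView:
    """Hypothetical sequence/iterator hybrid -- illustrative only."""
    def __init__(self, func, seq):
        self.func = func
        self.seq = seq
        self._it = None          # created lazily on first __next__

    def __iter__(self):
        return self

    def __next__(self):
        if self._it is None:
            self._it = iter(self.seq)   # the "switch" flips here
        return self.func(next(self._it))

    def __len__(self):
        if self._it is not None:
            # A Sized object that sometimes refuses len() -- the landmine.
            raise TypeError("len() unavailable once iteration has begun")
        return len(self.seq)

    def __getitem__(self, i):
        return self.func(self.seq[i])

m = HybridMapView(str.upper, "abc")
print(len(m))    # 3 -- sequence mode
print(next(m))   # 'A' -- iterator mode from now on; len(m) now raises
```

Any third-party code that merely starts iterating silently disables len() for every other holder of the same object, which is the "obscure errors" failure mode described above.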

On Wed, Dec 12, 2018 at 08:06:17PM -0800, Chris Barker - NOAA Federal wrote:
Yes, that's the rule for the iterator protocol. Any object can have an __iter__ method which returns anything you want. (It doesn't even have to be iterable, this is Python, and if you want to shoot yourself in the foot, you can.) But to be an iterator, the rule is that obj.__iter__() must return obj itself. Otherwise we say that obj is an iterable, not an iterator. https://docs.python.org/3/library/stdtypes.html#iterator.__iter__ -- Steve

On 12/11/2018 6:48 AM, E. Madison Bray wrote:
A range represents an arithmetic sequence. Any usage of range that could be replaced by xrange, which is nearly all uses, made no assumption broken by xrange. The basic assumption was and is that a range/xrange could be repeatedly iterated. That this assumption was met in the first case by returning a list was somewhat of an implementation detail. In terms of mutability, a tuple would have been better, as range objects should not be mutable. (If [2,4,6] is mutated to [2,3,7], it is no longer a range (arithmetic sequence).)
and Python 3.2 restored some of that list-like functionality
As I see it, xranges were unfinished as sequence objects and 3.2 finished the job. This included having the min() and max() builtins calculate the min and max efficiently, as a human would, as the first or last of the sequence, rather than uselessly iterating and comparing all the items in the sequence. A proper analogy to range would be a re-iterable mapview (or 'mapseq') like what Steven D'Aprano proposes.
-- Terry Jan Reedy
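The sequence behaviour Terry describes can be seen directly on a modern range object: length, indexing, membership and index() are all computed arithmetically from start/stop/step rather than by iterating (true of CPython 3.2+):

```python
r = range(0, 10**12, 7)    # a huge arithmetic sequence, no storage

print(len(r))              # derived from start/stop/step, not by counting
print(r[10])               # 70 -- O(1) indexing
print(700 in r)            # True -- decided arithmetically for ints
print(r.index(700))        # 100
# And it is re-iterable: iterating it twice yields the same items.
```

This is the "finished sequence object" behaviour that a re-iterable mapview would analogously provide for a mapped sequence.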

On Mon, Dec 10, 2018 at 05:15:36PM -0800, Chris Barker via Python-ideas wrote: [...]
You might need a sequence. Why do you think that has to be an *eager* sequence? I can think of two obvious problems with eager sequences: space and time. They can use too much memory, and they can take too much time to generate them up-front and too much time to reap when they become garbage. And if you have an eager sequence, and all you want is the first item, you still have to generate all of them even though they aren't needed. We can afford to be profligate with memory when the data is small, but eventually you run into cases where having two copies of the data is one copy too many.
Or even if you *are* going to work with the entire collection, but you don't need them all at once. I once knew a guy whose fondest dream was to try the native cuisine of every nation of the world ... but not all in one meal. This is a classic time/space tradeoff: for the cost of calling the mapping function anew each time we index the sequence, we can avoid allocating a potentially huge list and calling a potentially expensive function up front for items we're never going to use. Instead, we call it only on demand. These are the same principles that justify (x)range and dict views. Why eagerly generate a list up front, if you only need the values one at a time on demand? Why make a copy of the dict keys, if you don't need a copy? These are not rhetorical questions. This is about avoiding the need to make unnecessary copies for those times we *don't* need an eager sequence generated up front, keeping the laziness of iterators and the random-access of sequences. map(func, sequence) is a great candidate for this approach. It has to hold onto a reference to the sequence even as an iterator. The function is typically side-effect free (a pure function), and if it isn't, "consenting adults" applies. We've already been told there's at least one major Python project, Sage, where this would have been useful. There's a major functional language, Haskell, where nearly all sequence processing follows this approach. I suggest we provide a separate mapview() type that offers only the lazy sequence API, without trying to be an iterator at the same time. If you want an eager sequence, or an iterator, they're only a single function call away: list(mapview_instance) iter(mapview_instance) # or just stick to map() Rather than trying to guess whether people want to treat their map objects as sequences or iterators, we let them choose which they want and be explicit about it. Consider the history of dict.keys(), values() and items() in Python 2. Originally they returned eager lists. 
Did we try to retrofit view-like and iterator-like behaviour onto the existing dict.keys() method, returning a cunning object which somehow turned from a list to a view to an iterator as needed? Hell no! We introduced *six new methods* on dicts: - dict.iterkeys() - dict.viewkeys() and similar for items() and values(). Compared to that, adding a single variant on map() that expects a sequence and returns a view on the sequence seems rather timid. -- Steve
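The proposed mapview() can be sketched in a few lines on top of the Sequence ABC (the names and details here are mine, purely illustrative); the ABC then supplies __iter__, __contains__, index() and count() for free:

```python
from collections.abc import Sequence

class mapview(Sequence):
    """A lazy, re-iterable sequence view: no copy, func applied on demand."""
    def __init__(self, func, seq):
        self._func = func
        self._seq = seq

    def __len__(self):
        return len(self._seq)

    def __getitem__(self, i):
        if isinstance(i, slice):
            # Slicing returns another lazy view over the sliced source.
            return mapview(self._func, self._seq[i])
        return self._func(self._seq[i])

v = mapview(float, [1, 2, 3])
print(len(v))        # 3, with no copy of the data
print(v[1])          # 2.0, computed on demand
print(list(v))       # [1.0, 2.0, 3.0] -- eager only when asked
print(list(v))       # and re-iterable, unlike an iterator
```

Because this object is a sequence and never pretends to be an iterator, `iter(v)` hands out a fresh, independent iterator each time, so none of the hybrid-class landmines apply.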

Perhaps I got confused by the early part of this discussion. My point was that there is no "map-like" object at the Python level. (That is, no Map ABC.) Py2's map produced a sequence. Py3's map produced an iterable. So any API that was expecting a sequence could accept the result of a py2 map, but not a py3 map. There is absolutely nothing special about map here.

The example of range has been brought up, but I don't think it's analogous — py2 range returns a list, py3 range returns an immutable sequence. Because that's as close as we can get to a sequence while preserving the lazy evaluation that is wanted.

I _think_ someone may be advocating that map() could return an iterable if it is passed an iterable, and a sequence if it is passed a sequence. Yes, it could, but that seems like a bad idea to me. But folks are proposing a "map" that would produce a lazy-evaluated sequence. Sure — as Paul said, put it up on pypi and see if folks find it useful.

Personally, I'm still finding it hard to imagine a use case where you need the sequence features, but also lazy evaluation is important. Sure: range() has that, but it came at almost zero cost, and I'm not sure the sequence features are used much.

Note: the one use-case I can think of for a lazy-evaluated sequence instead of an iterable is so that I can pick a random element with random.choice(). (Try to pick a random item from a dict.) But that doesn't apply here — pick a random item from the source sequence instead. But this is a specific example of a general use case: you need to access only a subset of the mapped sequence (or access it out of order), so using the iterable version won't work, and it may be large enough that making a new sequence is too resource intensive. Seems rare to me, and in many cases, you could do the subsetting before applying the function, so I think it's a pretty rare use case. But go ahead and make it — I've been wrong before :-) -CHB Sent from my iPhone
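The random-selection example is easy to demonstrate: random.choice needs len() and indexing, so a map object fails where the underlying list works:

```python
import random

data = [1, 2, 3, 4]
print(random.choice(data))        # fine: lists support len() and indexing

try:
    random.choice(map(str, data))  # map objects support neither
except TypeError as e:
    print("TypeError:", e)
```

As suggested above, the simple workaround today is to choose from the source sequence and apply the function afterwards: `str(random.choice(data))`.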

On Tue, Dec 11, 2018 at 11:10 AM Terry Reedy <tjreedy@udel.edu> wrote:
well, the iterator / iterable distinction is important in this thread in many places, so I should have been more careful about that -- but not for this reason. Yes, a sequence is an iterable, but what I meant was an "iterable-that-is-not-a-sequence". -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

Steven D'Aprano wrote:
I suggest we provide a separate mapview() type that offers only the lazy sequence API, without trying to be an iterator at the same time.
Then we would be back to the bad old days of having two functions that do almost exactly the same thing. My suggestion was made in the interests of moving the language in the direction of having less warts, rather than adding more or moving the existing ones around. I acknowledge that the dual interface is itself a bit wartish, but it's purely for backwards compatibility, so it could be deprecated and eventually removed if desired. -- Greg

On Wed, Dec 12, 2018 at 11:31:03AM +1300, Greg Ewing wrote:
They aren't "almost exactly the same thing". One is a sequence, which is a rich API that includes random access to items and a length; the other is an iterator, which is an intentionally simple API which fails to meet the needs of some users.
It's a "bit wartish" in the same way that the sun is "a bit warmish".
but it's purely for backwards compatibility
And it fails at that too.

    x = map(str.upper, "abcd")
    x is iter(x)

returns True with the current map, an actual iterator, and False with your hybrid. Current map() is a proper, non-broken iterator; your hybrid is a broken iterator. (That's not me being derogative: it's the official term for iterators which don't stay exhausted.) I'd be more charitable if I thought the flaws were mere bugs that could be fixed. But I don't think there is any way to combine two incompatible interfaces, the sequence and iterator APIs, into one object without these sorts of breakages. Take the __next__ method out of your object, and it is a better version of what I proposed earlier. With the __next__ method, it's just broken. -- Steve
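The "stays exhausted" behaviour of a proper iterator is easy to observe with the current map:

```python
m = map(str.upper, "ab")
print(list(m))       # ['A', 'B']
print(list(m))       # [] -- once exhausted, it raises StopIteration forever
print(m is iter(m))  # True -- iter() hands back the same exhausted object
```

A "broken" iterator in the official sense is one where the second list() call would start producing items again.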

On 12/1/2018 8:07 PM, Greg Ewing wrote:
Steven D'Aprano wrote:
After defining a separate iterable mapview sequence class
I presume you mean the '(iterable) sequence' and 'iterator' worlds. I don't think they should be mixed. A sequence is reiterable; an iterator is once-through and done.
The last two (unnecessarily) restrict this to being a once-through iterator. I think much better would be

    def __iter__(self): return map(self.func, *self.args)

-- Terry Jan Reedy

On 12/1/2018 2:08 PM, Steven D'Aprano wrote:
This proof of concept wrapper class could have been written any time since Python 1.5 or earlier:
    class lazymap:
        def __init__(self, function, sequence):
One could now add at the top of the file

    from collections.abc import Sequence

and here

    if not isinstance(sequence, Sequence):
        raise TypeError(f'{sequence} is not a sequence')
For 3.x, I would add

    def __iter__(self): return map(self.function, self.sequence)

but your point that iteration is possible even without, with the old protocol, is well made.
-- Terry Jan Reedy

To illustrate the distinction that someone (I think Steven D'Aprano) makes, I think these two (modestly tested, but could have flaws) implementations are both sensible for some purposes. Both are equally "obvious," yet they are different:
I wasn't sure what to set self._len to where it doesn't make sense. I thought of None, which makes len(mo) raise one exception, or -1, which makes len(mo) raise a different exception. I just chose an arbitrary "big" value in the above implementation. mo.__length_hint__() is a possibility, but that is specialized, not a way of providing a response to len(mo). I don't have to, but I do keep around mo._seqs as a handle to the underlying sequences. In concept those could be re-inspected for other properties as the user of the classes desired. On Sat, Dec 1, 2018 at 12:28 PM David Mertz <mertz@gnosis.cx> wrote:
-- Keeping medicines from the bloodstreams of the sick; food from the bellies of the hungry; books from the hands of the uneducated; technology from the underdeveloped; and putting advocates of freedom in prisons. Intellectual property is to the 21st century what the slave trade was to the 16th.
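The __length_hint__ protocol mentioned above is exposed through operator.length_hint: it returns the hint when an iterator provides one, and a caller-supplied default otherwise:

```python
import operator

print(operator.length_hint(iter([10, 20, 30])))   # 3: list iterators hint
print(operator.length_hint(iter(range(5))))       # 5: so do range iterators

# A generator provides no hint, so the default (here 0) comes back.
gen = (x * x for x in range(100))
print(operator.length_hint(gen, 0))               # 0
```

As noted, this is only an optimization hint for consumers like list(); it is deliberately not wired up to len(), which must be exact.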

I raised a related problem a while back when I found that random.sample can only take a sequence. The example I gave was randomly sampling points on a 2D grid to initialize a board for Conway's Game of Life:
It seems like there should be some way to pass along the information that the size *is* known, but I couldn't think of any way of passing that info along without adding massive amounts of complexity everywhere. If map is able to support len() under certain circumstances, it makes sense that other iterators and generators would be able to do the same. You might even want a way to annotate a generator function with logic about how it might support len(). I don't have an answer to this problem, but I hope this provides some sense of the scope of what you're asking. On Mon, Nov 26, 2018 at 3:36 PM Kale Kundert <kale@thekunderts.net> wrote:
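One way to "pass the size along", sketched here with an invented wrapper class (not an existing API), is to pair an iterator with its externally known length so that len() works:

```python
class SizedIter:
    """Hypothetical wrapper: an iterator plus an externally known length."""
    def __init__(self, it, length):
        self._it = iter(it)
        self._len = length

    def __iter__(self):
        return self._it

    def __len__(self):
        return self._len

# A 3x3 grid of points, produced lazily but with a known size of 9.
points = SizedIter(((x, y) for x in range(3) for y in range(3)), 9)
print(len(points))        # 9, without consuming anything
print(len(list(points)))  # 9 items actually produced
```

Note that random.sample would still reject this wrapper, since it wants indexing as well as len(), which illustrates the point above: propagating size information alone is not enough, and a general solution gets complicated fast.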
participants (19)
-
Abe Dillon
-
Adam Johnson
-
Adrien Ricocotam
-
Anders Hovmöller
-
Chris Angelico
-
Chris Barker
-
Chris Barker - NOAA Federal
-
danish bluecheese
-
David Mertz
-
E. Madison Bray
-
Greg Ewing
-
Jonathan Fine
-
Kale Kundert
-
Michael Selik
-
Paul Moore
-
Paul Svensson
-
Steven D'Aprano
-
Terry Reedy
-
Todd