[Python-3000] map() Returns Iterator

Nicko van Someren nicko at nicko.org
Mon Aug 6 14:10:11 CEST 2007


On 4 Aug 2007, at 06:11, Kurt B. Kaiser wrote:

> Although there has been quite a bit of discussion on dropping reduce()
> and retaining map(), filter(), and zip(), there has been less discussion
> (at least that I can find) on changing them to return iterators instead
> of lists.
>
> I think of map() and filter() as sequence transformers.  To me, it's
> an unexpected semantic change that the result is no longer a list.

I agree.  In almost all of the cases where I would naturally use map
rather than a list comprehension, either I want the transformed list
(rather than something that can generate it) or I want the function
explicitly called on all the elements of the source list right away
(rather than some time later, or perhaps never).

> In existing Lib/ code, it's twice as likely that the result of map()
> will be assigned than used as an iterator in a flow control
> statement.
>
> If the statistics on the usage of map() stay the same, 2/3 of the time
> the current implementation will require code like
>
>         foo = list(map(fcn, bar)).

I presume that if this semantic change stays we are going to have to  
add something to 2to3 which will force the creation of a list from  
the result of any call to map.
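
As a minimal sketch of what that wrapping amounts to, assuming the Python 3
behaviour under discussion (map() returning a lazy iterator); fcn and bar
are just the placeholder names from the quoted text, given trivial
definitions here:

```python
# Sketch assuming Python 3 semantics, where map() returns a lazy iterator.
def fcn(x):
    return x * 2

bar = [1, 2, 3]

lazy = map(fcn, bar)               # an iterator, not a list
foo = list(map(fcn, bar))          # the explicit wrapping a fixer would add

is_list = isinstance(lazy, list)   # False under Python 3
first_pass = list(lazy)            # consumes the iterator
second_pass = list(lazy)           # empty: the iterator is already exhausted
```

The second pass coming back empty is the kind of difference that code
written against the list-returning map() would not expect.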

> map() and filter() were retained primarily because they can produce
> more compact and readable code when used correctly.  Adding list() most
> of the time seems to diminish this benefit, especially when combined with
> a lambda as the first arg.
>
> There are a number of instances where map() is called for its side
> effect, e.g.
>
>         map(print, line_sequence)
>
> with the return result ignored.  In py3k this has caused many silent
> failures.  We've been weeding these out, and there are only a couple
> left, but there are no doubt many more in 3rd party code.

I'm sure that there are lots of these.  Other scenarios which will  
make for ugly bugs include things like map(db_commit,  
changed_record_list).
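
A small sketch of that failure mode, assuming Python 3 semantics.  Here
db_commit is a hypothetical stand-in that only records its calls, not a
real database API:

```python
committed = []

def db_commit(record):
    # Hypothetical stand-in for a real commit; it only records the call.
    committed.append(record)

changed_record_list = ["a", "b", "c"]

# Result discarded: with a lazy map() the function is never called.
map(db_commit, changed_record_list)
after_bare_map = list(committed)

# An explicit loop calls the function regardless of map()'s semantics.
for record in changed_record_list:
    db_commit(record)
after_loop = list(committed)
```

Nothing is raised by the bare map() call, which is why such bugs stay
silent.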

> The situation with filter() is similar, though it's not used purely
> for side effects.  zip() is infrequently used.  However, IMO for
> consistency they should all act the same way.

Filter returning an iterator is going to break lots of code which  
says things like:
	interesting_things = filter(predicate, things)
	...
	if foo in interesting_things: ...

Again, if this semantic change stays then 2to3 had better fix it.  Arguably
2to3 could translate a call to filter() into a list comprehension.
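
To illustrate, here is a sketch assuming Python 3 semantics; predicate and
things are given made-up definitions purely for the example:

```python
def predicate(x):
    return x % 2 == 0

things = [1, 2, 3, 4, 5, 6]

# With a lazy filter(), each membership test consumes part of the iterator.
interesting_things = filter(predicate, things)
first_check = 4 in interesting_things    # finds 4, consuming 2 and 4
second_check = 2 in interesting_things   # 2 is already gone

# The list-comprehension form keeps the old, repeatable behaviour.
interesting_list = [x for x in things if predicate(x)]
list_checks = (4 in interesting_list, 2 in interesting_list)
```

The second test on the iterator fails even though 2 passes the predicate,
whereas the list form answers both tests correctly.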

> I've seen GvR slides suggesting replacing map() et al. with list
> comprehensions, but never with generator expressions.
>
> PEP 3100: "Make built-ins return an iterator where appropriate
> (e.g. range(), zip(), map(), filter(), etc.)"
>
> It makes sense for range() to return an iterator.  I have my doubts on
> map(), filter(), and zip().  Having them return iterators seems to
> be a premature optimization.  Could something be done in the ast phase
> of compilation instead?

Looking through code I've written, I suspect that basically whenever  
I use map(), filter() or zip() in any context other than in a for...  
loop I am after the concrete list and not an iterator for it.

I would hesitate to suggest that it be optimised at compile time,
irrespective of the issues resulting from these being built-ins
rather than keywords (and thus able to be reassigned).  Suppose we have
a function f() that has a printing side effect; then we have:
	for j in [f(i) for i in range(3)]: print j
	f:  0
	f:  1
	f:  2
	0
	1
	2
And we have
	for j in (f(i) for i in range(3)): print j
	f:  0
	0
	f:  1
	1
	f:  2
	2
We're talking about changing the behaviour of:
	for j in map(f, range(3)): print j
from the former to the latter.  If we did some AST-phase optimisation
so that most of the time map() returned a list but it gave an
iterator if it was used inside a for... loop, I think it would be
dreadfully confusing.
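
The same comparison can be sketched in a form that runs under Python 3,
recording events in a list rather than printing them.  The trace() helper
and the log list are inventions for this illustration only:

```python
log = []

def f(i):
    log.append("f: %d" % i)   # stands in for the printing side effect
    return i

def trace(make_iterable):
    # Build the iterable only after the log is cleared, so that eager
    # evaluation is recorded too.
    log.clear()
    for j in make_iterable():
        log.append(str(j))
    return list(log)

eager = trace(lambda: [f(i) for i in range(3)])    # list comprehension
lazy = trace(lambda: (f(i) for i in range(3)))     # generator expression
mapped = trace(lambda: map(f, range(3)))           # iterator-returning map()
```

Under these assumptions eager groups all the "f:" entries first, while
lazy and mapped interleave them with the loop body, matching the two
transcripts above.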

IMHO, when I read "Make built-ins return an iterator where
appropriate..." I'm inclined to think that it's appropriate for
range() but not for the others.

	Nicko


