[Python-ideas] Suggested MapView object (Re: __len__() for map())

Steven D'Aprano steve at pearwood.info
Tue Dec 11 09:47:30 EST 2018


On Mon, Dec 10, 2018 at 05:15:36PM -0800, Chris Barker via Python-ideas wrote:
[...]
> I'm still confused -- what's so wrong with:
> 
> list(map(func, some_iterable))
> 
> if you need a sequence?

You might need a sequence. Why do you think that has to be an *eager* 
sequence?

I can think of two obvious problems with eager sequences: space and 
time. They can use too much memory, and they can take too much time to 
generate them up-front and too much time to reap when they become 
garbage. And if you have an eager sequence, and all you want is the 
first item, you still have to generate all of them even though they 
aren't needed.

We can afford to be profligate with memory when the data is small, but 
eventually you run into cases where having two copies of the data is one 
copy too many.


> You can, of course mike lazy-evaluated sequences (like range), and so you
> could make a map-like function that required a sequence as input, and would
> lazy evaluate that sequence. This could be useful if you weren't going to
> work with the entire collection, 

Or even if you *are* going to work with the entire collection, but you 
don't need them all at once. I once knew a guy whose fondest dream was 
to try the native cuisine of every nation of the world ... but not all 
in one meal.

This is a classic time/space tradeoff: for the cost of calling the 
mapping function anew each time we index the sequence, we can avoid 
allocating a potentially huge list and calling a potentially expensive 
function up front for items we're never going to use. Instead, we call 
it only on demand.

These are the same principles that justify (x)range and dict views. Why 
eagerly generate a list up front, if you only need the values one at a 
time on demand? Why make a copy of the dict keys, if you don't need a 
copy? These are not rhetorical questions.

This is about avoiding the need to make unnecessary copies for those 
times we *don't* need an eager sequence generated up front, keeping the 
laziness of iterators and the random-access of sequences.

map(func, sequence) is a great candidate for this approach. It has to 
hold onto a reference to the sequence even as an iterator. The function 
is typically side-effect free (a pure function), and if it isn't, 
"consenting adults" applies. We've already been told there's at least 
one major Python project, Sage, where this would have been useful.

There's a major functional language, Haskell, where nearly all sequence 
processing follows this approach.

I suggest we provide a separate mapview() type that offers only the lazy 
sequence API, without trying to be an iterator at the same time. If you 
want an eager sequence, or an iterator, they're only a single function 
call away:

    list(mapview_instance)
    iter(mapview_instance)  # or just stick to map()

Rather than trying to guess whether people want to treat their map 
objects as sequences or iterators, we let them choose which they want 
and be explicit about it.

Consider the history of dict.keys(), values() and items() in Python 2. 
Originally they returned eager lists. Did we try to retrofit view-like 
and iterator-like behaviour onto the existing dict.keys() method, 
returning a cunning object which somehow turned from a list to a view to 
an iterator as needed? Hell no! We introduced *six new methods* on 
dicts:

- dict.iterkeys()
- dict.viewkeys()

and similar for items() and values().

Compared to that, adding a single variant on map() that expects a 
sequence and returns a view on the sequence seems rather timid.



-- 
Steve


More information about the Python-ideas mailing list