[Python-ideas] Suggested MapView object (Re: __len__() for map())
Steven D'Aprano
steve at pearwood.info
Tue Dec 11 09:47:30 EST 2018
On Mon, Dec 10, 2018 at 05:15:36PM -0800, Chris Barker via Python-ideas wrote:
[...]
> I'm still confused -- what's so wrong with:
>
> list(map(func, some_iterable))
>
> if you need a sequence?
You might need a sequence. Why do you think that has to be an *eager*
sequence?
I can think of two obvious problems with eager sequences: space and
time. They can use too much memory, and they can take too much time to
generate them up-front and too much time to reap when they become
garbage. And if you have an eager sequence, and all you want is the
first item, you still have to generate all of them even though they
aren't needed.
We can afford to be profligate with memory when the data is small, but
eventually you run into cases where having two copies of the data is one
copy too many.
> You can, of course mike lazy-evaluated sequences (like range), and so you
> could make a map-like function that required a sequence as input, and would
> lazy evaluate that sequence. This could be useful if you weren't going to
> work with the entire collection,
Or even if you *are* going to work with the entire collection, but you
don't need them all at once. I once knew a guy whose fondest dream was
to try the native cuisine of every nation of the world ... but not all
in one meal.
This is a classic time/space tradeoff: for the cost of calling the
mapping function anew each time we index the sequence, we can avoid
allocating a potentially huge list and calling a potentially expensive
function up front for items we're never going to use. Instead, we call
it only on demand.
These are the same principles that justify (x)range and dict views. Why
eagerly generate a list up front, if you only need the values one at a
time on demand? Why make a copy of the dict keys, if you don't need a
copy? These are not rhetorical questions.
This is about avoiding the need to make unnecessary copies for those
times we *don't* need an eager sequence generated up front, keeping the
laziness of iterators and the random-access of sequences.
map(func, sequence) is a great candidate for this approach. It has to
hold onto a reference to the sequence even as an iterator. The function
is typically side-effect free (a pure function), and if it isn't,
"consenting adults" applies. We've already been told there's at least
one major Python project, Sage, where this would have been useful.
There's a major functional language, Haskell, where nearly all sequence
processing follows this approach.
I suggest we provide a separate mapview() type that offers only the lazy
sequence API, without trying to be an iterator at the same time. If you
want an eager sequence, or an iterator, they're only a single function
call away:
list(mapview_instance)
iter(mapview_instance) # or just stick to map()
Rather than trying to guess whether people want to treat their map
objects as sequences or iterators, we let them choose which they want
and be explicit about it.
Consider the history of dict.keys(), values() and items() in Python 2.
Originally they returned eager lists. Did we try to retrofit view-like
and iterator-like behaviour onto the existing dict.keys() method,
returning a cunning object which somehow turned from a list to a view to
an iterator as needed? Hell no! We introduced *six new methods* on
dicts:
- dict.iterkeys()
- dict.viewkeys()
and similar for items() and values().
Compared to that, adding a single variant on map() that expects a
sequence and returns a view on the sequence seems rather timid.
--
Steve
More information about the Python-ideas
mailing list