[Python-ideas] Integrate some itertools into the Python syntax
Andrew Barnert
abarnert at yahoo.com
Mon Mar 21 20:45:08 EDT 2016
On Mar 21, 2016, at 16:06, Michel Desmoulin <desmoulinmichel at gmail.com> wrote:
In addition to all the issues Chris raised...
>
> So my first proposal is to be able to do:
>
> def stop(element):
>     return element > 4
> print(numbers[:stop])
The first issue is pretty minor compared to the later ones, but already shows the problems of thinking about only lists and iterators, so I won't skip it:
Applying this to a sequence that copies when slicing makes sense. Applying it to an iterator (together with your #2) makes sense. Applying it to a type that returns views on slicing, like a NumPy array, doesn't necessarily make sense, especially if that type is mutable (as NumPy arrays are). (Should the view change if the first element > 4 changes?)
> Slicing any iterable
> ======================
>
> Now, while I do like islice, I miss the straightforwardness of [:]:
>
>
> from itertools import islice
>
> def func_accepting_any_iterable(foo):
>     return bar(islice(foo, 3, 7))
>
> It's verbose, and renders the [3:7] syntax almost useless if you don't
> have control over the code creating the iterable you are going to
> process since you don't know what it's going to be.
>
> So the second proposal is to allow:
>
> def func_accepting_any_iterable(foo):
>     return bar(foo[3:7])
>
> The slicing would then return a list if it's a list, a tuple if it's a
> tuple, and an islice(generator) if it's a generator. If somebody uses a
> negative index, it would then raise a ValueError like islice.
And what if it's a dict? Returning an islice of an iterator over the dict _works_, but it's almost certainly not what you want, because an iterator over the dict gives you the dict's keys, not its values. If d[3:7] means anything, I'd expect it to mean something like {k: v for (k, v) in d.items() if 3<=k<7}, or {v for ...same...}, not 4 arbitrary keys out of the dict. (Imagine that d's keys are all integers. Surely you'd want it to include d[3], d[4], d[5], and d[6], right?)
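A minimal sketch of the difference, assuming a made-up dict `d` with integer keys. On Python 3.7+ dicts iterate in insertion order, so the positional result below is deterministic; on older versions the order is arbitrary, which only makes the positional reading less useful.

```python
from itertools import islice

# Hypothetical dict with integer keys, inserted out of numeric order.
d = {10: "ten", 4: "four", 7: "seven", 3: "three",
     5: "five", 6: "six", 1: "one", 2: "two"}

# What islice-style slicing does: positions 3..6 of the *key* iterator.
by_position = list(islice(d, 3, 7))

# What a reader would more plausibly expect: entries whose keys are in [3, 7).
by_key = {k: v for k, v in d.items() if 3 <= k < 7}

print(by_position)      # keys picked by iteration position, not by value
print(sorted(by_key))   # keys 3, 4, 5 and 6
```

The positional slice includes key 1 and leaves out key 4, even though 4 is in the range 3..6 and 1 is not.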
And what if it's one of the collections on PyPI that already provides a non-islice-like meaning for slice syntax? For example, many of the sorted-dict types do key slicing, which returns you something like {k: v for (k, v) in d.items() if 3<=k<7} but still sorted, in log rather than linear time, and sometimes a view rather than a copy.
And what if it's a collection for which indexing makes no sense, not even the wrong kind of sense, like a set? It'll slice out 4 values in arbitrary order iff there are at least 7 values. What's the good in that?
And what if it's an iterator that isn't a generator? Does it just return some arbitrary new iterator type, even if the input type provided some additional interface on top of Iterator (as generators do)?
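To illustrate the lost interface, here is a small sketch with a hypothetical generator `countdown`. The generator has `send`, but the object that `islice` returns is a plain iterator without it.

```python
from itertools import islice

def countdown(n):
    # Hypothetical generator, used only for illustration.
    while n > 0:
        yield n
        n -= 1

gen = countdown(10)
sliced = islice(gen, 3, 7)

# Generators offer send/throw/close on top of the Iterator protocol.
has_send_before = hasattr(gen, "send")
# The islice object only supports iteration.
has_send_after = hasattr(sliced, "send")

print(has_send_before, has_send_after)
print(list(sliced))
```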
Also, even with iterators a lot of things you do with slicing no longer make sense, but would run and silently do the wrong thing. For example:
    if matches_header(seq[:6]):
        handle_body(seq[6:])
If seq is an iterator, that first line is going to consume the first 6 elements, which means the second is now going to skip 6 more and start at index 12 rather than index 6. It will be a lot less fun to debug "why do some of my messages lose their first word or so, but others work fine?" than the current "why do I sometimes get a TypeError telling me that type list_iterator isn't indexable?"
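The effect can be reproduced today by spelling the proposed slices as `islice` calls. This is a sketch with stand-in data (`range(20)`), not the header/body code above:

```python
from itertools import islice

data = list(range(20))

# With a list, the two slices are independent views of the same data.
header_list = data[:6]
body_list = data[6:]

# With an iterator, the first "slice" consumes elements,
# so the second one skips six *more* before it starts.
it = iter(data)
header_iter = list(islice(it, 0, 6))
body_iter = list(islice(it, 6, None))

print(body_list[0])   # first body element from the list
print(body_iter[0])   # first body element from the iterator
```

The headers match in both cases, so the bug only shows up as missing data at the start of the body.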
> Chaining iterable
> ==================
>
> Iterating on heterogeneous iterables is not clear.
>
> You can add lists with lists and tuples with tuples, but if you need
> more, then you need itertools.chain. Few people know about it, so I
> usually see duplicate loops and conversion to lists/tuples.
>
> So my first proposal is to overload the "&" operator so that anything
> defining __iter__ can be used with it.
So, what does {1} & {1, 2, 3} do? This is a trick question: sets already define the & operator, as do other set-like collections. So this proposal either has to treat sets as not iterable, or break the set interface.
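For reference, a short sketch of the two meanings that would collide. Today `&` on sets is intersection, while chaining the same two sets yields every element of both:

```python
from itertools import chain

a = {1}
b = {1, 2, 3}

# Existing meaning of & for sets: intersection.
intersection = a & b

# What the proposal wants & to mean: concatenated iteration.
# sorted() is used only because set iteration order is not guaranteed.
chained = sorted(chain(a, b))

print(intersection)
print(chained)
```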
All of these are part of the same problem: you're assuming that all iterables are either sequences or generators, but many of them--including two very important built-in types, not to mention all of the builtin types' iterators--are not.