[Python-ideas] Integrate some itertools into the Python syntax

Andrew Barnert abarnert at yahoo.com
Mon Mar 21 20:45:08 EDT 2016


On Mar 21, 2016, at 16:06, Michel Desmoulin <desmoulinmichel at gmail.com> wrote:

In addition to all the issues Chris raised...
> 
> So my first proposal is to be able to do:
> 
> def stop(element):
>    return element > 4
> print(numbers[:stop])

The first issue is pretty minor compared to the later ones, but already shows the problems of thinking about only lists and iterators, so I won't skip it:

Applying this to a sequence that copies when slicing makes sense. Applying it to an iterator (together with your #2) makes sense. Applying it to a type that returns views on slicing, like a NumPy array, doesn't necessarily make sense, especially if that type is also mutable, as NumPy arrays are. (Should the view change if the first element > 4 changes?)
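A minimal sketch of the copy-versus-view question, using the standard library's memoryview as a stand-in for a NumPy array (an assumption made only to keep the example dependency-free; the data values are made up):

```python
# bytearray slicing copies; memoryview slicing returns a view on the same buffer.
data = bytearray([1, 2, 9, 3])

# With stop = "element > 4", numbers[:stop] would cover the first two elements.
copy_slice = data[:2]               # independent copy
view_slice = memoryview(data)[:2]   # live view onto data

# Now the first element itself becomes > 4.
data[0] = 9

print(list(copy_slice))     # the copy still holds the old values
print(view_slice.tolist())  # the view sees the change, but its extent was fixed
                            # when the slice was taken; the predicate is not re-run
```

The copy is unaffected, while the view now starts with an element that would have stopped the slice immediately, which is the ambiguity raised above.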

> Slicing any iterable
> ======================
> 
> Now, while I do like islice, I miss the straightforwardness of [:]:
> 
> 
> from itertools import islice
> 
> def func_accepting_any_iterable(foo):
>    return bar(islice(foo, 3, 7))
> 
> It's verbose, and renders the [3:7] syntax almost useless if you don't
> have control over the code creating the iterable you are going to
> process since you don't know what it's going to be.
> 
> So the second proposal is to allow:
> 
> def func_accepting_any_iterable(foo):
>    return bar(foo[3:7])
> 
> The slicing would then return a list if it's a list, a tuple if it's a
> tuple, and an islice(generator) if it's a generator. If somebody uses a
> negative index, it would then raise a ValueError like islice.

And what if it's a dict? Returning an islice of an iterator over the dict _works_, but it's almost certainly not what you want, because an iterator over the dict gives you the dict's keys, not its values. If d[3:7] means anything, I'd expect it to mean something like {k: v for (k, v) in d.items() if 3<=k<7}, or {v for ...same...}, not 4 arbitrary keys out of the dict. (Imagine that d's keys are all integers. Surely you'd want it to include d[3], d[4], d[5], and d[6], right?)
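To make the dict case concrete, here is a small sketch. The dict `d` and its contents are invented for illustration; it compares what an islice over the dict yields with the key-range reading described above.

```python
from itertools import islice

# Hypothetical dict with integer keys, inserted out of numeric order.
# (On Python 3.7+ dicts iterate in insertion order; before that the
# order was arbitrary.)
d = {10: "j", 3: "c", 5: "e", 4: "d", 6: "f", 1: "a", 7: "g", 0: "z"}

# What the proposal would do: take the keys at positions 3..6 of iteration.
positional = list(islice(d, 3, 7))

# What d[3:7] more plausibly ought to mean: items whose keys fall in [3, 7).
by_key = {k: v for k, v in d.items() if 3 <= k < 7}

print(positional)  # keys only, chosen by position, no values
print(by_key)      # the items for keys 3, 4, 5 and 6
```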

And what if it's one of the collections on PyPI that already provides a non-islice-like meaning for slice syntax? For example, many of the sorted-dict types do key slicing, which returns you something like {k: v for (k, v) in d.items() if 3<=k<7} but still sorted, in log rather than linear time, and sometimes a view rather than a copy.

And what if it's a collection for which indexing makes no sense, not even the wrong kind of sense, like a set? It'll slice out 4 values in arbitrary order iff there are at least 7 values. What's the good in that?
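A quick sketch of the set case, with made-up contents. The only thing you can rely on is how many values come back, not which ones:

```python
from itertools import islice

small = {"a", "b", "c", "d", "e"}                 # 5 elements
large = {"a", "b", "c", "d", "e", "f", "g", "h"}  # 8 elements

# Stand-ins for small[3:7] and large[3:7] under the proposal.
picked_small = list(islice(small, 3, 7))  # the set runs out after 2 values
picked_large = list(islice(large, 3, 7))  # 4 values, but which 4 is arbitrary

print(len(picked_small), len(picked_large))
```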

And what if it's an iterator that isn't a generator? Does it just return some arbitrary new iterator type, even if the input type provided some additional interface on top of Iterator (as generators do)?
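As a sketch of the lost-interface problem, assuming a hypothetical generator `running_total` that relies on send(): wrapping it in islice yields a plain iterator that no longer supports that method.

```python
from itertools import islice

def running_total():
    """Hypothetical generator: accepts numbers via send(), yields the running sum."""
    total = 0
    while True:
        value = yield total
        if value is not None:
            total += value

gen = running_total()
next(gen)                    # prime the generator
after_send = gen.send(5)     # generators support send()

# Stand-in for running_total()[3:7] under the proposal.
sliced = islice(running_total(), 3, 7)

gen_has_send = hasattr(gen, "send")
sliced_has_send = hasattr(sliced, "send")
print(gen_has_send, sliced_has_send)
```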

Also, even with iterators a lot of things you do with slicing no longer make sense, but would run and silently do the wrong thing. For example:

    if matches_header(seq[:6]):
        handle_body(seq[6:])

If seq is an iterator, that first line is going to consume the first 6 elements, which means the second line will skip 6 more and start on the 13th element rather than the 7th. It will be a lot less fun to debug "why do some of my messages lose their first word or so, but others work fine?" than the current "why do I sometimes get a TypeError telling me that type list_iterator isn't indexable?"
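The header/body snippet can be turned into a runnable sketch. Here `matches_header`, `handle_body`, and the `take` parameter are hypothetical stand-ins, with `take(seq, start, stop)` playing the role of `seq[start:stop]`:

```python
from itertools import islice

def matches_header(chunk):
    # Hypothetical check that reads the whole chunk.
    return list(chunk) == [0, 1, 2, 3, 4, 5]

def handle_body(chunk):
    # Hypothetical handler: just collect what it was given.
    return list(chunk)

def process(seq, take):
    # take(seq, start, stop) stands in for seq[start:stop].
    if matches_header(take(seq, 0, 6)):
        return handle_body(take(seq, 6, None))
    return None

# Real slicing on a list: the body starts right after the header.
list_result = process(list(range(20)), lambda s, a, b: s[a:b])

# islice on an iterator: the header check consumed 6 items, then 6 more are skipped.
iter_result = process(iter(range(20)), islice)

print(list_result[0], iter_result[0])
```

Both calls run without error, which is exactly the problem: the iterator version silently drops six elements of the body.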

> Chaining iterable
> ==================
> 
> Iterating over heterogeneous iterables is not clear.
> 
> You can add lists with lists and tuples with tuples, but if you need
> more, then you need itertools.chain. Few people know about it, so I
> usually see duplicate loops and conversion to lists/tuples.
> 
> So my first proposal is to overload the "&" operator so that anything
> defining __iter__ can be used with it.

So, what does {1} & {1, 2, 3} do? This is a trick question: sets already define the & operator, as do other set-like collections. So this proposal either has to treat sets as not iterable, or break the set interface.
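For reference, a short sketch of the conflict, using the same two sets: the existing meaning of & next to what chaining would produce.

```python
from itertools import chain

a = {1}
b = {1, 2, 3}

# Existing meaning of & on sets: intersection.
intersection = a & b

# What the proposal wants & to mean: iterate a, then iterate b.
chained = list(chain(a, b))

print(intersection)     # a set containing only 1
print(sorted(chained))  # four items, with 1 appearing twice
```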

All of these are part of the same problem: you're assuming that all iterables are either sequences or generators, but many of them--including two very important built-in types (dict and set), not to mention all of the built-in types' iterators--are not.


