[Python-ideas] Integrate some itertools into the Python syntax

Michel Desmoulin desmoulinmichel at gmail.com
Tue Mar 22 08:45:21 EDT 2016



On 22/03/2016 01:45, Andrew Barnert wrote:
> On Mar 21, 2016, at 16:06, Michel Desmoulin <desmoulinmichel at gmail.com> wrote:
> 
> In addition to all the issues Chris raised...
>>
>> So my first proposal is to be able to do:
>>
>> def stop(element):
>>    return element > 4
>> print(numbers[:stop])
> 
> The first issue is pretty minor compared to the later ones, but already shows the problems of thinking about only lists and iterators, so I won't skip it:
> 
> Applying this to a sequence that copies when slicing makes sense. Applying it to an iterator (together with your #2) makes sense. Applying it to a type that returns views on slicing, like a NumPy array, doesn't necessarily make sense, especially if that type is mutable, like a NumPy array. (Should the view change if the first element > 4 changes?)
> 

NumPy is not part of the stdlib. We should not block a feature in Python
because it will not immediately benefit a third-party library, even a
famous one. The proposal doesn't hurt NumPy: it overrides __getitem__, so
its arrays keep their own slicing semantics and are not affected by the
default behavior.

Besides, you generally try not to mix NumPy and non-NumPy manipulation
code, as NumPy has its own semantics (no for loop, special slicing, etc.).
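
For reference, the effect of the first proposal can be approximated today
with itertools.takewhile. This is only a sketch, assuming that
numbers[:stop] would end the slice at the first element for which stop()
returns true; the numbers list is made up for illustration.

```python
from itertools import takewhile

def stop(element):
    return element > 4

# Hypothetical sample data, not taken from the original proposal.
numbers = [1, 2, 3, 4, 5, 6, 1]

# Assumed meaning of numbers[:stop]: keep elements until stop() is true.
result = list(takewhile(lambda element: not stop(element), numbers))
print(result)  # [1, 2, 3, 4]
```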


>> Slicing any iterable
>> ======================
>>
>> Now, while I do like islice, I miss the straightforwardness of [:]:
>>
>>
>> from itertools import islice
>>
>> def func_accepting_any_iterable(foo):
>>    return bar(islice(foo, 3, 7))
>>
>> It's verbose, and renders the [3:7] syntax almost useless if you don't
>> have control over the code creating the iterable you are going to
>> process since you don't know what it's going to be.
>>
>> So the second proposal is to allow:
>>
>> def func_accepting_any_iterable(foo):
>>    return bar(foo[3:7])
>>
>> The slicing would then return a list if it's a list, a tuple if it's a
>> tuple, and an islice(generator) if it's a generator. If somebody uses a
>> negative index, it would then raise a ValueError like islice.
> 
> And what if it's a dict? Returning an islice of an iterator over the dict _works_, but it's almost certainly not what you want, because an iterator over the dict gives you the dict's keys, not its values. If d[3:7] means anything, I'd expect it to mean something like {k: v for (k, v) in d.items() if 3<=k<7}, or {v for ...same...}, not 4 arbitrary keys out of the dict. (Imagine that d's keys are all integers. Surely you'd want it to include d[3], d[4], d[5], and d[6], right?)
> 

This is a point to discuss. I would raise a ValueError: trying to slice
a dict is almost always a mistake.

E.g.: if you design a function that needs an argument to be sliceable
(even with islice), you usually don't want people to pass in dicts, and
in the odd case where you do (to sample, maybe?), you would convert it
manually. It would be a rare use case compared to the many occasions
where you need more generic slicing.
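
To make the intended behavior concrete, here is a minimal sketch of it as
a plain function. The name generic_slice and the exact type checks are
assumptions for illustration only; they are not part of the proposal.

```python
from collections.abc import Mapping, Sequence, Set
from itertools import islice

def generic_slice(obj, start, stop):
    """Hypothetical helper mimicking the proposed obj[start:stop]."""
    # Dicts and sets: slicing is almost always a mistake, so refuse.
    if isinstance(obj, (Mapping, Set)):
        raise ValueError("cannot slice %s" % type(obj).__name__)
    # Sequences keep their native slicing (list -> list, tuple -> tuple).
    if isinstance(obj, Sequence):
        return obj[start:stop]
    # Any other iterable falls back to islice, which itself raises
    # ValueError on negative indices.
    return islice(obj, start, stop)

print(generic_slice([0, 1, 2, 3, 4, 5, 6, 7], 3, 7))    # [3, 4, 5, 6]
print(list(generic_slice(iter(range(10)), 3, 7)))       # [3, 4, 5, 6]
```

Passing a dict, e.g. generic_slice({1: "a"}, 3, 7), raises ValueError in
this sketch.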

> And what if it's one of the collections on PyPI that already provides a non-islice-like meaning for slice syntax? For example, many of the sorted-dict types do key slicing, which returns you something like {k: v for (k, v) in d.items() if 3<=k<7} but still sorted, in log rather than linear time, and sometimes a view rather than a copy.
> 

See the point about Numpy.


> And what if it's a collection for which indexing makes no sense, not even the wrong kind of sense, like a set? It'll slice out 4 values in arbitrary order iff there are at least 7 values. What's the good in that?
> 

See the point about dict.

> And what if it's an iterator that isn't a generator? Does it just return some arbitrary new iterator type, even if the input type provided some additional interface on top of Iterator (as generators do)?
> 

This can be discussed and is more about the proper implementation; it
does not invalidate the idea.

> Also, even with iterators a lot of things you do with slicing no longer make sense, but would run and silently do the wrong thing. For example:
> 
>     if matches_header(seq[:6]):
>         handle_body(seq[6:])
> 
> If seq is an iterator, that first line is going to consume the first 6 elements, which means the second is now going to start on the 12th element rather than the 6th. It will be a lot less fun to debug "why do some of my messages lose their first word or so, but others work fine?" than the current "why do I sometimes get a TypeError telling me that type list_iterator isn't indexable?"

Either you use generators or you don't. If you use generators, you know
they will be consumed when you pass them around. This has nothing to do
with the slicing syntax.

The one problem I can see is when:

- seq is a generator you didn't produce;
- you don't know it's a generator;
- you get surprising behavior because slicing causes no error.

It's an edge case, and it is worth considering whether it is going to be
a blocker or not.
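
The surprise itself can be reproduced today with islice, which is what
the slice syntax would presumably map to for iterators. A small sketch,
with range(20) standing in for a hypothetical message:

```python
from itertools import islice

# Hypothetical input: an iterator the caller may mistake for a list.
seq = iter(range(20))

header = list(islice(seq, 6))       # consumes items 0..5
body = list(islice(seq, 6, None))   # skips 6 MORE items before yielding

print(header)   # [0, 1, 2, 3, 4, 5]
print(body[0])  # 12, whereas a list would give 6
```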

Also, one alternative is to only add slicing to all objects returned by
iter() in the stdlib. This would force people to explicitly mark that
they know what they are doing, and while less convenient, remains very
handy.
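
A rough sketch of what such an iterator could look like, written as a
wrapper class since iter() itself cannot be changed from pure Python. The
class name SliceableIterator is made up for this example, and delegating
to islice is an assumption about the implementation.

```python
from itertools import islice

class SliceableIterator:
    """Hypothetical iterator wrapper that accepts slice syntax."""

    def __init__(self, iterable):
        self._it = iter(iterable)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __getitem__(self, index):
        if not isinstance(index, slice):
            raise TypeError("only slices are supported")
        # islice raises ValueError for negative start/stop/step.
        sliced = islice(self._it, index.start, index.stop, index.step)
        return SliceableIterator(sliced)

def count_up():
    # A generator that never ends, to show that slicing stays lazy.
    n = 0
    while True:
        yield n
        n += 1

print(list(SliceableIterator(count_up())[3:7]))  # [3, 4, 5, 6]
```

The explicit wrapping is the "I know what I am doing" marker: the caller
has acknowledged that the underlying iterator will be consumed.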

>> Chaining iterable
>> ==================
>>
>> Iterating over heterogeneous iterables is not obvious.
>>
>> You can add lists with lists and tuples with tuples, but if you need
>> more, then you need itertools.chain. Few people know about it, so I
>> usually see duplicate loops and conversion to lists/tuples.
>>
>> So my first proposal is to overload the "&" operator so that anything
>> defining __iter__ can be used with it.
> 
> So, what does {1} & {1, 2, 3} do? This is a trick question: sets already define the & operator, as do other set-like collections. So this proposal either has to treat sets as not iterable, or break the set interface.

Indeed, I forgot about sets. Maybe there is another operator that would
do the trick, such as "<<". We can't use "+" as it would be confusing,
and sets override a LOT of operators.
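
For comparison, this is what the existing tools do today: chain() handles
any mix of iterables, while "&" on sets already means intersection. The
sample values are arbitrary.

```python
from itertools import chain

# chain() accepts any mix of iterables: list, tuple, generator...
mixed = list(chain([1, 2], (3, 4), (n for n in range(5, 7))))
print(mixed)  # [1, 2, 3, 4, 5, 6]

# "&" is already set intersection, so it cannot also mean "chain".
common = {1} & {1, 2, 3}
print(common)  # {1}
```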

Anyway, let's not throw the baby out with the bathwater. This is the
least important part of the proposal, and any part can be changed,
improved or ditched. That's what Python-ideas is for.

> 
> All of these are part of the same problem: you're assuming that all iterables are either sequences or generators, but many of them--including two very important built-in types, not to mention all of the builtin types' iterators--are not.

"Slicing any iterable" was a bad title. It should have been "extend the
slicing application". I'm not assuming every iterable is a sequence or a
generator; I think we can come up with a reasonable behavior for the
cases where slicing doesn't make sense, while adding more power to this
handy tool.

Please also consider the iter() proposal as a more verbose, yet still
powerful, alternative. Maybe even easier to implement?


