[Python-ideas] Consider making enumerate a sequence if its argument is a sequence
abarnert at yahoo.com
Wed Sep 30 23:33:44 CEST 2015
On Sep 30, 2015, at 12:47, M.-A. Lemburg <mal at egenix.com> wrote:
>> On 30.09.2015 21:19, Andrew Barnert wrote:
>>> On Sep 30, 2015, at 11:43, M.-A. Lemburg <mal at egenix.com> wrote:
>>>>> On 30.09.2015 20:26, Andrew Barnert via Python-ideas wrote:
>>>>>> On Sep 30, 2015, at 11:11, M.-A. Lemburg <mal at egenix.com> wrote:
>>>>>> On 30.09.2015 19:19, Neil Girdhar wrote:
>>>>>> I guess, I'm just asking for enumerate to go through the same change that
>>>>>> range went through. Why wasn't it a problem for range?
>>>>> range() returns a list in Python 2 and a generator in Python 3.
>>>> No it doesn't. It returns a (lazy) sequence. Not a generator, or any other kind of iterator.
>>> You are right that it's not of a generator type
>>> and more like a lazy sequence. To be exact, it returns
>>> a range object and does implement the iter protocol via
>>> a range_iterator object.
>> To be exact, it returns an object which returns True for isinstance(r, Sequence), which offers correct implementations of the entire sequence protocol. In other words, it's not "more like a lazy sequence", it's _exactly_ a lazy sequence.
>> In 2.3-2.5, xrange was a lazy "sequence-like object", and the docs explained how it didn't have all the methods of a sequence but otherwise was like one. When the collections ABCs were added, xrange (2.x)/range (3.x) started claiming to be a sequence, but the implementation was incomplete, so it was defective. This was fixed in 3.2 (which also made all of the sequence methods efficient—e.g., a range that fits into C longs can test an int for __contains__ in constant time).
>>>> I don't know why so many people seem to believe it returns a generator. (And, when you point out what it returns, most of them say, "Why was that changed from 2.x xrange, which returned a generator?" but xrange never returned a generator either--it returned a lazy almost-a-sequence from the start.)
>>> Perhaps because it behaves like one ? :-)
>>> Unlike an iterator, it doesn't iterate over a sequence, but instead
>>> generates the values on the fly.
>> You're confusing things even worse here.
> I guess I used the wrong level of detail. I was trying
> to explain things in terms of concepts, not object types,
> isinstance() and ABCs.
But you're conflating the concept of "lazy" with the concept of "iterator". While generators, and iterators in general, are always technically lazy and nearly-always practically lazy, lazy things are not always iterators. Range, dict views, memoryview/buffer objects, NumPy slices, third-party lazy-list types, etc. are not generators, nor are they like generators in any way, except for being lazy. They're lazy sequences (well, except for the ones that aren't sequences, but they're still lazy containers, or lazy non-iterator iterables if you want to stick to terms in the glossary).
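To make the distinction concrete, here is a minimal sketch, assuming Python 3 (the variable names are only illustrative). It compares a range, which is lazy but a sequence, with a generator, which is lazy and an iterator:

```python
from collections.abc import Iterator, Sequence

big = range(10**9)                # lazy: no billion-element list is ever built
gen = (i * i for i in range(3))   # a generator: lazy, and also an iterator

# range is a sequence and not an iterator; the generator is the reverse.
range_flags = (isinstance(big, Sequence), isinstance(big, Iterator))
gen_flags = (isinstance(gen, Sequence), isinstance(gen, Iterator))

# A lazy sequence still supports len(), random access, and membership tests.
size = len(big)
last = big[-1]
has_value = 999_999_999 in big

# It can be iterated repeatedly; an iterator is exhausted after one pass.
small = range(3)
range_passes = (list(small), list(small))
gen_passes = (list(gen), list(gen))

print(range_flags, gen_flags)   # (True, False) (False, True)
print(range_passes)             # ([0, 1, 2], [0, 1, 2])
print(gen_passes)               # ([0, 1, 4], [])
```

Both objects produce their values on demand; only the generator is one-shot.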
And I think experienced developers conflating these two orthogonal concepts is part of what confuses novices. They come to believe that if they want laziness, they need a generator, which leaves them unable to even form the notion of a view/lazy container/virtual container, even when that's exactly what they want.
And it makes it hard to discuss issues like this thread clearly.
(The fact that we don't have a term for "non-iterator iterable", and that experienced users and even the documentation sometimes use the term "sequence" for that, only makes things worse. For example, a dict_keys is not a sequence in any useful sense, but the glossary says it is, because there is no word for what it wants to say.)
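As a rough illustration of the dict_keys point, assuming Python 3 and the ABCs in collections.abc (the dictionary contents are made up for the example): a keys view is iterable and set-like, but it is neither an iterator nor a sequence, and it cannot be indexed.

```python
from collections.abc import Iterable, Iterator, Sequence, Set

d = {"a": 1, "b": 2}
keys = d.keys()

keys_flags = {
    "iterable": isinstance(keys, Iterable),
    "iterator": isinstance(keys, Iterator),
    "sequence": isinstance(keys, Sequence),
    "set": isinstance(keys, Set),
}

# The view is lazy in the sense that it reflects later changes to the dict.
d["c"] = 3
live = sorted(keys)

# It does not support indexing, so it is not a sequence in any useful sense.
try:
    keys[0]
    indexable = True
except TypeError:
    indexable = False

print(keys_flags)
print(live, indexable)
```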
> Back on the topic:
> The way I understand the proposal is that Neil wants the
> above to return:
> >>> isinstance(e, collections.Sequence)
> True
> >>> isinstance(e, collections.Iterator)
> False
> iff isinstance(arg, collections.Sequence)
That's one way to give him what he wants.
But another option would be to always return a lazy sequence--the same kind you'd get if you picked one of the LazyList classes off PyPI (which provide a sequence interface by iterating and caching an iterable), and just wrote "e = LazyList(enumerate(arg))". This is still only creating the values on demand, and only consuming the iterator (if that's what it's given) as needed. (Of course it does mean you can now demand multiple values at once from that iterator, e.g., by indexing e or calling len(e) when arg was an iterator.)
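A minimal sketch of what such a wrapper might look like. This LazyList is hypothetical, written only for illustration; it is not the API of any particular PyPI package, and the letters() helper exists only to record how much of the source has been consumed:

```python
from collections.abc import Sequence
from itertools import islice


class LazyList(Sequence):
    """Hypothetical lazy list: wraps an iterable and caches values as consumed."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = []

    def _fill(self, n=None):
        # Consume the iterator until the cache holds n items (everything if n is None).
        if n is None:
            self._cache.extend(self._it)
        elif n > len(self._cache):
            self._cache.extend(islice(self._it, n - len(self._cache)))

    def __getitem__(self, index):
        if isinstance(index, slice) or index < 0:
            self._fill()            # in this sketch, slices and negative indexes force everything
        else:
            self._fill(index + 1)   # otherwise consume only as far as needed
        return self._cache[index]

    def __len__(self):
        self._fill()                # len() has to exhaust the iterator
        return len(self._cache)


consumed = []

def letters():
    for ch in "abcde":
        consumed.append(ch)
        yield ch

e = LazyList(enumerate(letters()))
third = e[2]
consumed_after_index = list(consumed)   # only three items pulled so far
total = len(e)                          # forces the rest
consumed_after_len = list(consumed)
print(third, consumed_after_index, total, consumed_after_len)
```

Indexing pulls only as many items as it needs, while len(e) drains the underlying iterator, which is the "demand multiple values at once" behavior mentioned above.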
Or you could be even cleverer: enumerate always returns a lazy sequence, which uses random access if given a sequence, cached iteration if given any other iterable. That gives you the best of both worlds, right?
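One way this could be sketched, purely as an assumption about the design: LazyEnumerate is a hypothetical name, not an existing or proposed builtin, and it supports integer indexes only. It indexes straight into the source when given a sequence and falls back to cached iteration otherwise:

```python
from collections.abc import Sequence
from itertools import islice


class LazyEnumerate(Sequence):
    """Hypothetical enumerate-like lazy sequence (integer indexes only)."""

    def __init__(self, iterable, start=0):
        self._start = start
        if isinstance(iterable, Sequence):
            self._items = iterable    # random access straight into the source
            self._it = None
        else:
            self._items = []          # cache, filled as the iterator is consumed
            self._it = iter(iterable)

    def _fill(self, n=None):
        if self._it is None:
            return                    # sequence source: nothing to cache
        if n is None:
            self._items.extend(self._it)
        elif n > len(self._items):
            self._items.extend(islice(self._it, n - len(self._items)))

    def __len__(self):
        self._fill()
        return len(self._items)

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("this sketch supports integer indexes only")
        if index < 0:
            index += len(self)        # needs the full length
            if index < 0:
                raise IndexError("index out of range")
        self._fill(index + 1)
        return (self._start + index, self._items[index])


words = ["a", "b", "c"]
from_seq = LazyEnumerate(words)
from_iter = LazyEnumerate(iter("xyz"), start=1)
print(from_seq[1], len(from_seq), list(from_seq))
print(from_iter[0], from_iter[-1], list(from_iter))
```

Both objects are reusable sequences, but the second one pays for that by keeping every value it has seen.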
Either of these avoids the problem that the type of enumerate depends on the type of its input, and the more serious problem that you can't tell from inspection whether what it returns is reusable or one-shot, but of course they introduce other problems.
I don't think any of the three is worth doing. The three most consistent ways of doing this, if you were designing a language from scratch, seem to be:
1. Python: Always return an iterator; if people want sequence behavior (with whatever variety of laziness they desire), they can wrap it.
2. Haskell: Make everything in the language as lazy as possible, so you can just always return a list, and it will automatically be as lazy as possible.
3. Swift: Merge indexing and iteration, and bake in views as a fundamental concept, so you can always return a view, but whether its indices are random-access or not depends on whether its input's indices are.
I'm not sure that #1 is the best of the three, but it is exactly what Python already has, and the other two would be very hard to get to from here, so I think #1 is the best for Python 3.6 (or 4.0).
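For reference, option #1 as it works today can be sketched like this, assuming Python 3 (the names list is made up for the example): enumerate returns a one-shot iterator, and anyone who wants sequence behavior wraps it, here eagerly with list().

```python
from collections.abc import Iterator

names = ["ann", "bob", "cy"]

e = enumerate(names)
is_iterator = isinstance(e, Iterator)
first_pass = list(e)
second_pass = list(e)     # empty: the iterator is already exhausted

# Wrapping gives sequence behavior; list() is the eager choice.
pairs = list(enumerate(names))
middle = pairs[1]
count = len(pairs)

print(is_iterator, first_pass, second_pass)
print(middle, count)
```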
(The blog post I referenced earlier in the thread explores whether we could get to #3, or get part-way there, from here; if you don't agree that it would be harder than is worth doing, please read it and point out where I went wrong. Because that could be pretty cool.)