Re: [Python-ideas] Consider making enumerate a sequence if its argument is a sequence

On Wed, 30 Sep 2015 at 09:38 Neil Girdhar <mistersheik@gmail.com> wrote:
No, it's not like multiplication. =) I hate saying this since I think it's tossed around too much, but int/float substitution doesn't lead to a Liskov substitution violation like substituting out a sequence for an iterator (which is what will happen if the type of the argument to `enumerate` changes). And since you can just call `list` or `tuple` on enumerate and get exactly what you're after without potential bugs cropping up if you don't realize from afar you're affecting an assumption someone made, I'm -1000 on this idea. -Brett

Can you help understand how this is a Liskov substitution violation? A Sequence is an Iterator. Getting the sequence back should never hurt. The current interface doesn't promise that the returned object won't have additional methods or implement additional interfaces, does it? On Wed, Sep 30, 2015 at 12:43 PM Brett Cannon <brett@python.org> wrote:

On Wed, Sep 30, 2015 at 9:53 AM, Neil Girdhar <mistersheik@gmail.com> wrote:
Can you help understand how this is a Liskov substitution violation? A Sequence is an Iterator. Getting the sequence back should never hurt.
no but getting a non-sequence iterator back when you expect a sequence sure can hurt. which is why I said that if you want a sequence back from enumerate, it should always return a sequence. which could (should) be lazy-evaluated. I think Neil's point is that calling list() or tuple() on it requires that the entire sequence be evaluated and stored -- if you really only want one item (and especially not one at the end), that could be a pretty big performance hit. Which makes me wonder why ALL iterators couldn't support indexing? It might work like crap in some cases, but wouldn't it always be as good or better than wrapping it in a tuple? And then some cases (like enumerate) could do an index operation efficiently when they are working with "real" sequences. Maybe a generic lazy_sequence object that could be wrapped around an iterator to create a lazy-evaluating sequence?? -CHB
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
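A minimal sketch of the "generic lazy_sequence" idea floated above, assuming a caching wrapper is acceptable. The name `LazySequence` and its helper `_fill_to` are hypothetical, not an existing stdlib or PyPI API, and only plain integer indexing is handled.

```python
from collections.abc import Sequence
from itertools import count, islice


class LazySequence(Sequence):
    """Hypothetical wrapper: caches items pulled from an iterable on demand."""

    def __init__(self, iterable):
        self._source = iter(iterable)
        self._cache = []

    def _fill_to(self, index):
        # Pull just enough items from the source to make `index` valid.
        needed = index + 1 - len(self._cache)
        if needed > 0:
            self._cache.extend(islice(self._source, needed))

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("this sketch supports plain integer indices only")
        if index < 0:
            # Negative indices need the length, which forces full evaluation.
            index += len(self)
            if index < 0:
                raise IndexError("index out of range")
        self._fill_to(index)
        try:
            return self._cache[index]
        except IndexError:
            raise IndexError("index out of range") from None

    def __len__(self):
        # Exhausts the source; never returns for an unbounded iterable.
        self._cache.extend(self._source)
        return len(self._cache)


lazy = LazySequence(enumerate(count(10)))   # unbounded source
item = lazy[3]                              # consumes only four items
```

Indexing near the front stays cheap, but `len()` and negative indices have to drain the source, which is exactly where an unbounded iterable becomes a problem.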

Ah good point. Well, in the case of a sequence argument, an enumerate object could be both a sequence and an iterator. On Wed, Sep 30, 2015 at 1:15 PM Alexander Belopolsky < alexander.belopolsky@gmail.com> wrote:

I guess, I'm just asking for enumerate to go through the same change that range went through. Why wasn't it a problem for range? On Wed, Sep 30, 2015 at 1:18 PM Neil Girdhar <mistersheik@gmail.com> wrote:

On Wed, Sep 30, 2015 at 10:19 AM, Neil Girdhar <mistersheik@gmail.com> wrote:
I guess, I'm just asking for enumerate to go through the same change that range went through. Why wasn't it a problem for range?
well, range is simpler -- you don't pass arbitrary iterables into it. It always has to compute integer values according to start, stop, step -- easy to implement as either iteration or indexing. enumerate, on the other hand, takes an arbitrary iterable -- so it can't just index into that iterable if asked for an index. You are right, of course, that it COULD do that if it was passed a sequence in the first place, but then you have an interface whereby you get a different kind of object depending on how you created it, which is pretty ugly. But again, we could add indexing to enumerate, and have it do the ugly inefficient thing when it's using an underlying non-indexable iterator, and do the efficient thing when it has a sequence to work with, thereby providing the same API regardless. -CHB
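To make the "efficient thing when it has a sequence to work with" concrete, here is a rough sketch of a lazy enumerate-like view. `EnumerateView` is a hypothetical name, it accepts real sequences only, and it leaves slicing out; the builtin `enumerate` does none of this.

```python
from collections.abc import Sequence


class EnumerateView(Sequence):
    """Hypothetical lazy view pairing each index of a sequence with its item."""

    def __init__(self, seq, start=0):
        if not isinstance(seq, Sequence):
            raise TypeError("EnumerateView needs a real sequence")
        self._seq = seq
        self._start = start

    def __len__(self):
        return len(self._seq)

    def __getitem__(self, index):
        if isinstance(index, slice):
            raise TypeError("slicing is left out of this sketch")
        if index < 0:
            index += len(self._seq)
        if not 0 <= index < len(self._seq):
            raise IndexError("index out of range")
        # Nothing is stored: the pair is built when asked for.
        return (self._start + index, self._seq[index])


pairs = EnumerateView(["a", "b", "c"], start=1)
```

Because it subclasses the `Sequence` ABC, the view gets iteration, `reversed()`, `in`, `index` and `count` for free, and it can be iterated more than once.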

On 9/30/2015 1:28 PM, Chris Barker wrote:
But again, we could add indexing to enumerate, and have it do the ugly inefficient thing when it's using an underlying non-indexable iterator,
If the ugly inefficient thing is to call list(iterable), then that does not work with unbounded iterables. Or the input iterable might produce inputs at various times in the future. -- Terry Jan Reedy

Terry Reedy writes:
I think he means
    from itertools import islice
    a = list(islice(iterable, 0, 99))[42]
Or the input iterable might produce inputs at various times in the future.
Horrors! We'll have to add a "block=False" parameter to next(). (We can bikeshed on the default later.) Seriously, I think that one we just have to live with, just as we already live with it in any context where we access an iterable. Regards,
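For what it's worth, the islice approach can skip the intermediate list entirely. A small sketch, assuming a helper called `nth` (a hypothetical name here, close in spirit to the `nth` recipe in the itertools documentation):

```python
from itertools import count, islice


def nth(iterable, n):
    """Return item n of any iterable, consuming at most n + 1 items."""
    try:
        return next(islice(iterable, n, None))
    except StopIteration:
        raise IndexError("iterable has fewer than n + 1 items") from None


squares = (i * i for i in count())   # unbounded generator
value = nth(squares, 42)
```

This works on unbounded iterables, but note that it consumes the first 43 items of a one-shot iterator, so a second "index" operation starts counting from wherever the first one stopped.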

On 30.09.2015 19:19, Neil Girdhar wrote:
I guess, I'm just asking for enumerate to go through the same change that range went through. Why wasn't it a problem for range?
range() returns a list in Python 2 and a generator in Python 3. enumerate() has never returned a sequence. It was one of the first builtin APIs in Python to return a generator: https://www.python.org/dev/peps/pep-0279/ after iterators and generators were introduced to the language: https://www.python.org/dev/peps/pep-0234/ https://www.python.org/dev/peps/pep-0255/ The main purpose of enumerate() is to allow enumeration of objects in a sequence or other iterable. If you need a sequence, simply wrap it with list(), e.g. list(enumerate(sequence)). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 30 2015)
2015-09-25: Started a Python blog ... ... http://malemburg.com/ 2015-10-21: Python Meeting Duesseldorf ... 21 days to go ::::: Try our mxODBC.Connect Python Database Interface for free ! :::::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/

On Sep 30, 2015, at 11:11, M.-A. Lemburg <mal@egenix.com> wrote:
No it doesn't. It returns a (lazy) sequence. Not a generator, or any other kind of iterator. I don't know why so many people seem to believe it returns a generator. (And, when you point out what it returns, most of them say, "Why was that changed from 2.x xrange, which returned a generator?" but xrange never returned a generator either--it returned a lazy almost-a-sequence from the start.) There's no conceptual reason that Python couldn't have more lazy sequences, and tools to build your own lazy sequences more easily. However, things do get messy once you get into the details. For example, zip can return a lazy sequence if given only sequences, but what if it's given iterators, or other iterables that aren't sequences; filter can return something that's sort of like a sequence in that it can be repeatedly iterated but it can't be randomly-accessed. You really need a broader concept that integrates iteration and indexing, as in the C++ standard library. Swift provides the perfect example of how you could do something like that without losing the natural features of Python indexing and iteration. But it turns out to be complicated to explain, and to work with, and you end up writing multiple implementations for each iterable-processing function. I don't think the benefit is worth the cost. Another alternative is just to wrap any iterable in a caching LazyList type. This runs into complications because there are different choices that make sense for different uses (obviously you have to handle negative indexing, and obviously you have to handle infinite lists, so... Oops!), so it makes more sense to leave that up to the application to supply whatever lazy list type it needs and use it explicitly.

I just remembered that the last few times related things came up, I wrote some blog posts going into details that I didn't want to have to dump on the list:
* http://stupidpythonideas.blogspot.com/2013/08/lazy-restartable-iteration.htm...
* http://stupidpythonideas.blogspot.com/2014/07/swift-style-map-and-filter-vie...
* http://stupidpythonideas.blogspot.com/2014/07/lazy-cons-lists.html
* http://stupidpythonideas.blogspot.com/2014/07/lazy-python-lists.html
* http://stupidpythonideas.blogspot.com/2015/07/creating-new-sequence-type-is-...
The one about Swift-style map and filter views is, I think, the most interesting here. The tl;dr is that views (lazy sequences) are nifty, and there's nothing actually stopping Python from using them in more places, but they do add complexity, and the benefits probably don't outweigh the costs.

Yup, the Swift-style map post is a great blog entry, Andrew, and exactly what I was proposing for enumerate. I 100% agree that "views (lazy sequences) are nifty, and there's nothing actually stopping Python from using them in more places, but they do add complexity, and the benefits probably don't outweigh the costs." However, I wonder what Python will look like 5 years from now. Maybe it will be time for more sequences. On Wed, Sep 30, 2015 at 2:32 PM Andrew Barnert <abarnert@yahoo.com> wrote:

On 30.09.2015 20:26, Andrew Barnert via Python-ideas wrote:
You are right that it's not of a generator type and more like a lazy sequence. To be exact, it returns a range object and does implement the iter protocol via a range_iterator object. In Python 2 we have the xrange object which has similar properties, but not the same, e.g. you can't slice it.
I don't know why so many people seem to believe it returns a generator. (And, when you point out what it returns, most of them say, "Why was that changed from 2.x xrange, which returned a generator?" but xrange never returned a generator either--it returned a lazy almost-a-sequence from the start.)
Perhaps because it behaves like one ? :-) Unlike an iterator, it doesn't iterate over a sequence, but instead generates the values on the fly. FWIW: I don't think many people use the lazy sequence features of range(), e.g. the slicing or index support. By far most uses are in for-loops. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 30 2015)

It doesn't behave like a generator because it doesn't implement send, throw, or close. It's a sequence because it implements: __getitem__, __len__ __contains__, __iter__, __reversed__, index, and count. On Wed, Sep 30, 2015 at 2:43 PM M.-A. Lemburg <mal@egenix.com> wrote:

On Wed, Sep 30, 2015 at 2:46 PM, Neil Girdhar <mistersheik@gmail.com> wrote:
It doesn't behave like a generator because it doesn't implement send, throw, or close.
It is not a generator because Python says it is not:
>>> isinstance(range(0), collections.Generator)
False
It's a sequence because it implements: __getitem__, __len__ __contains__, __iter__, __reversed__, index, and count.
Ditto
>>> isinstance(range(0), collections.Sequence)
True
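The same checks can be run on current Python, where the ABCs have to be imported from `collections.abc` (the `collections.Sequence`-style aliases were removed in Python 3.10). A small sketch; the `checks` dict is only there for illustration:

```python
from collections.abc import Generator, Iterator, Sequence

r = range(0, 10, 2)

checks = {
    "Sequence": isinstance(r, Sequence),
    "Iterator": isinstance(r, Iterator),
    "Generator": isinstance(r, Generator),
    # Only the object returned by iter(r) is an iterator.
    "iter(r) is Iterator": isinstance(iter(r), Iterator),
}
```

A range is a sequence and not an iterator of any kind; iterating over it goes through a separate iterator object each time.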

On Sep 30, 2015, at 11:43, M.-A. Lemburg <mal@egenix.com> wrote:
To be exact, it returns an object which returns True for isinstance(r, Sequence), which offers correct implementations of the entire sequence protocol. In other words, it's not "more like a lazy sequence", it's _exactly_ a lazy sequence. In 2.3-2.5, xrange was a lazy "sequence-like object", and the docs explained how it didn't have all the methods of a sequence but otherwise was like one. When the collections ABCs were added, xrange (2.x)/range (3.x) started claiming to be a sequence, but the implementation was incomplete, so it was defective. This was fixed in 3.2 (which also made all of the sequence methods efficient—e.g., a range that fits into C longs can test an int for __contains__ in constant time).
You're confusing things even worse here. A generator is an iterator. It's a perfect subtype relationship. A range does not behave like a generator, or like any other kind of iterator. It behaves like a sequence. Laziness is orthogonal to the iterator-vs.-sequenceness. Dictionary views are also lazy but not iterators, for example. And there's nothing stopping you from writing a generator with "yield from f.readlines()" (except that it would be stupid), which would be an iterator despite being not lazy in any useful sense. Maybe the problem is that we don't have enough words. I've tried to use "view" to refer to a lazy non-iterator iterable (dict views, range, NumPy slices), which seems to help within the context of a single long explanation for a single user's problem, but I'm not sure that's something we'd want enshrined in the glossary, since it's a general English word that probably has wider usefulness.
I've used range as a sequence (or at least a reusable iterable, a sized object, and a container). I've answered questions from people on StackOverflow who are doing so, and seen the highest-rep Python answerer on SO suggest such uses to other people. I don't think I'd ever use the index method (although I did see one SO user who was doing so, to wrap up some arithmetic in a way that avoids a possibly off-by-one error, and wanted to know why it was so slow in 3.1 but worked fine in 3.2...), but there's no reason range should be a defective "not-quite-sequence" instead of a sequence. What would be the point of that?
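A short illustration of why laziness and iterator-ness are independent, under the usual CPython semantics. The function `eager_lines` is a made-up stand-in for the "yield from f.readlines()" case, using a list in place of a file:

```python
d = {"a": 1}
keys = d.keys()            # lazy view, not an iterator
d["b"] = 2                 # the view reflects the later change
view_first = list(keys)
view_second = list(keys)   # re-iterable: same result again


def eager_lines(lines):
    # An iterator that is not lazy in any useful sense: the whole
    # list already exists before the first item is yielded.
    yield from list(lines)


gen = eager_lines(["x\n", "y\n"])
gen_first = list(gen)
gen_second = list(gen)     # one-shot: already exhausted

r = range(10**12)          # lazy sequence: no trillion-item list is built
last = r[-1]               # computed arithmetically, not by iterating
```

The dict view and the range are lazy but can be traversed repeatedly; the generator is an iterator but buys no laziness at all.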

On Sep 30, 2015, at 12:19, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Maybe the problem is that we don't have enough words. I've tried to use "view" to refer to a lazy non-iterator iterable (dict views, range, NumPy slices), which seems to help within the context of a single long explanation for a single user's problem, but I'm not sure that's something we'd want enshrined in the glossary, since it's a general English word that probably has wider usefulness.
I've just remembered that I said the exact same thing last time this discussion came up (less than 4 months ago), and someone pointed out to me that the docs already define the word "view" in the glossary specifically for dict/mapping views, and use the term "lazy sequence" in that definition, and use the term "virtual sequence" elsewhere. It's worth noting that dict views are not actually sequences, so defining view in terms of lazy sequence is probably not a good idea... Anyway, we probably don't need to invent any new terms; maybe we just need to pick some wording, define it clearly, and use it consistently throughout the docs.

On 30.09.2015 21:19, Andrew Barnert wrote:
I guess I used the wrong level of detail. I was trying to explain things in terms of concepts, not object types, isinstance() and ABCs. The reason was that the subject line makes a suggestion which simply doesn't fit the main concept behind enumerate: that of generating values on the fly instead of allocating them as a sequence. We just got sidetracked with range(), since Neil brought this up as an example of why changing enumerate() should be possible. Back on the topic:
The way I understand the proposal is that Neil wants the above to return:
iff isinstance(arg, collections.Sequence) and because this only makes sense iff e doesn't actually create a list, enumerate(arg) would have to return a lazy/virtual/whatever-term-you-use-for-generated-on-the-fly sequence :-) Regardless of this breaking backwards compatibility, what's the benefit of such a change ? Just like range(), enumerate() is most commonly used in for-loops, so the added sequence-ishness doesn't buy you anything much (except the need for more words in the glossary :-)). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 30 2015)

On Sep 30, 2015, at 12:47, M.-A. Lemburg <mal@egenix.com> wrote:
But you're conflating the concept of "lazy" with the concept of "iterator". While generators, and iterators in general, are always technically lazy and nearly-always practically lazy, lazy things are not always iterators. Range, dict views, memoryview/buffer objects, NumPy slices, third-party lazy-list types, etc. are not generators, nor are they like generators in any way, except for being lazy. They're lazy sequences (well, except for the ones that aren't sequences, but they're still lazy containers, or lazy non-iterator iterables if you want to stick to terms in the glossary). And I think experienced developers conflating the two orthogonal concepts is part of what leads to novices getting confused. They think that if they want laziness, they need a generator. That makes them unable to even form the notion that what they really want is a view/lazy container/virtual container even when that's what they want. And it makes it hard to discuss issues like this thread clearly. (The fact that we don't have a term for "non-iterator iterable", and that experienced users and even the documentation sometimes use the term "sequence" for that, only makes things worse. For example, a dict_keys is not a sequence in any useful sense, but the glossary says it is, because there is no word for what it wants to say.)
That's one way to give him what he wants. But another option would be to always return a lazy sequence--the same kind you'd get if you picked one of the LazyList classes off PyPI (which provide a sequence interface by iterating and caching an iterable), and just wrote "e = LazyList(enumerate(arg))". This is still only creating the values on demand, and only consuming the iterator (if that's what it's given) as needed. (Of course it does mean you can now demand multiple values at once from that iterator, e.g., by calling e[10] or len(e) when arg was an iterator.)
Or you could be even cleverer: enumerate always returns a lazy sequence, which uses random access if given a sequence, cached iteration if given any other iterable. That gives you the best of both worlds, right?
Either of these avoids the problem that the type of enumerate depends on the type of its input, and the more serious problem that you can't tell from inspection whether what it returns is reusable or one-shot, but of course they introduce other problems. I don't think any of the three is worth doing.
The three most consistent ways of doing this, if you were designing a language from scratch, seem to be:
1. Python: Always return an iterator; if people want sequence behavior (with whatever variety of laziness they desire), they can wrap it.
2. Haskell: Make everything in the language as lazy as possible, so you can just always return a list, and it will automatically be as lazy as possible.
3. Swift: Merge indexing and iteration, and bake in views as a fundamental concept, so you can always return a view, but whether its indices are random-access or not depends on whether its input's indices are.
I'm not sure that #1 is the best of the three, but it is exactly what Python already has, and the other two would be very hard to get to from here, so I think #1 is the best for Python 3.6 (or 4.0).
(The blog post I referenced earlier in the thread explores whether we could get to #3, or get part-way there, from here; if you don't agree that it would be harder than is worth doing, please read it and point out where I went wrong. Because that could be pretty cool.)

On 30.09.2015 23:33, Andrew Barnert via Python-ideas wrote:
I have absolutely no idea what you are talking about here. ;) I have to admit I try to avoid thinking too much about such tiny little details by using generators/lists/sequences/iterables/iterators/did-I-miss-one? directly in for loops only. Thus, the differences between all of them go away pretty fast. But honestly, does it really need to be that complicated? Best, Sven

On 9/30/2015 5:33 PM, Andrew Barnert via Python-ideas wrote:
(The fact that we don't have a term for "non-iterator iterable",
'collection' Some are concrete: they contain references to actual Python objects. Some are virtual (lazy): they contain the information needed to create Python objects as needed. Strings are a bit in between. -- Terry Jan Reedy

On Sep 30, 2015, at 17:04, Terry Reedy <tjreedy@udel.edu> wrote:
That's a perfectly good term, but it's not used that way in the docs, nor is anyone else using it in the discussions so far. Are you suggesting that we should start doing so? There are definitely parts of the docs that could be clarified or simplified with this term, such as the glossary entry and definitions for dict views, which inaccurately use the term "sequence". (And similarly, although not quite as badly, someone in this thread referred to "sequences and sequence-like things", which may be a little more intuitive than my "non-iterator iterables", but still isn't all that clear.) Also, the tutorial uses the phrases "data structures" or "data type" a few zillion times, apparently to avoid having to come up with a term that includes sequences, sets, dicts, and strings without being inaccurate. I've seen novices have no idea what "data structure" means, or get confused by what the difference between a "data type" and a "regular type" is.
Some are concrete: they contain references to actual Python objects. Some are virtual (lazy): they contain the information needed to create Python objects as needed.
I think the docs used to use the word "virtual" as a more specific term than "lazy": a view onto an object that conceptually exists but doesn't actually exist is "virtual" (like range, which is a view into the infinite set of integers), but a view into a real object isn't (like dict_keys, which has a reference to an actual dict), nor is something that isn't conceptually view-like at all (like a RNG iterator), even though they're all "lazy". It looks like the word "virtual" in this context doesn't appear anywhere in the docs anymore, so I suppose it could be repurposed, but if it's just a synonym for "lazy", what's wrong with "lazy"?

On 9/30/2015 7:24 PM, Andrew Barnert via Python-ideas wrote:
Also, the tutorial uses the phrases "data structures" or "data type" a few zillion times, apparently to avoid having to come up with a term that includes sequences, sets, dicts, and strings without being inaccurate. I've seen novices have no idea what "data structure" means, or get confused by what the difference between a "data type" and a "regular type" is.
container? https://docs.python.org/3/library/collections.html Emile

On Sep 30, 2015, at 21:05, Emile van Sebille <emile@fenx.com> wrote:
But that means something with a __contains__ test. Containers don't even have to be iterables. It's true that all of the types discussed in the tutorial are containers, but is that actually the meaning we're looking for, or just something that's coincidentally true? At any rate, even if that does work for the tutorial, I don't think it solves the more general problem. When I want to talk about iterables that give you a different, independent iterator each time you call __iter__, "container" is not the right word for that. Terry's "collection" seems like a better choice, because it doesn't already have a conflicting meaning.
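To illustrate the point that a container need not be iterable, here is a tiny sketch. The class `EvenNumbers` is hypothetical; the ABCs are the real ones from `collections.abc`.

```python
from collections.abc import Container, Iterable


class EvenNumbers:
    """Hypothetical container: supports `in` but cannot be iterated."""

    def __contains__(self, value):
        return isinstance(value, int) and value % 2 == 0


evens = EvenNumbers()
is_container = isinstance(evens, Container)   # has __contains__
is_iterable = isinstance(evens, Iterable)     # has no __iter__
```

So "container" answers a membership question only, which is a different property from "gives you a fresh iterator each time you call __iter__".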

Andrew Barnert via Python-ideas <python-ideas@python.org> writes:
If either "lazy" or "virtual" means that the contained objects don't exist as python objects until they are accessed, doesn't this extend to strings and arrays (and byte strings, byte arrays, and memory views)?

On Sep 30, 2015, at 21:13, Random832 <random832@fastmail.com> wrote:
There's a sense in which that's true, and in some discussions (e.g., intimately involving the GC or object sharing or optimization of array algorithms) that would be the most relevant sense, but there's also a sense in which they concretely hold all the values in memory, and in most discussions (e.g., talking about generic sequence algorithms) that would be more relevant. I don't think there's a major problem here—we don't need to eliminate all ambiguity from our speech, only the ambiguity that actually gets in the way.

Andrew Barnert via Python-ideas <python-ideas@python.org> writes: ...
(The fact that we don't have a term for "non-iterator iterable", and
All iterators are iterable but some iterables are not iterators. If your code accepts only iterators then use the term *iterator*. Otherwise the term *iterable* could be used. It is misleading to use *iterable* if your code only accepts iterators. If an iterable is an iterator, it is called an *iterator*. The term *iterable* implies that some instances are not iterators.

Akira Li <4kir4.1i@gmail.com> writes:
There are three (well, three and a half) kinds of code that consume iterables, how would you describe each simply?
1. Does not call iter, simply calls next. Therefore cannot consume a non-iterator iterable.
2. Calls iter, but can accept an iterator (e.g. only goes through it once)
3. Cannot accept an iterator (goes through it twice, or permanently stores a reference to it, etc)
4. Can accept either, but behaves differently in each case (e.g. zip when passed two of the same iterator) - this can be regarded as a special case of #2.
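The four kinds of consumer listed above can be sketched in a few lines. The function names (`first_item`, `total`, `spread`) are made up for illustration; the behaviours shown are those of the builtins `next`, `sum`, `max`, `min` and `zip`.

```python
def first_item(iterator):
    # Kind 1: calls next() directly, so it needs an actual iterator.
    return next(iterator)


def total(iterable):
    # Kind 2: a single pass; iterators and re-iterables both work.
    return sum(iterable)


def spread(iterable):
    # Kind 3: two passes; an iterator is already exhausted on the second.
    return max(iterable) - min(iterable)


data = [1, 5, 3, 4]

# Kind 4: zip accepts either, but the results differ.
it = iter(data)
pairs_from_iterator = list(zip(it, it))      # consecutive items paired up
pairs_from_list = list(zip(data, data))      # each item paired with itself
```

Passing a list to `first_item` raises TypeError, and passing an iterator to `spread` raises ValueError because `min` sees an empty input; neither failure is visible from the function signatures.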

On Sep 30, 2015, at 19:04, Akira Li <4kir4.1i@gmail.com> wrote:
And this is exactly the problem. We don't have any way to simply describe this thing. Hence all the confusion in this thread, and in similar discussions elsewhere, and even in the docs (e.g., describing dict views as sequences and then going on to explain that they're not actually sequences). The fact that it took your previous message four paragraphs without inventing a new term, to say what I said in one sentence with a new term, demonstrates the problem. As does the fact that my attempted new term, "non-iterator iterable", is sufficiently ugly and not intuitively helpful enough that you felt the need to expand on it for four paragraphs.

Andrew Barnert via Python-ideas <python-ideas@python.org> writes:
Use *iterable* instead of "non-iterator iterable" -- it is that simple. "dict views" seems a pretty good term for dict views. Are you suggesting to call dict views "non-iterator iterable"? I don't see that it says more than that all dict views are iterable. It seems there is a bug in the glossary: the entry name should be "dict views", not just "view" that is too generic for the description. I've submitted a patch http://bugs.python.org/issue25286
I don't need 4 paragraphs to describe it: if you need an iterator, use the term *iterator* -- otherwise use *iterable* unless you need something more specific, e.g., the *seq* name is common for generic sequences. I don't remember ever using "non-iterator iterable". "non-iterator iterable" does not qualify as more specific. You need to introduce new requirements to the type for that.

Akira Li <4kir4.1i@gmail.com> writes:
The question is, how do you *simply* state the very common requirement for an iterable to not behave in a specific undesirable way that all iterators do, and that it is very uncommon for any iterable other than an iterator to do? Are you opposed to having a word for this concept at all, or do you just not like the terms other people are suggesting?

Random832 <random832@fastmail.com> writes:
That term is **iterable**. As I already said: a specific application may use more specific requirements, e.g.:
list(iterable):
- does it mean that all iterables must be finite?
- do we need a special word to describe what list() accepts?
set(iterable):
- does it mean that all iterables must yield hashable items?
- do we need a special word to describe what set() accepts?
dict(iterable):
- does it mean that all iterables must yield pairs?
- do we need a special word to describe what dict() accepts?
You've got the idea: the word *iterable* may be used in the context when not all iterables are accepted. https://mail.python.org/pipermail/python-ideas/2015-October/036692.html
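The extra requirements that set() and dict() place on their iterable argument show up as exceptions at call time. A small sketch; `try_build` is a hypothetical helper used only to capture the exception name:

```python
def try_build(factory, iterable):
    """Return the built object, or the name of the exception raised."""
    try:
        return factory(iterable)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__


ok_set = try_build(set, ["a", "b", "a"])
bad_set = try_build(set, [["a"], ["b"]])          # lists are unhashable
ok_dict = try_build(dict, [("a", 1), ("b", 2)])
not_pairs = try_build(dict, [1, 2])               # items are not iterable
wrong_length = try_build(dict, [("a", 1, 2)])     # item has three members
```

The finiteness requirement of list() is the odd one out: it cannot be reported as an exception, the call simply never returns (or runs out of memory).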

On Sep 30, 2015, at 22:59, Akira Li <4kir4.1i@gmail.com> wrote:
This is a link to a reply where you pasted exactly the same text as in this reply, and in a third one. What is that supposed to mean? I feel like you must be trying to get across something really important here, and it's my fault for not getting it, but I still can't get it. Can you try rewording it instead of just pasting the same text again and/or a link to the same text? If it helps, let me try to ask specific questions: Are you arguing one of the following:
* there is no such thing as an iterable that isn't an iterator, or an iterable that is repeatable, or an iterable that provides a new iterator each time iter is called?
* there are such things, but no corresponding property that can be used to characterize a set?
* that such sets do exist, but are never useful to discuss?
* that such sets may be useful to discuss, but the names I (and Terry and others) came up with are unhelpful?
More concretely: the documentation for dict views goes out of its way to point out that these are not iterators, but a different kind of iterable that's more like a sequence (presumably meaning at least one of the three things above). But it does so inaccurately, by saying they are sequences, which is not true. How could it be rewritten to get that point across accurately, but still concisely and readably?

Andrew Barnert via Python-ideas <python-ideas@python.org> writes:
On Sep 30, 2015, at 22:59, Akira Li <4kir4.1i@gmail.com> wrote: ...
It means exactly what it says -- literally: "the word *iterable* may be used in the context when not all iterables are accepted." and list(iterable), set(iterable), dict(iterable) are the specific examples. It is a statement of "how it *is*" and that it is acceptable in my view and there is no need to change it. Obviously, list/set/dict docs describe what subset of iterables they accept. If you agree on that then there is no disagreement. And it should answer the questions from your post. If you disagree then to ground the discussion what _specific_ places in the documentation would you like to change? ...
"How could it be rewritten": I remember posting the link to Python issue already http://bugs.python.org/issue25286

On 10/1/2015 1:59 AM, Akira Li wrote:
finite_iterable
iterable_of_hashables
iterable_of_pairs (whose first member is hashable)
- does it mean that all iterables must yield pairs? - do we need a special word to describe what dict() accepts?
Whatever word is used in the signature, the description should be in the doc and docstring, preferably in the first line.
Return a list with members from a finite iterable.
Return a set with members from an iterable of hashable objects.*
Return a string joining items from an iterable of strings.
*Also, equality between objects should be transitive.
You've got the idea: the word *iterable* may be used in the context when not all iterables are accepted.
Right. Each one should be documented properly. -- Terry Jan Reedy

On Sep 30, 2015, at 22:39, Akira Li <4kir4.1i@gmail.com> wrote:
No it isn't. The word "iterable" just means "iterable". When you want to talk about sequences—a subtype of iterables—you don't just say "iterable", you say "sequence". And likewise, when you want to talk about iterables that aren't iterators, or iterables that are repeatable, or any other subtype of iterables, you have to use a word (or phrase) that actually means what you're saying. I don't know how to explain this any better. Everyone else seems to get it, but you just post the same reply to each of them that you posted to me when they try to explain further. What am I not getting across here?
"dict views" seems a pretty good term for dict views. Are you suggesting to call dict views "non-iterator iterable"?
Why would you think that?
There's a much larger problem. The glossary says that dict views are sequences. They aren't. The actual documentation for dict views is a little better, because it explains that they're not actually sequences. But the problem is still there: what the docs are trying to say is that dict views are some kind of non-iterator iterable, but, because we don't have a term for that, they use the incorrect term "sequence".
Why would you use "seq" instead of "sequence" for the name of the abstract sequence type? And, more importantly, what name do you use when you need something more specific than "iterable", but less specific than "sequence"—as in the glossary entry for dict views, for example?
I don't remember ever using "non-iterator iterable".
Why would you expect to remember using it, when you're replying to a message where I invented it for lack of an established better name (and in hopes that someone would come up with one)?
There are things that are iterables, that are not non-iterator iterables, but the reverse is not true. It's a name for a strict subset. Which means it's more specific. As for a new requirement: an iterable is a non-iterator iterable if its __iter__ method does not return self. (If you're going to argue that this requirement can't be checked by, e.g., a structural type checker, remember that neither is the distinction between sequence and mapping, and that doesn't mean they're the same type.)
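[A minimal sketch of that requirement, assuming Python 3. The helper name is_non_iterator_iterable is hypothetical and not from the thread; it is a runtime check that calls iter() once, not a static type test.]

```python
def is_non_iterator_iterable(obj):
    """Return True if obj is iterable and iter(obj) is not obj itself."""
    try:
        iterator = iter(obj)
    except TypeError:
        # Not iterable at all.
        return False
    # Iterators return themselves from __iter__; other iterables do not.
    return iterator is not obj


checks = {
    "list": is_non_iterator_iterable([1, 2, 3]),
    "dict view": is_non_iterator_iterable({"a": 1}.keys()),
    "list iterator": is_non_iterator_iterable(iter([1, 2, 3])),
    "generator": is_non_iterator_iterable(x for x in range(3)),
    "int": is_non_iterator_iterable(42),
}
```

Under this check, lists and dict views qualify, while iterators, generators and non-iterables do not.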

On 2015-09-30 23:24, Andrew Barnert via Python-ideas wrote:
Not sure I followed all the discussion of these terms, but is your main reason for wanting this term to describe the behavior that non-iterator iterables can be "restarted" (and are so restarted if reused in a different context)? Personally I prefer to take a duck-typing view and focus on what operations you can or can't do on these various things. Whether you call it a view or a virtual indexer or whatever is, to me, less important than what you can do with the object. I agree there are a number of relevant subcategories of objects here, some of which we have a name for and some that we don't. But I think it gets easier if we move from generic nouns like "view" to specific adjectives describing the behaviors the object supports. Something like "re-entrant iterable" (meaning if you use it in two for loops right after each other you get the whole thing both times) would focus on that aspect of the behavior. Something like "random-accessible" or "sliceable" if we want to talk about iterables where we can "jump ahead" or slice if needed. It's an interesting idea to think about what kinds of operations (map, filter, etc.) could return iterables supporting what other kinds of operations. That is, can we make sure the result of map/filter can be sliced/indexed/reentered if the source can. To me the interesting question is which of these actual behaviors can usefully and non-mind-bendingly be preserved through map/filter/etc. manipulations. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

Andrew Barnert via Python-ideas <python-ideas@python.org> writes:
"iterables that aren't iterators" unlike sequences do not introduce new requirements -- __iter__ returning non-self doesn't count -- the return value is still an _arbitrary_ iterator e.g., it may return the same iterator each time. Sequences on the other hand do introduce new requirements (__len__, __getitem__ and their specific semantics that distinguishes them from Mappings).
I have no objection to the phrase "repeatable iterable" because it does introduce a useful distinction. ...
I meant in the code e.g., random.choice(seq), to communicate that an arbitrary iterable is not enough.
As I said, I would use the term "dict views". If you mean how "dict views" could be defined then I've already linked more than once to the corresponding Python docs issue with the patch http://bugs.python.org/issue25286
Could you provide a non-hypothetical practical example from existing code of a function that accepts arbitrary iterables but rejects iterators? Perhaps discussing a specific example would help to iron out the terminology.

On 9/30/2015 10:31 PM, Andrew Barnert via Python-ideas wrote:
re-iterable (with implication of same sequence of yields) I have used this for years on python list and do not think I am unique.
Depends on the 'we'. -- Terry Jan Reedy

Akira Li <4kir4.1i@gmail.com> writes:
True or false?: It is reasonable to write algorithms that iterate twice over a passed-in iterable, with the expectation that said iterable will typically be an object (or a view of such an object) which will not be concurrently modified (e.g. by a different thread or by a side-effect of a callback) during the execution of the algorithm, but which does not behave in a useful way when given an iterator, a generator, or any other kind of iterable which exhibits similar behavior whereby the second and further attempts to iterate will yield no items.

Random832 <random832@fastmail.com> writes:
True or false?: do all iterables return the same items twice? http://www.fallacyfiles.org/loadques.html
A specific application may impose more specific requirements, e.g.:
list(iterable):
  - does it mean that all iterables must be finite?
  - do we need a special word to describe what list() accepts?
set(iterable):
  - does it mean that all iterables must yield hashable items?
  - do we need a special word to describe what set() accepts?
dict(iterable):
  - does it mean that all iterables must yield pairs?
  - do we need a special word to describe what dict() accepts?
You've got the idea: the word *iterable* may be used in the context when not all iterables are accepted.

On Thu, Oct 01, 2015 at 08:15:25AM +0300, Akira Li wrote:
True or false?: do all iterables return the same items twice? http://www.fallacyfiles.org/loadques.html
[Aside: I have no idea what point you are making with the above link.] Of course they don't necessarily do so, but those that don't are not necessarily well-behaved. In the case of sequences and collections, the concept is that (absent any explicit mutation operation), iterating over it twice *should* give the same results, that is the normal expectation. But that isn't enforced, we can write something that breaks that rule:
    import random

    class WeirdIterable:
        def __getitem__(self, i):
            if random.random() > 0.9:
                raise IndexError
            return random.choice(["fe", "fi", "fo", "fum"])
but most people would consider that to be a pathological case. Yes, you can do it, and maybe you have a reason to do so, but you can't expect other people's code to deal with it gracefully. In the case of iterators, the answer is *certainly not*. Iterators are designed for the express purpose of handling not just the "lazy sequence" case where you choose to calculate results on demand as an optimization, but the case where you *have no choice* because the results are coming from some source which may change from run to run, e.g. an external data source. An iterator *may* repeat if run twice, but there is no expectation that it will do so. It's not just that the rule about repeatability is not enforced, but that there is no such rule in the first place. (By the way, when I talk about running an iterator twice, I'm completely aware that technically you cannot ever do so. What I mean is to iterate over the object, then *recreate the object* in some sense, then iterate over it again.)
No, and no. In principle, list() will quite happily create an infinite list for you, if you have infinite memory :-) The fact that in practice lists are probably limited to something of the order of 2**64 items or less is a mere quality of implementation issue :-) But to be more serious, no, in context we should understand that lists have actual physical limits, and even finite iterables may not be capable of being turned into lists:
    def gen():
        for i in range(10**10000):
            yield i
Perfectly finite in size, but you cannot have a list that big. It's not just *infinite iterables* which are prohibited, that's just a special case of iterables that will provide more items than you have memory to store. And that's not a fixed limit, it will differ from machine to machine. [...]
You've got the idea: the word *iterable* may be used in the context when not all iterables are accepted.
Sure. But the distinction is that while there are a whole lot of different iterables:
- iterables with a sufficiently small number of items
- iterables of hashable items
- iterables of (hashable key, item) pairs
- iterables of prime numbers less than one million
- iterables of strings containing exactly 1 vowel
etc., they are special cases and don't need specialised names. But there is a *general* distinction between two cases:
- iterables which are iterators
- iterables which are not iterators
We have a name for the first set: "iterators". But we don't have a name for the second set. Andrew suggested "non-iterator iterables" is too clumsy for general use, and suggests we need a better name. You suggested "iterables", but that clearly cannot work, since iterators are a kind of iterable. -- Steve

On Thu, Oct 1, 2015 at 8:10 AM, Steven D'Aprano <steve@pearwood.info> wrote:
sure -- but I've lost track of why it matters. "iterator" is well defined. And so is "iterable" -- why do we need to care whether the iterable returns itself when asked for an iterator? the term "sequence" is useful -- it defines certain behavior. So is the term "iterable", for the same reason. And it would be useful to say that a given object is both a sequence and an iterable (are sequences iterable by definition?) But why do you need to know that something is an iterable, but NOT an iterator? isn't that an implementation detail? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 01.10.2015 20:29, Chris Barker wrote:
You say some terms are useful because they define certain behavior. I don't question this, but what I find questionable is the proliferation of all these similar-sounding and similar-feeling concepts. Your reaction supports that observation. Best, Sven

On Thu, Oct 1, 2015, at 14:29, Chris Barker wrote:
But why do you need to know that something is an iterable, but NOT an iterator? isn't that an implementation detail?
Because an iterator *cannot possibly* allow you to loop through the contents twice [either one after the other or in parallel], whereas *most* non-iterator iterables do allow this. This (among other things such as representing a well-defined finite bag of values) is the property we're really chasing, "non-iterator iterable" is just a clumsy and inaccurate way of saying it. (I'm actually moderately disappointed, incidentally, that there's no easy way to create e.g. an iterable that will spin up a fresh copy of the same generator each time it's called. But it's easy enough to make a decorator for that.)
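[One possible shape for such a decorator, as a sketch only. The names reiterable, _Reiterable and countdown are hypothetical; nothing like this exists in the standard library under these names.]

```python
import functools


class _Reiterable:
    """Iterable that calls the generator function afresh on every iter()."""

    def __init__(self, genfunc, args, kwargs):
        self._genfunc = genfunc
        self._args = args
        self._kwargs = kwargs

    def __iter__(self):
        # A brand-new generator each time, so the object can be looped twice.
        return self._genfunc(*self._args, **self._kwargs)


def reiterable(genfunc):
    """Decorator: calling the function returns a re-iterable, not a generator."""
    @functools.wraps(genfunc)
    def wrapper(*args, **kwargs):
        return _Reiterable(genfunc, args, kwargs)
    return wrapper


@reiterable
def countdown(n):
    while n > 0:
        yield n
        n -= 1


c = countdown(3)
first_pass = list(c)
second_pass = list(c)   # not empty, unlike a plain generator
```

The result is only as repeatable as the generator function itself: if it reads from an external source, each pass may still differ.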

On 1 October 2015 at 19:41, Random832 <random832@fastmail.com> wrote:
If I understand what you mean by "non-iterator iterable", then a long time ago, there was a similar discussion and the term "reiterable" was used (Google will probably find references). Nothing ever came of the discussion - if I recall, there was a lot of theoretical debate, but few practical use cases. Anyone wanting to avoid a long, inconclusive discussion should probably chase up that old thread and see if anything new has been added this time around :-) Paul

On Thu, Oct 1, 2015 at 11:41 AM, Random832 <random832@fastmail.com> wrote:
um, then shouldn't you simply describe the iterator as an iterator? so any "iterable" would be assumed to be a non-iterator iterable. I guess this all comes about because we don't want to have to write this:
    for i in iter(an_iterable):
        .....
i.e. have a different interface for an iterator and an iterable. but I'm still lost on when that all has to be spelled out... And back to the original question, for enumerate: OK, you can't really have it be both an iterator AND a sequence, but couldn't it be an iterator and support indexing? Though I'm starting to wonder about the use case: enumerate() is a way to get the items in an iterable and an index at the same time -- so if you want to pass in a sequence, and index the result, why not just index into the sequence in the first place?? -Chris

On Thu, Oct 01, 2015 at 11:29:51AM -0700, Chris Barker wrote:
In and of itself, it probably isn't, except as a short-cut for deciding whether something is an iterator. [...]
But why do you need to know that something is an iterable, but NOT an iterator? isn't that an implementation detail?
I forget the original context -- I think it was Andrew who first mentioned this. Possibly over confusion about (x)range. But in general, it's important because:
- iterators are not random access, other iterables typically are;
- iterators are one-shot (cannot be restarted), other iterables are typically re-runnable.
This makes a difference. Just a few days ago, somebody mis-reported a supposed "bug" in all() and any(). For example:
    values = (x%5 == 3 for x in range(8))
    print(list(values))
    print(all(values))  # should return False
Obvious error is obvious: having printed out the values from the generator expression, values is now exhausted, and all() of the empty set is True (vacuous truth). The difference between general iterables which may or may not be one-shot iterators, and those which are definitely not iterators, is not always just an implementation detail. -- Steve
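[For contrast, a small sketch running the same two steps over a generator expression and over a list, assuming Python 3. The helper name two_passes is made up for illustration.]

```python
def two_passes(iterable):
    """Consume the argument twice: once with list(), once with all()."""
    return list(iterable), all(iterable)


# Generator expression: the second pass sees an exhausted iterator.
gen_items, gen_all = two_passes(x % 5 == 3 for x in range(8))

# List comprehension: the second pass sees the same eight values again.
list_items, list_all = two_passes([x % 5 == 3 for x in range(8)])
```

Both calls collect the same eight booleans on the first pass, but gen_all is True (all() of nothing) while list_all is False.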

On Thu, Oct 1, 2015 at 12:37 PM, Steven D'Aprano <steve@pearwood.info> wrote:
they used a generator expression when they clearly wanted a list comprehension -- so yes, it matters what they were getting. But I don't know that adding more vocabulary would help prevent people from making that mistake... if they had been smart enough to call list() again before claiming there was a bug in all(), it might have explained itself. -Chris

On Thu, Oct 01, 2015 at 04:13:37PM -0700, Chris Barker wrote:
You're missing the point. Don't focus on the fact that the bug was in their understanding of what their code did. Let's just pretend that their *intentional* algorithm was:
    def alg(iterable):
        print(list(iterable))
        print(all(iterable))
and for the sake of bringing this never-ending thread to an end, let's agree for the sake of argument that it cannot be re-written in any other way. Since the semantics of the function are intentional and correct, the parameter is named misleadingly. *iterable* is not sufficiently precise, because the function does not accept any old iterable -- it fails to work correctly on *iterators*, which are a sub-kind of iterable. If you want a more practical example, any algorithm which needs to iterate over an iterable two or more times needs to specify "iterable which is not an iterator". -- Steve

On Fri, Oct 2, 2015 at 1:58 PM, Steven D'Aprano <steve@pearwood.info> wrote:
For that particular case, I'd reiterate what others have suggested, and use the term "reiterable" for something you can iterate over more than once and get the same results. Sequences are normally reiterable. Any object whose __iter__ is a generator function with stable results will be reiterable. An iterator is not; nor is an open file object, or any other object where iteration consumes an external resource. This seems reasonable. ChrisA

On 2015-10-01 20:58, Steven D'Aprano wrote:
I would disagree with this, because this terminology is both too technical and not technical enough. Just because something isn't an iterator doesn't mean you can iterate it multiple times. You *could* write an iterable which is not an iterator but still can't be iterated over multiple times (because, say, it returns a reference to some stored iterator that can't be restarted, or because it creates a custom iterator that references some persistent state of the iterable). If what you want is an iterable that can be iterated multiple times, then just say that. (Or say "reiterable" or "reentrant iterable" or whatever.) There's no need to bring iterators into it at all. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
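[A sketch of the first case described above: an iterable that is not an iterator, yet cannot be iterated twice. The class name OneShot is hypothetical.]

```python
class OneShot:
    """Not an iterator (it has no __next__), but it hands out one stored iterator."""

    def __init__(self, items):
        self._stored = iter(items)   # created once, never restarted

    def __iter__(self):
        return self._stored          # the same iterator every time


s = OneShot("abc")
first_pass = list(s)
second_pass = list(s)   # the stored iterator is already exhausted
```

Here iter(s) is not s, so a "non-iterator iterable" test passes, but the second loop still comes up empty.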

Steven D'Aprano <steve@pearwood.info> writes:
list(iterable) does not work for infinite iterables. set(iterable) does not work for iterables that yield non-hashable items. dict(iterable) does not work for iterables that do not yield key,value pairs. Each builtin specifies what type of iterable it accepts but the parameter IS called *iterable*. Are you suggesting to rename the parameters?
If the intent is to write:
    def alg(iterable):
        seq = list(iterable)
        print(seq)
        print(all(seq))
You could say that the 1st alg() accepts a "repeatable deterministic iterable".

On Oct 1, 2015, at 23:30, Akira Li <4kir4.1i@gmail.com> wrote:
I think this is where the confusion arises. No one but you is talking about parameter names. The issue is just about having consistent, well-understood names that can be used in documentation, discussions like this thread, answers on the main Python list or StackOverflow, etc. The question of exactly which things we need to name (non-iterator iterable, repeatable iterables, repeatable iterables that always iterate the same elements, iterables that don't return self when iterated, iterables that always return a new object each time they're iterated, …) may be an open question (because you're right that they're not all the same thing, even if they usually overlap), but some subset of those things is relevant often enough that some of them need names. For example, you could tell the user who had the any/all "bug" that "The way you've coded that function, it only works for re-iterables/collections/whatever, but you're passing it the result of a generator expression, which is an iterator." That (together with a place the user can look up re-iterable/collection/whatever--whether that place is the glossary or the collective mind of the community or whatever) explains the problem without the need for a long explanation on the fact that not all iterables are reusable in the way he's trying to reuse them, and in particular iterators never are, and so on. The user doesn't then need to change his parameter name from "tests" to "reiterable" or anything like that--he's already got a perfectly good name. Of course it's not impossible that some of these concepts might also be useful as ABCs and/or static types, in which case it's possible he may want to add an annotation. But that's a separate issue, one which, again, nobody else has raised.

On Thu, Oct 1, 2015, at 11:10, Steven D'Aprano wrote:
I think what he is claiming, more or less, is that there is not a universal notion of "well-behaved" (this is true), or indeed *any* broadly-applicable notions of "well-behaved" (this is false).

Random832 <random832@fastmail.com> writes:
I would say that "well-behaved" "non-iterator iterable" is a strict subset of "non-iterator iterable". You probably want the word "reiterable" that I see mentioned in the thread. I don't know whether *reiterable* implies that it produces the same items the second time but it certainly implies that next(iter(reiterable)) _may produce something if_ the list(reiterable) call is successful. Perhaps *rerunnable* (that I also see mentioned in the thread) more strongly implies that the same items should be produced.

Steven D'Aprano <steve@pearwood.info> writes: ...
I meant that you won't use the word *iterable* if your function accepts _only_ iterators and therefore if I see *iterable* I expect that the function can handle arbitrary iterables, not just iterators. If your function rejects iterators then in practice it means that you might want a re-iterable/re-runnable (that I see mentioned in the thread) iterable (not an arbitrary "non-iterator iterable"). If there is no restriction that the same items should be produced the second time ("rerunnable iterable"?) then *collection* (introduced at the top of the thread) may work. Though *collection* implies that an iterable can't return the same (non-self) iterator -- otherwise an iterator is also a collection. I still don't see a practical need to avoid the word "iterable" unless new requirements (in addition to being non-iterator) are present.

On Fri, Oct 02, 2015 at 05:13:39AM +0300, Akira Li wrote:
I still don't see a practical need to avoid the word "iterable" unless new requirements (in addition to being non-iterator) are present.
I don't think anyone has suggested that we should avoid the word iterable. At most, some have suggested that we don't have a good word for those iterables which are not iterators. -- Steve

On Wed, Sep 30, 2015 at 12:19:05PM -0700, Andrew Barnert via Python-ideas wrote: [...]
There's also __contains__. Personally, I don't like it, but using "n in range(a, b+1)" for testing whether integer n falls within a particular range seems to be popular. I don't know why they don't just write a <= n <= b, but it seems to be a popular idiom for some weird reason. -- Steve

On Wed, Sep 30, 2015, at 13:19, Neil Girdhar wrote:
I guess, I'm just asking for enumerate to go through the same change that range went through. Why wasn't it a problem for range?
Range has always returned a sequence. Anyway, why stop there? Why not have map return a sequence? Zip? Anything that is a 1:1 mapping (or 1+1:1 in zip's case) could in principle be changed to return a sequence when given one. Who decides what does and doesn't benefit from random access? Or sliceability. It wouldn't be hard, in principle, to write a general-purpose function for slicing an iterator (i.e. returning an iterator that yields the elements that slicing a list of the same length would have given), particularly if it's limited to positive values.

On Sep 30, 2015, at 12:25, Random832 <random832@fastmail.com> wrote:
Even when it's called with a set, or an iterator? Yes, you _could_ do that by lazily adding values to a list as needed, but that could lead to some confusing behavior. For example, len(m) or m[-1] has to evaluate the rest of the input, which could take infinite time (well, it'll run out of memory first…).
The end user, of course. Some applications will never pass an infinite, or even very long, iterable into map, so they'd want random access and size and reversibility. Others won't ever want those features, but would want to pass in infinite iterators. That's why I think the best answer is to let people write (or install from PyPI) LazyList classes that fit their use cases, instead of trying to come up with one that tries to do everything and is misleading as often as it's useful. It's not actually impossible to design something that does a lot more without being inconsistent or confusing, but it's a bigger change than it appears at first glance, and would add a lot more complexity to the language than I think is worth it for the benefits. Again, see http://stupidpythonideas.blogspot.com/2014/07/swift-style-map-and-filter-vie... for details.
You mean itertools.islice?
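[For reference, a short sketch of itertools.islice doing positive-only slicing on an iterator, here an unbounded generator. The variable names are illustrative.]

```python
from itertools import count, islice

# An unbounded iterator: 0, 2, 4, 6, ...
evens = (2 * n for n in count())

# Equivalent in spirit to some_list[3:10:2], but lazy and forward-only.
picked = list(islice(evens, 3, 10, 2))

# The same slice taken from a real list, for comparison.
expected = list(range(0, 40, 2))[3:10:2]
```

islice rejects negative start, stop and step values with ValueError, which matches the "limited to positive values" caveat above. It also consumes the underlying iterator as it goes.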

On Wed, Sep 30, 2015 at 05:19:53PM +0000, Neil Girdhar wrote:
I guess, I'm just asking for enumerate to go through the same change that range went through. Why wasn't it a problem for range?
There is a pernicious myth that (x)range is an iterator. It is not. It is a sequence, but one where the items are calculated on demand rather than pre-populated into some large data structure (a list or array). This is not just a matter of labels. It is a matter of the actual behaviour. (x)range objects don't behave like iterators except in the simplest sense that you can iterate over them. So your question is based on false assumptions -- range didn't go through any such change. In Python 2, range was eager and xrange lazy, but both are sequences, and in Python 3 the eager version is gone and the lazy version renamed without the "x" prefix. -- Steve
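[A quick sketch of the sequence behaviour in question, assuming Python 3, where range is the lazy version.]

```python
from collections.abc import Iterator, Sequence

r = range(10, 20, 3)          # represents 10, 13, 16, 19 without storing them

length = len(r)               # sequences know their length
third = r[2]                  # random access
last = r[-1]                  # negative indexing
has_16 = 16 in r              # membership test
twice = (list(r), list(r))    # iterating again starts from the beginning

is_sequence = isinstance(r, Sequence)
is_iterator = isinstance(r, Iterator)
returns_self = iter(r) is r   # iterators return themselves; range does not
```

None of len(), indexing or repeated iteration works on a generator, which is why calling range a generator is misleading.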

On 30.09.15 20:18, Neil Girdhar wrote:
Ah good point. Well, in the case of a sequence argument, an enumerate object could be both a sequence and an iterator.
It can't be. For sequence:
For iterator:
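[The two snippets are missing from this copy of the message. Going by the reply below, which mentions zip using the same iterator in two places, the demonstration was presumably along the following lines. The exact code and the 'abcd' value are assumptions, not a quotation.]

```python
# For a sequence: each argument to zip gets its own fresh iterator.
x = "abcd"
seq_pairs = list(zip(x, x))

# For an iterator: both arguments share one iterator, so items alternate.
x = iter("abcd")
iter_pairs = list(zip(x, x))
```

An object cannot give both results for the same expression, which is the sense in which it cannot be a sequence and an iterator at once.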

On Wed, Sep 30, 2015 at 10:33 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
well, that's because zip is using the same iterator in two places. would that ever be the case with enumerate? -CHB

On Sep 30, 2015, at 10:47, Chris Barker <chris.barker@noaa.gov> wrote:
The point is that _nothing_ can be an iterator and a sequence at the same time. (And therefore, an enumerate object can't be both at the same time.) The zip function is just a handy way of demonstrating the problem; it's not the actual problem. You could also demonstrate it by, e.g., calling len(x), next(x), list(x): If x is an iterator, next(x) will use up the 'a' so list will only give you ['b', 'c', 'd'], even though len gave you 4. Conceptually: iterators are inherently one-shot iterables; sequences are inherently reusable iterables. While there's no explicit rule that __iter__ can't return self for a sequence, there's no reasonable way to make a sequence that does so. Which means no sequence can be an iterator.
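[To make the len/next/list demonstration concrete, here is a deliberately unreasonable class that tries to be both. SeqAndIterator is a hypothetical name, and this is a sketch of the contradiction, not a recommended design.]

```python
from collections.abc import Sequence


class SeqAndIterator(Sequence):
    """Claims to be a sequence, but __iter__ returns self like an iterator."""

    def __init__(self, data):
        self._data = list(data)
        self._pos = 0

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        return self._data[index]

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._data):
            raise StopIteration
        value = self._data[self._pos]
        self._pos += 1
        return value


x = SeqAndIterator("abcd")
n = len(x)          # reports four items
first = next(x)     # uses up 'a'
rest = list(x)      # only three items are left to iterate
```

After this, len(x) still says 4 and x[0] is still 'a', yet iterating yields nothing more: the sequence view and the iterator view disagree.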

Ah good point. Well, in the case of a sequence argument, an enumerate object could be both a sequence and an iterator. On Wed, Sep 30, 2015 at 1:15 PM Alexander Belopolsky < alexander.belopolsky@gmail.com> wrote:

I guess, I'm just asking for enumerate to go through the same change that range went through. Why wasn't it a problem for range? On Wed, Sep 30, 2015 at 1:18 PM Neil Girdhar <mistersheik@gmail.com> wrote:

On Wed, Sep 30, 2015 at 10:19 AM, Neil Girdhar <mistersheik@gmail.com> wrote:
I guess, I'm just asking for enumerate to go through the same change that range went through. Why wasn't it a problem for range?
well, range is simpler -- you don't pass arbitrary iterables into it. It always has to compute integer values according to start, stop, step -- easy to implement as either iteration or indexing. enumerate, on the other hand, takes an arbitrary iterable -- so it can't just index into that iterable if asked for an index. You are right, of course, that it COULD do that if it was passed a sequence in the first place, but then you have an interface whereby you get a different kind of object depending on how you created it, which is pretty ugly. But again, we could add indexing to enumerate, and have it do the ugly inefficient thing when it's using an underlying non-indexable iterator, and do the efficient thing when it has a sequence to work with, thereby providing the same API regardless. -CHB
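[A rough sketch of what "the same API regardless" might look like. IndexableEnumerate is a hypothetical class, not a proposal for the real enumerate. It assumes non-negative integer indices, and on the slow path the index counts from the current position.]

```python
from collections.abc import Sequence
from itertools import islice


class IndexableEnumerate:
    """Hypothetical enumerate-like iterator that also supports e[i]."""

    def __init__(self, iterable, start=0):
        self._source = iterable
        self._start = start
        self._it = enumerate(iterable, start)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __getitem__(self, index):
        if index < 0:
            raise IndexError("negative indices are not handled in this sketch")
        if isinstance(self._source, Sequence):
            # Efficient path: index straight into the underlying sequence.
            return (self._start + index, self._source[index])
        # Inefficient path: skip ahead, consuming items along the way.
        for pair in islice(self._it, index, None):
            return pair
        raise IndexError("index out of range")


over_seq = IndexableEnumerate("abcd")
seq_item = over_seq[2]            # does not disturb iteration
seq_rest = list(over_seq)

over_iter = IndexableEnumerate(iter("abcd"))
iter_item = over_iter[2]          # consumes 'a', 'b' and 'c'
iter_rest = list(over_iter)
```

The same expression e[2] leaves four items to iterate in one case and one item in the other, which is the kind of surprise raised in the replies.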

On 9/30/2015 1:28 PM, Chris Barker wrote:
But again, we could add indexing to enumerate, and have it do the ugly inefficient thing when it's using an underlying non-indexable iterator,
If the ugly inefficient thing is to call list(iterable), then that does not work with unbounded iterables. Or the input iterable might produce inputs at various times in the future. -- Terry Jan Reedy

Terry Reedy writes:
I think he means
    from itertools import islice
    a = list(islice(iterable, 0, 99))[42]
Or the input iterable might produce inputs at various times in the future.
Horrors! We'll have to add a "block=False" parameter to next(). (We can bikeshed on the default later.) Seriously, I think that one we just have to live with, just as we already live with it in any context where we access an iterable. Regards,

On 30.09.2015 19:19, Neil Girdhar wrote:
I guess, I'm just asking for enumerate to go through the same change that range went through. Why wasn't it a problem for range?
range() returns a list in Python 2 and a generator in Python 3. enumerate() has never returned a sequence. It was one of the first builtin APIs in Python to return a generator: https://www.python.org/dev/peps/pep-0279/ after iterators and generators were introduced to the language: https://www.python.org/dev/peps/pep-0234/ https://www.python.org/dev/peps/pep-0255/ The main purpose of enumerate() is to allow enumeration of objects in a sequence or other iterable. If you need a sequence, simply wrap it with list(), e.g. list(enumerate(sequence)). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 30 2015)

On Sep 30, 2015, at 11:11, M.-A. Lemburg <mal@egenix.com> wrote:
No it doesn't. It returns a (lazy) sequence. Not a generator, or any other kind of iterator. I don't know why so many people seem to believe it returns a generator. (And, when you point out what it returns, most of them say, "Why was that changed from 2.x xrange, which returned a generator?" but xrange never returned a generator either--it returned a lazy almost-a-sequence from the start.) There's no conceptual reason that Python couldn't have more lazy sequences, and tools to build your own lazy sequences more easily. However, things do get messy once you get into the details. For example, zip can return a lazy sequence if given only sequences, but what if it's given iterators, or other iterables that aren't sequences; filter can return something that's sort of like a sequence in that it can be repeatedly iterated but it can't be randomly-accessed. You really need a broader concept that integrates iteration and indexing, as in the C++ standard library. Swift provides the perfect example of how you could do something like that without losing the natural features of Python indexing and iteration. But it turns out to be complicated to explain, and to work with, and you end up writing multiple implementations for each iterable-processing function. I don't think the benefit is worth the cost. Another alternative is just to wrap any iterable in a caching LazyList type. This runs into complications because there are different choices that make sense for different uses (obviously you have to handle negative indexing, and obviously you have to handle infinite lists, so... Oops!), so it makes more sense to leave that up to the application to supply whatever lazy list type it needs and use it explicitly.
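To make the caching idea above concrete, here is a minimal sketch of one possible lazy list. The class name `LazyList` and its private helpers are hypothetical, not the API of any particular PyPI package, and the sketch deliberately shows the complication mentioned above: negative indexing and len() need the whole input, so they never return for an infinite iterable.

```python
from collections.abc import Sequence
from itertools import count

class LazyList(Sequence):
    """Sequence facade over an iterable; items are pulled and cached on demand."""

    def __init__(self, iterable):
        self._source = iter(iterable)
        self._cache = []

    def _fill_to(self, index):
        # Pull items until position `index` is cached or the source runs dry.
        while len(self._cache) <= index:
            try:
                self._cache.append(next(self._source))
            except StopIteration:
                break

    def _exhaust(self):
        # Pulls everything that is left; never returns for an infinite source.
        self._cache.extend(self._source)

    def __getitem__(self, index):
        if isinstance(index, slice):
            raise TypeError("slicing is left out of this sketch")
        if index < 0:
            self._exhaust()  # a negative index needs the final length
        else:
            self._fill_to(index)
        return self._cache[index]  # raises IndexError past the end

    def __len__(self):
        self._exhaust()
        return len(self._cache)

evens = LazyList(n for n in count() if n % 2 == 0)
print(evens[5])  # only the first six items are generated
```

Other designs are equally defensible (refusing negative indexes, or bounding the cache), which is the argument for leaving the choice to the application.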

I just remembered that the last few times related things came up, I wrote some blog posts going into details that I didn't want to have to dump on the list:
* http://stupidpythonideas.blogspot.com/2013/08/lazy-restartable-iteration.htm...
* http://stupidpythonideas.blogspot.com/2014/07/swift-style-map-and-filter-vie...
* http://stupidpythonideas.blogspot.com/2014/07/lazy-cons-lists.html
* http://stupidpythonideas.blogspot.com/2014/07/lazy-python-lists.html
* http://stupidpythonideas.blogspot.com/2015/07/creating-new-sequence-type-is-...
The one about Swift-style map and filter views is, I think, the most interesting here. The tl;dr is that views (lazy sequences) are nifty, and there's nothing actually stopping Python from using them in more places, but they do add complexity, and the benefits probably don't outweigh the costs.

Yup, the Swift-style map is a great blog entry, Andrew, and exactly what I was proposing for enumerate. I 100% agree that "views (lazy sequences) are nifty, and there's nothing actually stopping Python from using them in more places, but they do add complexity, and the benefits probably don't outweigh the costs." However, I wonder what Python will look like 5 years from now. Maybe it will be time for more sequences. On Wed, Sep 30, 2015 at 2:32 PM Andrew Barnert <abarnert@yahoo.com> wrote:

On 30.09.2015 20:26, Andrew Barnert via Python-ideas wrote:
You are right that it's not of a generator type and more like a lazy sequence. To be exact, it returns a range object and does implement the iter protocol via a range_iterator object. In Python 2 we have the xrange object which has similar properties, but not the same, e.g. you can't slice it.
I don't know why so many people seem to believe it returns a generator. (And, when you point out what it returns, most of them say, "Why was that changed from 2.x xrange, which returned a generator?" but xrange never returned a generator either--it returned a lazy almost-a-sequence from the start.)
Perhaps because it behaves like one ? :-) Unlike an iterator, it doesn't iterate over a sequence, but instead generates the values on the fly. FWIW: I don't think many people use the lazy sequence features of range(), e.g. the slicing or index support. By far most uses are in for-loops. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 30 2015)

It doesn't behave like a generator because it doesn't implement send, throw, or close. It's a sequence because it implements: __getitem__, __len__, __contains__, __iter__, __reversed__, index, and count. On Wed, Sep 30, 2015 at 2:43 PM M.-A. Lemburg <mal@egenix.com> wrote:

On Wed, Sep 30, 2015 at 2:46 PM, Neil Girdhar <mistersheik@gmail.com> wrote:
It doesn't behave like a generator because it doesn't implement send, throw, or close.
It is not a generator because Python says it is not:
    >>> isinstance(range(0), collections.Generator)
    False
It's a sequence because it implements: __getitem__, __len__, __contains__, __iter__, __reversed__, index, and count.
Ditto
    >>> isinstance(range(0), collections.Sequence)
    True
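For readers trying these checks today: the ABC aliases in the top-level collections module were removed in Python 3.10, so the same test has to go through collections.abc. A small sketch of the equivalent checks follows; the `checks` dictionary is just an illustrative way to collect the results.

```python
from collections.abc import Generator, Iterator, Sequence

r = range(0)

# Which abstract base classes does a range object satisfy?
checks = {
    "Generator": isinstance(r, Generator),
    "Iterator": isinstance(r, Iterator),
    "Sequence": isinstance(r, Sequence),
}
print(checks)
```

Only the Sequence check is true; a range is neither a generator nor any other kind of iterator.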

On Sep 30, 2015, at 11:43, M.-A. Lemburg <mal@egenix.com> wrote:
To be exact, it returns an object which returns True for isinstance(r, Sequence), which offers correct implementations of the entire sequence protocol. In other words, it's not "more like a lazy sequence", it's _exactly_ a lazy sequence. In 2.3-2.5, xrange was a lazy "sequence-like object", and the docs explained how it didn't have all the methods of a sequence but otherwise was like one. When the collections ABCs were added, xrange (2.x)/range (3.x) started claiming to be a sequence, but the implementation was incomplete, so it was defective. This was fixed in 3.2 (which also made all of the sequence methods efficient—e.g., a range that fits into C longs can test an int for __contains__ in constant time).
You're confusing things even worse here. A generator is an iterator. It's a perfect subtype relationship. A range does not behave like a generator, or like any other kind of iterator. It behaves like a sequence. Laziness is orthogonal to the iterator-vs.-sequenceness. Dictionary views are also lazy but not iterators, for example. And there's nothing stopping you from writing a generator with "yield from f.readlines()" (except that it would be stupid), which would be an iterator despite being not lazy in any useful sense. Maybe the problem is that we don't have enough words. I've tried to use "view" to refer to a lazy non-iterator iterable (dict views, range, NumPy slices), which seems to help within the context of a single long explanation for a single user's problem, but I'm not sure that's something we'd want enshrined in the glossary, since it's a general English word that probably has wider usefulness.
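A short sketch of the two cases named above, to show that laziness and iterator-ness vary independently. The function name `all_lines` is made up for the illustration, and io.StringIO stands in for a real file.

```python
import io
from collections.abc import Iterator

# Lazy but not an iterator: a dict view tracks the dict and can be re-iterated.
d = {"a": 1}
keys = d.keys()
d["b"] = 2                      # added after the view was created
first_pass = list(keys)
second_pass = list(keys)
print(first_pass, second_pass, isinstance(keys, Iterator))

# An iterator but not usefully lazy: everything is read at the first next().
def all_lines(f):
    yield from f.readlines()

f = io.StringIO("x\ny\nz\n")
gen = all_lines(f)
first_line = next(gen)
leftover = f.read()             # nothing left: the file was read in one go
print(repr(first_line), repr(leftover), isinstance(gen, Iterator))
```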
I've used range as a sequence (or at least a reusable iterable, a sized object, and a container). I've answered questions from people on StackOverflow who are doing so, and seen the highest-rep Python answerer on SO suggest such uses to other people. I don't think I'd ever use the index method (although I did see one SO user who was doing so, to wrap up some arithmetic in a way that avoids a possibly off-by-one error, and wanted to know why it was so slow in 3.1 but worked fine in 3.2...), but there's no reason range should be a defective "not-quite-sequence" instead of a sequence. What would be the point of that?
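For illustration, these are the kinds of sequence operations a Python 3 range supports without ever building a list. The particular range is arbitrary.

```python
r = range(10, 100, 5)           # 10, 15, ..., 95

size = len(r)                   # sized
fourth = r[3]                   # random access
last = r[-1]                    # negative indexing
middle = r[2:5]                 # slicing gives another range
position = r.index(40)          # arithmetic lookup, no scan needed for ints
has_45 = 45 in r                # container test
tail = list(reversed(r))[:3]    # reversible

print(size, fourth, last, middle, position, has_45, tail)

# Reusable: iterating twice yields the same items both times.
print(list(r) == list(r))
```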

On Sep 30, 2015, at 12:19, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Maybe the problem is that we don't have enough words. I've tried to use "view" to refer to a lazy non-iterator iterable (dict views, range, NumPy slices), which seems to help within the context of a single long explanation for a single user's problem, but I'm not sure that's something we'd want enshrined in the glossary, since it's a general English word that probably has wider usefulness.
I've just remembered that I said the exact same thing last time this discussion came up (less than 4 months ago), and someone pointed out to me that the docs already define the word "view" in the glossary specifically for dict/mapping views, and use the term "lazy sequence" in that definition, and use the term "virtual sequence" elsewhere. It's worth noting that dict views are not actually sequences, so defining view in terms of lazy sequence is probably not a good idea... Anyway, we probably don't need to invent any new terms; maybe we just need to pick some wording, define it clearly, and use it consistently throughout the docs.

On 30.09.2015 21:19, Andrew Barnert wrote:
I guess I used the wrong level of detail. I was trying to explain things in terms of concepts, not object types, isinstance() and ABCs. The reason was that the subject line makes a suggestion which simply doesn't fit the main concept behind enumerate: that of generating values on the fly instead of allocating them as a sequence. We just got sidetracked with range(), since Neil brought this up as an example of why changing enumerate() should be possible. Back on the topic:
The way I understand the proposal is that Neil wants the above to return:
iff isinstance(arg, collections.Sequence) and because this only makes sense iff e doesn't actually create a list, enumerate(arg) would have to return a lazy/virtual/whatever-term-you-use-for-generated-on-the-fly sequence :-) Regardless of this breaking backwards compatibility, what's the benefit of such a change ? Just like range(), enumerate() is most commonly used in for-loops, so the added sequence-ishness doesn't buy you anything much (except the need for more words in the glossary :-)). -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Sep 30 2015)

On Sep 30, 2015, at 12:47, M.-A. Lemburg <mal@egenix.com> wrote:
But you're conflating the concept of "lazy" with the concept of "iterator". While generators, and iterators in general, are always technically lazy and nearly-always practically lazy, lazy things are not always iterators. Range, dict views, memoryview/buffer objects, NumPy slices, third-party lazy-list types, etc. are not generators, nor are they like generators in any way, except for being lazy. They're lazy sequences (well, except for the ones that aren't sequences, but they're still lazy containers, or lazy non-iterator iterables if you want to stick to terms in the glossary). And I think experienced developers conflating the two orthogonal concepts is part of what leads to novices getting confused. They think that if they want laziness, they need a generator. That makes them unable to even form the notion that what they really want is a view/lazy container/virtual container even when that's what they want. And it makes it hard to discuss issues like this thread clearly. (The fact that we don't have a term for "non-iterator iterable", and that experienced users and even the documentation sometimes use the term "sequence" for that, only makes things worse. For example, a dict_keys is not a sequence in any useful sense, but the glossary says it is, because there is no word for what it wants to say.)
That's one way to give him what he wants. But another option would be to always return a lazy sequence--the same kind you'd get if you picked one of the LazyList classes off PyPI (which provide a sequence interface by iterating and caching an iterable), and just wrote "e = LazyList(enumerate(arg))". This is still only creating the values on demand, and only consuming the iterator (if that's what it's given) as needed. (Of course it does mean you can now demand multiple values at once from that iterator, e.g., by calling e[10] or len(e) when arg was an iterator.)
Or you could be even cleverer: enumerate always returns a lazy sequence, which uses random access if given a sequence, cached iteration if given any other iterable. That gives you the best of both worlds, right?
Either of these avoids the problem that the type of enumerate depends on the type of its input, and the more serious problem that you can't tell from inspection whether what it returns is reusable or one-shot, but of course they introduce other problems. I don't think any of the three is worth doing.
The three most consistent ways of doing this, if you were designing a language from scratch, seem to be:
1. Python: Always return an iterator; if people want sequence behavior (with whatever variety of laziness they desire), they can wrap it.
2. Haskell: Make everything in the language as lazy as possible, so you can just always return a list, and it will automatically be as lazy as possible.
3. Swift: Merge indexing and iteration, and bake in views as a fundamental concept, so you can always return a view, but whether its indices are random-access or not depends on whether its input's indices are.
I'm not sure that #1 is the best of the three, but it is exactly what Python already has, and the other two would be very hard to get to from here, so I think #1 is the best for Python 3.6 (or 4.0).
(The blog post I referenced earlier in the thread explores whether we could get to #3, or get part-way there, from here; if you don't agree that it would be harder than is worth doing, please read it and point out where I went wrong. Because that could be pretty cool.)
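For concreteness, the sequence-backed half of the idea discussed above might look roughly like the sketch below. `EnumerateView` is a hypothetical name, not an existing or proposed builtin, and the sketch sidesteps the hard part (what to do with non-sequence input) by rejecting it.

```python
from collections.abc import Sequence

class EnumerateView(Sequence):
    """Lazy, reusable (index, item) view over an existing sequence."""

    def __init__(self, seq, start=0):
        if not isinstance(seq, Sequence):
            raise TypeError("this sketch only wraps sequences")
        self._seq = seq
        self._start = start

    def __len__(self):
        return len(self._seq)

    def __getitem__(self, index):
        if isinstance(index, slice):
            raise TypeError("slicing is left out of this sketch")
        if index < 0:
            index += len(self._seq)
        if not 0 <= index < len(self._seq):
            raise IndexError("EnumerateView index out of range")
        # Pairs are built on demand; nothing is stored.
        return (self._start + index, self._seq[index])

ev = EnumerateView("abc", start=1)
print(ev[-1], len(ev), list(ev))
```

The Sequence mixin supplies __iter__, __reversed__, __contains__, index and count on top of the two methods written here.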

On 30.09.2015 23:33, Andrew Barnert via Python-ideas wrote:
I have absolutely no idea what you are talking about here. ;) I have to admit I try to avoid thinking too much about such tiny little details by using generators/lists/sequences/iterables/iterators/did-I-miss-one? directly in for loops only. Thus, the differences between all of them go away pretty fast. But honestly, does it really need to be that complicated? Best, Sven

On 9/30/2015 5:33 PM, Andrew Barnert via Python-ideas wrote:
(The fact that we don't have a term for "non-iterator iterable",
'collection' Some are concrete: they contain references to actual Python objects. Some are virtual (lazy): they contain the information needed to create Python objects as needed. Strings are a bit in between. -- Terry Jan Reedy

On Sep 30, 2015, at 17:04, Terry Reedy <tjreedy@udel.edu> wrote:
That's a perfectly good term, but it's not used that way in the docs, nor is anyone else using it in the discussions so far. Are you suggesting that we should start doing so? There are definitely parts of the docs that could be clarified or simplified with this term, such as the glossary entry and definitions for dict views, which inaccurately use the term "sequence". (And similarly, although not quite as badly, someone in this thread referred to "sequences and sequence-like things", which may be a little more intuitive than my "non-iterator iterables", but still isn't all that clear.) Also, the tutorial uses the phrases "data structures" or "data type" a few zillion times, apparently to avoid having to come up with a term that includes sequences, sets, dicts, and strings without being inaccurate. I've seen novices have no idea what "data structure" means, or get confused by what the difference between a "data type" and a "regular type" is.
Some are concrete: they contain references to actual Python objects. Some are virtual (lazy): they contain the information needed to create Python objects as needed.
I think the docs used to use the word "virtual" as a more specific term than "lazy": a view onto an object that conceptually exists but doesn't actually exist is "virtual" (like range, which is a view into the infinite set of integers), but a view into a real object isn't (like dict_keys, which has a reference to an actual dict), nor is something that isn't conceptually view-like at all (like a RNG iterator), even though they're all "lazy". It looks like the word "virtual" in this context doesn't appear anywhere in the docs anymore, so I suppose it could be repurposed, but if it's just a synonym for "lazy", what's wrong with "lazy"?

On 9/30/2015 7:24 PM, Andrew Barnert via Python-ideas wrote:
Also, the tutorial uses the phrases "data structures" or "data type" a few zillion times, apparently to avoid having to come up with a term that includes sequences, sets, dicts, and strings without being inaccurate. I've seen novices have no idea what "data structure" means, or get confused by what the difference between a "data type" and a "regular type" is.
container? https://docs.python.org/3/library/collections.html Emile

On Sep 30, 2015, at 21:05, Emile van Sebille <emile@fenx.com> wrote:
But that means something with a __contains__ test. Containers don't even have to be iterables. It's true that all of the types discussed in the tutorial are containers, but is that actually the meaning we're looking for, or just something that's coincidentally true? At any rate, even if that does work for the tutorial, I don't think it solves the more general problem. When I want to talk about iterables that give you a different, independent iterator each time you call __iter__, "container" is not the right word for that. Terry's "collection" seems like a better choice, because it doesn't already have a conflicting meaning.

Andrew Barnert via Python-ideas <python-ideas@python.org> writes:
If either "lazy" or "virtual" means that the contained objects don't exist as python objects until they are accessed, doesn't this extend to strings and arrays (and byte strings, byte arrays, and memory views)?

On Sep 30, 2015, at 21:13, Random832 <random832@fastmail.com> wrote:
There's a sense in which that's true, and in some discussions (e.g., intimately involving the GC or object sharing or optimization of array algorithms) that would be the most relevant sense, but there's also a sense in which they concretely hold all the values in memory, and in most discussions (e.g., talking about generic sequence algorithms) that would be more relevant. I don't think there's a major problem here—we don't need to eliminate all ambiguity from our speech, only the ambiguity that actually gets in the way.

Andrew Barnert via Python-ideas <python-ideas@python.org> writes: ...
(The fact that we don't have a term for "non-iterator iterable", and
All iterators are iterable but some iterables are not iterators. If your code accepts only iterators then use the term *iterator*. Otherwise the term *iterable* could be used. It is misleading to use *iterable* if your code only accepts iterators. If an iterable is an iterator, it is called an *iterator*. The term *iterable* implies that some instances are not iterators.

Akira Li <4kir4.1i@gmail.com> writes:
There are three (well, three and a half) kinds of code that consume iterables. How would you describe each simply?
1. Does not call iter, simply calls next. Therefore cannot consume a non-iterator iterable.
2. Calls iter, but can accept an iterator (e.g. only goes through it once)
3. Cannot accept an iterator (goes through it twice, or permanently stores a reference to it, etc)
4. Can accept either, but behaves differently in each case (e.g. zip when passed two of the same iterator) - this can be regarded as a special case of #2.
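The four kinds listed above can be illustrated with toy functions. All four names are invented for this sketch.

```python
def first_two(iterator):
    # Kind 1: calls next() directly, never iter(); a list is rejected.
    return next(iterator), next(iterator)

def total(iterable):
    # Kind 2: a single pass, so an iterator works as well as a list.
    result = 0
    for item in iterable:
        result += item
    return result

def spread(iterable):
    # Kind 3: two passes; with an iterator the second pass sees nothing.
    return max(iterable) - min(iterable)

def pairs(a, b):
    # Kind 4: accepts anything, but acts differently when a and b
    # are the same iterator, because zip then alternates between them.
    return list(zip(a, b))

data = [1, 2, 3, 4]
print(first_two(iter(data)), total(iter(data)), spread(data))
print(pairs(data, data))
shared = iter(data)
print(pairs(shared, shared))
```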

On Sep 30, 2015, at 19:04, Akira Li <4kir4.1i@gmail.com> wrote:
And this is exactly the problem. We don't have any way to simply describe this thing. Hence all the confusion in this thread, and in similar discussions elsewhere, and even in the docs (e.g., describing dict views as sequences and then going on to explain that they're not actually sequences). The fact that it took your previous message four paragraphs without inventing a new term, to say what I said in one sentence with a new term, demonstrates the problem. As does the fact that my attempted new term, "non-iterator iterable", is sufficiently ugly and not intuitively helpful enough that you felt the need to expand on it for four paragraphs.

Andrew Barnert via Python-ideas <python-ideas@python.org> writes:
Use *iterable* instead of "non-iterator iterable" -- it is that simple. "dict views" seems a pretty good term for dict views. Are you suggesting to call dict views "non-iterator iterable"? I don't see that it says more than that all dict views are iterable. It seems there is a bug in the glossary: the entry name should be "dict views", not just "view" that is too generic for the description. I've submitted a patch http://bugs.python.org/issue25286
I don't need 4 paragraphs to describe it: if you need an iterator, use the term *iterator*; otherwise use *iterable*, unless you need something more specific, e.g., the name *seq* is common for generic sequences.
I don't remember ever using "non-iterator iterable".
"non-iterator iterable" does not qualify as more specific. You need to introduce new requirements to the type for that.

Akira Li <4kir4.1i@gmail.com> writes:
The question is, how do you *simply* state the very common requirement for an iterable to not behave in a specific undesirable way that all iterators do, and that it is very uncommon for any iterable other than an iterator to do? Are you opposed to having a word for this concept at all, or do you just not like the terms other people are suggesting?

Random832 <random832@fastmail.com> writes:
That term is **iterable**. As I already said: Specific application may use more specific requirements e.g.:
list(iterable):
- does it mean that all iterables must be finite?
- do we need a special word to describe what list() accepts?
set(iterable):
- does it mean that all iterables must yield hashable items?
- do we need a special word to describe what set() accepts?
dict(iterable):
- does it mean that all iterables must yield pairs?
- do we need a special word to describe what dict() accepts?
You've got the idea: the word *iterable* may be used in the context when not all iterables are accepted.
https://mail.python.org/pipermail/python-ideas/2015-October/036692.html

On Sep 30, 2015, at 22:59, Akira Li <4kir4.1i@gmail.com> wrote:
This is a link to a reply where you pasted exactly the same text as in this reply, and in a third one. What is that supposed to mean? I feel like you must be trying to get across something really important here, and it's my fault for not getting it, but I still can't get it. Can you try rewording it instead of just pasting the same text again and/or a link to the same text?
If it helps, let me try to ask specific questions. Are you arguing one of the following:
* there is no such thing as an iterable that isn't an iterator, or an iterable that is repeatable, or an iterable that provides a new iterator each time iter is called?
* there are such things, but no corresponding property that can be used to characterize a set?
* that such sets do exist, but are never useful to discuss?
* that such sets may be useful to discuss, but the names I (and Terry and others) came up with are unhelpful?
More concretely: the documentation for dict views goes out of its way to point out that these are not iterators, but a different kind of iterable that's more like a sequence (presumably meaning at least one of the three things above). But it does so inaccurately, by saying they are sequences, which is not true. How could it be rewritten to get that point across accurately, but still concisely and readably?

Andrew Barnert via Python-ideas <python-ideas@python.org> writes:
On Sep 30, 2015, at 22:59, Akira Li <4kir4.1i@gmail.com> wrote: ...
It means exactly what it says -- literally: "the word *iterable* may be used in the context when not all iterables are accepted." and list(iterable), set(iterable), dict(iterable) are the specific examples. It is a statement of "how it *is*" and that it is acceptable in my view and there is no need to change it. Obviously, list/set/dict docs describe what subset of iterables they accept. If you agree on that then there is no disagreement. And it should answer the questions from your post. If you disagree then to ground the discussion what _specific_ places in the documentation would you like to change? ...
"How could it be rewritten": I remember posting the link to Python issue already http://bugs.python.org/issue25286

On 10/1/2015 1:59 AM, Akira Li wrote:
finite_iterable
iterable_of_hashables
iterable_of_pairs (whose first member is hashable)
- does it mean that all iterables must yield pairs? - do we need a special word to describe what dict() accepts?
Whatever word is used in the signature, the description should be in the doc and docstring, preferably in the first line.
    Return a list with members from a finite iterable.
    Return a set with members from an iterable of hashable objects.*
    Return a string joining items from an iterable of strings.
*Also, equality between objects should be transitive
You've got the idea: the word *iterable* may be used in the context when not all iterables are accepted.
Right. Each one should be documented properly. -- Terry Jan Reedy
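A quick sketch of how the per-constructor restrictions discussed above surface in practice: each constructor takes "an iterable" in its signature, and inputs outside the documented subset fail with an exception. The helper `try_build` is invented for the illustration.

```python
def try_build(factory, iterable):
    """Call factory(iterable); return the result or the exception's type name."""
    try:
        return factory(iterable)
    except (TypeError, ValueError) as exc:
        return type(exc).__name__

print(try_build(set, "abca"))            # hashable items: accepted
print(try_build(set, [[1], [2]]))        # lists are unhashable
print(try_build(dict, [("a", 1)]))       # pairs: accepted
print(try_build(dict, [(1, 2, 3)]))      # a 3-tuple is not a pair
```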

On Sep 30, 2015, at 22:39, Akira Li <4kir4.1i@gmail.com> wrote:
No it isn't. The word "iterable" just means "iterable". When you want to talk about sequences—a subtype of iterables—you don't just say "iterable", you say "sequence". And likewise, when you want to talk about iterables that aren't iterators, or iterables that are repeatable, or any other subtype of iterables, you have to use a word (or phrase) that actually means what you're saying. I don't know how to explain this any better. Everyone else seems to get it, but you just post the same reply to each of them that you posted to me when they try to explain further. What am I not getting across here?
"dict views" seems a pretty good term for dict views. Are you suggesting to call dict views "non-iterator iterable"?
Why would you think that?
There's a much larger problem. The glossary says that dict views are sequences. They aren't. The actual documentation for dict views is a little better, because it explains that they're not actually sequences. But the problem is still there: what the docs are trying to say is that dict views are some kind of non-iterator iterable, but, because we don't have a term for that, they use the incorrect term "sequence".
Why would you use "seq" instead of "sequence" for the name of the abstract sequence type? And, more importantly, what name do you use when you need something more specific than "iterable", but less specific than "sequence"—as in the glossary entry for dict views, for example?
I don't remember ever using "non-iterator iterable".
Why would you expect to remember using it, when you're replying to a message where I invented it for lack of an established better name (and in hopes that someone would come up with one)?
There are things that are iterables, that are not non-iterator iterables, but the reverse is not true. It's a name for a strict subset. Which means it's more specific. As for a new requirement: an iterable is a non-iterator iterable if its __iter__ method does not return self. (If you're going to argue that this requirement can't be checked by, e.g., a structural type checker, remember that neither is the distinction between sequence and mapping, and that doesn't mean they're the same type.)
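The requirement stated above (an iterable whose __iter__ does not return self) can be written down as a runtime check. This is only a sketch: the function name is invented, and calling iter() on an arbitrary object can have side effects for unusual types.

```python
def is_non_iterator_iterable(obj):
    """True if obj is iterable and iter(obj) is something other than obj itself."""
    # iter() raises TypeError for objects that are not iterable at all.
    return iter(obj) is not obj

def gen():
    yield 1

samples = {
    "list": is_non_iterator_iterable([1, 2]),
    "range": is_non_iterator_iterable(range(3)),
    "dict keys view": is_non_iterator_iterable({}.keys()),
    "list iterator": is_non_iterator_iterable(iter([1, 2])),
    "generator": is_non_iterator_iterable(gen()),
}
print(samples)
```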

On 2015-09-30 23:24, Andrew Barnert via Python-ideas wrote:
Not sure I followed all the discussion of these terms, but is your main reason for wanting this term to describe the behavior that non-iterator iterables can be "restarted" (and are so restarted if reused in a different context)? Personally I prefer to take a duck-typing view and focus on what operations you can or can't do on these various things. Whether you call it a view or a virtual indexer or whatever is, to me, less important than what you can do with the object. I agree there are a number of relevant subcategories of objects here, some of which we have a name for and some that we don't. But I think it gets easier if we move from generic nouns like "view" to specific adjectives describing the behaviors the object supports. Something like "re-entrant iterable" (meaning if you use it in two for loops right after each other you get the whole thing both times) would focus on that aspect of the behavior. Something like "random-accessible" or "sliceable" if we want to talk about iterables where we can "jump ahead" or slice if needed. It's an interesting idea to think about what kinds of operations (map, filter, etc.) could return iterables supporting what other kinds of operations. That is, can we make sure the result of map/filter can be sliced/indexed/reentered if the source can. To me the interesting question is which of these actual behaviors can usefully and non-mind-bendingly be preserved through map/filter/etc. manipulations. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
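As one data point for the question above, map is the easy case: a map over a sequence can keep indexing, slicing and re-entrancy, because item i of the result depends only on item i of the source. The sketch below uses a hypothetical `MapView` class. Filter does not fit this pattern, since the position of a surviving item depends on everything before it.

```python
from collections.abc import Sequence

class MapView(Sequence):
    """Lazy view applying `func` to each item of an underlying sequence."""

    def __init__(self, func, seq):
        self._func = func
        self._seq = seq

    def __len__(self):
        return len(self._seq)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slicing stays lazy: slice the source, keep the function.
            return MapView(self._func, self._seq[index])
        return self._func(self._seq[index])

calls = []

def shout(word):
    calls.append(word)          # record which items were actually computed
    return word.upper()

mv = MapView(shout, ["a", "b", "c", "d"])
print(mv[1], list(mv[2:]), calls)
```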

Andrew Barnert via Python-ideas <python-ideas@python.org> writes:
"iterables that aren't iterators" unlike sequences do not introduce new requirements -- __iter__ returning non-self doesn't count -- the return value is still an _arbitrary_ iterator e.g., it may return the same iterator each time. Sequences on the other hand do introduce new requirements (__len__, __getitem__ and their specific semantics that distinguish them from Mappings).
I have no objection to the phrase "repeatable iterable" because it does introduce a useful distinction. ...
I meant in the code e.g., random.choice(seq), to communicate that an arbitrary iterable is not enough.
As I said, I would use the term "dict views". If you mean how "dict views" could be defined then I've already linked more than once to the corresponding Python docs issue with the patch http://bugs.python.org/issue25286
Could you provide a non-hypothetical practical example from existing code of a function that accepts arbitrary iterables but rejects iterators? Perhaps discussing a specific example would help to iron out the terminology.

On 9/30/2015 10:31 PM, Andrew Barnert via Python-ideas wrote:
re-iterable (with implication of same sequence of yields) I have used this for years on python list and do not think I am unique.
Depends on the 'we'. -- Terry Jan Reedy

Akira Li <4kir4.1i@gmail.com> writes:
True or false?: It is reasonable to write algorithms that iterate twice over a passed-in iterable, with the expectation that said iterable will typically be an object (or a view of such an object) which will not be concurrently modified (e.g. by a different thread or by a side-effect of a callback) during the execution of the algorithm, but which does not behave in a useful way when given an iterator, a generator, or any other kind of iterable which exhibits similar behavior whereby the second and further attempts to iterate will yield no items.
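A small example of the kind of algorithm described above, with invented function names. The two-pass version works for any re-iterable input and silently returns a wrong answer for an iterator; the second version shows the usual defensive fix of materializing the input first.

```python
def normalize(values):
    total = sum(values)                     # first pass
    return [v / total for v in values]      # second pass

def normalize_safe(values):
    values = list(values)                   # one pass over the input, then reuse
    total = sum(values)
    return [v / total for v in values]

print(normalize([1, 3]))
print(normalize(iter([1, 3])))              # second pass finds nothing
print(normalize_safe(iter([1, 3])))
```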

Random832 <random832@fastmail.com> writes:
True or false?: do all iterables return the same items twice? http://www.fallacyfiles.org/loadques.html
Specific application may use more specific requirements e.g.:
list(iterable):
- does it mean that all iterables must be finite?
- do we need a special word to describe what list() accepts?
set(iterable):
- does it mean that all iterables must yield hashable items?
- do we need a special word to describe what set() accepts?
dict(iterable):
- does it mean that all iterables must yield pairs?
- do we need a special word to describe what dict() accepts?
You've got the idea: the word *iterable* may be used in the context when not all iterables are accepted.

On Thu, Oct 01, 2015 at 08:15:25AM +0300, Akira Li wrote:
True or false?: do all iterables return the same items twice? http://www.fallacyfiles.org/loadques.html
[Aside: I have no idea what point you are making with the above link.] Of course they don't necessarily do so, but those that don't are not necessarily well-behaved. In the case of sequences and collections, the concept is that (absent any explicit mutation operation) iterating over it twice *should* give the same results; that is the normal expectation. But that isn't enforced; we can write something that breaks that rule:

    import random

    class WeirdIterable:
        def __getitem__(self, i):
            # Stop at a random point, otherwise return a random item.
            if random.random() > 0.9:
                raise IndexError
            return random.choice(["fe", "fi", "fo", "fum"])

but most people would consider that to be a pathological case. Yes, you can do it, and maybe you have a reason to do so, but you can't expect other people's code to deal with it gracefully.
In the case of iterators, the answer is *certainly not*. Iterators are designed for the express purpose of handling not just the "lazy sequence" case where you choose to calculate results on demand as an optimization, but the case where you *have no choice* because the results are coming from some source which may change from run to run, e.g. an external data source. An iterator *may* repeat if run twice, but there is no expectation that it will do so. It's not just that the rule about repeatability is not enforced, but that there is no such rule in the first place.
(By the way, when I talk about running an iterator twice, I'm completely aware that technically you cannot ever do so. What I mean is to iterate over the object, then *recreate the object* in some sense, then iterate over it again.)
No, and no. In principle, list() will quite happily create an infinite list for you, if you have infinite memory :-) The fact that in practice lists are probably limited to something of the order of 2**64 items or less is a mere quality of implementation issue :-) But to be more serious, no, in context we should understand that lists have actual physical limits, and even finite iterables may not be capable of being turned into lists:

    def gen():
        for i in range(10**10000):
            yield i

Perfectly finite in size, but you cannot have a list that big. It's not just *infinite iterables* which are prohibited, that's just a special case of iterables that will provide more items than you have memory to store. And that's not a fixed limit, it will differ from machine to machine. [...]
You've got the idea: the word *iterable* may be used in the context when not all iterables are accepted.
Sure. But the distinction is that while there are a whole lot of different iterables:
- iterables with a sufficiently small number of items
- iterables of hashable items
- iterables of (hashable key, item) pairs
- iterables of prime numbers less than one million
- iterables of strings containing exactly 1 vowel
etc., they are special cases and don't need specialised names. But there is a *general* distinction between two cases:
- iterables which are iterators
- iterables which are not iterators
We have a name for the first set: "iterators". But we don't have a name for the second set. Andrew suggested "non-iterator iterables" is too clumsy for general use, and suggests we need a better name. You suggested "iterables", but that clearly cannot work, since iterators are a kind of iterable. -- Steve

On Thu, Oct 1, 2015 at 8:10 AM, Steven D'Aprano <steve@pearwood.info> wrote:
sure -- but I've lost track of why it matters. "iterator" is well defined. And so is "iterable" -- why do we need to care whether the iterable returns itself when asked for an iterator? the term "sequence" is useful -- it defines certain behavior. So is the term "iterable", for the same reason. And it would be useful to say that a given object is both a sequence and an iterable (are sequences iterable by definition?) But why do you need to know that something is an iterable, but NOT an iterator? isn't that an implementation detail? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 01.10.2015 20:29, Chris Barker wrote:
You say some terms are useful because they define certain behavior. I don't question this, but what I find questionable is the proliferation of all these similar-sounding and similar-feeling concepts. Your reaction supports that observation. Best, Sven

On Thu, Oct 1, 2015, at 14:29, Chris Barker wrote:
But why do you need to know that something is an iterable, but NOT an iterator? isn't that an implementation detail?
Because an iterator *cannot possibly* allow you to loop through the contents twice [either one after the other or in parallel], whereas *most* non-iterator iterables do allow this. This (among other things such as representing a well-defined finite bag of values) is the property we're really chasing, "non-iterator iterable" is just a clumsy and inaccurate way of saying it. (I'm actually moderately disappointed, incidentally, that there's no easy way to create e.g. an iterable that will spin up a fresh copy of the same generator each time it's called. But it's easy enough to make a decorator for that.)
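A minimal sketch of the kind of decorator alluded to above, assuming nothing beyond the standard library. The names `Reiterable`, `reiterable` and `countdown` are hypothetical and chosen only for illustration; the idea is simply that `__iter__` calls the generator function again on every pass.

```python
import functools


class Reiterable:
    """Iterable that calls a generator function afresh on every iter()."""

    def __init__(self, genfunc, args, kwargs):
        self._genfunc = genfunc
        self._args = args
        self._kwargs = kwargs

    def __iter__(self):
        # A brand-new generator object is created for each iteration pass.
        return self._genfunc(*self._args, **self._kwargs)


def reiterable(genfunc):
    """Hypothetical decorator: calling the function returns a Reiterable."""
    @functools.wraps(genfunc)
    def wrapper(*args, **kwargs):
        return Reiterable(genfunc, args, kwargs)
    return wrapper


@reiterable
def countdown(n):
    while n > 0:
        yield n
        n -= 1


values = countdown(3)
first_pass = list(values)    # consumes one fresh generator
second_pass = list(values)   # consumes another fresh generator
```

Both passes see the same items here only because `countdown` is deterministic; the wrapper itself makes no promise about repeatability.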

On 1 October 2015 at 19:41, Random832 <random832@fastmail.com> wrote:
If I understand what you mean by "non-iterator iterable", then a long time ago, there was a similar discussion and the term "reiterable" was used (Google will probably find references). Nothing ever came of the discussion - if I recall, there was a lot of theoretical debate, but few practical use cases. Anyone wanting to avoid a long, inconclusive discussion should probably chase up that old thread and see if anything new has been added this time around :-) Paul

On Thu, Oct 1, 2015 at 11:41 AM, Random832 <random832@fastmail.com> wrote:
um, then shouldn't you simply describe the iterator as an iterator? so any "iterable" would be assumed to be a non-iterator iterable. I guess this all comes about because we don't want to have to write this:

    for i in iter(an_iterable):
        .....

i.e. have a different interface for an iterator and an iterable. but I'm still lost on when that all has to be spelled out... And back to the original question, for enumerate: OK, you can't really have it be both an iterator AND a sequence, but couldn't it be an iterator and support indexing? Though I'm starting to wonder about the use case: enumerate() is a way to get the items in an iterable and an index at the same time -- so if you want to pass in a sequence, and index the result, why not just index into the sequence in the first place?? -Chris

On Thu, Oct 01, 2015 at 11:29:51AM -0700, Chris Barker wrote:
In and of itself, it probably isn't, except as a short-cut for deciding whether something is an iterator. [...]
But why do you need to know that something is an iterable, but NOT an iterator? isn't that an implementation detail?
I forget the original context -- I think it was Andrew who first mentioned this. Possibly over confusion about (x)range. But in general, it's important because:
- iterators are not random access, other iterables typically are;
- iterators are one-shot (cannot be restarted), other iterables are typically re-runnable.
This makes a difference. Just a few days ago, somebody mis-reported a supposed "bug" in all() and any(). For example:

    values = (x%5 == 3 for x in range(8))
    print(list(values))
    print(all(values))  # should return False

Obvious error is obvious: having printed out the values from the generator expression, values is now exhausted, and all() of the empty set is True (vacuous truth). The difference between general iterables which may or may not be one-shot iterators, and those which are definitely not iterators, is not always just an implementation detail. -- Steve

On Thu, Oct 1, 2015 at 12:37 PM, Steven D'Aprano <steve@pearwood.info> wrote:
they used a generator expression, when they clearly wanted a list comprehension -- so yes, it matters what they were getting, I don't know that adding more vocabulary would help prevent people from making that mistake... if they had been smart enough to call the list() again, before claiming there was a bug in all -- it may have explained itself. -Chris

On Thu, Oct 01, 2015 at 04:13:37PM -0700, Chris Barker wrote:
You're missing the point. Don't focus on the fact that the bug was in their understanding of what their code did. Let's just pretend that their *intentional* algorithm was:

    def alg(iterable):
        print(list(iterable))
        print(all(iterable))

and for the sake of bringing this never-ending thread to an end, let's agree for the sake of argument that it cannot be re-written in any other way. Since the semantics of the function are intentional and correct, the parameter is named misleadingly. *iterable* is not sufficiently precise, because the function does not accept any old iterable -- it fails to work correctly on *iterators*, which are a sub-kind of iterable. If you want a more practical example, any algorithm which needs to iterate over an iterable two or more times needs to specify "iterable which is not an iterator". -- Steve
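To make the difference concrete, here is a small sketch of the same two-pass algorithm, written to return its results instead of printing them (an assumption made only so the outcome can be inspected). The sample data is hypothetical.

```python
def alg(iterable):
    """Two passes over the same argument, mirroring the alg() above."""
    first = list(iterable)   # first pass
    second = all(iterable)   # second pass; an exhausted iterator yields nothing
    return first, second


data = [False, True, False]

# A list can be iterated twice, so all() sees the False values.
from_list = alg(data)

# An iterator is exhausted by list(), so all() sees no items at all
# and returns True (vacuous truth).
from_iterator = alg(iter(data))
```

The two calls agree on the first pass and disagree on the second, even though both arguments are legitimately "iterables".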

On Fri, Oct 2, 2015 at 1:58 PM, Steven D'Aprano <steve@pearwood.info> wrote:
For that particular case, I'd reiterate what others have suggested, and use the term "reiterable" for something you can iterate over more than once and get the same results. Sequences are normally reiterable. Any object whose __iter__ is a generator function with stable results will be reiterable. An iterator is not; nor is an open file object, or any other object where iteration consumes an external resource. This seems reasonable. ChrisA

On 2015-10-01 20:58, Steven D'Aprano wrote:
I would disagree with this, because this terminology is both too technical and not technical enough. Just because something isn't an iterator doesn't mean you can iterate it multiple times. You *could* write an iterable which is not an iterator but still can't be iterated over multiple times (because, say, it returns a reference to some stored iterator that can't be restarted, or because it creates a custom iterator that references some persistent state of the iterable). If what you want is an iterable that can be iterated multiple times, then just say that. (Or say "reiterable" or "reentrant iterable" or whatever.) There's no need to bring iterators into it at all. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown
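A minimal sketch of the first case described above: an iterable that is not an iterator but still cannot be iterated twice, because `__iter__` hands back one stored iterator. The class name `StoredIteratorWrapper` is hypothetical.

```python
class StoredIteratorWrapper:
    """Hypothetical iterable that is not an iterator, yet is one-shot."""

    def __init__(self, items):
        self._it = iter(items)  # a single stored iterator, never restarted

    def __iter__(self):
        # Every call returns the same underlying iterator.
        return self._it


w = StoredIteratorWrapper("abc")
has_next = hasattr(w, "__next__")  # the wrapper itself is not an iterator
first = list(w)                    # consumes the stored iterator
second = list(w)                   # nothing left the second time
```

So "is not an iterator" and "can be iterated more than once" are independent properties, which is the point being made.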

Steven D'Aprano <steve@pearwood.info> writes:
list(iterable) does not work for infinite iterables. set(iterable) does not work for iterables that yield non-hashable items. dict(iterable) does not work for iterables that do not yield key,value pairs. Each builtin specifies what type of iterable it accepts but the parameter IS called *iterable*. Are you suggesting to rename the parameters?
If the intent is to write:

    def alg(iterable):
        seq = list(iterable)
        print(seq)
        print(all(seq))

You could say that the 1st alg() accepts a "repeatable deterministic iterable".

On Oct 1, 2015, at 23:30, Akira Li <4kir4.1i@gmail.com> wrote:
I think this is where the confusion arises. No one but you is talking about parameter names. The issue is just about having consistent, well-understood names that can be used in documentation, discussions like this thread, answers on the main Python list or StackOverflow, etc. The question of exactly which things we need to name (non-iterator iterable, repeatable iterables, repeatable iterables that always iterate the same elements, iterables that don't return self when iterated, iterables that always return a new object each time they're iterated, …) may be an open question (because you're right that they're not all the same thing, even if they usually overlap), but some subset of those things is relevant often enough that some of them need names. For example, you could tell the user who had the any/all "bug" that "The way you've coded that function, it only works for re-iterables/collections/whatever, but you're passing it the result of a generator expression, which is an iterator." That (together with a place the user can look up re-iterable/collection/whatever--whether that place is the glossary or the collective mind of the community or whatever) explains the problem without the need for a long explanation on the fact that not all iterables are reusable in the way he's trying to reuse them, and in particular iterators never are, and so on. The user doesn't then need to change his parameter name from "tests" to "reiterable" or anything like that--he's already got a perfectly good name. Of course it's not impossible that some of these concepts might also be useful as ABCs and/or static types, in which case it's possible he may want to add an annotation. But that's a separate issue, one which, again, nobody else has raised.

On Thu, Oct 1, 2015, at 11:10, Steven D'Aprano wrote:
I think what he is claiming, more or less, is that there is not a universal notion of "well-behaved" (this is true), or indeed *any* broadly-applicable notions of "well-behaved" (this is false).

Random832 <random832@fastmail.com> writes:
I would say that "well-behaved" "non-iterator iterable" is a strict subset of "non-iterator iterable". You probably want the word "reiterable" that I see mentioned in the thread. I don't know whether *reiterable* implies that it produces the same items the second time but it certainly implies that next(iter(reiterable)) _may produce something if_ the list(reiterable) call is successful. Perhaps *rerunnable* (that I also see mentioned in the thread) more strongly implies that the same items should be produced.

Steven D'Aprano <steve@pearwood.info> writes: ...
I meant that you won't use the word *iterable* if your function accepts _only_ iterators and therefore if I see *iterable* I expect that the function can handle arbitrary iterables, not just iterators. If your function rejects iterators then in practice it means that you might want a re-iterable/re-runnable (that I see mentioned in the thread) iterable (not an arbitrary "non-iterator iterable"). If there is no restriction that the same items should be produced the second time ("rerunnable iterable"?) then *collection* (introduced at the top of the thread) may work. Though *collection* implies that an iterable can't return the same (non-self) iterator -- otherwise an iterator is also a collection. I still don't see a practical need to avoid the word "iterable" unless new requirements (in addition to being non-iterator) are present.

On Fri, Oct 02, 2015 at 05:13:39AM +0300, Akira Li wrote:
I still don't see a practical need to avoid the word "iterable" unless new requirements (in addition to being non-iterator) are present.
I don't think anyone has suggested that we should avoid the word iterable. At most, some have suggested that we don't have a good word for those iterables which are not iterators. -- Steve

On Wed, Sep 30, 2015 at 12:19:05PM -0700, Andrew Barnert via Python-ideas wrote: [...]
There's also __contains__. Personally, I don't like it, but using "n in range(a, b+1)" for testing whether integer n falls within a particular range seems to be popular. I don't know why they don't just write a <= n <= b, but it seems to be a popular idiom for some weird reason. -- Steve
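A short sketch comparing the two spellings, assuming Python 3 and hypothetical sample values. For integers they agree; for a non-integer the range test only succeeds if the value equals one of the integers in the range, so the two are not interchangeable in general.

```python
a, b = 1, 4

n_int = 3
int_via_range = n_int in range(a, b + 1)    # membership test on the range
int_via_compare = a <= n_int <= b           # chained comparison

n_float = 2.5
# 2.5 lies between 1 and 4 but is not equal to any integer in the range.
float_via_range = n_float in range(a, b + 1)
float_via_compare = a <= n_float <= b
```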

On Wed, Sep 30, 2015, at 13:19, Neil Girdhar wrote:
I guess, I'm just asking for enumerate to go through the same change that range went through. Why wasn't it a problem for range?
Range has always returned a sequence. Anyway, why stop there? Why not have map return a sequence? Zip? Anything that is a 1:1 mapping (or 1+1:1 in zip's case) could in principle be changed to return a sequence when given one. Who decides what does and doesn't benefit from random access? Or sliceability. It wouldn't be hard, in principle, to write a general-purpose function for slicing an iterator (i.e. returning an iterator that yields the elements that slicing a list of the same length would have given), particularly if it's limited to positive values.

On Sep 30, 2015, at 12:25, Random832 <random832@fastmail.com> wrote:
Even when it's called with a set, or an iterator? Yes, you _could_ do that by lazily adding values to a list as needed, but that could lead to some confusing behavior. For example, len(m) or m[-1] has to evaluate the rest of the input, which could take infinite time (well, it'll run out of memory first…).
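A rough sketch of the "lazily adding values to a list as needed" approach, to show where the surprising behaviour comes from. The class name `LazyList` and its `evaluated()` helper are hypothetical; this is not a proposal for how map should work, and it supports integer indexes only.

```python
from itertools import islice


class LazyList:
    """Hypothetical lazy sequence that caches items pulled from an iterator."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._cache = []

    def _fill_to(self, n):
        # Pull items until the cache holds n items, or the input runs out.
        needed = n - len(self._cache)
        if needed > 0:
            self._cache.extend(islice(self._it, needed))

    def _fill_all(self):
        # Never returns for an infinite input.
        self._cache.extend(self._it)

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("this sketch supports integer indexes only")
        if index < 0:
            self._fill_all()          # a negative index needs the full length
        else:
            self._fill_to(index + 1)
        return self._cache[index]

    def __len__(self):
        self._fill_all()              # the length needs the whole input
        return len(self._cache)

    def evaluated(self):
        """Number of items pulled from the input so far."""
        return len(self._cache)


squares = LazyList(x * x for x in range(10))
third = squares[2]
seen_after_index = squares.evaluated()   # only a prefix has been evaluated
last = squares[-1]
seen_after_last = squares.evaluated()    # now everything has been evaluated
```

Indexing near the front is cheap, but `len()` or a negative index silently forces the whole input, which is the confusing behaviour described above.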
The end user, of course. Some applications will never pass an infinite, or even very long, iterable into map, so they'd want random access and size and reversibility. Others won't ever want those features, but would want to pass in infinite iterators. That's why I think the best answer is to let people write (or install from PyPI) LazyList classes that fit their use cases, instead of trying to come up with one that tries to do everything and is misleading as often as it's useful. It's not actually impossible to design something that does a lot more without being inconsistent or confusing, but it's a bigger change than it appears at first glance, and would add a lot more complexity to the language than I think is worth it for the benefits. Again, see http://stupidpythonideas.blogspot.com/2014/07/swift-style-map-and-filter-vie... for details.
You mean itertools.islice?
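For reference, a small sketch of itertools.islice applied to an endless generator; the generator `squares` is a hypothetical example. islice accepts only non-negative start, stop and step values, which matches the "limited to positive values" caveat in the quoted text.

```python
from itertools import islice


def squares():
    """Endless generator of 0, 1, 4, 9, ..."""
    n = 0
    while True:
        yield n * n
        n += 1


# Same elements that some_list[2:10:3] would give, without building the list.
window = list(islice(squares(), 2, 10, 3))

# The eager equivalent, for comparison.
eager = [n * n for n in range(10)][2:10:3]
```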

On Wed, Sep 30, 2015 at 05:19:53PM +0000, Neil Girdhar wrote:
I guess, I'm just asking for enumerate to go through the same change that range went through. Why wasn't it a problem for range?
There is a pernicious myth that (x)range is an iterator. It is not. It is a sequence, but one where the items are calculated on demand rather than pre-populated into some large data structure (a list or array). This is not just a matter of labels. It is a matter of the actual behaviour. (x)range objects don't behave like iterators except in the simplest sense that you can iterate over them. So your question is based on false assumptions -- range didn't go through any such change. In Python 2, range was eager and xrange lazy, but both are sequences, and in Python 3 the eager version is gone and the lazy version renamed without the "x" prefix. -- Steve
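A brief sketch of that behaviour in Python 3, using the ABCs from collections.abc; the particular range is an arbitrary example.

```python
from collections.abc import Iterator, Sequence

r = range(2, 20, 3)          # 2, 5, 8, 11, 14, 17, computed on demand

is_sequence = isinstance(r, Sequence)
is_iterator = isinstance(r, Iterator)

length = len(r)              # sequences have a length
item = r[2]                  # and support indexing
tail = r[-1]                 # including negative indexes

first_pass = list(r)
second_pass = list(r)        # iterating again gives the same items

has_next = hasattr(r, "__next__")   # next(r) would raise TypeError
```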

On 30.09.15 20:18, Neil Girdhar wrote:
Ah good point. Well, in the case of a sequence argument, an enumerate object could be both a sequence and an iterator.
It can't be. For sequence:
For iterator:

On Wed, Sep 30, 2015 at 10:33 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
well, that's because zip is using the same iterator in two places. would that ever be the case with enumerate? -CHB

On Sep 30, 2015, at 10:47, Chris Barker <chris.barker@noaa.gov> wrote:
The point is that _nothing_ can be an iterator and a sequence at the same time. (And therefore, an enumerate object can't be both at the same time.) The zip function is just a handy way of demonstrating the problem; it's not the actual problem. You could also demonstrate it by, e.g., calling len(x), next(x), list(x): If x is an iterator, next(x) will use up the 'a' so list will only give you ['b', 'c', 'd'], even though len gave you 4. Conceptually: iterators are inherently one-shot iterables; sequences are inherently reusable iterables. While there's no explicit rule that __iter__ can't return self for a sequence, there's no reasonable way to make a sequence that does so. Which means no sequence can be an iterator.
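The contrast can be sketched with a plain string and an iterator over it; the values are chosen to match the 'a'..'d' example mentioned above, and the zip calls illustrate the "same iterator in two places" effect discussed earlier in the thread.

```python
seq = "abcd"

# A sequence: each zip argument gets its own fresh iterator.
seq_pairs = list(zip(seq, seq))
seq_len = len(seq)
seq_items = list(seq)            # still all four items after len()

# An iterator: both zip arguments are the same one-shot object,
# so items are consumed alternately.
it = iter("abcd")
it_pairs = list(zip(it, it))

# next() uses up the first item, so list() only sees the rest.
it2 = iter("abcd")
first = next(it2)
rest = list(it2)
```

An object that tried to be both would have to report a length of 4 and then hand list() only three items after a next() call, which is the inconsistency being described.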
participants (17)
- Akira Li
- Alexander Belopolsky
- Andrew Barnert
- Brendan Barnwell
- Brett Cannon
- Chris Angelico
- Chris Barker
- Emile van Sebille
- M.-A. Lemburg
- Neil Girdhar
- Paul Moore
- Random832
- Serhiy Storchaka
- Stephen J. Turnbull
- Steven D'Aprano
- Sven R. Kunze
- Terry Reedy