
Hello, I am a new programmer and have been using Python for a few months. I was experimenting the other day with unpacking (lists, tuples, etc.), and I realized something: when you type:
a, b, *rest = count()
The interpreter gets caught in an infinite loop that I could not kill without terminating my REPL. Would there be a way to add generators to the unpackables, even if it was only in the front?

Thanks,
Ed M

On Fri, Feb 12, 2016 at 07:09:38AM -0500, Edward Minnix wrote:
That's because count() is an infinite generator.
Would there be a way to add generators to the unpackables, even if it was only in the front?
Generators can already be unpacked. They just have to be finite:

    py> def gen():
    ...     yield 1
    ...     yield 2
    ...     yield 3
    ...     yield 4
    ...
    py> a, b, *c = gen()
    py> a
    1
    py> b
    2
    py> c
    [3, 4]

I'm surprised that the unpacking can't be interrupted with Ctrl-C. I think that is a bug.

--
Steve

On 12 February 2016 at 22:46, Steven D'Aprano <steve@pearwood.info> wrote:
I'm surprised that the unpacking can't be interrupted with Ctrl-C. I think that is a bug.
It's a consequence of looping in C over an infinite iterator also implemented in C - "sum(itertools.count())" will hang the same way, since control never gets back to the eval loop to notice that Ctrl-C has been pressed.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
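To make the contrast concrete, here is an illustrative sketch (both loops run forever; the point is only which one Ctrl-C can stop):

    from itertools import count

    # Pure-Python loop: control returns to the eval loop on every
    # iteration, so a pending Ctrl-C (KeyboardInterrupt) is noticed.
    total = 0
    for i in count():
        total += i

    # All-C loop: sum() iterates in C over a C-implemented iterator,
    # so, as described above, Ctrl-C is never noticed.
    total = sum(count())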

On Fri, Feb 12, 2016 at 08:39:53AM -0800, Guido van Rossum wrote:
http://bugs.python.org/issue26351

--
Steve

On Feb 12, 2016, at 04:09, Edward Minnix <egregius313@gmail.com> wrote:
I think what you're _really_ suggesting here is that there should be a way to unpack iterators lazily, presumably by unpacking them into iterators. And if you could come up with a good intuitive way to make that work, that would be a great suggestion. But I don't know of such a way.

Here's some background: You already _can_ unpack generators and other iterators:

    >>> c = (i*2 for i in range(5))
    >>> a, b, *rest = c
    >>> rest
    [4, 6, 8]

It works something like this:

    >>> _c = iter(c)
    >>> a = next(_c)
    >>> b = next(_c)
    >>> rest = list(_c)

The problem is that when you give it an infinite iterator, like count(), that last step is attempting to create an infinite list, which takes infinite time (well, it'll raise a MemoryError at some point--but that could take a long time, especially if it drives your system into swap hell along the way).

Even for non-infinite iterators, this is sometimes not what you want, because it means you lose all laziness. For example:

    >>> f = sock.makefile('r')
    >>> a, b, *rest = f

In these last two cases, what you actually want is something like this:

    >>> _c = iter(c)
    >>> a = next(_c)
    >>> b = next(_c)
    >>> rest = _c

That would finish immediately, with rest as a lazy iterator instead of a list.

But is that what you want in the first case? Would it still be what you wanted in "a, b, *rest = [1, 2, 3, 4]"? Many novices would be confused if they wrote that, and then printed out rest and got "<list_iterator at 0x108e49ef0>" instead of "[3, 4]". Maybe that would be acceptable; people would just have to learn to use list(rest) when that's what they want (the same way they do with map, etc.). But it's probably too late for that, for backward compatibility reasons. Also, consider "a, *rest, b = [1, 2, 3, 4]". I don't think there's any way that could be done lazily.

Meanwhile, you can always write the expanded version out explicitly. (And you can leave off the first line when you know c is already an iterator.) Or you can use itertools.islice to make it more compact:

    >>> a, b = itertools.islice(c, 2)
    >>> rest = c

If you can come up with intuitive syntax, or an automated rule, or something else, to distinguish the case where you want a list from the case where you want a lazy iterator (and without breaking backward compatibility or over-complicating the language), then we could avoid that. If not, learn to love itertools. :)

(That's actually good advice anyway--anyone writing the kind of code that needs infinite lists will get a lot out of digging deep into what itertools can do for you.)
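For instance, a minimal sketch of that islice pattern applied to the original count() example (variable names are illustrative):

    from itertools import count, islice

    c = count()              # the infinite iterator from the original post
    a, b = islice(c, 2)      # unpack exactly two values...
    rest = c                 # ...and keep the rest as a lazy iterator

    print(a, b)              # 0 1
    print(next(rest))        # 2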

On 12/02/16 19:54, Andrew Barnert via Python-ideas wrote:
Why not just have an itertools.unpack() - a simple version without argument checking:

    def unpack(seq, num):
        it = iter(seq)
        yield from (i[1] for i in zip(range(num), it))
        yield it

    foo, bar, rest = unpack([1, 2, 3, 4, 5, 6], 2)

Because it's in itertools, the expectation is that it has something to do with iterators, so the final return value always being an iterator regardless of the original sequence type is reasonable (and is perhaps the only justification for putting it in itertools in the first place ;) ).

E.

On Fri, Feb 12, 2016 at 6:01 PM Erik <python@lucidity.plus.com> wrote:
There's some visual dissonance since the ``num`` argument is asking for the number of elements to unpack, but the left-hand side of the assignment has num+1 variables. What do you think about just using islice?

    >>> from itertools import islice
    >>> it = iter(range(5))
    >>> (first, second), rest = islice(it, 0, 2), it
    >>> first
    0
    >>> second
    1
    >>> rest
    <range_iterator object at 0x1011944b0>

I suppose it might read better broken apart into two lines to emphasize that the state changed.

    >>> first, second = islice(it, 0, 2)
    >>> rest = it

On Feb 12, 2016, at 15:35, Michael Selik <mike@selik.org> wrote:
Creating a range just to zip with just to throw away the values seems like overcomplicating things. Unless there's some performance benefit to doing it that way, why not just keep it simple?

    def unpack(seq, num):
        it = iter(seq)
        yield from islice(it, num)
        yield it

Or, to be super novice-friendly:

    def unpack(seq, num):
        it = iter(seq)
        for _ in range(num):
            yield next(it)
        yield it
foo, bar, rest = unpack([1, 2, 3, 4, 5, 6], 2)
That doesn't really distinguish the "rest" very clearly. Of course we could just change the last line to "yield [it]", and then call it with "foo, bar, *rest =", but that seems cheesy to me. I don't know; if there is something better than islice to be found here, I think you're probably on the right track, but I don't think you're there yet, and I'm not sure there is anything to find. Unpacking syntax just feels "sequency" to me, and in a language with distinct sequences and iterators (instead of, say, lazy sequences like Haskell, or views wrapping many iterators like Swift), I think that means non-lazy. But hopefully I'm wrong. :)
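For concreteness, a sketch of the starred variant being discussed (note that with a starred target, the plain "yield it" already wraps the leftover iterator in a one-element list, which is the cheesy part):

    from itertools import islice

    def unpack(seq, num):
        it = iter(seq)
        yield from islice(it, num)
        yield it

    foo, bar, *rest = unpack([1, 2, 3, 4, 5, 6], 2)
    print(foo, bar)          # 1 2
    print(rest)              # [<list_iterator object at 0x...>]
    print(list(rest[0]))     # [3, 4, 5, 6]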
That's exactly what was in my email, as the way to do things today, which he was replying to:
So I think we can assume that he thinks his version improves over that, or he wouldn't have suggested it...

On 13/02/16 00:03, Andrew Barnert wrote:
I agree. I've never used islice() before, so I missed that as a better way of yielding the first 'n' values.
My "suggestion" was simply that perhaps creating a very short wrapper function somewhere that handles whether the sequence is already an iterator or not etc (and using islice() or whatever - I don't really care ;)) would perhaps be a more pragmatic option than trying to squeeze in some syntax change or underlying unpack heuristic/mechanic (which is where I thought the thread was heading). Perhaps I didn't express that very clearly. I'm happy do drop it ;) E.

On 13 February 2016 at 17:59, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
The main problem with that specific spelling of the idea is that it's the inverse of what the bare "*" means when declaring function parameters - there, it's a way of marking the end of the positional arguments, when you want to put keyword-only arguments after it.

The other problem is that it makes:

    a *= value

and

    a, b, *= value

mean wildly different things.

Where these discussions generally end up is:

1. The cases where you actually want "unpack this many values, ignore the rest" are pretty rare
2. When you do really need it, islice handles it
3. Adding new syntax isn't warranted for a relatively rare use case the stdlib already covers

Probably the most plausible idea would be a "head()" recipe that does something like:

    def head(iterable, n):
        itr = iter(iterable)
        return tuple(islice(itr, n)), itr

Usable as:

    (a, b), rest = head(iterable, 2)

Making it a recipe means folks can customise it as they wish (e.g. omitting the tuple call). However, I'm not sure how much that would actually help, since using islice directly here is already pretty straightforward (once you know about it), and for small numbers of items, you can also just use the next builtin if you know you have an iterator:

    itr = iter(iterable)
    a = next(itr)
    b, c = next(itr), next(itr)

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
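A runnable version of that head() recipe, exercised on an infinite iterator (a sketch following the code above):

    from itertools import count, islice

    def head(iterable, n):
        itr = iter(iterable)
        return tuple(islice(itr, n)), itr

    (a, b), rest = head(count(), 2)
    print(a, b)        # 0 1
    print(next(rest))  # 2 -- the rest stays lazy, so count() doesn't hang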

On 13 February 2016 at 13:06, Nick Coghlan <ncoghlan@gmail.com> wrote:
1. The cases where you actually want "unpack this many values, ignore the rest" are pretty rare
It's not so much "ignore the rest" but rather "retain the rest for separate consumption". This happens when you want to either peek at or split off the first item. For example in parsing a csv file:

    def readcsv(csvfile):
        csvfile = map(str.split, csvfile)
        try:
            fieldnames = next(csvfile)
        except StopIteration:
            raise ValueError('Bad csv file')
        return [dict(zip(fieldnames, line)) for line in csvfile]

It would be nicer to write that as something like

    fieldnames, * = csvfile

Another situation where I've wanted that is given an iterable that yields sequences all of the same length, I might want to peek at the first item to check its length before the loop begins.
2. When you do really need it, islice handles it
That's true, so you can do

    fieldnames, = islice(csvfile, 1)

Somehow I don't like that, but really it's fine.

--
Oscar

On Feb 13, 2016, at 08:36, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
So instead of getting an exception that says "bad csv file" you get one that says "unpacking expected at least 1 value, found 0" or something? That doesn't seem nicer.

But if we really do need such functionality commonly, just adding an exception type parameter to next would cover it. Or just putting an exception-wrapping function in the stdlib, so you can just write "fieldnames = exwrap(next, StopIteration, ValueError)(csvfile)". But then anyone can write exwrap as a three-line function today, and I don't think anyone ever does, so I doubt it needs to be standardized...

Also, is it worth mentioning that this doesn't actually parse csv files, but whitespace-separated files, without any form of escaping or quoting, and probably doing the wrong thing on short rows? Because there is a really easy way of writing this function that's a lot nicer and a lot shorter and raises meaningful exceptions and actually works:

    def readcsv(csvfile):
        return list(csv.DictReader(csvfile))

(Although I'm not sure why you want a list rather than an iterator in the first place.)
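For what it's worth, a minimal sketch of such an exwrap helper (the name and signature are just the hypothetical ones from the message above):

    def exwrap(func, exc_in, exc_out):
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except exc_in:
                # re-raise as the requested exception type
                raise exc_out() from None
        return wrapper

    # usage, per the example above:
    # fieldnames = exwrap(next, StopIteration, ValueError)(csvfile)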
Again, next.

Nick Coghlan wrote:
I know, but despite that, this still seems like the "obvious" way to spell it to me.
Another possibility is

    a, b, ... = value
2. When you do really need it, islice handles it
I find that answer unsatisfying, because by using islice I'm telling it to do *more* work, whereas I really want to tell it to do *less* work. It just seems wrong, like a kind of abstraction inversion.

--
Greg

On 14 February 2016 at 07:40, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Another possibility is
a, b, ... = value
Now, *that* spelling to turn off the "implicit peek" behaviour in iterable unpacking I quite like:

    arg_iter = iter(args)
    command, ... = arg_iter
    run_command(command, arg_iter)

Although again, the main downside would be that "..." here means something rather different from what it means as a subscript element.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
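In today's Python, that proposed spelling would be roughly equivalent to the following sketch (run_command and args are stand-ins from the example above):

    def run_command(command, args_iter):
        print(command, list(args_iter))  # placeholder implementation

    args = ["build", "--verbose", "target"]
    arg_iter = iter(args)
    command = next(arg_iter)        # what "command, ... = arg_iter" would spell
    run_command(command, arg_iter)  # prints: build ['--verbose', 'target']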

Nick Coghlan writes:
On 14 February 2016 at 07:40, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Another possibility is
a, b, ... = value
+1. It may be TOOWTDI: I thought of it independently.
But not what it means in natural language, which is basically "continues as expected". That's somewhat different from "*" which has no established meaning in natural language (except "note here"), and which is already heavily used in Python. Steve

On 02/14/2016 08:43 AM, Nick Coghlan wrote:
Assigning to Ellipsis? Interesting idea, but I'd probably go a bit further in the similarity to star-assignment:

    a, b, *... = value

Ellipsis could then be a general "throw-away" lvalue. This would make it possible to say

    a, ..., b, ... = some_function()

i.e. skip exactly one item in unpacking.
Although again, the main downside would be that "..." here means something rather different from what it means as a subscript element.
Keep in mind that Ellipsis is also a legal rvalue anywhere else. I.e. this would be legal (and a no-op):

    ... = ...

But thinking about it, this is also legal at the moment:

    [] = []

Interestingly, this raises:

    () = ()

cheers,
Georg

Georg Brandl wrote:
I thought about that too, but it seemed like it would be too confusing -- the above *looks* like it should be skipping an arbitrary number of items. I think this interpretation would be even more inconsistent with existing uses of ... in both Python and English.

--
Greg

Georg Brandl writes:
I don't get the "*". Normally that means create or unpack a container, but here the semantics is "leave the container alone".

And to my eyes, it's not an assignment to Ellipsis semantically. It's an operator that says "b gets an element, and the tail of value is used somewhere else, don't copy it as a sequence to b". Perhaps

    a, b ...= consumable_value

would make that clearer, but it looks strange and ugly to me.

What would

    a, b, ... = some_list

mean? (Using the OP's notation without prejudice to other notations.) Would it pop a and b off the list?
That has a rather different meaning in math texts, though. It means "an infinite sequence starting at a with a generic element denoted 'b'". The number of elements between a and b is arbitrary, and typically indicated by writing b as an expression involving an index variable such as i or n.
Keep in mind that Ellipsis is also a legal rvalue anywhere else.
That might kill the operator interpretation.

On Feb 14, 2016, at 10:57, Georg Brandl <g.brandl@gmx.net> wrote:
But we already have a general "throw-away": underscore. You can already write "a, _, b, _ = it" and it will unpack 4 values and throw away 2 of them. And you can also write "a, b, *_ = it" to unpack 2 or more values and throw away all but the first 2. And, for that matter, "a, *_, b = it". There's nothing special about the underscore--you could get the same effect by writing "a, dummy, b, evendummier = it"--but it's conventional.

Anyway, what people are looking for in this thread is almost the exact opposite: they don't want to unpack the value and throw it away, they want to leave the value there for later unpacking. (Of course for collections, there's no such distinction, but for iterators there is.)
With your version, where ... is a normal target that just ignores its value, sure. With the original version, where ... means "unpack 0 elements from the iterable and stop", it would presumably raise a TypeError("'ellipsis' object is not iterable").
But thinking about it, this is also legal at the moment:
[] = []
Yes, but that's completely different. The [] on the left isn't an expression, or even a target, but a target list with 0 targets in it. Assignment to target lists is defined recursively, so assigning to 0 targets is legal iff you're unpacking 0 values. The fact that you have specifically [] on the right side is irrelevant. You can get the same effect by writing [] = (), or [] = {}, or [] = (i for i in range(5) if i<0). And clearly, you're not assigning to "the empty list", because each empty list created with [] is a distinct object. Anyway, ... is a constant; it's currently illegal as a target because of the rule added in 3.0 banning assignment to a handful of special constants (..., None, etc.). If you give it a special meaning as a target, then of course that rule no longer applies.
Interestingly, this raises:
() = ()
This just doesn't parse, because the target-list grammar doesn't have the same special case for () as the parenthesized expression grammar. (Remember, both tuples and target lists are made by commas, not parens, and () really is a special case.) If such a rule were added, then it would presumably mean the same thing as [] = (). But what would be the point of adding that rule? (You could argue that the inconsistency makes the language harder to remember, but it's been this way for decades, and nobody notices it until they've been using Python for years, so that's not very compelling.)

On 02/15/2016 07:31 AM, Andrew Barnert via Python-ideas wrote:
I kinda know, having written the PEP and implementation for the latter two. It was just a quick, not very well thought-out idea of how to generalize Nick's interesting suggestion.
Yes, what you're saying can be expressed as "of course it's legal, since it's legal". Please step back a bit and look at this from the "how it looks" perspective, not from the "what the grammar rules say" perspective. My observation was simply that we already have the case of X = X where both Xes are the same syntax, but have a wildly differing interpretation.

On Feb 14, 2016, at 23:30, Georg Brandl <g.brandl@gmx.net> wrote:
No; common sense isn't the same thing as tautology. It has an obvious semantics, there's no good reason to ban it, and it comes for free with the simplest grammar--therefore, it's common sense that the language should allow it. It's only because Python generally does such a great job following common sense that you don't notice. :) And it's similar common sense that your version of "..." should make "... = ..." legal, while Nick's version should make it illegal.
x = x" is a no-op (assuming x was already bound) and "False = False" is an error. Given that, and the fact that (unlike those cases) "[] = []" _doesn't_ have the same syntax for the two Xes, I don't see what the observation demonstrates.

On 15.02.2016 07:31, Andrew Barnert via Python-ideas wrote:
I agree.
Interestingly, doing the following results in an syntax error.
[1,2]=[3,4]
File "<input>", line 1 SyntaxError: can't assign to literal So, if it's illegal to assign to the literal [1,2], I don't see why it should be legal for []. But as you said, that's a highly theoretical problem. That just reminds me of the js part of https://www.destroyallsoftware.com/talks/wat Best, Sven

On Mon, Feb 15, 2016 at 02:55:26PM +0100, Sven R. Kunze wrote:
On 15.02.2016 07:31, Andrew Barnert via Python-ideas wrote:
[..]
I believe you are misinterpreting the error. The error isn't that you are trying to assign to the literal [1,2]. The error is that you are trying to assign to the literal 1. It's a subtle difference, but important to understand that difference in order to understand why [] is a legal assignment target.

    py> [a, b, c, 1] = "abcd"
      File "<stdin>", line 1
    SyntaxError: can't assign to literal

Obviously [a, b, c, 1] is not a literal, but 1 is. If there is any doubt:

    py> [a, b, c, None] = "abcd"
      File "<stdin>", line 1
    SyntaxError: cannot assign to None

Since [1,2] is a *list of assignment targets*, not a single target, the assignment you attempted

    [1, 2] = [3, 4]

is roughly equivalent to:

    1 = 3
    2 = 4

which obviously gives a syntax error.

[] is a valid assignment target so long as the right hand side is an empty iterable. If it is not empty, you get a ValueError:

    py> [] = "abc"
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: too many values to unpack

There are three values on the right hand side, and zero targets.

This might even be useful. You can confirm that an iterable is empty (why you might want to do this, I can't imagine, but suppose you did) by assigning it to zero targets:

    [] = iterable

succeeds only if iterable has zero items, otherwise it raises ValueError.

--
Steve

On 15.02.2016 15:42, Steven D'Aprano wrote:
You are indeed right. And while we are on it, that reminds me of KeyError. I always feel annoyed by the fact that Python doesn't tell me what key a dict is missing. So, if I were to make a suggestion, I would like to see the issue-causing thing mentioned in those two types of exceptions.
Completely true and nothing to add; except, as Georg already noted, the "as it looks" experience is somewhat weird and I would always consider this a syntax error (don't ask me why).

Best,
Sven

On Feb 15, 2016, at 07:13, Sven R. Kunze <srkunze@mail.de> wrote:
IIRC, there's a series of bugs on adding more useful information to the builtin exceptions, which includes putting the key in the error message _and_ as an attribute of the exception (a la filename in the filesystem exceptions), but they've been waiting on someone to do the work for a few years now. If so, there's an obvious way to kick-start the solution: find the bug and write a patch. (I suspect some people would object to fixing KeyError without also fixing IndexError, and want to argue about whether the key member goes in each exception directly or is part of LookupError, and so on, and there's the more substantive issue of keeping alive potentially-large keys, and so on, so it probably won't be as simple as "submit a trivial patch and it gets accepted", but at least having a patch would get the process moving.) But for SyntaxError, there is no object to stick in the error object or error message, just a source token or an AST node. Both of those have pointers back to start and end in the source string, and that's what goes into the SyntaxError, which is already shown to the user by printing the relevant line of source and pointing a caret at it. When that's insufficient or misleading (as with the ever-popular missed-')' errors), printing the token or AST node itself would probably just make it more misleading. And, as for putting it in the error object, it's pretty rare that anyone handles SyntaxError in code (unlike KeyError, FileNotFoundError, etc.), so that seems a lot less useful. Maybe there is a clean way to improve SyntaxError akin to KeyError, but it would take a lot more thought and design (and probably argument), so I'd work on KeyError first.

On Mon, Feb 15, 2016 at 01:55:32PM -0800, Andrew Barnert via Python-ideas wrote:
Showing the missing key in the error message goes all the way back to Python 1.5:

    [steve@ando ~]$ python1.5 -c '{}["spam"]'
    Traceback (innermost last):
      File "<string>", line 1, in ?
    KeyError: spam

Admittedly, the exception object itself doesn't keep a reference to the missing key, so you can't programmatically query it for the key, but normally that's not a problem since you just tried to look it up, so you should still have the key.
See discussion here: http://bugs.python.org/issue18162

Also http://bugs.python.org/issue1182143

And the PEP: https://www.python.org/dev/peps/pep-0473/
Sadly, CPython doesn't manage to even display the caret at all:

    [steve@ando ~]$ python -c "[a, 2] = 'xy'"
      File "<string>", line 1
    SyntaxError: can't assign to literal

Jython gives a much more informative error:

    steve@orac:~$ jython -c "[a, 2] = 'ab'"
      File "<string>", line 1
    [a, 2] = 'ab'
         ^
    SyntaxError: can't assign to number

--
Steve

Well, there's no `key` attribute for example, but the KeyError exception has exactly one argument, the missing key.
So, you *can* query it for the missing key, even though it's a bit ugly. But as far as TOOWTDI goes, I think this is fine as-is. This is just my opinion, though :)

--
Emanuel
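Concretely, a small illustration of pulling the key back out of the exception's args:

    d = {"spam": 1}
    try:
        d["eggs"]
    except KeyError as e:
        missing = e.args[0]             # the missing key is the sole argument
        print("missing key:", missing)  # missing key: eggs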

On 16.02.2016 01:34, Steven D'Aprano wrote:
Sorry. I confused KeyError with IndexError (brackets all over the place).
That's true. However, most of the time I see such tracebacks while sifting through server logs. So, the more data an exception exposes, the better. I wouldn't even mind an excerpt of the dict/list itself in order to get a faster understanding of the problem domain (wrong type of keys, etc.).
That seems to be a great idea.

Best,
Sven

On 15.02.2016 23:11, Sven R. Kunze wrote:
In fact, I have been asked several times by some older Python developers what the Pythonic way of doing exactly this is. Like this:

Dev: Sven, do you know a better way to get THE item from a single-item list rather than just [0]?
Sven: huh? *distracted* yeah *thinking* why not [0]? seems simple enough to get the intent *back to topic*

I think because of that thread, I won't forget anymore. ;-)

Best,
Sven

On Mon, Feb 15, 2016 at 06:53:45PM +0100, Sven R. Kunze wrote:
No, it's exactly the same Python idiom (assignment to a list of targets) as we've been talking about for the last few posts. We've had examples with four targets, three targets, two targets and zero targets. This is an example with one target.

    [a] = iterable

requires that the right-hand side be iterable, and after unpacking it must contain exactly one item, to match the one assignment target given on the left.

--
Steve

On Feb 15, 2016, at 15:43, Steven D'Aprano <steve@pearwood.info> wrote:
I see this more often written in tuple-ish form:

    index, value = struct.unpack_from('!HH', buf, off)
    namelen, = struct.unpack_from('!H', buf, off+4)

Somehow, that feels more natural and pythonic than using [index, value] and [namelen], despite the fact that without the brackets it's easy to miss the trailing comma and end up with (16,) instead of 16 as your length and a confusing TypeError a few lines down.

On Feb 16, 2016, at 02:11, Sven R. Kunze <srkunze@mail.de> wrote:
It's not a tuple _with_ the parentheses either: it's a parenthesized target list, which has a syntax and semantics that are, while not completely unrelated to tuple, definitely not the same. One way in which it _is_ like a tuple is that parentheses alone don't help:

    (namelen) = struct.unpack_from('!H', buf, off+4)

Just like tuples, parenthesized target lists need commas, so this will bind namelen to the tuple (16,) instead of to 16.

On 16.02.2016 00:43, Steven D'Aprano wrote:
No, it's exactly the same Python idiom (assignment to a list of targets) as we've been talking about for the last few posts.
I think we better distinguish between idioms and language features.
Of course, it's quite straightforward once you ponder it. I recently talked to a coworker about this. The concrete example is about "How do I get the one-and-only element of a **set**, which obviously does not support subscripting?"

Another aspect I came to think of is the following asymmetry:

    a, b, c, d = mylist4  # works
    a, b, c = mylist3     # also works
    a, b = mylist2        # works too
    [a] = mylist1         # special case?

One might resolve the asymmetry by writing:

    [a, b, c, d] = mylist4
    [a, b, c] = mylist3
    [a, b] = mylist2
    [a] = mylist1         # fits in
    [] = mylist0          # even that is possible now

Thus, the parentheses-less variants are special cases. However, when glancing over our production source, the parentheses-less variant is rather the norm than a special case (I haven't even seen nested ones). I suspect a special-character-phobia (ever used a German keyboard ;-) ?) -- why should I write '''[a, b] = [b, a]''' when '''a, b = b, a''' suffices?

So, it seems to me that inducing the 1-target and 0-target concepts from the norm is not as easy as you might believe. Last but not least, most devs are not used to magic on the lhs. Imagine how weird and interesting you would find the following being possible in Python:

    mylist = [6, 7]
    a*3, b+5 = mylist

Best,
Sven

On Wed, Feb 17, 2016 at 4:54 AM, Georg Brandl <g.brandl@gmx.net> wrote:
The one-target case is actually available to everything else; the only difference is that the trailing comma is mandatory, not optional:
a,b,c, = range(3)
Which is, again, the same as with tuples.

ChrisA

Sven R. Kunze wrote:
But [1,2] is not a literal -- the individual elements 1 and 2 are. It's the inability to assign to those elements that makes it illegal. On the other hand, [] doesn't contain any elements that it's illegal to assign to, so there's no reason to reject it. But it doesn't contain any elements that it's legal to assign to either, so you could say there's no reason to accept it.

This is a philosophical question. When you've eaten the last chocolate, do you have an empty box of chocolates, or just an empty box?

--
Greg

On 14 February 2016 at 07:43, Nick Coghlan <ncoghlan@gmail.com> wrote:
IMO, the other downside is that the semantic difference between

    a, b, ... = value

and

    a, b, *_ = value

is very subtle, and (even worse) only significant if value is an iterable as opposed to a concrete container such as a list. IMO, explicit is better than implicit here, and itertools.islice is the right way to go:
(And of course in my first attempt I forgot I needed iter(range(...)) as range is not a pure iterable - proving my point about the subtle semantics!) Paul

On Feb 15, 2016, at 00:12, Paul Moore <p.f.moore@gmail.com> wrote:
You mean iterator, not iterable. And being "concrete" has nothing to do with it--a dict view, a memoryview, a NumPy slice, etc. aren't iterators any more than a list is. This is exactly why I think we need an official term like "collection" for "iterables that are not iterators" (or "iterables whose __iter__ doesn't return self" or similar). People struggling to come up with terms end up confusing themselves--not just about wording, but about actual concepts. As proven below:
Ranges are as iterable as both lists and iterators. You're a smart guy, and you know Python. So why do you make this mistake? Because you don't have a term to fit "range" into, so your brain struggles between two prototypes--is it like a list, or like a generator? Well, it's lazy, so it's like a generator, so you don't need to call iter here, right? Nope. This is the same as the experiment where they gave people the English cousin rule and asked them to evaluate which family relationships count as cousins. People whose native language has no word for "cousin"--e.g., because they have unrelated words for "maternal cousin" and "paternal cousin"--make a lot more mistakes than people whose language has a word that matches the rule.

On 15 February 2016 at 08:39, Andrew Barnert <abarnert@yahoo.com> wrote:
Precisely :-) Nor is a range object; see the (other!) mistake I made.
This is exactly why I think we need an official term like "collection" for "iterables that are not iterators" (or "iterables whose __iter__ doesn't return self" or similar). People struggling to come up with terms end up confusing themselves--not just about wording, but about actual concepts. As proven below:
Indeed.
Thanks for providing another perspective on my point. This whole area is rather too full of semantic confusion (I'm not as sure as you seem to be that agreeing on a terminology will address that issue, but that's a side matter) and I think we should be very careful about introducing syntax that requires a good understanding of the concepts to use correctly, when there's a perfectly acceptable explicit form (even if it is slightly more verbose/repetitive). Paul

On Mon, Feb 15, 2016 at 7:39 PM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Allow me to take this opportunity to reiterate the recommendation for "reiterable", which has come up enough times that I've completely forgotten who first came up with it. An iterable can be iterated over at least once; a reiterable can be iterated over more than once, and will often produce the same sequence of values each time. ChrisA

On 2016-02-15 00:39, Andrew Barnert via Python-ideas wrote:
I still don't think that is at all what we need, as this example shows. Whether the value is an iterable or an iterator is not relevant. Whether the iterable's iterator is self is not relevant. What is relevant is the difference in *behavior* --- namely, whether you can rewind, restart, or otherwise retrieve already-obtained values from the object, or whether advancing it is an irreversible operation and there is no way to get old values except by storing them yourself. In this example, being able to say that value was or was not an iterator or an iterable would in no way help to clarify how the code would behave differently. Saying that it is an iterable or an iterator is just saying that it has or doesn't have .next() and/or .__iter__() methods that follow certain very broad protocols, but what matters for understanding examples like this is what those methods actually DO.

--
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

On Feb 15, 2016, at 01:21, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
You're making the same mistake that this idea was meant to cure: iterators _are_ iterables. Which means this is vacuously true. But whether the value is a collection/reiterable/whatever or an iterator is exactly what's relevant.
Whether the iterable's iterator is self is not relevant. What is relevant is the difference in *behavior* --- namely, whether you can rewind, restart, or otherwise retrieve already-obtained values from the object, or whether advancing it is an irreversible operation and there is no way to get old values except by storing them yourself.
An iterator returns self from iter, which means advancing the iterator is consuming self, so there is no way to get the old values again. A non-iterator iterable may return a different object from iter, which means advancing the iterator doesn't have to consume self; you can get the old values again just by calling iter to get a new iterator at the start. A collection, or reiterable, or whatever, could be defined as an iterable that _doesn't_ return self. Or, nearly-equivalently, it could be defined as an iterable that returns a new iterator over the same values (unless mutated in between iter calls), borrowing the distinction that's already in the docs for defining dict and dict view semantics. Of course any useful definition would leave pathological types that are neither iterator nor collection (e.g., an object that returns a new iterator each time, but those iterators destructively modify self), or maybe where they do qualify but misleadingly so (I can't think of any examples). And there may also be cases where it isn't clear (e.g., is a collection of shared memory not a collection because some other process can change its values, or does that count as "unless mutated"? that probably depends on how and why your app is using that shared memory). But that isn't a problem; we're not trying to come up with a definition that could be used to write type-theoretic behavior proofs for Python semantics, but something that's useful in practice for discussing almost all real Python programs with other humans.
In this example, being able to say that value was or was not an iterator or an iterable would in no way help to clarify how the code would behave differently. Saying that it is an iterable or an iterator is just saying that it has or doesn't have .next() and/or .__iter__() methods that follow certain very broad protocols, but what matters for understanding examples like this is what those methods actually DO.
The iterator protocol defines what those methods do. The __iter__ method returns self, and the __next__ method advances self to return the next value. The fact that these semantics aren't checked (even by the ABC or by mypy) doesn't mean they don't exist or are meaningless. The iterable protocol, on the other hand, just tells you that __iter__ returns an iterator. Because iterators and collections are both iterable, just knowing that something is an iterable doesn't tell you whether it's reusable. But knowing that something is a collection (assuming we have a well-defined term "collection") would. Which is exactly the point of the proposal.

On 2016-02-15 14:27, Andrew Barnert wrote:
I still don't understand why you want to use the term "collection" when everything you say about the purpose of the term (as for instance your last paragraph here) suggests you want to know whether it is reusable. To ordinary humans "collection" says nothing about iteration or reusability. If you want to say it's reusable, just say it's reusable. As a bonus, you don't have to get anyone to buy into a term like "collection", or even know the difference between iterators and iterables, because "reusable iterator" and "reusable iterable" (or "re-iterable", or "restartable" or other such terms) already have an obvious interpretation (even if one of them is technically impossible).

--
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

On Feb 15, 2016, at 14:41, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
First, the word "reiterable" is acceptable to me (and certainly better than not having any such term)--that's why I said "like 'reiterable' or 'collection'" multiple times. It's an easy-to-remember term, and its name helps keep it straight. The biggest problem is that it's not an actual word, and it looks and sounds ugly. It's much better than not having a word at all, and forcing people to use phrases like "reusable non-iterator iterable", but it's not ideal.

The reason "collection" is better is that it matches an intuitive category that actually means what we want. And the fact that it's not immediately obvious that it does what we want is a _good_ thing, because it means people have one simple thing to learn and remember: you can iterate collections over and over without affecting them. And "collection" stays intuitive when used in compounds like "virtual collection" or "lazy collection" or "infinite collection". So, I can say, "a range is a lazy collection", and you know that means it's not a one-shot iterator, because collections aren't one-shot iterators--"lazy" ones don't physically hold the stuff, but come up with it when asked, but they're still collections, so they're still not one-shot iterators.

Also, notice that we already use the term "collection" informally to mean exactly what people intuitively think of--including in the documentation of the "collections" module in the stdlib. We certainly don't use the term "reiterable" anywhere (especially when typing with autocorrect on).

I think people worry about the case of things that aren't collections but also aren't iterators. But let's look at one:

    class Random3to6d6:
        def __iter__(self):
            for _ in range(randint(3, 6)):
                yield randint(1, 6)

That's clearly not a collection. It's also clearly not an iterator. So, if I want to use it, that's a clear sign that I have to think about it carefully. Is it reusable, or a reiterable? Depends how you define the term. Which means I have to know, and have internalized, the exact definition before "reiterable" is any help here. I still have to think about it carefully, but it's less obvious that I have to do so.

Again, I think adding "reiterable" to the glossary (and maybe collections.abc and typing) would be a big improvement; I just think adding "collection" would be a slightly better one.
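To make the "neither collection nor iterator" point concrete, a quick demonstration (assuming "from random import randint" and the class above):

    r = Random3to6d6()
    print(iter(r) is r)  # False, so it's not an iterator...
    print(list(r))       # e.g. [4, 2, 6]
    print(list(r))       # ...but a second pass gives a fresh, different roll,
                         # so it's not a reiterable collection either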

On Mon, Feb 15, 2016 at 10:59 PM Rob Cliffe <rob.cliffe@btinternet.com> wrote:
The glossary entry for "iterator" already says, "A container object (such as a list) produces a fresh new iterator each time you pass it to the iter() function or use it in a for loop." I expect that no one will misunderstand you if you say the word "reiterable" in the context of checking for an exhausted iterator or an iterable that returns a new iterator each time. A glossary entry just opens the opportunity for nitpicks and confusion between a natural interpretation of the word and the glossary definition. If not in the glossary, conversants will know to be careful to (re)define the term as appropriate to the conversation if necessary.

On Feb 15, 2016, at 21:59, Michael Selik <mike@selik.org> wrote:
The glossary entry for "iterator" already says, "A container object (such as a list) produces a fresh new iterator each time you pass it to the iter() function or use it in a for loop."
Well, "container" would be a perfectly good word if we didn't already use it to mean something different: an object with the "in" operator. You can have collections of things that can't efficiently test containment (a lazy linked list would take linear time and space for __contains__), and containment of things that aren't even iterables (e.g., a real interval).
I expect that no one will misunderstand you if you say the word "reiterable" in the context of checking for an exhausted iterator or an iterable that returns a new iterator each time.
The problem isn't so much whether I can use it--I do, in fact, use both "collection" and "reiterable", defining them when necessary (it often isn't necessary--but when it is, at least I only have to define them once). It's whether other people know there's a word they can use. When they don't, they struggle to explain what they mean, and frequently make mistakes, like Paul saying that a range is not an iterable while trying to explain someone else's confusion about iterables, or whoever wrote the docs on dict views saying that they're not iterators but sequences (which took years for someone to notice and fix).

On Feb 15, 2016, at 19:47, Rob Cliffe <rob.cliffe@btinternet.com> wrote:
Because "reiterable" is really not immediately obvious. It has no intuitive meaning; it's not even a real word. You have to learn what a reiterable is, by definition. That's no easier than learning a simple fact about collections, and you end up knowing less after that learning. Think of my random example. Is that a reiterable? Well, you can call iter on its multiple times and get different iterators. But they don't iterate the same value. So, yes or no? There's no intuition to guide you there; all you have is a definitional rule that you had to memorize. Similarly, you can't meaningfully attach adjectives to "reiterable" that restrict or expand its scope. "Lazy reiterable" doesn't mean anything; "lazy collection" does. But I've already made all these points and you just skipped over them. And more importantly, as I've already said, I'd be very happy for "reiterable" to be added to the glossary and maybe the ABCs--it's not as good as "collection", but it's a lot better than nothing--so I don't really want to argue against it.

On Tue, Feb 16, 2016 at 1:02 AM Andrew Barnert via Python-ideas < python-ideas@python.org> wrote:
Seems intuitive to me that reiterable is repeatedly iterable, or at least iterable more than once. I suppose the dictionary would say that the "re-" prefix means "again". The word "non-iterable" isn't in the glossary and people seem to get that it means not iterable. If something is autoiterable it'd iterate over itself (so an iterator is autoiterable?), if something is semiiterable, it'd be kinda-iterable or half-iterable. I'm not sure what that means, but I'll bet I'd understand it in context. Maybe when you iterate over it you don't actually get all the elements. The English language is very flexible and can match Python's flexibility. I think it's the desire to use an inflexible glossary word that led people into the confusions/misuse you've described.
It's whether other people know there's a word they can use.
I doubt a glossary entry would solve that problem. Instead I think premature glossarizing (a new word!) would create confusion as some people use a word with the glossary definition and some use it however the word makes sense in that particular conversation. Perhaps we could try something similar to the progression of a module from personal to PyPI to popularity to standard library. Jargon should first be popular in natural usage (without long debates) on the mailing lists before it gets added to the glossary. I know you really want to add something like this to the glossary, as you've brought it up before. I think a very convincing argument would be referencing a few conversations where you used a term, either word or phrase, to successfully clarify a conversation without needing to debate its definition.

On Feb 15, 2016, at 22:50, Michael Selik <mike@selik.org> wrote:
On Tue, Feb 16, 2016 at 1:02 AM Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
I doubt a glossary entry would solve that problem. Instead I think premature glossarizing (a new word!)
I don't think it's at all premature. This was first suggested somewhere before 2006, and came close to being approved in 2006, but fell apart over bikeshedding issues. A decade later--a decade of watching people make exactly the mistakes that we've seen in this thread--has shown that the problem still needs a solution.

On Mon, Feb 15, 2016 at 10:02:26PM -0800, Andrew Barnert via Python-ideas wrote:
What's a "real word"? Is "closure" a real word? How about "iterator"? "Closure" is in the Oxford dictionary, but it doesn't mention anything about functions and variables from surrounding scopes. "Iterator" isn't in the Oxford dictionary at all; neither is "iterable". As far as intuitive meanings go, "re-" is a standard English prefix meaning (among other things) "again", as in reborn, renew, retake, rewind, retry -- and "reiterate".
I'm not sure I understand how learning the definition of "reiterable" means I know *less* than when I started. Presumably that means I have forgotten something that I previously knew. I hope it wasn't something important.
Think of my random example.
I would, but I've forgotten what it is.
I'm honestly not sure what actual problem you think this word is going to solve. I can see that there is some difficulty with people confusing the concepts of iterables and iterators, and some edge cases where people mistake "lazy sequences" like (x)range for an iterator. I acknowledge those difficulties. But:

(1) I don't think those difficulties are worth the thousands of words written in this thread so far.

(2) I don't understand how redefining (there's that re- prefix again) "collection" or adding "reiterable" will help. I think the meanings of iterable and iterator are already clearly documented, and people still get them confused, so how will adding "reiterable" help?

(3) I don't think that the confusion between lazy sequences and iterators is especially harmful. 9 times out of 10, it's a difference that makes no difference. I like to be pedantic and tell people that (x)range is not an iterator, but if I'm honest with myself, that fact is rarely going to help them write better code. (It won't make them write worse code either.)

(4) You've already acknowledged that "collection" currently has a meaning in Python. It's normally used for lists, tuples and dicts, as well as the things in the `collections` module. I don't understand why "collection" is your preferred term for an iterable that can be iterated over multiple times (a *re-iterable* in plain English).

I think that this problem is simultaneously too big to solve, and too trivial to care about solving. As I see it, we have a great many related categories of things that can be iterated over. Some of them are represented by concrete or abstract classes in the standard library, some of them aren't. Many of them overlap:

* eager sequences, mappings and sets;
* lazy sequences, mappings or sets;
* things which support the `in` operator (collections);
* things that obey the iterator protocol ("iterators");
* things which obey the old sequence protocol;
* things which obey the iterator protocol, except for the part about "once they become empty and raise StopIteration, they should always raise StopIteration" -- these are officially called "broken iterators";
* things which are broken iterators with an official API for restarting them (perhaps called "reiterators"?);
* reiterators which can be restarted, but won't necessarily give the same results each time;
* iterators that provide a look-ahead or look-behind method;
* sequences that use the function-call API, like random.random();
* things that you can pass to iter();
* things which you can pass to iter() multiple times, and get iterators which yield the same values each time;
* things which you can pass to iter() multiple times, and get iterators which DON'T yield the same values each time;

and probably more. I don't think it is practical, or necessary, to try naming them all. We have a good set of names already:

- collection for things that support `in`;
- iterable for things that can be iterated over;
- iterator for a particular subset of iterables;
- sequence for things like lists;
- lazy sequence for things that are sequences where the items are calculated on demand rather than in advance.

I think adding to that set of names is a case of YAGNI.

I do see some sense in having a name for iterators with a "restart" method, and the obvious name for that would be reiterator. But I don't think the glossary needs to have that entry until such a time as there actually is a restartable iterator in the std lib. I *don't* think there's any need for a name for "iterables apart from iterators".
While it is occasionally useful to be able to talk about such things, that's not frequent enough that we need a name for it. We can just say "iterables apart from iterators", or we can give concrete examples: "sequences, mappings and sets".
I don't see why you think lazy reiterable doesn't mean anything. It is clearly a reiterable where the items are calculated on demand rather than in advance. What else could it mean?
But I've already made all these points and you just skipped over them.
It's a huge thread, with (no offense) a lot of rambling, pedantic arguments over the number of angels that can dance on the head of a pin (what sort of angels? seraphim, cherubim, ophanim, malakhim, dominions, powers, virtues, or something else? what size pin?). It's easy to get lost in the discussion, partly (I believe) because the problem being solved is so abstract.

This seems to me to be a debate over definitions for their own sake, not in order to solve an actual problem. I could be wrong, of course. But if there is anything actually concrete and important being solved by this proposal, it is well and truly hidden in the thousands of words of abstract argument.

--
Steve

Andrew Barnert via Python-ideas writes:
Are there really places where we *want* "foo(range(n))" to raise but "foo(iter(range(n)))" to do something useful? In other words, is this really a terminology problem, rather than a problem with the design of APIs that take iterators rather than iterables?

On Feb 15, 2016, at 19:42, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Sure. Any multi-pass algorithm should raise when given an iterator. (Maybe some could instead copy the iterator to a list or something, but as a general rule, silently and unexpectedly jumping from 0 to O(N) space depending on the input type isn't very friendly.) In the other direction, of course, the next function requires an iterator, and should and does raise when given something else. And the slightly higher-level functions discussed in this thread are the same. A function intended to consume and return the first few values in an iterator while leaving the iterator holding the rest is basically just a "multinext", and it would be very confusing if it took a collection and left it holding all the values, including the two you already used. And how do you explain that to a human? You could say "While it's an iterable, it's not an iterator." And that's exactly what Python has said for the last however-many years. And that's exactly what leads to the confusion Paul was talking about (and the additional confusion he inadvertently revealed). And the problem isn't "Paul is an idiot", because I've seen multiple core devs, and other highly experienced Python programmers, make the same mistake. Even the official docs made a similar mistake, and nobody caught it for 4 years. If the current terminology were sufficiently clear and intuitive that nothing needed to be done, these problems wouldn't exist.
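A sketch of the kind of guard a multi-pass function might use today (the "iter(x) is x" test is the usual idiom for detecting iterators):

    def require_multipass(iterable):
        # An iterator's __iter__ returns itself, so this catches one-shot
        # inputs before a multi-pass algorithm silently misbehaves.
        if iter(iterable) is iterable:
            raise TypeError("need a reiterable collection, not an iterator")
        return iterable

    require_multipass([1, 2, 3])       # fine
    require_multipass(range(10))       # fine: range is a lazy collection
    # require_multipass(iter([1, 2]))  # would raise TypeError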

Andrew Barnert writes:
On Feb 15, 2016, at 19:42, Stephen J. Turnbull <stephen@xemacs.org> wrote:
That's not my problem. I'm not suggesting that the coercion between iterators and non-iterators should go both directions. If a function that wants a Sequence is handed an iterator it should feel free to raise.
In the other direction, of course, the next function requires an iterator, and should and does raise when given something else.
Sure. But next() *can't* work if the object passed doesn't maintain internal state. I don't think it will bother anybody if next raises on a non-iterator, at least no more than it bothers people when something raises on an attempt to assign to a tuple element. My question is "why are they iter[ator]-tools, not iter[able]-tools?" Mostly because sequences don't need those tools very often, I guess.
And the slightly higher-level functions discussed in this thread are the same.
Their implementations are, yes (except for the lhs ellipsis syntax, which doesn't exist yet).
Greg Ewing evidently thinks that's the natural thing, so I accept that it's not un-natural.<wink/> And you can easily change that behavior *without* perverse effects on space or time with iter(), at the expense of choosing a name, and an assignment statement. But there's an alternative interpretion of what we're looking at here with ellipsis, at least, and that's "multipop", in which case you do expect it to consume a Sequence, and all iterables would behave the same in that respect. Again, you can change that behavior with an appropriate call to iter() (and this time you don't even need to choose a name). I agree that we can't get to win-win-win here, after all both iterators and lists are optimizations of iterable, and that always costs you something. But I wonder if it wouldn't be possible to take the attitude that we should do our best to optimize either list() or iter() out of Python programs. Since the costs of willy-nilly applying list() to arbitrary iterables are obvious, iter() is the one we should try to make implicit where possible. We can't get rid of it entirely (as with multinext vs. multipop, where either choice leaves us wanting to call iter() for some use cases). But maybe we could reduce the frequency of cases where it's all too easy to forget to call it.
If the current terminology were sufficiently clear and intuitive that nothing needed to be done, these problems wouldn't exist.
That takes the current implementation as given, which is exactly what I'm questioning.

On 16.02.2016 06:56, Andrew Barnert via Python-ideas wrote:
I think both are true. The current API, being one of the best available out there, exposes some warts (as you described) which in turn lead to a plethora of abstract wording expressing almost the same thing: lists, iterables, iterators, sets, generators, views, range, etc. These data structures evolved over time and proved useful. In my point of view, what's missing is a structure to put them into. If one looks closely, one can see how they differ in usage:

1) Is it iterable?
- yes to all (as far as I know)

2) Is it subscriptable?
- yes: list, range
- no: iterator, generator, set, view

3) Has it a length?
- yes: list, range, set, view
- no: iterator, generator

4) Has it a contains test?
- yes: list, range, set, view
- no: iterator, generator

5) Are items materialized?
- yes: list, set
- no: iterator, generator, range, view

6) Can it have mutable underlying data?
- yes: generator, iterator, view
- no: list, set, range

7) Does iteration change it?
- yes: iterator, generator
- no: list, set, range, view

As expected, they almost always have nothing in common except 1). I am perfectly fine with calling these things collections; if I have a container of n things, I almost certainly can go through them one by one. 2) to 4) are usually the most common operations and are safe when implemented. range is like list; view is like set. 5) to 7) are usually things you worry about when you are concerned with performance. As usual, those things are hard to do right.

My suggestions:

Let's call all of them *collections*. Let's call collections *list-like* if they support 2) to 4) and *set-like* if they support 4). Let's call collections *lazy* when they don't fit 5). Let's call collections *view-like* when they feature 6). Let's call collections *iteration-stable* when they don't fit 7).

As you can see, for me it's not a matter of inventing new nouns, but a matter of finding the right adjective to further specify the type of collection. Does it help? I don't know, but it helped me to structure the concepts.

Best,
Sven
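Much of that table can be probed mechanically with collections.abc (a rough sketch; the materialization and mutability questions in 5) and 6) aren't detectable this way):

    from collections.abc import Container, Iterable, Iterator, Sized

    def describe(obj):
        return {
            "iterable": isinstance(obj, Iterable),
            "iterator (iteration consumes it)": isinstance(obj, Iterator),
            "has length": isinstance(obj, Sized),
            "contains test": isinstance(obj, Container),
            "subscriptable": hasattr(obj, "__getitem__"),
        }

    print(describe([1, 2, 3]))  # list: everything except iterator
    print(describe(range(5)))   # range: like list, but lazy
    print(describe(iter([1])))  # a list iterator: iterable and iterator only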


On 12/02/16 19:54, Andrew Barnert via Python-ideas wrote:
Why not just have an itertools.unpack() - a simple version without argument checking:

def unpack(seq, num):
    it = iter(seq)
    yield from (i[1] for i in zip(range(num), it))
    yield it

foo, bar, rest = unpack([1, 2, 3, 4, 5, 6], 2)

Because it's in itertools, the expectation is that it has something to do with iterators, so the final return value always being an iterator regardless of the original sequence type is reasonable (and is perhaps the only justification for putting it in itertools in the first place ;) ). E.

On Fri, Feb 12, 2016 at 6:01 PM Erik <python@lucidity.plus.com> wrote:
There's some visual dissonance since the ``num`` argument is asking for the number of elements to unpack, but the left-hand side of the assignment has num+1 variables. What do you think about just using islice?

>>> from itertools import islice
>>> it = iter(range(5))
>>> (first, second), rest = islice(it, 0, 2), it
>>> first
0
>>> second
1
>>> rest
<range_iterator object at 0x1011944b0>

I suppose it might read better broken apart into two lines to emphasize that the state changed.

>>> first, second = islice(it, 0, 2)
>>> rest = it

On Feb 12, 2016, at 15:35, Michael Selik <mike@selik.org> wrote:
Creating a range just to zip with just to throw away the values seems like overcomplicating things. Unless there's some performance benefit to doing it that way, why not just keep it simple?

def unpack(seq, num):
    it = iter(seq)
    yield from islice(it, num)
    yield it

Or, to be super novice-friendly:

def unpack(seq, num):
    it = iter(seq)
    for _ in range(num):
        yield next(it)
    yield it
foo, bar, rest = unpack([1, 2, 3, 4, 5, 6], 2)
That doesn't really distinguish the "rest" very clearly. Of course we could just change the last line to "yield [it]", and then call it with "foo, bar, *rest =", but that seems cheesy to me. I don't know; if there is something better than islice to be found here, I think you're probably on the right track, but I don't think you're there yet, and I'm not sure there is anything to find. Unpacking syntax just feels "sequency" to me, and in a language with distinct sequences and iterators (instead of, say, lazy sequences like Haskell, or views wrapping many iterators like Swift), I think that means non-lazy. But hopefully I'm wrong. :)
That's exactly what was in my email, as the way to do things today, which he was replying to:
So I think we can assume that he thinks his version improves over that, or he wouldn't have suggested it...

On 13/02/16 00:03, Andrew Barnert wrote:
I agree. I've never used islice() before, so I missed that as a better way of yielding the first 'n' values.
My "suggestion" was simply that creating a very short wrapper function somewhere that handles whether the sequence is already an iterator or not etc (and using islice() or whatever - I don't really care ;)) would perhaps be a more pragmatic option than trying to squeeze in some syntax change or underlying unpack heuristic/mechanic (which is where I thought the thread was heading). Perhaps I didn't express that very clearly. I'm happy to drop it ;) E.

On 13 February 2016 at 17:59, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
The main problem with that specific spelling of the idea is that it's the inverse of what the bare "*" means when declaring function parameters - there, it's a way of marking the end of the positional arguments, when you want to put keyword only arguments after it. The other problem is that it makes:

a *= value

and

a, b, *= value

mean wildly different things. Where these discussions generally end up is:

1. The cases where you actually want "unpack this many values, ignore the rest" are pretty rare
2. When you do really need it, islice handles it
3. Adding new syntax isn't warranted for a relatively rare use case the stdlib already covers

Probably the most plausible idea would be a "head()" recipe that does something like:

def head(iterable, n):
    itr = iter(iterable)
    return tuple(islice(itr, n)), itr

Usable as:

(a, b), rest = head(iterable, 2)

Making it a recipe means folks can customise it as they wish (e.g. omitting the tuple call). However, I'm not sure how much that would actually help, since using islice directly here is already pretty straightforward (once you know about it), and for small numbers of items, you can also just use the next builtin if you know you have an iterator:

itr = iter(iterable)
a = next(itr)
b, c = next(itr), next(itr)

Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 13 February 2016 at 13:06, Nick Coghlan <ncoghlan@gmail.com> wrote:
1. The cases where you actually want "unpack this many values, ignore the rest" are pretty rare
It's not so much "ignore the rest" but rather retain the rest for separate consumption. This happens when you want to either peek at or split off the first item. For example, in parsing a csv file:

def readcsv(csvfile):
    csvfile = map(str.split, csvfile)
    try:
        fieldnames = next(csvfile)
    except StopIteration:
        raise ValueError('Bad csv file')
    return [dict(zip(fieldnames, line)) for line in csvfile]

It would be nicer to write that as something like:

fieldnames, * = csvfile

Another situation where I've wanted that is given an iterable that yields sequences all of the same length I might want to peek the first item to check its length before the loop begins.
2. When you do really need it, islice handles it
That's true, so you can do:

fieldnames, = islice(csvfile, 1)

Somehow I don't like that, but really it's fine. -- Oscar

On Feb 13, 2016, at 08:36, Oscar Benjamin <oscar.j.benjamin@gmail.com> wrote:
So instead of getting an exception that says "bad csv file" you get one that says "unpacking expected at least 1 value, found 0" or something? That doesn't seem nicer. But if we really do need such functionality commonly, just adding an exception type parameter to next would cover it. Or just putting an exception-wrapping function in the stdlib, so you can just write "fieldnames = exwrap(next, StopIteration, ValueError)(csvfile)". But then anyone can write exwrap as a three-line function today, and I don't think anyone ever does, so I doubt it needs to be standardized... Also, is it worth mentioning that this doesn't actually parse csv files, but whitespace-separated files, without any form of escaping or quoting, and probably doing the wrong thing on short rows? Because there is a really easy way of writing this function that's a lot nicer and a lot shorter and raises meaningful exceptions and actually works:

def readcsv(csvfile):
    return list(csv.DictReader(csvfile))

(Although I'm not sure why you want a list rather than an iterator in the first place.)
Again, next.
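A minimal sketch of the exwrap helper Andrew alludes to above; the name and signature come from his example, not from anything that actually exists in the stdlib:

def exwrap(func, catch, throw):
    # Return a callable that runs func, translating the 'catch'
    # exception into the 'throw' exception.
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except catch as e:
            raise throw(*e.args) from e
    return wrapper

# As in Andrew's example:
# fieldnames = exwrap(next, StopIteration, ValueError)(csvfile)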

Nick Coghlan wrote:
I know, but despite that, this still seems like the "obvious" way to spell it to me.
Another possibility is:

a, b, ... = value
2. When you do really need it, islice handles it
I find that answer unsatisfying, because by using islice I'm telling it to do *more* work, whereas I really want to tell it to do *less* work. It just seems wrong, like a kind of abstraction inversion. -- Greg

On 14 February 2016 at 07:40, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Another possibility is
a, b, ... = value
Now, *that* spelling to turn off the "implicit peek" behaviour in iterable unpacking I quite like:

arg_iter = iter(args)
command, ... = arg_iter
run_command(command, arg_iter)

Although again, the main downside would be that "..." here means something rather different from what it means as a subscript element. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
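For reference, a sketch of what Nick's hypothetical spelling would mean in today's Python (args and run_command are placeholders from his example):

arg_iter = iter(args)
command = next(arg_iter)        # take just the first value
run_command(command, arg_iter)  # the iterator still holds the rest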

Nick Coghlan writes:
On 14 February 2016 at 07:40, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Another possibility is
a, b, ... = value
+1. It may be TOOWTDI; I thought of it independently.
But not what it means in natural language, which is basically "continues as expected". That's somewhat different from "*" which has no established meaning in natural language (except "note here"), and which is already heavily used in Python. Steve

On 02/14/2016 08:43 AM, Nick Coghlan wrote:
Assigning to Ellipsis? Interesting idea, but I'd probably go a bit further in the similarity to star-assignment:

a, b, *... = value

Ellipsis could then be a general "throw-away" lvalue. This would make it possible to say

a, ..., b, ... = some_function()

i.e. skip exactly one item in unpacking.
Although again, the main downside would be that "..." here means something rather different from what it means as a subscript element.
Keep in mind that Ellipsis is also a legal rvalue anywhere else. I.e. this would be legal (and a no-op):

... = ...

But thinking about it, this is also legal at the moment:

[] = []

Interestingly, this raises:

() = ()

cheers, Georg

Georg Brandl wrote:
I thought about that too, but it seemed like it would be too confusing -- the above *looks* like it should be skipping an arbitrary number of items. I think this interpretation would be even more inconsistent with existing uses of ... in both Python and English. -- Greg

Georg Brandl writes:
I don't get the "*". Normally that means create or unpack a container, but here the semantics is "leave the container alone". And to my eyes, it's not an assignment to Ellipsis semantically. It's an operator that says "b gets an element, and the tail of value is used somewhere else, don't copy it as a sequence to b". Perhaps

a, b ...= consumable_value

would make that clearer, but it looks strange and ugly to me. What would

a, b, ... = some_list

mean? (Using the OP's notation without prejudice to other notations.) Would it pop a and b off the list?
That has a rather different meaning in math texts, though. It means "an infinite sequence starting at a with a generic element denoted 'b'". The number of elements between a and b is arbitrary, and typically indicated by writing b as an expression involving an index variable such as i or n.
Keep in mind that Ellipsis is also a legal rvalue anywhere else.
That might kill the operator interpretation.

On Feb 14, 2016, at 10:57, Georg Brandl <g.brandl@gmx.net> wrote:
But we already have a general "throw-away": underscore. You can already write "a, _, b, _ = it" and it will unpack 4 values and throw away 2 of them. And you can also write "a, b, *_ = it" to unpack 2 or more values and throw away all but the first 2. And, for that matter, "a, *_, b = it". There's nothing special about the underscore--you could get the same effect by writing "a, dummy, b, evendummier = it"--but it's conventional. Anyway, what people are looking for in this thread is almost the exact opposite: they don't want to unpack the value and throw it away, they want to leave the value there for later unpacking. (Of course for collections, there's no such distinction, but for iterators there is.)
With your version, where ... is a normal target that just ignores its value, sure. With the original version, where ... means "unpack 0 elements from the iterable and stop", it would presumably raise a TypeError("'ellipsis' object is not iterable").
But thinking about it, this is also legal at the moment:
[] = []
Yes, but that's completely different. The [] on the left isn't an expression, or even a target, but a target list with 0 targets in it. Assignment to target lists is defined recursively, so assigning to 0 targets is legal iff you're unpacking 0 values. The fact that you have specifically [] on the right side is irrelevant. You can get the same effect by writing [] = (), or [] = {}, or [] = (i for i in range(5) if i<0). And clearly, you're not assigning to "the empty list", because each empty list created with [] is a distinct object. Anyway, ... is a constant; it's currently illegal as a target because of the rule added in 3.0 banning assignment to a handful of special constants (..., None, etc.). If you have it a special meaning as a target, then of course that rule no longer applies.
Interestingly, this raises:
() = ()
This just doesn't parse, because the target-list grammar doesn't have the same special case for () as the parenthesized expression grammar. (Remember, both tuples and target lists are made by commas, not parens, and () really is a special case.) If such a rule were added, then it would presumably mean the same thing as [] = (). But what would be the point of adding that rule? (You could argue that the inconsistency makes the language harder to remember, but it's been this way for decades, and nobody notices it until they've been using Python for years, so that's not very compelling.)

On 02/15/2016 07:31 AM, Andrew Barnert via Python-ideas wrote:
I kinda know, having written the PEP and implementation for the latter two. It was just a quick, not very well thought-out idea of how to generalize Nick's interesting suggestion.
Yes, what you're saying can be expressed as "of course it's legal, since it's legal". Please step back a bit and look at this from the "how it looks" perspective, not from the "what the grammar rules say" perspective. My observation was simply that we already have the case of X = X where both Xes are the same syntax, but have a wildly differing interpretation.

On Feb 14, 2016, at 23:30, Georg Brandl <g.brandl@gmx.net> wrote:
No; common sense isn't the same thing as tautology. It has an obvious semantics, there's no good reason to ban it, and it comes for free with the simplest grammar--therefore, it's common sense that the language should allow it. It's only because Python generally does such a great job following common sense that you don't notice. :) And it's similar common sense that your version of "..." should make "... = ..." legal, while Nick's version should make it illegal.
"x = x" is a no-op (assuming x was already bound) and "False = False" is an error. Given that, and the fact that (unlike those cases) "[] = []" _doesn't_ have the same syntax for the two Xes, I don't see what the observation demonstrates.

On 15.02.2016 07:31, Andrew Barnert via Python-ideas wrote:
I agree.
Interestingly, doing the following results in a syntax error:

[1,2]=[3,4]

  File "<input>", line 1
SyntaxError: can't assign to literal

So, if it's illegal to assign to the literal [1,2], I don't see why it should be legal for []. But as you said, that's a highly theoretical problem. That just reminds me of the js part of https://www.destroyallsoftware.com/talks/wat Best, Sven

On Mon, Feb 15, 2016 at 02:55:26PM +0100, Sven R. Kunze wrote:
On 15.02.2016 07:31, Andrew Barnert via Python-ideas wrote:
[..]
I believe you are misinterpreting the error. The error isn't that you are trying to assign to the literal [1,2]. The error is that you are trying to assign to the literal 1. It's a subtle difference, but important to understand in order to see why [] is a legal assignment target.

py> [a, b, c, 1] = "abcd"
  File "<stdin>", line 1
SyntaxError: can't assign to literal

Obviously [a, b, c, 1] is not a literal, but 1 is. If there is any doubt:

py> [a, b, c, None] = "abcd"
  File "<stdin>", line 1
SyntaxError: cannot assign to None

Since [1,2] is a *list of assignment targets*, not a single target, the assignment you attempted

[1, 2] = [3, 4]

is roughly equivalent to:

1 = 3
2 = 4

which obviously gives a syntax error. [] is a valid assignment target so long as the right hand side is an empty iterable. If it is not empty, you get a ValueError:

py> [] = "abc"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: too many values to unpack

There are three values on the right hand side, and zero targets. This might even be useful. You can confirm that an iterable is empty (why you might want to do this, I can't imagine, but suppose you did) by assigning it to zero targets:

[] = iterable

succeeds only if iterable has zero items, otherwise it raises ValueError. -- Steve

On 15.02.2016 15:42, Steven D'Aprano wrote:
You are indeed right. And while we are on it, that reminds me of KeyError. I always feel annoyed by the fact that Python doesn't tell me what key a dict is missing. So, if I were to make a suggestion, I would like to see the issue-causing thing mentioned in those two types of exceptions.
Completely true and nothing to add; except, as Georg already noted, the "as it looks" experience is somewhat weird and I always would consider this a syntax error (don't ask me why). Best, Sven

On Feb 15, 2016, at 07:13, Sven R. Kunze <srkunze@mail.de> wrote:
IIRC, there's a series of bugs on adding more useful information to the builtin exceptions, which includes putting the key in the error message _and_ as an attribute of the exception (a la filename in the filesystem exceptions), but they've been waiting on someone to do the work for a few years now. If so, there's an obvious way to kick-start the solution: find the bug and write a patch. (I suspect some people would object to fixing KeyError without also fixing IndexError, and want to argue about whether the key member goes in each exception directly or is part of LookupError, and so on, and there's the more substantive issue of keeping alive potentially-large keys, and so on, so it probably won't be as simple as "submit a trivial patch and it gets accepted", but at least having a patch would get the process moving.) But for SyntaxError, there is no object to stick in the error object or error message, just a source token or an AST node. Both of those have pointers back to start and end in the source string, and that's what goes into the SyntaxError, which is already shown to the user by printing the relevant line of source and pointing a caret at it. When that's insufficient or misleading (as with the ever-popular missed-')' errors), printing the token or AST node itself would probably just make it more misleading. And, as for putting it in the error object, it's pretty rare that anyone handles SyntaxError in code (unlike KeyError, FileNotFoundError, etc.), so that seems a lot less useful. Maybe there is a clean way to improve SyntaxError akin to KeyError, but it would take a lot more thought and design (and probably argument), so I'd work on KeyError first.

On Mon, Feb 15, 2016 at 01:55:32PM -0800, Andrew Barnert via Python-ideas wrote:
Showing the missing key in the error message goes all the way back to Python 1.5:

[steve@ando ~]$ python1.5 -c '{}["spam"]'
Traceback (innermost last):
  File "<string>", line 1, in ?
KeyError: spam

Admittedly, the exception object itself doesn't keep a reference to the missing key, so you can't programmatically query it for the key, but normally that's not a problem since you just tried to look it up so you should still have the key.
See discussion here: http://bugs.python.org/issue18162
Also http://bugs.python.org/issue1182143
And the PEP: https://www.python.org/dev/peps/pep-0473/
Sadly, CPython doesn't manage to even display the caret at all:

[steve@ando ~]$ python -c "[a, 2] = 'xy'"
  File "<string>", line 1
SyntaxError: can't assign to literal

Jython gives a much more informative error:

steve@orac:~$ jython -c "[a, 2] = 'ab'"
  File "<string>", line 1
[a, 2] = 'ab'
     ^
SyntaxError: can't assign to number

-- Steve

Well, there's no `key` attribute for example, but the KeyError exception has exactly one argument, the missing key.
So, you *can* query it for the missing key, even though it's a bit ugly. But as far as TOOWTDI goes, I think this is fine as-is. This is just my opinion, though :) -- Emanuel
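A quick illustration of what Émanuel describes - the missing key is available as the exception's single argument, even without a dedicated attribute:

try:
    {}['spam']
except KeyError as e:
    missing_key = e.args[0]  # 'spam', the key that was looked up
    print('missing:', missing_key)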

On 16.02.2016 01:34, Steven D'Aprano wrote:
Sorry. I confused KeyError with IndexError (bracket all over the place).
That's true. However, most of the time I see such tracebacks while sifting through server logs. So, the more data an exception exposes, the better. I wouldn't even mind an excerpt of the dict/list itself in order to get a faster understanding of the problem domain (wrong type of keys, etc.).
That seems to be a great idea. Best, Sven

On 15.02.2016 23:11, Sven R. Kunze wrote:
In fact, I have been asked several times by some older Python developers what the Pythonic way of doing exactly this is. Like this:

Dev: Sven, do you know a better way to get THE item from a single-item list rather than just [0]?
Sven: huh? *distracted* yeah *thinking* why not [0]? seems simple enough to get the intent *back to topic*

I think because of that thread, I won't forget anymore. ;-) Best, Sven

On Mon, Feb 15, 2016 at 06:53:45PM +0100, Sven R. Kunze wrote:
No, it's exactly the same Python idiom (assignment to a list of targets) as we've been talking about for the last few posts. We've had examples with four targets, three targets, two targets and zero targets. This is an example with one target:

[a] = iterable

This requires that the right-hand side be iterable, and after unpacking it must contain exactly one item, to match the one assignment target given on the left. -- Steve
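A short REPL sketch of the one-target case Steven describes (error messages as in recent CPython):

>>> [a] = [42]     # exactly one item: works
>>> a
42
>>> [a] = []       # zero items
Traceback (most recent call last):
  ...
ValueError: not enough values to unpack (expected 1, got 0)
>>> [a] = [1, 2]   # two items
Traceback (most recent call last):
  ...
ValueError: too many values to unpack (expected 1)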

On Feb 15, 2016, at 15:43, Steven D'Aprano <steve@pearwood.info> wrote:
I see this more often written in tuple-ish form:

index, value = struct.unpack_from('!HH', buf, off)
namelen, = struct.unpack_from('!H', buf, off+4)

Somehow, that feels more natural and pythonic than using [index, value] and [namelen], despite the fact that without the brackets it's easy to miss the trailing comma and end up with (16,) instead of 16 as your length and a confusing TypeError a few lines down.

On Feb 16, 2016, at 02:11, Sven R. Kunze <srkunze@mail.de> wrote:
It's not a tuple _with_ the parentheses either: it's a parenthesized target list, which has a syntax and semantics that are, while not completely unrelated to tuple, definitely not the same. One way in which it _is_ like a tuple is that this doesn't help. Just like tuples, parenthesized target lists need commas, so this will bind namelen to the tuple (16,) instead of to 16:

(namelen) = struct.unpack_from('!H', buf, off+4)

On 16.02.2016 00:43, Steven D'Aprano wrote:
No, it's exactly the same Python idiom (assignment to a list of targets) as we've been talking about for the last few posts.
I think we better distinguish between idioms and language features.
Of course, it's quite straightforward once you ponder about it. I recently talked to a coworker about this. The concrete example is about "How do I get the one-and-only element of a **set** which obviously does not support subscripting". Another aspect I came to think of is the following asymmetry:

a, b, c, d = mylist4  # works
a, b, c = mylist3     # also works
a, b = mylist2        # works too
[a] = mylist1         # special case?

One might resolve the asymmetry by writing:

[a, b, c, d] = mylist4
[a, b, c] = mylist3
[a, b] = mylist2
[a] = mylist1  # fits in
[] = mylist0   # even that is possible now

Thus, the parentheses-less variants are special cases. However, when glancing over our production source, the parentheses-less variant is rather the norm than a special case (haven't even seen nested ones). I suspect a special-character-phobia (ever used a German keyboard ;-) ?) -- Why should I write '''[a, b] = [b, a]''' when '''a, b = b, a''' suffices? So, it seems to me that inducing the 1-target and 0-target concepts from the norm is not as easy as you might believe. Last but not least, most devs are not used to magic on the lhs. Imagine how weird and interesting you would find it if the following were possible in Python:

mylist = [6, 7]
a*3, b+5 = mylist

Best, Sven

On Wed, Feb 17, 2016 at 4:54 AM, Georg Brandl <g.brandl@gmx.net> wrote:
The trailing-comma spelling of the one-target case is actually available to everything else, too; the only difference is that for one target the trailing comma is mandatory, not optional:
a,b,c, = range(3)
Which is, again, the same as with tuples. ChrisA

Sven R. Kunze wrote:
But [1,2] is not a literal -- the individual elements 1 and 2 are. It's the inability to assign to those elements that makes it illegal. On the other hand, [] doesn't contain any elements that it's illegal to assign to, so there's no reason to reject it. But it doesn't contain any elements that it's legal to assign to either, so you could say there's no reason to accept it. This is a philosophical question. When you've eaten the last chocolate, do you have an empty box of chocolates, or just an empty box? -- Greg

On 14 February 2016 at 07:43, Nick Coghlan <ncoghlan@gmail.com> wrote:
IMO, the other downside is that the semantic difference between

a, b, ... = value

and

a, b, *_ = value

is very subtle, and (even worse) only significant if value is an iterable as opposed to a concrete container such as a list. IMO, explicit is better than implicit here, and itertools.islice is the right way to go:
(And of course in my first attempt I forgot I needed iter(range(...)) as range is not a pure iterable - proving my point about the subtle semantics!) Paul

On Feb 15, 2016, at 00:12, Paul Moore <p.f.moore@gmail.com> wrote:
You mean iterator, not iterable. And being "concrete" has nothing to do with it--a dict view, a memoryview, a NumPy slice, etc. aren't iterators any more than a list is. This is exactly why I think we need an official term like "collection" for "iterables that are not iterators" (or "iterables whose __iter__ doesn't return self" or similar). People struggling to come up with terms end up confusing themselves--not just about wording, but about actual concepts. As proven below:
Ranges are as iterable as both lists and iterators. You're a smart guy, and you know Python. So why do you make this mistake? Because you don't have a term to fit "range" into, so your brain struggles between two prototypes--is it like a list, or like a generator? Well, it's lazy, so it's like a generator, so you don't need to call iter here, right? Nope. This is the same as the experiment where they give people the English cousin rule and asked them to evaluate which family relationships count as cousins. People whose native language has no word for "cousin"--e.g., because they have unrelated words for "maternal cousin" and "paternal cousin"--make a lot more mistakes than people whose language has a word that matches the rule.

On 15 February 2016 at 08:39, Andrew Barnert <abarnert@yahoo.com> wrote:
Precisely :-) Nor is a range object, see the (other!) mistake I made.
This is exactly why I think we need an official term like "collection" for "iterables that are not iterators" (or "iterables whose __iter__ doesn't return self" or similar). People struggling to come up with terms end up confusing themselves--not just about wording, but about actual concepts. As proven below:
Indeed.
Thanks for providing another perspective on my point. This whole area is rather too full of semantic confusion (I'm not as sure as you seem to be that agreeing on a terminology will address that issue, but that's a side matter) and I think we should be very careful about introducing syntax that requires a good understanding of the concepts to use correctly, when there's a perfectly acceptable explicit form (even if it is slightly more verbose/repetitive). Paul

On Mon, Feb 15, 2016 at 7:39 PM, Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
Allow me to take this opportunity to reiterate the recommendation for "reiterable", which has come up enough times that I've completely forgotten who first came up with it. An iterable can be iterated over at least once; a reiterable can be iterated over more than once, and will often produce the same sequence of values each time. ChrisA

On 2016-02-15 00:39, Andrew Barnert via Python-ideas wrote:
I still don't think that is at all what we need, as this example shows. Whether the value is an iterable or an iterator is not relevant. Whether the iterable's iterator is self is not relevant. What is relevant is the difference in *behavior* --- namely, whether you can rewind, restart, or otherwise retrieve already-obtained values from the object, or whether advancing it is an irreversible operation and there is no way to get old values except by storing them yourself. In this example, being able to say that value was or was not an iterator or an iterable would in no way help to clarify how the code would behave differently. Saying that it is an iterable or an iterator is just saying that it has or doesn't have .__next__() and/or .__iter__() methods that follow certain very broad protocols, but what matters for understanding examples like this is what those methods actually DO. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

On Feb 15, 2016, at 01:21, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
You're making the same mistake that this idea was meant to cure: iterators _are_ iterables. Which means this is vacuously true. But whether the value is a collection/reiterable/whatever or an iterator is exactly what's relevant.
Whether the iterable's iterator is self is not relevant. What is relevant is the difference in *behavior* --- namely, whether you can rewind, restart, or otherwise retrieve already-obtained values from the object, or whether advancing it is an irreversible operation and there is no way to get old values except by storing them yourself.
An iterator returns self from iter, which means advancing the iterator is consuming self, so there is no way to get the old values again. A non-iterator iterable may return a different object from iter, which means advancing the iterator doesn't have to consume self; you can get the old values again just by calling iter to get a new iterator at the start. A collection, or reiterable, or whatever, could be defined as an iterable that _doesn't_ return self. Or, nearly-equivalently, it could be defined as an iterable that returns a new iterator over the same values (unless mutated in between iter calls), borrowing the distinction that's already in the docs for defining dict and dict view semantics. Of course any useful definition would leave pathological types that are neither iterator nor collection (e.g., an object that returns a new iterator each time, but those iterators destructively modify self), or maybe where they do qualify but misleadingly so (I can't think of any examples). And there may also be cases where it isn't clear (e.g., is a collection of shared memory not a collection because some other process can change its values, or does that count as "unless mutated"? that probably depends on how and why your app is using that shared memory). But that isn't a problem; we're not trying to come up with a definition that could be used to write type-theoretic behavior proofs for Python semantics, but something that's useful in practice for discussing almost all real Python programs with other humans.
In this example, being able to say that value was or was not an iterator or an iterable would in no way help to clarify how the code would behave differently. Saying that it is an iterable or an iterator is just saying that it has or doesn't have .next() and/or .__iter__() methods that follow certain very broad protocols, but what matters for understanding examples like this is what those methods actually DO.
The iterator protocol defines what those methods do. The __iter__ method returns self, and the __next__ method advances self to return the next value. The fact that these semantics aren't checked (even by the ABC or by mypy) doesn't mean they don't exist or are meaningless. The iterable protocol, on the other hand, just tells you that __iter__ returns an iterator. Because iterators and collections are both iterable, just knowing that something is an iterable doesn't tell you whether it's reusable. But knowing that something is a collection (assuming we have a well-defined term "collection") would. Which is exactly the point of the proposal.
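A tiny sketch of the test Andrew's definition implies - an iterator is an iterable whose iter() is the object itself, so everything else is a candidate "collection" (the helper name is made up for illustration):

from itertools import count

def is_iterator(obj):
    # Iterators return themselves from iter(), so advancing
    # them consumes the only copy of the values.
    return iter(obj) is obj

print(is_iterator([1, 2, 3]))        # False: a collection
print(is_iterator(range(10)))        # False: lazy, but still a collection
print(is_iterator(iter([1, 2, 3])))  # True
print(is_iterator(count()))          # True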

On 2016-02-15 14:27, Andrew Barnert wrote:
I still don't understand why you want to use the term "collection" when everything you say about the purpose of the term (as for instance your last paragraph here) suggests you want to know whether it is reusable. To ordinary humans "collection" says nothing about iteration or reusability. If you want to say it's reusable, just say it's reusable. As a bonus, you don't have to get anyone to buy into a term like "collection", or even know the difference between iterators and iterables, because "reusable iterator" and "reusable iterable" (or "re-iterable", or "restartable" or other such terms) already have an obvious interpretation (even if one of them is technically impossible). -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

On Feb 15, 2016, at 14:41, Brendan Barnwell <brenbarn@brenbarn.net> wrote:
First, the word "reiterable" is acceptable to me (and certainly better than not having any such term)--that's why I said "like 'reiterable' or 'collection'" multiple times. It's an easy-to-remember term, and its name helps keep it straight. The biggest problem is that it's not an actual word, and it looks and sounds ugly. It's much better than not having a word at all, and forcing people to use phrases like "reusable non-iterator iterable", but it's not ideal. The reason "collection" is better is that it matches an intuitive category that actually means what we want. And the fact that it's not immediately obvious that it does what we want is a _good_ thing, because it means people have one simple thing to learn and remember: you can iterate collections over and over without affecting them. And "collection" stays intuitive when used in compounds like "virtual collection" or "lazy collection" or "infinite collection". So, I can say "a range is a lazy collection", and you know that means it's not a one-shot iterator, because collections aren't one-shot iterators--"lazy" ones don't physically hold the stuff, but come up with it when asked, but they're still collections, so they're still not one-shot iterators. Also, notice that we already use the term "collection" informally to mean exactly what people intuitively think of--including in the documentation of the "collections" module in the stdlib. We certainly don't use the term "reiterable" anywhere (especially when typing with autocorrect on). I think people worry about the case of things that aren't collections but also aren't iterators. But let's look at one:

class Random3to6d6:
    def __iter__(self):
        for _ in range(randint(3, 6)):
            yield randint(1, 6)

That's clearly not a collection. It's also clearly not an iterator. So, if I want to use it, that's a clear sign that I have to think about it carefully. Is it a reusable or a reiterable? Depends how you define the term. Which means I have to know, and have internalized, the exact definition before "reusable" is any help here. I still have to think about it carefully, but it's less obvious that I have to do so. Again, I think adding "reiterable" to the glossary (and maybe collections.abc and typing) would be a big improvement; I just think adding "collection" would be a slightly better one.

On Mon, Feb 15, 2016 at 10:59 PM Rob Cliffe <rob.cliffe@btinternet.com> wrote:
The glossary entry for "iterator" already says, "A container object (such as a list) produces a fresh new iterator each time you pass it to the iter() function or use it in a for loop." I expect that no one will misunderstand you if you say the word "reiterable" in the context of checking for an exhausted iterator or an iterable that returns a new iterator each time. A glossary entry just opens the opportunity for nitpicks and confusion between a natural interpretation of the word and the glossary definition. If not in the glossary, conversants will know to be careful to (re)define the term as appropriate to the conversation if necessary.

On Feb 15, 2016, at 21:59, Michael Selik <mike@selik.org> wrote:
The glossary entry for "iterator" already says, "A container object (such as a list) produces a fresh new iterator each time you pass it to the iter() function or use it in a for loop."
Well, "container" would be a perfectly good word if we didn't already use it to mean something different: an object with the "in" operator. You can have collections of things that can't efficiently test containment (a lazy linked list would take linear time and space for __contains__), and containment of things that aren't even iterables (e.g., a real interval).
I expect that no one will misunderstand you if you say the word "reiterable" in the context of checking for an exhausted iterator or an iterable that returns a new iterator each time.
The problem isn't so much whether I can use it--I do, in fact, use both "collection" and "reiterable", defining them when necessary (it often isn't necessary--but when it is, at least I only have to define them once). It's whether other people know there's a word they can use. When they don't, they struggle to explain what they mean, and frequently make mistakes, like Paul saying that a range is not an iterable while trying to explain someone else's confusion about iterables, or whoever wrote the docs on dict views saying that they're not iterators but sequences (which took years for someone to notice and fix).

On Feb 15, 2016, at 19:47, Rob Cliffe <rob.cliffe@btinternet.com> wrote:
Because "reiterable" is really not immediately obvious. It has no intuitive meaning; it's not even a real word. You have to learn what a reiterable is, by definition. That's no easier than learning a simple fact about collections, and you end up knowing less after that learning. Think of my random example. Is that a reiterable? Well, you can call iter on it multiple times and get different iterators. But they don't iterate the same values. So, yes or no? There's no intuition to guide you there; all you have is a definitional rule that you had to memorize. Similarly, you can't meaningfully attach adjectives to "reiterable" that restrict or expand its scope. "Lazy reiterable" doesn't mean anything; "lazy collection" does. But I've already made all these points and you just skipped over them. And more importantly, as I've already said, I'd be very happy for "reiterable" to be added to the glossary and maybe the ABCs--it's not as good as "collection", but it's a lot better than nothing--so I don't really want to argue against it.

On Tue, Feb 16, 2016 at 1:02 AM Andrew Barnert via Python-ideas < python-ideas@python.org> wrote:
Seems intuitive to me that reiterable is repeatedly iterable, or at least iterable more than once. I suppose the dictionary would say that the "re-" prefix means "again". The word "non-iterable" isn't in the glossary and people seem to get that it means not iterable. If something is autoiterable it'd iterate over itself (so an iterator is autoiterable?), if something is semiiterable, it'd be kinda-iterable or half-iterable. I'm not sure what that means, but I'll bet I'd understand it in context. Maybe when you iterate over it you don't actually get all the elements. The English language is very flexible and can match Python's flexibility. I think it's the desire to use an inflexible glossary word that led people into the confusions/misuse you've described.
It's whether other people know there's a word they can use.
I doubt a glossary entry would solve that problem. Instead I think premature glossarizing (a new word!) would create confusion as some people use a word with the glossary definition and some use it however the word makes sense in that particular conversation. Perhaps we could try something similar to the progression of a module from personal to PyPI to popularity to standard library. Jargon should first be popular in natural usage (without long debates) on the mailing lists before it gets added to the glossary. I know you really want to add something like this to the glossary, as you've brought it up before. I think a very convincing argument would be referencing a few conversations where you used a term, either word or phrase, to successfully clarify a conversation without needing to debate its definition.

On Feb 15, 2016, at 22:50, Michael Selik <mike@selik.org> wrote:
On Tue, Feb 16, 2016 at 1:02 AM Andrew Barnert via Python-ideas <python-ideas@python.org> wrote:
I doubt a glossary entry would solve that problem. Instead I think premature glossarizing (a new word!)
I don't think it's at all premature. This was first suggested somewhere before 2006, and came close to being approved in 2006, but fell apart over bikeshedding issues. A decade later--a decade of watching people make exactly the mistakes that we've seen in this thread--has shown that the problem still needs a solution.

On Mon, Feb 15, 2016 at 10:02:26PM -0800, Andrew Barnert via Python-ideas wrote:
What's a "real word"? Is "closure" a real word? How about "iterator"? "Closure" is in the Oxford dictionary, but it doesn't mention anything about functions and variables from surrounding scopes. "Iterator" isn't in the Oxford dictionary at all; neither is "iterable". As far as intuitive meanings go, "re-" is a standard English prefix meaning (among other things) "again", as in reborn, renew, retake, rewind, retry -- and "reiterate".
I'm not sure I understand how learning the definition of "reiterable" means I know *less* than when I started. Presumably that means I have forgotten something that I previously knew. I hope it wasn't something important.
Think of my random example.
I would, but I've forgotten what it is.
I'm honestly not sure what actual problem you think this word is going to solve. I can see that there is some difficulty with people confusing the concepts of iterables and iterators, and some edge cases where people mistake "lazy sequences" like (x)range for an iterator. I acknowledge those difficulties. But:

(1) I don't think those difficulties are worth the thousands of words written in this thread so far.

(2) I don't understand how redefining (there's that re- prefix again) "collection" or adding "reiterable" will help. I think the meanings of iterable and iterator are already clearly documented, and people still get them confused, so how will adding "reiterable" help?

(3) I don't think that the confusion between lazy sequences and iterators is especially harmful. 9 times out of 10, it's a difference that makes no difference. I like to be pedantic and tell people that (x)range is not an iterator, but if I'm honest to myself, that fact is rarely going to help them write better code. (It won't make them write worse code either.)

(4) You've already acknowledged that "collection" currently has a meaning in Python. It's normally used for lists, tuples and dicts, as well as the things in the `collections` module. I don't understand why "collection" is your preferred term for an iterable that can be iterated over multiple times (a *re-iterable* in plain English).

I think that this problem is simultaneously too big to solve, and too trivial to care about solving. As I see it, we have a great many related categories of things that can be iterated over. Some of them are represented by concrete or abstract classes in the standard library, some of them aren't. Many of them overlap:

* eager sequences, mappings and sets;
* lazy sequences, mappings or sets;
* things which support the `in` operator (collections);
* things that obey the iterator protocol ("iterators");
* things which obey the old sequence protocol;
* things which obey the iterator protocol, except for the part about "once they become empty and raise StopIteration, they should always raise StopIteration" -- these are officially called "broken iterators";
* things which are broken iterators with an official API for restarting them (perhaps called "reiterators"?);
* reiterators which can be restarted, but won't necessarily give the same results each time;
* iterators that provide a look-ahead or look-behind method;
* sequences that use the function-call API, like random.random();
* things that you can pass to iter();
* things which you can pass to iter() multiple times, and get iterators which yield the same values each time;
* things which you can pass to iter() multiple times, and get iterators which DON'T yield the same values each time;

and probably more. I don't think it is practical, or necessary, to try naming them all. We have a good set of names already:

- collection for things that support `in`;
- iterable for things that can be iterated over;
- iterator for a particular subset of iterables;
- sequence for things like lists;
- lazy sequence for things that are sequences where the items are calculated on demand rather than in advance.

I think adding to that set of names is a case of YAGNI. I do see some sense in having a name for iterators with a "restart" method, and the obvious name for that would be reiterator. But I don't think the glossary needs to have that entry until such a time as there actually is a restartable iterator in the std lib. I *don't* think there's any need for a name for "iterables apart from iterators". While it is occasionally useful to be able to talk about such things, that's not frequent enough that we need a name for it. We can just say "iterables apart from iterators", or we can give concrete examples: "sequences, mappings and sets".
I don't see why you think lazy reiterable doesn't mean anything. It is clearly a reiterable where the items are calculated on demand rather than in advance. What else could it mean?
But I've already made all these points and you just skipped over them.
It's a huge thread, with (no offense) a lot of rambling, pedantic arguments over the number of angels that can dance on the head of a pin (what sort of angels? seraphim, cherubim, ophanim, malakhim, dominions, powers, virtues, or something else? what size pin?). It's easy to get lost in the discussion, partly (I believe) because the problem being solved is so abstract. This seems to me to be a debate over definitions for their own sake, not in order to solve an actual problem. I could be wrong, of course. But if there is anything actually concrete and important being solved by this proposal, it is well and truly hidden in the thousands of words of abstract argument. -- Steve
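For concreteness, a sketch of the "reiterator" Steven mentions: an iterator with an official restart API. Nothing like this exists in the stdlib; the class and method names are invented:

class Reiterator:
    # A one-shot iterator over stored data, plus an explicit
    # way to rewind it.
    def __init__(self, iterable):
        self._data = list(iterable)
        self._pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._pos >= len(self._data):
            raise StopIteration
        value = self._data[self._pos]
        self._pos += 1
        return value

    def restart(self):
        self._pos = 0

it = Reiterator('abc')
print(list(it))  # ['a', 'b', 'c']
print(list(it))  # [] -- exhausted, like any iterator
it.restart()
print(list(it))  # ['a', 'b', 'c'] again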

Andrew Barnert via Python-ideas writes:
Are there really places where we *want* "foo(range(n))" to raise but "foo(iter(range(n)))" to do something useful? In other words, is this really a terminology problem, rather than a problem with the design of APIs that take iterators rather than iterables?

On Feb 15, 2016, at 19:42, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Sure. Any multi-pass algorithm should raise when given an iterator. (Maybe some could instead copy the iterator to a list or something, but as a general rule, silently and unexpectedly jumping from 0 to O(N) space depending on the input type isn't very friendly.) In the other direction, of course, the next function requires an iterator, and should and does raise when given something else. And the slightly higher-level functions discussed in this thread are the same. A function intended to consume and return the first few values in an iterator while leaving the iterator holding the rest is basically just a "multinext", and it would be very confusing if it took a collection and left it holding all the values, including the two you already used. And how do you explain that to a human? You could say "While it's an iterable, it's not an iterator." And that's exactly what Python has said for the last however-many years. And that's exactly what leads to the confusion Paul was talking about (and the additional confusion he inadvertently revealed). And the problem isn't "Paul is an idiot", because I've seen multiple core devs, and other highly experienced Python programmers, make the same mistake. Even the official docs made a similar mistake, and nobody caught it for 4 years. If the current terminology were sufficiently clear and intuitive that nothing needed to be done, these problems wouldn't exist.
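A sketch of the kind of multi-pass function Andrew has in mind, refusing one-shot iterators up front instead of silently computing garbage (the function itself is just an example):

def deviations_from_mean(data):
    # Two passes over data: the second pass would see nothing if
    # data were a half-consumed iterator, so reject iterators early.
    if iter(data) is data:
        raise TypeError("multi-pass algorithm: pass a collection, not an iterator")
    n = 0
    total = 0.0
    for x in data:  # first pass
        n += 1
        total += x
    mean = total / n
    return [x - mean for x in data]  # second pass

print(deviations_from_mean([1.0, 2.0, 3.0]))  # [-1.0, 0.0, 1.0]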

Andrew Barnert writes:
On Feb 15, 2016, at 19:42, Stephen J. Turnbull <stephen@xemacs.org> wrote:
That's not my problem. I'm not suggesting that the coercion between iterators and non-iterators should go both directions. If a function that wants a Sequence is handed an iterator it should feel free to raise.
In the other direction, of course, the next function requires an iterator, and should and does raise when given something else.
Sure. But next() *can't* work if the object passed doesn't maintain internal state. I don't think it will bother anybody if next raises on a non-iterator, at least no more than it bothers people when something raises on an attempt to assign to a tuple element. My question is "why are they iter[ator]-tools, not iter[able]-tools?" Mostly because sequences don't need those tools very often, I guess.
And the slightly higher-level functions discussed in this thread are the same.
Their implementations are, yes (except for the lhs ellipsis syntax, which doesn't exist yet).
Greg Ewing evidently thinks that's the natural thing, so I accept that it's not un-natural.<wink/> And you can easily change that behavior *without* perverse effects on space or time with iter(), at the expense of choosing a name, and an assignment statement. But there's an alternative interpretation of what we're looking at here with ellipsis, at least, and that's "multipop", in which case you do expect it to consume a Sequence, and all iterables would behave the same in that respect. Again, you can change that behavior with an appropriate call to iter() (and this time you don't even need to choose a name). I agree that we can't get to win-win-win here; after all, both iterators and lists are optimizations of iterable, and that always costs you something. But I wonder if it wouldn't be possible to take the attitude that we should do our best to optimize either list() or iter() out of Python programs. Since the costs of willy-nilly applying list() to arbitrary iterables are obvious, iter() is the one we should try to make implicit where possible. We can't get rid of it entirely (as with multinext vs. multipop, where either choice leaves us wanting to call iter() for some use cases). But maybe we could reduce the frequency of cases where it's all too easy to forget to call it.
If the current terminology were sufficiently clear and intuitive that nothing needed to be done, these problems wouldn't exist.
That takes the current implementation as given, which is exactly what I'm questioning.
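A sketch contrasting the two readings Stephen names: "multinext" consumes through an iterator and hands back the rest, while "multipop" destructively removes from the sequence itself. Both helper names are his coinages, not stdlib functions:

from itertools import islice

def multinext(iterable, n):
    # Take n values via an iterator; the caller also gets the
    # iterator, which still holds the remaining values.
    it = iter(iterable)
    return tuple(islice(it, n)), it

def multipop(seq, n):
    # Remove and return the first n items of a mutable sequence.
    head = tuple(seq[:n])
    del seq[:n]
    return head

data = [1, 2, 3, 4, 5]
(a, b), rest = multinext(data, 2)
print(a, b, list(rest))  # 1 2 [3, 4, 5]; data itself is untouched

data = [1, 2, 3, 4, 5]
print(multipop(data, 2))  # (1, 2)
print(data)               # [3, 4, 5] -- the sequence was consumed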

On 16.02.2016 06:56, Andrew Barnert via Python-ideas wrote:
I think both are true. The current API, being one of the best available out there, exposes some warts (as you described) which in turn lead to a plethora of abstract wording expressing almost the same: lists, iterables, iterators, sets, generators, views, range, etc. These data structures evolved over time and proved useful. In my point of view, what's missing is a structure to put them into. If one looks closely, one can see how they differ in usage:

1) Is it iterable?
   - yes to all (as far as I know)

2) Is it subscriptable?
   - yes: list, range
   - no: iterator, generator, set, view

3) Has it a length?
   - yes: list, range, set, view
   - no: iterator, generator

4) Has it a contains test?
   - yes: list, range, set, view
   - no: iterator, generator

5) Are items materialized?
   - yes: list, set
   - no: iterator, generator, range, view

6) Can it have mutable underlying data?
   - yes: generator, iterator, view
   - no: list, set, range

7) Does iteration change it?
   - yes: iterator, generator
   - no: list, set, range, view

As expected, they almost always have nothing in common except 1). I am perfectly fine with calling these things collections; if I have a container of n things, I almost certainly can go through them one by one. 2) to 4) are usually the most common operations and are safe when implemented. range is like list; view is like set. 5) to 7) are usually things you worry about when you are concerned with performance. As usual, those things are hard to do right.

My suggestions: Let's call all of them *collections*. Let's call collections *list-like* if they support 2) to 4) and *set-like* if they support 4). Let's call collections *lazy* when they don't fit 5). Let's call collections *view-like* when they feature 6). Let's call collections *iteration-stable* when they don't fit 7).

As you can see, for me it's not a matter of inventing new nouns, but a matter of finding the right adjective to further specify the type of collection. Does it help? I don't know, but it helped me to structure the concepts. Best, Sven
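A rough way to reproduce part of Sven's table empirically, probing criteria 1) to 4) with hasattr (this only checks for the methods' presence, not their semantics -- note that the "in" operator itself works on any iterable by falling back to iteration):

from collections.abc import Iterable

candidates = {
    'list': [1, 2, 3],
    'range': range(3),
    'set': {1, 2, 3},
    'iterator': iter([1, 2, 3]),
    'generator': (x for x in range(3)),
}

for name, obj in candidates.items():
    print(name,
          'iterable' if isinstance(obj, Iterable) else '-',         # 1)
          'subscriptable' if hasattr(obj, '__getitem__') else '-',  # 2)
          'sized' if hasattr(obj, '__len__') else '-',              # 3)
          'contains' if hasattr(obj, '__contains__') else '-')      # 4)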
participants (22)

- Andrew Barnert
- Brendan Barnwell
- Chris Angelico
- Edward Minnix
- Erik
- Ethan Furman
- Georg Brandl
- Greg Ewing
- Guido van Rossum
- Ian Kelly
- Jared Grubb
- Jonathan Goble
- Michael Selik
- Mike Müller
- Nick Coghlan
- Oscar Benjamin
- Paul Moore
- Rob Cliffe
- Stephen J. Turnbull
- Steven D'Aprano
- Sven R. Kunze
- Émanuel Barry