Unpacking in tuple/list/set/dict comprehensions
Extended unpacking notation (* and **) from PEP 448 gives us great ways to concatenate a few iterables or dicts:

```
(*it1, *it2, *it3)          # tuple with the concatenation of three iterables
[*it1, *it2, *it3]          # list with the concatenation of three iterables
{*it1, *it2, *it3}          # set with the union of three iterables
{**dict1, **dict2, **dict3} # dict with the combination of three dicts
                            # roughly equivalent to dict1 | dict2 | dict3
                            # thanks to PEP 584
```

I propose (not for the first time) that similarly concatenating an unknown number of iterables or dicts should be possible via comprehensions:

```
(*it for it in its)  # tuple with the concatenation of iterables in 'its'
[*it for it in its]  # list with the concatenation of iterables in 'its'
{*it for it in its}  # set with the union of iterables in 'its'
{**d for d in dicts} # dict with the combination of dicts in 'dicts'
```

The above is my attempt to argue that the proposed notation is natural: `[*it for it in its]` is exactly analogous to `[*its[0], *its[1], ..., *its[len(its)-1]]`.

There are other ways to do this, of course:

```
[x for it in its for x in it]
itertools.chain(*its)
sum((it for it in its), [])
functools.reduce(operator.concat, its, [])
```

But none are as concise and (to me, and hopefully others who understand * notation) as intuitive. For example, I recently wanted to write a recursion like so, which accumulated a set of results from within a tree structure:

```
def recurse(node):
    # base case omitted
    return {*recurse(child) for child in node.children}
```

In fact, I am teaching a class and just asked a question on a written exam for which several students wrote this exact code in their solution (which inspired writing this message). So I do think it's quite intuitive, even to those relatively new to Python.

Now, on to previous proposals. I found this thread from 2016 (!); please let me know if there are others.

https://mail.python.org/archives/list/python-ideas@python.org/thread/SBM3LYE...
There are several arguments for and against this feature in that thread. I'll try to summarize:

Arguments for:

* Natural extension to PEP 448 (it's mentioned as a variant within PEP 448)
* Easy to implement: all that's needed in CPython is to *remove* some code blocking this.

Arguments against:

* Counterintuitive (to some)
* Hard to teach
* `[...x... for x in y]` is no longer morally equivalent to `answer = []; for x in y: answer.append(...x...)` (unless `list1.append(a, b)` were equivalent to `list1.extend([a, b])`)

Above I've tried to counter the first two "against" arguments. Some counters to the third "against" argument:

1. `[*...x... for x in y]` is equivalent to `answer = []; for x in y: answer.extend(...x...)` (about as easy to teach, I'd say)

2. Maybe `list1.append(a, b)` should be equivalent to `list1.extend([a, b])`? It is in JavaScript (`Array.push`). And I don't see why one would expect it to append a tuple `(a, b)`; that's what `list1.append((a, b))` is for. I think the main argument against this is to avoid programming errors, which is fine, but I don't see why it should hold back an advanced feature involving both unpacking and comprehensions.

Erik

--
Erik Demaine | edemaine@mit.edu | http://erikdemaine.org/
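For concreteness, the dict case of the proposal can already be emulated today; a small sketch (the `dicts` value is illustrative):

```python
dicts = [{"a": 1, "b": 2}, {"b": 3, "c": 4}]

# Existing spellings of the proposed {**d for d in dicts}:
merged_loop = {}
for d in dicts:
    merged_loop.update(d)  # later dicts win, as with {**d1, **d2}

merged_comp = {k: v for d in dicts for k, v in d.items()}

print(merged_comp)  # {'a': 1, 'b': 3, 'c': 4}
```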
Seems sensible to me. I'd write the equivalency as

```
for x in y: answer.extend([…x…])
```

On Sat, Oct 16, 2021 at 07:11 Erik Demaine <edemaine@mit.edu> wrote:
Extended unpacking notation (* and **) from PEP 448 gives us great ways to concatenate a few iterables or dicts:
```
(*it1, *it2, *it3)          # tuple with the concatenation of three iterables
[*it1, *it2, *it3]          # list with the concatenation of three iterables
{*it1, *it2, *it3}          # set with the union of three iterables
{**dict1, **dict2, **dict3} # dict with the combination of three dicts
                            # roughly equivalent to dict1 | dict2 | dict3
                            # thanks to PEP 584
```
I propose (not for the first time) that similarly concatenating an unknown number of iterables or dicts should be possible via comprehensions:
```
(*it for it in its)  # tuple with the concatenation of iterables in 'its'
[*it for it in its]  # list with the concatenation of iterables in 'its'
{*it for it in its}  # set with the union of iterables in 'its'
{**d for d in dicts} # dict with the combination of dicts in 'dicts'
```
The above is my attempt to argue that the proposed notation is natural: `[*it for it in its]` is exactly analogous to `[*its[0], *its[1], ..., *its[len(its)-1]]`.
There are other ways to do this, of course:
```
[x for it in its for x in it]
itertools.chain(*its)
sum((it for it in its), [])
functools.reduce(operator.concat, its, [])
```
But none are as concise and (to me, and hopefully others who understand * notation) as intuitive. For example, I recently wanted to write a recursion like so, which accumulated a set of results from within a tree structure:
```
def recurse(node):
    # base case omitted
    return {*recurse(child) for child in node.children}
```
In fact, I am teaching a class and just asked a question on a written exam for which several students wrote this exact code in their solution (which inspired writing this message). So I do think it's quite intuitive, even to those relatively new to Python.
Now, on to previous proposals. I found this thread from 2016 (!); please let me know if there are others.
https://mail.python.org/archives/list/python-ideas@python.org/thread/SBM3LYE...
There are several arguments for and against this feature in that thread. I'll try to summarize:
Arguments for:
* Natural extension to PEP 448 (it's mentioned as a variant within PEP 448)
* Easy to implement: all that's needed in CPython is to *remove* some code blocking this.
Arguments against:
* Counterintuitive (to some)
* Hard to teach
* `[...x... for x in y]` is no longer morally equivalent to `answer = []; for x in y: answer.append(...x...)` (unless `list1.append(a, b)` were equivalent to `list1.extend([a, b])`)
Above I've tried to counter the first two "against" arguments. Some counters to the third "against" argument:
1. `[*...x... for x in y]` is equivalent to `answer = []; for x in y: answer.extend(...x...)` (about as easy to teach, I'd say)
2. Maybe `list1.append(a, b)` should be equivalent to `list1.extend([a, b])`? It is in JavaScript (`Array.push`). And I don't see why one would expect it to append a tuple `(a, b)`; that's what `list1.append((a, b))` is for. I think the main argument against this is to avoid programming errors, which is fine, but I don't see why it should hold back an advanced feature involving both unpacking and comprehensions.
Erik

--
Erik Demaine | edemaine@mit.edu | http://erikdemaine.org/

_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
To unsubscribe send an email to python-ideas-leave@python.org
https://mail.python.org/mailman3/lists/python-ideas.python.org/
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/7G732V...
Code of Conduct: http://python.org/psf/codeofconduct/
-- --Guido (mobile)
+1. I think this is a very sensible proposal, and I have encountered the use cases you mentioned several times.
On Sat, Oct 16, 2021, 10:10 AM Erik Demaine
```
(*it1, *it2, *it3)          # tuple with the concatenation of three iterables
[*it1, *it2, *it3]          # list with the concatenation of three iterables
{*it1, *it2, *it3}          # set with the union of three iterables
{**dict1, **dict2, **dict3} # dict with the combination of three dicts
```
I'm +0 on the last three of these. But the first one is much more suggestive of a generator comprehension. I would want/expect it to be equivalent to itertools.chain(), not create a tuple. Moreover, it is an anti-pattern to create large and indefinite sized tuples, whereas such large collections as lists, sets, and dicts are common and useful.
On Sat, 16 Oct 2021, David Mertz, Ph.D. wrote:
On Sat, Oct 16, 2021, 10:10 AM Erik Demaine

```
(*it1, *it2, *it3)          # tuple with the concatenation of three iterables
[*it1, *it2, *it3]          # list with the concatenation of three iterables
{*it1, *it2, *it3}          # set with the union of three iterables
{**dict1, **dict2, **dict3} # dict with the combination of three dicts
```
I'm +0 on the last three of these.
But the first one is much more suggestive of a generator comprehension. I would want/expect it to be equivalent to itertools.chain(), not create a tuple.
I guess you were referring to `(*it for it in its)` (proposed notation) rather than `(*it1, *it2, *it3)` (which already exists and builds a tuple).

Very good point! This is confusing. I could also read `(*it for it in its)` as wanting to build the following generator (or something like it):

```
def generate():
    for it in its:
        yield from it
```

I guess the question is whether to define `(*it for it in its)` to mean a tuple comprehension, a generator comprehension, or nothing at all. Tuples are nice because they mirror `(*it1, *it2, *it3)`, but bad for the reasons you raise:
Moreover, it is an anti-pattern to create large and indefinite sized tuples, whereas such large collections as lists, sets, and dicts are common and useful.
I'd be inclined to not define `(*it for it in its)`, given the ambiguity.

Assuming the support remains relatively unanimous for [*...], {*...}, and {**...} (thanks for all the quick replies!), I'll put together a PEP.

On Sat, 16 Oct 2021, Guido van Rossum wrote:
Seems sensible to me. I’d write the equivalency as
for x in y: answer.extend([…x…])
Oh, nice! That indeed works in all cases.

Erik

--
Erik Demaine | edemaine@mit.edu | http://erikdemaine.org/
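The `extend`-based desugaring can be checked against the hand-expanded PEP 448 form; a small sketch (the `its` value is illustrative):

```python
its = [["a", "b"], ["c"], ["d", "e"]]

# Hand-expanded PEP 448 unpacking (works today):
expanded = [*its[0], *its[1], *its[2]]

# The extend-based desugaring of the proposed [*it for it in its]:
answer = []
for it in its:
    answer.extend([*it])  # extend([…x…]) handles starred and plain elements alike

print(answer)  # ['a', 'b', 'c', 'd', 'e']
```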
On Sat, Oct 16, 2021, 11:42 AM Erik Demaine
But the first one is much more suggestive of a generator comprehension. I would want/expect it to be equivalent to itertools.chain(), not create a tuple.
I guess you were referring to `(*it for it in its)` (proposed notation) rather than `(*it1, *it2, *it3)` (which already exists and builds a tuple).
Very good point! This is confusing. I could also read `(*it for it in its)` as wanting to build the following generator (or something like it):
Oops. Yes. I trimmed the wrong part from my phone. What you wrote!
On Sat, Oct 16, 2021 at 11:42:49AM -0400, Erik Demaine wrote:
I guess the question is whether to define `(*it for it in its)` to mean tuple or generator comprehension or nothing at all.
I don't see why that is even a question. We don't have tuple comprehensions, and `(expr for x in items)` is always a generator, never a tuple. There's no ambiguity there. Why would allowing unpacking turn it into a tuple? It would be extremely surprising for `(*expr for x in items)` to return a tuple instead of a generator, or for it to be forbidden when unpacking versions of list/set/dict comprehensions are allowed. Remember that it is *commas*, not the brackets, that make a tuple (the empty tuple excepted). The brackets around `(expr for x in items)` don't make it a tuple any more than the brackets in `f(arg)` make a tuple. [...]
I'd be inclined to not define `(*it for it in its)`, given the ambiguity.
The only tricky corner case is that generator comprehensions can forgo the surrounding brackets in the case of a function call:

```
func( (expr for x in items) )
func( expr for x in items )  # we can leave out the brackets
```

But with the unpacking operator, it is unclear whether the unpacking star applies to the entire generator or the inner expression:

```
func(*expr for x in items)
```

That could be read as either:

```
it = (expr for x in items)
func(*it)
```

or this:

```
it = (*expr for x in items)
func(it)
```

Of course we can disambiguate it with precedence rules, in the same way that there is not really any ambiguity in `-3**2` (for example). Is it minus (3 squared) or (minus 3) squared? The precedence rules are clear that it absolutely is the first, but people do tend to read it wrong.

In this case, even if the precedence rules make the "unpacking generator comprehension inside a function call" case unambiguous, it *might* be clearer to require the extra brackets:

```
func(*(expr for x in items))
func((*expr for x in items))
```

are then clearly different. (The non-unpacking case of course can remain as it is today.)

But it would be quite surprising for this minor issue to lead to the major inconsistency of prohibiting unpacking inside generator comps when it is allowed in list, dict and set comps.

-- Steve
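Of the two bracketed forms, only the first is valid today; a sketch exercising it together with the precedence analogy (`func` and `items` are illustrative):

```python
items = [(1, 2), (3, 4)]

def func(*args):
    return args

# Star applied to the whole generator -- valid today:
result = func(*(pair for pair in items))  # each yielded tuple becomes one argument
print(result)  # ((1, 2), (3, 4))

# The precedence analogy: -3**2 parses as -(3**2), not (-3)**2.
print(-3**2)  # -9
```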
On Sun, 17 Oct 2021, Steven D'Aprano wrote:
On Sat, Oct 16, 2021 at 11:42:49AM -0400, Erik Demaine wrote:
I guess the question is whether to define `(*it for it in its)` to mean tuple or generator comprehension or nothing at all.
I don't see why that is even a question. We don't have tuple comprehensions and `(expr for x in items)` is always a generator, never a tuple. There's no ambiguity there. Why would allowing unpacking turn it into a tuple?
Agreed. I got confused by the symmetry.
The only tricky corner case is that generator comprehensions can forgo the surrounding brackets in the case of a function call:
```
func( (expr for x in items) )
func( expr for x in items )  # we can leave out the brackets
```
But with the unpacking operator, it is unclear whether the unpacking star applies to the entire generator or the inner expression:
func(*expr for x in items)
That could be read as either:
```
it = (expr for x in items)
func(*it)
```
or this:
```
it = (*expr for x in items)
func(it)
```
Of course we can disambiguate it with precedence rules, [...]
I'd be inclined to go that way, as the latter seems like the only reasonable (to me) parse for that syntax. Indeed, that's how the current parser interprets this:

```
func(*expr for x in items)
     ^
SyntaxError: iterable unpacking cannot be used in comprehension
```

To get the former meaning, which is possible today, you already need parentheses, as in
func(*(expr for x in items))
But it would be quite surprising for this minor issue to lead to the major inconsistency of prohibiting unpacking inside generator comps when it is allowed in list, dict and set comps.
Good point. Now I'm much more inclined to define the generator expression `(*expr for x in items)`. Thanks for your input!

On Sat, 16 Oct 2021, Serhiy Storchaka wrote:
It was considered and rejected in PEP 448. What has changed since? What new facts or arguments have emerged?
I need to read the original discussion more (e.g. https://mail.python.org/pipermail/python-dev/2015-February/138564.html), but you can see the summary of why it was removed here: https://www.python.org/dev/peps/pep-0448/#variations

In particular, there was "limited support" before (and the generator ambiguity issue discussed above). I expect now that we've gotten to enjoy PEP 448 for 5 years, it's more "obvious" that this functionality is missing and useful. So far that seems true (all responses have been at least +0), but if anyone disagrees, please say so.

Erik

--
Erik Demaine | edemaine@mit.edu | http://erikdemaine.org/
On 2021-10-16 20:31, Erik Demaine wrote:
On Sun, 17 Oct 2021, Steven D'Aprano wrote:
On Sat, Oct 16, 2021 at 11:42:49AM -0400, Erik Demaine wrote:
I guess the question is whether to define `(*it for it in its)` to mean tuple or generator comprehension or nothing at all.
I don't see why that is even a question. We don't have tuple comprehensions and `(expr for x in items)` is always a generator, never a tuple. There's no ambiguity there. Why would allowing unpacking turn it into a tuple?
Agreed. I got confused by the symmetry.
The only tricky corner case is that generator comprehensions can forgo the surrounding brackets in the case of a function call:
```
func( (expr for x in items) )
func( expr for x in items )  # we can leave out the brackets
```
But with the unpacking operator, it is unclear whether the unpacking star applies to the entire generator or the inner expression:
func(*expr for x in items)
That could be read as either:
```
it = (expr for x in items)
func(*it)
```
or this:
```
it = (*expr for x in items)
func(it)
```
Of course we can disambiguate it with precedence rules, [...]
I'd be inclined to go that way, as the latter seems like the only reasonable (to me) parse for that syntax. Indeed, that's how the current parser interprets this:
```
func(*expr for x in items)
     ^
SyntaxError: iterable unpacking cannot be used in comprehension
```
To get the former meaning, which is possible today, you already need parentheses, as in
func(*(expr for x in items))
But it would be quite surprising for this minor issue to lead to the major inconsistency of prohibiting unpacking inside generator comps when it is allowed in list, dict and set comps.
Good point. Now I'm much more inclined to define the generator expression `(*expr for x in items)`. Thanks for your input!
As we can already have:

```
func(*a, *b)
```

it's clear that * has a high precedence, so having:

```
func(*expr for x in items)
```

be equivalent to:

```
func((*expr for x in items))
```

does make sense. It would be passing a generator.

On the other hand, wouldn't that make:

```
[*s for s in ["abc", "def", "ghi"]]
```

be equivalent to:

```
[(*s for s in ["abc", "def", "ghi"])]
```

? It would be making a list that contains a generator.

If:

```
[*s for s in ["abc", "def", "ghi"]]
```

should unpack, then shouldn't:

```
func(*expr for x in items)
```

also unpack?
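The two readings can be contrasted with syntax that already works; a sketch:

```python
from itertools import chain

strings = ["abc", "def", "ghi"]

# Reading 1 -- unpack into the list (the intent of [*s for s in strings]):
flattened = list(chain.from_iterable(strings))
print(flattened)  # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']

# Reading 2 -- a list containing a single generator object:
wrapped = [(c for s in strings for c in s)]
print(len(wrapped))  # 1: one generator, not nine characters
```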
On Sat, 16 Oct 2021, Erik Demaine wrote:
Assuming the support remains relatively unanimous for [*...], {*...}, and {**...} (thanks for all the quick replies!), I'll put together a PEP.
As promised, I put together a pre-PEP (together with my friend and co-teacher Adam Hartz, not currently subscribed, but I'll keep him apprised):

https://github.com/edemaine/peps/blob/unpacking-comprehensions/pep-9999.rst

For this to become an actual PEP, it needs a sponsor. If a core developer would be willing to sponsor this, please let me know. (This is my first PEP, so if I'm going about this the wrong way, also let me know.)

Meanwhile, I'd welcome any comments! In writing things up, I became convinced that generators should be supported, but arguments should not be supported; see the document for details why.

Erik

--
Erik Demaine | edemaine@mit.edu | http://erikdemaine.org/
I like this. I think explicitly discussing order of inclusion would be worthwhile. I know it's implied by the approximate equivalents, but actually stating it would improve the PEP, IMO. For example:

```
nums = [(1, 2, 3), (1.0, 2.0, 3.0)]
nset = {*n for n in nums}
```

Does 'nset' wind up containing integers or floats? Is this a language guarantee?

On Mon, Oct 25, 2021, 9:52 PM Erik Demaine <edemaine@mit.edu> wrote:
On Sat, 16 Oct 2021, Erik Demaine wrote:
Assuming the support remains relatively unanimous for [*...], {*...}, and {**...} (thanks for all the quick replies!), I'll put together a PEP.
As promised, I put together a pre-PEP (together with my friend and coteacher Adam Hartz, not currently subscribed, but I'll keep him aprised):
https://github.com/edemaine/peps/blob/unpacking-comprehensions/pep-9999.rst
For this to become an actual PEP, it needs a sponsor. If a core developer would be willing to be the sponsor for this, please let me know. (This is my first PEP, so if I'm going about this the wrong way, also let me know.)
Meanwhile, I'd welcome any comments! In writing things up, I became convinced that generators should be supported, but arguments should not be supported; see the document for details why.
Erik

--
Erik Demaine | edemaine@mit.edu | http://erikdemaine.org/
On Tue, Oct 26, 2021 at 1:10 PM David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
I like this. I think explicitly discussing order of inclusion would be worthwhile. I know it's implied by the approximate equivalents, but actually stating it would improve the PEP, IMO.
For example:
```
nums = [(1, 2, 3), (1.0, 2.0, 3.0)]
nset = {*n for n in nums}
```
Does 'nset' wind up containing integers or floats? Is this a language guarantee?
Easy way to find out: take out the extra nesting level and try it.
```
>>> nums = [1, 2, 3, 1.0, 2.0, 3.0]
>>> nset = {n for n in nums}
>>> nset
{1, 2, 3}
```
The *n version would have the exact same behaviour, since it will see the elements in the exact same order. ChrisA
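This first-seen-wins behaviour is easy to confirm with today's equivalents of the proposed `{*n for n in nums}`; a sketch:

```python
from itertools import chain

nums = [(1, 2, 3), (1.0, 2.0, 3.0)]

# Today's spellings of the proposed {*n for n in nums}:
nset = {x for n in nums for x in n}
assert nset == set(chain.from_iterable(nums))

# 1 == 1.0, so the first-inserted representative survives: all ints here.
print(sorted(nset))  # [1, 2, 3]
```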
I do get what it does, but the phrase in the PEP feels like there is wiggle room: "The new notation listed above is effectively short-hand for the following existing notation."

"Effectively" doesn't quite feel the same as "guaranteed exactly equivalent."

On Mon, Oct 25, 2021, 10:22 PM Chris Angelico <rosuav@gmail.com> wrote:
On Tue, Oct 26, 2021 at 1:10 PM David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
I like this. I think explicitly discussing order of inclusion would be
worthwhile. I know it's implied by the approximate equivalents, but actually stating it would improve the PEP, IMO.
For example:
```
nums = [(1, 2, 3), (1.0, 2.0, 3.0)]
nset = {*n for n in nums}
```
Does 'nset' wind up containing integers or floats? Is this a language
guarantee?
Easy way to find out: take out the extra nesting level and try it.
```
>>> nums = [1, 2, 3, 1.0, 2.0, 3.0]
>>> nset = {n for n in nums}
>>> nset
{1, 2, 3}
```
The *n version would have the exact same behaviour, since it will see the elements in the exact same order.
ChrisA
For the backwards compatibility section, it would be good to analyze how the change impacts error reporting:

(1) Is the suggested syntax currently a 'common error' that will become harder to detect once it's not a syntax error?

(2) Would adding this syntax impact the parser's ability to provide friendly error messages? (Pablo did a lot of work on error messages for 3.10, so check a current Python version.)

I'm not saying I'm seeing an issue - just that these points need to be thought through and perhaps mentioned in the PEP.

On Tuesday, October 26, 2021, 02:55:09 AM GMT+1, Erik Demaine <edemaine@mit.edu> wrote:

On Sat, 16 Oct 2021, Erik Demaine wrote:
Assuming the support remains relatively unanimous for [*...], {*...}, and {**...} (thanks for all the quick replies!), I'll put together a PEP.
As promised, I put together a pre-PEP (together with my friend and co-teacher Adam Hartz, not currently subscribed, but I'll keep him apprised):

https://github.com/edemaine/peps/blob/unpacking-comprehensions/pep-9999.rst

For this to become an actual PEP, it needs a sponsor. If a core developer would be willing to sponsor this, please let me know. (This is my first PEP, so if I'm going about this the wrong way, also let me know.)

Meanwhile, I'd welcome any comments! In writing things up, I became convinced that generators should be supported, but arguments should not be supported; see the document for details why.

Erik

--
Erik Demaine | edemaine@mit.edu | http://erikdemaine.org/
On Sat, Oct 16, 2021 at 10:56:07AM -0400, David Mertz, Ph.D. wrote:
On Sat, Oct 16, 2021, 10:10 AM Erik Demaine
```
(*it1, *it2, *it3)          # tuple with the concatenation of three iterables
[*it1, *it2, *it3]          # list with the concatenation of three iterables
{*it1, *it2, *it3}          # set with the union of three iterables
{**dict1, **dict2, **dict3} # dict with the combination of three dicts
```
I'm +0 on the last three of these.
But the first one is much more suggestive of a generator comprehension. I would want/expect it to be equivalent to itertools.chain(), not create a tuple.
Too late.

```
>>> (*"abc", *"def", *"ghi")
('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i')
```

I think you may have missed that those examples of unpacking are existing functionality, not the proposal. The proposal is to allow unpacking in *comprehensions*:

```
# not currently permitted
[*s for s in ["abc", "def", "ghi"]]
```
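For the string example, the currently-permitted comprehension gives the same elements as the hand-expanded unpacking; a sketch:

```python
strings = ["abc", "def", "ghi"]

# Hand-expanded PEP 448 unpacking (existing functionality):
as_tuple = (*strings[0], *strings[1], *strings[2])

# Currently-permitted comprehension form of the proposed [*s for s in strings]:
as_list = [c for s in strings for c in s]
print(as_list)  # ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
```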
Moreover, it is an anti-pattern to create large and indefinite sized tuples,
Is it? In what way?

As far as I understand it, a large tuple is more memory efficient than a large list (it has no over-allocated space). The only issue that I know of is that if the length of the tuple is not known ahead of time, the interpreter may have to grow, or shrink, the underlying array before completing the tuple construction.

-- Steve
On Sat, Oct 16, 2021, 12:08 PM Steven D'Aprano
unpacking in *comprehensions*.
```
# not currently permitted
[*s for s in ["abc", "def", "ghi"]]
```
Moreover, it is an anti-pattern to create large and indefinite sized tuples,
Is it? In what way?
As mentioned, I clipped the wrong part. That said, even `(*it1, *it2, *it3)` feels like an anti-pattern in most cases, although syntactically defined. A hypothetical "tuple comprehension" seems that much worse.

I'm not making any claims about tuple creation speed vs. list creation on microbenchmarks. It might well be 10% faster to create a million-item tuple than a million-item list. Or maybe the opposite, I don't know.

Rather, I'm concerned with readability and programmer expectations. Tuples are best used as "records" of heterogeneous but structured data. This is what makes namedtuples such an elegant extension. When I see a collection of tuples, I generally expect each to have the same "shape", such as a string at index 0, a float at index 1, and an int at index 2. If those positions have attribute names, so much the better.

In contrast, lists (and iterators) I expect to contain many things that are "the same" in a duck-type way. I usually want to loop through them and perform the same operation on each (maybe with some switches, but in the same block). Having a million such similar items is commonplace. Having a million *fields* is non-existent.
On Sat, Oct 16, 2021 at 12:21 PM David Mertz, Ph.D. <david.mertz@gmail.com> wrote:
I'm not making any claims about tuple creation speed vs. list creation on microbenchmarks. It might well be 10% faster to create a million-item tuple than a million-item list. Or maybe the opposite, I don't know.
The thing to know about this (for everyone) is that creating a new tuple of known size requires a single allocation, while creating a new list always requires two allocations. Where this really makes a difference is not when you have a million items but when you create thousands or millions of objects of modest size -- for lists it takes twice as many allocations.

If the number of items is not known, there are different strategies depending on what API you use; interestingly, the strategy deployed by PySequence_Tuple() has a comment claiming it "can grow a bit faster than for lists because unlike lists the over-allocation isn't permanent."

Finally, the bytecode generated for (*a, *b) creates a list first and then turns that into a tuple (which will be allocated with the right size since it's known at that point).

--
--Guido van Rossum (python.org/~guido)
Pronouns: he/him (why is my pronoun here?)
<http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-change-the-world/>
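The list-then-tuple compilation is visible in the bytecode; a sketch (exact opcode names vary by CPython version; the ones noted in the comment are from 3.9-3.11):

```python
import dis

# On recent CPython (3.9+), (*a, *b) compiles to: build an empty list,
# LIST_EXTEND it with a and then b, and finally convert the list to a
# tuple of now-known size (LIST_TO_TUPLE on 3.9-3.11).
ops = [instr.opname for instr in dis.Bytecode(compile("(*a, *b)", "<example>", "eval"))]
print(ops)
```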
Even though I am just a regular user of Python, +10 from me. It makes sense and I think I can easily teach it to people.

However, (*expr for expr in its) should be a generator expression, and should be allowed, to be a nice mirror to all the unpacking comprehensions. It would be equivalent to:

```
def gen(its):
    for expr in its:
        for x in expr:
            yield x
```

Abdulla

Sent from my iPhone
On 17 Oct 2021, at 12:28 AM, Guido van Rossum <guido@python.org> wrote:
On Sat, Oct 16, 2021 at 12:21 PM David Mertz, Ph.D. <david.mertz@gmail.com> wrote:

I'm not making any claims about tuple creation speed vs. list creation on microbenchmarks. It might well be 10% faster to create a million-item tuple than a million-item list. Or maybe the opposite, I don't know.
The thing to know about this (for everyone) is that creating a new tuple of known size requires a single allocation while creating a new list always requires two allocations. Where this really makes a difference is not when you have a million items but when you create thousands or millions of objects of modest size -- for lists it takes twice as many allocations.
If the number of items is not known there are different strategies depending on what API you use; interestingly, the strategy deployed by PySequence_Tuple() has a comment claiming it "can grow a bit faster than for lists because unlike lists the over-allocation isn't permanent."
Finally, the bytecode generated for (*a, *b) creates a list first and then turns that into a tuple (which will be allocated with the right size since it's known at that point).
--
--Guido van Rossum (python.org/~guido)
Pronouns: he/him (why is my pronoun here?)
We don't have tuple comprehensions, and this proposal isn't to add them, so this post is edging into off-topic for the thread so feel free to skip it. On Sat, Oct 16, 2021 at 03:18:20PM -0400, David Mertz, Ph.D. wrote:
Rather, I'm concerned with readability and programmer expectations. Tuples are best used as "records" of heterogeneous but structured data. This is what makes namedtuples such an elegant extension. When I see a collection of tuples, I generally expect each to have the same "shape", such as a string at index 0, a float at index 1, and an int at index 2. If those positions have attribute names, so much the better.
In contrast, lists (and iterators) I expect to contain many things that are "the same" in a duck-type way. I usually want to loop through them and perform the same operation on each (maybe with some switches, but in the same block).
Right-o, the old "heterogeneous tuples versus homogeneous lists" distinction, I remember that from the old 1.5 days. I haven't heard it mentioned for a long time!

That certainly remains a reasonable guideline to make, albeit less so today:

- for structured data, today we have other options such as named tuples (rather than anonymous builtin tuples), SimpleNamespace and dataclasses, which may be considered closer matches to "record" or "struct" types from other languages;

- lists and tuples now share the same Sequence interface;

- which now justifies treating tuples as "frozen lists", or lists as "mutable tuples".

But I note that this distinction doesn't make unpacking less useful. Here's an example. Suppose I have two data types represented by coordinates, as tuples:

* a point is a 2-tuple (x, y)
* a rectangle is a 4-tuple (left, top, right, bottom).

Then I can construct a rectangle from the coordinates of the top-left and bottom-right points:

```
rect = (*topleft, *bottomright)
```
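That construction runs as-is; a quick sketch with concrete (illustrative) coordinates:

```python
# A point is a 2-tuple (x, y); a rectangle is (left, top, right, bottom).
topleft = (10, 20)
bottomright = (110, 220)

rect = (*topleft, *bottomright)
print(rect)  # (10, 20, 110, 220)
```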
Having a million such similar items is commonplace. Having a million *fields* is non-existent.
I will grant you that having a million items in a tuple is *unusual*, but it takes more than that to make something an anti-pattern. To be an anti-pattern, something must be commonly used as a solution to some problem, while actually being ineffective or counter-productive. https://en.wikipedia.org/wiki/Anti-pattern -- Steve
On Sun, Oct 17, 2021 at 4:38 PM Steven D'Aprano <steve@pearwood.info> wrote:
Right-o, the old "heterogeneous tuples versus homogeneous lists" distinction, I remember that from the old 1.5 days. I haven't heard it mentioned for a long time!
You must not have looked at type annotations then. :-) Type annotations legitimize this, by favoring the syntax list[int] for a homogeneous list versus tuple[int, str, bool] for a heterogeneous tuple. (There's a syntax for a homogeneous tuple too, but it's intentionally slightly longer.) So this rule of thumb is definitely not dead (as you seemed to imply). -- --Guido van Rossum (python.org/~guido)
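[A minimal sketch of that annotation convention, assuming Python 3.9+ where builtin generics are subscriptable; the variable names and values are illustrative:]

```python
scores: list[int] = [90, 85, 77]                  # homogeneous list
record: tuple[str, float, int] = ("ham", 1.5, 2)  # heterogeneous tuple
samples: tuple[int, ...] = (1, 2, 3)              # homogeneous tuple: the slightly longer spelling
```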
On Sun, Oct 17, 2021 at 05:02:23PM -0700, Guido van Rossum wrote:
On Sun, Oct 17, 2021 at 4:38 PM Steven D'Aprano <steve@pearwood.info> wrote:
Right-o, the old "heterogeneous tuples versus homogeneous lists" distinction, I remember that from the old 1.5 days. I haven't heard it mentioned for a long time!
You must not have looked at type annotations then. :-)
Not closely enough to draw the connection unprompted, no. Thanks for the reminder.
Type annotations legitimize this, by favoring the syntax list[int] for a homogeneous list versus tuple[int, str, bool] for a heterogeneous tuple. (There's a syntax for a homogeneous tuple too, but it's intentionally slightly longer.)
This makes sense.
So this rule of thumb is definitely not dead (as you seemed to imply).
What I actually said: "That certainly remains a reasonable guideline to make, albeit less so today" (and then gave reasons why less so). I explicitly agreed that it still is a good distinction to make. It is an egregiously unfair misrepresentation to say I "seemed to imply" otherwise. That's not cool.

Another factor in favour of the heterogeneous vs homogeneous distinction is the practical difficulty of working with complex data structures made from tuples, due to their immutability. I've just spent some time using an API that represents data as nested tuples, and due to their immutability they can be a pain to assemble. Nested lists are much easier. It's okay if you know your data up front and can write it as a nested tuple display, but if you need to assemble it dynamically, it is not fun. Would not recommend.

Fortunately it's my own API and so I can take a flame thrower to it and replace it with something better :-)

-- Steve
+1 from me, I've actually had several situations where I was in need of such notation. I also think that the proposed syntax is intuitive enough that it should not cause any confusion.
Erik Demaine writes:
I propose (not for the first time) that similarly concatenating an unknown number of iterables or dicts should be possible via comprehensions:
``` (*it for it in its) # tuple with the concatenation of iterables in 'its' [*it for it in its] # list with the concatenation of iterables in 'its' {*it for it in its} # set with the union of iterables in 'its' {**d for d in dicts} # dict with the combination of dicts in 'dicts' ```
I don't have a problem with this, although that's moot if Guido doesn't.
There are other ways to do this, of course:
``` [x for it in its for x in it] itertools.chain(*its) sum(it for it in its, []) functools.reduce(operator.concat, its, []) ```
The last three are more or less obscure, and the sum version looks potentially very inefficient. Nested comprehensions are quite awkward because of the way both the occurrences of 'x' and the occurrences of 'it' are separated (and have to be). Despite the redundancy the suggestion seems plausible.

There was a thread during the summer (sorry, no reference, it's bedtime) about something like container[*it] which uncovered some subtleties about implementing unpacking notation. I don't recall the details, except that it had to do with the fact that in indexing the indices are passed as a tuple (even though there are no stdlib types that take multiple indices). I think that means that it's not relevant to your proposal, but you might want to check, especially for tuple comprehensions like (*it for it in its).
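[For reference, a runnable sketch of those alternatives. Note that the quoted `sum(it for it in its, [])` spelling is actually a SyntaxError as written, since a generator expression passed alongside another argument must be parenthesized; one can simply write `sum(its, [])`, which re-copies the accumulator and is quadratic for many lists:]

```python
import functools
import itertools
import operator

its = [[1, 2], [3], [4, 5]]

flat_comp = [x for it in its for x in it]              # nested comprehension
flat_chain = list(itertools.chain.from_iterable(its))  # lazy chaining, no intermediate copies
flat_sum = sum(its, [])                                # quadratic: rebuilds the accumulator each step
flat_reduce = functools.reduce(operator.concat, its, [])

print(flat_comp)  # [1, 2, 3, 4, 5]
assert flat_comp == flat_chain == flat_sum == flat_reduce
```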
2. Maybe `list1.append(a, b)` should be equivalent to `list1.extend([a, b])`?
[*it for it in its] has much greater appeal than multiargument .append, which looks much like .extend, and .extend already exists. I think equivalencing *it and .extend(it) makes much more sense. Steve
16.10.21 17:07, Erik Demaine writes:
Extended unpacking notation (* and **) from PEP 448 gives us great ways to concatenate a few iterables or dicts:
``` (*it1, *it2, *it3) # tuple with the concatenation of three iterables [*it1, *it2, *it3] # list with the concatenation of three iterables {*it1, *it2, *it3} # set with the union of three iterables {**dict1, **dict2, **dict3} # dict with the combination of three dicts # roughly equivalent to dict1 | dict2 | dict3 thanks to PEP 584 ```
I propose (not for the first time) that similarly concatenating an unknown number of iterables or dicts should be possible via comprehensions:
``` (*it for it in its) # tuple with the concatenation of iterables in 'its' [*it for it in its] # list with the concatenation of iterables in 'its' {*it for it in its} # set with the union of iterables in 'its' {**d for d in dicts} # dict with the combination of dicts in 'dicts' ```
It was considered and rejected in PEP 448. What was changed since? What new facts or arguments have emerged?
I propose (not for the first time) that similarly concatenating an unknown number of iterables or dicts should be possible via comprehensions:
I'm really happy to see this feature proposal come up again! Every time, it seems this feature is even more intuitive to even more people, and I'm excited to see so many +1s compared to the -1s. I hope it will be accepted, if not this time, then one day :)

It was considered and rejected in PEP 448. What was changed since? What new facts or arguments have emerged?
Just for the record, both Joshua and I worked on implementing PEP 448. We both wanted this feature because we felt it was consistent. However, we wanted to maximize the probability that PEP 448 was accepted, so we chose to defer this feature. Best, Neil
16.10.21 17:07, Erik Demaine writes:
(*it for it in its) # tuple with the concatenation of iterables in 'its'
As others already have said, it should evaluate to a generator, not to a tuple. But another question arises now. Should it be equivalent to

```
def gen(its):
    for it in its:
        for x in it:
            yield x
```

or to

```
def gen(its):
    for it in its:
        yield from it
```

? There is a subtle difference between these two pieces of code.
We can already easily simulate your first alternative in a generator comprehension:

```
(x for it in its for x in it)
# equivalent to
def gen(its):
    for it in its:
        for x in it:
            yield x
```

so anyone who wants that behaviour can easily get it. So unpacking in a comprehension should provide the second alternative:

```
(*it for it in its)
# equivalent to
def gen(its):
    for it in its:
        yield from it
```

As you say, the difference is subtle, and usually not important, so most people will not care and will use whatever is easier to type :-)

(I think the difference has to do with sending values into the generator, throwing and catching exceptions, but I can't think of a simple example where it would make a difference.)

-- Steve
On Sun, Oct 17, 2021 at 8:26 PM Steven D'Aprano <steve@pearwood.info> wrote:
(I think the difference has to do with sending values into the generator, throwing and catching exceptions, but I can't think of a simple example where it would make a difference.)
Yeah mainly. And genexps don't usually do that. So it needs to be defined, but it's going to be irrelevant to most normal cases and doesn't need to be bikeshedded. (If someone disagrees and we debate whether something is worth bikeshedding, is that metabikeshedding?) ChrisA
On Oct 17, 2021, at 3:40 AM, Serhiy Storchaka <storchaka@gmail.com> wrote:
16.10.21 17:07, Erik Demaine writes:
(*it for it in its) # tuple with the concatenation of iterables in 'its'
As others already have said, it should evaluate to a generator, not to a tuple.
But another question arises now. Should it be equivalent to

```
def gen(its):
    for it in its:
        for x in it:
            yield x
```

or to

```
def gen(its):
    for it in its:
        yield from it
```

? There is a subtle difference between these two pieces of code.
Serhiy: could you explain the difference? Eric
17.10.21 16:08, Eric V. Smith writes:
Serhiy: could you explain the difference?
The difference between `for x in it: yield x` and `yield from it` is that in the latter case any values passed in with send() and any exceptions passed in with throw() are passed to the underlying iterator if it has the appropriate methods. See https://www.python.org/dev/peps/pep-0380/ .
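[A small runnable sketch of that difference; the helper names are illustrative. A value passed in with send() reaches the inner generator only through the yield from form:]

```python
def inner():
    # yields 1, then echoes back whatever is sent in
    received = yield 1
    yield received

def loop_version(its):
    # for x in it: yield x -- the inner generator is only ever __next__()'d,
    # so sent values stop at the outer generator
    for it in its:
        for x in it:
            yield x

def delegating_version(its):
    # yield from forwards send()/throw() to the inner generator (PEP 380)
    for it in its:
        yield from it

g = delegating_version([inner()])
print(next(g))          # 1
print(g.send("hello"))  # hello -- forwarded into inner()

g2 = loop_version([inner()])
print(next(g2))         # 1
print(g2.send("hello")) # None -- inner() never sees the sent value
```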
+1 from me too. I just had the case yesterday of having to chain a bunch of lists and I naturally wrote it as [*lst for lst in lst_of_lsts] only to see my IDE complain :) I've known about itertools.chain() for a while, too. Yet, every time I have to chain iterables like this, for some reason, maybe because it feels so natural, this proposed syntax is always my first go-to.
participants (15)
- Abdulla Al Kathiri
- Chris Angelico
- David Mertz, Ph.D.
- Eric V. Smith
- Erik Demaine
- Guido van Rossum
- Irit Katriel
- MRAB
- Neil Girdhar
- Piotr Waszkiewicz
- Serhiy Storchaka
- Stephen J. Turnbull
- Steven D'Aprano
- thomas.d.mckay@gmail.com
- Valentin Berlier