
Hi all,

I came across the following performance issue with the sum function while summing lists: https://bugs.python.org/issue18305 It's been discussed previously in this list and other issues. The rationale in that ticket makes the case that performance won't be fixed because sum() should not be treated as a generic way to concatenate sequences (and there is documentation saying this).

Having non-linear complexity is not a suitable way to discourage this behaviour though. Non-linearity is hard to detect in testing and development, so it is more likely to lead to performance bugs in libraries and production than to discourage use from the outset. Moreover, it is confusing to end users to have a feature while discouraging its use through a somewhat passive-aggressive runtime. New users, and experienced users who do not pore over the documentation, will use sum because it supports this API, and one would assume that's intentional.

On the other hand, the behaviour for strings is a far superior user experience. When attempting to use sum() on strings, users are presented with a TypeError and a handy tip (use "".join(seq) instead). It is clear to users that they are attempting an anti-pattern and how to correct it.

So I propose moving (eventually) to the same system with lists as for strings. Or else removing the performance trip-hazard from the sum function with lists.

Best regards, Oliver
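
A minimal sketch of the two behaviours being contrasted (the sample data is invented):

    # Strings: sum() refuses outright and points at the idiom to use.
    try:
        sum(["a", "b", "c"], "")
    except TypeError as e:
        print(e)  # sum() can't sum strings [use ''.join(seq) instead]

    # Lists: sum() accepts them, but every + allocates a brand-new list,
    # so the total work grows quadratically with the number of elements.
    lists = [[i] for i in range(10_000)]
    flat = sum(lists, [])  # works, but O(n**2); only fine for small inputs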

On Wed, Jun 16, 2021 at 02:50:06PM +0000, Oliver Margetts wrote:
Did you actually use sum to concatenate lists in real code, or is this a theoretical issue? What circumstances do you have where you want to concatenate a large number of lists and thought that sum was an appropriate way to do it? [...]
> Having non-linear complexity is not a suitable way to discourage this behaviour though.
We didn't put non-linear complexity in as a way to discourage the use of sum. The non-linear complexity comes about as a natural consequence of the behaviour of lists, it is not a "passive-aggressive runtime". There is no practical way for us to detect ahead of time all imaginable types where repeated addition has non-linear behaviour.

And although sum was intended for use only with numbers, there is no *hard* rule in Python that people can't use functions for unintended purposes that make sense to them. It may even be that there are people who consider the convenience of sum worth it even for lists, since quadratic runtime behaviour is still fast for small enough N. I sometimes use it myself, in the REPL, to merge a handful of lists. If it takes 100 ms instead of a microsecond, I don't even notice.

The performance error for strings should be considered an anomaly, not a feature to be extended to anything that could be used, or misused, with non-linear behaviour. At the very least, we would probably need to see evidence that this issue (poor performance) is a widespread problem before breaking code which works fine for small N.

-- Steve

I’m pretty sure that using sum with strings was a real issue in real code before it was disallowed. But the irony is that strings in the CPython interpreter have an optimization that makes it actually work fine :-( I’d rather remove the error for strings than add more Type limitations. -CHB
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Thu, Jun 17, 2021 at 9:04 AM Christopher Barker <pythonchb@gmail.com> wrote:
That *can* make it work. It doesn't always. There are limitations on the optimization.
> I’d rather remove the error for strings than add more Type limitations.
I'm not sure why it doesn't special-case it to "".join() instead of erroring out, but presumably there's a good reason. Given that it can't easily be done in a cross-interpreter efficient way, it's better to have people do things in the reliable way rather than depend on a CPython implementation detail. For instance, this code will probably work in CPython:

    def modify_file(fn):
        data = open(fn).read()
        data = mutate(data)
        open(fn, "w").write(data)

But we don't do that, because it won't be reliable on all interpreters. Are *all* Python interpreters going to be required to optimize strings the way CPython does? If not, it's better to not encourage code to rely on it.

ChrisA

On Thu, Jun 17, 2021 at 9:46 AM Oliver Margetts <oliver.margetts@gmail.com> wrote:
> I'm not sure why it doesn't special-case it to "".join()

One reason might be because you'd have to read the entire iterator to know if it really was only strings. So there are concerns with generators which complicate things a fair bit.
Perhaps, but I'm not sure what's actually being verified here. The check is applied to the start parameter. (If you don't set start, you'll get a TypeError for trying to add an int and a str.) In fact, if you disrupt that check, sum() is quite happy to add strings together - with potentially quadratic runtime:
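
    # A sketch (not from the original message) of the bypass being
    # described: the guard only inspects `start`, so an invented wrapper
    # whose __add__ hands back its operand slips straight past it.
    class Empty:
        def __add__(self, other):
            return other

    print(sum(["a", "b", "c"], Empty()))  # 'abc', via repeated str + str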
Either way, if there's a non-str part way through, it'll bomb when it reaches that. I've no idea what the reason is for not bouncing it straight into join(), but I'm sure there must be one. ChrisA

On 2021-06-17 00:57, Chris Angelico wrote:
I wonder whether we could introduce a new dunder __sum__ for summing sequences. It would call type(start).__sum__ with the start value and the sequence. str.__sum__ would use str.join and list.__sum__ would use list.extend. The fallback would be to do what it does currently.
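
A rough sketch of how MRAB's hypothetical protocol could behave (nothing here exists today; `__sum__` and the dispatch are invented for illustration):

    def sum_with_dunder(iterable, start=0):
        # Hypothetical: let the start value's type take over the whole sum,
        # e.g. str.__sum__ via str.join, list.__sum__ via list.extend.
        dunder = getattr(type(start), "__sum__", None)
        if dunder is not None:
            return dunder(start, iterable)
        result = start           # fallback: the current behaviour
        for item in iterable:
            result = result + item
        return result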

On Wed, Jun 16, 2021 at 04:01:24PM -0700, Christopher Barker wrote:
> I’m pretty sure that using sum with strings was a real issue in real code before it was disallowed.
But not with the sum() function, which has prohibited strings from its introduction in 2.3 :-) https://docs.python.org/release/2.3/lib/built-in-funcs.html

You are right that the reason the restriction was baked in was because of real issues.

-- Steve

I came across this just processing JSON data - flattening a list of 'deals' into a list of individual 'trades'. Flattening nested lists is not uncommon. And yes, sum seemed the natural way to do it at the time.

Yes, you're right some people might find it convenient and use it in the REPL. It just seems like a decision has been made that that's the wrong way to do it. Hence no fast paths or optimisations. As others have mentioned, you -could- go the opposite way and optimise for some builtins - at the expense of surprise for user-defined types. I think it would be good to make it harder for people to get it wrong.

What kind of evidence are you talking about here? There are several bug reports. Code audits? Tricky to obtain, but I could try if this is of genuine interest.

On Wed, 16 Jun 2021, 11:16 pm Steven D'Aprano, <steve@pearwood.info> wrote:

On Wed, Jun 16, 2021 at 7:35 PM Oliver Margetts <oliver.margetts@gmail.com> wrote:
I remember the discussion in 2013 about the quadratic performance of summing strings. There were proposals for optimization of the string case, but they amounted to letting the "fast path" sometimes present in `.__iadd__()` take over the `.__add__()` operation. In fact, I gave keynotes at PyCon-UK 2013 and PyCon-ZA 2014 on exactly this discussion (but really more about what it shows about Python design: https://gnosis.cx/pycon-za-2014/Keynote-Ideas.pdf). Even though I used the `sum(list_of_lists)` example in my presentations, I don't think I have EVER encountered it in the wild.

I'm sympathetic to raising an exception on `sum(list_of_lists)` similar to `sum(list_of_strings)`. But what exactly is the recommended substitute? We have this:

    list(chain.from_iterable(list_of_lists))

Which is pretty nice, and what I'd probably do, but it DOES require importing from itertools to use it. That feels like a drawback, at least a minor one.

-- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
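
For reference, a runnable version of that substitute (the sample data is invented):

    from itertools import chain

    list_of_lists = [[1, 2], [3], [4, 5, 6]]
    flat = list(chain.from_iterable(list_of_lists))
    print(flat)  # [1, 2, 3, 4, 5, 6]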

On Wed, Jun 16, 2021 at 8:04 PM David Mertz <mertz@gnosis.cx> wrote:
IIUC, the idea behind raising, rather than optimizing, is that there is a "better" way to do it, and it's a good lesson. After all, sum() aside, we also want to discourage folks from doing a lot of string concatenation in a loop as well.

> I'm sympathetic to raising an exception on `sum(list_of_lists)` similar to
I agree -- str.join() is built in and really idiomatic Python -- itertools.chain is, I suppose, idiomatic Python, but there's the import, a lot of extra typing, and a substantial cognitive load to figure it out the first (few) times you use it. (I don't think I ever have...)

If, as suggested, flattening a list of lists is a common operation, a nice, clean and efficient built-in way to do it would be reasonable. Heck, you could make it a list method :-) In the BPO issue the OP linked, there was a suggested patch to optimize sum(list_of_lists) -- I'm not sure that's such a bad idea after all.

-CHB

NOTE: One reason I've never had to flatten a list of lists is because I'm a heavy numpy user -- I'd be more likely to have an ndarray and flatten that -- which is very easy and efficient :-)

On Thu, Jun 17, 2021, 1:14 AM Christopher Barker
The proposal was to drop in .__iadd__() for .__add__(), wasn't it? As a heavy NumPy user, you know those sometimes have different semantics. I actually showed that in my 7-year-old talk as one argument against it.

More-itertools has flatten(): https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.flat.... That seems better than a method specific to lists.
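
The semantic difference being pointed at, in a short sketch: for NumPy arrays, `+=` mutates in place while `+` allocates a new array, which is visible through aliases.

    import numpy as np

    a = np.array([1, 2, 3])
    alias = a
    a += 10        # __iadd__: writes into the buffer alias also points at
    print(alias)   # [11 12 13]
    a = a + 10     # __add__: rebinds a to a new array; alias is untouched
    print(alias)   # [11 12 13]
    print(a)       # [21 22 23]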


On Wed, Jun 16, 2021 at 10:24 PM David Mertz <mertz@gnosis.cx> wrote:
well, yes, but the numpy example was brought up -- and a patch that addressed that was offered. I think it's clearly possible to optimize certain types -- and maybe all Sequence container types. The question is whether that's a good idea. The fact that it was decided to raise for strings, when an optimization could have been added, answers that question. Despite my personal opinion, I think the only options are to raise for more types, with a helpful message, or just leave it alone. -CHB

On Wed, Jun 16, 2021 at 10:13:40PM -0700, Christopher Barker wrote:
People have been talking about a flatten builtin or list method since Python 1.5 days. Perhaps it is time to do a PEP? Here is one example implementation from eleven years ago: https://code.activestate.com/recipes/577250-flatten-a-list/

Start designing the API for flatten and one quickly discovers that the problem isn't how to implement it, but the number of options that people may, or may not, want. What counts as an atomic type? Flatten all the way down or just one level? Return a list or an iterator? In place or a new list? Etc.

-- Steve

Arbitrary and complex nested structures do seem like they would require a complex solution. OTOH `more_itertools.flatten` seems ergonomic - and it is very simple, just a wrapper around `itertools.chain.from_iterable` with a memorable name. If that's the preferred solution, nudging users in the direction of those functions, or a comprehension, would be useful (thinking here "there should be one-- and preferably only one --obvious way to do it").

Maybe deprecation is extreme, but what about a warning ("looks like you're trying to X, maybe you should Y")? Adding hints might become annoying (think "Clippy the office assistant"); a solution with added complexity would be to warn only when you get to the 100th list/tuple - so you can still sum a few lists in the REPL.

On Thu, 17 Jun 2021 at 10:46, Steven D'Aprano <steve@pearwood.info> wrote:

17.06.21 06:03, David Mertz wrote:
And it is equivalent to pure Python code

    [x for chunk in list_of_lists for x in chunk]

It has linear complexity and is only slower by a constant factor because it runs the loop in bytecode instead of C code. It would be possible to make the compiler recognize such a pattern and generate more efficient bytecode (LIST_EXTEND instead of an internal loop with LIST_APPEND), but I am not sure that this case is common enough to complicate the compiler.
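
For the curious, the bytecode difference Serhiy mentions can be inspected with the dis module (a sketch; exact output varies by CPython version):

    import dis

    # The nested comprehension appends one element at a time:
    # LIST_APPEND appears inside the inner loop.
    dis.dis("[x for chunk in list_of_lists for x in chunk]")

    # LIST_EXTEND is what star-unpacking into a display compiles to;
    # the suggested optimization would emit it once per chunk instead.
    dis.dis("[*a, *b]")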

On Thu, Jun 17, 2021 at 12:37 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
> And it is equivalent to pure Python code
> [x for chunk in list_of_lists for x in chunk]
Okay, slightly off-topic, but can we *please* allow [*chunk for chunk in list_of_lists] some day. I think it was left out because some discussion concluded it would be too confusing, which is ridiculous. I assumed it would work and was confused to find that it didn't. It's blatantly inconsistent. It's not even the performance I care about, it's the unreadability of having an extra "for i_have_to_think_of_yet_another_variable_name in whatever" at the end of the list comprehension (at maximum distance from where the variable is used). I've wished for this feature ridiculously many times.
In my personal experience it's very common.

On Thu, Jun 17, 2021, 5:24 PM Ben Rudiak-Gould
> Okay, slightly off-topic, but can we *please* allow
> [*chunk for chunk in list_of_lists]
It is completely non-obvious to me what that would even MEAN. I cannot derive anything obvious from other uses of *. If I had to guess, I'd think that this was tuple unpacking, and the result would be a list of tuples. However, apparently in your mind there is some way to read this as "flatten." I don't know how to get there mentally (other than just memorizing a weird behavior).

On Thu, Jun 17, 2021 at 5:52 PM Jelle Zijlstra <jelle.zijlstra@gmail.com> wrote:
> It is completely non-obvious to me what that would even MEAN. I cannot
> derive anything obvious from other uses of *.
I guess I can kinda see that analogy. But a loop is an assignment, and this doesn't mean "flatten":

    a, b, *c = 1, 2, 3, 4, 5, 6

Moreover, in a regular loop:
Neither of my examples, I admit, are quite the same as the comprehension. Nonetheless, I definitely don't think the proposed syntax is "the one obvious way to do it."

On Thu, Jun 17, 2021 at 02:51:44PM -0700, Jelle Zijlstra wrote:
Oh, that's clever, and I might even have thought of that myself if it wasn't described as "off-topic" *wry smile*

So in a generator comprehension, what would it do?

    (*chunk for chunk in values)
    # equivalent to... ?
    # (item for chunk in values for item in chunk) perhaps?

I guess we could allow an equivalent in dict comprehensions:

    {**chunk for chunk in values}

for unpacking nested dicts or (key, value) pairs. Clever... or maybe *too* clever?

-- Steve

David Mertz writes:
Pretty clearly the * means to unpack the chunk, and substitute extend for append, no?

    result = []
    for chunk in list_of_lists:
        result.extend(chunk)

You write elsewhere that a loop is an (iterated) assignment (to chunk), but it's *not* an assignment to result, it's an in-place modification. It did take me a bit of thought to come to Ben's intended interpretation, but I think if you explain it this way, it becomes obvious to the non-Dutch. I'll grant that this doesn't *really* work for genexps, but I think of those as very lazy, very forgetful lists, so WFM.

Serhiy writes that a reason for not allowing this is that you'd want to allow [x, y for x in l], splicing the x, y values into the result list. That doesn't make sense to me, for two reasons. x, y already has a meaning in that context, and the result should be a list of pairs, each with y as the second element. On the other hand, the r-value *(x, y) requires a context into which it can be spliced:

    w, z = *(x, y)

is a syntax error. As the result element in a comprehension, the context is clearly the comprehension being built. Perhaps Serhiy meant a similar but different syntax that's problematic? But if not, I kinda like this.

Steve

On Fri, Jun 18, 2021 at 10:10:33PM +0900, Stephen J. Turnbull wrote:
So perhaps not actually that clear then? *wink*

In hindsight, after realising what Ben's intention was (at least we assume that is what he was thinking of) it does seem like a reasonable way of unpacking multiple items into a list or set comprehension at once. I don't think that this is something that we could say was "intuitively obvious" to somebody who is neither Dutch nor experienced with Python, but I think it is retroactively obvious once explained.

Unpacking comprehensions:

    [*item for item in sequence]
    {*item for item in sequence}

are syntactic sugar for something roughly equivalent to:

    result = []  # or set()
    for item in sequence:
        for tmp in item:
            result.append(tmp)  # or result.add

The exact implementation could vary, e.g. by using `extend` for list comprehensions. That suggests a meaning for double-star unpacking in a dict comprehension (single-star unpacking would make it a set):

    {**item for item in seq}
    # equivalent to:
    result = {}
    for item in seq:
        result.update(item)

And these suggest meanings for the equivalent in generator expressions:

    (*item for item in sequence)
    # equivalent to:
    (tmp for item in sequence for tmp in item)

The double-star version would follow similar rules to dict.update:

    (**item for item in sequence)
    # equivalent to:
    def gen():
        for item in sequence:
            if hasattr(item, 'keys'):
                for k in item:
                    yield (k, item[k])
            else:
                for k, v in item:
                    yield (k, v)

Works for me.

-- Steve

On Thu, Jun 17, 2021 at 02:22:29PM -0700, Ben Rudiak-Gould wrote:
> Okay, slightly off-topic, but can we *please* allow
> [*chunk for chunk in list_of_lists]
What would that do? The only thing I can guess it would do is the equivalent of:

    result = []
    for chunk in list_of_lists:
        result.append(*chunk)

which is a long and obfuscated way of saying `raise TypeError` :-)

Well, there is this:

    result = []
    for chunk in list_of_lists:
        *temp, = chunk
        result.append(temp)

which would make it an obfuscated way to spell `list(chunk)`.
Blatantly inconsistent with what? I have no idea what you are contrasting the non-support of sequence unpacking with. It's not this:

    >>> chunk = (1, 2, 3)
    >>> t = *chunk
      File "<stdin>", line 1
    SyntaxError: can't use starred expression here

but I can't tell what you're thinking of. Some context with sequence unpacking that is "slightly off-topic".

-- Steve

On Thu, Jun 17, 2021 at 3:09 PM Steven D'Aprano <steve@pearwood.info> wrote:
The difference between chunk and *chunk in the expression of a list comprehension would be the same as the difference between them in the expressions of a starred_list.

> The only thing I can guess it would do is the
It would be reasonable to allow list.append to take any number of arguments to be appended to the list, as though its definition was

    def append(self, *args):
        self.extend(args)

If it did, then that translation would work and do the right thing. Some similar functions do accept multiple arguments as a convenience, though it's not very consistent:

    myset.add(1, 2)                         # no
    myset.update([1, 2], [3, 4])            # ok
    mylist.append(1, 2)                     # no
    mylist.extend([1, 2], [3, 4])           # no
    mydict.update({'a': 1}, b=2, c=3)       # ok
    mydict.update({'a': 1}, {'b': 2}, c=3)  # no

> Well, there is this:
Unpacking would be useless in every context if you interpreted it like that.

18.06.21 00:22, Ben Rudiak-Gould wrote:
It was originally proposed in PEP 448 (Additional Unpacking Generalizations) but was excluded after discussion. If we allow

    [*chunk for chunk in list_of_lists]

we should allow also

    [x, y for x in a]

which would be equivalent to

    [a[0], y, a[1], y, a[2], y, ...]

On Fri, Jun 18, 2021 at 07:38:49AM -0700, Guido van Rossum wrote:
We already have a rule to disambiguate generator comprehensions: they must always be parenthesized unless they are already parenthesized:

    g = (y for y in a)       # parens required
    t = 999, (y for y in a)  # parens required
    func((y for y in a))     # inner parens optional
> That’s a good enough reason for me to also disallow *chunks.
That's an odd way to look at it. We must disallow an unambiguous syntax because a completely different syntax would have been ambiguous if we didn't already have a rule in place that disambiguates it. -- Steve

On Fri, Jun 18, 2021 at 8:40 PM Steven D'Aprano <steve@pearwood.info> wrote:
Yes, that's exactly what I was referring to.
Maybe, but the two are not unrelated. In other contexts, when we accept "*chunk", and 'chunk' equals "1, 2", we also accept "1, 2" in the original position, and it means the same thing. This is useful as an explanation of the semantics of the unary '*' operator[1]. For example:

    # Given:
    chunk = 1, 2

    # Equivalent:
    f(*chunk)
    f(1, 2)

    # Equivalent:
    [*chunk]
    [1, 2]

So then if we were to allow this:

    [*chunk for chunk in ...]

we ought to consider this equivalent:

    [1, 2 for chunk in ...]

(Note there's nothing that says the expressions to the left of 'for' need to involve the for-control variable 'chunk'. :-)

Now, this shouldn't be considered an airtight argument against [*chunk for ...], but it does show that there's no straightforward explanation of its meaning through equivalence (like the OP seemed to think), and I think this is what Serhiy was also getting at in his post.

__________
[1] Does the unary star operator have a name when used like this in Python? In JavaScript the equivalent syntax ("...chunk", where the "..." are a literal ellipsis) is called "spread". We could borrow this term.

--Guido van Rossum (python.org/~guido)

On Fri, Jun 18, 2021 at 09:33:49PM -0700, Guido van Rossum wrote:
Indeed. I was initially confused by what Ben thought was a simple and obvious connection between star unpacking in some other contexts and his suggestion for comprehensions. The analogy with `[*a]` never crossed my mind, and I don't think that we should look at this as literally the application of sequence unpacking in a comprehension, for reasons I gave in my earlier post. But having it explained to me, I think that treating this as an analogy rather than literal unpacking works.

We already give unary star and double star a number of meanings, not all of which are related:

- import wildcard;
- capture of positional and keyword parameters `def func(*args, **kw)`;
- sequence and keyword unpacking in function calls;
- sequence capture in assignment targets `head, *a, tail = items`;
- sequence unpacking in list etc displays.

Have I missed any?

We could define *star comprehensions* as syntactic sugar for nested comprehensions, to aid in flattening nested sequences and mappings.

    [*expression for name in sequence if condition]

results in this:

    result = []
    for name in sequence:
        if condition:
            for tmp in expression:
                result.append(tmp)
    return result

I haven't thought deeply into this, but I think that if the starred expression is anything but a simple name, it may require parentheses?

    *name                 # okay
    *(name.attr or [])    # everything else needs parens

Alternatively, we could just do something that people have been asking about since Python 1.5 and provide a flatten builtin or list method :-)

-- Steve

On Fri, Jun 18, 2021 at 10:15 PM Steven D'Aprano <steve@pearwood.info> wrote:
What you seem to be (intentionally) missing is that all but the import wildcard *are* closely related, even though they are specified separately in the syntax. (Saying that they are unrelated would be like saying that the name occurring after a 'def' keyword and the function name occurring in a function call are unrelated. :-)
The grammar for the last three forms you give allows pretty much any expression, for example https://github.com/python/cpython/blob/291848195f85e23c01adb76d5a0ff9c6eb7f2...
> Alternatively, we could just do something that people have been asking about since Python 1.5 and provide a flatten builtin or list method :-)
Probably a builtin, since in my mind I want to write flatten(a), never a.flatten(), and it would be useful for any kind of sequence of sequences (or even iterable of iterables). I think I would find flatten(a) more readable than [*chunk for chunk in a], and more discoverable: this operation is called "flatten" in other languages, so users are going to search the docs or help for that. But there could be endless debate about whether flatten( ("x", "y") ) should return a list or a tuple...

--Guido van Rossum (python.org/~guido)

On 19.06.2021 17:03, Guido van Rossum wrote:
+1
> But there could be endless debate about whether flatten( ("x", "y") ) should return a list or a tuple...
Have it return an iterator :-) flatten() would be in the same category of builtins as reversed() and enumerate(). I think we'll see more discussion about exactly how to flatten the structures, e.g. do you stop at strings or flatten them into lists of characters? But I'm sure we'd reach a sensible default which makes most happy.

-- Marc-Andre Lemburg
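
A minimal sketch of an iterator-returning flatten() of the kind being discussed (one level deep; the open questions above are deliberately not settled here):

    def flatten(nested):
        """Yield the items of each sub-iterable, one level deep."""
        for sub in nested:
            yield from sub

    print(list(flatten([[1, 2], (3, 4)])))  # [1, 2, 3, 4]
    # Open question from the thread: should a string element be treated
    # as atomic, or iterated into individual characters?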

On 19.06.2021 17:17, Serhiy Storchaka wrote:
Well, like I said: modulo the discussion around what "flatten" should mean, e.g. you will probably want to have flatten() go a certain number of levels deep and not necessarily flatten strings. But yes, such a definition is certainly a good start.

-- Marc-Andre Lemburg

Flatten is used in numpy: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.flatten.html?... Where it is a method, and goes all the way to one dimension. A builtin flatten can’t be the same, but it would be nice if it weren’t too different, e.g. list(flatten(a_2d_array)) would do the same thing as list(a_2d_array.flatten()).

For a builtin, perhaps it could take an optional integer "depth" parameter, which would provide flexibility and be one way to control its behavior with strings.

-CHB

On Sat, Jun 19, 2021 at 8:36 AM Marc-Andre Lemburg <mal@egenix.com> wrote:

On Sun, Jun 20, 2021 at 1:07 PM Christopher Barker <pythonchb@gmail.com> wrote:
> Flatten is used in numpy:
> Where it is a method, and goes all the way to one dimension.
I think it's worth keeping in mind the differences though. In NumPy, arr.flatten() doesn't even TOUCH the array itself. It is solely a manipulation of the `.shape` attribute. In essence, we could define it like this:

    arr.flatten = lambda a: a.reshape(reduce(mul, a.shape, 1))

Actual NumPy arrays don't allow attaching a monkey-patch method, but if they did that would be it. I guess this example relies on knowing that `.reshape` also doesn't touch the data though (in particular, zero memory copies).

> A builtin flatten can’t be the same, but it would be nice if it weren’t too
I think that would be nice as a signature. I don't really care about builtin vs itertools, but something like `flatten(seq, depth=2)` would be handy. The question is really whether the default depth is 1 or sys.maxsize. Both have plausible cases. Some special depth like 2 or 3 could be useful at times, but definitely should not be default.
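
One possible shape for that depth parameter (a sketch only; the recursion and the "strings are atoms" rule are assumptions, not anything agreed in the thread):

    from collections.abc import Iterable

    def flatten(nested, depth=1):
        """Flatten up to `depth` levels; strings/bytes are treated as atoms."""
        for item in nested:
            if (depth > 0 and isinstance(item, Iterable)
                    and not isinstance(item, (str, bytes))):
                yield from flatten(item, depth - 1)
            else:
                yield item

    print(list(flatten([[1, [2, 3]], [4]])))           # [1, [2, 3], 4]
    print(list(flatten([[1, [2, 3]], [4]], depth=2)))  # [1, 2, 3, 4]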

On Sun, 2021-06-20 at 13:48 -0400, David Mertz wrote:
Passer-by comment. But that is not true: `flatten` explicitly copies the data and does not create a view (in case the argument was about names). NumPy has:

* `flatten()` (always copies)
* `ravel()` (copies if needed, and additionally ensures contiguity)
* `reshape(-1)` (copies if needed)

They are all subtly different, unfortunately.

- Sebastian

On Sun, 20 Jun 2021 at 23:29, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
There's also np.concatenate which to me seems like the closest to what sum does with lists. If there is a problem to solve here, it's that people want to concatenate things and reach out to the sum function because the concatenate function doesn't exist. The sum function refuses to sum strings but the preferred alternative is

    text = ''.join(strings)

which is just weird looking. Why would you call a method on the empty string? It would be much better spelt as

    text = concatenate(strings)

and that spelling could work equally well for lists, tuples, etc.

-- Oscar
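
A sketch of the concatenate() Oscar is imagining (dispatching on the first element's type is an assumption here, not part of his proposal):

    from itertools import chain

    def concatenate(seqs):
        seqs = list(seqs)
        if not seqs:
            return []
        if isinstance(seqs[0], str):
            return "".join(seqs)  # strings need their own fast path
        # list(...), tuple(...) etc. accept any iterable of elements
        return type(seqs[0])(chain.from_iterable(seqs))

    print(concatenate(["ab", "cd"]))    # 'abcd'
    print(concatenate([[1, 2], [3]]))   # [1, 2, 3]
    print(concatenate([(1,), (2, 3)]))  # (1, 2, 3)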

But that’s not the problem here -- sum IS concatenation for any type for which __add__ means concatenate. The problem is that repeated concatenation is inefficient for many types. np.concatenate() exists because __add__ means something different for arrays. And it is only used for arrays, so it can be done efficiently. But how would a builtin concatenate() function know how to concatenate arbitrary objects? I suppose there could be a __concat__ dunder. Which is what __add__ already is for Sequences. The fact is that only the type itself knows how to efficiently concatenate a lot of smaller objects. Which is why str.join() is a string method -- it is about strings, and only works with strings.

Which makes me think maybe the solution is to make join() (or call it concatenate()) a Sequence method. Or, since it’s problematic to add new methods to ABCs (the name may already be used by existing custom Sequence types), go back to the __concat__ dunder, which would take an iterable of objects to concatenate.

Which inspires a new idea: extend the Sequence __add__ to take an optional additional parameter, which could be an iterable of objects to add up. I’m not sure which is more expensive -- trying to pass an extra parameter to an object to see if it will accept it, or checking for the existence of a dunder. But I do like that there wouldn’t be a need for a new dunder.

What remains is a key question of what people need: a generic and efficient way to concatenate Sequences, or a way to flatten sequences. They overlap, but are not the same, particularly if there is a need to flatten more than one level deep.

-CHB

> The sum function refuses to sum

19.06.21 07:33, Guido van Rossum wrote:
I would not call it an operator. An operator forms an expression, but *x is not an expression. It is no more an operator than a colon. Is there such a term as "syntax form" in English? I would call *x, "x for x in a", and x:y syntax forms. They are not expressions, but they can be part of an expression. They can only be used in certain contexts, and the meaning depends on the context (x:y means two different things in {x:y} and in a[x:y]).

As someone who originally worked on implementing PEP 448, I wanted to chime in that while I agreed with the decision to be conservative when adding new features, I originally hoped that [*x for x in xs] would eventually make its way into Python. To me, it seemed intuitive and I hoped people would eventually come around to seeing things that way, and to Steven's similar suggestion for dicts {**x for x in xs}. However, I had never seen Serhiy and Guido's point about the similarity to [a, b for a, b in abs]. I agree that the comparison makes sense, and that that notation is really confusing. Maybe this should be added to the PEP (https://www.python.org/dev/peps/pep-0448/) as a reminder? Given the counterpoint, I don't have a strong opinion either way. Maybe it's better to remain conservative and see where the language is at in another few years? On another tangent, I'd like to see array[*x, y] working one day. If I remember correctly, that was an oversight. It regularly comes up when indexing numpy arrays, and needs an ugly workaround. Best, Neil
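
The indexing workaround Neil alludes to, sketched with invented names (star-unpacking in subscripts later became legal in Python 3.11 via PEP 646):

    import numpy as np

    arr = np.zeros((2, 3, 4))
    idx = (0, 1)  # leading indices held in a tuple

    # arr[*idx, 2] was a SyntaxError when this thread was written;
    # the workaround was to build the full index tuple explicitly:
    value = arr[tuple(idx) + (2,)]
    # or, since PEP 448, equivalently:
    value = arr[(*idx, 2)]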

18.06.21 17:38, Guido van Rossum wrote:
Yes, I think that it could be interpreted in one of the following ways:

    [x, (y for y in a)]
    [x, *(y for y in a)]
    [(x, y) for y in a]
    [*(x, y) for y in a]   # if we allow [*chunk for ...]

Any interpretation can be well-justified and formally non-ambiguous once we choose the one to be allowed. But it will still *look* ambiguous, so it is better to avoid such syntax in Python, which is famous for its clear syntax. I have wished that I could write just [*chunk for ...] several times per year, but I understand that there were reasons to not allow it.

Why are we arguing about `[x,y for y in a]` when nobody has requested that syntax? -- Steve

On Sat, Jun 19, 2021, 5:00 AM Steven D'Aprano <steve@pearwood.info> wrote:

> Why are we arguing about `[x,y for y in a]` when nobody has requested that syntax?

I've wanted many times to be able to write the star unpack there, even as a relatively modestly experienced python user. But it has never occurred to me to write this one.

I don't see how allowing [x, y for x in a] follows from allowing [*chunk for chunk in list_of_lists].

On Fri, 18 Jun 2021, Serhiy Storchaka wrote:
The first rejected variation in the PEP looks perfectly unambiguous to me. Only in the second variation could there be multiple interpretations, but nothing worse than the difference between f(x, y, z) and f((x, y, z)).

Suggested solution: require appropriate brackets or parentheses. This would leave the first form in the second variation a syntax error, but allow the second form the obvious interpretation. If someone would like to "unpack into the arguments of the call to f", that could then be an entirely separate discussion. However, discarding the first variation because the second could be ambiguous seems to me like a case of baby and bath water.
But why ??

> which would be equivalent to
> [a[0], y, a[1], y, a[2], y, ...]
I don't see that equivalence, what are you building it on? We already have

    [x, (y for x in a)]
    [(x, y) for x in a]

both meaning something else. Try this instead:

    [*(x, y) for x in a]
    = [*(a[0], y), *(a[1], y), *(a[2], y), ...]
    = [a[0], y, a[1], y, a[2], y, ...]

/Paul

On Wed, Jun 16, 2021 at 02:50:06PM +0000, Oliver Margetts wrote:
Did you actually use sum to concatenate lists in real code, or is this a theoretical issue? What circumstances do you have where you want to concatenate a large number of lists and thought that sum was an appropriate way to do it? [...]
Having non-linear complexity is not a suitable way to discourage this behaviour though.
We didn't put non-linear complexity in as a way to discourage the use of sum. The non-linear complexity comes about as a natural consequence of the behaviour of lists, it is not a "passive-aggressive runtime". There is no practical way for us to detect ahead of time all imaginable types where repeated addition has non-linear behaviour. And although sum was intended for use only with numbers, there is no *hard* rule in Python that people can't use functions for unintended purposes that make sense to them. It may even be that there are people who consider the convenience of sum worth it even for lists, since quadratic runtime behaviour is still fast for small enough N. I sometimes use it myself, in the REPL, to merge a handful of lists. If it takes 100 ms instead of a microsecond, I don't even notice. The performance error for strings should be considered an anomaly, not a feature to be extended to anything that could be used, or misused, with non-linear behaviour. At the very least, we would probably need to see evidence that this issue (poor performance) is a widespread problem before breaking code which works fine for small N. -- Steve

I’m pretty sure that using sum with strings was a real issue in real code before it was disallowed. But the irony is that strings in the cPython interpreter have an optimization that makes it actually work fine :-( I’d rather remove the error for strings than add more Type limitations. -CHB At the very least, we would probably need to see
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Thu, Jun 17, 2021 at 9:04 AM Christopher Barker <pythonchb@gmail.com> wrote:
That *can* make it work. It doesn't always. There are limitations on the optimization.
I’d rather remove the error for strings than add more Type limitations.
I'm not sure why it doesn't special-case it to "".join() instead of erroring out, but presumably there's a good reason. Given that it can't easily be done in a cross-interpreter efficient way, it's better to have people do things in the reliable way rather than depend on a CPython implementation detail. For instance, this code will probably work in CPython: def modify_file(fn): data = open(fn).read() mutate(data) open(fn, "w").write(data) But we don't do that, because it won't be reliable on all interpreters. Are *all* Python interpreters going to be required to optimize strings the way CPython does? If not, it's better to not encourage code to rely on it. ChrisA

On Thu, Jun 17, 2021 at 9:46 AM Oliver Margetts <oliver.margetts@gmail.com> wrote:
I'm not sure why it doesn't special-case it to "".join() One reason might be because you'd have to read the entire iterator to know if it really was only strings. So there are concerns with generators which complicate things a fair bit
Perhaps, but I'm not sure what's actually being verified here. The check is applied to the start parameter. (If you don't set start, you'll get a TypeError for trying to add an int and a str.) In fact, if you disrupt that check, sum() is quite happy to add strings together - with potentially quadratic runtime:
Either way, if there's a non-str part way through, it'll bomb when it reaches that. I've no idea what the reason is for not bouncing it straight into join(), but I'm sure there must be one. ChrisA

On 2021-06-17 00:57, Chris Angelico wrote:
I wonder whether we could introduce a new dunder __sum__ for summing sequences. It would call type(start).__sum__ with the start value and the sequence. str.__sum__ would use str.join and list.__sum__ would use list.extend. The fallback would be to do what it does currently.

On Wed, Jun 16, 2021 at 04:01:24PM -0700, Christopher Barker wrote:
I’m pretty sure that using sum with strings was a real issue in real code before it was disallowed.
But not with the sum() function, which has prohibited strings from it's introduction in 2.3 :-) https://docs.python.org/release/2.3/lib/built-in-funcs.html You are right that the reason the restriction was baked in was because of real issues. -- Steve

I came across this just processing JSON data - flattening a list of 'deals' into a list of individual 'trades'. Flattening nested lists is not uncommon. And yes sum seemed the natural way to do it at the time. Yes, you're right some people might find it convenient and use it in the REPL. It just seems like a decision has been made that that's the wrong way to do it. Hence no fast paths or optimisations. As others have mentioned, you -could- go the opposite way and optimise for some builtins - at the expense of surprise for user-defined types. I think it would be good to make it harder for people to get it wrong. What kind of evidence are you taking about here? There are several bug reports. Code audits? Tricky to obtain, but I could try if this is of genuine interest. On Wed, 16 Jun 2021, 11:16 pm Steven D'Aprano, <steve@pearwood.info> wrote:

On Wed, Jun 16, 2021 at 7:35 PM Oliver Margetts <oliver.margetts@gmail.com> wrote:
I remember the discussion in 2013 about the quadratic performance of summing strings. There were proposals for optimization of the string case, but they amounted to using the "fast path" sometimes present in `.__iadd__()` take over the `.__add__()` operation. In fact, I gave keynotes at PyCon-UK 2013, and PyCon-ZA 2014 on exactly this discussion (but really more about what it shows about Python design: https://gnosis.cx/pycon-za-2014/Keynote-Ideas.pdf). Even though I used the `sum(list_of_lists)` example in my presentations, I don't think I have EVER encountered it in the wild. I'm sympathetic to raising an exception on `sum(list_of_lists)` similar to `sum(list_of_strings)`. But what exactly is the recommended substitute? We have this: list(chain.from_iterable(list_of_lists)) Which is pretty nice, and what I'd probably do, but it DOES require importing from itertools to use it. That feels like a drawback, at least a minor one. -- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.

On Wed, Jun 16, 2021 at 8:04 PM David Mertz <mertz@gnosis.cx> wrote:
IIUC, the idea behind raising, rather than optimizing, is that there is a "better" way to do it, and it's a good lesson. After all, sum() aside, we also want to discourage folks from doing a lot of string concatenation in a loop as well. I'm sympathetic to raising an exception on `sum(list_of_lists)` similar to
I agree -- str.join() is built in and really idiomatic Python -- itertools.chain is, I suppose idiomatic Python, but there's the import, a lot of extra typing, and a substantial cognitive load to figure it out the first (few) times you use it. (I don't think I ever have...) If, as suggested, flattening a list of lists is a common operation, a nice clean and efficient built in way to do it would be reasonable. Heck, you could make it a list method :-) If you read the BPO the OP linked, that was a suggested patch to optimize sum(list_of_lists) -- I'm not sure that's such a bad idea after all. -CHB NOTE: One reason I've never had to flatten a list of lists is because I'm a heavy numpy user -- I'd be more likely to have an ndarray and flatten that -- which is very easy and efficient :-) -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Thu, Jun 17, 2021, 1:14 AM Christopher Barker
The proposal was to drop in .__iadd__() for .__add__(), wasn't it? As a heavy NumPy user, you know those sometimes have different semantics. I actually showed that in my 7 year old talk as one argument against it. More-itertools has flatten(): https://more-itertools.readthedocs.io/en/stable/api.html#more_itertools.flat.... That seems better than a method specific to lists.


On Wed, Jun 16, 2021 at 10:24 PM David Mertz <mertz@gnosis.cx> wrote:
well, yes, but the numpy example was brought up -- and a patch that addressed that was offered. I think it's clearly possible to optimize certain types -- and maybe all Sequence container types. The question is whether that's a good idea. The fact that it was decided to raise for strings, when an optimization could have been added answers that question. Despite my personal opinion, I think the only options are to raise for more types, with a helpful message, or just leave it alone. -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Wed, Jun 16, 2021 at 10:13:40PM -0700, Christopher Barker wrote:
People have been talking about a flatten builtin or list method since Python 1.5 days. Perhaps it is time to do a PEP? Here is one example implementation from eleven years ago: https://code.activestate.com/recipes/577250-flatten-a-list/ Start designing the API for flatten and one quickly discovers that the problem isn't how to implement it, but the number of options that people may, or may not, want. What counts as an atomic type? Flatten all the way down or just one level? Return a list or an iterator? Inplace or a new list? Etc. -- Steve

Arbitrary and complex nested structures do seem like they would require a complex solution. OTOH `more_itertools.flatten` seems ergonomic - and it is very simple, just a wrapper around `itertools.chain.from_iterable` with a memorable name. If that's the preferred solution, nudging users in the direction of those functions, or a comprehension, would be useful (thinking here there should be one-- and preferably only one --obvious way to do it). Maybe deprecation is extreme, but what about a warning ("looks like you're trying to X maybe you should Y")? Adding hints might become annoying (think "Clippy the office assistant"), a solution with added complexity would be to warn when you get to the 100th list/tuple - so you can still sum a few lists in the REPL. On Thu, 17 Jun 2021 at 10:46, Steven D'Aprano <steve@pearwood.info> wrote:

17.06.21 06:03, David Mertz пише:
And it is equivalent to pure Python code [x for chunk in list_of_lists for x in chunk] It has linear complexity and is only by a constant factor slower because it iterates loops in bytecode instead of the C code. It would be possible to make the compiler recognizing such pattern and generating more efficient bytecode (LIST_EXTEND instead of an internal loop with LIST_APPEND), but I am not sure that this case is common enough to complicate the compiler.

On Thu, Jun 17, 2021 at 12:37 AM Serhiy Storchaka <storchaka@gmail.com> wrote:
And it is equivalent to pure Python code
[x for chunk in list_of_lists for x in chunk]
Okay, slightly off-topic, but can we *please* allow [*chunk for chunk in list_of_lists] some day. I think it was left out because some discussion concluded it would be too confusing, which is ridiculous. I assumed it would work and was confused to find that it didn't. It's blatantly inconsistent. It's not even the performance I care about, it's the unreadability of having an extra "for i_have_to_think_of_yet_another_variable_name in whatever" at the end of the list comprehension (at maximum distance from where the variable is used). I've wished for this feature ridiculously many times.
In my personal experience it's very common.

On Thu, Jun 17, 2021, 5:24 PM Ben Rudiak-Gould
Okay, slightly off-topic, but can we *please* allow
[*chunk for chunk in list_of_lists]
It is completely non-obvious to me what that would even MEAN. I cannot derive anything obvious from other uses of *. If I had to guess, I'd think that this was tuple unpacking, and the result would be a list of tuples. However, apparently in your mind there is some way to read this as "flatten." I don't know how to get there mentally (other than just memorizing a weird behavior).

On Thu, Jun 17, 2021 at 5:52 PM Jelle Zijlstra <jelle.zijlstra@gmail.com> wrote:
It is completely non-obvious to me what that would even MEAN. I cannot
derive anything obvious from other uses of *.
I guess I can kinda see that analogy. But a loop is an assignment, and this doesn't mean "flatten": a, b, *c = 1, 2, 3, 4, 5, 6 Moreover, in a regular loop:
Neither of my examples, I admit, are quite the same as the comprehension. Nonetheless, I definitely don't think the proposed syntax is "the one obvious way to do it." -- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.

On Thu, Jun 17, 2021 at 02:51:44PM -0700, Jelle Zijlstra wrote:
Oh, that's clever, and I might even have thought of that myself if it wasn't described as "off-topic" *wry smile* So in a generator comprehension, what would it do? (*chunk for chunk in values) # equivalent to... ? # (item for chunk in values for item in chunk) perhaps? I guess we could allow an equivalent in dict comprehensions: {**chunk for chunk in values} for unpacking nested dicts or (key,value) pairs. Clever... or maybe *too* clever? -- Steve

David Mertz writes:
Pretty clearly the * means to unpack the chunk, and substitute extend for append, no? result = [] for chunk in list_of_lists: result.extend(chunk) You write elsewhere that a loop is an (iterated) assignment (to chunk), but it's *not* an assignment to result, it's an in-place modification. It did take me a bit of thought to come to Ben's intended interpretation, but I think if you explain it this way, it becomes obvious to the non-Dutch. I'll grant that this doesn't *really* work for genexps, but I think of those as very lazy, very forgetful lists, so WFM. Serhiy writes that a reason for not allowing this is that you'd want to allow [x, y for x in l], splicing the x, y values into the result list. That doesn't make sense to me, for two reasons. x, y already has a meaning in that context, and the result should be a list of pairs, each with y as the second element. On the other hand, the r-value *(x, y) requires a context into which it can be spliced: w, z = *(x, y) is a syntax error. As the result element in a comprehension, the context is clearly the comprehension being built. Perhaps Serhiy meant a similar but different syntax that's problematic? But if not, I kinda like this. Steve

On Fri, Jun 18, 2021 at 10:10:33PM +0900, Stephen J. Turnbull wrote:
So perhaps not actually that clear then? *wink* In hindsight, after realising what Ben's intention was (at least we assume that is what he was thinking of) it does seem like a reasonable way of unpacking multiple items into a list or set comprehension at once. I don't think that this is something that we could say was "intuitively obvious" to somebody who is neither Dutch not experienced with Python, but I think it is retroactively obvious once explained. Unpacking comprehensions: [*item for item in sequence] {*item for item in sequence] are syntactic sugar for something roughly equivalent to: result = [] # or set() for item in sequence: for tmp in item: result.append(tmp) # or result.add The exact implementation could vary, e.g. by using `extend` for list comprehensions. That suggests a meaning for double-star unpacking in a dict comprehension (single-star unpacking would make it a set). {**item for item in seq} # equivalent to: result = {} for item in seq: result.update(item) And these suggest meanings for the equivalent in generator expressions: (*item for item in sequence) # equivalent to: (tmp for item in sequence for tmp in item) The double-star version would follow similar rules to dict.update: (**item for item in sequence) # equivalent to: def gen(): for item in sequence: if hasattr(item, 'keys'): for k in item: yield (k, item[k]) else: for k, v in item: yield (k, v) Works for me. -- Steve

On Thu, Jun 17, 2021 at 02:22:29PM -0700, Ben Rudiak-Gould wrote:
Okay, slightly off-topic, but can we *please* allow
[*chunk for chunk in list_of_lists]
What would that do? The only thing I can guess it would do is the equivalent of: result = [] for chunk in list_of_lists: result.append(*chunk) which is a long and obfuscated way of saying `raise TypeError` :-) Well, there is this: result = [] for chunk in list_of_lists: *temp, = chunk result.append(temp) which would make it an obfuscated way to spell `list(chunk)`.
Blatently inconsistent with what? I have no idea what you are contrasting the non-support of sequence unpacking with. It's not this: >>> chunk = (1, 2, 3) >>> t = *chunk File "<stdin>", line 1 SyntaxError: can't use starred expression here but I can't tell what you're thinking of. Some context with sequence unpacking that is "slightly off-topic". -- Steve

On Thu, Jun 17, 2021 at 3:09 PM Steven D'Aprano <steve@pearwood.info> wrote:
The difference between chunk and *chunk in the expression of a list comprehension would be the same as the difference between them in the expressions of a starred_list. The only thing I can guess it would do is the
It would be reasonable to allow list.append to take any number of arguments to be appended to the list, as though its definition was def append(self, *args): self.extend(args) If it did, then that translation would work and do the right thing. Some similar functions do accept multiple arguments as a convenience, though it's not very consistent: myset.add(1, 2) # no myset.update([1, 2], [3, 4]) # ok mylist.append(1, 2) # no mylist.extend([1, 2], [3, 4]) # no mydict.update({'a': 1}, b=2, c=3) # ok mydict.update({'a': 1}, {'b': 2}, c=3) # no Well, there is this:
Unpacking would be useless in every context if you interpreted it like that.

18.06.21 00:22, Ben Rudiak-Gould пише:
It was originally proposed in PEP 448 (Additional Unpacking Generalizations) but was excluded after discussing. If we allow [*chunk for chunk in list_of_lists] we should allow also [x, y for x in a] which would be equivalent to [a[0], y, a[1], y, a[2], y, ...]

On Fri, Jun 18, 2021 at 07:38:49AM -0700, Guido van Rossum wrote:
We already have a rule to disambiguate generator comprehensions: they must always be parenthesized unless they are already parenthised: g = (y for y in a) # parens required t = 999, (y for y in a) # parens required func((y for y in a)) # inner parens optional
That’s a good enough reason for me to also disallow *chunks.
That's an odd way to look at it. We must disallow an unambiguous syntax because a completely different syntax would have been ambiguous if we didn't already have a rule in place that disambiguates it. -- Steve

On Fri, Jun 18, 2021 at 8:40 PM Steven D'Aprano <steve@pearwood.info> wrote:
Yes, that's exactly what I was referring to.
Maybe, but the two are not unrelated. In other contexts, when we accept "*chunk", and 'chunk' equals "1, 2", we also accept "1, 2" in the original position, and it means the same thing. This is useful as an explanation of the semantics of the unary '*' operator[1]. For example # Given: chunk = 1, 2 # Equivalent: f(*chunk) f(1, 2) # Equivalent: [*chunk] [1, 2] So then if we were to allow this: [*chunk for chunk in ...] we ought to consider this equivalent: [1, 2 for chunk in ...] (Note there's nothing that says the expressions to the left of 'for' need to involve the for-control variable 'chunk'. :-) Now, this shouldn't be considered an airtight argument against [*chunk for ...], but it does show that there's no straightforward explanation of its meaning through equivalence (like the OP seemed to think), and I think this is what Serhiy was also getting at in his post. __________ [1] Does the unary star operator have a name when used like this in Python? In JavaScript the equivalent syntax ("...chunk", where the "..." are a literal ellipsis) is called "spread". We could borrow this term. -- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>

On Fri, Jun 18, 2021 at 09:33:49PM -0700, Guido van Rossum wrote:
Indeed. I was initially confused by what Ben thought was a simple and obvious connection between star unpacking in some other contexts and his suggestion for comprehensions. The analogy with `[*a]` never crossed my mind, and I don't think that we should look at this as literally the application of sequence unpacking in a comprehension, for reasons I gave in my earlier post. But having it explained to me, I think that treating this as an analogy rather than literal unpacking works. We already give unary star and double star a number of meanings, not all of which are related: - import wildcard; - capture positional and keyword parameters `def func(*args, **kw)` - sequence and keyword unpacking in function calls; - sequence capture in assignment targets `head, *a, tail = items` - sequence unpacking in list etc displays; Have I missed any? We could define *star comprehensions* as syntactic sugar for nested comprehensions, to aid in flattening nested sequences and mappings. [*expression for name in sequence if condition] results in this: result = [] for name in sequence: if condition: for tmp in expression: result.append(tmp) return result I haven't thought deeply into this, but I think that if the starred expression is anything but a simple name, it may require parentheses? *name # okay *(name.attr or []) # everything else needs parens Alternatively, we could just do something that people have been asking about since Python 1.5 and provide a flatten builtin or list method :-) -- Steve

On Fri, Jun 18, 2021 at 10:15 PM Steven D'Aprano <steve@pearwood.info> wrote:
What you seem to be (intentionally) missing is that all but the import wildcard *are* closely related, even though they are specified separately in the syntax. (Saying that they are unrelated would be like saying that the name occurring after a 'def' keyword and the function name occurring in a function call are unrelated. :-)
The grammar for the last three forms you give allows pretty much any expression, for example https://github.com/python/cpython/blob/291848195f85e23c01adb76d5a0ff9c6eb7f2...
Alternatively, we could just do something that people have been asking about since Python 1.5 and provide a flatten builtin or list method :-)
Probably a builtin, since in my mind I want to write flatten(a), never a.flatten(), and it would be useful for any kind of sequence of sequences (or even iterable of iterables).

I think I would find flatten(a) more readable than [*chunk for chunk in a], and more discoverable: this operation is called "flatten" in other languages, so users are going to search the docs or help for that.

But there could be endless debate about whether flatten( ("x", "y") ) should return a list or a tuple...

--Guido van Rossum (python.org/~guido)
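A minimal sketch of what such a builtin might look like, assuming one level of flattening and a list result (both of which are exactly the open questions):

    def flatten(iterable_of_iterables):
        # Hypothetical builtin: flatten one level, always return a list.
        result = []
        for inner in iterable_of_iterables:
            result.extend(inner)
        return result

    assert flatten([[1, 2], [3]]) == [1, 2, 3]
    assert flatten((("x",), ("y",))) == ["x", "y"]   # tuples in, list out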

On 19.06.2021 17:03, Guido van Rossum wrote:
+1
But there could be endless debate about whether flatten( ("x", "y") ) should return a list or a tuple...
Have it return an iterator :-)

flatten() would be in the same category of builtins as reversed() and enumerate().

I think we'll see more discussion about exactly how to flatten the structures, e.g. do you stop at strings or flatten them into lists of characters? But I'm sure we'd reach a sensible default which makes most happy.

-- Marc-Andre Lemburg
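A sketch of the iterator version, with one possible answer to the string question (an assumption, not a settled default): treat strings as atoms.

    def flatten(iterable):
        # Hypothetical iterator form: one level deep; strings and bytes
        # pass through whole instead of splitting into characters.
        for item in iterable:
            if isinstance(item, (str, bytes)):
                yield item
            else:
                yield from item

    assert list(flatten([[1, 2], [3]])) == [1, 2, 3]
    assert list(flatten([["ab"], "cd"])) == ["ab", "cd"]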

On 19.06.2021 17:17, Serhiy Storchaka wrote:
Well, like I said: modulo the discussion around what "flatten" should mean, e.g. you will probably want to have flatten() go a certain number of levels deep and not necessarily flatten strings. But yes, such a definition is certainly a good start.

-- Marc-Andre Lemburg

Flatten is used in numpy: https://numpy.org/doc/stable/reference/generated/numpy.ndarray.flatten.html?... where it is a method, and goes all the way to one dimension.

A builtin flatten can’t be the same, but it would be nice if it weren’t too different. E.g. list(flatten(a_2d_array)) would do the same thing as list(a_2d_array.flatten()).

For a builtin, perhaps it could take an optional integer “depth” parameter, which would provide flexibility and be one way to control its behavior with strings.

-CHB
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Sun, Jun 20, 2021 at 1:07 PM Christopher Barker <pythonchb@gmail.com> wrote:
Flatten is used in numpy: Where it is a method, and goes all the way to one dimension.
I think it's worth keeping in mind the differences though. In NumPy, arr.flatten() doesn't even TOUCH the array itself. It is solely a manipulation of the `.shape` attribute. In essence, we could define it like this:

    from functools import reduce
    from operator import mul

    arr.flatten = lambda a: a.reshape(reduce(mul, a.shape, 1))

Actual NumPy arrays don't allow attaching a monkey-patch method, but if they did that would be it. I guess this example relies on knowing that `.reshape` also doesn't touch the data though (in particular, zero memory copies).

A builtin flatten can’t be the same, but it would be nice if it weren’t too different.
I think that would be nice as a signature. I don't really care about builtin vs itertools, but something like `flatten(seq, depth=2)` would be handy. The question is really whether the default depth is 1 or sys.maxsize. Both have plausible cases. Some special depth like 2 or 3 could be useful at times, but definitely should not be default. -- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
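A sketch with that signature, keeping the default-depth question visible (depth=1 here is an assumption, not a decision):

    import sys
    from collections.abc import Iterable

    def flatten(iterable, depth=1):
        # Hypothetical signature from this thread; strings and bytes
        # are treated as atoms so they never explode into characters.
        for item in iterable:
            if (depth > 0 and isinstance(item, Iterable)
                    and not isinstance(item, (str, bytes))):
                yield from flatten(item, depth - 1)
            else:
                yield item

    nested = [[1, [2, [3]]], [4]]
    assert list(flatten(nested)) == [1, [2, [3]], 4]            # one level
    assert list(flatten(nested, depth=sys.maxsize)) == [1, 2, 3, 4]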

On Sun, 2021-06-20 at 13:48 -0400, David Mertz wrote:
Passer-by comment. But that is not true: `flatten` explicitly copies the data and does not create a view (in case the argument was about names). NumPy has:

* `flatten()` (always copies)
* `ravel()` (copies if needed, and additionally ensures contiguity)
* `reshape(-1)` (copies if needed)

They are all subtly different, unfortunately.

- Sebastian
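Those differences are easy to observe (the array here is contiguous, so the "copies if needed" cases take the view path):

    import numpy as np

    a = np.zeros((2, 3))                # owns its data, C-contiguous

    assert a.flatten().base is None     # flatten(): always a copy
    assert a.ravel().base is a          # ravel(): a view here
    assert a.reshape(-1).base is a      # reshape(-1): a view here

    b = a[:, ::2]                       # non-contiguous view
    assert b.ravel().base is None       # now ravel() has to copy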

On Sun, 20 Jun 2021 at 23:29, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
There's also np.concatenate which to me seems like the closest to what sum does with lists. If there is a problem to solve here, it's that people want to concatenate things and reach for the sum function because the concatenate function doesn't exist.

The sum function refuses to sum strings but the preferred alternative is

    text = ''.join(strings)

which is just weird looking. Why would you call a method on the empty string? It would be much better spelt as

    text = concatenate(strings)

and that spelling could work equally well for lists, tuples, etc.

-- Oscar
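One possible spelling of such a function, dispatching on the first element so each type keeps its efficient path (the name and behaviour are this message's suggestion, not an existing builtin):

    def concatenate(iterable):
        # Hypothetical: concatenate an iterable of like-typed sequences.
        it = iter(iterable)
        try:
            first = next(it)
        except StopIteration:
            return []       # empty input; a real design would need
                            # a better answer here
        if isinstance(first, str):
            return first + "".join(it)  # strings keep the join fast path
        result = list(first)
        for seq in it:
            result.extend(seq)          # linear, unlike repeated __add__
        return type(first)(result)

    assert concatenate(["ab", "cd"]) == "abcd"
    assert concatenate([[1], [2, 3]]) == [1, 2, 3]
    assert concatenate([(1,), (2,)]) == (1, 2)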

But that’s not the problem here — sum IS concatenation for any type for which __add__ means concatenate. The problem is that repeated concatenation is inefficient for many types.

np.concatenate() exists because __add__ means something different for arrays. And it is only used for arrays, so it can be done efficiently. But how would a builtin concatenate() function know how to concatenate arbitrary objects?

I suppose there could be a __concat__ dunder. Which is what __add__ already is for Sequences. The fact is that only the type itself knows how to efficiently concatenate a lot of smaller objects. Which is why str.join() is a string method — it is about strings, and only works with strings.

Which makes me think maybe the solution is to make join() (or call it concatenate()) a Sequence method. Or, since it’s problematic to add new methods to ABCs (the name may already be used by existing custom Sequence types), go back to the __concat__ dunder, which would take an iterable of objects to concatenate.

Which inspires a new idea: extend the Sequence __add__ to take an optional additional parameter, which could be an iterable of objects to add up. I’m not sure which is more expensive — trying to pass an extra parameter to an object to see if it will accept it, or checking for the existence of a dunder. But I do like that there wouldn’t be a need for a new dunder.

What remains is a key question of what people need: a generic and efficient way to concatenate Sequences, or a way to flatten sequences. They overlap, but are not the same, particularly if there is a need to flatten more than one level deep.

-CHB
-- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
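A sketch of the dunder idea (the name __concat__ and the fallback to repeated addition are assumptions drawn from the message above):

    def concatenate(iterable):
        # Hypothetical protocol: let the first item's type do the work
        # via __concat__, else fall back to (quadratic) repeated +.
        it = iter(iterable)
        first = next(it)
        concat = getattr(type(first), "__concat__", None)
        if concat is not None:
            return concat(first, it)
        result = first
        for item in it:
            result = result + item
        return result

    class Chunks(list):
        def __concat__(self, others):
            # The type itself knows the efficient way to concatenate.
            result = Chunks(self)
            for other in others:
                result.extend(other)
            return result

    assert concatenate([Chunks([1]), Chunks([2, 3])]) == [1, 2, 3]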

19.06.21 07:33, Guido van Rossum wrote:
I would not call it an operator. An operator forms an expression, but *x is not an expression. It is no more an operator than a colon is. Is there such a term as "syntax form" in English? I would call *x, x for x in a, and x:y syntax forms. They are not expressions, but they can be part of an expression. They can only be used in certain contexts, and the meaning depends on the context (x:y means two different things in {x:y} and in a[x:y]).
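A tiny illustration of that context dependence:

    x, y = 1, 3
    d = {x: y}                      # here x:y is a key/value pair
    s = [0, 10, 20, 30, 40][x:y]    # here x:y is a slice
    assert d == {1: 3}
    assert s == [10, 20]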

As someone who originally worked on implementing PEP 448, I wanted to chime in that while I agreed with the decision to be conservative when adding new features, I originally hoped that [*x for x in xs] would eventually make its way into Python. To me, it seemed intuitive, and I hoped people would eventually come around to seeing things that way, and to Steven's similar suggestion for dicts {**x for x in xs}.

However, I had never seen Serhiy and Guido's point about the similarity to [a, b for a, b in abs]. I agree that the comparison makes sense, and that that notation is really confusing. Maybe this should be added to the PEP (https://www.python.org/dev/peps/pep-0448/) as a reminder? Given the counterpoint, I don't have a strong opinion either way. Maybe it's better to remain conservative and see where the language is at in another few years?

On another tangent, I'd like to see array[*x, y] working one day. If I remember correctly, that was an oversight. It regularly comes up when indexing numpy arrays, and needs an ugly workaround.

Best,

Neil
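The workaround in question might look like this (at the time of this thread, a[*idx, 2] was a SyntaxError, so the index tuple had to be built by hand):

    import numpy as np

    a = np.zeros((2, 3, 4))
    idx = (0, 1)

    # a[*idx, 2] was not valid syntax; instead:
    val = a[idx + (2,)]
    # or, equivalently:
    val = a[(*idx, 2)]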

18.06.21 17:38, Guido van Rossum wrote:
Yes, I think that it could be interpreted in one of the following ways:

    [x, (y for y in a)]
    [x, *(y for y in a)]
    [(x, y) for y in a]
    [*(x, y) for y in a]    # if we allow [*chunk for ...]

Any interpretation can be well-justified and formally unambiguous once we choose the one to be allowed. But it will still *look* ambiguous, so it is better to avoid such syntax in Python, which is famous for its clear syntax. I have wished that I could write just [*chunk for ...] several times per year, but I understand that there were reasons to not allow it.
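For reference, the three interpretations that are already expressible evaluate to quite different things:

    a = [1, 2]
    x = 0

    r1 = [x, (y for y in a)]        # two elements: 0 and a generator
    assert len(r1) == 2

    r2 = [x, *(y for y in a)]       # the generator is unpacked
    assert r2 == [0, 1, 2]

    r3 = [(x, y) for y in a]        # a list of pairs
    assert r3 == [(0, 1), (0, 2)]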

Why are we arguing about `[x,y for y in a]` when nobody has requested that syntax? -- Steve

On Sat, Jun 19, 2021, 5:00 AM Steven D'Aprano <steve@pearwood.info> wrote:

Why are we arguing about `[x,y for y in a]` when nobody has requested that syntax?

I've wanted many times to be able to write the star unpack there, even as a relatively modestly experienced Python user. But it has never occurred to me to write this one.

I don't see how allowing [x, y for x in a] follows from allowing [*chunk for chunk in list_of_lists].

On Fri, 18 Jun 2021, Serhiy Storchaka wrote:
The first rejected variation in the PEP looks perfectly unambiguous to me. Only in the second variation could there be multiple interpretations, but nothing worse than the difference between:

    f(x, y, z)
    f((x, y, z))

Suggested solution: require appropriate brackets or parentheses. This would leave the left form in the second variation a syntax error, but allow the right side the obvious interpretation. If someone would like to "unpack into the arguments of the call to f", that could then be an entirely separate discussion. However, discarding the first variation because the second could be ambiguous seems to me like a case of throwing out the baby with the bath water.
But why?
which would be equivalent to
[a[0], y, a[1], y, a[2], y, ...]
I don't see that equivalence; what are you building it on? We already have

    [x, (y for x in a)]
    [(x, y) for x in a]

both meaning something else. Try this instead:

    [*(x, y) for x in a]
    = [*(a[0], y), *(a[1], y), *(a[2], y), ...]
    = [a[0], y, a[1], y, a[2], y, ...]

/Paul
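Paul's reading of the hypothetical form can be emulated today with a nested comprehension; a small check:

    a = [10, 20, 30]
    y = 0

    # Existing meaning:
    assert [(x, y) for x in a] == [(10, 0), (20, 0), (30, 0)]

    # The proposed meaning of [*(x, y) for x in a], emulated:
    assert [v for x in a for v in (x, y)] == [10, 0, 20, 0, 30, 0]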
participants (21)

- Ben Rudiak-Gould
- Chris Angelico
- Christopher Barker
- Damian Shaw
- David Mertz
- Greg Ewing
- Guido van Rossum
- Jelle Zijlstra
- Marc-Andre Lemburg
- Mathew Elman
- MRAB
- Neil Girdhar
- Oliver Margetts
- Oscar Benjamin
- Paul Svensson
- Ricky Teachey
- Rob Cliffe
- Sebastian Berg
- Serhiy Storchaka
- Stephen J. Turnbull
- Steven D'Aprano