I propose a syntax for constructing/filtering strings analogous to the one available for all other builtin iterables. It could look something like this.
dirty = "f8sjGe7" clean = c"char for char in dirty if char in string.ascii_letters" clean
'fsjGe'
Currently, the best way to do this (in the general case) seems to be the following.
clean = "".join(char for char in dirty if char in string.ascii_letters)
But I think the proposed syntax would be superior for two main reasons.
- Consistency with the comprehension style for all other iterables (which seems to be one of the most beloved features of python)
- Confusion surrounding the str.join(iter) syntax is very well documented (https://stackoverflow.com/questions/493819/why-is-it-string-joinlist-instead-of-list-joinstring) and I believe it is particularly unintuitive when the string is empty
I also believe the following reasons carry some weight.
- Skips unnecessary type switching from str to iter and back to str
- Much much MUCH more readable/intuitive
Please let me know what you all think. It was mentioned (by @rhettinger) in the PBT issue https://bugs.python.org/issue43900 that this will likely require a PEP which I would happily write if there is a positive response.
Hi David
I see where you are coming from. I find it helps to think of sep.join as a special case. Here's a more general join, with sep.join equivalent to genjoin(sep, '', '').
def genjoin(sep, left, right):
    def fn(items):
        return left + sep.join(items) + right
    return fn
Here's how it works:

genjoin('', '', '')('0123') == '0123'
genjoin(',', '', '')('0123') == '0,1,2,3'
genjoin(',', '[', ']')('0123') == '[0,1,2,3]'
All of these examples of genjoin can be thought of as string comprehensions. But they don't fit into your pattern for a string comprehension literal.
By the way, one might want something even more general. Sometimes one wants a fn such that
fn('') == '[]'
fn('0') == '[0,]'
fn('01') == '[0,1,]'
which is again a string comprehension.
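For what it's worth, here is one rough sketch of such an fn in the same style as genjoin; the name genjoin_trailing is just for illustration:

def genjoin_trailing(sep, left, right):
    # like genjoin, but every item gets a trailing sep (so fn('01') == '[0,1,]')
    def fn(items):
        return left + ''.join(str(item) + sep for item in items) + right
    return fn

fn = genjoin_trailing(',', '[', ']')
assert fn('') == '[]'
assert fn('0') == '[0,]'
assert fn('01') == '[0,1,]'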
I hope this helps.
On Sat, May 1, 2021 at 2:52 AM Jonathan Fine jfine2358@gmail.com wrote:
[...]
For those cases where you're merging literal parts and generated parts, it may be of value to use an f-string:
f"[{','.join('0123')}]"
'[0,1,2,3]'
The part in the braces is evaluated as Python code, and the rest is simple literals.
ChrisA
On Fri, Apr 30, 2021 at 6:00 PM Chris Angelico rosuav@gmail.com wrote:
[...]
For readability, reuse and testing, I think it often helps to have a function (whose name is meaningful). We can get this via:

as_list_int_literal = genjoin(',', '[', ']')
It would also be nice to allow as_list_int_literal to have a docstring (which could also be used for testing).
I accept that in some cases Chris's ingenious construction has benefits.
I appreciate the feedback, but I don't think the proposed ideas address any of my points.
1. *Consistency* (with other comprehensions)
2. *Intuitiveness* (as opposed to str.join(iter), which is widely deemed to be confusing and seemingly-backwards)
3. *Efficiency* (with respect to line count and function calls... though perhaps the CPython implementation could actually avoid the type switching and improve time complexity)
4. *Readability* (due to *much* clearer typing and lack of highly-nested function calls ( f"[{','.join('0123')}]" ) and higher-order functions ( genjoin('', '', '')('0123') ))
I would also like readers/commenters to consider the fact that, though I have only provided one use-case, the proposed enhancement would serve as the primary syntax for constructing or filtering a string *when dependent on any other iterable or condition*. I believe this to be an extremely common (almost universal) use-case. Here are just a couple more examples.
new = c"x.lower() for x in old if x in HARDCODED_LIST" # filter-in chars that appear in earlier-defined HARDCODED_LIST and convert to lower new = c"x for x in old if not x.isprintable()" # filter-in non-printable chars new = c"str(int(x) + 1) for x in old if isinstance(x, int)" # increment all integers by 1
To me, it is hard to see how any argument against this design (for anything other than implementation-difficulty or something along these lines) can be anything but an argument against iter comprehensions in general... but if someone disagrees, please say so.
My goal is to *decrease* complexity, and personal/higher-order/nested procedures do not accomplish this in my eyes.
Thank you.
DQAL
Small correction: isinstance(x, int) should be x.isdigit() in the last example.
On 2021-04-30 at 14:14:50 -0400, David Álvarez Lombardi alvarezdqal@gmail.com wrote:
[...]
new = c"x.lower() for x in old if x in HARDCODED_LIST" # filter-in chars that appear in earlier-defined HARDCODED_LIST and convert to lower new = c"x for x in old if not x.isprintable()" # filter-in non-printable chars new = c"str(int(x) + 1) for x in old if isinstance(x, int)" # increment all integers by 1
[...]
My goal is to *decrease* complexity, and personal/higher-order/nested procedures do not accomplish this in my eyes.
Embedding a[nother] domain specific language in a string also doesn't decrease complexity; look at all the regular expression builders.
Unless you're a core developer (or perhaps not even then), I suspect that most library functions started as "personal" functions. Hey, here's something I need for this project ... hey, I just wrote that for the last project ... how many times will I write this before I stick it in general_utilities ... let's see what python-ideas thinks ...
Add the following to your personal library and see how many times you use it in the coming weeks or months:
def string_from_iterable_of_characters(iterable):
    return ''.join(iterable)
I haven't tested anything, but string_from_iterable_of_characters should take everything inside your c-strings unchanged.
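For instance, applied to the example from the original post, it would be used like this (a quick sketch, equally untested):

import string

def string_from_iterable_of_characters(iterable):
    return ''.join(iterable)

dirty = "f8sjGe7"
clean = string_from_iterable_of_characters(
    char for char in dirty if char in string.ascii_letters
)
assert clean == "fsjGe"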
On 30/04/2021 19:14, David Álvarez Lombardi wrote:
I appreciate the feedback, but I don't think the proposed ideas address any of my points.
- *Consistency *(with other comprehensions)
You're actually adding an inconsistency: putting a comprehension inside string quotes, which no existing comprehension does.
- *Intuitiveness* (as opposed to str.join(iter), which is widely deemed to be confusing and seemingly-backwards)
Yes, I agree your examples read nicely, without the usual boilerplate. Whether this is worth adding to the language is a moot point. Every addition increases the size of the compiler/interpreter, increases the maintenance burden, and adds to the learning curve for newbies (and not-so-newbies). As far as I can see, in every case c'SOMETHING' can be replaced by ''.join(SOMETHING) or str.join('', (SOMETHING)). Having many ways to do the same thing is not a plus.
- *Efficiency *(with respect to line count and function calls... though perhaps the cpython implementation could actually avoid the type switching and improve time complexity)
It seems to me it would probably save a function call. That seems like a minor consideration.
- *Readability* (due to *much* clearer typing and lack of highly-nested function calls ( f"[{','.join('0123')}]" ) and higher-order functions ( genjoin('', '', '')('0123') ))
This seems to me to be making the same point as "Intuitiveness".
Best wishes
Rob Cliffe
(I can't hack your heading auto-numbering so they've all ended up being numbered 1.)
On Sat, May 1, 2021 at 6:23 AM Rob Cliffe via Python-ideas python-ideas@python.org wrote:
As far as I can see in every case c'SOMETHING' can be replaced by ''.join(SOMETHING) or str.join('', (SOMETHING)). Having many ways to do the same thing is not a plus.
(We can ignore the str.join('', THING) option, as that's just a consequence of the way that instance method lookups work, and shouldn't happen in people's code (although I'm sure it does).)
If people want a more intuitive way to join things, how about this?
>>> class Str(str):
...     __rmul__ = str.join
...
>>> ["a", "b", "c"] * Str(",")
'a,b,c'
Or perhaps:
>>> class Str(str):
...     def __rmul__(self, iter):
...         return self.join(str(x) for x in iter)
...
>>> ["a", 123, "b"] * Str(" // ")
'a // 123 // b'
If you want an intuitive way to join strings, surely multiplying a collection by a string makes better sense than wrapping it up in a literal-like thing. A string-literal-like-thing already exists for complex constructions - it's the f-string. The c-string doesn't really add anything above that.
ChrisA
On 2021-04-30 11:14, David Álvarez Lombardi wrote:
To me, it is hard to see how any argument against this design (for anything other than implementation-difficulty or something along these lines) can be anything but an argument against iter comprehensions in general... but if someone disagrees, please say so.
The difference between your proposal and existing comprehensions is that strings are very different from lists, dicts, sets, and generators (which are the things we currently have comprehensions for). The syntax for those objects is Python syntax, which is strict and can include expressions that have meaning that is interpreted by Python. But strings can contain *anything*, and in general (apart from f-strings) their content is not parsed by Python. You can't do this:
[wh4t3ver I feel like!!! okay?^@^&]
But you can do this:
"wh4t3ver I feel like!!! okay?^@^&"
This means that the way people think about and visually comprehend strings is quite different from other Python types. You propose to have the string delimiters now contain actual Python code that Python will parse and run, but this isn't what people are used to seeing between quote marks.
I think the closest existing thing to your string comprehensions is not any existing comprehension, but rather f-strings, which are the one place where Python does potentially parse and execute code in a string. However, f-strings are different in notable ways.
First, the code in f-strings is delimited (by curly braces), so it is visually distinguished from "freeform" text within the string. Second, f-strings do not restrict the normal usage of strings for freeform text content (apart from making the curly brace characters special). So `f"wh4t3ver I feel like!!! okay?^@^&"` is a valid f-string just like it's a valid string. In your proposal (I assume), something like `c"item for item in other_seq and then the string text continues here"` would have to be a syntax error. That is, unlike f-strings (or any other existing kind of string), the string comprehension would "claim" the entire string and you could no longer put normal string content in there.
Your proposal is focusing on strings as iterables and drawing a parallel with other kinds of iterables for which we have comprehensions. But strings aren't like other iterables because they're primarily vessels for freeform text content, not structured data.
For the same reason, string comprehensions are likely to be less useful. I would look doubtfully on code that tried to do anything complex in a string comprehension, in the same way that I would look doubtfully on code that used f-strings with huge, complex expressions. It would be more readable to do whatever data preparation you need to do before creating the string and then use a simpler final step to create the string itself.
Also, string comprehensions would only facilitate the creation of simple "linear" strings which draw their content sequentially from iterables. I find that in practice, if I want to create a string programmatically, I'm not doing that. Rather, I'm pulling disparate content from different places and putting it together in a template-like fashion, in the way that f-strings or str.format() facilitate. So I don't think this proposal would have much practical use in string creation.
So overall I think your proposed string comprehensions would tend to make Python code less readable in the relatively rare cases where they were useful at all.
Given that there is very little you can test about a single character, a new construct feels excessive. Basically, the only possible question is "is it in this subset of codepoints?"
However, that use is perfectly covered by the str.translate() method already. Regular expressions also cover this well.
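For the example from the original post, either existing tool handles it (a rough, unbenchmarked sketch):

import re
import string

dirty = "f8sjGe7"

# str.translate with a table that deletes everything outside the allowed set
delete_table = {ord(c): None for c in set(dirty) - set(string.ascii_letters)}
clean = dirty.translate(delete_table)

# the same thing with a regular expression
clean_re = re.sub(r"[^A-Za-z]", "", dirty)

assert clean == clean_re == "fsjGe"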
It's kind of weird that people seem to be missing the point about this. Python already has comprehensions for all the iterable builtins except strings. The proposed syntax doesn't introduce any new concept and would simply make strings more consistent with the rest of the builtins. The argument that we can already do this with the "".join() idiom is backwards. It's something we have to do _because_ there's no way to write a string comprehension directly. Comprehensions express intent. Joining a generator expression with an empty string doesn't convey the intent that you're building a string where each character is derived from another iterable.
Also I haven't seen anyone acknowledge the potential performance benefits of string comprehensions. The "".join() idiom needs to go through the entire generator machinery to assemble the final string, whereas a decent implementation of string comprehensions would enable some pretty significant optimizations.
Strings are VERY different from other iterables. Every item in a string is itself an (iterable) string. In many ways, strings are more like scalars, and very often we treat them as such.
You could make an argument that e.g. a NumPy array of homogenous scalars is similar. However, that would be a wrong argument.
Quite literally the ONLY predicate that can be expressed about a single character is it being a member of a subset of all Unicode characters. Yes, you could express that in convoluted ways like its ord() being in a certain range, but it boils down to subset membership.

In contrast, predicates of unlimited complexity can be expressed of numbers. You can ask if an integer is prime. You can ask if the sine of the square of a float is more than pi/4. Arbitrary predicates make sense of arbitrary iterables. This is not so of the characters making up strings.
the ONLY predicate that can be expressed about a single character is it being a member of a subset of all Unicode characters
You seem to be assuming that the comprehension would be purposefully restricted to iterating over strings. The original author already provided examples with predicates that don't involve checking for a subset of characters.
old = [0, 1, None, 2]
new = c"str(x + 1) for x in old if isinstance(x, int)"
The existing "".join() idiom isn't restricted to iterating over an existing string. You also have to account for nested comprehensions. There's nothing that would prevent you from having arbitrary complexity in string comprehension predicates, just like nothing prevents you from having arbitrary predicates when you join a generator expression.
Ok... If the suggestion is to allow concatenation of arbitrary objects that aren't strings, I go from thinking it's unnecessary to thinking it's a massively horrible idea.
On Sat, May 1, 2021 at 1:43 PM Valentin Berlier berlier.v@gmail.com wrote:
old = [0, 1, None, 2]
new = c"str(x + 1) for x in old if isinstance(x, int)"
[...]
Rather than toy examples, how about scouring the Python standard library for some real examples? Find some actual existing code and show how it would be improved by this new construct. Consistency on its own is not a sufficient goal; you have to demonstrate that the change would be of material value.
ChrisA
On 2021-05-01 at 03:05:51 -0000, Valentin Berlier berlier.v@gmail.com wrote:
Also I haven't seen anyone acknowledge the potential performance benefits of string comprehensions. The "".join() idiom needs to go through the entire generator machinery to assemble the final string, whereas a decent implementation of string comprehensions would enable some pretty significant optimizations.
In certain special cases, maybe. In the general case, no. How much optimization can you do on something like the following:
c"f(c) for c in some_string if g(c)"
I'll even let you assume that f and g are pure functions (i.e., no side effects), but you can't assume that f always returns a string of length 1. Even the simpler c"c + c for c in some_string" at some point has to decide whether (a) to collect all the pieces in a temporary container and join them at the end, or (b) to suffer quadratic (or worse) behavior by appending the pieces to an intermediate accumulator as it iterates.
Also, how often do any of the use cases come up in inner loops, where performance is important?
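To make strategies (a) and (b) concrete, here is a rough sketch of the two shapes an implementation could take (f and g stand in for the arbitrary callables above):

def build_with_join(some_string, f, g):
    # (a) collect the pieces and join once at the end -- what ''.join() already does
    return "".join(f(c) for c in some_string if g(c))

def build_with_concat(some_string, f, g):
    # (b) append to an accumulator as it iterates; repeated str concatenation
    # can degrade towards quadratic behavior for large inputs
    result = ""
    for c in some_string:
        if g(c):
            result += f(c)
    return result

assert (build_with_join("f8sjGe7", str.upper, str.isalpha)
        == build_with_concat("f8sjGe7", str.upper, str.isalpha)
        == "FSJGE")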
c"f(c) for c in some_string if g(c)"
Even this example would allow the interpreter to skip building the generator object and having to feed the result of every f(c) back into the iterator protocol. This is similar to f-strings vs str.format. You could say that f-strings are redundant because they can't do anything that str.format can't, but they make it possible to shave off the static overhead of going through python's protocols and enable additional optimizations.
On Fri, Apr 30, 2021 at 11:15 PM Valentin Berlier berlier.v@gmail.com wrote:
You could say that f-strings are redundant because they can't do anything that str.format can't, but they make it possible to shave off the static overhead of going through python's protocols and enable additional optimizations.
But that was not the primary motivator for adding them to the language.
Nor is it the primary motivator for using them. I really like f-strings, and I have never even thought about their performance characteristics.
With regard to the possible performance benefits of “string comprehensions”: Python is already poorly performant when working with strings character by character. Which is one reason we have nifty string methods like .replace() and .translate(). (And .join().)
I’d bet that many (most?) potential “string comprehensions” would perform better if done with string methods, even if they were optimized.
Another note that I don’t think has been said explicitly— yes strings are Sequences, but they are a very special case in that they can contain only one type of thing: length-1 strings. Which massively reduces the possible kinds of comprehensions one might write, and I suspect most of those are already covered by string methods.
(actually, I think this is a similar point as that made by David Mertz)
-CHB
I started out seeing this, as the objecting people are putting it, as something that is really outside the scope.

But it just occurred to me that having to use str.join _inside_ an f-string expression is somewhat cumbersome.
I mean, think of a typical repr for a sequence class:
return f"MyClass({', '.join(str(item) for item in self) } )"
So, maybe, not going for another kind of string, or string comprehensions, but rather for a format spec accepted by the format mini-language that could do a "map to str and join" when the item is a generator?

This maybe would satisfy the O.P.'s request, introduce no fundamental changes in the way we think about the language, _and_ be somewhat useful.
The example above could become
return f"MyClass({self:, j}"
The "j" suffix meaning to use ", " as the separator, and map the items to "str" - this, if the option is kept terse as the other indicators in the format mini language, or could maybe be more readable (bikeshed at will) .
(Other than that, I hope it is clear I am with Steven, Chris, Christopher et al. on the objections to the 'string comprehension' proposal as it is)
But that was not the primary motivator for adding them to the language.
I don't think the original author thinks that way either about string comprehensions. I was asked about the kind of speed benefits that string comprehensions would have over using a generator with "".join() and I used f-strings as an example because the benefits would be similar.
By the way, now that I think about it, comprehensions would fit into f-string interpolation pretty nicely.
f""" Guest list ({len(people)} people): {person.name + '\n' for person in people} """
Which massively reduces the possible kinds of comprehensions one might write, and I suspect most of those are already covered by string methods.
I actually replied to David Mertz about this. String comprehensions can derive substrings from any iterable. Just like the only requirement for using a generator expression in "".join() is that it produces strings. Comprehensions can also have nested loops which can come in handy at times. And of course this doesn't mean I'm going to advocate for using them with complex predicates.
Valentin Berlier writes:
f""" Guest list ({len(people)} people): {person.name + '\n' for person in people} """
That's nice! It's already (almost[1]) legal syntax, but it prints the repr of the generator. This could work, though:
f""" Guest list ({len(people)} people): {person.name + chr(10) for person in people:5.25i} """
with i for "iterate iterable". (The iterable might need to be parenthesized if it's a generator function.) The width spec is intended to be max_elems.per_elem_width. I guess you could also generalize it to something like
f""" Guest list ({len(people)} people): {person.name, '>25s', chr(10), '' for person in people:i} """
where the 2nd element of the tuple is a format spec to apply to each element, the 3rd is the separator and the 4th the terminator. Or perhaps those parameters belong in the syntax of the 'i' format code.
I saw your later post that suggests making this default. We could tell programmers to use !s or !r if they want to see things like
<generator object <genexpr> at 0x100fde580>
Probably not, though, at least not if you want all iterables treated this way. Possibly this could be restricted to generators, especially if you use the element format as tuple syntax I proposed above rather than embed the element format spec in the overall generator spec.
Steve
Footnotes:
[1] Need to substitute 'chr(10)' for '\n' in an f-string.
I really appreciate all the feedback and all of the thought put into this idea. I wanted to make a couple of comments on some of the responses and provide my current thoughts on the idea.
--- Responses to comments ---
*All* others? Tuple, frozenset, bytes, bytearray, memoryview, enumerate, range, map, zip, reversed and filter suggest otherwise.
Yes. You are right. My use of "all" was technically incorrect. But I think it is *very* disingenuous to pretend that these types play anywhere near as central a role in python use as list, dict, and set... especially for newbies. Please try to provide contentful comments instead of "gotchas".
If we were re-doing Python from scratch, there's a good chance that we would limit ourselves to a single comprehension syntax, namely generators:

list(expression for x in items if cond)
set(expression for x in items if cond)
dict((key, value) for x in items if cond)

rather than have dedicated syntax for those three cases.
This is a very very helpful point. I will address it at the end.
The proposed syntax doesn't introduce any new concept and would simply make strings more consistent with the rest of the builtins. The argument that we can already do this with the "".join() idiom is backwards. It's something we have to do _because_ there's no way to write a string comprehension directly.
This is the mindset that I had. I understand there are other ways to do what I am asking. (I provided one in my initial post.) I am saying it relies on what I believe to be a notoriously unintuitive method (str.join) and an even more unintuitive way of calling it ("".join).
Quite literally the ONLY predicate that can be expressed about a single character is it being a member of a subset of all Unicode characters. Yes, you could express that in convoluted ways like its ord() being in a certain range, but it boils down to subset membership.
I understand if my initial examples led you to think this because I only iterated over a string "old" to construct "new", but consider the following. (I know it is a silly example but I'm just trying to get the point across.)
my_list = ["hotel", "echo", "lima" , "lima", "oscar"] new = c"x[0] for x in my_list" new
'hello'
Rather than toy examples, how about scouring the Python standard library for some real examples?
Here are 73 of them that I found by grepping through Lib.
- https://github.com/python/cpython/blob/master/Lib/email/_encoded_words.py#L9... - https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser... - https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser... - https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser... - https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser... - https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser... - https://github.com/python/cpython/blob/master/Lib/lib2to3/fixes/fix_import.p... - https://github.com/python/cpython/blob/master/Lib/lib2to3/fixes/fix_next.py#... - https://github.com/python/cpython/blob/master/Lib/lib2to3/refactor.py#L235 - https://github.com/python/cpython/blob/master/Lib/msilib/__init__.py#L178 - https://github.com/python/cpython/blob/master/Lib/msilib/__init__.py#L290 - https://github.com/python/cpython/blob/master/Lib/test/_test_multiprocessing... - https://github.com/python/cpython/blob/master/Lib/test/multibytecodec_suppor... - https://github.com/python/cpython/blob/master/Lib/test/test_audioop.py#L6 - https://github.com/python/cpython/blob/master/Lib/test/test_buffer.py#L853 - https://github.com/python/cpython/blob/master/Lib/test/test_code_module.py#L... - https://github.com/python/cpython/blob/master/Lib/test/test_code_module.py#L... - https://github.com/python/cpython/blob/master/Lib/test/test_codeccallbacks.p... - https://github.com/python/cpython/blob/master/Lib/test/test_codeccallbacks.p... - https://github.com/python/cpython/blob/master/Lib/test/test_codeccallbacks.p... - https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L149 - https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L1544 - https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L1548 - https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L1552 - https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L1556 - https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L1953 - https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L1991 - https://github.com/python/cpython/blob/master/Lib/test/test_decimal.py#L1092 - https://github.com/python/cpython/blob/master/Lib/test/test_decimal.py#L5346 - https://github.com/python/cpython/blob/master/Lib/test/test_email/test_email... - https://github.com/python/cpython/blob/master/Lib/test/test_email/test_email... 
- https://github.com/python/cpython/blob/master/Lib/test/test_fileinput.py#L91 - https://github.com/python/cpython/blob/master/Lib/test/test_fileinput.py#L92 - https://github.com/python/cpython/blob/master/Lib/test/test_fileinput.py#L93 - https://github.com/python/cpython/blob/master/Lib/test/test_fileinput.py#L94 - https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L360 - https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L366 - https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L372 - https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L378 - https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L384 - https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L391 - https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L397 - https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L403 - https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L409 - https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L415 - https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L421 - https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L427 - https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L433 - https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L455 - https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L457 - https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L459 - https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L461 - https://github.com/python/cpython/blob/master/Lib/test/test_long.py#L305 - https://github.com/python/cpython/blob/master/Lib/test/test_lzma.py#L1049 - https://github.com/python/cpython/blob/master/Lib/test/test_lzma.py#L1087 - https://github.com/python/cpython/blob/master/Lib/test/test_random.py#L914 - https://github.com/python/cpython/blob/master/Lib/test/test_random.py#L921 - https://github.com/python/cpython/blob/master/Lib/test/test_random.py#L927 - https://github.com/python/cpython/blob/master/Lib/test/test_random.py#L933 - https://github.com/python/cpython/blob/master/Lib/test/test_random.py#L968 - https://github.com/python/cpython/blob/master/Lib/test/test_re.py#L1013 - https://github.com/python/cpython/blob/master/Lib/test/test_strtod.py#L226 - https://github.com/python/cpython/blob/master/Lib/test/test_ucn.py#L192 - https://github.com/python/cpython/blob/master/Lib/test/test_ucn.py#L65 - https://github.com/python/cpython/blob/master/Lib/test/test_unicodedata.py#L... - https://github.com/python/cpython/blob/master/Lib/test/test_zipfile.py#L1833 - https://github.com/python/cpython/blob/master/Lib/tkinter/__init__.py#L268 - https://github.com/python/cpython/blob/master/Lib/unittest/test/test_asserti... - https://github.com/python/cpython/blob/master/Lib/unittest/test/test_case.py... - https://github.com/python/cpython/blob/master/Lib/urllib/parse.py#L907 - https://github.com/python/cpython/blob/master/Lib/xml/etree/ElementTree.py#L...
To me, the chosen syntax is problematic. The idea of introducing structural logic by using "" seems likely to cause confusion. Across all languages I use, quotes are generally and almost always used to introduce constant values. Sometimes, maybe, there are macro-related things that may use quoting, but as a developer, if I see quotes, I'm thinking: the runtime will treat this as a constant.
I think this is an over-simplification of the quotations syntax. Python has several prefix characters that you have to look out for when you see quotes, namely the following: r, u, f, fr, rf, b, br, rb. Not only can these change the construction syntax, but they can even construct an object of a completely different type (bytes).
Second, f-strings do not restrict the normal usage of strings for freeform text content (apart from making the curly brace characters special).
Not to nit-pick too much, but the following is a valid string but not a valid f-string.
s = f"This is a valid string but invalid f-string {}"
File "<stdin>", line 1 s = f"This is a valid string but invalid f-string {}" ^ SyntaxError: f-string: empty expression not allowed
Your proposal is focusing on strings as iterables and drawing a parallel with other kinds of iterables for which we have comprehensions. But strings aren't like other iterables because they're primarily vessels for freeform text content, not structured data.
I view this as the strongest opposition to the idea in the whole thread, but I think that seal was broken with f-strings and the {}-syntax. The proposed syntax is different from those features only in *degree* (of deviation from strict char-arrays) not in *type*. But I also recognize that the delimiters {} go a long way in helping to mentally compartmentalize chars from python code.
--- My current thoughts ---
I definitely see the drawbacks of the originally-proposed syntax, but I think it would be beneficial to the conversation for commenters to recognize that *python strings are not nearly as pure as some of the objections make them out to be*. I would be happy to hear the objection that my syntax strays *too* far, but many of the passed-around examples attest to the fact that when users see quotes, they are often already in "code-evaluation" mode (eg. f"[{','.join('0123')}]" ).
I think that a comment left by Steve was particularly helpful.
If we were re-doing Python from scratch, there's a good chance that we would limit ourselves to a single comprehension syntax, namely generators:

list(expression for x in items if cond)
set(expression for x in items if cond)
dict((key, value) for x in items if cond)

rather than have dedicated syntax for those three cases.
Would readers see any merit in a syntax like the following?
dirty = "f8sjGe7" clean = str(char for char in dirty if char in string.ascii_letters) clean
'fsjGe'
Or would it stray too far from the behavior of the str() constructor in general?
As of now, the behavior is the following.
dirty = "f8sjGe7" clean = str(char for char in dirty if char in string.ascii_letters) clean
'<generator object <genexpr> at 0x7f10fc917660>'
*I don't intend to reinvent strings, I only mean to leverage an already existing means of signifying modified string construction syntax (prefixes) to align str construction syntax with the comprehensions available for the other most common builtin iterables, avoid the notoriously unintuitive "".join syntax, and improve readability.*
Please continue to send your thoughts! I really appreciate it!
DQAL
On Mon, May 3, 2021 at 1:00 PM David Álvarez Lombardi alvarezdqal@gmail.com wrote:
Rather than toy examples, how about scouring the Python standard library for some real examples?
Here are 73 of them that I found by grepping through Lib.
Tests don't really count, so there's a small handful here. I haven't looked at them all. Some of them definitely could be done this way, but the best way to make your point is to show the current code and your proposed alternative, and show how the new syntax improves things. Not just "it could be done this way", but "this way looks massively better".
Second, f-strings do not restrict the normal usage of strings for freeform text content (apart from making the curly brace characters special).
Not to nit-pick too much, but the following is a valid string but not a valid f-string.
s = f"This is a valid string but invalid f-string {}"
File "<stdin>", line 1 s = f"This is a valid string but invalid f-string {}" ^ SyntaxError: f-string: empty expression not allowed
That's exactly because the curly braces are special. Not sure your point here?
Would readers see any merit in a syntax like the following?
dirty = "f8sjGe7" clean = str(char for char in dirty if char in string.ascii_letters) clean
'fsjGe'
Or would it stray too far from the behavior of the str() constructor in general?
The str constructor is also the generic "turn anything into a string" function. If it were not for that, I'd say it's fairly reasonable; but I don't want to see a genexp automatically pump itself and join the results just because someone printed it out. But if you wanted to make a dedicated constructor, eg str.from_substrings(iterable), that would definitely be viable. Would it be useful? Not sure.
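A rough sketch of what such a dedicated constructor could look like, here hung off a str subclass since from_substrings is only a hypothetical name:

import string

class Str(str):
    @classmethod
    def from_substrings(cls, iterable):
        # build a string by concatenating the pieces of any iterable
        return cls("".join(map(str, iterable)))

dirty = "f8sjGe7"
assert Str.from_substrings(c for c in dirty if c in string.ascii_letters) == "fsjGe"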
I don't intend to reinvent strings, I only mean to leverage an already existing means of signifying modified string construction syntax (prefixes) to align str construction syntax with the comprehensions available for the other most common builtin iterables, avoid the notoriously unintuitive "".join syntax, and improve readability.
Yes, "".join() takes some learning, but if that's the problem being solved, I'd much rather look into simpler solutions. I'd really like to see str.__rmul__() accept any iterable and join it, as mentioned earlier (or maybe that was in a related thread):
>>> class Str(str):
...     __rmul__ = str.join
...
>>> ["Hello", "world"] * Str(" ")
'Hello world'
or, if it became part of the core:
["Hello", "world"] * " "
But in terms of embedding a join expression in the middle of an f-string (NOT cases where the join is the entire expression), I do think it'd be nice to have a mutator syntax that iterates over the thing, formatting each element according to the given definition, and outputting them all together - effectively equivalent to joining with an empty string.
ChrisA
For the record I am definitely a -1 on this. The arguments against are overwhelming and the arguments for are pretty weak. However I felt the need to rebut:
Tests don't really count, so there's a small handful here.
Tests 100% count as real use cases. If this is a pattern that would be useful in test case generation then we should be discussing that. I have worked on plenty of projects which were almost exclusively documented through tests. Being able to read and write tests fluently is as important as any other piece of code.
On Sun, May 2, 2021 at 8:41 PM Chris Angelico rosuav@gmail.com wrote:
On Mon, May 3, 2021 at 1:00 PM David Álvarez Lombardi alvarezdqal@gmail.com wrote:
Rather than toy examples, how about scouring the Python standard
library for some real examples?
Here are 73 of them that I found by grepping through Lib.
https://github.com/python/cpython/blob/master/Lib/email/_encoded_words.py#L9...
https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser...
https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser...
https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser...
https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser...
https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser...
https://github.com/python/cpython/blob/master/Lib/lib2to3/fixes/fix_import.p...
https://github.com/python/cpython/blob/master/Lib/lib2to3/fixes/fix_next.py#...
https://github.com/python/cpython/blob/master/Lib/lib2to3/refactor.py#L235
https://github.com/python/cpython/blob/master/Lib/msilib/__init__.py#L178
https://github.com/python/cpython/blob/master/Lib/msilib/__init__.py#L290
https://github.com/python/cpython/blob/master/Lib/test/_test_multiprocessing...
https://github.com/python/cpython/blob/master/Lib/test/multibytecodec_suppor...
https://github.com/python/cpython/blob/master/Lib/test/test_audioop.py#L6
https://github.com/python/cpython/blob/master/Lib/test/test_buffer.py#L853
https://github.com/python/cpython/blob/master/Lib/test/test_code_module.py#L...
https://github.com/python/cpython/blob/master/Lib/test/test_code_module.py#L...
https://github.com/python/cpython/blob/master/Lib/test/test_codeccallbacks.p...
https://github.com/python/cpython/blob/master/Lib/test/test_codeccallbacks.p...
https://github.com/python/cpython/blob/master/Lib/test/test_codeccallbacks.p...
https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L149
https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L1544
https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L1548
https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L1552
https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L1556
https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L1953
https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L1991
https://github.com/python/cpython/blob/master/Lib/test/test_decimal.py#L1092
https://github.com/python/cpython/blob/master/Lib/test/test_decimal.py#L5346
https://github.com/python/cpython/blob/master/Lib/test/test_email/test_email...
https://github.com/python/cpython/blob/master/Lib/test/test_email/test_email...
https://github.com/python/cpython/blob/master/Lib/test/test_fileinput.py#L91
https://github.com/python/cpython/blob/master/Lib/test/test_fileinput.py#L92
https://github.com/python/cpython/blob/master/Lib/test/test_fileinput.py#L93
https://github.com/python/cpython/blob/master/Lib/test/test_fileinput.py#L94
https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L360
https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L366
https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L372
https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L378
https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L384
https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L391
https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L397
https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L403
https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L409
https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L415
https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L421
https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L427
https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L433
https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L455
https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L457
https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L459
https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L461
https://github.com/python/cpython/blob/master/Lib/test/test_long.py#L305
https://github.com/python/cpython/blob/master/Lib/test/test_lzma.py#L1049
https://github.com/python/cpython/blob/master/Lib/test/test_lzma.py#L1087
https://github.com/python/cpython/blob/master/Lib/test/test_random.py#L914
https://github.com/python/cpython/blob/master/Lib/test/test_random.py#L921
https://github.com/python/cpython/blob/master/Lib/test/test_random.py#L927
https://github.com/python/cpython/blob/master/Lib/test/test_random.py#L933
https://github.com/python/cpython/blob/master/Lib/test/test_random.py#L968
https://github.com/python/cpython/blob/master/Lib/test/test_re.py#L1013
https://github.com/python/cpython/blob/master/Lib/test/test_strtod.py#L226
https://github.com/python/cpython/blob/master/Lib/test/test_ucn.py#L192
https://github.com/python/cpython/blob/master/Lib/test/test_ucn.py#L65
https://github.com/python/cpython/blob/master/Lib/test/test_unicodedata.py#L...
https://github.com/python/cpython/blob/master/Lib/test/test_zipfile.py#L1833
https://github.com/python/cpython/blob/master/Lib/tkinter/__init__.py#L268
https://github.com/python/cpython/blob/master/Lib/unittest/test/test_asserti...
https://github.com/python/cpython/blob/master/Lib/unittest/test/test_case.py...
https://github.com/python/cpython/blob/master/Lib/urllib/parse.py#L907
https://github.com/python/cpython/blob/master/Lib/xml/etree/ElementTree.py#L...
Tests don't really count, so there's a small handful here. I haven't looked at them all. Some of them definitely could be done this way, but the best way to make your point is to show the current code and your proposed alternative, and show how the new syntax improves things. Not just "it could be done this way", but "this way looks massively better".
Second, f-strings do not restrict the normal usage of strings for
freeform text content (apart from making the curly brace characters special).
Not to nit-pick too much, but the following is a valid string but not a
valid f-string.
s = f"This is a valid string but invalid f-string {}"
File "<stdin>", line 1 s = f"This is a valid string but invalid f-string {}" ^ SyntaxError: f-string: empty expression not allowed
That's exactly because the curly braces are special. Not sure your point here?
Would readers see any merit in a syntax like the following?
dirty = "f8sjGe7" clean = str(char for char in dirty if char in string.ascii_letters) clean
'fsjGe'
Or would it stray too far from the behavior of the str() constructor in
general?
The str constructor is also the generic "turn anything into a string" function. If it were not for that, I'd say it's fairly reasonable; but I don't want to see a genexp automatically pump itself and join the results just because someone printed it out. But if you wanted to make a dedicated constructor, eg str.from_substrings(iterable), that would definitely be viable. Would it be useful? Not sure.
I don't intend to reinvent strings, I only mean to leverage an already
existing means of signifying modified string construction syntax (prefixes) to align str construction syntax with the comprehensions available for the other most common builtin iterables, avoid the notoriously unintuitive "".join syntax, and improve readability.
Yes, "".join() takes some learning, but if that's the problem being solved, I'd much rather look into simpler solutions. I'd really like to see str.__rmul__() accept any iterable and join it, as mentioned earlier (or maybe that was in a related thread):
>>> class Str(str):
...     __rmul__ = str.join
...
>>> ["Hello", "world"] * Str(" ")
'Hello world'
or, if it became part of the core:
["Hello", "world"] * " "
But in terms of embedding a join expression in the middle of an f-string (NOT cases where the join is the entire expression), I do think it'd be nice to have a mutator syntax that iterates over the thing, formatting each element according to the given definition, and outputting them all together - effectively equivalent to joining with an empty string.
ChrisA
On Tue, May 4, 2021 at 5:16 AM Caleb Donovick donovick@cs.stanford.edu wrote:
For the record I am definitely a -1 on this. The arguments against are overwhelming and the arguments for are pretty weak. However I felt the need to rebut:
Tests don't really count, so there's a small handful here.
Tests 100% count as real use cases. If this is a pattern that would be useful in test case generation then we should be discussing that. I have worked on plenty of projects which were almost exclusively documented through tests. Being able to read and write tests fluently is as important as any other piece of code.
That's true, but in many cases, tests are there to test specific functionality. Since no functionality is being removed, anything that's testing str.join() will need to continue testing str.join(). I didn't dig into the specific examples to see which ones were testing str.join and which ones happened to be using str.join to test something else.
ChrisA
On Mon, 3 May 2021 at 04:00, David Álvarez Lombardi alvarezdqal@gmail.com wrote:
This is the mindset that I had. I understand there are other ways to do what I am asking. (I provided one in my initial post.) I am saying it relies on what I believe to be a notoriously unintuitive method (str.join) and an even more unintuitive way of calling it ("".join).
I think this is something of an exaggeration. It's "notoriously difficult" (;-)) for an expert to appreciate what looks difficult to a newcomer, but I'd argue that while ''.join() is non-obvious at first, it's something you learn once and then remember. If it's really awkward for you, you can write `concat = ''.join` and use that (but I'd recommend against it, as it makes getting used to the idiom *other* people use that much harder).
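For example (a quick sketch; `concat` is just a local alias, not anything standard):

import string

concat = "".join  # a named alias for the idiom

dirty = "f8sjGe7"
clean = concat(ch for ch in dirty if ch in string.ascii_letters)
print(clean)  # fsjGe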
Here are 73 of them that I found by grepping through Lib.
Thank you. I only spot-checked one or two, but I assume from this list that your argument is simply that *all* occurrences of ''.join(something) can be replaced by c"something". Which suggests a couple of points:
* If it doesn't add anything *more* than an alternative spelling for ''.join, is it worth it?
* Is the fact that it's a quoted string construct going to add problematic edge cases? You can't use " inside c"..." without backslash-quoting it. That seems like it could be a problem, although I'll admit I can't come up with an example that doesn't feel contrived at the moment. In particular, is the fact that within c"..." you're writing a comprehension but you're not allowed to use unescaped " symbols, more awkward than using ''.join was originally?
To me, the chosen syntax is problematic. The idea of introducing structural logic by using “” seems likely to cause confusion. Across all languages I use, quotes are generally and almost always used to introduce constant values. Sometimes, maybe, there are macro related things that may use quoting, but as a developer, if I see quotes, I’m thinking: the runtime will treat this as a constant.
I think this is an over-simplification of the quotations syntax. Python has several prefix characters that you have to look out for when you see quotes, namely the following: r, u, f, fr, rf, b, br, rb. Not only can these change the construction syntax, but they can even construct an object of a completely different type (bytes).
On the contrary, I think you're missing the point here. When I, as a programmer, see "..." (with any form of prefix) I think "that's a constant". That's common for all quoting. I'd argue that even f-strings are very careful to avoid disrupting this intuition any more than necessary - yes, {...} within an f-string is executable code, but the non-constant part is delimited and it's conventionally limited to simple expressions. Conversely, it's basically impossible to view your c-strings as "mostly a constant value".
Also, how would c-strings be handled in conjunction with other string forms? Existing string types can be concatenated by putting them adjacent to each other:
a="hello" f"{a}, " r"world"
'hello, world'
How would c-strings work?
As code, I might want to format a generator over multiple lines. How would c-strings work with that?
( c"val.strip().upper() " c"for val in file " c"if val != '' " # Skip empty lines c"and not val.startswith(chr(34))" # And lines commented with " - chr(34) is ", but we can't use " directly without a backslash )
That doesn't feel readable to me. I could use a triple-quoted c-string, but then I have an indentation problem. Also, with triple quoting I couldn't include those comments (or could I??? You haven't said whether comments are valid *within* c-strings. But I assume not - the syntax would be a nightmare otherwise).
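For contrast, here's roughly the same generator written with the current idiom, which spreads over multiple lines and takes comments without any trouble (assuming `file` is an open text file):

result = "".join(
    val.strip().upper()
    for val in file
    if val != ""                   # Skip empty lines
    and not val.startswith('"')    # And lines commented with "
)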
Second, f-strings do not restrict the normal usage of strings for freeform text content (apart from making the curly brace characters special).
Not to nit-pick too much, but the following is a valid string but not a valid f-string.
s = f"This is a valid string but invalid f-string {}"
File "<stdin>", line 1 s = f"This is a valid string but invalid f-string {}" ^ SyntaxError: f-string: empty expression not allowed
That comes under the heading of making curly braces special...
Your proposal is focusing on strings as iterables and drawing a parallel with other kinds of iterables for which we have comprehensions. But strings aren't like other iterables because they're primarily vessels for freeform text content, not structured data.
I view this as the strongest opposition to the idea in the whole thread, but I think that seal was broken with f-strings and the {}-syntax. The proposed syntax is different from those features only in *degree* (of deviation from strict char-arrays) not in *type*. But I also recognize that the delimiters {} go a long way in helping to mentally compartmentalize chars from python code.
That's a very explicit "slippery slope" argument - "now that f-strings stopped quotes meaning constant, we can do anything we like" - and like most such arguments, it's a massive over-generalisation. f-strings were debated very carefully, and a lot of effort was put into the question of whether it broke the "literal string" intuition too much. The conclusion was that it didn't, *for that specific case*. But there's no reason to assume that the same arguments apply for other uses (and indeed, the decision was close enough that there's very good reasons to assume those arguments *won't* apply in general).
I definitely see the drawbacks of the originally-proposed syntax, but I think it would be beneficial to the conversation for commenters to recognize that python strings are not nearly as pure as some of the objections make them out to be. I would be happy to hear the objection that my syntax strays *too* far, but many of the passed-around examples attest to the fact that when users see quotes, they are often already in "code-evaluation" mode (eg. f"[{','.join('0123')}]" ).
OK. I'll object in those terms. I think your syntax proposal strays *way* too far. And I don't believe the example you gave is a good use of f-strings - I don't recall the context, but if it was from real code (rather than being a constructed example to make a point) I'd strongly insist that it be rewritten for better readability.
I think that a comment left by Steve was particularly helpful.
If we were re-doing Python from scratch, there's a good chance that we would limit ourselves to a single comprehension syntax, namely generators:
list(expression for x in items if cond)
set(expression for x in items if cond)
dict((key, value) for x in items if cond)
rather than have dedicated syntax for those three cases.
Would readers see any merit in a syntax like the following?
dirty = "f8sjGe7" clean = str(char for char in dirty if char in string.ascii_letters) clean
'fsjGe'
Or would it stray too far from the behavior of the str() constructor in general?
Would you object to using a different function rather than re-using the str constructor? How about "concat"? Or maybe ''.join?
OK, so that was a little facetious, but hopefully you get my point, that you've now reached the point where you're in effect saying that the only thing you really want to change is the name of the ''.join function.
As of now, the behavior is the following.
dirty = "f8sjGe7" clean = str(char for char in dirty if char in string.ascii_letters) clean
'<generator object <genexpr> at 0x7f10fc917660>'
And that's why reusing str isn't going to be an option. It's a backward compatibility issue. It's not so much that anyone is relying on str() returning that particular value, but they *do* rely on it returning a (semi-)readable representation of the passed in value, and not executing it if it's a generator.
I don't intend to reinvent strings, I only mean to leverage an already existing means of signifying modified string construction syntax (prefixes) to align str construction syntax with the comprehensions available for the other most common builtin iterables, avoid the notoriously unintuitive "".join syntax, and improve readability.
The fact that you're getting responses suggesting you do want to "reinvent strings" implies that your proposed syntax is being understood in a way that wasn't your intent. That in itself is a strong indicator that the syntax isn't nearly as intuitive as you'd hoped (or as a language construct for Python typically needs to be to fit in with Python's "easily readable" style).
Please continue to send your thoughts! I really appreciate it!
I hope the above was of use. Overall, I'm a strong -1 on this proposal, I'm afraid.
Paul
Summary: The argument in list(arg) must be iterable. The argument in str(arg) can be anything. Further, in [ a, b, c, d ] the content of the literal must be read by the Python parser as a Python expression. But in "this and that" the content need not be a Python expression.
Hi David
I find your suggestion a good one, in that to respond to it properly requires a good understanding of Python. This deepens our understanding of the language. I'm going to follow on from a contribution from Brendan Barnwell.
Please consider the following examples
Similarity.

>>> list( x*x for x in range(5) )
[0, 1, 4, 9, 16]
>>> [ x*x for x in range(5) ]
[0, 1, 4, 9, 16]

Difference.

>>> tmp = (x*x for x in range(5)) ; list(tmp)
[0, 1, 4, 9, 16]
>>> tmp = (x*x for x in range(5)) ; [ tmp ]
[<generator object <genexpr> at 0x7fec02319678>]

Difference.

>>> list( (x*x for x in range(5)) )
[0, 1, 4, 9, 16]
>>> [ (x*x for x in range(5)) ]
[<generator object <genexpr> at 0x7fec02319620>]

Now consider,

>>> str( x * 2 for x in 'abc' )
'<generator object <genexpr> at 0x7fec02319728>'
This last one genuinely surprised me. I was expecting 'aabbcc'. To understand this, first note the quote marks in the response. Next recall that str returns the string representation of the argument, via type(obj).__str__(obj).
My understanding of the situation is that the list comprehension [ x*x for x in range(5) ] is a shorthand for list( x*x for x in range(5) ). It works because list takes an iterable as its argument (if it has one argument). But str with one argument gives the string representation of an arbitrary object. Here's an example.
>>> list(None)
TypeError: 'NoneType' object is not iterable
>>> str(None)
'None'
Here's what Brendan wrote: The difference between your proposal and existing comprehensions is that strings are very different from lists, dicts, sets, and generators (which are the things we currently have comprehensions for). The syntax for those objects is Python syntax, which is strict and can include expressions that have meaning that is interpreted by Python. But strings can contain *anything*, and in general (apart from f-strings) their content is not parsed by Python.
In a nutshell: The argument in list(arg) must be iterable. The argument in str(arg) can be anything. Further, in [ a, b, c, d ] the content of the literal must be a Python expression, whereas in "this and that" the content need not be a Python expression.
I hope this helps.
Jonathan
On Mon, May 3, 2021 at 8:03 PM Jonathan Fine jfine2358@gmail.com wrote:
Difference.
>>> tmp = (x*x for x in range(5)) ; list(tmp)
[0, 1, 4, 9, 16]
>>> tmp = (x*x for x in range(5)) ; [ tmp ]
[<generator object <genexpr> at 0x7fec02319678>]
Closer parallel:
tmp = (x*x for x in range(5)) ; [ *tmp ]
[0, 1, 4, 9, 16]
My understanding of the situation is that the list comprehension [ x*x for x in range(5) ] is a shorthand for list( x*x for x in range(5) ).
Sorta-kinda. It's not a shorthand in the sense that you can't simply replace one with the other, but they do have very similar behaviour, yes. A genexp is far more flexible than a list comp, so the compiled bytecode for list(genexp) has to go to a lot of unnecessary work to permit that flexibility, whereas the list comp can simplify things down. That said, I think the only way you'd actually detect a behavioural difference is if the name "list" has been rebound.
But your main point (about str(x) not iterating) is absolutely correct. Perhaps, if Python were being started fresh right now, str(x) would have different behaviour, and the behaviour of "turn anything into a string" would be done by format(), but as it is, str(x) needs to come up with a string representation for x, without iterating over it (which might be impossible - consider an infinite generator).
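For illustration, format() already covers the generic "one object in, one string out" job today:

print(format(42))              # 42
print(format(3.14159, ".2f"))  # 3.14
print(format(None))            # None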
ChrisA
On Mon, May 03, 2021 at 09:04:51PM +1000, Chris Angelico wrote:
My understanding of the situation is that the list comprehension [ x*x for x in range(5) ] is a shorthand for list( x*x for x in range(5) ).
Sorta-kinda. It's not a shorthand in the sense that you can't simply replace one with the other,
Only because the `list` name could be shadowed or rebound to something else. Syntactically and functionally, aside from the lazy vs eager difference, a comprehension is a comprehension and there is nothing generator comprehensions can do that list comprehensions can't.
In Python 2 there were scoping differences between the two, but I believe that in Python 3 those have been eliminated.
but they do have very similar behaviour, yes. A genexp is far more flexible than a list comp,
Aside from the lazy nature of generator comprehensions, what else?
so the compiled bytecode for list(genexp) has to go to a lot of unnecessary work to permit that flexibility, whereas the list comp can simplify things down.
I don't think so. The bytecode in 3.9 is remarkably similar.
>>> dis.dis('list(spam for spam in eggs)')
  1           0 LOAD_NAME                0 (list)
              2 LOAD_CONST               0 (<code object <genexpr> at 0x7fc185ce0870, file "<dis>", line 1>)
              4 LOAD_CONST               1 ('<genexpr>')
              6 MAKE_FUNCTION            0
              8 LOAD_NAME                1 (eggs)
             10 GET_ITER
             12 CALL_FUNCTION            1
             14 CALL_FUNCTION            1
             16 RETURN_VALUE

Disassembly of <code object <genexpr> at 0x7fc185ce0870, file "<dis>", line 1>:
  1           0 LOAD_FAST                0 (.0)
        >>    2 FOR_ITER                10 (to 14)
              4 STORE_FAST               1 (spam)
              6 LOAD_FAST                1 (spam)
              8 YIELD_VALUE
             10 POP_TOP
             12 JUMP_ABSOLUTE            2
        >>   14 LOAD_CONST               0 (None)
             16 RETURN_VALUE
The bytecode for the list comp `[spam for spam in eggs]` is only three bytecodes shorter, so that doesn't support your comment about "a lot of unnecessary work".
`dis.dis('[spam for spam in eggs]')` can:
- skip the name lookup for list (LOAD_NAME);
- and the CALL_FUNCTION that ends up calling it;
The disassemblies of the two code objects, "<genexpr>" and "<listcomp>", have slightly different implementations but only differ by one bytecode overall.
As far as runtime efficiency, list comps are a little faster. Iterating over a 1000-item sequence is 33% faster for a list comp, but for a 100000-item sequence that drops to 25% faster. But as soon as you do a significant amount of work inside the comprehension, that work is likely to dominate the other costs.
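(If anyone wants to reproduce the comparison, something along these lines is enough; exact numbers vary by machine and Python version.)

import timeit

setup = "eggs = list(range(1000))"
print(timeit.timeit("[spam for spam in eggs]", setup, number=10_000))
print(timeit.timeit("list(spam for spam in eggs)", setup, number=10_000))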
There's definitely some overhead needed to support starting and stopping a generator, but we can argue that is an implementation detail. A sufficiently clever interpreter could avoid that overhead.
That said, I think the only way you'd actually detect a behavioural difference is if the name "list" has been rebound.
That and timing.
On Mon, May 3, 2021 at 10:08 PM Steven D'Aprano steve@pearwood.info wrote:
On Mon, May 03, 2021 at 09:04:51PM +1000, Chris Angelico wrote:
My understanding of the situation is that the list comprehension [ x*x for x in range(5) ] is a shorthand for list( x*x for x in range(5) ).
Sorta-kinda. It's not a shorthand in the sense that you can't simply replace one with the other,
Only because the `list` name could be shadowed or rebound to something else. Syntactically and functionally, aside from the lazy vs eager difference, a comprehension is a comprehension and there is nothing generator comprehensions can do that list comprehensions can't.
I mention the rebinding, but I'm not ruling out the possibility of other distinctions, perhaps due to order of execution.
but they do have very similar behaviour, yes. A genexp is far more flexible than a list comp,
Aside from the lazy nature of generator comprehensions, what else?
Yielding is bidirectional. You won't see it when you just pass it to the list() constructor, but the genexp can have values sent back into it. That entails some extra machinery that is completely unnecessary for building a list, although, as I mentioned...
so the compiled bytecode for list(genexp) has to go to a lot of unnecessary work to permit that flexibility, whereas the list comp can simplify things down.
... it's mainly just a matter of simplifications.
I don't think so. The bytecode in 3.9 is remarkably similar.
Yes, it looks similar.
>>> dis.dis('list(spam for spam in eggs)')

Disassembly of <code object <genexpr> at 0x7fc185ce0870, file "<dis>", line 1>:
  1           0 LOAD_FAST                0 (.0)
        >>    2 FOR_ITER                10 (to 14)
              4 STORE_FAST               1 (spam)
              6 LOAD_FAST                1 (spam)
              8 YIELD_VALUE
             10 POP_TOP
YIELD_VALUE followed by POP_TOP is your clue that it's bidirectional. The comprehension simply appends onto the list immediately. The genexp has to have two completely separate scopes and switch between them; the list comp runs everything in the same inner scope, building up the list.
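(A quick way to see that bidirectionality: a genexp accepts send() like any other generator, even though the sent value is simply discarded.)

gen = (x * x for x in range(5))
print(next(gen))     # 0 -- start the generator
print(gen.send(42))  # 1 -- send() resumes it just like next(); the 42 goes nowhere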
The bytecode for the list comp `[spam for spam in eggs]` is only three bytecodes shorter, so that doesn't support your comment about "a lot of unnecessary work".
"Three bytecodes shorter" conceals the fact that some bytecodes do a LOT of work. Look into how much work it takes to restart a generator, and compare that to the bytecode "APPEND_LIST".
As far as runtime efficiency, list comps are a little faster. Iterating over a 1000-item sequence is 33% faster for a list comp, but for a 100000-item sequence that drops to 25% faster. But as soon as you do a significant amount of work inside the comprehension, that work is likely to dominate the other costs.
How often are you doing a significant amount of work inside a comprehension?
There's definitely some overhead needed to support starting and stopping a generator, but we can argue that is an implementation detail. A sufficiently clever interpreter could avoid that overhead.
No, it can't - except by rewriting it as a list comp, and I'm not certain that there wouldn't be timing distinctions. A genexp cannot skip the overhead of being a generator.
That said, I think the only way you'd actually detect a behavioural difference is if the name "list" has been rebound.
That and timing.
Yes, I don't count that as a behavioural difference. Nor memory usage, within reason.
ChrisA
On Mon, May 3, 2021 at 9:04 AM Paul Moore p.f.moore@gmail.com wrote:
On Mon, 3 May 2021 at 04:00, David Álvarez Lombardi
This is the mindset that I had. I understand there are other ways to do
what I am asking. (I provided one in my initial post.) I am saying it relies on what I believe to be a notoriously unintuitive method (str.join) and an even more unintuitive way of calling it ("".join).
I think this is something of an exaggeration. It's "notoriously difficult" (;-)) for an expert to appreciate what looks difficult to a newcomer, but I'd argue that while ''.join() is non-obvious at first, it's something you learn once and then remember.
Yeah, I don't get this point at all. The `"delim".join(collection)` idiom may not be the first pattern someone thinks of. But you learn it once, maybe repeat it a second time, then it's easy.
In contrast, each time I see the "string comprehension" again, I realize more and more stumbling points that I would continue to have for years. Plus the fact that it just LOOKS UGLY is a drawback.
list(expression for x in items if cond)
set(expression for x in items if cond)
dict((key, value) for x in items if cond)
I kinda like this. I'm tempted to start writing all of these this way. And if I wanted, I could add `concat(...)` to that parallel structure easily enough.
I hope the above was of use. Overall, I'm a strong -1 on this
proposal, I'm afraid.
I'm more like -100.
On Sun, May 02, 2021 at 10:57:59PM -0400, David Álvarez Lombardi wrote:
I really appreciate all the feedback and all of the thought put into this idea. I wanted to make a couple of comments on some of the responses and provide my current thoughts on the idea.
--- Responses to comments ---
*All* others?
Tuple, frozenset, bytes, bytearray, memoryview, enumerate, range, map, zip, reversed and filter suggest otherwise.
Yes. You are right. My use of "all" was technically incorrect. But I think it is *very* disingenuous to pretend that these types play anywhere near as central a role in python use as list, dict, and set... especially for newbies.
I didn't say anything about those other iterable types playing a central role. Although now that you mention it, I *do* think that bytes, tuple, enumerate, range and zip are pretty central, even for newbies.
Please try to provide contentful comments instead of "gotchas".
One of your central arguments was that str is the only builtin iterable that doesn't have a comprehension form. That argument doesn't stand up to scrutiny.
It doesn't even stand up if we weaken the argument to common, newbie-friendly, builtin containers with dedicated syntax: as well as str, there are bytes and tuple.
No matter how you count them, the comprehension types (dict, set and list) don't exceed 50% of the candidates.
If we have str comprehensions, we'd need at least two prefixes: one for raw strings, one for regular (cooked) strings. If it's worth doing for strings, its worth doing for bytes, which likewise would need two prefixes.
Another pillar of your argument is that ''.join is "unintuitive" for newbies. I don't give much weight to that argument. Especially not for newbies.
Seriously, why do we think that people with no programming experience, who might not even know the difference between print and return or a variable and a constant, are the gold standard in being able to recognise a good language API?
It breaks my brain, and my heart, when people argue that "it's intuitive" trumps "I thought really hard and carefully about this, and this is a better way". This is why we can't have nice things :-(
Anyway, lets go back to string comprehensions.
To me, the argument that string comps could be more efficient is, at best, a weak argument. It isn't that I don't want more efficient Python code, but adding more and more specialised, single-purpose syntactic features for that efficiency is a poor way to do it. It makes the language harder to learn, and more work for implementers.
But, if the efficiency gain is large, I guess it counts as an argument. If only it were a proven optimization, not a hypothetical one.
I'm not really comfortable with having syntax that looks like a quoted string contain executable code, but f-strings broke that trail so at least you have precedence in your favour. (Although I'm not as enamoured with f-strings as many folks.)
Ultimately, I think that the three major arguments in favour are weak:
- "strings are the only (important) iterable missing a comprehension" is just wrong;
- "str.join is unintuitive" depends on whose intuition you are talking about, but even if we agree it is still a weak argument: programming has many unintuitive things that need to be learned;
- and the optimization argument is purely hypothetical, and probably not enough to justify dedicated syntax.
Another weakness is that it can only join the substrings with no separator. I've looked at a sample of my code, and around 60% of the time I'm joining substrings I've given a separator, e.g.
', '.join(...)
so a comprehension wouldn't work.
Ultimately I don't think this is a terrible idea, but so far it hasn't crossed the threshold of "benefits outweigh the costs".
To be clear, I'm -1 as well -- we just don't need it. but a few thoughts:
On Mon, May 3, 2021 at 6:32 AM Steven D'Aprano steve@pearwood.info wrote:
If we have str comprehensions, we'd need at least two prefixes: one for
raw strings, one for regular (cooked) strings.
would we? I don't think so -- because of the other arguments made here -- a string comprehension would no longer be a string literal at all. After all, "raw strings" are not a different type, they are a different literal for the same type. That is, (duh) r"this\n" creates exactly the same string as "this\\n".
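To be concrete:

print(type(r"this\n") is str)   # True -- same type
print(r"this\n" == "this\\n")   # True -- same string, different spelling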
IIUC the proposal, a string comprehension would be:
c" expr for something in an_iterable"
which would mean exactly the same as:
"".join(expr for something in an_iterable)
and thus there IS no escaping to ignore (or process) -- the expr could contain a raw string, an_iterable could be a raw string, but there's no need for a raw string comprehension, or an f-string comprehension, or ...
If it's worth doing for strings, its worth doing for bytes,
I'm not so sure about that -- bytes are far more special purpose, and there are other nifty types like bytearrays.
Another pillar of your argument is that ''.join is "unintuitive" for newbies. I don't give much weight to that argument.
I think the "".join() idiom is kinda non-intuitive -- heck I've been known, twenty years in, to absentmindedly write:
a_sequence.join(",")
Once in a while.
And my newbie students definitely get tripped up by this. But they find comprehensions pretty confusing too :-) so I don't think this would "solve" that minor problem.
Also -- I think you made this point: "intuitiveness" is nice, but it's not the primary design goal of a feature. What I do like about str.join() is that it very clearly is a string operation.
To me, the argument that string comps could be more efficient is, at
best, a weak argument.
Also, if this were a bottleneck, "".join(a_gen_expr) could be optimized.
Now that I think about it, a lot of uses of generator expressions could be optimized whenever it is iterated over right away. (Well, maybe, with the ability to rename "list" and such it would be a bit tricky).
Would it be worth it to do that? I doubt it. Those aren't often in tight loops, because they ARE the loop :-)
I'm not really comfortable with having syntax that looks like a quoted
string contain executable code, but f-strings broke that trail so at least you have precedence in your favour. (Although I'm not as enamoured with f-strings as many folks.)
I am :-)
But while f-strings do put executable code inside a string, they are more conceptually similar to regular string literals -- right down to having a raw version :-)
However, if one wants to go with that argument, maybe a different delimiter than " -- I think backticks are available -- and they were even once used to mean repr() (stringify), so not so bad.
But I'm still -1
-CHB
On Mon, May 03, 2021 at 11:49:31AM -0700, Christopher Barker wrote:
To be clear, I'm -1 as well -- we just don't need it. but a few thoughts:
On Mon, May 3, 2021 at 6:32 AM Steven D'Aprano steve@pearwood.info wrote:
If we have str comprehensions, we'd need at least two prefixes: one for
raw strings, one for regular (cooked) strings.
would we? I don't think so
On further thought, I think you're right. String comprehensions aren't literals; they're not even a hybrid "part literal, part code" like f-strings.
So scrub the raw comprehension versions. That just leaves a string version and a bytes version.
[...]
Another pillar of your argument is that ''.join is "unintuitive" for newbies. I don't give much weight to that argument.
I think the "".join() idiom is kinda non-intuitive -- heck I've been known, twenty years in, to absentmindedly write:
a_sequence.join(",")
Once in a while.
There is only one truly intuitive interface, and that is the nipple. Everything else is learned.
Just because we occasionally screw up and get syntax wrong doesn't make it "unintuitive" in any meaningful sense. We've all messed up code from time to time, especially when we're distracted, or tired and emotional.
I've been known to write dicts `{key=value}`, invariably when it is a large dict with dozens of entries. Also `import func from module`, my fingers frequently type string.strip when I meant string.split, and vice versa, and for my most embarrassing mistake I once managed to write a module with no fewer than six classes like this:
def MyClass(object):
    def __init__(self, obj):
        ...
before actually running the code and discovering that it didn't do what I wanted.
So I wouldn't read too much into the occasional typo or braino.
And my newbie students definitely get tripped up by this. But they find comprehensions pretty confusing too :-) so I don't think this would "solve" that minor problem.
Indeed. I found comprehensions confusing too, and that was despite having many years experience with the syntax that inspired it, set builder notation in mathematics. For the longest time I had to literally write out my comprehension using mathematical notation and manually translate it to Python syntax to get anywhere.
On Sat, May 01, 2021 at 03:05:51AM -0000, Valentin Berlier wrote:
It's kind of weird that people seem to be missing the point about this. Python already has comprehensions for all the iterable builtins except strings.
No it doesn't.
I count 15 builtin iterables, only three have comprehensions.
The argument that we can already do this with the "".join() idiom is backwards. It's something we have to do _because_ there's no way to write a string comprehensions directly. Comprehensions express intent.
Okay. What string comprehension do I write to express my intent to write a string containing words separated by commas?
What string comprehension do I write to express my intent to write a string containing lines separated by newlines?
What string comprehension do I write to express my intent to write a string containing substrings separated by ' - ' (space, hyphen, space)?
`str.join` can express the intent of every single one of those, as well as the intent to write a string containing substrings separated by the empty string.
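Concretely:

words = ["spam", "ham", "eggs"]
print(", ".join(words))   # words separated by commas
print("\n".join(words))   # lines separated by newlines
print(" - ".join(words))  # substrings separated by ' - '
print("".join(words))     # substrings separated by the empty string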
Joining a generator expression with an empty string doesn't convey the intent that you're building a string where each character is derived from another iterable.
Of course it does. What else could `''.join(expression)` mean, if not to build a string with the substrings derived from expression separated by the empty string?
Also I haven't seen anyone acknowledge the potential performance benefits of string comprehensions. The "".join() idiom needs to go through the entire generator machinery to assemble the final string, whereas a decent implementation of string comprehensions would enable some pretty significant optimizations.
Do you know what's worse than premature optimization? Accepting a new special-case language feature on the basis that, maybe some day, it might possibly enable a premature optimization.
If you're going to claim a micro-optimization benefit, I think you need more than just to hand-wave that "a decent implementation" would allow it. Let's start with the simplest case:
c'substring for substring in expression'
What optimizations are available for that?
I thought I had sent a response to this a few hours ago, but it seems to have been eaten by the email gremlins.
Apologies if this ends up as a duplicate.
On Fri, Apr 30, 2021 at 12:03:15PM -0400, David Álvarez Lombardi wrote:
I propose a syntax for constructing/filtering strings analogous to the one available for all other builtin iterables.
*All* others?
The builtin iterables bytearray, bytes, enumerate, filter, frozenset, map, memoryview, range, reversed, tuple and zip suggest differently.
It isn't that str is the exceptional case, it is that dict, list and set are the exceptional cases. In fact, there is a sense that this is a historical accident, that list comprehensions happened to have been invented first. If we were re-designing Python from scratch today, it is quite likely that we would have only generator comprehensions:
list(expression for x in iterable)
set(expression for x in iterable)
dict((key, value) for x in iterable)
The builtin iterables bytearray, bytes, enumerate, filter, frozenset, map, memoryview, range, reversed, tuple and zip suggest differently.
enumerate, filter, map, range, reversed and zip don't apply because they're not collections, you wouldn't be able to store the result of the computation anywhere. bytes comprehensions would make sense if string comprehensions are added. This leaves us with bytearray, frozenset and memoryview. How often are these used compared to strings, dicts, and lists?
If we were re-designing Python from scratch today, it is quite likely that we would have only generator comprehensions
I don't know about this, but unless everything besides generator expressions get deprecated the current comprehensions are here to stay and string comprehensions would fit perfectly alongside them (this is my opinion).
On Sat, May 01, 2021 at 06:21:43AM -0000, Valentin Berlier wrote:
The builtin iterables bytearray, bytes, enumerate, filter, frozenset, map, memoryview, range, reversed, tuple and zip suggest differently.
enumerate, filter, map, range, reversed and zip don't apply because they're not collections,
You didn't say anything about *collections*, you talked about builtin *iterables*.
And range is a collection:
>>> import collections.abc
>>> isinstance(range(10), collections.abc.Collection)
True
you wouldn't be able to store the result of the computation anywhere.
I don't know what this objection means. The point of iterators like map, zip and filter is to *avoid* performing the computation until it is required.
you talked about builtin *iterables*
My mistake, I reused the terminology used by the original author to make it easier to follow.
The point of iterators like map, zip and filter is to *avoid* performing the computation until it is required.
Of course. Maybe I wasn't clear enough. I don't know why we're bringing up these operators in a discussion about comprehensions. And what would a "range" comprehension even look like? To me the fact that there's no comprehensions for enumerate, filter, map, range, reversed and zip doesn't contribute to making dict, list and set exceptional cases.
As I said we're left with bytearray, frozenset and memoryview. These are much less frequently used and don't even have a literal form so expecting comprehensions for them would be a bit nonsensical. On the other hand strings, bytes, lists, dicts and sets all have literal forms but only lists, dicts and sets have comprehensions. Three out of five doesn't make them exceptional cases so it's only logical to at least consider the idea of adding comprehensions for strings (and bytes) too.
On Fri, 30 Apr 2021 at 17:08, David Álvarez Lombardi alvarezdqal@gmail.com wrote:
I propose a syntax for constructing/filtering strings analogous to the one available for all other builtin iterables. It could look something like this.
dirty = "f8sjGe7" clean = c"char for char in dirty if char in string.ascii_letters" clean
'fsjGe'
Currently, the best way to do this (in the general case) seems to be the following.
clean = "".join(char for char in dirty if char in string.ascii_letters)
But I think the proposed syntax would be superior for two main reasons.
I’m not against a specialised string generator construct per-se (I’m not for it either :) as it’s not a problem I have experienced, and I’ve been doing a lot of string parsing/formatting at scale recently) but that doesn’t mean your use-cases are invalid.
To me, the chosen syntax is problematic. The idea of introducing structural logic by using “” seems likely to cause confusion. Across all languages I use, quotes are generally and almost always used to introduce constant values. Sometimes, maybe, there are macro related things that may use quoting, but as a developer, if I see quotes, I’m thinking: the runtime will treat this as a constant.
Having a special case where the quotes are a glorified function call just feels very wrong to me. And likely to be confusing.
Steve
- Consistency with the comprehension style for all other iterables
(which seems to be one of the most beloved features of python)
- Confusion surrounding the str.join(iter) syntax is very well
documented https://stackoverflow.com/questions/493819/why-is-it-string-joinlist-instead-of-list-joinstring and I believe it is particularly unintuitive when the string is empty
I also believe the following reasons carry some weight.
- Skips unnecessary type switching from str to iter and back to str
- Much much MUCH more readable/intuitive
Please let me know what you all think. It was mentioned (by @rhettinger) in the PBT issue https://bugs.python.org/issue43900 that this will likely require a PEP which I would happily write if there is a positive response.
--
*David Álvarez Lombardi*
Machine Learning Spanish Linguist
Amazon | Natural Language Understanding
Boston, Massachusetts
alvarezdqal https://www.linkedin.com/in/alvarezdqal/
On Fri, Apr 30, 2021 at 9:06 AM David Álvarez Lombardi < alvarezdqal@gmail.com> wrote:
I propose a syntax for constructing/filtering strings analogous to the one available for all other builtin iterables. It could look something like this.
dirty = "f8sjGe7" clean = c"char for char in dirty if char in string.ascii_letters" clean
'fsjGe'
Currently, the best way to do this (in the general case) seems to be the following.
clean = "".join(char for char in dirty if char in string.ascii_letters)
If a feature like this is useful -- and I'm not sure it is -- there is a much better way to do this IMHO. Add a new format converter to the syntax for replacement fields:
*>>> f"{c for c in dirty if c in string.ascii_letters !j}"* *'fsjGe'*
where *!j* means join. It could optionally take a separator string as in this example:
*>>> f"{chr(65 + i) for i in range(4) !j('-')}"* *'A-B-C-D'*
--- Bruce
Bruce Leban writes:
where *!j* means join. It could optionally take a separator string as in this example:
Converters *could* take arguments but they currently don't: it's a simple switch on a str argument.
We already have one complex minilanguage inside {}, do we really want another?
Maybe if we use regexps .... ;-) But seriously, if you want complex conversions, you can just call a function in there, which gives you arguments if you want them. Or in this context you can wrap the object in a proxy with an appropriate __format__. This can be quite generic, and allows you to put the arguments into the format spec.
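For example, a rough sketch of such a proxy (the class name Joined is made up; the separator rides in the format spec, and anything more elaborate is left as an exercise):

class Joined:
    """Wrap an iterable so an f-string replacement field can join it."""
    def __init__(self, items):
        self.items = items
    def __format__(self, spec):
        # Whatever follows the ':' in the replacement field is the separator.
        return spec.join(str(item) for item in self.items)

print(f"[{Joined('0123'):,}]")  # [0,1,2,3]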
Steve