String comprehension

I propose a syntax for constructing/filtering strings analogous to the one available for all other builtin iterables. It could look something like this.
Currently, the best way to do this (in the general case) seems to be the following.
clean = "".join(char for char in dirty if char in string.ascii_letters)
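For reference, a self-contained version of the current idiom might look like the sketch below. The function name and the sample input are assumptions made for illustration; only the `"".join(...)` expression comes from the message.

```python
import string


def keep_ascii_letters(dirty: str) -> str:
    """Return only the ASCII letters of `dirty`, in their original order."""
    return "".join(char for char in dirty if char in string.ascii_letters)


# Hypothetical input, chosen only to exercise the filter.
clean = keep_ascii_letters("H3llo, W0rld!")
print(clean)
```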
But I think the proposed syntax would be superior for two main reasons.

- Consistency with the comprehension style for all other iterables (which seems to be one of the most beloved features of python)
- Confusion surrounding the str.join(iter) syntax is very well documented <https://stackoverflow.com/questions/493819/why-is-it-string-joinlist-instead...> and I believe it is particularly unintuitive when the string is empty

I also believe the following reasons carry some weight.

- Skips unnecessary type switching from str to iter and back to str
- Much much MUCH more readable/intuitive

Please let me know what you all think. It was mentioned (by @rhettinger) in the PBT issue <https://bugs.python.org/issue43900> that this will likely require a PEP which I would happily write if there is a positive response.

--
*David Álvarez Lombardi*
Machine Learning Spanish Linguist
Amazon | Natural Language Understanding
Boston, Massachusetts
alvarezdqal <https://www.linkedin.com/in/alvarezdqal/>

Hi David

I see where you are coming from. I find it helps to think of sep.join as a special case. Here's a more general join, with sep.join equivalent to genjoin(sep, '', '').

    def genjoin(sep, left, right):
        def fn(items):
            return left + sep.join(items) + right
        return fn

Here's how it works

    genjoin('', '', '')('0123') == '0123'
    genjoin(',', '', '')('0123') == '0,1,2,3'
    genjoin(',', '[', ']')('0123') == '[0,1,2,3]'

All of these examples of genjoin can be thought of as string comprehensions. But they don't fit into your pattern for a string comprehension literal.

By the way, one might want something even more general. Sometimes one wants a fn such that

    fn('') == '[]'
    fn('0') == '[0,]'
    fn('01') == '[0,1,]'

which is again a string comprehension.

I hope this helps.

--
Jonathan
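One possible way to get the "separator after every item" behaviour described above is sketched here. The name `genjoin_trailing` is hypothetical and not part of the message; it simply appends the separator to each item before concatenating.

```python
def genjoin_trailing(sep, left, right):
    """Like genjoin, but every item is followed by `sep` (hypothetical name)."""
    def fn(items):
        # Each item carries its own trailing separator, so an empty
        # input yields just the two delimiters.
        return left + "".join(item + sep for item in items) + right
    return fn


fn = genjoin_trailing(",", "[", "]")
print(fn(""), fn("0"), fn("01"))
```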

On Sat, May 1, 2021 at 2:52 AM Jonathan Fine <jfine2358@gmail.com> wrote:
For those cases where you're merging literal parts and generated parts, it may be of value to use an f-string:
>>> f"[{','.join('0123')}]"
'[0,1,2,3]'
The part in the braces is evaluated as Python code, and the rest is simple literals. ChrisA

On Fri, Apr 30, 2021 at 6:00 PM Chris Angelico <rosuav@gmail.com> wrote:
For those cases where you're merging literal parts and generated [...]
For readability, reuse and testing I think it often helps to have a function (whose name is meaningful). We can get this via

    as_list_int_literal = genjoin(',', '[', ']')

It would also be nice to allow as_list_int_literal to have a docstring (which could also be used for testing).

I accept that in some cases Chris's ingenious construction has benefits.

--
Jonathan
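A named function with a docstring that doubles as a test could be sketched as follows. This is only one way to do it; writing the function out by hand (rather than building it with a factory) is an assumption made so that the docstring has somewhere to live.

```python
import doctest


def as_list_int_literal(items):
    """Join string items into a bracketed, comma-separated literal.

    >>> as_list_int_literal('0123')
    '[0,1,2,3]'
    >>> as_list_int_literal([])
    '[]'
    """
    return "[" + ",".join(items) + "]"


if __name__ == "__main__":
    # Run the docstring examples as tests.
    doctest.testmod()
```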

I appreciate the feedback, but I don't think the proposed ideas address any of my points.

1. *Consistency* (with other comprehensions)
2. *Intuitiveness* (as opposed to str.join(iter), which is widely deemed to be confusing and seemingly backwards)
3. *Efficiency* (with respect to line count and function calls... though perhaps the CPython implementation could actually avoid the type switching and improve time complexity)
4. *Readability* (due to *much* clearer typing and lack of highly-nested function calls ( f"[{','.join('0123')}]" ) and higher-order functions ( genjoin('', '', '')('0123') ))

I would also like readers/commenters to consider the fact that, though I have only provided one use-case, the proposed enhancement would serve as the primary syntax for constructing or filtering a string *when dependent on any other iterable or condition*. I believe this to be an extremely common (almost universal) use-case. Here are just a couple more examples.

    new = c"x.lower() for x in old if x in HARDCODED_LIST"  # filter-in chars that appear in earlier-defined HARDCODED_LIST and convert to lower
    new = c"x for x in old if not x.isprintable()"  # filter-in non-printable chars
    new = c"str(int(x) + 1) for x in old if isinstance(x, int)"  # increment all integers by 1

To me, it is hard to see how any argument against this design (for anything other than implementation-difficulty or something along these lines) can be anything but an argument against iter comprehensions in general... but if someone disagrees, please say so.

My goal is to *decrease* complexity, and personal/higher-order/nested procedures do not accomplish this in my eyes.

Thank you.

DQAL

On Fri, Apr 30, 2021 at 1:10 PM Jonathan Fine <jfine2358@gmail.com> wrote:
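Since the c"..." prefix is proposed syntax and does not run today, the first two examples from this message can be approximated with the existing join idiom, roughly as below. The contents of HARDCODED_LIST and old are hypothetical values picked for illustration.

```python
# Hypothetical inputs, chosen only for illustration.
HARDCODED_LIST = ["A", "B", "c"]
old = "AbBc\tZ"

# Today's spelling of c"x.lower() for x in old if x in HARDCODED_LIST"
lowered = "".join(x.lower() for x in old if x in HARDCODED_LIST)

# Today's spelling of c"x for x in old if not x.isprintable()"
non_printable = "".join(x for x in old if not x.isprintable())

print(repr(lowered), repr(non_printable))
```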

On 2021-04-30 at 14:14:50 -0400, David Álvarez Lombardi <alvarezdqal@gmail.com> wrote: [...]
[...]
My goal is to *decrease* complexity, and personal/higher-order/nested procedures do not accomplish this in my eyes.
Embedding a[nother] domain specific language in a string also doesn't decrease complexity; look at all the regular expression builders.

Unless you're a core developer (or perhaps not even then), I suspect that most library functions started as "personal" functions. Hey, here's something I need for this project ... hey, I just wrote that for the last project ... how many times will I write this before I stick it in general_utilities ... let's see what python-ideas thinks ...

Add the following to your personal library and see how many times you use it in the coming weeks or months:

    def string_from_iterable_of_characters(iterable):
        return ''.join(iterable)

I haven't tested anything, but string_from_iterable_of_characters should take everything inside your c-strings unchanged.

On 30/04/2021 19:14, David Álvarez Lombardi wrote:
You're actually adding an inconsistency: having a comprehension inside string quotes instead of not.
1. *Intuitiveness* (as opposed to str.join(iter) which is widely deemed to be confusing and seemingly-backwards)
Yes I agree your examples read nicely, without the usual boilerplate. Whether this is worth adding to the language is a moot point. Every addition increases the size of the compiler/interpreter, increases the maintenance burden, and adds to the learning curve for newbies (and not-so-newbies). As far as I can see in every case

    c'SOMETHING'

can be replaced by

    ''.join(SOMETHING)

or

    str.join('', (SOMETHING))

Having many ways to do the same thing is not a plus.
It seems to me it would probably save a function call. That seems like a minor consideration.
This seems to me to be making the same point as "Intuitiveness".

Best wishes
Rob Cliffe

(I can't hack your heading auto-numbering so they've all ended up being numbered 1.)

On Sat, May 1, 2021 at 6:23 AM Rob Cliffe via Python-ideas <python-ideas@python.org> wrote:
(We can ignore the str.join('', THING) option, as that's just a consequence of the way that instance method lookups work, and shouldn't happen in people's code (although I'm sure it does).) If people want a more intuitive way to join things, how about this?
Or perhaps:

... def __rmul__(self, iter):
...     return self.join(str(x) for x in iter)
...
>>> ["a", 123, "b"] * Str(" // ")
'a // 123 // b'
If you want an intuitive way to join strings, surely multiplying a collection by a string makes better sense than wrapping it up in a literal-like thing. A string-literal-like-thing already exists for complex constructions - it's the f-string. The c-string doesn't really add anything above that. ChrisA
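A runnable version of the multiplication idea might look like this sketch. The class statement is an assumption (the archived message only preserves the `__rmul__` body and the name `Str`); subclassing `str` is inferred from the call to `self.join`.

```python
class Str(str):
    """A str subclass whose reflected multiplication joins an iterable.

    The class statement is an assumption; only the __rmul__ body is
    taken from the message.
    """

    def __rmul__(self, iterable):
        # Invoked for `iterable * Str(...)` because the left operand
        # cannot handle the multiplication itself.
        return self.join(str(x) for x in iterable)


joined = ["a", 123, "b"] * Str(" // ")
print(joined)
```

Because `__rmul__` is only consulted when the left operand does not handle the operation, plain `str` instances keep their usual repetition behaviour (for example `3 * "ab"`).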

On 2021-04-30 11:14, David Álvarez Lombardi wrote:
The difference between your proposal and existing comprehensions is that strings are very different from lists, dicts, sets, and generators (which are the things we currently have comprehensions for). The syntax for those objects is Python syntax, which is strict and can include expressions that have meaning that is interpreted by Python. But strings can contain *anything*, and in general (apart from f-strings) their content is not parsed by Python. You can't do this:

    [wh4t3ver I feel like!!! okay?^@^&]

But you can do this:

    "wh4t3ver I feel like!!! okay?^@^&"

This means that the way people think about and visually comprehend strings is quite different from other Python types. You propose to have the string delimiters now contain actual Python code that Python will parse and run, but this isn't what people are used to seeing between quote marks.

I think the closest existing thing to your string comprehensions is not any existing comprehension, but rather f-strings, which are the one place where Python does potentially parse and execute code in a string. However, f-strings are different in notable ways. First, the code in f-strings is delimited (by curly braces), so it is visually distinguished from "freeform" text within the string. Second, f-strings do not restrict the normal usage of strings for freeform text content (apart from making the curly brace characters special). So `f"wh4t3ver I feel like!!! okay?^@^&"` is a valid f-string just like it's a valid string. In your proposal (I assume), something like `c"item for item in other_seq and then the string text continues here"` would have to be a syntax error. That is, unlike f-strings (or any other existing kind of string), the string comprehension would "claim" the entire string and you could no longer put normal string content in there.

Your proposal is focusing on strings as iterables and drawing a parallel with other kinds of iterables for which we have comprehensions. But strings aren't like other iterables because they're primarily vessels for freeform text content, not structured data.

For the same reason, string comprehensions are likely to be less useful. I would look doubtfully on code that tried to do anything complex in a string comprehension, in the same way that I would look doubtfully on code that used f-strings with huge, complex expressions. It would be more readable to do whatever data preparation you need to do before creating the string and then use a simpler final step to create the string itself.

Also, string comprehensions would only facilitate the creation of simple "linear" strings which draw their content sequentially from iterables. I find that in practice, if I want to create a string, programmatically, I'm not doing that. Rather, I'm pulling disparate content from different places and putting it together in a template-like fashion, in the way that f-strings or str.format() facilitate. So I don't think this proposal would have much practical use in string creation.

So overall I think your proposed string comprehensions would tend to make Python code less readable in the relatively rare cases where they were useful at all.

--
Brendan Barnwell
"Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

Given that there is very little you can test about a single character, a new construct feels excessive. Basically, the only possible question is "is it in this subset of codepoints?" However, that use is perfectly covered by the str.translate() method already. Regular expressions also cover this well. On Fri, Apr 30, 2021, 12:08 PM David Álvarez Lombardi <alvarezdqal@gmail.com> wrote:
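As a rough illustration of the str.translate() and regular-expression alternatives mentioned in this message, the sketch below strips everything but letters from a sample string. The input and the choice of characters to drop are assumptions; note that the translate version is a deny-list, so it only matches the regex result when the unwanted characters are all in the listed sets.

```python
import re
import string

dirty = "Hello, World! 123"  # hypothetical input

# str.translate with a deletion table: the third argument of
# str.maketrans lists the characters to drop.
drop = string.punctuation + string.digits + string.whitespace
via_translate = dirty.translate(str.maketrans("", "", drop))

# The regular-expression version keeps only ASCII letters.
via_regex = re.sub(r"[^A-Za-z]", "", dirty)

print(via_translate, via_regex)
```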

It's kind of weird that people seem to be missing the point about this. Python already has comprehensions for all the iterable builtins except strings. The proposed syntax doesn't introduce any new concept and would simply make strings more consistent with the rest of the builtins. The argument that we can already do this with the "".join() idiom is backwards. It's something we have to do _because_ there's no way to write string comprehensions directly.

Comprehensions express intent. Joining a generator expression with an empty string doesn't convey the intent that you're building a string where each character is derived from another iterable.

Also I haven't seen anyone acknowledge the potential performance benefits of string comprehensions. The "".join() idiom needs to go through the entire generator machinery to assemble the final string, whereas a decent implementation of string comprehensions would enable some pretty significant optimizations.

Strings are VERY different from other iterables. Every item in a string is itself an (iterable) string. In many ways, strings are more like scalars, and very often we treat them as such.

You could make an argument that e.g. a NumPy array of homogeneous scalars is similar. However, that would be a wrong argument. Quite literally the ONLY predicate that can be expressed about a single character is it being a member of a subset of all Unicode characters. Yes, you could express that in convoluted ways like its ord() being in a certain range, but it boils down to subset membership.

In contrast, predicates of unlimited complexity can be expressed of numbers. You can ask if an integer is prime. You can ask if the sine of the square of a float is more than pi/4. Arbitrary predicates make sense of arbitrary iterables. This is not so of the characters making up strings.

On Fri, Apr 30, 2021, 11:08 PM Valentin Berlier <berlier.v@gmail.com> wrote:

the ONLY predicate that can be expressed about a single character is it being a member of a subset of all Unicode characters
You seem to be assuming that the comprehension would be purposefully restricted to iterating over strings. The original author already provided examples with predicates that don't involve checking for a subset of characters.

    old = [0, 1, None, 2]
    new = c"str(x + 1) for x in old if isinstance(x, int)"

The existing "".join() idiom isn't restricted to iterating over an existing string. You also have to account for nested comprehensions. There's nothing that would prevent you from having arbitrary complexity in string comprehension predicates, just like nothing prevents you from having arbitrary predicates when you join a generator expression.
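To make the point concrete with syntax that runs today, the sketch below spells the example with "".join() and adds a nested loop whose predicate is not a character-set test. The `words` list is hypothetical data.

```python
old = [0, 1, None, 2]

# Today's spelling of c"str(x + 1) for x in old if isinstance(x, int)".
new = "".join(str(x + 1) for x in old if isinstance(x, int))

# A nested loop with a predicate on the outer item rather than on a
# single character: keep the characters of every word longer than two.
words = ["to", "join", "or", "not"]  # hypothetical data
nested = "".join(ch for word in words if len(word) > 2 for ch in word)

print(new, nested)
```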

On Sat, May 1, 2021 at 1:43 PM Valentin Berlier <berlier.v@gmail.com> wrote:
Rather than toy examples, how about scouring the Python standard library for some real examples? Find some actual existing code and show how it would be improved by this new construct. Consistency on its own is not a sufficient goal; you have to demonstrate that the change would be of material value. ChrisA

On 2021-05-01 at 03:05:51 -0000, Valentin Berlier <berlier.v@gmail.com> wrote:
In certain special cases, maybe. In the general case, no. How much optimization can you do on something like the following:

    c"f(c) for c in some_string if g(c)"

I'll even let you assume that f and g are pure functions (i.e., no side effects), but you can't assume that f always returns a string of length 1.

Even the simpler

    c"c + c for c in some_string"

at some point has to decide whether (a) to collect all the pieces in a temporary container and join them at the end, or (b) to suffer quadratic (or worse) behavior by appending the pieces to an intermediate accumulator as it iterates.

Also, how often do any of the use cases come up in inner loops, where performance is important?
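The two strategies (a) and (b) can be written out in ordinary Python roughly as follows. The function names are hypothetical, and this sketch says nothing about how an actual implementation of the proposal would behave; CPython may also optimize some in-place string appends, so the quadratic cost of (b) is a worst case rather than a guarantee.

```python
def double_with_list(some_string):
    """Strategy (a): collect the pieces, then join once at the end."""
    pieces = []
    for c in some_string:
        pieces.append(c + c)
    return "".join(pieces)


def double_with_accumulator(some_string):
    """Strategy (b): append each piece to a growing string."""
    result = ""
    for c in some_string:
        result += c + c  # may copy `result` on every iteration
    return result


print(double_with_list("abc"), double_with_accumulator("abc"))
```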

c"f(c) for c in some_string if g(c)"
Even this example would allow the interpreter to skip building the generator object and having to feed the result of every f(c) back into the iterator protocol. This is similar to f-strings vs str.format. You could say that f-strings are redundant because they can't do anything that str.format can't, but they make it possible to shave off the static overhead of going through python's protocols and enable additional optimizations.

On Fri, Apr 30, 2021 at 11:15 PM Valentin Berlier <berlier.v@gmail.com> wrote:
But that was not the primary motivator for adding them to the language. Nor is it the primary motivator for using them. I really like f-strings, and I have never even thought about their performance characteristics.

With regard to the possible performance benefits of “string comprehensions”: Python is already poorly performant when working with strings character by character. Which is one reason we have nifty string methods like .replace() and .translate(). (And join.) I’d bet that many (most?) potential “string comprehensions” would perform better if done with string methods, even if they were optimized.

Another note that I don’t think has been said explicitly: yes, strings are Sequences, but they are a very special case in that they can contain only one type of thing: length-1 strings. Which massively reduces the possible kinds of comprehensions one might write, and I suspect most of those are already covered by string methods.

(Actually, I think this is a similar point to that made by David Mertz.)

-CHB

--
Christopher Barker, PhD (Chris)
Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython

I started out seeing this, as the people objecting have put it, as something that is really out of scope.

But it just did occur to me that having to use str.join _inside_ an f-string expression is somewhat cumbersome. I mean, think of a typical repr for a sequence class:

    return f"MyClass({', '.join(str(item) for item in self) } )"

So, maybe, not going for another kind of string, or string comprehensions, but rather for a formatting option accepted by the format mini-language that could do a "map to str and join" when the item is a generator?

This might: satisfy the O.P.'s request, introduce no fundamental changes in the way we think about the language, _and_ be somewhat useful.

The example above could become

    return f"MyClass({self:, j})"

The "j" suffix meaning to use ", " as the separator, and map the items to "str". This is if the option is kept as terse as the other indicators in the format mini-language; it could maybe be more readable (bikeshed at will).

(Other than that, I hope it is clear I am with Steven, Chris, Christopher et al. on the objections to the 'string comprehension' proposal as it is.)

On Sat, 1 May 2021 at 17:36, Christopher Barker <pythonchb@gmail.com> wrote:
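Something close to the "j" format code can be tried today on user-defined classes through `__format__`, as in the sketch below. The `Joined` wrapper and its rule (everything before a trailing "j" is the separator) are assumptions made for illustration; built-in iterables do not accept such a spec.

```python
class Joined:
    """Wrap an iterable so that a format spec ending in 'j' joins it.

    Hypothetical helper: the text before the trailing 'j' is used as
    the separator, and every item is passed through str().
    """

    def __init__(self, items):
        self.items = items

    def __format__(self, spec):
        if spec.endswith("j"):
            separator = spec[:-1]
            return separator.join(str(item) for item in self.items)
        # Fall back to ordinary string formatting for any other spec.
        return format(str(self.items), spec)


text = f"MyClass({Joined([1, 2, 3]):, j})"
print(text)
```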

But that was not the primary motivator for adding them to the language.
I don't think the original author thinks that way either about string comprehensions. I was asked about the kind of speed benefits that string comprehensions would have over using a generator with "".join() and I used f-strings as an example because the benefits would be similar.

By the way, now that I think about it, comprehensions would fit into f-string interpolation pretty nicely.

    f"""
    Guest list ({len(people)} people):
    {person.name + '\n' for person in people}
    """
Which massively reduces the possible kinds of comprehensions one might write, and I suspect most of those are already covered by string methods.
I actually replied to David Mertz about this. String comprehensions can derive substrings from any iterable. Just like the only requirement for using a generator expression in "".join() is that it produces strings. Comprehensions can also have nested loops which can come in handy at times. And of course this doesn't mean I'm going to advocate for using them with complex predicates.

Valentin Berlier writes:
That's nice! It's already (almost[1]) legal syntax, but it prints the repr of the generator function. This could work, though:

    f"""
    Guest list ({len(people)} people):
    {person.name + chr(10) for person in people:5.25i}
    """

with i for "iterate iterable". (The iterable might need to be parenthesized if it's a generator function.) The width spec is intended to be max_elems.per_elem_width. I guess you could also generalize it to something like

    f"""
    Guest list ({len(people)} people):
    {person.name, '>25s', chr(10), '' for person in people:i}
    """

where the 2d element of the tuple is a format spec to apply to each element, the 3d is the separator and the 4th the terminator. Or perhaps those parameters belong in the syntax of the 'i' format code.

I saw your later post that suggests making this default. We could tell programmers to use !s or !r if they want to see things like

    <generator object <genexpr> at 0x100fde580>

Probably not, though, at least not if you want all iterables treated this way. Possibly this could be restricted to generators, especially if you use the element format as tuple syntax I proposed above rather than embed the element format spec in the overall generator spec.

Steve

Footnotes:
[1] Need to substitute 'chr(10)' for '\n' in an f-string.
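For comparison, here is roughly what an f-string does today with a generator expression, next to an explicit join that produces the intended text. The `names` list is hypothetical, and the generator is parenthesized to stay on the safe side of the parsing question raised above.

```python
names = ["Ada", "Grace"]  # hypothetical guest names

# A parenthesized generator expression is formatted via its repr today.
as_repr = f"{(name + chr(10) for name in names)}"

# Joining explicitly yields the guest-list text.
as_text = f"Guest list ({len(names)} people):\n{''.join(name + chr(10) for name in names)}"

print(as_repr)
print(as_text)
```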

I really appreciate all the feedback and all of the thought put into this idea. I wanted to make a couple of comments on some of the responses and provide my current thoughts on the idea.

--- Responses to comments ---
Yes. You are right. My use of "all" was technically incorrect. But I think it is *very* disingenuous to pretend that these types play anywhere near as central a role in python use as list, dict, and set... especially for newbies. Please try to provide contentful comments instead of "gotchas".
This is a very very helpful point. I will address it at the end.
The proposed syntax doesn't introduce any new concept and would simply make strings more consistent with the rest of the builtins. The argument that we can already do this with the "".join() idiom is backwards. It's something we have to do _because_ there's no way to write string comprehensions directly.
This is the mindset that I had. I understand there are other ways to do what I am asking. (I provided one in my initial post.) I am saying it relies on what I believe to be a notoriously unintuitive method (str.join) and an even more unintuitive way of calling it ("".join).
I understand if my initial examples led you to think this because I only iterated over a string "old" to construct "new", but consider the following. (I know it is a silly example but I'm just trying to get the point across.)
Rather than toy examples, how about scouring the Python standard library for some real examples?
Here are 73 of them that I found by grepping through Lib.

- https://github.com/python/cpython/blob/master/Lib/email/_encoded_words.py#L9...
- https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser...
- https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser...
- https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser...
- https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser...
- https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser...
- https://github.com/python/cpython/blob/master/Lib/lib2to3/fixes/fix_import.p...
- https://github.com/python/cpython/blob/master/Lib/lib2to3/fixes/fix_next.py#...
- https://github.com/python/cpython/blob/master/Lib/lib2to3/refactor.py#L235
- https://github.com/python/cpython/blob/master/Lib/msilib/__init__.py#L178
- https://github.com/python/cpython/blob/master/Lib/msilib/__init__.py#L290
- https://github.com/python/cpython/blob/master/Lib/test/_test_multiprocessing...
- https://github.com/python/cpython/blob/master/Lib/test/multibytecodec_suppor...
- https://github.com/python/cpython/blob/master/Lib/test/test_audioop.py#L6
- https://github.com/python/cpython/blob/master/Lib/test/test_buffer.py#L853
- https://github.com/python/cpython/blob/master/Lib/test/test_code_module.py#L...
- https://github.com/python/cpython/blob/master/Lib/test/test_code_module.py#L...
- https://github.com/python/cpython/blob/master/Lib/test/test_codeccallbacks.p...
- https://github.com/python/cpython/blob/master/Lib/test/test_codeccallbacks.p...
- https://github.com/python/cpython/blob/master/Lib/test/test_codeccallbacks.p...
- https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L149
- https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L1544
- https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L1548
- https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L1552
- https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L1556
- https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L1953
- https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L1991
- https://github.com/python/cpython/blob/master/Lib/test/test_decimal.py#L1092
- https://github.com/python/cpython/blob/master/Lib/test/test_decimal.py#L5346
- https://github.com/python/cpython/blob/master/Lib/test/test_email/test_email...
- https://github.com/python/cpython/blob/master/Lib/test/test_email/test_email...
- https://github.com/python/cpython/blob/master/Lib/test/test_fileinput.py#L91
- https://github.com/python/cpython/blob/master/Lib/test/test_fileinput.py#L92
- https://github.com/python/cpython/blob/master/Lib/test/test_fileinput.py#L93
- https://github.com/python/cpython/blob/master/Lib/test/test_fileinput.py#L94
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L360
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L366
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L372
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L378
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L384
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L391
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L397
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L403
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L409
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L415
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L421
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L427
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L433
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L455
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L457
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L459
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L461
- https://github.com/python/cpython/blob/master/Lib/test/test_long.py#L305
- https://github.com/python/cpython/blob/master/Lib/test/test_lzma.py#L1049
- https://github.com/python/cpython/blob/master/Lib/test/test_lzma.py#L1087
- https://github.com/python/cpython/blob/master/Lib/test/test_random.py#L914
- https://github.com/python/cpython/blob/master/Lib/test/test_random.py#L921
- https://github.com/python/cpython/blob/master/Lib/test/test_random.py#L927
- https://github.com/python/cpython/blob/master/Lib/test/test_random.py#L933
- https://github.com/python/cpython/blob/master/Lib/test/test_random.py#L968
- https://github.com/python/cpython/blob/master/Lib/test/test_re.py#L1013
- https://github.com/python/cpython/blob/master/Lib/test/test_strtod.py#L226
- https://github.com/python/cpython/blob/master/Lib/test/test_ucn.py#L192
- https://github.com/python/cpython/blob/master/Lib/test/test_ucn.py#L65
- https://github.com/python/cpython/blob/master/Lib/test/test_unicodedata.py#L...
- https://github.com/python/cpython/blob/master/Lib/test/test_zipfile.py#L1833
- https://github.com/python/cpython/blob/master/Lib/tkinter/__init__.py#L268
- https://github.com/python/cpython/blob/master/Lib/unittest/test/test_asserti...
- https://github.com/python/cpython/blob/master/Lib/unittest/test/test_case.py...
- https://github.com/python/cpython/blob/master/Lib/urllib/parse.py#L907
- https://github.com/python/cpython/blob/master/Lib/xml/etree/ElementTree.py#L...
I think this is an over-simplification of the quotations syntax. Python has several prefix characters that you have to look out for when you see quotes, namely the following: r, u, f, fr, rf, b, br, rb. Not only can these change the construction syntax, but they can even construct an object of a completely different type (bytes).
Not to nit-pick too much, but the following is a valid string but not a valid f-string.
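The example that accompanied this remark is not preserved in the archive. As a hypothetical stand-in, the sketch below checks one string literal that is valid on its own but is rejected once the f prefix is added, because a lone closing brace is special inside f-strings.

```python
# Hypothetical example; the one from the original message is not preserved.
plain_source = '"a single } brace"'
fstring_source = 'f"a single } brace"'


def compiles(source):
    """Return True if `source` is a syntactically valid expression."""
    try:
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True


print(compiles(plain_source), compiles(fstring_source))
```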
I view this as the strongest opposition to the idea in the whole thread, but I think that seal was broken with f-strings and the {}-syntax. The proposed syntax is different from those features only in *degree* (of deviation from strict char-arrays) not in *type*. But I also recognize that the delimiters {} go a long way in helping to mentally compartmentalize chars from python code.

--- My current thoughts ---

I definitely see the drawbacks of the originally-proposed syntax, but I think it would be beneficial to the conversation for commenters to recognize that *python strings are not nearly as pure as some of the objections make them out to be*. I would be happy to hear the objection that my syntax strays *too* far, but many of the passed-around examples attest to the fact that when users see quotes, they are often already in "code-evaluation" mode (eg. f"[{','.join('0123')}]" ).

I think that a comment left by Steve was particularly helpful.
Would readers see any merit in a syntax like the following?
Or would it stray too far from the behavior of the str() constructor in general? As of now, the behavior is the following.
*I don't intend to reinvent strings, I only mean to leverage an already existing means of signifying modified string construction syntax (prefixes) to align str construction syntax with the comprehensions available for the other most common builtin iterables, avoid the notoriously unintuitive "".join syntax, and improve readability.*

Please continue to send your thoughts! I really appreciate it!

DQAL

On Sun, May 2, 2021 at 3:51 AM Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

On Mon, May 3, 2021 at 1:00 PM David Álvarez Lombardi <alvarezdqal@gmail.com> wrote:
Tests don't really count, so there's a small handful here. I haven't looked at them all. Some of them definitely could be done this way, but the best way to make your point is to show the current code and your proposed alternative, and show how the new syntax improves things. Not just "it could be done this way", but "this way looks massively better".
That's exactly because the curly braces are special. Not sure your point here?
The str constructor is also the generic "turn anything into a string" function. If it were not for that, I'd say it's fairly reasonable; but I don't want to see a genexp automatically pump itself and join the results just because someone printed it out. But if you wanted to make a dedicated constructor, eg str.from_substrings(iterable), that would definitely be viable. Would it be useful? Not sure.
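A rough sketch of the distinction drawn here: str() on a generator expression shows its repr, while a dedicated constructor would consume it. `from_substrings` below is a free function standing in for the suggested `str.from_substrings`, which is hypothetical and not an existing str method.

```python
def from_substrings(iterable):
    """Stand-in for the suggested str.from_substrings (hypothetical)."""
    return "".join(iterable)


genexp = (ch.upper() for ch in "abc")

# str() is the generic conversion, so the generator is shown via its repr
# and is not consumed.
shown = str(genexp)

built = from_substrings(ch.upper() for ch in "abc")
print(shown, built)
```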
I don't intend to reinvent strings, I only mean to leverage an already existing means of signifying modified string construction syntax (prefixes) to align str construction syntax with the comprehensions available for the other most common builtin iterables, avoid the notoriously unintuitive "".join syntax, and improve readability.
Yes, "".join() takes some learning, but if that's the problem being solved, I'd much rather look into simpler solutions. I'd really like to see str.__rmul__() accept any iterable and join it, as mentioned earlier (or maybe that was in a related thread):
or, if it became part of the core:

    ["Hello", "world"] * " "

But in terms of embedding a join expression in the middle of an f-string (NOT cases where the join is the entire expression), I do think it'd be nice to have a mutator syntax that iterates over the thing, formatting each element according to the given definition, and outputting them all together - effectively equivalent to joining with an empty string.

ChrisA

For the record I am definitely a -1 on this. The arguments against are overwhelming and the arguments for are pretty weak. However I felt the need to rebut:
Tests don't really count, so there's a small handful here.
Tests 100% count as real use cases. If this is a pattern that would be useful in test case generation then we should be discussing that. I have worked on plenty of projects which were almost exclusively documented through tests. Being able to read and write tests fluently is as important as any other piece of code. On Sun, May 2, 2021 at 8:41 PM Chris Angelico <rosuav@gmail.com> wrote:

On Tue, May 4, 2021 at 5:16 AM Caleb Donovick <donovick@cs.stanford.edu> wrote:
That's true, but in many cases, tests are there to test specific functionality. Since no functionality is being removed, anything that's testing str.join() will need to continue testing str.join(). I didn't dig into the specific examples to see which ones were testing str.join and which ones happened to be using str.join to test something else. ChrisA

On Mon, 3 May 2021 at 04:00, David Álvarez Lombardi <alvarezdqal@gmail.com> wrote:
This is the mindset that I had. I understand there are other ways to do what I am asking. (I provided one in my initial post.) I am saying it relies on what I believe to be a notoriously unintuitive method (str.join) and an even more unintuitive way of calling it ("".join).
I think this is something of an exaggeration. It's "notoriously difficult" (;-)) for an expert to appreciate what looks difficult to a newcomer, but I'd argue that while ''.join() is non-obvious at first, it's something you learn once and then remember. If it's really awkward for you, you can write `concat = ''.join` and use that (but I'd recommend against it, as it makes getting used to the idiom *other* people use that much harder).
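Editor's sketch of the `concat = ''.join` alias mentioned above, applied to the filtering example from the original post. The value of `dirty` is an assumption chosen for illustration.

```python
import string

# Bind the join method of the empty string to a friendlier name.
concat = "".join

dirty = "f9s8j!G3e"  # assumed sample input
clean = concat(char for char in dirty if char in string.ascii_letters)
print(clean)  # fsjGe
```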
Here are 73 of them that I found by grepping through Lib.
Thank you. I only spot-checked one or two, but I assume from this list that your argument is simply that *all* occurrences of ''.join(something) can be replaced by c"something". Which suggests a couple of points:
* If it doesn't add anything *more* than an alternative spelling for ''.join, is it worth it?
* Is the fact that it's a quoted string construct going to add problematic edge cases? You can't use " inside c"..." without backslash-quoting it. That seems like it could be a problem, although I'll admit I can't come up with an example that doesn't feel contrived at the moment.
In particular, is the fact that within c"..." you're writing a comprehension but you're not allowed to use unescaped " symbols, more awkward than using ''.join was originally?
On the contrary, I think you're missing the point here. When I, as a programmer, see "..." (with any form of prefix) I think "that's a constant". That's common for all quoting. I'd argue that even f-strings are very careful to avoid disrupting this intuition any more than necessary - yes, {...} within an f-string is executable code, but the non-constant part is delimited and it's conventionally limited to simple expressions. Conversely, it's basically impossible to view your c-strings as "mostly a constant value". Also, how would c-strings be handled in conjunction with other string forms? Existing string types can be concatenated by putting them adjacent to each other:
How would c-strings work? As code, I might want to format a generator over multiple lines. How would c-strings work with that?
    (
        c"val.strip().upper() "
        c"for val in file "
        c"if val != '' "  # Skip empty lines
        c"and not val.startswith(chr(34))"  # And lines commented with " - chr(34) is ", but we can't use " directly without a backslash
    )
That doesn't feel readable to me. I could use a triple-quoted c-string, but then I have an indentation problem. Also, with triple quoting I couldn't include those comments (or could I??? You haven't said whether comments are valid *within* c-strings. But I assume not - the syntax would be a nightmare otherwise).
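For comparison, an editor's sketch of the same multi-line filter written with today's ''.join and a parenthesised generator expression. The `lines` list is an assumed stand-in for the file in the example; comments and unescaped " characters need no special handling here.

```python
# Assumed sample data standing in for the lines of a file.
lines = ["alpha\n", "", '"a commented line\n', "  beta  \n"]

result = "".join(
    val.strip().upper()
    for val in lines
    if val != ""                 # skip empty lines
    and not val.startswith('"')  # skip lines commented with "
)
print(result)  # ALPHABETA
```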
That comes under the heading of making curly braces special...
Your proposal is focusing on strings as iterables and drawing a parallel with other kinds of iterables for which we have comprehensions. But strings aren't like other iterables because they're primarily vessels for freeform text content, not structured data.
I view this as the strongest opposition to the idea in the whole thread, but I think that seal was broken with f-strings and the {}-syntax. The proposed syntax is different from those features only in *degree* (of deviation from strict char-arrays) not in *type*. But I also recognize that the delimiters {} go a long way in helping to mentally compartmentalize chars from python code.
That's a very explicit "slippery slope" argument - "now that f-strings stopped quotes meaning constant, we can do anything we like" - and like most such arguments, it's a massive over-generalisation. f-strings were debated very carefully, and a lot of effort was put into the question of whether it broke the "literal string" intuition too much. The conclusion was that it didn't, *for that specific case*. But there's no reason to assume that the same arguments apply for other uses (and indeed, the decision was close enough that there's very good reasons to assume those arguments *won't* apply in general).
I definitely see the drawbacks of the originally-proposed syntax, but I think it would be beneficial to the conversation for commenters to recognize that python strings are not nearly as pure as some of the objections make them out to be. I would be happy to hear the objection that my syntax strays *too* far, but many of the passed-around examples attest to the fact that when users see quotes, they are often already in "code-evaluation" mode (eg. f"[{','.join('0123')}]" ).
OK. I'll object in those terms. I think your syntax proposal strays *way* too far. And I don't believe the example you gave is a good use of f-strings - I don't recall the context, but if it was from real code (rather than being a constructed example to make a point) I'd strongly insist that it be rewritten for better readability.
Would you object to using a different function rather than re-using the str constructor? How about "concat"? Or maybe ''.join? OK, so that was a little facetious, but hopefully you get my point, that you've now reached the point where you're in effect saying that the only thing you really want to change is the name of the ''.join function.
And that's why reusing str isn't going to be an option. It's a backward compatibility issue. It's not so much that anyone is relying on str() returning that particular value, but they *do* rely on it returning a (semi-)readable representation of the passed in value, and not executing it if it's a generator.
I don't intend to reinvent strings, I only mean to leverage an already existing means of signifying modified string construction syntax (prefixes) to align str construction syntax with the comprehensions available for the other most common builtin iterables, avoid the notoriously unintuitive "".join syntax, and improve readability.
The fact that you're getting responses suggesting you do want to "reinvent strings" implies that your proposed syntax is being understood in a way that wasn't your intent. That in itself is a strong indicator that the syntax isn't nearly as intuitive as you'd hoped (or as a language construct for Python typically needs to be to fit in with Python's "easily readable" style).
Please continue to send your thoughts! I really appreciate it!
I hope the above was of use. Overall, I'm a strong -1 on this proposal, I'm afraid. Paul

Summary: The argument in list(arg) must be iterable. The argument in str(arg) can be anything. Further, in [ a, b, c, d ] the content of the literal must be read by the Python parser as a Python expression. But in "this and that" the content need not be a Python expression.
Hi David
I find your suggestion a good one, in that to respond to it properly requires a good understanding of Python. This deepens our understanding of the language. I'm going to follow on from a contribution from Brendan Barnwell.
Please consider the following examples.
Similarity.
    >>> list( x*x for x in range(5) )
    [0, 1, 4, 9, 16]
    >>> [ x*x for x in range(5) ]
    [0, 1, 4, 9, 16]
Difference.
    >>> tmp = (x*x for x in range(5)) ; list(tmp)
    [0, 1, 4, 9, 16]
    >>> tmp = (x*x for x in range(5)) ; [ tmp ]
    [<generator object <genexpr> at 0x7fec02319678>]
Difference.
    >>> list( (x*x for x in range(5)) )
    [0, 1, 4, 9, 16]
    >>> [ (x*x for x in range(5)) ]
    [<generator object <genexpr> at 0x7fec02319620>]
Now consider,
    >>> str( x * 2 for x in 'abc' )
    '<generator object <genexpr> at 0x7fec02319728>'
This last one genuinely surprised me. I was expecting 'aabbcc'. To understand this, first note the quote marks in the response. Next recall that str returns the string representation of the argument, via type(obj).__str__(obj).
My understanding of the situation is that the list comprehension [ x*x for x in range(5) ] is a shorthand for list( x*x for x in range(5) ). It works because list takes an iterable as its argument (if it has one argument). But str with one argument gives the string representation of an arbitrary object. Here's an example.
    >>> list(None)
    TypeError: 'NoneType' object is not iterable
    >>> str(None)
    'None'
Here's what Brendan wrote:
The difference between your proposal and existing comprehensions is that strings are very different from lists, dicts, sets, and generators (which are the things we currently have comprehensions for). The syntax for those objects is Python syntax, which is strict and can include expressions that have meaning that is interpreted by Python. But strings can contain *anything*, and in general (apart from f-strings) their content is not parsed by Python.
In a nutshell: The argument in list(arg) must be iterable. The argument in str(arg) can be anything. Further, in [ a, b, c, d ] the content of the literal must be a Python expression, whereas in "this and that" the content need not be a Python expression.
I hope this helps.
Jonathan
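Editor's sketch of the distinction summarised above: str() gives the representation of a generator without consuming it, while list() and ''.join iterate over it. Variable names are illustrative only.

```python
# str() does not iterate: it returns the generator's representation.
gen_repr = str(x * 2 for x in "abc")

# list() and "".join() both consume the iterable.
as_list = list(x * 2 for x in "abc")
joined = "".join(x * 2 for x in "abc")

print(gen_repr.startswith("<generator object"))  # True
print(as_list)  # ['aa', 'bb', 'cc']
print(joined)   # aabbcc
```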

On Mon, May 3, 2021 at 8:03 PM Jonathan Fine <jfine2358@gmail.com> wrote:
Closer parallel:
    >>> tmp = (x*x for x in range(5)) ; [ *tmp ]
    [0, 1, 4, 9, 16]
My understanding of the situation is that the list comprehension [ x*x for x in range(5) ] is a shorthand for list( x*x for x in range(5) ).
Sorta-kinda. It's not a shorthand in the sense that you can't simply replace one with the other, but they do have very similar behaviour, yes. A genexp is far more flexible than a list comp, so the compiled bytecode for list(genexp) has to go to a lot of unnecessary work to permit that flexibility, whereas the list comp can simplify things down. That said, I think the only way you'd actually detect a behavioural difference is if the name "list" has been rebound. But your main point (about str(x) not iterating) is absolutely correct. Perhaps, if Python were being started fresh right now, str(x) would have different behaviour, and the behaviour of "turn anything into a string" would be done by format(), but as it is, str(x) needs to come up with a string representation for x, without iterating over it (which might be impossible - consider an infinite generator). ChrisA
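Editor's sketch of the one behavioural difference named above: if the name "list" is rebound, list(genexp) follows the new binding while the list display does not. The rebinding to tuple is an arbitrary choice for illustration.

```python
def build_both():
    list = tuple  # deliberately shadow the builtin name inside this function
    from_call = list(x * x for x in range(5))  # now really calls tuple(...)
    from_display = [x * x for x in range(5)]   # syntax, unaffected by the name
    return from_call, from_display


from_call, from_display = build_both()
print(from_call)     # (0, 1, 4, 9, 16)
print(from_display)  # [0, 1, 4, 9, 16]
```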

On Mon, May 03, 2021 at 09:04:51PM +1000, Chris Angelico wrote:
Only because the `list` name could be shadowed or rebound to something else. Syntactically and functionally, aside from the lazy vs eager difference, a comprehension is a comprehension and there is nothing generator comprehensions can do that list comprehensions can't. In Python 2 there were scoping differences between the two, but I believe that in Python 3 those have been eliminated.
but they do have very similar behaviour, yes. A genexp is far more flexible than a list comp,
Aside from the lazy nature of generator comprehensions, what else?
I don't think so. The bytecode in 3.9 is remarkably similar.
    >>> dis.dis('list(spam for spam in eggs)')
      1           0 LOAD_NAME                0 (list)
                  2 LOAD_CONST               0 (<code object <genexpr> at 0x7fc185ce0870, file "<dis>", line 1>)
                  4 LOAD_CONST               1 ('<genexpr>')
                  6 MAKE_FUNCTION            0
                  8 LOAD_NAME                1 (eggs)
                 10 GET_ITER
                 12 CALL_FUNCTION            1
                 14 CALL_FUNCTION            1
                 16 RETURN_VALUE
    Disassembly of <code object <genexpr> at 0x7fc185ce0870, file "<dis>", line 1>:
      1           0 LOAD_FAST                0 (.0)
            >>    2 FOR_ITER                10 (to 14)
                  4 STORE_FAST               1 (spam)
                  6 LOAD_FAST                1 (spam)
                  8 YIELD_VALUE
                 10 POP_TOP
                 12 JUMP_ABSOLUTE            2
            >>   14 LOAD_CONST               0 (None)
                 16 RETURN_VALUE
The bytecode for the list comp `[spam for spam in eggs]` is only three bytecodes shorter, so that doesn't support your comment about "a lot of unnecessary work". The list comp version, `dis.dis('[spam for spam in eggs]')`, can:
- skip the name lookup for list (LOAD_NAME);
- and the CALL_FUNCTION that ends up calling it.
The disassemblies of the two code objects, "<genexpr>" and "<listcomp>", have slightly different implementations but only differ by one bytecode overall.
As far as runtime efficiency goes, list comps are a little faster. Iterating over a 1000-item sequence is 33% faster for a list comp, but for a 100000-item sequence that drops to 25% faster. But as soon as you do a significant amount of work inside the comprehension, that work is likely to dominate the other costs.
There's definitely some overhead needed to support starting and stopping a generator, but we can argue that is an implementation detail. A sufficiently clever interpreter could avoid that overhead.
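Editor's sketch of how a timing comparison like the one above could be reproduced with timeit. The sequence size and repeat count are assumptions; the numbers will vary by machine and Python version, so no expected output is shown.

```python
import timeit

data = list(range(1000))  # assumed test sequence


def via_genexp():
    # Generator expression consumed by the list constructor.
    return list(x for x in data)


def via_listcomp():
    # Equivalent list comprehension.
    return [x for x in data]


genexp_time = timeit.timeit(via_genexp, number=200)
listcomp_time = timeit.timeit(via_listcomp, number=200)
print(f"list(genexp): {genexp_time:.4f}s  list comp: {listcomp_time:.4f}s")
```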
That said, I think the only way you'd actually detect a behavioural difference is if the name "list" has been rebound.
That and timing. -- Steve

On Mon, May 3, 2021 at 10:08 PM Steven D'Aprano <steve@pearwood.info> wrote:
I mention the rebinding, but I'm not ruling out the possibility of other distinctions, perhaps due to order of execution.
Yielding is bidirectional. You won't see it when you just pass it to the list() constructor, but the genexp can have values sent back into it. That entails some extra machinery that is completely unnecessary for building a list, although, as I mentioned...
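Editor's sketch of the bidirectional point above: a generator expression accepts send(), even though the value sent in is simply discarded. Names are illustrative.

```python
gen = (x * x for x in range(4))

first = next(gen)             # runs to the first yield
second = gen.send("ignored")  # the sent value is accepted, then thrown away
rest = list(gen)              # the remaining items

print(first, second, rest)  # 0 1 [4, 9]
```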
... it's mainly just a matter of simplifications.
I don't think so. The bytecode in 3.9 is remarkably similar.
Yes, it looks similar.
YIELD_VALUE followed by POP_TOP is your clue that it's bidirectional. The comprehension simply appends onto the list immediately. The genexp has to have two completely separate scopes and switch between them; the list comp runs everything in the same inner scope, building up the list.
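Editor's sketch for checking this on a given interpreter: collect the opcode names used by each form, including nested code objects. `opnames` is a hypothetical helper written for this note, and the list-building opcode is spelled LIST_APPEND in CPython. Exact bytecode differs between Python versions, so only the presence of the two key opcodes is shown.

```python
import dis
import types


def opnames(source):
    """Collect opcode names from compiled source, including nested code objects."""
    names = set()
    pending = [compile(source, "<demo>", "eval")]
    while pending:
        code = pending.pop()
        names.update(instr.opname for instr in dis.get_instructions(code))
        # Comprehension bodies may live in separate code objects stored as constants.
        pending.extend(c for c in code.co_consts if isinstance(c, types.CodeType))
    return names


genexp_ops = opnames("list(spam for spam in eggs)")
listcomp_ops = opnames("[spam for spam in eggs]")

print("YIELD_VALUE" in genexp_ops)    # True
print("LIST_APPEND" in listcomp_ops)  # True
print("YIELD_VALUE" in listcomp_ops)  # False
```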
"Three bytecodes shorter" conceals the fact that some bytecodes do a LOT of work. Look into how much work it takes to restart a generator, and compare that to the bytecode "APPEND_LIST".
How often are you doing a significant amount of work inside a comprehension?
No, it can't - except by rewriting it as a list comp, and I'm not certain that there wouldn't be timing distinctions. A genexp cannot skip the overhead of being a generator.
Yes, I don't count that as a behavioural difference. Nor memory usage, within reason. ChrisA

On Mon, May 3, 2021 at 9:04 AM Paul Moore <p.f.moore@gmail.com> wrote:
Yeah, I don't get this point at all. The `"delim".join(collection)` idiom may not be the first pattern someone thinks of the first time. But you learn it once, maybe repeat it a second time, then it's easy. In contrast, each time I see the "string comprehension" again, I realize more and more stumbling points that I would continue to have for years. Plus the fact that it just LOOKS UGLY is a drawback.
I kinda like this. I'm tempted to start writing all of these this way. And if I wanted, I could add `concat(...)` to that parallel structure easily enough.
I hope the above was of use. Overall, I'm a strong -1 on this proposal, I'm afraid.
I'm more like -100.

On Sun, May 02, 2021 at 10:57:59PM -0400, David Álvarez Lombardi wrote:
I didn't say anything about those other iterable types playing a central role. Although now that you mention it, I *do* think that bytes, tuple, enumerate, range and zip are pretty central, even for newbies.
Please try to provide contentful comments instead of "gotchas".
One of your central arguments was that str is the only builtin iterable that doesn't have a comprehension form. That argument doesn't stand up to scrutiny. It doesn't even stand up if we weaken the argument to common, newbie-friendly, builtin containers with dedicated syntax: as well as str, there are bytes and tuple. No matter how you count them, the comprehension types (dict, set and list) don't exceed 50% of the candidates.
If we have str comprehensions, we'd need at least two prefixes: one for raw strings, one for regular (cooked) strings. If it's worth doing for strings, it's worth doing for bytes, which likewise would need two prefixes.
Another pillar of your argument is that ''.join is "unintuitive" for newbies. I don't give much weight to that argument. Especially not for newbies. Seriously, why do we think that people with no programming experience, who might not even know the difference between print and return or a variable and a constant, are the gold standard in being able to recognise a good language API? It breaks my brain, and my heart, when people argue that "it's intuitive" trumps "I thought really hard and carefully about this, and this is a better way". This is why we can't have nice things :-(
Anyway, let's go back to string comprehensions. To me, the argument that string comps could be more efficient is, at best, a weak argument. It isn't that I don't want more efficient Python code, but adding more and more specialised, single-purpose syntactic features for that efficiency is a poor way to do it. It makes the language harder to learn, and more work for implementers. But, if the efficiency gain is large, I guess it counts as an argument. If only it were a proven optimization, not a hypothetical one.
I'm not really comfortable with having syntax that looks like a quoted string contain executable code, but f-strings broke that trail so at least you have precedent in your favour. (Although I'm not as enamoured with f-strings as many folks.)
Ultimately, I think that the three major arguments in favour are weak:
- "strings are the only (important) iterable missing a comprehension" is just wrong;
- "str.join is unintuitive" depends on whose intuition you are talking about, but even if we agree it is still a weak argument: programming has many unintuitive things that need to be learned;
- and the optimization argument is purely hypothetical, and probably not enough to justify dedicated syntax.
Another weakness is that it can only join the substrings with no separator. I've looked at a sample of my code, and around 60% of the time I'm joining substrings I've given a separator, e.g. ', '.join(...) so a comprehension wouldn't work.
Ultimately I don't think this is a terrible idea, but so far it hasn't crossed the threshold of "benefits outweigh the costs".
-- Steve

To be clear, I'm -1 as well -- we just don't need it. but a few thoughts: On Mon, May 3, 2021 at 6:32 AM Steven D'Aprano <steve@pearwood.info> wrote:
If we have str comprehensions, we'd need at least two prefixes: one for raw strings, one for regular (cooked) strings.
Would we? I don't think so -- because of the other arguments made here -- a string comprehension would no longer be a string literal at all. After all, "raw strings" are not a different type, they are a different literal for the same type. That is, (duh) r"this\n" creates exactly the same string as "this\\n".
IIUC the proposal, a string comprehension would be:
    c" expr for something in an_iterable"
which would mean exactly the same as:
    "".join(expr for something in an_iterable)
and thus there IS no escaping to ignore (or process) -- the expr could contain a raw string, an_iterable could be a raw string, but there is no need for a raw string comprehension, or an f-string comprehension, or ...
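Editor's sketch of the point about raw strings: the raw prefix only changes how the literal is read, not the type or value produced, and a raw string can be fed through ''.join like any other iterable. Variable names are illustrative.

```python
raw = r"this\n"
cooked = "this\\n"
print(raw == cooked, type(raw) is type(cooked))  # True True

# A raw string used as the iterable; no "raw comprehension" is needed.
an_iterable = r"a\b"
shouted = "".join(ch.upper() for ch in an_iterable)
print(len(an_iterable), shouted == "A\\B")  # 3 True
```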
If it's worth doing for strings, it's worth doing for bytes,
I'm not so sure about that -- bytes are far more special purpose, and there are other nifty types like bytearrays.
Another pillar of your argument is that ''.join is "unintuitive" for newbies. I don't give much weight to that argument.
I think the "".join() idiom is kinda non-intuitive -- heck I've been known, twenty years in, to absentmindedly write a_sequence.join(",") once in a while. And my newbie students definitely get tripped up by this. But they find comprehensions pretty confusing too :-) so I don't think this would "solve" that minor problem. Also -- I think you made this point: "intuitiveness" is nice, but it's not the primary design goal of a feature. What I do like about str.join() is that it very clearly is a string operation.
To me, the argument that string comps could be more efficient is, at best, a weak argument.
Also, if this were a bottleneck, "".join(a_gen_expr) could be optimized. Now that I think about it, a lot of uses of generator expressions could be optimized whenever it is iterated over right away. (Well, maybe, with the ability to rename "list" and such it would be a bit tricky). Would it be worth it to do that? I doubt it. Those aren't often in tight loops, because they ARE the loop :-)
I'm not really comfortable with having syntax that looks like a quoted
I am :-) But while f-strings do put executable code inside a string, they are more conceptually similar to regular string literals -- right down to having a raw version :-) However, if one wants to go with that argument, maybe a different delimiter than " -- I think back ticks are available -- and even used to mean (stringify) so not so bad. But I'm still -1 -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Mon, May 03, 2021 at 11:49:31AM -0700, Christopher Barker wrote:
On further thought, I think you're right. String comprehensions aren't literals; they're not even a hybrid "part literal, part code" like f-strings. So scrub the raw comprehension versions. That just leaves a string version and a bytes version. [...]
There is only one truly intuitive interface, and that is the nipple. Everything else is learned. Just because we occasionally screw up and get syntax wrong doesn't make it "unintuitive" in any meaningful sense. We've all messed up code from time to time, especially when we're distracted, or tired and emotional. I've been known to write dicts `{key=value}`, invariably when it is a large dict with dozens of entries. Also `import func from module`, my fingers frequently type string.strip when I meant string.split, and vice versa, and for my most embarrassing mistake I once managed to write a module with no fewer than six classes like this:
    def MyClass(object):
        def __init__(self, obj):
            ...
before actually running the code and discovering that it didn't do what I wanted. So I wouldn't read too much into the occasional typo or braino.
Indeed. I found comprehensions confusing too, and that was despite having many years experience with the syntax that inspired it, set builder notation in mathematics. For the longest time I had to literally write out my comprehension using mathematical notation and manually translate it to Python syntax to get anywhere. -- Steve

On Sat, May 01, 2021 at 03:05:51AM -0000, Valentin Berlier wrote:
No it doesn't. I count 15 builtin iterables, only three have comprehensions.
Okay. What string comprehension do I write to express my intent to write a string containing words separated by commas? What string comprehension do I write to express my intent to write a string containing lines separated by newlines? What string comprehension do I write to express my intent to write a string containing substrings separated by ' - ' (space, hyphen, space)? `str.join` can express the intent of every single one of those, as well as the intent to write a string containing substrings separated by the empty string.
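Editor's sketch of the separator cases listed above, all expressed with str.join. The `words` list is an assumed sample.

```python
words = ["spam", "eggs", "ham"]  # assumed sample data

comma_separated = ", ".join(words)
newline_separated = "\n".join(words)
dash_separated = " - ".join(words)
concatenated = "".join(words)  # the only case the proposed syntax covers

print(comma_separated)  # spam, eggs, ham
print(dash_separated)   # spam - eggs - ham
print(concatenated)     # spameggsham
```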
Of course it does. What else could `''.join(expression)` mean, if not to build a string with the substrings derived from expression separated by the empty string?
Do you know what's worse than premature optimization? Accepting a new special-case language feature on the basis that, maybe some day, it might possibly enable a premature optimization. If you're going to claim a micro-optimization benefit, I think you need more than just to hand-wave that "a decent implementation" would allow it. Let's start with the simplest case: c'substring for substring in expression' What optimizations are available for that? -- Steve

I thought I had sent a response to this a few hours ago, but it seems to have been eaten by the email gremlins. Apologies if this ends up as a duplicate. On Fri, Apr 30, 2021 at 12:03:15PM -0400, David Álvarez Lombardi wrote:
I propose a syntax for constructing/filtering strings analogous to the one available for all other builtin iterables.
*All* others? The builtin iterables bytearray, bytes, enumerate, filter, frozenset, map, memoryview, range, reversed, tuple and zip suggest differently. It isn't that str is the exceptional case, it is that dict, list and set are the exceptional cases. In fact, there is a sense that this is a historical accident, that list comprehensions happened to have been invented first. If we were re-designing Python from scratch today, it is quite likely that we would have only generator comprehensions:
    list(expression for x in iterable)
    set(expression for x in iterable)
    dict((key, value) for x in iterable)
-- Steve
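Editor's sketch showing that the generator-only spellings above already work today and produce the same results as the dedicated comprehension forms. The `pairs` data is an assumption for illustration.

```python
pairs = [("a", 1), ("b", 2), ("c", 3)]  # assumed sample data

# Constructor call around a generator expression ...
list_from_gen = list(v * v for _, v in pairs)
set_from_gen = set(k.upper() for k, _ in pairs)
dict_from_gen = dict((k, v + 1) for k, v in pairs)

# ... versus the dedicated comprehension syntax.
list_comp = [v * v for _, v in pairs]
set_comp = {k.upper() for k, _ in pairs}
dict_comp = {k: v + 1 for k, v in pairs}

print(list_from_gen == list_comp, set_from_gen == set_comp, dict_from_gen == dict_comp)
# True True True
```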

The builtin iterables bytearray, bytes, enumerate, filter, frozenset, map, memoryview, range, reversed, tuple and zip suggest differently.
enumerate, filter, map, range, reversed and zip don't apply because they're not collections, you wouldn't be able to store the result of the computation anywhere. bytes comprehensions would make sense if string comprehensions are added. This leaves us with bytearray, frozenset and memoryview. How often are these used compared to strings, dicts, and lists?
If we were re-designing Python from scratch today, it is quite likely that we would have only generator comprehensions
I don't know about this, but unless everything besides generator expressions get deprecated the current comprehensions are here to stay and string comprehensions would fit perfectly alongside them (this is my opinion).

On Sat, May 01, 2021 at 06:21:43AM -0000, Valentin Berlier wrote:
You didn't say anything about *collections*, you talked about builtin *iterables*. And range is a collection:
    >>> import collections.abc
    >>> isinstance(range(10), collections.abc.Collection)
    True
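Editor's sketch extending the check above to several of the builtins under discussion, using the abstract base classes in collections.abc. The `classify` helper and the sample objects are assumptions made for this note.

```python
from collections.abc import Collection, Iterator


def classify(obj):
    """Label an object as a collection, a lazy iterator, or neither."""
    if isinstance(obj, Collection):
        return "collection"
    if isinstance(obj, Iterator):
        return "iterator"
    return "other"


samples = {
    "range": range(3),
    "str": "abc",
    "bytes": b"abc",
    "tuple": (1, 2),
    "frozenset": frozenset({1}),
    "map": map(str, [1]),
    "zip": zip([1], [2]),
    "filter": filter(None, [1]),
    "enumerate": enumerate([1]),
    "reversed": reversed([1]),
}
kinds = {name: classify(obj) for name, obj in samples.items()}
print(kinds["range"], kinds["map"])  # collection iterator
```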
you wouldn't be able to store the result of the computation anywhere.
I don't know what this objection means. The point of iterators like map, zip and filter is to *avoid* performing the computation until it is required. -- Steve

you talked about builtin *iterables*
My mistake, I reused the terminology used by the original author to make it easier to follow.
The point of iterators like map, zip and filter is to *avoid* performing the computation until it is required.
Of course. Maybe I wasn't clear enough. I don't know why we're bringing up these operators in a discussion about comprehensions. And what would a "range" comprehension even look like? To me the fact that there's no comprehensions for enumerate, filter, map, range, reversed and zip doesn't contribute to making dict, list and set exceptional cases. As I said we're left with bytearray, frozenset and memoryview. These are much less frequently used and don't even have a literal form so expecting comprehensions for them would be a bit nonsensical. On the other hand strings, bytes, lists, dicts and sets all have literal forms but only lists, dicts and sets have comprehensions. Three out of five doesn't make them exceptional cases so it's only logical to at least consider the idea of adding comprehensions for strings (and bytes) too.

On Fri, 30 Apr 2021 at 17:08, David Álvarez Lombardi <alvarezdqal@gmail.com> wrote:
I’m not against a specialised string generator construct per se (I’m not for it either :) as it’s not a problem I have experienced, and I’ve been doing a lot of string parsing/formatting at scale recently) but that doesn’t mean your use-cases are invalid. To me, the chosen syntax is problematic. The idea of introducing structural logic by using “” seems likely to cause confusion. Across all languages I use, quotes are generally and almost always used to introduce constant values. Sometimes, maybe, there are macro related things that may use quoting, but as a developer, if I see quotes, I’m thinking: the runtime will treat this as a constant. Having a special case where the quotes are a glorified function call just feels very wrong to me. And likely to be confusing. Steve

On Fri, Apr 30, 2021 at 9:06 AM David Álvarez Lombardi < alvarezdqal@gmail.com> wrote:
If a feature like this is useful -- and I'm not sure it is -- there is a much better way to do this IMHO. Add a new format converter to the syntax for replacement fields:
    >>> f"{c for c in dirty if c in string.ascii_letters !j}"
    'fsjGe'
where *!j* means join. It could optionally take a separator string as in this example:
    >>> f"{chr(65 + i) for i in range(4) !j('-')}"
    'A-B-C-D'
--- Bruce

Bruce Leban writes:
where *!j* means join. It could optionally take a separator string as in this example:
Converters *could* take arguments but they currently don't: it's a simple switch on a str argument. We already have one complex minilanguage inside {}, do we really want another? Maybe if we use regexps .... ;-) But seriously, if you want complex conversions, you can just call a function in there, which gives you arguments if you want them. Or in this context you can wrap the object in a proxy with an appropriate __format__. This can be quite generic, and allows you to put the arguments into the format spec. Steve
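Editor's sketch of the proxy-with-__format__ suggestion above. `Joined` is a hypothetical wrapper invented for this note, and using the format spec as the separator is one possible design, not something the thread specifies. The value of `dirty` is also assumed.

```python
import string


class Joined:
    """Hypothetical proxy: the format spec is used as the join separator."""

    def __init__(self, items):
        self.items = items

    def __format__(self, spec):
        # An empty spec (no ":" part) joins with the empty string.
        return spec.join(self.items)


dirty = "f9s8j!G3e"  # assumed sample input
letters = f"{Joined(c for c in dirty if c in string.ascii_letters)}"
dashed = f"{Joined(chr(65 + i) for i in range(4)):-}"

print(letters)  # fsjGe
print(dashed)   # A-B-C-D
```

This works with existing f-string syntax, at the cost of wrapping the iterable explicitly.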

Hi David
I see where you are coming from. I find it helps to think of sep.join as a special case. Here's a more general join, with sep.join equivalent to genjoin(sep, '', '').
    def genjoin(sep, left, right):
        def fn(items):
            return left + sep.join(items) + right
        return fn
Here's how it works
    genjoin('', '', '')('0123') == '0123'
    genjoin(',', '', '')('0123') == '0,1,2,3'
    genjoin(',', '[', ']')('0123') == '[0,1,2,3]'
All of these examples of genjoin can be thought of as string comprehensions. But they don't fit into your pattern for a string comprehension literal.
By the way, one might want something even more general. Sometimes one wants a fn such that
    fn('') == '[]'
    fn('0') == '[0,]'
    fn('01') == '[0,1,]'
which is again a string comprehension.
I hope this helps.
-- Jonathan
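Editor's sketch of one way to get the "even more general" behaviour described above, where every item is followed by the separator. `genjoin_trailing` is a hypothetical name and only one possible reading of the examples given.

```python
def genjoin(sep, left, right):
    # The function from the message above, repeated so this block runs on its own.
    def fn(items):
        return left + sep.join(items) + right
    return fn


def genjoin_trailing(sep, left, right):
    """Hypothetical variant: each item is followed by sep, including the last."""
    def fn(items):
        return left + "".join(item + sep for item in items) + right
    return fn


fn = genjoin_trailing(",", "[", "]")
print(fn(""), fn("0"), fn("01"))       # [] [0,] [0,1,]
print(genjoin(",", "[", "]")("0123"))  # [0,1,2,3]
```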

On Sat, May 1, 2021 at 2:52 AM Jonathan Fine <jfine2358@gmail.com> wrote:
For those cases where you're merging literal parts and generated parts, it may be of value to use an f-string:
    >>> f"[{','.join('0123')}]"
    '[0,1,2,3]'
The part in the braces is evaluated as Python code, and the rest is simple literals. ChrisA

On Fri, Apr 30, 2021 at 6:00 PM Chris Angelico <rosuav@gmail.com> wrote: For those cases where you're merging literal parts and generated
For readability, reuse and testing I think it often helps to have a function (whose name is meaningful). We can get this via
    as_list_int_literal = genjoin(',', '[', ']')
It would also be nice to allow as_list_int_literal to have a docstring (which could also be used for testing). I accept that in some cases Chris's ingenious construction has benefits. -- Jonathan

I appreciate the feedback, but I don't think the proposed ideas address any of my points.
1. *Consistency* (with other comprehensions)
2. *Intuitiveness* (as opposed to str.join(iter) which is widely deemed to be confusing and seemingly-backwards)
3. *Efficiency* (with respect to line count and function calls... though perhaps the cpython implementation could actually avoid the type switching and improve time complexity)
4. *Readability* (due to *much* clearer typing and lack of highly-nested function calls ( f"[{','.join('0123')}]" ) and higher-order functions ( genjoin('', '', '')('0123') ))
I would also like readers/commenters to consider the fact that, though I have only provided one use-case, the proposed enhancement would serve as the primary syntax for constructing or filtering a string *when dependent on any other iterable or condition*. I believe this to be an extremely common (almost universal) use-case. Here are just a couple more examples.
    new = c"x.lower() for x in old if x in HARDCODED_LIST"  # filter-in chars that appear in earlier-defined HARDCODED_LIST and convert to lower
    new = c"x for x in old if not x.isprintable()"  # filter-in non-printable chars
    new = c"str(int(x) + 1) for x in old if isinstance(x, int)"  # increment all integers by 1
To me, it is hard to see how any argument against this design (for anything other than implementation-difficulty or something along these lines) can be anything but an argument against iter comprehensions in general... but if someone disagrees, please say so. My goal is to *decrease* complexity, and personal/higher-order/nested procedures do not accomplish this in my eyes.
Thank you. DQAL
On Fri, Apr 30, 2021 at 1:10 PM Jonathan Fine <jfine2358@gmail.com> wrote:

On 2021-04-30 at 14:14:50 -0400, David Álvarez Lombardi <alvarezdqal@gmail.com> wrote: [...]
[...]
My goal is to *decrease* complexity, and personal/higher-order/nested procedures do not accomplish this in my eyes.
Embedding a[nother] domain specific language in a string also doesn't decrease complexity; look at all the regular expression builders. Unless you're a core developer (or perhaps not even then), I suspect that most library functions started as "personal" functions. Hey, here's something I need for this project ... hey, I just wrote that for the last project ... how many times will I write this before I stick it in general_utilities ... let's see what python-ideas thinks ... Add the following to your personal library and see how many times you use it in the coming weeks or months: def string_from_iterable_of_characters(iterable): return ''.join(iterable) I haven't tested anything, but string_from_iterable_of_characters should take everything inside your c-strings unchanged.

On 30/04/2021 19:14, David Álvarez Lombardi wrote:
You're actually adding an inconsistency: having a comprehension inside string quotes instead of not.
1. *Intuitiveness* (as opposed to str.join(iter), which is widely deemed to be confusing and seemingly-backwards)
Yes I agree your examples read nicely, without the usual boilerplate. Whether this is worth adding to the language is a moot point. Every addition increases the size of the compiler/interpreter, increases the maintenance burden, and adds to the learning curve for newbies (and not-so-newbies). As far as I can see in every case c'SOMETHING' can be replaced by ''.join(SOMETHING) or str.join('', (SOMETHING)) Having many ways to do the same thing is not a plus.
It seems to me it would probably save a function call. That seems like a minor consideration.
This seems to me to be making the same point as "Intuitiveness". Best wishes Rob Cliffe (I can't hack your heading auto-numbering so they've all ended up being numbered 1.)
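To illustrate the replacement rule described above, this sketch rewrites two of the c-string examples from earlier in the thread using today's syntax. HARDCODED_LIST and old are hypothetical sample values, since the thread never defines them.

```python
HARDCODED_LIST = ["A", "B", "c"]  # hypothetical stand-in for the earlier-defined list
old = "AbCdc\x00"                 # hypothetical input; ends with a non-printable NUL

# c"x.lower() for x in old if x in HARDCODED_LIST"
lowered = "".join(x.lower() for x in old if x in HARDCODED_LIST)

# c"x for x in old if not x.isprintable()"
non_printable = "".join(x for x in old if not x.isprintable())

# The unbound-method spelling gives the same result as "".join(...).
lowered_alt = str.join("", (x.lower() for x in old if x in HARDCODED_LIST))

print(repr(lowered), repr(non_printable), lowered == lowered_alt)
```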

On Sat, May 1, 2021 at 6:23 AM Rob Cliffe via Python-ideas <python-ideas@python.org> wrote:
(We can ignore the str.join('', THING) option, as that's just a consequence of the way that instance method lookups work, and shouldn't happen in people's code (although I'm sure it does).) If people want a more intuitive way to join things, how about this?
Or perhaps: ... def __rmul__(self, iter): ... return self.join(str(x) for x in iter) ...
["a", 123, "b"] * Str(" // ") 'a // 123 // b'
If you want an intuitive way to join strings, surely multiplying a collection by a string makes better sense than wrapping it up in a literal-like thing. A string-literal-like-thing already exists for complex constructions - it's the f-string. The c-string doesn't really add anything above that. ChrisA
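A self-contained sketch of the Str idea shown in the fragment above. Only the __rmul__ method appears in the message, so the class statement and docstring here are assumptions about how it was meant to be wrapped.

```python
class Str(str):
    """Hypothetical str subclass: `iterable * Str(sep)` joins the items with sep."""

    def __rmul__(self, iterable):
        # Reached because list, tuple and generator objects do not know how to
        # multiply themselves by a Str, so Python falls back to the reflected method.
        return self.join(str(x) for x in iterable)


result = ["a", 123, "b"] * Str(" // ")
print(result)
```

The same spelling works with a generator expression on the left-hand side, which is the closest analogue to the c-string examples.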

On 2021-04-30 11:14, David Álvarez Lombardi wrote:
The difference between your proposal and existing comprehensions is that strings are very different from lists, dicts, sets, and generators (which are the things we currently have comprehensions for). The syntax for those objects is Python syntax, which is strict and can include expressions that have meaning that is interpreted by Python. But strings can contain *anything*, and in general (apart from f-strings) their content is not parsed by Python. You can't do this: [wh4t3ver I feel like!!! okay?^@^&] But you can do this: "wh4t3ver I feel like!!! okay?^@^&" This means that the way people think about and visually comprehend strings is quite different from other Python types. You propose to have the string delimiters now contain actual Python code that Python will parse and run, but this isn't what people are used to seeing between quote marks. I think the closest existing thing to your string comprehensions is not any existing comprehension, but rather f-strings, which are the one place where Python does potentially parse and execute code in a string. However, f-strings are different in notable ways. First, the code in f-strings is delimited (by curly braces), so it is visually distinguished from "freeform" text within the string. Second, f-strings do not restrict the normal usage of strings for freeform text content (apart from making the curly brace characters special). So `f"wh4t3ver I feel like!!! okay?^@^&"` is a valid f-string just like it's a valid string. In your proposal (I assume), something like `c"item for item in other_seq and then the string text continues here"` would have to be a syntax error. That is, unlike f-strings (or any other existing kind of string), the string comprehension would "claim" the entire string and you could no longer put normal string content in there. Your proposal is focusing on strings as iterables and drawing a parallel with other kinds of iterables for which we have comprehensions. 
But strings aren't like other iterables because they're primarily vessels for freeform text content, not structured data. For the same reason, string comprehensions are likely to be less useful. I would look doubtfully on code that tried to do anything complex in a string comprehension, in the same way that I would look doubtfully on code that used f-strings with huge, complex expressions. It would be more readable to do whatever data preparation you need to do before creating the string and then use a simpler final step to create the string itself. Also, string comprehensions would only facilitate the creation of simple "linear" strings which draw their content sequentially from iterables. I find that in practice, if I want to create a string, programmatically, I'm not doing that. Rather, I'm pulling disparate content from different places and putting it together in a template-like fashion, in the way that f-strings or str.format() facilitate. So I don't think this proposal would have much practical use in string creation. So overall I think your proposed string comprehensions would tend to make Python code less readable in the relatively rare cases where they were useful at all. -- Brendan Barnwell "Do not follow where the path may lead. Go, instead, where there is no path, and leave a trail." --author unknown

Given that there is very little you can test about a single character, a new construct feels excessive. Basically, the only possible question is "is it in this subset of codepoints?" However, that use is perfectly covered by the str.translate() method already. Regular expressions also cover this well. On Fri, Apr 30, 2021, 12:08 PM David Álvarez Lombardi <alvarezdqal@gmail.com> wrote:

It's kind of weird that people seem to be missing the point about this. Python already has comprehensions for all the iterable builtins except strings. The proposed syntax doesn't introduce any new concept and would simply make strings more consistent with the rest of the builtins. The argument that we can already do this with the "".join() idiom is backwards. It's something we have to do _because_ there's no way to write a string comprehension directly. Comprehensions express intent. Joining a generator expression with an empty string doesn't convey the intent that you're building a string where each character is derived from another iterable. Also I haven't seen anyone acknowledge the potential performance benefits of string comprehensions. The "".join() idiom needs to go through the entire generator machinery to assemble the final string, whereas a decent implementation of string comprehensions would enable some pretty significant optimizations.

Strings are VERY different from other iterables. Every item in a string is itself an (iterable) string. In many ways, strings are more like scalars, and very often we treat them as such. You could make an argument that e.g. a NumPy array of homogeneous scalars is similar. However, that would be a wrong argument. Quite literally the ONLY predicate that can be expressed about a single character is it being a member of a subset of all Unicode characters. Yes, you could express that in convoluted ways like its ord() being in a certain range, but it boils down to subset membership. In contrast, predicates of unlimited complexity can be expressed of numbers. You can ask if an integer is prime. You can ask if the sine of the square of a float is more than pi/4. Arbitrary predicates make sense of arbitrary iterables. This is not so of the characters making up strings. On Fri, Apr 30, 2021, 11:08 PM Valentin Berlier <berlier.v@gmail.com> wrote:

the ONLY predicate that can be expressed about a single character is it being a member of a subset of all Unicode characters
You seem to be assuming that the comprehension would be purposefully restricted to iterating over strings. The original author already provided examples with predicates that don't involve checking for a subset of characters. old = [0, 1, None, 2] new = c"str(x + 1) for x in old if isinstance(x, int)" The existing "".join() idiom isn't restricted to iterating over an existing string. You also have to account for nested comprehensions. There's nothing that would prevent you from having arbitrary complexity in string comprehension predicates, just like nothing prevents you from having arbitrary predicates when you join a generator expression.
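A sketch of what this looks like with the existing idiom: the first statement is the example above rewritten with "".join(), and the second shows nested loops. The rows value is made up for illustration.

```python
# The source iterable is a list of mixed values, not a string.
old = [0, 1, None, 2]
new = "".join(str(x + 1) for x in old if isinstance(x, int))

# Nested loops work as in any other comprehension or generator expression.
rows = [["a", "b"], [], ["c"]]  # made-up sample data
flat = "".join(cell.upper() for row in rows if row for cell in row)

print(new, flat)
```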

On Sat, May 1, 2021 at 1:43 PM Valentin Berlier <berlier.v@gmail.com> wrote:
Rather than toy examples, how about scouring the Python standard library for some real examples? Find some actual existing code and show how it would be improved by this new construct. Consistency on its own is not a sufficient goal; you have to demonstrate that the change would be of material value. ChrisA

On 2021-05-01 at 03:05:51 -0000, Valentin Berlier <berlier.v@gmail.com> wrote:
In certain special cases, maybe. In the general case, no. How much optimization can you do on something like the following: c"f(c) for c in some_string if g(c)" I'll even let you assume that f and g are pure functions (i.e., no side effects), but you can't assume that f always returns a string of length 1. Even the simpler c"c + c for c in some_string" at some point has to decide whether (a) to collect all the pieces in a temporary container and join them at the end, or (b) to suffer quadratic (or worse) behavior by appending the pieces to an intermediate accumulator as it iterates. Also, how often do any of the use cases come up in inner loops, where performance is important?
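The two strategies (a) and (b) can be sketched in ordinary Python as follows. The function names are hypothetical, and this says nothing about how an actual implementation of the proposal would choose between them.

```python
def build_with_join(some_string, f, g):
    """Strategy (a): collect the pieces in a temporary list, join once at the end."""
    pieces = [f(c) for c in some_string if g(c)]
    return "".join(pieces)


def build_with_concat(some_string, f, g):
    """Strategy (b): append to an accumulator; each += may copy the whole string,
    which is where the quadratic worst case comes from."""
    result = ""
    for c in some_string:
        if g(c):
            result += f(c)
    return result


def double(c):
    # f is not required to return a string of length 1.
    return c + c


print(build_with_join("a1b", double, str.isalpha))
print(build_with_concat("a1b", double, str.isalpha))
```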

c"f(c) for c in some_string if g(c)"
Even this example would allow the interpreter to skip building the generator object and having to feed the result of every f(c) back into the iterator protocol. This is similar to f-strings vs str.format. You could say that f-strings are redundant because they can't do anything that str.format can't, but they make it possible to shave off the static overhead of going through python's protocols and enable additional optimizations.

On Fri, Apr 30, 2021 at 11:15 PM Valentin Berlier <berlier.v@gmail.com> wrote:
But that was not the primary motivator for adding them to the language. Nor is it the primary motivator for using them. I really like f-strings, and I have never even thought about their performance characteristics. With regard to the possible performance benefits of “string comprehensions”: Python is already poorly performant when working with strings character by character. Which is one reason we have nifty string methods like .replace() and .translate() (and .join()). I’d bet that many (most?) potential “string comprehensions” would perform better if done with string methods, even if they were optimized. Another note that I don’t think has been said explicitly: yes, strings are Sequences, but they are a very special case in that they can contain only one type of thing: length-1 strings. Which massively reduces the possible kinds of comprehensions one might write, and I suspect most of those are already covered by string methods. (Actually, I think this is a similar point to the one made by David Mertz.) -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

I started out seeing this, as the people objecting put it, as something that is really out of scope. But it did just occur to me that having to use str.join _inside_ an f-string expression is somewhat cumbersome. I mean, think of a typical repr for a sequence class: return f"MyClass({', '.join(str(item) for item in self)})" So maybe, instead of going for another kind of string, or string comprehensions, we could go for a format spec accepted by the format mini-language that does a "map to str and join" when the item is a generator? This might satisfy the O.P.'s request, introduce no fundamental changes in the way we think about the language, _and_ be somewhat useful. The example above could become return f"MyClass({self:, j})" The "j" suffix meaning to use ", " as the separator, and map the items to "str". That is if the option is kept as terse as the other indicators in the format mini-language; it could also be made more readable (bikeshed at will). (Other than that, I hope it is clear I am with Steven, Chris, Christopher et al. on the objections to the 'string comprehension' proposal as it is.) On Sat, 1 May 2021 at 17:36, Christopher Barker <pythonchb@gmail.com> wrote:
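The "j" spec does not exist in Python today, but the idea can be approximated in user code by giving a class its own __format__ method. The following is only a sketch under that assumption; the parsing rule (everything before a trailing "j" is the separator) is one guess at the intended semantics.

```python
class MyClass:
    """Toy sequence class used to emulate the suggested "j" format spec."""

    def __init__(self, *items):
        self._items = items

    def __iter__(self):
        return iter(self._items)

    def __format__(self, spec):
        # Hypothetical convention: a spec ending in "j" means "map the items
        # to str and join them with the text that precedes the j".
        if spec.endswith("j"):
            return spec[:-1].join(str(item) for item in self)
        return format(str(self), spec)

    def __repr__(self):
        # Equivalent to f"MyClass({', '.join(str(item) for item in self)})".
        return f"MyClass({self:, j})"


print(repr(MyClass(1, "a", 2.5)))
```

Built-in lists and generators would not honour such a spec without a language change, which is the part the suggestion leaves open.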

But that was not the primary motivator for adding them to the language.
I don't think the original author thinks that way either about string comprehensions. I was asked about the kind of speed benefits that string comprehensions would have over using a generator with "".join() and I used f-strings as an example because the benefits would be similar. By the way, now that I think about it, comprehensions would fit into f-string interpolation pretty nicely. f""" Guest list ({len(people)} people): {person.name + '\n' for person in people} """
Which massively reduces the possible kinds of comprehensions one might write, and I suspect most of those are already covered by string methods.
I actually replied to David Mertz about this. String comprehensions can derive substrings from any iterable. Just like the only requirement for using a generator expression in "".join() is that it produces strings. Comprehensions can also have nested loops which can come in handy at times. And of course this doesn't mean I'm going to advocate for using them with complex predicates.

Valentin Berlier writes:
That's nice! It's already (almost[1]) legal syntax, but it prints the repr of the generator object. This could work, though: f""" Guest list ({len(people)} people): {person.name + chr(10) for person in people:5.25i} """ with i for "iterate iterable". (The iterable might need to be parenthesized if it's a generator expression.) The width spec is intended to be max_elems.per_elem_width. I guess you could also generalize it to something like f""" Guest list ({len(people)} people): {person.name, '>25s', chr(10), '' for person in people:i} """ where the 2nd element of the tuple is a format spec to apply to each element, the 3rd is the separator and the 4th the terminator. Or perhaps those parameters belong in the syntax of the 'i' format code. I saw your later post that suggests making this default. We could tell programmers to use !s or !r if they want to see things like <generator object <genexpr> at 0x100fde580> Probably not, though, at least not if you want all iterables treated this way. Possibly this could be restricted to generators, especially if you use the element format as tuple syntax I proposed above rather than embed the element format spec in the overall generator spec. Steve Footnotes: [1] Need to substitute 'chr(10)' for '\n' in an f-string.
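A short sketch of the current behaviour being described, next to the join-based spelling that works today. The people list holds plain strings here instead of objects with a .name attribute, purely to keep the example self-contained.

```python
people = ["Ann", "Bob"]  # made-up sample data

# Today a generator expression inside {} is formatted via its repr, not expanded.
raw = f"{(name for name in people)}"
print(raw.startswith("<generator object"))

# The working spelling joins explicitly. The newline is kept in a variable
# because older Python versions reject backslashes inside the {} expression.
newline = "\n"
guest_list = (
    f"Guest list ({len(people)} people):\n"
    f"{''.join(name + newline for name in people)}"
)
print(guest_list)
```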

I really appreciate all the feedback and all of the thought put into this idea. I wanted to make a couple of comments on some of the responses and provide my current thoughts on the idea. --- Responses to comments ---
Yes. You are right. My use of "all" was technically incorrect. But I think it is *very* disingenuous to pretend that these types play anywhere near as central a role in python use as list, dict, and set... especially for newbies. Please try to provide contentful comments instead of "gotchas".
This is a very very helpful point. I will address it at the end.
The proposed syntax doesn't introduce any new concept and would simply make strings more consistent with the rest of the builtins. The argument that we can already do this with the "".join() idiom is backwards. It's something we have to do _because_ there's no way to write a string comprehension directly.
This is the mindset that I had. I understand there are other ways to do what I am asking. (I provided one in my initial post.) I am saying it relies on what I believe to be a notoriously unintuitive method (str.join) and an even more unintuitive way of calling it ("".join).
I understand if my initial examples led you to think this because I only iterated over a string "old" to construct "new", but consider the following. (I know it is a silly example but I'm just trying to get the point across.)
Rather than toy examples, how about scouring the Python standard library for some real examples?
Here are 73 of them that I found by grepping through Lib.
- https://github.com/python/cpython/blob/master/Lib/email/_encoded_words.py#L9...
- https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser...
- https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser...
- https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser...
- https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser...
- https://github.com/python/cpython/blob/master/Lib/email/_header_value_parser...
- https://github.com/python/cpython/blob/master/Lib/lib2to3/fixes/fix_import.p...
- https://github.com/python/cpython/blob/master/Lib/lib2to3/fixes/fix_next.py#...
- https://github.com/python/cpython/blob/master/Lib/lib2to3/refactor.py#L235
- https://github.com/python/cpython/blob/master/Lib/msilib/__init__.py#L178
- https://github.com/python/cpython/blob/master/Lib/msilib/__init__.py#L290
- https://github.com/python/cpython/blob/master/Lib/test/_test_multiprocessing...
- https://github.com/python/cpython/blob/master/Lib/test/multibytecodec_suppor...
- https://github.com/python/cpython/blob/master/Lib/test/test_audioop.py#L6
- https://github.com/python/cpython/blob/master/Lib/test/test_buffer.py#L853
- https://github.com/python/cpython/blob/master/Lib/test/test_code_module.py#L...
- https://github.com/python/cpython/blob/master/Lib/test/test_code_module.py#L...
- https://github.com/python/cpython/blob/master/Lib/test/test_codeccallbacks.p...
- https://github.com/python/cpython/blob/master/Lib/test/test_codeccallbacks.p...
- https://github.com/python/cpython/blob/master/Lib/test/test_codeccallbacks.p...
- https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L149
- https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L1544
- https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L1548
- https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L1552
- https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L1556
- https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L1953
- https://github.com/python/cpython/blob/master/Lib/test/test_codecs.py#L1991
- https://github.com/python/cpython/blob/master/Lib/test/test_decimal.py#L1092
- https://github.com/python/cpython/blob/master/Lib/test/test_decimal.py#L5346
- https://github.com/python/cpython/blob/master/Lib/test/test_email/test_email...
- https://github.com/python/cpython/blob/master/Lib/test/test_email/test_email...
- https://github.com/python/cpython/blob/master/Lib/test/test_fileinput.py#L91
- https://github.com/python/cpython/blob/master/Lib/test/test_fileinput.py#L92
- https://github.com/python/cpython/blob/master/Lib/test/test_fileinput.py#L93
- https://github.com/python/cpython/blob/master/Lib/test/test_fileinput.py#L94
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L360
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L366
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L372
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L378
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L384
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L391
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L397
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L403
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L409
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L415
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L421
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L427
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L433
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L455
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L457
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L459
- https://github.com/python/cpython/blob/master/Lib/test/test_gettext.py#L461
- https://github.com/python/cpython/blob/master/Lib/test/test_long.py#L305
- https://github.com/python/cpython/blob/master/Lib/test/test_lzma.py#L1049
- https://github.com/python/cpython/blob/master/Lib/test/test_lzma.py#L1087
- https://github.com/python/cpython/blob/master/Lib/test/test_random.py#L914
- https://github.com/python/cpython/blob/master/Lib/test/test_random.py#L921
- https://github.com/python/cpython/blob/master/Lib/test/test_random.py#L927
- https://github.com/python/cpython/blob/master/Lib/test/test_random.py#L933
- https://github.com/python/cpython/blob/master/Lib/test/test_random.py#L968
- https://github.com/python/cpython/blob/master/Lib/test/test_re.py#L1013
- https://github.com/python/cpython/blob/master/Lib/test/test_strtod.py#L226
- https://github.com/python/cpython/blob/master/Lib/test/test_ucn.py#L192
- https://github.com/python/cpython/blob/master/Lib/test/test_ucn.py#L65
- https://github.com/python/cpython/blob/master/Lib/test/test_unicodedata.py#L...
- https://github.com/python/cpython/blob/master/Lib/test/test_zipfile.py#L1833
- https://github.com/python/cpython/blob/master/Lib/tkinter/__init__.py#L268
- https://github.com/python/cpython/blob/master/Lib/unittest/test/test_asserti...
- https://github.com/python/cpython/blob/master/Lib/unittest/test/test_case.py...
- https://github.com/python/cpython/blob/master/Lib/urllib/parse.py#L907
- https://github.com/python/cpython/blob/master/Lib/xml/etree/ElementTree.py#L...
I think this is an over-simplification of the quotations syntax. Python has several prefix characters that you have to look out for when you see quotes, namely the following: r, u, f, fr, rf, b, br, rb. Not only can these change the construction syntax, but they can even construct an object of a completely different type (bytes).
Not to nit-pick too much, but the following is a valid string but not a valid f-string.
I view this as the strongest opposition to the idea in the whole thread, but I think that seal was broken with f-strings and the {}-syntax. The proposed syntax is different from those features only in *degree* (of deviation from strict char-arrays) not in *type*. But I also recognize that the delimiters {} go a long way in helping to mentally compartmentalize chars from python code.
--- My current thoughts ---
I definitely see the drawbacks of the originally-proposed syntax, but I think it would be beneficial to the conversation for commenters to recognize that *python strings are not nearly as pure as some of the objections make them out to be*. I would be happy to hear the objection that my syntax strays *too* far, but many of the passed-around examples attest to the fact that when users see quotes, they are often already in "code-evaluation" mode (e.g. f"[{','.join('0123')}]" ). I think that a comment left by Steve was particularly helpful.
Would readers see any merit in a syntax like the following?
Or would it stray too far from the behavior of the str() constructor in general? As of now, the behavior is the following.
*I don't intend to reinvent strings, I only mean to leverage an already existing means of signifying modified string construction syntax (prefixes) to align str construction syntax with the comprehensions available for the other most common builtin iterables, avoid the notoriously unintuitive "".join syntax, and improve readability.* Please continue to send your thoughts! I really appreciate it! DQAL On Sun, May 2, 2021 at 3:51 AM Stephen J. Turnbull < turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

On Mon, May 3, 2021 at 1:00 PM David Álvarez Lombardi <alvarezdqal@gmail.com> wrote:
Tests don't really count, so there's a small handful here. I haven't looked at them all. Some of them definitely could be done this way, but the best way to make your point is to show the current code and your proposed alternative, and show how the new syntax improves things. Not just "it could be done this way", but "this way looks massively better".
That's exactly because the curly braces are special. Not sure your point here?
The str constructor is also the generic "turn anything into a string" function. If it were not for that, I'd say it's fairly reasonable; but I don't want to see a genexp automatically pump itself and join the results just because someone printed it out. But if you wanted to make a dedicated constructor, eg str.from_substrings(iterable), that would definitely be viable. Would it be useful? Not sure.
I don't intend to reinvent strings, I only mean to leverage an already existing means of signifying modified string construction syntax (prefixes) to align str construction syntax with the comprehensions available for the other most common builtin iterables, avoid the notoriously unintuitive "".join syntax, and improve readability.
Yes, "".join() takes some learning, but if that's the problem being solved, I'd much rather look into simpler solutions. I'd really like to see str.__rmul__() accept any iterable and join it, as mentioned earlier (or maybe that was in a related thread):
or, if it became part of the core: ["Hello", "world"] * " " But in terms of embedding a join expression in the middle of an f-string (NOT cases where the join is the entire expression), I do think it'd be nice to have a mutator syntax that iterates over the thing, formatting each element according to the given definition, and outputting them all together - effectively equivalent to joining with an empty string. ChrisA
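To make the str() concern concrete, here is a sketch of how the constructor treats a generator today, next to a plain function standing in for the str.from_substrings(iterable) idea. That classmethod does not exist; the helper below is purely hypothetical.

```python
def from_substrings(iterable):
    """Hypothetical stand-in for the str.from_substrings(iterable) idea."""
    return "".join(iterable)


gen = (ch.upper() for ch in "abc")

# Generic conversion: str() returns the repr-like text and does NOT run the generator.
as_text = str(gen)

# Dedicated constructor: explicitly consumes the generator and joins the pieces.
joined = from_substrings(gen)

print(as_text.startswith("<generator object"), joined)
```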

For the record I am definitely a -1 on this. The arguments against are overwhelming and the arguments for are pretty weak. However I felt the need to rebut:
Tests don't really count, so there's a small handful here.
Tests 100% count as real use cases. If this is a pattern that would be useful in test case generation then we should be discussing that. I have worked on plenty of projects which were almost exclusively documented through tests. Being able to read and write tests fluently is as important as any other piece of code. On Sun, May 2, 2021 at 8:41 PM Chris Angelico <rosuav@gmail.com> wrote:

On Tue, May 4, 2021 at 5:16 AM Caleb Donovick <donovick@cs.stanford.edu> wrote:
That's true, but in many cases, tests are there to test specific functionality. Since no functionality is being removed, anything that's testing str.join() will need to continue testing str.join(). I didn't dig into the specific examples to see which ones were testing str.join and which ones happened to be using str.join to test something else. ChrisA

On Mon, 3 May 2021 at 04:00, David Álvarez Lombardi <alvarezdqal@gmail.com> wrote:
This is the mindset that I had. I understand there are other ways to do what I am asking. (I provided one in my initial post.) I am saying it relies on what I believe to be a notoriously unintuitive method (str.join) and an even more unintuitive way of calling it ("".join).
I think this is something of an exaggeration. It's "notoriously difficult" (;-)) for an expert to appreciate what looks difficult to a newcomer, but I'd argue that while ''.join() is non-obvious at first, it's something you learn once and then remember. If it's really awkward for you, you can write `concat = ''.join` and use that (but I'd recommend against it, as it makes getting used to the idiom *other* people use that much harder).
Here are 73 of them that I found by grepping through Lib.
Thank you. I only spot-checked one or two, but I assume from this list that your argument is simply that *all* occurrences of ''.join(something) can be replaced by c"something". Which suggests a couple of points: * If it doesn't add anything *more* than an alternative spelling for ''.join, is it worth it? * Is the fact that it's a quoted string construct going to add problematic edge cases? You can't use " inside c"..." without backslash-quoting it. That seems like it could be a problem, although I'll admit I can't come up with an example that doesn't feel contrived at the moment. In particular, is the fact that within c"..." you're writing a comprehension but you're not allowed to use unescaped " symbols, more awkward than using ''.join was originally?
On the contrary, I think you're missing the point here. When I, as a programmer, see "..." (with any form of prefix) I think "that's a constant". That's common for all quoting. I'd argue that even f-strings are very careful to avoid disrupting this intuition any more than necessary - yes, {...} within an f-string is executable code, but the non-constant part is delimited and it's conventionally limited to simple expressions. Conversely, it's basically impossible to view your c-strings as "mostly a constant value". Also, how would c-strings be handled in conjunction with other string forms? Existing string types can be concatenated by putting them adjacent to each other:
How would c-strings work? As code, I might want to format a generator over multiple lines. How would c-strings work with that?
(
    c"val.strip().upper() "
    c"for val in file "
    c"if val != '' "  # Skip empty lines
    c"and not val.startswith(chr(34))"  # And lines commented with " - chr(34) is ", but we can't use " directly without a backslash
)
That doesn't feel readable to me. I could use a triple-quoted c-string, but then I have an indentation problem. Also, with triple quoting I couldn't include those comments (or could I??? You haven't said whether comments are valid *within* c-strings. But I assume not - the syntax would be a nightmare otherwise).
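For comparison, a sketch of how both points look with existing syntax: adjacent literals are concatenated by the parser, and a generator expression passed to "".join() can span several lines with ordinary comments and unescaped quote characters. The in-memory file and its contents are made up, and the emptiness test uses val.strip() because lines read from a file keep their trailing newline.

```python
import io

# Adjacent string literals, including f-strings and raw strings, are concatenated.
name = "world"
greeting = "Hello, " f"{name}" r"\n stays raw"

# Made-up file contents: a blank line and a line "commented out" with a leading ".
file = io.StringIO('alpha\n\n"commented out\nbeta\n')

result = "".join(
    val.strip().upper()
    for val in file
    if val.strip() != ""         # skip empty lines
    and not val.startswith('"')  # skip lines commented with "
)

print(greeting)
print(result)
```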
That comes under the heading of making curly braces special...
Your proposal is focusing on strings as iterables and drawing a parallel with other kinds of iterables for which we have comprehensions. But strings aren't like other iterables because they're primarily vessels for freeform text content, not structured data.
I view this as the strongest opposition to the idea in the whole thread, but I think that seal was broken with f-strings and the {}-syntax. The proposed syntax is different from those features only in *degree* (of deviation from strict char-arrays) not in *type*. But I also recognize that the delimiters {} go a long way in helping to mentally compartmentalize chars from python code.
That's a very explicit "slippery slope" argument - "now that f-strings stopped quotes meaning constant, we can do anything we like" - and like most such arguments, it's a massive over-generalisation. f-strings were debated very carefully, and a lot of effort was put into the question of whether it broke the "literal string" intuition too much. The conclusion was that it didn't, *for that specific case*. But there's no reason to assume that the same arguments apply for other uses (and indeed, the decision was close enough that there's very good reasons to assume those arguments *won't* apply in general).
I definitely see the drawbacks of the originally-proposed syntax, but I think it would be beneficial to the conversation for commenters to recognize that python strings are not nearly as pure as some of the objections make them out to be. I would be happy to hear the objection that my syntax strays *too* far, but many of the passed-around examples attest to the fact that when users see quotes, they are often already in "code-evaluation" mode (eg. f"[{','.join('0123')}]" ).
OK. I'll object in those terms. I think your syntax proposal strays *way* too far. And I don't believe the example you gave is a good use of f-strings - I don't recall the context, but if it was from real code (rather than being a constructed example to make a point) I'd strongly insist that it be rewritten for better readability.
Would you object to using a different function rather than re-using the str constructor? How about "concat"? Or maybe ''.join? OK, so that was a little facetious, but hopefully you get my point, that you've now reached the point where you're in effect saying that the only thing you really want to change is the name of the ''.join function.
And that's why reusing str isn't going to be an option. It's a backward compatibility issue. It's not so much that anyone is relying on str() returning that particular value, but they *do* rely on it returning a (semi-)readable representation of the passed in value, and not executing it if it's a generator.
I don't intend to reinvent strings, I only mean to leverage an already existing means of signifying modified string construction syntax (prefixes) to align str construction syntax with the comprehensions available for the other most common builtin iterables, avoid the notoriously unintuitive "".join syntax, and improve readability.
The fact that you're getting responses suggesting you do want to "reinvent strings" implies that your proposed syntax is being understood in a way that wasn't your intent. That in itself is a strong indicator that the syntax isn't nearly as intuitive as you'd hoped (or as a language construct for Python typically needs to be to fit in with Python's "easily readable" style).
Please continue to send your thoughts! I really appreciate it!
I hope the above was of use. Overall, I'm a strong -1 on this proposal, I'm afraid. Paul

Summary: The argument in list(arg) must be iterable. The argument in str(arg) can be anything. Further, in [ a, b, c, d ] the content of the literal must be read by the Python parser as a Python expression. But in "this and that" the content need not be a Python expression.
Hi David
I find your suggestion a good one, in that to respond to it properly requires a good understanding of Python. This deepens our understanding of the language. I'm going to follow on from a contribution from Brendan Barnwell. Please consider the following examples.
Similarity.
    >>> list( x*x for x in range(5) )
    [0, 1, 4, 9, 16]
    >>> [ x*x for x in range(5) ]
    [0, 1, 4, 9, 16]
Difference.
    >>> tmp = (x*x for x in range(5)) ; list(tmp)
    [0, 1, 4, 9, 16]
    >>> tmp = (x*x for x in range(5)) ; [ tmp ]
    [<generator object <genexpr> at 0x7fec02319678>]
Difference.
    >>> list( (x*x for x in range(5)) )
    [0, 1, 4, 9, 16]
    >>> [ (x*x for x in range(5)) ]
    [<generator object <genexpr> at 0x7fec02319620>]
Now consider:
    >>> str( x * 2 for x in 'abc' )
    '<generator object <genexpr> at 0x7fec02319728>'
This last one genuinely surprised me. I was expecting 'aabbcc'. To understand this, first note the quote marks in the response. Next recall that str returns the string representation of the argument, via type(obj).__str__(obj).
My understanding of the situation is that the list comprehension [ x*x for x in range(5) ] is a shorthand for list( x*x for x in range(5) ). It works because list takes an iterable as its argument (if it has one argument). But str with one argument gives the string representation of an arbitrary object. Here's an example.
    >>> list(None)
    TypeError: 'NoneType' object is not iterable
    >>> str(None)
    'None'
Here's what Brendan wrote:
The difference between your proposal and existing comprehensions is that strings are very different from lists, dicts, sets, and generators (which are the things we currently have comprehensions for). The syntax for those objects is Python syntax, which is strict and can include expressions that have meaning that is interpreted by Python. But strings can contain *anything*, and in general (apart from f-strings) their content is not parsed by Python.
In a nutshell: The argument in list(arg) must be iterable. The argument in str(arg) can be anything. Further, in [ a, b, c, d ] the content of the literal must be a Python expression, whereas in "this and that" the content need not be a Python expression.
I hope this helps.
Jonathan
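The contrast summarised above can be checked directly. This is a minimal sketch; the variable names are illustrative, and the memory address inside the generator's repr will differ from run to run:

```python
# str() asks the object for its text representation; it does not iterate.
gen = (x * 2 for x in "abc")
as_text = str(gen)        # something like '<generator object <genexpr> at 0x...>'

# The generator is still unconsumed, so join can iterate over it.
joined = "".join(gen)     # 'aabbcc'

# list() insists on an iterable, while str() accepts any object.
try:
    list(None)
    list_error = None
except TypeError as exc:
    list_error = exc
none_text = str(None)     # 'None'

print(as_text, joined, type(list_error).__name__, none_text)
```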

On Mon, May 3, 2021 at 8:03 PM Jonathan Fine <jfine2358@gmail.com> wrote:
Closer parallel:
    >>> tmp = (x*x for x in range(5)) ; [ *tmp ]
    [0, 1, 4, 9, 16]
My understanding of the situation is that the list comprehension [ x*x for x in range(5) ] is a shorthand for list( x*x for x in range(5) ).
Sorta-kinda. It's not a shorthand in the sense that you can't simply replace one with the other, but they do have very similar behaviour, yes. A genexp is far more flexible than a list comp, so the compiled bytecode for list(genexp) has to go to a lot of unnecessary work to permit that flexibility, whereas the list comp can simplify things down. That said, I think the only way you'd actually detect a behavioural difference is if the name "list" has been rebound. But your main point (about str(x) not iterating) is absolutely correct. Perhaps, if Python were being started fresh right now, str(x) would have different behaviour, and the behaviour of "turn anything into a string" would be done by format(), but as it is, str(x) needs to come up with a string representation for x, without iterating over it (which might be impossible - consider an infinite generator). ChrisA
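A small sketch of the one behavioural difference named here, rebinding the name "list". The function name is hypothetical, and the shadowing is deliberately local so nothing else is affected:

```python
def compare_with_shadowed_list():
    # Deliberately shadow the builtin name inside this function only.
    list = tuple
    from_display = [x * x for x in range(5)]    # syntax: always builds a real list
    from_call = list(x * x for x in range(5))   # name lookup: now calls tuple
    return from_display, from_call

display_result, call_result = compare_with_shadowed_list()
print(type(display_result).__name__, type(call_result).__name__)
```

The list display never looks up the name, which is why it cannot be redirected this way.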

On Mon, May 03, 2021 at 09:04:51PM +1000, Chris Angelico wrote:
Only because the `list` name could be shadowed or rebound to something else. Syntactically and functionally, aside from the lazy vs eager difference, a comprehension is a comprehension and there is nothing generator comprehensions can do that list comprehensions can't. In Python 2 there were scoping differences between the two, but I believe that in Python 3 those have been eliminated.
but they do have very similar behaviour, yes. A genexp is far more flexible than a list comp,
Aside from the lazy nature of generator comprehensions, what else?
I don't think so. The bytecode in 3.9 is remarkably similar.
    >>> dis.dis('list(spam for spam in eggs)')
      1           0 LOAD_NAME                0 (list)
                  2 LOAD_CONST               0 (<code object <genexpr> at 0x7fc185ce0870, file "<dis>", line 1>)
                  4 LOAD_CONST               1 ('<genexpr>')
                  6 MAKE_FUNCTION            0
                  8 LOAD_NAME                1 (eggs)
                 10 GET_ITER
                 12 CALL_FUNCTION            1
                 14 CALL_FUNCTION            1
                 16 RETURN_VALUE
    Disassembly of <code object <genexpr> at 0x7fc185ce0870, file "<dis>", line 1>:
      1           0 LOAD_FAST                0 (.0)
            >>    2 FOR_ITER                10 (to 14)
                  4 STORE_FAST               1 (spam)
                  6 LOAD_FAST                1 (spam)
                  8 YIELD_VALUE
                 10 POP_TOP
                 12 JUMP_ABSOLUTE            2
            >>   14 LOAD_CONST               0 (None)
                 16 RETURN_VALUE
The bytecode for the list comp `[spam for spam in eggs]` is only three bytecodes shorter, so that doesn't support your comment about "a lot of unnecessary work". The list comp version, `dis.dis('[spam for spam in eggs]')`, can:
- skip the name lookup for list (LOAD_NAME);
- and the CALL_FUNCTION that ends up calling it.
The disassemblies of the two code objects, "<genexpr>" and "<listcomp>", have slightly different implementations but only differ by one bytecode overall.
As far as runtime efficiency, list comps are a little faster. Iterating over a 1000-item sequence is 33% faster for a list comp, but for a 100000-item sequence that drops to 25% faster. But as soon as you do a significant amount of work inside the comprehension, that work is likely to dominate the other costs.
There's definitely some overhead needed to support starting and stopping a generator, but we can argue that is an implementation detail. A sufficiently clever interpreter could avoid that overhead.
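For anyone who wants to repeat the comparison, here is a rough sketch using only the standard `dis` and `timeit` modules. The helper names `count_instructions` and `time_both` are made up for this example, and both the instruction counts and the timings depend on the interpreter version (newer CPython releases compile comprehensions differently from 3.9), so no particular numbers should be expected:

```python
import dis
import timeit


def count_instructions(source):
    """Count bytecode instructions for an expression, including nested code objects."""
    def walk(code):
        total = sum(1 for _ in dis.get_instructions(code))
        for const in code.co_consts:
            if hasattr(const, "co_code"):  # a nested <genexpr> or <listcomp> code object
                total += walk(const)
        return total
    return walk(compile(source, "<demo>", "eval"))


def time_both(size=1000, number=200):
    """Time the list comp and list(genexp) over a range of the given size."""
    setup = f"eggs = range({size})"
    comp = timeit.timeit("[spam for spam in eggs]", setup=setup, number=number)
    call = timeit.timeit("list(spam for spam in eggs)", setup=setup, number=number)
    return comp, call


print("list comp instructions:   ", count_instructions("[spam for spam in eggs]"))
print("list(genexp) instructions:", count_instructions("list(spam for spam in eggs)"))
print("timings (comp, call):     ", time_both())
```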
That said, I think the only way you'd actually detect a behavioural difference is if the name "list" has been rebound.
That and timing. -- Steve

On Mon, May 3, 2021 at 10:08 PM Steven D'Aprano <steve@pearwood.info> wrote:
I mention the rebinding, but I'm not ruling out the possibility of other distinctions, perhaps due to order of execution.
Yielding is bidirectional. You won't see it when you just pass it to the list() constructor, but the genexp can have values sent back into it. That entails some extra machinery that is completely unnecessary for building a list, although, as I mentioned...
... it's mainly just a matter of simplifications.
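The bidirectional machinery can be seen from the outside with `send()`. A minimal sketch (variable names are illustrative):

```python
squares = (x * x for x in range(5))

first = next(squares)              # runs to the first yield: 0
second = squares.send("ignored")   # resumes the genexp; the sent value is simply discarded: 1
rest = list(squares)               # the remaining values: [4, 9, 16]

# A list has no such protocol; there is nothing to resume.
list_has_send = hasattr([x * x for x in range(5)], "send")

print(first, second, rest, list_has_send)
```

The genexp never uses the value sent in, but the generator object still has to support receiving it.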
I don't think so. The bytecode in 3.9 is remarkably similar.
Yes, it looks similar.
YIELD_VALUE followed by POP_TOP is your clue that it's bidirectional. The comprehension simply appends onto the list immediately. The genexp has to have two completely separate scopes and switch between them; the list comp runs everything in the same inner scope, building up the list.
"Three bytecodes shorter" conceals the fact that some bytecodes do a LOT of work. Look into how much work it takes to restart a generator, and compare that to the bytecode "LIST_APPEND".
How often are you doing a significant amount of work inside a comprehension?
No, it can't - except by rewriting it as a list comp, and I'm not certain that there wouldn't be timing distinctions. A genexp cannot skip the overhead of being a generator.
Yes, I don't count that as a behavioural difference. Nor memory usage, within reason. ChrisA

On Mon, May 3, 2021 at 9:04 AM Paul Moore <p.f.moore@gmail.com> wrote:
Yeah, I don't get this point at all. The `"delim".join(collection)` idiom may not be the first pattern someone thinks of the first time. But you learn it once, maybe repeat it a second time, then it's easy. In contrast, each time I see the "string comprehension" again, I realize more and more stumbling points that I would continue to have for years. Plus the fact that it just LOOKS UGLY is a drawback.
I kinda like this. I'm tempted to start writing all of these this way. And if I wanted, I could add `concat(...)` to that parallel structure easily enough.
I hope the above was of use. Overall, I'm a strong -1 on this proposal, I'm afraid.
I'm more like -100.

On Sun, May 02, 2021 at 10:57:59PM -0400, David Álvarez Lombardi wrote:
I didn't say anything about those other iterable types playing a central role. Although now that you mention it, I *do* think that bytes, tuple, enumerate, range and zip are pretty central, even for newbies.
Please try to provide contentful comments instead of "gotchas".
One of your central arguments was that str is the only builtin iterable that doesn't have a comprehension form. That argument doesn't stand up to scrutiny. It doesn't even stand up if we weaken the argument to common, newbie-friendly, builtin containers with dedicated syntax: as well as str, there are bytes and tuple. No matter how you count them, the comprehension types (dict, set and list) don't exceed 50% of the candidates.
If we have str comprehensions, we'd need at least two prefixes: one for raw strings, one for regular (cooked) strings. If it's worth doing for strings, it's worth doing for bytes, which likewise would need two prefixes.
Another pillar of your argument is that ''.join is "unintuitive" for newbies. I don't give much weight to that argument. Especially not for newbies. Seriously, why do we think that people with no programming experience, who might not even know the difference between print and return or a variable and a constant, are the gold standard in being able to recognise a good language API? It breaks my brain, and my heart, when people argue that "it's intuitive" trumps "I thought really hard and carefully about this, and this is a better way". This is why we can't have nice things :-(
Anyway, let's go back to string comprehensions. To me, the argument that string comps could be more efficient is, at best, a weak argument. It isn't that I don't want more efficient Python code, but adding more and more specialised, single-purpose syntactic features for that efficiency is a poor way to do it. It makes the language harder to learn, and more work for implementers. But, if the efficiency gain is large, I guess it counts as an argument. If only it were a proven optimization, not a hypothetical one.
I'm not really comfortable with having syntax that looks like a quoted string contain executable code, but f-strings broke that trail so at least you have precedent in your favour. (Although I'm not as enamoured with f-strings as many folks.)
Ultimately, I think that the three major arguments in favour are weak:
- "strings are the only (important) iterable missing a comprehension" is just wrong;
- "str.join is unintuitive" depends on whose intuition you are talking about, but even if we agree it is still a weak argument: programming has many unintuitive things that need to be learned;
- and the optimization argument is purely hypothetical, and probably not enough to justify dedicated syntax.
Another weakness is that it can only join the substrings with no separator. I've looked at a sample of my code, and around 60% of the time I'm joining substrings I've given a separator, e.g. ', '.join(...) so a comprehension wouldn't work.
Ultimately I don't think this is a terrible idea, but so far it hasn't crossed the threshold of "benefits outweigh the costs".
-- Steve

To be clear, I'm -1 as well -- we just don't need it. but a few thoughts: On Mon, May 3, 2021 at 6:32 AM Steven D'Aprano <steve@pearwood.info> wrote:
If we have str comprehensions, we'd need at least two prefixes: one for
raw strings, one for regular (cooked) strings.
would we? I don't think so -- because of the other arguments made here -- a string comprehension would no longer be a string literal at all. After all, "raw strings" are not a different type, they are a different literal for the same type. That is, (duh) r"this\n" creates exactly the same string as "this\\n". IIUC the proposal, a string comprehension would be: c" expr for something in an_iterable" which would mean exactly the same as: "".join(expr for something in an_iterable) and thus there IS no escaping to ignore (or process) -- the expr could contain a raw string, an_iterable could be a raw string, but no need for a raw string comprehension. or an f-string comprehension, or ...
If it's worth doing for strings, it's worth doing for bytes,
I'm not so sure about that -- bytes are far more special purpose, and there are other nifty types like bytearrays.
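The raw-versus-cooked point can be checked with two lines, and the proposed c"..." form has an exact spelling in today's Python. A sketch (the proposed syntax itself is not valid Python, so only its `"".join(...)` equivalent is shown; variable names are illustrative):

```python
raw = r"this\n"       # raw literal: backslash followed by n
cooked = "this\\n"    # ordinary literal with an escaped backslash
same_string = raw == cooked

# What c"ch for ch in raw if ch.isalpha()" would mean under the proposal as read above:
letters_only = "".join(ch for ch in raw if ch.isalpha())

print(same_string, len(raw), letters_only)
```

Because the iterable and the expression are ordinary Python, any raw or f-string handling happens inside them, not in the comprehension.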
Another pillar of your argument is that ''.join is "unintuitive" for newbies. I don't give much weight to that argument.
I think the "".join() idiom is kinda non-intuitive -- heck I've been known, twenty years in, to absentmindedly write: a_sequence.join(",") Once in a while. And my newbie students definitely get tripped up by this. But they find comprehensions pretty confusing too :-) so I don't think this would "solve" that minor problem. Also -- I think you made this point: "intuitiveness" is nice, but it's not the primary design goal of a feature. What I do like about str.join() is that it very clearly is a string operation.
To me, the argument that string comps could be more efficient is, at
best, a weak argument.
Also, if this were a bottleneck, "".join(a_gen_expr) could be optimized. Now that I think about it, a lot of uses of generator expressions could be optimized whenever it is iterated over right away. (Well, maybe, with the ability to rename "list" and such it would be a bit tricky). Would it be worth it to do that? I doubt it. Those aren't often in tight loops, because they ARE the loop :-)
I'm not really comfortable with having syntax that looks like a quoted
I am :-) But while f-strings do put executable code inside a string, they are more conceptually similar to regular string literals -- right down to having a raw version :-) However, if one wants to go with that argument, maybe a different delimiter than " -- I think back ticks are available -- and even used to mean (stringify) so not so bad. But I'm still -1 -CHB -- Christopher Barker, PhD (Chris) Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython

On Mon, May 03, 2021 at 11:49:31AM -0700, Christopher Barker wrote:
On further thought, I think you're right. String comprehensions aren't literals; they're not even a hybrid "part literal, part code" like f-strings. So scrub the raw comprehension versions. That just leaves a string version and a bytes version. [...]
There is only one truly intuitive interface, and that is the nipple. Everything else is learned. Just because we occasionally screw up and get syntax wrong doesn't make it "unintuitive" in any meaningful sense. We've all messed up code from time to time, especially when we're distracted, or tired and emotional. I've been known to write dicts `{key=value}`, invariably when it is a large dict with dozens of entries. Also `import func from module`, my fingers frequently type string.strip when I meant string.split, and vice versa, and for my most embarrassing mistake I once managed to write a module with no fewer than six classes like this:
    def MyClass(object):
        def __init__(self, obj):
            ...
before actually running the code and discovering that it didn't do what I wanted. So I wouldn't read too much into the occasional typo or braino.
Indeed. I found comprehensions confusing too, and that was despite having many years experience with the syntax that inspired it, set builder notation in mathematics. For the longest time I had to literally write out my comprehension using mathematical notation and manually translate it to Python syntax to get anywhere. -- Steve

On Sat, May 01, 2021 at 03:05:51AM -0000, Valentin Berlier wrote:
No it doesn't. I count 15 builtin iterables, only three have comprehensions.
Okay. What string comprehension do I write to express my intent to write a string containing words separated by commas? What string comprehension do I write to express my intent to write a string containing lines separated by newlines? What string comprehension do I write to express my intent to write a string containing substrings separated by ' - ' (space, hyphen, space)? `str.join` can express the intent of every single one of those, as well as the intent to write a string containing substrings separated by the empty string.
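For reference, each of the intents listed above is a one-liner with `str.join`. A minimal sketch with made-up sample data:

```python
words = ["spam", "eggs", "ham"]

comma_separated = ",".join(words)    # words separated by commas
one_per_line = "\n".join(words)      # lines separated by newlines
dashed = " - ".join(words)           # substrings separated by ' - '
run_together = "".join(words)        # substrings separated by the empty string

print(comma_separated)
print(dashed)
print(run_together)
```

Only the last form has a counterpart in the proposed comprehension syntax.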
Of course it does. What else could `''.join(expression)` mean, if not to build a string with the substrings derived from expression separated by the empty string?
Do you know what's worse than premature optimization? Accepting a new special-case language feature on the basis that, maybe some day, it might possibly enable a premature optimization. If you're going to claim a micro-optimization benefit, I think you need more than just to hand-wave that "a decent implementation" would allow it. Let's start with the simplest case: c'substring for substring in expression' What optimizations are available for that? -- Steve

I thought I had sent a response to this a few hours ago, but it seems to have been eaten by the email gremlins. Apologies if this ends up as a duplicate. On Fri, Apr 30, 2021 at 12:03:15PM -0400, David Álvarez Lombardi wrote:
I propose a syntax for constructing/filtering strings analogous to the one available for all other builtin iterables.
*All* others? The builtin iterables bytearray, bytes, enumerate, filter, frozenset, map, memoryview, range, reversed, tuple and zip suggest differently. It isn't that str is the exceptional case, it is that dict, list and set are the exceptional cases. In fact, there is a sense that this is a historical accident, that list comprehensions happened to have been invented first. If we were re-designing Python from scratch today, it is quite likely that we would have only generator comprehensions:
    list(expression for x in iterable)
    set(expression for x in iterable)
    dict((key, value) for x in iterable)
-- Steve
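The generator-only spellings already work today and produce the same objects as the dedicated comprehension syntax. A short sketch with illustrative data:

```python
data = ["a", "bb", "ccc"]

list_from_gen = list(len(s) for s in data)
set_from_gen = set(len(s) for s in data)
dict_from_gen = dict((s, len(s)) for s in data)

# The same results written with the dedicated comprehension syntax.
list_comp = [len(s) for s in data]
set_comp = {len(s) for s in data}
dict_comp = {s: len(s) for s in data}

print(list_from_gen == list_comp, set_from_gen == set_comp, dict_from_gen == dict_comp)
```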

The builtin iterables bytearray, bytes, enumerate, filter, frozenset, map, memoryview, range, reversed, tuple and zip suggest differently.
enumerate, filter, map, range, reversed and zip don't apply because they're not collections, you wouldn't be able to store the result of the computation anywhere. bytes comprehensions would make sense if string comprehensions are added. This leaves us with bytearray, frozenset and memoryview. How often are these used compared to strings, dicts, and lists?
If we were re-designing Python from scratch today, it is quite likely that we would have only generator comprehensions
I don't know about this, but unless everything besides generator expressions get deprecated the current comprehensions are here to stay and string comprehensions would fit perfectly alongside them (this is my opinion).

On Sat, May 01, 2021 at 06:21:43AM -0000, Valentin Berlier wrote:
You didn't say anything about *collections*, you talked about builtin *iterables*. And range is a collection:
    >>> import collections.abc
    >>> isinstance(range(10), collections.abc.Collection)
    True
you wouldn't be able to store the result of the computation anywhere.
I don't know what this objection means. The point of iterators like map, zip and filter is to *avoid* performing the computation until it is required. -- Steve
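Both halves of this exchange can be made concrete: which builtins count as collections, and what "avoiding the computation until it is required" looks like. A sketch, with the helper `record` and the list `calls` invented for the demonstration:

```python
from collections.abc import Collection, Iterator

calls = []

def record(x):
    """Remember that x was processed, then return a transformed value."""
    calls.append(x)
    return x * 2

lazy = map(record, [1, 2, 3])
calls_before = list(calls)   # nothing has been computed yet
result = list(lazy)          # consuming the iterator performs the work
calls_after = list(calls)

range_is_collection = isinstance(range(10), Collection)
map_is_collection = isinstance(map(str, []), Collection)
map_is_iterator = isinstance(map(str, []), Iterator)

print(calls_before, result, calls_after)
print(range_is_collection, map_is_collection, map_is_iterator)
```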

you talked about builtin *iterables*
My mistake, I reused the terminology used by the original author to make it easier to follow.
The point of iterators like map, zip and filter is to *avoid* performing the computation until it is required.
Of course. Maybe I wasn't clear enough. I don't know why we're bringing up these operators in a discussion about comprehensions. And what would a "range" comprehension even look like? To me the fact that there's no comprehensions for enumerate, filter, map, range, reversed and zip doesn't contribute to making dict, list and set exceptional cases. As I said we're left with bytearray, frozenset and memoryview. These are much less frequently used and don't even have a literal form so expecting comprehensions for them would be a bit nonsensical. On the other hand strings, bytes, lists, dicts and sets all have literal forms but only lists, dicts and sets have comprehensions. Three out of five doesn't make them exceptional cases so it's only logical to at least consider the idea of adding comprehensions for strings (and bytes) too.

On Fri, 30 Apr 2021 at 17:08, David Álvarez Lombardi <alvarezdqal@gmail.com> wrote:
I’m not against a specialised string generator construct per-se (I’m not for it either :) as it’s not a problem I have experienced, and I’ve been doing a lot of string parsing/formatting at scale recently) but that doesn’t mean your use-cases are invalid. To me, the chosen syntax is problematic. The idea of introducing structural logic by using “” seems likely to cause confusion. Across all languages I use, quotes are generally and almost always used to introduce constant values. Sometimes, maybe, there are macro related things that may use quoting, but as a developer, if I see quotes, I’m thinking: the runtime will treat this as a constant. Having a special case where the quotes are a glorified function call just feels very wrong to me. And likely to be confusing. Steve

On Fri, Apr 30, 2021 at 9:06 AM David Álvarez Lombardi < alvarezdqal@gmail.com> wrote:
If a feature like this is useful -- and I'm not sure it is -- there is a much better way to do this IMHO. Add a new format converter to the syntax for replacement fields:
    >>> f"{c for c in dirty if c in string.ascii_letters !j}"
    'fsjGe'
where !j means join. It could optionally take a separator string as in this example:
    >>> f"{chr(65 + i) for i in range(4) !j('-')}"
    'A-B-C-D'
--- Bruce

Bruce Leban writes:
where !j means join. It could optionally take a separator string as in this example:
Converters *could* take arguments but they currently don't: it's a simple switch on a str argument. We already have one complex minilanguage inside {}, do we really want another? Maybe if we use regexps .... ;-) But seriously, if you want complex conversions, you can just call a function in there, which gives you arguments if you want them. Or in this context you can wrap the object in a proxy with an appropriate __format__. This can be quite generic, and allows you to put the arguments into the format spec. Steve
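One way the proxy idea might look in practice. This is a sketch only: the class name `Joined` and the choice to treat the format spec as the separator are assumptions made for illustration, not something spelled out in the thread, and the sample value of `dirty` is invented:

```python
import string


class Joined:
    """Hypothetical proxy: joins an iterable of strings, using the format spec as separator."""

    def __init__(self, items):
        self.items = items

    def __format__(self, spec):
        # An empty spec means "join with nothing in between".
        return spec.join(self.items)


dirty = "f$sj!G3e"  # made-up input for the demonstration

clean = f"{Joined(c for c in dirty if c in string.ascii_letters)}"
dashed = f"{Joined(chr(65 + i) for i in range(4)):-}"

print(clean, dashed)
```

This needs no new syntax: everything after the colon in the replacement field is passed to `__format__` unchanged.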
participants (16)
- 2QdxY4RzWzUUiLuE@potatochowder.com
- Brendan Barnwell
- Bruce Leban
- Caleb Donovick
- Chris Angelico
- Christopher Barker
- David Mertz
- David Álvarez Lombardi
- Joao S. O. Bueno
- Jonathan Fine
- Paul Moore
- Rob Cliffe
- Stephen J. Turnbull
- Stestagg
- Steven D'Aprano
- Valentin Berlier