On 8/11/2019 8:40 PM, Eric V. Smith wrote:

On 8/11/2019 4:18 PM, Glenn Linderman wrote:

On 8/11/2019 2:50 AM, Steven D'Aprano wrote:

On Sat, Aug 10, 2019 at 12:10:55PM -0700, Glenn Linderman wrote:

Or invent "really raw" in some spelling, such as rr"c:\directory\"
or e for exact, or x for exact, or <your favorite character
here>"c:\directory\"

And that brings me to the thought that if \e wants to become an
escape for escape, that maybe there should be an "extended escape"
prefix... if you want to use more escapes, define ee"string where \\
can only be used as an escape or escaped character, \e means the ASCII
escape character, and \ followed by a character with no escape
definition would be an error."

Please no.

We already have b-strings, r-strings, u-strings, f-strings, br-strings,
rb-strings, fr-strings, rf-strings, each of which comes in four
varieties (single quote, double quote, triple single quote and triple
double quote). Now you're talking about adding rr-strings, v-strings
(Greg suggested that) and ee-strings, presumably some or all of which
will need b*- and *b- or f*- and *f- varieties too.

Don't forget the upper & lower case varieties :)

And all orders!

>>> _all_string_prefixes()
{'', 'b', 'BR', 'bR', 'B', 'rb', 'F', 'RF', 'rB', 'FR', 'Rf', 'Fr', 'RB', 'f', 'r', 'rf', 'rF', 'R', 'u', 'fR', 'U', 'Br', 'Rb', 'fr', 'br'}
>>> len(_all_string_prefixes())
25

And if you add just 'bv' and 'fv', it's 41:

{'', 'fr', 'Bv', 'BR', 'F', 'rb', 'Fv', 'VB', 'vb', 'vF', 'br', 'FV', 'vf', 'FR', 'fV', 'bV', 'Br', 'Vb', 'Rb', 'RF', 'bR', 'r', 'R', 'Vf', 'fv', 'U', 'RB', 'B', 'rB', 'vB', 'Fr', 'rF', 'fR', 'Rf', 'BV', 'VF', 'bv', 'b', 'u', 'f', 'rf'}

There would be no need for 'uv' (not needed for backward compatibility) or 'rv' (can't be both raw and verbatim).

I'm not in any way serious about this. I just want people to realize how many wacky combinations there would be. And heaven forbid we ever add some combination of 3 characters. If 'rfv' were actually also valid, you get to 89:

{'', 'br', 'vb', 'fR', 'F', 'rFV', 'fRv', 'fV', 'rVF', 'Rfv', 'u', 'vRf', 'fVR', 'rfV', 'Fvr', 'vrf', 'fVr', 'vB', 'Vb', 'Rvf', 'Fv', 'Fr', 'FVr', 'B', 'rVf', 'FVR', 'vfr', 'VB', 'VrF', 'BR', 'VRf', 'vfR', 'FR', 'Br', 'RFV', 'Rf', 'fvR', 'f', 'rb', 'VfR', 'VFR', 'fr', 'vFR', 'VRF', 'frV', 'bR', 'b', 'FrV', 'r', 'R', 'RVF', 'FV', 'rvF', 'FRV', 'Vrf', 'rvf', 'FRv', 'Frv', 'vF', 'bV', 'VF', 'fv', 'RF', 'RB', 'rB', 'vRF', 'RFv', 'RVf', 'Rb', 'Vfr', 'vrF', 'rf', 'Bv', 'vf', 'rF', 'U', 'bv', 'FvR', 'RfV', 'Vf', 'VFr', 'vFr', 'fvr', 'BV', 'rFv', 'rfv', 'fRV', 'frv', 'RvF'}

If only we could deprecate upper case prefixes!

Eric

Yes. Happily, while there is a combinatorial explosion in spellings and casings, there is no cognitive overload: each character has an independent effect on the interpretation and use of the string, so once you understand the 5 existing types (b, r, u, f, and plain) you understand them all.
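For anyone who wants to check the counts quoted above (25, 41, 89), here is a minimal sketch that enumerates every ordering and casing of a set of base prefixes. The name all_spellings is mine, and this is an independent illustration of the counting, not the private helper used in the session quoted above.

```python
from itertools import permutations, product


def all_spellings(prefixes):
    """Return every ordering and upper/lower casing of the given prefixes.

    Hypothetical helper for illustration only; the empty prefix is
    always included.
    """
    result = {""}
    for prefix in prefixes:
        for order in permutations(prefix):
            # Each character may independently be lower or upper case.
            for cased in product(*[(c.lower(), c.upper()) for c in order]):
                result.add("".join(cased))
    return result


current = ["b", "r", "u", "f", "br", "fr"]
with_v = current + ["bv", "fv"]
with_rfv = with_v + ["rfv"]

print(len(all_spellings(current)))   # 25
print(len(all_spellings(with_v)))    # 41
print(len(all_spellings(with_rfv)))  # 89
```

A two-character prefix contributes 2 orderings times 4 casings, and a three-character prefix contributes 6 orderings times 8 casings, which is where the jump from 41 to 89 comes from.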

Should we add one or two more, it would be with the understanding (hopefully reflected in the documentation also) that v and e would effectively be replacements for r and plain, rather than being combined with them.

Were I to design a new language with similar string syntax, I think I would use plain quotes for verbatim strings only, and have the following prefixes, in only a single case:

(no prefix) - verbatim UTF-8 (at this point, I see no reason not to require UTF-8 for the encoding of source files)
b - for verbatim bytes
e - allow (only explicitly documented) escapes
f - format strings

Actually, the above could be done as a preprocessor for Python, or a future import. In other words, what you see is what you get, until you add a prefix to add additional processing. The only combinations that seem useful are eb and ef. I don't know whether constraining the order of the prefixes would be helpful; if it is, I have no problem with a canonical ordering being prescribed.

As a future import, one could code modules to either the current combinatorial explosion with all its gotchas, special cases, and passing of undefined escapes; or one could code to the clean limited cases above.
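To make the e prefix idea a bit more concrete, here is a rough sketch of what "only explicitly documented escapes" might mean if done as a function over verbatim text. The function name, the ValueError, and the particular escape table are all my assumptions for illustration; a real proposal would have to settle the list of escapes.

```python
# Hypothetical escape table: only these are defined, everything else is an error.
ESCAPES = {
    "\\": "\\",
    "n": "\n",
    "t": "\t",
    '"': '"',
    "'": "'",
    "e": "\x1b",  # ASCII escape character, as suggested earlier in the thread
}


def process_escapes(verbatim: str) -> str:
    """Expand the documented escapes and reject any undefined one."""
    out = []
    chars = iter(verbatim)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, None)  # None means a trailing lone backslash
        if nxt not in ESCAPES:
            raise ValueError(f"undefined escape: \\{nxt or ''}")
        out.append(ESCAPES[nxt])
    return "".join(out)


print(repr(process_escapes(r"\e[0m")))  # '\x1b[0m'
```

With this table, process_escapes(r"c:\directory") raises instead of silently passing \d through, which is the behavior change being suggested.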

Another thing that seems awkward about the current strings is that {{ and }} become "special escapes". If it were not for the permissive usage of \{ and \} in the current plain string processing, \{ and \} could have been used to escape the non-format-expression uses of { and }, which would be far more consistent with other escapes. Perhaps the future import could regularize that, also.
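A small demonstration of the two behaviors mentioned here, as they stand in CPython today: doubled braces are the escape inside an f-string, while \{ in a plain string is an undefined escape that is passed through with the backslash kept. The eval and the warning filter are only there so the snippet runs quietly; recent versions warn about undefined escapes, and the documentation says they are intended to become an error eventually.

```python
import warnings

x = 42

# Inside an f-string, {{ and }} produce literal braces.
doubled = f"{{x}} is literal, {x} is substituted"
print(doubled)  # {x} is literal, 42 is substituted

# In a plain string, \{ is not a defined escape, so both characters survive.
# The literal is compiled via eval so the warning can be suppressed locally.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    passed_through = eval(r'"\{"')
print(len(passed_through))  # 2
```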

A future import would have no backward compatibility issues to disrupt a simplified, more regular syntax.

Does anyone know of an existing feature that couldn't be expressed in a straightforward manner with only the above capabilities?

The only other thing that I have heard about regarding strings is that multi-line strings have their first line indented, and other lines not. Some have recommended making the first line blank, and just chopping off the first \n, others have recommended indenting all lines, and replacing "\n" followed by the number of indented spaces by "\n", so the text can be aligned in the code like it will be aligned for use. Both techniques seem to have their place in aiding code readability. Both techniques could be used together, in practice, using one more prefix character for triple quotes only:

    longstring = l"""
The traditional first blank line form
could be used as it has."""

If the first character of a long-string is a newline character, then it will be removed. If the string wants to have an initial newline character, a second one can be provided, which would not be removed.

     longstring = l"""The traditional indented form
                      could be used as it has, also."""

This would be contracted by removing, after each newline within the string, up to as many space characters as precede the first character of the string on its first line (if the lexer can provide that column). If fewer space characters are available after a newline, only the number available would be removed. If there are more, they would be retained.

A new form would also be permitted:

    longstring = l"""
        An indented form that isn't pushed as far right as the
        traditional indented form could also be used."""

If the first character of an l-string is a newline and the second character is a space character, this form would count the number of space characters in the second line, and remove up to that many space characters from all lines, as well as removing the initial newline character.

If l-strings were implemented (l for layout), they could be combined with f and/or e.
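The three l-string rules above can be sketched as an ordinary function, which may make the edge cases easier to discuss. Everything here is hypothetical: the names layout and _strip_up_to are mine, and start_col stands in for the column of the first character of the string, which only the lexer would actually know.

```python
def _strip_up_to(line: str, width: int) -> str:
    """Remove at most `width` leading space characters from `line`."""
    n = 0
    while n < width and n < len(line) and line[n] == " ":
        n += 1
    return line[n:]


def layout(s: str, start_col: int = 0) -> str:
    """Hypothetical l-string processing, applied to the verbatim contents."""
    if s.startswith("\n"):
        # Blank-first-line forms: drop the initial newline, then use the
        # indentation of the next line (possibly zero) as the strip width.
        lines = s[1:].split("\n")
        width = len(lines[0]) - len(lines[0].lstrip(" "))
        return "\n".join(_strip_up_to(line, width) for line in lines)
    # Traditional indented form: continuation lines lose up to start_col spaces.
    first, *rest = s.split("\n")
    return "\n".join([first] + [_strip_up_to(line, start_col) for line in rest])


text = layout("\n        An indented form\n        could also be used.")
print(text)
```

For the blank-first-line forms, textwrap.dedent followed by dropping the leading newline gets close to this today, but it removes only whitespace common to all lines rather than taking the width from the second line.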

Are there any other string feature workarounds in common use that could be codified in a future import scenario?

Glenn