On 8/11/2019 8:40 PM, Eric V. Smith wrote:
On 8/11/2019 4:18 PM, Glenn Linderman wrote:
On 8/11/2019 2:50 AM, Steven D'Aprano wrote:
On Sat, Aug 10, 2019 at 12:10:55PM -0700, Glenn Linderman wrote:
Or invent "really raw" in some spelling, such as rr"c:\directory\"
or e for exact, or x for exact, or
<your favorite character here>"c:\directory\"
And that brings me to the thought that if \e wants to become an
escape for escape, that maybe there should be an "extended escape"
prefix... if you want to use more escapes, define ee"string where \\
can only be used as an escape or escaped character, \e means the
ASCII escape character, and \ followed by a character with no escape
definition would be an error."
Please no.
We already have b-strings, r-strings, u-strings, f-strings,
br-strings, rb-strings, fr-strings, rf-strings, each of which comes
in four varieties (single quote, double quote, triple single quote
and triple double quote). Now you're talking about adding
rr-strings, v-strings (Greg suggested that) and ee-strings,
presumably some or all of which will need b*- and *b- or f*- and
*f- varieties too.
Don't forget the upper & lower case varieties :)
And all orders!
>>> _all_string_prefixes()
{'', 'b', 'BR', 'bR', 'B', 'rb', 'F', 'RF', 'rB', 'FR', 'Rf',
'Fr', 'RB', 'f', 'r', 'rf', 'rF', 'R', 'u', 'fR', 'U', 'Br', 'Rb',
'fr', 'br'}
>>> len(_all_string_prefixes())
25
And if you add just 'bv' and 'fv', it's 41:
{'', 'fr', 'Bv', 'BR', 'F', 'rb', 'Fv', 'VB', 'vb', 'vF', 'br',
'FV', 'vf', 'FR', 'fV', 'bV', 'Br', 'Vb', 'Rb', 'RF', 'bR', 'r',
'R', 'Vf', 'fv', 'U', 'RB', 'B', 'rB', 'vB', 'Fr', 'rF', 'fR',
'Rf', 'BV', 'VF', 'bv', 'b', 'u', 'f', 'rf'}
There would be no need for 'uv' (not needed for backward
compatibility) or 'rv' (can't be both raw and verbatim).
I'm not in any way serious about this. I just want people to
realize how many wacky combinations there would be. And heaven
forbid we ever add some combination of 3 characters. If 'rfv' were
actually also valid, you get to 89:
{'', 'br', 'vb', 'fR', 'F', 'rFV', 'fRv', 'fV', 'rVF', 'Rfv', 'u',
'vRf', 'fVR', 'rfV', 'Fvr', 'vrf', 'fVr', 'vB', 'Vb', 'Rvf', 'Fv',
'Fr', 'FVr', 'B', 'rVf', 'FVR', 'vfr', 'VB', 'VrF', 'BR', 'VRf',
'vfR', 'FR', 'Br', 'RFV', 'Rf', 'fvR', 'f', 'rb', 'VfR', 'VFR',
'fr', 'vFR', 'VRF', 'frV', 'bR', 'b', 'FrV', 'r', 'R', 'RVF',
'FV', 'rvF', 'FRV', 'Vrf', 'rvf', 'FRv', 'Frv', 'vF', 'bV', 'VF',
'fv', 'RF', 'RB', 'rB', 'vRF', 'RFv', 'RVf', 'Rb', 'Vfr', 'vrF',
'rf', 'Bv', 'vf', 'rF', 'U', 'bv', 'FvR', 'RfV', 'Vf', 'VFr',
'vFr', 'fvr', 'BV', 'rFv', 'rfv', 'fRV', 'frv', 'RvF'}
If only we could deprecate upper case prefixes!
Eric
Yes. Happily, while there is a combinatorial explosion in spellings
and casings, there is no cognitive overload: each character has an
independent effect on the interpretation and use of the string, so
once you understand the 5 existing types (b, r, u, f, and plain) you
understand them all.
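The counts quoted above (25, 41, 89) can be reproduced with a short
sketch. The helper name all_prefixes and the base list are my own for
illustration; the approach (every ordering of each base prefix, in
every upper/lower casing) is my understanding of what
_all_string_prefixes() does, not a copy of it.

```python
import itertools

def all_prefixes(base):
    """Expand base prefixes into every ordering and every casing."""
    result = {''}  # the empty prefix (plain string) is always valid
    for prefix in base:
        for perm in itertools.permutations(prefix):
            # each character may independently be lower or upper case
            for cased in itertools.product(*[(c, c.upper()) for c in perm]):
                result.add(''.join(cased))
    return result

# Base prefixes, one spelling each (assumed to match the current set).
current = ['b', 'r', 'u', 'f', 'br', 'fr']

print(len(all_prefixes(current)))                         # 25
print(len(all_prefixes(current + ['bv', 'fv'])))          # 41
print(len(all_prefixes(current + ['bv', 'fv', 'rfv'])))   # 89
```

A prefix of n distinct letters contributes n! * 2**n spellings, which
is why a single three-letter combination adds 48 at once.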
Should we add one or two more, it would be with the understanding
(hopefully reflected in the documentation also) that v and e would
effectively be replacements for r and plain, rather than being
combined with them.
Were I to design a new language with similar string syntax, I think
I would use plain quotes for verbatim strings only, and have the
following prefixes, in only a single case:
(no prefix) - verbatim UTF-8 (at this point, I see no reason not to
require UTF-8 for the encoding of source files)
b - for verbatim bytes
e - allow (only explicitly documented) escapes
f - format strings
Actually, the above could be done as a preprocessor for Python, or a
future import. In other words, what you see is what you get, until
you add a prefix to add additional processing. The only
combinations that seem useful are eb and ef. I don't know whether
constraining the order of the prefixes would be helpful; if it is,
I have no problem with a canonical ordering being prescribed.
As a future import, one could code modules to either the current
combinatorial explosion with all its gotchas, special cases, and
passing of undefined escapes; or one could code to the clean limited
cases above.
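To make the "only explicitly documented escapes" idea concrete, here
is a minimal sketch of how an e-prefix could process an otherwise
verbatim string. The function name e_string and the contents of the
ESCAPES table are assumptions for illustration only; which escapes
would actually be documented is an open question.

```python
# Hypothetical table of documented escapes; the real set is undecided.
ESCAPES = {
    "\\": "\\",    # \\ is an escaped backslash
    "n": "\n",
    "t": "\t",
    "e": "\x1b",   # \e means the ASCII escape character
    '"': '"',
    "'": "'",
}

def e_string(verbatim):
    """Apply documented escapes to a verbatim string; reject the rest."""
    out = []
    chars = iter(verbatim)
    for ch in chars:
        if ch != "\\":
            out.append(ch)
            continue
        nxt = next(chars, None)  # None means a trailing lone backslash
        if nxt not in ESCAPES:
            raise ValueError("undefined escape: \\" + (nxt or ""))
        out.append(ESCAPES[nxt])
    return "".join(out)

# Raw literals stand in for the proposed verbatim plain strings.
print(e_string(r"c:\\directory\\"))   # c:\directory\
```

With this rule, e_string(r"c:\directory") raises ValueError instead
of silently passing \d through.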
Another thing that seems awkward about the current f-strings is that
{{ and }} become "special escapes". If it were not for the
permissive usage of \{ and \} in the current plain string
processing, \{ and \} could have been used to escape the
non-format-expression uses of { and }, which would be far more
consistent with other escapes. Perhaps the future import could
regularize that, also.
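For reference, a small sketch of the current behavior being
described. The eval() with warnings suppressed is only there because
recent Python versions warn about the undefined escape \{ when the
literal is compiled, and a future version may reject it outright.

```python
import warnings

x = 42
# Doubled braces are the f-string way to get literal braces.
doubled = f"{{x}} stays literal, {x} is evaluated"
print(doubled)   # {x} stays literal, 42 is evaluated

# In a plain string, the undefined escape \{ is passed through as
# two characters (backslash and brace), which is why it was not
# available as a brace escape for f-strings.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")
    permissive = eval(r'"\{"')
print(len(permissive))   # 2
```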
A future import would have no backward compatibility issues to
disrupt a simplified, more regular syntax.
Does anyone know of an existing feature that couldn't be expressed
in a straightforward manner with only the above capabilities?
The only other thing that I have heard about regarding strings is
that multi-line strings have their first line indented, and other
lines not. Some have recommended making the first line blank, and
just chopping off the first \n, others have recommended indenting
all lines, and replacing "\n" followed by the number of indented
spaces by "\n", so the text can be aligned in the code like it will
be aligned for use. Both techniques seem to have their place in
aiding code readability. Both techniques could be used together, in
practice, using one more prefix character for triple quotes only:
longstring = l"""
The traditional first blank line form
could be used as it has."""
If the first character of a long-string is a newline character, then
it will be removed. If the string wants to have an initial newline
character, a second one can be provided, which would not be removed.
longstring = l"""The traditional indented form
                 could be used as it has, also."""
This would be contracted by removing, after each newline within the
string, up to as many space characters as precede the first
character of the first line of the string (if the lexer can provide
that column). If fewer space characters are available after a
newline, only the number available would be removed. If there are
more, they would be retained.
A new form would also be permitted:
longstring = l"""
    An indented form that isn't pushed as far right as the
    traditional indented form could also be used."""
If the first character of an l-string is a newline and the second
character is a space character, this form would count the number of
space characters in the second line, and remove up to that many
space characters from all lines, as well as removing the initial
newline character.
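The three l-string forms above can be sketched as an ordinary
function applied to an already-parsed string. The name layout and the
first_column parameter are hypothetical; first_column stands in for
the column that the lexer would have to supply for the traditional
indented form.

```python
def _strip_up_to(line, width):
    """Remove at most `width` leading space characters from `line`."""
    n = 0
    while n < width and n < len(line) and line[n] == " ":
        n += 1
    return line[n:]

def layout(text, first_column=0):
    """Sketch of the proposed l-string contraction rules."""
    if text.startswith("\n"):
        # Blank-first-line forms: drop the initial newline, then use
        # the indentation of the next line as the width to remove
        # from every line (zero if that line is not indented).
        lines = text[1:].split("\n")
        width = len(lines[0]) - len(lines[0].lstrip(" "))
        return "\n".join(_strip_up_to(line, width) for line in lines)
    # Traditional indented form: the first line is left alone and
    # later lines lose up to `first_column` spaces.
    head, *rest = text.split("\n")
    return "\n".join([head] + [_strip_up_to(line, first_column) for line in rest])

print(layout("\n    An indented form\n      with extra\n  and less"))
```

Lines with more spaces than the width keep the excess, and lines with
fewer simply lose what they have, as described above.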
If l-strings were implemented (l for layout), they could be combined
with f and/or e.
Are there any other string feature workarounds in common use that
could be codified in a future import scenario?
Glenn