On 8/14/2019 8:02 AM, Random832 wrote:

On Mon, Aug 12, 2019, at 15:15, Terry Reedy wrote:

Please no more combinations. The presence of both legal and illegal 
combinations is already a mild nightmare for processing and testing. 
idlelib.colorizer has the following re to detect legal combinations

     stringprefix = r"(?i:r|u|f|fr|rf|b|br|rb)?"
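
As an aside, here is a minimal sketch of how that pattern separates legal from illegal prefixes. The helper name is_legal_prefix is invented for illustration and is not part of idlelib; the scoped (?i:...) flag assumes Python 3.6 or later.

```python
import re

# Pattern quoted above from idlelib.colorizer.
stringprefix = r"(?i:r|u|f|fr|rf|b|br|rb)?"

def is_legal_prefix(prefix):
    """Hypothetical helper: True if the whole prefix is a legal combination."""
    return re.fullmatch(stringprefix, prefix) is not None

for candidate in ["r", "Rb", "fr", "", "ur", "bf"]:
    print(repr(candidate), is_legal_prefix(candidate))
```

The empty prefix counts as legal because the whole group is optional; "ur" and "bf" are rejected because no alternative covers them.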

More advanced syntax highlighting editors have to handle each string type separately anyway, because they highlight (valid) backslash-escapes and f-string formatters. The proposed 'v-string' type would need separate handling even in a simplistic editor like IDLE, because it's different at the basic level of \" not ending the string (whereas, for better or worse, all current string types have exactly the same rules for how to find the end delimiter)

I had to read this several times, and then only after reading Eric's reply, it finally hit me that what you are saying is that \" doesn't end the string in any other form of string, but that sequence would end a v-string.
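
To make the current behavior concrete, a small sketch (the variable names are just for illustration): in today's raw strings the \" does not end the literal, yet the backslash stays in the value.

```python
raw = r"a\"b"      # the \" does not terminate the literal, and the \ is kept
cooked = "a\"b"    # without the r prefix, \" is an escape for a bare quote

print(len(raw), list(raw))        # four characters: a, \, ", b
print(len(cooked), list(cooked))  # three characters: a, ", b
```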

It seems that also explains why Serhiy, in describing his experiment with really raw string literals, mentioned having to change the tokenizer as well as the parser (proving that it isn't impossible to deal with truly raw strings).

\" not ending a raw string was certainly a gotcha for me when I started using Python (with a background in C and Perl among other languages), and it convinced me not to use raw strings: that gotcha was not worth the other benefits of raw strings. Serhiy said:

Currently a raw literal cannot end in a single backslash (e.g. in r"C:\User\"). Although there are reasons for this. It is an old gotcha, and there are many closed issues about it. This question is even included in FAQ.

which indicates that I am not the only one that has been tripped up by that over the years.
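
For anyone who hasn't hit it yet, a rough sketch of the gotcha. The helper name compiles is made up for this example; the concatenation at the end is one common workaround, not the only one.

```python
def compiles(source):
    """Hypothetical helper: True if source is a valid Python expression."""
    try:
        compile(source, "<demo>", "eval")
    except SyntaxError:
        return False
    return True

# The source text r"C:\User\" is rejected: the final \" does not close the literal.
print(compiles('r"C:\\User\\"'))   # False
print(compiles('r"C:\\User"'))     # True

# Workaround: append the trailing backslash from an ordinary string.
path = r"C:\User" "\\"
print(path)
```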

Trying to look at it through the eyes of a beginning programmer, the whole idea of backslash being an escape character is an unnatural artifice. I'm unaware (but willing to be educated) of any natural language that has such a concept when using quotations. Nested quotations exist, in various forms: use of a different quotation mark for the inner and outer quotations, and block quotations (which, in English, have increased margin on both sides and a blank line before and after).

Python actually supports constructs very similar to the natural language formats, allowing both " and ' for quotations and nested quotations, and the triple-quoted string with either " or ' is very similar in concept to a block quotation. But _all_ the string forms are burdened with surprises for the beginning programmer: escape sequences of one sort or another must be learned and understood to avoid surprises when using the \ character.

Programming languages certainly need an escape character mechanism to deal with characters that cannot easily be typed on a keyboard (such as ¤ ¶ etc.), or which are visually indistinguishable from other characters or character sequences (various widths of white space), or which would be disruptive to the flow of code or syntax if represented by the usual character (newline, carriage return, formfeed, maybe others). But these are programming concepts, not natural language concepts. The basic concept of a quoted string should best be borrowed directly from natural language, and then enhancements to that made to deal with programming concepts.

In Python, as in C, the escape characters are built into the basic string syntax, so one must learn the quirks of the escaping mechanism in order to write string literals.

In Perl, " strings include escapes, and ' strings do not. So there is a basic string syntax that is similar to natural language, and one that is extended to include programming concepts. [N.B. There are lots of reasons I switched from Perl to Python, and don't have any desire to go back, but I have to admit, that the lack of a truly raw string in Python was a disappointment.]

So that, together with the desire for new escape sequences, the creation of a new escape mechanism in the f-string {} (which adds both { and } as escape characters by requiring them to be doubled to be treated as literal inside an f-string, instead of using \{ and \} as the escapes [which would have been possible, due to the addition of the f prefix]), and the issue that every current \-escape is already defined to do something, is why I suggested elsewhere in this thread that perhaps the whole irregular string syntax should be rebooted with a future import. It seems the result could be simpler, more regular, and more powerful. And by using a future import, there are no backward incompatibility issues, and migration can be done module by module.
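
A tiny sketch of the f-string brace doubling I mean (variable names are only for illustration):

```python
name = "world"

# Inside an f-string, a literal brace is written by doubling it;
# a single brace starts a replacement field.
greeting = f"{{name}} expands to {name}"
print(greeting)  # {name} expands to world
```

So f-strings carry two escape mechanisms at once: backslash for the usual escapes and doubling for braces.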

The more I think about this, the more tempting it is to attempt to fork Python just to have a better string syntax! But alas! So many other time commitments, and a lack of in-depth internals knowledge make that an impossibility. I daresay, though, that if I get a free week, I might well write a preprocessor that converts my suggested future syntax to C-Python, so that I can use it in my own projects!