On 8/15/2019 4:17 AM, Petr Viktorin wrote:
On 8/15/19 10:40 AM, Greg Ewing wrote:
If we want a truly raw string format that allows all characters, including any kind of quote, we could take a tip from Fortran:
s = 31HThis is a "totally raw" string!
Or from Rust:
let s = r"Here's a raw string";
let s = r#"Here's a raw string with "quotes" in it"#;
let s = r##"Here's r#"the raw string syntax"# in raw string"##;
let s = r###"and here's a '##"' as well"###;
Indeed, Fortran has raw strings, but they come with the disadvantage of having to count characters. This is poor form when an edit changes the length of the string, although it might be advantageous if the string must fit into a certain fixed width on a line printer. Let's not go there.

Without having read the Rust spec, but judging from your examples, it seems that Rust has borrowed concepts from Perl's q and qq operators, both of which allow any non-alphanumeric character to be specified as the delimiter. I'm not sure whether that included Unicode characters (certainly not in the early days, before Unicode support was added), but Perl did have a special case for paired characters such as <> [] {}, which lets those pairs be used as delimiters while still allowing properly nested instances of themselves inside the string.

It looks like Rust might allow only #, but any number of them, to delimit raw strings. This is sufficient, but for overly complex raw strings containing lots of # sequences it could get cumbersome, and it starts to border on the problems of the Fortran solution, where character counting is an issue. The choice of an alternative delimiter character or character sequence would result in a simpler syntax.

I didn't know whether Rust permits implicit string concatenation, but a quick search convinces me it doesn't. Python's triple-quoted string literals, combined with implicit concatenation, are a powerful way to deal with extremely complex string literals, although they do occasionally require breaking a literal into pieces, mostly when it includes a string describing the triple-quote syntax itself. Note that a regex searching for triple quotes can use "{3} or '{3} to avoid the need to embed triple quotes in the regex.

Perl's "choice of delimiter" syntax is sometimes a bit more convenient, but it makes long strings mentally exhausting for a human to parse (although it is quick for the interpreter), because the reader has to remember which character is being used as the delimiter.
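To make the point about triple quotes and implicit concatenation concrete, here is a minimal sketch in current Python. The names doc, triple_double and triple_single are purely illustrative. It builds one string that mentions both triple-quote forms by splitting it into two adjacent literals, then finds the triple quotes with the {3} regex form, so no triple quote has to appear inside a pattern.

```python
import re

# Two adjacent literals are joined by the compiler into one string.
# Each piece uses the triple-quote style that the other piece needs
# to contain, so neither delimiter has to be escaped.
doc = (
    '''Use """...""" for docstrings, '''
    """or '''...''' if you prefer."""
)

# Patterns that match a triple quote without embedding one.
triple_double = re.compile(r'"{3}')
triple_single = re.compile(r"'{3}")

print(doc)
print(len(triple_double.findall(doc)), len(triple_single.findall(doc)))
```

The split has to fall between the part that mentions one quote style and the part that mentions the other, which is the "breaking them into pieces" cost mentioned above.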
My proposal isn't intended to change the overall flavor of Python's string syntax, just to regularize and simplify it, while allowing additional escapes and other extensions to be added in the future, without backward-compatibility issues.