On 8/15/2019 4:17 AM, Petr Viktorin wrote:
On 8/15/19 10:40 AM, Greg Ewing wrote:
If we want a truly raw string format that allows all characters,
including any kind of quote, we could take a tip from Fortran:
s = 31HThis is a "totally raw" string!
Or from Rust:
let s = r"Here's a raw string";
let s = r#"Here's a raw string with "quotes" in it"#;
let s = r##"Here's r#"the raw string syntax"# in raw string"##;
let s = r###"and here's a '##"' as well"###;
Indeed, Fortran has raw strings, but they come with the disadvantage of
having to count characters. This is poor form when an edit changes the
length of the string, because the count must be updated to match,
although it might be advantageous if the string must fit into a certain
fixed width on a line printer.
Let's not go there.
Without reading the Rust spec, but judging from your examples, it seems
that Rust has borrowed concepts from Perl's q and qq operators, both of
which allow specification of any non-alphanumeric character as the
delimiter. I'm not sure whether that includes Unicode characters
(certainly not in the early days before Unicode support was added), but
Perl does have a special case for paired characters such as <> [] {} to
allow those pairs to be used as delimiters, and still allow properly
nested instances of themselves inside the string.
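The nesting rule for paired delimiters can be sketched in Python. This
is only an illustration, not Perl's actual tokenizer: the name
read_delimited and the PAIRS table are made up for this example, and
backslash escapes are ignored entirely.

```python
# Hypothetical helper: paired openers map to their closers; any other
# delimiter character closes itself.
PAIRS = {"<": ">", "[": "]", "{": "}", "(": ")"}


def read_delimited(src: str) -> tuple:
    """Return (body, rest) for a string that starts with its delimiter."""
    opener = src[0]
    closer = PAIRS.get(opener, opener)
    depth = 1
    for i in range(1, len(src)):
        ch = src[i]
        if ch == closer:
            # For unpaired delimiters this fires on the first repeat.
            depth -= 1
            if depth == 0:
                return src[1:i], src[i + 1:]
        elif ch == opener:
            # Only reachable for paired delimiters: track nesting.
            depth += 1
    raise ValueError("unterminated string")
```

With a paired delimiter the inner braces stay part of the body, while an
unpaired delimiter such as | simply ends at its next occurrence.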
It looks like Rust might only allow #, but any number of them, to
delimit raw strings. This is sufficient, but for overly complex raw
strings containing lots of # character sequences, it could get
cumbersome, and starts to border on the problems of the Fortran
solution, where character counting is an issue. The choice of an
alternative character or character sequence would result in a
simpler syntax.
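To make the counting concern concrete, here is a rough Python sketch
that picks the fewest # marks needed to wrap a given text. It assumes
(from the examples above, not from reading the Rust spec) that a raw
string ends at the first " followed by as many # marks as it opened
with. The name rust_raw_literal is invented for this illustration.

```python
import re


def rust_raw_literal(text: str) -> str:
    """Wrap text as a Rust-style raw string using the fewest '#' marks."""
    # Length of every run of '#' that directly follows a double quote.
    runs = [len(m.group(1)) for m in re.finditer(r'"(#*)', text)]
    # Need one more '#' than the longest such run; none if no quote at all.
    hashes = "#" * (max(runs) + 1 if runs else 0)
    return f'r{hashes}"{text}"{hashes}'
```

Every time the text gains a longer "### sequence, the delimiter on both
ends has to grow to match, which is where the counting creeps back in.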
I didn't know whether Rust permits implicit string concatenation, but a
quick search suggests it doesn't.
The combination of Python's triple-quote string literal, together
with implicit concatenation, is a powerful way to deal with
extremely complex string literals, although it does require breaking
them into pieces occasionally, mostly when including a string
describing the triple-quote syntax. Note that regex searching for
triple-quotes can use "{3} or '{3} to avoid the need to embed
triple-quotes in the regex.
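A small sketch of that combination, using only current Python syntax.
The names doc and triple_quote are just placeholders for this example.

```python
import re

# Adjacent literals are joined at compile time, so a piece that must
# contain one kind of triple quote is written with the other kind.
doc = (
    r"""A raw piece with 'single' and "double" quotes and a \d backslash. """
    r'''Triple double-quotes look like this: """ '''
    r"""Triple single-quotes look like this: ''' """
)

# The pattern matches either triple quote without containing one itself;
# it is also built from two adjacent literals with different quoting.
triple_quote = re.compile(r'"{3}' r"|'{3}")

found = triple_quote.findall(doc)
```

Here found holds one match for each kind of triple quote, in the order
they appear in doc.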
Perl's "choice of delimiter" syntax is maybe a bit more convenient
sometimes, but it makes reading long strings mentally exhausting
(although parsing is quick for the interpreter), because the reader has
to remember which character is being used as the delimiter.
My proposal isn't intended to change the overall flavor of Python's
string syntax, just to regularize and simplify it, while allowing
additional escapes and other extensions to be added in the future,
without backward-compatibility issues.