On Oct 23, 2019, at 13:10, Steven D'Aprano email@example.com wrote:
David, you literally wrote the book on text processing in Python. I think you are being disingenuous here, and below when you describe a standard string hex-escape \x20 that has been in Python forever and in just about all C-like languages as "weird".
I think what he’s saying is that it’s weird that \x20 doesn’t count as white space here, when it literally means a space character.
We do have to deal with this kind of weirdness in regexes, and that’s part of the reason we have raw string literals, and this is no more confusing than passing a raw string literal to re.compile.
But arguably it’s also no _less_ confusing than passing a raw string literal to re.compile, and that does actually confuse people. Now we’re talking about promoting that kind of confusion from a parser buried inside a module that novices don’t have to use to the actual Python parser that handles every line you type.
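To make the regex comparison concrete, here is a small sketch. It assumes the closest analogue is verbose mode (re.VERBOSE), where whitespace in the pattern is ignored; that choice is mine for illustration and is not spelled out above.

```python
import re

# Raw literal: the regex parser sees the four characters \x20 and treats them
# as an escaped space, which verbose mode does not discard.
raw = re.compile(r"burnt\x20umber", re.VERBOSE)

# Non-raw literal: Python turns \x20 into a real space first, and verbose mode
# then ignores that space as insignificant whitespace.
cooked = re.compile("burnt\x20umber", re.VERBOSE)

raw_hits_space = raw.fullmatch("burnt umber") is not None
cooked_hits_space = cooked.fullmatch("burnt umber") is not None
cooked_hits_joined = cooked.fullmatch("burntumber") is not None

print(raw_hits_space, cooked_hits_space, cooked_hits_joined)  # True False True
```

The same six characters mean "a space" in both patterns, but only one of them matches a space, depending on which parser gets to interpret the escape.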
If you can understand why this works:
string = "Single\n quoted\n string\n containing newlines!"
you can understand the burnt\x20umber example.
Not really. Your string contains newlines; it also contains spaces. Your burnt\x20umber example doesn’t contain a space.
Or, rather, it doesn’t contain a space that separates the elements, but one of the elements does anyway. As if this:
strings = "Single\n quoted\n string\n containing newlines!".splitlines()
… gave you a list of one string that contains newlines instead of a list of four strings that don’t.
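A rough sketch of the difference, using today's Python. The helper split_then_unescape is hypothetical: it only emulates the order I understand the proposal to use (split the source text on whitespace first, process escapes afterwards), and it assumes ASCII-only input.

```python
def split_then_unescape(source):
    # Hypothetical stand-in for the proposed literal: split the raw source
    # text on whitespace first, then process backslash escapes in each piece.
    # Assumes ASCII-only input so the bytes round-trip is safe.
    return [word.encode("ascii").decode("unicode_escape")
            for word in source.split()]

# Ordinary literal: \x20 is already a space when split() runs, so it separates.
ordinary = "red burnt\x20umber blue".split()

# Split-first order: \x20 is not whitespace at split time, so it stays inside
# one element and only then becomes a space.
proposed = split_then_unescape(r"red burnt\x20umber blue")

# The newline example: every \n in the literal is a real line boundary.
lines = "Single\n quoted\n string\n containing newlines!".splitlines()

print(ordinary)  # ['red', 'burnt', 'umber', 'blue']
print(proposed)  # ['red', 'burnt umber', 'blue']
print(lines)     # ['Single', ' quoted', ' string', ' containing newlines!']
```

In the ordinary cases the escape and the character it stands for behave identically. In the split-first case they don't, and that is the part that seems weird.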