On Wed, Oct 23, 2019, 4:31 PM Steven D'Aprano wrote:
David, you literally wrote the book on text processing in Python. I think you are being disingenuous here, and below when you describe a standard string hex-escape \x20 that has been in Python forever and in just about all C-like languages as "weird".
I'm so flattered anyone remembers that from long ago. It was a very fun book to write. :-)
I think, however, that I've never written '\x20' before this moment in my life. I do know the ASCII and Unicode code point for a space. I've run the 'hexdump' utility plenty of times. But it's hard to think of an occasion when I would have needed to enter a space by code point rather than just quoted.
So I don't think it's so disingenuous to think needing to do that would be "weird." I've escaped lots of other characters that don't have a giant key about 7x the width of other keys on my keyboard.
If you can understand why this works:
string = "Single\n quoted\n string\n containing newlines!"
you can understand the burnt\x20umber example.
I can discern your intention for the new behavior, yes. But:
In : "burnt\x20umber".split()
Out: ['burnt', 'umber']
In : "Single\n quoted\n string\n containing newlines!".split()
Out: ['Single', 'quoted', 'string', 'containing', 'newlines!']
So this new syntax would behave in a way that is counter-intuitive for folks familiar with Python strings to date.
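To make the contrast concrete, here is a minimal sketch of what a '%w' literal would have to do to keep burnt\x20umber as a single item: split the raw source text on whitespace first, and decode escapes only afterwards. The helper name split_then_decode is hypothetical (it is not part of the proposal or of Python), and the sketch assumes ASCII-only source text.

```python
def split_then_decode(raw_source):
    """Hypothetical %w-style parsing: split raw text, then decode escapes.

    Illustration only; assumes the raw text is ASCII.
    """
    return [
        item.encode("ascii").decode("unicode_escape")
        for item in raw_source.split()
    ]


# Raw text as it would appear between the brackets of the proposed literal.
raw = r"cyan forest green burnt\x20umber"

# Proposed-style behavior: the escape is decoded only after splitting,
# so the decoded space ends up inside one item.
w_style = split_then_decode(raw)

# Existing string behavior: the escape is decoded when the literal is
# compiled, so split() sees an ordinary space and breaks on it.
today = "cyan forest green burnt\x20umber".split()

print(w_style)
print(today)
```

The two lists differ only in the last item, which is exactly the inconsistency I am pointing at: the same characters in the source mean different things depending on which kind of literal they sit in.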
Also, I genuinely am not clear what should happen if an expression like
%w[cyan forest green burnt\x20umber]
contains any of the following (non-escaped) characters. If they occur inside quotes, the behavior seems straightforward, but in this new '%w' thing, who knows?
Code    Name                              Sample   Width
U+00A0  NO-BREAK SPACE                    foo bar  As a space, but often not adjusted
U+1680  OGHAM SPACE MARK                  foo bar  Unspecified; usually not really a space but a dash
U+180E  MONGOLIAN VOWEL SEPARATOR         foobar   0
U+2000  EN QUAD                           foo bar  1 en (= 1/2 em)
U+2001  EM QUAD                           foo bar  1 em (nominally, the height of the font)
U+2002  EN SPACE (nut)                    foo bar  1 en (= 1/2 em)
U+2003  EM SPACE (mutton)                 foo bar  1 em
U+2004  THREE-PER-EM SPACE (thick space)  foo bar  1/3 em
U+2005  FOUR-PER-EM SPACE (mid space)     foo bar  1/4 em
U+2006  SIX-PER-EM SPACE                  foo bar  1/6 em
U+2007  FIGURE SPACE                      foo bar  “Tabular width”, the width of digits
U+2008  PUNCTUATION SPACE                 foo bar  The width of a period “.”
U+2009  THIN SPACE                        foo bar  1/5 em (or sometimes 1/6 em)
U+200A  HAIR SPACE                        foo bar  Narrower than THIN SPACE
U+200B  ZERO WIDTH SPACE                  foobar   0
U+202F  NARROW NO-BREAK SPACE             foo bar  Narrower than NO-BREAK SPACE (or SPACE), “typically the width of a thin space or a mid space”
U+205F  MEDIUM MATHEMATICAL SPACE         foo bar  4/18 em
U+3000  IDEOGRAPHIC SPACE                 foo bar  The width of ideographic (CJK) characters.
U+FEFF
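For comparison, here is a small sketch of how today's str.isspace() and argument-less str.split() treat a handful of these characters. The choice of code points and the helper name describe are mine, purely for illustration. The answer for some code points (U+180E in particular) depends on the Unicode database version shipped with the interpreter, so the loop prints results rather than claiming them.

```python
import unicodedata

# A few of the characters from the table above.
candidates = [
    "\u00a0", "\u1680", "\u180e", "\u2003",
    "\u200b", "\u202f", "\u3000", "\ufeff",
]


def describe(ch):
    """Return (code point, name, isspace result, split result) for one character."""
    name = unicodedata.name(ch, "<unnamed>")
    sample = "foo" + ch + "bar"
    return (f"U+{ord(ch):04X}", name, ch.isspace(), sample.split())


for ch in candidates:
    code, name, is_space, parts = describe(ch)
    print(f"{code} {name}: isspace={is_space}, split gives {len(parts)} item(s)")
```

Even within ordinary strings the answer is not uniform: NO-BREAK SPACE splits, ZERO WIDTH SPACE does not. Any '%w' syntax would need to say which of these rules it follows, or define its own.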