<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Aug 7, 2015 at 2:15 AM, Chris Angelico <span dir="ltr"><<a href="mailto:rosuav@gmail.com" target="_blank">rosuav@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Fri, Aug 7, 2015 at 3:12 PM, Steven D'Aprano <<a href="mailto:steve@pearwood.info">steve@pearwood.info</a>> wrote:<br>

> On Thu, Aug 06, 2015 at 12:26:14PM -0400, <a href="mailto:random832@fastmail.us">random832@fastmail.us</a> wrote:<br>

>> On Wed, Aug 5, 2015, at 14:56, Eric V. Smith wrote:<br>

>> > Because strings containing \{ are currently valid<br>

>><br>

>> Which raises the question of why.<br>

><br>

> Because \C is currently valid, for all values of C. The idea is that if<br>

> you typo an escape, say \d for \f, you get an obvious backslash in your<br>

> string which is easy to spot.<br>

><br>

> Personally, I think that's a mistake. It leads to errors like this:<br>

><br>

> filename = 'C:\some\path\something.txt'<br>

><br>

> silently doing the wrong thing. If we're going to change the way escapes<br>

> work, it's time to deprecate the misfeature that \C is a literal<br>

> backslash followed by C. Outside of raw strings, a backslash should<br>

> *only* be allowed in an escape sequence.<br>

<br>

</span>I agree; plus, it means there's yet another thing for people to<br>

complain about when they switch to Unicode strings:<br>

<br>

path = "c:\users", "C:\Users" # OK on Py2<br>

path = u"c:\users", u"C:\Users" # Fails<br></blockquote><div><br></div><div>So this doesn't work?</div><div><br></div><div>    path = pathlib.Path(u"c:\users")</div><div>    # SEC: path concatenation often involves user-supplied input</div><div><br></div><div>- [ ] docs for these</div><div>- [ ] to/from r'rawstring' (DOC: encode/decode)</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">

<br>

Or equivalently, moving to Py3 and having those strings quietly become<br>

Unicode strings, with \U and \u now being interpreted as escapes.<br>

<br>

That said, though: It's now too late to change Python 2, which means<br>

that this is going to be yet another hurdle when people move<br>

(potentially large) Windows codebases to Python 3. IMO it's a good<br>

thing to trip people up immediately, rather than silently doing the<br>

wrong thing - but it is going to be another thing that people moan<br>

about when Python 3 starts complaining. First they have to add<br>

parentheses to print, then it's all those pointless (in their eyes)<br>

encode/decode calls, and now they have to go through and double all<br>

their backslashes as well! But the alternative is that some future<br>

version of Python adds a new escape code, and all their code starts<br>

silently doing weird stuff - or they change the path name and it goes<br>

haywire (changing from "c:\users\demo" to "c:\users\all users" will be<br>

a fun one to diagnose) - so IMO it's better to know about it early.<br>
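The path example above can be sketched by evaluating literal source text at run time. This is a minimal sketch, assuming a Python 3 interpreter in which unrecognized escapes in bytes literals still produce only a warning (they are deprecated and may become errors in a later version). The helper name eval_literal is hypothetical, and bytes literals stand in for Python 2 str literals.<br>

```python
import ast
import warnings


def eval_literal(source):
    """Evaluate Python literal source text, hiding invalid-escape warnings."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)


# Raw strings hold each literal exactly as it would be typed in a source file.
demo = eval_literal(r'b"c:\users\demo"')
all_users = eval_literal(r'b"c:\users\all users"')

# \u and \d are not bytes escapes, so every backslash survives.
print(demo)
# \a is the BEL escape, so the second path silently loses a backslash.
print(all_users)

# As a text literal, \u starts a Unicode escape and "sers" is not hex.
try:
    eval_literal(r'"c:\users"')
    text_literal_ok = True
except SyntaxError:
    text_literal_ok = False
print(text_literal_ok)
```

Writing the path as a raw string (r"c:\users\all users") or with forward slashes avoids both problems.<br>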

<span class=""><br>

> If we're going to make major changes to the way escapes work, I'd rather<br>

> add new escapes, not take them away:<br>

><br>

><br>

> \e escape \x1B, as supported by gcc and clang;<br>

<br>

</span>Please, yes! Also supported by a number of other languages and<br>

commands (Pike, GNU echo, and some others that I don't recall (but not<br>

bind9, which has its own peculiarities)).<br>

<span class=""><br>

> the escaping rules from Haskell:<br>

><br>

> <a href="http://book.realworldhaskell.org/read/characters-strings-and-escaping-rules.html" rel="noreferrer" target="_blank">http://book.realworldhaskell.org/read/characters-strings-and-escaping-rules.html</a><br>

><br>

> \P platform-specific newline (e.g. \r\n on Windows, \n on POSIX)<br>

<br>

</span>Hmm. Not sure how useful this would be. Personally, I consider this to<br>

be a platform-specific encoding, on par with expecting b"\xc2\xa1" to<br>

display "¡", and as such, it should be kept to boundaries. Work with<br>

"\n" internally, and have input routines convert to that, and output<br>

routines optionally add "\r" before each one.<br>
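The boundary approach described above can be sketched with the standard io module. This is only an illustration: the in-memory buffers stand in for real files or sockets, and the variable names are made up for the example.<br>

```python
import io

# Output boundary: text uses "\n" internally; the wrapper writes "\r\n".
raw_buffer = io.BytesIO()
writer = io.TextIOWrapper(raw_buffer, encoding="utf-8", newline="\r\n")
writer.write("first\nsecond\n")
writer.flush()
encoded = raw_buffer.getvalue()

# Input boundary: newline=None turns "\r\n", "\r", and "\n" into "\n".
reader = io.TextIOWrapper(io.BytesIO(encoded), encoding="utf-8", newline=None)
decoded = reader.read()

print(encoded)
print(decoded == "first\nsecond\n")
```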

<span class=""><br>

> \U+xxxx Unicode code point U+xxxx (with four to six hex digits)<br>

><br>

> It's much nicer to be able to write Unicode code points that (apart from<br>

> the backslash) look like the standard Unicode notation U+0000 to<br>

> U+10FFFF, rather than needing to pad to a full eight digits as the<br>

> \U00xxxxxx syntax requires.<br>

<br>

</span>The problem is the ambiguity. How do you specify that "\U+101010" be a<br>

two-character string? "\U000101010" forces it by having exactly eight<br>

digits, but as soon as you allow variable numbers of digits, you run<br>

into problems. I suppose you could always pad to six for that:<br>

"\U+0101010" could know that it doesn't need a seventh digit. (Though<br>

what would ever happen if the Unicode consortium decides to drop<br>

support for UTF-16 and push for a true 32-bit character set, I don't<br>

know.) It is tempting, though - it both removes the need for two<br>

pointless zeroes, and broadly unifies the syntax for Unicode escapes,<br>

instead of having a massive boundary from "\u1234" to "\U00012345".<br>
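The ambiguity can be sketched with two toy expanders for the proposed \U+xxxx escape. Neither is an existing Python feature: the regular expressions and the expand helper are hypothetical, written only to compare a greedy four-to-six-digit rule with an always-six-digits rule.<br>

```python
import re

# Hypothetical rules for the proposed escape; neither exists in Python.
GREEDY = re.compile(r"\\U\+([0-9A-Fa-f]{4,6})")
FIXED_SIX = re.compile(r"\\U\+([0-9A-Fa-f]{6})")


def expand(pattern, text):
    """Replace every escape matched by pattern with the character it names."""
    # chr() raises ValueError for values above 0x10FFFF, which six hex
    # digits can express; a real design would have to reject those.
    return pattern.sub(lambda m: chr(int(m.group(1), 16)), text)


# Greedy matching consumes all six digits: one character, U+101010.
greedy = expand(GREEDY, r"\U+101010")

# Padding to six digits leaves the trailing "0" as its own character.
padded = expand(FIXED_SIX, r"\U+0101010")

# The existing eight-digit escape followed by "0", for comparison.
existing = "\U00010101" + "0"

print(len(greedy), len(padded), padded == existing)
```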

<br>

ChrisA<br>

<div class="HOEnZb"><div class="h5">_______________________________________________<br>

Python-ideas mailing list<br>

<a href="mailto:Python-ideas@python.org">Python-ideas@python.org</a><br>

<a href="https://mail.python.org/mailman/listinfo/python-ideas" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/python-ideas</a><br>

Code of Conduct: <a href="http://python.org/psf/codeofconduct/" rel="noreferrer" target="_blank">http://python.org/psf/codeofconduct/</a></div></div></blockquote></div><br></div></div>