<div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Aug 7, 2015 at 2:15 AM, Chris Angelico <span dir="ltr">&lt;<a href="mailto:rosuav@gmail.com" target="_blank">rosuav@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Fri, Aug 7, 2015 at 3:12 PM, Steven D'Aprano &lt;<a href="mailto:steve@pearwood.info">steve@pearwood.info</a>&gt; wrote:<br>
> On Thu, Aug 06, 2015 at 12:26:14PM -0400, <a href="mailto:random832@fastmail.us">random832@fastmail.us</a> wrote:<br>
>> On Wed, Aug 5, 2015, at 14:56, Eric V. Smith wrote:<br>
>> > Because strings containing \{ are currently valid<br>
>><br>
>> Which raises the question of why.<br>
><br>
> Because \C is currently valid, for all values of C. The idea is that if<br>
> you typo an escape, say \d for \f, you get an obvious backslash in your<br>
> string which is easy to spot.<br>
><br>
> Personally, I think that's a mistake. It leads to errors like this:<br>
><br>
> filename = 'C:\some\path\something.txt'<br>
><br>
> silently doing the wrong thing. If we're going to change the way escapes<br>
> work, it's time to deprecate the misfeature that \C is a literal<br>
> backslash followed by C. Outside of raw strings, a backslash should<br>
> *only* be allowed in an escape sequence.<br>
<br>
</span>I agree; plus, it means there's yet another thing for people to<br>
complain about when they switch to Unicode strings:<br>
<br>
path = "c:\users", "C:\Users" # OK on Py2<br>
path = u"c:\users", u"C:\Users" # Fails<br></blockquote><div><br></div><div>So this doesn't work either?</div><div><br></div><div> path = pathlib.Path(u"c:\users")</div><div> # SEC: path concatenation often involves user-supplied input</div><div><br></div><div>- [ ] docs for these</div><div>- [ ] converting to/from r'rawstring' (DOC: encode/decode)</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<br>
Or equivalently, moving to Py3 and having those strings quietly become<br>
Unicode strings, and now having meaning on the \U and \u escapes.<br>
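</blockquote><div><br></div><div>A minimal sketch of what happens to these literals on Python 3: a "c:\users" literal is rejected at compile time, so nothing downstream (pathlib included) ever sees it. The helper name compiles() is hypothetical; compile() and pathlib.PureWindowsPath are standard library.</div>
<pre>
```python
from pathlib import PureWindowsPath


def compiles(source):
    """Return True if `source` compiles as Python 3 code (hypothetical helper)."""
    try:
        compile(source, "demo", "exec")
    except SyntaxError:
        return False
    return True


# Raw strings keep these snippets as literal source text, backslashes included.
print(compiles(r'path = "c:\users"'))    # False: \u starts a truncated \uXXXX escape
print(compiles(r'path = r"c:\users"'))   # True: raw string, no escape processing
print(compiles(r'path = "c:\\users"'))   # True: doubled backslash

# PureWindowsPath does no I/O, so this part behaves the same on any platform.
raw = PureWindowsPath(r"c:\users")
print(raw == PureWindowsPath("c:/users"))  # True: forward slashes are accepted
print(raw.parts)                           # ('c:\\', 'users')
```
</pre>
<div>Raw strings or forward slashes sidestep the escape question entirely, which is probably what the docs should recommend for Windows paths.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">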
<br>
That said, though: It's now too late to change Python 2, which means<br>
that this is going to be yet another hurdle when people move<br>
(potentially large) Windows codebases to Python 3. IMO it's a good<br>
thing to trip people up immediately, rather than silently doing the<br>
wrong thing - but it is going to be another thing that people moan<br>
about when Python 3 starts complaining. First they have to add<br>
parentheses to print, then it's all those pointless (in their eyes)<br>
encode/decode calls, and now they have to go through and double all<br>
their backslashes as well! But the alternative is that some future<br>
version of Python adds a new escape code, and all their code starts<br>
silently doing weird stuff - or they change the path name and it goes<br>
haywire (changing from "c:\users\demo" to "c:\users\all users" will be<br>
a fun one to diagnose) - so IMO it's better to know about it early.<br>
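</blockquote><div><br></div><div>The rename example can be checked mechanically. A sketch, assuming Python 3 bytes literals as a stand-in for Python 2 byte strings (both leave unknown escapes such as \u and \d alone). decode_literal() is a hypothetical helper, and the invalid-escape warnings that newer versions emit are silenced only to keep the output quiet.</div>
<pre>
```python
import ast
import warnings


def decode_literal(source):
    """Evaluate a literal given as source text (hypothetical helper)."""
    with warnings.catch_warnings():
        # Unknown escapes currently warn; a future Python may reject them outright.
        warnings.simplefilter("ignore")
        return ast.literal_eval(source)


before = decode_literal(r'b"c:\users\demo"')
after = decode_literal(r'b"c:\users\all users"')

print(before)            # b'c:\\users\\demo'
print(after)             # b'c:\\users\x07ll users'
print(b"\x07" in after)  # True: \a was read as BEL, and the "a" is gone
```
</pre>
<div>The first path survives only because \d happens not to be an escape; the second silently loses a character.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">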
<span class=""><br>
> If we're going to make major changes to the way escapes work, I'd rather<br>
> add new escapes, not take them away:<br>
><br>
><br>
> \e escape \x1B, as supported by gcc and clang;<br>
<br>
</span>Please, yes! Also supported by a number of other languages and<br>
commands (Pike, GNU echo, and some others that I don't recall (but not<br>
bind9, which has its own peculiarities)).<br>
<span class=""><br>
> the escaping rules from Haskell:<br>
><br>
> <a href="http://book.realworldhaskell.org/read/characters-strings-and-escaping-rules.html" rel="noreferrer" target="_blank">http://book.realworldhaskell.org/read/characters-strings-and-escaping-rules.html</a><br>
><br>
> \P platform-specific newline (e.g. \r\n on Windows, \n on POSIX)<br>
<br>
</span>Hmm. Not sure how useful this would be. Personally, I consider this to<br>
be a platform-specific encoding, on par with expecting b"\xc2\xa1" to<br>
display "¡", and as such, it should be kept to boundaries. Work with<br>
"\n" internally, and have input routines convert to that, and output<br>
routines optionally add "\r" before them all.<br>
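</blockquote><div><br></div><div>For comparison, Python 3's io layer already works roughly this way: text stays "\n" internally and translation happens at the boundary. A sketch using in-memory buffers; newline="\r\n" is passed explicitly here to imitate Windows output on any platform.</div>
<pre>
```python
import io

# Output boundary: every "\n" written is translated to "\r\n".
raw = io.BytesIO()
out = io.TextIOWrapper(raw, encoding="ascii", newline="\r\n")
out.write("first\nsecond\n")
out.flush()
data = raw.getvalue()
print(data)  # b'first\r\nsecond\r\n'

# Input boundary: newline=None (the default) translates "\r\n" back to "\n".
text = io.TextIOWrapper(io.BytesIO(data), encoding="ascii", newline=None).read()
print(repr(text))  # 'first\nsecond\n'
```
</pre>
<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">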
<span class=""><br>
> \U+xxxx Unicode code point U+xxxx (with four to six hex digits)<br>
><br>
> It's much nicer to be able to write Unicode code points that (apart from<br>
> the backslash) look like the standard Unicode notation U+0000 to<br>
> U+10FFFF, rather than needing to pad to a full eight digits as the<br>
> \U00xxxxxx syntax requires.<br>
<br>
</span>The problem is the ambiguity. How do you specify that "\U+101010" be a<br>
two-character string? "\U000101010" forces it by having exactly eight<br>
digits, but as soon as you allow variable numbers of digits, you run<br>
into problems. I suppose you could always pad to six for that:<br>
"\U+0101010" could know that it doesn't need a seventh digit. (Though<br>
what would ever happen if the Unicode consortium decides to drop<br>
support for UTF-16 and push for a true 32-bit character set, I don't<br>
know.) It is tempting, though - it both removes the need for two<br>
pointless zeroes, and broadly unifies the syntax for Unicode escapes,<br>
instead of having a massive boundary from "\u1234" to "\U00012345".<br>
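</blockquote><div><br></div><div>The ambiguity can be made concrete with a toy decoder. This is only a sketch: the \U+ escape is a proposal from this thread, not Python syntax, and expand_uplus() with its greedy four-to-six-digit rule is one hypothetical reading of it.</div>
<pre>
```python
import re

# Hypothetical rule: backslash, "U+", then four to six hex digits, matched greedily.
UPLUS = re.compile(r"\\U\+([0-9A-Fa-f]{4,6})")


def expand_uplus(text):
    """Expand the proposed U+ escapes in `text` (toy decoder, not Python syntax)."""
    return UPLUS.sub(lambda m: chr(int(m.group(1), 16)), text)


# Today's fixed-width form: exactly eight digits, so the ninth is a separate "0".
print(len("\U000101010"))               # 2

# The greedy variable-width form swallows all six digits: one character, U+101010.
print(len(expand_uplus(r"\U+101010")))  # 1

# Padding to six digits forces the split: U+10101 followed by "0".
print(expand_uplus(r"\U+0101010") == "\U000101010")  # True
```
</pre>
<div>Under this reading the six-digit cap is what makes padding work, so it would break if code points ever grew past U+10FFFF.</div><div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">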
<br>
ChrisA<br>
<div class="HOEnZb"><div class="h5">_______________________________________________<br>
Python-ideas mailing list<br>
<a href="mailto:Python-ideas@python.org">Python-ideas@python.org</a><br>
<a href="https://mail.python.org/mailman/listinfo/python-ideas" rel="noreferrer" target="_blank">https://mail.python.org/mailman/listinfo/python-ideas</a><br>
Code of Conduct: <a href="http://python.org/psf/codeofconduct/" rel="noreferrer" target="_blank">http://python.org/psf/codeofconduct/</a></div></div></blockquote></div><br></div></div>