<div dir="ltr"><br><br><div class="gmail_quote"><div dir="ltr">On Mon, 11 Apr 2016 at 14:11 Ethan Furman <<a href="mailto:ethan@stoneleaf.us">ethan@stoneleaf.us</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 04/11/2016 01:42 PM, Victor Stinner wrote:<br>
> 2016-04-11 21:00 GMT+02:00 Brett Cannon:<br>
<br>
>> I'm -0 on allowing __fspath__ to return bytes, but we can see what others<br>
>> think.<br>
><br>
> With the PEP 383, a bytes filename can be stored as str using the<br>
> surrogateescape error handler. So DirEntry can convert a bytes path to<br>
> str using os.fsdecode().<br>
<br>
I am far from a unicode expert, but if I understand this correctly you<br>
are proposing that DirEntry.__whatever__ can always return a str using<br>
the surrogateescape (SE) method.<br>
<br>
However, before this SE string can be used, it would need to be<br>
converted back to bytes, and with the same SE method, yes? And this has<br>
already been implemented in the stdlib?<br>
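A minimal sketch of the round-trip being discussed, assuming a UTF-8 filesystem encoding; the filename is a made-up example. On POSIX, os.fsdecode() and os.fsencode() use the surrogateescape handler, so the stdlib check at the end only runs there (Windows uses a different filesystem error handler).

```python
import os

# Hypothetical raw filename; the byte 0xff is not valid UTF-8.
raw = b"report-\xff.txt"

# PEP 383: decoding with surrogateescape maps an undecodable byte 0xXX
# to the lone surrogate U+DCXX instead of raising an error.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))  # 'report-\udcff.txt'

# Encoding back with the same handler restores the original bytes exactly.
assert name.encode("utf-8", "surrogateescape") == raw

# On POSIX the stdlib helpers apply surrogateescape, so the same
# round-trip holds for os.fsdecode()/os.fsencode().
if os.name == "posix":
    assert os.fsencode(os.fsdecode(raw)) == raw
```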
<br>
So my concern in such a case is what happens if we pass this SE string<br>
somewhere else: a UTF-8 file, or over a socket, or into a database?<br>
Does this have issues that we wouldn't face if we just used bytes?<br></blockquote><div><br></div><div>This is my worry as well, and it is why I have not proposed this kind of universal normalizing of bytes paths using os.fsdecode() with surrogateescape. Doing this sort of thing at the system boundary, and documenting it as such as PEP 383 proposed, makes a bit more sense: the expectation is more controlled and there is a clear input boundary. </div></div></div>
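To make the concern about passing an SE string "somewhere else" concrete, here is a sketch using the same hypothetical filename. It encodes the surrogate-escaped str strictly to UTF-8, which is what a text file opened with encoding="utf-8" does by default; the helper to_utf8_strict is hypothetical and only stands in for any consumer that expects valid UTF-8.

```python
# Hypothetical filename containing the undecodable byte 0xff.
raw = b"report-\xff.txt"
name = raw.decode("utf-8", "surrogateescape")  # 'report-\udcff.txt'


def to_utf8_strict(text):
    """Hypothetical stand-in for a consumer that requires valid UTF-8."""
    try:
        return text.encode("utf-8")  # errors="strict" is the default
    except UnicodeEncodeError as exc:
        # Lone surrogates cannot be encoded as UTF-8 under "strict".
        return exc


result = to_utf8_strict(name)
print(type(result).__name__)  # UnicodeEncodeError

# The caller has to pick a handler explicitly instead:
print(name.encode("utf-8", "surrogateescape"))   # original bytes, not valid UTF-8
print(name.encode("utf-8", "backslashreplace"))  # readable but lossy escape
```

With bytes, the undecodable byte is simply carried along; with an SE string, every text consumer downstream has to make this choice.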