[Python-Dev] Bytes path support
cs at zip.com.au
Fri Aug 22 00:27:21 CEST 2014
On 21Aug2014 09:20, Antoine Pitrou <antoine at python.org> wrote:
>On 21/08/2014 00:52, Cameron Simpson wrote:
>>The "bytes in some arbitrary encoding where at least the slash character
>>(and maybe a couple others) is ascii compatible" notion is completely bogus.
>>There's only one special byte, the slash (code 47). There's no OS-level
>>need that it or anything else be ASCII compatible.
>Of course there is. Try to split an UTF-16-encoded file path on the
>byte 47 and you'll get a lot of garbage. So, yes, POSIX implicitly
>mandates an ASCII-compatible encoding for file paths.
[Rolls eyes.] Looking at the UTF-16 encoding, it looks like it also embeds NUL
bytes for various codes below 32768. How are they handled? As remarked, codes 0
(NUL) and 47 (ASCII slash code) _are_ special to UNIX filename byte strings.
If you imagine you can embed bare UTF-16 freely even excluding code 47, I think
one of us is missing something.
That's not "ASCII compatible". That's "not all byte codes can be freely used
without thought", and any multibyte coding will have to consider such things
when embedding itself in another coding scheme.
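
For anyone who wants to see the byte-level effect, here is a rough sketch in
Python 3. The helper name special_bytes is made up for illustration and is not
from any library. It just counts how many NUL (0) and slash (47) bytes turn up
once a string is encoded:

```python
def special_bytes(text, encoding):
    """Count the two bytes UNIX treats specially in the encoded form of text."""
    data = text.encode(encoding)
    # bytes.count() accepts an integer byte value in Python 3.3 and later.
    return {"nul": data.count(0), "slash": data.count(47)}


# A plain ASCII path: UTF-16-LE pads every character with a NUL byte.
print(special_bytes("a/b", "utf-16-le"))
print(special_bytes("a/b", "utf-8"))

# U+012F contains no slash, yet its UTF-16-LE encoding is the bytes 0x2F 0x01,
# so byte 47 appears where no path separator was intended.
print(special_bytes("\u012f", "utf-16-le"))
print(special_bytes("\u012f", "utf-8"))
```

Under UTF-16-LE the first string picks up three NUL bytes and the second one
picks up a spurious byte 47; under UTF-8 neither happens.
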
Cameron Simpson <cs at zip.com.au>
Microsoft: Committed to putting the "backward" into "backward compatibility."