[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces
v+python at g.nevcal.com
Wed Apr 29 23:09:26 CEST 2009
On approximately 4/29/2009 1:28 PM, came the following characters from
the keyboard of Martin v. Löwis:
>>>>>>>> C. File on disk with the invalid surrogate code, accessed via the
>>>>>>>> str interface, no decoding happens, matches in memory the file on disk
>>>>>>>> with the byte that translates to the same surrogate, accessed via the
>>>>>>>> bytes interface. Ambiguity.
>>> What does that mean? What specific interface are you referring to to
>>> obtain file names?
>> So I guess I'd better suggest that a specific, equivalent directory name
>> be passed in either bytes or str form.
> [Leaving the issue of the empty string apparently having different
> meanings aside ...]
> Ok. Now I understand the example. So you do
> and you have a file in c:/tmp that is named "abc\uDC10".
>> So what you are saying here is that Python doesn't use the "A" forms of
>> the Windows APIs for filenames, but only the "W" forms, and uses lossy
>> decoding (from MS) to the current code page (which can never be UTF-8 on
> Actually, it does use the A form, in the second listdir example. This,
> in turn (inside Windows), uses the lossy CP_ACP encoding. You get back
> a byte string; the listdirs should give
> (not quite sure about the second - I only guess that CP_ACP will replace
> the half surrogate with a question mark).
> So where is the ambiguity here?
None. But not everyone can read all the Python source code to work out
how it behaves; they expect the documentation to spare them that.
Because the documentation is lacking in this area, your concisely stated
PEP is rather hard to understand.
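For readers in that position, here is a minimal sketch of the mechanism the PEP proposes, using the surrogateescape error handler that Python 3.1 and later provide. It assumes a POSIX system whose file system encoding is UTF-8; decode_name and encode_name are hypothetical helpers for illustration, not os functions.

```python
# Sketch of the PEP 383 approach, ASSUMING a POSIX system whose file
# system encoding is UTF-8. Undecodable bytes 0x80-0xFF become lone
# surrogates U+DC80-U+DCFF; encoding with the same handler restores them.
FS_ENCODING = "utf-8"  # assumption for this example

def decode_name(raw: bytes) -> str:
    """Hypothetical helper: bytes name from the OS -> str name."""
    return raw.decode(FS_ENCODING, "surrogateescape")

def encode_name(name: str) -> bytes:
    """Hypothetical helper: str name -> bytes handed back to the OS."""
    return name.encode(FS_ENCODING, "surrogateescape")

# A name containing the single invalid byte 0xFF ...
escaped = decode_name(b"abc\xff")
# ... and a name containing the bytes that would spell U+DCFF in UTF-8.
# UTF-8 forbids encoded surrogates, so each of those bytes is escaped.
lookalike = decode_name(b"abc\xed\xb3\xbf")

print(ascii(escaped))    # 'abc\udcff'
print(ascii(lookalike))  # 'abc\udced\udcb3\udcbf'

# Both names round-trip to their original bytes, and they differ as str.
assert encode_name(escaped) == b"abc\xff"
assert encode_name(lookalike) == b"abc\xed\xb3\xbf"
assert escaped != lookalike

# Only U+DC80-U+DCFF stand for escaped bytes; a name such as "abc\udc10"
# has no byte form under this scheme and raises instead.
try:
    encode_name("abc\udc10")
    dc10_encodable = True
except UnicodeEncodeError:
    dc10_encodable = False
print(dc10_encodable)  # False
```

Under these assumptions the bytes and str views do not collide: a byte sequence that "spells" a surrogate is itself invalid UTF-8, so it is escaped byte by byte rather than mapped to the same surrogate.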
Thanks for clarifying the Windows behavior here. A little more
clarification in the PEP could have avoided a lot of discussion. It
would seem that a PEP proposing to modify a poorly documented (and
therefore likely poorly understood) area should educate readers about
the status quo as well as present the suggested change. Or is it
the Python philosophy that PEPs should be as incomprehensible as
possible, to generate large discussions?
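To make the Windows behavior concrete, here is a rough model built from Python codecs rather than the Windows APIs themselves. The assumptions are loud ones: cp1252 stands in for the ANSI code page (CP_ACP), errors="replace" stands in for Windows' lossy conversion (Martin only guesses that it yields a question mark), and w_api_name and a_api_name are hypothetical helpers. The utf-16-le codec accepts surrogatepass only in Python 3.4 or later.

```python
# Rough model of the two Windows paths described above. ASSUMPTIONS:
# cp1252 plays the role of CP_ACP, and errors="replace" approximates the
# lossy conversion Windows performs; the real result is decided by Windows.
ANSI_CODE_PAGE = "cp1252"  # hypothetical choice of CP_ACP

def w_api_name(name: str) -> bytes:
    """Hypothetical: 'W' APIs take UTF-16; a lone surrogate passes through."""
    return name.encode("utf-16-le", "surrogatepass")

def a_api_name(name: str) -> bytes:
    """Hypothetical: 'A' APIs take ANSI bytes; unmappable chars become '?'."""
    return name.encode(ANSI_CODE_PAGE, "replace")

name = "abc\udc10"

# The wide form round-trips exactly, half surrogate included.
wide = w_api_name(name)
assert wide.decode("utf-16-le", "surrogatepass") == name

# The ANSI form is lossy: distinct names collapse to the same bytes.
print(a_api_name(name))  # b'abc?'
assert a_api_name("abc\udc11") == a_api_name(name)
```

This models the behavior as described in this thread; later Python releases (PEP 529, Python 3.6) changed bytes paths on Windows to be decoded as UTF-8 and passed to the W APIs.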
Glenn -- http://nevcal.com/
A protocol is complete when there is nothing left to remove.
-- Stuart Cheshire, Apple Computer, regarding Zero Configuration Networking