[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

Paul Moore p.f.moore at gmail.com
Sat Apr 25 16:38:03 CEST 2009

2009/4/25 "Martin v. Löwis" <martin at v.loewis.de>:
>> Following on from that, would this (under Martin's proposal) result in
>> programs receiving encoded strings, or just semantically-incorrect
>> ones?
> Not sure I understand the question - what is an "encoded string"?

Sorry. I was making up terminology as I went along for the various
concepts I was trying to express.

I meant a string which has been created from a non-decodable
byte sequence using the encoding process you specify in the PEP (with
the current version of the PEP, this would be a string containing lone
half-surrogate codes).
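
To make "a string with lone half surrogate codes" concrete, here is a
minimal sketch. It assumes the error handler as it eventually shipped in
Python 3.1+ under the name "surrogateescape", which may differ in detail
from the PEP draft discussed in this thread. The filename bytes are a
made-up example.

```python
# Hypothetical filename bytes: Latin-1 encoded, so not valid UTF-8.
raw = b"caf\xe9.txt"

# Each undecodable byte 0xNN is mapped to the lone surrogate U+DCNN.
name = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = name.encode("utf-8", errors="surrogateescape")

print(ascii(name))           # shows the escaped surrogate in the string
print(round_tripped == raw)
```

The byte 0xE9 ends up as U+DCE9 in the string, which is what I was
calling an "encoded string".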

I was distinguishing these because some people seemed to be implying
that such strings were the ones which would result in exceptions. (I
think that was Stephen, when he referred to a "careful API").

> As you analyse below, sometimes, the current (2.x) file system encoding
> will do the right thing; sometimes, it will decode successfully, but
> still not give the intended string, and sometimes, it will fail. With
> the PEP, it won't fail, but give a string back that likely wasn't
> intended by the user. This might be confusing if you try to render it to
> a user interface; if the application merely passes it back to file
> system APIs, it will work fine.
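
The three outcomes described above can be sketched roughly as follows.
This uses plain bytes.decode() in Python 3 rather than the actual 2.x
file system code, and the byte strings are invented for illustration;
"surrogateescape" is the handler name from released Python 3.1+, assumed
here to match the PEP's behaviour.

```python
# The same name, "caf\xe9" (cafe with e-acute), in two encodings.
utf8_bytes = "caf\xe9".encode("utf-8")      # b'caf\xc3\xa9'
latin1_bytes = "caf\xe9".encode("latin-1")  # b'caf\xe9'

# Case 1: the bytes match the assumed encoding, so the right string comes back.
right = utf8_bytes.decode("utf-8")

# Case 2: decoding succeeds, but the result is not the intended string.
wrong = utf8_bytes.decode("latin-1")

# Case 3: strict decoding fails outright.
try:
    latin1_bytes.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# With the PEP's approach: no failure, but a string the user
# probably did not intend (it contains a lone surrogate).
escaped = latin1_bytes.decode("utf-8", errors="surrogateescape")
```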

OK, looks like my analysis matches yours, except that I wasn't sure if
the third case (a string that "likely wasn't intended") could result
in exceptions. From what you're saying, it sounds like it would
actually be similar to the second case - I'm not clear on how
surrogates work, though.
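
On the question of exceptions, a small sketch of how such a string
behaves in released Python 3 (3.1 or later, which is an assumption
relative to the draft discussed here): a strict encode raises, while
encoding with the same handler, or with a lossy one for display, does
not. The filename is a made-up example.

```python
# A string carrying one escaped byte (0xE9 became U+DCE9).
name = b"caf\xe9.txt".decode("utf-8", errors="surrogateescape")

# A strict encode refuses lone surrogates and raises.
try:
    name.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# For display, a lossy handler avoids the exception.
for_display = name.encode("utf-8", errors="replace").decode("utf-8")

# For passing back to the OS, the same handler recovers the bytes.
back_to_os = name.encode("utf-8", errors="surrogateescape")
```

So the exception only shows up when the string is pushed through a
strict codec, for example when writing it to a UTF-8 file or terminal.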

>> So, the next question is - do people on such systems frequently use
>> high-bit characters in filenames?
> They typically do until they run into problems. For example, if they
> set the locale to something, and then create files in their
> home directory, it will work just fine, and nobody else will ever see
> the files (except for the backup software).
> When they find that the files they created are inaccessible to others,
> they will often stop using funny characters.

Which sounds fairly practical - and the irony of someone with a "funny
character" in his surname telling me this hasn't escaped me :-)

