[Python-Dev] PEP 383: Non-decodable Bytes in System Character Interfaces

"Martin v. Löwis" martin at v.loewis.de
Sat Apr 25 14:22:27 CEST 2009

> The problem with this, and other preceding schemes that have been
> discussed here, is that there is no means of ascertaining whether a
> particular file name str was obtained from a str API, or was funny-
> decoded from a bytes API... and thus, there is no means of reliably
> ascertaining whether a particular filename str should be passed to a
> str API, or funny-encoded back to bytes.

Why is it necessary that you are able to make this distinction?

> Picking a character (I don't find U+F01xx in the
> Unicode standard, so I don't know what it is)

It's a private use area. It will never carry an official character
assignment.
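
For illustration only, here is a minimal Python sketch of the kind of
scheme being discussed. It assumes each undecodable byte b is mapped to
the code point U+F0100 + b. The base value, the handler name
"pua_escape_sketch", and the helpers funny_decode/funny_encode are all
hypothetical and are not taken from the PEP text. The sketch checks that
the range is private use, round-trips an undecodable name, and shows the
case the quoted objection describes, where two different byte strings
decode to the same str.

```python
import codecs
import unicodedata

# Assumption: base of the "U+F01xx" range (plane 15, private use).
PUA_BASE = 0xF0100


def _pua_escape(exc):
    """Error handler: map each undecodable byte b to chr(PUA_BASE + b)."""
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        return "".join(chr(PUA_BASE + b) for b in bad), exc.end
    raise exc


# Hypothetical handler name, registered only for this sketch.
codecs.register_error("pua_escape_sketch", _pua_escape)


def funny_decode(raw, encoding="utf-8"):
    """Decode bytes, escaping undecodable bytes into the private use range."""
    return raw.decode(encoding, "pua_escape_sketch")


def funny_encode(name, encoding="utf-8"):
    """Reverse funny_decode: escaped code points become single raw bytes."""
    out = bytearray()
    for ch in name:
        cp = ord(ch)
        if PUA_BASE + 0x80 <= cp <= PUA_BASE + 0xFF:
            out.append(cp - PUA_BASE)
        else:
            out += ch.encode(encoding)
    return bytes(out)


# The code points are private use (general category "Co").
is_private_use = unicodedata.category(chr(PUA_BASE + 0x80)) == "Co"

# A Latin-1 encoded name is not valid UTF-8, yet it round-trips.
raw_name = b"caf\xe9"
decoded = funny_decode(raw_name)
round_trips = funny_encode(decoded) == raw_name

# The ambiguity: a name whose valid UTF-8 bytes already contain the
# escape code point decodes to the very same str.
other_raw = ("caf" + chr(PUA_BASE + 0xE9)).encode("utf-8")
collides = funny_decode(other_raw) == decoded and other_raw != raw_name
```

In this sketch the collision needs a file name that already contains a
code point from the reserved private-use range; names without such code
points round-trip unambiguously.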

> As I realized in the email-sig, in talking about decoding corrupted
> headers, there is only one way to guarantee this... to encode _all_
> character sequences, from _all_ interfaces.  Basically it requires
> reserving an escape character (I'll use ? in these examples -- yes, an
> ASCII question mark -- happens to be illegal in Windows filenames so
> all the better on that platform, but the specific character doesn't
> matter... avoiding / \ and . is probably good, though).

I think you'll have to write an alternative PEP if you want to see
something like this implemented throughout Python.

