[Python-Dev] PEP 383 and GUI libraries

Fri May 1 18:38:46 CEST 2009

> Okay, I am wrong about this.  Having a flag to remember whether I had to
> fall back to the utf-8b trick is one method to implement my requirement,
> but my actual requirement is this:
> 
> Requirement: either the unicode string or the bytes are faithfully
> transmitted from one system to another.

I don't understand this requirement very well, in particular not
the "faithfully" part.

> That is: if you read a filename from the filesystem, and transmit that
> filename to another system and use it, then there are two cases:

What do you mean by "use it"? Things like opening files? How does
that work? In general, a file name valid on one system is invalid
on a different system - or, at least, refers to a different file
over there. This is independent of encodings.

> Requirement 1: the byte string was valid in the encoding of source
> system, in which case the unicode name is faithfully transmitted
> (i.e. the bytes that finally land on the target system are the result of
> sourcebytes.decode(source_sys_encoding).encode(target_sys_encoding)).

In all your descriptions, I'm puzzled as to where exactly you get
the source bytes from. If you use the PEP 383 interfaces, you will
start with character strings, not byte strings, always.
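
To illustrate what I mean, here is a minimal sketch. It assumes a
Python 3 release that has os.fsencode (added in 3.2, so later than this
message); the temporary directory and the name "example.txt" are made up
for the example. Passing a str path to os.listdir yields str names, and
bytes appear only if you ask for them explicitly.

```python
import os
import tempfile

# Hypothetical setup: a scratch directory holding one file.
with tempfile.TemporaryDirectory() as tmp:
    open(os.path.join(tmp, "example.txt"), "w").close()

    # A str argument produces a list of str, never bytes.
    names = os.listdir(tmp)

    # Bytes only show up when the program converts explicitly.
    raw_names = [os.fsencode(name) for name in names]

print(names)
print(raw_names)
```

So any "source bytes" in such a program are derived from the character
string, not the other way around.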

> Okay, I find it surprisingly easy to make subtle errors in this encoding
> stuff, so please let me know if you spot one.  Is it true that
> srcbytes.encode(srcencoding, 'python-escape').decode('utf-8',
> 'python-escape') will always produce srcbytes?

I think you mixed up bytes and unicode here: if srcbytes is indeed
a bytes object, then you can't apply .encode to it.
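
A sketch of the round trip with the calls in the other order (decode
first, then encode). It assumes the error handler under the name it
eventually shipped with, 'surrogateescape', rather than the draft name
'python-escape', and it uses the same codec in both directions; the
sample bytes are made up.

```python
# Hypothetical file name: Latin-1 bytes that are not valid UTF-8.
src_bytes = b"caf\xe9.txt"

# bytes -> str: the undecodable byte 0xE9 becomes the lone surrogate U+DCE9.
name = src_bytes.decode("utf-8", "surrogateescape")

# str -> bytes: the surrogate is turned back into the original byte.
round_tripped = name.encode("utf-8", "surrogateescape")

# In Python 3, bytes objects have no .encode method at all.
has_encode = hasattr(src_bytes, "encode")

print(repr(name), round_tripped == src_bytes, has_encode)
```

With the same codec on both sides this gives back src_bytes; it says
nothing about decoding with one encoding and encoding with another.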

Regards,
Martin