[Python-Dev] PEP 383 and GUI libraries

James Y Knight foom at fuhm.net
Sat May 2 04:12:15 CEST 2009


On May 1, 2009, at 9:42 PM, Zooko O'Whielacronx wrote:
> Yep, I reversed the order of encode() and decode().  However, my whole
> statement was utterly wrong and shows that I still didn't fully get it
> yet.  I have flip-flopped again and currently think that PEP 383 is
> useless for this use case and that my original plan [1] is still the
> way to go.  Please let me know if you spot a flaw in my plan or a
> ridiculousity in my requirements, or if you see a way that PEP 383 can
> help me.

If I were designing a new system such as this, I'd probably just go
for utf-8b *always*. That is, set the filesystem encoding to utf-8b.
The end. Filenames then always keep the same bytes when transferred
between Unix systems. Thus, the 99% of the world that uses either
Windows or a UTF-8 locale gets useful filenames inside Tahoe. The
other 1% of the world, which uses something like Latin-1, EUC-JP, etc.
on the local system, sees mojibake filenames in Tahoe, but gets back
the same filename it put in when taking the file back out.
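
A rough sketch of the round trip, assuming a Python in which PEP 383's
handler is available as errors="surrogateescape" (Python 3.1 and
later). The helper names to_text and to_bytes are made up for
illustration and are not part of Tahoe:

```python
# Sketch only: "utf-8b" modelled as UTF-8 plus the surrogateescape
# error handler from PEP 383. Helper names are hypothetical.

def to_text(raw: bytes) -> str:
    """Decode filename bytes; undecodable bytes become lone surrogates."""
    return raw.decode("utf-8", errors="surrogateescape")

def to_bytes(name: str) -> bytes:
    """Encode back; lone surrogates turn into the original raw bytes."""
    return name.encode("utf-8", errors="surrogateescape")

utf8_name = "caf\u00e9.txt".encode("utf-8")      # written on a UTF-8 system
latin1_name = "caf\u00e9.txt".encode("latin-1")  # written on a Latin-1 system

for raw in (utf8_name, latin1_name):
    # Both names survive the trip byte for byte, readable or not.
    assert to_bytes(to_text(raw)) == raw
```

The UTF-8 name comes out readable; the Latin-1 name comes out with an
escaped byte in place of the accented letter, but the bytes that go
back onto disk are the ones that came in.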

GNOME has used only UTF-8 for filename display for a few years now,
for example, so this isn't exactly an unheard-of position to take...

But if you don't do that, then I still don't see what purpose your
requirements serve. If I have two systems, one with a UTF-8 locale
and one with a Latin-1 locale, why should transmitting filenames from
system 1 to system 2 through Tahoe preserve the raw bytes, but doing
the reverse *not* preserve the raw bytes? (All byte sequences are
valid in Latin-1, remember, so they'll all decode into Unicode without
error, and then be re-encoded in UTF-8...) This seems a rather useless
behavior to me.
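
To make the parenthetical concrete, here is a minimal sketch of the
Latin-1 to UTF-8 direction. The function naive_transfer is
hypothetical and only models "decode with the source locale, re-encode
with the destination locale"; it is not Tahoe's actual code:

```python
# Sketch only: a hypothetical transfer that decodes with the sender's
# locale encoding and re-encodes with the receiver's.

def naive_transfer(raw: bytes, source_encoding: str, dest_encoding: str) -> bytes:
    text = raw.decode(source_encoding)   # strict decode
    return text.encode(dest_encoding)    # strict encode

# Every possible byte value decodes under Latin-1, so decoding never
# fails and nothing ever signals "keep these bytes as they are".
all_bytes = bytes(range(256))
decoded = all_bytes.decode("latin-1")    # does not raise

original = b"caf\xe9.txt"                # name on the Latin-1 system
received = naive_transfer(original, "latin-1", "utf-8")

# The name arrives as different bytes on the UTF-8 system.
assert received != original
```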

James

