[Python-Dev] transform() and untransform() methods, and the codec registry

Victor Stinner victor.stinner at haypocalc.com
Sun Dec 5 23:25:27 CET 2010


On Saturday 04 December 2010 09:31:04 you wrote:
> Alexander Belopolsky writes:
>  > In fact, once the language moratorium is over, I will argue that
>  > str.encode() and byte.decode() should deprecate encoding argument and
>  > just do UTF-8 encoding/decoding.  Hopefully by that time most people
>  > will forget that other encodings exist.  (I can dream, right?)
> 
> It's just a dream.  There's a pile of archival material, often on R/O
> media, out there that won't be transcoded any more quickly than the
> inscriptions on Tutankhamun's tomb.

Not only that: many libraries expect bytes arguments encoded in a specific
encoding (e.g. the locale encoding). Put differently, only a few libraries
written in C accept wchar_t* strings.

The Linux kernel (like many, if not all, UNIX/BSD kernels) only manipulates
byte strings. The libc only accepts wide characters for a few operations. I
don't know how to open a file with a Unicode path using the Linux libc: you
have to encode it...
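
To illustrate the encode step from the Python side, here is a minimal sketch.
It assumes Python 3.2 or newer, where os.fsencode() converts a str path to
bytes using the filesystem encoding. The helper names read_via_bytes_path()
and demo() are made up for this example, and the 1024-byte read size is
arbitrary.

```python
import os
import tempfile


def read_via_bytes_path(path: str) -> bytes:
    """Open `path` through a bytes filename and return up to 1024 bytes."""
    # Encode the str path with the filesystem encoding (usually the locale
    # encoding on UNIX). The OS-level open call only ever sees these bytes.
    raw_path = os.fsencode(path)
    fd = os.open(raw_path, os.O_RDONLY)
    try:
        return os.read(fd, 1024)
    finally:
        os.close(fd)


def demo() -> bytes:
    """Create a temporary file, then read it back through a bytes path."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")
        with open(path, "wb") as f:
            f.write(b"hello")
        return read_via_bytes_path(path)
```

os.fsdecode() does the reverse conversion when a bytes filename comes back
from the system, for example from os.listdir() called with a bytes argument.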

Alexander: you should first patch all UNIX/BSD kernels to use Unicode 
everywhere, then patch all libc implementations, and then all libraries 
(written in C). After that, you can have a break.

Victor

