[Python-Dev] transform() and untransform() methods, and the codec registry
Victor Stinner
victor.stinner at haypocalc.com
Sun Dec 5 23:25:27 CET 2010
On Saturday 04 December 2010 09:31:04 you wrote:
> Alexander Belopolsky writes:
> > In fact, once the language moratorium is over, I will argue that
> > str.encode() and byte.decode() should deprecate encoding argument and
> > just do UTF-8 encoding/decoding. Hopefully by that time most people
> > will forget that other encodings exist. (I can dream, right?)
>
> It's just a dream. There's a pile of archival material, often on R/O
> media, out there that won't be transcoded any more quickly than the
> inscriptions on Tutankhamun's tomb.
Not only that: many libraries expect bytes arguments encoded in a specific
encoding (e.g. the locale encoding). Said differently, only a few libraries
written in C accept wchar_t* strings.
The Linux kernel (like many, or all, UNIX/BSD kernels) only manipulates byte
strings. The libc only accepts wide characters for a few operations. I don't
know how to open a file with a Unicode path with the Linux libc: you have to
encode it...
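To illustrate the point about encoding paths, here is a minimal sketch in Python 3. It assumes a version that provides os.fsencode() (3.2 or later), and the helper name write_via_bytes_path is made up for this example. The str path is encoded to bytes with the filesystem encoding before it is handed to os.open(), which wraps the open(2) system call.

```python
import os

def write_via_bytes_path(path, data):
    """Create `path` and write `data`, handing the OS a bytes path.

    Hypothetical helper, for illustration only.
    """
    # Encode the str path with the filesystem encoding and its error
    # handler; the result is what the kernel will actually see.
    raw_path = os.fsencode(path)
    # os.open() wraps open(2), which takes a byte string (char *) path.
    fd = os.open(raw_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
    return raw_path
```

Passing the str path directly to os.open() would work as well, but only because Python does the same encoding step internally.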
Alexander: you should first patch all UNIX/BSD kernels to use Unicode
everywhere, then patch all libc implementations, and then all libraries
(written in C). After that, you can take a break.
Victor