[Python-Dev] transform() and untransform() methods, and the codec registry

Nick Coghlan ncoghlan at gmail.com
Mon Dec 6 05:25:30 CET 2010


On Mon, Dec 6, 2010 at 8:25 AM, Victor Stinner
<victor.stinner at haypocalc.com> wrote:
> Not only that: many libraries expect bytes arguments encoded in a specific
> encoding (e.g. the locale encoding). Put differently, only a few libraries
> written in C accept wchar_t* strings.
>
> The Linux kernel (like many, or all, UNIX/BSD kernels) only manipulates byte
> strings. The libc only accepts wide characters for a few operations. I don't
> know how to open a file with a Unicode path with the Linux libc: you have to
> encode it...
>
> Alexander: you should first patch all UNIX/BSD kernels to use Unicode
> everywhere, then patch all libc implementations, and then all libraries
> (written in C). After that, you can have a break.

Slightly less ambitious is to get them all to agree to standardise on
UTF-8 as the encoding mechanism (which is actually in the process of
happening; it just has a long way to go).

However, as a glue language, it is part of Python's role to help
manage the transition from legacy encodings to UTF-8, so it will be a
very long time before the idea of removing support for the encoding
argument becomes even remotely feasible.
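
As a rough illustration of that glue role, here is a minimal sketch of
moving data from a legacy encoding to UTF-8 by passing the encoding
argument explicitly. The helper name transcode and the Latin-1 sample
bytes are made up for this example; only bytes.decode() and str.encode()
are the real built-in methods.

```python
def transcode(data, source_encoding, target_encoding="utf-8"):
    """Re-encode bytes from a legacy encoding to the target encoding.

    Hypothetical helper for illustration only: it decodes with the
    explicitly named legacy encoding, then encodes the resulting text.
    """
    text = data.decode(source_encoding)   # bytes -> str, legacy codec
    return text.encode(target_encoding)   # str -> bytes, UTF-8 by default


# Assumed sample data: "café" as stored by a Latin-1 application.
legacy_bytes = b"caf\xe9"
utf8_bytes = transcode(legacy_bytes, "latin-1")
```

Without an explicit encoding argument, the caller would have no way to
tell Python how the legacy bytes should be interpreted.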

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia

