[Python-Dev] transform() and untransform() methods, and the codec registry

Sat Dec 4 09:31:04 CET 2010

Alexander Belopolsky writes:

 > In fact, once the language moratorium is over, I will argue that
 > str.encode() and bytes.decode() should deprecate encoding argument and
 > just do UTF-8 encoding/decoding.  Hopefully by that time most people
 > will forget that other encodings exist.  (I can dream, right?)

It's just a dream.  There's a pile of archival material, often on R/O
media, out there that won't be transcoded any more quickly than the
inscriptions on Tutankhamun's tomb.

Remember, Python is a language used to implement such translations.
It's not an application.  I think it would be reasonable to make UTF-8
the *default* encoding on all platforms, except for internal OS
functions, where Windows will presumably continue to use UTF-16 and
*nix distros will probably continue to agree to disagree about whether
the on-disk format is NFD or NFC (and the Python language as yet doesn't
know about NFC v. NFD, although the library does).
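To illustrate the NFC v. NFD point: string equality in the language compares code points, so a precomposed character and its decomposed equivalent are different strings. The standard library's `unicodedata.normalize()` converts between the forms. A minimal sketch using "é" as the example character:

```python
import unicodedata

# "é" as one precomposed code point (U+00E9) vs. "e" + COMBINING ACUTE ACCENT (U+0301)
composed = "\u00e9"
decomposed = "e\u0301"

# Plain string comparison works on code points, so these are unequal...
print(composed == decomposed)                                 # False

# ...but the library's normalize() maps between the two forms.
print(unicodedata.normalize("NFD", composed) == decomposed)   # True
print(unicodedata.normalize("NFC", decomposed) == composed)   # True

# The UTF-8 byte sequences differ too, which matters for on-disk names.
print(len(composed.encode("utf-8")), len(decomposed.encode("utf-8")))  # 2 3
```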

In the discussion of PEP 263, I proposed that the external encoding of
Python scripts themselves be fixed as UTF-8, and other encodings would
have to be pretranslated by an appropriate codec.  That was shouted
down by the European contingent, who wanted to continue using Latin-1
and Latin-2 without codecs or a wrapper to call them transparently.
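The "pretranslate with an appropriate codec" step can be sketched as a decode/re-encode pass. This assumes the caller already knows the file's legacy encoding; `pretranslate` and `CODING_RE` are hypothetical names, not stdlib APIs. The regex is the coding-declaration pattern given in PEP 263, used here to rewrite any existing declaration so it names UTF-8:

```python
import re

# PEP 263 coding-declaration pattern, split so the codec name can be replaced.
CODING_RE = re.compile(r"^([ \t\f]*#.*?coding[:=][ \t]*)([-_.a-zA-Z0-9]+)")


def pretranslate(source: bytes, source_encoding: str) -> bytes:
    """Hypothetical helper: re-encode legacy Python source as UTF-8."""
    text = source.decode(source_encoding)
    lines = text.split("\n")
    # A coding declaration is only honored on the first two lines.
    for i in range(min(2, len(lines))):
        lines[i] = CODING_RE.sub(r"\1utf-8", lines[i], count=1)
    return "\n".join(lines).encode("utf-8")


legacy = "# -*- coding: latin-1 -*-\nprint('Grüße')\n".encode("latin-1")
converted = pretranslate(legacy, "latin-1")
print(converted.decode("utf-8"))
```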
However, this time around you might get a more sympathetic hearing.