[Python-Dev] transform() and untransform() methods, and the codec registry
Stephen J. Turnbull
stephen at xemacs.org
Sat Dec 4 09:31:04 CET 2010
Alexander Belopolsky writes:
> In fact, once the language moratorium is over, I will argue that
> str.encode() and bytes.decode() should deprecate encoding argument and
> just do UTF-8 encoding/decoding. Hopefully by that time most people
> will forget that other encodings exist. (I can dream, right?)
It's just a dream. There's a pile of archival material out there, often
on read-only media, that won't be transcoded any more quickly than the
inscriptions on Tutankhamun's tomb.
Remember, Python is a language used to implement such translations.
It's not an application. I think it would be reasonable to make UTF-8
the *default* encoding on all platforms, except for internal OS
functions, where Windows will presumably continue to use UTF-16 and
*nix distros will probably continue to agree to disagree about whether
the on-disk format is NFD or NFC (and the Python language as yet doesn't
know about NFC vs. NFD, although the library does).
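
To make the NFC/NFD point concrete, here is a minimal sketch using the
standard library's unicodedata module. The sample strings and variable
names are mine, chosen only for illustration; nothing here reflects how
any particular filesystem actually stores names.

```python
import unicodedata

# "é" as a single precomposed code point (the NFC spelling).
composed = "\u00e9"
# "e" followed by a combining acute accent (the NFD spelling).
decomposed = "e\u0301"

# The language compares code points, so the two spellings are unequal.
same_raw = composed == decomposed

# The library can bring either spelling to a common normal form.
same_nfc = unicodedata.normalize("NFC", decomposed) == composed
same_nfd = unicodedata.normalize("NFD", composed) == decomposed

# Both spellings are valid UTF-8, but the byte sequences differ.
nfc_bytes = composed.encode("utf-8")
nfd_bytes = decomposed.encode("utf-8")
```

So making UTF-8 the default settles the byte encoding, but it does not by
itself decide which normal form a given name is in.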
In the discussion of PEP 263, I proposed that the external encoding of
Python scripts themselves be fixed as UTF-8, and other encodings would
have to be pretranslated by an appropriate codec. That was shouted
down by the European contingent, who wanted to continue using Latin-1
and Latin-2 without codecs or a wrapper to call them transparently.
However, this time around you might get a more sympathetic hearing.
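
For what "pretranslated by an appropriate codec" could look like, here is
a rough sketch, not the mechanism PEP 263 ended up with. The helper name
pretranslate and the sample script contents are hypothetical; the codec
names "latin-1" and "iso8859-2" are the standard library's names for
Latin-1 and Latin-2.

```python
def pretranslate(source_bytes, source_encoding):
    """Decode legacy-encoded source text and re-encode it as UTF-8.

    Hypothetical helper for illustration only.
    """
    text = source_bytes.decode(source_encoding)
    return text.encode("utf-8")


# Sample script contents as they might sit on disk in a legacy encoding.
latin1_script = 'name = "Müller"\n'.encode("latin-1")
latin2_script = 'name = "Łódź"\n'.encode("iso8859-2")

# After pretranslation, both scripts are plain UTF-8.
utf8_from_latin1 = pretranslate(latin1_script, "latin-1")
utf8_from_latin2 = pretranslate(latin2_script, "iso8859-2")
```

The caller has to know the source encoding up front; the bytes alone
don't say whether they are Latin-1 or Latin-2.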