[Python-ideas] Support WHATWG versions of legacy encodings

Guido van Rossum guido at python.org
Wed Jan 31 11:36:14 EST 2018

On Wed, Jan 31, 2018 at 3:03 AM, Serhiy Storchaka <storchaka at gmail.com> wrote:

> 19.01.18 05:51, Guido van Rossum wrote:
>> Can someone explain to me why this is such a controversial issue?
>> It seems reasonable to me to add new encodings to the stdlib that do the
>> roundtripping requested in the first message of the thread. As long as they
>> have new names that seems to fall under "practicality beats purity".
>> (Modifying existing encodings seems wrong -- did the feature request
>> somehow transmogrify into that?)
> In any case you need to change your code. If we add a new error handler,
> you need to change the decoding code to use this error handler:
>     text = data.decode(encoding, 'whatwgreplace')
> If we add new encodings, you need to maintain an alias table that maps
> standard encoding names to the names of the corresponding WHATWG encodings:
>     aliases = {'windows_1252': 'windows-1252-whatwg',
>                'windows_1251': 'windows-1251-whatwg',
>                'utf_8': 'utf-8-whatwg', # utf-8 + surrogatepass
>                ...
>               }
>     ...
>     text = data.decode(aliases.get(normalize_encoding(encoding),
>                                    encoding))
> I don't see an advantage of the second approach for the end user. And of
> course it is more costly for maintainers, because we would need to
> implement around 20 new encodings, and it adds a cognitive burden for new
> Python users, who would now have more tables of encodings in the
> documentation.

Hm. As a user, unless I run into problems with a specific encoding, I never
care about how many encodings we have, so I don't see how adding extra
encodings bothers those users who have no need for them.

There's a reason to prefer new encoding names (maybe augmented with an alias
table) over a new error handler: there are lots of places where encodings
are passed around via text files, Internet protocols, RPC calls, and layers
and layers of function calls. Many of these treat the encoding as a string,
not as a (string, errorhandler) pair. So there may be situations where a
given API has no way to convey that a special error handler is needed,
while the same API would have no problem preserving just the encoding name.
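
A minimal sketch of that point, under stated assumptions: the codec name
'windows-1252-whatwg' is borrowed from the alias table quoted above and does
not exist in the stdlib, and read_text() is a made-up stand-in for any API
that passes along only an encoding name. The codec here is one possible way
to build such an encoding on top of cp1252 (undefined bytes pass through as
the matching C1 control characters), not a proposed implementation.

```python
import codecs

CODEC_NAME = "windows-1252-whatwg"  # hypothetical name, not in the stdlib


def _build_decoding_table() -> str:
    """Start from cp1252 and fill its undefined bytes with chr(byte)."""
    chars = []
    for byte in range(256):
        try:
            chars.append(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            # cp1252 leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined;
            # pass them through as the matching C1 control character.
            chars.append(chr(byte))
    return "".join(chars)


_DECODING_TABLE = _build_decoding_table()
_ENCODING_TABLE = codecs.charmap_build(_DECODING_TABLE)


def _decode(data, errors="strict"):
    return codecs.charmap_decode(data, errors, _DECODING_TABLE)


def _encode(text, errors="strict"):
    return codecs.charmap_encode(text, errors, _ENCODING_TABLE)


def _search(name: str):
    # The registry normalizes names differently across Python versions,
    # so compare on a form with hyphens and spaces folded to underscores.
    if name.lower().replace("-", "_").replace(" ", "_") == "windows_1252_whatwg":
        return codecs.CodecInfo(name=CODEC_NAME, encode=_encode, decode=_decode)
    return None


codecs.register(_search)


def read_text(payload: bytes, charset: str) -> str:
    """Stand-in for an API that carries only an encoding name."""
    return payload.decode(charset)
```

With this registered, read_text(b"caf\xe9 \x81", "windows-1252-whatwg")
succeeds, while the same call with "cp1252" raises UnicodeDecodeError. The
caller has no way to ask read_text() for a different error handler, but the
new name passes through unchanged.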

--Guido van Rossum (python.org/~guido)