[Python-ideas] Support WHATWG versions of legacy encodings

Serhiy Storchaka storchaka at gmail.com
Wed Jan 31 06:03:20 EST 2018


19.01.18 05:51, Guido van Rossum wrote:
> Can someone explain to me why this is such a controversial issue?
> 
> It seems reasonable to me to add new encodings to the stdlib that do the 
> roundtripping requested in the first message of the thread. As long as 
> they have new names that seems to fall under "practicality beats 
> purity". (Modifying existing encodings seems wrong -- did the feature 
> request somehow transmogrify into that?)

In either case you need to change your code. If a new error handler is 
added, you need to change the decoding code to use that error handler:

     text = data.decode(encoding, 'whatwgreplace')
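
For illustration, here is a minimal sketch of how such a handler could be 
registered today with codecs.register_error(). The name 'whatwgreplace' 
and its behaviour (passing each undecodable byte through as the code point 
with the same value, which is what WHATWG does for the bytes that are 
undefined in windows-1252) are assumptions made for this example, not an 
existing stdlib feature:

```python
import codecs

def whatwg_replace(exc):
    # Hypothetical handler: replace each undecodable byte with the
    # code point of the same numeric value (e.g. 0x81 -> U+0081).
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        return ''.join(chr(b) for b in bad), exc.end
    # Encoding and translating errors are not handled in this sketch.
    raise exc

# 'whatwgreplace' is an illustrative name, not a built-in handler.
codecs.register_error('whatwgreplace', whatwg_replace)

data = b'caf\xe9 \x81'   # 0x81 is undefined in Python's cp1252 codec
text = data.decode('windows-1252', 'whatwgreplace')
```

With this in place only the decode() call sites change; the encoding name 
passed in stays the standard one.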

If new encodings are added, you need to maintain an alias table that maps 
standard encoding names to the names of the corresponding WHATWG encodings:

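     from encodings import normalize_encoding

     # Standard (normalized) encoding name -> proposed WHATWG variant
     aliases = {'windows_1252': 'windows-1252-whatwg',
                'windows_1251': 'windows-1251-whatwg',
                'utf_8': 'utf-8-whatwg', # utf-8 + surrogatepass
                ...
               }
     ...
     # Fall back to the original name if there is no WHATWG variant
     text = data.decode(aliases.get(normalize_encoding(encoding), encoding))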
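
To make the second approach concrete, here is a rough sketch of what one 
such encoding plus the alias lookup might look like. Everything named here 
is an assumption for the example: the codec name 'windows-1252-whatwg' is 
not registered in Python, the helper decode_whatwg() is made up, and the 
decoding table is derived from cp1252 by filling its undefined bytes with 
the same-valued code points rather than taken from the WHATWG index:

```python
import codecs
from encodings import normalize_encoding

def _build_table():
    # Start from cp1252; fill undefined bytes with same-valued code points.
    chars = []
    for b in range(256):
        try:
            chars.append(bytes([b]).decode('cp1252'))
        except UnicodeDecodeError:
            chars.append(chr(b))
    return ''.join(chars)

_DECODING_TABLE = _build_table()
_ENCODING_TABLE = codecs.charmap_build(_DECODING_TABLE)

def _decode(data, errors='strict'):
    return codecs.charmap_decode(data, errors, _DECODING_TABLE)

def _encode(text, errors='strict'):
    return codecs.charmap_encode(text, errors, _ENCODING_TABLE)

def _search(name):
    # The registry may pass the name with hyphens or underscores,
    # depending on the Python version, so compare a normalized form.
    if name.replace('-', '_') == 'windows_1252_whatwg':
        return codecs.CodecInfo(name='windows-1252-whatwg',
                                encode=_encode, decode=_decode)
    return None

codecs.register(_search)

# Alias table: normalized standard name -> hypothetical WHATWG codec name.
ALIASES = {'windows_1252': 'windows-1252-whatwg',
           'cp1252': 'windows-1252-whatwg'}

def decode_whatwg(data, encoding):
    # normalize_encoding() does not lowercase, so do that here.
    key = normalize_encoding(encoding).lower()
    return data.decode(ALIASES.get(key, encoding))

text = decode_whatwg(b'\x80 \x81', 'Windows-1252')
```

Note that the caller still has to go through the alias lookup, so the 
call sites change just as they do with the error handler.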

I don't see an advantage of the second approach for the end user. It is 
also more costly for maintainers, because we would need to implement 
around 20 new encodings, and it adds a cognitive burden for new Python 
users, who would face more tables of encodings in the documentation.


