[Python-ideas] Support WHATWG versions of legacy encodings

Rob Speer rspeer at luminoso.com
Wed Jan 31 15:51:42 EST 2018

On Wed, 31 Jan 2018 at 12:50 Serhiy Storchaka <storchaka at gmail.com> wrote:

> The passed encoding differs from the name of the new Python encoding. It is
> just 'windows-1252', not 'windows-1252-whatwg'. If we just change the
> existing encoding, this can break other code that expects the standard
> 'windows-1252'. Thus every time you need 'windows-1252-whatwg' instead of
> the 'windows-1252' passed with the text, you need to map encoding names.
> How does this differ from using a special error handler?

How is that the *same* as using a special error handler? This is not at all
what error handlers are for.

Mapping Python encoding names to the WHATWG standard (which, incidentally,
is now also the W3C standard) is currently addressed by the "webencodings"
package. That package can't currently return the correct encodings
(because Python doesn't have them), but it does at least return windows-1252 when a
Web page says it's in "iso-8859-1", because that's what the Web standard
says to do.
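As a rough illustration of the kind of label mapping described above, here is a minimal sketch. The table is a small, hypothetical subset of the labels the WHATWG Encoding Standard defines, and the names `lookup_label` and `_LABELS` are illustrative only, not the webencodings API. Note that it can only point at Python's existing 'cp1252', which is exactly the gap being discussed.

```python
# Hypothetical, partial label table: a handful of Web encoding labels
# mapped to existing Python codec names. Not the full WHATWG list.
_LABELS = {
    "windows-1252": "cp1252",
    "cp1252": "cp1252",
    "iso-8859-1": "cp1252",  # the Web standard treats this label as windows-1252
    "latin1": "cp1252",
    "ascii": "cp1252",
    "us-ascii": "cp1252",
    "utf-8": "utf-8",
    "utf8": "utf-8",
}

# Only ASCII whitespace is stripped from labels, not all Unicode whitespace.
_ASCII_WHITESPACE = "\t\n\f\r "


def lookup_label(label):
    """Return a Python codec name for a Web encoding label, or None if unknown."""
    if not label.isascii():
        # All labels are ASCII; rejecting other input also stops str.lower()
        # from folding characters like KELVIN SIGN into ASCII letters.
        return None
    return _LABELS.get(label.strip(_ASCII_WHITESPACE).lower())
```

For example, `lookup_label(" ISO-8859-1\n")` gives 'cp1252', and an unknown label gives None.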

> Yet one problem is that we actually need two error handlers. WHATWG
> specifies two behaviors for unmapped codes outside of the C0-C1 range:
> replacing with a special character, or an error. These correspond to the
> standard Python handlers 'replace' and 'strict'. Thus we would need either
> to add two new error handlers, 'whatwgreplace' and 'whatwgstrict', or to
> add *two* sets of new encodings (more than 70 encodings in total!).

What?! This is going way off the rails.

There are 8 new encodings. Not 70. Those 8 encodings would use the error
handlers that already exist in Python.
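To make the "encoding's job vs. error handler's job" point concrete, here is a minimal sketch of one such encoding built on the stdlib charmap machinery. Assumptions: the name 'windows-1252-whatwg' follows the name used in this thread and is hypothetical (Python does not ship it), and the only difference from Python's cp1252 is taken to be the five bytes cp1252 leaves undefined, which the WHATWG index maps to the C1 controls U+0081, U+008D, U+008F, U+0090 and U+009D.

```python
import codecs
from encodings import cp1252

# Bytes that Python's cp1252 leaves undefined; the WHATWG index maps each
# of them to the C1 control character with the same code point.
_CP1252_HOLES = (0x81, 0x8D, 0x8F, 0x90, 0x9D)

# Copy the stdlib decoding table and fill in the holes.
_table = list(cp1252.decoding_table)
for _b in _CP1252_HOLES:
    _table[_b] = chr(_b)
DECODING_TABLE = "".join(_table)
ENCODING_TABLE = codecs.charmap_build(DECODING_TABLE)


def _search(name):
    # Normalize defensively: how much codecs.lookup() normalizes the name
    # before calling search functions differs between Python versions.
    if name.lower().replace("-", "_").replace(" ", "_") != "windows_1252_whatwg":
        return None
    return codecs.CodecInfo(
        name="windows-1252-whatwg",
        # The errors argument is passed straight through, so Python's
        # existing error handlers are used unchanged.
        encode=lambda text, errors="strict": codecs.charmap_encode(
            text, errors, ENCODING_TABLE),
        decode=lambda data, errors="strict": codecs.charmap_decode(
            data, errors, DECODING_TABLE),
    )


codecs.register(_search)
```

With this registered, `b"\x81\x80".decode("windows-1252-whatwg")` gives '\x81\u20ac', where Python's cp1252 would raise on the first byte. Since every byte now decodes, decoding never reaches an error handler; on encoding, 'strict', 'replace' and 'xmlcharrefreplace' behave as they do for any other charmap codec.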

Why are you even talking about the C0 range? The C0 range is in ASCII.

The ridiculous complexity of some of these counter-proposals has largely
come from trying to use an error handler to do an encoding's job; now
you're proposing to also use more encodings to do the error handler's job.
I don't think it's a serious proposal; it's just so you could say "now you
need 70 encodings lol". Maybe you just like to torpedo things?

The "whatwg error handler" thing will not happen. It is a terrible design,
a misunderstanding of what error handlers are for, and it attempts to be an
overly general solution to a _problem that does not generalize_. Even if
this task could be sensibly implemented with error handlers, there are no
other instances where these error handlers would ever be useful.