[Python-ideas] Support WHATWG versions of legacy encodings

Serhiy Storchaka storchaka at gmail.com
Mon Feb 5 01:40:40 EST 2018


05.02.18 05:01, Nick Coghlan wrote:
> On 2 February 2018 at 16:52, Steven D'Aprano <steve at pearwood.info> wrote:
>> If it were my decision, I'd have these codecs raise a warning (not an
>> error) when used for encoding. But I guess some people will consider
>> that either going too far or not far enough :-)
> 
> Rob pointed out that one of the main use cases for these codecs is
> when going "Oh, this was decoded with a WHATWG encoding, which isn't
> right, so I need to re-encode it with that encoding, and then decode
> it with the right encoding". So encoding is very much part of the
> usage model: it's needed when you've received the data over a Unicode
> based interface rather than a binary one.

Wasn't the "surrogateescape" error handler designed for this purpose?

WHATWG encodings solve the same problem as "surrogateescape", but:

1) They use a different range for representing unmapped characters.
2) Not all unmapped characters can be decoded, so decoding is lossy
and a round trip does not always work.
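
For comparison, here is a minimal sketch of the "surrogateescape" round trip I have in mind. The sample string and the variable names are only illustrative assumptions; the example relies on the fact that Python's cp1252 codec leaves byte 0x81 unmapped.

```python
# UTF-8 data that was mistakenly decoded as cp1252.
# "Á" is C3 81 in UTF-8, and 0x81 is unmapped in Python's cp1252.
raw = "Ábaco".encode("utf-8")

# A strict decode would raise UnicodeDecodeError on 0x81. With
# surrogateescape the unmapped byte becomes the lone surrogate U+DC81.
wrong = raw.decode("cp1252", errors="surrogateescape")

# Re-encoding with the same codec and handler restores the exact bytes,
# including the one that had no mapping.
recovered = wrong.encode("cp1252", errors="surrogateescape")

# Now decode with the encoding that was actually used.
text = recovered.decode("utf-8")
```

Each undecodable byte is mapped into U+DC80..U+DCFF, so nothing is lost on the way back to bytes.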


