[Python-ideas] Support WHATWG versions of legacy encodings
njs at pobox.com
Wed Jan 17 18:30:41 EST 2018
On Wed, Jan 17, 2018 at 10:13 AM, Rob Speer <rspeer at luminoso.com> wrote:
> I'm going to push back on the idea that this should only be used for
> decoding, not encoding.
> The use case I started with -- showing people how to fix mojibake using
> Python -- would *only* use these codecs in the encoding direction. To fix
> the most common case of mojibake, you encode it as web-1252 and decode it as
> UTF-8 (because you got the data from someone who did the opposite).
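A minimal sketch of that repair follows. It assumes a hypothetical `web-1252` codec that behaves like Python's built-in `cp1252`, except that the five bytes `cp1252` leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) map to the C1 control characters with the same code points, as the WHATWG Encoding Standard's windows-1252 index does. The helper names `web1252_encode` and `fix_mojibake` are illustrative only. Nothing by those names exists in the standard library.

```python
# Bytes that Python's cp1252 codec leaves undefined. Assumption: like the
# WHATWG windows-1252 index, we map each one to the C1 control character
# with the same code point (e.g. byte 0x81 <-> U+0081).
_UNDEFINED_IN_CP1252 = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def web1252_encode(text: str) -> bytes:
    """Hypothetical WHATWG-style windows-1252 encoder (sketch only)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp in _UNDEFINED_IN_CP1252:
            # U+0081 etc. encode back to the single byte they came from.
            out.append(cp)
        else:
            # Everything else follows Python's cp1252 table; characters
            # that are truly unmappable still raise UnicodeEncodeError.
            out += ch.encode("cp1252")
    return bytes(out)


def fix_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were wrongly decoded as windows-1252."""
    return web1252_encode(text).decode("utf-8")


print(fix_mojibake("cafÃ©"))  # café
print(fix_mojibake("Ã\x81"))  # Á
```

The second case is the one that motivates the proposal: "Á" is `b"\xc3\x81"` in UTF-8, a WHATWG decoder turns that into `"Ã\x81"`, and Python's own `"Ã\x81".encode("cp1252")` raises `UnicodeEncodeError` on the `"\x81"`.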
It's also nice to be able to parse some HTML data, make a few changes
in memory, and then serialize it back to HTML. Having this crash on
random documents is rather irritating, especially when those documents
are standards-compliant HTML, as in this case.
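The parse, modify, and serialize round trip can be sketched as follows, again assuming a hypothetical WHATWG-style `web-1252` codec. `web1252_decode`, `web1252_encode` and `edit_document` are illustrative helpers, not an existing API, and a plain string replacement stands in for a real HTML parser and serializer. With Python's built-in `cp1252`, decoding the 0x8D byte below raises `UnicodeDecodeError`, and encoding a U+008D produced by a WHATWG-compliant parser raises `UnicodeEncodeError`. Under this sketch every byte value survives the trip unchanged.

```python
# Assumption: WHATWG windows-1252 == Python cp1252 plus these five bytes
# mapped to the C1 control characters with the same code points.
_UNDEFINED_IN_CP1252 = frozenset({0x81, 0x8D, 0x8F, 0x90, 0x9D})


def web1252_decode(data: bytes) -> str:
    """Hypothetical WHATWG-style windows-1252 decoder (sketch only)."""
    return "".join(
        chr(b) if b in _UNDEFINED_IN_CP1252 else bytes([b]).decode("cp1252")
        for b in data
    )


def web1252_encode(text: str) -> bytes:
    """Inverse of web1252_decode for every character it can produce."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp in _UNDEFINED_IN_CP1252:
            out.append(cp)
        else:
            out += ch.encode("cp1252")
    return bytes(out)


def edit_document(raw: bytes) -> bytes:
    """Decode, make a small in-memory change, and serialize back."""
    text = web1252_decode(raw)
    # Stand-in for real parse/modify/serialize work on the document.
    text = text.replace("<b>", "<strong>").replace("</b>", "</strong>")
    return web1252_encode(text)


doc = b"<p><b>Caf\xe9</b> \x8d</p>"
print(edit_document(doc))  # b'<p><strong>Caf\xe9</strong> \x8d</p>'
```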
Nathaniel J. Smith -- https://vorpus.org