[Python-ideas] Support WHATWG versions of legacy encodings
random832 at fastmail.com
Thu Jan 18 12:32:42 EST 2018
On Thu, Jan 18, 2018, at 11:04, Stephen J. Turnbull wrote:
> Nathaniel Smith writes:
> > It's also nice to be able to parse some HTML data, make a few changes
> > in memory, and then serialize it back to HTML. Having this crash on
> > random documents is rather irritating, esp. if these documents are
> > standards-compliant HTML as in this case.
> This example doesn't make sense to me. Why would *conformant* HTML
> crash the codec? Unless you're saying the source is non-conformant
> and *lied* about the encoding?
I think his point is that the WHATWG standard is the one that governs HTML, so HTML that uses these encodings (including the C1 characters) is conformant to *that* standard, regardless of its status with regard to anything published by Unicode. The new encodings (whatever they are called), including the round trip of b'\x81' as \u0081, are the ones an HTML document identifies when it declares that it uses windows-1252, and therefore such a declaration is not a lie.
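To make the difference concrete, here is a rough sketch of the behaviour in question. The function names whatwg_1252_decode and whatwg_1252_encode are made up for illustration; they are not stdlib names and not the names proposed in this thread. The sketch assumes the only difference that matters here is the five bytes Python's cp1252 codec leaves undefined, which WHATWG's windows-1252 maps to the C1 control characters with the same code points.

```python
# Bytes that Python's stock cp1252 codec treats as undefined. WHATWG's
# windows-1252 maps each one to the C1 control with the same code point.
UNDEFINED_IN_CP1252 = (0x81, 0x8D, 0x8F, 0x90, 0x9D)


def whatwg_1252_decode(data: bytes) -> str:
    """Hypothetical helper: decode like cp1252, but pass the five
    undefined bytes through as U+0081, U+008D, U+008F, U+0090, U+009D."""
    out = []
    for b in data:
        if b in UNDEFINED_IN_CP1252:
            out.append(chr(b))
        else:
            out.append(bytes([b]).decode("cp1252"))
    return "".join(out)


def whatwg_1252_encode(text: str) -> bytes:
    """Hypothetical helper: inverse of whatwg_1252_decode."""
    out = bytearray()
    for ch in text:
        if ord(ch) in UNDEFINED_IN_CP1252:
            out.append(ord(ch))
        else:
            # Anything else still goes through the stock codec, so
            # characters outside windows-1252 raise UnicodeEncodeError.
            out += ch.encode("cp1252")
    return bytes(out)


# The stock codec rejects the byte outright.
try:
    b"\x81".decode("cp1252")
    stock_codec_failed = False
except UnicodeDecodeError:
    stock_codec_failed = True

# The sketch round-trips it, which is what "parse, edit, serialize" needs.
original = b"caf\xe9 \x80 \x81"
decoded = whatwg_1252_decode(original)
reencoded = whatwg_1252_encode(decoded)
```

With these helpers, a document declared as windows-1252 that contains 0x81 survives a decode/encode cycle unchanged, whereas the stock cp1252 codec fails on the decode step.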