Re: [Python-ideas] Support WHATWG versions of legacy encodings

Feb. 6, 2018

By now, it sounds right to me that I should implement these codecs in a
package. I accept that I've established the use case, but not sufficiently
established why it belongs in Python.

The package can easily be ftfy -- although I should point out that what's
in ftfy at the moment isn't quite right! "ftfy.bad_codecs" implements the
"fall back on Latin-1" idea that many people here have intuitively
suggested, because I implemented it based only on the evidence of the text
I had seen; at the time, I didn't know there was an actual standard
involved. The result differs subtly from what Web browsers do in cases
outside the C1 range. But of course I can work on re-implementing the
encodings correctly based on what I've learned.
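For Windows-1252 alone, the "fall back on Latin-1" idea can be sketched with
the standard codecs.register_error hook. This is a minimal sketch, not ftfy's
actual implementation, and the handler name "hypothetical-latin1-fallback" is
made up for illustration. For this one encoding the result matches the WHATWG
index, which maps the five bytes that Python's cp1252 leaves undefined (0x81,
0x8D, 0x8F, 0x90, 0x9D) to the C1 control characters with the same numbers.

```python
import codecs

# Hypothetical error-handler name, chosen for this example only.
HANDLER_NAME = "hypothetical-latin1-fallback"


def latin1_fallback(exc):
    """Decode the bytes the codec rejected as Latin-1 (byte N -> U+00NN)."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad_bytes = exc.object[exc.start:exc.end]
    # Latin-1 can decode every byte, so this step never fails.
    return bad_bytes.decode("latin-1"), exc.end


codecs.register_error(HANDLER_NAME, latin1_fallback)


def decode_windows_1252(data: bytes) -> str:
    """Decode with Python's cp1252, using Latin-1 for its undefined bytes."""
    return data.decode("cp1252", errors=HANDLER_NAME)


print(ascii(decode_windows_1252(b"\x80\x81\x93hi\x94")))
```

Registering a named error handler keeps the stock cp1252 table untouched; only
the bytes that strict decoding would reject take the fallback path.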

I think it would be best if these encodings were actually implemented in
the "webencodings" package, or in a package that both ftfy and webencodings
could use. I have certainly encountered cases in web scraping where,
because webencodings doesn't use the same Windows-1252 as the actual web
does, I have had to decode the text even more incorrectly using Latin-1 and
_then_ run it through ftfy -- in effect, adding a layer of mojibake so I
can fix two layers of mojibake. That's kind of absurd and it's why I
thought this belonged in Python itself. But I'll talk to the webencodings
author instead.
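A minimal illustration of that workaround, assuming a page that is really
UTF-8 but is labelled Windows-1252 (the example string is made up). Python's
strict cp1252 codec rejects byte 0x81, so the text gets decoded as Latin-1
instead, and one layer of mojibake is then undone by re-encoding. The
hand-written undo_latin1_mojibake function is a hypothetical stand-in for
ftfy, not what ftfy does internally.

```python
# A page that is really UTF-8 but is labelled as Windows-1252.
original = "Á la carte"
raw = original.encode("utf-8")  # "Á" becomes the bytes C3 81

# Python's strict cp1252 codec has no mapping for byte 0x81, so it fails.
try:
    raw.decode("cp1252")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Latin-1 maps every byte to U+0000..U+00FF, so it always "succeeds",
# producing mojibake: "Ã" followed by the C1 control character U+0081.
mojibake = raw.decode("latin-1")


def undo_latin1_mojibake(text: str) -> str:
    """Undo one layer of UTF-8-read-as-Latin-1 mojibake (hypothetical helper)."""
    return text.encode("latin-1").decode("utf-8")


fixed = undo_latin1_mojibake(mojibake)
print(strict_ok, fixed)
```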

On Tue, 6 Feb 2018 at 05:12 Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:

> ...
> Nick Coghlan writes:
> ...
> > Personally, I think a See Also note pointing to ftfy in the "codecs"
> > module documentation would be quite a reasonable outcome of the thread.
>
> Yes please.  The more I hear about purported use cases (with the
> exception of Nathaniel's "don't crash when I manipulate the DOM" case,
> which is best handled by errors='surrogateescape'), the less I see
> anything "standard" about them.
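The errors='surrogateescape' approach Stephen mentions doesn't interpret
undecodable bytes at all. It carries each one through as a lone surrogate
(U+DC80 plus the byte value), so encoding with the same handler restores the
original bytes exactly. A minimal sketch with Python's cp1252 codec; the byte
string is made up for illustration.

```python
raw = b"caf\xe9 \x81 menu"

# 0xE9 decodes to "é"; 0x81 has no cp1252 mapping, so surrogateescape
# turns it into the lone surrogate U+DC81 instead of raising.
text = raw.decode("cp1252", errors="surrogateescape")

# The text can be manipulated without crashing...
edited = text.replace("menu", "MENU")

# ...and encoding with the same error handler puts byte 0x81 back.
restored = edited.encode("cp1252", errors="surrogateescape")
print(restored)
```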

Rob Speer