<div dir="ltr">By now, it sounds right to me that I should implement these codecs in a package. I accept that I've established the use case but haven't sufficiently established why it belongs in Python.<div><br></div><div>The package can easily be ftfy -- although I should point out that what's in ftfy at the moment isn't quite right! "ftfy.bad_codecs" implements the "fall back on Latin-1" idea that many people here have intuitively suggested, because I implemented it based only on the evidence of the text I saw; I didn't know at the time that there was an actual standard involved. The result differs subtly from what Web browsers do in cases outside the C1 range. But of course I can work on re-implementing the encodings correctly based on what I've learned.<div><br></div><div>I think it would be best if these encodings were actually implemented in the "webencodings" package, or in a package that both ftfy and webencodings could use. I have certainly encountered cases in web scraping where, because webencodings doesn't use the same Windows-1252 as the actual web does, I have had to decode the text even more incorrectly using Latin-1 and _then_ run it through ftfy -- in effect, adding a layer of mojibake so I can fix two layers of mojibake. That's kind of absurd, and it's why I thought this belonged in Python itself. But I'll talk to the webencodings author instead.</div></div></div><br><div class="gmail_quote"><div dir="ltr">On Tue, 6 Feb 2018 at 05:12 Stephen J. Turnbull &lt;<a href="mailto:turnbull.stephen.fw@u.tsukuba.ac.jp">turnbull.stephen.fw@u.tsukuba.ac.jp</a>&gt; wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Nick Coghlan writes:<br>
<br>
> Personally, I think a See Also note pointing to ftfy in the "codecs"<br>
> module documentation would be quite a reasonable outcome of the thread<br>
<br>
Yes please. The more I hear about purported use cases (with the<br>
exception of Nathaniel's "don't crash when I manipulate the DOM" case,<br>
which is best handled by errors='surrogateescape'), the less I see<br>
anything "standard" about them.<br>
</blockquote></div>