
On 2 February 2018 at 16:52, Steven D'Aprano <steve@pearwood.info> wrote:
> If it were my decision, I'd have these codecs raise a warning (not an error) when used for encoding. But I guess some people will consider that either going too far or not far enough :-)

Rob pointed out that one of the main use cases for these codecs is when going "Oh, this was decoded with a WHATWG encoding, which isn't right, so I need to re-encode it with that encoding, and then decode it with the right encoding". So encoding is very much part of the usage model: it's needed when you've received the data over a Unicode-based interface rather than a binary one.

So I think the *use case* for the WHATWG encodings has been pretty well established. What hasn't been established is whether our answer to "How do I handle the WHATWG encodings?" is going to be:

* "Here they are in the standard library (for 3.8+)!"; or
* "These are available as part of the 'ftfy' library on PyPI, which also helps fix various other problems in decoded text"

Personally, I think a See Also note pointing to ftfy in the "codecs" module documentation would be quite a reasonable outcome of the thread - when it comes to consuming arbitrary data from the internet and cleaning up decoding issues, ftfy's data introspection based approach is likely to be far easier to start with than characterising the common errors for specific data sources and applying them individually, and if you're already using ftfy to figure out which fixes are needed, then it shouldn't be a big deal to keep it around for the more relaxed codecs that it provides.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
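
For illustration, the "re-encode it with that encoding, and then decode it with the right encoding" round trip described in the message might look roughly like the sketch below. It uses only the standard library and assumes the text was really UTF-8 that got decoded as windows-1252. The helper names (encode_whatwg_1252, fix_mojibake) are hypothetical and not part of any library; the fallback for the five bytes that Python's strict cp1252 codec leaves undefined only approximates the WHATWG mapping.

```python
# Bytes that Python's strict cp1252 codec leaves undefined, but which the
# WHATWG windows-1252 table maps to the C1 control characters with the
# same code point.
WHATWG_EXTRA = {0x81, 0x8D, 0x8F, 0x90, 0x9D}


def encode_whatwg_1252(text):
    """Hypothetical helper: cp1252 encoding plus the five WHATWG extras."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            if ord(ch) in WHATWG_EXTRA:
                out.append(ord(ch))  # pass the C1 control through as a byte
            else:
                raise
    return bytes(out)


def fix_mojibake(text, right_encoding="utf-8"):
    """Undo a wrong windows-1252 decode: re-encode, then decode correctly."""
    return encode_whatwg_1252(text).decode(right_encoding)


# UTF-8 bytes for 'café' that were wrongly decoded as windows-1252.
print(fix_mojibake("caf\u00c3\u00a9"))  # café

# 'Á' is b'\xc3\x81' in UTF-8; a WHATWG-style decoder yields 'Ã' + U+0081,
# which the strict stdlib cp1252 codec cannot encode again.
print(fix_mojibake("\u00c3\u0081"))  # Á
```

The second call shows why the encoding direction matters: with the strict "cp1252" codec alone, the U+0081 character raises UnicodeEncodeError and the original bytes cannot be recovered.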