
On 01.02.2018 00:40, Chris Angelico wrote:
> On Thu, Feb 1, 2018 at 10:15 AM, Chris Barker <chris.barker@noaa.gov> wrote:
>> I still have no idea why there is such resistance to this -- yes, it's a fairly small benefit over a package on PyPI, but there is also virtually no downside.
>
> I don't understand it either. Aside from maybe bikeshedding the *name* of the encoding, this seems like a pretty straightforward addition.
I guess many of you are not aware of how we have treated such encoding additions over the past decade and a half. In general, we have only added new encodings when one that a lot of people were actively using was missing. We asked for official documentation defining the mappings, references showing actual usage, and IANA (or similar) standard names to use for the encoding itself and its aliases. In recent years, we have had very few such requests, mainly because the set we have in Python is already fairly complete.

Now the OP is proposing to add a whole set of encodings which differ only slightly from our existing ones. Their backing is that they are used and defined by WHATWG, a consortium of browser vendors interested in showing web pages to users in a consistent way. WHATWG decided to simply override the standard encoding names with new mappings under its own control. Again, their motivation is clear: browsers receive documents whose content doesn't always match the standard mapping for the advertised encoding, so they have to make some choices on how to display those documents. The easiest way for them is to define all the special cases in a new mapping for each standard encoding name.

This is all fine, but it's also a very limited use case: displaying web pages in a browser. It's certainly needed for applications implementing browser interfaces, and probably also for ones doing web scraping, but otherwise the need should rarely arise. The workarounds WHATWG uses are also not necessarily what actual users would want. Such workarounds are always trade-offs and they can change over time, which WHATWG addresses by making the encodings "living standards". They are one solution, but not a one-size-fits-all way of dealing with broken data.

We also have the naming issue, since WHATWG chose to use the same names as the standard mappings.
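To make "differ only slightly" concrete with one well-known case: Python's cp1252 codec (also reachable under the name "windows-1252") leaves five bytes undefined (0x81, 0x8D, 0x8F, 0x90 and 0x9D) and fails on them, whereas the WHATWG windows-1252 index maps each of them to the C1 control character with the same value. The following is a minimal sketch, assuming the goal is only to reproduce that one difference, of how it can be bridged with Python's existing codec error handler mechanism instead of a new codec. The handler name "whatwg-c1" and the function `c1_fallback` are hypothetical, invented for this illustration.

```python
import codecs

# Bytes that Python's cp1252 codec leaves undefined. The WHATWG
# windows-1252 index maps each of them to the C1 control character
# with the same numeric value (e.g. 0x81 -> U+0081).
_CP1252_UNDEFINED = frozenset(b"\x81\x8d\x8f\x90\x9d")


def c1_fallback(exc):
    """Hypothetical decode error handler: map undefined cp1252 bytes
    to C1 controls; any other failure is re-raised unchanged."""
    if not isinstance(exc, UnicodeDecodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    if not all(b in _CP1252_UNDEFINED for b in bad):
        raise exc
    # Latin-1 decodes every byte to the code point with the same value,
    # which is exactly the C1 mapping wanted here.
    return bad.decode("latin-1"), exc.end


# "whatwg-c1" is a made-up name chosen for this sketch.
codecs.register_error("whatwg-c1", c1_fallback)

# Usage: 0x81 becomes U+0081, while 0x80 still decodes to the euro sign.
text = b"caf\xe9 \x81 \x80".decode("cp1252", "whatwg-c1")
```

The standard cp1252 name and mapping stay untouched, and the browser-style behavior becomes an explicit, opt-in decoding choice. It covers only this one difference, not other WHATWG deviations such as decoding content labeled "iso-8859-1" as windows-1252.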
Anything we'd define would match neither the WHATWG names nor any other standard encoding name, so we'd be creating a new set of encoding names, which is really not what the world needs, WHATWG included. People would start encoding text under these new names, resulting in even more mojibake out there instead of fixing the errors in the data and using Unicode or UTF-8 for interchange.

As I mentioned before, we could make the new mappings decode-only to resolve this concern, but the OP wasn't interested in that approach. As an alternative, we proposed error handlers, which are the normal mechanism for dealing with encoding errors. Again, the OP wasn't interested.

Please also note that once we start adding, say, "whatwg-<original name>" encodings (or rather decodings :-), starting with the simple charmap encodings, someone will eventually request the more complex Asian encodings that WHATWG defines as well. Maintaining these is hard, since they require writing C code, both for performance and to keep the mapping tables small.

I have probably forgotten a few aspects, but the above is how I would summarize the discussion from the perspective of the people who have dealt with such requests in the past. There are quite a few downsides to consider, and since the OP is not interested in a compromise along the lines described above, I don't see a way forward.

-- 
Marc-Andre Lemburg
eGenix.com Professional Python Services directly from the Experts (#1, Feb 01 2018)