[issue5902] Stricter codec names

Marc-Andre Lemburg report at bugs.python.org
Thu Feb 24 10:20:45 CET 2011


Marc-Andre Lemburg <mal at egenix.com> added the comment:

Alexander Belopolsky wrote:
> 
> Alexander Belopolsky <belopolsky at users.sourceforge.net> added the comment:
> 
>> Accepting all common forms for
>> encoding names means that you can usually give Python an encoding name
>> from, e.g., an HTML page, or any other file or system that specifies an
>> encoding.
> 
> I don't buy this argument.  Running the attached script on http://www.iana.org/assignments/character-sets shows that there are hundreds of registered charsets that are not accepted by python:
> 
> $ ./python.exe iana.py| wc -l
>      413
> 
> Any serious HTML or XML processing software should be based on the IANA character-sets file rather than on the ad-hoc list of aliases that made it into encodings/aliases.py.
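The attached iana.py is not reproduced in this message. A minimal sketch of the same kind of check might look like the following. It assumes a plain list of charset names rather than the parsed IANA file, and the helper `unsupported_charsets` and the sample names are hypothetical:

```python
import codecs


def unsupported_charsets(names):
    """Return the charset names that codecs.lookup() cannot resolve."""
    missing = []
    for name in names:
        try:
            codecs.lookup(name)
        except LookupError:
            missing.append(name)
    return missing


# Hypothetical sample; a real check would parse the IANA registry file.
sample = ["UTF-8", "ISO-8859-1", "US-ASCII", "Shift_JIS", "x-made-up-charset"]
for name in unsupported_charsets(sample):
    print(name)
```

Note that codecs.lookup() already lowercases and normalizes punctuation, so variant spellings such as "utf8" or "US-ASCII" resolve without needing separate entries.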

Let's do a reality check:

How often do you see requests for additions to the aliases we
have in Python? Perhaps one a year, if at all.

We take great care not to add aliases that are not in common
use or that do not have a proven track record of really being
compatible with the codec in question.

If you think we are missing some aliases, please open tickets
for them, indicating why these should be added.

If you really want complete IANA coverage, I suggest you create
a normalization module that maps the IANA names to our names
and upload it to PyPI.
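A minimal sketch of what such a normalization module could look like. The names `IANA_TO_PYTHON`, `_normalize` and `to_python_codec` are hypothetical. The two table entries are illustrative placeholders only, and Python may already accept these names directly. A real module would generate the table from the IANA file and check each mapping for compatibility before including it:

```python
import codecs
import re

# Hypothetical mapping from normalized IANA names to Python codec names.
# Entries are illustrative; each real entry should be verified to be
# compatible with the target codec before it is added.
IANA_TO_PYTHON = {
    "iso_8859_1_1987": "latin_1",
    "ansi_x3_4_1968": "ascii",
}


def _normalize(name):
    """Lowercase and collapse runs of non-alphanumerics into '_'."""
    return re.sub(r"[^0-9a-z]+", "_", name.strip().lower()).strip("_")


def to_python_codec(name):
    """Return Python's canonical codec name for an IANA charset name.

    The extra table is consulted first; otherwise the name is passed
    through unchanged. Raises LookupError if neither resolves.
    """
    target = IANA_TO_PYTHON.get(_normalize(name), name)
    return codecs.lookup(target).name


print(to_python_codec("ISO_8859-1:1987"))
print(to_python_codec("UTF-8"))
```

Falling back to codecs.lookup() for anything not in the table means the module only has to carry the IANA names that Python's own aliases do not already cover.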

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5902>
_______________________________________

