How to add an encoding alias?

In the spambayes project we encountered some mail samples that use an encoding name ('ansi-x3-4-1968') that's not in encodings/aliases.py. (At least not until I added it to CVS yesterday.) I'd like the spambayes code base to be compatible with Python 2.2.1, so I'd like to add this one to the list of aliases. Is there an official API to add an alias, or do I just have to write

    import encodings.aliases
    encodings.aliases.aliases['ansi-x3-4-1968'] = 'ascii'

???

(BTW, there's an alias 'ansi_x3.4_1986' for ASCII. Was the ASCII standard renewed in 1986, or is that simply because there are encoding designators out there in real life that contain a typo?)

--Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
There's no other API to do this and since new features are not allowed in 2.2.x that's the only way to go unless you register your own lookup function which knows about the extra alias.
That was one of the official names for ASCII:

    http://www.archivists.org/catalog/stds99/chapter7.html#x3_4

More details on the history of ASCII can be found at the top of that page. The original version X3.4 was approved in 1968, so it's not a typo.

--
Marc-Andre Lemburg
CEO eGenix.com Software GmbH
_______________________________________________________________________
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/
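For readers who want to see what "register your own lookup function" might look like, here is a minimal sketch. It is written against a current Python rather than 2.2.x, and the names make_alias_search and x_example_ascii are hypothetical, introduced only for illustration. The only library calls it relies on are codecs.register() and codecs.lookup().

```python
import codecs


def make_alias_search(extra_aliases):
    """Build a codec search function that only knows about extra aliases.

    extra_aliases maps a normalized alias (lower case, underscores)
    to the name of an encoding Python already knows.
    """
    def search(name):
        # Depending on the Python version, the name may arrive with
        # hyphens or underscores, so normalize it ourselves.
        key = name.lower().replace("-", "_").replace(" ", "_")
        target = extra_aliases.get(key)
        if target is None:
            return None  # let other search functions try
        return codecs.lookup(target)
    return search


# 'x_example_ascii' is a made-up alias used only to exercise the sketch.
codecs.register(make_alias_search({
    "ansi_x3_4_1968": "ascii",
    "x_example_ascii": "ascii",
}))

info = codecs.lookup("X-Example-ASCII")
```

The search function returns None for names it does not recognize, so the standard lookup behaviour for all other encodings stays as it was.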

Thanks, I'll do that.
Wow. Cute. --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
    import encodings.aliases
    encodings.aliases.aliases['ansi-x3-4-1968'] = 'ascii'

In order for the lookup to work, you have to replace hyphens with underscores; see the top of aliases.py.

--
Marc-Andre Lemburg
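With that correction applied, the workaround would look roughly like the sketch below. It assumes a current Python, where the standard alias table may already contain this name, so the assignment is then harmless. It is probably safest to add the alias before the name is first looked up, since the encodings package may cache a failed lookup.

```python
import codecs
import encodings.aliases

# The key uses underscores, because the search function normalizes
# the requested name before consulting the alias dictionary.
encodings.aliases.aliases["ansi_x3_4_1968"] = "ascii"

# The hyphenated spelling found in the mail samples now resolves.
info = codecs.lookup("ansi-x3-4-1968")
```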

Good catch! Then my "fix" to aliases.py was also wrong. Would it make sense to change the lookup function to convert *all* punctuation to underscores before doing the lookup? (Then this one would actually have worked...) --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
Codecs must currently use names as defined by the search function in the encodings package:

    Codec modules must have names corresponding to standard lower-case encoding names with hyphens mapped to underscores, e.g. 'utf-8' is implemented by the module 'utf_8.py'.

We could extend this to:

    Codec modules must have names corresponding to standard lower-case encoding names with all non-alphanumeric characters mapped to underscores, e.g. 'utf-8' is implemented by the module 'utf_8.py' and 'ISO 639:1988' would be implemented as module 'iso_639_1988'.

Note that the aliasing dictionary is consulted *after* having applied this mapping.

--
Marc-Andre Lemburg
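The proposed rule can be sketched in a few lines. This is a literal reading of the wording above (lower-case the name, then map every non-alphanumeric character to an underscore), not the code that was actually checked in; normalize_name is a hypothetical helper.

```python
import re


def normalize_name(name):
    """Lower-case the name and map each non-alphanumeric character to '_'."""
    return re.sub(r"[^a-z0-9]", "_", name.lower())


examples = ["utf-8", "ISO 639:1988", "ansi-x3-4-1968"]
normalized = [normalize_name(n) for n in examples]
```

Under this rule the alias dictionary only needs the underscore spelling of each name, since it is consulted after the mapping has been applied.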

+1; +1 on backport to 2.2.2 also. Note that this requires some changes to the dict in aliases.py. --Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
Done. Not backported to 2.2.2, though, since this is a new feature.

--
Marc-Andre Lemburg
