[IronPython] IronPython codec names not compatible with CPython

John Machin sjmachin at lexicon.net
Sun Oct 8 04:54:00 CEST 2006


CPython recognises both 'gbk' and 'cp936', i.e. unicode('some string', 
'gbk') and unicode('some string', 'cp936') both do what you'd expect.
IronPython 1.0.1 recognises only 'cp936'.

CPython recognises 'mac_roman', 'mac_greek', etc.
IronPython doesn't.

After a [rare] flash of inspiration, I tried 'cp10000', 'cp10006', etc., 
and IronPython recognises these, whereas CPython doesn't.
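
To make the comparison repeatable, here is a minimal probe that could be run 
under both interpreters. It only uses codecs.lookup(), which raises 
LookupError for an unknown name. The helper name codec_known and the list of 
names are mine, and I have not verified that this runs unchanged on 
IronPython 1.0.1, so treat it as a sketch:

```python
import codecs

def codec_known(name):
    """Return True if this interpreter can find a codec called `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True

# Names discussed in this post; extend the list as needed.
names = ['gbk', 'cp936', 'mac_roman', 'mac_greek', 'cp10000', 'cp10006']

# Build the report once so it can be printed or diffed between interpreters.
report = dict((name, codec_known(name)) for name in names)

for name in names:
    print('%-10s %s' % (name, report[name]))
```

Diffing the printed output from the two interpreters should reproduce the 
discrepancies listed above.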

The "differences" document says: """
IronPython's _codecs module implementation is incomplete.  There are 
several replace_error/lookup_error handlers that IronPython does not 
implement.
"""
It is not apparent whether this is intended to mean that the missing error 
handlers are the *only* known deficiency.

IronPython Bug #3214 mentions "import encodings" as fixing a 
LookupError. Well, you learn something new every day:
1. CPython permits one to import encodings, but it's not documented 
AFAICT, and it's *not* necessary in order to use 'gbk', 'mac_roman', etc.
2. After import encodings, IronPython recognises 'mac_roman' and 
'mac_greek', but still not 'gbk'.
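
As far as I can tell, in CPython the encodings package is where the codec 
modules and the alias table live, and encodings.aliases.aliases is a plain 
dict mapping alias names to codec module names. The sketch below peeks at 
that table. The helper canonical_name is my own, its key normalisation is a 
simplification of what CPython actually does, and IronPython's table may 
well differ:

```python
import encodings.aliases

def canonical_name(alias):
    """Return the codec module name that `alias` maps to, or None.

    None only means the name is not in the alias table; it may still be
    a codec module name in its own right (as 'gbk' is in CPython).
    """
    # Simplified normalisation: lower-case, hyphens to underscores.
    key = alias.lower().replace('-', '_')
    return encodings.aliases.aliases.get(key)

for alias in ['cp936', 'macroman', 'gbk', 'cp10000']:
    print('%-8s -> %s' % (alias, canonical_name(alias)))
```

Running the same loop under IronPython after "import encodings" would show 
whether its alias table is the CPython one or something else.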

How much of the above is bug and how much is feature? What is this 
mysterious encodings module anyway? Does this mean the CPython test 
suite doesn't cover the above cases? Are the equivalences ('mac_roman' = 
'cp10000', etc.) correct and official? Should I just dump all of the above 
into the IronPython Issue Tracker?
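
One way to test a claimed equivalence empirically is to decode every 
single-byte value with both codec names and compare the results. The sketch 
below does that. The function names decode_table and same_mapping are 
hypothetical, it assumes an interpreter that has bytearray (so a reasonably 
current CPython, not IronPython 1.0.1), and it only compares single-byte 
behaviour, so for multi-byte codecs such as 'gbk' it is a sanity check 
rather than proof:

```python
import codecs

def decode_table(name):
    """Decode each byte value 0..255 on its own with codec `name`.

    Bytes the codec cannot decode in isolation are recorded as None.
    """
    decoder = codecs.getdecoder(name)  # raises LookupError if unknown
    table = []
    for value in range(256):
        raw = bytes(bytearray([value]))
        try:
            table.append(decoder(raw)[0])
        except UnicodeError:
            table.append(None)
    return table

def same_mapping(name_a, name_b):
    """Return True if both codecs decode all 256 single bytes identically."""
    return decode_table(name_a) == decode_table(name_b)

print(same_mapping('gbk', 'cp936'))
print(same_mapping('mac_roman', 'mac_greek'))
```

On an interpreter that knows both 'mac_roman' and 'cp10000', calling 
same_mapping('mac_roman', 'cp10000') would show whether the two agree 
byte for byte.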

Cheers,
John


