[IronPython] IronPython codec names not compatible with CPython

Dino Viehland dinov at exchange.microsoft.com
Mon Oct 9 20:41:39 CEST 2006

The encodings package appears to be a special module that CPython imports on startup.  It appears to be imported even if you start CPython with reading of site.py disabled.  Currently IronPython has no dependencies on the standard CPython library, and we didn't want to add one just for this.  The mac_roman and mac_greek encodings are defined in this module; CPython simply imports it for you, so it just works.
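
A quick way to check the startup import described above is to look in sys.modules before importing anything codec-related.  This is only a sketch; the helper name encodings_preloaded is made up for illustration, and the result depends on the interpreter it is run on.

```python
import sys

def encodings_preloaded():
    """Return True if the 'encodings' package is already in sys.modules."""
    # Nothing in this script imports 'encodings' explicitly, so if it is
    # present the interpreter loaded it on its own during startup.
    return 'encodings' in sys.modules

print(encodings_preloaded())
```

Running the same script with "python -S" checks the case where site.py is not read.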

We probably also need to add some aliases for encodings that we support but that you can't find under their CPython names (e.g. gbk).  We have some hardcoded encodings already, but most names are translated automatically from .NET names to Python names, and that doesn't always produce all the correct mappings.  We also map all code pages to cp#, which explains why we support those names but CPython doesn't.  I don't think we would special-case some code pages to skip this mapping, either.
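
To illustrate the kind of alias table this implies, here is a minimal sketch.  It is not IronPython's actual code: the ALIASES dict and the normalize/resolve helpers are hypothetical, and the name pairs are simply the ones mentioned in this thread.

```python
# Hypothetical alias table: CPython-style name -> code-page-style name.
# The pairs are the ones discussed in this thread, nothing more.
ALIASES = {
    'gbk': 'cp936',
    'mac_roman': 'cp10000',
    'mac_greek': 'cp10006',
}

def normalize(name):
    """Lower-case the name and treat '-' and ' ' like '_'."""
    return name.strip().lower().replace('-', '_').replace(' ', '_')

def resolve(name):
    """Map an alias to its cp# name; unknown names pass through normalized."""
    key = normalize(name)
    return ALIASES.get(key, key)

print(resolve('GBK'))
print(resolve('Mac-Roman'))
```

Names that are already in cp# form fall through unchanged, so both spellings end up at the same code page.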

The CPython test suite actually gives us a little trouble here because it doesn't import encodings.  I believe we have a slightly modified version that does the import and makes sure everything works.

Dumping this into a bug would be great.  I think the main thing to capture here is that we're missing some names and aliases in our standard encodings.

-----Original Message-----
From: users-bounces at lists.ironpython.com [mailto:users-bounces at lists.ironpython.com] On Behalf Of John Machin
Sent: Saturday, October 07, 2006 7:54 PM
To: Discussion of IronPython
Subject: [IronPython] IronPython codec names not compatible with CPython

CPython recognises both 'gbk' and 'cp936', i.e. unicode('some string', 'gbk') does what you'd expect.
IronPython 1.0.1 recognises only 'cp936'.

CPython recognises 'mac_roman', 'mac_greek', etc.
IronPython doesn't.

After a [rare] flash of inspiration, I tried 'cp10000', 'cp10006', etc and IronPython recognises these, which CPython doesn't.
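
A small probe along these lines makes it easy to compare the two implementations name by name.  It is only a sketch: is_known is a made-up helper built on codecs.lookup, and what it prints depends on the interpreter and version it runs on.

```python
import codecs

def is_known(name):
    """Return True if codecs.lookup() can find a codec under this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        # codecs.lookup raises LookupError for unrecognised encoding names
        return False

# The names discussed above; results are not asserted here.
for name in ('gbk', 'cp936', 'mac_roman', 'mac_greek', 'cp10000', 'cp10006'):
    print('%s: %s' % (name, is_known(name)))
```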

The "differences" document says: """
IronPython's _codecs module implementation is incomplete.  There are several replace_error/lookup_error handlers that IronPython does not implement.
"""
It is not apparent whether this is intended to mean that missing error handlers are the *only* known deficiency.

IronPython Bug #3214 mentions "import encodings" as fixing a LookupError. Well, you learn something new every day:
1. CPython permits one to import encodings, but it's not documented AFAICT, and it's *not* necessary in order to use 'gbk', 'mac_roman', etc.
2. After import encodings, IronPython recognises 'mac_roman' and 'mac_greek', but still not 'gbk'.

How much of the above is bug and how much is feature? What is this mysterious encodings module anyway? Does this mean the CPython test suite doesn't cover the above cases? Are the equivalences (mac_roman, cp10000) etc correct and official? Should I just dump all of the above into the IronPython Issue Tracker?

