[Pythonmac-SIG] TECManager

Bob Ippolito bob at redivi.com
Mon Sep 29 17:41:05 EDT 2003


On Friday, Sep 26, 2003, at 07:19 America/New_York, Jack Jansen wrote:

>
> On Friday, September 26, 2003, at 11:42 AM, Bob Ippolito wrote:
>> Currently TECManager only does decoding (not encoding) of 
>> MacSanskrit.  The reason I didn't write the encoding routines is 
>> because the API for it is far more complex and I didn't need the 
>> stuff :)
>
> We'll wait for a native Sanskrit speaker who's also a Python 
> programmer to add those:-)
>
>> The issue with adding it to Python's unicode support is that you 
>> don't get to pass a lot of context when you say 
>> str.decode('macroman'), where you may want to say 
>> TECManager.ConvertToUnicode(str, script=smRoman, language=langDutch, 
>> region=verNetherlands) .. there's also a richer set of unicode 
>> fallbacks than the Python version.  Of course, that said, just 
>> putting it in place of what is already there would be better than 
>> what's there now, but we'll have to make decisions as to what to call 
>> the scripts (do we use 'smRoman' or 'macroman' or both.. rinse and 
>> repeat for the other 36 script codes).
>
> I looked specifically at this at the time I looked at TEC, and I got 
> the impression that there's a mapping between the Apple 
> script/language/region tuple and the unicode name. I haven't tried 
> these, but I would expect them to return something like "MacRoman" or 
> "roman" or so that we could convert to "macroman" or "mac_roman" to 
> register as the codec. As long as we have *any* bidirectional mapping 
> between script/language/region tuples and strings that are acceptable 
> to unicode.encode() we should be fine.

I have hacked encoding and decoding support into TECManager, and it 
now has stateful StreamWriter/StreamReader objects and a stateless 
Codec object.  I still need to figure out how to reasonably hook in 
PEP 293 (Codec Error Handling Callbacks), and then write some good unit 
tests and handlers for obscure scenarios, and we should be good.
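
For anyone who hasn't looked at PEP 293 yet, here is a minimal sketch of 
what an error callback looks like.  It is written against current 
Python 3 and uses the stock "mac_roman" codec as a stand-in for a 
TEC-backed one; the handler name "tec_fallback_sketch" is made up for 
illustration and is not something TECManager provides.

```python
import codecs

def question_mark_fallback(exc):
    """PEP 293 handler: swap each unencodable character for '?'."""
    if isinstance(exc, UnicodeEncodeError):
        # Return the replacement text and the position to resume from.
        return ("?" * (exc.end - exc.start), exc.end)
    raise exc  # this sketch only handles encode errors

# "tec_fallback_sketch" is a hypothetical name chosen for this example.
codecs.register_error("tec_fallback_sketch", question_mark_fallback)

# U+0928 (DEVANAGARI LETTER NA) has no MacRoman equivalent, so the
# stock codec hands it to the callback instead of raising.
encoded = "na\u0928".encode("mac_roman", errors="tec_fallback_sketch")
```

A TEC-backed codec would presumably route its own fallback options 
through the same mechanism, but how those map onto handler names is 
still an open question.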

Right now it's using some magic in order to install the codecs.. 
basically it goes through its ScriptCode enumerator, changes all the 
"smSomeScript" names into "mac_somescript" and injects each one as a 
"built-in" into sys.modules (via the types.ModuleType(name) mechanism 
for creating modules).  Apparently that's how the codec registry finds 
codecs, so it works perfectly.  The one problem is that it creates 
about 30 state objects on import (where the normal mechanisms would 
only create the module when you tried to use it as a codec).  I don't 
see these hurting anything, and the import doesn't take long.
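
One possible way around the up-front cost, sketched below, is a search 
function registered with codecs.register(), which only gets called when 
a codec name is first looked up.  This is not what TECManager does 
today, just an alternative written against current Python 3.  The 
script list, the "tecsketch_" prefix (used so the names don't collide 
with the stock mac_* codecs) and the use of the stock "mac_roman" codec 
as the conversion backend are all assumptions for the sake of a 
runnable example.

```python
import codecs

# Hypothetical subset of the ScriptCode names; the real list would come
# from TECManager's enumerator.
SCRIPT_NAMES = ["smRoman", "smJapanese", "smDevanagari"]

def codec_name(script_name, prefix="tecsketch_"):
    """Turn 'smSomeScript' into '<prefix>somescript'."""
    return prefix + script_name[2:].lower()

CODEC_TO_SCRIPT = {codec_name(s): s for s in SCRIPT_NAMES}
created = []  # records which scripts actually had a codec built

def search(name):
    """Codec search function: builds a codec only when it is requested."""
    script = CODEC_TO_SCRIPT.get(name)
    if script is None:
        return None  # not ours; let other search functions try
    created.append(script)
    # Stand-in backend: a real version would set up TEC converter state
    # for `script` here instead of borrowing the stock MacRoman codec.
    base = codecs.lookup("mac_roman")
    return codecs.CodecInfo(
        name=name,
        encode=base.encode,
        decode=base.decode,
        streamreader=base.streamreader,
        streamwriter=base.streamwriter,
    )

codecs.register(search)

# Nothing has been built yet; the first use triggers the lookup.
roman_bytes = "caf\u00e9".encode("tecsketch_roman")
```

The registry caches successful lookups, so each codec is built at most 
once, and the ones nobody asks for are never built at all.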

-bob



