[I18n-sig] Re: [XML-SIG] Re: [4suite] Output encodings again

uche.ogbuji@fourthought.com uche.ogbuji@fourthought.com
Tue, 28 Nov 2000 22:33:05 -0700

MAL and MvL, earlier:

> > > It's not really all that hard to write codecs for Python 2.0.
> > >
> > > You'll have to do two things:
> > > 1. write the codec by subclassing the base classes in codecs.py
> > > 2. write a search function which returns the needed constructors
> > >    and functions.
> > 
> > So how would I write a codec that converts all characters to Latin-1,
> > and converts those outside Latin-1 to &#xxx; (instead of the
> > replacement character)? I'd need knowledge about which characters are in
> > Latin-1, and I'd need to do conversion on a character-by-character
> > basis, right?
> Right.
> > And I can't possibly use any of the _codecs helper
> > functions?
> You could play some tricks with the character mapping codec
> which is used by all code page codecs.
> You will achieve better performance with a native codec written
> in C though.
> > This is certainly feasible if I want it for a single character set,
> > but not if I want to do it wholesale for the entire set of character
> > sets supported by Python 2.0.
> This is probably not possible since there's no way to have the
> codecs use e.g. a callback function to handle error situations.
> But the situation is not all that bad: most codecs rely on the
> character mapping codec and you could simply implement a new
> version of it which does the XML escaping instead of raising
> errors.
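
For orientation, the two steps MAL lists (subclass the base classes in
codecs.py, then register a search function), combined with his suggestion
of a charmap variant that escapes instead of raising, might be sketched
roughly as below.  This is only an illustration under several assumptions:
it is pure Python rather than C, it is written against current Python 3
(where a search function returns a codecs.CodecInfo; in 2.0 it returned a
plain 4-tuple of encoder, decoder, stream reader, stream writer), it only
makes sense for single-byte code pages, and the "_xmlescape" suffix and
every function name in it are made up for the example.

```python
import codecs

SUFFIX = "_xmlescape"  # hypothetical naming convention, not a real codec name


def build_encoding_map(base):
    """Map Unicode ordinals to byte values for a single-byte encoding."""
    table = {}
    for byte in range(256):
        try:
            char = bytes([byte]).decode(base)
        except UnicodeDecodeError:
            continue  # this byte is unassigned in the code page
        if len(char) == 1:
            table[ord(char)] = byte
    return table


def make_codec(base):
    """Build a Codec whose encoder escapes unmappable characters."""
    table = build_encoding_map(base)

    class XMLEscapeCodec(codecs.Codec):
        def encode(self, input, errors="strict"):
            out = bytearray()
            for char in input:
                byte = table.get(ord(char))
                if byte is None:
                    # Not in the code page: emit a decimal character reference.
                    out += ("&#%d;" % ord(char)).encode("ascii")
                else:
                    out.append(byte)
            return bytes(out), len(input)

        def decode(self, input, errors="strict"):
            # Decoding is delegated to the base codec; references are not
            # turned back into characters here.
            return codecs.decode(input, base, errors), len(input)

    return XMLEscapeCodec()


def search_xmlescape(name):
    """Search function: handles '<base>_xmlescape', ignores everything else."""
    name = name.replace("-", "_")
    if not name.endswith(SUFFIX):
        return None
    base = name[:-len(SUFFIX)]
    try:
        codecs.lookup(base)
    except LookupError:
        return None
    codec = make_codec(base)
    return codecs.CodecInfo(codec.encode, codec.decode, name=name)


codecs.register(search_xmlescape)

print("caf\u00e9 \u20ac".encode("latin1_xmlescape"))  # b'caf\xe9 &#8364;'
```

Because the table is derived from whatever base codec the name points at,
the same search function covers the other code-page codecs too (for
example "cp1252_xmlescape"), which is the "wholesale" case discussed
above.  A C implementation would presumably replace the per-character
loop, not the registration machinery.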

OK.  I began tackling this and gave all the sources a once-over.  I think I 
have a decent idea how to write a codec, but I'm not sure how the character 
map codec fits in.  I've looked at charmap.py, and maybe I'm cross-eyed, but 
inspiration isn't coming to me.

Might I have any pointers?  Any cheat-sheets?  I'll probably be implementing 
in C.

Uche Ogbuji                               Principal Consultant
uche.ogbuji@fourthought.com               +1 303 583 9900 x 101
Fourthought, Inc.                         http://Fourthought.com 
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python