[I18n-sig] Re: [XML-SIG] Re: [4suite] Output encodings again
Tue, 28 Nov 2000 22:33:05 -0700
Earlier, MAL and MvL wrote:
> > > It's not really all that hard to write codecs for Python 2.0.
> > >
> > > You'll have to do two things:
> > > 1. write the codec by subclassing the base classes in codecs.py
> > > 2. write a search function which returns the needed constructors
> > > and functions.
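Here is a minimal sketch of those two steps, written for current Python 3 rather than 2.0. In 2.0 the search function returned a 4-tuple (encoder, decoder, stream_reader, stream_writer), while current documentation asks for a codecs.CodecInfo object. The codec name demo_latin1 and the Demo* classes are hypothetical. The codec simply delegates to the built-in Latin-1 functions so that the plumbing is easy to see.

```python
import codecs

# Step 1: the codec itself, subclassing the base classes from codecs.py.
class DemoCodec(codecs.Codec):
    # Stateless encode/decode; both return (output, length_consumed).
    def encode(self, input, errors="strict"):
        return codecs.latin_1_encode(input, errors)

    def decode(self, input, errors="strict"):
        return codecs.latin_1_decode(input, errors)

class DemoStreamWriter(DemoCodec, codecs.StreamWriter):
    pass

class DemoStreamReader(DemoCodec, codecs.StreamReader):
    pass

# Step 2: the search function. It receives the normalized (lower-cased)
# encoding name and returns None for names it does not handle, so that
# other registered search functions get a chance.
def search(name):
    if name != "demo_latin1":
        return None
    return codecs.CodecInfo(
        name="demo_latin1",
        encode=DemoCodec().encode,
        decode=DemoCodec().decode,
        streamwriter=DemoStreamWriter,
        streamreader=DemoStreamReader,
    )

codecs.register(search)
```

After registration, "café".encode("demo_latin1") and codecs.lookup("demo_latin1") resolve through search().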
> > So how would I write a codec that encodes all characters as Latin-1,
> > and converts those outside Latin-1 to &#xxx; (instead of the
> > replacement character)? I'd need to know which characters are in
> > Latin-1, and I'd need to do the conversion on a character-by-character
> > basis, right?
> > And I can't possibly use any of the _codecs helper
> > functions?
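For Latin-1 specifically, the membership test is cheap. ISO 8859-1 occupies exactly the code points U+0000 to U+00FF, so a character is in Latin-1 exactly when its code point is below 256, and that code point is also its byte value. Below is a character-by-character sketch that emits decimal references of the form &#NNN;. The function name encode_latin1_xmlref is hypothetical, and the code targets Python 3.

```python
def encode_latin1_xmlref(text):
    """Hypothetical helper: encode text as Latin-1, replacing every
    character outside Latin-1 with an XML numeric character reference."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp < 256:
            # Latin-1 maps U+0000..U+00FF one-to-one onto bytes 0..255.
            out.append(cp)
        else:
            # Anything else becomes a decimal reference such as b"&#8364;".
            out += b"&#%d;" % cp
    return bytes(out)
```

For example, encode_latin1_xmlref("café €") gives b"caf\xe9 &#8364;".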
> You could play some tricks with the character mapping codec
> which is used by all code page codecs.
> You will achieve better performance with a native codec written
> in C though.
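One such trick, sketched for Python 3, relies on how codecs.charmap_encode treats its mapping. It looks each code point up with mapping[codepoint] and accepts three kinds of result:

- an integer, written as a single byte
- a bytes object, copied as-is and possibly several bytes long
- None, meaning the character is undefined

A mapping object that returns the byte for Latin-1 code points and an XML character reference for everything else therefore turns the charmap codec into an escaping encoder. The names Latin1XMLRefMap and charmap_xmlref_encode are hypothetical. Each character costs a Python-level __getitem__ call, so this will not match a native C codec for speed.

```python
import codecs

class Latin1XMLRefMap:
    """Hypothetical encoding map for codecs.charmap_encode."""
    def __getitem__(self, codepoint):
        if codepoint < 256:
            return codepoint            # same single byte as Latin-1
        return b"&#%d;" % codepoint    # multi-byte XML character reference

def charmap_xmlref_encode(text, errors="strict"):
    # Returns (bytes, characters_consumed), like any other codec encoder.
    return codecs.charmap_encode(text, errors, Latin1XMLRefMap())
```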
> > This is certainly feasible if I want it for a single character set,
> > but not if I want to do it wholesale for every character set
> > supported by Python 2.0.
> This is probably not possible since there's no way to have the
> codecs use e.g. a callback function to handle error situations.
> But the situation is not all that bad: most codecs rely on the
> character mapping codec and you could simply implement a new
> version of it which does the XML escaping instead of raising
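For the wholesale case, one callback-free workaround is to encode each character on its own with the target codec and fall back to a character reference when that fails. The helper name encode_with_xmlrefs is hypothetical. It assumes a stateless codec: encodings that emit a BOM or shift sequences on every call (UTF-16, the ISO-2022 family) would be corrupted by per-character encoding, and the approach is slow. Later Python releases (2.3 onward, via PEP 293) added codec error-handler callbacks, including a built-in "xmlcharrefreplace" handler that does this directly.

```python
def encode_with_xmlrefs(text, encoding):
    """Hypothetical helper: encode text with any *stateless* codec,
    escaping unencodable characters as XML numeric character references."""
    out = []
    for ch in text:
        try:
            out.append(ch.encode(encoding))
        except UnicodeEncodeError:
            # Encode the reference with the target codec too, so the
            # ASCII characters "&#...;" come out in that codec's bytes.
            out.append(("&#%d;" % ord(ch)).encode(encoding))
    return b"".join(out)
```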
OK. I began tackling this and gave all the sources a once-over. I think I
have a decent idea how to write a codec, but I'm not sure how the character
map codec fits in. I've looked at charmap.py, and maybe I'm cross-eyed, but
inspiration isn't coming to me.
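The code page codecs in the encodings package are typically thin wrappers around the charmap functions, following three steps:

1. define a decoding map from byte values to code points
2. derive the inverse encoding map
3. pass both maps to codecs.charmap_decode and codecs.charmap_encode

Here is a minimal sketch in that style, runnable on Python 3. The single remapped slot (0xA4 to the euro sign, as in ISO 8859-15) is only an illustration. Newer modules use a string decoding table plus codecs.charmap_build instead of dicts, but the idea is the same.

```python
import codecs

# Decoding map: byte value -> Unicode code point. Start from the identity
# (Latin-1) and override one slot, as ISO 8859-15 does for the euro sign.
decoding_map = codecs.make_identity_dict(range(256))
decoding_map.update({0xA4: 0x20AC})

# Encoding map: the inverse, Unicode code point -> byte value.
# U+00A4 no longer appears, so encoding it is an error under "strict".
encoding_map = codecs.make_encoding_map(decoding_map)

class Codec(codecs.Codec):
    def encode(self, input, errors="strict"):
        return codecs.charmap_encode(input, errors, encoding_map)

    def decode(self, input, errors="strict"):
        return codecs.charmap_decode(input, errors, decoding_map)
```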
Might I have any pointers? Any cheat-sheets? I'll probably be implementing
Uche Ogbuji Principal Consultant
email@example.com +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python