[I18n-sig] Re: [XML-SIG] Re: [4suite] Output encodings again

M.-A. Lemburg mal@lemburg.com
Wed, 13 Sep 2000 19:57:07 +0200

"Martin v. Loewis" wrote:
> > It's not really all that hard to write codecs for Python 2.0.
> >
> > You'll have to do two things:
> > 1. write the codec by subclassing the base classes in codecs.py
> > 2. write a search function which returns the needed constructors
> >    and functions.
> So how would I write a codec that converts all characters to Latin-1,
> and converts those outside Latin-1 to &#xxx; (instead of the
> replacement character)? I'd need to know which characters are in
> Latin-1, and I'd need to do the conversion on a character-by-character
> basis, right?


> And I can't possibly use any of the _codecs helper
> functions?
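The two steps quoted above, applied to the Latin-1-plus-character-references codec asked about, might look roughly like this. It is a sketch written for Python 3, where the search function returns a `codecs.CodecInfo`; in Python 2.0 it returned a plain 4-tuple `(encoder, decoder, stream_reader, stream_writer)`. The codec name `latin1_charref` and the helper `latin1_charref_encode` are hypothetical. The Latin-1 membership test needs no tables, since Latin-1 is exactly U+0000 to U+00FF.

```python
import codecs

# Hypothetical codec name; lowercase with underscores because that is
# the normalized form in which names reach the search function.
CODEC_NAME = "latin1_charref"


def latin1_charref_encode(text, errors="strict"):
    """Encode to Latin-1, writing &#NNN; for characters outside it.

    Latin-1 covers exactly U+0000..U+00FF, so membership is a range
    check and the code point doubles as the byte value. The errors
    argument is accepted for API compatibility but never consulted,
    because every character has some encoding here.
    """
    out = bytearray()
    for ch in text:  # character-by-character, as the question suspects
        cp = ord(ch)
        if cp < 0x100:
            out.append(cp)
        else:
            out += b"&#%d;" % cp
    # Codec functions return (output, number of input items consumed).
    return bytes(out), len(text)


# Step 1: the codec, built on the base classes in codecs.py.
class Codec(codecs.Codec):
    def encode(self, input, errors="strict"):
        return latin1_charref_encode(input, errors)

    def decode(self, input, errors="strict"):
        # Decoding is plain Latin-1; references stay as literal text.
        return codecs.latin_1_decode(input, errors)


class StreamWriter(Codec, codecs.StreamWriter):
    pass


class StreamReader(Codec, codecs.StreamReader):
    pass


# Step 2: a search function that returns the codec's functions and
# constructors, or None for names it does not handle.
def search_function(name):
    if name != CODEC_NAME:
        return None
    return codecs.CodecInfo(
        name=CODEC_NAME,
        encode=Codec().encode,
        decode=Codec().decode,
        streamreader=StreamReader,
        streamwriter=StreamWriter,
    )


codecs.register(search_function)
```

After registration, `"caf\u00e9 \u20ac".encode("latin1_charref")` should yield `b"caf\xe9 &#8364;"`. The pure-Python loop is exactly the per-character work the question anticipates; it is correct but slow compared with the built-in codecs.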

You could play some tricks with the character mapping codec,
which is used by all code page codecs.

You will achieve better performance with a native codec written
in C, though.
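One such trick, sketched for Python 3: the charmap encoder looks each code point up with `mapping[cp]`, and a looked-up value may be an int (one byte) or a bytes object of any length, so a mapping object can hand back a whole `&#NNN;` reference. This relies on `codecs.charmap_encode`, an undocumented helper re-exported from `_codecs` that generated code page modules such as `encodings/cp1252.py` call; treat its availability as an assumption about CPython internals. The class and function names are hypothetical.

```python
import codecs


class Latin1CharRefMapping:
    """Hypothetical encoding map for the charmap codec.

    The charmap encoder calls mapping[code_point]: an int emits that
    single byte, a bytes object is copied verbatim into the output.
    """

    def __getitem__(self, cp):
        if cp < 0x100:
            return cp                # Latin-1: code point == byte value
        return b"&#%d;" % cp         # everything else: character reference


def encode_with_charmap(text):
    # charmap_encode returns (encoded bytes, characters consumed).
    data, _consumed = codecs.charmap_encode(text, "strict", Latin1CharRefMapping())
    return data
```

The per-character loop now runs inside the C charmap codec, with only the lookup calling back into Python.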

> This is certainly feasible if I want it for a single character set,
> but not if I want to do it wholesale for the entire set of character
> sets supported by Python 2.0.

This is probably not possible since there's no way to have the
codecs use e.g. a callback function to handle error situations.
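This limitation was later lifted: Python 2.3 added codec error-handling callbacks (PEP 293) through `codecs.register_error`, and the built-in `"xmlcharrefreplace"` handler performs exactly this escaping for every encoding. A hand-written equivalent, registered under a hypothetical name, shows how the callback works in Python 3:

```python
import codecs


def xml_charref_handler(exc):
    """Replace unencodable characters with decimal &#NNN; references."""
    if not isinstance(exc, UnicodeEncodeError):
        raise exc
    bad = exc.object[exc.start:exc.end]
    replacement = "".join("&#%d;" % ord(ch) for ch in bad)
    # Return the replacement text and the position to resume encoding at.
    return replacement, exc.end


# Hypothetical handler name, used only for this sketch.
codecs.register_error("xml_charref_demo", xml_charref_handler)
```

With that registered, `"caf\u00e9".encode("ascii", "xml_charref_demo")` gives `b"caf&#233;"`; in practice `errors="xmlcharrefreplace"` does the same without any registration.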

But the situation is not all that bad: most codecs rely on the
character mapping codec, and you could simply implement a new
version of it which does the XML escaping instead of raising
an exception.
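A pure-Python sketch of such a replacement charmap encoder follows, assuming Python 3 and using the decoding table of the standard `encodings.cp1252` module as one example code page (undefined slots in that table are marked `"\ufffe"`). The function names are hypothetical; a production version would change the C loop of the charmap codec itself.

```python
import encodings.cp1252

UNDEFINED = "\ufffe"  # marker for unassigned bytes in code page tables


def build_encoding_map(decoding_table):
    # Invert a 256-entry decoding table (byte -> character) into a
    # dict (code point -> byte), skipping undefined slots.
    return {
        ord(ch): byte
        for byte, ch in enumerate(decoding_table)
        if ch != UNDEFINED
    }


def charmap_encode_xml(text, encoding_map):
    """Charmap-style encoding where unmapped characters become &#NNN;."""
    out = bytearray()
    for ch in text:
        byte = encoding_map.get(ord(ch))
        if byte is None:
            out += b"&#%d;" % ord(ch)  # escape instead of raising
        else:
            out.append(byte)
    return bytes(out)


CP1252_MAP = build_encoding_map(encodings.cp1252.decoding_table)
```

The same two functions work for any code page codec that ships a 256-entry `decoding_table`, which is the "do it wholesale" property the question asked for.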

Marc-Andre Lemburg
Business:                                        http://www.lemburg.com/
Python Pages:                             http://www.lemburg.com/python/