[I18n-sig] XML and UTF-16

Walter Doerwald walter@livinglogic.de
Fri, 01 Jun 2001 15:58:09 +0200


On 01.06.01 at 14:59 Martin v. Loewis wrote:

> [...]
> As for XML and encodings, having a convenient mechanism to extend
> existing codecs to encode unknown characters as character entities is
> much more important, IMO, since that is very difficult to achieve with
> the existing API.

I've written such functions:

- escapeText(S, encoding) -> unicode
  Return a copy of the unicode string S, where every occurrence of
  '<', '>' and '&' and all unencodable characters in the
  specified encoding have been replaced with their XML character entity.

- escapeAttr(S, encoding) -> unicode
  Return a copy of the unicode string S, where every occurrence of
  '<', '>', '&', and '"' and all unencodable characters in the
  specified encoding have been replaced with their XML character entity.

Although these functions are written in C, they have to call the codec
twice for every single character (if encoding the string in one go fails),
so they are rather slow for codecs implemented in Python.
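
To illustrate the strategy, here is a rough Python sketch (not the C
code itself). The names escape_text, escape_attr and _escape are made up
for this example, and it assumes that the markup characters become the
predefined entities (&lt; etc.) while unencodable characters become
decimal character references (&#NNNN;):

```python
# Hypothetical sketch of the approach described above, not the C code.
TEXT_SPECIALS = {"<": "&lt;", ">": "&gt;", "&": "&amp;"}
ATTR_SPECIALS = dict(TEXT_SPECIALS)
ATTR_SPECIALS['"'] = "&quot;"


def _escape(s, encoding, specials):
    # Replace the markup characters first (one pass, so '&' is never
    # escaped twice).
    text = "".join(specials.get(ch, ch) for ch in s)
    # Fast path: try to encode the whole string in one go.
    try:
        text.encode(encoding)
        return text
    except UnicodeEncodeError:
        pass
    # Slow path: ask the codec about every single character and replace
    # the ones it rejects with a numeric character reference.
    parts = []
    for ch in text:
        try:
            ch.encode(encoding)
            parts.append(ch)
        except UnicodeEncodeError:
            parts.append("&#%d;" % ord(ch))
    return "".join(parts)


def escape_text(s, encoding):
    """Escape '<', '>', '&' and everything the encoding cannot represent."""
    return _escape(s, encoding, TEXT_SPECIALS)


def escape_attr(s, encoding):
    """Like escape_text, but also escape the double quote."""
    return _escape(s, encoding, ATTR_SPECIALS)
```

The result is still a unicode string; it is only guaranteed to be
encodable in the given encoding afterwards. The slow path is where the
per-character codec calls come from.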

Could this be used until we get codecs with customizable error handling?

If yes, I could put them as a patch on python.sf.net or pyxml.sf.net
or mail them to Martin.


Bye,
   Walter Dörwald

-- 
Walter Dörwald · LivingLogic AG · Bayreuth, Germany ·
www.livinglogic.de