[I18n-sig] XML and UTF-16
Walter Doerwald
walter@livinglogic.de
Fri, 01 Jun 2001 15:58:09 +0200
On 01.06.01 at 14:59 Martin v. Loewis wrote:
> [...]
> As for XML and encodings, having a convenient mechanism to extend
> existing codecs to encode unknown characters as character entities is
> much more important, IMO, since that is very difficult to achieve with
> the existing API.
I've written such functions:
- escapeText(S, encoding) -> unicode
Return a copy of the unicode string S, where every occurrence of
'<', '>' and '&' and all unencodable characters in the
specified encoding have been replaced with their XML character entity.
- escapeAttr(S, encoding) -> unicode
Return a copy of the unicode string S, where every occurrence of
'<', '>', '&' and '"' and all unencodable characters in the
specified encoding have been replaced with their XML character entity.
Although these functions are written in C, they have to call the codec
twice for every single character (if encoding the string in one go fails),
so they are rather slow for codecs implemented in Python.
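To make the behaviour described above concrete, here is a minimal pure-Python sketch of the same idea. It is not the C implementation: the names escape_text, escape_attr, _escape and _can_encode are hypothetical, the code assumes a current Python 3 where str plays the role of the unicode type, and it asks the codec once per character rather than twice. Using numeric character references (&#NNN;) for unencodable characters is likewise an assumption.

```python
# Illustrative sketch only; names and details are assumptions, not the C code.
_TEXT_SPECIALS = {"<": "&lt;", ">": "&gt;", "&": "&amp;"}
_ATTR_SPECIALS = dict(_TEXT_SPECIALS)
_ATTR_SPECIALS['"'] = "&quot;"


def _can_encode(text, encoding):
    """Return True if the codec can encode text without error."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


def _escape(s, encoding, specials):
    # Fast path: if the whole string encodes in one go, no per-character
    # codec calls are needed and only the markup characters are replaced.
    all_encodable = _can_encode(s, encoding)
    out = []
    for ch in s:
        if ch in specials:
            out.append(specials[ch])
        elif all_encodable or _can_encode(ch, encoding):
            out.append(ch)
        else:
            # Unencodable in the target encoding: numeric character reference.
            out.append("&#%d;" % ord(ch))
    return "".join(out)


def escape_text(s, encoding):
    """Escape '<', '>', '&' and characters the encoding cannot represent."""
    return _escape(s, encoding, _TEXT_SPECIALS)


def escape_attr(s, encoding):
    """Like escape_text, but also escape the double quote."""
    return _escape(s, encoding, _ATTR_SPECIALS)
```

With this sketch, escape_text("a<b & \u20ac", "latin-1") leaves every Latin-1 character alone and replaces only the markup characters and the euro sign. The result is still a unicode string, which can then be encoded with the target codec without errors.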
Could this be used until we get codecs with customizable error handling?
If so, I could post them as a patch on python.sf.net or pyxml.sf.net,
or mail them to Martin.
Bye,
Walter Dörwald
--
Walter Dörwald · LivingLogic AG · Bayreuth, Germany ·
www.livinglogic.de