[XML-SIG] [ pyxml-Patches-615114 ] saxutils.py: CharRef escaping

noreply@sourceforge.net noreply@sourceforge.net
Thu, 26 Sep 2002 11:31:47 -0700


Patches item #615114, was opened at 2002-09-26 20:31
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=306473&aid=615114&group_id=6473

Category: SAX
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Carsten Oberscheid (oberscheid)
Assigned to: Nobody/Anonymous (nobody)
Summary: saxutils.py: CharRef escaping

Initial Comment:
saxutils.XMLGenerator selects a codec for output according to 
the encoding argument given to its constructor. All output is written 
through this codec, and any character in the data that doesn't fit the 
selected encoding raises a UnicodeError.
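
To illustrate the failure mode, here is a minimal sketch in modern Python 3 syntax (the patch itself targeted Python < 2.3). It assumes a writer that pushes every string through the codec named by the `encoding` argument with strict error handling. The helper name `encode_for_output` is hypothetical and stands in for XMLGenerator's internal codec writer:

```python
import codecs

def encode_for_output(text, encoding="us-ascii"):
    # Hypothetical stand-in for the generator's writer: every string goes
    # through the selected codec, and errors="strict" is the default, so a
    # character outside the encoding raises instead of being written.
    return codecs.encode(text, encoding)

try:
    encode_for_output("Dörwald")
except UnicodeError as exc:  # UnicodeEncodeError is a subclass of UnicodeError
    print(type(exc).__name__)  # prints UnicodeEncodeError
```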

The patch adds a cr_escape() function that replaces every character
with a code > 127 with an XML character reference, so the output
encoding can be selected independently of the characters actually
present in the document.
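
The following is only a rough sketch of what such a function could look like, not the code from the patch (which is not reproduced here). It assumes decimal references and the "> 127" rule described above, and it uses the standard `xml.sax.saxutils.escape` for markup escaping:

```python
from xml.sax.saxutils import escape

def cr_escape(text):
    # Hypothetical re-creation, not the patch's actual code:
    # keep ASCII (code <= 127) as-is, turn everything else into &#NNN;.
    return "".join(
        ch if ord(ch) <= 127 else "&#%d;" % ord(ch)
        for ch in text
    )

# Escape markup first, then add character references; doing it the other
# way round would turn the "&" of "&#246;" into "&amp;".
print(cr_escape(escape("a < ö")))  # a &lt; &#246;
```

Because the output of cr_escape() is pure ASCII, it can be written through any ASCII-compatible codec without raising.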

This is done for character data and for attribute values, where
CharRefs are allowed. It is not done for element names, attribute
names etc., where CharRefs are not allowed (although non-ASCII
characters can appear there as well; these still have to fit the
output encoding).
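
A minimal sketch of that split, assuming two hypothetical helpers, `characters()` and `start_tag()`, that loosely mirror what XMLGenerator's `characters` and `startElement` methods write; they are not the generator's real code. Only values pass through cr_escape(), while names are emitted unchanged:

```python
from xml.sax.saxutils import escape, quoteattr

def cr_escape(text):
    # Same brute-force rule as above: code > 127 becomes &#NNN;.
    return "".join(ch if ord(ch) <= 127 else "&#%d;" % ord(ch) for ch in text)

def characters(data):
    # Character data: escape markup, then add character references.
    return cr_escape(escape(data))

def start_tag(name, attrs):
    # Element and attribute names are written unchanged, because character
    # references are not allowed in names; non-ASCII letters there must
    # still be encodable in the output encoding.
    parts = [name]
    for attr_name, value in attrs.items():
        # Attribute values: quoteattr() escapes and quotes, then references.
        parts.append("%s=%s" % (attr_name, cr_escape(quoteattr(value))))
    return "<%s>" % " ".join(parts)

print(start_tag("author", {"name": "Dörwald"}))  # <author name="D&#246;rwald">
print(characters("Größe"))                       # Gr&#246;&#223;e
```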

It's a brute-force approach and it can be slow, but it should do
what it's supposed to do. Walter Dörwald pointed out that PEP 293
(codec error handling callbacks) should make this obsolete in
Python 2.3, but for Python < 2.3 it may be useful.
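
For comparison, here is a sketch of the PEP 293 route in Python 3 syntax: the standard "xmlcharrefreplace" error handler lets the codec itself emit character references. The helper name `escape_for_encoding` is hypothetical:

```python
from xml.sax.saxutils import escape

def escape_for_encoding(text, encoding="ascii"):
    # Escape markup, then encode; "xmlcharrefreplace" (PEP 293) replaces only
    # characters the target encoding cannot represent with &#NNN; references.
    return escape(text).encode(encoding, "xmlcharrefreplace")

print(escape_for_encoding("a < ö"))             # b'a &lt; &#246;'
print(escape_for_encoding("a < ö", "latin-1"))  # b'a &lt; \xf6'
```

Unlike the "> 127" rule, this keeps characters that the chosen encoding can represent as raw bytes and only falls back to references for the rest.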

It's my first patch, so if there's anything wrong with it, please
tell me so I can learn. If there's a better way to do it (I'm sure
there is), ditto.

Nearly forgot: the patch is against saxutils.py from PyXML 0.8.1,
but I checked the CVS version and it seemed to be unchanged.

----------------------------------------------------------------------
