[XML-SIG] Newbie : Identifying characters that will choke XML parser

John Wilson tug@wilson.co.uk
Tue, 6 May 2003 13:32:47 +0100


Ian,

If a character falls in any of the following ranges, it's illegal in XML 1.0:

c < 0X0009
c > 0X000A and c < 0X000D
c > 0X000D and c < 0X0020
c > 0XD7FF and c < 0XE000
c > 0XFFFD and c < 0X10000 (0X10000-0X10FFFF are legal)
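The ranges above are the complement of the `Char` production in the XML 1.0 specification. Here is a minimal sketch of the check in Python 3, assuming the text has already been decoded to `str`. `is_xml_char` and `find_illegal_chars` are illustrative names, not part of any library. The sketch treats 0X10000-0X10FFFF as legal, as the specification does:

```python
from typing import List, Tuple


def is_xml_char(c: str) -> bool:
    """Return True if the single character c is allowed by the XML 1.0 Char production."""
    cp = ord(c)
    return (
        cp in (0x9, 0xA, 0xD)            # tab, line feed, carriage return
        or 0x20 <= cp <= 0xD7FF          # rest of the BMP below the surrogates
        or 0xE000 <= cp <= 0xFFFD        # BMP above the surrogates, minus FFFE/FFFF
        or 0x10000 <= cp <= 0x10FFFF     # supplementary planes
    )


def find_illegal_chars(text: str) -> List[Tuple[int, str]]:
    """Return (index, character) pairs for every character XML 1.0 forbids."""
    return [(i, c) for i, c in enumerate(text) if not is_xml_char(c)]


if __name__ == "__main__":
    sample = "ok\x06text\x0bmore\n"
    for index, char in find_illegal_chars(sample):
        print(f"offset {index}: U+{ord(char):04X}")
```

Running it as a script prints `offset 2: U+0006` and `offset 7: U+000B`, which point at the ACK and vertical-tab characters in the sample string.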

John Wilson
The Wilson Partnership
http://www.wilson.co.uk

----- Original Message ----- 
From: "Ian Sparks" <Ian.Sparks@etrials.com>
To: "Martin v. Löwis" <martin@v.loewis.de>
Cc: "Xml-Sig (E-mail)" <xml-sig@python.org>
Sent: Tuesday, May 06, 2003 1:17 PM
Subject: RE: [XML-SIG] Newbie : Identifying characters that will choke XML parser


Hmm... as I feared. As I discover new XML-chokers, I'm building up a library
like this:

# Remove ACKs, chr(6) (I've seen it!)
w = w.replace(chr(6), '')
# Remove ellipsis characters, chr(133) (again, I've seen it)
w = w.replace(chr(133), '')

I was hoping to find some way of identifying everything that will choke my
XML: some rule to auto-filter out the nastiness.
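You don't need to add one `replace()` call per character as each new one turns up. Every illegal range can be stripped in a single pass. Here is a minimal sketch using Python 3's standard `re` module, assuming `w` is already a decoded `str`. `ILLEGAL_XML_CHARS` and `strip_illegal_xml_chars` are hypothetical names:

```python
import re

# Character class matching everything outside the XML 1.0 Char production:
# C0 controls except tab, LF and CR; the surrogate block; U+FFFE and U+FFFF.
# Supplementary characters (U+10000 and up) are single code points in a
# Python 3 str and are not matched, so they are kept.
ILLEGAL_XML_CHARS = re.compile(
    r"[\x00-\x08\x0B\x0C\x0E-\x1F\uD800-\uDFFF\uFFFE\uFFFF]"
)


def strip_illegal_xml_chars(text: str, replacement: str = "") -> str:
    """Replace every character that XML 1.0 forbids with `replacement`."""
    return ILLEGAL_XML_CHARS.sub(replacement, text)
```

With this, `w = strip_illegal_xml_chars(w)` covers the chr(6) case. It deliberately leaves chr(133) alone. As a Unicode character, chr(133) is U+0085, which XML 1.0 allows. If 0x85 still makes the parser fail, an encoding mismatch is one plausible cause. For example, Windows-1252 text, where byte 0x85 is "…", might be read as UTF-8. In that case the fix is to decode with the right codec rather than delete characters.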