
I have to load an XML file generated by a third-party program. They have some non-printing chars in the document and I get a "lxml.etree.XMLSyntaxError: xmlParseCharRef: invalid xmlChar value..." exception.

Is there any way to relax or switch off the allowed characters in lxml? If not then is there a way to plug into the character entity parsing code? I really don't want to resort to a pre-parse regex replace.

Thanks

This E-Mail is sent in confidence for the addressee only. Unauthorised recipients must preserve this confidentiality and should please advise the sender immediately by telephone (+44 (0)1625 505100) and return the original E-Mail to the sender without taking a copy. Cyprotex has taken all reasonable precautions to ensure that no viruses are transmitted from Cyprotex to any third party. Cyprotex accepts no responsibility for any loss or damage resulting directly or indirectly from the use of this E-Mail or the contents.

David Roe, 06.02.2012 18:48:
> I have to load an XML file generated by a third-party program. They have some non-printing chars in the document and I get a "lxml.etree.XMLSyntaxError: xmlParseCharRef: invalid xmlChar value..." exception.

Looks like it's not giving you XML then. Could you post an example of the character references that it cannot parse? Here's the list of allowed XML characters:

http://www.w3.org/TR/REC-xml/#charsets

But please take care to send your next post without the legal restrictions at the bottom of your first e-mail.
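To track down which character references fall outside the allowed ranges, a small diagnostic along these lines might help. It is only a sketch: the function names and the sample string are made up for illustration, and the ranges are the `Char` production of XML 1.0 from the link above. It scans the raw text, so it would also report matches inside comments or CDATA sections, where they are not real references.

```python
import re

# Matches decimal (&#11;) and hexadecimal (&#x1;) character references.
CHAR_REF = re.compile(r"&#(?:x([0-9a-fA-F]+)|([0-9]+));")


def is_xml_char(cp):
    """Return True if code point cp is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def find_invalid_char_refs(text):
    """Return (offset, reference) pairs for references to disallowed characters."""
    bad = []
    for match in CHAR_REF.finditer(text):
        hex_digits, dec_digits = match.groups()
        cp = int(hex_digits, 16) if hex_digits else int(dec_digits)
        if not is_xml_char(cp):
            bad.append((match.start(), match.group(0)))
    return bad


# Hypothetical sample input: &#65; is fine, the other two are not.
sample = "<root>ok &#65; bad &#x1; also bad &#11;</root>"
for offset, ref in find_invalid_char_refs(sample):
    print(offset, ref)
```

This only reports the offending references so they can be shown to whoever produces the file. It does not modify the document.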
> Is there any way to relax or switch off the allowed characters in lxml?

No, it uses a standards-compliant XML parser. You can pass it the "recover" option, but that may or may not do what you want.
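For reference, passing the "recover" option could look roughly like the sketch below. The sample document is invented for illustration. What the recovered tree contains around the broken reference depends on the underlying libxml2, so the result should be inspected rather than trusted.

```python
from lxml import etree

# Hypothetical sample document containing a disallowed character reference.
data = b"<root><item>a&#x1;b</item><item>ok</item></root>"

# The default parser rejects the document outright.
try:
    etree.fromstring(data)
except etree.XMLSyntaxError as exc:
    print("strict parse failed:", exc)

# recover=True asks the parser to keep going after errors. What ends up in
# the tree near the broken spot is not guaranteed, so check the output.
parser = etree.XMLParser(recover=True)
root = etree.fromstring(data, parser)
print(etree.tostring(root))

# Problems the parser worked around are recorded in its error log.
for entry in parser.error_log:
    print(entry.line, entry.column, entry.message)
```

Checking `parser.error_log` after each parse is one way to notice when data was silently dropped or altered during recovery.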
> If not then is there a way to plug into the character entity parsing code?

No.

> I really don't want to resort to a pre-parse regex replace.

And you shouldn't.

> This E-Mail is sent in confidence for the addressee only. Unauthorised recipients must preserve this confidentiality and should please advise the sender immediately by telephone (+44 (0)1625 505100) and return the original E-Mail to the sender without taking a copy.

Hmm, I can't find myself in the list of recipients - am I authorised to keep a copy of your e-mail or not? Sorry for not having my phone within reach while receiving e-mails.

Anyway, this is not the kind of comment I expect when you ask others for help. And it's certainly not appropriate for a public mailing list.

Stefan
participants (2)
- David Roe
- Stefan Behnel