Hi,

Eric Garin wrote:
First of all, congratulations: I have been using lxml for more than a year and I enjoy the huge progress (and work) you have done.
:) Happy to hear that.
I'm using lxml to validate XML document instances with etree.XMLSchema(schema_doc).validate(xml_doc). I used to work with DTDs, where it is possible to include standard sets of HTML entity declarations (for example &nbsp;, &eacute;, etc.).
Now, working with XML schemas, I sometimes have some of those common HTML entities appearing in the content (coming from an editor like FCK). And at validation time, of course, I get an error like this:
  File "lxml.etree.pyx", line 2520, in lxml.etree.parse
  File "parser.pxi", line 1309, in lxml.etree._parseDocument
  File "parser.pxi", line 1338, in lxml.etree._parseDocumentFromURL
  File "parser.pxi", line 1248, in lxml.etree._parseDocFromFile
  File "parser.pxi", line 828, in lxml.etree._BaseParser._parseDocFromFile
  File "parser.pxi", line 452, in lxml.etree._ParserContext._handleParseResultDoc
  File "parser.pxi", line 536, in lxml.etree._handleParseResult
  File "parser.pxi", line 478, in lxml.etree._raiseParseError
lxml.etree.XMLSyntaxError: Entity 'nbsp' not defined, line 21, column 16
1. Is there a way to escape those entities at validation time?
The stack trace above is raised at parse time, not at validation time. If you have entity references in your XML document, you have to use a DTD at parse time that defines them, or you can pass the "resolve_entities=False" option to the parser to keep them in the tree (which might make tree handling a little harder, though).
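
As a rough sketch of the two options above (the sample document and its internal DTD subset are invented for illustration, and only the "nbsp" entity is declared): without a declaration the parser raises the error from the traceback; with a declaration the reference is either replaced by its value (default parser) or, with resolve_entities=False, kept in the tree as an entity node.

```python
from lxml import etree

# Hypothetical sample input, not taken from the original post.
BODY = b"<root>a&nbsp;b</root>"
# Internal DTD subset declaring the single entity that BODY uses.
DOCTYPE = b'<!DOCTYPE root [ <!ENTITY nbsp "&#160;"> ]>'

# 1. No declaration at all: parsing fails, as in the traceback above.
try:
    etree.fromstring(BODY)
    undeclared_error = None
except etree.XMLSyntaxError as exc:
    undeclared_error = str(exc)

# 2. Entity declared, default parser: the reference is replaced by
#    its value, so the element simply contains the text.
resolved = etree.fromstring(DOCTYPE + BODY)

# 3. Entity declared, resolve_entities=False: the reference stays in
#    the tree as a separate entity node between the two text pieces.
keep_parser = etree.XMLParser(resolve_entities=False)
kept = etree.fromstring(DOCTYPE + BODY, keep_parser)
entity_node = kept[0]

print(undeclared_error)
print(repr(resolved.text))
print(repr(kept.text), entity_node.name, repr(entity_node.tail))
```

The third case shows why tree handling gets a little harder: the text is no longer one string but is split into kept.text, the entity node, and the entity node's tail.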
2. Or do I need to declare the entities in the schema? (I understand that this question is off-topic for lxml, but I didn't find a way to do that.)
XML Schema deliberately does not support entity declarations (or references, for that matter). They are a pure DTD thing.

Stefan
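
To illustrate how the two mechanisms can be combined, here is a minimal sketch: a DTD internal subset supplies the entity declaration at parse time, and the XML Schema then validates the already-parsed tree. The schema and the instance document are made up for this example and are not taken from the original question.

```python
from io import BytesIO

from lxml import etree

# Hypothetical schema: a single <root> element holding a string.
SCHEMA_SRC = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root" type="xs:string"/>
</xs:schema>"""

# Hypothetical instance: the entity is declared in the DTD internal
# subset, because the schema itself has no way to declare it.
DOC_SRC = b"""<!DOCTYPE root [ <!ENTITY nbsp "&#160;"> ]>
<root>a&nbsp;b</root>"""

schema = etree.XMLSchema(etree.parse(BytesIO(SCHEMA_SRC)))

# Parsing resolves the entity first; validation only sees the result.
doc = etree.parse(BytesIO(DOC_SRC))
is_valid = schema.validate(doc)

print(is_valid, repr(doc.getroot().text))
```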