[lxml-dev] Is resolve_entities not working ??

Hello all,

Is the resolve_entities XMLParser constructor argument not working, or what did I do wrong?

howe@yezda ~ $ python
Python 2.5.1 (r251:54863, Jan 9 2008, 05:34:21)
[GCC 4.2.2 (Gentoo 4.2.2 p1.0)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml import etree
>>> print etree.__version__
2.0.1
>>> print etree.LIBXML_VERSION
(2, 6, 30)
>>> import StringIO
>>> xml = StringIO.StringIO('<?xml version="1.0" encoding="utf-8"?> <p>&copy;</p>')
>>> etree.parse(xml, etree.XMLParser(resolve_entities=False))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 2515, in lxml.etree.parse
  File "parser.pxi", line 1743, in lxml.etree._parseDocument
  File "parser.pxi", line 1775, in lxml.etree._parseMemoryDocument
  File "parser.pxi", line 1676, in lxml.etree._parseDoc
  File "parser.pxi", line 793, in lxml.etree._BaseParser._parseDoc
  File "parser.pxi", line 450, in lxml.etree._ParserContext._handleParseResultDoc
  File "parser.pxi", line 534, in lxml.etree._handleParseResult
  File "parser.pxi", line 476, in lxml.etree._raiseParseError
lxml.etree.XMLSyntaxError: Entity 'copy' not defined, line 1, column 46

Thanks!

--
Best Regards,
Steve Howe

Hi,

Steve Howe wrote:
> Is the resolve_entities XMLParser constructor argument not working, or what did I do wrong?
>
> >>> etree.parse(xml, etree.XMLParser(resolve_entities=False))
> [...]
> lxml.etree.XMLSyntaxError: Entity 'copy' not defined, line 1, column 46

As the document does not specify a DTD, the entity "copy" is undefined, which is an error if you instructed the parser to *resolve* the entities.

Stefan
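
To illustrate the point about the missing DTD, here is a minimal sketch (Python 3 syntax, assuming a reasonably recent lxml; the sample document and the replacement text chosen for "copy" are made up for this example). Once the entity is declared in an internal DTD subset, a resolving parser can substitute it:

```python
from lxml import etree

# Hypothetical sample document: "copy" is declared in the internal DTD
# subset, so the parser knows what the reference stands for.
DOC = b'<!DOCTYPE p [<!ENTITY copy "&#169;">]><p>&copy;</p>'

# Explicitly ask the parser to replace entity references by their text.
parser = etree.XMLParser(resolve_entities=True)
root = etree.fromstring(DOC, parser)

# The reference has been replaced by the declared replacement text.
print(repr(root.text))
```

Without the DOCTYPE line, the same call fails with the "Entity 'copy' not defined" error shown in the traceback.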

Hello Stefan Behnel,

> As the document does not specify a DTD, the entity "copy" is undefined, which is an error if you instructed the parser to *resolve* the entities.

Agreed, but I set "resolve_entities=False", so it should not be resolving anything, right? Or did I misunderstand something?

--
Best Regards,
Steve Howe

Steve Howe wrote:
>> As the document does not specify a DTD, the entity "copy" is undefined, which is an error if you instructed the parser to *resolve* the entities.
>
> Agreed, but I set "resolve_entities=False", so it should not be resolving anything, right? Or did I misunderstand something?

Ah, sorry, I misread your example as saying "=True" ...

Documents that do not declare their entities are not well-formed:

---------------------------
Well-formedness constraint: Entity Declared

In a document without any DTD, a document with only an internal DTD subset which contains no parameter entity references, or a document with "standalone='yes'", for an entity reference that does not occur within the external subset or a parameter entity, the Name given in the entity reference MUST match that in an entity declaration that does not occur within the external subset or a parameter entity, except that well-formed documents need not declare any of the following entities: amp, lt, gt, apos, quot. The declaration of a general entity MUST precede any reference to it which appears in a default value in an attribute-list declaration.
---------------------------

with one exception:

---------------------------
Note that non-validating processors are not obligated to read and process entity declarations occurring in parameter entities or in the external subset; for such documents, the rule that an entity must be declared is a well-formedness constraint only if standalone='yes'.
---------------------------

But since your document does not define an external subset, the parser knows that the entity is not defined and that the document is not well-formed.

If you add a DOCTYPE, the parser will assume the entity to be defined in the referenced DTD (even if it does not load it), and thus ignore the missing declaration (you should still get a warning in the parser "error_log", though).

Also, if you add "recover=True" to the parser, it will ignore the (otherwise fatal) error.

Note that entities appear as children since lxml 2.0, not as text.

Stefan
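
The three cases described above can be sketched roughly as follows. This is an illustration only, written in Python 3 syntax and assuming a reasonably recent lxml; the system identifier "p.dtd" is a made-up name and is never loaded. The exact content of the recovered tree and of the error log may differ between libxml2 versions, so the sketch only prints them.

```python
from lxml import etree

BARE = b'<p>&copy;</p>'
# Hypothetical system identifier; the DTD itself is not loaded here.
WITH_DOCTYPE = b'<!DOCTYPE p SYSTEM "p.dtd"><p>&copy;</p>'

# 1. No DTD at all: the undeclared reference is a fatal error,
#    even though the parser was told not to resolve entities.
try:
    etree.fromstring(BARE, etree.XMLParser(resolve_entities=False))
    bare_failed = False
except etree.XMLSyntaxError as exc:
    bare_failed = True
    print("no DTD:", exc)

# 2. DOCTYPE referencing an external subset: the parser assumes the
#    entity may be declared there and keeps the reference in the tree.
parser = etree.XMLParser(resolve_entities=False)
root = etree.fromstring(WITH_DOCTYPE, parser)
serialized = etree.tostring(root)
print("with DOCTYPE:", serialized, "children:", len(root))
print("error_log entries:", len(parser.error_log))

# 3. recover=True: the otherwise fatal error is ignored.
recovering = etree.XMLParser(resolve_entities=False, recover=True)
recovered = etree.fromstring(BARE, recovering)
print("recovered root:", recovered.tag)
```

In the second case the unresolved reference is serialized back as "&copy;", which matches the remark that entities are kept as nodes of their own rather than as text.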