
Steve Howe wrote:
As the document does not specify a DTD, the entity "copy" is undefined, which is an error if you instructed the parser to *resolve* the entities.

Agreed, but I set "resolve_entities=False" so it should not be resolving anything, right? Or did I misunderstand something?

Ah, sorry, I misread your example as saying "=True" ...

Documents that do not declare their entities are not well-formed:

---------------------------
Well-formedness constraint: Entity Declared

In a document without any DTD, a document with only an internal DTD subset which contains no parameter entity references, or a document with "standalone='yes'", for an entity reference that does not occur within the external subset or a parameter entity, the Name given in the entity reference MUST match that in an entity declaration that does not occur within the external subset or a parameter entity, except that well-formed documents need not declare any of the following entities: amp, lt, gt, apos, quot. The declaration of a general entity MUST precede any reference to it which appears in a default value in an attribute-list declaration.
---------------------------

with one exception:

---------------------------
Note that non-validating processors are not obligated to read and process entity declarations occurring in parameter entities or in the external subset; for such documents, the rule that an entity must be declared is a well-formedness constraint only if standalone='yes'.
---------------------------

But since your document does not define an external subset, the parser knows that the entity is not defined and that the document is not well-formed.

If you add a DOCTYPE, the parser will assume the entity to be defined in the referenced DTD (even if it does not load it), and thus ignore the missing declaration (you should still get a warning in the parser "error_log", though).

Also, if you add "recover=True" to the parser, it will ignore the (otherwise fatal) error.

Note that entities appear as children since lxml 2.0, not as text.

Stefan
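
The three cases discussed above (no DTD, a DOCTYPE that is referenced but not loaded, and "recover=True") can be tried out with a small script. This is only a sketch: the helper name try_parse and the DTD name "example.dtd" are made up for illustration, the DTD file does not need to exist because the parser is not told to load it, and the exact messages and recovered tree contents may differ between lxml/libxml2 versions.

```python
from lxml import etree

# Same document twice: once without any DTD, once with a DOCTYPE that
# points at an external subset. "example.dtd" is a hypothetical name.
NO_DTD = b"<root>&copy;</root>"
WITH_DOCTYPE = b'<!DOCTYPE root SYSTEM "example.dtd"><root>&copy;</root>'


def try_parse(xml, **parser_options):
    """Parse xml with the given XMLParser options; return (root, error)."""
    parser = etree.XMLParser(**parser_options)
    try:
        return etree.fromstring(xml, parser), None
    except etree.XMLSyntaxError as exc:
        return None, exc


# 1. No DTD: the undeclared entity violates well-formedness, so parsing
#    fails even though resolve_entities=False.
root, error = try_parse(NO_DTD, resolve_entities=False)
print("no DTD:", error)

# 2. DOCTYPE present: the entity may be declared in the (unloaded)
#    external subset, so the reference is kept as an entity child node.
root, error = try_parse(WITH_DOCTYPE, resolve_entities=False)
if root is not None:
    for child in root:
        if child.tag is etree.Entity:
            print("with DOCTYPE: entity child named", child.name)

# 3. recover=True: the otherwise fatal error no longer aborts parsing.
root, error = try_parse(NO_DTD, resolve_entities=False, recover=True)
print("recover=True: parsed root", None if root is None else root.tag)
```

In the second case the entity shows up as a child of the root element (its tag is etree.Entity) rather than as part of root.text, which is the lxml 2.0 behaviour mentioned above.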