Mailman 3 Re: [lxml-dev] resolve_entities=False seems to have no effect - lxml - The Python XML Toolkit

7 Feb 2009

Hi,

I forwarded your question to the lxml mailing list, which is a much better
place to discuss this as there are more people listening who might have an
idea.

http://comments.gmane.org/gmane.comp.python.lxml.devel/4359

usernamenumber wrote:
...
...
>> Well, what you get is well-formed XML. May I ask why you need the entity
>> references in the output?
>
> I am calculating checksums based on the combined contents of several
> specific tags within a given document. The tool I am writing is designed
> to replace a pre-existing tool, which did the same thing and stored
> those checksums for comparison. The old tool does not convert entities,
> so in order for it to not generate a slew of false-negative checksum
> mismatches when we switch over, mine can't either.

It's rarely easy to replace a tool if you are required to mimic the
original quirks. The right way to do it is to calculate the checksums on
the parsed in-memory tree rather than the serialised XML stream. The second
best solution is to serialise to canonical XML (C14N) and to work on that.
But having checksums depend on a byte stream as serialised by a specific
tool is definitely not future-proof.
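
A minimal sketch of both suggestions, under some assumptions: it uses the
standard library's xml.etree.ElementTree so that it runs without extra
packages (lxml.etree offers the same fromstring(), iter() and itertext()
calls), SHA-256 is only a placeholder for whatever hash the real tool uses,
and the tag names and sample documents are made up for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET


def tree_checksum(xml_text, tag_names):
    """Hash the parsed text of the selected tags, in document order."""
    root = ET.fromstring(xml_text)
    digest = hashlib.sha256()  # placeholder algorithm, an assumption
    for element in root.iter():
        if element.tag in tag_names:
            # itertext() returns parsed text, so predefined entity
            # references such as &quot; are already plain characters here.
            digest.update("".join(element.itertext()).encode("utf-8"))
            digest.update(b"\0")  # keep adjacent tag contents apart
    return digest.hexdigest()


def c14n_checksum(xml_text):
    """Hash the canonical (C14N 2.0) serialisation of the whole document."""
    canonical = ET.canonicalize(xml_text)  # needs Python 3.8 or later
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical documents: same content, different escaping in the source.
doc_a = "<doc><title>Say &quot;hi&quot;</title><body>x</body></doc>"
doc_b = '<doc><title>Say "hi"</title><body>x</body></doc>'
wanted = {"title", "body"}

same_tree = tree_checksum(doc_a, wanted) == tree_checksum(doc_b, wanted)
same_c14n = c14n_checksum(doc_a) == c14n_checksum(doc_b)
```

Both comparisons should come out equal, because neither checksum depends on
how a particular serialiser chose to escape the quote characters.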

To emulate the old behaviour, you could maybe build the checksum from the
in-memory tree and just replace all occurrences of »'« and »"« with their
escaped equivalents before using a text value. If your XML source documents
consistently use the entity references everywhere, this should yield the
same checksums.
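
A rough sketch of that workaround, with loudly stated assumptions: that the
source documents spell the entities as &apos; and &quot; (rather than, say,
numeric references), that only these two characters need re-escaping, and
that SHA-256 stands in for whatever hash the old tool used. The function
names and the sample document are hypothetical. The standard library's
xml.etree.ElementTree is used here; lxml.etree provides the same calls.

```python
import hashlib
import xml.etree.ElementTree as ET


def reescape_quotes(text):
    """Turn parsed quote characters back into entity references."""
    # Assumption: the old tool saw &apos; and &quot; in the raw source.
    return text.replace("'", "&apos;").replace('"', "&quot;")


def legacy_checksum(xml_text, tag_names):
    """Hash the re-escaped text of the selected tags, in document order."""
    root = ET.fromstring(xml_text)
    digest = hashlib.sha256()  # placeholder for the old tool's algorithm
    for element in root.iter():
        if element.tag in tag_names:
            text = "".join(element.itertext())
            digest.update(reescape_quotes(text).encode("utf-8"))
    return digest.hexdigest()


# Hypothetical document that uses entity references for the quotes.
sample = "<doc><title>Say &quot;hi&quot;, it&apos;s me</title></doc>"
checksum = legacy_checksum(sample, {"title"})

# What a tool hashing the raw, unconverted text would have computed.
raw_text = b"Say &quot;hi&quot;, it&apos;s me"
expected = hashlib.sha256(raw_text).hexdigest()
```

This only matches the old checksums if the source documents use the entity
references consistently; a literal quote in the source would be re-escaped
too and give a different value than the old tool did.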

Does that help?

Stefan

Stefan Behnel
