Re: [lxml-dev] resolve_entities=False seems to have no effect
Hi,

I forwarded your question to the lxml mailing list, which is a much better place to discuss this, as there are more people listening who might have an idea.

http://comments.gmane.org/gmane.comp.python.lxml.devel/4359

usernamenumber wrote:
Well, what you get is well-formed XML. May I ask why you need the entity references in the output?
I am calculating checksums based on the combined contents of several specific tags within a given document. The tool I am writing is designed to replace a pre-existing tool, which did the same thing and stored those checksums for comparison. The old tool does not convert entities, so in order for it to not generate a slew of false-negative checksum mismatches when we switch over, mine can't either.
It's rarely easy to replace a tool if you are required to mimic the original's quirks. The right way to do it is to calculate the checksums on the parsed in-memory tree rather than on the serialised XML stream. The second-best solution is to serialise to canonical XML (C14N) and to work on that. But having checksums depend on a byte stream as serialised by a specific tool is definitely not future-proof.

To emulate the old behaviour, you could maybe build the checksum from the in-memory tree and just replace all occurrences of »'« and »"« in each text value with their escaped equivalents before feeding that value into the checksum. If your XML source documents consistently use the entity references everywhere, this should yield the same checksums.

Does that help?

Stefan
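
The emulation suggested above could look roughly like the following. This is only a sketch under several assumptions: the tag names in `CHECKSUM_TAGS`, the helper names `escape_quotes` and `tree_checksum`, and the choice of MD5 are all hypothetical, since the thread does not say which tags or which hash the old tool used. It uses the standard library's `xml.etree.ElementTree` so that it runs without extra packages; `lxml.etree` provides the same `fromstring()`, `iter()` and `itertext()` calls.

```python
import hashlib
import xml.etree.ElementTree as ET  # lxml.etree offers the same calls used here

# Hypothetical tag names; substitute the tags the old tool actually checksummed.
CHECKSUM_TAGS = ("title", "body")


def escape_quotes(text):
    """Put back the entity references the parser resolved for ' and "."""
    return text.replace("'", "&apos;").replace('"', "&quot;")


def tree_checksum(xml_bytes, tags=CHECKSUM_TAGS):
    """Checksum the combined text of the selected tags, in document order."""
    root = ET.fromstring(xml_bytes)
    digest = hashlib.md5()  # assumption: the old tool's hash is not stated
    for elem in root.iter():
        if elem.tag in tags:
            # Text comes from the parsed tree, not from the serialised stream.
            text = "".join(elem.itertext())
            digest.update(escape_quotes(text).encode("utf-8"))
    return digest.hexdigest()
```

With this, a document that writes `&apos;` and one that writes a literal `'` in the same place produce the same checksum, because both are normalised to the escaped form. It only matches the old tool's values if the source documents use the entity references consistently, as noted above, and it covers only the two quote characters mentioned in this thread.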
participants (1): Stefan Behnel