Idempotent XML processing
Robert Kern
rkern at ucsd.edu
Fri Aug 19 14:20:38 EDT 2005
Michael Ekstrand wrote:
> Hello all,
>
> In my current project, I am working with XML data in a protocol that has
> checksum/signature verification of a portion of the document. There is
> an envelope with a header element, containing signature data; following
> the header is a body. The signatures are computed as cryptographic
> checksums of the entire Body element, including start and end tags,
> exactly as it appears in the data transmission.
>
> Therefore, I need to extract the entire text of an element of an XML
> document. I have a function that scans an XML string and does this, but
> it seems like a rather clumsy way to accomplish this task. I've been
> playing with xml.dom.minidom and its toxml() method, but to no avail -
> the server sends me XML with empty elements as full open/close tags,
> but toxml() serializes them to the XML empty element (<Element/>), so
> the checksum winds up not matching.
>
> Is there some parsing mechanism (using PyXML or any other freely usable
> 3rd party library is an option) that will allow me to accomplish this?
> Or am I best off sticking with my little string scanning function?
Read up on XML canonicalization (abbreviated as c14n). lxml implements
it, as does xml.dom.ext.c14n in PyXML. You'll need to canonicalize on
both ends before hashing.
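
A minimal sketch of the idea, not a drop-in for your protocol. It uses
xml.etree.ElementTree.canonicalize() from the standard library (Python 3.8
and later, C14N 2.0) in place of lxml or PyXML. The names body_digest and
canonical_body, the "Body" tag, and the choice of SHA-256 are assumptions
for illustration, and the sample documents carry no namespaces. Re-serializing
a sub-element before canonicalizing is a simplification of true
document-subset canonicalization, so both ends would have to use the same
procedure.

```python
import hashlib
import xml.etree.ElementTree as ET


def canonical_body(xml_text, tag="Body"):
    """Return the canonical (C14N 2.0) text of the first <tag> element."""
    root = ET.fromstring(xml_text)
    body = root if root.tag == tag else root.find(f".//{tag}")
    if body is None:
        raise ValueError(f"no <{tag}> element found")
    # Drop any text that follows the closing tag; it is not part of the element.
    body.tail = None
    serialized = ET.tostring(body, encoding="unicode")
    # canonicalize() expands empty elements to start/end pairs, sorts
    # attributes, and normalizes quoting.
    return ET.canonicalize(serialized)


def body_digest(xml_text, tag="Body"):
    """Hash the canonical form, so equivalent serializations agree."""
    canonical = canonical_body(xml_text, tag)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two serializations of the same Body: one with empty-element tags, one with
# full open/close tags and a different attribute order and quoting.
doc_a = ('<Envelope><Header><Sig>x</Sig></Header>'
         '<Body><Item id="1" name="a"/><Empty/></Body></Envelope>')
doc_b = ("<Envelope><Header><Sig>y</Sig></Header>"
         "<Body><Item name='a' id='1'></Item><Empty></Empty></Body></Envelope>")

print(body_digest(doc_a) == body_digest(doc_b))
```

Whitespace inside the Body is still significant by default, so this only
helps if the sender hashes the canonical form as well.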
To paraphrase an Old Master, if you are running a cryptographic hash
over a non-canonical XML string representation, then you are living in a
state of sin.
--
Robert Kern
rkern at ucsd.edu
"In the fields of hell where the grass grows high
Are the graves of dreams allowed to die."
-- Richard Harter