
Hello,
I need to calculate the SHA-1 digest of an XML node. To do this correctly, I first have to canonicalize (C14N) the node, specifically with Exclusive XML Canonicalization: http://www.w3.org/2001/10/xml-exc-c14n
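For context, the digest is taken over the exact canonical bytes, which is why byte-for-byte C14N output matters. A minimal sketch of that last step, assuming the usual XML-DSig/WS-Security convention of Base64-encoding the raw SHA-1 digest into a DigestValue element (the helper name digest_value is hypothetical):

```python
import base64
import hashlib


def digest_value(canonical_xml: bytes) -> str:
    """Hypothetical helper: Base64-encoded SHA-1 of already-canonicalized bytes."""
    # The hash covers the exact C14N output, so a missing namespace
    # declaration or an extra space yields a completely different digest.
    raw = hashlib.sha1(canonical_xml).digest()  # 20 raw bytes
    return base64.b64encode(raw).decode("ascii")
```

It would be called on the canonical bytes, e.g. digest_value(TEST_RESULT) in the example that follows.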
Here is a minimal working example:
import io
import copy
from lxml import etree

ORIGINAL_XML = b'''<soap:Envelope xmlns:ns="http://docs.oasis-open.org/ws-sx/ws-trust/200512" xmlns:soap="http://www.w3.org/2003/05/soap-envelope">
<soap:Header>
<wsse:Security xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd" xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd">
<wsu:Timestamp wsu:Id="TS-1a8236fc-8e3a-9b71-495f-20f52709e893">
<wsu:Created>2017-09-19T07:38:45Z</wsu:Created><wsu:Expires>2017-09-19T08:38:45Z</wsu:Expires>
</wsu:Timestamp>
</wsse:Security>
</soap:Header>
<soap:Body xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd" wsu:Id="id-1ea12929-0b1a-54d4-08db-b049cee527b5">
<ns:RequestSecurityToken>
<ns:RequestType>http://docs.oasis-open.org/ws-sx/ws-trust/200512/Issue</ns:RequestType>
<ns:TokenType>http://docs.oasis-open.org/wss/oasis-wss-saml-token-profile-1.1#SAMLV2.0</ns:TokenType>
</ns:RequestSecurityToken>
</soap:Body>
</soap:Envelope>
'''.replace(b'\n', b'')

PREFIXES = ["soap", "wsse", "wsu"]

# This is known to be the "good" C14N version of the wsu:Timestamp node
GOOD_C14N_XML = b'''<wsu:Timestamp xmlns:soap="http://www.w3.org/2003/05/soap-envelope" xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd" xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd" wsu:Id="TS-1a8236fc-8e3a-9b71-495f-20f52709e893"><wsu:Created>2017-09-19T07:38:45Z</wsu:Created><wsu:Expires>2017-09-19T08:38:45Z</wsu:Expires></wsu:Timestamp>'''

root_node = etree.fromstring(ORIGINAL_XML)
test_node = root_node.find(".//{*}Timestamp")  # This is the node to calculate digest for

output = io.BytesIO()
node = copy.copy(test_node)
node.getroottree().write(output, method="c14n", inclusive_ns_prefixes=PREFIXES,
                         exclusive=True, with_comments=False, pretty_print=False)
output.seek(0)
TEST_RESULT = output.read()
assert TEST_RESULT == GOOD_C14N_XML
GOOD_C14N_XML contains the "good" canonical version of the Timestamp element. I call it "good" because a Java program produces exactly this C14N output for ORIGINAL_XML, and I must replicate that format byte for byte.
In this example program the assertion fails. GOOD_C14N_XML contains all the namespace declarations given in PREFIXES, *in that specific order*, while TEST_RESULT contains only the wsu namespace.
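On the ordering: Canonical XML sorts namespace declarations lexicographically by prefix, so soap, wsse, wsu comes out in that order because it is alphabetical, not because of the order of PREFIXES. A small check on a made-up document (the tag names and the urn:a/urn:z URIs are hypothetical), using etree.tostring() with the same C14N keyword arguments as the thread:

```python
from lxml import etree

# Hypothetical toy document: the ancestor declares "z" before "a".
DOC = b'<r xmlns:z="urn:z" xmlns:a="urn:a"><a:c/></r>'


def c14n_child(prefixes):
    """Exclusive C14N of <a:c>, treating the given prefixes as inclusive."""
    child = etree.fromstring(DOC)[0]
    return etree.tostring(
        child,
        method="c14n",
        exclusive=True,
        inclusive_ns_prefixes=prefixes,
        with_comments=False,
    )


forward = c14n_child(["z", "a"])
backward = c14n_child(["a", "z"])
print(forward)
print(forward == backward)  # list order does not change the output
```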
Questions:
* Is it possible to achieve my goal with lxml, i.e. to produce C14N output that matches GOOD_C14N_XML bit for bit?
* I have noticed that write() accepts pretty_print=True when method="c14n", but it has no effect. Shouldn't it raise an exception, the same way it does when an encoding is specified with the c14n method?
Thanks,
Laszlo

Nagy László Zsolt wrote on 22.09.2017 at 15:57:
> root_node = etree.fromstring(ORIGINAL_XML)
> test_node = root_node.find(".//{*}Timestamp")  # This is the node to calculate digest for
>
> output = io.BytesIO()
> node = copy.copy(test_node)
This is your problem. You are copying a single node out of the document, which loses the namespace declarations made on other nodes (here, its ancestors) that the copied subtree does not use itself.
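This can be checked directly: an lxml element's nsmap lists every prefix in scope, including those inherited from ancestors, so comparing it before and after copy.copy() shows what the copy loses. The toy document below is a hypothetical stand-in for the envelope, with "z" playing the role of soap/wsse:

```python
import copy

from lxml import etree

# Hypothetical toy document: "z" is declared on the root but not used by <a:c>.
DOC = b'<r xmlns:z="urn:z" xmlns:a="urn:a"><a:c/></r>'

child = etree.fromstring(DOC)[0]
copied = copy.copy(child)  # new document with <a:c> as its root

# The in-place node still sees the ancestor's "z" declaration;
# the copy keeps only the namespace it actually uses.
print(sorted(child.nsmap))
print(sorted(copied.nsmap))
```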
> node.getroottree().write(output, method="c14n", inclusive_ns_prefixes=PREFIXES, exclusive=True, with_comments=False, pretty_print=False)
Instead of (deep-)copying and asking for the root-tree, just wrap the node in a new ElementTree() and call .write() on that.
> output.seek(0)
> TEST_RESULT = output.read()
This is just a more complicated way of writing "output.getvalue()".
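Putting both suggestions together, the example could look roughly like this. It is a sketch: the soap:Body is left out of the input for brevity, on the assumption that a sibling subtree outside wsu:Timestamp does not affect its canonical form.

```python
import io

from lxml import etree

# Header part of ORIGINAL_XML; the soap:Body is omitted (outside the Timestamp subtree).
ENVELOPE = (
    b'<soap:Envelope xmlns:ns="http://docs.oasis-open.org/ws-sx/ws-trust/200512"'
    b' xmlns:soap="http://www.w3.org/2003/05/soap-envelope">'
    b'<soap:Header>'
    b'<wsse:Security'
    b' xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd"'
    b' xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd">'
    b'<wsu:Timestamp wsu:Id="TS-1a8236fc-8e3a-9b71-495f-20f52709e893">'
    b'<wsu:Created>2017-09-19T07:38:45Z</wsu:Created>'
    b'<wsu:Expires>2017-09-19T08:38:45Z</wsu:Expires>'
    b'</wsu:Timestamp>'
    b'</wsse:Security>'
    b'</soap:Header>'
    b'</soap:Envelope>'
)

GOOD_C14N_XML = (
    b'<wsu:Timestamp xmlns:soap="http://www.w3.org/2003/05/soap-envelope"'
    b' xmlns:wsse="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-secext-1.0.xsd"'
    b' xmlns:wsu="http://docs.oasis-open.org/wss/2004/01/oasis-200401-wss-wssecurity-utility-1.0.xsd"'
    b' wsu:Id="TS-1a8236fc-8e3a-9b71-495f-20f52709e893">'
    b'<wsu:Created>2017-09-19T07:38:45Z</wsu:Created>'
    b'<wsu:Expires>2017-09-19T08:38:45Z</wsu:Expires>'
    b'</wsu:Timestamp>'
)

PREFIXES = ["soap", "wsse", "wsu"]

root_node = etree.fromstring(ENVELOPE)
test_node = root_node.find(".//{*}Timestamp")

# Wrap the node itself (no copy), so the namespace declarations made on
# its ancestors stay in scope for the inclusive prefixes.
output = io.BytesIO()
etree.ElementTree(test_node).write(
    output,
    method="c14n",
    exclusive=True,
    inclusive_ns_prefixes=PREFIXES,
    with_comments=False,
)
TEST_RESULT = output.getvalue()
print(TEST_RESULT == GOOD_C14N_XML)
```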
> - I have noticed that the write method accepts pretty_print=True when write method is c14n. It has no effect. But shouldn't it throw an exception? (The same way it throws an exception when encoding is specified for c14n method.)
Ah, yes, it's ignored. I'll make it raise a warning for now. Thanks.
Stefan

> Instead of (deep-)copying and asking for the root-tree, just wrap the node in a new ElementTree() and call .write() on that.
This is exactly what I was missing. I had to preserve those namespaces because of the ec:InclusiveNamespaces declarations. But since I could not find a write() method on the Element, I created a copy and asked for its root tree. I did not know that it is possible to create multiple ElementTree() objects on a single parsed XML document.
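On the missing write() method: lxml's etree.tostring() accepts an Element directly, with the same C14N keyword arguments, and ElementTree() wrappers are views onto the already parsed document rather than copies. A sketch on a hypothetical toy document, where "z" is declared only on the ancestor:

```python
import io

from lxml import etree

# Hypothetical toy document: "z" is declared only on the ancestor.
DOC = b'<r xmlns:z="urn:z" xmlns:a="urn:a"><a:c/></r>'
OPTIONS = dict(method="c14n", exclusive=True,
               inclusive_ns_prefixes=["z"], with_comments=False)

root = etree.fromstring(DOC)
child = root[0]

# Several ElementTree wrappers can share one parsed document; none copies it.
tree_a = etree.ElementTree(child)
tree_b = etree.ElementTree(root)

buf = io.BytesIO()
tree_a.write(buf, **OPTIONS)

# etree.tostring() serializes the Element directly, no wrapper needed.
direct = etree.tostring(child, **OPTIONS)
print(buf.getvalue() == direct)
```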
Thanks for the hint. :-)