[XML-SIG] Re: C14N Performance

Joseph Reagle reagle@w3.org
Wed, 29 May 2002 12:33:06 -0400


On Tuesday 28 May 2002 04:31 pm, Joseph Reagle wrote:
> Doing XPath selection and C14N has *terrible* performance with PyXML. For
> a 100K xml file, it's not even worth it, I'll walk away from the computer
> come back later, and it'll still be working at it. The big problem is the
> XPath evaluation. The default nodeset that is canonicalized is to "blow
> up" the input document/nodeset akin to a "pattern = '(//. | //@* |
> //namespace::*)'" [1] I know this is a very bad (slow) evaluation (this
> is where I time-out) so can anyone suggest an optimization/alternative I
> can use in its place?

If I'm canonicalizing a whole file (not an XPath subset), the problem is
easily remedied: don't do the XPath evaluation if you don't need to.

diff -r1.4 test_c14n.py
286c286,287
<     query = '(//. | //@* | //namespace::*)'
---
>     #query = '(//. | //@* | //namespace::*)'
>     query = None
331c332,335
<     nodelist = xpath.Evaluate(query, context=context)
---
>     if query:
>         nodelist = xpath.Evaluate(query, context=context)
>     else:
>         nodelist = None
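
For anyone who wants to try the guard outside of test_c14n.py, here is a
minimal standalone sketch of the same idea. It assumes the canonicalizer
treats a node list of None as "the whole document", as the patch above
does. The names select_nodes and fake_evaluate are made up for
illustration; fake_evaluate only stands in for an XPath evaluator such as
xpath.Evaluate and is not PyXML code.

```python
from xml.dom import minidom


def select_nodes(doc, query, evaluate):
    """Return the node list to canonicalize, or None for the whole document.

    `evaluate` is a stand-in for an XPath evaluator; it is only called
    when a query is actually given, which is the point of the patch.
    """
    if query:
        return evaluate(query, doc)
    return None


calls = []


def fake_evaluate(query, doc):
    # Hypothetical evaluator: records the query and returns every element.
    calls.append(query)
    return doc.getElementsByTagName("*")


doc = minidom.parseString("<a><b/><c/></a>")

# Whole-document case: no query, so the evaluator is never invoked.
whole = select_nodes(doc, None, fake_evaluate)

# Subset case: the (potentially slow) evaluation still happens.
subset = select_nodes(doc, "//*", fake_evaluate)
```

The expensive evaluation is only paid for when a subset is really
requested; the whole-file case skips it entirely.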

Arbitrary