[XML-SIG] Performance question

Fred L. Drake, Jr. fdrake@acm.org
Mon, 4 Nov 2002 17:16:22 -0500


Bryan Pendleton writes:
 > I was trying to figure out what sort of XML Parser
 > performance I could expect out of pyxml. I'm using
 > Python 2.2.2 under Windows 2000 with pyxml 0.8.1.

My test below was run using Python 2.2.2 on RedHat Linux 7.2 using
PyXML from CVS.

 > I wrote the following trivial little program, and it
 > seems to be showing me that I can get between 20 and
 > 25 parses per second on my PC. Is this a reasonable
 > result to achieve? I was hoping to be able to get 
 > many hundreds of parses a second, so getting only 20
 > or so was rather alarming.

That doesn't sound good to me, but I noticed you were using 4DOM,
which I don't normally use.  I decided to try your test with minidom
instead, and got very different results.

I changed your script to allow the parseString function to be passed
in to doTest(), and added an import and a second call to doTest:

from xml.dom import expatbuilder
doTest(100, s1, expatbuilder.parseString)

I got this output:

parser performance test
100 parses took 7.44 seconds, or 0.07 seconds/parse
100 parses took 0.47 seconds, or 0.00 seconds/parse

(The first measurement is the original 4DOM DOM builder, and the
second is expatbuilder, which builds minidom documents.)
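Working the totals through shows what the %.2f format in doTest() hides: expatbuilder's 0.47 seconds is about 0.0047 seconds per parse, not zero. A quick check of the arithmetic, using only the figures reported above (the dictionary labels are just for display):

```python
# Totals reported above for 100 parses of the sample document.
runs = 100
totals = {"4DOM (PyExpat.Reader)": 7.44, "expatbuilder (minidom)": 0.47}

for name, seconds in totals.items():
    per_parse = seconds / runs   # seconds per parse
    rate = runs / seconds        # parses per second
    print("%-24s %.4f s/parse, %.0f parses/s" % (name, per_parse, rate))

# How many times faster the expatbuilder run was than the 4DOM run.
speedup = totals["4DOM (PyExpat.Reader)"] / totals["expatbuilder (minidom)"]
print("speedup: %.1fx" % speedup)
```

On that machine, expatbuilder comes out at roughly 213 parses per second versus about 13 for 4DOM, much closer to the rate Bryan was hoping for.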

I've attached the modified script.
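4DOM ships with PyXML rather than with the standard library, so running the 4DOM half of the attached script needs PyXML installed. A minimal sketch of just the expatbuilder half, using the standard library's xml.dom.expatbuilder module: do_test() and SAMPLE are hypothetical names, the <record> wrapper and its xmlns declarations are assumed (expat needs a single root element and declared prefixes), and timing uses time.time() as the original does. It reports milliseconds per parse so a fast builder doesn't print as 0.00.

```python
import time
from xml.dom import expatbuilder

# Hypothetical sample: the <record> wrapper and xmlns declarations are
# assumptions that make the fragment well-formed for expat.
SAMPLE = '''<?xml version="1.0" encoding="UTF-8"?>
<record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<participant id="83b3f0" xsi:type="xsd:string">built-in supplier</participant>
<role id="848ac0" xsi:type="xsd:string">Supplier</role>
<message id="8dbb60" xsi:type="xsd:string">Processing catalog request</message>
<indentLevel xsi:type="xsd:int">0</indentLevel>
<timestamp id="1284630" xsi:type="xsd:string">Tue Oct 29 15:03:48 2002</timestamp>
</record>'''


def do_test(num_times, text, parse):
    """Call parse(text) num_times; print timing, return (elapsed, last document)."""
    doc = None
    start = time.time()
    for _ in range(num_times):
        doc = parse(text)
    elapsed = time.time() - start
    # Milliseconds per parse, so a fast builder is not rounded down to 0.00.
    print("%d parses took %.2f seconds, or %.2f ms/parse"
          % (num_times, elapsed, 1000.0 * elapsed / num_times))
    return elapsed, doc


elapsed, doc = do_test(100, SAMPLE, expatbuilder.parseString)
```

On current Python, xml.dom.minidom.parseString() called without a parser argument delegates to expatbuilder.parseString(), so this measures the same builder.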

 > Is there anything I can do to make this code faster?

Apparently: use minidom instead of 4DOM.  I suspect you could also
use one of the other DOM implementations from 4Thought; they have


Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Zope Corporation

[Attachment: modified perf test]

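# Modified perf test: time 4DOM (PyExpat.Reader) against expatbuilder.
import time
from xml.dom.ext.reader import PyExpat
from xml.dom import expatbuilder

def parseString(s):
    # Build a 4DOM document; a new Reader is created for every parse.
    reader = PyExpat.Reader()
    return reader.fromString(s)

def doTest(numTimes, s, parse):
    # Time numTimes calls of parse(s) and return the last document built.
    t1 = time.time()
    for i in range(numTimes):
        d = parse(s)
    t2 = time.time()
    print '%d parses took %.2f seconds, or %.2f seconds/parse' % \
        ( numTimes, t2 - t1, (t2 - t1) / numTimes )
    return d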

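# The <record> wrapper and its xmlns declarations are assumed: expat needs a
# single root element, and the xsi: prefix must be declared when namespace
# processing is on.
s1 = '''<?xml version="1.0" encoding="UTF-8"?>
<record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<participant id="83b3f0" xsi:type="xsd:string">built-in supplier</participant>
<role id="848ac0" xsi:type="xsd:string">Supplier</role>
<message id="8dbb60" xsi:type="xsd:string">Processing catalog request</message>
<indentLevel xsi:type="xsd:int">0</indentLevel>
<timestamp id="1284630" xsi:type="xsd:string">Tue Oct 29 15:03:48 2002</timestamp>
</record>'''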

print 'parser performance test'
doTest(100, s1, parseString)               # 4DOM via PyExpat.Reader
doTest(100, s1, expatbuilder.parseString)  # minidom via expatbuilder
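Switching builders should not mean rewriting the code that reads the tree: minidom and 4DOM both implement the W3C DOM Core interfaces, so accessors such as getElementsByTagName(), getAttribute() and childNodes should carry over. A small sketch with minidom, using a trimmed, hypothetical version of the sample record (the <record> wrapper and the field_text() helper are assumptions for illustration):

```python
from xml.dom import minidom

# Hypothetical, trimmed record; the wrapper element and xmlns
# declarations are assumed so that expat accepts the document.
DOC = '''<record xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
        xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<message id="8dbb60" xsi:type="xsd:string">Processing catalog request</message>
<indentLevel xsi:type="xsd:int">0</indentLevel>
</record>'''


def field_text(doc, tag):
    """Return the concatenated text children of the first <tag> element."""
    node = doc.getElementsByTagName(tag)[0]
    return "".join(child.data for child in node.childNodes
                   if child.nodeType == child.TEXT_NODE)


doc = minidom.parseString(DOC)
message = field_text(doc, "message")
indent = int(field_text(doc, "indentLevel"))
msg_id = doc.getElementsByTagName("message")[0].getAttribute("id")
print("%s (id %s), indent level %d" % (message, msg_id, indent))
```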