[XML-SIG] Developer's Day position paper

Andrew Kuchling akuchlin@mems-exchange.org
Tue, 18 Jan 2000 22:21:40 -0500 (EST)


Skip Montanaro writes:
>this list because of optical glazing).  I currently use
>xmllib+sgmlop+xmlrpclib to do XML-RPC stuff in a production server.  If you
>can't find a solution that is as easy to use and that performs at least as
>well, I'll have to freeze on what I have now. 

OK, hold that thought.  Attached is a simple benchmark script that uses
PyExpat and xmllib+sgmlop to parse the 279K hamlet.xml file
(ummm.. it's part of Jon Bosak's XML sample data, but I'm not sure
where to download it from).  Could someone please verify these
results, or point out some stupid error in the benchmark script?
(Since I use my system for developing the XML package, it's possible
that I'm getting an old or broken version of xmllib+sgmlop.)

[amk@mira bench]$ python pyexp.py
PyExpat w/ null handlers: 279663 bytes in 0.04 seconds = 6575.70 K/sec
PyExpat w/ StartElementHandler: 279663 bytes in 0.19 seconds = 1457.33 K/sec
PyExpat w/ Start,End: 279663 bytes in 0.25 seconds = 1102.61 K/sec
PyExpat w/ Start,End,Char,PI: 279663 bytes in 0.36 seconds = 758.03 K/sec
Fast xmllib: 279663 bytes in 3.15 seconds = 86.66 K/sec
Slow xmllib: 279663 bytes in 17.77 seconds = 15.37 K/sec
Raw sgmlop: 279663 bytes in 0.02 seconds = 11004.42 K/sec
[amk@mira bench]$

Assuming no errors in the benchmark, xmllib on top of PyExpat should
be around half as fast as xmllib on top of sgmlop, or roughly
40K/sec on my machine.  (That's just a guess, though.)  Like
economists, this benchmark probably points in several directions. :)

--amk



import time
from xml.parsers import pyexpat

# Read the whole file up front so disk I/O is not part of the timings.
f = open('hamlet.xml', 'rb')
data = f.read()
f.close()
size = len(data)

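# No-op callback: the timings then measure callback dispatch overhead,
# not any work done inside a handler.
def dummy(*args): pass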

def print_duration(parser, duration):
    print '%s: %i bytes in %.02f seconds = %.02f K/sec' % (parser, size,
                                       duration, size/duration/1024.0)

parser = pyexpat.ParserCreate( )
start = time.time()
parser.Parse( data, 1 )
print_duration('PyExpat w/ null handlers', time.time() - start)

parser = pyexpat.ParserCreate( )
parser.StartElementHandler = dummy
start = time.time()
parser.Parse( data, 1 )
print_duration('PyExpat w/ StartElementHandler', time.time() - start)

parser = pyexpat.ParserCreate( )
parser.StartElementHandler = dummy
parser.EndElementHandler = dummy
start = time.time()
parser.Parse( data, 1 )
print_duration('PyExpat w/ Start,End', time.time() - start)

parser = pyexpat.ParserCreate( )
parser.StartElementHandler = dummy
parser.EndElementHandler = dummy
parser.CharacterDataHandler = dummy
parser.ProcessingInstructionHandler = dummy
start = time.time()
parser.Parse( data, 1 )
print_duration('PyExpat w/ Start,End,Char,PI', time.time() - start)


from xml.parsers import xmllib
p = xmllib.FastXMLParser()
start = time.time()
p.feed(data)
p.close()
print_duration('Fast xmllib', time.time() - start)

p = xmllib.SlowXMLParser()
start = time.time()
p.feed(data)
p.close()
print_duration('Slow xmllib', time.time() - start)

import sgmlop
p = sgmlop.XMLParser()
start = time.time()
p.feed(data)
p.close()
print_duration('Raw sgmlop', time.time() - start)
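

For anyone trying to reproduce the PyExpat half of these numbers without
the XML package installed, here is a minimal sketch of the same
measurement against the standard library's xml.parsers.expat on Python 3.
The helper names (time_expat, throughput_kb) and the synthetic sample
document are assumptions made up for illustration; they are not part of
the script above, and the sample is only a stand-in for hamlet.xml, so
the figures it prints will not match the ones quoted in this message.

```python
import time
from xml.parsers import expat


def dummy(*args):
    """No-op handler, so only callback dispatch is measured."""
    pass


def time_expat(data, handler_names=()):
    """Parse `data` once with no-op handlers and return elapsed seconds."""
    parser = expat.ParserCreate()
    for name in handler_names:
        # Equivalent to e.g. parser.StartElementHandler = dummy
        setattr(parser, name, dummy)
    start = time.perf_counter()
    parser.Parse(data, True)  # True marks this as the final chunk
    return time.perf_counter() - start


def throughput_kb(size, duration):
    """Return K/sec; the max() guards against a zero-length interval."""
    return size / max(duration, 1e-9) / 1024.0


# Hypothetical stand-in for hamlet.xml: one root element with many lines.
sample = b"<PLAY>" + b"<LINE>To be, or not to be</LINE>" * 5000 + b"</PLAY>"

configs = [
    ("null handlers", ()),
    ("StartElementHandler", ("StartElementHandler",)),
    ("Start,End", ("StartElementHandler", "EndElementHandler")),
    ("Start,End,Char,PI", ("StartElementHandler", "EndElementHandler",
                           "CharacterDataHandler",
                           "ProcessingInstructionHandler")),
]

for label, names in configs:
    elapsed = time_expat(sample, names)
    print("expat w/ %s: %i bytes in %.04f seconds = %.02f K/sec"
          % (label, len(sample), elapsed,
             throughput_kb(len(sample), elapsed)))
```

A single run per configuration is noisy, as in the original script;
repeating each measurement a few times and taking the best would give
steadier numbers.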