[XML-SIG] Performance question

05 Nov 2002 10:02:48 +0000

"Fred L. Drake, Jr." <fdrake@acm.org> writes:

> Bryan Pendleton writes:
>  > I was trying to figure out what sort of XML Parser
>  > performance I could expect out of pyxml. I'm using
>  > Python 2.2.2 under Windows 2000 with pyxml 0.8.1.
> 
> My test below was run using Python 2.2.2 on RedHat Linux 7.2 using
> PyXML from CVS.

If you want _another_ factor of 10, go to PyLTXML.  The report below
is from Python 2.2.1 on RedHat Linux 7.2 using PyXML 0.8.1 and
PyLTXML-1.3-2.

I used Fred's driver and added two new functions to test bit-level and
tree-level access via PyLTXML.

parser performance test
100 parses took 3.88 seconds, or 0.04 seconds/parse
100 parses took 0.25 seconds, or 0.00 seconds/parse
100 parses took 0.02 seconds, or 0.00 seconds/parse
100 parses took 0.03 seconds, or 0.00 seconds/parse

The first measurement is the original 4DOM DOM builder, the second is
the expatbuilder, the third is PyLTXML returning the whole tree, the
fourth is PyLTXML returning every bit (start tag, end tag, text).  I
guess the tree is faster because it's slightly lazy wrt Python
structures, i.e. only the root is in Python form as returned, the rest
gets converted from the native C structs as you walk the Python tree.
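As a rough check on the "factor of 10" claim, the per-parse times and
ratios can be worked out from the totals reported above. This is only a
sketch of the arithmetic: the labels and helper names are made up for
illustration, and since the totals are printed to two decimal places the
ratios for the two PyLTXML runs are coarse.

```python
# Elapsed seconds for 100 parses, copied from the report above.
# The dictionary keys are illustrative labels, not names used by the script.
timings = {
    "4DOM builder": 3.88,
    "expatbuilder": 0.25,
    "PyLTXML tree (ItemParse)": 0.02,
    "PyLTXML bits (GetNextBit)": 0.03,
}
RUNS = 100


def per_parse_ms(total_seconds, runs=RUNS):
    """Average time per parse in milliseconds."""
    return total_seconds / runs * 1000.0


def speedup(slower, faster):
    """How many times faster `faster` is than `slower`, from the totals."""
    return timings[slower] / timings[faster]


for name, total in timings.items():
    print("%-26s %6.2f ms/parse" % (name, per_parse_ms(total)))

print("expatbuilder vs 4DOM:      %.1fx" % speedup("4DOM builder", "expatbuilder"))
print("PyLTXML tree vs expat:     %.1fx" % speedup("expatbuilder", "PyLTXML tree (ItemParse)"))
print("PyLTXML bits vs expat:     %.1fx" % speedup("expatbuilder", "PyLTXML bits (GetNextBit)"))
```

Printing milliseconds avoids the "0.00 seconds/parse" rounding seen in
the report, where the two-decimal format hides the differences between
the faster parsers.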

Here are the additions I made to Fred's version of the script:

import PyLTXML

. . .

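def allBits(s):
  # Pull every bit (start tag, end tag, text) until the input runs out.
  # Use the argument s, not the global s1.
  f=PyLTXML.OpenString(s,PyLTXML.NSL_read|PyLTXML.NSL_read_namespaces)
  b=PyLTXML.GetNextBit(f)
  while b:
    b=PyLTXML.GetNextBit(f)
  PyLTXML.Close(f)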

def itemParse(s):
  # Skip to the first start tag, then build the whole tree from it.
  # Use the argument s, not the global s1.
  f=PyLTXML.OpenString(s,PyLTXML.NSL_read|PyLTXML.NSL_read_namespaces)
  b=PyLTXML.GetNextBit(f)
  while b.type!='start':
    b=PyLTXML.GetNextBit(f)
  d=PyLTXML.ItemParse(f,b.item)
  PyLTXML.Close(f)
  return d

. . .

doTest(100, s1, itemParse)
doTest(100, s1, allBits)
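Fred's driver is not reproduced in this message, so for anyone who wants
to repeat the comparison, here is a minimal sketch of what a
doTest-style harness might look like. The name do_test, its signature
(inferred from the calls above), and the sample document are all
assumptions, and xml.dom.minidom from the standard library stands in for
the parsers measured here so that the sketch runs without PyXML or
PyLTXML installed.

```python
import time
from xml.dom import minidom


def do_test(n, s, parse):
    """Call parse(s) n times and report the total and per-parse time.

    Hypothetical stand-in for the driver's doTest; not Fred's code.
    """
    start = time.perf_counter()
    for _ in range(n):
        parse(s)
    elapsed = time.perf_counter() - start
    # Four decimals per parse, so fast parsers do not all show as 0.00.
    print("%d parses took %.2f seconds, or %.4f seconds/parse"
          % (n, elapsed, elapsed / n))
    return elapsed


def minidom_parse(s):
    """Stand-in parser: build a DOM tree with the standard library."""
    return minidom.parseString(s)


# Small made-up document; the original test input s1 is not shown.
sample = "<doc><item id='1'>text</item><item id='2'>more</item></doc>"
elapsed = do_test(100, sample, minidom_parse)
```

Any function that takes the document string as its only argument, such
as allBits or itemParse above, could be passed in place of
minidom_parse.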

-- 
  Henry S. Thompson, HCRC Language Technology Group, University of Edinburgh
          W3C Fellow 1999--2002, part-time member of W3C Team
     2 Buccleuch Place, Edinburgh EH8 9LW, SCOTLAND -- (44) 131 650-4440
	    Fax: (44) 131 650-4587, e-mail: ht@cogsci.ed.ac.uk
		     URL: http://www.ltg.ed.ac.uk/~ht/
 [mail really from me _always_ has this .sig -- mail without it is forged spam]