[XML-SIG] Pyana 0.2.0 released

Brian Quinlan brian@sweetapp.com
Tue, 18 Dec 2001 13:04:16 -0800


Uche Ogbuji wrote:

> -----Original Message-----
> From: uogbuji@fourthought.com [mailto:uogbuji@fourthought.com] On Behalf
> Of Uche Ogbuji
> Sent: Tuesday, December 18, 2001 11:27 AM
> To: brian@sweetapp.com
> Cc: 'Martin v. Loewis'; xml-sig@python.org; python-list@python.org
> Subject: Re: [XML-SIG] Pyana 0.2.0 released
> 
> > Uche Ogbuji wrote:
> >
> > > > PIRXX is focused on providing Xerces XML services to Python. The
> > > > current release of PIRXX provides SAX2 interfaces but I believe that
> > > > Jürgen is working on DOM support.
> > > >
> > > > So, right now, Pyana is probably your best bet for high-performance
> > > > XSLT processing in Python while PIRXX offers Xerces SAX2 interfaces.
> > >
> > > Are you basing this on actual benchmarks?  In particular, I'd be
> > > surprised if Pyana was faster overall than current CVS of 4XSLT, since
> > > Xalan isn't, as I measure it.
> >
> > I am basing this on the timings of largish transformations that I was
> > doing around 2 months ago. Since then I haven't really compared them
> > and I have never run any formal benchmarks.
> >
> > Note that one of the big problems with timing Xalan from the command
> > line is that it is very slow to load, especially on Windows. I just
> > timed "import Pyana" on my PIV 1.7GHz and it took 0.74s. But the
> > beauty of using Pyana instead of something like "popen('xalan ..." is
> > that the load time becomes a one-time cost for the application.
> >
> > For fun, I just downloaded:
> > http://www.datapower.com/XSLTMark/download/XSLTMark_2_1_0.zip
> >
> > And wrote the attached script. I did this without expending any
> > effort trying to understand the benchmark suite; I just test each
> > .xml/.xsl pair. Notice that all of the source/stylesheet documents are
> > small so the advantage should go to 4suite.
> >
> > I don't want to get 4suite from CVS so why don't you get Pyana:
> 
> No.  I'm no more interested in running a benchmark between the two
> than you are.  I have much better things to do, like actually working
> to improve 4suite.  Therefore, I know better than to make such
> unsubstantiated comments as "foo is your best bet for high-performance
> XSLT processing".

My substantiation was my personal experience a few months ago. I also
have several user testimonials stating that they are using Pyana instead
of 4suite for performance reasons. One actually sent me their timings,
which [at that time] demonstrated a 41x performance edge for Pyana.

But, since you are being so picky, I tested the latest Pyana release
(0.2.0) against the latest 4suite release (0.11.1), using the test
script that I attached in the previous e-mail*:

time to import Pyana (probably cached or something): 0.0916s
time to import 4suite (already byte-compiled): 0.9955s
Pyana:  time to execute axis: 0.0093s
4suite: time to execute axis: 0.1088s
Pyana:  time to execute bottles: 0.0165s
4suite: time to execute bottles: 0.2519s
Pyana:  time to execute brutal: 0.0139s
4suite: time to execute brutal: 0.1658s
Pyana:  time to execute chart: 0.0122s
4suite: time to execute chart: 0.1397s
Pyana:  time to execute current: 0.0067s
4suite: time to execute current: 0.0520s
Pyana:  time to execute game: 0.0095s
4suite: time to execute game: 0.1022s
Pyana:  time to execute html: 0.0065s
4suite: time to execute html: 0.0491s
Pyana:  time to execute identity: 0.0061s
4suite: time to execute identity: 0.0121s
Pyana:  time to execute inventory: 0.0091s
4suite: time to execute inventory: 0.1493s
Pyana:  time to execute metric: 0.0114s
4suite: time to execute metric: failed!
Pyana:  time to execute number: 0.0089s
4suite: time to execute number: failed!
Pyana:  time to execute oddtemplate: 0.0099s
4suite: time to execute oddtemplate: 0.0851s
Pyana:  time to execute priority: 0.0095s
4suite: time to execute priority: 0.0810s
Pyana:  time to execute products: 0.0107s
4suite: time to execute products: 0.3034s
Pyana:  time to execute queens: 0.0855s
4suite: time to execute queens: 2.3471s
Pyana:  time to execute tower: 0.1664s
4suite: time to execute tower: 6.7971s
Pyana:  time to execute trend: 0.0518s
4suite: time to execute trend: 2.6534s
Pyana:  time to execute union: 0.0056s
4suite: time to execute union: 0.0382s
Pyana:  time to execute xpath: 0.0067s
4suite: time to execute xpath: 0.1720s
Pyana:  time to execute xslbench1: 0.0158s
4suite: time to execute xslbench1: 0.1960s
Pyana  total: 0.4720s
4suite total: 14.0028s
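
For readers who want ratios rather than raw times, the output above can be
reduced per test. The following is only a sketch: the function name
`speedups` and the regular expression are illustrative assumptions based on
the line format shown above, not part of the attached script, and the sample
holds just two of the posted pairs plus one failure.

```python
import re

# Matches lines such as "Pyana:  time to execute axis: 0.0093s".
# The pattern is an assumption based on the output format shown above.
LINE = re.compile(r"^(Pyana|4suite):\s+time to execute (\w+): ([\d.]+)s$")

def speedups(output):
    """Return {test: 4suite_time / Pyana_time} for tests both engines completed."""
    times = {}
    for line in output.splitlines():
        match = LINE.match(line.strip())
        if match:  # lines ending in "failed!" do not match and are skipped
            engine, test, seconds = match.groups()
            times.setdefault(test, {})[engine] = float(seconds)
    return {
        test: t["4suite"] / t["Pyana"]
        for test, t in times.items()
        if "Pyana" in t and "4suite" in t
    }

# A few lines copied from the results above.
SAMPLE = """\
Pyana:  time to execute identity: 0.0061s
4suite: time to execute identity: 0.0121s
Pyana:  time to execute metric: 0.0114s
4suite: time to execute metric: failed!
Pyana:  time to execute tower: 0.1664s
4suite: time to execute tower: 6.7971s
"""

if __name__ == "__main__":
    for test, ratio in sorted(speedups(SAMPLE).items()):
        print("%-10s %.1fx" % (test, ratio))
```

Tests that 4suite failed drop out of the result, since there is no time to
compare against.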

The first 4suite failure was caused by this XPath: " . * 1000"
The second was caused by a <xsl:decimal-format/> element. The exception
says:

XsltException: Illegal Element "decimal-format" in XSLT Namespace (see
XSLT Spec: 2.1).

I looked it up in the XSLT spec and the context seems to be appropriate.


> I'm sure we can all just get along: multiple XSLT implementations for
> Python is a Good Thing.

Agreed

Cheers,
Brian

Here is the 4suite code [please let me know if it can be optimized]:

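# The import assumes the 4suite 0.11.x module layout.
from xml.xslt.Processor import Processor

processor = Processor()
processor.appendStylesheetUri(styleFile)  # load the stylesheet
processor.runUri(sourceFile)              # transform the source document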
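
To show where the snippet above would sit in a timing loop, here is a minimal
sketch of a per-pair harness of the kind described earlier. It is not the
attached script: `find_pairs`, `time_pairs`, `report`, and the `transform`
callable are hypothetical names, and the only assumption about the benchmark
directory is that each test is a matching .xml/.xsl pair.

```python
import os
import time

def find_pairs(directory):
    """Yield (name, xml_path, xsl_path) for each .xml file with a matching .xsl."""
    for filename in sorted(os.listdir(directory)):
        name, ext = os.path.splitext(filename)
        xsl_path = os.path.join(directory, name + ".xsl")
        if ext == ".xml" and os.path.exists(xsl_path):
            yield name, os.path.join(directory, filename), xsl_path

def time_pairs(directory, transform):
    """Time transform(xml_path, xsl_path) per pair; None marks a failure."""
    results = {}
    for name, xml_path, xsl_path in find_pairs(directory):
        start = time.perf_counter()
        try:
            transform(xml_path, xsl_path)
            results[name] = time.perf_counter() - start
        except Exception:
            results[name] = None  # printed as "failed!" by report()
    return results

def report(label, results):
    """Print one line per test plus a total over the tests that succeeded."""
    for name, seconds in results.items():
        status = "failed!" if seconds is None else "%.4fs" % seconds
        print("%s: time to execute %s: %s" % (label, name, status))
    total = sum(s for s in results.values() if s is not None)
    print("%s total: %.4fs" % (label, total))
    return total
```

Each engine would be wrapped in its own `transform` function (for 4suite, one
that builds a Processor, appends the stylesheet, and runs the source). Import
time is deliberately left outside the loop, since it is a one-time cost.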