[XML-SIG] 4XSLT Performance Problems with Large Files

Thomas B. Passin tpassin@home.com
Sat, 15 Sep 2001 01:07:15 -0400


I just downloaded the latest version of 4Suite (0.11.1), along with PyXML
0.66, for both Python 1.5.2 and 2.1.1, on Windows.  I want to report on
their performance when transforming a 4 MB XML source file, and especially on
the terrible performance of the 4xslt.bat batch file that gets installed in
the Scripts directory.

The computer: Win98 SE, 450 MHz Pentium 3, 256 MB RAM.

At the start of the tests, I had from 194 to 210 MB free.  I ran Saxon
5.5.1, msxsl (the Microsoft command-line wrapper around msxml3), a Python
script 4xslt.py that I wrote some time ago, and the 4xslt.bat file as
supplied with 4Suite.  My script uses cDomlette; I don't know what
4xslt.bat uses, though I suspect it's not cDomlette, for reasons that will
become apparent.

Source file size: 3.94 MB
Result file size: 667 KB
Main stylesheet imports two others and builds keys to index into lookup
tables in a separate file.

Results:

processor     time to transform (s)          remarks
msxsl         7
saxon         10
4xslt.py      167                            Python 1.5.2
4xslt.py      201                            Python 2.1.1
4xslt.bat     gave up at 253                 Python 1.5.2 (see below)
4xslt.bat     gave up when memory ran out    Python 2.1.1 (see below)

I gave up on 4xslt.bat because it used up all 194 MB of free memory and then
started swapping to virtual memory, using more and more of it until I quit.
(Previously I had waited much longer without it ever finishing, but I did not
get an accurate timing.)
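For anyone who wants to repeat these runs, the timings above amount to wall-clock time with a manual cutoff.  A minimal sketch of such a harness in modern Python is below, assuming each processor can be launched as a command line; the command lists in the example are placeholders (a Python child process standing in for a real processor), not the actual msxsl, Saxon, or 4xslt.bat invocations.

```python
import subprocess
import sys
import time


def time_command(argv, timeout_s=None):
    """Run argv to completion and return the elapsed wall-clock seconds.

    Returns None if the command is still running after timeout_s seconds
    (the equivalent of "gave up"); subprocess.run kills the child in that
    case.  Raises CalledProcessError if the command exits with an error.
    """
    start = time.perf_counter()
    try:
        subprocess.run(argv, check=True, timeout=timeout_s,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    except subprocess.TimeoutExpired:
        return None
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder commands: substitute the real command line for each processor.
    runs = {
        "fast stand-in": [sys.executable, "-c", "import time; time.sleep(0.1)"],
        "slow stand-in": [sys.executable, "-c", "import time; time.sleep(3)"],
    }
    for name, argv in runs.items():
        elapsed = time_command(argv, timeout_s=1.0)
        if elapsed is None:
            print(f"{name:15} gave up after 1.0 s")
        else:
            print(f"{name:15} {elapsed:.2f} s")
```

The timeout plays the role of "gave up at 253": a run that exceeds it is reported as unfinished rather than left to swap indefinitely.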

Here is the memory used by the various processors during processing.

processor     decrease in free memory (MB)   remarks
msxsl         17
saxon         21
4xslt.py      32                             Python 1.5.2
4xslt.py      45                             Python 2.1.1
4xslt.bat     > 194

The 167 seconds with my script is not acceptable for my particular
application, but the behavior when the transformation is launched by
4xslt.bat makes it unusable.  Why should the very same transformation take
ten times the memory that msxsl or Saxon use?  You can't have an application
run down your memory like this.  And I don't even know how much virtual
memory was used on top of the 194 MB.  These results have been reasonably
repeatable tonight.
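To put the "ten times" figure on a firmer footing, here is a quick Python sketch that divides each measurement by the msxsl baseline, using only the numbers from the two tables above.  The 4xslt.bat memory entry is the 194 MB lower bound, so its ratio is a lower bound too.

```python
# Figures from the two tables above: seconds and MB of free memory consumed.
times_s = {
    "msxsl": 7,
    "saxon": 10,
    "4xslt.py (Py 1.5.2)": 167,
    "4xslt.py (Py 2.1.1)": 201,
}
memory_mb = {
    "msxsl": 17,
    "saxon": 21,
    "4xslt.py (Py 1.5.2)": 32,
    "4xslt.py (Py 2.1.1)": 45,
    "4xslt.bat": 194,  # lower bound: free RAM was exhausted, then it swapped
}


def ratios(table, baseline):
    """Return each entry of table divided by the baseline entry."""
    base = table[baseline]
    return {name: value / base for name, value in table.items()}


if __name__ == "__main__":
    for label, table in (("time", times_s), ("memory", memory_mb)):
        for name, r in ratios(table, "msxsl").items():
            print(f"{label:6} {name:20} {r:5.1f}x msxsl")
```

Against msxsl, 4xslt.bat consumed at least 194 / 17, or about 11.4 times the memory (about 9.2 times Saxon's 21 MB), and even the cDomlette-based script took roughly 24 times as long.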

I hope something can be done to improve the performance and memory usage for
large files.  How about it, Uche and Mike?  Any thoughts about what is
happening here?

I'll be happy to send my files for testing if anyone likes.  The source file
is pretty horrid (I don't have any control over that, I'm afraid).  It has
very long paths, and the element names are extremely long, the result of
machine translation of some CORBA IDL.  I wonder if that has something to do
with the results.  However, the stylesheet doesn't do anything very complex;
it just has to do it over 3.94 MB of input.  Saxon and MS do get through it
reasonably quickly.

Cheers,

Tom P