[XML-SIG] Re: XML parser benchmarks

Tue, 11 May 1999 10:52:54 -0500

Robb Shecter wrote:

> The tests were apparently done with the unix "time" command, by
> shelling out, and starting a new process for each document. This
> means that the interpreter-based languages get hit with two
> disadvantages: 1) They're penalized for VM startup and shutdown
> times.  2) After parsing a document, all loaded objects, references,
> cached whatevers and knowledge gained are thrown away, and can't be
> used for the next document.
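
For anyone who wants to check how much of a "time"-style measurement
is interpreter startup rather than parsing, here is a rough Python
sketch. It is not the script used for the benchmarks. The synthetic
document and the helper names (make_document, parse_in_process,
interpreter_startup_seconds) are my own inventions for illustration;
the only real API assumed is the standard library's xml.parsers.expat
binding.

```python
import subprocess
import sys
import time
import xml.parsers.expat


def make_document(n_items):
    """Build a synthetic XML document (a stand-in, not the benchmark data)."""
    items = "".join(
        "<item id='%d'>text %d</item>" % (i, i) for i in range(n_items)
    )
    return ("<root>" + items + "</root>").encode("utf-8")


def parse_in_process(data):
    """Parse with expat in the current interpreter.

    Returns (elapsed seconds, number of start tags seen). Only the
    parse itself is timed, so interpreter startup is excluded.
    """
    count = [0]

    def start_element(name, attrs):
        count[0] += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    started = time.perf_counter()
    parser.Parse(data, True)
    return time.perf_counter() - started, count[0]


def interpreter_startup_seconds():
    """Time a fresh interpreter that starts and exits without doing work."""
    started = time.perf_counter()
    subprocess.run([sys.executable, "-c", "pass"], check=True)
    return time.perf_counter() - started


if __name__ == "__main__":
    data = make_document(20000)
    parse_s, n = parse_in_process(data)
    startup_s = interpreter_startup_seconds()
    print("elements parsed: %d" % n)
    print("parse only:      %.4f s" % parse_s)
    print("startup only:    %.4f s" % startup_s)
```

Comparing the two printed times for documents of different sizes
gives a feel for how much a per-process measurement is inflated by
startup alone.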

There are a few things to consider when claiming unfairness in
the tests:

 - The larger files pay a much smaller relative penalty for VM
startup. The test data had files of roughly 150K, 890K, 1.2MB,
3.4MB, and 5MB. For anything but the smallest, VM startup time should
be a fairly small part of the total.

 - Several of the parsers rely on expat at their core. Naturally,
their results will consist of whatever time expat needs for the job,
plus all the overhead of the scripting-language wrapper.

 - Did the Java benchmarks use a just-in-time compiler? I suspect not, 
though there is one for Linux (tya) which might have chopped those
times in half.

 - Finally, these are PARSING benchmarks. Not "parse the file and then
ponder the results in Biblical detail" benchmarks. If someone wants to
parse XML, twiddle the data in kaleidoscopic variety and then
benchmark that, feel free. The time required to write usable code for
a task in a given language is another matter, and one at which
languages like Perl and Python excel. That doesn't mean that the
parsing benchmark itself is bad or unfair.
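
To put the first point in numbers, here is a small Python sketch of
how the startup share shrinks as the files grow. The file sizes are
the ones listed above (rounded, with "K" taken as 1000 bytes). The
startup time and parse throughput are made-up values picked only to
show the shape of the curve; they are not measurements from the
benchmark, and the function names are mine.

```python
def startup_fraction(startup_s, size_bytes, bytes_per_s):
    """Fraction of total wall-clock time spent on VM startup."""
    parse_s = size_bytes / bytes_per_s
    return startup_s / (startup_s + parse_s)


# File sizes from the test data, in bytes (rounded).
SIZES = [150_000, 890_000, 1_200_000, 3_400_000, 5_000_000]

# Hypothetical figures for illustration only, NOT measured results.
ASSUMED_STARTUP_S = 0.5
ASSUMED_BYTES_PER_S = 100_000


def startup_table(startup_s=ASSUMED_STARTUP_S,
                  bytes_per_s=ASSUMED_BYTES_PER_S):
    """Return (size, startup fraction) pairs for each test file size."""
    return [
        (size, startup_fraction(startup_s, size, bytes_per_s))
        for size in SIZES
    ]


if __name__ == "__main__":
    for size, frac in startup_table():
        print("%9d bytes: %4.1f%% startup" % (size, 100 * frac))
```

Plug in real startup and throughput figures for a given parser and
the same function shows how much of each reported time is overhead.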