[XML-SIG] Xalan and Xerces...
Tue, 17 Oct 2000 17:43:42 -0600
> Being confronted with Xerces for the first time, I took the
> opportunity to port their SAXCount example to PyXML, which took me
> half an hour (plus or minus five minutes), including installing Xerces.
> On my system (AMD K6, 350MHz, JDK 1.3.0beta-b07) I got the following
> Xerces with no options:
> data/personal.xml: 903 ms (37 elems, 18 attrs, 26 spaces, 242 chars)
> Xerces with -w (i.e. parse the file once, then measure time for second run)
> data/personal.xml: 85 ms (37 elems, 18 attrs, 26 spaces, 242 chars)
> PyXML 0.6.1, expat as the parser:
> data/personal.xml: 0.0128449s (37 elems, 12 attrs, 0 spaces, 268 chars)
Good stats to have on hand. Thanks.
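For readers who want to reproduce numbers like these, here is a minimal sketch of a SAXCount-style counter. It assumes modern Python 3 and its standard-library `xml.sax` (which drives expat by default), not the PyXML 0.6.1 API used in the quoted port; `CountHandler`, `sax_count` and the sample document are hypothetical names for illustration. The `warmup` flag is an assumption meant to mimic the intent of Xerces' `-w` option as described above (parse once, then time a second run).

```python
import time
import xml.sax
from xml.sax.handler import ContentHandler


class CountHandler(ContentHandler):
    """Tallies the four figures SAXCount prints (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.elems = 0
        self.attrs = 0
        self.spaces = 0  # only incremented via ignorableWhitespace()
        self.chars = 0

    def startElement(self, name, attrs):
        self.elems += 1
        self.attrs += len(attrs)

    def characters(self, content):
        self.chars += len(content)

    def ignorableWhitespace(self, whitespace):
        self.spaces += len(whitespace)


def sax_count(source, warmup=False):
    """Parse `source` (bytes) and return (seconds, handler).

    With warmup=True the document is parsed once, untimed, before the
    measured run; this is an assumed stand-in for Xerces' -w option.
    """
    if warmup:
        xml.sax.parseString(source, CountHandler())
    handler = CountHandler()
    start = time.perf_counter()
    xml.sax.parseString(source, handler)
    return time.perf_counter() - start, handler


if __name__ == "__main__":
    doc = b"<people><person id='a' x='1'>Alice</person></people>"
    secs, h = sax_count(doc, warmup=True)
    print(f"{secs:.6f}s ({h.elems} elems, {h.attrs} attrs, "
          f"{h.spaces} spaces, {h.chars} chars)")
```

With the expat-backed reader, `spaces` stays at zero and all whitespace is counted under `chars`. That is consistent with the figures quoted above: Xerces' 26 spaces plus 242 chars equals the 268 chars expat reports.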
> First, you'll notice that Python beats Java by an order of magnitude
> even in the "fast" Java case. I'm not really surprised - expat is a
> fast parser, and it is written in C.
> Next, you'll notice that expat does not report ignorableWhitespace;
> instead, the spaces are reported as character data. I'm not sure which
> one is right here (or whether both are acceptable) - both parsers
> operate in a non-validating mode. Would somebody care to clarify?
There is really no such thing as ignorable whitespace in non-validating mode.
According to XML 1.0, whitespace can only be ignored when it occurs where there
is no corresponding #PCDATA in the content model from the DTD. Since the DTD
is not used in non-validating mode, the parser _cannot_ assume that any
whitespace is ignorable.
So in this case expat is right and Xerces is wrong.
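A quick way to see this behaviour is a small check against Python 3's standard-library `xml.sax`, whose default reader is built on expat and is non-validating. This is a sketch under that assumption; `DOC`, `WhitespaceProbe` and `probe_whitespace` are hypothetical names. Even though the internal DTD subset declares `list` to have element content, the indentation between elements arrives through `characters()`, and `ignorableWhitespace()` is never called.

```python
import xml.sax
from xml.sax.handler import ContentHandler

# Hypothetical sample: the internal DTD subset says <list> has element
# content, so only a validating parser could flag the indentation as ignorable.
DOC = b"""<?xml version="1.0"?>
<!DOCTYPE list [
  <!ELEMENT list (item*)>
  <!ELEMENT item (#PCDATA)>
]>
<list>
  <item>one</item>
</list>
"""


class WhitespaceProbe(ContentHandler):
    """Records which callback each piece of character data arrives on."""

    def __init__(self):
        super().__init__()
        self.char_chunks = []
        self.ignorable_chunks = []

    def characters(self, content):
        self.char_chunks.append(content)

    def ignorableWhitespace(self, whitespace):
        self.ignorable_chunks.append(whitespace)


def probe_whitespace(source):
    probe = WhitespaceProbe()
    # xml.sax uses its expat-based, non-validating reader by default.
    xml.sax.parseString(source, probe)
    return probe


if __name__ == "__main__":
    p = probe_whitespace(DOC)
    print("characters():", p.char_chunks)
    print("ignorableWhitespace():", p.ignorable_chunks)
```

The joined `characters()` output is the newline and indentation plus `one` plus the final newline, while the `ignorableWhitespace()` list stays empty.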
> The difference in number of attributes apparently comes from Xerces
> passing the default value for an implied attribute from the DTD,
> whereas expat doesn't.
Since expat is strictly non-validating, this behavior is perfectly acceptable.
Uche Ogbuji Principal Consultant
firstname.lastname@example.org +1 303 583 9900 x 101
Fourthought, Inc. http://Fourthought.com
4735 East Walnut St, Ste. C, Boulder, CO 80301-2537, USA
Software-engineering, knowledge-management, XML, CORBA, Linux, Python