[Expat-discuss] Expat-discuss Digest, Vol 72, Issue 7

Jimmy Zhang jzhang at ximpleware.com
Fri Mar 10 03:00:10 CET 2006


When the VTD-XML project officially started, DTDs and external references
were already somewhat deprecated... no major vocabularies seem to use
external references anymore...

Buffer reuse was introduced in the latest releases; maybe it should always be
used. The performance improvement starts from the second time VTDGen parses
an XML document. We are adding more documentation on this feature right now...

External references notwithstanding, VTD-XML conforms to, and passes, every
XML test suite. VTD-XML handles the namespace problem a little differently
than DOM or SAX: the error checking is delayed until navigation. The
prefix-induced duplicate-attribute problem is quite unlikely to concern
anyone, and is in fact part of the problems of the XML namespace spec...
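
To make the prefix-induced duplicate concrete, here is a minimal sketch using
only the JDK's standard SAX API (not VTD-XML). The document, the urn:example
namespace and the class name are made up for illustration. The two prefixes
map to the same namespace URI, so p:id and q:id become the same attribute
once namespaces are resolved; a namespace-unaware parse sees two different
names, while a namespace-aware parse should reject the document.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class PrefixDuplicateDemo {
    // Hypothetical document: prefixes p and q are bound to the same URI,
    // so p:id and q:id collide after namespace resolution.
    static final String XML =
        "<root xmlns:p='urn:example' xmlns:q='urn:example'>"
      + "<item p:id='1' q:id='2'/></root>";

    // Returns true if the parser accepts the document, false if it
    // reports an error while parsing.
    static boolean parses(String xml, boolean namespaceAware) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        try {
            factory.newSAXParser().parse(
                new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("namespace-unaware accepts: " + parses(XML, false));
        System.out.println("namespace-aware accepts:   " + parses(XML, true));
    }
}
```

A parser that delays this check until navigation would accept the document
at parse time and only notice the clash when the attributes of that element
are actually looked up.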

The cost of encoding transformation ranges from zero to negligible; most
documents are ASCII anyway.
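
A rough sketch of what delaying the transformation can look like: a token is
just an offset and a length into the undecoded document bytes, and a String
is built only when the application asks for one. LazyToken and its methods
are hypothetical names for illustration, not the VTD-XML API, and the sketch
assumes a UTF-8 document.

```java
import java.nio.charset.StandardCharsets;

class LazyToken {
    private final byte[] doc;  // undecoded document bytes, shared by all tokens
    private final int offset;  // start of the token within doc
    private final int length;  // token length in bytes

    LazyToken(byte[] doc, int offset, int length) {
        this.doc = doc;
        this.offset = offset;
        this.length = length;
    }

    // Decoding (UTF-8 bytes to Java's UTF-16 String) happens only here,
    // when the text is actually needed.
    String text() {
        return new String(doc, offset, length, StandardCharsets.UTF_8);
    }

    // Comparing against a pure-ASCII literal needs no decoding at all:
    // ASCII characters are single bytes with the same value in UTF-8.
    boolean matchesAscii(String literal) {
        if (literal.length() != length) return false;
        for (int i = 0; i < length; i++) {
            char c = literal.charAt(i);
            if (c > 0x7F || doc[offset + i] != (byte) c) return false;
        }
        return true;
    }
}
```

If most lookups are comparisons against known ASCII names, the decode step
is skipped for them entirely, which is where the "zero" end of the range
would come from.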

One can argue that, to process XML, a SAX parser needs to be run at least
twice... The first pass scans the document from start to end just to check
well-formedness; the second pass performs the application processing...
Otherwise, what happens if the application performs 10 transactions but then
discovers that the last angle bracket of the XML file is missing? Roll back
those 10 transactions? So should we reduce the SAX performance by 50% just
to make it a fair comparison with VTD-XML?
And SAX is still forward only and unpleasant to use...
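
The two-pass argument can be sketched with the JDK's standard SAX API. The
class name, the batch document and the "tx" element are assumptions made up
for this example; the point is only that no application work starts until a
first full pass has confirmed the document is well-formed to the last
character.

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

class TwoPassSax {
    // Pass 1: parse with a do-nothing handler. Returns true only if the
    // parser reaches the end of the document without reporting an error.
    static boolean isWellFormed(byte[] xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        try {
            parser.parse(new ByteArrayInputStream(xml), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    // Pass 2: the application work. Here each hypothetical <tx id="..."/>
    // element stands in for one transaction; nothing is processed unless
    // pass 1 succeeded.
    static List<String> process(byte[] xml) throws Exception {
        List<String> processed = new ArrayList<>();
        if (!isWellFormed(xml)) {
            return processed; // nothing to roll back
        }
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new ByteArrayInputStream(xml), new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                if ("tx".equals(qName)) {
                    processed.add(atts.getValue("id"));
                }
            }
        });
        return processed;
    }
}
```

The document is scanned twice, which is the doubling of SAX cost argued for
above; a single-pass version would have already acted on every tx element by
the time the missing bracket is found.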

I don't see any comparison...

Maybe the world has moved forward... maybe it is time to say goodbye to SAX?


> Date: Wed, 08 Mar 2006 18:28:41 -0500
> From: Karl Waclawek <karl at waclawek.net>
> Subject: Re: [Expat-discuss] Server JVM
> To: expat-discuss at libexpat.org
> Message-ID: <440F68A9.2010309 at waclawek.net>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> Jimmy Zhang wrote:
>> I can help you set up server JVM, just a dll file you need to put in
>> the right place... let me know if you need any help
>
> I installed the JDK and tested with the server JVM.
> It does indeed increase performance significantly, beating Expat on files
> with lots of markup vs character data (27ms for benchmark_BR vs 49ms for
> Expat).
> Question, why should one not always use the "buffer-reuse" version?
>
> Expat was still faster on the file with lots of character data (13ms vs
> 17ms for benchmark_BR).
> I recompiled the Expat library with all optimizations for speed.
>
> Now given that vtd-xml is quite fast, you still have to prove two things:
>
> 1) It does everything a conforming non-validating parser must do. How many
> of the tests in the XML-Test-Suite does it pass? Expat passes all the tests
> for a non-validating parser except a handful that are optional or in doubt.
> Example: vtd-xml failed to detect duplicate attributes when they had
> different prefixes pointing to the same namespace. Completing vtd-xml
> to conform as well as Expat may well add more overhead.
>
> 2) To which degree does it pay off to delay the work of encoding
> transformation to the point when the data is actually needed, as in a
> real-world application?
> If the document is encoded in UTF-8 and your application requires UTF-16,
> then this is already done by Expat, but for vtd-xml this work still has
> to be performed.
>
> Whether it will be preferable over Expat for documents where memory
> usage is not an issue, will depend on the answers to these questions.
>
> Overall I do like your approach, and I think it is excellent for random
> access with an in-memory document. It may also do very well on SOAP
> processing for smaller messages.
>
> Karl