[Python-Dev] Performance of various marshallers

Skip Montanaro skip@pobox.com (Skip Montanaro)
Tue, 2 Oct 2001 19:33:59 -0500


    >> That's precisely why py-xmlrpc is faster.  Should it behave some
    >> other way?  I don't think there is another XML-RPC parser out there
    >> that is available from Python but that doesn't use Python.

    Paul> Okay, so we agree that the fast part is probably not so much the
    Paul> parser but the handing of data to Python. So why rewrite a parser?
    Paul> Nothing requires an Expat-using XML-RPC implementation to call
    Paul> back into Python for every element. It can collect the results in
    Paul> C and then call Python when it has values.

You're asking the wrong person.  Shilad will be the only person who can
describe his motivations.  We happen to work in the same building, but we
don't work for the same company.  That's a coincidence about on par with the
chances of winning the Powerball lottery.  We never met each other formally
until about a week ago.  Not trying to put words in his mouth, but my guess
would be that he was not approaching it as an XML problem, but as a parsing
problem.

    >> I don't see how you can't make that connection.  XML-RPC
    >> has a fixed vocabulary and never needs to look at intermediate
    >> results.

    Paul> Let me suggest an analogy. Someone writes "CGIPython". It uses a
    Paul> specially optimized parser designed for parsing only Python CGI
    Paul> scripts.  Do you think it would run much faster than the regular
    Paul> Python parser?

Bad analogy.  CGI scripts can contain the entire realm of "stuff" that goes
into any other Python program.  XML-RPC encodings can't contain arbitrary
XML tags or attributes.  A better analogy would have been the hypothetical
Swallow (Martin's, I think): a subset of Python that could be efficiently
compiled.

    Paul> I don't personally see much benefit using XML if you don't adhere
    Paul> to the XML spec.  Just perusing the code quickly I believe I've
    Paul> found a few bugs that it would not have had if it built on Expat
    Paul> or some other XML parser.

Paul, you have to stop looking at XML-RPC with your Elton John-style
XML-colored glasses.  XML-RPC is not meant to be some sort of highly
structured hierarchical data representation that you can sniff around in
with arbitrary XML tools of one sort or another.  That its on-the-wire
representation happens to be XML is almost ridiculously unimportant.  Dave
Winer created an RPC tool that used XML at about the same time every
computer journalist was wetting their pants every time they heard the
letters X-M-L.  Many implementations were able to leverage existing XML
parsing tools to get going quickly, and Dave got some well-deserved
publicity that he and XML-RPC wouldn't have gotten if he'd chosen some other
serialization format like Pickle, or invented something new.  Next step: make
it go faster.  Can that be done with standard XML tools?  Yeah, I'm sure it
can be.  Not everybody approaches the problem with the same background you
have though.

    Paul>  1. It doesn't handle ? syntax.

    Paul>  2. It doesn't handle <methodCall > (extra whitespace)

    Paul>  3. I strongly suspect it won't handle comments in the XML.

    Paul>  4. It won't handle the mandatory UTF-16 encoding from XML

    Paul>  5. It won't handle CDATA sections.

Fine.  I'm sure Shilad appreciates the input.  I think your approach to bug
detection and reporting could have been a bit less heavy-handed.

As for handling things like CDATA, UTF-16 and extra whitespace after tag
names, I suspect some other XML-RPC packages would exhibit similar problems
if they were exposed to a standards-toting XML gunslinger like yourself.
That it's not a problem in practice is probably because the set of XML-RPC
encoding and decoding software is fairly small, and the software that
encodes into XML-RPC is fairly well-behaved.
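
For anyone who wants to try a few of these cases against a decoder, here is
a minimal sketch.  It assumes Python's standard-library xmlrpc.client module
(the current name for xmlrpclib), which by default hands the payload to
Expat.  The payload and the method name "echo" are made up for illustration,
and UTF-16 input is not covered.

```python
import xmlrpc.client

# Hypothetical request exercising three of the cases listed above:
# whitespace after a tag name, an XML comment, and a CDATA section.
payload = """<?xml version="1.0"?>
<methodCall >
  <!-- comments are legal XML and should be skipped -->
  <methodName>echo</methodName>
  <params>
    <param><value><string><![CDATA[a < b]]></string></value></param>
    <param><value><int>42</int></value></param>
  </params>
</methodCall>"""

# loads() returns a (params, methodname) pair for a methodCall document.
params, method = xmlrpc.client.loads(payload)

print(method)   # the decoded method name
print(params)   # the decoded parameters as a tuple
```

Feeding the same payload to any other decoder is a quick way to see which of
these cases it tolerates.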

XML-RPC's widespread availability and practical interoperability (the
XML-RPC website lists 48 implementations) probably owe more to the
cooperative nature of the people involved than to the purity of the parsers.

Skip