[Python-Dev] Performance of various marshallers

Shilad Sen shilad@sourcelight.com
Tue, 2 Oct 2001 20:28:53 -0500 (CDT)


Skip has been kind enough to copy me on the bulk of correspondence
regarding py-xmlrpc versus other xmlrpc parsing options.

py-xmlrpc began as a short hack to accomplish specific things that
xmlrpclib couldn't easily accommodate.  I used a hand-built parser
because I thought it would be fun and easy (it was!).

Paul, you are correct that my library doesn't support the five items you
mentioned.  I am aware of these, but they are not officially supported
by the spec either.  XML-RPC is a bit strange in that the spec neither
allows nor requires true XML.

My library has been adopted far more than I would have guessed, and I
have had many questions about things like SSL support (which is not up
to spec either).  As a result, I am almost finished with a rewrite that
has all the transport and protocol components nicely split up.  Switching
the hand-coded parser to expat is on my to-do list.  My own parser works
just fine, though, and I haven't had any complaints, so that is
relatively low on the list.
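
As a rough illustration of what an expat-based parser might look like
(a sketch only, not py-xmlrpc code): the function name parse_params and
the sample REQUEST below are made up for this example, and only
<int>, <i4>, and <string> values are handled.  It uses Python's
standard xml.parsers.expat module, which deals with the XML
declaration, whitespace inside tags, comments, and CDATA sections
before any handler is called.

```python
import xml.parsers.expat


def parse_params(data):
    """Return (method_name, params) from an XML-RPC methodCall document.

    Hypothetical helper for illustration: handles <int>, <i4>, and
    <string> only, and ignores structs, arrays, and the other types.
    """
    state = {"method": None, "params": [], "text": [], "type": None}

    def start(name, attrs):
        # Start collecting text when we enter an element we care about.
        if name in ("methodName", "int", "i4", "string"):
            state["text"] = []
            state["type"] = name

    def chars(text):
        # expat may deliver text in several chunks; CDATA arrives here too.
        if state["type"] is not None:
            state["text"].append(text)

    def end(name):
        if name != state["type"]:
            return
        text = "".join(state["text"])
        if name == "methodName":
            state["method"] = text
        elif name in ("int", "i4"):
            state["params"].append(int(text))
        else:
            state["params"].append(text)
        state["type"] = None

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(data, True)
    return state["method"], state["params"]


# Made-up request exercising a declaration, whitespace after the tag
# name, a comment, and a CDATA section.
REQUEST = b"""<?xml version="1.0"?>
<methodCall >
  <!-- a comment the parser should skip -->
  <methodName>echo</methodName>
  <params>
    <param><value><int>42</int></value></param>
    <param><value><string><![CDATA[a < b]]></string></value></param>
  </params>
</methodCall >"""

method, params = parse_params(REQUEST)
```

The handlers here are Python callbacks, so this shows the parsing
behaviour rather than the speed; a C extension could register the same
kind of handlers in C and hand only finished values back to Python.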

My library is certainly not as flexible as xmlrpclib in its current
form.  I'm hoping that the rewrite will move it to a nice place in the
performance / flexibility spectrum.  As a side effect, it will have a
nice extensible standalone HTTP client and server that offers better
performance for people who really need it.

I am perfectly aware of py-xmlrpc's shortcomings.  On the other hand it
is exactly what the app we use needs, and I would be surprised if there
aren't others who have similar needs.  My hope is that with the next
major release, the library will move a bit closer to a place that suits
people like Paul.  Meanwhile, it works nicely for applications where
performance requirements are absolutely critical.

Shilad Sen


> 
>     >> That's precisely why py-xmlrpc is faster.  Should it behave some
>     >> other way?  I don't think there is another XML-RPC parser out there
>     >> that is available from Python but that doesn't use Python.
> 
>     Paul> Okay, so we agree that the fast part is probably not so much the
>     Paul> parser but the handing of data to Python. So why rewrite a parser?
>     Paul> Nothing requires an Expat-using XML-RPC implementation to call
>     Paul> back into Python for every element. It can collect the results in
>     Paul> C and then call Python when it has values.
> 
> You're asking the wrong person.  Shilad will be the only person who can
> describe his motivations.  We happen to work in the same building, but we
> don't work for the same company.  That's a coincidence about on par with the
> chances of winning the Powerball lottery.  We never met each other formally
> until about a week ago.  Not trying to put words in his mouth, but my guess
> would be that he was not approaching it as an XML problem, but as a parsing
> problem.
> 
>     >> I don't see how you can't make that connection.  XML-RPC
>     >> has a fixed vocabulary and never needs to look at intermediate
>     >> results.
> 
>     Paul> Let me suggest an analogy. Someone writes "CGIPython". It uses a
>     Paul> specially optimized parser designed for parsing only Python CGI
>     Paul> scripts.  Do you think it would run much faster than the regular
>     Paul> Python parser?
> 
> Bad analogy.  CGI scripts can contain the entire realm of "stuff" that goes
> into any other Python program.  XML-RPC encodings can't contain arbitrary
> XML tags or attributes.  A better analogy would have been (Martin's I think)
> hypothetical Swallow - a subset of Python that could be efficiently
> compiled.
> 
>     Paul> I don't personally see much benefit using XML if you don't adhere
>     Paul> to the XML spec.  Just perusing the code quickly I believe I've
>     Paul> found a few bugs that it would not have had if it built on Expat
>     Paul> or some other XML parser.
> 
> Paul, you have to stop looking at XML-RPC with your Elton John-style
> XML-colored glasses.  XML-RPC is not meant to be some sort of highly
> structured hierarchical data representation that you can sniff around in
> with arbitrary XML tools of one sort or another.  That its on-the-wire
> representation happens to be XML is almost ridiculously unimportant.  Dave
> Winer created an RPC tool that used XML at about the same time every
> computer journalist was wetting their pants every time they heard the
> letters X-M-L.  Many implementations were able to leverage existing XML
> parsing tools to get going quickly, and Dave got some well-deserved
> publicity that he and XML-RPC wouldn't have gotten if he'd chosen some other
> serialization format like Pickle, or invented something new.  Next step: make
> it go faster.  Can that be done with standard XML tools?  Yeah, I'm sure it
> can be.  Not everybody approaches the problem with the same background you
> have though.
> 
>     Paul>  1. It doesn't handle ? syntax.
> 
>     Paul>  2. It doesn't handle <methodCall > (extra whitespace)
> 
>     Paul>  3. I strongly suspect it won't handle comments in the XML.
> 
>     Paul>  4. It won't handle the mandatory UTF-16 encoding from XML
> 
>     Paul>  5. It won't handle CDATA sections.
> 
> Fine.  I'm sure Shilad appreciates the input.  I think your approach to bug
> detection and reporting could have been a bit less heavy handed.
> 
> As for handling things like CDATA, UTF-16 and extra whitespace after tag
> names, I suspect some other XML-RPC packages would exhibit similar problems
> if they were exposed to a standards-toting XML gunslinger like yourself.
> That it's not a problem in practice is probably because the set of XML-RPC
> encoding and decoding software is fairly small and that the stuff that
> encodes into XML-RPC is fairly well-behaved.
> 
> XML-RPC's widespread availability and practical interoperability (the
> XML-RPC website lists 48 implementations) probably owes more to the
> cooperative nature of the people involved than the purity of the parsers.
> 
> Skip
>