[Python-Dev] Performance of various marshallers

Skip Montanaro skip@pobox.com (Skip Montanaro)
Tue, 2 Oct 2001 16:53:15 -0500


    >> Still, Unicode or not, the notion that XML-RPC is a data
    >> serialization mechanism instead of a compound data markup language
    >> means you don't need to provide hooks for processing each element, so
    >> full-blown XML parsers tend to be overkill as py-xmlrpc demonstrates.

    Paul> I don't see how that follows. py-xmlrpc needs to handle <struct>
    Paul> different than <array> so it needs to have a "hook" for each of
    Paul> those element types. Having a fixed list of hooks or an extensible
    Paul> array of them should not be much different from a performance
    Paul> point of view.

Sure, <struct> and <array> mean different things, but <struct> will always
mean the same thing in an XML-RPC context.  There's no need to provide any
hooks.  Once you've successfully parsed a <struct> you get a Python
dictionary.
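
A minimal sketch of that point, assuming Python 3's xmlrpc.client (the
standard-library successor to xmlrpclib, not py-xmlrpc).  The helper name
roundtrip is made up for illustration.  It encodes one value as an XML-RPC
parameter block and decodes it again, so a <struct> comes back as a plain
dictionary with no per-element hooks supplied by the caller:

```python
import xmlrpc.client


def roundtrip(value):
    """Encode one value as an XML-RPC parameter block, then decode it."""
    # dumps() takes a tuple of parameters and returns the XML as a string.
    xml_text = xmlrpc.client.dumps((value,))
    # loads() returns (params_tuple, method_name); method_name is None here.
    params, _method_name = xmlrpc.client.loads(xml_text)
    return xml_text, params[0]


if __name__ == "__main__":
    record = {"name": "spam", "sizes": [1, 2, 3]}
    xml_text, decoded = roundtrip(record)
    print("<struct>" in xml_text)           # the dict was sent as a <struct>
    print(type(decoded).__name__, decoded == record)
```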

As far as I can tell sgmlop is always going to be slower than py-xmlrpc
because it must call back to an Unmarshaller instance for each tag.  The only
option currently available is the Unmarshaller class written in Python.
Pythonware has a FastParser/FastUnmarshaller pair available now which I
don't have access to.  Perhaps it exhibits encode/decode speeds similar to
py-xmlrpc.  You'll have to ask Fredrik.  Py-xmlrpc was written with the
knowledge that intermediate results aren't useful and that, as you put it,
XML-RPC has a fixed vocabulary.  Why structure a parser to accommodate
situations that never arise?
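
To make the per-tag callback cost concrete, here is a rough sketch.  It
uses expat through xml.parsers.expat rather than sgmlop, and the names
SAMPLE and count_callbacks are invented for this example.  It only counts
how many times the parser calls back into Python for a small two-element
<array>; it does not measure either package discussed here:

```python
import xml.parsers.expat

# Hand-written XML-RPC parameter block holding a two-element array.
SAMPLE = (
    "<params><param><value><array><data>"
    "<value><int>1</int></value>"
    "<value><int>2</int></value>"
    "</data></array></value></param></params>"
)


def count_callbacks(xml_text):
    """Count the Python-level callbacks an event-driven parse triggers."""
    counts = {"start": 0, "end": 0, "data": 0}

    def on_start(name, attrs):
        counts["start"] += 1

    def on_end(name):
        counts["end"] += 1

    def on_data(text):
        counts["data"] += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_data
    parser.Parse(xml_text, True)   # True marks this as the final chunk
    return counts


if __name__ == "__main__":
    print(count_callbacks(SAMPLE))
```

Every start tag, end tag, and run of character data costs one call into
Python, so the count grows with the size of the message.  A decoder that
knows the XML-RPC vocabulary and builds the result entirely in C avoids
those calls.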

    Paul> Yes, an incomplete XML parser could be faster if it ignores
    Paul> Unicode, ignores character references, and does not do all of the
    Paul> error checking required by the spec. I'm not sure if this would
    Paul> really improve performance anyhow.

Does py-xmlrpc have a ways to go?  Sure.  It's still pretty new software, so
give it time.  You seem to be dismissing it completely because it's not as
mature as, say, Expat.  I doubt it will lose a factor of 8 in encoding speed
or a factor of 24 in decoding speed (the current speed advantages I measure
over xmlrpclib 1.0b4 using sgmlop) when those things are all added.  I'm not
sure all those things will ever be needed, but you're welcome to think they
will.

    Paul> py-xmlrpc is probably faster because it doesn't call out to Python
    Paul> code until the entire message has been parsed. xmlrpclib on the
    Paul> other hand, is entirely written in Python. Is there a Python
    Paul> XML-RPC implementation that uses no Python code but does use a
    Paul> true XML parser?

That's precisely why py-xmlrpc is faster.  Should it behave some other way?
I don't think there is another XML-RPC parser out there that is available
from Python yet uses no Python code itself.

    >> ...  No matter how hard Shilad finds it to add Unicode support to his
    >> package, it's still likely to be miles ahead of other XML parsers.

    Paul> I think you are exaggerating the benefit of having a fixed
    Paul> vocabulary.  There is hardly any performance boost possible based
    Paul> on that one detail.

I don't see how you can't make that connection.  XML-RPC has a
fixed vocabulary and never needs to look at intermediate results.  It sounds
to me like all you have is a hammer so everything looks like a nail.  There
are places for general-purpose XML parsers and places for special-purpose
XML parsers.  In this particular context I only care about how fast I can
push objects between a client and server using XML-RPC.

I apologize if the subject seems more general than I intended.  My only
intention was to compare the data serialization performance of various
tools.  I didn't include "XML-RPC" in the subject of this thread because I
tossed in marshal and cPickle results as well, simply for comparison.
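
A comparison of this kind can be sketched as follows.  This is not the
script behind the numbers quoted in this thread.  It assumes Python 3,
where pickle stands in for cPickle and xmlrpc.client for xmlrpclib, and
the payload shape and helper names are invented for illustration:

```python
import marshal
import pickle
import timeit
import xmlrpc.client


def make_payload(n=100):
    """Build a list of small dicts; the shape is an arbitrary choice."""
    return [{"id": i, "name": "item%d" % i, "score": i * 0.5}
            for i in range(n)]


def xmlrpc_dumps(obj):
    return xmlrpc.client.dumps((obj,))


def xmlrpc_loads(text):
    # loads() returns (params_tuple, method_name); unwrap the one value.
    return xmlrpc.client.loads(text)[0][0]


CODECS = {
    "marshal": (marshal.dumps, marshal.loads),
    "pickle": (pickle.dumps, pickle.loads),
    "xmlrpc.client": (xmlrpc_dumps, xmlrpc_loads),
}


def time_codecs(payload, repeat=20):
    """Return {name: (seconds per encode, seconds per decode)}."""
    results = {}
    for name, (encode, decode) in CODECS.items():
        encoded = encode(payload)
        # Sanity check: each codec must round-trip the payload unchanged.
        assert decode(encoded) == payload
        enc = timeit.timeit(lambda: encode(payload), number=repeat)
        dec = timeit.timeit(lambda: decode(encoded), number=repeat)
        results[name] = (enc / repeat, dec / repeat)
    return results


if __name__ == "__main__":
    for name, (enc, dec) in time_codecs(make_payload()).items():
        print("%-14s encode %.6fs  decode %.6fs" % (name, enc, dec))
```

Absolute timings depend on the machine, the Python version, and the
payload, so only the ratios between the serializers are worth comparing.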

Skip