[Python-Dev] Performance of various marshallers
Skip Montanaro
skip@pobox.com (Skip Montanaro)
Tue, 2 Oct 2001 16:53:15 -0500
>> Still, Unicode or not, the notion that XML-RPC is a data
>> serialization mechanism instead of a compound data markup language
>> means you don't need to provide hooks for processing each element, so
>> full-blown XML parsers tend to be overkill as py-xmlrpc demonstrates.
Paul> I don't see how that follows. py-xmlrpc needs to handle <struct>
Paul> different than <array> so it needs to have a "hook" for each of
Paul> those element types. Having a fixed list of hooks or an extensible
Paul> array of them should not be much different from a performance
Paul> point of view.
Sure, <struct> and <array> mean different things, but <struct> will always
mean the same thing in an XML-RPC context. There's no need to provide any
hooks. Once you've successfully parsed a <struct> you get a Python
dictionary.
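To make the "no hooks" point concrete, here is a minimal sketch of a decoder with the XML-RPC vocabulary hard-wired into it. This is not py-xmlrpc's code (that package is written in C). The function name decode_value and the sample document are made up for illustration, it leans on the standard library's ElementTree for tokenizing, and it covers only a handful of the XML-RPC types.

```python
import xml.etree.ElementTree as ET

def decode_value(value):
    """Turn one XML-RPC <value> element into a plain Python object.

    Hypothetical helper: the tag set is fixed, so there is nothing to register.
    """
    if len(value) == 0:
        # A bare <value>text</value> is a string per the XML-RPC spec.
        return value.text or ""
    node = value[0]
    tag = node.tag
    if tag in ("int", "i4"):
        return int(node.text)
    if tag == "double":
        return float(node.text)
    if tag == "boolean":
        return node.text.strip() == "1"
    if tag == "string":
        return node.text or ""
    if tag == "array":
        return [decode_value(v) for v in node.findall("data/value")]
    if tag == "struct":
        # A <struct> always becomes a dictionary; no caller-supplied hook.
        return {m.findtext("name"): decode_value(m.find("value"))
                for m in node.findall("member")}
    raise ValueError("unsupported XML-RPC type: %s" % tag)

doc = ("<value><struct>"
       "<member><name>id</name><value><int>7</int></value></member>"
       "<member><name>tags</name><value><array><data>"
       "<value><string>a</string></value><value><string>b</string></value>"
       "</data></array></value></member>"
       "</struct></value>")
result = decode_value(ET.fromstring(doc))
print(result)  # {'id': 7, 'tags': ['a', 'b']}
```

The caller only ever sees the finished dictionary; nothing is handed back while the <struct> is still being read.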
As far as I can tell sgmlop is always going to be slower than py-xmlrpc
because it must call back to an Unmarshaller instance for each tag. The only
option currently available is the Unmarshaller class written in Python.
Pythonware has a FastParser/FastUnmarshaller pair available now which I
don't have access to. Perhaps it exhibits encode/decode speeds similar to
py-xmlrpc. You'll have to ask Fredrik. Py-xmlrpc was written with the
knowledge that intermediate results aren't useful and that, as you put it, it
has a fixed vocabulary. Why structure a parser to accommodate situations
that aren't needed?
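The per-tag callback cost can be sketched by counting how many times a parser drops back into Python for one small message. This uses the standard library's expat binding as a stand-in for sgmlop, and the handlers only count calls rather than build objects, so treat it as an illustration of the calling pattern, not a measurement of either package. The payload is an arbitrary example.

```python
import xml.parsers.expat
import xmlrpc.client

# An arbitrary small payload, encoded as an XML-RPC <params> fragment.
message = xmlrpc.client.dumps(({"a": 1, "b": [1, 2, 3]},))

calls = {"start": 0, "end": 0, "data": 0}

def on_start(tag, attrs):
    # Invoked from C for every opening tag.
    calls["start"] += 1

def on_end(tag):
    # Invoked from C for every closing tag.
    calls["end"] += 1

def on_data(text):
    # Invoked from C for every run of character data, whitespace included.
    calls["data"] += 1

parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = on_start
parser.EndElementHandler = on_end
parser.CharacterDataHandler = on_data
parser.Parse(message, True)

print("Python-level callbacks for one small struct:", sum(calls.values()))
```

A decoder that stays in C until the whole message is parsed makes none of these calls, which is the difference being argued about here.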
Paul> Yes, an incomplete XML parser could be faster if it ignores
Paul> Unicode, ignores character references, and does not do all of the
Paul> error checking required by the spec. I'm not sure if this would
Paul> really improve performance anyhow.
Does py-xmlrpc have a ways to go? Sure. It's still pretty new software, so
give it time. You seem to be dismissing it completely because it's not as
mature as, say, Expat. I doubt it will lose a factor of 8 in encoding speed
or a factor of 24 in decoding speed (the current speed advantages I measure
over xmlrpclib 1.0b4 using sgmlop) when those things are all added. I'm not
sure all those things will ever be needed, but you're welcome to think they
will.
Paul> py-xmlrpc is probably faster because it doesn't call out to Python
Paul> code until the entire message has been parsed. xmlrpclib on the
Paul> other hand, is entirely written in Python. Is there a Python
Paul> XML-RPC implementation that uses no Python code but does use a
Paul> true XML parser?
That's precisely why py-xmlrpc is faster. Should it behave some other way?
I don't think there is another XML-RPC parser out there that is available
from Python but that doesn't use Python.
>> ... No matter how hard Shilad finds it to add Unicode support to his
>> package, it's still likely to be miles ahead of other XML parsers.
Paul> I think you are exaggerating the benefit of having a fixed
Paul> vocabulary. There is hardly any performance boost possible based
Paul> on that one detail.
I don't see how you can't make that connection. XML-RPC has a
fixed vocabulary and never needs to look at intermediate results. It sounds
to me like all you have is a hammer so everything looks like a nail. There
are places for general-purpose XML parsers and places for special-purpose
XML parsers. In this particular context I only care about how fast I can
push objects between a client and server using XML-RPC.
I apologize if the subject seems more general than I intended. My only
intention was to compare the data serialization performance of various
tools. I didn't include "XML-RPC" in the subject of this thread because I
tossed in marshal and cPickle results as well, simply for comparison.
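For anyone who wants to repeat this kind of comparison, a minimal sketch of a round-trip timing harness follows. It is not the script behind the numbers quoted in this thread: it uses today's standard-library module names (pickle and xmlrpc.client rather than cPickle and xmlrpclib), leaves out py-xmlrpc and sgmlop because they are third-party packages, and the payload and repeat count are arbitrary choices.

```python
import marshal
import pickle
import timeit
import xmlrpc.client

# Arbitrary test object; real results depend heavily on the data's shape.
payload = {"name": "example", "values": list(range(100)), "ratio": 0.5}

def roundtrip_marshal(obj):
    return marshal.loads(marshal.dumps(obj))

def roundtrip_pickle(obj):
    return pickle.loads(pickle.dumps(obj))

def roundtrip_xmlrpc(obj):
    # dumps() expects a tuple of parameters; loads() returns (params, method).
    params, _method = xmlrpc.client.loads(xmlrpc.client.dumps((obj,)))
    return params[0]

timings = {}
for name, func in [("marshal", roundtrip_marshal),
                   ("pickle", roundtrip_pickle),
                   ("xmlrpc", roundtrip_xmlrpc)]:
    # Time 200 encode/decode round trips of the same object.
    timings[name] = timeit.timeit(lambda: func(payload), number=200)

for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print("%-8s %.4f s for 200 round trips" % (name, seconds))
```

Timing encode and decode separately, as was done for the figures above, only needs the two halves of each round trip split into their own functions.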
Skip