[Python-Dev] Performance of various marshallers
Paul Prescod
paul@ActiveState.com
Tue, 02 Oct 2001 15:15:49 -0700
Skip Montanaro wrote:
>
>...
>
> Sure, <struct> and <array> mean different things, but <struct> will always
> mean the same thing in an XML-RPC context. There's no need to provide any
> hooks.
There are two different issues. One is parsing: taking a string of bytes
and interpreting them as XML. The other is passing this information to
the Python programmer. The handling of "hooks" is on the backend,
passing the information to the Python programmer. I interpreted
Fredrik's question as being about the front end: does it use a real XML
parser or not.
>...
> Does py-xmlrpc have a ways to go? Sure. It's still pretty new software, so
> give it time. You seem to be dismissing it completely because it's not as
> mature as, say, Expat.
I'm not asking it to be as mature as Expat. I'm asking why it didn't
*use* Expat or some other parser. Expat would recognize structs and
arrays and pass them to C code which builds Python objects. Then those
Python objects can be passed to Python.
>...
> Paul> py-xmlrpc is probably faster because it doesn't call out to Python
> Paul> code until the entire message has been parsed. xmlrpclib on the
> Paul> other hand, is entirely written in Python. Is there a Python
> Paul> XML-RPC implementation that uses no Python code but does use a
> Paul> true XML parser?
>
> That's precisely why py-xmlrpc is faster. Should it behave some other way?
> I don't think there is another XML-RPC parser out there that is available
> from Python but that doesn't use Python.
Okay, so we agree that the fast part is probably not so much the parser
but the handing of data to Python. So why rewrite a parser? Nothing
requires an Expat-using XML-RPC implementation to call back into Python
for every element. It can collect the results in C and then call Python
when it has values.
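
To make that concrete, here is a rough sketch of the shape I have in mind. It is written in Python only to keep it short; in the version I am arguing for, the three handlers would be C functions registered with Expat, and Python would only ever see the finished values. The name loads_params is made up for this example, and it covers just a handful of XML-RPC types.

```python
import xml.parsers.expat


def loads_params(data):
    """Decode the values in an XML-RPC message into a list of Python objects.

    Hypothetical helper for illustration: handles only int/i4, double,
    boolean, string, array and struct, and ignores everything else.
    """
    stack = [[]]   # containers under construction; stack[0] collects results
    names = []     # pending <name> values for struct members
    text = []      # character data seen since the last start or end tag

    def add(value):
        # Attach a finished value to whatever container is currently open.
        top = stack[-1]
        if isinstance(top, dict):
            top[names.pop()] = value
        else:
            top.append(value)

    def start(tag, attrs):
        text.clear()
        if tag == "array":
            stack.append([])
        elif tag == "struct":
            stack.append({})

    def chars(chunk):
        text.append(chunk)

    def end(tag):
        s = "".join(text)
        if tag in ("int", "i4"):
            add(int(s))
        elif tag == "double":
            add(float(s))
        elif tag == "boolean":
            add(s.strip() == "1")
        elif tag == "string":
            add(s)
        elif tag == "name":
            names.append(s)
        elif tag in ("array", "struct"):
            add(stack.pop())
        text.clear()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(data, True)   # Expat does all of the actual XML parsing
    return stack[0]            # the caller sees values only, never elements


message = """<methodCall ><!-- a comment -->
<methodName>demo</methodName><params><param><value><struct>
<member><name>n</name><value><int>1</int></value></member>
<member><name>xs</name><value><array><data>
<value><string>a</string></value><value><i4>2</i4></value>
</data></array></value></member>
</struct></value></param></params></methodCall>"""

params = loads_params(message)
```

Nothing in the handlers knows about angle brackets, whitespace inside tags, or comments. That is all Expat's problem, and the XML-RPC vocabulary is handled in one small dispatch on the element name.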
>...
> I don't see how you can't make that connection. XML-RPC has a
> fixed vocabulary and never needs to look at intermediate results.
Let me suggest an analogy. Someone writes "CGIPython". It uses a
specially optimized parser designed for parsing only Python CGI scripts.
Do you think it would run much faster than the regular Python parser?
Well, syntactically CGI scripts are basically the same as ordinary
Python programs so why would you *want* a specialized parser? Parsing
angle brackets is the same whether they are in an XML-RPC message or a
Docbook document, just as parsing Python is the same, whether it is a
CGI or a GUI app.
> ... It sounds
> to me like all you have is a hammer so everything looks like a nail. There
> are places for general-purpose XML parsers and places for special-purpose
> XML parsers. In this particular context I only care about how fast I can
> push objects between a client and server using XML-RPC.
I don't personally see much benefit in using XML if you don't adhere to
the XML spec. Just perusing the code quickly, I believe I've found a few
bugs that it would not have had if it were built on Expat or some other
XML parser.
1. It doesn't handle ? syntax.
2. It doesn't handle <methodCall > (extra whitespace).
3. I strongly suspect it won't handle comments in the XML.
4. It won't handle UTF-16, which the XML spec requires every parser to
   support.
5. It won't handle CDATA sections.
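
For comparison, here is a small check of how an Expat-backed parser treats several of these cases. It is only a sketch: it uses the standard library's xml.etree.ElementTree (which sits on Expat in CPython), the helper method_name and the method name demo.ping are invented for the example, and it covers extra whitespace, comments, CDATA and UTF-16 only.

```python
import xml.etree.ElementTree as ET


def method_name(message):
    """Return the text of <methodName> from an XML-RPC call (hypothetical helper)."""
    return ET.fromstring(message).findtext("methodName")


plain = "<methodCall><methodName>demo.ping</methodName></methodCall>"

variants = {
    # Whitespace before the closing ">" of a tag is legal XML.
    "whitespace": "<methodCall ><methodName >demo.ping</methodName ></methodCall >",
    # Comments may appear between elements.
    "comment": "<methodCall><!-- note --><methodName>demo.ping</methodName></methodCall>",
    # A CDATA section is just another way to write character data.
    "cdata": "<methodCall><methodName><![CDATA[demo.ping]]></methodName></methodCall>",
    # The same document encoded as UTF-16 bytes (the codec adds the byte order mark).
    "utf16": ('<?xml version="1.0" encoding="UTF-16"?>' + plain).encode("utf-16"),
}

results = {label: method_name(doc) for label, doc in variants.items()}
```

Every variant should yield the same method name as the plain message, with no special cases in the calling code.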
Paul Prescod