
I just tested each of the marshallers readily available to me. I dumped and loaded this object:

    ['MusicEntry', {'email': 'foo@bar.baz.spam', 'time': '7:30pm', 'tickets': '', 'program': '', 'state': 'MA', 'start': '2002-01-26', 'venueurl': '', 'country': '', 'performers': ['An Evening with Karen Savoca'], 'addressid': 7283, 'name': '', 'zip': '', 'city': 'Sudbury', 'info': 'Reservations required. Please call (978)443-3253 or e-mail Laurie at lalcorn@ultranet.com.', 'merchandise': [], 'event': '', 'keywords': ['.zyx.41'], 'submit_time': '2001-08-28', 'key': 325629, 'active': 1, 'end': '2002-01-26', 'address1': '', 'venue': 'Fox Run House Concerts', 'price': '$17', 'address3': '', 'address2': '', 'update_time': '2001-09-22:19:28:44'}]

I don't claim this is typical data, but it is typical of the type of data I push through XML-RPC, so it's important to me. You can see why moving imports out of dump_string was so worthwhile. I would be happy to change the object being marshalled to better reflect what people think is "typical".

All numbers in the following table are encodings or decodings per second. All times were measured using time.clock. The number of times each encoding/decoding operation was performed was varied to give a reasonable total test time (approximately 5 seconds). Each test was run 3 times; the largest number is recorded below, rounded to three significant digits.

                                  encode    decode
                                  ------    ------
    marshal                        25900      7830
    cPickle                         1230       149
    xmlrpclib 0.9.8   w/ sgmlop      416       107
                      w/o sgmlop     415      16.3
    xmlrpclib 1.0b4   w/ sgmlop      365      92.0
                      w/o sgmlop     363      74.9
    py-xmlrpc                       2780      2260

Skip
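For anyone who wants to repeat this kind of measurement today, here is a minimal sketch of the method in Python 3. It times a function repeatedly and keeps the best of several runs. Several substitutions are assumptions:

- `cPickle` and `time.clock` no longer exist, so the sketch uses `pickle` and `time.perf_counter`.
- `xmlrpc.client` stands in for xmlrpclib.
- sgmlop and py-xmlrpc are not covered.
- `SAMPLE` is an abbreviated copy of the object above.
- `rate` and `best_rates` are hypothetical helper names.

Absolute numbers will not be comparable to the table.

```python
import marshal
import pickle
import time
import xmlrpc.client

# Abbreviated version of the object in the post: a tag string plus a flat
# dict of strings, ints and lists.
SAMPLE = ['MusicEntry', {
    'email': 'foo@bar.baz.spam', 'time': '7:30pm', 'state': 'MA',
    'start': '2002-01-26', 'performers': ['An Evening with Karen Savoca'],
    'addressid': 7283, 'city': 'Sudbury', 'merchandise': [],
    'keywords': ['.zyx.41'], 'key': 325629, 'active': 1,
    'venue': 'Fox Run House Concerts', 'price': '$17',
}]

# (name, encode, decode) triples; each decoder accepts its encoder's output.
CODECS = [
    ('marshal', marshal.dumps, marshal.loads),
    ('pickle', pickle.dumps, pickle.loads),
    ('xmlrpc.client',
     lambda obj: xmlrpc.client.dumps((obj,)),
     lambda s: xmlrpc.client.loads(s)[0][0]),
]


def rate(func, arg, seconds):
    """Call func(arg) for about `seconds` and return calls per second."""
    n = 0
    start = time.perf_counter()
    while True:
        func(arg)
        n += 1
        elapsed = time.perf_counter() - start
        if elapsed >= seconds:
            return n / elapsed


def best_rates(obj, runs=3, seconds=1.0):
    """Map codec name -> (encodes/s, decodes/s), keeping the best of `runs`."""
    results = {}
    for name, enc, dec in CODECS:
        data = enc(obj)
        assert dec(data) == obj  # the round trip must be lossless
        results[name] = (max(rate(enc, obj, seconds) for _ in range(runs)),
                         max(rate(dec, data, seconds) for _ in range(runs)))
    return results
```

Calling `best_rates(SAMPLE)` returns a dict of `(encodes/s, decodes/s)` pairs. Formatting them with `'{:.3g}'` matches the three-significant-digit rounding used in the table.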

Skip Montanaro writes:
Were the cPickle tests run in binary or original flavor?
I presume that Expat was available for the second run and not for the first? These should probably be broken into three categories: sgmlop, expat, and xmllib.

I also presume that py-xmlrpc never calls from C->Python during the parse phase, but I've not yet had a chance to look at this code.

  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Zope Corporation

>> Were the cPickle tests run in binary or original flavor?

I wasn't aware of a "binary flavor". It's not mentioned in the online docs. I just called cPickle.dumps or cPickle.loads as appropriate. It looks like I should pass the binary flag as a second argument.
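The two flavours still exist in Python 3's `pickle` module, into which cPickle was folded. Protocol 0 is the original printable-ASCII format; protocol 1 and above are binary. In Python 2 the binary form was requested with a second argument, as in `cPickle.dumps(obj, 1)`. A minimal sketch comparing them, with `record` as a made-up sample:

```python
import pickle

record = {'city': 'Sudbury', 'key': 325629,
          'performers': ['An Evening with Karen Savoca']}

# Protocol 0: the original text ("original flavor") format.
text = pickle.dumps(record, protocol=0)
# Newest binary protocol available in this interpreter.
binary = pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)

# loads() detects the protocol itself, so both decode to the same object.
assert pickle.loads(text) == pickle.loads(binary) == record

print(len(text), len(binary))
```

The binary form is usually both smaller and faster to load. That is why the flag matters for a benchmark like the one above.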
>> I presume that Expat was available for the second run and not for the first? These should probably be broken into three categories: sgmlop, expat, and xmllib.

In 0.9.8 there are two parsers, fast (with sgmlop) and slow (without). I believe the ExpatParser was used in the second version. It doesn't really matter to me because they all perform so abysmally.

>> I also presume that py-xmlrpc never calls from C->Python during the parse phase, but I've not yet had a chance to look at this code.

I don't know. I've not looked at the code, only the output. I have cc'd Shilad Sen on this thread. He should be able to tell us how py-xmlrpc gets such good performance.

Skip

>> I also presume that py-xmlrpc never calls from C->Python during the parse phase, but I've not yet had a chance to look at this code.

Fredrik> does py-xmlrpc use a real XML parser?

I suspect not. Its special purpose is to parse or generate XML-RPC, so you know ahead of time that the end result is the only thing you need.

Skip
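From the caller's side, "only the end result" looks like this. The sketch uses Python 3's `xmlrpc.client`, the standard-library descendant of xmlrpclib, and the response document is hand-written for illustration. `loads()` returns finished Python objects (an `<array>` becomes a list, a `<struct>` becomes a dict), and nothing in between is exposed to the caller.

```python
import xmlrpc.client

# A hand-written methodResponse: an <array> holding a string and a <struct>.
RESPONSE = """<?xml version="1.0"?>
<methodResponse><params><param><value><array><data>
  <value><string>MusicEntry</string></value>
  <value><struct>
    <member><name>city</name><value><string>Sudbury</string></value></member>
    <member><name>key</name><value><int>325629</int></value></member>
  </struct></value>
</data></array></value></param></params></methodResponse>"""

# loads() returns (params_tuple, method_name); a response has no method name.
(entry,), method = xmlrpc.client.loads(RESPONSE)

# The caller only sees the finished objects: a list containing a dict.
print(entry)  # ['MusicEntry', {'city': 'Sudbury', 'key': 325629}]
```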

>> I suspect not. Its special purpose is to parse or generate XML-RPC, so you know ahead of time that the end result is the only thing you need.

Paul> One reason to use a full XML parser is you get Unicode cheaply. I don't see Unicode as a feature that you add in a weekend at the end...

XML-RPC's relationship to Unicode is ill-defined. The spec that Dave Winer wrote requires all data to be US-ASCII, so XML-RPC isn't really XML-compliant. (You'll have to take up issues of standards compliance with Dave.) Still, Unicode or not, the notion that XML-RPC is a data serialization mechanism instead of a compound data markup language means you don't need to provide hooks for processing each element, so full-blown XML parsers tend to be overkill, as py-xmlrpc demonstrates. No matter how hard Shilad finds it to add Unicode support to his package, it's still likely to be miles ahead of other XML parsers.

Skip

Skip Montanaro wrote:
Most XML-RPC implementations support Unicode, Dave Winer notwithstanding. Plus, the XML-RPC spec says nothing to indicate that XML-RPC documents may not be encoded in either of XML's two built-in encodings, UTF-8 and UTF-16 (even if the data is restricted to ASCII values).
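At least one implementation can be checked directly. Python 3's `xmlrpc.client`, the successor to xmlrpclib, writes and reads non-ASCII strings without complaint. This is a minimal sketch of one library's behaviour, not a ruling on what the spec permits; `echo` is a hypothetical method name.

```python
import xmlrpc.client

# Non-ASCII text: outside the spec's US-ASCII rule, but ordinary
# character data as far as XML is concerned.
original = ('Caf\u00e9 in M\u00fcnchen',)

payload = xmlrpc.client.dumps(original, methodname='echo')
params, method = xmlrpc.client.loads(payload)

# The marshaller only escapes &, < and >, so the accented character is
# written into the XML as-is and survives the round trip.
assert '\u00e9' in payload
assert (params, method) == (original, 'echo')
```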
I don't see how that follows. py-xmlrpc needs to handle <struct> differently from <array>, so it needs to have a "hook" for each of those element types. Having a fixed list of hooks or an extensible array of them should not make much difference from a performance point of view.

Yes, an incomplete XML parser could be faster if it ignores Unicode, ignores character references, and does not do all of the error checking required by the spec. I'm not sure it would really improve performance much anyhow.

py-xmlrpc is probably faster because it doesn't call out to Python code until the entire message has been parsed. xmlrpclib, on the other hand, is written entirely in Python. Is there a Python XML-RPC implementation that uses no Python code but does use a true XML parser?
I think you are exaggerating the benefit of having a fixed vocabulary. There is hardly any performance boost possible based on that one detail. Paul Prescod

>> Still, Unicode or not, the notion that XML-RPC is a data serialization mechanism instead of a compound data markup language means you don't need to provide hooks for processing each element, so full-blown XML parsers tend to be overkill as py-xmlrpc demonstrates.

Paul> I don't see how that follows. py-xmlrpc needs to handle <struct> different than <array> so it needs to have a "hook" for each of those element types. Having a fixed list of hooks or an extensible array of them should not be much different from a performance point of view.

Sure, <struct> and <array> mean different things, but <struct> will always mean the same thing in an XML-RPC context. There's no need to provide any hooks. Once you've successfully parsed a <struct> you get a Python dictionary.

As far as I can tell, sgmlop is always going to be slower than py-xmlrpc because it must call back to an Unmarshaller instance for each tag. The only option currently available is the Unmarshaller class written in Python. PythonWare has a FastParser/FastUnmarshaller pair available now, which I don't have access to. Perhaps it exhibits encode/decode speeds similar to py-xmlrpc's. You'll have to ask Fredrik. Py-xmlrpc was written with the knowledge that intermediate results aren't useful and that, as you put it, it has a fixed vocabulary. Why structure a parser to accommodate situations that aren't needed?

Paul> Yes, an incomplete XML parser could be faster if it ignores Unicode, ignores character references, and does not do all of the error checking required by the spec. I'm not sure if this would really improve performance anyhow.

Does py-xmlrpc have a ways to go? Sure. It's still pretty new software, so give it time. You seem to be dismissing it completely because it's not as mature as, say, Expat.
I doubt it will lose a factor of 8 in encoding speed or a factor of 24 in decoding speed (the current speed advantages I measure over xmlrpclib 1.0b4 using sgmlop) when those things are all added. I'm not sure all those things will ever be needed, but you're welcome to think they will.

Paul> py-xmlrpc is probably faster because it doesn't call out to Python code until the entire message has been parsed. xmlrpclib on the other hand, is entirely written in Python. Is there a Python XML-RPC implementation that uses no Python code but does use a true XML parser?

That's precisely why py-xmlrpc is faster. Should it behave some other way? I don't think there is another XML-RPC parser out there that is available from Python but that doesn't use Python.

>> ... No matter how hard Shilad finds it to add Unicode support to his package, it's still likely to be miles ahead of other XML parsers.

Paul> I think you are exaggerating the benefit of having a fixed vocabulary. There is hardly any performance boost possible based on that one detail.

I don't see how you can't make that connection. XML-RPC has a fixed vocabulary and never needs to look at intermediate results. It sounds to me like all you have is a hammer, so everything looks like a nail. There are places for general-purpose XML parsers and places for special-purpose XML parsers. In this particular context I only care about how fast I can push objects between a client and server using XML-RPC.

I apologize if the subject seems more general than I intended. My only intention was to compare the data serialization performance of various tools. I didn't include "XML-RPC" in the subject of this thread because I tossed in marshal and cPickle results as well, simply for comparison.

Skip
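The "factor of 8" and "factor of 24" figures come straight from the benchmark table. They are py-xmlrpc's rates divided by those of xmlrpclib 1.0b4 with sgmlop. A quick check of the arithmetic:

```python
# Operations per second, copied from the benchmark table.
PY_XMLRPC = {'encode': 2780, 'decode': 2260}
XMLRPCLIB_10B4_SGMLOP = {'encode': 365, 'decode': 92.0}

speedup = {op: PY_XMLRPC[op] / XMLRPCLIB_10B4_SGMLOP[op] for op in PY_XMLRPC}

for op, ratio in speedup.items():
    print(f'{op}: {ratio:.1f}x')
# encode: 7.6x
# decode: 24.6x
```

So "8" is 7.6 rounded up, and "24" is 24.6 rounded down. Neither changes the argument.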

Skip Montanaro wrote:
There are two different issues. One is parsing: taking a string of bytes and interpreting it as XML. The other is passing this information to the Python programmer. The handling of "hooks" is on the back end, passing the information to the Python programmer. I interpreted Fredrik's question as being about the front end: does it use a real XML parser or not?
I'm not asking it to be as mature as Expat. I'm asking why it didn't *use* Expat or some other parser. Expat would recognize structs and arrays and pass them to C code which builds Python objects. Then those Python objects can be passed to Python.
Okay, so we agree that the fast part is probably not so much the parser but the handing of data to Python. So why rewrite a parser? Nothing requires an Expat-using XML-RPC implementation to call back into Python for every element. It can collect the results in C and then call Python when it has values.
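A minimal sketch of the structure Paul describes: Expat finds the elements, a fixed dispatch table keyed on tag name builds values on a stack, and one finished object is returned at the end. Several points are assumptions or limitations:

- It uses Python's `xml.parsers.expat` binding, so every handler is itself a Python call. It illustrates the shape of the approach, not the speed of a C implementation.
- `decode_value` is a hypothetical name.
- It handles only `<int>`/`<i4>`, `<string>`, `<array>` and `<struct>`, with no bare `<value>` text, doubles, booleans, base64 or dates.

```python
import xml.parsers.expat


def decode_value(xml_text):
    """Decode one XML-RPC <value> document into a Python object."""
    stack = []  # finished values (and struct member names), in order
    marks = []  # stack positions where each open <array>/<struct> begins
    text = []   # character data seen since the last start tag

    def start(tag, attrs):
        if tag in ('array', 'struct'):
            marks.append(len(stack))
        text.clear()

    def chars(data):
        text.append(data)  # Expat may split text across several calls

    def end_array():
        begin = marks.pop()
        stack[begin:] = [stack[begin:]]

    def end_struct():
        begin = marks.pop()
        items = stack[begin:]  # alternating name, value, name, value, ...
        stack[begin:] = [dict(zip(items[::2], items[1::2]))]

    # The fixed vocabulary: one end handler per value-producing tag.
    end_handlers = {
        'int': lambda s: stack.append(int(s)),
        'i4': lambda s: stack.append(int(s)),
        'string': stack.append,
        'name': stack.append,
        'array': lambda s: end_array(),
        'struct': lambda s: end_struct(),
    }

    def end(tag):
        handler = end_handlers.get(tag)
        if handler is not None:
            handler(''.join(text))
        text.clear()

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    (result,) = stack  # exactly one top-level value is expected
    return result
```

Whether this table is fixed or extensible makes no difference to the parsing itself, which is Paul's point. The cost Skip measured lies in how often control crosses into Python.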
Let me suggest an analogy. Someone writes "CGIPython". It uses a specially optimized parser designed for parsing only Python CGI scripts. Do you think it would run much faster than the regular Python parser? Well, syntactically CGI scripts are basically the same as ordinary Python programs so why would you *want* a specialized parser? Parsing angle brackets is the same whether they are in an XML-RPC message or a Docbook document, just as parsing Python is the same, whether it is a CGI or a GUI app.
I don't personally see much benefit in using XML if you don't adhere to the XML spec. Just perusing the code quickly, I believe I've found a few bugs that it would not have had if it were built on Expat or some other XML parser:

1. It doesn't handle ? syntax.
2. It doesn't handle <methodCall > (extra whitespace).
3. I strongly suspect it won't handle comments in the XML.
4. It won't handle the mandatory UTF-16 encoding from XML.
5. It won't handle CDATA sections.

Paul Prescod
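For comparison, a sketch of how an Expat-based implementation copes with several of these cases. It uses Python 3's `xmlrpc.client`, not py-xmlrpc. The document covers whitespace before `>`, a comment, a CDATA section, and the same call re-encoded as UTF-16. The first item ("? syntax") can't be recovered from the text, so it isn't tested. `echo` is a hypothetical method name.

```python
import xmlrpc.client

# A methodCall with a space before '>', a comment and a CDATA section.
CALL = """<?xml version="1.0"?>
<methodCall >
  <!-- comments are legal XML and are skipped by the parser -->
  <methodName>echo</methodName>
  <params><param><value><string><![CDATA[a < b & c]]></string></value></param></params>
</methodCall >"""

params, method = xmlrpc.client.loads(CALL)
assert (params, method) == (('a < b & c',), 'echo')

# Same document in UTF-16; Expat detects the encoding from the byte order mark.
utf16 = CALL.encode('utf-16')
assert xmlrpc.client.loads(utf16) == (params, method)
```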

>> That's precisely why py-xmlrpc is faster. Should it behave some other way? I don't think there is another XML-RPC parser out there that is available from Python but that doesn't use Python.

Paul> Okay, so we agree that the fast part is probably not so much the parser but the handing of data to Python. So why rewrite a parser? Nothing requires an Expat-using XML-RPC implementation to call back into Python for every element. It can collect the results in C and then call Python when it has values.

You're asking the wrong person. Only Shilad can describe his motivations. We happen to work in the same building, but we don't work for the same company. That's a coincidence about on par with the chances of winning the Powerball lottery. We never met each other formally until about a week ago. Not trying to put words in his mouth, but my guess would be that he was not approaching it as an XML problem, but as a parsing problem.

>> I don't see how you can't make that connection. XML-RPC has a fixed vocabulary and never needs to look at intermediate results.

Paul> Let me suggest an analogy. Someone writes "CGIPython". It uses a specially optimized parser designed for parsing only Python CGI scripts. Do you think it would run much faster than the regular Python parser?

Bad analogy. CGI scripts can contain the entire realm of "stuff" that goes into any other Python program. XML-RPC encodings can't contain arbitrary XML tags or attributes. A better analogy would have been Swallow (Martin's, I think): a hypothetical subset of Python that could be efficiently compiled.

Paul> I don't personally see much benefit using XML if you don't adhere to the XML spec. Just perusing the code quickly I believe I've found a few bugs that it would not have had if it built on Expat or some other XML parser.
Paul, you have to stop looking at XML-RPC through your Elton John-style XML-colored glasses. XML-RPC is not meant to be some sort of highly structured hierarchical data representation that you can sniff around in with arbitrary XML tools of one sort or another. That its on-the-wire representation happens to be XML is almost ridiculously unimportant. Dave Winer created an RPC tool that used XML at about the same time every computer journalist was wetting their pants every time they heard the letters X-M-L. Many implementations were able to leverage existing XML parsing tools to get going quickly, and Dave got some well-deserved publicity that he and XML-RPC wouldn't have gotten if he'd chosen some other serialization format like pickle, or invented something new.

Next step: make it go faster. Can that be done with standard XML tools? Yeah, I'm sure it can be. Not everybody approaches the problem with the same background you have, though.

Paul> 1. It doesn't handle ? syntax.
Paul> 2. It doesn't handle <methodCall > (extra whitespace)
Paul> 3. I strongly suspect it won't handle comments in the XML.
Paul> 4. It won't handle the mandatory UTF-16 encoding from XML
Paul> 5. It won't handle CDATA sections.

Fine. I'm sure Shilad appreciates the input. I think your approach to bug detection and reporting could have been a bit less heavy-handed. As for handling things like CDATA, UTF-16 and extra whitespace after tag names, I suspect some other XML-RPC packages would exhibit similar problems if they were exposed to a standards-toting XML gunslinger like yourself. That it's not a problem in practice is probably because the set of XML-RPC encoding and decoding software is fairly small and the stuff that encodes into XML-RPC is fairly well-behaved. XML-RPC's widespread availability and practical interoperability (the XML-RPC website lists 48 implementations) probably owe more to the cooperative nature of the people involved than to the purity of the parsers.

Skip

Skip Montanaro wrote:
But there is no evidence that this subset of XML can be parsed more efficiently than any other. XML parsing consists primarily of recognizing angle brackets and a few other characters, and passing around some extra data. Whatever speed advantage a "simplified" parser has over a "full" XML parser will shrink as people submit bug reports that require it to conform to the XML spec (Unicode, CDATA, etc.). I strongly agree that a dedicated C-written XML-RPC implementation can be faster than one built from Python and Expat. I haven't yet seen evidence that you can both conform to the standards and get much of a speedup over one that is built on a fast XML parser, such as Eric Kidd's xmlrpc-c or xmlrpc-epi (both on SourceForge).
XML-RPC uses XML for exactly the same reason every other application of XML uses XML: precisely so that you will not have to write yet another parser for it. That's the central reason *for* XML. That's the only advantage XML has over cPickle: you can be sure that whatever language you have, it will have an XML parser available built in.
I'm not trying to embarrass Shilad. The software isn't at 1.0 yet. Maybe he hasn't got around to choosing an XML parser. I'm trying to point out (more to you, than to him!) that there is a good reason to build on the work other people have done. If pyxmlrpc is faster today it is probably because it doesn't conform to the specs. When it does conform, it won't be faster anymore.
Every XML-RPC implementation I have ever used (Python, Perl, C, C++, PHP) is based upon one pure XML parser or another. Most use Expat. Paul Prescod

>> Paul, you have to stop looking at XML-RPC with your Elton John-style XML-colored glasses. XML-RPC is not meant to be some sort of highly structured hierarchical data representation that you can sniff around in with arbitrary XML tools of one sort or another. That its on-the-wire representation happens to be XML is almost ridiculously unimportant.

Paul> XML-RPC uses XML for exactly the same reason every other application of XML uses XML.

I disagree with that. Lots of applications use XML because it's got that pants-wetting capability I described earlier.

>> Fine. I'm sure Shilad appreciates the input. I think your approach to bug detection and reporting could have been a bit less heavy handed.

Paul> I'm not trying to embarrass Shilad. The software isn't at 1.0 yet. Maybe he hasn't got around to choosing an XML parser.

Or maybe he has a different set of constraints than you.

Paul> I'm trying to point out (more to you, than to him!) that there is a good reason to build on the work other people have done. If pyxmlrpc is faster today it is probably because it doesn't conform to the specs. When it does conform, it won't be faster anymore.

Why point this out to me? I am essentially just an XML-RPC user, not an implementer. I happen to be interested in making my XML-RPC-using code run faster. If I have to make some sacrifices, I couldn't care less, as long as my clients and my servers can talk to one another.

>> As for handling things like CDATA, UTF-16 and extra whitespace after tag names, I suspect some other XML-RPC packages would exhibit similar problems if they were exposed to a standards-toting XML gunslinger like yourself. That it's not a problem in practice is probably because the set of XML-RPC encoding and decoding software is fairly small and that the stuff that encodes into XML-RPC is fairly well-behaved.

Paul> Every XML-RPC implementation I have ever used (Python, Perl, C, C++, PHP) is based upon one pure XML parser or another. Most use Expat.

Oh well.

S

Skip has been kind enough to copy me on the bulk of the correspondence regarding py-xmlrpc versus other XML-RPC parsing options. py-xmlrpc began as a short hack to accomplish specific things that xmlrpclib couldn't easily accommodate. I used a hand-built parser because I thought it would be fun and easy (it was!).

Paul, you are correct in that my library doesn't support the 5 items you mentioned. I am aware of these, but they are actually not officially supported by the spec either. XML-RPC is a bit strange in that the spec does not allow or require true XML.

My library has been adopted far more than I would have guessed, and I have had many questions about things like SSL support (which is not up to spec either). As a result, I am almost finished with a rewrite that has all the transport and protocol components nicely split up. Switching the hand-coded parser to Expat is on my to-do list. My own parser works just fine, though, and I haven't had any complaints, so that is relatively low on the list.

My library is certainly not as flexible as xmlrpclib in its current form. I'm hoping that the rewrite will move it to a nice place in the performance/flexibility spectrum. As a side effect, it will have a nice, extensible standalone HTTP client and server that offers better performance for people who really need it.

I am perfectly aware of py-xmlrpc's shortcomings. On the other hand, it is exactly what the app we use needs, and I would be surprised if there aren't others with similar needs. My hope is that with the next major release, the library will move a bit closer to a place that suits people like Paul. Meanwhile, it works nicely for applications where performance requirements are absolutely critical.

Shilad Sen

Shilad Sen wrote:
> Skip has been kind enough to copy me on the bulk of correspondence regarding py-xmlrpc versus other xmlrpc parsing options.
Thanks for your good-natured response.
I think that if a spec claims to be based on XML and does not explicitly disclaim support for built-in XML features, then it allows them. For instance if it doesn't say that C syntax is illegal, then there is no reason to believe it is.
That's fine with me. If your simplified parser turns out to be significantly faster than Expat (too early to say) then you could even keep it around as an option when the client and the server are both known to be using the same subset of XML.
Did you consider wrapping one of the existing XML-RPC libraries written in C? When we needed a reentrant XML-RPC library for PHP, we wrapped Eric Kidd's xmlrpc-c. Paul Prescod

Paul> I think that if a spec claims to be based on XML and does not explicitly disclaim support for built-in XML features, then it allows them. For instance if it doesn't say that C syntax is illegal, then there is no reason to believe it is.

Paul,

You probably know as well as anyone that the one and only person you should be talking to about XML-RPC and its XML compliance (or lack thereof) is Dave Winer. Feel free to read through the archives of the xmlrpc@yahoogroups.com mailing list if you haven't already. If you can move Dave from his current position, more power to you. You'll do something that many other people have been incapable of doing.

I'm done with this topic. It's gotten way too far from python-dev-related topics. Probably should have cut it out of the cc list a while back.

Skip

> I'm done with this topic. It's gotten way too far from python-dev-related topics. Probably should have cut it out of the cc list awhile back.

Amen.

--Guido van Rossum (home page: http://www.python.org/~guido/)

paul wrote:
the specification says that XML-RPC uses XML and HTTP. it doesn't say anything about a Dave-specific subset of XML or HTTP... (like so many other parts of the specification, the "string" type isn't exactly well-specified. the specification first says that strings contains ASCII characters, and later that "any characters are allowed in a string" and that "a string can be used to encode binary data")
well, sgmlop is a bit faster than expat (up to 50%, in some tests). expat does a bit more error checking.
the _xmlrpclib accelerator (see the xmlrpclib.py source) uses expat, with a really fast C layer. judging from Skip's benchmarks, expat is a bit slower than the py-xmlrpc parser (which is why I asked). </F>

Paul> I have a feeling py-xmlrpc will slow down a bit when it is
Paul> internationalized:

Paul>     if (strncmp(*cp, "<int>", 5) == 0)
Paul>         res = decodeInt(cp, ep, lines);
Paul>     else if (strncmp(*cp, "<i4>", 4) == 0)
Paul>         res = decodeI4(cp, ep, lines);
Paul>     ....

Paul,

If you want to find and fix bugs in py-xmlrpc or help the author improve the quality of his tools, please send your reports directly to Shilad Sen (shilad@sourcelight.com). Py-xmlrpc has nothing to do with the Python core. I apologize for even including it in the table I posted. Shilad didn't deserve any of the bad press you've given him here. Sending snickering notes to python-dev about the code is not helpful, and only serves to lessen the value I place on your other opinions.

Skip
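The quoted C matches tags by comparing a fixed byte prefix. A rough Python analogue shows why that approach runs into the cases raised earlier in the thread. `naive_is_int_tag` is a hypothetical stand-in for the `strncmp()` test, not code from py-xmlrpc.

```python
def naive_is_int_tag(buf: bytes) -> bool:
    """Hypothetical analogue of strncmp(*cp, "<int>", 5) == 0."""
    return buf.startswith(b'<int>')


assert naive_is_int_tag(b'<int>42</int>')

# Well-formed variants that a conforming XML parser accepts but a fixed
# byte-prefix comparison does not:
assert not naive_is_int_tag(b'<int >42</int>')                 # space before '>'
assert not naive_is_int_tag('<int>42</int>'.encode('utf-16'))  # UTF-16 with BOM
```

Handling those cases means decoding and tokenizing before comparing. That extra work is what Paul expects to narrow the speed gap.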

I apologize if I embarrassed Shilad. I don't know him, so I don't know how he will take a public critique of his code. For all I know, he agrees with me and merely hasn't got around to adding in an XML parser. On the one hand, I can see how it would be nicer to discuss it directly with you and him, but on the other, it is a real technical issue that deserves public discussion. I felt (and feel) that you've made a technical mistake in attributing py-xmlrpc's speed to its having a fixed tagset, and I only posted code to demonstrate where the real speedup comes from.

I've spent my whole life working around bugs in hand-rolled XML (and SGML) parsers that are supposed to be faster than general ones but end up not being so. I react almost as intemperately when someone tells me that their app embeds a new scripting language that they invented over the weekend.

Although I do think that the current parsing approach taken in py-xmlrpc is flawed, I think the overall idea is good. It makes sense to parse XML-RPC purely in C without using Python callbacks.

Paul Prescod

Skip Montanaro writes:
Were the cPickle tests run in binary or original flavor?
| +---------------------------------------------------+ | +----> I presume that Expat was available for the second run and not for the first? These should probably be broken into three categories: sgmlop, expat, and xmllib. I also presume that py-xmlrpc never calls from C->Python during the parse phase, but I've not yet had a chance to look at this code. -Fred -- Fred L. Drake, Jr. <fdrake at acm.org> PythonLabs at Zope Corporation

Were the cPickle tests run in binary or original flavor? I wasn't aware of a "binary flavor". It's not mentioned in the online docs. I just called cPickle.dumps or cPickle.loads as appropriate. It looks like I should call them with a second binary flag.
| +---------------------------------------------------+ | +----> I presume that Expat was available for the second run and not for the first? These should probably be broken into three categories: sgmlop, expat, and xmllib. In 0.9.8 there are two parsers, fast (with sgmlop) and slow (without). I believe the ExpatParser was used in the second version. It doesn't really matter to me because they are all perform so abysmally. I also presume that py-xmlrpc never calls from C->Python during the parse phase, but I've not yet had a chance to look at this code. I don't know. I've not looked at the code, only the output. I have cc'd Shilad Sen on this thread. He should be able to tell us how py-xmlrpc gets such good performance. Skip

>> I also presume that py-xmlrpc never calls from C->Python during the >> parse phase, but I've not yet had a chance to look at this code. Fredrik> does py-xmlrpc use a real XML parser? I suspect not. It's special purpose is to parse or generate XML-RPC, so you know ahead of time that the end result is the only thing you need. Skip

>> I suspect not. It's special purpose is to parse or generate XML-RPC, >> so you know ahead of time that the end result is the only thing you >> need. Paul> One reason to use a full XML parser is you get Unicode cheaply. I Paul> don't see Unicode as a feature that you add in a weekend at the Paul> end... XML-RPC's relationship to Unicode is ill-defined. The spec that Dave Winer wrote requires all data to be US-ASCII, so XML-RPC isn't really XML-compliant. (You'll have to take up issues of standards compliance with Dave.) Still, Unicode or not, the notion that XML-RPC is a data serialization mechanism instead of a compound data markup language means you don't need to provide hooks for processing each element, so full-blown XML parsers tend to be overkill as py-xmlrpc demonstrates. No matter how hard Shilad finds it to add Unicode support to his package, it's still likely to be miles ahead of other XML parsers. Skip

Skip Montanaro wrote:
Most XML-RPC implementations support Unicode, Dave Winer notwithstanding. Plus, the XML-RPC spec says nothing to indicate that XML-RPC documents may not be encoded in either of XML's two built-in encodings (even if the data is restricted to ASCII values).
I don't see how that follows. py-xmlrpc needs to handle <struct> different than <array> so it needs to have a "hook" for each of those element types. Having a fixed list of hooks or an extensible array of them should not be much different from a performance point of view. Yes, an incomplete XML parser could be faster if it ignores Unicode, ignores character references, and does not do all of the error checking required by the spec. I'm not sure if this would really improve performance anyhow. py-xmlrpc is probably faster because it doesn't call out to Python code until the entire message has been parsed. xmlrpclib on the other hand, is entirely written in Python. Is there a Python XML-RPC implementation that uses no Python code but does use a true XML parser?
I think you are exaggerating the benefit of having a fixed vocabulary. There is hardly any performance boost possible based on that one detail. Paul Prescod

>> Still, Unicode or not, the notion that XML-RPC is a data >> serialization mechanism instead of a compound data markup language >> means you don't need to provide hooks for processing each element, so >> full-blown XML parsers tend to be overkill as py-xmlrpc demonstrates. Paul> I don't see how that follows. py-xmlrpc needs to handle <struct> Paul> different than <array> so it needs to have a "hook" for each of Paul> those element types. Having a fixed list of hooks or an extensible Paul> array of them should not be much different from a performance Paul> point of view. Sure, <struct> and <array> mean different things, but <struct> will always mean the same thing in an XML-RPC context. There's no need to provide any hooks. Once you've successfully parsed a <struct> you get a Python dictionary. As far as I can tell sgmlop is always going to be slower than py-xmlrpc because it must callback to an Unmarshaller instance for each tag. The only option currently available is the Unmarshaller class written in Python. Pythonware has a FastParser/FastUnmarshaller pair available now which I don't have access to. Perhaps it exhibits encode/decode speeds similar to py-xmlrpc. You'll have to ask Fredrik. Py-xmlrpc was written with the knowledge that intermediate results aren't useful and that as you put it, it has a fixed vocabulary. Why structure a parser to accommodate situations that aren't needed? Paul> Yes, an incomplete XML parser could be faster if it ignores Paul> Unicode, ignores character references, and does not do all of the Paul> error checking required by the spec. I'm not sure if this would Paul> really improve performance anyhow. Does py-xmlrpc have a ways to go? Sure. It's still pretty new software, so give it time. You seem to be dismissing it completely because it's not as mature as, say, Expat. 
I doubt it will lose a factor of 8 in encoding speed or a factor of 24 in decoding speed (the current speed advantages I measure over xmlrpclib 1.0b4 using sgmlop) when those things are all added. I'm not sure all those things will ever be needed, but you're welcome to think they will. Paul> py-xmlrpc is probably faster because it doesn't call out to Python Paul> code until the entire message has been parsed. xmlrpclib on the Paul> other hand, is entirely written in Python. Is there a Python Paul> XML-RPC implementation that uses no Python code but does use a Paul> true XML parser? That's precisely why py-xmlrpc is faster. Should it behave some other way? I don't think there is another XML-RPC parser out there that is available from Python but that doesn't use Python. >> ... No matter how hard Shilad finds it to add Unicode support to his >> package, it's still likely to be miles ahead of other XML parsers. Paul> I think you are exaggerating the benefit of having a fixed Paul> vocabulary. There is hardly any performance boost possible based Paul> on that one detail. I don't understand see how you can't make that connection. XML-RPC has a fixed vocabulary and never needs to look at intermediate results. It sounds to me like all you have is a hammer so everything looks like a nail. There are places for general-purpose XML parsers and places for special-purpose XML parsers. In this particular context I only care about how fast I can push objects between a client and server using XML-RPC. I apologize if the subject seems more general than I intended. My only intention was to compare the data serialization performance of various tools. I didn't include "XML-RPC" in the subject of this thread because I tossed in marshal and cPickle results as well, simply for comparison. Skip

Skip Montanaro wrote:
There are two different issues. One is parsing: taking a string of bytes and interpreting them as XML. The other is passing this information to the Python programmer. The handling of "hooks" is on the backend, passing the information to the Python programmer. I interpreted Fredrik's question as being about the front end: does it use a real XML parser or not.
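Paul's split between the front end (parsing) and the back end (handing results to Python) can be measured roughly with the standard library. The sketch below times expat parsing the same XML-RPC message with no handlers attached, and then with no-op Python handlers attached. The gap approximates the cost of calling into Python for every element. The message shape, repeat count and helper name `parse_seconds` are arbitrary choices, and absolute timings depend on the machine.

```python
import time
import xml.parsers.expat
import xmlrpc.client


def parse_seconds(xml_bytes, with_handlers, repeat=200):
    """Return wall-clock seconds for `repeat` expat parses of xml_bytes."""

    def noop(*args):
        pass

    start = time.perf_counter()
    for _ in range(repeat):
        parser = xml.parsers.expat.ParserCreate()
        if with_handlers:
            # Every element and text run now costs a call into Python.
            parser.StartElementHandler = noop
            parser.EndElementHandler = noop
            parser.CharacterDataHandler = noop
        parser.Parse(xml_bytes, True)
    return time.perf_counter() - start


# 100 small structs; the shape is arbitrary.
rows = [{"n": i, "s": "x" * 10} for i in range(100)]
msg = xmlrpc.client.dumps((rows,)).encode("utf-8")

bare = parse_seconds(msg, with_handlers=False)
hooked = parse_seconds(msg, with_handlers=True)
print(f"expat only: {bare:.4f}s   expat + Python callbacks: {hooked:.4f}s")
```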
I'm not asking it to be as mature as Expat. I'm asking why it didn't *use* Expat or some other parser. Expat would recognize structs and arrays and pass them to C code which builds Python objects. Then those Python objects can be passed to Python.
Okay, so we agree that the fast part is probably not so much the parser but the handing of data to Python. So why rewrite a parser? Nothing requires an Expat-using XML-RPC implementation to call back into Python for every element. It can collect the results in C and then call Python when it has values.
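The structure Paul suggests, where the parser collects results itself and hands only finished values to the caller, might look like the sketch below. `ValueBuilder` and `decode_params` are hypothetical names. The sketch is written in Python for readability, so every element still costs a Python callback. What it shows is the shape of the design: all intermediate state lives in the builder, and the caller receives complete Python objects. In a C implementation the same handlers would be C functions, and Python would only see the final result. Only a subset of XML-RPC types is covered.

```python
import xml.parsers.expat
import xmlrpc.client


class ValueBuilder:
    """Assemble XML-RPC values on an explicit stack, keeping only finished objects.

    Sketch only: handles int/i4/boolean/double/string, <array> and <struct>.
    dateTime.iso8601, base64, nil, untyped <value>text</value> and faults are
    not supported.
    """

    SCALARS = {
        "int": int,
        "i4": int,
        "boolean": lambda text: text.strip() == "1",
        "double": float,
        "string": str,
    }

    def __init__(self):
        self.containers = []  # open <array> lists and <struct> dicts
        self.names = []       # current member name for each open <struct>
        self.text = []        # character data of the innermost element
        self.params = []      # completed top-level values, one per <param>

    def start(self, tag, attrs):
        self.text = []
        if tag == "array":
            self.containers.append([])
        elif tag == "struct":
            self.containers.append({})
            self.names.append(None)

    def data(self, text):
        self.text.append(text)

    def end(self, tag):
        if tag in self.SCALARS:
            self._emit(self.SCALARS[tag]("".join(self.text)))
        elif tag == "name":
            self.names[-1] = "".join(self.text)
        elif tag in ("array", "struct"):
            finished = self.containers.pop()
            if tag == "struct":
                self.names.pop()
            self._emit(finished)

    def _emit(self, value):
        # Attach a finished value to its parent container, or record it as a parameter.
        if not self.containers:
            self.params.append(value)
        elif isinstance(self.containers[-1], list):
            self.containers[-1].append(value)
        else:
            self.containers[-1][self.names[-1]] = value


def decode_params(xml_text):
    builder = ValueBuilder()
    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = builder.start
    parser.EndElementHandler = builder.end
    parser.CharacterDataHandler = builder.data
    parser.Parse(xml_text, True)
    return tuple(builder.params)


msg = xmlrpc.client.dumps(({"city": "Sudbury", "performers": ["Karen Savoca"], "key": 325629},))
print(decode_params(msg))
# ({'city': 'Sudbury', 'performers': ['Karen Savoca'], 'key': 325629},)
```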
Let me suggest an analogy. Someone writes "CGIPython". It uses a specially optimized parser designed for parsing only Python CGI scripts. Do you think it would run much faster than the regular Python parser? Well, syntactically CGI scripts are basically the same as ordinary Python programs so why would you *want* a specialized parser? Parsing angle brackets is the same whether they are in an XML-RPC message or a Docbook document, just as parsing Python is the same, whether it is a CGI or a GUI app.
I don't personally see much benefit using XML if you don't adhere to the XML spec. Just perusing the code quickly I believe I've found a few bugs that it would not have had if it built on Expat or some other XML parser.

1. It doesn't handle ? syntax.
2. It doesn't handle <methodCall > (extra whitespace).
3. I strongly suspect it won't handle comments in the XML.
4. It won't handle the mandatory UTF-16 encoding from XML.
5. It won't handle CDATA sections.

Paul Prescod
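A conforming parser handles items 2 through 5 in Paul's list without any extra work. As a quick check, Python's standard `xmlrpc.client.loads`, which is built on expat, accepts a hand-written message that contains a comment, whitespace inside a start tag and a CDATA section. A second call passes the same document encoded as UTF-16 with a byte-order mark. The message is made up for illustration. Item 1 is left out because the construct it names cannot be recovered from the text.

```python
import xmlrpc.client

MSG = """<?xml version="1.0"?>
<!-- comments are legal XML -->
<methodCall >
  <methodName>echo</methodName>
  <params>
    <param><value><string><![CDATA[a < b & c]]></string></value></param>
  </params>
</methodCall>
"""

params, method = xmlrpc.client.loads(MSG)
print(method, params)  # echo ('a < b & c',)

# The same document as UTF-16 bytes with a byte-order mark; expat detects
# the encoding on its own.
params16, method16 = xmlrpc.client.loads(MSG.encode("utf-16"))
print(method16, params16)
```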

>> That's precisely why py-xmlrpc is faster. Should it behave some other way? I don't think there is another XML-RPC parser out there that is available from Python but that doesn't use Python.

Paul> Okay, so we agree that the fast part is probably not so much the parser but the handing of data to Python. So why rewrite a parser? Nothing requires an Expat-using XML-RPC implementation to call back into Python for every element. It can collect the results in C and then call Python when it has values.

You're asking the wrong person. Shilad will be the only person who can describe his motivations. We happen to work in the same building, but we don't work for the same company. That's a coincidence about on par with the chances of winning the Powerball lottery. We never met each other formally until about a week ago. Not trying to put words in his mouth, but my guess would be that he was not approaching it as an XML problem, but as a parsing problem.

>> I don't see how you can't make that connection. XML-RPC has a fixed vocabulary and never needs to look at intermediate results.

Paul> Let me suggest an analogy. Someone writes "CGIPython". It uses a specially optimized parser designed for parsing only Python CGI scripts. Do you think it would run much faster than the regular Python parser?

Bad analogy. CGI scripts can contain the entire realm of "stuff" that goes into any other Python program. XML-RPC encodings can't contain arbitrary XML tags or attributes. A better analogy would have been (Martin's, I think) hypothetical Swallow - a subset of Python that could be efficiently compiled.

Paul> I don't personally see much benefit using XML if you don't adhere to the XML spec. Just perusing the code quickly I believe I've found a few bugs that it would not have had if it built on Expat or some other XML parser.
Paul, you have to stop looking at XML-RPC with your Elton John-style XML-colored glasses. XML-RPC is not meant to be some sort of highly structured hierarchical data representation that you can sniff around in with arbitrary XML tools of one sort or another. That its on-the-wire representation happens to be XML is almost ridiculously unimportant. Dave Winer created an RPC tool that used XML at about the same time every computer journalist was wetting their pants every time they heard the letters X-M-L. Many implementations were able to leverage existing XML parsing tools to get going quickly, and Dave got some well-deserved publicity that he and XML-RPC wouldn't have gotten if he'd chosen some other serialization format like Pickle, or invented something new. Next step: make it go faster. Can that be done with standard XML tools? Yeah, I'm sure it can be. Not everybody approaches the problem with the same background you have, though.

Paul> 1. It doesn't handle ? syntax.
Paul> 2. It doesn't handle <methodCall > (extra whitespace)
Paul> 3. I strongly suspect it won't handle comments in the XML.
Paul> 4. It won't handle the mandatory UTF-16 encoding from XML
Paul> 5. It won't handle CDATA sections.

Fine. I'm sure Shilad appreciates the input. I think your approach to bug detection and reporting could have been a bit less heavy-handed. As for handling things like CDATA, UTF-16 and extra whitespace after tag names, I suspect some other XML-RPC packages would exhibit similar problems if they were exposed to a standards-toting XML gunslinger like yourself. That it's not a problem in practice is probably because the set of XML-RPC encoding and decoding software is fairly small and that the stuff that encodes into XML-RPC is fairly well-behaved. XML-RPC's widespread availability and practical interoperability (the XML-RPC website lists 48 implementations) probably owe more to the cooperative nature of the people involved than to the purity of the parsers.

Skip
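Skip's point that XML-RPC has a closed vocabulary with no attributes can be checked mechanically. The sketch below lists the element names from the XML-RPC specification and reports any other element names, or any attributes, found in a message. The `nil` and `i8` extensions that some libraries emit are left out on purpose. The function name `foreign_markup` is hypothetical.

```python
import xml.parsers.expat
import xmlrpc.client

# Element names defined by the XML-RPC specification (extensions excluded).
XMLRPC_TAGS = {
    "methodCall", "methodName", "methodResponse", "params", "param",
    "value", "i4", "int", "boolean", "string", "double",
    "dateTime.iso8601", "base64", "struct", "member", "name",
    "array", "data", "fault",
}


def foreign_markup(xml_text):
    """Return (unknown element names, elements that carried attributes)."""
    unknown, with_attrs = set(), set()

    def on_start(tag, attrs):
        if tag not in XMLRPC_TAGS:
            unknown.add(tag)
        if attrs:
            with_attrs.add(tag)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(xml_text, True)
    return unknown, with_attrs


msg = xmlrpc.client.dumps(([1, "two", 3.0, True],), "demo")
print(foreign_markup(msg))  # (set(), set())
```

Because the set of legal elements is this small and fixed, a special-purpose decoder can dispatch on it directly. Paul's counterpoint is that recognizing the markup costs about the same whatever the vocabulary is.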

Skip Montanaro wrote:
But there is no evidence that this subset of XML can be more efficiently parsed than any other. XML parsing consists primarily of recognizing angle brackets and a few other characters, and passing around some extra data. Any performance advantage a "simplified" XML parser has over a "full" one will shrink as people submit bug reports that require it to conform to the XML spec (Unicode, CDATA, etc.). I strongly agree that a dedicated C-written XML-RPC implementation can be faster than one written based on Python and Expat. I haven't yet seen evidence that you can both conform to the standards and get much of a speedup over one that is built on a fast XML parser, such as Eric Kidd's xmlrpc-c or xmlrpc-epi (both on SourceForge).
XML-RPC uses XML for exactly the same reason every other application of XML uses XML. Precisely so that you will not have to write yet another parser for it. That's the central reason *for* XML. That's the only advantage XML has over cPickle -- that you can be sure whatever language you have, it will have an XML parser available built in.
I'm not trying to embarrass Shilad. The software isn't at 1.0 yet. Maybe he hasn't got around to choosing an XML parser. I'm trying to point out (more to you, than to him!) that there is a good reason to build on the work other people have done. If pyxmlrpc is faster today it is probably because it doesn't conform to the specs. When it does conform, it won't be faster anymore.
Every XML-RPC implementation I have ever used (Python, Perl, C, C++, PHP) is based upon one pure XML parser or another. Most use Expat. Paul Prescod

>> Paul, you have to stop looking at XML-RPC with your Elton John-style XML-colored glasses. XML-RPC is not meant to be some sort of highly structured hierarchical data representation that you can sniff around in with arbitrary XML tools of one sort or another. That its on-the-wire representation happens to be XML is almost ridiculously unimportant.

Paul> XML-RPC uses XML for exactly the same reason every other application of XML uses XML.

I disagree with that. Lots of applications use XML because it's got that pants-wetting capability I described earlier.

>> Fine. I'm sure Shilad appreciates the input. I think your approach to bug detection and reporting could have been a bit less heavy-handed.

Paul> I'm not trying to embarrass Shilad. The software isn't at 1.0 yet. Maybe he hasn't got around to choosing an XML parser.

Or maybe he has a different set of constraints than you.

Paul> I'm trying to point out (more to you, than to him!) that there is a good reason to build on the work other people have done. If pyxmlrpc is faster today it is probably because it doesn't conform to the specs. When it does conform, it won't be faster anymore.

Why point this out to me? I am essentially just an XML-RPC user, not an implementer. I happen to be interested in making my XML-RPC-using code run faster. If I have to make some sacrifices I could care less, as long as my clients and my servers can talk to one another.

>> As for handling things like CDATA, UTF-16 and extra whitespace after tag names, I suspect some other XML-RPC packages would exhibit similar problems if they were exposed to a standards-toting XML gunslinger like yourself. That it's not a problem in practice is probably because the set of XML-RPC encoding and decoding software is fairly small and that the stuff that encodes into XML-RPC is fairly well-behaved.

Paul> Every XML-RPC implementation I have ever used (Python, Perl, C, C++, PHP) is based upon one pure XML parser or another. Most use Expat.

Oh well.

S
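Readers who want to measure encode/decode rates like the ones Skip quotes on their own data can start from a minimal harness like the one below. It reports calls per second and keeps the best of three timed runs. It assumes a modern Python, so it uses `timeit` and `xmlrpc.client` in place of xmlrpclib. sgmlop and py-xmlrpc are not included, and the record is an abbreviated stand-in, so the numbers will not match anything reported in the thread.

```python
import marshal
import pickle
import timeit
import xmlrpc.client

# Abbreviated stand-in for the kind of record discussed in the thread.
record = {
    "city": "Sudbury",
    "state": "MA",
    "start": "2002-01-26",
    "performers": ["An Evening with Karen Savoca"],
    "keywords": [".zyx.41"],
    "merchandise": [],
    "key": 325629,
    "active": 1,
}


def rate(func, number):
    """Calls per second for func: best (shortest) of three timed runs."""
    best = min(timeit.repeat(func, number=number, repeat=3))
    return number / best


def benchmark(number=2000):
    """Return {name: (encodes per second, decodes per second)}."""
    proto = pickle.HIGHEST_PROTOCOL
    codecs = {
        "marshal": (marshal.dumps, marshal.loads),
        "pickle": (lambda obj: pickle.dumps(obj, proto), pickle.loads),
        # xmlrpc.client works on a tuple of params; loads returns (params, method).
        "xmlrpc.client": (lambda obj: xmlrpc.client.dumps((obj,)), xmlrpc.client.loads),
    }
    results = {}
    for name, (encode, decode) in codecs.items():
        wire = encode(record)
        results[name] = (
            rate(lambda: encode(record), number),
            rate(lambda: decode(wire), number),
        )
    return results


if __name__ == "__main__":
    print(f"{'':15s} {'encode/s':>10s} {'decode/s':>10s}")
    for name, (enc, dec) in benchmark().items():
        print(f"{name:15s} {enc:10.0f} {dec:10.0f}")
```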

Skip has been kind enough to copy me on the bulk of correspondence regarding py-xmlrpc versus other xmlrpc parsing options. py-xmlrpc began as a short hack to accomplish specific things that xmlrpclib couldn't easily accommodate. I used a hand-built parser because I thought it would be fun and easy (it was!). Paul, you are correct in that my library doesn't support the 5 items you mentioned. I am aware of these, but they are actually not officially supported by the spec either. XML-RPC is a bit strange in that the spec does not allow or require true XML.

My library has been adopted far more than I would have guessed, and I have had many questions about things like SSL support (which is not up to spec either). As a result, I am almost finished with a rewrite that has all the transport and protocol components nicely split up. Switching the hand-coded parser to expat is on my to-do list. My own parser works just fine, though, and I haven't had any complaints, so that is relatively low on the list. My library is certainly not as flexible as xmlrpclib in its current format. I'm hoping that the rewrite will move it to a nice place in the performance/flexibility spectrum. As a side effect, it will have a nice extensible standalone HTTP client and server that offers better performance for people who really need it.

I am perfectly aware of py-xmlrpc's shortcomings. On the other hand, it is exactly what the app we use needs, and I would be surprised if there aren't others who have similar needs. My hope is that with the next major release, the library will move a bit closer to a place that suits people like Paul. Meanwhile, it works nicely for applications where performance requirements are absolutely critical.

Shilad Sen

Shilad Sen wrote:
Skip has been kind enough to copy me on the bulk of correspondence regarding py-xmlrpc versus other xmlrpc parsing options.
Thanks for your good-natured response.
I think that if a spec claims to be based on XML and does not explicitly disclaim support for built-in XML features, then it allows them. For instance if it doesn't say that C syntax is illegal, then there is no reason to believe it is.
That's fine with me. If your simplified parser turns out to be significantly faster than Expat (too early to say) then you could even keep it around as an option when the client and the server are both known to be using the same subset of XML.
Did you consider wrapping one of the existing XML-RPC libraries written in C? When we needed a reentrant XML-RPC library for PHP, we wrapped Eric Kidd's xmlrpc-c. Paul Prescod

Paul> I think that if a spec claims to be based on XML and does not explicitly disclaim support for built-in XML features, then it allows them. For instance if it doesn't say that C syntax is illegal, then there is no reason to believe it is.

Paul,

You probably know as well as anyone that the one and only person you should be talking to about XML-RPC and its XML compliance (or lack thereof) is Dave Winer. Feel free to read through the archives of the xmlrpc@yahoogroups.com mailing list if you haven't already. If you can move Dave from his current position, more power to you. You'll do something that many other people have been incapable of doing.

I'm done with this topic. It's gotten way too far from python-dev-related topics. Probably should have cut it out of the cc list a while back.

Skip

I'm done with this topic. It's gotten way too far from python-dev-related topics. Probably should have cut it out of the cc list a while back.
Amen. --Guido van Rossum (home page: http://www.python.org/~guido/)

paul wrote:
the specification says that XML-RPC uses XML and HTTP. it doesn't say anything about a Dave-specific subset of XML or HTTP... (like so many other parts of the specification, the "string" type isn't exactly well-specified. the specification first says that strings contain ASCII characters, and later that "any characters are allowed in a string" and that "a string can be used to encode binary data")
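Here is one data point on how an implementation can resolve that ambiguity. Python's standard `xmlrpc.client` sends any `str`, including non-ASCII text, as `<string>` with `&`, `<` and `>` escaped. It sends `bytes` as `<base64>`. Passing `use_builtin_types=True` makes `loads` return the base64 payload as `bytes` instead of an `xmlrpc.client.Binary` wrapper. Other implementations may choose differently, and the sample values are arbitrary.

```python
import xmlrpc.client

text = "caf\u00e9 <&>"          # non-ASCII plus XML-special characters
blob = b"\x00\xff raw bytes"    # NUL is not a legal XML character, so this can't be text

wire = xmlrpc.client.dumps((text, blob))
print(wire)

(decoded_text, decoded_blob), _ = xmlrpc.client.loads(wire, use_builtin_types=True)
print(decoded_text == text, decoded_blob == blob)  # True True
```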
well, sgmlop is a bit faster than expat (up to 50%, in some tests). expat does a bit more error checking.
the _xmlrpclib accelerator (see the xmlrpclib.py source) uses expat, with a really fast C layer. judging from Skip's benchmarks, expat is a bit slower than the py-xmlrpc parser (which is why I asked). </F>

Paul> I have a feeling py-xmlrpc will slow down a bit when it is internationalized:

Paul>     if (strncmp(*cp, "<int>", 5) == 0)
Paul>         res = decodeInt(cp, ep, lines);
Paul>     else if (strncmp(*cp, "<i4>", 4) == 0)
Paul>         res = decodeI4(cp, ep, lines);
Paul>     ....

Paul,

If you want to find and fix bugs in py-xmlrpc or help the author improve the quality of his tools, please send your reports directly to Shilad Sen (shilad@sourcelight.com). Py-xmlrpc has nothing to do with the Python core. I apologize for even including it in the table I posted. Shilad didn't deserve any of the bad press you've given him here. Sending snickering notes to python-dev about the code is not helpful, and only serves to lessen the value I place on your other opinions.

Skip
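The concern behind the quoted dispatch code can be illustrated without py-xmlrpc itself. The snippet below is a rough Python analogue of comparing raw input bytes against an ASCII tag literal, as `strncmp(*cp, "<int>", 5)` does. The helper `starts_with_tag` is hypothetical and does not come from py-xmlrpc. The same element stops matching once it is encoded as UTF-16, and it also fails to match when it carries the whitespace that XML allows before `>`.

```python
def starts_with_tag(buf: bytes, tag: bytes) -> bool:
    """Rough analogue of strncmp(buf, tag, len(tag)) == 0 on raw bytes."""
    return buf[: len(tag)] == tag


element = "<int>42</int>"
print(starts_with_tag(element.encode("utf-8"), b"<int>"))      # True
print(starts_with_tag(element.encode("utf-16-le"), b"<int>"))  # False: bytes are "<\x00i\x00..."
print(starts_with_tag(b"<int >42</int >", b"<int>"))           # False, yet well-formed XML
```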

I apologize if I embarrassed Shilad. I don't know him so I don't know how he will take a public critique of his code. For all I know, he agrees with me and merely hasn't got around to adding in an XML parser. On the one hand, I can see how it would be nicer to discuss it directly with you and him, but on the other, it is a real technical issue that deserves public discussion. I felt (and feel) that you've made a technical mistake in attributing py-xmlrpc's speed to its having a fixed tagset and I only posted code to demonstrate where the real speedup comes from. I've spent my whole life working around bugs in hand-rolled XML (and SGML) parsers that are supposed to be faster than general ones but end up not being so. I react almost as intemperately when someone tells me that their app embeds a new scripting language that they invented over the weekend. Although I do think that the current parsing approach taken in py-xmlrpc is flawed, I do think that the overall idea is good. It makes sense to parse XML-RPC purely in C without using Python callbacks. Paul Prescod