So utterly confused w/ various XML libraries
skip at pobox.com
Tue Aug 6 00:13:58 CEST 2002
Robb> Can someone point me to a FAQ, or better yet sketch out a mindmap
Robb> of the various python XML implementations, and how they relate?
Don't know how much reading you've done on the topic, but if you haven't
done much, you might want to start with the XML topic guide:
Robb> .... But receiving even trivial data is too slow to be useful, and
Robb> the profiler shows all the time spent in "PyExpat.py".
Robb> And I've seen various pieces of advice like, "Go get
Robb> sgmlop/minidom/cdomlette". But I don't know how PyXML relates to
Robb> any of these, or PyExpat, or even why I need it...
Various XML parsers provide different features. Some validate, others
don't. Some are written in Python, others in C. If XML parsing is what's
slowing you down (sorta seems that way from your comments), my guess is that
at the lowest layer, your XML is getting parsed with Python code. Parsers
like sgmlop (which I use quite happily underneath the xmlrpclib module) are
written in C for performance.
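For comparison, here is a minimal sketch of handing a document straight to expat, the C-coded parser the standard library exposes as xml.parsers.expat. It assumes a modern Python 3. count_elements and the sample document are hypothetical, there only to show the callback style: you register handlers and expat calls them as it scans the bytes.

```python
import xml.parsers.expat
from collections import Counter


def count_elements(xml_bytes):
    """Parse xml_bytes with the C-level expat parser and tally element names.

    count_elements is a hypothetical helper for illustration only.
    """
    counts = Counter()

    def start_element(name, attrs):
        # Called once per opening tag; attrs is a dict of attribute values.
        counts[name] += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(xml_bytes, True)  # True marks this as the final chunk
    return counts


if __name__ == "__main__":
    doc = b"<params><param><value><int>1</int></value></param></params>"
    print(count_elements(doc))
```

Since the scanning happens in C, only the handler calls cost Python time, which is the main reason C-backed parsers such as expat or sgmlop beat pure-Python ones.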
That said, the biggest boost to performance will be found when you eliminate
as many XML tags from your serialized data as possible. *If* you know both
your client and server are written in Python, you might look at using the
cPickle or marshal modules to wrap up your input parameters or function
results, then ship them as a single binary value via SOAP (or XML-RPC, as in
the example below). For portability's sake this may mean you
have two versions of most methods on your server. The "cheater" does the
marshalling and unmarshalling of the data and calls the real method.
Programs calling from other languages call the real method directly, e.g.:
    def method(self, arg1, arg2, arg3):
        ... buncha computing elided ...
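Python clients call the "cheater" method, which unpacks the marshalled
arguments, calls the real method, and sends the marshalled result back as a
binary value:

    import marshal, xmlrpclib

    def methodp(self, args):
        # args is an xmlrpclib.Binary holding a marshalled (arg1, arg2, arg3)
        arg1, arg2, arg3 = marshal.loads(args.data)
        big_hairy_result = self.method(arg1, arg2, arg3)
        slimmed_down_result = marshal.dumps(big_hairy_result)
        # wrap the bytes so they travel as a single <base64> value
        return xmlrpclib.Binary(slimmed_down_result)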
At the client end, you need to perform some extra steps to get at the real
result:

    import marshal, xmlrpclib

    # server is your proxy object for the remote service
    args = marshal.dumps((arg1, arg2, arg3))
    slim_result = server.methodp(xmlrpclib.Binary(args))
    real_result = marshal.loads(slim_result.data)
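To see the whole round trip in one place, here is a self-contained sketch in Python 3, where the xmlrpclib module is called xmlrpc.client. Service, its toy method body, and simulate_call are hypothetical names. simulate_call just pushes each request and reply through xmlrpc.client.dumps/loads to mimic what would cross the wire, with no sockets involved.

```python
import marshal
import xmlrpc.client  # Python 3's name for the xmlrpclib module


class Service:
    """Hypothetical service exposing a 'real' method and its marshal 'cheater'."""

    def method(self, arg1, arg2, arg3):
        # Stand-in for the real computation (hypothetical).
        return {"sum": arg1 + arg2 + arg3, "items": list(range(arg3))}

    def methodp(self, args):
        # Unpack the marshalled argument tuple carried in a Binary value.
        arg1, arg2, arg3 = marshal.loads(args.data)
        big_hairy_result = self.method(arg1, arg2, arg3)
        # Re-wrap the marshalled result so it travels as one base64 element.
        return xmlrpc.client.Binary(marshal.dumps(big_hairy_result))


def simulate_call(service, name, *params):
    """Encode a call and its reply as XML-RPC text, mimicking the wire (no sockets)."""
    request = xmlrpc.client.dumps(params, methodname=name)
    decoded_params, method_name = xmlrpc.client.loads(request)
    result = getattr(service, method_name)(*decoded_params)
    response = xmlrpc.client.dumps((result,), methodresponse=True)
    (reply,), _ = xmlrpc.client.loads(response)
    return reply


if __name__ == "__main__":
    svc = Service()
    # Non-Python clients would call "method" directly with plain XML-RPC values.
    print(simulate_call(svc, "method", 1, 2, 3))
    # Python clients use the cheater and unmarshal the reply themselves.
    slim_result = simulate_call(svc, "methodp", xmlrpc.client.Binary(marshal.dumps((1, 2, 3))))
    print(marshal.loads(slim_result.data))
```

Both calls should yield the same value. One caveat on the design: the marshal format is not guaranteed to stay the same across Python versions, so this trick assumes both ends run compatible interpreters; pickle (cPickle in older Pythons) is the more forgiving choice when they might not.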
While you're doing a bit more work, the system has to transport and parse a
lot less data, because marshal's or cPickle's encoding is much more efficient
(in both time and space) than any XML encoding of the same values.
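A quick way to check the size claim, again as a Python 3 sketch: encode the same result as an XML-RPC response both ways and compare. plain_and_slim is a hypothetical helper, and the list of 1000 integers is made-up data for illustration, not a benchmark.

```python
import marshal
import xmlrpc.client


def plain_and_slim(result):
    """Return the XML-RPC response text for result sent as-is and sent marshalled."""
    plain = xmlrpc.client.dumps((result,), methodresponse=True)
    slim = xmlrpc.client.dumps(
        (xmlrpc.client.Binary(marshal.dumps(result)),), methodresponse=True
    )
    return plain, slim


if __name__ == "__main__":
    # Hypothetical "big hairy result": a list of 1000 small integers.
    big_hairy_result = list(range(1000))
    plain, slim = plain_and_slim(big_hairy_result)
    print("plain:", len(plain), "chars,", plain.count("<value>"), "<value> elements")
    print("slim: ", len(slim), "chars,", slim.count("<value>"), "<value> elements")
```

The plain encoding spends one <value> element per integer (plus one for the array), while the slim one carries a single base64 value. Base64 inflates the marshalled bytes by about a third, but that is still much smaller than the per-item tag overhead, and the receiving parser has only one element to deal with.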