So utterly confused w/ various XML libraries

Skip Montanaro skip at pobox.com
Mon Aug 5 18:13:58 EDT 2002


    Robb> Can someone point me to a FAQ, or better yet sketch out a mindmap
    Robb> of the various python XML implementations, and how they relate?

Don't know how much reading you've done on the topic, but if you haven't
done much, you might want to start with the XML topic guide:

    http://pyxml.sourceforge.net/topics/

    Robb> .... But receiving even trivial data is too slow to be useful, and
    Robb> the profiler shows all the time spent in "PyExpat.py".

    Robb> And I've seen various pieces of advice like, "Go get
    Robb> sgmlop/minidom/cdomlette".  But I don't know how PyXML relates to
    Robb> any of these, or PyExpat, or even why I need it...

Various XML parsers provide different features.  Some validate, others
don't.  Some are written in Python, others in C.  If XML parsing is what's
slowing you down (sorta seems that way from your comments), my guess is that
at the lowest layer, your XML is getting parsed with Python code.  Parsers
like sgmlop (which I use quite happily underneath the xmlrpclib module) are
written in C for performance.  
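
To give a feel for what driving a C-level parser from Python looks like,
here is a minimal sketch using xml.parsers.expat, the standard library's
binding to the Expat C library.  It is not sgmlop (which has its own
interface), and the count_elements function and the sample document are
made up purely for illustration:

```python
import xml.parsers.expat


def count_elements(xml_text):
    """Return a dict mapping each tag name to how many times it opens."""
    counts = {}

    def start_element(name, attrs):
        # Called by Expat (C code) once per opening tag.
        counts[name] = counts.get(name, 0) + 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)  # True: this is the final chunk of input
    return counts


# Hypothetical sample document, shaped loosely like an XML-RPC payload.
sample = (
    "<params>"
    "<param><value>1</value></param>"
    "<param><value>2</value></param>"
    "</params>"
)
print(count_elements(sample))
```

The tokenizing happens in C; Python only runs once per callback, so the
fewer tags there are, the less Python code gets executed.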

That said, the biggest boost to performance will be found when you eliminate
as many XML tags from your serialized data as possible.  *If* you know both
your client and server are written in Python, you might look at using the
cPickle or marshal modules to wrap up your input parameters or function
results as a single binary string, then ship that via SOAP (the examples
below use xmlrpclib, but the idea is the same).  For portability's sake this
may mean you have two versions of most methods on your server: the real
method, and a "cheater" that does the marshalling and unmarshalling of the
data and calls the real method.  Programs calling from other languages call
the real method directly, e.g.:

    def method(self, arg1, arg2, arg3):
        # ... buncha computing elided ...
        return big_hairy_result

Python clients call the "cheater" method instead:

    import marshal
    import xmlrpclib

    def methodp(self, args):
        # args is an xmlrpclib.Binary wrapping the marshalled argument tuple
        arg1, arg2, arg3 = marshal.loads(args.data)
        big_hairy_result = self.method(arg1, arg2, arg3)
        slimmed_down_result = marshal.dumps(big_hairy_result)
        return xmlrpclib.Binary(slimmed_down_result)

At the client end, you need to perform some extra steps to get at the real
results:

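    def client_function(....):
        ...
        # pack all the arguments into one marshalled string
        args = marshal.dumps((arg1, arg2, arg3))
        slim_result = server.methodp(xmlrpclib.Binary(args))
        # the reply is a Binary too; unmarshal it to get the real result
        real_result = marshal.loads(slim_result.data)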
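
If it helps, here is a rough, self-contained sketch of the whole round trip
that runs without a network.  It is written for Python 3, where xmlrpclib
lives on as xmlrpc.client.  The Server class, the over_the_wire and
call_cheater helpers, and the stand-in computation are all invented for
illustration; in real use the server object would be an xmlrpclib
ServerProxy talking to a remote machine:

```python
import marshal
import xmlrpc.client as xmlrpclib  # Python 3 name for the xmlrpclib module


def over_the_wire(value):
    """Encode one value as an XML-RPC response and decode it again (no network)."""
    payload = xmlrpclib.dumps((value,), methodresponse=True)
    (decoded,), _method = xmlrpclib.loads(payload)
    return decoded


class Server:
    """Hypothetical server object; stands in for the real XML-RPC server."""

    def method(self, arg1, arg2, arg3):
        # Stand-in for the real computation.
        return {"total": arg1 + arg2 + arg3, "inputs": [arg1, arg2, arg3]}

    def methodp(self, args):
        # "Cheater": unwrap the marshalled arguments, call the real method,
        # and wrap the marshalled result in a single Binary value.
        arg1, arg2, arg3 = marshal.loads(args.data)
        result = self.method(arg1, arg2, arg3)
        return xmlrpclib.Binary(marshal.dumps(result))


def call_cheater(server, arg1, arg2, arg3):
    """Client side: marshal the arguments, call methodp, unmarshal the result."""
    args = over_the_wire(xmlrpclib.Binary(marshal.dumps((arg1, arg2, arg3))))
    slim_result = over_the_wire(server.methodp(args))
    return marshal.loads(slim_result.data)


server = Server()
# Compare the cheater path against calling the real method directly.
print(call_cheater(server, 1, 2, 3) == server.method(1, 2, 3))
```

Both directions pass through the XML-RPC encoder and decoder here, so the
Binary wrapper gets exercised the same way it would be on a real connection.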

While you're doing a bit more work at each end, the system has to transport
and parse a lot less data, because marshal's or cPickle's encoding is much
more efficient (in both time and space) than what you would get from any XML
encoding.
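
You can get a rough idea of the space savings by encoding the same data both
ways and comparing sizes.  This is only a sketch, again for Python 3
(xmlrpc.client), and the sample data is made up; real numbers depend
entirely on what your results look like.  Note that xmlrpclib ships Binary
values base64-encoded, so the marshalled string grows by about a third on
the wire:

```python
import marshal
import xmlrpc.client as xmlrpclib  # Python 3 name for the xmlrpclib module

# Hypothetical result: a list of small records.
data = [{"id": i, "name": "item%d" % i, "score": i * 0.5} for i in range(1000)]

# Plain XML-RPC: every dict, key, and value gets its own XML tags.
plain_xml = xmlrpclib.dumps((data,), methodresponse=True)

# "Cheater" encoding: one marshalled string inside a single <base64> value.
blob = marshal.dumps(data)
slim_xml = xmlrpclib.dumps((xmlrpclib.Binary(blob),), methodresponse=True)

print("plain XML-RPC payload:    %d characters" % len(plain_xml))
print("marshal + Binary payload: %d characters" % len(slim_xml))
```

The second payload also contains only a handful of tags, no matter how large
the result is, which is where the parsing time goes away.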

-- 
Skip Montanaro
skip at pobox.com
consulting: http://manatee.mojam.com/~skip/resume.html



