a few more questions on XML and python

Lars von Wedel vonWedel at lfpt.rwth-aachen.de
Fri Jan 4 09:39:06 CET 2002


Hi,


> [...] but I daresay that working code can eat
> a great deal of memory and still perform better than a dysfunctional
> collection of highly optimized fragments :)

Sure, in general, but quite a few XML reading tasks can be handled
with a lean SAX-style implementation, e.g. reading rather simple
configuration files.


> If you're processing huge data sets, DOM isn't going to cut it.  DOM
> builds an in-memory representation of the entire document, whereas SAX
> handles a single element at a time.  But after you have a bit of
> Python and XML parsing under your belt, you can always move on to SAX
> if necessary, eh?

To simplify parsing XML with SAX, I put the following two methods into
the class I use as an element handler. They dispatch the
startElement/endElement calls to a bunch of methods called e1_start,
e1_end, e2_start, e2_end, ... for each element type (e.g. E1, E2)
occurring in the XML file; the element name is lowercased before the
lookup. The attributes are kept on a stack so that the end method sees
them as well. That saves me a large if-construct. Very simple, but I
use it a lot.

    # Both methods live in the handler class. self.attr_st (a list used
    # as a stack) and self.verbose are assumed to be set in __init__.

    def startElement(self, name, attrs):
        # Remember the attributes for the matching endElement call.
        self.attr_st.append(attrs)
        mth_name = name.lower() + '_start'
        if hasattr(self, mth_name):
            method = getattr(self, mth_name)
            method(attrs)
        elif self.verbose:
            print('Warning: Start of element %s skipped' % name)

    def endElement(self, name):
        # Attributes of the element that is being closed.
        attrs = self.attr_st.pop()
        mth_name = name.lower() + '_end'
        if hasattr(self, mth_name):
            method = getattr(self, mth_name)
            method(attrs)
        elif self.verbose:
            print('Warning: End of element %s skipped' % name)
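
For completeness, here is a minimal self-contained sketch of how this
dispatch might sit in a full handler. The class names DispatchingHandler
and ConfigHandler, the <config>/<option> sample document and the
option_start method are made up for illustration; only
xml.sax.ContentHandler and xml.sax.parseString come from the standard
library.

```python
import xml.sax


class DispatchingHandler(xml.sax.ContentHandler):
    """Route SAX events to <element>_start / <element>_end methods."""

    def __init__(self, verbose=False):
        xml.sax.ContentHandler.__init__(self)
        self.verbose = verbose
        self.attr_st = []  # attributes of the currently open elements

    def startElement(self, name, attrs):
        self.attr_st.append(attrs)
        method = getattr(self, name.lower() + '_start', None)
        if method is not None:
            method(attrs)
        elif self.verbose:
            print('Warning: Start of element %s skipped' % name)

    def endElement(self, name):
        attrs = self.attr_st.pop()
        method = getattr(self, name.lower() + '_end', None)
        if method is not None:
            method(attrs)
        elif self.verbose:
            print('Warning: End of element %s skipped' % name)


class ConfigHandler(DispatchingHandler):
    """Hypothetical handler that collects <option name=... value=...>."""

    def __init__(self):
        DispatchingHandler.__init__(self)
        self.options = {}

    def option_start(self, attrs):
        # Called for <option> and <Option> alike, since names are lowercased.
        self.options[attrs['name']] = attrs['value']


# Hypothetical configuration document, passed as bytes.
CONFIG = (b'<config>'
          b'<Option name="depth" value="3"/>'
          b'<option name="mode" value="fast"/>'
          b'</config>')

handler = ConfigHandler()
xml.sax.parseString(CONFIG, handler)
print(handler.options)
```

The <config> element has no config_start/config_end method here, so it
is silently skipped unless verbose is set. Adding support for a new
element type then only means adding one or two methods to the subclass.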



Lars




