[XML-SIG] understanding the sources. where to start?

paul@boddie.net
13 Jun 2002 10:50:42 -0000


Andrew Kuchling <akuchlin@mems-exchange.org> wrote:
>
>Generally when doing XML work I just give up and use the smaller XML
>package included with Python, and updating the HOWTO in May reinforced
>this.  The problems:
>
>  * There are too many ways to do things, and no guidance about
>    which to use.  Parsing an XML document?  There's qp_xml.py,
>    pulldom.py, SAX readers, DOM readers, xmllib, pyexpat?
>    Printing an XML document?  sax/writer.py or dom.ext.Printer?
>    What's the difference between these choices?  Which ones are
>    deprecated?

That's a really good point. It's certainly easiest to use xml.dom.minidom to do 
any work with the DOM, and that might suit most people, but then there's the 
issue of using alternative DOM implementations. Although I'm sure it's 
documented somewhere, I don't think it's that obvious how to write generic code 
which can make use of other implementations, such as pirxx, for example. 
(Please prove me wrong!)
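
For what it's worth, here is a minimal sketch of what implementation-neutral
code might look like. It assumes the getDOMImplementation lookup in the
standard library's xml.dom package, which in a stock install hands back
minidom; the helper name build_greeting is made up for illustration, and I
have not tried this against pirxx or 4DOM.

```python
from xml.dom import getDOMImplementation


def build_greeting(impl_name=None):
    """Build a tiny document through the DOMImplementation interface only.

    impl_name is an optional registered implementation name (for example
    "minidom"); None lets xml.dom pick whichever one is available.
    """
    impl = getDOMImplementation(impl_name)
    # createDocument(namespaceURI, qualifiedName, doctype)
    doc = impl.createDocument(None, "greeting", None)
    root = doc.documentElement
    root.setAttribute("lang", "en")
    root.appendChild(doc.createTextNode("hello"))
    return doc


doc = build_greeting()
```

Nothing after the lookup names a concrete implementation, so in principle
swapping one in is a matter of changing the name passed to the lookup.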

>  * Some things, such as XSLT are simply broken; they don't work
>    at all.  (Or maybe it'll work if I installed 4Suite; the
>    README currently says you shouldn't install the xml.xslt
>    package from PyXML, though.)

I haven't dared to do anything with XSLT in Python, but then it can take some 
motivation to do things in XSLT at all. ;-) However, I am very impressed with 
the XPath implementation.

>We really need to do better, but it's not clear what to do.  
>Is it simply a documentation problem?  Perhaps.  But we *must*
>do something about this.

It should be clearer how the following things can be done with PyXML:

  * Opening existing documents using SAX and DOM, using different
    implementations. (The HOWTO used to cover only SAX.)

  * Creating new documents using DOM. Does one instantiate a new
    xml.dom.Document object, or call methods on the DOM
    implementation? What about all those arguments to
    createDocument in 4DOM?

  * Serializing documents: what's the best way? It's tempting to
    use xml.dom.ext.PrettyPrint, but it doesn't always work
    properly (search for bug reports on SourceForge) and it might
    not be as fast as the 'toxml' method. Then again, is the
    'toxml' method standard or an extension?

  * Manipulating document nodes. It's always possible to refer
    people to the DOM specification for this, but then PyXML is
    all about a Pythonic version of the DOM. Moreover, the W3C
    specifications can be too heavy for the unseasoned
    developer.
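
As a rough illustration of the tasks above, the following sketch opens the
same document with both DOM and SAX, adds a node, and serializes the result.
It assumes the minidom and SAX modules from the standard library rather than
the PyXML extras; the sample document and the ItemCollector class are made up
for the example. Note that, as far as I can tell, 'toxml' is a minidom
convenience rather than part of the W3C DOM Core.

```python
import xml.sax
from xml.dom import minidom

SOURCE = "<order id='42'><item>tea</item><item>milk</item></order>"

# Opening with the DOM: the whole tree is built in memory first.
doc = minidom.parseString(SOURCE)
dom_items = [el.firstChild.data for el in doc.getElementsByTagName("item")]


# Opening with SAX: a handler receives events while the parser runs.
class ItemCollector(xml.sax.ContentHandler):
    def __init__(self):
        xml.sax.ContentHandler.__init__(self)
        self.items = []
        self._in_item = False

    def startElement(self, name, attrs):
        if name == "item":
            self._in_item = True
            self.items.append("")

    def characters(self, content):
        # Text may arrive in several chunks, so accumulate it.
        if self._in_item:
            self.items[-1] += content

    def endElement(self, name):
        if name == "item":
            self._in_item = False


collector = ItemCollector()
xml.sax.parseString(SOURCE.encode("utf-8"), collector)

# Manipulating nodes: create through the document, attach to the tree.
extra = doc.createElement("item")
extra.appendChild(doc.createTextNode("sugar"))
doc.documentElement.appendChild(extra)

# Serializing with the minidom-specific toxml method.
serialized = doc.documentElement.toxml()
```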

The Pythonic DOM standard really needs specifying in full. For example, there 
are some odd differences in the implementation of the special 'attributes' 
attribute between minidom and 4DOM, in my experience. Which one is correct? 
Must I always check the types and take evasive action?
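
One way to sidestep the question might be to touch 'attributes' only through
the W3C NamedNodeMap interface ('length' and 'item'), which any conforming
implementation should provide, instead of relying on dictionary-style
behaviour. A minimal sketch, tested only in my head against minidom; the
helper name attribute_pairs is made up:

```python
from xml.dom import minidom


def attribute_pairs(element):
    """Return sorted (name, value) pairs using only NamedNodeMap calls."""
    attrs = element.attributes
    pairs = []
    for i in range(attrs.length):
        attr = attrs.item(i)  # an Attr node
        pairs.append((attr.name, attr.value))
    # NamedNodeMap order is not guaranteed, so sort for a stable result.
    return sorted(pairs)


doc = minidom.parseString('<a href="x.html" title="X"/>')
pairs = attribute_pairs(doc.documentElement)
```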

Dinu Gherman <gherman@darwin.in-berlin.de> wrote:
>
>Alexandre <Alexandre.Fayolle@logilab.fr>:
>
>> As a sidenote, I'm currently working on a Python+XML tutorial for
>> the Europython conference. I'll be happy to contribute the slides 
>> to a documentation. 
>
>Great! If you still accept some feed-in: I'm especially
>interested in the following packages (like everybody else,
>I guess):
>
>- standard Python XML 
>- PyXML
>- 4Suite
>
>evaluated using the following criteria:
>
>- borderlines
>- overlap
>- dependencies
>- versioning issues

Versioning seems to be a big, recurring issue with PyXML, even though I've been 
reasonably lucky never to have had problems with version conflicts and broken 
features. Given how often it comes up, though, I'm surprised it's even been 
possible for any books on XML processing with Python to be published.

>- future unification
>
>I would also like to add versioning issues for Python (and, 
>less so, Jython) itself, but I fear I'd look like trying to
>be really mean. I can reassure you, I'm not, I'm just a bit
>confused... BTW, Jython 2.1 contains *some* of PyXML, Finn 
>Bock said recently on the Jython-Users list.

A PyXML API accessing JAXP would be nice, but I fear that the PyXML API isn't 
formal enough (unlike the DB-API accessing JDBC, as done by zxJDBC).

Paul