[XML-SIG] Re: Can anyone recommend a sensible XML parser for Python?
Uche Ogbuji
uche.ogbuji@fourthought.com
Mon, 09 Sep 2002 13:39:01 -0600
> > According to the DOM Level 2 spec: "And, cloning Document, DocumentType,
> > Entity, and Notation nodes is implementation dependent."
>
> This is why standards compliance is not terribly important to me. I would
> rather have a useful XML API than a standardized one.
Well, what do you think is the most useful behavior of cloning a document? Is
it the one I posted in response to this thread? If so, don't you think the element
of surprise is too great (I'd be surprised myself at that behavior)?
Wouldn't it be better for Python/XML to offer a *separate*, specialized
function for cloning nodes, rather than doing weird things with cloneNode?
> > Can you expand a bit more on the actual use case that makes you think you want
> > to clone a document node?
>
> I have a template "frame" document. I want to clone the document, populate it
> with information lifted from other XML files, and then write the resultant
> (cloned) document out. This is the very first use-case I ever had working with
> XML and it is still the most common.
I see. It sounds as if a general document duplication function would be of
use to you. I agree that this would be useful. I'm willing to write one and
add it to xml.dom.ext.
But I don't think this is a use case for node.cloneNode.
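For illustration only, here is a minimal sketch of what such a duplication function might look like. It uses the standard library's xml.dom.minidom rather than xml.dom.ext, the name duplicate_document is hypothetical, and it simply skips the doctype, so treat it as a rough idea and not the function that would actually be added:

```python
from xml.dom import minidom


def duplicate_document(doc):
    """Return a new Document holding deep copies of doc's top-level nodes.

    Hypothetical helper: the name and behaviour are assumptions made for
    this sketch, not an existing PyXML or 4Suite API.
    """
    impl = minidom.getDOMImplementation()
    # Passing None for all three arguments yields an empty document.
    new_doc = impl.createDocument(None, None, None)
    for child in doc.childNodes:
        if child.nodeType == child.DOCUMENT_TYPE_NODE:
            # importNode refuses doctype nodes, so this sketch drops them.
            continue
        # importNode makes a deep copy that is owned by new_doc.
        new_doc.appendChild(new_doc.importNode(child, True))
    return new_doc


# The "template frame" use case: copy the frame, then populate the copy.
frame = minidom.parseString("<frame><slot/></frame>")
page = duplicate_document(frame)
page.documentElement.appendChild(page.createTextNode("x"))
```

The point of keeping this separate from cloneNode is that the caller asks for a whole-document copy explicitly, so nobody is surprised by what happens to the doctype or the owner document.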
> > We choose not to allow it. Perfectly legal, and I think this is the right
> > choice.
>
> Yes, but the point remains that this *used* to work, and now it *doesn't*.
I don't remember. What did it do when it "worked"?
> This is functionality I found useful. While I can't comment on the intrinsic
> sense or nonsense of cloning document nodes in DOM, I do know that it's
> difficult to keep track of when features like this appear and disappear in the
> various different XML solutions for Python.
Was it ever documented? Every software module has undocumented "features"
that you use at your peril. I don't think it's fair to complain when these
appear and disappear.
Then again, the poor state of PyXML documentation in general weakens that
point of mine, doesn't it? Ah well.
> Maybe this is the only feature that has done this; I don't know. It just
> happens that it's a very commonly-used one for me.
>
> This is just another instance of my general complaint that tracking versioning
> dependencies is not worth the effort for my degenerately simple use-cases for
> XML.
>
> > You mean you can't require, say PyXML 0.8.1? Tough crowd you develop for?
> > :-)
>
> There are still some parties interested in Twisted who are upset that it
> requires Python 2.1; in fact, I felt guilty doing 2.1 support because I am
> likely going to have to backport portions of it to 1.5.2 for some people. We
> can all thank Red Hat for this inane persistence of ancient python versions,
> but it is sadly the world I live in.
I sympathize. It's largely because of Red Hat that it took us so long to drop
1.5 support in 4Suite. But a couple of months ago we decided it was not worth
the development and support overhead and ditched support for all versions
before 2.1. I sleep better since then :-)
> > > My main frustration is with packaging.
>
> > Here you have a point. Python, PyXML, and a lot of the related packages move
> > very quickly, and so quickly that they cause all manner of packaging
> > problems.
>
> This is my main point, and this is the one that the PyXML community can do the
> least to address. Buggy and idiosyncratic implementations are already in the
> wild, and some apps will depend on those particular bugs and idiosyncrasies.
> If twisted depends on a new or different set of bugs and quirks, I make it
> incompatible with whatever other XML-using applications are out there today.
>
> Given that XML is an integration technology this is certainly less than
> desirable.
>
> > There is no easy solution to this.
>
> Having a project that is precipitously approaching 1.0 myself, I can
> sympathize. As much as this sort of dependency and compatibility problem has
> bothered me, I *know* there will be people that write apps for Twisted and will
> curse my name when I enhance some functionality later on :-).
>
> > I have had it in mind to suggest a PyXML-in-a-tie type effort in the Python
> > Business Forum once the effort on Python itself starts to gain legs. I guess
> > I can count on you to at least help cheerlead? :-)
>
> Cheerleading, certainly :-). Although I'm less interested in seeing PyXML
> prepared for "business" clients and more interested in just seeing the level of
> QA on the volunteer work go up. If I *had* any spare "scarce resources" to
> commit beyond my own projects, I would certainly help getting the unit tests
> unified and automated.
>
> > > or produce what amounts to my own `implementation' of an XML parser.
> >
> > If you try going this route, I guarantee you'll still be trying to get the
> > most basic things right six months from now.
>
> ...
>
> > > For the applications that I'm intending to write, just doing my own parser and
> > > API is both more appealing and more rewarding.
> >
> > Really? Color me deeply skeptical. I have not seen an application on earth
> > where implementing one's own parser is a good idea, and precious few where
> > implementing one's own API is a good idea. I have a lot of colleagues who
> > have tried.
>
> While it is *possible* that I'm smarter than you think I am, it is certain that
> I'm more stubborn.
I think you take the wrong gloss on my words. I think Linus Torvalds himself
would take years to write a complete and correct XML parser. It's the nature
of the beast (XML), not the programmer.
I certainly do not consider myself smart enough to take on that dragon. I'm
just glad to lean on folk like Clark (and Drake, Evans and co), Garshol and
Veillard.
> My sophomoric attempt at an XML parser is now in Twisted
> CVS.
Interesting. So how did you test it?
> I've had this objection raised over writing yet another web server, yet
> another remote procedure call protocol, yet another asynchronous socket server
> and yet another database interface. It seems like at least some of these ideas
> were good ones, so I went ahead and wrote an XML parser and representation
> anyway :-).
I would rather write a Web server, another RPC, another async socket server
*and* another DBMS interface all in a row than just take on the single task of
writing an XML parser. And I think I can speak authoritatively, because I
*have* implemented all four of those things.
> As a data point for this hypothesis, writing the parser and the node tree took
> me less than half as much time as writing these posts to various mailing lists
> about XML tools (not counting this post, which has been the most
> time-consuming): it took less than a quarter as much time as attempting (and
> failing) to track down bugs in PyXML, not counting the time I spent trying to
> figure out how to turn off undesired features in a way that would work on more
> than one version. My two main existing PyXML-using applications are already
> ported to this, changing barely any of their code.
As I said, I am very skeptical of the result. I'll be impressed when you tell
me your home-brew XML parser passes the OASIS conformance suite.
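To make the kind of test concrete, here is a rough sketch of a conformance-style harness: feed a parser documents that are known to be well-formed or not, and list the ones it gets wrong. The handful of cases below are my own illustrations and are not taken from the OASIS suite, and the names is_well_formed and run_cases are hypothetical; expat stands in as the parser under test.

```python
from xml.parsers import expat


def is_well_formed(data):
    """Return True if expat accepts data as a complete XML document."""
    parser = expat.ParserCreate()
    try:
        parser.Parse(data, True)  # True marks this as the final chunk
    except expat.ExpatError:
        return False
    return True


# (document, should_be_accepted) pairs; illustrative only, not OASIS cases.
CASES = [
    ("<a><b/></a>", True),
    ("<a><b></a>", False),        # mismatched end tag
    ("<a/><b/>", False),          # two root elements
    ("<a x='1' x='2'/>", False),  # duplicate attribute
    ("<a>text", False),           # unclosed element
]


def run_cases(accepts, cases=CASES):
    """Return the documents on which accepts() disagrees with expectation."""
    return [doc for doc, expected in cases if accepts(doc) != expected]


failures = run_cases(is_well_formed)
```

A home-brew parser would be plugged in by passing its own accept/reject function to run_cases in place of is_well_formed. The real suite runs to many hundreds of such cases, which is where the six months go.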
Anyway, this is all moot argument. It looks as if you've satisfied yourself
for now.
Good luck.
--
Uche Ogbuji Fourthought, Inc.
http://uche.ogbuji.net http://4Suite.org http://fourthought.com
Track chair, XML/Web Services One Boston: http://www.xmlconference.com/
Basic XML and RDF techniques for knowledge management, Part 7 -
http://www-106.ibm.com/developerworks/xml/library/x-think12.html
Keeping pace with James Clark -
http://www-106.ibm.com/developerworks/xml/library/x-jclark.html
Python and XML development using 4Suite, Part 3: 4RDF -
http://www-105.ibm.com/developerworks/education.nsf/xml-onlinecourse-bytitle/8A1EA5A2CF4621C386256BBB006F4CEC