[XML-SIG] py2exe and switching from PyXML to 4Suite

Mon, 16 Dec 2002 06:55:59 -0700

> On Sun, Dec 15, 2002 at 02:25:04PM -0700, Uche Ogbuji wrote:
> > > > Because you're such a nice guy, I'll ditch the tease for you :-)
> > > You can also do these much more easily using XPath...
> > > from Ft.Xml import XPath
> > > def getElementsByTagName(node,name):
> > >     return XPath.evaluate(node,".//" + name)
> > I think my generator approach would be faster, but this XPath approach has the 
> > advantage of working in Python 2.0 and 2.1.  Mine is 2.2+.
> 
> Okay, I have now read up on iterators and generators - WOW! that's so
> cool!.. I immediately rewrote the search part of my program that had
> been giving me problems - with generators it just works perfectly and
> the code is so much simpler!
> 
> I have implemented Uche's generators now and the function that loads an
> XML file into my program's internal format has now gone from 6-7 seconds
> load time to 1-2 seconds for the 219Kb XML file I am using for testing on
> my Athlon XP 1800+.

Generators do absolutely rock.  I'm glad you got a good opportunity to learn 
them as a side effect of trying to figure out Python/XML.
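
For readers who have not seen the generator approach, here is a minimal
sketch of what a generator-based getElementsByTagName might look like.  It is
not the exact code posted earlier in the thread: the function name
iter_elements_by_tag_name is made up for illustration, and the test document
is built with the standard library's minidom so the example runs without
4Suite installed.  It relies only on childNodes, nodeType and nodeName, on the
assumption that Domlette nodes expose the same attributes.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def iter_elements_by_tag_name(node, name):
    """Yield descendant elements of node named name, in document order.

    Hypothetical helper for illustration; nothing is collected into a
    list, so the caller can stop iterating as soon as it has what it needs.
    """
    for child in node.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            if child.nodeName == name:
                yield child
            # Descend into the element's own children.
            for match in iter_elements_by_tag_name(child, name):
                yield match


# Stand-in document; with 4Suite this would be a Domlette document instead.
doc = parseString("<root><item id='a'/><group><item id='b'/></group></root>")
ids = [el.getAttribute("id") for el in iter_elements_by_tag_name(doc, "item")]
```

Because matches are produced lazily, a search that only needs the first hit
never walks the rest of the tree.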

> Now - my program also saves its data to disk by creating a dom and
> filling it with the right nodes and then writing it to disk with
> PrettyPrint (which takes over 20 seconds on my machine with the 219Kb
> file). I am sure this can be done much faster with 4Suite somehow,
> but how?

Rather than 

from xml.dom.ext import PrettyPrint

use

from Ft.Xml.Domlette import PrettyPrint

Domlette's PrettyPrint is written in C (with Python fall-back), and is much 
faster.

> right now I use xml.dom.implementation to create the empty dom, I am
> betting it is possible to do the same using domlette, I just can't grok
> how without a little helping hand :)

Ouch.  So you're using 4DOM, not Domlette, which is probably a bigger reason 
for slow performance than PrettyPrint.

Instead of

xml.dom.implementation

use

from Ft.Xml.Domlette import implementation

Again I must warn you that Domlette approximates DOM and is not a full DOM.  
You already found the lack of getElementsByTagName.  There are other subtle 
differences.  That having been said, I have not run into a situation where 
Domlette is not a suitable DOM substitute.
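
As a rough sketch of the "create an empty DOM, fill it, write it out" workflow
described above, the following builds a small document through a DOM
implementation object.  So that it runs without 4Suite, it uses the standard
library's minidom implementation as a stand-in; the intent is that swapping in
"from Ft.Xml.Domlette import implementation" and serializing with Domlette's
PrettyPrint would follow the same shape, but that substitution is untested
here.  It sticks to the namespace-aware calls (createElementNS,
setAttributeNS) on the assumption that those are the ones Domlette supports.
The element names and the build_notes helper are invented for the example.

```python
from xml.dom.minidom import getDOMImplementation

# Stand-in for: from Ft.Xml.Domlette import implementation
implementation = getDOMImplementation()


def build_notes(entries):
    """Build a <notes> document with one <note> child per (id, text) pair."""
    # No namespace, root element "notes", no doctype.
    doc = implementation.createDocument(None, "notes", None)
    root = doc.documentElement
    for note_id, text in entries:
        note = doc.createElementNS(None, "note")
        note.setAttributeNS(None, "id", note_id)
        note.appendChild(doc.createTextNode(text))
        root.appendChild(note)
    return doc


doc = build_notes([("1", "hello"), ("2", "world")])
# minidom's own pretty serializer; with 4Suite this step would be
# PrettyPrint(doc, stream) instead.
xml_text = doc.toprettyxml(indent="  ")
```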

> Also it would be great if there was a way to ditch PrettyPrint as I
> would then be completely rid of PyXML, which seems to be giving py2exe
> some problems.

Well, the above would remove the dependency on PyXML.  However, I think we 
would all like at least a report of the problems py2exe is having with PyXML, 
since it would be nice if that combo worked.

> Once again, thanks for the help so far, you guys are great! :)

I'm glad we've been of help.

We would in turn be grateful at some point if you were to write up some of 
your experiences and techniques in order to help others through the same 
issues.

-- 
Uche Ogbuji                                    Fourthought, Inc.
http://uche.ogbuji.net    http://4Suite.org    http://fourthought.com
A Python & XML Companion - http://www.xml.com/pub/a/2002/12/11/py-xml.html
XML class warfare - http://www.adtmag.com/article.asp?id=6965
MusicBrainz metadata - http://www-106.ibm.com/developerworks/xml/library/x-think14.html