[XML-SIG] speed question re DOM parsing

Greg Stein gstein@lyra.org
Thu, 22 Jun 2000 19:41:53 -0700

On Thu, Jun 22, 2000 at 09:37:56AM -0700, Walter Underwood wrote:
> --On Wednesday, May 31, 2000 8:58 PM -0600 Bjorn Pettersen
> <bjorn@roguewave.com> wrote:
> > 
> > After some profiling, I found that most of the time was going into the
> > else branch in the cdata method.  This branch is growing a string
> > character by character by saying:
> > 
> >   elem.first_cdata = elem.first_cdata + data
>
> I had one of those in my character data handler too. Parsing the
> Old Testament took about 45 min, as I remember. The copies and
> reallocs in concatenation are O(n**2). Save all the strings in
> a list, then use string.join at the end. This is linear.

Exactly. Bjorn solved this with StringIO. It would be worth timing that
against string.join before settling on either approach.
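
For anyone who wants to run that comparison, here is a rough sketch of what
the test could look like. It is not the qp_xml code: the Elem class, the
function names, and the chunk sizes are all made up for illustration, and it
assumes a modern Python, where string.join is spelled "".join(parts) and
StringIO lives in the io module. Absolute numbers will vary by interpreter
and input, so treat the output as a relative comparison only.

```python
import io
import timeit


class Elem:
    """Hypothetical stand-in for the parser's element object."""

    def __init__(self):
        self.first_cdata = ""


def concat(chunks):
    # The pattern from the profile: rebuild the string on every callback.
    elem = Elem()
    for data in chunks:
        elem.first_cdata = elem.first_cdata + data
    return elem.first_cdata


def list_join(chunks):
    # Collect the pieces in a list and join once at the end.
    parts = []
    for data in chunks:
        parts.append(data)
    return "".join(parts)


def string_io(chunks):
    # Write the pieces into an in-memory buffer and read it back once.
    buf = io.StringIO()
    for data in chunks:
        buf.write(data)
    return buf.getvalue()


def compare(chunks, repeat=3):
    # Best-of-N wall-clock time for each approach, keyed by function name.
    results = {}
    for func in (concat, list_join, string_io):
        times = timeit.repeat(lambda: func(chunks), number=1, repeat=repeat)
        results[func.__name__] = min(times)
    return results


if __name__ == "__main__":
    # Many small chunks, roughly what a character data handler sees.
    chunks = ["x"] * 100_000
    for name, seconds in compare(chunks).items():
        print(f"{name:10s} {seconds:.4f}s")
```

All three functions should return the same string; only the time to build
it differs. Trying a few chunk counts (and a few chunk sizes) would show how
each approach scales rather than just how it does on one input.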

I haven't had the time (unfortunately) to test these out myself. But that
doesn't preclude somebody from running the two tests, listing the values,
and providing the right patch. Another XML committer might be able to get to
it before me.

When could I get to it? eek. I *will*, but dunno when. It is amazing just
how much stuff can fall on a person's plate despite having no job :-). I've
got some layered I/O in Apache, mod_dav integration, a new httplib, imputil
issues, these qp_xml upgrades, ViewCVS stuff, edna releases, free threading
changes, Python/Apache integration, and coding for Subversion. Fuggin


Greg Stein, http://www.lyra.org/