[XML-SIG] memory related issues in Python - bug??

Tue Sep 9 16:21:44 EDT 2003

Mallick, Pinki writes:
 > <equation>
 > 	<mi>a</mi>
 > 	<mo>=</mo>
 > 	<mi>y18</mi>
 > 	<mo>*</mo>
 > 	<mi>variable</mi>
 > </equation>
 > 
 > 
 > Here, I am trying to retrieve the string as "a=y18*variable". There
 > is no other node with "y" as its data other than "y18".

But it's entirely possible that the text "y18" is split over two DOM
nodes in this case.  With a large document, you're more likely to hit
an internal buffer boundary in the middle of the text and end up with
a non-normalized tree (multiple adjacent text nodes in the tree).
Depending on what you've done with the tree before this stage of the
processing, that may be expected, or it may be a bug in the DOM
builder; minidom trees are normally built pre-normalized.
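
To illustrate what a non-normalized tree looks like, here is a minimal
sketch using xml.dom.minidom from the standard library.  The split is
forced by hand with createTextNode(); that is only an assumption
standing in for whatever the builder did with the real document.

```python
from xml.dom import minidom

# Start from an empty <mi/> so we control exactly which text nodes exist.
doc = minidom.parseString("<equation><mi/></equation>")
mi = doc.documentElement.firstChild

# Assumption for illustration: the text "y18" arrives in two pieces,
# as it might if a buffer boundary fell in the middle of it.
mi.appendChild(doc.createTextNode("y"))
mi.appendChild(doc.createTextNode("18"))

# The element now has two adjacent text children instead of one.
split_count = len(mi.childNodes)

# Reading only the first child sees just the first piece.
first_piece = mi.firstChild.data

print(split_count, repr(first_piece))
```

If len(childNodes) is greater than 1 for the <mi> element in your
tree, reading firstChild.data alone will give you only part of the
text.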

 > Although it returns correct data for all other different nodes in
 > the XML file, only in case of "<mi>y18</mi>", it returns "y"
 > instead of "y18" when the file is very big.

All the more reason to think there are multiple nodes involved;
there's an increased likelihood of a buffer boundary issue cropping
up.

I don't know if you can send your document and the code that loads the
tree, but if you can, I'd be glad to try it to see if I can reproduce
what you're seeing.  If there is a bug, I'd like to fix it.  Please
let me know what version of Python and PyXML you're using.

 > So I think it is something to do with memory handling in Python.
 > 
 > And yes, I tried to use "grandChildNode.firstChild.normalize()" and
 > "grandChildNode.normalize()" as you had suggested, but in both
 > cases it returns "None".

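normalize() modifies the tree in place and always returns None, so
call it first and then read the data.  Try this: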

    grandChildNode.normalize()
    text = grandChildNode.firstChild.data

or:

    text = grandChildNode.firstChild.wholeText
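
For what it's worth, here is a small self-contained sketch of both
approaches against your sample.  It assumes xml.dom.minidom; the
element_text() helper is a hypothetical name, and the call to
splitText() only simulates the split, since I can't reproduce it
without your document.

```python
from xml.dom import minidom

XML = ("<equation><mi>a</mi><mo>=</mo><mi>y18</mi>"
       "<mo>*</mo><mi>variable</mi></equation>")


def element_text(elem):
    """Hypothetical helper: join all text children of elem, split or not."""
    return "".join(n.data for n in elem.childNodes
                   if n.nodeType == n.TEXT_NODE)


doc = minidom.parseString(XML)
equation = doc.documentElement

# Simulate the problem: split "y18" into "y" and "18".
y18 = equation.childNodes[2]
y18.firstChild.splitText(1)

before = y18.firstChild.data        # only the first piece
whole = y18.firstChild.wholeText    # all logically adjacent text

# normalize() merges adjacent text nodes in place and returns None.
result = y18.normalize()
after = y18.firstChild.data

# Build the full string without relying on the tree being normalized.
equation_text = "".join(element_text(child)
                        for child in equation.childNodes
                        if child.nodeType == child.ELEMENT_NODE)

print(before, whole, result, after, equation_text)
```

Joining every text child, as element_text() does, gives the same
answer whether or not the tree has been normalized.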

  -Fred

-- 
Fred L. Drake, Jr.  <fdrake at acm.org>
PythonLabs at Zope Corporation