[XML-SIG] memory related issues in Python - bug??
Fred L. Drake, Jr.
fdrake at acm.org
Tue Sep 9 16:21:44 EDT 2003
Mallick, Pinki writes:
> <equation>
> <mi>a</mi>
> <mo>=</mo>
> <mi>y18</mi>
> <mo>*</mo>
> <mi>variable</mi>
> </equation>
>
>
> Here, I am trying to retrieve the string as "a=y18*variable". There
> is no other node with "y" as its data other than "y18".
But it's entirely possible that the text "y18" is split over two DOM
nodes in this case. With a large document, you're more likely to hit
an internal buffer boundary in the middle of the text and end up with
a non-normalized tree (multiple adjacent text nodes in the tree).
Depending on what you've done with the tree before this stage of the
processing, that may be expected, or it may be a bug in the DOM
builder; minidom trees will normally be built pre-normalized.
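A rough illustration of what a split looks like, using the standard
library's xml.dom.minidom. The two text nodes are created by hand here
to mimic a builder that received "y18" in two chunks; that is only an
assumption about what happened in your case, and the variable names
are made up for the example:

```python
from xml.dom import minidom

# Start from an empty <mi/> so we control exactly which text nodes exist.
doc = minidom.parseString("<equation><mi/></equation>")
mi = doc.documentElement.firstChild

# Hypothetical split: the text arrives as "y" followed by "18".
mi.appendChild(doc.createTextNode("y"))
mi.appendChild(doc.createTextNode("18"))

# The element now has two adjacent text children, so looking only at
# firstChild sees just the first fragment.
split_count = len(mi.childNodes)
first_only = mi.firstChild.data

print(split_count, repr(first_only))
```

Reading only firstChild.data on such a tree would give "y", which
matches the symptom described.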
> Although it returns correct data for all other different nodes in
> the XML file, only in case of "<mi>y18</mi>", it returns "y"
> instead of "y18" when the file is very big.
All the more reason to think there are multiple nodes involved;
there's an increased likelihood of a buffer boundary issue cropping
up.
I don't know if you can send your document and the code that loads the
tree, but if you can, I'd be glad to try it to see if I can reproduce
what you're seeing. If there is a bug, I'd like to fix it. Please
let me know what version of Python and PyXML you're using.
> So I think it is something to do with memory handling in Python.
>
> And yes, I tried to use "grandChildNode.firstChild.normalize()" and
> "grandChildNode.normalize()" as you had suggested, but in both
> cases it returns "None".
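normalize() modifies the tree in place and always returns None, so
call it first and then read the data. Try this:

    grandChildNode.normalize()
    text = grandChildNode.firstChild.data

or:

    text = grandChildNode.firstChild.wholeText
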
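Both approaches can be sketched end to end with xml.dom.minidom on the
sample from your message. The split of "y18" is simulated by hand, and
names such as SOURCE and y18 are only for this example, not anything
from your code:

```python
from xml.dom import minidom

SOURCE = ("<equation><mi>a</mi><mo>=</mo><mi>y18</mi>"
          "<mo>*</mo><mi>variable</mi></equation>")
doc = minidom.parseString(SOURCE)
equation = doc.documentElement

# Simulate the reported symptom: turn "y18" into "y" + "18".
y18 = equation.childNodes[2]
y18.firstChild.data = "y"
y18.appendChild(doc.createTextNode("18"))

# wholeText joins logically adjacent text nodes without changing the tree.
whole = y18.firstChild.wholeText

# normalize() merges adjacent text nodes in the whole subtree, in place.
result = equation.normalize()

# After normalizing, each <mi>/<mo> has a single text child.
text = "".join(child.firstChild.data for child in equation.childNodes)

print(whole, result, text)
```

The None you saw from normalize() is its normal return value, not a
sign that it failed.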
-Fred
--
Fred L. Drake, Jr. <fdrake at acm.org>
PythonLabs at Zope Corporation